Search (1 result, page 1 of 1)

  • author_ss:"Stamatatos, E."
  • theme_ss:"Formalerschließung"
  • year_i:[2010 TO 2020}
  1. Potha, N.; Stamatatos, E.: Improving author verification based on topic modeling (2019) 0.01
    0.0067028617 = product of:
      0.020108584 = sum of:
        0.020108584 = product of:
          0.04021717 = sum of:
            0.04021717 = weight(_text_:indexing in 5385) [ClassicSimilarity], result of:
              0.04021717 = score(doc=5385,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.21146181 = fieldWeight in 5385, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5385)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
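The explain tree above follows Lucene's ClassicSimilarity (TF-IDF) scoring. The sketch below recomputes it from the listed inputs. It assumes the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and a query boost of 1. The function names are illustrative only and are not Lucene's API. It should reproduce the weights in the tree up to float rounding.

```python
import math


def classic_tf(freq: float) -> float:
    """ClassicSimilarity term frequency: square root of the raw count."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq: float, doc_freq: int, max_docs: int,
                query_norm: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one matching term (query boost assumed 1)."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # 0.19018644 in the tree
    field_weight = classic_tf(freq) * idf * field_norm   # 0.21146181 in the tree
    return query_weight * field_weight


# Inputs copied from the explain output for doc 5385 and the term "indexing".
weight = term_weight(freq=2.0, doc_freq=2614, max_docs=44218,
                     query_norm=0.049684696, field_norm=0.0390625)

# coord(1/2): 1 of 2 clauses matched in the inner boolean query;
# coord(1/3): 1 of 3 clauses matched at the top level.
final_score = weight * (1 / 2) * (1 / 3)

print(f"term weight = {weight:.8f}, final score = {final_score:.10f}")
```

Rounded to two decimals, the final score is the 0.01 shown next to the result title.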
    
    Abstract
    Authorship analysis attempts to reveal information about the authors of digital documents, enabling applications in digital humanities, text forensics, and cyber-security. Author verification is a fundamental task in which, given a set of texts written by a certain author, we must decide whether another text was also written by that author. In this article we systematically study the usefulness of topic modeling in author verification. We examine several author verification methods that cover the main paradigms: intrinsic methods (which attempt to solve a one-class classification task) and extrinsic methods (which attempt to solve a binary classification task), as well as profile-based approaches (all documents of known authorship are treated cumulatively) and instance-based approaches (each document of known authorship is treated separately). These are combined with well-known topic modeling methods such as Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA). Using benchmark data sets, we demonstrate that LDA works better when combined with extrinsic methods, while the most effective intrinsic method is based on LSI. Moreover, topic modeling seems to be particularly effective for profile-based approaches, and performance improves when latent topics are extracted from an enriched set of documents. The comparison to state-of-the-art methods demonstrates the great potential of the approaches presented in this study. It also demonstrates that even when genre-agnostic external documents are used, the proposed extrinsic models are very competitive.
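The profile-based versus instance-based distinction described in the abstract can be illustrated with a minimal sketch. This is not the authors' method. It uses plain bag-of-words counts and cosine similarity as a stand-in for the LSI/LDA topic vectors studied in the article. The function names, the averaging step and the 0.5 acceptance threshold are all hypothetical. The final thresholding mimics an intrinsic (one-class) decision, since only texts by the candidate author are consulted.

```python
import math
from collections import Counter


def text_vector(text: str) -> Counter:
    """Bag-of-words term counts; a stand-in for an LSI or LDA topic vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def profile_based_score(known: list[str], unknown: str) -> float:
    """Profile-based: all known texts are merged into one cumulative author profile."""
    profile = Counter()
    for text in known:
        profile.update(text_vector(text))
    return cosine(profile, text_vector(unknown))


def instance_based_score(known: list[str], unknown: str) -> float:
    """Instance-based: each known text is compared separately, then similarities are averaged."""
    target = text_vector(unknown)
    sims = [cosine(text_vector(text), target) for text in known]
    return sum(sims) / len(sims)


def same_author(score: float, threshold: float = 0.5) -> bool:
    """Intrinsic (one-class) decision: accept if the score clears a hypothetical threshold."""
    return score >= threshold


known_texts = ["the cat sat on the mat", "the cat ate the fish"]
candidate = "the cat sat by the fish"

p = profile_based_score(known_texts, candidate)
inst = instance_based_score(known_texts, candidate)
print(f"profile-based: {p:.3f}  instance-based: {inst:.3f}  accept: {same_author(p)}")
```

Averaging is only one way to aggregate per-document similarities in the instance-based case. Taking the maximum or minimum is an equally plausible choice. In a real system, `text_vector` would be replaced by a projection into a latent topic space.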