Document (#34742)

Author
Stamatatos, E.
Title
A survey of modern authorship attribution methods
Source
Journal of the American Society for Information Science and Technology. 60(2009) no.3, pp. 538-556
Year
2009
Abstract
Authorship attribution supported by statistical or computational methods has a long history starting from the 19th century and is marked by the seminal study of Mosteller and Wallace (1964) on the authorship of the disputed Federalist Papers. During the last decade, this scientific field has been developed substantially, taking advantage of research advances in areas such as machine learning, information retrieval, and natural language processing. The plethora of available electronic texts (e.g., e-mail messages, online forum messages, blogs, source code, etc.) indicates a wide variety of applications of this technology, provided it is able to handle short and noisy text from multiple candidate authors. In this article, a survey of recent advances of the automated approaches to attributing authorship is presented, examining their characteristics for both text representation and text classification. The focus of this survey is on computational requirements and settings rather than on linguistic or literary issues. We also discuss evaluation methodologies and criteria for authorship attribution studies and list open questions that will attract future work in this area.

Similar documents (author)

  1. Stamatatos, E.: Author identification : using text sampling to handle the class imbalance problem (2008) 6.19
    6.190705 = sum of:
      6.190705 = weight(author_txt:stamatatos in 2063) [ClassicSimilarity], result of:
        6.190705 = fieldWeight in 2063, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.625 = fieldNorm(doc=2063)
    
  2. Stamatatos, E.: Plagiarism detection using stopword n-grams (2011) 6.19
    6.190705 = sum of:
      6.190705 = weight(author_txt:stamatatos in 4955) [ClassicSimilarity], result of:
        6.190705 = fieldWeight in 4955, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.625 = fieldNorm(doc=4955)
    
  3. Stamatatos, E.: Masking topic-related information to enhance authorship attribution (2018) 6.19
    6.190705 = sum of:
      6.190705 = weight(author_txt:stamatatos in 4124) [ClassicSimilarity], result of:
        6.190705 = fieldWeight in 4124, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.625 = fieldNorm(doc=4124)
    
  4. Potha, N.; Stamatatos, E.: Improving author verification based on topic modeling (2019) 4.95
    4.952564 = sum of:
      4.952564 = weight(author_txt:stamatatos in 5385) [ClassicSimilarity], result of:
        4.952564 = fieldWeight in 5385, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.5 = fieldNorm(doc=5385)
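
The author-field scores above can be re-derived from the numbers in the explain trees. The sketch below assumes the usual ClassicSimilarity defaults, which this page does not state explicitly: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are hypothetical and chosen only for illustration.

```python
import math

def classic_tf(freq):
    # Assumed ClassicSimilarity term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as listed in the explain output.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the explain trees: docFreq=5, maxDocs=44218, freq=1.0.
top_score = field_weight(1.0, 5, 44218, 0.625)    # results 1 to 3
fourth_score = field_weight(1.0, 5, 44218, 0.5)   # result 4, with two authors
print(round(top_score, 2), round(fourth_score, 2))
```

Under these assumptions the only difference between results 1 to 3 and result 4 is the fieldNorm (0.625 versus 0.5), which is lower for the record that lists two authors in the field.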
    

Similar documents (content)

  1. Koppel, M.; Schler, J.; Argamon, S.: Computational methods in authorship attribution (2009) 0.26
    0.26302287 = sum of:
      0.26302287 = product of:
        0.9393674 = sum of:
          0.04478386 = weight(abstract_txt:handle in 2683) [ClassicSimilarity], result of:
            0.04478386 = score(doc=2683,freq=1.0), product of:
              0.105959475 = queryWeight, product of:
                1.0306847 = boost
                6.7624135 = idf(docFreq=138, maxDocs=44218)
                0.015202403 = queryNorm
              0.42265084 = fieldWeight in 2683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7624135 = idf(docFreq=138, maxDocs=44218)
                0.0625 = fieldNorm(doc=2683)
          0.09868102 = weight(abstract_txt:candidate in 2683) [ClassicSimilarity], result of:
            0.09868102 = score(doc=2683,freq=3.0), product of:
              0.12440584 = queryWeight, product of:
                1.1168023 = boost
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.015202403 = queryNorm
              0.79321855 = fieldWeight in 2683, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.0625 = fieldNorm(doc=2683)
          0.0292069 = weight(abstract_txt:methods in 2683) [ClassicSimilarity], result of:
            0.0292069 = score(doc=2683,freq=2.0), product of:
              0.07968607 = queryWeight, product of:
                1.2640438 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.015202403 = queryNorm
              0.36652455 = fieldWeight in 2683, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=2683)
          0.020346878 = weight(abstract_txt:this in 2683) [ClassicSimilarity], result of:
            0.020346878 = score(doc=2683,freq=4.0), product of:
              0.06745704 = queryWeight, product of:
                1.838885 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.015202403 = queryNorm
              0.3016272 = fieldWeight in 2683, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=2683)
          0.0287295 = weight(abstract_txt:text in 2683) [ClassicSimilarity], result of:
            0.0287295 = score(doc=2683,freq=1.0), product of:
              0.11367141 = queryWeight, product of:
                1.8490224 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015202403 = queryNorm
              0.25274166 = fieldWeight in 2683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2683)
          0.3696293 = weight(abstract_txt:attribution in 2683) [ClassicSimilarity], result of:
            0.3696293 = score(doc=2683,freq=3.0), product of:
              0.43274927 = queryWeight, product of:
                3.6077359 = boost
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.015202403 = queryNorm
              0.8541419 = fieldWeight in 2683, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.0625 = fieldNorm(doc=2683)
          0.34798998 = weight(abstract_txt:authorship in 2683) [ClassicSimilarity], result of:
            0.34798998 = score(doc=2683,freq=2.0), product of:
              0.5641786 = queryWeight, product of:
                5.318011 = boost
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.015202403 = queryNorm
              0.6168082 = fieldWeight in 2683, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.0625 = fieldNorm(doc=2683)
        0.28 = coord(7/25)
    
  2. Savoy, J.: Estimating the probability of an authorship attribution (2016) 0.24
    0.2399305 = sum of:
      0.2399305 = product of:
        0.99971044 = sum of:
          0.020652397 = weight(abstract_txt:methods in 2937) [ClassicSimilarity], result of:
            0.020652397 = score(doc=2937,freq=1.0), product of:
              0.07968607 = queryWeight, product of:
                1.2640438 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.015202403 = queryNorm
              0.259172 = fieldWeight in 2937, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=2937)
          0.12414832 = weight(abstract_txt:disputed in 2937) [ClassicSimilarity], result of:
            0.12414832 = score(doc=2937,freq=1.0), product of:
              0.20909934 = queryWeight, product of:
                1.447879 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.015202403 = queryNorm
              0.5937289 = fieldWeight in 2937, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.0625 = fieldNorm(doc=2937)
          0.020346878 = weight(abstract_txt:this in 2937) [ClassicSimilarity], result of:
            0.020346878 = score(doc=2937,freq=4.0), product of:
              0.06745704 = queryWeight, product of:
                1.838885 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.015202403 = queryNorm
              0.3016272 = fieldWeight in 2937, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=2937)
          0.04062965 = weight(abstract_txt:text in 2937) [ClassicSimilarity], result of:
            0.04062965 = score(doc=2937,freq=2.0), product of:
              0.11367141 = queryWeight, product of:
                1.8490224 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015202403 = queryNorm
              0.3574307 = fieldWeight in 2937, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2937)
          0.30180103 = weight(abstract_txt:attribution in 2937) [ClassicSimilarity], result of:
            0.30180103 = score(doc=2937,freq=2.0), product of:
              0.43274927 = queryWeight, product of:
                3.6077359 = boost
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.015202403 = queryNorm
              0.6974039 = fieldWeight in 2937, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.0625 = fieldNorm(doc=2937)
          0.49213216 = weight(abstract_txt:authorship in 2937) [ClassicSimilarity], result of:
            0.49213216 = score(doc=2937,freq=4.0), product of:
              0.5641786 = queryWeight, product of:
                5.318011 = boost
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.015202403 = queryNorm
              0.87229854 = fieldWeight in 2937, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.0625 = fieldNorm(doc=2937)
        0.24 = coord(6/25)
    
  3. Stover, J.A.; Winter, Y.; Koppel, M.; Kestemont, M.: Computational authorship verification method attributes a new work to a major 2nd century African author (2016) 0.23
    0.23020728 = sum of:
      0.23020728 = product of:
        0.8221688 = sum of:
          0.056973513 = weight(abstract_txt:candidate in 2503) [ClassicSimilarity], result of:
            0.056973513 = score(doc=2503,freq=1.0), product of:
              0.12440584 = queryWeight, product of:
                1.1168023 = boost
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.015202403 = queryNorm
              0.45796496 = fieldWeight in 2503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.0625 = fieldNorm(doc=2503)
          0.020652397 = weight(abstract_txt:methods in 2503) [ClassicSimilarity], result of:
            0.020652397 = score(doc=2503,freq=1.0), product of:
              0.07968607 = queryWeight, product of:
                1.2640438 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.015202403 = queryNorm
              0.259172 = fieldWeight in 2503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=2503)
          0.020346878 = weight(abstract_txt:this in 2503) [ClassicSimilarity], result of:
            0.020346878 = score(doc=2503,freq=4.0), product of:
              0.06745704 = queryWeight, product of:
                1.838885 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.015202403 = queryNorm
              0.3016272 = fieldWeight in 2503, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=2503)
          0.057459 = weight(abstract_txt:text in 2503) [ClassicSimilarity], result of:
            0.057459 = score(doc=2503,freq=4.0), product of:
              0.11367141 = queryWeight, product of:
                1.8490224 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015202403 = queryNorm
              0.5054833 = fieldWeight in 2503, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2503)
          0.10534148 = weight(abstract_txt:computational in 2503) [ClassicSimilarity], result of:
            0.10534148 = score(doc=2503,freq=2.0), product of:
              0.18740955 = queryWeight, product of:
                1.9385043 = boost
                6.3593493 = idf(docFreq=207, maxDocs=44218)
                0.015202403 = queryNorm
              0.56209236 = fieldWeight in 2503, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.3593493 = idf(docFreq=207, maxDocs=44218)
                0.0625 = fieldNorm(doc=2503)
          0.21340556 = weight(abstract_txt:attribution in 2503) [ClassicSimilarity], result of:
            0.21340556 = score(doc=2503,freq=1.0), product of:
              0.43274927 = queryWeight, product of:
                3.6077359 = boost
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.015202403 = queryNorm
              0.49313906 = fieldWeight in 2503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.0625 = fieldNorm(doc=2503)
          0.34798998 = weight(abstract_txt:authorship in 2503) [ClassicSimilarity], result of:
            0.34798998 = score(doc=2503,freq=2.0), product of:
              0.5641786 = queryWeight, product of:
                5.318011 = boost
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.015202403 = queryNorm
              0.6168082 = fieldWeight in 2503, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.0625 = fieldNorm(doc=2503)
        0.28 = coord(7/25)
    
  4. Sebastiani, F.: Classification of text, automatic (2006) 0.23
    0.22958456 = sum of:
      0.22958456 = product of:
        0.95660233 = sum of:
          0.015260159 = weight(abstract_txt:this in 5003) [ClassicSimilarity], result of:
            0.015260159 = score(doc=5003,freq=1.0), product of:
              0.06745704 = queryWeight, product of:
                1.838885 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.015202403 = queryNorm
              0.2262204 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.060944475 = weight(abstract_txt:text in 5003) [ClassicSimilarity], result of:
            0.060944475 = score(doc=5003,freq=2.0), product of:
              0.11367141 = queryWeight, product of:
                1.8490224 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015202403 = queryNorm
              0.53614604 = fieldWeight in 5003, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.111731514 = weight(abstract_txt:computational in 5003) [ClassicSimilarity], result of:
            0.111731514 = score(doc=5003,freq=1.0), product of:
              0.18740955 = queryWeight, product of:
                1.9385043 = boost
                6.3593493 = idf(docFreq=207, maxDocs=44218)
                0.015202403 = queryNorm
              0.596189 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3593493 = idf(docFreq=207, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.0794587 = weight(abstract_txt:survey in 5003) [ClassicSimilarity], result of:
            0.0794587 = score(doc=5003,freq=1.0), product of:
              0.17092253 = queryWeight, product of:
                2.2673378 = boost
                4.9587345 = idf(docFreq=843, maxDocs=44218)
                0.015202403 = queryNorm
              0.46488136 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9587345 = idf(docFreq=843, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.32010835 = weight(abstract_txt:attribution in 5003) [ClassicSimilarity], result of:
            0.32010835 = score(doc=5003,freq=1.0), product of:
              0.43274927 = queryWeight, product of:
                3.6077359 = boost
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.015202403 = queryNorm
              0.7397086 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.36909914 = weight(abstract_txt:authorship in 5003) [ClassicSimilarity], result of:
            0.36909914 = score(doc=5003,freq=1.0), product of:
              0.5641786 = queryWeight, product of:
                5.318011 = boost
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.015202403 = queryNorm
              0.6542239 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
        0.24 = coord(6/25)
    
  5. Stamatatos, E.: Masking topic-related information to enhance authorship attribution (2018) 0.21
    0.20905112 = sum of:
      0.20905112 = product of:
        1.0452555 = sum of:
          0.041304793 = weight(abstract_txt:methods in 4124) [ClassicSimilarity], result of:
            0.041304793 = score(doc=4124,freq=4.0), product of:
              0.07968607 = queryWeight, product of:
                1.2640438 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.015202403 = queryNorm
              0.518344 = fieldWeight in 4124, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=4124)
          0.014387416 = weight(abstract_txt:this in 4124) [ClassicSimilarity], result of:
            0.014387416 = score(doc=4124,freq=2.0), product of:
              0.06745704 = queryWeight, product of:
                1.838885 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.015202403 = queryNorm
              0.21328263 = fieldWeight in 4124, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=4124)
          0.04062965 = weight(abstract_txt:text in 4124) [ClassicSimilarity], result of:
            0.04062965 = score(doc=4124,freq=2.0), product of:
              0.11367141 = queryWeight, product of:
                1.8490224 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.015202403 = queryNorm
              0.3574307 = fieldWeight in 4124, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=4124)
          0.52273476 = weight(abstract_txt:attribution in 4124) [ClassicSimilarity], result of:
            0.52273476 = score(doc=4124,freq=6.0), product of:
              0.43274927 = queryWeight, product of:
                3.6077359 = boost
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.015202403 = queryNorm
              1.2079391 = fieldWeight in 4124, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.0625 = fieldNorm(doc=4124)
          0.42619893 = weight(abstract_txt:authorship in 4124) [ClassicSimilarity], result of:
            0.42619893 = score(doc=4124,freq=3.0), product of:
              0.5641786 = queryWeight, product of:
                5.318011 = boost
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.015202403 = queryNorm
              0.75543267 = fieldWeight in 4124, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.0625 = fieldNorm(doc=4124)
        0.2 = coord(5/25)
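
The content-similarity scores combine per-term scores with a coordination factor. As a rough check, the sketch below recomputes the score of result 5 (doc 4124) from the boost, docFreq, and freq values in its explain tree. It assumes the usual ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which this page does not state explicitly. The helper names are hypothetical, and small differences from the listed values are expected because Lucene computes in 32-bit floats.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.015202403   # queryNorm from the explain tree
FIELD_NORM = 0.0625        # fieldNorm(doc=4124)
QUERY_TERMS = 25           # denominator of coord(5/25)

def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost, doc_freq, freq, field_norm):
    # score = queryWeight * fieldWeight, mirroring the explain tree layout.
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

# (boost, docFreq, freq) for each matched abstract term of doc 4124.
matched_terms = {
    "methods": (1.2640438, 1900, 4.0),
    "this": (1.838885, 10762, 2.0),
    "text": (1.8490224, 2106, 2.0),
    "attribution": (3.6077359, 44, 6.0),
    "authorship": (5.318011, 111, 3.0),
}

term_sum = sum(term_score(b, df, f, FIELD_NORM) for b, df, f in matched_terms.values())
coord = len(matched_terms) / QUERY_TERMS   # fraction of query terms that matched
doc_score = term_sum * coord
print(f"{doc_score:.4f}")
```

The coord factor explains why a document with a higher term sum can still rank lower: result 5 has the largest sum of the five results shown (1.0452555) but matches only 5 of the 25 query terms.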