Document (#40331)

Author
Kocher, M.
Savoy, J.
Title
¬A simple and efficient algorithm for authorship verification
Source
Journal of the Association for Information Science and Technology. 68(2017) no.1, pp.259-269
Year
2017
Abstract
This paper describes and evaluates an unsupervised and effective authorship verification model called Spatium-L1. As features, we suggest using the 200 most frequent terms of the disputed text (isolated words and punctuation symbols). Applying a simple distance measure and a set of impostors, we can determine whether or not the disputed text was written by the proposed author. Moreover, based on a simple rule, we can determine when there is enough evidence to propose an answer and when the attribution scheme is unable to make a decision with a high degree of certainty. Evaluations based on six test collections (from the PAN CLEF 2014 evaluation campaign) indicate that Spatium-L1 usually ranks among the three best verification systems and achieves the best performance on an aggregate measure. The suggested strategy can be adapted without difficulty to different Indo-European languages (such as English, Dutch, Spanish, and Greek) or genres (essays, novels, reviews, and newspaper articles).
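The abstract describes the method only at a high level. The following is a minimal, hypothetical Python sketch, not the authors' implementation. It assumes lowercase tokenization into words and single punctuation marks, uses relative term frequencies, and measures the L1 (Manhattan) distance over the 200 most frequent terms of the disputed text, as the name Spatium-L1 suggests. The impostor step is modelled by an assumed rule: the score is the share of impostor texts lying farther from the disputed text than the proposed author's text does, and the sketch abstains when that score falls within a hypothetical `margin` of 0.5. The names `tokenize`, `spatium_l1` and `verify`, and the `margin` parameter, are illustrative assumptions; the paper's exact impostor procedure and decision thresholds may differ.

```python
import re
from collections import Counter

# Assumption: tokens are lowercase words plus single punctuation symbols.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words and isolated punctuation symbols."""
    return TOKEN_RE.findall(text.lower())


def relative_freqs(tokens: list[str]) -> dict[str, float]:
    """Map each token to its relative frequency within the token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def spatium_l1(disputed: str, other: str, k: int = 200) -> float:
    """L1 distance over the k most frequent terms of the disputed text."""
    q_tokens = tokenize(disputed)
    features = [t for t, _ in Counter(q_tokens).most_common(k)]
    p_q = relative_freqs(q_tokens)
    p_o = relative_freqs(tokenize(other))
    # Terms absent from the other text contribute their full disputed-text frequency.
    return sum(abs(p_q[t] - p_o.get(t, 0.0)) for t in features)


def verify(disputed: str, author_text: str, impostors: list[str],
           k: int = 200, margin: float = 0.1):
    """Return (score, decision); decision is True, False or None (abstain).

    Hypothetical rule: the score is the share of impostor texts lying
    farther from the disputed text than the proposed author's text does.
    Scores within `margin` of 0.5 are treated as too uncertain to decide.
    """
    if not impostors:
        raise ValueError("at least one impostor text is required")
    d_author = spatium_l1(disputed, author_text, k)
    farther = sum(spatium_l1(disputed, imp, k) > d_author for imp in impostors)
    score = farther / len(impostors)
    if abs(score - 0.5) < margin:
        return score, None
    return score, score > 0.5


author = "The cat sat on the mat. The cat was happy, and the mat was warm."
disputed = "The cat sat on the warm mat, and the cat was happy."
impostors = [
    "Quarterly revenue rose sharply; analysts expect further gains!",
    "Photosynthesis converts light energy into chemical energy.",
    "Le chat est sur le tapis.",
]
print(verify(disputed, author, impostors))  # (1.0, True)
```

In this toy example the proposed author's text shares most of the disputed text's frequent terms, so every impostor lies farther away and the sketch answers "same author". With real data, the impostor set, `k` and `margin` would presumably need tuning on training collections such as those used at PAN.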
Content
See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23648/full
Theme
Descriptive cataloguing (Formalerschließung)

Similar documents (author)

  1. Savoy, J.: Stemming of French words based on grammatical categories (1993) 5.21
    
  2. Savoy, J.: Effectiveness of information retrieval systems used in a hypertext environment (1993) 5.21
    
  3. Savoy, J.: ¬A learning scheme for information retrieval in hypertext (1994) 5.21
    
  4. Savoy, J.: Bayesian inference networks and spreading activation in hypertext systems (1992) 5.21
    
  5. Savoy, J.: Searching information in legal hypertext systems (1993/94) 5.21
    

Similar documents (content)

  1. Adamovic, S.; Miskovic, V.; Milosavljevic, M.; Sarac, M.; Veinovic, M.: Automated language-independent authorship verification (for Indo-European languages) : facilitating adaptive visual exploration of scientific publications by citation links (2019) 0.30
    
  2. Savoy, J.: Estimating the probability of an authorship attribution (2016) 0.24
    
  3. Schaalje, G.B.; Blades, N.J.; Funai, T.: ¬An open-set size-adjusted Bayesian classifier for authorship attribution (2013) 0.13
    
  4. Stamatatos, E.: Masking topic-related information to enhance authorship attribution (2018) 0.11
    
  5. Stover, J.A.; Winter, Y.; Koppel, M.; Kestemont, M.: Computational authorship verification method attributes a new work to a major 2nd century African author (2016) 0.11