Document (#42328)

Author
Adamovic, S.
Miskovic, V.
Milosavljevic, M.
Sarac, M.
Veinovic, M.
Title
Automated language-independent authorship verification (for Indo-European languages)
Source
Journal of the Association for Information Science and Technology. 70(2019) no.8, pp.858-871
Year
2019
Abstract
In this article we examine automated language-independent authorship verification using text examples in several representative Indo-European languages, in cases where the examined texts belong to an open set of authors, that is, the author is unknown. We present the set of developed language-dependent and language-independent features, the model of training examples, consisting of pairs of equal features for known and unknown texts, and the appropriate method of authorship verification. An authorship verification accuracy greater than 90% was achieved by applying stylometric methods to four different languages (English, Greek, Spanish, and Dutch), although the accuracy for Dutch is slightly lower. For the multilingual case, the highest authorship verification accuracy using basic machine-learning methods, over 90%, was achieved with the kNN and SVM-SMO methods, using the feature selection method SVM-RFE. An improvement in authorship verification accuracy in multilingual cases, to over 94%, was achieved with ensemble learning methods; the MultiboostAB method is slightly more accurate, but Random Forest is generally more appropriate.
Content
Cf.: https://onlinelibrary.wiley.com/doi/10.1002/asi.24163.
Theme
Descriptive cataloguing

Similar documents (content)

  1. Kocher, M.; Savoy, J.: A simple and efficient algorithm for authorship verification (2017) 0.30
    0.30055502 = sum of:
      0.30055502 = product of:
        1.0734107 = sum of:
          0.02309456 = weight(abstract_txt:features in 3330) [ClassicSimilarity], result of:
            0.02309456 = score(doc=3330,freq=1.0), product of:
              0.06512458 = queryWeight, product of:
                1.0433446 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.013751258 = queryNorm
              0.35462123 = fieldWeight in 3330, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.078125 = fieldNorm(doc=3330)
          0.0153843425 = weight(abstract_txt:using in 3330) [ClassicSimilarity], result of:
            0.0153843425 = score(doc=3330,freq=1.0), product of:
              0.056861922 = queryWeight, product of:
                1.1940204 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.013751258 = queryNorm
              0.27055615 = fieldWeight in 3330, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.078125 = fieldNorm(doc=3330)
          0.045562673 = weight(abstract_txt:european in 3330) [ClassicSimilarity], result of:
            0.045562673 = score(doc=3330,freq=1.0), product of:
              0.10244198 = queryWeight, product of:
                1.308562 = boost
                5.6930003 = idf(docFreq=404, maxDocs=44218)
                0.013751258 = queryNorm
              0.44476566 = fieldWeight in 3330, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6930003 = idf(docFreq=404, maxDocs=44218)
                0.078125 = fieldNorm(doc=3330)
          0.10922935 = weight(abstract_txt:dutch in 3330) [ClassicSimilarity], result of:
            0.10922935 = score(doc=3330,freq=1.0), product of:
              0.18349802 = queryWeight, product of:
                1.7513423 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.013751258 = queryNorm
              0.5952617 = fieldWeight in 3330, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.078125 = fieldNorm(doc=3330)
          0.051725693 = weight(abstract_txt:languages in 3330) [ClassicSimilarity], result of:
            0.051725693 = score(doc=3330,freq=1.0), product of:
              0.12761639 = queryWeight, product of:
                1.7887688 = boost
                5.188118 = idf(docFreq=670, maxDocs=44218)
                0.013751258 = queryNorm
              0.40532172 = fieldWeight in 3330, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.188118 = idf(docFreq=670, maxDocs=44218)
                0.078125 = fieldNorm(doc=3330)
          0.2517516 = weight(abstract_txt:authorship in 3330) [ClassicSimilarity], result of:
            0.2517516 = score(doc=3330,freq=1.0), product of:
              0.46177146 = queryWeight, product of:
                4.812043 = boost
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.013751258 = queryNorm
              0.5451866 = fieldWeight in 3330, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.078125 = fieldNorm(doc=3330)
          0.57666254 = weight(abstract_txt:verification in 3330) [ClassicSimilarity], result of:
            0.57666254 = score(doc=3330,freq=2.0), product of:
              0.6704489 = queryWeight, product of:
                6.2628527 = boost
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.013751258 = queryNorm
              0.8601141 = fieldWeight in 3330, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.078125 = fieldNorm(doc=3330)
        0.28 = coord(7/25)
    
  2. Stover, J.A.; Winter, Y.; Koppel, M.; Kestemont, M.: Computational authorship verification method attributes a new work to a major 2nd century African author (2016) 0.23
    0.22952123 = sum of:
      0.22952123 = product of:
        0.95633847 = sum of:
          0.018934753 = weight(abstract_txt:application in 2503) [ClassicSimilarity], result of:
            0.018934753 = score(doc=2503,freq=1.0), product of:
              0.06619903 = queryWeight, product of:
                1.0519161 = boost
                4.5764427 = idf(docFreq=1236, maxDocs=44218)
                0.013751258 = queryNorm
              0.28602767 = fieldWeight in 2503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5764427 = idf(docFreq=1236, maxDocs=44218)
                0.0625 = fieldNorm(doc=2503)
          0.021183407 = weight(abstract_txt:learning in 2503) [ClassicSimilarity], result of:
            0.021183407 = score(doc=2503,freq=1.0), product of:
              0.07134152 = queryWeight, product of:
                1.0920098 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.013751258 = queryNorm
              0.29692957 = fieldWeight in 2503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.0625 = fieldNorm(doc=2503)
          0.038211517 = weight(abstract_txt:method in 2503) [ClassicSimilarity], result of:
            0.038211517 = score(doc=2503,freq=2.0), product of:
              0.0960495 = queryWeight, product of:
                1.5518457 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.013751258 = queryNorm
              0.3978315 = fieldWeight in 2503, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=2503)
          0.028172782 = weight(abstract_txt:methods in 2503) [ClassicSimilarity], result of:
            0.028172782 = score(doc=2503,freq=1.0), product of:
              0.10870303 = queryWeight, product of:
                1.9062996 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.013751258 = queryNorm
              0.259172 = fieldWeight in 2503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=2503)
          0.2848244 = weight(abstract_txt:authorship in 2503) [ClassicSimilarity], result of:
            0.2848244 = score(doc=2503,freq=2.0), product of:
              0.46177146 = queryWeight, product of:
                4.812043 = boost
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.013751258 = queryNorm
              0.6168082 = fieldWeight in 2503, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.0625 = fieldNorm(doc=2503)
          0.5650116 = weight(abstract_txt:verification in 2503) [ClassicSimilarity], result of:
            0.5650116 = score(doc=2503,freq=3.0), product of:
              0.6704489 = queryWeight, product of:
                6.2628527 = boost
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.013751258 = queryNorm
              0.8427363 = fieldWeight in 2503, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.0625 = fieldNorm(doc=2503)
        0.24 = coord(6/25)
    
  3. Potha, N.; Stamatatos, E.: Improving author verification based on topic modeling (2019) 0.21
    0.20791514 = sum of:
      0.20791514 = product of:
        1.0395757 = sum of:
          0.035710968 = weight(abstract_txt:texts in 5385) [ClassicSimilarity], result of:
            0.035710968 = score(doc=5385,freq=1.0), product of:
              0.101052314 = queryWeight, product of:
                1.2996562 = boost
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.013751258 = queryNorm
              0.3533909 = fieldWeight in 5385, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.0625 = fieldNorm(doc=5385)
          0.027019626 = weight(abstract_txt:method in 5385) [ClassicSimilarity], result of:
            0.027019626 = score(doc=5385,freq=1.0), product of:
              0.0960495 = queryWeight, product of:
                1.5518457 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.013751258 = queryNorm
              0.28130937 = fieldWeight in 5385, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=5385)
          0.06299625 = weight(abstract_txt:methods in 5385) [ClassicSimilarity], result of:
            0.06299625 = score(doc=5385,freq=5.0), product of:
              0.10870303 = queryWeight, product of:
                1.9062996 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.013751258 = queryNorm
              0.5795262 = fieldWeight in 5385, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=5385)
          0.34883726 = weight(abstract_txt:authorship in 5385) [ClassicSimilarity], result of:
            0.34883726 = score(doc=5385,freq=3.0), product of:
              0.46177146 = queryWeight, product of:
                4.812043 = boost
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.013751258 = queryNorm
              0.75543267 = fieldWeight in 5385, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.0625 = fieldNorm(doc=5385)
          0.5650116 = weight(abstract_txt:verification in 5385) [ClassicSimilarity], result of:
            0.5650116 = score(doc=5385,freq=3.0), product of:
              0.6704489 = queryWeight, product of:
                6.2628527 = boost
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.013751258 = queryNorm
              0.8427363 = fieldWeight in 5385, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.0625 = fieldNorm(doc=5385)
        0.2 = coord(5/25)
    
  4. HaCohen-Kerner, Y.; Kass, A.; Peretz, A.: HAADS: a Hebrew Aramaic abbreviation disambiguation system (2010) 0.16
    0.16177435 = sum of:
      0.16177435 = product of:
        0.40443587 = sum of:
          0.032660637 = weight(abstract_txt:features in 3990) [ClassicSimilarity], result of:
            0.032660637 = score(doc=3990,freq=2.0), product of:
              0.06512458 = queryWeight, product of:
                1.0433446 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.013751258 = queryNorm
              0.50151014 = fieldWeight in 3990, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.078125 = fieldNorm(doc=3990)
          0.023668442 = weight(abstract_txt:application in 3990) [ClassicSimilarity], result of:
            0.023668442 = score(doc=3990,freq=1.0), product of:
              0.06619903 = queryWeight, product of:
                1.0519161 = boost
                4.5764427 = idf(docFreq=1236, maxDocs=44218)
                0.013751258 = queryNorm
              0.3575346 = fieldWeight in 3990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5764427 = idf(docFreq=1236, maxDocs=44218)
                0.078125 = fieldNorm(doc=3990)
          0.02647926 = weight(abstract_txt:learning in 3990) [ClassicSimilarity], result of:
            0.02647926 = score(doc=3990,freq=1.0), product of:
              0.07134152 = queryWeight, product of:
                1.0920098 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.013751258 = queryNorm
              0.37116197 = fieldWeight in 3990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.078125 = fieldNorm(doc=3990)
          0.0153843425 = weight(abstract_txt:using in 3990) [ClassicSimilarity], result of:
            0.0153843425 = score(doc=3990,freq=1.0), product of:
              0.056861922 = queryWeight, product of:
                1.1940204 = boost
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.013751258 = queryNorm
              0.27055615 = fieldWeight in 3990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4631186 = idf(docFreq=3765, maxDocs=44218)
                0.078125 = fieldNorm(doc=3990)
          0.04480829 = weight(abstract_txt:cases in 3990) [ClassicSimilarity], result of:
            0.04480829 = score(doc=3990,freq=1.0), product of:
              0.10130808 = queryWeight, product of:
                1.3012998 = boost
                5.6614056 = idf(docFreq=417, maxDocs=44218)
                0.013751258 = queryNorm
              0.4422973 = fieldWeight in 3990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6614056 = idf(docFreq=417, maxDocs=44218)
                0.078125 = fieldNorm(doc=3990)
          0.033774532 = weight(abstract_txt:method in 3990) [ClassicSimilarity], result of:
            0.033774532 = score(doc=3990,freq=1.0), product of:
              0.0960495 = queryWeight, product of:
                1.5518457 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.013751258 = queryNorm
              0.3516367 = fieldWeight in 3990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=3990)
          0.051725693 = weight(abstract_txt:languages in 3990) [ClassicSimilarity], result of:
            0.051725693 = score(doc=3990,freq=1.0), product of:
              0.12761639 = queryWeight, product of:
                1.7887688 = boost
                5.188118 = idf(docFreq=670, maxDocs=44218)
                0.013751258 = queryNorm
              0.40532172 = fieldWeight in 3990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.188118 = idf(docFreq=670, maxDocs=44218)
                0.078125 = fieldNorm(doc=3990)
          0.060995862 = weight(abstract_txt:methods in 3990) [ClassicSimilarity], result of:
            0.060995862 = score(doc=3990,freq=3.0), product of:
              0.10870303 = queryWeight, product of:
                1.9062996 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.013751258 = queryNorm
              0.56112385 = fieldWeight in 3990, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.078125 = fieldNorm(doc=3990)
          0.036123924 = weight(abstract_txt:language in 3990) [ClassicSimilarity], result of:
            0.036123924 = score(doc=3990,freq=1.0), product of:
              0.1105635 = queryWeight, product of:
                1.9225436 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.013751258 = queryNorm
              0.32672557 = fieldWeight in 3990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.078125 = fieldNorm(doc=3990)
          0.07881492 = weight(abstract_txt:accuracy in 3990) [ClassicSimilarity], result of:
            0.07881492 = score(doc=3990,freq=1.0), product of:
              0.1689823 = queryWeight, product of:
                2.058361 = boost
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.013751258 = queryNorm
              0.46640933 = fieldWeight in 3990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.078125 = fieldNorm(doc=3990)
        0.4 = coord(10/25)
    
  5. Stamatatos, E.: A survey of modern authorship attribution methods (2009) 0.13
    0.1323656 = sum of:
      0.1323656 = product of:
        0.5515233 = sum of:
          0.021183407 = weight(abstract_txt:learning in 2741) [ClassicSimilarity], result of:
            0.021183407 = score(doc=2741,freq=1.0), product of:
              0.07134152 = queryWeight, product of:
                1.0920098 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.013751258 = queryNorm
              0.29692957 = fieldWeight in 2741, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.0625 = fieldNorm(doc=2741)
          0.03475451 = weight(abstract_txt:automated in 2741) [ClassicSimilarity], result of:
            0.03475451 = score(doc=2741,freq=1.0), product of:
              0.09923982 = queryWeight, product of:
                1.287948 = boost
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.013751258 = queryNorm
              0.35020733 = fieldWeight in 2741, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.0625 = fieldNorm(doc=2741)
          0.035710968 = weight(abstract_txt:texts in 2741) [ClassicSimilarity], result of:
            0.035710968 = score(doc=2741,freq=1.0), product of:
              0.101052314 = queryWeight, product of:
                1.2996562 = boost
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.013751258 = queryNorm
              0.3533909 = fieldWeight in 2741, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.0625 = fieldNorm(doc=2741)
          0.028172782 = weight(abstract_txt:methods in 2741) [ClassicSimilarity], result of:
            0.028172782 = score(doc=2741,freq=1.0), product of:
              0.10870303 = queryWeight, product of:
                1.9062996 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.013751258 = queryNorm
              0.259172 = fieldWeight in 2741, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=2741)
          0.028899139 = weight(abstract_txt:language in 2741) [ClassicSimilarity], result of:
            0.028899139 = score(doc=2741,freq=1.0), product of:
              0.1105635 = queryWeight, product of:
                1.9225436 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.013751258 = queryNorm
              0.26138046 = fieldWeight in 2741, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0625 = fieldNorm(doc=2741)
          0.40280256 = weight(abstract_txt:authorship in 2741) [ClassicSimilarity], result of:
            0.40280256 = score(doc=2741,freq=4.0), product of:
              0.46177146 = queryWeight, product of:
                4.812043 = boost
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.013751258 = queryNorm
              0.87229854 = fieldWeight in 2741, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.0625 = fieldNorm(doc=2741)
        0.24 = coord(6/25)
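
The score explanations above follow the TF-IDF pattern of Lucene's ClassicSimilarity: each matching term contributes queryWeight x fieldWeight, and the sum is scaled by the coord factor. The following is a minimal sketch that recomputes the score of entry 1 from the values shown in its explanation tree. The function names are illustrative, and the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumptions that appear consistent with the numbers listed; this is not the catalogue's actual code.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed idf formula; it reproduces the idf values shown in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Assumed tf formula: square root of the raw term frequency.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    w = idf(doc_freq, max_docs)
    query_weight = boost * w * query_norm      # "queryWeight" in the listing
    field_weight = tf(freq) * w * field_norm   # "fieldWeight" in the listing
    return query_weight * field_weight

# Constants copied from the explanation tree of entry 1 (doc=3330).
MAX_DOCS = 44218
QUERY_NORM = 0.013751258
FIELD_NORM = 0.078125

# (term, freq, docFreq, boost) for the seven matching query terms of entry 1.
TERMS = [
    ("features",     1.0, 1283, 1.0433446),
    ("using",        1.0, 3765, 1.1940204),
    ("european",     1.0,  404, 1.308562),
    ("dutch",        1.0,   58, 1.7513423),
    ("languages",    1.0,  670, 1.7887688),
    ("authorship",   1.0,  111, 4.812043),
    ("verification", 2.0,   49, 6.2628527),
]

total = sum(
    term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM)
    for _, freq, doc_freq, boost in TERMS
)
coord = 7 / 25            # 7 of the 25 query terms matched
score = coord * total

print(f"sum of term scores: {total:.6f}")
print(f"final score:        {score:.6f}")
```

Small differences in the last digits against the listed values are to be expected, since the listing appears to use single-precision arithmetic. The same computation applies to the other entries once their own fieldNorm, coord, and per-term values are substituted.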