Document (#39130)

Author
Savoy, J.
Title
Text clustering : an application with the 'State of the Union' addresses
Source
Journal of the Association for Information Science and Technology. 66(2015) no.8, p.1645-1654
Year
2015
Abstract
This paper describes a clustering and authorship attribution study over the State of the Union addresses from 1790 to 2014 (224 speeches delivered by 41 presidents). To define the style of each presidency, we have applied a principal component analysis (PCA) based on the part-of-speech (POS) frequencies. From Roosevelt (1934) onward, each president tends to have a distinctive style, whereas earlier presidents usually share some stylistic aspects with others. Applying an automatic classification based on the frequencies of all content-bearing word-types, we show that chronology tends to play a central role in forming clusters, a factor that is more important than political affiliation. Using the 300 most frequent word-types, we generate another clustering representation based on the style of each president. This second view shares similarities with the first one, but usually with more numerous and smaller clusters. Finally, an authorship attribution approach for each speech can reach a success rate of around 95.7% under some constraints. When an incorrect assignment is detected, the proposed author often belongs to the same party and lived during roughly the same time period as the presumed author. A deeper analysis of some incorrect assignments reveals interesting reasons that explain why these attributions are difficult.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23283/abstract.

Similar documents (author)

  1. Savoy, J.: Stemming of French words based on grammatical categories (1993) 5.21
    5.2141504 = sum of:
      5.2141504 = weight(author_txt:savoy in 4650) [ClassicSimilarity], result of:
        5.2141504 = fieldWeight in 4650, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.342641 = idf(docFreq=27, maxDocs=43254)
          0.625 = fieldNorm(doc=4650)
    
  2. Savoy, J.: Effectiveness of information retrieval systems used in a hypertext environment (1993) 5.21
    5.2141504 = sum of:
      5.2141504 = weight(author_txt:savoy in 6511) [ClassicSimilarity], result of:
        5.2141504 = fieldWeight in 6511, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.342641 = idf(docFreq=27, maxDocs=43254)
          0.625 = fieldNorm(doc=6511)
    
  3. Savoy, J.: A learning scheme for information retrieval in hypertext (1994) 5.21
    5.2141504 = sum of:
      5.2141504 = weight(author_txt:savoy in 292) [ClassicSimilarity], result of:
        5.2141504 = fieldWeight in 292, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.342641 = idf(docFreq=27, maxDocs=43254)
          0.625 = fieldNorm(doc=292)
    
  4. Savoy, J.: Bayesian inference networks and spreading activation in hypertext systems (1992) 5.21
    5.2141504 = sum of:
      5.2141504 = weight(author_txt:savoy in 1261) [ClassicSimilarity], result of:
        5.2141504 = fieldWeight in 1261, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.342641 = idf(docFreq=27, maxDocs=43254)
          0.625 = fieldNorm(doc=1261)
    
  5. Savoy, J.: Searching information in legal hypertext systems (1993/94) 5.21
    5.2141504 = sum of:
      5.2141504 = weight(author_txt:savoy in 1826) [ClassicSimilarity], result of:
        5.2141504 = fieldWeight in 1826, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.342641 = idf(docFreq=27, maxDocs=43254)
          0.625 = fieldNorm(doc=1826)
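
The author-match breakdowns above all reduce to one product: tf x idf x fieldNorm. The following is a minimal sketch of that arithmetic, assuming the classic TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are illustrative only and do not belong to any library. The values in the listing are single-precision, so the last digits may differ slightly.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as assumed here: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def classic_tf(freq: float) -> float:
    """Term-frequency factor as assumed here: square root of the raw frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, mirroring the three factors in the listing."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the author_txt:savoy breakdown above.
author_idf = classic_idf(doc_freq=27, max_docs=43254)
author_score = field_weight(freq=1.0, doc_freq=27, max_docs=43254, field_norm=0.625)
print(f"idf={author_idf:.4f} score={author_score:.4f}")
```

Because every listed record matches the single query term "savoy" once and has the same fieldNorm, all five entries receive the same score.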
    

Similar documents (content)

  1. Savoy, J.: Estimating the probability of an authorship attribution (2016) 0.48
    0.47886118 = sum of:
      0.47886118 = product of:
        0.9976275 = sum of:
          0.017744746 = weight(abstract_txt:based in 4402) [ClassicSimilarity], result of:
            0.017744746 = score(doc=4402,freq=2.0), product of:
              0.06269754 = queryWeight, product of:
                1.0135525 = boost
                3.2020218 = idf(docFreq=4782, maxDocs=43254)
                0.019318791 = queryNorm
              0.28302142 = fieldWeight in 4402, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2020218 = idf(docFreq=4782, maxDocs=43254)
                0.0625 = fieldNorm(doc=4402)
          0.028940419 = weight(abstract_txt:state in 4402) [ClassicSimilarity], result of:
            0.028940419 = score(doc=4402,freq=1.0), product of:
              0.09561369 = queryWeight, product of:
                1.0219636 = boost
                4.842891 = idf(docFreq=926, maxDocs=43254)
                0.019318791 = queryNorm
              0.3026807 = fieldWeight in 4402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.842891 = idf(docFreq=926, maxDocs=43254)
                0.0625 = fieldNorm(doc=4402)
          0.117315836 = weight(abstract_txt:1790 in 4402) [ClassicSimilarity], result of:
            0.117315836 = score(doc=4402,freq=1.0), product of:
              0.19293512 = queryWeight, product of:
                1.0265167 = boost
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.019318791 = queryNorm
              0.60805845 = fieldWeight in 4402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.0625 = fieldNorm(doc=4402)
          0.06308927 = weight(abstract_txt:author in 4402) [ClassicSimilarity], result of:
            0.06308927 = score(doc=4402,freq=4.0), product of:
              0.10126682 = queryWeight, product of:
                1.0517414 = boost
                4.9840026 = idf(docFreq=804, maxDocs=43254)
                0.019318791 = queryNorm
              0.6230003 = fieldWeight in 4402, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.9840026 = idf(docFreq=804, maxDocs=43254)
                0.0625 = fieldNorm(doc=4402)
          0.008064563 = weight(abstract_txt:with in 4402) [ClassicSimilarity], result of:
            0.008064563 = score(doc=4402,freq=1.0), product of:
              0.051394198 = queryWeight, product of:
                1.0596133 = boost
                2.5106533 = idf(docFreq=9548, maxDocs=43254)
                0.019318791 = queryNorm
              0.15691583 = fieldWeight in 4402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5106533 = idf(docFreq=9548, maxDocs=43254)
                0.0625 = fieldNorm(doc=4402)
          0.019076474 = weight(abstract_txt:some in 4402) [ClassicSimilarity], result of:
            0.019076474 = score(doc=4402,freq=1.0), product of:
              0.08289838 = queryWeight, product of:
                1.165451 = boost
                3.6819005 = idf(docFreq=2959, maxDocs=43254)
                0.019318791 = queryNorm
              0.23011878 = fieldWeight in 4402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6819005 = idf(docFreq=2959, maxDocs=43254)
                0.0625 = fieldNorm(doc=4402)
          0.050372645 = weight(abstract_txt:addresses in 4402) [ClassicSimilarity], result of:
            0.050372645 = score(doc=4402,freq=1.0), product of:
              0.13835028 = queryWeight, product of:
                1.2293212 = boost
                5.82552 = idf(docFreq=346, maxDocs=43254)
                0.019318791 = queryNorm
              0.364095 = fieldWeight in 4402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.82552 = idf(docFreq=346, maxDocs=43254)
                0.0625 = fieldNorm(doc=4402)
          0.057061315 = weight(abstract_txt:union in 4402) [ClassicSimilarity], result of:
            0.057061315 = score(doc=4402,freq=1.0), product of:
              0.15034121 = queryWeight, product of:
                1.2814876 = boost
                6.0727262 = idf(docFreq=270, maxDocs=43254)
                0.019318791 = queryNorm
              0.3795454 = fieldWeight in 4402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0727262 = idf(docFreq=270, maxDocs=43254)
                0.0625 = fieldNorm(doc=4402)
          0.17564508 = weight(abstract_txt:authorship in 4402) [ClassicSimilarity], result of:
            0.17564508 = score(doc=4402,freq=4.0), product of:
              0.20041068 = queryWeight, product of:
                1.479571 = boost
                7.011406 = idf(docFreq=105, maxDocs=43254)
                0.019318791 = queryNorm
              0.87642574 = fieldWeight in 4402, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.011406 = idf(docFreq=105, maxDocs=43254)
                0.0625 = fieldNorm(doc=4402)
          0.17858103 = weight(abstract_txt:attribution in 4402) [ClassicSimilarity], result of:
            0.17858103 = score(doc=4402,freq=2.0), product of:
              0.2553076 = queryWeight, product of:
                1.6699646 = boost
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.019318791 = queryNorm
              0.699474 = fieldWeight in 4402, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.0625 = fieldNorm(doc=4402)
          0.035773966 = weight(abstract_txt:each in 4402) [ClassicSimilarity], result of:
            0.035773966 = score(doc=4402,freq=1.0), product of:
              0.13875169 = queryWeight, product of:
                1.741043 = boost
                4.125236 = idf(docFreq=1899, maxDocs=43254)
                0.019318791 = queryNorm
              0.25782725 = fieldWeight in 4402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.125236 = idf(docFreq=1899, maxDocs=43254)
                0.0625 = fieldNorm(doc=4402)
          0.24596216 = weight(abstract_txt:presidents in 4402) [ClassicSimilarity], result of:
            0.24596216 = score(doc=4402,freq=1.0), product of:
              0.39819494 = queryWeight, product of:
                2.0855627 = boost
                9.883085 = idf(docFreq=5, maxDocs=43254)
                0.019318791 = queryNorm
              0.6176928 = fieldWeight in 4402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.883085 = idf(docFreq=5, maxDocs=43254)
                0.0625 = fieldNorm(doc=4402)
        0.48 = coord(12/25)
    
  2. Savoy, J.: Text representation strategies : an example with the State of the union addresses (2016) 0.33
    0.32685354 = sum of:
      0.32685354 = product of:
        0.742849 = sum of:
          0.017744746 = weight(abstract_txt:based in 4507) [ClassicSimilarity], result of:
            0.017744746 = score(doc=4507,freq=2.0), product of:
              0.06269754 = queryWeight, product of:
                1.0135525 = boost
                3.2020218 = idf(docFreq=4782, maxDocs=43254)
                0.019318791 = queryNorm
              0.28302142 = fieldWeight in 4507, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2020218 = idf(docFreq=4782, maxDocs=43254)
                0.0625 = fieldNorm(doc=4507)
          0.028940419 = weight(abstract_txt:state in 4507) [ClassicSimilarity], result of:
            0.028940419 = score(doc=4507,freq=1.0), product of:
              0.09561369 = queryWeight, product of:
                1.0219636 = boost
                4.842891 = idf(docFreq=926, maxDocs=43254)
                0.019318791 = queryNorm
              0.3026807 = fieldWeight in 4507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.842891 = idf(docFreq=926, maxDocs=43254)
                0.0625 = fieldNorm(doc=4507)
          0.117315836 = weight(abstract_txt:1790 in 4507) [ClassicSimilarity], result of:
            0.117315836 = score(doc=4507,freq=1.0), product of:
              0.19293512 = queryWeight, product of:
                1.0265167 = boost
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.019318791 = queryNorm
              0.60805845 = fieldWeight in 4507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.0625 = fieldNorm(doc=4507)
          0.011405014 = weight(abstract_txt:with in 4507) [ClassicSimilarity], result of:
            0.011405014 = score(doc=4507,freq=2.0), product of:
              0.051394198 = queryWeight, product of:
                1.0596133 = boost
                2.5106533 = idf(docFreq=9548, maxDocs=43254)
                0.019318791 = queryNorm
              0.22191249 = fieldWeight in 4507, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.5106533 = idf(docFreq=9548, maxDocs=43254)
                0.0625 = fieldNorm(doc=4507)
          0.041029047 = weight(abstract_txt:word in 4507) [ClassicSimilarity], result of:
            0.041029047 = score(doc=4507,freq=1.0), product of:
              0.12066403 = queryWeight, product of:
                1.1480592 = boost
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.019318791 = queryNorm
              0.34002715 = fieldWeight in 4507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.0625 = fieldNorm(doc=4507)
          0.026978208 = weight(abstract_txt:some in 4507) [ClassicSimilarity], result of:
            0.026978208 = score(doc=4507,freq=2.0), product of:
              0.08289838 = queryWeight, product of:
                1.165451 = boost
                3.6819005 = idf(docFreq=2959, maxDocs=43254)
                0.019318791 = queryNorm
              0.3254371 = fieldWeight in 4507, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6819005 = idf(docFreq=2959, maxDocs=43254)
                0.0625 = fieldNorm(doc=4507)
          0.050372645 = weight(abstract_txt:addresses in 4507) [ClassicSimilarity], result of:
            0.050372645 = score(doc=4507,freq=1.0), product of:
              0.13835028 = queryWeight, product of:
                1.2293212 = boost
                5.82552 = idf(docFreq=346, maxDocs=43254)
                0.019318791 = queryNorm
              0.364095 = fieldWeight in 4507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.82552 = idf(docFreq=346, maxDocs=43254)
                0.0625 = fieldNorm(doc=4507)
          0.057061315 = weight(abstract_txt:union in 4507) [ClassicSimilarity], result of:
            0.057061315 = score(doc=4507,freq=1.0), product of:
              0.15034121 = queryWeight, product of:
                1.2814876 = boost
                6.0727262 = idf(docFreq=270, maxDocs=43254)
                0.019318791 = queryNorm
              0.3795454 = fieldWeight in 4507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0727262 = idf(docFreq=270, maxDocs=43254)
                0.0625 = fieldNorm(doc=4507)
          0.11026565 = weight(abstract_txt:frequencies in 4507) [ClassicSimilarity], result of:
            0.11026565 = score(doc=4507,freq=1.0), product of:
              0.23324394 = queryWeight, product of:
                1.5961752 = boost
                7.563971 = idf(docFreq=60, maxDocs=43254)
                0.019318791 = queryNorm
              0.4727482 = fieldWeight in 4507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.563971 = idf(docFreq=60, maxDocs=43254)
                0.0625 = fieldNorm(doc=4507)
          0.035773966 = weight(abstract_txt:each in 4507) [ClassicSimilarity], result of:
            0.035773966 = score(doc=4507,freq=1.0), product of:
              0.13875169 = queryWeight, product of:
                1.741043 = boost
                4.125236 = idf(docFreq=1899, maxDocs=43254)
                0.019318791 = queryNorm
              0.25782725 = fieldWeight in 4507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.125236 = idf(docFreq=1899, maxDocs=43254)
                0.0625 = fieldNorm(doc=4507)
          0.24596216 = weight(abstract_txt:presidents in 4507) [ClassicSimilarity], result of:
            0.24596216 = score(doc=4507,freq=1.0), product of:
              0.39819494 = queryWeight, product of:
                2.0855627 = boost
                9.883085 = idf(docFreq=5, maxDocs=43254)
                0.019318791 = queryNorm
              0.6176928 = fieldWeight in 4507, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.883085 = idf(docFreq=5, maxDocs=43254)
                0.0625 = fieldNorm(doc=4507)
        0.44 = coord(11/25)
    
  3. Stamatatos, E.: Masking topic-related information to enhance authorship attribution (2018) 0.16
    0.16407153 = sum of:
      0.16407153 = product of:
        0.6836314 = sum of:
          0.028940419 = weight(abstract_txt:state in 125) [ClassicSimilarity], result of:
            0.028940419 = score(doc=125,freq=1.0), product of:
              0.09561369 = queryWeight, product of:
                1.0219636 = boost
                4.842891 = idf(docFreq=926, maxDocs=43254)
                0.019318791 = queryNorm
              0.3026807 = fieldWeight in 125, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.842891 = idf(docFreq=926, maxDocs=43254)
                0.0625 = fieldNorm(doc=125)
          0.031544633 = weight(abstract_txt:author in 125) [ClassicSimilarity], result of:
            0.031544633 = score(doc=125,freq=1.0), product of:
              0.10126682 = queryWeight, product of:
                1.0517414 = boost
                4.9840026 = idf(docFreq=804, maxDocs=43254)
                0.019318791 = queryNorm
              0.31150016 = fieldWeight in 125, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9840026 = idf(docFreq=804, maxDocs=43254)
                0.0625 = fieldNorm(doc=125)
          0.011405014 = weight(abstract_txt:with in 125) [ClassicSimilarity], result of:
            0.011405014 = score(doc=125,freq=2.0), product of:
              0.051394198 = queryWeight, product of:
                1.0596133 = boost
                2.5106533 = idf(docFreq=9548, maxDocs=43254)
                0.019318791 = queryNorm
              0.22191249 = fieldWeight in 125, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.5106533 = idf(docFreq=9548, maxDocs=43254)
                0.0625 = fieldNorm(doc=125)
          0.1521131 = weight(abstract_txt:authorship in 125) [ClassicSimilarity], result of:
            0.1521131 = score(doc=125,freq=3.0), product of:
              0.20041068 = queryWeight, product of:
                1.479571 = boost
                7.011406 = idf(docFreq=105, maxDocs=43254)
                0.019318791 = queryNorm
              0.7590069 = fieldWeight in 125, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.011406 = idf(docFreq=105, maxDocs=43254)
                0.0625 = fieldNorm(doc=125)
          0.30931142 = weight(abstract_txt:attribution in 125) [ClassicSimilarity], result of:
            0.30931142 = score(doc=125,freq=6.0), product of:
              0.2553076 = queryWeight, product of:
                1.6699646 = boost
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.019318791 = queryNorm
              1.2115245 = fieldWeight in 125, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.0625 = fieldNorm(doc=125)
          0.15031686 = weight(abstract_txt:style in 125) [ClassicSimilarity], result of:
            0.15031686 = score(doc=125,freq=2.0), product of:
              0.2605408 = queryWeight, product of:
                2.066136 = boost
                6.5273504 = idf(docFreq=171, maxDocs=43254)
                0.019318791 = queryNorm
              0.5769417 = fieldWeight in 125, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5273504 = idf(docFreq=171, maxDocs=43254)
                0.0625 = fieldNorm(doc=125)
        0.24 = coord(6/25)
    
  4. Savoy, J.: Authorship of Pauline epistles revisited (2019) 0.16
    0.1626829 = sum of:
      0.1626829 = product of:
        0.5083841 = sum of:
          0.047303785 = weight(abstract_txt:same in 387) [ClassicSimilarity], result of:
            0.047303785 = score(doc=387,freq=3.0), product of:
              0.09198995 = queryWeight, product of:
                1.0024104 = boost
                4.7502327 = idf(docFreq=1016, maxDocs=43254)
                0.019318791 = queryNorm
              0.51422775 = fieldWeight in 387, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.7502327 = idf(docFreq=1016, maxDocs=43254)
                0.0625 = fieldNorm(doc=387)
          0.017744746 = weight(abstract_txt:based in 387) [ClassicSimilarity], result of:
            0.017744746 = score(doc=387,freq=2.0), product of:
              0.06269754 = queryWeight, product of:
                1.0135525 = boost
                3.2020218 = idf(docFreq=4782, maxDocs=43254)
                0.019318791 = queryNorm
              0.28302142 = fieldWeight in 387, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2020218 = idf(docFreq=4782, maxDocs=43254)
                0.0625 = fieldNorm(doc=387)
          0.054636903 = weight(abstract_txt:author in 387) [ClassicSimilarity], result of:
            0.054636903 = score(doc=387,freq=3.0), product of:
              0.10126682 = queryWeight, product of:
                1.0517414 = boost
                4.9840026 = idf(docFreq=804, maxDocs=43254)
                0.019318791 = queryNorm
              0.5395341 = fieldWeight in 387, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.9840026 = idf(docFreq=804, maxDocs=43254)
                0.0625 = fieldNorm(doc=387)
          0.011405014 = weight(abstract_txt:with in 387) [ClassicSimilarity], result of:
            0.011405014 = score(doc=387,freq=2.0), product of:
              0.051394198 = queryWeight, product of:
                1.0596133 = boost
                2.5106533 = idf(docFreq=9548, maxDocs=43254)
                0.019318791 = queryNorm
              0.22191249 = fieldWeight in 387, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.5106533 = idf(docFreq=9548, maxDocs=43254)
                0.0625 = fieldNorm(doc=387)
          0.070671424 = weight(abstract_txt:clusters in 387) [ClassicSimilarity], result of:
            0.070671424 = score(doc=387,freq=1.0), product of:
              0.1733855 = queryWeight, product of:
                1.3762007 = boost
                6.5215535 = idf(docFreq=172, maxDocs=43254)
                0.019318791 = queryNorm
              0.4075971 = fieldWeight in 387, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5215535 = idf(docFreq=172, maxDocs=43254)
                0.0625 = fieldNorm(doc=387)
          0.08782254 = weight(abstract_txt:authorship in 387) [ClassicSimilarity], result of:
            0.08782254 = score(doc=387,freq=1.0), product of:
              0.20041068 = queryWeight, product of:
                1.479571 = boost
                7.011406 = idf(docFreq=105, maxDocs=43254)
                0.019318791 = queryNorm
              0.43821287 = fieldWeight in 387, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.011406 = idf(docFreq=105, maxDocs=43254)
                0.0625 = fieldNorm(doc=387)
          0.12627587 = weight(abstract_txt:attribution in 387) [ClassicSimilarity], result of:
            0.12627587 = score(doc=387,freq=1.0), product of:
              0.2553076 = queryWeight, product of:
                1.6699646 = boost
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.019318791 = queryNorm
              0.4946028 = fieldWeight in 387, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.0625 = fieldNorm(doc=387)
          0.09252381 = weight(abstract_txt:clustering in 387) [ClassicSimilarity], result of:
            0.09252381 = score(doc=387,freq=1.0), product of:
              0.23752882 = queryWeight, product of:
                1.9727823 = boost
                6.232427 = idf(docFreq=230, maxDocs=43254)
                0.019318791 = queryNorm
              0.3895267 = fieldWeight in 387, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.232427 = idf(docFreq=230, maxDocs=43254)
                0.0625 = fieldNorm(doc=387)
        0.32 = coord(8/25)
    
  5. Cai, X.; Li, W.: Enhancing sentence-level clustering with integrated and interactive frameworks for theme-based summarization (2011) 0.14
    0.13743743 = sum of:
      0.13743743 = product of:
        0.572656 = sum of:
          0.017744746 = weight(abstract_txt:based in 1235) [ClassicSimilarity], result of:
            0.017744746 = score(doc=1235,freq=2.0), product of:
              0.06269754 = queryWeight, product of:
                1.0135525 = boost
                3.2020218 = idf(docFreq=4782, maxDocs=43254)
                0.019318791 = queryNorm
              0.28302142 = fieldWeight in 1235, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2020218 = idf(docFreq=4782, maxDocs=43254)
                0.0625 = fieldNorm(doc=1235)
          0.008064563 = weight(abstract_txt:with in 1235) [ClassicSimilarity], result of:
            0.008064563 = score(doc=1235,freq=1.0), product of:
              0.051394198 = queryWeight, product of:
                1.0596133 = boost
                2.5106533 = idf(docFreq=9548, maxDocs=43254)
                0.019318791 = queryNorm
              0.15691583 = fieldWeight in 1235, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5106533 = idf(docFreq=9548, maxDocs=43254)
                0.0625 = fieldNorm(doc=1235)
          0.082058094 = weight(abstract_txt:word in 1235) [ClassicSimilarity], result of:
            0.082058094 = score(doc=1235,freq=4.0), product of:
              0.12066403 = queryWeight, product of:
                1.1480592 = boost
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.019318791 = queryNorm
              0.6800543 = fieldWeight in 1235, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4404345 = idf(docFreq=509, maxDocs=43254)
                0.0625 = fieldNorm(doc=1235)
          0.070671424 = weight(abstract_txt:clusters in 1235) [ClassicSimilarity], result of:
            0.070671424 = score(doc=1235,freq=1.0), product of:
              0.1733855 = queryWeight, product of:
                1.3762007 = boost
                6.5215535 = idf(docFreq=172, maxDocs=43254)
                0.019318791 = queryNorm
              0.4075971 = fieldWeight in 1235, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5215535 = idf(docFreq=172, maxDocs=43254)
                0.0625 = fieldNorm(doc=1235)
          0.035773966 = weight(abstract_txt:each in 1235) [ClassicSimilarity], result of:
            0.035773966 = score(doc=1235,freq=1.0), product of:
              0.13875169 = queryWeight, product of:
                1.741043 = boost
                4.125236 = idf(docFreq=1899, maxDocs=43254)
                0.019318791 = queryNorm
              0.25782725 = fieldWeight in 1235, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.125236 = idf(docFreq=1899, maxDocs=43254)
                0.0625 = fieldNorm(doc=1235)
          0.35834318 = weight(abstract_txt:clustering in 1235) [ClassicSimilarity], result of:
            0.35834318 = score(doc=1235,freq=15.0), product of:
              0.23752882 = queryWeight, product of:
                1.9727823 = boost
                6.232427 = idf(docFreq=230, maxDocs=43254)
                0.019318791 = queryNorm
              1.5086304 = fieldWeight in 1235, product of:
                3.8729835 = tf(freq=15.0), with freq of:
                  15.0 = termFreq=15.0
                6.232427 = idf(docFreq=230, maxDocs=43254)
                0.0625 = fieldNorm(doc=1235)
        0.24 = coord(6/25)
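
The content-match breakdowns above combine per-term scores in two steps: each term contributes queryWeight x fieldWeight, and the sum is then scaled by coord(matched/total). The sketch below reproduces one term score and one final score from the numbers listed, assuming the same classic TF-IDF definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The helper names are hypothetical, and small differences from the listing come from single-precision rounding.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm          # query-side factor
    fld_weight = math.sqrt(freq) * idf * field_norm  # document-side factor
    return query_weight * fld_weight


def coord_score(term_scores, total_query_terms):
    """Sum the matching term scores and scale by the fraction of query terms matched."""
    coord = len(term_scores) / total_query_terms
    return sum(term_scores) * coord


# "presidents" in doc 4402, with values copied from the first breakdown.
presidents = term_score(freq=1.0, doc_freq=5, max_docs=43254,
                        boost=2.0855627, query_norm=0.019318791,
                        field_norm=0.0625)

# The six per-term scores listed for doc 125 (Stamatatos 2018); 25 query terms in total.
doc_125_terms = [0.028940419, 0.031544633, 0.011405014,
                 0.1521131, 0.30931142, 0.15031686]
doc_125 = coord_score(doc_125_terms, total_query_terms=25)

print(f"presidents={presidents:.5f} doc125={doc_125:.5f}")
```

The coord factor explains why a record with a few strong term matches (such as the frequent "attribution" in doc 125) can still rank below records that match more of the 25 query terms.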