Document (#34531)

Author
Pong, J.Y.-H.
Kwok, R.C.-W.
Lau, R.Y.-K.
Hao, J.-X.
Wong, P.C.-C.
Title
¬A comparative study of two automatic document classification methods in a library setting
Source
Journal of information science. 34(2008) no.2, pp.213-230
Year
2008
Abstract
In current library practice, trained human experts usually carry out document cataloguing and indexing manually. With the explosive growth in the number of electronic documents available on the Internet and in digital libraries, it is increasingly difficult for library practitioners to categorize both electronic documents and traditional library materials using just a manual approach. To improve the effectiveness and efficiency of document categorization in the library setting, more in-depth studies of using automatic document classification methods to categorize library items are required. Machine learning research has advanced rapidly in recent years. However, applying machine learning techniques to improve library practice is still a relatively unexplored area. This paper illustrates the design and development of a machine-learning-based automatic document classification system to alleviate the manual categorization problem encountered within the library setting. Two supervised machine learning algorithms have been tested. Our empirical tests show that supervised machine learning algorithms in general, and the k-nearest neighbours (KNN) algorithm in particular, can be used to develop an effective document classification system to enhance current library practice. Moreover, some concrete recommendations are made on how to apply the KNN algorithm in practice to develop automatic document classification in a library setting. To the best of our knowledge, this is the first in-depth study of applying the KNN algorithm to automatic document classification based on the widely used LCC classification scheme adopted by many large libraries.
Theme
Automatic classification (Automatisches Klassifizieren)

Similar documents (author)

  1. Wu, H.C.; Luk, R.W.P.; Wong, K.F.; Kwok, K.L.: ¬A retrospective study of a hybrid document-context based retrieval model (2007) 3.79
    3.7917166 = sum of:
      3.7917166 = sum of:
        1.6359766 = weight(author_txt:wong in 2934) [ClassicSimilarity], result of:
          1.6359766 = score(doc=2934,freq=1.0), product of:
            0.6395768 = queryWeight, product of:
              8.185295 = idf(docFreq=32, maxDocs=43556)
              0.07813729 = queryNorm
            2.5579047 = fieldWeight in 2934, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.185295 = idf(docFreq=32, maxDocs=43556)
              0.3125 = fieldNorm(doc=2934)
        2.15574 = weight(author_txt:kwok in 2934) [ClassicSimilarity], result of:
          2.15574 = score(doc=2934,freq=1.0), product of:
            0.7687272 = queryWeight, product of:
              1.0963261 = boost
              8.973753 = idf(docFreq=14, maxDocs=43556)
              0.07813729 = queryNorm
            2.804298 = fieldWeight in 2934, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.973753 = idf(docFreq=14, maxDocs=43556)
              0.3125 = fieldNorm(doc=2934)
    
  2. Kwok, K.L.: ¬The use of titles and cited titles as document representations for automatic classification (1975) 2.16
    2.15574 = sum of:
      2.15574 = product of:
        4.31148 = sum of:
          4.31148 = weight(author_txt:kwok in 4346) [ClassicSimilarity], result of:
            4.31148 = score(doc=4346,freq=1.0), product of:
              0.7687272 = queryWeight, product of:
                1.0963261 = boost
                8.973753 = idf(docFreq=14, maxDocs=43556)
                0.07813729 = queryNorm
              5.608596 = fieldWeight in 4346, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.973753 = idf(docFreq=14, maxDocs=43556)
                0.625 = fieldNorm(doc=4346)
        0.5 = coord(1/2)
    
  3. Kwok, K.L.: Employing multiple representations for Chinese information retrieval (1999) 2.16
    2.15574 = sum of:
      2.15574 = product of:
        4.31148 = sum of:
          4.31148 = weight(author_txt:kwok in 4771) [ClassicSimilarity], result of:
            4.31148 = score(doc=4771,freq=1.0), product of:
              0.7687272 = queryWeight, product of:
                1.0963261 = boost
                8.973753 = idf(docFreq=14, maxDocs=43556)
                0.07813729 = queryNorm
              5.608596 = fieldWeight in 4771, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.973753 = idf(docFreq=14, maxDocs=43556)
                0.625 = fieldNorm(doc=4771)
        0.5 = coord(1/2)
    
  4. Kwok, K.L.: ¬A network approach to probabilistic information retrieval (1995) 2.16
    2.15574 = sum of:
      2.15574 = product of:
        4.31148 = sum of:
          4.31148 = weight(author_txt:kwok in 694) [ClassicSimilarity], result of:
            4.31148 = score(doc=694,freq=1.0), product of:
              0.7687272 = queryWeight, product of:
                1.0963261 = boost
                8.973753 = idf(docFreq=14, maxDocs=43556)
                0.07813729 = queryNorm
              5.608596 = fieldWeight in 694, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.973753 = idf(docFreq=14, maxDocs=43556)
                0.625 = fieldNorm(doc=694)
        0.5 = coord(1/2)
    
  5. Kwok, K.L.: Improving English and Chinese ad-hoc retrieval : a TIPSTER text phase 3 project report (2000) 2.16
    2.15574 = sum of:
      2.15574 = product of:
        4.31148 = sum of:
          4.31148 = weight(author_txt:kwok in 386) [ClassicSimilarity], result of:
            4.31148 = score(doc=386,freq=1.0), product of:
              0.7687272 = queryWeight, product of:
                1.0963261 = boost
                8.973753 = idf(docFreq=14, maxDocs=43556)
                0.07813729 = queryNorm
              5.608596 = fieldWeight in 386, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.973753 = idf(docFreq=14, maxDocs=43556)
                0.625 = fieldNorm(doc=386)
        0.5 = coord(1/2)
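
How the per-term numbers in the author-similarity trees fit together can be checked with a short sketch. It assumes the formulas usually associated with Lucene's ClassicSimilarity, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are consistent with the values printed above. The function names `idf` and `term_score` are illustrative only and are not Lucene API. The inputs are copied from the tree for entry 1 (doc 2934).

```python
import math

def idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as in the explain trees
    tf = math.sqrt(freq)                      # tf(freq) = sqrt(termFreq)
    w = idf(doc_freq, max_docs)
    query_weight = boost * w * query_norm     # boost * idf * queryNorm
    field_weight = tf * w * field_norm        # tf * idf * fieldNorm
    return query_weight * field_weight

# Values taken from the tree for doc 2934 (entry 1)
wong = term_score(1.0, 32, 43556, 0.07813729, 0.3125)
kwok = term_score(1.0, 14, 43556, 0.07813729, 0.3125, boost=1.0963261)
total = wong + kwok
print(f"wong={wong:.4f} kwok={kwok:.4f} total={total:.4f}")
```

The catalog computes in single precision, so the last digits may differ slightly from the printed 1.6359766, 2.15574 and 3.7917166.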
    

Similar documents (content)

  1. Wang, J.: ¬An extensive study on automated Dewey Decimal Classification (2009) 0.41
    0.41176936 = sum of:
      0.41176936 = product of:
        0.9358395 = sum of:
          0.032876924 = weight(abstract_txt:improve in 170) [ClassicSimilarity], result of:
            0.032876924 = score(doc=170,freq=1.0), product of:
              0.10547531 = queryWeight, product of:
                1.1993399 = boost
                4.987241 = idf(docFreq=807, maxDocs=43556)
                0.017633893 = queryNorm
              0.31170255 = fieldWeight in 170, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.987241 = idf(docFreq=807, maxDocs=43556)
                0.0625 = fieldNorm(doc=170)
          0.05778951 = weight(abstract_txt:depth in 170) [ClassicSimilarity], result of:
            0.05778951 = score(doc=170,freq=1.0), product of:
              0.15362293 = queryWeight, product of:
                1.4474211 = boost
                6.018842 = idf(docFreq=287, maxDocs=43556)
                0.017633893 = queryNorm
              0.37617764 = fieldWeight in 170, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.018842 = idf(docFreq=287, maxDocs=43556)
                0.0625 = fieldNorm(doc=170)
          0.077086315 = weight(abstract_txt:categorization in 170) [ClassicSimilarity], result of:
            0.077086315 = score(doc=170,freq=1.0), product of:
              0.18615508 = queryWeight, product of:
                1.5933249 = boost
                6.625557 = idf(docFreq=156, maxDocs=43556)
                0.017633893 = queryNorm
              0.4140973 = fieldWeight in 170, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.625557 = idf(docFreq=156, maxDocs=43556)
                0.0625 = fieldNorm(doc=170)
          0.11427636 = weight(abstract_txt:supervised in 170) [ClassicSimilarity], result of:
            0.11427636 = score(doc=170,freq=1.0), product of:
              0.2420254 = queryWeight, product of:
                1.8167591 = boost
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.017633893 = queryNorm
              0.47216678 = fieldWeight in 170, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.0625 = fieldNorm(doc=170)
          0.07433353 = weight(abstract_txt:algorithm in 170) [ClassicSimilarity], result of:
            0.07433353 = score(doc=170,freq=1.0), product of:
              0.20799057 = queryWeight, product of:
                2.0626917 = boost
                5.7182236 = idf(docFreq=388, maxDocs=43556)
                0.017633893 = queryNorm
              0.35738897 = fieldWeight in 170, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7182236 = idf(docFreq=388, maxDocs=43556)
                0.0625 = fieldNorm(doc=170)
          0.072598144 = weight(abstract_txt:learning in 170) [ClassicSimilarity], result of:
            0.072598144 = score(doc=170,freq=1.0), product of:
              0.24274692 = queryWeight, product of:
                2.876827 = boost
                4.7851086 = idf(docFreq=988, maxDocs=43556)
                0.017633893 = queryNorm
              0.2990693 = fieldWeight in 170, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7851086 = idf(docFreq=988, maxDocs=43556)
                0.0625 = fieldNorm(doc=170)
          0.092932545 = weight(abstract_txt:automatic in 170) [ClassicSimilarity], result of:
            0.092932545 = score(doc=170,freq=1.0), product of:
              0.28618613 = queryWeight, product of:
                3.1236415 = boost
                5.195642 = idf(docFreq=655, maxDocs=43556)
                0.017633893 = queryNorm
              0.32472762 = fieldWeight in 170, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.195642 = idf(docFreq=655, maxDocs=43556)
                0.0625 = fieldNorm(doc=170)
          0.0989293 = weight(abstract_txt:machine in 170) [ClassicSimilarity], result of:
            0.0989293 = score(doc=170,freq=1.0), product of:
              0.29836875 = queryWeight, product of:
                3.1894336 = boost
                5.3050756 = idf(docFreq=587, maxDocs=43556)
                0.017633893 = queryNorm
              0.33156723 = fieldWeight in 170, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3050756 = idf(docFreq=587, maxDocs=43556)
                0.0625 = fieldNorm(doc=170)
          0.15699497 = weight(abstract_txt:classification in 170) [ClassicSimilarity], result of:
            0.15699497 = score(doc=170,freq=7.0), product of:
              0.23739444 = queryWeight, product of:
                3.366171 = boost
                3.9993203 = idf(docFreq=2169, maxDocs=43556)
                0.017633893 = queryNorm
              0.6613254 = fieldWeight in 170, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.9993203 = idf(docFreq=2169, maxDocs=43556)
                0.0625 = fieldNorm(doc=170)
          0.07440413 = weight(abstract_txt:library in 170) [ClassicSimilarity], result of:
            0.07440413 = score(doc=170,freq=3.0), product of:
              0.21556139 = queryWeight, product of:
                3.83387 = boost
                3.1884925 = idf(docFreq=4881, maxDocs=43556)
                0.017633893 = queryNorm
              0.34516445 = fieldWeight in 170, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.1884925 = idf(docFreq=4881, maxDocs=43556)
                0.0625 = fieldNorm(doc=170)
          0.08361776 = weight(abstract_txt:document in 170) [ClassicSimilarity], result of:
            0.08361776 = score(doc=170,freq=1.0), product of:
              0.31196728 = queryWeight, product of:
                4.125261 = boost
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.017633893 = queryNorm
              0.26803374 = fieldWeight in 170, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.0625 = fieldNorm(doc=170)
        0.44 = coord(11/25)
    
  2. Dietterich, T.G.: Machine-learning research : four current directions (1997) 0.37
    0.36756164 = sum of:
      0.36756164 = product of:
        1.3127202 = sum of:
          0.047643572 = weight(abstract_txt:methods in 4319) [ClassicSimilarity], result of:
            0.047643572 = score(doc=4319,freq=1.0), product of:
              0.07332739 = queryWeight, product of:
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.017633893 = queryNorm
              0.6497377 = fieldWeight in 4319, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.15625 = fieldNorm(doc=4319)
          0.05280695 = weight(abstract_txt:current in 4319) [ClassicSimilarity], result of:
            0.05280695 = score(doc=4319,freq=1.0), product of:
              0.07853395 = queryWeight, product of:
                1.0348934 = boost
                4.303419 = idf(docFreq=1600, maxDocs=43556)
                0.017633893 = queryNorm
              0.67240924 = fieldWeight in 4319, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.303419 = idf(docFreq=1600, maxDocs=43556)
                0.15625 = fieldNorm(doc=4319)
          0.12507322 = weight(abstract_txt:algorithms in 4319) [ClassicSimilarity], result of:
            0.12507322 = score(doc=4319,freq=1.0), product of:
              0.13954243 = queryWeight, product of:
                1.3794947 = boost
                5.736382 = idf(docFreq=381, maxDocs=43556)
                0.017633893 = queryNorm
              0.8963097 = fieldWeight in 4319, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.736382 = idf(docFreq=381, maxDocs=43556)
                0.15625 = fieldNorm(doc=4319)
          0.28569087 = weight(abstract_txt:supervised in 4319) [ClassicSimilarity], result of:
            0.28569087 = score(doc=4319,freq=1.0), product of:
              0.2420254 = queryWeight, product of:
                1.8167591 = boost
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.017633893 = queryNorm
              1.180417 = fieldWeight in 4319, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.15625 = fieldNorm(doc=4319)
          0.405836 = weight(abstract_txt:learning in 4319) [ClassicSimilarity], result of:
            0.405836 = score(doc=4319,freq=5.0), product of:
              0.24274692 = queryWeight, product of:
                2.876827 = boost
                4.7851086 = idf(docFreq=988, maxDocs=43556)
                0.017633893 = queryNorm
              1.6718482 = fieldWeight in 4319, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.7851086 = idf(docFreq=988, maxDocs=43556)
                0.15625 = fieldNorm(doc=4319)
          0.24732326 = weight(abstract_txt:machine in 4319) [ClassicSimilarity], result of:
            0.24732326 = score(doc=4319,freq=1.0), product of:
              0.29836875 = queryWeight, product of:
                3.1894336 = boost
                5.3050756 = idf(docFreq=587, maxDocs=43556)
                0.017633893 = queryNorm
              0.8289181 = fieldWeight in 4319, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3050756 = idf(docFreq=587, maxDocs=43556)
                0.15625 = fieldNorm(doc=4319)
          0.1483463 = weight(abstract_txt:classification in 4319) [ClassicSimilarity], result of:
            0.1483463 = score(doc=4319,freq=1.0), product of:
              0.23739444 = queryWeight, product of:
                3.366171 = boost
                3.9993203 = idf(docFreq=2169, maxDocs=43556)
                0.017633893 = queryNorm
              0.6248938 = fieldWeight in 4319, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9993203 = idf(docFreq=2169, maxDocs=43556)
                0.15625 = fieldNorm(doc=4319)
        0.28 = coord(7/25)
    
  3. Li, Y.; Shawe-Taylor, J.: Advanced learning algorithms for cross-language patent retrieval and classification (2007) 0.28
    0.2823974 = sum of:
      0.2823974 = product of:
        0.8824918 = sum of:
          0.023821786 = weight(abstract_txt:methods in 2929) [ClassicSimilarity], result of:
            0.023821786 = score(doc=2929,freq=1.0), product of:
              0.07332739 = queryWeight, product of:
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.017633893 = queryNorm
              0.32486886 = fieldWeight in 2929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.078125 = fieldNorm(doc=2929)
          0.0162813 = weight(abstract_txt:based in 2929) [ClassicSimilarity], result of:
            0.0162813 = score(doc=2929,freq=1.0), product of:
              0.06512881 = queryWeight, product of:
                1.1542479 = boost
                3.1998224 = idf(docFreq=4826, maxDocs=43556)
                0.017633893 = queryNorm
              0.24998613 = fieldWeight in 2929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1998224 = idf(docFreq=4826, maxDocs=43556)
                0.078125 = fieldNorm(doc=2929)
          0.12507322 = weight(abstract_txt:algorithms in 2929) [ClassicSimilarity], result of:
            0.12507322 = score(doc=2929,freq=4.0), product of:
              0.13954243 = queryWeight, product of:
                1.3794947 = boost
                5.736382 = idf(docFreq=381, maxDocs=43556)
                0.017633893 = queryNorm
              0.8963097 = fieldWeight in 2929, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.736382 = idf(docFreq=381, maxDocs=43556)
                0.078125 = fieldNorm(doc=2929)
          0.09291692 = weight(abstract_txt:algorithm in 2929) [ClassicSimilarity], result of:
            0.09291692 = score(doc=2929,freq=1.0), product of:
              0.20799057 = queryWeight, product of:
                2.0626917 = boost
                5.7182236 = idf(docFreq=388, maxDocs=43556)
                0.017633893 = queryNorm
              0.44673622 = fieldWeight in 2929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7182236 = idf(docFreq=388, maxDocs=43556)
                0.078125 = fieldNorm(doc=2929)
          0.2400958 = weight(abstract_txt:learning in 2929) [ClassicSimilarity], result of:
            0.2400958 = score(doc=2929,freq=7.0), product of:
              0.24274692 = queryWeight, product of:
                2.876827 = boost
                4.7851086 = idf(docFreq=988, maxDocs=43556)
                0.017633893 = queryNorm
              0.98907864 = fieldWeight in 2929, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.7851086 = idf(docFreq=988, maxDocs=43556)
                0.078125 = fieldNorm(doc=2929)
          0.17488393 = weight(abstract_txt:machine in 2929) [ClassicSimilarity], result of:
            0.17488393 = score(doc=2929,freq=2.0), product of:
              0.29836875 = queryWeight, product of:
                3.1894336 = boost
                5.3050756 = idf(docFreq=587, maxDocs=43556)
                0.017633893 = queryNorm
              0.58613354 = fieldWeight in 2929, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3050756 = idf(docFreq=587, maxDocs=43556)
                0.078125 = fieldNorm(doc=2929)
          0.10489668 = weight(abstract_txt:classification in 2929) [ClassicSimilarity], result of:
            0.10489668 = score(doc=2929,freq=2.0), product of:
              0.23739444 = queryWeight, product of:
                3.366171 = boost
                3.9993203 = idf(docFreq=2169, maxDocs=43556)
                0.017633893 = queryNorm
              0.44186664 = fieldWeight in 2929, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9993203 = idf(docFreq=2169, maxDocs=43556)
                0.078125 = fieldNorm(doc=2929)
          0.1045222 = weight(abstract_txt:document in 2929) [ClassicSimilarity], result of:
            0.1045222 = score(doc=2929,freq=1.0), product of:
              0.31196728 = queryWeight, product of:
                4.125261 = boost
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.017633893 = queryNorm
              0.33504218 = fieldWeight in 2929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.078125 = fieldNorm(doc=2929)
        0.32 = coord(8/25)
    
  4. Goller, C.; Löning, J.; Will, T.; Wolff, W.: Automatic document classification : a thorough evaluation of various methods (2000) 0.25
    0.24975194 = sum of:
      0.24975194 = product of:
        0.78047484 = sum of:
          0.041260544 = weight(abstract_txt:methods in 478) [ClassicSimilarity], result of:
            0.041260544 = score(doc=478,freq=3.0), product of:
              0.07332739 = queryWeight, product of:
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.017633893 = queryNorm
              0.56268936 = fieldWeight in 478, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.078125 = fieldNorm(doc=478)
          0.0162813 = weight(abstract_txt:based in 478) [ClassicSimilarity], result of:
            0.0162813 = score(doc=478,freq=1.0), product of:
              0.06512881 = queryWeight, product of:
                1.1542479 = boost
                3.1998224 = idf(docFreq=4826, maxDocs=43556)
                0.017633893 = queryNorm
              0.24998613 = fieldWeight in 478, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1998224 = idf(docFreq=4826, maxDocs=43556)
                0.078125 = fieldNorm(doc=478)
          0.04109615 = weight(abstract_txt:improve in 478) [ClassicSimilarity], result of:
            0.04109615 = score(doc=478,freq=1.0), product of:
              0.10547531 = queryWeight, product of:
                1.1993399 = boost
                4.987241 = idf(docFreq=807, maxDocs=43556)
                0.017633893 = queryNorm
              0.38962817 = fieldWeight in 478, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.987241 = idf(docFreq=807, maxDocs=43556)
                0.078125 = fieldNorm(doc=478)
          0.12833661 = weight(abstract_txt:learning in 478) [ClassicSimilarity], result of:
            0.12833661 = score(doc=478,freq=2.0), product of:
              0.24274692 = queryWeight, product of:
                2.876827 = boost
                4.7851086 = idf(docFreq=988, maxDocs=43556)
                0.017633893 = queryNorm
              0.5286848 = fieldWeight in 478, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7851086 = idf(docFreq=988, maxDocs=43556)
                0.078125 = fieldNorm(doc=478)
          0.11616568 = weight(abstract_txt:automatic in 478) [ClassicSimilarity], result of:
            0.11616568 = score(doc=478,freq=1.0), product of:
              0.28618613 = queryWeight, product of:
                3.1236415 = boost
                5.195642 = idf(docFreq=655, maxDocs=43556)
                0.017633893 = queryNorm
              0.40590954 = fieldWeight in 478, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.195642 = idf(docFreq=655, maxDocs=43556)
                0.078125 = fieldNorm(doc=478)
          0.12366163 = weight(abstract_txt:machine in 478) [ClassicSimilarity], result of:
            0.12366163 = score(doc=478,freq=1.0), product of:
              0.29836875 = queryWeight, product of:
                3.1894336 = boost
                5.3050756 = idf(docFreq=587, maxDocs=43556)
                0.017633893 = queryNorm
              0.41445905 = fieldWeight in 478, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3050756 = idf(docFreq=587, maxDocs=43556)
                0.078125 = fieldNorm(doc=478)
          0.16585621 = weight(abstract_txt:classification in 478) [ClassicSimilarity], result of:
            0.16585621 = score(doc=478,freq=5.0), product of:
              0.23739444 = queryWeight, product of:
                3.366171 = boost
                3.9993203 = idf(docFreq=2169, maxDocs=43556)
                0.017633893 = queryNorm
              0.6986525 = fieldWeight in 478, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9993203 = idf(docFreq=2169, maxDocs=43556)
                0.078125 = fieldNorm(doc=478)
          0.14781672 = weight(abstract_txt:document in 478) [ClassicSimilarity], result of:
            0.14781672 = score(doc=478,freq=2.0), product of:
              0.31196728 = queryWeight, product of:
                4.125261 = boost
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.017633893 = queryNorm
              0.4738212 = fieldWeight in 478, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.078125 = fieldNorm(doc=478)
        0.32 = coord(8/25)
    
  5. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.24
    0.2443272 = sum of:
      0.2443272 = product of:
        0.87259716 = sum of:
          0.028586144 = weight(abstract_txt:methods in 2593) [ClassicSimilarity], result of:
            0.028586144 = score(doc=2593,freq=1.0), product of:
              0.07332739 = queryWeight, product of:
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.017633893 = queryNorm
              0.38984263 = fieldWeight in 2593, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.09375 = fieldNorm(doc=2593)
          0.019537559 = weight(abstract_txt:based in 2593) [ClassicSimilarity], result of:
            0.019537559 = score(doc=2593,freq=1.0), product of:
              0.06512881 = queryWeight, product of:
                1.1542479 = boost
                3.1998224 = idf(docFreq=4826, maxDocs=43556)
                0.017633893 = queryNorm
              0.29998335 = fieldWeight in 2593, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1998224 = idf(docFreq=4826, maxDocs=43556)
                0.09375 = fieldNorm(doc=2593)
          0.16352476 = weight(abstract_txt:categorization in 2593) [ClassicSimilarity], result of:
            0.16352476 = score(doc=2593,freq=2.0), product of:
              0.18615508 = queryWeight, product of:
                1.5933249 = boost
                6.625557 = idf(docFreq=156, maxDocs=43556)
                0.017633893 = queryNorm
              0.878433 = fieldWeight in 2593, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.625557 = idf(docFreq=156, maxDocs=43556)
                0.09375 = fieldNorm(doc=2593)
          0.15768525 = weight(abstract_txt:algorithm in 2593) [ClassicSimilarity], result of:
            0.15768525 = score(doc=2593,freq=2.0), product of:
              0.20799057 = queryWeight, product of:
                2.0626917 = boost
                5.7182236 = idf(docFreq=388, maxDocs=43556)
                0.017633893 = queryNorm
              0.7581365 = fieldWeight in 2593, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7182236 = idf(docFreq=388, maxDocs=43556)
                0.09375 = fieldNorm(doc=2593)
          0.15400392 = weight(abstract_txt:learning in 2593) [ClassicSimilarity], result of:
            0.15400392 = score(doc=2593,freq=2.0), product of:
              0.24274692 = queryWeight, product of:
                2.876827 = boost
                4.7851086 = idf(docFreq=988, maxDocs=43556)
                0.017633893 = queryNorm
              0.6344217 = fieldWeight in 2593, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7851086 = idf(docFreq=988, maxDocs=43556)
                0.09375 = fieldNorm(doc=2593)
          0.13939881 = weight(abstract_txt:automatic in 2593) [ClassicSimilarity], result of:
            0.13939881 = score(doc=2593,freq=1.0), product of:
              0.28618613 = queryWeight, product of:
                3.1236415 = boost
                5.195642 = idf(docFreq=655, maxDocs=43556)
                0.017633893 = queryNorm
              0.48709142 = fieldWeight in 2593, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.195642 = idf(docFreq=655, maxDocs=43556)
                0.09375 = fieldNorm(doc=2593)
          0.20986073 = weight(abstract_txt:machine in 2593) [ClassicSimilarity], result of:
            0.20986073 = score(doc=2593,freq=2.0), product of:
              0.29836875 = queryWeight, product of:
                3.1894336 = boost
                5.3050756 = idf(docFreq=587, maxDocs=43556)
                0.017633893 = queryNorm
              0.70336026 = fieldWeight in 2593, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3050756 = idf(docFreq=587, maxDocs=43556)
                0.09375 = fieldNorm(doc=2593)
        0.28 = coord(7/25)
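
The final step of each content-similarity tree, where the matched term scores are summed and scaled by the coordination factor coord(matched/total), can be sketched as follows. The helper name `coord_score` is hypothetical and not part of any library. The clause scores are copied from entry 5 (doc 2593), where 7 of the 25 query terms matched.

```python
def coord_score(clause_scores, total_clauses):
    # coord = matched clauses / total query clauses, e.g. 7/25 = 0.28
    coord = len(clause_scores) / total_clauses
    # Final score = (sum of matched term scores) * coord
    return sum(clause_scores) * coord

# Per-term scores from the tree for doc 2593 (entry 5)
ruiz_clauses = [
    0.028586144,  # methods
    0.019537559,  # based
    0.16352476,   # categorization
    0.15768525,   # algorithm
    0.15400392,   # learning
    0.13939881,   # automatic
    0.20986073,   # machine
]
ruiz_total = coord_score(ruiz_clauses, 25)
print(f"{ruiz_total:.4f}")
```

The result should agree with the 0.2443272 shown for entry 5 up to rounding. Because coord rewards documents that match more of the query terms, entry 1 (11 of 25 terms) ranks above entry 2 even though its raw sum is lower.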