Document (#28391)

Author
Sebastiani, F.
Title
Machine learning in automated text categorization
Source
ACM computing surveys. 34(2002) no.1, p.1-47
Year
2002
Abstract
The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Theme
Automatic classification
Computational linguistics

Similar documents (author)

  1. Sebastiani, F.: On the role of logic in information retrieval (1998) 5.99
    5.9875464 = sum of:
      5.9875464 = weight(author_txt:sebastiani in 2141) [ClassicSimilarity], result of:
        5.9875464 = fieldWeight in 2141, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.580074 = idf(docFreq=7, maxDocs=42596)
          0.625 = fieldNorm(doc=2141)
    
  2. Sebastiani, F.: A tutorial on automated text categorisation (1999) 5.99
    5.9875464 = sum of:
      5.9875464 = weight(author_txt:sebastiani in 4391) [ClassicSimilarity], result of:
        5.9875464 = fieldWeight in 4391, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.580074 = idf(docFreq=7, maxDocs=42596)
          0.625 = fieldNorm(doc=4391)
    
  3. Sebastiani, F.: Classification of text, automatic (2006) 5.99
    5.9875464 = sum of:
      5.9875464 = weight(author_txt:sebastiani in 4) [ClassicSimilarity], result of:
        5.9875464 = fieldWeight in 4, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.580074 = idf(docFreq=7, maxDocs=42596)
          0.625 = fieldNorm(doc=4)
    
  4. Debole, F.; Sebastiani, F.: An analysis of the relative hardness of Reuters-21578 subsets (2005) 4.79
    4.790037 = sum of:
      4.790037 = weight(author_txt:sebastiani in 4457) [ClassicSimilarity], result of:
        4.790037 = fieldWeight in 4457, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.580074 = idf(docFreq=7, maxDocs=42596)
          0.5 = fieldNorm(doc=4457)
    
  5. Giorgetti, D.; Sebastiani, F.: Automating survey coding by multiclass text categorization techniques (2003) 4.79
    4.790037 = sum of:
      4.790037 = weight(author_txt:sebastiani in 173) [ClassicSimilarity], result of:
        4.790037 = fieldWeight in 173, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.580074 = idf(docFreq=7, maxDocs=42596)
          0.5 = fieldNorm(doc=173)
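
Note on the author scores above: each score is a single fieldWeight, the product of tf, idf and fieldNorm. The figures shown are consistent with tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The sketch below recomputes two of the listed values under that assumption. The function names are illustrative and are not part of any library, and fieldNorm is taken as given from the listing rather than derived.

```python
import math

def classic_tf(freq: float) -> float:
    # Assumed term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf form that reproduces the listed value 9.580074
    # for docFreq=7, maxDocs=42596.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explain trees above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Entry 1 (doc 2141, fieldNorm 0.625) and entry 4 (doc 4457, fieldNorm 0.5).
score_entry_1 = field_weight(1.0, 7, 42596, 0.625)
score_entry_4 = field_weight(1.0, 7, 42596, 0.5)
print(score_entry_1, score_entry_4)
```

The two results agree with the listed 5.9875464 and 4.790037 to within rounding. The only difference between the two groups of author matches is the fieldNorm, which is lower for records with more than one author.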
    

Similar documents (content)

  1. Sebastiani, F.: A tutorial on automated text categorisation (1999) 0.87
    0.874049 = sum of:
      0.874049 = product of:
        1.4567482 = sum of:
          0.059529167 = weight(abstract_txt:builds in 4391) [ClassicSimilarity], result of:
            0.059529167 = score(doc=4391,freq=1.0), product of:
              0.13303153 = queryWeight, product of:
                1.0974886 = boost
                7.159706 = idf(docFreq=89, maxDocs=42596)
                0.016930094 = queryNorm
              0.44748163 = fieldWeight in 4391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.159706 = idf(docFreq=89, maxDocs=42596)
                0.0625 = fieldNorm(doc=4391)
          0.059529167 = weight(abstract_txt:dominant in 4391) [ClassicSimilarity], result of:
            0.059529167 = score(doc=4391,freq=1.0), product of:
              0.13303153 = queryWeight, product of:
                1.0974886 = boost
                7.159706 = idf(docFreq=89, maxDocs=42596)
                0.016930094 = queryNorm
              0.44748163 = fieldWeight in 4391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.159706 = idf(docFreq=89, maxDocs=42596)
                0.0625 = fieldNorm(doc=4391)
          0.01005056 = weight(abstract_txt:this in 4391) [ClassicSimilarity], result of:
            0.01005056 = score(doc=4391,freq=2.0), product of:
              0.04651846 = queryWeight, product of:
                1.124077 = boost
                2.4443867 = idf(docFreq=10047, maxDocs=42596)
                0.016930094 = queryNorm
              0.2160553 = fieldWeight in 4391, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4443867 = idf(docFreq=10047, maxDocs=42596)
                0.0625 = fieldNorm(doc=4391)
          0.076018766 = weight(abstract_txt:inductive in 4391) [ClassicSimilarity], result of:
            0.076018766 = score(doc=4391,freq=1.0), product of:
              0.15658444 = queryWeight, product of:
                1.1906855 = boost
                7.7676954 = idf(docFreq=48, maxDocs=42596)
                0.016930094 = queryNorm
              0.48548096 = fieldWeight in 4391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7676954 = idf(docFreq=48, maxDocs=42596)
                0.0625 = fieldNorm(doc=4391)
          0.030456742 = weight(abstract_txt:text in 4391) [ClassicSimilarity], result of:
            0.030456742 = score(doc=4391,freq=2.0), product of:
              0.085098855 = queryWeight, product of:
                1.2413653 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.016930094 = queryNorm
              0.35789838 = fieldWeight in 4391, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.0625 = fieldNorm(doc=4391)
          0.039111115 = weight(abstract_txt:documents in 4391) [ClassicSimilarity], result of:
            0.039111115 = score(doc=4391,freq=3.0), product of:
              0.08782897 = queryWeight, product of:
                1.2611207 = boost
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.016930094 = queryNorm
              0.44530997 = fieldWeight in 4391, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.0625 = fieldNorm(doc=4391)
          0.10203626 = weight(abstract_txt:savings in 4391) [ClassicSimilarity], result of:
            0.10203626 = score(doc=4391,freq=1.0), product of:
              0.19053338 = queryWeight, product of:
                1.3134341 = boost
                8.568473 = idf(docFreq=21, maxDocs=42596)
                0.016930094 = queryNorm
              0.53552955 = fieldWeight in 4391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.568473 = idf(docFreq=21, maxDocs=42596)
                0.0625 = fieldNorm(doc=4391)
          0.11152715 = weight(abstract_txt:witnessed in 4391) [ClassicSimilarity], result of:
            0.11152715 = score(doc=4391,freq=1.0), product of:
              0.20217237 = queryWeight, product of:
                1.3529559 = boost
                8.826303 = idf(docFreq=16, maxDocs=42596)
                0.016930094 = queryNorm
              0.5516439 = fieldWeight in 4391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.826303 = idf(docFreq=16, maxDocs=42596)
                0.0625 = fieldNorm(doc=4391)
          0.12885617 = weight(abstract_txt:booming in 4391) [ClassicSimilarity], result of:
            0.12885617 = score(doc=4391,freq=1.0), product of:
              0.22260669 = queryWeight, product of:
                1.4196845 = boost
                9.2616205 = idf(docFreq=10, maxDocs=42596)
                0.016930094 = queryNorm
              0.5788513 = fieldWeight in 4391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.2616205 = idf(docFreq=10, maxDocs=42596)
                0.0625 = fieldNorm(doc=4391)
          0.06460812 = weight(abstract_txt:categories in 4391) [ClassicSimilarity], result of:
            0.06460812 = score(doc=4391,freq=2.0), product of:
              0.14049454 = queryWeight, product of:
                1.595025 = boost
                5.202746 = idf(docFreq=636, maxDocs=42596)
                0.016930094 = queryNorm
              0.4598621 = fieldWeight in 4391, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.202746 = idf(docFreq=636, maxDocs=42596)
                0.0625 = fieldNorm(doc=4391)
          0.08140045 = weight(abstract_txt:automated in 4391) [ClassicSimilarity], result of:
            0.08140045 = score(doc=4391,freq=2.0), product of:
              0.16389005 = queryWeight, product of:
                1.7227174 = boost
                5.619261 = idf(docFreq=419, maxDocs=42596)
                0.016930094 = queryNorm
              0.49667716 = fieldWeight in 4391, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.619261 = idf(docFreq=419, maxDocs=42596)
                0.0625 = fieldNorm(doc=4391)
          0.037028857 = weight(abstract_txt:approach in 4391) [ClassicSimilarity], result of:
            0.037028857 = score(doc=4391,freq=2.0), product of:
              0.11096653 = queryWeight, product of:
                1.7361186 = boost
                3.7753158 = idf(docFreq=2654, maxDocs=42596)
                0.016930094 = queryNorm
              0.33369392 = fieldWeight in 4391, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7753158 = idf(docFreq=2654, maxDocs=42596)
                0.0625 = fieldNorm(doc=4391)
          0.1049247 = weight(abstract_txt:machine in 4391) [ClassicSimilarity], result of:
            0.1049247 = score(doc=4391,freq=2.0), product of:
              0.22220321 = queryWeight, product of:
                2.4567363 = boost
                5.342351 = idf(docFreq=553, maxDocs=42596)
                0.016930094 = queryNorm
              0.47220156 = fieldWeight in 4391, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.342351 = idf(docFreq=553, maxDocs=42596)
                0.0625 = fieldNorm(doc=4391)
          0.12508976 = weight(abstract_txt:learning in 4391) [ClassicSimilarity], result of:
            0.12508976 = score(doc=4391,freq=3.0), product of:
              0.24021244 = queryWeight, product of:
                2.9495142 = boost
                4.810449 = idf(docFreq=942, maxDocs=42596)
                0.016930094 = queryNorm
              0.5207464 = fieldWeight in 4391, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.810449 = idf(docFreq=942, maxDocs=42596)
                0.0625 = fieldNorm(doc=4391)
          0.42658144 = weight(abstract_txt:classifier in 4391) [ClassicSimilarity], result of:
            0.42658144 = score(doc=4391,freq=3.0), product of:
              0.54422975 = queryWeight, product of:
                4.4396005 = boost
                7.240675 = idf(docFreq=82, maxDocs=42596)
                0.016930094 = queryNorm
              0.78382605 = fieldWeight in 4391, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.240675 = idf(docFreq=82, maxDocs=42596)
                0.0625 = fieldNorm(doc=4391)
        0.6 = coord(15/25)
    
  2. Sebastiani, F.: Classification of text, automatic (2006) 0.45
    0.44954783 = sum of:
      0.44954783 = product of:
        1.1238695 = sum of:
          0.08929375 = weight(abstract_txt:builds in 4) [ClassicSimilarity], result of:
            0.08929375 = score(doc=4,freq=1.0), product of:
              0.13303153 = queryWeight, product of:
                1.0974886 = boost
                7.159706 = idf(docFreq=89, maxDocs=42596)
                0.016930094 = queryNorm
              0.67122245 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.159706 = idf(docFreq=89, maxDocs=42596)
                0.09375 = fieldNorm(doc=4)
          0.010660229 = weight(abstract_txt:this in 4) [ClassicSimilarity], result of:
            0.010660229 = score(doc=4,freq=1.0), product of:
              0.04651846 = queryWeight, product of:
                1.124077 = boost
                2.4443867 = idf(docFreq=10047, maxDocs=42596)
                0.016930094 = queryNorm
              0.22916126 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4443867 = idf(docFreq=10047, maxDocs=42596)
                0.09375 = fieldNorm(doc=4)
          0.045685116 = weight(abstract_txt:text in 4) [ClassicSimilarity], result of:
            0.045685116 = score(doc=4,freq=2.0), product of:
              0.085098855 = queryWeight, product of:
                1.2413653 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.016930094 = queryNorm
              0.5368476 = fieldWeight in 4, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.09375 = fieldNorm(doc=4)
          0.13089186 = weight(abstract_txt:predefined in 4) [ClassicSimilarity], result of:
            0.13089186 = score(doc=4,freq=1.0), product of:
              0.17166522 = queryWeight, product of:
                1.2467055 = boost
                8.133155 = idf(docFreq=33, maxDocs=42596)
                0.016930094 = queryNorm
              0.76248324 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.133155 = idf(docFreq=33, maxDocs=42596)
                0.09375 = fieldNorm(doc=4)
          0.096912175 = weight(abstract_txt:categories in 4) [ClassicSimilarity], result of:
            0.096912175 = score(doc=4,freq=2.0), product of:
              0.14049454 = queryWeight, product of:
                1.595025 = boost
                5.202746 = idf(docFreq=636, maxDocs=42596)
                0.016930094 = queryNorm
              0.68979317 = fieldWeight in 4, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.202746 = idf(docFreq=636, maxDocs=42596)
                0.09375 = fieldNorm(doc=4)
          0.122100666 = weight(abstract_txt:automated in 4) [ClassicSimilarity], result of:
            0.122100666 = score(doc=4,freq=2.0), product of:
              0.16389005 = queryWeight, product of:
                1.7227174 = boost
                5.619261 = idf(docFreq=419, maxDocs=42596)
                0.016930094 = queryNorm
              0.74501574 = fieldWeight in 4, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.619261 = idf(docFreq=419, maxDocs=42596)
                0.09375 = fieldNorm(doc=4)
          0.03927503 = weight(abstract_txt:approach in 4) [ClassicSimilarity], result of:
            0.03927503 = score(doc=4,freq=1.0), product of:
              0.11096653 = queryWeight, product of:
                1.7361186 = boost
                3.7753158 = idf(docFreq=2654, maxDocs=42596)
                0.016930094 = queryNorm
              0.35393584 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7753158 = idf(docFreq=2654, maxDocs=42596)
                0.09375 = fieldNorm(doc=4)
          0.111289464 = weight(abstract_txt:machine in 4) [ClassicSimilarity], result of:
            0.111289464 = score(doc=4,freq=1.0), product of:
              0.22220321 = queryWeight, product of:
                2.4567363 = boost
                5.342351 = idf(docFreq=553, maxDocs=42596)
                0.016930094 = queryNorm
              0.50084543 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.342351 = idf(docFreq=553, maxDocs=42596)
                0.09375 = fieldNorm(doc=4)
          0.108330905 = weight(abstract_txt:learning in 4) [ClassicSimilarity], result of:
            0.108330905 = score(doc=4,freq=1.0), product of:
              0.24021244 = queryWeight, product of:
                2.9495142 = boost
                4.810449 = idf(docFreq=942, maxDocs=42596)
                0.016930094 = queryNorm
              0.4509796 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.810449 = idf(docFreq=942, maxDocs=42596)
                0.09375 = fieldNorm(doc=4)
          0.3694304 = weight(abstract_txt:classifier in 4) [ClassicSimilarity], result of:
            0.3694304 = score(doc=4,freq=1.0), product of:
              0.54422975 = queryWeight, product of:
                4.4396005 = boost
                7.240675 = idf(docFreq=82, maxDocs=42596)
                0.016930094 = queryNorm
              0.6788133 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.240675 = idf(docFreq=82, maxDocs=42596)
                0.09375 = fieldNorm(doc=4)
        0.4 = coord(10/25)
    
  3. Li, T.; Zhu, S.; Ogihara, M.: Hierarchical document classification using automatically generated hierarchy (2007) 0.36
    0.35610443 = sum of:
      0.35610443 = product of:
        0.89026105 = sum of:
          0.008883524 = weight(abstract_txt:this in 798) [ClassicSimilarity], result of:
            0.008883524 = score(doc=798,freq=1.0), product of:
              0.04651846 = queryWeight, product of:
                1.124077 = boost
                2.4443867 = idf(docFreq=10047, maxDocs=42596)
                0.016930094 = queryNorm
              0.19096771 = fieldWeight in 798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4443867 = idf(docFreq=10047, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
          0.020433461 = weight(abstract_txt:different in 798) [ClassicSimilarity], result of:
            0.020433461 = score(doc=798,freq=1.0), product of:
              0.07081077 = queryWeight, product of:
                1.1323675 = boost
                3.6936228 = idf(docFreq=2880, maxDocs=42596)
                0.016930094 = queryNorm
              0.2885643 = fieldWeight in 798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6936228 = idf(docFreq=2880, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
          0.03807093 = weight(abstract_txt:text in 798) [ClassicSimilarity], result of:
            0.03807093 = score(doc=798,freq=2.0), product of:
              0.085098855 = queryWeight, product of:
                1.2413653 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.016930094 = queryNorm
              0.44737297 = fieldWeight in 798, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
          0.028226018 = weight(abstract_txt:documents in 798) [ClassicSimilarity], result of:
            0.028226018 = score(doc=798,freq=1.0), product of:
              0.08782897 = queryWeight, product of:
                1.2611207 = boost
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.016930094 = queryNorm
              0.3213748 = fieldWeight in 798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
          0.13940895 = weight(abstract_txt:witnessed in 798) [ClassicSimilarity], result of:
            0.13940895 = score(doc=798,freq=1.0), product of:
              0.20217237 = queryWeight, product of:
                1.3529559 = boost
                8.826303 = idf(docFreq=16, maxDocs=42596)
                0.016930094 = queryNorm
              0.68955487 = fieldWeight in 798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.826303 = idf(docFreq=16, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
          0.16107021 = weight(abstract_txt:booming in 798) [ClassicSimilarity], result of:
            0.16107021 = score(doc=798,freq=1.0), product of:
              0.22260669 = queryWeight, product of:
                1.4196845 = boost
                9.2616205 = idf(docFreq=10, maxDocs=42596)
                0.016930094 = queryNorm
              0.7235641 = fieldWeight in 798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.2616205 = idf(docFreq=10, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
          0.08076015 = weight(abstract_txt:categories in 798) [ClassicSimilarity], result of:
            0.08076015 = score(doc=798,freq=2.0), product of:
              0.14049454 = queryWeight, product of:
                1.595025 = boost
                5.202746 = idf(docFreq=636, maxDocs=42596)
                0.016930094 = queryNorm
              0.5748277 = fieldWeight in 798, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.202746 = idf(docFreq=636, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
          0.07194851 = weight(abstract_txt:automated in 798) [ClassicSimilarity], result of:
            0.07194851 = score(doc=798,freq=1.0), product of:
              0.16389005 = queryWeight, product of:
                1.7227174 = boost
                5.619261 = idf(docFreq=419, maxDocs=42596)
                0.016930094 = queryNorm
              0.43900475 = fieldWeight in 798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.619261 = idf(docFreq=419, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
          0.032729197 = weight(abstract_txt:approach in 798) [ClassicSimilarity], result of:
            0.032729197 = score(doc=798,freq=1.0), product of:
              0.11096653 = queryWeight, product of:
                1.7361186 = boost
                3.7753158 = idf(docFreq=2654, maxDocs=42596)
                0.016930094 = queryNorm
              0.29494655 = fieldWeight in 798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7753158 = idf(docFreq=2654, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
          0.3087301 = weight(abstract_txt:categorization in 798) [ClassicSimilarity], result of:
            0.3087301 = score(doc=798,freq=3.0), product of:
              0.34348997 = queryWeight, product of:
                3.054502 = boost
                6.6422358 = idf(docFreq=150, maxDocs=42596)
                0.016930094 = queryNorm
              0.89880383 = fieldWeight in 798, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.6422358 = idf(docFreq=150, maxDocs=42596)
                0.078125 = fieldNorm(doc=798)
        0.4 = coord(10/25)
    
  4. Duwairi, R.M.: Machine learning for Arabic text categorization (2006) 0.32
    0.3163804 = sum of:
      0.3163804 = product of:
        1.12993 = sum of:
          0.008883524 = weight(abstract_txt:this in 116) [ClassicSimilarity], result of:
            0.008883524 = score(doc=116,freq=1.0), product of:
              0.04651846 = queryWeight, product of:
                1.124077 = boost
                2.4443867 = idf(docFreq=10047, maxDocs=42596)
                0.016930094 = queryNorm
              0.19096771 = fieldWeight in 116, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4443867 = idf(docFreq=10047, maxDocs=42596)
                0.078125 = fieldNorm(doc=116)
          0.026920212 = weight(abstract_txt:text in 116) [ClassicSimilarity], result of:
            0.026920212 = score(doc=116,freq=1.0), product of:
              0.085098855 = queryWeight, product of:
                1.2413653 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.016930094 = queryNorm
              0.31634048 = fieldWeight in 116, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.078125 = fieldNorm(doc=116)
          0.056452036 = weight(abstract_txt:documents in 116) [ClassicSimilarity], result of:
            0.056452036 = score(doc=116,freq=4.0), product of:
              0.08782897 = queryWeight, product of:
                1.2611207 = boost
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.016930094 = queryNorm
              0.6427496 = fieldWeight in 116, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1135974 = idf(docFreq=1892, maxDocs=42596)
                0.078125 = fieldNorm(doc=116)
          0.08076015 = weight(abstract_txt:categories in 116) [ClassicSimilarity], result of:
            0.08076015 = score(doc=116,freq=2.0), product of:
              0.14049454 = queryWeight, product of:
                1.595025 = boost
                5.202746 = idf(docFreq=636, maxDocs=42596)
                0.016930094 = queryNorm
              0.5748277 = fieldWeight in 116, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.202746 = idf(docFreq=636, maxDocs=42596)
                0.078125 = fieldNorm(doc=116)
          0.090275764 = weight(abstract_txt:learning in 116) [ClassicSimilarity], result of:
            0.090275764 = score(doc=116,freq=1.0), product of:
              0.24021244 = queryWeight, product of:
                2.9495142 = boost
                4.810449 = idf(docFreq=942, maxDocs=42596)
                0.016930094 = queryNorm
              0.37581635 = fieldWeight in 116, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.810449 = idf(docFreq=942, maxDocs=42596)
                0.078125 = fieldNorm(doc=116)
          0.17824541 = weight(abstract_txt:categorization in 116) [ClassicSimilarity], result of:
            0.17824541 = score(doc=116,freq=1.0), product of:
              0.34348997 = queryWeight, product of:
                3.054502 = boost
                6.6422358 = idf(docFreq=150, maxDocs=42596)
                0.016930094 = queryNorm
              0.51892465 = fieldWeight in 116, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6422358 = idf(docFreq=150, maxDocs=42596)
                0.078125 = fieldNorm(doc=116)
          0.6883929 = weight(abstract_txt:classifier in 116) [ClassicSimilarity], result of:
            0.6883929 = score(doc=116,freq=5.0), product of:
              0.54422975 = queryWeight, product of:
                4.4396005 = boost
                7.240675 = idf(docFreq=82, maxDocs=42596)
                0.016930094 = queryNorm
              1.2648939 = fieldWeight in 116, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.240675 = idf(docFreq=82, maxDocs=42596)
                0.078125 = fieldNorm(doc=116)
        0.28 = coord(7/25)
    
  5. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.31
    0.31006798 = sum of:
      0.31006798 = product of:
        1.1073856 = sum of:
          0.010660229 = weight(abstract_txt:this in 2596) [ClassicSimilarity], result of:
            0.010660229 = score(doc=2596,freq=1.0), product of:
              0.04651846 = queryWeight, product of:
                1.124077 = boost
                2.4443867 = idf(docFreq=10047, maxDocs=42596)
                0.016930094 = queryNorm
              0.22916126 = fieldWeight in 2596, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4443867 = idf(docFreq=10047, maxDocs=42596)
                0.09375 = fieldNorm(doc=2596)
          0.045685116 = weight(abstract_txt:text in 2596) [ClassicSimilarity], result of:
            0.045685116 = score(doc=2596,freq=2.0), product of:
              0.085098855 = queryWeight, product of:
                1.2413653 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.016930094 = queryNorm
              0.5368476 = fieldWeight in 2596, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.09375 = fieldNorm(doc=2596)
          0.06852726 = weight(abstract_txt:categories in 2596) [ClassicSimilarity], result of:
            0.06852726 = score(doc=2596,freq=1.0), product of:
              0.14049454 = queryWeight, product of:
                1.595025 = boost
                5.202746 = idf(docFreq=636, maxDocs=42596)
                0.016930094 = queryNorm
              0.48775744 = fieldWeight in 2596, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.202746 = idf(docFreq=636, maxDocs=42596)
                0.09375 = fieldNorm(doc=2596)
          0.15738705 = weight(abstract_txt:machine in 2596) [ClassicSimilarity], result of:
            0.15738705 = score(doc=2596,freq=2.0), product of:
              0.22220321 = queryWeight, product of:
                2.4567363 = boost
                5.342351 = idf(docFreq=553, maxDocs=42596)
                0.016930094 = queryNorm
              0.7083023 = fieldWeight in 2596, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.342351 = idf(docFreq=553, maxDocs=42596)
                0.09375 = fieldNorm(doc=2596)
          0.15320306 = weight(abstract_txt:learning in 2596) [ClassicSimilarity], result of:
            0.15320306 = score(doc=2596,freq=2.0), product of:
              0.24021244 = queryWeight, product of:
                2.9495142 = boost
                4.810449 = idf(docFreq=942, maxDocs=42596)
                0.016930094 = queryNorm
              0.6377815 = fieldWeight in 2596, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.810449 = idf(docFreq=942, maxDocs=42596)
                0.09375 = fieldNorm(doc=2596)
          0.3024925 = weight(abstract_txt:categorization in 2596) [ClassicSimilarity], result of:
            0.3024925 = score(doc=2596,freq=2.0), product of:
              0.34348997 = queryWeight, product of:
                3.054502 = boost
                6.6422358 = idf(docFreq=150, maxDocs=42596)
                0.016930094 = queryNorm
              0.8806443 = fieldWeight in 2596, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6422358 = idf(docFreq=150, maxDocs=42596)
                0.09375 = fieldNorm(doc=2596)
          0.3694304 = weight(abstract_txt:classifier in 2596) [ClassicSimilarity], result of:
            0.3694304 = score(doc=2596,freq=1.0), product of:
              0.54422975 = queryWeight, product of:
                4.4396005 = boost
                7.240675 = idf(docFreq=82, maxDocs=42596)
                0.016930094 = queryNorm
              0.6788133 = fieldWeight in 2596, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.240675 = idf(docFreq=82, maxDocs=42596)
                0.09375 = fieldNorm(doc=2596)
        0.28 = coord(7/25)
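
Note on the content scores above: each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm. The term contributions are summed and multiplied by coord, the fraction of the 25 query terms that match. The sketch below recomputes entry 5 (doc 2596) from the boosts, frequencies and document frequencies listed in its tree. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the listed figures. All names are illustrative, not library API.

```python
import math

MAX_DOCS = 42596
QUERY_NORM = 0.016930094   # taken from the listing
FIELD_NORM = 0.09375       # fieldNorm(doc=2596), taken from the listing

def idf(doc_freq: int) -> float:
    # Assumed idf form consistent with the listed values.
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))

def term_score(boost: float, freq: float, doc_freq: int) -> float:
    # queryWeight * fieldWeight for one matching term.
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight

# (boost, freq, docFreq) for the seven terms matched in doc 2596.
matched_terms = {
    "this":           (1.124077,  1.0, 10047),
    "text":           (1.2413653, 2.0, 2018),
    "categories":     (1.595025,  1.0, 636),
    "machine":        (2.4567363, 2.0, 553),
    "learning":       (2.9495142, 2.0, 942),
    "categorization": (3.054502,  2.0, 150),
    "classifier":     (4.4396005, 1.0, 82),
}

term_sum = sum(term_score(*params) for params in matched_terms.values())
coord = len(matched_terms) / 25      # 7 of 25 query terms matched
total = coord * term_sum
print(term_sum, total)
```

The result agrees with the listed sum 1.1073856 and final score 0.31006798 to within rounding. The coord factor explains why entry 1 ranks well ahead of the rest: it matches 15 of the 25 query terms, against 10 or 7 for the other records.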