Document (#28391)

Author
Sebastiani, F.
Title
Machine learning in automated text categorization
Source
ACM computing surveys. 34(2002) no.1, pp.1-47
Year
2002
Abstract
The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Theme
Automatic classification
Computational linguistics

Similar documents (author)

  1. Sebastiani, F.: On the role of logic in information retrieval (1998) 6.00
    5.9971275 = sum of:
      5.9971275 = weight(author_txt:sebastiani in 3141) [ClassicSimilarity], result of:
        5.9971275 = fieldWeight in 3141, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.595404 = idf(docFreq=7, maxDocs=43254)
          0.625 = fieldNorm(doc=3141)
    
  2. Sebastiani, F.: ¬A tutorial on automated text categorisation (1999) 6.00
    5.9971275 = sum of:
      5.9971275 = weight(author_txt:sebastiani in 5391) [ClassicSimilarity], result of:
        5.9971275 = fieldWeight in 5391, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.595404 = idf(docFreq=7, maxDocs=43254)
          0.625 = fieldNorm(doc=5391)
    
  3. Sebastiani, F.: Classification of text, automatic (2006) 6.00
    5.9971275 = sum of:
      5.9971275 = weight(author_txt:sebastiani in 4) [ClassicSimilarity], result of:
        5.9971275 = fieldWeight in 4, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.595404 = idf(docFreq=7, maxDocs=43254)
          0.625 = fieldNorm(doc=4)
    
  4. Debole, F.; Sebastiani, F.: ¬An analysis of the relative hardness of Reuters-21578 subsets (2005) 4.80
    4.797702 = sum of:
      4.797702 = weight(author_txt:sebastiani in 5457) [ClassicSimilarity], result of:
        4.797702 = fieldWeight in 5457, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.595404 = idf(docFreq=7, maxDocs=43254)
          0.5 = fieldNorm(doc=5457)
    
  5. Giorgetti, D.; Sebastiani, F.: Automating survey coding by multiclass text categorization techniques (2003) 4.80
    4.797702 = sum of:
      4.797702 = weight(author_txt:sebastiani in 173) [ClassicSimilarity], result of:
        4.797702 = fieldWeight in 173, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.595404 = idf(docFreq=7, maxDocs=43254)
          0.5 = fieldNorm(doc=173)
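The author-match scores listed above can be reproduced by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are illustrative only and are not part of any Lucene API. Lucene computes with 32-bit floats, so the last digits may differ slightly from the listing.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assuming the ClassicSimilarity formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as shown in the explain output."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values taken from the listing: docFreq=7, maxDocs=43254, termFreq=1.0
idf = classic_idf(7, 43254)
single_author = field_weight(1.0, 7, 43254, 0.625)  # entries 1 to 3
two_authors = field_weight(1.0, 7, 43254, 0.5)      # entries 4 and 5

print(f"idf           = {idf:.4f}")
print(f"fieldNorm .625 = {single_author:.4f}")
print(f"fieldNorm .5   = {two_authors:.4f}")
```

Under these assumptions the results should agree with the listed 9.595404, 5.9971275, and 4.797702 up to float rounding. The only factor that separates the 6.00 entries from the 4.80 entries is fieldNorm, which is smaller when the author field holds more names.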
    

Similar documents (content)

  1. Sebastiani, F.: ¬A tutorial on automated text categorisation (1999) 0.87
    0.8731544 = sum of:
      0.8731544 = product of:
        1.4552573 = sum of:
          0.058053933 = weight(abstract_txt:dominant in 5391) [ClassicSimilarity], result of:
            0.058053933 = score(doc=5391,freq=1.0), product of:
              0.1308233 = queryWeight, product of:
                1.0925475 = boost
                7.100134 = idf(docFreq=96, maxDocs=43254)
                0.016864685 = queryNorm
              0.44375837 = fieldWeight in 5391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.100134 = idf(docFreq=96, maxDocs=43254)
                0.0625 = fieldNorm(doc=5391)
          0.058308512 = weight(abstract_txt:builds in 5391) [ClassicSimilarity], result of:
            0.058308512 = score(doc=5391,freq=1.0), product of:
              0.13120548 = queryWeight, product of:
                1.0941422 = boost
                7.110497 = idf(docFreq=95, maxDocs=43254)
                0.016864685 = queryNorm
              0.44440606 = fieldWeight in 5391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.110497 = idf(docFreq=95, maxDocs=43254)
                0.0625 = fieldNorm(doc=5391)
          0.009891634 = weight(abstract_txt:this in 5391) [ClassicSimilarity], result of:
            0.009891634 = score(doc=5391,freq=2.0), product of:
              0.04602634 = queryWeight, product of:
                1.1224362 = boost
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.016864685 = queryNorm
              0.21491244 = fieldWeight in 5391, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.0625 = fieldNorm(doc=5391)
          0.07587365 = weight(abstract_txt:inductive in 5391) [ClassicSimilarity], result of:
            0.07587365 = score(doc=5391,freq=1.0), product of:
              0.15638365 = queryWeight, product of:
                1.1945201 = boost
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.016864685 = queryNorm
              0.48517638 = fieldWeight in 5391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.762822 = idf(docFreq=49, maxDocs=43254)
                0.0625 = fieldNorm(doc=5391)
          0.030468972 = weight(abstract_txt:text in 5391) [ClassicSimilarity], result of:
            0.030468972 = score(doc=5391,freq=2.0), product of:
              0.085120834 = queryWeight, product of:
                1.2463233 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.016864685 = queryNorm
              0.35794964 = fieldWeight in 5391, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.0625 = fieldNorm(doc=5391)
          0.039187964 = weight(abstract_txt:documents in 5391) [ClassicSimilarity], result of:
            0.039187964 = score(doc=5391,freq=3.0), product of:
              0.08794316 = queryWeight, product of:
                1.2668169 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.016864685 = queryNorm
              0.4456056 = fieldWeight in 5391, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.0625 = fieldNorm(doc=5391)
          0.102582 = weight(abstract_txt:savings in 5391) [ClassicSimilarity], result of:
            0.102582 = score(doc=5391,freq=1.0), product of:
              0.19121037 = queryWeight, product of:
                1.32085 = boost
                8.583802 = idf(docFreq=21, maxDocs=43254)
                0.016864685 = queryNorm
              0.53648764 = fieldWeight in 5391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.583802 = idf(docFreq=21, maxDocs=43254)
                0.0625 = fieldNorm(doc=5391)
          0.11210614 = weight(abstract_txt:witnessed in 5391) [ClassicSimilarity], result of:
            0.11210614 = score(doc=5391,freq=1.0), product of:
              0.20286958 = queryWeight, product of:
                1.3605242 = boost
                8.841632 = idf(docFreq=16, maxDocs=43254)
                0.016864685 = queryNorm
              0.552602 = fieldWeight in 5391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.841632 = idf(docFreq=16, maxDocs=43254)
                0.0625 = fieldNorm(doc=5391)
          0.12949342 = weight(abstract_txt:booming in 5391) [ClassicSimilarity], result of:
            0.12949342 = score(doc=5391,freq=1.0), product of:
              0.22333792 = queryWeight, product of:
                1.4275095 = boost
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.016864685 = queryNorm
              0.57980937 = fieldWeight in 5391, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.0625 = fieldNorm(doc=5391)
          0.06453959 = weight(abstract_txt:categories in 5391) [ClassicSimilarity], result of:
            0.06453959 = score(doc=5391,freq=2.0), product of:
              0.14039387 = queryWeight, product of:
                1.6006149 = boost
                5.2009544 = idf(docFreq=647, maxDocs=43254)
                0.016864685 = queryNorm
              0.45970377 = fieldWeight in 5391, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2009544 = idf(docFreq=647, maxDocs=43254)
                0.0625 = fieldNorm(doc=5391)
          0.08144793 = weight(abstract_txt:automated in 5391) [ClassicSimilarity], result of:
            0.08144793 = score(doc=5391,freq=2.0), product of:
              0.16395225 = queryWeight, product of:
                1.7297027 = boost
                5.6204057 = idf(docFreq=425, maxDocs=43254)
                0.016864685 = queryNorm
              0.49677837 = fieldWeight in 5391, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6204057 = idf(docFreq=425, maxDocs=43254)
                0.0625 = fieldNorm(doc=5391)
          0.036458086 = weight(abstract_txt:approach in 5391) [ClassicSimilarity], result of:
            0.036458086 = score(doc=5391,freq=2.0), product of:
              0.10982225 = queryWeight, product of:
                1.7338183 = boost
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.016864685 = queryNorm
              0.33197358 = fieldWeight in 5391, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.0625 = fieldNorm(doc=5391)
          0.10363813 = weight(abstract_txt:machine in 5391) [ClassicSimilarity], result of:
            0.10363813 = score(doc=5391,freq=2.0), product of:
              0.22038099 = queryWeight, product of:
                2.456097 = boost
                5.320475 = idf(docFreq=574, maxDocs=43254)
                0.016864685 = queryNorm
              0.47026798 = fieldWeight in 5391, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.320475 = idf(docFreq=574, maxDocs=43254)
                0.0625 = fieldNorm(doc=5391)
          0.12392288 = weight(abstract_txt:learning in 5391) [ClassicSimilarity], result of:
            0.12392288 = score(doc=5391,freq=3.0), product of:
              0.23871401 = queryWeight, product of:
                2.9516628 = boost
                4.7954893 = idf(docFreq=971, maxDocs=43254)
                0.016864685 = queryNorm
              0.51912695 = fieldWeight in 5391, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.7954893 = idf(docFreq=971, maxDocs=43254)
                0.0625 = fieldNorm(doc=5391)
          0.4292846 = weight(abstract_txt:classifier in 5391) [ClassicSimilarity], result of:
            0.4292846 = score(doc=5391,freq=3.0), product of:
              0.54652137 = queryWeight, product of:
                4.46613 = boost
                7.2560043 = idf(docFreq=82, maxDocs=43254)
                0.016864685 = queryNorm
              0.7854855 = fieldWeight in 5391, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.2560043 = idf(docFreq=82, maxDocs=43254)
                0.0625 = fieldNorm(doc=5391)
        0.6 = coord(15/25)
    
  2. Sebastiani, F.: Classification of text, automatic (2006) 0.45
    0.44878206 = sum of:
      0.44878206 = product of:
        1.1219552 = sum of:
          0.08746277 = weight(abstract_txt:builds in 4) [ClassicSimilarity], result of:
            0.08746277 = score(doc=4,freq=1.0), product of:
              0.13120548 = queryWeight, product of:
                1.0941422 = boost
                7.110497 = idf(docFreq=95, maxDocs=43254)
                0.016864685 = queryNorm
              0.6666091 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.110497 = idf(docFreq=95, maxDocs=43254)
                0.09375 = fieldNorm(doc=4)
          0.010491663 = weight(abstract_txt:this in 4) [ClassicSimilarity], result of:
            0.010491663 = score(doc=4,freq=1.0), product of:
              0.04602634 = queryWeight, product of:
                1.1224362 = boost
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.016864685 = queryNorm
              0.22794908 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.09375 = fieldNorm(doc=4)
          0.04570346 = weight(abstract_txt:text in 4) [ClassicSimilarity], result of:
            0.04570346 = score(doc=4,freq=2.0), product of:
              0.085120834 = queryWeight, product of:
                1.2463233 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.016864685 = queryNorm
              0.5369245 = fieldWeight in 4, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.09375 = fieldNorm(doc=4)
          0.13162969 = weight(abstract_txt:predefined in 4) [ClassicSimilarity], result of:
            0.13162969 = score(doc=4,freq=1.0), product of:
              0.17230812 = queryWeight, product of:
                1.2538646 = boost
                8.148484 = idf(docFreq=33, maxDocs=43254)
                0.016864685 = queryNorm
              0.7639204 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.148484 = idf(docFreq=33, maxDocs=43254)
                0.09375 = fieldNorm(doc=4)
          0.09680939 = weight(abstract_txt:categories in 4) [ClassicSimilarity], result of:
            0.09680939 = score(doc=4,freq=2.0), product of:
              0.14039387 = queryWeight, product of:
                1.6006149 = boost
                5.2009544 = idf(docFreq=647, maxDocs=43254)
                0.016864685 = queryNorm
              0.68955564 = fieldWeight in 4, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2009544 = idf(docFreq=647, maxDocs=43254)
                0.09375 = fieldNorm(doc=4)
          0.12217189 = weight(abstract_txt:automated in 4) [ClassicSimilarity], result of:
            0.12217189 = score(doc=4,freq=2.0), product of:
              0.16395225 = queryWeight, product of:
                1.7297027 = boost
                5.6204057 = idf(docFreq=425, maxDocs=43254)
                0.016864685 = queryNorm
              0.74516755 = fieldWeight in 4, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6204057 = idf(docFreq=425, maxDocs=43254)
                0.09375 = fieldNorm(doc=4)
          0.038669642 = weight(abstract_txt:approach in 4) [ClassicSimilarity], result of:
            0.038669642 = score(doc=4,freq=1.0), product of:
              0.10982225 = queryWeight, product of:
                1.7338183 = boost
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.016864685 = queryNorm
              0.35211116 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.09375 = fieldNorm(doc=4)
          0.10992484 = weight(abstract_txt:machine in 4) [ClassicSimilarity], result of:
            0.10992484 = score(doc=4,freq=1.0), product of:
              0.22038099 = queryWeight, product of:
                2.456097 = boost
                5.320475 = idf(docFreq=574, maxDocs=43254)
                0.016864685 = queryNorm
              0.49879456 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.320475 = idf(docFreq=574, maxDocs=43254)
                0.09375 = fieldNorm(doc=4)
          0.10732036 = weight(abstract_txt:learning in 4) [ClassicSimilarity], result of:
            0.10732036 = score(doc=4,freq=1.0), product of:
              0.23871401 = queryWeight, product of:
                2.9516628 = boost
                4.7954893 = idf(docFreq=971, maxDocs=43254)
                0.016864685 = queryNorm
              0.44957712 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7954893 = idf(docFreq=971, maxDocs=43254)
                0.09375 = fieldNorm(doc=4)
          0.3717714 = weight(abstract_txt:classifier in 4) [ClassicSimilarity], result of:
            0.3717714 = score(doc=4,freq=1.0), product of:
              0.54652137 = queryWeight, product of:
                4.46613 = boost
                7.2560043 = idf(docFreq=82, maxDocs=43254)
                0.016864685 = queryNorm
              0.6802504 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2560043 = idf(docFreq=82, maxDocs=43254)
                0.09375 = fieldNorm(doc=4)
        0.4 = coord(10/25)
    
  3. Li, T.; Zhu, S.; Ogihara, M.: Hierarchical document classification using automatically generated hierarchy (2007) 0.36
    0.35614425 = sum of:
      0.35614425 = product of:
        0.8903606 = sum of:
          0.008743051 = weight(abstract_txt:this in 1262) [ClassicSimilarity], result of:
            0.008743051 = score(doc=1262,freq=1.0), product of:
              0.04602634 = queryWeight, product of:
                1.1224362 = boost
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.016864685 = queryNorm
              0.18995756 = fieldWeight in 1262, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.078125 = fieldNorm(doc=1262)
          0.020244535 = weight(abstract_txt:different in 1262) [ClassicSimilarity], result of:
            0.020244535 = score(doc=1262,freq=1.0), product of:
              0.07037297 = queryWeight, product of:
                1.1332239 = boost
                3.6822383 = idf(docFreq=2958, maxDocs=43254)
                0.016864685 = queryNorm
              0.28767487 = fieldWeight in 1262, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6822383 = idf(docFreq=2958, maxDocs=43254)
                0.078125 = fieldNorm(doc=1262)
          0.038086213 = weight(abstract_txt:text in 1262) [ClassicSimilarity], result of:
            0.038086213 = score(doc=1262,freq=2.0), product of:
              0.085120834 = queryWeight, product of:
                1.2463233 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.016864685 = queryNorm
              0.44743705 = fieldWeight in 1262, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.078125 = fieldNorm(doc=1262)
          0.028281478 = weight(abstract_txt:documents in 1262) [ClassicSimilarity], result of:
            0.028281478 = score(doc=1262,freq=1.0), product of:
              0.08794316 = queryWeight, product of:
                1.2668169 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.016864685 = queryNorm
              0.32158816 = fieldWeight in 1262, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.078125 = fieldNorm(doc=1262)
          0.14013267 = weight(abstract_txt:witnessed in 1262) [ClassicSimilarity], result of:
            0.14013267 = score(doc=1262,freq=1.0), product of:
              0.20286958 = queryWeight, product of:
                1.3605242 = boost
                8.841632 = idf(docFreq=16, maxDocs=43254)
                0.016864685 = queryNorm
              0.6907525 = fieldWeight in 1262, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.841632 = idf(docFreq=16, maxDocs=43254)
                0.078125 = fieldNorm(doc=1262)
          0.16186677 = weight(abstract_txt:booming in 1262) [ClassicSimilarity], result of:
            0.16186677 = score(doc=1262,freq=1.0), product of:
              0.22333792 = queryWeight, product of:
                1.4275095 = boost
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.016864685 = queryNorm
              0.7247617 = fieldWeight in 1262, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.27695 = idf(docFreq=10, maxDocs=43254)
                0.078125 = fieldNorm(doc=1262)
          0.08067449 = weight(abstract_txt:categories in 1262) [ClassicSimilarity], result of:
            0.08067449 = score(doc=1262,freq=2.0), product of:
              0.14039387 = queryWeight, product of:
                1.6006149 = boost
                5.2009544 = idf(docFreq=647, maxDocs=43254)
                0.016864685 = queryNorm
              0.5746297 = fieldWeight in 1262, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2009544 = idf(docFreq=647, maxDocs=43254)
                0.078125 = fieldNorm(doc=1262)
          0.071990475 = weight(abstract_txt:automated in 1262) [ClassicSimilarity], result of:
            0.071990475 = score(doc=1262,freq=1.0), product of:
              0.16395225 = queryWeight, product of:
                1.7297027 = boost
                5.6204057 = idf(docFreq=425, maxDocs=43254)
                0.016864685 = queryNorm
              0.4390942 = fieldWeight in 1262, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6204057 = idf(docFreq=425, maxDocs=43254)
                0.078125 = fieldNorm(doc=1262)
          0.0322247 = weight(abstract_txt:approach in 1262) [ClassicSimilarity], result of:
            0.0322247 = score(doc=1262,freq=1.0), product of:
              0.10982225 = queryWeight, product of:
                1.7338183 = boost
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.016864685 = queryNorm
              0.29342598 = fieldWeight in 1262, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7558525 = idf(docFreq=2748, maxDocs=43254)
                0.078125 = fieldNorm(doc=1262)
          0.30811623 = weight(abstract_txt:categorization in 1262) [ClassicSimilarity], result of:
            0.30811623 = score(doc=1262,freq=3.0), product of:
              0.3430313 = queryWeight, product of:
                3.0642576 = boost
                6.6378922 = idf(docFreq=153, maxDocs=43254)
                0.016864685 = queryNorm
              0.8982161 = fieldWeight in 1262, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.6378922 = idf(docFreq=153, maxDocs=43254)
                0.078125 = fieldNorm(doc=1262)
        0.4 = coord(10/25)
    
  4. Duwairi, R.M.: Machine learning for Arabic text categorization (2006) 0.32
    0.3172375 = sum of:
      0.3172375 = product of:
        1.1329911 = sum of:
          0.008743051 = weight(abstract_txt:this in 116) [ClassicSimilarity], result of:
            0.008743051 = score(doc=116,freq=1.0), product of:
              0.04602634 = queryWeight, product of:
                1.1224362 = boost
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.016864685 = queryNorm
              0.18995756 = fieldWeight in 116, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.078125 = fieldNorm(doc=116)
          0.026931021 = weight(abstract_txt:text in 116) [ClassicSimilarity], result of:
            0.026931021 = score(doc=116,freq=1.0), product of:
              0.085120834 = queryWeight, product of:
                1.2463233 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.016864685 = queryNorm
              0.31638578 = fieldWeight in 116, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.078125 = fieldNorm(doc=116)
          0.056562956 = weight(abstract_txt:documents in 116) [ClassicSimilarity], result of:
            0.056562956 = score(doc=116,freq=4.0), product of:
              0.08794316 = queryWeight, product of:
                1.2668169 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.016864685 = queryNorm
              0.6431763 = fieldWeight in 116, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.078125 = fieldNorm(doc=116)
          0.08067449 = weight(abstract_txt:categories in 116) [ClassicSimilarity], result of:
            0.08067449 = score(doc=116,freq=2.0), product of:
              0.14039387 = queryWeight, product of:
                1.6006149 = boost
                5.2009544 = idf(docFreq=647, maxDocs=43254)
                0.016864685 = queryNorm
              0.5746297 = fieldWeight in 116, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2009544 = idf(docFreq=647, maxDocs=43254)
                0.078125 = fieldNorm(doc=116)
          0.08943363 = weight(abstract_txt:learning in 116) [ClassicSimilarity], result of:
            0.08943363 = score(doc=116,freq=1.0), product of:
              0.23871401 = queryWeight, product of:
                2.9516628 = boost
                4.7954893 = idf(docFreq=971, maxDocs=43254)
                0.016864685 = queryNorm
              0.37464762 = fieldWeight in 116, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7954893 = idf(docFreq=971, maxDocs=43254)
                0.078125 = fieldNorm(doc=116)
          0.17789099 = weight(abstract_txt:categorization in 116) [ClassicSimilarity], result of:
            0.17789099 = score(doc=116,freq=1.0), product of:
              0.3430313 = queryWeight, product of:
                3.0642576 = boost
                6.6378922 = idf(docFreq=153, maxDocs=43254)
                0.016864685 = queryNorm
              0.5185853 = fieldWeight in 116, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6378922 = idf(docFreq=153, maxDocs=43254)
                0.078125 = fieldNorm(doc=116)
          0.692755 = weight(abstract_txt:classifier in 116) [ClassicSimilarity], result of:
            0.692755 = score(doc=116,freq=5.0), product of:
              0.54652137 = queryWeight, product of:
                4.46613 = boost
                7.2560043 = idf(docFreq=82, maxDocs=43254)
                0.016864685 = queryNorm
              1.2675717 = fieldWeight in 116, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.2560043 = idf(docFreq=82, maxDocs=43254)
                0.078125 = fieldNorm(doc=116)
        0.28 = coord(7/25)
    
  5. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.31
    0.30955207 = sum of:
      0.30955207 = product of:
        1.1055431 = sum of:
          0.010491663 = weight(abstract_txt:this in 3596) [ClassicSimilarity], result of:
            0.010491663 = score(doc=3596,freq=1.0), product of:
              0.04602634 = queryWeight, product of:
                1.1224362 = boost
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.016864685 = queryNorm
              0.22794908 = fieldWeight in 3596, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4314568 = idf(docFreq=10335, maxDocs=43254)
                0.09375 = fieldNorm(doc=3596)
          0.04570346 = weight(abstract_txt:text in 3596) [ClassicSimilarity], result of:
            0.04570346 = score(doc=3596,freq=2.0), product of:
              0.085120834 = queryWeight, product of:
                1.2463233 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.016864685 = queryNorm
              0.5369245 = fieldWeight in 3596, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.09375 = fieldNorm(doc=3596)
          0.06845457 = weight(abstract_txt:categories in 3596) [ClassicSimilarity], result of:
            0.06845457 = score(doc=3596,freq=1.0), product of:
              0.14039387 = queryWeight, product of:
                1.6006149 = boost
                5.2009544 = idf(docFreq=647, maxDocs=43254)
                0.016864685 = queryNorm
              0.48758948 = fieldWeight in 3596, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2009544 = idf(docFreq=647, maxDocs=43254)
                0.09375 = fieldNorm(doc=3596)
          0.15545718 = weight(abstract_txt:machine in 3596) [ClassicSimilarity], result of:
            0.15545718 = score(doc=3596,freq=2.0), product of:
              0.22038099 = queryWeight, product of:
                2.456097 = boost
                5.320475 = idf(docFreq=574, maxDocs=43254)
                0.016864685 = queryNorm
              0.70540196 = fieldWeight in 3596, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.320475 = idf(docFreq=574, maxDocs=43254)
                0.09375 = fieldNorm(doc=3596)
          0.15177391 = weight(abstract_txt:learning in 3596) [ClassicSimilarity], result of:
            0.15177391 = score(doc=3596,freq=2.0), product of:
              0.23871401 = queryWeight, product of:
                2.9516628 = boost
                4.7954893 = idf(docFreq=971, maxDocs=43254)
                0.016864685 = queryNorm
              0.6357981 = fieldWeight in 3596, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7954893 = idf(docFreq=971, maxDocs=43254)
                0.09375 = fieldNorm(doc=3596)
          0.301891 = weight(abstract_txt:categorization in 3596) [ClassicSimilarity], result of:
            0.301891 = score(doc=3596,freq=2.0), product of:
              0.3430313 = queryWeight, product of:
                3.0642576 = boost
                6.6378922 = idf(docFreq=153, maxDocs=43254)
                0.016864685 = queryNorm
              0.8800684 = fieldWeight in 3596, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6378922 = idf(docFreq=153, maxDocs=43254)
                0.09375 = fieldNorm(doc=3596)
          0.3717714 = weight(abstract_txt:classifier in 3596) [ClassicSimilarity], result of:
            0.3717714 = score(doc=3596,freq=1.0), product of:
              0.54652137 = queryWeight, product of:
                4.46613 = boost
                7.2560043 = idf(docFreq=82, maxDocs=43254)
                0.016864685 = queryNorm
              0.6802504 = fieldWeight in 3596, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2560043 = idf(docFreq=82, maxDocs=43254)
                0.09375 = fieldNorm(doc=3596)
        0.28 = coord(7/25)
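The content-similarity scores above combine three steps: a per-term score (queryWeight times fieldWeight), a sum over the matching terms, and a coord factor that scales the sum by the share of query terms matched. The sketch below retraces these steps for the first entry (doc 5391), using only numbers copied from the listing. The helper names are hypothetical and the formulas are assumed to be those of ClassicSimilarity; small differences in the last digits are expected because Lucene uses 32-bit floats.

```python
import math

def term_score(boost: float, idf: float, query_norm: float,
               freq: float, field_norm: float) -> float:
    """Per-term score = queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm          # query side
    field_weight = math.sqrt(freq) * idf * field_norm  # document side
    return query_weight * field_weight

def coord(matched: int, total: int) -> float:
    """Fraction of query terms that occur in the document."""
    return matched / total

# Term "classifier" in doc 5391, values from the listing.
classifier = term_score(boost=4.46613, idf=7.2560043,
                        query_norm=0.016864685, freq=3.0, field_norm=0.0625)

# The 15 per-term scores listed for doc 5391, in listing order.
doc_5391_terms = [
    0.058053933, 0.058308512, 0.009891634, 0.07587365, 0.030468972,
    0.039187964, 0.102582, 0.11210614, 0.12949342, 0.06453959,
    0.08144793, 0.036458086, 0.10363813, 0.12392288, 0.4292846,
]

raw_sum = sum(doc_5391_terms)
final_score = raw_sum * coord(len(doc_5391_terms), 25)

print(f"classifier term score = {classifier:.6f}")
print(f"sum of term scores    = {raw_sum:.6f}")
print(f"score after coord     = {final_score:.6f}")
```

The coord factor explains why entry 4 (doc 116) ranks below entries 2 and 3 even though its raw sum is larger: it matches only 7 of the 25 query terms, so its sum is scaled by 0.28 instead of 0.4.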