Document (#28390)

Author
Sebastiani, F.
Title
Machine learning in automated text categorization
Source
ACM computing surveys. 34(2002) no.1, p.1-47
Year
2002
Abstract
The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Theme
Automatic classification
Computational linguistics

Similar documents (author)

  1. Sebastiani, F.: On the role of logic in information retrieval (1998) 5.94
    5.937289 = sum of:
      5.937289 = weight(author_txt:sebastiani in 1140) [ClassicSimilarity], result of:
        5.937289 = fieldWeight in 1140, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.625 = fieldNorm(doc=1140)
    
  2. Sebastiani, F.: A tutorial on automated text categorisation (1999) 5.94
    5.937289 = sum of:
      5.937289 = weight(author_txt:sebastiani in 3390) [ClassicSimilarity], result of:
        5.937289 = fieldWeight in 3390, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.625 = fieldNorm(doc=3390)
    
  3. Sebastiani, F.: Classification of text, automatic (2006) 5.94
    5.937289 = sum of:
      5.937289 = weight(author_txt:sebastiani in 5003) [ClassicSimilarity], result of:
        5.937289 = fieldWeight in 5003, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.625 = fieldNorm(doc=5003)
    
  4. Debole, F.; Sebastiani, F.: An analysis of the relative hardness of Reuters-21578 subsets (2005) 4.75
    4.749831 = sum of:
      4.749831 = weight(author_txt:sebastiani in 3456) [ClassicSimilarity], result of:
        4.749831 = fieldWeight in 3456, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.5 = fieldNorm(doc=3456)
    
  5. Giorgetti, D.; Sebastiani, F.: Automating survey coding by multiclass text categorization techniques (2003) 4.75
    4.749831 = sum of:
      4.749831 = weight(author_txt:sebastiani in 5172) [ClassicSimilarity], result of:
        4.749831 = fieldWeight in 5172, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.5 = fieldNorm(doc=5172)
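
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the classic TF-IDF formulas that the printed numbers are consistent with: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are hypothetical and chosen for illustration only; the input values are copied from the listing.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency (assumed form, inferred from the listing)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Entries 1-3: a single author, fieldNorm = 0.625 (listing shows 5.937289)
single_author = field_weight(freq=1.0, doc_freq=8, max_docs=44218, field_norm=0.625)

# Entries 4-5: two authors, fieldNorm = 0.5 (listing shows 4.749831)
two_authors = field_weight(freq=1.0, doc_freq=8, max_docs=44218, field_norm=0.5)

print(f"{single_author:.6f} {two_authors:.6f}")
```

The only factor that differs between the two groups is fieldNorm, which is smaller for the records with two authors; this is why the co-authored records rank below the single-author ones.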
    

Similar documents (content)

  1. Sebastiani, F.: A tutorial on automated text categorisation (1999) 0.87
    0.87062097 = sum of:
      0.87062097 = product of:
        1.4510349 = sum of:
          0.05791277 = weight(abstract_txt:builds in 3390) [ClassicSimilarity], result of:
            0.05791277 = score(doc=3390,freq=1.0), product of:
              0.13084367 = queryWeight, product of:
                1.0925362 = boost
                7.0817666 = idf(docFreq=100, maxDocs=44218)
                0.016911233 = queryNorm
              0.4426104 = fieldWeight in 3390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0817666 = idf(docFreq=100, maxDocs=44218)
                0.0625 = fieldNorm(doc=3390)
          0.05815724 = weight(abstract_txt:dominant in 3390) [ClassicSimilarity], result of:
            0.05815724 = score(doc=3390,freq=1.0), product of:
              0.13121164 = queryWeight, product of:
                1.0940714 = boost
                7.0917172 = idf(docFreq=99, maxDocs=44218)
                0.016911233 = queryNorm
              0.44323233 = fieldWeight in 3390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0917172 = idf(docFreq=99, maxDocs=44218)
                0.0625 = fieldNorm(doc=3390)
          0.009720022 = weight(abstract_txt:this in 3390) [ClassicSimilarity], result of:
            0.009720022 = score(doc=3390,freq=2.0), product of:
              0.045573432 = queryWeight, product of:
                1.1168015 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.016911233 = queryNorm
              0.21328263 = fieldWeight in 3390, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=3390)
          0.07521665 = weight(abstract_txt:inductive in 3390) [ClassicSimilarity], result of:
            0.07521665 = score(doc=3390,freq=1.0), product of:
              0.15575637 = queryWeight, product of:
                1.1920168 = boost
                7.7265954 = idf(docFreq=52, maxDocs=44218)
                0.016911233 = queryNorm
              0.4829122 = fieldWeight in 3390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7265954 = idf(docFreq=52, maxDocs=44218)
                0.0625 = fieldNorm(doc=3390)
          0.030498961 = weight(abstract_txt:text in 3390) [ClassicSimilarity], result of:
            0.030498961 = score(doc=3390,freq=2.0), product of:
              0.08532832 = queryWeight, product of:
                1.2477313 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.016911233 = queryNorm
              0.3574307 = fieldWeight in 3390, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=3390)
          0.039540634 = weight(abstract_txt:documents in 3390) [ClassicSimilarity], result of:
            0.039540634 = score(doc=3390,freq=3.0), product of:
              0.0886275 = queryWeight, product of:
                1.2716241 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.016911233 = queryNorm
              0.44614407 = fieldWeight in 3390, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=3390)
          0.10392739 = weight(abstract_txt:savings in 3390) [ClassicSimilarity], result of:
            0.10392739 = score(doc=3390,freq=1.0), product of:
              0.19322197 = queryWeight, product of:
                1.3276626 = boost
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.016911233 = queryNorm
              0.5378653 = fieldWeight in 3390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.0625 = fieldNorm(doc=3390)
          0.11136837 = weight(abstract_txt:witnessed in 3390) [ClassicSimilarity], result of:
            0.11136837 = score(doc=3390,freq=1.0), product of:
              0.20233813 = queryWeight, product of:
                1.3586211 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.016911233 = queryNorm
              0.55040723 = fieldWeight in 3390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.0625 = fieldNorm(doc=3390)
          0.13111645 = weight(abstract_txt:booming in 3390) [ClassicSimilarity], result of:
            0.13111645 = score(doc=3390,freq=1.0), product of:
              0.22560114 = queryWeight, product of:
                1.4345977 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.016911233 = queryNorm
              0.581187 = fieldWeight in 3390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.0625 = fieldNorm(doc=3390)
          0.06401994 = weight(abstract_txt:categories in 3390) [ClassicSimilarity], result of:
            0.06401994 = score(doc=3390,freq=2.0), product of:
              0.13988785 = queryWeight, product of:
                1.5975869 = boost
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.016911233 = queryNorm
              0.45765188 = fieldWeight in 3390, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.0625 = fieldNorm(doc=3390)
          0.081139095 = weight(abstract_txt:automated in 3390) [ClassicSimilarity], result of:
            0.081139095 = score(doc=3390,freq=2.0), product of:
              0.16382869 = queryWeight, product of:
                1.7288983 = boost
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.016911233 = queryNorm
              0.49526796 = fieldWeight in 3390, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.0625 = fieldNorm(doc=3390)
          0.03634589 = weight(abstract_txt:approach in 3390) [ClassicSimilarity], result of:
            0.03634589 = score(doc=3390,freq=2.0), product of:
              0.10979194 = queryWeight, product of:
                1.7334262 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.016911233 = queryNorm
              0.33104333 = fieldWeight in 3390, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0625 = fieldNorm(doc=3390)
          0.10174731 = weight(abstract_txt:machine in 3390) [ClassicSimilarity], result of:
            0.10174731 = score(doc=3390,freq=2.0), product of:
              0.21807985 = queryWeight, product of:
                2.4430249 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.016911233 = queryNorm
              0.4665599 = fieldWeight in 3390, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.0625 = fieldNorm(doc=3390)
          0.12114079 = weight(abstract_txt:learning in 3390) [ClassicSimilarity], result of:
            0.12114079 = score(doc=3390,freq=3.0), product of:
              0.23554634 = queryWeight, product of:
                2.9317548 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.016911233 = queryNorm
              0.51429707 = fieldWeight in 3390, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.0625 = fieldNorm(doc=3390)
          0.42918333 = weight(abstract_txt:classifier in 3390) [ClassicSimilarity], result of:
            0.42918333 = score(doc=3390,freq=3.0), product of:
              0.5474082 = queryWeight, product of:
                4.469358 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.016911233 = queryNorm
              0.78402793 = fieldWeight in 3390, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.0625 = fieldNorm(doc=3390)
        0.6 = coord(15/25)
    
  2. Sebastiani, F.: Classification of text, automatic (2006) 0.45
    0.4468565 = sum of:
      0.4468565 = product of:
        1.1171412 = sum of:
          0.08686916 = weight(abstract_txt:builds in 5003) [ClassicSimilarity], result of:
            0.08686916 = score(doc=5003,freq=1.0), product of:
              0.13084367 = queryWeight, product of:
                1.0925362 = boost
                7.0817666 = idf(docFreq=100, maxDocs=44218)
                0.016911233 = queryNorm
              0.66391563 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0817666 = idf(docFreq=100, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.01030964 = weight(abstract_txt:this in 5003) [ClassicSimilarity], result of:
            0.01030964 = score(doc=5003,freq=1.0), product of:
              0.045573432 = queryWeight, product of:
                1.1168015 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.016911233 = queryNorm
              0.2262204 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.04574844 = weight(abstract_txt:text in 5003) [ClassicSimilarity], result of:
            0.04574844 = score(doc=5003,freq=2.0), product of:
              0.08532832 = queryWeight, product of:
                1.2477313 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.016911233 = queryNorm
              0.53614604 = fieldWeight in 5003, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.13341078 = weight(abstract_txt:predefined in 5003) [ClassicSimilarity], result of:
            0.13341078 = score(doc=5003,freq=1.0), product of:
              0.1741685 = queryWeight, product of:
                1.2605041 = boost
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.016911233 = queryNorm
              0.76598686 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.096029915 = weight(abstract_txt:categories in 5003) [ClassicSimilarity], result of:
            0.096029915 = score(doc=5003,freq=2.0), product of:
              0.13988785 = queryWeight, product of:
                1.5975869 = boost
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.016911233 = queryNorm
              0.68647784 = fieldWeight in 5003, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.12170865 = weight(abstract_txt:automated in 5003) [ClassicSimilarity], result of:
            0.12170865 = score(doc=5003,freq=2.0), product of:
              0.16382869 = queryWeight, product of:
                1.7288983 = boost
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.016911233 = queryNorm
              0.7429019 = fieldWeight in 5003, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.03855064 = weight(abstract_txt:approach in 5003) [ClassicSimilarity], result of:
            0.03855064 = score(doc=5003,freq=1.0), product of:
              0.10979194 = queryWeight, product of:
                1.7334262 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.016911233 = queryNorm
              0.3511245 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.10791932 = weight(abstract_txt:machine in 5003) [ClassicSimilarity], result of:
            0.10791932 = score(doc=5003,freq=1.0), product of:
              0.21807985 = queryWeight, product of:
                2.4430249 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.016911233 = queryNorm
              0.49486148 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.10491101 = weight(abstract_txt:learning in 5003) [ClassicSimilarity], result of:
            0.10491101 = score(doc=5003,freq=1.0), product of:
              0.23554634 = queryWeight, product of:
                2.9317548 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.016911233 = queryNorm
              0.44539434 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.37168366 = weight(abstract_txt:classifier in 5003) [ClassicSimilarity], result of:
            0.37168366 = score(doc=5003,freq=1.0), product of:
              0.5474082 = queryWeight, product of:
                4.469358 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.016911233 = queryNorm
              0.6789881 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
        0.4 = coord(10/25)
    
  3. Li, T.; Zhu, S.; Ogihara, M.: Hierarchical document classification using automatically generated hierarchy (2007) 0.35
    0.35421595 = sum of:
      0.35421595 = product of:
        0.8855399 = sum of:
          0.008591366 = weight(abstract_txt:this in 4797) [ClassicSimilarity], result of:
            0.008591366 = score(doc=4797,freq=1.0), product of:
              0.045573432 = queryWeight, product of:
                1.1168015 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.016911233 = queryNorm
              0.18851699 = fieldWeight in 4797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
          0.020076586 = weight(abstract_txt:different in 4797) [ClassicSimilarity], result of:
            0.020076586 = score(doc=4797,freq=1.0), product of:
              0.07010781 = queryWeight, product of:
                1.1309872 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.016911233 = queryNorm
              0.28636733 = fieldWeight in 4797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
          0.0381237 = weight(abstract_txt:text in 4797) [ClassicSimilarity], result of:
            0.0381237 = score(doc=4797,freq=2.0), product of:
              0.08532832 = queryWeight, product of:
                1.2477313 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.016911233 = queryNorm
              0.44678837 = fieldWeight in 4797, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
          0.028535997 = weight(abstract_txt:documents in 4797) [ClassicSimilarity], result of:
            0.028535997 = score(doc=4797,freq=1.0), product of:
              0.0886275 = queryWeight, product of:
                1.2716241 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.016911233 = queryNorm
              0.32197678 = fieldWeight in 4797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
          0.13921046 = weight(abstract_txt:witnessed in 4797) [ClassicSimilarity], result of:
            0.13921046 = score(doc=4797,freq=1.0), product of:
              0.20233813 = queryWeight, product of:
                1.3586211 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.016911233 = queryNorm
              0.688009 = fieldWeight in 4797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
          0.16389556 = weight(abstract_txt:booming in 4797) [ClassicSimilarity], result of:
            0.16389556 = score(doc=4797,freq=1.0), product of:
              0.22560114 = queryWeight, product of:
                1.4345977 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.016911233 = queryNorm
              0.72648376 = fieldWeight in 4797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
          0.08002493 = weight(abstract_txt:categories in 4797) [ClassicSimilarity], result of:
            0.08002493 = score(doc=4797,freq=2.0), product of:
              0.13988785 = queryWeight, product of:
                1.5975869 = boost
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.016911233 = queryNorm
              0.5720649 = fieldWeight in 4797, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
          0.07171751 = weight(abstract_txt:automated in 4797) [ClassicSimilarity], result of:
            0.07171751 = score(doc=4797,freq=1.0), product of:
              0.16382869 = queryWeight, product of:
                1.7288983 = boost
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.016911233 = queryNorm
              0.43775916 = fieldWeight in 4797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
          0.032125533 = weight(abstract_txt:approach in 4797) [ClassicSimilarity], result of:
            0.032125533 = score(doc=4797,freq=1.0), product of:
              0.10979194 = queryWeight, product of:
                1.7334262 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.016911233 = queryNorm
              0.29260373 = fieldWeight in 4797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
          0.30323818 = weight(abstract_txt:categorization in 4797) [ClassicSimilarity], result of:
            0.30323818 = score(doc=4797,freq=3.0), product of:
              0.34000534 = queryWeight, product of:
                3.0504434 = boost
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.016911233 = queryNorm
              0.891863 = fieldWeight in 4797, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.078125 = fieldNorm(doc=4797)
        0.4 = coord(10/25)
    
  4. Duwairi, R.M.: Machine learning for Arabic text categorization (2006) 0.32
    0.3157666 = sum of:
      0.3157666 = product of:
        1.1277379 = sum of:
          0.008591366 = weight(abstract_txt:this in 5115) [ClassicSimilarity], result of:
            0.008591366 = score(doc=5115,freq=1.0), product of:
              0.045573432 = queryWeight, product of:
                1.1168015 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.016911233 = queryNorm
              0.18851699 = fieldWeight in 5115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.078125 = fieldNorm(doc=5115)
          0.026957527 = weight(abstract_txt:text in 5115) [ClassicSimilarity], result of:
            0.026957527 = score(doc=5115,freq=1.0), product of:
              0.08532832 = queryWeight, product of:
                1.2477313 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.016911233 = queryNorm
              0.3159271 = fieldWeight in 5115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=5115)
          0.057071995 = weight(abstract_txt:documents in 5115) [ClassicSimilarity], result of:
            0.057071995 = score(doc=5115,freq=4.0), product of:
              0.0886275 = queryWeight, product of:
                1.2716241 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.016911233 = queryNorm
              0.64395356 = fieldWeight in 5115, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=5115)
          0.08002493 = weight(abstract_txt:categories in 5115) [ClassicSimilarity], result of:
            0.08002493 = score(doc=5115,freq=2.0), product of:
              0.13988785 = queryWeight, product of:
                1.5975869 = boost
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.016911233 = queryNorm
              0.5720649 = fieldWeight in 5115, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.078125 = fieldNorm(doc=5115)
          0.08742584 = weight(abstract_txt:learning in 5115) [ClassicSimilarity], result of:
            0.08742584 = score(doc=5115,freq=1.0), product of:
              0.23554634 = queryWeight, product of:
                2.9317548 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.016911233 = queryNorm
              0.37116197 = fieldWeight in 5115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.078125 = fieldNorm(doc=5115)
          0.17507464 = weight(abstract_txt:categorization in 5115) [ClassicSimilarity], result of:
            0.17507464 = score(doc=5115,freq=1.0), product of:
              0.34000534 = queryWeight, product of:
                3.0504434 = boost
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.016911233 = queryNorm
              0.5149173 = fieldWeight in 5115, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.078125 = fieldNorm(doc=5115)
          0.6925916 = weight(abstract_txt:classifier in 5115) [ClassicSimilarity], result of:
            0.6925916 = score(doc=5115,freq=5.0), product of:
              0.5474082 = queryWeight, product of:
                4.469358 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.016911233 = queryNorm
              1.2652196 = fieldWeight in 5115, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.078125 = fieldNorm(doc=5115)
        0.28 = coord(7/25)
    
  5. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.31
    0.30624837 = sum of:
      0.30624837 = product of:
        1.0937442 = sum of:
          0.01030964 = weight(abstract_txt:this in 1595) [ClassicSimilarity], result of:
            0.01030964 = score(doc=1595,freq=1.0), product of:
              0.045573432 = queryWeight, product of:
                1.1168015 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.016911233 = queryNorm
              0.2262204 = fieldWeight in 1595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.09375 = fieldNorm(doc=1595)
          0.04574844 = weight(abstract_txt:text in 1595) [ClassicSimilarity], result of:
            0.04574844 = score(doc=1595,freq=2.0), product of:
              0.08532832 = queryWeight, product of:
                1.2477313 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.016911233 = queryNorm
              0.53614604 = fieldWeight in 1595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=1595)
          0.0679034 = weight(abstract_txt:categories in 1595) [ClassicSimilarity], result of:
            0.0679034 = score(doc=1595,freq=1.0), product of:
              0.13988785 = queryWeight, product of:
                1.5975869 = boost
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.016911233 = queryNorm
              0.48541313 = fieldWeight in 1595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.09375 = fieldNorm(doc=1595)
          0.15262097 = weight(abstract_txt:machine in 1595) [ClassicSimilarity], result of:
            0.15262097 = score(doc=1595,freq=2.0), product of:
              0.21807985 = queryWeight, product of:
                2.4430249 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.016911233 = queryNorm
              0.69983983 = fieldWeight in 1595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.09375 = fieldNorm(doc=1595)
          0.14836656 = weight(abstract_txt:learning in 1595) [ClassicSimilarity], result of:
            0.14836656 = score(doc=1595,freq=2.0), product of:
              0.23554634 = queryWeight, product of:
                2.9317548 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.016911233 = queryNorm
              0.6298827 = fieldWeight in 1595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.09375 = fieldNorm(doc=1595)
          0.2971115 = weight(abstract_txt:categorization in 1595) [ClassicSimilarity], result of:
            0.2971115 = score(doc=1595,freq=2.0), product of:
              0.34000534 = queryWeight, product of:
                3.0504434 = boost
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.016911233 = queryNorm
              0.87384367 = fieldWeight in 1595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.09375 = fieldNorm(doc=1595)
          0.37168366 = weight(abstract_txt:classifier in 1595) [ClassicSimilarity], result of:
            0.37168366 = score(doc=1595,freq=1.0), product of:
              0.5474082 = queryWeight, product of:
                4.469358 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.016911233 = queryNorm
              0.6789881 = fieldWeight in 1595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.09375 = fieldNorm(doc=1595)
        0.28 = coord(7/25)
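
The content-similarity trees above combine three steps: a per-term score (queryWeight times fieldWeight), a sum over the matching terms, and a coord factor that scales the sum by the fraction of query terms matched. The following is a minimal sketch of that arithmetic, assuming the same tf and idf forms the printed numbers are consistent with (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). Function names are hypothetical; boost, queryNorm, fieldNorm and the term sum are copied from the first entry in the listing.

```python
import math

MAX_DOCS = 44218          # maxDocs, as printed in the listing
QUERY_NORM = 0.016911233  # queryNorm, as printed in the listing


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency (assumed form, inferred from the listing)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


def coord(matched: int, total: int) -> float:
    """Fraction of query terms that occur in the document."""
    return matched / total


# Term "classifier" in doc 3390 (listing shows 0.42918333)
classifier_score = term_score(freq=3.0, doc_freq=85, boost=4.469358, field_norm=0.0625)

# Final score of doc 3390: sum of term scores (1.4510349 in the listing)
# multiplied by coord(15/25); the listing shows 0.87062097
final_score = 1.4510349 * coord(15, 25)

print(f"{classifier_score:.6f} {final_score:.6f}")
```

Because coord multiplies the whole sum, a record that matches only 7 of the 25 query terms (entries 4 and 5) keeps just 28% of its summed term scores, even when an individual term such as "classifier" scores highly.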