Document (#35332)

Author
Liu, R.-L.
Title
Context-based term frequency assessment for text classification
Source
Journal of the American Society for Information Science and Technology. 61(2010) no.2, p.300-309
Year
2010
Abstract
Automatic text classification (TC) is essential for the management of information. To properly classify a document d, it is essential to identify the semantics of each term t in d, while the semantics heavily depend on context (neighboring terms) of t in d. Therefore, we present a technique CTFA (Context-based Term Frequency Assessment) that improves text classifiers by considering term contexts in test documents. The results of the term context recognition are used to assess term frequencies of terms, and hence CTFA may easily work with various kinds of text classifiers that base their TC decisions on term frequencies, without needing to modify the classifiers. Moreover, CTFA is efficient, and neither huge memory nor domain-specific knowledge is required. Empirical results show that CTFA successfully enhances performance of several kinds of text classifiers on different experimental data.
Theme
Automatic classification (Automatisches Klassifizieren)
Object
Context-based Term Frequency Assessment

Similar documents (content)

  1. Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.30
    0.2999195 = sum of:
      0.2999195 = product of:
        1.071141 = sum of:
          0.010575508 = weight(abstract_txt:that in 1107) [ClassicSimilarity], result of:
            0.010575508 = score(doc=1107,freq=3.0), product of:
              0.041229535 = queryWeight, product of:
                1.0631405 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01636687 = queryNorm
              0.2565032 = fieldWeight in 1107, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=1107)
          0.0733213 = weight(abstract_txt:enhances in 1107) [ClassicSimilarity], result of:
            0.0733213 = score(doc=1107,freq=1.0), product of:
              0.149909 = queryWeight, product of:
                1.1704144 = boost
                7.825686 = idf(docFreq=47, maxDocs=44218)
                0.01636687 = queryNorm
              0.48910537 = fieldWeight in 1107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.825686 = idf(docFreq=47, maxDocs=44218)
                0.0625 = fieldNorm(doc=1107)
          0.03371705 = weight(abstract_txt:classification in 1107) [ClassicSimilarity], result of:
            0.03371705 = score(doc=1107,freq=3.0), product of:
              0.078020774 = queryWeight, product of:
                1.1941144 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.01636687 = queryNorm
              0.4321548 = fieldWeight in 1107, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=1107)
          0.053830512 = weight(abstract_txt:essential in 1107) [ClassicSimilarity], result of:
            0.053830512 = score(doc=1107,freq=1.0), product of:
              0.15371041 = queryWeight, product of:
                1.6760712 = boost
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.01636687 = queryNorm
              0.35020733 = fieldWeight in 1107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.0625 = fieldNorm(doc=1107)
          0.14307652 = weight(abstract_txt:text in 1107) [ClassicSimilarity], result of:
            0.14307652 = score(doc=1107,freq=8.0), product of:
              0.20014583 = queryWeight, product of:
                3.0240161 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01636687 = queryNorm
              0.7148614 = fieldWeight in 1107, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=1107)
          0.6380946 = weight(abstract_txt:classifiers in 1107) [ClassicSimilarity], result of:
            0.6380946 = score(doc=1107,freq=6.0), product of:
              0.55407333 = queryWeight, product of:
                4.500279 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.01636687 = queryNorm
              1.1516429 = fieldWeight in 1107, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.0625 = fieldNorm(doc=1107)
          0.118525445 = weight(abstract_txt:term in 1107) [ClassicSimilarity], result of:
            0.118525445 = score(doc=1107,freq=1.0), product of:
              0.39498568 = queryWeight, product of:
                5.026498 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.01636687 = queryNorm
              0.3000753 = fieldWeight in 1107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.0625 = fieldNorm(doc=1107)
        0.28 = coord(7/25)
    
  2. Altinel, B.; Ganiz, M.C.: Semantic text classification : a survey of past and recent advances (2018) 0.26
    0.25822937 = sum of:
      0.25822937 = product of:
        0.71730375 = sum of:
          0.009253569 = weight(abstract_txt:that in 5051) [ClassicSimilarity], result of:
            0.009253569 = score(doc=5051,freq=3.0), product of:
              0.041229535 = queryWeight, product of:
                1.0631405 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01636687 = queryNorm
              0.22444029 = fieldWeight in 5051, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.06141419 = weight(abstract_txt:classification in 5051) [ClassicSimilarity], result of:
            0.06141419 = score(doc=5051,freq=13.0), product of:
              0.078020774 = queryWeight, product of:
                1.1941144 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.01636687 = queryNorm
              0.78715175 = fieldWeight in 5051, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.017704815 = weight(abstract_txt:terms in 5051) [ClassicSimilarity], result of:
            0.017704815 = score(doc=5051,freq=1.0), product of:
              0.08005833 = queryWeight, product of:
                1.2096064 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01636687 = queryNorm
              0.22114895 = fieldWeight in 5051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.107641205 = weight(abstract_txt:neighboring in 5051) [ClassicSimilarity], result of:
            0.107641205 = score(doc=5051,freq=1.0), product of:
              0.21166772 = queryWeight, product of:
                1.390763 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.01636687 = queryNorm
              0.5085386 = fieldWeight in 5051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.056325227 = weight(abstract_txt:frequency in 5051) [ClassicSimilarity], result of:
            0.056325227 = score(doc=5051,freq=1.0), product of:
              0.17317328 = queryWeight, product of:
                1.7790217 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.01636687 = queryNorm
              0.32525358 = fieldWeight in 5051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.082893565 = weight(abstract_txt:semantics in 5051) [ClassicSimilarity], result of:
            0.082893565 = score(doc=5051,freq=2.0), product of:
              0.17783456 = queryWeight, product of:
                1.8028055 = boost
                6.027006 = idf(docFreq=289, maxDocs=44218)
                0.01636687 = queryNorm
              0.46612746 = fieldWeight in 5051, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.027006 = idf(docFreq=289, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.07581435 = weight(abstract_txt:context in 5051) [ClassicSimilarity], result of:
            0.07581435 = score(doc=5051,freq=3.0), product of:
              0.18442343 = queryWeight, product of:
                2.5963538 = boost
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.01636687 = queryNorm
              0.4110885 = fieldWeight in 5051, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.15958905 = weight(abstract_txt:text in 5051) [ClassicSimilarity], result of:
            0.15958905 = score(doc=5051,freq=13.0), product of:
              0.20014583 = queryWeight, product of:
                3.0240161 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01636687 = queryNorm
              0.7973639 = fieldWeight in 5051, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.14666775 = weight(abstract_txt:term in 5051) [ClassicSimilarity], result of:
            0.14666775 = score(doc=5051,freq=2.0), product of:
              0.39498568 = queryWeight, product of:
                5.026498 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.01636687 = queryNorm
              0.3713242 = fieldWeight in 5051, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
        0.36 = coord(9/25)
    
  3. Liu, R.-L.; Huang, Y.-C.: Ranker enhancement for proximity-based ranking of biomedical texts (2011) 0.20
    0.20380458 = sum of:
      0.20380458 = product of:
        0.63688934 = sum of:
          0.04643019 = weight(abstract_txt:improves in 4947) [ClassicSimilarity], result of:
            0.04643019 = score(doc=4947,freq=1.0), product of:
              0.110545546 = queryWeight, product of:
                1.0050703 = boost
                6.7201533 = idf(docFreq=144, maxDocs=44218)
                0.01636687 = queryNorm
              0.42000958 = fieldWeight in 4947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7201533 = idf(docFreq=144, maxDocs=44218)
                0.0625 = fieldNorm(doc=4947)
          0.05410954 = weight(abstract_txt:huge in 4947) [ClassicSimilarity], result of:
            0.05410954 = score(doc=4947,freq=1.0), product of:
              0.12242126 = queryWeight, product of:
                1.0576799 = boost
                7.071914 = idf(docFreq=101, maxDocs=44218)
                0.01636687 = queryNorm
              0.44199464 = fieldWeight in 4947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.071914 = idf(docFreq=101, maxDocs=44218)
                0.0625 = fieldNorm(doc=4947)
          0.013652922 = weight(abstract_txt:that in 4947) [ClassicSimilarity], result of:
            0.013652922 = score(doc=4947,freq=5.0), product of:
              0.041229535 = queryWeight, product of:
                1.0631405 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01636687 = queryNorm
              0.3311442 = fieldWeight in 4947, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=4947)
          0.020234074 = weight(abstract_txt:terms in 4947) [ClassicSimilarity], result of:
            0.020234074 = score(doc=4947,freq=1.0), product of:
              0.08005833 = queryWeight, product of:
                1.2096064 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01636687 = queryNorm
              0.25274166 = fieldWeight in 4947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=4947)
          0.059948005 = weight(abstract_txt:kinds in 4947) [ClassicSimilarity], result of:
            0.059948005 = score(doc=4947,freq=1.0), product of:
              0.16514575 = queryWeight, product of:
                1.7372988 = boost
                5.808009 = idf(docFreq=360, maxDocs=44218)
                0.01636687 = queryNorm
              0.36300057 = fieldWeight in 4947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.808009 = idf(docFreq=360, maxDocs=44218)
                0.0625 = fieldNorm(doc=4947)
          0.06437169 = weight(abstract_txt:frequency in 4947) [ClassicSimilarity], result of:
            0.06437169 = score(doc=4947,freq=1.0), product of:
              0.17317328 = queryWeight, product of:
                1.7790217 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.01636687 = queryNorm
              0.37171838 = fieldWeight in 4947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.0625 = fieldNorm(doc=4947)
          0.11311193 = weight(abstract_txt:text in 4947) [ClassicSimilarity], result of:
            0.11311193 = score(doc=4947,freq=5.0), product of:
              0.20014583 = queryWeight, product of:
                3.0240161 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01636687 = queryNorm
              0.5651476 = fieldWeight in 4947, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=4947)
          0.26503095 = weight(abstract_txt:term in 4947) [ClassicSimilarity], result of:
            0.26503095 = score(doc=4947,freq=5.0), product of:
              0.39498568 = queryWeight, product of:
                5.026498 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.01636687 = queryNorm
              0.67098874 = fieldWeight in 4947, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.0625 = fieldNorm(doc=4947)
        0.32 = coord(8/25)
    
  4. Lee, D.L.; Ren, L.: Document ranking on weight-partitioned signature files (1996) 0.19
    0.18769883 = sum of:
      0.18769883 = product of:
        0.78207844 = sum of:
          0.019383328 = weight(abstract_txt:results in 2417) [ClassicSimilarity], result of:
            0.019383328 = score(doc=2417,freq=1.0), product of:
              0.059371177 = queryWeight, product of:
                1.0416664 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.01636687 = queryNorm
              0.32647708 = fieldWeight in 2417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.009158658 = weight(abstract_txt:that in 2417) [ClassicSimilarity], result of:
            0.009158658 = score(doc=2417,freq=1.0), product of:
              0.041229535 = queryWeight, product of:
                1.0631405 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01636687 = queryNorm
              0.22213829 = fieldWeight in 2417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.030351112 = weight(abstract_txt:terms in 2417) [ClassicSimilarity], result of:
            0.030351112 = score(doc=2417,freq=1.0), product of:
              0.08005833 = queryWeight, product of:
                1.2096064 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01636687 = queryNorm
              0.37911248 = fieldWeight in 2417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.16724257 = weight(abstract_txt:frequency in 2417) [ClassicSimilarity], result of:
            0.16724257 = score(doc=2417,freq=3.0), product of:
              0.17317328 = queryWeight, product of:
                1.7790217 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.01636687 = queryNorm
              0.9657527 = fieldWeight in 2417, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.20036642 = weight(abstract_txt:frequencies in 2417) [ClassicSimilarity], result of:
            0.20036642 = score(doc=2417,freq=1.0), product of:
              0.28173453 = queryWeight, product of:
                2.2691376 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.01636687 = queryNorm
              0.71118873 = fieldWeight in 2417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
          0.35557634 = weight(abstract_txt:term in 2417) [ClassicSimilarity], result of:
            0.35557634 = score(doc=2417,freq=4.0), product of:
              0.39498568 = queryWeight, product of:
                5.026498 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.01636687 = queryNorm
              0.9002259 = fieldWeight in 2417, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.09375 = fieldNorm(doc=2417)
        0.24 = coord(6/25)
    
  5. Liu, R.-L.: Dynamic category profiling for text filtering and classification (2007) 0.18
    0.18332003 = sum of:
      0.18332003 = product of:
        0.91660017 = sum of:
          0.016152775 = weight(abstract_txt:results in 900) [ClassicSimilarity], result of:
            0.016152775 = score(doc=900,freq=1.0), product of:
              0.059371177 = queryWeight, product of:
                1.0416664 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.01636687 = queryNorm
              0.27206424 = fieldWeight in 900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
          0.015264431 = weight(abstract_txt:that in 900) [ClassicSimilarity], result of:
            0.015264431 = score(doc=900,freq=4.0), product of:
              0.041229535 = queryWeight, product of:
                1.0631405 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01636687 = queryNorm
              0.3702305 = fieldWeight in 900, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
          0.024333188 = weight(abstract_txt:classification in 900) [ClassicSimilarity], result of:
            0.024333188 = score(doc=900,freq=1.0), product of:
              0.078020774 = queryWeight, product of:
                1.1941144 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.01636687 = queryNorm
              0.3118809 = fieldWeight in 900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
          0.06323149 = weight(abstract_txt:text in 900) [ClassicSimilarity], result of:
            0.06323149 = score(doc=900,freq=1.0), product of:
              0.20014583 = queryWeight, product of:
                3.0240161 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01636687 = queryNorm
              0.3159271 = fieldWeight in 900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
          0.79761827 = weight(abstract_txt:classifiers in 900) [ClassicSimilarity], result of:
            0.79761827 = score(doc=900,freq=6.0), product of:
              0.55407333 = queryWeight, product of:
                4.500279 = boost
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.01636687 = queryNorm
              1.4395536 = fieldWeight in 900, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.5225 = idf(docFreq=64, maxDocs=44218)
                0.078125 = fieldNorm(doc=900)
        0.2 = coord(5/25)
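
The score breakdowns above follow one pattern per matching term: queryWeight (boost x idf x queryNorm) multiplied by fieldWeight (tf x idf x fieldNorm), with the per-term scores summed and scaled by the coord factor. The following is a minimal sketch of that arithmetic, not the search engine's own code. The function names `idf`, `term_score`, and `doc_score` are made up for illustration. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are the ones consistent with the numbers shown in the listing. The inputs are copied from entry 5 (doc=900).

```python
import math

MAX_DOCS = 44218          # maxDocs as reported in the listing
QUERY_NORM = 0.01636687   # queryNorm as reported in the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                      # tf(freq) = sqrt(termFreq)
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


def doc_score(term_scores, matched, total_clauses):
    """Sum of the term scores, scaled by coord(matched/total_clauses)."""
    return sum(term_scores) * (matched / total_clauses)


# (term, freq, docFreq, boost) copied from entry 5, doc=900
FIELD_NORM_900 = 0.078125
terms_900 = [
    ("results",        1.0,  3693, 1.0416664),
    ("that",           4.0, 11241, 1.0631405),
    ("classification", 1.0,  2218, 1.1941144),
    ("text",           1.0,  2106, 3.0240161),
    ("classifiers",    6.0,    64, 4.500279),
]

scores_900 = [
    term_score(freq, doc_freq, boost, FIELD_NORM_900)
    for _, freq, doc_freq, boost in terms_900
]
total_900 = doc_score(scores_900, matched=5, total_clauses=25)

for (name, *_), s in zip(terms_900, scores_900):
    print(f"{name:15s} {s:.8f}")
print(f"total           {total_900:.8f}")
```

The printed total should land close to the 0.18332003 reported for entry 5. Small differences in the last digits are expected, because the listing was presumably computed in single precision and the boosts shown are rounded.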