Document (#35333)

Author
Liu, R.-L.
Title
Context-based term frequency assessment for text classification
Source
Journal of the American Society for Information Science and Technology. 61(2010) no.2, p.300-309
Year
2010
Abstract
Automatic text classification (TC) is essential for the management of information. To properly classify a document d, it is essential to identify the semantics of each term t in d, and these semantics depend heavily on the context (neighboring terms) of t in d. Therefore, we present a technique, CTFA (Context-based Term Frequency Assessment), that improves text classifiers by considering term contexts in test documents. The results of the term context recognition are used to assess term frequencies, and hence CTFA can easily work with various kinds of text classifiers that base their TC decisions on term frequencies, without needing to modify the classifiers. Moreover, CTFA is efficient, and neither huge memory nor domain-specific knowledge is required. Empirical results show that CTFA successfully enhances the performance of several kinds of text classifiers on different experimental data.
Theme
Automatic classification
Object
Context-based Term Frequency Assessment

Similar documents (content)

  1. Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.30
    0.30174136 = sum of:
      0.30174136 = product of:
        1.0776477 = sum of:
          0.010807968 = weight(abstract_txt:that in 3108) [ClassicSimilarity], result of:
            0.010807968 = score(doc=3108,freq=3.0), product of:
              0.041692678 = queryWeight, product of:
                1.0722278 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.016237872 = queryNorm
              0.2592294 = fieldWeight in 3108, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=3108)
          0.073446915 = weight(abstract_txt:enhances in 3108) [ClassicSimilarity], result of:
            0.073446915 = score(doc=3108,freq=1.0), product of:
              0.14958204 = queryWeight, product of:
                1.1725633 = boost
                7.856228 = idf(docFreq=44, maxDocs=42740)
                0.016237872 = queryNorm
              0.49101424 = fieldWeight in 3108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.856228 = idf(docFreq=44, maxDocs=42740)
                0.0625 = fieldNorm(doc=3108)
          0.033580452 = weight(abstract_txt:classification in 3108) [ClassicSimilarity], result of:
            0.033580452 = score(doc=3108,freq=3.0), product of:
              0.07755165 = queryWeight, product of:
                1.1940075 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.016237872 = queryNorm
              0.4330076 = fieldWeight in 3108, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.0625 = fieldNorm(doc=3108)
          0.054686602 = weight(abstract_txt:essential in 3108) [ClassicSimilarity], result of:
            0.054686602 = score(doc=3108,freq=1.0), product of:
              0.15482023 = queryWeight, product of:
                1.6870401 = boost
                5.6516232 = idf(docFreq=407, maxDocs=42740)
                0.016237872 = queryNorm
              0.35322645 = fieldWeight in 3108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6516232 = idf(docFreq=407, maxDocs=42740)
                0.0625 = fieldNorm(doc=3108)
          0.14230837 = weight(abstract_txt:text in 3108) [ClassicSimilarity], result of:
            0.14230837 = score(doc=3108,freq=8.0), product of:
              0.1987669 = queryWeight, product of:
                3.0224116 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.016237872 = queryNorm
              0.7159561 = fieldWeight in 3108, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=3108)
          0.64343446 = weight(abstract_txt:classifiers in 3108) [ClassicSimilarity], result of:
            0.64343446 = score(doc=3108,freq=6.0), product of:
              0.5553109 = queryWeight, product of:
                4.5185037 = boost
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.016237872 = queryNorm
              1.1586922 = fieldWeight in 3108, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.0625 = fieldNorm(doc=3108)
          0.11938289 = weight(abstract_txt:term in 3108) [ClassicSimilarity], result of:
            0.11938289 = score(doc=3108,freq=1.0), product of:
              0.39557105 = queryWeight, product of:
                5.044961 = boost
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.016237872 = queryNorm
              0.30179885 = fieldWeight in 3108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.0625 = fieldNorm(doc=3108)
        0.28 = coord(7/25)
    
  2. Altinel, B.; Ganiz, M.C.: Semantic text classification : a survey of past and recent advances (2018) 0.26
    0.25899518 = sum of:
      0.25899518 = product of:
        0.71943104 = sum of:
          0.009456972 = weight(abstract_txt:that in 1052) [ClassicSimilarity], result of:
            0.009456972 = score(doc=1052,freq=3.0), product of:
              0.041692678 = queryWeight, product of:
                1.0722278 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.016237872 = queryNorm
              0.22682571 = fieldWeight in 1052, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1052)
          0.061165377 = weight(abstract_txt:classification in 1052) [ClassicSimilarity], result of:
            0.061165377 = score(doc=1052,freq=13.0), product of:
              0.07755165 = queryWeight, product of:
                1.1940075 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.016237872 = queryNorm
              0.78870505 = fieldWeight in 1052, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1052)
          0.017739616 = weight(abstract_txt:terms in 1052) [ClassicSimilarity], result of:
            0.017739616 = score(doc=1052,freq=1.0), product of:
              0.079897135 = queryWeight, product of:
                1.211929 = boost
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.016237872 = queryNorm
              0.2220307 = fieldWeight in 1052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1052)
          0.10869498 = weight(abstract_txt:neighboring in 1052) [ClassicSimilarity], result of:
            0.10869498 = score(doc=1052,freq=1.0), product of:
              0.21233979 = queryWeight, product of:
                1.397051 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.016237872 = queryNorm
              0.5118917 = fieldWeight in 1052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1052)
          0.056187924 = weight(abstract_txt:frequency in 1052) [ClassicSimilarity], result of:
            0.056187924 = score(doc=1052,freq=1.0), product of:
              0.17231789 = queryWeight, product of:
                1.7798227 = boost
                5.962447 = idf(docFreq=298, maxDocs=42740)
                0.016237872 = queryNorm
              0.32607132 = fieldWeight in 1052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.962447 = idf(docFreq=298, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1052)
          0.08270506 = weight(abstract_txt:semantics in 1052) [ClassicSimilarity], result of:
            0.08270506 = score(doc=1052,freq=2.0), product of:
              0.1769755 = queryWeight, product of:
                1.8037158 = boost
                6.0424895 = idf(docFreq=275, maxDocs=42740)
                0.016237872 = queryNorm
              0.4673249 = fieldWeight in 1052, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0424895 = idf(docFreq=275, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1052)
          0.07702008 = weight(abstract_txt:context in 1052) [ClassicSimilarity], result of:
            0.07702008 = score(doc=1052,freq=3.0), product of:
              0.18575504 = queryWeight, product of:
                2.6133456 = boost
                4.377384 = idf(docFreq=1458, maxDocs=42740)
                0.016237872 = queryNorm
              0.4146325 = fieldWeight in 1052, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.377384 = idf(docFreq=1458, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1052)
          0.15873225 = weight(abstract_txt:text in 1052) [ClassicSimilarity], result of:
            0.15873225 = score(doc=1052,freq=13.0), product of:
              0.1987669 = queryWeight, product of:
                3.0224116 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.016237872 = queryNorm
              0.79858494 = fieldWeight in 1052, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1052)
          0.14772879 = weight(abstract_txt:term in 1052) [ClassicSimilarity], result of:
            0.14772879 = score(doc=1052,freq=2.0), product of:
              0.39557105 = queryWeight, product of:
                5.044961 = boost
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.016237872 = queryNorm
              0.373457 = fieldWeight in 1052, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.0546875 = fieldNorm(doc=1052)
        0.36 = coord(9/25)
    
  3. Liu, R.-L.; Huang, Y.-C.: Ranker enhancement for proximity-based ranking of biomedical texts (2011) 0.20
    0.20458378 = sum of:
      0.20458378 = product of:
        0.6393243 = sum of:
          0.047369406 = weight(abstract_txt:improves in 1948) [ClassicSimilarity], result of:
            0.047369406 = score(doc=1948,freq=1.0), product of:
              0.11165951 = queryWeight, product of:
                1.0130816 = boost
                6.787693 = idf(docFreq=130, maxDocs=42740)
                0.016237872 = queryNorm
              0.4242308 = fieldWeight in 1948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.787693 = idf(docFreq=130, maxDocs=42740)
                0.0625 = fieldNorm(doc=1948)
          0.05417998 = weight(abstract_txt:huge in 1948) [ClassicSimilarity], result of:
            0.05417998 = score(doc=1948,freq=1.0), product of:
              0.122120805 = queryWeight, product of:
                1.0594766 = boost
                7.098542 = idf(docFreq=95, maxDocs=42740)
                0.016237872 = queryNorm
              0.4436589 = fieldWeight in 1948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.098542 = idf(docFreq=95, maxDocs=42740)
                0.0625 = fieldNorm(doc=1948)
          0.013953026 = weight(abstract_txt:that in 1948) [ClassicSimilarity], result of:
            0.013953026 = score(doc=1948,freq=5.0), product of:
              0.041692678 = queryWeight, product of:
                1.0722278 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.016237872 = queryNorm
              0.33466372 = fieldWeight in 1948, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=1948)
          0.020273848 = weight(abstract_txt:terms in 1948) [ClassicSimilarity], result of:
            0.020273848 = score(doc=1948,freq=1.0), product of:
              0.079897135 = queryWeight, product of:
                1.211929 = boost
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.016237872 = queryNorm
              0.25374937 = fieldWeight in 1948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.0625 = fieldNorm(doc=1948)
          0.059880428 = weight(abstract_txt:kinds in 1948) [ClassicSimilarity], result of:
            0.059880428 = score(doc=1948,freq=1.0), product of:
              0.1644739 = queryWeight, product of:
                1.7388418 = boost
                5.82516 = idf(docFreq=342, maxDocs=42740)
                0.016237872 = queryNorm
              0.3640725 = fieldWeight in 1948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.82516 = idf(docFreq=342, maxDocs=42740)
                0.0625 = fieldNorm(doc=1948)
          0.06421477 = weight(abstract_txt:frequency in 1948) [ClassicSimilarity], result of:
            0.06421477 = score(doc=1948,freq=1.0), product of:
              0.17231789 = queryWeight, product of:
                1.7798227 = boost
                5.962447 = idf(docFreq=298, maxDocs=42740)
                0.016237872 = queryNorm
              0.37265295 = fieldWeight in 1948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.962447 = idf(docFreq=298, maxDocs=42740)
                0.0625 = fieldNorm(doc=1948)
          0.112504646 = weight(abstract_txt:text in 1948) [ClassicSimilarity], result of:
            0.112504646 = score(doc=1948,freq=5.0), product of:
              0.1987669 = queryWeight, product of:
                3.0224116 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.016237872 = queryNorm
              0.566013 = fieldWeight in 1948, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=1948)
          0.26694825 = weight(abstract_txt:term in 1948) [ClassicSimilarity], result of:
            0.26694825 = score(doc=1948,freq=5.0), product of:
              0.39557105 = queryWeight, product of:
                5.044961 = boost
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.016237872 = queryNorm
              0.6748428 = fieldWeight in 1948, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.0625 = fieldNorm(doc=1948)
        0.32 = coord(8/25)
    
  4. Lee, D.L.; Ren, L.: Document ranking on weight-partitioned signature files (1996) 0.19
    0.18752861 = sum of:
      0.18752861 = product of:
        0.7813692 = sum of:
          0.019604217 = weight(abstract_txt:results in 3418) [ClassicSimilarity], result of:
            0.019604217 = score(doc=3418,freq=1.0), product of:
              0.059622828 = queryWeight, product of:
                1.0469304 = boost
                3.5072412 = idf(docFreq=3482, maxDocs=42740)
                0.016237872 = queryNorm
              0.32880387 = fieldWeight in 3418, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5072412 = idf(docFreq=3482, maxDocs=42740)
                0.09375 = fieldNorm(doc=3418)
          0.009359974 = weight(abstract_txt:that in 3418) [ClassicSimilarity], result of:
            0.009359974 = score(doc=3418,freq=1.0), product of:
              0.041692678 = queryWeight, product of:
                1.0722278 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.016237872 = queryNorm
              0.22449924 = fieldWeight in 3418, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.09375 = fieldNorm(doc=3418)
          0.030410772 = weight(abstract_txt:terms in 3418) [ClassicSimilarity], result of:
            0.030410772 = score(doc=3418,freq=1.0), product of:
              0.079897135 = queryWeight, product of:
                1.211929 = boost
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.016237872 = queryNorm
              0.38062406 = fieldWeight in 3418, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.09375 = fieldNorm(doc=3418)
          0.16683486 = weight(abstract_txt:frequency in 3418) [ClassicSimilarity], result of:
            0.16683486 = score(doc=3418,freq=3.0), product of:
              0.17231789 = queryWeight, product of:
                1.7798227 = boost
                5.962447 = idf(docFreq=298, maxDocs=42740)
                0.016237872 = queryNorm
              0.9681807 = fieldWeight in 3418, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.962447 = idf(docFreq=298, maxDocs=42740)
                0.09375 = fieldNorm(doc=3418)
          0.19701074 = weight(abstract_txt:frequencies in 3418) [ClassicSimilarity], result of:
            0.19701074 = score(doc=3418,freq=1.0), product of:
              0.27765545 = queryWeight, product of:
                2.2592518 = boost
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.016237872 = queryNorm
              0.70955116 = fieldWeight in 3418, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.09375 = fieldNorm(doc=3418)
          0.35814866 = weight(abstract_txt:term in 3418) [ClassicSimilarity], result of:
            0.35814866 = score(doc=3418,freq=4.0), product of:
              0.39557105 = queryWeight, product of:
                5.044961 = boost
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.016237872 = queryNorm
              0.9053966 = fieldWeight in 3418, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.09375 = fieldNorm(doc=3418)
        0.24 = coord(6/25)
    
  5. Liu, R.-L.: Dynamic category profiling for text filtering and classification (2007) 0.18
    0.1846713 = sum of:
      0.1846713 = product of:
        0.9233565 = sum of:
          0.016336845 = weight(abstract_txt:results in 2901) [ClassicSimilarity], result of:
            0.016336845 = score(doc=2901,freq=1.0), product of:
              0.059622828 = queryWeight, product of:
                1.0469304 = boost
                3.5072412 = idf(docFreq=3482, maxDocs=42740)
                0.016237872 = queryNorm
              0.2740032 = fieldWeight in 2901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5072412 = idf(docFreq=3482, maxDocs=42740)
                0.078125 = fieldNorm(doc=2901)
          0.015599959 = weight(abstract_txt:that in 2901) [ClassicSimilarity], result of:
            0.015599959 = score(doc=2901,freq=4.0), product of:
              0.041692678 = queryWeight, product of:
                1.0722278 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.016237872 = queryNorm
              0.37416542 = fieldWeight in 2901, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.078125 = fieldNorm(doc=2901)
          0.024234604 = weight(abstract_txt:classification in 2901) [ClassicSimilarity], result of:
            0.024234604 = score(doc=2901,freq=1.0), product of:
              0.07755165 = queryWeight, product of:
                1.1940075 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.016237872 = queryNorm
              0.3124963 = fieldWeight in 2901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.078125 = fieldNorm(doc=2901)
          0.06289201 = weight(abstract_txt:text in 2901) [ClassicSimilarity], result of:
            0.06289201 = score(doc=2901,freq=1.0), product of:
              0.1987669 = queryWeight, product of:
                3.0224116 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.016237872 = queryNorm
              0.3164109 = fieldWeight in 2901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.078125 = fieldNorm(doc=2901)
          0.80429304 = weight(abstract_txt:classifiers in 2901) [ClassicSimilarity], result of:
            0.80429304 = score(doc=2901,freq=6.0), product of:
              0.5553109 = queryWeight, product of:
                4.5185037 = boost
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.016237872 = queryNorm
              1.4483653 = fieldWeight in 2901, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.078125 = fieldNorm(doc=2901)
        0.2 = coord(5/25)
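
The score trees above can be checked by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity conventions (tf is the square root of the term frequency, and idf is 1 + ln(maxDocs / (docFreq + 1))), which agree with the tf and idf values listed in the trees. The function names are hypothetical and chosen for illustration only; they are not part of any library. The listed values appear to be single-precision, so a double-precision recomputation should match only to about five or six significant digits.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed idf form: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # One "weight(abstract_txt:<term> in <doc>)" node of the tree.
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                       # tf(freq) = sqrt(termFreq)
    query_weight = boost * idf * query_norm    # "queryWeight, product of:"
    field_weight = tf * idf * field_norm       # "fieldWeight in <doc>, product of:"
    return query_weight * field_weight


def document_score(term_scores, matched_terms, query_terms):
    # Top-level node: sum of the term scores times coord(matched/total).
    return sum(term_scores) * (matched_terms / query_terms)


# Term "classifiers" in entry 1 (doc 3108); all inputs are read off the tree.
classifiers_3108 = term_score(
    freq=6.0, doc_freq=59, max_docs=42740,
    boost=4.5185037, query_norm=0.016237872, field_norm=0.0625,
)
print(round(classifiers_3108, 4))   # approximately 0.6434

# Entry 1 total: the seven listed term scores and coord(7/25).
entry_1 = document_score(
    [0.010807968, 0.073446915, 0.033580452, 0.054686602,
     0.14230837, 0.64343446, 0.11938289],
    matched_terms=7, query_terms=25,
)
print(round(entry_1, 4))            # approximately 0.3017
```

Read this way, the ranking is driven mostly by rare, heavily boosted query terms: in entries 1 and 5 the term "classifiers" (docFreq=59) accounts for well over half of the summed weight, while coord(n/25) scales each total by the share of the 25 query terms that the document matches.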