Search (1 result, page 1 of 1)

  • theme_ss:"Automatisches Klassifizieren"
  • theme_ss:"Computerlinguistik"
  • year_i:[2010 TO 2020}
  1. Ko, Y.: A new term-weighting scheme for text classification using the odds of positive and negative class probabilities (2015) 0.17
    0.16593526 = product of:
      0.33187053 = sum of:
        0.17925708 = weight(_text_:term in 2339) [ClassicSimilarity], result of:
          0.17925708 = score(doc=2339,freq=14.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.8183758 = fieldWeight in 2339, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=2339)
        0.15261345 = weight(_text_:frequency in 2339) [ClassicSimilarity], result of:
          0.15261345 = score(doc=2339,freq=4.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.55206984 = fieldWeight in 2339, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.046875 = fieldNorm(doc=2339)
      0.5 = coord(2/4)
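
    The arithmetic in the score breakdown above can be checked by hand. The following is a minimal sketch, not the search engine's own code: it assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the tf and idf values shown, and it takes queryNorm and fieldNorm as given constants from the listing. The function names `idf` and `term_score` are hypothetical helpers introduced for illustration.

```python
import math

# Constants copied from the explain output; they are taken as given here.
MAX_DOCS = 44218
QUERY_NORM = 0.04694356
FIELD_NORM = 0.046875


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed idf form; it matches the idf values in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq):
    # score = queryWeight * fieldWeight, as in the explain tree.
    tf = math.sqrt(freq)                       # tf(freq) = sqrt(termFreq)
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM       # idf * queryNorm
    field_weight = tf * term_idf * FIELD_NORM  # tf * idf * fieldNorm
    return query_weight * field_weight


score_term = term_score(freq=14.0, doc_freq=1130)      # _text_:term
score_frequency = term_score(freq=4.0, doc_freq=332)   # _text_:frequency
coord = 2 / 4                                          # 2 of 4 query clauses matched
total = (score_term + score_frequency) * coord

print(score_term, score_frequency, total)
```

    Because this sketch uses double precision while the listing shows single-precision values, the results agree with the listing only to about five or six decimal places.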
    
    Abstract
    Text classification (TC) is a core technique for text mining and information retrieval, and it has been applied in many research and industrial areas. Term-weighting schemes assign an appropriate weight to each term to obtain high TC performance. Although term weighting is one of the important modules of TC, and TC has peculiarities that differ from those of information retrieval, many term-weighting schemes from information retrieval, such as term frequency-inverse document frequency (tf-idf), have been used in TC in the same manner. The peculiarity of TC that differs most from information retrieval is the existence of class information. This article proposes a new term-weighting scheme that exploits class information through positive and negative class distributions. The proposed scheme, log tf-TRR, consistently performs better than other schemes that use class information as well as traditional schemes such as tf-idf.