Ko, Y.: A new term-weighting scheme for text classification using the odds of positive and negative class probabilities (2015)
- Abstract
- Text classification (TC) is a core technique for text mining and information retrieval, and it has been applied in many research and industrial areas. Term-weighting schemes assign an appropriate weight to each term to achieve high TC performance. Although term weighting is an important module for TC, and TC has peculiarities that distinguish it from information retrieval, many term-weighting schemes from information retrieval, such as term frequency-inverse document frequency (tf-idf), have been used in TC unchanged. The peculiarity that most distinguishes TC from information retrieval is the existence of class information. This article proposes a new term-weighting scheme that exploits class information through the positive and negative class distributions. The proposed scheme, log tf-TRR, consistently performs better than other schemes that use class information as well as traditional schemes such as tf-idf.
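The contrast the abstract draws, between a weight that ignores class labels (tf-idf) and one that uses positive and negative class distributions, can be illustrated with a minimal sketch. The abstract does not state the log tf-TRR formula, so the second function is not the article's scheme: `log_tf_class_odds`, its `alpha` smoothing parameter, and the `log(2 + ratio)` form are assumptions chosen only to show how a ratio of class-conditional term probabilities might replace idf.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Unsupervised weighting: tf(t, d) * log(N / df(t)); class labels are ignored."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()} for d in docs]


def log_tf_class_odds(docs, labels, alpha=1.0):
    """Hypothetical supervised weighting (NOT the article's log tf-TRR formula):
    log(1 + tf) * log(2 + P(t | positive) / P(t | negative)),
    with add-alpha smoothed class-conditional term probabilities."""
    vocab = {t for d in docs for t in d}
    pos = Counter(t for d, y in zip(docs, labels) if y for t in d)
    neg = Counter(t for d, y in zip(docs, labels) if not y for t in d)
    pos_total = sum(pos.values()) + alpha * len(vocab)
    neg_total = sum(neg.values()) + alpha * len(vocab)

    def ratio(t):
        # Ratio of smoothed term probabilities in the positive vs. negative class.
        return ((pos[t] + alpha) / pos_total) / ((neg[t] + alpha) / neg_total)

    return [
        {t: math.log(1 + c) * math.log(2 + ratio(t)) for t, c in Counter(d).items()}
        for d in docs
    ]


# Toy corpus: two positive ("sports") and two negative ("finance") documents.
docs = [
    ["goal", "match", "team"],
    ["goal", "team", "win"],
    ["stock", "market", "team"],
    ["stock", "price", "market"],
]
labels = [1, 1, 0, 0]

tfidf_weights = tf_idf(docs)
odds_weights = log_tf_class_odds(docs, labels)
```

In this toy corpus, "goal" occurs only in positive documents while "team" occurs in both classes, so the class-aware sketch weights "goal" above "team" in the first document. The sketch is one-sided: it rewards terms concentrated in the positive class, so for multi-class data it would need to be applied once per class.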