Search (2 results, page 1 of 1)

  • × author_ss:"Ko, Y."
  • × language_ss:"e"
  • × theme_ss:"Automatisches Klassifizieren"
  1. Ko, Y.: ¬A new term-weighting scheme for text classification using the odds of positive and negative class probabilities (2015) 0.01
    0.007581752 = product of:
      0.05307226 = sum of:
        0.01712272 = weight(_text_:information in 2339) [ClassicSimilarity], result of:
          0.01712272 = score(doc=2339,freq=16.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.3291521 = fieldWeight in 2339, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2339)
        0.03594954 = weight(_text_:retrieval in 2339) [ClassicSimilarity], result of:
          0.03594954 = score(doc=2339,freq=8.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.40105087 = fieldWeight in 2339, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2339)
      0.14285715 = coord(2/14)
    
    Abstract
    Text classification (TC) is a core technique for text mining and information retrieval. It has been applied to many applications in many different research and industrial areas. Term-weighting schemes assign an appropriate weight to each term to obtain a high TC performance. Although term weighting is one of the important modules for TC and TC has different peculiarities from those in information retrieval, many term-weighting schemes used in information retrieval, such as term frequency-inverse document frequency (tf-idf), have been used in TC in the same manner. The peculiarity of TC that differs most from information retrieval is the existence of class information. This article proposes a new term-weighting scheme that uses class information using positive and negative class distributions. As a result, the proposed scheme, log tf-TRR, consistently performs better than do other schemes using class information as well as traditional schemes such as tf-idf.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2553-2565
  2. Ko, Y.; Seo, J.: Text classification from unlabeled documents with bootstrapping and feature projection techniques (2009) 0.00
    4.32414E-4 = product of:
      0.0060537956 = sum of:
        0.0060537956 = weight(_text_:information in 2452) [ClassicSimilarity], result of:
          0.0060537956 = score(doc=2452,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.116372846 = fieldWeight in 2452, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2452)
      0.071428575 = coord(1/14)
    
    Source
    Information processing and management. 45(2009) no.1, S.70-83