Search (2 results, page 1 of 1)

  • × author_ss:"Ko, Y."
  • × theme_ss:"Computerlinguistik"
  1. Ko, Y.: ¬A new term-weighting scheme for text classification using the odds of positive and negative class probabilities (2015) 0.01
    0.009362629 = product of:
      0.02340657 = sum of:
        0.0100103095 = weight(_text_:a in 2339) [ClassicSimilarity], result of:
          0.0100103095 = score(doc=2339,freq=12.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.18723148 = fieldWeight in 2339, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2339)
        0.013396261 = product of:
          0.026792523 = sum of:
            0.026792523 = weight(_text_:information in 2339) [ClassicSimilarity], result of:
              0.026792523 = score(doc=2339,freq=16.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.3291521 = fieldWeight in 2339, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2339)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Text classification (TC) is a core technique for text mining and information retrieval. It has been applied to many applications in many different research and industrial areas. Term-weighting schemes assign an appropriate weight to each term to obtain a high TC performance. Although term weighting is one of the important modules for TC and TC has different peculiarities from those in information retrieval, many term-weighting schemes used in information retrieval, such as term frequency-inverse document frequency (tf-idf), have been used in TC in the same manner. The peculiarity of TC that differs most from information retrieval is the existence of class information. This article proposes a new term-weighting scheme that uses class information using positive and negative class distributions. As a result, the proposed scheme, log tf-TRR, consistently performs better than do other schemes using class information as well as traditional schemes such as tf-idf.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2553-2565
    Type
    a
  2. Kim, S.; Ko, Y.; Oard, D.W.: Combining lexical and statistical translation evidence for cross-language information retrieval (2015) 0.01
    0.008412599 = product of:
      0.021031497 = sum of:
        0.01155891 = weight(_text_:a in 1606) [ClassicSimilarity], result of:
          0.01155891 = score(doc=1606,freq=16.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.2161963 = fieldWeight in 1606, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1606)
        0.009472587 = product of:
          0.018945174 = sum of:
            0.018945174 = weight(_text_:information in 1606) [ClassicSimilarity], result of:
              0.018945174 = score(doc=1606,freq=8.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.23274569 = fieldWeight in 1606, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1606)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    This article explores how best to use lexical and statistical translation evidence together for cross-language information retrieval (CLIR). Lexical translation evidence is assembled from Wikipedia and from a large machine-readable dictionary, statistical translation evidence is drawn from parallel corpora, and evidence from co-occurrence in the document language provides a basis for limiting the adverse effect of translation ambiguity. Coverage statistics for NII Testbeds and Community for Information Access Research (NTCIR) queries confirm that these resources have complementary strengths. Experiments with translation evidence from a small parallel corpus indicate that even rather rough estimates of translation probabilities can yield further improvements over a strong technique for translation weighting based on using Jensen-Shannon divergence as a term-association measure. Finally, a novel approach to posttranslation query expansion using a random walk over the Wikipedia concept link graph is shown to yield further improvements over alternative techniques for posttranslation query expansion. Evaluation results on the NTCIR-5 English-Korean test collection show statistically significant improvements over strong baselines.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.23-39
    Type
    a