Search (2 results, page 1 of 1)

  • author_ss:"Kwok, K.L."
  1. Kwok, K.L.: The use of titles and cited titles as document representations for automatic classification (1975) 0.02
    0.018768014 = product of:
      0.05630404 = sum of:
        0.05630404 = product of:
          0.11260808 = sum of:
            0.11260808 = weight(_text_:indexing in 4347) [ClassicSimilarity], result of:
              0.11260808 = score(doc=4347,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.5920931 = fieldWeight in 4347, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4347)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Theme
    Citation indexing
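The explain tree above follows the TF-IDF formula of Lucene's `ClassicSimilarity`. Below is a minimal Python sketch that recomputes the displayed score from the listed values. It assumes the standard `ClassicSimilarity` definitions: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), and coord = overlap / maxOverlap. `queryNorm` is copied from the output because it depends on query clauses that the explain text does not show. The function names `classic_idf` and `explain_score` are hypothetical helpers, not Lucene API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def explain_score(freq: float, field_norm: float, query_norm: float,
                  doc_freq: int, max_docs: int, coords: list[float]) -> float:
    """Recompute one term's score, then apply the enclosing coord factors."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # queryWeight = idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm   # fieldWeight = tf * idf * fieldNorm
    score = query_weight * field_weight                 # weight(_text_:indexing)
    for c in coords:                                    # coord(1/2), then coord(1/3)
        score *= c
    return score


# Values copied from the explain output for doc 4347 (first result)
score_4347 = explain_score(2.0, 0.109375, 0.049684696, 2614, 44218, [1 / 2, 1 / 3])
# Values copied from the explain output for doc 3773 (second result)
score_3773 = explain_score(6.0, 0.046875, 0.049684696, 2614, 44218, [1 / 2, 1 / 3])
print(f"{score_4347:.6f} {score_3773:.6f}")
```

The second call uses the values from the second result (freq=6.0, fieldNorm=0.046875). It shows that, with `idf` and `queryNorm` identical for both documents, the difference in final score comes entirely from tf and the length-based `fieldNorm`. Small differences in the last digits are expected because Lucene computes these values in single-precision floats.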
  2. Kwok, K.L.: Employing multiple representations for Chinese information retrieval (1999) 0.01
    0.013931636 = product of:
      0.041794907 = sum of:
        0.041794907 = product of:
          0.083589815 = sum of:
            0.083589815 = weight(_text_:indexing in 3773) [ClassicSimilarity], result of:
              0.083589815 = score(doc=3773,freq=6.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.4395151 = fieldWeight in 3773, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3773)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    For information retrieval in the Chinese language, 3 representation methods for texts are popular, namely: 1-gram or character, bigram, and short-word. Each has its advantages as well as drawbacks. Employing more than one method may combine advantages from them and enhance retrieval effectiveness. We investigated 2 ways of using them simultaneously: mixing representations in documents and queries, and combining retrieval lists obtained via different representations. The experiments were done with the 170 MB evaluated Chinese corpora and 54 long and short queries available from the TREC program, using our Probabilistic Indexing and Retrieval Components System (PIRCS). Experiments show that good retrieval need not depend on accurate word segmentation; approximate segmentation into short-words will do. Results also show and confirm that bigram representation alone works well; mixing characters with bigram representation boosts effectiveness further, but it is preferable to mix characters with short-word indexing, which is more efficient, needs fewer resources, and gives better retrieval more often. Combining retrieval lists from short-word with character representation and from bigram indexing provides the best retrieval results, but at a substantial cost.
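The three representations and the two ways of combining them can be illustrated with a small Python sketch. It is not PIRCS and not the paper's method. Several pieces are assumptions: the hypothetical `short_words` uses greedy forward maximum matching against a toy lexicon, capped at 3 characters, as a stand-in for the paper's approximate short-word segmentation. The hypothetical `combine_lists` uses CombSUM over min-max normalized scores, because the abstract does not say how the retrieval lists are combined.

```python
from collections import Counter


def characters(text: str) -> list[str]:
    # 1-gram representation: every non-space character is an indexing term
    return [ch for ch in text if not ch.isspace()]


def bigrams(text: str) -> list[str]:
    # Overlapping character bigrams
    chars = characters(text)
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def short_words(text: str, lexicon: set[str], max_len: int = 3) -> list[str]:
    # Hypothetical approximate segmentation: greedy forward maximum matching,
    # falling back to a single character when no lexicon word matches
    chars = characters(text)
    words, i = [], 0
    while i < len(chars):
        for n in range(min(max_len, len(chars) - i), 0, -1):
            cand = "".join(chars[i:i + n])
            if n == 1 or cand in lexicon:
                words.append(cand)
                i += n
                break
    return words


def mix(*term_lists: list[str]) -> Counter:
    # Mixing representations: pool terms from several methods into one vector
    pooled = Counter()
    for terms in term_lists:
        pooled.update(terms)
    return pooled


def combine_lists(*runs: dict[str, float]) -> list[tuple[str, float]]:
    # Assumed fusion rule: CombSUM over min-max normalized scores per run
    fused: dict[str, float] = {}
    for run in runs:
        if not run:
            continue
        lo, hi = min(run.values()), max(run.values())
        span = (hi - lo) or 1.0
        for doc, s in run.items():
            fused[doc] = fused.get(doc, 0.0) + (s - lo) / span
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


text = "中文信息检索"                      # "Chinese information retrieval"
lexicon = {"中文", "信息", "检索"}          # toy lexicon (assumption)
print(bigrams(text))
print(mix(characters(text), short_words(text, lexicon)))

run_a = {"d1": 2.0, "d2": 1.0, "d3": 0.5}  # e.g. short-word + character run
run_b = {"d2": 0.9, "d3": 0.3, "d4": 0.1}  # e.g. bigram run
print(combine_lists(run_a, run_b))
```

In this toy example, `d2` ranks first after fusion because it scores well in both runs. Fusion does this kind of cross-representation reinforcement, and it requires running retrieval once per representation, which is consistent with the higher cost the abstract mentions.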