Search (5 results, page 1 of 1)

  • author_ss:"Khoo, C.S.G."
  1. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.07
    0.06980721 = product of:
      0.34903604 = sum of:
        0.34903604 = weight(_text_:grams in 5206) [ClassicSimilarity], result of:
          0.34903604 = score(doc=5206,freq=8.0), product of:
            0.39198354 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.04863741 = queryNorm
            0.89043546 = fieldWeight in 5206, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5206)
      0.2 = coord(1/5)
    
    Abstract
    Khoo, Dai, and Loh examine new statistical methods for identifying two- and three-character words in Chinese text. Some meaningful Chinese words are simple (independent units of one or more characters in a sentence that carry independent meaning), while others are compounds of two or more simple words. For the manual segmentation they use the Modern Chinese Word Segmentation for Application of Information Processing, modified to focus on meaningful words. About 37% of meaningful words are longer than two characters, which indicates a need to handle three- and four-character words. Four hundred sentences from news articles were manually broken into overlapping bi-grams and tri-grams. Logistic regression was used to calculate the log of the odds that such bi-grams and tri-grams were meaningful words. Variables such as relative frequency, document frequency, local frequency, and contextual and positional information were incorporated in the model only if their addition improved the concordance measure by at least 2%. For two- and three-character words, the relative frequency of adjacent characters and the document frequency of overlapping bi-grams were found to be significant. Under measures of recall and precision, in which correct automatic segmentation is normalized by manual segmentation or by automatic segmentation respectively, the contextual information formula for two-character words gives significantly better results than previous formulations, and using the two- and three-character formulations in combination significantly improves the two-character results.
  2. Khoo, C.S.G.; Teng, T.B.-R.; Ng, H.-C.; Wong, K.-P.: Developing a taxonomy to support user browsing and learning in a digital heritage portal with crowd-sourced content (2014) 0.01
    0.006589697 = product of:
      0.032948487 = sum of:
        0.032948487 = weight(_text_:22 in 1433) [ClassicSimilarity], result of:
          0.032948487 = score(doc=1433,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.19345059 = fieldWeight in 1433, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1433)
      0.2 = coord(1/5)
    
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  3. Khoo, C.S.G.; Ng, K.; Ou, S.: An exploratory study of human clustering of Web pages (2003) 0.01
    0.005271758 = product of:
      0.026358789 = sum of:
        0.026358789 = weight(_text_:22 in 2741) [ClassicSimilarity], result of:
          0.026358789 = score(doc=2741,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.15476047 = fieldWeight in 2741, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=2741)
      0.2 = coord(1/5)
    
    Date
    12. 9.2004 9:56:22
  4. Wang, Z.; Chaudhry, A.S.; Khoo, C.S.G.: Using classification schemes and thesauri to build an organizational taxonomy for organizing content and aiding navigation (2008) 0.01
    0.005271758 = product of:
      0.026358789 = sum of:
        0.026358789 = weight(_text_:22 in 2346) [ClassicSimilarity], result of:
          0.026358789 = score(doc=2346,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.15476047 = fieldWeight in 2346, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=2346)
      0.2 = coord(1/5)
    
    Date
    7.11.2008 15:22:04
  5. Khoo, C.S.G.; Wan, K.-W.: A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.00
    0.004612788 = product of:
      0.02306394 = sum of:
        0.02306394 = weight(_text_:22 in 2509) [ClassicSimilarity], result of:
          0.02306394 = score(doc=2509,freq=2.0), product of:
            0.17031991 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04863741 = queryNorm
            0.1354154 = fieldWeight in 2509, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2509)
      0.2 = coord(1/5)
    
    Source
    Electronic library. 22(2004) no.2, pp. 112-120
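
The score breakdowns listed under each result can be reproduced by hand. The sketch below assumes the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which agree with the figures shown above. The function name and parameter names are hypothetical, chosen only for illustration. The queryNorm, fieldNorm and coord values are copied from the listing as given rather than derived.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute a single-term ClassicSimilarity score from explain-tree values.

    Hypothetical helper: the inputs are the numbers printed in the listing,
    and the formulas are assumed to be Lucene's classic TF-IDF definitions.
    """
    tf = math.sqrt(freq)                                # tf(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))   # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                     # queryWeight
    field_weight = tf * idf * field_norm                # fieldWeight
    return query_weight * field_weight * coord          # final product with coord


# Result 1: term "grams" in doc 5206
score_1 = classic_similarity_score(
    freq=8.0, doc_freq=37, max_docs=44218,
    query_norm=0.04863741, field_norm=0.0390625, coord=0.2,
)

# Result 5: term "22" in doc 2509
score_5 = classic_similarity_score(
    freq=2.0, doc_freq=3622, max_docs=44218,
    query_norm=0.04863741, field_norm=0.02734375, coord=0.2,
)

print(score_1, score_5)
```

The two printed values should agree with the listed 0.06980721 and 0.004612788 up to small rounding differences, since Lucene computes these in single precision. Because coord is 1/5 for every result, only one of the five query clauses matched in each case. The ranking is therefore driven by the rare term "grams" (docFreq=37) in the first result and by the common term "22" (docFreq=3622) in the rest, where only fieldNorm varies.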