Document (#30207)

Author
Khoo, C.S.G.
Dai, D.
Loh, T.E.
Title
Using statistical and contextual information to identify two- and three-character words in Chinese text
Source
Journal of the American Society for Information Science and Technology. 53(2002) no.5, pp.365-377
Year
2002
Abstract
Khoo, Dai, and Loh examine new statistical methods for identifying two- and three-character words in Chinese text. Some meaningful Chinese words are simple words (independent units of one or more characters in a sentence that carry independent meaning), while others are compounds of two or more simple words. For manual segmentation, the authors follow the Modern Chinese Word Segmentation for Application of Information Processing, modified to focus on meaningful words. About 37% of meaningful words are longer than two characters, indicating a need to handle three- and four-character words as well. Four hundred sentences from news articles were manually broken into overlapping bi-grams and tri-grams. Logistic regression was used to model the log of the odds that each bi-gram or tri-gram is a meaningful word. Variables such as relative frequency, document frequency, local frequency, and contextual and positional information were added to the model only if their addition improved the concordance measure by at least 2%. For both two- and three-character words, the relative frequency of adjacent characters and the document frequency of overlapping bi-grams were found to be significant. Performance was measured with recall and precision, in which the number of correct automatic segmentations is divided by the number of words in the manual segmentation or in the automatic segmentation, respectively. On these measures, the contextual-information formula for two-character words gives significantly better results than previous formulations, and combining the two- and three-character formulations significantly improves the two-character results.
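The pipeline summarized in the abstract (overlapping n-grams, a logistic-regression estimate of the odds that an n-gram is a word, and recall/precision against a manual segmentation) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical; `relative_frequency` uses the association measure f(AB) / (f(A) + f(B) - f(AB)) as a stand-in, since the abstract does not define the paper's variable; the coefficient values are invented rather than fitted; and matching words by (start, end) character offsets is an assumed scoring convention.

```python
import math
from collections import Counter


def overlapping_ngrams(text: str, n: int) -> list[str]:
    """Every overlapping n-character substring of text (n=2: bi-grams, n=3: tri-grams)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def document_frequency(ngram: str, documents: list[str]) -> int:
    """Number of documents containing the n-gram at least once."""
    return sum(1 for doc in documents if ngram in doc)


def relative_frequency(bigram: str, text: str) -> float:
    """HYPOTHETICAL association measure f(AB) / (f(A) + f(B) - f(AB)).

    A stand-in for the paper's 'relative frequency of adjacent characters',
    whose exact definition is not given in the abstract.
    """
    chars = Counter(text)
    bigrams = Counter(overlapping_ngrams(text, 2))  # overlapping counts
    f_ab = bigrams[bigram]
    denominator = chars[bigram[0]] + chars[bigram[1]] - f_ab
    return f_ab / denominator if denominator else 0.0


def word_probability(features: dict[str, float],
                     coefficients: dict[str, float],
                     intercept: float) -> float:
    """Logistic regression: log(p / (1 - p)) = intercept + sum(b_i * x_i)."""
    log_odds = intercept + sum(b * features[name] for name, b in coefficients.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


def word_spans(words: list[str]) -> set[tuple[int, int]]:
    """Turn a segmentation into a set of (start, end) character offsets."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def recall_precision(automatic: list[str], manual: list[str]) -> tuple[float, float]:
    """Correct automatic words divided by the manual word count (recall)
    and by the automatic word count (precision)."""
    auto, gold = word_spans(automatic), word_spans(manual)
    correct = len(auto & gold)
    recall = correct / len(gold) if gold else 0.0
    precision = correct / len(auto) if auto else 0.0
    return recall, precision


if __name__ == "__main__":
    sentence = "我们研究中文分词"
    print(overlapping_ngrams(sentence, 2))  # 7 overlapping bi-grams
    print(overlapping_ngrams(sentence, 3))  # 6 overlapping tri-grams

    # Invented coefficients, for illustration only (not the fitted model).
    coefficients = {"relative_frequency": 4.0, "document_frequency": 0.5}
    features = {"relative_frequency": 1.0, "document_frequency": 2.0}
    print(round(word_probability(features, coefficients, intercept=-3.0), 3))  # 0.881

    manual = ["我们", "研究", "中文", "分词"]
    automatic = ["我们", "研究", "中", "文分", "词"]
    print(recall_precision(automatic, manual))  # (0.5, 0.4)
```

In the usage example, two of the five automatic words share both boundaries with the four manual words, giving recall 2/4 and precision 2/5.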
Theme
Computational linguistics

Similar documents (author)

  1. Khoo, C.S.G.; Poo, D.C.C.: An expert system approach to online catalog subject searching (1994) 6.10
    
  2. Chaudhry, A.S.; Khoo, C.S.G.: A survey of the top-level categories in the structure of corporate Websites (2008) 6.10
    
  3. Khoo, C.S.G.; Ou, S.: Machine versus human clustering of concepts across documents (2008) 6.10
    
  4. Poo, D.C.C.; Khoo, C.S.G.: Online Catalog Subject Searching (2009) 6.10
    
  5. Sun, G.; Khoo, C.S.G.: A framework to represent variables and values in social science research data sets to support data curation and reuse (2018) 6.10
    

Similar documents (content)

  1. Yang, C.C.; Li, K.W.: A heuristic method based on a statistical approach for Chinese text segmentation (2005) 0.62
    
  2. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.35
    
  3. Kwok, K.L.: Employing multiple representations for Chinese information retrieval (1999) 0.31
    
  4. Lee, K.H.; Ng, M.K.M.; Lu, Q.: Text segmentation for Chinese spell checking (1999) 0.30
    
  5. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.26