Document (#30208)

Author
Khoo, C.S.G.
Dai, D.
Loh, T.E.
Title
Using statistical and contextual information to identify two- and three-character words in Chinese text
Source
Journal of the American Society for Information Science and Technology. 53(2002) no.5, p.365-377
Year
2002
Abstract
Khoo, Dai, and Loh examine new statistical methods for identifying two- and three-character words in Chinese text. Some meaningful Chinese words are simple (independent units of one or more characters in a sentence that have independent meaning), but others are compounds of two or more simple words. For manual segmentation, they use the Modern Chinese Word Segmentation for Application of Information Processing, with some modifications to focus on meaningful words. About 37% of meaningful words are longer than two characters, indicating a need to handle three- and four-character words. Four hundred sentences from news articles were manually broken into overlapping bi-grams and tri-grams. Using logistic regression, the log of the odds that such bi-grams and tri-grams were meaningful words was calculated. Variables such as relative frequency, document frequency, local frequency, and contextual and positional information were incorporated in the model only if the concordance measure improved by at least 2% with their addition. For two- and three-character words, relative frequency of adjacent characters and document frequency of overlapping bi-grams were found to be significant. Using measures of recall and precision, where correct automatic segmentation is normalized either by manual segmentation or by automatic segmentation, the contextual information formula for two-character words provides significantly better results than previous formulations, and using the two- and three-character formulations in combination significantly improves the two-character results.
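The bi-gram/tri-gram decomposition and the two normalizations mentioned in the abstract can be illustrated with a minimal sketch. The function names, the sample sentence, and both segmentations below are hypothetical and are not taken from the paper; recall is read here as correct words divided by manually segmented words, and precision as correct words divided by automatically segmented words. The logistic regression model itself is not reproduced.

```python
def overlapping_ngrams(sentence, n):
    """Return every run of n adjacent characters, sliding by one character."""
    return [sentence[i:i + n] for i in range(len(sentence) - n + 1)]


def word_spans(words):
    """Convert a segmentation (list of words) into a set of (start, end) spans."""
    spans, pos = set(), 0
    for word in words:
        spans.add((pos, pos + len(word)))
        pos += len(word)
    return spans


def recall_precision(auto_words, manual_words):
    """Count words on which both segmentations agree (same span), then
    normalize by the manual count (recall) or the automatic count (precision)."""
    auto, manual = word_spans(auto_words), word_spans(manual_words)
    correct = len(auto & manual)
    return correct / len(manual), correct / len(auto)


sentence = "中文信息处理"                     # illustrative sentence (assumption)
bigrams = overlapping_ngrams(sentence, 2)   # candidate two-character words
trigrams = overlapping_ngrams(sentence, 3)  # candidate three-character words

manual = ["中文", "信息", "处理"]             # hypothetical manual segmentation
auto = ["中文", "信息处", "理"]               # hypothetical automatic segmentation
recall, precision = recall_precision(auto, manual)
```

Comparing spans rather than bare strings keeps a word from being counted as correct when it appears at a different position in the sentence.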
Theme
Computational linguistics

Similar documents (author)

  1. Khoo, C.S.G.; Poo, D.C.C.: An expert system approach to online catalog subject searching (1994) 6.12
    6.119851 = sum of:
      6.119851 = sum of:
        2.730059 = weight(author_txt:khoo in 303) [ClassicSimilarity], result of:
          2.730059 = score(doc=303,freq=1.0), product of:
            0.65448314 = queryWeight, product of:
              8.342641 = idf(docFreq=27, maxDocs=43254)
              0.07845035 = queryNorm
            4.1713204 = fieldWeight in 303, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.342641 = idf(docFreq=27, maxDocs=43254)
              0.5 = fieldNorm(doc=303)
        3.389792 = weight(author_txt:c.s.g in 303) [ClassicSimilarity], result of:
          3.389792 = score(doc=303,freq=1.0), product of:
            0.75607663 = queryWeight, product of:
              1.0748149 = boost
              8.966795 = idf(docFreq=14, maxDocs=43254)
              0.07845035 = queryNorm
            4.4833975 = fieldWeight in 303, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.966795 = idf(docFreq=14, maxDocs=43254)
              0.5 = fieldNorm(doc=303)
    
  2. Chaudhry, A.S.; Khoo, C.S.G.: A survey of the top-level categories in the structure of corporate Websites (2008) 6.12
    6.119851 = sum of:
      6.119851 = sum of:
        2.730059 = weight(author_txt:khoo in 4260) [ClassicSimilarity], result of:
          2.730059 = score(doc=4260,freq=1.0), product of:
            0.65448314 = queryWeight, product of:
              8.342641 = idf(docFreq=27, maxDocs=43254)
              0.07845035 = queryNorm
            4.1713204 = fieldWeight in 4260, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.342641 = idf(docFreq=27, maxDocs=43254)
              0.5 = fieldNorm(doc=4260)
        3.389792 = weight(author_txt:c.s.g in 4260) [ClassicSimilarity], result of:
          3.389792 = score(doc=4260,freq=1.0), product of:
            0.75607663 = queryWeight, product of:
              1.0748149 = boost
              8.966795 = idf(docFreq=14, maxDocs=43254)
              0.07845035 = queryNorm
            4.4833975 = fieldWeight in 4260, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.966795 = idf(docFreq=14, maxDocs=43254)
              0.5 = fieldNorm(doc=4260)
    
  3. Khoo, C.S.G.; Ou, S.: Machine versus human clustering of concepts across documents (2008) 6.12
    6.119851 = sum of:
      6.119851 = sum of:
        2.730059 = weight(author_txt:khoo in 4287) [ClassicSimilarity], result of:
          2.730059 = score(doc=4287,freq=1.0), product of:
            0.65448314 = queryWeight, product of:
              8.342641 = idf(docFreq=27, maxDocs=43254)
              0.07845035 = queryNorm
            4.1713204 = fieldWeight in 4287, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.342641 = idf(docFreq=27, maxDocs=43254)
              0.5 = fieldNorm(doc=4287)
        3.389792 = weight(author_txt:c.s.g in 4287) [ClassicSimilarity], result of:
          3.389792 = score(doc=4287,freq=1.0), product of:
            0.75607663 = queryWeight, product of:
              1.0748149 = boost
              8.966795 = idf(docFreq=14, maxDocs=43254)
              0.07845035 = queryNorm
            4.4833975 = fieldWeight in 4287, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.966795 = idf(docFreq=14, maxDocs=43254)
              0.5 = fieldNorm(doc=4287)
    
  4. Poo, D.C.C.; Khoo, C.S.G.: Online Catalog Subject Searching (2009) 6.12
    6.119851 = sum of:
      6.119851 = sum of:
        2.730059 = weight(author_txt:khoo in 316) [ClassicSimilarity], result of:
          2.730059 = score(doc=316,freq=1.0), product of:
            0.65448314 = queryWeight, product of:
              8.342641 = idf(docFreq=27, maxDocs=43254)
              0.07845035 = queryNorm
            4.1713204 = fieldWeight in 316, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.342641 = idf(docFreq=27, maxDocs=43254)
              0.5 = fieldNorm(doc=316)
        3.389792 = weight(author_txt:c.s.g in 316) [ClassicSimilarity], result of:
          3.389792 = score(doc=316,freq=1.0), product of:
            0.75607663 = queryWeight, product of:
              1.0748149 = boost
              8.966795 = idf(docFreq=14, maxDocs=43254)
              0.07845035 = queryNorm
            4.4833975 = fieldWeight in 316, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.966795 = idf(docFreq=14, maxDocs=43254)
              0.5 = fieldNorm(doc=316)
    
  5. Sun, G.; Khoo, C.S.G.: A framework to represent variables and values in social science research data sets to support data curation and reuse (2018) 6.12
    6.119851 = sum of:
      6.119851 = sum of:
        2.730059 = weight(author_txt:khoo in 745) [ClassicSimilarity], result of:
          2.730059 = score(doc=745,freq=1.0), product of:
            0.65448314 = queryWeight, product of:
              8.342641 = idf(docFreq=27, maxDocs=43254)
              0.07845035 = queryNorm
            4.1713204 = fieldWeight in 745, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.342641 = idf(docFreq=27, maxDocs=43254)
              0.5 = fieldNorm(doc=745)
        3.389792 = weight(author_txt:c.s.g in 745) [ClassicSimilarity], result of:
          3.389792 = score(doc=745,freq=1.0), product of:
            0.75607663 = queryWeight, product of:
              1.0748149 = boost
              8.966795 = idf(docFreq=14, maxDocs=43254)
              0.07845035 = queryNorm
            4.4833975 = fieldWeight in 745, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.966795 = idf(docFreq=14, maxDocs=43254)
              0.5 = fieldNorm(doc=745)
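
The score trees above all follow the same pattern: each term score is queryWeight times fieldWeight, and the document score is the sum over the matching terms. The sketch below recomputes the 6.12 author score from the values shown. The idf formula used here, 1 + ln(maxDocs / (docFreq + 1)), is an assumption chosen because it reproduces the idf values in the listing; the function names are illustrative and are not Lucene API calls.

```python
import math


def idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity-style idf; matches the values in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm            # queryWeight
    field_weight = math.sqrt(freq) * term_idf * field_norm  # fieldWeight, tf = sqrt(freq)
    return query_weight * field_weight


MAX_DOCS = 43254
QUERY_NORM = 0.07845035
FIELD_NORM = 0.5

# author_txt:khoo (docFreq=27) and author_txt:c.s.g (docFreq=14, boost=1.0748149)
khoo = term_score(1.0, 27, MAX_DOCS, QUERY_NORM, FIELD_NORM)
csg = term_score(1.0, 14, MAX_DOCS, QUERY_NORM, FIELD_NORM, boost=1.0748149)
total = khoo + csg  # approximately 6.12, the score shown next to each author hit
```

Because every listed author hit matches the same two terms with the same frequency and field norm, all five documents receive the identical score.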
    

Similar documents (content)

  1. Yang, C.C.; Li, K.W.: A heuristic method based on a statistical approach for Chinese text segmentation (2005) 0.61
    0.61492586 = sum of:
      0.61492586 = product of:
        1.7081274 = sum of:
          0.05561544 = weight(abstract_txt:adjacent in 581) [ClassicSimilarity], result of:
            0.05561544 = score(doc=581,freq=1.0), product of:
              0.10064285 = queryWeight, product of:
                8.841632 = idf(docFreq=16, maxDocs=43254)
                0.011382837 = queryNorm
              0.552602 = fieldWeight in 581, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.841632 = idf(docFreq=16, maxDocs=43254)
                0.0625 = fieldNorm(doc=581)
          0.007968656 = weight(abstract_txt:information in 581) [ClassicSimilarity], result of:
            0.007968656 = score(doc=581,freq=3.0), product of:
              0.030331159 = queryWeight, product of:
                1.0979512 = boost
                2.42692 = idf(docFreq=10382, maxDocs=43254)
                0.011382837 = queryNorm
              0.26272178 = fieldWeight in 581, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.42692 = idf(docFreq=10382, maxDocs=43254)
                0.0625 = fieldNorm(doc=581)
          0.022579774 = weight(abstract_txt:automatic in 581) [ClassicSimilarity], result of:
            0.022579774 = score(doc=581,freq=1.0), product of:
              0.06952523 = queryWeight, product of:
                1.1754245 = boost
                5.1963353 = idf(docFreq=650, maxDocs=43254)
                0.011382837 = queryNorm
              0.32477096 = fieldWeight in 581, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1963353 = idf(docFreq=650, maxDocs=43254)
                0.0625 = fieldNorm(doc=581)
          0.03881812 = weight(abstract_txt:statistical in 581) [ClassicSimilarity], result of:
            0.03881812 = score(doc=581,freq=2.0), product of:
              0.07919098 = queryWeight, product of:
                1.2544732 = boost
                5.545795 = idf(docFreq=458, maxDocs=43254)
                0.011382837 = queryNorm
              0.49018365 = fieldWeight in 581, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.545795 = idf(docFreq=458, maxDocs=43254)
                0.0625 = fieldNorm(doc=581)
          0.096135736 = weight(abstract_txt:characters in 581) [ClassicSimilarity], result of:
            0.096135736 = score(doc=581,freq=1.0), product of:
              0.20906581 = queryWeight, product of:
                2.4963799 = boost
                7.357357 = idf(docFreq=74, maxDocs=43254)
                0.011382837 = queryNorm
              0.4598348 = fieldWeight in 581, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.357357 = idf(docFreq=74, maxDocs=43254)
                0.0625 = fieldNorm(doc=581)
          0.2440879 = weight(abstract_txt:chinese in 581) [ClassicSimilarity], result of:
            0.2440879 = score(doc=581,freq=9.0), product of:
              0.20588404 = queryWeight, product of:
                2.860552 = boost
                6.322987 = idf(docFreq=210, maxDocs=43254)
                0.011382837 = queryNorm
              1.1855601 = fieldWeight in 581, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.322987 = idf(docFreq=210, maxDocs=43254)
                0.0625 = fieldNorm(doc=581)
          0.2489821 = weight(abstract_txt:grams in 581) [ClassicSimilarity], result of:
            0.2489821 = score(doc=581,freq=2.0), product of:
              0.34443566 = queryWeight, product of:
                3.699922 = boost
                8.178337 = idf(docFreq=32, maxDocs=43254)
                0.011382837 = queryNorm
              0.7228697 = fieldWeight in 581, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.178337 = idf(docFreq=32, maxDocs=43254)
                0.0625 = fieldNorm(doc=581)
          0.7177947 = weight(abstract_txt:segmentation in 581) [ClassicSimilarity], result of:
            0.7177947 = score(doc=581,freq=9.0), product of:
              0.48375162 = queryWeight, product of:
                5.3702607 = boost
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.011382837 = queryNorm
              1.4838084 = fieldWeight in 581, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.0625 = fieldNorm(doc=581)
          0.27614483 = weight(abstract_txt:words in 581) [ClassicSimilarity], result of:
            0.27614483 = score(doc=581,freq=5.0), product of:
              0.36905175 = queryWeight, product of:
                6.05553 = boost
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.011382837 = queryNorm
              0.748255 = fieldWeight in 581, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.0625 = fieldNorm(doc=581)
        0.36 = coord(9/25)
    
  2. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.35
    0.35412076 = sum of:
      0.35412076 = product of:
        1.4755032 = sum of:
          0.004600706 = weight(abstract_txt:information in 2605) [ClassicSimilarity], result of:
            0.004600706 = score(doc=2605,freq=1.0), product of:
              0.030331159 = queryWeight, product of:
                1.0979512 = boost
                2.42692 = idf(docFreq=10382, maxDocs=43254)
                0.011382837 = queryNorm
              0.1516825 = fieldWeight in 2605, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.42692 = idf(docFreq=10382, maxDocs=43254)
                0.0625 = fieldNorm(doc=2605)
          0.031674188 = weight(abstract_txt:independent in 2605) [ClassicSimilarity], result of:
            0.031674188 = score(doc=2605,freq=1.0), product of:
              0.08712304 = queryWeight, product of:
                1.3158004 = boost
                5.8169117 = idf(docFreq=349, maxDocs=43254)
                0.011382837 = queryNorm
              0.36355698 = fieldWeight in 2605, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8169117 = idf(docFreq=349, maxDocs=43254)
                0.0625 = fieldNorm(doc=2605)
          0.21526529 = weight(abstract_txt:chinese in 2605) [ClassicSimilarity], result of:
            0.21526529 = score(doc=2605,freq=7.0), product of:
              0.20588404 = queryWeight, product of:
                2.860552 = boost
                6.322987 = idf(docFreq=210, maxDocs=43254)
                0.011382837 = queryNorm
              1.0455657 = fieldWeight in 2605, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.322987 = idf(docFreq=210, maxDocs=43254)
                0.0625 = fieldNorm(doc=2605)
          0.2203496 = weight(abstract_txt:character in 2605) [ClassicSimilarity], result of:
            0.2203496 = score(doc=2605,freq=2.0), product of:
              0.3826046 = queryWeight, product of:
                5.1586094 = boost
                6.5157895 = idf(docFreq=173, maxDocs=43254)
                0.011382837 = queryNorm
              0.57591987 = fieldWeight in 2605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5157895 = idf(docFreq=173, maxDocs=43254)
                0.0625 = fieldNorm(doc=2605)
          0.7566221 = weight(abstract_txt:segmentation in 2605) [ClassicSimilarity], result of:
            0.7566221 = score(doc=2605,freq=10.0), product of:
              0.48375162 = queryWeight, product of:
                5.3702607 = boost
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.011382837 = queryNorm
              1.5640714 = fieldWeight in 2605, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.0625 = fieldNorm(doc=2605)
          0.24699143 = weight(abstract_txt:words in 2605) [ClassicSimilarity], result of:
            0.24699143 = score(doc=2605,freq=4.0), product of:
              0.36905175 = queryWeight, product of:
                6.05553 = boost
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.011382837 = queryNorm
              0.6692596 = fieldWeight in 2605, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.0625 = fieldNorm(doc=2605)
        0.24 = coord(6/25)
    
  3. Kwok, K.L.: Employing multiple representations for Chinese information retrieval (1999) 0.31
    0.3113557 = sum of:
      0.3113557 = product of:
        0.9729866 = sum of:
          0.004600706 = weight(abstract_txt:information in 5774) [ClassicSimilarity], result of:
            0.004600706 = score(doc=5774,freq=1.0), product of:
              0.030331159 = queryWeight, product of:
                1.0979512 = boost
                2.42692 = idf(docFreq=10382, maxDocs=43254)
                0.011382837 = queryNorm
              0.1516825 = fieldWeight in 5774, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.42692 = idf(docFreq=10382, maxDocs=43254)
                0.0625 = fieldNorm(doc=5774)
          0.019070268 = weight(abstract_txt:using in 5774) [ClassicSimilarity], result of:
            0.019070268 = score(doc=5774,freq=2.0), product of:
              0.062120344 = queryWeight, product of:
                1.5712868 = boost
                3.4731848 = idf(docFreq=3646, maxDocs=43254)
                0.011382837 = queryNorm
              0.30698907 = fieldWeight in 5774, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4731848 = idf(docFreq=3646, maxDocs=43254)
                0.0625 = fieldNorm(doc=5774)
          0.016078034 = weight(abstract_txt:were in 5774) [ClassicSimilarity], result of:
            0.016078034 = score(doc=5774,freq=1.0), product of:
              0.06984918 = queryWeight, product of:
                1.6661695 = boost
                3.6829145 = idf(docFreq=2956, maxDocs=43254)
                0.011382837 = queryNorm
              0.23018216 = fieldWeight in 5774, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6829145 = idf(docFreq=2956, maxDocs=43254)
                0.0625 = fieldNorm(doc=5774)
          0.13595647 = weight(abstract_txt:characters in 5774) [ClassicSimilarity], result of:
            0.13595647 = score(doc=5774,freq=2.0), product of:
              0.20906581 = queryWeight, product of:
                2.4963799 = boost
                7.357357 = idf(docFreq=74, maxDocs=43254)
                0.011382837 = queryNorm
              0.6503046 = fieldWeight in 5774, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.357357 = idf(docFreq=74, maxDocs=43254)
                0.0625 = fieldNorm(doc=5774)
          0.11506414 = weight(abstract_txt:chinese in 5774) [ClassicSimilarity], result of:
            0.11506414 = score(doc=5774,freq=2.0), product of:
              0.20588404 = queryWeight, product of:
                2.860552 = boost
                6.322987 = idf(docFreq=210, maxDocs=43254)
                0.011382837 = queryNorm
              0.55887836 = fieldWeight in 5774, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.322987 = idf(docFreq=210, maxDocs=43254)
                0.0625 = fieldNorm(doc=5774)
          0.2203496 = weight(abstract_txt:character in 5774) [ClassicSimilarity], result of:
            0.2203496 = score(doc=5774,freq=2.0), product of:
              0.3826046 = queryWeight, product of:
                5.1586094 = boost
                6.5157895 = idf(docFreq=173, maxDocs=43254)
                0.011382837 = queryNorm
              0.57591987 = fieldWeight in 5774, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5157895 = idf(docFreq=173, maxDocs=43254)
                0.0625 = fieldNorm(doc=5774)
          0.33837166 = weight(abstract_txt:segmentation in 5774) [ClassicSimilarity], result of:
            0.33837166 = score(doc=5774,freq=2.0), product of:
              0.48375162 = queryWeight, product of:
                5.3702607 = boost
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.011382837 = queryNorm
              0.699474 = fieldWeight in 5774, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.0625 = fieldNorm(doc=5774)
          0.12349571 = weight(abstract_txt:words in 5774) [ClassicSimilarity], result of:
            0.12349571 = score(doc=5774,freq=1.0), product of:
              0.36905175 = queryWeight, product of:
                6.05553 = boost
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.011382837 = queryNorm
              0.3346298 = fieldWeight in 5774, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.0625 = fieldNorm(doc=5774)
        0.32 = coord(8/25)
    
  4. Lee, K.H.; Ng, M.K.M.; Lu, Q.: Text segmentation for Chinese spell checking (1999) 0.30
    0.302742 = sum of:
      0.302742 = product of:
        1.261425 = sum of:
          0.096135736 = weight(abstract_txt:characters in 5914) [ClassicSimilarity], result of:
            0.096135736 = score(doc=5914,freq=1.0), product of:
              0.20906581 = queryWeight, product of:
                2.4963799 = boost
                7.357357 = idf(docFreq=74, maxDocs=43254)
                0.011382837 = queryNorm
              0.4598348 = fieldWeight in 5914, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.357357 = idf(docFreq=74, maxDocs=43254)
                0.0625 = fieldNorm(doc=5914)
          0.19929694 = weight(abstract_txt:chinese in 5914) [ClassicSimilarity], result of:
            0.19929694 = score(doc=5914,freq=6.0), product of:
              0.20588404 = queryWeight, product of:
                2.860552 = boost
                6.322987 = idf(docFreq=210, maxDocs=43254)
                0.011382837 = queryNorm
              0.9680058 = fieldWeight in 5914, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.322987 = idf(docFreq=210, maxDocs=43254)
                0.0625 = fieldNorm(doc=5914)
          0.08466042 = weight(abstract_txt:frequency in 5914) [ClassicSimilarity], result of:
            0.08466042 = score(doc=5914,freq=1.0), product of:
              0.22773492 = queryWeight, product of:
                3.363631 = boost
                5.947997 = idf(docFreq=306, maxDocs=43254)
                0.011382837 = queryNorm
              0.37174982 = fieldWeight in 5914, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.947997 = idf(docFreq=306, maxDocs=43254)
                0.0625 = fieldNorm(doc=5914)
          0.15581068 = weight(abstract_txt:character in 5914) [ClassicSimilarity], result of:
            0.15581068 = score(doc=5914,freq=1.0), product of:
              0.3826046 = queryWeight, product of:
                5.1586094 = boost
                6.5157895 = idf(docFreq=173, maxDocs=43254)
                0.011382837 = queryNorm
              0.40723684 = fieldWeight in 5914, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5157895 = idf(docFreq=173, maxDocs=43254)
                0.0625 = fieldNorm(doc=5914)
          0.4785298 = weight(abstract_txt:segmentation in 5914) [ClassicSimilarity], result of:
            0.4785298 = score(doc=5914,freq=4.0), product of:
              0.48375162 = queryWeight, product of:
                5.3702607 = boost
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.011382837 = queryNorm
              0.9892056 = fieldWeight in 5914, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.0625 = fieldNorm(doc=5914)
          0.24699143 = weight(abstract_txt:words in 5914) [ClassicSimilarity], result of:
            0.24699143 = score(doc=5914,freq=4.0), product of:
              0.36905175 = queryWeight, product of:
                6.05553 = boost
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.011382837 = queryNorm
              0.6692596 = fieldWeight in 5914, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.354077 = idf(docFreq=555, maxDocs=43254)
                0.0625 = fieldNorm(doc=5914)
        0.24 = coord(6/25)
    
  5. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.26
    0.25578916 = sum of:
      0.25578916 = product of:
        0.9135327 = sum of:
          0.0065063806 = weight(abstract_txt:information in 2832) [ClassicSimilarity], result of:
            0.0065063806 = score(doc=2832,freq=2.0), product of:
              0.030331159 = queryWeight, product of:
                1.0979512 = boost
                2.42692 = idf(docFreq=10382, maxDocs=43254)
                0.011382837 = queryNorm
              0.21451144 = fieldWeight in 2832, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.42692 = idf(docFreq=10382, maxDocs=43254)
                0.0625 = fieldNorm(doc=2832)
          0.024404803 = weight(abstract_txt:simple in 2832) [ClassicSimilarity], result of:
            0.024404803 = score(doc=2832,freq=1.0), product of:
              0.07322278 = queryWeight, product of:
                1.2062758 = boost
                5.3327236 = idf(docFreq=567, maxDocs=43254)
                0.011382837 = queryNorm
              0.33329523 = fieldWeight in 2832, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3327236 = idf(docFreq=567, maxDocs=43254)
                0.0625 = fieldNorm(doc=2832)
          0.027256165 = weight(abstract_txt:significantly in 2832) [ClassicSimilarity], result of:
            0.027256165 = score(doc=2832,freq=1.0), product of:
              0.0788205 = queryWeight, product of:
                1.2515353 = boost
                5.5328074 = idf(docFreq=464, maxDocs=43254)
                0.011382837 = queryNorm
              0.34580046 = fieldWeight in 2832, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5328074 = idf(docFreq=464, maxDocs=43254)
                0.0625 = fieldNorm(doc=2832)
          0.027448557 = weight(abstract_txt:statistical in 2832) [ClassicSimilarity], result of:
            0.027448557 = score(doc=2832,freq=1.0), product of:
              0.07919098 = queryWeight, product of:
                1.2544732 = boost
                5.545795 = idf(docFreq=458, maxDocs=43254)
                0.011382837 = queryNorm
              0.3466122 = fieldWeight in 2832, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.545795 = idf(docFreq=458, maxDocs=43254)
                0.0625 = fieldNorm(doc=2832)
          0.03215607 = weight(abstract_txt:were in 2832) [ClassicSimilarity], result of:
            0.03215607 = score(doc=2832,freq=4.0), product of:
              0.06984918 = queryWeight, product of:
                1.6661695 = boost
                3.6829145 = idf(docFreq=2956, maxDocs=43254)
                0.011382837 = queryNorm
              0.4603643 = fieldWeight in 2832, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6829145 = idf(docFreq=2956, maxDocs=43254)
                0.0625 = fieldNorm(doc=2832)
          0.16272527 = weight(abstract_txt:chinese in 2832) [ClassicSimilarity], result of:
            0.16272527 = score(doc=2832,freq=4.0), product of:
              0.20588404 = queryWeight, product of:
                2.860552 = boost
                6.322987 = idf(docFreq=210, maxDocs=43254)
                0.011382837 = queryNorm
              0.7903734 = fieldWeight in 2832, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.322987 = idf(docFreq=210, maxDocs=43254)
                0.0625 = fieldNorm(doc=2832)
          0.6330354 = weight(abstract_txt:segmentation in 2832) [ClassicSimilarity], result of:
            0.6330354 = score(doc=2832,freq=7.0), product of:
              0.48375162 = queryWeight, product of:
                5.3702607 = boost
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.011382837 = queryNorm
              1.308596 = fieldWeight in 2832, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.0625 = fieldNorm(doc=2832)
        0.28 = coord(7/25)
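
In the content-similarity trees, the summed term scores are additionally multiplied by a coordination factor, coord(matched/total), which rewards documents that match more of the 25 query terms. As a minimal sketch, the figures for the last entry (doc 2832) can be rechecked as follows; the function name is hypothetical, and the per-term scores are copied from the listing above.

```python
def coord_score(term_scores, total_query_terms):
    """Sum the per-term scores and scale by the fraction of query terms matched."""
    coord = len(term_scores) / total_query_terms
    return sum(term_scores) * coord


# Per-term scores for doc 2832, copied from the listing above.
doc_2832_terms = [
    0.0065063806,  # information
    0.024404803,   # simple
    0.027256165,   # significantly
    0.027448557,   # statistical
    0.03215607,    # were
    0.16272527,    # chinese
    0.6330354,     # segmentation
]

score_2832 = coord_score(doc_2832_terms, 25)  # 7 of 25 query terms matched
```

This also accounts for the ranking of entries 2 and 3: entry 2 has the larger raw sum but matches only 6 of 25 terms, so its coord factor of 0.24 is lower than entry 3's 0.32.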