Document (#34458)

Author
Chau, M.
Lu, Y.
Fang, X.
Yang, C.C.
Title
Characteristics of character usage in Chinese Web searching
Source
Information processing and management. 45(2009) no.1, pp.115-130
Year
2009
Abstract
The use of non-English Web search engines has become prevalent. Given the popularity of Chinese Web searching and the unique characteristics of the Chinese language, it is imperative to conduct studies focusing on the analysis of Chinese Web search queries. In this paper, we report our research on character usage in Chinese search logs from a Web search engine in Hong Kong. By examining the distribution of search query terms, we found that users tended to use more diversified terms and that character usage in search queries was quite different from character usage in general online Chinese content. After studying the Zipf distribution of n-grams for different values of n, we found that the unigram curve is the most curved of all while the bigram curve follows the Zipf distribution best, and that the curves of n-grams with larger n (n = 3-6) had similar shapes, with α-values in the range 0.66-0.86. The distribution of combined n-grams was also studied. All analyses were performed on the data both before and after the removal of function terms and incomplete terms, and yielded similar findings. We believe the findings from this study provide insights for further research on non-English Web searching and will assist in the design of more effective Chinese Web search engines.
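The n-gram and Zipf analysis summarized in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes queries are plain Unicode strings, takes character n-grams only within whitespace-separated segments, and estimates the Zipf exponent α in f(r) ∝ r^(-α) by ordinary least squares on log(frequency) versus log(rank). The abstract does not say how α was fitted, and the function names and sample queries below are hypothetical.

```python
from collections import Counter
import math


def char_ngrams(query: str, n: int) -> list[str]:
    """Contiguous character n-grams, taken within each whitespace-separated
    segment so that no n-gram spans a space the user typed (an assumption)."""
    grams = []
    for segment in query.split():
        grams.extend(segment[i:i + n] for i in range(len(segment) - n + 1))
    return grams


def ngram_frequencies(queries: list[str], n: int) -> Counter:
    """Count character n-grams over a collection of query strings."""
    counts = Counter()
    for q in queries:
        counts.update(char_ngrams(q, n))
    return counts


def zipf_alpha(counts) -> float:
    """Estimate alpha in f(r) ~ C * r**(-alpha) with an ordinary least-squares
    fit of log(frequency) against log(rank); the fitted slope equals -alpha."""
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    # Hypothetical queries for illustration only; not data from the study.
    queries = ["香港天氣", "香港 地圖", "天氣預報", "香港大學"]
    for n in range(1, 3):
        counts = ngram_frequencies(queries, n)
        print(n, counts.most_common(3), round(zipf_alpha(counts), 3))
```

On a log-log rank-frequency plot, data that follow Zipf's law lie on a straight line with slope -α, so the residuals of this fit are one possible way to compare how "curved" the unigram, bigram, and longer n-gram curves are.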

Similar documents (author)

  1. Chau, M.; Fang, X.; Sheng, O.R.U.: Analysis of the query logs of a Web site search engine (2005) 2.86
  2. Chau, M.; Fang, X.; Rittman, C.C.: Web searching in Chinese : a study of a search engine in Hong Kong (2007) 2.86
  3. Chau, M.Y.: Finding order in a chaotic world : a model for organized research using the World Wide Web (1997) 1.29
  4. Fang, L.: A developing search service : heterogeneous resources integration and retrieval system (2004) 1.09
  5. Fang, H.: Classifying research articles in multidisciplinary sciences journals into subject categories (2015) 1.09

Similar documents (content)

  1. Chau, M.; Fang, X.; Rittman, C.C.: Web searching in Chinese : a study of a search engine in Hong Kong (2007) 0.58
  2. Chung, W.; Zhang, Y.; Huang, Z.; Wang, G.; Ong, T.-H.; Chen, H.: Internet searching and browsing in a multilingual world : an experiment on the Chinese Business Intelligence Portal (CBizPort) (2004) 0.33
  3. Yang, C.C.; Li, K.W.: A heuristic method based on a statistical approach for Chinese text segmentation (2005) 0.30
  4. Arsenault, C.: Aggregation consistency and frequency of Chinese words and characters (2006) 0.25
  5. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.25