Document (#29581)

Author
Yang, C.C.
Li, K.W.
Title
¬A heuristic method based on a statistical approach for Chinese text segmentation
Source
Journal of the American Society for Information Science and Technology. 56(2005) no.13, p.1438-1447
Year
2005
Abstract
The authors propose a heuristic method for Chinese automatic text segmentation based on a statistical approach. This method is developed based on statistical information about the association among adjacent characters in Chinese text. Mutual information of bi-grams and significant estimation of tri-grams are utilized. A heuristic method with six rules is then proposed to determine the segmentation points in a Chinese sentence. No dictionary is required in this method. Chinese text segmentation is important in Chinese text indexing and thus greatly affects the performance of Chinese information retrieval. Due to the lack of delimiters of words in Chinese text, Chinese text segmentation is more difficult than English text segmentation. Besides, segmentation ambiguities and occurrences of out-of-vocabulary words (i.e., unknown words) are the major challenges in Chinese segmentation. Many research studies dealing with the problem of word segmentation have focused on the resolution of segmentation ambiguities. The problem of unknown word identification has not drawn much attention. The experimental results show that the proposed heuristic method is promising for segmenting unknown words as well as known words. The authors further investigated the distribution of the errors of commission and the errors of omission caused by the proposed heuristic method and benchmarked the proposed heuristic method against a previously proposed technique, boundary detection. The heuristic method was found to outperform the boundary detection method.
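
The abstract relies on mutual information between adjacent characters. As a rough illustration of that idea only, the sketch below computes pointwise mutual information for character bi-grams and cuts a sentence wherever the value falls below a threshold. It does not reproduce the authors' six rules or their tri-gram estimation; the function names, the single-threshold rule, and the threshold value are assumptions made for this example.

```python
import math
from collections import Counter


def bigram_mutual_information(corpus):
    """Return {bigram: MI}, where MI = log2(P(xy) / (P(x) * P(y))).

    Probabilities are plain relative frequencies taken from `corpus`.
    """
    chars = Counter(corpus)
    pairs = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
    n_chars = sum(chars.values())
    n_pairs = sum(pairs.values())
    mi = {}
    for pair, count in pairs.items():
        p_xy = count / n_pairs
        p_x = chars[pair[0]] / n_chars
        p_y = chars[pair[1]] / n_chars
        mi[pair] = math.log2(p_xy / (p_x * p_y))
    return mi


def segment(sentence, mi, threshold=0.0):
    """Split `sentence` between adjacent characters whose MI is below `threshold`.

    Hypothetical single-rule stand-in, not the six-rule heuristic of the paper.
    Bi-grams never seen in the corpus are always treated as boundaries.
    """
    if not sentence:
        return []
    words = [sentence[0]]
    for prev, cur in zip(sentence, sentence[1:]):
        if mi.get(prev + cur, float("-inf")) >= threshold:
            words[-1] += cur      # strong association: extend the current word
        else:
            words.append(cur)     # weak association: start a new word
    return words
```

No dictionary is consulted here, which mirrors the dictionary-free setting described in the abstract; in practice the statistics would come from a large Chinese corpus rather than a short string.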

Similar documents (author)

  1. Yang, S.C.: ¬An interpretive and situated approach to an evaluation of Perseus digital libraries (2001) 4.50
    4.4981737 = sum of:
      4.4981737 = weight(author_txt:yang in 6933) [ClassicSimilarity], result of:
        4.4981737 = fieldWeight in 6933, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.1970778 = idf(docFreq=89, maxDocs=44218)
          0.625 = fieldNorm(doc=6933)
    
  2. Yang, K.: Information retrieval on the Web (2004) 4.50
    4.4981737 = sum of:
      4.4981737 = weight(author_txt:yang in 4278) [ClassicSimilarity], result of:
        4.4981737 = fieldWeight in 4278, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.1970778 = idf(docFreq=89, maxDocs=44218)
          0.625 = fieldNorm(doc=4278)
    
  3. Yang, C.C.: Content-based image retrieval : a comparison between query by example and image browsing map approaches (2005) 4.50
    4.4981737 = sum of:
      4.4981737 = weight(author_txt:yang in 4649) [ClassicSimilarity], result of:
        4.4981737 = fieldWeight in 4649, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.1970778 = idf(docFreq=89, maxDocs=44218)
          0.625 = fieldNorm(doc=4649)
    
  4. Salton, G.; Yang, C.S.: On the specification of term values in automatic indexing (1973) 3.60
    3.5985389 = sum of:
      3.5985389 = weight(author_txt:yang in 5476) [ClassicSimilarity], result of:
        3.5985389 = fieldWeight in 5476, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.1970778 = idf(docFreq=89, maxDocs=44218)
          0.5 = fieldNorm(doc=5476)
    
  5. Yang, Y.; Chute, C.G.A.: ¬A schematic analysis of the Unified Medical Language System (1992) 3.60
    3.5985389 = sum of:
      3.5985389 = weight(author_txt:yang in 6445) [ClassicSimilarity], result of:
        3.5985389 = fieldWeight in 6445, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.1970778 = idf(docFreq=89, maxDocs=44218)
          0.5 = fieldNorm(doc=6445)
    

Similar documents (content)

  1. Lee, K.H.; Ng, M.K.M.; Lu, Q.: Text segmentation for Chinese spell checking (1999) 0.75
    0.7471317 = sum of:
      0.7471317 = product of:
        1.5565244 = sum of:
          0.0042933426 = weight(abstract_txt:with in 3913) [ClassicSimilarity], result of:
            0.0042933426 = score(doc=3913,freq=2.0), product of:
              0.019431522 = queryWeight, product of:
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.007773438 = queryNorm
              0.22094731 = fieldWeight in 3913, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.008905122 = weight(abstract_txt:based in 3913) [ClassicSimilarity], result of:
            0.008905122 = score(doc=3913,freq=2.0), product of:
              0.03160359 = queryWeight, product of:
                1.2753072 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.007773438 = queryNorm
              0.28177565 = fieldWeight in 3913, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.041613255 = weight(abstract_txt:word in 3913) [ClassicSimilarity], result of:
            0.041613255 = score(doc=3913,freq=4.0), product of:
              0.061247803 = queryWeight, product of:
                1.4495934 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.007773438 = queryNorm
              0.67942446 = fieldWeight in 3913, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.03640083 = weight(abstract_txt:errors in 3913) [ClassicSimilarity], result of:
            0.03640083 = score(doc=3913,freq=1.0), product of:
              0.0889263 = queryWeight, product of:
                1.746691 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.007773438 = queryNorm
              0.40933704 = fieldWeight in 3913, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.04045855 = weight(abstract_txt:detection in 3913) [ClassicSimilarity], result of:
            0.04045855 = score(doc=3913,freq=1.0), product of:
              0.09541784 = queryWeight, product of:
                1.8093215 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.007773438 = queryNorm
              0.4240145 = fieldWeight in 3913, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.10589961 = weight(abstract_txt:ambiguities in 3913) [ClassicSimilarity], result of:
            0.10589961 = score(doc=3913,freq=2.0), product of:
              0.14383866 = queryWeight, product of:
                2.2214613 = boost
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.007773438 = queryNorm
              0.73623884 = fieldWeight in 3913, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.031721737 = weight(abstract_txt:proposed in 3913) [ClassicSimilarity], result of:
            0.031721737 = score(doc=3913,freq=1.0), product of:
              0.11011353 = queryWeight, product of:
                3.0732033 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.007773438 = queryNorm
              0.2880821 = fieldWeight in 3913, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.09937373 = weight(abstract_txt:words in 3913) [ClassicSimilarity], result of:
            0.09937373 = score(doc=3913,freq=4.0), product of:
              0.14851277 = queryWeight, product of:
                3.5690517 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.007773438 = queryNorm
              0.66912585 = fieldWeight in 3913, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.059363473 = weight(abstract_txt:text in 3913) [ClassicSimilarity], result of:
            0.059363473 = score(doc=3913,freq=3.0), product of:
              0.13560691 = queryWeight, product of:
                4.3139176 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.007773438 = queryNorm
              0.4377614 = fieldWeight in 3913, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.08354216 = weight(abstract_txt:method in 3913) [ClassicSimilarity], result of:
            0.08354216 = score(doc=3913,freq=2.0), product of:
              0.20999384 = queryWeight, product of:
                6.0019064 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.007773438 = queryNorm
              0.3978315 = fieldWeight in 3913, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.39741978 = weight(abstract_txt:chinese in 3913) [ClassicSimilarity], result of:
            0.39741978 = score(doc=3913,freq=6.0), product of:
              0.41184008 = queryWeight, product of:
                8.405243 = boost
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.007773438 = queryNorm
              0.96498567 = fieldWeight in 3913, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
          0.6475328 = weight(abstract_txt:segmentation in 3913) [ClassicSimilarity], result of:
            0.6475328 = score(doc=3913,freq=4.0), product of:
              0.6527806 = queryWeight, product of:
                10.582045 = boost
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.007773438 = queryNorm
              0.9919609 = fieldWeight in 3913, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.0625 = fieldNorm(doc=3913)
        0.48 = coord(12/25)
    
  2. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.71
    0.70883214 = sum of:
      0.70883214 = product of:
        1.7720803 = sum of:
          0.0042933426 = weight(abstract_txt:with in 604) [ClassicSimilarity], result of:
            0.0042933426 = score(doc=604,freq=2.0), product of:
              0.019431522 = queryWeight, product of:
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.007773438 = queryNorm
              0.22094731 = fieldWeight in 604, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
          0.0062968726 = weight(abstract_txt:based in 604) [ClassicSimilarity], result of:
            0.0062968726 = score(doc=604,freq=1.0), product of:
              0.03160359 = queryWeight, product of:
                1.2753072 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.007773438 = queryNorm
              0.19924548 = fieldWeight in 604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
          0.029425014 = weight(abstract_txt:word in 604) [ClassicSimilarity], result of:
            0.029425014 = score(doc=604,freq=2.0), product of:
              0.061247803 = queryWeight, product of:
                1.4495934 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.007773438 = queryNorm
              0.48042563 = fieldWeight in 604, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
          0.04045855 = weight(abstract_txt:detection in 604) [ClassicSimilarity], result of:
            0.04045855 = score(doc=604,freq=1.0), product of:
              0.09541784 = queryWeight, product of:
                1.8093215 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.007773438 = queryNorm
              0.4240145 = fieldWeight in 604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
          0.073135786 = weight(abstract_txt:unknown in 604) [ClassicSimilarity], result of:
            0.073135786 = score(doc=604,freq=1.0), product of:
              0.16208385 = queryWeight, product of:
                2.888128 = boost
                7.2195506 = idf(docFreq=87, maxDocs=44218)
                0.007773438 = queryNorm
              0.4512219 = fieldWeight in 604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2195506 = idf(docFreq=87, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
          0.031721737 = weight(abstract_txt:proposed in 604) [ClassicSimilarity], result of:
            0.031721737 = score(doc=604,freq=1.0), product of:
              0.11011353 = queryWeight, product of:
                3.0732033 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.007773438 = queryNorm
              0.2880821 = fieldWeight in 604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
          0.09937373 = weight(abstract_txt:words in 604) [ClassicSimilarity], result of:
            0.09937373 = score(doc=604,freq=4.0), product of:
              0.14851277 = queryWeight, product of:
                3.5690517 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.007773438 = queryNorm
              0.66912585 = fieldWeight in 604, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
          0.034273516 = weight(abstract_txt:text in 604) [ClassicSimilarity], result of:
            0.034273516 = score(doc=604,freq=1.0), product of:
              0.13560691 = queryWeight, product of:
                4.3139176 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.007773438 = queryNorm
              0.25274166 = fieldWeight in 604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
          0.42926237 = weight(abstract_txt:chinese in 604) [ClassicSimilarity], result of:
            0.42926237 = score(doc=604,freq=7.0), product of:
              0.41184008 = queryWeight, product of:
                8.405243 = boost
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.007773438 = queryNorm
              1.0423036 = fieldWeight in 604, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
          1.0238394 = weight(abstract_txt:segmentation in 604) [ClassicSimilarity], result of:
            1.0238394 = score(doc=604,freq=10.0), product of:
              0.6527806 = queryWeight, product of:
                10.582045 = boost
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.007773438 = queryNorm
              1.5684279 = fieldWeight in 604, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.0625 = fieldNorm(doc=604)
        0.4 = coord(10/25)
    
  3. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.54
    0.54132897 = sum of:
      0.54132897 = product of:
        1.5036916 = sum of:
          0.0042933426 = weight(abstract_txt:with in 5206) [ClassicSimilarity], result of:
            0.0042933426 = score(doc=5206,freq=2.0), product of:
              0.019431522 = queryWeight, product of:
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.007773438 = queryNorm
              0.22094731 = fieldWeight in 5206, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.045114852 = weight(abstract_txt:adjacent in 5206) [ClassicSimilarity], result of:
            0.045114852 = score(doc=5206,freq=1.0), product of:
              0.081437744 = queryWeight, product of:
                1.1819493 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.007773438 = queryNorm
              0.55397964 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.020806627 = weight(abstract_txt:word in 5206) [ClassicSimilarity], result of:
            0.020806627 = score(doc=5206,freq=1.0), product of:
              0.061247803 = queryWeight, product of:
                1.4495934 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.007773438 = queryNorm
              0.33971223 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.14290257 = weight(abstract_txt:grams in 5206) [ClassicSimilarity], result of:
            0.14290257 = score(doc=5206,freq=4.0), product of:
              0.13941069 = queryWeight, product of:
                2.187001 = boost
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.007773438 = queryNorm
              1.0250474 = fieldWeight in 5206, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.033159345 = weight(abstract_txt:statistical in 5206) [ClassicSimilarity], result of:
            0.033159345 = score(doc=5206,freq=1.0), product of:
              0.09565854 = queryWeight, product of:
                2.2187505 = boost
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.007773438 = queryNorm
              0.3466428 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.1490606 = weight(abstract_txt:words in 5206) [ClassicSimilarity], result of:
            0.1490606 = score(doc=5206,freq=9.0), product of:
              0.14851277 = queryWeight, product of:
                3.5690517 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.007773438 = queryNorm
              1.0036888 = fieldWeight in 5206, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.034273516 = weight(abstract_txt:text in 5206) [ClassicSimilarity], result of:
            0.034273516 = score(doc=5206,freq=1.0), product of:
              0.13560691 = queryWeight, product of:
                4.3139176 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.007773438 = queryNorm
              0.25274166 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.2810182 = weight(abstract_txt:chinese in 5206) [ClassicSimilarity], result of:
            0.2810182 = score(doc=5206,freq=3.0), product of:
              0.41184008 = queryWeight, product of:
                8.405243 = boost
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.007773438 = queryNorm
              0.6823479 = fieldWeight in 5206, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.7930625 = weight(abstract_txt:segmentation in 5206) [ClassicSimilarity], result of:
            0.7930625 = score(doc=5206,freq=6.0), product of:
              0.6527806 = queryWeight, product of:
                10.582045 = boost
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.007773438 = queryNorm
              1.2148991 = fieldWeight in 5206, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
        0.36 = coord(9/25)
    
  4. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.45
    0.45056266 = sum of:
      0.45056266 = product of:
        1.4080083 = sum of:
          0.005258249 = weight(abstract_txt:with in 831) [ClassicSimilarity], result of:
            0.005258249 = score(doc=831,freq=3.0), product of:
              0.019431522 = queryWeight, product of:
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.007773438 = queryNorm
              0.27060407 = fieldWeight in 831, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=831)
          0.010906503 = weight(abstract_txt:based in 831) [ClassicSimilarity], result of:
            0.010906503 = score(doc=831,freq=3.0), product of:
              0.03160359 = queryWeight, product of:
                1.2753072 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.007773438 = queryNorm
              0.3451033 = fieldWeight in 831, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=831)
          0.041613255 = weight(abstract_txt:word in 831) [ClassicSimilarity], result of:
            0.041613255 = score(doc=831,freq=4.0), product of:
              0.061247803 = queryWeight, product of:
                1.4495934 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.007773438 = queryNorm
              0.67942446 = fieldWeight in 831, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=831)
          0.059335824 = weight(abstract_txt:boundary in 831) [ClassicSimilarity], result of:
            0.059335824 = score(doc=831,freq=1.0), product of:
              0.1231688 = queryWeight, product of:
                2.05566 = boost
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.007773438 = queryNorm
              0.48174396 = fieldWeight in 831, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7079034 = idf(docFreq=53, maxDocs=44218)
                0.0625 = fieldNorm(doc=831)
          0.033159345 = weight(abstract_txt:statistical in 831) [ClassicSimilarity], result of:
            0.033159345 = score(doc=831,freq=1.0), product of:
              0.09565854 = queryWeight, product of:
                2.2187505 = boost
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.007773438 = queryNorm
              0.3466428 = fieldWeight in 831, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.0625 = fieldNorm(doc=831)
          0.076637916 = weight(abstract_txt:text in 831) [ClassicSimilarity], result of:
            0.076637916 = score(doc=831,freq=5.0), product of:
              0.13560691 = queryWeight, product of:
                4.3139176 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.007773438 = queryNorm
              0.5651476 = fieldWeight in 831, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=831)
          0.3244919 = weight(abstract_txt:chinese in 831) [ClassicSimilarity], result of:
            0.3244919 = score(doc=831,freq=4.0), product of:
              0.41184008 = queryWeight, product of:
                8.405243 = boost
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.007773438 = queryNorm
              0.7879075 = fieldWeight in 831, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.0625 = fieldNorm(doc=831)
          0.85660535 = weight(abstract_txt:segmentation in 831) [ClassicSimilarity], result of:
            0.85660535 = score(doc=831,freq=7.0), product of:
              0.6527806 = queryWeight, product of:
                10.582045 = boost
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.007773438 = queryNorm
              1.3122408 = fieldWeight in 831, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.0625 = fieldNorm(doc=831)
        0.32 = coord(8/25)
    
  5. Xinglin, L.: Automatic summarization method based on compound word recognition (2015) 0.40
    0.39501616 = sum of:
      0.39501616 = product of:
        0.897764 = sum of:
          0.0030358515 = weight(abstract_txt:with in 1841) [ClassicSimilarity], result of:
            0.0030358515 = score(doc=1841,freq=1.0), product of:
              0.019431522 = queryWeight, product of:
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.007773438 = queryNorm
              0.15623334 = fieldWeight in 1841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.011499338 = weight(abstract_txt:problem in 1841) [ClassicSimilarity], result of:
            0.011499338 = score(doc=1841,freq=1.0), product of:
              0.04124816 = queryWeight, product of:
                1.1896063 = boost
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.007773438 = queryNorm
              0.27878425 = fieldWeight in 1841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.0062968726 = weight(abstract_txt:based in 1841) [ClassicSimilarity], result of:
            0.0062968726 = score(doc=1841,freq=1.0), product of:
              0.03160359 = queryWeight, product of:
                1.2753072 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.007773438 = queryNorm
              0.19924548 = fieldWeight in 1841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.036038134 = weight(abstract_txt:word in 1841) [ClassicSimilarity], result of:
            0.036038134 = score(doc=1841,freq=3.0), product of:
              0.061247803 = queryWeight, product of:
                1.4495934 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.007773438 = queryNorm
              0.5883988 = fieldWeight in 1841, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.073135786 = weight(abstract_txt:unknown in 1841) [ClassicSimilarity], result of:
            0.073135786 = score(doc=1841,freq=1.0), product of:
              0.16208385 = queryWeight, product of:
                2.888128 = boost
                7.2195506 = idf(docFreq=87, maxDocs=44218)
                0.007773438 = queryNorm
              0.4512219 = fieldWeight in 1841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2195506 = idf(docFreq=87, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.04486131 = weight(abstract_txt:proposed in 1841) [ClassicSimilarity], result of:
            0.04486131 = score(doc=1841,freq=2.0), product of:
              0.11011353 = queryWeight, product of:
                3.0732033 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.007773438 = queryNorm
              0.4074096 = fieldWeight in 1841, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.07026784 = weight(abstract_txt:words in 1841) [ClassicSimilarity], result of:
            0.07026784 = score(doc=1841,freq=2.0), product of:
              0.14851277 = queryWeight, product of:
                3.5690517 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.007773438 = queryNorm
              0.47314343 = fieldWeight in 1841, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.048470072 = weight(abstract_txt:text in 1841) [ClassicSimilarity], result of:
            0.048470072 = score(doc=1841,freq=2.0), product of:
              0.13560691 = queryWeight, product of:
                4.3139176 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.007773438 = queryNorm
              0.3574307 = fieldWeight in 1841, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.118146464 = weight(abstract_txt:method in 1841) [ClassicSimilarity], result of:
            0.118146464 = score(doc=1841,freq=4.0), product of:
              0.20999384 = queryWeight, product of:
                6.0019064 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.007773438 = queryNorm
              0.56261873 = fieldWeight in 1841, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.16224594 = weight(abstract_txt:chinese in 1841) [ClassicSimilarity], result of:
            0.16224594 = score(doc=1841,freq=1.0), product of:
              0.41184008 = queryWeight, product of:
                8.405243 = boost
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.007773438 = queryNorm
              0.39395374 = fieldWeight in 1841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.30326 = idf(docFreq=219, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
          0.3237664 = weight(abstract_txt:segmentation in 1841) [ClassicSimilarity], result of:
            0.3237664 = score(doc=1841,freq=1.0), product of:
              0.6527806 = queryWeight, product of:
                10.582045 = boost
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.007773438 = queryNorm
              0.49598044 = fieldWeight in 1841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.0625 = fieldNorm(doc=1841)
        0.44 = coord(11/25)
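
The content scores combine per-term products queryWeight × fieldWeight and then scale the sum by coord (matched terms / query terms). As a cross-check, the sketch below recomputes the score of the first content entry (doc 3913) from the freq, docFreq, and boost values listed in its explain tree. The helper names are hypothetical, and the boost of 1.0 for "with" is an assumption, since that term shows no boost line.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.007773438
FIELD_NORM = 0.0625  # fieldNorm(doc=3913)

# (term, freq in doc 3913, docFreq, boost), copied from the explain tree
TERMS_3913 = [
    ("with", 2.0, 9868, 1.0),  # no boost line shown; 1.0 is assumed
    ("based", 2.0, 4958, 1.2753072),
    ("word", 4.0, 523, 1.4495934),
    ("errors", 1.0, 171, 1.746691),
    ("detection", 1.0, 135, 1.8093215),
    ("ambiguities", 2.0, 28, 2.2214613),
    ("proposed", 1.0, 1196, 3.0732033),
    ("words", 4.0, 568, 3.5690517),
    ("text", 3.0, 2106, 4.3139176),
    ("method", 2.0, 1333, 6.0019064),
    ("chinese", 6.0, 219, 8.405243),
    ("segmentation", 4.0, 42, 10.582045),
]


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost):
    """score = queryWeight * fieldWeight for a single matching term."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight


def document_score(terms, query_term_count=25):
    """Sum the term scores, then scale by coord(matched / query terms)."""
    total = sum(term_score(f, df, b) for _, f, df, b in terms)
    return total * (len(terms) / query_term_count)


print(round(document_score(TERMS_3913), 2))
```

Because coord rewards coverage of the 25 query terms, a record matching 12 terms can outrank one with a larger raw sum that matches only 10, which is what separates the first two entries.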