Document (#29582)

Author
Yang, C.C.
Li, K.W.
Title
A heuristic method based on a statistical approach for Chinese text segmentation
Source
Journal of the American Society for Information Science and Technology. 56(2005) no. 13, pp. 1438-1447
Year
2005
Abstract
The authors propose a heuristic method for automatic Chinese text segmentation based on a statistical approach. The method relies on statistical information about the association between adjacent characters in Chinese text, using the mutual information of bi-grams and the significance estimation of tri-grams. A heuristic method with six rules is then proposed to determine the segmentation points in a Chinese sentence. No dictionary is required. Chinese text segmentation is important for Chinese text indexing and thus greatly affects the performance of Chinese information retrieval. Because Chinese text has no delimiters between words, segmenting it is more difficult than segmenting English text. In addition, segmentation ambiguities and out-of-vocabulary words (i.e., unknown words) are the major challenges in Chinese segmentation. Many studies of word segmentation have focused on resolving segmentation ambiguities, whereas the identification of unknown words has not drawn much attention. The experimental results show that the proposed heuristic method is promising for segmenting unknown words as well as known words. The authors further investigated the distribution of the errors of commission and the errors of omission produced by the proposed heuristic method and benchmarked it against a previously proposed technique, boundary detection. The heuristic method was found to outperform the boundary detection method.
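The abstract names the mutual information of bi-grams as one of the association measures. As a rough illustration only, the Python sketch below computes pointwise mutual information, MI(x, y) = log2(P(xy) / (P(x) P(y))), for adjacent character pairs in a toy corpus and then applies a naive single-threshold cut between characters. The estimation details (probabilities from raw counts, bi-grams counted only within a sentence), the function names `bigram_mutual_information` and `split_by_threshold`, the corpus and the threshold are all assumptions; this is not the authors' six-rule heuristic, and it leaves out the tri-gram significance estimation entirely.

```python
import math
from collections import Counter


def bigram_mutual_information(corpus):
    """Return {bigram: pointwise MI in bits} for adjacent character pairs.

    Assumed estimates: P(x) = count(x) / total characters and
    P(xy) = count(xy) / total bigrams; bigrams never cross sentence boundaries.
    """
    char_counts = Counter()
    bigram_counts = Counter()
    for sentence in corpus:
        char_counts.update(sentence)
        bigram_counts.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    n_chars = sum(char_counts.values())
    n_bigrams = sum(bigram_counts.values())
    mi = {}
    for bigram, count in bigram_counts.items():
        p_xy = count / n_bigrams
        p_x = char_counts[bigram[0]] / n_chars
        p_y = char_counts[bigram[1]] / n_chars
        mi[bigram] = math.log2(p_xy / (p_x * p_y))
    return mi


def split_by_threshold(sentence, mi, threshold=0.0):
    """Hypothetical baseline: cut between two characters whose bigram MI is below threshold.

    Bigrams never seen in the corpus are treated as -inf, so they are always cut.
    """
    if not sentence:
        return []
    words = [sentence[0]]
    for i in range(1, len(sentence)):
        score = mi.get(sentence[i - 1:i + 1], float("-inf"))
        if score >= threshold:
            words[-1] += sentence[i]  # strong association: extend the current word
        else:
            words.append(sentence[i])  # weak association: start a new word
    return words


if __name__ == "__main__":
    toy_corpus = ["中文分词", "中文信息", "信息检索", "分词方法"]
    mi = bigram_mutual_information(toy_corpus)
    print(split_by_threshold("中文分词", mi, threshold=3.0))  # ['中文', '分词']
```

On this four-sentence toy corpus, a threshold of 3.0 bits splits 中文分词 into 中文 / 分词, but it also keeps 信息检索 as a single unit because the cross-boundary pair 息检 scores as high as the genuine words. A lone statistical cue misfires in exactly this way, which is one reason a single threshold is rarely enough on its own.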

Similar documents (author)

  1. Yang, S.C.: An interpretive and situated approach to an evaluation of Perseus digital libraries (2001) 4.54
    4.535107 = sum of:
      4.535107 = weight(author_txt:yang in 934) [ClassicSimilarity], result of:
        4.535107 = fieldWeight in 934, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.256171 = idf(docFreq=81, maxDocs=42740)
          0.625 = fieldNorm(doc=934)
    
  2. Yang, K.: Information retrieval on the Web (2004) 4.54
    4.535107 = sum of:
      4.535107 = weight(author_txt:yang in 279) [ClassicSimilarity], result of:
        4.535107 = fieldWeight in 279, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.256171 = idf(docFreq=81, maxDocs=42740)
          0.625 = fieldNorm(doc=279)
    
  3. Yang, C.C.: Content-based image retrieval : a comparison between query by example and image browsing map approaches (2005) 4.54
    4.535107 = sum of:
      4.535107 = weight(author_txt:yang in 650) [ClassicSimilarity], result of:
        4.535107 = fieldWeight in 650, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.256171 = idf(docFreq=81, maxDocs=42740)
          0.625 = fieldNorm(doc=650)
    
  4. Salton, G.; Yang, C.S.: On the specification of term values in automatic indexing (1973) 3.63
    3.6280856 = sum of:
      3.6280856 = weight(author_txt:yang in 5476) [ClassicSimilarity], result of:
        3.6280856 = fieldWeight in 5476, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.256171 = idf(docFreq=81, maxDocs=42740)
          0.5 = fieldNorm(doc=5476)
    
  5. Yang, Y.; Chute, C.G.A.: A schematic analysis of the Unified Medical Language System (1992) 3.63
    3.6280856 = sum of:
      3.6280856 = weight(author_txt:yang in 6445) [ClassicSimilarity], result of:
        3.6280856 = fieldWeight in 6445, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.256171 = idf(docFreq=81, maxDocs=42740)
          0.5 = fieldNorm(doc=6445)
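Each author-field score above is a single-term Lucene ClassicSimilarity explanation. The Python sketch below recomputes it from the logged inputs with the ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), multiplied by the stored fieldNorm. The function names are hypothetical. fieldNorm is passed in as logged rather than recomputed, because Lucene stores this length-normalization factor in a compressed, lossy form.

```python
import math


def classic_tf(freq):
    """ClassicSimilarity term-frequency factor: sqrt(freq)."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """ClassicSimilarity inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm; field_norm is copied from the explain output."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


if __name__ == "__main__":
    print(round(field_weight(1, 81, 42740, 0.625), 5))  # entries 1-3: 4.53511
    print(round(field_weight(1, 81, 42740, 0.5), 5))    # entries 4-5: 3.62809
```

All five entries share tf = 1 and the same idf (docFreq = 81), so the split between 4.54 and 3.63 comes entirely from fieldNorm. A field with more tokens, presumably the two-author fields of entries 4 and 5, receives the smaller norm.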
    

Similar documents (content)

  1. Lee, K.H.; Ng, M.K.M.; Lu, Q.: Text segmentation for Chinese spell checking (1999) 0.75
    0.7476281 = sum of:
      0.7476281 = product of:
        1.5575585 = sum of:
          0.004366547 = weight(abstract_txt:with in 4914) [ClassicSimilarity], result of:
            0.004366547 = score(doc=4914,freq=2.0), product of:
              0.019622419 = queryWeight, product of:
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0077940286 = queryNorm
              0.22252847 = fieldWeight in 4914, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=4914)
          0.009040964 = weight(abstract_txt:based in 4914) [ClassicSimilarity], result of:
            0.009040964 = score(doc=4914,freq=2.0), product of:
              0.031876475 = queryWeight, product of:
                1.2745558 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.0077940286 = queryNorm
              0.28362495 = fieldWeight in 4914, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.0625 = fieldNorm(doc=4914)
          0.04186078 = weight(abstract_txt:word in 4914) [ClassicSimilarity], result of:
            0.04186078 = score(doc=4914,freq=4.0), product of:
              0.061398573 = queryWeight, product of:
                1.4442993 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0077940286 = queryNorm
              0.68178755 = fieldWeight in 4914, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0625 = fieldNorm(doc=4914)
          0.035966363 = weight(abstract_txt:errors in 4914) [ClassicSimilarity], result of:
            0.035966363 = score(doc=4914,freq=1.0), product of:
              0.088085495 = queryWeight, product of:
                1.729937 = boost
                6.532992 = idf(docFreq=168, maxDocs=42740)
                0.0077940286 = queryNorm
              0.408312 = fieldWeight in 4914, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.532992 = idf(docFreq=168, maxDocs=42740)
                0.0625 = fieldNorm(doc=4914)
          0.04103696 = weight(abstract_txt:detection in 4914) [ClassicSimilarity], result of:
            0.04103696 = score(doc=4914,freq=1.0), product of:
              0.09618119 = queryWeight, product of:
                1.8076867 = boost
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.0077940286 = queryNorm
              0.42666304 = fieldWeight in 4914, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.0625 = fieldNorm(doc=4914)
          0.10546719 = weight(abstract_txt:ambiguities in 4914) [ClassicSimilarity], result of:
            0.10546719 = score(doc=4914,freq=2.0), product of:
              0.14323252 = queryWeight, product of:
                2.2059665 = boost
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.0077940286 = queryNorm
              0.7363355 = fieldWeight in 4914, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.0625 = fieldNorm(doc=4914)
          0.032233745 = weight(abstract_txt:proposed in 4914) [ClassicSimilarity], result of:
            0.032233745 = score(doc=4914,freq=1.0), product of:
              0.11112895 = queryWeight, product of:
                3.0722864 = boost
                4.640914 = idf(docFreq=1120, maxDocs=42740)
                0.0077940286 = queryNorm
              0.29005712 = fieldWeight in 4914, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.640914 = idf(docFreq=1120, maxDocs=42740)
                0.0625 = fieldNorm(doc=4914)
          0.09923063 = weight(abstract_txt:words in 4914) [ClassicSimilarity], result of:
            0.09923063 = score(doc=4914,freq=4.0), product of:
              0.14814849 = queryWeight, product of:
                3.54729 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0077940286 = queryNorm
              0.6698052 = fieldWeight in 4914, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0625 = fieldNorm(doc=4914)
          0.05936971 = weight(abstract_txt:text in 4914) [ClassicSimilarity], result of:
            0.05936971 = score(doc=4914,freq=3.0), product of:
              0.1354138 = queryWeight, product of:
                4.2898245 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0077940286 = queryNorm
              0.43843177 = fieldWeight in 4914, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=4914)
          0.0843208 = weight(abstract_txt:method in 4914) [ClassicSimilarity], result of:
            0.0843208 = score(doc=4914,freq=2.0), product of:
              0.21098092 = queryWeight, product of:
                5.986661 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.0077940286 = queryNorm
              0.39966077 = fieldWeight in 4914, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.0625 = fieldNorm(doc=4914)
          0.40257704 = weight(abstract_txt:chinese in 4914) [ClassicSimilarity], result of:
            0.40257704 = score(doc=4914,freq=6.0), product of:
              0.41477472 = queryWeight, product of:
                8.394005 = boost
                6.3398805 = idf(docFreq=204, maxDocs=42740)
                0.0077940286 = queryNorm
              0.970592 = fieldWeight in 4914, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.3398805 = idf(docFreq=204, maxDocs=42740)
                0.0625 = fieldNorm(doc=4914)
          0.6420877 = weight(abstract_txt:segmentation in 4914) [ClassicSimilarity], result of:
            0.6420877 = score(doc=4914,freq=4.0), product of:
              0.64814615 = queryWeight, product of:
                10.492997 = boost
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0077940286 = queryNorm
              0.9906526 = fieldWeight in 4914, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0625 = fieldNorm(doc=4914)
        0.48 = coord(12/25)
    
  2. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.71
    0.708752 = sum of:
      0.708752 = product of:
        1.7718799 = sum of:
          0.004366547 = weight(abstract_txt:with in 2605) [ClassicSimilarity], result of:
            0.004366547 = score(doc=2605,freq=2.0), product of:
              0.019622419 = queryWeight, product of:
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0077940286 = queryNorm
              0.22252847 = fieldWeight in 2605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
          0.006392927 = weight(abstract_txt:based in 2605) [ClassicSimilarity], result of:
            0.006392927 = score(doc=2605,freq=1.0), product of:
              0.031876475 = queryWeight, product of:
                1.2745558 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.0077940286 = queryNorm
              0.20055313 = fieldWeight in 2605, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
          0.029600043 = weight(abstract_txt:word in 2605) [ClassicSimilarity], result of:
            0.029600043 = score(doc=2605,freq=2.0), product of:
              0.061398573 = queryWeight, product of:
                1.4442993 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0077940286 = queryNorm
              0.48209658 = fieldWeight in 2605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
          0.04103696 = weight(abstract_txt:detection in 2605) [ClassicSimilarity], result of:
            0.04103696 = score(doc=2605,freq=1.0), product of:
              0.09618119 = queryWeight, product of:
                1.8076867 = boost
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.0077940286 = queryNorm
              0.42666304 = fieldWeight in 2605, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
          0.07467928 = weight(abstract_txt:unknown in 2605) [ClassicSimilarity], result of:
            0.07467928 = score(doc=2605,freq=1.0), product of:
              0.16411081 = queryWeight, product of:
                2.8919601 = boost
                7.280864 = idf(docFreq=79, maxDocs=42740)
                0.0077940286 = queryNorm
              0.455054 = fieldWeight in 2605, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.280864 = idf(docFreq=79, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
          0.032233745 = weight(abstract_txt:proposed in 2605) [ClassicSimilarity], result of:
            0.032233745 = score(doc=2605,freq=1.0), product of:
              0.11112895 = queryWeight, product of:
                3.0722864 = boost
                4.640914 = idf(docFreq=1120, maxDocs=42740)
                0.0077940286 = queryNorm
              0.29005712 = fieldWeight in 2605, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.640914 = idf(docFreq=1120, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
          0.09923063 = weight(abstract_txt:words in 2605) [ClassicSimilarity], result of:
            0.09923063 = score(doc=2605,freq=4.0), product of:
              0.14814849 = queryWeight, product of:
                3.54729 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0077940286 = queryNorm
              0.6698052 = fieldWeight in 2605, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
          0.03427712 = weight(abstract_txt:text in 2605) [ClassicSimilarity], result of:
            0.03427712 = score(doc=2605,freq=1.0), product of:
              0.1354138 = queryWeight, product of:
                4.2898245 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0077940286 = queryNorm
              0.2531287 = fieldWeight in 2605, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
          0.43483287 = weight(abstract_txt:chinese in 2605) [ClassicSimilarity], result of:
            0.43483287 = score(doc=2605,freq=7.0), product of:
              0.41477472 = queryWeight, product of:
                8.394005 = boost
                6.3398805 = idf(docFreq=204, maxDocs=42740)
                0.0077940286 = queryNorm
              1.0483592 = fieldWeight in 2605, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.3398805 = idf(docFreq=204, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
          1.0152298 = weight(abstract_txt:segmentation in 2605) [ClassicSimilarity], result of:
            1.0152298 = score(doc=2605,freq=10.0), product of:
              0.64814615 = queryWeight, product of:
                10.492997 = boost
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0077940286 = queryNorm
              1.5663594 = fieldWeight in 2605, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
        0.4 = coord(10/25)
    
  3. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.54
    0.53905034 = sum of:
      0.53905034 = product of:
        1.497362 = sum of:
          0.004366547 = weight(abstract_txt:with in 207) [ClassicSimilarity], result of:
            0.004366547 = score(doc=207,freq=2.0), product of:
              0.019622419 = queryWeight, product of:
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0077940286 = queryNorm
              0.22252847 = fieldWeight in 207, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=207)
          0.044398136 = weight(abstract_txt:adjacent in 207) [ClassicSimilarity], result of:
            0.044398136 = score(doc=207,freq=1.0), product of:
              0.08045256 = queryWeight, product of:
                1.1690499 = boost
                8.829678 = idf(docFreq=16, maxDocs=42740)
                0.0077940286 = queryNorm
              0.55185485 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.829678 = idf(docFreq=16, maxDocs=42740)
                0.0625 = fieldNorm(doc=207)
          0.02093039 = weight(abstract_txt:word in 207) [ClassicSimilarity], result of:
            0.02093039 = score(doc=207,freq=1.0), product of:
              0.061398573 = queryWeight, product of:
                1.4442993 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0077940286 = queryNorm
              0.34089378 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0625 = fieldNorm(doc=207)
          0.14050099 = weight(abstract_txt:grams in 207) [ClassicSimilarity], result of:
            0.14050099 = score(doc=207,freq=4.0), product of:
              0.1376384 = queryWeight, product of:
                2.1624591 = boost
                8.166383 = idf(docFreq=32, maxDocs=42740)
                0.0077940286 = queryNorm
              1.0207978 = fieldWeight in 207, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.166383 = idf(docFreq=32, maxDocs=42740)
                0.0625 = fieldNorm(doc=207)
          0.03298432 = weight(abstract_txt:statistical in 207) [ClassicSimilarity], result of:
            0.03298432 = score(doc=207,freq=1.0), product of:
              0.09517922 = queryWeight, product of:
                2.2023928 = boost
                5.544793 = idf(docFreq=453, maxDocs=42740)
                0.0077940286 = queryNorm
              0.34654957 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.544793 = idf(docFreq=453, maxDocs=42740)
                0.0625 = fieldNorm(doc=207)
          0.14884594 = weight(abstract_txt:words in 207) [ClassicSimilarity], result of:
            0.14884594 = score(doc=207,freq=9.0), product of:
              0.14814849 = queryWeight, product of:
                3.54729 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0077940286 = queryNorm
              1.0047078 = fieldWeight in 207, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0625 = fieldNorm(doc=207)
          0.03427712 = weight(abstract_txt:text in 207) [ClassicSimilarity], result of:
            0.03427712 = score(doc=207,freq=1.0), product of:
              0.1354138 = queryWeight, product of:
                4.2898245 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0077940286 = queryNorm
              0.2531287 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=207)
          0.28466496 = weight(abstract_txt:chinese in 207) [ClassicSimilarity], result of:
            0.28466496 = score(doc=207,freq=3.0), product of:
              0.41477472 = queryWeight, product of:
                8.394005 = boost
                6.3398805 = idf(docFreq=204, maxDocs=42740)
                0.0077940286 = queryNorm
              0.6863122 = fieldWeight in 207, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.3398805 = idf(docFreq=204, maxDocs=42740)
                0.0625 = fieldNorm(doc=207)
          0.78639364 = weight(abstract_txt:segmentation in 207) [ClassicSimilarity], result of:
            0.78639364 = score(doc=207,freq=6.0), product of:
              0.64814615 = queryWeight, product of:
                10.492997 = boost
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0077940286 = queryNorm
              1.2132968 = fieldWeight in 207, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0625 = fieldNorm(doc=207)
        0.36 = coord(9/25)
    
  4. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.45
    0.44979873 = sum of:
      0.44979873 = product of:
        1.405621 = sum of:
          0.0053479057 = weight(abstract_txt:with in 2832) [ClassicSimilarity], result of:
            0.0053479057 = score(doc=2832,freq=3.0), product of:
              0.019622419 = queryWeight, product of:
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0077940286 = queryNorm
              0.2725406 = fieldWeight in 2832, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=2832)
          0.011072874 = weight(abstract_txt:based in 2832) [ClassicSimilarity], result of:
            0.011072874 = score(doc=2832,freq=3.0), product of:
              0.031876475 = queryWeight, product of:
                1.2745558 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.0077940286 = queryNorm
              0.3473682 = fieldWeight in 2832, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.0625 = fieldNorm(doc=2832)
          0.04186078 = weight(abstract_txt:word in 2832) [ClassicSimilarity], result of:
            0.04186078 = score(doc=2832,freq=4.0), product of:
              0.061398573 = queryWeight, product of:
                1.4442993 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0077940286 = queryNorm
              0.68178755 = fieldWeight in 2832, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0625 = fieldNorm(doc=2832)
          0.059604373 = weight(abstract_txt:boundary in 2832) [ClassicSimilarity], result of:
            0.059604373 = score(doc=2832,freq=1.0), product of:
              0.123355575 = queryWeight, product of:
                2.0471869 = boost
                7.731065 = idf(docFreq=50, maxDocs=42740)
                0.0077940286 = queryNorm
              0.48319155 = fieldWeight in 2832, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.731065 = idf(docFreq=50, maxDocs=42740)
                0.0625 = fieldNorm(doc=2832)
          0.03298432 = weight(abstract_txt:statistical in 2832) [ClassicSimilarity], result of:
            0.03298432 = score(doc=2832,freq=1.0), product of:
              0.09517922 = queryWeight, product of:
                2.2023928 = boost
                5.544793 = idf(docFreq=453, maxDocs=42740)
                0.0077940286 = queryNorm
              0.34654957 = fieldWeight in 2832, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.544793 = idf(docFreq=453, maxDocs=42740)
                0.0625 = fieldNorm(doc=2832)
          0.07664596 = weight(abstract_txt:text in 2832) [ClassicSimilarity], result of:
            0.07664596 = score(doc=2832,freq=5.0), product of:
              0.1354138 = queryWeight, product of:
                4.2898245 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0077940286 = queryNorm
              0.566013 = fieldWeight in 2832, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=2832)
          0.32870278 = weight(abstract_txt:chinese in 2832) [ClassicSimilarity], result of:
            0.32870278 = score(doc=2832,freq=4.0), product of:
              0.41477472 = queryWeight, product of:
                8.394005 = boost
                6.3398805 = idf(docFreq=204, maxDocs=42740)
                0.0077940286 = queryNorm
              0.79248506 = fieldWeight in 2832, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.3398805 = idf(docFreq=204, maxDocs=42740)
                0.0625 = fieldNorm(doc=2832)
          0.8494021 = weight(abstract_txt:segmentation in 2832) [ClassicSimilarity], result of:
            0.8494021 = score(doc=2832,freq=7.0), product of:
              0.64814615 = queryWeight, product of:
                10.492997 = boost
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0077940286 = queryNorm
              1.3105102 = fieldWeight in 2832, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0625 = fieldNorm(doc=2832)
        0.32 = coord(8/25)
    
  5. Xinglin, L.: Automatic summarization method based on compound word recognition (2015) 0.40
    0.39631268 = sum of:
      0.39631268 = product of:
        0.90071064 = sum of:
          0.003087615 = weight(abstract_txt:with in 3842) [ClassicSimilarity], result of:
            0.003087615 = score(doc=3842,freq=1.0), product of:
              0.019622419 = queryWeight, product of:
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0077940286 = queryNorm
              0.15735139 = fieldWeight in 3842, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=3842)
          0.01142825 = weight(abstract_txt:problem in 3842) [ClassicSimilarity], result of:
            0.01142825 = score(doc=3842,freq=1.0), product of:
              0.04101662 = queryWeight, product of:
                1.1804783 = boost
                4.457998 = idf(docFreq=1345, maxDocs=42740)
                0.0077940286 = queryNorm
              0.27862486 = fieldWeight in 3842, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.457998 = idf(docFreq=1345, maxDocs=42740)
                0.0625 = fieldNorm(doc=3842)
          0.006392927 = weight(abstract_txt:based in 3842) [ClassicSimilarity], result of:
            0.006392927 = score(doc=3842,freq=1.0), product of:
              0.031876475 = queryWeight, product of:
                1.2745558 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.0077940286 = queryNorm
              0.20055313 = fieldWeight in 3842, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.0625 = fieldNorm(doc=3842)
          0.036252502 = weight(abstract_txt:word in 3842) [ClassicSimilarity], result of:
            0.036252502 = score(doc=3842,freq=3.0), product of:
              0.061398573 = queryWeight, product of:
                1.4442993 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0077940286 = queryNorm
              0.59044534 = fieldWeight in 3842, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0625 = fieldNorm(doc=3842)
          0.07467928 = weight(abstract_txt:unknown in 3842) [ClassicSimilarity], result of:
            0.07467928 = score(doc=3842,freq=1.0), product of:
              0.16411081 = queryWeight, product of:
                2.8919601 = boost
                7.280864 = idf(docFreq=79, maxDocs=42740)
                0.0077940286 = queryNorm
              0.455054 = fieldWeight in 3842, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.280864 = idf(docFreq=79, maxDocs=42740)
                0.0625 = fieldNorm(doc=3842)
          0.045585398 = weight(abstract_txt:proposed in 3842) [ClassicSimilarity], result of:
            0.045585398 = score(doc=3842,freq=2.0), product of:
              0.11112895 = queryWeight, product of:
                3.0722864 = boost
                4.640914 = idf(docFreq=1120, maxDocs=42740)
                0.0077940286 = queryNorm
              0.4102027 = fieldWeight in 3842, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.640914 = idf(docFreq=1120, maxDocs=42740)
                0.0625 = fieldNorm(doc=3842)
          0.070166655 = weight(abstract_txt:words in 3842) [ClassicSimilarity], result of:
            0.070166655 = score(doc=3842,freq=2.0), product of:
              0.14814849 = queryWeight, product of:
                3.54729 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0077940286 = queryNorm
              0.4736238 = fieldWeight in 3842, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0625 = fieldNorm(doc=3842)
          0.048475165 = weight(abstract_txt:text in 3842) [ClassicSimilarity], result of:
            0.048475165 = score(doc=3842,freq=2.0), product of:
              0.1354138 = queryWeight, product of:
                4.2898245 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0077940286 = queryNorm
              0.35797805 = fieldWeight in 3842, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=3842)
          0.119247615 = weight(abstract_txt:method in 3842) [ClassicSimilarity], result of:
            0.119247615 = score(doc=3842,freq=4.0), product of:
              0.21098092 = queryWeight, product of:
                5.986661 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.0077940286 = queryNorm
              0.5652057 = fieldWeight in 3842, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.0625 = fieldNorm(doc=3842)
          0.16435139 = weight(abstract_txt:chinese in 3842) [ClassicSimilarity], result of:
            0.16435139 = score(doc=3842,freq=1.0), product of:
              0.41477472 = queryWeight, product of:
                8.394005 = boost
                6.3398805 = idf(docFreq=204, maxDocs=42740)
                0.0077940286 = queryNorm
              0.39624253 = fieldWeight in 3842, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3398805 = idf(docFreq=204, maxDocs=42740)
                0.0625 = fieldNorm(doc=3842)
          0.32104385 = weight(abstract_txt:segmentation in 3842) [ClassicSimilarity], result of:
            0.32104385 = score(doc=3842,freq=1.0), product of:
              0.64814615 = queryWeight, product of:
                10.492997 = boost
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0077940286 = queryNorm
              0.4953263 = fieldWeight in 3842, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0625 = fieldNorm(doc=3842)
        0.44 = coord(11/25)
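The content-based scores add three factors that the author-field scores do not show: a per-term query boost, a shared queryNorm, and a coord factor that scales the summed term scores by the fraction of the 25 query terms matched (12/25 = 0.48 for the first entry). The Python sketch below recomputes the first entry's score (Lee, Ng and Lu, 0.75) from the twelve terms listed for doc 4914. queryNorm, the boosts and the total of 25 query terms are copied from the explain output, not derived. The helper names and the `DOC_4914` table are hypothetical.

```python
import math

MAX_DOCS = 42740
QUERY_NORM = 0.0077940286  # copied from the explain output, not recomputed
NUM_QUERY_TERMS = 25       # denominator of every coord(n/25) line


def idf(doc_freq):
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """score = queryWeight * fieldWeight for one matched query term."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


def document_score(matches, field_norm):
    """matches: (freq, docFreq, boost) for each query term that occurs in the document."""
    total = sum(term_score(f, df, b, field_norm) for f, df, b in matches)
    coord = len(matches) / NUM_QUERY_TERMS  # fraction of query terms matched
    return total * coord


# Terms matched in doc 4914 as (freq, docFreq, boost); boost is 1.0 where none is logged.
DOC_4914 = [
    (2, 9369, 1.0),        # with
    (2, 4693, 1.2745558),  # based
    (4, 496, 1.4442993),   # word
    (1, 168, 1.729937),    # errors
    (1, 125, 1.8076867),   # detection
    (2, 27, 2.2059665),    # ambiguities
    (1, 1120, 3.0722864),  # proposed
    (4, 546, 3.54729),     # words
    (3, 2023, 4.2898245),  # text
    (2, 1262, 5.986661),   # method
    (6, 204, 8.394005),    # chinese
    (4, 41, 10.492997),    # segmentation
]

if __name__ == "__main__":
    print(round(document_score(DOC_4914, 0.0625), 4))  # 0.7476
```

Plugging in the (freq, docFreq, boost) triples logged for the other four entries, each with its own matched-term count, should reproduce 0.71, 0.54, 0.45 and 0.40 in the same way. The coord factor is why a document that matches more distinct query terms can outrank one with a few very strong matches.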