Document (#41677)

Author
Doval, Y.
Gómez-Rodríguez, C.
Title
Comparing neural- and N-gram-based language models for word segmentation
Source
Journal of the Association for Information Science and Technology. 70(2019) no.2, p.187-197
Year
2019
Abstract
Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based on a beam search algorithm and a language model working at the byte/character level, the latter component implemented either as an n-gram model or a recurrent neural network. The resulting system analyzes input text that has no word boundaries one token at a time, where a token can be a character or a byte, and uses the information gathered by the language model to determine whether a boundary must be placed at the current position. Our aim is to use this system in a preprocessing step for a microtext normalization system. This means that it needs to cope effectively with the data sparsity present in this kind of text. We also strove to surpass the performance of two readily available word segmentation systems: the well-known and accessible Word Breaker by Microsoft, and the Python module WordSegment by Grant Jenks. The results show that we have met our objectives, and we hope to continue to improve both the precision and the efficiency of our system in the future.
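The approach summarized in the abstract (a beam search guided by a character-level language model) can be illustrated with a minimal sketch. This is not the authors' implementation: the class `CharNgramLM`, the function `segment`, the add-one smoothing, the trigram order, and the toy training string are all assumptions made for illustration, and the sketch only inserts boundaries, whereas the article also considers deleting them.

```python
import math
from collections import Counter


class CharNgramLM:
    """Toy character n-gram model with add-one smoothing (hypothetical stand-in)."""

    def __init__(self, text, order=3):
        assert order >= 2, "this sketch assumes at least a bigram model"
        self.order = order
        self.vocab = set(text) | {" "}
        padded = "^" * (order - 1) + text  # "^" pads the start of the text
        spans = range(len(padded) - order + 1)
        self.ngrams = Counter(padded[i:i + order] for i in spans)
        self.contexts = Counter(padded[i:i + order - 1] for i in spans)

    def logprob(self, history, char):
        """Log-probability of `char` given the last (order - 1) characters."""
        context = ("^" * (self.order - 1) + history)[-(self.order - 1):]
        num = self.ngrams[context + char] + 1
        den = self.contexts[context] + len(self.vocab)
        return math.log(num / den)


def segment(text, lm, beam_width=8):
    """Read `text` one character at a time; at each position keep the
    `beam_width` best hypotheses with or without a boundary inserted."""
    beams = [("", 0.0)]  # (output so far, cumulative log-probability)
    for ch in text:
        candidates = []
        for out, score in beams:
            # Hypothesis 1: no boundary before the current character.
            candidates.append((out + ch, score + lm.logprob(out, ch)))
            # Hypothesis 2: a boundary (space) before the current character.
            if out:
                spaced = out + " "
                s = score + lm.logprob(out, " ") + lm.logprob(spaced, ch)
                candidates.append((spaced + ch, s))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]


if __name__ == "__main__":
    lm = CharNgramLM("the cat sat on the mat " * 20)
    print(segment("thecatsat", lm))
```

The quality of the output depends entirely on the training text; a realistic setup would train the model on a large corpus and could replace the n-gram scorer with a recurrent network exposing the same `logprob` interface.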
Content
Cf.: https://onlinelibrary.wiley.com/doi/10.1002/asi.24082.
Theme
Computational linguistics

Similar documents (author)

  1. Cuesta, P.; Gómez, A.M.; Rodríguez, F.J.: Using agents for information retrieval (2003) 4.37
    4.372631 = sum of:
      4.372631 = sum of:
        1.8127655 = weight(author_txt:rodríguez in 3746) [ClassicSimilarity], result of:
          1.8127655 = score(doc=3746,freq=1.0), product of:
            0.62205607 = queryWeight, product of:
              7.77107 = idf(docFreq=48, maxDocs=42740)
              0.080047674 = queryNorm
            2.9141512 = fieldWeight in 3746, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.77107 = idf(docFreq=48, maxDocs=42740)
              0.375 = fieldNorm(doc=3746)
        2.5598657 = weight(author_txt:gómez in 3746) [ClassicSimilarity], result of:
          2.5598657 = score(doc=3746,freq=1.0), product of:
            0.78297263 = queryWeight, product of:
              1.1219113 = boost
              8.7184515 = idf(docFreq=18, maxDocs=42740)
              0.080047674 = queryNorm
            3.2694192 = fieldWeight in 3746, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.7184515 = idf(docFreq=18, maxDocs=42740)
              0.375 = fieldNorm(doc=3746)
    
  2. Vilares, D.; Alonso, M.A.; Gómez-Rodríguez, C.: On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages (2015) 4.37
    4.372631 = sum of:
      4.372631 = sum of:
        1.8127655 = weight(author_txt:rodríguez in 4162) [ClassicSimilarity], result of:
          1.8127655 = score(doc=4162,freq=1.0), product of:
            0.62205607 = queryWeight, product of:
              7.77107 = idf(docFreq=48, maxDocs=42740)
              0.080047674 = queryNorm
            2.9141512 = fieldWeight in 4162, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.77107 = idf(docFreq=48, maxDocs=42740)
              0.375 = fieldNorm(doc=4162)
        2.5598657 = weight(author_txt:gómez in 4162) [ClassicSimilarity], result of:
          2.5598657 = score(doc=4162,freq=1.0), product of:
            0.78297263 = queryWeight, product of:
              1.1219113 = boost
              8.7184515 = idf(docFreq=18, maxDocs=42740)
              0.080047674 = queryNorm
            3.2694192 = fieldWeight in 4162, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.7184515 = idf(docFreq=18, maxDocs=42740)
              0.375 = fieldNorm(doc=4162)
    
  3. Olmeda-Gómez, C.; Perianes-Rodríguez, A.; Ovalle-Perandones, M.A.: Mapas de ciencias multidisciplinares : la biología molecular en la Comunidad de Madrid (2007) 3.64
    3.6438594 = sum of:
      3.6438594 = sum of:
        1.5106379 = weight(author_txt:rodríguez in 3121) [ClassicSimilarity], result of:
          1.5106379 = score(doc=3121,freq=1.0), product of:
            0.62205607 = queryWeight, product of:
              7.77107 = idf(docFreq=48, maxDocs=42740)
              0.080047674 = queryNorm
            2.4284594 = fieldWeight in 3121, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              7.77107 = idf(docFreq=48, maxDocs=42740)
              0.3125 = fieldNorm(doc=3121)
        2.1332216 = weight(author_txt:gómez in 3121) [ClassicSimilarity], result of:
          2.1332216 = score(doc=3121,freq=1.0), product of:
            0.78297263 = queryWeight, product of:
              1.1219113 = boost
              8.7184515 = idf(docFreq=18, maxDocs=42740)
              0.080047674 = queryNorm
            2.7245162 = fieldWeight in 3121, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.7184515 = idf(docFreq=18, maxDocs=42740)
              0.3125 = fieldNorm(doc=3121)
    
  4. Gómez Prada, R. Gómez => Gómez Prada, R.: 2.22
    2.216909 = sum of:
      2.216909 = product of:
        4.433818 = sum of:
          4.433818 = weight(author_txt:gómez in 1120) [ClassicSimilarity], result of:
            4.433818 = score(doc=1120,freq=3.0), product of:
              0.78297263 = queryWeight, product of:
                1.1219113 = boost
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.080047674 = queryNorm
              5.6628003 = fieldWeight in 1120, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.375 = fieldNorm(doc=1120)
        0.5 = coord(1/2)
    
  5. Gómez, C. Olmeda- -> Olmeda-Gómez, C.: 1.81
    1.8100985 = sum of:
      1.8100985 = product of:
        3.620197 = sum of:
          3.620197 = weight(author_txt:gómez in 7447) [ClassicSimilarity], result of:
            3.620197 = score(doc=7447,freq=2.0), product of:
              0.78297263 = queryWeight, product of:
                1.1219113 = boost
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.080047674 = queryNorm
              4.623657 = fieldWeight in 7447, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.375 = fieldNorm(doc=7447)
        0.5 = coord(1/2)
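
The per-term figures in the score breakdowns above can be recomputed from their inputs. The sketch below is an illustration, not the search engine's code: the function names are hypothetical, and the idf and tf formulas are inferred from the listed numbers (they reproduce them to the printed precision). The values for queryNorm, fieldNorm, and boost are taken directly from the listing.

```python
import math


def idf(doc_freq, max_docs):
    # Inferred from the listing: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def tf(freq):
    # Inferred from the listing: square root of the term frequency.
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """score = queryWeight * fieldWeight, as laid out in the breakdown."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


MAX_DOCS = 42740
QUERY_NORM = 0.080047674  # copied from the listing

# Entry 1 (doc 3746): both author terms match once, fieldNorm = 0.375.
rodriguez = term_score(1.0, 48, MAX_DOCS, QUERY_NORM, 0.375)
gomez = term_score(1.0, 18, MAX_DOCS, QUERY_NORM, 0.375, boost=1.1219113)
total = rodriguez + gomez

if __name__ == "__main__":
    print(rodriguez, gomez, total)
```

Small differences in the last digits are expected, since the listing appears to use single-precision arithmetic while Python uses double precision.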
    

Similar documents (content)

  1. Kwok, K.L.: Employing multiple representations for Chinese information retrieval (1999) 0.30
    0.30295685 = sum of:
      0.30295685 = product of:
        0.94674015 = sum of:
          0.007462124 = weight(abstract_txt:that in 4774) [ClassicSimilarity], result of:
            0.007462124 = score(doc=4774,freq=2.0), product of:
              0.035255253 = queryWeight, product of:
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.014722455 = queryNorm
              0.21165991 = fieldWeight in 4774, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=4774)
          0.07178887 = weight(abstract_txt:characters in 4774) [ClassicSimilarity], result of:
            0.07178887 = score(doc=4774,freq=2.0), product of:
              0.11057235 = queryWeight, product of:
                1.0224704 = boost
                7.3454022 = idf(docFreq=74, maxDocs=42740)
                0.014722455 = queryNorm
              0.64924794 = fieldWeight in 4774, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3454022 = idf(docFreq=74, maxDocs=42740)
                0.0625 = fieldNorm(doc=4774)
          0.02756696 = weight(abstract_txt:system in 4774) [ClassicSimilarity], result of:
            0.02756696 = score(doc=4774,freq=2.0), product of:
              0.092730165 = queryWeight, product of:
                1.8726989 = boost
                3.3633559 = idf(docFreq=4021, maxDocs=42740)
                0.014722455 = queryNorm
              0.29728147 = fieldWeight in 4774, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3633559 = idf(docFreq=4021, maxDocs=42740)
                0.0625 = fieldNorm(doc=4774)
          0.13112518 = weight(abstract_txt:gram in 4774) [ClassicSimilarity], result of:
            0.13112518 = score(doc=4774,freq=1.0), product of:
              0.26227236 = queryWeight, product of:
                2.2269921 = boost
                7.999329 = idf(docFreq=38, maxDocs=42740)
                0.014722455 = queryNorm
              0.49995807 = fieldWeight in 4774, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.999329 = idf(docFreq=38, maxDocs=42740)
                0.0625 = fieldNorm(doc=4774)
          0.03779032 = weight(abstract_txt:language in 4774) [ClassicSimilarity], result of:
            0.03779032 = score(doc=4774,freq=1.0), product of:
              0.14417572 = queryWeight, product of:
                2.3350894 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.014722455 = queryNorm
              0.26211292 = fieldWeight in 4774, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.0625 = fieldNorm(doc=4774)
          0.15110916 = weight(abstract_txt:character in 4774) [ClassicSimilarity], result of:
            0.15110916 = score(doc=4774,freq=2.0), product of:
              0.26192448 = queryWeight, product of:
                2.725688 = boost
                6.527092 = idf(docFreq=169, maxDocs=42740)
                0.014722455 = queryNorm
              0.57691884 = fieldWeight in 4774, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.527092 = idf(docFreq=169, maxDocs=42740)
                0.0625 = fieldNorm(doc=4774)
          0.27049908 = weight(abstract_txt:segmentation in 4774) [ClassicSimilarity], result of:
            0.27049908 = score(doc=4774,freq=2.0), product of:
              0.386153 = queryWeight, product of:
                3.3095412 = boost
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.014722455 = queryNorm
              0.70049715 = fieldWeight in 4774, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0625 = fieldNorm(doc=4774)
          0.24939848 = weight(abstract_txt:word in 4774) [ClassicSimilarity], result of:
            0.24939848 = score(doc=4774,freq=4.0), product of:
              0.3658009 = queryWeight, product of:
                4.555389 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.014722455 = queryNorm
              0.68178755 = fieldWeight in 4774, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0625 = fieldNorm(doc=4774)
        0.32 = coord(8/25)
    
  2. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.30
    0.2981743 = sum of:
      0.2981743 = product of:
        1.0649081 = sum of:
          0.007462124 = weight(abstract_txt:that in 2605) [ClassicSimilarity], result of:
            0.007462124 = score(doc=2605,freq=2.0), product of:
              0.035255253 = queryWeight, product of:
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.014722455 = queryNorm
              0.21165991 = fieldWeight in 2605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
          0.052206267 = weight(abstract_txt:sequences in 2605) [ClassicSimilarity], result of:
            0.052206267 = score(doc=2605,freq=1.0), product of:
              0.11265925 = queryWeight, product of:
                1.0320741 = boost
                7.4143953 = idf(docFreq=69, maxDocs=42740)
                0.014722455 = queryNorm
              0.4633997 = fieldWeight in 2605, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4143953 = idf(docFreq=69, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
          0.0074700527 = weight(abstract_txt:this in 2605) [ClassicSimilarity], result of:
            0.0074700527 = score(doc=2605,freq=1.0), product of:
              0.04892388 = queryWeight, product of:
                1.3602474 = boost
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.014722455 = queryNorm
              0.15268725 = fieldWeight in 2605, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
          0.06545475 = weight(abstract_txt:language in 2605) [ClassicSimilarity], result of:
            0.06545475 = score(doc=2605,freq=3.0), product of:
              0.14417572 = queryWeight, product of:
                2.3350894 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.014722455 = queryNorm
              0.45399287 = fieldWeight in 2605, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
          0.15110916 = weight(abstract_txt:character in 2605) [ClassicSimilarity], result of:
            0.15110916 = score(doc=2605,freq=2.0), product of:
              0.26192448 = queryWeight, product of:
                2.725688 = boost
                6.527092 = idf(docFreq=169, maxDocs=42740)
                0.014722455 = queryNorm
              0.57691884 = fieldWeight in 2605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.527092 = idf(docFreq=169, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
          0.6048544 = weight(abstract_txt:segmentation in 2605) [ClassicSimilarity], result of:
            0.6048544 = score(doc=2605,freq=10.0), product of:
              0.386153 = queryWeight, product of:
                3.3095412 = boost
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.014722455 = queryNorm
              1.5663594 = fieldWeight in 2605, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
          0.17635135 = weight(abstract_txt:word in 2605) [ClassicSimilarity], result of:
            0.17635135 = score(doc=2605,freq=2.0), product of:
              0.3658009 = queryWeight, product of:
                4.555389 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.014722455 = queryNorm
              0.48209658 = fieldWeight in 2605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0625 = fieldNorm(doc=2605)
        0.28 = coord(7/25)
    
  3. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.28
    0.27835247 = sum of:
      0.27835247 = product of:
        0.99411595 = sum of:
          0.009139198 = weight(abstract_txt:that in 2832) [ClassicSimilarity], result of:
            0.009139198 = score(doc=2832,freq=3.0), product of:
              0.035255253 = queryWeight, product of:
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.014722455 = queryNorm
              0.2592294 = fieldWeight in 2832, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=2832)
          0.01056425 = weight(abstract_txt:this in 2832) [ClassicSimilarity], result of:
            0.01056425 = score(doc=2832,freq=2.0), product of:
              0.04892388 = queryWeight, product of:
                1.3602474 = boost
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.014722455 = queryNorm
              0.21593238 = fieldWeight in 2832, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.0625 = fieldNorm(doc=2832)
          0.025005512 = weight(abstract_txt:model in 2832) [ClassicSimilarity], result of:
            0.025005512 = score(doc=2832,freq=1.0), product of:
              0.099467844 = queryWeight, product of:
                1.6796911 = boost
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.014722455 = queryNorm
              0.25139293 = fieldWeight in 2832, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.0625 = fieldNorm(doc=2832)
          0.118370466 = weight(abstract_txt:boundary in 2832) [ClassicSimilarity], result of:
            0.118370466 = score(doc=2832,freq=1.0), product of:
              0.24497628 = queryWeight, product of:
                2.1523082 = boost
                7.731065 = idf(docFreq=50, maxDocs=42740)
                0.014722455 = queryNorm
              0.48319155 = fieldWeight in 2832, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.731065 = idf(docFreq=50, maxDocs=42740)
                0.0625 = fieldNorm(doc=2832)
          0.07558064 = weight(abstract_txt:language in 2832) [ClassicSimilarity], result of:
            0.07558064 = score(doc=2832,freq=4.0), product of:
              0.14417572 = queryWeight, product of:
                2.3350894 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.014722455 = queryNorm
              0.52422583 = fieldWeight in 2832, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.0625 = fieldNorm(doc=2832)
          0.50605744 = weight(abstract_txt:segmentation in 2832) [ClassicSimilarity], result of:
            0.50605744 = score(doc=2832,freq=7.0), product of:
              0.386153 = queryWeight, product of:
                3.3095412 = boost
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.014722455 = queryNorm
              1.3105102 = fieldWeight in 2832, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0625 = fieldNorm(doc=2832)
          0.24939848 = weight(abstract_txt:word in 2832) [ClassicSimilarity], result of:
            0.24939848 = score(doc=2832,freq=4.0), product of:
              0.3658009 = queryWeight, product of:
                4.555389 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.014722455 = queryNorm
              0.68178755 = fieldWeight in 2832, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0625 = fieldNorm(doc=2832)
        0.28 = coord(7/25)
    
  4. Yang, C.C.; Li, K.W.: A heuristic method based on a statistical approach for Chinese text segmentation (2005) 0.24
    0.23672555 = sum of:
      0.23672555 = product of:
        0.9863565 = sum of:
          0.007462124 = weight(abstract_txt:that in 581) [ClassicSimilarity], result of:
            0.007462124 = score(doc=581,freq=2.0), product of:
              0.035255253 = queryWeight, product of:
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.014722455 = queryNorm
              0.21165991 = fieldWeight in 581, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=581)
          0.0507624 = weight(abstract_txt:characters in 581) [ClassicSimilarity], result of:
            0.0507624 = score(doc=581,freq=1.0), product of:
              0.11057235 = queryWeight, product of:
                1.0224704 = boost
                7.3454022 = idf(docFreq=74, maxDocs=42740)
                0.014722455 = queryNorm
              0.45908764 = fieldWeight in 581, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3454022 = idf(docFreq=74, maxDocs=42740)
                0.0625 = fieldNorm(doc=581)
          0.01056425 = weight(abstract_txt:this in 581) [ClassicSimilarity], result of:
            0.01056425 = score(doc=581,freq=2.0), product of:
              0.04892388 = queryWeight, product of:
                1.3602474 = boost
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.014722455 = queryNorm
              0.21593238 = fieldWeight in 581, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.0625 = fieldNorm(doc=581)
          0.16740112 = weight(abstract_txt:boundary in 581) [ClassicSimilarity], result of:
            0.16740112 = score(doc=581,freq=2.0), product of:
              0.24497628 = queryWeight, product of:
                2.1523082 = boost
                7.731065 = idf(docFreq=50, maxDocs=42740)
                0.014722455 = queryNorm
              0.683336 = fieldWeight in 581, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.731065 = idf(docFreq=50, maxDocs=42740)
                0.0625 = fieldNorm(doc=581)
          0.5738152 = weight(abstract_txt:segmentation in 581) [ClassicSimilarity], result of:
            0.5738152 = score(doc=581,freq=9.0), product of:
              0.386153 = queryWeight, product of:
                3.3095412 = boost
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.014722455 = queryNorm
              1.485979 = fieldWeight in 581, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0625 = fieldNorm(doc=581)
          0.17635135 = weight(abstract_txt:word in 581) [ClassicSimilarity], result of:
            0.17635135 = score(doc=581,freq=2.0), product of:
              0.3658009 = queryWeight, product of:
                4.555389 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.014722455 = queryNorm
              0.48209658 = fieldWeight in 581, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0625 = fieldNorm(doc=581)
        0.24 = coord(6/25)
    
  5. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.23
    0.23408085 = sum of:
      0.23408085 = product of:
        0.9753369 = sum of:
          0.007462124 = weight(abstract_txt:that in 207) [ClassicSimilarity], result of:
            0.007462124 = score(doc=207,freq=2.0), product of:
              0.035255253 = queryWeight, product of:
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.014722455 = queryNorm
              0.21165991 = fieldWeight in 207, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=207)
          0.08792306 = weight(abstract_txt:characters in 207) [ClassicSimilarity], result of:
            0.08792306 = score(doc=207,freq=3.0), product of:
              0.11057235 = queryWeight, product of:
                1.0224704 = boost
                7.3454022 = idf(docFreq=74, maxDocs=42740)
                0.014722455 = queryNorm
              0.7951631 = fieldWeight in 207, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.3454022 = idf(docFreq=74, maxDocs=42740)
                0.0625 = fieldNorm(doc=207)
          0.025005512 = weight(abstract_txt:model in 207) [ClassicSimilarity], result of:
            0.025005512 = score(doc=207,freq=1.0), product of:
              0.099467844 = queryWeight, product of:
                1.6796911 = boost
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.014722455 = queryNorm
              0.25139293 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.0625 = fieldNorm(doc=207)
          0.2617288 = weight(abstract_txt:character in 207) [ClassicSimilarity], result of:
            0.2617288 = score(doc=207,freq=6.0), product of:
              0.26192448 = queryWeight, product of:
                2.725688 = boost
                6.527092 = idf(docFreq=169, maxDocs=42740)
                0.014722455 = queryNorm
              0.99925286 = fieldWeight in 207, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.527092 = idf(docFreq=169, maxDocs=42740)
                0.0625 = fieldNorm(doc=207)
          0.4685182 = weight(abstract_txt:segmentation in 207) [ClassicSimilarity], result of:
            0.4685182 = score(doc=207,freq=6.0), product of:
              0.386153 = queryWeight, product of:
                3.3095412 = boost
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.014722455 = queryNorm
              1.2132968 = fieldWeight in 207, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0625 = fieldNorm(doc=207)
          0.12469924 = weight(abstract_txt:word in 207) [ClassicSimilarity], result of:
            0.12469924 = score(doc=207,freq=1.0), product of:
              0.3658009 = queryWeight, product of:
                4.555389 = boost
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.014722455 = queryNorm
              0.34089378 = fieldWeight in 207, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4543004 = idf(docFreq=496, maxDocs=42740)
                0.0625 = fieldNorm(doc=207)
        0.24 = coord(6/25)
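
In the content-based breakdowns above, the final score is the sum of the matching term scores scaled by a coordination factor, the fraction of query terms that matched. A minimal sketch of that last step, using the term scores listed for document 4774 (the function name `coord_score` is hypothetical):

```python
def coord_score(term_scores, query_terms):
    """Sum the per-term scores and scale by matched terms / query terms."""
    coord = len(term_scores) / query_terms
    return sum(term_scores) * coord


# Per-term scores copied from the breakdown for doc 4774 (8 of 25 terms matched).
scores_4774 = [
    0.007462124,  # that
    0.07178887,   # characters
    0.02756696,   # system
    0.13112518,   # gram
    0.03779032,   # language
    0.15110916,   # character
    0.27049908,   # segmentation
    0.24939848,   # word
]

total_4774 = coord_score(scores_4774, 25)

if __name__ == "__main__":
    print(total_4774)
```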