Document (#34115)

Author
Hu, G.
Zhou, S.
Guan, J.
Hu, X.
Title
Towards effective document clustering : a constrained K-means based approach
Source
Information processing and management. 44(2008) no.4, p.1397-1409
Year
2008
Abstract
Document clustering is an important tool for organizing and browsing document collections. In real applications, limited knowledge about the cluster membership of a small number of documents is often available, for example pairs of documents known to belong to the same cluster. This kind of prior knowledge can serve as constraints for the clustering process. We integrate the constraints into the trace formulation of the sum of squared Euclidean distances criterion of K-means. The combined criterion function is then transformed into a trace maximization problem, which is optimized by eigen-decomposition. Our experimental evaluation shows that the proposed semi-supervised clustering method achieves better performance than three existing methods.
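The trace formulation mentioned in the abstract can be sketched as follows. This is the standard textbook rewriting of the K-means criterion, not the paper's own derivation. The constraint matrix W and the weight \beta in the last line are assumptions introduced here for illustration only, since the record does not state how the paper encodes its constraints.

```latex
% Data matrix X = [x_1, ..., x_n] \in R^{d \times n}; clusters C_1, ..., C_K
% with sizes n_k and centroids m_k = (1/n_k) \sum_{i \in C_k} x_i.
\begin{align}
J_{\text{K-means}}
  &= \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - m_k \rVert^2 \\
  &= \sum_{i=1}^{n} x_i^{\top} x_i
     - \sum_{k=1}^{K} \frac{1}{n_k} \sum_{i, j \in C_k} x_i^{\top} x_j \\
  &= \operatorname{tr}(X^{\top} X) - \operatorname{tr}(H^{\top} X^{\top} X H),
\end{align}
% where H \in R^{n \times K} is the normalized cluster indicator matrix,
% H_{ik} = 1/\sqrt{n_k} if i \in C_k and 0 otherwise, so that H^{\top} H = I_K.
% Since tr(X^T X) is constant, minimizing J is equivalent to
\begin{equation}
\max_{H^{\top} H = I_K} \operatorname{tr}(H^{\top} X^{\top} X H).
\end{equation}
% Relaxing H to any matrix with orthonormal columns, the maximum is attained
% by the K leading eigenvectors of X^T X (Ky Fan theorem).
% ASSUMPTION (not from the record): pairwise constraints encoded in a symmetric
% matrix W with weight \beta > 0 would give a combined criterion of the same shape:
\begin{equation}
\max_{H^{\top} H = I_K} \operatorname{tr}\bigl(H^{\top} (X^{\top} X + \beta W) H\bigr).
\end{equation}
```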
Theme
Automatic classification

Similar documents (author)

  1. Bell, D.A.; Guan, J.W.: Computational methods for rough classification and discovery (1998) 1.99
    1.9890324 = sum of:
      1.9890324 = product of:
        3.9780648 = sum of:
          3.9780648 = weight(author_txt:guan in 3910) [ClassicSimilarity], result of:
            3.9780648 = score(doc=3910,freq=1.0), product of:
              0.81878626 = queryWeight, product of:
                1.1942413 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.070558146 = queryNorm
              4.85849 = fieldWeight in 3910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.5 = fieldNorm(doc=3910)
        0.5 = coord(1/2)
    
  2. Cowie, J.; Guan, Z.: CRL English routing system for TREC-5 (1997) 1.99
    1.9890324 = sum of:
      1.9890324 = product of:
        3.9780648 = sum of:
          3.9780648 = weight(author_txt:guan in 4107) [ClassicSimilarity], result of:
            3.9780648 = score(doc=4107,freq=1.0), product of:
              0.81878626 = queryWeight, product of:
                1.1942413 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.070558146 = queryNorm
              4.85849 = fieldWeight in 4107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.5 = fieldNorm(doc=4107)
        0.5 = coord(1/2)
    
  3. Wang, J.; Guan, J.: The analysis and evaluation of knowledge efficiency in research groups (2005) 1.99
    1.9890324 = sum of:
      1.9890324 = product of:
        3.9780648 = sum of:
          3.9780648 = weight(author_txt:guan in 239) [ClassicSimilarity], result of:
            3.9780648 = score(doc=239,freq=1.0), product of:
              0.81878626 = queryWeight, product of:
                1.1942413 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.070558146 = queryNorm
              4.85849 = fieldWeight in 239, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.5 = fieldNorm(doc=239)
        0.5 = coord(1/2)
    
  4. Guan, J.C.; Gao, X.: Exploring the h-index at patent level (2009) 1.99
    1.9890324 = sum of:
      1.9890324 = product of:
        3.9780648 = sum of:
          3.9780648 = weight(author_txt:guan in 4697) [ClassicSimilarity], result of:
            3.9780648 = score(doc=4697,freq=1.0), product of:
              0.81878626 = queryWeight, product of:
                1.1942413 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.070558146 = queryNorm
              4.85849 = fieldWeight in 4697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.5 = fieldNorm(doc=4697)
        0.5 = coord(1/2)
    
  5. Ma, N.; Guan, J.; Zhao, Y.: Bringing PageRank to the citation analysis (2008) 1.49
    1.4917743 = sum of:
      1.4917743 = product of:
        2.9835486 = sum of:
          2.9835486 = weight(author_txt:guan in 4065) [ClassicSimilarity], result of:
            2.9835486 = score(doc=4065,freq=1.0), product of:
              0.81878626 = queryWeight, product of:
                1.1942413 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.070558146 = queryNorm
              3.6438675 = fieldWeight in 4065, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.375 = fieldNorm(doc=4065)
        0.5 = coord(1/2)
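The score trees above follow the Lucene ClassicSimilarity (TF-IDF) arithmetic: queryWeight = boost × idf × queryNorm, fieldWeight = tf × idf × fieldNorm, with tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The sketch below recomputes one entry from the numbers printed in the tree. The function names are hypothetical helpers written for this illustration and are not part of any Lucene API.

```python
import math


def classic_idf(doc_freq, max_docs):
    """idf as printed in the trees: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm            # e.g. 0.81878626
    field_weight = math.sqrt(freq) * idf * field_norm  # e.g. 4.85849
    return query_weight * field_weight


# Values copied from the tree for author_txt:guan in doc 3910.
term = classic_term_score(
    freq=1.0, doc_freq=6, max_docs=42740,
    boost=1.1942413, query_norm=0.070558146, field_norm=0.5,
)
total = term * (1 / 2)  # coord(1/2): one of two query clauses matched
print(term, total)
```

With fieldNorm=0.375 instead of 0.5 (as for doc 4065), the same calculation gives the lower score shown for the fifth entry.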
    

Similar documents (content)

  1. AlQenaei, Z.M.; Monarchi, D.E.: The use of learning techniques to analyze the results of a manual classification system (2016) 0.22
    0.2236726 = sum of:
      0.2236726 = product of:
        0.6989769 = sum of:
          0.054588184 = weight(abstract_txt:pairs in 4837) [ClassicSimilarity], result of:
            0.054588184 = score(doc=4837,freq=1.0), product of:
              0.12764297 = queryWeight, product of:
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.018654138 = queryNorm
              0.42766306 = fieldWeight in 4837, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.0625 = fieldNorm(doc=4837)
          0.07538256 = weight(abstract_txt:supervised in 4837) [ClassicSimilarity], result of:
            0.07538256 = score(doc=4837,freq=1.0), product of:
              0.15828693 = queryWeight, product of:
                1.1135868 = boost
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.018654138 = queryNorm
              0.47623995 = fieldWeight in 4837, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.0625 = fieldNorm(doc=4837)
          0.08332921 = weight(abstract_txt:belonging in 4837) [ClassicSimilarity], result of:
            0.08332921 = score(doc=4837,freq=1.0), product of:
              0.16922426 = queryWeight, product of:
                1.1514176 = boost
                7.8787007 = idf(docFreq=43, maxDocs=42740)
                0.018654138 = queryNorm
              0.4924188 = fieldWeight in 4837, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8787007 = idf(docFreq=43, maxDocs=42740)
                0.0625 = fieldNorm(doc=4837)
          0.084060796 = weight(abstract_txt:decomposition in 4837) [ClassicSimilarity], result of:
            0.084060796 = score(doc=4837,freq=1.0), product of:
              0.1702133 = queryWeight, product of:
                1.1547774 = boost
                7.9016905 = idf(docFreq=42, maxDocs=42740)
                0.018654138 = queryNorm
              0.49385566 = fieldWeight in 4837, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9016905 = idf(docFreq=42, maxDocs=42740)
                0.0625 = fieldNorm(doc=4837)
          0.033590086 = weight(abstract_txt:documents in 4837) [ClassicSimilarity], result of:
            0.033590086 = score(doc=4837,freq=2.0), product of:
              0.09234326 = queryWeight, product of:
                1.2028713 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.018654138 = queryNorm
              0.36375242 = fieldWeight in 4837, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.0625 = fieldNorm(doc=4837)
          0.1638831 = weight(abstract_txt:euclidean in 4837) [ClassicSimilarity], result of:
            0.1638831 = score(doc=4837,freq=1.0), product of:
              0.2656362 = queryWeight, product of:
                1.4425975 = boost
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.018654138 = queryNorm
              0.6169457 = fieldWeight in 4837, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.0625 = fieldNorm(doc=4837)
          0.040097676 = weight(abstract_txt:document in 4837) [ClassicSimilarity], result of:
            0.040097676 = score(doc=4837,freq=1.0), product of:
              0.14987104 = queryWeight, product of:
                1.8768132 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.018654138 = queryNorm
              0.26754788 = fieldWeight in 4837, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=4837)
          0.16404526 = weight(abstract_txt:clustering in 4837) [ClassicSimilarity], result of:
            0.16404526 = score(doc=4837,freq=1.0), product of:
              0.4219493 = queryWeight, product of:
                3.6363165 = boost
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.018654138 = queryNorm
              0.38877955 = fieldWeight in 4837, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.0625 = fieldNorm(doc=4837)
        0.32 = coord(8/25)
    
  2. Dunlavy, D.M.; O'Leary, D.P.; Conroy, J.M.; Schlesinger, J.D.: QCS: A system for querying, clustering and summarizing documents (2007) 0.16
    0.15526 = sum of:
      0.15526 = product of:
        0.5545 = sum of:
          0.015360655 = weight(abstract_txt:into in 2948) [ClassicSimilarity], result of:
            0.015360655 = score(doc=2948,freq=1.0), product of:
              0.0754876 = queryWeight, product of:
                1.0875628 = boost
                3.7208836 = idf(docFreq=2812, maxDocs=42740)
                0.018654138 = queryNorm
              0.20348582 = fieldWeight in 2948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7208836 = idf(docFreq=2812, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2948)
          0.0735532 = weight(abstract_txt:decomposition in 2948) [ClassicSimilarity], result of:
            0.0735532 = score(doc=2948,freq=1.0), product of:
              0.1702133 = queryWeight, product of:
                1.1547774 = boost
                7.9016905 = idf(docFreq=42, maxDocs=42740)
                0.018654138 = queryNorm
              0.4321237 = fieldWeight in 2948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9016905 = idf(docFreq=42, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2948)
          0.029391326 = weight(abstract_txt:documents in 2948) [ClassicSimilarity], result of:
            0.029391326 = score(doc=2948,freq=2.0), product of:
              0.09234326 = queryWeight, product of:
                1.2028713 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.018654138 = queryNorm
              0.31828338 = fieldWeight in 2948, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2948)
          0.03758375 = weight(abstract_txt:means in 2948) [ClassicSimilarity], result of:
            0.03758375 = score(doc=2948,freq=1.0), product of:
              0.13706793 = queryWeight, product of:
                1.4654955 = boost
                5.013906 = idf(docFreq=771, maxDocs=42740)
                0.018654138 = queryNorm
              0.274198 = fieldWeight in 2948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.013906 = idf(docFreq=771, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2948)
          0.049618345 = weight(abstract_txt:document in 2948) [ClassicSimilarity], result of:
            0.049618345 = score(doc=2948,freq=2.0), product of:
              0.14987104 = queryWeight, product of:
                1.8768132 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.018654138 = queryNorm
              0.3310736 = fieldWeight in 2948, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2948)
          0.14599706 = weight(abstract_txt:cluster in 2948) [ClassicSimilarity], result of:
            0.14599706 = score(doc=2948,freq=3.0), product of:
              0.23485048 = queryWeight, product of:
                1.9182812 = boost
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.018654138 = queryNorm
              0.62165964 = fieldWeight in 2948, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2948)
          0.20299566 = weight(abstract_txt:clustering in 2948) [ClassicSimilarity], result of:
            0.20299566 = score(doc=2948,freq=2.0), product of:
              0.4219493 = queryWeight, product of:
                3.6363165 = boost
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.018654138 = queryNorm
              0.48109016 = fieldWeight in 2948, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2948)
        0.28 = coord(7/25)
    
  3. Zamir, O.; Etzioni, O.: Grouper : a dynamic clustering interface to Web search results (1999) 0.15
    0.14823945 = sum of:
      0.14823945 = product of:
        0.7411972 = sum of:
          0.021943795 = weight(abstract_txt:into in 208) [ClassicSimilarity], result of:
            0.021943795 = score(doc=208,freq=1.0), product of:
              0.0754876 = queryWeight, product of:
                1.0875628 = boost
                3.7208836 = idf(docFreq=2812, maxDocs=42740)
                0.018654138 = queryNorm
              0.29069403 = fieldWeight in 208, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7208836 = idf(docFreq=2812, maxDocs=42740)
                0.078125 = fieldNorm(doc=208)
          0.029689722 = weight(abstract_txt:documents in 208) [ClassicSimilarity], result of:
            0.029689722 = score(doc=208,freq=1.0), product of:
              0.09234326 = queryWeight, product of:
                1.2028713 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.018654138 = queryNorm
              0.32151476 = fieldWeight in 208, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.078125 = fieldNorm(doc=208)
          0.07088335 = weight(abstract_txt:document in 208) [ClassicSimilarity], result of:
            0.07088335 = score(doc=208,freq=2.0), product of:
              0.14987104 = queryWeight, product of:
                1.8768132 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.018654138 = queryNorm
              0.4729623 = fieldWeight in 208, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.078125 = fieldNorm(doc=208)
          0.20856725 = weight(abstract_txt:cluster in 208) [ClassicSimilarity], result of:
            0.20856725 = score(doc=208,freq=3.0), product of:
              0.23485048 = queryWeight, product of:
                1.9182812 = boost
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.018654138 = queryNorm
              0.88808525 = fieldWeight in 208, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.078125 = fieldNorm(doc=208)
          0.41011313 = weight(abstract_txt:clustering in 208) [ClassicSimilarity], result of:
            0.41011313 = score(doc=208,freq=4.0), product of:
              0.4219493 = queryWeight, product of:
                3.6363165 = boost
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.018654138 = queryNorm
              0.97194886 = fieldWeight in 208, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.078125 = fieldNorm(doc=208)
        0.2 = coord(5/25)
    
  4. Rooney, N.; Patterson, D.; Galushka, M.; Dobrynin, V.; Smirnova, E.: An investigation into the stability of contextual document clustering (2008) 0.14
    0.14369208 = sum of:
      0.14369208 = product of:
        0.598717 = sum of:
          0.017555036 = weight(abstract_txt:into in 3357) [ClassicSimilarity], result of:
            0.017555036 = score(doc=3357,freq=1.0), product of:
              0.0754876 = queryWeight, product of:
                1.0875628 = boost
                3.7208836 = idf(docFreq=2812, maxDocs=42740)
                0.018654138 = queryNorm
              0.23255523 = fieldWeight in 3357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7208836 = idf(docFreq=2812, maxDocs=42740)
                0.0625 = fieldNorm(doc=3357)
          0.033590086 = weight(abstract_txt:documents in 3357) [ClassicSimilarity], result of:
            0.033590086 = score(doc=3357,freq=2.0), product of:
              0.09234326 = queryWeight, product of:
                1.2028713 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.018654138 = queryNorm
              0.36375242 = fieldWeight in 3357, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.0625 = fieldNorm(doc=3357)
          0.042952858 = weight(abstract_txt:means in 3357) [ClassicSimilarity], result of:
            0.042952858 = score(doc=3357,freq=1.0), product of:
              0.13706793 = queryWeight, product of:
                1.4654955 = boost
                5.013906 = idf(docFreq=771, maxDocs=42740)
                0.018654138 = queryNorm
              0.31336913 = fieldWeight in 3357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.013906 = idf(docFreq=771, maxDocs=42740)
                0.0625 = fieldNorm(doc=3357)
          0.08019535 = weight(abstract_txt:document in 3357) [ClassicSimilarity], result of:
            0.08019535 = score(doc=3357,freq=4.0), product of:
              0.14987104 = queryWeight, product of:
                1.8768132 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.018654138 = queryNorm
              0.53509575 = fieldWeight in 3357, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=3357)
          0.09633309 = weight(abstract_txt:cluster in 3357) [ClassicSimilarity], result of:
            0.09633309 = score(doc=3357,freq=1.0), product of:
              0.23485048 = queryWeight, product of:
                1.9182812 = boost
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.018654138 = queryNorm
              0.410189 = fieldWeight in 3357, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.0625 = fieldNorm(doc=3357)
          0.32809052 = weight(abstract_txt:clustering in 3357) [ClassicSimilarity], result of:
            0.32809052 = score(doc=3357,freq=4.0), product of:
              0.4219493 = queryWeight, product of:
                3.6363165 = boost
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.018654138 = queryNorm
              0.7775591 = fieldWeight in 3357, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.0625 = fieldNorm(doc=3357)
        0.24 = coord(6/25)
    
  5. Na, S.-H.; Kang, I.-S.; Lee, J.-H.: Adaptive document clustering based on query-based similarity (2007) 0.14
    0.13888569 = sum of:
      0.13888569 = product of:
        0.69442844 = sum of:
          0.023751777 = weight(abstract_txt:documents in 2921) [ClassicSimilarity], result of:
            0.023751777 = score(doc=2921,freq=1.0), product of:
              0.09234326 = queryWeight, product of:
                1.2028713 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.018654138 = queryNorm
              0.2572118 = fieldWeight in 2921, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.0625 = fieldNorm(doc=2921)
          0.042952858 = weight(abstract_txt:means in 2921) [ClassicSimilarity], result of:
            0.042952858 = score(doc=2921,freq=1.0), product of:
              0.13706793 = queryWeight, product of:
                1.4654955 = boost
                5.013906 = idf(docFreq=771, maxDocs=42740)
                0.018654138 = queryNorm
              0.31336913 = fieldWeight in 2921, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.013906 = idf(docFreq=771, maxDocs=42740)
                0.0625 = fieldNorm(doc=2921)
          0.08966113 = weight(abstract_txt:document in 2921) [ClassicSimilarity], result of:
            0.08966113 = score(doc=2921,freq=5.0), product of:
              0.14987104 = queryWeight, product of:
                1.8768132 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.018654138 = queryNorm
              0.5982552 = fieldWeight in 2921, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=2921)
          0.13623555 = weight(abstract_txt:cluster in 2921) [ClassicSimilarity], result of:
            0.13623555 = score(doc=2921,freq=2.0), product of:
              0.23485048 = queryWeight, product of:
                1.9182812 = boost
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.018654138 = queryNorm
              0.5800948 = fieldWeight in 2921, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.0625 = fieldNorm(doc=2921)
          0.4018272 = weight(abstract_txt:clustering in 2921) [ClassicSimilarity], result of:
            0.4018272 = score(doc=2921,freq=6.0), product of:
              0.4219493 = queryWeight, product of:
                3.6363165 = boost
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.018654138 = queryNorm
              0.9523116 = fieldWeight in 2921, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.0625 = fieldNorm(doc=2921)
        0.2 = coord(5/25)
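For the content-based matches, each document's final score is the sum of its per-term weights multiplied by coord(matched/25), the fraction of the 25 query terms found in the abstract. The sketch below reproduces the first result from the per-term values printed in its tree. The helper names are hypothetical and only mirror the arithmetic shown above.

```python
def coord(overlap, max_overlap):
    """Fraction of query terms that matched the document."""
    return overlap / max_overlap


def document_score(term_scores, max_overlap):
    """Sum of per-term scores, scaled by the coordination factor."""
    return sum(term_scores.values()) * coord(len(term_scores), max_overlap)


# Per-term weights copied from the tree for doc 4837 (first content result).
result_1 = {
    "pairs": 0.054588184,
    "supervised": 0.07538256,
    "belonging": 0.08332921,
    "decomposition": 0.084060796,
    "documents": 0.033590086,
    "euclidean": 0.1638831,
    "document": 0.040097676,
    "clustering": 0.16404526,
}

score = document_score(result_1, max_overlap=25)  # coord(8/25) = 0.32
print(score)
```

The coordination factor explains the ranking: doc 208 has the larger raw sum (0.7411972 against 0.6989769) but matches only 5 of 25 terms, so its coord of 0.2 places it below doc 4837.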