Document (#30291)

Author
Aqeel, S.U.
Beitzel, S.M.
Jensen, E.C.
Grossman, D.
Frieder, O.
Title
On the development of name search techniques for Arabic
Source
Journal of the American Society for Information Science and Technology. 57(2006) no.6, pp. 728-739
Year
2006
Abstract
The need for effective identity matching systems has led to extensive research in the area of name search. For the most part, such work has been limited to English and other Latin-based languages. Consequently, algorithms such as Soundex and n-gram matching are of limited utility for languages such as Arabic, which has vastly different morphologic features that rely heavily on phonetic information. The dearth of work in this field is partly caused by the lack of standardized test data. Consequently, we have built a collection of 7,939 Arabic names, along with 50 training queries and 111 test queries. We use this collection to evaluate a variety of algorithms, including a derivative of Soundex tailored to Arabic (ASOUNDEX), measuring effectiveness by using standard information retrieval measures. Our results show an improvement of 70% over existing approaches.
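For orientation, the classic English Soundex that the abstract contrasts with ASOUNDEX can be sketched as follows. This is a minimal sketch of standard American Soundex only. It is not the ASOUNDEX variant from the paper, whose rules are not given in this record. The names `soundex` and `CODES` are illustrative assumptions.

```python
# Letter groups of classic American Soundex; vowels, y, h and w carry no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of an ASCII name ("" if no letters)."""
    letters = [c for c in name.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()          # the first letter is kept as-is
    prev = CODES.get(letters[0], "")     # its digit still suppresses a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                     # h and w do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code                      # a vowel resets prev to ""
        if len(result) == 4:
            break
    return result.ljust(4, "0")          # pad short codes with zeros


if __name__ == "__main__":
    for n in ("Robert", "Rupert", "Ashcraft", "Pfister"):
        print(n, soundex(n))
```

Because the letter groups encode English consonant classes, a scheme like this would presumably need a different grouping for Arabic, which is the gap the abstract describes.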

Similar documents (author)

  1. Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Frieder, O.; Grossman, D.: Temporal analysis of a very large topically categorized Web query log (2007) 4.78
    4.7840414 = sum of:
      4.7840414 = sum of:
        1.4290812 = weight(author_txt:jensen in 2061) [ClassicSimilarity], result of:
          1.4290812 = score(doc=2061,freq=1.0), product of:
            0.53627855 = queryWeight, product of:
              8.527396 = idf(docFreq=22, maxDocs=42740)
              0.0628889 = queryNorm
            2.6648114 = fieldWeight in 2061, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.527396 = idf(docFreq=22, maxDocs=42740)
              0.3125 = fieldNorm(doc=2061)
        1.6194109 = weight(author_txt:frieder in 2061) [ClassicSimilarity], result of:
          1.6194109 = score(doc=2061,freq=1.0), product of:
            0.5828953 = queryWeight, product of:
              1.0425576 = boost
              8.890302 = idf(docFreq=15, maxDocs=42740)
              0.0628889 = queryNorm
            2.7782192 = fieldWeight in 2061, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.890302 = idf(docFreq=15, maxDocs=42740)
              0.3125 = fieldNorm(doc=2061)
        1.7355493 = weight(author_txt:grossman in 2061) [ClassicSimilarity], result of:
          1.7355493 = score(doc=2061,freq=1.0), product of:
            0.61044115 = queryWeight, product of:
              1.0669073 = boost
              9.097941 = idf(docFreq=12, maxDocs=42740)
              0.0628889 = queryNorm
            2.8431067 = fieldWeight in 2061, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.097941 = idf(docFreq=12, maxDocs=42740)
              0.3125 = fieldNorm(doc=2061)
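
The tree above follows the ClassicSimilarity (TF-IDF) arithmetic: each term score is queryWeight times fieldWeight. A minimal sketch that recomputes the term scores from the printed inputs is given below. The function names are illustrative assumptions, not Lucene API. The idf form 1 + ln(maxDocs / (docFreq + 1)) is assumed because it matches the idf values shown here. Lucene works in 32-bit floats, so the last digits may differ slightly.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf form; it agrees with the idf lines printed in the tree.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                      # tf(freq) = sqrt(termFreq)
    query_weight = boost * idf * query_norm   # "queryWeight, product of"
    field_weight = tf * idf * field_norm      # "fieldWeight ..., product of"
    return query_weight * field_weight


if __name__ == "__main__":
    # Inputs copied from the author_txt:jensen branch for doc 2061.
    jensen = term_score(freq=1.0, doc_freq=22, max_docs=42740,
                        query_norm=0.0628889, field_norm=0.3125)
    print(f"jensen: {jensen:.5f}")
```

With these inputs the result should land close to the 1.4290812 shown for `author_txt:jensen`. Passing `boost=1.0425576` and `doc_freq=15` should likewise approximate the `frieder` branch.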
    
  2. Cathey, R.J.; Jensen, E.C.; Beitzel, S.M.; Frieder, O.; Grossman, D.: Exploiting parallelism to support scalable hierarchical clustering (2007) 4.78
    4.7840414 = sum of:
      4.7840414 = sum of:
        1.4290812 = weight(author_txt:jensen in 2449) [ClassicSimilarity], result of:
          1.4290812 = score(doc=2449,freq=1.0), product of:
            0.53627855 = queryWeight, product of:
              8.527396 = idf(docFreq=22, maxDocs=42740)
              0.0628889 = queryNorm
            2.6648114 = fieldWeight in 2449, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.527396 = idf(docFreq=22, maxDocs=42740)
              0.3125 = fieldNorm(doc=2449)
        1.6194109 = weight(author_txt:frieder in 2449) [ClassicSimilarity], result of:
          1.6194109 = score(doc=2449,freq=1.0), product of:
            0.5828953 = queryWeight, product of:
              1.0425576 = boost
              8.890302 = idf(docFreq=15, maxDocs=42740)
              0.0628889 = queryNorm
            2.7782192 = fieldWeight in 2449, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.890302 = idf(docFreq=15, maxDocs=42740)
              0.3125 = fieldNorm(doc=2449)
        1.7355493 = weight(author_txt:grossman in 2449) [ClassicSimilarity], result of:
          1.7355493 = score(doc=2449,freq=1.0), product of:
            0.61044115 = queryWeight, product of:
              1.0669073 = boost
              9.097941 = idf(docFreq=12, maxDocs=42740)
              0.0628889 = queryNorm
            2.8431067 = fieldWeight in 2449, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.097941 = idf(docFreq=12, maxDocs=42740)
              0.3125 = fieldNorm(doc=2449)
    
  3. Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O.; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 3.83
    3.827233 = sum of:
      3.827233 = sum of:
        1.1432649 = weight(author_txt:jensen in 3503) [ClassicSimilarity], result of:
          1.1432649 = score(doc=3503,freq=1.0), product of:
            0.53627855 = queryWeight, product of:
              8.527396 = idf(docFreq=22, maxDocs=42740)
              0.0628889 = queryNorm
            2.131849 = fieldWeight in 3503, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.527396 = idf(docFreq=22, maxDocs=42740)
              0.25 = fieldNorm(doc=3503)
        1.2955288 = weight(author_txt:frieder in 3503) [ClassicSimilarity], result of:
          1.2955288 = score(doc=3503,freq=1.0), product of:
            0.5828953 = queryWeight, product of:
              1.0425576 = boost
              8.890302 = idf(docFreq=15, maxDocs=42740)
              0.0628889 = queryNorm
            2.2225754 = fieldWeight in 3503, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.890302 = idf(docFreq=15, maxDocs=42740)
              0.25 = fieldNorm(doc=3503)
        1.3884394 = weight(author_txt:grossman in 3503) [ClassicSimilarity], result of:
          1.3884394 = score(doc=3503,freq=1.0), product of:
            0.61044115 = queryWeight, product of:
              1.0669073 = boost
              9.097941 = idf(docFreq=12, maxDocs=42740)
              0.0628889 = queryNorm
            2.2744853 = fieldWeight in 3503, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.097941 = idf(docFreq=12, maxDocs=42740)
              0.25 = fieldNorm(doc=3503)
    
  4. Grossman, D.A.; Frieder, O.: Information retrieval : algorithms and heuristics (1998) 3.58
    3.5786242 = sum of:
      3.5786242 = product of:
        5.367936 = sum of:
          2.5910575 = weight(author_txt:frieder in 3183) [ClassicSimilarity], result of:
            2.5910575 = score(doc=3183,freq=1.0), product of:
              0.5828953 = queryWeight, product of:
                1.0425576 = boost
                8.890302 = idf(docFreq=15, maxDocs=42740)
                0.0628889 = queryNorm
              4.445151 = fieldWeight in 3183, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.890302 = idf(docFreq=15, maxDocs=42740)
                0.5 = fieldNorm(doc=3183)
          2.7768788 = weight(author_txt:grossman in 3183) [ClassicSimilarity], result of:
            2.7768788 = score(doc=3183,freq=1.0), product of:
              0.61044115 = queryWeight, product of:
                1.0669073 = boost
                9.097941 = idf(docFreq=12, maxDocs=42740)
                0.0628889 = queryNorm
              4.5489707 = fieldWeight in 3183, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.097941 = idf(docFreq=12, maxDocs=42740)
                0.5 = fieldNorm(doc=3183)
        0.6666667 = coord(2/3)
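
The `coord(2/3)` line above scales the summed term scores by the fraction of query clauses that matched, here two of the three author terms. A small sketch of that final step, using the numbers printed in the tree, is given below. `combine_with_coord` is a hypothetical helper name, not a Lucene function.

```python
def combine_with_coord(term_scores, total_clauses):
    """Sum the matched term scores and scale by matched / total clauses."""
    matched = len(term_scores)
    coord = matched / total_clauses      # coord(overlap / maxOverlap)
    return sum(term_scores) * coord


if __name__ == "__main__":
    # frieder and grossman matched; jensen did not (doc 3183).
    score = combine_with_coord([2.5910575, 2.7768788], total_clauses=3)
    print(f"{score:.5f}")
```

This explains why the two-author book records rank below the three-author papers even though their individual term scores are higher: the missing clause costs a third of the sum.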
    
  5. Grossman, D.A.; Frieder, O.: Information retrieval : algorithms and heuristics (2004) 3.58
    3.5786242 = sum of:
      3.5786242 = product of:
        5.367936 = sum of:
          2.5910575 = weight(author_txt:frieder in 3487) [ClassicSimilarity], result of:
            2.5910575 = score(doc=3487,freq=1.0), product of:
              0.5828953 = queryWeight, product of:
                1.0425576 = boost
                8.890302 = idf(docFreq=15, maxDocs=42740)
                0.0628889 = queryNorm
              4.445151 = fieldWeight in 3487, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.890302 = idf(docFreq=15, maxDocs=42740)
                0.5 = fieldNorm(doc=3487)
          2.7768788 = weight(author_txt:grossman in 3487) [ClassicSimilarity], result of:
            2.7768788 = score(doc=3487,freq=1.0), product of:
              0.61044115 = queryWeight, product of:
                1.0669073 = boost
                9.097941 = idf(docFreq=12, maxDocs=42740)
                0.0628889 = queryNorm
              4.5489707 = fieldWeight in 3487, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.097941 = idf(docFreq=12, maxDocs=42740)
                0.5 = fieldNorm(doc=3487)
        0.6666667 = coord(2/3)
    

Similar documents (content)

  1. Kanan, T.; Fox, E.A.: Automated Arabic text classification with P-Stemmer, machine learning, and a tailored news article taxonomy (2016) 0.23
    0.23421086 = sum of:
      0.23421086 = product of:
        0.9758786 = sum of:
          0.07093971 = weight(abstract_txt:tailored in 5152) [ClassicSimilarity], result of:
            0.07093971 = score(doc=5152,freq=1.0), product of:
              0.14996743 = queryWeight, product of:
                1.1099684 = boost
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.017851466 = queryNorm
              0.4730341 = fieldWeight in 5152, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.0625 = fieldNorm(doc=5152)
          0.018341249 = weight(abstract_txt:work in 5152) [ClassicSimilarity], result of:
            0.018341249 = score(doc=5152,freq=1.0), product of:
              0.07668315 = queryWeight, product of:
                1.1224761 = boost
                3.826916 = idf(docFreq=2529, maxDocs=42740)
                0.017851466 = queryNorm
              0.23918225 = fieldWeight in 5152, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.826916 = idf(docFreq=2529, maxDocs=42740)
                0.0625 = fieldNorm(doc=5152)
          0.03315123 = weight(abstract_txt:collection in 5152) [ClassicSimilarity], result of:
            0.03315123 = score(doc=5152,freq=1.0), product of:
              0.11378381 = queryWeight, product of:
                1.3673112 = boost
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.017851466 = queryNorm
              0.2913528 = fieldWeight in 5152, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.0625 = fieldNorm(doc=5152)
          0.0426103 = weight(abstract_txt:test in 5152) [ClassicSimilarity], result of:
            0.0426103 = score(doc=5152,freq=1.0), product of:
              0.13451077 = queryWeight, product of:
                1.4866395 = boost
                5.068477 = idf(docFreq=730, maxDocs=42740)
                0.017851466 = queryNorm
              0.31677982 = fieldWeight in 5152, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.068477 = idf(docFreq=730, maxDocs=42740)
                0.0625 = fieldNorm(doc=5152)
          0.055068173 = weight(abstract_txt:limited in 5152) [ClassicSimilarity], result of:
            0.055068173 = score(doc=5152,freq=1.0), product of:
              0.15959322 = queryWeight, product of:
                1.6193262 = boost
                5.520853 = idf(docFreq=464, maxDocs=42740)
                0.017851466 = queryNorm
              0.34505332 = fieldWeight in 5152, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.520853 = idf(docFreq=464, maxDocs=42740)
                0.0625 = fieldNorm(doc=5152)
          0.75576794 = weight(abstract_txt:arabic in 5152) [ClassicSimilarity], result of:
            0.75576794 = score(doc=5152,freq=7.0), product of:
              0.6025369 = queryWeight, product of:
                4.4497333 = boost
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.017851466 = queryNorm
              1.2543098 = fieldWeight in 5152, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.0625 = fieldNorm(doc=5152)
        0.24 = coord(6/25)
    
  2. Bellaachia, A.; Amor-Tijani, G.: Proper nouns in English-Arabic cross language information retrieval (2008) 0.15
    0.14864503 = sum of:
      0.14864503 = product of:
        0.7432251 = sum of:
          0.1450686 = weight(abstract_txt:gram in 4373) [ClassicSimilarity], result of:
            0.1450686 = score(doc=4373,freq=3.0), product of:
              0.16752484 = queryWeight, product of:
                1.1731452 = boost
                7.999329 = idf(docFreq=38, maxDocs=42740)
                0.017851466 = queryNorm
              0.8659528 = fieldWeight in 4373, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.999329 = idf(docFreq=38, maxDocs=42740)
                0.0625 = fieldNorm(doc=4373)
          0.13013309 = weight(abstract_txt:phonetic in 4373) [ClassicSimilarity], result of:
            0.13013309 = score(doc=4373,freq=1.0), product of:
              0.22473076 = queryWeight, product of:
                1.358762 = boost
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.017851466 = queryNorm
              0.5790622 = fieldWeight in 4373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.0625 = fieldNorm(doc=4373)
          0.079335876 = weight(abstract_txt:languages in 4373) [ClassicSimilarity], result of:
            0.079335876 = score(doc=4373,freq=3.0), product of:
              0.14115188 = queryWeight, product of:
                1.5228968 = boost
                5.192091 = idf(docFreq=645, maxDocs=42740)
                0.017851466 = queryNorm
              0.56206036 = fieldWeight in 4373, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.192091 = idf(docFreq=645, maxDocs=42740)
                0.0625 = fieldNorm(doc=4373)
          0.103034094 = weight(abstract_txt:matching in 4373) [ClassicSimilarity], result of:
            0.103034094 = score(doc=4373,freq=2.0), product of:
              0.19233485 = queryWeight, product of:
                1.7776904 = boost
                6.060772 = idf(docFreq=270, maxDocs=42740)
                0.017851466 = queryNorm
              0.53570163 = fieldWeight in 4373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.060772 = idf(docFreq=270, maxDocs=42740)
                0.0625 = fieldNorm(doc=4373)
          0.28565344 = weight(abstract_txt:arabic in 4373) [ClassicSimilarity], result of:
            0.28565344 = score(doc=4373,freq=1.0), product of:
              0.6025369 = queryWeight, product of:
                4.4497333 = boost
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.017851466 = queryNorm
              0.47408456 = fieldWeight in 4373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.0625 = fieldNorm(doc=4373)
        0.2 = coord(5/25)
    
  3. Rushdi-Saleh, M.; Martín-Valdivia, M.T.; Ureña-López, L.A.; Perea-Ortega, J.M.: OCA: Opinion corpus for Arabic (2011) 0.15
    0.14687082 = sum of:
      0.14687082 = product of:
        0.7343541 = sum of:
          0.025221426 = weight(abstract_txt:such in 1361) [ClassicSimilarity], result of:
            0.025221426 = score(doc=1361,freq=1.0), product of:
              0.09354434 = queryWeight, product of:
                1.5183837 = boost
                3.451136 = idf(docFreq=3683, maxDocs=42740)
                0.017851466 = queryNorm
              0.26962 = fieldWeight in 1361, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.451136 = idf(docFreq=3683, maxDocs=42740)
                0.078125 = fieldNorm(doc=1361)
          0.057255734 = weight(abstract_txt:languages in 1361) [ClassicSimilarity], result of:
            0.057255734 = score(doc=1361,freq=1.0), product of:
              0.14115188 = queryWeight, product of:
                1.5228968 = boost
                5.192091 = idf(docFreq=645, maxDocs=42740)
                0.017851466 = queryNorm
              0.4056321 = fieldWeight in 1361, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.192091 = idf(docFreq=645, maxDocs=42740)
                0.078125 = fieldNorm(doc=1361)
          0.068835214 = weight(abstract_txt:limited in 1361) [ClassicSimilarity], result of:
            0.068835214 = score(doc=1361,freq=1.0), product of:
              0.15959322 = queryWeight, product of:
                1.6193262 = boost
                5.520853 = idf(docFreq=464, maxDocs=42740)
                0.017851466 = queryNorm
              0.43131664 = fieldWeight in 1361, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.520853 = idf(docFreq=464, maxDocs=42740)
                0.078125 = fieldNorm(doc=1361)
          0.07807292 = weight(abstract_txt:algorithms in 1361) [ClassicSimilarity], result of:
            0.07807292 = score(doc=1361,freq=1.0), product of:
              0.17356984 = queryWeight, product of:
                1.6887459 = boost
                5.757529 = idf(docFreq=366, maxDocs=42740)
                0.017851466 = queryNorm
              0.44980693 = fieldWeight in 1361, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.757529 = idf(docFreq=366, maxDocs=42740)
                0.078125 = fieldNorm(doc=1361)
          0.50496876 = weight(abstract_txt:arabic in 1361) [ClassicSimilarity], result of:
            0.50496876 = score(doc=1361,freq=2.0), product of:
              0.6025369 = queryWeight, product of:
                4.4497333 = boost
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.017851466 = queryNorm
              0.83807105 = fieldWeight in 1361, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.078125 = fieldNorm(doc=1361)
        0.2 = coord(5/25)
    
  4. Shaalan, K.; Raza, H.: NERA: Named Entity Recognition for Arabic (2009) 0.14
    0.14264794 = sum of:
      0.14264794 = product of:
        0.89154965 = sum of:
          0.040079013 = weight(abstract_txt:languages in 4954) [ClassicSimilarity], result of:
            0.040079013 = score(doc=4954,freq=1.0), product of:
              0.14115188 = queryWeight, product of:
                1.5228968 = boost
                5.192091 = idf(docFreq=645, maxDocs=42740)
                0.017851466 = queryNorm
              0.28394246 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.192091 = idf(docFreq=645, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
          0.06814338 = weight(abstract_txt:limited in 4954) [ClassicSimilarity], result of:
            0.06814338 = score(doc=4954,freq=2.0), product of:
              0.15959322 = queryWeight, product of:
                1.6193262 = boost
                5.520853 = idf(docFreq=464, maxDocs=42740)
                0.017851466 = queryNorm
              0.4269817 = fieldWeight in 4954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.520853 = idf(docFreq=464, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
          0.12203027 = weight(abstract_txt:name in 4954) [ClassicSimilarity], result of:
            0.12203027 = score(doc=4954,freq=5.0), product of:
              0.17340583 = queryWeight, product of:
                1.6879476 = boost
                5.7548075 = idf(docFreq=367, maxDocs=42740)
                0.017851466 = queryNorm
              0.7037265 = fieldWeight in 4954, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.7548075 = idf(docFreq=367, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
          0.66129696 = weight(abstract_txt:arabic in 4954) [ClassicSimilarity], result of:
            0.66129696 = score(doc=4954,freq=7.0), product of:
              0.6025369 = queryWeight, product of:
                4.4497333 = boost
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.017851466 = queryNorm
              1.0975211 = fieldWeight in 4954, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
        0.16 = coord(4/25)
    
  5. Kanaan, G.; Al-Shalabi, R.; Ghwanmeh, S.; Al-Ma'adeed, H.: A comparison of text-classification techniques applied to Arabic text (2009) 0.13
    0.13417402 = sum of:
      0.13417402 = product of:
        1.1181169 = sum of:
          0.027511872 = weight(abstract_txt:work in 97) [ClassicSimilarity], result of:
            0.027511872 = score(doc=97,freq=1.0), product of:
              0.07668315 = queryWeight, product of:
                1.1224761 = boost
                3.826916 = idf(docFreq=2529, maxDocs=42740)
                0.017851466 = queryNorm
              0.35877338 = fieldWeight in 97, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.826916 = idf(docFreq=2529, maxDocs=42740)
                0.09375 = fieldNorm(doc=97)
          0.13249412 = weight(abstract_txt:algorithms in 97) [ClassicSimilarity], result of:
            0.13249412 = score(doc=97,freq=2.0), product of:
              0.17356984 = queryWeight, product of:
                1.6887459 = boost
                5.757529 = idf(docFreq=366, maxDocs=42740)
                0.017851466 = queryNorm
              0.7633476 = fieldWeight in 97, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.757529 = idf(docFreq=366, maxDocs=42740)
                0.09375 = fieldNorm(doc=97)
          0.9581108 = weight(abstract_txt:arabic in 97) [ClassicSimilarity], result of:
            0.9581108 = score(doc=97,freq=5.0), product of:
              0.6025369 = queryWeight, product of:
                4.4497333 = boost
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.017851466 = queryNorm
              1.590128 = fieldWeight in 97, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.09375 = fieldNorm(doc=97)
        0.12 = coord(3/25)