Document (#26421)

Author
Navarro, G.
Baeza-Yates, R.
Azevedo Arcoverde, J.M.
Title
Matchsimile : a flexible approximate matching tool for searching proper names
Source
Journal of the American Society for Information Science and Technology. 54(2003) no.1, pp. 3-15
Year
2003
Abstract
We present the architecture and algorithms behind Matchsimile, an approximate string matching lookup tool especially designed for extracting person and company names from large texts. Part of a larger information extraction environment, this specific engine receives a large set of proper names to search for, a text to search, and search options; and outputs all the occurrences of the names found in the text. Beyond the similarity search capabilities applied at the intraword level, the tool considers a set of specific person name formation rules at the word level, such as combination, abbreviation, duplicity detections, ordering, word omission and insertion, among others. This engine is used in a successful commercial application (also named Matchsimile), which allows searching for lawyer names in official law publications.
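
The intraword similarity search mentioned in the abstract can be illustrated with a textbook dynamic-programming scan. This is only a minimal sketch of approximate matching under edit distance, not Matchsimile's actual algorithm, which the abstract does not specify. The function name `approx_find`, the threshold parameter `k`, and the sample text are hypothetical.

```python
def approx_find(pattern, text, k):
    """Return 1-based end positions in `text` where `pattern` occurs
    with at most `k` edits (insertions, deletions, substitutions).

    Hypothetical helper: classic column-wise dynamic programming,
    chosen for illustration only.
    """
    m = len(pattern)
    # col[i] = fewest edits needed to match pattern[:i] against some
    # substring of the text ending at the current position.
    col = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        # col[0] stays 0 because a match may start at any text position.
        prev_diag = col[0]
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur = min(prev_diag + cost,  # match or substitution
                      col[i] + 1,        # extra character in the text
                      col[i - 1] + 1)    # pattern character missing from the text
            prev_diag = col[i]
            col[i] = cur
        if col[m] <= k:
            hits.append(j)
    return hits


if __name__ == "__main__":
    # A misspelled surname is still found when one edit is allowed.
    print(approx_find("Navarro", "by G. Navaro and", 1))
```

The word-level rules listed in the abstract (abbreviation, ordering, word omission and insertion) would sit on top of a per-word matcher like this one; they are not modeled here.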

Similar documents (author)

  1. Baeza-Yates, R.; Navarro, G.: Block addressing indices for approximate text retrieval (2000) 4.09
    4.093771 = sum of:
      4.093771 = product of:
        5.4583616 = sum of:
          1.7455504 = weight(author_txt:navarro in 4295) [ClassicSimilarity], result of:
            1.7455504 = score(doc=4295,freq=1.0), product of:
              0.455853 = queryWeight, product of:
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.05208291 = queryNorm
              3.829196 = fieldWeight in 4295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.4375 = fieldNorm(doc=4295)
          1.7780997 = weight(author_txt:yates in 4295) [ClassicSimilarity], result of:
            1.7780997 = score(doc=4295,freq=1.0), product of:
              0.46150234 = queryWeight, product of:
                1.0061774 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.05208291 = queryNorm
              3.8528507 = fieldWeight in 4295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.4375 = fieldNorm(doc=4295)
          1.9347117 = weight(author_txt:baeza in 4295) [ClassicSimilarity], result of:
            1.9347117 = score(doc=4295,freq=1.0), product of:
              0.48821828 = queryWeight, product of:
                1.034891 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.05208291 = queryNorm
              3.9628005 = fieldWeight in 4295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.4375 = fieldNorm(doc=4295)
        0.75 = coord(3/4)
    
  2. Baeza-Yates, R.; Navarro, G.: XQL and proximal nodes (2002) 4.09
    4.093771 = sum of:
      4.093771 = product of:
        5.4583616 = sum of:
          1.7455504 = weight(author_txt:navarro in 454) [ClassicSimilarity], result of:
            1.7455504 = score(doc=454,freq=1.0), product of:
              0.455853 = queryWeight, product of:
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.05208291 = queryNorm
              3.829196 = fieldWeight in 454, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.752448 = idf(docFreq=18, maxDocs=44218)
                0.4375 = fieldNorm(doc=454)
          1.7780997 = weight(author_txt:yates in 454) [ClassicSimilarity], result of:
            1.7780997 = score(doc=454,freq=1.0), product of:
              0.46150234 = queryWeight, product of:
                1.0061774 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.05208291 = queryNorm
              3.8528507 = fieldWeight in 454, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.4375 = fieldNorm(doc=454)
          1.9347117 = weight(author_txt:baeza in 454) [ClassicSimilarity], result of:
            1.9347117 = score(doc=454,freq=1.0), product of:
              0.48821828 = queryWeight, product of:
                1.034891 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.05208291 = queryNorm
              3.9628005 = fieldWeight in 454, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.4375 = fieldNorm(doc=454)
        0.75 = coord(3/4)
    
  3. Baeza-Yates, R.A.: Introduction to data structures and algorithms related to information retrieval (1992) 2.12
    2.1216063 = sum of:
      2.1216063 = product of:
        4.2432127 = sum of:
          2.0321138 = weight(author_txt:yates in 3082) [ClassicSimilarity], result of:
            2.0321138 = score(doc=3082,freq=1.0), product of:
              0.46150234 = queryWeight, product of:
                1.0061774 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.05208291 = queryNorm
              4.403258 = fieldWeight in 3082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.5 = fieldNorm(doc=3082)
          2.2110991 = weight(author_txt:baeza in 3082) [ClassicSimilarity], result of:
            2.2110991 = score(doc=3082,freq=1.0), product of:
              0.48821828 = queryWeight, product of:
                1.034891 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.05208291 = queryNorm
              4.528915 = fieldWeight in 3082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.5 = fieldNorm(doc=3082)
        0.5 = coord(2/4)
    
  4. Baeza-Yates, R.A.: String searching algorithms (1992) 2.12
    2.1216063 = sum of:
      2.1216063 = product of:
        4.2432127 = sum of:
          2.0321138 = weight(author_txt:yates in 3505) [ClassicSimilarity], result of:
            2.0321138 = score(doc=3505,freq=1.0), product of:
              0.46150234 = queryWeight, product of:
                1.0061774 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.05208291 = queryNorm
              4.403258 = fieldWeight in 3505, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.5 = fieldNorm(doc=3505)
          2.2110991 = weight(author_txt:baeza in 3505) [ClassicSimilarity], result of:
            2.2110991 = score(doc=3505,freq=1.0), product of:
              0.48821828 = queryWeight, product of:
                1.034891 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.05208291 = queryNorm
              4.528915 = fieldWeight in 3505, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.5 = fieldNorm(doc=3505)
        0.5 = coord(2/4)
    
  5. Castillo, C.; Baeza-Yates, R.: Web retrieval and mining (2009) 1.86
    1.8564057 = sum of:
      1.8564057 = product of:
        3.7128115 = sum of:
          1.7780997 = weight(author_txt:yates in 3904) [ClassicSimilarity], result of:
            1.7780997 = score(doc=3904,freq=1.0), product of:
              0.46150234 = queryWeight, product of:
                1.0061774 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.05208291 = queryNorm
              3.8528507 = fieldWeight in 3904, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.4375 = fieldNorm(doc=3904)
          1.9347117 = weight(author_txt:baeza in 3904) [ClassicSimilarity], result of:
            1.9347117 = score(doc=3904,freq=1.0), product of:
              0.48821828 = queryWeight, product of:
                1.034891 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.05208291 = queryNorm
              3.9628005 = fieldWeight in 3904, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.4375 = fieldNorm(doc=3904)
        0.5 = coord(2/4)
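
The score trees above follow Lucene's ClassicSimilarity (TF-IDF) arithmetic, and the first author match (doc 4295) can be recomputed from the listed values. The sketch below assumes the standard ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes queryNorm, fieldNorm, and the boosts directly from the listing. The function names are illustrative and not part of any Lucene API.

```python
import math


def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # score = queryWeight * fieldWeight, as in each weight(...) subtree.
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm
    field_weight = math.sqrt(freq) * i * field_norm
    return query_weight * field_weight


# Values copied from the explanation for doc 4295.
MAX_DOCS = 44218
QUERY_NORM = 0.05208291
FIELD_NORM = 0.4375

# (term, docFreq, boost); "navarro" has no boost line, so 1.0 is assumed.
terms = [
    ("navarro", 18, 1.0),
    ("yates", 17, 1.0061774),
    ("baeza", 13, 1.034891),
]

total = sum(
    term_score(1.0, df, MAX_DOCS, QUERY_NORM, FIELD_NORM, boost)
    for _, df, boost in terms
)
coord = 3 / 4  # three of the four query terms matched
score = total * coord

if __name__ == "__main__":
    print(round(score, 4))
```

Small differences in the last digits are expected because Lucene computes these values in 32-bit floats, whereas Python uses 64-bit floats.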
    

Similar documents (content)

  1. Lutz, R.; Green, S.: Data stewardship : the care and handling of named entries (1999) 0.24
    0.24342442 = sum of:
      0.24342442 = product of:
        0.7607013 = sum of:
          0.019571027 = weight(abstract_txt:searching in 6710) [ClassicSimilarity], result of:
            0.019571027 = score(doc=6710,freq=1.0), product of:
              0.097442664 = queryWeight, product of:
                1.235733 = boost
                4.284727 = idf(docFreq=1655, maxDocs=44218)
                0.018403538 = queryNorm
              0.20084658 = fieldWeight in 6710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.284727 = idf(docFreq=1655, maxDocs=44218)
                0.046875 = fieldNorm(doc=6710)
          0.028251534 = weight(abstract_txt:specific in 6710) [ClassicSimilarity], result of:
            0.028251534 = score(doc=6710,freq=2.0), product of:
              0.0987851 = queryWeight, product of:
                1.2442161 = boost
                4.314141 = idf(docFreq=1607, maxDocs=44218)
                0.018403538 = queryNorm
              0.28598982 = fieldWeight in 6710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.314141 = idf(docFreq=1607, maxDocs=44218)
                0.046875 = fieldNorm(doc=6710)
          0.021984713 = weight(abstract_txt:large in 6710) [ClassicSimilarity], result of:
            0.021984713 = score(doc=6710,freq=1.0), product of:
              0.10529812 = queryWeight, product of:
                1.2845777 = boost
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.018403538 = queryNorm
              0.20878543 = fieldWeight in 6710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.046875 = fieldNorm(doc=6710)
          0.039951913 = weight(abstract_txt:word in 6710) [ClassicSimilarity], result of:
            0.039951913 = score(doc=6710,freq=1.0), product of:
              0.15680689 = queryWeight, product of:
                1.5675906 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.018403538 = queryNorm
              0.25478417 = fieldWeight in 6710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.046875 = fieldNorm(doc=6710)
          0.06341233 = weight(abstract_txt:person in 6710) [ClassicSimilarity], result of:
            0.06341233 = score(doc=6710,freq=1.0), product of:
              0.21336469 = queryWeight, product of:
                1.828569 = boost
                6.340301 = idf(docFreq=211, maxDocs=44218)
                0.018403538 = queryNorm
              0.2972016 = fieldWeight in 6710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.340301 = idf(docFreq=211, maxDocs=44218)
                0.046875 = fieldNorm(doc=6710)
          0.06934303 = weight(abstract_txt:proper in 6710) [ClassicSimilarity], result of:
            0.06934303 = score(doc=6710,freq=1.0), product of:
              0.22646892 = queryWeight, product of:
                1.883885 = boost
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.018403538 = queryNorm
              0.30619225 = fieldWeight in 6710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.046875 = fieldNorm(doc=6710)
          0.024357004 = weight(abstract_txt:search in 6710) [ClassicSimilarity], result of:
            0.024357004 = score(doc=6710,freq=1.0), product of:
              0.1420472 = queryWeight, product of:
                2.109995 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.018403538 = queryNorm
              0.17147121 = fieldWeight in 6710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.046875 = fieldNorm(doc=6710)
          0.49382976 = weight(abstract_txt:names in 6710) [ClassicSimilarity], result of:
            0.49382976 = score(doc=6710,freq=16.0), product of:
              0.45150745 = queryWeight, product of:
                4.205838 = boost
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.018403538 = queryNorm
              1.0937356 = fieldWeight in 6710, product of:
                4.0 = tf(freq=16.0), with freq of:
                  16.0 = termFreq=16.0
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.046875 = fieldNorm(doc=6710)
        0.32 = coord(8/25)
    
  2. Baeza-Yates, R.; Navarro, G.: Block addressing indices for approximate text retrieval (2000) 0.21
    0.20618528 = sum of:
      0.20618528 = product of:
        0.644329 = sum of:
          0.06126838 = weight(abstract_txt:string in 4295) [ClassicSimilarity], result of:
            0.06126838 = score(doc=4295,freq=1.0), product of:
              0.13662449 = queryWeight, product of:
                1.034664 = boost
                7.1750984 = idf(docFreq=91, maxDocs=44218)
                0.018403538 = queryNorm
              0.44844365 = fieldWeight in 4295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1750984 = idf(docFreq=91, maxDocs=44218)
                0.0625 = fieldNorm(doc=4295)
          0.05373397 = weight(abstract_txt:text in 4295) [ClassicSimilarity], result of:
            0.05373397 = score(doc=4295,freq=6.0), product of:
              0.08679535 = queryWeight, product of:
                1.1662679 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.018403538 = queryNorm
              0.6190881 = fieldWeight in 4295, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=4295)
          0.04519735 = weight(abstract_txt:searching in 4295) [ClassicSimilarity], result of:
            0.04519735 = score(doc=4295,freq=3.0), product of:
              0.097442664 = queryWeight, product of:
                1.235733 = boost
                4.284727 = idf(docFreq=1655, maxDocs=44218)
                0.018403538 = queryNorm
              0.4638353 = fieldWeight in 4295, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.284727 = idf(docFreq=1655, maxDocs=44218)
                0.0625 = fieldNorm(doc=4295)
          0.02931295 = weight(abstract_txt:large in 4295) [ClassicSimilarity], result of:
            0.02931295 = score(doc=4295,freq=1.0), product of:
              0.10529812 = queryWeight, product of:
                1.2845777 = boost
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.018403538 = queryNorm
              0.27838057 = fieldWeight in 4295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.0625 = fieldNorm(doc=4295)
          0.05326922 = weight(abstract_txt:word in 4295) [ClassicSimilarity], result of:
            0.05326922 = score(doc=4295,freq=1.0), product of:
              0.15680689 = queryWeight, product of:
                1.5675906 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.018403538 = queryNorm
              0.33971223 = fieldWeight in 4295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=4295)
          0.073383674 = weight(abstract_txt:matching in 4295) [ClassicSimilarity], result of:
            0.073383674 = score(doc=4295,freq=1.0), product of:
              0.1941395 = queryWeight, product of:
                1.7442431 = boost
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.018403538 = queryNorm
              0.37799457 = fieldWeight in 4295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.0625 = fieldNorm(doc=4295)
          0.04592801 = weight(abstract_txt:search in 4295) [ClassicSimilarity], result of:
            0.04592801 = score(doc=4295,freq=2.0), product of:
              0.1420472 = queryWeight, product of:
                2.109995 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.018403538 = queryNorm
              0.3233292 = fieldWeight in 4295, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.0625 = fieldNorm(doc=4295)
          0.28223544 = weight(abstract_txt:approximate in 4295) [ClassicSimilarity], result of:
            0.28223544 = score(doc=4295,freq=3.0), product of:
              0.33043158 = queryWeight, product of:
                2.2755735 = boost
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.018403538 = queryNorm
              0.8541419 = fieldWeight in 4295, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.0625 = fieldNorm(doc=4295)
        0.32 = coord(8/25)
    
  3. Järvelin, A.; Keskustalo, H.; Sormunen, E.; Saastamoinen, M.; Kettunen, K.: Information retrieval from historical newspaper collections in highly inflectional languages : a query expansion approach (2016) 0.19
    0.1885005 = sum of:
      0.1885005 = product of:
        0.78541875 = sum of:
          0.12253676 = weight(abstract_txt:string in 3223) [ClassicSimilarity], result of:
            0.12253676 = score(doc=3223,freq=4.0), product of:
              0.13662449 = queryWeight, product of:
                1.034664 = boost
                7.1750984 = idf(docFreq=91, maxDocs=44218)
                0.018403538 = queryNorm
              0.8968873 = fieldWeight in 3223, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.1750984 = idf(docFreq=91, maxDocs=44218)
                0.0625 = fieldNorm(doc=3223)
          0.06776468 = weight(abstract_txt:occurrences in 3223) [ClassicSimilarity], result of:
            0.06776468 = score(doc=3223,freq=1.0), product of:
              0.14611894 = queryWeight, product of:
                1.0700113 = boost
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.018403538 = queryNorm
              0.46376383 = fieldWeight in 3223, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.0625 = fieldNorm(doc=3223)
          0.030187584 = weight(abstract_txt:level in 3223) [ClassicSimilarity], result of:
            0.030187584 = score(doc=3223,freq=1.0), product of:
              0.10738242 = queryWeight, product of:
                1.297229 = boost
                4.497956 = idf(docFreq=1337, maxDocs=44218)
                0.018403538 = queryNorm
              0.28112224 = fieldWeight in 3223, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.497956 = idf(docFreq=1337, maxDocs=44218)
                0.0625 = fieldNorm(doc=3223)
          0.09226499 = weight(abstract_txt:word in 3223) [ClassicSimilarity], result of:
            0.09226499 = score(doc=3223,freq=3.0), product of:
              0.15680689 = queryWeight, product of:
                1.5675906 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.018403538 = queryNorm
              0.5883988 = fieldWeight in 3223, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0625 = fieldNorm(doc=3223)
          0.14676735 = weight(abstract_txt:matching in 3223) [ClassicSimilarity], result of:
            0.14676735 = score(doc=3223,freq=4.0), product of:
              0.1941395 = queryWeight, product of:
                1.7442431 = boost
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.018403538 = queryNorm
              0.75598913 = fieldWeight in 3223, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.0625 = fieldNorm(doc=3223)
          0.32589743 = weight(abstract_txt:approximate in 3223) [ClassicSimilarity], result of:
            0.32589743 = score(doc=3223,freq=4.0), product of:
              0.33043158 = queryWeight, product of:
                2.2755735 = boost
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.018403538 = queryNorm
              0.9862781 = fieldWeight in 3223, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.0625 = fieldNorm(doc=3223)
        0.24 = coord(6/25)
    
  4. Mustafa, S.H.: Word-oriented approximate string matching using occurrence heuristic tables : a heuristic for searching Arabic text (2005) 0.19
    0.1869566 = sum of:
      0.1869566 = product of:
        0.66770214 = sum of:
          0.07658548 = weight(abstract_txt:string in 1715) [ClassicSimilarity], result of:
            0.07658548 = score(doc=1715,freq=1.0), product of:
              0.13662449 = queryWeight, product of:
                1.034664 = boost
                7.1750984 = idf(docFreq=91, maxDocs=44218)
                0.018403538 = queryNorm
              0.56055456 = fieldWeight in 1715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1750984 = idf(docFreq=91, maxDocs=44218)
                0.078125 = fieldNorm(doc=1715)
          0.084705845 = weight(abstract_txt:occurrences in 1715) [ClassicSimilarity], result of:
            0.084705845 = score(doc=1715,freq=1.0), product of:
              0.14611894 = queryWeight, product of:
                1.0700113 = boost
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.018403538 = queryNorm
              0.57970476 = fieldWeight in 1715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.078125 = fieldNorm(doc=1715)
          0.027421003 = weight(abstract_txt:text in 1715) [ClassicSimilarity], result of:
            0.027421003 = score(doc=1715,freq=1.0), product of:
              0.08679535 = queryWeight, product of:
                1.1662679 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.018403538 = queryNorm
              0.3159271 = fieldWeight in 1715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=1715)
          0.032618377 = weight(abstract_txt:searching in 1715) [ClassicSimilarity], result of:
            0.032618377 = score(doc=1715,freq=1.0), product of:
              0.097442664 = queryWeight, product of:
                1.235733 = boost
                4.284727 = idf(docFreq=1655, maxDocs=44218)
                0.018403538 = queryNorm
              0.3347443 = fieldWeight in 1715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.284727 = idf(docFreq=1655, maxDocs=44218)
                0.078125 = fieldNorm(doc=1715)
          0.066586524 = weight(abstract_txt:word in 1715) [ClassicSimilarity], result of:
            0.066586524 = score(doc=1715,freq=1.0), product of:
              0.15680689 = queryWeight, product of:
                1.5675906 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.018403538 = queryNorm
              0.4246403 = fieldWeight in 1715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=1715)
          0.09172959 = weight(abstract_txt:matching in 1715) [ClassicSimilarity], result of:
            0.09172959 = score(doc=1715,freq=1.0), product of:
              0.1941395 = queryWeight, product of:
                1.7442431 = boost
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.018403538 = queryNorm
              0.4724932 = fieldWeight in 1715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.078125 = fieldNorm(doc=1715)
          0.28805533 = weight(abstract_txt:approximate in 1715) [ClassicSimilarity], result of:
            0.28805533 = score(doc=1715,freq=2.0), product of:
              0.33043158 = queryWeight, product of:
                2.2755735 = boost
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.018403538 = queryNorm
              0.8717549 = fieldWeight in 1715, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.078125 = fieldNorm(doc=1715)
        0.28 = coord(7/25)
    
  5. Thelwall, M.: Text characteristics of English language university Web sites (2005) 0.18
    0.18364188 = sum of:
      0.18364188 = product of:
        0.7651745 = sum of:
          0.033294752 = weight(abstract_txt:specific in 3463) [ClassicSimilarity], result of:
            0.033294752 = score(doc=3463,freq=1.0), product of:
              0.0987851 = queryWeight, product of:
                1.2442161 = boost
                4.314141 = idf(docFreq=1607, maxDocs=44218)
                0.018403538 = queryNorm
              0.33704224 = fieldWeight in 3463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.314141 = idf(docFreq=1607, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
          0.09416756 = weight(abstract_txt:word in 3463) [ClassicSimilarity], result of:
            0.09416756 = score(doc=3463,freq=2.0), product of:
              0.15680689 = queryWeight, product of:
                1.5675906 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.018403538 = queryNorm
              0.60053205 = fieldWeight in 3463, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
          0.0700207 = weight(abstract_txt:engine in 3463) [ClassicSimilarity], result of:
            0.0700207 = score(doc=3463,freq=1.0), product of:
              0.16215308 = queryWeight, product of:
                1.5940894 = boost
                5.5272765 = idf(docFreq=477, maxDocs=44218)
                0.018403538 = queryNorm
              0.4318185 = fieldWeight in 3463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5272765 = idf(docFreq=477, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
          0.115571715 = weight(abstract_txt:proper in 3463) [ClassicSimilarity], result of:
            0.115571715 = score(doc=3463,freq=1.0), product of:
              0.22646892 = queryWeight, product of:
                1.883885 = boost
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.018403538 = queryNorm
              0.5103204 = fieldWeight in 3463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
          0.040595006 = weight(abstract_txt:search in 3463) [ClassicSimilarity], result of:
            0.040595006 = score(doc=3463,freq=1.0), product of:
              0.1420472 = queryWeight, product of:
                2.109995 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.018403538 = queryNorm
              0.28578535 = fieldWeight in 3463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
          0.41152477 = weight(abstract_txt:names in 3463) [ClassicSimilarity], result of:
            0.41152477 = score(doc=3463,freq=4.0), product of:
              0.45150745 = queryWeight, product of:
                4.205838 = boost
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.018403538 = queryNorm
              0.9114463 = fieldWeight in 3463, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.8332562 = idf(docFreq=351, maxDocs=44218)
                0.078125 = fieldNorm(doc=3463)
        0.24 = coord(6/25)