Document (#23336)

Author
Wolff, J.G.
Title
¬A scalable technique for best-match retrieval of sequential information using metrics-guided search
Source
Journal of information science. 20(1994) no.1, p.16-28
Year
1994
Abstract
Describes a new technique for retrieving information by finding the best match or matches between a textual query and a textual database. The technique uses principles of beam search with a measure of probability to guide the search and prune the search tree. Unlike many methods for comparing strings, the method gives a set of alternative matches, graded by the quality of the matching. The new technique is embodied in a software simulation, SP21, which runs on a conventional computer. Presents examples showing best-match retrieval of information from a textual database. Presents analytic and empirical evidence on the performance of the technique. The technique lends itself well to parallel processing. Discusses planned developments.
Theme
Retrieval algorithms
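
The abstract above describes a beam search over partial matches, with a score that guides the search and prunes the search tree, and a graded set of alternative matches as the result. The following is a minimal Python sketch of that general idea only. It is not SP21: the probability measure is replaced here by a plain count of matched characters, and the names `beam_match` and `best_matches`, the beam width, and the sample strings are all hypothetical.

```python
def beam_match(query: str, target: str, beam_width: int = 8) -> int:
    """Estimate how many query characters can be matched, in order, in target.

    Each state is (hits, qi, ti): characters matched so far and the next
    positions to compare in query and target.
    """
    beam = [(0, 0, 0)]
    best = 0
    while beam:
        candidates = set()
        for hits, qi, ti in beam:
            best = max(best, hits)
            if qi >= len(query) or ti >= len(target):
                continue  # this partial alignment cannot be extended
            if query[qi] == target[ti]:
                # Characters agree: take the match and advance both strings.
                candidates.add((hits + 1, qi + 1, ti + 1))
            else:
                # Characters differ: try skipping one character on either side.
                candidates.add((hits, qi + 1, ti))
                candidates.add((hits, qi, ti + 1))
        # Prune: keep only the highest-scoring partial alignments.
        beam = sorted(candidates, reverse=True)[:beam_width]
    return best


def best_matches(query, database, top_n=3, beam_width=8):
    """Return up to top_n (grade, entry) pairs, best grade first."""
    graded = [
        (beam_match(query, entry, beam_width) / max(len(query), 1), entry)
        for entry in database
    ]
    graded.sort(key=lambda pair: pair[0], reverse=True)
    return graded[:top_n]


database = ["beam search", "information retrieval", "parallel processing"]
for grade, entry in best_matches("infomation retreival", database):
    print(f"{grade:.2f}  {entry}")
```

A narrower beam prunes more aggressively and runs faster, but it may underestimate the quality of a match; a wider beam explores more alternatives at higher cost.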

Similar documents (content)

  1. Loughran, H.: ¬A review of nearest neighbour information retrieval (1994) 0.20
    0.19730894 = sum of:
      0.19730894 = product of:
        0.8221206 = sum of:
          0.029043714 = weight(abstract_txt:retrieval in 685) [ClassicSimilarity], result of:
            0.029043714 = score(doc=685,freq=1.0), product of:
              0.06706004 = queryWeight, product of:
                1.0508974 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.018417267 = queryNorm
              0.43310016 = fieldWeight in 685, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.125 = fieldNorm(doc=685)
          0.01503083 = weight(abstract_txt:information in 685) [ClassicSimilarity], result of:
            0.01503083 = score(doc=685,freq=1.0), product of:
              0.049482096 = queryWeight, product of:
                1.1056 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.018417267 = queryNorm
              0.303763 = fieldWeight in 685, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.125 = fieldNorm(doc=685)
          0.06799448 = weight(abstract_txt:search in 685) [ClassicSimilarity], result of:
            0.06799448 = score(doc=685,freq=1.0), product of:
              0.14896634 = queryWeight, product of:
                2.21507 = boost
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.018417267 = queryNorm
              0.45644194 = fieldWeight in 685, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.125 = fieldNorm(doc=685)
          0.13410342 = weight(abstract_txt:best in 685) [ClassicSimilarity], result of:
            0.13410342 = score(doc=685,freq=1.0), product of:
              0.21285605 = queryWeight, product of:
                2.2930684 = boost
                5.040154 = idf(docFreq=751, maxDocs=42740)
                0.018417267 = queryNorm
              0.63001925 = fieldWeight in 685, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.040154 = idf(docFreq=751, maxDocs=42740)
                0.125 = fieldNorm(doc=685)
          0.27067113 = weight(abstract_txt:match in 685) [ClassicSimilarity], result of:
            0.27067113 = score(doc=685,freq=1.0), product of:
              0.33995447 = queryWeight, product of:
                2.8979065 = boost
                6.3695855 = idf(docFreq=198, maxDocs=42740)
                0.018417267 = queryNorm
              0.7961982 = fieldWeight in 685, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3695855 = idf(docFreq=198, maxDocs=42740)
                0.125 = fieldNorm(doc=685)
          0.30527705 = weight(abstract_txt:technique in 685) [ClassicSimilarity], result of:
            0.30527705 = score(doc=685,freq=1.0), product of:
              0.43672207 = queryWeight, product of:
                4.2403426 = boost
                5.5921526 = idf(docFreq=432, maxDocs=42740)
                0.018417267 = queryNorm
              0.6990191 = fieldWeight in 685, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5921526 = idf(docFreq=432, maxDocs=42740)
                0.125 = fieldNorm(doc=685)
        0.24 = coord(6/25)
    
  2. Sakai, T.: On the reliability of information retrieval metrics based on graded relevance (2007) 0.17
    0.16901216 = sum of:
      0.16901216 = product of:
        0.7042174 = sum of:
          0.1472141 = weight(abstract_txt:metrics in 2911) [ClassicSimilarity], result of:
            0.1472141 = score(doc=2911,freq=5.0), product of:
              0.12564406 = queryWeight, product of:
                1.0171485 = boost
                6.7070637 = idf(docFreq=141, maxDocs=42740)
                0.018417267 = queryNorm
              1.1716758 = fieldWeight in 2911, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.7070637 = idf(docFreq=141, maxDocs=42740)
                0.078125 = fieldNorm(doc=2911)
          0.01815232 = weight(abstract_txt:retrieval in 2911) [ClassicSimilarity], result of:
            0.01815232 = score(doc=2911,freq=1.0), product of:
              0.06706004 = queryWeight, product of:
                1.0508974 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.018417267 = queryNorm
              0.2706876 = fieldWeight in 2911, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.078125 = fieldNorm(doc=2911)
          0.0093942685 = weight(abstract_txt:information in 2911) [ClassicSimilarity], result of:
            0.0093942685 = score(doc=2911,freq=1.0), product of:
              0.049482096 = queryWeight, product of:
                1.1056 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.018417267 = queryNorm
              0.18985188 = fieldWeight in 2911, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.078125 = fieldNorm(doc=2911)
          0.104920246 = weight(abstract_txt:runs in 2911) [ClassicSimilarity], result of:
            0.104920246 = score(doc=2911,freq=1.0), product of:
              0.1714241 = queryWeight, product of:
                1.18809 = boost
                7.834249 = idf(docFreq=45, maxDocs=42740)
                0.018417267 = queryNorm
              0.6120507 = fieldWeight in 2911, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.834249 = idf(docFreq=45, maxDocs=42740)
                0.078125 = fieldNorm(doc=2911)
          0.27936518 = weight(abstract_txt:graded in 2911) [ClassicSimilarity], result of:
            0.27936518 = score(doc=2911,freq=4.0), product of:
              0.20745657 = queryWeight, product of:
                1.3070042 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.018417267 = queryNorm
              1.3466201 = fieldWeight in 2911, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.078125 = fieldNorm(doc=2911)
          0.14517121 = weight(abstract_txt:best in 2911) [ClassicSimilarity], result of:
            0.14517121 = score(doc=2911,freq=3.0), product of:
              0.21285605 = queryWeight, product of:
                2.2930684 = boost
                5.040154 = idf(docFreq=751, maxDocs=42740)
                0.018417267 = queryNorm
              0.6820159 = fieldWeight in 2911, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.040154 = idf(docFreq=751, maxDocs=42740)
                0.078125 = fieldNorm(doc=2911)
        0.24 = coord(6/25)
    
  3. Sormunen, E.; Kekäläinen, J.; Koivisto, J.; Järvelin, K.: Document text characteristics affect the ranking of the most relevant documents by expanded structured queries (2001) 0.15
    0.15028562 = sum of:
      0.15028562 = product of:
        0.53673434 = sum of:
          0.020537008 = weight(abstract_txt:retrieval in 488) [ClassicSimilarity], result of:
            0.020537008 = score(doc=488,freq=2.0), product of:
              0.06706004 = queryWeight, product of:
                1.0508974 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.018417267 = queryNorm
              0.30624807 = fieldWeight in 488, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.0625 = fieldNorm(doc=488)
          0.01503083 = weight(abstract_txt:information in 488) [ClassicSimilarity], result of:
            0.01503083 = score(doc=488,freq=4.0), product of:
              0.049482096 = queryWeight, product of:
                1.1056 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.018417267 = queryNorm
              0.303763 = fieldWeight in 488, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.0625 = fieldNorm(doc=488)
          0.03399724 = weight(abstract_txt:search in 488) [ClassicSimilarity], result of:
            0.03399724 = score(doc=488,freq=1.0), product of:
              0.14896634 = queryWeight, product of:
                2.21507 = boost
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.018417267 = queryNorm
              0.22822097 = fieldWeight in 488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.0625 = fieldNorm(doc=488)
          0.06705171 = weight(abstract_txt:best in 488) [ClassicSimilarity], result of:
            0.06705171 = score(doc=488,freq=1.0), product of:
              0.21285605 = queryWeight, product of:
                2.2930684 = boost
                5.040154 = idf(docFreq=751, maxDocs=42740)
                0.018417267 = queryNorm
              0.31500962 = fieldWeight in 488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.040154 = idf(docFreq=751, maxDocs=42740)
                0.0625 = fieldNorm(doc=488)
          0.11214347 = weight(abstract_txt:textual in 488) [ClassicSimilarity], result of:
            0.11214347 = score(doc=488,freq=1.0), product of:
              0.2999131 = queryWeight, product of:
                2.7218974 = boost
                5.982718 = idf(docFreq=292, maxDocs=42740)
                0.018417267 = queryNorm
              0.37391987 = fieldWeight in 488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.982718 = idf(docFreq=292, maxDocs=42740)
                0.0625 = fieldNorm(doc=488)
          0.13533556 = weight(abstract_txt:match in 488) [ClassicSimilarity], result of:
            0.13533556 = score(doc=488,freq=1.0), product of:
              0.33995447 = queryWeight, product of:
                2.8979065 = boost
                6.3695855 = idf(docFreq=198, maxDocs=42740)
                0.018417267 = queryNorm
              0.3980991 = fieldWeight in 488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3695855 = idf(docFreq=198, maxDocs=42740)
                0.0625 = fieldNorm(doc=488)
          0.15263852 = weight(abstract_txt:technique in 488) [ClassicSimilarity], result of:
            0.15263852 = score(doc=488,freq=1.0), product of:
              0.43672207 = queryWeight, product of:
                4.2403426 = boost
                5.5921526 = idf(docFreq=432, maxDocs=42740)
                0.018417267 = queryNorm
              0.34950954 = fieldWeight in 488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5921526 = idf(docFreq=432, maxDocs=42740)
                0.0625 = fieldNorm(doc=488)
        0.28 = coord(7/25)
    
  4. Hoad, T.C.; Zobel, J.: Methods for identifying versioned and plagiarized documents (2003) 0.12
    0.1223644 = sum of:
      0.1223644 = product of:
        0.5098517 = sum of:
          0.014521857 = weight(abstract_txt:retrieval in 160) [ClassicSimilarity], result of:
            0.014521857 = score(doc=160,freq=1.0), product of:
              0.06706004 = queryWeight, product of:
                1.0508974 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.018417267 = queryNorm
              0.21655008 = fieldWeight in 160, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.0625 = fieldNorm(doc=160)
          0.007515415 = weight(abstract_txt:information in 160) [ClassicSimilarity], result of:
            0.007515415 = score(doc=160,freq=1.0), product of:
              0.049482096 = queryWeight, product of:
                1.1056 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.018417267 = queryNorm
              0.1518815 = fieldWeight in 160, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.0625 = fieldNorm(doc=160)
          0.069563694 = weight(abstract_txt:strings in 160) [ClassicSimilarity], result of:
            0.069563694 = score(doc=160,freq=1.0), product of:
              0.15124956 = queryWeight, product of:
                1.1159903 = boost
                7.358825 = idf(docFreq=73, maxDocs=42740)
                0.018417267 = queryNorm
              0.45992658 = fieldWeight in 160, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.358825 = idf(docFreq=73, maxDocs=42740)
                0.0625 = fieldNorm(doc=160)
          0.06705171 = weight(abstract_txt:best in 160) [ClassicSimilarity], result of:
            0.06705171 = score(doc=160,freq=1.0), product of:
              0.21285605 = queryWeight, product of:
                2.2930684 = boost
                5.040154 = idf(docFreq=751, maxDocs=42740)
                0.018417267 = queryNorm
              0.31500962 = fieldWeight in 160, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.040154 = idf(docFreq=751, maxDocs=42740)
                0.0625 = fieldNorm(doc=160)
          0.13533556 = weight(abstract_txt:match in 160) [ClassicSimilarity], result of:
            0.13533556 = score(doc=160,freq=1.0), product of:
              0.33995447 = queryWeight, product of:
                2.8979065 = boost
                6.3695855 = idf(docFreq=198, maxDocs=42740)
                0.018417267 = queryNorm
              0.3980991 = fieldWeight in 160, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3695855 = idf(docFreq=198, maxDocs=42740)
                0.0625 = fieldNorm(doc=160)
          0.21586347 = weight(abstract_txt:technique in 160) [ClassicSimilarity], result of:
            0.21586347 = score(doc=160,freq=2.0), product of:
              0.43672207 = queryWeight, product of:
                4.2403426 = boost
                5.5921526 = idf(docFreq=432, maxDocs=42740)
                0.018417267 = queryNorm
              0.4942811 = fieldWeight in 160, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5921526 = idf(docFreq=432, maxDocs=42740)
                0.0625 = fieldNorm(doc=160)
        0.24 = coord(6/25)
    
  5. He, W.; Erdelez, S.; Wang, F.-K.; Shyu, C.-R.: ¬The effects of conceptual description and search practice on users' mental models and information seeking in a case-based reasoning retrieval system (2008) 0.12
    0.12156203 = sum of:
      0.12156203 = product of:
        0.60781014 = sum of:
          0.020537008 = weight(abstract_txt:retrieval in 4037) [ClassicSimilarity], result of:
            0.020537008 = score(doc=4037,freq=2.0), product of:
              0.06706004 = queryWeight, product of:
                1.0508974 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.018417267 = queryNorm
              0.30624807 = fieldWeight in 4037, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.0625 = fieldNorm(doc=4037)
          0.007515415 = weight(abstract_txt:information in 4037) [ClassicSimilarity], result of:
            0.007515415 = score(doc=4037,freq=1.0), product of:
              0.049482096 = queryWeight, product of:
                1.1056 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.018417267 = queryNorm
              0.1518815 = fieldWeight in 4037, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.0625 = fieldNorm(doc=4037)
          0.12720604 = weight(abstract_txt:search in 4037) [ClassicSimilarity], result of:
            0.12720604 = score(doc=4037,freq=14.0), product of:
              0.14896634 = queryWeight, product of:
                2.21507 = boost
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.018417267 = queryNorm
              0.8539247 = fieldWeight in 4037, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.0625 = fieldNorm(doc=4037)
          0.14993218 = weight(abstract_txt:best in 4037) [ClassicSimilarity], result of:
            0.14993218 = score(doc=4037,freq=5.0), product of:
              0.21285605 = queryWeight, product of:
                2.2930684 = boost
                5.040154 = idf(docFreq=751, maxDocs=42740)
                0.018417267 = queryNorm
              0.70438296 = fieldWeight in 4037, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.040154 = idf(docFreq=751, maxDocs=42740)
                0.0625 = fieldNorm(doc=4037)
          0.30261952 = weight(abstract_txt:match in 4037) [ClassicSimilarity], result of:
            0.30261952 = score(doc=4037,freq=5.0), product of:
              0.33995447 = queryWeight, product of:
                2.8979065 = boost
                6.3695855 = idf(docFreq=198, maxDocs=42740)
                0.018417267 = queryNorm
              0.89017665 = fieldWeight in 4037, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.3695855 = idf(docFreq=198, maxDocs=42740)
                0.0625 = fieldNorm(doc=4037)
        0.2 = coord(5/25)
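
The arithmetic in the listings above can be reproduced with a few lines of Python. This is a sketch, not the search engine's own code: the formulas are read off the printed trees (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1)), which agrees with the printed idf values), and the function names are hypothetical. The input numbers are copied from the first entry (doc=685).

```python
import math

QUERY_NORM = 0.018417267  # queryNorm, copied from the listing
MAX_DOCS = 42740          # maxDocs, copied from the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed form; it reproduces the idf values printed in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    # score = queryWeight * fieldWeight, as in each "product of" node.
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, query_terms):
    # Sum the per-term scores, then scale by coord(matched/query_terms).
    total = sum(term_score(f, df, b, field_norm) for f, df, b in terms)
    return total * matched / query_terms


# (freq, docFreq, boost) for each term matched in doc 685
DOC_685_TERMS = [
    (1.0, 3633, 1.0508974),   # retrieval
    (1.0, 10226, 1.1056),     # information
    (1.0, 3014, 2.21507),     # search
    (1.0, 751, 2.2930684),    # best
    (1.0, 198, 2.8979065),    # match
    (1.0, 432, 4.2403426),    # technique
]

score_685 = document_score(DOC_685_TERMS, field_norm=0.125,
                           matched=6, query_terms=25)
# The listing reports 0.19730894 for this document.
print(f"{score_685:.6f}")
```

Small differences in the last digits are to be expected, since the listing appears to use single-precision values while Python computes in double precision.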