Document (#7295)

Author
Can, F.
Title
On the efficiency of best-match cluster searches
Source
Information processing and management. 30(1994) no.3, pp.343-361
Year
1994
Abstract
The efficiency of various cluster-based retrieval (CBR) strategies is analyzed. The possibility of combining CBR and inverted index search (IIS) is investigated. A method for combining the two approaches is proposed and shown to be cost effective in terms of paging and CPU time. In the new method, the selection of documents from the best-matching clusters is done using the inverted index for all documents. Although this is counterintuitive to the concept of best-match CBR, the observations prove that it is much more efficient than conventional approaches. In the experiments, the effects of the number of selected clusters, page size, centroid length, and matching functions are considered. The experiments show that the storage overhead of the new method would be moderately higher than that of IIS.
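
As a rough illustration of the combined strategy described in the abstract, the sketch below ranks clusters by query-centroid similarity, then scores documents through an inverted index built over the whole collection and keeps only those that belong to the selected clusters. Conventional best-match CBR, which matches the query against every member vector of the selected clusters, is included for comparison. The cosine matching function, averaged centroids, toy data and all function names are assumptions for illustration, not the paper's actual matching functions or file organization.

```python
import math
from collections import defaultdict

Vector = dict[str, float]  # sparse term -> weight vector (hypothetical representation)


def norm(v: Vector) -> float:
    return math.sqrt(sum(w * w for w in v.values()))


def cosine(q: Vector, d: Vector) -> float:
    """Cosine similarity between two sparse vectors (assumed matching function)."""
    dot = sum(w * d.get(t, 0.0) for t, w in q.items())
    denom = norm(q) * norm(d)
    return dot / denom if denom else 0.0


def centroid(vectors: list[Vector]) -> Vector:
    """Cluster representative: average weight of each term over the members."""
    acc: dict[str, float] = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            acc[t] += w / len(vectors)
    return dict(acc)


def select_clusters(query: Vector, centroids: dict[str, Vector], n: int) -> list[str]:
    """Ids of the n clusters whose centroids best match the query."""
    return sorted(centroids, key=lambda c: (-cosine(query, centroids[c]), c))[:n]


def top_k(scores: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Positive scores only, highest first, ties broken by document id."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(d, s) for d, s in ranked if s > 0][:k]


def conventional_cbr(query, docs, clusters, centroids, n, k):
    """Best-match CBR: match the query against every member of the selected clusters."""
    members = {d for c in select_clusters(query, centroids, n) for d in clusters[c]}
    return top_k({d: cosine(query, docs[d]) for d in members}, k)


def build_index(docs: dict[str, Vector]) -> dict[str, list[tuple[str, float]]]:
    """Inverted index over ALL documents: term -> postings of (doc id, weight)."""
    index: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for d, vec in docs.items():
        for t, w in vec.items():
            index[t].append((d, w))
    return index


def combined_search(query, index, doc_norms, clusters, centroids, n, k):
    """Select clusters by centroid, then score documents via the inverted
    index, accumulating only for members of the selected clusters."""
    members = {d for c in select_clusters(query, centroids, n) for d in clusters[c]}
    qn = norm(query)
    if qn == 0:
        return []
    acc: dict[str, float] = defaultdict(float)
    for t, qw in query.items():
        for d, w in index.get(t, []):
            if d in members:
                acc[d] += qw * w
    return top_k({d: s / (qn * doc_norms[d]) for d, s in acc.items()}, k)


# Toy collection (hypothetical data)
docs = {
    "d1": {"cluster": 2.0, "search": 1.0},
    "d2": {"cluster": 1.0, "centroid": 1.0},
    "d3": {"inverted": 2.0, "index": 1.0},
    "d4": {"index": 1.0, "search": 1.0},
}
clusters = {"c1": ["d1", "d2"], "c2": ["d3", "d4"]}
centroids = {c: centroid([docs[d] for d in ids]) for c, ids in clusters.items()}
doc_norms = {d: norm(v) for d, v in docs.items()}
index = build_index(docs)
query = {"cluster": 1.0, "search": 1.0}

print(conventional_cbr(query, docs, clusters, centroids, n=1, k=2))
print(combined_search(query, index, doc_norms, clusters, centroids, n=1, k=2))
```

On this toy collection both strategies return d1 and d2 with identical scores; d4 also contains "search" but is dropped because its cluster was not selected. The efficiency argument in the abstract concerns paging and CPU cost on disk-resident data, which a small in-memory example cannot demonstrate.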

Similar documents (content)

  1. Kang, I.-S.; Na, S.-H.; Kim, J.; Lee, J.-H.: Cluster-based patent retrieval (2007) 0.23
    0.23275112 = sum of:
      0.23275112 = product of:
        0.72734725 = sum of:
          0.012646921 = weight(abstract_txt:that in 930) [ClassicSimilarity], result of:
            0.012646921 = score(doc=930,freq=3.0), product of:
              0.049305122 = queryWeight, product of:
                1.1341648 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.018346943 = queryNorm
              0.2565032 = fieldWeight in 930, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=930)
          0.044365015 = weight(abstract_txt:documents in 930) [ClassicSimilarity], result of:
            0.044365015 = score(doc=930,freq=3.0), product of:
              0.09944101 = queryWeight, product of:
                1.3151258 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.018346943 = queryNorm
              0.44614407 = fieldWeight in 930, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=930)
          0.0358137 = weight(abstract_txt:approaches in 930) [ClassicSimilarity], result of:
            0.0358137 = score(doc=930,freq=1.0), product of:
              0.124340214 = queryWeight, product of:
                1.4705857 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.018346943 = queryNorm
              0.2880299 = fieldWeight in 930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.0625 = fieldNorm(doc=930)
          0.055419166 = weight(abstract_txt:experiments in 930) [ClassicSimilarity], result of:
            0.055419166 = score(doc=930,freq=1.0), product of:
              0.16634847 = queryWeight, product of:
                1.7009593 = boost
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.018346943 = queryNorm
              0.33315104 = fieldWeight in 930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.0625 = fieldNorm(doc=930)
          0.09475183 = weight(abstract_txt:match in 930) [ClassicSimilarity], result of:
            0.09475183 = score(doc=930,freq=1.0), product of:
              0.2378504 = queryWeight, product of:
                2.0339322 = boost
                6.373877 = idf(docFreq=204, maxDocs=44218)
                0.018346943 = queryNorm
              0.39836732 = fieldWeight in 930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.373877 = idf(docFreq=204, maxDocs=44218)
                0.0625 = fieldNorm(doc=930)
          0.14310525 = weight(abstract_txt:clusters in 930) [ClassicSimilarity], result of:
            0.14310525 = score(doc=930,freq=2.0), product of:
              0.24850734 = queryWeight, product of:
                2.0789983 = boost
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.018346943 = queryNorm
              0.57585925 = fieldWeight in 930, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.0625 = fieldNorm(doc=930)
          0.27197477 = weight(abstract_txt:cluster in 930) [ClassicSimilarity], result of:
            0.27197477 = score(doc=930,freq=7.0), product of:
              0.25112998 = queryWeight, product of:
                2.08994 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.018346943 = queryNorm
              1.083004 = fieldWeight in 930, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0625 = fieldNorm(doc=930)
          0.06927059 = weight(abstract_txt:best in 930) [ClassicSimilarity], result of:
            0.06927059 = score(doc=930,freq=1.0), product of:
              0.22095737 = queryWeight, product of:
                2.400957 = boost
                5.0160327 = idf(docFreq=796, maxDocs=44218)
                0.018346943 = queryNorm
              0.31350204 = fieldWeight in 930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0160327 = idf(docFreq=796, maxDocs=44218)
                0.0625 = fieldNorm(doc=930)
        0.32 = coord(8/25)
    
  2. Willett, P.: Best-match text retrieval (1993) 0.17
    0.17259413 = sum of:
      0.17259413 = product of:
        0.86297065 = sum of:
          0.014603408 = weight(abstract_txt:that in 7818) [ClassicSimilarity], result of:
            0.014603408 = score(doc=7818,freq=1.0), product of:
              0.049305122 = queryWeight, product of:
                1.1341648 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.018346943 = queryNorm
              0.2961844 = fieldWeight in 7818, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.125 = fieldNorm(doc=7818)
          0.05122831 = weight(abstract_txt:documents in 7818) [ClassicSimilarity], result of:
            0.05122831 = score(doc=7818,freq=1.0), product of:
              0.09944101 = queryWeight, product of:
                1.3151258 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.018346943 = queryNorm
              0.5151628 = fieldWeight in 7818, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.125 = fieldNorm(doc=7818)
          0.22894865 = weight(abstract_txt:matching in 7818) [ClassicSimilarity], result of:
            0.22894865 = score(doc=7818,freq=2.0), product of:
              0.21414481 = queryWeight, product of:
                1.9299157 = boost
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.018346943 = queryNorm
              1.0691301 = fieldWeight in 7818, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.125 = fieldNorm(doc=7818)
          0.32822993 = weight(abstract_txt:match in 7818) [ClassicSimilarity], result of:
            0.32822993 = score(doc=7818,freq=3.0), product of:
              0.2378504 = queryWeight, product of:
                2.0339322 = boost
                6.373877 = idf(docFreq=204, maxDocs=44218)
                0.018346943 = queryNorm
              1.3799849 = fieldWeight in 7818, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.373877 = idf(docFreq=204, maxDocs=44218)
                0.125 = fieldNorm(doc=7818)
          0.23996036 = weight(abstract_txt:best in 7818) [ClassicSimilarity], result of:
            0.23996036 = score(doc=7818,freq=3.0), product of:
              0.22095737 = queryWeight, product of:
                2.400957 = boost
                5.0160327 = idf(docFreq=796, maxDocs=44218)
                0.018346943 = queryNorm
              1.086003 = fieldWeight in 7818, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.0160327 = idf(docFreq=796, maxDocs=44218)
                0.125 = fieldNorm(doc=7818)
        0.2 = coord(5/25)
    
  3. Dunlavy, D.M.; O'Leary, D.P.; Conroy, J.M.; Schlesinger, J.D.: QCS: A system for querying, clustering and summarizing documents (2007) 0.15
    0.15338932 = sum of:
      0.15338932 = product of:
        0.47934163 = sum of:
          0.006388991 = weight(abstract_txt:that in 947) [ClassicSimilarity], result of:
            0.006388991 = score(doc=947,freq=1.0), product of:
              0.049305122 = queryWeight, product of:
                1.1341648 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.018346943 = queryNorm
              0.12958068 = fieldWeight in 947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0546875 = fieldNorm(doc=947)
          0.018920647 = weight(abstract_txt:than in 947) [ClassicSimilarity], result of:
            0.018920647 = score(doc=947,freq=1.0), product of:
              0.08882409 = queryWeight, product of:
                1.2429394 = boost
                3.8950868 = idf(docFreq=2444, maxDocs=44218)
                0.018346943 = queryNorm
              0.21301256 = fieldWeight in 947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8950868 = idf(docFreq=2444, maxDocs=44218)
                0.0546875 = fieldNorm(doc=947)
          0.0316959 = weight(abstract_txt:documents in 947) [ClassicSimilarity], result of:
            0.0316959 = score(doc=947,freq=2.0), product of:
              0.09944101 = queryWeight, product of:
                1.3151258 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.018346943 = queryNorm
              0.31874073 = fieldWeight in 947, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0546875 = fieldNorm(doc=947)
          0.04849177 = weight(abstract_txt:experiments in 947) [ClassicSimilarity], result of:
            0.04849177 = score(doc=947,freq=1.0), product of:
              0.16634847 = queryWeight, product of:
                1.7009593 = boost
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.018346943 = queryNorm
              0.29150715 = fieldWeight in 947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.0546875 = fieldNorm(doc=947)
          0.08854186 = weight(abstract_txt:clusters in 947) [ClassicSimilarity], result of:
            0.08854186 = score(doc=947,freq=1.0), product of:
              0.24850734 = queryWeight, product of:
                2.0789983 = boost
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.018346943 = queryNorm
              0.35629475 = fieldWeight in 947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.0546875 = fieldNorm(doc=947)
          0.15579313 = weight(abstract_txt:cluster in 947) [ClassicSimilarity], result of:
            0.15579313 = score(doc=947,freq=3.0), product of:
              0.25112998 = queryWeight, product of:
                2.08994 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.018346943 = queryNorm
              0.6203685 = fieldWeight in 947, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0546875 = fieldNorm(doc=947)
          0.043791354 = weight(abstract_txt:method in 947) [ClassicSimilarity], result of:
            0.043791354 = score(doc=947,freq=1.0), product of:
              0.17790827 = queryWeight, product of:
                2.1544094 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.018346943 = queryNorm
              0.2461457 = fieldWeight in 947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0546875 = fieldNorm(doc=947)
          0.085717976 = weight(abstract_txt:best in 947) [ClassicSimilarity], result of:
            0.085717976 = score(doc=947,freq=2.0), product of:
              0.22095737 = queryWeight, product of:
                2.400957 = boost
                5.0160327 = idf(docFreq=796, maxDocs=44218)
                0.018346943 = queryNorm
              0.38793898 = fieldWeight in 947, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0160327 = idf(docFreq=796, maxDocs=44218)
                0.0546875 = fieldNorm(doc=947)
        0.32 = coord(8/25)
    
  4. Buckley, C.; Allan, J.; Salton, G.: Automatic routing and retrieval using Smart : TREC-2 (1995) 0.15
    0.14854673 = sum of:
      0.14854673 = product of:
        0.46420854 = sum of:
          0.05630516 = weight(abstract_txt:conventional in 5699) [ClassicSimilarity], result of:
            0.05630516 = score(doc=5699,freq=1.0), product of:
              0.11499023 = queryWeight, product of:
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.018346943 = queryNorm
              0.48965168 = fieldWeight in 5699, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.078125 = fieldNorm(doc=5699)
          0.06390837 = weight(abstract_txt:length in 5699) [ClassicSimilarity], result of:
            0.06390837 = score(doc=5699,freq=1.0), product of:
              0.12512209 = queryWeight, product of:
                1.0431254 = boost
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.018346943 = queryNorm
              0.5107681 = fieldWeight in 5699, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.078125 = fieldNorm(doc=5699)
          0.01290771 = weight(abstract_txt:that in 5699) [ClassicSimilarity], result of:
            0.01290771 = score(doc=5699,freq=2.0), product of:
              0.049305122 = queryWeight, product of:
                1.1341648 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.018346943 = queryNorm
              0.26179248 = fieldWeight in 5699, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=5699)
          0.027029496 = weight(abstract_txt:than in 5699) [ClassicSimilarity], result of:
            0.027029496 = score(doc=5699,freq=1.0), product of:
              0.08882409 = queryWeight, product of:
                1.2429394 = boost
                3.8950868 = idf(docFreq=2444, maxDocs=44218)
                0.018346943 = queryNorm
              0.30430365 = fieldWeight in 5699, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8950868 = idf(docFreq=2444, maxDocs=44218)
                0.078125 = fieldNorm(doc=5699)
          0.032017697 = weight(abstract_txt:documents in 5699) [ClassicSimilarity], result of:
            0.032017697 = score(doc=5699,freq=1.0), product of:
              0.09944101 = queryWeight, product of:
                1.3151258 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.018346943 = queryNorm
              0.32197678 = fieldWeight in 5699, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=5699)
          0.06331027 = weight(abstract_txt:approaches in 5699) [ClassicSimilarity], result of:
            0.06331027 = score(doc=5699,freq=2.0), product of:
              0.124340214 = queryWeight, product of:
                1.4705857 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.018346943 = queryNorm
              0.50916976 = fieldWeight in 5699, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.078125 = fieldNorm(doc=5699)
          0.09796816 = weight(abstract_txt:experiments in 5699) [ClassicSimilarity], result of:
            0.09796816 = score(doc=5699,freq=2.0), product of:
              0.16634847 = queryWeight, product of:
                1.7009593 = boost
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.018346943 = queryNorm
              0.58893335 = fieldWeight in 5699, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.078125 = fieldNorm(doc=5699)
          0.11076167 = weight(abstract_txt:combining in 5699) [ClassicSimilarity], result of:
            0.11076167 = score(doc=5699,freq=1.0), product of:
              0.22745657 = queryWeight, product of:
                1.9889954 = boost
                6.2330556 = idf(docFreq=235, maxDocs=44218)
                0.018346943 = queryNorm
              0.48695746 = fieldWeight in 5699, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2330556 = idf(docFreq=235, maxDocs=44218)
                0.078125 = fieldNorm(doc=5699)
        0.32 = coord(8/25)
    
  5. He, J.; Meij, E.; Rijke, M. de: Result diversification based on query-specific cluster ranking (2011) 0.15
    0.14521444 = sum of:
      0.14521444 = product of:
        0.60506016 = sum of:
          0.016327105 = weight(abstract_txt:that in 4355) [ClassicSimilarity], result of:
            0.016327105 = score(doc=4355,freq=5.0), product of:
              0.049305122 = queryWeight, product of:
                1.1341648 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.018346943 = queryNorm
              0.3311442 = fieldWeight in 4355, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=4355)
          0.021623597 = weight(abstract_txt:than in 4355) [ClassicSimilarity], result of:
            0.021623597 = score(doc=4355,freq=1.0), product of:
              0.08882409 = queryWeight, product of:
                1.2429394 = boost
                3.8950868 = idf(docFreq=2444, maxDocs=44218)
                0.018346943 = queryNorm
              0.24344292 = fieldWeight in 4355, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8950868 = idf(docFreq=2444, maxDocs=44218)
                0.0625 = fieldNorm(doc=4355)
          0.062741615 = weight(abstract_txt:documents in 4355) [ClassicSimilarity], result of:
            0.062741615 = score(doc=4355,freq=6.0), product of:
              0.09944101 = queryWeight, product of:
                1.3151258 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.018346943 = queryNorm
              0.63094306 = fieldWeight in 4355, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=4355)
          0.055419166 = weight(abstract_txt:experiments in 4355) [ClassicSimilarity], result of:
            0.055419166 = score(doc=4355,freq=1.0), product of:
              0.16634847 = queryWeight, product of:
                1.7009593 = boost
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.018346943 = queryNorm
              0.33315104 = fieldWeight in 4355, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.0625 = fieldNorm(doc=4355)
          0.30357206 = weight(abstract_txt:clusters in 4355) [ClassicSimilarity], result of:
            0.30357206 = score(doc=4355,freq=9.0), product of:
              0.24850734 = queryWeight, product of:
                2.0789983 = boost
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.018346943 = queryNorm
              1.2215819 = fieldWeight in 4355, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.0625 = fieldNorm(doc=4355)
          0.14537662 = weight(abstract_txt:cluster in 4355) [ClassicSimilarity], result of:
            0.14537662 = score(doc=4355,freq=2.0), product of:
              0.25112998 = queryWeight, product of:
                2.08994 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.018346943 = queryNorm
              0.57888997 = fieldWeight in 4355, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0625 = fieldNorm(doc=4355)
        0.24 = coord(6/25)
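
The score breakdowns above are Lucene explain output for ClassicSimilarity, Lucene's TF-IDF model. A small sketch that recomputes them, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and taking boost, queryNorm and fieldNorm as given in the output rather than recomputing them. The coord factor shown belongs to older Lucene releases. The function names are illustrative, not Lucene's API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """ClassicSimilarity tf: square root of the raw term frequency."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """One weight(...) node: queryWeight * fieldWeight.

    boost, query_norm and field_norm are copied from the explain output
    rather than recomputed (field_norm is a lossily encoded length norm)."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight


def doc_score(term_scores, matched, total):
    """Document score: sum of matching term scores times coord(matched/total)."""
    return sum(term_scores) * matched / total


# "cluster" in doc 930 (entry 1): freq=7, docFreq=171, boost=2.08994
print(term_score(7.0, 171, 44218, 2.08994, 0.018346943, 0.0625))
# Entry 2 (doc 7818): five matching terms out of 25 query terms
print(doc_score([0.014603408, 0.05122831, 0.22894865,
                 0.32822993, 0.23996036], 5, 25))
```

The recomputed values agree with the explain trees to within float rounding; Lucene computes these factors in 32-bit floats, so the last digit or two can differ.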