Document (#20416)

Author
Lee, D.L.
Ren, L.
Title
Document ranking on weight-partitioned signature files
Source
ACM transactions on information systems. 14(1996) no.2, p.109-137
Year
1996
Abstract
Proposes the weight-partitioned signature file, a signature file organization for supporting document ranking. It uses multiple signature files, each corresponding to one term frequency, to represent terms with different term frequencies. Words with the same term frequency in a document are grouped together and hashed into the signature file corresponding to that term frequency. Investigates the effect of false drops on retrieval effectiveness. Analyses the performance of the weight-partitioned signature file under different search strategies and configurations. Obtains an optimal formula for storage allocation to minimise the effect of false drops on document ranks. Analytical results are supported by experiments on document collections.
Theme
Retrieval algorithms
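
The organization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the signature width (64 bits), the number of hash positions per term (3), the SHA-256 based hashing, whitespace tokenization, and the choice to take the largest matching partition as the estimated term frequency are all assumptions made for illustration. The function names are hypothetical.

```python
import hashlib
from collections import Counter


def term_bits(term, width=64, k=3):
    """Bit mask of k hashed positions for a term (superimposed coding)."""
    mask = 0
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{term}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % width)
    return mask


def build_partitions(docs, width=64, k=3):
    """Map each term frequency to its own signature file {doc_id: signature}.

    Terms that share a term frequency within a document are superimposed
    into that document's signature in the matching partition.
    """
    partitions = {}
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            signatures = partitions.setdefault(tf, {})
            signatures[doc_id] = signatures.get(doc_id, 0) | term_bits(term, width, k)
    return partitions


def estimated_tf(partitions, doc_id, term, width=64, k=3):
    """Largest term frequency whose partition matches the term.

    A match may be a false drop, so the estimate can exceed the true
    frequency, but it never falls below it.
    """
    mask = term_bits(term, width, k)
    hits = [
        tf
        for tf, signatures in partitions.items()
        if (signatures.get(doc_id, 0) & mask) == mask
    ]
    return max(hits, default=0)


def rank(partitions, doc_ids, query, width=64, k=3):
    """Rank documents by the sum of estimated term frequencies of query terms."""
    scores = {
        doc_id: sum(estimated_tf(partitions, doc_id, t, width, k)
                    for t in query.lower().split())
        for doc_id in doc_ids
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {
    "d1": "signature file signature file signature",
    "d2": "inverted file index",
}
partitions = build_partitions(docs)
print(rank(partitions, docs, "signature"))
```

Because a false drop can only raise an estimated frequency, errors in this sketch inflate scores rather than hide matching documents, which is why the abstract studies the effect of false drops on ranks.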

Similar documents (content)

  1. Lam, W.; Wong, K.-F.; Wong, C.-Y.: Chinese document indexing based on new partitioned signature file : model and evaluation (2001) 0.43
    0.42610294 = sum of:
      0.42610294 = product of:
        1.5217962 = sum of:
          0.027631471 = weight(abstract_txt:analytical in 1301) [ClassicSimilarity], result of:
            0.027631471 = score(doc=1301,freq=1.0), product of:
              0.067293145 = queryWeight, product of:
                1.077318 = boost
                6.569815 = idf(docFreq=165, maxDocs=43556)
                0.009507663 = queryNorm
              0.41061345 = fieldWeight in 1301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.569815 = idf(docFreq=165, maxDocs=43556)
                0.0625 = fieldNorm(doc=1301)
          0.051462125 = weight(abstract_txt:files in 1301) [ClassicSimilarity], result of:
            0.051462125 = score(doc=1301,freq=2.0), product of:
              0.10186539 = queryWeight, product of:
                1.8745059 = boost
                5.715656 = idf(docFreq=389, maxDocs=43556)
                0.009507663 = queryNorm
              0.50519735 = fieldWeight in 1301, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.715656 = idf(docFreq=389, maxDocs=43556)
                0.0625 = fieldNorm(doc=1301)
          0.0862724 = weight(abstract_txt:false in 1301) [ClassicSimilarity], result of:
            0.0862724 = score(doc=1301,freq=1.0), product of:
              0.18111709 = queryWeight, product of:
                2.4995003 = boost
                7.62136 = idf(docFreq=57, maxDocs=43556)
                0.009507663 = queryNorm
              0.476335 = fieldWeight in 1301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62136 = idf(docFreq=57, maxDocs=43556)
                0.0625 = fieldNorm(doc=1301)
          0.038427565 = weight(abstract_txt:document in 1301) [ClassicSimilarity], result of:
            0.038427565 = score(doc=1301,freq=1.0), product of:
              0.1433684 = queryWeight, product of:
                3.5161726 = boost
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.009507663 = queryNorm
              0.26803374 = fieldWeight in 1301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.0625 = fieldNorm(doc=1301)
          0.15063766 = weight(abstract_txt:file in 1301) [ClassicSimilarity], result of:
            0.15063766 = score(doc=1301,freq=5.0), product of:
              0.19350277 = queryWeight, product of:
                3.6536932 = boost
                5.5703354 = idf(docFreq=450, maxDocs=43556)
                0.009507663 = queryNorm
              0.778478 = fieldWeight in 1301, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.5703354 = idf(docFreq=450, maxDocs=43556)
                0.0625 = fieldNorm(doc=1301)
          0.21615571 = weight(abstract_txt:partitioned in 1301) [ClassicSimilarity], result of:
            0.21615571 = score(doc=1301,freq=1.0), product of:
              0.38246033 = queryWeight, product of:
                4.448487 = boost
                9.042746 = idf(docFreq=13, maxDocs=43556)
                0.009507663 = queryNorm
              0.5651716 = fieldWeight in 1301, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.042746 = idf(docFreq=13, maxDocs=43556)
                0.0625 = fieldNorm(doc=1301)
          0.9512093 = weight(abstract_txt:signature in 1301) [ClassicSimilarity], result of:
            0.9512093 = score(doc=1301,freq=7.0), product of:
              0.67645144 = queryWeight, product of:
                8.366666 = boost
                8.503749 = idf(docFreq=23, maxDocs=43556)
                0.009507663 = queryNorm
              1.4061753 = fieldWeight in 1301, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                8.503749 = idf(docFreq=23, maxDocs=43556)
                0.0625 = fieldNorm(doc=1301)
        0.28 = coord(7/25)
    
  2. Carterette, B.; Can, F.: Comparing inverted files and signature files for searching a large lexicon (2005) 0.22
    0.22439493 = sum of:
      0.22439493 = product of:
        1.4024683 = sum of:
          0.07878497 = weight(abstract_txt:files in 3027) [ClassicSimilarity], result of:
            0.07878497 = score(doc=3027,freq=3.0), product of:
              0.10186539 = queryWeight, product of:
                1.8745059 = boost
                5.715656 = idf(docFreq=389, maxDocs=43556)
                0.009507663 = queryNorm
              0.77342236 = fieldWeight in 3027, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.715656 = idf(docFreq=389, maxDocs=43556)
                0.078125 = fieldNorm(doc=3027)
          0.054454315 = weight(abstract_txt:term in 3027) [ClassicSimilarity], result of:
            0.054454315 = score(doc=3027,freq=1.0), product of:
              0.14470038 = queryWeight, product of:
                3.1595361 = boost
                4.816955 = idf(docFreq=957, maxDocs=43556)
                0.009507663 = queryNorm
              0.37632462 = fieldWeight in 3027, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.816955 = idf(docFreq=957, maxDocs=43556)
                0.078125 = fieldNorm(doc=3027)
          0.16841802 = weight(abstract_txt:file in 3027) [ClassicSimilarity], result of:
            0.16841802 = score(doc=3027,freq=4.0), product of:
              0.19350277 = queryWeight, product of:
                3.6536932 = boost
                5.5703354 = idf(docFreq=450, maxDocs=43556)
                0.009507663 = queryNorm
              0.8703649 = fieldWeight in 3027, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.5703354 = idf(docFreq=450, maxDocs=43556)
                0.078125 = fieldNorm(doc=3027)
          1.100811 = weight(abstract_txt:signature in 3027) [ClassicSimilarity], result of:
            1.100811 = score(doc=3027,freq=6.0), product of:
              0.67645144 = queryWeight, product of:
                8.366666 = boost
                8.503749 = idf(docFreq=23, maxDocs=43556)
                0.009507663 = queryNorm
              1.6273319 = fieldWeight in 3027, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.503749 = idf(docFreq=23, maxDocs=43556)
                0.078125 = fieldNorm(doc=3027)
        0.16 = coord(4/25)
    
  3. Lee, D.L.: Massive parallelism on the hybrid text-retrieval machine (1995) 0.16
    0.16312239 = sum of:
      0.16312239 = product of:
        1.3593533 = sum of:
          0.10105081 = weight(abstract_txt:file in 4141) [ClassicSimilarity], result of:
            0.10105081 = score(doc=4141,freq=1.0), product of:
              0.19350277 = queryWeight, product of:
                3.6536932 = boost
                5.5703354 = idf(docFreq=450, maxDocs=43556)
                0.009507663 = queryNorm
              0.52221894 = fieldWeight in 4141, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5703354 = idf(docFreq=450, maxDocs=43556)
                0.09375 = fieldNorm(doc=4141)
          0.32423356 = weight(abstract_txt:partitioned in 4141) [ClassicSimilarity], result of:
            0.32423356 = score(doc=4141,freq=1.0), product of:
              0.38246033 = queryWeight, product of:
                4.448487 = boost
                9.042746 = idf(docFreq=13, maxDocs=43556)
                0.009507663 = queryNorm
              0.8477574 = fieldWeight in 4141, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.042746 = idf(docFreq=13, maxDocs=43556)
                0.09375 = fieldNorm(doc=4141)
          0.934069 = weight(abstract_txt:signature in 4141) [ClassicSimilarity], result of:
            0.934069 = score(doc=4141,freq=3.0), product of:
              0.67645144 = queryWeight, product of:
                8.366666 = boost
                8.503749 = idf(docFreq=23, maxDocs=43556)
                0.009507663 = queryNorm
              1.3808367 = fieldWeight in 4141, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.503749 = idf(docFreq=23, maxDocs=43556)
                0.09375 = fieldNorm(doc=4141)
        0.12 = coord(3/25)
    
  4. Kelledy, F.; Smeaton, A.F.: Signature files and beyond (1996) 0.15
    0.1457953 = sum of:
      0.1457953 = product of:
        1.2149608 = sum of:
          0.09097305 = weight(abstract_txt:files in 40) [ClassicSimilarity], result of:
            0.09097305 = score(doc=40,freq=4.0), product of:
              0.10186539 = queryWeight, product of:
                1.8745059 = boost
                5.715656 = idf(docFreq=389, maxDocs=43556)
                0.009507663 = queryNorm
              0.89307123 = fieldWeight in 40, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.715656 = idf(docFreq=389, maxDocs=43556)
                0.078125 = fieldNorm(doc=40)
          0.11908952 = weight(abstract_txt:file in 40) [ClassicSimilarity], result of:
            0.11908952 = score(doc=40,freq=2.0), product of:
              0.19350277 = queryWeight, product of:
                3.6536932 = boost
                5.5703354 = idf(docFreq=450, maxDocs=43556)
                0.009507663 = queryNorm
              0.6154409 = fieldWeight in 40, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5703354 = idf(docFreq=450, maxDocs=43556)
                0.078125 = fieldNorm(doc=40)
          1.0048983 = weight(abstract_txt:signature in 40) [ClassicSimilarity], result of:
            1.0048983 = score(doc=40,freq=5.0), product of:
              0.67645144 = queryWeight, product of:
                8.366666 = boost
                8.503749 = idf(docFreq=23, maxDocs=43556)
                0.009507663 = queryNorm
              1.4855438 = fieldWeight in 40, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.503749 = idf(docFreq=23, maxDocs=43556)
                0.078125 = fieldNorm(doc=40)
        0.12 = coord(3/25)
    
  5. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.11
    0.11000896 = sum of:
      0.11000896 = product of:
        0.687556 = sum of:
          0.06063609 = weight(abstract_txt:ranking in 2688) [ClassicSimilarity], result of:
            0.06063609 = score(doc=2688,freq=2.0), product of:
              0.09792997 = queryWeight, product of:
                1.8379399 = boost
                5.6041603 = idf(docFreq=435, maxDocs=43556)
                0.009507663 = queryNorm
              0.61917806 = fieldWeight in 2688, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6041603 = idf(docFreq=435, maxDocs=43556)
                0.078125 = fieldNorm(doc=2688)
          0.094317645 = weight(abstract_txt:term in 2688) [ClassicSimilarity], result of:
            0.094317645 = score(doc=2688,freq=3.0), product of:
              0.14470038 = queryWeight, product of:
                3.1595361 = boost
                4.816955 = idf(docFreq=957, maxDocs=43556)
                0.009507663 = queryNorm
              0.6518134 = fieldWeight in 2688, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.816955 = idf(docFreq=957, maxDocs=43556)
                0.078125 = fieldNorm(doc=2688)
          0.08319813 = weight(abstract_txt:document in 2688) [ClassicSimilarity], result of:
            0.08319813 = score(doc=2688,freq=3.0), product of:
              0.1433684 = queryWeight, product of:
                3.5161726 = boost
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.009507663 = queryNorm
              0.5803101 = fieldWeight in 2688, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.078125 = fieldNorm(doc=2688)
          0.44940418 = weight(abstract_txt:signature in 2688) [ClassicSimilarity], result of:
            0.44940418 = score(doc=2688,freq=1.0), product of:
              0.67645144 = queryWeight, product of:
                8.366666 = boost
                8.503749 = idf(docFreq=23, maxDocs=43556)
                0.009507663 = queryNorm
              0.6643554 = fieldWeight in 2688, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.503749 = idf(docFreq=23, maxDocs=43556)
                0.078125 = fieldNorm(doc=2688)
        0.16 = coord(4/25)
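
The score breakdowns above follow Lucene's ClassicSimilarity (TF-IDF) scoring: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), each term score is queryWeight times fieldWeight, and the sum is scaled by the coord factor. The following sketch recomputes entry 3 (doc 4141) from the values printed in the listing. The function names are hypothetical, and small deviations are expected because Lucene computes with 32-bit floats.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched_terms, query_terms):
    """Sum of term scores scaled by coord(matched_terms / query_terms)."""
    return sum(term_scores) * matched_terms / query_terms


MAX_DOCS = 43556
QUERY_NORM = 0.009507663
FIELD_NORM = 0.09375  # fieldNorm(doc=4141)

# (freq, docFreq, boost) for "file", "partitioned", "signature" in doc 4141
matches = [
    (1.0, 450, 3.6536932),
    (1.0, 13, 4.448487),
    (3.0, 23, 8.366666),
]
scores = [
    term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM)
    for freq, doc_freq, boost in matches
]
# Compare with the 0.16312239 shown for entry 3 in the listing.
print(document_score(scores, 3, 25))
```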