Document (#20419)

Author
Lee, D.L.
Ren, L.
Title
Document ranking on weight-partitioned signature files
Source
ACM transactions on information systems. 14(1996) no.2, pp.109-137
Year
1996
Abstract
Proposes the weight-partitioned signature file, a signature file organization for supporting document ranking. It uses multiple signature files, each corresponding to one term frequency, to represent terms with different term frequencies. Words with the same term frequency in a document are grouped together and hashed into the signature file corresponding to that term frequency. Investigates the effect of false drops on retrieval effectiveness. Analyses the performance of the weight-partitioned signature file under different search strategies and configurations. Obtains an optimal formula for storage allocation to minimise the effect of false drops on document ranks. Analytical results are supported by experiments on document collections.
Theme
Retrieval algorithms
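
The organization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The signature width, the number of bits set per term, the cap on term frequency, the "highest matching partition" lookup, and the sum-of-frequencies score are all assumptions chosen for illustration; the abstract does not specify them.

```python
import hashlib
from collections import Counter

SIG_BITS = 64       # assumed width of each partition's signature
BITS_PER_TERM = 3   # assumed number of bit positions hashed per term


def term_bits(term, width=SIG_BITS, k=BITS_PER_TERM):
    """Hash a term to a bit mask with at most k bits set."""
    mask = 0
    for i in range(k):
        digest = hashlib.md5(f"{i}:{term}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:4], "big") % width)
    return mask


def build_signatures(tokens, max_tf=3):
    """Build one superimposed signature per term frequency.

    Terms occurring max_tf times or more share the last partition
    (an assumption made to keep the number of partitions fixed).
    """
    sigs = {tf: 0 for tf in range(1, max_tf + 1)}
    for term, freq in Counter(tokens).items():
        sigs[min(freq, max_tf)] |= term_bits(term)
    return sigs


def estimated_tf(sigs, term):
    """Return the highest partition whose signature covers the term's bits.

    The result can overestimate the true frequency when a false drop
    occurs in a higher partition; it is 0 when no partition matches.
    """
    q = term_bits(term)
    for tf in sorted(sigs, reverse=True):
        if sigs[tf] & q == q:
            return tf
    return 0


def rank(docs, query_terms):
    """Order document ids by the summed estimated term frequencies."""
    scores = {
        doc_id: sum(estimated_tf(sigs, t) for t in query_terms)
        for doc_id, sigs in docs.items()
    }
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "a": build_signatures("signature file signature file signature".split()),
    "b": build_signatures("signature ranking".split()),
}
print(rank(docs, ["signature"]))
```

Because a query term is checked against each partition separately, a false drop can both admit a non-matching document and assign a matching document the wrong frequency, which is why the abstract treats the effect of false drops on ranks rather than only on the result set.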

Similar documents (content)

  1. Lam, W.; Wong, K.-F.; Wong, C.-Y.: Chinese document indexing based on new partitioned signature file : model and evaluation (2001) 0.43
    0.42506006 = sum of:
      0.42506006 = product of:
        1.5180717 = sum of:
          0.027824262 = weight(abstract_txt:analytical in 1304) [ClassicSimilarity], result of:
            0.027824262 = score(doc=1304,freq=1.0), product of:
              0.0676425 = queryWeight, product of:
                1.077861 = boost
                6.581486 = idf(docFreq=160, maxDocs=42740)
                0.009535269 = queryNorm
              0.4113429 = fieldWeight in 1304, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.581486 = idf(docFreq=160, maxDocs=42740)
                0.0625 = fieldNorm(doc=1304)
          0.0513836 = weight(abstract_txt:files in 1304) [ClassicSimilarity], result of:
            0.0513836 = score(doc=1304,freq=2.0), product of:
              0.10181699 = queryWeight, product of:
                1.8701569 = boost
                5.709647 = idf(docFreq=384, maxDocs=42740)
                0.009535269 = queryNorm
              0.50466627 = fieldWeight in 1304, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.709647 = idf(docFreq=384, maxDocs=42740)
                0.0625 = fieldNorm(doc=1304)
          0.08821272 = weight(abstract_txt:false in 1304) [ClassicSimilarity], result of:
            0.08821272 = score(doc=1304,freq=1.0), product of:
              0.18392244 = queryWeight, product of:
                2.513537 = boost
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.009535269 = queryNorm
              0.47961915 = fieldWeight in 1304, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.0625 = fieldNorm(doc=1304)
          0.038281254 = weight(abstract_txt:document in 1304) [ClassicSimilarity], result of:
            0.038281254 = score(doc=1304,freq=1.0), product of:
              0.14308189 = queryWeight, product of:
                3.5053408 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.009535269 = queryNorm
              0.26754788 = fieldWeight in 1304, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=1304)
          0.1508007 = weight(abstract_txt:file in 1304) [ClassicSimilarity], result of:
            0.1508007 = score(doc=1304,freq=5.0), product of:
              0.1937475 = queryWeight, product of:
                3.6483877 = boost
                5.5693207 = idf(docFreq=442, maxDocs=42740)
                0.009535269 = queryNorm
              0.7783362 = fieldWeight in 1304, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.5693207 = idf(docFreq=442, maxDocs=42740)
                0.0625 = fieldNorm(doc=1304)
          0.2151524 = weight(abstract_txt:partitioned in 1304) [ClassicSimilarity], result of:
            0.2151524 = score(doc=1304,freq=1.0), product of:
              0.38148293 = queryWeight, product of:
                4.4335446 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.009535269 = queryNorm
              0.5639896 = fieldWeight in 1304, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.0625 = fieldNorm(doc=1304)
          0.9464168 = weight(abstract_txt:signature in 1304) [ClassicSimilarity], result of:
            0.9464168 = score(doc=1304,freq=7.0), product of:
              0.67454344 = queryWeight, product of:
                8.337455 = boost
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.009535269 = queryNorm
              1.4030479 = fieldWeight in 1304, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.0625 = fieldNorm(doc=1304)
        0.28 = coord(7/25)
    
  2. Carterette, B.; Can, F.: Comparing inverted files and signature files for searching a large lexicon (2005) 0.22
    0.22359607 = sum of:
      0.22359607 = product of:
        1.3974755 = sum of:
          0.07866475 = weight(abstract_txt:files in 3030) [ClassicSimilarity], result of:
            0.07866475 = score(doc=3030,freq=3.0), product of:
              0.10181699 = queryWeight, product of:
                1.8701569 = boost
                5.709647 = idf(docFreq=384, maxDocs=42740)
                0.009535269 = queryNorm
              0.77260923 = fieldWeight in 3030, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.709647 = idf(docFreq=384, maxDocs=42740)
                0.078125 = fieldNorm(doc=3030)
          0.05494579 = weight(abstract_txt:term in 3030) [ClassicSimilarity], result of:
            0.05494579 = score(doc=3030,freq=1.0), product of:
              0.14564878 = queryWeight, product of:
                3.1632705 = boost
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.009535269 = queryNorm
              0.37724856 = fieldWeight in 3030, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.078125 = fieldNorm(doc=3030)
          0.1686003 = weight(abstract_txt:file in 3030) [ClassicSimilarity], result of:
            0.1686003 = score(doc=3030,freq=4.0), product of:
              0.1937475 = queryWeight, product of:
                3.6483877 = boost
                5.5693207 = idf(docFreq=442, maxDocs=42740)
                0.009535269 = queryNorm
              0.87020636 = fieldWeight in 3030, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.5693207 = idf(docFreq=442, maxDocs=42740)
                0.078125 = fieldNorm(doc=3030)
          1.0952647 = weight(abstract_txt:signature in 3030) [ClassicSimilarity], result of:
            1.0952647 = score(doc=3030,freq=6.0), product of:
              0.67454344 = queryWeight, product of:
                8.337455 = boost
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.009535269 = queryNorm
              1.6237127 = fieldWeight in 3030, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.078125 = fieldNorm(doc=3030)
        0.16 = coord(4/25)
    
  3. Lee, D.L.: Massive parallelism on the hybrid text-retrieval machine (1995) 0.16
    0.1623902 = sum of:
      0.1623902 = product of:
        1.3532517 = sum of:
          0.10116018 = weight(abstract_txt:file in 4144) [ClassicSimilarity], result of:
            0.10116018 = score(doc=4144,freq=1.0), product of:
              0.1937475 = queryWeight, product of:
                3.6483877 = boost
                5.5693207 = idf(docFreq=442, maxDocs=42740)
                0.009535269 = queryNorm
              0.5221238 = fieldWeight in 4144, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5693207 = idf(docFreq=442, maxDocs=42740)
                0.09375 = fieldNorm(doc=4144)
          0.32272857 = weight(abstract_txt:partitioned in 4144) [ClassicSimilarity], result of:
            0.32272857 = score(doc=4144,freq=1.0), product of:
              0.38148293 = queryWeight, product of:
                4.4335446 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.009535269 = queryNorm
              0.84598434 = fieldWeight in 4144, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.09375 = fieldNorm(doc=4144)
          0.9293629 = weight(abstract_txt:signature in 4144) [ClassicSimilarity], result of:
            0.9293629 = score(doc=4144,freq=3.0), product of:
              0.67454344 = queryWeight, product of:
                8.337455 = boost
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.009535269 = queryNorm
              1.3777658 = fieldWeight in 4144, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.09375 = fieldNorm(doc=4144)
        0.12 = coord(3/25)
    
  4. Kelledy, F.; Smeaton, A.F.: Signature files and beyond (1996) 0.15
    0.14518654 = sum of:
      0.14518654 = product of:
        1.2098879 = sum of:
          0.09083424 = weight(abstract_txt:files in 43) [ClassicSimilarity], result of:
            0.09083424 = score(doc=43,freq=4.0), product of:
              0.10181699 = queryWeight, product of:
                1.8701569 = boost
                5.709647 = idf(docFreq=384, maxDocs=42740)
                0.009535269 = queryNorm
              0.8921324 = fieldWeight in 43, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.709647 = idf(docFreq=384, maxDocs=42740)
                0.078125 = fieldNorm(doc=43)
          0.11921842 = weight(abstract_txt:file in 43) [ClassicSimilarity], result of:
            0.11921842 = score(doc=43,freq=2.0), product of:
              0.1937475 = queryWeight, product of:
                3.6483877 = boost
                5.5693207 = idf(docFreq=442, maxDocs=42740)
                0.009535269 = queryNorm
              0.6153288 = fieldWeight in 43, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5693207 = idf(docFreq=442, maxDocs=42740)
                0.078125 = fieldNorm(doc=43)
          0.99983525 = weight(abstract_txt:signature in 43) [ClassicSimilarity], result of:
            0.99983525 = score(doc=43,freq=5.0), product of:
              0.67454344 = queryWeight, product of:
                8.337455 = boost
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.009535269 = queryNorm
              1.48224 = fieldWeight in 43, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.078125 = fieldNorm(doc=43)
        0.12 = coord(3/25)
    
  5. Moura, E.S. de; Fernandes, D.; Ribeiro-Neto, B.; Silva, A.S. da; Gonçalves, M.A.: Using structural information to improve search in Web collections (2010) 0.11
    0.11072818 = sum of:
      0.11072818 = product of:
        0.5536409 = sum of:
          0.10491154 = weight(abstract_txt:ranking in 1120) [ClassicSimilarity], result of:
            0.10491154 = score(doc=1120,freq=6.0), product of:
              0.097912684 = queryWeight, product of:
                1.8339496 = boost
                5.5991054 = idf(docFreq=429, maxDocs=42740)
                0.009535269 = queryNorm
              1.0714806 = fieldWeight in 1120, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.5991054 = idf(docFreq=429, maxDocs=42740)
                0.078125 = fieldNorm(doc=1120)
          0.109716415 = weight(abstract_txt:frequency in 1120) [ClassicSimilarity], result of:
            0.109716415 = score(doc=1120,freq=2.0), product of:
              0.166549 = queryWeight, product of:
                2.9294395 = boost
                5.962447 = idf(docFreq=298, maxDocs=42740)
                0.009535269 = queryNorm
              0.6587636 = fieldWeight in 1120, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.962447 = idf(docFreq=298, maxDocs=42740)
                0.078125 = fieldNorm(doc=1120)
          0.077705085 = weight(abstract_txt:term in 1120) [ClassicSimilarity], result of:
            0.077705085 = score(doc=1120,freq=2.0), product of:
              0.14564878 = queryWeight, product of:
                3.1632705 = boost
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.009535269 = queryNorm
              0.53351 = fieldWeight in 1120, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.078125 = fieldNorm(doc=1120)
          0.04785157 = weight(abstract_txt:document in 1120) [ClassicSimilarity], result of:
            0.04785157 = score(doc=1120,freq=1.0), product of:
              0.14308189 = queryWeight, product of:
                3.5053408 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.009535269 = queryNorm
              0.33443484 = fieldWeight in 1120, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.078125 = fieldNorm(doc=1120)
          0.21345632 = weight(abstract_txt:weight in 1120) [ClassicSimilarity], result of:
            0.21345632 = score(doc=1120,freq=2.0), product of:
              0.25955755 = queryWeight, product of:
                3.6570456 = boost
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.009535269 = queryNorm
              0.8223853 = fieldWeight in 1120, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4433827 = idf(docFreq=67, maxDocs=42740)
                0.078125 = fieldNorm(doc=1120)
        0.2 = coord(5/25)
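
The arithmetic in the score listings above can be re-derived from the values they print. The following is a minimal sketch that reads the formula off the listing itself (tf as the square root of the term count, queryWeight as boost × idf × queryNorm, fieldWeight as tf × idf × fieldNorm, and a final multiplication by the coord factor). The function names and the tuple layout are hypothetical; the numbers are copied from the first entry (doc 1304).

```python
import math


def term_score(freq, idf, boost, query_norm, field_norm):
    """Score of one matching term as shown in the listing."""
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(terms, query_norm, field_norm, matched, total):
    """Sum the term scores and scale by coord = matched / total."""
    raw = sum(
        term_score(freq, idf, boost, query_norm, field_norm)
        for freq, idf, boost in terms
    )
    return raw * (matched / total)


QUERY_NORM = 0.009535269

# (termFreq, idf, boost) for the seven terms matched in doc 1304
TERMS_1304 = [
    (1.0, 6.581486, 1.077861),    # analytical
    (2.0, 5.709647, 1.8701569),   # files
    (1.0, 7.6739063, 2.513537),   # false
    (1.0, 4.280766, 3.5053408),   # document
    (5.0, 5.5693207, 3.6483877),  # file
    (1.0, 9.023833, 4.4335446),   # partitioned
    (7.0, 8.484837, 8.337455),    # signature
]

score = document_score(TERMS_1304, QUERY_NORM, 0.0625, matched=7, total=25)
print(round(score, 4))  # the listing reports 0.42506006 for this entry
```

Small differences in the last digits are expected, since the listing appears to use single-precision values while Python floats are double precision.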