Document (#15825)

Author
Persin, M.
Zobel, J.
Sacks-Davis, R.
Title
Filtered document retrieval with frequency-sorted indexes
Source
Journal of the American Society for Information Science. 47(1996) no.10, p.749-764
Year
1996
Abstract
Proposes an evaluation technique for ranking that uses early recognition of which documents are likely to be highly ranked to reduce costs. Queries are evaluated in 2% of the memory of a standard implementation without degradation in retrieval effectiveness. CPU time and disc traffic can also be dramatically reduced by designing inverted indexes explicitly to support the technique. Inverted lists are sorted by decreasing within-document frequency rather than by document number, and this method experimentally reduces CPU time and disk traffic to around one third of the original requirement. Frequency sorting can lead to a net reduction in index size, regardless of whether the index is compressed.
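
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the single cut-off `add_threshold`, the `f_dt * idf` contribution formula, the rarest-term-first ordering, and all names below are assumptions chosen for illustration. It only shows why sorting an inverted list by decreasing within-document frequency allows a list to be abandoned early, since every remaining posting can contribute no more than the current one.

```python
import math
from collections import defaultdict


def rank_frequency_sorted(index, query_terms, num_docs, add_threshold):
    """Rank documents using frequency-sorted inverted lists.

    index: dict mapping term -> list of (doc_id, f_dt) pairs, sorted by
           decreasing within-document frequency f_dt (hypothetical layout).
    add_threshold: smallest contribution still worth accumulating
                   (an illustrative simplification).
    """
    accumulators = defaultdict(float)

    # Process short (rare-term) lists first; an assumed heuristic.
    ordered = sorted(query_terms, key=lambda t: len(index.get(t, [])))

    for term in ordered:
        postings = index.get(term, [])
        if not postings:
            continue
        # Illustrative term weight; any monotone tf-idf variant would do.
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, f_dt in postings:
            contribution = f_dt * idf
            if contribution < add_threshold:
                # The list is sorted by decreasing f_dt, so no later
                # posting can exceed the threshold: stop reading it.
                break
            accumulators[doc_id] += contribution

    # Highest score first; ties broken by document number.
    return sorted(accumulators.items(), key=lambda kv: (-kv[1], kv[0]))
```

With a threshold of zero every posting is read, as in a conventional evaluation; raising the threshold trades accumulator memory and list traffic against the chance of missing low-scoring documents.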

Similar documents (author)

  1. Kaszkiel, M.; Zobel, J.: Effective ranking with arbitrary passages (2001) 1.95
    1.9512537 = sum of:
      1.9512537 = product of:
        3.9025073 = sum of:
          3.9025073 = weight(author_txt:zobel in 762) [ClassicSimilarity], result of:
            3.9025073 = score(doc=762,freq=1.0), product of:
              0.8321605 = queryWeight, product of:
                1.2250085 = boost
                9.379218 = idf(docFreq=9, maxDocs=43556)
                0.07242714 = queryNorm
              4.689609 = fieldWeight in 762, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.379218 = idf(docFreq=9, maxDocs=43556)
                0.5 = fieldNorm(doc=762)
        0.5 = coord(1/2)
    
  2. Heinz, S.; Zobel, J.: Efficient single-pass index construction for text databases (2003) 1.95
    1.9512537 = sum of:
      1.9512537 = product of:
        3.9025073 = sum of:
          3.9025073 = weight(author_txt:zobel in 2676) [ClassicSimilarity], result of:
            3.9025073 = score(doc=2676,freq=1.0), product of:
              0.8321605 = queryWeight, product of:
                1.2250085 = boost
                9.379218 = idf(docFreq=9, maxDocs=43556)
                0.07242714 = queryNorm
              4.689609 = fieldWeight in 2676, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.379218 = idf(docFreq=9, maxDocs=43556)
                0.5 = fieldNorm(doc=2676)
        0.5 = coord(1/2)
    
  3. Uitdenbogerd, A.L.; Zobel, J.: An architecture for effective music information retrieval (2004) 1.95
    1.9512537 = sum of:
      1.9512537 = product of:
        3.9025073 = sum of:
          3.9025073 = weight(author_txt:zobel in 4053) [ClassicSimilarity], result of:
            3.9025073 = score(doc=4053,freq=1.0), product of:
              0.8321605 = queryWeight, product of:
                1.2250085 = boost
                9.379218 = idf(docFreq=9, maxDocs=43556)
                0.07242714 = queryNorm
              4.689609 = fieldWeight in 4053, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.379218 = idf(docFreq=9, maxDocs=43556)
                0.5 = fieldNorm(doc=4053)
        0.5 = coord(1/2)
    
  4. Hoad, T.C.; Zobel, J.: Methods for identifying versioned and plagiarized documents (2003) 1.95
    1.9512537 = sum of:
      1.9512537 = product of:
        3.9025073 = sum of:
          3.9025073 = weight(author_txt:zobel in 157) [ClassicSimilarity], result of:
            3.9025073 = score(doc=157,freq=1.0), product of:
              0.8321605 = queryWeight, product of:
                1.2250085 = boost
                9.379218 = idf(docFreq=9, maxDocs=43556)
                0.07242714 = queryNorm
              4.689609 = fieldWeight in 157, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.379218 = idf(docFreq=9, maxDocs=43556)
                0.5 = fieldNorm(doc=157)
        0.5 = coord(1/2)
    
  5. Moffat, A.; Zobel, J.: Self-indexing inverted files for fast text retrieval (1996) 1.95
    1.9512537 = sum of:
      1.9512537 = product of:
        3.9025073 = sum of:
          3.9025073 = weight(author_txt:zobel in 2007) [ClassicSimilarity], result of:
            3.9025073 = score(doc=2007,freq=1.0), product of:
              0.8321605 = queryWeight, product of:
                1.2250085 = boost
                9.379218 = idf(docFreq=9, maxDocs=43556)
                0.07242714 = queryNorm
              4.689609 = fieldWeight in 2007, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.379218 = idf(docFreq=9, maxDocs=43556)
                0.5 = fieldNorm(doc=2007)
        0.5 = coord(1/2)
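
The score trees above all follow the same arithmetic, which can be reproduced in a short sketch. The formulas for `idf` and `tf` are those of Lucene's ClassicSimilarity; the function names are hypothetical, and `boost`, `queryNorm`, `fieldNorm` and `coord` are simply copied from the listing rather than derived.

```python
import math


def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    # ClassicSimilarity: tf = sqrt(termFreq)
    return math.sqrt(freq)


def term_weight(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm          # "queryWeight" line
    field_weight = classic_tf(freq) * idf * field_norm  # "fieldWeight" line
    return query_weight * field_weight


# Values copied from the author_txt:zobel entries above.
weight = term_weight(freq=1.0, doc_freq=9, max_docs=43556,
                     boost=1.2250085, query_norm=0.07242714, field_norm=0.5)
score = weight * 0.5  # coord(1/2): one of two query clauses matched
print(round(weight, 4), round(score, 4))
```

Because every listed document matches the same single author term with identical field length, all five entries receive the same score.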
    

Similar documents (content)

  1. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.18
    0.1807967 = sum of:
      0.1807967 = product of:
        0.75331956 = sum of:
          0.115809135 = weight(abstract_txt:disc in 2714) [ClassicSimilarity], result of:
            0.115809135 = score(doc=2714,freq=2.0), product of:
              0.14455752 = queryWeight, product of:
                1.0465883 = boost
                7.250986 = idf(docFreq=83, maxDocs=43556)
                0.019048806 = queryNorm
              0.8011284 = fieldWeight in 2714, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.250986 = idf(docFreq=83, maxDocs=43556)
                0.078125 = fieldNorm(doc=2714)
          0.15189894 = weight(abstract_txt:compressed in 2714) [ClassicSimilarity], result of:
            0.15189894 = score(doc=2714,freq=1.0), product of:
              0.21823546 = queryWeight, product of:
                1.2859325 = boost
                8.909214 = idf(docFreq=15, maxDocs=43556)
                0.019048806 = queryNorm
              0.69603235 = fieldWeight in 2714, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.909214 = idf(docFreq=15, maxDocs=43556)
                0.078125 = fieldNorm(doc=2714)
          0.079517394 = weight(abstract_txt:index in 2714) [ClassicSimilarity], result of:
            0.079517394 = score(doc=2714,freq=3.0), product of:
              0.12383209 = queryWeight, product of:
                1.3698945 = boost
                4.74546 = idf(docFreq=1028, maxDocs=43556)
                0.019048806 = queryNorm
              0.64213884 = fieldWeight in 2714, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.74546 = idf(docFreq=1028, maxDocs=43556)
                0.078125 = fieldNorm(doc=2714)
          0.08065156 = weight(abstract_txt:indexes in 2714) [ClassicSimilarity], result of:
            0.08065156 = score(doc=2714,freq=1.0), product of:
              0.18029098 = queryWeight, product of:
                1.6529416 = boost
                5.7259655 = idf(docFreq=385, maxDocs=43556)
                0.019048806 = queryNorm
              0.44734105 = fieldWeight in 2714, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7259655 = idf(docFreq=385, maxDocs=43556)
                0.078125 = fieldNorm(doc=2714)
          0.050826035 = weight(abstract_txt:document in 2714) [ClassicSimilarity], result of:
            0.050826035 = score(doc=2714,freq=1.0), product of:
              0.1517004 = queryWeight, product of:
                1.8569897 = boost
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.019048806 = queryNorm
              0.33504218 = fieldWeight in 2714, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.078125 = fieldNorm(doc=2714)
          0.2746165 = weight(abstract_txt:inverted in 2714) [ClassicSimilarity], result of:
            0.2746165 = score(doc=2714,freq=2.0), product of:
              0.32387188 = queryWeight, product of:
                2.2154255 = boost
                7.6744695 = idf(docFreq=54, maxDocs=43556)
                0.019048806 = queryNorm
              0.8479171 = fieldWeight in 2714, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.6744695 = idf(docFreq=54, maxDocs=43556)
                0.078125 = fieldNorm(doc=2714)
        0.24 = coord(6/25)
    
  2. Moffat, A.; Zobel, J.: Self-indexing inverted files for fast text retrieval (1996) 0.16
    0.16372266 = sum of:
      0.16372266 = product of:
        0.6821778 = sum of:
          0.02495471 = weight(abstract_txt:retrieval in 2007) [ClassicSimilarity], result of:
            0.02495471 = score(doc=2007,freq=3.0), product of:
              0.06635904 = queryWeight, product of:
                1.0028144 = boost
                3.4738557 = idf(docFreq=3669, maxDocs=43556)
                0.019048806 = queryNorm
              0.3760559 = fieldWeight in 2007, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4738557 = idf(docFreq=3669, maxDocs=43556)
                0.0625 = fieldNorm(doc=2007)
          0.08269031 = weight(abstract_txt:reduced in 2007) [ClassicSimilarity], result of:
            0.08269031 = score(doc=2007,freq=2.0), product of:
              0.1340053 = queryWeight, product of:
                1.0076658 = boost
                6.9813223 = idf(docFreq=109, maxDocs=43556)
                0.019048806 = queryNorm
              0.6170675 = fieldWeight in 2007, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9813223 = idf(docFreq=109, maxDocs=43556)
                0.0625 = fieldNorm(doc=2007)
          0.05525807 = weight(abstract_txt:time in 2007) [ClassicSimilarity], result of:
            0.05525807 = score(doc=2007,freq=5.0), product of:
              0.09508514 = queryWeight, product of:
                1.2004024 = boost
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.019048806 = queryNorm
              0.5811431 = fieldWeight in 2007, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.0625 = fieldNorm(doc=2007)
          0.17185403 = weight(abstract_txt:compressed in 2007) [ClassicSimilarity], result of:
            0.17185403 = score(doc=2007,freq=2.0), product of:
              0.21823546 = queryWeight, product of:
                1.2859325 = boost
                8.909214 = idf(docFreq=15, maxDocs=43556)
                0.019048806 = queryNorm
              0.7874707 = fieldWeight in 2007, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.909214 = idf(docFreq=15, maxDocs=43556)
                0.0625 = fieldNorm(doc=2007)
          0.036727514 = weight(abstract_txt:index in 2007) [ClassicSimilarity], result of:
            0.036727514 = score(doc=2007,freq=1.0), product of:
              0.12383209 = queryWeight, product of:
                1.3698945 = boost
                4.74546 = idf(docFreq=1028, maxDocs=43556)
                0.019048806 = queryNorm
              0.29659125 = fieldWeight in 2007, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.74546 = idf(docFreq=1028, maxDocs=43556)
                0.0625 = fieldNorm(doc=2007)
          0.31069311 = weight(abstract_txt:inverted in 2007) [ClassicSimilarity], result of:
            0.31069311 = score(doc=2007,freq=4.0), product of:
              0.32387188 = queryWeight, product of:
                2.2154255 = boost
                7.6744695 = idf(docFreq=54, maxDocs=43556)
                0.019048806 = queryNorm
              0.9593087 = fieldWeight in 2007, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.6744695 = idf(docFreq=54, maxDocs=43556)
                0.0625 = fieldNorm(doc=2007)
        0.24 = coord(6/25)
    
  3. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the update of partitioned inverted files (2007) 0.15
    0.15165961 = sum of:
      0.15165961 = product of:
        0.63191503 = sum of:
          0.020375434 = weight(abstract_txt:retrieval in 2817) [ClassicSimilarity], result of:
            0.020375434 = score(doc=2817,freq=2.0), product of:
              0.06635904 = queryWeight, product of:
                1.0028144 = boost
                3.4738557 = idf(docFreq=3669, maxDocs=43556)
                0.019048806 = queryNorm
              0.30704835 = fieldWeight in 2817, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4738557 = idf(docFreq=3669, maxDocs=43556)
                0.0625 = fieldNorm(doc=2817)
          0.07226793 = weight(abstract_txt:requirement in 2817) [ClassicSimilarity], result of:
            0.07226793 = score(doc=2817,freq=1.0), product of:
              0.15433316 = queryWeight, product of:
                1.0813969 = boost
                7.492148 = idf(docFreq=65, maxDocs=43556)
                0.019048806 = queryNorm
              0.46825925 = fieldWeight in 2817, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.492148 = idf(docFreq=65, maxDocs=43556)
                0.0625 = fieldNorm(doc=2817)
          0.036727514 = weight(abstract_txt:index in 2817) [ClassicSimilarity], result of:
            0.036727514 = score(doc=2817,freq=1.0), product of:
              0.12383209 = queryWeight, product of:
                1.3698945 = boost
                4.74546 = idf(docFreq=1028, maxDocs=43556)
                0.019048806 = queryNorm
              0.29659125 = fieldWeight in 2817, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.74546 = idf(docFreq=1028, maxDocs=43556)
                0.0625 = fieldNorm(doc=2817)
          0.064521246 = weight(abstract_txt:indexes in 2817) [ClassicSimilarity], result of:
            0.064521246 = score(doc=2817,freq=1.0), product of:
              0.18029098 = queryWeight, product of:
                1.6529416 = boost
                5.7259655 = idf(docFreq=385, maxDocs=43556)
                0.019048806 = queryNorm
              0.35787284 = fieldWeight in 2817, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7259655 = idf(docFreq=385, maxDocs=43556)
                0.0625 = fieldNorm(doc=2817)
          0.057503097 = weight(abstract_txt:document in 2817) [ClassicSimilarity], result of:
            0.057503097 = score(doc=2817,freq=2.0), product of:
              0.1517004 = queryWeight, product of:
                1.8569897 = boost
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.019048806 = queryNorm
              0.37905696 = fieldWeight in 2817, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.0625 = fieldNorm(doc=2817)
          0.3805198 = weight(abstract_txt:inverted in 2817) [ClassicSimilarity], result of:
            0.3805198 = score(doc=2817,freq=6.0), product of:
              0.32387188 = queryWeight, product of:
                2.2154255 = boost
                7.6744695 = idf(docFreq=54, maxDocs=43556)
                0.019048806 = queryNorm
              1.1749084 = fieldWeight in 2817, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.6744695 = idf(docFreq=54, maxDocs=43556)
                0.0625 = fieldNorm(doc=2817)
        0.24 = coord(6/25)
    
  4. Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 0.13
    0.12558174 = sum of:
      0.12558174 = product of:
        0.52325726 = sum of:
          0.058470882 = weight(abstract_txt:reduced in 224) [ClassicSimilarity], result of:
            0.058470882 = score(doc=224,freq=1.0), product of:
              0.1340053 = queryWeight, product of:
                1.0076658 = boost
                6.9813223 = idf(docFreq=109, maxDocs=43556)
                0.019048806 = queryNorm
              0.43633264 = fieldWeight in 224, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9813223 = idf(docFreq=109, maxDocs=43556)
                0.0625 = fieldNorm(doc=224)
          0.08123366 = weight(abstract_txt:decreasing in 224) [ClassicSimilarity], result of:
            0.08123366 = score(doc=224,freq=1.0), product of:
              0.16684742 = queryWeight, product of:
                1.1243856 = boost
                7.7899823 = idf(docFreq=48, maxDocs=43556)
                0.019048806 = queryNorm
              0.4868739 = fieldWeight in 224, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7899823 = idf(docFreq=48, maxDocs=43556)
                0.0625 = fieldNorm(doc=224)
          0.02471216 = weight(abstract_txt:time in 224) [ClassicSimilarity], result of:
            0.02471216 = score(doc=224,freq=1.0), product of:
              0.09508514 = queryWeight, product of:
                1.2004024 = boost
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.019048806 = queryNorm
              0.2598951 = fieldWeight in 224, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.0625 = fieldNorm(doc=224)
          0.057503097 = weight(abstract_txt:document in 224) [ClassicSimilarity], result of:
            0.057503097 = score(doc=224,freq=2.0), product of:
              0.1517004 = queryWeight, product of:
                1.8569897 = boost
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.019048806 = queryNorm
              0.37905696 = fieldWeight in 224, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.0625 = fieldNorm(doc=224)
          0.19282943 = weight(abstract_txt:sorted in 224) [ClassicSimilarity], result of:
            0.19282943 = score(doc=224,freq=1.0), product of:
              0.3740713 = queryWeight, product of:
                2.380936 = boost
                8.247815 = idf(docFreq=30, maxDocs=43556)
                0.019048806 = queryNorm
              0.51548845 = fieldWeight in 224, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.247815 = idf(docFreq=30, maxDocs=43556)
                0.0625 = fieldNorm(doc=224)
          0.10850801 = weight(abstract_txt:frequency in 224) [ClassicSimilarity], result of:
            0.10850801 = score(doc=224,freq=1.0), product of:
              0.2918617 = queryWeight, product of:
                2.5757558 = boost
                5.9484615 = idf(docFreq=308, maxDocs=43556)
                0.019048806 = queryNorm
              0.37177885 = fieldWeight in 224, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9484615 = idf(docFreq=308, maxDocs=43556)
                0.0625 = fieldNorm(doc=224)
        0.24 = coord(6/25)
    
  5. Shieh, W.-Y.; Chung, C.-P.: A statistics-based approach to incrementally update inverted files (2005) 0.12
    0.122851186 = sum of:
      0.122851186 = product of:
        0.6142559 = sum of:
          0.01800951 = weight(abstract_txt:retrieval in 3008) [ClassicSimilarity], result of:
            0.01800951 = score(doc=3008,freq=1.0), product of:
              0.06635904 = queryWeight, product of:
                1.0028144 = boost
                3.4738557 = idf(docFreq=3669, maxDocs=43556)
                0.019048806 = queryNorm
              0.27139497 = fieldWeight in 3008, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4738557 = idf(docFreq=3669, maxDocs=43556)
                0.078125 = fieldNorm(doc=3008)
          0.08032338 = weight(abstract_txt:reduction in 3008) [ClassicSimilarity], result of:
            0.08032338 = score(doc=3008,freq=1.0), product of:
              0.1427086 = queryWeight, product of:
                1.0398737 = boost
                7.204466 = idf(docFreq=87, maxDocs=43556)
                0.019048806 = queryNorm
              0.5628489 = fieldWeight in 3008, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.204466 = idf(docFreq=87, maxDocs=43556)
                0.078125 = fieldNorm(doc=3008)
          0.0308902 = weight(abstract_txt:time in 3008) [ClassicSimilarity], result of:
            0.0308902 = score(doc=3008,freq=1.0), product of:
              0.09508514 = queryWeight, product of:
                1.2004024 = boost
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.019048806 = queryNorm
              0.32486886 = fieldWeight in 3008, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.078125 = fieldNorm(doc=3008)
          0.050826035 = weight(abstract_txt:document in 3008) [ClassicSimilarity], result of:
            0.050826035 = score(doc=3008,freq=1.0), product of:
              0.1517004 = queryWeight, product of:
                1.8569897 = boost
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.019048806 = queryNorm
              0.33504218 = fieldWeight in 3008, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.28854 = idf(docFreq=1624, maxDocs=43556)
                0.078125 = fieldNorm(doc=3008)
          0.4342068 = weight(abstract_txt:inverted in 3008) [ClassicSimilarity], result of:
            0.4342068 = score(doc=3008,freq=5.0), product of:
              0.32387188 = queryWeight, product of:
                2.2154255 = boost
                7.6744695 = idf(docFreq=54, maxDocs=43556)
                0.019048806 = queryNorm
              1.3406746 = fieldWeight in 3008, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.6744695 = idf(docFreq=54, maxDocs=43556)
                0.078125 = fieldNorm(doc=3008)
        0.2 = coord(5/25)
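
For the content-based entries, the final score is the sum of the per-term weights scaled by the coordination factor. The sketch below checks this for the first entry (doc 2714), using the six weights exactly as printed above; the helper name `combined_score` is hypothetical.

```python
def combined_score(term_weights, matched, total):
    """Sum the per-term weights and scale by coord = matched / total."""
    return sum(term_weights.values()) * (matched / total)


# Per-term weights for doc 2714, copied from the listing above.
weights_2714 = {
    "disc": 0.115809135,
    "compressed": 0.15189894,
    "index": 0.079517394,
    "indexes": 0.08065156,
    "document": 0.050826035,
    "inverted": 0.2746165,
}

# 6 of the 25 query terms matched, hence coord(6/25) = 0.24.
score_2714 = combined_score(weights_2714, matched=6, total=25)
strongest_term = max(weights_2714, key=weights_2714.get)
print(round(score_2714, 4), strongest_term)
```

The coordination factor explains why the fifth entry, with a comparable weight sum but only five matching terms, ranks below entries whose sums are similar.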