Document (#15828)

Author
Persin, M.
Zobel, J.
Sacks-Davis, R.
Title
Filtered document retrieval with frequency-sorted indexes
Source
Journal of the American Society for Information Science. 47(1996) no.10, pp.749-764
Year
1996
Abstract
Proposes a ranking evaluation technique that reduces costs through early recognition of the documents likely to be highly ranked. Queries are evaluated in 2% of the memory of a standard implementation without degradation in retrieval effectiveness. CPU time and disc traffic can also be dramatically reduced by designing inverted indexes explicitly to support the technique. Inverted lists are sorted by decreasing within-document frequency rather than by document number; in experiments this reduces CPU time and disk traffic to around one third of the original requirement. Frequency sorting can also lead to a net reduction in index size, regardless of whether the index is compressed.
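
The idea of frequency-sorted inverted lists can be illustrated with a small sketch. This is not the authors' algorithm: the abstract gives no thresholds or weighting formula, so the single cut-off `threshold`, the `term_weight` table, and the function names below are all assumptions made for illustration. It shows only the property the abstract relies on: once a list is ordered by decreasing within-document frequency, scanning can stop at the first posting whose contribution is too small, because every later posting is smaller still.

```python
from collections import defaultdict


def build_frequency_sorted_index(docs):
    """Map each term to postings (doc_id, tf), sorted by decreasing tf.

    Ties are broken by document number so the order is deterministic.
    """
    counts = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            counts[term][doc_id] = counts[term].get(doc_id, 0) + 1
    return {
        term: sorted(postings.items(), key=lambda p: (-p[1], p[0]))
        for term, postings in counts.items()
    }


def rank(index, query_terms, term_weight, threshold):
    """Accumulate scores, abandoning a list once contributions fall below threshold.

    term_weight and threshold are hypothetical stand-ins for whatever
    weighting and filtering rule a real system would use.
    """
    accumulators = defaultdict(float)
    postings_read = 0
    for term in query_terms:
        for doc_id, tf in index.get(term, []):
            contribution = tf * term_weight.get(term, 1.0)
            if contribution < threshold:
                break  # frequency-sorted: all remaining postings contribute less
            accumulators[doc_id] += contribution
            postings_read += 1
    ranked = sorted(accumulators.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked, postings_read


docs = {1: "index index index list", 2: "index list list", 3: "index"}
index = build_frequency_sorted_index(docs)
print(rank(index, ["index", "list"], {}, threshold=2.0))
print(rank(index, ["index", "list"], {}, threshold=0.0))
```

With the higher threshold only the head of each list is read and only two accumulators are created, which is the kind of memory and traffic saving the abstract describes. A document-ordered list would have to be scanned in full to find the same postings.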

Similar documents (author)

  1. Kaszkiel, M.; Zobel, J.: Effective ranking with arbitrary passages (2001) 1.94
    1.9421507 = sum of:
      1.9421507 = product of:
        3.8843014 = sum of:
          3.8843014 = weight(author_txt:zobel in 680) [ClassicSimilarity], result of:
            3.8843014 = score(doc=680,freq=1.0), product of:
              0.8299518 = queryWeight, product of:
                1.2197577 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.07269245 = queryNorm
              4.680153 = fieldWeight in 680, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.5 = fieldNorm(doc=680)
        0.5 = coord(1/2)
    
  2. Heinz, S.; Zobel, J.: Efficient single-pass index construction for text databases (2003) 1.94
    1.9421507 = sum of:
      1.9421507 = product of:
        3.8843014 = sum of:
          3.8843014 = weight(author_txt:zobel in 2679) [ClassicSimilarity], result of:
            3.8843014 = score(doc=2679,freq=1.0), product of:
              0.8299518 = queryWeight, product of:
                1.2197577 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.07269245 = queryNorm
              4.680153 = fieldWeight in 2679, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.5 = fieldNorm(doc=2679)
        0.5 = coord(1/2)
    
  3. Uitdenbogerd, A.L.; Zobel, J.: An architecture for effective music information retrieval (2004) 1.94
    1.9421507 = sum of:
      1.9421507 = product of:
        3.8843014 = sum of:
          3.8843014 = weight(author_txt:zobel in 4056) [ClassicSimilarity], result of:
            3.8843014 = score(doc=4056,freq=1.0), product of:
              0.8299518 = queryWeight, product of:
                1.2197577 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.07269245 = queryNorm
              4.680153 = fieldWeight in 4056, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.5 = fieldNorm(doc=4056)
        0.5 = coord(1/2)
    
  4. Hoad, T.C.; Zobel, J.: Methods for identifying versioned and plagiarized documents (2003) 1.94
    1.9421507 = sum of:
      1.9421507 = product of:
        3.8843014 = sum of:
          3.8843014 = weight(author_txt:zobel in 160) [ClassicSimilarity], result of:
            3.8843014 = score(doc=160,freq=1.0), product of:
              0.8299518 = queryWeight, product of:
                1.2197577 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.07269245 = queryNorm
              4.680153 = fieldWeight in 160, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.5 = fieldNorm(doc=160)
        0.5 = coord(1/2)
    
  5. Moffat, A.; Zobel, J.: Self-indexing inverted files for fast text retrieval (1996) 1.94
    1.9421507 = sum of:
      1.9421507 = product of:
        3.8843014 = sum of:
          3.8843014 = weight(author_txt:zobel in 2010) [ClassicSimilarity], result of:
            3.8843014 = score(doc=2010,freq=1.0), product of:
              0.8299518 = queryWeight, product of:
                1.2197577 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.07269245 = queryNorm
              4.680153 = fieldWeight in 2010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.5 = fieldNorm(doc=2010)
        0.5 = coord(1/2)
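
The arithmetic in the author traces above can be checked by hand. The sketch below recomputes one entry from its printed inputs. The formulas (`idf = 1 + ln(maxDocs / (docFreq + 1))`, `tf = sqrt(freq)`) are inferred from the printed values and match the classic TF-IDF form, but the function names are illustrative and are not Lucene API calls.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency, as inferred from the values in the trace."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """score = queryWeight * fieldWeight, following the nesting of the trace."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight


# Inputs copied from the author_txt:zobel entries above.
score = term_score(
    freq=1.0,
    doc_freq=9,
    max_docs=42740,
    boost=1.2197577,
    query_norm=0.07269245,
    field_norm=0.5,
)
final = score * 0.5  # coord(1/2): one of two query clauses matched
print(score, final)
```

The results should land close to the 3.8843014 and 1.9421507 shown in the traces. Small differences in the last digits are expected, since the traces appear to use single-precision floats. All five author matches share the same inputs, which is why they all score 1.94.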
    

Similar documents (content)

  1. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.18
    0.18012136 = sum of:
      0.18012136 = product of:
        0.7505057 = sum of:
          0.114961706 = weight(abstract_txt:disc in 2717) [ClassicSimilarity], result of:
            0.114961706 = score(doc=2717,freq=2.0), product of:
              0.143875 = queryWeight, product of:
                1.0436491 = boost
                7.232074 = idf(docFreq=83, maxDocs=42740)
                0.019061979 = queryNorm
              0.79903877 = fieldWeight in 2717, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.232074 = idf(docFreq=83, maxDocs=42740)
                0.078125 = fieldNorm(doc=2717)
          0.15100773 = weight(abstract_txt:compressed in 2717) [ClassicSimilarity], result of:
            0.15100773 = score(doc=2717,freq=1.0), product of:
              0.21741657 = queryWeight, product of:
                1.2829454 = boost
                8.890302 = idf(docFreq=15, maxDocs=42740)
                0.019061979 = queryNorm
              0.6945548 = fieldWeight in 2717, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.890302 = idf(docFreq=15, maxDocs=42740)
                0.078125 = fieldNorm(doc=2717)
          0.07904816 = weight(abstract_txt:index in 2717) [ClassicSimilarity], result of:
            0.07904816 = score(doc=2717,freq=3.0), product of:
              0.12336461 = queryWeight, product of:
                1.3666967 = boost
                4.7353325 = idf(docFreq=1019, maxDocs=42740)
                0.019061979 = queryNorm
              0.64076847 = fieldWeight in 2717, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.7353325 = idf(docFreq=1019, maxDocs=42740)
                0.078125 = fieldNorm(doc=2717)
          0.08022237 = weight(abstract_txt:indexes in 2717) [ClassicSimilarity], result of:
            0.08022237 = score(doc=2717,freq=1.0), product of:
              0.17968018 = queryWeight, product of:
                1.6494036 = boost
                5.7148557 = idf(docFreq=382, maxDocs=42740)
                0.019061979 = queryNorm
              0.4464731 = fieldWeight in 2717, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7148557 = idf(docFreq=382, maxDocs=42740)
                0.078125 = fieldNorm(doc=2717)
          0.050574943 = weight(abstract_txt:document in 2717) [ClassicSimilarity], result of:
            0.050574943 = score(doc=2717,freq=1.0), product of:
              0.1512251 = queryWeight, product of:
                1.8532518 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.019061979 = queryNorm
              0.33443484 = fieldWeight in 2717, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.078125 = fieldNorm(doc=2717)
          0.2746908 = weight(abstract_txt:inverted in 2717) [ClassicSimilarity], result of:
            0.2746908 = score(doc=2717,freq=2.0), product of:
              0.32398328 = queryWeight, product of:
                2.2148185 = boost
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.019061979 = queryNorm
              0.84785485 = fieldWeight in 2717, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.078125 = fieldNorm(doc=2717)
        0.24 = coord(6/25)
    
  2. Moffat, A.; Zobel, J.: Self-indexing inverted files for fast text retrieval (1996) 0.16
    0.1638942 = sum of:
      0.1638942 = product of:
        0.68289256 = sum of:
          0.02477224 = weight(abstract_txt:retrieval in 2010) [ClassicSimilarity], result of:
            0.02477224 = score(doc=2010,freq=3.0), product of:
              0.06604597 = queryWeight, product of:
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.019061979 = queryNorm
              0.37507573 = fieldWeight in 2010, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.0625 = fieldNorm(doc=2010)
          0.08440727 = weight(abstract_txt:reduced in 2010) [ClassicSimilarity], result of:
            0.08440727 = score(doc=2010,freq=2.0), product of:
              0.1358761 = queryWeight, product of:
                1.0142229 = boost
                7.0281615 = idf(docFreq=102, maxDocs=42740)
                0.019061979 = queryNorm
              0.6212076 = fieldWeight in 2010, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0281615 = idf(docFreq=102, maxDocs=42740)
                0.0625 = fieldNorm(doc=2010)
          0.055579342 = weight(abstract_txt:time in 2010) [ClassicSimilarity], result of:
            0.055579342 = score(doc=2010,freq=5.0), product of:
              0.09546895 = queryWeight, product of:
                1.2022864 = boost
                4.1656833 = idf(docFreq=1802, maxDocs=42740)
                0.019061979 = queryNorm
              0.5821719 = fieldWeight in 2010, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.1656833 = idf(docFreq=1802, maxDocs=42740)
                0.0625 = fieldNorm(doc=2010)
          0.17084575 = weight(abstract_txt:compressed in 2010) [ClassicSimilarity], result of:
            0.17084575 = score(doc=2010,freq=2.0), product of:
              0.21741657 = queryWeight, product of:
                1.2829454 = boost
                8.890302 = idf(docFreq=15, maxDocs=42740)
                0.019061979 = queryNorm
              0.7857991 = fieldWeight in 2010, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.890302 = idf(docFreq=15, maxDocs=42740)
                0.0625 = fieldNorm(doc=2010)
          0.036510777 = weight(abstract_txt:index in 2010) [ClassicSimilarity], result of:
            0.036510777 = score(doc=2010,freq=1.0), product of:
              0.12336461 = queryWeight, product of:
                1.3666967 = boost
                4.7353325 = idf(docFreq=1019, maxDocs=42740)
                0.019061979 = queryNorm
              0.29595828 = fieldWeight in 2010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7353325 = idf(docFreq=1019, maxDocs=42740)
                0.0625 = fieldNorm(doc=2010)
          0.31077716 = weight(abstract_txt:inverted in 2010) [ClassicSimilarity], result of:
            0.31077716 = score(doc=2010,freq=4.0), product of:
              0.32398328 = queryWeight, product of:
                2.2148185 = boost
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.019061979 = queryNorm
              0.9592383 = fieldWeight in 2010, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.0625 = fieldNorm(doc=2010)
        0.24 = coord(6/25)
    
  3. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the update of partitioned inverted files (2007) 0.15
    0.151537 = sum of:
      0.151537 = product of:
        0.63140416 = sum of:
          0.02022645 = weight(abstract_txt:retrieval in 2820) [ClassicSimilarity], result of:
            0.02022645 = score(doc=2820,freq=2.0), product of:
              0.06604597 = queryWeight, product of:
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.019061979 = queryNorm
              0.30624807 = fieldWeight in 2820, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.0625 = fieldNorm(doc=2820)
          0.07264729 = weight(abstract_txt:requirement in 2820) [ClassicSimilarity], result of:
            0.07264729 = score(doc=2820,freq=1.0), product of:
              0.15489812 = queryWeight, product of:
                1.0828915 = boost
                7.5040073 = idf(docFreq=63, maxDocs=42740)
                0.019061979 = queryNorm
              0.46900046 = fieldWeight in 2820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5040073 = idf(docFreq=63, maxDocs=42740)
                0.0625 = fieldNorm(doc=2820)
          0.036510777 = weight(abstract_txt:index in 2820) [ClassicSimilarity], result of:
            0.036510777 = score(doc=2820,freq=1.0), product of:
              0.12336461 = queryWeight, product of:
                1.3666967 = boost
                4.7353325 = idf(docFreq=1019, maxDocs=42740)
                0.019061979 = queryNorm
              0.29595828 = fieldWeight in 2820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7353325 = idf(docFreq=1019, maxDocs=42740)
                0.0625 = fieldNorm(doc=2820)
          0.06417789 = weight(abstract_txt:indexes in 2820) [ClassicSimilarity], result of:
            0.06417789 = score(doc=2820,freq=1.0), product of:
              0.17968018 = queryWeight, product of:
                1.6494036 = boost
                5.7148557 = idf(docFreq=382, maxDocs=42740)
                0.019061979 = queryNorm
              0.35717848 = fieldWeight in 2820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7148557 = idf(docFreq=382, maxDocs=42740)
                0.0625 = fieldNorm(doc=2820)
          0.057219017 = weight(abstract_txt:document in 2820) [ClassicSimilarity], result of:
            0.057219017 = score(doc=2820,freq=2.0), product of:
              0.1512251 = queryWeight, product of:
                1.8532518 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.019061979 = queryNorm
              0.37836984 = fieldWeight in 2820, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=2820)
          0.38062274 = weight(abstract_txt:inverted in 2820) [ClassicSimilarity], result of:
            0.38062274 = score(doc=2820,freq=6.0), product of:
              0.32398328 = queryWeight, product of:
                2.2148185 = boost
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.019061979 = queryNorm
              1.1748222 = fieldWeight in 2820, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.0625 = fieldNorm(doc=2820)
        0.24 = coord(6/25)
    
  4. Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 0.13
    0.12560919 = sum of:
      0.12560919 = product of:
        0.52337164 = sum of:
          0.05968495 = weight(abstract_txt:reduced in 227) [ClassicSimilarity], result of:
            0.05968495 = score(doc=227,freq=1.0), product of:
              0.1358761 = queryWeight, product of:
                1.0142229 = boost
                7.0281615 = idf(docFreq=102, maxDocs=42740)
                0.019061979 = queryNorm
              0.4392601 = fieldWeight in 227, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0281615 = idf(docFreq=102, maxDocs=42740)
                0.0625 = fieldNorm(doc=227)
          0.08068302 = weight(abstract_txt:decreasing in 227) [ClassicSimilarity], result of:
            0.08068302 = score(doc=227,freq=1.0), product of:
              0.16611977 = queryWeight, product of:
                1.1214309 = boost
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.019061979 = queryNorm
              0.48569188 = fieldWeight in 227, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.0625 = fieldNorm(doc=227)
          0.02485584 = weight(abstract_txt:time in 227) [ClassicSimilarity], result of:
            0.02485584 = score(doc=227,freq=1.0), product of:
              0.09546895 = queryWeight, product of:
                1.2022864 = boost
                4.1656833 = idf(docFreq=1802, maxDocs=42740)
                0.019061979 = queryNorm
              0.2603552 = fieldWeight in 227, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1656833 = idf(docFreq=1802, maxDocs=42740)
                0.0625 = fieldNorm(doc=227)
          0.057219017 = weight(abstract_txt:document in 227) [ClassicSimilarity], result of:
            0.057219017 = score(doc=227,freq=2.0), product of:
              0.1512251 = queryWeight, product of:
                1.8532518 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.019061979 = queryNorm
              0.37836984 = fieldWeight in 227, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=227)
          0.19160004 = weight(abstract_txt:sorted in 227) [ClassicSimilarity], result of:
            0.19160004 = score(doc=227,freq=1.0), product of:
              0.3725406 = queryWeight, product of:
                2.3750002 = boost
                8.228904 = idf(docFreq=30, maxDocs=42740)
                0.019061979 = queryNorm
              0.5143065 = fieldWeight in 227, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.228904 = idf(docFreq=30, maxDocs=42740)
                0.0625 = fieldNorm(doc=227)
          0.109328784 = weight(abstract_txt:frequency in 227) [ClassicSimilarity], result of:
            0.109328784 = score(doc=227,freq=1.0), product of:
              0.29337963 = queryWeight, product of:
                2.5812938 = boost
                5.962447 = idf(docFreq=298, maxDocs=42740)
                0.019061979 = queryNorm
              0.37265295 = fieldWeight in 227, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.962447 = idf(docFreq=298, maxDocs=42740)
                0.0625 = fieldNorm(doc=227)
        0.24 = coord(6/25)
    
  5. Shieh, W.-Y.; Chung, C.-P.: A statistics-based approach to incrementally update inverted files (2005) 0.12
    0.12286923 = sum of:
      0.12286923 = product of:
        0.61434615 = sum of:
          0.017877826 = weight(abstract_txt:retrieval in 3011) [ClassicSimilarity], result of:
            0.017877826 = score(doc=3011,freq=1.0), product of:
              0.06604597 = queryWeight, product of:
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.019061979 = queryNorm
              0.2706876 = fieldWeight in 3011, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.078125 = fieldNorm(doc=3011)
          0.08049932 = weight(abstract_txt:reduction in 3011) [ClassicSimilarity], result of:
            0.08049932 = score(doc=3011,freq=1.0), product of:
              0.1429403 = queryWeight, product of:
                1.0402535 = boost
                7.2085433 = idf(docFreq=85, maxDocs=42740)
                0.019061979 = queryNorm
              0.56316745 = fieldWeight in 3011, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2085433 = idf(docFreq=85, maxDocs=42740)
                0.078125 = fieldNorm(doc=3011)
          0.031069798 = weight(abstract_txt:time in 3011) [ClassicSimilarity], result of:
            0.031069798 = score(doc=3011,freq=1.0), product of:
              0.09546895 = queryWeight, product of:
                1.2022864 = boost
                4.1656833 = idf(docFreq=1802, maxDocs=42740)
                0.019061979 = queryNorm
              0.325444 = fieldWeight in 3011, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1656833 = idf(docFreq=1802, maxDocs=42740)
                0.078125 = fieldNorm(doc=3011)
          0.050574943 = weight(abstract_txt:document in 3011) [ClassicSimilarity], result of:
            0.050574943 = score(doc=3011,freq=1.0), product of:
              0.1512251 = queryWeight, product of:
                1.8532518 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.019061979 = queryNorm
              0.33443484 = fieldWeight in 3011, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.078125 = fieldNorm(doc=3011)
          0.4343243 = weight(abstract_txt:inverted in 3011) [ClassicSimilarity], result of:
            0.4343243 = score(doc=3011,freq=5.0), product of:
              0.32398328 = queryWeight, product of:
                2.2148185 = boost
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.019061979 = queryNorm
              1.3405763 = fieldWeight in 3011, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.078125 = fieldNorm(doc=3011)
        0.2 = coord(5/25)
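
For the content-based matches, the final score is the sum of the per-term scores scaled by the coordination factor, that is, the fraction of query terms the document matched. The sketch below replays that last step for doc 3011 using the leaf scores printed in its trace. The helper name `combine_clauses` is hypothetical.

```python
def combine_clauses(term_scores, total_clauses):
    """Sum the matching clause scores and scale by coord = matched / total."""
    matched = len(term_scores)
    coord = matched / total_clauses
    return sum(term_scores) * coord, coord


# Per-term scores copied from the trace for doc 3011:
# retrieval, reduction, time, document, inverted (5 of 25 query terms).
doc_3011 = [0.017877826, 0.08049932, 0.031069798, 0.050574943, 0.4343243]
total, coord = combine_clauses(doc_3011, total_clauses=25)
print(round(total, 6), coord)
```

This explains why entry 5 ranks last: its raw sum (about 0.614) is close to that of entry 3, but it matched five query terms rather than six, so its coord factor is 0.2 rather than 0.24.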