Document (#23707)

Author
Cannane, A.
Williams, H.E.
Title
General-purpose compression for efficient retrieval
Source
Journal of the American Society for Information Science and Technology. 52(2001) no.5, pp.430-437
Year
2001
Abstract
Compression of databases not only reduces space requirements but can also reduce overall retrieval times. In text databases, compression of documents based on semistatic modeling with words has been shown to be both practical and fast. Similarly, for specific applications, such as databases of integers or scientific databases, specially designed semistatic compression schemes work well. We propose a scheme for general-purpose compression that can be applied to all types of data stored in large collections. We describe our approach, which we call RAY, in detail, and show experimentally the compression available, compression and decompression costs, and performance as a stream and random-access technique. We show that, in many cases, RAY achieves better compression than an efficient Huffman scheme and popular adaptive compression techniques, and that it can be used as an efficient general-purpose compression scheme.
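The abstract contrasts RAY with semistatic word-based modeling and Huffman coding. RAY's own algorithm is not described in this record, so the sketch below is not RAY. It illustrates only the word-based semistatic Huffman baseline: a first pass over the whole collection builds one fixed model (token frequencies), and a second pass codes each document with that shared model. Because the model never changes during coding, any single document can be decoded on its own, which is what makes semistatic schemes suitable for random access. Assumptions: the tokenizer (alternating word and non-word runs), the use of "0"/"1" strings instead of packed bits, and all names (`tokenize`, `huffman_codes`, `SemistaticCollection`, ...) are hypothetical and chosen for readability.

```python
import heapq
import re
from collections import Counter
from itertools import count

# Hypothetical tokenizer: alternating runs of word and non-word characters,
# so concatenating the tokens rebuilds the original text exactly.
TOKEN_RE = re.compile(r"\w+|\W+")

def tokenize(text):
    return TOKEN_RE.findall(text)

def huffman_codes(freqs):
    """Build a Huffman code (symbol -> bit string) from a frequency table."""
    if not freqs:
        return {}
    tie = count()  # tie-breaker so heapq never compares the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate model with one symbol: give it a 1-bit code.
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

def encode(text, codes):
    return "".join(codes[t] for t in tokenize(text))

def decode(bits, codes):
    # Huffman codes are prefix-free, so a greedy left-to-right match is unambiguous.
    reverse = {c: s for s, c in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in reverse:
            out.append(reverse[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(out)

class SemistaticCollection:
    def __init__(self, documents):
        # Pass 1: a single token-frequency model for the whole collection.
        freqs = Counter(t for doc in documents for t in tokenize(doc))
        self.codes = huffman_codes(freqs)
        # Pass 2: each document is compressed independently with the fixed model.
        self.compressed = [encode(doc, self.codes) for doc in documents]

    def fetch(self, i):
        # Random access: only document i is decoded; no other document is read.
        return decode(self.compressed[i], self.codes)

if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the dog sat", "a cat and a dog"]
    coll = SemistaticCollection(docs)
    print(coll.fetch(1))  # the dog sat
```

An adaptive coder, by contrast, updates its model as it reads, so decoding a document in the middle of a stream generally requires replaying everything before it; the price of the semistatic approach is the extra pass and the need to store the model alongside the data.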
Theme
Retrieval algorithms

Similar documents (author)

  1. Williams, R.M.: ISI search network research front specialties (1983) 4.51
  2. Williams, J.W.: Serials cataloging, 1985-1990 : an overview of a half-decade (1992) 4.51
  3. Williams, D.A.: Information skills in the school curriculum (1991) 4.51
  4. Williams, M.: Transparent information systems through gateways, front ends, intermediaries, and interfaces (1986) 4.51
  5. Williams, F.: Appraisal and evaluation of software products (1992) 4.51

Similar documents (content)

  1. Bell, T.C.; Moffat, A.; Nevill-Manning, C.G.; Witten, I.H.; Zobel, J.: Data compression in full-text retrieval systems (1993) 0.30
  2. Cheng, K.-S.; Young, G.H.; Wong, K.-F.: A study on word-based and integral-bit Chinese text compression algorithms (1999) 0.23
  3. Moffat, A.; Isal, R.Y.K.: Word-based text compression using the Burrows-Wheeler transform (2005) 0.20
  4. Adiego, J.; Navarro, G.; Fuente, P. de la: Lempel-Ziv compression of highly structured documents (2007) 0.18
  5. Moffat, A.; Zobel, J.: Self-indexing inverted files for fast text retrieval (1996) 0.16