Document (#33012)

Author
Shieh, W.-Y.
Chung, C.-P.
Title
A statistics-based approach to incrementally update inverted files
Source
Information processing and management. 41(2005) no.2, pp.275-288
Year
2005
Abstract
Many information retrieval systems use the inverted file as their indexing structure. An inverted file, however, requires costly reorganization when new documents are added to an existing collection. Most studies address this problem by reserving free space in the inverted file for incremental updates. In this paper, we propose a run-time, statistics-based approach to allocating this spare space. The approach estimates the space requirements of an inverted file using only a small amount of the most recent statistical data on space usage and on the document update request rate. For the best indexing speed and space efficiency, the amount of spare space to allocate is determined by adaptively balancing the trade-off between fewer reorganizations and higher space utilization. Experimental results show that the proposed space-sparing approach substantially reduces reorganization when updating an inverted file, while keeping unused free space under control so that file access speed is not affected.
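The abstract does not give the allocation formula, so the following Python sketch only illustrates the general idea of sizing spare space from recent statistics; it is not the authors' method. It assumes a hypothetical layout in which each term's posting list occupies one contiguous block, uses a moving average of the last few batch sizes as the "recent statistics", and exposes a hypothetical trade-off knob `alpha` (more slack means fewer relocations but more unused space). All names are illustrative.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class PostingList:
    """Hypothetical posting list stored in one contiguous block with reserved slack."""
    postings: list = field(default_factory=list)
    capacity: int = 0  # slots allocated for this list, used or not
    history: deque = field(default_factory=lambda: deque(maxlen=4))  # recent batch growth
    relocations: int = 0  # how many times the block had to be rewritten


def spare_slots(history, horizon_batches, alpha):
    """Estimate extra slots to reserve from the moving average of recent growth.

    alpha in [0, 1] is an assumed trade-off knob: larger values reserve more
    space (fewer relocations), smaller values waste less space.
    """
    if not history:
        return 0
    mean_growth = sum(history) / len(history)
    return math.ceil(alpha * mean_growth * horizon_batches)


def append_batch(plist, new_postings, horizon_batches=4, alpha=0.5):
    """Append one update batch; relocate with fresh slack only on overflow."""
    plist.history.append(len(new_postings))
    needed = len(plist.postings) + len(new_postings)
    if needed > plist.capacity:
        # Overflow: the list must be moved and rewritten, i.e. a reorganization.
        plist.capacity = needed + spare_slots(plist.history, horizon_batches, alpha)
        plist.relocations += 1
    plist.postings.extend(new_postings)


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0):
        plist = PostingList()
        for _ in range(20):
            append_batch(plist, list(range(10)), alpha=alpha)
        unused = plist.capacity - len(plist.postings)
        print(f"alpha={alpha}: relocations={plist.relocations}, unused slots={unused}")
```

With 20 batches of 10 postings each, `alpha=0.5` relocates the list 7 times and ends with 10 unused slots, `alpha=1.0` relocates 4 times, and `alpha=0.0` (no slack) relocates on every batch. The toy run mirrors the trade-off the abstract describes between avoiding reorganization and keeping unused space small.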

Similar documents (author)

  1. Chung, T.M.: A corpus comparison approach for terminology extraction (2003) 5.08
    5.0789523 = sum of:
      5.0789523 = weight(author_txt:chung in 73) [ClassicSimilarity], result of:
        5.0789523 = fieldWeight in 73, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.126324 = idf(docFreq=33, maxDocs=42306)
          0.625 = fieldNorm(doc=73)
    
  2. Chung, H.H.: User friendly audiovisual material cataloging at Westchester County Public Library System (2001) 5.08
    5.0789523 = sum of:
      5.0789523 = weight(author_txt:chung in 416) [ClassicSimilarity], result of:
        5.0789523 = fieldWeight in 416, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.126324 = idf(docFreq=33, maxDocs=42306)
          0.625 = fieldNorm(doc=416)
    
  3. Chung, Y.-K.: Characteristics of references in international classification systems literature (1995) 4.06
    4.063162 = sum of:
      4.063162 = weight(author_txt:chung in 3008) [ClassicSimilarity], result of:
        4.063162 = fieldWeight in 3008, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.126324 = idf(docFreq=33, maxDocs=42306)
          0.5 = fieldNorm(doc=3008)
    
  4. Chung, Y.-K.: Bradford distribution and core authors in classification systems literature (1994) 4.06
    4.063162 = sum of:
      4.063162 = weight(author_txt:chung in 5135) [ClassicSimilarity], result of:
        4.063162 = fieldWeight in 5135, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.126324 = idf(docFreq=33, maxDocs=42306)
          0.5 = fieldNorm(doc=5135)
    
  5. Chung, Y.-K.: Core international journals of classification systems : an application of Bradford's law (1994) 4.06
    4.063162 = sum of:
      4.063162 = weight(author_txt:chung in 5139) [ClassicSimilarity], result of:
        4.063162 = fieldWeight in 5139, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.126324 = idf(docFreq=33, maxDocs=42306)
          0.5 = fieldNorm(doc=5139)
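The author-similarity scores above follow Lucene's ClassicSimilarity (TF-IDF) formulas as printed in the explain output: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)) (the variant that reproduces the printed idf values), and fieldWeight is tf times idf times fieldNorm. The Python sketch below recomputes two of those scores; the function names are illustrative, and fieldNorm is copied from the output rather than derived, since Lucene stores it in a compressed, lossy form. Because each of these queries has a single clause, its normalized query weight works out to 1, which is presumably why the explanation shows fieldWeight alone.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity-style idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """ClassicSimilarity-style tf: square root of the term frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as printed in the explain output."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


if __name__ == "__main__":
    # Inputs copied from the 'author_txt:chung' explanations above.
    print(field_weight(1.0, 33, 42306, 0.625))  # doc 73, printed as 5.0789523
    print(field_weight(1.0, 33, 42306, 0.5))    # doc 3008, printed as 4.063162
```

Small differences in the last digits are expected, since Lucene computes in single precision while Python uses double precision.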
    

Similar documents (content)

  1. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the update of partitioned inverted files (2007) 0.29
    0.2867659 = sum of:
      0.2867659 = product of:
        1.024164 = sum of:
          0.0063077733 = weight(abstract_txt:this in 2820) [ClassicSimilarity], result of:
            0.0063077733 = score(doc=2820,freq=2.0), product of:
              0.029075876 = queryWeight, product of:
                2.4544165 = idf(docFreq=9879, maxDocs=42306)
                0.01184635 = queryNorm
              0.21694182 = fieldWeight in 2820, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4544165 = idf(docFreq=9879, maxDocs=42306)
                0.0625 = fieldNorm(doc=2820)
          0.04341844 = weight(abstract_txt:updates in 2820) [ClassicSimilarity], result of:
            0.04341844 = score(doc=2820,freq=1.0), product of:
              0.09191107 = queryWeight, product of:
                1.0264951 = boost
                7.5583396 = idf(docFreq=59, maxDocs=42306)
                0.01184635 = queryNorm
              0.47239622 = fieldWeight in 2820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5583396 = idf(docFreq=59, maxDocs=42306)
                0.0625 = fieldNorm(doc=2820)
          0.051273417 = weight(abstract_txt:incremental in 2820) [ClassicSimilarity], result of:
            0.051273417 = score(doc=2820,freq=1.0), product of:
              0.10268646 = queryWeight, product of:
                1.0849996 = boost
                7.9891224 = idf(docFreq=38, maxDocs=42306)
                0.01184635 = queryNorm
              0.49932015 = fieldWeight in 2820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9891224 = idf(docFreq=38, maxDocs=42306)
                0.0625 = fieldNorm(doc=2820)
          0.13207823 = weight(abstract_txt:update in 2820) [ClassicSimilarity], result of:
            0.13207823 = score(doc=2820,freq=4.0), product of:
              0.15315428 = queryWeight, product of:
                1.8739264 = boost
                6.899094 = idf(docFreq=115, maxDocs=42306)
                0.01184635 = queryNorm
              0.86238676 = fieldWeight in 2820, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.899094 = idf(docFreq=115, maxDocs=42306)
                0.0625 = fieldNorm(doc=2820)
          0.02187964 = weight(abstract_txt:approach in 2820) [ClassicSimilarity], result of:
            0.02187964 = score(doc=2820,freq=1.0), product of:
              0.092391446 = queryWeight, product of:
                2.0583482 = boost
                3.789033 = idf(docFreq=2600, maxDocs=42306)
                0.01184635 = queryNorm
              0.23681456 = fieldWeight in 2820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.789033 = idf(docFreq=2600, maxDocs=42306)
                0.0625 = fieldNorm(doc=2820)
          0.10402905 = weight(abstract_txt:file in 2820) [ClassicSimilarity], result of:
            0.10402905 = score(doc=2820,freq=1.0), product of:
              0.29904634 = queryWeight, product of:
                4.5354233 = boost
                5.5659094 = idf(docFreq=439, maxDocs=42306)
                0.01184635 = queryNorm
              0.34786934 = fieldWeight in 2820, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5659094 = idf(docFreq=439, maxDocs=42306)
                0.0625 = fieldNorm(doc=2820)
          0.6651774 = weight(abstract_txt:inverted in 2820) [ClassicSimilarity], result of:
            0.6651774 = score(doc=2820,freq=6.0), product of:
              0.5669481 = queryWeight, product of:
                6.2448244 = boost
                7.6637 = idf(docFreq=53, maxDocs=42306)
                0.01184635 = queryNorm
              1.1732597 = fieldWeight in 2820, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.6637 = idf(docFreq=53, maxDocs=42306)
                0.0625 = fieldNorm(doc=2820)
        0.28 = coord(7/25)
    
  2. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the generation of partitioned inverted files (2005) 0.24
    0.24398893 = sum of:
      0.24398893 = product of:
        0.87138903 = sum of:
          0.017792074 = weight(abstract_txt:most in 1777) [ClassicSimilarity], result of:
            0.017792074 = score(doc=1777,freq=2.0), product of:
              0.05070717 = queryWeight, product of:
                1.0782579 = boost
                3.969741 = idf(docFreq=2170, maxDocs=42306)
                0.01184635 = queryNorm
              0.35087886 = fieldWeight in 1777, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.969741 = idf(docFreq=2170, maxDocs=42306)
                0.0625 = fieldNorm(doc=1777)
          0.032836977 = weight(abstract_txt:indexing in 1777) [ClassicSimilarity], result of:
            0.032836977 = score(doc=1777,freq=4.0), product of:
              0.06055506 = queryWeight, product of:
                1.1783198 = boost
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.01184635 = queryNorm
              0.5422664 = fieldWeight in 1777, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.0625 = fieldNorm(doc=1777)
          0.08070304 = weight(abstract_txt:speed in 1777) [ClassicSimilarity], result of:
            0.08070304 = score(doc=1777,freq=2.0), product of:
              0.13894564 = queryWeight, product of:
                1.7848858 = boost
                6.57128 = idf(docFreq=160, maxDocs=42306)
                0.01184635 = queryNorm
              0.58082455 = fieldWeight in 1777, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.57128 = idf(docFreq=160, maxDocs=42306)
                0.0625 = fieldNorm(doc=1777)
          0.02187964 = weight(abstract_txt:approach in 1777) [ClassicSimilarity], result of:
            0.02187964 = score(doc=1777,freq=1.0), product of:
              0.092391446 = queryWeight, product of:
                2.0583482 = boost
                3.789033 = idf(docFreq=2600, maxDocs=42306)
                0.01184635 = queryNorm
              0.23681456 = fieldWeight in 1777, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.789033 = idf(docFreq=2600, maxDocs=42306)
                0.0625 = fieldNorm(doc=1777)
          0.10402905 = weight(abstract_txt:file in 1777) [ClassicSimilarity], result of:
            0.10402905 = score(doc=1777,freq=1.0), product of:
              0.29904634 = queryWeight, product of:
                4.5354233 = boost
                5.5659094 = idf(docFreq=439, maxDocs=42306)
                0.01184635 = queryNorm
              0.34786934 = fieldWeight in 1777, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5659094 = idf(docFreq=439, maxDocs=42306)
                0.0625 = fieldNorm(doc=1777)
          0.47035143 = weight(abstract_txt:inverted in 1777) [ClassicSimilarity], result of:
            0.47035143 = score(doc=1777,freq=3.0), product of:
              0.5669481 = queryWeight, product of:
                6.2448244 = boost
                7.6637 = idf(docFreq=53, maxDocs=42306)
                0.01184635 = queryNorm
              0.8296199 = fieldWeight in 1777, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.6637 = idf(docFreq=53, maxDocs=42306)
                0.0625 = fieldNorm(doc=1777)
          0.14379679 = weight(abstract_txt:space in 1777) [ClassicSimilarity], result of:
            0.14379679 = score(doc=1777,freq=1.0), product of:
              0.42478117 = queryWeight, product of:
                6.6202874 = boost
                5.4163146 = idf(docFreq=510, maxDocs=42306)
                0.01184635 = queryNorm
              0.33851966 = fieldWeight in 1777, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4163146 = idf(docFreq=510, maxDocs=42306)
                0.0625 = fieldNorm(doc=1777)
        0.28 = coord(7/25)
    
  3. Baeza-Yates, R.; Navarro, G.: Block addressing indices for approximate text retrieval (2000) 0.16
    0.1595155 = sum of:
      0.1595155 = product of:
        0.66464794 = sum of:
          0.0044602696 = weight(abstract_txt:this in 5296) [ClassicSimilarity], result of:
            0.0044602696 = score(doc=5296,freq=1.0), product of:
              0.029075876 = queryWeight, product of:
                2.4544165 = idf(docFreq=9879, maxDocs=42306)
                0.01184635 = queryNorm
              0.15340103 = fieldWeight in 5296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4544165 = idf(docFreq=9879, maxDocs=42306)
                0.0625 = fieldNorm(doc=5296)
          0.012580896 = weight(abstract_txt:most in 5296) [ClassicSimilarity], result of:
            0.012580896 = score(doc=5296,freq=1.0), product of:
              0.05070717 = queryWeight, product of:
                1.0782579 = boost
                3.969741 = idf(docFreq=2170, maxDocs=42306)
                0.01184635 = queryNorm
              0.24810882 = fieldWeight in 5296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.969741 = idf(docFreq=2170, maxDocs=42306)
                0.0625 = fieldNorm(doc=5296)
          0.065236405 = weight(abstract_txt:offs in 5296) [ClassicSimilarity], result of:
            0.065236405 = score(doc=5296,freq=1.0), product of:
              0.1205716 = queryWeight, product of:
                1.1756972 = boost
                8.656952 = idf(docFreq=19, maxDocs=42306)
                0.01184635 = queryNorm
              0.5410595 = fieldWeight in 5296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656952 = idf(docFreq=19, maxDocs=42306)
                0.0625 = fieldNorm(doc=5296)
          0.023219248 = weight(abstract_txt:indexing in 5296) [ClassicSimilarity], result of:
            0.023219248 = score(doc=5296,freq=2.0), product of:
              0.06055506 = queryWeight, product of:
                1.1783198 = boost
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.01184635 = queryNorm
              0.38344026 = fieldWeight in 5296, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.0625 = fieldNorm(doc=5296)
          0.2715575 = weight(abstract_txt:inverted in 5296) [ClassicSimilarity], result of:
            0.2715575 = score(doc=5296,freq=1.0), product of:
              0.5669481 = queryWeight, product of:
                6.2448244 = boost
                7.6637 = idf(docFreq=53, maxDocs=42306)
                0.01184635 = queryNorm
              0.47898126 = fieldWeight in 5296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6637 = idf(docFreq=53, maxDocs=42306)
                0.0625 = fieldNorm(doc=5296)
          0.28759357 = weight(abstract_txt:space in 5296) [ClassicSimilarity], result of:
            0.28759357 = score(doc=5296,freq=4.0), product of:
              0.42478117 = queryWeight, product of:
                6.6202874 = boost
                5.4163146 = idf(docFreq=510, maxDocs=42306)
                0.01184635 = queryNorm
              0.6770393 = fieldWeight in 5296, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4163146 = idf(docFreq=510, maxDocs=42306)
                0.0625 = fieldNorm(doc=5296)
        0.24 = coord(6/25)
    
  4. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.14
    0.14156944 = sum of:
      0.14156944 = product of:
        0.884809 = sum of:
          0.020523109 = weight(abstract_txt:indexing in 2717) [ClassicSimilarity], result of:
            0.020523109 = score(doc=2717,freq=1.0), product of:
              0.06055506 = queryWeight, product of:
                1.1783198 = boost
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.01184635 = queryNorm
              0.3389165 = fieldWeight in 2717, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.078125 = fieldNorm(doc=2717)
          0.13003632 = weight(abstract_txt:file in 2717) [ClassicSimilarity], result of:
            0.13003632 = score(doc=2717,freq=1.0), product of:
              0.29904634 = queryWeight, product of:
                4.5354233 = boost
                5.5659094 = idf(docFreq=439, maxDocs=42306)
                0.01184635 = queryNorm
              0.4348367 = fieldWeight in 2717, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5659094 = idf(docFreq=439, maxDocs=42306)
                0.078125 = fieldNorm(doc=2717)
          0.48005038 = weight(abstract_txt:inverted in 2717) [ClassicSimilarity], result of:
            0.48005038 = score(doc=2717,freq=2.0), product of:
              0.5669481 = queryWeight, product of:
                6.2448244 = boost
                7.6637 = idf(docFreq=53, maxDocs=42306)
                0.01184635 = queryNorm
              0.8467272 = fieldWeight in 2717, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.6637 = idf(docFreq=53, maxDocs=42306)
                0.078125 = fieldNorm(doc=2717)
          0.25419918 = weight(abstract_txt:space in 2717) [ClassicSimilarity], result of:
            0.25419918 = score(doc=2717,freq=2.0), product of:
              0.42478117 = queryWeight, product of:
                6.6202874 = boost
                5.4163146 = idf(docFreq=510, maxDocs=42306)
                0.01184635 = queryNorm
              0.59842384 = fieldWeight in 2717, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4163146 = idf(docFreq=510, maxDocs=42306)
                0.078125 = fieldNorm(doc=2717)
        0.16 = coord(4/25)
    
  5. Nelson, M.J.: A prefix trie index for inverted files (1997) 0.13
    0.13082437 = sum of:
      0.13082437 = product of:
        0.6541219 = sum of:
          0.0055753365 = weight(abstract_txt:this in 1496) [ClassicSimilarity], result of:
            0.0055753365 = score(doc=1496,freq=1.0), product of:
              0.029075876 = queryWeight, product of:
                2.4544165 = idf(docFreq=9879, maxDocs=42306)
                0.01184635 = queryNorm
              0.19175129 = fieldWeight in 1496, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4544165 = idf(docFreq=9879, maxDocs=42306)
                0.078125 = fieldNorm(doc=1496)
          0.06232158 = weight(abstract_txt:statistics in 1496) [ClassicSimilarity], result of:
            0.06232158 = score(doc=1496,freq=1.0), product of:
              0.1269835 = queryWeight, product of:
                1.7063243 = boost
                6.2820463 = idf(docFreq=214, maxDocs=42306)
                0.01184635 = queryNorm
              0.49078488 = fieldWeight in 1496, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2820463 = idf(docFreq=214, maxDocs=42306)
                0.078125 = fieldNorm(doc=1496)
          0.116741754 = weight(abstract_txt:update in 1496) [ClassicSimilarity], result of:
            0.116741754 = score(doc=1496,freq=2.0), product of:
              0.15315428 = queryWeight, product of:
                1.8739264 = boost
                6.899094 = idf(docFreq=115, maxDocs=42306)
                0.01184635 = queryNorm
              0.76224935 = fieldWeight in 1496, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.899094 = idf(docFreq=115, maxDocs=42306)
                0.078125 = fieldNorm(doc=1496)
          0.13003632 = weight(abstract_txt:file in 1496) [ClassicSimilarity], result of:
            0.13003632 = score(doc=1496,freq=1.0), product of:
              0.29904634 = queryWeight, product of:
                4.5354233 = boost
                5.5659094 = idf(docFreq=439, maxDocs=42306)
                0.01184635 = queryNorm
              0.4348367 = fieldWeight in 1496, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5659094 = idf(docFreq=439, maxDocs=42306)
                0.078125 = fieldNorm(doc=1496)
          0.3394469 = weight(abstract_txt:inverted in 1496) [ClassicSimilarity], result of:
            0.3394469 = score(doc=1496,freq=1.0), product of:
              0.5669481 = queryWeight, product of:
                6.2448244 = boost
                7.6637 = idf(docFreq=53, maxDocs=42306)
                0.01184635 = queryNorm
              0.5987266 = fieldWeight in 1496, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6637 = idf(docFreq=53, maxDocs=42306)
                0.078125 = fieldNorm(doc=1496)
        0.2 = coord(5/25)
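In the content-based list, each matching term contributes queryWeight times fieldWeight, where queryWeight = boost * idf * queryNorm, and the summed contributions are multiplied by the coordination factor coord (matched clauses divided by total clauses, here out of 25). The Python sketch below recomputes the score of entry 4 (Moffat and Bell, doc 2717) from the figures printed above. The boosts and queryNorm are copied from the output because the full 25-clause query is not shown; the helper names are illustrative.

```python
import math


def classic_idf(doc_freq, max_docs):
    """idf = 1 + ln(maxDocs / (docFreq + 1)), matching the printed idf values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, max_docs, field_norm, query_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def classic_score(matches, n_clauses, max_docs, field_norm, query_norm):
    """Sum the matching terms' scores and scale by coord = matched / total clauses."""
    total = sum(
        term_score(freq, doc_freq, boost, max_docs, field_norm, query_norm)
        for freq, doc_freq, boost in matches
    )
    return (len(matches) / n_clauses) * total


# (freq, docFreq, boost) for the four terms matched in doc 2717 above.
MOFFAT_BELL = [
    (1.0, 1501, 1.1783198),  # indexing
    (1.0, 439, 4.5354233),   # file
    (2.0, 53, 6.2448244),    # inverted
    (2.0, 510, 6.6202874),   # space
]

if __name__ == "__main__":
    score = classic_score(MOFFAT_BELL, 25, 42306, 0.078125, 0.01184635)
    print(f"{score:.6f}")  # should agree with 0.14156944 up to rounding
```

The same function reproduces the other entries once their matched terms, fieldNorm, and clause counts are substituted; the coord factor is what penalizes documents that match only a few of the 25 clauses.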