Document (#33011)

Author
Shieh, W.-Y.
Chung, C.-P.
Title
A statistics-based approach to incrementally update inverted files
Source
Information processing and management. 41(2005) no.2, p.275-288
Year
2005
Abstract
Many information retrieval systems use the inverted file as their indexing structure. The inverted file, however, requires inefficient reorganization when new documents are added to an existing collection. Most studies suggest dealing with this problem by reserving free space in the inverted file for incremental updates. In this paper, we propose a run-time statistics-based approach to allocating this spare space. The approach estimates the space requirements of an inverted file using only a small amount of the most recent statistical data on space usage and the document update request rate. For the best indexing speed and space efficiency, the amount of spare space to allocate is determined by adaptively balancing the trade-off between reducing reorganization and maintaining space utilization. Experimental results show that the proposed space-sparing approach avoids a significant amount of reorganization when updating an inverted file, while unused free space is kept under control so that file access speed is not affected.
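
The abstract does not give the estimation formula, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a moving average over a small window of recent per-update growth, a fixed lookahead of expected updates, and a cap on spare space relative to the current list length. The class name SpareSpaceEstimator and the parameters window, lookahead_updates and max_spare_ratio are all hypothetical.

```python
from collections import deque


class SpareSpaceEstimator:
    """Hypothetical spare-space estimator for one inverted list.

    Assumption: this is an illustration of "recent statistics plus a
    utilization cap", not the method published in the paper.
    """

    def __init__(self, window=4, lookahead_updates=10, max_spare_ratio=0.5):
        # Only the most recent `window` growth observations are kept.
        self.growth = deque(maxlen=window)
        self.lookahead_updates = lookahead_updates
        self.max_spare_ratio = max_spare_ratio

    def record_update(self, postings_added):
        """Record how many postings the latest update appended."""
        self.growth.append(postings_added)

    def spare_slots(self, current_length):
        """Return the number of free slots to reserve after the list."""
        if not self.growth:
            return 0
        avg_growth = sum(self.growth) / len(self.growth)
        # Space that would absorb the expected updates without reorganization.
        wanted = avg_growth * self.lookahead_updates
        # Cap the reservation so space utilization stays bounded.
        cap = current_length * self.max_spare_ratio
        return int(min(wanted, cap))


est = SpareSpaceEstimator(window=2)
for added in (2, 4, 6):
    est.record_update(added)
print(est.spare_slots(current_length=1000))
```

A larger lookahead reduces reorganizations at the cost of more unused space; the cap is one simple way to express the space-utilization side of that trade-off.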

Similar documents (author)

  1. Chung, T.M.: A corpus comparison approach for terminology extraction (2003) 5.11
    5.106579 = sum of:
      5.106579 = weight(author_txt:chung in 4072) [ClassicSimilarity], result of:
        5.106579 = fieldWeight in 4072, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.1705265 = idf(docFreq=33, maxDocs=44218)
          0.625 = fieldNorm(doc=4072)
    
  2. Chung, H.H.: User friendly audiovisual material cataloging at Westchester County Public Library System (2001) 5.11
    5.106579 = sum of:
      5.106579 = weight(author_txt:chung in 5415) [ClassicSimilarity], result of:
        5.106579 = fieldWeight in 5415, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.1705265 = idf(docFreq=33, maxDocs=44218)
          0.625 = fieldNorm(doc=5415)
    
  3. Chung, Y.-K.: Characteristics of references in international classification systems literature (1995) 4.09
    4.0852633 = sum of:
      4.0852633 = weight(author_txt:chung in 2939) [ClassicSimilarity], result of:
        4.0852633 = fieldWeight in 2939, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.1705265 = idf(docFreq=33, maxDocs=44218)
          0.5 = fieldNorm(doc=2939)
    
  4. Chung, Y.-K.: Bradford distribution and core authors in classification systems literature (1994) 4.09
    4.0852633 = sum of:
      4.0852633 = weight(author_txt:chung in 5066) [ClassicSimilarity], result of:
        4.0852633 = fieldWeight in 5066, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.1705265 = idf(docFreq=33, maxDocs=44218)
          0.5 = fieldNorm(doc=5066)
    
  5. Chung, Y.-K.: Core international journals of classification systems : an application of Bradford's law (1994) 4.09
    4.0852633 = sum of:
      4.0852633 = weight(author_txt:chung in 5070) [ClassicSimilarity], result of:
        4.0852633 = fieldWeight in 5070, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.1705265 = idf(docFreq=33, maxDocs=44218)
          0.5 = fieldNorm(doc=5070)
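
The author scores above can be reproduced by hand. The sketch below uses the standard Lucene ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and multiplies them by the fieldNorm shown in the listing. The function names are illustrative and are not part of any Lucene API; Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the explain output."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Entry 1 (doc 4072): the listing reports 5.106579.
print(field_weight(1.0, 33, 44218, 0.625))
# Entry 3 (doc 2939): the listing reports 4.0852633.
print(field_weight(1.0, 33, 44218, 0.5))
```

The only factor that differs between the entries is fieldNorm, which is why the two records with the higher norm (0.625) rank above the three with 0.5.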
    

Similar documents (content)

  1. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the update of partitioned inverted files (2007) 0.29
    0.28816506 = sum of:
      0.28816506 = product of:
        1.029161 = sum of:
          0.0060237893 = weight(abstract_txt:this in 819) [ClassicSimilarity], result of:
            0.0060237893 = score(doc=819,freq=2.0), product of:
              0.028243225 = queryWeight, product of:
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.011704526 = queryNorm
              0.21328263 = fieldWeight in 819, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=819)
          0.042755473 = weight(abstract_txt:updates in 819) [ClassicSimilarity], result of:
            0.042755473 = score(doc=819,freq=1.0), product of:
              0.091123804 = queryWeight, product of:
                1.0370463 = boost
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.011704526 = queryNorm
              0.46920204 = fieldWeight in 819, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.0625 = fieldNorm(doc=819)
          0.05095223 = weight(abstract_txt:incremental in 819) [ClassicSimilarity], result of:
            0.05095223 = score(doc=819,freq=1.0), product of:
              0.10242661 = queryWeight, product of:
                1.0994833 = boost
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.011704526 = queryNorm
              0.4974511 = fieldWeight in 819, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.0625 = fieldNorm(doc=819)
          0.13237673 = weight(abstract_txt:update in 819) [ClassicSimilarity], result of:
            0.13237673 = score(doc=819,freq=4.0), product of:
              0.15363911 = queryWeight, product of:
                1.9043558 = boost
                6.892866 = idf(docFreq=121, maxDocs=44218)
                0.011704526 = queryNorm
              0.86160827 = fieldWeight in 819, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.892866 = idf(docFreq=121, maxDocs=44218)
                0.0625 = fieldNorm(doc=819)
          0.021236435 = weight(abstract_txt:approach in 819) [ClassicSimilarity], result of:
            0.021236435 = score(doc=819,freq=1.0), product of:
              0.090721816 = queryWeight, product of:
                2.0695126 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.011704526 = queryNorm
              0.234083 = fieldWeight in 819, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0625 = fieldNorm(doc=819)
          0.10527456 = weight(abstract_txt:file in 819) [ClassicSimilarity], result of:
            0.10527456 = score(doc=819,freq=1.0), product of:
              0.30192798 = queryWeight, product of:
                4.6239114 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.011704526 = queryNorm
              0.3486744 = fieldWeight in 819, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.0625 = fieldNorm(doc=819)
          0.6705418 = weight(abstract_txt:inverted in 819) [ClassicSimilarity], result of:
            0.6705418 = score(doc=819,freq=6.0), product of:
              0.5709367 = queryWeight, product of:
                6.358458 = boost
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.011704526 = queryNorm
              1.1744592 = fieldWeight in 819, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.0625 = fieldNorm(doc=819)
        0.28 = coord(7/25)
    
  2. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the generation of partitioned inverted files (2005) 0.25
    0.24525043 = sum of:
      0.24525043 = product of:
        0.8758944 = sum of:
          0.017530987 = weight(abstract_txt:most in 651) [ClassicSimilarity], result of:
            0.017530987 = score(doc=651,freq=2.0), product of:
              0.050293084 = queryWeight, product of:
                1.0895605 = boost
                3.943693 = idf(docFreq=2328, maxDocs=44218)
                0.011704526 = queryNorm
              0.3485765 = fieldWeight in 651, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.943693 = idf(docFreq=2328, maxDocs=44218)
                0.0625 = fieldNorm(doc=651)
          0.033262603 = weight(abstract_txt:indexing in 651) [ClassicSimilarity], result of:
            0.033262603 = score(doc=651,freq=4.0), product of:
              0.06117841 = queryWeight, product of:
                1.2017007 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.011704526 = queryNorm
              0.54369843 = fieldWeight in 651, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.0625 = fieldNorm(doc=651)
          0.08183508 = weight(abstract_txt:speed in 651) [ClassicSimilarity], result of:
            0.08183508 = score(doc=651,freq=2.0), product of:
              0.14047435 = queryWeight, product of:
                1.8209404 = boost
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.011704526 = queryNorm
              0.58256245 = fieldWeight in 651, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.0625 = fieldNorm(doc=651)
          0.021236435 = weight(abstract_txt:approach in 651) [ClassicSimilarity], result of:
            0.021236435 = score(doc=651,freq=1.0), product of:
              0.090721816 = queryWeight, product of:
                2.0695126 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.011704526 = queryNorm
              0.234083 = fieldWeight in 651, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0625 = fieldNorm(doc=651)
          0.10527456 = weight(abstract_txt:file in 651) [ClassicSimilarity], result of:
            0.10527456 = score(doc=651,freq=1.0), product of:
              0.30192798 = queryWeight, product of:
                4.6239114 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.011704526 = queryNorm
              0.3486744 = fieldWeight in 651, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.0625 = fieldNorm(doc=651)
          0.47414467 = weight(abstract_txt:inverted in 651) [ClassicSimilarity], result of:
            0.47414467 = score(doc=651,freq=3.0), product of:
              0.5709367 = queryWeight, product of:
                6.358458 = boost
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.011704526 = queryNorm
              0.83046806 = fieldWeight in 651, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.0625 = fieldNorm(doc=651)
          0.14261006 = weight(abstract_txt:space in 651) [ClassicSimilarity], result of:
            0.14261006 = score(doc=651,freq=1.0), product of:
              0.42314085 = queryWeight, product of:
                6.7041845 = boost
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.011704526 = queryNorm
              0.3370274 = fieldWeight in 651, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.0625 = fieldNorm(doc=651)
        0.28 = coord(7/25)
    
  3. Baeza-Yates, R.; Navarro, G.: Block addressing indices for approximate text retrieval (2000) 0.16
    0.15950418 = sum of:
      0.15950418 = product of:
        0.66460073 = sum of:
          0.0042594625 = weight(abstract_txt:this in 4295) [ClassicSimilarity], result of:
            0.0042594625 = score(doc=4295,freq=1.0), product of:
              0.028243225 = queryWeight, product of:
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.011704526 = queryNorm
              0.1508136 = fieldWeight in 4295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=4295)
          0.01239628 = weight(abstract_txt:most in 4295) [ClassicSimilarity], result of:
            0.01239628 = score(doc=4295,freq=1.0), product of:
              0.050293084 = queryWeight, product of:
                1.0895605 = boost
                3.943693 = idf(docFreq=2328, maxDocs=44218)
                0.011704526 = queryNorm
              0.24648081 = fieldWeight in 4295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.943693 = idf(docFreq=2328, maxDocs=44218)
                0.0625 = fieldNorm(doc=4295)
          0.065457076 = weight(abstract_txt:offs in 4295) [ClassicSimilarity], result of:
            0.065457076 = score(doc=4295,freq=1.0), product of:
              0.12104358 = queryWeight, product of:
                1.1952344 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.011704526 = queryNorm
              0.5407728 = fieldWeight in 4295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.0625 = fieldNorm(doc=4295)
          0.023520213 = weight(abstract_txt:indexing in 4295) [ClassicSimilarity], result of:
            0.023520213 = score(doc=4295,freq=2.0), product of:
              0.06117841 = queryWeight, product of:
                1.2017007 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.011704526 = queryNorm
              0.38445285 = fieldWeight in 4295, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.0625 = fieldNorm(doc=4295)
          0.27374756 = weight(abstract_txt:inverted in 4295) [ClassicSimilarity], result of:
            0.27374756 = score(doc=4295,freq=1.0), product of:
              0.5709367 = queryWeight, product of:
                6.358458 = boost
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.011704526 = queryNorm
              0.47947097 = fieldWeight in 4295, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.0625 = fieldNorm(doc=4295)
          0.28522012 = weight(abstract_txt:space in 4295) [ClassicSimilarity], result of:
            0.28522012 = score(doc=4295,freq=4.0), product of:
              0.42314085 = queryWeight, product of:
                6.7041845 = boost
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.011704526 = queryNorm
              0.6740548 = fieldWeight in 4295, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.0625 = fieldNorm(doc=4295)
        0.24 = coord(6/25)
    
  4. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.14
    0.1421449 = sum of:
      0.1421449 = product of:
        0.8884056 = sum of:
          0.020789128 = weight(abstract_txt:indexing in 2648) [ClassicSimilarity], result of:
            0.020789128 = score(doc=2648,freq=1.0), product of:
              0.06117841 = queryWeight, product of:
                1.2017007 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.011704526 = queryNorm
              0.3398115 = fieldWeight in 2648, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
          0.1315932 = weight(abstract_txt:file in 2648) [ClassicSimilarity], result of:
            0.1315932 = score(doc=2648,freq=1.0), product of:
              0.30192798 = queryWeight, product of:
                4.6239114 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.011704526 = queryNorm
              0.435843 = fieldWeight in 2648, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
          0.48392192 = weight(abstract_txt:inverted in 2648) [ClassicSimilarity], result of:
            0.48392192 = score(doc=2648,freq=2.0), product of:
              0.5709367 = queryWeight, product of:
                6.358458 = boost
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.011704526 = queryNorm
              0.84759295 = fieldWeight in 2648, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
          0.25210136 = weight(abstract_txt:space in 2648) [ClassicSimilarity], result of:
            0.25210136 = score(doc=2648,freq=2.0), product of:
              0.42314085 = queryWeight, product of:
                6.7041845 = boost
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.011704526 = queryNorm
              0.5957859 = fieldWeight in 2648, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.078125 = fieldNorm(doc=2648)
        0.16 = coord(4/25)
    
  5. Mazur, Z.: Inverted file organization in the information retrieval system based on thesaurus with weights (1979) 0.13
    0.13172983 = sum of:
      0.13172983 = product of:
        1.0977485 = sum of:
          0.007454059 = weight(abstract_txt:this in 5494) [ClassicSimilarity], result of:
            0.007454059 = score(doc=5494,freq=1.0), product of:
              0.028243225 = queryWeight, product of:
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.011704526 = queryNorm
              0.2639238 = fieldWeight in 5494, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.109375 = fieldNorm(doc=5494)
          0.26054123 = weight(abstract_txt:file in 5494) [ClassicSimilarity], result of:
            0.26054123 = score(doc=5494,freq=2.0), product of:
              0.30192798 = queryWeight, product of:
                4.6239114 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.011704526 = queryNorm
              0.86292505 = fieldWeight in 5494, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.109375 = fieldNorm(doc=5494)
          0.82975316 = weight(abstract_txt:inverted in 5494) [ClassicSimilarity], result of:
            0.82975316 = score(doc=5494,freq=3.0), product of:
              0.5709367 = queryWeight, product of:
                6.358458 = boost
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.011704526 = queryNorm
              1.4533191 = fieldWeight in 5494, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.109375 = fieldNorm(doc=5494)
        0.12 = coord(3/25)
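
The content scores combine more factors than the author scores. As a rough sketch, assuming the standard ClassicSimilarity formulas, each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm; the per-term scores are summed and multiplied by coord (matched query terms / total query terms). The function names below are illustrative only, and small differences in the last digits are expected because Lucene uses 32-bit floats.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, field_norm, query_norm, boost=1.0):
    """Score of one query term in one document."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched_terms, total_terms):
    """Sum of the term scores scaled by the coordination factor."""
    return sum(term_scores) * (matched_terms / total_terms)


# Term "inverted" in doc 819: the listing reports 0.6705418.
inverted = term_score(6.0, 55, 44218, 0.0625, 0.011704526, boost=6.358458)
print(inverted)

# Entry 1 (doc 819), using the seven per-term scores from the listing:
# the listing reports 0.28816506 with coord(7/25).
scores_819 = [0.0060237893, 0.042755473, 0.05095223, 0.13237673,
              0.021236435, 0.10527456, 0.6705418]
print(document_score(scores_819, 7, 25))
```

The coord factor explains why entry 5 ranks last despite the highest raw sum (1.0977485): it matches only 3 of the 25 query terms.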