Document (#42595)

Masanes, J.
Web archiving methods and approaches : a comparative study
Library trends. 54(2005) no.1, pp.72-90
The Web is a virtually infinite information space, and archiving it in its entirety, in all its aspects, is a utopia. The volume of information is a challenge, but given the continuous drop in storage costs it is neither the only nor the most limiting factor. Significant challenges lie in the managerial and technical issues of locating and collecting Web sites. Consequently, archiving the Web is a task that no single institution can carry out alone. This article presents various approaches currently taken by different institutions and discusses their focuses, strengths, and limits. It also presents a model for appraising these approaches and identifying potentially complementary aspects among them. The article compares the discovery accuracy of the snapshot approach used by the Internet Archive (IA) with that of the event-based collection carried out by the Bibliothèque Nationale de France (BNF) for the 2002 presidential and parliamentary elections. The balanced conclusion of this comparison helps identify future directions for improving the former approach.
See: DOI: 10.1353/lib.2006.0005.

Similar documents (content)

  1. Poole, A.H.: The information work of community archives : a systematic literature review (2020) 0.07
    0.072339565 = sum of:
      0.072339565 = product of:
        0.36169782 = sum of:
          0.014598415 = weight(abstract_txt:this in 5840) [ClassicSimilarity], result of:
            0.014598415 = score(doc=5840,freq=4.0), product of:
              0.04839887 = queryWeight, product of:
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.020057406 = queryNorm
              0.3016272 = fieldWeight in 5840, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=5840)
          0.018195858 = weight(abstract_txt:approach in 5840) [ClassicSimilarity], result of:
            0.018195858 = score(doc=5840,freq=1.0), product of:
              0.07773251 = queryWeight, product of:
                1.0347563 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.020057406 = queryNorm
              0.234083 = fieldWeight in 5840, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0625 = fieldNorm(doc=5840)
          0.097694606 = weight(abstract_txt:appraisal in 5840) [ClassicSimilarity], result of:
            0.097694606 = score(doc=5840,freq=1.0), product of:
              0.18917252 = queryWeight, product of:
                1.141434 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.020057406 = queryNorm
              0.5164313 = fieldWeight in 5840, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.0625 = fieldNorm(doc=5840)
          0.03478899 = weight(abstract_txt:collection in 5840) [ClassicSimilarity], result of:
            0.03478899 = score(doc=5840,freq=1.0), product of:
              0.119742654 = queryWeight, product of:
                1.2842844 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.020057406 = queryNorm
              0.2905313 = fieldWeight in 5840, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.0625 = fieldNorm(doc=5840)
          0.19641995 = weight(abstract_txt:archiving in 5840) [ClassicSimilarity], result of:
            0.19641995 = score(doc=5840,freq=1.0), product of:
              0.43461877 = queryWeight, product of:
                2.996654 = boost
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.020057406 = queryNorm
              0.4519362 = fieldWeight in 5840, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.0625 = fieldNorm(doc=5840)
        0.2 = coord(5/25)
  2. Käki, M.; Aula, A.: Controlling the complexity in comparing search user interfaces via user studies (2008) 0.07
    0.07172676 = sum of:
      0.07172676 = product of:
        0.35863382 = sum of:
          0.00912401 = weight(abstract_txt:this in 2024) [ClassicSimilarity], result of:
            0.00912401 = score(doc=2024,freq=1.0), product of:
              0.04839887 = queryWeight, product of:
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.020057406 = queryNorm
              0.18851699 = fieldWeight in 2024, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.078125 = fieldNorm(doc=2024)
          0.024923919 = weight(abstract_txt:will in 2024) [ClassicSimilarity], result of:
            0.024923919 = score(doc=2024,freq=1.0), product of:
              0.08262127 = queryWeight, product of:
                1.0667992 = boost
                3.8613079 = idf(docFreq=2528, maxDocs=44218)
                0.020057406 = queryNorm
              0.30166468 = fieldWeight in 2024, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8613079 = idf(docFreq=2528, maxDocs=44218)
                0.078125 = fieldNorm(doc=2024)
          0.107239984 = weight(abstract_txt:balanced in 2024) [ClassicSimilarity], result of:
            0.107239984 = score(doc=2024,freq=1.0), product of:
              0.1734771 = queryWeight, product of:
                1.093057 = boost
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.020057406 = queryNorm
              0.6181795 = fieldWeight in 2024, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.078125 = fieldNorm(doc=2024)
          0.10817742 = weight(abstract_txt:limiting in 2024) [ClassicSimilarity], result of:
            0.10817742 = score(doc=2024,freq=1.0), product of:
              0.17448659 = queryWeight, product of:
                1.0962328 = boost
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.020057406 = queryNorm
              0.61997557 = fieldWeight in 2024, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.078125 = fieldNorm(doc=2024)
          0.10916847 = weight(abstract_txt:comparison in 2024) [ClassicSimilarity], result of:
            0.10916847 = score(doc=2024,freq=2.0), product of:
              0.17555065 = queryWeight, product of:
                1.5550271 = boost
                5.628462 = idf(docFreq=431, maxDocs=44218)
                0.020057406 = queryNorm
              0.62186307 = fieldWeight in 2024, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.628462 = idf(docFreq=431, maxDocs=44218)
                0.078125 = fieldNorm(doc=2024)
        0.2 = coord(5/25)
  3. Huang, T.; Nie, R.; Zhao, Y.: Archival knowledge in the field of personal archiving : an exploratory study based on grounded theory (2021) 0.07
    0.07087972 = sum of:
      0.07087972 = product of:
        0.3543986 = sum of:
          0.0072992076 = weight(abstract_txt:this in 173) [ClassicSimilarity], result of:
            0.0072992076 = score(doc=173,freq=1.0), product of:
              0.04839887 = queryWeight, product of:
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.020057406 = queryNorm
              0.1508136 = fieldWeight in 173, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=173)
          0.018195858 = weight(abstract_txt:approach in 173) [ClassicSimilarity], result of:
            0.018195858 = score(doc=173,freq=1.0), product of:
              0.07773251 = queryWeight, product of:
                1.0347563 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.020057406 = queryNorm
              0.234083 = fieldWeight in 173, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0625 = fieldNorm(doc=173)
          0.097694606 = weight(abstract_txt:appraisal in 173) [ClassicSimilarity], result of:
            0.097694606 = score(doc=173,freq=1.0), product of:
              0.18917252 = queryWeight, product of:
                1.141434 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.020057406 = queryNorm
              0.5164313 = fieldWeight in 173, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.0625 = fieldNorm(doc=173)
          0.03478899 = weight(abstract_txt:collection in 173) [ClassicSimilarity], result of:
            0.03478899 = score(doc=173,freq=1.0), product of:
              0.119742654 = queryWeight, product of:
                1.2842844 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.020057406 = queryNorm
              0.2905313 = fieldWeight in 173, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.0625 = fieldNorm(doc=173)
          0.19641995 = weight(abstract_txt:archiving in 173) [ClassicSimilarity], result of:
            0.19641995 = score(doc=173,freq=1.0), product of:
              0.43461877 = queryWeight, product of:
                2.996654 = boost
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.020057406 = queryNorm
              0.4519362 = fieldWeight in 173, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.0625 = fieldNorm(doc=173)
        0.2 = coord(5/25)
  4. Filipp, H.; Waudig, D.: Erfassung und Erschließung von Softwareinformationen [Recording and indexing of software information] (1991) 0.07
    0.067097485 = sum of:
      0.067097485 = product of:
        0.55914575 = sum of:
          0.01824802 = weight(abstract_txt:this in 4748) [ClassicSimilarity], result of:
            0.01824802 = score(doc=4748,freq=1.0), product of:
              0.04839887 = queryWeight, product of:
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.020057406 = queryNorm
              0.37703398 = fieldWeight in 4748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.15625 = fieldNorm(doc=4748)
          0.049847838 = weight(abstract_txt:will in 4748) [ClassicSimilarity], result of:
            0.049847838 = score(doc=4748,freq=1.0), product of:
              0.08262127 = queryWeight, product of:
                1.0667992 = boost
                3.8613079 = idf(docFreq=2528, maxDocs=44218)
                0.020057406 = queryNorm
              0.60332936 = fieldWeight in 4748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8613079 = idf(docFreq=2528, maxDocs=44218)
                0.15625 = fieldNorm(doc=4748)
          0.4910499 = weight(abstract_txt:archiving in 4748) [ClassicSimilarity], result of:
            0.4910499 = score(doc=4748,freq=1.0), product of:
              0.43461877 = queryWeight, product of:
                2.996654 = boost
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.020057406 = queryNorm
              1.1298405 = fieldWeight in 4748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.15625 = fieldNorm(doc=4748)
        0.12 = coord(3/25)
  5. Steenbakkers, J.F.: NEDLIB Guidelines for setting up a deposit system for electronic publications (2001) 0.07
    0.0662345 = sum of:
      0.0662345 = product of:
        0.41396564 = sum of:
          0.012903298 = weight(abstract_txt:this in 6004) [ClassicSimilarity], result of:
            0.012903298 = score(doc=6004,freq=2.0), product of:
              0.04839887 = queryWeight, product of:
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.020057406 = queryNorm
              0.2666033 = fieldWeight in 6004, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.078125 = fieldNorm(doc=6004)
          0.10914257 = weight(abstract_txt:amongst in 6004) [ClassicSimilarity], result of:
            0.10914257 = score(doc=6004,freq=1.0), product of:
              0.1755229 = queryWeight, product of:
                1.0994833 = boost
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.020057406 = queryNorm
              0.6218139 = fieldWeight in 6004, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.078125 = fieldNorm(doc=6004)
          0.04639483 = weight(abstract_txt:aspects in 6004) [ClassicSimilarity], result of:
            0.04639483 = score(doc=6004,freq=1.0), product of:
              0.12502418 = queryWeight, product of:
                1.3123019 = boost
                4.7499113 = idf(docFreq=1039, maxDocs=44218)
                0.020057406 = queryNorm
              0.37108684 = fieldWeight in 6004, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7499113 = idf(docFreq=1039, maxDocs=44218)
                0.078125 = fieldNorm(doc=6004)
          0.24552494 = weight(abstract_txt:archiving in 6004) [ClassicSimilarity], result of:
            0.24552494 = score(doc=6004,freq=1.0), product of:
              0.43461877 = queryWeight, product of:
                2.996654 = boost
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.020057406 = queryNorm
              0.56492025 = fieldWeight in 6004, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.230979 = idf(docFreq=86, maxDocs=44218)
                0.078125 = fieldNorm(doc=6004)
        0.16 = coord(4/25)
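The score breakdowns above follow Lucene's ClassicSimilarity (TF-IDF) model: each matching term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is multiplied by coord (matched query terms / total query terms). The minimal Python sketch below recomputes the score of entry 1. It assumes idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduces the printed idf values, and tf = sqrt(freq). fieldNorm and queryNorm are taken as given from the printout rather than derived. The function names (`classic_idf`, `classic_tf`, `term_weight`, `doc_score`) are hypothetical illustrations, not Lucene's API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1)).
    # With docFreq=10762 and maxDocs=44218 this gives about 2.4130175, as printed above.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # tf is the square root of the raw term frequency: freq=4.0 -> 2.0, freq=2.0 -> 1.4142...
    return math.sqrt(freq)


def term_weight(freq, doc_freq, boost, max_docs, field_norm, query_norm):
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm              # "queryWeight, product of"
    field_weight = classic_tf(freq) * idf * field_norm   # "fieldWeight in <doc>, product of"
    return query_weight * field_weight                   # "score(doc=..., freq=...)"


def doc_score(terms, max_docs, field_norm, query_norm, query_terms):
    # terms: one (freq, doc_freq, boost) tuple per query term that matched the document.
    # fieldNorm and queryNorm are inputs here, copied from the explain output.
    total = sum(term_weight(f, df, b, max_docs, field_norm, query_norm)
                for f, df, b in terms)
    coord = len(terms) / query_terms  # coord(overlap/maxOverlap), e.g. 5/25 = 0.2
    return total * coord


MAX_DOCS = 44218
QUERY_NORM = 0.020057406

# Entry 1 (doc 5840): this, approach, appraisal, collection, archiving.
# "this" has no printed boost, so a boost of 1.0 is used.
entry1 = [
    (4.0, 10762, 1.0),
    (1.0, 2839, 1.0347563),
    (1.0, 30, 1.141434),
    (1.0, 1150, 1.2842844),
    (1.0, 86, 2.996654),
]
score1 = doc_score(entry1, MAX_DOCS, 0.0625, QUERY_NORM, 25)
print(f"{score1:.4f}")  # 0.0723
```

Small differences in the trailing digits are expected. Lucene computes in 32-bit floats, and the boosts and norms above are copied from rounded printouts.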