Document (#41313)

Author
Munkelt, J.
Schaer, P.
Lepsky, K.
Title
Towards an IR test collection for the German National Library
Issue
[Preprint].
Year
2018
Abstract
Automatic content indexing is one of the innovations that are increasingly changing the way libraries work. In theory, it promises a cataloguing service that would hardly be possible with humans in terms of speed, quantity and maybe quality. The German National Library (DNB) has also recognised this potential and is increasingly relying on the automatic indexing of its catalogue content. The DNB took a major step in this direction in 2017, which was announced in two papers. The announcement was rather restrained, but the content of the papers is all the more explosive for the library community: Since September 2017, the DNB has discontinued the intellectual indexing of series B and H and has switched to an automatic process for these series. The subject indexing of online publications (series O) has been purely automatic since 2010; from September 2017, monographs and periodicals published outside the publishing industry and university publications will no longer be indexed by people. This raises the question: What is the quality of the automatic indexing compared to the manual work, or, in other words, to what degree can automatic indexing replace people without a significant drop in quality?
Footnote
Munkelt-etal_DNB_TestColletion.pdf.
Theme
Retrievalstudien
Automatisches Indexieren

Similar documents (author)

  1. Schaer, P.: Integration von Open-Access-Repositorien in Fachportale (2010) 2.25
    2.2527566 = sum of:
      2.2527566 = product of:
        4.505513 = sum of:
          4.505513 = weight(author_txt:schaer in 140) [ClassicSimilarity], result of:
            4.505513 = score(doc=140,freq=1.0), product of:
              0.7863215 = queryWeight, product of:
                1.1281581 = boost
                9.167778 = idf(docFreq=11, maxDocs=42306)
                0.07602669 = queryNorm
              5.7298613 = fieldWeight in 140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.167778 = idf(docFreq=11, maxDocs=42306)
                0.625 = fieldNorm(doc=140)
        0.5 = coord(1/2)
    
  2. Munkelt, J.; Schaer, P.: Towards an IR test collection for the German National Library (2018) 1.80
    1.8022053 = sum of:
      1.8022053 = product of:
        3.6044106 = sum of:
          3.6044106 = weight(author_txt:schaer in 781) [ClassicSimilarity], result of:
            3.6044106 = score(doc=781,freq=1.0), product of:
              0.7863215 = queryWeight, product of:
                1.1281581 = boost
                9.167778 = idf(docFreq=11, maxDocs=42306)
                0.07602669 = queryNorm
              4.583889 = fieldWeight in 781, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.167778 = idf(docFreq=11, maxDocs=42306)
                0.5 = fieldNorm(doc=781)
        0.5 = coord(1/2)
    
  3. Lepsky, K.: Art and language : Ernst H. Gombrich and Karl Bühler's theory of language (1996) 1.57
    1.5689329 = sum of:
      1.5689329 = product of:
        3.1378658 = sum of:
          3.1378658 = weight(author_txt:lepsky in 5229) [ClassicSimilarity], result of:
            3.1378658 = score(doc=5229,freq=1.0), product of:
              0.6178175 = queryWeight, product of:
                8.126324 = idf(docFreq=33, maxDocs=42306)
                0.07602669 = queryNorm
              5.0789523 = fieldWeight in 5229, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.126324 = idf(docFreq=33, maxDocs=42306)
                0.625 = fieldNorm(doc=5229)
        0.5 = coord(1/2)
    
  4. Lepsky, K.: Maschinelle Indexierung von Titelaufnahmen zur Verbesserung der sachlichen Erschließung in Online-Publikumskatalogen (1994) 1.57
    1.5689329 = sum of:
      1.5689329 = product of:
        3.1378658 = sum of:
          3.1378658 = weight(author_txt:lepsky in 7064) [ClassicSimilarity], result of:
            3.1378658 = score(doc=7064,freq=1.0), product of:
              0.6178175 = queryWeight, product of:
                8.126324 = idf(docFreq=33, maxDocs=42306)
                0.07602669 = queryNorm
              5.0789523 = fieldWeight in 7064, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.126324 = idf(docFreq=33, maxDocs=42306)
                0.625 = fieldNorm(doc=7064)
        0.5 = coord(1/2)
    
  5. Lepsky, K.: RSWK - und was noch? : Stellungnahme zum Bericht 'Sacherschließung in Online-Katalogen' der Expertengruppe Online-Kataloge (1995) 1.57
    1.5689329 = sum of:
      1.5689329 = product of:
        3.1378658 = sum of:
          3.1378658 = weight(author_txt:lepsky in 841) [ClassicSimilarity], result of:
            3.1378658 = score(doc=841,freq=1.0), product of:
              0.6178175 = queryWeight, product of:
                8.126324 = idf(docFreq=33, maxDocs=42306)
                0.07602669 = queryNorm
              5.0789523 = fieldWeight in 841, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.126324 = idf(docFreq=33, maxDocs=42306)
                0.625 = fieldNorm(doc=841)
        0.5 = coord(1/2)
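
The score trees above can be read as plain arithmetic. The following is a minimal sketch in Python, not the search engine's own code, that recomputes the first entry (doc 140) from the values listed in its tree. The function names are hypothetical, and the idf formula `1 + ln(maxDocs / (docFreq + 1))` is an assumption chosen because it reproduces the idf values shown here; other Lucene versions define idf slightly differently.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed idf form; it reproduces idf(docFreq=11, maxDocs=42306) = 9.167778.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, field_norm, query_norm, boost=1.0):
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                    # tf(freq=1.0) = 1.0
    query_weight = boost * idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight


# Values copied from the tree for "author_txt:schaer in 140".
term = classic_term_score(
    freq=1.0,
    doc_freq=11,
    max_docs=42306,
    field_norm=0.625,
    query_norm=0.07602669,
    boost=1.1281581,
)
total = term * 0.5  # coord(1/2): one of two query clauses matched

print(term, total)
```

The two printed values should land close to the listed 4.505513 and 2.2527566; small differences in the last digits are expected because the engine works in single precision.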
    

Similar documents (content)

  1. Reiner, U.: DDC-based search in the data of the German National Bibliography (2008) 0.14
    0.14330928 = sum of:
      0.14330928 = product of:
        0.597122 = sum of:
          0.07257763 = weight(abstract_txt:national in 4167) [ClassicSimilarity], result of:
            0.07257763 = score(doc=4167,freq=4.0), product of:
              0.10094039 = queryWeight, product of:
                1.1730007 = boost
                4.6016946 = idf(docFreq=1153, maxDocs=42306)
                0.018700315 = queryNorm
              0.71901476 = fieldWeight in 4167, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.6016946 = idf(docFreq=1153, maxDocs=42306)
                0.078125 = fieldNorm(doc=4167)
          0.01811106 = weight(abstract_txt:library in 4167) [ClassicSimilarity], result of:
            0.01811106 = score(doc=4167,freq=1.0), product of:
              0.0727014 = queryWeight, product of:
                1.2192221 = boost
                3.188681 = idf(docFreq=4740, maxDocs=42306)
                0.018700315 = queryNorm
              0.24911569 = fieldWeight in 4167, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.188681 = idf(docFreq=4740, maxDocs=42306)
                0.078125 = fieldNorm(doc=4167)
          0.04401356 = weight(abstract_txt:since in 4167) [ClassicSimilarity], result of:
            0.04401356 = score(doc=4167,freq=1.0), product of:
              0.11479971 = queryWeight, product of:
                1.2509391 = boost
                4.907448 = idf(docFreq=849, maxDocs=42306)
                0.018700315 = queryNorm
              0.38339436 = fieldWeight in 4167, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.907448 = idf(docFreq=849, maxDocs=42306)
                0.078125 = fieldNorm(doc=4167)
          0.18424258 = weight(abstract_txt:german in 4167) [ClassicSimilarity], result of:
            0.18424258 = score(doc=4167,freq=4.0), product of:
              0.18784074 = queryWeight, product of:
                1.60015 = boost
                6.2774057 = idf(docFreq=215, maxDocs=42306)
                0.018700315 = queryNorm
              0.9808446 = fieldWeight in 4167, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2774057 = idf(docFreq=215, maxDocs=42306)
                0.078125 = fieldNorm(doc=4167)
          0.093479946 = weight(abstract_txt:series in 4167) [ClassicSimilarity], result of:
            0.093479946 = score(doc=4167,freq=1.0), product of:
              0.217133 = queryWeight, product of:
                2.1070476 = boost
                5.510647 = idf(docFreq=464, maxDocs=42306)
                0.018700315 = queryNorm
              0.43051928 = fieldWeight in 4167, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.510647 = idf(docFreq=464, maxDocs=42306)
                0.078125 = fieldNorm(doc=4167)
          0.18469729 = weight(abstract_txt:automatic in 4167) [ClassicSimilarity], result of:
            0.18469729 = score(doc=4167,freq=2.0), product of:
              0.32173142 = queryWeight, product of:
                3.311178 = boost
                5.1959147 = idf(docFreq=636, maxDocs=42306)
                0.018700315 = queryNorm
              0.5740729 = fieldWeight in 4167, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1959147 = idf(docFreq=636, maxDocs=42306)
                0.078125 = fieldNorm(doc=4167)
        0.24 = coord(6/25)
    
  2. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.11
    0.10625831 = sum of:
      0.10625831 = product of:
        0.53129154 = sum of:
          0.058062106 = weight(abstract_txt:national in 3718) [ClassicSimilarity], result of:
            0.058062106 = score(doc=3718,freq=1.0), product of:
              0.10094039 = queryWeight, product of:
                1.1730007 = boost
                4.6016946 = idf(docFreq=1153, maxDocs=42306)
                0.018700315 = queryNorm
              0.5752118 = fieldWeight in 3718, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6016946 = idf(docFreq=1153, maxDocs=42306)
                0.125 = fieldNorm(doc=3718)
          0.028977696 = weight(abstract_txt:library in 3718) [ClassicSimilarity], result of:
            0.028977696 = score(doc=3718,freq=1.0), product of:
              0.0727014 = queryWeight, product of:
                1.2192221 = boost
                3.188681 = idf(docFreq=4740, maxDocs=42306)
                0.018700315 = queryNorm
              0.3985851 = fieldWeight in 3718, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.188681 = idf(docFreq=4740, maxDocs=42306)
                0.125 = fieldNorm(doc=3718)
          0.08986691 = weight(abstract_txt:publications in 3718) [ClassicSimilarity], result of:
            0.08986691 = score(doc=3718,freq=1.0), product of:
              0.13506298 = queryWeight, product of:
                1.3568566 = boost
                5.322963 = idf(docFreq=560, maxDocs=42306)
                0.018700315 = queryNorm
              0.6653704 = fieldWeight in 3718, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.322963 = idf(docFreq=560, maxDocs=42306)
                0.125 = fieldNorm(doc=3718)
          0.20844668 = weight(abstract_txt:german in 3718) [ClassicSimilarity], result of:
            0.20844668 = score(doc=3718,freq=2.0), product of:
              0.18784074 = queryWeight, product of:
                1.60015 = boost
                6.2774057 = idf(docFreq=215, maxDocs=42306)
                0.018700315 = queryNorm
              1.109699 = fieldWeight in 3718, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2774057 = idf(docFreq=215, maxDocs=42306)
                0.125 = fieldNorm(doc=3718)
          0.14593814 = weight(abstract_txt:indexing in 3718) [ClassicSimilarity], result of:
            0.14593814 = score(doc=3718,freq=1.0), product of:
              0.26912627 = queryWeight, product of:
                3.3174505 = boost
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.018700315 = queryNorm
              0.5422664 = fieldWeight in 3718, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.125 = fieldNorm(doc=3718)
        0.2 = coord(5/25)
    
  3. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2014) 0.11
    0.10625831 = sum of:
      0.10625831 = product of:
        0.53129154 = sum of:
          0.058062106 = weight(abstract_txt:national in 3970) [ClassicSimilarity], result of:
            0.058062106 = score(doc=3970,freq=1.0), product of:
              0.10094039 = queryWeight, product of:
                1.1730007 = boost
                4.6016946 = idf(docFreq=1153, maxDocs=42306)
                0.018700315 = queryNorm
              0.5752118 = fieldWeight in 3970, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6016946 = idf(docFreq=1153, maxDocs=42306)
                0.125 = fieldNorm(doc=3970)
          0.028977696 = weight(abstract_txt:library in 3970) [ClassicSimilarity], result of:
            0.028977696 = score(doc=3970,freq=1.0), product of:
              0.0727014 = queryWeight, product of:
                1.2192221 = boost
                3.188681 = idf(docFreq=4740, maxDocs=42306)
                0.018700315 = queryNorm
              0.3985851 = fieldWeight in 3970, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.188681 = idf(docFreq=4740, maxDocs=42306)
                0.125 = fieldNorm(doc=3970)
          0.08986691 = weight(abstract_txt:publications in 3970) [ClassicSimilarity], result of:
            0.08986691 = score(doc=3970,freq=1.0), product of:
              0.13506298 = queryWeight, product of:
                1.3568566 = boost
                5.322963 = idf(docFreq=560, maxDocs=42306)
                0.018700315 = queryNorm
              0.6653704 = fieldWeight in 3970, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.322963 = idf(docFreq=560, maxDocs=42306)
                0.125 = fieldNorm(doc=3970)
          0.20844668 = weight(abstract_txt:german in 3970) [ClassicSimilarity], result of:
            0.20844668 = score(doc=3970,freq=2.0), product of:
              0.18784074 = queryWeight, product of:
                1.60015 = boost
                6.2774057 = idf(docFreq=215, maxDocs=42306)
                0.018700315 = queryNorm
              1.109699 = fieldWeight in 3970, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2774057 = idf(docFreq=215, maxDocs=42306)
                0.125 = fieldNorm(doc=3970)
          0.14593814 = weight(abstract_txt:indexing in 3970) [ClassicSimilarity], result of:
            0.14593814 = score(doc=3970,freq=1.0), product of:
              0.26912627 = queryWeight, product of:
                3.3174505 = boost
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.018700315 = queryNorm
              0.5422664 = fieldWeight in 3970, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.125 = fieldNorm(doc=3970)
        0.2 = coord(5/25)
    
  4. Svensson, L.G.; Jahns, Y.: PDF, CSV, RSS and other Acronyms : redefining the bibliographic services in the German National Library (2010) 0.10
    0.10484046 = sum of:
      0.10484046 = product of:
        0.43683526 = sum of:
          0.058062106 = weight(abstract_txt:national in 971) [ClassicSimilarity], result of:
            0.058062106 = score(doc=971,freq=4.0), product of:
              0.10094039 = queryWeight, product of:
                1.1730007 = boost
                4.6016946 = idf(docFreq=1153, maxDocs=42306)
                0.018700315 = queryNorm
              0.5752118 = fieldWeight in 971, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.6016946 = idf(docFreq=1153, maxDocs=42306)
                0.0625 = fieldNorm(doc=971)
          0.014488848 = weight(abstract_txt:library in 971) [ClassicSimilarity], result of:
            0.014488848 = score(doc=971,freq=1.0), product of:
              0.0727014 = queryWeight, product of:
                1.2192221 = boost
                3.188681 = idf(docFreq=4740, maxDocs=42306)
                0.018700315 = queryNorm
              0.19929256 = fieldWeight in 971, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.188681 = idf(docFreq=4740, maxDocs=42306)
                0.0625 = fieldNorm(doc=971)
          0.14283425 = weight(abstract_txt:discontinued in 971) [ClassicSimilarity], result of:
            0.14283425 = score(doc=971,freq=1.0), product of:
              0.23175797 = queryWeight, product of:
                1.2568057 = boost
                9.860925 = idf(docFreq=5, maxDocs=42306)
                0.018700315 = queryNorm
              0.6163078 = fieldWeight in 971, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.860925 = idf(docFreq=5, maxDocs=42306)
                0.0625 = fieldNorm(doc=971)
          0.07369704 = weight(abstract_txt:german in 971) [ClassicSimilarity], result of:
            0.07369704 = score(doc=971,freq=1.0), product of:
              0.18784074 = queryWeight, product of:
                1.60015 = boost
                6.2774057 = idf(docFreq=215, maxDocs=42306)
                0.018700315 = queryNorm
              0.39233786 = fieldWeight in 971, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2774057 = idf(docFreq=215, maxDocs=42306)
                0.0625 = fieldNorm(doc=971)
          0.07478396 = weight(abstract_txt:series in 971) [ClassicSimilarity], result of:
            0.07478396 = score(doc=971,freq=1.0), product of:
              0.217133 = queryWeight, product of:
                2.1070476 = boost
                5.510647 = idf(docFreq=464, maxDocs=42306)
                0.018700315 = queryNorm
              0.34441543 = fieldWeight in 971, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.510647 = idf(docFreq=464, maxDocs=42306)
                0.0625 = fieldNorm(doc=971)
          0.07296907 = weight(abstract_txt:indexing in 971) [ClassicSimilarity], result of:
            0.07296907 = score(doc=971,freq=1.0), product of:
              0.26912627 = queryWeight, product of:
                3.3174505 = boost
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.018700315 = queryNorm
              0.2711332 = fieldWeight in 971, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.0625 = fieldNorm(doc=971)
        0.24 = coord(6/25)
    
  5. Balakrishnan, U.; Voß, J.: The Cocoda mapping tool (2015) 0.10
    0.10483602 = sum of:
      0.10483602 = product of:
        0.32761255 = sum of:
          0.018144408 = weight(abstract_txt:national in 1124) [ClassicSimilarity], result of:
            0.018144408 = score(doc=1124,freq=1.0), product of:
              0.10094039 = queryWeight, product of:
                1.1730007 = boost
                4.6016946 = idf(docFreq=1153, maxDocs=42306)
                0.018700315 = queryNorm
              0.17975369 = fieldWeight in 1124, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6016946 = idf(docFreq=1153, maxDocs=42306)
                0.0390625 = fieldNorm(doc=1124)
          0.015684638 = weight(abstract_txt:library in 1124) [ClassicSimilarity], result of:
            0.015684638 = score(doc=1124,freq=3.0), product of:
              0.0727014 = queryWeight, product of:
                1.2192221 = boost
                3.188681 = idf(docFreq=4740, maxDocs=42306)
                0.018700315 = queryNorm
              0.21574052 = fieldWeight in 1124, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.188681 = idf(docFreq=4740, maxDocs=42306)
                0.0390625 = fieldNorm(doc=1124)
          0.02200678 = weight(abstract_txt:since in 1124) [ClassicSimilarity], result of:
            0.02200678 = score(doc=1124,freq=1.0), product of:
              0.11479971 = queryWeight, product of:
                1.2509391 = boost
                4.907448 = idf(docFreq=849, maxDocs=42306)
                0.018700315 = queryNorm
              0.19169718 = fieldWeight in 1124, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.907448 = idf(docFreq=849, maxDocs=42306)
                0.0390625 = fieldNorm(doc=1124)
          0.046060644 = weight(abstract_txt:german in 1124) [ClassicSimilarity], result of:
            0.046060644 = score(doc=1124,freq=1.0), product of:
              0.18784074 = queryWeight, product of:
                1.60015 = boost
                6.2774057 = idf(docFreq=215, maxDocs=42306)
                0.018700315 = queryNorm
              0.24521115 = fieldWeight in 1124, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2774057 = idf(docFreq=215, maxDocs=42306)
                0.0390625 = fieldNorm(doc=1124)
          0.020809505 = weight(abstract_txt:content in 1124) [ClassicSimilarity], result of:
            0.020809505 = score(doc=1124,freq=1.0), product of:
              0.12660223 = queryWeight, product of:
                1.6089113 = boost
                4.207851 = idf(docFreq=1710, maxDocs=42306)
                0.018700315 = queryNorm
              0.16436918 = fieldWeight in 1124, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.207851 = idf(docFreq=1710, maxDocs=42306)
                0.0390625 = fieldNorm(doc=1124)
          0.028700208 = weight(abstract_txt:quality in 1124) [ClassicSimilarity], result of:
            0.028700208 = score(doc=1124,freq=1.0), product of:
              0.15686409 = queryWeight, product of:
                1.7909076 = boost
                4.6838336 = idf(docFreq=1062, maxDocs=42306)
                0.018700315 = queryNorm
              0.18296225 = fieldWeight in 1124, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6838336 = idf(docFreq=1062, maxDocs=42306)
                0.0390625 = fieldNorm(doc=1124)
          0.1306007 = weight(abstract_txt:automatic in 1124) [ClassicSimilarity], result of:
            0.1306007 = score(doc=1124,freq=4.0), product of:
              0.32173142 = queryWeight, product of:
                3.311178 = boost
                5.1959147 = idf(docFreq=636, maxDocs=42306)
                0.018700315 = queryNorm
              0.40593085 = fieldWeight in 1124, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.1959147 = idf(docFreq=636, maxDocs=42306)
                0.0390625 = fieldNorm(doc=1124)
          0.045605667 = weight(abstract_txt:indexing in 1124) [ClassicSimilarity], result of:
            0.045605667 = score(doc=1124,freq=1.0), product of:
              0.26912627 = queryWeight, product of:
                3.3174505 = boost
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.018700315 = queryNorm
              0.16945826 = fieldWeight in 1124, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3381314 = idf(docFreq=1501, maxDocs=42306)
                0.0390625 = fieldNorm(doc=1124)
        0.32 = coord(8/25)
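
For the content-based list, each document score is the sum of its matching term weights, scaled by the coord factor (matched terms divided by the 25 query terms). The sketch below, again hypothetical Python rather than the engine's code, rebuilds the score of the first content entry (doc 4167) from the frequencies, boosts and document frequencies listed in its tree. It assumes tf is the square root of the term frequency and uses the same assumed idf form, `1 + ln(maxDocs / (docFreq + 1))`, which matches the listed values.

```python
import math

MAX_DOCS = 42306
QUERY_NORM = 0.018700315
FIELD_NORM = 0.078125  # fieldNorm(doc=4167), shared by all terms in the field

# (term, freq, boost, docFreq) copied from the tree for doc 4167.
MATCHES = [
    ("national", 4.0, 1.1730007, 1153),
    ("library", 1.0, 1.2192221, 4740),
    ("since", 1.0, 1.2509391, 849),
    ("german", 4.0, 1.60015, 215),
    ("series", 1.0, 2.1070476, 464),
    ("automatic", 2.0, 3.311178, 636),
]


def term_weight(freq, boost, doc_freq):
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


def document_score(matches, query_terms):
    raw = sum(term_weight(freq, boost, df) for _, freq, boost, df in matches)
    coord = len(matches) / query_terms  # 6/25 = 0.24 for doc 4167
    return raw * coord


score = document_score(MATCHES, 25)
print(score)
```

The result should be close to the listed 0.14330928. Note how the coord factor rewards documents that match more of the query terms: entry 5 matches 8 of 25 terms and so keeps a larger share of its raw sum than entries matching only 5 or 6.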