Document (#41596)

Author
Kugler, A.
Title
Automatisierte Volltexterschließung von Retrodigitalisaten am Beispiel historischer Zeitungen
Source
Perspektive Bibliothek. 7(2018) no.1, pp.33-54
Year
2018
Abstract
For some years now, the DFG has maintained in its practical guidelines on digitization (Praxisregeln "Digitalisierung") that image-only digitization no longer meets scholarly requirements and that digital full text is necessary, because it forms the basis for scholarly reuse. To give a better understanding of what lies behind the term "full text" (Volltext), this article offers a brief look at the technical methods for automated full-text indexing (Volltexterschließung) of retrodigitized material. It presents the advances and the limits of current methods and discusses how quality can be measured in this context at all. The automated methods are explained using historical newspapers as an example, because access to them is much in demand, particularly in the humanities, and because the column layout of this source type poses particular technical challenges. The DFG project to draw up a "Masterplan Zeitungsdigitalisierung" (master plan for newspaper digitization) was completed in 2016, and its results are incorporated here.
Content
See: http://journals.ub.uni-heidelberg.de/index.php/bibliothek/article/view/48394. See also: URN (PDF): http://nbn-resolving.de/urn:nbn:de:bsz:16-pb-483949.
Theme
Volltextretrieval
Automatisches Indexieren
Form
Zeitungen

Similar documents (content)

  1. Waidmann, S.: Erschließung historischer Bestände mittels Crowdsourcing : eine Analyse ausgewählter aktueller Projekte (2014) 0.07
    0.07415066 = sum of:
      0.07415066 = product of:
        0.4634416 = sum of:
          0.03334819 = weight(abstract_txt:werden in 2460) [ClassicSimilarity], result of:
            0.03334819 = score(doc=2460,freq=2.0), product of:
              0.08608424 = queryWeight, product of:
                1.3679293 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.017948015 = queryNorm
              0.3873902 = fieldWeight in 2460, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.078125 = fieldNorm(doc=2460)
          0.071657695 = weight(abstract_txt:beispiel in 2460) [ClassicSimilarity], result of:
            0.071657695 = score(doc=2460,freq=1.0), product of:
              0.15777212 = queryWeight, product of:
                1.512068 = boost
                5.813565 = idf(docFreq=358, maxDocs=44218)
                0.017948015 = queryNorm
              0.45418474 = fieldWeight in 2460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.813565 = idf(docFreq=358, maxDocs=44218)
                0.078125 = fieldNorm(doc=2460)
          0.1449918 = weight(abstract_txt:volltext in 2460) [ClassicSimilarity], result of:
            0.1449918 = score(doc=2460,freq=1.0), product of:
              0.2523969 = queryWeight, product of:
                1.9124858 = boost
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.017948015 = queryNorm
              0.5744595 = fieldWeight in 2460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.078125 = fieldNorm(doc=2460)
          0.21344392 = weight(abstract_txt:historischer in 2460) [ClassicSimilarity], result of:
            0.21344392 = score(doc=2460,freq=1.0), product of:
              0.32662112 = queryWeight, product of:
                2.1755965 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.017948015 = queryNorm
              0.6534909 = fieldWeight in 2460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.078125 = fieldNorm(doc=2460)
        0.16 = coord(4/25)
    
  2. Garbe, G.: Informationeller Mehrwert durch den Einsatz von Inhouse-Retrieval-Systemen : am Beispiel von Literaturbestellungen als Realisierung im Datenbanksystem STAR (1992) 0.06
    0.060955364 = sum of:
      0.060955364 = product of:
        0.5079614 = sum of:
          0.03772917 = weight(abstract_txt:werden in 558) [ClassicSimilarity], result of:
            0.03772917 = score(doc=558,freq=1.0), product of:
              0.08608424 = queryWeight, product of:
                1.3679293 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.017948015 = queryNorm
              0.43828195 = fieldWeight in 558, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.125 = fieldNorm(doc=558)
          0.114652306 = weight(abstract_txt:beispiel in 558) [ClassicSimilarity], result of:
            0.114652306 = score(doc=558,freq=1.0), product of:
              0.15777212 = queryWeight, product of:
                1.512068 = boost
                5.813565 = idf(docFreq=358, maxDocs=44218)
                0.017948015 = queryNorm
              0.7266956 = fieldWeight in 558, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.813565 = idf(docFreq=358, maxDocs=44218)
                0.125 = fieldNorm(doc=558)
          0.35557988 = weight(abstract_txt:automatisierten in 558) [ClassicSimilarity], result of:
            0.35557988 = score(doc=558,freq=1.0), product of:
              0.3355314 = queryWeight, product of:
                2.2050722 = boost
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.017948015 = queryNorm
              1.0597514 = fieldWeight in 558, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.125 = fieldNorm(doc=558)
        0.12 = coord(3/25)
    
  3. Gerick, T.: Content-based Information Retrieval auf Basis semantischer Abfragenetze : Kooperative Technologien am Beispiel der Dokumentenrecherche in GENIOS Wirtschaftsdatenbanken (1999) 0.06
    0.060771644 = sum of:
      0.060771644 = product of:
        0.3798228 = sum of:
          0.03334819 = weight(abstract_txt:werden in 3874) [ClassicSimilarity], result of:
            0.03334819 = score(doc=3874,freq=2.0), product of:
              0.08608424 = queryWeight, product of:
                1.3679293 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.017948015 = queryNorm
              0.3873902 = fieldWeight in 3874, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.078125 = fieldNorm(doc=3874)
          0.06976755 = weight(abstract_txt:verfahren in 3874) [ClassicSimilarity], result of:
            0.06976755 = score(doc=3874,freq=1.0), product of:
              0.15498537 = queryWeight, product of:
                1.4986546 = boost
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.017948015 = queryNorm
              0.4501557 = fieldWeight in 3874, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.078125 = fieldNorm(doc=3874)
          0.071657695 = weight(abstract_txt:beispiel in 3874) [ClassicSimilarity], result of:
            0.071657695 = score(doc=3874,freq=1.0), product of:
              0.15777212 = queryWeight, product of:
                1.512068 = boost
                5.813565 = idf(docFreq=358, maxDocs=44218)
                0.017948015 = queryNorm
              0.45418474 = fieldWeight in 3874, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.813565 = idf(docFreq=358, maxDocs=44218)
                0.078125 = fieldNorm(doc=3874)
          0.20504937 = weight(abstract_txt:volltext in 3874) [ClassicSimilarity], result of:
            0.20504937 = score(doc=3874,freq=2.0), product of:
              0.2523969 = queryWeight, product of:
                1.9124858 = boost
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.017948015 = queryNorm
              0.8124084 = fieldWeight in 3874, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.078125 = fieldNorm(doc=3874)
        0.16 = coord(4/25)
    
  4. Aufbau und Erschließung begrifflicher Datenbanken : Beiträge zur bibliothekarischen Klassifikation. Eine Auswahl von Vorträgen der Jahrestagungen 1993 (Kaiserslautern) und 1994 (Oldenburg) der Gesellschaft für Klassifikation (1995) 0.06
    0.060592454 = sum of:
      0.060592454 = product of:
        0.5049371 = sum of:
          0.03772917 = weight(abstract_txt:werden in 921) [ClassicSimilarity], result of:
            0.03772917 = score(doc=921,freq=1.0), product of:
              0.08608424 = queryWeight, product of:
                1.3679293 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.017948015 = queryNorm
              0.43828195 = fieldWeight in 921, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.125 = fieldNorm(doc=921)
          0.11162808 = weight(abstract_txt:verfahren in 921) [ClassicSimilarity], result of:
            0.11162808 = score(doc=921,freq=1.0), product of:
              0.15498537 = queryWeight, product of:
                1.4986546 = boost
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.017948015 = queryNorm
              0.7202491 = fieldWeight in 921, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.125 = fieldNorm(doc=921)
          0.35557988 = weight(abstract_txt:automatisierten in 921) [ClassicSimilarity], result of:
            0.35557988 = score(doc=921,freq=1.0), product of:
              0.3355314 = queryWeight, product of:
                2.2050722 = boost
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.017948015 = queryNorm
              1.0597514 = fieldWeight in 921, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.125 = fieldNorm(doc=921)
        0.12 = coord(3/25)
    
  5. Geisriegler, E.: Enriching electronic texts with semantic metadata : a use case for the historical Newspaper Collection ANNO (Austrian Newspapers Online) of the Austrian National Library (2012) 0.06
    0.05536063 = sum of:
      0.05536063 = product of:
        0.46133858 = sum of:
          0.018864585 = weight(abstract_txt:werden in 595) [ClassicSimilarity], result of:
            0.018864585 = score(doc=595,freq=1.0), product of:
              0.08608424 = queryWeight, product of:
                1.3679293 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.017948015 = queryNorm
              0.21914098 = fieldWeight in 595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.0625 = fieldNorm(doc=595)
          0.20098978 = weight(abstract_txt:zeitungen in 595) [ClassicSimilarity], result of:
            0.20098978 = score(doc=595,freq=2.0), product of:
              0.28900215 = queryWeight, product of:
                2.0464764 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.017948015 = queryNorm
              0.6954612 = fieldWeight in 595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.0625 = fieldNorm(doc=595)
          0.24148421 = weight(abstract_txt:historischer in 595) [ClassicSimilarity], result of:
            0.24148421 = score(doc=595,freq=2.0), product of:
              0.32662112 = queryWeight, product of:
                2.1755965 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.017948015 = queryNorm
              0.7393405 = fieldWeight in 595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.0625 = fieldNorm(doc=595)
        0.12 = coord(3/25)
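
The score trees above follow the usual Lucene ClassicSimilarity (TF-IDF) arithmetic: each term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is scaled by the coord factor (matched terms / query terms). The sketch below recomputes the score of the first similar document (doc=2460) from the boost, docFreq, freq, fieldNorm and queryNorm values printed in its tree. The function and variable names are illustrative and do not come from the record. The code uses double precision where Lucene uses single-precision floats, so the last digits can differ slightly.

```python
import math

# Values copied from the explain tree of doc=2460 above.
MAX_DOCS = 44218          # maxDocs
QUERY_NORM = 0.017948015  # queryNorm
FIELD_NORM = 0.078125     # fieldNorm(doc=2460)
MAX_OVERLAP = 25          # denominator of coord(4/25)

# (term, boost, docFreq, termFreq) for each matching term of doc=2460.
TERMS_2460 = [
    ("werden", 1.3679293, 3606, 2.0),
    ("beispiel", 1.512068, 358, 1.0),
    ("volltext", 1.9124858, 76, 1.0),
    ("historischer", 2.1755965, 27, 1.0),
]


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    """Score of one term: queryWeight * fieldWeight, as in the explain tree."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm            # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


def document_score(terms, field_norm, max_overlap=MAX_OVERLAP):
    """Sum of the term scores, scaled by coord = matched terms / query terms."""
    total = sum(term_score(boost, df, freq, field_norm) for _, boost, df, freq in terms)
    coord = len(terms) / max_overlap
    return total * coord


score_2460 = document_score(TERMS_2460, FIELD_NORM)
print(f"doc 2460: {score_2460:.6f}")
```

The same functions can be applied to the other entries by substituting their term lists and fieldNorm values (for example 0.125 for doc=558, with three matching terms).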