Document (#41597)

Author
Kugler, A.
Title
Automatisierte Volltexterschließung von Retrodigitalisaten am Beispiel historischer Zeitungen
Source
Perspektive Bibliothek. 7(2018) no. 1, pp. 33-54
Year
2018
Abstract
For some years now, the DFG (Deutsche Forschungsgemeinschaft, German Research Foundation) has stated in its practical guidelines on digitization ("Praxisregeln 'Digitalisierung'") that image-only digitization no longer meets scholarly requirements. A digital full text is necessary, because it forms the basis for scholarly reuse. To clarify what the term "full text" actually covers, the article gives a brief overview of the technical procedures for the automated full-text indexing of retro-digitized material. It presents both the progress and the limits of current methods, and it discusses how quality can be measured in this context at all. Historical newspapers serve as the example for these automated procedures. Making them accessible is a major desideratum, particularly in the humanities, and their column structure poses particular technical challenges. The article draws on the results of the DFG project "Masterplan Zeitungsdigitalisierung" (master plan for newspaper digitization), which was completed in 2016.
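The abstract raises the question of how OCR quality can be measured. The abstract does not name a metric, so this sketch assumes a widely used one: the character error rate (CER). CER is the Levenshtein edit distance between the OCR output and a manually corrected ground truth, divided by the length of the ground truth. The function names below are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the first i-1 chars of a and the first j chars of b
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if the chars match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr: str, ground_truth: str) -> float:
    """Assumed quality metric: edit distance divided by the ground-truth length."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ocr, ground_truth) / len(ground_truth)
```

Consider an OCR engine that reads the Fraktur long s (ſ) as f. It turns `Geſellſchaft` into `Gefellfchaft`, which is two substitutions in twelve characters, giving a CER of about 0.17.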
Content
See: http://journals.ub.uni-heidelberg.de/index.php/bibliothek/article/view/48394. See also: URN (PDF): http://nbn-resolving.de/urn:nbn:de:bsz:16-pb-483949.
Theme
Full-text retrieval
Automatic indexing
Form
Newspapers

Similar documents (content)

  1. Waidmann, S.: Erschließung historischer Bestände mittels Crowdsourcing : eine Analyse ausgewählter aktueller Projekte (2014) 0.07
    0.07448569 = sum of:
      0.07448569 = product of:
        0.46553555 = sum of:
          0.03359835 = weight(abstract_txt:werden in 3925) [ClassicSimilarity], result of:
            0.03359835 = score(doc=3925,freq=2.0), product of:
              0.08638289 = queryWeight, product of:
                1.3569318 = boost
                3.5203447 = idf(docFreq=3478, maxDocs=43254)
                0.018083584 = queryNorm
              0.3889468 = fieldWeight in 3925, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5203447 = idf(docFreq=3478, maxDocs=43254)
                0.078125 = fieldNorm(doc=3925)
          0.071666546 = weight(abstract_txt:beispiel in 3925) [ClassicSimilarity], result of:
            0.071666546 = score(doc=3925,freq=1.0), product of:
              0.15754561 = queryWeight, product of:
                1.4962415 = boost
                5.8226423 = idf(docFreq=347, maxDocs=43254)
                0.018083584 = queryNorm
              0.45489395 = fieldWeight in 3925, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8226423 = idf(docFreq=347, maxDocs=43254)
                0.078125 = fieldNorm(doc=3925)
          0.14380495 = weight(abstract_txt:volltext in 3925) [ClassicSimilarity], result of:
            0.14380495 = score(doc=3925,freq=1.0), product of:
              0.25063664 = queryWeight, product of:
                1.8872126 = boost
                7.3441114 = idf(docFreq=75, maxDocs=43254)
                0.018083584 = queryNorm
              0.5737587 = fieldWeight in 3925, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3441114 = idf(docFreq=75, maxDocs=43254)
                0.078125 = fieldNorm(doc=3925)
          0.21646571 = weight(abstract_txt:historischer in 3925) [ClassicSimilarity], result of:
            0.21646571 = score(doc=3925,freq=1.0), product of:
              0.32919616 = queryWeight, product of:
                2.1628475 = boost
                8.416748 = idf(docFreq=25, maxDocs=43254)
                0.018083584 = queryNorm
              0.65755844 = fieldWeight in 3925, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.416748 = idf(docFreq=25, maxDocs=43254)
                0.078125 = fieldNorm(doc=3925)
        0.16 = coord(4/25)
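The `weight(...)` nodes in these explanations follow Lucene's ClassicSimilarity (TF-IDF) scoring. The printed factors give the following formulas:

- tf = sqrt(termFreq)
- idf = 1 + ln(maxDocs / (docFreq + 1))
- queryWeight = boost · idf · queryNorm
- fieldWeight = tf · idf · fieldNorm
- term score = queryWeight · fieldWeight

The sketch below recomputes the `werden` node of document 3925 from those factors. The function names are illustrative. Boost, queryNorm and fieldNorm are copied from the output rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # idf = 1 + ln(maxDocs / (docFreq + 1)), matching the printed idf values
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # tf = sqrt(termFreq): a term occurring twice counts ~1.41x, not 2x
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, max_docs: int,
               boost: float, query_norm: float, field_norm: float) -> float:
    """Score of one query term in one document, ClassicSimilarity-style."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm               # query-side factor
    field_weight = classic_tf(freq) * idf * field_norm    # document-side factor
    return query_weight * field_weight


# Values copied from weight(abstract_txt:werden in 3925) above
score = term_score(freq=2.0, doc_freq=3478, max_docs=43254,
                   boost=1.3569318, query_norm=0.018083584, field_norm=0.078125)
print(f"{score:.6f}")  # 0.033598
```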
    
  2. Garbe, G.: Informationeller Mehrwert durch den Einsatz von Inhouse-Retrieval-Systemen : am Beispiel von Literaturbestellungen als Realisierung im Datenbanksystem STAR (1992) 0.06
    0.061725654 = sum of:
      0.061725654 = product of:
        0.51438046 = sum of:
          0.03801219 = weight(abstract_txt:werden in 1559) [ClassicSimilarity], result of:
            0.03801219 = score(doc=1559,freq=1.0), product of:
              0.08638289 = queryWeight, product of:
                1.3569318 = boost
                3.5203447 = idf(docFreq=3478, maxDocs=43254)
                0.018083584 = queryNorm
              0.4400431 = fieldWeight in 1559, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5203447 = idf(docFreq=3478, maxDocs=43254)
                0.125 = fieldNorm(doc=1559)
          0.11466647 = weight(abstract_txt:beispiel in 1559) [ClassicSimilarity], result of:
            0.11466647 = score(doc=1559,freq=1.0), product of:
              0.15754561 = queryWeight, product of:
                1.4962415 = boost
                5.8226423 = idf(docFreq=347, maxDocs=43254)
                0.018083584 = queryNorm
              0.7278303 = fieldWeight in 1559, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8226423 = idf(docFreq=347, maxDocs=43254)
                0.125 = fieldNorm(doc=1559)
          0.36170176 = weight(abstract_txt:automatisierten in 1559) [ClassicSimilarity], result of:
            0.36170176 = score(doc=1559,freq=1.0), product of:
              0.33885646 = queryWeight, product of:
                2.1943526 = boost
                8.5393505 = idf(docFreq=22, maxDocs=43254)
                0.018083584 = queryNorm
              1.0674188 = fieldWeight in 1559, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.5393505 = idf(docFreq=22, maxDocs=43254)
                0.125 = fieldNorm(doc=1559)
        0.12 = coord(3/25)
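At the top of each tree, the matched term scores are summed and multiplied by coord(overlap/maxOverlap). This factor is the share of query terms found in the document. The denominator 25 suggests a 25-term query, though this is inferred from the output rather than stated. A sketch recomputing the totals for documents 1559 and 3925 from their printed term weights:

```python
from typing import Sequence


def document_score(term_scores: Sequence[float], max_overlap: int) -> float:
    """Sum of matched term scores times the coordination factor."""
    coord = len(term_scores) / max_overlap  # e.g. coord(3/25) = 0.12
    return sum(term_scores) * coord


# Term weights copied from the explanations: werden, beispiel, automatisierten
doc_1559 = [0.03801219, 0.11466647, 0.36170176]
# werden, beispiel, volltext, historischer
doc_3925 = [0.03359835, 0.071666546, 0.14380495, 0.21646571]

print(f"{document_score(doc_1559, 25):.6f}")  # 0.061726
print(f"{document_score(doc_3925, 25):.6f}")  # 0.074486
```

Document 1559 has the larger sum of term scores (about 0.514 versus 0.466). However, it matches only three of the 25 terms, while document 3925 matches four. As a result, document 1559 ranks below document 3925.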
    
  3. Aufbau und Erschließung begrifflicher Datenbanken : Beiträge zur bibliothekarischen Klassifikation. Eine Auswahl von Vorträgen der Jahrestagungen 1993 (Kaiserslautern) und 1994 (Oldenburg) der Gesellschaft für Klassifikation (1995) 0.06
    0.061409425 = sum of:
      0.061409425 = product of:
        0.5117452 = sum of:
          0.03801219 = weight(abstract_txt:werden in 1990) [ClassicSimilarity], result of:
            0.03801219 = score(doc=1990,freq=1.0), product of:
              0.08638289 = queryWeight, product of:
                1.3569318 = boost
                3.5203447 = idf(docFreq=3478, maxDocs=43254)
                0.018083584 = queryNorm
              0.4400431 = fieldWeight in 1990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5203447 = idf(docFreq=3478, maxDocs=43254)
                0.125 = fieldNorm(doc=1990)
          0.11203124 = weight(abstract_txt:verfahren in 1990) [ClassicSimilarity], result of:
            0.11203124 = score(doc=1990,freq=1.0), product of:
              0.15512249 = queryWeight, product of:
                1.4846904 = boost
                5.7776914 = idf(docFreq=363, maxDocs=43254)
                0.018083584 = queryNorm
              0.7222114 = fieldWeight in 1990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7776914 = idf(docFreq=363, maxDocs=43254)
                0.125 = fieldNorm(doc=1990)
          0.36170176 = weight(abstract_txt:automatisierten in 1990) [ClassicSimilarity], result of:
            0.36170176 = score(doc=1990,freq=1.0), product of:
              0.33885646 = queryWeight, product of:
                2.1943526 = boost
                8.5393505 = idf(docFreq=22, maxDocs=43254)
                0.018083584 = queryNorm
              1.0674188 = fieldWeight in 1990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.5393505 = idf(docFreq=22, maxDocs=43254)
                0.125 = fieldNorm(doc=1990)
        0.12 = coord(3/25)
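The `werden` nodes show the effect of fieldNorm, the per-field length normalization in ClassicSimilarity. Larger values correspond to shorter fields. In documents 1559 and 1990, the term occurs once with fieldNorm 0.125. In document 3925, it occurs twice with fieldNorm 0.078125, yet it scores lower. Because queryWeight (0.08638289) is the same in all three nodes, the comparison reduces to fieldWeight = sqrt(termFreq) · idf · fieldNorm. A sketch using the printed values:

```python
import math

IDF_WERDEN = 3.5203447     # idf(docFreq=3478, maxDocs=43254), as printed
QUERY_WEIGHT = 0.08638289  # identical in every werden node


def field_weight(freq: float, idf: float, field_norm: float) -> float:
    # Document-side factor: sqrt-damped term frequency, idf, length normalization
    return math.sqrt(freq) * idf * field_norm


fw_3925 = field_weight(2.0, IDF_WERDEN, 0.078125)  # two hits, longer field
fw_1559 = field_weight(1.0, IDF_WERDEN, 0.125)     # one hit, shorter field (same for 1990)

print(f"doc 3925: {QUERY_WEIGHT * fw_3925:.4f}")  # 0.0336
print(f"doc 1559: {QUERY_WEIGHT * fw_1559:.4f}")  # 0.0380
```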
    
  4. Gerick, T.: Content-based Information Retrieval auf Basis semantischer Abfragenetze : Kooperative Technologien am Beispiel der Dokumentenrecherche in GENIOS Wirtschaftsdatenbanken (1999) 0.06
    0.06058485 = sum of:
      0.06058485 = product of:
        0.3786553 = sum of:
          0.03359835 = weight(abstract_txt:werden in 5875) [ClassicSimilarity], result of:
            0.03359835 = score(doc=5875,freq=2.0), product of:
              0.08638289 = queryWeight, product of:
                1.3569318 = boost
                3.5203447 = idf(docFreq=3478, maxDocs=43254)
                0.018083584 = queryNorm
              0.3889468 = fieldWeight in 5875, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5203447 = idf(docFreq=3478, maxDocs=43254)
                0.078125 = fieldNorm(doc=5875)
          0.07001952 = weight(abstract_txt:verfahren in 5875) [ClassicSimilarity], result of:
            0.07001952 = score(doc=5875,freq=1.0), product of:
              0.15512249 = queryWeight, product of:
                1.4846904 = boost
                5.7776914 = idf(docFreq=363, maxDocs=43254)
                0.018083584 = queryNorm
              0.45138213 = fieldWeight in 5875, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7776914 = idf(docFreq=363, maxDocs=43254)
                0.078125 = fieldNorm(doc=5875)
          0.071666546 = weight(abstract_txt:beispiel in 5875) [ClassicSimilarity], result of:
            0.071666546 = score(doc=5875,freq=1.0), product of:
              0.15754561 = queryWeight, product of:
                1.4962415 = boost
                5.8226423 = idf(docFreq=347, maxDocs=43254)
                0.018083584 = queryNorm
              0.45489395 = fieldWeight in 5875, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8226423 = idf(docFreq=347, maxDocs=43254)
                0.078125 = fieldNorm(doc=5875)
          0.20337091 = weight(abstract_txt:volltext in 5875) [ClassicSimilarity], result of:
            0.20337091 = score(doc=5875,freq=2.0), product of:
              0.25063664 = queryWeight, product of:
                1.8872126 = boost
                7.3441114 = idf(docFreq=75, maxDocs=43254)
                0.018083584 = queryNorm
              0.81141734 = fieldWeight in 5875, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3441114 = idf(docFreq=75, maxDocs=43254)
                0.078125 = fieldNorm(doc=5875)
        0.16 = coord(4/25)
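The `volltext` nodes isolate the tf component. In documents 3925 and 5875, idf, boost, queryNorm and fieldNorm (0.078125) are identical. Only termFreq differs (1 versus 2). The ratio of the two printed scores should therefore equal sqrt(2), reflecting the square-root damping of term frequency. A quick check on the printed numbers:

```python
import math

# Printed weights of abstract_txt:volltext; all other factors are equal
SCORE_FREQ1 = 0.14380495  # doc 3925, termFreq=1
SCORE_FREQ2 = 0.20337091  # doc 5875, termFreq=2

ratio = SCORE_FREQ2 / SCORE_FREQ1
print(f"observed ratio {ratio:.4f}, sqrt(2) = {math.sqrt(2):.4f}")

# Square-root damping: nine occurrences weigh only three times as much as one
for freq in (1, 2, 4, 9):
    print(freq, round(math.sqrt(freq), 4))
```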
    
  5. Geisriegler, E.: Enriching electronic texts with semantic metadata : a use case for the historical Newspaper Collection ANNO (Austrian Newspapers Online) of the Austrian National Library (2012) 0.06
    0.05609619 = sum of:
      0.05609619 = product of:
        0.46746826 = sum of:
          0.019006096 = weight(abstract_txt:werden in 2060) [ClassicSimilarity], result of:
            0.019006096 = score(doc=2060,freq=1.0), product of:
              0.08638289 = queryWeight, product of:
                1.3569318 = boost
                3.5203447 = idf(docFreq=3478, maxDocs=43254)
                0.018083584 = queryNorm
              0.22002155 = fieldWeight in 2060, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5203447 = idf(docFreq=3478, maxDocs=43254)
                0.0625 = fieldNorm(doc=2060)
          0.20355919 = weight(abstract_txt:zeitungen in 2060) [ClassicSimilarity], result of:
            0.20355919 = score(doc=2060,freq=2.0), product of:
              0.29101753 = queryWeight, product of:
                2.0335653 = boost
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.018083584 = queryNorm
              0.699474 = fieldWeight in 2060, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.913645 = idf(docFreq=42, maxDocs=43254)
                0.0625 = fieldNorm(doc=2060)
          0.24490298 = weight(abstract_txt:historischer in 2060) [ClassicSimilarity], result of:
            0.24490298 = score(doc=2060,freq=2.0), product of:
              0.32919616 = queryWeight, product of:
                2.1628475 = boost
                8.416748 = idf(docFreq=25, maxDocs=43254)
                0.018083584 = queryNorm
              0.74394244 = fieldWeight in 2060, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.416748 = idf(docFreq=25, maxDocs=43254)
                0.0625 = fieldNorm(doc=2060)
        0.12 = coord(3/25)
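The list is ordered by the final document score, and the two-decimal figure after each title is that score rounded. The sketch below reproduces the order and the rounded labels from the full-precision scores printed above. Keys are the internal document numbers from the explanations.

```python
# Final scores copied from the top line of each explanation above
results = {
    3925: 0.07448569,   # Waidmann (2014)
    1559: 0.061725654,  # Garbe (1992)
    1990: 0.061409425,  # Aufbau und Erschließung begrifflicher Datenbanken (1995)
    5875: 0.06058485,   # Gerick (1999)
    2060: 0.05609619,   # Geisriegler (2012)
}

# Highest score first, as in the similar-documents list
ranking = sorted(results.items(), key=lambda item: item[1], reverse=True)
for rank, (doc_id, score) in enumerate(ranking, start=1):
    print(f"{rank}. doc {doc_id}: {score:.2f}")
```

Items 2 to 4 lie within about 0.0012 of each other and share the label 0.06. The rounded figure alone therefore does not reveal their order.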