Document (#36385)

Author
Mühlberger, G.
Title
Digitalisierung historischer Zeitungen aus dem Blickwinkel der automatisierten Text- und Strukturerkennung (OCR)
Source
Zeitschrift für Bibliothekswesen und Bibliographie. 58(2011) no.1, pp.10-18
Year
2011
Abstract
OCR is a key technology that the systematic digitisation of historical newspapers cannot do without. Although in many cases a word accuracy of only 80% or less will be achievable for newspapers of the 19th and 20th centuries, this error-prone full text still provides the basis for a range of useful applications, from full-text search and indexing by search engines to online correction by users. Using OCR does, however, require a rethink compared with conventional digitisation projects: in project planning, workflow design, and quality control, as well as in the design of long-term archiving and online presentation.
Form
Newspapers
Object
OCR

Similar documents (author)

  1. Mühlberger, G.: ¬Der digitalisierte Nominalkatalog der Universitätsbibliothek Innsbruck (2004) 6.17
    6.169457 = sum of:
      6.169457 = weight(author_txt:mühlberger in 3201) [ClassicSimilarity], result of:
        6.169457 = fieldWeight in 3201, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.871131 = idf(docFreq=5, maxDocs=42740)
          0.625 = fieldNorm(doc=3201)
    
  2. Mühlberger, G.; Habitzel, K.: ¬Das digitalisierte Zeitungsausschnittarchiv : Im EU-Projekt LAURIN des Innsbrucker Zeitungsarchivs/IZA der Universität Innsbruck werden neue Wege der Archivierung und Bereitstellung gegangen (1998) 4.94
    4.9355655 = sum of:
      4.9355655 = weight(author_txt:mühlberger in 1830) [ClassicSimilarity], result of:
        4.9355655 = fieldWeight in 1830, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.871131 = idf(docFreq=5, maxDocs=42740)
          0.5 = fieldNorm(doc=1830)
    
  3. Mühlberger, G.; Klein, M.: Digitalisierte Zeitungsausschnitte im Internet : Das Innsbrucker Zeitungsarchiv zur deutsch- und fremdsprachigen Literatur bietet seine Sammlung online an: http://iza.uibk.ac.at/ (2001) 4.94
    4.9355655 = sum of:
      4.9355655 = weight(author_txt:mühlberger in 915) [ClassicSimilarity], result of:
        4.9355655 = fieldWeight in 915, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.871131 = idf(docFreq=5, maxDocs=42740)
          0.5 = fieldNorm(doc=915)
    
  4. Sigmund, K.; Dawson, J.; Mühlberger, K.: Kurt Gödel : Das Album - The Album (2006) 3.70
    3.701674 = sum of:
      3.701674 = weight(author_txt:mühlberger in 1596) [ClassicSimilarity], result of:
        3.701674 = fieldWeight in 1596, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.871131 = idf(docFreq=5, maxDocs=42740)
          0.375 = fieldNorm(doc=1596)
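
The author scores above are each a single fieldWeight, i.e. tf × idf × fieldNorm. The following is a minimal sketch of that arithmetic, not the search engine's own code. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are the formulas that reproduce the printed values; the function names are illustrative only.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed idf formula; it reproduces idf(docFreq=5, maxDocs=42740) = 9.871131
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq):
    # Assumed tf formula: square root of the raw term frequency
    return math.sqrt(freq)

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain trees above
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Entry 1 (doc 3201): freq=1.0, docFreq=5, maxDocs=42740, fieldNorm=0.625
print(round(field_weight(1.0, 5, 42740, 0.625), 4))
```

Since tf and idf are identical for all four entries, the ranking here is decided by fieldNorm alone (0.625, 0.5, 0.5, 0.375), which is smaller for longer author fields.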
    

Similar documents (content)

  1. Kugler, A.: Automatisierte Volltexterschließung von Retrodigitalisaten am Beispiel historischer Zeitungen (2018) 0.25
    0.25377116 = sum of:
      0.25377116 = product of:
        0.9063255 = sum of:
          0.11368503 = weight(abstract_txt:volltext in 596) [ClassicSimilarity], result of:
            0.11368503 = score(doc=596,freq=2.0), product of:
              0.14033516 = queryWeight, product of:
                7.332157 = idf(docFreq=75, maxDocs=42740)
                0.019139683 = queryNorm
              0.81009656 = fieldWeight in 596, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.332157 = idf(docFreq=75, maxDocs=42740)
                0.078125 = fieldNorm(doc=596)
          0.022245683 = weight(abstract_txt:wird in 596) [ClassicSimilarity], result of:
            0.022245683 = score(doc=596,freq=1.0), product of:
              0.07508413 = queryWeight, product of:
                1.0344412 = boost
                3.7923427 = idf(docFreq=2618, maxDocs=42740)
                0.019139683 = queryNorm
              0.29627678 = fieldWeight in 596, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7923427 = idf(docFreq=2618, maxDocs=42740)
                0.078125 = fieldNorm(doc=596)
          0.12278348 = weight(abstract_txt:historischer in 596) [ClassicSimilarity], result of:
            0.12278348 = score(doc=596,freq=1.0), product of:
              0.18612339 = queryWeight, product of:
                1.1516412 = boost
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.019139683 = queryNorm
              0.65968865 = fieldWeight in 596, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.078125 = fieldNorm(doc=596)
          0.1816483 = weight(abstract_txt:automatisierten in 596) [ClassicSimilarity], result of:
            0.1816483 = score(doc=596,freq=2.0), product of:
              0.19180144 = queryWeight, product of:
                1.1690758 = boost
                8.571848 = idf(docFreq=21, maxDocs=42740)
                0.019139683 = queryNorm
              0.9470643 = fieldWeight in 596, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.571848 = idf(docFreq=21, maxDocs=42740)
                0.078125 = fieldNorm(doc=596)
          0.037969787 = weight(abstract_txt:eine in 596) [ClassicSimilarity], result of:
            0.037969787 = score(doc=596,freq=2.0), product of:
              0.09743093 = queryWeight, product of:
                1.4431976 = boost
                3.5272505 = idf(docFreq=3413, maxDocs=42740)
                0.019139683 = queryNorm
              0.3897098 = fieldWeight in 596, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5272505 = idf(docFreq=3413, maxDocs=42740)
                0.078125 = fieldNorm(doc=596)
          0.12345035 = weight(abstract_txt:digitalisierung in 596) [ClassicSimilarity], result of:
            0.12345035 = score(doc=596,freq=1.0), product of:
              0.23534909 = queryWeight, product of:
                1.8314202 = boost
                6.7141304 = idf(docFreq=140, maxDocs=42740)
                0.019139683 = queryNorm
              0.52454144 = fieldWeight in 596, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7141304 = idf(docFreq=140, maxDocs=42740)
                0.078125 = fieldNorm(doc=596)
          0.30454284 = weight(abstract_txt:zeitungen in 596) [ClassicSimilarity], result of:
            0.30454284 = score(doc=596,freq=1.0), product of:
              0.4918662 = queryWeight, product of:
                3.242656 = boost
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.019139683 = queryNorm
              0.6191579 = fieldWeight in 596, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.078125 = fieldNorm(doc=596)
        0.28 = coord(7/25)
    
  2. Mikro-Univers : 2. Workshop "Digitalisierung, Erschließung, Internetpräsentation und Langzeitarchivierung" (2004) 0.11
    0.11163314 = sum of:
      0.11163314 = product of:
        0.39868978 = sum of:
          0.06821102 = weight(abstract_txt:volltext in 4043) [ClassicSimilarity], result of:
            0.06821102 = score(doc=4043,freq=2.0), product of:
              0.14033516 = queryWeight, product of:
                7.332157 = idf(docFreq=75, maxDocs=42740)
                0.019139683 = queryNorm
              0.48605794 = fieldWeight in 4043, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.332157 = idf(docFreq=75, maxDocs=42740)
                0.046875 = fieldNorm(doc=4043)
          0.013347409 = weight(abstract_txt:wird in 4043) [ClassicSimilarity], result of:
            0.013347409 = score(doc=4043,freq=1.0), product of:
              0.07508413 = queryWeight, product of:
                1.0344412 = boost
                3.7923427 = idf(docFreq=2618, maxDocs=42740)
                0.019139683 = queryNorm
              0.17776605 = fieldWeight in 4043, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7923427 = idf(docFreq=2618, maxDocs=42740)
                0.046875 = fieldNorm(doc=4043)
          0.019031314 = weight(abstract_txt:durch in 4043) [ClassicSimilarity], result of:
            0.019031314 = score(doc=4043,freq=1.0), product of:
              0.095117986 = queryWeight, product of:
                1.1642951 = boost
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.019139683 = queryNorm
              0.20008112 = fieldWeight in 4043, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.046875 = fieldNorm(doc=4043)
          0.08423232 = weight(abstract_txt:volltextsuche in 4043) [ClassicSimilarity], result of:
            0.08423232 = score(doc=4043,freq=1.0), product of:
              0.20351323 = queryWeight, product of:
                1.2042401 = boost
                8.829678 = idf(docFreq=16, maxDocs=42740)
                0.019139683 = queryNorm
              0.41389114 = fieldWeight in 4043, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.829678 = idf(docFreq=16, maxDocs=42740)
                0.046875 = fieldNorm(doc=4043)
          0.1037762 = weight(abstract_txt:digitalisierungsprojekten in 4043) [ClassicSimilarity], result of:
            0.1037762 = score(doc=4043,freq=1.0), product of:
              0.23388658 = queryWeight, product of:
                1.2909796 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.019139683 = queryNorm
              0.4437031 = fieldWeight in 4043, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.046875 = fieldNorm(doc=4043)
          0.036021303 = weight(abstract_txt:eine in 4043) [ClassicSimilarity], result of:
            0.036021303 = score(doc=4043,freq=5.0), product of:
              0.09743093 = queryWeight, product of:
                1.4431976 = boost
                3.5272505 = idf(docFreq=3413, maxDocs=42740)
                0.019139683 = queryNorm
              0.3697112 = fieldWeight in 4043, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.5272505 = idf(docFreq=3413, maxDocs=42740)
                0.046875 = fieldNorm(doc=4043)
          0.07407021 = weight(abstract_txt:digitalisierung in 4043) [ClassicSimilarity], result of:
            0.07407021 = score(doc=4043,freq=1.0), product of:
              0.23534909 = queryWeight, product of:
                1.8314202 = boost
                6.7141304 = idf(docFreq=140, maxDocs=42740)
                0.019139683 = queryNorm
              0.31472486 = fieldWeight in 4043, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7141304 = idf(docFreq=140, maxDocs=42740)
                0.046875 = fieldNorm(doc=4043)
        0.28 = coord(7/25)
    
  3. Waidmann, S.: Erschließung historischer Bestände mittels Crowdsourcing : eine Analyse ausgewählter aktueller Projekte (2014) 0.11
    0.111042336 = sum of:
      0.111042336 = product of:
        0.55521166 = sum of:
          0.08038746 = weight(abstract_txt:volltext in 4461) [ClassicSimilarity], result of:
            0.08038746 = score(doc=4461,freq=1.0), product of:
              0.14033516 = queryWeight, product of:
                7.332157 = idf(docFreq=75, maxDocs=42740)
                0.019139683 = queryNorm
              0.5728248 = fieldWeight in 4461, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.332157 = idf(docFreq=75, maxDocs=42740)
                0.078125 = fieldNorm(doc=4461)
          0.12278348 = weight(abstract_txt:historischer in 4461) [ClassicSimilarity], result of:
            0.12278348 = score(doc=4461,freq=1.0), product of:
              0.18612339 = queryWeight, product of:
                1.1516412 = boost
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.019139683 = queryNorm
              0.65968865 = fieldWeight in 4461, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.078125 = fieldNorm(doc=4461)
          0.07092552 = weight(abstract_txt:durch in 4461) [ClassicSimilarity], result of:
            0.07092552 = score(doc=4461,freq=5.0), product of:
              0.095117986 = queryWeight, product of:
                1.1642951 = boost
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.019139683 = queryNorm
              0.74565834 = fieldWeight in 4461, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.078125 = fieldNorm(doc=4461)
          0.15766488 = weight(abstract_txt:korrektur in 4461) [ClassicSimilarity], result of:
            0.15766488 = score(doc=4461,freq=1.0), product of:
              0.219886 = queryWeight, product of:
                1.251744 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.019139683 = queryNorm
              0.71703005 = fieldWeight in 4461, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.078125 = fieldNorm(doc=4461)
          0.12345035 = weight(abstract_txt:digitalisierung in 4461) [ClassicSimilarity], result of:
            0.12345035 = score(doc=4461,freq=1.0), product of:
              0.23534909 = queryWeight, product of:
                1.8314202 = boost
                6.7141304 = idf(docFreq=140, maxDocs=42740)
                0.019139683 = queryNorm
              0.52454144 = fieldWeight in 4461, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7141304 = idf(docFreq=140, maxDocs=42740)
                0.078125 = fieldNorm(doc=4461)
        0.2 = coord(5/25)
    
  4. Lepsky, K.: Automatische Indexierung des Reallexikons zur Deutschen Kunstgeschichte (2006) 0.10
    0.10421906 = sum of:
      0.10421906 = product of:
        0.37221092 = sum of:
          0.06961758 = weight(abstract_txt:volltext in 1081) [ClassicSimilarity], result of:
            0.06961758 = score(doc=1081,freq=3.0), product of:
              0.14033516 = queryWeight, product of:
                7.332157 = idf(docFreq=75, maxDocs=42740)
                0.019139683 = queryNorm
              0.49608082 = fieldWeight in 1081, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.332157 = idf(docFreq=75, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1081)
          0.019265328 = weight(abstract_txt:wird in 1081) [ClassicSimilarity], result of:
            0.019265328 = score(doc=1081,freq=3.0), product of:
              0.07508413 = queryWeight, product of:
                1.0344412 = boost
                3.7923427 = idf(docFreq=2618, maxDocs=42740)
                0.019139683 = queryNorm
              0.2565832 = fieldWeight in 1081, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.7923427 = idf(docFreq=2618, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1081)
          0.0537769 = weight(abstract_txt:erzielen in 1081) [ClassicSimilarity], result of:
            0.0537769 = score(doc=1081,freq=1.0), product of:
              0.17039551 = queryWeight, product of:
                1.1019092 = boost
                8.079371 = idf(docFreq=35, maxDocs=42740)
                0.019139683 = queryNorm
              0.31560045 = fieldWeight in 1081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.079371 = idf(docFreq=35, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1081)
          0.022428617 = weight(abstract_txt:durch in 1081) [ClassicSimilarity], result of:
            0.022428617 = score(doc=1081,freq=2.0), product of:
              0.095117986 = queryWeight, product of:
                1.1642951 = boost
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.019139683 = queryNorm
              0.23579785 = fieldWeight in 1081, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1081)
          0.0701936 = weight(abstract_txt:volltextsuche in 1081) [ClassicSimilarity], result of:
            0.0701936 = score(doc=1081,freq=1.0), product of:
              0.20351323 = queryWeight, product of:
                1.2042401 = boost
                8.829678 = idf(docFreq=16, maxDocs=42740)
                0.019139683 = queryNorm
              0.34490928 = fieldWeight in 1081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.829678 = idf(docFreq=16, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1081)
          0.030017754 = weight(abstract_txt:eine in 1081) [ClassicSimilarity], result of:
            0.030017754 = score(doc=1081,freq=5.0), product of:
              0.09743093 = queryWeight, product of:
                1.4431976 = boost
                3.5272505 = idf(docFreq=3413, maxDocs=42740)
                0.019139683 = queryNorm
              0.30809265 = fieldWeight in 1081, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.5272505 = idf(docFreq=3413, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1081)
          0.10691114 = weight(abstract_txt:digitalisierung in 1081) [ClassicSimilarity], result of:
            0.10691114 = score(doc=1081,freq=3.0), product of:
              0.23534909 = queryWeight, product of:
                1.8314202 = boost
                6.7141304 = idf(docFreq=140, maxDocs=42740)
                0.019139683 = queryNorm
              0.45426622 = fieldWeight in 1081, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.7141304 = idf(docFreq=140, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1081)
        0.28 = coord(7/25)
    
  5. Meschede, L.: Plane mit, entscheide mit! (2019) 0.10
    0.10164661 = sum of:
      0.10164661 = product of:
        0.63529134 = sum of:
          0.053389635 = weight(abstract_txt:wird in 337) [ClassicSimilarity], result of:
            0.053389635 = score(doc=337,freq=1.0), product of:
              0.07508413 = queryWeight, product of:
                1.0344412 = boost
                3.7923427 = idf(docFreq=2618, maxDocs=42740)
                0.019139683 = queryNorm
              0.7110642 = fieldWeight in 337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7923427 = idf(docFreq=2618, maxDocs=42740)
                0.1875 = fieldNorm(doc=337)
          0.22118403 = weight(abstract_txt:vielfach in 337) [ClassicSimilarity], result of:
            0.22118403 = score(doc=337,freq=1.0), product of:
              0.15372199 = queryWeight, product of:
                1.0466096 = boost
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.019139683 = queryNorm
              1.4388574 = fieldWeight in 337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.1875 = fieldNorm(doc=337)
          0.06443687 = weight(abstract_txt:eine in 337) [ClassicSimilarity], result of:
            0.06443687 = score(doc=337,freq=1.0), product of:
              0.09743093 = queryWeight, product of:
                1.4431976 = boost
                3.5272505 = idf(docFreq=3413, maxDocs=42740)
                0.019139683 = queryNorm
              0.6613595 = fieldWeight in 337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5272505 = idf(docFreq=3413, maxDocs=42740)
                0.1875 = fieldNorm(doc=337)
          0.29628083 = weight(abstract_txt:digitalisierung in 337) [ClassicSimilarity], result of:
            0.29628083 = score(doc=337,freq=1.0), product of:
              0.23534909 = queryWeight, product of:
                1.8314202 = boost
                6.7141304 = idf(docFreq=140, maxDocs=42740)
                0.019139683 = queryNorm
              1.2588995 = fieldWeight in 337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7141304 = idf(docFreq=140, maxDocs=42740)
                0.1875 = fieldNorm(doc=337)
        0.16 = coord(4/25)
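
The content scores above combine several matched abstract terms. As a rough sketch of how the numbers fit together, the snippet below recomputes entry 5 (doc 337) from the values printed in its tree. It assumes each term contributes queryWeight × fieldWeight, with queryWeight = boost × idf × queryNorm and fieldWeight = sqrt(freq) × idf × fieldNorm, and that the sum is scaled by coord = matched terms / query terms. The helper names are illustrative, not part of any library.

```python
import math

MAX_DOCS = 42740
QUERY_NORM = 0.019139683   # queryNorm as printed in the trees
QUERY_TERMS = 25           # denominator of coord(4/25)
FIELD_NORM = 0.1875        # fieldNorm(doc=337)

def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed idf formula; it reproduces the printed idf values
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, boost, field_norm):
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM            # query-side factor
    field_weight = math.sqrt(freq) * idf * field_norm  # document-side factor
    return query_weight * field_weight

# (term, freq, docFreq, boost) copied from the tree for doc 337
matched = [
    ("wird", 1.0, 2618, 1.0344412),
    ("vielfach", 1.0, 53, 1.0466096),
    ("eine", 1.0, 3413, 1.4431976),
    ("digitalisierung", 1.0, 140, 1.8314202),
]

raw = sum(term_score(f, df, b, FIELD_NORM) for _, f, df, b in matched)
score = raw * len(matched) / QUERY_TERMS  # apply coord(4/25)
print(round(raw, 4), round(score, 4))
```

The result should agree with the printed 0.63529134 and 0.10164661 up to floating-point rounding. Because idf enters both factors, rare terms such as "vielfach" weigh far more than frequent ones such as "wird" or "eine".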