Document (#36384)

Author
Mühlberger, G.
Title
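Digitalisierung historischer Zeitungen aus dem Blickwinkel der automatisierten Text- und Strukturerkennung (OCR) [Digitizing historical newspapers from the perspective of automated text and structure recognition (OCR)]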
Source
Zeitschrift für Bibliothekswesen und Bibliographie. 58(2011) no. 1, pp. 10-18
Year
2011
Abstract
OCR is a key technology that the systematic digitization of historical newspapers cannot do without. Although a word accuracy of only 80% or less will often be achievable for 19th- and 20th-century newspapers, this error-prone full text still provides the basis for a whole range of interesting applications, from full-text search and indexing by search engines to online correction by users. Using OCR does, however, call for a rethink compared with conventional digitization projects: in project planning, workflow design, quality control, and the conception of long-term archiving and web presentation.
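The abstract's figure of "80% word accuracy" can be made concrete with a small calculation. The record does not define word accuracy, so the sketch below assumes one common definition: the share of ground-truth words that remain after subtracting the word-level edit distance (substitutions, deletions, insertions) between the ground truth and the OCR output. The function names and the sample sentence are hypothetical, and the article's own measurement method may differ.

```python
def word_edit_distance(reference: list[str], hypothesis: list[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions (Levenshtein on words)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: ground-truth word missing from the OCR output
                curr[j - 1] + 1,     # insertion: spurious word in the OCR output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1]


def word_accuracy(ground_truth: str, ocr_text: str) -> float:
    """Assumed definition: (n - word errors) / n, clamped at 0, with n = ground-truth word count."""
    ref = ground_truth.split()
    hyp = ocr_text.split()
    if not ref:
        raise ValueError("ground truth must contain at least one word")
    errors = word_edit_distance(ref, hyp)
    return max(0, len(ref) - errors) / len(ref)


# One misread word out of five ground-truth words
print(word_accuracy("Die Zeitung erschien in Innsbruck",
                    "Die Zeitnng erschien in Innsbruck"))  # 0.8
```

Under this definition a single character error is enough to cost a whole word, which is one reason word accuracy on historical newspapers can stay low even when character accuracy is considerably higher.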
Form
Newspapers
Object
OCR

Similar documents (author)

  1. Mühlberger, G.: ¬Der digitalisierte Nominalkatalog der Universitätsbibliothek Innsbruck [The digitized author-title catalog of Innsbruck University Library] (2004) 6.19
    6.190705 = sum of:
      6.190705 = weight(author_txt:mühlberger in 2200) [ClassicSimilarity], result of:
        6.190705 = fieldWeight in 2200, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.625 = fieldNorm(doc=2200)
    
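  2. Mühlberger, G.; Habitzel, K.: ¬Das digitalisierte Zeitungsausschnittarchiv : Im EU-Projekt LAURIN des Innsbrucker Zeitungsarchivs/IZA der Universität Innsbruck werden neue Wege der Archivierung und Bereitstellung gegangen [The digitized newspaper clippings archive: in the EU project LAURIN, the Innsbruck Newspaper Archive (IZA) of the University of Innsbruck is breaking new ground in archiving and provision] (1998) 4.95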
    4.952564 = sum of:
      4.952564 = weight(author_txt:mühlberger in 829) [ClassicSimilarity], result of:
        4.952564 = fieldWeight in 829, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.5 = fieldNorm(doc=829)
    
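  3. Mühlberger, G.; Klein, M.: Digitalisierte Zeitungsausschnitte im Internet : Das Innsbrucker Zeitungsarchiv zur deutsch- und fremdsprachigen Literatur bietet seine Sammlung online an: http://iza.uibk.ac.at/ [Digitized newspaper clippings on the Internet: the Innsbruck Newspaper Archive of German- and foreign-language literature offers its collection online] (2001) 4.95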
    4.952564 = sum of:
      4.952564 = weight(author_txt:mühlberger in 6914) [ClassicSimilarity], result of:
        4.952564 = fieldWeight in 6914, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.5 = fieldNorm(doc=6914)
    
  4. Sigmund, K.; Dawson, J.; Mühlberger, K.: Kurt Gödel : Das Album - The Album (2006) 3.71
    3.7144227 = sum of:
      3.7144227 = weight(author_txt:mühlberger in 470) [ClassicSimilarity], result of:
        3.7144227 = fieldWeight in 470, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.375 = fieldNorm(doc=470)
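The author scores above are Lucene ClassicSimilarity explanations for a single-term match on author_txt:mühlberger, so each score reduces to fieldWeight = tf × idf × fieldNorm, with tf = sqrt(termFreq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The sketch below recomputes them from the numbers in the explain output. fieldNorm is the index-time length normalization (roughly 1/sqrt of the field's token count, stored lossily in one byte), so it is copied from the output rather than recomputed; the helper names are hypothetical.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as in Lucene's ClassicSimilarity: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """tf * idf * fieldNorm, where tf = sqrt(termFreq)."""
    return math.sqrt(freq) * classic_idf(doc_freq, max_docs) * field_norm


# fieldNorm values copied from the explain trees above (doc ids 2200, 829, 6914, 470)
for doc_id, norm in [(2200, 0.625), (829, 0.5), (6914, 0.5), (470, 0.375)]:
    print(doc_id, round(field_weight(1.0, 5, 44218, norm), 6))
```

Because tf and idf are identical for all four hits (one occurrence of a term found in 5 of 44,218 documents), the ranking is decided entirely by fieldNorm: records with more author names have a longer author field and therefore a smaller norm. Lucene computes in 32-bit floats, so the recomputed values agree with the listed ones to about six significant digits.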
    

Similar documents (content)

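  1. Kugler, A.: Automatisierte Volltexterschließung von Retrodigitalisaten am Beispiel historischer Zeitungen [Automated full-text indexing of retro-digitized material, using historical newspapers as an example] (2018) 0.25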
    0.25096563 = sum of:
      0.25096563 = product of:
        0.8963058 = sum of:
          0.11591833 = weight(abstract_txt:volltext in 4595) [ClassicSimilarity], result of:
            0.11591833 = score(doc=4595,freq=2.0), product of:
              0.1426848 = queryWeight, product of:
                1.0017579 = boost
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.019370712 = queryNorm
              0.8124084 = fieldWeight in 4595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.078125 = fieldNorm(doc=4595)
          0.022150345 = weight(abstract_txt:wird in 4595) [ClassicSimilarity], result of:
            0.022150345 = score(doc=4595,freq=1.0), product of:
              0.07514209 = queryWeight, product of:
                1.0280886 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.019370712 = queryNorm
              0.29477945 = fieldWeight in 4595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.078125 = fieldNorm(doc=4595)
          0.12066394 = weight(abstract_txt:historischer in 4595) [ClassicSimilarity], result of:
            0.12066394 = score(doc=4595,freq=1.0), product of:
              0.18464518 = queryWeight, product of:
                1.1395749 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.019370712 = queryNorm
              0.6534909 = fieldWeight in 4595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.078125 = fieldNorm(doc=4595)
          0.17767484 = weight(abstract_txt:automatisierten in 4595) [ClassicSimilarity], result of:
            0.17767484 = score(doc=4595,freq=2.0), product of:
              0.18968235 = queryWeight, product of:
                1.1550143 = boost
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.019370712 = queryNorm
              0.93669677 = fieldWeight in 4595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.078125 = fieldNorm(doc=4595)
          0.0371576 = weight(abstract_txt:eine in 4595) [ClassicSimilarity], result of:
            0.0371576 = score(doc=4595,freq=2.0), product of:
              0.09638626 = queryWeight, product of:
                1.426074 = boost
                3.4892128 = idf(docFreq=3668, maxDocs=44218)
                0.019370712 = queryNorm
              0.3855072 = fieldWeight in 4595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4892128 = idf(docFreq=3668, maxDocs=44218)
                0.078125 = fieldNorm(doc=4595)
          0.121451266 = weight(abstract_txt:digitalisierung in 4595) [ClassicSimilarity], result of:
            0.121451266 = score(doc=4595,freq=1.0), product of:
              0.23364921 = queryWeight, product of:
                1.8128883 = boost
                6.653462 = idf(docFreq=154, maxDocs=44218)
                0.019370712 = queryNorm
              0.51980174 = fieldWeight in 4595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.653462 = idf(docFreq=154, maxDocs=44218)
                0.078125 = fieldNorm(doc=4595)
          0.3012895 = weight(abstract_txt:zeitungen in 4595) [ClassicSimilarity], result of:
            0.3012895 = score(doc=4595,freq=1.0), product of:
              0.4901354 = queryWeight, product of:
                3.215826 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.019370712 = queryNorm
              0.6147067 = fieldWeight in 4595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.078125 = fieldNorm(doc=4595)
        0.28 = coord(7/25)
    
  2. Neudecker, C.: Zur Kuratierung digitalisierter Dokumente mit Künstlicher Intelligenz : das Qurator-Projekt [On curating digitized documents with artificial intelligence: the Qurator project] (2020) 0.13
    0.13273627 = sum of:
      0.13273627 = product of:
        0.6636813 = sum of:
          0.026580414 = weight(abstract_txt:wird in 47) [ClassicSimilarity], result of:
            0.026580414 = score(doc=47,freq=1.0), product of:
              0.07514209 = queryWeight, product of:
                1.0280886 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.019370712 = queryNorm
              0.35373533 = fieldWeight in 47, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.09375 = fieldNorm(doc=47)
          0.037914604 = weight(abstract_txt:durch in 47) [ClassicSimilarity], result of:
            0.037914604 = score(doc=47,freq=1.0), product of:
              0.09521671 = queryWeight, product of:
                1.1572987 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.019370712 = queryNorm
              0.39819276 = fieldWeight in 47, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.09375 = fieldNorm(doc=47)
          0.031529266 = weight(abstract_txt:eine in 47) [ClassicSimilarity], result of:
            0.031529266 = score(doc=47,freq=1.0), product of:
              0.09638626 = queryWeight, product of:
                1.426074 = boost
                3.4892128 = idf(docFreq=3668, maxDocs=44218)
                0.019370712 = queryNorm
              0.3271137 = fieldWeight in 47, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4892128 = idf(docFreq=3668, maxDocs=44218)
                0.09375 = fieldNorm(doc=47)
          0.20610963 = weight(abstract_txt:digitalisierung in 47) [ClassicSimilarity], result of:
            0.20610963 = score(doc=47,freq=2.0), product of:
              0.23364921 = queryWeight, product of:
                1.8128883 = boost
                6.653462 = idf(docFreq=154, maxDocs=44218)
                0.019370712 = queryNorm
              0.88213277 = fieldWeight in 47, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.653462 = idf(docFreq=154, maxDocs=44218)
                0.09375 = fieldNorm(doc=47)
          0.3615474 = weight(abstract_txt:zeitungen in 47) [ClassicSimilarity], result of:
            0.3615474 = score(doc=47,freq=1.0), product of:
              0.4901354 = queryWeight, product of:
                3.215826 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.019370712 = queryNorm
              0.737648 = fieldWeight in 47, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.09375 = fieldNorm(doc=47)
        0.2 = coord(5/25)
    
  3. Mikro-Univers : 2. Workshop "Digitalisierung, Erschließung, Internetpräsentation und Langzeitarchivierung" [2nd workshop "Digitization, indexing, Internet presentation and long-term archiving"] (2004) 0.11
    0.112127714 = sum of:
      0.112127714 = product of:
        0.40045613 = sum of:
          0.069551 = weight(abstract_txt:volltext in 3042) [ClassicSimilarity], result of:
            0.069551 = score(doc=3042,freq=2.0), product of:
              0.1426848 = queryWeight, product of:
                1.0017579 = boost
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.019370712 = queryNorm
              0.48744506 = fieldWeight in 3042, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.046875 = fieldNorm(doc=3042)
          0.013290207 = weight(abstract_txt:wird in 3042) [ClassicSimilarity], result of:
            0.013290207 = score(doc=3042,freq=1.0), product of:
              0.07514209 = queryWeight, product of:
                1.0280886 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.019370712 = queryNorm
              0.17686766 = fieldWeight in 3042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.046875 = fieldNorm(doc=3042)
          0.018957302 = weight(abstract_txt:durch in 3042) [ClassicSimilarity], result of:
            0.018957302 = score(doc=3042,freq=1.0), product of:
              0.09521671 = queryWeight, product of:
                1.1572987 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.019370712 = queryNorm
              0.19909638 = fieldWeight in 3042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.046875 = fieldNorm(doc=3042)
          0.084487535 = weight(abstract_txt:volltextsuche in 3042) [ClassicSimilarity], result of:
            0.084487535 = score(doc=3042,freq=1.0), product of:
              0.20466672 = queryWeight, product of:
                1.1997687 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.019370712 = queryNorm
              0.41280544 = fieldWeight in 3042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.046875 = fieldNorm(doc=3042)
          0.106048554 = weight(abstract_txt:digitalisierungsprojekten in 3042) [ClassicSimilarity], result of:
            0.106048554 = score(doc=3042,freq=1.0), product of:
              0.23815258 = queryWeight, product of:
                1.2942004 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.019370712 = queryNorm
              0.44529667 = fieldWeight in 3042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.046875 = fieldNorm(doc=3042)
          0.03525079 = weight(abstract_txt:eine in 3042) [ClassicSimilarity], result of:
            0.03525079 = score(doc=3042,freq=5.0), product of:
              0.09638626 = queryWeight, product of:
                1.426074 = boost
                3.4892128 = idf(docFreq=3668, maxDocs=44218)
                0.019370712 = queryNorm
              0.36572424 = fieldWeight in 3042, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.4892128 = idf(docFreq=3668, maxDocs=44218)
                0.046875 = fieldNorm(doc=3042)
          0.07287075 = weight(abstract_txt:digitalisierung in 3042) [ClassicSimilarity], result of:
            0.07287075 = score(doc=3042,freq=1.0), product of:
              0.23364921 = queryWeight, product of:
                1.8128883 = boost
                6.653462 = idf(docFreq=154, maxDocs=44218)
                0.019370712 = queryNorm
              0.31188104 = fieldWeight in 3042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.653462 = idf(docFreq=154, maxDocs=44218)
                0.046875 = fieldNorm(doc=3042)
        0.28 = coord(7/25)
    
  4. Waidmann, S.: Erschließung historischer Bestände mittels Crowdsourcing : eine Analyse ausgewählter aktueller Projekte [Indexing historical holdings via crowdsourcing: an analysis of selected current projects] (2014) 0.11
    0.111180596 = sum of:
      0.111180596 = product of:
        0.55590296 = sum of:
          0.08196664 = weight(abstract_txt:volltext in 2460) [ClassicSimilarity], result of:
            0.08196664 = score(doc=2460,freq=1.0), product of:
              0.1426848 = queryWeight, product of:
                1.0017579 = boost
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.019370712 = queryNorm
              0.5744595 = fieldWeight in 2460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.078125 = fieldNorm(doc=2460)
          0.12066394 = weight(abstract_txt:historischer in 2460) [ClassicSimilarity], result of:
            0.12066394 = score(doc=2460,freq=1.0), product of:
              0.18464518 = queryWeight, product of:
                1.1395749 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.019370712 = queryNorm
              0.6534909 = fieldWeight in 2460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.078125 = fieldNorm(doc=2460)
          0.07064969 = weight(abstract_txt:durch in 2460) [ClassicSimilarity], result of:
            0.07064969 = score(doc=2460,freq=5.0), product of:
              0.09521671 = queryWeight, product of:
                1.1572987 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.019370712 = queryNorm
              0.7419884 = fieldWeight in 2460, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.078125 = fieldNorm(doc=2460)
          0.16117145 = weight(abstract_txt:korrektur in 2460) [ClassicSimilarity], result of:
            0.16117145 = score(doc=2460,freq=1.0), product of:
              0.22394688 = queryWeight, product of:
                1.2550077 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.019370712 = queryNorm
              0.71968603 = fieldWeight in 2460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.078125 = fieldNorm(doc=2460)
          0.121451266 = weight(abstract_txt:digitalisierung in 2460) [ClassicSimilarity], result of:
            0.121451266 = score(doc=2460,freq=1.0), product of:
              0.23364921 = queryWeight, product of:
                1.8128883 = boost
                6.653462 = idf(docFreq=154, maxDocs=44218)
                0.019370712 = queryNorm
              0.51980174 = fieldWeight in 2460, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.653462 = idf(docFreq=154, maxDocs=44218)
                0.078125 = fieldNorm(doc=2460)
        0.2 = coord(5/25)
    
  5. Lepsky, K.: Automatische Indexierung des Reallexikons zur Deutschen Kunstgeschichte [Automatic indexing of the Reallexikon zur Deutschen Kunstgeschichte] (2006) 0.10
    0.104151875 = sum of:
      0.104151875 = product of:
        0.37197098 = sum of:
          0.07098519 = weight(abstract_txt:volltext in 6080) [ClassicSimilarity], result of:
            0.07098519 = score(doc=6080,freq=3.0), product of:
              0.1426848 = queryWeight, product of:
                1.0017579 = boost
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.019370712 = queryNorm
              0.4974965 = fieldWeight in 6080, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.3530817 = idf(docFreq=76, maxDocs=44218)
                0.0390625 = fieldNorm(doc=6080)
          0.01918276 = weight(abstract_txt:wird in 6080) [ClassicSimilarity], result of:
            0.01918276 = score(doc=6080,freq=3.0), product of:
              0.07514209 = queryWeight, product of:
                1.0280886 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.019370712 = queryNorm
              0.25528648 = fieldWeight in 6080, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.0390625 = fieldNorm(doc=6080)
          0.054499835 = weight(abstract_txt:erzielen in 6080) [ClassicSimilarity], result of:
            0.054499835 = score(doc=6080,freq=1.0), product of:
              0.17254528 = queryWeight, product of:
                1.1016039 = boost
                8.085969 = idf(docFreq=36, maxDocs=44218)
                0.019370712 = queryNorm
              0.31585816 = fieldWeight in 6080, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.085969 = idf(docFreq=36, maxDocs=44218)
                0.0390625 = fieldNorm(doc=6080)
          0.022341393 = weight(abstract_txt:durch in 6080) [ClassicSimilarity], result of:
            0.022341393 = score(doc=6080,freq=2.0), product of:
              0.09521671 = queryWeight, product of:
                1.1572987 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.019370712 = queryNorm
              0.23463732 = fieldWeight in 6080, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.0390625 = fieldNorm(doc=6080)
          0.07040627 = weight(abstract_txt:volltextsuche in 6080) [ClassicSimilarity], result of:
            0.07040627 = score(doc=6080,freq=1.0), product of:
              0.20466672 = queryWeight, product of:
                1.1997687 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.019370712 = queryNorm
              0.3440045 = fieldWeight in 6080, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.0390625 = fieldNorm(doc=6080)
          0.029375661 = weight(abstract_txt:eine in 6080) [ClassicSimilarity], result of:
            0.029375661 = score(doc=6080,freq=5.0), product of:
              0.09638626 = queryWeight, product of:
                1.426074 = boost
                3.4892128 = idf(docFreq=3668, maxDocs=44218)
                0.019370712 = queryNorm
              0.3047702 = fieldWeight in 6080, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.4892128 = idf(docFreq=3668, maxDocs=44218)
                0.0390625 = fieldNorm(doc=6080)
          0.10517987 = weight(abstract_txt:digitalisierung in 6080) [ClassicSimilarity], result of:
            0.10517987 = score(doc=6080,freq=3.0), product of:
              0.23364921 = queryWeight, product of:
                1.8128883 = boost
                6.653462 = idf(docFreq=154, maxDocs=44218)
                0.019370712 = queryNorm
              0.45016146 = fieldWeight in 6080, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.653462 = idf(docFreq=154, maxDocs=44218)
                0.0390625 = fieldNorm(doc=6080)
        0.28 = coord(7/25)
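The content-similarity scores follow the same ClassicSimilarity model with two extra factors: each matched term's fieldWeight (sqrt(termFreq) × idf × fieldNorm) is multiplied by a queryWeight (boost × idf × queryNorm), and the summed term scores are scaled by coord, the fraction of the 25 query terms that the document matches. The sketch below recomputes the score of the Kugler (2018) entry from the numbers in its explain tree. queryNorm and the per-term boosts are copied as-is: queryNorm depends on all 25 query terms, which the output does not list, and the boosts appear to be set by the site's similarity query. The helper names are hypothetical.

```python
import math

MAX_DOCS = 44218          # maxDocs, identical in every explain tree above
QUERY_NORM = 0.019370712  # queryNorm, copied as-is (it depends on all 25 query terms)
QUERY_TERMS = 25          # maxOverlap, the denominator of coord(7/25)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    """queryWeight * fieldWeight for a single matched term (idf therefore enters twice)."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


def document_score(matches: list[tuple[float, int, float]], field_norm: float) -> float:
    """coord * sum of term scores; matches holds (termFreq, docFreq, boost) per matched term."""
    coord = len(matches) / QUERY_TERMS
    return coord * sum(term_score(f, df, b, field_norm) for f, df, b in matches)


# Matched terms of doc 4595 (Kugler, 2018), copied from its explain tree
kugler_4595 = [
    (2.0, 76, 1.0017579),    # volltext
    (1.0, 2761, 1.0280886),  # wird
    (1.0, 27, 1.1395749),    # historischer
    (2.0, 24, 1.1550143),    # automatisierten
    (2.0, 3668, 1.426074),   # eine
    (1.0, 154, 1.8128883),   # digitalisierung
    (1.0, 45, 3.215826),     # zeitungen
]

# Should agree with the listed 0.25096563 up to Lucene's 32-bit float rounding
print(document_score(kugler_4595, field_norm=0.078125))
```

The coord factor explains why Kugler (2018) ranks first: it matches 7 of the 25 query terms, while Neudecker (2020) matches only 5 and is scaled by 0.2 despite its higher per-term scores from a shorter abstract (fieldNorm 0.09375).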