Document (#39819)

Author
Baierer, K.
Zumstein, P.
Title
Verbesserung der OCR in digitalen Sammlungen von Bibliotheken
Source
027.7 Zeitschrift für Bibliothekskultur. 4(2016), no. 2
Year
2016
Abstract
Options for improving automatic text recognition (OCR) in digital collections, in particular through methods from computational linguistics, are described, and existing post-OCR approaches are analysed. The current use of OCR in library practice differs substantially from these options, which come from research or from individual projects, and exploits their potential only in part.
Content
Contribution to a thematic focus on 'Computerlinguistik und Bibliotheken' (computational linguistics and libraries). See: http://0277.ch/ojs/index.php/cdrs_0277/article/view/155/353.
Theme
Computational linguistics
Aid
OCR

Similar documents (author)

  1. Zumstein, P.: Die Rolle des Semantic Web für Bibliotheken : Linked Open Data und mehr: Welche Strategien können hier die Bibliotheken in die Zukunft führen? (2012) 6.19
    6.190705 = sum of:
      6.190705 = weight(author_txt:zumstein in 2450) [ClassicSimilarity], result of:
        6.190705 = fieldWeight in 2450, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.625 = fieldNorm(doc=2450)
    
  2. Zumstein, P.; Stöhr, M.: Zur Nachnutzung von bibliographischen Katalog- und Normdaten für die persönliche Literaturverwaltung und Wissensorganisation (2015) 4.95
    4.952564 = sum of:
      4.952564 = weight(author_txt:zumstein in 3192) [ClassicSimilarity], result of:
        4.952564 = fieldWeight in 3192, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.5 = fieldNorm(doc=3192)
    
  3. Kim, T.C.-w.K.; Zumstein, P.: Semiautomatische Katalogisierung und Normdatenverknüpfung mit Zotero im Index Theologicus (2016) 4.33
    4.333493 = sum of:
      4.333493 = weight(author_txt:zumstein in 3064) [ClassicSimilarity], result of:
        4.333493 = fieldWeight in 3064, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.4375 = fieldNorm(doc=3064)
    
  4. Daquino, M.; Peroni, S.; Shotton, D.; Colavizza, G.; Ghavimi, B.; Lauscher, A.; Mayr, P.; Romanello, M.; Zumstein, P.: The OpenCitations Data Model (2020) 2.17
    2.1667466 = sum of:
      2.1667466 = weight(author_txt:zumstein in 38) [ClassicSimilarity], result of:
        2.1667466 = fieldWeight in 38, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.905128 = idf(docFreq=5, maxDocs=44218)
          0.21875 = fieldNorm(doc=38)
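
The author scores above each come from a single term match, so they reduce to fieldWeight = tf * idf * fieldNorm. The following is a minimal sketch of that arithmetic, not the search engine's own code. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the figures listed here to within rounding. The function names are illustrative.

```python
import math


def classic_tf(freq: float) -> float:
    # Assumed term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed inverse document frequency; matches idf(docFreq=5, maxDocs=44218).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explain trees above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the four author matches above: (doc id, fieldNorm).
author_hits = [(2450, 0.625), (3192, 0.5), (3064, 0.4375), (38, 0.21875)]
author_scores = {
    doc: field_weight(1.0, doc_freq=5, max_docs=44218, field_norm=norm)
    for doc, norm in author_hits
}

for doc, score in author_scores.items():
    print(doc, round(score, 4))
```

Because tf and idf are identical for all four hits, the ranking among them is driven entirely by fieldNorm. That value is smaller for the record with nine authors (doc 38) than for the single-author record (doc 2450).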
    

Similar documents (content)

  1. Hoffmann, R.: Entwicklung einer benutzerunterstützten automatisierten Klassifikation von Web - Dokumenten : Untersuchung gegenwärtiger Methoden zur automatisierten Dokumentklassifikation und Implementierung eines Prototyps zum verbesserten Information Retrieval für das xFIND System (2002) 0.16
    0.15995385 = sum of:
      0.15995385 = product of:
        0.57126373 = sum of:
          0.03134206 = weight(abstract_txt:verfahren in 4197) [ClassicSimilarity], result of:
            0.03134206 = score(doc=4197,freq=1.0), product of:
              0.116041556 = queryWeight, product of:
                1.2436895 = boost
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.016193056 = queryNorm
              0.2700934 = fieldWeight in 4197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
          0.06338486 = weight(abstract_txt:methoden in 4197) [ClassicSimilarity], result of:
            0.06338486 = score(doc=4197,freq=4.0), product of:
              0.11690476 = queryWeight, product of:
                1.2483068 = boost
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.016193056 = queryNorm
              0.5421923 = fieldWeight in 4197, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
          0.054779578 = weight(abstract_txt:beschrieben in 4197) [ClassicSimilarity], result of:
            0.054779578 = score(doc=4197,freq=2.0), product of:
              0.13363831 = queryWeight, product of:
                1.3346602 = boost
                6.1834583 = idf(docFreq=247, maxDocs=44218)
                0.016193056 = queryNorm
              0.40990925 = fieldWeight in 4197, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1834583 = idf(docFreq=247, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
          0.11912879 = weight(abstract_txt:automatischen in 4197) [ClassicSimilarity], result of:
            0.11912879 = score(doc=4197,freq=5.0), product of:
              0.1652785 = queryWeight, product of:
                1.4842716 = boost
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.016193056 = queryNorm
              0.72077614 = fieldWeight in 4197, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
          0.057005182 = weight(abstract_txt:teilweise in 4197) [ClassicSimilarity], result of:
            0.057005182 = score(doc=4197,freq=1.0), product of:
              0.17290388 = queryWeight, product of:
                1.5181252 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.016193056 = queryNorm
              0.3296929 = fieldWeight in 4197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
          0.055647954 = weight(abstract_txt:möglichkeiten in 4197) [ClassicSimilarity], result of:
            0.055647954 = score(doc=4197,freq=1.0), product of:
              0.2143736 = queryWeight, product of:
                2.3905942 = boost
                5.5377917 = idf(docFreq=472, maxDocs=44218)
                0.016193056 = queryNorm
              0.25958398 = fieldWeight in 4197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5377917 = idf(docFreq=472, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
          0.18997534 = weight(abstract_txt:verbesserung in 4197) [ClassicSimilarity], result of:
            0.18997534 = score(doc=4197,freq=3.0), product of:
              0.33699974 = queryWeight, product of:
                2.997333 = boost
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.016193056 = queryNorm
              0.5637255 = fieldWeight in 4197, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
        0.28 = coord(7/25)
    
  2. Schmitz, K.-D.: Wörterbuch, Thesaurus, Terminologie, Ontologie : Was tragen Terminologiewissenschaft und Informationswissenschaft zur Wissensordnung bei? (2006) 0.16
    0.15946501 = sum of:
      0.15946501 = product of:
        0.6644376 = sum of:
          0.057164267 = weight(abstract_txt:diesen in 6075) [ClassicSimilarity], result of:
            0.057164267 = score(doc=6075,freq=1.0), product of:
              0.10912517 = queryWeight, product of:
                1.2060566 = boost
                5.58764 = idf(docFreq=449, maxDocs=44218)
                0.016193056 = queryNorm
              0.52384126 = fieldWeight in 6075, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.58764 = idf(docFreq=449, maxDocs=44218)
                0.09375 = fieldNorm(doc=6075)
          0.06268412 = weight(abstract_txt:verfahren in 6075) [ClassicSimilarity], result of:
            0.06268412 = score(doc=6075,freq=1.0), product of:
              0.116041556 = queryWeight, product of:
                1.2436895 = boost
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.016193056 = queryNorm
              0.5401868 = fieldWeight in 6075, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.09375 = fieldNorm(doc=6075)
          0.06338486 = weight(abstract_txt:methoden in 6075) [ClassicSimilarity], result of:
            0.06338486 = score(doc=6075,freq=1.0), product of:
              0.11690476 = queryWeight, product of:
                1.2483068 = boost
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.016193056 = queryNorm
              0.5421923 = fieldWeight in 6075, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.09375 = fieldNorm(doc=6075)
          0.06447509 = weight(abstract_txt:einzelnen in 6075) [ClassicSimilarity], result of:
            0.06447509 = score(doc=6075,freq=1.0), product of:
              0.118241474 = queryWeight, product of:
                1.2554232 = boost
                5.8163543 = idf(docFreq=357, maxDocs=44218)
                0.016193056 = queryNorm
              0.5452832 = fieldWeight in 6075, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8163543 = idf(docFreq=357, maxDocs=44218)
                0.09375 = fieldNorm(doc=6075)
          0.14898512 = weight(abstract_txt:nutzt in 6075) [ClassicSimilarity], result of:
            0.14898512 = score(doc=6075,freq=1.0), product of:
              0.20666666 = queryWeight, product of:
                1.6597414 = boost
                7.689554 = idf(docFreq=54, maxDocs=44218)
                0.016193056 = queryNorm
              0.7208957 = fieldWeight in 6075, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.689554 = idf(docFreq=54, maxDocs=44218)
                0.09375 = fieldNorm(doc=6075)
          0.26774412 = weight(abstract_txt:sammlungen in 6075) [ClassicSimilarity], result of:
            0.26774412 = score(doc=6075,freq=1.0), product of:
              0.38488573 = queryWeight, product of:
                3.2032154 = boost
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.016193056 = queryNorm
              0.69564575 = fieldWeight in 6075, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.09375 = fieldNorm(doc=6075)
        0.24 = coord(6/25)
    
  3. Plank, M.: AV-Portal für wissenschaftliche Filme : Analyse der Nutzerbedarfe (2010) 0.14
    0.13928044 = sum of:
      0.13928044 = product of:
        0.5803352 = sum of:
          0.040745243 = weight(abstract_txt:bibliotheken in 4670) [ClassicSimilarity], result of:
            0.040745243 = score(doc=4670,freq=1.0), product of:
              0.07857094 = queryWeight, product of:
                1.0233783 = boost
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.016193056 = queryNorm
              0.51857907 = fieldWeight in 4670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.109375 = fieldNorm(doc=4670)
          0.07081254 = weight(abstract_txt:forschung in 4670) [ClassicSimilarity], result of:
            0.07081254 = score(doc=4670,freq=1.0), product of:
              0.113575354 = queryWeight, product of:
                1.2304027 = boost
                5.700435 = idf(docFreq=401, maxDocs=44218)
                0.016193056 = queryNorm
              0.6234851 = fieldWeight in 4670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.700435 = idf(docFreq=401, maxDocs=44218)
                0.109375 = fieldNorm(doc=4670)
          0.073948994 = weight(abstract_txt:methoden in 4670) [ClassicSimilarity], result of:
            0.073948994 = score(doc=4670,freq=1.0), product of:
              0.11690476 = queryWeight, product of:
                1.2483068 = boost
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.016193056 = queryNorm
              0.63255763 = fieldWeight in 4670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.109375 = fieldNorm(doc=4670)
          0.11898184 = weight(abstract_txt:analysiert in 4670) [ClassicSimilarity], result of:
            0.11898184 = score(doc=4670,freq=1.0), product of:
              0.16052072 = queryWeight, product of:
                1.4627522 = boost
                6.7769065 = idf(docFreq=136, maxDocs=44218)
                0.016193056 = queryNorm
              0.74122417 = fieldWeight in 4670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7769065 = idf(docFreq=136, maxDocs=44218)
                0.109375 = fieldNorm(doc=4670)
          0.1243107 = weight(abstract_txt:automatischen in 4670) [ClassicSimilarity], result of:
            0.1243107 = score(doc=4670,freq=1.0), product of:
              0.1652785 = queryWeight, product of:
                1.4842716 = boost
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.016193056 = queryNorm
              0.7521287 = fieldWeight in 4670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.109375 = fieldNorm(doc=4670)
          0.1515359 = weight(abstract_txt:digitalen in 4670) [ClassicSimilarity], result of:
            0.1515359 = score(doc=4670,freq=1.0), product of:
              0.23762803 = queryWeight, product of:
                2.516918 = boost
                5.830419 = idf(docFreq=352, maxDocs=44218)
                0.016193056 = queryNorm
              0.6377021 = fieldWeight in 4670, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.830419 = idf(docFreq=352, maxDocs=44218)
                0.109375 = fieldNorm(doc=4670)
        0.24 = coord(6/25)
    
  4. Lepsky, K.: Im Heuhaufen suchen - und finden : Automatische Erschließung von Internetquellen: Möglichkeiten und Grenzen (1998) 0.12
    0.12474685 = sum of:
      0.12474685 = product of:
        0.51977855 = sum of:
          0.04115891 = weight(abstract_txt:bibliotheken in 4655) [ClassicSimilarity], result of:
            0.04115891 = score(doc=4655,freq=2.0), product of:
              0.07857094 = queryWeight, product of:
                1.0233783 = boost
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.016193056 = queryNorm
              0.52384394 = fieldWeight in 4655, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.078125 = fieldNorm(doc=4655)
          0.047636885 = weight(abstract_txt:diesen in 4655) [ClassicSimilarity], result of:
            0.047636885 = score(doc=4655,freq=1.0), product of:
              0.10912517 = queryWeight, product of:
                1.2060566 = boost
                5.58764 = idf(docFreq=449, maxDocs=44218)
                0.016193056 = queryNorm
              0.43653435 = fieldWeight in 4655, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.58764 = idf(docFreq=449, maxDocs=44218)
                0.078125 = fieldNorm(doc=4655)
          0.05223677 = weight(abstract_txt:verfahren in 4655) [ClassicSimilarity], result of:
            0.05223677 = score(doc=4655,freq=1.0), product of:
              0.116041556 = queryWeight, product of:
                1.2436895 = boost
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.016193056 = queryNorm
              0.4501557 = fieldWeight in 4655, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.078125 = fieldNorm(doc=4655)
          0.060617264 = weight(abstract_txt:anwendung in 4655) [ClassicSimilarity], result of:
            0.060617264 = score(doc=4655,freq=1.0), product of:
              0.12814261 = queryWeight, product of:
                1.306929 = boost
                6.0549803 = idf(docFreq=281, maxDocs=44218)
                0.016193056 = queryNorm
              0.47304535 = fieldWeight in 4655, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0549803 = idf(docFreq=281, maxDocs=44218)
                0.078125 = fieldNorm(doc=4655)
          0.09500863 = weight(abstract_txt:teilweise in 4655) [ClassicSimilarity], result of:
            0.09500863 = score(doc=4655,freq=1.0), product of:
              0.17290388 = queryWeight, product of:
                1.5181252 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.016193056 = queryNorm
              0.5494881 = fieldWeight in 4655, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.078125 = fieldNorm(doc=4655)
          0.2231201 = weight(abstract_txt:sammlungen in 4655) [ClassicSimilarity], result of:
            0.2231201 = score(doc=4655,freq=1.0), product of:
              0.38488573 = queryWeight, product of:
                3.2032154 = boost
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.016193056 = queryNorm
              0.57970476 = fieldWeight in 4655, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.078125 = fieldNorm(doc=4655)
        0.24 = coord(6/25)
    
  5. Kempf, A.O.: Automatische Indexierung in der sozialwissenschaftlichen Fachinformation : eine Evaluationsstudie zur maschinellen Erschließung für die Datenbank SOLIS (2012) 0.12
    0.12399297 = sum of:
      0.12399297 = product of:
        0.5166374 = sum of:
          0.05223677 = weight(abstract_txt:verfahren in 903) [ClassicSimilarity], result of:
            0.05223677 = score(doc=903,freq=1.0), product of:
              0.116041556 = queryWeight, product of:
                1.2436895 = boost
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.016193056 = queryNorm
              0.4501557 = fieldWeight in 903, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.078125 = fieldNorm(doc=903)
          0.052820716 = weight(abstract_txt:methoden in 903) [ClassicSimilarity], result of:
            0.052820716 = score(doc=903,freq=1.0), product of:
              0.11690476 = queryWeight, product of:
                1.2483068 = boost
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.016193056 = queryNorm
              0.4518269 = fieldWeight in 903, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7833843 = idf(docFreq=369, maxDocs=44218)
                0.078125 = fieldNorm(doc=903)
          0.06455835 = weight(abstract_txt:beschrieben in 903) [ClassicSimilarity], result of:
            0.06455835 = score(doc=903,freq=1.0), product of:
              0.13363831 = queryWeight, product of:
                1.3346602 = boost
                6.1834583 = idf(docFreq=247, maxDocs=44218)
                0.016193056 = queryNorm
              0.48308268 = fieldWeight in 903, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1834583 = idf(docFreq=247, maxDocs=44218)
                0.078125 = fieldNorm(doc=903)
          0.08498702 = weight(abstract_txt:analysiert in 903) [ClassicSimilarity], result of:
            0.08498702 = score(doc=903,freq=1.0), product of:
              0.16052072 = queryWeight, product of:
                1.4627522 = boost
                6.7769065 = idf(docFreq=136, maxDocs=44218)
                0.016193056 = queryNorm
              0.5294458 = fieldWeight in 903, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7769065 = idf(docFreq=136, maxDocs=44218)
                0.078125 = fieldNorm(doc=903)
          0.15379462 = weight(abstract_txt:automatischen in 903) [ClassicSimilarity], result of:
            0.15379462 = score(doc=903,freq=3.0), product of:
              0.1652785 = queryWeight, product of:
                1.4842716 = boost
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.016193056 = queryNorm
              0.930518 = fieldWeight in 903, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.8766055 = idf(docFreq=123, maxDocs=44218)
                0.078125 = fieldNorm(doc=903)
          0.10823992 = weight(abstract_txt:digitalen in 903) [ClassicSimilarity], result of:
            0.10823992 = score(doc=903,freq=1.0), product of:
              0.23762803 = queryWeight, product of:
                2.516918 = boost
                5.830419 = idf(docFreq=352, maxDocs=44218)
                0.016193056 = queryNorm
              0.4555015 = fieldWeight in 903, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.830419 = idf(docFreq=352, maxDocs=44218)
                0.078125 = fieldNorm(doc=903)
        0.24 = coord(6/25)
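
The content scores above combine several term matches. Each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm. The contributions are summed and multiplied by coord (matched query terms divided by total query terms). The sketch below recomputes the score of the first content hit (doc 4197) from the values listed in its explain tree. It assumes tf = sqrt(freq), and the helper names are illustrative rather than part of any library.

```python
import math

QUERY_NORM = 0.016193056  # queryNorm as listed in the explain output


def term_score(boost: float, idf: float, freq: float, field_norm: float) -> float:
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf * QUERY_NORM
    # fieldWeight = tf * idf * fieldNorm, with tf assumed to be sqrt(freq)
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(terms, matched: int, total: int) -> float:
    # Sum of per-term scores, scaled by the coordination factor matched/total.
    return sum(term_score(*t) for t in terms) * (matched / total)


# (boost, idf, freq, fieldNorm) copied from the explain tree for doc 4197.
doc_4197_terms = [
    (1.2436895, 5.761993, 1.0, 0.046875),   # verfahren
    (1.2483068, 5.7833843, 4.0, 0.046875),  # methoden
    (1.3346602, 6.1834583, 2.0, 0.046875),  # beschrieben
    (1.4842716, 6.8766055, 5.0, 0.046875),  # automatischen
    (1.5181252, 7.033448, 1.0, 0.046875),   # teilweise
    (2.3905942, 5.5377917, 1.0, 0.046875),  # möglichkeiten
    (2.997333, 6.943297, 3.0, 0.046875),    # verbesserung
]

score_4197 = document_score(doc_4197_terms, matched=7, total=25)
print(round(score_4197, 4))
```

The result should agree with the listed 0.15995385 up to floating-point rounding, since the inputs are the single-precision values printed in the tree.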