Document (#16215)

Author
Hüfner, J.
Title
Steigerung der Erkennungsgenauigkeit durch maschinellen Abgleich verschiedener durch OCR erzeugter Volltexte
Source
Nachrichten für Dokumentation. 48(1997) H.2, S.79-85
Year
1997
Abstract
The generation of full texts by OCR programs and the methods used for it are presented, together with the errors that occur in the process. A distinction is drawn between system-independent and system-inherent errors, and the criteria for evaluating the programs are described. Based on the hypothesis, supported by examples, that system-inherent errors can be reduced by comparing full texts produced with different OCR programs, an experimental setup and its results are presented. The core of the comparison is a word-comparison algorithm that permits a fully automatic comparison and thereby reduces the manual post-editing effort.
Object
OCR
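
The abstract names a word-comparison algorithm but this record does not describe how it works. The following is therefore only a minimal sketch of the general idea, not the author's method: it aligns the word sequences of two OCR outputs with Python's standard `difflib.SequenceMatcher` and separates the words both engines agree on from the spans where they differ. The function name `compare_words` and the two sample strings are invented for illustration.

```python
from difflib import SequenceMatcher


def compare_words(text_a, text_b):
    """Align two OCR outputs word by word.

    Returns (agreed, conflicts): the words both outputs share, and a list of
    (words_from_a, words_from_b) pairs for every span where they differ.
    """
    words_a, words_b = text_a.split(), text_b.split()
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    agreed, conflicts = [], []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag == "equal":
            agreed.extend(words_a[a0:a1])
        else:
            # 'replace', 'delete' or 'insert': the engines disagree here.
            conflicts.append((words_a[a0:a1], words_b[b0:b1]))
    return agreed, conflicts


# Hypothetical outputs of two OCR engines for the same line of text.
ocr_a = "Full texts produced by OCR prograrns"
ocr_b = "Full texts produced by 0CR programs"

agreed, conflicts = compare_words(ocr_a, ocr_b)
print(agreed)
print(conflicts)
```

Only the conflicting spans would then need a decision, for example a dictionary lookup or the output of a third engine; that step is outside this sketch.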

Similar documents (content)

  1. Viegener, J.; Maurer, A.: Ein Ansatz zur Dynamisierung von Thesauri in Informationssystemen (1993) 0.12
    0.117893726 = sum of:
      0.117893726 = product of:
        0.49122387 = sum of:
          0.047170967 = weight(abstract_txt:wird in 5590) [ClassicSimilarity], result of:
            0.047170967 = score(doc=5590,freq=3.0), product of:
              0.06599164 = queryWeight, product of:
                1.0111986 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.017295985 = queryNorm
              0.71480215 = fieldWeight in 5590, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.109375 = fieldNorm(doc=5590)
          0.13826518 = weight(abstract_txt:erzeugung in 5590) [ClassicSimilarity], result of:
            0.13826518 = score(doc=5590,freq=1.0), product of:
              0.15471937 = queryWeight, product of:
                1.0948367 = boost
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.017295985 = queryNorm
              0.89365137 = fieldWeight in 5590, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.109375 = fieldNorm(doc=5590)
          0.052216817 = weight(abstract_txt:dabei in 5590) [ClassicSimilarity], result of:
            0.052216817 = score(doc=5590,freq=1.0), product of:
              0.10184813 = queryWeight, product of:
                1.2562282 = boost
                4.687478 = idf(docFreq=1106, maxDocs=44218)
                0.017295985 = queryNorm
              0.5126929 = fieldWeight in 5590, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.687478 = idf(docFreq=1106, maxDocs=44218)
                0.109375 = fieldNorm(doc=5590)
          0.032780427 = weight(abstract_txt:werden in 5590) [ClassicSimilarity], result of:
            0.032780427 = score(doc=5590,freq=1.0), product of:
              0.08547773 = queryWeight, product of:
                1.4094969 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.017295985 = queryNorm
              0.3834967 = fieldWeight in 5590, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.109375 = fieldNorm(doc=5590)
          0.08344517 = weight(abstract_txt:vorgestellt in 5590) [ClassicSimilarity], result of:
            0.08344517 = score(doc=5590,freq=1.0), product of:
              0.13921316 = queryWeight, product of:
                1.4686968 = boost
                5.4802814 = idf(docFreq=500, maxDocs=44218)
                0.017295985 = queryNorm
              0.59940577 = fieldWeight in 5590, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4802814 = idf(docFreq=500, maxDocs=44218)
                0.109375 = fieldNorm(doc=5590)
          0.13734533 = weight(abstract_txt:durch in 5590) [ClassicSimilarity], result of:
            0.13734533 = score(doc=5590,freq=2.0), product of:
              0.20905413 = queryWeight, product of:
                2.8457148 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.017295985 = queryNorm
              0.6569845 = fieldWeight in 5590, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.109375 = fieldNorm(doc=5590)
        0.24 = coord(6/25)
    
  2. Maislin, S.: Cyborg indexing : half-human half-machine (2007) 0.12
    0.11640209 = sum of:
      0.11640209 = product of:
        0.7275131 = sum of:
          0.02723417 = weight(abstract_txt:wird in 738) [ClassicSimilarity], result of:
            0.02723417 = score(doc=738,freq=1.0), product of:
              0.06599164 = queryWeight, product of:
                1.0111986 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.017295985 = queryNorm
              0.41269124 = fieldWeight in 738, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.109375 = fieldNorm(doc=738)
          0.032780427 = weight(abstract_txt:werden in 738) [ClassicSimilarity], result of:
            0.032780427 = score(doc=738,freq=1.0), product of:
              0.08547773 = queryWeight, product of:
                1.4094969 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.017295985 = queryNorm
              0.3834967 = fieldWeight in 738, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.109375 = fieldNorm(doc=738)
          0.49928555 = weight(abstract_txt:fehlern in 738) [ClassicSimilarity], result of:
            0.49928555 = score(doc=738,freq=2.0), product of:
              0.36416832 = queryWeight, product of:
                2.3754346 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.017295985 = queryNorm
              1.3710296 = fieldWeight in 738, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.109375 = fieldNorm(doc=738)
          0.16821298 = weight(abstract_txt:durch in 738) [ClassicSimilarity], result of:
            0.16821298 = score(doc=738,freq=3.0), product of:
              0.20905413 = queryWeight, product of:
                2.8457148 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.017295985 = queryNorm
              0.8046384 = fieldWeight in 738, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.109375 = fieldNorm(doc=738)
        0.16 = coord(4/25)
    
  3. Miene, A.; Hermes, T.; Ioannidis, G.: Wie kommt das Bild in die Datenbank? : Inhaltsbasierte Analyse von Bildern und Videos (2002) 0.11
    0.111280315 = sum of:
      0.111280315 = product of:
        0.55640155 = sum of:
          0.01945298 = weight(abstract_txt:wird in 213) [ClassicSimilarity], result of:
            0.01945298 = score(doc=213,freq=1.0), product of:
              0.06599164 = queryWeight, product of:
                1.0111986 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.017295985 = queryNorm
              0.29477945 = fieldWeight in 213, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.078125 = fieldNorm(doc=213)
          0.13150589 = weight(abstract_txt:manuellen in 213) [ClassicSimilarity], result of:
            0.13150589 = score(doc=213,freq=1.0), product of:
              0.18726286 = queryWeight, product of:
                1.204489 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.017295985 = queryNorm
              0.7022529 = fieldWeight in 213, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.078125 = fieldNorm(doc=213)
          0.033113234 = weight(abstract_txt:werden in 213) [ClassicSimilarity], result of:
            0.033113234 = score(doc=213,freq=2.0), product of:
              0.08547773 = queryWeight, product of:
                1.4094969 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.017295985 = queryNorm
              0.3873902 = fieldWeight in 213, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.078125 = fieldNorm(doc=213)
          0.2521773 = weight(abstract_txt:fehlern in 213) [ClassicSimilarity], result of:
            0.2521773 = score(doc=213,freq=1.0), product of:
              0.36416832 = queryWeight, product of:
                2.3754346 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.017295985 = queryNorm
              0.69247454 = fieldWeight in 213, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.078125 = fieldNorm(doc=213)
          0.12015213 = weight(abstract_txt:durch in 213) [ClassicSimilarity], result of:
            0.12015213 = score(doc=213,freq=3.0), product of:
              0.20905413 = queryWeight, product of:
                2.8457148 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.017295985 = queryNorm
              0.5747417 = fieldWeight in 213, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.078125 = fieldNorm(doc=213)
        0.2 = coord(5/25)
    
  4. Gerick, T.: Content-based Information Retrieval auf Basis semantischer Abfragenetze : Kooperative Technologien am Beispiel der Dokumentenrecherche in GENIOS Wirtschaftsdatenbanken (1999) 0.11
    0.10872167 = sum of:
      0.10872167 = product of:
        0.45300698 = sum of:
          0.03890596 = weight(abstract_txt:wird in 3874) [ClassicSimilarity], result of:
            0.03890596 = score(doc=3874,freq=4.0), product of:
              0.06599164 = queryWeight, product of:
                1.0111986 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.017295985 = queryNorm
              0.5895589 = fieldWeight in 3874, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.078125 = fieldNorm(doc=3874)
          0.037297726 = weight(abstract_txt:dabei in 3874) [ClassicSimilarity], result of:
            0.037297726 = score(doc=3874,freq=1.0), product of:
              0.10184813 = queryWeight, product of:
                1.2562282 = boost
                4.687478 = idf(docFreq=1106, maxDocs=44218)
                0.017295985 = queryNorm
              0.3662092 = fieldWeight in 3874, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.687478 = idf(docFreq=1106, maxDocs=44218)
                0.078125 = fieldNorm(doc=3874)
          0.033113234 = weight(abstract_txt:werden in 3874) [ClassicSimilarity], result of:
            0.033113234 = score(doc=3874,freq=2.0), product of:
              0.08547773 = queryWeight, product of:
                1.4094969 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.017295985 = queryNorm
              0.3873902 = fieldWeight in 3874, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.078125 = fieldNorm(doc=3874)
          0.059603695 = weight(abstract_txt:vorgestellt in 3874) [ClassicSimilarity], result of:
            0.059603695 = score(doc=3874,freq=1.0), product of:
              0.13921316 = queryWeight, product of:
                1.4686968 = boost
                5.4802814 = idf(docFreq=500, maxDocs=44218)
                0.017295985 = queryNorm
              0.428147 = fieldWeight in 3874, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4802814 = idf(docFreq=500, maxDocs=44218)
                0.078125 = fieldNorm(doc=3874)
          0.21471651 = weight(abstract_txt:reduziert in 3874) [ClassicSimilarity], result of:
            0.21471651 = score(doc=3874,freq=1.0), product of:
              0.32714614 = queryWeight, product of:
                2.2514532 = boost
                8.401051 = idf(docFreq=26, maxDocs=44218)
                0.017295985 = queryNorm
              0.6563321 = fieldWeight in 3874, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.401051 = idf(docFreq=26, maxDocs=44218)
                0.078125 = fieldNorm(doc=3874)
          0.06936986 = weight(abstract_txt:durch in 3874) [ClassicSimilarity], result of:
            0.06936986 = score(doc=3874,freq=1.0), product of:
              0.20905413 = queryWeight, product of:
                2.8457148 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.017295985 = queryNorm
              0.33182728 = fieldWeight in 3874, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.078125 = fieldNorm(doc=3874)
        0.24 = coord(6/25)
    
  5. Mikro-Univers : 2. Workshop "Digitalisierung, Erschließung, Internetpräsentation und Langzeitarchivierung" (2004) 0.11
    0.10690093 = sum of:
      0.10690093 = product of:
        0.3340654 = sum of:
          0.06385618 = weight(abstract_txt:erzeugt in 3042) [ClassicSimilarity], result of:
            0.06385618 = score(doc=3042,freq=2.0), product of:
              0.12907614 = queryWeight, product of:
                7.462781 = idf(docFreq=68, maxDocs=44218)
                0.017295985 = queryNorm
              0.49471712 = fieldWeight in 3042, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.462781 = idf(docFreq=68, maxDocs=44218)
                0.046875 = fieldNorm(doc=3042)
          0.011671787 = weight(abstract_txt:wird in 3042) [ClassicSimilarity], result of:
            0.011671787 = score(doc=3042,freq=1.0), product of:
              0.06599164 = queryWeight, product of:
                1.0111986 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.017295985 = queryNorm
              0.17686766 = fieldWeight in 3042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.046875 = fieldNorm(doc=3042)
          0.079650216 = weight(abstract_txt:volltexte in 3042) [ClassicSimilarity], result of:
            0.079650216 = score(doc=3042,freq=2.0), product of:
              0.14956684 = queryWeight, product of:
                1.076452 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.017295985 = queryNorm
              0.53253925 = fieldWeight in 3042, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.046875 = fieldNorm(doc=3042)
          0.0592565 = weight(abstract_txt:erzeugung in 3042) [ClassicSimilarity], result of:
            0.0592565 = score(doc=3042,freq=1.0), product of:
              0.15471937 = queryWeight, product of:
                1.0948367 = boost
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.017295985 = queryNorm
              0.38299343 = fieldWeight in 3042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.046875 = fieldNorm(doc=3042)
          0.022378635 = weight(abstract_txt:dabei in 3042) [ClassicSimilarity], result of:
            0.022378635 = score(doc=3042,freq=1.0), product of:
              0.10184813 = queryWeight, product of:
                1.2562282 = boost
                4.687478 = idf(docFreq=1106, maxDocs=44218)
                0.017295985 = queryNorm
              0.21972553 = fieldWeight in 3042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.687478 = idf(docFreq=1106, maxDocs=44218)
                0.046875 = fieldNorm(doc=3042)
          0.01986794 = weight(abstract_txt:werden in 3042) [ClassicSimilarity], result of:
            0.01986794 = score(doc=3042,freq=2.0), product of:
              0.08547773 = queryWeight, product of:
                1.4094969 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.017295985 = queryNorm
              0.23243411 = fieldWeight in 3042, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.046875 = fieldNorm(doc=3042)
          0.035762217 = weight(abstract_txt:vorgestellt in 3042) [ClassicSimilarity], result of:
            0.035762217 = score(doc=3042,freq=1.0), product of:
              0.13921316 = queryWeight, product of:
                1.4686968 = boost
                5.4802814 = idf(docFreq=500, maxDocs=44218)
                0.017295985 = queryNorm
              0.25688818 = fieldWeight in 3042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4802814 = idf(docFreq=500, maxDocs=44218)
                0.046875 = fieldNorm(doc=3042)
          0.04162192 = weight(abstract_txt:durch in 3042) [ClassicSimilarity], result of:
            0.04162192 = score(doc=3042,freq=1.0), product of:
              0.20905413 = queryWeight, product of:
                2.8457148 = boost
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.017295985 = queryNorm
              0.19909638 = fieldWeight in 3042, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2473893 = idf(docFreq=1718, maxDocs=44218)
                0.046875 = fieldNorm(doc=3042)
        0.32 = coord(8/25)
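
The score breakdowns above follow the pattern of Lucene's ClassicSimilarity (TF-IDF): each matching term contributes queryWeight × fieldWeight, and the sum is multiplied by the coord factor. The sketch below recomputes the score of the first hit (doc 5590) from the values printed in its breakdown. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is consistent with the numbers shown; the function names are illustrative and not part of any Lucene API. Because Lucene works in 32-bit floats, the result agrees with the listing only to roughly six decimal places.

```python
import math

MAX_DOCS = 44218          # maxDocs from the listing
QUERY_NORM = 0.017295985  # queryNorm from the listing
QUERY_TERMS = 25          # denominator of coord(n/25)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as it appears in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """queryWeight * fieldWeight for one matching term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm):
    """Sum of the term scores, scaled by coord = matched terms / query terms."""
    total = sum(term_score(f, df, b, field_norm) for f, df, b in terms)
    return total * len(terms) / QUERY_TERMS


# (freq, docFreq, boost) for the six terms matched in doc 5590.
terms_5590 = [
    (3.0, 2761, 1.0111986),  # wird
    (1.0, 33, 1.0948367),    # erzeugung
    (1.0, 1106, 1.2562282),  # dabei
    (1.0, 3606, 1.4094969),  # werden
    (1.0, 500, 1.4686968),   # vorgestellt
    (2.0, 1718, 2.8457148),  # durch
]

score = document_score(terms_5590, field_norm=0.109375)
print(round(score, 4))  # close to the 0.117893726 shown for doc 5590
```

The same function should reproduce the other hits when given their term lists and fieldNorm values; a term printed without a boost line (such as "erzeugt" in doc 3042) corresponds to a boost of 1.0.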