Document (#37597)

Author
Geisriegler, E.
Title
Enriching electronic texts with semantic metadata : a use case for the historical Newspaper Collection ANNO (Austrian Newspapers Online) of the Austrian National Library
Imprint
Wien : Universität / ÖNB
Year
2012
Pages
345 pp.
Abstract
This master's thesis examines how historical newspapers can be enriched with semantic metadata. It also analyzes what benefit this enrichment with additional information sources brings, particularly for researchers in the humanities. After outlining the development of the interdisciplinary 'Digital Humanities', a use case was developed for the Austrian National Library's digital collection of historical newspapers (ANNO, AustriaN Newspapers Online), in which 'named entities' (persons, places, organizations, and dates) were manually annotated in selected newspaper issues. The encoding was carried out in 'TEI', a document format for encoding and exchanging texts. In addition, for all annotated named entities, entries were searched for in external databases such as Wikipedia, Wikipedia Personensuche, the former Personennamendatei and Schlagwortnormdatei (now the Gemeinsame Normdatei, GND), VIAF, and the Bildarchiv Austria, and linked where available. A description of the results of the manual annotation of the newspaper pages concludes this part of the thesis. A further section compares the results of the manual annotation with those generated automatically by the German NER (Named Entity Recognition) and analyzes their accuracy. Finally, the thesis presents several best-practice examples of encoded and enriched newspaper pages to show users the added benefit of marking up named entities and linking them to external information sources.
Footnote
Vienna, Univ., Library and Information Studies programme, master's thesis, 2012.
Theme
Zeitungen (newspapers)
Location
A

Similar documents (content)

  1. Brogiato, H.P.; Horn, K.: Der historische Bildbestand im Institut für Länderkunde Leipzig : Aufbau eines digitalen Langzeitarchivs (2003) 0.08
    0.08388322 = sum of:
      0.08388322 = product of:
        0.4194161 = sum of:
          0.112708405 = weight(abstract_txt:bildarchiv in 2325) [ClassicSimilarity], result of:
            0.112708405 = score(doc=2325,freq=1.0), product of:
              0.15571164 = queryWeight, product of:
                1.0346354 = boost
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.016243832 = queryNorm
              0.7238278 = fieldWeight in 2325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.078125 = fieldNorm(doc=2325)
          0.047308408 = weight(abstract_txt:ergebnisse in 2325) [ClassicSimilarity], result of:
            0.047308408 = score(doc=2325,freq=1.0), product of:
              0.10998136 = queryWeight, product of:
                1.2297062 = boost
                5.5059114 = idf(docFreq=471, maxDocs=42740)
                0.016243832 = queryNorm
              0.43014932 = fieldWeight in 2325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5059114 = idf(docFreq=471, maxDocs=42740)
                0.078125 = fieldNorm(doc=2325)
          0.05569015 = weight(abstract_txt:nutzen in 2325) [ClassicSimilarity], result of:
            0.05569015 = score(doc=2325,freq=1.0), product of:
              0.12261561 = queryWeight, product of:
                1.2984185 = boost
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.016243832 = queryNorm
              0.45418483 = fieldWeight in 2325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.078125 = fieldNorm(doc=2325)
          0.033062562 = weight(abstract_txt:durch in 2325) [ClassicSimilarity], result of:
            0.033062562 = score(doc=2325,freq=1.0), product of:
              0.09914746 = queryWeight, product of:
                1.4299743 = boost
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.016243832 = queryNorm
              0.33346856 = fieldWeight in 2325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.078125 = fieldNorm(doc=2325)
          0.17064658 = weight(abstract_txt:historischer in 2325) [ClassicSimilarity], result of:
            0.17064658 = score(doc=2325,freq=1.0), product of:
              0.25867745 = queryWeight, product of:
                1.8859106 = boost
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.016243832 = queryNorm
              0.65968865 = fieldWeight in 2325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.078125 = fieldNorm(doc=2325)
        0.2 = coord(5/25)
    
  2. Maas, H.-D.: Indexieren mit AUTINDEX (2006) 0.07
    0.073276326 = sum of:
      0.073276326 = product of:
        0.26170117 = sum of:
          0.05088201 = weight(abstract_txt:manuell in 1078) [ClassicSimilarity], result of:
            0.05088201 = score(doc=1078,freq=1.0), product of:
              0.14546093 = queryWeight, product of:
                8.954841 = idf(docFreq=14, maxDocs=42740)
                0.016243832 = queryNorm
              0.34979847 = fieldWeight in 1078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.954841 = idf(docFreq=14, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1078)
          0.027042182 = weight(abstract_txt:wurde in 1078) [ClassicSimilarity], result of:
            0.027042182 = score(doc=1078,freq=3.0), product of:
              0.083374694 = queryWeight, product of:
                1.0706781 = boost
                4.793876 = idf(docFreq=961, maxDocs=42740)
                0.016243832 = queryNorm
              0.3243452 = fieldWeight in 1078, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.793876 = idf(docFreq=961, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1078)
          0.023654204 = weight(abstract_txt:ergebnisse in 1078) [ClassicSimilarity], result of:
            0.023654204 = score(doc=1078,freq=1.0), product of:
              0.10998136 = queryWeight, product of:
                1.2297062 = boost
                5.5059114 = idf(docFreq=471, maxDocs=42740)
                0.016243832 = queryNorm
              0.21507466 = fieldWeight in 1078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5059114 = idf(docFreq=471, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1078)
          0.027845075 = weight(abstract_txt:nutzen in 1078) [ClassicSimilarity], result of:
            0.027845075 = score(doc=1078,freq=1.0), product of:
              0.12261561 = queryWeight, product of:
                1.2984185 = boost
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.016243832 = queryNorm
              0.22709242 = fieldWeight in 1078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1078)
          0.016531281 = weight(abstract_txt:durch in 1078) [ClassicSimilarity], result of:
            0.016531281 = score(doc=1078,freq=1.0), product of:
              0.09914746 = queryWeight, product of:
                1.4299743 = boost
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.016243832 = queryNorm
              0.16673428 = fieldWeight in 1078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1078)
          0.042496815 = weight(abstract_txt:wurden in 1078) [ClassicSimilarity], result of:
            0.042496815 = score(doc=1078,freq=2.0), product of:
              0.14767429 = queryWeight, product of:
                1.7451787 = boost
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.016243832 = queryNorm
              0.28777397 = fieldWeight in 1078, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1078)
          0.0732496 = weight(abstract_txt:zusätzlichen in 1078) [ClassicSimilarity], result of:
            0.0732496 = score(doc=1078,freq=1.0), product of:
              0.23365963 = queryWeight, product of:
                1.7923948 = boost
                8.025305 = idf(docFreq=37, maxDocs=42740)
                0.016243832 = queryNorm
              0.31348848 = fieldWeight in 1078, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.025305 = idf(docFreq=37, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1078)
        0.28 = coord(7/25)
    
  3. Stelzenmüller, C.: Mashups in Bibliotheken : Untersuchung der Verbreitung von Mashups auf Webseiten wissenschaftlicher Bibliotheken und Erstellung eines praktischen Beispiels (2008) 0.06
    0.06453092 = sum of:
      0.06453092 = product of:
        0.40331823 = sum of:
          0.06724714 = weight(abstract_txt:arbeit in 70) [ClassicSimilarity], result of:
            0.06724714 = score(doc=70,freq=1.0), product of:
              0.101639554 = queryWeight, product of:
                1.1821517 = boost
                5.2929897 = idf(docFreq=583, maxDocs=42740)
                0.016243832 = queryNorm
              0.6616237 = fieldWeight in 70, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2929897 = idf(docFreq=583, maxDocs=42740)
                0.125 = fieldNorm(doc=70)
          0.08910424 = weight(abstract_txt:nutzen in 70) [ClassicSimilarity], result of:
            0.08910424 = score(doc=70,freq=1.0), product of:
              0.12261561 = queryWeight, product of:
                1.2984185 = boost
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.016243832 = queryNorm
              0.7266957 = fieldWeight in 70, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.125 = fieldNorm(doc=70)
          0.052900095 = weight(abstract_txt:durch in 70) [ClassicSimilarity], result of:
            0.052900095 = score(doc=70,freq=1.0), product of:
              0.09914746 = queryWeight, product of:
                1.4299743 = boost
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.016243832 = queryNorm
              0.53354967 = fieldWeight in 70, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.125 = fieldNorm(doc=70)
          0.19406676 = weight(abstract_txt:informationsquellen in 70) [ClassicSimilarity], result of:
            0.19406676 = score(doc=70,freq=1.0), product of:
              0.20602234 = queryWeight, product of:
                1.6830575 = boost
                7.535756 = idf(docFreq=61, maxDocs=42740)
                0.016243832 = queryNorm
              0.9419695 = fieldWeight in 70, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.535756 = idf(docFreq=61, maxDocs=42740)
                0.125 = fieldNorm(doc=70)
        0.16 = coord(4/25)
    
  4. Scholz, D.: Retrokonversion in der Zentralbibliothek der Universitätsbibliothek Dortmund : Abschlussbericht November 1995 bis April 2003 (2003) 0.06
    0.063316084 = sum of:
      0.063316084 = product of:
        0.3165804 = sum of:
          0.041893568 = weight(abstract_txt:wurde in 2942) [ClassicSimilarity], result of:
            0.041893568 = score(doc=2942,freq=5.0), product of:
              0.083374694 = queryWeight, product of:
                1.0706781 = boost
                4.793876 = idf(docFreq=961, maxDocs=42740)
                0.016243832 = queryNorm
              0.5024734 = fieldWeight in 2942, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.793876 = idf(docFreq=961, maxDocs=42740)
                0.046875 = fieldNorm(doc=2942)
          0.050435353 = weight(abstract_txt:arbeit in 2942) [ClassicSimilarity], result of:
            0.050435353 = score(doc=2942,freq=4.0), product of:
              0.101639554 = queryWeight, product of:
                1.1821517 = boost
                5.2929897 = idf(docFreq=583, maxDocs=42740)
                0.016243832 = queryNorm
              0.4962178 = fieldWeight in 2942, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.2929897 = idf(docFreq=583, maxDocs=42740)
                0.046875 = fieldNorm(doc=2942)
          0.034359615 = weight(abstract_txt:durch in 2942) [ClassicSimilarity], result of:
            0.034359615 = score(doc=2942,freq=3.0), product of:
              0.09914746 = queryWeight, product of:
                1.4299743 = boost
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.016243832 = queryNorm
              0.34655064 = fieldWeight in 2942, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.046875 = fieldNorm(doc=2942)
          0.10199237 = weight(abstract_txt:wurden in 2942) [ClassicSimilarity], result of:
            0.10199237 = score(doc=2942,freq=8.0), product of:
              0.14767429 = queryWeight, product of:
                1.7451787 = boost
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.016243832 = queryNorm
              0.69065756 = fieldWeight in 2942, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.046875 = fieldNorm(doc=2942)
          0.08789952 = weight(abstract_txt:zusätzlichen in 2942) [ClassicSimilarity], result of:
            0.08789952 = score(doc=2942,freq=1.0), product of:
              0.23365963 = queryWeight, product of:
                1.7923948 = boost
                8.025305 = idf(docFreq=37, maxDocs=42740)
                0.016243832 = queryNorm
              0.37618616 = fieldWeight in 2942, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.025305 = idf(docFreq=37, maxDocs=42740)
                0.046875 = fieldNorm(doc=2942)
        0.2 = coord(5/25)
    
  5. Kugler, A.: Automatisierte Volltexterschließung von Retrodigitalisaten am Beispiel historischer Zeitungen (2018) 0.06
    0.0624427 = sum of:
      0.0624427 = product of:
        0.3902669 = sum of:
          0.031225622 = weight(abstract_txt:wurde in 596) [ClassicSimilarity], result of:
            0.031225622 = score(doc=596,freq=1.0), product of:
              0.083374694 = queryWeight, product of:
                1.0706781 = boost
                4.793876 = idf(docFreq=961, maxDocs=42740)
                0.016243832 = queryNorm
              0.37452158 = fieldWeight in 596, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.793876 = idf(docFreq=961, maxDocs=42740)
                0.078125 = fieldNorm(doc=596)
          0.047308408 = weight(abstract_txt:ergebnisse in 596) [ClassicSimilarity], result of:
            0.047308408 = score(doc=596,freq=1.0), product of:
              0.10998136 = queryWeight, product of:
                1.2297062 = boost
                5.5059114 = idf(docFreq=471, maxDocs=42740)
                0.016243832 = queryNorm
              0.43014932 = fieldWeight in 596, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5059114 = idf(docFreq=471, maxDocs=42740)
                0.078125 = fieldNorm(doc=596)
          0.14108628 = weight(abstract_txt:zeitungen in 596) [ClassicSimilarity], result of:
            0.14108628 = score(doc=596,freq=1.0), product of:
              0.227868 = queryWeight, product of:
                1.7700417 = boost
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.016243832 = queryNorm
              0.6191579 = fieldWeight in 596, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.078125 = fieldNorm(doc=596)
          0.17064658 = weight(abstract_txt:historischer in 596) [ClassicSimilarity], result of:
            0.17064658 = score(doc=596,freq=1.0), product of:
              0.25867745 = queryWeight, product of:
                1.8859106 = boost
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.016243832 = queryNorm
              0.65968865 = fieldWeight in 596, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.444015 = idf(docFreq=24, maxDocs=42740)
                0.078125 = fieldNorm(doc=596)
        0.16 = coord(4/25)
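
The score breakdowns above follow the TF-IDF pattern that the listing labels ClassicSimilarity: each matching term contributes queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), and the sum is scaled by coord, the share of query terms that matched. The sketch below recomputes the first hit (doc 2325) from the numbers in the listing. The function and variable names are hypothetical and chosen only for illustration; tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are inferred from the listing and agree with the values it prints (for example 1.7320508 = tf(freq=3.0)).

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency, matching the idf(...) lines in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, idf_value, query_norm, field_norm, boost=1.0):
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf_value * query_norm
    field_weight = math.sqrt(freq) * idf_value * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def document_score(term_scores, total_query_terms):
    """Sum of the term scores, scaled by coord = matched terms / query terms."""
    coord = len(term_scores) / total_query_terms
    return coord * sum(term_scores)


# Values copied from the listing for doc 2325.
QUERY_NORM = 0.016243832
FIELD_NORM_2325 = 0.078125

# (term, boost, idf); every term occurs once in the abstract (freq = 1.0).
TERMS_2325 = [
    ("bildarchiv", 1.0346354, 9.264996),
    ("ergebnisse", 1.2297062, 5.5059114),
    ("nutzen", 1.2984185, 5.8135657),
    ("durch", 1.4299743, 4.2683973),
    ("historischer", 1.8859106, 8.444015),
]

scores_2325 = [
    term_score(1.0, idf_value, QUERY_NORM, FIELD_NORM_2325, boost)
    for _, boost, idf_value in TERMS_2325
]
total_2325 = document_score(scores_2325, total_query_terms=25)  # coord(5/25)

print(round(total_2325, 6))
```

Up to floating-point rounding, the result should agree with the 0.08388322 reported for the first hit. The same functions reproduce the other hits when their boost, idf, freq and fieldNorm values are substituted; a term listed without a boost line (such as "manuell" in doc 1078) corresponds to boost=1.0.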