Document (#40468)

Author
Brantl, M.
Ceynowa, K.
Meiers, T.
Wolf, T.
Title
Visuelle Suche in historischen Werken
Source
Datenbank Spektrum. 17(2017) no.1, pp.53-60
Year
2017
Abstract
With holdings of just under 11 million volumes, the Bayerische Staatsbibliothek (BSB) is one of the most important universal libraries in the world. 1.2 million works have already been digitized, which makes the BSB the largest digital cultural institution in Germany. This digital collection consists mainly of public-domain works from the 8th to the 20th century, from medieval Bible manuscripts to the tabloid newspapers of the 1920s. The diversity of the written cultural heritage to be digitized and the rapid pace of mass digitization in recent years have come at a price: subject indexing of the works lags behind, especially for works that cannot be automatically converted into machine-readable form and made accessible by means of optical character recognition (OCR). This applies in particular to medieval manuscripts, early printed books, and special collections. As a result, the rich stock of images contained in these works remained largely inaccessible to users and could be discovered only by paging through the works on screen. This motivated the Bayerische Staatsbibliothek, together with the Fraunhofer Heinrich-Hertz-Institut in Berlin, to build a system for similarity-based image search that automatically identifies all image content in all 1.2 million digitized works. Morphological methods are used to extract images from the book pages; the images are then classified on the basis of color and edge features. Images "without informational value" are filtered out using machine learning methods. In this way, more than 43 million individual images have so far been identified in the BSB's digitized works. They are directly available to users through a high-performance search engine via a freely accessible web application.
Thanks to the diversity and richness of the indexed holdings, this service appeals not only to historians and book scholars but also to interested users from a wide range of disciplines. The similarity search reveals unknown, unusual, and often surprising connections between very different works.
Content
Cf.: https://dx.doi.org/10.1007/s13222-017-0250-0.
Theme
Visualization
Field
History
Form
Images

Similar documents (author)

  1. Altenhöner, R.; Brantl, M.; Ceynowa, K.: Digitale Langzeitarchivierung in Deutschland : Projekte und Perspektiven (2011) 2.97
    2.969487 = sum of:
      2.969487 = product of:
        4.4542303 = sum of:
          2.0376148 = weight(author_txt:ceynowa in 151) [ClassicSimilarity], result of:
            2.0376148 = score(doc=151,freq=1.0), product of:
              0.5898449 = queryWeight, product of:
                1.1274648 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.056791298 = queryNorm
              3.4544928 = fieldWeight in 151, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.375 = fieldNorm(doc=151)
          2.4166152 = weight(author_txt:brantl in 151) [ClassicSimilarity], result of:
            2.4166152 = score(doc=151,freq=1.0), product of:
              0.6608883 = queryWeight, product of:
                1.1934332 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.056791298 = queryNorm
              3.6566167 = fieldWeight in 151, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.375 = fieldNorm(doc=151)
        0.6666667 = coord(2/3)
    
  2. Brantl, M.; Ceynowa, K.; Fabian, C.; Messmer, G.; Schäfer, I.: Massendigitalisierung deutscher Drucke des 16. Jahrhunderts : ein Erfahrungsbericht der Bayerischen Staatsbibliothek (2009) 2.47
    2.4745722 = sum of:
      2.4745722 = product of:
        3.7118583 = sum of:
          1.6980125 = weight(author_txt:ceynowa in 188) [ClassicSimilarity], result of:
            1.6980125 = score(doc=188,freq=1.0), product of:
              0.5898449 = queryWeight, product of:
                1.1274648 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.056791298 = queryNorm
              2.8787441 = fieldWeight in 188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.3125 = fieldNorm(doc=188)
          2.013846 = weight(author_txt:brantl in 188) [ClassicSimilarity], result of:
            2.013846 = score(doc=188,freq=1.0), product of:
              0.6608883 = queryWeight, product of:
                1.1934332 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.056791298 = queryNorm
              3.0471804 = fieldWeight in 188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.3125 = fieldNorm(doc=188)
        0.6666667 = coord(2/3)
    
  3. Kempf, K.; Brantl, M.; Meiers, T.; Wolf, T.: Auf der Suche nach dem verborgenen Bild : Künstliche Intelligenz erschließt historische Bibliotheksbestände (2021) 2.13
    2.1324067 = sum of:
      2.1324067 = product of:
        3.1986098 = sum of:
          1.184764 = weight(author_txt:wolf in 147) [ClassicSimilarity], result of:
            1.184764 = score(doc=147,freq=1.0), product of:
              0.4640148 = queryWeight, product of:
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.056791298 = queryNorm
              2.5532894 = fieldWeight in 147, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.3125 = fieldNorm(doc=147)
          2.013846 = weight(author_txt:brantl in 147) [ClassicSimilarity], result of:
            2.013846 = score(doc=147,freq=1.0), product of:
              0.6608883 = queryWeight, product of:
                1.1934332 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.056791298 = queryNorm
              3.0471804 = fieldWeight in 147, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.3125 = fieldNorm(doc=147)
        0.6666667 = coord(2/3)
    
  4. Kempf, K.; Brantl, M.; Meiers, T.; Wolf, T.: Auf der Suche nach dem verborgenen Bild : Künstliche Intelligenz erschließt historische Bibliotheksbestände (2021) 2.13
    2.1324067 = sum of:
      2.1324067 = product of:
        3.1986098 = sum of:
          1.184764 = weight(author_txt:wolf in 218) [ClassicSimilarity], result of:
            1.184764 = score(doc=218,freq=1.0), product of:
              0.4640148 = queryWeight, product of:
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.056791298 = queryNorm
              2.5532894 = fieldWeight in 218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.3125 = fieldNorm(doc=218)
          2.013846 = weight(author_txt:brantl in 218) [ClassicSimilarity], result of:
            2.013846 = score(doc=218,freq=1.0), product of:
              0.6608883 = queryWeight, product of:
                1.1934332 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.056791298 = queryNorm
              3.0471804 = fieldWeight in 218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.3125 = fieldNorm(doc=218)
        0.6666667 = coord(2/3)
    
  5. Kempf, K.; Brantl, M.; Meiers, T.; Wolf, T.: Auf der Suche nach dem verborgenen Bild : Künstliche Intelligenz erschließt historische Bibliotheksbestände (2021) 2.13
    2.1324067 = sum of:
      2.1324067 = product of:
        3.1986098 = sum of:
          1.184764 = weight(author_txt:wolf in 224) [ClassicSimilarity], result of:
            1.184764 = score(doc=224,freq=1.0), product of:
              0.4640148 = queryWeight, product of:
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.056791298 = queryNorm
              2.5532894 = fieldWeight in 224, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.1705265 = idf(docFreq=33, maxDocs=44218)
                0.3125 = fieldNorm(doc=224)
          2.013846 = weight(author_txt:brantl in 224) [ClassicSimilarity], result of:
            2.013846 = score(doc=224,freq=1.0), product of:
              0.6608883 = queryWeight, product of:
                1.1934332 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.056791298 = queryNorm
              3.0471804 = fieldWeight in 224, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.3125 = fieldNorm(doc=224)
        0.6666667 = coord(2/3)
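
The score trees above follow the TF-IDF formula of Lucene's ClassicSimilarity, which the explain output names directly. As a rough sketch, the first author match (doc=151) can be recomputed from the values printed in its tree. The function and variable names below are illustrative assumptions, not part of the catalog software, and the result matches the listed 2.969487 only up to float rounding.

```python
import math

def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight

# Values copied from the explain output for doc=151
MAX_DOCS = 44218
QUERY_NORM = 0.056791298
FIELD_NORM = 0.375

ceynowa = term_score(1.0, 11, MAX_DOCS, 1.1274648, QUERY_NORM, FIELD_NORM)
brantl = term_score(1.0, 6, MAX_DOCS, 1.1934332, QUERY_NORM, FIELD_NORM)

# coord(2/3): two of the three author query terms matched this record
coord = 2 / 3
author_score = (ceynowa + brantl) * coord
print(f"{author_score:.4f}")
```

The rarer surname (docFreq=6) contributes more than the more common one (docFreq=11) because idf enters the product twice, once in queryWeight and once in fieldWeight.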
    

Similar documents (content)

  1. Brumm, A.: Modellierung eines Informationssystems zum Bühnentanz als semantisches Wiki (2010) 0.11
    0.107578695 = sum of:
      0.107578695 = product of:
        0.67236686 = sum of:
          0.07703574 = weight(abstract_txt:insbesondere in 4025) [ClassicSimilarity], result of:
            0.07703574 = score(doc=4025,freq=2.0), product of:
              0.10394527 = queryWeight, product of:
                1.2022517 = boost
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.015467071 = queryNorm
              0.74111825 = fieldWeight in 4025, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.09375 = fieldNorm(doc=4025)
          0.04032974 = weight(abstract_txt:werden in 4025) [ClassicSimilarity], result of:
            0.04032974 = score(doc=4025,freq=1.0), product of:
              0.1226904 = queryWeight, product of:
                2.2623456 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.015467071 = queryNorm
              0.32871145 = fieldWeight in 4025, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.09375 = fieldNorm(doc=4025)
          0.17520818 = weight(abstract_txt:werke in 4025) [ClassicSimilarity], result of:
            0.17520818 = score(doc=4025,freq=1.0), product of:
              0.25927058 = queryWeight, product of:
                2.325494 = boost
                7.208251 = idf(docFreq=88, maxDocs=44218)
                0.015467071 = queryNorm
              0.6757735 = fieldWeight in 4025, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.208251 = idf(docFreq=88, maxDocs=44218)
                0.09375 = fieldNorm(doc=4025)
          0.37979323 = weight(abstract_txt:werken in 4025) [ClassicSimilarity], result of:
            0.37979323 = score(doc=4025,freq=1.0), product of:
              0.5148705 = queryWeight, product of:
                4.230698 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.015467071 = queryNorm
              0.737648 = fieldWeight in 4025, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.09375 = fieldNorm(doc=4025)
        0.16 = coord(4/25)
    
  2. Hauer, M.: Zur Bedeutung normierter Terminologien in Zeiten moderner Sprach- und Information-Retrieval-Technologien (2013) 0.10
    0.10190295 = sum of:
      0.10190295 = product of:
        0.63689345 = sum of:
          0.11618132 = weight(abstract_txt:mittels in 995) [ClassicSimilarity], result of:
            0.11618132 = score(doc=995,freq=1.0), product of:
              0.22263597 = queryWeight, product of:
                2.1549456 = boost
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.015467071 = queryNorm
              0.5218443 = fieldWeight in 995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.078125 = fieldNorm(doc=995)
          0.05821097 = weight(abstract_txt:werden in 995) [ClassicSimilarity], result of:
            0.05821097 = score(doc=995,freq=3.0), product of:
              0.1226904 = queryWeight, product of:
                2.2623456 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.015467071 = queryNorm
              0.47445413 = fieldWeight in 995, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.078125 = fieldNorm(doc=995)
          0.14600684 = weight(abstract_txt:werke in 995) [ClassicSimilarity], result of:
            0.14600684 = score(doc=995,freq=1.0), product of:
              0.25927058 = queryWeight, product of:
                2.325494 = boost
                7.208251 = idf(docFreq=88, maxDocs=44218)
                0.015467071 = queryNorm
              0.5631446 = fieldWeight in 995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.208251 = idf(docFreq=88, maxDocs=44218)
                0.078125 = fieldNorm(doc=995)
          0.31649435 = weight(abstract_txt:werken in 995) [ClassicSimilarity], result of:
            0.31649435 = score(doc=995,freq=1.0), product of:
              0.5148705 = queryWeight, product of:
                4.230698 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.015467071 = queryNorm
              0.6147067 = fieldWeight in 995, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.078125 = fieldNorm(doc=995)
        0.16 = coord(4/25)
    
  3. Vorndran, A.: Hervorholen, was in unseren Daten steckt! : Mehrwerte durch Analysen großer Bibliotheksdatenbestände (2018) 0.09
    0.089289166 = sum of:
      0.089289166 = product of:
        0.5580573 = sum of:
          0.058165442 = weight(abstract_txt:dies in 4601) [ClassicSimilarity], result of:
            0.058165442 = score(doc=4601,freq=2.0), product of:
              0.097328655 = queryWeight, product of:
                1.163358 = boost
                5.4090285 = idf(docFreq=537, maxDocs=44218)
                0.015467071 = queryNorm
              0.5976189 = fieldWeight in 4601, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4090285 = idf(docFreq=537, maxDocs=44218)
                0.078125 = fieldNorm(doc=4601)
          0.11618132 = weight(abstract_txt:mittels in 4601) [ClassicSimilarity], result of:
            0.11618132 = score(doc=4601,freq=1.0), product of:
              0.22263597 = queryWeight, product of:
                2.1549456 = boost
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.015467071 = queryNorm
              0.5218443 = fieldWeight in 4601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.078125 = fieldNorm(doc=4601)
          0.06721624 = weight(abstract_txt:werden in 4601) [ClassicSimilarity], result of:
            0.06721624 = score(doc=4601,freq=4.0), product of:
              0.1226904 = queryWeight, product of:
                2.2623456 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.015467071 = queryNorm
              0.54785246 = fieldWeight in 4601, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.078125 = fieldNorm(doc=4601)
          0.31649435 = weight(abstract_txt:werken in 4601) [ClassicSimilarity], result of:
            0.31649435 = score(doc=4601,freq=1.0), product of:
              0.5148705 = queryWeight, product of:
                4.230698 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.015467071 = queryNorm
              0.6147067 = fieldWeight in 4601, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.078125 = fieldNorm(doc=4601)
        0.16 = coord(4/25)
    
  4. Google erweitert Buchsuche um Staatsbibliothek (2006) 0.09
    0.08709178 = sum of:
      0.08709178 = product of:
        0.7257649 = sum of:
          0.20138857 = weight(abstract_txt:staatsbibliothek in 159) [ClassicSimilarity], result of:
            0.20138857 = score(doc=159,freq=1.0), product of:
              0.17679796 = queryWeight, product of:
                1.567948 = boost
                7.290168 = idf(docFreq=81, maxDocs=44218)
                0.015467071 = queryNorm
              1.1390887 = fieldWeight in 159, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.290168 = idf(docFreq=81, maxDocs=44218)
                0.15625 = fieldNorm(doc=159)
          0.23236264 = weight(abstract_txt:mittels in 159) [ClassicSimilarity], result of:
            0.23236264 = score(doc=159,freq=1.0), product of:
              0.22263597 = queryWeight, product of:
                2.1549456 = boost
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.015467071 = queryNorm
              1.0436887 = fieldWeight in 159, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.15625 = fieldNorm(doc=159)
          0.29201367 = weight(abstract_txt:werke in 159) [ClassicSimilarity], result of:
            0.29201367 = score(doc=159,freq=1.0), product of:
              0.25927058 = queryWeight, product of:
                2.325494 = boost
                7.208251 = idf(docFreq=88, maxDocs=44218)
                0.015467071 = queryNorm
              1.1262892 = fieldWeight in 159, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.208251 = idf(docFreq=88, maxDocs=44218)
                0.15625 = fieldNorm(doc=159)
        0.12 = coord(3/25)
    
  5. Goldmann, M.: Alles rund um Briefmarken und Postgeschichte : die Philatelistische Bibliothek Hamburg bietet hochwertige Auskünfte - mehr als 20.000 Medien im Bestand (2013) 0.09
    0.08661329 = sum of:
      0.08661329 = product of:
        0.72177744 = sum of:
          0.06580669 = weight(abstract_txt:dies in 623) [ClassicSimilarity], result of:
            0.06580669 = score(doc=623,freq=1.0), product of:
              0.097328655 = queryWeight, product of:
                1.163358 = boost
                5.4090285 = idf(docFreq=537, maxDocs=44218)
                0.015467071 = queryNorm
              0.67612857 = fieldWeight in 623, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4090285 = idf(docFreq=537, maxDocs=44218)
                0.125 = fieldNorm(doc=623)
          0.14957973 = weight(abstract_txt:bestand in 623) [ClassicSimilarity], result of:
            0.14957973 = score(doc=623,freq=1.0), product of:
              0.16825807 = queryWeight, product of:
                1.5296109 = boost
                7.11192 = idf(docFreq=97, maxDocs=44218)
                0.015467071 = queryNorm
              0.88899 = fieldWeight in 623, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.11192 = idf(docFreq=97, maxDocs=44218)
                0.125 = fieldNorm(doc=623)
          0.506391 = weight(abstract_txt:werken in 623) [ClassicSimilarity], result of:
            0.506391 = score(doc=623,freq=1.0), product of:
              0.5148705 = queryWeight, product of:
                4.230698 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.015467071 = queryNorm
              0.9835307 = fieldWeight in 623, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.125 = fieldNorm(doc=623)
        0.12 = coord(3/25)
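
The content-based scores are much lower than the author-based ones mainly because of the coord factor: only a few of the 25 abstract query terms match each record. A minimal sketch of this effect for the first content match (doc=4025) follows, again using the values printed in its explain tree. The helper name and data layout are assumptions made for illustration.

```python
import math

# Values copied from the explain output for doc=4025
MAX_DOCS = 44218
QUERY_NORM = 0.015467071
FIELD_NORM = 0.09375
QUERY_TERMS = 25  # denominator of coord(4/25)

# (term, freq, docFreq, boost) for each matching abstract term
matches = [
    ("insbesondere", 2.0, 448, 1.2022517),
    ("werden", 1.0, 3606, 2.2623456),
    ("werke", 1.0, 88, 2.325494),
    ("werken", 1.0, 45, 4.230698),
]

def classic_term_score(freq, doc_freq, boost):
    # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight

per_term = {term: classic_term_score(f, df, b) for term, f, df, b in matches}

# Only 4 of 25 query terms matched, so the summed score is scaled by 0.16
coord = len(matches) / QUERY_TERMS
content_score = sum(per_term.values()) * coord

for term, value in per_term.items():
    print(f"{term}: {value:.4f}")
print(f"total: {content_score:.4f}")
```

Most of the matched terms are common German function or near-function words ("werden", "insbesondere", "dies"), which is one reason the content matches look topically weaker than the author matches.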