Document (#40469)

Author
Brantl, M.
Ceynowa, K.
Meiers, T.
Wolf, T.
Title
Visuelle Suche in historischen Werken [Visual search in historical works]
Source
Datenbank-Spektrum 17(2017) no. 1, pp. 53-60
Year
2017
Abstract
With holdings of just under 11 million volumes, the Bayerische Staatsbibliothek (Bavarian State Library, BSB) is one of the most important universal libraries in the world. It has already digitized 1.2 million works, which makes it the largest digital cultural institution in Germany. Most of this digital collection consists of copyright-free works from the 8th to the 20th century, ranging from medieval Bible manuscripts to the tabloid newspapers of the 1920s. Two factors have come at a price: the great variety of the written cultural heritage being digitized, and the high pace of mass digitization in recent years. Subject indexing of the works lags behind, especially for works that cannot be converted automatically into machine-readable text by optical character recognition (OCR). This applies in particular to medieval manuscripts, early printed books and special collections. As a result, the rich stock of images hidden in these works remained largely inaccessible to users and could only be discovered by paging through the scans on screen. This motivated the Bavarian State Library to build a system for similarity-based image search together with the Fraunhofer Heinrich Hertz Institute in Berlin. The system automatically identifies all image content in all 1.2 million digitized works. Images are first extracted from the book pages using morphological methods and are then classified on the basis of color and edge features. Images "without informational value" are filtered out using machine-learning methods. In this way, more than 43 million individual images have so far been identified in the BSB's digitized works. A high-performance search engine makes them directly available to users through a freely accessible web application.
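The abstract says only that images are extracted from the page scans "using morphological methods". The following minimal sketch illustrates one such approach under explicit assumptions. It thresholds a grayscale page, dilates the ink mask with a square structuring element so that nearby dark pixels merge into blobs, and reports the bounding boxes of connected regions above a minimum size. The function names and parameters (`dilate`, `connected_boxes`, `extract_image_regions`, `threshold`, `radius`, `min_area`) are hypothetical, and this is not the BSB/Fraunhofer implementation. In particular, the sketch does not tell text blocks apart from illustrations, which a real system would have to do.

```python
from collections import deque


def dilate(mask, radius):
    """Binary dilation of a boolean mask with a (2*radius+1) x (2*radius+1) square."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            # Switch on every pixel under the structuring element centred at (y, x).
            for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                    out[ny][nx] = True
    return out


def connected_boxes(mask):
    """Bounding boxes (top, left, bottom, right) of 4-connected True regions."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill of one region, tracking its extent.
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def extract_image_regions(gray, threshold=128, radius=1, min_area=16):
    """Candidate picture regions on a grayscale page (0 = black ink, 255 = white paper).

    Boxes include the margin added by dilation; regions whose box area is
    below min_area are discarded as specks.
    """
    ink = [[value < threshold for value in row] for row in gray]
    merged = dilate(ink, radius)
    regions = []
    for top, left, bottom, right in connected_boxes(merged):
        if (bottom - top + 1) * (right - left + 1) >= min_area:
            regions.append((top, left, bottom, right))
    return regions
```

Dilation before labelling joins strokes and hatching that belong to one woodcut or miniature into a single region. The `min_area` cut-off then discards isolated specks such as dust or foxing. Both parameters would have to be tuned on real scans.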
Because the indexed holdings are so varied and rich, the service appeals not only to historians and book scholars but also to users from a wide range of disciplines. The similarity search reveals unknown, unusual and often surprising connections between very different works.
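The abstract does not specify which color and edge features, classifiers or search engine the system uses. The following minimal sketch shows the general idea of similarity-based image search under assumed features. Color is described by a coarse joint RGB histogram. Edges are described by the fraction of pixels whose luminance gradient exceeds a threshold. Images are then ranked by Euclidean distance to the query. All names (`color_histogram`, `edge_density`, `features`, `search`) and the image representation (a list of rows of `(r, g, b)` tuples) are hypothetical.

```python
import math


def color_histogram(img, bins=2):
    """Normalised joint RGB histogram with `bins` levels per channel."""
    hist = [0.0] * (bins ** 3)
    n = 0
    for row in img:
        for r, g, b in row:
            # Map each 0-255 channel value to a bin index in [0, bins).
            rb, gb, bb = r * bins // 256, g * bins // 256, b * bins // 256
            hist[(rb * bins + gb) * bins + bb] += 1
            n += 1
    return [count / n for count in hist]


def edge_density(img, threshold=64):
    """Fraction of pixels whose luminance gradient (forward differences) exceeds threshold."""
    lum = [[(r * 299 + g * 587 + b * 114) // 1000 for r, g, b in row] for row in img]
    h, w = len(lum), len(lum[0])
    edges = 0
    for y in range(h - 1):
        for x in range(w - 1):
            gx = lum[y][x + 1] - lum[y][x]
            gy = lum[y + 1][x] - lum[y][x]
            if math.hypot(gx, gy) > threshold:
                edges += 1
    return edges / max(1, (h - 1) * (w - 1))


def features(img):
    """Concatenate color and edge descriptors into one feature vector."""
    return color_histogram(img) + [edge_density(img)]


def search(query_img, index, k=3):
    """Return the ids of the k indexed images closest to the query (Euclidean distance)."""
    q = features(query_img)
    ranked = sorted(index.items(), key=lambda item: math.dist(q, item[1]))
    return [image_id for image_id, _ in ranked[:k]]
```

Here `index` maps an image id to its precomputed feature vector and would be built offline by calling `features` on every extracted image. The brute-force `sorted` scan is linear in the collection size and only shows the ranking principle. A collection of more than 43 million images would likely need an approximate nearest-neighbor index instead.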
Content
See: https://dx.doi.org/10.1007/s13222-017-0250-0.
Theme
Visualization
Field
History
Form
Images

Similar documents (author)

  1. Altenhöner, R.; Brantl, M.; Ceynowa, K.: Digitale Langzeitarchivierung in Deutschland : Projekte und Perspektiven (2011) 2.95
  2. Brantl, M.; Ceynowa, K.; Fabian, C.; Messmer, G.; Schäfer, I.: Massendigitalisierung deutscher Drucke des 16. Jahrhunderts : ein Erfahrungsbericht der Bayerischen Staatsbibliothek (2009) 2.46
  3. Kempf, K.; Brantl, M.; Meiers, T.; Wolf, T.: Auf der Suche nach dem verborgenen Bild : Künstliche Intelligenz erschließt historische Bibliotheksbestände (2021) 2.14

Similar documents (content)

  1. Brumm, A.: Modellierung eines Informationssystems zum Bühnentanz als semantisches Wiki (2010) 0.11
  2. Hauer, M.: Zur Bedeutung normierter Terminologien in Zeiten moderner Sprach- und Information-Retrieval-Technologien (2013) 0.10
  3. Vorndran, A.: Hervorholen, was in unseren Daten steckt! : Mehrwerte durch Analysen großer Bibliotheksdatenbestände (2018) 0.09
  4. Google erweitert Buchsuche um Staatsbibliothek (2006) 0.09
  5. Goldmann, M.: Alles rund um Briefmarken und Postgeschichte : die Philatelistische Bibliothek Hamburg bietet hochwertige Auskünfte - mehr als 20.000 Medien im Bestand (2013) 0.09