Document (#40469)

Author
Brantl, M.
Ceynowa, K.
Meiers, T.
Wolf, T.
Title
Visuelle Suche in historischen Werken
Source
Datenbank Spektrum. 17(2017) no.1, pp.53-60
Year
2017
Abstract
The Bayerische Staatsbibliothek (BSB, Bavarian State Library), with holdings of just under 11 million volumes, ranks among the most important universal libraries in the world. 1.2 million works have already been digitized, which makes the BSB the largest digital cultural institution in Germany. This digital collection consists predominantly of public-domain works from the 8th to the 20th century, from medieval Bible manuscripts to the tabloid newspapers of the 1920s. The diversity of the written cultural heritage to be digitized and the rapid pace of mass digitization in recent years come at a price: subject indexing of the works lags behind, especially for works that cannot be automatically converted into machine-readable form and made accessible by means of optical character recognition (OCR). This applies in particular to medieval manuscripts, early printed books, and special collections. As a result, the rich stock of images contained in these works remained largely hidden from users and could be discovered only by paging through the works on screen. This motivated the Bayerische Staatsbibliothek, together with the Fraunhofer Heinrich-Hertz-Institut in Berlin, to build a system for similarity-based image search that automatically identifies all image content in all 1.2 million digitized items. Morphological methods are used to extract images from the book pages; the images are then classified on the basis of color and edge features. Images "without informational value" are filtered out using machine learning methods. In this way, more than 43 million individual images have so far been identified in the BSB's digitized works. They are directly available to users through a high-performance search engine via a freely accessible web application.
Thanks to the diversity and richness of the indexed holdings, this service appeals not only to historians and book scholars but also to interested users from a wide range of disciplines. The similarity search reveals unknown, unusual, and often surprising connections between very different works.
Content
Cf.: https://dx.doi.org/10.1007/s13222-017-0250-0.
Theme
Visualization
Field
History
Form
Images

Similar documents (author)

  1. Ceynowa, K.: Von der 'dreigeteilten' zur 'fraktalen' Bibliothek : benutzerorientierte Bibliotheksarbeit im Wandel, das Beispiel der Stadtbibliothek Paderborn (1994) 2.23
    2.2315671 = sum of:
      2.2315671 = product of:
        4.4631343 = sum of:
          4.4631343 = weight(author_txt:ceynowa in 764) [ClassicSimilarity], result of:
            4.4631343 = score(doc=764,freq=1.0), product of:
              0.7722835 = queryWeight, product of:
                1.1025707 = boost
                9.246624 = idf(docFreq=10, maxDocs=41962)
                0.07575078 = queryNorm
              5.77914 = fieldWeight in 764, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.246624 = idf(docFreq=10, maxDocs=41962)
                0.625 = fieldNorm(doc=764)
        0.5 = coord(1/2)
    
  2. Ceynowa, K.: Von der 'dreigeteilten' zur 'fraktalen' Bibliothek : benutzerorientierte Bibliotheksarbeit im Wandel, das Beispiel der Stadtbibliothek Paderborn (1994) 2.23
    2.2315671 = sum of:
      2.2315671 = product of:
        4.4631343 = sum of:
          4.4631343 = weight(author_txt:ceynowa in 6193) [ClassicSimilarity], result of:
            4.4631343 = score(doc=6193,freq=1.0), product of:
              0.7722835 = queryWeight, product of:
                1.1025707 = boost
                9.246624 = idf(docFreq=10, maxDocs=41962)
                0.07575078 = queryNorm
              5.77914 = fieldWeight in 6193, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.246624 = idf(docFreq=10, maxDocs=41962)
                0.625 = fieldNorm(doc=6193)
        0.5 = coord(1/2)
    
  3. Ceynowa, K.: Sacherschließung - Können wir sie uns noch leisten? Suche nach Antworten mit den Mitteln des Controlling (2003) 2.23
    2.2315671 = sum of:
      2.2315671 = product of:
        4.4631343 = sum of:
          4.4631343 = weight(author_txt:ceynowa in 3379) [ClassicSimilarity], result of:
            4.4631343 = score(doc=3379,freq=1.0), product of:
              0.7722835 = queryWeight, product of:
                1.1025707 = boost
                9.246624 = idf(docFreq=10, maxDocs=41962)
                0.07575078 = queryNorm
              5.77914 = fieldWeight in 3379, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.246624 = idf(docFreq=10, maxDocs=41962)
                0.625 = fieldNorm(doc=3379)
        0.5 = coord(1/2)
    
  4. Ceynowa, K.: Die Bayerische Staatsbibliothek im mobilen Internet : innovative Informationsangebote für Smartphone und iPad (2010) 2.23
    2.2315671 = sum of:
      2.2315671 = product of:
        4.4631343 = sum of:
          4.4631343 = weight(author_txt:ceynowa in 666) [ClassicSimilarity], result of:
            4.4631343 = score(doc=666,freq=1.0), product of:
              0.7722835 = queryWeight, product of:
                1.1025707 = boost
                9.246624 = idf(docFreq=10, maxDocs=41962)
                0.07575078 = queryNorm
              5.77914 = fieldWeight in 666, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.246624 = idf(docFreq=10, maxDocs=41962)
                0.625 = fieldNorm(doc=666)
        0.5 = coord(1/2)
    
  5. Ceynowa, K.: Informationsdienste im mobilen Internet : das Beispiel der Bayerischen Staatsbibliothek (2011) 2.23
    2.2315671 = sum of:
      2.2315671 = product of:
        4.4631343 = sum of:
          4.4631343 = weight(author_txt:ceynowa in 2198) [ClassicSimilarity], result of:
            4.4631343 = score(doc=2198,freq=1.0), product of:
              0.7722835 = queryWeight, product of:
                1.1025707 = boost
                9.246624 = idf(docFreq=10, maxDocs=41962)
                0.07575078 = queryNorm
              5.77914 = fieldWeight in 2198, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.246624 = idf(docFreq=10, maxDocs=41962)
                0.625 = fieldNorm(doc=2198)
        0.5 = coord(1/2)
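
The score trees above all follow the same arithmetic, which can be retraced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and not part of Lucene's API. All numeric inputs are copied from the listing for author_txt:ceynowa.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term frequency factor, assumed form: square root of the raw frequency."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """score = queryWeight * fieldWeight, as laid out in the explain tree."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm            # 0.7722835 in the listing
    field_weight = classic_tf(freq) * idf * field_norm  # 5.77914 in the listing
    return query_weight * field_weight


# Values taken from the explain output for author_txt:ceynowa
idf = classic_idf(doc_freq=10, max_docs=41962)
weight = term_score(
    freq=1.0,
    doc_freq=10,
    max_docs=41962,
    boost=1.1025707,
    query_norm=0.07575078,
    field_norm=0.625,
)
coord = 1 / 2                 # one of two query clauses matched
final_score = weight * coord  # should agree with the listed 2.2315671 up to rounding

print(f"idf={idf:.6f} weight={weight:.6f} score={final_score:.6f}")
```

Because every author hit matches the same single term with the same frequency and field norm, all five entries end up with the same score.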
    

Similar documents (content)

  1. Brumm, A.: Modellierung eines Informationssystems zum Bühnentanz als semantisches Wiki (2010) 0.11
    0.1085227 = sum of:
      0.1085227 = product of:
        0.6782669 = sum of:
          0.07775084 = weight(abstract_txt:insbesondere in 1026) [ClassicSimilarity], result of:
            0.07775084 = score(doc=1026,freq=2.0), product of:
              0.10428273 = queryWeight, product of:
                1.1904987 = boost
                5.6234965 = idf(docFreq=411, maxDocs=41962)
                0.015576756 = queryNorm
              0.74557734 = fieldWeight in 1026, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6234965 = idf(docFreq=411, maxDocs=41962)
                0.09375 = fieldNorm(doc=1026)
          0.04083377 = weight(abstract_txt:werden in 1026) [ClassicSimilarity], result of:
            0.04083377 = score(doc=1026,freq=1.0), product of:
              0.12334998 = queryWeight, product of:
                2.2426057 = boost
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.015576756 = queryNorm
              0.33103997 = fieldWeight in 1026, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.09375 = fieldNorm(doc=1026)
          0.17763121 = weight(abstract_txt:werke in 1026) [ClassicSimilarity], result of:
            0.17763121 = score(doc=1026,freq=1.0), product of:
              0.2608929 = queryWeight, product of:
                2.3062134 = boost
                7.262493 = idf(docFreq=79, maxDocs=41962)
                0.015576756 = queryNorm
              0.68085873 = fieldWeight in 1026, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.262493 = idf(docFreq=79, maxDocs=41962)
                0.09375 = fieldNorm(doc=1026)
          0.38205102 = weight(abstract_txt:werken in 1026) [ClassicSimilarity], result of:
            0.38205102 = score(doc=1026,freq=1.0), product of:
              0.51540256 = queryWeight, product of:
                4.1847167 = boost
                7.9068503 = idf(docFreq=41, maxDocs=41962)
                0.015576756 = queryNorm
              0.7412672 = fieldWeight in 1026, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9068503 = idf(docFreq=41, maxDocs=41962)
                0.09375 = fieldNorm(doc=1026)
        0.16 = coord(4/25)
    
  2. Hauer, M.: Zur Bedeutung normierter Terminologien in Zeiten moderner Sprach- und Information-Retrieval-Technologien (2013) 0.10
    0.102556705 = sum of:
      0.102556705 = product of:
        0.6409794 = sum of:
          0.11563905 = weight(abstract_txt:mittels in 2996) [ClassicSimilarity], result of:
            0.11563905 = score(doc=2996,freq=1.0), product of:
              0.22129585 = queryWeight, product of:
                2.1240025 = boost
                6.6886926 = idf(docFreq=141, maxDocs=41962)
                0.015576756 = queryNorm
              0.5225541 = fieldWeight in 2996, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6886926 = idf(docFreq=141, maxDocs=41962)
                0.078125 = fieldNorm(doc=2996)
          0.058938473 = weight(abstract_txt:werden in 2996) [ClassicSimilarity], result of:
            0.058938473 = score(doc=2996,freq=3.0), product of:
              0.12334998 = queryWeight, product of:
                2.2426057 = boost
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.015576756 = queryNorm
              0.47781503 = fieldWeight in 2996, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.078125 = fieldNorm(doc=2996)
          0.148026 = weight(abstract_txt:werke in 2996) [ClassicSimilarity], result of:
            0.148026 = score(doc=2996,freq=1.0), product of:
              0.2608929 = queryWeight, product of:
                2.3062134 = boost
                7.262493 = idf(docFreq=79, maxDocs=41962)
                0.015576756 = queryNorm
              0.5673823 = fieldWeight in 2996, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.262493 = idf(docFreq=79, maxDocs=41962)
                0.078125 = fieldNorm(doc=2996)
          0.31837586 = weight(abstract_txt:werken in 2996) [ClassicSimilarity], result of:
            0.31837586 = score(doc=2996,freq=1.0), product of:
              0.51540256 = queryWeight, product of:
                4.1847167 = boost
                7.9068503 = idf(docFreq=41, maxDocs=41962)
                0.015576756 = queryNorm
              0.6177227 = fieldWeight in 2996, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9068503 = idf(docFreq=41, maxDocs=41962)
                0.078125 = fieldNorm(doc=2996)
        0.16 = coord(4/25)
    
  3. Vorndran, A.: Hervorholen, was in unseren Daten steckt! : Mehrwerte durch Analysen großer Bibliotheksdatenbestände (2018) 0.09
    0.0896537 = sum of:
      0.0896537 = product of:
        0.56033564 = sum of:
          0.058264412 = weight(abstract_txt:dies in 1166) [ClassicSimilarity], result of:
            0.058264412 = score(doc=1166,freq=2.0), product of:
              0.09715506 = queryWeight, product of:
                1.1490937 = boost
                5.4279137 = idf(docFreq=500, maxDocs=41962)
                0.015576756 = queryNorm
              0.5997054 = fieldWeight in 1166, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4279137 = idf(docFreq=500, maxDocs=41962)
                0.078125 = fieldNorm(doc=1166)
          0.11563905 = weight(abstract_txt:mittels in 1166) [ClassicSimilarity], result of:
            0.11563905 = score(doc=1166,freq=1.0), product of:
              0.22129585 = queryWeight, product of:
                2.1240025 = boost
                6.6886926 = idf(docFreq=141, maxDocs=41962)
                0.015576756 = queryNorm
              0.5225541 = fieldWeight in 1166, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6886926 = idf(docFreq=141, maxDocs=41962)
                0.078125 = fieldNorm(doc=1166)
          0.06805629 = weight(abstract_txt:werden in 1166) [ClassicSimilarity], result of:
            0.06805629 = score(doc=1166,freq=4.0), product of:
              0.12334998 = queryWeight, product of:
                2.2426057 = boost
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.015576756 = queryNorm
              0.5517333 = fieldWeight in 1166, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5310931 = idf(docFreq=3338, maxDocs=41962)
                0.078125 = fieldNorm(doc=1166)
          0.31837586 = weight(abstract_txt:werken in 1166) [ClassicSimilarity], result of:
            0.31837586 = score(doc=1166,freq=1.0), product of:
              0.51540256 = queryWeight, product of:
                4.1847167 = boost
                7.9068503 = idf(docFreq=41, maxDocs=41962)
                0.015576756 = queryNorm
              0.6177227 = fieldWeight in 1166, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9068503 = idf(docFreq=41, maxDocs=41962)
                0.078125 = fieldNorm(doc=1166)
        0.16 = coord(4/25)
    
  4. Google erweitert Buchsuche um Staatsbibliothek (2006) 0.09
    0.08787096 = sum of:
      0.08787096 = product of:
        0.732258 = sum of:
          0.2049279 = weight(abstract_txt:staatsbibliothek in 2160) [ClassicSimilarity], result of:
            0.2049279 = score(doc=2160,freq=1.0), product of:
              0.1783421 = queryWeight, product of:
                1.5568604 = boost
                7.35406 = idf(docFreq=72, maxDocs=41962)
                0.015576756 = queryNorm
              1.1490719 = fieldWeight in 2160, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.35406 = idf(docFreq=72, maxDocs=41962)
                0.15625 = fieldNorm(doc=2160)
          0.2312781 = weight(abstract_txt:mittels in 2160) [ClassicSimilarity], result of:
            0.2312781 = score(doc=2160,freq=1.0), product of:
              0.22129585 = queryWeight, product of:
                2.1240025 = boost
                6.6886926 = idf(docFreq=141, maxDocs=41962)
                0.015576756 = queryNorm
              1.0451082 = fieldWeight in 2160, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6886926 = idf(docFreq=141, maxDocs=41962)
                0.15625 = fieldNorm(doc=2160)
          0.296052 = weight(abstract_txt:werke in 2160) [ClassicSimilarity], result of:
            0.296052 = score(doc=2160,freq=1.0), product of:
              0.2608929 = queryWeight, product of:
                2.3062134 = boost
                7.262493 = idf(docFreq=79, maxDocs=41962)
                0.015576756 = queryNorm
              1.1347646 = fieldWeight in 2160, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.262493 = idf(docFreq=79, maxDocs=41962)
                0.15625 = fieldNorm(doc=2160)
        0.12 = coord(3/25)
    
  5. Goldmann, M.: Alles rund um Briefmarken und Postgeschichte : die Philatelistische Bibliothek Hamburg bietet hochwertige Auskünfte - mehr als 20.000 Medien im Bestand (2013) 0.09
    0.08716351 = sum of:
      0.08716351 = product of:
        0.7263626 = sum of:
          0.065918654 = weight(abstract_txt:dies in 2624) [ClassicSimilarity], result of:
            0.065918654 = score(doc=2624,freq=1.0), product of:
              0.09715506 = queryWeight, product of:
                1.1490937 = boost
                5.4279137 = idf(docFreq=500, maxDocs=41962)
                0.015576756 = queryNorm
              0.6784892 = fieldWeight in 2624, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4279137 = idf(docFreq=500, maxDocs=41962)
                0.125 = fieldNorm(doc=2624)
          0.15104255 = weight(abstract_txt:bestand in 2624) [ClassicSimilarity], result of:
            0.15104255 = score(doc=2624,freq=1.0), product of:
              0.1688597 = queryWeight, product of:
                1.5149063 = boost
                7.1558833 = idf(docFreq=88, maxDocs=41962)
                0.015576756 = queryNorm
              0.8944854 = fieldWeight in 2624, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1558833 = idf(docFreq=88, maxDocs=41962)
                0.125 = fieldNorm(doc=2624)
          0.5094014 = weight(abstract_txt:werken in 2624) [ClassicSimilarity], result of:
            0.5094014 = score(doc=2624,freq=1.0), product of:
              0.51540256 = queryWeight, product of:
                4.1847167 = boost
                7.9068503 = idf(docFreq=41, maxDocs=41962)
                0.015576756 = queryNorm
              0.9883563 = fieldWeight in 2624, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9068503 = idf(docFreq=41, maxDocs=41962)
                0.125 = fieldNorm(doc=2624)
        0.12 = coord(3/25)
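
For the content-based hits, several query terms contribute to one document score: the per-term weights are summed and then scaled by the coordination factor. The sketch below retraces the first entry (doc 1026) under the same assumed ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The constant and function names are illustrative only; the numbers are copied from the listing.

```python
import math

MAX_DOCS = 41962
QUERY_NORM = 0.015576756
FIELD_NORM = 0.09375      # fieldNorm(doc=1026)
QUERY_TERM_COUNT = 25     # denominator of coord(4/25)

# (term, freq in document, docFreq, boost) as listed for doc 1026
MATCHED_TERMS = [
    ("insbesondere", 2.0, 411, 1.1904987),
    ("werden", 1.0, 3338, 2.2426057),
    ("werke", 1.0, 79, 2.3062134),
    ("werken", 1.0, 41, 4.1847167),
]


def term_score(freq: float, doc_freq: int, boost: float) -> float:
    """Per-term weight: (boost * idf * queryNorm) * (tf * idf * fieldNorm)."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


per_term = {term: term_score(freq, df, boost) for term, freq, df, boost in MATCHED_TERMS}

# coord rewards documents that match more of the query's terms
coord = len(MATCHED_TERMS) / QUERY_TERM_COUNT
document_score = sum(per_term.values()) * coord

for term, value in per_term.items():
    print(f"{term:>12}: {value:.6f}")
print(f"document score: {document_score:.6f}")
```

The low absolute scores in this list follow directly from the coordination factor: each hit matches only 3 or 4 of the 25 query terms drawn from the abstract.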