Search (21 results, page 1 of 2)

Junger, U.; Scholze, F.: Neue Wege und Qualitäten : die Inhaltserschließungspolitik der Deutschen Nationalbibliothek (2021) 0.02

0.018787762 = product of:
  0.07984799 = sum of:
    0.024390915 = weight(_text_:und in 365) [ClassicSimilarity], result of:
      0.024390915 = score(doc=365,freq=18.0), product of:
        0.055336144 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.024967048 = queryNorm
        0.4407773 = fieldWeight in 365, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.046875 = fieldNorm(doc=365)
    0.016444085 = product of:
      0.03288817 = sum of:
        0.03288817 = weight(_text_:bibliothekswesen in 365) [ClassicSimilarity], result of:
          0.03288817 = score(doc=365,freq=2.0), product of:
            0.11129492 = queryWeight, product of:
              4.457672 = idf(docFreq=1392, maxDocs=44218)
              0.024967048 = queryNorm
            0.2955047 = fieldWeight in 365, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.457672 = idf(docFreq=1392, maxDocs=44218)
              0.046875 = fieldNorm(doc=365)
      0.5 = coord(1/2)
    0.03288817 = weight(_text_:bibliothekswesen in 365) [ClassicSimilarity], result of:
      0.03288817 = score(doc=365,freq=2.0), product of:
        0.11129492 = queryWeight, product of:
          4.457672 = idf(docFreq=1392, maxDocs=44218)
          0.024967048 = queryNorm
        0.2955047 = fieldWeight in 365, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.457672 = idf(docFreq=1392, maxDocs=44218)
          0.046875 = fieldNorm(doc=365)
    0.006124827 = weight(_text_:in in 365) [ClassicSimilarity], result of:
      0.006124827 = score(doc=365,freq=8.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.18034597 = fieldWeight in 365, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=365)
  0.23529412 = coord(4/17)
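
The score tree above can be recomputed by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which agree with the values shown. The function and variable names are illustrative and are not part of Lucene's API; `queryNorm` and `fieldNorm` are copied from the output rather than derived.

```python
import math

MAX_DOCS = 44218          # maxDocs from the explain output
QUERY_NORM = 0.024967048  # queryNorm from the explain output (copied, not derived)
FIELD_NORM = 0.046875     # fieldNorm(doc=365) from the explain output


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as assumed for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm=FIELD_NORM, query_norm=QUERY_NORM):
    """score = queryWeight * fieldWeight for a single query term."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


und_score = term_score(freq=18, doc_freq=13101)
bib_score = term_score(freq=2, doc_freq=1392)
in_score = term_score(freq=8, doc_freq=30841)

# "bibliothekswesen" contributes twice: once inside a nested two-clause
# sub-query where only one clause matched (coord 1/2), and once directly.
raw_sum = und_score + 0.5 * bib_score + bib_score + in_score

# 4 of the 17 top-level query clauses matched, hence coord(4/17).
total = raw_sum * 4 / 17

print(f"und={und_score:.9f} sum={raw_sum:.8f} total={total:.9f}")
```

Because Lucene computes these values in 32-bit floats, the recomputed numbers agree with the listing only to about six or seven significant digits.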

Abstract: It is not often that a topic from library practice becomes the subject of a full-page article in the arts section (Feuilleton) of one of Germany's most important national newspapers. On 31 July 2017 this was the case: the Frankfurter Allgemeine Zeitung published an article by the Director General of the Bayerische Staatsbibliothek, Klaus Ceynowa, in which he critically examined a concept for subject indexing that the Deutsche Nationalbibliothek (DNB) had previously published for the German-speaking library community. The DNB's efforts to develop and deploy methods for machine-based document indexing had already provoked controversial reactions in the library sector; this article once again drew particular attention and discussion to a topic that many regarded as rather dusty and unattractive: subject indexing. The contribution traces some basic lines of the DNB's indexing policy since 2010 and describes which instruments and methods are used in subject indexing, which conceptual decisions underlie it, how quality is assessed, and which fields of development and action are seen for the future.
Series: Bibliotheks- und Informationspraxis; 70
Source: Qualität in der Inhaltserschließung. Ed.: M. Franke-Maier et al.

Mödden, E.; Dreger, A.; Hommes, K.P.; Mohammadianbisheh, N.; Mölck, L.; Pinna, L.; Sitte-Zöllner, D.: ¬Der Weg zur Gründung der AG Erschließung ÖB-DNB und die Entwicklung eines maschinellen Verfahrens zur Verschlagwortung der Kinder- und Jugendliteratur mit GND-Vokabular (2020) 0.01
```
0.0121053485 = product of:
  0.068596974 = sum of:
    0.022745084 = weight(_text_:und in 71) [ClassicSimilarity], result of:
      0.022745084 = score(doc=71,freq=46.0), product of:
        0.055336144 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.024967048 = queryNorm
        0.41103485 = fieldWeight in 71, product of:
          6.78233 = tf(freq=46.0), with freq of:
            46.0 = termFreq=46.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.02734375 = fieldNorm(doc=71)
    0.0047263918 = weight(_text_:in in 71) [ClassicSimilarity], result of:
      0.0047263918 = score(doc=71,freq=14.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.13916893 = fieldWeight in 71, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.02734375 = fieldNorm(doc=71)
    0.0411255 = weight(_text_:bibliotheken in 71) [ClassicSimilarity], result of:
      0.0411255 = score(doc=71,freq=18.0), product of:
        0.09407886 = queryWeight, product of:
          3.768121 = idf(docFreq=2775, maxDocs=44218)
          0.024967048 = queryNorm
        0.4371386 = fieldWeight in 71, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          3.768121 = idf(docFreq=2775, maxDocs=44218)
          0.02734375 = fieldNorm(doc=71)
  0.1764706 = coord(3/17)
```
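
Each `product of:` and `sum of:` node in such a tree should equal the product or sum of its direct children. The sketch below is a hypothetical helper, not a Lucene tool: it parses the indented text into a tree and checks that property. The sample is taken from the listing for doc 71, with two of the three term subtrees trimmed to their top line to keep it short.

```python
import math

EXPLAIN = """\
0.0121053485 = product of:
  0.068596974 = sum of:
    0.022745084 = weight(_text_:und in 71) [ClassicSimilarity], result of:
      0.022745084 = score(doc=71,freq=46.0), product of:
        0.055336144 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.024967048 = queryNorm
        0.41103485 = fieldWeight in 71, product of:
          6.78233 = tf(freq=46.0), with freq of:
            46.0 = termFreq=46.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.02734375 = fieldNorm(doc=71)
    0.0047263918 = weight(_text_:in in 71) [ClassicSimilarity], result of:
    0.0411255 = weight(_text_:bibliotheken in 71) [ClassicSimilarity], result of:
  0.1764706 = coord(3/17)
"""


def parse_explain(text):
    """Turn indented explain text into nested (value, description, children) nodes."""
    roots = []
    stack = [(-1, roots)]  # (indent of node, list that receives its children)
    for line in text.splitlines():
        if " = " not in line:
            continue
        indent = len(line) - len(line.lstrip())
        value_str, desc = line.strip().split(" = ", 1)
        node = (float(value_str), desc, [])
        # Climb back up until the top of the stack is this line's parent.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1].append(node)
        stack.append((indent, node[2]))
    return roots


def check(node, rel_tol=1e-5):
    """True if every product/sum node matches its children within rel_tol."""
    value, desc, children = node
    ok = True
    if desc.endswith("product of:"):
        ok = math.isclose(value, math.prod(c[0] for c in children), rel_tol=rel_tol)
    elif desc.endswith("sum of:"):
        ok = math.isclose(value, sum(c[0] for c in children), rel_tol=rel_tol)
    return ok and all(check(child, rel_tol) for child in children)


tree = parse_explain(EXPLAIN)
print(check(tree[0]))
```

The relative tolerance is an assumption chosen to absorb the rounding of the printed 32-bit float values; `tf(...)` nodes are skipped on purpose because tf is the square root of its child, not a product or sum.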
Abstract

Public libraries and the Deutsche Nationalbibliothek (DNB) have much in common, but they also differ in many ways. One thing they undoubtedly share is the diversity of content in the media they offer. Unlike institutionally bound libraries such as academic libraries, whether at universities or universities of applied sciences, public libraries and the DNB offer a universal collection that extends beyond institutional concerns. Their holdings include children's books and philosophical treatises, guidebooks and novels, games and sheet music. The diversity of the media in content and form corresponds to the diversity of the users. Users of public libraries and of the DNB need not belong to an institution; it is enough that they have a personal information need of whatever kind. The differences include, besides the statutory mandates (defined for the DNB by federal law, and for public libraries in some federal states by corresponding state laws), the very different ways of handling media. Whereas public libraries aim to be working libraries in which media are used intensively and have a time-limited right of residence, the DNB acts as a memory institution that preserves media for future generations as well. In doing so, the DNB has the task of "indexing and bibliographically recording" the media, of "providing central library and national bibliographic services", and of "making [them] available to the general public" (DNBG §2, sentence 1). The usage orientation of public libraries implies that, given their strong customer focus, easy findability of media is at the centre of indexing. What could be more obvious, then, than to use the DNB's central metadata services for this purpose?
Supplying centrally created, high-quality metadata for indexing local media holdings is economical and makes it possible to concentrate scarce staff resources on urgently needed outreach work. So much for the theory, and so much for the practice as well, until changes set in about ten years ago.
It was agreed to rework the book trade's THEMA-enriched data for library purposes and to transfer it automatically into verbal indexing categories. Participants were informed that MVB (Marketing- und Verlagsservice für den Buchhandel GmbH) is working intensively to promote the use of THEMA by publishers as part of a campaign to improve data quality in the Verzeichnis Lieferbarer Bücher. The workshop participants agreed to retain the standardization of the subject heading vocabulary as it is carried out through the GND. It was considered conceivable to match free keywords from the MVB data against the GND and/or to compile a list of terms suitable for mapping THEMA notations to the GND. Children's literature emerged as the most suitable segment, on the one hand because of its large volume, its high importance in public libraries, and the lack of indexing by any classification, and on the other because of the quantity and quality of the free keywords in the book trade data. It was agreed that the DNB would draft and circulate a sketch for a project to make THEMA and the free MVB keywords usable, while the public library representatives would send the DNB a list of subject headings in children's literature that are of particular relevance, for example in reference work.

Area

Public libraries

Sack, H.: Hybride Künstliche Intelligenz in der automatisierten Inhaltserschließung (2021) 0.01

0.009190094 = product of:
  0.052077197 = sum of:
    0.019915098 = weight(_text_:und in 372) [ClassicSimilarity], result of:
      0.019915098 = score(doc=372,freq=12.0), product of:
        0.055336144 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.024967048 = queryNorm
        0.35989314 = fieldWeight in 372, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.046875 = fieldNorm(doc=372)
    0.008661814 = weight(_text_:in in 372) [ClassicSimilarity], result of:
      0.008661814 = score(doc=372,freq=16.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.25504774 = fieldWeight in 372, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=372)
    0.023500286 = weight(_text_:bibliotheken in 372) [ClassicSimilarity], result of:
      0.023500286 = score(doc=372,freq=2.0), product of:
        0.09407886 = queryWeight, product of:
          3.768121 = idf(docFreq=2775, maxDocs=44218)
          0.024967048 = queryNorm
        0.24979347 = fieldWeight in 372, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.768121 = idf(docFreq=2775, maxDocs=44218)
          0.046875 = fieldNorm(doc=372)
  0.1764706 = coord(3/17)

Abstract: Efficient (online) access to library and archive materials requires subject indexing of these documents at a sufficient level of quality. Precise keyword assignment and categorization of these unstructured documents enable structured access in both the analogue and the digital world. In addition, a complete transcription of the documents extends access through full-text search. Given the spectacular successes recently achieved in artificial intelligence, it is tempting to conclude that the problem of automated subject indexing for libraries and archives can also be regarded as more or less solved. However, these successes were often achieved only in thematically narrow subfields and cannot always be generalized or transferred to a new context without difficulty. The contribution discusses the current state of the art in automated subject indexing on the basis of selected examples, together with possible advances and forecasts based on current developments in machine learning and artificial intelligence, including criticism of them.
Series: Bibliotheks- und Informationspraxis; 70
Source: Qualität in der Inhaltserschließung. Ed.: M. Franke-Maier et al.

Pielmeier, S.; Voß, V.; Carstensen, H.; Kahl, B.: Online-Workshop "Computerunterstützte Inhaltserschließung" 2020 (2021) 0.01

0.009141268 = product of:
  0.05180052 = sum of:
    0.022995977 = weight(_text_:und in 4409) [ClassicSimilarity], result of:
      0.022995977 = score(doc=4409,freq=16.0), product of:
        0.055336144 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.024967048 = queryNorm
        0.41556883 = fieldWeight in 4409, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.046875 = fieldNorm(doc=4409)
    0.005304256 = weight(_text_:in in 4409) [ClassicSimilarity], result of:
      0.005304256 = score(doc=4409,freq=6.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.1561842 = fieldWeight in 4409, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=4409)
    0.023500286 = weight(_text_:bibliotheken in 4409) [ClassicSimilarity], result of:
      0.023500286 = score(doc=4409,freq=2.0), product of:
        0.09407886 = queryWeight, product of:
          3.768121 = idf(docFreq=2775, maxDocs=44218)
          0.024967048 = queryNorm
        0.24979347 = fieldWeight in 4409, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.768121 = idf(docFreq=2775, maxDocs=44218)
          0.046875 = fieldNorm(doc=4409)
  0.1764706 = coord(3/17)

Abstract: Held in digital form for the first time and with 230 participants, the 4th workshop "Computerunterstützte Inhaltserschließung" (computer-assisted subject indexing) took place on 11 and 12 November 2020. It was organized by the Deutsche Nationalbibliothek (DNB), the company Eurospider Information Technology, the Staatsbibliothek zu Berlin - Preußischer Kulturbesitz (SBB), the UB Stuttgart, and the Bibliotheksservice-Zentrum Baden-Württemberg (BSZ). The focus was on the "Digitale Assistent DA-3": eleven talks presented application scenarios and experiences with the system, which is intended to support libraries and other scientific and cultural institutions in subject indexing. Frank Scholze (Director General of the DNB) gave the welcome and the introduction to the two workshop days. He sees the DA-3 as a building block for interlinking intellectual and machine-based indexing.

Lepsky, K.: Automatisches Indexieren (2023) 0.01
```
0.0065426584 = product of:
  0.055612598 = sum of:
    0.016429119 = weight(_text_:und in 781) [ClassicSimilarity], result of:
      0.016429119 = score(doc=781,freq=6.0), product of:
        0.055336144 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.024967048 = queryNorm
        0.2968967 = fieldWeight in 781, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0546875 = fieldNorm(doc=781)
    0.03918348 = weight(_text_:informationswissenschaft in 781) [ClassicSimilarity], result of:
      0.03918348 = score(doc=781,freq=2.0), product of:
        0.11246919 = queryWeight, product of:
          4.504705 = idf(docFreq=1328, maxDocs=44218)
          0.024967048 = queryNorm
        0.348393 = fieldWeight in 781, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.504705 = idf(docFreq=1328, maxDocs=44218)
          0.0546875 = fieldNorm(doc=781)
  0.11764706 = coord(2/17)
```
Abstract

Indexing is the assignment of content-describing expressions (index terms, indexing results, indexing features) to documents. The assigned index terms are meant to enable targeted retrieval of the documents. Index terms can be content-describing features such as notations, descriptors, or controlled or free subject headings; they can also be plain keywords taken from the text of the document. Indexing can be carried out intellectually, with computer support, or automatically. Computer-assisted indexing methods combine intellectual indexing with automatic preparatory work. In automatic indexing, the index terms are determined automatically from the document text and assigned to the document. To process the character strings in the document, automatic indexing uses linguistic and statistical methods.
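
As a rough illustration of the purely statistical end of automatic indexing, the sketch below picks the most frequent non-stopword tokens of a text as index terms. It is a toy under stated assumptions, not the method described in the chapter: the stopword list, the function name `extract_index_terms`, and the sample text are made up for this example, and real systems add linguistic processing such as lemmatization or compound splitting.

```python
import re
from collections import Counter

# Tiny hand-made stopword list (an assumption for this sketch only).
STOPWORDS = {"the", "of", "a", "to", "and", "is", "are", "in", "from", "for"}


def extract_index_terms(text, top_n=3):
    """Return the top_n most frequent tokens that are not stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(top_n)]


sample = (
    "Automatic indexing assigns index terms to documents. "
    "The index terms are derived from the text of the documents."
)
terms = extract_index_terms(sample)
print(terms)
```

Raw frequency alone favours words that are common everywhere, which is why practical systems usually weight terms against a collection (for example with an idf factor) or map them to a controlled vocabulary.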

Source

Grundlagen der Informationswissenschaft. Ed.: Rainer Kuhlen, Dirk Lewandowski, Wolfgang Semar and Christa Womser-Hacker. 7th, completely revised ed.

Qualität in der Inhaltserschließung (2021) 0.00
```
0.004145476 = product of:
  0.035236545 = sum of:
    0.028164204 = weight(_text_:und in 753) [ClassicSimilarity], result of:
      0.028164204 = score(doc=753,freq=54.0), product of:
        0.055336144 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.024967048 = queryNorm
        0.5089658 = fieldWeight in 753, product of:
          7.3484693 = tf(freq=54.0), with freq of:
            54.0 = termFreq=54.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.03125 = fieldNorm(doc=753)
    0.0070723416 = weight(_text_:in in 753) [ClassicSimilarity], result of:
      0.0070723416 = score(doc=753,freq=24.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.2082456 = fieldWeight in 753, product of:
          4.8989797 = tf(freq=24.0), with freq of:
            24.0 = termFreq=24.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.03125 = fieldNorm(doc=753)
  0.11764706 = coord(2/17)
```
Abstract

Volume 70 of the BIPRA series deals with quality in subject indexing in the context of established methods and technological innovations. When heterogeneous products of different methods and systems meet, minimum requirements for the quality of subject indexing must be defined. The question of quality is currently being discussed intensively in various contexts and is taken up in this volume. Authors active in this field describe, each from their own perspective, different aspects of metadata, authority data, formats, indexing methods, and indexing policy. The volume is intended as a guide and a stimulus for the discussion about quality in subject indexing.

Content

Contents:
Editorial - Michael Franke-Maier, Anna Kasprzik, Andreas Ledl and Hans Schürmann
Qualität in der Inhaltserschließung - Ein Überblick aus 50 Jahren (1970-2020) - Andreas Ledl
Fit for Purpose - Standardisierung von inhaltserschließenden Informationen durch Richtlinien für Metadaten - Joachim Laczny
Neue Wege und Qualitäten - Die Inhaltserschließungspolitik der Deutschen Nationalbibliothek - Ulrike Junger and Frank Scholze
Wissensbasen für die automatische Erschließung und ihre Qualität am Beispiel von Wikidata - Lydia Pintscher, Peter Bourgonje, Julián Moreno Schneider, Malte Ostendorff and Georg Rehm
Qualitätssicherung in der GND - Esther Scheven
Qualitätskriterien und Qualitätssicherung in der inhaltlichen Erschließung - Thesenpapier des Expertenteams RDA-Anwendungsprofil für die verbale Inhaltserschließung (ET RAVI)
Coli-conc - Eine Infrastruktur zur Nutzung und Erstellung von Konkordanzen - Uma Balakrishnan, Stefan Peters and Jakob Voß
Methoden und Metriken zur Messung von OCR-Qualität für die Kuratierung von Daten und Metadaten - Clemens Neudecker, Karolina Zaczynska, Konstantin Baierer, Georg Rehm, Mike Gerber and Julián Moreno Schneider
Datenqualität als Grundlage qualitativer Inhaltserschließung - Jakob Voß
Bemerkungen zu der Qualitätsbewertung von MARC-21-Datensätzen - Rudolf Ungváry and Péter Király
Named Entity Linking mit Wikidata und GND - Das Potenzial handkuratierter und strukturierter Datenquellen für die semantische Anreicherung von Volltexten - Sina Menzel, Hannes Schnaitter, Josefine Zinck, Vivien Petras, Clemens Neudecker, Kai Labusch, Elena Leitner and Georg Rehm
Ein Protokoll für den Datenabgleich im Web am Beispiel von OpenRefine und der Gemeinsamen Normdatei (GND) - Fabian Steeg and Adrian Pohl
Verbale Erschließung in Katalogen und Discovery-Systemen - Überlegungen zur Qualität - Heidrun Wiesenmüller
Inhaltserschließung für Discovery-Systeme gestalten - Jan Frederik Maas
Evaluierung von Verschlagwortung im Kontext des Information Retrievals - Christian Wartena and Koraljka Golub
Die Qualität der Fremddatenanreicherung FRED - Cyrus Beck
Quantität als Qualität - Was die Verbünde zur Verbesserung der Inhaltserschließung beitragen können - Rita Albrecht, Barbara Block, Mathias Kratzer and Peter Thiessen
Hybride Künstliche Intelligenz in der automatisierten Inhaltserschließung - Harald Sack

Footnote

Cf.: https://www.degruyter.com/document/doi/10.1515/9783110691597/html. DOI: https://doi.org/10.1515/9783110691597. Reviewed in: Information - Wissenschaft und Praxis 73(2022) no.2-3, pp.131-132 (B. Lorenz and V. Steyer). Further review in: o-bib 9(2022) no.3 (Martin Völkl) [https://www.o-bib.de/bib/article/view/5843/8714].

Series

Bibliotheks- und Informationspraxis; 70

Pintscher, L.; Bourgonje, P.; Moreno Schneider, J.; Ostendorff, M.; Rehm, G.: Wissensbasen für die automatische Erschließung und ihre Qualität am Beispiel von Wikidata (2021) 0.00
```
0.003610394 = product of:
  0.03068835 = sum of:
    0.02347017 = weight(_text_:und in 366) [ClassicSimilarity], result of:
      0.02347017 = score(doc=366,freq=24.0), product of:
        0.055336144 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.024967048 = queryNorm
        0.42413816 = fieldWeight in 366, product of:
          4.8989797 = tf(freq=24.0), with freq of:
            24.0 = termFreq=24.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0390625 = fieldNorm(doc=366)
    0.007218178 = weight(_text_:in in 366) [ClassicSimilarity], result of:
      0.007218178 = score(doc=366,freq=16.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.21253976 = fieldWeight in 366, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=366)
  0.11764706 = coord(2/17)
```
Abstract

Wikidata is a free knowledge base that provides general data about the world. It is developed and operated by Wikimedia, as is its sister project Wikipedia. The data in Wikidata are collected and maintained by a large community of volunteers, and the data as well as the underlying ontology are used by many projects, institutions, and companies as a basis for applications and visualizations, but also for training machine learning methods. Wikidata uses MediaWiki and the Wikibase extension as the technical foundation for collaborative work on a knowledge base that makes linked open data accessible to humans and machines. As of the end of 2020, Wikidata describes more than 90 million entities using more than 8,000 properties, amounting to more than 1.15 billion statements about the entities described. The data objects of these entities are linked to equivalent entries in more than 5,500 external databases, catalogues, and websites, which makes Wikidata one of the central hubs of the Linked Data Web. More than 11,500 active editors add new data to the knowledge base and maintain it. They are organized in WikiProjects, each of which addresses particular subject areas or tasks. The data are used in more than half of the content pages in the Wikimedia projects and, among other things, are queried more than 6.5 million times a day via the SPARQL endpoint in order to integrate them into external applications and visualizations.
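
To make the links to external databases and the SPARQL endpoint more concrete, here is a minimal sketch that builds a request URL for the public Wikidata Query Service, looking up items by their GND identifier (Wikidata property P227). The helper name `build_gnd_lookup_url` and the identifier `000000000` are placeholders invented for this example. The code only constructs the URL and sends nothing over the network.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_gnd_lookup_url(gnd_id, limit=5):
    """Build a GET URL that asks for Wikidata items carrying the given GND ID (P227)."""
    query = f'SELECT ?item WHERE {{ ?item wdt:P227 "{gnd_id}" . }} LIMIT {limit}'
    params = urlencode({"query": query, "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"


# "000000000" is a placeholder, not a real GND identifier.
url = build_gnd_lookup_url("000000000")

# Decode the URL again to show that the query survives the round trip.
decoded = parse_qs(urlparse(url).query)
print(decoded["query"][0])
```

An application would pass this URL to an HTTP client with a descriptive User-Agent header and read the item URIs from the JSON response.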

Series

Bibliotheks- und Informationspraxis; 70

Source

Qualität in der Inhaltserschließung. Ed.: M. Franke-Maier et al.

Giesselbach, S.; Estler-Ziegler, T.: Dokumente schneller analysieren mit Künstlicher Intelligenz (2021) 0.00
```
0.0031857954 = product of:
  0.02707926 = sum of:
    0.02347017 = weight(_text_:und in 128) [ClassicSimilarity], result of:
      0.02347017 = score(doc=128,freq=24.0), product of:
        0.055336144 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.024967048 = queryNorm
        0.42413816 = fieldWeight in 128, product of:
          4.8989797 = tf(freq=24.0), with freq of:
            24.0 = termFreq=24.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0390625 = fieldNorm(doc=128)
    0.003609089 = weight(_text_:in in 128) [ClassicSimilarity], result of:
      0.003609089 = score(doc=128,freq=4.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.10626988 = fieldWeight in 128, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=128)
  0.11764706 = coord(2/17)
```
Abstract

Artificial intelligence (AI) and natural language understanding (NLU) are changing many aspects of our everyday lives and the way we work. NLU gained particular prominence through voice assistants such as Siri, Alexa, and Google Now. NLU offers companies and institutions the potential to make processes more efficient and to derive added value from textual content. NLU solutions are thus able to index the content of complex, unstructured documents. For semantic text analysis, the NLU team at IAIS has developed language models that are trained with deep learning methods. The NLU suite analyses documents, extracts key data and, if required, even produces a structured summary. With these results, but also via the content of the documents themselves, documents can be compared and texts with similar information can be found. AI-based language models are clearly superior to classical keyword indexing, because they not only find texts with predefined keywords but also search intelligently for terms that appear in a similar context or are used as synonyms. The talk classifies the terms "artificial intelligence" and "natural language understanding" and presents possibilities, limits, current research directions, and methods. Practical examples are then used to demonstrate how NLU can be used for automated voucher processing, for cataloguing large data collections such as news and patents, and for the automated thematic grouping of social media posts and publications.

Franke-Maier, M.; Beck, C.; Kasprzik, A.; Maas, J.F.; Pielmeier, S.; Wiesenmüller, H.: ¬Ein Feuerwerk an Algorithmen und der Startschuss zur Bildung eines Kompetenznetzwerks für maschinelle Erschließung : Bericht zur Fachtagung Netzwerk maschinelle Erschließung an der Deutschen Nationalbibliothek am 10. und 11. Oktober 2019 (2020) 0.00
```
0.0027032366 = product of:
  0.022977512 = sum of:
    0.019915098 = weight(_text_:und in 5851) [ClassicSimilarity], result of:
      0.019915098 = score(doc=5851,freq=12.0), product of:
        0.055336144 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.024967048 = queryNorm
        0.35989314 = fieldWeight in 5851, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.046875 = fieldNorm(doc=5851)
    0.0030624135 = weight(_text_:in in 5851) [ClassicSimilarity], result of:
      0.0030624135 = score(doc=5851,freq=2.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.09017298 = fieldWeight in 5851, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=5851)
  0.11764706 = coord(2/17)
```
Abstract

On 10 and 11 October 2019, around 100 representatives from libraries, academia, and industry met at the Deutsche Nationalbibliothek (DNB) in Frankfurt am Main for a conference on the current trend topic of "machine-based indexing". The aim of the event was the "consideration of different application areas of machine text analysis" and the initiation of a dialogue on technologies for machine text analysis, tasks, experiences, and the challenges that machine-based methods entail. The background is the mandate given to the DNB by the Standardisierungsausschuss to hold relevant conferences regularly, from which "a competence network for machine-based indexing [will] emerge in the long term".

Kasprzik, A.: Aufbau eines produktiven Dienstes für die automatisierte Inhaltserschließung an der ZBW : ein Status- und Erfahrungsbericht (2023) 0.00
```
0.0025535726 = product of:
  0.021705367 = sum of:
    0.01714019 = weight(_text_:und in 935) [ClassicSimilarity], result of:
      0.01714019 = score(doc=935,freq=20.0), product of:
        0.055336144 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.024967048 = queryNorm
        0.3097467 = fieldWeight in 935, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.03125 = fieldNorm(doc=935)
    0.004565177 = weight(_text_:in in 935) [ClassicSimilarity], result of:
      0.004565177 = score(doc=935,freq=10.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.13442196 = fieldWeight in 935, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.03125 = fieldNorm(doc=935)
  0.11764706 = coord(2/17)
```
Abstract

The ZBW - Leibniz-Informationszentrum Wirtschaft has been conducting its own applied research in machine learning since 2016 with the aim of developing practicable solutions for automated or machine-assisted subject indexing. In 2020, a team at the ZBW began designing and implementing a software architecture that made it possible to transfer these prototype solutions into a productive service and to interlink them with the existing catalogue and information systems. Both the applied research and the software development required for this project ("AutoSE") are located directly in the library department of the ZBW, are continuously advanced in line with the state of the art, and benefit from close exchange with those responsible for intellectual subject indexing. The contribution presents the milestones that the AutoSE team has reached in two years with regard to building and integrating the software, and outlines which ones are still outstanding until the end of the pilot phase (2024). The architecture is based on open-source software, and the machine learning components used are being further developed in an international collaboration, in close exchange with the National Library of Finland (NLF), and prepared for reuse in Annif, the open-source toolkit developed by the NLF. The operating model of the AutoSE service provides for regular reviews of individual components as well as of the production workflow as a whole, and allows the architecture to be developed further on an ongoing basis. One of the results to be available by the end of the pilot phase is documentation of the requirements for permanent productive operation of the service, so that the resources for it can be secured in the long term within a sustainable model.
From this practical example it can be derived which conditions must be met in order to successfully deploy machine learning solutions such as those contained in Annif for subject indexing at an institution.

Golub, K.: Automated subject indexing : an overview (2021) 0.00
```
4.203313E-4 = product of:
  0.0071456316 = sum of:
    0.0071456316 = weight(_text_:in in 718) [ClassicSimilarity], result of:
      0.0071456316 = score(doc=718,freq=8.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.21040362 = fieldWeight in 718, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0546875 = fieldNorm(doc=718)
  0.05882353 = coord(1/17)
```
Abstract

In the face of the ever-increasing document volume, libraries around the globe are more and more exploring (semi-) automated approaches to subject indexing. This helps sustain bibliographic objectives, enrich metadata, and establish more connections across documents from various collections, effectively leading to improved information retrieval and access. However, generally accepted automated approaches that are functional in operative systems are lacking. This article aims to provide an overview of basic principles used for automated subject indexing, major approaches in relation to their possible application in actual library systems, existing working examples, as well as related challenges calling for further research.
Chou, C.; Chu, T.: An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.00
```
4.203313E-4 = product of:
  0.0071456316 = sum of:
    0.0071456316 = weight(_text_:in in 1139) [ClassicSimilarity], result of:
      0.0071456316 = score(doc=1139,freq=8.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.21040362 = fieldWeight in 1139, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1139)
  0.05882353 = coord(1/17)
```
Abstract

In light of AI (Artificial Intelligence) and NLP (Natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used in machine-assisted indexing in the Project Gutenberg collection, through suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.
Matthews, P.; Glitre, K.: Genre analysis of movies using a topic model of plot summaries (2021) 0.00
```
3.6028394E-4 = product of:
  0.006124827 = sum of:
    0.006124827 = weight(_text_:in in 412) [ClassicSimilarity], result of:
      0.006124827 = score(doc=412,freq=8.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.18034597 = fieldWeight in 412, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=412)
  0.05882353 = coord(1/17)
```
Abstract

Genre plays an important role in the description, navigation, and discovery of movies, but it is rarely studied at large scale using quantitative methods. This allows an analysis of how genre labels are applied, how genres are composed and how these ingredients change, and how genres compare. We apply unsupervised topic modeling to a large collection of textual movie summaries and then use the model's topic proportions to investigate key questions in genre, including recognizability, mapping, canonicity, and change over time. We find that many genres can be quite easily predicted by their lexical signatures and this defines their position on the genre landscape. We find significant genre composition changes between periods for westerns, science fiction and road movies, reflecting changes in production and consumption values. We show that in terms of canonicity, canonical examples are often at the high end of the topic distribution profile for the genre rather than central as might be predicted by categorization theory.
Lowe, D.B.; Dollinger, I.; Koster, T.; Herbert, B.E.: Text mining for type of research classification (2021) 0.00
```
3.1201506E-4 = product of:
  0.005304256 = sum of:
    0.005304256 = weight(_text_:in in 720) [ClassicSimilarity], result of:
      0.005304256 = score(doc=720,freq=6.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.1561842 = fieldWeight in 720, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=720)
  0.05882353 = coord(1/17)
```
Abstract

This project brought together undergraduate students in Computer Science with librarians to mine abstracts of articles from the Texas A&M University Libraries' institutional repository, OAKTrust, in order to probe the creation of new metadata to improve discovery and use. The mining operation task consisted simply of classifying the articles into two categories of research type: basic research ("for understanding," "curiosity-based," or "knowledge-based") and applied research ("use-based"). These categories are fundamental especially for funders but are also important to researchers. The mining-to-classification steps took several iterations, but ultimately, we achieved good results with the toolkit BERT (Bidirectional Encoder Representations from Transformers). The project and its workflows represent a preview of what may lie ahead in the future of crafting metadata using text mining techniques to enhance discoverability.
Asula, M.; Makke, J.; Freienthal, L.; Kuulmets, H.-A.; Sirel, R.: Kratt: developing an automatic subject indexing tool for the National Library of Estonia : how to transfer metadata information among work cluster members (2021) 0.00
```
3.1201506E-4 = product of:
  0.005304256 = sum of:
    0.005304256 = weight(_text_:in in 723) [ClassicSimilarity], result of:
      0.005304256 = score(doc=723,freq=6.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.1561842 = fieldWeight in 723, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=723)
  0.05882353 = coord(1/17)
```
Abstract

Manual subject indexing in libraries is a time-consuming and costly process and the quality of the assigned subjects is affected by the cataloger's knowledge on the specific topics contained in the book. Trying to solve these issues, we exploited the opportunities arising from artificial intelligence to develop Kratt: a prototype of an automatic subject indexing tool. Kratt is able to subject index a book independent of its extent and genre with a set of keywords present in the Estonian Subject Thesaurus. It takes Kratt approximately one minute to subject index a book, outperforming humans 10-15 times. Although the resulting keywords were not considered satisfactory by the catalogers, the ratings of a small sample of regular library users showed more promise. We also argue that the results can be enhanced by including a bigger corpus for training the model and applying more careful preprocessing techniques.
Zhang, Y.; Zhang, C.; Li, J.: Joint modeling of characters, words, and conversation contexts for microblog keyphrase extraction (2020) 0.00
```
3.0023666E-4 = product of:
  0.005104023 = sum of:
    0.005104023 = weight(_text_:in in 5816) [ClassicSimilarity], result of:
      0.005104023 = score(doc=5816,freq=8.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.15028831 = fieldWeight in 5816, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5816)
  0.05882353 = coord(1/17)
```
Abstract

Millions of messages are produced on microblog platforms every day, leading to the pressing need for automatic identification of key points from the massive texts. To absorb salient content from the vast bulk of microblog posts, this article focuses on the task of microblog keyphrase extraction. In previous work, most efforts treat messages as independent documents and might suffer from the data sparsity problem exhibited in short and informal microblog posts. On the contrary, we propose to enrich contexts via exploiting conversations initialized by target posts and formed by their replies, which are generally centered around relevant topics to the target posts and therefore helpful for keyphrase identification. Concretely, we present a neural keyphrase extraction framework, which has 2 modules: a conversation context encoder and a keyphrase tagger. The conversation context encoder captures indicative representation from their conversation contexts and feeds the representation into the keyphrase tagger, and the keyphrase tagger extracts salient words from target posts. The 2 modules were trained jointly to optimize the conversation context encoding and keyphrase extraction processes. In the conversation context encoder, we leverage hierarchical structures to capture the word-level indicative representation and message-level indicative representation hierarchically. In both of the modules, we apply character-level representations, which enables the model to explore morphological features and deal with the out-of-vocabulary problem caused by the informal language style of microblog messages. Extensive comparison results on real-life data sets indicate that our model outperforms state-of-the-art models from previous studies.
Suominen, O.; Koskenniemi, I.: Annif Analyzer Shootout : comparing text lemmatization methods for automated subject indexing (2022) 0.00
```
3.0023666E-4 = product of:
  0.005104023 = sum of:
    0.005104023 = weight(_text_:in in 658) [ClassicSimilarity], result of:
      0.005104023 = score(doc=658,freq=8.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.15028831 = fieldWeight in 658, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=658)
  0.05882353 = coord(1/17)
```
Abstract

Automated text classification is an important function for many AI systems relevant to libraries, including automated subject indexing and classification. When implemented using the traditional natural language processing (NLP) paradigm, one key part of the process is the normalization of words using stemming or lemmatization, which reduces the amount of linguistic variation and often improves the quality of classification. In this paper, we compare the output of seven different text lemmatization algorithms as well as two baseline methods. We measure how the choice of method affects the quality of text classification using example corpora in three languages. The experiments have been performed using the open source Annif toolkit for automated subject indexing and classification, but should generalize also to other NLP toolkits and similar text classification tasks. The results show that lemmatization methods in most cases outperform baseline methods in text classification particularly for Finnish and Swedish text, but not English, where baseline methods are most effective. The differences between lemmatization methods are quite small. The systematic comparison will help optimize text classification pipelines and inform the further development of the Annif toolkit to incorporate a wider choice of normalization methods.
Moulaison-Sandy, H.; Adkins, D.; Bossaller, J.; Cho, H.: An automated approach to describing fiction : a methodology to use book reviews to identify affect (2021) 0.00
```
2.972191E-4 = product of:
  0.005052725 = sum of:
    0.005052725 = weight(_text_:in in 710) [ClassicSimilarity], result of:
      0.005052725 = score(doc=710,freq=4.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.14877784 = fieldWeight in 710, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0546875 = fieldNorm(doc=710)
  0.05882353 = coord(1/17)
```
Abstract

Subject headings and genre terms are notoriously difficult to apply, yet are important for fiction. The current project functions as a proof of concept, using a text-mining methodology to identify affective information (emotion and tone) about fiction titles from professional book reviews as a potential first step in automating the subject analysis process. Findings are presented and discussed, comparing results to the range of aboutness and isness information in library cataloging records. The methodology is likewise presented, and how future work might expand on the current project to enhance catalog records through text-mining is explored.
Villaespesa, E.; Crider, S.: A critical comparison analysis between human and machine-generated tags for the Metropolitan Museum of Art's collection (2021) 0.00
```
2.1229935E-4 = product of:
  0.003609089 = sum of:
    0.003609089 = weight(_text_:in in 341) [ClassicSimilarity], result of:
      0.003609089 = score(doc=341,freq=4.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.10626988 = fieldWeight in 341, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=341)
  0.05882353 = coord(1/17)
```
Abstract

Purpose: Based on the highlights of The Metropolitan Museum of Art's collection, the purpose of this paper is to examine the similarities and differences between the subject keyword tags assigned by the museum and those produced by three computer vision systems. Design/methodology/approach: This paper uses computer vision tools to generate the data and the Getty Research Institute's Art and Architecture Thesaurus (AAT) to compare the subject keyword tags. Findings: This paper finds that there are clear opportunities to use computer vision technologies to automatically generate tags that expand the terms used by the museum. This brings a new perspective to the collection that is different from the traditional art historical one. However, the study also surfaces challenges about the accuracy and lack of context within the computer vision results. Practical implications: This finding has important implications on how these machine-generated tags complement the current taxonomies and vocabularies inputted in the collection database. In consequence, the museum needs to consider the selection process for choosing which computer vision system to apply to their collection. Furthermore, they also need to think critically about the kind of tags they wish to use, such as colors, materials or objects. Originality/value: The study results add to the rapidly evolving field of computer vision within the art information context and provide recommendations of aspects to consider before selecting and implementing these technologies.
Yang, T.-H.; Hsieh, Y.-L.; Liu, S.-H.; Chang, Y.-C.; Hsu, W.-L.: A flexible template generation and matching method with applications for publication reference metadata extraction (2021) 0.00
```
1.5011833E-4 = product of:
  0.0025520115 = sum of:
    0.0025520115 = weight(_text_:in in 63) [ClassicSimilarity], result of:
      0.0025520115 = score(doc=63,freq=2.0), product of:
        0.033961542 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.024967048 = queryNorm
        0.07514416 = fieldWeight in 63, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=63)
  0.05882353 = coord(1/17)
```
Abstract

Conventional rule-based approaches use exact template matching to capture linguistic information and necessarily need to enumerate all variations. We propose a novel flexible template generation and matching scheme called the principle-based approach (PBA) based on sequence alignment, and employ it for reference metadata extraction (RME) to demonstrate its effectiveness. The main contributions of this research are threefold. First, we propose an automatic template generation that can capture prominent patterns using the dominating set algorithm. Second, we devise an alignment-based template-matching technique that uses a logistic regression model, which makes it more general and flexible than pure rule-based approaches. Last, we apply PBA to RME on extensive cross-domain corpora and demonstrate its robustness and generality. Experiments reveal that the same set of templates produced by the PBA framework not only deliver consistent performance on various unseen domains, but also surpass hand-crafted knowledge (templates). We use four independent journal style test sets and one conference style test set in the experiments. When compared to renowned machine learning methods, such as conditional random fields (CRF), as well as recent deep learning methods (i.e., bi-directional long short-term memory with a CRF layer, Bi-LSTM-CRF), PBA has the best performance for all datasets.
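
The score explanations shown with each result appear to follow the classic Lucene TF-IDF arithmetic: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), and the final score is queryWeight × fieldWeight × coord. The sketch below reproduces that calculation for a single matching term. The function name `classic_term_score` and its parameter names are hypothetical, chosen for illustration; `queryNorm`, `fieldNorm` and `coord` are simply copied from the listing rather than derived.

```python
import math


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute a single-term score from the values in an explain block.

    Hypothetical helper: it mirrors the arithmetic visible in the listing,
    not Lucene's actual implementation (which works in 32-bit floats).
    """
    tf = math.sqrt(freq)                              # tf(freq=...)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq=..., maxDocs=...)
    query_weight = idf * query_norm                   # queryWeight
    field_weight = tf * idf * field_norm              # fieldWeight
    return query_weight * field_weight * coord        # product with coord(1/17)


# Values taken from the explain block for doc=718 (term "in").
score = classic_term_score(
    freq=8.0,
    doc_freq=30841,
    max_docs=44218,
    query_norm=0.024967048,
    field_norm=0.0546875,
    coord=1 / 17,
)
print(f"{score:.6e}")
```

Because the sketch uses 64-bit floats, the result should agree with the listed score for that entry only up to rounding in the last displayed digits.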
