Search (61 results, page 2 of 4)

Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.02

0.016699877 = product of:
  0.08349938 = sum of:
    0.08349938 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
      0.08349938 = score(doc=58,freq=2.0), product of:
        0.1798465 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051357865 = queryNorm
        0.46428138 = fieldWeight in 58, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.09375 = fieldNorm(doc=58)
  0.2 = coord(1/5)

Date: 14. 6.2015 22:12:44

Hauer, M.: Automatische Indexierung (2000) 0.02

0.016699877 = product of:
  0.08349938 = sum of:
    0.08349938 = weight(_text_:22 in 5887) [ClassicSimilarity], result of:
      0.08349938 = score(doc=5887,freq=2.0), product of:
        0.1798465 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051357865 = queryNorm
        0.46428138 = fieldWeight in 5887, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.09375 = fieldNorm(doc=5887)
  0.2 = coord(1/5)

Source: Wissen in Aktion: Wege des Knowledge Managements. 22. Online-Tagung der DGI, Frankfurt am Main, 2.-4.5.2000. Proceedings. Hrsg.: R. Schmidt

Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.02

0.016699877 = product of:
  0.08349938 = sum of:
    0.08349938 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
      0.08349938 = score(doc=2051,freq=2.0), product of:
        0.1798465 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051357865 = queryNorm
        0.46428138 = fieldWeight in 2051, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.09375 = fieldNorm(doc=2051)
  0.2 = coord(1/5)

Date: 14. 6.2015 22:12:56

Hauer, M.: Tiefenindexierung im Bibliothekskatalog : 17 Jahre intelligentCAPTURE (2019) 0.02

0.016699877 = product of:
  0.08349938 = sum of:
    0.08349938 = weight(_text_:22 in 5629) [ClassicSimilarity], result of:
      0.08349938 = score(doc=5629,freq=2.0), product of:
        0.1798465 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051357865 = queryNorm
        0.46428138 = fieldWeight in 5629, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.09375 = fieldNorm(doc=5629)
  0.2 = coord(1/5)

Source: B.I.T.online. 22(2019) H.2, S.163-166

Medelyan, O.; Witten, I.H.: Domain-independent automatic keyphrase indexing with small training sets (2008) 0.01
```
0.014540519 = product of:
  0.072702594 = sum of:
    0.072702594 = weight(_text_:thesaurus in 1871) [ClassicSimilarity], result of:
      0.072702594 = score(doc=1871,freq=2.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.30633712 = fieldWeight in 1871, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.046875 = fieldNorm(doc=1871)
  0.2 = coord(1/5)
```
Abstract

Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain-specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.
Kempf, A.O.: Automatische Indexierung in der sozialwissenschaftlichen Fachinformation : eine Evaluationsstudie zur maschinellen Erschließung für die Datenbank SOLIS (2012) 0.01
```
0.014540519 = product of:
  0.072702594 = sum of:
    0.072702594 = weight(_text_:thesaurus in 903) [ClassicSimilarity], result of:
      0.072702594 = score(doc=903,freq=2.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.30633712 = fieldWeight in 903, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.046875 = fieldNorm(doc=903)
  0.2 = coord(1/5)
```
Abstract

Automatische Indexierungsverfahren werden mit Zunahme der digitalen Verfügbarkeit von Metadaten und Volltexten mehr und mehr als eine mögliche Antwort auf das Management unstrukturierter Daten diskutiert. In der sozialwissenschaftlichen Fachinformation existiert in diesem Zusammenhang seit einiger Zeit der Vorschlag eines sogenannten Schalenmodells (vgl. Krause, 1996) mit unterschiedlichen Qualitätsstufen bei der inhaltlichen Erschließung. Vor diesem Hintergrund beschreibt die Arbeit zunächst Methoden und Verfahren der inhaltlichen und automatischen Indexierung, bevor vier Testläufe eines automatischen Indexierungssystems (MindServer) zur automatischen Erschließung von Datensätzen der bibliographischen Literaturdatenbank SOLIS mit Deskriptoren des Thesaurus Sozialwissenschaften sowie der Klassifikation Sozialwissenschaften beschrieben und analysiert werden. Es erfolgt eine ausführliche Fehleranalyse mit Beispielen sowie eine abschließende Diskussion, inwieweit die automatische Erschließung in dieser Form für die Randbereiche der Datenbank SOLIS für die Zukunft einen gangbaren Weg darstellt.
Asula, M.; Makke, J.; Freienthal, L.; Kuulmets, H.-A.; Sirel, R.: Kratt: developing an automatic subject indexing tool for the National Library of Estonia : how to transfer metadata information among work cluster members (2021) 0.01
```
0.014540519 = product of:
  0.072702594 = sum of:
    0.072702594 = weight(_text_:thesaurus in 723) [ClassicSimilarity], result of:
      0.072702594 = score(doc=723,freq=2.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.30633712 = fieldWeight in 723, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.046875 = fieldNorm(doc=723)
  0.2 = coord(1/5)
```
Abstract

Manual subject indexing in libraries is a time-consuming and costly process and the quality of the assigned subjects is affected by the cataloger's knowledge on the specific topics contained in the book. Trying to solve these issues, we exploited the opportunities arising from artificial intelligence to develop Kratt: a prototype of an automatic subject indexing tool. Kratt is able to subject index a book independent of its extent and genre with a set of keywords present in the Estonian Subject Thesaurus. It takes Kratt approximately one minute to subject index a book, outperforming humans 10-15 times. Although the resulting keywords were not considered satisfactory by the catalogers, the ratings of a small sample of regular library users showed more promise. We also argue that the results can be enhanced by including a bigger corpus for training the model and applying more careful preprocessing techniques.

Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: ¬The automatic indexing system AIR/PHYS : from research to application (1988) 0.01

0.013916564 = product of:
  0.06958282 = sum of:
    0.06958282 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
      0.06958282 = score(doc=1952,freq=2.0), product of:
        0.1798465 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051357865 = queryNorm
        0.38690117 = fieldWeight in 1952, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.078125 = fieldNorm(doc=1952)
  0.2 = coord(1/5)

Date: 16. 8.1998 12:51:22

Tsareva, P.V.: Algoritmy dlya raspoznavaniya pozitivnykh i negativnykh vkhozdenii deskriptorov v tekst i protsedura avtomaticheskoi klassifikatsii tekstov (1999) 0.01

0.013916564 = product of:
  0.06958282 = sum of:
    0.06958282 = weight(_text_:22 in 374) [ClassicSimilarity], result of:
      0.06958282 = score(doc=374,freq=2.0), product of:
        0.1798465 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051357865 = queryNorm
        0.38690117 = fieldWeight in 374, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.078125 = fieldNorm(doc=374)
  0.2 = coord(1/5)

Date: 1. 4.2002 10:22:41

Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.01

0.013916564 = product of:
  0.06958282 = sum of:
    0.06958282 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
      0.06958282 = score(doc=2759,freq=2.0), product of:
        0.1798465 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051357865 = queryNorm
        0.38690117 = fieldWeight in 2759, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.078125 = fieldNorm(doc=2759)
  0.2 = coord(1/5)

Date: 1. 2.2016 18:25:22

Groß, T.; Faden, M.: Automatische Indexierung elektronischer Dokumente an der Deutschen Zentralbibliothek für Wirtschaftswissenschaften : Bericht über die Jahrestagung der Internationalen Buchwissenschaftlichen Gesellschaft (2010) 0.01
```
0.013708933 = product of:
  0.06854466 = sum of:
    0.06854466 = weight(_text_:thesaurus in 4051) [ClassicSimilarity], result of:
      0.06854466 = score(doc=4051,freq=4.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.2888174 = fieldWeight in 4051, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.03125 = fieldNorm(doc=4051)
  0.2 = coord(1/5)
```
Abstract

Mit der Anfang 2010 begonnen Implementierung und Ergebnisevaluierung des automatischen Indexierungsverfahrens "Decisiv Categorization" der Firma Recommind soll das hier skizzierte Informationsstrukturierungsproblem in zwei Schritten gelöst werden. Kurz- bis mittelfristig soll die intellektuelle Indexierung durch ein semiautomatisches Verfahren6 unterstützt werden. Mittel- bis langfristig soll das maschinelle Verfahren, aufbauend auf einem entsprechenden Training, in die Lage versetzt werden, sowohl im Hause vorliegende Dokumente vollautomatisch zu indexieren als auch ZBW-fremde digitale Informationsressourcen zu verschlagworten bzw. zu klassifizieren, um sie in einem gemeinsamen Suchraum auffindbar machen zu können. Im Anschluss an diese Einleitung werden die ersten Ansätze maschineller Sacherschließung an der ZBW (2001-2004) und deren Ergebnisse und Problemlagen aufgezeigt. Danach werden die Rahmenbedingungen (Projektauftrag und -ziel) für eine Wiederaufnahme des Vorhabens im Jahre 2009 aufgezeigt, gefolgt von einer Darstellung der Funktionsweise der Recommind-Technologie und deren Einsatz im Rahmen der Sacherschließung von Online-Dokumenten mit einem Thesaurus. Schwerpunkt dieser Abhandlung bilden im Anschluss daran die Evaluierungsmöglichkeiten automatischer Indexierungsansätze sowie die aktuellen Ergebnisse und zentralen Erkenntnisse des Einsatzes im Kontext der ZBW. Das Fazit beschreibt die entsprechenden Schlussfolgerungen aus den erzielten Ergebnissen sowie den Ausblick auf das weitere Vorgehen.

Object

Standard-Thesaurus Wirtschaft
Chung, Y.M.; Lee, J.Y.: ¬A corpus-based approach to comparative evaluation of statistical term association measures (2001) 0.01
```
0.012117098 = product of:
  0.06058549 = sum of:
    0.06058549 = weight(_text_:thesaurus in 5769) [ClassicSimilarity], result of:
      0.06058549 = score(doc=5769,freq=2.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.2552809 = fieldWeight in 5769, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5769)
  0.2 = coord(1/5)
```
Abstract

Statistical association measures have been widely applied in information retrieval research, usually employing a clustering of documents or terms on the basis of their relationships. Applications of the association measures for term clustering include automatic thesaurus construction and query expansion. This research evaluates the similarity of six association measures by comparing the relationship and behavior they demonstrate in various analyses of a test corpus. Analysis techniques include comparisons of highly ranked term pairs and term clusters, analyses of the correlation among the association measures using Pearson's correlation coefficient and MDS mapping, and an analysis of the impact of a term frequency on the association values by means of z-score. The major findings of the study are as follows: First, the most similar association measures are mutual information and Yule's coefficient of colligation Y, whereas cosine and Jaccard coefficients, as well as X**2 statistic and likelihood ratio, demonstrate quite similar behavior for terms with high frequency. Second, among all the measures, the X**2 statistic is the least affected by the frequency of terms. Third, although cosine and Jaccard coefficients tend to emphasize high frequency terms, mutual information and Yule's Y seem to overestimate rare terms
Strobel, S.: Englischsprachige Erweiterung des TIB / AV-Portals : Ein GND/DBpedia-Mapping zur Gewinnung eines englischen Begriffssystems (2014) 0.01
```
0.012117098 = product of:
  0.06058549 = sum of:
    0.06058549 = weight(_text_:thesaurus in 2876) [ClassicSimilarity], result of:
      0.06058549 = score(doc=2876,freq=2.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.2552809 = fieldWeight in 2876, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2876)
  0.2 = coord(1/5)
```
Abstract

Die Videos des TIB / AV-Portals werden mit insgesamt 63.356 GND-Sachbegriffen aus Naturwissenschaft und Technik automatisch verschlagwortet. Neben den deutschsprachigen Videos verfügt das TIB / AV-Portal auch über zahlreiche englischsprachige Videos. Die GND enthält zu den in der TIB / AV-Portal-Wissensbasis verwendeten Sachbegriffen nur sehr wenige englische Bezeichner. Es fehlt demnach ein englisches Indexierungsvokabular, mit dem die englischsprachigen Videos automatisch verschlagwortet werden können. Die Lösung dieses Problems sieht wie folgt aus: Die englischen Bezeichner sollen über ein Mapping der GND-Sachbegriffe auf andere Datensätze gewonnen werden, die eine englische Übersetzung der Begriffe enthalten. Die verwendeten Mappingstrategien nutzen die DBpedia, LCSH, MACS-Ergebnisse sowie den WTI-Thesaurus. Am Ende haben 35.025 GND-Sachbegriffe (mindestens) einen englischen Bezeichner ermittelt bekommen. Diese englischen Bezeichner können für die automatische Verschlagwortung der englischsprachigen Videos unmittelbar herangezogen werden. 11.694 GND-Sachbegriffe konnten zwar nicht ins Englische "übersetzt", aber immerhin mit einem Oberbegriff assoziiert werden, der eine englische Übersetzung hat. Diese Assoziation dient der Erweiterung der Suchergebnisse.
Groß, T.: Automatische Indexierung von Dokumenten in einer wissenschaftlichen Bibliothek : Implementierung und Evaluierung am Beispiel der Deutschen Zentralbibliothek für Wirtschaftswissenschaften (2011) 0.01
```
0.012117098 = product of:
  0.06058549 = sum of:
    0.06058549 = weight(_text_:thesaurus in 1083) [ClassicSimilarity], result of:
      0.06058549 = score(doc=1083,freq=2.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.2552809 = fieldWeight in 1083, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1083)
  0.2 = coord(1/5)
```
Abstract

Die Bewertung der Indexierungsqualität bzw. -güte ist ein grundlegendes Problem von intellektuellen und automatischen Indexierungsverfahren. Letztere werden aber gerade im digitalen Zeitalter als einzige Möglichkeit angesehen, den zunehmenden Schwierigkeiten bibliothekarischer Informationsstrukturierung gerecht zu werden. Diese Studie befasst sich mit der Funktionsweise, Implementierung und Evaluierung der Sacherschließungssoftware MindServer Categorizer der Firma Recommind an der Deutschen Zentralbibliothek für Wirtschaftswissenschaften. Grundlage der maschinellen Sacherschließung und anschließenden quantitativen und qualitativen Auswertung bilden rund 39.000 wirtschaftswissenschaftliche Dokumente aus den Datenbanken Econis und EconStor. Unter Zuhilfenahme des rund 6.000 Schlagwörter umfassenden Standard-Thesaurus Wirtschaft wird der ursprünglich rein statistische Indexierungsansatz des MindServer Categorizer zu einem begriffsorientierten Verfahren weiterentwickelt und zur Inhaltserschließung digitaler Informationsressourcen eingesetzt. Der zentrale Fokus dieser Studie liegt vor allem auf der Evaluierung der maschinell beschlagworteten Titel, in Anlehnung an die hierzu von Stock und Lancaster vorgeschlagenen Kriterien: Indexierungskonsistenz, -tiefe, -breite, -spezifität, -effektivität. Weiterhin wird die Belegungsbilanz des STW evaluiert und es erfolgt zusätzlich eine qualitative, stichprobenartige Bewertung der Ergebnisse seitens der zuständigen Fachreferenten und -referentinnen.
Groß, T.: Automatische Indexierung von wirtschaftswissenschaftlichen Dokumenten : Implementierung und Evaluierung am Beispiel der Deutschen Zentralbibliothek für Wirtschaftswissenschaften (2010) 0.01
```
0.012117098 = product of:
  0.06058549 = sum of:
    0.06058549 = weight(_text_:thesaurus in 2078) [ClassicSimilarity], result of:
      0.06058549 = score(doc=2078,freq=2.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.2552809 = fieldWeight in 2078, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2078)
  0.2 = coord(1/5)
```
Abstract

Die Bewertung der Indexierungsqualität bzw. -güte ist ein grundlegendes Problem von manuellen und automatischen Indexierungsverfahren. Letztere werden aber gerade im digitalen Zeitalter als einzige Möglichkeit angesehen, den zunehmenden Schwierigkeiten bibliothekarischer Informationsstrukturierung gerecht zu werden. Diese Arbeit befasst sich mit der Funktionsweise, Implementierung und Evaluierung der Sacherschließungssoftware MindServer Categorizer, der Firma Recommind, an der Deutschen Zentralbibliothek für Wirtschaftswissenschaften (ZBW). Grundlage der maschinellen Sacherschließung und anschließenden quantitativen und qualitativen Auswertung bilden rund 39.000 wirtschaftswissenschaftliche Dokumente aus den Datenbanken Econis und EconStor. Unter Zuhilfenahme des rund 6.000 Deskriptoren umfassenden Standard-Thesaurus Wirtschaft (STW) wird der ursprünglich rein statistische Indexierungsansatz des MindServer Categorizer zu einem begriffsorientierten Verfahren weiterentwickelt und zur Inhaltserschließung digitaler Informationsressourcen eingesetzt. Der zentrale Fokus dieser Arbeit liegt vor allem auf der Evaluierung der maschinell beschlagworteten Titel, in Anlehnung und entsprechender Anpassung der von Stock (2008) und Lancaster (2003) hierzu vorgeschlagenen Kriterien: Indexierungskonsistenz, -tiefe, -breite, -spezifität, -effektivität. Zusätzlich wird die Belegungsbilanz des STW evaluiert und es erfolgt ferner eine stichprobenartige, qualitative Bewertung der Ergebnisse seitens der zuständigen Fachreferenten und -referentinnen.
Vlachidis, A.; Tudhope, D.: ¬A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain (2016) 0.01
```
0.012117098 = product of:
  0.06058549 = sum of:
    0.06058549 = weight(_text_:thesaurus in 2895) [ClassicSimilarity], result of:
      0.06058549 = score(doc=2895,freq=2.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.2552809 = fieldWeight in 2895, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2895)
  0.2 = coord(1/5)
```
Abstract

The article presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection, and Word-Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH. Relation Extraction (RE) performance benefits from a syntactic-based definition of RE patterns derived from domain oriented corpus analysis. The evaluation also shows clear benefit in the use of assistive natural language processing (NLP) modules relating to Word-Sense Disambiguation, Negation Detection, and Noun Phrase Validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include recognition of relevant entities using shallow parsing NLP techniques driven by a complimentary use of ontological and terminological domain resources and empirical derivation of context-driven RE rules for the recognition of semantic relationships from phrases of unstructured text.
Villaespesa, E.; Crider, S.: ¬A critical comparison analysis between human and machine-generated tags for the Metropolitan Museum of Art's collection (2021) 0.01
```
0.012117098 = product of:
  0.06058549 = sum of:
    0.06058549 = weight(_text_:thesaurus in 341) [ClassicSimilarity], result of:
      0.06058549 = score(doc=341,freq=2.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.2552809 = fieldWeight in 341, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.0390625 = fieldNorm(doc=341)
  0.2 = coord(1/5)
```
Abstract

Purpose Based on the highlights of The Metropolitan Museum of Art's collection, the purpose of this paper is to examine the similarities and differences between the subject keywords tags assigned by the museum and those produced by three computer vision systems. Design/methodology/approach This paper uses computer vision tools to generate the data and the Getty Research Institute's Art and Architecture Thesaurus (AAT) to compare the subject keyword tags. Findings This paper finds that there are clear opportunities to use computer vision technologies to automatically generate tags that expand the terms used by the museum. This brings a new perspective to the collection that is different from the traditional art historical one. However, the study also surfaces challenges about the accuracy and lack of context within the computer vision results. Practical implications This finding has important implications on how these machine-generated tags complement the current taxonomies and vocabularies inputted in the collection database. In consequence, the museum needs to consider the selection process for choosing which computer vision system to apply to their collection. Furthermore, they also need to think critically about the kind of tags they wish to use, such as colors, materials or objects. Originality/value The study results add to the rapidly evolving field of computer vision within the art information context and provide recommendations of aspects to consider before selecting and implementing these technologies.
Ahmed, M.: Automatic indexing for agriculture : designing a framework by deploying Agrovoc, Agris and Annif (2023) 0.01
```
0.012117098 = product of:
  0.06058549 = sum of:
    0.06058549 = weight(_text_:thesaurus in 1024) [ClassicSimilarity], result of:
      0.06058549 = score(doc=1024,freq=2.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.2552809 = fieldWeight in 1024, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1024)
  0.2 = coord(1/5)
```
Abstract

There are several ways to employ machine learning for automating subject indexing. One popular strategy is to utilize a supervised learning algorithm to train a model on a set of documents that have been manually indexed by subject matter using a standard vocabulary. The resulting model can then predict the subject of new and previously unseen documents by identifying patterns learned from the training data. To do this, the first step is to gather a large dataset of documents and manually assign each document a set of subject keywords/descriptors from a controlled vocabulary (e.g., from Agrovoc). Next, the dataset (obtained from Agris) can be divided into - i) a training dataset, and ii) a test dataset. The training dataset is used to train the model, while the test dataset is used to evaluate the model's performance. Machine learning can be a powerful tool for automating the process of subject indexing. This research is an attempt to apply Annif (http://annif. org/), an open-source AI/ML framework, to autogenerate subject keywords/descriptors for documentary resources in the domain of agriculture. The training dataset is obtained from Agris, which applies the Agrovoc thesaurus as a vocabulary tool (https://www.fao.org/agris/download).
Maas, H.-D.: Indexieren mit AUTINDEX (2006) 0.01
```
0.0119953165 = product of:
  0.05997658 = sum of:
    0.05997658 = weight(_text_:thesaurus in 6077) [ClassicSimilarity], result of:
      0.05997658 = score(doc=6077,freq=4.0), product of:
        0.23732872 = queryWeight, product of:
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.051357865 = queryNorm
        0.25271523 = fieldWeight in 6077, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.6210785 = idf(docFreq=1182, maxDocs=44218)
          0.02734375 = fieldNorm(doc=6077)
  0.2 = coord(1/5)
```
Abstract

Wenn man ein Computerprogramm besitzt, das einem zu fast jedem Textwort dessen grammatische Merkmale bestimmt und außerdem noch seine interne Struktur und einige semantische Informationen liefert, dann fragt man sich irgendwann: Könnte ich nicht auf der Grundlage dieser Angaben einen Text global charakterisieren, etwa indem ich versuche, die wichtigen Wörter dieses Textes zu errechnen? Die häufigsten Textwörter können es nicht sein, denn gerade sie sind sehr nichtssagend. Die seltensten Textwörter sind zwar aussagekräftig, aber sie sind zu viele - die meisten Lemmata eines Textes erscheinen nur ein einziges Mal. Irgendwie müsste man den Wortschatz einschränken können. Die rettende Idee war: Wir tun so, als seien die semantischen Merkmale Wörter, denn dann enthält der Wortschatz dieser Sprache nur noch etwa hundert Elemente, weil unsere morphologische Analyse (Mpro) rund 100 semantische Features verwendet. Wir vermuteten nun, dass die häufig vorkommenden Features wichtig für den Text sind und die selteneren als Ausreißer betrachten werden können. Die Implementierung dieser Idee ist der Urahn unseres Programmpaketes AUTINDEX zur automatischen Indexierung von Texten. Dieses allererste Programm erstellte also zu einem Text eine Statistik der semantischen Merkmale und gab die drei häufigsten Klassen mit den zugehörigen Lemmata aus. Das Ergebnis war verblüffend: Auf den ersten Blick konnte man sehen, worum es in dem Text ging. Bei näherem Hinsehen wurden aber auch Unzulänglichkeiten deutlich. Einige der Schlagwörter waren doch ziemlich nichtssagend, andere hätte man gerne in der Liste gehabt, und schließlich hätte man sich noch eine ganz globale Charakterisierung des Textes durch die Angabe von Fachgebieten gewünscht, etwa in der Form: Der Text hat mit Politik oder Wirtschaft zu tun, er berichtet über einen Unfall, eine Feierlichkeit usw. Es wurde also sofort deutlich, dass das Programm ohne eine weitere Wissensquelle keine wirklich guten Ergebnisse würde liefern können. Man braucht also einen Thesaurus, ein Wörterbuch, in dem einzelne Lemmata und auch mehrwortige Ausdrücke mit zusätzlichen Informationen versehen sind.
Die erste Implementierung wurde in Zusammenarbeit mit dem Fachinformationszentrum Technik (Frankfurt) erstellt. Eine Kontrolle der manuell vergebenen Grob- und Feinklassifizierung der Lexikonartikel des Brockhaus Multimedial und anderer Brockhaus-Lexika wurde mit AUTINDEX in Zusammenarbeit mit BIFAB (Mannheim) durchgeführt. AUTINDEX ist auch Bestandteil des Indexierungs- und Retrievalsystems der Firma AGI (Neustadt/Weinstraße), das in der Landesbibliothek Vorarlberg eingesetzt wird. Weiterhin wird AUTINDEX im System LEWI verwendet, das zusammen mit BIFAB entwickelt wird. Dieses System erlaubt natürlichsprachliche Anfragen an den Brockhaus Multimedial und liefert als Antwort die relevanten Lexikonartikel. Im IAI selbst wurden große Textmengen indexiert (Brockhaus- und Dudenlexika, Zeitungstexte usw.), die man für die Weiterentwicklung diverser Thesauri und Wörterbücher nutzen kann. Beispielsweise kann man sich für ein Wort alle Texte ausgeben lassen, in denen dieses Wort wichtig ist. Dabei sind die Texte nach Wichtigkeit sortiert. Zu einem gegebenen Wort kann man sich auch die Assoziationen oder die möglichen Klassifikationen berechnen lassen. Auf diese Weise kann man einen Thesaurus halbautomatisch erweitern.

Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.01

0.011133251 = product of:
  0.055666253 = sum of:
    0.055666253 = weight(_text_:22 in 4709) [ClassicSimilarity], result of:
      0.055666253 = score(doc=4709,freq=2.0), product of:
        0.1798465 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051357865 = queryNorm
        0.30952093 = fieldWeight in 4709, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.0625 = fieldNorm(doc=4709)
  0.2 = coord(1/5)

Date: 31. 7.1996 9:22:19

Search (61 results, page 2 of 4)

Authors

Years

Languages

Types

Themes

Classifications