Diese Datenbank enthält über 40.000 Dokumente zu Themen aus den Bereichen Formalerschließung – Inhaltserschließung – Information Retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (Stand: 28. April 2022)
1Huang, L. ; Milne, D. ; Frank, E. ; Witten, I.H.: Learning a concept-based document similarity measure.
In: Journal of the American Society for Information Science and Technology. 63(2012) no.8, S.1593-1608.
Abstract: Document similarity measures are crucial components of many text-analysis tasks, including information retrieval, document classification, and document clustering. Conventional measures are brittle: They estimate the surface overlap between documents based on the words they mention and ignore deeper semantic connections. We propose a new measure that assesses similarity at both the lexical and semantic levels, and learns from human judgments how to combine them by using machine-learning techniques. Experiments show that the new measure produces values for documents that are more consistent with people's judgments than people are with each other. We also use it to classify and cluster large document sets covering different genres and topics, and find that it improves both classification and clustering performance.
Themenfeld: Semantisches Umfeld in Indexierung u. Retrieval
2Frank, E. ; Paynter, G.W.: Predicting Library of Congress Classifications from Library of Congress Subject Headings.
In: Journal of the American Society for Information Science and technology. 55(2004) no.3, S.214-227.
Abstract: This paper addresses the problem of automatically assigning a Library of Congress Classification (LCC) to a work given its set of Library of Congress Subject Headings (LCSH). LCCs are organized in a tree: The root node of this hierarchy comprises all possible topics, and leaf nodes correspond to the most specialized topic areas defined. We describe a procedure that, given a resource identified by its LCSH, automatically places that resource in the LCC hierarchy. The procedure uses machine learning techniques and training data from a large library catalog to learn a model that maps from sets of LCSH to classifications from the LCC tree. We present empirical results for our technique showing its accuracy an an independent collection of 50,000 LCSH/LCC pairs.
Themenfeld: Automatisches Klassifizieren
Objekt: LCC ; LCSH