This database contains more than 40,000 documents on topics from the fields of descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 4 June 2021)
1. Aletras, N. ; Baldwin, T. ; Lau, J.H. ; Stevenson, M.: Evaluating topic representations for exploring document collections.
In: Journal of the Association for Information Science and Technology. 68(2017) no.1, pp. 154-167.
Abstract: Topic models have been shown to be a useful way of representing the content of large document collections, for example, via visualization interfaces (topic browsers). These systems enable users to explore collections by way of latent topics. A standard way to represent a topic is using a term list; that is the top-n words with highest conditional probability within the topic. Other topic representations such as textual and image labels also have been proposed. However, there has been no comparison of these alternative representations. In this article, we compare 3 different topic representations in a document retrieval task. Participants were asked to retrieve relevant documents based on predefined queries within a fixed time limit, presenting topics in one of the following modalities: (a) lists of terms, (b) textual phrase labels, and (c) image labels. Results show that textual labels are easier for users to interpret than are term lists and image labels. Moreover, the precision of retrieved documents for textual and image labels is comparable to the precision achieved by representing topics using term lists, demonstrating that labeling methods are an effective alternative topic representation.
Content: Cf. http://onlinelibrary.wiley.com/doi/10.1002/asi.23574/full.
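The term-list representation mentioned in the abstract above (the top-n words with the highest conditional probability within a topic) can be illustrated with a minimal sketch. The function name `top_n_terms`, the alphabetical tie-breaking rule, and the example probabilities are assumptions made for illustration; none of them come from the article.

```python
from typing import Dict, List


def top_n_terms(word_probs: Dict[str, float], n: int) -> List[str]:
    """Return the n words with the highest P(word | topic).

    Ties are broken alphabetically (an illustrative choice, not from the article).
    """
    ranked = sorted(word_probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


# Hypothetical topic-word distribution, invented for this example.
example_topic = {
    "library": 0.21,
    "catalog": 0.17,
    "retrieval": 0.12,
    "index": 0.09,
    "music": 0.01,
}

if __name__ == "__main__":
    print(top_n_terms(example_topic, 3))
```

A topic browser would show such a list, or a textual or image label derived separately, as the visible representation of each latent topic.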
2. Gonzalez-Agirre, A. ; Rigau, G. ; Agirre, E. ; Aletras, N. ; Stevenson, M.: Why are these similar? : investigating item similarity types in a large digital library.
In: Journal of the Association for Information Science and Technology. 67(2016) no.7, pp. 1624-1638.
Abstract: We introduce a new problem, identifying the type of relation that holds between a pair of similar items in a digital library. Being able to provide a reason why items are similar has applications in recommendation, personalization, and search. We investigate the problem within the context of Europeana, a large digital library containing items related to cultural heritage. A range of types of similarity in this collection were identified. A set of 1,500 pairs of items from the collection were annotated using crowdsourcing. A high intertagger agreement (average 71.5 Pearson correlation) was obtained and demonstrates that the task is well defined. We also present several approaches to automatically identifying the type of similarity. The best system applies linear regression and achieves a mean Pearson correlation of 71.3, close to human performance. The problem formulation and data set described here were used in a public evaluation exercise, the *SEM shared task on Semantic Textual Similarity. The task attracted the participation of 6 teams, who submitted 14 system runs. All annotations, evaluation scripts, and system runs are freely available.
Content: Cf. http://onlinelibrary.wiley.com/doi/10.1002/asi.23482/abstract.
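The abstract above reports agreement between annotators and between system and human scores as Pearson correlations. As a minimal sketch of that measure, the function below computes the Pearson correlation coefficient for two equally long lists of scores. The function name `pearson` and the sample ratings are hypothetical. Reading the reported value of 71.5 as r ≈ 0.715 is also an assumption.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical similarity ratings from two annotators for five item pairs.
    annotator_a = [4.0, 2.0, 5.0, 1.0, 3.0]
    annotator_b = [5.0, 2.0, 4.0, 1.0, 3.0]
    print(round(pearson(annotator_a, annotator_b), 3))
```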
3. Plaza, L. ; Stevenson, M. ; Díaz, A.: Resolving ambiguity in biomedical text to improve summarization.
In: Information processing and management. 48(2012) no.4, pp. 755-766.
Abstract: Access to the vast body of research literature that is now available on biomedicine and related fields can be improved with automatic summarization. This paper describes a summarization system for the biomedical domain that represents documents as graphs formed from concepts and relations in the UMLS Metathesaurus. This system has to deal with the ambiguities that occur in biomedical documents. We describe a variety of strategies that make use of MetaMap and Word Sense Disambiguation (WSD) to accurately map biomedical documents onto UMLS Metathesaurus concepts. Evaluation is carried out using a collection of 150 biomedical scientific articles from the BioMed Central corpus. We find that using WSD improves the quality of the summaries generated.
Content: Cf. doi:10.1016/j.ipm.2011.09.005.
Subject area: Automatic abstracting