This database contains more than 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 28 April 2022)
1. Ferro, N. ; Silvello, G.: Toward an anatomy of IR system component performances.
In: Journal of the Association for Information Science and Technology. 69(2018) no.2, pp. 187-200.
Abstract: Information retrieval (IR) systems are the prominent means for searching and accessing huge amounts of unstructured information on the web and elsewhere. They are complex systems, constituted by many different components interacting together, and evaluation is crucial to both tune and improve them. Nevertheless, in the current evaluation methodology, there is still no way to determine how much each component contributes to the overall performances and how the components interact together. This hampers the possibility of a deep understanding of IR system behavior and, in turn, prevents us from designing ahead which components are best suited to work together for a specific search task. In this paper, we move the evaluation methodology one step forward by overcoming these barriers and beginning to devise an "anatomy" of IR systems and their internals. In particular, we propose a methodology based on the General Linear Mixed Model (GLMM) and analysis of variance (ANOVA) to develop statistical models able to isolate system variance and component effects as well as their interaction, by relying on a grid of points (GoP) containing all the combinations of the analyzed components. We apply the proposed methodology to the analysis of two relevant search tasks, news search and web search, by using standard TREC collections. We analyze the basic set of components typically part of an IR system, namely, stop lists, stemmers and n-grams, and IR models. In this way, we derive insights about English text retrieval.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23910/full.
2. Angelini, M. ; Fazzini, V. ; Ferro, N. ; Santucci, G. ; Silvello, G.: CLAIRE: A combinatorial visual analytics system for information retrieval evaluation.
In: Information processing and management. 54(2018) no.6, pp. 1077-1100.
Abstract: Information Retrieval (IR) develops complex systems, constituted of several components, which aim at returning and optimally ranking the most relevant documents in response to user queries. In this context, experimental evaluation plays a central role, since it allows for measuring IR systems effectiveness, increasing the understanding of their functioning, and better directing the efforts for improving them. Current evaluation methodologies are limited by two major factors: (i) IR systems are evaluated as "black boxes", since it is not possible to decompose the contributions of the different components, e.g., stop lists, stemmers, and IR models; (ii) given that it is not possible to predict the effectiveness of an IR system, both academia and industry need to explore huge numbers of systems, originated by large combinatorial compositions of their components, to understand how they perform and how these components interact together. We propose a Combinatorial visuaL Analytics system for Information Retrieval Evaluation (CLAIRE) which allows for exploring and making sense of the performances of a large amount of IR systems, in order to quickly and intuitively grasp which system configurations are preferred, what are the contributions of the different components and how these components interact together. The CLAIRE system is then validated against use cases based on several test collections using a wide set of systems, generated by a combinatorial composition of several off-the-shelf components, representing the most common denominator almost always present in English IR systems. In particular, we validate the findings enabled by CLAIRE with respect to consolidated deep statistical analyses and we show that the CLAIRE system allows the generation of new insights, which were not detectable with traditional approaches.
Content: Cf.: https://doi.org/10.1016/j.ipm.2018.04.006.
3. Ferro, N. ; Silvello, G. ; Keskustalo, H. ; Pirkola, A. ; Järvelin, K.: The twist measure for IR evaluation : taking user's effort into account.
In: Journal of the Association for Information Science and Technology. 67(2016) no.3, pp. 620-648.
Abstract: We present a novel measure for ranking evaluation, called Twist (t). It is a measure for informational intents, which handles both binary and graded relevance. t stems from the observation that searching is currently taken for granted and it is natural for users to assume that search engines are available and work well. As a consequence, users may assume the utility they have in finding relevant documents, which is the focus of traditional measures, as granted. On the contrary, they may feel uneasy when the system returns nonrelevant documents because they are then forced to do additional work to get the desired information, and this causes avoidable effort. The latter is the focus of t, which evaluates the effectiveness of a system from the point of view of the effort required to the users to retrieve the desired information. We provide a formal definition of t, a demonstration of its properties, and introduce the notion of effort/gain plots, which complement traditional utility-based measures. By means of an extensive experimental evaluation, t is shown to grasp different aspects of system performances, to not require extensive and costly assessments, and to be a robust tool for detecting differences between systems.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23416/abstract.
4. Ferro, N. ; Silvello, G.: NESTOR: a formal model for digital archives.
In: Information processing and management. 49(2013) no.6, pp. 1206-1240.
Abstract: Archives are an extremely valuable part of our cultural heritage since they represent the trace of the activities of a physical or juridical person in the course of their business. Despite their importance, the models and technologies that have been developed over the past two decades in the Digital Library (DL) field have not been specifically tailored to archives. This is especially true when it comes to formal and foundational frameworks, as the Streams, Structures, Spaces, Scenarios, Societies (5S) model is. Therefore, we propose an innovative formal model, called NEsted SeTs for Object hieRarchies (NESTOR), for archives, explicitly built around the concepts of context and hierarchy which play a central role in the archival realm. NESTOR is composed of two set-based data models: the Nested Sets Model (NS-M) and the Inverse Nested Sets Model (INS-M) that express the hierarchical relationships between objects through the inclusion property between sets. We formally study the properties of these models and prove their equivalence with the notion of hierarchy entailed by archives. We then use NESTOR to extend the 5S model in order to take into account the specific features of archives and to tailor the notion of digital library accordingly. This offers the possibility of opening up the full wealth of DL methods and technologies to archives. We demonstrate the impact of NESTOR on this problem through three example use cases.
Content: Cf.: doi: 10.1016/j.ipm.2013.05.001.
Subject field: Electronic publishing
5. Agosti, M. ; Braschler, M. ; Ferro, N. ; Peters, C. ; Siebinga, S.: Roadmap for multiLingual information access in the European Library.
In: Research and advanced technology for digital libraries : 11th European conference, ECDL 2007 / Budapest, Hungary, September 16-21, 2007, proceedings. Eds.: L. Kovacs et al. Berlin : Springer, 2007. pp. 136-147.
(Lecture notes in computer science ; vol. 4675)
Abstract: The paper studies the problem of implementing MultiLingual Information Access (MLIA) functionality in The European Library (TEL). The issues that must be considered are described in detail and the results of a preliminary feasibility study are presented. The paper concludes by discussing the difficulties inherent in attempting to provide a realistic full-scale MLIA solution and proposes a roadmap aimed at determining whether this is in fact possible.
Subject field: Multilingual issues
Object: TEL ; EDL
6. Bacchin, M. ; Ferro, N. ; Melucci, M.: A probabilistic model for stemmer generation.
In: Information processing and management. 41(2005) no.1, pp. 121-137.
Abstract: In this paper we present a language-independent probabilistic model which can automatically generate stemmers. Stemmers can improve the retrieval effectiveness of information retrieval systems; however, the design and implementation of stemmers require a laborious amount of effort because documents and queries are often written or spoken in several different languages. The probabilistic model proposed in this paper aims at the development of stemmers used for several languages. The proposed model describes the mutual reinforcement relationship between stems and derivations and then provides a probabilistic interpretation. A series of experiments shows that the stemmers generated by the probabilistic model are as effective as the ones based on linguistic knowledge.