This database contains more than 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 21 January 2019)
1. Silvello, G.: Theory and practice of data citation.
In: Journal of the Association for Information Science and Technology. 69(2018) no.1, pp. 6-20.
Abstract: Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as of directing investments in science. Science is increasingly becoming "data-intensive," where large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated data sets. Yet, given a data set, there is no quantitative, consistent, and established way of knowing how it has been used over time, who contributed to its curation, what results it has yielded, or what value it has. The development of a theory and practice of data citation is fundamental for treating data as first-class research objects with the same relevance and centrality as traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining the principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation. The current panorama is many-faceted, and an overall view that brings together the diverse aspects of this topic is still missing. Therefore, this paper aims to describe the lay of the land for data citation, from both the theoretical (the why and what) and the practical (the how) angle.
Content: See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23917/full.
2. Ferro, N.; Silvello, G.: Toward an anatomy of IR system component performances.
In: Journal of the Association for Information Science and Technology. 69(2018) no.2, pp. 187-200.
Abstract: Information retrieval (IR) systems are the prominent means for searching and accessing huge amounts of unstructured information on the web and elsewhere. They are complex systems, made up of many different components interacting together, and evaluation is crucial both to tune and to improve them. Nevertheless, in the current evaluation methodology, there is still no way to determine how much each component contributes to the overall performance and how the components interact. This hampers a deep understanding of IR system behavior and, in turn, prevents us from determining in advance which components are best suited to work together for a specific search task. In this paper, we move the evaluation methodology one step forward by overcoming these barriers and beginning to devise an "anatomy" of IR systems and their internals. In particular, we propose a methodology based on the General Linear Mixed Model (GLMM) and analysis of variance (ANOVA) to develop statistical models able to isolate system variance and component effects as well as their interaction, by relying on a grid of points (GoP) containing all the combinations of the analyzed components. We apply the proposed methodology to the analysis of two relevant search tasks, news search and web search, using standard TREC collections. We analyze the basic set of components typically part of an IR system, namely stop lists, stemmers, n-grams, and IR models. In this way, we derive insights about English text retrieval.
Content: See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23910/full.
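The grid of points (GoP) described in the abstract is the full Cartesian product of the component choices, so that every possible pipeline is run and the GLMM/ANOVA can separate each component's effect. The following is a minimal sketch of building such a grid, assuming hypothetical option names for each component (the paper's actual stop lists, stemmers, n-gram settings, and IR models are not listed here); it does not reproduce the statistical analysis itself.

```python
from itertools import product

# Hypothetical component options; the paper's actual configuration may differ.
STOP_LISTS = ["none", "stop_a", "stop_b"]
STEMMERS = ["none", "stem_a", "stem_b"]
NGRAMS = ["none", "ngram_4"]
IR_MODELS = ["model_a", "model_b", "model_c"]


def grid_of_points(*factors):
    """Return every combination of component choices, one tuple per IR pipeline."""
    return list(product(*factors))


gop = grid_of_points(STOP_LISTS, STEMMERS, NGRAMS, IR_MODELS)
print(len(gop))  # 3 * 3 * 2 * 3 pipelines
print(gop[0])    # first pipeline: first option of every component
```

Because every level of every factor is crossed with every level of the others, the design is fully balanced, which is what allows main effects and interaction effects to be estimated separately.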
3. Silvello, G.: Learning to cite framework : how to automatically construct citations for hierarchical data.
In: Journal of the Association for Information Science and Technology. 68(2017) no.6, pp. 1505-1524.
Abstract: The practice of citation is foundational for the propagation of knowledge along with scientific development, and it is one of the core aspects on which scholarship and scientific publishing rely. Within the broad context of data citation, we focus on the problem of automatically constructing citations for hierarchically structured data. We present the "learning to cite" framework, which enables the automatic construction of human- and machine-readable citations with different levels of coarseness. The main goal is to reduce human intervention on the data to a minimum and to provide a citation system general enough to work on heterogeneous and complex XML data sets. We describe how this framework can be realized by a system for creating citations to single nodes within an XML data set and, as a use case, show how it can be applied in the context of digital archives. We conduct an extensive evaluation of the proposed citation system by analyzing its effectiveness from the correctness and completeness viewpoints, showing that it represents a suitable solution that can be easily employed in real-world environments and that reduces human intervention on data to a minimum.
Content: See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23774/full.
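To make "citations to single nodes within an XML data set" with "different levels of coarseness" concrete, here is a deliberately simple, rule-based sketch: it cites a node by walking up to the root and joining the titles of its ancestors, with a hypothetical `depth` parameter controlling how many levels are kept. This is not the learning-to-cite framework, which learns citation models rather than applying a fixed rule; the element names, the `title` and `id` attributes, and the sample archive are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Toy archival description; element names and attributes are hypothetical.
XML = """
<fonds title="Family Archive">
  <series title="Correspondence">
    <file title="Letters 1850-1860">
      <item id="i42" title="Letter no. 7"/>
    </file>
  </series>
</fonds>
"""


def cite_node(root, node_id, depth=None):
    """Build a citation for the node with the given id from its ancestor titles.

    depth keeps only the nearest `depth` levels (a coarser citation);
    None keeps the full path from the root.
    """
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent = {child: p for p in root.iter() for child in p}
    node = root.find(f".//*[@id='{node_id}']")
    if node is None:
        raise KeyError(node_id)
    path = []
    while node is not None:
        path.append(node.get("title"))
        node = parent.get(node)
    path.reverse()  # root first, cited node last
    if depth is not None:
        path = path[max(len(path) - depth, 0):]
    return ", ".join(path)


root = ET.fromstring(XML.strip())
print(cite_node(root, "i42"))
print(cite_node(root, "i42", depth=2))
```

A fixed rule like this breaks down on heterogeneous data sets where the relevant metadata sits in different elements, which is the gap a learned approach is meant to close.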
4. Ferro, N.; Silvello, G.; Keskustalo, H.; Pirkola, A.; Järvelin, K.: The twist measure for IR evaluation : taking user's effort into account.
In: Journal of the Association for Information Science and Technology. 67(2016) no.3, pp. 620-648.
Abstract: We present a novel measure for ranking evaluation, called Twist (τ). It is a measure for informational intents, which handles both binary and graded relevance. τ stems from the observation that searching is currently taken for granted and it is natural for users to assume that search engines are available and work well. As a consequence, users may take the utility they gain from finding relevant documents, which is the focus of traditional measures, for granted. On the contrary, they may feel uneasy when the system returns nonrelevant documents, because they are then forced to do additional work to get the desired information, and this causes avoidable effort. The latter is the focus of τ, which evaluates the effectiveness of a system from the point of view of the effort users must spend to retrieve the desired information. We provide a formal definition of τ, a demonstration of its properties, and introduce the notion of effort/gain plots, which complement traditional utility-based measures. By means of an extensive experimental evaluation, τ is shown to grasp different aspects of system performance, not to require extensive and costly assessments, and to be a robust tool for detecting differences between systems.
Content: See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23416/abstract.
5. Ferro, N.; Silvello, G.: NESTOR: a formal model for digital archives.
In: Information processing and management. 49(2013) no.6, pp. 1206-1240.
Abstract: Archives are an extremely valuable part of our cultural heritage, since they represent the trace of the activities of a natural or legal person in the course of their business. Despite their importance, the models and technologies that have been developed over the past two decades in the Digital Library (DL) field have not been specifically tailored to archives. This is especially true of formal and foundational frameworks, such as the Streams, Structures, Spaces, Scenarios, Societies (5S) model. Therefore, we propose an innovative formal model for archives, called NEsted SeTs for Object hieRarchies (NESTOR), explicitly built around the concepts of context and hierarchy, which play a central role in the archival realm. NESTOR is composed of two set-based data models, the Nested Sets Model (NS-M) and the Inverse Nested Sets Model (INS-M), which express the hierarchical relationships between objects through the inclusion property between sets. We formally study the properties of these models and prove their equivalence with the notion of hierarchy entailed by archives. We then use NESTOR to extend the 5S model in order to take into account the specific features of archives and to tailor the notion of digital library accordingly. This opens up the full wealth of DL methods and technologies to archives. We demonstrate the impact of NESTOR on this problem through three example use cases.
Content: See: doi: 10.1016/j.ipm.2013.05.001.
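The idea of expressing hierarchy "through the inclusion property between sets" can be illustrated with a small sketch in the spirit of the Nested Sets Model: each node becomes the set of itself and all its descendants, so "a is an ancestor of b" reduces to a proper-subset test, and any two sets are either disjoint or nested. This is a simplified reading for illustration, not NESTOR's formal definition; the tree and node names are hypothetical, and the Inverse Nested Sets Model is not shown.

```python
from typing import Dict, FrozenSet, List

# Toy archival hierarchy as parent -> children; node names are hypothetical.
TREE: Dict[str, List[str]] = {
    "fonds": ["series1", "series2"],
    "series1": ["file1"],
    "series2": [],
    "file1": ["item1", "item2"],
    "item1": [],
    "item2": [],
}


def nested_sets(tree: Dict[str, List[str]], root: str) -> Dict[str, FrozenSet[str]]:
    """Map each node to the set containing itself and all of its descendants."""
    sets: Dict[str, FrozenSet[str]] = {}

    def build(node: str) -> FrozenSet[str]:
        members = {node}
        for child in tree[node]:
            members |= build(child)
        sets[node] = frozenset(members)
        return sets[node]

    build(root)
    return sets


def is_ancestor(sets: Dict[str, FrozenSet[str]], a: str, b: str) -> bool:
    """a is a proper ancestor of b iff b's set is a proper subset of a's set."""
    return sets[b] < sets[a]


def is_laminar(sets: Dict[str, FrozenSet[str]]) -> bool:
    """Check that any two sets are either disjoint or one contains the other."""
    vals = list(sets.values())
    return all(
        x <= y or y <= x or not (x & y)
        for i, x in enumerate(vals)
        for y in vals[i + 1:]
    )


sets = nested_sets(TREE, "fonds")
print(is_ancestor(sets, "fonds", "item2"))
print(is_ancestor(sets, "series2", "item1"))
print(is_laminar(sets))
```

The laminar check captures why set inclusion can stand in for a tree: overlapping sets that are not nested would correspond to a node with two unrelated parents, which a strict archival hierarchy does not allow.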
Subject area: Electronic publishing