This database contains more than 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 4 June 2021)
1. Kumpulainen, S. ; Keskustalo, H. ; Zhang, B. ; Stefanidis, K.: Historical reasoning in authentic research tasks : mapping cognitive and document spaces.
In: Journal of the Association for Information Science and Technology. 71(2020) no.2, pp. 230-241.
Abstract: To support historians in their work, we need to understand their work-related needs and propose what is required to support those needs. Although the quantity of digitized historical documents available is increasing, historians' ways of working with the digital documents have not been widely studied, particularly in authentic work settings. To better support the historians' reasoning processes, we investigate history researchers' work tasks as the context of information interaction and examine their cognitive access points into information. The analysis is based on a longitudinal observational research and interviews in a task-based research setting. Based on these findings in the historians' cognitive space, we build bridges into the document space. By studying the information interactions in real task contexts, we facilitate the provision of task-specific handles into documents that can be used in designing digital research tools for historians.
Content: Cf.: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24216.
2. Ferro, N. ; Silvello, G. ; Keskustalo, H. ; Pirkola, A. ; Järvelin, K.: ¬The twist measure for IR evaluation : taking user's effort into account.
In: Journal of the Association for Information Science and Technology. 67(2016) no.3, pp. 620-648.
Abstract: We present a novel measure for ranking evaluation, called Twist (t). It is a measure for informational intents, which handles both binary and graded relevance. t stems from the observation that searching is currently taken for granted and it is natural for users to assume that search engines are available and work well. As a consequence, users may assume the utility they have in finding relevant documents, which is the focus of traditional measures, as granted. On the contrary, they may feel uneasy when the system returns nonrelevant documents because they are then forced to do additional work to get the desired information, and this causes avoidable effort. The latter is the focus of t, which evaluates the effectiveness of a system from the point of view of the effort required of users to retrieve the desired information. We provide a formal definition of t, a demonstration of its properties, and introduce the notion of effort/gain plots, which complement traditional utility-based measures. By means of an extensive experimental evaluation, t is shown to grasp different aspects of system performance, to not require extensive and costly assessments, and to be a robust tool for detecting differences between systems.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23416/abstract.
3. Järvelin, A. ; Keskustalo, H. ; Sormunen, E. ; Saastamoinen, M. ; Kettunen, K.: Information retrieval from historical newspaper collections in highly inflectional languages : a query expansion approach.
In: Journal of the Association for Information Science and Technology. 67(2016) no.12, pp. 2928-2946.
Abstract: The aim of the study was to test whether query expansion by approximate string matching methods is beneficial in retrieval from historical newspaper collections in a language rich with compounds and inflectional forms (Finnish). First, approximate string matching methods were used to generate lists of index words most similar to contemporary query terms in a digitized newspaper collection from the 1800s. Top index word variants were categorized to estimate the appropriate query expansion ranges in the retrieval test. Second, the effectiveness of approximate string matching methods, automatically generated inflectional forms, and their combinations were measured in a Cranfield-style test. Finally, a detailed topic-level analysis of test results was conducted. In the index of historical newspaper collection the occurrences of a word typically spread to many linguistic and historical variants along with optical character recognition (OCR) errors. All query expansion methods improved the baseline results. Extensive expansion of around 30 variants for each query word was required to achieve the highest performance improvement. Query expansion based on approximate string matching was superior to using the inflectional forms of the query words, showing that coverage of the different types of variation is more important than precision in handling one type of variation.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23379/full.
Subject area: Computational linguistics ; Semantic environment in indexing and retrieval
Form covered: Newspapers
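A minimal sketch of the query expansion idea described in the abstract of entry 3: ranking index words by approximate string similarity to a query term and keeping the top variants. It assumes character-bigram Dice similarity as the matching method, which the abstract does not specify. The function names and sample index words are hypothetical, and the default of 30 variants only echoes the "around 30 variants" mentioned in the abstract.

```python
def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word, padded with one space on each side."""
    padded = f" {word.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b):
    """Dice coefficient between two n-gram sets (0.0 if either set is empty)."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def expand_query_term(term, index_words, k=30):
    """Return up to k index words most similar to the query term.

    Words with zero similarity are dropped; ties are broken alphabetically.
    """
    term_grams = char_ngrams(term)
    scored = [(dice(term_grams, char_ngrams(w)), w) for w in index_words]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [w for score, w in scored[:k] if score > 0]


# Hypothetical index containing an inflected form and an OCR-style misspelling.
SAMPLE_INDEX = ["sanomalehti", "sanomalehden", "sanomalehtl", "kirja", "hevonen"]
print(expand_query_term("sanomalehti", SAMPLE_INDEX))
# ['sanomalehti', 'sanomalehtl', 'sanomalehden']
```

In this toy index, the misspelled variant and the inflected form are both retrieved without any morphological analysis, which illustrates why a string-similarity approach can cover several kinds of variation at once.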
4. Lehtokangas, R. ; Keskustalo, H. ; Järvelin, K.: Experiments with transitive dictionary translation and pseudo-relevance feedback using graded relevance assessments.
In: Journal of the American Society for Information Science and Technology. 59(2008) no.3, pp. 476-488.
Abstract: In this article, the authors present evaluation results for transitive dictionary-based cross-language information retrieval (CLIR) using graded relevance assessments in a best match retrieval environment. A text database containing newspaper articles and a related set of 35 search topics were used in the tests. Source language topics (in English, German, and Swedish) were automatically translated into the target language (Finnish) via an intermediate (or pivot) language. Effectiveness of the transitively translated queries was compared to that of the directly translated and monolingual Finnish queries. Pseudo-relevance feedback (PRF) was also used to expand the original transitive target queries. Cross-language information retrieval performance was evaluated on three relevance thresholds: stringent, regular, and liberal. The transitive translations performed well achieving, on the average, 85-93% of the direct translation performance, and 66-72% of monolingual performance. Moreover, PRF was successful in raising the performance of transitive translation routes in absolute terms as well as in relation to monolingual and direct translation performance applying PRF.
Subject area: Multilingual problems
5. Toivonen, J. ; Pirkola, A. ; Keskustalo, H. ; Visala, K. ; Järvelin, K.: Translating cross-lingual spelling variants using transformation rules.
In: Information processing and management. 41(2005) no.4, pp. 859-872.
Abstract: Technical terms and proper names constitute a major problem in dictionary-based cross-language information retrieval (CLIR). However, technical terms and proper names in different languages often share the same Latin or Greek origin, being thus spelling variants of each other. In this paper we present a novel two-step fuzzy translation technique for cross-lingual spelling variants. In the first step, transformation rules are applied to source words to render them more similar to their target language equivalents. The rules are generated automatically using translation dictionaries as source data. In the second step, the intermediate forms obtained in the first step are translated into a target language using fuzzy matching. The effectiveness of the technique was evaluated empirically using five source languages and English as a target language. The two-step technique performed better, in some cases considerably better, than fuzzy matching alone. Even using the first step as such showed promising results.
Subject area: Multilingual problems
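The two-step technique outlined in the abstract of entry 5 can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the two transformation rules are hand-written and hypothetical (the paper generates its rules automatically from translation dictionaries), the target vocabulary is invented, and Python's `difflib` similarity stands in for whatever fuzzy matching method the paper uses.

```python
import difflib
import re

# Hypothetical hand-written rules (regex pattern, replacement); the paper
# derives its rules automatically from dictionaries, which is not shown here.
RULES = [(r"k", "c"), (r"ie$", "y")]


def apply_rules(word, rules=RULES):
    """Step 1: rewrite the source word towards target-language spelling."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


def fuzzy_match(word, vocabulary, cutoff=0.8):
    """Step 2: return the closest vocabulary word above the cutoff, or None."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def two_step_translate(word, vocabulary, rules=RULES, cutoff=0.8):
    """Apply the transformation rules, then fuzzy-match the intermediate form."""
    return fuzzy_match(apply_rules(word.lower(), rules), vocabulary, cutoff)


# Hypothetical target-language word list.
TARGET_VOCAB = ["electrolysis", "electricity", "election", "biology"]
print(two_step_translate("elektrolyse", TARGET_VOCAB))  # electrolysis
print(fuzzy_match("elektrolyse", TARGET_VOCAB))         # None (fuzzy matching alone, same cutoff)
```

In this toy case the rule step turns "elektrolyse" into "electrolyse", which is close enough to clear the similarity cutoff, whereas the untransformed word falls just below it. The cutoff of 0.8 is an arbitrary choice made for the example.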
7. Järvelin, K. ; Kristensen, J. ; Niemi, T. ; Sormunen, E. ; Keskustalo, H.: ¬A deductive data model for query expansion.
In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR '96), Zürich, Switzerland, August 18-22, 1996. Eds.: H.P. Frei et al. New York, NY : ACM, 1996. pp. 235-243.
Abstract: We present a deductive data model for concept-based query expansion. It is based on three abstraction levels: the conceptual, linguistic and occurrence levels. Concepts and relationships among them are represented at the conceptual level. The expression level represents natural language expressions for concepts. Each expression has one or more matching models at the occurrence level. Each model specifies the matching of the expression in database indices built in varying ways. The data model supports a concept-based query expansion and formulation tool, the ExpansionTool, for environments providing heterogeneous IR systems. Expansion is controlled by adjustable matching reliability.
Subject area: Semantic environment in indexing and retrieval ; Knowledge representation