This database contains over 40,000 documents on topics in the fields of descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 4 June 2021)
1. Yigit-Sert, S. ; Altingovde, I.S. ; Macdonald, C. ; Ounis, I. ; Ulusoy, Ö.: Explicit diversification of search results across multiple dimensions for educational search.
In: Journal of the Association for Information Science and Technology. 72(2021) no.3, pp.315-330.
Abstract: Making use of search systems to foster learning is an emerging research trend known as search as learning. Earlier works identified result diversification as a useful technique to support learning-oriented search, since diversification ensures a comprehensive coverage of various aspects of the queried topic in the result list. Inspired by this finding, first we define a new research problem, multidimensional result diversification, in the context of educational search. We argue that in a search engine for the education domain, it is necessary to diversify results across multiple dimensions, that is, not only for the topical aspects covered by the retrieved documents, but also for other dimensions, such as the type of the document (e.g., text, video, etc.) or its intellectual level (say, for beginners/experts). Second, we propose a framework that extends the probabilistic and supervised diversification methods to take into account the coverage of such multiple dimensions. We demonstrate its effectiveness upon a newly developed test collection based on a real-life educational search engine. Thorough experiments based on gathered relevance annotations reveal that the proposed framework outperforms the baseline by up to 2.4%. An alternative evaluation utilizing user clicks also yields improvements of up to 2% w.r.t. various metrics.
Content: cf. https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24403.
2. Gray, A.J.G. ; Gray, N. ; Hall, C.W. ; Ounis, I.: Finding the right term : retrieving and exploring semantic concepts in astronomical vocabularies.
In: Information processing and management. 46(2010) no.4, pp.470-478.
Abstract: Astronomy, like many domains, already has several sets of terminology in general use, referred to as controlled vocabularies. For example, the keywords for tagging journal articles, or the taxonomy of terms used to label image files. These existing vocabularies can be encoded into SKOS, a W3C proposed recommendation for representing vocabularies on the Semantic Web, so that computer systems can help users to search for and discover resources tagged with vocabulary concepts. However, this requires a search mechanism to go from a user-supplied string to a vocabulary concept. In this paper, we present our experiences in implementing the Vocabulary Explorer, a vocabulary search service based on the Terrier Information Retrieval Platform. We investigate the capabilities of existing document weighting models for identifying the correct vocabulary concept for a query. Due to the highly structured nature of a SKOS-encoded vocabulary, we investigate the effects of term weighting (boosting the score of concepts that match on particular fields of a vocabulary concept), and query expansion. We found that the existing document weighting models provided very high quality results, but these could be improved further with the use of term weighting that makes use of the semantic evidence.
3. Lioma, C. ; Ounis, I.: A syntactically-based query reformulation technique for information retrieval.
In: Information processing and management. 44(2008) no.1, pp.143-162.
Abstract: Whereas in language words of high frequency are generally associated with low content [Bookstein, A., & Swanson, D. (1974). Probabilistic models for automatic indexing. Journal of the American Society of Information Science, 25(5), 312-318; Damerau, F. J. (1965). An experiment in automatic indexing. American Documentation, 16, 283-289; Harter, S. P. (1974). A probabilistic approach to automatic keyword indexing. PhD thesis, University of Chicago; Sparck-Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28, 11-21; Yu, C., & Salton, G. (1976). Precision weighting - an effective automatic indexing method. Journal of the Association for Computing Machinery (ACM), 23(1), 76-88], shallow syntactic fragments of high frequency generally correspond to lexical fragments of high content [Lioma, C., & Ounis, I. (2006). Examining the content load of part of speech blocks for information retrieval. In Proceedings of the international committee on computational linguistics and the association for computational linguistics (COLING/ACL 2006), Sydney, Australia]. We implement this finding to Information Retrieval, as follows. We present a novel automatic query reformulation technique, which is based on shallow syntactic evidence induced from various language samples, and used to enhance the performance of an Information Retrieval system. Firstly, we draw shallow syntactic evidence from language samples of varying size, and compare the effect of language sample size upon retrieval performance, when using our syntactically-based query reformulation (SQR) technique. Secondly, we compare SQR to a state-of-the-art probabilistic pseudo-relevance feedback technique. Additionally, we combine both techniques and evaluate their compatibility. We evaluate our proposed technique across two standard Text REtrieval Conference (TREC) English test collections, and three statistically different weighting models. Experimental results suggest that SQR markedly enhances retrieval performance, and is at least comparable to pseudo-relevance feedback. Notably, the combination of SQR and pseudo-relevance feedback further enhances retrieval performance considerably. These collective experimental results confirm the tenet that high frequency shallow syntactic fragments correspond to content-bearing lexical fragments.
4. Cacheda, F. ; Carneiro, V. ; Plachouras, V. ; Ounis, I.: Performance analysis of distributed information retrieval architectures using an improved network simulation model.
In: Information processing and management. 43(2007) no.1, pp.204-224.
Abstract: The increasing number of documents that have to be indexed in different environments, particularly on the Web, and the lack of scalability of a single centralised index lead to the use of distributed information retrieval systems to effectively search for and locate the required information. In this study, we present several improvements over the two main bottlenecks in a distributed information retrieval system (the network and the brokers). We extend a simulation network model in order to represent a switched network. The new simulation model is validated by comparing the estimated response times with those obtained using a real system. We show that the use of a switched network reduces the saturation of the interconnection network, especially in a replicated system, and some improvements may be achieved using multicast messages and faster connections with the brokers. We also demonstrate that reducing the partial results sets will improve the response time of a distributed system by 53%, with a negligible probability of changing the system's precision and recall values. Finally, we present a simple hierarchical distributed broker model that will reduce the response times for a distributed system by 55%.
5. He, B. ; Ounis, I.: Combining fields for query expansion and adaptive query expansion.
In: Information processing and management. 43(2007) no.5, pp.1294-1307.
Abstract: In this paper, we aim to improve query expansion for ad-hoc retrieval, by proposing a more fine-grained term reweighting process. This fine-grained process uses statistics from the representation of documents in various fields, such as their titles, the anchor text of their incoming links, and their body content. The contribution of this paper is twofold: First, we propose a novel query expansion mechanism on fields by combining field evidence available in a corpus. Second, we propose an adaptive query expansion mechanism that selects an appropriate collection resource, either the local collection, or a high-quality external resource, for query expansion on a per-query basis. The two proposed query expansion approaches are thoroughly evaluated using two standard Text Retrieval Conference (TREC) Web collections, namely the WT10G collection and the large-scale .GOV2 collection. From the experimental results, we observe a statistically significant improvement compared with the baselines. Moreover, we conclude that the adaptive query expansion mechanism is very effective when the external collection used is much larger than the local collection.
6. Chang, Y. ; Ounis, I. ; Kim, M.: Query reformulation using automatically generated query concepts from a document space.
In: Information processing and management. 42(2006) no.2, pp.453-468.
Abstract: We propose a new query reformulation approach, using a set of query concepts that are introduced to precisely denote the user's information need. Since a document collection is considered to be a domain which includes latent primitive concepts, we identify those concepts through a local pattern discovery and a global modeling using data mining techniques. For a new query, we select its most associated primitive concepts and choose the most probable interpretations as query concepts. We discuss the issue of constructing the primitive concepts from either the whole corpus or from the retrieved set of documents. Our experiments are performed on the TREC8 collection. The experimental evaluation shows that our approach is as good as current query reformulation approaches, while being particularly effective for poorly performing queries. Moreover, we find that the approach using the primitive concepts generated from the set of retrieved documents leads to the most effective performance.