This database contains more than 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 28 April 2022)
1. Pan, M. ; Huang, J.X. ; He, T. ; Mao, Z. ; Ying, Z. ; Tu, X.: A simple kernel co-occurrence-based enhancement for pseudo-relevance feedback.
In: Journal of the Association for Information Science and Technology. 71(2020) no.3, pp. 264-281.
Abstract: Pseudo-relevance feedback is a well-studied query expansion technique in which it is assumed that the top-ranked documents in an initial set of retrieval results are relevant and expansion terms are then extracted from those documents. When selecting expansion terms, most traditional models do not simultaneously consider term frequency and the co-occurrence relationships between candidate terms and query terms. Intuitively, however, a term that has a higher co-occurrence with a query term is more likely to be related to the query topic. In this article, we propose a kernel co-occurrence-based framework to enhance retrieval performance by integrating term co-occurrence information into the Rocchio model and a relevance language model (RM3). Specifically, a kernel co-occurrence-based Rocchio method (KRoc) and a kernel co-occurrence-based RM3 method (KRM3) are proposed. In our framework, co-occurrence information is incorporated into both the factor of the term discrimination power and the factor of the within-document term weight to boost retrieval performance. The results of a series of experiments show that our proposed methods significantly outperform the corresponding strong baselines over all data sets in terms of the mean average precision and over most data sets in terms of P@10. A direct comparison of standard Text Retrieval Conference data sets indicates that our proposed methods are at least comparable to state-of-the-art approaches.
Content: Cf. https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24241.
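The pseudo-relevance feedback step summarized in this record's abstract can be illustrated with a minimal Python sketch. It shows only a classic Rocchio-style expansion, not the authors' kernel co-occurrence variants (KRoc, KRM3). The function name `rocchio_expand`, the parameters `k`, `n_terms`, `alpha`, and `beta`, and the toy documents are assumptions made for illustration.

```python
from collections import Counter


def rocchio_expand(query_terms, ranked_docs, k=2, n_terms=2, alpha=1.0, beta=0.75):
    """Rocchio-style pseudo-relevance feedback on tokenized documents.

    Illustrative only: plain term frequencies, no kernel co-occurrence weighting.
    """
    # Original query vector: raw term counts scaled by alpha.
    weights = Counter()
    for term, count in Counter(query_terms).items():
        weights[term] += alpha * count

    # Assume the top-k documents of the initial ranking are relevant
    # and average their term frequencies into a centroid.
    feedback_docs = ranked_docs[:k]
    centroid = Counter()
    for doc in feedback_docs:
        for term, count in Counter(doc).items():
            centroid[term] += count / len(feedback_docs)

    # Original query terms receive feedback credit as well.
    for term in set(query_terms):
        weights[term] += beta * centroid.get(term, 0.0)

    # Add the n_terms heaviest centroid terms not already in the query
    # (ties are broken alphabetically to keep the result deterministic).
    candidates = [(t, w) for t, w in centroid.items() if t not in query_terms]
    candidates.sort(key=lambda tw: (-tw[1], tw[0]))
    for term, weight in candidates[:n_terms]:
        weights[term] += beta * weight

    return dict(weights)


# Hypothetical initial ranking for the query "retrieval".
ranked = [
    ["kernel", "retrieval", "feedback", "feedback"],
    ["feedback", "query", "expansion", "retrieval"],
    ["cooking", "recipes"],
]
expanded = rocchio_expand(["retrieval"], ranked)
print(expanded)
```

In this sketch the third document never contributes, because only the top `k` documents are treated as relevant. A co-occurrence-aware method would additionally reweight candidate terms by how closely they occur to query terms.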
2. Ayadi, H. ; Torjmen-Khemakhem, M. ; Daoud, M. ; Huang, J.X. ; Jemaa, M.B.: Mining correlations between medically dependent features and image retrieval models for query classification.
In: Journal of the Association for Information Science and Technology. 68(2017) no.5, pp. 1323-1334.
Abstract: The abundance of medical resources has encouraged the development of systems that allow for efficient searches of information in large medical image data sets. State-of-the-art image retrieval models are classified into three categories: content-based (visual) models, textual models, and combined models. Content-based models use visual features to answer image queries, textual image retrieval models use word matching to answer textual queries, and combined image retrieval models use both textual and visual features to answer queries. Nevertheless, most previous work in this field has used the same image retrieval model independently of the query type. In this article, we define a list of generic and specific medical query features and exploit them in an association rule mining technique to discover correlations between query features and image retrieval models. Based on these rules, we propose to use an associative classifier (NaiveClass) to find the most suitable retrieval model given a new textual query. We also propose a second associative classifier (SmartClass) to select the most appropriate default class for the query. Experiments are performed on Medical ImageCLEF queries from 2008 to 2012 to evaluate the impact of the proposed query features on the classification performance. The results show that combining our proposed specific and generic query features is effective in query classification.
Content: Cf. http://onlinelibrary.wiley.com/doi/10.1002/asi.23772/full.
Subject area: Data Mining
3. An, X. ; Huang, J.X.: geNov : a new metric for measuring novelty and relevancy in biomedical information retrieval.
In: Journal of the Association for Information Science and Technology. 68(2017) no.11, pp. 2620-2635.
Abstract: For diversity and novelty evaluation in information retrieval, we expect that the novel documents are always ranked higher than the redundant ones and the relevant ones higher than the irrelevant ones. We also expect that the level of novelty and relevancy should be acknowledged. Accordingly, we expect that the evaluation algorithm would reward rankings that respect these expectations. Nevertheless, there are few research articles in the literature that study how to meet such expectations, even fewer in the field of biomedical information retrieval. In this article, we propose a new metric for novelty and relevancy evaluation in biomedical information retrieval based on an aspect-level performance measure introduced by TREC Genomics Track with formal results to show that those expectations above can be respected under ideal conditions. The empirical evaluation indicates that the proposed metric, geNov, is greatly sensitive to the desired characteristics above, and the three parameters are highly tuneable for different evaluation preferences. By experimentally comparing with state-of-the-art metrics for novelty and diversity, the proposed metric shows its advantages in recognizing the ranking quality in terms of novelty, redundancy, relevancy, and irrelevancy and in its discriminative power. Experiments reveal the proposed metric is faster to compute than state-of-the-art metrics.
Content: Cf. http://onlinelibrary.wiley.com/doi/10.1002/asi.23958/full.
Note: Contribution to a special issue on biomedical information retrieval.
4. Ye, Z. ; Huang, J.X.: A learning to rank approach for quality-aware pseudo-relevance feedback.
In: Journal of the Association for Information Science and Technology. 67(2016) no.4, pp. 942-959.
Abstract: Pseudo-relevance feedback (PRF) has been shown to be effective in ad hoc information retrieval. In traditional PRF methods, top-ranked documents are all assumed to be relevant and therefore treated equally in the feedback process. However, the performance gain brought by each document is different, as shown in our preliminary experiments. Thus, it is more reasonable to predict the performance gain brought by each candidate feedback document in the process of PRF. We define the quality level (QL) and then use this information to adjust the weights of feedback terms in these documents. Unlike previous work, we do not make any explicit relevance assumption and we go beyond just selecting "good" documents for PRF. We propose a quality-based PRF framework, in which two quality-based assumptions are introduced. In particular, two different strategies, relevance-based QL (RelPRF) and improvement-based QL (ImpPRF), are presented to estimate the QL of each feedback document. Based on this, we select a set of heterogeneous document-level features and apply a learning approach to evaluate the QL of each feedback document. Extensive experiments on standard TREC (Text REtrieval Conference) test collections show that our proposed model performs robustly and outperforms strong baselines significantly.
Content: Cf. http://onlinelibrary.wiley.com/doi/10.1002/asi.23430/abstract.
5. Daoud, M. ; Huang, J.X.: Modeling geographic, temporal, and proximity contexts for improving geotemporal search.
In: Journal of the American Society for Information Science and Technology. 64(2013) no.1, pp. 190-212.
Abstract: Traditional information retrieval (IR) systems show significant limitations on returning relevant documents that satisfy the user's information needs. In particular, to answer geographic and temporal user queries, the IR task becomes a nonstraightforward process where the available geographic and temporal information is often unstructured. In this article, we propose a geotemporal search approach that consists of modeling and exploiting geographic and temporal query context evidence that refers to implicit multivarying geographic and temporal intents behind the query. Modeling geographic and temporal query contexts is based on extracting and ranking geographic and temporal keywords found in pseudo-relevant feedback (PRF) documents for a given query. Our geotemporal search approach is based on exploiting the geographic and temporal query contexts separately into a probabilistic ranking model and jointly into a proximity ranking model. Our hypothesis is based on the concept that geographic and temporal expressions tend to co-occur within the document where the closer they are in the document, the more relevant the document is. Finally, geographic, temporal, and proximity scores are combined according to a linear combination formula. An extensive experimental evaluation conducted on a portion of the New York Times news collection and the TREC 2004 robust retrieval track collection shows that our geotemporal approach outperforms significantly a well-known baseline search and the best known geotemporal search approaches in the domain. Finally, an in-depth analysis shows a positive correlation between the geographic and temporal query sensitivity and the retrieval performance. Also, we find that geotemporal distance has a positive impact on retrieval performance generally.
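The final step described in this record's abstract, combining geographic, temporal, and proximity scores linearly, can be sketched as follows. This is a minimal illustration, not the authors' formula: the weights `(0.4, 0.4, 0.2)`, the function names, and the per-document scores are hypothetical, and the abstract does not state how the weights are chosen.

```python
def combine_scores(geo, temporal, proximity, weights=(0.4, 0.4, 0.2)):
    """Linear combination of three per-document scores (weights are assumed)."""
    w_geo, w_temp, w_prox = weights
    return w_geo * geo + w_temp * temporal + w_prox * proximity


def rank_documents(scores, weights=(0.4, 0.4, 0.2)):
    """Return document ids ordered by descending combined score."""
    return sorted(
        scores,
        key=lambda doc_id: combine_scores(*scores[doc_id], weights=weights),
        reverse=True,
    )


# Hypothetical (geographic, temporal, proximity) scores per document.
doc_scores = {
    "d1": (0.9, 0.1, 0.2),
    "d2": (0.5, 0.6, 0.7),
    "d3": (0.1, 0.2, 0.1),
}
print(rank_documents(doc_scores))
```

With these assumed weights a document that scores moderately on all three signals can outrank one that is strong on a single signal, which is the usual motivation for a linear mix.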
6. Ye, Z. ; Huang, J.X. ; He, B. ; Lin, H.: Mining a multilingual association dictionary from Wikipedia for cross-language information retrieval.
In: Journal of the American Society for Information Science and Technology. 63(2012) no.12, pp. 2474-2487.
Abstract: Wikipedia is characterized by its dense link structure and a large number of articles in different languages, which make it a notable Web corpus for knowledge extraction and mining, in particular for mining the multilingual associations. In this paper, motivated by a psychological theory of word meaning, we propose a graph-based approach to constructing a cross-language association dictionary (CLAD) from Wikipedia, which can be used in a variety of cross-language accessing and processing applications. In order to evaluate the quality of the mined CLAD, and to demonstrate how the mined CLAD can be used in practice, we explore two different applications of the mined CLAD to cross-language information retrieval (CLIR). First, we use the mined CLAD to conduct cross-language query expansion; and, second, we use it to filter out translation candidates with low translation probabilities. Experimental results on a variety of standard CLIR test collections show that the CLIR retrieval performance can be substantially improved with the above two applications of CLAD, which indicates that the mined CLAD is of sound quality.
Subject area: Multilingual problems
7. Ye, Z. ; Huang, J.X. ; Lin, H.: Finding a good query-related topic for boosting pseudo-relevance feedback.
In: Journal of the American Society for Information Science and Technology. 62(2011) no.4, pp. 748-760.
Abstract: Pseudo-relevance feedback (PRF) via query expansion (QE) assumes that the top-ranked documents from the first-pass retrieval are relevant. The most informative terms in the pseudo-relevant feedback documents are then used to update the original query representation in order to boost the retrieval performance. Most current PRF approaches estimate the importance of the candidate expansion terms based on their statistics on document level. However, a document for PRF may consist of different topics, which may not be all related to the query even if the document is judged relevant. The main argument of this article is the proposal to conduct PRF on a granularity smaller than on the document level. In this article, we propose a topic-based feedback model with three different strategies for finding a good query-related topic based on the Latent Dirichlet Allocation model. The experimental results on four representative TREC collections show that QE based on the derived topic achieves statistically significant improvements over a strong feedback model in the language modeling framework, which updates the query representation based on the top-ranked documents.