This database contains more than 40,000 documents on topics from the fields of descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 4 June 2021)
1. Hammache, A. ; Boughanem, M.: Term position-based language model for information retrieval.
In: Journal of the Association for Information Science and Technology. 72(2021) no.5, pp.627-642.
Abstract: Term position feature is widely and successfully used in IR and Web search engines, to enhance the retrieval effectiveness. This feature is essentially used for two purposes: to capture query terms proximity or to boost the weight of terms appearing in some parts of a document. In this paper, we are interested in this second category. We propose two novel query-independent techniques based on absolute term positions in a document, whose goal is to boost the weight of terms appearing in the beginning of a document. The first one considers only the earliest occurrence of a term in a document. The second one takes into account all term positions in a document. We formalize each of these two techniques as a document model based on term position, and then we incorporate it into a basic language model (LM). Two smoothing techniques, Dirichlet and Jelinek-Mercer, are considered in the basic LM. Experiments conducted on three TREC test collections show that our model, especially the version based on all term positions, achieves significant improvements over the baseline LMs, and it also often performs better than two state-of-the-art baseline models, the chronological term rank model and the Markov random field model.
Content: See: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24431.
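Entry 1 plugs a position-based document model into a basic query-likelihood LM smoothed with Dirichlet or Jelinek-Mercer. The sketch below shows both standard smoothing formulas and mixes them with a position model in which early occurrences weigh more. The 1/(i+1) position weighting, the mixing weight `beta`, the function names, and the `mu`/`lam` values are illustrative assumptions, not the authors' formulation.

```python
import math
from collections import Counter, defaultdict

def position_model(doc):
    """Hypothetical position-based document model: the occurrence at 0-based
    position i contributes 1/(i+1), so terms near the start weigh more.
    Weights are normalised to sum to 1 over the document's terms."""
    weights = defaultdict(float)
    for i, term in enumerate(doc):
        weights[term] += 1.0 / (i + 1)
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

def lm_score(query, doc, coll_counts, coll_len, smoothing="dirichlet",
             mu=2000.0, lam=0.7, beta=0.3):
    """Query-likelihood log score; beta in [0, 1) mixes in the position model."""
    tf = Counter(doc)
    dlen = len(doc)
    pos = position_model(doc)
    score = 0.0
    for q in query:
        p_coll = coll_counts.get(q, 0) / coll_len
        if p_coll == 0.0:
            continue  # term unseen in the collection: skip it to avoid log(0)
        if smoothing == "dirichlet":
            # Dirichlet: (tf + mu * P(w|C)) / (|d| + mu)
            p_lm = (tf[q] + mu * p_coll) / (dlen + mu)
        else:
            # Jelinek-Mercer: (1 - lam) * tf/|d| + lam * P(w|C)
            p_ml = tf[q] / dlen if dlen else 0.0
            p_lm = (1 - lam) * p_ml + lam * p_coll
        score += math.log((1 - beta) * p_lm + beta * pos.get(q, 0.0))
    return score
```

Two documents that contain the same terms with the same frequencies tie when `beta=0`. With `beta > 0`, the one that mentions the query term first scores higher.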
2. Pan, M. ; Huang, J.X. ; He, T. ; Mao, Z. ; Ying, Z. ; Tu, X.: A simple kernel co-occurrence-based enhancement for pseudo-relevance feedback.
In: Journal of the Association for Information Science and Technology. 71(2020) no.3, pp.264-281.
Abstract: Pseudo-relevance feedback is a well-studied query expansion technique in which it is assumed that the top-ranked documents in an initial set of retrieval results are relevant and expansion terms are then extracted from those documents. When selecting expansion terms, most traditional models do not simultaneously consider term frequency and the co-occurrence relationships between candidate terms and query terms. Intuitively, however, a term that has a higher co-occurrence with a query term is more likely to be related to the query topic. In this article, we propose a kernel co-occurrence-based framework to enhance retrieval performance by integrating term co-occurrence information into the Rocchio model and a relevance language model (RM3). Specifically, a kernel co-occurrence-based Rocchio method (KRoc) and a kernel co-occurrence-based RM3 method (KRM3) are proposed. In our framework, co-occurrence information is incorporated into both the factor of the term discrimination power and the factor of the within-document term weight to boost retrieval performance. The results of a series of experiments show that our proposed methods significantly outperform the corresponding strong baselines over all data sets in terms of the mean average precision and over most data sets in terms of P@10. A direct comparison of standard Text Retrieval Conference data sets indicates that our proposed methods are at least comparable to state-of-the-art approaches.
Content: See: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24241.
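Entry 2 reports results as mean average precision (MAP) and P@10. Below is a short sketch of these two standard measures. Relevant documents that are never retrieved still count in the AP denominator, and P@k divides by k even when fewer than k documents are returned.

```python
def precision_at_k(ranking, relevant, k=10):
    """Fraction of the top k ranked documents that are relevant."""
    return sum(1 for d in ranking[:k] if d in relevant) / k

def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranking, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```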
3. Belkin, N.J. (ed.): Liu, J. ; Liu, C.: Personalization in text information retrieval : a survey.
In: Journal of the Association for Information Science and Technology. 71(2020) no.3, pp.349-369.
Abstract: Personalization of information retrieval (PIR) is aimed at tailoring a search toward individual users and user groups by taking account of additional information about users besides their queries. In the past two decades or so, PIR has received extensive attention in both academia and industry. This article surveys the literature of personalization in text retrieval, following a framework for aspects or factors that can be used for personalization. The framework consists of additional information about users that can be explicitly obtained by asking users for their preferences, or implicitly inferred from users' search behaviors. Users' characteristics and contextual factors such as tasks, time, location, etc., can be helpful for personalization. This article also addresses various issues including when to personalize, the evaluation of PIR, privacy, usability, etc. Based on the extensive review, challenges are discussed and directions for future effort are suggested.
4. Behnert, C. ; Plassmeier, K. ; Borst, T. ; Lewandowski, D.: Evaluierung von Rankingverfahren für bibliothekarische Informationssysteme [Evaluation of ranking methods for library information systems].
In: Information - Wissenschaft und Praxis. 70(2019) no.1, pp.14-23.
Abstract: This article describes a study on the development and evaluation of ranking methods for library information systems. Possible factors for relevance ranking were identified from the methods used in web search engines, transferred to the library context, and systematically evaluated. Various relevance factors (e.g., popularity in combination with recency) were tested using a test system built on the ZBW information portal EconBiz and on web-based software for evaluating search systems. Although the tested ranking methods are diverse at a theoretical level, no consistent improvements over the baseline rankings could be measured. The results suggest that adapting the ranking to individual users or usage contexts may be necessary to achieve higher performance.
Content: See: https://doi.org/10.1515/iwp-2019-0004.
Note: Part of a themed issue.
Subject area: Search engines ; Retrieval algorithms
5. Jiang, J.-D. ; Jiang, J.-Y. ; Cheng, P.-J.: Cocluster hypothesis and ranking consistency for relevance ranking in web search.
In: Journal of the Association for Information Science and Technology. 70(2019) no.6, pp.535-546.
Abstract: Conventional approaches to relevance ranking typically optimize ranking models by each query separately. The traditional cluster hypothesis also does not consider the dependency between related queries. The goal of this paper is to leverage similar search intents to perform ranking consistency so that the search performance can be improved accordingly. Different from the previous supervised approach, which learns relevance by click-through data, we propose a novel cocluster hypothesis to bridge the gap between relevance ranking and ranking consistency. A nearest-neighbors test is also designed to measure the extent to which the cocluster hypothesis holds. Based on the hypothesis, we further propose a two-stage unsupervised approach, in which two ranking heuristics and a cost function are developed to optimize the combination of consistency and uniqueness (or inconsistency). Extensive experiments have been conducted on a real and large-scale search engine log. The experimental results not only verify the applicability of the proposed cocluster hypothesis but also show that our approach is effective in boosting the retrieval performance of the commercial search engine and reaches a comparable performance to the supervised approach.
Content: See: https://onlinelibrary.wiley.com/doi/10.1002/asi.24071.
6. Jacucci, G. ; Barral, O. ; Daee, P. ; Wenzel, M. ; Serim, B. ; Ruotsalo, T. ; Pluchino, P. ; Freeman, J. ; Gamberini, L. ; Kaski, S. ; Blankertz, B.: Integrating neurophysiologic relevance feedback in intent modeling for information retrieval.
In: Journal of the Association for Information Science and Technology. 70(2019) no.9, pp.917-930.
Abstract: The use of implicit relevance feedback from neurophysiology could deliver effortless information retrieval. However, both computing neurophysiologic responses and retrieving documents are characterized by uncertainty because of noisy signals and incomplete or inconsistent representations of the data. We present the first-of-its-kind, fully integrated information retrieval system that makes use of online implicit relevance feedback generated from brain activity as measured through electroencephalography (EEG), and eye movements. The findings of the evaluation experiment (N = 16) show that we are able to compute online neurophysiology-based relevance feedback with performance significantly better than chance in complex data domains and realistic search tasks. We contribute by demonstrating how to integrate in interactive intent modeling this inherently noisy implicit relevance feedback combined with scarce explicit feedback. Although experimental measures of task performance did not allow us to demonstrate how the classification outcomes translated into search task performance, the experiment proved that our approach is able to generate relevance feedback from brain signals and eye movements in a realistic scenario, thus providing promising implications for future work in neuroadaptive information retrieval (IR).
Content: See: https://onlinelibrary.wiley.com/doi/10.1002/asi.24161.
Note: Article in a 'Special issue on neuro-information science'.
7. González-Ibáñez, R. ; Esparza-Villamán, A. ; Vargas-Godoy, J.C. ; Shah, C.: A comparison of unimodal and multimodal models for implicit detection of relevance in interactive IR.
In: Journal of the Association for Information Science and Technology. 70(2019) no.11, pp.1223-1235.
Abstract: Implicit detection of relevance has been approached by many during the last decade. From the use of individual measures to the use of multiple features from different sources (multimodality), studies have shown the feasibility to automatically detect whether a document is relevant. Despite promising results, it is not clear yet to what extent multimodality constitutes an effective approach compared to unimodality. In this article, we hypothesize that it is possible to build unimodal models capable of outperforming multimodal models in the detection of perceived relevance. To test this hypothesis, we conducted three experiments to compare unimodal and multimodal classification models built using a combination of 24 features. Our classification experiments showed that a univariate unimodal model based on the left-click feature supports our hypothesis. On the other hand, our prediction experiment suggests that multimodality slightly improves early classification compared to the best unimodal models. Based on our results, we argue that the feasibility for practical applications of state-of-the-art multimodal approaches may be strongly constrained by technology, cultural, ethical, and legal aspects, in which case unimodality may offer a better alternative today for supporting relevance detection in interactive information retrieval systems.
Content: See: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24202.
8. Ayadi, H. ; Torjmen-Khemakhem, M. ; Daoud, M. ; Xiangji Huang, J. ; Ben Jemaa, M.: MF-Re-Rank : a modality feature-based re-ranking model for medical image retrieval.
In: Journal of the Association for Information Science and Technology. 69(2018) no.9, pp.1095-1108.
Abstract: One of the main challenges in medical image retrieval is the increasing volume of image data, which render it difficult for domain experts to find relevant information from large data sets. Effective and efficient medical image retrieval systems are required to better manage medical image information. Text-based image retrieval (TBIR) was very successful in retrieving images with textual descriptions. Several TBIR approaches rely on models based on bag-of-words approaches, in which the image retrieval problem turns into one of standard text-based information retrieval; where the meanings and values of specific medical entities in the text and metadata are ignored in the image representation and retrieval process. However, we believe that TBIR should extract specific medical entities and terms and then exploit these elements to achieve better image retrieval results. Therefore, we propose a novel reranking method based on medical-image-dependent features. These features are manually selected by a medical expert from imaging modalities and medical terminology. First, we represent queries and images using only medical-image-dependent features such as image modality and image scale. Second, we exploit the defined features in a new reranking method for medical image retrieval. Our motivation is the large influence of image modality in medical image retrieval and its impact on image-relevance scores. To evaluate our approach, we performed a series of experiments on the medical ImageCLEF data sets from 2009 to 2013. The BM25 model, a language model, and an image-relevance feedback model are used as baselines to evaluate our approach. The experimental results show that compared to the BM25 model, the proposed model significantly enhances image retrieval performance. We also compared our approach with other state-of-the-art approaches and show that our approach performs comparably to those of the top three runs in the official ImageCLEF competition.
Content: See: https://onlinelibrary.wiley.com/doi/10.1002/asi.24045.
Form treated: Images
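Entry 8 uses BM25 as one of its baselines. The following is a minimal BM25 scorer over tokenized text descriptions. The `log(1 + ...)` IDF variant (the one Lucene uses) and the defaults `k1=1.2`, `b=0.75` are conventional choices assumed here, not values reported in the abstract. The paper's modality-feature re-ranking is not reproduced.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document (docs: list of token lists)."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency: count each term once per doc
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for q in query:
            if tf[q] == 0:
                continue
            idf = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
            norm = tf[q] + k1 * (1 - b + b * len(d) / avgdl)  # length normalisation
            s += idf * tf[q] * (k1 + 1) / norm
        scores.append(s)
    return scores
```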
9. Zhu, J. ; Han, L. ; Gou, Z. ; Yuan, X.: A fuzzy clustering-based denoising model for evaluating uncertainty in collaborative filtering recommender systems.
In: Journal of the Association for Information Science and Technology. 69(2018) no.9, pp.1109-1121.
Abstract: Recommender systems are effective in predicting the most suitable products for users, such as movies and books. To facilitate personalized recommendations, the quality of item ratings should be guaranteed. However, a few ratings might not be accurate enough due to the uncertainty of user behavior and are referred to as natural noise. In this article, we present a novel fuzzy clustering-based method for detecting noisy ratings. The entropy of a subset of the original ratings dataset is used to indicate the data-driven uncertainty, and evaluation metrics are adopted to represent the prediction-driven uncertainty. After the repetition of resampling and the execution of a recommendation algorithm, the entropy and evaluation metrics vectors are obtained and are empirically categorized to identify the proportion of the potential noise. Then, the fuzzy C-means-based denoising (FCMD) algorithm is performed to verify the natural noise under the assumption that natural noise is primarily the result of the exceptional behavior of users. Finally, a case study is performed using two real-world datasets. The experimental results show that our proposal outperforms previous proposals and has an advantage in dealing with natural noise.
Content: See: https://onlinelibrary.wiley.com/doi/10.1002/asi.24036.
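The FCMD algorithm in entry 9 builds on fuzzy C-means. Below is a sketch of generic fuzzy C-means with fuzzifier `m`. It does not implement the paper's noise-detection steps (resampling, entropy and metric vectors), and the toy points are hypothetical.

```python
import math
import random

def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-6, seed=0):
    """Generic fuzzy C-means. Returns (centers, memberships u[i][j])."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append(tuple(sum(w[i] * points[i][k] for i in range(n)) / sw
                                 for k in range(dim)))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        change = 0.0
        for i in range(n):
            d = [max(math.dist(points[i], centers[j]), 1e-12) for j in range(c)]
            row = [1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(c))
                   for j in range(c)]
            change = max(change, max(abs(a - b) for a, b in zip(row, u[i])))
            u[i] = row
        if change < tol:
            break
    return centers, u
```

Unlike k-means, each point keeps a graded membership in every cluster. That graded membership is what lets a denoising step treat weakly assigned ratings as suspect.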
10. Abdelkareem, M.A.A.: In terms of publication index, what indicator is the best for researchers indexing, Google Scholar, Scopus, Clarivate or others?
Abstract: I believe that Google Scholar is the most popular academic indexing service for researchers and citations. However, some other indexing institutions may be more professional than Google Scholar, though not as popular. Other indexing websites such as Scopus and Clarivate provide more statistical figures for scholars, institutions, or even journals. In terms of publication citations, Google Scholar always shows higher citation counts for a paper than other indexing websites, since Google Scholar considers most publication platforms and can therefore easily count the citations, while other databases only consider citations from journals that are already indexed in their database.
Subject area: Retrieval algorithms ; Informetrics
Object: Google Scholar ; Scopus ; Clarivate
11. Li, H. ; Wu, H. ; Li, D. ; Lin, S. ; Su, Z. ; Luo, X.: PSI: A probabilistic semantic interpretable framework for fine-grained image ranking.
In: Journal of the Association for Information Science and Technology. 69(2018) no.12, pp.1488-1501.
Abstract: Image Ranking is one of the key problems in information science research area. However, most current methods focus on increasing the performance, leaving the semantic gap problem, which refers to the learned ranking models are hard to be understood, remaining intact. Therefore, in this article, we aim at learning an interpretable ranking model to tackle the semantic gap in fine-grained image ranking. We propose to combine attribute-based representation and online passive-aggressive (PA) learning based ranking models to achieve this goal. Besides, considering the highly localized instances in fine-grained image ranking, we introduce a supervised constrained clustering method to gather class-balanced training instances for local PA-based models, and incorporate the learned local models into a unified probabilistic framework. Extensive experiments on the benchmark demonstrate that the proposed framework outperforms state-of-the-art methods in terms of accuracy and speed.
Form treated: Images
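Entry 11 combines attribute-based representations with online passive-aggressive (PA) ranking models. The sketch below applies the standard PA-I update to preference pairs of attribute vectors, where the first image of a pair should rank higher. It is a single global linear model. The supervised constrained clustering and the probabilistic combination of local models described in the abstract are not shown, and the attribute vectors are hypothetical.

```python
def pa_rank_train(pairs, dim, C=1.0, epochs=5):
    """Online PA-I on difference vectors; pairs: (better, worse) attribute vectors."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [a - b for a, b in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            loss = max(0.0, 1.0 - margin)  # hinge loss on the pairwise margin
            sq = sum(di * di for di in diff)
            if loss > 0 and sq > 0:
                tau = min(C, loss / sq)  # PA-I step size, capped by aggressiveness C
                w = [wi + tau * di for wi, di in zip(w, diff)]
    return w

def rank_score(w, x):
    """Linear ranking score; each weight maps to one (interpretable) attribute."""
    return sum(wi * xi for wi, xi in zip(w, x))
```

Because each weight belongs to a named attribute, the learned model can be read off directly. This is the kind of interpretability the abstract aims for.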
12. Hora, M.: Methoden für das Ranking in Discovery-Systemen [Methods for ranking in discovery systems].
In: Perspektive Bibliothek. 7(2018) no.2, pp.2-23.
Abstract: Discovery systems usually offer sorting by relevance as the default setting. How relevance is determined is often opaque. Yet, from the user's perspective, knowing this would be an important part of information literacy, while libraries should make sure that the ranking suits their own holdings and audience. This article describes how discovery systems select and score hits. This covers indexing, processing, text matching, and further relevance criteria such as popularity or availability. Finally, all criteria considered must be combined into a single central score. Particular attention is paid to the ranking of EBSCO Discovery Service, Primo, and Summon.
Content: See: https://journals.ub.uni-heidelberg.de/index.php/bibliothek/article/view/57797. See also: URN (PDF): http://nbn-resolving.de/urn:nbn:de:bsz:16-pb-577977.
Subject area: Retrieval algorithms ; OPAC
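Entry 12 notes that all relevance criteria must finally be combined into one central score. One simple way to do this is a weighted linear combination of normalised signals, sketched below. The signals (loan count as popularity, publication year, availability), the normalisations, and the weights are all hypothetical. EBSCO Discovery Service, Primo, and Summon use their own formulas, which are not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    doc_id: str
    text_score: float  # raw text-matching score from the search engine
    loans: int         # hypothetical popularity signal
    year: int          # publication year, used for a recency signal
    available: bool    # hypothetical availability signal

def combined_scores(hits, weights, current_year=2018):
    """Weighted sum of normalised signals; hits must be non-empty."""
    max_text = max(h.text_score for h in hits) or 1.0
    max_loans = max(h.loans for h in hits) or 1
    out = {}
    for h in hits:
        features = {
            "text": h.text_score / max_text,
            "popularity": h.loans / max_loans,
            "recency": 1.0 / (1 + max(0, current_year - h.year)),
            "availability": 1.0 if h.available else 0.0,
        }
        out[h.doc_id] = sum(weights[k] * v for k, v in features.items())
    return out
```

With text-matching weight only, the better textual match wins. Giving some weight to popularity, recency, and availability can reverse the order, which is why the abstract argues that libraries should check whether the ranking suits their holdings and audience.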
13. Hubert, G. ; Pitarch, Y. ; Pinel-Sauvagnat, K. ; Tournier, R. ; Laporte, L.: TournaRank : when retrieval becomes document competition.
In: Information processing and management. 54(2018) no.2, pp.252-272.
Abstract: Numerous feature-based models have been recently proposed by the information retrieval community. The capability of features to express different relevance facets (query- or document-dependent) can explain such a success story. Such models are most of the time supervised, thus requiring a learning phase. To leverage the advantages of feature-based representations of documents, we propose TournaRank, an unsupervised approach inspired by real-life game and sport competition principles. Documents compete against each other in tournaments using features as evidences of relevance. Tournaments are modeled as a sequence of matches, which involve pairs of documents playing in turn their features. Once a tournament is ended, documents are ranked according to their number of won matches during the tournament. This principle is generic since it can be applied to any collection type. It also provides great flexibility since different alternatives can be considered by changing the tournament type, the match rules, the feature set, or the strategies adopted by documents during matches. TournaRank was experimented on several collections to evaluate our model in different contexts and to compare it with related approaches such as Learning To Rank and fusion ones: the TREC Robust2004 collection for homogeneous documents, the TREC Web2014 (ClueWeb12) collection for heterogeneous web documents, and the LETOR3.0 collection for comparison with supervised feature-based models.
Content: See: https://doi.org/10.1016/j.ipm.2017.11.006.
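The tournament principle in entry 13 can be illustrated with a round-robin sketch. Every pair of documents plays one match, and the documents are ranked by matches won. The match rule used here is only one of the many alternatives the abstract says can be configured, and the feature values are hypothetical: each feature is a duel, the higher value wins it, and the document winning more duels wins the match.

```python
from itertools import combinations

def round_robin_rank(docs):
    """docs: doc_id -> list of feature values (higher = stronger relevance evidence).
    Returns (ranking by matches won, wins per document)."""
    wins = {d: 0 for d in docs}
    for a, b in combinations(docs, 2):
        fa, fb = docs[a], docs[b]
        a_duels = sum(1 for x, y in zip(fa, fb) if x > y)
        b_duels = sum(1 for x, y in zip(fa, fb) if y > x)
        if a_duels > b_duels:
            wins[a] += 1
        elif b_duels > a_duels:
            wins[b] += 1
        # a drawn match awards no win to either document
    ranking = sorted(wins, key=lambda d: (-wins[d], d))  # ties broken by id
    return ranking, wins
```

No training phase is needed, which matches the unsupervised character of the approach. Swapping in another match rule or tournament type only changes the pairing loop or the duel logic.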
14. Walz, J.: Analyse der Übertragbarkeit allgemeiner Rankingfaktoren von Web-Suchmaschinen auf Discovery-Systeme [Analysis of the transferability of general ranking factors from web search engines to discovery systems].
Köln : Fakultät für Informations- und Kommunikationswissenschaften, 2018. 75 pp.
Abstract: Goal: The goal of this bachelor's thesis was to analyze whether the general ranking factors used by web search engines can be transferred to discovery systems. This could improve library ranking, which so far has been based mainly on textual matching between the search query and the documents. Method: Factors from the groups popularity, recency, locality, technical factors, and personalized ranking were discussed. The ranking factors were selected according to how often they occur in the analyzed literature and the importance derived from that frequency. Result: Of the 23 ranking factors examined, 14 (61%) can be transferred directly from web search engine ranking to discovery system ranking. These include click behavior, creation date, user location, and language. Six (26%) of the factors examined, on the other hand, are not transferable (e.g., update frequency and loading speed). Link topology, usage frequency, and update frequency are transferable with appropriate modifications.
Content: See: https://publiscologne.th-koeln.de/frontdoor/index/index/searchtype/authorsearch/author/Julia+Walz/docId/1169/start/0/rows/10.
Note: Bachelor's thesis in library science, Technische Hochschule Köln.
Subject area: Cataloguing issues in general ; Retrieval algorithms
15. Dadashkarimi, J. ; Shakery, A. ; Faili, H. ; Zamani, H.: An expectation-maximization algorithm for query translation based on pseudo-relevant documents.
In: Information processing and management. 53(2017) no.2, pp.371-387.
Abstract: Query translation in cross-language information retrieval (CLIR) can be done by employing dictionaries, aligned corpora, or machine translators. Scarcity of aligned corpora for various domains in many language pairs intensifies the importance of dictionary-based CLIR which motivates us to use only a bilingual dictionary and two independent collections in source and target languages for query translation. We exploit pseudo-relevant documents for a given query in the source language and pseudo-relevant documents for a translation of the query in the target language with a proposed expectation-maximization algorithm for improving query translation. The proposed method (called EM4QT) assumes that each target term either is translated from the source pseudo-relevant documents or has come from a noisy collection. Since EM4QT does not directly consider term coherency, which is defined as fluency of the target translation, we investigate a crucial question: can EM4QT be improved using either coherency-based methods or token-to-token translation ones? To address this question, we combine different translation models via simple linear interpolation and a proposed divergence minimization method. Evaluations over four CLEF collections in Persian, French, Spanish, and German indicate that EM4QT significantly outperforms competitive baselines in all the collections. Our experiments also reveal that since EM4QT indirectly considers term coherency, combining the method with coherency-based models cannot significantly improve the retrieval performance. On the other hand, investigating the query-by-query results supports the view that EM4QT usually gives a relatively high weight to one translation and its combination with the proposed token-to-token translation model, which is obtained by running EM4QT for each query term separately, soothes the effect and reaches better results for many queries. Comparing the method with a competitive word-embedding baseline reveals the superiority of the proposed model.
Content: See: http://www.sciencedirect.com/science/article/pii/S0306457316306379 [http://dx.doi.org/10.1016/j.ipm.2016.11.007].
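The assumption stated in entry 15 is that each target term either comes from the translation or from a noisy collection. This can be illustrated with a generic two-component mixture estimated by EM. In the sketch, a fixed background model plays the noise component, and EM estimates a distribution `theta` over candidate translations. This is not EM4QT itself, which also exploits the source-language pseudo-relevant documents. The candidate list, the counts, and the fixed noise weight `lam` are hypothetical.

```python
def em_translation_weights(counts, background, candidates, lam=0.5, iters=50):
    """Estimate P(t | theta) over candidate translations t by EM.

    counts:     term -> frequency in the target-language pseudo-relevant docs
    background: term -> P(term | target collection), the fixed "noise" model
    candidates: hypothetical list of dictionary translations of the query
    lam:        fixed, hypothetical probability that an occurrence is noise
    """
    theta = {t: 1.0 / len(candidates) for t in candidates}
    for _ in range(iters):
        # E-step: expected number of occurrences of t generated by theta
        expected = {}
        for t in candidates:
            from_theta = (1 - lam) * theta[t]
            denom = from_theta + lam * background.get(t, 0.0)
            expected[t] = counts.get(t, 0) * from_theta / denom if denom > 0 else 0.0
        # M-step: renormalise the expected counts into a distribution
        total = sum(expected.values())
        if total == 0:
            break
        theta = {t: e / total for t, e in expected.items()}
    return theta
```

Suppose two candidates occur equally often in the pseudo-relevant documents. The one that is also frequent in the whole collection ends up with less weight, because the noise component explains more of its occurrences.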
16. Ye, Z. ; Huang, J.X.: A learning to rank approach for quality-aware pseudo-relevance feedback.
In: Journal of the Association for Information Science and Technology. 67(2016) no.4, pp.942-959.
Abstract: Pseudo relevance feedback (PRF) has been shown to be effective in ad hoc information retrieval. In traditional PRF methods, top-ranked documents are all assumed to be relevant and therefore treated equally in the feedback process. However, the performance gain brought by each document is different, as shown in our preliminary experiments. Thus, it is more reasonable to predict the performance gain brought by each candidate feedback document in the process of PRF. We define the quality level (QL) and then use this information to adjust the weights of feedback terms in these documents. Unlike previous work, we do not make any explicit relevance assumption and we go beyond just selecting "good" documents for PRF. We propose a quality-based PRF framework, in which two quality-based assumptions are introduced. Particularly, two different strategies, relevance-based QL (RelPRF) and improvement-based QL (ImpPRF) are presented to estimate the QL of each feedback document. Based on this, we select a set of heterogeneous document-level features and apply a learning approach to evaluate the QL of each feedback document. Extensive experiments on standard TREC (Text REtrieval Conference) test collections show that our proposed model performs robustly and outperforms strong baselines significantly.
Content: See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23430/abstract.
17. Xu, B. ; Lin, H. ; Lin, Y.: Assessment of learning to rank methods for query expansion.
In: Journal of the Association for Information Science and Technology. 67(2016) no.6, pp.1345-1357.
Abstract: Pseudo relevance feedback, as an effective query expansion method, can significantly improve information retrieval performance. However, the method may negatively impact the retrieval performance when some irrelevant terms are used in the expanded query. Therefore, it is necessary to refine the expansion terms. Learning to rank methods have proven effective in information retrieval to solve ranking problems by ranking the most relevant documents at the top of the returned list, but few attempts have been made to employ learning to rank methods for term refinement in pseudo relevance feedback. This article proposes a novel framework to explore the feasibility of using learning to rank to optimize pseudo relevance feedback by means of reranking the candidate expansion terms. We investigate some learning approaches to choose the candidate terms and introduce some state-of-the-art learning to rank methods to refine the expansion terms. In addition, we propose two term labeling strategies and examine the usefulness of various term features to optimize the framework. Experimental results with three TREC collections show that our framework can effectively improve retrieval performance.
Content: See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23476/abstract.
Subject area: Semantic context in indexing and retrieval ; Retrieval algorithms
18. Karisani, P. ; Rahgozar, M. ; Oroumchian, F.: Transforming LSA space dimensions into a rubric for an automatic assessment and feedback system.
In: Information processing and management. 52(2016) no.3, pp.478-489.
Abstract: Pseudo-relevance feedback is the basis of a category of automatic query modification techniques. Pseudo-relevance feedback methods assume the initial retrieved set of documents to be relevant. Then they use these documents to extract more relevant terms for the query or just re-weigh the user's original query. In this paper, we propose a straightforward, yet effective use of pseudo-relevance feedback method in detecting more informative query terms and re-weighting them. The query-by-query analysis of our results indicates that our method is capable of identifying the most important keywords even in short queries. Our main idea is that some of the top documents may contain a closer context to the user's information need than the others. Therefore, re-examining the similarity of those top documents and weighting this set based on their context could help in identifying and re-weighting informative query terms. Our experimental results in standard English and Persian test collections show that our method improves retrieval performance, in terms of MAP criterion, up to 7% over traditional query term re-weighting methods.
Content: See: https://doi.org/10.1016/j.ipm.2015.09.002.
19. Jiang, X. ; Sun, X. ; Yang, Z. ; Zhuge, H. ; Lapshinova-Koltunski, E. ; Yao, J.: Exploiting heterogeneous scientific literature networks to combat ranking bias : evidence from the computational linguistics area.
In: Journal of the Association for Information Science and Technology. 67(2016) no.7, pp.1679-1702.
Abstract: It is important to help researchers find valuable papers from a large literature collection. To this end, many graph-based ranking algorithms have been proposed. However, most of these algorithms suffer from the problem of ranking bias. Ranking bias hurts the usefulness of a ranking algorithm because it returns a ranking list with an undesirable time distribution. This paper is a focused study on how to alleviate ranking bias by leveraging the heterogeneous network structure of the literature collection. We propose a new graph-based ranking algorithm, MutualRank, that integrates mutual reinforcement relationships among networks of papers, researchers, and venues to achieve a more synthetic, accurate, and less-biased ranking than previous methods. MutualRank provides a unified model that involves both intra- and inter-network information for ranking papers, researchers, and venues simultaneously. We use the ACL Anthology Network as the benchmark data set and construct the gold standard from computer linguistics course websites of well-known universities and two well-known textbooks. The experimental results show that MutualRank greatly outperforms the state-of-the-art competitors, including PageRank, HITS, CoRank, Future Rank, and P-Rank, in ranking papers in both improving ranking effectiveness and alleviating ranking bias. Rankings of researchers and venues by MutualRank are also quite reasonable.
Content: See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23463/abstract.
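PageRank is one of the graph-based competitors MutualRank is compared against in entry 19. Below is a power-iteration sketch of PageRank on a single citation graph. MutualRank itself couples the paper, researcher, and venue networks and is not reproduced here. The damping factor `d=0.85` is a conventional default assumed for illustration.

```python
def pagerank(graph, d=0.85, iters=100, tol=1e-10):
    """graph: node -> list of nodes it links to (e.g. paper -> cited papers)."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - d) / n for v in nodes}
        # Nodes without out-links spread their rank uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        for v in nodes:
            new[v] += d * dangling / n
        for v, targets in graph.items():
            if targets:
                share = d * rank[v] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

Rank flows along citations, so older papers that have had more time to accumulate citations tend to score higher. This is one source of the time-related ranking bias the abstract targets.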
20. Tsai, C.-F. ; Hu, Y.-H. ; Chen, Z.-Y.: Factors affecting Rocchio-based pseudorelevance feedback in image retrieval.
In: Journal of the Association for Information Science and Technology. 66(2015) no.1, pp.40-57.
Abstract: Pseudorelevance feedback (PRF) was proposed to solve the limitation of relevance feedback (RF), which is based on the user-in-the-loop process. In PRF, the top-k retrieved images are regarded as PRF. Although the PRF set contains noise, PRF has proven effective for automatically improving the overall retrieval result. To implement PRF, the Rocchio algorithm has been considered as a reasonable and well-established baseline. However, the performance of Rocchio-based PRF is subject to various representation choices (or factors). In this article, we examine these factors that affect the performance of Rocchio-based PRF, including image-feature representation, the number of top-ranked images, the weighting parameters of Rocchio, and similarity measure. We offer practical insights on how to optimize the performance of Rocchio-based PRF by choosing appropriate representation choices. Our extensive experiments on NUS-WIDE-LITE and Caltech 101 + Corel 5000 data sets show that the optimal feature representation is color moment + wavelet texture in terms of retrieval efficiency and effectiveness. Other representation choices are that using top-20 ranked images as pseudopositive and pseudonegative feedback sets with the equal weight (i.e., 0.5) by the correlation and cosine distance functions can produce the optimal retrieval result.
Content: See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23154/abstract.
Form treated: Images
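Entry 20 examines the factors that affect Rocchio-based PRF: feature representation, number of top-ranked images, Rocchio weights, and similarity measure. The sketch below shows Rocchio PRF over image feature vectors with cosine similarity. It rests on several assumptions. The abstract's "equal weight (i.e., 0.5)" is read as beta = gamma = 0.5 with alpha = 1. The pseudonegative set is taken to be the bottom-k images. The feature vectors are hypothetical stand-ins for, e.g., colour moment + wavelet texture features.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Component-wise mean of equally long feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def rocchio_prf(query, features, k=20, alpha=1.0, beta=0.5, gamma=0.5):
    """features: image_id -> feature vector. Returns (new query, re-ranked ids)."""
    ranked = sorted(features, key=lambda i: cosine(query, features[i]), reverse=True)
    k = min(k, len(ranked) // 2)  # keep the two pseudo-feedback sets disjoint
    if k == 0:
        return list(query), ranked
    pos = centroid([features[i] for i in ranked[:k]])   # pseudopositive: top k
    neg = centroid([features[i] for i in ranked[-k:]])  # pseudonegative: bottom k (assumption)
    new_q = [alpha * q + beta * p - gamma * n for q, p, n in zip(query, pos, neg)]
    reranked = sorted(features, key=lambda i: cosine(new_q, features[i]), reverse=True)
    return new_q, reranked
```

The modified query moves toward the centroid of the pseudopositive images and away from the pseudonegative ones. The images are then ranked a second time against it.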