This database contains more than 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 28 April 2022)
1. Dang, E.K.F. ; Luk, R.W.P. ; Allan, J. ; Ho, K.S. ; Chung, K.F.L. ; Lee, D.L.: A new context-dependent term weight computed by boost and discount using relevance information.
In: Journal of the American Society for Information Science and Technology. 61(2010) no.12, pp.2514-2530.
Abstract: We studied the effectiveness of a new class of context-dependent term weights for information retrieval. Unlike traditional term frequency-inverse document frequency (TF-IDF) weighting, the new weight of a term t in a document d depends not only on the occurrence statistics of t alone but also on the terms found within a text window (or "document-context") centered on t. We introduce a Boost and Discount (B&D) procedure that uses partial relevance information to compute the context-dependent term weights of query terms according to a logistic regression model. We investigate the effectiveness of the new term weights, compared with the context-independent BM25 weights, in the setting of relevance feedback. We performed experiments with title queries of the TREC-6, -7, -8, and 2005 collections, comparing the residual Mean Average Precision (MAP) obtained using B&D term weights with that obtained by a baseline using BM25 weights. Given either 10 or 20 relevance judgments of the top retrieved documents, the new term weights improve on the baseline for all collections tested. The MAP obtained with the new weights shows a relative improvement of 3.3% to 15.2% over the baseline, statistically significant at the 95% confidence level across all four collections.
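For orientation, the context-independent BM25 baseline named in this abstract, and the idea of a text window centered on a term occurrence, can be sketched in Python as below. This is a minimal sketch, not the authors' B&D procedure, whose formula the abstract does not give: the function names, the commonly used parameter values k1 = 1.2 and b = 0.75, and the Robertson-Sparck Jones form of the IDF are assumptions chosen for illustration.

```python
import math

def bm25_weight(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    """Okapi BM25 weight of one term in one document (context-independent).

    tf: frequency of the term in the document.
    df: number of documents in the collection that contain the term.
    k1 and b are assumed default values, chosen for illustration only.
    """
    # Robertson-Sparck Jones IDF; it becomes negative for terms in more than half the documents.
    idf = math.log((num_docs - df + 0.5) / (df + 0.5))
    # Document-length normalisation applied to the saturating term-frequency component.
    length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / (tf + length_norm)

def context_window(tokens, position, half_width):
    """Hypothetical helper: tokens of a window centered on tokens[position].

    The window is truncated at the document boundaries.
    """
    start = max(0, position - half_width)
    return tokens[start:position + half_width + 1]

tokens = "relevance feedback improves term weights for retrieval".split()
print(context_window(tokens, 3, 2))  # -> ['feedback', 'improves', 'term', 'weights', 'for']
print(bm25_weight(tf=3, df=50, doc_len=120, avg_doc_len=100, num_docs=10_000))
```

In BM25 the weight of a term depends only on its own statistics; a context-dependent scheme such as B&D additionally lets the weight vary with the terms that fall inside a window like the one returned by `context_window`.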
2. Dang, E.K.F. ; Luk, R.W.P. ; Ho, K.S. ; Chan, S.C.F. ; Lee, D.L.: A new measure of clustering effectiveness : algorithms and experimental studies.
In: Journal of the American Society for Information Science and Technology. 59(2008) no.3, pp.390-406.
Abstract: We propose a new optimal clustering effectiveness measure, called CS1, based on a combination of clusters rather than on a single optimal cluster, as selected by the traditional MK1 measure. For hierarchical clustering, we present an algorithm to compute CS1, defined by seeking the optimal combination of disjoint clusters obtained by cutting the hierarchical structure at a certain similarity level. By reformulating the optimization as a 0-1 linear fractional programming problem, we demonstrate that an exact solution can be obtained by a linear-time algorithm. We further discuss how our approach can be generalized to problems involving overlapping clusters, and we show how optimal estimates can be obtained by greedy algorithms.
Subject area: Automatic classification
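The kind of optimization behind a combination-based measure such as CS1 can be illustrated with a small Python sketch. It assumes, purely for illustration, that effectiveness is the F1 measure of the union of the selected disjoint clusters (the abstract does not state the exact objective) and that each cluster is summarized by its size and its number of relevant documents. Under that assumption the objective 2 * (relevant selected) / (selected + total relevant) is a 0-1 linear fractional function, and Dinkelbach's parametric argument shows that some optimal selection is a prefix of the clusters sorted by precision, so scanning those prefixes yields an exact optimum. Because of the sort this takes O(k log k) time for k clusters; it is not the authors' linear-time algorithm, and `Cluster` and `best_cluster_combination` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Cluster:
    size: int      # documents in the cluster (assumed >= 1)
    relevant: int  # relevant documents in the cluster

def best_cluster_combination(clusters, total_relevant):
    """Exact F1-optimal union of disjoint clusters (illustrative objective).

    Returns (best_f1, sorted indices of the chosen clusters).
    """
    # Visit clusters in order of decreasing precision (relevant / size).
    order = sorted(range(len(clusters)),
                   key=lambda i: clusters[i].relevant / clusters[i].size,
                   reverse=True)
    best_f1, best_len = 0.0, 0
    rel_sum = size_sum = 0
    for length, i in enumerate(order, start=1):
        rel_sum += clusters[i].relevant
        size_sum += clusters[i].size
        # F1 = 2PR / (P + R) simplifies to 2 * rel_sum / (size_sum + total_relevant).
        f1 = 2 * rel_sum / (size_sum + total_relevant)
        if f1 > best_f1:
            best_f1, best_len = f1, length
    return best_f1, sorted(order[:best_len])

clusters = [Cluster(size=4, relevant=3), Cluster(size=5, relevant=1), Cluster(size=2, relevant=2)]
print(best_cluster_combination(clusters, total_relevant=6))  # -> (0.8333333333333334, [0, 2])
```

The clusters fed to such a routine would be the disjoint groups obtained by cutting a dendrogram at one similarity level; repeating the scan for each cut level and keeping the best value is one straightforward way to search over levels.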
3. Lan, K.C. ; Ho, K.S. ; Luk, R.W.P. ; Leong, H.V.: Dialogue act recognition using maximum entropy.
In: Journal of the American Society for Information Science and Technology. 59(2008) no.6, pp.859-874.
Abstract: A dialogue-based interface for information systems is considered a potentially very useful approach to information access. A key step in the computer processing of natural-language dialogues is dialogue-act (DA) recognition. In this paper, we apply a feature-based classification approach to DA recognition, using the maximum entropy (ME) method to build a classifier that labels utterances with DA tags. The ME method has the advantage that a large number of heterogeneous features can be flexibly combined in one classifier, which can facilitate feature selection. A unique characteristic of our approach is that it does not need to model the prior probability of DAs directly, and thus avoids the use of a discourse grammar. This simplifies the implementation of the classifier and improves the efficiency of DA recognition without sacrificing classification accuracy. We evaluate the classifier using a large data set based on the Switchboard corpus. Encouraging performance is observed; the highest classification accuracy achieved is 75.03%. We also propose a heuristic to address the sparseness of the data set, which has resulted in poor classification accuracy for some DA types that occur very rarely in the data set. Preliminary evaluation shows that this method is effective in improving the macro-averaged classification accuracy of the ME classifier.
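A maximum entropy classifier of the kind described in this abstract is equivalent to multinomial logistic regression over binary features. The Python sketch below trains one by batch gradient ascent on the L2-regularized conditional log-likelihood, using only the standard library. The feature set (a bias plus lower-cased word unigrams), the toy dialogue-act labels and utterances, and the learning-rate and regularization settings are all hypothetical; they are not the features, tag set, or training procedure used in the paper, and the sparseness heuristic is not reproduced.

```python
import math
from collections import defaultdict

def features(utterance):
    # Hypothetical feature set: a bias feature plus lower-cased word unigrams.
    return ["bias"] + ["w=" + w for w in utterance.lower().split()]

class MaxEntClassifier:
    """Multinomial logistic regression (maximum entropy) over binary features."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.weights = defaultdict(float)  # (feature, label) -> weight

    def probabilities(self, feats):
        scores = [sum(self.weights[(f, y)] for f in feats) for y in self.labels]
        top = max(scores)  # subtract the maximum score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def train(self, data, epochs=200, lr=0.1, l2=0.01):
        # Batch gradient ascent on the L2-regularised conditional log-likelihood.
        for _ in range(epochs):
            grad = defaultdict(float)
            for text, gold in data:
                feats = features(text)
                for y, p in zip(self.labels, self.probabilities(feats)):
                    observed = 1.0 if y == gold else 0.0
                    for f in feats:
                        grad[(f, y)] += observed - p  # empirical minus expected count
            for key, g in grad.items():
                self.weights[key] += lr * (g - l2 * self.weights[key])

    def predict(self, utterance):
        probs = self.probabilities(features(utterance))
        return self.labels[max(range(len(probs)), key=probs.__getitem__)]

# Toy training data with hypothetical DA tags.
data = [
    ("what time is it", "question"),
    ("where is the station", "question"),
    ("yes", "yes-answer"),
    ("yes that is right", "yes-answer"),
    ("okay thanks", "thanking"),
    ("thanks a lot", "thanking"),
]
clf = MaxEntClassifier(["question", "yes-answer", "thanking"])
clf.train(data)
print(clf.predict("where is it"), clf.predict("yes"), clf.predict("thanks"))
```

Note that the classifier never multiplies in a separate prior over DA labels; only the per-label bias weight plays that role, which loosely mirrors the abstract's point that no discourse grammar is needed.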
4. Wong, W.S. ; Luk, R.W.P. ; Leong, H.V. ; Ho, K.S. ; Lee, D.L.: Re-examining the effects of adding relevance information in a relevance feedback environment.
In: Information Processing and Management. 44(2008) no.3, pp.1086-1116.
Abstract: This paper presents an investigation into how to automatically formulate effective queries using full or partial relevance information (i.e., the terms that are in relevant documents) in the context of relevance feedback (RF). The effects of adding relevance information in the RF environment are studied via controlled experiments. The conditions of these controlled experiments are formalized into a set of assumptions that form the framework of our study, called the idealized relevance feedback (IRF) framework. In our IRF settings, we confirm the previous findings of relevance feedback studies. In addition, our experiments show that better retrieval effectiveness can be obtained when (i) we normalize the term weights by their ranks, (ii) we select weighted terms in the top K retrieved documents, (iii) we include terms in the initial title queries, and (iv) we use the best query size for each topic instead of the average best query size; these changes produce an improvement of at most five percentage points in mean average precision (MAP). We have also achieved a new level of retrieval effectiveness of about 55-60% MAP, compared with 40+% in previous findings. This new level of retrieval effectiveness was found to be similar to the level obtained with a TREC ad hoc test collection containing about twice as many documents as the TREC-3 test collection used in previous work.
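Since MAP is the effectiveness measure quoted throughout these abstracts, a minimal Python sketch of its computation may help. Average precision for one topic averages precision@k over the ranks k at which relevant documents appear and divides by the total number of relevant documents, so relevant documents that are never retrieved count as zero; MAP is the mean over topics. The `residual` helper illustrates residual-collection evaluation, i.e. removing documents already judged during feedback before scoring. All function names and the toy rankings are hypothetical.

```python
def average_precision(ranking, relevant):
    """Non-interpolated average precision of one ranked list for one topic."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document's rank
    # Relevant documents that were never retrieved contribute zero precision.
    return precision_sum / len(relevant)

def mean_average_precision(runs):
    """MAP over topics; runs is a list of (ranking, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

def residual(ranking, relevant, judged):
    """Residual-collection view: drop documents already judged for feedback."""
    judged = set(judged)
    return [d for d in ranking if d not in judged], relevant - judged

runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3", "d5"}),
    (["a", "b"], {"b"}),
]
print(round(mean_average_precision(runs), 4))  # -> 0.5278
```

On this scale, an improvement of five percentage points means, for example, MAP rising from 0.40 to 0.45, whereas a relative improvement of 15% would take 0.40 to 0.46.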