This database contains over 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 28 April 2022)
1. Lan, K.C. ; Ho, K.S. ; Luk, R.W.P. ; Leong, H.V.: Dialogue act recognition using maximum entropy.
In: Journal of the American Society for Information Science and Technology. 59(2008) no.6, pp.859-874.
Abstract: A dialogue-based interface for information systems is considered a potentially very useful approach to information access. A key step in computer processing of natural-language dialogues is dialogue-act (DA) recognition. In this paper, we apply a feature-based classification approach for DA recognition, by using the maximum entropy (ME) method to build a classifier for labeling utterances with DA tags. The ME method has the advantage that a large number of heterogeneous features can be flexibly combined in one classifier, which can facilitate feature selection. A unique characteristic of our approach is that it does not need to model the prior probability of DAs directly, and thus avoids the use of a discourse grammar. This simplifies the implementation of the classifier and improves the efficiency of DA recognition, without sacrificing the classification accuracy. We evaluate the classifier using a large data set based on the Switchboard corpus. Encouraging performance is observed; the highest classification accuracy achieved is 75.03%. We also propose a heuristic to address the problem of sparseness of the data set. This problem has resulted in poor classification accuracies of some DA types that have very low occurrence frequencies in the data set. Preliminary evaluation shows that the method is effective in improving the macroaverage classification accuracy of the ME classifier.
2. Wong, W.S. ; Luk, R.W.P. ; Leong, H.V. ; Ho, K.S. ; Lee, D.L.: Re-examining the effects of adding relevance information in a relevance feedback environment.
In: Information processing and management. 44(2008) no.3, pp.1086-1116.
Abstract: This paper presents an investigation about how to automatically formulate effective queries using full or partial relevance information (i.e., the terms that are in relevant documents) in the context of relevance feedback (RF). The effects of adding relevance information in the RF environment are studied via controlled experiments. The conditions of these controlled experiments are formalized into a set of assumptions that form the framework of our study. This framework is called idealized relevance feedback (IRF) framework. In our IRF settings, we confirm the previous findings of relevance feedback studies. In addition, our experiments show that better retrieval effectiveness can be obtained when (i) we normalize the term weights by their ranks, (ii) we select weighted terms in the top K retrieved documents, (iii) we include terms in the initial title queries, and (iv) we use the best query sizes for each topic instead of the average best query size where they produce at most five percentage points improvement in the mean average precision (MAP) value. We have also achieved a new level of retrieval effectiveness which is about 55-60% MAP instead of 40+% in the previous findings. This new level of retrieval effectiveness was found to be similar to a level using a TREC ad hoc test collection that is about double the number of documents in the TREC-3 test collection used in previous works.
3. Luk, R.W.P. ; Leong, H.V. ; Dillon, T.S. ; Chan, A.T.S. ; Croft, W.B. ; Allen, J.: A survey in indexing and searching XML documents.
In: Journal of the American Society for Information Science and Technology. 53(2002) no.6, pp.415-437.
Abstract: XML holds the promise to yield (1) a more precise search by providing additional information in the elements, (2) a better integrated search of documents from heterogeneous sources, (3) a powerful search paradigm using structural as well as content specifications, and (4) data and information exchange to share resources and to support cooperative search. We survey several indexing techniques for XML documents, grouping them into flatfile, semistructured, and structured indexing paradigms. Searching techniques and supporting techniques for searching are reviewed, including full text search and multistage search. Because searching XML documents can be very flexible, various search result presentations are discussed, as well as database and information retrieval system integration and XML query languages. We also survey various retrieval models, examining how they would be used or extended for retrieving XML documents. To conclude the article, we discuss various open issues that XML poses with respect to information retrieval and database research.