This database contains over 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 15 June 2019)
1. Macias-Galindo, D. ; Cavedon, L. ; Thangarajah, J. ; Wong, W.: Effects of domain on measures of semantic relatedness.
In: Journal of the Association for Information Science and Technology. 66(2015) no.10, pp. 2116-2131.
Abstract: Measures of semantic relatedness have been used in a variety of applications in information retrieval and language technology, such as measuring document similarity and cohesion of text. Definitions of such measures have ranged from using distance-based calculations over WordNet or other taxonomies to statistical distributional metrics over document collections such as Wikipedia or the Web. Existing measures do not explicitly consider the domain associations of terms when calculating relatedness: This article demonstrates that domain matters. We construct a data set of pairs of terms with associated domain information and extract pairs that are scored nearly identical by a sample of existing semantic-relatedness measures. We show that human judgments reliably score those pairs containing terms from the same domain as significantly more related than cross-domain pairs, even though the semantic-relatedness measures assign the pairs similar scores. We provide further evidence for this result using a machine learning setting by demonstrating that domain is an informative feature when learning a metric. We conclude that existing relatedness measures do not account for domain in the same way or to the same extent as do human judges.
Content: Cf. http://onlinelibrary.wiley.com/doi/10.1002/asi.23303/abstract.
2. Wong, W. ; Thangarajah, J.T. ; Padgham, L.: Contextual question answering for the health domain.
In: Journal of the American Society for Information Science and Technology. 63(2012) no.11, pp. 2313-2327.
Abstract: Studies have shown that natural language interfaces such as question answering and conversational systems allow information to be accessed and understood more easily by users who are unfamiliar with the nuances of the delivery mechanisms (e.g., keyword-based search engines) or have limited literacy in certain domains (e.g., unable to comprehend health-related content due to terminology barrier). In particular, the increasing use of the web for health information prompts us to reexamine our existing delivery mechanisms. We present enquireMe, which is a contextual question answering system that provides lay users with the ability to obtain responses about a wide range of health topics by vaguely expressing at the start and gradually refining their information needs over the course of an interaction session using natural language. enquireMe allows the users to engage in "conversations" about their health concerns, a process that can be therapeutic in itself. The system uses community-driven question-answer pairs from the web together with a decay model to deliver the top scoring answers as responses to the users' unrestricted inputs. We evaluated enquireMe using benchmark data from WebMD and TREC to assess the accuracy of system-generated answers. Despite the absence of complex knowledge acquisition and deep language processing, enquireMe is comparable to the state-of-the-art question answering systems such as START as well as those interactive systems from TREC.
3. Wong, W. ; Liu, W. ; Bennamoun, M.: Ontology learning from text : a look back and into the future.
Abstract: Ontologies are often viewed as the answer to the need for inter-operable semantics in modern information systems. The explosion of textual information on the "Read/Write" Web coupled with the increasing demand for ontologies to power the Semantic Web have made (semi-)automatic ontology learning from text a very promising research area. This together with the advanced state in related areas such as natural language processing have fuelled research into ontology learning over the past decade. This survey looks at how far we have come since the turn of the millennium, and discusses the remaining challenges that will define the research directions in this area in the near future.
Content: Pre-publication version for: ACM Computing Surveys, Vol. X, No. X, Article X, Publication date: X 2011.
Subject field: Knowledge representation ; Computational linguistics
4. Wong, W.S. ; Luk, R.W.P. ; Leong, H.V. ; Ho, K.S. ; Lee, D.L.: Re-examining the effects of adding relevance information in a relevance feedback environment.
In: Information processing and management. 44(2008) no.3, pp. 1086-1116.
Abstract: This paper presents an investigation about how to automatically formulate effective queries using full or partial relevance information (i.e., the terms that are in relevant documents) in the context of relevance feedback (RF). The effects of adding relevance information in the RF environment are studied via controlled experiments. The conditions of these controlled experiments are formalized into a set of assumptions that form the framework of our study. This framework is called idealized relevance feedback (IRF) framework. In our IRF settings, we confirm the previous findings of relevance feedback studies. In addition, our experiments show that better retrieval effectiveness can be obtained when (i) we normalize the term weights by their ranks, (ii) we select weighted terms in the top K retrieved documents, (iii) we include terms in the initial title queries, and (iv) we use the best query sizes for each topic instead of the average best query size where they produce at most five percentage points improvement in the mean average precision (MAP) value. We have also achieved a new level of retrieval effectiveness which is about 55-60% MAP instead of 40+% in the previous findings. This new level of retrieval effectiveness was found to be similar to a level using a TREC ad hoc test collection that is about double the number of documents in the TREC-3 test collection used in previous works.
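Note: The abstract above reports its results as mean average precision (MAP). As a reading aid, here is a minimal sketch of the standard MAP computation. It is not taken from the paper; the function names and the input layout (one ranked list plus one set of relevant document IDs per topic) are assumptions made for illustration.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one topic.

    Sums precision@k over every rank k that holds a relevant document,
    then divides by the total number of relevant documents.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP over topics; `runs` is a list of (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

On this scale, the "55-60% MAP" quoted in the abstract corresponds to values of roughly 0.55 to 0.60.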
5. Couvreur, T.R. ; Benzel, R.N. ; Miller, S.F. ; Zeitler, D.N. ; Lee, D.L. ; Singhal, M. ; Shivaratri, N. ; Wong, W.Y.P.: An analysis of performance and cost factors in searching large text databases using parallel search systems.
In: Journal of the American Society for Information Science. 45(1994) no.7, pp. 443-464.
Abstract: The results of modelling the performance of searching large text databases (>10 GBytes) via various parallel hardware architectures and search algorithms are discussed. The performance under load and the cost of each configuration are compared. Strengths, weaknesses, performance sensitivities, and search features supported for each configuration are also addressed. In addition, a common search workload used in the modelling is described. The search workload is derived from a set of searches run against the Chemical Abstracts file of bibliographic and abstract text available on STN International. This common workload is applied to all configurations modelled to provide a common basis of comparison.
Subject field: Retrieval algorithms ; Full-text retrieval
6. Wong, W.Y.P. ; Lee, D.L.: Implementation of partial document ranking using inverted files.
In: Information processing and management. 29(1993) no.5, pp. 647-669.
Abstract: Examines the implementations of document ranking based on inverted files. Studies three heuristic methods for implementing the term frequency × inverse document frequency weighting strategy. The basic idea of the heuristic methods is to process the query terms in an order so that as many top documents as possible can be identified without processing all of the query terms. The heuristics were evaluated and compared. The results show improved performance. Two methods for estimating the retrieval accuracy were studied. All experiments were based on four test collections made available with the SMART system.
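Note: The general idea described in the abstract above, scoring documents from an inverted file with tf × idf weights while processing query terms in a chosen order, can be sketched as follows. This is an illustrative assumption, not one of the three heuristics studied in the paper: terms are simply processed in decreasing idf order, and the hypothetical parameter `max_terms` stops after a fixed number of terms. All function and parameter names are invented for this sketch.

```python
import math
from collections import defaultdict


def build_inverted_file(docs):
    """Map each term to its postings: {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def rank(index, num_docs, query, max_terms=None, top_n=3):
    """Rank documents by summed tf * idf over the processed query terms.

    Query terms are processed rarest first (highest idf). With max_terms
    set, only that many terms are processed, giving a partial ranking.
    """
    terms = [t for t in set(query.lower().split()) if t in index]
    idf = {t: math.log(num_docs / len(index[t])) for t in terms}
    terms.sort(key=lambda t: (-idf[t], t))  # ties broken alphabetically
    scores = defaultdict(float)
    for term in terms[:max_terms]:  # a slice ending at None keeps all terms
        for doc_id, tf in index[term].items():
            scores[doc_id] += tf * idf[term]
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Processing rare terms first means the largest score contributions arrive early, so the top of a partial ranking often agrees with the full ranking. How well that holds in practice is the kind of question the paper's accuracy estimates address.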