This database contains over 40,000 documents on topics from the fields of descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 15 June 2019)
1. Avrahami, T.T. ; Yau, L. ; Si, L. ; Callan, J.P.: The FedLemur project : Federated search in the real world.
In: Journal of the American Society for Information Science and Technology. 57(2006) no.3, pp. 347-358.
Abstract: Federated search and distributed information retrieval systems provide a single user interface for searching multiple full-text search engines. They have been an active area of research for more than a decade, but in spite of their success as a research topic, they are still rare in operational environments. This article discusses a prototype federated search system developed for the U.S. government's FedStats Web portal, and the issues addressed in adapting research solutions to this operational environment. A series of experiments explore how well prior research results, parameter settings, and heuristics apply in the FedStats environment. The article concludes with a set of lessons learned from this technology transfer effort, including observations about search engine quality in the real world.
Subject area: Distributed bibliographic databases
2. Collins-Thompson, K. ; Callan, J.: Predicting reading difficulty with statistical language models.
In: Journal of the American Society for Information Science and Technology. 56(2005) no.13, pp. 1448-1462.
Abstract: A potentially useful feature of information retrieval systems for students is the ability to identify documents that not only are relevant to the query but also match the student's reading level. Manually obtaining an estimate of reading difficulty for each document is not feasible for very large collections, so we require an automated technique. Traditional readability measures, such as the widely used Flesch-Kincaid measure, are simple to apply but perform poorly on Web pages and other nontraditional documents. This work focuses on building a broadly applicable statistical model of text for different reading levels that works for a wide range of documents. To do this, we recast the well-studied problem of readability in terms of text categorization and use straightforward techniques from statistical language modeling. We show that with a modified form of text categorization, it is possible to build generally applicable classifiers with relatively little training data. We apply this method to the problem of classifying Web pages according to their reading difficulty level and show that by using a mixture model to interpolate evidence of a word's frequency across grades, it is possible to build a classifier that achieves an average root mean squared error of between one and two grade levels for 9 of 12 grades. Such classifiers have very efficient implementations and can be applied in many different scenarios. The models can be varied to focus on smaller or larger grade ranges or easily retrained for a variety of tasks or populations.
5. Callan, J.: Distributed information retrieval.
In: Advances in information retrieval: Recent research from the Center for Intelligent Information Retrieval. Ed.: W.B. Croft. Boston, MA : Kluwer Academic Publ., 2000. pp. 127-150.
(The Kluwer international series on information retrieval; 7)
Abstract: A multi-database model of distributed information retrieval is presented, in which people are assumed to have access to many searchable text databases. In such an environment, full-text information retrieval consists of discovering database contents, ranking databases by their expected ability to satisfy the query, searching a small number of databases, and merging results returned by different databases. This paper presents algorithms for each task. It also discusses how to reorganize conventional test collections into multi-database testbeds, and evaluation methodologies for multi-database experiments. A broad and diverse group of experimental results is presented to demonstrate that the algorithms are effective, efficient, robust, and scalable.
Subject area: Distributed bibliographic databases
6. Allan, J. ; Callan, J.P. ; Croft, W.B. ; Ballesteros, L. ; Broglio, J. ; Xu, J. ; Shu, H.: INQUERY at TREC-5.
In: The Fifth Text Retrieval Conference (TREC-5). Ed.: E.M. Voorhees and D.K. Harman. Gaithersburg, MD : National Institute of Standards and Technology, 1997. pp. 191-197.
(NIST special publication;)
Object: TREC ; INQUERY
7. Allan, J. ; Ballesteros, L. ; Callan, J.P. ; Croft, W.B. ; Lu, Z.: Recent experiment with INQUERY.
In: The Fourth Text Retrieval Conference (TREC-4). Ed.: K. Harman. Gaithersburg, MD : National Institute of Standards and Technology, 1996. pp. 49-63.
(NIST special publication; 500-236)
Object: INQUERY ; TREC
8. Callan, J. ; Croft, W.B. ; Broglio, J.: TREC and TIPSTER experiments with INQUERY.
In: Information processing and management. 31(1995) no.3, pp. 327-343.
Note: Reprinted in: Readings in information retrieval. Ed.: K. Sparck Jones and P. Willett. San Francisco: Morgan Kaufmann 1997. pp. 436-439.
Object: TREC ; TIPSTER ; INQUERY