This database contains more than 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 28 April 2022)
1. Huang, L. ; Milne, D. ; Frank, E. ; Witten, I.H.: Learning a concept-based document similarity measure.
In: Journal of the American Society for Information Science and Technology. 63(2012) no.8, pp. 1593-1608.
Abstract: Document similarity measures are crucial components of many text-analysis tasks, including information retrieval, document classification, and document clustering. Conventional measures are brittle: They estimate the surface overlap between documents based on the words they mention and ignore deeper semantic connections. We propose a new measure that assesses similarity at both the lexical and semantic levels, and learns from human judgments how to combine them by using machine-learning techniques. Experiments show that the new measure produces values for documents that are more consistent with people's judgments than people are with each other. We also use it to classify and cluster large document sets covering different genres and topics, and find that it improves both classification and clustering performance.
Subject area: Semantic context in indexing and retrieval
2. Witten, I.H. ; Bainbridge, D. ; Nichols, D.M.: How to build a digital library. 2nd ed.
Amsterdam : Morgan Kaufmann, 2010. xxiii, 629 pp.
(The Morgan Kaufmann series in multimedia information and systems)
Abstract: "How to Build a Digital Library" is the only book that offers all the knowledge and tools needed to construct and maintain a digital library, regardless of its size or purpose. It is a self-contained resource for individuals, agencies, and institutions wishing to put this powerful tool to work in their burgeoning information treasuries. The second edition reflects new developments in the field as well as in the Greenstone Digital Library open source software. In Part I, the authors have added an entirely new chapter on user groups, user support, collaborative browsing, user contributions, and so on. There is also new material on content-based queries, map-based queries, and cross-media queries. Multimedia receives increased emphasis, with a 'digitizing' section added for each major media type. A new chapter on 'internationalization' will address Unicode standards, multi-language interfaces and collections, and issues with non-European languages (Chinese, Hindi, etc.). Part II, the software tools section, has been completely rewritten to reflect the new developments in Greenstone Digital Library Software, an internationally popular open source software tool with a comprehensive graphical facility for creating and maintaining digital libraries. As with the first edition, a web site, implemented as a digital library, will accompany the book and provide access to color versions of all figures, two online appendices, a full-text sentence-level index, and an automatically generated glossary of acronyms and their definitions. In addition, demonstration digital library collections will be included to illustrate particular points in the book. To access the online content please visit our associated website. This title outlines the history of libraries - both traditional and digital - and their impact on present practices and future directions.
It is written for both technical and non-technical audiences and covers the entire spectrum of media, including text, images, audio, video, and related XML standards. It is web-enhanced with software documentation, color illustrations, full-text index, source code, and more.
Contents: Orientation : the world of digital libraries -- People in digital libraries -- Presentation : user interfaces -- Textual documents: the raw material -- Multimedia : more raw material -- Metadata : elements of organization -- Interoperability : protocols and services -- Internationalization : the global challenge -- Visions : future, past, and present -- Greenstone digital library software. Building collections -- Operating and interoperating -- Design patterns for advanced user interfaces.
LCSH: Greenstone digital library software ; Digital libraries ; Digital libraries / Collection development / Computer programs
RSWK: Elektronische Bibliothek
DDC: 025.00285 / dc22
GHBS: TWY (SI) ; AWUI (SI)
LCC: ZA4080 .W58 2010
RVK: AN 73000 ; ST 515
3. Medelyan, O. ; Witten, I.H.: Domain-independent automatic keyphrase indexing with small training sets.
In: Journal of the American Society for Information Science and Technology. 59(2008) no.7, pp. 1026-1040.
Abstract: Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain-specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.
Subject area: Automatic indexing
4. Nichols, D.M. ; Witten, I.H. ; Keegan, T.T. ; Bainbridge, D. ; Dewsnip, M.: Digital libraries and minority languages.
In: New review of hypermedia and multimedia. 11(2005) no.2, pp. 139-155.
Abstract: Digital libraries have a pivotal role to play in the preservation and maintenance of international cultures in general and minority languages in particular. This paper outlines a software tool for building digital libraries that is well adapted for creating and distributing local information collections in minority languages, and describes some contexts in which it is used. The system can make multilingual documents available in structured collections and allows them to be accessed via multilingual interfaces. It is issued under a free open-source licence, which encourages participatory design of the software, and an end-user interface allows community-based localization of the various language interfaces, of which there are many.
Contents: Contribution to a special issue, "Minority languages, multimedia and the Web"
Subject area: Internet ; Multilingual issues
5. Witten, I.H. ; Bainbridge, D.: Creating digital library collections with Greenstone.
In: Library hi tech. 23(2005) no.4, pp. 541-560.
Abstract: Purpose - The purpose of this paper is to introduce Greenstone and explain how librarians use it to create and customize digital library collections. Design/methodology/approach - Through an end-user interface, users may add documents and metadata to collections, create new collections whose structure mirrors existing ones, and build collections and put them in place for users to view. Findings - First-time users can easily and quickly create their own digital library collections. More advanced users can design and customize new collection structures. Originality/value - The Greenstone digital library software is a comprehensive system for building and distributing digital library collections. It provides a way of organizing information based on metadata and publishing it on the Internet or on removable media such as CD-ROM/DVD.
6. Witten, I.H. ; Bainbridge, D. ; Boddie, S.J.: Greenstone : open-source digital library software.
In: D-Lib magazine. 7(2001) no.10, x pp.
Abstract: The Greenstone digital library software is an open-source system for the construction and presentation of information collections. It builds collections with effective full-text searching and metadata-based browsing facilities that are attractive and easy to use. Moreover, they are easily maintained and can be augmented and rebuilt entirely automatically. The system is extensible: software "plugins" accommodate different document and metadata types. Greenstone incorporates an interface that makes it easy for people to create their own library collections. Collections may be built and served locally from the user's own web server, or (given appropriate permissions) remotely on a shared digital library host. End users can easily build new collections styled after existing ones from material on the Web or from their local files (or both), and collections can be updated and new ones brought on-line at any time.
Note: Cf. http://dlib.ukoln.ac.uk/dlib/october01/witten/10witten.html.
8. Witten, I.H. ; Moffat, A. ; Bell, T.C.: Managing gigabytes : compressing and indexing documents and images.
New York : Van Nostrand Reinhold, 1994. 429 pp.
Abstract: Offers both students and professionals guidance on large-scale information systems. This resource describes a new generation of techniques for compressing, storing, and retrieving information - both machine-readable text and optically scanned documents. Appropriate for information science and information retrieval courses.
9. Bell, T.C. ; Moffat, A. ; Nevill-Manning, C.G. ; Witten, I.H. ; Zobel, J.: Data compression in full-text retrieval systems.
In: Journal of the American Society for Information Science. 44(1993) no.9, pp. 508-531.
Abstract: When data compression is applied to full-text retrieval systems, intricate relationships emerge between the amount of compression, access speed, and computing resources required. We propose compression methods, and explore corresponding tradeoffs, for all components of static full-text systems such as text databases on CD-ROM. These components include lexical indexes and the main text itself. Results are reported on the application of the methods to several substantial full-text databases, and show that a large, unindexed text can be stored, along with indexes that facilitate fast searching, in less than half its original size - at some appreciable cost in primary memory requirements.