This database contains over 40,000 documents on topics from the fields of descriptive cataloguing – subject indexing – information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 3 March 2020)
1. Golub, K.: Automatic subject indexing of text.
In: Knowledge organization. 46(2019) no.2, pp. 104-121.
(Reviews of concepts in knowledge organization)
Abstract: Automatic subject indexing addresses problems of scale and sustainability and can be at the same time used to enrich existing metadata records, establish more connections across and between resources from various metadata and resource collections, and enhance consistency of the metadata. In this work, automatic subject indexing focuses on assigning index terms or classes from established knowledge organization systems (KOSs) for subject indexing like thesauri, subject headings systems and classification systems. The following major approaches are discussed, in terms of their similarities and differences, advantages and disadvantages for automatic assigned indexing from KOSs: "text categorization," "document clustering," and "document classification." Text categorization is perhaps the most widespread, machine-learning approach with what seems generally good reported performance. Document clustering automatically both creates groups of related documents and extracts names of subjects depicting the group at hand. Document classification reuses the intellectual effort invested into creating a KOS for subject indexing and even simple string-matching algorithms have been reported to achieve good results, because one concept can be described using a number of different terms, including equivalent, related, narrower and broader terms. Finally, applicability of automatic subject indexing to operative information systems and challenges of evaluation are outlined, suggesting the need for more research.
Contents: DOI: 10.5771/0943-7444-2019-2-104.
Subject area: Automatic indexing
2. Johansson, S. ; Golub, K.: LibraryThing for libraries : how tag moderation and size limitations affect tag clouds.
In: Knowledge organization. 46(2019) no.4, pp. 245-259.
Abstract: The aim of this study is to analyse differences between tags on LibraryThing's web page and tag clouds in their "LibraryThing for Libraries" service, and assess if, and how, the LibraryThing tag moderation and limitations to the size of the tag cloud in the library catalogue affect the description of the information resource. An e-mail survey was conducted with personnel at LibraryThing, and the results were compared against tags for twenty different fiction books, collected from two different library catalogues with disparate tag cloud sizes, and LibraryThing's web page. The data were analysed using a modified version of Golder and Huberman's tag categories (2006). The results show that while LibraryThing claims to only remove the inherently personal tags, several other types of tags are found to have been discarded as well. Occasionally a certain type of tag is included in one book, and excluded in another. The comparison between the two tag cloud sizes suggests that the larger tag clouds provide a more pronounced picture regarding the contents of the book but at the cost of an increase in the number of tags with synonymous or redundant information.
Subject area: Folksonomies ; Metadata
3. Golub, K.: Subject access in Swedish discovery services.
In: Knowledge organization. 45(2018) no.4, pp. 297-309.
Abstract: While support for subject searching has been traditionally advocated for in library catalogs, often in the form of a catalog objective to find everything that a library has on a certain topic, research has shown that subject access has not been satisfactory. Many existing online catalogs and discovery services do not seem to make good use of the intellectual effort invested into assigning controlled subject index terms and classes. For example, few support hierarchical browsing of classification schemes and other controlled vocabularies with hierarchical structures, few provide end-user-friendly options to choose a more specific concept to increase precision, a broader concept or related concepts to increase recall, to disambiguate homonyms, or to find which term is best used to name a concept. Optimum subject access in library catalogs and discovery services is analyzed from the perspective of earlier research as well as contemporary conceptual models and cataloguing codes. Eighteen proposed features of what this should entail in practice are drawn. In an exploratory qualitative study, the three most common discovery services used in Swedish academic libraries are analyzed against these features. In line with previous research, subject access in contemporary interfaces is demonstrated to be less than optimal. This is in spite of the fact that individual collections have been indexed with controlled vocabularies and a significant number of controlled vocabularies have been mapped to each other and are available in interoperable standards. Strategic action is proposed to build research-informed (inter)national standards and guidelines.
Subject area: OPAC ; Semantic interoperability
4. Golub, K. ; Soergel, D. ; Buchanan, G. ; Tudhope, D. ; Lykke, M. ; Hiom, D.: A framework for evaluating automatic indexing or classification in the context of retrieval.
In: Journal of the Association for Information Science and Technology. 67(2016) no.1, pp. 3-16.
(Advances in information science)
Abstract: Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
Contents: See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23600/abstract.
Subject area: Automatic indexing ; Automatic classification
5. Golub, K.: Subject access to information : an interdisciplinary approach.
Santa Barbara, Calif. : Libraries Unlimited, 2015. XI, 165 pp.
Abstract: Drawing on the research of experts from the fields of computing and library science, this ground-breaking work will show you how to combine two very different approaches to classification to create more effective, user-friendly information-retrieval systems. * Provides an interdisciplinary overview of current and potential approaches to organizing information by subject * Covers both pure computer science and pure library science topics in easy-to-understand language accessible to audiences from both disciplines * Reviews technological standards for representation, storage, and retrieval of varied knowledge-organization systems and their constituent elements * Suggests a collaborative approach that will reduce duplicate efforts and make it easier to find solutions to practical problems.
Contents: Organizing information by subject -- Knowledge organization systems (KOSs) -- Technological standards -- Automated tools for subject information organization : selected topics -- Perspectives for the future.
Subject area: Foundations and introductions: general literature
LCSH: Classification ; Subject headings ; Information organization ; Information storage and retrieval systems
RSWK: Klassifikation / Schlagwort / Wissensorganisation / Informationssystem
BK: 06.74 Informationssysteme
RVK: AN 96000
6. Golub, K. ; Hansson, J. ; Soergel, D. ; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues.
In: Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. and M.I. Cordeiro. Würzburg : Ergon-Verlag, 2015. pp. 163-175.
Abstract: Subject terms play a crucial role in resource discovery but require substantial effort to produce. Automatic subject classification and indexing address problems of scale and sustainability and can be used to enrich existing bibliographic records, establish more connections across and between resources and enhance consistency of bibliographic data. The paper aims to put forward a complex methodological framework to evaluate automatic classification tools of Swedish textual documents based on the Dewey Decimal Classification (DDC) recently introduced to Swedish libraries. Three major complementary approaches are suggested: a quality-built gold standard, retrieval effects, domain analysis. The gold standard is built based on input from at least two catalogue librarians, end-users expert in the subject, end users inexperienced in the subject and automated tools. Retrieval effects are studied through a combination of assigned and free tasks, including factual and comprehensive types. The study also takes into consideration the different role and character of subject terms in various knowledge domains, such as scientific disciplines. As a theoretical framework, domain analysis is used and applied in relation to the implementation of DDC in Swedish libraries and chosen domains of knowledge within the DDC itself.
Contents: Presentation at: http://www.udcds.com/seminar/2015/media/slides/Hansson_InternationalUDCSeminar2015.pdf.
Subject area: Automatic classification
7. Golub, K. ; Lykke, M. ; Tudhope, D.: Enhancing social tagging with automated keywords from the Dewey Decimal Classification.
In: Journal of documentation. 70(2014) no.5, pp. 801-828.
Abstract: Purpose - The purpose of this paper is to explore the potential of applying the Dewey Decimal Classification (DDC) as an established knowledge organization system (KOS) for enhancing social tagging, with the ultimate purpose of improving subject indexing and information retrieval. Design/methodology/approach - Over 11,000 Intute metadata records in politics were used. In total, 28 politics students were each given four tasks, in which a total of 60 resources were tagged in two different configurations, one with uncontrolled social tags only and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was DDC comprising also mappings from the Library of Congress Subject Headings. Findings - The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: to help produce ideas of which tags to use, to make it easier to find focus for the tagging, to ensure consistency and to increase the number of access points in retrieval. The value and usefulness of the suggestions proved to be dependent on the quality of the suggestions, both as to conceptual relevance to the user and as to appropriateness of the terminology. Originality/value - No research has investigated the enhancement of social tagging with suggestions from the DDC, an established KOS, in a user trial, comparing social tagging only and social tagging enhanced with the suggestions. This paper is a final reflection on all aspects of the study.
Subject area: Social tagging ; Automatic indexing
8. Golub, K. ; Tudhope, D. ; Zeng, M.L. ; Zumer, M.: Terminology registries for knowledge organization systems : functionality, use, and attributes.
In: Journal of the Association for Information Science and Technology. 65(2014) no.9, pp. 1901-1916.
Abstract: Terminology registries (TRs) are a crucial element of the infrastructure required for resource discovery services, digital libraries, Linked Data, and semantic interoperability generally. They can make the content of knowledge organization systems (KOS) available both for human and machine access. The paper describes the attributes and functionality for a TR, based on a review of published literature, existing TRs, and a survey of experts. A domain model based on user tasks is constructed and a set of core metadata elements for use in TRs is proposed. Ideally, the TR should allow searching as well as browsing for a KOS, matching a user's search while also providing information about existing terminology services, accessible to both humans and machines. The issues surrounding metadata for KOS are also discussed, together with the rationale for different aspects and the importance of a core set of KOS metadata for future machine-based access; a possible core set of metadata elements is proposed. This is dealt with in terms of practical experience and in relation to the Dublin Core Application Profile.
Subject area: Semantic interoperability
9. Golub, K.: Automated subject classification of textual documents in the context of Web-based hierarchical browsing.
In: Knowledge organization. 38(2011) no.3, pp. 230-244.
Abstract: While automated methods for information organization have been around for several decades now, exponential growth of the World Wide Web has put them into the forefront of research in different communities, within which several approaches can be identified: 1) machine learning (algorithms that allow computers to improve their performance based on learning from pre-existing data); 2) document clustering (algorithms for unsupervised document organization and automated topic extraction); and 3) string matching (algorithms that match given strings within larger text). Here the aim was to automatically organize textual documents into hierarchical structures for subject browsing. The string-matching approach was tested using a controlled vocabulary (containing pre-selected and pre-defined authorized terms, each corresponding to only one concept). The results imply that an appropriate controlled vocabulary, with a sufficient number of entry terms designating classes, could in itself be a solution for automated classification. Then, if the same controlled vocabulary had an appropriate hierarchical structure, it would at the same time provide a good browsing structure for the collection of automatically classified documents.
Contents: See: http://www.ergon-verlag.de/isko_ko/downloads/ko_38_2011_3_d.pdf.
Subject area: Automatic classification
10. Matthews, B. ; Jones, C. ; Puzon, B. ; Moon, J. ; Tudhope, D. ; Golub, K. ; Nielsen, M.L.: An evaluation of enhancing social tagging with a knowledge organization system.
In: Aslib proceedings. 62(2010) nos.4/5, pp. 447-465.
Abstract: Purpose - Traditional subject indexing and classification are considered infeasible in many digital collections. This paper seeks to investigate ways of enhancing social tagging via knowledge organization systems, with a view to improving the quality of tags for increased information discovery and retrieval performance. Design/methodology/approach - Enhanced tagging interfaces were developed for exemplar online repositories, and trials were undertaken with author and reader groups to evaluate the effectiveness of tagging augmented with controlled vocabulary for subject indexing of papers in online repositories. Findings - The results showed that using a knowledge organisation system to augment tagging does appear to increase the effectiveness of non-specialist users (that is, without information science training) in subject indexing. Research limitations/implications - While limited by the size and scope of the trials undertaken, these results do point to the usefulness of a mixed approach in supporting the subject indexing of online resources. Originality/value - The value of this work is as a guide to future developments in the practical support for resource indexing in online repositories.
Note: Contribution to a special issue: Content architecture: exploiting and managing diverse resources: proceedings of the first national conference of the United Kingdom chapter of the International Society for Knowledge Organization (ISKO)
Subject area: Social tagging
11. Golub, K. ; Lykke, M.: Automated classification of web pages in hierarchical browsing.
In: Journal of documentation. 65(2009) no.6, pp. 901-925.
Abstract: Purpose - The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing, and then, if proven useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme. Design/methodology/approach - A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme in order to examine the suitability of the classification system for browsing. The classification algorithm was evaluated by the users, who judged the correctness of the automatically assigned classes. Findings - The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Success of browsing proved to be correlated with and dependent on classification correctness. Research limitations/implications - Further research should address problems of disparate evaluations of one and the same web page. Additional reasons behind browsing failures in the Ei classification scheme also need further investigation. Practical implications - Improvements for browsing were identified: describing class captions and/or listing their subclasses from the start; allowing for searching for words from class captions with synonym search (easily provided for Ei since the classes are mapped to thesaurus terms); when searching for class captions, returning the hierarchical tree expanded around the class in whose caption the search term is found. The need for improvements of classification schemes was also indicated. Originality/value - A user-based evaluation of automated subject classification in the context of browsing has not been conducted before; hence the study also presents new findings concerning methodology.
Subject area: Automatic classification ; Classification systems in online retrieval
Object: Engineering Index Classification
12. Golub, K. ; Moon, J. ; Nielsen, M.L. ; Tudhope, D.: EnTag: Enhanced Tagging for Discovery.
Abstract: Purpose: Investigate the combination of controlled and folksonomy approaches to support resource discovery in repositories and digital collections. Aim: Investigate whether use of an established controlled vocabulary can help improve social tagging for better resource discovery. Objectives: (1) Investigate indexing aspects when using only social tagging versus when using social tagging with suggestions from a controlled vocabulary; (2) Investigate the above in two different contexts: tagging by readers and tagging by authors; (3) Investigate the influence of only social tagging versus social tagging with a controlled vocabulary on retrieval. See: http://www.ukoln.ac.uk/projects/enhanced-tagging/.
Contents: Presentation at the DC NKOS Special Session, 24 Sep 2008, Berlin.
Subject area: Metadata ; Social tagging
Object: EnTag ; Intute
13. Golub, K. ; Hamon, T. ; Ardö, A.: Automated classification of textual documents based on a controlled vocabulary in engineering.
In: Knowledge organization. 34(2007) no.4, pp. 247-263.
Abstract: Automated subject classification has been a challenging research issue for many years now, receiving particular attention in the past decade due to rapid increase of digital documents. The most frequent approach to automated classification is machine learning. It, however, requires training documents and performs well on new documents only if these are similar enough to the former. We explore a string-matching algorithm based on a controlled vocabulary, which does not require training documents - instead it reuses the intellectual work put into creating the controlled vocabulary. Terms from the Engineering Information thesaurus and classification scheme were matched against title and abstract of engineering papers from the Compendex database. Simple string-matching was enhanced by several methods such as term weighting schemes and cut-offs, exclusion of certain terms, and enrichment of the controlled vocabulary with automatically extracted terms. The best results are 76% recall when the controlled vocabulary is enriched with new terms, and 79% precision when certain terms are excluded. Precision of individual classes is up to 98%. These results are comparable to state-of-the-art machine-learning algorithms.
Subject area: Automatic classification
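Editorial illustration for entry 13: the abstract describes matching controlled-vocabulary terms against title and abstract, with term weights, a cut-off, and exclusion of certain terms, evaluated by precision and recall. The following is a minimal Python sketch of that general idea only, not the authors' algorithm. The vocabulary entries, class codes, weights, title boost, and cut-off value are all hypothetical and are not taken from the Ei thesaurus or the paper.

```python
import re
from collections import defaultdict

# Hypothetical mini-vocabulary: term -> (class code, term weight).
# Codes and weights are invented for illustration only.
VOCABULARY = {
    "heat transfer": ("641.2", 2.0),
    "thermodynamics": ("641.1", 1.0),
    "heat exchangers": ("616.1", 2.0),
    "fluid flow": ("631.1", 1.0),
}


def classify(title, abstract, vocabulary=VOCABULARY,
             title_boost=2.0, cutoff=2.0, excluded=()):
    """Score classes by whole-phrase matches; keep those at or above the cut-off."""
    scores = defaultdict(float)
    # Matches in the title count more than matches in the abstract (assumed boost).
    for field, boost in ((title, title_boost), (abstract, 1.0)):
        text = field.lower()
        for term, (class_code, weight) in vocabulary.items():
            if term in excluded:
                continue  # exclusion of certain terms
            pattern = r"\b" + re.escape(term) + r"\b"
            hits = len(re.findall(pattern, text))
            scores[class_code] += hits * weight * boost
    return {code: s for code, s in scores.items() if s >= cutoff}


def precision_recall(assigned, correct):
    """Precision and recall of assigned classes against a reference set."""
    assigned, correct = set(assigned), set(correct)
    true_positives = len(assigned & correct)
    precision = true_positives / len(assigned) if assigned else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    return precision, recall


if __name__ == "__main__":
    result = classify("Heat transfer in compact heat exchangers",
                      "We study fluid flow and heat transfer.")
    print(result)
    print(precision_recall(result, {"641.2", "631.1"}))
```

In this toy setting, excluding a term (for example `excluded=("heat exchangers",)`) removes a class that the reference set does not contain, which raises precision without changing recall. Lowering the cut-off has the opposite tendency: more classes pass, so recall can rise while precision can fall.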
14. Golub, K.: Automated subject classification of textual web documents.
In: Journal of documentation. 62(2006) no.3, pp. 350-371.
Abstract: Purpose - To provide an integrated perspective to similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and point to problems with the approaches and automated classification as such. Design/methodology/approach - A range of works dealing with automated classification of full-text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages. Findings - Provides major similarities and differences between the three approaches: document pre-processing and utilization of web-specific document characteristics is common to all the approaches; major differences are in applied algorithms, employment or not of the vector space model and of controlled vocabularies. Problems of automated classification are recognized. Research limitations/implications - The paper does not attempt to provide an exhaustive bibliography of related resources. Practical implications - As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community have the information on how similar tasks are conducted in different communities. Originality/value - To the author's knowledge, no review paper on automated text classification attempted to discuss more than one community's approach from an integrated perspective.
Subject area: Automatic classification
15. Golub, K.: Automated subject classification of textual Web pages, based on a controlled vocabulary : challenges and recommendations.
In: New review of hypermedia and multimedia. 12(2006) no.1, pp. 11-27.
Abstract: The primary objective of this study was to identify and address problems of applying a controlled vocabulary in automated subject classification of textual Web pages, in the area of engineering. Web pages have special characteristics such as structural information, but are at the same time rather heterogeneous. The classification approach used comprises string-to-string matching between words in a term list extracted from the Ei (Engineering Information) thesaurus and classification scheme, and words in the text to be classified. Based on a sample of 70 Web pages, a number of problems with the term list are identified. Reasons for those problems are discussed and improvements proposed. Methods for implementing the improvements are also specified, suggesting further research.
Contents: Contribution to a special issue on "Knowledge organization systems and services"
Subject area: Automatic classification
16. Koch, T. ; Golub, K. ; Ardö, A.: Users browsing behaviour in a DDC-based Web service : a log analysis.
In: Cataloging and classification quarterly. 42(2006) nos.3/4, pp. 163-186.
Abstract: This study explores the navigation behaviour of all users of a large web service, Renardus, using web log analysis. Renardus provides integrated searching and browsing access to quality-controlled web resources from major individual subject gateway services. The main navigation feature is subject browsing through the Dewey Decimal Classification (DDC) based on mapping of classes of resources from the distributed gateways to the DDC structure. Among the more surprising results are the hugely dominant share of browsing activities, the good use of browsing support features like the graphical fish-eye overviews, rather long and varied navigation sequences, as well as extensive hierarchical directory-style browsing through the large DDC system.
Contents: See also: http://catalogingandclassificationquarterly.com/
Note: Contribution to a special issue on "Moving beyond the presentation layer: content and context in the Dewey Decimal Classification (DDC) System"
Subject area: Classification systems in online retrieval ; User studies ; Visualization
Object: DDC ; Renardus