Literature on Information Indexing (Literatur zur Informationserschließung)
This database contains more than 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg
(As of 28 April 2022)
Search results
Hits 21–40 of 159
-
21 Raieli, R.: The semantic hole : enthusiasm and caution around multimedia information retrieval.
In: Knowledge organization. 39(2012) no.1, pp.13-22.
Abstract: This paper centres on the tools for the management of new digital documents, which are not only textual, but also visual-video, audio or multimedia in the full sense. Among the aims is to demonstrate that operating within the terms of generic Information Retrieval through textual language only is limiting, and it is instead necessary to consider ampler criteria, such as those of MultiMedia Information Retrieval, according to which every type of digital document can be analyzed and searched by the proper elements of language for its proper nature. MMIR is presented as the organic complex of the systems of Text Retrieval, Visual Retrieval, Video Retrieval, and Audio Retrieval, each of which has an approach to information management that handles the concrete textual, visual, audio, or video content of the documents directly, here defined as content-based. In conclusion, the limits of this content-based objective access to documents are underlined. The discrepancy known as the semantic gap is that which occurs between semantic-interpretive access and content-based access. Finally, the integration of these conceptions is explained, gathering and composing the merits and the advantages of each of the approaches and of the systems for access to information.
Content: Cf.: http://www.ergon-verlag.de/isko_ko/downloads/ko_39_2012_1_b.pdf.
Note: Refers to: Enser, P.G.B.: Visual image retrieval. In: Annual review of information science and technology. 42(2008), pp.3-42.
Subject area: Multimedia ; Content analysis
-
22 Thelwall, M. ; Buckley, K. ; Paltoglou, G.: Sentiment strength detection for the social web.
In: Journal of the American Society for Information Science and Technology. 63(2012) no.1, pp.163-173.
Abstract: Sentiment analysis is concerned with the automatic extraction of sentiment-related information from text. Although most sentiment analysis addresses commercial tasks, such as extracting opinions from product reviews, there is increasing interest in the affective dimension of the social web, and Twitter in particular. Most sentiment analysis algorithms are not ideally suited to this task because they exploit indirect indicators of sentiment that can reflect genre or topic instead. Hence, such algorithms used to process social web texts can identify spurious sentiment patterns caused by topics rather than affective phenomena. This article assesses an improved version of the algorithm SentiStrength for sentiment strength detection across the social web that primarily uses direct indications of sentiment. The results from six diverse social web data sets (MySpace, Twitter, YouTube, Digg, Runners World, BBC Forums) indicate that SentiStrength 2 is successful in the sense of performing better than a baseline approach for all data sets in both supervised and unsupervised cases. SentiStrength is not always better than machine-learning approaches that exploit indirect indicators of sentiment, however, and is particularly weaker for positive sentiment in news-related discussions. Overall, the results suggest that, even unsupervised, SentiStrength is robust enough to be applied to a wide variety of different social web contexts.
Subject area: Content analysis
-
23 Buckland, M.K.: Obsolescence in subject description.
In: Journal of documentation. 68(2012) no.2, pp.154-161.
Abstract: Purpose - The paper aims to explain the character and causes of obsolescence in assigned subject descriptors. Design/methodology/approach - The paper takes the form of a conceptual analysis with examples and reference to existing literature. Findings - Subject description comes in two forms: assigning the name or code of a subject to a document and assigning a document to a named subject category. Each method associates a document with the name of a subject. This naming activity is the site of tensions between the procedural need of information systems for stable records and the inherent multiplicity and instability of linguistic expressions. As languages change, previously assigned subject descriptions become obsolescent. The issues, tensions, and compromises involved are introduced. Originality/value - Drawing on the work of Robert Fairthorne and others, an explanation of the unavoidable obsolescence of assigned subject headings is presented. The discussion relates to libraries, but the same issues arise in any context in which subject description is expected to remain useful for an extended period of time.
Subject area: Content analysis
-
24 Clavier, V. ; Paganelli, C.: Including authorial stance in the indexing of scientific documents.
In: Knowledge organization. 39(2012) no.4, pp.292-299.
Abstract: This article argues that authorial stance should be taken into account in the indexing of scientific documents. Authorial stance has been widely studied in linguistics and is a typical feature of scientific writing that reveals the uniqueness of each author's perspective, their scientific contribution, and their thinking. We argue that authorial stance guides the reading of scientific documents and that it can be used to characterize the knowledge contained in such documents. Our research has previously shown that people reading dissertations are interested both in a topic and in a document's authorial stance. Now, we would like to propose a two-tiered indexing system. Dissertations would first be divided into paragraphs; then, each information unit would be defined by topic and by the markers of authorial stance present in the document.
Content: Paper from: Selected Papers from the 8th ISKO-France Conference, 27-28 June 2011, Lille, Université Charles-De-Gaulle Lille 3. Cf.: http://www.ergon-verlag.de/isko_ko/downloads/ko_39_2012_4_g.pdf.
Subject area: Content analysis
-
25 Bös, K.: Aspektorientierte Inhaltserschließung von Romanen und Bildern : ein Vergleich der Ansätze von Annelise Mark Pejtersen und Sara Shatford [Aspect-oriented subject indexing of novels and images : a comparison of the approaches of Annelise Mark Pejtersen and Sara Shatford].
Köln : Fachhochschule / Fakultät für Informations- und Kommunikationswissenschaften, 2012. 52 pp.
Abstract: Established procedures and standards are available today for the subject indexing of non-fiction and specialist literature. The situation is different for the indexing of fiction and images. The two media are very different and yet have one thing in common: they cannot be satisfactorily indexed by subject using the rules for non-fiction and specialist literature. This problem was recognized in the 1970s and 1980s by the two authors whose methods I have compared here. Annelise Mark Pejtersen sought a solution for fiction and chose an empirical approach. Sara Shatford tried to work out a solution for images through theoretical considerations. Both the empirical and the theoretical approach led to methods that consider the respective medium under various aspects. In both cases these aspects are based on the same questions. Nevertheless, they differ considerably from one another, both in the content they can accommodate and in their structure. Applying one of the methods to the other medium therefore does not appear sensible. In this thesis, the methods of Pejtersen and Shatford are first explained individually. The aspects of both methods are then compared side by side. For this purpose, selected examples are indexed with both methods. Finally, it is examined whether the reciprocal indexing applied in the comparison makes sense in practice and whether there are media for which indexing with both methods would be of interest.
Content: Diploma thesis in the Library Science degree programme
Subject area: Fiction ; Content analysis
Form treated: Images
Object: AMP method
-
26 Moraes, J.B.E. de: Aboutness in fiction : methodological perspectives for knowledge organization.
In: Categories, contexts and relations in knowledge organization: Proceedings of the Twelfth International ISKO Conference 6-9 August 2012, Mysore, India. Eds.: Neelameghan, A. and K.S. Raghavan. Würzburg : Ergon Verlag, 2012. pp.242-248.
(Advances in knowledge organization; vol.13)
Abstract: The subject analysis of narrative texts of fiction is complex; the methodological model of identification of concepts as elaborated for scientific texts is not applicable to fiction. It is proposed here that theoretical and methodological use of the Generative Trajectory of Meaning postulated by Greimas may contribute to the identification of aboutness in narrative texts of fiction.
Subject area: Fiction ; Content analysis
-
27 Arastoopoor, S. ; Fattahi, R.: Users' perception of aboutness and ofness in images : an approach to subject indexing based on Ervin Panofsky's theory and users' view.
In: Categories, contexts and relations in knowledge organization: Proceedings of the Twelfth International ISKO Conference 6-9 August 2012, Mysore, India. Eds.: Neelameghan, A. and K.S. Raghavan. Würzburg : Ergon Verlag, 2012. pp.345-351.
(Advances in knowledge organization; vol.13)
Abstract: It is widely accepted that subject indexing of an image is based on a two-dimensional approach. The first dimension is the ofness and the second is the aboutness of the image. Assigning a suitable set of subject tags based on these two groups depends, to a great extent, on users' perception of the image. This study aims at analyzing users' perception of aboutness and ofness of images. 25 in-depth semi-structured interviews were conducted in two phases. In the first phase a collection of 10 widely known photographs was given to the interviewees and they were asked to assign subject tags (as many as they wanted) to each image. In the second phase some facts regarding each image were given to each interviewee, who could assign further tags (again as many as they wanted) or even modify their previous tags. The results show that the interviewees do focus both on ofness and aboutness in subject tagging; but it seems that they place more emphasis on aboutness in describing images. On the other hand, as soon as the interviewees were able to distinguish the iconographical ofness, they could speak of iconographical and iconological aboutness. The results also show that subject indexers must focus on the iconographical level, especially regarding those tags which represent the ofness at this level.
Subject area: Content analysis
Form treated: Images
-
28 Yoon, J.W.: Utilizing quantitative users' reactions to represent affective meanings of an image.
In: Journal of the American Society for Information Science and Technology. 61(2010) no.7, pp.1345-1359.
Abstract: Emotional meaning is critical for users to retrieve relevant images. However, because emotional meanings are subject to the individual viewer's interpretation, they are considered difficult to implement when designing image retrieval systems. With the intent of making an image's emotional messages more readily accessible, this study aims to test a new approach designed to enhance the accessibility of emotional meanings during the image search process. This approach utilizes image searchers' emotional reactions, which are quantitatively measured. Broadly used quantitative measurements for emotional reactions, Semantic Differential (SD) and Self-Assessment Manikin (SAM), were selected as tools for gathering users' reactions. Emotional representations obtained from these two tools were compared with three image perception tasks: searching, describing, and sorting. A survey questionnaire with a set of 12 images, which were tagged with basic emotions, was administered to 58 participants. Results demonstrated that the SAM represents basic emotions on 2-dimensional plots (pleasure and arousal dimensions), and this representation consistently corresponded to the three image perception tasks. This study provided experimental evidence that quantitative users' reactions can be a useful complementary element of current image retrieval/indexing systems. Integrating users' reactions obtained from the SAM into image browsing systems would reduce the efforts of human indexers as well as improve the effectiveness of image retrieval systems.
Subject area: Content analysis
Form treated: Images
-
29 Knautz, K. ; Dröge, E. ; Finkelmeyer, S. ; Guschauski, D. ; Juchem, K. ; Krzmyk, C. ; Miskovic, D. ; Schiefer, J. ; Sen, E. ; Verbina, J. ; Werner, N. ; Stock, W.G.: Indexieren von Emotionen bei Videos [Indexing emotions in videos].
In: Information - Wissenschaft und Praxis. 61(2010) no.4, pp.221-236.
Abstract: The subject of this empirical research is the emotions depicted in and felt in response to videos. Are users able to index such emotions consistently enough that their input can be used for emotional video retrieval? We work with a controlled vocabulary for nine emotions (love, joy, fun, surprise, longing, sadness, anger, disgust, and fear), a slider for setting the intensity of each emotion, and the broad folksonomy approach, that is, we let different users tag the videos. Test subjects were presented with a total of 20 videos (edited films from YouTube) whose emotions they were asked to index. We received input from 776 participants and, accordingly, 279,360 slider settings. The consistency of the user votes is very high; the tags lead to stable distributions of the emotions for the individual videos. The final shape of the distributions is reached with relatively few users (fewer than 100). It is possible, in the sense of power tags, to separate the emotions that are central to a given document (where present at all) and to prepare them for emotional information retrieval (EmIR).
Subject area: User studies ; Content analysis
Form treated: Videos
-
30 Caldera-Serrano, J.: Thematic description of audio-visual information on television.
In: Aslib proceedings. 62(2010) no.2, pp.202-209.
Abstract: Purpose - This paper endeavours to show the possibilities for thematic description of audio-visual documents for television with the aim of promoting and facilitating information retrieval. Design/methodology/approach - To achieve these goals different database fields are shown, as well as the way in which they are organised for indexing and thematic element description, analysed and used as an example. Some of the database fields are extracted from an analytical study of the documentary system of television in Spain. Others are being tested in university television on which indexing experiments are carried out. Findings - Not all thematic descriptions are used on television information systems; nevertheless, some television channels do use thematic descriptions of both image and sound, applying thesauri. Moreover, it is possible to access sequences using full text retrieval as well. Originality/value - The development of the documentary task, applying the described techniques, promotes thematic indexing and hence thematic retrieval, which is without doubt one of the aspects most demanded by television journalists (along with people's names). This conceptualisation translates into the adaptation of databases to new indexing methods.
Subject area: Content analysis
Form treated: AV materials ; Videos
-
31 Winget, M.: Describing art : an alternative approach to subject access and interpretation.
In: Journal of documentation. 65(2009) no.6, pp.958-976.
Abstract: Purpose - The purpose of this paper is to examine the art historical antecedents of providing subject access to images. After reviewing the assumptions and limitations inherent in the most prevalent descriptive method, the paper seeks to introduce a new model that allows for more comprehensive representation of visually-based cultural materials. Design/methodology/approach - The paper presents a literature-based conceptual analysis, taking Panofsky's theory of iconography and iconology as the starting-point. Panofsky's conceptual model, while appropriate for art created in the Western academic tradition, ignores or misrepresents work from other eras or cultures. Continued dependence on Panofskian descriptive methods limits the functionality and usefulness of image representation systems. Findings - The paper recommends the development of a more precise and inclusive descriptive model for art objects, which is based on the premise that art is not another sort of text, and should not be interpreted as such. Practical implications - The paper provides suggestions for the development of representation models that will enhance the description of non-textual artifacts. Originality/value - The paper addresses issues in information science, the history of art, and computer science, and suggests that a new descriptive model would be of great value to both humanist and social science scholars.
Subject area: Content analysis
Discipline: Art
-
32 Rosso, M.A.: User-based identification of Web genres.
In: Journal of the American Society for Information Science and Technology. 59(2008) no.7, pp.1053-1072.
Abstract: This research explores the use of genre as a document descriptor in order to improve the effectiveness of Web searching. A major issue to be resolved is the identification of what document categories should be used as genres. As genre is a kind of folk typology, document categories must enjoy widespread recognition by their intended user groups in order to qualify as genres. Three user studies were conducted to develop a genre palette and show that it is recognizable to users. (Palette is a term used to denote a classification, attributable to Karlgren, Bretan, Dewe, Hallberg, and Wolkert, 1998.) To simplify the users' classification task, it was decided to focus on Web pages from the edu domain. The first study was a survey of user terminology for Web pages. Three participants separated 100 Web page printouts into stacks according to genre, assigning names and definitions to each genre. The second study aimed to refine the resulting set of 48 (often conceptually and lexically similar) genre names and definitions into a smaller palette of user-preferred terminology. Ten participants classified the same 100 Web pages. A set of five principles for creating a genre palette from individuals' sortings was developed, and the list of 48 was trimmed to 18 genres. The third study aimed to show that users would agree on the genres of Web pages when choosing from the genre palette. In an online experiment in which 257 participants categorized a new set of 55 pages using the 18 genres, on average, over 70% agreed on the genre of each page. Suggestions for improving the genre palette and future directions for the work are discussed.
Subject area: Internet ; Content analysis
-
33 Rorissa, A. ; Iyer, H.: Theories of cognition and image categorization : what category labels reveal about basic level theory.
In: Journal of the American Society for Information Science and Technology. 59(2008) no.9, pp.1383-1392.
Abstract: Information search and retrieval interactions usually involve information content in the form of document collections, information retrieval systems and interfaces, and the user. To fully understand information search and retrieval interactions between users' cognitive space and the information space, researchers need to turn to cognitive models and theories. In this article, the authors use one of these theories, the basic level theory. Use of the basic level theory to understand human categorization is both appropriate and essential to user-centered design of taxonomies, ontologies, browsing interfaces, and other indexing tools and systems. Analyses of data from two studies involving free sorting by 105 participants of 100 images were conducted. The types of categories formed and category labels were examined. Results of the analyses indicate that image category labels generally belong to the level superordinate to the basic level, and are generic and interpretive. Implications for research on theories of cognition and categorization, and design of image indexing, retrieval and browsing systems are discussed.
Subject area: Content analysis
Form treated: Images
-
34 Rorissa, A.: User-generated descriptions of individual images versus labels of groups of images : a comparison using basic level theory.
In: Information processing and management. 44(2008) no.5, pp.1741-1753.
Abstract: Although images are visual information sources with little or no text associated with them, users still tend to use text to describe images and formulate queries. This is because digital libraries and search engines provide mostly text query options and rely on text annotations for representation and retrieval of the semantic content of images. While the main focus of image research is on indexing and retrieval of individual images, the general topic of image browsing and indexing, and retrieval of groups of images has not been adequately investigated. Comparisons of descriptions of individual images as well as labels of groups of images supplied by users using cognitive models are scarce. This work fills this gap. Using the basic level theory as a framework, a comparison of the descriptions of individual images and labels assigned to groups of images by 180 participants in three studies found a marked difference in their level of abstraction. Results confirm assertions by previous researchers in LIS and other fields that groups of images are labeled using more superordinate level terms while individual image descriptions are mainly at the basic level. Implications for design of image browsing interfaces, taxonomies, thesauri, and similar tools are discussed.
Subject area: Content analysis
Form treated: Images
-
35 Buckland, M. ; Shaw, R.: 4W vocabulary mapping across diverse reference genres.
In: Culture and identity in knowledge organization: Proceedings of the Tenth International ISKO Conference 5-8 August 2008, Montreal, Canada. Ed. by Clément Arsenault and Joseph T. Tennis. Würzburg : Ergon Verlag, 2008. pp.151-156.
(Advances in knowledge organization; vol.11)
Content: This paper examines three themes in the design of search support services: linking different genres of reference resources (e.g. bibliographies, biographical dictionaries, catalogs, encyclopedias, place name gazetteers); the division of vocabularies by facet (e.g. What, Where, When, and Who); and mapping between both similar and dissimilar vocabularies. Different vocabularies within a facet can be used in conjunction, e.g. a place name combined with spatial coordinates for Where. In practice, vocabularies of different facets are used in combination in the representation or description of complex topics. Rich opportunities arise from mapping across vocabularies of dissimilar reference genres to recreate the amenities of a reference library. In a network environment, in which vocabulary control cannot be imposed, semantic correspondence across diverse vocabularies is a challenge and an opportunity.
Note: Cf.: http://www.ergon-verlag.de/isko_ko/tocs/0497f79b0c0b3ed06/0497f79b0c0b5550a/index.php
Subject area: Content analysis
-
36 Inskip, C. ; MacFarlane, A. ; Rafferty, P.: Meaning, communication, music : towards a revised communication model.
In: Journal of documentation. 64(2008) no.5, pp.687-706.
Abstract: Purpose - If an information retrieval system is going to be of value to the user then it must give meaning to the information which matches the meaning given to it by the user. The meaning given to music varies according to who is interpreting it - the author/composer, the performer, cataloguer or the listener - and this affects how music is organized and retrieved. This paper aims to examine the meaning of music, how meaning is communicated and suggests this may affect music retrieval. Design/methodology/approach - Musicology is used to define music and examine its functions leading to a discussion of how music has been organised and described. Various ways of establishing the meaning of music are reviewed, focussing on established musical analysis techniques. It is suggested that traditional methods are of limited use with digitised popular music. A discussion of semiotics and a review of semiotic analysis in western art music leads to a discussion of semiotics of popular music and examines ideas of Middleton, Stefani and Tagg. Findings - Agreeing that music exists when communication takes place, a discussion of selected communication models leads to the proposal of a revised version of Tagg's model, adjusting it to include listener feedback. Originality/value - The outcome of the analysis is a revised version of Tagg's communication model, adapted to reflect user feedback. It is suggested that this revised communication model reflects the way in which meaning is given to music.
Subject area: Content analysis
Discipline: Music
-
37 Enser, P.G.B. ; Sandom, C.J. ; Hare, J.S. ; Lewis, P.H.: Facing the reality of semantic image retrieval.
In: Journal of documentation. 63(2007) no.4, pp.465-481.
Abstract: Purpose - To provide a better-informed view of the extent of the semantic gap in image retrieval, and the limited potential for bridging it offered by current semantic image retrieval techniques. Design/methodology/approach - Within an ongoing project, a broad spectrum of operational image retrieval activity has been surveyed, and, from a number of collaborating institutions, a test collection assembled which comprises user requests, the images selected in response to those requests, and their associated metadata. This has provided the evidence base upon which to make informed observations on the efficacy of cutting-edge automatic annotation techniques which seek to integrate the text-based and content-based image retrieval paradigms. Findings - Evidence from the real-world practice of image retrieval highlights the existence of a generic-specific continuum of object identification, and the incidence of temporal, spatial, significance and abstract concept facets, manifest in textual indexing and real-query scenarios but often having no directly visible presence in an image. These factors combine to limit the functionality of current semantic image retrieval techniques, which interpret only visible features at the generic extremity of the generic-specific continuum. Research limitations/implications - The project is concerned with the traditional image retrieval environment in which retrieval transactions are conducted on still images which form part of managed collections. The possibilities offered by ontological support for adding functionality to automatic annotation techniques are considered. Originality/value - The paper offers fresh insights into the challenge of migrating content-based image retrieval from the laboratory to the operational environment, informed by newly-assembled, comprehensive, live data.
Subject area: Content analysis
Form treated: Images
-
38 Naun, C.C.: Objectivity and subject access in the print library.
In: Cataloging and classification quarterly. 43(2006) no.2, pp.83-94.
Abstract: Librarians have inherited from the print environment a particular way of thinking about subject representation, one based on the conscious identification by librarians of appropriate subject classes and terminology. This conception has played a central role in shaping the profession's characteristic approach to upholding one of its core values: objectivity. It is argued that the social and technological roots of traditional indexing practice are closely intertwined. It is further argued that in traditional library practice objectivity is to be understood as impartiality, and reflects the mediating role that librarians have played in society. The case presented here is not a historical one based on empirical research, but rather a conceptual examination of practices that are already familiar to most librarians.
Content: See also: http://catalogingandclassificationquarterly.com/
Subject area: Content analysis
-
39 White, M.D. ; Marsh, E.E.: Content analysis : a flexible methodology.
In: Library trends. 55(2006) no.1, pp.22-45.
Abstract: Content analysis is a highly flexible research method that has been widely used in library and information science (LIS) studies with varying research goals and objectives. The research method is applied in qualitative, quantitative, and sometimes mixed modes of research frameworks and employs a wide range of analytical techniques to generate findings and put them into context. This article characterizes content analysis as a systematic, rigorous approach to analyzing documents obtained or generated in the course of research. It briefly describes the steps involved in content analysis, differentiates between quantitative and qualitative content analysis, and shows that content analysis serves the purposes of both quantitative research and qualitative research. The authors draw on selected LIS studies that have used content analysis to illustrate the concepts addressed in the article. The article also serves as a gateway to methodological books and articles that provide more detail about aspects of content analysis discussed only briefly in the article.
Note: Cf.: 10.1353/lib.2006.0053.
Subject area: Content analysis
-
40 Allen, R.B. ; Wu, Y.: Metrics for the scope of a collection.
In: Journal of the American Society for Information Science and Technology. 56(2005) no.12, pp.1243-1249.
Abstract: Some collections cover many topics, while others are narrowly focused on a limited number of topics. We introduce the concept of the "scope" of a collection of documents and we compare two ways of measuring it. These measures are based on the distances between documents. The first uses the overlap of words between pairs of documents. The second measure uses a novel method that calculates the semantic relatedness to pairs of words from the documents. Those values are combined to obtain an overall distance between the documents. The main validation for the measures compared Web pages categorized by Yahoo. Sets of pages sampled from broad categories were determined to have a higher scope than sets derived from subcategories. The measure was significant and confirmed the expected difference in scope. Finally, we discuss other measures related to scope.
Subject area: Content analysis