This database contains more than 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 4 June 2021)
1. Losee, R.M.: Improving collection browsing : small world networking and Gray code ordering.
In: Cataloging and classification quarterly. 55(2017) no.4, pp.229-246.
Abstract: Documents in digital and paper libraries may be arranged, based on their topics, in order to facilitate browsing. It may seem intuitively obvious that ordering documents by their subject should improve browsing performance; the results presented in this article suggest that ordering library materials by their Gray code values and through using links consistent with the small world model of document relationships is consistent with improving browsing performance. Below, library circulation data, including ordering with Library of Congress Classification numbers and Library of Congress Subject Headings, are used to provide information useful in generating user-centered document arrangements, as well as user-independent arrangements. Documents may be linearly arranged so they can be placed in a line by topic, such as on a library shelf, or in a list on a computer display. Crossover links, jumps between a document and another document to which it is not adjacent, can be used in library databases to allow additional paths that one might take when browsing. The improvement that is obtained with different combinations of document orderings and different crossovers is examined and applications suggested.
Content: Cf.: https://doi.org/10.1080/01639374.2017.1292415.
Subject area: OPAC ; Classification systems in online retrieval ; Verbal indexing languages in online retrieval
Tools: LCC ; LCSH
2. Losee, R.: Thesaurus structure, descriptive parameters, and scale.
In: Journal of the Association for Information Science and Technology. 67(2016) no.9, pp.2156-2165.
Abstract: A thesaurus contains a set of terms or features that may be used to represent recorded information, including prose documents or scientific data sets. The focus of this work is on the basic structural nature of a thesaurus itself, not on how people develop a thesaurus or how a thesaurus affects retrieval performance. Thesauri in this research are automatically developed in a simulation from sets of randomly or exhaustively generated documents. Each thesaurus is generated by the Thesaurus Generator software from a set of several hundred documents, and thousands of different document sets are used as input to the Thesaurus Generator, producing thousands of thesauri. Thus, thousands of thesauri are generated for each data point in accompanying graphs. The characteristics of this large number of thesauri are studied so that the relationships between thesaurus parameters can be determined. Some rules governing these relationships are suggested, addressing factors such as tree height and width, number of tree roots in thesauri, and number of terms available for the vocabulary. How these parameters scale as vocabularies grow is addressed. These results apply to various information systems that contain features with hierarchical relationships, including many thesauri and ontologies.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23544/full.
Subject area: Design and application of the thesaurus principle
3. Losee, R.: Combining high metainformation with high information content : the information-metainformation utility hypothesis.
In: Knowledge organization. 41(2014) no.2, pp.123-130.
Abstract: Many documents and other informational objects carry both information and metainformation about the original informational object. There are general characteristics for documents or objects that possess either high levels of information and high levels of metainformation, or high levels of information and low levels of metainformation, or low levels of information and high amounts of metainformation, or low amounts of information and low amounts of metainformation. Each of these combinations represents a frequently occurring type of informative object. We suggest an Information-Metainformation Utility hypothesis that predicts that the expected economic value of information and its associated metainformation is proportional to the combined amounts of information and metainformation. The use of rules consistent with this hypothesis is discussed. This may be applied to any situation where there is either information or metainformation that may or may not be acquired or used, along with the expected value of the informative object. The idea of ideological segregation, where people tend to view media that represents their prior political beliefs, is examined in this context.
Content: Cf.: http://www.ergon-verlag.de/isko_ko/downloads/ko_41_2014_2_d.pdf.
4. Willis, C. ; Losee, R.M.: A random walk on an ontology : using thesaurus structure for automatic subject indexing.
In: Journal of the American Society for Information Science and Technology. 64(2013) no.7, pp.1330-1344.
Abstract: Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre-indexed with 4 different thesauri (AGROVOC [UN Food and Agriculture Organization], high-energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and medical subject headings [MeSH]). We also introduce a thesaurus-centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the 4 thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri and the manual indexing associated with it is characterized using the methods developed here.
Content: Correction of a reference in: JASIST 64(2013) no.8, p.1757.
Subject area: Automatic indexing ; Design and application of the thesaurus principle
Tools: AGROVOC ; HEP ; NALT ; MeSH
5. Losee, R.M.: The effect of assigning a metadata or indexing term on document ordering.
In: Journal of the American Society for Information Science and Technology. 64(2013) no.11, pp.2191-2200.
Abstract: The assignment of indexing terms and metadata to documents, data, and other information representations is considered useful, but the utility of including a single term is seldom discussed. The author discusses a simple model of document ordering and then shows how assigning index and metadata labels improves or decreases retrieval performance. The Indexing and Metadata Advantage (IMA) factor measures how indexing or assigning a metadata term helps (or hurts) ordering performance. Performance values and the associated IMA expressions are computed, consistent with several different assumptions. The economic value associated with various term assignment decisions is developed. The IMA term advantage model itself is empirically validated with computer software that shows that the analytic results obtained agree completely with the actual performance gains and losses found when ordering all sets of 14 or fewer documents. When the formulas in the software are changed to differ from this model, the predictions of the actual performance are erroneous.
6. Losee, R.M.: Decisions in thesaurus construction and use.
In: Information processing and management. 43(2007) no.4, pp.958-968.
Abstract: A thesaurus and an ontology provide a set of structured terms, phrases, and metadata, often in a hierarchical arrangement, that may be used to index, search, and mine documents. We describe the decisions that should be made when including a term, deciding whether a term should be subdivided into its subclasses, or determining which of more than one set of possible subclasses should be used. Based on retrospective measurements or estimates of future performance when using thesaurus terms in document ordering, decisions are made so as to maximize performance. These decisions may be used in the automatic construction of a thesaurus. The evaluation of an existing thesaurus is described, consistent with the decision criteria developed here. These kinds of user-focused decision-theoretic techniques may be applied to other hierarchical applications, such as faceted classification systems used in information architecture or the use of hierarchical terms in "breadcrumb navigation".
Subject area: Design and application of the thesaurus principle
7. Losee, R.M.: Browsing mixed structured and unstructured data.
In: Information processing and management. 42(2006) no.2, pp.440-452.
Abstract: Both structured and unstructured data, as well as structured data representing several different types of tuples, may be integrated into a single list for browsing or retrieval. Data may be arranged in the Gray code order of the features and metadata, producing optimal ordering for browsing. We provide several metrics for evaluating the performance of systems supporting browsing, given some constraints. Metadata and indexing terms are used for sorting keys and attributes for structured data, as well as for semi-structured or unstructured documents, images, media, etc. Economic and information theoretic models are suggested that enable the ordering to adapt to user preferences. Different relational structures and unstructured data may be integrated into a single, optimal ordering for browsing or for displaying tables in digital libraries, database management systems, or information retrieval systems. Adaptive displays of data are discussed.
8. Losee, R.M. ; Church Jr., L.: Are two document clusters better than one? : the cluster performance question for information retrieval.
In: Journal of the American Society for Information Science and Technology. 56(2005) no.1, pp.106-108.
Abstract: When do information retrieval systems using two document clusters provide better retrieval performance than systems using no clustering? We answer this question for one set of assumptions and suggest how this may be studied with other assumptions. The "Cluster Hypothesis" asks an empirical question about the relationships between documents and user-supplied relevance judgments, while the "Cluster Performance Question" proposed here focuses on the when and why of information retrieval or digital library performance for clustered and unclustered text databases. This may be generalized to study the relative performance of m versus n clusters.
9. Losee, R.: A performance model of the length and number of subject headings and index phrases.
In: Knowledge organization. 31(2004) no.4, pp.245-251.
Abstract: When assigning subject headings or index terms to a document, how many terms or phrases should be used to represent the document? The contribution of an indexing phrase to locating and ordering documents can be compared to the contribution of a full-text query to finding documents. The length and number of phrases needed to equal the contribution of a full-text query is the subject of this paper. The appropriate number of phrases is determined in part by the length of the phrases. We suggest several rules that may be used to determine how many subject headings should be assigned, given index phrase lengths, and provide a general model for this process. A difference between characteristics of indexing "hard" science and "social" science literature is suggested.
Note: The findings of this article could at some point be related to the RSWK.
10. Losee, R.M.: Term dependence : a basis for Luhn and Zipf models.
In: Journal of the American Society for Information Science and Technology. 52(2001) no.12, pp.1019-1025.
Abstract: There are regularities in the statistical information provided by natural language terms about neighboring terms. We find that when phrase rank increases, moving from common to less common phrases, the value of the expected mutual information measure (EMIM) between the terms regularly decreases. Luhn's model suggests that midrange terms are the best index terms and relevance discriminators. We suggest reasons for this principle based on the empirical relationships shown here between the rank of terms within phrases and the average mutual information between terms, which we refer to as the Inverse Representation-EMIM principle. We also suggest an Inverse EMIM term weight for indexing or retrieval applications that is consistent with Luhn's distribution. An information theoretic interpretation of Zipf's Law is provided. Using the regularity noted here, we suggest that Zipf's Law is a consequence of the statistical dependencies that exist between terms, described here using information theoretic concepts.
Object: Luhn model ; Zipf's law
11. Losee, R.M.: When information retrieval measures agree about the relative quality of document rankings.
In: Journal of the American Society for Information Science. 51(2000) no.9, pp.834-840.
Abstract: The variety of performance measures available for information retrieval systems, search engines, and network filtering agents can be confusing to both practitioners and scholars. Most discussions about these measures address their theoretical foundations and the characteristics of a measure that make it desirable for a particular application. In this work, we consider how measures of performance at a point in a search may be formally compared. Criteria are developed that allow one to determine the percent of time or conditions under which 2 different performance measures suggest that one document ordering is superior to another ordering, or when the 2 measures disagree about the relative value of document orderings. As an example, graphs provide illustrations of the relationships between precision and F
13. Losee, R.M.: Comparing Boolean and probabilistic information retrieval systems across queries and disciplines.
In: Journal of the American Society for Information Science. 48(1997) no.2, pp.143-156.
Abstract: Suggests a method for comparison of the use of Boolean queries and ranking documents using document and term weights, and examines their relative merits. The performance of information retrieval may be determined either by using experimental simulation, or through the application of analytic techniques that estimate the retrieval performance, given values for query and database characteristics. Using these performance predicting techniques, sample performance figures are provided for queries using the Boolean operators AND and OR, as well as for probabilistic systems assuming statistical term independence or term dependence. Examines the performance of models failing to meet statistical and other assumptions.
14. Losee, R.M.: A discipline independent definition of information.
In: Journal of the American Society for Information Science. 48(1997) no.3, pp.254-269.
Abstract: Information may be defined as the characteristics of the output of a process, these being informative about the process and the input. This discipline independent definition may be applied to all domains, from physics to epistemology. Hierarchies of processes linked together provide a communication channel between each of the corresponding functions and layers in the hierarchies. Models of communication, perception, observation, belief, and knowledge are suggested that are consistent with this conceptual framework of information as the value of the output of any process in a hierarchy of processes. Misinformation and errors are considered.
15. Losee, R.M.: Browsing document collections : automatically organizing digital libraries and hypermedia using the Gray code.
In: Information processing and management. 33(1997) no.2, pp.175-192.
Abstract: Relevance and economic feedback may be used to produce an ordering of documents that supports browsing in hypermedia and digital libraries. Document classification based on the Gray code provides paths through the entire collection, each path traversing each node in the set of documents exactly once. Examines systems organizing documents based on weighted and unweighted Gray codes. Relevance feedback is used to conceptually organize the collection for an individual to browse, based on that individual's interests and information needs, as reflected by their relevance judgements and user supplied economic preferences. Applies Bayesian learning theory to estimating the characteristics of documents of interest to the user and supplying an analytic model of browsing performance, based on minimising the Expected Browsing Distance. Economic feedback may be used to change the ordering of documents to benefit the user. Using these techniques, a hypermedia or digital library may order any and all available documents, not just those examined, based on the information provided by the searcher or people with similar interests.
Note: Contribution to a special issue on methods and tools for the automatic construction of hypertext
16. Losee, R.M.: Learning syntactic rules and tags with genetic algorithms for information retrieval and filtering : an empirical basis for grammatical rules.
In: Information processing and management. 32(1996) no.2, pp.185-197.
Abstract: The grammars of natural languages may be learned by using genetic algorithms that reproduce and mutate grammatical rules and parts of speech tags, improving the quality of later generations of grammatical components. Syntactic rules are randomly generated and then evolve; those rules resulting in improved parsing and occasionally improved filtering performance are allowed to further propagate. The LUST system learns the characteristics of the language or sublanguage used in document abstracts by learning from the document rankings obtained from the parsed abstracts. Unlike the application of traditional linguistic rules to retrieval and filtering applications, LUST develops grammatical structures and tags without the prior imposition of some common grammatical assumptions (e.g. part of speech assumptions), producing grammars that are empirically based and are optimized for this particular application.
17. Losee, R.M.: Evaluating retrieval performance given database and query characteristics : analytic determination of performance surfaces.
In: Journal of the American Society for Information Science. 47(1996) no.1, pp.95-105.
Abstract: An analytic method of information retrieval and filtering evaluation can quantitatively predict the expected number of documents examined in retrieving a relevant document. It also allows researchers and practitioners to qualitatively understand how varying different estimates of query parameter values affects retrieval performance. The incorporation of relevance feedback to increase our knowledge about the parameters of relevant documents and the robustness of parameter estimates is modeled. Single term and two term independence models, as well as a complete term dependence model, are developed. An economic model of retrieval performance may be used to study the effects of database size and to provide analytic answers to questions comparing retrieval from small and large databases, as well as questions about the number of terms in a query. Results are presented as a performance surface, a three dimensional graph showing the effects of two independent variables on performance.
18. Losee, R.M.: Text windows and phrases differing by discipline, location in document, and syntactic structure.
In: Information processing and management. 32(1996) no.6, pp.747-767.
Abstract: Knowledge of window style, content, location, and grammatical structure may be used to classify documents as originating within a particular discipline or may be used to place a document on a theory vs. practice spectrum. Examines characteristics of phrases and text windows, including their number, location in documents, and grammatical construction, in addition to studying variations in these window characteristics across disciplines. Examines some of the linguistic regularities for individual disciplines, and suggests families of regularities that may prove helpful for the automatic classification of documents, as well as for information retrieval and filtering applications.
Subject area: Automatic classification
19. Spink, A. ; Losee, R.M.: Feedback in information retrieval.
In: Annual review of information science and technology. 31(1996), pp.33-78.
Abstract: State of the art review of the mechanisms of feedback in information retrieval (IR) in terms of feedback concepts and models in cybernetics and social sciences. Critically evaluates feedback research based on the traditional IR models, compares the different approaches to automatic relevance feedback techniques, and evaluates feedback research within the framework of interactive IR models. Calls for an extension of the concept of feedback beyond relevance feedback to interactive feedback. Cites specific examples of feedback models used within IR research and presents 6 challenges to future research.
20. Losee, R.M. ; Haas, S.W.: Sublanguage terms : dictionaries, usage, and automatic classification.
In: Journal of the American Society for Information Science. 46(1995) no.7, pp.519-529.
Abstract: The use of terms from natural and social science titles and abstracts is studied from the perspective of sublanguages and their specialized dictionaries. Explores different notions of sublanguage distinctiveness. Objective methods for separating hard and soft sciences are suggested based on measures of sublanguage use, dictionary characteristics, and sublanguage distinctiveness. Abstracts were automatically classified with a high degree of accuracy by using a formula that considers the degree of uniqueness of terms in each sublanguage. This may prove useful for text filtering of information retrieval systems.
Subject area: Automatic classification