This database contains more than 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 18 September 2018)
1. Lu, K. ; Mao, J. ; Li, G.: Toward effective automated weighted subject indexing : a comparison of different approaches in different environments.
In: Journal of the Association for Information Science and Technology. 69(2018) no.1, pp.121-133.
Abstract: Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions on the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata. Automated methods are more cost-effective. This study compares different automated weighting methods in different environments. Two evaluation methods were used to assess the performance. Experiments on three datasets in the biomedical domain suggest the performance of different weighting methods depends on whether it is an abstract or full text environment. Mutual information with bag-of-words representation shows the best average performance in the full text environment, while cosine with bag-of-words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag-of-words representation generally outperforms the concept-based representation. Further improvement in performance can be obtained by using the learning-to-rank method to integrate different weighting methods. This study follows up Lu and Mao (Journal of the Association for Information Science and Technology, 66, 1776-1784, 2015), in which an automated weighted subject indexing method was proposed and validated. The findings from this study contribute to more effective weighted subject indexing.
Content: see http://onlinelibrary.wiley.com/doi/10.1002/asi.23912/full.
Note: see the erratum in JASIST 69(2018) no.7, p.956.
Subject area: Automatic indexing ; Indexing studies
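The abstract names two cheap ways of weighting descriptors that are already assigned: IDF and cosine similarity over a bag-of-words representation. The sketch below shows both in their simplest form. It is not the authors' implementation. It assumes that a descriptor's "document frequency" is the number of records indexed with it, and that each descriptor has a short text profile (hypothetical `descriptor_profiles`, e.g. its entry terms) to compare against the document text.

```python
import math
from collections import Counter

# Hypothetical sketch of two descriptor-weighting methods; not the method of
# Lu, Mao & Li (2018). Representations are simplified assumptions.


def descriptor_idf(assignments: list[set[str]]) -> dict[str, float]:
    """IDF per descriptor; 'document frequency' = number of records indexed with it."""
    n_records = len(assignments)
    df = Counter(d for descriptors in assignments for d in descriptors)
    return {d: math.log(n_records / count) for d, count in df.items()}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cosine_weights(doc_text: str, descriptor_profiles: dict[str, str]) -> dict[str, float]:
    """Weight each descriptor by the cosine between the document and the descriptor's text profile."""
    doc_vec = Counter(doc_text.lower().split())
    return {d: cosine(doc_vec, Counter(profile.lower().split()))
            for d, profile in descriptor_profiles.items()}


if __name__ == "__main__":
    records = [{"Neoplasms", "Humans"}, {"Humans"}, {"Humans", "Mice"}, {"Mice"}]
    print(descriptor_idf(records))  # rarer descriptors receive higher weights
    profiles = {"Neoplasms": "tumor neoplasm cancer", "Mice": "mice mouse"}
    print(cosine_weights("Tumor growth in mice tumor", profiles))
```

IDF needs only the assignment data, which fits the abstract's remark that it gives quick estimates. The cosine weight, by contrast, depends on the document text, so it can differ between an abstract and a full-text environment.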
2. Lu, K. ; Cai, X. ; Ajiferuke, I. ; Wolfram, D.: Vocabulary size and its effect on topic representation.
In: Information processing and management. 53(2017) no.3, pp.653-665.
Abstract: This study investigates how computational overhead for topic model training may be reduced by selectively removing terms from the vocabulary of text corpora being modeled. We compare the impact of removing singly occurring terms, the top 0.5%, 1% and 5% most frequently occurring terms and both top 0.5% most frequent and singly occurring terms, along with changes in the number of topics modeled (10, 20, 30, 40, 50, 100) using three datasets. Four outcome measures are compared. The removal of singly occurring terms has little impact on outcomes for all of the measures tested. Document discriminative capacity, as measured by the document space density, is reduced by the removal of frequently occurring terms, but increases with higher numbers of topics. Vocabulary size does not greatly influence entropy, but entropy is affected by the number of topics. Finally, topic similarity, as measured by pairwise topic similarity and Jensen-Shannon divergence, decreases with the removal of frequent terms. The findings have implications for information science research in information retrieval and informetrics that makes use of topic modeling.
Content: see http://www.sciencedirect.com/science/article/pii/S0306457317300298.
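Two operations from this abstract are easy to pin down in code: shrinking the vocabulary before topic modeling, and the Jensen-Shannon divergence used to compare topics. The sketch makes two assumptions. "Singly occurring" is read as corpus frequency 1, and "top x%" is read as the x% most frequent distinct terms. Topic model training itself is not shown.

```python
import math
from collections import Counter

# Hypothetical sketch of the vocabulary reductions compared by Lu et al. (2017).
# Assumptions: "singly occurring" = corpus frequency 1; "top x%" = the x% most
# frequent distinct terms (ties broken by first occurrence).


def pruned_vocabulary(docs: list[list[str]], top_fraction: float = 0.0,
                      drop_singletons: bool = False) -> set[str]:
    """Vocabulary left after removing the most frequent and/or singly occurring terms."""
    freq = Counter(term for doc in docs for term in doc)
    ranked = [term for term, _ in freq.most_common()]  # most frequent first
    removed = set(ranked[:int(len(ranked) * top_fraction)])
    if drop_singletons:
        removed |= {term for term, count in freq.items() if count == 1}
    return set(freq) - removed


def jensen_shannon(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence in bits (0 <= JSD <= 1) between two topic-word distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: list[float], y: list[float]) -> float:
        # Terms with x_i = 0 contribute nothing; m_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    corpus = [["the", "gene", "expression"], ["the", "protein", "the"], ["the", "gene", "cell"]]
    print(pruned_vocabulary(corpus, drop_singletons=True))
    print(pruned_vocabulary(corpus, top_fraction=0.2, drop_singletons=True))
    print(jensen_shannon([0.5, 0.5, 0.0], [0.0, 0.5, 0.5]))
```

With real corpora, a fraction such as 0.005 is applied to tens of thousands of distinct terms. On a toy vocabulary like this one, `int(...)` would round it down to zero terms.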
3. Lu, K. ; Joo, S. ; Lee, T. ; Hu, R.: Factors that influence query reformulations and search performance in health information retrieval : a multilevel modeling approach.
In: Journal of the Association for Information Science and Technology. 68(2017) no.8, pp.1886-1898.
Abstract: Query reformulations can occur multiple times in a session, and queries observed in the same session tend to be related to each other. Due to the interdependent nature of queries in a session, it has been challenging to analyze query reformulation data while controlling for possible dependencies among queries. This study proposes a multilevel modeling approach in an attempt to analyze the effects of contextual factors and system features on types of query reformulation, as well as the relationship between types of query reformulation and search performance within a single research model. The results revealed that system features and users' educational background significantly influence users' query reformulation behaviors. Also, types of query reformulation had a significant impact on search performance. The main contribution of this study lies in that it adopted the multilevel modeling method to analyze query reformulation behavior while considering the nested structure of search session data. Multilevel analysis enables us to design an extensible research model to include both session-level and action-level factors, which provides a more extended understanding of the relationships among factors that influence query reformulation behavior and search performance. The multilevel modeling used in this study has practical implications for future query reformulation studies.
Content: see http://onlinelibrary.wiley.com/doi/10.1002/asi.23872/full.
4. Lu, K. ; Mao, J.: An automatic approach to weighted subject indexing : an empirical study in the biomedical domain.
In: Journal of the Association for Information Science and Technology. 66(2015) no.9, pp.1776-1784.
Abstract: Subject indexing is an intellectually intensive process that has many inherent uncertainties. Existing manual subject indexing systems generally produce binary outcomes for whether or not to assign an indexing term. This does not sufficiently reflect the extent to which the indexing terms are associated with the documents. On the other hand, the idea of probabilistic or weighted indexing was proposed a long time ago and has seen success in capturing uncertainties in the automatic indexing process. One hurdle to overcome in implementing weighted indexing in manual subject indexing systems is the practical burden that could be added to the already intensive indexing process. This study proposes a method to infer automatically the associations between subject terms and documents through text mining. By uncovering the connections between MeSH descriptors and document text, we are able to derive the weights of MeSH descriptors manually assigned to documents. Our initial results suggest that the inference method is feasible and promising. The study has practical implications for improving subject indexing practice and providing better support for information retrieval.
Content: see http://onlinelibrary.wiley.com/doi/10.1002/asi.23290/abstract.
Subject area: Indexing studies ; Automatic indexing
5. Lu, K. ; Kipp, M.E.I.: Understanding the retrieval effectiveness of collaborative tags and author keywords in different retrieval environments : an experimental study on medical collections.
In: Journal of the Association for Information Science and Technology. 65(2014) no.3, pp.483-500.
Abstract: This study investigates the retrieval effectiveness of collaborative tags and author keywords in different environments through controlled experiments. Three test collections were built. The first collection tests the impact of tags on retrieval performance when only the title and abstract are available (the abstract environment). The second tests the impact of tags when the full text is available (the full-text environment). The third compares the retrieval effectiveness of tags and author keywords in the abstract environment. In addition, both single-word queries and phrase queries are tested to understand the impact of different query types. Our findings suggest that including tags and author keywords in indexes can enhance recall but may improve or worsen average precision depending on retrieval environments and query types. Indexing tags and author keywords for searching using phrase queries in the abstract environment showed improved average precision, whereas indexing tags for searching using single-word queries in the full-text environment led to a significant drop in average precision. The comparison between tags and author keywords in the abstract environment indicates that they have comparable impact on average precision, but author keywords are more advantageous in enhancing recall. The findings from this study provide useful implications for designing retrieval systems that incorporate tags and author keywords.
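The central finding here is that indexing tags "can enhance recall but may improve or worsen average precision". The sketch shows how the two measures can move in opposite directions. It uses the common TREC-style, non-interpolated definition of average precision. Whether the study used exactly this variant is an assumption, and the two rankings are invented toy data.

```python
# Hypothetical sketch: recall and non-interpolated average precision for one query.
# The rankings below are invented; they are not results from Lu & Kipp (2014).


def recall(ranked: list[str], relevant: set[str]) -> float:
    """Share of the relevant documents that appear anywhere in the ranked list."""
    return len(set(ranked) & relevant) / len(relevant) if relevant else 0.0


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Mean of precision@k over the ranks k of relevant documents; unretrieved ones count as 0."""
    hits, precision_sum = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant) if relevant else 0.0


if __name__ == "__main__":
    relevant = {"d1", "d3", "d5"}
    without_tags = ["d1", "d2", "d3", "d4"]          # d5 is never retrieved
    with_tags = ["d2", "d6", "d1", "d3", "d7", "d5"]  # tag matches add d5 but also noise
    for name, run in [("without tags", without_tags), ("with tags", with_tags)]:
        print(name, round(recall(run, relevant), 3), round(average_precision(run, relevant), 3))
```

In the toy "with tags" run, recall rises from 2/3 to 1, while average precision falls from 5/9 to 4/9. The extra index terms pull in the missing relevant document, but they also push non-relevant documents above it.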
6. Mu, X. ; Lu, K. ; Ryu, H.: Explicitly integrating MeSH thesaurus help into health information retrieval systems : an empirical user study.
In: Information processing and management. 50(2014) no.1, pp.24-40.
Abstract: When consumers search for health information, a major obstacle is their unfamiliarity with the medical terminology. Even though medical thesauri such as the Medical Subject Headings (MeSH) and related tools (e.g., the MeSH Browser) were created to help consumers find medical term definitions, the lack of direct and explicit integration of these help tools into a health retrieval system prevented them from effectively achieving their objectives. To explore this issue, we conducted an empirical study with two systems: One is a simple interface system supporting query-based searching; the other is an augmented system with two new components supporting MeSH term searching and MeSH tree browsing. A total of 45 subjects were recruited to participate in the study. The results indicated that the augmented system is more effective than the simple system in terms of improving user-perceived topic familiarity and question-answer performance, even though we did not find users spend more time on the augmented system. The two new MeSH help components played a critical role in participants' health information retrieval and were found to allow them to develop new search strategies. The findings of the study enhanced our understanding of consumers' search behaviors and shed light on the design of future health information retrieval systems.
Content: see doi: 10.1016/j.ipm.2013.03.005.
Subject area: Design and application of the thesaurus principle ; Verbal documentation languages in online retrieval
7. Lu, K. ; Wolfram, D.: Measuring author research relatedness : a comparison of word-based, topic-based, and author cocitation approaches.
In: Journal of the American Society for Information Science and Technology. 63(2012) no.10, pp.1973-1986.
Abstract: Relationships between authors based on characteristics of published literature have been studied for decades. Author cocitation analysis using mapping techniques has been most frequently used to study how closely two authors are thought to be in intellectual space based on how members of the research community co-cite their works. Other approaches exist to study author relatedness based more directly on the text of their published works. In this study we present static and dynamic word-based approaches using vector space modeling, as well as a topic-based approach based on latent Dirichlet allocation for mapping author research relatedness. Vector space modeling is used to define an author space consisting of works by a given author. Outcomes for the two word-based approaches and a topic-based approach for 50 prolific authors in library and information science are compared with more traditional author cocitation analysis using multidimensional scaling and hierarchical cluster analysis. The two word-based approaches produced similar outcomes except where two authors were frequent co-authors for the majority of their articles. The topic-based approach produced the most distinctive map.
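In the word-based approach, each author is a point in a vector space built from the text of their works. The sketch is a deliberately simplified version of that idea and not the paper's static or dynamic method. Each author's profile is the summed term-frequency vector of their works, and relatedness is the cosine of two profiles. The author names and texts are hypothetical.

```python
import math
from collections import Counter

# Hypothetical simplification of a word-based author space; not the static/dynamic
# method of Lu & Wolfram (2012). Authors and texts are invented.


def author_profile(works: list[str]) -> Counter:
    """Aggregate the bag-of-words vectors of all works by one author."""
    profile = Counter()
    for text in works:
        profile.update(text.lower().split())
    return profile


def relatedness(a: Counter, b: Counter) -> float:
    """Cosine similarity of two author profiles (0 = no shared vocabulary)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    authors = {
        "A": ["topic models for retrieval", "retrieval evaluation"],
        "B": ["retrieval evaluation metrics"],
        "C": ["citation analysis of journals"],
    }
    profiles = {name: author_profile(works) for name, works in authors.items()}
    for x, y in [("A", "B"), ("A", "C"), ("B", "C")]:
        print(x, y, round(relatedness(profiles[x], profiles[y]), 3))
```

A pairwise similarity matrix like this is the kind of input that multidimensional scaling or hierarchical clustering would then map. That step is left out here.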
8. Ajiferuke, I. ; Lu, K. ; Wolfram, D.: A comparison of citer and citation-based measure outcomes for multiple disciplines.
In: Journal of the American Society for Information Science and Technology. 61(2010) no.10, pp.2086-2096.
Abstract: Author research impact was examined based on citer analysis (the number of citers as opposed to the number of citations) for 90 highly cited authors grouped into three broad subject areas. Citer-based outcome measures were also compared with more traditional citation-based measures for levels of association. The authors found that there are significant differences in citer-based outcomes among the three broad subject areas examined and that there is a high degree of correlation between citer and citation-based measures for all measures compared, except for two outcomes calculated for the social sciences. Citer-based measures do produce slightly different rankings of authors based on citer counts when compared to more traditional citation counts. Examples are provided. Citation measures may not adequately address the influence, or reach, of an author because citations usually do not address the origin of the citation beyond self-citations.
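The difference between a citation count and a citer count can be shown in a few lines. The sketch assumes that each citing paper is given as its list of authors. It also assumes that self-citations (papers co-written by the cited author) are excluded from both counts. The names are hypothetical.

```python
# Hypothetical sketch of citation vs. citer counting for one cited author.
# Assumption: self-citations are excluded from both counts; names are invented.


def citation_and_citer_counts(cited_author: str,
                              citing_papers: list[list[str]]) -> tuple[int, int]:
    """Return (number of citing papers, number of distinct citing authors)."""
    external = [authors for authors in citing_papers if cited_author not in authors]
    citers = {name for authors in external for name in authors}
    return len(external), len(citers)


if __name__ == "__main__":
    citing = [["Jones", "Lee"], ["Jones"], ["Jones"], ["Smith", "Kim"], ["Park"]]
    print(citation_and_citer_counts("Smith", citing))  # (4, 3)
```

Here Jones cites three times. That adds three citations but only one citer, which is the "reach" distinction the abstract draws.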