This database contains more than 40,000 documents on topics in descriptive cataloging, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 4 June 2021)
1. Zhu, J. ; Han, L. ; Gou, Z. ; Yuan, X.: A fuzzy clustering-based denoising model for evaluating uncertainty in collaborative filtering recommender systems.
In: Journal of the Association for Information Science and Technology. 69(2018) no.9, pp. 1109-1121.
Abstract: Recommender systems are effective in predicting the most suitable products for users, such as movies and books. To facilitate personalized recommendations, the quality of item ratings should be guaranteed. However, a few ratings might not be accurate enough due to the uncertainty of user behavior and are referred to as natural noise. In this article, we present a novel fuzzy clustering-based method for detecting noisy ratings. The entropy of a subset of the original ratings dataset is used to indicate the data-driven uncertainty, and evaluation metrics are adopted to represent the prediction-driven uncertainty. After the repetition of resampling and the execution of a recommendation algorithm, the entropy and evaluation metrics vectors are obtained and are empirically categorized to identify the proportion of the potential noise. Then, the fuzzy C-means-based denoising (FCMD) algorithm is performed to verify the natural noise under the assumption that natural noise is primarily the result of the exceptional behavior of users. Finally, a case study is performed using two real-world datasets. The experimental results show that our proposal outperforms previous proposals and has an advantage in dealing with natural noise.
Content: Cf. https://onlinelibrary.wiley.com/doi/10.1002/asi.24036.
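The abstract above mentions using the entropy of a subset of the ratings dataset, obtained by repeated resampling, as an indicator of data-driven uncertainty. The abstract does not give the exact formula, so the following is only a minimal sketch that assumes Shannon entropy over the distribution of rating values in a random subsample. The function names, the `subset_size` and `repetitions` parameters, and the sampling scheme are illustrative assumptions, not the authors' FCMD implementation.

```python
import math
import random
from collections import Counter


def rating_entropy(ratings):
    """Shannon entropy (in bits) of the distribution of rating values.

    Assumption: uncertainty is measured on the rating values alone;
    the article may define its entropy differently.
    """
    counts = Counter(ratings)
    total = len(ratings)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def resampled_entropies(ratings, subset_size, repetitions, seed=0):
    """Entropy of `repetitions` random subsets drawn without replacement.

    Hypothetical helper: yields one entropy value per resampling round,
    i.e. an "entropy vector" in the loose sense used by the abstract.
    """
    rng = random.Random(seed)
    return [
        rating_entropy(rng.sample(ratings, subset_size))
        for _ in range(repetitions)
    ]


if __name__ == "__main__":
    # Made-up ratings on a 1-5 scale, purely for illustration.
    toy_ratings = [5, 4, 4, 3, 5, 1, 2, 4, 5, 3, 3, 4]
    print(resampled_entropies(toy_ratings, subset_size=6, repetitions=3))
```

A uniform spread of rating values gives the highest entropy and a single repeated value gives zero, which is why entropy is a plausible proxy for how uncertain a subset of ratings is.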
2. Jiang, Y. ; Bai, W. ; Zhang, X. ; Hu, J.: Wikipedia-based information content and semantic similarity computation.
In: Information processing and management. 53(2017) no.1, pp. 248-265.
Abstract: The Information Content (IC) of a concept is a fundamental dimension in computational linguistics. It enables a better understanding of a concept's semantics. In the past, several approaches to computing the IC of a concept have been proposed. However, existing methods have limitations, such as relying on corpus availability, manual tagging, or predefined ontologies, and fitting only non-dynamic domains. Wikipedia provides a very large domain-independent encyclopedic repository and semantic network for computing the IC of concepts with more coverage than usual ontologies. In this paper, we propose some novel methods for computing the IC of a concept that address the shortcomings of existing approaches. The presented methods focus on the IC computation of a concept (i.e., Wikipedia category) drawn from the Wikipedia category structure. We propose several new IC-based measures to compute the semantic similarity between concepts. The evaluation, based on several widely used benchmarks and a benchmark we developed ourselves, supports these intuitions with respect to human judgments. Overall, some methods proposed in this paper correlate well with human judgments and constitute effective ways of determining IC values for concepts and semantic similarity between concepts.
Content: Cf. http://www.sciencedirect.com/science/article/pii/S0306457316303934 [http://dx.doi.org/10.1016/j.ipm.2016.09.001].
Subject area: Semantic context in indexing and retrieval
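The abstract above describes computing the IC of a concept from a category structure and deriving semantic similarity from it, but does not state the formulas. As a rough illustration only, the sketch below uses two well-known measures that are swapped in here and are not the article's new methods: an intrinsic IC based on descendant counts, and Lin similarity. The toy hierarchy, its category names, and the single-parent simplification are all assumptions; the real Wikipedia category structure is a graph in which categories can have several parents.

```python
import math

# Hypothetical toy hierarchy (category -> parent); not real Wikipedia data.
PARENT = {
    "Science": None,
    "Biology": "Science",
    "Physics": "Science",
    "Genetics": "Biology",
    "Ecology": "Biology",
}


def ancestors(category):
    """Return the category followed by its ancestors, nearest first."""
    chain = []
    while category is not None:
        chain.append(category)
        category = PARENT[category]
    return chain


def descendant_count(category):
    """Number of categories strictly below `category` in the hierarchy."""
    return sum(1 for c in PARENT if c != category and category in ancestors(c))


def intrinsic_ic(category):
    """Intrinsic IC: 1 - log(descendants + 1) / log(total categories).

    Leaves get IC 1.0; the root, which subsumes everything, gets 0.0.
    """
    total = len(PARENT)
    return 1.0 - math.log(descendant_count(category) + 1) / math.log(total)


def lin_similarity(a, b):
    """Lin similarity: 2 * IC(lowest common subsumer) / (IC(a) + IC(b))."""
    ancestors_b = set(ancestors(b))
    lcs = next(c for c in ancestors(a) if c in ancestors_b)
    denominator = intrinsic_ic(a) + intrinsic_ic(b)
    return 2.0 * intrinsic_ic(lcs) / denominator if denominator else 0.0


if __name__ == "__main__":
    print(lin_similarity("Genetics", "Ecology"))
    print(lin_similarity("Genetics", "Physics"))
```

In this toy setup, two categories that only share the root score zero, while siblings under a more specific parent score higher, which is the intuition IC-based similarity measures rely on.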
3. Tang, X.-B. ; Wei Wei, G,-C.L. ; Zhu, J.: An inference model of medical insurance fraud detection : based on ontology and SWRL.
In: Knowledge organization. 44(2017) no.2, pp. 84-96.
Abstract: Medical insurance fraud is common in many countries' medical insurance systems and represents a serious threat to the insurance funds and the benefits of patients. In this paper, we present an inference model of medical insurance fraud detection, based on a medical detection domain ontology that incorporates the knowledge base provided by the Medical Terminology, NKIMed, and Chinese Library Classification systems. Through analyzing the behaviors of irregular and fraudulent medical services, we defined the scope of the medical domain ontology relevant to the task and built the ontology about medical sciences and medical service behaviors. The ontology then utilizes Semantic Web Rule Language (SWRL) and Java Expert System Shell (JESS) to detect medical irregularities and mine implicit knowledge. The system can be used to improve the management of medical insurance risks.
4. Peng, T.-Q. ; Zhu, J.J.H.: Where you publish matters most : a multilevel analysis of factors affecting citations of internet studies.
In: Journal of the American Society for Information Science and Technology. 63(2012) no.9, pp. 1789-1803.
Abstract: This study explores the factors influencing citations to Internet studies by assessing the relative explanatory power of three perspectives: normative theory, the social constructivist approach, and a natural growth mechanism. Using data on 7,700+ articles of Internet studies published in 100+ Social Sciences Citation Index (SSCI)-listed journals in 2000-2009, the study adopted a multilevel model to disentangle the impact of article- and journal-level factors on citations. This research strategy resulted in a number of both expected and surprising findings. The primary determinants for citations are found to be journal-level factors, accounting for 14% of the variances in citations of Internet studies. The impact of some, if not all, article-level factors on citations is moderated by journal-level factors. Internet studies, like studies in other areas (e.g., management, demography, and ecology), are cited more for rhetorical purposes, as suggested by the social constructivist approach, rather than as a form of reward, as argued by normative theory. The impact of time on citations varies across journals, which creates a growing "citation gap" for Internet studies published in journals with different characteristics.
5. Zhu, J. ; Song, D. ; Rüger, S.: Integrating multiple windows and document features for expert finding.
In: Journal of the American Society for Information Science and Technology. 60(2009) no.4, pp. 694-715.
Abstract: Expert finding is a key task in enterprise search and has recently attracted lots of attention from both research and industry communities. Given a search topic, a prominent existing approach is to apply some information retrieval (IR) system to retrieve top ranking documents, which will then be used to derive associations between experts and the search topic based on cooccurrences. However, we argue that expert finding is more sensitive to multiple levels of associations and document features that current expert finding systems insufficiently address, including (a) multiple levels of associations between experts and search topics, (b) document internal structure, and (c) document authority. We propose a novel approach that integrates the above-mentioned three aspects as well as a query expansion technique in a two-stage model for expert finding. A systematic evaluation is conducted on TREC collections to test the performance of our approach as well as the effects of multiple windows, document features, and query expansion. These experimental results show that query expansion can dramatically improve expert finding performance with statistical significance. For three well-known IR models with or without query expansion, document internal structures help improve a single window-based approach but without statistical significance, while our novel multiple window-based approach can significantly improve the performance of a single window-based approach both with and without document internal structures.
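The abstract above refers to deriving expert-topic associations from co-occurrences and to combining multiple text windows. The article's actual two-stage model is not described in enough detail here to reproduce, so the sketch below only illustrates the general idea under stated assumptions: experts are single tokens, a co-occurrence is a query term within a fixed number of positions of an expert mention, and window sizes are combined with hand-picked weights. All names, window sizes, and weights are hypothetical.

```python
def window_cooccurrences(tokens, expert, query_terms, window):
    """Count query-term tokens within `window` positions of each expert mention."""
    query_terms = set(query_terms)
    count = 0
    for i, token in enumerate(tokens):
        if token != expert:
            continue
        # Clamp the window to the document boundaries.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        count += sum(1 for t in tokens[start:end] if t in query_terms)
    return count


def multi_window_score(tokens, expert, query_terms, weighted_windows):
    """Weighted sum of co-occurrence counts over several window sizes.

    `weighted_windows` is a list of (window_size, weight) pairs; giving
    small windows larger weights rewards close co-occurrences more.
    """
    return sum(
        weight * window_cooccurrences(tokens, expert, query_terms, size)
        for size, weight in weighted_windows
    )


if __name__ == "__main__":
    # Made-up document text, purely for illustration.
    text = ("alice works on semantic search while bob studies "
            "databases and later semantic parsing")
    tokens = text.split()
    query = ["semantic", "search"]
    windows = [(2, 1.0), (4, 0.5)]  # hypothetical sizes and weights
    for name in ("alice", "bob"):
        print(name, multi_window_score(tokens, name, query, windows))
```

Using several window sizes at once lets close mentions and looser, paragraph-level mentions both contribute, instead of forcing a single cutoff.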
6. Hu, J.: The impact of productivity and quality of CJK cataloging : a brief comparison between CJK 2nd edition and 3rd edition.
In: Cataloging and classification quarterly. 29(2000) no.4, pp. 87-90.
Abstract: This report compares the features of the 2nd and 3rd editions of OCLC's CJK cataloging as implemented at the Chicago Public Library. The 3rd edition is faster for cataloging than the 2nd edition. The 2nd edition, however, has other benefits, including stability. Perspectives such as the quality of CJK cataloging in the 2nd and 3rd editions are also discussed.