This database contains over 40,000 documents on topics in the fields of descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 28 April 2022)
1. Tang, X.-B. ; Liu, G.-C. ; Yang, J. ; Wei, W.: Knowledge-based financial statement fraud detection system : based on an ontology and a decision tree.
In: Knowledge organization. 45(2018) no.3, pp.205-219.
Abstract: Financial statement fraud has seriously affected investors' confidence in the stock market and economic stability. Several serious financial statement fraud events have caused huge economic losses. Intelligent financial statement fraud detection has thus been the topic of recent studies. In this paper, we developed a knowledge-based financial statement fraud detection system based on a financial statement detection ontology and on detection rules extracted with the C4.5 decision tree algorithm. By discovering the patterns of financial statement fraud activity, we defined the scope of our financial statement domain ontology. By utilizing SWRL rules and the Pellet inference engine in the domain ontology, we detected financial statement fraud activities and discovered implicit knowledge. The system can be used to support investors' decision-making and to provide early warning to regulators.
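The detection rules in this system come from a C4.5 decision tree. C4.5 chooses the attribute to split on by its gain ratio (information gain divided by split information). The sketch below computes that criterion for categorical attributes only; it is a generic illustration, not the authors' code, and the attribute names (`receivables_growth`, `auditor_change`) and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """C4.5-style gain ratio of splitting on the categorical attribute `attr`."""
    n = len(labels)
    # Partition the class labels by the attribute's value.
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr], []).append(y)
    # Information gain: parent entropy minus the weighted entropy of the children.
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in parts.values())
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical, already discretized toy records.
rows = [
    {"receivables_growth": "high", "auditor_change": "yes"},
    {"receivables_growth": "high", "auditor_change": "no"},
    {"receivables_growth": "low", "auditor_change": "yes"},
    {"receivables_growth": "low", "auditor_change": "no"},
]
labels = ["fraud", "fraud", "ok", "ok"]
```

C4.5 would split on `max(rows[0], key=lambda a: gain_ratio(rows, labels, a))` and recurse; each root-to-leaf path of the finished tree is a candidate rule that could then be encoded, for example, as an SWRL rule.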
2. Li, X. ; Zhang, A. ; Li, C. ; Ouyang, J. ; Cai, Y.: Exploring coherent topics by topic modeling with term weighting.
In: Information processing and management. 54(2018) no.6, pp.1345-1358.
Abstract: Topic models often produce unexplainable topics that are filled with noisy words. The reason is that words in topic modeling have equal weights. High-frequency words dominate the top topic word lists, but most of them are meaningless words, e.g., domain-specific stopwords. To address this issue, in this paper we investigate how to weight words and then develop a straightforward but effective term weighting scheme, namely entropy weighting (EW). The proposed EW scheme is based on conditional entropy measured by word co-occurrences. Compared with existing term weighting schemes, the highlight of EW is that it can automatically reward informative words. For more robust word weights, we further propose a combined form of EW (CEW) with two existing weighting schemes. Basically, CEW assigns meaningless words lower weights and informative words higher weights, leading to more coherent topics during topic modeling inference. We apply CEW to the Dirichlet multinomial mixture and latent Dirichlet allocation, and evaluate it by topic quality, document clustering and classification tasks on eight real-world data sets. Experimental results show that weighting words can effectively improve topic modeling performance on both short texts and normal long texts. More importantly, the proposed CEW significantly outperforms the existing term weighting schemes, since it further considers which words are informative.
Content: See https://doi.org/10.1016/j.ipm.2018.05.009.
Subject area: Automatic indexing
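The abstract states only that entropy weighting (EW) is based on the conditional entropy of word co-occurrences. One plausible instantiation, assumed here and not taken from the paper, is: a word that co-occurs evenly with many different words (like a stopword) has high entropy and gets a low weight, while a word with a concentrated co-occurrence profile gets a high weight. The normalization by the maximum possible entropy and the function name are hypothetical; the CEW combination is not sketched.

```python
from collections import Counter, defaultdict
from math import log2

def cooccurrence_entropy_weights(docs):
    """Hypothetical entropy-based term weights in [0, 1] from document-level co-occurrence."""
    # Count, for each word w, how often each other word v shares a document with it.
    cooc = defaultdict(Counter)
    for doc in docs:
        vocab = set(doc)
        for w in vocab:
            for v in vocab:
                if v != w:
                    cooc[w][v] += 1
    all_words = {w for doc in docs for w in doc}
    # Maximum entropy of a distribution over the other |V| - 1 words (assumed normalizer).
    h_max = log2(len(all_words) - 1) if len(all_words) > 2 else 1.0
    weights = {}
    for w in all_words:
        counts = cooc[w]
        total = sum(counts.values())
        # Conditional entropy H(V | w) of the co-occurring word given w.
        h = -sum(c / total * log2(c / total) for c in counts.values()) if total else 0.0
        # High entropy (spread-out co-occurrence) -> low weight.
        weights[w] = 1.0 - h / h_max
    return weights
```

With this toy scheme, a word such as "the" that co-occurs uniformly with every other word receives weight 0, which matches the intent described in the abstract of down-weighting domain-specific stopwords.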
3. Choi, S. ; Yang, J.S.W. ; Park, H.W.: The triple helix and international collaboration in science.
In: Journal of the Association for Information Science and Technology. 66(2015) no.1, pp.201-212.
Abstract: Previous studies of international scientific collaboration have rarely gone beyond revealing the structural relationships between countries. Considering how scientific collaboration is actually initiated, this study focuses on the organization and sector levels of international coauthorship networks, going beyond a country-level description. Based on a network analysis of coauthorship networks between members of the Organisation for Economic Cooperation and Development (OECD), this study attempts to gain a better understanding of international scientific collaboration by exploring the structure of the coauthorship network in terms of university-industry-government (UIG) relationships, the mode of knowledge production, and the underlying dynamic of collaboration in terms of geographic, linguistic, and economic factors. The results suggest that the United States showed overwhelming dominance in all bilateral UIG combinations with the exception of the government-government (GG) network. Scientific collaboration within the industry sector was concentrated in a few players, whereas that between the university and industry sectors was relatively less concentrated. Despite the growing participation from other sectors, universities were still the main locus of knowledge production, with the exception of 5 countries. The university sector in English-speaking wealthy countries and the government sector of non-English-speaking, less-wealthy countries played a key role in international collaborations between OECD countries. The findings did not provide evidence supporting the institutional proximity argument.
Content: See http://onlinelibrary.wiley.com/doi/10.1002/asi.23165/abstract.
4. Wang, J. ; Clements, M. ; Yang, J. ; Vries, A.P. de ; Reinders, M.J.T.: Personalization of tagging systems.
In: Information processing and management. 46(2010) no.1, pp.58-70.
Abstract: Social media systems have encouraged end user participation in the Internet, for the purpose of storing and distributing Internet content, sharing opinions and maintaining relationships. Collaborative tagging allows users to annotate the resulting user-generated content, and enables effective retrieval of otherwise uncategorised data. However, compared to professional web content production, collaborative tagging systems face the challenge that end-users assign tags in an uncontrolled manner, resulting in unsystematic and inconsistent metadata. This paper introduces a framework for the personalization of social media systems. We pinpoint three tasks that would benefit from personalization: collaborative tagging, collaborative browsing and collaborative search. We propose a ranking model for each task that integrates the individual user's tagging history in the recommendation of tags and content, to align its suggestions to the individual user preferences. We demonstrate on two real data sets that for all three tasks, the personalized ranking should take into account both the user's own preference and the opinion of others.
Subject area: Social tagging
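The abstract concludes that personalized ranking should take into account both the user's own preference and the opinion of others. A minimal way to illustrate that idea is a linear interpolation between the user's own tag distribution and the community's tag distribution. This is an assumed scoring scheme for illustration only, not the ranking model proposed in the paper; the function name and the mixing weight `lam` are hypothetical.

```python
from collections import Counter

def personalized_tag_ranking(user_tags, all_tags, lam=0.5, k=3):
    """Rank tags by lam * P_user(tag) + (1 - lam) * P_community(tag) (hypothetical scheme)."""
    user = Counter(user_tags)      # the individual user's tagging history
    crowd = Counter(all_tags)      # tags assigned by all users
    n_user = sum(user.values()) or 1
    n_crowd = sum(crowd.values()) or 1
    candidates = set(user) | set(crowd)
    scores = {t: lam * user[t] / n_user + (1 - lam) * crowd[t] / n_crowd
              for t in candidates}
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(candidates, key=lambda t: (-scores[t], t))[:k]
```

Setting `lam=1.0` reproduces a purely personal ranking and `lam=0.0` a purely popularity-based one; intermediate values blend the two, which is the trade-off the abstract argues for.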
5. Bold, N. ; Kim, W.-J. ; Yang, J.-D.: Converting object-based thesauri into XML Topic Maps.
In: 2010 2nd International Conference on Education Technology and Computer (ICETC). Piscataway, NJ : IEEE, 2010. Vol.2, pp.102-106.
Abstract: Constructing an ontology is, in general, a considerably time-consuming process. Since a vast number of thesauri are currently available, exploiting them may be a feasible way to construct an ontology in a short period of time. This paper designs and implements an XTM (XML Topic Maps) code converter that generates an XTM-coded ontology from an object-based thesaurus. An object-based thesaurus is an extended thesaurus that enriches conventional thesauri with user-defined associations and a notion of instances and the occurrences associated with them. We adopt XTM because it is a verified and practical methodology for semantically reorganizing the conceptual structure of extant web applications with minimal effort. Moreover, since XTM is conceptually similar to our object-based thesauri, the recommendation and inference mechanisms already developed in our system can easily be applied to the generated XTM ontology. To show that the XTM ontology is correct, we also verify it with Ontopia Omnigator and Vizigator, components of the Ontopia Knowledge Suite (OKS) tool.
Object: Topic maps
6. Lee, Y.-S. ; Wu, Y.-C. ; Yang, J.-C.: BVideoQA : Online English/Chinese bilingual video question answering.
In: Journal of the American Society for Information Science and Technology. 60(2009) no.3, pp.509-525.
Abstract: This article presents a bilingual video question answering (QA) system, BVideoQA, which allows users to retrieve Chinese videos through English or Chinese natural language questions. Our method first extracts an optimal one-to-one string pattern matching according to the proposed dense and long N-gram match. On the basis of the matched string patterns, it computes a passage score using our term-weighting scheme. The main contributions of this approach to the multimedia information retrieval literature include: (a) development of a truly bilingual video QA system, (b) presentation of a robust bilingual passage retrieval algorithm that handles no-word-boundary languages such as Chinese and Japanese, (c) development of a large-scale bilingual video QA corpus for system evaluation, and (d) comparisons of seven top-performing retrieval methods under fair conditions. The experimental studies indicate that our method is superior to other existing approaches in terms of precision and mean reciprocal rank. When ported to English, the method also obtains encouraging empirical results. Our method is especially valuable for Asian languages, since developing a word tokenizer is optional.
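The BVideoQA evaluation reports precision together with a reciprocal-rank measure. For reference, the standard mean reciprocal rank (MRR) averages, over all questions, the reciprocal of the rank at which the first correct answer appears (0 if none is retrieved). This is a generic sketch, not the authors' evaluation code; the function name is illustrative.

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    """MRR over questions: mean of 1/rank of the first relevant item (0 if none found)."""
    total = 0.0
    for ranking, gold in zip(ranked_lists, relevant):
        for pos, item in enumerate(ranking, start=1):
            if item in gold:
                total += 1.0 / pos  # only the first relevant hit counts
                break
    return total / len(ranked_lists) if ranked_lists else 0.0
```

For three questions whose first correct passage appears at rank 1, at rank 2, and not at all, MRR is (1 + 1/2 + 0) / 3 = 0.5.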
7. Wan, X. ; Yang, J. ; Xiao, J.: Towards a unified approach to document similarity search using manifold-ranking of blocks.
In: Information processing and management. 44(2008) no.3, pp.1032-1048.
Abstract: Document similarity search (i.e. query by example) aims to retrieve a ranked list of documents similar to a query document in a text corpus or on the Web. Most existing approaches to similarity search first compute the pairwise similarity score between each document and the query using a retrieval function or similarity measure (e.g. cosine), and then rank the documents by these similarity scores. In this paper, we propose a novel retrieval approach based on manifold-ranking of document blocks (i.e. blocks of coherent text about a subtopic) to re-rank a small set of documents initially retrieved by an existing retrieval function. The proposed approach makes full use of the intrinsic global manifold structure of the document blocks by propagating ranking scores between the blocks on a weighted graph. First, the TextTiling algorithm and the VIPS algorithm are employed to segment text documents and web pages, respectively, into blocks. Then, each block is assigned a ranking score by the manifold-ranking algorithm. Lastly, each document obtains its final ranking score by fusing the scores of its blocks. Experimental results on the TDT data and the ODP data demonstrate that the proposed approach can significantly improve retrieval performance over baseline approaches. The document block is shown to be a better unit than the whole document in the manifold-ranking process.
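The block-level re-ranking described above can be sketched with the standard manifold-ranking iteration f ← αSf + (1 − α)y, where S = D^(-1/2) W D^(-1/2) is the symmetrically normalized block affinity matrix and y marks the query blocks. The affinity values, α, and the mean-of-blocks fusion rule are assumptions for illustration; the abstract does not specify how block scores are fused, and the function names are hypothetical.

```python
from math import sqrt

def manifold_rank(W, y, alpha=0.85, iters=200):
    """Manifold ranking by power iteration: f <- alpha * S f + (1 - alpha) * y."""
    n = len(W)
    deg = [sum(row) for row in W]
    # Symmetrically normalized affinity S = D^{-1/2} W D^{-1/2}; isolated blocks get zero rows.
    S = [[W[i][j] / sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f

def fuse_by_mean(block_scores, block_doc):
    """Hypothetical fusion rule: a document's score is the mean of its blocks' scores."""
    totals, counts = {}, {}
    for score, doc in zip(block_scores, block_doc):
        totals[doc] = totals.get(doc, 0.0) + score
        counts[doc] = counts.get(doc, 0) + 1
    return {doc: totals[doc] / counts[doc] for doc in totals}
```

Because α < 1 and the eigenvalues of S lie in [−1, 1] for a non-negative symmetric W, the iteration converges to the closed form (1 − α)(I − αS)^(-1) y; the power iteration simply avoids a matrix inversion.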
8. Wan, X. ; Yang, J. ; Xiao, J.: Incorporating cross-document relationships between sentences for single document summarizations.
In: Research and advanced technology for digital libraries : 10th European conference, proceedings / ECDL 2006, Alicante, Spain, September 17 - 22, 2006. Berlin : Springer, 2006. pp.403-414.
(Lecture notes in computer science; vol.4172)
Abstract: Graph-based ranking algorithms have recently been proposed for single-document summarization; such algorithms evaluate the importance of a sentence by recursively exploiting the relationships between sentences in the document. In this paper, we investigate using other related or relevant documents to improve the summarization of a single document with a graph-based ranking algorithm. In addition to the within-document relationships between sentences in the specified document, the proposed approach also takes into account the cross-document relationships between sentences in different documents. We evaluate the proposed approach on DUC 2002 data with the ROUGE metric, and the results demonstrate that cross-document relationships between sentences in different but related documents can significantly improve the performance of single-document summarization.
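The idea of adding cross-document links to a graph-based sentence ranker can be sketched as follows: sentences from the target document and from related documents share one similarity graph, edges are scaled differently depending on whether both endpoints come from the same document, and a PageRank-style iteration scores the nodes. Only the target document's sentences are then ranked. The cosine-over-raw-term-frequency similarity, the parameters `w_within`, `w_cross`, and `damping`, and the function names are assumptions, not the paper's exact formulation.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two token lists (raw term frequencies)."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_sentences(sentences, doc_ids, target, w_within=1.0, w_cross=1.0,
                   damping=0.85, iters=100):
    """Score all sentences on one graph; return the target document's (sentence, score) pairs, best first."""
    n = len(sentences)
    toks = [s.lower().split() for s in sentences]
    # Edge weight = similarity, scaled by a within-document or cross-document factor.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                scale = w_within if doc_ids[i] == doc_ids[j] else w_cross
                W[i][j] = scale * cosine(toks[i], toks[j])
    out = [sum(row) for row in W]
    score = [1.0] * n
    for _ in range(iters):
        # PageRank-style update over row-normalized weights; sentences with no
        # similar neighbours keep only the (1 - damping) base score.
        score = [(1 - damping) + damping * sum(score[j] * W[j][i] / out[j]
                                               for j in range(n) if out[j] > 0)
                 for i in range(n)]
    idx = [i for i in range(n) if doc_ids[i] == target]
    idx.sort(key=lambda i: -score[i])
    return [(sentences[i], score[i]) for i in idx]
```

Setting `w_cross=0.0` reduces the sketch to the within-document-only baseline, so the effect of the related documents can be compared directly.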
9. Gachot, D.A. ; Lange, E. ; Yang, J.: The SYSTRAN NLP browser : an application of machine translation technology in cross-language information retrieval.
In: Cross-language information retrieval. Ed.: G. Grefenstette. Boston, MA : Kluwer Academic Publ., 1998. pp.105-118.
(The Kluwer International series on information retrieval)