This database contains more than 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 28 April 2022)
1. Ru, C. ; Tang, J. ; Li, S. ; Xie, S. ; Wang, T.: Using semantic similarity to reduce wrong labels in distant supervision for relation extraction.
In: Information processing and management. 54(2018) no.4, S.593-608.
Abstract: Distant supervision (DS) has the advantage of automatically generating large amounts of labelled training data and has been widely used for relation extraction. However, the automatically labelled data in distant supervision usually contain many wrong labels (Riedel, Yao, & McCallum, 2010). This paper presents a novel method to reduce the wrong labels. The proposed method uses the semantic Jaccard with word embedding to measure the semantic similarity between the relation phrase in the knowledge base and the dependency phrases between two entities in a sentence, in order to filter out the wrong labels. In the process of reducing wrong labels, the semantic Jaccard algorithm selects a core dependency phrase to represent the candidate relation in a sentence; this phrase captures features for relation classification and avoids the negative impact of irrelevant term sequences from which previous neural network models of relation extraction often suffer. In the relation classification stage, the core dependency phrases are also used as the input of a convolutional neural network (CNN). The experimental results show that, compared with the methods using the original DS data, the methods using the filtered DS data performed much better in relation extraction. This indicates that the semantic similarity based method is effective in reducing wrong labels. The CNN model using the core dependency phrases as input achieves the best relation extraction performance of all, which indicates that using the core dependency phrases as CNN input is enough to capture the features for relation classification and can avoid the negative impact of irrelevant terms.
Content: See https://doi.org/10.1016/j.ipm.2018.04.002.
Subject area: Semantic context in indexing and retrieval ; Automatic classification
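The first entry describes a "semantic Jaccard" measure over word embeddings without giving its formula. A minimal Python sketch, assuming one plausible soft-Jaccard variant: two tokens count as shared when the cosine similarity of their embeddings reaches a threshold, and tokens are matched greedily one-to-one so the score stays in [0, 1]. The function names, the 0.8 threshold and the three-dimensional `TOY_EMBEDDINGS` are hypothetical and are not the paper's method or data.

```python
import math

# Hypothetical 3-dimensional word vectors, for illustration only.
TOY_EMBEDDINGS = {
    "born": [1.0, 0.0, 0.0],
    "birth": [0.95, 0.05, 0.0],
    "place": [0.0, 1.0, 0.0],
    "in": [0.0, 0.0, 1.0],
}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def semantic_jaccard(tokens_a, tokens_b, embeddings, threshold=0.8):
    """Soft Jaccard: matched / (|A| + |B| - matched), using one-to-one embedding matches."""
    a, b = sorted(set(tokens_a)), sorted(set(tokens_b))
    if not a or not b:
        return 0.0
    # Candidate token pairs whose cosine similarity reaches the threshold.
    pairs = [(cosine(embeddings[x], embeddings[y]), x, y) for x in a for y in b]
    pairs = [p for p in pairs if p[0] >= threshold]
    # Greedy matching: strongest pairs first, each token used at most once.
    pairs.sort(key=lambda p: (-p[0], p[1], p[2]))
    used_a, used_b, matched = set(), set(), 0
    for _, x, y in pairs:
        if x not in used_a and y not in used_b:
            used_a.add(x)
            used_b.add(y)
            matched += 1
    # Matched pairs act as the intersection; the union shrinks accordingly.
    return matched / (len(a) + len(b) - matched)
```

With these toy vectors, `semantic_jaccard(["place", "birth"], ["born", "in"], TOY_EMBEDDINGS)` returns 1/3, whereas plain set-based Jaccard on the same tokens gives 0. That gap is what lets an embedding-based measure relate a knowledge-base relation phrase to a differently worded dependency phrase.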
2. Li, D. ; Luo, Z. ; Ding, Y. ; Tang, J. ; Sun, G.G.-Z. ; Dai, X. ; Du, J. ; Zhang, J. ; Kong, S.: User-level microblogging recommendation incorporating social influence.
In: Journal of the Association for Information Science and Technology. 68(2017) no.3, S.553-568.
Abstract: With the information overload of user-generated content in microblogging, users find it extremely challenging to browse and find valuable information on their first attempt. In this paper we propose a microblogging recommendation algorithm, TSI-MR (Topic-Level Social Influence-based Microblogging Recommendation), which can significantly improve users' microblogging experience. The main innovation of this algorithm is that it considers social influences and their indirect structural relationships, which are largely based on social status theory, at the topic level. The primary advantage of this approach is that it can build an accurate description of latent relationships between two users with weak connections, which can improve the performance of the model; furthermore, it can alleviate the sparsity problem of the training data to a certain extent. The model is mainly realized on the basis of a factor graph. We also applied a distributed strategy to further improve the efficiency of the model. Finally, we use data from Tencent Weibo, one of the most popular microblogging services in China, to evaluate our methods. The results show that incorporating social influence can considerably improve microblogging recommendation performance and that the approach outperforms the baseline methods.
Content: See http://onlinelibrary.wiley.com/doi/10.1002/asi.23681/full.
3. Li, D. ; Tang, J. ; Ding, Y. ; Shuai, X. ; Chambers, T. ; Sun, G. ; Luo, Z. ; Zhang, J.: Topic-level opinion influence model (TOIM) : an investigation using tencent microblogging.
In: Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2657-2673.
Abstract: Text mining has been widely used on multiple types of user-generated data to infer user opinion, but its application to microblogging is difficult because text messages are short and noisy, providing limited information about user opinion. Given that microblogging users communicate with each other to form a social network, we hypothesize that a user's opinion is influenced by his/her neighbors in the network. In this paper, we infer user opinion on a topic by combining two factors: the user's historical opinion about relevant topics and opinion influence from his/her neighbors. We thus build a topic-level opinion influence model (TOIM) by integrating both the topic factor and the opinion influence factor into a unified probabilistic model. We evaluate our model on one of the largest microblogging sites in China, Tencent Weibo, and the experiments show that TOIM outperforms baseline methods in opinion inference accuracy. Moreover, incorporating indirect influence further improves inference recall and F1-measure. Finally, we demonstrate some useful applications of TOIM in analyzing users' behaviors in Tencent Weibo.
Content: See http://onlinelibrary.wiley.com/doi/10.1002/asi.23350/abstract.
Subject area: Data Mining
4. Lin, N. ; Li, D. ; Ding, Y. ; He, B. ; Qin, Z. ; Tang, J. ; Li, J. ; Dong, T.: The dynamic features of Delicious, Flickr, and YouTube.
In: Journal of the American Society for Information Science and Technology. 63(2012) no.1, S.139-162.
Abstract: This article investigates the dynamic features of social tagging vocabularies in Delicious, Flickr, and YouTube from 2003 to 2008. Three algorithms are designed to study macro- and micro-level tag growth as well as the dynamics of taggers' activities. Moreover, we propose a Tagger Tag Resource Latent Dirichlet Allocation (TTR-LDA) model to explore the evolution of topics emerging from those social vocabularies. Our results show that (a) at the macro level, tag growth in all three tagging systems follows a power-law distribution with exponents lower than 1; at the micro level, the tag growth of popular resources in all three tagging systems follows a similar power-law distribution; (b) the exponents of tag growth vary across the evolving stages of resources; (c) the growth in the number of taggers associated with different popular resources converges over time; (d) the activity level of taggers correlates positively with the macro-level tag growth of the different tagging systems; and (e) some topics evolve into several subtopics over time, while others go through relatively stable stages in which their contents change little and certain groups of taggers maintain their interest in them.
Subject area: Social tagging
Object: Delicious ; Flickr ; YouTube
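The fourth entry reports that tag growth follows a power law with an exponent below 1. One common way to estimate such an exponent, sketched here as an assumption rather than as the authors' procedure, is ordinary least squares on log-transformed data: if N(t) = c * t^alpha, then log N = log c + alpha * log t. The function name and the series below are hypothetical; the data are synthetic, not taken from the study.

```python
import math

def fit_power_law_exponent(xs, ys):
    """Fit y = c * x**alpha by ordinary least squares on (log x, log y); return (alpha, c)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("fitting in log space needs positive values")
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    sxx = sum((lx - mean_x) ** 2 for lx in log_x)
    if sxx == 0.0:
        raise ValueError("x values must not all be equal")
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in zip(log_x, log_y))
    alpha = sxy / sxx                      # slope in log-log space is the exponent
    c = math.exp(mean_y - alpha * mean_x)  # intercept, transformed back from log space
    return alpha, c

# Synthetic cumulative distinct-tag counts following 3 * t**0.7 (not data from the study).
steps = list(range(1, 101))
distinct_tags = [3.0 * t ** 0.7 for t in steps]
alpha, c = fit_power_law_exponent(steps, distinct_tags)
print(f"alpha = {alpha:.3f}, c = {c:.3f}")  # prints: alpha = 0.700, c = 3.000
```

An exponent below 1 means the vocabulary grows sublinearly: new tags keep appearing, but ever more slowly. Note that for fitting power-law *distributions* (as opposed to growth curves), log-log regression on histograms is widely criticized, and maximum-likelihood estimators are usually recommended.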
5. Huang, H. ; Andrews, J. ; Tang, J.: Citation characterization and impact normalization in bioinformatics journals.
In: Journal of the American Society for Information Science and Technology. 63(2012) no.3, S.490-497.
Abstract: Bioinformatics journals publish research findings arising from intellectual synergies among subfields such as biology, mathematics, and computer science. The objective of this study is to characterize the citation patterns in bioinformatics journals and their corresponding knowledge subfields. Our study analyzed bibliometric data (impact factor, cited half-life, and references per article) of bioinformatics journals and their related subfields collected from the Journal Citation Reports (JCR). The findings showed that citations in bioinformatics journals are field-dependent, with scattered patterns in article life span and citing propensity. Bioinformatics journals originally derived from biology-related subfields have shorter article life spans, cite more on average, and have higher impact factors. Journals derived from mathematics and statistics show the converse citation patterns. Journal impact factors were normalized, taking into account the effects of article life span and citing propensity. A comparison of these normalized factors with JCR journal impact factors showed rearrangements in the ranking order of a number of individual journals, but a high overall correlation with JCR impact factors.
6. Clough, P. ; Tang, J. ; Hall, M.H. ; Warner, A.: Linking archival data to location : a case study at the UK National Archives.
In: Aslib proceedings. 63(2011) nos.2/3, S.127-147.
Abstract: Purpose - The National Archives (TNA) is the UK Government's official archive. It stores and maintains records spanning more than 1,000 years in both physical and digital form. Much of the information held by TNA includes references to place, and user queries to TNA's online catalogue frequently involve searches for location. The purpose of this paper is to illustrate how TNA have extracted the geographic references in their historic data to improve access to the archives. Design/methodology/approach - To quickly enhance the existing archival data with geographic information, existing technologies from Natural Language Processing (NLP) and Geographical Information Retrieval (GIR) have been utilised and adapted to historical archives. Findings - Enhancing the archival records with geographic information has enabled TNA to quickly develop a number of case studies highlighting how geographic information can improve access to large-scale archival collections. The use of existing methods and technologies from the GIR domain, such as OpenLayers, made it possible to implement this process quickly and in a way that is easily transferable to other institutions. Practical implications - Other archives can adapt the methods and technologies described in this paper to similarly enhance access to their historic data. The data-sharing methods described can also be used to integrate knowledge held at different archival institutions. Originality/value - Place is one of the core dimensions of TNA's archival data. Many of the records held refer to places (wills, legislation, court cases), and approximately one fifth of users' searches involve place names. However, there are still a number of open questions regarding the adaptation of existing GIR methods to the history domain. This paper presents an overview of available GIR methods and the challenges in applying them to historical data.
Object: Web 2.0
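The sixth entry describes extracting place references from archival descriptions with NLP and GIR tools, without naming the exact pipeline. A minimal sketch of the simplest such step, assuming dictionary (gazetteer) lookup with a longest-match preference; the `GAZETTEER` entries, their rounded coordinates and the function name `tag_places` are illustrative and hypothetical, not TNA data or software.

```python
import re

# Hypothetical mini-gazetteer: lowercase place name -> (latitude, longitude), rounded.
GAZETTEER = {
    "london": (51.507, -0.128),
    "york": (53.960, -1.087),
    "new york": (40.713, -74.006),
    "kew": (51.478, -0.295),
}

def tag_places(text, gazetteer=GAZETTEER, max_words=3):
    """Return (matched_text, lat, lon) for gazetteer names in text, preferring longest matches."""
    words = re.findall(r"[A-Za-z]+", text)  # simple tokenizer: ASCII letters only
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase starting at word i first.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            coords = gazetteer.get(candidate.lower())
            if coords is not None:
                found.append((candidate, *coords))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no match starting here; move on one word
    return found

print(tag_places("Will of a merchant of York, proved in London"))
print(tag_places("Letter sent to New York"))
```

Preferring the longest match keeps "New York" from being tagged as "York", but historical records also need handling of spelling variants and of distinct places sharing a name, which simple lookup does not address.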
7. Li, D. ; Ding, Y. ; Sugimoto, C. ; He, B. ; Tang, J. ; Yan, E. ; Lin, N. ; Qin, Z. ; Dong, T.: Modeling topic and community structure in social tagging : the TTR-LDA-Community model.
In: Journal of the American Society for Information Science and Technology. 62(2011) no.9, S.1849-1866.
Abstract: The presence of social networks in complex systems has made networks and community structure a focal point of study in many domains. Previous studies have focused on the structural emergence and growth of communities and on the topics displayed within the network. However, few scholars have closely examined the relationship between the thematic and structural properties of networks. Therefore, this article proposes the Tagger Tag Resource-Latent Dirichlet Allocation-Community model (TTR-LDA-Community model), which combines the Latent Dirichlet Allocation (LDA) model with the Girvan-Newman community detection algorithm through an inference mechanism. Using social tagging data from Delicious, this article demonstrates the clustering of active taggers into communities, the topic distributions within communities, and the ranking of taggers, tags, and resources within these communities. The data analysis evaluates patterns in community structure and topical affiliations diachronically. The article evaluates the effectiveness of community detection and the inference mechanism embedded in the model and finds that the TTR-LDA-Community model outperforms other traditional models in tag prediction. This has implications for scholars in domains interested in community detection, profiling, and recommender systems.
Subject area: Social tagging
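The seventh entry combines LDA with the Girvan-Newman community detection algorithm. The sketch below covers only the Girvan-Newman part (the LDA coupling and the model's inference mechanism are not reproduced): it repeatedly deletes the edge with the highest edge betweenness, computed Brandes-style, until the graph splits into more components. The adjacency-dict representation, the single-edge removal with a deterministic tie-break, the requirement that node labels be sortable, and the toy graph are assumptions for illustration.

```python
from collections import deque

def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph {node: set(neighbors)}."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: shortest-path counts (sigma), distances, predecessor lists.
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies back from the farthest nodes onto edges.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in pred[w]:
                contrib = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += contrib
                delta[v] += contrib
    # Every unordered node pair was counted once from each endpoint.
    return {e: val / 2.0 for e, val in eb.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def girvan_newman_split(adj):
    """Remove highest-betweenness edges until the component count rises; return components."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    n_before = len(components(g))
    while True:
        eb = edge_betweenness(g)
        if not eb:
            return components(g)
        top = max(eb, key=lambda e: (eb[e], sorted(e)))  # deterministic tie-break
        u, v = tuple(top)
        g[u].discard(v)
        g[v].discard(u)
        comps = components(g)
        if len(comps) > n_before:
            return comps

# Toy graph: two triangles joined by the bridge c-d.
TOY_EDGES = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
TOY_GRAPH = {}
for x, y in TOY_EDGES:
    TOY_GRAPH.setdefault(x, set()).add(y)
    TOY_GRAPH.setdefault(y, set()).add(x)

print(girvan_newman_split(TOY_GRAPH))
```

The bridge c-d lies on all nine shortest paths between the two triangles, so it is removed first and the split yields {a, b, c} and {d, e, f}. Recomputing betweenness after every removal is what makes Girvan-Newman expensive on large tagging networks.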
8. Tang, J. ; Liang, B.-Y. ; Li, J.-Z.: Toward detecting mapping strategies for ontology interoperability.
Abstract: Ontology mapping is one of the core tasks for ontology interoperability. It aims to find semantic relationships between entities (i.e. concepts, attributes, and relations) of two ontologies. It benefits many applications, such as the integration of ontology-based web data sources and the interoperability of agents or web services. To reduce users' effort as much as possible, (semi-)automatic ontology mapping is becoming increasingly important for putting ontology interoperability into practice. In the existing literature, approaches that combine several different similarity/mapping strategies (namely, multi-strategy based mapping) have attracted considerable interest. However, experiments show that multi-strategy based mapping does not always outperform its single-strategy counterpart. In this paper, we mainly aim to deal with two problems: (1) for a new, unseen mapping task, should we select a multi-strategy based algorithm or just a single-strategy based algorithm? (2) if the task is suitable for multi-strategy mapping, how should the strategies be selected for the final combined scenario? We propose an approach to detecting multiple strategies for ontology mapping. The results obtained so far show that multi-strategy detection significantly improves precision and recall.
Content: Contribution to the Workshop on The Semantic Computing Initiative (SeC 2005): From Semantic Web to Semantic World, held in conjunction with the 14th Int'l Conf. on World Wide Web (WWW2005); see http://www.instsec.org/2005ws/.
Subject area: Semantic interoperability
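The eighth entry concerns combining several similarity strategies for ontology mapping. As a generic illustration only (not the authors' strategy-detection approach), the sketch below combines two simple label-matching strategies, a character-level ratio from Python's `difflib.SequenceMatcher` and word-token overlap, by a weighted average. The function names, the equal weights, the 0.6 threshold and the example labels are hypothetical.

```python
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Character-level similarity of two entity labels (difflib ratio, in [0, 1])."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def token_similarity(a, b):
    """Jaccard overlap of lowercase word tokens; underscores are treated as spaces."""
    ta = set(a.lower().replace("_", " ").split())
    tb = set(b.lower().replace("_", " ").split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def combined_similarity(a, b, strategies, weights):
    """Weighted average of several strategy scores (a simple multi-strategy combination)."""
    return sum(w * s(a, b) for s, w in zip(strategies, weights)) / sum(weights)

def map_entities(source, target, strategies, weights, threshold=0.6):
    """For each source label, keep the best-scoring target label if it clears the threshold."""
    mapping = {}
    for s in source:
        best = max(target, key=lambda t: combined_similarity(s, t, strategies, weights))
        if combined_similarity(s, best, strategies, weights) >= threshold:
            mapping[s] = best
    return mapping

source_labels = ["Author", "Publication_Date", "Editor"]
target_labels = ["author", "publication date", "Journal"]
print(map_entities(source_labels, target_labels,
                   [name_similarity, token_similarity], [0.5, 0.5]))
```

Here "Editor" stays unmapped because it shares no tokens with any target label and its best character-level score is too low. Whether a combined score beats either strategy alone depends on the task, which is the question the paper raises; in practice the weights and threshold would be tuned or selected per mapping task.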