This database contains over 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 4 June 2021)
1. Wan, X. ; Liu, F.: Are all literature citations equally important? : automatic citation strength estimation and its applications.
In: Journal of the Association for Information Science and Technology. 65(2014) no.9, pp. 1929-1938.
Abstract: Literature citation analysis plays a very important role in bibliometrics and scientometrics, for example in the Science Citation Index (SCI) impact factor and the h-index. Existing citation analysis methods assume that all citations in a paper are equally important, and they simply count the number of citations. Here we argue that the citations in a paper are not equally important and that some citations are more important than others. We use a strength value to assess the importance of each citation and propose using a regression method with a few useful features to estimate the strength value of each citation automatically. Evaluation results on a manually labeled data set in the computer science field show that the estimated values correlate well with human-labeled values. We further apply the estimated citation strength values to evaluating paper influence and author influence, and the preliminary evaluation results demonstrate the usefulness of the citation strength values.
2. Wan, X. ; Liu, F.: WL-index : leveraging citation mention number to quantify an individual's scientific impact.
In: Journal of the Association for Information Science and Technology. 65(2014) no.12, pp. 2509-2517.
Abstract: A number of bibliometric indices have been developed to evaluate an individual's scientific impact, and the most popular are the h-index and its variants. However, existing bibliometric indices are computed based on the number of citations received by each article, and they do not consider the frequency with which individual citations are mentioned in an article. We use "citation mention" to denote a unique occurrence of a cited reference mentioned in the citing article, and thus some citations may have more than one mention in an article. According to our analysis of the ACL Anthology Network corpus in the natural language processing field, more than 40% of cited references have been mentioned twice or more in the corresponding citing articles. We argue that citation mention is preferable for representing the citation relationships between articles, that is, a reference article mentioned m times in the citing article will be considered to have received m citations, rather than one citation. Based on this assumption, we revise the h-index and propose a new bibliometric index, the WL-index, to evaluate an individual's scientific impact. According to our empirical analysis, the proposed WL-index more accurately discriminates between program committee chairs of reputable conferences and ordinary authors.
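Note: The counting idea in this abstract can be illustrated with a small Python sketch. The abstract does not give the exact WL-index formula, so the code below only shows one plausible reading: an ordinary h-index computed over mention-weighted citation counts, where a reference mentioned m times contributes m citations. The function names and the sample data are hypothetical.

```python
def h_index(counts):
    """Largest h such that at least h papers have a count >= h."""
    ranked = sorted(counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def mention_weighted_counts(papers):
    """Sum the mention counts over all citing articles for each paper.

    `papers` maps a paper id to a list with one entry per citing article;
    each entry is how often that article mentions the paper.
    """
    return {pid: sum(mentions) for pid, mentions in papers.items()}


# Hypothetical author with three papers (illustrative data only).
papers = {
    "paper_a": [3, 1, 2],  # cited by three articles, mentioned 3, 1 and 2 times
    "paper_b": [1, 2],     # cited by two articles
    "paper_c": [4],        # cited by one article, mentioned four times
}

# Plain h-index: every citing article counts once.
plain = h_index([len(mentions) for mentions in papers.values()])

# Mention-weighted variant (assumption): every mention counts once.
weighted = h_index(mention_weighted_counts(papers).values())

print(plain, weighted)
```

In this toy data the mention-weighted variant separates the author from one with the same number of citing articles but fewer mentions, which is the effect the abstract argues for.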
3. Guo, L. ; Wan, X.: Exploiting syntactic and semantic relationships between terms for opinion retrieval.
In: Journal of the American Society for Information Science and Technology. 63(2012) no.11, pp. 2269-2282.
Abstract: Opinion retrieval is the task of finding documents that express an opinion about a given query. A key challenge in opinion retrieval is to capture the query-related opinion score of a document. Existing methods rely mainly on the proximity information between the opinion terms and the query terms to address the key challenge. In this study, we propose to incorporate the syntactic and semantic information of terms into a probabilistic model to capture the query-related opinion score more accurately. The syntactic tree structure of a sentence is used to evaluate the modifying probability between an opinion term and a noun within the sentence with a tree kernel method. Moreover, WordNet and the probabilistic topic model are used to evaluate the semantic relatedness between any noun and the given query. The experimental results over standard TREC baselines on the benchmark BLOG06 collection demonstrate the effectiveness of our proposed method, in comparison with the proximity-based method and other baselines.
4. Wan, X. ; Yang, J. ; Xiao, J.: Towards a unified approach to document similarity search using manifold-ranking of blocks.
In: Information processing and management. 44(2008) no.3, pp. 1032-1048.
Abstract: Document similarity search (i.e. query by example) aims to retrieve a ranked list of documents similar to a query document in a text corpus or on the Web. Most existing approaches to similarity search first compute the pairwise similarity score between each document and the query using a retrieval function or similarity measure (e.g. Cosine), and then rank the documents by the similarity scores. In this paper, we propose a novel retrieval approach based on manifold-ranking of document blocks (i.e. blocks of coherent text about a subtopic) to re-rank a small set of documents initially retrieved by some existing retrieval function. The proposed approach can make full use of the intrinsic global manifold structure of the document blocks by propagating the ranking scores between the blocks on a weighted graph. First, the TextTiling algorithm and the VIPS algorithm are employed to segment text documents and web pages, respectively, into blocks. Then, each block is assigned a ranking score by the manifold-ranking algorithm. Lastly, a document gets its final ranking score by fusing the scores of its blocks. Experimental results on the TDT data and the ODP data demonstrate that the proposed approach can significantly improve retrieval performance over baseline approaches. The document block is validated as a better unit than the whole document in the manifold-ranking process.
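Note: The score propagation step described in this abstract can be sketched in Python. The abstract does not state the update rule or the fusion function, so the sketch assumes the commonly used manifold-ranking iteration f = alpha * S * f + (1 - alpha) * y with a symmetrically normalized affinity matrix S, and it fuses block scores by summation. The block graph, the weights, and all names are hypothetical.

```python
import math


def manifold_rank(W, y, alpha=0.5, iters=100):
    """Propagate ranking scores over a weighted block graph.

    W is a symmetric affinity matrix with a zero diagonal; y marks the
    query blocks with 1 and all other blocks with 0.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # Symmetric normalization: S[i][j] = W[i][j] / sqrt(deg[i] * deg[j]).
    S = [
        [W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    f = list(y)
    for _ in range(iters):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f


# Block 0 belongs to the query document; blocks 1 and 2 belong to
# document "A", block 3 to document "B" (illustrative data only).
W = [
    [0.0, 0.9, 0.0, 0.1],
    [0.9, 0.0, 0.8, 0.0],
    [0.0, 0.8, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.0],
]
y = [1.0, 0.0, 0.0, 0.0]
scores = manifold_rank(W, y)

# Fusion by summing block scores is an assumption of this sketch.
block_to_doc = {1: "A", 2: "A", 3: "B"}
doc_scores = {}
for block, doc in block_to_doc.items():
    doc_scores[doc] = doc_scores.get(doc, 0.0) + scores[block]

print(sorted(doc_scores, key=doc_scores.get, reverse=True))
```

Block 2 has no direct link to the query block but still receives a score through block 1, which is the kind of indirect evidence that propagation over the graph is meant to capture.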
5. Wan, X. ; Yang, J. ; Xiao, J.: Incorporating cross-document relationships between sentences for single document summarizations.
In: Research and advanced technology for digital libraries : 10th European conference, proceedings / ECDL 2006, Alicante, Spain, September 17 - 22, 2006. Berlin : Springer, 2006. pp. 403-414.
(Lecture notes in computer science; vol.4172)
Abstract: Graph-based ranking algorithms have recently been proposed for single-document summarization; such algorithms evaluate the importance of a sentence recursively, making use of the relationships between sentences in the document. In this paper, we investigate using other related or relevant documents to improve the summarization of a single document with a graph-based ranking algorithm. In addition to the within-document relationships between sentences in the specified document, the proposed approach also takes into account the cross-document relationships between sentences in different documents. We evaluate the performance of the proposed approach on DUC 2002 data with the ROUGE metric, and the results demonstrate that the cross-document relationships between sentences in different but related documents can significantly improve the performance of single-document summarization.