This database contains over 40,000 documents on topics in the areas of descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 4 June 2021)
1. Hu, K. ; Luo, Q. ; Qi, K. ; Yang, S. ; Mao, J. ; Fu, X. ; Zheng, J. ; Wu, H. ; Guo, Y. ; Zhu, Q.: Understanding the topic evolution of scientific literatures like an evolving city : using Google Word2Vec model and spatial autocorrelation analysis.
In: Information processing and management. 56(2019) no.4, pp. 1185-1203.
Abstract: Topic evolution has been described by many approaches from a macro level to a detail level, by extracting topic dynamics from text in literature and other media types. However, why the evolution happens is less studied. In this paper, we focus on whether and how the keyword semantics can invoke or affect the topic evolution. We assume that the semantic relatedness among the keywords can affect topic popularity during the literature surveying and citing process, thus invoking evolution. However, the assumption needs to be confirmed by an approach that fully considers the semantic interactions among topics. Traditional topic evolution analyses in scientometric domains cannot provide such support because they use limited semantic information. To address this problem, we apply Google Word2Vec, a deep learning language model, to enhance the keywords with more complete semantic information. We further develop the semantic space as an urban geographic space. We analyze the topic evolution geographically using the measures of spatial autocorrelation, as if keywords are the changing lands in an evolving city. The keyword citations (a keyword citation is counted when the paper containing this keyword obtains a citation) are used as an indicator of keyword popularity. Using the bibliographical datasets of the geographical natural hazard field, experimental results demonstrate that in some local areas, the popularity of keywords affects that of the surrounding keywords. However, there are no significant impacts on the evolution of all keywords. The spatial autocorrelation analysis identifies the interaction patterns (including High-High leading, High-Low suppressing) among the keywords in local areas. This approach can be regarded as an analytical framework borrowed from geospatial modeling. Moreover, the prediction results in local areas are demonstrated to be more accurate if the spatial autocorrelations are considered.
Content: Cf. https://doi.org/10.1016/j.ipm.2019.02.014.
Subject area: Semantic context in indexing and retrieval
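Note on the record above: the abstract speaks of "measures of spatial autocorrelation" without naming one. As a minimal sketch, assuming global Moran's I as a representative measure (the paper's own local measures are not reproduced here), the idea can be illustrated in Python. The function name `morans_i`, the popularity values, and the neighbourhood weights are all hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix.

    values  -- one number per item (e.g. a hypothetical keyword popularity)
    weights -- weights[i][j] > 0 if items i and j count as neighbours
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]

    total_weight = sum(sum(row) for row in weights)
    variance_sum = sum(d * d for d in dev)
    if total_weight == 0 or variance_sum == 0:
        raise ValueError("need at least one neighbour pair and non-constant values")

    # Cross-products of deviations, weighted by neighbourhood strength.
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / total_weight) * (cross / variance_sum)


# Hypothetical example: four keywords arranged in a chain 0-1-2-3.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 5, 5], chain)    # similar values sit together
alternating = morans_i([1, 5, 1, 5], chain)  # neighbours always differ
```

A positive value indicates that similar popularity values cluster among neighbours; a negative value indicates that neighbours tend to differ.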
2. Li, H. ; Wu, H. ; Li, D. ; Lin, S. ; Su, Z. ; Luo, X.: PSI: A probabilistic semantic interpretable framework for fine-grained image ranking.
In: Journal of the Association for Information Science and Technology. 69(2018) no.12, pp. 1488-1501.
Abstract: Image ranking is one of the key problems in information science research. However, most current methods focus on increasing performance and leave intact the semantic gap problem, that is, that the learned ranking models are hard to understand. Therefore, in this article, we aim at learning an interpretable ranking model to tackle the semantic gap in fine-grained image ranking. We propose to combine attribute-based representation and online passive-aggressive (PA) learning based ranking models to achieve this goal. Besides, considering the highly localized instances in fine-grained image ranking, we introduce a supervised constrained clustering method to gather class-balanced training instances for local PA-based models, and incorporate the learned local models into a unified probabilistic framework. Extensive experiments on the benchmark demonstrate that the proposed framework outperforms state-of-the-art methods in terms of accuracy and speed.
Form treated: Images
3. Lin, W. ; Yueh, H.-P. ; Wu, H.-Y. ; Fu, L.-C.: Developing a service robot for a children's library : a design-based research approach.
In: Journal of the Association for Information Science and Technology. 65(2014) no.2, pp. 290-301.
Abstract: Understanding book-locating behavior in libraries is important and leads to more effective services that support patrons throughout the book-locating process. This study adopted a design-based approach to incorporate robotic assistance in investigating the book-locating behaviors of child patrons, and developed a service robot for child patrons in library settings. We describe the iterative cycles and process to develop a robot to assist with locating resources in libraries. Stakeholders, including child patrons and librarians, were consulted about their needs, preferences, and performance in locating library resources with robotic assistance. Their needs were analyzed and incorporated into the design of the library robot to provide comprehensive support. The results of the study suggest that the library robot was effective as a mobile and humanoid service agent for providing motivation and knowledgeable guidance to help child patrons in the initially complicated sequence of locating resources.
4. Wu, H. ; He, J. ; Pei, Y.: Scientific impact at the topic level : a case study in computational linguistics.
In: Journal of the American Society for Information Science and Technology. 61(2010) no.11, pp. 2274-2287.
Abstract: In this article, we propose to apply the topic model and topic-level eigenfactor (TEF) algorithm to assess the relative importance of academic entities including articles, authors, journals, and conferences. Scientific impact is measured by the biased PageRank score toward topics created by the latent topic model. The TEF metric considers the impact of an academic entity in multiple granular views as well as in a global view. Experiments on a computational linguistics corpus show that the method is a useful and promising measure to assess scientific impact.
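Note on the record above: the abstract mentions a PageRank score biased toward topics. The sketch below is not the TEF algorithm itself. It only illustrates, under stated assumptions, how a generic biased (personalized) PageRank works: the random jump follows a topic-specific distribution instead of a uniform one. The function name, the citation graph, and the topic weights are hypothetical.

```python
def biased_pagerank(links, teleport, damping=0.85, iterations=100):
    """Power iteration for PageRank with a non-uniform teleport vector.

    links    -- dict mapping each node to the list of nodes it cites
    teleport -- dict of jump probabilities per node (must sum to 1);
                here assumed to be a node's weight for one topic
    """
    nodes = list(teleport)
    rank = dict(teleport)  # start from the teleport distribution
    for _ in range(iterations):
        # Random jump, biased toward nodes with high topic weight.
        new_rank = {n: (1 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            targets = links.get(n, [])
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank along the teleport vector.
                for t in nodes:
                    new_rank[t] += damping * rank[n] * teleport[t]
        rank = new_rank
    return rank


# Hypothetical example: papers a and b cite each other, c cites a,
# and the topic of interest is concentrated entirely on c.
citations = {"a": ["b"], "b": ["a"], "c": ["a"]}
topic_weights = {"a": 0.0, "b": 0.0, "c": 1.0}
scores = biased_pagerank(citations, topic_weights)
```

Running the function once per topic, each time with that topic's weights as the teleport vector, yields one score per entity and topic.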
5. Ning, X. ; Jin, H. ; Wu, H.: RSS: a framework enabling ranked search on the semantic web.
In: Information processing and management. 44(2008) no.2, pp. 893-909.
Abstract: The semantic web not only contains resources but also includes the heterogeneous relationships among them, which sharply distinguishes it from the current web. With the growth of the semantic web, specialized search techniques are of significance. In this paper, we present RSS, a framework for enabling ranked semantic search on the semantic web. In this framework, the heterogeneity of relationships is fully exploited to determine the global importance of resources. In addition, the search results can be greatly expanded with entities most semantically related to the query, thus able to provide users with properly ordered semantic search results by combining global ranking values and the relevance between the resources and the query. The proposed semantic search model, which supports inference, is very different from traditional keyword-based search methods. Moreover, RSS also differs from many current methods of accessing semantic web data in that it applies novel ranking strategies to prevent returning search results in disorder. The experimental results show that the framework is feasible and can produce better ordering of semantic search results than directly applying the standard PageRank algorithm on the semantic web.
Subject area: Retrieval algorithms ; Semantic Web
6. Wu, H.C. ; Luk, R.W.P. ; Wong, K.F. ; Kwok, K.L.: A retrospective study of a hybrid document-context based retrieval model.
In: Information processing and management. 43(2007) no.5, pp. 1308-1331.
Abstract: This paper describes our novel retrieval model that is based on contexts of query terms in documents (i.e., document contexts). Our model is novel because it explicitly takes the document contexts into account instead of implicitly using the document contexts to find query expansion terms. Our model is based on simulating a user making relevance decisions, and it is a hybrid of various existing effective models and techniques. It estimates the relevance decision preference of a document context as the log-odds and uses smoothing techniques as found in language models to solve the problem of zero probabilities. It combines these estimated preferences of document contexts using different types of aggregation operators that comply with different relevance decision principles (e.g., aggregate relevance principle). Our model is evaluated using retrospective experiments (i.e., with full relevance information), because such experiments can (a) reveal the potential of our model, (b) isolate the problems of the model from those of the parameter estimation, (c) provide information about the major factors affecting the retrieval effectiveness of the model, and (d) show whether the model obeys the probability ranking principle. Our model is promising as its mean average precision is 60-80% in our experiments using different TREC ad hoc English collections and the NTCIR-5 ad hoc Chinese collection. Our experiments showed that (a) the operators that are consistent with the aggregate relevance principle were effective in combining the estimated preferences, and (b) estimating probabilities using the contexts in the relevant documents can produce better retrieval effectiveness than using the entire relevant documents.
7. Radev, D. ; Fan, W. ; Qu, H. ; Wu, H. ; Grewal, A.: Probabilistic question answering on the Web.
In: Journal of the American Society for Information Science and Technology. 56(2005) no.6, pp. 571-583.
Abstract: Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this article, we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of .20 on the TREC8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
Subject area: Search engines ; Retrieval algorithms ; Computational linguistics ; Language retrieval
8. Fan, W. ; Fox, E.A. ; Pathak, P. ; Wu, H.: The effects of fitness functions on genetic programming-based ranking discovery for Web search.
In: Journal of the American Society for Information Science and Technology. 55(2004) no.7, pp. 628-636.
Abstract: Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task, discovery of ranking functions for Web search, and has achieved very promising results. However, in our prior research, only one fitness function has been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs on GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
9. Studwell, W.E. ; Wang, R. ; Wu, H.: Ideological influences on book classification schemes in the People's Republic of China.
In: Cataloging and classification quarterly. 19(1994) no.1, pp. 61-74.
Abstract: Describes the history of classification in China and discusses the application of 4 major classification schemes in the People's Republic of China, to show how ideology and ideological considerations have influenced their structure and contents and have also influenced the historical development of classification in China from earliest times. Presents recommendations for possible future revision of classification schemes.
Subject area: History of classification systems
10. Studwell, W.E. ; Wang, R. ; Wu, H.: A tale of two decades : the controversy over the choice of a Chinese language romanization system in American cataloging practice.
In: Cataloging and classification quarterly. 18(1993) no.1, pp. 117-124.
Abstract: Examines the 2 major schemes for transliteration or Romanization of Chinese into English: Wade-Giles and Pinyin. Concludes that both methods are about equally effective, but since most of the rest of the world is now using Pinyin, the Library of Congress should, for practical reasons, switch to Pinyin.