This database contains more than 40,000 documents on topics in descriptive cataloguing, subject indexing and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 28 April 2022)
1. Tan, X. ; Luo, X. ; Wang, X. ; Wang, H. ; Hou, X.: Representation and display of digital images of cultural heritage : a semantic enrichment approach.
In: Knowledge organization. 48(2021) no.3, pp.231-247.
Abstract: Digital images of cultural heritage (CH) contain rich semantic information. However, current semantic representations of CH images fail to fully reveal the content entities and context within these vital surrogates. This paper draws on image research and the digital humanities to propose a systematic methodology and a technical route for the semantic enrichment of CH digital images. The methodology applies a series of procedures: semantic annotation, entity-based enrichment, establishing internal relations, event-centric enrichment, defining hierarchical relations between properties, text annotation, and finally named entity recognition, in order to provide fine-grained contextual disclosure of semantic content. The feasibility and advantages of the proposed enrichment methods for semantic representation are demonstrated with a visual display platform for digital images of CH, built to represent the Wutai Mountain Map, a typical Dunhuang mural. The study shows that semantic enrichment offers a promising new model for exposing content at a fine-grained level and for establishing a rich semantic network centered on the content of digital images of CH.
Content: See doi.org/10.5771/0943-7444-2021-3-231.
2. Sa, N. ; Yuan, X.J.: Examining users' partial query modification patterns in voice search.
In: Journal of the Association for Information Science and Technology. 71(2020) no.3, pp.251-263.
Abstract: This article investigates how to improve the effectiveness of voice search systems. Earlier research found that participants used voice search much less frequently than keyboard search; the main reasons they disliked voice search were system mistakes and the inability to modify queries. In keyboard search, query reformulation is facilitated by partial query modification, which most current voice search systems do not support. Consequently, users of voice search must speak the complete query again even for minor changes. This article examines partial query modification in voice search through a Wizard of Oz user experiment: whether users would prefer partial query modification, and how they perform it. Thirty-two participants took part in the experiment. Results indicated that, when given the opportunity, users performed more partial query modifications than complete queries. Common partial query modification strategies and patterns emerged from the experiment. The results can be used to improve voice search system design and benefit the research community in general. System implications and future work are discussed.
Content: See https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24238.
3. Xu, S. ; Zhai, D. ; Wang, F. ; An, X. ; Pang, H. ; Sun, Y.: A novel method for topic linkages between scientific publications and patents.
In: Journal of the Association for Information Science and Technology. 70(2019) no.9, pp.1026-1042.
Abstract: Building topic linkages between scientific publications and patents is increasingly important for understanding the relationships between science and technology. Previous studies of these linkages mainly analyze the nonpatent references on the front page of patents, or the resulting citation-link networks, with unsatisfactory performance. Moreover, the many entities mentioned in scholarly articles and patents further complicate topic linkage. To address this, a novel statistical entity-topic model (named the CCorrLDA2 model), equipped with a collapsed Gibbs sampling inference algorithm, is proposed to discover the hidden topics in academic articles and patents, respectively. To reduce negative effects on the topic similarity calculation, word tokens and entity mentions are grouped by the Brown clustering method. Topic similarity is then calculated on the basis of the symmetrized Kullback-Leibler (KL) divergence, and the topic-linkage construction problem is transformed into the well-known optimal transportation problem. Extensive experimental results indicate that the approach can build topic linkages with better performance than its counterparts.
Content: See https://onlinelibrary.wiley.com/doi/10.1002/asi.24175.
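The Xu et al. abstract above compares topics with a symmetrized Kullback-Leibler divergence. The following is a minimal Python sketch, assuming each topic is a discrete word distribution over a shared vocabulary and that the symmetrization is the sum KL(p||q) + KL(q||p). Some authors use the average instead, and the abstract does not say which. The smoothing constant `eps`, the divergence-to-similarity mapping, and all function names are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same support.

    eps is a hypothetical smoothing constant that avoids log(0) and division by zero;
    terms with p_i == 0 contribute 0 by convention.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def symmetrized_kl(p: Sequence[float], q: Sequence[float]) -> float:
    """Assumed symmetric variant: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def topic_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """Illustrative mapping of the divergence to a similarity in (0, 1]."""
    return 1.0 / (1.0 + symmetrized_kl(p, q))


if __name__ == "__main__":
    paper_topic = [0.5, 0.3, 0.2]   # toy topic-word distribution from publications
    patent_topic = [0.4, 0.4, 0.2]  # toy topic-word distribution from patents
    print(round(symmetrized_kl(paper_topic, patent_topic), 4))  # ≈ 0.0511
    print(topic_similarity(paper_topic, paper_topic))           # 1.0 for identical topics
```

The similarities produced this way could serve as the cost matrix of an optimal-transport step. That step is not shown here.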
4. Chen, H. ; Baptista Nunes, J.M. ; Ragsdell, G. ; An, X.: Somatic and cultural knowledge : drivers of a habitus-driven model of tacit knowledge acquisition.
In: Journal of documentation. 75(2019) no.5, pp.927-953.
Abstract: Purpose - The purpose of this paper is to identify and explain the role of individual learning and development in acquiring tacit knowledge in the context of the inexorable, intense and continuous change (technological and otherwise) that characterizes society today. It also investigates the software (SW) sector, which is at the core of contemporary continuous change and is a paradigm of effective and intrinsic knowledge sharing (KS). This makes the SW sector unique and unlike other sectors, where KS is hard to implement. Design/methodology/approach - The study employed an inductive qualitative multi-case study of three successful SW companies in China, representative of the fabric of the sector: a small or medium-sized enterprise, a large private company and a large state-owned enterprise. The fieldwork included 44 participants, who were interviewed using a semi-structured script. The interview data were coded and interpreted following the Straussian grounded theory pattern of open, axial and selective coding. Interviewing stopped when theoretical saturation was reached after careful theoretical sampling. Findings - The findings suggest that individual learning and development are regarded as fundamental to professional success and survival in today's continuously changing SW industry. However, participants described individual learning as much more than an individual process: it involves a collective and participatory effort within the organization and the sector as a whole, and a KS process that transcends organizational, cultural and national borders. Individuals are mostly motivated by the pressing need to face and adapt to the dynamic and changeable environments of today's digital society, which the sector leads. Software practitioners continuously need to learn, refresh and accumulate tacit knowledge, partly because their companies require it, but also because they are well aware of continuous technical and technological changes that seem only to increase with advances in information technology. This led to the theoretical understanding that the continuous change facing the sector has led to individual acquisition of cultural and somatic knowledge. This knowledge in turn lays the foundation not only for awareness of the need for continuous individual professional development but also for the creation of a habitus of KS and continuous learning. Originality/value - The study shows a theoretical link between conducive organizational and sector-wide somatic and cultural knowledge and the success of KS practices that lead to individual learning and development. The proposed theory therefore suggests that somatic and cultural knowledge are crucial drivers of a habitus of individual tacit knowledge acquisition. The paper further proposes a habitus-driven individual development (HDID) theoretical model that can be of use to both academics and practitioners interested in fostering KS and individual development in knowledge-intensive organizations.
Content: See https://doi.org/10.1108/JD-03-2018-0044.
5. Han, B. ; Chen, L. ; Tian, X.: Knowledge based collection selection for distributed information retrieval.
In: Information processing and management. 54(2018) no.1, pp.116-128.
Abstract: Recent years have seen a great deal of work on collection selection. Most collection selection methods use a central sample index (CSI), consisting of documents sampled from each collection, as the collection description. These methods have two limitations. First, they use 'flat' meaning representations that ignore the structure of and relationships among words in the CSI. Second, their query-collection similarity metrics ignore the semantic distance between query words and indexed words. In this paper, we propose a knowledge based collection selection method (KBCS) to improve collection representation and the query-collection similarity metric. KBCS models a collection as a weighted entity set and applies a novel query-collection similarity metric to select highly scored collections. Specifically, for collection representation, context- and structure-based measures are employed to weight the semantic distance between two entities extracted from the sampled documents of a collection. In addition, the novel query-collection similarity metric takes the entity weight, collection size, and other factors into account. To enrich the concepts contained in a query, DBpedia-based query expansion is integrated. Finally, extensive experiments were conducted on a large webpage dataset, with DBpedia as the graph knowledge base. Experimental results demonstrate the effectiveness of KBCS.
Content: See https://doi.org/10.1016/j.ipm.2017.10.002.
6. Zhu, J. ; Han, L. ; Gou, Z. ; Yuan, X.: A fuzzy clustering-based denoising model for evaluating uncertainty in collaborative filtering recommender systems.
In: Journal of the Association for Information Science and Technology. 69(2018) no.9, pp.1109-1121.
Abstract: Recommender systems are effective in predicting the most suitable products, such as movies and books, for users. Personalized recommendation requires item ratings of guaranteed quality. However, some ratings may be inaccurate because of the uncertainty of user behavior; such ratings are referred to as natural noise. In this article, we present a novel fuzzy clustering-based method for detecting noisy ratings. The entropy of a subset of the original ratings dataset is used to indicate the data-driven uncertainty, and evaluation metrics are adopted to represent the prediction-driven uncertainty. After repeated resampling and execution of a recommendation algorithm, the entropy and evaluation-metric vectors are obtained and empirically categorized to identify the proportion of potential noise. Then the fuzzy C-means-based denoising (FCMD) algorithm is applied to verify the natural noise, under the assumption that natural noise results primarily from exceptional user behavior. Finally, a case study is performed using two real-world datasets. The experimental results show that our proposal outperforms previous proposals and has an advantage in dealing with natural noise.
Content: See https://onlinelibrary.wiley.com/doi/10.1002/asi.24036.
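The Zhu et al. abstract above builds its denoising step on fuzzy C-means (FCM). The following is a minimal pure-Python sketch of standard FCM on one-dimensional values, such as the entropy values the abstract mentions. It is not the paper's FCMD algorithm. The fuzzifier `m = 2`, the tolerance, the initial centres, the toy data, and the rule of treating the cluster with the higher centre as suspected noise are all assumptions for illustration.

```python
from typing import List, Tuple


def fuzzy_c_means(
    xs: List[float], centers: List[float], m: float = 2.0, iters: int = 100, tol: float = 1e-9
) -> Tuple[List[float], List[List[float]]]:
    """Standard fuzzy C-means on scalar data.

    Returns the final centres and the membership matrix u[i][k], the degree to
    which point k belongs to cluster i (each column sums to 1).
    """
    centers = list(centers)
    power = 2.0 / (m - 1.0)
    u: List[List[float]] = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        u = [[0.0] * len(xs) for _ in centers]
        for k, x in enumerate(xs):
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:  # point sits exactly on a centre: crisp membership
                u[dists.index(0.0)][k] = 1.0
                continue
            for i, d_i in enumerate(dists):
                u[i][k] = 1.0 / sum((d_i / d_j) ** power for d_j in dists)
        # Centre update: mean of the points weighted by u_ik^m
        new_centers = []
        for i in range(len(centers)):
            w = [u[i][k] ** m for k in range(len(xs))]
            new_centers.append(sum(wk * x for wk, x in zip(w, xs)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    # Hypothetical entropy values: most are low, a few are unusually high.
    entropies = [0.10, 0.12, 0.15, 0.11, 0.90, 0.95]
    centers, u = fuzzy_c_means(entropies, centers=[0.0, 1.0])
    noisy = centers.index(max(centers))  # assumption: higher centre = suspected noise
    print([x for k, x in enumerate(entropies) if u[noisy][k] > 0.5])  # [0.9, 0.95]
```

Unlike k-means, each point keeps a graded membership. A threshold such as 0.5 is only one way to turn memberships into a crisp noise flag.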
7. He, W. ; Tian, X.: A longitudinal study of user queries and browsing requests in a case-based reasoning retrieval system.
In: Journal of the Association for Information Science and Technology. 68(2017) no.5, pp.1124-1136.
Abstract: This article reports on a longitudinal analysis of the query logs of a web-based case library system over an 8-year period (2005 to 2012). By examining these logs, the analysis studies three information-seeking approaches provided by the system: keyword searching, browsing, and case-based reasoning (CBR) searching. The longitudinal dimension of the study offers a unique opportunity to see how users employed the three approaches over time. Various user information-seeking patterns and trends are identified through query usage pattern analysis and session analysis. The study identified different user groups and found that most users tend to stick to their favorite information-seeking approach to meet their immediate information needs, and do not seem to care whether alternative search options would offer greater benefits. The study also found that returning users used CBR searching much more frequently than one-time users and tended to use more query terms than one-time users.
Content: See http://onlinelibrary.wiley.com/doi/10.1002/asi.23738/full.
Subject area: Search tactics ; Case Based Reasoning
8. An, X. ; Huang, J.X.: geNov : a new metric for measuring novelty and relevancy in biomedical information retrieval.
In: Journal of the Association for Information Science and Technology. 68(2017) no.11, pp.2620-2635.
Abstract: For diversity and novelty evaluation in information retrieval, we expect novel documents always to be ranked higher than redundant ones, and relevant documents higher than irrelevant ones. We also expect the level of novelty and relevancy to be acknowledged. Accordingly, we expect the evaluation algorithm to reward rankings that respect these expectations. Nevertheless, few research articles study how to meet such expectations, and even fewer do so in biomedical information retrieval. In this article, we propose a new metric for novelty and relevancy evaluation in biomedical information retrieval, based on an aspect-level performance measure introduced by the TREC Genomics Track. We also give formal results showing that these expectations can be respected under ideal conditions. The empirical evaluation indicates that the proposed metric, geNov, is highly sensitive to the desired characteristics above, and that its three parameters are highly tunable for different evaluation preferences. In experimental comparisons with state-of-the-art novelty and diversity metrics, the proposed metric shows advantages in its discriminative power and in recognizing ranking quality in terms of novelty, redundancy, relevancy, and irrelevancy. Experiments also show that the proposed metric is faster to compute than state-of-the-art metrics.
Content: See http://onlinelibrary.wiley.com/doi/10.1002/asi.23958/full.
Note: Contribution to a special issue on biomedical information retrieval.
9. Pan, X. ; He, S. ; Zhu, X. ; Fu, Q.: How users employ various popular tags to annotate resources in social tagging : an empirical study.
In: Journal of the Association for Information Science and Technology. 67(2016) no.5, pp.1121-1137.
Abstract: This paper explores the usage patterns and regularities of the co-employment of tags of varying popularity in social tagging, and their relationships with the activeness of users and the interest level of resources. A hypernetwork for social tagging is constructed in which a tagging action is expressed as a hyperedge and the user, resource, and tag are expressed as nodes. Quantitative measures for the constructed hypernetwork are defined, including the hyperdegree and its distribution, the excess average hyperdegree, and the hyperdegree conditional probability distribution. An empirical study was conducted using a data set from Delicious. The results show that multiple individual tags and one or very few popular tags are generally employed together in one tagging action. They also show that the usage patterns and regularities of tags of varying popularity are correlated with both user activity and resource interest. The results are further discussed and explained from the perspectives of tag functions and motivations. Finally, suggestions regarding the use of tags of varying popularity are given for both tagging users and providers of social tagging services.
Content: See http://onlinelibrary.wiley.com/doi/10.1002/asi.23478/abstract.
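In the Pan et al. hypernetwork above, each tagging action is a hyperedge over user, resource, and tag nodes, and a node's hyperdegree is the number of hyperedges that contain it. The following is a minimal Python sketch of the hyperdegree and its distribution. It assumes that one tagging action may carry several tags, all placed in the same hyperedge. The typed node naming and the toy actions are hypothetical. The paper's other measures (excess average hyperdegree, conditional distribution) are not reproduced.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Tuple

Node = Tuple[str, str]                     # (node type, identifier)
Action = Tuple[str, str, Tuple[str, ...]]  # (user, resource, tags used in one action)


def build_hyperedges(actions: Iterable[Action]) -> List[FrozenSet[Node]]:
    """One hyperedge per tagging action, joining the user, the resource and its tags."""
    edges: List[FrozenSet[Node]] = []
    for user, resource, tags in actions:
        nodes = {("user", user), ("resource", resource)}
        nodes.update(("tag", t) for t in tags)
        edges.append(frozenset(nodes))
    return edges


def hyperdegrees(edges: Iterable[FrozenSet[Node]]) -> Counter:
    """Hyperdegree of a node = number of hyperedges that contain it."""
    deg: Counter = Counter()
    for edge in edges:
        deg.update(edge)
    return deg


def hyperdegree_distribution(deg: Counter, node_type: str) -> Dict[int, float]:
    """P(k): fraction of nodes of one type whose hyperdegree equals k."""
    ks = [k for (kind, _), k in deg.items() if kind == node_type]
    counts = Counter(ks)
    return {k: c / len(ks) for k, c in sorted(counts.items())}


if __name__ == "__main__":
    actions = [
        ("alice", "r1", ("python", "tutorial")),
        ("bob", "r1", ("python",)),
        ("alice", "r2", ("python", "web")),
    ]
    deg = hyperdegrees(build_hyperedges(actions))
    print(deg[("tag", "python")])  # 3
    print(hyperdegree_distribution(deg, "tag"))  # {1: 0.6666666666666666, 3: 0.3333333333333333}
```

Typing the nodes keeps a user and a tag that happen to share a name from being merged into one node.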
10. Pan, X. ; Yan, E. ; Hua, W.: Science communication and dissemination in different cultures : an analysis of the audience for TED videos in China and abroad.
In: Journal of the Association for Information Science and Technology. 67(2016) no.6, pp.1473-1486.
Abstract: Disseminated across the world in more than 100 languages and viewed over 1 billion times, TED Talks is a successful example of web-based science communication. This study investigates the impact of TED Talks videos on YouKu, a Chinese video portal, and on YouTube, using six measures of impact: the numbers of views, likes, dislikes, comments, bookmarks, and shares. In particular, we study the relationship between the topicality and the impact of these videos. Findings demonstrate that topics vary greatly in impact: topics on entertainment and psychology/philosophy receive more views and likes, whereas design/art and astronomy/biology/oceanography attract fewer comments and bookmarks. Moreover, we identify several topical differences between YouKu and YouTube users: topics on global issues and technology are more popular on YouKu, whereas topics on entertainment and psychology/philosophy are more popular on YouTube. By analyzing the popularity distribution of videos and the audience characteristics of YouKu, we find that women are more interested in topics on education and psychology/philosophy, whereas men favor topics on technology and astronomy/biology/oceanography.
Content: See http://onlinelibrary.wiley.com/doi/10.1002/asi.23461/abstract.
11. Yuan, X. (J.) ; Belkin, N.J.: Applying an information-seeking dialogue model in an interactive information retrieval system.
In: Journal of documentation. 70(2014) no.5, pp.829-855.
Abstract: Purpose - People often engage in different information-seeking strategies (ISSs) within a single information-seeking episode. A critical concern for the design of information retrieval (IR) systems is how to support these different behaviors in a way that searchers can easily understand, navigate and use as they move from one ISS to another. The purpose of this paper is to describe a dialogue structure, implemented in an experimental IR system, that addresses this concern. Design/methodology/approach - The authors conducted a user-centered experiment to evaluate the IR systems. Participants were asked to search for information on two different task types, with four different topics per task, in both the experimental system and a baseline system emulating state-of-the-art IR systems. The authors report here the results related explicitly to the use of the experimental system's dialogue structure. Findings - For one of the task types, most participants followed the search steps predicted in the dialogue structures, and those who did so completed the task in fewer moves. For the other task type, the predicted order of moves was often not followed, but participants again used fewer moves when they did follow it. Results demonstrate that the dialogue structures the authors designed do support effective human information behavior patterns in a variety of ways, and that searchers can effectively use a system that changes to support different ISSs. Originality/value - This study shows that it is both possible and beneficial to design an IR system that can support multiple ISSs, and that such a system can be understood and used successfully.
12. Wan, X. ; Liu, F.: Are all literature citations equally important? : automatic citation strength estimation and its applications.
In: Journal of the Association for Information Science and Technology. 65(2014) no.9, pp.1929-1938.
Abstract: Literature citation analysis plays a very important role in bibliometrics and scientometrics, for example in the Science Citation Index (SCI) impact factor and the h-index. Existing citation analysis methods assume that all citations in a paper are equally important and simply count the number of citations. Here we argue that the citations in a paper are not equally important: some citations are more important than others. We use a strength value to assess the importance of each citation and propose a regression method with a few useful features to estimate the strength value of each citation automatically. Evaluation results on a manually labeled data set in the computer science field show that the estimated values correlate well with human-labeled values. We further apply the estimated citation strength values to evaluate paper influence and author influence, and the preliminary evaluation results demonstrate the usefulness of the citation strength values.
13. Wan, X. ; Liu, F.: WL-index : leveraging citation mention number to quantify an individual's scientific impact.
In: Journal of the Association for Information Science and Technology. 65(2014) no.12, pp.2509-2517.
Abstract: A number of bibliometric indices have been developed to evaluate an individual's scientific impact, the most popular being the h-index and its variants. However, existing bibliometric indices are computed from the number of citations received by each article and do not consider how often each cited reference is mentioned in the citing article. We use "citation mention" to denote a unique occurrence of a cited reference in the citing article, so some citations may have more than one mention in an article. According to our analysis of the ACL Anthology Network corpus in the natural language processing field, more than 40% of cited references are mentioned twice or more in the corresponding citing articles. We argue that citation mentions are preferable for representing the citation relationships between articles. That is, a reference article mentioned m times in the citing article should be considered to have received m citations rather than one. Based on this assumption, we revise the h-index and propose a new bibliometric index, the WL-index, to evaluate an individual's scientific impact. According to our empirical analysis, the proposed WL-index discriminates more accurately between program committee chairs of reputable conferences and ordinary authors.
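The Wan and Liu abstract above revises the h-index so that a reference mentioned m times in a citing article counts as m citations. The following Python sketch assumes the simplest reading of that rule: sum the mentions per paper, then apply the ordinary h-index rule to those mention-weighted counts. The WL-index as defined in the paper may differ in detail. The function `wl_style_index` and the toy author record are hypothetical.

```python
from typing import Dict, List


def h_index(counts: List[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, c in enumerate(sorted(counts, reverse=True), start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h


def wl_style_index(mentions_per_paper: Dict[str, List[int]]) -> int:
    """h-index over mention-weighted citation counts (assumed reading of the WL-index).

    mentions_per_paper maps each of an author's papers to a list with one entry
    per citing article: how many times that article mentions the paper.
    """
    weighted = [sum(mentions) for mentions in mentions_per_paper.values()]
    return h_index(weighted)


if __name__ == "__main__":
    author = {
        "p1": [1, 3, 2],  # 3 citing articles, 6 mentions in total
        "p2": [2, 2],     # 2 citing articles, 4 mentions
        "p3": [2, 1],     # 2 citing articles, 3 mentions
    }
    plain = h_index([len(m) for m in author.values()])  # counts citing articles only
    print(plain, wl_style_index(author))  # 2 3
```

In this toy record, counting mentions instead of citing articles lifts the index from 2 to 3. This is the kind of difference the abstract attributes to repeated mentions.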
14. Guo, L. ; Wan, X.: Exploiting syntactic and semantic relationships between terms for opinion retrieval.
In: Journal of the American Society for Information Science and Technology. 63(2012) no.11, pp.2269-2282.
Abstract: Opinion retrieval is the task of finding documents that express an opinion about a given query. A key challenge in opinion retrieval is to capture the query-related opinion score of a document. Existing methods rely mainly on proximity information between opinion terms and query terms to address this challenge. In this study, we propose to incorporate the syntactic and semantic information of terms into a probabilistic model to capture the query-related opinion score more accurately. A tree kernel method is applied to the syntactic tree structure of a sentence to evaluate the probability that an opinion term modifies a noun within that sentence. Moreover, WordNet and a probabilistic topic model are used to evaluate the semantic relatedness between any noun and the given query. Experimental results over standard TREC baselines on the benchmark BLOG06 collection demonstrate the effectiveness of the proposed method in comparison with the proximity-based method and other baselines.
15. Rorissa, A. ; Yuan, X.: Visualizing and mapping the intellectual structure of information retrieval.
In: Information processing and management. 48(2012) no.1, pp.120-135.
Abstract: Information retrieval is a long-established subfield of library and information science. Since its inception in the early to mid-1950s, it has grown partly as a result of well-regarded retrieval system evaluation exercises and campaigns, the proliferation of Web search engines, and the expansion of digital libraries. Researchers have examined the intellectual structure and nature of library and information science as a whole, but the same cannot be said of the subfield of information retrieval. We address this gap by sketching the intellectual landscape of information retrieval through visualizations of citation behaviors. Citation data for 10 years (2000-2009) were retrieved from the Web of Science and analyzed using existing visualization techniques. Our results address information retrieval's co-authorship network, highly productive authors, highly cited journals and papers, author-assigned keywords, active institutions, and the import of ideas from other disciplines.
Content: See doi:10.1016/j.ipm.2011.03.004.
16. Yuan, X. ; Belkin, N.J.: Investigating information retrieval support techniques for different information-seeking strategies.
In: Journal of the American Society for Information Science and Technology. 61(2010) no.8, pp.1543-1563.
Abstract: We report on a study that investigated the efficacy of four interactive information retrieval (IIR) systems, each designed to support a specific information-seeking strategy (ISS). The systems were built from different combinations of IR techniques (i.e., different methods of representation, comparison, presentation and navigation), and each combination was hypothesized to be well suited to a specific ISS. We compared searcher performance in each of these experimental systems with performance in an appropriate baseline system. The baseline implemented the standard specified-query and results-list model of current state-of-the-art experimental and operational IR systems. Four within-subjects experiments were conducted for this comparison. Results showed that each experimental system was superior to its baseline system in supporting user performance for the specific ISS (that is, the information problem leading to that ISS) for which it was designed. These results indicate that an IIR system intended to support more than one kind of ISS should be designed within a framework that allows the use and combination of different IR support techniques for different ISSs.
Subject area: User studies ; Search tactics
17. Yuan, X. ; Belkin, N.J.: Evaluating an integrated system supporting multiple information-seeking strategies.
In: Journal of the American Society for Information Science and Technology. 61(2010) no.10, pp.1987-2010.
Abstract: Many studies have demonstrated that people engage in a variety of information behaviors when seeking information. However, standard information retrieval systems such as Web search engines continue to be designed to support mainly one such behavior, specified searching. This has led to suggestions that people would be better served by information retrieval systems that support different kinds of information-seeking strategies. This article reports on an experiment that compared two systems. The first was an integrated interactive information retrieval (IIR) system that adapts to support different information-seeking strategies; the second was a standard baseline IIR system. The experiment, with 32 participants each searching on eight different topics, indicates that the integrated IIR system produced significantly better user satisfaction with search results, significantly more effective interaction, and significantly better usability than the baseline system.
18. Wan, X. ; Yang, J. ; Xiao, J.: Towards a unified approach to document similarity search using manifold-ranking of blocks.
In: Information processing and management. 44(2008) no.3, pp.1032-1048.
Abstract: Document similarity search (i.e. query by example) aims to retrieve a ranked list of documents similar to a query document in a text corpus or on the Web. Most existing approaches first compute the pairwise similarity score between each document and the query using a retrieval function or similarity measure (e.g. cosine), and then rank the documents by these scores. In this paper, we propose a novel retrieval approach based on manifold-ranking of document blocks (i.e. blocks of coherent text about a subtopic). The approach re-ranks a small set of documents initially retrieved by an existing retrieval function. It makes full use of the intrinsic global manifold structure of the document blocks by propagating ranking scores between the blocks on a weighted graph. First, the TextTiling algorithm and the VIPS algorithm are used to segment text documents and web pages, respectively, into blocks. Then each block is assigned a ranking score by the manifold-ranking algorithm. Lastly, each document obtains its final ranking score by fusing the scores of its blocks. Experimental results on the TDT data and the ODP data demonstrate that the proposed approach significantly improves retrieval performance over baseline approaches. They also confirm that the document block is a better unit than the whole document in the manifold-ranking process.
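The Wan, Yang and Xiao abstract above propagates ranking scores between text blocks on a weighted graph and then fuses block scores per document. The following is a minimal pure-Python sketch of the commonly used manifold-ranking iteration f(t+1) = αS f(t) + (1 − α) y, with S = D^(-1/2) W D^(-1/2) and y marking the query document's blocks. The affinity values, α = 0.85, the iteration count, and fusing a document's blocks by their maximum score are assumptions, not details taken from the paper.

```python
import math
from typing import Dict, List


def manifold_rank(W: List[List[float]], y: List[float], alpha: float = 0.85,
                  iters: int = 200) -> List[float]:
    """Iterate f <- alpha * S f + (1 - alpha) * y with S = D^-1/2 W D^-1/2.

    W must be a symmetric, non-negative affinity matrix with a zero diagonal.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    # Symmetric normalisation; rows/columns of isolated blocks (degree 0) stay zero.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    f = list(y)
    for _ in range(iters):
        f = [alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


def document_scores(block_scores: List[float], block_doc: List[str]) -> Dict[str, float]:
    """Fuse block scores per document; taking the maximum is an assumed fusion rule."""
    fused: Dict[str, float] = {}
    for score, doc in zip(block_scores, block_doc):
        fused[doc] = max(score, fused.get(doc, float("-inf")))
    return fused


if __name__ == "__main__":
    # Block 0 belongs to the query document; blocks 1-2 to candidate A, 3-4 to candidate B.
    W = [
        [0.0, 0.8, 0.1, 0.0, 0.0],
        [0.8, 0.0, 0.2, 0.0, 0.0],
        [0.1, 0.2, 0.0, 0.3, 0.0],
        [0.0, 0.0, 0.3, 0.0, 0.6],
        [0.0, 0.0, 0.0, 0.6, 0.0],
    ]
    y = [1.0, 0.0, 0.0, 0.0, 0.0]
    f = manifold_rank(W, y)
    print(document_scores(f[1:], ["A", "A", "B", "B"]))  # A scores higher than B
```

Because α < 1 and the spectral radius of S is at most 1, the iteration converges. Scores flow from the query block to its graph neighbours and beyond, rather than relying only on pairwise similarity to the query.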
19. Murdock, V. ; Kelly, D. ; Croft, W.B. ; Belkin, N.J. ; Yuan, X.: Identifying and improving retrieval for procedural questions.
In: Information processing and management. 43(2007) no.1, pp.181-203.
Abstract: People use questions to elicit information from other people in their everyday lives, yet the most common way of obtaining information from a search engine is to pose keywords. Research suggests that users are better at expressing their information needs in natural language. However, the vast majority of work on improving document retrieval has focused on queries posed as sets of keywords or as Boolean queries. This paper focuses on improving document retrieval for the subset of natural language questions that ask how something is done. We classify questions as asking either for a description of a process or for a statement of fact, with better than 90% accuracy. Further, we identify non-content features of documents relevant to questions asking about a process. Finally, we demonstrate that these features can be used to significantly improve the precision of document retrieval for questions asking about a process. Our approach, based on exploiting the structure of documents, shows a significant improvement in precision at rank one for questions asking how something is done.
20. Wan, X. ; Yang, J. ; Xiao, J.: Incorporating cross-document relationships between sentences for single document summarizations.
In: Research and advanced technology for digital libraries : 10th European conference, proceedings / ECDL 2006, Alicante, Spain, September 17 - 22, 2006. Berlin : Springer, 2006. pp.403-414.
(Lecture notes in computer science; vol.4172)
Abstract: Graph-based ranking algorithms have recently been proposed for single-document summarization. Such algorithms evaluate the importance of a sentence recursively, using the relationships between sentences in the document. In this paper, we investigate using other related or relevant documents to improve the summarization of a single document with a graph-based ranking algorithm. In addition to the within-document relationships between sentences in the specified document, the proposed approach also takes into account cross-document relationships between sentences in different documents. We evaluate the approach on DUC 2002 data with the ROUGE metric. The results demonstrate that cross-document relationships between sentences in different but related documents can significantly improve the performance of single-document summarization.
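The Wan, Yang and Xiao summarization abstract above ranks sentences on a graph that links sentences both within the target document and across related documents. The following is a minimal sketch of that idea. It assumes a PageRank-style iteration over a weighted sentence-similarity graph in which cross-document edges are scaled by a hypothetical weight `lam` relative to within-document edges. The Jaccard word-overlap similarity, the damping factor 0.85, and all names are illustrative assumptions, not the paper's exact formulation.

```python
from typing import List, Set


def overlap_sim(a: Set[str], b: Set[str]) -> float:
    """Illustrative sentence similarity: Jaccard overlap of word sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_sentences(sentences: List[str], doc_ids: List[int], target_doc: int,
                   lam: float = 0.5, d: float = 0.85, iters: int = 100) -> List[int]:
    """Return indices of target-document sentences, highest score first."""
    words = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Within-document edges keep full weight; cross-document edges are scaled by lam.
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                w = overlap_sim(words[i], words[j])
                W[i][j] = w if doc_ids[i] == doc_ids[j] else lam * w
    out = [sum(row) for row in W]
    score = [1.0] * n
    for _ in range(iters):
        # Weighted PageRank: node j passes score to i in proportion to W[j][i] / out[j].
        score = [(1 - d) + d * sum(W[j][i] / out[j] * score[j] for j in range(n) if out[j] > 0)
                 for i in range(n)]
    ranked = [(score[i], i) for i in range(n) if doc_ids[i] == target_doc]
    return [i for _, i in sorted(ranked, reverse=True)]


if __name__ == "__main__":
    sents = [
        "the storm hit the coast on monday",          # doc 0 (to be summarised)
        "schools were closed on monday",              # doc 0
        "the storm caused flooding along the coast",  # doc 1 (related document)
        "flooding along the coast damaged homes",     # doc 1
    ]
    print(rank_sentences(sents, doc_ids=[0, 0, 1, 1], target_doc=0))  # [0, 1]
```

Here sentence 0 ranks first because the related document reinforces it. Within document 0 alone, the two sentences would be symmetric and tie.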