This database contains over 40,000 documents on topics in descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 13 June 2017)
1. Jiang, Y. ; Bai, W. ; Zhang, X. ; Hu, J.: Wikipedia-based information content and semantic similarity computation.
In: Information processing and management. 53(2017) no.1, pp.248-265.
Abstract: The Information Content (IC) of a concept is a fundamental dimension in computational linguistics. It enables a better understanding of a concept's semantics. In the past, several approaches to compute IC of a concept have been proposed. However, there are some limitations such as the facts of relying on corpora availability, manual tagging, or predefined ontologies and fitting non-dynamic domains in the existing methods. Wikipedia provides a very large domain-independent encyclopedic repository and semantic network for computing IC of concepts with more coverage than usual ontologies. In this paper, we propose some novel methods for IC computation of a concept to solve the shortcomings of existing approaches. The presented methods focus on the IC computation of a concept (i.e., Wikipedia category) drawn from the Wikipedia category structure. We propose several new IC-based measures to compute the semantic similarity between concepts. The evaluation, based on several widely used benchmarks and a benchmark developed by ourselves, sustains the intuitions with respect to human judgments. Overall, some methods proposed in this paper have a good human correlation and constitute some effective ways of determining IC values for concepts and semantic similarity between concepts.
Content: Cf.: http://www.sciencedirect.com/science/article/pii/S0306457316303934 [http://dx.doi.org/10.1016/j.ipm.2016.09.001].
Topic area: Semantic context in indexing and retrieval
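The abstract above does not give the proposed formulas, so the following is only a minimal sketch of the general idea of IC-based similarity over a category hierarchy. It assumes a well-known intrinsic IC definition (IC decreases with the number of descendants) and Lin's similarity measure as stand-ins; neither is claimed to be one of the article's own measures. The category graph `PARENTS` and all function names are hypothetical.

```python
import math

# Hypothetical toy category graph: category -> list of parent categories.
PARENTS = {
    "Science": [],
    "Biology": ["Science"],
    "Physics": ["Science"],
    "Genetics": ["Biology"],
    "Zoology": ["Biology"],
}


def ancestors(category):
    """Return the category itself plus everything reachable via parent links."""
    seen = {category}
    stack = [category]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def descendant_count(category):
    """Count the categories that have `category` as a proper ancestor."""
    return sum(
        1 for other in PARENTS
        if other != category and category in ancestors(other)
    )


def intrinsic_ic(category):
    """Intrinsic IC (assumed stand-in): leaves score 1, the root scores 0."""
    total = len(PARENTS)
    return 1.0 - math.log(descendant_count(category) + 1) / math.log(total)


def lin_similarity(a, b):
    """Lin similarity, using the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    if not common:
        return 0.0
    ic_common = max(intrinsic_ic(c) for c in common)
    denominator = intrinsic_ic(a) + intrinsic_ic(b)
    return 2.0 * ic_common / denominator if denominator > 0 else 0.0
```

In this toy graph, "Genetics" and "Zoology" share the fairly specific ancestor "Biology" and therefore score higher than "Genetics" and "Physics", whose only common ancestor is the root.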
2. Zhang, X. ; Liu, J. ; Cole, M. ; Belkin, N.: Predicting users' domain knowledge in information retrieval using multiple regression analysis of search behaviors.
In: Journal of the Association for Information Science and Technology. 66(2015) no.5, pp.980-1000.
Abstract: User domain knowledge affects search behaviors and search success. Predicting a user's knowledge level from implicit evidence such as search behaviors could allow an adaptive information retrieval system to better personalize its interaction with users. This study examines whether user domain knowledge can be predicted from search behaviors by applying a regression modeling analysis method. We identify behavioral features that contribute most to a successful prediction model. A user experiment was conducted with 40 participants searching on task topics in the domain of genomics. Participant domain knowledge level was assessed based on the users' familiarity with and expertise in the search topics and their knowledge of MeSH (Medical Subject Headings) terms in the categories that corresponded to the search topics. Users' search behaviors were captured by logging software, which includes querying behaviors, document selection behaviors, and general task interaction behaviors. Multiple regression analysis was run on the behavioral data using different variable selection methods. Four successful predictive models were identified, each involving a slightly different set of behavioral variables. The models were compared for the best on model fit, significance of the model, and contributions of individual predictors in each model. Each model was validated using the split sampling method. The final model highlights three behavioral variables as domain knowledge level predictors: the number of documents saved, the average query length, and the average ranking position of the documents opened. The results are discussed, study limitations are addressed, and future research directions are suggested.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23218/abstract.
3. Jiang, Y. ; Zhang, X. ; Tang, Y. ; Nie, R.: Feature-based approaches to semantic similarity assessment of concepts using Wikipedia.
In: Information processing and management. 51(2015) no.3, pp.215-234.
Abstract: Semantic similarity assessment between concepts is an important task in many language related applications. In the past, several approaches to assess similarity by evaluating the knowledge modeled in an (or multiple) ontology (or ontologies) have been proposed. However, there are some limitations such as the facts of relying on predefined ontologies and fitting non-dynamic domains in the existing measures. Wikipedia provides a very large domain-independent encyclopedic repository and semantic network for computing semantic similarity of concepts with more coverage than usual ontologies. In this paper, we propose some novel feature based similarity assessment methods that are fully dependent on Wikipedia and can avoid most of the limitations and drawbacks introduced above. To implement similarity assessment based on features by making use of Wikipedia, firstly a formal representation of Wikipedia concepts is presented. We then give a framework for feature based similarity based on the formal representation of Wikipedia concepts. Lastly, we investigate several feature based approaches to semantic similarity measures resulting from instantiations of the framework. The evaluation, based on several widely used benchmarks and a benchmark developed by ourselves, sustains the intuitions with respect to human judgements. Overall, several methods proposed in this paper have good human correlation and constitute some effective ways of determining similarity between Wikipedia concepts.
Content: Cf.: doi: 10.1016/j.ipm.2015.01.001.
Topic area: Semantic context in indexing and retrieval
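The abstract above does not specify the article's framework, so the following is only a minimal sketch of what a feature-based similarity can look like. It assumes Tversky's ratio model as a stand-in and treats each concept as a plain set of features; the feature sets, the weights `alpha` and `beta`, and the function name are hypothetical.

```python
def tversky_similarity(features_a, features_b, alpha=0.5, beta=0.5):
    """Tversky ratio model: shared features weighed against distinctive ones."""
    common = len(features_a & features_b)
    only_a = len(features_a - features_b)
    only_b = len(features_b - features_a)
    denominator = common + alpha * only_a + beta * only_b
    return common / denominator if denominator > 0 else 0.0


# Hypothetical feature sets, e.g. terms taken from a concept's categories or links.
car = {"vehicle", "wheels", "engine", "road"}
bicycle = {"vehicle", "wheels", "pedals", "road"}
banana = {"fruit", "yellow"}

car_vs_bicycle = tversky_similarity(car, bicycle)
car_vs_banana = tversky_similarity(car, banana)
```

With `alpha = beta = 0.5` the measure is symmetric; unequal weights make it directional, which may or may not be desirable depending on the application.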
4. Ho, S.M. ; Bieber, M. ; Song, M. ; Zhang, X.: Seeking beyond with IntegraL : a user study of sense-making enabled by anchor-based virtual integration of library systems.
In: Journal of the American Society for Information Science and Technology. 64(2013) no.9, pp.1927-1945.
Abstract: This article presents a user study showing the effectiveness of a link-based, virtual integration infrastructure that gives users access to relevant online resources, empowering them to design an information-seeking path that is specifically relevant to their context. IntegraL provides a lightweight approach to improve and augment search functionality by dynamically generating context-focused "anchors" for recognized elements of interest generated by library services. This article includes a description of how IntegraL's design supports users' information-seeking behavior. A full user study with both objective and subjective measures of IntegraL and hypothesis testing regarding IntegraL's effectiveness of the user's information-seeking experience are described along with data analysis, implications arising from this kind of virtual integration, and possible future directions.
5. Squicciarini, A.C. ; Xu, H. ; Zhang, X.(L.): CoPE: enabling collaborative privacy management in online social networks.
In: Journal of the American Society for Information Science and Technology. 62(2011) no.3, pp.521-534.
Abstract: Online Social Networks (OSNs) facilitate the creation and maintenance of interpersonal online relationships. Unfortunately, the availability of personal data on social networks may unwittingly expose users to numerous privacy risks. As a result, establishing effective methods to control personal data and maintain privacy within these OSNs has become increasingly important. This research extends the current access control mechanisms employed by OSNs to protect private information shared among users of OSNs. The proposed approach presents a system of collaborative content management that relies on an extended notion of a "content stakeholder." A tool, Collaborative Privacy Management (CoPE), is implemented as an application within a popular social-networking site, facebook.com, to ensure the protection of shared images generated by users. We present a user study of our CoPE tool through a survey-based study (n=80). The results demonstrate that regardless of whether Facebook users are worried about their privacy, they like the idea of collaborative privacy management and believe that a tool such as CoPE would be useful to manage their personal information shared within a social network.
6. Taylor, A. ; Zhang, X. ; Amadio, W.J.: Examination of relevance criteria choices and the information search process.
In: Journal of documentation. 65(2009) no.5, pp.719-744.
Abstract: Purpose - The purpose of this paper is to examine changes in relevance assessments, specifically the selection of relevance criteria by subjects as they move through the information search process. Design/methodology/approach - The paper examines the relevance criteria choices of 39 subjects in relation to search stage. Subjects were assigned a specific search task in a controlled test. Statistics were collected and analyzed using descriptive statistics and the chi-square goodness-of-fit tests. Findings - The statistically significant findings identified a number of commonly reported relevance criteria, which varied over an information search process for relevant and partially relevant judgments. These results provide statistical confirmations of previous studies, and extend these findings identifying specific criteria for both relevant and partially relevant judgments. Research limitations/implications - The study only examines a short duration search process and since the convenience sample of subjects were from similar backgrounds and were assigned similar tasks, the study did not explicitly examine the impact of contextual factors such as user experience, background or task in relation to relevance criteria choices. Practical implications - The paper has implications for the development of search systems which are adaptive and recognize the cognitive changes which occur during the information search process. Examining and identifying relevance criteria beyond topicality and the importance of those criteria to a user can help in the generation of better search queries. Originality/value - The paper adds more rigorous statistical analysis to the study of relevance criteria and the information search process.
Topic area: User studies ; Search tactics
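The abstract above names the chi-square goodness-of-fit test but reports no data, so the following is only a minimal sketch of how the test statistic is computed. The observed counts are invented for illustration, and the expected distribution is assumed to be uniform unless one is supplied; the function name is hypothetical.

```python
def chi_square_gof(observed, expected=None):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    if expected is None:
        # Assumption: with no expected counts given, test against uniformity.
        expected = [sum(observed) / len(observed)] * len(observed)
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Invented example: how often one relevance criterion was chosen
# in three successive search stages.
criterion_counts = [10, 20, 30]
statistic = chi_square_gof(criterion_counts)
degrees_of_freedom = len(criterion_counts) - 1
```

The statistic is then compared against the chi-square distribution with `degrees_of_freedom` degrees of freedom to decide whether the counts deviate significantly from the expected distribution.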
7. Zhang, X. ; Li, Y. ; Liu, J. ; Zhang, Y.: Effects of interaction design in digital libraries on user interactions.
In: Journal of documentation. 64(2008) no.3, pp.438-463.
Abstract: Purpose - This study aims to investigate the effects of different search and browse features in digital libraries (DLs) on task interactions, and what features would lead to poor user experience. Design/methodology/approach - Three operational DLs: ACM, IEEE CS, and IEEE Xplore are used in this study. These three DLs present different features in their search and browsing designs. Two information-seeking tasks are constructed: one search task and one browsing task. An experiment was conducted in a usability laboratory. Data from 35 participants are collected on a set of measures for user interactions. Findings - The results demonstrate significant differences in many aspects of the user interactions between the three DLs. For both search and browse designs, the features that lead to poor user interactions are identified. Research limitations/implications - User interactions are affected by specific design features in DLs. Some of the design features may lead to poor user performance and should be improved. The study was limited mainly in the variety and the number of tasks used. Originality/value - The study provided empirical evidence to the effects of interaction design features in DLs on user interactions and performance. The results contribute to our knowledge about DL designs in general and about the three operational DLs in particular.
Topic area: Information gateway ; User studies ; Search interfaces
8. Qin, T. ; Zhang, X.-D. ; Tsai, M.-F. ; Wang, D.-S. ; Liu, T.-Y. ; Li, H.: Query-level loss functions for information retrieval.
In: Information processing and management. 44(2008) no.2, pp.838-855.
Abstract: Many machine learning technologies such as support vector machines, boosting, and neural networks have been applied to the ranking problem in information retrieval. However, since originally the methods were not developed for this task, their loss functions do not directly link to the criteria used in the evaluation of ranking. Specifically, the loss functions are defined on the level of documents or document pairs, in contrast to the fact that the evaluation criteria are defined on the level of queries. Therefore, minimizing the loss functions does not necessarily imply enhancing ranking performances. To solve this problem, we propose using query-level loss functions in learning of ranking functions. We discuss the basic properties that a query-level loss function should have and propose a query-level loss function based on the cosine similarity between a ranking list and the corresponding ground truth. We further design a coordinate descent algorithm, referred to as RankCosine, which utilizes the proposed loss function to create a generalized additive ranking model. We also discuss whether the loss functions of existing ranking algorithms can be extended to query-level. Experimental results on the datasets of TREC web track, OHSUMED, and a commercial web search engine show that with the use of the proposed query-level loss function we can significantly improve ranking accuracies. Furthermore, we found that it is difficult to extend the document-level loss functions to query-level loss functions.
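The abstract above describes a query-level loss based on the cosine similarity between a ranking list and its ground truth, but does not give the formula. The following is a minimal sketch of one plausible form, assuming the loss for a query is half of one minus the cosine between the model's score vector and the ground-truth relevance vector. The scaling, the summation over queries, and all function names are assumptions, and the coordinate descent training of RankCosine is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def query_level_loss(scores, truth):
    """Assumed form: 0 when the vectors point the same way, 1 when opposite."""
    return 0.5 * (1.0 - cosine(scores, truth))


def total_loss(per_query):
    """Sum the loss over queries; each query counts once, however many documents it has."""
    return sum(query_level_loss(scores, truth) for scores, truth in per_query)


# Hypothetical data: (model scores, ground-truth relevance) per query.
training_queries = [
    ([0.9, 0.5, 0.1], [2, 1, 0]),
    ([0.2, 0.8], [1, 0]),
]
loss = total_loss(training_queries)
```

Because the cosine is computed per query, a query with many documents contributes no more to the total than a query with few, which is the property the abstract contrasts with document-level and pair-level losses.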
9. Zhang, X.: Concept integration of document databases using different indexing languages.
In: Information processing and management. 42(2006) no.1, pp.121-135.
Abstract: An integrated information retrieval system generally contains multiple databases that are inconsistent in terms of their content and indexing. This paper proposes a rough set-based transfer (RST) model for integration of the concepts of document databases using various indexing languages, so that users can search through the multiple databases using any of the current indexing languages. The RST model aims to effectively create meaningful transfer relations between the terms of two indexing languages, provided a number of documents are indexed with them in parallel. In our experiment, the indexing concepts of two databases respectively using the Thesaurus of Social Science (IZ) and the Schlagwortnormdatei (SWD) are integrated by means of the RST model. Finally, this paper compares the results achieved with a cross-concordance method, a conditional probability based method and the RST model.
Topic area: Semantic interoperability
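The abstract above mentions a conditional probability based method as one of the compared approaches, without detail. The following is a minimal sketch of that general idea only (not of the RST model): from documents indexed in parallel with two indexing languages, estimate how likely a term of language B is assigned given that a term of language A is assigned. The sample documents, term names, and function name are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical parallel-indexed documents:
# (terms assigned from language A, terms assigned from language B).
DOCS = [
    ({"a:unemployment"}, {"b:joblessness"}),
    ({"a:unemployment", "a:labour market"}, {"b:joblessness", "b:job market"}),
    ({"a:labour market"}, {"b:job market"}),
    ({"a:unemployment"}, {"b:job market"}),
]


def transfer_probabilities(docs):
    """Estimate P(term_b | term_a) from co-occurrence in parallel-indexed documents."""
    count_a = Counter()
    count_ab = Counter()
    for terms_a, terms_b in docs:
        count_a.update(terms_a)
        count_ab.update(product(terms_a, terms_b))
    return {pair: n / count_a[pair[0]] for pair, n in count_ab.items()}


probabilities = transfer_probabilities(DOCS)
```

A threshold on these probabilities could then decide which term pairs are kept as transfer relations; the choice of threshold is left open here.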
10. Zhang, X. ; Han, H.: An empirical testing of user stereotypes of information retrieval systems.
In: Information processing and management. 41(2005) no.3, pp.651-664.
Abstract: Stereotyping is a technique used in many information systems to represent user groups and/or to generate initial individual user models. However, there has been a lack of evidence on the accuracy of their use in representing users. We propose a formal evaluation method to test the accuracy or homogeneity of the stereotypes that are based on users' explicit characteristics. Using the method, the results of an empirical testing on 11 common user stereotypes of information retrieval (IR) systems are reported. The participants' memberships in the stereotypes were predicted using discriminant analysis, based on their IR knowledge. The actual membership and the predicted membership of each stereotype were compared. The data show that "librarians/IR professionals" is an accurate stereotype in representing its members, while some others, such as "undergraduate students" and "social sciences/humanities" users, are not accurate stereotypes. The data also demonstrate that based on the user's IR knowledge a stereotype can be made more accurate or homogeneous. The results show the promise that our method can help better detect the differences among stereotype members, and help with better stereotype design and user modeling. We assume that accurate stereotypes have better performance in user modeling and thus the system performance. Limitations and future directions of the study are discussed.
11. Zhang, X.: Collaborative relevance judgment : a group consensus method for evaluating user search performance.
In: Journal of the American Society for Information Science and Technology. 53(2002) no.3, pp.220-231.
Abstract: Relevance judgment has traditionally been considered a personal and subjective matter. A user's search and the search result are treated as an isolated event. To consider the collaborative nature of information retrieval (IR) in a group/organization or even societal context, this article proposes a method that measures relevance based on group/peer consensus. The method can be used in IR experiments. In this method, the relevance of a document is decided by group consensus, or more specifically, by the number of users (or experiment participants) who retrieve it for the same search question. The more users who retrieve it, the more relevant the document will be considered. A user's search performance can be measured by a relevance score based on this notion. The article reports the results of an experiment using this method to compare the search performance of different types of users. Related issues with the method and future directions are also discussed.
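The consensus idea in the abstract above can be sketched as follows. The relevance of a document is taken to be the number of users who retrieved it for the same search question, as the abstract states. How a user's performance score is aggregated from these values is not given in the abstract, so summing the relevance of the documents a user retrieved is an assumption, and the sample data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical result sets for one search question: user -> retrieved documents.
RETRIEVED = {
    "user_a": {"d1", "d2", "d3"},
    "user_b": {"d1", "d2"},
    "user_c": {"d1", "d4"},
}


def consensus_relevance(retrieved):
    """Relevance of a document = number of users who retrieved it."""
    counts = Counter()
    for docs in retrieved.values():
        counts.update(docs)
    return dict(counts)


def performance_scores(retrieved):
    """Assumed aggregation: sum the consensus relevance of each user's documents."""
    relevance = consensus_relevance(retrieved)
    return {
        user: sum(relevance[doc] for doc in docs)
        for user, docs in retrieved.items()
    }


relevance = consensus_relevance(RETRIEVED)
scores = performance_scores(RETRIEVED)
```

A plain sum rewards large result sets; averaging per retrieved document would be an alternative aggregation with a different bias.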
12. Zhang, X. ; Chignell, M.: Assessment of the effects of user characteristics on mental models of information retrieval systems.
In: Journal of the American Society for Information Science and Technology. 52(2001) no.6, pp.445-459.
Abstract: This article reports the results of a study that investigated effects of four user characteristics on users' mental models of information retrieval systems: educational and professional status, first language, academic background, and computer experience. The repertory grid technique was used in the study. Using this method, important components of information retrieval systems were represented by nine concepts, based on four IR experts' judgments. Users' mental models were represented by factor scores that were derived from users' matrices of concept ratings on different attributes of the concepts. The study found that educational and professional status, academic background, and computer experience had significant effects in differentiating users on their factor scores. First language had a borderline effect, but the effect was not significant at the α = 0.05 level. Specific different views regarding IR systems among different groups of users are described and discussed. Implications of the study for information science and IR system designs are suggested.