This database contains more than 40,000 documents on topics from the fields of descriptive cataloguing, subject indexing and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 16 December 2019)
1. Wang, P. ; Li, X.: Assessing the quality of information on Wikipedia : a deep-learning approach.
In: Journal of the Association for Information Science and Technology. 71(2020) no.1, pp.16-28.
Abstract: Currently, web document repositories are created and edited collaboratively. One of these repositories, Wikipedia, is facing an important problem: assessing the quality of Wikipedia. Existing approaches exploit techniques such as statistical models or machine learning algorithms to assess Wikipedia article quality. However, existing models do not provide satisfactory results. Furthermore, these models fail to adopt a comprehensive feature framework. In this article, we conduct an extensive survey of previous studies and summarize a comprehensive feature framework, including text statistics, writing style, readability, article structure, network, and editing history. Selected state-of-the-art deep-learning models, including the convolutional neural network (CNN), deep neural network (DNN), long short-term memory (LSTM) network, CNN-LSTMs, bidirectional LSTMs, and stacked LSTMs, are applied to assess the quality of Wikipedia. A detailed comparison of deep-learning models is conducted with regard to two aspects: classification performance and training performance. We include an importance analysis of different features and feature sets to determine which features or feature sets are most effective in distinguishing Wikipedia article quality. This extensive experiment validates the effectiveness of the proposed model.
Content: See https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24210.
2. Li, X. ; Rijke, M. de: Characterizing and predicting downloads in academic search.
In: Information processing and management. 56(2019) no.3, pp.394-407.
Abstract: Numerous studies have been conducted on the information interaction behavior of search engine users. Few studies have considered information interactions in the domain of academic search. We focus on conversion behavior in this domain. Conversions have been widely studied in the e-commerce domain, e.g., for online shopping and hotel booking, but little is known about conversions in academic search. We start with a description of a unique dataset of a particular type of conversion in academic search, viz. users' downloads of scientific papers. Then we move to an observational analysis of users' download actions. We first characterize user actions and show their statistics in sessions. Then we focus on behavioral and topical aspects of downloads, revealing behavioral correlations across download sessions. We discover unique properties that differ from other conversion settings such as online shopping. Using insights gained from these observations, we consider the task of predicting the next download. In particular, we focus on predicting the time until the next download session, and on predicting the number of downloads. We cast these as time series prediction problems and model them using LSTMs. We develop a specialized model built on user segmentations that achieves significant improvements over the state of the art.
Content: See https://doi.org/10.1016/j.ipm.2018.10.019.
3. Lu, W. ; Li, X. ; Liu, Z. ; Cheng, Q.: How do author-selected keywords function semantically in scientific manuscripts?
In: Knowledge organization. 46(2019) no.6, pp.403-418.
Abstract: Author-selected keywords have been widely utilized for indexing, information retrieval, bibliometrics and knowledge organization in previous studies. However, few studies exist concerning how author-selected keywords function semantically in scientific manuscripts. In this paper, we investigated this problem from the perspective of term function (TF) by devising indicators of the diversity and symmetry of keyword term functions in papers, as well as the intensity of individual term functions in papers. The data obtained from the whole Journal of Informetrics (JOI) were manually processed by an annotation scheme of keyword term functions, including "research topic," "research method," "research object," "research area," "data" and "others," based on empirical work in content analysis. The results show, quantitatively, that the diversity of keyword term function decreases, and the irregularity increases with the number of author-selected keywords in a paper. Moreover, the distribution of the intensity of individual keyword term function indicated that no significant difference exists between the ranking of the five term functions with the increase of the number of author-selected keywords (i.e., "research topic" > "research method" > "research object" > "research area" > "data"). The findings indicate that precise keyword-related research must take into account the distinct types of author-selected keywords.
4. Su, S. ; Li, X. ; Cheng, X. ; Sun, C.: Location-aware targeted influence maximization in social networks.
In: Journal of the Association for Information Science and Technology. 69(2018) no.2, pp.229-241.
Abstract: In this paper, we study the location-aware targeted influence maximization problem in social networks, which finds a seed set to maximize the influence spread over the targeted users. In particular, we consider those users who have both topic and geographical preferences on promotion products as targeted users. To efficiently solve this problem, one challenge is how to find the targeted users and compute their preferences efficiently for given requests. To address this challenge, we devise a TR-tree index structure, where each tree node stores users' topic and geographical preferences. By traversing the TR-tree in depth-first order, we can efficiently find the targeted users. Another challenge of the problem is to devise algorithms for efficient seed selection. We solve this challenge from two complementary directions. In one direction, we adopt the maximum influence arborescence (MIA) model to approximate the influence spread, and propose two efficient approximation algorithms with a provable approximation ratio, which prune some candidate seeds with small influences by precomputing users' initial influences offline and estimating the upper bound of their marginal influences online. In the other direction, we propose a fast heuristic algorithm to improve efficiency. Experiments conducted on real-world data sets demonstrate the effectiveness and efficiency of our proposed algorithms.
Content: See http://onlinelibrary.wiley.com/doi/10.1002/asi.23931/full.
5. Li, X. ; Zhang, A. ; Li, C. ; Ouyang, J. ; Cai, Y.: Exploring coherent topics by topic modeling with term weighting.
In: Information processing and management. 54(2018) no.6, pp.1345-1358.
Abstract: Topic models often produce unexplainable topics that are filled with noisy words. The reason is that words in topic modeling have equal weights. High-frequency words dominate the top topic word lists, but most of them are meaningless words, e.g., domain-specific stopwords. To address this issue, in this paper we aim to investigate how to weight words, and then develop a straightforward but effective term weighting scheme, namely entropy weighting (EW). The proposed EW scheme is based on conditional entropy measured by word co-occurrences. Compared with existing term weighting schemes, the highlight of EW is that it can automatically reward informative words. For more robust word weights, we further suggest a combination form of EW (CEW) with two existing weighting schemes. Basically, our CEW assigns meaningless words lower weights and informative words higher weights, leading to more coherent topics during topic modeling inference. We apply CEW to Dirichlet multinomial mixture and latent Dirichlet allocation, and evaluate it by topic quality, document clustering and classification tasks on eight real-world data sets. Experimental results show that weighting words can effectively improve the topic modeling performance over both short texts and normal long texts. More importantly, the proposed CEW significantly outperforms the existing term weighting schemes, since it further considers which words are informative.
Content: See https://doi.org/10.1016/j.ipm.2018.05.009.
Subject area: Automatic indexing
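The abstract of entry 5 describes weighting words by conditional entropy over word co-occurrences, so that words spread evenly across contexts (such as domain-specific stopwords) receive low weights. The abstract does not give the EW or CEW formulas, so the following is only a minimal sketch under our own assumptions: p(c | w) is estimated from document-level co-occurrence counts, the entropy H(C | w) is normalised by its maximum possible value, and the weight is 1 minus that normalised entropy. The function name `entropy_weights` is hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy_weights(docs: list[list[str]]) -> dict[str, float]:
    """Hypothetical entropy-based term weights (an assumption, not the paper's EW formula).

    For each word w, estimate p(c | w) from how often c occurs in the same documents
    as w, compute the entropy H(C | w), and map it to a weight in [0, 1]:
    evenly spread (stopword-like) words get low weights, focused words get high ones.
    """
    cooc: dict[str, Counter] = defaultdict(Counter)
    vocab: set[str] = set()
    for doc in docs:
        counts = Counter(doc)
        vocab.update(counts)
        for w in counts:
            for c, n in counts.items():
                if c != w:
                    cooc[w][c] += n  # count of c in documents that contain w

    # Maximum entropy: a uniform distribution over all other vocabulary words.
    max_h = math.log(len(vocab) - 1) if len(vocab) > 2 else 1.0
    weights: dict[str, float] = {}
    for w in vocab:
        total = sum(cooc[w].values())
        if total == 0:
            weights[w] = 1.0  # no co-occurrence evidence; treated as informative
            continue
        h = -sum((n / total) * math.log(n / total) for n in cooc[w].values())
        weights[w] = 1.0 - h / max_h
    return weights


docs = [["the", "neural", "network"], ["the", "topic", "model"], ["the", "query", "search"]]
weights = entropy_weights(docs)
```

In this toy input "the" co-occurs uniformly with every other word, so its weight is about 0, while "neural" keeps a clearly positive weight. The combination with two further weighting schemes (CEW) is not sketched, because the abstract does not name those schemes.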
6. Li, X. ; Cox, A. ; Ford, N. ; Creaser, C. ; Fry, J. ; Willett, P.: Knowledge construction by users : a content analysis framework and a knowledge construction process model for virtual product user communities.
In: Journal of documentation. 73(2017) no.2, pp.284-304.
Abstract: Purpose - The purpose of this paper is to develop a content analysis framework and from that derive a process model of knowledge construction in the context of virtual product user communities, organization-sponsored online forums where product users collaboratively construct knowledge to solve their technical problems. Design/methodology/approach - The study is based on a deductive and qualitative content analysis of discussion threads about solving technical problems selected from a series of virtual product user communities. Data are complemented with thematic analysis of interviews with forum members. Findings - The research develops a content analysis framework for knowledge construction. It is based on a combination of existing codes derived from frameworks developed for computer-supported collaborative learning and new categories identified from the data. Analysis using this framework allows the authors to propose a knowledge construction process model showing how these elements are organized around a typical "trial and error" knowledge construction strategy. Practical implications - The research makes suggestions about organizations' management of knowledge activities in virtual product user communities, including moderators' roles in facilitation. Originality/value - The paper outlines a new framework for analysing knowledge activities where there is a low level of critical thinking and a model of knowledge construction by trial and error. The new framework and model can be applied in other similar contexts.
Content: See http://www.emeraldinsight.com/doi/full/10.1108/JD-05-2016-0060.
7. Xu, G. ; Cao, Y. ; Ren, Y. ; Li, X. ; Feng, Z.: Network security situation awareness based on semantic ontology and user-defined rules for Internet of Things.
Abstract: The Internet of Things (IoT) brings the third development wave of the global information industry, making users, networks and perception devices cooperate more closely. However, if IoT has security problems, it may cause a variety of damage and even threaten human lives and property. To improve the abilities of monitoring, providing emergency response and predicting the development trend of IoT security, a new paradigm called network security situation awareness (NSSA) is proposed. However, NSSA is limited in its ability to mine and evaluate security situation elements from multi-source heterogeneous network security information. To solve this problem, this paper proposes an IoT network security situation awareness model using a situation reasoning method based on semantic ontology and user-defined rules. Ontology technology can provide a unified and formalized description to solve the problem of semantic heterogeneity in the IoT security domain. In this paper, four key sub-domains are proposed to reflect an IoT security situation: context, attack, vulnerability and network flow. Further, user-defined rules can compensate for the limited description ability of ontology, and hence can enhance the reasoning ability of our proposed ontology model. Examples from real IoT scenarios show that network security situation awareness using our situation reasoning method is more comprehensive and has more powerful reasoning abilities than traditional NSSA methods. [http://ieeexplore.ieee.org/abstract/document/7999187/]
Content: DOI 10.1109/ACCESS.2017.2734681.
8. Li, X. ; Schijvenaars, B.J.A. ; Rijke, M. de: Investigating queries and search failures in academic search.
In: Information processing and management. 53(2017) no.3, pp.666-683.
Abstract: Academic search concerns the retrieval and profiling of information objects in the domain of academic research. In this paper we report important observations on academic search queries, and provide an algorithmic solution to address a type of failure during search sessions: null queries. We start by providing a general characterization of academic search queries, by analyzing a large-scale transaction log of a leading academic search engine. Unlike previous small-scale analyses of academic search queries, we find important differences with query characteristics known from web search. E.g., in academic search there is a substantially bigger proportion of entity queries, and a heavier tail in query length distribution. We then focus on search failures and, in particular, on null queries that lead to an empty search engine result page, on null sessions that contain such null queries, and on users who are prone to issue null queries. In academic search approximately 1 in 10 queries is a null query, and 25% of the sessions contain a null query. They appear in different types of search sessions, and prevent users from achieving their search goal. To address the high rate of null queries in academic search, we consider the task of providing query suggestions. Specifically, we focus on a highly frequent query type: non-Boolean informational queries. To this end we need to overcome query sparsity and make effective use of session information. We find that using entities helps to surface more relevant query suggestions in the face of query sparsity. We also find that query suggestions should be conditioned on the type of session in which they are offered to be more effective. After casting the session classification problem as a multi-label classification problem, we generate session-conditional query suggestions based on predicted session type. We find that this session-conditional method leads to significant improvements over a generic query suggestion method. Personalization yields very little further improvement over session-conditional query suggestions.
Content: See http://www.sciencedirect.com/science/article/pii/S0306457316304071.
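Entry 8 reports two failure rates from a transaction log: the share of queries that are null queries (roughly 1 in 10) and the share of sessions containing at least one null query (25%). The sketch below shows how such rates could be computed; the log record layout (`LogEntry` with session ID, query and result count) and the function name `null_query_rates` are hypothetical assumptions, not the authors' data schema.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    """One hypothetical log record: a query issued in a session and its result count."""
    session_id: str
    query: str
    num_results: int


def null_query_rates(log: list[LogEntry]) -> tuple[float, float]:
    """Return (share of null queries, share of sessions with at least one null query).

    A null query is taken to be one whose result page is empty (num_results == 0).
    """
    if not log:
        return 0.0, 0.0
    null_queries = sum(1 for e in log if e.num_results == 0)
    sessions = {e.session_id for e in log}
    null_sessions = {e.session_id for e in log if e.num_results == 0}
    return null_queries / len(log), len(null_sessions) / len(sessions)


log = [
    LogEntry("s1", "novelty detection", 5),
    LogEntry("s1", "ip-band trec", 0),   # null query, so session s1 is a null session
    LogEntry("s2", "relevance model", 5),
    LogEntry("s2", "language model", 7),
]
query_rate, session_rate = null_query_rates(log)
```

Here one of four queries is null (0.25), and one of two sessions contains a null query (0.5). Counting sessions as a set keeps a session with several null queries from being counted twice.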
9. Xie, H. ; Li, X. ; Wang, T. ; Lau, R.Y.K. ; Wong, T.-L. ; Chen, L. ; Wang, F.L. ; Li, Q.: Incorporating sentiment into tag-based user profiles and resource profiles for personalized search in folksonomy.
In: Information processing and management. 52(2016) no.1, pp.61-72.
Abstract: In recent years, there has been a rapid growth of user-generated data in collaborative tagging (a.k.a. folksonomy-based) systems due to the prevalence of Web 2.0 communities. To effectively assist users to find their desired resources, it is critical to understand user behaviors and preferences. Tag-based profile techniques, which model users and resources by a vector of relevant tags, are widely employed in folksonomy-based systems. This is mainly because personalized search and recommendations can be facilitated by measuring relevance between user profiles and resource profiles. However, conventional measurements neglect the sentiment aspect of user-generated tags. In fact, tags can be very emotional and subjective, as users usually express their perceptions and feelings about the resources by tags. Therefore, it is necessary to take sentiment relevance into account in these measurements. In this paper, we present a novel generic framework, SenticRank, which incorporates various kinds of sentiment information into user profiles and resource profiles for personalized search. In this framework, content-based sentiment ranking and collaborative sentiment ranking methods are proposed to obtain sentiment-based personalized ranking. To the best of our knowledge, this is the first work to integrate sentiment information to address the problem of personalized tag-based search in collaborative tagging systems. Moreover, we compare the proposed sentiment-based personalized search with baselines in the experiments, the results of which verify the effectiveness of the proposed framework. In addition, we study the influence of popular sentiment dictionaries and find that SenticNet is the most prominent knowledge base for boosting the performance of personalized search in folksonomy.
Content: See doi:10.1016/j.ipm.2015.03.001.
Note: Contribution to a special issue "Emotion and sentiment in social and expressive media"
Subject area: Folksonomies ; Content analysis
10. Li, X. ; Thelwall, M. ; Kousha, K.: The role of arXiv, RePEc, SSRN and PMC in formal scholarly communication.
In: Aslib journal of information management. 67(2015) no.6, pp.614-635.
Abstract: Purpose - The four major Subject Repositories (SRs), arXiv, Research Papers in Economics (RePEc), Social Science Research Network (SSRN) and PubMed Central (PMC), are all important within their disciplines, but no previous study has systematically compared how often they are cited in academic publications. In response, the purpose of this paper is to report an analysis of citations to SRs from Scopus publications, 2000-2013. Design/methodology/approach - Scopus searches were used to count the number of documents citing the four SRs in each year. A random sample of 384 documents citing the four SRs was then visited to investigate the nature of the citations. Findings - Each SR was most cited within its own subject area but attracted substantial citations from other subject areas, suggesting that they are open to interdisciplinary uses. The proportion of documents citing each SR is continuing to increase rapidly, and the SRs all seem to attract substantial numbers of citations from more than one discipline. Research limitations/implications - Scopus does not cover all publications, and most citations to documents found in the four SRs presumably cite the published version, when one exists, rather than the repository version. Practical implications - SRs are continuing to grow and do not seem to be threatened by institutional repositories, so research managers should encourage their continued use within their core disciplines, including for research that aims at an audience in other disciplines. Originality/value - This is the first simultaneous analysis of Scopus citations to the four most popular SRs.
Content: See http://www.emeraldinsight.com/doi/abs/10.1108/AJIM-03-2015-0049.
Subject area: Electronic publishing
Object: arXiv ; Research Papers in Economics ; Social Science Research Network ; PubMed Central
11. Thelwall, M. ; Li, X. ; Barjak, F. ; Robinson, S.: Assessing the international web connectivity of research groups.
In: Aslib proceedings. 60(2008) no.1, pp.18-31.
Abstract: Purpose - The purpose of this paper is to claim that it is useful to assess the web connectivity of research groups, describe hyperlink-based techniques to achieve this and present brief details of European life sciences research groups as a case study. Design/methodology/approach - A commercial search engine was harnessed to deliver hyperlink data via its automatic query submission interface. A special purpose link analysis tool, LexiURL, then summarised and graphed the link data in appropriate ways. Findings - Webometrics can provide a wide range of descriptive information about the international connectivity of research groups. Research limitations/implications - Only one field was analysed, data was taken from only one search engine, and the results were not validated. Practical implications - Web connectivity seems to be particularly important for attracting overseas job applicants and to promote research achievements and capabilities, and hence we contend that it can be useful for national and international governments to use webometrics to ensure that the web is being used effectively by research groups. Originality/value - This is the first paper to make a case for the value of using a range of webometric techniques to evaluate the web presences of research groups within a field, and possibly the first "applied" webometrics study produced for an external contract.
12. Li, J. ; Zhang, Z. ; Li, X. ; Chen, H.: Kernel-based learning for biomedical relation extraction.
In: Journal of the American Society for Information Science and Technology. 59(2008) no.5, pp.756-769.
Abstract: Relation extraction is the process of scanning text for relationships between named entities. Recently, significant studies have focused on automatically extracting relations from biomedical corpora. Most existing biomedical relation extractors require manual creation of biomedical lexicons or parsing templates based on domain knowledge. In this study, we propose to use kernel-based learning methods to automatically extract biomedical relations from literature text. We develop a framework of kernel-based learning for biomedical relation extraction. In particular, we modified the standard tree kernel function by incorporating a trace kernel to capture richer contextual information. In our experiments on a biomedical corpus, we compare different kernel functions for biomedical relation detection and classification. The experimental results show that a tree kernel outperforms word and sequence kernels for relation detection, our trace-tree kernel outperforms the standard tree kernel, and a composite kernel outperforms individual kernels for relation extraction.
13. Li, X.: A new robust relevance model in the language model framework.
In: Information processing and management. 44(2008) no.3, pp.991-1007.
Abstract: In this paper, a new robust relevance model is proposed that can be applied to both pseudo and true relevance feedback in the language-modeling framework for document retrieval. There are at least three main differences between our new relevance model and other relevance models. First, the proposed model brings back the original query into the relevance model by treating it as a short, special document, in addition to a number of top-ranked documents returned from the first round retrieval for pseudo feedback, or a number of relevant documents for true relevance feedback. Second, instead of using a uniform prior as in the original relevance model proposed by Lavrenko and Croft, documents are assigned different priors according to their lengths (in terms) and ranks in the first round retrieval. Third, the probability of a term in the relevance model is further adjusted by its probability in a background language model. In both pseudo and true relevance cases, we have compared the performance of our model to that of two baselines: the original relevance model and a linear combination model. Our experimental results show that the proposed new model outperforms both baselines in terms of mean average precision.
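The first two modifications described in entry 13 can be illustrated on top of the usual RM1-style estimate P(w | R) ∝ Σ_D P(D) · P(w | D) · P(Q | D). The sketch below adds the query as a short pseudo-document and gives documents non-uniform priors; the concrete prior (log length divided by rank), the Jelinek-Mercer smoothing against the pooled feedback text, and the function name `relevance_model` are our own assumptions, not the paper's formulas. The third modification, adjustment by a background language model, is left out because the abstract does not say how it is done.

```python
import math
from collections import Counter


def relevance_model(query: list[str], feedback_docs: list[list[str]],
                    lam: float = 0.5) -> dict[str, float]:
    """Sketch of P(w | R) with the query treated as an extra pseudo-document.

    Assumptions (not from the paper): document models are Jelinek-Mercer smoothed
    against the pooled feedback text with weight `lam`; the prior of the document
    at rank r (1-based) is log(1 + length) / r; the query pseudo-document takes
    rank 1, so the feedback documents take ranks 2, 3, ...
    """
    docs = [query] + feedback_docs  # query as a short, special document
    pooled = Counter(t for d in docs for t in d)
    pooled_total = sum(pooled.values())

    def p_w_d(w: str, counts: Counter, length: int) -> float:
        ml = counts[w] / length if length else 0.0
        return (1 - lam) * ml + lam * pooled[w] / pooled_total

    scores: Counter = Counter()
    for rank, doc in enumerate(docs, start=1):
        counts, length = Counter(doc), len(doc)
        prior = math.log(1 + length) / rank  # hypothetical length- and rank-based prior
        q_lik = math.prod(p_w_d(q, counts, length) for q in query)  # P(Q | D)
        for w in pooled:
            scores[w] += prior * q_lik * p_w_d(w, counts, length)
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}


rm = relevance_model(["novelty"], [["novelty", "detection", "trec"], ["sentence", "pattern"]])
```

In this toy example the query term gets the largest probability mass, and "detection", which co-occurs with the query term in a feedback document, outranks "sentence", which does not. Any real prior or background adjustment would have to be taken from the paper itself.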
14. Li, X. ; Croft, W.B.: An information-pattern-based approach to novelty detection.
In: Information processing and management. 44(2008) no.3, pp.1159-1188.
Abstract: In this paper, a new novelty detection approach based on the identification of sentence level information patterns is proposed. First, "novelty" is redefined based on the proposed information patterns, and several different types of information patterns are given corresponding to different types of users' information needs. Second, a thorough analysis of sentence level information patterns is elaborated using data from the TREC novelty tracks, including sentence lengths, named entities (NEs), and sentence level opinion patterns. Finally, a unified information-pattern-based approach to novelty detection (ip-BAND) is presented for both specific NE topics and more general topics. Experiments on novelty detection on data from the TREC 2002, 2003 and 2004 novelty tracks show that the proposed approach significantly improves the performance of novelty detection in terms of precision at top ranks. Future research directions are suggested.
15. Barjak, F. ; Li, X. ; Thelwall, M.: Which factors explain the Web impact of scientists' personal homepages?
In: Journal of the American Society for Information Science and Technology. 58(2007) no.2, pp.200-211.
Abstract: In recent years, a considerable body of Webometric research has used hyperlinks to generate indicators for the impact of Web documents and the organizations that created them. The relationship between this Web impact and other, offline impact indicators has been explored for entire universities, departments, countries, and scientific journals, but not yet for individual scientists, an important omission. The present research closes this gap by investigating factors that may influence the Web impact (i.e., inlink counts) of scientists' personal homepages. Data concerning 456 scientists from five scientific disciplines in six European countries were analyzed, showing that both homepage content and personal and institutional characteristics of the homepage owners had significant relationships with inlink counts. A multivariate statistical analysis confirmed that full-text articles are the most linked-to content in homepages. At the individual homepage level, hyperlinks are related to several offline characteristics. Notable differences regarding total inlinks to scientists' homepages exist between the scientific disciplines and the countries in the sample. There are also both gender and age effects: fewer external inlinks (i.e., links from other Web domains) to the homepages of female and of older scientists. There is only a weak relationship between a scientist's recognition and homepage inlinks and, surprisingly, no relationship between research productivity and inlink counts. Contrary to expectations, the size of collaboration networks is negatively related to hyperlink counts. Some of the relationships between hyperlinks to homepages and the properties of their owners can be explained by the content that the homepage owners put on their homepage and their level of Internet use; however, the findings about productivity and collaborations do not seem to have a simple, intuitive explanation. Overall, the results emphasize the complexity of the phenomenon of Web linking, when analyzed at the level of individual pages.
16. Yan, X. ; Li, X. ; Song, D.: A correlation analysis on LSA and HAL semantic space models.
In: Computational and information science. First International Symposium, CIS 2004, Shanghai, China, December 16-18, 2004. Proceedings. Eds.: J. Zhang et al. Berlin : Springer, 2004. pp.711-717.
(Lecture notes in computer science; vol. 3314)
Abstract: In this paper, we compare a well-known semantic space model, Latent Semantic Analysis (LSA), with another model, Hyperspace Analogue to Language (HAL), which is widely used in different areas, especially in automatic query refinement. We conduct this comparative analysis to prove our hypothesis that, with respect to the ability to extract lexical information from a corpus of text, LSA is quite similar to HAL. We regard HAL and LSA as black boxes. Through a Pearson correlation analysis of the outputs of these two black boxes, we conclude that LSA correlates highly with HAL, and thus there is a justification that LSA and HAL can potentially play a similar role in facilitating automatic query refinement. This paper evaluates LSA in a new application area and contributes an effective way to compare different semantic space models.
Subject area: Semantic environment in indexing and retrieval
Object: Hyperspace Analogue to Language ; Latent Semantic Analysis
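Entry 16 treats LSA and HAL as black boxes and compares their outputs with Pearson's correlation coefficient. A minimal sketch of that final step, assuming each model's output has already been reduced to one similarity score per word pair; the function name `pearson_r` and the score lists are hypothetical illustrations, not the paper's data.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical similarity scores for the same five word pairs from each model.
lsa_scores = [0.82, 0.10, 0.55, 0.31, 0.67]
hal_scores = [0.78, 0.15, 0.49, 0.35, 0.70]
print(round(pearson_r(lsa_scores, hal_scores), 3))
```

A value close to 1 would indicate that the two models rank word-pair similarities in nearly the same way, which is the kind of evidence the abstract uses to argue that LSA can play a role similar to HAL.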
17. Li, X. ; Fullerton, J.P.: Create, edit, and manage Web database content using active server pages.
In: Library hi tech. 20(2002) no.3, pp.285-301.
Abstract: Libraries have been integrating active server pages (ASP) with Web-based databases for searching and retrieving electronic information for the past five years; however, a literature review reveals that a more complete description of modifying data through the Web interface is needed. At the Texas A&M University Libraries, a Web database of Internet links was developed using ASP, Microsoft Access, and Microsoft Internet Information Server (IIS) to facilitate use of online resources. The implementation of the Internet Links database is described with focus on its data management functions. Also described are other library applications of ASP technology. The project explores a more complete approach to library Web database applications than was found in the current literature and should serve to facilitate reference service.
Content: See also http://www.emeraldinsight.com/10.1108/07378830210444487.
18. Li, X.: Designing an interactive Web tutorial with cross-browser dynamic HTML.
In: Library hi tech. 18(2000) no.4, pp.369-382.
Abstract: Texas A&M University Libraries developed a Web-based training (WBT) application for LandView III, a federal depository CD-ROM publication, using cross-browser dynamic HTML (DHTML) and other Web technologies. The interactive and self-paced tutorial demonstrates the major features of the CD-ROM and shows how to navigate the programs. The tutorial features dynamic HTML techniques, such as hiding, showing and moving layers; dragging objects; and windows-style drop-down menus. It also integrates interactive forms, common gateway interface (CGI), frames, and animated GIF images in the design of the WBT. After describing the design and implementation of the tutorial project, the article evaluates usage statistics and user feedback, assesses the tutorial's strengths and weaknesses, and compares it with other common types of training methods. The present article describes an innovative approach for CD-ROM training using advanced Web technologies such as dynamic HTML, which can simulate and demonstrate the interactive use of the CD-ROM, as well as the actual search process of a database.
Content: See also http://www.emeraldinsight.com/10.1108/07378830010360464.
19. Li, X. ; Crane, N.: Electronic styles : a handbook for citing electronic information. 2nd ed.
Medford, NJ : Information Today, 1996. 214 pp.
Abstract: The second edition of the best-selling guide to referencing electronic information and citing the complete range of electronic formats includes text-based information, electronic journals and discussion lists, Web sites, CD-ROM and multimedia products, and commercial online documents.
Form covered: Electronic documents
20. Li, X. ; Crane, N.B.: Electronic styles : a guide to citing electronic information.
Westport, CT : Meckler, 1993. xi, 65 pp.
Note: Reviewed in: Library resources and technical services 38(1994) no.2, pp.199-201 (C.J. Palowitch)
Form covered: Electronic storage media
LCSH: Citation of electronic information resources
RSWK: Information / Datenbank / Zitat (SWB) ; Literaturdatenbank / Führer (BVB) ; Datenbank / Führer (BVB) ; Literaturdatenbank / Zitat / Führer (BVB) ; Datenbank / Zitat / Führer (BVB) ; Zitat / Informationssystem (BVB)
DDC: 808/.027 / dc20
GHBS: ALB (W)
LCC: PN171.D37L5 1993
RVK: ST 620 Computer science / Monographs / Individual applications of data processing / Data processing in application areas / Engineering ; QP 500 (BVB)