This database contains more than 40,000 documents on topics from the fields of descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 4 June 2021)
1. Zhang, Y. ; Zhang, C.: Enhancing keyphrase extraction from microblogs using human reading time.
In: Journal of the Association for Information Science and Technology. 72(2021) no.5, pp. 611-626.
Abstract: The premise of manual keyphrase annotation is to read the corresponding content of an annotated object. Intuitively, when we read, more important words will occupy a longer reading time. Hence, by leveraging human reading time, we can find the salient words in the corresponding content. However, previous studies on keyphrase extraction ignore human reading features. In this article, we aim to leverage human reading time to extract keyphrases from microblog posts. There are two main tasks in this study. One is to determine how to measure the time spent by a human on reading a word. We use eye fixation durations (FDs) extracted from an open source eye-tracking corpus. Moreover, we propose strategies to make eye FD more effective on keyphrase extraction. The other task is to determine how to integrate human reading time into keyphrase extraction models. We propose two novel neural network models. The first is a model in which the human reading time is used as the ground truth of the attention mechanism. In the second model, we use human reading time as the external feature. Quantitative and qualitative experiments show that our proposed models yield better performance than the baseline models on two microblog datasets.
Content: Cf.: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24430.
2. Qu, Z. ; Tan, B.C.Y. (eds.): Yao, X. ; Zhang, C.: Global village or virtual balkans? : evolution and performance of scientific collaboration in the information age.
In: Journal of the Association for Information Science and Technology. 71(2020) no.4, pp. 395-408.
Abstract: Scientific collaboration is essential and almost imperative in modern science. However, collaboration may be difficult to achieve because of 2 major barriers: geographic distance and social divides. It is predicted that the advancement of information communication technologies (ICTs) will bring a puzzled conclusion for collaboration in the scientific community: the "Global Village" trend with significantly increased physical distance among collaborated scientists and the "Virtual Balkans" trend with significantly increased social stratification among collaborated scientists. The results of this study reveal that the scientific community evolves towards the Global Village generally on both the geographic and social dimension, but with variations in term of collaboration patterns. The influence of such collaboration patterns on research performance (that is, productivity and impact), however, is asymmetric to each side of collaborators. When researchers from top-tier and general-tier institutions collaborate, researchers from top-tier institutions face a decrease in research productivity and impact, whereas researchers from general-tier institutions increase in research productivity and impact. Furthermore, the development of ICTs plays an important role in shaping the evolving trends and moderating effects of collaboration patterns. Our findings provide a comprehensive understanding of scientific collaboration in the geographic, social, and technological aspect.
3. Zhang, Y. ; Zhang, C. ; Li, J.: Joint modeling of characters, words, and conversation contexts for microblog keyphrase extraction.
In: Journal of the Association for Information Science and Technology. 71(2020) no.5, pp. 553-567.
Abstract: Millions of messages are produced on microblog platforms every day, leading to the pressing need for automatic identification of key points from the massive texts. To absorb salient content from the vast bulk of microblog posts, this article focuses on the task of microblog keyphrase extraction. In previous work, most efforts treat messages as independent documents and might suffer from the data sparsity problem exhibited in short and informal microblog posts. On the contrary, we propose to enrich contexts via exploiting conversations initialized by target posts and formed by their replies, which are generally centered around relevant topics to the target posts and therefore helpful for keyphrase identification. Concretely, we present a neural keyphrase extraction framework, which has 2 modules: a conversation context encoder and a keyphrase tagger. The conversation context encoder captures indicative representation from their conversation contexts and feeds the representation into the keyphrase tagger, and the keyphrase tagger extracts salient words from target posts. The 2 modules were trained jointly to optimize the conversation context encoding and keyphrase extraction processes. In the conversation context encoder, we leverage hierarchical structures to capture the word-level indicative representation and message-level indicative representation hierarchically. In both of the modules, we apply character-level representations, which enables the model to explore morphological features and deal with the out-of-vocabulary problem caused by the informal language style of microblog messages. Extensive comparison results on real-life data sets indicate that our model outperforms state-of-the-art models from previous studies.
Subject field: Automatic indexing ; Computational linguistics
4. Lu, C. ; Zhang, Y. ; Ahn, Y.-Y. ; Ding, Y. ; Zhang, C. ; Ma, D.: Co-contributorship network and division of labor in individual scientific collaborations.
In: Journal of the Association for Information Science and Technology. 71(2020) no.10, pp. 1162-1178.
Abstract: Collaborations are pervasive in current science. Collaborations have been studied and encouraged in many disciplines. However, little is known about how a team really functions from the detailed division of labor within. In this research, we investigate the patterns of scientific collaboration and division of labor within individual scholarly articles by analyzing their co-contributorship networks. Co-contributorship networks are constructed by performing the one-mode projection of the author-task bipartite networks obtained from 138,787 articles published in PLoS journals. Given an article, we define 3 types of contributors: Specialists, Team-players, and Versatiles. Specialists are those who contribute to all their tasks alone; team-players are those who contribute to every task with other collaborators; and versatiles are those who do both. We find that team-players are the majority and they tend to contribute to the 5 most common tasks as expected, such as "data analysis" and "performing experiments." The specialists and versatiles are more prevalent than expected by our designed 2 null models. Versatiles tend to be senior authors associated with funding and supervision. Specialists are associated with 2 contrasting roles: the supervising role as team leaders or marginal and specialized contributors.
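The contributor typology and the one-mode projection named in this abstract can be illustrated with a small sketch. It assumes a hypothetical input format (a mapping from task name to the set of authors who contributed to it) and invented author and task labels; it is not the authors' implementation and leaves out their null models.

```python
from collections import Counter
from itertools import combinations


def project_coauthors(task_authors):
    """One-mode projection of an author-task bipartite network.

    Two authors are linked when they share a task; the edge weight
    is the number of tasks they share.
    """
    edges = Counter()
    for authors in task_authors.values():
        for a, b in combinations(sorted(authors), 2):
            edges[(a, b)] += 1
    return edges


def classify_contributors(task_authors):
    """Label each author as specialist, team-player, or versatile."""
    solo, shared = set(), set()
    for authors in task_authors.values():
        # A task with one contributor is solo work; otherwise it is shared.
        target = solo if len(authors) == 1 else shared
        target.update(authors)
    labels = {}
    for author in solo | shared:
        if author in solo and author in shared:
            labels[author] = "versatile"      # works both alone and with others
        elif author in solo:
            labels[author] = "specialist"     # all tasks done alone
        else:
            labels[author] = "team-player"    # every task done with others
    return labels


# Hypothetical article with four tasks and four authors.
tasks = {
    "data analysis": {"A", "B"},
    "performing experiments": {"B", "C"},
    "writing": {"A"},
    "funding": {"D"},
}
labels = classify_contributors(tasks)
edges = project_coauthors(tasks)
```

In this toy article, author D never appears in the projected network because D shares no task with anyone, which is exactly what marks a specialist.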
5. Lu, C. ; Bu, Y. ; Wang, J. ; Ding, Y. ; Torvik, V. ; Schnaars, M. ; Zhang, C.: Examining scientific writing styles from the perspective of linguistic complexity : a cross-level moderation model.
In: Journal of the Association for Information Science and Technology. 70(2019) no.5, pp. 462-475.
Abstract: Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. To uncover the differences in English scientific writing between native English-speaking scholars (NESs) and NNESs, we collected a large-scale data set containing more than 150,000 full-text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, obtained by Ethnea, and examined the scientific writing styles in English from a two-fold perspective of linguistic complexity: (a) syntactic complexity, including measurements of sentence length and sentence complexity; and (b) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest marginal differences between groups in syntactical and lexical complexity.
Content: Cf.: https://onlinelibrary.wiley.com/doi/10.1002/asi.24126.
6. Zhang, C. ; Zhao, H. ; Chi, X. ; Ma, S.: Information organization patterns from online users in a social network.
In: Knowledge organization. 46(2019) no.2, pp. 90-103.
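Two of the measurement families mentioned in this abstract, sentence length and lexical diversity, can be approximated in a few lines. The sketch below uses a naive punctuation-based sentence splitter and the type-token ratio as a stand-in for lexical diversity; both choices are assumptions made for illustration, and the paper's actual measures may differ.

```python
import re


def sentence_lengths(text):
    """Word count per sentence, splitting naively on ., ! and ?."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def type_token_ratio(text):
    """Distinct words divided by total words (a simple diversity measure)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0


sample = "The cat sat. The cat ran away!"
lengths = sentence_lengths(sample)      # two sentences: 3 and 4 words
diversity = type_token_ratio(sample)    # 5 distinct words out of 7
```

The type-token ratio is sensitive to text length, so comparisons across articles of very different sizes would need a length-normalised variant.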
Abstract: Recent years have seen the rise of user-generated contents (UGCs) in online social media. Diverse UGC sources and information overload are making it increasingly difficult to satisfy personalized information needs. To organize UGCs in a user-centered way, we should not only map them based on textual topics but also link them with users and even user communities. We propose a multi-dimensional framework to organize information by connecting UGCs, users, and user communities. First, we use a topic model to generate a topic hierarchy from UGCs. Second, an author-topic model is applied to learn user interests. Third, user communities are detected through a label propagation algorithm. Finally, a multi-dimensional information organization pattern is formulated based on similarities among the topic hierarchies of UGCs, user interests, and user communities. The results reveal that: 1) our proposed framework can organize information from multiple sources in a user-centered way; 2) hierarchical topic structures can provide comprehensive and in-depth topics for users; and, 3) user communities are efficient in helping people to connect with others who have similar interests.
7. Zhang, C. ; Bu, Y. ; Ding, Y. ; Xu, J.: Understanding scientific collaboration : homophily, transitivity, and preferential attachment.
In: Journal of the Association for Information Science and Technology. 69(2018) no.1, pp. 72-86.
Abstract: Scientific collaboration is essential in solving problems and breeding innovation. Coauthor network analysis has been utilized to study scholars' collaborations for a long time, but these studies have not simultaneously taken different collaboration features into consideration. In this paper, we present a systematic approach to analyze the differences in possibilities that two authors will cooperate as seen from the effects of homophily, transitivity, and preferential attachment. Exponential random graph models (ERGMs) are applied in this research. We find that different types of publications one author has written play diverse roles in his/her collaborations. An author's tendency to form new collaborations with her/his coauthors' collaborators is strong, where the more coauthors one author had before, the more new collaborators he/she will attract. We demonstrate that considering the authors' attributes and homophily effects as well as the transitivity and preferential attachment effects of the coauthorship network in which they are embedded helps us gain a comprehensive understanding of scientific collaboration.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23916/full.
8. Li, L. ; He, D. ; Zhang, C. ; Geng, L. ; Zhang, K.: Characterizing peer-judged answer quality on academic Q&A sites : a cross-disciplinary case study on ResearchGate.
In: Aslib journal of information management. 70(2018) no.3, pp. 269-287.
Abstract: Purpose: Academic social (question and answer) Q&A sites are now utilised by millions of scholars and researchers for seeking and sharing discipline-specific information. However, little is known about the factors that can affect their votes on the quality of an answer, nor how the discipline might influence these factors. The paper aims to discuss this issue. Design/methodology/approach: Using 1,021 answers collected over three disciplines (library and information services, history of art, and astrophysics) in ResearchGate, statistical analysis is performed to identify the characteristics of high-quality academic answers, and comparisons were made across the three disciplines. In particular, two major categories of characteristics of the answer provider and answer content were extracted and examined. Findings: The results reveal that high-quality answers on academic social Q&A sites tend to possess two characteristics: first, they are provided by scholars with higher academic reputations (e.g. more followers, etc.); and second, they provide objective information (e.g. longer answer with fewer subjective opinions). However, the impact of these factors varies across disciplines, e.g., objectivity is more favourable in physics than in other disciplines. Originality/value: The study is envisioned to help academic Q&A sites to select and recommend high-quality answers across different disciplines, especially in a cold-start scenario where the answer has not received enough judgements from peers.
Content: Cf.: https://doi.org/10.1108/AJIM-11-2017-0246.
9. Hu, B. ; Dong, X. ; Zhang, C. ; Bowman, T.D. ; Ding, Y. ; Milojevic, S. ; Ni, C. ; Yan, E. ; Larivière, V.: A lead-lag analysis of the topic evolution patterns for preprints and publications.
In: Journal of the Association for Information Science and Technology. 66(2015) no.12, pp. 2643-2656.
Abstract: This study applied LDA (latent Dirichlet allocation) and regression analysis to conduct a lead-lag analysis to identify different topic evolution patterns between preprints and papers from arXiv and the Web of Science (WoS) in astrophysics over the last 20 years (1992-2011). Fifty topics in arXiv and WoS were generated using an LDA algorithm and then regression models were used to explain 4 types of topic growth patterns. Based on the slopes of the fitted equation curves, the paper redefines the topic trends and popularity. Results show that arXiv and WoS share similar topics in a given domain, but differ in evolution trends. Topics in WoS lose their popularity much earlier and their durations of popularity are shorter than those in arXiv. This work demonstrates that open access preprints have stronger growth tendency as compared to traditional printed publications.
Content: Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23347/abstract.
Subject field: Electronic publishing
10. Wang, X. ; Hong, Z. ; Xu, Y.(C.) ; Zhang, C. ; Ling, H.: Relevance judgments of mobile commercial information.
In: Journal of the Association for Information Science and Technology. 65(2014) no.7, pp. 1335-1348.
Abstract: In the age of mobile commerce, users receive floods of commercial messages. How do users judge the relevance of such information? Is their relevance judgment affected by contextual factors, such as location and time? How do message content and contextual factors affect users' privacy concerns? With a focus on mobile ads, we propose a research model based on theories of relevance judgment and mobile marketing research. We suggest topicality, reliability, and economic value as key content factors and location and time as key contextual factors. We found mobile relevance judgment is affected mainly by content factors, whereas privacy concerns are affected by both content and contextual factors. Moreover, topicality and economic value have a synergetic effect that makes a message more relevant. Higher topicality and location precision exacerbate privacy concerns, whereas message reliability alleviates privacy concerns caused by location precision. These findings reveal an interesting intricacy in user relevance judgment and privacy concerns and provide nuanced guidance for the design and delivery of mobile commercial information.
11. Zhang, C. ; Liu, X. ; Xu, Y.(C.) ; Wang, Y.: Quality-structure index : a new metric to measure scientific journal influence.
In: Journal of the American Society for Information Science and Technology. 62(2011) no.4, pp. 643-653.
Abstract: An innovative model to measure the influence among scientific journals is developed in this study. This model is based on the path analysis of a journal citation network, and its output is a journal influence matrix that describes the directed influence among all journals. Based on this model, an index of journals' overall influence, the quality-structure index (QSI), is derived. Journal ranking based on QSI has the advantage of accounting for both intrinsic journal quality and the structural position of a journal in a citation network. The QSI also integrates the characteristics of two prevailing streams of journal-assessment measures: those based on bibliometric statistics to approximate intrinsic journal quality, such as the Journal Impact Factor, and those using a journal's structural position based on the PageRank-type of algorithm, such as the Eigenfactor score. Empirical results support our finding that the new index is significantly closer to scholars' subjective perception of journal influence than are the two aforementioned measures. In addition, the journal influence matrix offers a new way to measure two-way influences between any two academic journals, hence establishing a theoretical basis for future scientometrics studies to investigate the knowledge flow within and across research disciplines.
12. Zhang, C.-T.: Relationship of the h-index, g-index, and e-index.
In: Journal of the American Society for Information Science and Technology. 61(2010) no.3, pp. 625-628.
Abstract: Of h-type indices available now, the g-index is an important one in that it not only keeps some advantages of the h-index but also counts citations from highly cited articles. However, the g-index has a drawback that one has to add fictitious articles with zero citation to calculate this index in some important cases. Based on an alternative definition without introducing fictitious articles, an analytical method has been proposed to calculate the g-index based approximately on the h-index and the e-index. If citations for a scientist are ranked by a power law, it is shown that the g-index can be calculated accurately by the h-index, the e-index, and the power parameter. The relationship of the h-, g-, and e-indices presented here shows that the g-index contains the citation information from the h-index, the e-index, and some papers beyond the h-core.
Object: h-index ; g-index ; e-index
13. Zhang, C. ; Zeng, D. ; Li, J. ; Wang, F.-Y. ; Zuo, W.: Sentiment analysis of Chinese documents : from sentence to document level.
In: Journal of the American Society for Information Science and Technology. 60(2009) no.12, pp. 2474-2487.
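For readers unfamiliar with the three indices in this record, the sketch below computes them directly from a list of citation counts using their standard definitions. The g-index is capped at the number of papers, so no fictitious zero-citation articles are added; this is one common convention and not necessarily the alternative definition or the analytical approximation proposed in the article. The citation counts are invented for illustration.

```python
from math import sqrt


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers have at least g^2 citations in total.

    Capped at the number of papers (no fictitious zero-citation papers).
    """
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


def e_index(citations):
    """Square root of the excess citations in the h-core beyond h^2."""
    ranked = sorted(citations, reverse=True)
    h = h_index(ranked)
    return sqrt(sum(ranked[:h]) - h * h)


counts = [10, 8, 5, 4, 3]   # hypothetical citation record
h = h_index(counts)         # 4
g = g_index(counts)         # 5
e = e_index(counts)         # sqrt(27 - 16) = sqrt(11)
```

Here the h-core holds 27 citations, of which h² = 16 are already reflected in the h-index; the e-index captures the remaining 11.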
Abstract: User-generated content on the Web has become an extremely valuable source for mining and analyzing user opinions on any topic. Recent years have seen an increasing body of work investigating methods to recognize favorable and unfavorable sentiments toward specific subjects from online text. However, most of these efforts focus on English and there have been very few studies on sentiment analysis of Chinese content. This paper aims to address the unique challenges posed by Chinese sentiment analysis. We propose a rule-based approach including two phases: (1) determining each sentence's sentiment based on word dependency, and (2) aggregating sentences to predict the document sentiment. We report the results of an experimental study comparing our approach with three machine learning-based approaches using two sets of Chinese articles. These results illustrate the effectiveness of our proposed method and its advantages against learning-based approaches.