Search (29 results, page 1 of 2)

  • year_i:[2010 TO 2020}
  • author_ss:"Ding, Y."
  1. Ding, Y.; Zhang, G.; Chambers, T.; Song, M.; Wang, X.; Zhai, C.: Content-based citation analysis : the next generation of citation analysis (2014) 0.03
    0.028437529 = product of:
      0.08531258 = sum of:
        0.050336715 = weight(_text_:applications in 1521) [ClassicSimilarity], result of:
          0.050336715 = score(doc=1521,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.2918479 = fieldWeight in 1521, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.046875 = fieldNorm(doc=1521)
        0.019052157 = weight(_text_:of in 1521) [ClassicSimilarity], result of:
          0.019052157 = score(doc=1521,freq=18.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.3109903 = fieldWeight in 1521, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1521)
        0.015923709 = product of:
          0.031847417 = sum of:
            0.031847417 = weight(_text_:22 in 1521) [ClassicSimilarity], result of:
              0.031847417 = score(doc=1521,freq=2.0), product of:
                0.13719016 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03917671 = queryNorm
                0.23214069 = fieldWeight in 1521, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1521)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
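
    The score tree above can be reproduced by hand. The following is a minimal sketch, assuming the classic TF-IDF form that the displayed numbers appear to follow: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm. These formulas are inferred from the values in the listing, and the function names are illustrative, not part of any library API. The engine computes in single precision, so the results should agree with the listing only to about six decimal places.

```python
import math

# Constants copied from the explain output for doc 1521.
MAX_DOCS = 44218
QUERY_NORM = 0.03917671
FIELD_NORM = 0.046875


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency (assumed form: 1 + ln(maxDocs / (docFreq + 1)))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """Score of one term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                      # term-frequency factor
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm      # query side
    field_weight = tf * term_idf * field_norm # document side
    return query_weight * field_weight


def coord(matched, total):
    """Fraction of query clauses that matched the document."""
    return matched / total


# The three matching clauses of result 1 (doc 1521).
applications = term_score(2.0, 1471, FIELD_NORM)
of_score = term_score(18.0, 25162, FIELD_NORM)
# The term "22" sits in a nested clause where 1 of 2 sub-clauses matched.
nested_22 = term_score(2.0, 3622, FIELD_NORM) * coord(1, 2)

# 3 of the 9 top-level clauses matched, so the sum is scaled by coord(3/9).
total = (applications + of_score + nested_22) * coord(3, 9)

print(f"applications={applications:.6f} of={of_score:.6f} "
      f"22={nested_22:.6f} total={total:.6f}")
```

    The same functions apply to the other results in the list: only freq, docFreq, fieldNorm, and the coord fraction change from one tree to the next.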
    
    Abstract
    Traditional citation analysis has been widely applied to detect patterns of scientific collaboration, map the landscapes of scholarly disciplines, assess the impact of research outputs, and observe knowledge transfer across domains. It is, however, limited, as it assumes all citations are of similar value and weights each equally. Content-based citation analysis (CCA) addresses a citation's value by interpreting each one based on its context at both the syntactic and semantic levels. This paper provides a comprehensive overview of CCA research in terms of its theoretical foundations, methodical approaches, and example applications. In addition, we highlight how increased computational capabilities and publicly available full-text resources have opened this area of research to vast possibilities, which enable deeper citation analysis, more accurate citation prediction, and increased knowledge discovery.
    Date
    22. 8.2014 16:52:04
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.9, S.1820-1833
  2. Lin, N.; Li, D.; Ding, Y.; He, B.; Qin, Z.; Tang, J.; Li, J.; Dong, T.: ¬The dynamic features of Delicious, Flickr, and YouTube (2012) 0.01
    0.012267706 = product of:
      0.05520468 = sum of:
        0.019801848 = weight(_text_:of in 4970) [ClassicSimilarity], result of:
          0.019801848 = score(doc=4970,freq=28.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.32322758 = fieldWeight in 4970, product of:
              5.2915025 = tf(freq=28.0), with freq of:
                28.0 = termFreq=28.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4970)
        0.03540283 = weight(_text_:systems in 4970) [ClassicSimilarity], result of:
          0.03540283 = score(doc=4970,freq=6.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.29405114 = fieldWeight in 4970, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4970)
      0.22222222 = coord(2/9)
    
    Abstract
    This article investigates the dynamic features of social tagging vocabularies in Delicious, Flickr, and YouTube from 2003 to 2008. Three algorithms are designed to study the macro- and micro-tag growth as well as the dynamics of taggers' activities, respectively. Moreover, we propose a Tagger Tag Resource Latent Dirichlet Allocation (TTR-LDA) model to explore the evolution of topics emerging from those social vocabularies. Our results show that (a) at the macro level, tag growth in all the three tagging systems obeys power law distribution with exponents lower than 1; at the micro level, the tag growth of popular resources in all three tagging systems follows a similar power law distribution; (b) the exponents of tag growth vary in different evolving stages of resources; (c) the growth of number of taggers associated with different popular resources presents a feature of convergence over time; (d) the active level of taggers has a positive correlation with the macro-tag growth of different tagging systems; and (e) some topics evolve into several subtopics over time while others experience relatively stable stages in which their contents do not change much, and certain groups of taggers continue their interests in them.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.1, S.139-162
  3. Li, D.; Tang, J.; Ding, Y.; Shuai, X.; Chambers, T.; Sun, G.; Luo, Z.; Zhang, J.: Topic-level opinion influence model (TOIM) : an investigation using tencent microblogging (2015) 0.01
    0.011673733 = product of:
      0.052531797 = sum of:
        0.041947264 = weight(_text_:applications in 2345) [ClassicSimilarity], result of:
          0.041947264 = score(doc=2345,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.2432066 = fieldWeight in 2345, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2345)
        0.010584532 = weight(_text_:of in 2345) [ClassicSimilarity], result of:
          0.010584532 = score(doc=2345,freq=8.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.17277241 = fieldWeight in 2345, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2345)
      0.22222222 = coord(2/9)
    
    Abstract
    Text mining has been widely used in multiple types of user-generated data to infer user opinion, but its application to microblogging is difficult because text messages are short and noisy, providing limited information about user opinion. Given that microblogging users communicate with each other to form a social network, we hypothesize that user opinion is influenced by its neighbors in the network. In this paper, we infer user opinion on a topic by combining two factors: the user's historical opinion about relevant topics and opinion influence from his/her neighbors. We thus build a topic-level opinion influence model (TOIM) by integrating both topic factor and opinion influence factor into a unified probabilistic model. We evaluate our model in one of the largest microblogging sites in China, Tencent Weibo, and the experiments show that TOIM outperforms baseline methods in opinion inference accuracy. Moreover, incorporating indirect influence further improves inference recall and f1-measure. Finally, we demonstrate some useful applications of TOIM in analyzing users' behaviors in Tencent Weibo.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2657-2673
  4. Ding, Y.; Jacob, E.K.; Fried, M.; Toma, I.; Yan, E.; Foo, S.; Milojević, S.: Upper tag ontology for integrating social tagging data (2010) 0.01
    0.010539032 = product of:
      0.047425643 = sum of:
        0.022897845 = weight(_text_:of in 3421) [ClassicSimilarity], result of:
          0.022897845 = score(doc=3421,freq=26.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.37376386 = fieldWeight in 3421, product of:
              5.0990195 = tf(freq=26.0), with freq of:
                26.0 = termFreq=26.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=3421)
        0.0245278 = weight(_text_:systems in 3421) [ClassicSimilarity], result of:
          0.0245278 = score(doc=3421,freq=2.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.2037246 = fieldWeight in 3421, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=3421)
      0.22222222 = coord(2/9)
    
    Abstract
    Data integration and mediation have become central concerns of information technology over the past few decades. With the advent of the Web and the rapid increases in the amount of data and the number of Web documents and users, researchers have focused on enhancing the interoperability of data through the development of metadata schemes. Other researchers have looked to the wealth of metadata generated by bookmarking sites on the Social Web. While several existing ontologies have capitalized on the semantics of metadata created by tagging activities, the Upper Tag Ontology (UTO) emphasizes the structure of tagging activities to facilitate modeling of tagging data and the integration of data from different bookmarking sites as well as the alignment of tagging ontologies. UTO is described and its utility in modeling, harvesting, integrating, searching, and analyzing data is demonstrated with metadata harvested from three major social tagging systems (Delicious, Flickr, and YouTube).
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.3, S.505-521
  5. Li, D.; Ding, Y.; Sugimoto, C.; He, B.; Tang, J.; Yan, E.; Lin, N.; Qin, Z.; Dong, T.: Modeling topic and community structure in social tagging : the TTR-LDA-Community model (2011) 0.01
    0.009750018 = product of:
      0.04387508 = sum of:
        0.014968789 = weight(_text_:of in 4759) [ClassicSimilarity], result of:
          0.014968789 = score(doc=4759,freq=16.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.24433708 = fieldWeight in 4759, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4759)
        0.02890629 = weight(_text_:systems in 4759) [ClassicSimilarity], result of:
          0.02890629 = score(doc=4759,freq=4.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.24009174 = fieldWeight in 4759, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4759)
      0.22222222 = coord(2/9)
    
    Abstract
    The presence of social networks in complex systems has made networks and community structure a focal point of study in many domains. Previous studies have focused on the structural emergence and growth of communities and on the topics displayed within the network. However, few scholars have closely examined the relationship between the thematic and structural properties of networks. Therefore, this article proposes the Tagger Tag Resource-Latent Dirichlet Allocation-Community model (TTR-LDA-Community model), which combines the Latent Dirichlet Allocation (LDA) model with the Girvan-Newman community detection algorithm through an inference mechanism. Using social tagging data from Delicious, this article demonstrates the clustering of active taggers into communities, the topic distributions within communities, and the ranking of taggers, tags, and resources within these communities. The data analysis evaluates patterns in community structure and topical affiliations diachronically. The article evaluates the effectiveness of community detection and the inference mechanism embedded in the model and finds that the TTR-LDA-Community model outperforms other traditional models in tag prediction. This has implications for scholars in domains interested in community detection, profiling, and recommender systems.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.9, S.1849-1866
  6. Ding, Y.: Applying weighted PageRank to author citation networks (2011) 0.01
    0.0069801607 = product of:
      0.031410724 = sum of:
        0.0128330635 = weight(_text_:of in 4188) [ClassicSimilarity], result of:
          0.0128330635 = score(doc=4188,freq=6.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.20947541 = fieldWeight in 4188, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4188)
        0.018577661 = product of:
          0.037155323 = sum of:
            0.037155323 = weight(_text_:22 in 4188) [ClassicSimilarity], result of:
              0.037155323 = score(doc=4188,freq=2.0), product of:
                0.13719016 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03917671 = queryNorm
                0.2708308 = fieldWeight in 4188, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4188)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    This article aims to identify whether different weighted PageRank algorithms can be applied to author citation networks to measure the popularity and prestige of a scholar from a citation perspective. Information retrieval (IR) was selected as a test field and data from 1956-2008 were collected from Web of Science. Weighted PageRank with citation and publication as weighted vectors were calculated on author citation networks. The results indicate that both popularity rank and prestige rank were highly correlated with the weighted PageRank. Principal component analysis was conducted to detect relationships among these different measures. For capturing prize winners within the IR field, prestige rank outperformed all the other measures.
    Date
    22. 1.2011 13:02:21
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.2, S.236-245
  7. Ni, C.; Shaw, D.; Lind, S.M.; Ding, Y.: Journal impact and proximity : an assessment using bibliographic features (2013) 0.00
    0.002544205 = product of:
      0.022897845 = sum of:
        0.022897845 = weight(_text_:of in 686) [ClassicSimilarity], result of:
          0.022897845 = score(doc=686,freq=26.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.37376386 = fieldWeight in 686, product of:
              5.0990195 = tf(freq=26.0), with freq of:
                26.0 = termFreq=26.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=686)
      0.11111111 = coord(1/9)
    
    Abstract
    Journals in the Information Science & Library Science category of Journal Citation Reports (JCR) were compared using both bibliometric and bibliographic features. Data collected covered journal impact factor (JIF), number of issues per year, number of authors per article, longevity, editorial board membership, frequency of publication, number of databases indexing the journal, number of aggregators providing full-text access, country of publication, JCR categories, Dewey decimal classification, and journal statement of scope. Three features significantly correlated with JIF: number of editorial board members and number of JCR categories in which a journal is listed correlated positively; journal longevity correlated negatively with JIF. Coword analysis of journal descriptions provided a proximity clustering of journals, which differed considerably from the clusters based on editorial board membership. Finally, a multiple linear regression model was built to predict the JIF based on all the collected bibliographic features.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.4, S.802-817
  8. Milojevic, S.; Sugimoto, C.R.; Yan, E.; Ding, Y.: ¬The cognitive structure of Library and Information Science : analysis of article title words (2011) 0.00
    0.0024947983 = product of:
      0.022453185 = sum of:
        0.022453185 = weight(_text_:of in 4608) [ClassicSimilarity], result of:
          0.022453185 = score(doc=4608,freq=36.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.36650562 = fieldWeight in 4608, product of:
              6.0 = tf(freq=36.0), with freq of:
                36.0 = termFreq=36.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4608)
      0.11111111 = coord(1/9)
    
    Abstract
    This study comprises a suite of analyses of words in article titles in order to reveal the cognitive structure of Library and Information Science (LIS). The use of title words to elucidate the cognitive structure of LIS has been relatively neglected. The present study addresses this gap by performing (a) co-word analysis and hierarchical clustering, (b) multidimensional scaling, and (c) determination of trends in usage of terms. The study is based on 10,344 articles published between 1988 and 2007 in 16 LIS journals. Methodologically, novel aspects of this study are: (a) its large scale, (b) removal of non-specific title words based on the "word concentration" measure, (c) identification of the most frequent terms that include both single words and phrases, and (d) presentation of the relative frequencies of terms using "heatmaps". Conceptually, our analysis reveals that LIS consists of three main branches: the traditionally recognized library-related and information-related branches, plus an equally distinct bibliometrics/scientometrics branch. The three branches focus on: libraries, information, and science, respectively. In addition, our study identifies substructures within each branch. We also tentatively identify "information seeking behavior" as a branch that is establishing itself separate from the three main branches. Furthermore, we find that cognitive concepts in LIS evolve continuously, with no stasis since 1992. The most rapid development occurred between 1998 and 2001, influenced by the increased focus on the Internet. The change in the cognitive landscape is found to be driven by the emergence of new information technologies, and the retirement of old ones.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.10, S.1933-1953
  9. Zhang, G.; Ding, Y.; Milojevic, S.: Citation content analysis (CCA) : a framework for syntactic and semantic analysis of citation content (2013) 0.00
    0.0022314154 = product of:
      0.020082738 = sum of:
        0.020082738 = weight(_text_:of in 975) [ClassicSimilarity], result of:
          0.020082738 = score(doc=975,freq=20.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.32781258 = fieldWeight in 975, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=975)
      0.11111111 = coord(1/9)
    
    Abstract
    This study proposes a new framework for citation content analysis (CCA), for syntactic and semantic analysis of citation content that can be used to better analyze the rich sociocultural context of research behavior. This framework could be considered the next generation of citation analysis. The authors briefly review the history and features of content analysis in traditional social sciences and its previous application in library and information science (LIS). Based on critical discussion of the theoretical necessity of a new method as well as the limits of citation analysis, the nature and purposes of CCA are discussed, and potential procedures to conduct CCA, including principles to identify the reference scope, a two-dimensional (citing and cited) and two-module (syntactic and semantic) codebook, are provided and described. Future work and implications are also suggested.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.7, S.1490-1503
  10. Zhai, Y.; Ding, Y.; Wang, F.: Measuring the diffusion of an innovation : a citation analysis (2018) 0.00
    0.0022314154 = product of:
      0.020082738 = sum of:
        0.020082738 = weight(_text_:of in 4116) [ClassicSimilarity], result of:
          0.020082738 = score(doc=4116,freq=20.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.32781258 = fieldWeight in 4116, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=4116)
      0.11111111 = coord(1/9)
    
    Abstract
    Innovations transform our research traditions and become the driving force to advance individual, group, and social creativity. Meanwhile, interdisciplinary research is increasingly being promoted as a route to advance the complex challenges we face as a society. In this paper, we use Latent Dirichlet Allocation (LDA) citation as a proxy context for the diffusion of an innovation. With an analysis of topic evolution, we divide the diffusion process into five stages: testing and evaluation, implementation, improvement, extending, and fading. Through a correlation analysis of topic and subject, we show the application of LDA in different subjects. We also reveal the cross-boundary diffusion between different subjects based on the analysis of the interdisciplinary studies. The results show that as LDA is transferred into different areas, the adoption of each subject is relatively adjacent to those with similar research interests. Our findings further support researchers' understanding of the impact formation of innovation.
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.3, S.368-379
  11. Yan, E.; Ding, Y.: Weighted citation : an indicator of an article's prestige (2010) 0.00
    0.0021037988 = product of:
      0.018934188 = sum of:
        0.018934188 = weight(_text_:of in 3705) [ClassicSimilarity], result of:
          0.018934188 = score(doc=3705,freq=10.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.3090647 = fieldWeight in 3705, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=3705)
      0.11111111 = coord(1/9)
    
    Abstract
    The authors propose using the technique of weighted citation to measure an article's prestige. The technique allocates a different weight to each reference by taking into account the impact of citing journals and citation time intervals. Weighted citation captures prestige, whereas citation counts capture popularity. They compare the value variances for popularity and prestige for articles published in the Journal of the American Society for Information Science and Technology from 1998 to 2007, and find that the majority have comparable status.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.8, S.1635-1643
  12. Song, M.; Kim, S.Y.; Zhang, G.; Ding, Y.; Chambers, T.: Productivity and influence in bioinformatics : a bibliometric analysis using PubMed central (2014) 0.00
    0.0018669361 = product of:
      0.016802425 = sum of:
        0.016802425 = weight(_text_:of in 1202) [ClassicSimilarity], result of:
          0.016802425 = score(doc=1202,freq=14.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.2742677 = fieldWeight in 1202, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1202)
      0.11111111 = coord(1/9)
    
    Abstract
    Bioinformatics is a fast-growing field based on the optimal use of "big data" gathered in genomic, proteomics, and functional genomics research. In this paper, we conduct a comprehensive and in-depth bibliometric analysis of the field of bioinformatics by extracting citation data from PubMed Central full-text. Citation data for the period 2000 to 2011, comprising 20,869 papers with 546,245 citations, was used to evaluate the productivity and influence of this emerging field. Four measures were used to identify productivity: most productive authors, most productive countries, most productive organizations, and most popular subject terms. Research impact was analyzed based on the measures of most cited papers, most cited authors, emerging stars, and leading organizations. Results show the overall trends between the periods 2000 to 2003 and 2004 to 2007 were dissimilar, while trends between the periods 2004 to 2007 and 2008 to 2011 were similar. In addition, the field of bioinformatics has undergone a significant shift, co-evolving with other biomedical disciplines.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.2, S.352-371
  13. Lu, C.; Bu, Y.; Wang, J.; Ding, Y.; Torvik, V.; Schnaars, M.; Zhang, C.: Examining scientific writing styles from the perspective of linguistic complexity : a cross-level moderation model (2019) 0.00
    0.0018669361 = product of:
      0.016802425 = sum of:
        0.016802425 = weight(_text_:of in 5219) [ClassicSimilarity], result of:
          0.016802425 = score(doc=5219,freq=14.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.2742677 = fieldWeight in 5219, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=5219)
      0.11111111 = coord(1/9)
    
    Abstract
    Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. To uncover the differences in English scientific writing between native English-speaking scholars (NESs) and NNESs, we collected a large-scale data set containing more than 150,000 full-text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, obtained by Ethnea, and examined the scientific writing styles in English from a two-fold perspective of linguistic complexity: (a) syntactic complexity, including measurements of sentence length and sentence complexity; and (b) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest marginal differences between groups in syntactical and lexical complexity.
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.5, S.462-475
  14. Sugimoto, C.R.; Li, D.; Russell, T.G.; Finlay, S.C.; Ding, Y.: ¬The shifting sands of disciplinary development : analyzing North American Library and Information Science dissertations using latent Dirichlet allocation (2011) 0.00
    0.0018595128 = product of:
      0.016735615 = sum of:
        0.016735615 = weight(_text_:of in 4143) [ClassicSimilarity], result of:
          0.016735615 = score(doc=4143,freq=20.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.27317715 = fieldWeight in 4143, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4143)
      0.11111111 = coord(1/9)
    
    Abstract
    This work identifies changes in dominant topics in library and information science (LIS) over time, by analyzing the 3,121 doctoral dissertations completed between 1930 and 2009 at North American Library and Information Science programs. The authors utilize latent Dirichlet allocation (LDA) to identify latent topics diachronically and to identify representative dissertations of those topics. The findings indicate that the main topics in LIS have changed substantially from those in the initial period (1930-1969) to the present (2000-2009). However, some themes occurred in multiple periods, representing core areas of the field: library history occurred in the first two periods; citation analysis in the second and third periods; and information-seeking behavior in the fourth and last period. Two topics occurred in three of the five periods: information retrieval and information use. One of the notable changes in the topics was the diminishing use of the word library (and related terms). This has implications for the provision of doctoral education in LIS. This work is compared to other earlier analyses and provides validation for the use of LDA in topic analysis of a discipline.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.1, S.185-204
  15. He, B.; Ding, Y.; Ni, C.: Mining enriched contextual information of scientific collaboration : a meso perspective (2011) 0.00
    0.0018595128 = product of:
      0.016735615 = sum of:
        0.016735615 = weight(_text_:of in 4444) [ClassicSimilarity], result of:
          0.016735615 = score(doc=4444,freq=20.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.27317715 = fieldWeight in 4444, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4444)
      0.11111111 = coord(1/9)
    
    Abstract
    Studying scientific collaboration using coauthorship networks has attracted much attention in recent years. How and in what context two authors collaborate remain among the major questions. Previous studies, however, have focused on either exploring the global topology of coauthorship networks (macro perspective) or ranking the impact of individual authors (micro perspective). Neither of them has provided information on the context of the collaboration between two specific authors, which may potentially imply rich socioeconomic, disciplinary, and institutional information on collaboration. Different from the macro perspective and micro perspective, this article proposes a novel method (meso perspective) to analyze scientific collaboration, in which a contextual subgraph is extracted as the unit of analysis. A contextual subgraph is defined as a small subgraph of a large-scale coauthorship network that captures relationship and context between two coauthors. This method is applied to the field of library and information science. Topological properties of all the subgraphs in four time spans are investigated, including size, average degree, clustering coefficient, and network centralization. Results show that contextual subgraphs capture useful contextual information on two authors' collaboration.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.5, S.831-845
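    The score breakdown shown for entry 15 (doc 4444) can be reproduced by hand. The following is a minimal sketch, not the search engine's own code: the function names are hypothetical, and the idf formula (1 + ln(maxDocs / (docFreq + 1))) is assumed because it matches the idf value printed in the listing. The input numbers are copied from the explanation tree above.

    ```python
    import math

    def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
        """Recompute one weight(...) node of a ClassicSimilarity explanation.

        Hypothetical helper for illustration; arguments mirror the labels
        in the listing (termFreq, docFreq, maxDocs, queryNorm, fieldNorm).
        """
        tf = math.sqrt(freq)                             # tf(freq=20.0) is about 4.472136
        idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # assumed idf form, about 1.5637573
        query_weight = idf * query_norm                  # queryWeight
        field_weight = tf * idf * field_norm             # fieldWeight
        return query_weight * field_weight               # score(doc=..., freq=...)

    def coord(matched_clauses, total_clauses):
        """Coordination factor: the share of query clauses the document matched."""
        return matched_clauses / total_clauses

    # Values taken from the explanation for doc 4444 (term "of").
    term_score = classic_term_score(
        freq=20.0,
        doc_freq=25162,
        max_docs=44218,
        query_norm=0.03917671,
        field_norm=0.0390625,
    )
    final_score = term_score * coord(1, 9)

    print(f"term score:  {term_score:.9f}")
    print(f"final score: {final_score:.10f}")
    ```

    Under these assumptions the result agrees with the listed values (0.016735615 for the term weight, 0.0018595128 after coord(1/9)) up to floating-point rounding, since the engine works in single precision and this sketch uses doubles.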
  16. Li, D.; Luo, Z.; Ding, Y.; Tang, J.; Sun, G.G.-Z.; Dai, X.; Du, J.; Zhang, J.; Kong, S.: User-level microblogging recommendation incorporating social influence (2017) 0.00
    0.0018595128 = product of:
      0.016735615 = sum of:
        0.016735615 = weight(_text_:of in 3426) [ClassicSimilarity], result of:
          0.016735615 = score(doc=3426,freq=20.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.27317715 = fieldWeight in 3426, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3426)
      0.11111111 = coord(1/9)
    
    Abstract
    With the information overload of user-generated content in microblogging, users find it extremely challenging to browse and find valuable information in their first attempt. In this paper we propose a microblogging recommendation algorithm, TSI-MR (Topic-Level Social Influence-based Microblogging Recommendation), which can significantly improve users' microblogging experiences. The main innovation of this proposed algorithm is that we consider social influences and their indirect structural relationships, which are largely based on social status theory, from the topic level. The primary advantage of this approach is that it can build an accurate description of latent relationships between two users with weak connections, which can improve the performance of the model; furthermore, it can solve sparsity problems of training data to a certain extent. The realization of the model is mainly based on Factor Graph. We also applied a distributed strategy to further improve the efficiency of the model. Finally, we use data from Tencent Weibo, one of the most popular microblogging services in China, to evaluate our methods. The results show that incorporating social influence can improve microblogging performance considerably, and outperform the baseline methods.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.3, S.553-568
  17. Min, C.; Ding, Y.; Li, J.; Bu, Y.; Pei, L.; Sun, J.: Innovation or imitation : the diffusion of citations (2018) 0.00
    0.0017640886 = product of:
      0.015876798 = sum of:
        0.015876798 = weight(_text_:of in 4445) [ClassicSimilarity], result of:
          0.015876798 = score(doc=4445,freq=18.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.25915858 = fieldWeight in 4445, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4445)
      0.11111111 = coord(1/9)
    
    Abstract
    Citations in scientific literature are important both for tracking the historical development of scientific ideas and for forecasting research trends. However, the diffusion mechanisms underlying the citation process remain poorly understood, despite the frequent and longstanding use of citation counts for assessment purposes within the scientific community. Here, we extend the study of citation dynamics to a more general diffusion process to understand how citation growth associates with different diffusion patterns. Using a classic diffusion model, we quantify and illustrate specific diffusion mechanisms which have been proven to exert a significant impact on the growth and decay of citation counts. Experiments reveal a positive relation between the "low p and low q" pattern and high scientific impact. A sharp citation peak produced by rapid change of citation counts, however, has a negative effect on future impact. In addition, we have suggested a simple indicator, saturation level, to roughly estimate an individual article's current stage in the life cycle and its potential to attract future attention. The proposed approach can also be extended to higher levels of aggregation (e.g., individual scientists, journals, institutions), providing further insights into the practice of scientific evaluation.
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.10, S.1271-1282
  18. Hu, B.; Dong, X.; Zhang, C.; Bowman, T.D.; Ding, Y.; Milojevic, S.; Ni, C.; Yan, E.; Larivière, V.: ¬A lead-lag analysis of the topic evolution patterns for preprints and publications (2015) 0.00
    0.0017284468 = product of:
      0.015556021 = sum of:
        0.015556021 = weight(_text_:of in 2337) [ClassicSimilarity], result of:
          0.015556021 = score(doc=2337,freq=12.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.25392252 = fieldWeight in 2337, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2337)
      0.11111111 = coord(1/9)
    
    Abstract
    This study applied LDA (latent Dirichlet allocation) and regression analysis to conduct a lead-lag analysis to identify different topic evolution patterns between preprints and papers from arXiv and the Web of Science (WoS) in astrophysics over the last 20 years (1992-2011). Fifty topics in arXiv and WoS were generated using an LDA algorithm and then regression models were used to explain 4 types of topic growth patterns. Based on the slopes of the fitted equation curves, the paper redefines the topic trends and popularity. Results show that arXiv and WoS share similar topics in a given domain, but differ in evolution trends. Topics in WoS lose their popularity much earlier and their durations of popularity are shorter than those in arXiv. This work demonstrates that open access preprints have stronger growth tendency as compared to traditional printed publications.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2643-2656
  19. Huang, Y.; Bu, Y.; Ding, Y.; Lu, W.: From zero to one : a perspective on citing (2019) 0.00
    0.0017284468 = product of:
      0.015556021 = sum of:
        0.015556021 = weight(_text_:of in 5387) [ClassicSimilarity], result of:
          0.015556021 = score(doc=5387,freq=12.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.25392252 = fieldWeight in 5387, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=5387)
      0.11111111 = coord(1/9)
    
    Abstract
    This article investigates the lengths of time that publications with different numbers of citations take to receive their first citation (the beginning stage), and then compares the lengths of time to receive two or more citations after receiving the first citation (the accumulative stage) in the field of computer science. We find that in the beginning stage, that is, from zero to one citation, high-, medium-, and low-cited publications do not obviously exhibit different lengths of time. However, in the accumulative stage, that is, from one to N citations, highly cited publications begin to receive citations much more rapidly than medium- and low-cited publications. Moreover, as N increases, the difference in receiving new citations among high-, medium-, and low-cited publications increases quite significantly.
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.10, S.1098-1107
  20. Li, R.; Chambers, T.; Ding, Y.; Zhang, G.; Meng, L.: Patent citation analysis : calculating science linkage based on citing motivation (2014) 0.00
    0.0016631988 = product of:
      0.014968789 = sum of:
        0.014968789 = weight(_text_:of in 1257) [ClassicSimilarity], result of:
          0.014968789 = score(doc=1257,freq=16.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.24433708 = fieldWeight in 1257, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1257)
      0.11111111 = coord(1/9)
    
    Abstract
    Science linkage is a widely used patent bibliometric indicator to measure patent linkage to scientific research based on the frequency of citations to scientific papers within the patent. Science linkage is also regarded as noisy because the subject of patent citation behavior varies from inventors/applicants to examiners. In order to identify and ultimately reduce this noise, we analyzed the different citing motivations of examiners and inventors/applicants. We built 4 hypotheses based upon our study of patent law, the unique economic nature of a patent, and a patent citation's market effect. To test our hypotheses, we conducted an expert survey based on our science linkage calculation in the domain of catalyst from U.S. patent data (2006-2009) over 3 types of citations: self-citation by inventor/applicant, non-self-citation by inventor/applicant, and citation by examiner. According to our results, evaluated by domain experts, we conclude that the non-self-citation by inventor/applicant is quite noisy and cannot indicate science linkage and that self-citation by inventor/applicant, although limited, is more appropriate for understanding science linkage.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.5, S.1007-1017