Search (7 results, page 1 of 1)

  • × author_ss:"Chen, Y."
  1. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.02
    0.018982807 = product of:
      0.037965614 = sum of:
        0.037965614 = sum of:
          0.006765375 = weight(_text_:a in 1605) [ClassicSimilarity], result of:
            0.006765375 = score(doc=1605,freq=8.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.12739488 = fieldWeight in 1605, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1605)
          0.03120024 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
            0.03120024 = score(doc=1605,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.19345059 = fieldWeight in 1605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1605)
      0.5 = coord(1/2)
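
The score tree above for doc 1605 can be reproduced by hand. The sketch below assumes the ClassicSimilarity formulas as they appear in the explain output (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))). The function names `classic_idf` and `classic_term_score` are hypothetical helpers written for this illustration, not part of any library; the numeric inputs are copied from the tree.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # idf as shown in the explain output: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float) -> float:
    # score = queryWeight * fieldWeight for a single query term
    tf = math.sqrt(freq)                   # tf(freq) = sqrt(termFreq)
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm        # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm   # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explain tree for doc 1605.
QUERY_NORM = 0.046056706
MAX_DOCS = 44218
FIELD_NORM = 0.0390625

score_a = classic_term_score(8.0, 37942, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_22 = classic_term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The outer coord(1/2) factor halves the sum of the two term scores.
total = 0.5 * (score_a + score_22)
print(f"a: {score_a:.9f}  22: {score_22:.9f}  total: {total:.9f}")
```

Because Lucene computes these values in single precision, the printed numbers should agree with the tree only to roughly six or seven significant digits. The remaining results on this page match only the term `a` and carry a second coord(1/2) factor, which is why their scores are about an order of magnitude lower.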
    
    Abstract
    Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22
    Type
    a
  2. Jiang, Z.; Liu, X.; Chen, Y.: Recovering uncaptured citations in a scholarly network : a two-step citation analysis to estimate publication importance (2016) 0.00
    0.0023919214 = product of:
      0.0047838427 = sum of:
        0.0047838427 = product of:
          0.009567685 = sum of:
            0.009567685 = weight(_text_:a in 3018) [ClassicSimilarity], result of:
              0.009567685 = score(doc=3018,freq=16.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.18016359 = fieldWeight in 3018, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3018)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The citation relationships between publications, which are significant for assessing the importance of scholarly components within a network, have been used for various scientific applications. Missing citation metadata in scholarly databases, however, create problems for classical citation-based ranking algorithms and challenge the performance of citation-based retrieval systems. In this research, we utilize a two-step citation analysis method to investigate the importance of publications for which citation information is partially missing. First, we calculate the importance of the author and then use the author's importance to estimate the publication importance for some selected articles. To evaluate this method, we designed a simulation experiment ("random citation-missing") to test the two-step citation analysis that we carried out with the Association for Computing Machinery (ACM) Digital Library (DL). In this experiment, we simulated different scenarios in a large-scale scientific digital library, from high-quality citation data to very poor-quality data. The results show that a two-step citation analysis can effectively uncover the importance of publications in different situations. More importantly, we found that the optimized impact from the importance of an author (first step) is exponentially increased when the quality of citation decreases. The findings from this study can further enhance citation-based publication-ranking algorithms for real-world applications.
    Type
    a
  3. Zeng, M.L.; Chen, Y.: Features of an integrated thesaurus management and search system for the networked environment (2003) 0.00
    0.0023678814 = product of:
      0.0047357627 = sum of:
        0.0047357627 = product of:
          0.009471525 = sum of:
            0.009471525 = weight(_text_:a in 3817) [ClassicSimilarity], result of:
              0.009471525 = score(doc=3817,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.17835285 = fieldWeight in 3817, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3817)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Reports on an integrated system that employs an open structure for managing distributed resources (thesauri and databases) and integrates a thesaurus management system with a cross-thesaurus search system. Describes the functions of the system that highlight its unique design for the networked environment.
    Source
    Subject retrieval in a networked environment: Proceedings of the IFLA Satellite Meeting held in Dublin, OH, 14-16 August 2001 and sponsored by the IFLA Classification and Indexing Section, the IFLA Information Technology Section and OCLC. Ed.: I.C. McIlwaine
    Type
    a
  4. Liu, Y.; Shi, J.; Chen, Y.: Patient-centered and experience-aware mining for effective adverse drug reaction discovery in online health forums (2018) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 4114) [ClassicSimilarity], result of:
              0.009076704 = score(doc=4114,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 4114, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4114)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Adverse Drug Reactions (ADRs) have become a serious health problem and even a leading cause of death in the United States. Pre-marketing clinical trials and traditional post-marketing surveillance using voluntary and spontaneous report systems are insufficient for ADR detection. On the other hand, online health forums provide valuable evidence on a large scale and in a timely fashion through the active participation of patients, caregivers, and doctors. In this article, we present a patient-centered and experience-aware mining framework for effective ADR discovery using online health forum data. Our experimental evaluation with both an official ADR knowledge base and human-annotated ground truth verifies the effectiveness of the proposed method for ADR discovery.
    Type
    a
  5. Ackerman, B.; Wang, C.; Chen, Y.: A session-specific opportunity cost model for rank-oriented recommendation (2018) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 4468) [ClassicSimilarity], result of:
              0.009076704 = score(doc=4468,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 4468, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4468)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Recommender systems are changing the way that people find information, products, and even other people. This paper studies the problem of leveraging the context of the items presented to the user in a user/system interaction session to improve the recommender system's ranking prediction. We propose a novel model that incorporates the opportunity cost of giving up the other items in the session and computes session-specific relevance values for items for context-aware recommendation. The model can work on a variety of different problem settings, with an emphasis on implicit user feedback, as it supports varying levels of ordinal relevance. Experimental evaluation demonstrates the advantages of our new model with respect to ranking quality.
    Type
    a
  6. Wang, C.; Zhao, S.; Kalra, A.; Borcea, C.; Chen, Y.: Predictive models and analysis for webpage depth-level dwell time (2018) 0.00
    0.0022374375 = product of:
      0.004474875 = sum of:
        0.004474875 = product of:
          0.00894975 = sum of:
            0.00894975 = weight(_text_:a in 4370) [ClassicSimilarity], result of:
              0.00894975 = score(doc=4370,freq=14.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1685276 = fieldWeight in 4370, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4370)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Half of online display ads are not rendered viewable because the users do not scroll deep enough or spend sufficient time at the page depth where the ads are placed. In order to increase the marketing efficiency and ad effectiveness, there is a strong demand for viewability prediction from both advertisers and publishers. This paper aims to predict the dwell time for a given (user, page, depth) triplet based on historic data collected by publishers. This problem is difficult because of user behavior variability and data sparsity. To solve it, we propose predictive models based on Factorization Machines and Field-aware Factorization Machines in order to overcome the data sparsity issue and provide flexibility to add auxiliary information such as the visible area of a user's browser. In addition, we leverage the prior dwell time behavior of the user within the current page view, that is, time series information, to further improve the proposed models. Experimental results using data from a large web publisher demonstrate that the proposed models outperform comparison models. Also, the results show that adding time series information further improves the performance.
    Type
    a
  7. Zhang, J.; Chen, Y.; Zhao, Y.; Wolfram, D.; Ma, F.: Public health and social media : a study of Zika virus-related posts on Yahoo! Answers (2020) 0.00
    0.0016913437 = product of:
      0.0033826875 = sum of:
        0.0033826875 = product of:
          0.006765375 = sum of:
            0.006765375 = weight(_text_:a in 5672) [ClassicSimilarity], result of:
              0.006765375 = score(doc=5672,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.12739488 = fieldWeight in 5672, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5672)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This study investigates the content of questions and responses about the Zika virus on Yahoo! Answers as a recent example of how public concerns regarding an international health issue are reflected in social media. We investigate the contents of posts about the Zika virus on Yahoo! Answers, identify and reveal subject patterns about the Zika virus, and analyze the temporal changes of the revealed subject topics over 4 defined periods of the Zika virus outbreak. Multidimensional scaling analysis, temporal analysis, and inferential statistical analysis approaches were used in the study. A resulting 2-layer Zika virus schema, together with term connections and relationships, is presented. The results indicate that consumers' concerns changed over the 4 defined periods. At the beginning of the outbreak, consumers paid more attention to basic information about the Zika virus and to prevention of and protection from it. During the later periods, consumers became more interested in the role that the government and health organizations played in the public health emergency.
    Type
    a