Search (7 results, page 1 of 1)

  • theme_ss:"Data Mining"
  • year_i:[2010 TO 2020}
  1. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.05
    0.05160403 = product of:
      0.12901007 = sum of:
        0.09411452 = weight(_text_:index in 1605) [ClassicSimilarity], result of:
          0.09411452 = score(doc=1605,freq=6.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.418113 = fieldWeight in 1605, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1605)
        0.03489555 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
          0.03489555 = score(doc=1605,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.19345059 = fieldWeight in 1605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1605)
      0.4 = coord(2/5)
    
    Abstract
    Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, pp. 13-22
  2. Tu, Y.-N.; Hsu, S.-L.: Constructing conceptual trajectory maps to trace the development of research fields (2016) 0.02
    0.015368836 = product of:
      0.07684418 = sum of:
        0.07684418 = weight(_text_:index in 3059) [ClassicSimilarity], result of:
          0.07684418 = score(doc=3059,freq=4.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.3413878 = fieldWeight in 3059, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3059)
      0.2 = coord(1/5)
    
    Abstract
    This study proposes a new method to construct and trace the trajectory of conceptual development of a research field by combining main path analysis, citation analysis, and text-mining techniques. Main path analysis, a method used commonly to trace the most critical path in a citation network, helps describe the developmental trajectory of a research field. This study extends the main path analysis method and applies text-mining techniques in the new method, which reflects the trajectory of conceptual development in an academic research field more accurately than citation frequency, which represents only the articles examined. Articles can be merged based on similarity of concepts, and by merging concepts the history of a research field can be described more precisely. The new method was applied to the "h-index" and "text mining" fields. The precision, recall, and F-measures of the h-index were 0.738, 0.652, and 0.658 and those of text-mining were 0.501, 0.653, and 0.551, respectively. Last, this study not only establishes the conceptual trajectory map of a research field, but also recommends keywords that are more precise than those used currently by researchers. These precise keywords could enable researchers to gather related works more quickly than before.
  3. Leydesdorff, L.; Persson, O.: Mapping the geography of science : distribution patterns and networks of relations among cities and institutes (2010) 0.01
    0.013040888 = product of:
      0.06520444 = sum of:
        0.06520444 = weight(_text_:index in 3704) [ClassicSimilarity], result of:
          0.06520444 = score(doc=3704,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.28967714 = fieldWeight in 3704, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=3704)
      0.2 = coord(1/5)
    
    Object
    Science Citation Index
  4. Mining text data (2012) 0.01
    0.008693925 = product of:
      0.043469626 = sum of:
        0.043469626 = weight(_text_:index in 362) [ClassicSimilarity], result of:
          0.043469626 = score(doc=362,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.1931181 = fieldWeight in 362, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.03125 = fieldNorm(doc=362)
      0.2 = coord(1/5)
    
    Content
    An Introduction to Text Mining.- Information Extraction from Text.- A Survey of Text Summarization Techniques.- A Survey of Text Clustering Algorithms.- Dimensionality Reduction and Topic Modeling.- A Survey of Text Classification Algorithms.- Transfer Learning for Text Mining.- Probabilistic Models for Text Mining.- Mining Text Streams.- Translingual Mining from Text Data.- Text Mining in Multimedia.- Text Analytics in Social Media.- A Survey of Opinion Mining and Sentiment Analysis.- Biomedical Text Mining: A Survey of Recent Progress.- Index.
  5. Hallonsten, O.; Holmberg, D.: Analyzing structural stratification in the Swedish higher education system : data contextualization with policy-history analysis (2013) 0.01
    0.00697911 = product of:
      0.03489555 = sum of:
        0.03489555 = weight(_text_:22 in 668) [ClassicSimilarity], result of:
          0.03489555 = score(doc=668,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.19345059 = fieldWeight in 668, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=668)
      0.2 = coord(1/5)
    
    Date
    22. 3.2013 19:43:01
  6. Fonseca, F.; Marcinkowski, M.; Davis, C.: Cyber-human systems of thought and understanding (2019) 0.01
    0.00697911 = product of:
      0.03489555 = sum of:
        0.03489555 = weight(_text_:22 in 5011) [ClassicSimilarity], result of:
          0.03489555 = score(doc=5011,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.19345059 = fieldWeight in 5011, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5011)
      0.2 = coord(1/5)
    
    Date
    7. 3.2019 16:32:22
  7. Jäger, L.: Von Big Data zu Big Brother (2018) 0.01
    0.005583288 = product of:
      0.02791644 = sum of:
        0.02791644 = weight(_text_:22 in 5234) [ClassicSimilarity], result of:
          0.02791644 = score(doc=5234,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.15476047 = fieldWeight in 5234, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=5234)
      0.2 = coord(1/5)
    
    Date
    22. 1.2018 11:33:49
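
Note on the score trees: the numbers listed under each result appear to follow the classic Lucene TF-IDF scheme named in the output (ClassicSimilarity), where tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and the summed term scores are multiplied by coord. The sketch below recomputes the score of result 1 from the values printed above. The function and variable names are illustrative, not part of any library, and queryNorm is copied from the listing rather than derived, since it depends on the full query, which is not shown here.

```python
import math


def classic_idf(doc_freq, max_docs):
    # idf as printed in the explain output: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # One "weight(...)" node: queryWeight * fieldWeight
    tf = math.sqrt(freq)
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight


# Values copied from the listing for result 1 (doc=1605).
QUERY_NORM = 0.051511593  # taken as given, not derived here
MAX_DOCS = 44218

index_part = term_score(6.0, 1520, MAX_DOCS, QUERY_NORM, 0.0390625)
term22_part = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, 0.0390625)

# coord(2/5): two of the five query clauses matched this document.
score = (2 / 5) * (index_part + term22_part)
print(f"{score:.4f}")
```

Under these assumptions the result should agree with the 0.05 shown for result 1 up to rounding. The same function applies to the other results by substituting their freq, docFreq, and fieldNorm values and using coord(1/5).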