Search (8 results, page 1 of 1)

  • year_i:[2010 TO 2020}
  • theme_ss:"Data Mining"
  1. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.04
    0.039523818 = product of:
      0.079047635 = sum of:
        0.06387607 = weight(_text_:engines in 1605) [ClassicSimilarity], result of:
          0.06387607 = score(doc=1605,freq=2.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.2806784 = fieldWeight in 1605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1605)
        0.015171562 = product of:
          0.030343125 = sum of:
            0.030343125 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
              0.030343125 = score(doc=1605,freq=2.0), product of:
                0.15685207 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04479146 = queryNorm
                0.19345059 = fieldWeight in 1605, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1605)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, pp.13-22
  2. Berry, M.W.; Esau, R.; Kiefer, B.: The use of text mining techniques in electronic discovery for legal matters (2012) 0.03
    0.027100323 = product of:
      0.10840129 = sum of:
        0.10840129 = weight(_text_:engines in 91) [ClassicSimilarity], result of:
          0.10840129 = score(doc=91,freq=4.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.47632706 = fieldWeight in 91, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.046875 = fieldNorm(doc=91)
      0.25 = coord(1/4)
    
    Footnote
    Cf.: http://www.igi-global.com/book/next-generation-search-engines/64425.
    Source
    Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis et al.
  3. Biskri, I.; Rompré, L.: Using association rules for query reformulation (2012) 0.03
    0.027100323 = product of:
      0.10840129 = sum of:
        0.10840129 = weight(_text_:engines in 92) [ClassicSimilarity], result of:
          0.10840129 = score(doc=92,freq=4.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.47632706 = fieldWeight in 92, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.046875 = fieldNorm(doc=92)
      0.25 = coord(1/4)
    
    Footnote
    Cf.: http://www.igi-global.com/book/next-generation-search-engines/64430.
    Source
    Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis et al.
  4. Huvila, I.: Mining qualitative data on human information behaviour from the Web (2010) 0.02
    0.022356624 = product of:
      0.089426495 = sum of:
        0.089426495 = weight(_text_:engines in 4676) [ClassicSimilarity], result of:
          0.089426495 = score(doc=4676,freq=2.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.39294976 = fieldWeight in 4676, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4676)
      0.25 = coord(1/4)
    
    Abstract
    This paper discusses an approach of collecting qualitative data on human information behaviour that is based on mining web data using search engines. The approach is technically the same that has been used for some time in webometric research to make statistical inferences on web data, but the present paper shows how the same tools and data collecting methods can be used to gather data for qualitative data analysis on human information behaviour.
  5. Sarnikar, S.; Zhang, Z.; Zhao, J.L.: Query-performance prediction for effective query routing in domain-specific repositories (2014) 0.02
    0.01916282 = product of:
      0.07665128 = sum of:
        0.07665128 = weight(_text_:engines in 1326) [ClassicSimilarity], result of:
          0.07665128 = score(doc=1326,freq=2.0), product of:
            0.22757743 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.04479146 = queryNorm
            0.33681408 = fieldWeight in 1326, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.046875 = fieldNorm(doc=1326)
      0.25 = coord(1/4)
    
    Abstract
    The effective use of corporate memory is becoming increasingly important because every aspect of e-business requires access to information repositories. Unfortunately, less-than-satisfying effectiveness in state-of-the-art information-retrieval techniques is well known, even for some of the best search engines such as Google. In this study, the authors resolve this retrieval ineffectiveness problem by developing a new framework for predicting query performance, which is the first step toward better retrieval effectiveness. Specifically, they examine the relationship between query performance and query context. A query context consists of the query itself, the document collection, and the interaction between the two. The authors first analyze the characteristics of query context and develop various features for predicting query performance. Then, they propose a context-sensitive model for predicting query performance based on the characteristics of the query and the document collection. Finally, they validate this model with respect to five real-world collections of documents and demonstrate its utility in routing queries to the correct repository with high accuracy.
  6. Hallonsten, O.; Holmberg, D.: Analyzing structural stratification in the Swedish higher education system : data contextualization with policy-history analysis (2013) 0.00
    0.0037928906 = product of:
      0.015171562 = sum of:
        0.015171562 = product of:
          0.030343125 = sum of:
            0.030343125 = weight(_text_:22 in 668) [ClassicSimilarity], result of:
              0.030343125 = score(doc=668,freq=2.0), product of:
                0.15685207 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04479146 = queryNorm
                0.19345059 = fieldWeight in 668, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=668)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 3.2013 19:43:01
  7. Fonseca, F.; Marcinkowski, M.; Davis, C.: Cyber-human systems of thought and understanding (2019) 0.00
    0.0037928906 = product of:
      0.015171562 = sum of:
        0.015171562 = product of:
          0.030343125 = sum of:
            0.030343125 = weight(_text_:22 in 5011) [ClassicSimilarity], result of:
              0.030343125 = score(doc=5011,freq=2.0), product of:
                0.15685207 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04479146 = queryNorm
                0.19345059 = fieldWeight in 5011, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5011)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    7. 3.2019 16:32:22
  8. Jäger, L.: Von Big Data zu Big Brother (2018) 0.00
    0.0030343123 = product of:
      0.012137249 = sum of:
        0.012137249 = product of:
          0.024274498 = sum of:
            0.024274498 = weight(_text_:22 in 5234) [ClassicSimilarity], result of:
              0.024274498 = score(doc=5234,freq=2.0), product of:
                0.15685207 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04479146 = queryNorm
                0.15476047 = fieldWeight in 5234, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5234)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 1.2018 11:33:49
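
Note on the score breakdowns: the explain trees above follow the usual TF-IDF pattern of ClassicSimilarity. The sketch below recomputes the scores of results 1 and 2 from the listed inputs. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is what the listed values are consistent with. The function names are illustrative and not part of the listing, and queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 44218         # maxDocs, as shown in every idf line
QUERY_NORM = 0.04479146  # queryNorm, copied from the listing (not derived here)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one term in one document: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, MAX_DOCS)
    tf = math.sqrt(freq)                  # assumed tf formula: sqrt(termFreq)
    query_weight = idf * QUERY_NORM       # "queryWeight" line in the tree
    field_weight = tf * idf * field_norm  # "fieldWeight" line in the tree
    return query_weight * field_weight


# Result 1 (doc 1605): two matching clauses, "engines" and "22".
engines_1605 = classic_term_score(freq=2.0, doc_freq=746, field_norm=0.0390625)
term22_1605 = classic_term_score(freq=2.0, doc_freq=3622, field_norm=0.0390625)

# The "22" clause sits in a nested query with coord(1/2);
# the outer query matched 2 of 4 clauses, hence coord(2/4).
score_1605 = (engines_1605 + term22_1605 * 0.5) * (2 / 4)

# Result 2 (doc 91): only "engines" matches, with freq=4 and coord(1/4).
score_91 = classic_term_score(freq=4.0, doc_freq=746, field_norm=0.046875) * (1 / 4)

print(f"doc 1605: {score_1605:.6f}")
print(f"doc 91:   {score_91:.6f}")
```

Small differences in the last digits against the listing are expected, because the listing shows single-precision values while this sketch uses double precision.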