Search (7 results, page 1 of 1)

  • × theme_ss:"Data Mining"
  • × theme_ss:"Suchmaschinen"
  1. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.08
    0.08185689 = product of:
      0.16371378 = sum of:
        0.08959906 = weight(_text_:data in 1605) [ClassicSimilarity], result of:
          0.08959906 = score(doc=1605,freq=24.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.60511017 = fieldWeight in 1605, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1605)
        0.07411472 = sum of:
          0.042392377 = weight(_text_:processing in 1605) [ClassicSimilarity], result of:
            0.042392377 = score(doc=1605,freq=2.0), product of:
              0.18956426 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.046827413 = queryNorm
              0.22363065 = fieldWeight in 1605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1605)
          0.03172234 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
            0.03172234 = score(doc=1605,freq=2.0), product of:
              0.16398162 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046827413 = queryNorm
              0.19345059 = fieldWeight in 1605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1605)
      0.5 = coord(2/4)
    
    Abstract
    Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22
    Theme
    Data Mining
  2. Hölzig, C.: Google spürt Grippewellen auf : Die neue Anwendung ist bisher auf die USA beschränkt (2008) 0.02
    0.016690476 = product of:
      0.03338095 = sum of:
        0.020692015 = weight(_text_:data in 2403) [ClassicSimilarity], result of:
          0.020692015 = score(doc=2403,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.1397442 = fieldWeight in 2403, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03125 = fieldNorm(doc=2403)
        0.012688936 = product of:
          0.025377871 = sum of:
            0.025377871 = weight(_text_:22 in 2403) [ClassicSimilarity], result of:
              0.025377871 = score(doc=2403,freq=2.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.15476047 = fieldWeight in 2403, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2403)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    3. 5.1997 8:44:22
    Theme
    Data Mining
  3. Liu, Y.; Zhang, M.; Cen, R.; Ru, L.; Ma, S.: Data cleansing for Web information retrieval using query independent features (2007) 0.01
    0.01293251 = product of:
      0.05173004 = sum of:
        0.05173004 = weight(_text_:data in 607) [ClassicSimilarity], result of:
          0.05173004 = score(doc=607,freq=8.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.34936053 = fieldWeight in 607, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=607)
      0.25 = coord(1/4)
    
    Abstract
    Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous works used hyperlink analysis algorithms to solve this problem. However, little research has been focused on query-independent Web data cleansing for Web IR. In this paper, we first provide analysis of the differences between retrieval target pages and ordinary ones based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm for reducing Web pages that are unlikely to be useful for user requests. We found that there exists a large proportion of low-quality Web pages in both the English and the Chinese Web page corpus, and retrieval target pages can be identified using query-independent features and cleansing algorithms. The experimental results showed that our algorithm is effective in reducing a large portion of Web pages with a small loss in retrieval target pages. It makes it possible for Web IR tools to meet a large fraction of users' needs with only a small part of pages on the Web. These results may help Web search engines make better use of their limited storage and computation resources to improve search performance.
    Theme
    Data Mining
  4. Search tools (1997) 0.01
    0.012802532 = product of:
      0.051210128 = sum of:
        0.051210128 = weight(_text_:data in 3834) [ClassicSimilarity], result of:
          0.051210128 = score(doc=3834,freq=4.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.34584928 = fieldWeight in 3834, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3834)
      0.25 = coord(1/4)
    
    Abstract
    Offers brief accounts of Internet search tools. Covers the Lycos revamp; the new navigation service produced jointly by Excite and Netscape, delivering a language specific, locally relevant Web guide for Japan, Germany, France, the UK and Australia; InfoWatcher, a combination offline browser, search engine and push product from Carvelle Inc., USA; Alexa by Alexa Internet and WBI from IBM which are free and provide users with information on how others have used the Web sites which they are visiting; and Concept Explorer from Knowledge Discovery Systems, Inc., California which performs data mining from the Web, Usenet groups, MEDLINE and the US Patent and Trademark Office patent abstracts
    Theme
    Data Mining
  5. Whittle, M.; Eaglestone, B.; Ford, N.; Gillet, V.J.; Madden, A.: Data mining of search engine logs (2007) 0.01
    0.010973599 = product of:
      0.043894395 = sum of:
        0.043894395 = weight(_text_:data in 1330) [ClassicSimilarity], result of:
          0.043894395 = score(doc=1330,freq=4.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.29644224 = fieldWeight in 1330, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=1330)
      0.25 = coord(1/4)
    
    Theme
    Data Mining
  6. Baeza-Yates, R.; Hurtado, C.; Mendoza, M.: Improving search engines by query clustering (2007) 0.01
    0.009052756 = product of:
      0.036211025 = sum of:
        0.036211025 = weight(_text_:data in 601) [ClassicSimilarity], result of:
          0.036211025 = score(doc=601,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.24455236 = fieldWeight in 601, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=601)
      0.25 = coord(1/4)
    
    Theme
    Data Mining
  7. Shi, X.; Yang, C.C.: Mining related queries from Web search engine query logs using an improved association rule mining model (2007) 0.01
    0.006466255 = product of:
      0.02586502 = sum of:
        0.02586502 = weight(_text_:data in 597) [ClassicSimilarity], result of:
          0.02586502 = score(doc=597,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.17468026 = fieldWeight in 597, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=597)
      0.25 = coord(1/4)
    
    Theme
    Data Mining