Search (4 results, page 1 of 1)

  • theme_ss:"Data Mining"
  • theme_ss:"Suchmaschinen"
  1. Shi, X.; Yang, C.C.: Mining related queries from Web search engine query logs using an improved association rule mining model (2007) 0.01
    0.011021203 = product of:
      0.055106014 = sum of:
        0.055106014 = weight(_text_:needs in 597) [ClassicSimilarity], result of:
          0.055106014 = score(doc=597,freq=2.0), product of:
            0.233039 = queryWeight, product of:
              4.2805085 = idf(docFreq=1662, maxDocs=44218)
              0.0544419 = queryNorm
            0.23646691 = fieldWeight in 597, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.2805085 = idf(docFreq=1662, maxDocs=44218)
              0.0390625 = fieldNorm(doc=597)
      0.2 = coord(1/5)
    
    Abstract
    With the overwhelming volume of information, the task of finding relevant information on a given topic on the Web is becoming increasingly difficult. Web search engines have hence become one of the most popular solutions available on the Web. However, it has never been easy for novice users to organize and represent their information needs using simple queries. Users have to keep modifying their input queries until they get the expected results. Therefore, it is often desirable for search engines to suggest related queries to users. Besides, by identifying those related queries, search engines can potentially perform optimizations on their systems, such as query expansion and file indexing. In this work we propose a method that suggests a list of related queries given an initial input query. The related queries are drawn from the query log of queries previously submitted by human users and are identified using an enhanced model of association rules. Users can utilize the suggested related queries to tune or redirect the search process. Our method not only discovers the related queries, but also ranks them according to the degree of their relatedness. Unlike many other rival techniques, it also performs reasonably well on less frequent input queries.
  2. Liu, Y.; Zhang, M.; Cen, R.; Ru, L.; Ma, S.: Data cleansing for Web information retrieval using query independent features (2007) 0.01
    0.011021203 = product of:
      0.055106014 = sum of:
        0.055106014 = weight(_text_:needs in 607) [ClassicSimilarity], result of:
          0.055106014 = score(doc=607,freq=2.0), product of:
            0.233039 = queryWeight, product of:
              4.2805085 = idf(docFreq=1662, maxDocs=44218)
              0.0544419 = queryNorm
            0.23646691 = fieldWeight in 607, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.2805085 = idf(docFreq=1662, maxDocs=44218)
              0.0390625 = fieldNorm(doc=607)
      0.2 = coord(1/5)
    
    Abstract
    Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous works used hyperlink analysis algorithms to solve this problem. However, little research has been focused on query-independent Web data cleansing for Web IR. In this paper, we first provide analysis of the differences between retrieval target pages and ordinary ones based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm for reducing Web pages that are unlikely to be useful for user requests. We found that there exists a large proportion of low-quality Web pages in both the English and the Chinese Web page corpus, and retrieval target pages can be identified using query-independent features and cleansing algorithms. The experimental results showed that our algorithm is effective in reducing a large portion of Web pages with a small loss in retrieval target pages. It makes it possible for Web IR tools to meet a large fraction of users' needs with only a small part of pages on the Web. These results may help Web search engines make better use of their limited storage and computation resources to improve search performance.
  3. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.01
    0.0073761265 = product of:
      0.03688063 = sum of:
        0.03688063 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
          0.03688063 = score(doc=1605,freq=2.0), product of:
            0.19064626 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0544419 = queryNorm
            0.19345059 = fieldWeight in 1605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1605)
      0.2 = coord(1/5)
    
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, pp. 13-22
  4. Hölzig, C.: Google spürt Grippewellen auf : Die neue Anwendung ist bisher auf die USA beschränkt (2008) 0.01
    0.005900901 = product of:
      0.029504504 = sum of:
        0.029504504 = weight(_text_:22 in 2403) [ClassicSimilarity], result of:
          0.029504504 = score(doc=2403,freq=2.0), product of:
            0.19064626 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0544419 = queryNorm
            0.15476047 = fieldWeight in 2403, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=2403)
      0.2 = coord(1/5)
    
    Date
    3. 5.1997 8:44:22
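
The score breakdowns in the listing above follow the arithmetic of Lucene's ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and the final score is queryWeight × fieldWeight × coord. The following is a minimal sketch that reproduces two of the listed scores from their inputs. The function names are hypothetical, and queryNorm and fieldNorm are copied from the listing as given values rather than recomputed, since they depend on the full query and on the indexed field length, neither of which is shown here.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def explain_score(freq, doc_freq, max_docs, query_norm, field_norm,
                  matched_clauses, total_clauses):
    """Recompute one single-term score tree from the listing.

    query_norm and field_norm are taken from the listing (assumption:
    they are treated as inputs, not derived here).
    """
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    coord = matched_clauses / total_clauses
    return query_weight * field_weight * coord


# Result 1 (doc 597, term "needs") and result 4 (doc 2403, term "22").
score_597 = explain_score(2.0, 1662, 44218, 0.0544419, 0.0390625, 1, 5)
score_2403 = explain_score(2.0, 3622, 44218, 0.0544419, 0.03125, 1, 5)

print(f"doc 597:  {score_597:.7f}")
print(f"doc 2403: {score_2403:.7f}")
```

The recomputed values should agree with the listed 0.011021203 and 0.005900901 up to single-precision rounding, since Lucene performs these calculations in 32-bit floats. The lower score of result 4 relative to result 3 comes only from its smaller fieldNorm (0.03125 versus 0.0390625), which typically reflects a longer field.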
