Search (132 results, page 1 of 7)

  • language_ss:"e"
  • theme_ss:"Suchmaschinen"
  1. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google Trends and Baidu Index (2015) 0.04
    0.043052107 = product of:
      0.086104214 = sum of:
        0.07315418 = weight(_text_:data in 1605) [ClassicSimilarity], result of:
          0.07315418 = score(doc=1605,freq=24.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.60511017 = fieldWeight in 1605, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1605)
        0.012950035 = product of:
          0.02590007 = sum of:
            0.02590007 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
              0.02590007 = score(doc=1605,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.19345059 = fieldWeight in 1605, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1605)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
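    The explain tree above is standard Lucene ClassicSimilarity output; the same pattern repeats under every hit below. As a reading aid, the top hit's score reconstructs from the figures shown (a worked sketch of the TF-IDF arithmetic, using Lucene's own symbol names):

    \begin{aligned}
    \mathrm{idf}(\text{data}) &= 1 + \ln\frac{\mathrm{maxDocs}}{\mathrm{docFreq}+1} = 1 + \ln\frac{44218}{5089} \approx 3.1620505\\
    \mathrm{tf} &= \sqrt{\mathrm{freq}} = \sqrt{24} \approx 4.8989797\\
    \mathrm{queryWeight} &= \mathrm{idf}\cdot\mathrm{queryNorm} = 3.1620505 \times 0.03823278 \approx 0.120894\\
    \mathrm{fieldWeight} &= \mathrm{tf}\cdot\mathrm{idf}\cdot\mathrm{fieldNorm} = 4.8989797 \times 3.1620505 \times 0.0390625 \approx 0.605110\\
    w_{\text{data}} &= \mathrm{queryWeight}\cdot\mathrm{fieldWeight} \approx 0.0731542\\
    \mathrm{score} &= \mathrm{coord}(2/4)\,\bigl(w_{\text{data}} + \mathrm{coord}(1/2)\,w_{\text{22}}\bigr) = 0.5\,(0.0731542 + 0.5 \times 0.0259001) \approx 0.0430521
    \end{aligned}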
    
    Abstract
    Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22
    Theme
    Data Mining
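    For illustration, correlating query volumes from two services, as the study does, amounts to computing a correlation coefficient over paired volume series. A minimal sketch in Python; the keyword series below are invented, not the paper's data:

    from math import sqrt

    def pearson(x, y):
        """Pearson correlation of two equal-length series."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sqrt(sum((a - mx) ** 2 for a in x))
        sy = sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy)

    # Hypothetical weekly search volumes for one university name.
    google_trends = [55, 60, 58, 70, 65, 80, 78, 90]
    baidu_index   = [50, 57, 55, 68, 60, 75, 74, 88]

    print(f"r = {pearson(google_trends, baidu_index):.3f}")  # high r = the two sources agree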
  2. Hock, R.E.: How to do field searching in Web search engines : a field trip (1998) 0.03
    0.031545527 = product of:
      0.063091055 = sum of:
        0.03378847 = weight(_text_:data in 3601) [ClassicSimilarity], result of:
          0.03378847 = score(doc=3601,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.2794884 = fieldWeight in 3601, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=3601)
        0.029302584 = product of:
          0.058605168 = sum of:
            0.058605168 = weight(_text_:22 in 3601) [ClassicSimilarity], result of:
              0.058605168 = score(doc=3601,freq=4.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.4377287 = fieldWeight in 3601, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3601)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Explains how 5 Internet search engines (AltaVista, HotBot, InfoSeek, Lycos, and Yahoo) handle field searching. Includes a chart which identifies where on a search engine's page a particular field is searched and the prefix syntax used, and gives examples. Details the individual fields that can be searched: date, title, URL, images, audio/video and other page content, links and page depth
    Source
    Online. 22(1998) no.3, S.18-22
  3. Becker, N.J.: Google in perspective : understanding and enhancing student search skills (2003) 0.03
    0.028614787 = product of:
      0.11445915 = sum of:
        0.11445915 = weight(_text_:becker in 2383) [ClassicSimilarity], result of:
          0.11445915 = score(doc=2383,freq=2.0), product of:
            0.25693014 = queryWeight, product of:
              6.7201533 = idf(docFreq=144, maxDocs=44218)
              0.03823278 = queryNorm
            0.44548744 = fieldWeight in 2383, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.7201533 = idf(docFreq=144, maxDocs=44218)
              0.046875 = fieldNorm(doc=2383)
      0.25 = coord(1/4)
    
  4. Lewandowski, D.; Sünkler, S.: What does Google recommend when you want to compare insurance offerings? (2019) 0.02
    0.024763562 = product of:
      0.049527124 = sum of:
        0.03657709 = weight(_text_:data in 5288) [ClassicSimilarity], result of:
          0.03657709 = score(doc=5288,freq=6.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.30255508 = fieldWeight in 5288, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5288)
        0.012950035 = product of:
          0.02590007 = sum of:
            0.02590007 = weight(_text_:22 in 5288) [ClassicSimilarity], result of:
              0.02590007 = score(doc=5288,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.19345059 = fieldWeight in 5288, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5288)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Purpose: The purpose of this paper is to describe a new method to improve the analysis of search engine results by considering the provider level as well as the domain level. This approach is tested by conducting a study using queries on the topic of insurance comparisons.
    Design/methodology/approach: The authors conducted an empirical study that analyses the results of search queries aimed at comparing insurance companies. The authors used a self-developed software system that automatically queries commercial search engines and automatically extracts the content of the returned result pages for further data analysis. The data analysis was carried out using the KNIME Analytics Platform.
    Findings: Google's top search results are served by only a few providers that frequently appear in these results. The authors show that some providers operate several domains on the same topic and that these domains appear for the same queries in the result lists.
    Research limitations/implications: The authors demonstrate the feasibility of this approach and draw conclusions for further investigations from the empirical study. However, the study is a limited use case based on a limited number of search queries.
    Originality/value: The proposed method allows large-scale analysis of the composition of the top results from commercial search engines. It allows using valid empirical data to determine what users actually see on the search engine result pages.
    Date
    20. 1.2015 18:30:22
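    The counting step of such a SERP analysis can be sketched in a few lines: collect the result URLs per query, reduce each URL to its domain, and map several domains to one provider. Everything below (queries, URLs, the provider table) is invented for illustration; the authors' actual system is more elaborate:

    from collections import Counter
    from urllib.parse import urlparse

    results = {
        "kfz versicherung vergleich": [
            "https://www.check24.de/kfz-versicherung/",
            "https://www.verivox.de/kfz-versicherung/",
            "https://www.check24.net/autoversicherung/",
        ],
        "haftpflicht vergleich": [
            "https://www.check24.de/haftpflicht/",
            "https://www.verivox.de/haftpflicht/",
        ],
    }

    # Several domains can belong to one provider (hypothetical mapping).
    provider_of = {"check24.de": "Check24", "check24.net": "Check24",
                   "verivox.de": "Verivox"}

    domains, providers = Counter(), Counter()
    for urls in results.values():
        for url in urls:
            host = urlparse(url).hostname or ""
            domain = ".".join(host.split(".")[-2:])  # naive eTLD+1
            domains[domain] += 1
            providers[provider_of.get(domain, domain)] += 1

    print(domains.most_common())    # domain-level view
    print(providers.most_common())  # provider-level view: fewer, larger players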
  5. Wiley, D.L.: Beyond information retrieval : ways to provide content in context (1998) 0.02
    0.02384748 = product of:
      0.04769496 = sum of:
        0.02956491 = weight(_text_:data in 3647) [ClassicSimilarity], result of:
          0.02956491 = score(doc=3647,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.24455236 = fieldWeight in 3647, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3647)
        0.01813005 = product of:
          0.0362601 = sum of:
            0.0362601 = weight(_text_:22 in 3647) [ClassicSimilarity], result of:
              0.0362601 = score(doc=3647,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.2708308 = fieldWeight in 3647, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3647)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The days of the traditional abstracting and indexing services are waning, as abstracts and bibliographic data become commodities. However, there are tremendous opportunities for those organizations willing to look beyond the status quo to the new possibilities enabled by the latest wave of advanced technologies. Those who own content need to focus on the delivery mechanisms and new markets that technology can provide. Features like automatic extraction of key concepts or names, collaborative filtering to help with trend analysis, and visualization techniques can take information past the retrieval stage and into the management area
    Source
    Database. 21(1998) no.4, S.18-22
  6. Loia, V.; Pedrycz, W.; Senatore, S.; Sessa, M.I.: Web navigation support by means of proximity-driven assistant agents (2006) 0.02
    0.021407552 = product of:
      0.042815104 = sum of:
        0.02986507 = weight(_text_:data in 5283) [ClassicSimilarity], result of:
          0.02986507 = score(doc=5283,freq=4.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.24703519 = fieldWeight in 5283, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5283)
        0.012950035 = product of:
          0.02590007 = sum of:
            0.02590007 = weight(_text_:22 in 5283) [ClassicSimilarity], result of:
              0.02590007 = score(doc=5283,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.19345059 = fieldWeight in 5283, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5283)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The explosive growth of the Web and the consequent demand for Web personalization have made it a key goal to customize Web information to the needs of specific users, taking advantage of the knowledge acquired from the analysis of the user's navigational behavior (usage data) in correlation with other information collected in the Web context, namely, structure, content, and user profile data. This work presents an agent-based framework designed to help a user in achieving personalized navigation, by recommending related documents according to the user's responses in similar-pages searching mode. Our agent-based approach is grounded in the integration of different techniques and methodologies into a unique platform featuring user profiling, fuzzy multisets, proximity-oriented fuzzy clustering, and knowledge-based discovery technologies. Each of these methodologies serves to solve one facet of the general problem (discovering documents relevant to the user by searching the Web) and is treated by specialized agents that ultimately achieve the final functionality through cooperation and task distribution.
    Date
    22. 7.2006 16:59:13
  7. Fischer, T.; Neuroth, H.: SSG-FI - special subject gateways to high quality Internet resources for scientific users (2000) 0.02
    0.020440696 = product of:
      0.04088139 = sum of:
        0.02534135 = weight(_text_:data in 4873) [ClassicSimilarity], result of:
          0.02534135 = score(doc=4873,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.2096163 = fieldWeight in 4873, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=4873)
        0.015540041 = product of:
          0.031080082 = sum of:
            0.031080082 = weight(_text_:22 in 4873) [ClassicSimilarity], result of:
              0.031080082 = score(doc=4873,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.23214069 = fieldWeight in 4873, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4873)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Project SSG-FI at SUB Göttingen provides special subject gateways to international high quality Internet resources for scientific users. Internet sites are selected by subject specialists and described using an extension of qualified Dublin Core metadata. A basic evaluation is added. These descriptions are freely available and can be searched and browsed. There are now subject gateways for 3 subject areas: earth sciences (GeoGuide); mathematics (MathGuide); and Anglo-American culture (split into HistoryGuide and AnglistikGuide). Together they receive about 3,300 'hard' requests per day, thus reaching over 1 million requests per year. The project SSG-FI behind these guides is open to collaboration. Institutions and private persons wishing to contribute can notify the SSG-FI team or send full data sets. Regular contributors can request registration with the project to access the database via the Internet and create and edit records
    Date
    22. 6.2002 19:40:42
  8. Su, L.T.: A comprehensive and systematic model of user evaluation of Web search engines : II. An evaluation by undergraduates (2003) 0.02
    0.017033914 = product of:
      0.03406783 = sum of:
        0.021117793 = weight(_text_:data in 2117) [ClassicSimilarity], result of:
          0.021117793 = score(doc=2117,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.17468026 = fieldWeight in 2117, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2117)
        0.012950035 = product of:
          0.02590007 = sum of:
            0.02590007 = weight(_text_:22 in 2117) [ClassicSimilarity], result of:
              0.02590007 = score(doc=2117,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.19345059 = fieldWeight in 2117, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2117)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    This paper presents an application of the model described in Part I to the evaluation of Web search engines by undergraduates. The study observed how 36 undergraduates used four major search engines to find information for their own individual problems and how they evaluated these engines based on actual interaction with the search engines. User evaluation was based on 16 performance measures representing five evaluation criteria: relevance, efficiency, utility, user satisfaction, and connectivity. Non-performance (user-related) measures were also applied. Each participant searched his/her own topic on all four engines and provided satisfaction ratings for system features and interaction and reasons for satisfaction. Each also made relevance judgements of retrieved items in relation to his/her own information need and participated in post-search interviews to provide reactions to the search results and overall performance. The study found significant differences in precision PR1, relative recall, user satisfaction with output display, time saving, value of search results, and overall performance among the four engines, and also significant engine-by-discipline interactions on all these measures. In addition, the study found significant differences in user satisfaction with response time among the four engines, and a significant engine-by-discipline interaction in user satisfaction with search interface. None of the four search engines dominated in every aspect of the multidimensional evaluation. Content analysis of verbal data identified a number of user criteria and users' evaluative comments based on these criteria. Results from both quantitative analysis and content analysis provide insight for system design and development, and useful feedback on strengths and weaknesses of search engines for system improvement
    Date
    24. 1.2004 18:27:22
  9. Baeza-Yates, R.; Boldi, P.; Castillo, C.: Generalizing PageRank : damping functions for link-based ranking algorithms (2006) 0.02
    0.017033914 = product of:
      0.03406783 = sum of:
        0.021117793 = weight(_text_:data in 2565) [ClassicSimilarity], result of:
          0.021117793 = score(doc=2565,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.17468026 = fieldWeight in 2565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2565)
        0.012950035 = product of:
          0.02590007 = sum of:
            0.02590007 = weight(_text_:22 in 2565) [ClassicSimilarity], result of:
              0.02590007 = score(doc=2565,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.19345059 = fieldWeight in 2565, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2565)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    This paper introduces a family of link-based ranking algorithms that propagate page importance through links. In these algorithms there is a damping function that decreases with distance, so a direct link implies more endorsement than a link through a long path. PageRank is the most widely known ranking function of this family. The main objective of this paper is to determine whether this family of ranking techniques has some interest per se, and how different choices for the damping function impact on rank quality and on convergence speed. Even though our results suggest that PageRank can be approximated with other simpler forms of rankings that may be computed more efficiently, our focus is of more speculative nature, in that it aims at separating the kernel of PageRank, that is, link-based importance propagation, from the way propagation decays over paths. We focus on three damping functions, having linear, exponential, and hyperbolic decay on the lengths of the paths. The exponential decay corresponds to PageRank, and the other functions are new. Our presentation includes algorithms, analysis, comparisons and experiments that study their behavior under different parameters in real Web graph data. Among other results, we show how to calculate a linear approximation that induces a page ordering that is almost identical to PageRank's using a fixed small number of iterations; comparisons were performed using Kendall's tau on large domain datasets.
    Date
    16. 1.2016 10:22:28
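    The family of rankings described here can be written as r = Σ_t damping(t) · v Pᵗ, where P is the row-stochastic link matrix and v a uniform start vector; damping(t) = (1-α)αᵗ recovers PageRank, while linear and hyperbolic decay are the paper's alternatives. A runnable sketch on a toy four-page graph (graph and parameters invented for illustration):

    import numpy as np

    def path_damped_rank(P, damping, T=50):
        """r = sum_t damping(t) * v P^t, with v uniform; P row-stochastic."""
        n = P.shape[0]
        v = np.full(n, 1.0 / n)
        r, step = np.zeros(n), v.copy()
        for t in range(T):
            r += damping(t) * step
            step = step @ P
        return r / r.sum()

    # Toy 4-page web graph.
    P = np.array([[0, 1/2, 1/2, 0],
                  [0, 0, 1, 0],
                  [1/3, 1/3, 0, 1/3],
                  [0, 0, 1, 0]])

    a = 0.85
    dampings = {
        "exponential (PageRank)": lambda t: (1 - a) * a ** t,
        "linear (cutoff 10)":     lambda t: max(0.0, 1 - t / 10),
        "hyperbolic":             lambda t: 1.0 / (t + 1) ** 2,
    }
    for name, f in dampings.items():
        print(name, np.round(path_damped_rank(P, f), 3))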
  10. Roux, M.: Metadata for search engines : what can be learned from e-Sciences? (2012) 0.02
    0.01676173 = product of:
      0.06704692 = sum of:
        0.06704692 = weight(_text_:data in 96) [ClassicSimilarity], result of:
          0.06704692 = score(doc=96,freq=14.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.55459267 = fieldWeight in 96, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=96)
      0.25 = coord(1/4)
    
    Abstract
    E-sciences are data-intensive sciences that make extensive use of the Web to share, collect, and process data. In this context, primary scientific data is becoming a new challenging issue, as data must be extensively described (1) to account for empirical conditions and results that allow interpretation and/or analyses and (2) to be understandable by computers used for data storage and information retrieval. In this respect, metadata is a focal point, whether considered from the point of view of the user, who visualizes and exploits data, or from that of the search tools that find and retrieve information. Numerous disciplines are concerned with the issues of describing complex observations and addressing pertinent knowledge. In this paper, similarities and differences in data description and exploration strategies among disciplines in e-sciences are examined.
  11. Drabenstott, K.M.: Web search strategies (2000) 0.01
    0.013627131 = product of:
      0.027254261 = sum of:
        0.016894234 = weight(_text_:data in 1188) [ClassicSimilarity], result of:
          0.016894234 = score(doc=1188,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.1397442 = fieldWeight in 1188, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03125 = fieldNorm(doc=1188)
        0.010360028 = product of:
          0.020720055 = sum of:
            0.020720055 = weight(_text_:22 in 1188) [ClassicSimilarity], result of:
              0.020720055 = score(doc=1188,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.15476047 = fieldWeight in 1188, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1188)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Surfing the World Wide Web used to be cool, dude, real cool. But things have gotten hot - so hot that finding something useful on the Web is no longer cool. It is suffocating Web searchers in the smoke and debris of mountain-sized lists of hits, decisions about which search engines they should use, whether they will get lost in the dizzying maze of a subject directory, use the right syntax for the search engine at hand, enter keywords that are likely to retrieve hits on the topics they have in mind, or enlist a browser that has sufficient functionality to display the most promising hits. When it comes to Web searching, in a few short years we have gone from the cool image of surfing the Web into the frying pan of searching the Web. We can turn down the heat by rethinking what Web searchers are doing and introduce some order into the chaos. Web search strategies that are tool-based, i.e. oriented to specific Web searching tools such as search engines, subject directories, and meta search engines, have been widely promoted, and these strategies are just not working. It is time to dissect what Web searching tools expect from searchers and adjust our search strategies to these new tools. This discussion offers Web searchers help in the form of search strategies that are based on strategies that librarians have been using for a long time to search commercial information retrieval systems like Dialog, NEXIS, Wilsonline, FirstSearch, and Data-Star.
    Date
    22. 9.1997 19:16:05
  12. Hogan, A.; Harth, A.; Umbrich, J.; Kinsella, S.; Polleres, A.; Decker, S.: Searching and browsing Linked Data with SWSE : the Semantic Web Search Engine (2011) 0.01
    0.012931953 = product of:
      0.051727813 = sum of:
        0.051727813 = weight(_text_:data in 438) [ClassicSimilarity], result of:
          0.051727813 = score(doc=438,freq=12.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.4278775 = fieldWeight in 438, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=438)
      0.25 = coord(1/4)
    
    Abstract
    In this paper, we discuss the architecture and implementation of the Semantic Web Search Engine (SWSE). Following traditional search engine architecture, SWSE consists of crawling, data enhancing, indexing and a user interface for search, browsing and retrieval of information; unlike traditional search engines, SWSE operates over RDF Web data - loosely also known as Linked Data - which implies unique challenges for the system design, architecture, algorithms, implementation and user interface. In particular, many challenges exist in adopting Semantic Web technologies for Web data: the unique challenges of the Web - in terms of scale, unreliability, inconsistency and noise - are largely overlooked by the current Semantic Web standards. Herein, we describe the current SWSE system, initially detailing the architecture and later elaborating upon the function, design, implementation and performance of each individual component. In so doing, we also give an insight into how current Semantic Web standards can be tailored, in a best-effort manner, for use on Web data. Throughout, we offer evaluation and complementary argumentation to support our design choices, and also offer discussion on future directions and open research questions. Later, we also provide candid discussion relating to the difficulties currently faced in bringing such a search engine into the mainstream, and lessons learnt from roughly six years working on the Semantic Web Search Engine project.
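    A flavor of what "search over RDF Web data" means in practice, as opposed to page-oriented keyword search: entities and their properties are queried directly. A minimal sketch using the rdflib library (not SWSE's actual code; the FOAF snippet is invented):

    from rdflib import Graph

    turtle = """
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    <http://example.org/alice> a foaf:Person ;
        foaf:name "Alice" ;
        foaf:knows <http://example.org/bob> .
    <http://example.org/bob> a foaf:Person ;
        foaf:name "Bob" .
    """

    g = Graph()
    g.parse(data=turtle, format="turtle")

    # Entity-centric lookup: every person and their name.
    q = """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person ?name WHERE { ?person a foaf:Person ; foaf:name ?name . }
    """
    for person, name in g.query(q):
        print(person, name)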
  13. What is Schema.org? (2011) 0.01
    0.012670675 = product of:
      0.0506827 = sum of:
        0.0506827 = weight(_text_:data in 4437) [ClassicSimilarity], result of:
          0.0506827 = score(doc=4437,freq=8.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.4192326 = fieldWeight in 4437, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=4437)
      0.25 = coord(1/4)
    
    Abstract
    This site provides a collection of schemas, i.e., HTML tags, that webmasters can use to markup their pages in ways recognized by major search providers. Search engines including Bing, Google and Yahoo! rely on this markup to improve the display of search results, making it easier for people to find the right web pages. Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure. A shared markup vocabulary makes it easier for webmasters to decide on a markup schema and get the maximum benefit for their efforts. So, in the spirit of sitemaps.org, Bing, Google and Yahoo! have come together to provide a shared collection of schemas that webmasters can use.
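    An example in the style of the site's own getting-started documentation, a movie page annotated with schema.org microdata; it is wrapped in Python only to match the other sketches in this list, and the trailer URL is hypothetical:

    # The HTML fragment below follows schema.org's original Movie example.
    markup = """
    <div itemscope itemtype="http://schema.org/Movie">
      <h1 itemprop="name">Avatar</h1>
      <span>Director: <span itemprop="director">James Cameron</span></span>
      <span itemprop="genre">Science fiction</span>
      <a href="avatar-trailer.html" itemprop="trailer">Trailer</a>
    </div>
    """
    print(markup)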
  14. Das, A.; Jain, A.: Indexing the World Wide Web : the journey so far (2012) 0.01
    0.010973128 = product of:
      0.04389251 = sum of:
        0.04389251 = weight(_text_:data in 95) [ClassicSimilarity], result of:
          0.04389251 = score(doc=95,freq=6.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.3630661 = fieldWeight in 95, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=95)
      0.25 = coord(1/4)
    
    Abstract
    In this chapter, the authors describe the key indexing components of today's web search engines. As the World Wide Web has grown, the systems and methods for indexing have changed significantly. The authors present the data structures used, the features extracted, the infrastructure needed, and the options available for designing a brand new search engine. Techniques are highlighted that improve relevance of results, discuss trade-offs to best utilize machine resources, and cover distributed processing concepts in this context. In particular, the authors delve into the topics of indexing phrases instead of terms, storage in memory vs. on disk, and data partitioning. Some thoughts on information organization for the newly emerging data-forms conclude the chapter.
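    The central data structure discussed, the inverted index, fits in a few lines: each term maps to a postings list of (document id, term frequency), and a conjunctive query intersects postings lists. A toy sketch with invented documents:

    from collections import defaultdict

    docs = {
        1: "web search engines index the web",
        2: "an inverted index maps terms to documents",
        3: "search engines rank documents",
    }

    index = defaultdict(list)  # term -> [(doc_id, tf), ...]
    for doc_id, text in docs.items():
        counts = defaultdict(int)
        for term in text.split():
            counts[term] += 1
        for term, tf in counts.items():
            index[term].append((doc_id, tf))

    def search(*terms):
        """Conjunctive query: intersect the postings lists."""
        ids = [set(d for d, _ in index.get(t, [])) for t in terms]
        return set.intersection(*ids) if ids else set()

    print(sorted(index["index"]))       # [(1, 1), (2, 1)]
    print(search("search", "engines"))  # {1, 3}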
  15. Vaughan, L.; Romero-Frías, E.: Web search volume as a predictor of academic fame : an exploration of Google trends (2014) 0.01
    0.010973128 = product of:
      0.04389251 = sum of:
        0.04389251 = weight(_text_:data in 1233) [ClassicSimilarity], result of:
          0.04389251 = score(doc=1233,freq=6.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.3630661 = fieldWeight in 1233, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=1233)
      0.25 = coord(1/4)
    
    Abstract
    Searches conducted on web search engines reflect the interests of users and society. Google Trends, which provides information about the queries searched by users of the Google web search engine, is a rich data source from which a wealth of information can be mined. We investigated the possibility of using web search volume data from Google Trends to predict academic fame. As queries are language-dependent, we studied universities from two countries with different languages, the United States and Spain. We found a significant correlation between the search volume of a university name and the university's academic reputation or fame. We also examined the effect of some Google Trends features, namely, limiting the search to a specific country or topic category on the search volume data. Finally, we examined the effect of university sizes on the correlations found to gain a deeper understanding of the nature of the relationships.
  16. Jepsen, E.T.; Seiden, P.; Ingwersen, P.; Björneborn, L.; Borlund, P.: Characteristics of scientific Web publications : preliminary data gathering and analysis (2004) 0.01
    0.010558897 = product of:
      0.042235587 = sum of:
        0.042235587 = weight(_text_:data in 3091) [ClassicSimilarity], result of:
          0.042235587 = score(doc=3091,freq=8.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.34936053 = fieldWeight in 3091, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3091)
      0.25 = coord(1/4)
    
    Abstract
    Because of the increasing presence of scientific publications on the Web, combined with the existing difficulties in easily verifying and retrieving these publications, research on techniques and methods for retrieval of scientific Web publications is called for. In this article, we report on the initial steps taken toward the construction of a test collection of scientific Web publications within the subject domain of plant biology. The steps reported are those of data gathering and data analysis aiming at identifying characteristics of scientific Web publications. The data used in this article were generated based on specifically selected domain topics that are searched for in three publicly accessible search engines (Google, AllTheWeb, and AltaVista). A sample of the retrieved hits was analyzed with regard to how various publication attributes correlated with the scientific quality of the content and whether this information could be employed to harvest, filter, and rank Web publications. The attributes analyzed were inlinks, outlinks, bibliographic references, file format, language, search engine overlap, structural position (according to site structure), and the occurrence of various types of metadata. As could be expected, the ranked output differs between the three search engines. Apparently, this is caused by differences in ranking algorithms rather than the databases themselves. In fact, because scientific Web content in this subject domain receives few inlinks, both AltaVista and AllTheWeb retrieved a higher degree of accessible scientific content than Google. Because of the search engine cutoffs of accessible URLs, the feasibility of using search engine output for Web content analysis is also discussed.
  17. Rotenberg, B.: Towards personalised search : EU Data Protection Law and its implications for media pluralism (2007) 0.01
    0.010558897 = product of:
      0.042235587 = sum of:
        0.042235587 = weight(_text_:data in 373) [ClassicSimilarity], result of:
          0.042235587 = score(doc=373,freq=8.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.34936053 = fieldWeight in 373, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=373)
      0.25 = coord(1/4)
    
    Abstract
    On 17 March 2006, Google, the major web search engine, won a partial victory in its legal battle against the United States government. In an attempt to enforce the 1998 Child Online Protection Act, the US government had asked it to provide one million web addresses or URLs that are accessible through Google, as well as 5,000 users' search queries. In Gonzales v. Google, a California District Court ruled that Google did not have to comply fully with the US government's request: Google did not need to disclose a single search query, and was not required to provide more than 50,000 web addresses. However, it soon appeared that Microsoft, AOL and Yahoo! had handed over the information requested by the government in that instance, and in the course of this case all search engines publicly admitted massive user data collection. It turns out that all major search engines are able to provide a list of IP addresses with the actual search queries made, and vice versa. Scarcely five months later, AOL's search engine logs were the subject of yet another round of data protection concerns. There was a public outcry when it became known that it had published 21 million search queries, that is, the search histories of more than 650,000 of its users. While AOL's intentions were laudable (namely supporting research in user behaviour), it emerged that making the link between the unique ID supplied for a given user and the real-world identity was not all that difficult. Both these cases are milestones in raising awareness of the importance of data protection in relation to web search.
  18. Liu, Y.; Zhang, M.; Cen, R.; Ru, L.; Ma, S.: Data cleansing for Web information retrieval using query independent features (2007) 0.01
    0.010558897 = product of:
      0.042235587 = sum of:
        0.042235587 = weight(_text_:data in 607) [ClassicSimilarity], result of:
          0.042235587 = score(doc=607,freq=8.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.34936053 = fieldWeight in 607, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=607)
      0.25 = coord(1/4)
    
    Abstract
    Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous works used hyperlink analysis algorithms to solve this problem. However, little research has been focused on query-independent Web data cleansing for Web IR. In this paper, we first provide analysis of the differences between retrieval target pages and ordinary ones based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm for reducing Web pages that are unlikely to be useful for user requests. We found that there exists a large proportion of low-quality Web pages in both the English and the Chinese Web page corpus, and retrieval target pages can be identified using query-independent features and cleansing algorithms. The experimental results showed that our algorithm is effective in reducing a large portion of Web pages with a small loss in retrieval target pages. It makes it possible for Web IR tools to meet a large fraction of users' needs with only a small part of pages on the Web. These results may help Web search engines make better use of their limited storage and computation resources to improve search performance.
    Theme
    Data Mining
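    The "learning-based cleansing" idea reduces to a classifier over features that can be computed without any query. A hedged sketch with scikit-learn; the features, values and training labels below are invented, and the paper's actual feature set and model differ:

    from sklearn.tree import DecisionTreeClassifier

    # Feature vectors: [inlink count, document length in words, URL depth]
    X = [[120, 900, 1], [300, 1500, 2], [2, 40, 5],
         [0, 25, 6], [85, 700, 2], [1, 30, 7]]
    y = [1, 1, 0, 0, 1, 0]  # 1 = keep (likely retrieval target), 0 = cleanse away

    clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
    print(clf.predict([[150, 1100, 1], [3, 50, 6]]))  # expect roughly [1 0]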
  19. Blake, P.: Searching out and assessing Web sites (1996) 0.01
    0.010452774 = product of:
      0.041811097 = sum of:
        0.041811097 = weight(_text_:data in 4095) [ClassicSimilarity], result of:
          0.041811097 = score(doc=4095,freq=4.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.34584928 = fieldWeight in 4095, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4095)
      0.25 = coord(1/4)
    
    Abstract
    Describes 4 search engines for the Internet: InfoMarket Search; Yahoo and OpenText; Lycos Spider; and WebCompass. InfoMarket Search retrieves data from Web pages and information providers such as Disclosure, Information Access Company and Cambridge Scientific Abstracts. It is able to search millions of Web pages in under five seconds. Automated 'crawlers' index the complete text of Web documents. Yahoo enables users to search for specific words and phrases and conduct multilevel Boolean and weighted searches. Lycos Spider offers support for HotJava and indexes 91% of the Web. WebCompass polls multiple search engines such as Lycos and InfoSeek for relevant Web pages. A personalized index of topics may be built and retrieved data stored in a format based on Microsoft Access 2.0
  20. Search tools (1997) 0.01
    0.010452774 = product of:
      0.041811097 = sum of:
        0.041811097 = weight(_text_:data in 3834) [ClassicSimilarity], result of:
          0.041811097 = score(doc=3834,freq=4.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.34584928 = fieldWeight in 3834, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3834)
      0.25 = coord(1/4)
    
    Abstract
    Offers brief accounts of Internet search tools. Covers the Lycos revamp; the new navigation service produced jointly by Excite and Netscape, delivering a language specific, locally relevant Web guide for Japan, Germany, France, the UK and Australia; InfoWatcher, a combination offline browser, search engine and push product from Carvelle Inc., USA; Alexa by Alexa Internet and WBI from IBM which are free and provide users with information on how others have used the Web sites which they are visiting; and Concept Explorer from Knowledge Discovery Systems, Inc., California which performs data mining from the Web, Usenet groups, MEDLINE and the US Patent and Trademark Office patent abstracts
    Theme
    Data Mining

Types

  • a 115
  • el 15
  • m 8
  • s 1