Search (76 results, page 2 of 4)

Warnick, W.L.; Lederman, A.; Scott, R.L.; Spence, K.J.; Johnson, L.A.; Allen, V.S.: Searching the deep Web : directed query engine applications at the Department of Energy (2001) 0.00
```
0.0024857575 = product of:
  0.004971515 = sum of:
    0.004971515 = product of:
      0.00994303 = sum of:
        0.00994303 = weight(_text_:a in 1215) [ClassicSimilarity], result of:
          0.00994303 = score(doc=1215,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.18723148 = fieldWeight in 1215, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=1215)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
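Each score breakdown on this page is a Lucene ClassicSimilarity (TF-IDF) explanation, and the "0.00" after each title is the top-level score rounded to two decimals. The sketch below recomputes the score from the factors shown. It assumes the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and it takes queryNorm and fieldNorm as given rather than recomputing them. The fieldNorm in particular is stored lossily in one byte, roughly 1/sqrt(number of terms in the field). The function names are illustrative only.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Score of one term in one document, mirroring the explain tree."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                    # tf(freq) = sqrt(freq)
    query_weight = idf * query_norm         # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm    # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight


def apply_coord(score, factors):
    # Each coord(matched/total) factor scales the score by matched/total;
    # the single-clause "sum of" levels in the tree pass the value through unchanged.
    for matched, total in factors:
        score *= matched / total
    return score


if __name__ == "__main__":
    # Factors copied from the explain output for doc 1215 above.
    raw = classic_term_score(12, 37942, 44218, 0.046056706, 0.046875)
    print(apply_coord(raw, [(1, 2), (1, 2)]))
```

The two nested coord(1/2) factors mean that at each of two Boolean levels only one of two query clauses matched, which halves the term score twice.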
Abstract

Directed Query Engines, an emerging class of search engine specifically designed to access distributed resources on the deep web, offer the opportunity to create inexpensive digital libraries. Already, one such engine, Distributed Explorer, has been used to select and assemble high quality information resources and incorporate them into publicly available systems for the physical sciences. By nesting Directed Query Engines so that one query launches several other engines in a cascading fashion, enormous virtual collections may soon be assembled to form a comprehensive information infrastructure for the physical sciences. Once a Directed Query Engine has been configured for a set of information resources, distributed alerts tools can provide patrons with personalized, profile-based notices of recent additions to any of the selected resources. Due to the potentially enormous size and scope of Directed Query Engine applications, consideration must be given to issues surrounding the representation of large quantities of information from multiple, heterogeneous sources.

Type

a
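The abstract describes nesting Directed Query Engines so that one query cascades through several other engines, and profile-based alerts on new additions. The sketch below is a minimal illustration under stated assumptions. It is not Distributed Explorer's design or API. Each hypothetical `Engine` either wraps one source through a callable or fans out to child engines, and results are de-duplicated in first-seen order. The `alerts` helper, which is also hypothetical, returns only hits a patron has not already been sent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Engine:
    name: str
    # Leaf engines wrap one source; composite engines only fan out to children.
    search_source: Optional[Callable[[str], List[str]]] = None
    children: List["Engine"] = field(default_factory=list)

    def search(self, query: str) -> List[str]:
        results: List[str] = []
        if self.search_source is not None:
            results.extend(self.search_source(query))
        for child in self.children:  # cascade: a child may itself be nested
            results.extend(child.search(query))
        # Drop duplicates returned by overlapping sources, keeping first-seen order.
        return list(dict.fromkeys(results))


def alerts(engine: Engine, profile: Iterable[str], already_sent: set) -> Dict[str, List[str]]:
    """For each saved query in a patron profile, return hits not sent before."""
    return {q: [r for r in engine.search(q) if r not in already_sent] for q in profile}
```

For example, a top-level engine can hold a "physics" engine that itself queries two sources; a record returned by both sources appears once in the merged list.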
Haynes, M.: Your Google algorithm cheat sheet : Panda, Penguin, and Hummingbird (2013) 0.00
```
0.0024857575 = product of:
  0.004971515 = sum of:
    0.004971515 = product of:
      0.00994303 = sum of:
        0.00994303 = weight(_text_:a in 2542) [ClassicSimilarity], result of:
          0.00994303 = score(doc=2542,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.18723148 = fieldWeight in 2542, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2542)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

If you're reading the Moz blog, then you probably have a decent understanding of Google and its algorithm changes. However, there is probably a good percentage of the Moz audience that is still confused about the effects that Panda, Penguin, and Hummingbird can have on your site. I did write a post last year about the main differences between Penguin and a Manual Unnatural Links Penalty, and if you haven't read that, it'll give you a good primer. The point of this article is to explain very simply what each of these algorithms is meant to do. It is hopefully a good reference that you can point your clients to if you want to explain an algorithm change and not overwhelm them with technical details about 301s, canonicals, crawl errors, and other confusing SEO terminology.
Austin, D.: How Google finds your needle in the Web's haystack : as we'll see, the trick is to ask the web itself to rank the importance of pages... (2006) 0.00
```
0.002440756 = product of:
  0.004881512 = sum of:
    0.004881512 = product of:
      0.009763024 = sum of:
        0.009763024 = weight(_text_:a in 93) [ClassicSimilarity], result of:
          0.009763024 = score(doc=93,freq=34.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1838419 = fieldWeight in 93, product of:
              5.8309517 = tf(freq=34.0), with freq of:
                34.0 = termFreq=34.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one) so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb. Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contain the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list. One way to determine the importance of pages is to use a human-generated ranking.
For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.
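The abstract describes PageRank only informally, as asking the web itself to rank pages. Below is a minimal power-iteration sketch. It assumes the commonly cited damping factor of 0.85 and spreads rank from pages without outlinks evenly over all pages. That handling is a common convention and not necessarily Google's production choice. The four-page link graph in the usage example is made up.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=200):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages with no outlinks is spread evenly over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in links.items():
            if targets:
                # A page passes its (damped) rank equally along each outlink.
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

With `links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}`, page C, which is linked from three pages, ends up ranked highest, and the ranks sum to 1.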

Tomaiuolo, N.G.; Packer, J.G.: Quantitative analysis of five WWW 'search engines' (1996) 0.00

```
0.0023919214 = product of:
  0.0047838427 = sum of:
    0.0047838427 = product of:
      0.009567685 = sum of:
        0.009567685 = weight(_text_:a in 5675) [ClassicSimilarity], result of:
          0.009567685 = score(doc=5675,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.18016359 = fieldWeight in 5675, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=5675)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Abstract: Provides a table of the results from over 100 questions actually asked at a library reference desk. The summary notes that the average numbers of relevant 'hits' for the investigated search engines are: AltaVista 9.3; InfoSeek 8.3; Lycos 8.1; Magellan 7.8; Point 2.1.

Bradley, P.: The relevance of underpants to searching the Web (2000) 0.00

```
0.0023678814 = product of:
  0.0047357627 = sum of:
    0.0047357627 = product of:
      0.009471525 = sum of:
        0.009471525 = weight(_text_:a in 3961) [ClassicSimilarity], result of:
          0.009471525 = score(doc=3961,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.17835285 = fieldWeight in 3961, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=3961)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Type: a

Hughes, T.; Acharya, A.: An interview with Anurag Acharya, Google Scholar lead engineer 0.00
```
0.0023678814 = product of:
  0.0047357627 = sum of:
    0.0047357627 = product of:
      0.009471525 = sum of:
        0.009471525 = weight(_text_:a in 94) [ClassicSimilarity], result of:
          0.009471525 = score(doc=94,freq=8.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.17835285 = fieldWeight in 94, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=94)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

When I interned at Google last summer after getting my MSI degree, I worked on projects for the Book Search and Google Scholar teams. I didn't know it at the time, but in completing my research over the course of the summer, I would become the resident expert on how universities were approaching Google Scholar as a research tool and how they were implementing Scholar on their library websites. Now working at an academic library, I seized a recent opportunity to sit down with Anurag Acharya, Google Scholar's founding engineer, to delve a little deeper into how Scholar features are developed and prioritized, what Scholar's scope and aims are, and where the product is headed. -Tracey Hughes, GIS Coordinator, Social Sciences & Humanities Library, University of California San Diego
Lossau, N.: Search engine technology and digital libraries : libraries need to discover the academic internet (2004) 0.00
```
0.0023678814 = product of:
  0.0047357627 = sum of:
    0.0047357627 = product of:
      0.009471525 = sum of:
        0.009471525 = weight(_text_:a in 1161) [ClassicSimilarity], result of:
          0.009471525 = score(doc=1161,freq=8.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.17835285 = fieldWeight in 1161, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1161)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

With the development of the World Wide Web, the "information search" has grown to be a significant business sector of a global, competitive and commercial market. Powerful players have entered this market, such as commercial internet search engines, information portals, multinational publishers and online content integrators. Will Google, Yahoo or Microsoft be the only portals to global knowledge in 2010? If libraries do not want to become marginalized in a key area of their traditional services, they need to acknowledge the challenges that come with the globalisation of scholarly information, the existence and further growth of the academic internet

Type

a

Hodson, H.: Google's fact-checking bots build vast knowledge bank (2014) 0.00

```
0.0023435948 = product of:
  0.0046871896 = sum of:
    0.0046871896 = product of:
      0.009374379 = sum of:
        0.009374379 = weight(_text_:a in 1700) [ClassicSimilarity], result of:
          0.009374379 = score(doc=1700,freq=6.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.17652355 = fieldWeight in 1700, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=1700)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Abstract: The search giant is automatically building Knowledge Vault, a massive database that could give us unprecedented access to the world's facts. Google is building the largest store of knowledge in human history, and it's doing so without any human help. Instead, Knowledge Vault autonomously gathers and merges information from across the web into a single base of facts about the world, and the people and objects in it.
Type: a
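The abstract says Knowledge Vault merges information gathered across the web into a single base of facts, but it does not say how. As one plausible illustration, the sketch below combines confidence scores from independent extractors for the same subject-predicate-object triple using a noisy-OR and keeps facts above a threshold. This is an assumption for illustration, not Google's published method. The entity names and the 0.7 threshold are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def fuse(extractions: Iterable[Tuple[str, str, str, float]]) -> Dict[Triple, float]:
    """Merge (subject, predicate, object, confidence) extractions into one belief per triple."""
    belief: Dict[Triple, float] = {}
    for s, p, o, conf in extractions:
        key = (s, p, o)
        # Noisy-OR: the fact is false only if every independent extraction is wrong.
        belief[key] = 1.0 - (1.0 - belief.get(key, 0.0)) * (1.0 - conf)
    return belief


def accepted(belief: Dict[Triple, float], threshold: float = 0.7) -> Set[Triple]:
    # Keep only triples whose fused belief reaches the (hypothetical) threshold.
    return {t for t, b in belief.items() if b >= threshold}
```

Two independent extractions of the same triple at 0.5 each fuse to 0.75. Noisy-OR assumes the extractors' errors are independent, which real web extractors often violate.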

Smith, A.G.: Search features of digital libraries (2000) 0.00
```
0.002269176 = product of:
  0.004538352 = sum of:
    0.004538352 = product of:
      0.009076704 = sum of:
        0.009076704 = weight(_text_:a in 940) [ClassicSimilarity], result of:
          0.009076704 = score(doc=940,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1709182 = fieldWeight in 940, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=940)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Traditional on-line search services such as Dialog, DataStar and Lexis provide a wide range of search features (boolean and proximity operators, truncation, etc.). This paper discusses the use of these features for effective searching, and argues that these features are required, regardless of advances in search engine technology. The literature on on-line searching is reviewed, identifying features that searchers find desirable for effective searching. A selective survey of current digital libraries available on the Web was undertaken, identifying which search features are present. The survey indicates that current digital libraries do not implement a wide range of search features. For instance: under half of the examples included controlled vocabulary, under half had proximity searching, only one enabled browsing of term indexes, and none of the digital libraries enable searchers to refine an initial search. Suggestions are made for enhancing the search effectiveness of digital libraries; for instance, by providing a full range of search operators, enabling browsing of search terms, enhancement of records with controlled vocabulary, enabling the refining of initial searches, etc.

Type

a
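The abstract names Boolean operators, proximity operators and truncation as core search features. The minimal sketch below shows how all three can be served from one positional inverted index. The query functions are illustrative and do not reproduce any particular service's syntax. Here a trailing `*` is assumed to mean right truncation, and `near_query` matches two terms within k word positions of each other in either order.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Positional inverted index: term -> doc_id -> list of word positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, tok in enumerate(re.findall(r"\w+", text.lower())):
            index[tok][doc_id].append(pos)
    return index


def postings(index, term):
    """Return {doc_id: positions}; a trailing '*' means right truncation."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        merged = defaultdict(list)
        for t, plist in index.items():
            if t.startswith(prefix):
                for d, ps in plist.items():
                    merged[d].extend(ps)
        return {d: sorted(ps) for d, ps in merged.items()}
    return {d: list(ps) for d, ps in index.get(term, {}).items()}


def and_query(index, a, b):
    return set(postings(index, a)) & set(postings(index, b))


def or_query(index, a, b):
    return set(postings(index, a)) | set(postings(index, b))


def near_query(index, a, b, k):
    # Proximity: both terms occur within k positions of each other, in either order.
    pa, pb = postings(index, a), postings(index, b)
    return {d for d in set(pa) & set(pb)
            if any(abs(i - j) <= k for i in pa[d] for j in pb[d])}
```

Because positions are stored, proximity and truncation reuse the same postings that plain Boolean AND/OR use, with no separate index.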
Bladow, N.; Dorey, C.; Frederickson, L.; Grover, P.; Knudtson, Y.; Krishnamurthy, S.; Lazarou, V.: What's the Buzz about? : An empirical examination of Search on Yahoo! (2005) 0.00
```
0.002269176 = product of:
  0.004538352 = sum of:
    0.004538352 = product of:
      0.009076704 = sum of:
        0.009076704 = weight(_text_:a in 3072) [ClassicSimilarity], result of:
          0.009076704 = score(doc=3072,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1709182 = fieldWeight in 3072, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3072)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

We present an analysis of the Yahoo Buzz Index over a period of 45 weeks. Our key findings are that: (1) It is most common for a search term to show up on the index for one week, followed by two weeks, three weeks, etc. Only two terms persist for all 45 weeks studied - Britney Spears and Jennifer Lopez. Search term longevity follows a power-law distribution or a winner-take-all structure; (2) Most search terms focus on entertainment. Search terms related to serious topics are found less often. The Buzz Index does not necessarily follow the "news cycle"; and, (3) We provide two ways to determine "star power" of various search terms - one that emphasizes staying power on the Index and another that emphasizes rank. In general, the methods lead to dramatically different results. Britney Spears performs well in both methods. We conclude that the data available on the Index is symptomatic of a celebrity-crazed, entertainment-centered culture.
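The longevity finding in point (1) comes down to counting how many weekly lists each term appears in and then tabulating those counts. The sketch below does that. It also adds one rank-weighted scoring rule in the spirit of point (3). That rule is a hypothetical example and not the paper's own "star power" formula. The weekly lists in the usage example are invented.

```python
from collections import Counter
from typing import List


def weeks_on_index(weekly_lists: List[List[str]]) -> Counter:
    counts = Counter()
    for week in weekly_lists:
        counts.update(set(week))  # count each term at most once per week
    return counts


def longevity_distribution(weekly_lists: List[List[str]]) -> Counter:
    # Maps "number of weeks on the index" -> "number of terms with that longevity".
    return Counter(weeks_on_index(weekly_lists).values())


def rank_points(weekly_lists: List[List[str]], top_n: int) -> Counter:
    # Hypothetical rank-weighted score: rank 1 earns top_n points, rank top_n earns 1.
    points = Counter()
    for week in weekly_lists:
        for rank, term in enumerate(week[:top_n], start=1):
            points[term] += top_n - rank + 1
    return points
```

A heavy-tailed `longevity_distribution` (many terms at 1 week, very few at the maximum) is the pattern the authors describe as power-law-like.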
Khare, R.; Cutting, D.; Sitaker, K.; Rifkin, A.: Nutch: a flexible and scalable open-source Web search engine (2004) 0.00
```
0.002269176 = product of:
  0.004538352 = sum of:
    0.004538352 = product of:
      0.009076704 = sum of:
        0.009076704 = weight(_text_:a in 852) [ClassicSimilarity], result of:
          0.009076704 = score(doc=852,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1709182 = fieldWeight in 852, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=852)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Nutch is an open-source Web search engine that can be used at global, local, and even personal scale. Its initial design goal was to enable a transparent alternative for global Web search in the public interest - one of its signature features is the ability to "explain" its result rankings. Recent work has emphasized how it can also be used for intranets; by local communities with richer data models, such as the Creative Commons metadata-enabled search for licensed content; on a personal scale to index a user's files, email, and web-surfing history; and we also report on several other research projects built on Nutch. In this paper, we present how the architecture of the Nutch system enables it to be more flexible and scalable than other comparable systems today.
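To make the crawl-then-index architecture concrete, here is a minimal sketch loosely modeled on the generate / fetch / update rounds that Nutch's crawler is organized around. It is not Nutch code, and the names are hypothetical. A `fetch` callable stands in for HTTP fetching and parsing, a dict plays the role of the crawl database, and a tiny inverted index is built from the fetched text.

```python
from typing import Callable, Dict, List, Tuple


def crawl(seeds: List[str], fetch: Callable[[str], Tuple[str, List[str]]], max_pages: int = 100):
    """Breadth-first crawl in generate -> fetch/parse -> update rounds."""
    crawl_db: Dict[str, str] = {url: "unfetched" for url in seeds}
    pages: Dict[str, str] = {}
    while len(pages) < max_pages:
        # generate: pick the next batch of URLs not yet fetched
        batch = [u for u, status in crawl_db.items() if status == "unfetched"]
        batch = batch[: max_pages - len(pages)]
        if not batch:
            break
        for url in batch:
            text, outlinks = fetch(url)  # fetch + parse (stubbed by the caller)
            pages[url] = text
            crawl_db[url] = "fetched"
            for link in outlinks:  # update: record newly discovered links
                crawl_db.setdefault(link, "unfetched")
    return pages, crawl_db


def build_index(pages: Dict[str, str]) -> Dict[str, set]:
    # Minimal inverted index: token -> set of URLs containing it.
    index: Dict[str, set] = {}
    for url, text in pages.items():
        for token in text.lower().split():
            index.setdefault(token, set()).add(url)
    return index
```

Links discovered during a round are only fetched in the next round, so the crawl proceeds breadth-first from the seeds.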
Schomburg, S.; Prante, J.: Search Engine Federation in Libraries - Suchmaschinenföderation in Bibliotheken (2009) 0.00
```
0.002269176 = product of:
  0.004538352 = sum of:
    0.004538352 = product of:
      0.009076704 = sum of:
        0.009076704 = weight(_text_:a in 2809) [ClassicSimilarity], result of:
          0.009076704 = score(doc=2809,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1709182 = fieldWeight in 2809, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2809)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The hbz (Academic Library Center, Cologne) has a strong focus on search engine applications: beyond the projected integration of such technologies into the new release of the Digital Library portal solution (DigiBib6), vascoda background services also apply and take advantage of search engine technology. Experience since 2003 has shown that building and updating search engine indexes involves a vast amount of resources. The use of search engine federations, however, promises major improvements: the total number of data records held in linked indexes can be almost unlimited while still allowing a joint output of all hits retrieved. A federation also offers excellent response times, and hits retrieved can refer to or link into the original system's layout. Nonetheless, the major challenges today are the differing search engine technologies (e.g. Lucene and FAST), the variations in ranking, and the implementation or non-implementation of so-called drill-downs. The lecture gives a brief insight into the hbz search engine workshop, with an introduction to the current state of the project.
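One concrete face of the "variations in terms of ranking" problem is that federated engines such as Lucene and FAST return scores on different scales. A common remedy, shown here as an assumption and not as the hbz implementation, is to min-max normalize each engine's scores to [0, 1] and then sum them per document (CombSUM). The engine names and scores in the usage example are invented.

```python
from typing import Dict, List, Tuple


def min_max(results: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    # Rescale one engine's raw scores to [0, 1] so engines become comparable.
    scores = [s for _, s in results]
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [(doc, 1.0) for doc, _ in results]
    return [(doc, (s - lo) / (hi - lo)) for doc, s in results]


def federate(result_lists: Dict[str, List[Tuple[str, float]]]) -> List[Tuple[str, float]]:
    """Merge {engine: [(doc_id, raw_score), ...]} into one ranked list."""
    merged: Dict[str, float] = {}
    for results in result_lists.values():
        if not results:
            continue
        for doc, score in min_max(results):
            merged[doc] = merged.get(doc, 0.0) + score  # CombSUM
    # Highest combined score first; ties broken by document id for stability.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
```

CombSUM rewards documents returned by several engines. Taking the maximum instead (CombMAX) would avoid that bias if overlap between indexes is uneven.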
Zhao, Y.; Ma, F.; Xia, X.: Evaluating the coverage of entities in knowledge graphs behind general web search engines : Poster (2017) 0.00
```
0.0022374375 = product of:
  0.004474875 = sum of:
    0.004474875 = product of:
      0.00894975 = sum of:
        0.00894975 = weight(_text_:a in 3854) [ClassicSimilarity], result of:
          0.00894975 = score(doc=3854,freq=14.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1685276 = fieldWeight in 3854, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3854)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Web search engines, such as Google and Bing, are constantly employing results from knowledge organization and various visualization features to improve their search services. The knowledge graph, a large repository of structured knowledge represented in formal languages such as RDF (Resource Description Framework), is used to support the entity search feature of Google and Bing (Demartini, 2016). When a user searches for an entity, such as a person, an organization, or a place, in Google or Bing, it is likely that a knowledge card will be presented in the right sidebar of the search engine result pages (SERPs). For example, when a user searches for the entity Benedict Cumberbatch on Google, the knowledge card will show basic structured information about this person, including his date of birth, height, spouse, parents, and movies. The knowledge card, which is used to present the result of entity search, is generated from knowledge graphs. Therefore, the quality of knowledge graphs is essential to the performance of entity search. However, studies on the quality of knowledge graphs from the angle of entity coverage are scant in the literature. This study aims to investigate the coverage of entities of knowledge graphs behind Google and Bing.

Type

a
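Entity coverage as described here reduces to a simple ratio: the share of a sample of entities that the knowledge graph contains, optionally broken down by entity type (person, organization, place). The sketch below computes both. Treating "has a knowledge card" as membership in `kg_entities` is an assumption about how coverage would be observed, and the sample in the usage example is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def coverage(sample: Dict[str, str], kg_entities: Iterable[str]) -> Tuple[float, Dict[str, float]]:
    """sample maps entity name -> entity type; returns overall and per-type coverage."""
    if not sample:
        raise ValueError("sample must not be empty")
    kg = set(kg_entities)
    hits_by_type: Dict[str, int] = defaultdict(int)
    totals_by_type: Dict[str, int] = defaultdict(int)
    for entity, etype in sample.items():
        totals_by_type[etype] += 1
        if entity in kg:
            hits_by_type[etype] += 1
    overall = sum(hits_by_type.values()) / len(sample)
    per_type = {t: hits_by_type[t] / n for t, n in totals_by_type.items()}
    return overall, per_type
```

Running the same sample against each engine's observed entities gives directly comparable coverage figures for Google and Bing.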
Dodge, M.: A map of Yahoo! (2000) 0.00
```
0.0021393995 = product of:
  0.004278799 = sum of:
    0.004278799 = product of:
      0.008557598 = sum of:
        0.008557598 = weight(_text_:a in 1555) [ClassicSimilarity], result of:
          0.008557598 = score(doc=1555,freq=80.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.16114321 = fieldWeight in 1555, product of:
              8.944272 = tf(freq=80.0), with freq of:
                80.0 = termFreq=80.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.015625 = fieldNorm(doc=1555)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Content

"Introduction Yahoo! is the undisputed king of the Web directories, providing one of the key information navigation tools on the Internet. It has maintained its popularity over many Internet-years as the most visited Web site, against intense competition. This is because it does a good job of sifting, cataloguing and organising the Web [1]. But what would a map of Yahoo!'s hierarchical classification of the Web look like? Would an interactive map of Yahoo!, rather than the conventional listing of sites, be more useful as a navigational tool? We can get some idea of what a map of Yahoo! might be like by taking a look at ET-Map, a prototype developed by Hsinchun Chen and colleagues in the Artificial Intelligence Lab [2] at the University of Arizona. ET-Map was developed in 1995 as part of innovative research in automatic Internet homepage categorization, and it charts a large chunk of Yahoo!, the entertainment section, representing some 110,000 different Web links. The map is a two-dimensional, multi-layered category map; its aim is to provide an intuitive visual information browsing tool. ET-Map can be browsed interactively, explored and queried, using the familiar point-and-click navigation style of the Web to find information of interest.
The View From Above Browsing for a particular piece of information on the Web can often feel like being stuck in an unfamiliar part of town, walking around at street level looking for a particular store. You know the store is around there somewhere, but your viewpoint at ground level is constrained. What you really want is to get above the streets, hovering half a mile or so up in the air, to see the whole neighbourhood. This kind of bird's-eye view function has been memorably described by David D. Clark, Senior Research Scientist at MIT's Laboratory for Computer Science and the Chairman of the Invisible Worlds Protocol Advisory Board, as the missing "up button" on the browser [3]. ET-Map is a nice example of a prototype for Clark's "up-button" view of an information space. The goal of information maps, like ET-Map, is to provide the browser with a sense of the lie of the information landscape: what is where, the location of clusters and hotspots, what is related to what. Ideally, this 'big-picture' all-in-one visual summary needs to fit on a single standard computer screen. ET-Map is one of my favourite examples, but there are many other interesting information maps being developed by other researchers and companies (see inset at the bottom of this page). How does ET-Map work? Here is a sequence of screenshots of a typical browsing session with ET-Map, which ends with access to Web pages on jazz musician Miles Davis. You can also try out ET-Map for yourself, using a fully working demo on the AI Lab's website [4]. We begin with the top-level map showing forty-odd broad entertainment 'subject regions' represented by regularly shaped tiles. Each tile is a visual summary of a group of Web pages with similar content. These tiles are shaded different colours to differentiate them, while labels identify the subject of each tile and a number in brackets tells you how many individual Web page links it contains.
ET-Map uses two important, but common-sense, spatial concepts in its organisation and representation of the Web. Firstly, the size of a 'subject region' is directly related to the number of Web pages in that category. For example, the 'MUSIC' subject area contains over 11,000 pages and so has a much larger area than the neighbouring area of 'LIVE', which only has 4,300-odd pages. This is intuitively meaningful, as the largest tiles are visually more prominent on the map and are likely to be more significant as they contain the most links. In addition, a second spatial concept, that of neighbourhood proximity, is applied so that 'subject regions' closely related in terms of content are plotted close to each other on the map. For example, 'FILM' and 'YEAR'S OSCARS', at the bottom left, are neighbours in both semantic and spatial space. This makes sense, as many things in the real world are ordered in this way, with things that are alike being spatially close together (e.g. the layout of goods in a store, or books in a library). Importantly, ET-Map is also a multi-layer map, with sub-maps showing greater informational resolution through a finer degree of categorization. So for any subject region that contains more than two hundred Web pages, a second-level map with more detailed categories is generated. This subdivision of information space is repeated down the hierarchy as far as necessary. In the example, the user selected the 'MUSIC' subject region which, not surprisingly, contained many thousands of pages. A second-level map with numerous different music categories is then presented to the user. Delving deeper, the user wants to learn more about jazz music, so clicking on the 'JAZZ' tile leads to a third-level map, a fine-grained map of jazz-related Web pages. Finally, selecting the 'MILES DAVIS' subject region leads to a more conventional-looking ranking of pages from which the user selects one to download.
ET-Map was created using a sophisticated AI technique called the Kohonen self-organizing map, a neural network approach that has been used for automatic analysis and classification of the semantic content of text documents like Web pages. I do not pretend to fully understand how this technique works; I tend to think of it as a clever 'black box' that groups together things that are alike [5]. It is a real challenge to automatically classify pages from a very heterogeneous information collection like the Web into categories that will match the conceptions of a typical user. Directories like Yahoo! tend to rely on the skill of human editors to achieve this. ET-Map is an interesting prototype that I think highlights well the potential for a map-based approach to Web browsing. I am surprised none of the major search engines or directories have introduced the option of mapping results, although I am sure many are working on ideas. People certainly need all the help they can get, as Web growth shows no sign of slowing. Just last month it was reported that the Web had surpassed one billion indexable pages [6].
Information Maps There are many other fascinating examples that employ two-dimensional interactive maps to provide a 'bird's-eye' view of information. They use various underlying techniques of textual analysis and clustering to turn the mass of information into a useful summary map (see "Mining in Textual Mountains" in Mappa.Mundi Magazine). In terms of visual representations they can be divided into two groups: those that generate smooth surfaces and those that produce regular, tiled maps. Unfortunately, we don't have space to examine them in detail, but they are well worth spending some time exploring. I will be covering some of them in future columns.
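The article treats the Kohonen self-organizing map as a black box that groups alike things together. The pure-Python sketch below shows the core training loop. It is not ET-Map's code. Each grid cell holds a prototype vector. For every input, the closest cell (the best-matching unit) and, to a lesser degree, its grid neighbours are pulled toward that input. The learning rate and neighbourhood radius shrink linearly over time; that schedule is an assumed, common choice. Input vectors would in practice be document term vectors. Here they are toy 2-D points.

```python
import math
import random


def best_matching_unit(weights, x):
    # Grid cell whose prototype vector is closest (squared Euclidean) to x.
    return min(weights, key=lambda pos: sum((w - xi) ** 2 for w, xi in zip(weights[pos], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, init=None, seed=0):
    """Train a rows x cols SOM; returns {(row, col): prototype vector}."""
    dim = len(data[0])
    rng = random.Random(seed)
    if init is not None:
        weights = {pos: list(vec) for pos, vec in init.items()}
    else:
        weights = {(r, c): [rng.random() for _ in range(dim)]
                   for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data:
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.01)  # neighbourhood shrinks over time
            bmu = best_matching_unit(weights, x)
            for pos, w in weights.items():
                d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights
```

Passing an explicit `init` makes a run fully reproducible. Because neighbours are updated together, nearby grid cells end up holding similar prototypes. That is the "alike things sit close together" property the article relies on.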
Radhakrishnan, A.: Swoogle : an engine for the Semantic Web (2007) 0.00
```
0.0021393995 = product of:
  0.004278799 = sum of:
    0.004278799 = product of:
      0.008557598 = sum of:
        0.008557598 = weight(_text_:a in 4709) [ClassicSimilarity], result of:
          0.008557598 = score(doc=4709,freq=20.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.16114321 = fieldWeight in 4709, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=4709)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Content

"Swoogle, the Semantic web search engine, is a research project carried out by the ebiquity research group in the Computer Science and Electrical Engineering Department at the University of Maryland. It's an engine tailored towards finding documents on the semantic web. The whole research paper is available here. The Semantic web is touted as the next generation of online content representation, where web documents are represented in a language that is not only easy for humans to read but also machine-readable (easing the integration of data as never thought possible). The main elements of the semantic web include data model description formats such as the Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, Turtle, N-Triples), and notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain (Wikipedia). Swoogle is an attempt to mine and index this new set of web documents. The engine crawls semantic documents like most web search engines, and search is also available as a web service. The engine is primarily written in Java, with PHP used for the front end and MySQL for the database. Swoogle is capable of searching over 10,000 ontologies and indexes more than 1.3 million web documents. It also computes the importance of a Semantic Web document. The techniques used for indexing are Google-style page ranking and mining the documents for the inter-relationships that are the basis for the semantic web. For more information on how the RDF framework can be used to relate documents, read the link here. Being a research project with a non-commercial motive, Swoogle has not attracted much hype. However, its approach to indexing Semantic web documents is one that most engines will have to take at some point.
When the Internet debuted, there were no dedicated engines for indexing or searching it; search only took off as more and more content became available. One fundamental question I have always wondered about is this: provided that search engines return highly relevant results for a query, how can one ascertain that those documents are indeed the most relevant ones available? There is always an inherent delay in indexing documents, and it is here that the new semantic document search engines can close the gap. Experimenting with search on the semantic web can only bode well for the future of search technology."
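
The excerpt credits Swoogle with Google-style page ranking of Semantic Web documents. As a rough illustration of that idea only, here is plain PageRank computed by power iteration. It is not Swoogle's own ranking method, which the excerpt does not describe; the document names and the damping factor of 0.85 are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain PageRank by power iteration.

    links maps each document to the documents it links to
    (hypothetical here; in a semantic web crawl these could be references between files).
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # every node first receives its share of the random-jump probability
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # pass rank along outgoing links in equal parts
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # dangling document with no outgoing links: spread its rank evenly
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical documents: two data files that both point at one shared ontology.
links = {
    "doc_a.rdf": ["shared.owl"],
    "doc_b.rdf": ["shared.owl", "doc_a.rdf"],
    "shared.owl": [],
}
for doc, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{doc}: {score:.3f}")
```

The ranks always sum to 1, and the shared ontology, being the target of both other documents, comes out on top; that "importance from being pointed at" is the intuition behind page-ranking a corpus of linked documents.
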
Hogan, A.; Harth, A.; Umbrich, J.; Kinsella, S.; Polleres, A.; Decker, S.: Searching and browsing Linked Data with SWSE : the Semantic Web Search Engine (2011) 0.00
```
0.0020714647 = product of:
  0.0041429293 = sum of:
    0.0041429293 = product of:
      0.008285859 = sum of:
        0.008285859 = weight(_text_:a in 438) [ClassicSimilarity], result of:
          0.008285859 = score(doc=438,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15602624 = fieldWeight in 438, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=438)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

In this paper, we discuss the architecture and implementation of the Semantic Web Search Engine (SWSE). Following traditional search engine architecture, SWSE consists of crawling, data enhancing, indexing and a user interface for search, browsing and retrieval of information; unlike traditional search engines, SWSE operates over RDF Web data - loosely also known as Linked Data - which implies unique challenges for the system design, architecture, algorithms, implementation and user interface. In particular, many challenges exist in adopting Semantic Web technologies for Web data: the unique challenges of the Web - in terms of scale, unreliability, inconsistency and noise - are largely overlooked by the current Semantic Web standards. Herein, we describe the current SWSE system, initially detailing the architecture and later elaborating upon the function, design, implementation and performance of each individual component. In so doing, we also give an insight into how current Semantic Web standards can be tailored, in a best-effort manner, for use on Web data. Throughout, we offer evaluation and complementary argumentation to support our design choices, and also offer discussion on future directions and open research questions. Later, we also provide candid discussion relating to the difficulties currently faced in bringing such a search engine into the mainstream, and lessons learnt from roughly six years working on the Semantic Web Search Engine project.
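
Each explain block on this page is Lucene's ClassicSimilarity (TF-IDF) breakdown for the query term `a` in the `_text_` field. As a sketch, the arithmetic for the SWSE record above (doc 438) can be reproduced from the printed inputs: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)) (the form that matches the printed 1.153047; other Lucene versions compute idf slightly differently), queryWeight is idf times queryNorm, fieldWeight is tf times idf times fieldNorm, and each coord(1/2) factor halves the result. queryNorm and fieldNorm are copied from the output rather than recomputed, and the function names are hypothetical.

```python
import math


def classic_idf(doc_freq, max_docs):
    """idf as printed in the explain output: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm, coords=(0.5, 0.5)):
    """Rebuild one explain tree: queryWeight * fieldWeight, then the coord factors."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                    # tf(freq=12.0) = sqrt(12) = 3.4641016
    query_weight = idf * query_norm         # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm    # fieldWeight = tf * idf * fieldNorm
    score = query_weight * field_weight     # weight(_text_:a ...)
    for coord in coords:                    # each coord(1/2) contributes a factor of 0.5
        score *= coord
    return score


# Inputs copied from the explain block for doc 438.
score_438 = classic_term_score(freq=12.0, doc_freq=37942, max_docs=44218,
                               query_norm=0.046056706, field_norm=0.0390625)
print(f"{score_438:.10f}")
```

The result agrees with the listed 0.0020714647 up to float rounding. Calling the same function with `freq=2.0` and `field_norm=0.09375` reproduces the 0.0020296127 listed for the three 1996 records: fewer occurrences of the term, but a larger fieldNorm (a shorter field).
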
Li, Z.: ¬A domain specific search engine with explicit document relations (2013) 0.00
```
0.0020714647 = product of:
  0.0041429293 = sum of:
    0.0041429293 = product of:
      0.008285859 = sum of:
        0.008285859 = weight(_text_:a in 1210) [ClassicSimilarity], result of:
          0.008285859 = score(doc=1210,freq=12.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15602624 = fieldWeight in 1210, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1210)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The current web consists of documents that are highly heterogeneous and hard for machines to understand. The Semantic Web is a progressive development of the World Wide Web, aiming to convert the current web of unstructured documents into a web of data. In the Semantic Web, web documents are annotated with metadata using a standardized ontology language. These annotated documents are directly processable by machines, which greatly improves their usability and usefulness. Similar problems occur at Ericsson, where large numbers of documents with well-defined structures are being created. Although these documents concern domain-specific knowledge and can have rich relations, they are currently managed by a traditional search engine, which ignores the rich domain-specific information and presents little of it to users. Motivated by the Semantic Web, we aim to find standard ways to process these documents, extract rich domain-specific information, and annotate the documents with this information using formal markup languages. We propose this project to develop a domain-specific search engine that processes different documents and builds explicit relations between them. The project has three main focuses: examining different domain-specific documents and finding ways to extract their metadata; integrating a text search engine with an ontology server; and exploring novel ways to build relations between documents. We implement this system and demonstrate its functions. As a prototype, the system provides the required features and will be extended in the future.
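
The abstract does not say how the prototype stores its document relations. As a minimal sketch of the general idea only, assuming relations are kept as (subject, predicate, object) triples, the class below pairs a naive keyword search with explicit typed links, so that a text hit also returns the documents it is related to. A plain Python set stands in for the ontology server the abstract mentions, and the class name, document IDs and predicates (`replaces`, `verifies`) are all hypothetical.

```python
class DocumentGraph:
    """Toy store combining full text with explicit (subject, predicate, object) relations."""

    def __init__(self):
        self.text = {}        # document id -> text
        self.triples = set()  # explicit relations between document ids

    def add_document(self, doc_id, text):
        self.text[doc_id] = text

    def relate(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def keyword_search(self, term):
        """Naive text search: ids of documents containing the term as a whole word."""
        term = term.lower()
        return sorted(d for d, t in self.text.items() if term in t.lower().split())

    def related(self, doc_id):
        """Outgoing relations of one document as sorted (predicate, object) pairs."""
        return sorted((p, o) for s, p, o in self.triples if s == doc_id)

    def search_with_relations(self, term):
        """Text hits, each annotated with the documents it is explicitly related to."""
        return {d: self.related(d) for d in self.keyword_search(term)}


# Hypothetical documents and relation types.
g = DocumentGraph()
g.add_document("spec-1", "Earlier interface specification")
g.add_document("spec-2", "Interface specification for the billing node")
g.add_document("test-7", "Test instructions for billing")
g.relate("spec-2", "replaces", "spec-1")
g.relate("test-7", "verifies", "spec-2")
print(g.search_with_relations("billing"))
# {'spec-2': [('replaces', 'spec-1')], 'test-7': [('verifies', 'spec-2')]}
```

A real system would use a proper text index and indexed triple lookups instead of word splitting and a set scan; the point is only that relations are stored explicitly rather than inferred from text similarity.
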

Campbell, K.: Understanding and comparing search engines (1996) 0.00
```
0.0020296127 = product of:
  0.0040592253 = sum of:
    0.0040592253 = product of:
      0.008118451 = sum of:
        0.008118451 = weight(_text_:a in 5666) [ClassicSimilarity], result of:
          0.008118451 = score(doc=5666,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15287387 = fieldWeight in 5666, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=5666)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: A meta-list of 11 other sites that critique search engines

Sullivan, D.: ¬The webmaster's guide to search engines and directories (1996) 0.00
```
0.0020296127 = product of:
  0.0040592253 = sum of:
    0.0040592253 = product of:
      0.008118451 = sum of:
        0.008118451 = weight(_text_:a in 5672) [ClassicSimilarity], result of:
          0.008118451 = score(doc=5672,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15287387 = fieldWeight in 5672, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=5672)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: A very thorough report on search engines. It has several sections, including What's new, Tips for success, Frequently Asked Questions, and a good list of features in table format.

Stanley, T.: Alta Vista vs. Lycos (1996) 0.00
```
0.0020296127 = product of:
  0.0040592253 = sum of:
    0.0040592253 = product of:
      0.008118451 = sum of:
        0.008118451 = weight(_text_:a in 3939) [ClassicSimilarity], result of:
          0.008118451 = score(doc=3939,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.15287387 = fieldWeight in 3939, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=3939)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Type: a