Search (23 results, page 1 of 2)

  • theme_ss:"Suchmaschinen"
  • type_ss:"el"
  1. Radhakrishnan, A.: Swoogle : an engine for the Semantic Web (2007) 0.04
    0.03696416 = product of:
      0.09241039 = sum of:
        0.075167626 = weight(_text_:readable in 4709) [ClassicSimilarity], result of:
          0.075167626 = score(doc=4709,freq=2.0), product of:
            0.2768342 = queryWeight, product of:
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.04505818 = queryNorm
            0.2715258 = fieldWeight in 4709, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.1439276 = idf(docFreq=257, maxDocs=44218)
              0.03125 = fieldNorm(doc=4709)
        0.017242765 = product of:
          0.03448553 = sum of:
            0.03448553 = weight(_text_:data in 4709) [ClassicSimilarity], result of:
              0.03448553 = score(doc=4709,freq=6.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.24204408 = fieldWeight in 4709, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4709)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Content
    "Swoogle, the Semantic web search engine, is a research project carried out by the ebiquity research group in the Computer Science and Electrical Engineering Department at the University of Maryland. It's an engine tailored towards finding documents on the semantic web. The whole research paper is available here. Semantic web is touted as the next generation of online content representation where the web documents are represented in a language that is not only easy for humans but is machine readable (easing the integration of data as never thought possible) as well. And the main elements of the semantic web include data model description formats such as Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, Turtle, N-Triples), and notations such as RDF Schema (RDFS), the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain (Wikipedia). And Swoogle is an attempt to mine and index this new set of web documents. The engine performs crawling of semantic documents like most web search engines and the search is available as web service too. The engine is primarily written in Java with the PHP used for the front-end and MySQL for database. Swoogle is capable of searching over 10,000 ontologies and indexes more that 1.3 million web documents. It also computes the importance of a Semantic Web document. The techniques used for indexing are the more google-type page ranking and also mining the documents for inter-relationships that are the basis for the semantic web. For more information on how the RDF framework can be used to relate documents, read the link here. Being a research project, and with a non-commercial motive, there is not much hype around Swoogle. However, the approach to indexing of Semantic web documents is an approach that most engines will have to take at some point of time. When the Internet debuted, there were no specific engines available for indexing or searching. The Search domain only picked up as more and more content became available. One fundamental question that I've always wondered about it is - provided that the search engines return very relevant results for a query - how to ascertain that the documents are indeed the most relevant ones available. There is always an inherent delay in indexing of document. Its here that the new semantic documents search engines can close delay. Experimenting with the concept of Search in the semantic web can only bore well for the future of search technology."
  2. Baeza-Yates, R.; Boldi, P.; Castillo, C.: Generalizing PageRank : damping functions for link-based ranking algorithms (2006) 0.01
    0.01108232 = product of:
      0.055411596 = sum of:
        0.055411596 = sum of:
          0.024887787 = weight(_text_:data in 2565) [ClassicSimilarity], result of:
            0.024887787 = score(doc=2565,freq=2.0), product of:
              0.14247625 = queryWeight, product of:
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.04505818 = queryNorm
              0.17468026 = fieldWeight in 2565, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1620505 = idf(docFreq=5088, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2565)
          0.030523809 = weight(_text_:22 in 2565) [ClassicSimilarity], result of:
            0.030523809 = score(doc=2565,freq=2.0), product of:
              0.15778607 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.04505818 = queryNorm
              0.19345059 = fieldWeight in 2565, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2565)
      0.2 = coord(1/5)
    
    Abstract
    This paper introduces a family of link-based ranking algorithms that propagate page importance through links. In these algorithms there is a damping function that decreases with distance, so a direct link implies more endorsement than a link through a long path. PageRank is the most widely known ranking function of this family. The main objective of this paper is to determine whether this family of ranking techniques has some interest per se, and how different choices of damping function affect rank quality and convergence speed. Even though our results suggest that PageRank can be approximated with other, simpler forms of ranking that may be computed more efficiently, our focus is of a more speculative nature, in that it aims at separating the kernel of PageRank, that is, link-based importance propagation, from the way propagation decays over paths. We focus on three damping functions, with linear, exponential, and hyperbolic decay in path length. The exponential decay corresponds to PageRank, and the other functions are new. Our presentation includes algorithms, analysis, comparisons and experiments that study their behavior under different parameters in real Web graph data. Among other results, we show how to calculate a linear approximation that induces a page ordering almost identical to PageRank's using a fixed, small number of iterations; comparisons were performed using Kendall's tau on large domain datasets.
    Date
    16. 1.2016 10:22:28
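    A minimal sketch of the ranking family described in the abstract above: importance propagates along links, and the contribution of paths of length t is weighted by a damping function. With exponential decay damping(t) = (1 - alpha) * alpha^t the sum reduces to PageRank; the linear variant, its cut-off L, the toy graph and the function names below are illustrative assumptions, not taken from the paper.

      import numpy as np

      def damped_rank(adjacency, damping, t_max=50):
          """Link-based ranking: sum over path lengths t of damping(t) * (v P^t),
          where P is the row-normalised link matrix and v a uniform start vector.
          Assumes every node has at least one outgoing link."""
          n = adjacency.shape[0]
          P = adjacency / adjacency.sum(axis=1, keepdims=True)
          walk = np.full(n, 1.0 / n)            # v P^0
          rank = np.zeros(n)
          for t in range(t_max + 1):
              rank += damping(t) * walk
              walk = walk @ P                   # propagate one more link step
          return rank

      alpha = 0.85
      exponential = lambda t: (1 - alpha) * alpha ** t              # PageRank's decay
      L = 10
      linear = lambda t: max(0.0, 2.0 * (L - t) / (L * (L + 1)))    # linear decay, zero beyond L

      # Toy graph: 0 -> 1, 1 -> 2, 2 -> 0 and 2 -> 1 (no dangling nodes).
      A = np.array([[0, 1, 0],
                    [0, 0, 1],
                    [1, 1, 0]], dtype=float)
      print(damped_rank(A, exponential))   # PageRank-like scores
      print(damped_rank(A, linear))        # same propagation, different decay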
  3. Dunning, A.: Do we still need search engines? (1999) 0.01
    0.008546666 = product of:
      0.04273333 = sum of:
        0.04273333 = product of:
          0.08546666 = sum of:
            0.08546666 = weight(_text_:22 in 6021) [ClassicSimilarity], result of:
              0.08546666 = score(doc=6021,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.5416616 = fieldWeight in 6021, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6021)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Source
    Ariadne. 1999, no.22
  4. Birmingham, J.: Internet search engines (1996) 0.01
    0.0073257135 = product of:
      0.036628567 = sum of:
        0.036628567 = product of:
          0.07325713 = sum of:
            0.07325713 = weight(_text_:22 in 5664) [ClassicSimilarity], result of:
              0.07325713 = score(doc=5664,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.46428138 = fieldWeight in 5664, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5664)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    10.11.1996 16:36:22
  5. Hogan, A.; Harth, A.; Umbrich, J.; Kinsella, S.; Polleres, A.; Decker, S.: Searching and browsing Linked Data with SWSE : the Semantic Web Search Engine (2011) 0.01
    0.006096238 = product of:
      0.03048119 = sum of:
        0.03048119 = product of:
          0.06096238 = sum of:
            0.06096238 = weight(_text_:data in 438) [ClassicSimilarity], result of:
              0.06096238 = score(doc=438,freq=12.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.4278775 = fieldWeight in 438, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=438)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    In this paper, we discuss the architecture and implementation of the Semantic Web Search Engine (SWSE). Following traditional search engine architecture, SWSE consists of crawling, data enhancing, indexing and a user interface for search, browsing and retrieval of information; unlike traditional search engines, SWSE operates over RDF Web data - loosely also known as Linked Data - which implies unique challenges for the system design, architecture, algorithms, implementation and user interface. In particular, many challenges exist in adopting Semantic Web technologies for Web data: the unique challenges of the Web - in terms of scale, unreliability, inconsistency and noise - are largely overlooked by the current Semantic Web standards. Herein, we describe the current SWSE system, initially detailing the architecture and later elaborating upon the function, design, implementation and performance of each individual component. In so doing, we also give an insight into how current Semantic Web standards can be tailored, in a best-effort manner, for use on Web data. Throughout, we offer evaluation and complementary argumentation to support our design choices, and also offer discussion on future directions and open research questions. Later, we also provide candid discussion relating to the difficulties currently faced in bringing such a search engine into the mainstream, and lessons learnt from roughly six years working on the Semantic Web Search Engine project.
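    The component pipeline named in the abstract (crawling, data enhancing, indexing, and a search interface) can be pictured with a small skeleton. This is a schematic sketch only: the function bodies, the triple representation and the naive inverted index are illustrative assumptions, not the SWSE implementation.

      from typing import Dict, Iterable, List, Set, Tuple

      Triple = Tuple[str, str, str]   # (subject, predicate, object) from RDF Web data

      def crawl(seed_uris: List[str]) -> List[Triple]:
          """Fetch RDF documents reachable from the seeds (stubbed here)."""
          return []

      def enhance(triples: Iterable[Triple]) -> List[Triple]:
          """'Data enhancing': e.g. drop duplicates from noisy, inconsistent input."""
          return sorted(set(triples))

      def index(triples: List[Triple]) -> Dict[str, Set[str]]:
          """Build a toy inverted index from object terms to subject URIs."""
          inverted: Dict[str, Set[str]] = {}
          for s, p, o in triples:
              for term in o.lower().split():
                  inverted.setdefault(term, set()).add(s)
          return inverted

      def search(inverted: Dict[str, Set[str]], query: str) -> Set[str]:
          """User-facing lookup: subjects matching every query term."""
          terms = query.lower().split()
          if not terms:
              return set()
          return set.intersection(*(inverted.get(t, set()) for t in terms))

      # The stages chain in the order the abstract lists them.
      idx = index(enhance(crawl(["http://example.org/seed"])))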
  6. Summann, F.; Lossau, N.: Search engine technology and digital libraries : moving from theory to practice (2004) 0.01
    0.006035973 = product of:
      0.030179864 = sum of:
        0.030179864 = weight(_text_:bibliographic in 1196) [ClassicSimilarity], result of:
          0.030179864 = score(doc=1196,freq=2.0), product of:
            0.17541347 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.04505818 = queryNorm
            0.17204987 = fieldWeight in 1196, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.03125 = fieldNorm(doc=1196)
      0.2 = coord(1/5)
    
    Abstract
    This article describes the journey from the conception of and vision for a modern search-engine-based search environment to its technological realisation. In doing so, it takes up the thread of an earlier article on this subject, this time from a technical viewpoint. As well as presenting the conceptual considerations of the initial stages, this article will principally elucidate the technological aspects of this journey. The starting point for the deliberations about developing an academic search engine was the experience we gained through the generally successful project "Digital Library NRW", in which, from 1998 to 2000, with Bielefeld University Library in overall charge, we designed a system model for an Internet-based library portal with an improved academic search environment at its core. At the heart of this system was a metasearch with an availability function, to which we added a user interface integrating all relevant source material for study and research. The deficiencies of this approach were felt soon after the system was launched in June 2001. There were problems with the stability and performance of the database retrieval system, with the integration of full-text documents and Internet pages, and with acceptance by users, because users increasingly perform searches themselves using search engines rather than turning to the library for help. Since a long list of problems is also encountered when using commercial search engines for academic purposes (in particular the retrieval of academic information and long-term availability), the idea was born for a search engine configured specifically for academic use. We also hoped that with a single access point founded on improved search engine technology, we could access the heterogeneous academic resources of subject-based bibliographic databases, catalogues, electronic newspapers, document servers and academic web pages.
  7. What is Schema.org? (2011) 0.01
    0.0059730685 = product of:
      0.029865343 = sum of:
        0.029865343 = product of:
          0.059730686 = sum of:
            0.059730686 = weight(_text_:data in 4437) [ClassicSimilarity], result of:
              0.059730686 = score(doc=4437,freq=8.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.4192326 = fieldWeight in 4437, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4437)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    This site provides a collection of schemas, i.e., HTML tags, that webmasters can use to mark up their pages in ways recognized by major search providers. Search engines including Bing, Google and Yahoo! rely on this markup to improve the display of search results, making it easier for people to find the right web pages. Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results, making it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure. A shared markup vocabulary makes it easier for webmasters to decide on a markup schema and get the maximum benefit for their efforts. So, in the spirit of sitemaps.org, Bing, Google and Yahoo! have come together to provide a shared collection of schemas that webmasters can use.
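    The core idea in the abstract, publishing the underlying structured data alongside the human-readable page so that search engines need not recover it from rendered HTML, can be sketched briefly. The snippet is illustrative: it emits the record as JSON-LD, a serialization schema.org now also accepts alongside the HTML tags mentioned above, and while the type and property names (Article, headline, author) are common schema.org terms, the concrete values are invented.

      import json

      # Structured record as it might sit in a site's database (invented values).
      record = {"headline": "What is Schema.org?", "author": "Example Webmaster"}

      # The same record expressed in the shared schema.org vocabulary.
      structured = {"@context": "https://schema.org", "@type": "Article", **record}

      # Embedded next to the rendered markup, so a search engine can read the
      # original structure directly instead of reverse-engineering the HTML.
      page = (
          "<article>\n"
          f"  <h1>{record['headline']}</h1>\n"
          f"  <p>By {record['author']}</p>\n"
          "</article>\n"
          f'<script type="application/ld+json">{json.dumps(structured)}</script>'
      )
      print(page)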
  8. Bensman, S.J.: Eugene Garfield, Francis Narin, and PageRank : the theoretical bases of the Google search engine (2013) 0.00
    0.004883809 = product of:
      0.024419045 = sum of:
        0.024419045 = product of:
          0.04883809 = sum of:
            0.04883809 = weight(_text_:22 in 1149) [ClassicSimilarity], result of:
              0.04883809 = score(doc=1149,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.30952093 = fieldWeight in 1149, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1149)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    17.12.2013 11:02:22
  9. Schaat, S.: Von der automatisierten Manipulation zur Manipulation der Automatisierung (2019) 0.00
    0.004883809 = product of:
      0.024419045 = sum of:
        0.024419045 = product of:
          0.04883809 = sum of:
            0.04883809 = weight(_text_:22 in 4996) [ClassicSimilarity], result of:
              0.04883809 = score(doc=4996,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.30952093 = fieldWeight in 4996, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4996)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    19. 2.2019 17:22:00
  10. Li, Z.: ¬A domain specific search engine with explicit document relations (2013) 0.00
    0.0043106913 = product of:
      0.021553457 = sum of:
        0.021553457 = product of:
          0.043106914 = sum of:
            0.043106914 = weight(_text_:data in 1210) [ClassicSimilarity], result of:
              0.043106914 = score(doc=1210,freq=6.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.30255508 = fieldWeight in 1210, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1210)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    The current web consists of documents that are highly heterogeneous and hard for machines to understand. The Semantic Web is a progressive evolution of the World Wide Web, aiming to convert the current web of unstructured documents into a web of data. In the Semantic Web, web documents are annotated with metadata using standardized ontology languages. These annotated documents are directly processable by machines, which greatly improves their usability and usefulness. At Ericsson, similar problems occur. Massive numbers of documents with well-defined structures are being created. Though these documents concern domain-specific knowledge and can have rich relations, they are currently managed by a traditional search engine, which ignores the rich domain-specific information and presents little of it to users. Motivated by the Semantic Web, we aim to find standard ways to process these documents, extract rich domain-specific information, and annotate the documents with these data using formal markup languages. We propose this project to develop a domain-specific search engine that processes different documents and builds explicit relations between them. This research project has three main focuses: examining different domain-specific documents and finding ways to extract their metadata; integrating a text search engine with an ontology server; and exploring novel ways to build relations between documents. We implement this system and demonstrate its functions. As a prototype, the system provides the required features and will be extended in the future.
  11. Tetzchner, J. von: As a monopoly in search and advertising Google is not able to resist the misuse of power : is the Internet turning into a battlefield of propaganda? How Google should be regulated (2017) 0.00
    0.0038955552 = product of:
      0.019477775 = sum of:
        0.019477775 = product of:
          0.03895555 = sum of:
            0.03895555 = weight(_text_:data in 3891) [ClassicSimilarity], result of:
              0.03895555 = score(doc=3891,freq=10.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.27341786 = fieldWeight in 3891, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=3891)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Content
    "How should Google be regulated? We should limit the amount of information that is being collected. In particular, we should look at information that is being collected across sites. It should not be legal to combine data from multiple sites and services. The fact that these sites and services use the same underlying technology does not change the fact that the user's dealings are with one site at a time, and each site should not have the right to share the data with others. I believe this is the cornerstone of laws in many countries today, but these laws need to be enforced. Data about us is ours alone and it should not be possible to sell it. We should also limit the ability to target users individually. In the past, ads on sites were ads on sites. You might know what kind of users visited a site, and you would place tech ads on tech sites and fashion ads on fashion sites. Now the ads follow you individually. That should be made illegal, as it uses data collected from multiple sources and invades our privacy. I also believe there should be regulation of how location data and any other information related to our mobile devices are used. In addition, regulators need to be vigilant as to how companies that have monopoly power use that power. That kind of goes without saying. Companies with monopoly powers should not be able to use those powers when competing in an open market or use their monopoly services to limit competition."
  12. Place, E.: Internationale Zusammenarbeit bei Internet Subject Gateways (1999) 0.00
    0.0036628568 = product of:
      0.018314283 = sum of:
        0.018314283 = product of:
          0.036628567 = sum of:
            0.036628567 = weight(_text_:22 in 4189) [ClassicSimilarity], result of:
              0.036628567 = score(doc=4189,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.23214069 = fieldWeight in 4189, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4189)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    22. 6.2002 19:35:09
  13. Boldi, P.; Santini, M.; Vigna, S.: PageRank as a function of the damping factor (2005) 0.00
    0.0030523809 = product of:
      0.015261904 = sum of:
        0.015261904 = product of:
          0.030523809 = sum of:
            0.030523809 = weight(_text_:22 in 2564) [ClassicSimilarity], result of:
              0.030523809 = score(doc=2564,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.19345059 = fieldWeight in 2564, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2564)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    16. 1.2016 10:22:28
  14. Bladow, N.; Dorey, C.; Frederickson, L.; Grover, P.; Knudtson, Y.; Krishnamurthy, S.; Lazarou, V.: What's the Buzz about? : An empirical examination of Search on Yahoo! (2005) 0.00
    0.0029865343 = product of:
      0.014932672 = sum of:
        0.014932672 = product of:
          0.029865343 = sum of:
            0.029865343 = weight(_text_:data in 3072) [ClassicSimilarity], result of:
              0.029865343 = score(doc=3072,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.2096163 = fieldWeight in 3072, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3072)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    We present an analysis of the Yahoo Buzz Index over a period of 45 weeks. Our key findings are that: (1) It is most common for a search term to show up on the index for one week, followed by two weeks, three weeks, etc. Only two terms persist for all 45 weeks studied - Britney Spears and Jennifer Lopez. Search term longevity follows a power-law distribution or a winner-take-all structure; (2) Most search terms focus on entertainment. Search terms related to serious topics are found less often. The Buzz Index does not necessarily follow the "news cycle"; and, (3) We provide two ways to determine "star power" of various search terms - one that emphasizes staying power on the Index and another that emphasizes rank. In general, the methods lead to dramatically different results. Britney Spears performs well in both methods. We conclude that the data available on the Index is symptomatic of a celebrity-crazed, entertainment-centered culture.
  15. Khare, R.; Cutting, D.; Sitaker, K.; Rifkin, A.: Nutch: a flexible and scalable open-source Web search engine (2004) 0.00
    0.0029865343 = product of:
      0.014932672 = sum of:
        0.014932672 = product of:
          0.029865343 = sum of:
            0.029865343 = weight(_text_:data in 852) [ClassicSimilarity], result of:
              0.029865343 = score(doc=852,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.2096163 = fieldWeight in 852, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.046875 = fieldNorm(doc=852)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Nutch is an open-source Web search engine that can be used at global, local, and even personal scale. Its initial design goal was to enable a transparent alternative for global Web search in the public interest - one of its signature features is the ability to "explain" its result rankings. Recent work has emphasized how it can also be used for intranets; by local communities with richer data models, such as the Creative Commons metadata-enabled search for licensed content; and on a personal scale to index a user's files, email, and web-surfing history. We also report on several other research projects built on Nutch. In this paper, we present how the architecture of the Nutch system enables it to be more flexible and scalable than other comparable systems today.
  16. Schomburg, S.; Prante, J.: Search Engine Federation in Libraries - Suchmaschinenföderation in Bibliotheken (2009) 0.00
    0.0029865343 = product of:
      0.014932672 = sum of:
        0.014932672 = product of:
          0.029865343 = sum of:
            0.029865343 = weight(_text_:data in 2809) [ClassicSimilarity], result of:
              0.029865343 = score(doc=2809,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.2096163 = fieldWeight in 2809, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2809)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    The hbz (Academic Library Center, Cologne) has a strong focus on search engine applications: beyond the projected integration of these technologies into the new release of the Digital Library portal solution (DigiBib6), vascoda background services also apply and take advantage of search engine technology. Experience since 2003 has shown that building and updating search engine indexes consumes a vast amount of resources. The use of search engine federations, however, promises major improvements: the total number of data records held in linked indexes can be almost unlimited, while still allowing a joint output of all hits retrieved. A federation also comes with excellent response times, and retrieved hits can refer or link back into the original system's layout. Nonetheless, the major challenge these days lies in the differing search engine technologies, e.g. Lucene and FAST, the variations in ranking, and the implementation or non-implementation of so-called drill-downs. The lecture gives a brief insight into the hbz search engine workshop, with an introduction to the current state of the project.
  17. Brin, S.; Page, L.: ¬The anatomy of a large-scale hypertextual Web search engine (1998) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 947) [ClassicSimilarity], result of:
              0.024887787 = score(doc=947,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 947, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=947)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to the rapid advance of technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses the question of how to build a practical large-scale system which can exploit the additional information present in hypertext. We also look at the problem of how to deal effectively with uncontrolled hypertext collections where anyone can publish anything they want.
  18. Bredemeier, W.: "Strategische Deökonomisierung und Demokratisierung der Informationszugänge" : Eine Alternative zu Google und den Sozialen Medien? (2022) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 598) [ClassicSimilarity], result of:
              0.024887787 = score(doc=598,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 598, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=598)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Source
    Open Password. 2022, Nr. 1077 vom 27.05.2022 [https://www.password-online.de/?mailpoet_router&endpoint=view_in_browser&action=view&data=WzQ2MCwiOTIwMzk1Zjg2YWU1IiwwLDAsNDIwLDFd
  19. Ogden, J.; Summers, E.; Walker, S.: Know(ing) Infrastructure : the wayback machine as object and instrument of digital research (2023) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 1084) [ClassicSimilarity], result of:
              0.024887787 = score(doc=1084,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 1084, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1084)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    From documenting human rights abuses to studying online advertising, web archives are increasingly positioned as critical resources for a broad range of scholarly Internet research agendas. In this article, we reflect on the motivations and methodological challenges of investigating the world's largest web archive, the Internet Archive's Wayback Machine (IAWM). Using a mixed methods approach, we report on a pilot project centred around documenting the inner workings of 'Save Page Now' (SPN) - an Internet Archive tool that allows users to initiate the creation and storage of 'snapshots' of web resources. By improving our understanding of SPN and its role in shaping the IAWM, this work examines how the public tool is being used to 'save the Web' and highlights the challenges of operationalising a study of the dynamic sociotechnical processes supporting this knowledge infrastructure. Inspired by existing Science and Technology Studies (STS) approaches, the paper charts our development of methodological interventions to support an interdisciplinary investigation of SPN, including: ethnographic methods, 'experimental blackbox tactics', data tracing, modelling and documentary research. We discuss the opportunities and limitations of our methodology when interfacing with issues associated with temporality, scale and visibility, as well as critically engage with our own positionality in the research process (in terms of expertise and access). We conclude with reflections on the implications of digital STS approaches for 'knowing infrastructure', where the use of these infrastructures is unavoidably intertwined with our ability to study the situated and material arrangements of their creation.
  20. Gillitzer, B.: Yewno (2017) 0.00
    0.0024419045 = product of:
      0.012209523 = sum of:
        0.012209523 = product of:
          0.024419045 = sum of:
            0.024419045 = weight(_text_:22 in 3447) [ClassicSimilarity], result of:
              0.024419045 = score(doc=3447,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.15476047 = fieldWeight in 3447, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3447)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Date
    22. 2.2017 10:16:49