Search (414 results, page 1 of 21)

Hancock, B.: Subject-specific search engines : using the Harvest system to gather and maintain information on the Internet (1998) 0.13

0.12533839 = product of:
  0.20889731 = sum of:
    0.10875649 = weight(_text_:index in 3238) [ClassicSimilarity], result of:
      0.10875649 = score(doc=3238,freq=6.0), product of:
        0.18579477 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.04251826 = queryNorm
        0.5853582 = fieldWeight in 3238, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3238)
    0.04613084 = weight(_text_:system in 3238) [ClassicSimilarity], result of:
      0.04613084 = score(doc=3238,freq=4.0), product of:
        0.13391352 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.04251826 = queryNorm
        0.34448233 = fieldWeight in 3238, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3238)
    0.054009974 = product of:
      0.08101496 = sum of:
        0.0406905 = weight(_text_:29 in 3238) [ClassicSimilarity], result of:
          0.0406905 = score(doc=3238,freq=2.0), product of:
            0.14956595 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04251826 = queryNorm
            0.27205724 = fieldWeight in 3238, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3238)
        0.04032446 = weight(_text_:22 in 3238) [ClassicSimilarity], result of:
          0.04032446 = score(doc=3238,freq=2.0), product of:
            0.1488917 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04251826 = queryNorm
            0.2708308 = fieldWeight in 3238, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3238)
      0.6666667 = coord(2/3)
  0.6 = coord(3/5)

Abstract: The continuing expansion of the Internet has made resources available to users in sometimes unmanageable abundance. To help users cope with this proliferation of information, librarians have begun to add URLs to their home pages. In addition, specialized search engines are being used to retrieve information from selected sources in an effort to return pertinent results. Describes the Harvest system, which has been used to develop Index Antiquus, a specialized search engine for classics and mediaeval studies. Presents a working example of how to search Index Antiquus.
Date: 6. 3.1997 16:22:15
5. 3.1999 19:29:26
Object: Index Antiquus
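The explain trees in these results come from Lucene's ClassicSimilarity (TF-IDF) scoring. The minimal Python sketch below reproduces the weight of the term `index` in doc 3238. It assumes the formulas implied by the numbers themselves: tf = √freq, idf = 1 + ln(maxDocs / (docFreq + 1)), and a query boost of 1. queryNorm and fieldNorm are copied from the output rather than derived, and the function names are illustrative only.

```python
import math

def classic_tf(freq):
    # tf = sqrt(termFreq); e.g. tf(6.0) = 2.4494898 in the tree above
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # idf = 1 + ln(maxDocs / (docFreq + 1)); reproduces idf(docFreq=1520, maxDocs=44218) = 4.369764
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm              # queryWeight = idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm   # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight                   # score = queryWeight * fieldWeight

# queryNorm and fieldNorm are copied from the explain output, not recomputed here.
w = term_score(freq=6.0, doc_freq=1520, max_docs=44218,
               query_norm=0.04251826, field_norm=0.0546875)
print(f"{w:.6f}")  # prints: 0.108756
```

Note that idf enters the product twice, once through queryWeight and once through fieldWeight. A rare term is therefore weighted by roughly idf².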

Ardo, A.; Lundberg, S.: A regional distributed WWW search and indexing service : the DESIRE way (1998) 0.09

0.08613239 = product of:
  0.14355399 = sum of:
    0.07611368 = weight(_text_:index in 4190) [ClassicSimilarity], result of:
      0.07611368 = score(doc=4190,freq=4.0), product of:
        0.18579477 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.04251826 = queryNorm
        0.40966535 = fieldWeight in 4190, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.046875 = fieldNorm(doc=4190)
    0.055919025 = weight(_text_:system in 4190) [ClassicSimilarity], result of:
      0.055919025 = score(doc=4190,freq=8.0), product of:
        0.13391352 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.04251826 = queryNorm
        0.41757566 = fieldWeight in 4190, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.046875 = fieldNorm(doc=4190)
    0.011521274 = product of:
      0.03456382 = sum of:
        0.03456382 = weight(_text_:22 in 4190) [ClassicSimilarity], result of:
          0.03456382 = score(doc=4190,freq=2.0), product of:
            0.1488917 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04251826 = queryNorm
            0.23214069 = fieldWeight in 4190, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=4190)
      0.33333334 = coord(1/3)
  0.6 = coord(3/5)

Abstract: Creates an open, metadata-aware system for distributed, collaborative WWW indexing. The system has 3 main components: a harvester (for collecting information), a database (for making the collection searchable), and a user interface (for making the information available). All components can be distributed across networked computers, thus supporting scalability. Because the system is metadata-aware, it allows searches on several fields, including title, document author, and URL. Nordic Web Index (NWI) is an application that uses this system to create a regional Nordic Web-indexing service. NWI is built from 5 collaborating service points within the Nordic countries. The NWI databases can be used to build additional services.
Date: 1. 8.1996 22:08:06
Object: Nordic Web Index
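The final score is not a plain sum of term weights. ClassicSimilarity multiplies each Boolean query's summed clause scores by a coordination factor, coord = matched clauses / total clauses. The small sketch below uses the leaf weights for doc 4190, copied from the tree above. It shows how the nested group is scaled by 1/3 and how the top-level sum 0.14355399 becomes 0.08613239 after coord(3/5). The variable names are illustrative.

```python
def coord(matched: int, total: int) -> float:
    # Coordination factor: fraction of a Boolean query's clauses that matched
    return matched / total

# Leaf term weights copied from the explain tree for doc 4190
index_w = 0.07611368
system_w = 0.055919025
term_22_w = 0.03456382  # only 1 of the 3 clauses in the nested group matched

nested = term_22_w * coord(1, 3)                        # nested group: sum * coord(1/3)
total = (index_w + system_w + nested) * coord(3, 5)     # top level: sum * coord(3/5)
print(f"{nested:.8f} {total:.8f}")  # prints: 0.01152127 0.08613239
```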

Garcés, P.J.; Olivas, J.A.; Romero, F.P.: Concept-matching IR systems versus word-matching information retrieval systems : considering fuzzy interrelations for indexing Web pages (2006) 0.07

0.067956895 = product of:
  0.11326149 = sum of:
    0.057061244 = weight(_text_:context in 5288) [ClassicSimilarity], result of:
      0.057061244 = score(doc=5288,freq=4.0), product of:
        0.17622331 = queryWeight, product of:
          4.14465 = idf(docFreq=1904, maxDocs=44218)
          0.04251826 = queryNorm
        0.32380077 = fieldWeight in 5288, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.14465 = idf(docFreq=1904, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5288)
    0.046599183 = weight(_text_:system in 5288) [ClassicSimilarity], result of:
      0.046599183 = score(doc=5288,freq=8.0), product of:
        0.13391352 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.04251826 = queryNorm
        0.3479797 = fieldWeight in 5288, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5288)
    0.009601062 = product of:
      0.028803186 = sum of:
        0.028803186 = weight(_text_:22 in 5288) [ClassicSimilarity], result of:
          0.028803186 = score(doc=5288,freq=2.0), product of:
            0.1488917 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04251826 = queryNorm
            0.19345059 = fieldWeight in 5288, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5288)
      0.33333334 = coord(1/3)
  0.6 = coord(3/5)

Abstract: This article presents a semantics-based Web retrieval system that can retrieve Web pages conceptually related to the implicit concepts of a query. Concepts are handled from a fuzzy point of view by means of semantic areas. In this context, the proposed system improves on most search engines, which rely on word matching. The key to the system is a new version of the Fuzzy Interrelations and Synonymy-Based Concept Representation Model (FIS-CRM), which extracts and represents the concepts contained in both the Web pages and the user query. This model, which has been integrated into other tools such as the Fuzzy Interrelations and Synonymy-Based Searcher (FISS) metasearcher and the fz-mail system, uses fuzzy synonymy and fuzzy generality interrelations to represent word interrelations (stored in a fuzzy synonymy dictionary and ontologies). The new version of the model, based on a study of synonym co-occurrences, integrates a soft method for disambiguating word senses. This method also takes into account the context of the word to be disambiguated, as well as the thematic ontologies and synonym sets stored in the dictionary.
Date: 22. 7.2006 17:14:12
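The abstract contrasts concept matching with word matching. The sketch below is an illustration only. It is not FIS-CRM, whose internals the abstract does not give. It assumes a small hypothetical fuzzy synonymy dictionary and expands each text into a concept vector. With that expansion, a query can match a page that shares no literal words with it.

```python
import math
from collections import Counter

# Hypothetical fuzzy synonymy dictionary (illustrative values, not FIS-CRM data):
# degree in [0, 1] to which two words express the same concept.
FUZZY_SYNONYMS = {
    "car": {"automobile": 0.9, "vehicle": 0.6},
    "automobile": {"car": 0.9, "vehicle": 0.6},
    "vehicle": {"car": 0.6, "automobile": 0.6},
}

def concept_vector(words):
    """Map words to concept weights, spreading each word onto its fuzzy synonyms.

    A synonym receives count * degree; max() avoids double-counting when
    several words point at the same concept.
    """
    vec = Counter()
    for word, n in Counter(words).items():
        vec[word] = max(vec[word], float(n))
        for syn, degree in FUZZY_SYNONYMS.get(word, {}).items():
            vec[syn] = max(vec[syn], n * degree)
    return vec

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

query = ["automobile"]
page = "a fast car for sale".split()
word_match = cosine(Counter(query), Counter(page))                   # no shared words
concept_match = cosine(concept_vector(query), concept_vector(page))  # shared concept "car"
print(word_match, concept_match > 0)  # prints: 0.0 True
```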

Park, E.-K.; Ra, D.-Y.; Jang, M.-G.: Techniques for improving web retrieval effectiveness (2005) 0.06

0.062992245 = product of:
  0.10498707 = sum of:
    0.0538205 = weight(_text_:index in 1060) [ClassicSimilarity], result of:
      0.0538205 = score(doc=1060,freq=2.0), product of:
        0.18579477 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.04251826 = queryNorm
        0.28967714 = fieldWeight in 1060, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.046875 = fieldNorm(doc=1060)
    0.03954072 = weight(_text_:system in 1060) [ClassicSimilarity], result of:
      0.03954072 = score(doc=1060,freq=4.0), product of:
        0.13391352 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.04251826 = queryNorm
        0.29527056 = fieldWeight in 1060, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.046875 = fieldNorm(doc=1060)
    0.011625858 = product of:
      0.034877572 = sum of:
        0.034877572 = weight(_text_:29 in 1060) [ClassicSimilarity], result of:
          0.034877572 = score(doc=1060,freq=2.0), product of:
            0.14956595 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04251826 = queryNorm
            0.23319192 = fieldWeight in 1060, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.046875 = fieldNorm(doc=1060)
      0.33333334 = coord(1/3)
  0.6 = coord(3/5)

Abstract: This paper discusses several schemes for improving retrieval effectiveness in the named-page finding task of Web information retrieval (Overview of the TREC-2002 web track. In: Proceedings of the Eleventh Text Retrieval Conference TREC-2002, NIST Special Publication #500-251, 2003). These methods were applied on top of a basic information retrieval model as additional mechanisms to improve the system. Using the titles of Web pages was found to be effective. It was confirmed that the anchor text of incoming links is beneficial, as suggested in other work. Sentence-query similarity, a new type of evidence that we propose, proved to be the most useful. Stratifying and re-ranking the retrieval list by the maximum number of index terms that a sentence shares with the query significantly improved performance. To demonstrate these findings, a large-scale Web information retrieval system was developed and used for experimentation.
Date: 26.12.2007 20:28:29
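The abstract's most effective idea is to stratify the result list by the largest number of index terms that one sentence shares with the query. It can be sketched roughly as follows. The sketch makes several assumptions that are not taken from the paper: simple regex tokenization and sentence splitting, no stemming or stopword removal, and a base score from any retrieval model. The function names are hypothetical.

```python
import re

def tokenize(text):
    # Hypothetical tokenizer: lowercase word characters, no stemming
    return set(re.findall(r"\w+", text.lower()))

def max_sentence_overlap(text, query_terms):
    """Largest number of distinct query terms that co-occur in a single sentence."""
    return max((len(tokenize(s) & query_terms) for s in re.split(r"[.!?]+", text)), default=0)

def stratified_rerank(results, query):
    """Re-rank (doc_id, base_score, text) tuples.

    Primary key: max sentence overlap with the query (the stratum);
    secondary key: the base retrieval score, so order within a stratum is kept.
    """
    query_terms = tokenize(query)
    return sorted(results,
                  key=lambda r: (max_sentence_overlap(r[2], query_terms), r[1]),
                  reverse=True)

results = [
    ("d1", 2.0, "Apple pie recipes. Bake at home."),
    ("d2", 1.5, "The apple pie recipe is here."),
]
ranked = [doc_id for doc_id, _, _ in stratified_rerank(results, "apple pie recipe")]
print(ranked)  # prints: ['d2', 'd1']
```

Here d2 moves above d1 despite its lower base score, because one of its sentences contains all three query terms.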

Furner, J.: A unifying model of document relatedness for hybrid search engines (2003) 0.06

0.05968804 = product of:
  0.09948006 = sum of:
    0.04841807 = weight(_text_:context in 2717) [ClassicSimilarity], result of:
      0.04841807 = score(doc=2717,freq=2.0), product of:
        0.17622331 = queryWeight, product of:
          4.14465 = idf(docFreq=1904, maxDocs=44218)
          0.04251826 = queryNorm
        0.27475408 = fieldWeight in 2717, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.14465 = idf(docFreq=1904, maxDocs=44218)
          0.046875 = fieldNorm(doc=2717)
    0.03954072 = weight(_text_:system in 2717) [ClassicSimilarity], result of:
      0.03954072 = score(doc=2717,freq=4.0), product of:
        0.13391352 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.04251826 = queryNorm
        0.29527056 = fieldWeight in 2717, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.046875 = fieldNorm(doc=2717)
    0.011521274 = product of:
      0.03456382 = sum of:
        0.03456382 = weight(_text_:22 in 2717) [ClassicSimilarity], result of:
          0.03456382 = score(doc=2717,freq=2.0), product of:
            0.1488917 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04251826 = queryNorm
            0.23214069 = fieldWeight in 2717, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2717)
      0.33333334 = coord(1/3)
  0.6 = coord(3/5)

Abstract: Previous work on search-engine design has indicated that information seekers may benefit from being able to exploit multiple sources of evidence of document relatedness. Few existing systems, however, give users more than minimal control over the choice among methods of exploiting them. By applying the methods of "document network analysis" (DNA), a unifying, graph-theoretic model of content-, collaboration-, and context-based (CCC) systems may be developed that clarifies the nature of the similarities between types of document relatedness and document ranking. The usefulness of the approach to system design suggested by this model may be tested by constructing and evaluating a prototype system (UCXtra) that lets searchers keep control over the multiple ways in which document collections may be ranked and re-ranked.
Date: 11. 9.2004 17:32:22

Schulz, W.; Held, T.: Der Index auf dem Index? : Selbstzensur und Zensur bei Suchmaschinen (2007) 0.05
```
0.049632978 = product of:
  0.124082446 = sum of:
    0.107641 = weight(_text_:index in 374) [ClassicSimilarity], result of:
      0.107641 = score(doc=374,freq=8.0), product of:
        0.18579477 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.04251826 = queryNorm
        0.5793543 = fieldWeight in 374, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.046875 = fieldNorm(doc=374)
    0.016441446 = product of:
      0.049324337 = sum of:
        0.049324337 = weight(_text_:29 in 374) [ClassicSimilarity], result of:
          0.049324337 = score(doc=374,freq=4.0), product of:
            0.14956595 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04251826 = queryNorm
            0.3297832 = fieldWeight in 374, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.046875 = fieldNorm(doc=374)
      0.33333334 = coord(1/3)
  0.4 = coord(2/5)
```
Abstract: Search engines are regarded as gatekeepers in public communication. At least for certain types of page requests, search engines are by far the most common route of access to Internet content. Interference with a search engine's index, or with the algorithms that control its result lists, can therefore be seen as a highly sensitive intervention in Internet-based communication. This draws attention to the policies of search engine providers, including how they respond to external demands, for example from nation states. Google in particular has come under criticism because, in its Chinese service, it deletes from the index pages that the Chinese government regards as a threat to the state. Google maintains that it is merely complying with local law. Pages are filtered in Germany as well, and the Internet community is quick to call this "censorship". This article examines when, under the German understanding of the term, one can speak of censorship. In doing so, it aims to show where national policies differ, and also where the cooperation between search engine providers and nation states differs.
Date: 13. 5.2007 10:29:29

Haveliwala, T.: Context-Sensitive Web search (2005) 0.05
```
0.049278647 = product of:
  0.12319662 = sum of:
    0.096836135 = weight(_text_:context in 2567) [ClassicSimilarity], result of:
      0.096836135 = score(doc=2567,freq=18.0), product of:
        0.17622331 = queryWeight, product of:
          4.14465 = idf(docFreq=1904, maxDocs=44218)
          0.04251826 = queryNorm
        0.5495081 = fieldWeight in 2567, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          4.14465 = idf(docFreq=1904, maxDocs=44218)
          0.03125 = fieldNorm(doc=2567)
    0.02636048 = weight(_text_:system in 2567) [ClassicSimilarity], result of:
      0.02636048 = score(doc=2567,freq=4.0), product of:
        0.13391352 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.04251826 = queryNorm
        0.19684705 = fieldWeight in 2567, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.03125 = fieldNorm(doc=2567)
  0.4 = coord(2/5)
```
Abstract: As the Web continues to grow and encompass broader and more diverse sources of information, providing effective search facilities to users becomes an increasingly challenging problem. To help users deal with the deluge of Web-accessible information, we propose a search system that uses context to improve search results in a scalable way. By context, we mean any source of information, in addition to the search query, that provides clues about the user's true information need. For instance, a user's bookmarks and search history can be considered part of the search context. We consider two types of context-based search. The first type of functionality we consider is "similarity search." In this case, as the user is browsing Web pages, URLs for pages similar to the current page are retrieved and displayed in a side panel. No query is explicitly issued; context alone (i.e., the page currently being viewed) is used to provide the user with useful related information. The second type of functionality involves taking search context into account when ranking results to standard search queries. Web search differs from traditional information retrieval tasks in several major ways, which makes effective context-sensitive Web search challenging. First, scalability is of critical importance. With billions of publicly accessible documents, the Web is much larger than traditional datasets. Similarly, with millions of search queries issued each day, the query load is much higher than for traditional information retrieval systems. Second, there are no guarantees on the quality of Web pages, with Web authors taking an adversarial, rather than cooperative, approach in attempts to inflate the rankings of their pages. Third, there is a significant amount of metadata embodied in the link structure corresponding to the hyperlinks between Web pages that can be exploited during the retrieval process.
In this thesis, we design a search system, using the Stanford WebBase platform, that exploits the link structure of the Web to provide scalable, context-sensitive search.

Lewandowski, D.: Perspektiven eines Open Web Index (2016) 0.05

0.048927996 = product of:
  0.12231999 = sum of:
    0.10875649 = weight(_text_:index in 2935) [ClassicSimilarity], result of:
      0.10875649 = score(doc=2935,freq=6.0), product of:
        0.18579477 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.04251826 = queryNorm
        0.5853582 = fieldWeight in 2935, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2935)
    0.013563501 = product of:
      0.0406905 = sum of:
        0.0406905 = weight(_text_:29 in 2935) [ClassicSimilarity], result of:
          0.0406905 = score(doc=2935,freq=2.0), product of:
            0.14956595 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04251826 = queryNorm
            0.27205724 = fieldWeight in 2935, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2935)
      0.33333334 = coord(1/3)
  0.4 = coord(2/5)

Abstract: For many years the search engine market has been dominated by a single search engine, Google. It has since been recognized that this situation is undesirable, and the conversation is now about possible solutions. The article discusses different approaches and focuses on the idea of an Open Web Index (OWI) to be made available as public infrastructure. The basic idea is to separate the data store (the index) from the services built on top of it, which could then be operated in large numbers as private initiatives. In short, the aim is to lay the foundation for diversity.
Date: 16. 5.2016 21:53:29

Jezior, T.: Adaption und Integration von Suchmaschinentechnologie in mor(!)dernen OPACs (2013) 0.05

0.046794422 = product of:
  0.11698605 = sum of:
    0.10148491 = weight(_text_:index in 2222) [ClassicSimilarity], result of:
      0.10148491 = score(doc=2222,freq=4.0), product of:
        0.18579477 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.04251826 = queryNorm
        0.5462205 = fieldWeight in 2222, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.0625 = fieldNorm(doc=2222)
    0.015501143 = product of:
      0.04650343 = sum of:
        0.04650343 = weight(_text_:29 in 2222) [ClassicSimilarity], result of:
          0.04650343 = score(doc=2222,freq=2.0), product of:
            0.14956595 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04251826 = queryNorm
            0.31092256 = fieldWeight in 2222, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0625 = fieldNorm(doc=2222)
      0.33333334 = coord(1/3)
  0.4 = coord(2/5)

Content: Cf. http://publiscologne.th-koeln.de/frontdoor/index/index/docId/234
Date: 18.10.2015 10:29:56

Hüskes, R.; Kleber, D.: Den Server im Griff (1999) 0.04

0.043561187 = product of:
  0.10890296 = sum of:
    0.08970083 = weight(_text_:index in 4008) [ClassicSimilarity], result of:
      0.08970083 = score(doc=4008,freq=2.0), product of:
        0.18579477 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.04251826 = queryNorm
        0.48279524 = fieldWeight in 4008, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.078125 = fieldNorm(doc=4008)
    0.019202124 = product of:
      0.057606373 = sum of:
        0.057606373 = weight(_text_:22 in 4008) [ClassicSimilarity], result of:
          0.057606373 = score(doc=4008,freq=2.0), product of:
            0.1488917 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04251826 = queryNorm
            0.38690117 = fieldWeight in 4008, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=4008)
      0.33333334 = coord(1/3)
  0.4 = coord(2/5)

Date: 22. 8.1999 21:21:10
Object: Microsoft Index Server

Franke-Maier, M.; Rüter, C.: Discover Sacherschließung! : Was machen suchmaschinenbasierte Systeme mit unseren inhaltlichen Metadaten? (2015) 0.04

0.040945116 = product of:
  0.10236279 = sum of:
    0.08879929 = weight(_text_:index in 1706) [ClassicSimilarity], result of:
      0.08879929 = score(doc=1706,freq=4.0), product of:
        0.18579477 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.04251826 = queryNorm
        0.4779429 = fieldWeight in 1706, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1706)
    0.013563501 = product of:
      0.0406905 = sum of:
        0.0406905 = weight(_text_:29 in 1706) [ClassicSimilarity], result of:
          0.0406905 = score(doc=1706,freq=2.0), product of:
            0.14956595 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04251826 = queryNorm
            0.27205724 = fieldWeight in 1706, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1706)
      0.33333334 = coord(1/3)
  0.4 = coord(2/5)

Date: 2. 3.2015 10:29:44
Source: http://opus4.kobv.de/opus4-hsog/frontdoor/index/index/docId/1124

Fordahl, M.: Mit Google den PC durchforsten : Kleines Programm erstellt in rechenfreien Zeiten einen Index (2004) 0.04
```
0.036504466 = product of:
  0.09126116 = sum of:
    0.07768321 = weight(_text_:index in 4209) [ClassicSimilarity], result of:
      0.07768321 = score(doc=4209,freq=6.0), product of:
        0.18579477 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.04251826 = queryNorm
        0.418113 = fieldWeight in 4209, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4209)
    0.013577952 = product of:
      0.040733855 = sum of:
        0.040733855 = weight(_text_:22 in 4209) [ClassicSimilarity], result of:
          0.040733855 = score(doc=4209,freq=4.0), product of:
            0.1488917 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04251826 = queryNorm
            0.27358043 = fieldWeight in 4209, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4209)
      0.33333334 = coord(1/3)
  0.4 = coord(2/5)
```
Content: "Google's search for files on the Internet can now be extended to your own PC. A small free program that settles in at the bottom of the screen runs a full-text search of the hard disk. Google captures the content of all web pages and documents in Microsoft Office format, as well as the names of other files, and displays the hits in the browser in the familiar list, though only on computers running Windows 2000 or Windows XP. In developing this tool, Google exploited both its own search technology and a weakness of Windows. 'Desktop search' uses the same algorithm as Internet search. The database it requires relies on the Windows indexing service, which few users know about because it is somewhat complicated and rather slow to boot. The new Google tool builds this search index itself while the computer is idle. As soon as the 400 KB program has been downloaded and installed, it starts combing through the PC. On well-filled hard disks this process takes a few hours or even a few days to complete. Whenever the processor has had nothing to do for 30 seconds, work on the index begins or resumes. Once the index is complete, this database supplies the material that the Google algorithm pounces on as soon as a search query is launched. Most Google tricks for searching web pages, images, or newsgroup posts also work in desktop search."

Date: 3. 5.1997 8:44:22
Source: Bergische Landeszeitung. No. 247 of 21.10.2004, p. 22
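The article describes a search index that is built piece by piece while the computer is idle and resumed whenever the processor becomes free. The minimal sketch below is not Google's implementation, which the article does not describe. It is a toy inverted index, mapping each word to the documents that contain it, that indexes queued documents in small batches so the work can be paused and resumed. When to call `work()` is left to a hypothetical idle detector that is not modeled here. The class and method names are illustrative.

```python
import re
from collections import defaultdict, deque

class IncrementalIndex:
    """Toy inverted index: word -> set of documents containing it.

    Documents are indexed in small batches so work can pause and resume,
    e.g. only while the machine is idle (idle detection is not modeled).
    """

    def __init__(self):
        self.postings = defaultdict(set)
        self.pending = deque()

    def enqueue(self, doc_id, text):
        self.pending.append((doc_id, text))

    def work(self, batch_size=1):
        """Index up to batch_size queued documents; return how many remain."""
        for _ in range(min(batch_size, len(self.pending))):
            doc_id, text = self.pending.popleft()
            for word in set(re.findall(r"\w+", text.lower())):
                self.postings[word].add(doc_id)
        return len(self.pending)

    def search(self, query):
        """Return the documents containing every query word (Boolean AND)."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for w in words[1:]:
            result &= self.postings.get(w, set())
        return result

idx = IncrementalIndex()
idx.enqueue("report.doc", "Quarterly sales report")
idx.enqueue("notes.txt", "Sales meeting notes")
while idx.work(batch_size=1):  # a real tool would call this only while idle
    pass
print(sorted(idx.search("sales")))  # prints: ['notes.txt', 'report.doc']
print(idx.search("sales report"))   # prints: {'report.doc'}
```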

Henzinger, M.; Pöppe, C.: "Qualität der Suchergebnisse ist unser höchstes Ziel" : Suchmaschine Google (2002) 0.04
```
0.03615439 = product of:
  0.060257316 = sum of:
    0.03805684 = weight(_text_:index in 851) [ClassicSimilarity], result of:
      0.03805684 = score(doc=851,freq=4.0), product of:
        0.18579477 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.04251826 = queryNorm
        0.20483267 = fieldWeight in 851, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.0234375 = fieldNorm(doc=851)
    0.013979756 = weight(_text_:system in 851) [ClassicSimilarity], result of:
      0.013979756 = score(doc=851,freq=2.0), product of:
        0.13391352 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.04251826 = queryNorm
        0.104393914 = fieldWeight in 851, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.0234375 = fieldNorm(doc=851)
    0.008220723 = product of:
      0.024662169 = sum of:
        0.024662169 = weight(_text_:29 in 851) [ClassicSimilarity], result of:
          0.024662169 = score(doc=851,freq=4.0), product of:
            0.14956595 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04251826 = queryNorm
            0.1648916 = fieldWeight in 851, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0234375 = fieldNorm(doc=851)
      0.33333334 = coord(1/3)
  0.6 = coord(3/5)
```
Content

Spektrum der Wissenschaft: Frau Henzinger, wie viele Seiten des World Wide Web erschließt Google heute? Monika Henzinger: Wir haben über zwei Milliarden Webseiten in unserer Datenbank. Hinzu kommen 700 Millionen Newsgroup-Beiträge, die weit in die Vergangenheit reichen, und 300 Millionen Bilder. - Spektrum: Und diese Inhalte haben Sie komplett gespeichert? - Henzinger: In komprimierter Form, ja. Spektrum: Ist das nicht schon das gesamte Web? - Henzinger: Bei weitem nicht! Eigentlich ist das Web unendlich. Es gibt Datenbanken, die beliebig viele Webseiten auf Anfrage erzeugen können. Natürlich macht es keinen Sinn, die alle in der Suchmaschine zu haben. Wir beschränken uns auf Seiten hoher Qualität. - Spektrum: Wie wählen Sie die aus? - Henzinger: Nach dem so genannten PageRank. Das ist eine Note, die wir jeder Seite geben, unabhängig von irgendeiner Anfrage, für die diese Seite relevant sein könnte. Und zwar ist die Qualität einer Seite - oder anders gesagt: die Hochachtung, die sie innerhalb des Web genießt - umso größer, je mehr andere Seiten auf sie verweisen und je höher die Qualität der verweisenden Seite ist. Der PageRank bestimmt auch wesentlich die Reihenfolge, in der Google dem Anfrager die Ergebnisse präsentiert. - Spektrum: Ist der PageRank manipulierbar, etwa durch ein Zitierkartell? - Henzinger: Es wird zumindest immer wieder versucht. Zum Beispiel ist "Britney Spears" ein sehr häufiger Suchbegriff. Deswegen versuchen viele, ihren PageRank hochzutreiben, um unter den Antworten auf "Britney Spears" auf den vordersten Plätzen zu landen, auch wenn sie bloß Turnschuhe verkaufen. - Spektrum: Und was tun Sie dagegen? - Henzinger: Wenn wir offensichtlichen Missbrauch sehen, nehmen wir die entsprechenden Seiten gezielt heraus - im Interesse unserer Benutzer, für die wir die Qualität wahren wollen. - Spektrum: Gibt es auch andere Maßnahmen als diese Einzelkorrekturen? - Henzinger: Ja. 
But we don't discuss them publicly, so as not to fuel the "arms race". - Spektrum: How long has Google existed? - Henzinger: The company has existed for three and a half years. We have been on the market for a good two years. News of us has spread by word of mouth, and by now half of all queries come from outside the USA, twelve percent from the German-speaking countries alone. We answer over 150 million queries a day, either directly or through our partners. If, for example, the search engine Yahoo does not find a keyword in its own directory, it passes the query on to us and gives the user our answer. - Spektrum: What hardware does the system run on? - Henzinger: On more than ten thousand PCs, distributed across four data centers. The operating system is Linux. - Spektrum: How do you check whether the listed web pages still exist? - Henzinger: We visit particularly active web pages daily. Every 28 days we update the index, that is, the list that gives, for every word, the pages on which it occurs. - Spektrum: How much work is this indexing? - Henzinger: A great deal. It takes about a week. - Spektrum: How many people does the company employ? - Henzinger: About 300. So far we have roughly doubled our workforce every year. -
Spektrum: How is Google financed? - Henzinger: Mostly through ordinary advertising: one-line ads consisting of plain text only. These ads appear only for query words connected with the product; we call this "keyword targeting". Anyone can also buy an ad online. If you want to wish your wife a happy birthday via Google, you can place an ad that appears only when her name is searched for. Second, through search services. For example, Yahoo pays us to have our results appear on its site. Some companies want to set up a search function on their website but not program it themselves. For these companies we build a dedicated index and use it to answer the queries sent to them. Finally, we have recently started selling our products for in-house use on intranets. With this concept we are one of the few new Internet companies that actually make money. - Spektrum: Are there new projects? - Henzinger: Speech input, for example. The user speaks the question into a microphone and gets the answers on screen, later perhaps spoken as well. Or our News Search. Our machines read daily newspapers and group articles on the same topic from different countries. That is interesting because coverage is usually colored by a national perspective. Regular comparison can broaden your horizons. On google.com, click "News and Resources" and then "Check out the Google news search". Or user interfaces. How do you get users to type in more than two words? The more words they enter, the better we can serve them.
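Henzinger's description of PageRank above (a page's grade grows with the number of pages linking to it and with the grade of those linking pages) can be illustrated with the textbook power-iteration formulation. This is a minimal sketch, not Google's production system: the damping factor 0.85, the fixed iteration count, the even redistribution of rank from pages without outgoing links, and the example graph are all assumptions made for illustration.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a small link graph.

    links maps each page to the pages it links to. Assumptions (not from the
    interview): damping=0.85, a fixed iteration count, and pages without
    outgoing links spreading their rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small base share regardless of its in-links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its own rank on, split across its out-links,
                # so a link from a high-ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: redistribute its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Made-up graph: "a" and "b" both link to "c"; "c" links back to "a".
    ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
    for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In the toy graph, "c" ends up highest because it has two in-links, and "a" outranks "b" because its single in-link comes from the high-ranked "c", which is the "quality of the linking page" effect Henzinger describes.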

Date

31.12.1996 19:29:41
2. 8.2002 14:39:29

Johnson, F.; Rowley, J.; Sbaffi, L.: Exploring information interactions in the context of Google (2016) 0.04

0.035183515 = product of:
  0.08795879 = sum of:
    0.04841807 = weight(_text_:context in 2885) [ClassicSimilarity], result of:
      0.04841807 = score(doc=2885,freq=2.0), product of:
        0.17622331 = queryWeight, product of:
          4.14465 = idf(docFreq=1904, maxDocs=44218)
          0.04251826 = queryNorm
        0.27475408 = fieldWeight in 2885, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.14465 = idf(docFreq=1904, maxDocs=44218)
          0.046875 = fieldNorm(doc=2885)
    0.03954072 = weight(_text_:system in 2885) [ClassicSimilarity], result of:
      0.03954072 = score(doc=2885,freq=4.0), product of:
        0.13391352 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.04251826 = queryNorm
        0.29527056 = fieldWeight in 2885, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.046875 = fieldNorm(doc=2885)
  0.4 = coord(2/5)

Abstract: The study sets out to explore the factors that influence the evaluation of information and the judgments made in the process of finding useful information in web search contexts. Based on a diary study of 2 assigned tasks to search on Google and Google Scholar, factor analysis identified the core constructs of content, relevance, scope, and style, as well as informational and system "ease of use" as influencing the judgment that useful information had been found. Differences were found in the participants' evaluation of information across the search tasks on Google and on Google Scholar when identified by the factors related to both content and ease of use. The findings from this study suggest how searchers might critically evaluate information, and the study identifies a relation between the user's involvement in the information interaction and the influences of the perceived system ease of use and information design.
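The explain trees in this listing follow Lucene's ClassicSimilarity layout: each matched term contributes queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = tf × idf × fieldNorm, with tf = √freq; the sum is then scaled by coord (matched clauses / total clauses). The sketch below recomputes the score of this entry (doc 2885) from its printed inputs. The idf formula 1 + ln(maxDocs / (docFreq + 1)) is the one that reproduces the idf values shown; queryNorm and fieldNorm are copied from the printout rather than recomputed, and the function names are hypothetical.

```python
import math

# Values copied from the explain tree for doc 2885 (Johnson et al.).
QUERY_NORM = 0.04251826
MAX_DOCS = 44218


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1)); reproduces the printed idf values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq: float, doc_freq: int, field_norm: float,
                query_norm: float = QUERY_NORM, max_docs: int = MAX_DOCS) -> float:
    """weight(term) = queryWeight * fieldWeight, as laid out in the explain tree."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                    # tf(freq=2.0) = 1.4142135
    query_weight = idf * query_norm         # idf * queryNorm
    field_weight = tf * idf * field_norm    # tf * idf * fieldNorm
    return query_weight * field_weight


def coord(matched: int, total: int) -> float:
    """Fraction of the query's top-level clauses that the document matched."""
    return matched / total


context = term_weight(freq=2.0, doc_freq=1904, field_norm=0.046875)
system = term_weight(freq=4.0, doc_freq=5152, field_norm=0.046875)
score = (context + system) * coord(2, 5)
print(f"context={context:.8f} system={system:.8f} score={score:.9f}")
```

Tiny differences in the last digits against the printout are expected, since Lucene computes these values in single precision.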

Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.03
```
0.03491371 = product of:
  0.087284274 = sum of:
    0.07768321 = weight(_text_:index in 1605) [ClassicSimilarity], result of:
      0.07768321 = score(doc=1605,freq=6.0), product of:
        0.18579477 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.04251826 = queryNorm
        0.418113 = fieldWeight in 1605, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1605)
    0.009601062 = product of:
      0.028803186 = sum of:
        0.028803186 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
          0.028803186 = score(doc=1605,freq=2.0), product of:
            0.1488917 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04251826 = queryNorm
            0.19345059 = fieldWeight in 1605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1605)
      0.33333334 = coord(1/3)
  0.4 = coord(2/5)
```
Abstract

Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.

Source

Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22
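The abstract reports that the search volume data from the two services were "highly correlated" but does not name the measure. Assuming Pearson's r (an assumption, not stated in the abstract), comparing two aligned volume series can be sketched as below. The numbers are invented for illustration and are not the study's data; Python 3.10+ also provides `statistics.correlation` for the same calculation.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Made-up weekly volumes for one query term; NOT data from the study.
trends_volume = [42, 55, 61, 48, 70, 66]
baidu_volume = [3100, 3900, 4300, 3500, 5200, 4800]
print(round(pearson_r(trends_volume, baidu_volume), 3))
```

Because r is scale-free, the two services can report volumes on entirely different scales (as in the toy numbers) and still be compared directly.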

Markey, K.: Twenty-five years of end-user searching : part 2: future research directions (2007) 0.03
```
0.03477903 = product of:
  0.086947575 = sum of:
    0.040348392 = weight(_text_:context in 443) [ClassicSimilarity], result of:
      0.040348392 = score(doc=443,freq=2.0), product of:
        0.17622331 = queryWeight, product of:
          4.14465 = idf(docFreq=1904, maxDocs=44218)
          0.04251826 = queryNorm
        0.22896172 = fieldWeight in 443, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.14465 = idf(docFreq=1904, maxDocs=44218)
          0.0390625 = fieldNorm(doc=443)
    0.046599183 = weight(_text_:system in 443) [ClassicSimilarity], result of:
      0.046599183 = score(doc=443,freq=8.0), product of:
        0.13391352 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.04251826 = queryNorm
        0.3479797 = fieldWeight in 443, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.0390625 = fieldNorm(doc=443)
  0.4 = coord(2/5)
```
Abstract

This is the second part of a two-part article that examines 25 years of published research findings on end-user searching of online information retrieval (IR) systems. In Part 1, it was learned that people enter a few short search statements into online IR systems. Their searches do not resemble the systematic approach of expert searchers who use the full range of IR-system functionality. Part 2 picks up the discussion of research findings about end-user searching in the context of current information retrieval models. These models demonstrate that information retrieval is a complex event, involving changes in cognition, feelings, and/or events during the information seeking process. The author challenges IR researchers to design new studies of end-user searching, collecting data not only on system-feature use, but on multiple search sessions and controlling for variables such as domain knowledge expertise and expert system knowledge. Because future IR systems designers are likely to improve the functionality of online IR systems in response to answers to the new research questions posed here, the author concludes with advice to these designers about retaining the simplicity of online IR system interfaces.

Brin, S.; Page, L.: The anatomy of a large-scale hypertextual Web search engine (1998) 0.03
```
0.03469106 = product of:
  0.08672766 = sum of:
    0.06342807 = weight(_text_:index in 947) [ClassicSimilarity], result of:
      0.06342807 = score(doc=947,freq=4.0), product of:
        0.18579477 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.04251826 = queryNorm
        0.3413878 = fieldWeight in 947, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.0390625 = fieldNorm(doc=947)
    0.023299592 = weight(_text_:system in 947) [ClassicSimilarity], result of:
      0.023299592 = score(doc=947,freq=2.0), product of:
        0.13391352 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.04251826 = queryNorm
        0.17398985 = fieldWeight in 947, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.0390625 = fieldNorm(doc=947)
  0.4 = coord(2/5)
```
Abstract

In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.

Chang, C.-H.; Hsu, C.-C.: Customizable multi-engine search tool with clustering (1997) 0.03

0.034651764 = product of:
  0.086629406 = sum of:
    0.03261943 = weight(_text_:system in 2670) [ClassicSimilarity], result of:
      0.03261943 = score(doc=2670,freq=2.0), product of:
        0.13391352 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.04251826 = queryNorm
        0.2435858 = fieldWeight in 2670, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2670)
    0.054009974 = product of:
      0.08101496 = sum of:
        0.0406905 = weight(_text_:29 in 2670) [ClassicSimilarity], result of:
          0.0406905 = score(doc=2670,freq=2.0), product of:
            0.14956595 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04251826 = queryNorm
            0.27205724 = fieldWeight in 2670, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2670)
        0.04032446 = weight(_text_:22 in 2670) [ClassicSimilarity], result of:
          0.04032446 = score(doc=2670,freq=2.0), product of:
            0.1488917 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04251826 = queryNorm
            0.2708308 = fieldWeight in 2670, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2670)
      0.6666667 = coord(2/3)
  0.4 = coord(2/5)

Abstract: Proposes a new approach to searching within a multi-engine search architecture to overcome the problems associated with relevance ranking. The approach includes clustering of the search results and extraction of co-occurrence keywords, which, together with the user's feedback, refine the query during the searching process. The system also constructs a concept space that gradually customizes the search tool to the individual user's usage.
Date: 1. 8.1996 22:08:06
Source: Computer networks and ISDN systems. 29(1997) no.8, S.1217-1224
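One ingredient of this approach, pooling results from several engines and extracting keywords that co-occur with the query so the user can pick one to refine the query, might look like the sketch below. It is hypothetical: the paper's actual clustering and concept-space construction are not reproduced, and the stopword list, the "co-occurs with every query term in the same snippet" rule and the first-URL-wins merge are assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to", "with"}


def terms(text: str) -> set[str]:
    """Distinct lower-case word tokens of a snippet."""
    return set(re.findall(r"\w+", text.lower()))


def pool_results(*engine_results: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Merge (url, snippet) lists from several engines, keeping the first copy of each URL."""
    seen: set[str] = set()
    pooled = []
    for results in engine_results:
        for url, snippet in results:
            if url not in seen:
                seen.add(url)
                pooled.append((url, snippet))
    return pooled


def cooccurring_keywords(query: str, snippets: list[str], k: int = 3) -> list[str]:
    """Rank terms that appear together with all query terms in the same snippet."""
    query_terms = terms(query)
    counts: Counter = Counter()
    for snippet in snippets:
        snippet_terms = terms(snippet)
        if query_terms <= snippet_terms:  # snippet mentions every query term
            counts.update(snippet_terms - query_terms - STOPWORDS)
    # Most frequent first; ties broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Made-up results from two hypothetical engines.
engine_a = [("u1", "Jaguar car dealers"), ("u2", "Jaguar car parts")]
engine_b = [("u2", "Jaguar car parts"), ("u3", "Jaguar habitat in the rainforest"),
            ("u4", "Tiger habitat")]
pooled = pool_results(engine_a, engine_b)
print(cooccurring_keywords("jaguar", [snippet for _, snippet in pooled]))
```

Offering the top co-occurring terms as refinements lets the user's choice (say, "car" versus "habitat") disambiguate the query, which is the feedback loop the abstract describes.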

Khare, R.; Cutting, D.; Sitaker, K.; Rifkin, A.: Nutch: a flexible and scalable open-source Web search engine (2004) 0.03

0.032712005 = product of:
  0.08178001 = sum of:
    0.0538205 = weight(_text_:index in 852) [ClassicSimilarity], result of:
      0.0538205 = score(doc=852,freq=2.0), product of:
        0.18579477 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.04251826 = queryNorm
        0.28967714 = fieldWeight in 852, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.046875 = fieldNorm(doc=852)
    0.027959513 = weight(_text_:system in 852) [ClassicSimilarity], result of:
      0.027959513 = score(doc=852,freq=2.0), product of:
        0.13391352 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.04251826 = queryNorm
        0.20878783 = fieldWeight in 852, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.046875 = fieldNorm(doc=852)
  0.4 = coord(2/5)

Abstract: Nutch is an open-source Web search engine that can be used at global, local, and even personal scale. Its initial design goal was to enable a transparent alternative for global Web search in the public interest - one of its signature features is the ability to "explain" its result rankings. Recent work has emphasized how it can also be used for intranets; by local communities with richer data models, such as the Creative Commons metadata-enabled search for licensed content; on a personal scale to index a user's files, email, and web-surfing history; and we also report on several other research projects built on Nutch. In this paper, we present how the architecture of the Nutch system enables it to be more flexible and scalable than other comparable systems today.

Mostafa, J.: Bessere Suchmaschinen für das Web (2006) 0.03
```
0.031206759 = product of:
  0.052011263 = sum of:
    0.017940165 = weight(_text_:index in 4871) [ClassicSimilarity], result of:
      0.017940165 = score(doc=4871,freq=2.0), product of:
        0.18579477 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.04251826 = queryNorm
        0.09655905 = fieldWeight in 4871, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.015625 = fieldNorm(doc=4871)
    0.018639674 = weight(_text_:system in 4871) [ClassicSimilarity], result of:
      0.018639674 = score(doc=4871,freq=8.0), product of:
        0.13391352 = queryWeight, product of:
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.04251826 = queryNorm
        0.13919188 = fieldWeight in 4871, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.1495528 = idf(docFreq=5152, maxDocs=44218)
          0.015625 = fieldNorm(doc=4871)
    0.015431422 = product of:
      0.023147132 = sum of:
        0.011625857 = weight(_text_:29 in 4871) [ClassicSimilarity], result of:
          0.011625857 = score(doc=4871,freq=2.0), product of:
            0.14956595 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04251826 = queryNorm
            0.07773064 = fieldWeight in 4871, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.015625 = fieldNorm(doc=4871)
        0.011521274 = weight(_text_:22 in 4871) [ClassicSimilarity], result of:
          0.011521274 = score(doc=4871,freq=2.0), product of:
            0.1488917 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04251826 = queryNorm
            0.07738023 = fieldWeight in 4871, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.015625 = fieldNorm(doc=4871)
      0.6666667 = coord(2/3)
  0.6 = coord(3/5)
```
Content

At the root of the index tree

In the first step, potentially interesting content is identified and collected continuously. Special programs known as web crawlers can find pages published on the Internet, search them (including the links they contain) and store the collected pages in one place. In the second step, the system records the relevant words on these pages and determines their importance using statistical methods. Third, a highly efficient tree-like data structure is built from the relevant terms, mapping these terms to particular web pages. When a user enters a query, only this tree, also called the index, is searched, not every individual web page. The search starts at the root of the index tree, and at each step a branch of the tree (each containing many terms and their associated web pages) is either followed further or discarded as irrelevant. This shortens search times dramatically. To place the relevant hits (or links) at the top of the result list, the search algorithm draws on various ranking strategies. One widespread method, term frequency, examines how often words occur and computes from this numerical weights that represent the importance of the words in the individual documents. Frequent words (such as "or", "to", "with") that appear in many documents receive much lower weights than words that carry more semantic relevance and occur in comparatively few documents. Web pages can also be indexed according to other strategies. Link analysis, for example, examines web pages according to which other pages they are linked with: it counts how many links point to a page and how many go out from the page itself. Google, for instance, uses this kind of link analysis to optimize its search results.
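The three steps described above (collect pages, weight their words statistically, build an index that maps terms to pages), together with the term-frequency weighting that lowers the weight of words found in many documents, can be sketched as follows. Assumptions: the page texts are already crawled, the tokenizer is naive, the weight is plain tf × ln(N/df) rather than any particular engine's formula, and a sorted term list searched with binary search (`bisect`) stands in for the tree; each comparison discards half of the remaining terms, loosely mirroring how a branch is followed or discarded. `SortedIndex` is a hypothetical name.

```python
import bisect
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; deliberately naive (no stemming, no stopword list)."""
    return re.findall(r"\w+", text.lower())


class SortedIndex:
    """Maps each term to {page: weight}, with terms kept in sorted order."""

    def __init__(self, pages: dict[str, str]):
        # Step 2: count the words on each (already crawled) page.
        term_counts = {url: Counter(tokenize(text)) for url, text in pages.items()}
        n_pages = len(pages)
        # Document frequency: on how many pages does each term occur?
        doc_freq = Counter(term for counts in term_counts.values() for term in counts)
        postings: dict[str, dict[str, float]] = defaultdict(dict)
        for url, counts in term_counts.items():
            for term, freq in counts.items():
                # tf * idf: a term found on every page gets idf = ln(1) = 0.
                postings[term][url] = freq * math.log(n_pages / doc_freq[term])
        # Step 3: store the terms sorted so lookups can use binary search.
        self.terms = sorted(postings)
        self.postings = [postings[term] for term in self.terms]

    def lookup(self, term: str) -> dict[str, float]:
        """Binary search over the sorted terms; each step halves the candidates."""
        i = bisect.bisect_left(self.terms, term)
        if i < len(self.terms) and self.terms[i] == term:
            return self.postings[i]
        return {}

    def search(self, query: str) -> list[tuple[str, float]]:
        """Sum term weights per page; return pages with a positive score, best first."""
        scores: Counter = Counter()
        for term in tokenize(query):
            for url, weight in self.lookup(term).items():
                scores[url] += weight
        ranked = [(url, s) for url, s in scores.items() if s > 0]
        return sorted(ranked, key=lambda item: (-item[1], item[0]))


pages = {
    "p1": "the crawler fetches the page",
    "p2": "the index maps each term",
    "p3": "the tree root of the index",
}
index = SortedIndex(pages)
print(index.search("crawler index"))
print(index.search("the"))
```

Here "the" occurs on every page, so its weight is zero and a query for it alone returns nothing, while the rarer "crawler" outweighs "index", which appears on two of the three pages.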
It took Google six years to establish itself as the leading search engine. Two advantages over the competition contributed most to its success: first, Google can carry out extremely large web-crawling operations; second, its indexing and weighting methods deliver outstanding results. Recently, however, other search engine developers have built several new systems that are similarly powerful or even better in certain respects.
Much digital content cannot be reached by search engines, because the systems that manage it store web pages differently from the way users view them: the current version of a page is generated only by the user's query. Typical web crawlers cannot cope with such pages and cannot index their content. As a result, a large part of the information, estimated at 500 times as much as the conventional Web contains, remains hidden from users. Efforts are now under way, however, to make this "hidden Web" as easy to search as its previously accessible part. To this end, programmers have developed a new kind of software, so-called wrappers. It exploits the fact that information available online contains standardized grammatical structures. Wrappers do their work in many different ways. Some use the ordinary syntax of search queries and the standard formats of online sources to access hidden content. Others use so-called application programming interfaces (APIs), which enable software to carry out standardized operations and commands. One example of a program that can access hidden web content is the "Deep Query Manager" developed by BrightPlanet. This wrapper-based query manager provides portals and search forms for more than 70,000 hidden web sources. If a ranking system uses links or words without taking into account which types of pages are being compared, there is a risk of spoofing: pranksters or wrongdoers deliberately set up web pages with cleverly chosen words to mislead the ranking system. To this day, the query "miserable failure" returns as its first result an official White House web page with President Bush's biography.
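The article says some wrappers reuse a source's ordinary query syntax and its standard result format. A minimal sketch of that kind of wrapper for a hypothetical source follows: the URL `https://example.org/catalog/search`, the `q` parameter and the `<a class="result">` markup are all invented for illustration, and the sample page is parsed from a string so no network access is needed. It is not BrightPlanet's Deep Query Manager, whose internals are not described here.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlencode

# Hypothetical deep-web source: URL, parameter name and markup are invented.
SEARCH_URL = "https://example.org/catalog/search"


def build_query_url(query: str, base_url: str = SEARCH_URL) -> str:
    """Encode the query the way the source's own search form would submit it."""
    return f"{base_url}?{urlencode({'q': query})}"


class ResultLinkParser(HTMLParser):
    """Collects (href, text) for every <a> whose class list contains "result"."""

    def __init__(self) -> None:
        super().__init__()
        self.results: list[tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "a" and "result" in classes:
            self._href = attributes.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only collect text inside a result link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_results(html: str) -> list[tuple[str, str]]:
    """Turn one result page from the source into structured records."""
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    return parser.results


sample_page = (
    '<ul><li><a class="result" href="/record/1">First record</a></li>'
    '<li><a href="/help">Help</a></li>'
    '<li><a class="result item" href="/record/2"> Second record </a></li></ul>'
)
print(build_query_url("deep web"))
print(extract_results(sample_page))
```

A real wrapper would fetch the generated page for each query; keeping URL construction and parsing separate means each source only needs its own template and extraction rule.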
Pre-sorted and presented as a wheel

Instead of simply presenting the weighted result list (which can be manipulated relatively easily by spoofing), some search engines try to find similarities and differences among the web pages that best match the query and present the results divided into groups. These patterns can be words, synonyms or even higher-level subject areas, determined according to special rules. Such systems assign a characteristic term to each group of links found. The user can then refine the search by selecting a subgroup of results. The search engines "Northern Light" (the pioneer in this field) and "Clusty", for example, deliver results ordered into groups (clusters). "Mooter", an innovative search engine that also uses this grouping technique, additionally displays the groups graphically (see graphic at lower left). The system arranges the subgroup buttons in a wheel around a central button that contains all results. Clicking a subgroup button produces lists of relevant links and displays new, related groups. Mooter remembers which subgroups were chosen. The user gets even more precise results by choosing the refinement option, which combines groups selected in earlier searches with the current query. A similar system that also uses visual effects is "Kartoo". It is a so-called meta search engine: it passes user queries on to other search engines and presents the collected results in graphical form. Kartoo extracts a list of key terms from the various web pages and generates a "map" from them, on which important pages are shown as icons (symbols) and relationships between pages are marked with labels and paths.
Each label can be used to refine the search further. Some new computer tools extend search by combing through not only the Web but also the hard disk of one's own computer. At present, separate programs are still needed for this. But Google, for example, recently announced its "Desktop Search", which combines the two functions: the user can specify whether the Internet, the hard disk or both should be searched. The next version of Microsoft Windows (code name "Longhorn") is to have similar capabilities: Longhorn is supposed to support implicit search, in which users can find relevant information without entering specific queries. (This uses techniques developed in another Microsoft project called "Stuff I've seen".) In implicit search, keywords are extracted from text the user has recently worked on or changed on the computer, such as e-mails or Word documents, and used to retrieve information stored on the hard disk. Microsoft may also extend this search function to web pages. In addition, users are to be able to turn text shown on screen into search queries more easily. ...
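The grouping idea described above (give each group of results a characteristic term, then let the user refine the search by picking a group) can be sketched with a deliberately simple heuristic. This is not how Northern Light, Clusty, Mooter or Kartoo actually work, since their methods are not described here. The assumptions are: results are represented by their titles, a label is a non-query, non-stopword term shared by at least `min_support` results, each result joins its best-supported label, and everything else lands in a hypothetical "other" group.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokens(text: str) -> set[str]:
    """Distinct lower-case word tokens of a result title."""
    return set(re.findall(r"\w+", text.lower()))


def label_clusters(query: str, titles: list[str],
                   min_support: int = 2) -> dict[str, list[str]]:
    """Group result titles under a shared characteristic term."""
    query_terms = tokens(query)
    term_sets = [tokens(title) - query_terms - STOPWORDS for title in titles]
    # How many results contain each candidate label term?
    support = Counter(term for terms in term_sets for term in terms)
    clusters: dict[str, list[str]] = defaultdict(list)
    for title, terms in zip(titles, term_sets):
        candidates = [t for t in terms if support[t] >= min_support]
        # Best-supported label wins; ties broken alphabetically.
        label = min(candidates, key=lambda t: (-support[t], t)) if candidates else "other"
        clusters[label].append(title)
    return dict(clusters)


results = ["Jaguar car dealers", "Jaguar car reviews", "Jaguar habitat facts",
           "Jaguar habitat map", "Jaguar logo"]
groups = label_clusters("jaguar", results)
print(sorted(groups))   # the labels a user could click
print(groups["car"])    # refining the search = selecting one subgroup
```

Selecting a group corresponds to the refinement step in the text; a richer system would use synonyms or topic hierarchies instead of single shared words.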

Date

31.12.1996 19:29:41
22. 1.2006 18:34:49
