Search (168 results, page 1 of 9)

Alqaraleh, S.; Ramadan, O.; Salamah, M.: Efficient watcher based web crawler design (2015) 0.14
```
0.13557659 = product of:
  0.33894148 = sum of:
    0.30602702 = weight(_text_:crawler in 1627) [ClassicSimilarity], result of:
      0.30602702 = score(doc=1627,freq=6.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.776313 = fieldWeight in 1627, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1627)
    0.032914463 = weight(_text_:22 in 1627) [ClassicSimilarity], result of:
      0.032914463 = score(doc=1627,freq=2.0), product of:
        0.17014404 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.048587184 = queryNorm
        0.19345059 = fieldWeight in 1627, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1627)
  0.4 = coord(2/5)
```
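The arithmetic in the explain tree above can be retraced by hand. The following is a minimal sketch, not the search engine's own code: it assumes the classic TF-IDF conventions that the printed values are consistent with (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))). The function names are hypothetical; queryNorm, fieldNorm, and the coord factor are copied from the listing rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed idf form; it reproduces 8.113368 for docFreq=35, maxDocs=44218.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # queryWeight = idf * queryNorm
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

def document_score(term_scores, matched, total_clauses):
    # Sum of the matching term scores, scaled by coord(matched/total_clauses).
    return sum(term_scores) * (matched / total_clauses)

# Values copied from the explain output for doc 1627.
MAX_DOCS = 44218
QUERY_NORM = 0.048587184
FIELD_NORM = 0.0390625

crawler = term_score(6.0, 35, MAX_DOCS, QUERY_NORM, FIELD_NORM)
twenty_two = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)
total = document_score([crawler, twenty_two], matched=2, total_clauses=5)

print(f"crawler={crawler:.6f} 22={twenty_two:.6f} total={total:.6f}")
```

The same three functions apply to every entry in the list; only freq, fieldNorm, and the coord fraction change from document to document.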
Abstract

Purpose: The purpose of this paper is to design a watcher-based crawler (WBC) that can crawl static and dynamic web sites and download only the updated and newly added web pages. Design/methodology/approach: In the proposed WBC, a watcher file, which can be uploaded to the web sites' servers, prepares a report that contains the addresses of the updated and newly added web pages. In addition, the WBC is split into five units, each responsible for a specific crawling process. Findings: Several experiments were conducted, and the proposed WBC was observed to increase the number of uniquely visited static and dynamic web sites compared with existing crawling techniques. In addition, the proposed watcher file not only allows the crawlers to visit the updated and newly added web pages, but also solves the crawlers' overlapping and communication problems. Originality/value: The proposed WBC performs all crawling processes in the sense that it detects all updated and newly added pages automatically, without any explicit human intervention and without downloading entire web sites.

Date

20. 1.2015 18:30:22
Lehrke, C.: Architektur von Suchmaschinen : Googles Architektur, insb. Crawler und Indizierer (2005) 0.11
```
0.1131138 = product of:
  0.2827845 = sum of:
    0.24987002 = weight(_text_:crawler in 867) [ClassicSimilarity], result of:
      0.24987002 = score(doc=867,freq=4.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.6338569 = fieldWeight in 867, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.0390625 = fieldNorm(doc=867)
    0.032914463 = weight(_text_:22 in 867) [ClassicSimilarity], result of:
      0.032914463 = score(doc=867,freq=2.0), product of:
        0.17014404 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.048587184 = queryNorm
        0.19345059 = fieldWeight in 867, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.0390625 = fieldNorm(doc=867)
  0.4 = coord(2/5)
```
Abstract

The Internet, with its constantly growing user base and extreme growth, brings many new challenges. Because of this growth, most people rely on search engines to find content on the Internet. Search engines use information retrieval techniques to answer user queries. The problem is that traditional information retrieval (IR) systems were developed for relatively small and coherent document collections. The Internet, by contrast, is subject to constant growth and rapid rates of change, and it is distributed across geographically dispersed computers. For these reasons, the old techniques must be extended, or entirely new IR techniques developed. One search engine that meets these challenges comparatively successfully is Google. The aim of this paper is to show how search engines work, with a focus on Google. Chapter 2 first deals with the structure of search engines in general, in order to establish a basic understanding of the individual components. Building on this, the second part of the chapter gives an overview of Google's architecture. Chapters 3 and 4 take a closer look at the two components crawler and indexer, which are central elements of search engines.

Pages

22 pp.

Thiele, J.: Sie haben 502.456 Treffer! (1999) 0.07

```
0.07067391 = product of:
  0.35336956 = sum of:
    0.35336956 = weight(_text_:crawler in 3868) [ClassicSimilarity], result of:
      0.35336956 = score(doc=3868,freq=2.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.89640903 = fieldWeight in 3868, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.078125 = fieldNorm(doc=3868)
  0.2 = coord(1/5)
```

Object: Crawler

Reinke, S.; Schmidt, M.: Einmal suchen, alles finden : 7 Meta-Suchmaschinen im Test (2001) 0.07

```
0.07067391 = product of:
  0.35336956 = sum of:
    0.35336956 = weight(_text_:crawler in 176) [ClassicSimilarity], result of:
      0.35336956 = score(doc=176,freq=2.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.89640903 = fieldWeight in 176, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.078125 = fieldNorm(doc=176)
  0.2 = coord(1/5)
```

Abstract: Many data seekers expect miracles from meta search engines, or metacrawlers. The crawlers comb through the catalogues of search engines, merge the results, reconcile them, and present them. CHIP tested seven free German-language metacrawlers.

Kwiatkowski, M.; Höhfeld, S.: Thematisches Aufspüren von Web-Dokumenten : eine kritische Betrachtung von Focused Crawling-Strategien (2007) 0.06
```
0.061205406 = product of:
  0.30602702 = sum of:
    0.30602702 = weight(_text_:crawler in 153) [ClassicSimilarity], result of:
      0.30602702 = score(doc=153,freq=6.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.776313 = fieldWeight in 153, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.0390625 = fieldNorm(doc=153)
  0.2 = coord(1/5)
```
Abstract

Conventional search engines serve broad web search and are mostly characterized by the high quantity, not necessarily the quality, of their result sets. To find documents, a general-purpose crawler is used that tracks down web pages in order to build large data stores. Focused crawlers, by contrast, proceed in a more targeted way: rather than searching, storing, and indexing enormous amounts of data, they cover only specific, thematically relevant segments of the World Wide Web. The focused crawler must find the best possible path through the Web in order to carry out knowledge discovery. Areas of the Web that are irrelevant to the topic are left out. This considerably reduces the size of the task and the resources required. The goal is to produce qualified search results for a specific field of knowledge. In general, focused crawling techniques can be used to build specialized vertical search engines. They are also advantageous in the field of digital libraries. Since these often have a thematic focus and serve qualified literature research, they must meet a certain standard of quality while serving only queries on a defined field of knowledge. Focused crawling therefore lends itself to ensuring high document quality in a specific domain. This review article examines fundamental approaches to focused crawling and traces them through to current developments. Practical areas of application and current systems underline the importance of the research field. In addition, the approaches presented are critically assessed.
Fu, T.; Abbasi, A.; Chen, H.: A focused crawler for Dark Web forums (2010) 0.06
```
0.061205406 = product of:
  0.30602702 = sum of:
    0.30602702 = weight(_text_:crawler in 3471) [ClassicSimilarity], result of:
      0.30602702 = score(doc=3471,freq=6.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.776313 = fieldWeight in 3471, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3471)
  0.2 = coord(1/5)
```
Abstract

The unprecedented growth of the Internet has given rise to the Dark Web, the problematic facet of the Web associated with cybercrime, hate, and extremism. Despite the need for tools to collect and analyze Dark Web forums, the covert nature of this part of the Internet makes traditional Web crawling techniques insufficient for capturing such content. In this study, we propose a novel crawling system designed to collect Dark Web forum content. The system uses a human-assisted accessibility approach to gain access to Dark Web forums. Several URL ordering features and techniques enable efficient extraction of forum postings. The system also includes an incremental crawler coupled with a recall-improvement mechanism intended to facilitate enhanced retrieval and updating of collected content. Experiments conducted to evaluate the effectiveness of the human-assisted accessibility approach and the recall-improvement-based, incremental-update procedure yielded favorable results. The human-assisted approach significantly improved access to Dark Web forums while the incremental crawler with recall improvement also outperformed standard periodic- and incremental-update approaches. Using the system, we were able to collect over 100 Dark Web forums from three regions. A case study encompassing link and content analysis of collected forums was used to illustrate the value and importance of gathering and analyzing content from such online communities.
Raeder, A.: Cataloguing the Web (1995) 0.06
```
0.05653913 = product of:
  0.28269565 = sum of:
    0.28269565 = weight(_text_:crawler in 3387) [ClassicSimilarity], result of:
      0.28269565 = score(doc=3387,freq=2.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.7171272 = fieldWeight in 3387, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.0625 = fieldNorm(doc=3387)
  0.2 = coord(1/5)
```
Abstract

Lists and describes sites that attempt to aid Internet searchers by helping them locate sites, files or information. Gives an overview of the methods used. Covers the following sites: Aliweb, ArchiPlex Archie Gateway, CUI W3, Clearing House for Subject Oriented Internet Resource Guide, InfoSeek, JumpStation, Lawrence Livermore National Laboratories List of Lists, Lycos WWW Search Engine, Mother of all BBSs, NIKOS, Planet Earth Home Page, Stanford Netnews Filtering Service, WWW Home Page Harvest Browser, WWW Virtual Library, WWW Wanderer Index, WWW Worm, Web Crawler, Whole Internet Catalog, and Yahoo Index to the Internet.
Blake, P.: AltaVista and Notes for the web (1996) 0.06
```
0.05653913 = product of:
  0.28269565 = sum of:
    0.28269565 = weight(_text_:crawler in 4537) [ClassicSimilarity], result of:
      0.28269565 = score(doc=4537,freq=2.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.7171272 = fieldWeight in 4537, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.0625 = fieldNorm(doc=4537)
  0.2 = coord(1/5)
```
Footnote

Briefly reviews the AltaVista and Notes search software for searching the WWW. In the case of AltaVista, Digital claims that this web crawler has been crawling the WWW at a rate of 2.5 million pages per day and already accounts for the indexing of 16 million pages and 13,000 newsgroups. Suggests that AltaVista pulls up significantly more on obscure or specialist subjects than rivals like InfoSeek and Excite. Concludes with details of IBM's development of the Lotus WWW searcher designed to cope with the increasing complexity of web applications.

Esser, M.: Was Sie über Suchmaschinen wissen sollten (1998) 0.06

```
0.05653913 = product of:
  0.28269565 = sum of:
    0.28269565 = weight(_text_:crawler in 2335) [ClassicSimilarity], result of:
      0.28269565 = score(doc=2335,freq=2.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.7171272 = fieldWeight in 2335, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.0625 = fieldNorm(doc=2335)
  0.2 = coord(1/5)
```

Object: Crawler

Schneller finden! (1999) 0.06

```
0.05653913 = product of:
  0.28269565 = sum of:
    0.28269565 = weight(_text_:crawler in 2929) [ClassicSimilarity], result of:
      0.28269565 = score(doc=2929,freq=2.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.7171272 = fieldWeight in 2929, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.0625 = fieldNorm(doc=2929)
  0.2 = coord(1/5)
```

Object: CRAWLER

Reibold, H.: Findigkeit gefragt (2000) 0.06

```
0.05653913 = product of:
  0.28269565 = sum of:
    0.28269565 = weight(_text_:crawler in 4283) [ClassicSimilarity], result of:
      0.28269565 = score(doc=4283,freq=2.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.7171272 = fieldWeight in 4283, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.0625 = fieldNorm(doc=4283)
  0.2 = coord(1/5)
```

Object: Crawler

Stock, M.; Stock, W.G.: Internet-Suchwerkzeuge im Vergleich : Teil 1: Retrievaltests mit Known Item searches (2000) 0.05

```
0.049471736 = product of:
  0.24735868 = sum of:
    0.24735868 = weight(_text_:crawler in 5772) [ClassicSimilarity], result of:
      0.24735868 = score(doc=5772,freq=2.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.6274863 = fieldWeight in 5772, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5772)
  0.2 = coord(1/5)
```

Object: Web-Crawler

Thelwall, M.; Stuart, D.: Web crawling ethics revisited : cost, privacy, and denial of service (2006) 0.05
```
0.049471736 = product of:
  0.24735868 = sum of:
    0.24735868 = weight(_text_:crawler in 6098) [ClassicSimilarity], result of:
      0.24735868 = score(doc=6098,freq=2.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.6274863 = fieldWeight in 6098, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.0546875 = fieldNorm(doc=6098)
  0.2 = coord(1/5)
```
Abstract

Ethical aspects of the employment of Web crawlers for information science research and other contexts are reviewed. The difference between legal and ethical uses of communications technologies is emphasized as well as the changing boundary between ethical and unethical conduct. A review of the potential impacts on Web site owners is used to underpin a new framework for ethical crawling, and it is argued that delicate human judgment is required for each individual case, with verdicts likely to change over time. Decisions can be based upon an approximate cost-benefit analysis, but it is crucial that crawler owners find out about the technological issues affecting the owners of the sites being crawled in order to produce an informed assessment.

Li, L.; Shang, Y.; Zhang, W.: Improvement of HITS-based algorithms on Web documents 0.05

```
0.0463016 = product of:
  0.23150799 = sum of:
    0.23150799 = weight(_text_:3a in 2514) [ClassicSimilarity], result of:
      0.23150799 = score(doc=2514,freq=2.0), product of:
        0.4119227 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.048587184 = queryNorm
        0.56201804 = fieldWeight in 2514, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=2514)
  0.2 = coord(1/5)
```

Content: Cf.: http%3A%2F%2Fdelab.csd.auth.gr%2F~dimitris%2Fcourses%2Fir_spring06%2Fpage_rank_computing%2Fp527-li.pdf. See also: http://www2002.org/CDROM/refereed/643/.

Venditto, G.: Search engine showdown : IW Labs tests seven Internet search tools (1996) 0.04
```
0.042404346 = product of:
  0.21202172 = sum of:
    0.21202172 = weight(_text_:crawler in 4983) [ClassicSimilarity], result of:
      0.21202172 = score(doc=4983,freq=2.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.5378454 = fieldWeight in 4983, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.046875 = fieldNorm(doc=4983)
  0.2 = coord(1/5)
```
Abstract

Reports results of the IW Labs tests carried out on 7 major WWW search engines available free of charge on the Internet: AltaVista; Excite; InfoSeek; Lycos; Open Text; Web Crawler; and WWW Worm. Notes the differences between these 7 search engines and the Net directories like Yahoo and Magellan, which are essentially registries of web sites based on descriptions submitted by webmasters or written by the directory's staff. The search engines were tested using both simple searches for the name of a celebrity or a popular web site as well as more complex searches. Results indicated that the most relevant results and the most comprehensive results were obtained from InfoSeek and from AltaVista respectively. Excite was almost as good as InfoSeek in finding relevant pages, and users who are comfortable with Boolean searching may prefer OpenText, which excels in this respect.

Dresler, S.; Grosse, A.G.; Rösner, A.: Realisierung und Optimierung der Informationsbeschaffung von Internet-Suchmaschinen am Beispiel von www.crawler.de (1997) 0.04

```
0.042404346 = product of:
  0.21202172 = sum of:
    0.21202172 = weight(_text_:crawler in 716) [ClassicSimilarity], result of:
      0.21202172 = score(doc=716,freq=2.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.5378454 = fieldWeight in 716, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.046875 = fieldNorm(doc=716)
  0.2 = coord(1/5)
```

Object: Crawler

Kaiser, C.: Mit "Neomo" und "Turbo 10" neue Initiativen auf dem deutschen und britischen Suchmarkt (2005) 0.04
```
0.042404346 = product of:
  0.21202172 = sum of:
    0.21202172 = weight(_text_:crawler in 3434) [ClassicSimilarity], result of:
      0.21202172 = score(doc=3434,freq=2.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.5378454 = fieldWeight in 3434, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.046875 = fieldNorm(doc=3434)
  0.2 = coord(1/5)
```
Abstract

"Search Engine Strategies Conference" (SES) in Munich with 160 participants. The speakers were, as a rule, very competent, and the audience appeared to be well informed in advance. Nevertheless, some talks could have done with more content and expertise, for example the talk by Google. The planned session "Meet the Crawlers" unfortunately did not take place. There are now other interesting conferences in Europe dealing with search engine marketing and optimization, such as the "Suchmaschinenmarketingseminar" in Heidelberg in November 2004, which was poorly attended but offered highly interesting specialist talks and discussion forums. The SES has so far been regarded as the most important industry event for search engine marketing and optimization worldwide. It brings together website providers, search engine marketing agencies, and search engine operators. Besides general insights into current developments in the industry, the SES offers information on topics such as dynamic websites, website structure, linking, and keyword analysis. New topics were "local search", current developments in the German search market, and trademark law problems. In the "Website Clinic" sessions, website providers could have their sites reviewed by experts and obtain practical tips for improving their ranking.
Ding, L.; Finin, T.; Joshi, A.; Peng, Y.; Cost, R.S.; Sachs, J.; Pan, R.; Reddivari, P.; Doshi, V.: Swoogle : a Semantic Web search and metadata engine (2004) 0.04
```
0.042404346 = product of:
  0.21202172 = sum of:
    0.21202172 = weight(_text_:crawler in 4704) [ClassicSimilarity], result of:
      0.21202172 = score(doc=4704,freq=2.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.5378454 = fieldWeight in 4704, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.046875 = fieldNorm(doc=4704)
  0.2 = coord(1/5)
```
Abstract

Swoogle is a crawler-based indexing and retrieval system for the Semantic Web, i.e., for Web documents in RDF or OWL. It extracts metadata for each discovered document, and computes relations between documents. Discovered documents are also indexed by an information retrieval system which can use either character N-Gram or URIrefs as keywords to find relevant documents and to compute the similarity among a set of documents. One of the interesting properties we compute is rank, a measure of the importance of a Semantic Web document.
Becker, A.: Neue Suchmaschinen für Fortgeschrittene : Neue Such-Angebote: Die fünf Top-Newcomer im Überblick (2000) 0.04
```
0.035336956 = product of:
  0.17668478 = sum of:
    0.17668478 = weight(_text_:crawler in 1526) [ClassicSimilarity], result of:
      0.17668478 = score(doc=1526,freq=2.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.44820452 = fieldWeight in 1526, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1526)
  0.2 = coord(1/5)
```
Content

Kart00.com - Instead of presenting its results traditionally as a list, this meta searcher draws a results map. Advantage: the visual presentation gives a convincing overview of the topic. Teoma.com - The engine searches using three different methods: full-text search, expert pages, and keywords. Advantage: thanks to the innovative 3D search, Teoma achieves remarkable results for specialized queries. Wondir.com - For every query, Wondir provides answers on five levels, from a hit list to e-mail contact with an expert. Advantage: ideal for complicated and scientific topics. Turb10.com - The new British meta crawler combs through both the regular Web and the Deep Web at the same time. Advantage: thanks to Turb10.com, nobody needs special add-on programs for Deep Web searches any more. Hotbot.com - The former full-text service now focuses on service. Four top services (including Google and FAST) can be queried from its start page. Advantage: Hotbot offers four top services with one click.
Jörn, F.: Wie Google für uns nach der ominösen Gluonenkraft stöbert : Software-Krabbler machen sich vor der Anfrage auf die Suche - Das Netz ist etwa fünfhundertmal größer als alles Durchforschte (2001) 0.03
```
0.03353588 = product of:
  0.0838397 = sum of:
    0.07067391 = weight(_text_:crawler in 3684) [ClassicSimilarity], result of:
      0.07067391 = score(doc=3684,freq=2.0), product of:
        0.39420572 = queryWeight, product of:
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.048587184 = queryNorm
        0.1792818 = fieldWeight in 3684, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.113368 = idf(docFreq=35, maxDocs=44218)
          0.015625 = fieldNorm(doc=3684)
    0.013165785 = weight(_text_:22 in 3684) [ClassicSimilarity], result of:
      0.013165785 = score(doc=3684,freq=2.0), product of:
        0.17014404 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.048587184 = queryNorm
        0.07738023 = fieldWeight in 3684, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.015625 = fieldNorm(doc=3684)
  0.4 = coord(2/5)
```
Abstract

The remaining storage is needed for the URL address database that controls the crawlers, and as a buffer for freshly fetched documents waiting there to be indexed. Infoseek, which serves T-Online and others, receives two million queries a day; the peak search period is from 8 to 11 p.m. And yes, the most popular search term is still sex. Let us go in search of the rare. In the international race for the most far-reaching exploration of the Net, the search engine Google (www.Google.com, "search 1.346.966.000 web pages") currently leads with over 700 million indexed, and in part even stored, pages, especially since its link technique makes it aware of almost 700 million further pages. Google receives 70 million queries a day. In second place, with just under 600 million pages, comes Fast, known as "Alltheweb" (www.alltheweb.com), followed roughly level, with over 500 million pages each, by the old-timer Altavista (www.altavista.com), Inktomi, and Webtop (www.webtop.com). Inktomi supplies its results to others, first to Hotbot, then to Microsoft (www.msn.com), and until July 2000 also to Yahoo (www.yahoo.com). Yahoo, born in 1994, is the oldest and still a very popular search engine, not because it could deliver exotica such as "Gluonenkraft", but because around 150 cataloguers (human beings!) look after the keywords there. Only when they have found nothing are external results fed in, now from Google. The situation is similar at Look Smart (www.looksmart.com), which is backed by Inktomi. In stubborn cases, one should use super search engines, so-called meta crawlers such as www.ixquick.com or, closer to home, www.metager.de, which automatically try to track down the entered term in several search engines (not in Google). Most searches, however, are not about rare terms. Of the 75 million expressions that Altavista once counted, it is usually the trivial ones that are searched for.
The database size of the search engine is then irrelevant. Moreover, much content appears on the Net several times over, and the searcher does not want to be shown the same thing five times. With what are usually far too many hits, the real question is the order in which they are displayed. Attempts are made to sort by how often the word occurs in the text, or by whether it appears in the title and closer to the beginning of the text. The search engines themselves explain a little of this, at the same time as a call to web designers to make simple pages and to keep them short and, if possible, free of frames. Specifically for the search engines, most web pages carry keywords in their header, visible to anyone in the page's source text. Web pages can even refuse "robots" altogether. In the search engines' editorial offices, the output for many terms is fixed manually, and sometimes good "placement" is already being paid for, which is certainly questionable. For the newcomer Google, Sergey Brin and Larry Page came up with something special in 1998: pages are rated by popularity, which depends on how many (popular) pages link to the page in question. That works well for classic content. News that nobody points to yet is not found this way. For general questions, the solution comes not from large machines but from specialized information services that proceed by subject area.

Date

22. 6.2005 9:52:00
