Search (3676 results, page 1 of 184)

  1. Alqaraleh, S.; Ramadan, O.; Salamah, M.: Efficient watcher based web crawler design (2015) 0.14
    0.13557659 = product of:
      0.33894148 = sum of:
        0.30602702 = weight(_text_:crawler in 1627) [ClassicSimilarity], result of:
          0.30602702 = score(doc=1627,freq=6.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.776313 = fieldWeight in 1627, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1627)
        0.032914463 = weight(_text_:22 in 1627) [ClassicSimilarity], result of:
          0.032914463 = score(doc=1627,freq=2.0), product of:
            0.17014404 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.048587184 = queryNorm
            0.19345059 = fieldWeight in 1627, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1627)
      0.4 = coord(2/5)
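    The indented breakdown above (and in the entries that follow) is Lucene ClassicSimilarity "explain" output: for each matching term, fieldWeight = tf(freq) x idf x fieldNorm and queryWeight = idf x queryNorm; the term's contribution is fieldWeight x queryWeight, and the document score is the coordination factor (matching terms / query terms) times the sum of the contributions. A minimal sketch in Python that reproduces the first entry's score from the printed factors (up to the rounding of those factors):

      import math

      def term_score(freq, idf, field_norm, query_norm):
          tf = math.sqrt(freq)                  # ClassicSimilarity: tf = sqrt(freq)
          field_weight = tf * idf * field_norm  # 0.776313 for "crawler"
          query_weight = idf * query_norm       # 0.39420572 for "crawler"
          return field_weight * query_weight

      QUERY_NORM = 0.048587184
      crawler = term_score(6.0, 8.113368, 0.0390625, QUERY_NORM)   # ~0.306027
      term_22 = term_score(2.0, 3.5018296, 0.0390625, QUERY_NORM)  # ~0.032914
      coord = 2 / 5  # 2 of 5 query terms matched
      print(coord * (crawler + term_22))  # ~0.13557659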
    
    Abstract
    Purpose: The purpose of this paper is to design a watcher-based crawler (WBC) that can crawl both static and dynamic web sites and download only the updated and newly added web pages.
    Design/methodology/approach: In the proposed WBC, a watcher file, which can be uploaded to the web sites' servers, prepares a report that contains the addresses of the updated and newly added web pages. In addition, the WBC is split into five units, each responsible for a specific crawling process.
    Findings: Several experiments have been conducted, and the proposed WBC increases the number of uniquely visited static and dynamic web sites compared with existing crawling techniques. In addition, the proposed watcher file not only allows the crawlers to visit the updated and newly added pages, but also solves the crawlers' overlap and communication problems.
    Originality/value: The proposed WBC performs all crawling processes itself, detecting all updated and newly added pages automatically, without explicit human intervention and without downloading entire web sites.
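    The update mechanism described above, a server-side report of changed URLs that the crawler fetches instead of re-crawling the whole site, can be sketched roughly as follows. The report location and JSON layout are hypothetical; the abstract does not specify the watcher file's actual format.

      import json
      import urllib.request

      def fetch_watcher_report(site_root):
          # Hypothetical location and format of the watcher file's report.
          with urllib.request.urlopen(site_root + "/watcher-report.json") as resp:
              return json.load(resp)  # e.g. {"updated": [...], "added": [...]}

      def incremental_crawl(site_root, store):
          report = fetch_watcher_report(site_root)
          for url in report.get("updated", []) + report.get("added", []):
              with urllib.request.urlopen(url) as resp:
                  store[url] = resp.read()  # fetch only changed or new pages

    Because each site publishes its own report, crawler instances can divide sites among themselves without re-visiting each other's pages, which is one way to read the overlap and communication claim above.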
    Date
    20. 1.2015 18:30:22
  2. Lehrke, C.: Architektur von Suchmaschinen : Googles Architektur, insb. Crawler und Indizierer (2005) 0.11
    0.1131138 = product of:
      0.2827845 = sum of:
        0.24987002 = weight(_text_:crawler in 867) [ClassicSimilarity], result of:
          0.24987002 = score(doc=867,freq=4.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.6338569 = fieldWeight in 867, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0390625 = fieldNorm(doc=867)
        0.032914463 = weight(_text_:22 in 867) [ClassicSimilarity], result of:
          0.032914463 = score(doc=867,freq=2.0), product of:
            0.17014404 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.048587184 = queryNorm
            0.19345059 = fieldWeight in 867, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=867)
      0.4 = coord(2/5)
    
    Abstract
    The Internet, with its constant influx of new users and its extreme growth, brings many new challenges. Because of this growth, most people rely on search engines to find content on the Internet. To answer user queries, search engines employ information retrieval techniques. The problem is that traditional information retrieval (IR) systems were developed for relatively small, coherent document collections, whereas the Internet grows continuously, changes rapidly, and is spread across geographically distributed computers. For these reasons, the old techniques must be extended or entirely new IR techniques developed. One search engine that has met these challenges comparatively successfully is Google. The aim of this paper is to show how search engines work, with a focus on Google. Chapter 2 first deals with the architecture of search engines in general, to establish a basic understanding of the individual components, and then gives an overview of Google's architecture. Chapters 3 and 4 look more closely at the two components crawler and indexer, which are central elements of a search engine.
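    As a rough illustration of the two components the paper concentrates on, here is a generic crawler feeding an inverted-index indexer in Python. This is a didactic sketch, not Google's design: no politeness rules, ranking, or persistence.

      from collections import defaultdict, deque
      from html.parser import HTMLParser
      import urllib.parse
      import urllib.request

      class LinkParser(HTMLParser):
          def __init__(self):
              super().__init__()
              self.links, self.text = [], []
          def handle_starttag(self, tag, attrs):
              if tag == "a":
                  self.links += [v for k, v in attrs if k == "href" and v]
          def handle_data(self, data):
              self.text.append(data)

      def crawl_and_index(seed, limit=50):
          frontier, seen = deque([seed]), {seed}
          index = defaultdict(set)  # term -> URLs containing it
          while frontier and len(seen) <= limit:
              url = frontier.popleft()
              try:
                  html = urllib.request.urlopen(url, timeout=5).read().decode("utf-8", "ignore")
              except (OSError, ValueError):
                  continue
              parser = LinkParser()
              parser.feed(html)
              for term in " ".join(parser.text).lower().split():
                  index[term].add(url)      # indexer: build the inverted index
              for link in parser.links:     # crawler: expand the frontier
                  absolute = urllib.parse.urljoin(url, link)
                  if absolute not in seen:
                      seen.add(absolute)
                      frontier.append(absolute)
          return index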
    Pages
    22 p.
  3. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.11
    0.10840213 = product of:
      0.27100533 = sum of:
        0.23150799 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.23150799 = score(doc=562,freq=2.0), product of:
            0.4119227 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.048587184 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.039497353 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.039497353 = score(doc=562,freq=2.0), product of:
            0.17014404 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.048587184 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.4 = coord(2/5)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  4. Verwer, K.: Freiheit und Verantwortung bei Hans Jonas (2011) 0.09
    0.0926032 = product of:
      0.46301597 = sum of:
        0.46301597 = weight(_text_:3a in 973) [ClassicSimilarity], result of:
          0.46301597 = score(doc=973,freq=2.0), product of:
            0.4119227 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.048587184 = queryNorm
            1.1240361 = fieldWeight in 973, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.09375 = fieldNorm(doc=973)
      0.2 = coord(1/5)
    
    Content
    Cf.: http://creativechoice.org/doc/HansJonas.pdf.
  5. Fachsystematik Bremen nebst Schlüssel 1970 ff. (1970 ff) 0.09
    0.090335116 = product of:
      0.22583778 = sum of:
        0.19292332 = weight(_text_:3a in 3577) [ClassicSimilarity], result of:
          0.19292332 = score(doc=3577,freq=2.0), product of:
            0.4119227 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.048587184 = queryNorm
            0.46834838 = fieldWeight in 3577, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3577)
        0.032914463 = weight(_text_:22 in 3577) [ClassicSimilarity], result of:
          0.032914463 = score(doc=3577,freq=2.0), product of:
            0.17014404 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.048587184 = queryNorm
            0.19345059 = fieldWeight in 3577, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3577)
      0.4 = coord(2/5)
    
    Content
    1. Agrarwissenschaften 1981. - 3. Allgemeine Geographie 2.1972. - 3a. Allgemeine Naturwissenschaften 1.1973. - 4. Allgemeine Sprachwissenschaft, Allgemeine Literaturwissenschaft 2.1971. - 6. Allgemeines. 5.1983. - 7. Anglistik 3.1976. - 8. Astronomie, Geodäsie 4.1977. - 12. bio Biologie, bcp Biochemie-Biophysik, bot Botanik, zoo Zoologie 1981. - 13. Bremensien 3.1983. - 13a. Buch- und Bibliothekswesen 3.1975. - 14. Chemie 4.1977. - 14a. Elektrotechnik 1974. - 15. Ethnologie 2.1976. - 16,1. Geowissenschaften. Sachteil 3.1977. - 16,2. Geowissenschaften. Regionaler Teil 3.1977. - 17. Germanistik 6.1984. - 17a,1. Geschichte. Teilsystematik hil. - 17a,2. Geschichte. Teilsystematik his Neuere Geschichte. - 17a,3. Geschichte. Teilsystematik hit Neueste Geschichte. - 18. Humanbiologie 2.1983. - 19. Ingenieurwissenschaften 1974. - 20. siehe 14a. - 21. Klassische Philologie 3.1977. - 22. Klinische Medizin 1975. - 23. Kunstgeschichte 2.1971. - 24. Kybernetik. 2.1975. - 25. Mathematik 3.1974. - 26. Medizin 1976. - 26a. Militärwissenschaft 1985. - 27. Musikwissenschaft 1978. - 27a. Noten 2.1974. - 28. Ozeanographie 3.1977. - 29. Pädagogik 8.1985. - 30. Philosophie 3.1974. - 31. Physik 3.1974. - 33. Politik, Politische Wissenschaft, Sozialwissenschaft. Soziologie. Länderschlüssel. Register 1981. - 34. Psychologie 2.1972. - 35. Publizistik und Kommunikationswissenschaft 1985. - 36. Rechtswissenschaften 1986. - 37. Regionale Geographie 3.1975. - 37a. Religionswissenschaft 1970. - 38. Romanistik 3.1976. - 39. Skandinavistik 4.1985. - 40. Slavistik 1977. - 40a. Sonstige Sprachen und Literaturen 1973. - 43. Sport 4.1983. - 44. Theaterwissenschaft 1985. - 45. Theologie 2.1976. - 45a. Ur- und Frühgeschichte, Archäologie 1970. - 47. Volkskunde 1976. - 47a. Wirtschaftswissenschaften 1971 // Schlüssel: 1. Länderschlüssel 1971. - 2. Formenschlüssel (Kurzform) 1974. - 3. Personenschlüssel Literatur 5. Fassung 1968
  6. Kleineberg, M.: Context analysis and context indexing : formal pragmatics in knowledge organization (2014) 0.08
    0.07716933 = product of:
      0.38584664 = sum of:
        0.38584664 = weight(_text_:3a in 1826) [ClassicSimilarity], result of:
          0.38584664 = score(doc=1826,freq=2.0), product of:
            0.4119227 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.048587184 = queryNorm
            0.93669677 = fieldWeight in 1826, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.078125 = fieldNorm(doc=1826)
      0.2 = coord(1/5)
    
    Source
    http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/3131107
  7. Thiele, J.: Sie haben 502.456 Treffer! (1999) 0.07
    0.07067391 = product of:
      0.35336956 = sum of:
        0.35336956 = weight(_text_:crawler in 3868) [ClassicSimilarity], result of:
          0.35336956 = score(doc=3868,freq=2.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.89640903 = fieldWeight in 3868, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.078125 = fieldNorm(doc=3868)
      0.2 = coord(1/5)
    
    Object
    Crawler
  8. Reinke, S.; Schmidt, M.: Einmal suchen, alles finden : 7 Meta-Suchmaschinen im Test (2001) 0.07
    0.07067391 = product of:
      0.35336956 = sum of:
        0.35336956 = weight(_text_:crawler in 176) [ClassicSimilarity], result of:
          0.35336956 = score(doc=176,freq=2.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.89640903 = fieldWeight in 176, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.078125 = fieldNorm(doc=176)
      0.2 = coord(1/5)
    
    Abstract
    Many searchers expect miracles from metasearch engines, also called metacrawlers. These crawlers search the catalogs of several search engines, merge the results, deduplicate them, and present them. CHIP tested seven free German-language metacrawlers.
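    The merge step described above reduces to interleaving several ranked lists and dropping duplicates. A minimal sketch; the engine-specific query functions are assumed to exist and are hypothetical:

      def metasearch(query, engines):
          # engines: callables, each returning a ranked list of result URLs
          ranked = [engine(query) for engine in engines]
          merged, seen = [], set()
          for rank in range(max(map(len, ranked), default=0)):
              for results in ranked:  # round-robin across the engines
                  if rank < len(results) and results[rank] not in seen:
                      seen.add(results[rank])  # deduplicate overlapping hits
                      merged.append(results[rank])
          return merged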
  9. Schrodt, R.: Tiefen und Untiefen im wissenschaftlichen Sprachgebrauch (2008) 0.06
    0.061735462 = product of:
      0.30867732 = sum of:
        0.30867732 = weight(_text_:3a in 140) [ClassicSimilarity], result of:
          0.30867732 = score(doc=140,freq=2.0), product of:
            0.4119227 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.048587184 = queryNorm
            0.7493574 = fieldWeight in 140, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0625 = fieldNorm(doc=140)
      0.2 = coord(1/5)
    
    Content
    See also: https://studylibde.com/doc/13053640/richard-schrodt. See also: http://www.univie.ac.at/Germanistik/schrodt/vorlesung/wissenschaftssprache.doc.
  10. Popper, K.R.: Three worlds : the Tanner lecture on human values. Deliverd at the University of Michigan, April 7, 1978 (1978) 0.06
    0.061735462 = product of:
      0.30867732 = sum of:
        0.30867732 = weight(_text_:3a in 230) [ClassicSimilarity], result of:
          0.30867732 = score(doc=230,freq=2.0), product of:
            0.4119227 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.048587184 = queryNorm
            0.7493574 = fieldWeight in 230, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0625 = fieldNorm(doc=230)
      0.2 = coord(1/5)
    
    Source
    https://tannerlectures.utah.edu/_documents/a-to-z/p/popper80.pdf
  11. Thelwall, M.: Results from a web impact factor crawler (2001) 0.06
    0.061205406 = product of:
      0.30602702 = sum of:
        0.30602702 = weight(_text_:crawler in 4490) [ClassicSimilarity], result of:
          0.30602702 = score(doc=4490,freq=6.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.776313 = fieldWeight in 4490, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4490)
      0.2 = coord(1/5)
    
    Abstract
    Web impact factors, the proposed web equivalent of impact factors for journals, can be calculated by using search engines. It has been found that the results are problematic because of the variable coverage of search engines as well as their ability to give significantly different results over short periods of time. The fundamental problem is that although some search engines provide a functionality that is capable of being used for impact calculations, this is not their primary task and therefore they do not give guarantees as to performance in this respect. In this paper, a bespoke web crawler designed specifically for the calculation of reliable WIFs is presented. This crawler was used to calculate WIFs for a number of UK universities, and the results of these calculations are discussed. The principal findings were that with certain restrictions, WIFs can be calculated reliably, but do not correlate with accepted research rankings owing to the variety of material hosted on university servers. Changes to the calculations to improve the fit of the results to research rankings are proposed, but there are still inherent problems undermining the reliability of the calculation. These problems still apply if the WIF scores are taken on their own as indicators of the general impact of any area of the Internet, but with care would not apply to online journals.
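    A web impact factor in this tradition is commonly computed as the number of external pages linking to a site divided by the number of pages the site contains (Ingwersen's formulation; the abstract does not spell out which variant the crawler uses, so treat this as an assumption):

      def web_impact_factor(external_inlinks, site_pages):
          # WIF = pages linking to the site / pages in the site
          # (one common variant; definitions differ in what they count)
          return external_inlinks / site_pages if site_pages else 0.0

      # Illustrative numbers only: 3,000 linking pages, 12,000 site pages.
      print(web_impact_factor(3000, 12000))  # 0.25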
  12. Kwiatkowski, M.; Höhfeld, S.: Thematisches Aufspüren von Web-Dokumenten : eine kritische Betrachtung von Focused Crawling-Strategien (2007) 0.06
    0.061205406 = product of:
      0.30602702 = sum of:
        0.30602702 = weight(_text_:crawler in 153) [ClassicSimilarity], result of:
          0.30602702 = score(doc=153,freq=6.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.776313 = fieldWeight in 153, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0390625 = fieldNorm(doc=153)
      0.2 = coord(1/5)
    
    Abstract
    Conventional search engines serve broad web search and are usually characterized by the high quantity, not necessarily the quality, of their result sets. To find documents, a general crawler is deployed that discovers web pages in order to build large data stores. Focused crawlers proceed more selectively: rather than searching, storing, and indexing enormous amounts of data, they target only specific, topically relevant segments of the World Wide Web. A focused crawler must find an optimal path through the web to perform knowledge discovery, leaving the parts of the web that are irrelevant to the topic unexamined. This considerably reduces the task and the resources required. The goal is to produce qualified search results for a particular field of knowledge. In general, focused-crawling techniques can be used to build specialized vertical search engines. They are also advantageous for digital libraries: since these often have a thematic focus and serve qualified literature research, they must meet a certain quality standard while serving queries only within a defined domain. Focused crawling therefore suggests itself as a way to ensure high document quality in a specific domain. This review article examines fundamental approaches to focused crawling and traces them through to current developments. Practical application areas and current systems underline the importance of this research field. The approaches presented are also assessed critically.
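    The core idea, following only links that look topically promising, amounts to a best-first search over the link graph. A minimal sketch; the relevance function and the 0.5 threshold are placeholders, where real systems use trained classifiers:

      import heapq

      def focused_crawl(seeds, relevance, fetch, extract_links, budget=1000):
          # Best-first: most promising URL first (heapq is a min-heap,
          # so scores are negated).
          frontier = [(-relevance(url), url) for url in seeds]
          heapq.heapify(frontier)
          seen = {url for _, url in frontier}
          collected = []
          while frontier and len(collected) < budget:
              neg_score, url = heapq.heappop(frontier)
              page = fetch(url)
              if page is None:
                  continue
              if -neg_score >= 0.5:  # keep only topically relevant pages
                  collected.append(url)
              for link in extract_links(page):
                  if link not in seen:
                      seen.add(link)
                      heapq.heappush(frontier, (-relevance(link), link))
          return collected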
  13. Fu, T.; Abbasi, A.; Chen, H.: ¬A focused crawler for Dark Web forums (2010) 0.06
    0.061205406 = product of:
      0.30602702 = sum of:
        0.30602702 = weight(_text_:crawler in 3471) [ClassicSimilarity], result of:
          0.30602702 = score(doc=3471,freq=6.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.776313 = fieldWeight in 3471, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3471)
      0.2 = coord(1/5)
    
    Abstract
    The unprecedented growth of the Internet has given rise to the Dark Web, the problematic facet of the Web associated with cybercrime, hate, and extremism. Despite the need for tools to collect and analyze Dark Web forums, the covert nature of this part of the Internet makes traditional Web crawling techniques insufficient for capturing such content. In this study, we propose a novel crawling system designed to collect Dark Web forum content. The system uses a human-assisted accessibility approach to gain access to Dark Web forums. Several URL ordering features and techniques enable efficient extraction of forum postings. The system also includes an incremental crawler coupled with a recall-improvement mechanism intended to facilitate enhanced retrieval and updating of collected content. Experiments conducted to evaluate the effectiveness of the human-assisted accessibility approach and the recall-improvement-based, incremental-update procedure yielded favorable results. The human-assisted approach significantly improved access to Dark Web forums while the incremental crawler with recall improvement also outperformed standard periodic- and incremental-update approaches. Using the system, we were able to collect over 100 Dark Web forums from three regions. A case study encompassing link and content analysis of collected forums was used to illustrate the value and importance of gathering and analyzing content from such online communities.
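    The incremental-update idea, revisiting pages on a schedule informed by how often they changed before, can be sketched as below. This is a simplification under stated assumptions: the record fields are hypothetical, and the paper's recall-improvement mechanism and human-assisted access are not reproduced.

      import hashlib
      import time

      def revisit_due(record, now, base_interval=3600.0):
          # Revisit sooner the more often a page has changed in the past.
          return now - record["last_visit"] >= base_interval / (1 + record["changes"])

      def incremental_update(store, fetch, now=None):
          # store: {url: {"digest": str, "changes": int, "last_visit": float}}
          now = now if now is not None else time.time()
          for url, record in store.items():
              if not revisit_due(record, now):
                  continue
              digest = hashlib.sha256(fetch(url)).hexdigest()  # fetch returns bytes
              if digest != record["digest"]:  # content changed since last visit
                  record["changes"] += 1
                  record["digest"] = digest
              record["last_visit"] = now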
  14. Jung, J.J.: Contextualized query sampling to discover semantic resource descriptions on the web (2009) 0.06
    0.0599688 = product of:
      0.299844 = sum of:
        0.299844 = weight(_text_:crawler in 4216) [ClassicSimilarity], result of:
          0.299844 = score(doc=4216,freq=4.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.7606282 = fieldWeight in 4216, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.046875 = fieldNorm(doc=4216)
      0.2 = coord(1/5)
    
    Abstract
    A resource description extracted by the query-sampling method can be applied to determine which database sources a given query should be sent to first. In this paper, we propose a contextualized query-sampling method that extracts the resources most relevant to the up-to-date context. Practically, the proposed approach is applied to personal crawler systems (so-called focused crawlers), which can support the corresponding user's web navigation tasks in real time. By taking the user context (e.g., intentions or interests) into account, the crawler can build queries to evaluate candidate information sources. As a result, we can discover semantic associations (i) between the user context and the sources, and (ii) between all pairs of sources. These associations are used to rank the sources and to transform the queries for the other sources. To evaluate the performance of contextualized query sampling on 53 information sources, we compared the ranking lists recommended by the proposed method with user feedback (i.e., ideal ranks), and also computed the precision of the discovered subsumptions as semantic associations between the sources.
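    Ranking candidate sources against the user context is, at bottom, a similarity computation. A minimal sketch using cosine similarity over term-frequency vectors; the paper's actual association measures are richer than this:

      import math
      from collections import Counter

      def cosine(a, b):
          dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
          norm = math.sqrt(sum(v * v for v in a.values())) \
                 * math.sqrt(sum(v * v for v in b.values()))
          return dot / norm if norm else 0.0

      def rank_sources(context_terms, source_samples):
          # source_samples: {source_name: terms sampled from that source}
          ctx = Counter(context_terms)
          scores = {name: cosine(ctx, Counter(terms))
                    for name, terms in source_samples.items()}
          return sorted(scores, key=scores.get, reverse=True)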
  15. Raeder, A.: Cataloguing the Web (1995) 0.06
    0.05653913 = product of:
      0.28269565 = sum of:
        0.28269565 = weight(_text_:crawler in 3387) [ClassicSimilarity], result of:
          0.28269565 = score(doc=3387,freq=2.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.7171272 = fieldWeight in 3387, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0625 = fieldNorm(doc=3387)
      0.2 = coord(1/5)
    
    Abstract
    Lists and describes sites that attempt to aid Internet searchers by helping them locate sites, files or information. Gives an overview of the methods used. Covers the following sites: Aliweb, ArchiePlex Archie Gateway, CUI W3, Clearinghouse for Subject-Oriented Internet Resource Guides, InfoSeek, JumpStation, Lawrence Livermore National Laboratories List of Lists, Lycos WWW Search Engine, Mother of all BBSs, NIKOS, Planet Earth Home Page, Stanford Netnews Filtering Service, WWW Home Page Harvest Browser, WWW Virtual Library, WWW Wanderer Index, WWW Worm, WebCrawler, Whole Internet Catalog, and Yahoo Index to the Internet
  16. Blake, P.: AltaVista and Notes for the web (1996) 0.06
    0.05653913 = product of:
      0.28269565 = sum of:
        0.28269565 = weight(_text_:crawler in 4537) [ClassicSimilarity], result of:
          0.28269565 = score(doc=4537,freq=2.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.7171272 = fieldWeight in 4537, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0625 = fieldNorm(doc=4537)
      0.2 = coord(1/5)
    
    Footnote
    Briefly reviews the AltaVista and Notes search software for searching the WWW. In the case of AltaVista, Digital claims that this web crawler has been crawling the WWW at a rate of 2.5 million pages per day and already accounts for the indexing of 16 million pages and 13,000 newsgroups. Suggests that AltaVista turns up significantly more on obscure or specialist subjects than rivals like InfoSeek and Excite. Concludes with details of IBM's development of the Lotus WWW searcher, designed to cope with the increasing complexity of web applications.
  17. Esser, M.: Was Sie über Suchmaschinen wissen sollten (1998) 0.06
    0.05653913 = product of:
      0.28269565 = sum of:
        0.28269565 = weight(_text_:crawler in 2335) [ClassicSimilarity], result of:
          0.28269565 = score(doc=2335,freq=2.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.7171272 = fieldWeight in 2335, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0625 = fieldNorm(doc=2335)
      0.2 = coord(1/5)
    
    Object
    Crawler
  18. Schneller finden! (1999) 0.06
    0.05653913 = product of:
      0.28269565 = sum of:
        0.28269565 = weight(_text_:crawler in 2929) [ClassicSimilarity], result of:
          0.28269565 = score(doc=2929,freq=2.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.7171272 = fieldWeight in 2929, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0625 = fieldNorm(doc=2929)
      0.2 = coord(1/5)
    
    Object
    CRAWLER
  19. Reibold, H.: Findigkeit gefragt (2000) 0.06
    0.05653913 = product of:
      0.28269565 = sum of:
        0.28269565 = weight(_text_:crawler in 4283) [ClassicSimilarity], result of:
          0.28269565 = score(doc=4283,freq=2.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.7171272 = fieldWeight in 4283, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0625 = fieldNorm(doc=4283)
      0.2 = coord(1/5)
    
    Object
    Crawler
  20. Vetere, G.; Lenzerini, M.: Models for semantic interoperability in service-oriented architectures (2005) 0.05
    0.054018535 = product of:
      0.27009267 = sum of:
        0.27009267 = weight(_text_:3a in 306) [ClassicSimilarity], result of:
          0.27009267 = score(doc=306,freq=2.0), product of:
            0.4119227 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.048587184 = queryNorm
            0.65568775 = fieldWeight in 306, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0546875 = fieldNorm(doc=306)
      0.2 = coord(1/5)
    
    Content
    Cf.: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5386707.

Types

  • a 3075
  • m 345
  • el 165
  • s 139
  • b 39
  • x 35
  • i 23
  • r 17
  • ? 8
  • p 4
  • d 3
  • n 3
  • u 2
  • z 2
  • au 1
  • h 1
