Search (2211 results, page 1 of 111)

  • Active filter: language_ss:"e"
  1. Alqaraleh, S.; Ramadan, O.; Salamah, M.: Efficient watcher based web crawler design (2015) 0.14
    0.13557659 = product of:
      0.33894148 = sum of:
        0.30602702 = weight(_text_:crawler in 1627) [ClassicSimilarity], result of:
          0.30602702 = score(doc=1627,freq=6.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.776313 = fieldWeight in 1627, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1627)
        0.032914463 = weight(_text_:22 in 1627) [ClassicSimilarity], result of:
          0.032914463 = score(doc=1627,freq=2.0), product of:
            0.17014404 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.048587184 = queryNorm
            0.19345059 = fieldWeight in 1627, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1627)
      0.4 = coord(2/5)
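
    The explain trees in this result list follow Lucene's ClassicSimilarity (TF-IDF) scoring: each matching term contributes queryWeight × fieldWeight, with queryWeight = idf × queryNorm and fieldWeight = sqrt(termFreq) × idf × fieldNorm, and the sum over matching terms is scaled by the coordination factor. A minimal sketch that recomputes the tree above for result 1, with all constants copied from the explanation (the helper names are ours, not Lucene's):

import math

# Recompute the ClassicSimilarity explain tree shown above for result 1 (doc 1627).
# Only the arithmetic is reproduced; all constants are copied from the explanation.

query_norm = 0.048587184

def term_score(freq, idf, field_norm):
    """One term's contribution: queryWeight * fieldWeight."""
    query_weight = idf * query_norm                       # idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm     # tf(freq) * idf * fieldNorm
    return query_weight * field_weight

crawler = term_score(freq=6.0, idf=8.113368, field_norm=0.0390625)      # ~0.30602702
twenty_two = term_score(freq=2.0, idf=3.5018296, field_norm=0.0390625)  # ~0.03291446
score = (crawler + twenty_two) * 2 / 5                    # coord(2/5); ~0.13557659
print(crawler, twenty_two, score)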
    
    Abstract
    Purpose: The purpose of this paper is to design a watcher-based crawler (WBC) that can crawl both static and dynamic web sites and download only the updated and newly added web pages. Design/methodology/approach: In the proposed WBC, a watcher file, which can be uploaded to the web sites' servers, prepares a report that contains the addresses of the updated and the newly added web pages. In addition, the WBC is split into five units, each responsible for a specific crawling process. Findings: Several experiments have been conducted, and it has been observed that the proposed WBC increases the number of uniquely visited static and dynamic web sites compared with existing crawling techniques. In addition, the proposed watcher file not only allows the crawlers to visit the updated and newly added web pages, but also solves the crawlers' overlapping and communication problems. Originality/value: The proposed WBC performs all crawling processes itself, in the sense that it detects all updated and newly added pages automatically, without explicit human intervention and without downloading the entire web sites.
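
    The watcher-file mechanism described in the abstract can be pictured as a small server-side script that records which pages are new or have changed since its last run, so the crawler only fetches the pages listed in the report. The sketch below is purely illustrative: the file names, report format and mtime-based change check are our assumptions, and the paper's five-unit architecture is not reproduced.

import json, os, time

# Hypothetical illustration of the watcher-file idea: a script on the web server
# records which pages are new or changed since its last run, so a crawler can
# fetch this report instead of re-downloading the whole site.

SNAPSHOT = "watcher_snapshot.json"   # last-seen modification times per page
REPORT = "watcher_report.json"       # addresses of new and updated pages

def build_report(doc_root):
    previous = {}
    if os.path.exists(SNAPSHOT):
        with open(SNAPSHOT) as f:
            previous = json.load(f)
    current, changed = {}, []
    for dirpath, _, files in os.walk(doc_root):
        for name in files:
            if not name.endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            url_path = "/" + os.path.relpath(path, doc_root)
            current[url_path] = mtime
            if previous.get(url_path) != mtime:   # newly added or updated page
                changed.append(url_path)
    with open(SNAPSHOT, "w") as f:
        json.dump(current, f)
    with open(REPORT, "w") as f:
        json.dump({"generated": time.time(), "changed": changed}, f)
    # A crawler then requests REPORT and downloads only the listed pages.
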
    Date
    20. 1.2015 18:30:22
  2. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.11
    0.10840213 = product of:
      0.27100533 = sum of:
        0.23150799 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.23150799 = score(doc=562,freq=2.0), product of:
            0.4119227 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.048587184 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.039497353 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.039497353 = score(doc=562,freq=2.0), product of:
            0.17014404 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.048587184 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.4 = coord(2/5)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  3. Kleineberg, M.: Context analysis and context indexing : formal pragmatics in knowledge organization (2014) 0.08
    0.07716933 = product of:
      0.38584664 = sum of:
        0.38584664 = weight(_text_:3a in 1826) [ClassicSimilarity], result of:
          0.38584664 = score(doc=1826,freq=2.0), product of:
            0.4119227 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.048587184 = queryNorm
            0.93669677 = fieldWeight in 1826, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.078125 = fieldNorm(doc=1826)
      0.2 = coord(1/5)
    
    Source
    http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/3131107
  4. Popper, K.R.: Three worlds : the Tanner Lecture on Human Values, delivered at the University of Michigan, April 7, 1978 (1978) 0.06
    0.061735462 = product of:
      0.30867732 = sum of:
        0.30867732 = weight(_text_:3a in 230) [ClassicSimilarity], result of:
          0.30867732 = score(doc=230,freq=2.0), product of:
            0.4119227 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.048587184 = queryNorm
            0.7493574 = fieldWeight in 230, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0625 = fieldNorm(doc=230)
      0.2 = coord(1/5)
    
    Source
    https://tannerlectures.utah.edu/_documents/a-to-z/p/popper80.pdf
  5. Thelwall, M.: Results from a web impact factor crawler (2001) 0.06
    0.061205406 = product of:
      0.30602702 = sum of:
        0.30602702 = weight(_text_:crawler in 4490) [ClassicSimilarity], result of:
          0.30602702 = score(doc=4490,freq=6.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.776313 = fieldWeight in 4490, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4490)
      0.2 = coord(1/5)
    
    Abstract
    Web impact factors, the proposed web equivalent of impact factors for journals, can be calculated by using search engines. It has been found that the results are problematic because of the variable coverage of search engines as well as their ability to give significantly different results over short periods of time. The fundamental problem is that although some search engines provide a functionality that is capable of being used for impact calculations, this is not their primary task and therefore they do not give guarantees as to performance in this respect. In this paper, a bespoke web crawler designed specifically for the calculation of reliable WIFs is presented. This crawler was used to calculate WIFs for a number of UK universities, and the results of these calculations are discussed. The principal findings were that with certain restrictions, WIFs can be calculated reliably, but do not correlate with accepted research rankings owing to the variety of material hosted on university servers. Changes to the calculations to improve the fit of the results to research rankings are proposed, but there are still inherent problems undermining the reliability of the calculation. These problems still apply if the WIF scores are taken on their own as indicators of the general impact of any area of the Internet, but with care would not apply to online journals.
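
    The abstract does not spell the formula out; in the commonly used formulation, a site's web impact factor is the number of pages linking to the site divided by the number of pages the site hosts. A minimal sketch of that calculation, over invented crawl counts:

def web_impact_factor(inlink_count, page_count):
    """WIF = pages linking to the site / pages hosted by the site (common formulation)."""
    if page_count == 0:
        raise ValueError("site has no crawled pages")
    return inlink_count / page_count

# Invented counts for two fictional university sites.
sites = {
    "example-uni-a.ac.uk": {"inlinks": 12500, "pages": 48000},
    "example-uni-b.ac.uk": {"inlinks": 9800, "pages": 21000},
}
for host, counts in sites.items():
    print(host, round(web_impact_factor(counts["inlinks"], counts["pages"]), 3))
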
  6. Kwiatkowski, M.; Höhfeld, S.: Thematisches Aufspüren von Web-Dokumenten : eine kritische Betrachtung von Focused Crawling-Strategien (2007) 0.06
    0.061205406 = product of:
      0.30602702 = sum of:
        0.30602702 = weight(_text_:crawler in 153) [ClassicSimilarity], result of:
          0.30602702 = score(doc=153,freq=6.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.776313 = fieldWeight in 153, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0390625 = fieldNorm(doc=153)
      0.2 = coord(1/5)
    
    Abstract
    Conventional search engines serve broad web search and are usually characterized by the high quantity - not necessarily the quality - of their result sets. To find documents, a general-purpose crawler is deployed that tracks down web pages in order to build large data stores. Focused crawlers proceed more selectively: rather than searching, storing and indexing enormous amounts of data, they cover only specific, topically relevant segments of the World Wide Web. A focused crawler has to find as optimal a path through the web as possible in order to perform knowledge discovery, leaving aside the areas of the web that are irrelevant to the topic. This considerably reduces the size of the task and the resources required. The goal is to produce qualified search results for a particular field of knowledge. In general, focused crawling techniques can be used to build specialized vertical search engines. They are also advantageous for digital libraries: since these often have a thematic focus and serve qualified literature research, they have to meet a certain quality standard while answering only queries within a defined field of knowledge. Focused crawling therefore lends itself to ensuring high document quality in a specific domain. This review article examines fundamental approaches to focused crawling and traces them through to current developments. Practical fields of application and current systems underline the significance of this research area. In addition, a critical assessment of the approaches presented is given.
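
    The core shared by the strategies surveyed here is a best-first frontier: candidate URLs are prioritized by an estimate of topical relevance, and only pages judged relevant are kept and expanded further. A minimal sketch of that loop, where fetch and extract_links are hypothetical placeholders for real HTTP and HTML-parsing code, and the term-overlap score merely stands in for the relevance classifiers the surveyed systems use:

import heapq
import re

# Minimal sketch of the best-first loop shared by focused-crawling strategies.

TOPIC_TERMS = {"crawler", "focused", "retrieval", "indexing"}

def relevance(text):
    """Fraction of topic terms that occur in the text (a crude relevance proxy)."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return len(TOPIC_TERMS & tokens) / len(TOPIC_TERMS)

def focused_crawl(seeds, fetch, extract_links, limit=100, threshold=0.25):
    frontier = [(-1.0, url) for url in seeds]   # max-heap via negated scores
    heapq.heapify(frontier)
    visited, kept = set(), []
    while frontier and len(kept) < limit:
        _, url = heapq.heappop(frontier)
        if url in visited:
            continue
        visited.add(url)
        page = fetch(url)
        score = relevance(page)
        if score >= threshold:                  # only relevant pages are kept and expanded
            kept.append((url, score))
            for link in extract_links(page, url):
                if link not in visited:
                    heapq.heappush(frontier, (-score, link))
    return kept
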
  7. Fu, T.; Abbasi, A.; Chen, H.: ¬A focused crawler for Dark Web forums (2010) 0.06
    0.061205406 = product of:
      0.30602702 = sum of:
        0.30602702 = weight(_text_:crawler in 3471) [ClassicSimilarity], result of:
          0.30602702 = score(doc=3471,freq=6.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.776313 = fieldWeight in 3471, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3471)
      0.2 = coord(1/5)
    
    Abstract
    The unprecedented growth of the Internet has given rise to the Dark Web, the problematic facet of the Web associated with cybercrime, hate, and extremism. Despite the need for tools to collect and analyze Dark Web forums, the covert nature of this part of the Internet makes traditional Web crawling techniques insufficient for capturing such content. In this study, we propose a novel crawling system designed to collect Dark Web forum content. The system uses a human-assisted accessibility approach to gain access to Dark Web forums. Several URL ordering features and techniques enable efficient extraction of forum postings. The system also includes an incremental crawler coupled with a recall-improvement mechanism intended to facilitate enhanced retrieval and updating of collected content. Experiments conducted to evaluate the effectiveness of the human-assisted accessibility approach and the recall-improvement-based, incremental-update procedure yielded favorable results. The human-assisted approach significantly improved access to Dark Web forums while the incremental crawler with recall improvement also outperformed standard periodic- and incremental-update approaches. Using the system, we were able to collect over 100 Dark Web forums from three regions. A case study encompassing link and content analysis of collected forums was used to illustrate the value and importance of gathering and analyzing content from such online communities.
  8. Jung, J.J.: Contextualized query sampling to discover semantic resource descriptions on the web (2009) 0.06
    0.0599688 = product of:
      0.299844 = sum of:
        0.299844 = weight(_text_:crawler in 4216) [ClassicSimilarity], result of:
          0.299844 = score(doc=4216,freq=4.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.7606282 = fieldWeight in 4216, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.046875 = fieldNorm(doc=4216)
      0.2 = coord(1/5)
    
    Abstract
    Resource descriptions extracted by the query-sampling method can be applied to determine which database sources a certain query should first be sent to. In this paper, we propose a contextualized query-sampling method to extract the resources that are most relevant to the up-to-date context. In practice, the proposed approach is applied to personal crawler systems (so-called focused crawlers), which can support the corresponding user's web navigation tasks in real time. By taking into account the user context (e.g., intentions or interests), the crawler can build the queries to evaluate candidate information sources. As a result, we can discover semantic associations (i) between the user context and the sources, and (ii) between all pairs of the sources. These associations are applied to rank the sources and to transform the queries for the other sources. For evaluating the performance of contextualized query sampling on 53 information sources, we compared the ranking lists recommended by the proposed method with user feedback (i.e., ideal ranks), and also computed the precision of the discovered subsumptions as semantic associations between the sources.
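
    One way to picture the ranking step is to compare a term profile of the user context with the sampled description of each candidate source and order the sources by similarity. The abstract does not name the similarity measure; the sketch below assumes cosine similarity over term counts, and all source names and data are illustrative:

import math
from collections import Counter

# Hypothetical ranking step: order candidate sources by the cosine similarity
# between a user-context term profile and each source's sampled description.

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank_sources(context_terms, sampled_descriptions):
    """Return source names ordered by similarity to the user context."""
    ctx = Counter(context_terms)
    scored = sorted(((cosine(ctx, Counter(terms)), name)
                     for name, terms in sampled_descriptions.items()), reverse=True)
    return [name for _, name in scored]

# Illustrative data only.
context = ["semantic", "web", "ontology", "navigation"]
samples = {
    "source-A": ["semantic", "web", "rdf", "ontology", "ontology"],
    "source-B": ["sports", "news", "football"],
    "source-C": ["web", "navigation", "usability"],
}
print(rank_sources(context, samples))   # source-A, then source-C, then source-B
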
  9. Raeder, A.: Cataloguing the Web (1995) 0.06
    0.05653913 = product of:
      0.28269565 = sum of:
        0.28269565 = weight(_text_:crawler in 3387) [ClassicSimilarity], result of:
          0.28269565 = score(doc=3387,freq=2.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.7171272 = fieldWeight in 3387, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0625 = fieldNorm(doc=3387)
      0.2 = coord(1/5)
    
    Abstract
    Lists and describes sites that attempt to aid Internet searchers by helping them locate sites, files or information. Gives an overview of the methods used. Covers the following sites: Aliweb, ArchiePlex Archie Gateway, CUI W3, Clearinghouse for Subject-Oriented Internet Resource Guides, InfoSeek, JumpStation, Lawrence Livermore National Laboratories List of Lists, Lycos WWW Search Engine, Mother of all BBSs, NIKOS, Planet Earth Home Page, Stanford Netnews Filtering Service, WWW Home Page Harvest Browser, WWW Virtual Library, WWW Wanderer Index, WWW Worm, Web Crawler, Whole Internet Catalog, and Yahoo Index to the Internet
  10. Blake, P.: AltaVista and Notes for the web (1996) 0.06
    0.05653913 = product of:
      0.28269565 = sum of:
        0.28269565 = weight(_text_:crawler in 4537) [ClassicSimilarity], result of:
          0.28269565 = score(doc=4537,freq=2.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.7171272 = fieldWeight in 4537, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0625 = fieldNorm(doc=4537)
      0.2 = coord(1/5)
    
    Footnote
    Briefly reviews the AltaVista and Notes search software for searching the WWW. In the case of AltaVista, Digital claims that this web crawler has been crawling the WWW at a rate of 2.5 million pages per day and already accounts for the indexing of 16 million pages and 13,000 newsgroups. Suggests that AltaVista pulls up significantly more on obscure or specialist subjects than rivals like InfoSeek and Excite. Concludes with details of IBM's development of the Lotus WWW searcher, designed to cope with the increasing complexity of web applications.
  11. Schneller finden! (1999) 0.06
    0.05653913 = product of:
      0.28269565 = sum of:
        0.28269565 = weight(_text_:crawler in 2929) [ClassicSimilarity], result of:
          0.28269565 = score(doc=2929,freq=2.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.7171272 = fieldWeight in 2929, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0625 = fieldNorm(doc=2929)
      0.2 = coord(1/5)
    
    Object
    CRAWLER
  12. Vetere, G.; Lenzerini, M.: Models for semantic interoperability in service-oriented architectures (2005) 0.05
    0.054018535 = product of:
      0.27009267 = sum of:
        0.27009267 = weight(_text_:3a in 306) [ClassicSimilarity], result of:
          0.27009267 = score(doc=306,freq=2.0), product of:
            0.4119227 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.048587184 = queryNorm
            0.65568775 = fieldWeight in 306, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0546875 = fieldNorm(doc=306)
      0.2 = coord(1/5)
    
    Content
    Cf.: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5386707.
  13. Larson, R.R.: Bibliometrics of the World Wide Web : an exploratory analysis of the intellectual structure of cyberspace (1996) 0.05
    0.049471736 = product of:
      0.24735868 = sum of:
        0.24735868 = weight(_text_:crawler in 7334) [ClassicSimilarity], result of:
          0.24735868 = score(doc=7334,freq=2.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.6274863 = fieldWeight in 7334, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7334)
      0.2 = coord(1/5)
    
    Abstract
    Examines the explosive growth and the bibliometrics of the WWW, based both on an analysis of over 30 GBytes of WWW pages collected by the Inktomi Web Crawler and on the use of the DEC AltaVista search engine for cocitation analysis of a set of Earth Science related WWW sites. Examines the statistical characteristics of web documents and their links, and the characteristics of highly cited web documents.
  14. Thelwall, M.; Stuart, D.: Web crawling ethics revisited : cost, privacy, and denial of service (2006) 0.05
    0.049471736 = product of:
      0.24735868 = sum of:
        0.24735868 = weight(_text_:crawler in 6098) [ClassicSimilarity], result of:
          0.24735868 = score(doc=6098,freq=2.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.6274863 = fieldWeight in 6098, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6098)
      0.2 = coord(1/5)
    
    Abstract
    Ethical aspects of the employment of Web crawlers for information science research and other contexts are reviewed. The difference between legal and ethical uses of communications technologies is emphasized as well as the changing boundary between ethical and unethical conduct. A review of the potential impacts on Web site owners is used to underpin a new framework for ethical crawling, and it is argued that delicate human judgment is required for each individual case, with verdicts likely to change over time. Decisions can be based upon an approximate cost-benefit analysis, but it is crucial that crawler owners find out about the technological issues affecting the owners of the sites being crawled in order to produce an informed assessment.
  15. Mas, S.; Marleau, Y.: Proposition of a faceted classification model to support corporate information organization and digital records management (2009) 0.05
    0.0463016 = product of:
      0.23150799 = sum of:
        0.23150799 = weight(_text_:3a in 2918) [ClassicSimilarity], result of:
          0.23150799 = score(doc=2918,freq=2.0), product of:
            0.4119227 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.048587184 = queryNorm
            0.56201804 = fieldWeight in 2918, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=2918)
      0.2 = coord(1/5)
    
    Footnote
    Cf.: http://ieeexplore.ieee.org/iel5/4755313/4755314/04755480.pdf?arnumber=4755480.
  16. Li, L.; Shang, Y.; Zhang, W.: Improvement of HITS-based algorithms on Web documents 0.05
    0.0463016 = product of:
      0.23150799 = sum of:
        0.23150799 = weight(_text_:3a in 2514) [ClassicSimilarity], result of:
          0.23150799 = score(doc=2514,freq=2.0), product of:
            0.4119227 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.048587184 = queryNorm
            0.56201804 = fieldWeight in 2514, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=2514)
      0.2 = coord(1/5)
    
    Content
    Cf.: http://delab.csd.auth.gr/~dimitris/courses/ir_spring06/page_rank_computing/p527-li.pdf. Cf. also: http://www2002.org/CDROM/refereed/643/.
  17. Zeng, Q.; Yu, M.; Yu, W.; Xiong, J.; Shi, Y.; Jiang, M.: Faceted hierarchy : a new graph type to organize scientific concepts and a construction method (2019) 0.05
    0.0463016 = product of:
      0.23150799 = sum of:
        0.23150799 = weight(_text_:3a in 400) [ClassicSimilarity], result of:
          0.23150799 = score(doc=400,freq=2.0), product of:
            0.4119227 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.048587184 = queryNorm
            0.56201804 = fieldWeight in 400, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=400)
      0.2 = coord(1/5)
    
    Content
    Cf.: https://aclanthology.org/D19-5317.pdf.
  18. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.05
    0.0463016 = product of:
      0.23150799 = sum of:
        0.23150799 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
          0.23150799 = score(doc=862,freq=2.0), product of:
            0.4119227 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.048587184 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.2 = coord(1/5)
    
    Source
    https://arxiv.org/abs/2212.06721
  19. Venditto, G.: Search engine showdown : IW Labs tests seven Internet search tools (1996) 0.04
    0.042404346 = product of:
      0.21202172 = sum of:
        0.21202172 = weight(_text_:crawler in 4983) [ClassicSimilarity], result of:
          0.21202172 = score(doc=4983,freq=2.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.5378454 = fieldWeight in 4983, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.046875 = fieldNorm(doc=4983)
      0.2 = coord(1/5)
    
    Abstract
    Reports results of the IW Labs tests carried out on 7 major WWW search engines available free of charge on the Internet: AltaVista; Excite; InfoSeek; Lycos; Open Text; Web Crawler; and WWW Worm. Notes the differences between these 7 search engines and Net directories like Yahoo and Magellan, which are essentially registries of web sites based on descriptions submitted by webmasters or written by the directory's staff. The search engines were tested using both simple searches for the name of a celebrity or a popular web site and more complex searches. Results indicated that the most relevant and the most comprehensive results were obtained from InfoSeek and AltaVista respectively. Excite was almost as good as InfoSeek at finding relevant pages, and users who are comfortable with Boolean searching may prefer Open Text, which excels in this respect.
  20. Wenyin, L.; Chen, Z.; Li, M.; Zhang, H.: ¬A media agent for automatically building a personalized semantic index of Web media objects (2001) 0.04
    0.042404346 = product of:
      0.21202172 = sum of:
        0.21202172 = weight(_text_:crawler in 6522) [ClassicSimilarity], result of:
          0.21202172 = score(doc=6522,freq=2.0), product of:
            0.39420572 = queryWeight, product of:
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.048587184 = queryNorm
            0.5378454 = fieldWeight in 6522, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.113368 = idf(docFreq=35, maxDocs=44218)
              0.046875 = fieldNorm(doc=6522)
      0.2 = coord(1/5)
    
    Abstract
    A novel idea of a media agent is briefly presented, which can automatically build a personalized semantic index of Web media objects for each particular user. Because the Web is a rich source of multimedia data and the text content on Web pages is usually semantically related to the media objects on the same pages, the media agent can automatically collect the URLs and related text, and then build the index of the multimedia data, on behalf of the user whenever and wherever she accesses these multimedia data or their container Web pages. Moreover, the media agent can also use an off-line crawler to build the index for those multimedia objects that are relevant to the user's favorites but have not yet been accessed by the user. When the user wants to find these multimedia data once again, the semantic index facilitates text-based search for her.
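
    The indexing step the abstract describes can be pictured as follows: collect the media-object URLs on a page (here, img tags) together with text from the same page, and record every page term against every media URL. This is only an illustration; a real system would weight text by its proximity to each object, and the class and helper names below are invented:

import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical sketch: gather <img> URLs and page text, then build a
# term -> media-URL index so the media objects can be found by text search.

class MediaTextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.media_urls = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            if attrs.get("src"):
                self.media_urls.append(attrs["src"])
            if attrs.get("alt"):
                self.text_parts.append(attrs["alt"])

    def handle_data(self, data):
        self.text_parts.append(data)

def index_page(html, index):
    collector = MediaTextCollector()
    collector.feed(html)
    terms = set(re.findall(r"[a-z]+", " ".join(collector.text_parts).lower()))
    for term in terms:
        for url in collector.media_urls:
            index[term].add(url)

index = defaultdict(set)
index_page('<p>Sunset over the harbour</p><img src="sunset.jpg" alt="harbour sunset">', index)
print(sorted(index["sunset"]))   # ['sunset.jpg']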

Types

  • a 1943
  • m 152
  • s 97
  • el 70
  • b 31
  • r 10
  • x 8
  • i 3
  • n 2
  • p 2
  • h 1
