Search (46 results, page 1 of 3)

  • language_ss:"e"
  • theme_ss:"Suchmaschinen"
  • type_ss:"el"
  1. Smith, A.G.: Search features of digital libraries (2000) 0.03
    0.029142123 = product of:
      0.10199743 = sum of:
        0.054539118 = weight(_text_:wide in 940) [ClassicSimilarity], result of:
          0.054539118 = score(doc=940,freq=4.0), product of:
            0.1312982 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029633347 = queryNorm
            0.4153836 = fieldWeight in 940, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=940)
        0.020922182 = weight(_text_:web in 940) [ClassicSimilarity], result of:
          0.020922182 = score(doc=940,freq=2.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.21634221 = fieldWeight in 940, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=940)
        0.00856136 = weight(_text_:information in 940) [ClassicSimilarity], result of:
          0.00856136 = score(doc=940,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.16457605 = fieldWeight in 940, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=940)
        0.01797477 = weight(_text_:retrieval in 940) [ClassicSimilarity], result of:
          0.01797477 = score(doc=940,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.20052543 = fieldWeight in 940, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=940)
      0.2857143 = coord(4/14)
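
    The breakdown above can be checked by hand. The sketch below recomputes the score of this first hit from the numbers in the listing. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is consistent with every idf value shown here. The names `idf` and `term_score` are illustrative helpers, not Lucene API, and queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 44218          # maxDocs, as shown in the listing
QUERY_NORM = 0.029633347  # queryNorm, copied from the listing
FIELD_NORM = 0.046875     # fieldNorm(doc=940), copied from the listing


def idf(doc_freq, max_docs):
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm                     # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


# term -> (termFreq in doc 940, docFreq), taken from the explain tree above
terms = {
    "wide": (4.0, 1430),
    "web": (2.0, 4597),
    "information": (4.0, 20772),
    "retrieval": (2.0, 5836),
}

summed = sum(
    term_score(freq, doc_freq, MAX_DOCS, QUERY_NORM, FIELD_NORM)
    for freq, doc_freq in terms.values()
)

# coord(4/14): 4 of the 14 query clauses matched this document
doc_score = summed * (4 / 14)
print(f"sum of term scores: {summed:.6f}")
print(f"document score:     {doc_score:.6f}")
```

    The listing was produced with single-precision floats, so the values printed by this double-precision sketch may differ from it in the last digits.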
    
    Abstract
    Traditional on-line search services such as Dialog, DataStar and Lexis provide a wide range of search features (boolean and proximity operators, truncation, etc). This paper discusses the use of these features for effective searching, and argues that these features are required, regardless of advances in search engine technology. The literature on on-line searching is reviewed, identifying features that searchers find desirable for effective searching. A selective survey of current digital libraries available on the Web was undertaken, identifying which search features are present. The survey indicates that current digital libraries do not implement a wide range of search features. For instance: under half of the examples included controlled vocabulary, under half had proximity searching, only one enabled browsing of term indexes, and none of the digital libraries enable searchers to refine an initial search. Suggestions are made for enhancing the search effectiveness of digital libraries; for instance, by providing a full range of search operators, enabling browsing of search terms, enhancement of records with controlled vocabulary, enabling the refining of initial searches, etc.
    Content
    Contains a compilation of the tools and aids of information retrieval
    Source
    Information Research. 5(2000) no.3, April 2000
  2. Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.02
    0.020617396 = product of:
      0.072160885 = sum of:
        0.025709987 = weight(_text_:wide in 2596) [ClassicSimilarity], result of:
          0.025709987 = score(doc=2596,freq=2.0), product of:
            0.1312982 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029633347 = queryNorm
            0.1958137 = fieldWeight in 2596, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.03125 = fieldNorm(doc=2596)
        0.013948122 = weight(_text_:web in 2596) [ClassicSimilarity], result of:
          0.013948122 = score(doc=2596,freq=2.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.14422815 = fieldWeight in 2596, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=2596)
        0.005707573 = weight(_text_:information in 2596) [ClassicSimilarity], result of:
          0.005707573 = score(doc=2596,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.10971737 = fieldWeight in 2596, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=2596)
        0.026795205 = weight(_text_:retrieval in 2596) [ClassicSimilarity], result of:
          0.026795205 = score(doc=2596,freq=10.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.29892567 = fieldWeight in 2596, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=2596)
      0.2857143 = coord(4/14)
    
    Content
    Ramana Rao (Inxight, Palo Alto, CA): 7 ± 2 Insights on achieving Effective Information Access
    Session One: Updates and a twelve month perspective
      Danny Sullivan (Search Engine Watch, US / England): Portalization and other search trends
      Carol Tenopir (University of Tennessee): Search realities faced by end users and professional searchers
    Session Two: Today's search engines and beyond
      Daniel Hoogterp (Retrieval Technologies, McLean, VA): Effective presentation and utilization of search techniques
      Rick Kenny (Fulcrum Technologies, Ontario, Canada): Beyond document clustering: The knowledge impact statement
      Gary Stock (Ingenius, Kalamazoo, MI): Automated change monitoring
      Gary Culliss (Direct Hit, Wellesley Hills, MA): User popularity ranked search engines
      Byron Dom (IBM, CA): Automatically finding the best pages on the World Wide Web (CLEVER)
      Peter Tomassi (LookSmart, San Francisco, CA): Adding human intellect to search technology
    Session Three: Panel discussion: Human v automated categorization and editing
      Ev Brenner (New York, NY), Chairman
      James Callan (University of Massachusetts, MA)
      Marc Krellenstein (Northern Light Technology, Cambridge, MA)
      Dan Miller (Ask Jeeves, Berkeley, CA)
    Session Four: Updates and a twelve month perspective
      Steve Arnold (AIT, Harrods Creek, KY): Review: The leading edge in search and retrieval software
      Ellen Voorhees (NIST, Gaithersburg, MD): TREC update
    Session Five: Search engines now and beyond
      Intelligent Agents
        John Snyder (Muscat, Cambridge, England): Practical issues behind intelligent agents
      Text summarization
        Therese Firmin (Dept of Defense, Ft George G. Meade, MD): The TIPSTER/SUMMAC evaluation of automatic text summarization systems
      Cross language searching
        Elizabeth Liddy (TextWise, Syracuse, NY): A conceptual interlingua approach to cross-language retrieval
      Video search and retrieval
        Armon Amir (IBM, Almaden, CA): CueVideo: Modular system for automatic indexing and browsing of video/audio
      Speech recognition
        Michael Witbrock (Lycos, Waltham, MA): Retrieval of spoken documents
      Visualization
        James A. Wise (Integral Visuals, Richland, WA): Information visualization in the new millennium: Emerging science or passing fashion?
      Text mining
        David Evans (Claritech, Pittsburgh, PA): Text mining - towards decision support
  3. Ding, L.; Finin, T.; Joshi, A.; Peng, Y.; Cost, R.S.; Sachs, J.; Pan, R.; Reddivari, P.; Doshi, V.: Swoogle : a Semantic Web search and metadata engine (2004) 0.02
    0.01914352 = product of:
      0.089336425 = sum of:
        0.055354897 = weight(_text_:web in 4704) [ClassicSimilarity], result of:
          0.055354897 = score(doc=4704,freq=14.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.57238775 = fieldWeight in 4704, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=4704)
        0.00856136 = weight(_text_:information in 4704) [ClassicSimilarity], result of:
          0.00856136 = score(doc=4704,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.16457605 = fieldWeight in 4704, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4704)
        0.025420163 = weight(_text_:retrieval in 4704) [ClassicSimilarity], result of:
          0.025420163 = score(doc=4704,freq=4.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.2835858 = fieldWeight in 4704, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=4704)
      0.21428572 = coord(3/14)
    
    Abstract
    Swoogle is a crawler-based indexing and retrieval system for the Semantic Web, i.e., for Web documents in RDF or OWL. It extracts metadata for each discovered document, and computes relations between documents. Discovered documents are also indexed by an information retrieval system which can use either character N-Gram or URIrefs as keywords to find relevant documents and to compute the similarity among a set of documents. One of the interesting properties we compute is rank, a measure of the importance of a Semantic Web document.
    Content
    See: http://www.dblab.ntua.gr/~bikakis/LD/5.pdf See also: http://swoogle.umbc.edu/. See also: http://ebiquity.umbc.edu/paper/html/id/183/. See also: Radhakrishnan, A.: Swoogle : An Engine for the Semantic Web, at: http://www.searchenginejournal.com/swoogle-an-engine-for-the-semantic-web/5469/.
    Source
    CIKM '04 Proceedings of the thirteenth ACM international conference on Information and knowledge management
    Theme
    Semantic Web
  4. Lossau, N.: Search engine technology and digital libraries : libraries need to discover the academic internet (2004) 0.02
    0.01789869 = product of:
      0.083527215 = sum of:
        0.044992477 = weight(_text_:wide in 1161) [ClassicSimilarity], result of:
          0.044992477 = score(doc=1161,freq=2.0), product of:
            0.1312982 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029633347 = queryNorm
            0.342674 = fieldWeight in 1161, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1161)
        0.024409214 = weight(_text_:web in 1161) [ClassicSimilarity], result of:
          0.024409214 = score(doc=1161,freq=2.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.25239927 = fieldWeight in 1161, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1161)
        0.014125523 = weight(_text_:information in 1161) [ClassicSimilarity], result of:
          0.014125523 = score(doc=1161,freq=8.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.27153665 = fieldWeight in 1161, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1161)
      0.21428572 = coord(3/14)
    
    Abstract
    With the development of the World Wide Web, the "information search" has grown to be a significant business sector of a global, competitive and commercial market. Powerful players have entered this market, such as commercial internet search engines, information portals, multinational publishers and online content integrators. Will Google, Yahoo or Microsoft be the only portals to global knowledge in 2010? If libraries do not want to become marginalized in a key area of their traditional services, they need to acknowledge the challenges that come with the globalisation of scholarly information, the existence and further growth of the academic internet.
    Theme
    Information Gateway
  5. Hogan, A.; Harth, A.; Umbrich, J.; Kinsella, S.; Polleres, A.; Decker, S.: Searching and browsing Linked Data with SWSE : the Semantic Web Search Engine (2011) 0.02
    0.01723306 = product of:
      0.08042095 = sum of:
        0.06039714 = weight(_text_:web in 438) [ClassicSimilarity], result of:
          0.06039714 = score(doc=438,freq=24.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.6245262 = fieldWeight in 438, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=438)
        0.0050448296 = weight(_text_:information in 438) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=438,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 438, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=438)
        0.014978974 = weight(_text_:retrieval in 438) [ClassicSimilarity], result of:
          0.014978974 = score(doc=438,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.16710453 = fieldWeight in 438, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=438)
      0.21428572 = coord(3/14)
    
    Abstract
    In this paper, we discuss the architecture and implementation of the Semantic Web Search Engine (SWSE). Following traditional search engine architecture, SWSE consists of crawling, data enhancing, indexing and a user interface for search, browsing and retrieval of information; unlike traditional search engines, SWSE operates over RDF Web data - loosely also known as Linked Data - which implies unique challenges for the system design, architecture, algorithms, implementation and user interface. In particular, many challenges exist in adopting Semantic Web technologies for Web data: the unique challenges of the Web - in terms of scale, unreliability, inconsistency and noise - are largely overlooked by the current Semantic Web standards. Herein, we describe the current SWSE system, initially detailing the architecture and later elaborating upon the function, design, implementation and performance of each individual component. In so doing, we also give an insight into how current Semantic Web standards can be tailored, in a best-effort manner, for use on Web data. Throughout, we offer evaluation and complementary argumentation to support our design choices, and also offer discussion on future directions and open research questions. Later, we also provide candid discussion relating to the difficulties currently faced in bringing such a search engine into the mainstream, and lessons learnt from roughly six years working on the Semantic Web Search Engine project.
    Object
    Semantic Web Search Engine
    Theme
    Semantic Web
  6. Leighton, H.V.: Performance of four World Wide Web (WWW) index services : Infoseek, Lycos, WebCrawler and WWWWorm (1995) 0.02
    0.016996333 = product of:
      0.11897433 = sum of:
        0.07712996 = weight(_text_:wide in 3168) [ClassicSimilarity], result of:
          0.07712996 = score(doc=3168,freq=2.0), product of:
            0.1312982 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029633347 = queryNorm
            0.5874411 = fieldWeight in 3168, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.09375 = fieldNorm(doc=3168)
        0.041844364 = weight(_text_:web in 3168) [ClassicSimilarity], result of:
          0.041844364 = score(doc=3168,freq=2.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.43268442 = fieldWeight in 3168, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.09375 = fieldNorm(doc=3168)
      0.14285715 = coord(2/14)
    
  7. Boldi, P.; Santini, M.; Vigna, S.: PageRank as a function of the damping factor (2005) 0.01
    0.01360415 = product of:
      0.06348603 = sum of:
        0.032137483 = weight(_text_:wide in 2564) [ClassicSimilarity], result of:
          0.032137483 = score(doc=2564,freq=2.0), product of:
            0.1312982 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029633347 = queryNorm
            0.24476713 = fieldWeight in 2564, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2564)
        0.02465703 = weight(_text_:web in 2564) [ClassicSimilarity], result of:
          0.02465703 = score(doc=2564,freq=4.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.25496176 = fieldWeight in 2564, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2564)
        0.0066915164 = product of:
          0.020074548 = sum of:
            0.020074548 = weight(_text_:22 in 2564) [ClassicSimilarity], result of:
              0.020074548 = score(doc=2564,freq=2.0), product of:
                0.103770934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.029633347 = queryNorm
                0.19345059 = fieldWeight in 2564, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2564)
          0.33333334 = coord(1/3)
      0.21428572 = coord(3/14)
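
    This hit contains a nested clause: the term "22" sits inside its own sub-query, which is scaled by coord(1/3) before it joins the outer sum. The sketch below, which uses only values copied from the tree above, shows how the two coordination factors combine. The helper name `coord` is illustrative, not Lucene API.

```python
def coord(overlap, max_overlap):
    """Coordination factor: fraction of a query's clauses that matched."""
    return overlap / max_overlap


# Per-term scores for doc 2564, copied from the explain tree above
score_wide = 0.032137483
score_web = 0.02465703
score_22 = 0.020074548

# Inner sub-query: 1 of its 3 clauses matched
nested = score_22 * coord(1, 3)

# Outer query: 3 of its 14 clauses matched (wide, web, and the nested clause)
doc_score = (score_wide + score_web + nested) * coord(3, 14)

print(f"nested clause:  {nested:.7f}")
print(f"document score: {doc_score:.7f}")
```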
    
    Abstract
    PageRank is defined as the stationary state of a Markov chain. The chain is obtained by perturbing the transition matrix induced by a web graph with a damping factor alpha that spreads uniformly part of the rank. The choice of alpha is eminently empirical, and in most cases the original suggestion alpha=0.85 by Brin and Page is still used. Recently, however, the behaviour of PageRank with respect to changes in alpha was discovered to be useful in link-spam detection. Moreover, an analytical justification of the value chosen for alpha is still missing. In this paper, we give the first mathematical analysis of PageRank when alpha changes. In particular, we show that, contrary to popular belief, for real-world graphs values of alpha close to 1 do not give a more meaningful ranking. Then, we give closed-form formulae for PageRank derivatives of any order, and an extension of the Power Method that approximates them with convergence O(t**k*alpha**t) for the k-th derivative. Finally, we show a tight connection between iterated computation and analytical behaviour by proving that the k-th iteration of the Power Method gives exactly the PageRank value obtained using a Maclaurin polynomial of degree k. The latter result paves the way towards the application of analytical methods to the study of PageRank.
    Date
    16. 1.2016 10:22:28
    Source
    http://vigna.di.unimi.it/ftp/papers/PageRankAsFunction.pdf [Proceedings of the ACM World Wide Web Conference (WWW), 2005]
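
    For readers unfamiliar with the terms in the abstract above, here is a minimal sketch of a textbook Power Method for PageRank with damping factor alpha. It is a generic illustration, not the paper's extension for computing derivatives. The function name `pagerank`, the toy graph, and the choice to spread the rank of dangling nodes uniformly are all assumptions made for this example.

```python
def pagerank(links, alpha=0.85, iterations=100):
    """Power iteration on a graph given as {node: [nodes it links to]}.

    Assumes every link target also appears as a key of `links`.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution

    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly (assumption)
        dangling = sum(rank[v] for v in nodes if not links[v])
        # Uniform part: the (1 - alpha) share plus the redistributed dangling rank
        new_rank = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        # Link-following part: each node passes alpha * rank equally to its targets
        for v in nodes:
            for target in links[v]:
                new_rank[target] += alpha * rank[v] / len(links[v])
        rank = new_rank

    return rank


# Hypothetical four-page graph: "c" is linked to by every other page
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {value:.4f}")
```

    With alpha = 0 every page receives the same rank, since only the uniform part remains; larger values of alpha give more weight to the link structure.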
  8. Austin, D.: How Google finds your needle in the Web's haystack : as we'll see, the trick is to ask the web itself to rank the importance of pages... (2006) 0.01
    0.013528418 = product of:
      0.063132614 = sum of:
        0.022496238 = weight(_text_:wide in 93) [ClassicSimilarity], result of:
          0.022496238 = score(doc=93,freq=2.0), product of:
            0.1312982 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029633347 = queryNorm
            0.171337 = fieldWeight in 93, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
        0.034519844 = weight(_text_:web in 93) [ClassicSimilarity], result of:
          0.034519844 = score(doc=93,freq=16.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.35694647 = fieldWeight in 93, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
        0.006116531 = weight(_text_:information in 93) [ClassicSimilarity], result of:
          0.006116531 = score(doc=93,freq=6.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.11757882 = fieldWeight in 93, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
      0.21428572 = coord(3/14)
    
    Abstract
    Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly-disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one) so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb. Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contain the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list. One way to determine the importance of pages is to use a human-generated ranking.
For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.
  9. Baeza-Yates, R.; Boldi, P.; Castillo, C.: Generalizing PageRank : damping functions for linkbased ranking algorithms (2006) 0.01
    0.012614421 = product of:
      0.04415047 = sum of:
        0.017435152 = weight(_text_:web in 2565) [ClassicSimilarity], result of:
          0.017435152 = score(doc=2565,freq=2.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.18028519 = fieldWeight in 2565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2565)
        0.0050448296 = weight(_text_:information in 2565) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=2565,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 2565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2565)
        0.014978974 = weight(_text_:retrieval in 2565) [ClassicSimilarity], result of:
          0.014978974 = score(doc=2565,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.16710453 = fieldWeight in 2565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2565)
        0.0066915164 = product of:
          0.020074548 = sum of:
            0.020074548 = weight(_text_:22 in 2565) [ClassicSimilarity], result of:
              0.020074548 = score(doc=2565,freq=2.0), product of:
                0.103770934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.029633347 = queryNorm
                0.19345059 = fieldWeight in 2565, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2565)
          0.33333334 = coord(1/3)
      0.2857143 = coord(4/14)
    
    Abstract
    This paper introduces a family of link-based ranking algorithms that propagate page importance through links. In these algorithms there is a damping function that decreases with distance, so a direct link implies more endorsement than a link through a long path. PageRank is the most widely known ranking function of this family. The main objective of this paper is to determine whether this family of ranking techniques has some interest per se, and how different choices for the damping function impact on rank quality and on convergence speed. Even though our results suggest that PageRank can be approximated with other simpler forms of rankings that may be computed more efficiently, our focus is of more speculative nature, in that it aims at separating the kernel of PageRank, that is, link-based importance propagation, from the way propagation decays over paths. We focus on three damping functions, having linear, exponential, and hyperbolic decay on the lengths of the paths. The exponential decay corresponds to PageRank, and the other functions are new. Our presentation includes algorithms, analysis, comparisons and experiments that study their behavior under different parameters in real Web graph data. Among other results, we show how to calculate a linear approximation that induces a page ordering that is almost identical to PageRank's using a fixed small number of iterations; comparisons were performed using Kendall's tau on large domain datasets.
    Date
    16. 1.2016 10:22:28
    Source
    http://chato.cl/papers/baeza06_general_pagerank_damping_functions_link_ranking.pdf [Proceedings of the ACM Special Interest Group on Information Retrieval (SIGIR) Conference, SIGIR'06, August 6-10, 2006, Seattle, Washington, USA]
  10. Powell, J.; Fox, E.A.: Multilingual federated searching across heterogeneous collections (1998) 0.01
    0.011330889 = product of:
      0.079316214 = sum of:
        0.051419973 = weight(_text_:wide in 1250) [ClassicSimilarity], result of:
          0.051419973 = score(doc=1250,freq=2.0), product of:
            0.1312982 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029633347 = queryNorm
            0.3916274 = fieldWeight in 1250, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0625 = fieldNorm(doc=1250)
        0.027896244 = weight(_text_:web in 1250) [ClassicSimilarity], result of:
          0.027896244 = score(doc=1250,freq=2.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.2884563 = fieldWeight in 1250, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0625 = fieldNorm(doc=1250)
      0.14285715 = coord(2/14)
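    The score tree above can be checked by hand. The sketch below recomputes it using the formulas of Lucene's ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function and variable names are illustrative, and the numeric inputs are copied from the tree for document 1250. Lucene works in single precision, so the last digits may differ slightly.

```python
import math


def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


MAX_DOCS = 44218
QUERY_NORM = 0.029633347

# Values for doc=1250: both terms occur twice, fieldNorm is 0.0625.
wide = term_score(2.0, 1430, MAX_DOCS, QUERY_NORM, 0.0625)
web = term_score(2.0, 4597, MAX_DOCS, QUERY_NORM, 0.0625)

# coord(2/14): two of the fourteen query clauses matched.
total = (wide + web) * (2 / 14)
print(wide, web, total)
```

    The same function reproduces the per-term weights of the other records once their freq, docFreq and fieldNorm values are substituted.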
    
    Abstract
    This article describes a scalable system for searching heterogeneous multilingual collections on the World Wide Web. It details a markup language for describing the characteristics of a search engine and its interface, and a protocol for requesting word translations between languages.
  11. Bates, M.E.: Quick answers to odd questions (2004) 0.01
    0.00913808 = product of:
      0.042644374 = sum of:
        0.01928249 = weight(_text_:wide in 3071) [ClassicSimilarity], result of:
          0.01928249 = score(doc=3071,freq=2.0), product of:
            0.1312982 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029633347 = queryNorm
            0.14686027 = fieldWeight in 3071, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0234375 = fieldNorm(doc=3071)
        0.018119143 = weight(_text_:web in 3071) [ClassicSimilarity], result of:
          0.018119143 = score(doc=3071,freq=6.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.18735787 = fieldWeight in 3071, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0234375 = fieldNorm(doc=3071)
        0.0052427407 = weight(_text_:information in 3071) [ClassicSimilarity], result of:
          0.0052427407 = score(doc=3071,freq=6.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.10078184 = fieldWeight in 3071, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0234375 = fieldNorm(doc=3071)
      0.21428572 = coord(3/14)
    
    Content
    "One of the things I enjoyed the most when I was a reference librarian was the wide range of questions my clients sent my way. What was the original title of the first Godzilla movie? (Gojira, released in 1954) Who said 'I'm as pure as the driven slush'? (Tallulah Bankhead) What percentage of adults have gone to a jazz performance in the last year? (11%) I have found that librarians, speech writers and journalists have one thing in common - we all need to find information on all kinds of topics, and we usually need the answers right now. The following are a few of my favorite sites for finding answers to those there-must-be-an-answer-out-there questions. - For the electronic equivalent to the "ready reference" shelf of resources that most librarians keep hidden behind their desks, check out RefDesk. It is particularly good for answering factual questions - Where do I get the new Windows XP Service Pack? Where is the 386 area code? How do I contact my member of Congress? - Another resource for lots of those quick-fact questions is InfoPlease, the publishers of the Information Please almanac. Right now, it's full of Olympics data, but it also has links to facts and factoids that you would look up in an almanac, atlas, or encyclopedia. - If you want numbers, start with the Statistical Abstract of the US. This source, produced by the U.S. Census Bureau, gives you everything from the divorce rate by state to airline cost indexes going back to 1980. It is many librarians' secret weapon for pulling numbers together quickly. - My favorite question is "how does that work?" Haven't you ever wondered how they get that Olympic torch to continue to burn while it is being carried by runners from one city to the next? Or how solar sails manage to propel a spacecraft? For answers, check out the appropriately-named How Stuff Works. - For questions about movies, my first resource is the Internet Movie Database. 
It is easy to search, is such a popular site that mistakes are corrected quickly, and is a fun place to catch trailers of both upcoming movies and those dating back to the 30s. - When I need to figure out who said what, I still tend to rely on the print sources such as Bartlett's Familiar Quotations. No, the current edition is not available on the web, but - and this is the librarian in me - I really appreciate the fact that I not only get the attribution but I also see the source of the quote. There are far too many quotes being attributed to a celebrity, but with no indication of the publication in which the quote appeared. Take, for example, the much-cited quote of Margaret Mead, "Never doubt that a small group of thoughtful committed people can change the world; indeed, it's the only thing that ever has!" Then see the page on the Institute for Intercultural Studies site, founded by Mead, and read its statement that it has never been able to verify this alleged quote from Mead. While there are lots of web-based sources of quotes (see QuotationsPage.com and Bartleby, for example), unless the site provides the original source for the quotation, I wouldn't rely on the citation. Of course, if you have a hunch as to the source of a quote, and it was published prior to 1923, head over to Project Gutenberg, which includes the full text of over 12,000 books that are in the public domain. When I needed to confirm a quotation of the Red Queen in "Through the Looking Glass", this is where I started. - And if you are stumped as to where to go to find information, instead of Googling it, try the Librarians' Index to the Internet. While it is somewhat US-centric, it is a great directory of web resources."
  12. Radhakrishnan, A.: Swoogle : an engine for the Semantic Web (2007) 0.01
    0.008792207 = product of:
      0.061545443 = sum of:
        0.05750958 = weight(_text_:web in 4709) [ClassicSimilarity], result of:
          0.05750958 = score(doc=4709,freq=34.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.59466785 = fieldWeight in 4709, product of:
              5.8309517 = tf(freq=34.0), with freq of:
                34.0 = termFreq=34.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=4709)
        0.0040358636 = weight(_text_:information in 4709) [ClassicSimilarity], result of:
          0.0040358636 = score(doc=4709,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.0775819 = fieldWeight in 4709, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=4709)
      0.14285715 = coord(2/14)
    
    Content
    "Swoogle, the Semantic web search engine, is a research project carried out by the ebiquity research group in the Computer Science and Electrical Engineering Department at the University of Maryland. It's an engine tailored towards finding documents on the semantic web. The whole research paper is available here. The Semantic web is touted as the next generation of online content representation, where web documents are represented in a language that is not only easy for humans but also machine readable (easing the integration of data as never thought possible). The main elements of the semantic web include data model description formats such as Resource Description Framework (RDF), a variety of data interchange formats (e.g. RDF/XML, Turtle, N-Triples), and notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL), all of which are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain (Wikipedia). Swoogle is an attempt to mine and index this new set of web documents. The engine crawls semantic documents like most web search engines, and the search is available as a web service too. The engine is primarily written in Java, with PHP used for the front-end and MySQL for the database. Swoogle is capable of searching over 10,000 ontologies and indexes more than 1.3 million web documents. It also computes the importance of a Semantic Web document. The techniques used for indexing are Google-type page ranking and mining the documents for the inter-relationships that are the basis of the semantic web. For more information on how the RDF framework can be used to relate documents, read the link here. Being a research project with a non-commercial motive, Swoogle has not attracted much hype. However, its approach to indexing Semantic web documents is one that most engines will have to take at some point. 
When the Internet debuted, there were no specific engines available for indexing or searching. The Search domain only picked up as more and more content became available. One fundamental question that I've always wondered about is this: even if search engines return very relevant results for a query, how can one ascertain that the documents are indeed the most relevant ones available? There is always an inherent delay in the indexing of documents. It's here that the new semantic document search engines can close the delay. Experimenting with the concept of Search in the semantic web can only bode well for the future of search technology."
    Source
    http://www.searchenginejournal.com/swoogle-an-engine-for-the-semantic-web/5469/
    Theme
    Semantic Web
  13. Summann, F.; Lossau, N.: Search engine technology and digital libraries : moving from theory to practice (2004) 0.01
    0.007843386 = product of:
      0.036602467 = sum of:
        0.013948122 = weight(_text_:web in 1196) [ClassicSimilarity], result of:
          0.013948122 = score(doc=1196,freq=2.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.14422815 = fieldWeight in 1196, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=1196)
        0.005707573 = weight(_text_:information in 1196) [ClassicSimilarity], result of:
          0.005707573 = score(doc=1196,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.10971737 = fieldWeight in 1196, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=1196)
        0.016946774 = weight(_text_:retrieval in 1196) [ClassicSimilarity], result of:
          0.016946774 = score(doc=1196,freq=4.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.18905719 = fieldWeight in 1196, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=1196)
      0.21428572 = coord(3/14)
    
    Abstract
    This article describes the journey from the conception of and vision for a modern search-engine-based search environment to its technological realisation. In doing so, it takes up the thread of an earlier article on this subject, this time from a technical viewpoint. As well as presenting the conceptual considerations of the initial stages, this article will principally elucidate the technological aspects of this journey. The starting point for the deliberations about development of an academic search engine was the experience we gained through the generally successful project "Digital Library NRW", in which, from 1998 to 2000 and with Bielefeld University Library in overall charge, we designed a system model for an Internet-based library portal with an improved academic search environment at its core. At the heart of this system was a metasearch with an availability function, to which we added a user interface integrating all relevant source material for study and research. The deficiencies of this approach were felt soon after the system was launched in June 2001. There were problems with the stability and performance of the database retrieval system, with the integration of full-text documents and Internet pages, and with acceptance by users, because users are increasingly performing the searches themselves using search engines rather than going to the library for help in doing searches. Since a long list of problems is also encountered when using commercial search engines for academic purposes (in particular with the retrieval of academic information and with long-term availability), the idea was born for a search engine configured specifically for academic use. We also hoped that with one single access point founded on improved search engine technology, we could access the heterogeneous academic resources of subject-based bibliographic databases, catalogues, electronic newspapers, document servers and academic web pages.
    Theme
    Information Gateway
  14. Brin, S.; Page, L.: ¬The anatomy of a large-scale hypertextual Web search engine (1998) 0.01
    0.007609078 = product of:
      0.053263545 = sum of:
        0.046129078 = weight(_text_:web in 947) [ClassicSimilarity], result of:
          0.046129078 = score(doc=947,freq=14.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.47698978 = fieldWeight in 947, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=947)
        0.0071344664 = weight(_text_:information in 947) [ClassicSimilarity], result of:
          0.0071344664 = score(doc=947,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.13714671 = fieldWeight in 947, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=947)
      0.14285715 = coord(2/14)
    
    Abstract
    In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want
  15. Entlich, R.: FAQ: Image Search Engines (2001) 0.01
    0.0075481744 = product of:
      0.05283722 = sum of:
        0.046783425 = weight(_text_:web in 155) [ClassicSimilarity], result of:
          0.046783425 = score(doc=155,freq=10.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.48375595 = fieldWeight in 155, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=155)
        0.0060537956 = weight(_text_:information in 155) [ClassicSimilarity], result of:
          0.0060537956 = score(doc=155,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.116372846 = fieldWeight in 155, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=155)
      0.14285715 = coord(2/14)
    
    Abstract
    Everyone loves images. The web wasn't anything until images came along, then it was an overnight success. So how does one find a specific image on the web? By using one of a burgeoning number of image-focused search engines. These search engines are simply optimized versions of typical web indexes, with crawlers that go around sucking down web content and indexing it. But with image search engines, they focus on images only, and the web page text that may describe them. As information professionals, we know that this is a clumsy approach at best, but as the author puts it, until more sophisticated methods become available, the tools profiled here will "have to suffice." Seven search engines are thoroughly tested in this review article, with Google's Image Search (http://www.google.com/imghp?hl=en) being the highest rated
  16. Internet search tool details (1996) 0.01
    0.006422852 = product of:
      0.044959962 = sum of:
        0.034870304 = weight(_text_:web in 5677) [ClassicSimilarity], result of:
          0.034870304 = score(doc=5677,freq=2.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.36057037 = fieldWeight in 5677, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.078125 = fieldNorm(doc=5677)
        0.010089659 = weight(_text_:information in 5677) [ClassicSimilarity], result of:
          0.010089659 = score(doc=5677,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.19395474 = fieldWeight in 5677, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=5677)
      0.14285715 = coord(2/14)
    
    Abstract
    Summaries of the popular engines extracted from the search sites. Summaries are from: AltaVista, Excite, HotBot, InfoSeek, Ultra, Lycos, OpenText Web Index, and Yahoo. Information covered includes Contents, Searching tips, Results, and Update frequency.
  17. What is Schema.org? (2011) 0.01
    0.00639995 = product of:
      0.04479965 = sum of:
        0.036238287 = weight(_text_:web in 4437) [ClassicSimilarity], result of:
          0.036238287 = score(doc=4437,freq=6.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.37471575 = fieldWeight in 4437, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=4437)
        0.00856136 = weight(_text_:information in 4437) [ClassicSimilarity], result of:
          0.00856136 = score(doc=4437,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.16457605 = fieldWeight in 4437, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4437)
      0.14285715 = coord(2/14)
    
    Abstract
    This site provides a collection of schemas, i.e., html tags, that webmasters can use to mark up their pages in ways recognized by major search providers. Search engines including Bing, Google and Yahoo! rely on this markup to improve the display of search results, making it easier for people to find the right web pages. Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure. A shared markup vocabulary makes it easier for webmasters to decide on a markup schema and get the maximum benefit for their efforts. So, in the spirit of sitemaps.org, Bing, Google and Yahoo! have come together to provide a shared collection of schemas that webmasters can use.
  18. El-Ramly, N.; Peterson, R.E.; Volonino, L.: Top ten Web sites using search engines : the case of the desalination industry (1996) 0.01
    0.006041726 = product of:
      0.04229208 = sum of:
        0.036238287 = weight(_text_:web in 945) [ClassicSimilarity], result of:
          0.036238287 = score(doc=945,freq=6.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.37471575 = fieldWeight in 945, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=945)
        0.0060537956 = weight(_text_:information in 945) [ClassicSimilarity], result of:
          0.0060537956 = score(doc=945,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.116372846 = fieldWeight in 945, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=945)
      0.14285715 = coord(2/14)
    
    Abstract
    The desalination industry involves the desalting of sea or brackish water and achieves the purpose of increasing the world's effective water supply. There are approximately 4,000 desalination Web sites. The six major Internet search engines were used to determine, according to each of the six, the top twenty sites for desalination. Each site was visited and the 120 gross returns were pared down to the final ten - the 'Top Ten'. The Top Ten were then analyzed to determine what it was that made the sites useful and informative. The major attributes were: a) currency (up-to-date); b) search site capability; c) access to articles on desalination; d) newsletters; e) databases; f) product information; g) online conferencing; h) valuable links to other sites; i) communication links; j) site maps; and k) case studies. Reasons for having a Web site and the current status and prospects for Internet commerce are discussed.
  19. Warnick, W.L.; Leberman, A.; Scott, R.L.; Spence, K.J.; Johnsom, L.A.; Allen, V.S.: Searching the deep Web : directed query engine applications at the Department of Energy (2001) 0.01
    0.0059565757 = product of:
      0.041696027 = sum of:
        0.029588435 = weight(_text_:web in 1215) [ClassicSimilarity], result of:
          0.029588435 = score(doc=1215,freq=4.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.3059541 = fieldWeight in 1215, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=1215)
        0.012107591 = weight(_text_:information in 1215) [ClassicSimilarity], result of:
          0.012107591 = score(doc=1215,freq=8.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.23274569 = fieldWeight in 1215, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1215)
      0.14285715 = coord(2/14)
    
    Abstract
    Directed Query Engines, an emerging class of search engine specifically designed to access distributed resources on the deep web, offer the opportunity to create inexpensive digital libraries. Already, one such engine, Distributed Explorer, has been used to select and assemble high quality information resources and incorporate them into publicly available systems for the physical sciences. By nesting Directed Query Engines so that one query launches several other engines in a cascading fashion, enormous virtual collections may soon be assembled to form a comprehensive information infrastructure for the physical sciences. Once a Directed Query Engine has been configured for a set of information resources, distributed alerts tools can provide patrons with personalized, profile-based notices of recent additions to any of the selected resources. Due to the potentially enormous size and scope of Directed Query Engine applications, consideration must be given to issues surrounding the representation of large quantities of information from multiple, heterogeneous sources.
  20. Dodge, M.: ¬A map of Yahoo! (2000) 0.01
    0.0057541872 = product of:
      0.04027931 = sum of:
        0.031959165 = weight(_text_:web in 1555) [ClassicSimilarity], result of:
          0.031959165 = score(doc=1555,freq=42.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.3304682 = fieldWeight in 1555, product of:
              6.4807405 = tf(freq=42.0), with freq of:
                42.0 = termFreq=42.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.015625 = fieldNorm(doc=1555)
        0.008320145 = weight(_text_:information in 1555) [ClassicSimilarity], result of:
          0.008320145 = score(doc=1555,freq=34.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.15993917 = fieldWeight in 1555, product of:
              5.8309517 = tf(freq=34.0), with freq of:
                34.0 = termFreq=34.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.015625 = fieldNorm(doc=1555)
      0.14285715 = coord(2/14)
    
    Content
    "Introduction Yahoo! is the undisputed king of the Web directories, providing one of the key information navigation tools on the Internet. It has maintained its popularity over many Internet-years as the most visited Web site, against intense competition. This is because it does a good job of sifting, cataloguing and organising the Web [1]. But what would a map of Yahoo!'s hierarchical classification of the Web look like? Would an interactive map of Yahoo!, rather than the conventional listing of sites, be more useful as a navigational tool? We can get some idea what a map of Yahoo! might be like by taking a look at ET-Map, a prototype developed by Hsinchun Chen and colleagues in the Artificial Intelligence Lab [2] at the University of Arizona. ET-Map was developed in 1995 as part of innovative research in automatic Internet homepage categorization and it charts a large chunk of Yahoo!, from the entertainment section representing some 110,000 different Web links. The map is a two-dimensional, multi-layered category map; its aim is to provide an intuitive visual information browsing tool. ET-Map can be browsed interactively, explored and queried, using the familiar point-and-click navigation style of the Web to find information of interest.
    The View From Above Browsing for a particular piece on information on the Web can often feel like being stuck in an unfamiliar part of town walking around at street level looking for a particular store. You know the store is around there somewhere, but your viewpoint at ground level is constrained. What you really want is to get above the streets, hovering half a mile or so up in the air, to see the whole neighbourhood. This kind of birds-eye view function has been memorably described by David D. Clark, Senior Research Scientist at MIT's Laboratory for Computer Science and the Chairman of the Invisible Worlds Protocol Advisory Board, as the missing "up button" on the browser [3] . ET-Map is a nice example of a prototype for Clark's "up-button" view of an information space. The goal of information maps, like ET-Map, is to provide the browser with a sense of the lie of the information landscape, what is where, the location of clusters and hotspots, what is related to what. Ideally, this 'big-picture' all-in-one visual summary needs to fit on a single standard computer screen. ET-Map is one of my favourite examples, but there are many other interesting information maps being developed by other researchers and companies (see inset at the bottom of this page). How does ET-Map work? Here is a sequence of screenshots of a typical browsing session with ET-Map, which ends with access to Web pages on jazz musician Miles Davis. You can also tryout ET-Map for yourself, using a fully working demo on the AI Lab's website [4] . We begin with the top-level map showing forty odd broad entertainment 'subject regions' represented by regularly shaped tiles. Each tile is a visual summary of a group of Web pages with similar content. These tiles are shaded different colours to differentiate them, while labels identify the subject of the tile and the number in brackets telling you how many individual Web page links it contains. 
ET-Map uses two important, but common-sense, spatial concepts in its organisation and representation of the Web. Firstly, the size of a 'subject region' is directly related to the number of Web pages in that category. For example, the 'MUSIC' subject area contains over 11,000 pages and so has a much larger area than the neighbouring area of 'LIVE', which has only 4,300-odd pages. This is intuitively meaningful, as the largest tiles are visually more prominent on the map and are likely to be more significant as they contain the most links. In addition, a second spatial concept, that of neighbourhood proximity, is applied, so 'subject regions' closely related in terms of content are plotted close to each other on the map. For example, 'FILM' and 'YEAR'S OSCARS', at the bottom left, are neighbours in both semantic and spatial space. This makes sense, as many things in the real world are ordered in this way, with things that are alike being spatially close together (e.g. the layout of goods in a store, or books in a library). Importantly, ET-Map is also a multi-layer map, with sub-maps showing greater informational resolution through a finer degree of categorization. So for any subject region that contains more than two hundred Web pages, a second-level map with more detailed categories is generated. This subdivision of information space is repeated down the hierarchy as far as necessary. In the example, the user selected the 'MUSIC' subject region which, not surprisingly, contained many thousands of pages. A second-level map with numerous different music categories is then presented to the user. Delving deeper, the user wants to learn more about jazz music, so clicking on the 'JAZZ' tile leads to a third-level map, a fine-grained map of jazz-related Web pages. Finally, selecting the 'MILES DAVIS' subject region leads to a more conventional-looking ranking of pages, from which the user selects one to download.
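The size and subdivision rules described above can be illustrated with a short sketch. It is not ET-Map's actual code: the `Region` class, the function names and the assumption that tile area is strictly proportional to page count are hypothetical. The page counts are rounded from the figures quoted in the text, and the 200-page threshold follows the "more than two hundred Web pages" rule.

```python
from dataclasses import dataclass

# "More than two hundred Web pages" triggers a finer-grained sub-map.
SUBMAP_THRESHOLD = 200


@dataclass
class Region:
    """A subject region (tile) on the map; a hypothetical structure."""
    name: str
    pages: int


def area_shares(regions):
    """Share of the map area per region, assumed proportional to page count."""
    total = sum(region.pages for region in regions)
    return {region.name: region.pages / total for region in regions}


def needs_submap(region):
    """True when a region is large enough to get its own lower-level map."""
    return region.pages > SUBMAP_THRESHOLD


# Page counts rounded from the text ("over 11,000" and "4,300-odd").
top_level = [Region("MUSIC", 11000), Region("LIVE", 4300)]

for name, share in area_shares(top_level).items():
    print(f"{name}: {share:.0%} of the area shared by these two tiles")

for region in top_level:
    print(region.name, "gets a second-level map:", needs_submap(region))
```

Applying `needs_submap` again to each category of a second-level map would give the repeated subdivision "down the hierarchy as far as necessary".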
    ET-Map was created using a sophisticated AI technique called the Kohonen self-organizing map, a neural network approach that has been used for automatic analysis and classification of the semantic content of text documents like Web pages. I do not pretend to fully understand how this technique works; I tend to think of it as a clever 'black box' that groups together things that are alike [5]. It is a real challenge to automatically classify pages from a very heterogeneous information collection like the Web into categories that will match the conceptions of a typical user. Directories like Yahoo! tend to rely on the skill of human editors to achieve this. ET-Map is an interesting prototype that I think highlights well the potential for a map-based approach to Web browsing. I am surprised none of the major search engines or directories has introduced the option of mapping results, although I am sure many are working on ideas. People certainly need all the help they can get, as Web growth shows no sign of slowing. Just last month it was reported that the Web had surpassed one billion indexable pages [6].
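For readers who want to look inside the 'black box', here is a minimal textbook-style sketch of a Kohonen self-organizing map. It is not the AI Lab's implementation: the grid size, learning-rate schedule, neighbourhood function and the four tiny term vectors are all assumptions chosen for illustration. Each grid cell holds a weight vector; every page is assigned to its closest cell (the best matching unit), and that cell and its grid neighbours are nudged towards the page, so similar pages end up in the same or nearby cells.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the grid cell whose weight vector is closest to `vector`."""
    def sq_dist(cell):
        return sum((w - x) ** 2 for w, x in zip(weights[cell], vector))
    return min(weights, key=sq_dist)


def train_som(vectors, rows=3, cols=3, epochs=60, seed=0):
    """Train a small rectangular self-organizing map on `vectors`."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for vector in vectors:
            progress = step / total_steps
            # Learning rate and neighbourhood radius both shrink over time.
            rate = 0.5 * (1.0 - progress) + 0.01
            radius = (max(rows, cols) / 2.0) * (1.0 - progress) + 0.5
            bmu = best_matching_unit(weights, vector)
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                # Gaussian neighbourhood: cells near the winner move the most.
                pull = rate * math.exp(-grid_d2 / (2.0 * radius ** 2))
                for i in range(dim):
                    w[i] += pull * (vector[i] - w[i])
            step += 1
    return weights


# Hypothetical term weights over the terms (jazz, trumpet, film, oscar).
pages = {
    "jazz-a": [1.0, 0.8, 0.0, 0.0],
    "jazz-b": [0.9, 1.0, 0.0, 0.0],
    "film-a": [0.0, 0.0, 1.0, 0.7],
    "film-b": [0.0, 0.1, 0.8, 1.0],
}
som = train_som(list(pages.values()))
for name, vec in pages.items():
    print(name, "->", best_matching_unit(som, vec))
```

On a real collection the vectors would have thousands of term dimensions and the grid would be much larger; a category map in the style of ET-Map would additionally need labels for the cells and merging of adjacent cells into regions, which this sketch does not attempt.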
    Information Maps There are many other fascinating examples that employ two-dimensional interactive maps to provide a 'bird's-eye' view of information. They use various underlying techniques of textual analysis and clustering to turn the mass of information into a useful summary map (see "Mining in Textual Mountains" in Mappa.Mundi Magazine). In terms of visual representation they can be divided into two groups: those that generate smooth surfaces and those that produce regular, tiled maps. Unfortunately, we don't have space to examine them in detail, but they are well worth spending some time exploring. I will be covering some of them in future columns.
    Research Prototypes: Visual SiteMap Developed by Xia Lin, based at the College of Library and Information Science, Drexel University. CVG Cyberspace geography visualization, developed by Luc Girardin at The Graduate Institute of International Studies, Switzerland. WEBSOM Maps the thousands of articles posted on Usenet newsgroups. It is being developed by researchers at the Neural Networks Research Centre, Helsinki University of Technology in Finland. TreeMaps Developed by Brian Johnson, Ben Shneiderman and colleagues in the Human-Computer Interaction Lab at the University of Maryland. Commercial Information Maps: NewsMaps Provides interactive information landscapes summarizing daily news stories, developed by Cartia, Inc. Web Squirrel Creates maps known as information farms. It is developed by Eastgate Systems, Inc. Umap Produces interactive maps of Web searches. Map of the Market An interactive map of the market performance of the stocks of major US corporations, developed by SmartMoney.com."

Years