Search (1 results, page 1 of 1)

  • × author_ss:"Austin, D."
  • × theme_ss:"Retrievalalgorithmen"
  1. Austin, D.: How Google finds your needle in the Web's haystack : as we'll see, the trick is to ask the web itself to rank the importance of pages... (2006) 0.02
    0.015172388 = product of:
      0.030344777 = sum of:
        0.030344777 = product of:
          0.060689554 = sum of:
            0.060689554 = weight(_text_:web in 93) [ClassicSimilarity], result of:
              0.060689554 = score(doc=93,freq=16.0), product of:
                0.17002425 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.052098576 = queryNorm
                0.35694647 = fieldWeight in 93, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=93)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly-disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one) so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb. Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contains the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list. One way to determine the importance of pages is to use a human-generated ranking. For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.