Search (24 results, page 1 of 2)

  • × theme_ss:"Retrievalalgorithmen"
  • × theme_ss:"Suchmaschinen"
  • × year_i:[2000 TO 2010}
  1. Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (2005) 0.05
    0.04618574 = product of:
      0.17319651 = sum of:
        0.08941177 = weight(_text_:suchmaschine in 7) [ClassicSimilarity], result of:
          0.08941177 = score(doc=7,freq=8.0), product of:
            0.17890577 = queryWeight, product of:
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.031640913 = queryNorm
            0.4997702 = fieldWeight in 7, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.03125 = fieldNorm(doc=7)
        0.038118195 = weight(_text_:software in 7) [ClassicSimilarity], result of:
          0.038118195 = score(doc=7,freq=6.0), product of:
            0.12552431 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.031640913 = queryNorm
            0.3036718 = fieldWeight in 7, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03125 = fieldNorm(doc=7)
        0.024604581 = weight(_text_:evaluation in 7) [ClassicSimilarity], result of:
          0.024604581 = score(doc=7,freq=2.0), product of:
            0.13272417 = queryWeight, product of:
              4.1947007 = idf(docFreq=1811, maxDocs=44218)
              0.031640913 = queryNorm
            0.18538132 = fieldWeight in 7, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1947007 = idf(docFreq=1811, maxDocs=44218)
              0.03125 = fieldNorm(doc=7)
        0.021061972 = weight(_text_:web in 7) [ClassicSimilarity], result of:
          0.021061972 = score(doc=7,freq=4.0), product of:
            0.10326045 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031640913 = queryNorm
            0.2039694 = fieldWeight in 7, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=7)
      0.26666668 = coord(4/15)
    
    Abstract
    The second edition of Understanding Search Engines: Mathematical Modeling and Text Retrieval follows the basic premise of the first edition by discussing many of the key design issues for building search engines and emphasizing the important role that applied mathematics can play in improving information retrieval. The authors discuss important data structures, algorithms, and software as well as user-centered issues such as interfaces, manual indexing, and document preparation. Significant changes bring the text up to date on current information retrieval methods: for example the addition of a new chapter on link-structure algorithms used in search engines such as Google. The chapter on user interface has been rewritten to specifically focus on search engine usability. In addition the authors have added new recommendations for further reading and expanded the bibliography, and have updated and streamlined the index to make it more reader friendly.
    Content
    Inhalt: Introduction Document File Preparation - Manual Indexing - Information Extraction - Vector Space Modeling - Matrix Decompositions - Query Representations - Ranking and Relevance Feedback - Searching by Link Structure - User Interface - Book Format Document File Preparation Document Purification and Analysis - Text Formatting - Validation - Manual Indexing - Automatic Indexing - Item Normalization - Inverted File Structures - Document File - Dictionary List - Inversion List - Other File Structures Vector Space Models Construction - Term-by-Document Matrices - Simple Query Matching - Design Issues - Term Weighting - Sparse Matrix Storage - Low-Rank Approximations Matrix Decompositions QR Factorization - Singular Value Decomposition - Low-Rank Approximations - Query Matching - Software - Semidiscrete Decomposition - Updating Techniques Query Management Query Binding - Types of Queries - Boolean Queries - Natural Language Queries - Thesaurus Queries - Fuzzy Queries - Term Searches - Probabilistic Queries Ranking and Relevance Feedback Performance Evaluation - Precision - Recall - Average Precision - Genetic Algorithms - Relevance Feedback Searching by Link Structure HITS Method - HITS Implementation - HITS Summary - PageRank Method - PageRank Adjustments - PageRank Implementation - PageRank Summary User Interface Considerations General Guidelines - Search Engine Interfaces - Form Fill-in - Display Considerations - Progress Indication - No Penalties for Error - Results - Test and Retest - Final Considerations Further Reading
    LCSH
    Web search engines
    RSWK
    Suchmaschine / Information Retrieval
    Suchmaschine / Information Retrieval / Mathematisches Modell (HEBIS)
    Series
    Software, environments, tools; 17
    Subject
    Web search engines
    Suchmaschine / Information Retrieval
    Suchmaschine / Information Retrieval / Mathematisches Modell (HEBIS)
  2. Thelwall, M.; Vaughan, L.: New versions of PageRank employing alternative Web document models (2004) 0.02
    0.020364448 = product of:
      0.15273336 = sum of:
        0.063185915 = weight(_text_:web in 674) [ClassicSimilarity], result of:
          0.063185915 = score(doc=674,freq=16.0), product of:
            0.10326045 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031640913 = queryNorm
            0.6119082 = fieldWeight in 674, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=674)
        0.08954745 = weight(_text_:site in 674) [ClassicSimilarity], result of:
          0.08954745 = score(doc=674,freq=4.0), product of:
            0.1738463 = queryWeight, product of:
              5.494352 = idf(docFreq=493, maxDocs=44218)
              0.031640913 = queryNorm
            0.5150955 = fieldWeight in 674, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.494352 = idf(docFreq=493, maxDocs=44218)
              0.046875 = fieldNorm(doc=674)
      0.13333334 = coord(2/15)
    
    Abstract
    Introduces several new versions of PageRank (the link based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and most Web information retrieval algorithms, other research has suggested that aggregating pages based on directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms introduced based on these alternatives were used to rank four sets of Web pages. The ranking results were compared with human subjects' rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that includes pages from different Web sites; however, it does not work well in ranking pages that are from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.
  3. Kanaeva, Z.: Ranking: Google und CiteSeer (2005) 0.02
    0.016752748 = product of:
      0.12564561 = sum of:
        0.11064143 = weight(_text_:suchmaschine in 3276) [ClassicSimilarity], result of:
          0.11064143 = score(doc=3276,freq=4.0), product of:
            0.17890577 = queryWeight, product of:
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.031640913 = queryNorm
            0.6184341 = fieldWeight in 3276, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3276)
        0.015004174 = product of:
          0.030008348 = sum of:
            0.030008348 = weight(_text_:22 in 3276) [ClassicSimilarity], result of:
              0.030008348 = score(doc=3276,freq=2.0), product of:
                0.110801086 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.031640913 = queryNorm
                0.2708308 = fieldWeight in 3276, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3276)
          0.5 = coord(1/2)
      0.13333334 = coord(2/15)
    
    Abstract
    Im Rahmen des klassischen Information Retrieval wurden verschiedene Verfahren für das Ranking sowie die Suche in einer homogenen strukturlosen Dokumentenmenge entwickelt. Die Erfolge der Suchmaschine Google haben gezeigt dass die Suche in einer zwar inhomogenen aber zusammenhängenden Dokumentenmenge wie dem Internet unter Berücksichtigung der Dokumentenverbindungen (Links) sehr effektiv sein kann. Unter den von der Suchmaschine Google realisierten Konzepten ist ein Verfahren zum Ranking von Suchergebnissen (PageRank), das in diesem Artikel kurz erklärt wird. Darüber hinaus wird auf die Konzepte eines Systems namens CiteSeer eingegangen, welches automatisch bibliographische Angaben indexiert (engl. Autonomous Citation Indexing, ACI). Letzteres erzeugt aus einer Menge von nicht vernetzten wissenschaftlichen Dokumenten eine zusammenhängende Dokumentenmenge und ermöglicht den Einsatz von Banking-Verfahren, die auf den von Google genutzten Verfahren basieren.
    Date
    20. 3.2005 16:23:22
  4. Austin, D.: How Google finds your needle in the Web's haystack : as we'll see, the trick is to ask the web itself to rank the importance of pages... (2006) 0.02
    0.015528813 = product of:
      0.077644065 = sum of:
        0.019256605 = weight(_text_:software in 93) [ClassicSimilarity], result of:
          0.019256605 = score(doc=93,freq=2.0), product of:
            0.12552431 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.031640913 = queryNorm
            0.15340936 = fieldWeight in 93, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
        0.02152901 = weight(_text_:evaluation in 93) [ClassicSimilarity], result of:
          0.02152901 = score(doc=93,freq=2.0), product of:
            0.13272417 = queryWeight, product of:
              4.1947007 = idf(docFreq=1811, maxDocs=44218)
              0.031640913 = queryNorm
            0.16220866 = fieldWeight in 93, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1947007 = idf(docFreq=1811, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
        0.036858454 = weight(_text_:web in 93) [ClassicSimilarity], result of:
          0.036858454 = score(doc=93,freq=16.0), product of:
            0.10326045 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031640913 = queryNorm
            0.35694647 = fieldWeight in 93, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
      0.2 = coord(3/15)
    
    Abstract
    Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly-disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one) so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb. Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contains the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list. One way to determine the importance of pages is to use a human-generated ranking. For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.
  5. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.02
    0.015483252 = product of:
      0.116124384 = sum of:
        0.08611604 = weight(_text_:evaluation in 3445) [ClassicSimilarity], result of:
          0.08611604 = score(doc=3445,freq=2.0), product of:
            0.13272417 = queryWeight, product of:
              4.1947007 = idf(docFreq=1811, maxDocs=44218)
              0.031640913 = queryNorm
            0.64883465 = fieldWeight in 3445, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1947007 = idf(docFreq=1811, maxDocs=44218)
              0.109375 = fieldNorm(doc=3445)
        0.030008348 = product of:
          0.060016695 = sum of:
            0.060016695 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.060016695 = score(doc=3445,freq=2.0), product of:
                0.110801086 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.031640913 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.5 = coord(1/2)
      0.13333334 = coord(2/15)
    
    Date
    25. 8.2005 17:42:22
  6. Thelwall, M.: Can Google's PageRank be used to find the most important academic Web pages? (2003) 0.02
    0.0151029965 = product of:
      0.113272466 = sum of:
        0.049952857 = weight(_text_:web in 4457) [ClassicSimilarity], result of:
          0.049952857 = score(doc=4457,freq=10.0), product of:
            0.10326045 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031640913 = queryNorm
            0.48375595 = fieldWeight in 4457, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=4457)
        0.06331961 = weight(_text_:site in 4457) [ClassicSimilarity], result of:
          0.06331961 = score(doc=4457,freq=2.0), product of:
            0.1738463 = queryWeight, product of:
              5.494352 = idf(docFreq=493, maxDocs=44218)
              0.031640913 = queryNorm
            0.3642275 = fieldWeight in 4457, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.494352 = idf(docFreq=493, maxDocs=44218)
              0.046875 = fieldNorm(doc=4457)
      0.13333334 = coord(2/15)
    
    Abstract
    Google's PageRank is an influential algorithm that uses a model of Web use that is dominated by its link structure in order to rank pages by their estimated value to the Web community. This paper reports on the outcome of applying the algorithm to the Web sites of three national university systems in order to test whether it is capable of identifying the most important Web pages. The results are also compared with simple inlink counts. It was discovered that the highest inlinked pages do not always have the highest PageRank, indicating that the two metrics are genuinely different, even for the top pages. More significantly, however, internal links dominated external links for the high ranks in either method and superficial reasons accounted for high scores in both cases. It is concluded that PageRank is not useful for identifying the top pages in a site and that it must be combined with a powerful text matching techniques in order to get the quality of information retrieval results provided by Google.
  7. Stock, M.; Stock, W.G.: Internet-Suchwerkzeuge im Vergleich (IV) : Relevance Ranking nach "Popularität" von Webseiten: Google (2001) 0.01
    0.014100287 = product of:
      0.10575215 = sum of:
        0.06705883 = weight(_text_:suchmaschine in 5771) [ClassicSimilarity], result of:
          0.06705883 = score(doc=5771,freq=2.0), product of:
            0.17890577 = queryWeight, product of:
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.031640913 = queryNorm
            0.37482765 = fieldWeight in 5771, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.046875 = fieldNorm(doc=5771)
        0.038693316 = weight(_text_:web in 5771) [ClassicSimilarity], result of:
          0.038693316 = score(doc=5771,freq=6.0), product of:
            0.10326045 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031640913 = queryNorm
            0.37471575 = fieldWeight in 5771, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=5771)
      0.13333334 = coord(2/15)
    
    Abstract
    In unserem Retrievaltest von Suchwerkzeugen im World Wide Web (Password 11/2000) schnitt die Suchmaschine Google am besten ab. Im Vergleich zu anderen Search Engines setzt Google kaum auf Informationslinguistik, sondern auf Algorithmen, die sich aus den Besonderheiten der Web-Dokumente ableiten lassen. Kernstück der informationsstatistischen Technik ist das "PageRank"- Verfahren (benannt nach dem Entwickler Larry Page), das aus der Hypertextstruktur des Web die "Popularität" von Seiten anhand ihrer ein- und ausgehenden Links berechnet. Google besticht durch das Angebot intuitiv verstehbarer Suchbildschirme sowie durch einige sehr nützliche "Kleinigkeiten" wie die Angabe des Rangs einer Seite, Highlighting, Suchen in der Seite, Suchen innerhalb eines Suchergebnisses usw., alles verstaut in einer eigenen Befehlsleiste innerhalb des Browsers. Ähnlich wie RealNames bietet Google mit dem Produkt "AdWords" den Aufkauf von Suchtermen an. Nach einer Reihe von nunmehr vier Password-Artikeln über InternetSuchwerkzeugen im Vergleich wollen wir abschließend zu einer Bewertung kommen. Wie ist der Stand der Technik bei Directories und Search Engines aus informationswissenschaftlicher Sicht einzuschätzen? Werden die "typischen" Internetnutzer, die ja in der Regel keine Information Professionals sind, adäquat bedient? Und können auch Informationsfachleute von den Suchwerkzeugen profitieren?
  8. Langville, A.N.; Meyer, C.D.: Google's PageRank and beyond : the science of search engine rankings (2006) 0.01
    0.010790287 = product of:
      0.08092715 = sum of:
        0.047417752 = weight(_text_:suchmaschine in 6) [ClassicSimilarity], result of:
          0.047417752 = score(doc=6,freq=4.0), product of:
            0.17890577 = queryWeight, product of:
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.031640913 = queryNorm
            0.26504317 = fieldWeight in 6, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.0234375 = fieldNorm(doc=6)
        0.033509392 = weight(_text_:web in 6) [ClassicSimilarity], result of:
          0.033509392 = score(doc=6,freq=18.0), product of:
            0.10326045 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031640913 = queryNorm
            0.32451332 = fieldWeight in 6, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0234375 = fieldNorm(doc=6)
      0.13333334 = coord(2/15)
    
    Abstract
    Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other Web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of Web page rankings, "Google's PageRank and Beyond" supplies the answers to these and other questions and more. The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research. The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB codes and links to sample Web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text. Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided. It includes: many illustrative examples and entertaining asides; MATLAB code; accessible and informal style; and complete and self-contained section for mathematics review.
    Content
    Inhalt: Chapter 1. Introduction to Web Search Engines: 1.1 A Short History of Information Retrieval - 1.2 An Overview of Traditional Information Retrieval - 1.3 Web Information Retrieval Chapter 2. Crawling, Indexing, and Query Processing: 2.1 Crawling - 2.2 The Content Index - 2.3 Query Processing Chapter 3. Ranking Webpages by Popularity: 3.1 The Scene in 1998 - 3.2 Two Theses - 3.3 Query-Independence Chapter 4. The Mathematics of Google's PageRank: 4.1 The Original Summation Formula for PageRank - 4.2 Matrix Representation of the Summation Equations - 4.3 Problems with the Iterative Process - 4.4 A Little Markov Chain Theory - 4.5 Early Adjustments to the Basic Model - 4.6 Computation of the PageRank Vector - 4.7 Theorem and Proof for Spectrum of the Google Matrix Chapter 5. Parameters in the PageRank Model: 5.1 The a Factor - 5.2 The Hyperlink Matrix H - 5.3 The Teleportation Matrix E Chapter 6. The Sensitivity of PageRank; 6.1 Sensitivity with respect to alpha - 6.2 Sensitivity with respect to H - 6.3 Sensitivity with respect to vT - 6.4 Other Analyses of Sensitivity - 6.5 Sensitivity Theorems and Proofs Chapter 7. The PageRank Problem as a Linear System: 7.1 Properties of (I - alphaS) - 7.2 Properties of (I - alphaH) - 7.3 Proof of the PageRank Sparse Linear System Chapter 8. Issues in Large-Scale Implementation of PageRank: 8.1 Storage Issues - 8.2 Convergence Criterion - 8.3 Accuracy - 8.4 Dangling Nodes - 8.5 Back Button Modeling
    Chapter 9. Accelerating the Computation of PageRank: 9.1 An Adaptive Power Method - 9.2 Extrapolation - 9.3 Aggregation - 9.4 Other Numerical Methods Chapter 10. Updating the PageRank Vector: 10.1 The Two Updating Problems and their History - 10.2 Restarting the Power Method - 10.3 Approximate Updating Using Approximate Aggregation - 10.4 Exact Aggregation - 10.5 Exact vs. Approximate Aggregation - 10.6 Updating with Iterative Aggregation - 10.7 Determining the Partition - 10.8 Conclusions Chapter 11. The HITS Method for Ranking Webpages: 11.1 The HITS Algorithm - 11.2 HITS Implementation - 11.3 HITS Convergence - 11.4 HITS Example - 11.5 Strengths and Weaknesses of HITS - 11.6 HITS's Relationship to Bibliometrics - 11.7 Query-Independent HITS - 11.8 Accelerating HITS - 11.9 HITS Sensitivity Chapter 12. Other Link Methods for Ranking Webpages: 12.1 SALSA - 12.2 Hybrid Ranking Methods - 12.3 Rankings based on Traffic Flow Chapter 13. The Future of Web Information Retrieval: 13.1 Spam - 13.2 Personalization - 13.3 Clustering - 13.4 Intelligent Agents - 13.5 Trends and Time-Sensitive Search - 13.6 Privacy and Censorship - 13.7 Library Classification Schemes - 13.8 Data Fusion Chapter 14. Resources for Web Information Retrieval: 14.1 Resources for Getting Started - 14.2 Resources for Serious Study Chapter 15. The Mathematics Guide: 15.1 Linear Algebra - 15.2 Perron-Frobenius Theory - 15.3 Markov Chains - 15.4 Perron Complementation - 15.5 Stochastic Complementation - 15.6 Censoring - 15.7 Aggregation - 15.8 Disaggregation
    RSWK
    Google / Web-Seite / Rangstatistik (HEBIS)
    Google / Suchmaschine / Ranking (BVB)
    Subject
    Google / Web-Seite / Rangstatistik (HEBIS)
    Google / Suchmaschine / Ranking (BVB)
  9. Chen, Z.; Meng, X.; Fowler, R.H.; Zhu, B.: Real-time adaptive feature and document learning for Web search (2001) 0.01
    0.0076110926 = product of:
      0.057083193 = sum of:
        0.030755727 = weight(_text_:evaluation in 5209) [ClassicSimilarity], result of:
          0.030755727 = score(doc=5209,freq=2.0), product of:
            0.13272417 = queryWeight, product of:
              4.1947007 = idf(docFreq=1811, maxDocs=44218)
              0.031640913 = queryNorm
            0.23172665 = fieldWeight in 5209, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1947007 = idf(docFreq=1811, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5209)
        0.026327467 = weight(_text_:web in 5209) [ClassicSimilarity], result of:
          0.026327467 = score(doc=5209,freq=4.0), product of:
            0.10326045 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031640913 = queryNorm
            0.25496176 = fieldWeight in 5209, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5209)
      0.13333334 = coord(2/15)
    
    Abstract
    Chen et alia report on the design of FEATURES, a web search engine with adaptive features based on minimal relevance feedback. Rather than developing user profiles from previous searcher activity either at the server or client location, or updating indexes after search completion, FEATURES allows for index and user characterization files to be updated during query modification on retrieval from a general purpose search engine. Indexing terms relevant to a query are defined as the union of all terms assigned to documents retrieved by the initial search run and are used to build a vector space model on this retrieved set. The top ten weighted terms are presented to the user for a relevant non-relevant choice which is used to modify the term weights. Documents are chosen if their summed term weights are greater than some threshold. A user evaluation of the top ten ranked documents as non-relevant will decrease these term weights and a positive judgement will increase them. A new ordering of the retrieved set will generate new display lists of terms and documents. Precision is improved in a test on Alta Vista searches.
  10. Meghabghab, G.: Google's Web page ranking applied to different topological Web graph structures (2001) 0.01
    0.0059520523 = product of:
      0.08928078 = sum of:
        0.08928078 = weight(_text_:web in 6028) [ClassicSimilarity], result of:
          0.08928078 = score(doc=6028,freq=46.0), product of:
            0.10326045 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031640913 = queryNorm
            0.86461735 = fieldWeight in 6028, product of:
              6.78233 = tf(freq=46.0), with freq of:
                46.0 = termFreq=46.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6028)
      0.06666667 = coord(1/15)
    
    Abstract
    This research is part of the ongoing study to better understand web page ranking on the web. It looks at a web page as a graph structure or a web graph, and tries to classify different web graphs in the new coordinate space: (out-degree, in-degree). The out-degree coordinate od is defined as the number of outgoing web pages from a given web page. The in-degree id coordinate is the number of web pages that point to a given web page. In this new coordinate space a metric is built to classify how close or far different web graphs are. Google's web ranking algorithm (Brin & Page, 1998) on ranking web pages is applied in this new coordinate space. The results of the algorithm has been modified to fit different topological web graph structures. Also the algorithm was not successful in the case of general web graphs and new ranking web algorithms have to be considered. This study does not look at enhancing web ranking by adding any contextual information. It only considers web links as a source to web page ranking. The author believes that understanding the underlying web page as a graph will help design better ranking web algorithms, enhance retrieval and web performance, and recommends using graphs as a part of visual aid for browsing engine designers
  11. Watters, C.; Amoudi, A.: Geosearcher : location-based ranking of search engine results (2003) 0.00
    0.004974859 = product of:
      0.07462288 = sum of:
        0.07462288 = weight(_text_:site in 5152) [ClassicSimilarity], result of:
          0.07462288 = score(doc=5152,freq=4.0), product of:
            0.1738463 = queryWeight, product of:
              5.494352 = idf(docFreq=493, maxDocs=44218)
              0.031640913 = queryNorm
            0.42924625 = fieldWeight in 5152, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.494352 = idf(docFreq=493, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5152)
      0.06666667 = coord(1/15)
    
    Abstract
    Waters and Amoudi describe GeoSearcher, a prototype ranking program that arranges search engine results along a geo-spatial dimension without the provision of geo-spatial meta-tags or the use of geo-spatial feature extraction. GeoSearcher uses URL analysis, IptoLL, Whois, and the Getty Thesaurus of Geographic Names to determine site location. It accepts the first 200 sites returned by a search engine, identifies the coordinates, calculates their distance from a reference point and ranks in ascending order by this value. For any retrieved site the system checks if it has already been located in the current session, then sends the domain name to Whois to generate a return of a two letter country code and an area code. With no success the name is stripped one level and resent. If this fails the top level domain is tested for being a country code. Any remaining unmatched names go to IptoLL. Distance is calculated using the center point of the geographic area and a provided reference location. A test run on a set of 100 URLs from a search was successful in locating 90 sites. Eighty three pages could be manually found and 68 had sufficient information to verify location determination. Of these 65 ( 95%) had been assigned reasonably correct geographic locations. A random set of URLs used instead of a search result, yielded 80% success.
  12. Lanvent, A.: Licht im Daten Chaos (2004) 0.00
    0.0047318903 = product of:
      0.07097835 = sum of:
        0.07097835 = sum of:
          0.012879624 = weight(_text_:online in 2806) [ClassicSimilarity], result of:
            0.012879624 = score(doc=2806,freq=2.0), product of:
              0.096027054 = queryWeight, product of:
                3.0349014 = idf(docFreq=5778, maxDocs=44218)
                0.031640913 = queryNorm
              0.13412495 = fieldWeight in 2806, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.0349014 = idf(docFreq=5778, maxDocs=44218)
                0.03125 = fieldNorm(doc=2806)
          0.05809873 = weight(_text_:recherche in 2806) [ClassicSimilarity], result of:
            0.05809873 = score(doc=2806,freq=4.0), product of:
              0.17150146 = queryWeight, product of:
                5.4202437 = idf(docFreq=531, maxDocs=44218)
                0.031640913 = queryNorm
              0.33876523 = fieldWeight in 2806, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4202437 = idf(docFreq=531, maxDocs=44218)
                0.03125 = fieldNorm(doc=2806)
      0.06666667 = coord(1/15)
    
    Content
    "Bitte suchen Sie alle Unterlagen, die im PC zum Ibelshäuser-Vertrag in Sprockhövel gespeichert sind. Finden Sie alles, was wir haben - Dokumente, Tabellen, Präsentationen, Scans, E-Mails. Und erledigen Sie das gleich! « Wer diese Aufgabe an das Windows-eigene Suchmodul vergibt, wird zwangsläufig enttäuscht. Denn das Betriebssystem beherrscht weder die formatübergreifende Recherche noch die Kontextsuche, die für solche komplexen Aufträge nötig sind. Professionelle Desktop-Suchmaschinen erledigen Aufgaben dieser Art jedoch im Handumdrehen - genauer gesagt in einer einzigen Sekunde. Spitzenprogramme wie Global Brain benötigen dafür nicht einmal umfangreiche Abfrageformulare. Es genügt, einen Satz im Eingabefeld zu formulieren, der das Thema der gewünschten Dokumente eingrenzt. Dabei suchen die Programme über alle Laufwerke, die sich auf dem System einbinden lassen - also auch im Netzwerk-Ordner (Shared Folder), sofern dieser freigegeben wurde. Allen Testkandidaten - mit Ausnahme von Search 32 - gemeinsam ist, dass sie weitaus bessere Rechercheergebnisse abliefern als Windows, deutlich schneller arbeiten und meist auch in den Online-Postfächern stöbern. Wer schon öfter vergeblich über die Windows-Suche nach wichtigen Dokumenten gefahndet hat, kommt angesichts der Qualität der Search-Engines kaum mehr um die Anschaffung eines Desktop-Suchtools herum. Aber Microsoft will nachbessern. Für den Windows-XP-Nachfolger Longhorn wirbt der Hersteller vor allem mit dem Hinweis auf das neue Dateisystem WinFS, das sämtliche Files auf der Festplatte über Meta-Tags indiziert und dem Anwender damit lange Suchläufe erspart. So sollen sich anders als bei Windows XP alle Dateien zu bestimmten Themen in wenigen Sekunden auflisten lassen - unabhängig vom Format und vom physikalischen Speicherort der Files. Für die Recherche selbst ist dann weder der Dateiname noch das Erstelldatum ausschlaggebend. Anhand der kontextsensitiven Suche von WinFS kann der Anwender einfach einen Suchbefehl wie »Vertragsabschluss mit Firma XYZ, Neunkirchen/Saar« eingeben, der dann ohne Umwege zum Ziel führt."
  13. Zhang, D.; Dong, Y.: ¬An effective algorithm to rank Web resources (2000) 0.00
    0.0034750483 = product of:
      0.052125722 = sum of:
        0.052125722 = weight(_text_:web in 3662) [ClassicSimilarity], result of:
          0.052125722 = score(doc=3662,freq=2.0), product of:
            0.10326045 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031640913 = queryNorm
            0.50479853 = fieldWeight in 3662, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.109375 = fieldNorm(doc=3662)
      0.06666667 = coord(1/15)
    
  14. Dominich, S.; Skrop, A.: PageRank and interaction information retrieval (2005) 0.00
    0.0029786127 = product of:
      0.044679187 = sum of:
        0.044679187 = weight(_text_:web in 3268) [ClassicSimilarity], result of:
          0.044679187 = score(doc=3268,freq=8.0), product of:
            0.10326045 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031640913 = queryNorm
            0.43268442 = fieldWeight in 3268, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=3268)
      0.06666667 = coord(1/15)
    
    Abstract
    The PageRank method is used by the Google Web search engine to compute the importance of Web pages. Two different views have been developed for the Interpretation of the PageRank method and values: (a) stochastic (random surfer): the PageRank values can be conceived as the steady-state distribution of a Markov chain, and (b) algebraic: the PageRank values form the eigenvector corresponding to eigenvalue 1 of the Web link matrix. The Interaction Information Retrieval (1**2 R) method is a nonclassical information retrieval paradigm, which represents a connectionist approach based an dynamic systems. In the present paper, a different Interpretation of PageRank is proposed, namely, a dynamic systems viewpoint, by showing that the PageRank method can be formally interpreted as a particular case of the Interaction Information Retrieval method; and thus, the PageRank values may be interpreted as neutral equilibrium points of the Web.
  15. Radev, D.; Fan, W.; Qu, H.; Wu, H.; Grewal, A.: Probabilistic question answering on the Web (2005) 0.00
    0.0025795544 = product of:
      0.038693316 = sum of:
        0.038693316 = weight(_text_:web in 3455) [ClassicSimilarity], result of:
          0.038693316 = score(doc=3455,freq=6.0), product of:
            0.10326045 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031640913 = queryNorm
            0.37471575 = fieldWeight in 3455, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=3455)
      0.06666667 = coord(1/15)
    
    Abstract
    Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this article, we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of .20 an the TREC8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
  16. Agosti, M.; Pretto, L.: ¬A theoretical study of a generalized version of kleinberg's HITS algorithm (2005) 0.00
    0.0024821775 = product of:
      0.03723266 = sum of:
        0.03723266 = weight(_text_:web in 4) [ClassicSimilarity], result of:
          0.03723266 = score(doc=4,freq=8.0), product of:
            0.10326045 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031640913 = queryNorm
            0.36057037 = fieldWeight in 4, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4)
      0.06666667 = coord(1/15)
    
    Abstract
    Kleinberg's HITS (Hyperlink-Induced Topic Search) algorithm (Kleinberg 1999), which was originally developed in a Web context, tries to infer the authoritativeness of a Web page in relation to a specific query using the structure of a subgraph of the Web graph, which is obtained considering this specific query. Recent applications of this algorithm in contexts far removed from that of Web searching (Bacchin, Ferro and Melucci 2002, Ng et al. 2001) inspired us to study the algorithm in the abstract, independently of its particular applications, trying to mathematically illuminate its behaviour. In the present paper we detail this theoretical analysis. The original work starts from the definition of a revised and more general version of the algorithm, which includes the classic one as a particular case. We perform an analysis of the structure of two particular matrices, essential to studying the behaviour of the algorithm, and we prove the convergence of the algorithm in the most general case, finding the analytic expression of the vectors to which it converges. Then we study the symmetry of the algorithm and prove the equivalence between the existence of symmetry and the independence from the order of execution of some basic operations on initial vectors. Finally, we expound some interesting consequences of our theoretical results.
  17. Weiß, B.: Verwandte Seiten finden : "Ähnliche Seiten" oder "What's Related" (2005) 0.00
    0.0022873797 = product of:
      0.034310695 = sum of:
        0.034310695 = product of:
          0.06862139 = sum of:
            0.06862139 = weight(_text_:analyse in 868) [ClassicSimilarity], result of:
              0.06862139 = score(doc=868,freq=4.0), product of:
                0.16670908 = queryWeight, product of:
                  5.268782 = idf(docFreq=618, maxDocs=44218)
                  0.031640913 = queryNorm
                0.4116236 = fieldWeight in 868, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.268782 = idf(docFreq=618, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=868)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    Die Link-Struktur-Analyse (LSA) ist nicht nur beim Crawling, dem Webseitenranking, der Abgrenzung geographischer Bereiche, der Vorhersage von Linkverwendungen, dem Auffinden von "Mirror"-Seiten, dem Kategorisieren von Webseiten und beim Generieren von Webseitenstatistiken eines der wichtigsten Analyseverfahren, sondern auch bei der Suche nach verwandten Seiten. Um qualitativ hochwertige verwandte Seiten zu finden, bildet sie nach herrschender Meinung den Hauptbestandteil bei der Identifizierung von ähnlichen Seiten innerhalb themenspezifischer Graphen vernetzter Dokumente. Dabei wird stets von zwei Annahmen ausgegangen: Links zwischen zwei Dokumenten implizieren einen verwandten Inhalt beider Dokumente und wenn die Dokumente aus unterschiedlichen Quellen (von unterschiedlichen Autoren, Hosts, Domänen, .) stammen, so bedeutet dies das eine Quelle die andere über einen Link empfiehlt. Aufbauend auf dieser Idee entwickelte Kleinberg 1998 den HITS Algorithmus um verwandte Seiten über die Link-Struktur-Analyse zu bestimmen. Dieser Ansatz wurde von Bharat und Henzinger weiterentwickelt und später auch in Algorithmen wie dem Companion und Cocitation Algorithmus zur Suche von verwandten Seiten basierend auf nur einer Anfrage-URL weiter verfolgt. In der vorliegenden Seminararbeit sollen dabei die Algorithmen, die hinter diesen Überlegungen stehen, näher erläutert werden und im Anschluss jeweils neuere Forschungsansätze auf diesem Themengebiet aufgezeigt werden.
  18. Ding, Y.; Chowdhury, G.; Foo, S.: Organsising keywords in a Web search environment : a methodology based on co-word analysis (2000) 0.00
    0.0021061972 = product of:
      0.031592958 = sum of:
        0.031592958 = weight(_text_:web in 105) [ClassicSimilarity], result of:
          0.031592958 = score(doc=105,freq=4.0), product of:
            0.10326045 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031640913 = queryNorm
            0.3059541 = fieldWeight in 105, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=105)
      0.06666667 = coord(1/15)
    
    Abstract
    The rapid development of the Internet and World Wide Web has caused some critical problem for information retrieval. Researchers have made several attempts to solve these problems. Thesauri and subject heading lists as traditional information retrieval tools have been criticised for their efficiency to tackle these newly emerging problems. This paper proposes an information retrieval tool generated by cocitation analysis, comprising keyword clusters with relationships based on the co-occurrences of keywords in the literature. Such a tool can play the role of an associative thesaurus that can provide information about the keywords in a domain that might be useful for information searching and query expansion
  19. Lempel, R.; Moran, S.: SALSA: the stochastic approach for link-structure analysis (2001) 0.00
    0.0017551646 = product of:
      0.026327467 = sum of:
        0.026327467 = weight(_text_:web in 10) [ClassicSimilarity], result of:
          0.026327467 = score(doc=10,freq=4.0), product of:
            0.10326045 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.031640913 = queryNorm
            0.25496176 = fieldWeight in 10, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=10)
      0.06666667 = coord(1/15)
    
    Abstract
    Today, when searching for information on the WWW, one usually performs a query through a term-based search engine. These engines return, as the query's result, a list of Web pages whose contents matches the query. For broad-topic queries, such searches often result in a huge set of retrieved documents, many of which are irrelevant to the user. However, much information is contained in the link-structure of the WWW. Information such as which pages are linked to others can be used to augment search algorithms. In this context, Jon Kleinberg introduced the notion of two distinct types of Web pages: hubs and authorities. Kleinberg argued that hubs and authorities exhibit a mutually reinforcing relationship: a good hub will point to many authorities, and a good authority will be pointed at by many hubs. In light of this, he dervised an algoirthm aimed at finding authoritative pages. We present SALSA, a new stochastic approach for link-structure analysis, which examines random walks on graphs derived from the link-structure. We show that both SALSA and Kleinberg's Mutual Reinforcement approach employ the same metaalgorithm. We then prove that SALSA is quivalent to a weighted in degree analysis of the link-sturcutre of WWW subgraphs, making it computationally more efficient than the Mutual reinforcement approach. We compare that results of applying SALSA to the results derived through Kleinberg's approach. These comparisions reveal a topological Phenomenon called the TKC effectwhich, in certain cases, prevents the Mutual reinforcement approach from identifying meaningful authorities.
  20. Notess, G.R.: Search engine relevance : the never-ending quest (2000) 0.00
    0.0015026229 = product of:
      0.022539342 = sum of:
        0.022539342 = product of:
          0.045078684 = sum of:
            0.045078684 = weight(_text_:online in 4797) [ClassicSimilarity], result of:
              0.045078684 = score(doc=4797,freq=2.0), product of:
                0.096027054 = queryWeight, product of:
                  3.0349014 = idf(docFreq=5778, maxDocs=44218)
                  0.031640913 = queryNorm
                0.46943733 = fieldWeight in 4797, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0349014 = idf(docFreq=5778, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4797)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Source
    Online. 24(2000) no.3, S.35-40