This database contains more than 40,000 documents on topics from the fields of descriptive cataloguing – subject indexing – information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of: 28 April 2022)
21 Berry, M.W. ; Browne, M.: Understanding search engines : mathematical modeling and text retrieval. 2nd ed.
Philadelphia, PA : SIAM, 2005. XVII, 117 S.
(Software, environments, tools; 17)
Abstract: The second edition of Understanding Search Engines: Mathematical Modeling and Text Retrieval follows the basic premise of the first edition by discussing many of the key design issues for building search engines and emphasizing the important role that applied mathematics can play in improving information retrieval. The authors discuss important data structures, algorithms, and software as well as user-centered issues such as interfaces, manual indexing, and document preparation. Significant changes bring the text up to date on current information retrieval methods: for example, the addition of a new chapter on link-structure algorithms used in search engines such as Google. The chapter on user interfaces has been rewritten to specifically focus on search engine usability. In addition, the authors have added new recommendations for further reading and expanded the bibliography, and have updated and streamlined the index to make it more reader friendly.
Inhalt: Introduction (Document File Preparation - Manual Indexing - Information Extraction - Vector Space Modeling - Matrix Decompositions - Query Representations - Ranking and Relevance Feedback - Searching by Link Structure - User Interface - Book Format)
Document File Preparation: Document Purification and Analysis - Text Formatting - Validation - Manual Indexing - Automatic Indexing - Item Normalization - Inverted File Structures - Document File - Dictionary List - Inversion List - Other File Structures
Vector Space Models: Construction - Term-by-Document Matrices - Simple Query Matching - Design Issues - Term Weighting - Sparse Matrix Storage - Low-Rank Approximations
Matrix Decompositions: QR Factorization - Singular Value Decomposition - Low-Rank Approximations - Query Matching - Software - Semidiscrete Decomposition - Updating Techniques
Query Management: Query Binding - Types of Queries - Boolean Queries - Natural Language Queries - Thesaurus Queries - Fuzzy Queries - Term Searches - Probabilistic Queries
Ranking and Relevance Feedback: Performance Evaluation - Precision - Recall - Average Precision - Genetic Algorithms - Relevance Feedback
Searching by Link Structure: HITS Method - HITS Implementation - HITS Summary - PageRank Method - PageRank Adjustments - PageRank Implementation - PageRank Summary
User Interface Considerations: General Guidelines - Search Engine Interfaces - Form Fill-in - Display Considerations - Progress Indication - No Penalties for Error - Results - Test and Retest - Final Considerations
Further Reading
Themenfeld: Suchmaschinen ; Retrievalalgorithmen
Objekt: HITS-Algorithmus ; PageRank
LCSH: Web search engines ; Vector spaces ; Text processing (Computer science)
RSWK: Suchmaschine / Information Retrieval ; Suchmaschine / Information Retrieval / Mathematisches Modell (HEBIS)
BK: 06.44 / IuD-Einrichtungen ; 06.74 / Informationssysteme ; 31.80 / Angewandte Mathematik
LCC: TK5105.884.B47 2005
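The vector-space machinery the chapter list names (term-by-document matrices, simple query matching) can be illustrated with a minimal sketch; the three terms, the three toy documents, and the raw-frequency weights below are invented for the example:

```python
import math

# Term-by-document matrix as raw term frequencies.
# Rows correspond to the (invented) terms "search", "engine", "matrix";
# each entry in docs is one document column.
docs = {
    "d1": [2, 1, 0],
    "d2": [0, 1, 2],
    "d3": [1, 0, 1],
}

def cosine(u, v):
    """Cosine of the angle between two term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

query = [1, 1, 0]  # a query containing "search" and "engine"
scores = {name: cosine(query, vec) for name, vec in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)  # best match first
```

Real systems replace raw counts with a term-weighting scheme such as tf-idf and approximate the matrix by a low-rank decomposition such as the truncated SVD, which is the subject of the Matrix Decompositions chapter.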
22 Boldi, P. ; Santini, M. ; Vigna, S.: PageRank as a function of the damping factor.
In: http://vigna.di.unimi.it/ftp/papers/PageRankAsFunction.pdf [Proceedings of the ACM World Wide Web Conference (WWW), 2005].
Abstract: PageRank is defined as the stationary state of a Markov chain. The chain is obtained by perturbing the transition matrix induced by a web graph with a damping factor alpha that spreads uniformly part of the rank. The choice of alpha is eminently empirical, and in most cases the original suggestion alpha = 0.85 by Brin and Page is still used. Recently, however, the behaviour of PageRank with respect to changes in alpha was discovered to be useful in link-spam detection. Moreover, an analytical justification of the value chosen for alpha is still missing. In this paper, we give the first mathematical analysis of PageRank when alpha changes. In particular, we show that, contrary to popular belief, for real-world graphs values of alpha close to 1 do not give a more meaningful ranking. Then, we give closed-form formulae for PageRank derivatives of any order, and an extension of the Power Method that approximates them with convergence O(t^k alpha^t) for the k-th derivative. Finally, we show a tight connection between iterated computation and analytical behaviour by proving that the k-th iteration of the Power Method gives exactly the PageRank value obtained using a Maclaurin polynomial of degree k. The latter result paves the way towards the application of analytical methods to the study of PageRank.
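How the ranking depends on the damping factor is easy to probe with a plain Power Method sketch; the three-node graph is invented, and this computes ordinary PageRank only, not the derivative extension the paper develops:

```python
def pagerank(links, alpha=0.85, tol=1e-12, max_iter=10000):
    """Power method for PageRank with damping factor alpha.

    links maps each node to its list of out-links; every node here
    has at least one out-link, so dangling nodes are not handled.
    """
    nodes = list(links)
    n = len(nodes)
    x = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # (1 - alpha)/n is the uniformly spread part of the rank.
        new = {v: (1 - alpha) / n for v in nodes}
        for u, outs in links.items():
            share = alpha * x[u] / len(outs)
            for v in outs:
                new[v] += share
        if sum(abs(new[v] - x[v]) for v in nodes) < tol:
            return new
        x = new
    return x

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
for alpha in (0.5, 0.85, 0.99):
    ranks = pagerank(graph, alpha)  # compare the vectors as alpha varies
```

Convergence of the plain iteration slows as alpha approaches 1 (the error shrinks roughly like alpha^t), which is one practical reason the empirical alpha = 0.85 persists.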
23 Haveliwala, T.: Context-Sensitive Web search.
Stanford : Stanford University, 2005. XVII, 158 S.
Abstract: As the Web continues to grow and encompass broader and more diverse sources of information, providing effective search facilities to users becomes an increasingly challenging problem. To help users deal with the deluge of Web-accessible information, we propose a search system which makes use of context to improve search results in a scalable way. By context, we mean any sources of information, in addition to any search query, that provide clues about the user's true information need. For instance, a user's bookmarks and search history can be considered a part of the search context. We consider two types of context-based search. The first type of functionality we consider is "similarity search." In this case, as the user is browsing Web pages, URLs for pages similar to the current page are retrieved and displayed in a side panel. No query is explicitly issued; context alone (i.e., the page currently being viewed) is used to provide the user with useful related information. The second type of functionality involves taking search context into account when ranking results to standard search queries. Web search differs from traditional information retrieval tasks in several major ways, making effective context-sensitive Web search challenging. First, scalability is of critical importance. With billions of publicly accessible documents, the Web is much larger than traditional datasets. Similarly, with millions of search queries issued each day, the query load is much higher than for traditional information retrieval systems. Second, there are no guarantees on the quality of Web pages, with Web-authors taking an adversarial, rather than cooperative, approach in attempts to inflate the rankings of their pages. Third, there is a significant amount of metadata embodied in the link structure corresponding to the hyperlinks between Web pages that can be exploited during the retrieval process.
In this thesis, we design a search system, using the Stanford WebBase platform, that exploits the link structure of the Web to provide scalable, context-sensitive search.
Inhalt: Ph.D. Dissertation, Stanford University, May 2005. See also: http://infolab.stanford.edu/~taherh/papers/taher_thesis.pdf.
25 Kamvar, S. ; Haveliwala, T. ; Golub, G.: Adaptive methods for the computation of PageRank.
Stanford : Stanford University, 2003. 13 S.
(Stanford University Technical Report; April 2003)
Abstract: We observe that the convergence patterns of pages in the PageRank algorithm have a nonuniform distribution. Specifically, many pages converge to their true PageRank quickly, while relatively few pages take a much longer time to converge. Furthermore, we observe that these slow-converging pages are generally those pages with high PageRank. We use this observation to devise a simple algorithm to speed up the computation of PageRank, in which the PageRank of pages that have converged is not recomputed at each iteration after convergence. This algorithm, which we call Adaptive PageRank, speeds up the computation of PageRank by nearly 30%.
Inhalt: Accepted for publication by NSMC '03. See also: http://infolab.stanford.edu/~taherh/papers/adaptive.pdf.
26 Haveliwala, T. ; Kamvar, S.: ¬The second eigenvalue of the Google matrix.
Stanford : Stanford University, 2003. 8 S.
(Stanford University Technical Report; March 2003)
Abstract: We determine analytically the modulus of the second eigenvalue for the web hyperlink matrix used by Google for computing PageRank. Specifically, we prove the following statement: "For any matrix A = (cP + (1-c)E)^T, where P is an n×n row-stochastic matrix, E is a nonnegative n×n rank-one row-stochastic matrix, and 0 <= c <= 1, the second eigenvalue of A has modulus |λ_2| <= c. Furthermore, if P has at least two irreducible closed subsets, the second eigenvalue is λ_2 = c." This statement has implications for the convergence rate of the standard PageRank algorithm as the web scales, for the stability of PageRank to perturbations to the link structure of the web, for the detection of Google spammers, and for the design of algorithms to speed up PageRank.
Inhalt: See also: http://infolab.stanford.edu/~taherh/papers/secondeigenvalue.pdf.
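The bound is easy to check numerically in a tiny case. Taking P = I (which has two irreducible closed subsets) and E uniform, the theorem predicts λ_2 = c exactly; the 2×2 construction below is invented purely for illustration:

```python
import math

def google_matrix_2x2(c):
    """A = (c*P + (1-c)*E)^T with P = I and E = [[0.5, 0.5], [0.5, 0.5]].

    P = I is row-stochastic with two irreducible closed subsets,
    so the theorem predicts lambda_2 = c exactly. A is symmetric
    here, so the transpose changes nothing.
    """
    diag = c + (1 - c) * 0.5
    off = (1 - c) * 0.5
    return [[diag, off], [off, diag]]

def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its characteristic polynomial."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = math.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

c = 0.85
lam1, lam2 = eigenvalues_2x2(google_matrix_2x2(c))
# lam1 is the Perron eigenvalue 1; lam2 equals the damping factor c.
```

Since the Power Method converges at a rate governed by |λ_2|/|λ_1|, the result explains why the standard choice c = 0.85 gives PageRank its predictable convergence speed.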
27 Rogers, I.: ¬The Google Pagerank algorithm and how it works.
Abstract: PageRank is a topic much discussed by Search Engine Optimisation (SEO) experts. At the heart of PageRank is a mathematical formula that seems scary to look at but is actually fairly simple to understand. Despite this, many people seem to get it wrong! In particular, "Chris Ridings of www.searchenginesystems.net" has written a paper entitled "PageRank Explained: Everything you've always wanted to know about PageRank", pointed to by many people, that contains a fundamental mistake early on in the explanation! Unfortunately this means some of the recommendations in the paper are not quite accurate. By showing code to correctly calculate real PageRank I hope to achieve several things in this response:
- Clearly explain how PageRank is calculated.
- Go through every example in Chris' paper, and add some more of my own, showing the correct PageRank for each diagram. By showing the code used to calculate each diagram I've opened myself up to peer review - mostly in an effort to make sure the examples are correct, but also because the code can help explain the PageRank calculations.
- Describe some principles and observations on website design based on these correctly calculated examples.
Any good web designer should take the time to fully understand how PageRank really works - if you don't then your site's layout could be seriously hurting your Google listings! [Note: I have nothing in particular against Chris. If I find any other papers on the subject I'll try to comment evenly]
Inhalt: See also: www.ianrogers.net/google-page-rank/.
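The formula Rogers walks through is the one from Brin and Page's original paper, PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), under which the ranks sum to the number of pages rather than to 1. A minimal sketch of that iteration (the two-page site is invented for the example):

```python
def pagerank_brin_page(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A.

    links maps each page to its out-link list; C(T) is the out-link count.
    """
    ranks = {page: 1.0 for page in links}  # the paper's starting guess
    for _ in range(iterations):
        new = {}
        for page in links:
            inbound = sum(ranks[src] / len(outs)
                          for src, outs in links.items() if page in outs)
            new[page] = (1 - d) + d * inbound
        ranks = new
    return ranks

# Two pages linking only to each other: both settle at PR = 1.0.
site = {"home": ["about"], "about": ["home"]}
ranks = pagerank_brin_page(site)
```

The common mistake Rogers targets is dividing an in-linking page's rank by the wrong link count; as the formula shows, each page T divides its rank by its own number of out-links C(T), not by the number of links pointing at A.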
28 Schöch, V.C.: ¬Die Suchmaschine Google.
Abstract: Google is the first search engine with roots at a university. It was originally implemented by S. Brin and L. Page of Stanford merely as a proof of concept [BP98]. It delivered the sought-after proof of the implemented concepts so convincingly, however, that within a few months it became one of the most popular general-purpose search engines. Among the concepts implemented for test purposes in the Google search engine are, in particular, two methods for ranking search results, which this paper examines in detail: PageRank (Section 6) and anchor text (Section 7).
Objekt: Google ; PageRank