Diese Datenbank enthält über 40.000 Dokumente zu Themen aus den Bereichen Formalerschließung – Inhaltserschließung – Information Retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (Stand: 04. Juni 2021)
1Bhansali, D. ; Desai, H. ; Deulkar, K.: ¬A study of different ranking approaches for semantic search.
In: International journal of computer applications. 129(2015) no.5, S12-15.
Abstract: Search Engines have become an integral part of our day to day life. Our reliance on search engines increases with every passing day. With the amount of data available on Internet increasing exponentially, it becomes important to develop new methods and tools that help to return results relevant to the queries and reduce the time spent on searching. The results should be diverse but at the same time should return results focused on the queries asked. Relation Based Page Rank  algorithms are considered to be the next frontier in improvement of Semantic Web Search. The probability of finding relevance in the search results as posited by the user while entering the query is used to measure the relevance. However, its application is limited by the complexity of determining relation between the terms and assigning explicit meaning to each term. Trust Rank is one of the most widely used ranking algorithms for semantic web search. Few other ranking algorithms like HITS algorithm, PageRank algorithm are also used for Semantic Web Searching. In this paper, we will provide a comparison of few ranking approaches.
Inhalt: Vgl. auch: http://www.ijcaonline.org/research/volume129/number5/bhansali-2015-ijca-906896.pdf.
Themenfeld: Suchmaschinen ; Retrievalalgorithmen ; Semantisches Umfeld in Indexierung u. Retrieval
Objekt: SemRank ; HITS
2Cheng, S. ; YunTao, P. ; JunPeng, Y. ; Hong, G. ; ZhengLu, Y. ; ZhiYu, H.: PageRank, HITS and impact factor for journal ranking.
In: Proceeding CSIE '09: Proceedings of the 2009 WRI World Congress on Computer Science and Information Engineering - Volume 06. Washington, DC : IEEE Computer Society, 2009. S.285-290.
Abstract: Journal citation measures are one of the most widely used bibliometric tools. The most well-known measure is the ISI Impact Factor, under the standard definition, the impact factor of journal j in a given year is the average number of citations received by papers published in the previous two years of journal j. However, the impact factor has its "intrinsic" limitations, it is a ranking measure based fundamentally on a pure counting of the in-degrees of nodes in the network, and its calculation does not take into account the "impact" or "prestige" of the journals in which the citations appear. Google's PageRank algorithm and Kleinberg's HITS method are webpage ranking algorithm, they compute the scores of webpages based on a combination of the number of hyperlinks that point to the page and the status of pages that the hyperlinks originate from, a page is important if it is pointed to by other important pages. We demonstrate how popular webpage algorithm PageRank and HITS can be used ranking journal, and we compared ISI impact factor, PageRank and HITS for journal ranking, and with PageRank and HITS compute respectively including self-citation and non self-citation, and discussed the merit and shortcomings and the scope of application that the various algorithms are used to rank journal.
Inhalt: Vgl.: DOI: 10.1109/CSIE.2009.351.
Themenfeld: Suchmaschinen ; Informetrie
Objekt: PageRank ; HITS
3Langville, A.N. ; Meyer, C.D.: Google's PageRank and beyond : the science of search engine rankings.
Princeton : Princeton Univ. Press, 2006. X, 224 S.
Abstract: Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other Web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of Web page rankings, "Google's PageRank and Beyond" supplies the answers to these and other questions and more. The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research. The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB codes and links to sample Web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text. Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided. It includes: many illustrative examples and entertaining asides; MATLAB code; accessible and informal style; and complete and self-contained section for mathematics review.
Inhalt: Inhalt: Chapter 1. Introduction to Web Search Engines: 1.1 A Short History of Information Retrieval - 1.2 An Overview of Traditional Information Retrieval - 1.3 Web Information Retrieval Chapter 2. Crawling, Indexing, and Query Processing: 2.1 Crawling - 2.2 The Content Index - 2.3 Query Processing Chapter 3. Ranking Webpages by Popularity: 3.1 The Scene in 1998 - 3.2 Two Theses - 3.3 Query-Independence Chapter 4. The Mathematics of Google's PageRank: 4.1 The Original Summation Formula for PageRank - 4.2 Matrix Representation of the Summation Equations - 4.3 Problems with the Iterative Process - 4.4 A Little Markov Chain Theory - 4.5 Early Adjustments to the Basic Model - 4.6 Computation of the PageRank Vector - 4.7 Theorem and Proof for Spectrum of the Google Matrix Chapter 5. Parameters in the PageRank Model: 5.1 The a Factor - 5.2 The Hyperlink Matrix H - 5.3 The Teleportation Matrix E Chapter 6. The Sensitivity of PageRank; 6.1 Sensitivity with respect to alpha - 6.2 Sensitivity with respect to H - 6.3 Sensitivity with respect to vT - 6.4 Other Analyses of Sensitivity - 6.5 Sensitivity Theorems and Proofs Chapter 7. The PageRank Problem as a Linear System: 7.1 Properties of (I - alphaS) - 7.2 Properties of (I - alphaH) - 7.3 Proof of the PageRank Sparse Linear System Chapter 8. Issues in Large-Scale Implementation of PageRank: 8.1 Storage Issues - 8.2 Convergence Criterion - 8.3 Accuracy - 8.4 Dangling Nodes - 8.5 Back Button Modeling ; Chapter 9. Accelerating the Computation of PageRank: 9.1 An Adaptive Power Method - 9.2 Extrapolation - 9.3 Aggregation - 9.4 Other Numerical Methods Chapter 10. Updating the PageRank Vector: 10.1 The Two Updating Problems and their History - 10.2 Restarting the Power Method - 10.3 Approximate Updating Using Approximate Aggregation - 10.4 Exact Aggregation - 10.5 Exact vs. Approximate Aggregation - 10.6 Updating with Iterative Aggregation - 10.7 Determining the Partition - 10.8 Conclusions Chapter 11. The HITS Method for Ranking Webpages: 11.1 The HITS Algorithm - 11.2 HITS Implementation - 11.3 HITS Convergence - 11.4 HITS Example - 11.5 Strengths and Weaknesses of HITS - 11.6 HITS's Relationship to Bibliometrics - 11.7 Query-Independent HITS - 11.8 Accelerating HITS - 11.9 HITS Sensitivity Chapter 12. Other Link Methods for Ranking Webpages: 12.1 SALSA - 12.2 Hybrid Ranking Methods - 12.3 Rankings based on Traffic Flow Chapter 13. The Future of Web Information Retrieval: 13.1 Spam - 13.2 Personalization - 13.3 Clustering - 13.4 Intelligent Agents - 13.5 Trends and Time-Sensitive Search - 13.6 Privacy and Censorship - 13.7 Library Classification Schemes - 13.8 Data Fusion Chapter 14. Resources for Web Information Retrieval: 14.1 Resources for Getting Started - 14.2 Resources for Serious Study Chapter 15. The Mathematics Guide: 15.1 Linear Algebra - 15.2 Perron-Frobenius Theory - 15.3 Markov Chains - 15.4 Perron Complementation - 15.5 Stochastic Complementation - 15.6 Censoring - 15.7 Aggregation - 15.8 Disaggregation
Themenfeld: Suchmaschinen ; Retrievalalgorithmen
Objekt: Google ; PageRank ; HITS-Algorithmus
RSWK: Google / Web-Seite / Rangstatistik (HEBIS) ; Webpage / Rangstatistik (GBV) ; Google / Suchmaschine / Ranking (BVB)
BK: 54.65 / Webentwicklung / Webanwendungen ; 31.80 / Angewandte Mathematik ; 54.32 / Rechnerkommunikation ; 06.74 / Informationssysteme ; 06.70 / Katalogisierung / Bestandserschließung
4Agosti, M. ; Pretto, L.: ¬A theoretical study of a generalized version of kleinberg's HITS algorithm.
In: Advances in mathematical/formal methods in information retrieval. 8(2005) no.2 , S.219-243.
Abstract: Kleinberg's HITS (Hyperlink-Induced Topic Search) algorithm (Kleinberg 1999), which was originally developed in a Web context, tries to infer the authoritativeness of a Web page in relation to a specific query using the structure of a subgraph of the Web graph, which is obtained considering this specific query. Recent applications of this algorithm in contexts far removed from that of Web searching (Bacchin, Ferro and Melucci 2002, Ng et al. 2001) inspired us to study the algorithm in the abstract, independently of its particular applications, trying to mathematically illuminate its behaviour. In the present paper we detail this theoretical analysis. The original work starts from the definition of a revised and more general version of the algorithm, which includes the classic one as a particular case. We perform an analysis of the structure of two particular matrices, essential to studying the behaviour of the algorithm, and we prove the convergence of the algorithm in the most general case, finding the analytic expression of the vectors to which it converges. Then we study the symmetry of the algorithm and prove the equivalence between the existence of symmetry and the independence from the order of execution of some basic operations on initial vectors. Finally, we expound some interesting consequences of our theoretical results.
Themenfeld: Suchmaschinen ; Retrievalalgorithmen
5Berry, M.W. ; Browne, M.: Understanding search engines : mathematical modeling and text retrieval.2nd ed.
Philadelphia, PA : SIAM, 2005. XVII, 117 S.
(Software, environments, tools; 17)
Abstract: The second edition of Understanding Search Engines: Mathematical Modeling and Text Retrieval follows the basic premise of the first edition by discussing many of the key design issues for building search engines and emphasizing the important role that applied mathematics can play in improving information retrieval. The authors discuss important data structures, algorithms, and software as well as user-centered issues such as interfaces, manual indexing, and document preparation. Significant changes bring the text up to date on current information retrieval methods: for example the addition of a new chapter on link-structure algorithms used in search engines such as Google. The chapter on user interface has been rewritten to specifically focus on search engine usability. In addition the authors have added new recommendations for further reading and expanded the bibliography, and have updated and streamlined the index to make it more reader friendly.
Inhalt: Inhalt: Introduction Document File Preparation - Manual Indexing - Information Extraction - Vector Space Modeling - Matrix Decompositions - Query Representations - Ranking and Relevance Feedback - Searching by Link Structure - User Interface - Book Format Document File Preparation Document Purification and Analysis - Text Formatting - Validation - Manual Indexing - Automatic Indexing - Item Normalization - Inverted File Structures - Document File - Dictionary List - Inversion List - Other File Structures Vector Space Models Construction - Term-by-Document Matrices - Simple Query Matching - Design Issues - Term Weighting - Sparse Matrix Storage - Low-Rank Approximations Matrix Decompositions QR Factorization - Singular Value Decomposition - Low-Rank Approximations - Query Matching - Software - Semidiscrete Decomposition - Updating Techniques Query Management Query Binding - Types of Queries - Boolean Queries - Natural Language Queries - Thesaurus Queries - Fuzzy Queries - Term Searches - Probabilistic Queries Ranking and Relevance Feedback Performance Evaluation - Precision - Recall - Average Precision - Genetic Algorithms - Relevance Feedback Searching by Link Structure HITS Method - HITS Implementation - HITS Summary - PageRank Method - PageRank Adjustments - PageRank Implementation - PageRank Summary User Interface Considerations General Guidelines - Search Engine Interfaces - Form Fill-in - Display Considerations - Progress Indication - No Penalties for Error - Results - Test and Retest - Final Considerations Further Reading
Themenfeld: Suchmaschinen ; Retrievalalgorithmen
Objekt: HITS-Algorithmus ; PageRank
LCSH: Web search engines ; Vector spaces ; Text processing (Computer science)
RSWK: Suchmaschine / Information Retrieval ; Suchmaschine / Information Retrieval / Mathematisches Modell (HEBIS)
BK: 06.44 / IuD-Einrichtungen ; 06.74 / Informationssysteme ; 31.80 / Angewandte Mathematik
LCC: TK5105.884.B47 2005
6Chakrabarti, S. ; Dom, B. ; Kumar, S.R. ; Raghavan, P. ; Rajagopalan, S. ; Tomkins, A. ; Kleinberg, J.M. ; Gibson, D.: Neue Pfade durch den Internet-Dschungel : Die zweite Generation von Web-Suchmaschinen.
In: Spektrum der Wissenschaft. 1999, H.8, S.44-49.
Abstract: Die im WWW verfügbare Datenmenge wächst mit atemberaubender Geschwindigkeit; entsprechend schwieriger wird es, relevante Informationen zu finden. ein neues Analyseverfahren stellt nahezu automatische Abhilfe in Aussicht
Inhalt: Ausnutzen der Hyperlinks für verbesserte Such- und Findeverfahren; Darstellung des HITS-Algorithmus
Anmerkung: Vgl. auch: http://www.almaden.ibm.com/cs/k53/clever.html
Themenfeld: Suchmaschinen ; Retrievalalgorithmen
Objekt: Google ; HITS-Algorithmus
7Kleinberg, J.M.: Authoritative sources in a hyperlinked environment.
In: Journal of the Association for Computing Machinery. 46(1998) no.5, S.604-632.
Abstract: The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of "authoritative" information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
Inhalt: Vorversionen auch in: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1998, und als IBM Research Report RJ 10076, May 1997.
8Li, L. ; Shang, Y. ; Zhang, W.: Improvement of HITS-based algorithms on Web documents.
In: WWW '02: Proceedings of the 11th International Conference on World Wide Web, May 7-11, 2002, Honolulu, Hawaii, USA. New York : ACM, S.527-535.
Abstract: In this paper, we present two ways to improve the precision of HITS-based algorithms onWeb documents. First, by analyzing the limitations of current HITS-based algorithms, we propose a new weighted HITS-based method that assigns appropriate weights to in-links of root documents. Then, we combine content analysis with HITS-based algorithms and study the effects of four representative relevance scoring methods, VSM, Okapi, TLS, and CDR, using a set of broad topic queries. Our experimental results show that our weighted HITS-based method performs significantly better than Bharat's improved HITS algorithm. When we combine our weighted HITS-based method or Bharat's HITS algorithm with any of the four relevance scoring methods, the combined methods are only marginally better than our weighted HITS-based method. Between the four relevance scoring methods, there is no significant quality difference when they are combined with a HITS-based algorithm.
Inhalt: Vgl.: http%3A%2F%2Fdelab.csd.auth.gr%2F~dimitris%2Fcourses%2Fir_spring06%2Fpage_rank_computing%2Fp527-li.pdf. Vgl. auch: http://www2002.org/CDROM/refereed/643/.