Search (61 results, page 1 of 4)

  • × year_i:[2000 TO 2010}
  • × theme_ss:"Retrievalalgorithmen"
  1. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.26
    0.26002502 = product of:
      0.39003754 = sum of:
        0.093939 = weight(_text_:search in 3445) [ClassicSimilarity], result of:
          0.093939 = score(doc=3445,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.5376164 = fieldWeight in 3445, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.109375 = fieldNorm(doc=3445)
        0.29609853 = sum of:
          0.20074052 = weight(_text_:engines in 3445) [ClassicSimilarity], result of:
            0.20074052 = score(doc=3445,freq=2.0), product of:
              0.25542772 = queryWeight, product of:
                5.080822 = idf(docFreq=746, maxDocs=44218)
                0.05027291 = queryNorm
              0.7858995 = fieldWeight in 3445, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.080822 = idf(docFreq=746, maxDocs=44218)
                0.109375 = fieldNorm(doc=3445)
          0.095358 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
            0.095358 = score(doc=3445,freq=2.0), product of:
              0.17604718 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05027291 = queryNorm
              0.5416616 = fieldWeight in 3445, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.109375 = fieldNorm(doc=3445)
      0.6666667 = coord(2/3)
    
    Date
    25. 8.2005 17:42:22
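The explain tree above can be reproduced by hand: Lucene's ClassicSimilarity scores each matching term clause as queryWeight x fieldWeight, where queryWeight = idf x queryNorm and fieldWeight = tf x idf x fieldNorm, with tf = sqrt(termFreq) and idf = 1 + ln(maxDocs / (docFreq + 1)). A minimal sketch (the function is ours, not Lucene's API), using the numbers shown for the _text_:search clause of document 3445:

```python
import math

def classic_similarity_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    """Reproduce one weight(...) clause of a Lucene ClassicSimilarity explain tree."""
    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # idf(docFreq, maxDocs)
    tf = math.sqrt(freq)                               # tf(freq) = sqrt(termFreq)
    query_weight = idf * query_norm                    # idf * queryNorm
    field_weight = tf * idf * field_norm               # tf * idf * fieldNorm
    return query_weight * field_weight

# The _text_:search clause of doc 3445 above:
w = classic_similarity_weight(freq=2.0, doc_freq=3718, max_docs=44218,
                              query_norm=0.05027291, field_norm=0.109375)
print(round(w, 6))  # 0.093939, matching the explain output
```

The per-document score then sums the matching clause weights and scales by the coord factor, here 0.6666667 = coord(2/3) because two of the three query clauses matched.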
  2. Stock, W.G.: On relevance distributions (2006) 0.13
    0.12821087 = product of:
      0.1923163 = sum of:
        0.0929755 = weight(_text_:search in 5116) [ClassicSimilarity], result of:
          0.0929755 = score(doc=5116,freq=6.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.5321022 = fieldWeight in 5116, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0625 = fieldNorm(doc=5116)
        0.099340804 = product of:
          0.19868161 = sum of:
            0.19868161 = weight(_text_:engines in 5116) [ClassicSimilarity], result of:
              0.19868161 = score(doc=5116,freq=6.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.7778389 = fieldWeight in 5116, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5116)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
There are at least three possible ways that documents are distributed by relevance: informetric (power law), inverse logistic, and dichotomous. The nature of the type of distribution has implications for the construction of relevance ranking algorithms for search engines, for automated (blind) relevance feedback, for user behavior when using Web search engines, for combining the outputs of search engines for metasearch, for topic detection and tracking, and for the methodology of evaluation of information retrieval systems.
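The three candidate shapes named in the abstract can be sketched as score-by-rank functions; the parameterisations below are illustrative assumptions, not Stock's definitions:

```python
import math

def power_law(rank, c=1.0, a=1.0):
    """Informetric (power-law) relevance: relevance decays as rank**-a."""
    return c * rank ** (-a)

def inverse_logistic(rank, b=0.5, n=50):
    """Inverse logistic: a plateau of relevant documents, then a steep drop."""
    return 1.0 / (1.0 + math.exp(b * (rank - n / 2)))

def dichotomous(rank, cutoff=10):
    """Dichotomous: a document is simply relevant or not."""
    return 1.0 if rank <= cutoff else 0.0

print([round(power_law(r), 2) for r in (1, 2, 10)])         # [1.0, 0.5, 0.1]
print([round(inverse_logistic(r), 2) for r in (1, 25, 50)])  # plateau, midpoint, tail
```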
  3. Wills, R.S.: Google's PageRank : the math behind the search engine (2006) 0.13
    0.12667575 = product of:
      0.19001363 = sum of:
        0.12588942 = weight(_text_:search in 5954) [ClassicSimilarity], result of:
          0.12588942 = score(doc=5954,freq=44.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.72046983 = fieldWeight in 5954, product of:
              6.6332498 = tf(freq=44.0), with freq of:
                44.0 = termFreq=44.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.03125 = fieldNorm(doc=5954)
        0.06412421 = product of:
          0.12824842 = sum of:
            0.12824842 = weight(_text_:engines in 5954) [ClassicSimilarity], result of:
              0.12824842 = score(doc=5954,freq=10.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.50209284 = fieldWeight in 5954, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5954)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
Approximately 91 million American adults use the Internet on a typical day. The number-one Internet activity is reading and writing e-mail. Search engine use is next in line and continues to increase in popularity. In fact, survey findings indicate that nearly 60 million American adults use search engines on a given day. Even though there are many Internet search engines, Google, Yahoo!, and MSN receive over 81% of all search requests. Despite claims that the quality of search provided by Yahoo! and MSN now equals that of Google, Google continues to thrive as the search engine of choice, receiving over 46% of all search requests, nearly double the volume of Yahoo! and over four times that of MSN. I use Google's search engine on a daily basis and rarely request information from other search engines. One day, I decided to visit the homepages of Google, Yahoo!, and MSN to compare the quality of search results. Coffee was on my mind that day, so I entered the simple query "coffee" in the search box at each homepage. Table 1 shows the top ten (unsponsored) results returned by each search engine. Although ordered differently, two webpages, www.peets.com and www.coffeegeek.com, appear in all three top ten lists. In addition, each pairing of top ten lists has two additional results in common. Depending on the information I hoped to obtain about coffee by using the search engines, I could argue that any one of the three returned better results; however, I was not looking for a particular webpage, so all three listings of search results seemed of equal quality. Thus, I plan to continue using Google. My decision is indicative of the problem Yahoo!, MSN, and other search engine companies face in the quest to obtain a larger percentage of Internet search volume. Search engine users are loyal to one or a few search engines and are generally happy with search results. Thus, as long as Google continues to provide results deemed high in quality, Google likely will remain the top search engine. But what set Google apart from its competitors in the first place? The answer is PageRank. In this article I explain this simple mathematical algorithm that revolutionized Web search.
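The algorithm the article goes on to explain can be summarised in a few lines: a page's rank is the sum of the ranks of the pages linking to it, each divided by that linking page's number of outlinks, iterated until the vector stabilises. A minimal sketch on a hypothetical three-page web (no damping or dangling-node handling, which the full method adds):

```python
def pagerank(links, iterations=50):
    """links maps each page to the pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new = {p: 0.0 for p in pages}
        for p, outs in links.items():
            share = rank[p] / len(outs)  # a page splits its rank over its outlinks
            for q in outs:
                new[q] += share
        rank = new
    return rank

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical three-page web
print(pagerank(web))  # converges to A: 0.4, B: 0.2, C: 0.4
```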
  4. Khoo, C.S.G.; Wan, K.-W.: ¬A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.12
    0.122785255 = product of:
      0.18417788 = sum of:
        0.11015324 = weight(_text_:search in 2509) [ClassicSimilarity], result of:
          0.11015324 = score(doc=2509,freq=44.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.6304111 = fieldWeight in 2509, product of:
              6.6332498 = tf(freq=44.0), with freq of:
                44.0 = termFreq=44.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2509)
        0.07402463 = sum of:
          0.05018513 = weight(_text_:engines in 2509) [ClassicSimilarity], result of:
            0.05018513 = score(doc=2509,freq=2.0), product of:
              0.25542772 = queryWeight, product of:
                5.080822 = idf(docFreq=746, maxDocs=44218)
                0.05027291 = queryNorm
              0.19647488 = fieldWeight in 2509, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.080822 = idf(docFreq=746, maxDocs=44218)
                0.02734375 = fieldNorm(doc=2509)
          0.0238395 = weight(_text_:22 in 2509) [ClassicSimilarity], result of:
            0.0238395 = score(doc=2509,freq=2.0), product of:
              0.17604718 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05027291 = queryNorm
              0.1354154 = fieldWeight in 2509, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.02734375 = fieldNorm(doc=2509)
      0.6666667 = coord(2/3)
    
    Abstract
A relevancy-ranking algorithm for a natural language interface to Boolean online public access catalogs (OPACs) was formulated and compared with that currently used in a knowledge-based search interface called the E-Referencer, being developed by the authors. The algorithm makes use of seven well-known ranking criteria: breadth of match, section weighting, proximity of query words, variant word forms (stemming), document frequency, term frequency and document length. The algorithm converts a natural language query into a series of increasingly broader Boolean search statements. In a small experiment with ten subjects in which the algorithm was simulated by hand, the algorithm obtained good results with a mean overall precision of 0.42 and mean average precision of 0.62, representing a 27 percent improvement in precision and 41 percent improvement in average precision compared to the E-Referencer. The usefulness of each step in the algorithm was analyzed and suggestions are made for improving the algorithm.
    Content
    "Most Web search engines accept natural language queries, perform some kind of fuzzy matching and produce ranked output, displaying first the documents that are most likely to be relevant. On the other hand, most library online public access catalogs (OPACs) an the Web are still Boolean retrieval systems that perform exact matching, and require users to express their search requests precisely in a Boolean search language and to refine their search statements to improve the search results. It is well-documented that users have difficulty searching Boolean OPACs effectively (e.g. Borgman, 1996; Ensor, 1992; Wallace, 1993). One approach to making OPACs easier to use is to develop a natural language search interface that acts as a middleware between the user's Web browser and the OPAC system. The search interface can accept a natural language query from the user and reformulate it as a series of Boolean search statements that are then submitted to the OPAC. The records retrieved by the OPAC are ranked by the search interface before forwarding them to the user's Web browser. The user, then, does not need to interact directly with the Boolean OPAC but with the natural language search interface or search intermediary. The search interface interacts with the OPAC system an the user's behalf. The advantage of this approach is that no modification to the OPAC or library system is required. Furthermore, the search interface can access multiple OPACs, acting as a meta search engine, and integrate search results from various OPACs before sending them to the user. The search interface needs to incorporate a method for converting the user's natural language query into a series of Boolean search statements, and for ranking the OPAC records retrieved. The purpose of this study was to develop a relevancyranking algorithm for a search interface to Boolean OPAC systems. This is part of an on-going effort to develop a knowledge-based search interface to OPACs called the E-Referencer (Khoo et al., 1998, 1999; Poo et al., 2000). E-Referencer v. 2 that has been implemented applies a repertoire of initial search strategies and reformulation strategies to retrieve records from OPACs using the Z39.50 protocol, and also assists users in mapping query keywords to the Library of Congress subject headings."
    Source
    Electronic library. 22(2004) no.2, S.112-120
  5. Furner, J.: ¬A unifying model of document relatedness for hybrid search engines (2003) 0.12
    0.12255666 = product of:
      0.18383498 = sum of:
        0.056935627 = weight(_text_:search in 2717) [ClassicSimilarity], result of:
          0.056935627 = score(doc=2717,freq=4.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.3258447 = fieldWeight in 2717, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=2717)
        0.12689936 = sum of:
          0.08603165 = weight(_text_:engines in 2717) [ClassicSimilarity], result of:
            0.08603165 = score(doc=2717,freq=2.0), product of:
              0.25542772 = queryWeight, product of:
                5.080822 = idf(docFreq=746, maxDocs=44218)
                0.05027291 = queryNorm
              0.33681408 = fieldWeight in 2717, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.080822 = idf(docFreq=746, maxDocs=44218)
                0.046875 = fieldNorm(doc=2717)
          0.040867712 = weight(_text_:22 in 2717) [ClassicSimilarity], result of:
            0.040867712 = score(doc=2717,freq=2.0), product of:
              0.17604718 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05027291 = queryNorm
              0.23214069 = fieldWeight in 2717, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2717)
      0.6666667 = coord(2/3)
    
    Abstract
Previous work on search-engine design has indicated that information-seekers may benefit from being given the opportunity to exploit multiple sources of evidence of document relatedness. Few existing systems, however, give users more than minimal control over the selections that may be made among methods of exploitation. By applying the methods of "document network analysis" (DNA), a unifying, graph-theoretic model of content-, collaboration-, and context-based systems (CCC) may be developed in which the nature of the similarities between types of document relatedness and document ranking are clarified. The usefulness of the approach to system design suggested by this model may be tested by constructing and evaluating a prototype system (UCXtra) that allows searchers to maintain control over the multiple ways in which document collections may be ranked and re-ranked.
    Date
    11. 9.2004 17:32:22
  6. Bar-Ilan, J.; Levene, M.; Mat-Hassan, M.: Methods for evaluating dynamic changes in search engine rankings : a case study (2006) 0.11
    0.10716495 = product of:
      0.16074742 = sum of:
        0.08487463 = weight(_text_:search in 616) [ClassicSimilarity], result of:
          0.08487463 = score(doc=616,freq=20.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.48574063 = fieldWeight in 616, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.03125 = fieldNorm(doc=616)
        0.075872794 = product of:
          0.15174559 = sum of:
            0.15174559 = weight(_text_:engines in 616) [ClassicSimilarity], result of:
              0.15174559 = score(doc=616,freq=14.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.59408426 = fieldWeight in 616, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.03125 = fieldNorm(doc=616)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
Purpose - The objective of this paper is to characterize the changes in the rankings of the top ten results of major search engines over time and to compare the rankings between these engines. Design/methodology/approach - The paper compares rankings of the top-ten results of the search engines Google and AlltheWeb on ten identical queries over a period of three weeks. Only the top-ten results were considered, since users do not normally inspect more than the first results page returned by a search engine. The experiment was repeated twice, in October 2003 and in January 2004, in order to assess changes to the top-ten results of some of the queries during the three-month interval. In order to assess the changes in the rankings, three measures were computed for each data collection point and each search engine. Findings - The findings in this paper show that the rankings of AlltheWeb were highly stable over each period, while the rankings of Google underwent constant yet minor changes, with occasional major ones. Changes over time can be explained by the dynamic nature of the web or by fluctuations in the search engines' indexes. The top-ten results of the two search engines had surprisingly low overlap. With such small overlap, the task of comparing the rankings of the two engines becomes extremely challenging. Originality/value - The paper shows that because of the abundance of information on the web, ranking search results is of extreme importance. The paper compares several measures for computing the similarity between rankings of search tools, and shows that none of the measures is fully satisfactory as a standalone measure. It also demonstrates the apparent differences in the ranking algorithms of two widely used search engines.
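Measures of the kind the paper compares are easy to state: given two top-ten lists, compute how many URLs they share and how far the shared URLs are displaced in rank (a Spearman-footrule-style count; whether these match the study's three measures is not stated in the abstract):

```python
def overlap(a, b):
    """Fraction of results shared by two top-ten lists."""
    return len(set(a) & set(b)) / min(len(a), len(b))

def footrule_on_overlap(a, b):
    """Total rank displacement of the shared URLs (0 means identical ordering)."""
    shared = set(a) & set(b)
    return sum(abs(a.index(u) - b.index(u)) for u in shared)

google = ["peets.com", "coffeegeek.com", "starbucks.com"]   # hypothetical lists
alltheweb = ["coffeegeek.com", "peets.com", "folgers.com"]
print(overlap(google, alltheweb))              # 0.666...
print(footrule_on_overlap(google, alltheweb))  # 2
```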
  7. Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (2005) 0.10
    0.097439155 = product of:
      0.14615873 = sum of:
        0.075914174 = weight(_text_:search in 7) [ClassicSimilarity], result of:
          0.075914174 = score(doc=7,freq=16.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.43445963 = fieldWeight in 7, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.03125 = fieldNorm(doc=7)
        0.07024455 = product of:
          0.1404891 = sum of:
            0.1404891 = weight(_text_:engines in 7) [ClassicSimilarity], result of:
              0.1404891 = score(doc=7,freq=12.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.5500151 = fieldWeight in 7, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.03125 = fieldNorm(doc=7)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The second edition of Understanding Search Engines: Mathematical Modeling and Text Retrieval follows the basic premise of the first edition by discussing many of the key design issues for building search engines and emphasizing the important role that applied mathematics can play in improving information retrieval. The authors discuss important data structures, algorithms, and software as well as user-centered issues such as interfaces, manual indexing, and document preparation. Significant changes bring the text up to date on current information retrieval methods: for example the addition of a new chapter on link-structure algorithms used in search engines such as Google. The chapter on user interface has been rewritten to specifically focus on search engine usability. In addition the authors have added new recommendations for further reading and expanded the bibliography, and have updated and streamlined the index to make it more reader friendly.
    Content
Contents: Introduction Document File Preparation - Manual Indexing - Information Extraction - Vector Space Modeling - Matrix Decompositions - Query Representations - Ranking and Relevance Feedback - Searching by Link Structure - User Interface - Book Format Document File Preparation Document Purification and Analysis - Text Formatting - Validation - Manual Indexing - Automatic Indexing - Item Normalization - Inverted File Structures - Document File - Dictionary List - Inversion List - Other File Structures Vector Space Models Construction - Term-by-Document Matrices - Simple Query Matching - Design Issues - Term Weighting - Sparse Matrix Storage - Low-Rank Approximations Matrix Decompositions QR Factorization - Singular Value Decomposition - Low-Rank Approximations - Query Matching - Software - Semidiscrete Decomposition - Updating Techniques Query Management Query Binding - Types of Queries - Boolean Queries - Natural Language Queries - Thesaurus Queries - Fuzzy Queries - Term Searches - Probabilistic Queries Ranking and Relevance Feedback Performance Evaluation - Precision - Recall - Average Precision - Genetic Algorithms - Relevance Feedback Searching by Link Structure HITS Method - HITS Implementation - HITS Summary - PageRank Method - PageRank Adjustments - PageRank Implementation - PageRank Summary User Interface Considerations General Guidelines - Search Engine Interfaces - Form Fill-in - Display Considerations - Progress Indication - No Penalties for Error - Results - Test and Retest - Final Considerations Further Reading
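The vector space modeling at the heart of the book reduces to: build a term-by-document matrix, weight it, and rank documents by the similarity between query and document vectors. A minimal sketch with raw term counts and cosine matching (real systems add the term weighting, sparse storage, and low-rank approximations listed above):

```python
import math
from collections import Counter

docs = ["search engines rank pages",
        "ranking algorithms for search",
        "coffee brewing guide"]

def cosine(q, d):
    dot = sum(q[t] * d[t] for t in q)  # Counter returns 0 for absent terms
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0

vectors = [Counter(doc.split()) for doc in docs]  # term-by-document counts
query = Counter("search ranking".split())
for score, doc in sorted(((cosine(query, v), d) for v, d in zip(vectors, docs)), reverse=True):
    print(round(score, 3), doc)
```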
    LCSH
    Web search engines
    Subject
    Web search engines
  8. Radev, D.; Fan, W.; Qu, H.; Wu, H.; Grewal, A.: Probabilistic question answering on the Web (2005) 0.10
    0.09615815 = product of:
      0.14423722 = sum of:
        0.06973162 = weight(_text_:search in 3455) [ClassicSimilarity], result of:
          0.06973162 = score(doc=3455,freq=6.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.39907667 = fieldWeight in 3455, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=3455)
        0.074505605 = product of:
          0.14901121 = sum of:
            0.14901121 = weight(_text_:engines in 3455) [ClassicSimilarity], result of:
              0.14901121 = score(doc=3455,freq=6.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.58337915 = fieldWeight in 3455, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3455)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this article, we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of .20 on the TREC8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
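The phrase-ranking stage can be illustrated with the proximity feature the abstract mentions: a candidate answer that sits close to the query words in a retrieved passage scores higher. A simplified stand-in for PPR's actual model:

```python
def proximity_score(passage, query_words, candidate):
    """Score a candidate answer by its closeness to query words in the passage."""
    words = passage.lower().split()
    c = words.index(candidate)
    distances = [abs(c - i) for i, w in enumerate(words) if w in query_words]
    return 1.0 / (1 + min(distances)) if distances else 0.0

passage = "the capital of france is paris a city on the seine"
print(proximity_score(passage, {"capital", "france"}, "paris"))  # 1/(1+2) = 0.333...
```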
  9. Henzinger, M.R.: Hyperlink analysis for the Web (2001) 0.09
    0.090090275 = product of:
      0.13513541 = sum of:
        0.07101121 = weight(_text_:search in 8) [ClassicSimilarity], result of:
          0.07101121 = score(doc=8,freq=14.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.4063998 = fieldWeight in 8, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.03125 = fieldNorm(doc=8)
        0.06412421 = product of:
          0.12824842 = sum of:
            0.12824842 = weight(_text_:engines in 8) [ClassicSimilarity], result of:
              0.12824842 = score(doc=8,freq=10.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.50209284 = fieldWeight in 8, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.03125 = fieldNorm(doc=8)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Hyperlink analysis algorithms allow search engines to deliver focused results to user queries.This article surveys ranking algorithms used to retrieve information on the Web.
    Content
Information retrieval is a computer science subfield whose goal is to find all documents relevant to a user query in a given collection of documents. As such, information retrieval should really be called document retrieval. Before the advent of the Web, IR systems were typically installed in libraries for use mostly by reference librarians. The retrieval algorithm for these systems was usually based exclusively on analysis of the words in the document. The Web changed all this. Now each Web user has access to various search engines whose retrieval algorithms often use not only the words in the documents but also information like the hyperlink structure of the Web or markup language tags. How are hyperlinks useful? The hyperlink functionality alone-that is, the hyperlink to Web page B that is contained in Web page A-is not directly useful in information retrieval. However, the way Web page authors use hyperlinks can give them valuable information content. Authors usually create hyperlinks they think will be useful to readers. Some may be navigational aids that, for example, take the reader back to the site's home page; others provide access to documents that augment the content of the current page. The latter tend to point to high-quality pages that might be on the same topic as the page containing the hyperlink. Web information retrieval systems can exploit this information to refine searches for relevant documents. Hyperlink analysis significantly improves the relevance of the search results, so much so that all major Web search engines claim to use some type of hyperlink analysis. However, the search engines do not disclose details about the type of hyperlink analysis they perform, mostly to avoid manipulation of search results by Web-positioning companies. In this article, I discuss how hyperlink analysis can be applied to ranking algorithms, and survey other ways Web search engines can use this analysis.
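As a toy version of the idea this survey describes, one can blend a conventional text-matching score with link evidence, here simply a page's in-link count; the blend and weighting are illustrative assumptions, since, as the article notes, engines do not disclose their actual formulas:

```python
import math

def link_aware_score(text_score, inlinks, weight=0.3):
    """Blend a text-matching score with hyperlink popularity evidence."""
    return (1 - weight) * text_score + weight * math.log1p(inlinks)

results = [("pageA", 0.8, 2), ("pageB", 0.7, 500)]  # (url, text score, in-link count)
ranked = sorted(results, key=lambda r: link_aware_score(r[1], r[2]), reverse=True)
print([r[0] for r in ranked])  # pageB overtakes pageA thanks to its in-links
```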
10. Desai, M.; Spink, A.: ¬An algorithm to cluster documents based on relevance (2005) 0.08
    0.08235665 = product of:
      0.12353496 = sum of:
        0.08051914 = weight(_text_:search in 1035) [ClassicSimilarity], result of:
          0.08051914 = score(doc=1035,freq=8.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.460814 = fieldWeight in 1035, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=1035)
        0.043015826 = product of:
          0.08603165 = sum of:
            0.08603165 = weight(_text_:engines in 1035) [ClassicSimilarity], result of:
              0.08603165 = score(doc=1035,freq=2.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.33681408 = fieldWeight in 1035, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1035)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Search engines fail to make a clear distinction between items of varying relevance when presenting search results to users. Instead, they rely on the user of the system to estimate which items are relevant, partially relevant, or not relevant. The user of the system is given the task of distinguishing between documents that are relevant to different degrees. This process often hinders the accessibility of relevant or partially relevant documents, particularly when the results set is large and documents of varying relevance are scattered throughout the set. In this paper, we present a clustering scheme that groups documents within relevant, partially relevant, and not relevant regions for a given search. A clustering algorithm accomplishes the task of clustering documents based on relevance. The clusters were evaluated by end-users issuing categorical, interval, and descriptive relevance judgments for the documents returned from a search. The degree of overlap between users and the system for each of the clustered regions was measured to determine the overall effectiveness of the algorithm. This research showed that clustering documents on the Web by regions of relevance is highly necessary and quite feasible.
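The abstract does not specify the clustering algorithm itself; as an illustrative stand-in, a thresholding sketch that partitions a scored result set into the three relevance regions (thresholds hypothetical):

```python
def cluster_by_relevance(scored_docs, hi=0.66, lo=0.33):
    """Partition (doc, score) pairs into three relevance regions."""
    regions = {"relevant": [], "partially relevant": [], "not relevant": []}
    for doc, score in scored_docs:
        if score >= hi:
            regions["relevant"].append(doc)
        elif score >= lo:
            regions["partially relevant"].append(doc)
        else:
            regions["not relevant"].append(doc)
    return regions

print(cluster_by_relevance([("d1", 0.9), ("d2", 0.5), ("d3", 0.1)]))
```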
  11. Lewandowski, D.: How can library materials be ranked in the OPAC? (2009) 0.08
    0.0801318 = product of:
      0.12019769 = sum of:
        0.058109686 = weight(_text_:search in 2810) [ClassicSimilarity], result of:
          0.058109686 = score(doc=2810,freq=6.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.33256388 = fieldWeight in 2810, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2810)
        0.062088005 = product of:
          0.12417601 = sum of:
            0.12417601 = weight(_text_:engines in 2810) [ClassicSimilarity], result of:
              0.12417601 = score(doc=2810,freq=6.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.4861493 = fieldWeight in 2810, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2810)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
Some Online Public Access Catalogues offer a ranking component. However, ranking there is merely text-based and is doomed to fail due to the limited text in bibliographic data. The main assumption of this talk is that the appropriate ranking factors for OPACs now need to be defined, while the implementation itself is no major problem. We must define what we want, and not focus so much on the technical work. Some deep thinking is necessary on the "perfect results set" and how we can achieve it through ranking. The talk presents a set of potential ranking factors and clustering possibilities for further discussion. A look at commercial Web search engines could provide us with ideas on how ranking can be improved with additional factors. Search engines are way beyond pure text-based ranking and apply ranking factors in groups such as popularity, freshness, and personalisation. The talk describes the main factors used in search engines and how derivatives of these could be used for libraries' purposes. The goal of ranking is to provide the user with the best-suited results at the top of the results list. How can this goal be achieved with the library catalogue, and also across the library's different collections and databases? The assumption is that ranking such materials is a complex problem that is as yet nowhere near solved. Libraries should focus on ranking to improve the user experience.
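The factor groups named above translate naturally into a weighted combination; a sketch of how an OPAC might apply them, where the factor names and weights are our assumptions, i.e. exactly the definition work the talk says libraries still need to do:

```python
WEIGHTS = {"text": 0.5, "popularity": 0.3, "freshness": 0.2}  # assumed weights

def opac_rank(record):
    """Combine per-group factor scores (each normalised to 0..1) into one rank value."""
    return sum(WEIGHTS[group] * record[group] for group in WEIGHTS)

book = {"text": 0.7, "popularity": 0.9, "freshness": 0.2}  # hypothetical record
print(opac_rank(book))  # 0.5*0.7 + 0.3*0.9 + 0.2*0.2 = 0.66
```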
  12. Stock, M.; Stock, W.G.: Internet-Suchwerkzeuge im Vergleich (IV) : Relevance Ranking nach "Popularität" von Webseiten: Google (2001) 0.08
    0.0785128 = product of:
      0.1177692 = sum of:
        0.056935627 = weight(_text_:search in 5771) [ClassicSimilarity], result of:
          0.056935627 = score(doc=5771,freq=4.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.3258447 = fieldWeight in 5771, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=5771)
        0.060833566 = product of:
          0.12166713 = sum of:
            0.12166713 = weight(_text_:engines in 5771) [ClassicSimilarity], result of:
              0.12166713 = score(doc=5771,freq=4.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.47632706 = fieldWeight in 5771, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5771)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
In our retrieval test of search tools on the World Wide Web (Password 11/2000), the search engine Google performed best. Compared to other search engines, Google relies hardly at all on computational linguistics, but rather on algorithms derived from the particular properties of Web documents. The core of this information-statistical technique is the "PageRank" method (named after its developer Larry Page), which computes the "popularity" of pages from the hypertext structure of the Web on the basis of their incoming and outgoing links. Google impresses with intuitively understandable search screens as well as several very useful "little extras" such as displaying a page's rank, highlighting, searching within a page, searching within a result set, etc., all packed into its own toolbar inside the browser. Similar to RealNames, Google offers the purchase of search terms with its "AdWords" product. After a series of now four Password articles comparing Internet search tools, we conclude with an assessment. How is the state of the art of directories and search engines to be judged from an information science perspective? Are "typical" Internet users, who as a rule are not information professionals, adequately served? And can information professionals also profit from these search tools?
  13. Austin, D.: How Google finds your needle in the Web's haystack : as we'll see, the trick is to ask the web itself to rank the importance of pages... (2006) 0.08
    0.07789321 = product of:
      0.11683981 = sum of:
        0.08135357 = weight(_text_:search in 93) [ClassicSimilarity], result of:
          0.08135357 = score(doc=93,freq=24.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.46558946 = fieldWeight in 93, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
        0.035486247 = product of:
          0.070972495 = sum of:
            0.070972495 = weight(_text_:engines in 93) [ClassicSimilarity], result of:
              0.070972495 = score(doc=93,freq=4.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.27785745 = fieldWeight in 93, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=93)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly-disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one) so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb. Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contains the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list. One way to determine the importance of pages is to use a human-generated ranking. For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.
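The indexing step described here, storing for each word the pages containing it, is an inverted index, and finding all pages that contain every query word is then a set intersection. A minimal sketch with hypothetical pages:

```python
from collections import defaultdict

pages = {  # hypothetical pages
    "p1": "search engine ranking",
    "p2": "coffee search tips",
    "p3": "engine maintenance",
}

index = defaultdict(set)
for page, text in pages.items():
    for word in text.split():
        index[word].add(page)  # word -> set of pages containing it

def find_pages(phrase):
    """Return pages containing every word of the phrase (still unranked)."""
    word_sets = [index[w] for w in phrase.split()]
    return set.intersection(*word_sets) if word_sets else set()

print(find_pages("search engine"))  # {'p1'}
```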
  14. Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.07
    0.07052815 = product of:
      0.105792224 = sum of:
        0.08876401 = weight(_text_:search in 56) [ClassicSimilarity], result of:
          0.08876401 = score(doc=56,freq=14.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.5079997 = fieldWeight in 56, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=56)
        0.017028214 = product of:
          0.03405643 = sum of:
            0.03405643 = weight(_text_:22 in 56) [ClassicSimilarity], result of:
              0.03405643 = score(doc=56,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.19345059 = fieldWeight in 56, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=56)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.
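The interaction pattern studied here can be sketched: offer the searcher a query term's broader, narrower, and related thesaurus entries as expansion candidates. The tiny thesaurus fragment below is hypothetical; the study itself used the CAB thesaurus through the OVID interface:

```python
THESAURUS = {  # hypothetical fragment: term -> (broader, narrower, related)
    "dairy cattle": (["cattle"], ["dairy cows"], ["milk production"]),
}

def expansion_candidates(term, prefer="narrower"):
    """Suggest thesaurus terms for query expansion, preferred group first."""
    broader, narrower, related = THESAURUS.get(term, ([], [], []))
    groups = {"broader": broader, "narrower": narrower, "related": related}
    preferred = groups.pop(prefer)
    return preferred + [t for g in groups.values() for t in g]

print(expansion_candidates("dairy cattle"))             # narrower first, as staff tended to pick
print(expansion_candidates("dairy cattle", "broader"))  # broader first, as students did
```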
    Date
    22. 7.2006 16:32:43
  15. Langville, A.N.; Meyer, C.D.: Google's PageRank and beyond : the science of search engine rankings (2006) 0.06
    0.06418283 = product of:
      0.09627424 = sum of:
        0.05325841 = weight(_text_:search in 6) [ClassicSimilarity], result of:
          0.05325841 = score(doc=6,freq=14.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.30479985 = fieldWeight in 6, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0234375 = fieldNorm(doc=6)
        0.043015826 = product of:
          0.08603165 = sum of:
            0.08603165 = weight(_text_:engines in 6) [ClassicSimilarity], result of:
              0.08603165 = score(doc=6,freq=8.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.33681408 = fieldWeight in 6, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=6)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other Web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of Web page rankings, "Google's PageRank and Beyond" supplies the answers to these and other questions and more. The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research. The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB codes and links to sample Web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text. Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided. It includes: many illustrative examples and entertaining asides; MATLAB code; accessible and informal style; and complete and self-contained section for mathematics review.
    Content
    Inhalt: Chapter 1. Introduction to Web Search Engines: 1.1 A Short History of Information Retrieval - 1.2 An Overview of Traditional Information Retrieval - 1.3 Web Information Retrieval Chapter 2. Crawling, Indexing, and Query Processing: 2.1 Crawling - 2.2 The Content Index - 2.3 Query Processing Chapter 3. Ranking Webpages by Popularity: 3.1 The Scene in 1998 - 3.2 Two Theses - 3.3 Query-Independence Chapter 4. The Mathematics of Google's PageRank: 4.1 The Original Summation Formula for PageRank - 4.2 Matrix Representation of the Summation Equations - 4.3 Problems with the Iterative Process - 4.4 A Little Markov Chain Theory - 4.5 Early Adjustments to the Basic Model - 4.6 Computation of the PageRank Vector - 4.7 Theorem and Proof for Spectrum of the Google Matrix Chapter 5. Parameters in the PageRank Model: 5.1 The a Factor - 5.2 The Hyperlink Matrix H - 5.3 The Teleportation Matrix E Chapter 6. The Sensitivity of PageRank; 6.1 Sensitivity with respect to alpha - 6.2 Sensitivity with respect to H - 6.3 Sensitivity with respect to vT - 6.4 Other Analyses of Sensitivity - 6.5 Sensitivity Theorems and Proofs Chapter 7. The PageRank Problem as a Linear System: 7.1 Properties of (I - alphaS) - 7.2 Properties of (I - alphaH) - 7.3 Proof of the PageRank Sparse Linear System Chapter 8. Issues in Large-Scale Implementation of PageRank: 8.1 Storage Issues - 8.2 Convergence Criterion - 8.3 Accuracy - 8.4 Dangling Nodes - 8.5 Back Button Modeling
    Chapter 9. Accelerating the Computation of PageRank: 9.1 An Adaptive Power Method - 9.2 Extrapolation - 9.3 Aggregation - 9.4 Other Numerical Methods Chapter 10. Updating the PageRank Vector: 10.1 The Two Updating Problems and their History - 10.2 Restarting the Power Method - 10.3 Approximate Updating Using Approximate Aggregation - 10.4 Exact Aggregation - 10.5 Exact vs. Approximate Aggregation - 10.6 Updating with Iterative Aggregation - 10.7 Determining the Partition - 10.8 Conclusions Chapter 11. The HITS Method for Ranking Webpages: 11.1 The HITS Algorithm - 11.2 HITS Implementation - 11.3 HITS Convergence - 11.4 HITS Example - 11.5 Strengths and Weaknesses of HITS - 11.6 HITS's Relationship to Bibliometrics - 11.7 Query-Independent HITS - 11.8 Accelerating HITS - 11.9 HITS Sensitivity Chapter 12. Other Link Methods for Ranking Webpages: 12.1 SALSA - 12.2 Hybrid Ranking Methods - 12.3 Rankings based on Traffic Flow Chapter 13. The Future of Web Information Retrieval: 13.1 Spam - 13.2 Personalization - 13.3 Clustering - 13.4 Intelligent Agents - 13.5 Trends and Time-Sensitive Search - 13.6 Privacy and Censorship - 13.7 Library Classification Schemes - 13.8 Data Fusion Chapter 14. Resources for Web Information Retrieval: 14.1 Resources for Getting Started - 14.2 Resources for Serious Study Chapter 15. The Mathematics Guide: 15.1 Linear Algebra - 15.2 Perron-Frobenius Theory - 15.3 Markov Chains - 15.4 Perron Complementation - 15.5 Stochastic Complementation - 15.6 Censoring - 15.7 Aggregation - 15.8 Disaggregation
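The adjustments catalogued in Chapters 4 and 5 can be condensed into one sketch: replace each dangling node's zero row with a uniform row (giving S), blend with a uniform teleportation matrix E via the factor alpha, and run the power method on G = alpha*S + (1 - alpha)*E. A small numpy version under those textbook conventions:

```python
import numpy as np

def google_pagerank(H, alpha=0.85, tol=1e-10):
    """Power method on the Google matrix G = alpha*S + (1-alpha)*E."""
    n = len(H)
    S = H.copy()
    S[S.sum(axis=1) == 0] = 1.0 / n             # dangling-node fix: uniform rows
    pi = np.full(n, 1.0 / n)
    while True:
        new = alpha * pi @ S + (1 - alpha) / n  # uniform teleportation matrix E
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new

H = np.array([[0, .5, .5],   # hyperlink matrix; page 3 is a dangling node
              [0,  0,  1],
              [0,  0,  0]], dtype=float)
print(google_pagerank(H))
```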
  16. Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: ¬The effects of fitness functions an genetic programming-based ranking discovery for Web search (2004) 0.06
    0.060110323 = product of:
      0.09016548 = sum of:
        0.06973162 = weight(_text_:search in 2239) [ClassicSimilarity], result of:
          0.06973162 = score(doc=2239,freq=6.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.39907667 = fieldWeight in 2239, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=2239)
        0.020433856 = product of:
          0.040867712 = sum of:
            0.040867712 = weight(_text_:22 in 2239) [ClassicSimilarity], result of:
              0.040867712 = score(doc=2239,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.23214069 = fieldWeight in 2239, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2239)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task - discovery of ranking functions for Web search - and has achieved very promising results. However, in our prior research, only one fitness function has been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs on GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
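A common fitness choice for this kind of ranking discovery is (mean) average precision over training queries; a sketch of such a fitness function follows, though whether it matches any of the designs contrasted in the paper is not stated in the abstract:

```python
def average_precision(ranked_docs, relevant):
    """Average of the precision values at each rank where a relevant document appears."""
    hits, precisions = 0, []
    for i, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / i)
    return sum(precisions) / len(relevant) if relevant else 0.0

# Fitness of a candidate GP ranking function = mean average precision over training queries.
print(average_precision(["d1", "d4", "d2"], relevant={"d1", "d2"}))  # (1/1 + 2/3)/2 = 0.833...
```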
    Date
    31. 5.2004 19:22:06
  17. Hoenkamp, E.: Unitary operators on the document space (2003) 0.06
    0.05552859 = product of:
      0.08329288 = sum of:
        0.04744636 = weight(_text_:search in 3457) [ClassicSimilarity], result of:
          0.04744636 = score(doc=3457,freq=4.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.27153727 = fieldWeight in 3457, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3457)
        0.03584652 = product of:
          0.07169304 = sum of:
            0.07169304 = weight(_text_:engines in 3457) [ClassicSimilarity], result of:
              0.07169304 = score(doc=3457,freq=2.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.2806784 = fieldWeight in 3457, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3457)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
When people search for documents, they eventually want content, not words. Hence, search engines should relate documents more by their underlying concepts than by the words they contain. One promising technique to do so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts needed to represent the documents is far smaller than the great variety of words in the textual representation. Although this almost obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article accomplishes several things. First, it shows how the technique underlying LSI is just one example of a unitary operator, for which there are computationally more attractive alternatives. Second, it proposes the Haar transform as such an alternative, as it is memory efficient, and can be computed in linear to sublinear time. Third, it generalizes LSI by a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational costs, it also opens a spectrum of possibilities for new research.
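The LSI mapping described above is a truncated singular value decomposition of the term-by-document matrix: keep k concept dimensions, fold the query into that space, and match there. A minimal numpy sketch; the article's point is that the same role can be played by cheaper unitary operators such as the Haar transform:

```python
import numpy as np

A = np.array([[1, 0, 1],   # term-by-document matrix (terms x documents, raw counts)
              [1, 1, 0],
              [0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                  # number of latent concepts to keep
docs_k = (np.diag(s[:k]) @ Vt[:k]).T   # documents mapped into the k-dim concept space

query = np.array([1, 1, 0], dtype=float)  # query over the three terms
query_k = query @ U[:, :k]                # fold the query into the concept space
sims = docs_k @ query_k / (np.linalg.norm(docs_k, axis=1) * np.linalg.norm(query_k))
print(sims)  # cosine similarity of each document to the query, in concept space
```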
  18. Lempel, R.; Moran, S.: SALSA: the stochastic approach for link-structure analysis (2001) 0.06
    0.05552859 = product of:
      0.08329288 = sum of:
        0.04744636 = weight(_text_:search in 10) [ClassicSimilarity], result of:
          0.04744636 = score(doc=10,freq=4.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.27153727 = fieldWeight in 10, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=10)
        0.03584652 = product of:
          0.07169304 = sum of:
            0.07169304 = weight(_text_:engines in 10) [ClassicSimilarity], result of:
              0.07169304 = score(doc=10,freq=2.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.2806784 = fieldWeight in 10, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=10)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
Today, when searching for information on the WWW, one usually performs a query through a term-based search engine. These engines return, as the query's result, a list of Web pages whose contents match the query. For broad-topic queries, such searches often result in a huge set of retrieved documents, many of which are irrelevant to the user. However, much information is contained in the link structure of the WWW. Information such as which pages are linked to others can be used to augment search algorithms. In this context, Jon Kleinberg introduced the notion of two distinct types of Web pages: hubs and authorities. Kleinberg argued that hubs and authorities exhibit a mutually reinforcing relationship: a good hub will point to many authorities, and a good authority will be pointed at by many hubs. In light of this, he devised an algorithm aimed at finding authoritative pages. We present SALSA, a new stochastic approach for link-structure analysis, which examines random walks on graphs derived from the link structure. We show that both SALSA and Kleinberg's Mutual Reinforcement approach employ the same meta-algorithm. We then prove that SALSA is equivalent to a weighted in-degree analysis of the link structure of WWW subgraphs, making it computationally more efficient than the Mutual Reinforcement approach. We compare the results of applying SALSA to the results derived through Kleinberg's approach. These comparisons reveal a topological phenomenon called the TKC effect which, in certain cases, prevents the Mutual Reinforcement approach from identifying meaningful authorities.
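Kleinberg's Mutual Reinforcement approach, against which SALSA is compared, iterates two updates until convergence: each authority score becomes the sum of the hub scores of the pages pointing to it, and each hub score the sum of the authority scores it points to, normalising each round. A sketch on a hypothetical graph (by the paper's result, SALSA amounts to replacing this iteration with a weighted in-degree analysis):

```python
def hits(links, iterations=50):
    """Kleinberg's HITS; links maps each page to the pages it points to."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        for p in pages:  # authority: sum of hub scores of the pages linking in
            auth[p] = sum(hub[q] for q in pages if p in links.get(q, ()))
        for p in pages:  # hub: sum of authority scores of the pages linked to
            hub[p] = sum(auth[q] for q in links.get(p, ()))
        a_norm, h_norm = sum(auth.values()), sum(hub.values())
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

hub, auth = hits({"h1": ["a1", "a2"], "h2": ["a1"]})  # hypothetical graph
print(max(auth, key=auth.get))  # a1: pointed at by both hubs
```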
  19. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.05
    0.051579654 = product of:
      0.07736948 = sum of:
        0.056935627 = weight(_text_:search in 2419) [ClassicSimilarity], result of:
          0.056935627 = score(doc=2419,freq=4.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.3258447 = fieldWeight in 2419, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=2419)
        0.020433856 = product of:
          0.040867712 = sum of:
            0.040867712 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
              0.040867712 = score(doc=2419,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.23214069 = fieldWeight in 2419, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2419)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. This paper presents evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection through information seeking to the representation, organisation and reuse of information. By embedding high-level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil and followed by a qualitative evaluation. The evaluation was conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
  20. Lanvent, A.: Licht im Daten Chaos (2004) 0.05
    0.05010998 = product of:
      0.07516497 = sum of:
        0.04648775 = weight(_text_:search in 2806) [ClassicSimilarity], result of:
          0.04648775 = score(doc=2806,freq=6.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.2660511 = fieldWeight in 2806, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.03125 = fieldNorm(doc=2806)
        0.028677218 = product of:
          0.057354435 = sum of:
            0.057354435 = weight(_text_:engines in 2806) [ClassicSimilarity], result of:
              0.057354435 = score(doc=2806,freq=2.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.22454272 = fieldWeight in 2806, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2806)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Content
    "Bitte suchen Sie alle Unterlagen, die im PC zum Ibelshäuser-Vertrag in Sprockhövel gespeichert sind. Finden Sie alles, was wir haben - Dokumente, Tabellen, Präsentationen, Scans, E-Mails. Und erledigen Sie das gleich! « Wer diese Aufgabe an das Windows-eigene Suchmodul vergibt, wird zwangsläufig enttäuscht. Denn das Betriebssystem beherrscht weder die formatübergreifende Recherche noch die Kontextsuche, die für solche komplexen Aufträge nötig sind. Professionelle Desktop-Suchmaschinen erledigen Aufgaben dieser Art jedoch im Handumdrehen - genauer gesagt in einer einzigen Sekunde. Spitzenprogramme wie Global Brain benötigen dafür nicht einmal umfangreiche Abfrageformulare. Es genügt, einen Satz im Eingabefeld zu formulieren, der das Thema der gewünschten Dokumente eingrenzt. Dabei suchen die Programme über alle Laufwerke, die sich auf dem System einbinden lassen - also auch im Netzwerk-Ordner (Shared Folder), sofern dieser freigegeben wurde. Allen Testkandidaten - mit Ausnahme von Search 32 - gemeinsam ist, dass sie weitaus bessere Rechercheergebnisse abliefern als Windows, deutlich schneller arbeiten und meist auch in den Online-Postfächern stöbern. Wer schon öfter vergeblich über die Windows-Suche nach wichtigen Dokumenten gefahndet hat, kommt angesichts der Qualität der Search-Engines kaum mehr um die Anschaffung eines Desktop-Suchtools herum. Aber Microsoft will nachbessern. Für den Windows-XP-Nachfolger Longhorn wirbt der Hersteller vor allem mit dem Hinweis auf das neue Dateisystem WinFS, das sämtliche Files auf der Festplatte über Meta-Tags indiziert und dem Anwender damit lange Suchläufe erspart. So sollen sich anders als bei Windows XP alle Dateien zu bestimmten Themen in wenigen Sekunden auflisten lassen - unabhängig vom Format und vom physikalischen Speicherort der Files. Für die Recherche selbst ist dann weder der Dateiname noch das Erstelldatum ausschlaggebend. Anhand der kontextsensitiven Suche von WinFS kann der Anwender einfach einen Suchbefehl wie »Vertragsabschluss mit Firma XYZ, Neunkirchen/Saar« eingeben, der dann ohne Umwege zum Ziel führt."
    Object
    Search 32 8.1
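    The relevance figures attached to each entry above can be reproduced by hand. A small sketch, assuming the factoring shown in the explain trees (classic Lucene tf-idf): each matching term contributes queryWeight * fieldWeight, and the per-term sum is scaled by the coord factor.

      from math import sqrt

      def term_score(freq, idf, query_norm, field_norm):
          query_weight = idf * query_norm            # e.g. 3.475677 * 0.05027291
          field_weight = sqrt(freq) * idf * field_norm
          return query_weight * field_weight

      # Entry 18, term "search" (freq=4, fieldNorm=0.0390625):
      s = term_score(4.0, 3.475677, 0.05027291, 0.0390625)   # ~0.04744636
      # Entry 18 total: (0.04744636 + 0.03584652) * 2/3 ~ 0.05552859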

Languages

  • e 56
  • d 5

Types

  • a 56
  • el 3
  • m 3