Search (157 results, page 1 of 8)

  • × theme_ss:"Retrievalalgorithmen"
  1. Wills, R.S.: Google's PageRank : the math behind the search engine (2006) 0.18
    0.17662422 = product of:
      0.23549896 = sum of:
        0.023270661 = weight(_text_:web in 5954) [ClassicSimilarity], result of:
          0.023270661 = score(doc=5954,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.14422815 = fieldWeight in 5954, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=5954)
        0.12380236 = weight(_text_:search in 5954) [ClassicSimilarity], result of:
          0.12380236 = score(doc=5954,freq=44.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.72046983 = fieldWeight in 5954, product of:
              6.6332498 = tf(freq=44.0), with freq of:
                44.0 = termFreq=44.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.03125 = fieldNorm(doc=5954)
        0.08842595 = product of:
          0.1768519 = sum of:
            0.1768519 = weight(_text_:engine in 5954) [ClassicSimilarity], result of:
              0.1768519 = score(doc=5954,freq=16.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.6686872 = fieldWeight in 5954, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5954)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Approximately 91 million American adults use the Internet on a typical day The number-one Internet activity is reading and writing e-mail. Search engine use is next in line and continues to increase in popularity. In fact, survey findings indicate that nearly 60 million American adults use search engines on a given day. Even though there are many Internet search engines, Google, Yahoo!, and MSN receive over 81% of all search requests. Despite claims that the quality of search provided by Yahoo! and MSN now equals that of Google, Google continues to thrive as the search engine of choice, receiving over 46% of all search requests, nearly double the volume of Yahoo! and over four times that of MSN. I use Google's search engine on a daily basis and rarely request information from other search engines. One day, I decided to visit the homepages of Google. Yahoo!, and MSN to compare the quality of search results. Coffee was on my mind that day, so I entered the simple query "coffee" in the search box at each homepage. Table 1 shows the top ten (unsponsored) results returned by each search engine. Although ordered differently, two webpages, www.peets.com and www.coffeegeek.com, appear in all three top ten lists. In addition, each pairing of top ten lists has two additional results in common. Depending on the information I hoped to obtain about coffee by using the search engines, I could argue that any one of the three returned better results: however, I was not looking for a particular webpage, so all three listings of search results seemed of equal quality. Thus, I plan to continue using Google. My decision is indicative of the problem Yahoo!, MSN, and other search engine companies face in the quest to obtain a larger percentage of Internet search volume. Search engine users are loyal to one or a few search engines and are generally happy with search results. Thus, as long as Google continues to provide results deemed high in quality, Google likely will remain the top search engine. But what set Google apart from its competitors in the first place? The answer is PageRank. In this article I explain this simple mathematical algorithm that revolutionized Web search.
  2. Khoo, C.S.G.; Wan, K.-W.: ¬A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.17
    0.17040431 = product of:
      0.22720575 = sum of:
        0.04072366 = weight(_text_:web in 2509) [ClassicSimilarity], result of:
          0.04072366 = score(doc=2509,freq=8.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.25239927 = fieldWeight in 2509, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2509)
        0.10832706 = weight(_text_:search in 2509) [ClassicSimilarity], result of:
          0.10832706 = score(doc=2509,freq=44.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.6304111 = fieldWeight in 2509, product of:
              6.6332498 = tf(freq=44.0), with freq of:
                44.0 = termFreq=44.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2509)
        0.07815504 = sum of:
          0.05471077 = weight(_text_:engine in 2509) [ClassicSimilarity], result of:
            0.05471077 = score(doc=2509,freq=2.0), product of:
              0.26447627 = queryWeight, product of:
                5.349498 = idf(docFreq=570, maxDocs=44218)
                0.049439456 = queryNorm
              0.20686457 = fieldWeight in 2509, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.349498 = idf(docFreq=570, maxDocs=44218)
                0.02734375 = fieldNorm(doc=2509)
          0.023444273 = weight(_text_:22 in 2509) [ClassicSimilarity], result of:
            0.023444273 = score(doc=2509,freq=2.0), product of:
              0.17312855 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049439456 = queryNorm
              0.1354154 = fieldWeight in 2509, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.02734375 = fieldNorm(doc=2509)
      0.75 = coord(3/4)
    
    Abstract
    A relevancy-ranking algorithm for a natural language interface to Boolean online public access catalogs (OPACs) was formulated and compared with that currently used in a knowledge-based search interface called the E-Referencer, being developed by the authors. The algorithm makes use of seven weIl-known ranking criteria: breadth of match, section weighting, proximity of query words, variant word forms (stemming), document frequency, term frequency and document length. The algorithm converts a natural language query into a series of increasingly broader Boolean search statements. In a small experiment with ten subjects in which the algorithm was simulated by hand, the algorithm obtained good results with a mean overall precision of 0.42 and mean average precision of 0.62, representing a 27 percent improvement in precision and 41 percent improvement in average precision compared to the E-Referencer. The usefulness of each step in the algorithm was analyzed and suggestions are made for improving the algorithm.
    Content
    "Most Web search engines accept natural language queries, perform some kind of fuzzy matching and produce ranked output, displaying first the documents that are most likely to be relevant. On the other hand, most library online public access catalogs (OPACs) an the Web are still Boolean retrieval systems that perform exact matching, and require users to express their search requests precisely in a Boolean search language and to refine their search statements to improve the search results. It is well-documented that users have difficulty searching Boolean OPACs effectively (e.g. Borgman, 1996; Ensor, 1992; Wallace, 1993). One approach to making OPACs easier to use is to develop a natural language search interface that acts as a middleware between the user's Web browser and the OPAC system. The search interface can accept a natural language query from the user and reformulate it as a series of Boolean search statements that are then submitted to the OPAC. The records retrieved by the OPAC are ranked by the search interface before forwarding them to the user's Web browser. The user, then, does not need to interact directly with the Boolean OPAC but with the natural language search interface or search intermediary. The search interface interacts with the OPAC system an the user's behalf. The advantage of this approach is that no modification to the OPAC or library system is required. Furthermore, the search interface can access multiple OPACs, acting as a meta search engine, and integrate search results from various OPACs before sending them to the user. The search interface needs to incorporate a method for converting the user's natural language query into a series of Boolean search statements, and for ranking the OPAC records retrieved. The purpose of this study was to develop a relevancyranking algorithm for a search interface to Boolean OPAC systems. This is part of an on-going effort to develop a knowledge-based search interface to OPACs called the E-Referencer (Khoo et al., 1998, 1999; Poo et al., 2000). E-Referencer v. 2 that has been implemented applies a repertoire of initial search strategies and reformulation strategies to retrieve records from OPACs using the Z39.50 protocol, and also assists users in mapping query keywords to the Library of Congress subject headings."
    Source
    Electronic library. 22(2004) no.2, S.112-120
  3. Thelwall, M.; Vaughan, L.: New versions of PageRank employing alternative Web document models (2004) 0.14
    0.13891208 = product of:
      0.18521611 = sum of:
        0.09872905 = weight(_text_:web in 674) [ClassicSimilarity], result of:
          0.09872905 = score(doc=674,freq=16.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.6119082 = fieldWeight in 674, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=674)
        0.03959212 = weight(_text_:search in 674) [ClassicSimilarity], result of:
          0.03959212 = score(doc=674,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.230407 = fieldWeight in 674, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=674)
        0.04689494 = product of:
          0.09378988 = sum of:
            0.09378988 = weight(_text_:engine in 674) [ClassicSimilarity], result of:
              0.09378988 = score(doc=674,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.35462496 = fieldWeight in 674, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.046875 = fieldNorm(doc=674)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Introduces several new versions of PageRank (the link based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and most Web information retrieval algorithms, other research has suggested that aggregating pages based on directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms introduced based on these alternatives were used to rank four sets of Web pages. The ranking results were compared with human subjects' rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that includes pages from different Web sites; however, it does not work well in ranking pages that are from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.
  4. Austin, D.: How Google finds your needle in the Web's haystack : as we'll see, the trick is to ask the web itself to rank the importance of pages... (2006) 0.14
    0.13873328 = product of:
      0.18497771 = sum of:
        0.05759195 = weight(_text_:web in 93) [ClassicSimilarity], result of:
          0.05759195 = score(doc=93,freq=16.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.35694647 = fieldWeight in 93, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
        0.08000484 = weight(_text_:search in 93) [ClassicSimilarity], result of:
          0.08000484 = score(doc=93,freq=24.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.46558946 = fieldWeight in 93, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.02734375 = fieldNorm(doc=93)
        0.047380917 = product of:
          0.09476183 = sum of:
            0.09476183 = weight(_text_:engine in 93) [ClassicSimilarity], result of:
              0.09476183 = score(doc=93,freq=6.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.35829994 = fieldWeight in 93, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=93)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly-disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one) so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb. Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contains the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list. One way to determine the importance of pages is to use a human-generated ranking. For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.
  5. Bar-Ilan, J.; Levene, M.; Mat-Hassan, M.: Methods for evaluating dynamic changes in search engine rankings : a case study (2006) 0.13
    0.12789512 = product of:
      0.17052683 = sum of:
        0.032909684 = weight(_text_:web in 616) [ClassicSimilarity], result of:
          0.032909684 = score(doc=616,freq=4.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.2039694 = fieldWeight in 616, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=616)
        0.08346753 = weight(_text_:search in 616) [ClassicSimilarity], result of:
          0.08346753 = score(doc=616,freq=20.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.48574063 = fieldWeight in 616, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.03125 = fieldNorm(doc=616)
        0.054149617 = product of:
          0.10829923 = sum of:
            0.10829923 = weight(_text_:engine in 616) [ClassicSimilarity], result of:
              0.10829923 = score(doc=616,freq=6.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.40948564 = fieldWeight in 616, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.03125 = fieldNorm(doc=616)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Purpose - The objective of this paper is to characterize the changes in the rankings of the top ten results of major search engines over time and to compare the rankings between these engines. Design/methodology/approach - The papers compare rankings of the top-ten results of the search engines Google and AlltheWeb on ten identical queries over a period of three weeks. Only the top-ten results were considered, since users do not normally inspect more than the first results page returned by a search engine. The experiment was repeated twice, in October 2003 and in January 2004, in order to assess changes to the top-ten results of some of the queries during the three months interval. In order to assess the changes in the rankings, three measures were computed for each data collection point and each search engine. Findings - The findings in this paper show that the rankings of AlltheWeb were highly stable over each period, while the rankings of Google underwent constant yet minor changes, with occasional major ones. Changes over time can be explained by the dynamic nature of the web or by fluctuations in the search engines' indexes. The top-ten results of the two search engines had surprisingly low overlap. With such small overlap, the task of comparing the rankings of the two engines becomes extremely challenging. Originality/value - The paper shows that because of the abundance of information on the web, ranking search results is of extreme importance. The paper compares several measures for computing the similarity between rankings of search tools, and shows that none of the measures is fully satisfactory as a standalone measure. It also demonstrates the apparent differences in the ranking algorithms of two widely used search engines.
  6. Chen, Z.; Meng, X.; Fowler, R.H.; Zhu, B.: Real-time adaptive feature and document learning for Web search (2001) 0.13
    0.12763417 = product of:
      0.17017889 = sum of:
        0.041137107 = weight(_text_:web in 5209) [ClassicSimilarity], result of:
          0.041137107 = score(doc=5209,freq=4.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.25496176 = fieldWeight in 5209, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5209)
        0.07377557 = weight(_text_:search in 5209) [ClassicSimilarity], result of:
          0.07377557 = score(doc=5209,freq=10.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.4293381 = fieldWeight in 5209, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5209)
        0.05526622 = product of:
          0.11053244 = sum of:
            0.11053244 = weight(_text_:engine in 5209) [ClassicSimilarity], result of:
              0.11053244 = score(doc=5209,freq=4.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.41792953 = fieldWeight in 5209, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5209)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Chen et alia report on the design of FEATURES, a web search engine with adaptive features based on minimal relevance feedback. Rather than developing user profiles from previous searcher activity either at the server or client location, or updating indexes after search completion, FEATURES allows for index and user characterization files to be updated during query modification on retrieval from a general purpose search engine. Indexing terms relevant to a query are defined as the union of all terms assigned to documents retrieved by the initial search run and are used to build a vector space model on this retrieved set. The top ten weighted terms are presented to the user for a relevant non-relevant choice which is used to modify the term weights. Documents are chosen if their summed term weights are greater than some threshold. A user evaluation of the top ten ranked documents as non-relevant will decrease these term weights and a positive judgement will increase them. A new ordering of the retrieved set will generate new display lists of terms and documents. Precision is improved in a test on Alta Vista searches.
  7. Käki, M.: fKWIC: frequency-based Keyword-in-Context Index for filtering Web search results (2006) 0.12
    0.12362628 = product of:
      0.16483504 = sum of:
        0.049364526 = weight(_text_:web in 6112) [ClassicSimilarity], result of:
          0.049364526 = score(doc=6112,freq=4.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.3059541 = fieldWeight in 6112, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=6112)
        0.068575576 = weight(_text_:search in 6112) [ClassicSimilarity], result of:
          0.068575576 = score(doc=6112,freq=6.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.39907667 = fieldWeight in 6112, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=6112)
        0.04689494 = product of:
          0.09378988 = sum of:
            0.09378988 = weight(_text_:engine in 6112) [ClassicSimilarity], result of:
              0.09378988 = score(doc=6112,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.35462496 = fieldWeight in 6112, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6112)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Enormous Web search engine databases combined with short search queries result in large result sets that are often difficult to access. Result ranking works fairly well, but users need help when it fails. For these situations, we propose a filtering interface that is inspired by keyword-in-context (KWIC) indices. The user interface lists the most frequent keyword contexts (fKWIC). When a context is selected, the corresponding results are displayed in the result list, allowing users to concentrate on the specific context. We compared the keyword context index user interface to the rank order result listing in an experiment with 36 participants. The results show that the proposed user interface was 29% faster in finding relevant results, and the precision of the selected results was 19% higher. In addition, participants showed positive attitudes toward the system.
  8. Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: ¬The effects of fitness functions an genetic programming-based ranking discovery for Web search (2004) 0.12
    0.11886199 = product of:
      0.15848266 = sum of:
        0.06981198 = weight(_text_:web in 2239) [ClassicSimilarity], result of:
          0.06981198 = score(doc=2239,freq=8.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.43268442 = fieldWeight in 2239, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2239)
        0.068575576 = weight(_text_:search in 2239) [ClassicSimilarity], result of:
          0.068575576 = score(doc=2239,freq=6.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.39907667 = fieldWeight in 2239, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=2239)
        0.02009509 = product of:
          0.04019018 = sum of:
            0.04019018 = weight(_text_:22 in 2239) [ClassicSimilarity], result of:
              0.04019018 = score(doc=2239,freq=2.0), product of:
                0.17312855 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049439456 = queryNorm
                0.23214069 = fieldWeight in 2239, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2239)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR taskdiscovery of ranking functions for Web search-and has achieved very promising results. However, in our prior research, only one fitness function has been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is weIl known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs an GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations an the design of fitness functions for genetic-based information retrieval experiments.
    Date
    31. 5.2004 19:22:06
  9. Jiang, J.-D.; Jiang, J.-Y.; Cheng, P.-J.: Cocluster hypothesis and ranking consistency for relevance ranking in web search (2019) 0.12
    0.11859758 = product of:
      0.15813011 = sum of:
        0.029088326 = weight(_text_:web in 5247) [ClassicSimilarity], result of:
          0.029088326 = score(doc=5247,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.18028519 = fieldWeight in 5247, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5247)
        0.07377557 = weight(_text_:search in 5247) [ClassicSimilarity], result of:
          0.07377557 = score(doc=5247,freq=10.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.4293381 = fieldWeight in 5247, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5247)
        0.05526622 = product of:
          0.11053244 = sum of:
            0.11053244 = weight(_text_:engine in 5247) [ClassicSimilarity], result of:
              0.11053244 = score(doc=5247,freq=4.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.41792953 = fieldWeight in 5247, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5247)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Conventional approaches to relevance ranking typically optimize ranking models by each query separately. The traditional cluster hypothesis also does not consider the dependency between related queries. The goal of this paper is to leverage similar search intents to perform ranking consistency so that the search performance can be improved accordingly. Different from the previous supervised approach, which learns relevance by click-through data, we propose a novel cocluster hypothesis to bridge the gap between relevance ranking and ranking consistency. A nearest-neighbors test is also designed to measure the extent to which the cocluster hypothesis holds. Based on the hypothesis, we further propose a two-stage unsupervised approach, in which two ranking heuristics and a cost function are developed to optimize the combination of consistency and uniqueness (or inconsistency). Extensive experiments have been conducted on a real and large-scale search engine log. The experimental results not only verify the applicability of the proposed cocluster hypothesis but also show that our approach is effective in boosting the retrieval performance of the commercial search engine and reaches a comparable performance to the supervised approach.
  10. Dominich, S.; Skrop, A.: PageRank and interaction information retrieval (2005) 0.12
    0.117224276 = product of:
      0.15629904 = sum of:
        0.06981198 = weight(_text_:web in 3268) [ClassicSimilarity], result of:
          0.06981198 = score(doc=3268,freq=8.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.43268442 = fieldWeight in 3268, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=3268)
        0.03959212 = weight(_text_:search in 3268) [ClassicSimilarity], result of:
          0.03959212 = score(doc=3268,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.230407 = fieldWeight in 3268, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=3268)
        0.04689494 = product of:
          0.09378988 = sum of:
            0.09378988 = weight(_text_:engine in 3268) [ClassicSimilarity], result of:
              0.09378988 = score(doc=3268,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.35462496 = fieldWeight in 3268, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3268)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The PageRank method is used by the Google Web search engine to compute the importance of Web pages. Two different views have been developed for the Interpretation of the PageRank method and values: (a) stochastic (random surfer): the PageRank values can be conceived as the steady-state distribution of a Markov chain, and (b) algebraic: the PageRank values form the eigenvector corresponding to eigenvalue 1 of the Web link matrix. The Interaction Information Retrieval (1**2 R) method is a nonclassical information retrieval paradigm, which represents a connectionist approach based an dynamic systems. In the present paper, a different Interpretation of PageRank is proposed, namely, a dynamic systems viewpoint, by showing that the PageRank method can be formally interpreted as a particular case of the Interaction Information Retrieval method; and thus, the PageRank values may be interpreted as neutral equilibrium points of the Web.
  11. Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (2005) 0.11
    0.11383371 = product of:
      0.15177828 = sum of:
        0.032909684 = weight(_text_:web in 7) [ClassicSimilarity], result of:
          0.032909684 = score(doc=7,freq=4.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.2039694 = fieldWeight in 7, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=7)
        0.07465562 = weight(_text_:search in 7) [ClassicSimilarity], result of:
          0.07465562 = score(doc=7,freq=16.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.43445963 = fieldWeight in 7, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.03125 = fieldNorm(doc=7)
        0.044212975 = product of:
          0.08842595 = sum of:
            0.08842595 = weight(_text_:engine in 7) [ClassicSimilarity], result of:
              0.08842595 = score(doc=7,freq=4.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.3343436 = fieldWeight in 7, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.03125 = fieldNorm(doc=7)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The second edition of Understanding Search Engines: Mathematical Modeling and Text Retrieval follows the basic premise of the first edition by discussing many of the key design issues for building search engines and emphasizing the important role that applied mathematics can play in improving information retrieval. The authors discuss important data structures, algorithms, and software as well as user-centered issues such as interfaces, manual indexing, and document preparation. Significant changes bring the text up to date on current information retrieval methods: for example the addition of a new chapter on link-structure algorithms used in search engines such as Google. The chapter on user interface has been rewritten to specifically focus on search engine usability. In addition the authors have added new recommendations for further reading and expanded the bibliography, and have updated and streamlined the index to make it more reader friendly.
    Content
    Inhalt: Introduction Document File Preparation - Manual Indexing - Information Extraction - Vector Space Modeling - Matrix Decompositions - Query Representations - Ranking and Relevance Feedback - Searching by Link Structure - User Interface - Book Format Document File Preparation Document Purification and Analysis - Text Formatting - Validation - Manual Indexing - Automatic Indexing - Item Normalization - Inverted File Structures - Document File - Dictionary List - Inversion List - Other File Structures Vector Space Models Construction - Term-by-Document Matrices - Simple Query Matching - Design Issues - Term Weighting - Sparse Matrix Storage - Low-Rank Approximations Matrix Decompositions QR Factorization - Singular Value Decomposition - Low-Rank Approximations - Query Matching - Software - Semidiscrete Decomposition - Updating Techniques Query Management Query Binding - Types of Queries - Boolean Queries - Natural Language Queries - Thesaurus Queries - Fuzzy Queries - Term Searches - Probabilistic Queries Ranking and Relevance Feedback Performance Evaluation - Precision - Recall - Average Precision - Genetic Algorithms - Relevance Feedback Searching by Link Structure HITS Method - HITS Implementation - HITS Summary - PageRank Method - PageRank Adjustments - PageRank Implementation - PageRank Summary User Interface Considerations General Guidelines - Search Engine Interfaces - Form Fill-in - Display Considerations - Progress Indication - No Penalties for Error - Results - Test and Retest - Final Considerations Further Reading
    LCSH
    Web search engines
    Subject
    Web search engines
  12. Hubert, G.; Mothe, J.: ¬An adaptable search engine for multimodal information retrieval (2009) 0.11
    0.106939115 = product of:
      0.21387823 = sum of:
        0.105578996 = weight(_text_:search in 2951) [ClassicSimilarity], result of:
          0.105578996 = score(doc=2951,freq=8.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.6144187 = fieldWeight in 2951, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0625 = fieldNorm(doc=2951)
        0.10829923 = product of:
          0.21659847 = sum of:
            0.21659847 = weight(_text_:engine in 2951) [ClassicSimilarity], result of:
              0.21659847 = score(doc=2951,freq=6.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.8189713 = fieldWeight in 2951, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2951)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    This article describes an information retrieval approach according to the two different search modes that exist: browsing an ontology (via categories) or defining a query in free language (via keywords). Various proposals offer approaches adapted to one of these two modes. We present a proposal leading to a system allowing the integration of both modes using the same search engine. This engine is adapted according to each possible search mode.
  13. Chang, C.-H.; Hsu, C.-C.: Integrating query expansion and conceptual relevance feedback for personalized Web information retrieval (1998) 0.11
    0.1051279 = product of:
      0.14017053 = sum of:
        0.07053544 = weight(_text_:web in 1319) [ClassicSimilarity], result of:
          0.07053544 = score(doc=1319,freq=6.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.43716836 = fieldWeight in 1319, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1319)
        0.046190813 = weight(_text_:search in 1319) [ClassicSimilarity], result of:
          0.046190813 = score(doc=1319,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.2688082 = fieldWeight in 1319, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1319)
        0.023444273 = product of:
          0.046888545 = sum of:
            0.046888545 = weight(_text_:22 in 1319) [ClassicSimilarity], result of:
              0.046888545 = score(doc=1319,freq=2.0), product of:
                0.17312855 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049439456 = queryNorm
                0.2708308 = fieldWeight in 1319, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1319)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Keyword based querying has been an immediate and efficient way to specify and retrieve related information that the user inquired. However, conventional document ranking based on an automatic assessment of document relevance to the query may not be the best approach when little information is given. Proposes an idea to integrate 2 existing techniques, query expansion and relevance feedback to achieve a concept-based information search for the Web
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia
  14. Notess, G.R.: Search engine relevance : the never-ending quest (2000) 0.10
    0.10090158 = product of:
      0.20180316 = sum of:
        0.09238163 = weight(_text_:search in 4797) [ClassicSimilarity], result of:
          0.09238163 = score(doc=4797,freq=2.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.5376164 = fieldWeight in 4797, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.109375 = fieldNorm(doc=4797)
        0.10942154 = product of:
          0.21884307 = sum of:
            0.21884307 = weight(_text_:engine in 4797) [ClassicSimilarity], result of:
              0.21884307 = score(doc=4797,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.82745826 = fieldWeight in 4797, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4797)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
  15. Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.10
    0.09984501 = product of:
      0.13312668 = sum of:
        0.029088326 = weight(_text_:web in 56) [ClassicSimilarity], result of:
          0.029088326 = score(doc=56,freq=2.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.18028519 = fieldWeight in 56, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=56)
        0.08729243 = weight(_text_:search in 56) [ClassicSimilarity], result of:
          0.08729243 = score(doc=56,freq=14.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.5079997 = fieldWeight in 56, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=56)
        0.01674591 = product of:
          0.03349182 = sum of:
            0.03349182 = weight(_text_:22 in 56) [ClassicSimilarity], result of:
              0.03349182 = score(doc=56,freq=2.0), product of:
                0.17312855 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049439456 = queryNorm
                0.19345059 = fieldWeight in 56, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=56)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.
    Date
    22. 7.2006 16:32:43
  16. Langville, A.N.; Meyer, C.D.: Google's PageRank and beyond : the science of search engine rankings (2006) 0.10
    0.096136436 = product of:
      0.12818192 = sum of:
        0.052358985 = weight(_text_:web in 6) [ClassicSimilarity], result of:
          0.052358985 = score(doc=6,freq=18.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.32451332 = fieldWeight in 6, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0234375 = fieldNorm(doc=6)
        0.052375462 = weight(_text_:search in 6) [ClassicSimilarity], result of:
          0.052375462 = score(doc=6,freq=14.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.30479985 = fieldWeight in 6, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0234375 = fieldNorm(doc=6)
        0.02344747 = product of:
          0.04689494 = sum of:
            0.04689494 = weight(_text_:engine in 6) [ClassicSimilarity], result of:
              0.04689494 = score(doc=6,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.17731248 = fieldWeight in 6, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=6)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Why doesn't your home page appear on the first page of search results, even when you query your own name? How do other Web pages always appear at the top? What creates these powerful rankings? And how? The first book ever about the science of Web page rankings, "Google's PageRank and Beyond" supplies the answers to these and other questions and more. The book serves two very different audiences: the curious science reader and the technical computational reader. The chapters build in mathematical sophistication, so that the first five are accessible to the general academic reader. While other chapters are much more mathematical in nature, each one contains something for both audiences. For example, the authors include entertaining asides such as how search engines make money and how the Great Firewall of China influences research. The book includes an extensive background chapter designed to help readers learn more about the mathematics of search engines, and it contains several MATLAB codes and links to sample Web data sets. The philosophy throughout is to encourage readers to experiment with the ideas and algorithms in the text. Any business seriously interested in improving its rankings in the major search engines can benefit from the clear examples, sample code, and list of resources provided. It includes: many illustrative examples and entertaining asides; MATLAB code; accessible and informal style; and complete and self-contained section for mathematics review.
    Content
    Inhalt: Chapter 1. Introduction to Web Search Engines: 1.1 A Short History of Information Retrieval - 1.2 An Overview of Traditional Information Retrieval - 1.3 Web Information Retrieval Chapter 2. Crawling, Indexing, and Query Processing: 2.1 Crawling - 2.2 The Content Index - 2.3 Query Processing Chapter 3. Ranking Webpages by Popularity: 3.1 The Scene in 1998 - 3.2 Two Theses - 3.3 Query-Independence Chapter 4. The Mathematics of Google's PageRank: 4.1 The Original Summation Formula for PageRank - 4.2 Matrix Representation of the Summation Equations - 4.3 Problems with the Iterative Process - 4.4 A Little Markov Chain Theory - 4.5 Early Adjustments to the Basic Model - 4.6 Computation of the PageRank Vector - 4.7 Theorem and Proof for Spectrum of the Google Matrix Chapter 5. Parameters in the PageRank Model: 5.1 The a Factor - 5.2 The Hyperlink Matrix H - 5.3 The Teleportation Matrix E Chapter 6. The Sensitivity of PageRank; 6.1 Sensitivity with respect to alpha - 6.2 Sensitivity with respect to H - 6.3 Sensitivity with respect to vT - 6.4 Other Analyses of Sensitivity - 6.5 Sensitivity Theorems and Proofs Chapter 7. The PageRank Problem as a Linear System: 7.1 Properties of (I - alphaS) - 7.2 Properties of (I - alphaH) - 7.3 Proof of the PageRank Sparse Linear System Chapter 8. Issues in Large-Scale Implementation of PageRank: 8.1 Storage Issues - 8.2 Convergence Criterion - 8.3 Accuracy - 8.4 Dangling Nodes - 8.5 Back Button Modeling
    Chapter 9. Accelerating the Computation of PageRank: 9.1 An Adaptive Power Method - 9.2 Extrapolation - 9.3 Aggregation - 9.4 Other Numerical Methods Chapter 10. Updating the PageRank Vector: 10.1 The Two Updating Problems and their History - 10.2 Restarting the Power Method - 10.3 Approximate Updating Using Approximate Aggregation - 10.4 Exact Aggregation - 10.5 Exact vs. Approximate Aggregation - 10.6 Updating with Iterative Aggregation - 10.7 Determining the Partition - 10.8 Conclusions Chapter 11. The HITS Method for Ranking Webpages: 11.1 The HITS Algorithm - 11.2 HITS Implementation - 11.3 HITS Convergence - 11.4 HITS Example - 11.5 Strengths and Weaknesses of HITS - 11.6 HITS's Relationship to Bibliometrics - 11.7 Query-Independent HITS - 11.8 Accelerating HITS - 11.9 HITS Sensitivity Chapter 12. Other Link Methods for Ranking Webpages: 12.1 SALSA - 12.2 Hybrid Ranking Methods - 12.3 Rankings based on Traffic Flow Chapter 13. The Future of Web Information Retrieval: 13.1 Spam - 13.2 Personalization - 13.3 Clustering - 13.4 Intelligent Agents - 13.5 Trends and Time-Sensitive Search - 13.6 Privacy and Censorship - 13.7 Library Classification Schemes - 13.8 Data Fusion Chapter 14. Resources for Web Information Retrieval: 14.1 Resources for Getting Started - 14.2 Resources for Serious Study Chapter 15. The Mathematics Guide: 15.1 Linear Algebra - 15.2 Perron-Frobenius Theory - 15.3 Markov Chains - 15.4 Perron Complementation - 15.5 Stochastic Complementation - 15.6 Censoring - 15.7 Aggregation - 15.8 Disaggregation
    RSWK
    Google / Web-Seite / Rangstatistik (HEBIS)
    Subject
    Google / Web-Seite / Rangstatistik (HEBIS)
  17. Lempel, R.; Moran, S.: SALSA: the stochastic approach for link-structure analysis (2001) 0.10
    0.095157 = product of:
      0.126876 = sum of:
        0.041137107 = weight(_text_:web in 10) [ClassicSimilarity], result of:
          0.041137107 = score(doc=10,freq=4.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.25496176 = fieldWeight in 10, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=10)
        0.046659768 = weight(_text_:search in 10) [ClassicSimilarity], result of:
          0.046659768 = score(doc=10,freq=4.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.27153727 = fieldWeight in 10, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=10)
        0.03907912 = product of:
          0.07815824 = sum of:
            0.07815824 = weight(_text_:engine in 10) [ClassicSimilarity], result of:
              0.07815824 = score(doc=10,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.29552078 = fieldWeight in 10, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=10)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Today, when searching for information on the WWW, one usually performs a query through a term-based search engine. These engines return, as the query's result, a list of Web pages whose contents matches the query. For broad-topic queries, such searches often result in a huge set of retrieved documents, many of which are irrelevant to the user. However, much information is contained in the link-structure of the WWW. Information such as which pages are linked to others can be used to augment search algorithms. In this context, Jon Kleinberg introduced the notion of two distinct types of Web pages: hubs and authorities. Kleinberg argued that hubs and authorities exhibit a mutually reinforcing relationship: a good hub will point to many authorities, and a good authority will be pointed at by many hubs. In light of this, he dervised an algoirthm aimed at finding authoritative pages. We present SALSA, a new stochastic approach for link-structure analysis, which examines random walks on graphs derived from the link-structure. We show that both SALSA and Kleinberg's Mutual Reinforcement approach employ the same metaalgorithm. We then prove that SALSA is quivalent to a weighted in degree analysis of the link-sturcutre of WWW subgraphs, making it computationally more efficient than the Mutual reinforcement approach. We compare that results of applying SALSA to the results derived through Kleinberg's approach. These comparisions reveal a topological Phenomenon called the TKC effectwhich, in certain cases, prevents the Mutual reinforcement approach from identifying meaningful authorities.
  18. Furner, J.: ¬A unifying model of document relatedness for hybrid search engines (2003) 0.09
    0.09498589 = product of:
      0.18997177 = sum of:
        0.055991717 = weight(_text_:search in 2717) [ClassicSimilarity], result of:
          0.055991717 = score(doc=2717,freq=4.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.3258447 = fieldWeight in 2717, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.046875 = fieldNorm(doc=2717)
        0.13398007 = sum of:
          0.09378988 = weight(_text_:engine in 2717) [ClassicSimilarity], result of:
            0.09378988 = score(doc=2717,freq=2.0), product of:
              0.26447627 = queryWeight, product of:
                5.349498 = idf(docFreq=570, maxDocs=44218)
                0.049439456 = queryNorm
              0.35462496 = fieldWeight in 2717, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.349498 = idf(docFreq=570, maxDocs=44218)
                0.046875 = fieldNorm(doc=2717)
          0.04019018 = weight(_text_:22 in 2717) [ClassicSimilarity], result of:
            0.04019018 = score(doc=2717,freq=2.0), product of:
              0.17312855 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049439456 = queryNorm
              0.23214069 = fieldWeight in 2717, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2717)
      0.5 = coord(2/4)
    
    Abstract
    Previous work an search-engine design has indicated that information-seekers may benefit from being given the opportunity to exploit multiple sources of evidence of document relatedness. Few existing systems, however, give users more than minimal control over the selections that may be made among methods of exploitation. By applying the methods of "document network analysis" (DNA), a unifying, graph-theoretic model of content-, collaboration-, and context-based systems (CCC) may be developed in which the nature of the similarities between types of document relatedness and document ranking are clarified. The usefulness of the approach to system design suggested by this model may be tested by constructing and evaluating a prototype system (UCXtra) that allows searchers to maintain control over the multiple ways in which document collections may be ranked and re-ranked.
    Date
    11. 9.2004 17:32:22
  19. Ning, X.; Jin, H.; Wu, H.: RSS: a framework enabling ranked search on the semantic web (2008) 0.09
    0.09062726 = product of:
      0.18125452 = sum of:
        0.08227421 = weight(_text_:web in 2069) [ClassicSimilarity], result of:
          0.08227421 = score(doc=2069,freq=16.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.5099235 = fieldWeight in 2069, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2069)
        0.09898031 = weight(_text_:search in 2069) [ClassicSimilarity], result of:
          0.09898031 = score(doc=2069,freq=18.0), product of:
            0.17183559 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.049439456 = queryNorm
            0.5760175 = fieldWeight in 2069, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2069)
      0.5 = coord(2/4)
    
    Abstract
    The semantic web not only contains resources but also includes the heterogeneous relationships among them, which is sharply distinguished from the current web. As the growth of the semantic web, specialized search techniques are of significance. In this paper, we present RSS-a framework for enabling ranked semantic search on the semantic web. In this framework, the heterogeneity of relationships is fully exploited to determine the global importance of resources. In addition, the search results can be greatly expanded with entities most semantically related to the query, thus able to provide users with properly ordered semantic search results by combining global ranking values and the relevance between the resources and the query. The proposed semantic search model which supports inference is very different from traditional keyword-based search methods. Moreover, RSS also distinguishes from many current methods of accessing the semantic web data in that it applies novel ranking strategies to prevent returning search results in disorder. The experimental results show that the framework is feasible and can produce better ordering of semantic search results than directly applying the standard PageRank algorithm on the semantic web.
    Theme
    Semantic Web
  20. Meghabghab, G.: Google's Web page ranking applied to different topological Web graph structures (2001) 0.09
    0.08929091 = product of:
      0.17858182 = sum of:
        0.1395027 = weight(_text_:web in 6028) [ClassicSimilarity], result of:
          0.1395027 = score(doc=6028,freq=46.0), product of:
            0.16134618 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.049439456 = queryNorm
            0.86461735 = fieldWeight in 6028, product of:
              6.78233 = tf(freq=46.0), with freq of:
                46.0 = termFreq=46.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6028)
        0.03907912 = product of:
          0.07815824 = sum of:
            0.07815824 = weight(_text_:engine in 6028) [ClassicSimilarity], result of:
              0.07815824 = score(doc=6028,freq=2.0), product of:
                0.26447627 = queryWeight, product of:
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.049439456 = queryNorm
                0.29552078 = fieldWeight in 6028, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.349498 = idf(docFreq=570, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6028)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    This research is part of the ongoing study to better understand web page ranking on the web. It looks at a web page as a graph structure or a web graph, and tries to classify different web graphs in the new coordinate space: (out-degree, in-degree). The out-degree coordinate od is defined as the number of outgoing web pages from a given web page. The in-degree id coordinate is the number of web pages that point to a given web page. In this new coordinate space a metric is built to classify how close or far different web graphs are. Google's web ranking algorithm (Brin & Page, 1998) on ranking web pages is applied in this new coordinate space. The results of the algorithm has been modified to fit different topological web graph structures. Also the algorithm was not successful in the case of general web graphs and new ranking web algorithms have to be considered. This study does not look at enhancing web ranking by adding any contextual information. It only considers web links as a source to web page ranking. The author believes that understanding the underlying web page as a graph will help design better ranking web algorithms, enhance retrieval and web performance, and recommends using graphs as a part of visual aid for browsing engine designers

Languages

  • e 143
  • d 12
  • chi 1
  • m 1
  • More… Less…

Types

  • a 142
  • m 7
  • el 4
  • s 4
  • r 2
  • p 1
  • x 1
  • More… Less…