Search (136 results, page 1 of 7)

Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.26

0.26002502 = product of:
  0.39003754 = sum of:
    0.093939 = weight(_text_:search in 3445) [ClassicSimilarity], result of:
      0.093939 = score(doc=3445,freq=2.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.5376164 = fieldWeight in 3445, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.109375 = fieldNorm(doc=3445)
    0.29609853 = sum of:
      0.20074052 = weight(_text_:engines in 3445) [ClassicSimilarity], result of:
        0.20074052 = score(doc=3445,freq=2.0), product of:
          0.25542772 = queryWeight, product of:
            5.080822 = idf(docFreq=746, maxDocs=44218)
            0.05027291 = queryNorm
          0.7858995 = fieldWeight in 3445, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            5.080822 = idf(docFreq=746, maxDocs=44218)
            0.109375 = fieldNorm(doc=3445)
      0.095358 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
        0.095358 = score(doc=3445,freq=2.0), product of:
          0.17604718 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05027291 = queryNorm
          0.5416616 = fieldWeight in 3445, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.109375 = fieldNorm(doc=3445)
  0.6666667 = coord(2/3)
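
The figures in this explain tree can be recomputed by hand. Below is a minimal Python sketch, assuming the classic Lucene TF-IDF formulas `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`, which agree with the values shown. `queryNorm` and `fieldNorm` are copied from the output rather than derived, and the function and variable names are illustrative only.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    # Assumed classic formula; it reproduces
    # idf(docFreq=3718, maxDocs=44218) = 3.475677 from the tree.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    tf = math.sqrt(freq)                   # tf(freq=2.0) = 1.4142135
    w = idf(doc_freq, max_docs)
    query_weight = w * query_norm          # idf * queryNorm
    field_weight = tf * w * field_norm     # tf * idf * fieldNorm
    return query_weight * field_weight

# Constants copied from the explain output for doc 3445.
QUERY_NORM = 0.05027291
FIELD_NORM = 0.109375
MAX_DOCS = 44218

search = term_score(2.0, 3718, MAX_DOCS, QUERY_NORM, FIELD_NORM)
engines = term_score(2.0, 746, MAX_DOCS, QUERY_NORM, FIELD_NORM)
twenty_two = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The clause scores are summed, then scaled by coord(2/3).
total = (search + engines + twenty_two) * (2 / 3)
print(round(total, 4))
```

Because `idf` enters both `queryWeight` and `fieldWeight`, each term's contribution grows with the square of its idf, which is why the rarer term `engines` outweighs `search` at the same frequency.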

Date: 25. 8.2005 17:42:22

Stock, W.G.: On relevance distributions (2006) 0.13

0.12821087 = product of:
  0.1923163 = sum of:
    0.0929755 = weight(_text_:search in 5116) [ClassicSimilarity], result of:
      0.0929755 = score(doc=5116,freq=6.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.5321022 = fieldWeight in 5116, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0625 = fieldNorm(doc=5116)
    0.099340804 = product of:
      0.19868161 = sum of:
        0.19868161 = weight(_text_:engines in 5116) [ClassicSimilarity], result of:
          0.19868161 = score(doc=5116,freq=6.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.7778389 = fieldWeight in 5116, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0625 = fieldNorm(doc=5116)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
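
This tree applies two coordination factors: `coord(1/2)` inside the nested clause, where only `engines` matched, and `coord(2/3)` at the top level. The sketch below assumes that `coord(overlap, maxOverlap)` is simply the fraction of clauses that matched; the leaf scores are copied from the explain output, and the variable names are illustrative.

```python
def coord(overlap: int, max_overlap: int) -> float:
    # Assumed definition: fraction of query clauses that matched.
    return overlap / max_overlap

# Leaf scores copied from the explain tree for doc 5116.
search_score = 0.0929755
engines_score = 0.19868161

# Nested clause: 1 of its 2 terms matched, so its sum is halved.
inner = engines_score * coord(1, 2)

# Top level: the clause sum is scaled by coord(2/3).
outer = (search_score + inner) * coord(2, 3)
print(round(outer, 4))
```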

Abstract: There are at least three possible ways that documents are distributed by relevance: informetric (power law), inverse logistic, and dichotomous. The nature of the type of distribution has implications for the construction of relevance ranking algorithms for search engines, for automated (blind) relevance feedback, for user behavior when using Web search engines, for combining of outputs of search engines for metasearch, for topic detection and tracking, and for the methodology of evaluation of information retrieval systems.

Wills, R.S.: Google's PageRank : the math behind the search engine (2006) 0.13
```
0.12667575 = product of:
  0.19001363 = sum of:
    0.12588942 = weight(_text_:search in 5954) [ClassicSimilarity], result of:
      0.12588942 = score(doc=5954,freq=44.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.72046983 = fieldWeight in 5954, product of:
          6.6332498 = tf(freq=44.0), with freq of:
            44.0 = termFreq=44.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.03125 = fieldNorm(doc=5954)
    0.06412421 = product of:
      0.12824842 = sum of:
        0.12824842 = weight(_text_:engines in 5954) [ClassicSimilarity], result of:
          0.12824842 = score(doc=5954,freq=10.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.50209284 = fieldWeight in 5954, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.03125 = fieldNorm(doc=5954)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Approximately 91 million American adults use the Internet on a typical day. The number-one Internet activity is reading and writing e-mail. Search engine use is next in line and continues to increase in popularity. In fact, survey findings indicate that nearly 60 million American adults use search engines on a given day. Even though there are many Internet search engines, Google, Yahoo!, and MSN receive over 81% of all search requests. Despite claims that the quality of search provided by Yahoo! and MSN now equals that of Google, Google continues to thrive as the search engine of choice, receiving over 46% of all search requests, nearly double the volume of Yahoo! and over four times that of MSN. I use Google's search engine on a daily basis and rarely request information from other search engines. One day, I decided to visit the homepages of Google, Yahoo!, and MSN to compare the quality of search results. Coffee was on my mind that day, so I entered the simple query "coffee" in the search box at each homepage. Table 1 shows the top ten (unsponsored) results returned by each search engine. Although ordered differently, two webpages, www.peets.com and www.coffeegeek.com, appear in all three top ten lists. In addition, each pairing of top ten lists has two additional results in common. Depending on the information I hoped to obtain about coffee by using the search engines, I could argue that any one of the three returned better results; however, I was not looking for a particular webpage, so all three listings of search results seemed of equal quality. Thus, I plan to continue using Google. My decision is indicative of the problem Yahoo!, MSN, and other search engine companies face in the quest to obtain a larger percentage of Internet search volume. Search engine users are loyal to one or a few search engines and are generally happy with search results.
Thus, as long as Google continues to provide results deemed high in quality, Google likely will remain the top search engine. But what set Google apart from its competitors in the first place? The answer is PageRank. In this article I explain this simple mathematical algorithm that revolutionized Web search.
Khoo, C.S.G.; Wan, K.-W.: ¬A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.12
```
0.122785255 = product of:
  0.18417788 = sum of:
    0.11015324 = weight(_text_:search in 2509) [ClassicSimilarity], result of:
      0.11015324 = score(doc=2509,freq=44.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.6304111 = fieldWeight in 2509, product of:
          6.6332498 = tf(freq=44.0), with freq of:
            44.0 = termFreq=44.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.02734375 = fieldNorm(doc=2509)
    0.07402463 = sum of:
      0.05018513 = weight(_text_:engines in 2509) [ClassicSimilarity], result of:
        0.05018513 = score(doc=2509,freq=2.0), product of:
          0.25542772 = queryWeight, product of:
            5.080822 = idf(docFreq=746, maxDocs=44218)
            0.05027291 = queryNorm
          0.19647488 = fieldWeight in 2509, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            5.080822 = idf(docFreq=746, maxDocs=44218)
            0.02734375 = fieldNorm(doc=2509)
      0.0238395 = weight(_text_:22 in 2509) [ClassicSimilarity], result of:
        0.0238395 = score(doc=2509,freq=2.0), product of:
          0.17604718 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05027291 = queryNorm
          0.1354154 = fieldWeight in 2509, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.02734375 = fieldNorm(doc=2509)
  0.6666667 = coord(2/3)
```
Abstract

A relevancy-ranking algorithm for a natural language interface to Boolean online public access catalogs (OPACs) was formulated and compared with that currently used in a knowledge-based search interface called the E-Referencer, being developed by the authors. The algorithm makes use of seven well-known ranking criteria: breadth of match, section weighting, proximity of query words, variant word forms (stemming), document frequency, term frequency and document length. The algorithm converts a natural language query into a series of increasingly broader Boolean search statements. In a small experiment with ten subjects in which the algorithm was simulated by hand, the algorithm obtained good results with a mean overall precision of 0.42 and mean average precision of 0.62, representing a 27 percent improvement in precision and 41 percent improvement in average precision compared to the E-Referencer. The usefulness of each step in the algorithm was analyzed and suggestions are made for improving the algorithm.
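
The two evaluation measures quoted in this abstract, precision and average precision, can be illustrated with a short Python sketch. It uses common textbook definitions, which may differ in detail from those used in the study. In particular, `average_precision` here divides by the number of relevant records retrieved; another common variant divides by the number of relevant records in the whole collection. The relevance judgments are invented for illustration and are not data from the paper.

```python
def precision(relevant_flags):
    # Fraction of retrieved records judged relevant.
    if not relevant_flags:
        return 0.0
    return sum(relevant_flags) / len(relevant_flags)

def average_precision(relevant_flags):
    # Mean of the precision values taken at each rank where a
    # relevant record appears (assumed variant, see lead-in).
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

# Hypothetical ranked list: relevant records at ranks 1 and 3.
ranked = [True, False, True, False]
print(precision(ranked), round(average_precision(ranked), 4))
```

Average precision rewards rankings that place relevant records early, which is why it can improve more than overall precision when only the ordering changes.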

Content

"Most Web search engines accept natural language queries, perform some kind of fuzzy matching and produce ranked output, displaying first the documents that are most likely to be relevant. On the other hand, most library online public access catalogs (OPACs) on the Web are still Boolean retrieval systems that perform exact matching, and require users to express their search requests precisely in a Boolean search language and to refine their search statements to improve the search results. It is well-documented that users have difficulty searching Boolean OPACs effectively (e.g. Borgman, 1996; Ensor, 1992; Wallace, 1993). One approach to making OPACs easier to use is to develop a natural language search interface that acts as a middleware between the user's Web browser and the OPAC system. The search interface can accept a natural language query from the user and reformulate it as a series of Boolean search statements that are then submitted to the OPAC. The records retrieved by the OPAC are ranked by the search interface before forwarding them to the user's Web browser. The user, then, does not need to interact directly with the Boolean OPAC but with the natural language search interface or search intermediary. The search interface interacts with the OPAC system on the user's behalf. The advantage of this approach is that no modification to the OPAC or library system is required. Furthermore, the search interface can access multiple OPACs, acting as a meta search engine, and integrate search results from various OPACs before sending them to the user. The search interface needs to incorporate a method for converting the user's natural language query into a series of Boolean search statements, and for ranking the OPAC records retrieved. The purpose of this study was to develop a relevancy-ranking algorithm for a search interface to Boolean OPAC systems.
This is part of an on-going effort to develop a knowledge-based search interface to OPACs called the E-Referencer (Khoo et al., 1998, 1999; Poo et al., 2000). E-Referencer v. 2 that has been implemented applies a repertoire of initial search strategies and reformulation strategies to retrieve records from OPACs using the Z39.50 protocol, and also assists users in mapping query keywords to the Library of Congress subject headings."

Source

Electronic library. 22(2004) no.2, pp.112-120

Furner, J.: ¬A unifying model of document relatedness for hybrid search engines (2003) 0.12

0.12255666 = product of:
  0.18383498 = sum of:
    0.056935627 = weight(_text_:search in 2717) [ClassicSimilarity], result of:
      0.056935627 = score(doc=2717,freq=4.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.3258447 = fieldWeight in 2717, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.046875 = fieldNorm(doc=2717)
    0.12689936 = sum of:
      0.08603165 = weight(_text_:engines in 2717) [ClassicSimilarity], result of:
        0.08603165 = score(doc=2717,freq=2.0), product of:
          0.25542772 = queryWeight, product of:
            5.080822 = idf(docFreq=746, maxDocs=44218)
            0.05027291 = queryNorm
          0.33681408 = fieldWeight in 2717, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            5.080822 = idf(docFreq=746, maxDocs=44218)
            0.046875 = fieldNorm(doc=2717)
      0.040867712 = weight(_text_:22 in 2717) [ClassicSimilarity], result of:
        0.040867712 = score(doc=2717,freq=2.0), product of:
          0.17604718 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05027291 = queryNorm
          0.23214069 = fieldWeight in 2717, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=2717)
  0.6666667 = coord(2/3)

Abstract: Previous work on search-engine design has indicated that information-seekers may benefit from being given the opportunity to exploit multiple sources of evidence of document relatedness. Few existing systems, however, give users more than minimal control over the selections that may be made among methods of exploitation. By applying the methods of "document network analysis" (DNA), a unifying, graph-theoretic model of content-, collaboration-, and context-based systems (CCC) may be developed in which the nature of the similarities between types of document relatedness and document ranking are clarified. The usefulness of the approach to system design suggested by this model may be tested by constructing and evaluating a prototype system (UCXtra) that allows searchers to maintain control over the multiple ways in which document collections may be ranked and re-ranked.
Date: 11. 9.2004 17:32:22

Joss, M.W.; Wszola, S.: ¬The engines that can : text search and retrieval software, their strategies, and vendors (1996) 0.11

0.11143929 = product of:
  0.16715893 = sum of:
    0.04025957 = weight(_text_:search in 5123) [ClassicSimilarity], result of:
      0.04025957 = score(doc=5123,freq=2.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.230407 = fieldWeight in 5123, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.046875 = fieldNorm(doc=5123)
    0.12689936 = sum of:
      0.08603165 = weight(_text_:engines in 5123) [ClassicSimilarity], result of:
        0.08603165 = score(doc=5123,freq=2.0), product of:
          0.25542772 = queryWeight, product of:
            5.080822 = idf(docFreq=746, maxDocs=44218)
            0.05027291 = queryNorm
          0.33681408 = fieldWeight in 5123, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            5.080822 = idf(docFreq=746, maxDocs=44218)
            0.046875 = fieldNorm(doc=5123)
      0.040867712 = weight(_text_:22 in 5123) [ClassicSimilarity], result of:
        0.040867712 = score(doc=5123,freq=2.0), product of:
          0.17604718 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05027291 = queryNorm
          0.23214069 = fieldWeight in 5123, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=5123)
  0.6666667 = coord(2/3)

Date: 12. 9.1996 13:56:22

Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (1999) 0.11

0.11103386 = product of:
  0.16655079 = sum of:
    0.08051914 = weight(_text_:search in 5777) [ClassicSimilarity], result of:
      0.08051914 = score(doc=5777,freq=8.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.460814 = fieldWeight in 5777, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.046875 = fieldNorm(doc=5777)
    0.08603165 = product of:
      0.1720633 = sum of:
        0.1720633 = weight(_text_:engines in 5777) [ClassicSimilarity], result of:
          0.1720633 = score(doc=5777,freq=8.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.67362815 = fieldWeight in 5777, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.046875 = fieldNorm(doc=5777)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: This book discusses many of the key design issues for building search engines and emphasizes the important role that applied mathematics can play in improving information retrieval. The authors discuss not only important data structures, algorithms, and software but also user-centered issues such as interfaces, manual indexing, and document preparation. They also present some of the current problems in information retrieval that may not be familiar to applied mathematicians and computer scientists and some of the driving computational methods (SVD, SDD) for automated conceptual indexing.
LCSH: Web search engines
Subject: Web search engines

Bar-Ilan, J.; Levene, M.; Mat-Hassan, M.: Methods for evaluating dynamic changes in search engine rankings : a case study (2006) 0.11
```
0.10716495 = product of:
  0.16074742 = sum of:
    0.08487463 = weight(_text_:search in 616) [ClassicSimilarity], result of:
      0.08487463 = score(doc=616,freq=20.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.48574063 = fieldWeight in 616, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.03125 = fieldNorm(doc=616)
    0.075872794 = product of:
      0.15174559 = sum of:
        0.15174559 = weight(_text_:engines in 616) [ClassicSimilarity], result of:
          0.15174559 = score(doc=616,freq=14.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.59408426 = fieldWeight in 616, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.03125 = fieldNorm(doc=616)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Purpose - The objective of this paper is to characterize the changes in the rankings of the top ten results of major search engines over time and to compare the rankings between these engines. Design/methodology/approach - The paper compares rankings of the top-ten results of the search engines Google and AlltheWeb on ten identical queries over a period of three weeks. Only the top-ten results were considered, since users do not normally inspect more than the first results page returned by a search engine. The experiment was repeated twice, in October 2003 and in January 2004, in order to assess changes to the top-ten results of some of the queries during the three-month interval. In order to assess the changes in the rankings, three measures were computed for each data collection point and each search engine. Findings - The findings in this paper show that the rankings of AlltheWeb were highly stable over each period, while the rankings of Google underwent constant yet minor changes, with occasional major ones. Changes over time can be explained by the dynamic nature of the web or by fluctuations in the search engines' indexes. The top-ten results of the two search engines had surprisingly low overlap. With such small overlap, the task of comparing the rankings of the two engines becomes extremely challenging. Originality/value - The paper shows that because of the abundance of information on the web, ranking search results is of extreme importance. The paper compares several measures for computing the similarity between rankings of search tools, and shows that none of the measures is fully satisfactory as a standalone measure. It also demonstrates the apparent differences in the ranking algorithms of two widely used search engines.
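
The abstract does not name the three measures that were computed, so the following Python sketch covers only the simplest notion it mentions, the overlap between two top-ten lists. The Jaccard ratio is added as an assumption, not as one of the paper's measures, and the result identifiers are hypothetical placeholders.

```python
def top_k_overlap(ranking_a, ranking_b, k=10):
    # Number of results that appear in both top-k lists,
    # regardless of their positions.
    return len(set(ranking_a[:k]) & set(ranking_b[:k]))

def jaccard(ranking_a, ranking_b, k=10):
    # Overlap normalized by the size of the union of both lists.
    a, b = set(ranking_a[:k]), set(ranking_b[:k])
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical result lists from two engines for the same query.
engine_a = ["u1", "u2", "u3", "u4"]
engine_b = ["u3", "u9", "u1", "u7"]

print(top_k_overlap(engine_a, engine_b), round(jaccard(engine_a, engine_b), 4))
```

Set-based measures like these ignore rank order, which is one reason a low overlap makes rank comparison difficult: position-based measures have few shared items to work with.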

Jindal, V.; Bawa, S.; Batra, S.: ¬A review of ranking approaches for semantic search on Web (2014) 0.10

0.10057114 = product of:
  0.1508567 = sum of:
    0.09002314 = weight(_text_:search in 2799) [ClassicSimilarity], result of:
      0.09002314 = score(doc=2799,freq=10.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.51520574 = fieldWeight in 2799, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.046875 = fieldNorm(doc=2799)
    0.060833566 = product of:
      0.12166713 = sum of:
        0.12166713 = weight(_text_:engines in 2799) [ClassicSimilarity], result of:
          0.12166713 = score(doc=2799,freq=4.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.47632706 = fieldWeight in 2799, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.046875 = fieldNorm(doc=2799)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: With ever-increasing information being available to the end users, search engines have become the most powerful tools for obtaining useful information scattered on the Web. However, it is very common that even the most renowned search engines return result sets with not so useful pages to the user. Research on semantic search aims to improve traditional information search and retrieval methods where the basic relevance criteria rely primarily on the presence of query keywords within the returned pages. This work is an attempt to explore different relevancy ranking approaches based on semantics which are considered appropriate for the retrieval of relevant information. In this paper, various pilot projects and their corresponding outcomes have been investigated based on methodologies adopted and their most distinctive characteristics towards ranking. An overview of selected approaches and their comparison by means of the classification criteria has been presented. With the help of this comparison, some common concepts and outstanding features have been identified.

Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (2005) 0.10
```
0.097439155 = product of:
  0.14615873 = sum of:
    0.075914174 = weight(_text_:search in 7) [ClassicSimilarity], result of:
      0.075914174 = score(doc=7,freq=16.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.43445963 = fieldWeight in 7, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.03125 = fieldNorm(doc=7)
    0.07024455 = product of:
      0.1404891 = sum of:
        0.1404891 = weight(_text_:engines in 7) [ClassicSimilarity], result of:
          0.1404891 = score(doc=7,freq=12.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.5500151 = fieldWeight in 7, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.03125 = fieldNorm(doc=7)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

The second edition of Understanding Search Engines: Mathematical Modeling and Text Retrieval follows the basic premise of the first edition by discussing many of the key design issues for building search engines and emphasizing the important role that applied mathematics can play in improving information retrieval. The authors discuss important data structures, algorithms, and software as well as user-centered issues such as interfaces, manual indexing, and document preparation. Significant changes bring the text up to date on current information retrieval methods: for example the addition of a new chapter on link-structure algorithms used in search engines such as Google. The chapter on user interface has been rewritten to specifically focus on search engine usability. In addition the authors have added new recommendations for further reading and expanded the bibliography, and have updated and streamlined the index to make it more reader friendly.

Content

Introduction: Document File Preparation - Manual Indexing - Information Extraction - Vector Space Modeling - Matrix Decompositions - Query Representations - Ranking and Relevance Feedback - Searching by Link Structure - User Interface - Book Format / Document File Preparation: Document Purification and Analysis - Text Formatting - Validation - Manual Indexing - Automatic Indexing - Item Normalization - Inverted File Structures - Document File - Dictionary List - Inversion List - Other File Structures / Vector Space Models: Construction - Term-by-Document Matrices - Simple Query Matching - Design Issues - Term Weighting - Sparse Matrix Storage - Low-Rank Approximations / Matrix Decompositions: QR Factorization - Singular Value Decomposition - Low-Rank Approximations - Query Matching - Software - Semidiscrete Decomposition - Updating Techniques / Query Management: Query Binding - Types of Queries - Boolean Queries - Natural Language Queries - Thesaurus Queries - Fuzzy Queries - Term Searches - Probabilistic Queries / Ranking and Relevance Feedback: Performance Evaluation - Precision - Recall - Average Precision - Genetic Algorithms - Relevance Feedback / Searching by Link Structure: HITS Method - HITS Implementation - HITS Summary - PageRank Method - PageRank Adjustments - PageRank Implementation - PageRank Summary / User Interface Considerations: General Guidelines - Search Engine Interfaces - Form Fill-in - Display Considerations - Progress Indication - No Penalties for Error - Results - Test and Retest - Final Considerations / Further Reading

LCSH

Web search engines

Subject

Web search engines

Radev, D.; Fan, W.; Qu, H.; Wu, H.; Grewal, A.: Probabilistic question answering on the Web (2005) 0.10

0.09615815 = product of:
  0.14423722 = sum of:
    0.06973162 = weight(_text_:search in 3455) [ClassicSimilarity], result of:
      0.06973162 = score(doc=3455,freq=6.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.39907667 = fieldWeight in 3455, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.046875 = fieldNorm(doc=3455)
    0.074505605 = product of:
      0.14901121 = sum of:
        0.14901121 = weight(_text_:engines in 3455) [ClassicSimilarity], result of:
          0.14901121 = score(doc=3455,freq=6.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.58337915 = fieldWeight in 3455, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.046875 = fieldNorm(doc=3455)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this article, we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of .20 on the TREC8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.

Habernal, I.; Konopík, M.; Rohlík, O.: Question answering (2012) 0.10

0.09615815 = product of:
  0.14423722 = sum of:
    0.06973162 = weight(_text_:search in 101) [ClassicSimilarity], result of:
      0.06973162 = score(doc=101,freq=6.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.39907667 = fieldWeight in 101, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.046875 = fieldNorm(doc=101)
    0.074505605 = product of:
      0.14901121 = sum of:
        0.14901121 = weight(_text_:engines in 101) [ClassicSimilarity], result of:
          0.14901121 = score(doc=101,freq=6.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.58337915 = fieldWeight in 101, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.046875 = fieldNorm(doc=101)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: Question Answering is an area of information retrieval with the added challenge of applying sophisticated techniques to identify the complex syntactic and semantic relationships present in text in order to provide a more sophisticated and satisfactory response to the user's information needs. For this reason, the authors see question answering as the next step beyond standard information retrieval. This chapter covers state-of-the-art question answering, giving an overview of systems, techniques and approaches that are likely to be employed in the next generations of search engines. Special attention is paid to question answering using the World Wide Web as the data source and to question answering exploiting the possibilities of the Semantic Web. Considerations about the current issues and prospects for promising future research are also provided.
Footnote: Cf.: http://www.igi-global.com/book/next-generation-search-engines/64431.
Source: Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis et al.

Evans, R.: Beyond Boolean : relevance ranking, natural language and the new search paradigm (1994) 0.09

0.094235145 = product of:
  0.14135271 = sum of:
    0.08051914 = weight(_text_:search in 8578) [ClassicSimilarity], result of:
      0.08051914 = score(doc=8578,freq=8.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.460814 = fieldWeight in 8578, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.046875 = fieldNorm(doc=8578)
    0.060833566 = product of:
      0.12166713 = sum of:
        0.12166713 = weight(_text_:engines in 8578) [ClassicSimilarity], result of:
          0.12166713 = score(doc=8578,freq=4.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.47632706 = fieldWeight in 8578, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.046875 = fieldNorm(doc=8578)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: New full-text search engines that employ relevance ranking have become available on online services. These software tools provide increased ease of use by making natural language queries possible, and deliver superior recall. Even inexperienced end users can execute searches with good results. For experienced database searchers, the ranked search engines offer a technology that is complementary to structured Boolean strategy, not necessarily a replacement. Even traditional Boolean queries become useful when the results are ranked by probable relevance; such ranking can free users from overwhelming output. Relevance ranking also permits the use of statistical inference methods to find related terms. Using such tools to their best advantage requires rethinking some basic techniques, such as progressively narrowing queries until the retrieved set is small enough. Users should broaden their search to maximize recall, then browse retrieved documents or pare the set down from the top

Courtois, M.P.; Berry, M.W.: Results ranking in Web search engines (1999) 0.09

0.09252822 = product of:
  0.13879232 = sum of:
    0.06709928 = weight(_text_:search in 3726) [ClassicSimilarity], result of:
      0.06709928 = score(doc=3726,freq=2.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.3840117 = fieldWeight in 3726, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.078125 = fieldNorm(doc=3726)
    0.07169304 = product of:
      0.14338608 = sum of:
        0.14338608 = weight(_text_:engines in 3726) [ClassicSimilarity], result of:
          0.14338608 = score(doc=3726,freq=2.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.5613568 = fieldWeight in 3726, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.078125 = fieldNorm(doc=3726)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
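
The explain tree above can be reproduced by hand. The following is a minimal sketch, not the search system's own code: it assumes the usual ClassicSimilarity conventions that are consistent with the numbers shown (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), and it takes queryNorm, fieldNorm and the two coord factors as given from the tree rather than deriving them. The function and variable names are illustrative.

```python
import math

# Constants copied from the explain output above.
MAX_DOCS = 44218
QUERY_NORM = 0.05027291


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """Score of one term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm                     # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


# Values for doc 3726 (Courtois and Berry), read off the tree.
search_part = term_score(freq=2.0, doc_freq=3718, field_norm=0.078125)
engines_part = term_score(freq=2.0, doc_freq=746, field_norm=0.078125) * 0.5  # coord(1/2)

total = (search_part + engines_part) * (2.0 / 3.0)  # coord(2/3)
print(round(search_part, 6), round(engines_part, 6), round(total, 6))
```

Small differences in the last digits are to be expected, because the listed values appear to be single-precision floats while Python computes in double precision.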

Henzinger, M.R.: Hyperlink analysis for the Web (2001) 0.09
```
0.090090275 = product of:
  0.13513541 = sum of:
    0.07101121 = weight(_text_:search in 8) [ClassicSimilarity], result of:
      0.07101121 = score(doc=8,freq=14.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.4063998 = fieldWeight in 8, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.03125 = fieldNorm(doc=8)
    0.06412421 = product of:
      0.12824842 = sum of:
        0.12824842 = weight(_text_:engines in 8) [ClassicSimilarity], result of:
          0.12824842 = score(doc=8,freq=10.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.50209284 = fieldWeight in 8, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.03125 = fieldNorm(doc=8)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Hyperlink analysis algorithms allow search engines to deliver focused results to user queries. This article surveys ranking algorithms used to retrieve information on the Web.

Content

Information retrieval is a computer science subfield whose goal is to find all documents relevant to a user query in a given collection of documents. As such, information retrieval should really be called document retrieval. Before the advent of the Web, IR systems were typically installed in libraries for use mostly by reference librarians. The retrieval algorithm for these systems was usually based exclusively on analysis of the words in the document. The Web changed all this. Now each Web user has access to various search engines whose retrieval algorithms often use not only the words in the documents but also information like the hyperlink structure of the Web or markup language tags.

How are hyperlinks useful? The hyperlink functionality alone (that is, the hyperlink to Web page B that is contained in Web page A) is not directly useful in information retrieval. However, the way Web page authors use hyperlinks can give them valuable information content. Authors usually create hyperlinks they think will be useful to readers. Some may be navigational aids that, for example, take the reader back to the site's home page; others provide access to documents that augment the content of the current page. The latter tend to point to high-quality pages that might be on the same topic as the page containing the hyperlink. Web information retrieval systems can exploit this information to refine searches for relevant documents.

Hyperlink analysis significantly improves the relevance of the search results, so much so that all major Web search engines claim to use some type of hyperlink analysis. However, the search engines do not disclose details about the type of hyperlink analysis they perform, mostly to avoid manipulation of search results by Web-positioning companies. In this article, I discuss how hyperlink analysis can be applied to ranking algorithms, and survey other ways Web search engines can use this analysis.

Tenopir, C.: Online databases : natural language searching with WIN (1993) 0.09

0.088845745 = product of:
  0.13326861 = sum of:
    0.075914174 = weight(_text_:search in 7038) [ClassicSimilarity], result of:
      0.075914174 = score(doc=7038,freq=4.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.43445963 = fieldWeight in 7038, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0625 = fieldNorm(doc=7038)
    0.057354435 = product of:
      0.11470887 = sum of:
        0.11470887 = weight(_text_:engines in 7038) [ClassicSimilarity], result of:
          0.11470887 = score(doc=7038,freq=2.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.44908544 = fieldWeight in 7038, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0625 = fieldNorm(doc=7038)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: WESTLAW is one of the first major commercial online systems to embrace both natural language input and partial match searching. Provides a background to WESTLAW. Explains how the WESTLAW Is Natural (WIN) search engine works. Some searchers find that when searching with commands and Boolean logic, results differ drastically from those produced by searching with WIN. Discusses exact match Boolean logic search engines

O'Leary, M.: DIALOG TARGET's new age searching (1993) 0.09

0.088845745 = product of:
  0.13326861 = sum of:
    0.075914174 = weight(_text_:search in 7951) [ClassicSimilarity], result of:
      0.075914174 = score(doc=7951,freq=4.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.43445963 = fieldWeight in 7951, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0625 = fieldNorm(doc=7951)
    0.057354435 = product of:
      0.11470887 = sum of:
        0.11470887 = weight(_text_:engines in 7951) [ClassicSimilarity], result of:
          0.11470887 = score(doc=7951,freq=2.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.44908544 = fieldWeight in 7951, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0625 = fieldNorm(doc=7951)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: Relevance search engines, which measure the occurrence of search terms in a group of retrieved records and rank them accordingly, often produce better results than refined Boolean searches. Relevance searching has emerged from the research stage to be on the verge of becoming the standard retrieval method. Describes and evaluates the operation of DIALOG's TARGET, a major accomplishment, despite some rough edges

Bhansali, D.; Desai, H.; Deulkar, K.: ¬A study of different ranking approaches for semantic search (2015) 0.09
```
0.08858277 = product of:
  0.13287415 = sum of:
    0.08217951 = weight(_text_:search in 2696) [ClassicSimilarity], result of:
      0.08217951 = score(doc=2696,freq=12.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.47031635 = fieldWeight in 2696, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2696)
    0.05069464 = product of:
      0.10138928 = sum of:
        0.10138928 = weight(_text_:engines in 2696) [ClassicSimilarity], result of:
          0.10138928 = score(doc=2696,freq=4.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.39693922 = fieldWeight in 2696, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2696)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Search engines have become an integral part of our day-to-day life. Our reliance on search engines increases with every passing day. With the amount of data available on the Internet increasing exponentially, it becomes important to develop new methods and tools that help to return results relevant to the queries and reduce the time spent on searching. The results should be diverse but at the same time focused on the queries asked. Relation Based Page Rank [4] algorithms are considered to be the next frontier in improvement of Semantic Web Search. The probability of finding relevance in the search results as posited by the user while entering the query is used to measure the relevance. However, its application is limited by the complexity of determining the relation between the terms and assigning explicit meaning to each term. Trust Rank is one of the most widely used ranking algorithms for semantic web search. A few other ranking algorithms, such as the HITS algorithm and the PageRank algorithm, are also used for Semantic Web searching. In this paper, we provide a comparison of a few ranking approaches.

Brenner, E.H.: Beyond Boolean : new approaches in information retrieval; the quest for intuitive online search systems past, present & future (1995) 0.09

0.08769246 = product of:
  0.13153869 = sum of:
    0.08135357 = weight(_text_:search in 2547) [ClassicSimilarity], result of:
      0.08135357 = score(doc=2547,freq=6.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.46558946 = fieldWeight in 2547, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2547)
    0.05018513 = product of:
      0.10037026 = sum of:
        0.10037026 = weight(_text_:engines in 2547) [ClassicSimilarity], result of:
          0.10037026 = score(doc=2547,freq=2.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.39294976 = fieldWeight in 2547, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2547)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: The challenge of effectively bringing specific, relevant information from the global sea of data to our fingertips has become an increasingly difficult one. Discusses how the online information industry, founded on Boolean search systems, may be evolving to take advantage of other methods, such as 'term weighting', 'relevance ranking' and 'query by example'
Content: (1) The Boolean world; (2) The Non-Boolean picture; (3) The commercial search engines: Personal Librarian, CLARIT, ConQuest, DR-LINK, InQuizit, InTEXT, TOPIC, WIN, TARGET, FREESTYLE, InfoSeek; (4) Reproduction of 8 articles from 'Monitor'

Desai, M.; Spink, A.: ¬An algorithm to cluster documents based on relevance (2005) 0.08
```
0.08235665 = product of:
  0.12353496 = sum of:
    0.08051914 = weight(_text_:search in 1035) [ClassicSimilarity], result of:
      0.08051914 = score(doc=1035,freq=8.0), product of:
        0.1747324 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.05027291 = queryNorm
        0.460814 = fieldWeight in 1035, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.046875 = fieldNorm(doc=1035)
    0.043015826 = product of:
      0.08603165 = sum of:
        0.08603165 = weight(_text_:engines in 1035) [ClassicSimilarity], result of:
          0.08603165 = score(doc=1035,freq=2.0), product of:
            0.25542772 = queryWeight, product of:
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.05027291 = queryNorm
            0.33681408 = fieldWeight in 1035, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.080822 = idf(docFreq=746, maxDocs=44218)
              0.046875 = fieldNorm(doc=1035)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Search engines fail to make a clear distinction between items of varying relevance when presenting search results to users. Instead, they rely on the user of the system to estimate which items are relevant, partially relevant, or not relevant. The user of the system is given the task of distinguishing between documents that are relevant to different degrees. This process often hinders the accessibility of relevant or partially relevant documents, particularly when the results set is large and documents of varying relevance are scattered throughout the set. In this paper, we present a clustering scheme that groups documents within relevant, partially relevant, and not relevant regions for a given search. A clustering algorithm accomplishes the task of clustering documents based on relevance. The clusters were evaluated by end-users issuing categorical, interval, and descriptive relevance judgments for the documents returned from a search. The degree of overlap between users and the system for each of the clustered regions was measured to determine the overall effectiveness of the algorithm. This research showed that clustering documents on the Web by regions of relevance is highly necessary and quite feasible.
