Search (129 results, page 1 of 7)

Chang, C.-H.; Hsu, C.-C.: Integrating query expansion and conceptual relevance feedback for personalized Web information retrieval (1998) 0.12

0.11753695 = product of:
  0.17630541 = sum of:
    0.045744486 = weight(_text_:world in 1319) [ClassicSimilarity], result of:
      0.045744486 = score(doc=1319,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.29726875 = fieldWeight in 1319, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1319)
    0.06078585 = weight(_text_:wide in 1319) [ClassicSimilarity], result of:
      0.06078585 = score(doc=1319,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.342674 = fieldWeight in 1319, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1319)
    0.057118528 = weight(_text_:web in 1319) [ClassicSimilarity], result of:
      0.057118528 = score(doc=1319,freq=6.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.43716836 = fieldWeight in 1319, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1319)
    0.012656543 = product of:
      0.03796963 = sum of:
        0.03796963 = weight(_text_:22 in 1319) [ClassicSimilarity], result of:
          0.03796963 = score(doc=1319,freq=2.0), product of:
            0.14019686 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04003532 = queryNorm
            0.2708308 = fieldWeight in 1319, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1319)
      0.33333334 = coord(1/3)
  0.6666667 = coord(4/6)

Abstract: Keyword-based querying has been an immediate and efficient way to specify and retrieve the information a user is looking for. However, conventional document ranking based on an automatic assessment of document relevance to the query may not be the best approach when little information is given. Proposes integrating two existing techniques, query expansion and relevance feedback, to achieve a concept-based information search for the Web.
Date: 1. 8.1996 22:08:06
Footnote: Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia
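
The abstract does not spell out how the feedback step works. As a point of reference, here is a minimal sketch of the classic Rocchio relevance-feedback update, a standard textbook technique assumed here for illustration and not necessarily the authors' method. It shows how feedback documents can both reweight and expand a query. The function name and the default alpha/beta/gamma values are illustrative assumptions.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return an expanded query vector using the Rocchio formula (illustrative).

    query and every document are dicts mapping term -> weight.
    q' = alpha * q + beta * mean(relevant) - gamma * mean(nonrelevant)
    """
    new_q = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    # Add the centroid of relevant docs, subtract the centroid of non-relevant docs
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, w in doc.items():
                new_q[term] += coeff * w / len(docs)
    # Negative weights are conventionally clipped away
    return {t: w for t, w in new_q.items() if w > 0}
```

With `rocchio({"web": 1.0}, [{"web": 1.0, "search": 1.0}], [{"spider": 1.0}])` the result is `{"web": 1.75, "search": 0.75}`. The relevant document adds the expansion term "search", and the negative weight for "spider" is clipped.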

Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (1999) 0.09

0.092833474 = product of:
  0.18566695 = sum of:
    0.05545069 = weight(_text_:world in 5777) [ClassicSimilarity], result of:
      0.05545069 = score(doc=5777,freq=4.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.36034414 = fieldWeight in 5777, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.046875 = fieldNorm(doc=5777)
    0.073683575 = weight(_text_:wide in 5777) [ClassicSimilarity], result of:
      0.073683575 = score(doc=5777,freq=4.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.4153836 = fieldWeight in 5777, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=5777)
    0.056532677 = weight(_text_:web in 5777) [ClassicSimilarity], result of:
      0.056532677 = score(doc=5777,freq=8.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.43268442 = fieldWeight in 5777, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=5777)
  0.5 = coord(3/6)

LCSH: Web search engines
RSWK: World Wide Web / Search engine / Mathematical model (BVB)
Subject: World Wide Web / Search engine / Mathematical model (BVB)
Web search engines
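
The explain trees in this listing follow Lucene's ClassicSimilarity (TF-IDF) formula. Each matching clause contributes queryWeight × fieldWeight, where tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The sum of these contributions is then scaled by coord, the number of matching clauses divided by the total number of clauses. The minimal Python sketch below recomputes the Berry & Browne score under two assumptions: the query boost is 1, and the queryNorm and fieldNorm values are copied from the explain output rather than derived. The function names are hypothetical.

```python
import math


def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # queryWeight (boost assumed 1)
    field_weight = math.sqrt(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


MAX_DOCS = 44218
QUERY_NORM = 0.04003532  # copied from the explain output

# Doc 5777 (Berry & Browne): three of the six query clauses match
weights = [
    term_weight(4.0, 2573, MAX_DOCS, QUERY_NORM, 0.046875),  # world
    term_weight(4.0, 1430, MAX_DOCS, QUERY_NORM, 0.046875),  # wide
    term_weight(8.0, 4597, MAX_DOCS, QUERY_NORM, 0.046875),  # web
]
coord = 3 / 6
score = coord * sum(weights)
print(f"{score:.6f}")  # approximately 0.092833
```

Plugging in the `world` values for doc 1319 (freq 2.0, fieldNorm 0.0546875) gives about 0.045744, which matches the first explain tree. The last digits may differ slightly because Lucene computes in 32-bit floats.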

Stock, M.; Stock, W.G.: Internet-Suchwerkzeuge im Vergleich (IV) : Relevance Ranking nach "Popularität" von Webseiten: Google [Internet search tools compared (IV): relevance ranking by "popularity" of web pages: Google] (2001) 0.07

0.07013522 = product of:
  0.14027044 = sum of:
    0.03920956 = weight(_text_:world in 5771) [ClassicSimilarity], result of:
      0.03920956 = score(doc=5771,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.25480178 = fieldWeight in 5771, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.046875 = fieldNorm(doc=5771)
    0.052102152 = weight(_text_:wide in 5771) [ClassicSimilarity], result of:
      0.052102152 = score(doc=5771,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.29372054 = fieldWeight in 5771, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=5771)
    0.048958737 = weight(_text_:web in 5771) [ClassicSimilarity], result of:
      0.048958737 = score(doc=5771,freq=6.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.37471575 = fieldWeight in 5771, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=5771)
  0.5 = coord(3/6)

Abstract: In our retrieval test of search tools on the World Wide Web (Password 11/2000), the search engine Google performed best. Compared with other search engines, Google relies hardly at all on information linguistics and instead uses algorithms derived from the particular characteristics of Web documents. The core of its information-statistical technique is the "PageRank" method (named after its developer, Larry Page), which computes the "popularity" of pages from the hypertext structure of the Web on the basis of their incoming and outgoing links. Google impresses with intuitively understandable search screens and with some very useful "little extras", such as display of a page's rank, highlighting, searching within a page, and searching within a result set, all housed in its own toolbar inside the browser. Like RealNames, Google offers the purchase of search terms through its product "AdWords". After a series of four Password articles comparing Internet search tools, we now want to reach a final assessment. How should the state of the art of directories and search engines be judged from an information-science perspective? Are "typical" Internet users, who as a rule are not information professionals, adequately served? And can information specialists also benefit from these search tools?
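
PageRank, as the abstract describes it, derives a page's "popularity" from the link structure. Below is a minimal power-iteration sketch. Several details are assumptions taken from the commonly published formulation rather than from this article: the damping factor 0.85, spreading the rank of pages without out-links evenly over all pages, and the function name.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a dict mapping page -> list of out-links."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Random-jump share every page receives
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # Rank held by pages with no out-links ("dangling" pages)
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
        # Dangling pages spread their rank evenly over all pages
        for p in pages:
            new_rank[p] += damping * dangling / n
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

For `{"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}`, page C receives the highest rank because it has the most in-links, and the ranks sum to 1.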

Ding, Y.; Chowdhury, G.; Foo, S.: Organising keywords in a Web search environment : a methodology based on co-word analysis (2000) 0.07

0.06564318 = product of:
  0.13128635 = sum of:
    0.03920956 = weight(_text_:world in 105) [ClassicSimilarity], result of:
      0.03920956 = score(doc=105,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.25480178 = fieldWeight in 105, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.046875 = fieldNorm(doc=105)
    0.052102152 = weight(_text_:wide in 105) [ClassicSimilarity], result of:
      0.052102152 = score(doc=105,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.29372054 = fieldWeight in 105, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=105)
    0.03997464 = weight(_text_:web in 105) [ClassicSimilarity], result of:
      0.03997464 = score(doc=105,freq=4.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.3059541 = fieldWeight in 105, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=105)
  0.5 = coord(3/6)

Abstract: The rapid development of the Internet and World Wide Web has caused some critical problems for information retrieval. Researchers have made several attempts to solve these problems. Thesauri and subject heading lists, as traditional information retrieval tools, have been criticised for their lack of efficiency in tackling these newly emerging problems. This paper proposes an information retrieval tool generated by cocitation analysis, comprising keyword clusters with relationships based on the co-occurrences of keywords in the literature. Such a tool can play the role of an associative thesaurus that provides information about the keywords in a domain that might be useful for information searching and query expansion.
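
Co-word analysis starts by counting how often keywords occur and co-occur across documents. Those counts are then turned into an association strength that can be used to cluster the keywords. The sketch below uses the equivalence index E_ab = c_ab² / (c_a · c_b), one widely used co-word measure. The abstract does not say which measure the authors use, so this measure and the function names are assumptions.

```python
from collections import Counter
from itertools import combinations


def coword_matrix(docs):
    """Count keyword frequencies and pairwise co-occurrences across documents.

    docs: list of keyword lists (e.g. the index terms assigned to each article).
    """
    freq = Counter()
    cooc = Counter()
    for keywords in docs:
        unique = sorted(set(keywords))          # count each keyword once per document
        freq.update(unique)
        cooc.update(combinations(unique, 2))    # pairs stored in sorted order
    return freq, cooc


def equivalence_index(freq, cooc, a, b):
    """E_ab = c_ab^2 / (c_a * c_b), an association strength in [0, 1]."""
    if freq[a] == 0 or freq[b] == 0:
        return 0.0
    pair = tuple(sorted((a, b)))
    return cooc[pair] ** 2 / (freq[a] * freq[b])
```

For the keyword sets `["web", "search", "retrieval"]`, `["web", "search"]` and `["thesaurus", "retrieval"]`, "web" and "search" always appear together (E = 1.0), while "retrieval" and "web" share only one document (E = 0.25).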

Habernal, I.; Konopík, M.; Rohlík, O.: Question answering (2012) 0.07

0.06564318 = product of:
  0.13128635 = sum of:
    0.03920956 = weight(_text_:world in 101) [ClassicSimilarity], result of:
      0.03920956 = score(doc=101,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.25480178 = fieldWeight in 101, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.046875 = fieldNorm(doc=101)
    0.052102152 = weight(_text_:wide in 101) [ClassicSimilarity], result of:
      0.052102152 = score(doc=101,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.29372054 = fieldWeight in 101, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=101)
    0.03997464 = weight(_text_:web in 101) [ClassicSimilarity], result of:
      0.03997464 = score(doc=101,freq=4.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.3059541 = fieldWeight in 101, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=101)
  0.5 = coord(3/6)

Abstract: Question Answering is an area of information retrieval with the added challenge of applying sophisticated techniques to identify the complex syntactic and semantic relationships present in text in order to provide a more sophisticated and satisfactory response to the user's information needs. For this reason, the authors see question answering as the next step beyond standard information retrieval. This chapter covers state-of-the-art question answering, focusing on an overview of the systems, techniques and approaches that are likely to be employed in the next generations of search engines. Special attention is paid to question answering using the World Wide Web as the data source and to question answering that exploits the possibilities of the Semantic Web. Considerations about current issues and prospects for promising future research are also provided.

Kleinberg, J.M.: Authoritative sources in a hyperlinked environment (1998) 0.06

0.059789024 = product of:
  0.11957805 = sum of:
    0.03920956 = weight(_text_:world in 5) [ClassicSimilarity], result of:
      0.03920956 = score(doc=5,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.25480178 = fieldWeight in 5, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.046875 = fieldNorm(doc=5)
    0.052102152 = weight(_text_:wide in 5) [ClassicSimilarity], result of:
      0.052102152 = score(doc=5,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.29372054 = fieldWeight in 5, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=5)
    0.028266339 = weight(_text_:web in 5) [ClassicSimilarity], result of:
      0.028266339 = score(doc=5,freq=2.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.21634221 = fieldWeight in 5, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=5)
  0.5 = coord(3/6)

Abstract: The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of "authoritative" information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of "hub pages" that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
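
The hub/authority idea can be sketched as a mutual-reinforcement iteration. A page's authority score is the sum of the hub scores of the pages linking to it, and its hub score is the sum of the authority scores of the pages it links to. With adjacency matrix A, the repeated updates tend toward the principal eigenvectors of AᵀA (authorities) and AAᵀ (hubs), which are the matrices the abstract alludes to. This is a minimal illustration: the Euclidean normalization, the fixed iteration count and the function name are assumptions, and the step of building a focused subgraph from search results is omitted.

```python
import math


def hits(links, iterations=50):
    """Hub/authority iteration on a dict mapping page -> list of out-links."""
    pages = set(links) | {p for ts in links.values() for p in ts}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages pointing to p
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub: sum of authority scores of the pages p points to
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both vectors so the values stay bounded
        for vec in (auth, hub):
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            for p in vec:
                vec[p] /= norm
    return hub, auth
```

For `{"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}`, page a1 ends up as the strongest authority, and h1 and h2, which point to both authorities, outrank h3 as hubs.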

Picard, J.; Savoy, J.: Enhancing retrieval with hyperlinks : a general model based on propositional argumentation systems (2003) 0.05

0.054702647 = product of:
  0.109405294 = sum of:
    0.032674633 = weight(_text_:world in 1427) [ClassicSimilarity], result of:
      0.032674633 = score(doc=1427,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.21233483 = fieldWeight in 1427, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1427)
    0.043418463 = weight(_text_:wide in 1427) [ClassicSimilarity], result of:
      0.043418463 = score(doc=1427,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.24476713 = fieldWeight in 1427, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1427)
    0.0333122 = weight(_text_:web in 1427) [ClassicSimilarity], result of:
      0.0333122 = score(doc=1427,freq=4.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.25496176 = fieldWeight in 1427, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1427)
  0.5 = coord(3/6)

Abstract: Fast, effective, and adaptable techniques are needed to automatically organize and retrieve information on the ever-increasing World Wide Web. In that respect, different strategies have been suggested to take hypertext links into account. For example, hyperlinks have been used to (1) enhance document representation, (2) improve document ranking by propagating document score, (3) provide an indicator of popularity, and (4) find hubs and authorities for a given topic. Although the TREC experiments have not demonstrated the usefulness of hyperlinks for retrieval, the hypertext structure is nevertheless an essential aspect of the Web and, as such, should not be ignored. The development of abstract models of the IR task was a key factor in the improvement of search engines. However, conceptual tools for modeling the hypertext retrieval task are currently lacking, making it difficult to compare, improve, and reason about the existing techniques. This article proposes a general model for using hyperlinks based on Probabilistic Argumentation Systems, in which each of the above-mentioned techniques can be stated. This model makes it possible to discover some inconsistencies in these techniques and to take a higher-level, systematic approach to using hyperlinks for retrieval.
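
Technique (2), propagating document scores along hyperlinks, can be illustrated with a single propagation step. This is a generic sketch that rests on assumed choices: links are treated as undirected, a fixed weight alpha = 0.1 is used, and the function name is hypothetical. It does not reproduce the article's Probabilistic Argumentation Systems model.

```python
def propagate_scores(scores, links, alpha=0.1):
    """One step of score propagation over hyperlinks (illustrative).

    scores: dict doc -> initial retrieval score
    links:  dict doc -> list of docs it links to
    Each document gains alpha times the scores of its linked neighbours
    (documents it links to and documents linking to it).
    """
    neighbours = {d: set() for d in scores}
    for src, targets in links.items():
        for t in targets:
            # Ignore self-links and documents outside the scored set
            if src != t and src in scores and t in scores:
                neighbours[src].add(t)
                neighbours[t].add(src)
    return {d: s + alpha * sum(scores[n] for n in neighbours[d])
            for d, s in scores.items()}
```

With `scores = {"d1": 1.0, "d2": 0.5, "d3": 0.0}` and `links = {"d1": ["d3"]}`, the unmatched document d3 rises to 0.1 because a highly scored page links to it, while the isolated d2 keeps its score of 0.5.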

Austin, D.: How Google finds your needle in the Web's haystack : as we'll see, the trick is to ask the web itself to rank the importance of pages... (2006) 0.05
```
0.049951125 = product of:
  0.09990225 = sum of:
    0.022872243 = weight(_text_:world in 93) [ClassicSimilarity], result of:
      0.022872243 = score(doc=93,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.14863437 = fieldWeight in 93, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.02734375 = fieldNorm(doc=93)
    0.030392924 = weight(_text_:wide in 93) [ClassicSimilarity], result of:
      0.030392924 = score(doc=93,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.171337 = fieldWeight in 93, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.02734375 = fieldNorm(doc=93)
    0.04663708 = weight(_text_:web in 93) [ClassicSimilarity], result of:
      0.04663708 = score(doc=93,freq=16.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.35694647 = fieldWeight in 93, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.02734375 = fieldNorm(doc=93)
  0.5 = coord(3/6)
```
Abstract

Imagine a library containing 25 billion documents but with no centralized organization and no librarians. In addition, anyone may add a document at any time without telling anyone. You may feel sure that one of the documents contained in the collection has a piece of information that is vitally important to you, and, being impatient like most of us, you'd like to find it in a matter of seconds. How would you go about doing it? Posed in this way, the problem seems impossible. Yet this description is not too different from the World Wide Web, a huge, highly disorganized collection of documents in many different formats. Of course, we're all familiar with search engines (perhaps you found this article using one), so we know that there is a solution. This article will describe Google's PageRank algorithm and how it returns pages from the web's collection of 25 billion documents that match search criteria so well that "google" has become a widely used verb.
Most search engines, including Google, continually run an army of computer programs that retrieve pages from the web, index the words in each document, and store this information in an efficient format. Each time a user asks for a web search using a search phrase, such as "search engine," the search engine determines all the pages on the web that contain the words in the search phrase. (Perhaps additional information such as the distance between the words "search" and "engine" will be noted as well.) Here is the problem: Google now claims to index 25 billion pages. Roughly 95% of the text in web pages is composed from a mere 10,000 words. This means that, for most searches, there will be a huge number of pages containing the words in the search phrase. What is needed is a means of ranking the importance of the pages that fit the search criteria so that the pages can be sorted with the most important pages at the top of the list. One way to determine the importance of pages is to use a human-generated ranking.
For instance, you may have seen pages that consist mainly of a large number of links to other resources in a particular area of interest. Assuming the person maintaining this page is reliable, the pages referenced are likely to be useful. Of course, the list may quickly fall out of date, and the person maintaining the list may miss some important pages, either unintentionally or as a result of an unstated bias. Google's PageRank algorithm assesses the importance of web pages without human evaluation of the content. In fact, Google feels that the value of its service is largely in its ability to provide unbiased results to search queries; Google claims, "the heart of our software is PageRank." As we'll see, the trick is to ask the web itself to rank the importance of pages.
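
The abstract stops at the idea of letting the web rank itself. The standard way this idea is formalized is as a fixed-point condition on importance scores. The notation below (I, B_i, l_j, H, S, G, alpha) follows the common textbook presentation and is offered as a sketch, not quoted from the article.

```latex
% Importance of page P_i: each page P_j that links to it (P_j in B_i)
% passes on an equal share of its own importance over its l_j out-links
I(P_i) = \sum_{P_j \in B_i} \frac{I(P_j)}{l_j}

% Matrix form: H_{ij} = 1/l_j if P_j links to P_i, else 0;
% the importance vector is a stationary vector of H
\mathbf{I} = H\,\mathbf{I}

% To guarantee a unique solution, S replaces the all-zero columns of
% pages without out-links by columns of 1/n, and the Google matrix mixes
% S with a uniform random jump (alpha is commonly taken as 0.85)
G = \alpha S + (1 - \alpha)\,\frac{1}{n}\,\mathbf{1}\mathbf{1}^{\top},
\qquad \mathbf{I} = G\,\mathbf{I}
```
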

Kantor, P.; Kim, M.H.; Ibraev, U.; Atasoy, K.: Estimating the number of relevant documents in enormous collections (1999) 0.05
```
0.049824186 = product of:
  0.09964837 = sum of:
    0.032674633 = weight(_text_:world in 6690) [ClassicSimilarity], result of:
      0.032674633 = score(doc=6690,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.21233483 = fieldWeight in 6690, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0390625 = fieldNorm(doc=6690)
    0.043418463 = weight(_text_:wide in 6690) [ClassicSimilarity], result of:
      0.043418463 = score(doc=6690,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.24476713 = fieldWeight in 6690, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=6690)
    0.023555283 = weight(_text_:web in 6690) [ClassicSimilarity], result of:
      0.023555283 = score(doc=6690,freq=2.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.18028519 = fieldWeight in 6690, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=6690)
  0.5 = coord(3/6)
```
Abstract

In assessing information retrieval systems, it is important not only to know the precision of the retrieved set but also to compare the number of retrieved relevant items to the total number of relevant items. For large collections, such as the TREC test collections or the World Wide Web, it is not possible to enumerate the entire set of relevant documents. If the retrieved documents are evaluated, a variant of the statistical "capture-recapture" method can be used to estimate the total number of relevant documents, provided that the retrieval systems used are sufficiently independent. We show that the underlying signal detection model supporting such an analysis can be extended in two ways. First, assuming that there are two distinct performance characteristics (the chance of retrieving a given relevant document and the chance of retrieving a given non-relevant document), we show that if three or more independent systems are available, it is possible to estimate the number of relevant documents without actually having to decide whether each individual document is relevant. We report applications of this 3-system method to the TREC data, leading to the conclusion that the independence assumptions are not satisfied. We then extend the model to a multi-system, multi-problem model, and show that it is possible to include statistical dependencies of all orders in the model and to determine the number of relevant documents for each of the problems in the set. Application to the TREC setting will be presented.
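
The capture-recapture idea is easiest to see in its two-system form, the Lincoln-Petersen estimator. If system 1 finds n1 relevant documents, system 2 finds n2, and m are found by both, the total is estimated as n1 · n2 / m. The sketch assumes the two systems retrieve independently, which is exactly the assumption the abstract reports failing on TREC data. The article's 3-system and multi-system models go further and are not reproduced here; the function name is hypothetical.

```python
def lincoln_petersen(n1, n2, m):
    """Estimate the total number of relevant documents from two systems.

    n1: relevant documents found by system 1
    n2: relevant documents found by system 2
    m:  relevant documents found by both
    Assumes the two systems retrieve relevant documents independently.
    """
    if m == 0:
        raise ValueError("no overlap between systems: estimate is undefined")
    return n1 * n2 / m
```

`lincoln_petersen(100, 80, 20)` returns `400.0`: only a quarter of system 2's relevant finds overlap with system 1, which suggests system 1 saw roughly a quarter of all relevant documents.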

Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: The effects of fitness functions on genetic programming-based ranking discovery for Web search (2004) 0.02

0.022460382 = product of:
  0.06738114 = sum of:
    0.056532677 = weight(_text_:web in 2239) [ClassicSimilarity], result of:
      0.056532677 = score(doc=2239,freq=8.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.43268442 = fieldWeight in 2239, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=2239)
    0.010848465 = product of:
      0.032545395 = sum of:
        0.032545395 = weight(_text_:22 in 2239) [ClassicSimilarity], result of:
          0.032545395 = score(doc=2239,freq=2.0), product of:
            0.14019686 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04003532 = queryNorm
            0.23214069 = fieldWeight in 2239, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2239)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)

Abstract: Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task, the discovery of ranking functions for Web search, and has achieved very promising results. However, in our prior research, only one fitness function was used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs on GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
Date: 31. 5.2004 19:22:06
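
A fitness function scores how well a candidate ranking function orders documents for a set of training queries. Mean average precision is one common choice in this setting, but the abstract does not name the fitness functions that were compared, so the sketch below is an illustrative assumption. The function names are hypothetical.

```python
def average_precision(ranked_relevance, total_relevant):
    """Non-interpolated average precision of one ranked list.

    ranked_relevance: list of 0/1 flags in rank order (1 = relevant)
    total_relevant:   number of relevant documents for the query
    """
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant hit
    return precision_sum / total_relevant


def mean_average_precision(runs):
    """Fitness of a candidate ranking function: mean AP over (flags, total) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(flags, total) for flags, total in runs) / len(runs)
```

A ranking with relevance flags `[1, 0, 1]` and two relevant documents scores (1/1 + 2/3) / 2 ≈ 0.833. A GP search would keep the candidate ranking functions with the highest mean over all training queries.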

Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.02
```
0.022324583 = product of:
  0.066973746 = sum of:
    0.043418463 = weight(_text_:wide in 1338) [ClassicSimilarity], result of:
      0.043418463 = score(doc=1338,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.24476713 = fieldWeight in 1338, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1338)
    0.023555283 = weight(_text_:web in 1338) [ClassicSimilarity], result of:
      0.023555283 = score(doc=1338,freq=2.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.18028519 = fieldWeight in 1338, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1338)
  0.33333334 = coord(2/6)
```
Abstract

A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations, which indicate that two terms co-occur in natural language more often than chance would predict. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval, it demonstrates significant improvements in retrieval effectiveness over a strong baseline system based on a commercial search engine.
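
The article uses a formal, corpus-based model of word meaning, which is not reproduced here. The toy sketch below only illustrates the two kinds of association. Syntagmatic strength is measured as window co-occurrence, and paradigmatic strength as the cosine similarity of two words' context vectors. The window size and function names are assumptions.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within +/- window positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][tokens[j]] += 1
    return vectors


def syntagmatic(vectors, a, b):
    """Syntagmatic strength: how often a and b co-occur within the window."""
    return vectors[a][b]


def paradigmatic(vectors, a, b):
    """Paradigmatic strength: cosine similarity of a's and b's context vectors."""
    va, vb = vectors[a], vectors[b]
    dot = sum(va[t] * vb[t] for t in va if t in vb)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Consider the four toy sentences `buy cheap car`, `buy cheap automobile`, `fast car` and `fast automobile`. Here "car" and "automobile" never co-occur (syntagmatic strength 0) but share all their contexts (paradigmatic similarity of about 1.0). This is the kind of relation that expansion based only on syntagmatic associations misses.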

Meghabghab, G.: Google's Web page ranking applied to different topological Web graph structures (2001) 0.02
```
0.018827861 = product of:
  0.11296716 = sum of:
    0.11296716 = weight(_text_:web in 6028) [ClassicSimilarity], result of:
      0.11296716 = score(doc=6028,freq=46.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.86461735 = fieldWeight in 6028, product of:
          6.78233 = tf(freq=46.0), with freq of:
            46.0 = termFreq=46.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=6028)
  0.16666667 = coord(1/6)
```
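Each explain tree in these results follows Lucene's ClassicSimilarity (TF-IDF) formula. A small sketch that recomputes the Meghabghab entry's score from the numbers shown, assuming the standard ClassicSimilarity definitions (tf = √freq, idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf · queryNorm, fieldWeight = tf · idf · fieldNorm). The queryNorm and fieldNorm values are copied from the tree rather than recomputed, since they depend on the full query and on Lucene's lossy norm encoding; the helper names are illustrative.

```python
import math

def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                    # queryWeight = idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm  # fieldWeight = tf * idf * fieldNorm, tf = sqrt(freq)
    return query_weight * field_weight

# Meghabghab (2001), doc 6028: only the "web" clause matches, 1 of 6 query clauses
web = term_score(freq=46, doc_freq=4597, max_docs=44218,
                 query_norm=0.04003532, field_norm=0.0390625)
final = web * (1 / 6)  # coord(1/6)
print(f"{web:.6f} {final:.6f}")  # 0.112967 0.018828
```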
Abstract

This research is part of an ongoing study aimed at better understanding web page ranking. It treats web pages as a graph structure, or web graph, and classifies different web graphs in a new coordinate space: (out-degree, in-degree). The out-degree coordinate od is the number of web pages that a given web page links to; the in-degree coordinate id is the number of web pages that point to a given web page. In this new coordinate space, a metric is built to measure how close or far apart different web graphs are. Google's web page ranking algorithm (Brin & Page, 1998) is applied in this new coordinate space, and its results have been modified to fit different topological web graph structures. However, the algorithm was not successful for general web graphs, so new web ranking algorithms need to be considered. This study does not attempt to enhance web ranking by adding contextual information; it considers only web links as a source for web page ranking. The author believes that understanding the underlying web pages as a graph will help design better web ranking algorithms and enhance retrieval and web performance, and recommends using graphs as a visual aid for browsing engine designers.
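A minimal sketch, assuming a web graph stored as an adjacency dictionary, of the two ingredients the abstract combines: each page's (out-degree, in-degree) coordinate and the standard PageRank power iteration of Brin & Page (1998). The damping factor d = 0.85 and the uniform redistribution of rank from pages without out-links are common conventions, not details taken from this study; the study's distance metric between web graphs is not reproduced.

```python
def degree_coordinates(graph):
    """Map every page to its (out-degree, in-degree) coordinate."""
    nodes = set(graph) | {q for targets in graph.values() for q in targets}
    indeg = dict.fromkeys(nodes, 0)
    for targets in graph.values():
        for q in targets:
            indeg[q] += 1
    return {p: (len(graph.get(p, [])), indeg[p]) for p in sorted(nodes)}

def pagerank(graph, d=0.85, iters=100, tol=1e-12):
    """Power-iteration PageRank; pages with no out-links spread their rank uniformly."""
    nodes = sorted(set(graph) | {q for targets in graph.values() for q in targets})
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iters):
        dangling = sum(rank[p] for p in nodes if not graph.get(p))
        new = dict.fromkeys(nodes, (1 - d) / n + d * dangling / n)
        for p in nodes:
            out = graph.get(p, [])
            for q in out:
                new[q] += d * rank[p] / len(out)
        delta = sum(abs(new[p] - rank[p]) for p in nodes)
        rank = new
        if delta < tol:
            break
    return rank

web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
print(degree_coordinates(web))  # {'a': (2, 1), 'b': (1, 1), 'c': (1, 2)}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # c
```

In this toy graph the page with the highest in-degree, "c", also receives the highest PageRank; in general the two measures need not agree, which is one reason to study both coordinates.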
Agosti, M.; Pretto, L.: A theoretical study of a generalized version of Kleinberg's HITS algorithm (2005) 0.02
```
0.01874434 = product of:
  0.05623302 = sum of:
    0.047110565 = weight(_text_:web in 4) [ClassicSimilarity], result of:
      0.047110565 = score(doc=4,freq=8.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.36057037 = fieldWeight in 4, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4)
    0.009122452 = product of:
      0.027367353 = sum of:
        0.027367353 = weight(_text_:29 in 4) [ClassicSimilarity], result of:
          0.027367353 = score(doc=4,freq=2.0), product of:
            0.14083174 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04003532 = queryNorm
            0.19432661 = fieldWeight in 4, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)
```
Abstract

Kleinberg's HITS (Hyperlink-Induced Topic Search) algorithm (Kleinberg 1999), which was originally developed in a Web context, tries to infer the authoritativeness of a Web page in relation to a specific query using the structure of a subgraph of the Web graph, which is obtained considering this specific query. Recent applications of this algorithm in contexts far removed from that of Web searching (Bacchin, Ferro and Melucci 2002, Ng et al. 2001) inspired us to study the algorithm in the abstract, independently of its particular applications, trying to mathematically illuminate its behaviour. In the present paper we detail this theoretical analysis. The original work starts from the definition of a revised and more general version of the algorithm, which includes the classic one as a particular case. We perform an analysis of the structure of two particular matrices, essential to studying the behaviour of the algorithm, and we prove the convergence of the algorithm in the most general case, finding the analytic expression of the vectors to which it converges. Then we study the symmetry of the algorithm and prove the equivalence between the existence of symmetry and the independence from the order of execution of some basic operations on initial vectors. Finally, we expound some interesting consequences of our theoretical results.
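For reference, a minimal sketch of the classic HITS iteration that the paper generalizes (the generalized version itself is not reproduced here). Authority scores are the sums of the hub scores of pages linking in, hub scores are the sums of the authority scores of pages linked to, and both vectors are L2-normalized after each step. The adjacency-dictionary representation and the stopping tolerance are assumptions.

```python
import math

def hits(graph, iters=100, tol=1e-12):
    """Classic HITS on an adjacency dict {page: [pages it links to]}; returns (hubs, authorities)."""
    nodes = sorted(set(graph) | {q for targets in graph.values() for q in targets})
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iters):
        # authority of p = sum of hub scores of the pages linking to p (a = A^T h)
        new_auth = dict.fromkeys(nodes, 0.0)
        for p, targets in graph.items():
            for q in targets:
                new_auth[q] += hub[p]
        # hub of p = sum of authority scores of the pages p links to (h = A a)
        new_hub = {p: sum(new_auth[q] for q in graph.get(p, [])) for p in nodes}
        # L2-normalize both vectors so the iteration converges instead of growing
        for vec in (new_auth, new_hub):
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            for p in vec:
                vec[p] /= norm
        delta = sum(abs(new_auth[p] - auth[p]) + abs(new_hub[p] - hub[p]) for p in nodes)
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth

graph = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
hubs, auths = hits(graph)
print(sorted(auths, key=auths.get, reverse=True)[:2])  # ['a1', 'a2']
```

The authority vector converges to the principal eigenvector of AᵀA; for this small graph that gives a2/a1 = (√17 − 1)/4 ≈ 0.78.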

Date

31.12.1996 19:29:41
Baloh, P.; Desouza, K.C.; Hackney, R.: Contextualizing organizational interventions of knowledge management systems : a design science perspective : A domain analysis (2012) 0.02
```
0.017486285 = product of:
  0.052458853 = sum of:
    0.043418463 = weight(_text_:wide in 241) [ClassicSimilarity], result of:
      0.043418463 = score(doc=241,freq=2.0), product of:
        0.17738682 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.04003532 = queryNorm
        0.24476713 = fieldWeight in 241, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=241)
    0.009040388 = product of:
      0.027121164 = sum of:
        0.027121164 = weight(_text_:22 in 241) [ClassicSimilarity], result of:
          0.027121164 = score(doc=241,freq=2.0), product of:
            0.14019686 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04003532 = queryNorm
            0.19345059 = fieldWeight in 241, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=241)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)
```
Abstract

We address how individual workers' knowledge needs influence the design of knowledge management systems (KMS) that enable knowledge creation and utilization. It is evident that KMS technologies and activities are deployed indiscriminately in most organizations, with little regard to the actual context of their adoption. Moreover, the extant literature on knowledge management projects is frequently deficient in identifying the variety of factors indicative of successful KMS. This presents an obvious gap in business practice and research that requires a critical analysis of the interventions needed to actually improve how workers can leverage and form organization-wide knowledge. This research involved an extensive review of the literature, a grounded theory methodological approach, and rigorous data collection and synthesis through an empirical case analysis (Parsons Brinckerhoff and Samsung). The contribution of this study is the formulation of a model for designing KMS based on the design science paradigm, which aspires to create artifacts that are interdependent with people and organizations. The essential proposition is that KMS design and implementation must be contextualized in relation to knowledge needs, and that these needs will differ across organizational settings. The findings provide valuable insights into, and further understanding of, how KMS design efforts should be focused.

Date

11. 6.2012 14:22:34

Chakrabarti, S.; Dom, B.; Kumar, S.R.; Raghavan, P.; Rajagopalan, S.; Tomkins, A.; Kleinberg, J.M.; Gibson, D.: Neue Pfade durch den Internet-Dschungel : Die zweite Generation von Web-Suchmaschinen [New paths through the Internet jungle: the second generation of Web search engines] (1999) 0.02

0.017428126 = product of:
  0.052284375 = sum of:
    0.037688453 = weight(_text_:web in 3) [ClassicSimilarity], result of:
      0.037688453 = score(doc=3,freq=2.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.2884563 = fieldWeight in 3, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0625 = fieldNorm(doc=3)
    0.014595922 = product of:
      0.043787766 = sum of:
        0.043787766 = weight(_text_:29 in 3) [ClassicSimilarity], result of:
          0.043787766 = score(doc=3,freq=2.0), product of:
            0.14083174 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04003532 = queryNorm
            0.31092256 = fieldWeight in 3, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0625 = fieldNorm(doc=3)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)

Date

31.12.1996 19:29:41

Käki, M.: fKWIC: frequency-based Keyword-in-Context Index for filtering Web search results (2006) 0.02

0.01697386 = product of:
  0.05092158 = sum of:
    0.03997464 = weight(_text_:web in 6112) [ClassicSimilarity], result of:
      0.03997464 = score(doc=6112,freq=4.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.3059541 = fieldWeight in 6112, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=6112)
    0.0109469425 = product of:
      0.032840826 = sum of:
        0.032840826 = weight(_text_:29 in 6112) [ClassicSimilarity], result of:
          0.032840826 = score(doc=6112,freq=2.0), product of:
            0.14083174 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04003532 = queryNorm
            0.23319192 = fieldWeight in 6112, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.046875 = fieldNorm(doc=6112)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)

Abstract

Enormous Web search engine databases combined with short search queries result in large result sets that are often difficult to access. Result ranking works fairly well, but users need help when it fails. For these situations, we propose a filtering interface that is inspired by keyword-in-context (KWIC) indices. The user interface lists the most frequent keyword contexts (fKWIC). When a context is selected, the corresponding results are displayed in the result list, allowing users to concentrate on the specific context. We compared the keyword context index user interface to the rank order result listing in an experiment with 36 participants. The results show that the proposed user interface was 29% faster in finding relevant results, and the precision of the selected results was 19% higher. In addition, participants showed positive attitudes toward the system.
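A minimal sketch of the fKWIC idea, assuming that a keyword's context is simply the keyword plus the word that follows it in a result snippet (the abstract does not define contexts more precisely) and that each context is counted at most once per result. The result dictionary, the sample snippets, and the function names are hypothetical.

```python
import re
from collections import Counter

def keyword_contexts(snippet, keyword):
    """Contexts of `keyword` in one snippet: the keyword plus the word that follows it."""
    tokens = re.findall(r"\w+", snippet.lower())
    kw = keyword.lower()
    return [f"{kw} {tokens[i + 1]}" for i in range(len(tokens) - 1) if tokens[i] == kw]

def fkwic_index(results, keyword, top=5):
    """Most frequent keyword contexts, counting each context at most once per result."""
    counts = Counter()
    for snippet in results.values():
        counts.update(set(keyword_contexts(snippet, keyword)))
    return counts.most_common(top)

def filter_by_context(results, keyword, context):
    """Ids of the results whose snippet contains the selected context."""
    return [rid for rid, snippet in results.items()
            if context in keyword_contexts(snippet, keyword)]

results = {
    1: "Jaguar cars for sale, jaguar cars reviewed",
    2: "The jaguar habitat in South America",
    3: "Used jaguar cars and parts",
    4: "Jaguar habitat loss",
    5: "Jaguar cars dealership",
}
print(fkwic_index(results, "jaguar"))  # [('jaguar cars', 3), ('jaguar habitat', 2)]
print(filter_by_context(results, "jaguar", "jaguar habitat"))  # [2, 4]
```

Selecting a context from the index plays the role of the filter in the interface: only the matching results remain in the list.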

Zhu, J.; Han, L.; Gou, Z.; Yuan, X.: A fuzzy clustering-based denoising model for evaluating uncertainty in collaborative filtering recommender systems (2018) 0.01
```
0.013932362 = product of:
  0.041797087 = sum of:
    0.032674633 = weight(_text_:world in 4460) [ClassicSimilarity], result of:
      0.032674633 = score(doc=4460,freq=2.0), product of:
        0.1538826 = queryWeight, product of:
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.04003532 = queryNorm
        0.21233483 = fieldWeight in 4460, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.8436708 = idf(docFreq=2573, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4460)
    0.009122452 = product of:
      0.027367353 = sum of:
        0.027367353 = weight(_text_:29 in 4460) [ClassicSimilarity], result of:
          0.027367353 = score(doc=4460,freq=2.0), product of:
            0.14083174 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04003532 = queryNorm
            0.19432661 = fieldWeight in 4460, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4460)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)
```
Abstract

Recommender systems are effective in predicting the most suitable products for users, such as movies and books. To facilitate personalized recommendations, the quality of item ratings should be guaranteed. However, some ratings may be inaccurate because of the uncertainty of user behavior; such ratings are referred to as natural noise. In this article, we present a novel fuzzy clustering-based method for detecting noisy ratings. The entropy of a subset of the original ratings dataset is used to indicate the data-driven uncertainty, and evaluation metrics are adopted to represent the prediction-driven uncertainty. After resampling is repeated and a recommendation algorithm is executed, the entropy and evaluation-metric vectors are obtained and empirically categorized to identify the proportion of potential noise. Then, the fuzzy C-means-based denoising (FCMD) algorithm is performed to verify the natural noise, under the assumption that natural noise results primarily from the exceptional behavior of users. Finally, a case study is performed using two real-world datasets. The experimental results show that our proposal outperforms previous proposals and has an advantage in dealing with natural noise.
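The FCMD step builds on standard fuzzy C-means clustering. A minimal, dependency-free sketch of that standard algorithm follows; it is not the authors' FCMD procedure, whose entropy and metric features are not specified in enough detail in the abstract. Each point receives a graded membership in every cluster, controlled by the fuzzifier `m` (2.0 is a common default, assumed here). In a denoising setting the points could be, for example, per-subset (entropy, evaluation-metric) pairs, but that mapping is an assumption.

```python
import math
import random

def fuzzy_c_means(points, c=2, m=2.0, iters=100, tol=1e-9, seed=0):
    """Standard fuzzy C-means on a list of equal-length numeric tuples.

    Returns (centers, memberships); memberships[i][k] is the degree to which
    point i belongs to cluster k, and every row sums to 1."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-6 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers: means of the points weighted by membership ** m
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append(tuple(sum(w[i] * points[i][j] for i in range(n)) / sw
                                 for j in range(dim)))
        # Memberships: u_ik = 1 / sum_j (dist_ik / dist_ij) ** (2 / (m - 1))
        new_u = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centers]
            if 0.0 in dists:
                # A point sitting exactly on a center belongs to it fully
                exact = [1.0 if dd == 0.0 else 0.0 for dd in dists]
                new_u.append([e / sum(exact) for e in exact])
            else:
                new_u.append([1.0 / sum((dists[k] / dists[j]) ** (2 / (m - 1))
                                        for j in range(c))
                              for k in range(c)])
        shift = max(abs(new_u[i][k] - u[i][k]) for i in range(n) for k in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, memberships = fuzzy_c_means(points)
labels = [max(range(len(row)), key=row.__getitem__) for row in memberships]
print(labels)
```

Unlike hard clustering, the membership rows keep the degree of uncertainty for each point, which is what makes the fuzzy variant attractive when the boundary between noisy and normal ratings is not sharp.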

Date

29. 9.2018 12:32:59
Thelwall, M.; Vaughan, L.: New versions of PageRank employing alternative Web document models (2004) 0.01
```
0.013324881 = product of:
  0.07994928 = sum of:
    0.07994928 = weight(_text_:web in 674) [ClassicSimilarity], result of:
      0.07994928 = score(doc=674,freq=16.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.6119082 = fieldWeight in 674, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=674)
  0.16666667 = coord(1/6)
```
Abstract

Introduces several new versions of PageRank (the link-based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and in most Web information retrieval algorithms, other research has suggested that aggregating pages by directory and domain offers promising alternatives, particularly when Web links are the object of study. The new algorithms based on these alternatives were used to rank four sets of Web pages, and the rankings were compared with those of human subjects. The test results were somewhat inconclusive: the new approach worked well for the set that includes pages from different Web sites, but did not work well for ranking pages from the same site. The new algorithms may therefore be effective for some tasks but not for others, especially when only small numbers of links are involved or when the pages to be ranked come from the same site or directory.
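A minimal sketch of the general idea of ranking alternative Web document units, assuming three document models: "page" (full URL), "directory" (host plus directory path), and "domain" (host name only). It also assumes that links inside a unit are dropped and that repeated links between the same two units count once. These choices, the helper names, and the plain PageRank routine are illustrative assumptions rather than the authors' exact definitions.

```python
import posixpath
from urllib.parse import urlsplit

def doc_unit(url, model):
    """Map a page URL to its document unit: the page itself, its directory, or its domain."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if model == "page":
        return url
    if model == "directory":
        return host + posixpath.dirname(parts.path or "/")
    if model == "domain":
        return host
    raise ValueError(f"unknown document model: {model}")

def aggregate(links, model):
    """Collapse page-to-page links into unit-to-unit links; links inside one unit are dropped
    and repeated links between the same two units are counted once."""
    graph = {}
    for src, dst in links:
        a, b = doc_unit(src, model), doc_unit(dst, model)
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if a != b:
            graph[a].add(b)
    return graph

def pagerank(graph, d=0.85, iters=100):
    """Plain power-iteration PageRank over the aggregated graph."""
    nodes = sorted(graph)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iters):
        dangling = sum(rank[p] for p in nodes if not graph[p])
        new = dict.fromkeys(nodes, (1 - d) / n + d * dangling / n)
        for p in nodes:
            for q in graph[p]:
                new[q] += d * rank[p] / len(graph[p])
        rank = new
    return rank

links = [
    ("http://x.org/a/1.html", "http://y.org/index.html"),
    ("http://x.org/a/2.html", "http://y.org/index.html"),
    ("http://x.org/b/3.html", "http://y.org/index.html"),
    ("http://y.org/index.html", "http://x.org/a/1.html"),
    ("http://z.org/p.html", "http://y.org/index.html"),
]
for model in ("page", "directory", "domain"):
    ranks = pagerank(aggregate(links, model))
    print(model, max(ranks, key=ranks.get))
```

Aggregation changes what is being ranked: under the domain model, the three x.org pages collapse into a single node with one outgoing link, which is one way the alternative models can produce rankings that differ from page-level PageRank.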
Khoo, C.S.G.; Wan, K.-W.: A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.01
```
0.013101891 = product of:
  0.039305672 = sum of:
    0.0329774 = weight(_text_:web in 2509) [ClassicSimilarity], result of:
      0.0329774 = score(doc=2509,freq=8.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.25239927 = fieldWeight in 2509, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.02734375 = fieldNorm(doc=2509)
    0.0063282717 = product of:
      0.018984815 = sum of:
        0.018984815 = weight(_text_:22 in 2509) [ClassicSimilarity], result of:
          0.018984815 = score(doc=2509,freq=2.0), product of:
            0.14019686 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04003532 = queryNorm
            0.1354154 = fieldWeight in 2509, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2509)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)
```
Content

"Most Web search engines accept natural language queries, perform some kind of fuzzy matching and produce ranked output, displaying first the documents that are most likely to be relevant. On the other hand, most library online public access catalogs (OPACs) on the Web are still Boolean retrieval systems that perform exact matching, and require users to express their search requests precisely in a Boolean search language and to refine their search statements to improve the search results. It is well-documented that users have difficulty searching Boolean OPACs effectively (e.g. Borgman, 1996; Ensor, 1992; Wallace, 1993). One approach to making OPACs easier to use is to develop a natural language search interface that acts as middleware between the user's Web browser and the OPAC system. The search interface can accept a natural language query from the user and reformulate it as a series of Boolean search statements that are then submitted to the OPAC. The records retrieved by the OPAC are ranked by the search interface before being forwarded to the user's Web browser. The user, then, does not need to interact directly with the Boolean OPAC but with the natural language search interface or search intermediary. The search interface interacts with the OPAC system on the user's behalf. The advantage of this approach is that no modification to the OPAC or library system is required. Furthermore, the search interface can access multiple OPACs, acting as a meta search engine, and integrate search results from various OPACs before sending them to the user. The search interface needs to incorporate a method for converting the user's natural language query into a series of Boolean search statements, and for ranking the OPAC records retrieved. The purpose of this study was to develop a relevancy-ranking algorithm for a search interface to Boolean OPAC systems.
This is part of an on-going effort to develop a knowledge-based search interface to OPACs called the E-Referencer (Khoo et al., 1998, 1999; Poo et al., 2000). E-Referencer v. 2, which has been implemented, applies a repertoire of initial search strategies and reformulation strategies to retrieve records from OPACs using the Z39.50 protocol, and also assists users in mapping query keywords to the Library of Congress subject headings."
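A minimal sketch of the two tasks the search interface must perform, under simplifying assumptions: keywords are whitespace-separated non-stopwords, Boolean statements are generated from strictest (all keywords ANDed) to loosest (single keywords), and retrieved records are ranked by the number of distinct query keywords in their titles. This is not the E-Referencer's actual strategy repertoire or the study's ranking algorithm; the stopword list and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical stopword list; a real interface would use a fuller one
STOPWORDS = {"a", "an", "and", "the", "of", "on", "in", "for", "to", "with", "about"}

def keywords(query):
    """Lowercase the query, split on whitespace, and drop stopwords."""
    return [w for w in query.lower().split() if w not in STOPWORDS]

def boolean_statements(query):
    """Boolean search statements from strictest to loosest: all keywords ANDed,
    then every combination with one keyword fewer, down to single keywords."""
    terms = keywords(query)
    for size in range(len(terms), 0, -1):
        for combo in combinations(terms, size):
            yield " AND ".join(combo)

def rank_records(query, titles):
    """Rank retrieved record titles by the number of distinct query keywords they contain."""
    terms = set(keywords(query))

    def score(title):
        return len(terms & set(title.lower().split()))

    return sorted(titles, key=lambda t: (-score(t), t))

# ['history AND web AND search', 'history AND web', 'history AND search',
#  'web AND search', 'history', 'web', 'search']
print(list(boolean_statements("history of web search")))
# ['A history of web search', 'Web search engines', 'Cooking basics']
print(rank_records("web search history",
                   ["Cooking basics", "Web search engines", "A history of web search"]))
```

Submitting the statements in this order lets the interface stop as soon as a statement returns enough records, which keeps the number of requests to the OPAC small.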

Source

Electronic library. 22(2004) no.2, pp.112-120
Kaszkiel, M.; Zobel, J.: Effective ranking with arbitrary passages (2001) 0.01
```
0.013071094 = product of:
  0.03921328 = sum of:
    0.028266339 = weight(_text_:web in 5764) [ClassicSimilarity], result of:
      0.028266339 = score(doc=5764,freq=2.0), product of:
        0.13065568 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.04003532 = queryNorm
        0.21634221 = fieldWeight in 5764, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=5764)
    0.0109469425 = product of:
      0.032840826 = sum of:
        0.032840826 = weight(_text_:29 in 5764) [ClassicSimilarity], result of:
          0.032840826 = score(doc=5764,freq=2.0), product of:
            0.14083174 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04003532 = queryNorm
            0.23319192 = fieldWeight in 5764, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.046875 = fieldNorm(doc=5764)
      0.33333334 = coord(1/3)
  0.33333334 = coord(2/6)
```
Abstract

Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and Web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Using short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material among otherwise irrelevant text. In this article, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage: overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents.
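A minimal sketch of ranking with arbitrary passages, assuming fixed-length overlapping windows (`length` tokens, a new window every `step` tokens), a deliberately naive passage score (the count of query-term occurrences), and a document score equal to that of its best passage. The article's actual similarity function and passage lengths are not reproduced; the parameter values and sample documents are illustrative.

```python
from collections import Counter

def arbitrary_passages(tokens, length=4, step=2):
    """Overlapping fixed-length passages starting every `step` tokens; the last window
    is aligned to the end of the document so no trailing text is skipped."""
    if len(tokens) <= length:
        return [tokens]
    starts = list(range(0, len(tokens) - length + 1, step))
    if starts[-1] != len(tokens) - length:
        starts.append(len(tokens) - length)
    return [tokens[s:s + length] for s in starts]

def passage_score(passage, query_terms):
    """Deliberately naive passage score: number of query-term occurrences."""
    counts = Counter(passage)
    return sum(counts[t] for t in query_terms)

def rank_documents(docs, query, length=4, step=2):
    """Score each document by its best passage and rank documents by that score."""
    terms = query.lower().split()
    scores = {
        doc_id: max(passage_score(p, terms)
                    for p in arbitrary_passages(text.lower().split(), length, step))
        for doc_id, text in docs.items()
    }
    return sorted(scores, key=lambda d: (-scores[d], d)), scores

docs = {
    "long": " ".join(["filler"] * 20 + ["passage", "retrieval", "works"] + ["filler"] * 20),
    "short": "passage a b c d e retrieval",
}
order, scores = rank_documents(docs, "passage retrieval")
print(order, scores)  # ['long', 'short'] {'long': 2, 'short': 1}
```

Because each document is judged by its best window, the long document's short block of relevant text outranks the short document whose query terms are far apart, independently of overall document length.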

Date

29. 9.2001 14:00:39
