Search (144 results, page 1 of 8)

  • × theme_ss:"Retrievalalgorithmen"
  • × year_i:[2000 TO 2010}
  1. MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.11
    0.10531742 = product of:
      0.21063484 = sum of:
        0.047231287 = weight(_text_:retrieval in 5108) [ClassicSimilarity], result of:
          0.047231287 = score(doc=5108,freq=4.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.37811437 = fieldWeight in 5108, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=5108)
        0.008925388 = weight(_text_:of in 5108) [ClassicSimilarity], result of:
          0.008925388 = score(doc=5108,freq=2.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.13821793 = fieldWeight in 5108, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=5108)
        0.008828212 = product of:
          0.017656423 = sum of:
            0.017656423 = weight(_text_:on in 5108) [ClassicSimilarity], result of:
              0.017656423 = score(doc=5108,freq=2.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.19440265 = fieldWeight in 5108, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5108)
          0.5 = coord(1/2)
        0.14564995 = sum of:
          0.10089115 = weight(_text_:computers in 5108) [ClassicSimilarity], result of:
            0.10089115 = score(doc=5108,freq=2.0), product of:
              0.21710795 = queryWeight, product of:
                5.257537 = idf(docFreq=625, maxDocs=44218)
                0.041294612 = queryNorm
              0.464705 = fieldWeight in 5108, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.257537 = idf(docFreq=625, maxDocs=44218)
                0.0625 = fieldNorm(doc=5108)
          0.0447588 = weight(_text_:22 in 5108) [ClassicSimilarity], result of:
            0.0447588 = score(doc=5108,freq=2.0), product of:
              0.1446067 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.041294612 = queryNorm
              0.30952093 = fieldWeight in 5108, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=5108)
      0.5 = coord(4/8)
    
    Abstract
    In this paper methods for both speeding up passage processing and examining more passages using parallel computers are explored. The number of passages processed is varied in order to examine the effect on retrieval effectiveness and efficiency. The particular algorithm applied has previously been used to good effect in Okapi experiments at TREC. This algorithm and the mechanism for applying parallel computing to speed up processing are described.
    Date
    20. 1.2007 18:30:22
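    The scoring tree shown for this entry is Lucene ClassicSimilarity "explain" output. As a rough cross-check, the Python sketch below recomputes the "retrieval" contribution and the final document score from the figures displayed above; it assumes Lucene's documented ClassicSimilarity formulas (idf = 1 + ln(numDocs/(docFreq+1)), tf = sqrt(freq)), and the variable names are illustrative, not part of the record.
      import math

      num_docs, doc_freq = 44218, 5836            # maxDocs and docFreq shown in the tree
      query_norm, field_norm = 0.041294612, 0.0625

      idf = 1.0 + math.log(num_docs / (doc_freq + 1))     # -> 3.024915
      tf = math.sqrt(4.0)                                 # freq("retrieval") = 4 -> 2.0

      query_weight = idf * query_norm                     # -> 0.1249127
      field_weight = tf * idf * field_norm                # -> 0.3781144
      retrieval_term_score = query_weight * field_weight  # -> 0.0472313

      # The document score is the sum of the four top-level term contributions
      # (the last two already fold in their own inner sums and coord factors),
      # scaled by coord(4/8): 4 of the 8 query clauses matched.
      term_scores = [0.047231287, 0.008925388, 0.008828212, 0.14564995]
      doc_score = sum(term_scores) * (4 / 8)              # -> 0.10531742

      print(idf, retrieval_term_score, doc_score)
    The same arithmetic applies to the scoring trees of the entries that follow.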
  2. Losada, D.E.; Barreiro, A.: Embedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.07
    0.067203455 = product of:
      0.13440691 = sum of:
        0.057846278 = weight(_text_:retrieval in 1422) [ClassicSimilarity], result of:
          0.057846278 = score(doc=1422,freq=6.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.46309367 = fieldWeight in 1422, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=1422)
        0.03422346 = weight(_text_:use in 1422) [ClassicSimilarity], result of:
          0.03422346 = score(doc=1422,freq=2.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.27065295 = fieldWeight in 1422, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.0625 = fieldNorm(doc=1422)
        0.019957775 = weight(_text_:of in 1422) [ClassicSimilarity], result of:
          0.019957775 = score(doc=1422,freq=10.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.3090647 = fieldWeight in 1422, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=1422)
        0.0223794 = product of:
          0.0447588 = sum of:
            0.0447588 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
              0.0447588 = score(doc=1422,freq=2.0), product of:
                0.1446067 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.041294612 = queryNorm
                0.30952093 = fieldWeight in 1422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1422)
          0.5 = coord(1/2)
      0.5 = coord(4/8)
    
    Abstract
    We propose a novel approach to incorporate term similarity and inverse document frequency into a logical model of information retrieval. The ability of the logic to handle expressive representations, along with the use of such classical notions, is a promising characteristic for IR systems. The approach proposed here has been efficiently implemented, and experiments against test collections are presented.
    Date
    22. 3.2003 19:27:23
    Footnote
    Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.4, S.285-301
  3. Crestani, F.; Dominich, S.; Lalmas, M.; Rijsbergen, C.J.K. van: Mathematical, logical, and formal methods in information retrieval : an introduction to the special issue (2003) 0.06
    0.0613705 = product of:
      0.122741 = sum of:
        0.06135524 = weight(_text_:retrieval in 1451) [ClassicSimilarity], result of:
          0.06135524 = score(doc=1451,freq=12.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.49118498 = fieldWeight in 1451, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1451)
        0.025667597 = weight(_text_:use in 1451) [ClassicSimilarity], result of:
          0.025667597 = score(doc=1451,freq=2.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.20298971 = fieldWeight in 1451, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.046875 = fieldNorm(doc=1451)
        0.018933605 = weight(_text_:of in 1451) [ClassicSimilarity], result of:
          0.018933605 = score(doc=1451,freq=16.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.2932045 = fieldWeight in 1451, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1451)
        0.016784549 = product of:
          0.033569098 = sum of:
            0.033569098 = weight(_text_:22 in 1451) [ClassicSimilarity], result of:
              0.033569098 = score(doc=1451,freq=2.0), product of:
                0.1446067 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.041294612 = queryNorm
                0.23214069 = fieldWeight in 1451, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1451)
          0.5 = coord(1/2)
      0.5 = coord(4/8)
    
    Abstract
    Research on the use of mathematical, logical, and formal methods has been central to Information Retrieval research for a long time. Research in this area is important not only because it helps enhance retrieval effectiveness, but also because it helps clarify the underlying concepts of Information Retrieval. In this article we outline some of the major aspects of the subject, and summarize the papers of this special issue with respect to how they relate to these aspects. We conclude by highlighting some directions of future research, which are needed to better understand the formal characteristics of Information Retrieval.
    Date
    22. 3.2003 19:27:36
    Footnote
    Introduction to the contributions of a special issue: Mathematical, logical, and formal methods in information retrieval
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.4, S.281-284
  4. Computational information retrieval (2001) 0.06
    0.05918767 = product of:
      0.11837534 = sum of:
        0.06627123 = weight(_text_:retrieval in 4167) [ClassicSimilarity], result of:
          0.06627123 = score(doc=4167,freq=14.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.5305404 = fieldWeight in 4167, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=4167)
        0.025667597 = weight(_text_:use in 4167) [ClassicSimilarity], result of:
          0.025667597 = score(doc=4167,freq=2.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.20298971 = fieldWeight in 4167, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.046875 = fieldNorm(doc=4167)
        0.014968331 = weight(_text_:of in 4167) [ClassicSimilarity], result of:
          0.014968331 = score(doc=4167,freq=10.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.23179851 = fieldWeight in 4167, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=4167)
        0.011468184 = product of:
          0.022936368 = sum of:
            0.022936368 = weight(_text_:on in 4167) [ClassicSimilarity], result of:
              0.022936368 = score(doc=4167,freq=6.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.25253648 = fieldWeight in 4167, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4167)
          0.5 = coord(1/2)
      0.5 = coord(4/8)
    
    Abstract
    This volume contains selected papers that focus on the use of linear algebra, computational statistics, and computer science in the development of algorithms and software systems for text retrieval. Experts in information modeling and retrieval share their perspectives on the design of scalable but precise text retrieval systems, revealing many of the challenges and obstacles that mathematical and statistical models must overcome to be viable for automated text processing. This very useful proceedings is an excellent companion for courses in information retrieval, applied linear algebra, and applied statistics. Computational Information Retrieval provides background material on vector space models for text retrieval that applied mathematicians, statisticians, and computer scientists may not be familiar with. For graduate students in these areas, several research questions in information modeling are exposed. In addition, several case studies concerning the efficacy of the popular Latent Semantic Analysis (or Indexing) approach are provided.
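    The case studies mentioned above concern Latent Semantic Analysis (Indexing). As a rough, non-authoritative sketch of that vector-space technique, and not code from the book, the snippet below builds a toy term-document matrix, truncates its SVD, folds a query into the latent space, and ranks documents by cosine similarity; the terms and matrix values are invented for illustration.
      import numpy as np

      # Toy term-document matrix (rows = terms, columns = documents).
      terms = ["parallel", "retrieval", "algebra", "statistics", "text"]
      A = np.array([[1, 0, 0],
                    [1, 1, 1],
                    [0, 1, 0],
                    [0, 1, 1],
                    [1, 0, 1]], dtype=float)

      # Rank-k SVD: A ~ U_k S_k V_k^T; documents live in a k-dimensional latent space.
      U, s, Vt = np.linalg.svd(A, full_matrices=False)
      k = 2
      U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
      doc_vecs = (np.diag(s_k) @ Vt_k).T        # one k-dimensional vector per document

      # Fold a query into the same space: q_k = q^T U_k S_k^{-1}
      q = np.array([0, 1, 0, 0, 1], dtype=float)   # query: "retrieval text"
      q_k = q @ U_k @ np.linalg.inv(np.diag(s_k))

      # Rank documents by cosine similarity in the latent space.
      def cosine(a, b):
          return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

      ranking = sorted(range(A.shape[1]),
                       key=lambda d: cosine(q_k, doc_vecs[d]), reverse=True)
      print(ranking)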
  5. Efthimiadis, E.N.: Interactive query expansion : a user-based evaluation in a relevance feedback environment (2000) 0.06
    0.05606333 = product of:
      0.14950222 = sum of:
        0.033397563 = weight(_text_:retrieval in 5701) [ClassicSimilarity], result of:
          0.033397563 = score(doc=5701,freq=8.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.26736724 = fieldWeight in 5701, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=5701)
        0.01728394 = weight(_text_:of in 5701) [ClassicSimilarity], result of:
          0.01728394 = score(doc=5701,freq=30.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.26765788 = fieldWeight in 5701, product of:
              5.477226 = tf(freq=30.0), with freq of:
                30.0 = termFreq=30.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=5701)
        0.098820716 = sum of:
          0.017656423 = weight(_text_:on in 5701) [ClassicSimilarity], result of:
            0.017656423 = score(doc=5701,freq=8.0), product of:
              0.090823986 = queryWeight, product of:
                2.199415 = idf(docFreq=13325, maxDocs=44218)
                0.041294612 = queryNorm
              0.19440265 = fieldWeight in 5701, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                2.199415 = idf(docFreq=13325, maxDocs=44218)
                0.03125 = fieldNorm(doc=5701)
          0.08116429 = weight(_text_:line in 5701) [ClassicSimilarity], result of:
            0.08116429 = score(doc=5701,freq=4.0), product of:
              0.23157367 = queryWeight, product of:
                5.6078424 = idf(docFreq=440, maxDocs=44218)
                0.041294612 = queryNorm
              0.35049015 = fieldWeight in 5701, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.6078424 = idf(docFreq=440, maxDocs=44218)
                0.03125 = fieldNorm(doc=5701)
      0.375 = coord(3/8)
    
    Abstract
    A user-centered investigation of interactive query expansion within the context of a relevance feedback system is presented in this article. Data were collected from 25 searches using the INSPEC database. The data collection mechanisms included questionnaires, transaction logs, and relevance evaluations. The results discuss issues that relate to query expansion, retrieval effectiveness, the correspondence of the on-line-to-off-line relevance judgments, and the selection of terms for query expansion by users (interactive query expansion). The main conclusions drawn from the results of the study are that: (1) one-third of the terms presented to users in a list of candidate terms for query expansion were identified by the users as potentially useful for query expansion. (2) These terms were mainly judged as either variant expressions (synonyms) or alternative (related) terms to the initial query terms. However, a substantial portion of the selected terms were identified as representing new ideas. (3) The relationships identified between the five best terms selected by the users for query expansion and the initial query terms were that: (a) 34% of the query expansion terms have no relationship or other type of correspondence with a query term; (b) 66% of the remaining query expansion terms have a relationship to the query terms. These relationships were: narrower term (46%), broader term (3%), related term (17%). (4) The results provide evidence for the effectiveness of interactive query expansion. The initial search produced on average three highly relevant documents; the query expansion search produced on average nine further highly relevant documents. The conclusions highlight the need for more research on: interactive query expansion, the comparative evaluation of automatic vs. interactive query expansion, the study of weighted Web-based or Web-accessible retrieval systems in operational environments, and for user studies in searching ranked retrieval systems in general.
    Source
    Journal of the American Society for Information Science. 51(2000) no.11, S.989-1003
    Theme
    Semantic environment in indexing and retrieval
  6. Otterbacher, J.; Erkan, G.; Radev, D.R.: Biased LexRank : passage retrieval using random walks with question-based priors (2009) 0.05
    0.053733695 = product of:
      0.10746739 = sum of:
        0.050615493 = weight(_text_:retrieval in 2450) [ClassicSimilarity], result of:
          0.050615493 = score(doc=2450,freq=6.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.40520695 = fieldWeight in 2450, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2450)
        0.029945528 = weight(_text_:use in 2450) [ClassicSimilarity], result of:
          0.029945528 = score(doc=2450,freq=2.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.23682132 = fieldWeight in 2450, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2450)
        0.013526822 = weight(_text_:of in 2450) [ClassicSimilarity], result of:
          0.013526822 = score(doc=2450,freq=6.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.20947541 = fieldWeight in 2450, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2450)
        0.013379549 = product of:
          0.026759097 = sum of:
            0.026759097 = weight(_text_:on in 2450) [ClassicSimilarity], result of:
              0.026759097 = score(doc=2450,freq=6.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.29462588 = fieldWeight in 2450, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2450)
          0.5 = coord(1/2)
      0.5 = coord(4/8)
    
    Abstract
    We present Biased LexRank, a method for semi-supervised passage retrieval in the context of question answering. We represent a text as a graph of passages linked based on their pairwise lexical similarity. We use traditional passage retrieval techniques to identify passages that are likely to be relevant to a user's natural language question. We then perform a random walk on the lexical similarity graph in order to recursively retrieve additional passages that are similar to other relevant passages. We present results on several benchmarks that show the applicability of our work to question answering and topic-focused text summarization.
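    A minimal sketch of the kind of biased random walk described in this abstract, assuming a precomputed passage-similarity matrix and a question-relevance prior are already available; the damping value, function name, and toy numbers are placeholders and are not taken from the paper.
      import numpy as np

      def biased_lexrank(sim, query_rel, d=0.85, iters=100, tol=1e-8):
          """Personalised-PageRank-style ranking of passages.

          sim       : n x n matrix of pairwise lexical similarities between passages
          query_rel : length-n vector of passage-to-question relevance (the prior)
          d         : probability of following the similarity graph rather than
                      jumping back to the question-based prior
          """
          n = sim.shape[0]
          # Row-normalise the similarity graph and the prior so both are distributions.
          P = sim / sim.sum(axis=1, keepdims=True)
          prior = query_rel / query_rel.sum()
          scores = np.full(n, 1.0 / n)
          for _ in range(iters):
              new = (1 - d) * prior + d * (P.T @ scores)
              done = np.abs(new - scores).sum() < tol
              scores = new
              if done:
                  break
          return scores

      # Toy example: 4 passages, passage 1 is lexically closest to the question.
      sim = np.array([[1.0, 0.4, 0.1, 0.0],
                      [0.4, 1.0, 0.5, 0.1],
                      [0.1, 0.5, 1.0, 0.3],
                      [0.0, 0.1, 0.3, 1.0]])
      rel = np.array([0.1, 0.7, 0.15, 0.05])
      print(biased_lexrank(sim, rel))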
  7. Sparck Jones, K.: A statistical interpretation of term specificity and its application in retrieval (2004) 0.05
    0.05337083 = product of:
      0.10674166 = sum of:
        0.041327372 = weight(_text_:retrieval in 4420) [ClassicSimilarity], result of:
          0.041327372 = score(doc=4420,freq=4.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.33085006 = fieldWeight in 4420, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4420)
        0.029945528 = weight(_text_:use in 4420) [ClassicSimilarity], result of:
          0.029945528 = score(doc=4420,freq=2.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.23682132 = fieldWeight in 4420, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4420)
        0.022089208 = weight(_text_:of in 4420) [ClassicSimilarity], result of:
          0.022089208 = score(doc=4420,freq=16.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.34207192 = fieldWeight in 4420, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4420)
        0.013379549 = product of:
          0.026759097 = sum of:
            0.026759097 = weight(_text_:on in 4420) [ClassicSimilarity], result of:
              0.026759097 = score(doc=4420,freq=6.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.29462588 = fieldWeight in 4420, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4420)
          0.5 = coord(1/2)
      0.5 = coord(4/8)
    
    Abstract
    The exhaustivity of document descriptions and the specificity of index terms are usually regarded as independent. It is suggested that specificity should be interpreted statistically, as a function of term use rather than of term meaning. The effects on retrieval of variations in term specificity are examined, experiments with three test collections showing, in particular, that frequently-occurring terms are required for good overall performance. It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms. Results for the test collections show that considerable improvements in performance are obtained with this very simple procedure.
    Source
    Journal of Documentation. 60(2004) no.5, S.493-502
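    As a loose illustration of collection-frequency (IDF) weighting in the spirit of this paper, and not the exact scheme used in its experiments, the sketch below scores a document by summing log(N/n) over the query terms it matches, so that matches on rarer, more specific terms contribute more than matches on frequent terms. The document frequencies are reused from the scoring trees shown earlier on this page.
      import math

      def idf(doc_freq, num_docs):
          # Statistical specificity: the rarer a term is in the collection,
          # the more a match on it should count.
          return math.log(num_docs / doc_freq)

      def score(query_terms, doc_terms, doc_freqs, num_docs):
          # Sum idf over the query terms that the document matches.
          return sum(idf(doc_freqs[t], num_docs)
                     for t in query_terms if t in doc_terms)

      doc_freqs = {"retrieval": 5836, "of": 25162, "computers": 625}
      num_docs = 44218   # collection size borrowed from the scoring trees above
      print(score({"retrieval", "of", "computers"},
                  {"retrieval", "of"}, doc_freqs, num_docs))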
  8. Lalmas, M.: XML retrieval (2009) 0.05
    0.05263522 = product of:
      0.10527044 = sum of:
        0.059039105 = weight(_text_:retrieval in 4998) [ClassicSimilarity], result of:
          0.059039105 = score(doc=4998,freq=16.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.47264296 = fieldWeight in 4998, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4998)
        0.021389665 = weight(_text_:use in 4998) [ClassicSimilarity], result of:
          0.021389665 = score(doc=4998,freq=2.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.1691581 = fieldWeight in 4998, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4998)
        0.019324033 = weight(_text_:of in 4998) [ClassicSimilarity], result of:
          0.019324033 = score(doc=4998,freq=24.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.2992506 = fieldWeight in 4998, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4998)
        0.0055176322 = product of:
          0.0110352645 = sum of:
            0.0110352645 = weight(_text_:on in 4998) [ClassicSimilarity], result of:
              0.0110352645 = score(doc=4998,freq=2.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.121501654 = fieldWeight in 4998, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4998)
          0.5 = coord(1/2)
      0.5 = coord(4/8)
    
    Abstract
    Documents usually have a content and a structure. The content refers to the text of the document, whereas the structure refers to how a document is logically organized. An increasingly common way to encode the structure is through the use of a mark-up language. Nowadays, the most widely used mark-up language for representing structure is the eXtensible Mark-up Language (XML). XML can be used to provide focused access to documents, i.e. returning XML elements, such as sections and paragraphs, instead of whole documents in response to a query. Such focused strategies are of particular benefit for information repositories containing long documents, or documents covering a wide variety of topics, where users are directed to the most relevant content within a document. The increased adoption of XML to represent document structure requires the development of tools to effectively access documents marked up in XML. This book provides a detailed description of the query languages, indexing strategies, ranking algorithms, and presentation scenarios developed to access XML documents. Major advances in XML retrieval were seen from 2002 as a result of INEX, the Initiative for the Evaluation of XML Retrieval. INEX, also described in this book, provided test sets for evaluating XML retrieval effectiveness. Many of the developments and results described in this book were investigated within INEX.
    Content
    Table of Contents: Introduction / Basic XML Concepts / Historical Perspectives / Query Languages / Indexing Strategies / Ranking Strategies / Presentation Strategies / Evaluating XML Retrieval Effectiveness / Conclusions
    LCSH
    Information retrieval
    Series
    Synthesis lectures on information concepts, retrieval & services; 7
    Subject
    Information retrieval
  9. Henzinger, M.R.: Hyperlink analysis for the Web (2001) 0.05
    0.05173868 = product of:
      0.10347736 = sum of:
        0.04418082 = weight(_text_:retrieval in 8) [ClassicSimilarity], result of:
          0.04418082 = score(doc=8,freq=14.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.3536936 = fieldWeight in 8, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=8)
        0.038262997 = weight(_text_:use in 8) [ClassicSimilarity], result of:
          0.038262997 = score(doc=8,freq=10.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.30259922 = fieldWeight in 8, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.03125 = fieldNorm(doc=8)
        0.0133880805 = weight(_text_:of in 8) [ClassicSimilarity], result of:
          0.0133880805 = score(doc=8,freq=18.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.20732687 = fieldWeight in 8, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=8)
        0.007645456 = product of:
          0.015290912 = sum of:
            0.015290912 = weight(_text_:on in 8) [ClassicSimilarity], result of:
              0.015290912 = score(doc=8,freq=6.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.16835764 = fieldWeight in 8, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.03125 = fieldNorm(doc=8)
          0.5 = coord(1/2)
      0.5 = coord(4/8)
    
    Abstract
    Hyperlink analysis algorithms allow search engines to deliver focused results to user queries. This article surveys ranking algorithms used to retrieve information on the Web.
    Content
    Information retrieval is a computer science subfield whose goal is to find all documents relevant to a user query in a given collection of documents. As such, information retrieval should really be called document retrieval. Before the advent of the Web, IR systems were typically installed in libraries for use mostly by reference librarians. The retrieval algorithm for these systems was usually based exclusively on analysis of the words in the document. The Web changed all this. Now each Web user has access to various search engines whose retrieval algorithms often use not only the words in the documents but also information like the hyperlink structure of the Web or markup language tags. How are hyperlinks useful? The hyperlink functionality alone - that is, the hyperlink to Web page B that is contained in Web page A - is not directly useful in information retrieval. However, the way Web page authors use hyperlinks can give them valuable information content. Authors usually create hyperlinks they think will be useful to readers. Some may be navigational aids that, for example, take the reader back to the site's home page; others provide access to documents that augment the content of the current page. The latter tend to point to high-quality pages that might be on the same topic as the page containing the hyperlink. Web information retrieval systems can exploit this information to refine searches for relevant documents. Hyperlink analysis significantly improves the relevance of the search results, so much so that all major Web search engines claim to use some type of hyperlink analysis. However, the search engines do not disclose details about the type of hyperlink analysis they perform, mostly to avoid manipulation of search results by Web-positioning companies. In this article, I discuss how hyperlink analysis can be applied to ranking algorithms, and survey other ways Web search engines can use this analysis.
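    The article surveys hyperlink-analysis ranking algorithms in general. As one concrete, hedged example of such an algorithm (chosen here purely for illustration, not singled out by the article), the sketch below runs Kleinberg's HITS hub/authority iteration on a made-up link graph.
      import numpy as np

      def hits(adj, iters=50):
          """Kleinberg's HITS on a small directed graph.

          adj[i][j] = 1 means page i links to page j.  Authorities are pages
          pointed to by good hubs; hubs are pages that point to good authorities.
          """
          A = np.asarray(adj, dtype=float)
          n = A.shape[0]
          hubs = np.ones(n)
          auth = np.ones(n)
          for _ in range(iters):
              auth = A.T @ hubs            # authority score: in-links from hubs
              auth /= np.linalg.norm(auth)
              hubs = A @ auth              # hub score: out-links to authorities
              hubs /= np.linalg.norm(hubs)
          return hubs, auth

      # Toy Web graph: pages 0 and 1 both link to page 2; page 2 links to page 3.
      adj = [[0, 0, 1, 0],
             [0, 0, 1, 0],
             [0, 0, 0, 1],
             [0, 0, 0, 0]]
      print(hits(adj))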
  10. Boughanem, M.; Chrisment, C.; Tamine, L.: On using genetic algorithms for multimodal relevance optimization in information retrieval (2002) 0.05
    0.050813206 = product of:
      0.10162641 = sum of:
        0.029222867 = weight(_text_:retrieval in 1011) [ClassicSimilarity], result of:
          0.029222867 = score(doc=1011,freq=2.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.23394634 = fieldWeight in 1011, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1011)
        0.042349376 = weight(_text_:use in 1011) [ClassicSimilarity], result of:
          0.042349376 = score(doc=1011,freq=4.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.33491597 = fieldWeight in 1011, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1011)
        0.019129815 = weight(_text_:of in 1011) [ClassicSimilarity], result of:
          0.019129815 = score(doc=1011,freq=12.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.29624295 = fieldWeight in 1011, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1011)
        0.010924355 = product of:
          0.02184871 = sum of:
            0.02184871 = weight(_text_:on in 1011) [ClassicSimilarity], result of:
              0.02184871 = score(doc=1011,freq=4.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.24056101 = fieldWeight in 1011, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1011)
          0.5 = coord(1/2)
      0.5 = coord(4/8)
    
    Abstract
    Boughanem, Chrisment, and Tamine use 144,186 documents and 25 queries from the TREC corpus AP88 to evaluate a genetic algorithm for multiple query evaluation against single query evaluation. They demonstrate niche construction by the use of a genetic technique to reproduce queries more often if they retrieve more relevant documents (genotypic sharing), or if they have close evaluation results (phenotypic sharing). New documents generated in each iteration are ranked by a merge based on one of these two principles. Genotypic sharing yields improvements of 6% to 15% over single query evaluation, and phenotypic sharing shows from 5% to 15% improvement. Thus the niching technique appears to offer the possibility of successful merging of different query expressions.
    Source
    Journal of the American Society for Information Science and Technology. 53(2002) no.11, S.934-943
  11. Li, M.; Li, H.; Zhou, Z.-H.: Semi-supervised document retrieval (2009) 0.05
    0.05074767 = product of:
      0.10149534 = sum of:
        0.036153924 = weight(_text_:retrieval in 4218) [ClassicSimilarity], result of:
          0.036153924 = score(doc=4218,freq=6.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.28943354 = fieldWeight in 4218, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4218)
        0.04277933 = weight(_text_:use in 4218) [ClassicSimilarity], result of:
          0.04277933 = score(doc=4218,freq=8.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.3383162 = fieldWeight in 4218, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4218)
        0.014758972 = weight(_text_:of in 4218) [ClassicSimilarity], result of:
          0.014758972 = score(doc=4218,freq=14.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.22855641 = fieldWeight in 4218, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4218)
        0.007803111 = product of:
          0.015606222 = sum of:
            0.015606222 = weight(_text_:on in 4218) [ClassicSimilarity], result of:
              0.015606222 = score(doc=4218,freq=4.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.1718293 = fieldWeight in 4218, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4218)
          0.5 = coord(1/2)
      0.5 = coord(4/8)
    
    Abstract
    This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, referred to as SSRank, aims to combine the advantages of traditional Information Retrieval (IR) methods and the recently proposed supervised learning methods for IR: the use of a limited amount of labeled data and a rich model representation. To do so, the method adopts a semi-supervised learning framework for ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for the queries. It then uses all the labeled data to train a machine learning model (in our case, a Neural Network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently, and almost always significantly, outperforms the baseline methods (unsupervised and supervised learning methods) given the same amount of labeled data. This is because SSRank can effectively leverage the use of unlabeled data in learning.
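    A minimal sketch of the data-labeling step described above, assuming BM25 scores for the query are already available: unlabeled documents whose scores clear confidence thresholds are pseudo-labeled and pooled with the judged documents before a learner is trained. The thresholds, function name, and toy data are placeholders, not values from the paper.
      def pseudo_label(labeled, unlabeled, bm25_scores, hi=8.0, lo=1.0):
          """Label confidently-scored unlabeled documents using a traditional IR model.

          labeled     : dict doc_id -> 0/1 relevance judgment for a query
          unlabeled   : iterable of doc_ids without judgments
          bm25_scores : dict doc_id -> BM25 score for the same query
          hi, lo      : score thresholds above/below which the IR model is trusted
          """
          training = dict(labeled)
          for doc in unlabeled:
              s = bm25_scores.get(doc, 0.0)
              if s >= hi:
                  training[doc] = 1        # confidently relevant
              elif s <= lo:
                  training[doc] = 0        # confidently non-relevant
              # otherwise: leave unlabeled; a stopping criterion would bound how
              # many such pseudo-labels are added per round
          return training

      labeled = {"d1": 1, "d2": 0}
      unlabeled = ["d3", "d4", "d5"]
      scores = {"d3": 9.2, "d4": 0.4, "d5": 3.1}
      print(pseudo_label(labeled, unlabeled, scores))
      # The enlarged training set would then be fed to the ranking model
      # (a neural network in the paper).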
  12. Kaszkiel, M.; Zobel, J.: Effective ranking with arbitrary passages (2001) 0.05
    0.049027514 = product of:
      0.09805503 = sum of:
        0.035423465 = weight(_text_:retrieval in 5764) [ClassicSimilarity], result of:
          0.035423465 = score(doc=5764,freq=4.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.2835858 = fieldWeight in 5764, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=5764)
        0.025667597 = weight(_text_:use in 5764) [ClassicSimilarity], result of:
          0.025667597 = score(doc=5764,freq=2.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.20298971 = fieldWeight in 5764, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.046875 = fieldNorm(doc=5764)
        0.027600236 = weight(_text_:of in 5764) [ClassicSimilarity], result of:
          0.027600236 = score(doc=5764,freq=34.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.4274153 = fieldWeight in 5764, product of:
              5.8309517 = tf(freq=34.0), with freq of:
                34.0 = termFreq=34.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=5764)
        0.009363732 = product of:
          0.018727465 = sum of:
            0.018727465 = weight(_text_:on in 5764) [ClassicSimilarity], result of:
              0.018727465 = score(doc=5764,freq=4.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.20619515 = fieldWeight in 5764, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5764)
          0.5 = coord(1/2)
      0.5 = coord(4/8)
    
    Abstract
    Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and Web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material among otherwise irrelevant text. In this article, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage, overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents
    Source
    Journal of the American Society for Information Science and Technology. 52(2001) no.4, S.344-364
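    A small sketch of the "arbitrary passage" idea described in the abstract above, assuming whitespace-tokenized documents: overlapping fixed-length word windows are generated at any starting position, and the document is ranked by its best window. The window sizes and the term-count scorer are placeholders for the proper ranking functions used in the article's experiments.
      def arbitrary_passages(tokens, length=50, overlap=25):
          """Generate overlapping fixed-length word-window passages.

          Unlike passages tied to document structure (sections, paragraphs) or to
          topic shifts, these windows can start at any position in the text.
          """
          step = max(1, length - overlap)
          for start in range(0, max(1, len(tokens) - overlap), step):
              yield start, tokens[start:start + length]

      def best_passage_score(doc_tokens, query_terms, scorer, **kw):
          # Rank the document by its best passage rather than by the whole text.
          return max(scorer(p, query_terms)
                     for _, p in arbitrary_passages(doc_tokens, **kw))

      # Toy scorer: count query-term occurrences inside the passage.
      overlap_scorer = lambda passage, q: sum(tok in q for tok in passage)

      doc = ("passage ranking provides convenient units of text to return to the "
             "user and can avoid comparing documents of different length").split()
      print(best_passage_score(doc, {"passage", "ranking", "length"},
                               overlap_scorer, length=8, overlap=4))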
  13. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the generation of partitioned inverted files (2005) 0.05
    0.048638847 = product of:
      0.09727769 = sum of:
        0.035423465 = weight(_text_:retrieval in 651) [ClassicSimilarity], result of:
          0.035423465 = score(doc=651,freq=4.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.2835858 = fieldWeight in 651, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=651)
        0.036299463 = weight(_text_:use in 651) [ClassicSimilarity], result of:
          0.036299463 = score(doc=651,freq=4.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.2870708 = fieldWeight in 651, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.046875 = fieldNorm(doc=651)
        0.018933605 = weight(_text_:of in 651) [ClassicSimilarity], result of:
          0.018933605 = score(doc=651,freq=16.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.2932045 = fieldWeight in 651, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=651)
        0.006621159 = product of:
          0.013242318 = sum of:
            0.013242318 = weight(_text_:on in 651) [ClassicSimilarity], result of:
              0.013242318 = score(doc=651,freq=2.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.14580199 = fieldWeight in 651, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.046875 = fieldNorm(doc=651)
          0.5 = coord(1/2)
      0.5 = coord(4/8)
    
    Abstract
    Purpose - The generation of inverted indexes is one of the most computationally intensive activities for information retrieval systems: indexing large multi-gigabyte text databases can take many hours or even days to complete. We examine the generation of partitioned inverted files in order to speed up the process of indexing. Two types of index partitions are investigated: TermId and DocId.
    Design/methodology/approach - We use standard measures used in parallel computing such as speedup and efficiency to examine the computing results and also the space costs of our trial indexing experiments.
    Findings - The results from runs on both partitioning methods are compared and contrasted, concluding that DocId is the more efficient method.
    Practical implications - The practical implications are that the DocId partitioning method would in most circumstances be used for distributing inverted file data in a parallel computer, particularly if indexing speed is the primary consideration.
    Originality/value - The paper is of value to database administrators who manage large-scale text collections, and who need to use parallel computing to implement their text retrieval services.
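    A toy sketch of the two partitioning schemes compared above, assuming an in-memory posting-list dictionary: DocId partitioning splits by document, so every node holds a (shorter) posting list for every term, while TermId partitioning assigns whole posting lists to nodes by term. The modulo/hash assignment used here is an illustrative choice, not the paper's implementation.
      def partition_by_docid(postings, num_nodes):
          """DocId partitioning: each node indexes a disjoint subset of documents."""
          parts = [dict() for _ in range(num_nodes)]
          for term, posting in postings.items():
              for doc_id, tf in posting:
                  parts[doc_id % num_nodes].setdefault(term, []).append((doc_id, tf))
          return parts

      def partition_by_termid(postings, num_nodes):
          """TermId partitioning: each node owns complete posting lists for a
          disjoint subset of the vocabulary."""
          parts = [dict() for _ in range(num_nodes)]
          for term, posting in postings.items():
              parts[hash(term) % num_nodes][term] = list(posting)
          return parts

      # Tiny inverted file: term -> [(doc_id, term_frequency), ...]
      postings = {"parallel": [(1, 2), (4, 1)],
                  "inverted": [(1, 1), (2, 3), (4, 2)],
                  "file":     [(2, 1), (3, 1)]}
      print(partition_by_docid(postings, 2))
      print(partition_by_termid(postings, 2))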
  14. Wills, R.S.: Google's PageRank : the math behind the search engine (2006) 0.05
    0.048239887 = product of:
      0.1286397 = sum of:
        0.03422346 = weight(_text_:use in 5954) [ClassicSimilarity], result of:
          0.03422346 = score(doc=5954,freq=8.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.27065295 = fieldWeight in 5954, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.03125 = fieldNorm(doc=5954)
        0.01728394 = weight(_text_:of in 5954) [ClassicSimilarity], result of:
          0.01728394 = score(doc=5954,freq=30.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.26765788 = fieldWeight in 5954, product of:
              5.477226 = tf(freq=30.0), with freq of:
                30.0 = termFreq=30.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=5954)
        0.0771323 = sum of:
          0.019740483 = weight(_text_:on in 5954) [ClassicSimilarity], result of:
            0.019740483 = score(doc=5954,freq=10.0), product of:
              0.090823986 = queryWeight, product of:
                2.199415 = idf(docFreq=13325, maxDocs=44218)
                0.041294612 = queryNorm
              0.21734878 = fieldWeight in 5954, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                2.199415 = idf(docFreq=13325, maxDocs=44218)
                0.03125 = fieldNorm(doc=5954)
          0.05739182 = weight(_text_:line in 5954) [ClassicSimilarity], result of:
            0.05739182 = score(doc=5954,freq=2.0), product of:
              0.23157367 = queryWeight, product of:
                5.6078424 = idf(docFreq=440, maxDocs=44218)
                0.041294612 = queryNorm
              0.24783395 = fieldWeight in 5954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6078424 = idf(docFreq=440, maxDocs=44218)
                0.03125 = fieldNorm(doc=5954)
      0.375 = coord(3/8)
    
    Abstract
    Approximately 91 million American adults use the Internet on a typical day. The number-one Internet activity is reading and writing e-mail. Search engine use is next in line and continues to increase in popularity. In fact, survey findings indicate that nearly 60 million American adults use search engines on a given day. Even though there are many Internet search engines, Google, Yahoo!, and MSN receive over 81% of all search requests. Despite claims that the quality of search provided by Yahoo! and MSN now equals that of Google, Google continues to thrive as the search engine of choice, receiving over 46% of all search requests, nearly double the volume of Yahoo! and over four times that of MSN. I use Google's search engine on a daily basis and rarely request information from other search engines. One day, I decided to visit the homepages of Google, Yahoo!, and MSN to compare the quality of search results. Coffee was on my mind that day, so I entered the simple query "coffee" in the search box at each homepage. Table 1 shows the top ten (unsponsored) results returned by each search engine. Although ordered differently, two webpages, www.peets.com and www.coffeegeek.com, appear in all three top ten lists. In addition, each pairing of top ten lists has two additional results in common. Depending on the information I hoped to obtain about coffee by using the search engines, I could argue that any one of the three returned better results; however, I was not looking for a particular webpage, so all three listings of search results seemed of equal quality. Thus, I plan to continue using Google. My decision is indicative of the problem Yahoo!, MSN, and other search engine companies face in the quest to obtain a larger percentage of Internet search volume. Search engine users are loyal to one or a few search engines and are generally happy with search results. Thus, as long as Google continues to provide results deemed high in quality, Google likely will remain the top search engine. But what set Google apart from its competitors in the first place? The answer is PageRank. In this article I explain this simple mathematical algorithm that revolutionized Web search.
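    A compact, hedged sketch of the PageRank power iteration the article explains, run on a made-up three-page graph; the damping factor and convergence settings are conventional defaults, not values from the article.
      import numpy as np

      def pagerank(links, d=0.85, iters=100, tol=1e-10):
          """Power-iteration PageRank.

          links : dict page -> list of pages it links to
          d     : damping factor (probability of following a link rather than
                  jumping to a random page)
          """
          pages = sorted(links)
          idx = {p: i for i, p in enumerate(pages)}
          n = len(pages)
          rank = np.full(n, 1.0 / n)
          for _ in range(iters):
              new = np.full(n, (1.0 - d) / n)
              for p, outs in links.items():
                  if outs:                           # distribute rank over out-links
                      share = d * rank[idx[p]] / len(outs)
                      for q in outs:
                          new[idx[q]] += share
                  else:                              # dangling page: spread evenly
                      new += d * rank[idx[p]] / n
              if np.abs(new - rank).sum() < tol:
                  rank = new
                  break
              rank = new
          return dict(zip(pages, rank))

      # Toy Web graph (made up): A and B both link to C, C links back to A.
      print(pagerank({"A": ["C"], "B": ["C"], "C": ["A"]}))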
  15. López-Pujalte, C.; Guerrero-Bote, V.P.; Moya-Anegón, F. de: Order-based fitness functions for genetic algorithms applied to relevance feedback (2003) 0.05
    0.04719743 = product of:
      0.12585981 = sum of:
        0.020873476 = weight(_text_:retrieval in 5154) [ClassicSimilarity], result of:
          0.020873476 = score(doc=5154,freq=2.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.16710453 = fieldWeight in 5154, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5154)
        0.017640345 = weight(_text_:of in 5154) [ClassicSimilarity], result of:
          0.017640345 = score(doc=5154,freq=20.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.27317715 = fieldWeight in 5154, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5154)
        0.087345995 = sum of:
          0.015606222 = weight(_text_:on in 5154) [ClassicSimilarity], result of:
            0.015606222 = score(doc=5154,freq=4.0), product of:
              0.090823986 = queryWeight, product of:
                2.199415 = idf(docFreq=13325, maxDocs=44218)
                0.041294612 = queryNorm
              0.1718293 = fieldWeight in 5154, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.199415 = idf(docFreq=13325, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5154)
          0.07173977 = weight(_text_:line in 5154) [ClassicSimilarity], result of:
            0.07173977 = score(doc=5154,freq=2.0), product of:
              0.23157367 = queryWeight, product of:
                5.6078424 = idf(docFreq=440, maxDocs=44218)
                0.041294612 = queryNorm
              0.30979243 = fieldWeight in 5154, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6078424 = idf(docFreq=440, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5154)
      0.375 = coord(3/8)
    
    Abstract
    López-Pujalte and Guerrero-Bote test a relevance feedback genetic algorithm while varying its order-based fitness functions and generating a function based upon the Ide dec-hi method as a baseline. Using the non-zero weighted term types assigned to the query, and to the initially retrieved set of documents, as genes, a chromosome of equal length is created for each. The algorithm is provided with the chromosomes for judged relevant documents, for judged irrelevant documents, and for the irrelevant documents with their terms negated. The algorithm uses random selection of all possible genes, but gives greater likelihood to those with higher fitness values. When the fittest chromosome of a previous population is eliminated, it is restored, while the least fit of the new population is eliminated in its stead. A crossover probability of .8 and a mutation probability of .2 were used with 20 generations. Three fitness functions were utilized: the Horng and Yeh function, which takes into account the position of relevant documents, and two new functions, one based on accumulating the cosine similarity for retrieved documents, the other on stored fixed-recall-interval precisions. The Cranfield collection was used with the first 15 documents retrieved from 33 queries chosen to have at least 3 relevant documents in the first 15 and at least 5 relevant documents not initially retrieved. Precision was calculated at fixed recall levels using the residual collection method, which removes viewed documents. One of the three functions improved the original retrieval by 127 percent, while the Ide dec-hi method provided a 120 percent improvement.
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.2, S.152-160
  16. Witschel, H.F.: Global term weights in distributed environments (2008) 0.05
    0.046977695 = product of:
      0.09395539 = sum of:
        0.050096344 = weight(_text_:retrieval in 2096) [ClassicSimilarity], result of:
          0.050096344 = score(doc=2096,freq=8.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.40105087 = fieldWeight in 2096, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2096)
        0.017710768 = weight(_text_:of in 2096) [ClassicSimilarity], result of:
          0.017710768 = score(doc=2096,freq=14.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.2742677 = fieldWeight in 2096, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2096)
        0.009363732 = product of:
          0.018727465 = sum of:
            0.018727465 = weight(_text_:on in 2096) [ClassicSimilarity], result of:
              0.018727465 = score(doc=2096,freq=4.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.20619515 = fieldWeight in 2096, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2096)
          0.5 = coord(1/2)
        0.016784549 = product of:
          0.033569098 = sum of:
            0.033569098 = weight(_text_:22 in 2096) [ClassicSimilarity], result of:
              0.033569098 = score(doc=2096,freq=2.0), product of:
                0.1446067 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.041294612 = queryNorm
                0.23214069 = fieldWeight in 2096, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2096)
          0.5 = coord(1/2)
      0.5 = coord(4/8)
    
    Abstract
    This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view on the collection is not available. In particular, the two options of either sampling documents or of using a reference corpus independent of the target retrieval collection are compared using standard IR test collections. In addition, the possibility of pruning term lists based on frequency is evaluated. The results show that very good retrieval performance can be reached when just the most frequent terms of a collection - an "extended stop word list" - are known and all terms which are not in that list are treated equally. However, the list cannot always be fully estimated from a general-purpose reference corpus, but some "domain-specific stop words" need to be added. A good solution for achieving this is to mix estimates from small samples of the target retrieval collection with ones derived from a reference corpus.
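    A minimal sketch of the mixing idea described above, assuming simple document-frequency counts; the formula for combining the two estimates and the pruning threshold are illustrative, not the paper's exact choices:

    from math import log

    def idf(df, n):
        # unseen terms fall back to the maximum IDF
        return log(n / df) if df else log(n)

    def mixed_idf(term, sample_df, sample_n, ref_df, ref_n, lam=0.5):
        # mix an estimate from a small sample of the target collection with
        # one from an independent reference corpus
        return (lam * idf(sample_df.get(term, 0), sample_n)
                + (1 - lam) * idf(ref_df.get(term, 0), ref_n))

    def extended_stop_word_list(df_counts, keep=10000):
        # keep explicit statistics only for the most frequent terms;
        # all remaining terms are treated equally
        return sorted(df_counts, key=df_counts.get, reverse=True)[:keep]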
    Date
    1. 8.2008 9:44:22
  17. Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O.; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.05
    0.04597036 = product of:
      0.12258763 = sum of:
        0.07920927 = weight(_text_:retrieval in 2502) [ClassicSimilarity], result of:
          0.07920927 = score(doc=2502,freq=20.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.63411707 = fieldWeight in 2502, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
        0.025667597 = weight(_text_:use in 2502) [ClassicSimilarity], result of:
          0.025667597 = score(doc=2502,freq=2.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.20298971 = fieldWeight in 2502, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
        0.017710768 = weight(_text_:of in 2502) [ClassicSimilarity], result of:
          0.017710768 = score(doc=2502,freq=14.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.2742677 = fieldWeight in 2502, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
      0.375 = coord(3/8)
    
    Abstract
    Prior efforts have shown that under certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches and demonstrate the reasons why they do not perform well when applied to this problem. Detailed results and analyses are included to support our conclusions.
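    For reference, a sketch of CombMNZ, a standard data-fusion baseline of the kind analysed in such studies; the abstract does not name the specific fusion approaches tested, so this is illustrative only:

    def comb_mnz(result_sets):
        # result_sets: one {doc_id: normalized_score} mapping per retrieval strategy
        fused = {}
        for scores in result_sets:
            for doc, s in scores.items():
                total, hits = fused.get(doc, (0.0, 0))
                fused[doc] = (total + s, hits + 1)
        # final score = sum of normalized scores * number of strategies retrieving the doc
        ranked = sorted(fused.items(), key=lambda kv: kv[1][0] * kv[1][1], reverse=True)
        return [(doc, total * hits) for doc, (total, hits) in ranked]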
    Source
    Journal of the American Society for Information Science and Technology. 55(2004) no.10, S.859-868
  18. Vechtomova, O.; Karamuftuoglu, M.: Elicitation and use of relevance feedback information (2006) 0.05
    0.045794785 = product of:
      0.09158957 = sum of:
        0.029222867 = weight(_text_:retrieval in 966) [ClassicSimilarity], result of:
          0.029222867 = score(doc=966,freq=2.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.23394634 = fieldWeight in 966, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=966)
        0.029945528 = weight(_text_:use in 966) [ClassicSimilarity], result of:
          0.029945528 = score(doc=966,freq=2.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.23682132 = fieldWeight in 966, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.0546875 = fieldNorm(doc=966)
        0.024696484 = weight(_text_:of in 966) [ClassicSimilarity], result of:
          0.024696484 = score(doc=966,freq=20.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.38244802 = fieldWeight in 966, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=966)
        0.007724685 = product of:
          0.01544937 = sum of:
            0.01544937 = weight(_text_:on in 966) [ClassicSimilarity], result of:
              0.01544937 = score(doc=966,freq=2.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.17010231 = fieldWeight in 966, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=966)
          0.5 = coord(1/2)
      0.5 = coord(4/8)
    
    Abstract
    The paper presents two approaches to interactively refining user search formulations and their evaluation in the new High Accuracy Retrieval from Documents (HARD) track of TREC-12. The first method consists of asking the user to select a number of sentences that represent documents. The second method consists of showing to the user a list of noun phrases extracted from the initial document set. Both methods then expand the query based on the user feedback. The TREC results show that one of the methods is an effective means of interactive query expansion and yields significant performance improvements. The paper presents a comparison of the methods and detailed analysis of the evaluation results.
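    A minimal sketch of the step both methods share, expanding the query with terms drawn from the user-selected sentences or noun phrases; the term weighting and selection details of the two HARD-track methods are not reproduced here:

    import re
    from collections import Counter

    def expand_query(query_terms, selected_texts, n_new=10, stopwords=frozenset()):
        counts = Counter(
            w for text in selected_texts
            for w in re.findall(r"[a-z]+", text.lower())
            if w not in stopwords and w not in query_terms
        )
        # append the most frequent new terms from the user's selections
        return list(query_terms) + [t for t, _ in counts.most_common(n_new)]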
  19. Wan, X.; Yang, J.; Xiao, J.: Towards a unified approach to document similarity search using manifold-ranking of blocks (2008) 0.04
    0.04497494 = product of:
      0.08994988 = sum of:
        0.04174695 = weight(_text_:retrieval in 2081) [ClassicSimilarity], result of:
          0.04174695 = score(doc=2081,freq=8.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.33420905 = fieldWeight in 2081, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2081)
        0.021389665 = weight(_text_:use in 2081) [ClassicSimilarity], result of:
          0.021389665 = score(doc=2081,freq=2.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.1691581 = fieldWeight in 2081, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2081)
        0.015778005 = weight(_text_:of in 2081) [ClassicSimilarity], result of:
          0.015778005 = score(doc=2081,freq=16.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.24433708 = fieldWeight in 2081, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2081)
        0.0110352645 = product of:
          0.022070529 = sum of:
            0.022070529 = weight(_text_:on in 2081) [ClassicSimilarity], result of:
              0.022070529 = score(doc=2081,freq=8.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.24300331 = fieldWeight in 2081, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2081)
          0.5 = coord(1/2)
      0.5 = coord(4/8)
    
    Abstract
    Document similarity search (i.e. query by example) aims to retrieve a ranked list of documents similar to a query document in a text corpus or on the Web. Most existing approaches to similarity search first compute the pairwise similarity score between each document and the query using a retrieval function or similarity measure (e.g. Cosine), and then rank the documents by the similarity scores. In this paper, we propose a novel retrieval approach based on manifold-ranking of document blocks (i.e. a block of coherent text about a subtopic) to re-rank a small set of documents initially retrieved by some existing retrieval function. The proposed approach can make full use of the intrinsic global manifold structure of the document blocks by propagating the ranking scores between the blocks on a weighted graph. First, the TextTiling algorithm and the VIPS algorithm are respectively employed to segment text documents and web pages into blocks. Then, each block is assigned a ranking score by the manifold-ranking algorithm. Lastly, a document gets its final ranking score by fusing the scores of its blocks. Experimental results on the TDT data and the ODP data demonstrate that the proposed approach can significantly improve retrieval performance over baseline approaches. The document block is validated to be a better unit than the whole document in the manifold-ranking process.
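    A sketch of the manifold-ranking step on the block graph, following the standard iteration f <- alpha*S*f + (1-alpha)*y, where W holds pairwise block similarities and y the initial query-block scores; the segmentation step (TextTiling/VIPS) and the final fusion of block scores into document scores are not shown:

    import numpy as np

    def manifold_ranking(W, y, alpha=0.85, iters=50):
        W = np.array(W, dtype=float)
        np.fill_diagonal(W, 0.0)                    # no self-loops
        d = W.sum(axis=1)
        d[d == 0] = 1.0
        D = np.diag(1.0 / np.sqrt(d))
        S = D @ W @ D                               # symmetrically normalized affinity
        y = np.asarray(y, dtype=float)
        f = np.zeros_like(y)
        for _ in range(iters):
            f = alpha * (S @ f) + (1 - alpha) * y   # propagate scores to neighbours
        return f                                    # ranking score per block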
  20. López-Herrera, A.G.; Herrera-Viedma, E.; Herrera, F.: ¬A study of the use of multi-objective evolutionary algorithms to learn Boolean queries : a comparative study (2009) 0.04
    0.044491146 = product of:
      0.08898229 = sum of:
        0.029222867 = weight(_text_:retrieval in 1751) [ClassicSimilarity], result of:
          0.029222867 = score(doc=1751,freq=2.0), product of:
            0.124912694 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.041294612 = queryNorm
            0.23394634 = fieldWeight in 1751, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1751)
        0.029945528 = weight(_text_:use in 1751) [ClassicSimilarity], result of:
          0.029945528 = score(doc=1751,freq=2.0), product of:
            0.12644777 = queryWeight, product of:
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.041294612 = queryNorm
            0.23682132 = fieldWeight in 1751, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0620887 = idf(docFreq=5623, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1751)
        0.022089208 = weight(_text_:of in 1751) [ClassicSimilarity], result of:
          0.022089208 = score(doc=1751,freq=16.0), product of:
            0.06457475 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.041294612 = queryNorm
            0.34207192 = fieldWeight in 1751, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1751)
        0.007724685 = product of:
          0.01544937 = sum of:
            0.01544937 = weight(_text_:on in 1751) [ClassicSimilarity], result of:
              0.01544937 = score(doc=1751,freq=2.0), product of:
                0.090823986 = queryWeight, product of:
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.041294612 = queryNorm
                0.17010231 = fieldWeight in 1751, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.199415 = idf(docFreq=13325, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1751)
          0.5 = coord(1/2)
      0.5 = coord(4/8)
    
    Abstract
    In this article, our interest is focused on the automatic learning of Boolean queries in information retrieval systems (IRSs) by means of multi-objective evolutionary algorithms considering the classic performance criteria, precision and recall. We present a comparative study of four well-known, general-purpose, multi-objective evolutionary algorithms to learn Boolean queries in IRSs. These evolutionary algorithms are the Nondominated Sorting Genetic Algorithm (NSGA-II), the first version of the Strength Pareto Evolutionary Algorithm (SPEA), the second version of SPEA (SPEA2), and the Multi-Objective Genetic Algorithm (MOGA).
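    A sketch of the two objectives being optimised, precision and recall of a candidate Boolean query over judged documents; matches() stands in for a hypothetical Boolean query evaluator and is not from the paper:

    def precision_recall(query, docs, relevant, matches):
        retrieved = {d for d in docs if matches(query, d)}
        if not retrieved:
            return 0.0, 0.0
        tp = len(retrieved & relevant)
        return tp / len(retrieved), tp / len(relevant)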
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.6, S.1192-1207

Languages

  • e 131
  • d 10
  • m 1
  • pt 1
  • sp 1

Types

  • a 130
  • m 8
  • el 3
  • x 3
  • s 2
  • r 1