Search (125 results, page 1 of 7)

  • language_ss:"e"
  • theme_ss:"Retrievalalgorithmen"
  • year_i:[2000 TO 2010}
  1. MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.00
    0.0024238946 = product of:
      0.020603104 = sum of:
        0.0070723416 = weight(_text_:in in 5108) [ClassicSimilarity], result of:
          0.0070723416 = score(doc=5108,freq=6.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.2082456 = fieldWeight in 5108, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0625 = fieldNorm(doc=5108)
        0.013530762 = product of:
          0.027061524 = sum of:
            0.027061524 = weight(_text_:22 in 5108) [ClassicSimilarity], result of:
              0.027061524 = score(doc=5108,freq=2.0), product of:
                0.08743035 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.024967048 = queryNorm
                0.30952093 = fieldWeight in 5108, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5108)
          0.5 = coord(1/2)
      0.11764706 = coord(2/17)
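     The explain tree above is standard Lucene ClassicSimilarity output: each matching term contributes tf * idf * fieldNorm scaled by the query weight, and the partial scores are combined with the coord factors. As a minimal sketch (my own illustration, not part of the catalogue), the arithmetic for result no. 1 (doc 5108) can be reproduced from the numbers shown:

       import math

       # tf = sqrt(freq); idf = 1 + ln(maxDocs / (docFreq + 1))
       # per-term score = (idf * queryNorm) * (tf * idf * fieldNorm)
       def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
           tf = math.sqrt(freq)                             # 2.4494898 for freq = 6
           idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 1.3602545 for "in"
           query_weight = idf * query_norm                  # 0.033961542
           field_weight = tf * idf * field_norm             # 0.2082456
           return query_weight * field_weight               # 0.0070723416

       w_in = term_score(6.0, 30841, 44218, 0.024967048, 0.0625)   # term "in"
       w_22 = term_score(2.0, 3622, 44218, 0.024967048, 0.0625)    # term "22"

       # "22" sits in a two-clause sub-query of which one clause matched: coord(1/2).
       # The document matched 2 of 17 top-level query clauses: coord(2/17).
       doc_score = (w_in + 0.5 * w_22) * (2.0 / 17.0)
       print(round(doc_score, 10))   # ~0.0024238946, the score shown above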
    
    Abstract
     In this paper, methods both for speeding up passage processing and for examining more passages using parallel computers are explored. The number of passages processed is varied in order to examine the effect on retrieval effectiveness and efficiency. The particular algorithm applied has previously been used to good effect in Okapi experiments at TREC. This algorithm and the mechanism for applying parallel computing to speed up processing are described.
    Date
    20. 1.2007 18:30:22
  2. Losada, D.E.; Barreiro, A.: Embedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.00
    0.002072233 = product of:
      0.01761398 = sum of:
        0.004083218 = weight(_text_:in in 1422) [ClassicSimilarity], result of:
          0.004083218 = score(doc=1422,freq=2.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.120230645 = fieldWeight in 1422, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0625 = fieldNorm(doc=1422)
        0.013530762 = product of:
          0.027061524 = sum of:
            0.027061524 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
              0.027061524 = score(doc=1422,freq=2.0), product of:
                0.08743035 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.024967048 = queryNorm
                0.30952093 = fieldWeight in 1422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1422)
          0.5 = coord(1/2)
      0.11764706 = coord(2/17)
    
    Date
    22. 3.2003 19:27:23
    Footnote
     Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
  3. Witschel, H.F.: Global term weights in distributed environments (2008) 0.00
    0.0019995102 = product of:
      0.016995836 = sum of:
        0.0068477658 = weight(_text_:in in 2096) [ClassicSimilarity], result of:
          0.0068477658 = score(doc=2096,freq=10.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.20163295 = fieldWeight in 2096, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=2096)
        0.010148071 = product of:
          0.020296142 = sum of:
            0.020296142 = weight(_text_:22 in 2096) [ClassicSimilarity], result of:
              0.020296142 = score(doc=2096,freq=2.0), product of:
                0.08743035 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.024967048 = queryNorm
                0.23214069 = fieldWeight in 2096, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2096)
          0.5 = coord(1/2)
      0.11764706 = coord(2/17)
    
    Abstract
    This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view on the collection is not available. In particular, the two options of either sampling documents or of using a reference corpus independent of the target retrieval collection are compared using standard IR test collections. In addition, the possibility of pruning term lists based on frequency is evaluated. The results show that very good retrieval performance can be reached when just the most frequent terms of a collection - an "extended stop word list" - are known and all terms which are not in that list are treated equally. However, the list cannot always be fully estimated from a general-purpose reference corpus, but some "domain-specific stop words" need to be added. A good solution for achieving this is to mix estimates from small samples of the target retrieval collection with ones derived from a reference corpus.
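     As a hedged sketch of the estimation idea just described (the function name, the 50/50 mixing weight and the toy corpora below are my own assumptions, not values from the paper), global IDF could be approximated by mixing document frequencies from a small sample of the target collection with those from a reference corpus, while every term outside the resulting "extended stop word list" receives the same flat weight:

       import math
       from collections import Counter

       def estimated_idf(term, sample_df, sample_size, reference_df, reference_size,
                         mix=0.5, default_idf=10.0):
           # relative document frequencies from the sample and from the reference corpus
           rel_sample = sample_df.get(term, 0) / max(sample_size, 1)
           rel_reference = reference_df.get(term, 0) / max(reference_size, 1)
           rel = mix * rel_sample + (1.0 - mix) * rel_reference
           if rel == 0.0:
               return default_idf        # unknown term: treated like all other rare terms
           return math.log(1.0 / rel)

       sample_df = Counter({"the": 95, "retrieval": 40})     # from 100 sampled documents
       reference_df = Counter({"the": 9000, "of": 8500})     # from a 10,000-document reference corpus
       print(estimated_idf("the", sample_df, 100, reference_df, 10000))    # low weight, "stop word"
       print(estimated_idf("zebra", sample_df, 100, reference_df, 10000))  # flat default weight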
    Date
    1. 8.2008 9:44:22
  4. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.00
    0.0019995102 = product of:
      0.016995836 = sum of:
        0.0068477658 = weight(_text_:in in 2419) [ClassicSimilarity], result of:
          0.0068477658 = score(doc=2419,freq=10.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.20163295 = fieldWeight in 2419, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=2419)
        0.010148071 = product of:
          0.020296142 = sum of:
            0.020296142 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
              0.020296142 = score(doc=2419,freq=2.0), product of:
                0.08743035 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.024967048 = queryNorm
                0.23214069 = fieldWeight in 2419, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2419)
          0.5 = coord(1/2)
      0.11764706 = coord(2/17)
    
    Abstract
     The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper, evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection through information seeking to the representation, organisation and reuse of information. By embedding high-level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil and followed by a qualitative evaluation. The evaluation was conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
    Series
    Lecture notes in computer science; vol.3232
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  5. Campos, L.M. de; Fernández-Luna, J.M.; Huete, J.F.: Implementing relevance feedback in the Bayesian network retrieval model (2003) 0.00
    0.0019995102 = product of:
      0.016995836 = sum of:
        0.0068477658 = weight(_text_:in in 825) [ClassicSimilarity], result of:
          0.0068477658 = score(doc=825,freq=10.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.20163295 = fieldWeight in 825, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=825)
        0.010148071 = product of:
          0.020296142 = sum of:
            0.020296142 = weight(_text_:22 in 825) [ClassicSimilarity], result of:
              0.020296142 = score(doc=825,freq=2.0), product of:
                0.08743035 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.024967048 = queryNorm
                0.23214069 = fieldWeight in 825, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=825)
          0.5 = coord(1/2)
      0.11764706 = coord(2/17)
    
    Abstract
     Relevance feedback consists of automatically formulating a new query according to the relevance judgments provided by the user after evaluating a set of retrieved documents. In this article, we introduce several relevance feedback methods for the Bayesian Network Retrieval Model. The theoretical framework on which our methods are based uses the concept of partial evidences, which summarize the new pieces of information gathered after evaluating the results obtained by the original query. These partial evidences are inserted into the underlying Bayesian network and a new inference process (probability propagation) is run to compute the posterior relevance probabilities of the documents in the collection given the new query. The quality of the proposed methods is tested in preliminary experiments with different standard document collections.
    Date
    22. 3.2003 19:30:19
    Footnote
     Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
  6. Crestani, F.; Dominich, S.; Lalmas, M.; Rijsbergen, C.J.K. van: Mathematical, logical, and formal methods in information retrieval : an introduction to the special issue (2003) 0.00
    0.0019144587 = product of:
      0.016272899 = sum of:
        0.006124827 = weight(_text_:in in 1451) [ClassicSimilarity], result of:
          0.006124827 = score(doc=1451,freq=8.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.18034597 = fieldWeight in 1451, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1451)
        0.010148071 = product of:
          0.020296142 = sum of:
            0.020296142 = weight(_text_:22 in 1451) [ClassicSimilarity], result of:
              0.020296142 = score(doc=1451,freq=2.0), product of:
                0.08743035 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.024967048 = queryNorm
                0.23214069 = fieldWeight in 1451, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1451)
          0.5 = coord(1/2)
      0.11764706 = coord(2/17)
    
    Abstract
     Research on the use of mathematical, logical, and formal methods has been central to Information Retrieval research for a long time. Research in this area is important not only because it helps enhance retrieval effectiveness, but also because it helps clarify the underlying concepts of Information Retrieval. In this article we outline some of the major aspects of the subject, and summarize the papers of this special issue with respect to how they relate to these aspects. We conclude by highlighting some directions of future research, which are needed to better understand the formal characteristics of Information Retrieval.
    Date
    22. 3.2003 19:27:36
    Footnote
     Introduction to the contributions of a special issue: Mathematical, logical, and formal methods in information retrieval
  7. Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: ¬The effects of fitness functions on genetic programming-based ranking discovery for Web search (2004) 0.00
    0.0019144587 = product of:
      0.016272899 = sum of:
        0.006124827 = weight(_text_:in in 2239) [ClassicSimilarity], result of:
          0.006124827 = score(doc=2239,freq=8.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.18034597 = fieldWeight in 2239, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=2239)
        0.010148071 = product of:
          0.020296142 = sum of:
            0.020296142 = weight(_text_:22 in 2239) [ClassicSimilarity], result of:
              0.020296142 = score(doc=2239,freq=2.0), product of:
                0.08743035 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.024967048 = queryNorm
                0.23214069 = fieldWeight in 2239, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2239)
          0.5 = coord(1/2)
      0.11764706 = coord(2/17)
    
    Abstract
     Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task - discovery of ranking functions for Web search - and has achieved very promising results. However, in our prior research, only one fitness function has been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs on GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
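     As an illustrative aside (this is not one of the fitness functions evaluated in the paper, only a common example of the kind of measure meant), a fitness function for evolved ranking functions can simply score how well a candidate ranking places the known relevant documents, e.g. average precision:

       # Average precision of a ranking: a typical retrieval-effectiveness measure
       # that could serve as the fitness of a GP-evolved ranking function.
       def average_precision(ranked_doc_ids, relevant_ids):
           hits, precisions = 0, []
           for rank, doc_id in enumerate(ranked_doc_ids, start=1):
               if doc_id in relevant_ids:
                   hits += 1
                   precisions.append(hits / rank)
           return sum(precisions) / max(len(relevant_ids), 1)

       print(average_precision(["d3", "d1", "d7", "d2"], {"d1", "d2"}))  # 0.5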
    Date
    31. 5.2004 19:22:06
  8. Furner, J.: ¬A unifying model of document relatedness for hybrid search engines (2003) 0.00
    0.0019144587 = product of:
      0.016272899 = sum of:
        0.006124827 = weight(_text_:in in 2717) [ClassicSimilarity], result of:
          0.006124827 = score(doc=2717,freq=8.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.18034597 = fieldWeight in 2717, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=2717)
        0.010148071 = product of:
          0.020296142 = sum of:
            0.020296142 = weight(_text_:22 in 2717) [ClassicSimilarity], result of:
              0.020296142 = score(doc=2717,freq=2.0), product of:
                0.08743035 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.024967048 = queryNorm
                0.23214069 = fieldWeight in 2717, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2717)
          0.5 = coord(1/2)
      0.11764706 = coord(2/17)
    
    Abstract
     Previous work on search-engine design has indicated that information-seekers may benefit from being given the opportunity to exploit multiple sources of evidence of document relatedness. Few existing systems, however, give users more than minimal control over the selections that may be made among methods of exploitation. By applying the methods of "document network analysis" (DNA), a unifying, graph-theoretic model of content-, collaboration-, and context-based systems (CCC) may be developed in which the nature of the similarities between types of document relatedness and document ranking is clarified. The usefulness of the approach to system design suggested by this model may be tested by constructing and evaluating a prototype system (UCXtra) that allows searchers to maintain control over the multiple ways in which document collections may be ranked and re-ranked.
    Date
    11. 9.2004 17:32:22
    Series
    Advances in knowledge organization; vol.8
    Source
    Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas
  9. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.00
    0.0017303355 = product of:
      0.014707852 = sum of:
        0.0062511256 = weight(_text_:in in 1428) [ClassicSimilarity], result of:
          0.0062511256 = score(doc=1428,freq=12.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.18406484 = fieldWeight in 1428, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.008456727 = product of:
          0.016913453 = sum of:
            0.016913453 = weight(_text_:22 in 1428) [ClassicSimilarity], result of:
              0.016913453 = score(doc=1428,freq=2.0), product of:
                0.08743035 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.024967048 = queryNorm
                0.19345059 = fieldWeight in 1428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1428)
          0.5 = coord(1/2)
      0.11764706 = coord(2/17)
    
    Abstract
     Humans can make hasty, but generally robust, judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space, information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high dimensional semantic space, which is automatically constructed from a text corpus. Two approaches are presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
    Date
    22. 3.2003 19:35:46
    Footnote
     Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  10. Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.00
    0.0016662586 = product of:
      0.014163198 = sum of:
        0.005706471 = weight(_text_:in in 56) [ClassicSimilarity], result of:
          0.005706471 = score(doc=56,freq=10.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.16802745 = fieldWeight in 56, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=56)
        0.008456727 = product of:
          0.016913453 = sum of:
            0.016913453 = weight(_text_:22 in 56) [ClassicSimilarity], result of:
              0.016913453 = score(doc=56,freq=2.0), product of:
                0.08743035 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.024967048 = queryNorm
                0.19345059 = fieldWeight in 56, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=56)
          0.5 = coord(1/2)
      0.11764706 = coord(2/17)
    
    Abstract
    The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.
    Date
    22. 7.2006 16:32:43
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  11. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.00
    0.0013928724 = product of:
      0.023678832 = sum of:
        0.023678832 = product of:
          0.047357664 = sum of:
            0.047357664 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.047357664 = score(doc=3445,freq=2.0), product of:
                0.08743035 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.024967048 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.5 = coord(1/2)
      0.05882353 = coord(1/17)
    
    Date
    25. 8.2005 17:42:22
  12. Dominich, S.: Mathematical foundations of information retrieval (2001) 0.00
    0.0012951456 = product of:
      0.011008738 = sum of:
        0.0025520115 = weight(_text_:in in 1753) [ClassicSimilarity], result of:
          0.0025520115 = score(doc=1753,freq=2.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.07514416 = fieldWeight in 1753, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1753)
        0.008456727 = product of:
          0.016913453 = sum of:
            0.016913453 = weight(_text_:22 in 1753) [ClassicSimilarity], result of:
              0.016913453 = score(doc=1753,freq=2.0), product of:
                0.08743035 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.024967048 = queryNorm
                0.19345059 = fieldWeight in 1753, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1753)
          0.5 = coord(1/2)
      0.11764706 = coord(2/17)
    
    Abstract
    This book offers a comprehensive and consistent mathematical approach to information retrieval (IR) without which no implementation is possible, and sheds an entirely new light upon the structure of IR models. It contains the descriptions of all IR models in a unified formal style and language, along with examples for each, thus offering a comprehensive overview of them. The book also creates mathematical foundations and a consistent mathematical theory (including all mathematical results achieved so far) of IR as a stand-alone mathematical discipline, which thus can be read and taught independently. Also, the book contains all necessary mathematical knowledge on which IR relies, to help the reader avoid searching different sources. The book will be of interest to computer or information scientists, librarians, mathematicians, undergraduate students and researchers whose work involves information retrieval.
    Date
    22. 3.2008 12:26:32
  13. Khoo, C.S.G.; Wan, K.-W.: ¬A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.00
    0.0012908744 = product of:
      0.010972433 = sum of:
        0.005052725 = weight(_text_:in in 2509) [ClassicSimilarity], result of:
          0.005052725 = score(doc=2509,freq=16.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.14877784 = fieldWeight in 2509, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2509)
        0.005919708 = product of:
          0.011839416 = sum of:
            0.011839416 = weight(_text_:22 in 2509) [ClassicSimilarity], result of:
              0.011839416 = score(doc=2509,freq=2.0), product of:
                0.08743035 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.024967048 = queryNorm
                0.1354154 = fieldWeight in 2509, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=2509)
          0.5 = coord(1/2)
      0.11764706 = coord(2/17)
    
    Abstract
     A relevancy-ranking algorithm for a natural language interface to Boolean online public access catalogs (OPACs) was formulated and compared with that currently used in a knowledge-based search interface called the E-Referencer, being developed by the authors. The algorithm makes use of seven well-known ranking criteria: breadth of match, section weighting, proximity of query words, variant word forms (stemming), document frequency, term frequency and document length. The algorithm converts a natural language query into a series of increasingly broader Boolean search statements. In a small experiment with ten subjects in which the algorithm was simulated by hand, the algorithm obtained good results with a mean overall precision of 0.42 and a mean average precision of 0.62, representing a 27 percent improvement in precision and a 41 percent improvement in average precision compared to the E-Referencer. The usefulness of each step in the algorithm was analyzed and suggestions are made for improving the algorithm.
    Content
    "Most Web search engines accept natural language queries, perform some kind of fuzzy matching and produce ranked output, displaying first the documents that are most likely to be relevant. On the other hand, most library online public access catalogs (OPACs) an the Web are still Boolean retrieval systems that perform exact matching, and require users to express their search requests precisely in a Boolean search language and to refine their search statements to improve the search results. It is well-documented that users have difficulty searching Boolean OPACs effectively (e.g. Borgman, 1996; Ensor, 1992; Wallace, 1993). One approach to making OPACs easier to use is to develop a natural language search interface that acts as a middleware between the user's Web browser and the OPAC system. The search interface can accept a natural language query from the user and reformulate it as a series of Boolean search statements that are then submitted to the OPAC. The records retrieved by the OPAC are ranked by the search interface before forwarding them to the user's Web browser. The user, then, does not need to interact directly with the Boolean OPAC but with the natural language search interface or search intermediary. The search interface interacts with the OPAC system an the user's behalf. The advantage of this approach is that no modification to the OPAC or library system is required. Furthermore, the search interface can access multiple OPACs, acting as a meta search engine, and integrate search results from various OPACs before sending them to the user. The search interface needs to incorporate a method for converting the user's natural language query into a series of Boolean search statements, and for ranking the OPAC records retrieved. The purpose of this study was to develop a relevancyranking algorithm for a search interface to Boolean OPAC systems. This is part of an on-going effort to develop a knowledge-based search interface to OPACs called the E-Referencer (Khoo et al., 1998, 1999; Poo et al., 2000). E-Referencer v. 2 that has been implemented applies a repertoire of initial search strategies and reformulation strategies to retrieve records from OPACs using the Z39.50 protocol, and also assists users in mapping query keywords to the Library of Congress subject headings."
    Source
    Electronic library. 22(2004) no.2, S.112-120
  14. Weller, K.; Stock, W.G.: Transitive meronymy : automatic concept-based query expansion using weighted transitive part-whole relations (2008) 0.00
    7.890776E-4 = product of:
      0.01341432 = sum of:
        0.01341432 = weight(_text_:und in 1835) [ClassicSimilarity], result of:
          0.01341432 = score(doc=1835,freq=4.0), product of:
            0.055336144 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.024967048 = queryNorm
            0.24241515 = fieldWeight in 1835, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1835)
      0.05882353 = coord(1/17)
    
    Abstract
     Transitive meronymy: automatic concept-based query expansion using weighted transitive part-whole relations. Our theoretically oriented work isolates transitive part-whole relations. We discuss the use of meronymy for automatic concept-based query expansion in information retrieval. For practical reasons we propose specifying the different kinds of part-whole relation and assigning them distinct weighting values, which are then used in retrieval. For the design of knowledge organization systems it is significant that, within the conceptual hierarchy of an abstraction relation, a concept passes all of its parts (as well as all transitive parts of those parts) on to its narrower concepts.
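     As a hedged toy sketch of weighted transitive part-whole expansion (the relation weights, the threshold and the example concepts are my own assumptions, not the authors' data), the weight of a transitive part can be taken as the product of the weights along the part-whole chain:

       # Expand a query concept with its parts; a transitive part inherits the
       # product of the relation weights along the chain, pruned by a threshold.
       parts = {
           "car": [("engine", 0.8), ("wheel", 0.6)],
           "engine": [("piston", 0.7)],
       }

       def expand(concept, weight=1.0, threshold=0.3):
           expansions = {}
           for part, w in parts.get(concept, []):
               total = weight * w
               if total >= threshold:
                   expansions[part] = total
                   expansions.update(expand(part, total, threshold))
           return expansions

       print(expand("car"))   # {'engine': 0.8, 'piston': 0.56, 'wheel': 0.6} (up to float rounding)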
    Source
    Information - Wissenschaft und Praxis. 59(2008) H.3, S.165-170
  15. Klein, S.T.: On the use of negation in Boolean IR queries. (2009) 0.00
    6.304969E-4 = product of:
      0.010718447 = sum of:
        0.010718447 = weight(_text_:in in 3927) [ClassicSimilarity], result of:
          0.010718447 = score(doc=3927,freq=18.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.31560543 = fieldWeight in 3927, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3927)
      0.05882353 = coord(1/17)
    
    Abstract
     The negation operator is investigated in the various forms in which it appears in Information Retrieval queries. The applications include negated terms in Boolean queries, more specifically in the presence of metrical constraints, but also negated characters used in the definition of extended keywords by means of regular expressions. Exact definitions are suggested and their usefulness is shown on several examples. Finally, some implementation issues are discussed, in particular the order in which the terms of long queries, with or without negated keywords, should be processed, and efficient heuristics for choosing a good order are suggested.
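     A toy illustration (my own example, not taken from the paper) of the two flavours of negation mentioned above - a negated term in a Boolean query and a negated character class inside an extended keyword:

       import re

       docs = ["parallel computing for passage retrieval",
               "boolean model of information retrieval",
               "image retrieval on the web"]

       # 1) Negated term in a Boolean query: retrieval AND NOT boolean
       print([d for d in docs if "retrieval" in d and "boolean" not in d])

       # 2) Negated character class in an extended keyword: "[^p]assage" matches
       #    "assage" preceded by any character other than 'p', so the plain word
       #    "passage" is deliberately excluded.
       pattern = re.compile(r"[^p]assage")
       print([d for d in docs if pattern.search(d)])   # -> [] for this collection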
  16. Na, S.-H.; Kang, I.-S.; Roh, J.-E.; Lee, J.-H.: ¬An empirical study of query expansion and cluster-based retrieval in language modeling approach (2007) 0.00
    5.944382E-4 = product of:
      0.01010545 = sum of:
        0.01010545 = weight(_text_:in in 906) [ClassicSimilarity], result of:
          0.01010545 = score(doc=906,freq=16.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.29755569 = fieldWeight in 906, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=906)
      0.05882353 = coord(1/17)
    
    Abstract
     Term mismatch is a critical problem in information retrieval, and several techniques have been developed to resolve it, such as query expansion, cluster-based retrieval and dimensionality reduction. Of these techniques, this paper performs an empirical study on query expansion and cluster-based retrieval. We examine the effect of using parsimony in query expansion and the effect of clustering algorithms in cluster-based retrieval. In addition, query expansion and cluster-based retrieval are compared, and their combinations are evaluated in terms of retrieval performance by performing experiments on seven test collections of NTCIR and TREC.
    Footnote
     Contribution to: Special issue on AIRS2005: Information Retrieval Research in Asia
  17. Quiroga, L.M.; Mostafa, J.: ¬An experiment in building profiles in information filtering : the role of context of user relevance feedback (2002) 0.00
    5.2002515E-4 = product of:
      0.008840428 = sum of:
        0.008840428 = weight(_text_:in in 2579) [ClassicSimilarity], result of:
          0.008840428 = score(doc=2579,freq=24.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.260307 = fieldWeight in 2579, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2579)
      0.05882353 = coord(1/17)
    
    Abstract
     An experiment was conducted to see how relevance feedback could be used to build and adjust profiles to improve the performance of filtering systems. Data was collected during the system interaction of 18 graduate students with SIFTER (Smart Information Filtering Technology for Electronic Resources), a filtering system that ranks incoming information based on users' profiles. The data set came from a collection of 6000 records concerning consumer health. In the first phase of the study, three different modes of profile acquisition were compared. The explicit mode allowed users to directly specify the profile; the implicit mode utilized relevance feedback to create and refine the profile; and the combined mode allowed users to initialize the profile and to continuously refine it using relevance feedback. Filtering performance, measured in terms of Normalized Precision, showed that the three approaches were significantly different (α = 0.05 and p = 0.012). The explicit mode of profile acquisition consistently produced superior results. Exclusive reliance on relevance feedback in the implicit mode resulted in inferior performance. The low performance obtained by the implicit acquisition mode motivated the second phase of the study, which aimed to clarify the role of context in relevance feedback judgments. An inductive content analysis of thinking aloud protocols showed dimensions that were highly situational, establishing the importance context plays in feedback relevance assessments. Results suggest the need for better representation of documents, profiles, and relevance feedback mechanisms that incorporate dimensions identified in this research.
    Footnote
     Contribution to a special issue: "Issues of context in information retrieval (IR)"
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  18. Sparck Jones, K.: IDF term weighting and IR research lessons (2004) 0.00
    5.2002515E-4 = product of:
      0.008840428 = sum of:
        0.008840428 = weight(_text_:in in 4422) [ClassicSimilarity], result of:
          0.008840428 = score(doc=4422,freq=6.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.260307 = fieldWeight in 4422, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.078125 = fieldNorm(doc=4422)
      0.05882353 = coord(1/17)
    
    Abstract
    Robertson comments on the theoretical status of IDF term weighting. Its history illustrates how ideas develop in a specific research context, in theory/experiment interaction, and in operational practice.
  19. Abdelali, A.; Cowie, J.; Soliman, H.S.: Improving query precision using semantic expansion (2007) 0.00
    5.147986E-4 = product of:
      0.008751577 = sum of:
        0.008751577 = weight(_text_:in in 917) [ClassicSimilarity], result of:
          0.008751577 = score(doc=917,freq=12.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.2576908 = fieldWeight in 917, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=917)
      0.05882353 = coord(1/17)
    
    Abstract
     Query Expansion (QE) is one of the most important mechanisms in the information retrieval field. A typical short Internet query will go through a process of refinement to improve its retrieval power. Most existing QE techniques suffer from retrieval performance degradation due to an imprecise choice of the query's additional terms in the QE process. In this paper, we introduce a novel automated QE mechanism. The new expansion process is guided by the semantic relations between the original query and the expanding words, in the context of the utilized corpus. Experimental results of our "controlled" query expansion, using the Arabic TREC-10 data, show a significant enhancement of recall and precision over existing mechanisms in the field.
    Footnote
     Contribution to: Special issue on Heterogeneous and Distributed IR
  20. Computational information retrieval (2001) 0.00
    5.095185E-4 = product of:
      0.008661814 = sum of:
        0.008661814 = weight(_text_:in in 4167) [ClassicSimilarity], result of:
          0.008661814 = score(doc=4167,freq=16.0), product of:
            0.033961542 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.024967048 = queryNorm
            0.25504774 = fieldWeight in 4167, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=4167)
      0.05882353 = coord(1/17)
    
    Abstract
     This volume contains selected papers that focus on the use of linear algebra, computational statistics, and computer science in the development of algorithms and software systems for text retrieval. Experts in information modeling and retrieval share their perspectives on the design of scalable but precise text retrieval systems, revealing many of the challenges and obstacles that mathematical and statistical models must overcome to be viable for automated text processing. This very useful proceedings volume is an excellent companion for courses in information retrieval, applied linear algebra, and applied statistics. Computational Information Retrieval provides background material on vector space models for text retrieval that applied mathematicians, statisticians, and computer scientists may not be familiar with. For graduate students in these areas, several research questions in information modeling are exposed. In addition, several case studies concerning the efficacy of the popular Latent Semantic Analysis (or Indexing) approach are provided.
    Source
    Workshop held in October 2000 in Raleigh, North Carolina

Authors

Types

  • a 117
  • m 6
  • el 3
  • s 1