Search (26 results, page 1 of 2)

  • × theme_ss:"Retrievalalgorithmen"
  • × year_i:[2000 TO 2010}
  • × type_ss:"a"
  1. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.09
    0.08519173 = product of:
      0.12778759 = sum of:
        0.10759281 = weight(_text_:systematic in 2419) [ClassicSimilarity], result of:
          0.10759281 = score(doc=2419,freq=2.0), product of:
            0.28397155 = queryWeight, product of:
              5.715473 = idf(docFreq=395, maxDocs=44218)
              0.049684696 = queryNorm
            0.3788859 = fieldWeight in 2419, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.715473 = idf(docFreq=395, maxDocs=44218)
              0.046875 = fieldNorm(doc=2419)
        0.02019477 = product of:
          0.04038954 = sum of:
            0.04038954 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
              0.04038954 = score(doc=2419,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.23214069 = fieldWeight in 2419, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2419)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring, and managing digital library objects it provides user-customisable information-seeking patterns over a federation of heterogeneous digital libraries. This paper presents evaluation results with respect to retrieval effectiveness, efficiency, and user satisfaction. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data-source selection through information seeking to the representation, organisation, and reuse of information. By embedding high-level search functionality into the scientific work-flow, the user experiences better strategic system support through a more systematic work process. These ideas have been implemented in Daffodil and followed by a qualitative evaluation, conducted with 28 participants ranging from information-seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
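    The explain tree above is raw Lucene ClassicSimilarity output: each term's weight is queryWeight * fieldWeight, with queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm, and coord() scales the sum by the fraction of query clauses that matched. A minimal Python sketch that recomposes the first score from the printed components (the helper name and structure are ours, not Lucene's):

      import math

      query_norm = 0.049684696  # printed queryNorm, shared by all terms

      def term_weight(freq, idf, field_norm):
          # weight(term) = queryWeight * fieldWeight
          tf = math.sqrt(freq)                  # 1.4142135 for freq=2.0
          query_weight = idf * query_norm       # idf * queryNorm
          field_weight = tf * idf * field_norm  # tf * idf * fieldNorm
          return query_weight * field_weight

      # idf itself is 1 + ln(maxDocs / (docFreq + 1)), e.g.
      # 1 + ln(44218 / 396) = 5.715473 for "systematic"
      w_systematic = term_weight(2.0, 5.715473, 0.046875)  # ~0.10759281
      w_22 = term_weight(2.0, 3.5018296, 0.046875) * 0.5   # inner coord(1/2)
      score = (w_systematic + w_22) * (2.0 / 3.0)          # coord(2/3)
      print(round(score, 8))                               # ~0.08519173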
  2. Kanaeva, Z.: Ranking: Google und CiteSeer (2005) 0.03
    0.03447506 = product of:
      0.103425175 = sum of:
        0.103425175 = sum of:
          0.05630404 = weight(_text_:indexing in 3276) [ClassicSimilarity], result of:
            0.05630404 = score(doc=3276,freq=2.0), product of:
              0.19018644 = queryWeight, product of:
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.049684696 = queryNorm
              0.29604656 = fieldWeight in 3276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3276)
          0.047121134 = weight(_text_:22 in 3276) [ClassicSimilarity], result of:
            0.047121134 = score(doc=3276,freq=2.0), product of:
              0.17398734 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049684696 = queryNorm
              0.2708308 = fieldWeight in 3276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3276)
      0.33333334 = coord(1/3)
    
    Abstract
    In classical information retrieval, various methods have been developed for ranking and for searching a homogeneous, unstructured document collection. The success of the Google search engine has shown that searching an inhomogeneous but interconnected document collection such as the Internet can be very effective when the links between documents are taken into account. Among the concepts realized by Google is a method for ranking search results (PageRank), which this article briefly explains. The article also discusses the concepts behind a system called CiteSeer, which automatically indexes bibliographic references (Autonomous Citation Indexing, ACI). CiteSeer turns a set of unconnected scientific documents into an interconnected document collection and thereby enables ranking methods based on those used by Google.
    Date
    20. 3.2005 16:23:22
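    The abstract names PageRank without spelling it out. As an illustration (the toy graph, damping factor d=0.85, and iteration count below are conventional textbook assumptions, not taken from the article), a minimal power-iteration sketch of the recurrence PR(p) = (1-d)/N + d * sum over in-links q of PR(q)/outdegree(q):

      def pagerank(links, d=0.85, iters=50):
          # links: dict node -> list of outgoing-link targets
          nodes = set(links) | {t for ts in links.values() for t in ts}
          n = len(nodes)
          pr = {p: 1.0 / n for p in nodes}
          for _ in range(iters):
              nxt = {p: (1.0 - d) / n for p in nodes}
              for q in nodes:
                  targets = links.get(q, [])
                  if targets:
                      for t in targets:
                          nxt[t] += d * pr[q] / len(targets)
                  else:  # dangling node: spread its mass uniformly
                      for t in nodes:
                          nxt[t] += d * pr[q] / n
              pr = nxt
          return pr

      print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))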
  3. Picard, J.; Savoy, J.: Enhancing retrieval with hyperlinks : a general model based on propositional argumentation systems (2003) 0.03
    0.029886894 = product of:
      0.08966068 = sum of:
        0.08966068 = weight(_text_:systematic in 1427) [ClassicSimilarity], result of:
          0.08966068 = score(doc=1427,freq=2.0), product of:
            0.28397155 = queryWeight, product of:
              5.715473 = idf(docFreq=395, maxDocs=44218)
              0.049684696 = queryNorm
            0.31573826 = fieldWeight in 1427, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.715473 = idf(docFreq=395, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1427)
      0.33333334 = coord(1/3)
    
    Abstract
    Fast, effective, and adaptable techniques are needed to automatically organize and retrieve information on the ever-increasing World Wide Web. In that respect, different strategies have been suggested to take hypertext links into account. For example, hyperlinks have been used to (1) enhance document representation, (2) improve document ranking by propagating document score, (3) provide an indicator of popularity, and (4) find hubs and authorities for a given topic. Although the TREC experiments have not demonstrated the usefulness of hyperlinks for retrieval, the hypertext structure is nevertheless an essential aspect of the Web, and as such, should not be ignored. The development of abstract models of the IR task was a key factor in the improvement of search engines. However, at this time conceptual tools for modeling the hypertext retrieval task are lacking, making it difficult to compare, improve, and reason about the existing techniques. This article proposes a general model for using hyperlinks, based on Probabilistic Argumentation Systems, in which each of the above-mentioned techniques can be stated. The model allows us to discover some inconsistencies in those techniques and to take a higher-level, systematic approach to using hyperlinks for retrieval.
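    As a hedged sketch of technique (2) above, one-step score propagation along hyperlinks (the propagation scheme and decay factor are illustrative assumptions of this sketch; the article's own formulation in Probabilistic Argumentation Systems is not reproduced here):

      def propagate_scores(content_score, links, decay=0.5):
          # content_score: dict doc -> baseline retrieval score
          # links: dict doc -> list of documents it links to
          final = dict(content_score)
          for src, targets in links.items():
              for t in targets:
                  # a linked document inherits part of the linker's score
                  final[t] = final.get(t, 0.0) + decay * content_score.get(src, 0.0)
          return final

      print(propagate_scores({"d1": 0.8, "d2": 0.3}, {"d1": ["d2"]}))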
  4. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the generation of partitioned inverted files (2005) 0.02
    0.016086869 = product of:
      0.048260607 = sum of:
        0.048260607 = product of:
          0.09652121 = sum of:
            0.09652121 = weight(_text_:indexing in 651) [ClassicSimilarity], result of:
              0.09652121 = score(doc=651,freq=8.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.5075084 = fieldWeight in 651, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=651)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - The generation of inverted indexes is one of the most computationally intensive activities for information retrieval systems: indexing large multi-gigabyte text databases can take many hours or even days to complete. We examine the generation of partitioned inverted files in order to speed up the process of indexing. Two types of index partitions are investigated: TermId and DocId. Design/methodology/approach - We use standard parallel-computing measures such as speedup and efficiency to examine the computing results, together with the space costs, of our trial indexing experiments. Findings - The results from runs on both partitioning methods are compared and contrasted, concluding that DocId is the more efficient method. Practical implications - The DocId partitioning method would in most circumstances be used for distributing inverted-file data in a parallel computer, particularly if indexing speed is the primary consideration. Originality/value - The paper is of value to database administrators who manage large-scale text collections and who need to use parallel computing to implement their text retrieval services.
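    A minimal sketch of the two partitioning schemes compared in the paper (the toy postings and the assignment rules are illustrative assumptions): with DocId, each partition holds all terms restricted to a slice of the document collection, so every partition can answer a query locally; with TermId, each partition holds the complete postings for a subset of terms.

      postings = {  # term -> sorted list of document ids
          "parallel": [1, 4, 7],
          "inverted": [2, 4],
          "file": [1, 2, 7],
      }

      def docid_partition(postings, k):
          # DocId: split by document id; every term appears in each partition
          return [{t: [d for d in ds if d % k == i] for t, ds in postings.items()}
                  for i in range(k)]

      def termid_partition(postings, k):
          # TermId: split by term; a term's full postings live in one partition
          parts = [{} for _ in range(k)]
          for t, ds in sorted(postings.items()):
              parts[hash(t) % k][t] = ds
          return parts

      print(docid_partition(postings, 2))
      print(termid_partition(postings, 2))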
  5. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.02
    0.015707046 = product of:
      0.047121134 = sum of:
        0.047121134 = product of:
          0.09424227 = sum of:
            0.09424227 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.09424227 = score(doc=3445,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    25. 8.2005 17:42:22
  6. Maron, M.E.: ¬An historical note on the origins of probabilistic indexing (2008) 0.02
    0.015166845 = product of:
      0.045500536 = sum of:
        0.045500536 = product of:
          0.09100107 = sum of:
            0.09100107 = weight(_text_:indexing in 2047) [ClassicSimilarity], result of:
              0.09100107 = score(doc=2047,freq=4.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.47848347 = fieldWeight in 2047, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2047)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The motivation behind "Probabilistic Indexing" was to replace two-valued thinking about information retrieval with probabilistic notions. This involved a new view of the information retrieval problem - viewing it as a problem of inference and prediction - and introduced probabilistically weighted indexes and probabilistically ranked output. These ideas were first formulated and written up in August 1958.
  7. Thompson, P.: Looking back: on relevance, probabilistic indexing and information retrieval (2008) 0.02
    0.015166845 = product of:
      0.045500536 = sum of:
        0.045500536 = product of:
          0.09100107 = sum of:
            0.09100107 = weight(_text_:indexing in 2074) [ClassicSimilarity], result of:
              0.09100107 = score(doc=2074,freq=4.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.47848347 = fieldWeight in 2074, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2074)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Forty-eight years ago Maron and Kuhns published their paper, "On Relevance, Probabilistic Indexing and Information Retrieval" (1960). This was the first paper to present a probabilistic approach to information retrieval, and perhaps the first paper on ranked retrieval. Although it is one of the most widely cited papers in the field of information retrieval, many researchers today may not be familiar with its influence. This paper describes the Maron and Kuhns article and the influence that it has had on the field of information retrieval.
  8. Efron, M.: Query expansion and dimensionality reduction : Notions of optimality in Rocchio relevance feedback and latent semantic indexing (2008) 0.01
    0.013931636 = product of:
      0.041794907 = sum of:
        0.041794907 = product of:
          0.083589815 = sum of:
            0.083589815 = weight(_text_:indexing in 2020) [ClassicSimilarity], result of:
              0.083589815 = score(doc=2020,freq=6.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.4395151 = fieldWeight in 2020, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2020)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Rocchio relevance feedback and latent semantic indexing (LSI) are well-known extensions of the vector space model for information retrieval (IR). This paper analyzes the statistical relationship between these extensions. The analysis focuses on each method's basis in least-squares optimization. Noting that LSI and Rocchio relevance feedback both alter the vector space model in a way that is in some sense least-squares optimal, we ask: what is the relationship between LSI's and Rocchio's notions of optimality? What does this relationship imply for IR? Using an analytical approach, we argue that Rocchio relevance feedback is optimal if we understand retrieval as a simplified classification problem. On the other hand, LSI's motivation comes to the fore if we understand it as a biased regression technique, where projection onto a low-dimensional orthogonal subspace of the documents reduces model variance.
    Object
    Latent semantic indexing
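    The Rocchio update the abstract analyzes moves the query vector toward relevant documents and away from non-relevant ones. A hedged sketch of the standard formulation (the alpha/beta/gamma values are conventional defaults, not taken from the paper):

      import numpy as np

      def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
          # query: 1-D term-weight vector; relevant/nonrelevant: lists of doc vectors
          q = alpha * np.asarray(query, dtype=float)
          if relevant:
              q += beta * np.mean(relevant, axis=0)
          if nonrelevant:
              q -= gamma * np.mean(nonrelevant, axis=0)
          return np.maximum(q, 0.0)  # negative weights are commonly clipped to zero

      print(rocchio([1.0, 0.0, 0.0], [[0.9, 0.8, 0.0]], [[0.0, 0.0, 0.7]]))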
  9. Liu, A.; Zou, Q.; Chu, W.W.: Configurable indexing and ranking for XML information retrieval (2004) 0.01
    0.0134057235 = product of:
      0.04021717 = sum of:
        0.04021717 = product of:
          0.08043434 = sum of:
            0.08043434 = weight(_text_:indexing in 4114) [ClassicSimilarity], result of:
              0.08043434 = score(doc=4114,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.42292362 = fieldWeight in 4114, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4114)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
  10. Lee, C.; Lee, G.G.: Probabilistic information retrieval model for a dependence structured indexing system (2005) 0.01
    0.013270989 = product of:
      0.039812967 = sum of:
        0.039812967 = product of:
          0.079625934 = sum of:
            0.079625934 = weight(_text_:indexing in 1004) [ClassicSimilarity], result of:
              0.079625934 = score(doc=1004,freq=4.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.41867304 = fieldWeight in 1004, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1004)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Most previous information retrieval (IR) models assume that the terms of queries and documents are statistically independent of each other. This independence assumption is widely understood to be wrong, however, so we present a new method of incorporating term dependence into a probabilistic retrieval model: we adapt a dependency-structured indexing system, using a dependency parse tree and the Chow Expansion, to compensate for the weakness of the assumption. In this paper we describe a theoretical process for applying the Chow Expansion to general probabilistic models and to the state-of-the-art 2-Poisson model. Through experiments on document collections in English and Korean, we demonstrate that incorporating term dependences using the Chow Expansion improves the performance of probabilistic IR systems.
  11. Hoenkamp, E.: Unitary operators on the document space (2003) 0.01
    0.011609698 = product of:
      0.03482909 = sum of:
        0.03482909 = product of:
          0.06965818 = sum of:
            0.06965818 = weight(_text_:indexing in 3457) [ClassicSimilarity], result of:
              0.06965818 = score(doc=3457,freq=6.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.3662626 = fieldWeight in 3457, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3457)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    When people search for documents, they ultimately want content, not words. Hence, search engines should relate documents more by their underlying concepts than by the words they contain. One promising technique for doing so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts that can represent the documents is far smaller than the great variety of words in the textual representation. Although this almost obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article accomplishes several things. First, it shows how the technique underlying LSI is just one example of a unitary operator, for which there are computationally more attractive alternatives. Second, it proposes the Haar transform as such an alternative, as it is memory efficient and can be computed in linear to sublinear time. Third, it generalizes LSI by a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational cost, it also opens a spectrum of possibilities for new research.
    Object
    Latent Semantic Indexing
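    A minimal sketch of the Haar step the article proposes as a cheap unitary alternative to the SVD behind LSI: pairwise scaled sums and differences, recursed on the sums, in linear time. The power-of-two vector length and the in-place formulation are illustrative assumptions of this sketch:

      import math

      def haar(vec):
          v = list(vec)
          n = len(v)
          s = 1.0 / math.sqrt(2.0)  # this scaling keeps the transform unitary
          while n > 1:
              half = n // 2
              sums = [(v[2 * i] + v[2 * i + 1]) * s for i in range(half)]
              difs = [(v[2 * i] - v[2 * i + 1]) * s for i in range(half)]
              v[:n] = sums + difs
              n = half
          return v

      print(haar([4.0, 2.0, 5.0, 5.0]))  # coarse "concept" coefficients come first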
  12. Baeza-Yates, R.; Navarro, G.: Block addressing indices for approximate text retrieval (2000) 0.01
    0.011375135 = product of:
      0.034125403 = sum of:
        0.034125403 = product of:
          0.068250805 = sum of:
            0.068250805 = weight(_text_:indexing in 4295) [ClassicSimilarity], result of:
              0.068250805 = score(doc=4295,freq=4.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.3588626 = fieldWeight in 4295, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4295)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The issue of reducing the space overhead when indexing large text databases is becoming more and more important as text collections grow in size. Another subject, which is gaining importance as text databases grow and get more heterogeneous and error prone, is that of flexible string matching. One of the best tools to make the search more flexible is to allow a limited number of differences between the words found and those sought. This is called 'approximate text searching', and it is becoming more and more popular. In recent years some indexing schemes with very low space overhead have appeared, some of them dealing with approximate searching. These low-overhead indices (whose best-known exponent is Glimpse) are modified inverted files, where space is saved by making the lists of occurrences point to text blocks instead of exact word positions. Despite their existence, little is known about the expected behaviour of these 'block addressing' indices, and even less is known when it comes to coping with approximate search. Our main contribution is an analytical study of the space-time trade-offs for indexed text searching.
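    A minimal sketch of the block-addressing idea described above: occurrence lists point to fixed-size text blocks rather than exact word positions, and matching blocks are re-scanned sequentially to confirm hits. The block size here is an illustrative assumption; systems such as Glimpse tune it against the space-time trade-off.

      def build_block_index(text, block_size=4):
          index = {}  # word -> set of block numbers
          for pos, word in enumerate(text.split()):
              index.setdefault(word.lower(), set()).add(pos // block_size)
          return index

      def search(text, index, word, block_size=4):
          words = text.split()
          hits = []
          for b in sorted(index.get(word.lower(), ())):
              start = b * block_size
              block = words[start:start + block_size]
              # the sequential re-scan of each block recovers exact positions
              hits += [start + i for i, w in enumerate(block) if w.lower() == word]
          return hits

      text = "block addressing indices trade space for a final scan of each block"
      idx = build_block_index(text)
      print(search(text, idx, "block"))  # [0, 11]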
  13. MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.01
    0.008975455 = product of:
      0.026926363 = sum of:
        0.026926363 = product of:
          0.053852726 = sum of:
            0.053852726 = weight(_text_:22 in 5108) [ClassicSimilarity], result of:
              0.053852726 = score(doc=5108,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.30952093 = fieldWeight in 5108, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5108)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    20. 1.2007 18:30:22
  14. Losada, D.E.; Barreiro, A.: Emebedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.01
    0.008975455 = product of:
      0.026926363 = sum of:
        0.026926363 = product of:
          0.053852726 = sum of:
            0.053852726 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
              0.053852726 = score(doc=1422,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.30952093 = fieldWeight in 1422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1422)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    22. 3.2003 19:27:23
  15. Crestani, F.; Dominich, S.; Lalmas, M.; Rijsbergen, C.J.K. van: Mathematical, logical, and formal methods in information retrieval : an introduction to the special issue (2003) 0.01
    0.0067315903 = product of:
      0.02019477 = sum of:
        0.02019477 = product of:
          0.04038954 = sum of:
            0.04038954 = weight(_text_:22 in 1451) [ClassicSimilarity], result of:
              0.04038954 = score(doc=1451,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.23214069 = fieldWeight in 1451, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1451)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    22. 3.2003 19:27:36
  16. Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: ¬The effects of fitness functions on genetic programming-based ranking discovery for Web search (2004) 0.01
    0.0067315903 = product of:
      0.02019477 = sum of:
        0.02019477 = product of:
          0.04038954 = sum of:
            0.04038954 = weight(_text_:22 in 2239) [ClassicSimilarity], result of:
              0.04038954 = score(doc=2239,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.23214069 = fieldWeight in 2239, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2239)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    31. 5.2004 19:22:06
  17. Furner, J.: ¬A unifying model of document relatedness for hybrid search engines (2003) 0.01
    0.0067315903 = product of:
      0.02019477 = sum of:
        0.02019477 = product of:
          0.04038954 = sum of:
            0.04038954 = weight(_text_:22 in 2717) [ClassicSimilarity], result of:
              0.04038954 = score(doc=2717,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.23214069 = fieldWeight in 2717, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2717)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    11. 9.2004 17:32:22
  18. Witschel, H.F.: Global term weights in distributed environments (2008) 0.01
    0.0067315903 = product of:
      0.02019477 = sum of:
        0.02019477 = product of:
          0.04038954 = sum of:
            0.04038954 = weight(_text_:22 in 2096) [ClassicSimilarity], result of:
              0.04038954 = score(doc=2096,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.23214069 = fieldWeight in 2096, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2096)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    1. 8.2008 9:44:22
  19. Campos, L.M. de; Fernández-Luna, J.M.; Huete, J.F.: Implementing relevance feedback in the Bayesian network retrieval model (2003) 0.01
    0.0067315903 = product of:
      0.02019477 = sum of:
        0.02019477 = product of:
          0.04038954 = sum of:
            0.04038954 = weight(_text_:22 in 825) [ClassicSimilarity], result of:
              0.04038954 = score(doc=825,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.23214069 = fieldWeight in 825, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=825)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    22. 3.2003 19:30:19
  20. Chen, H.; Lally, A.M.; Zhu, B.; Chau, M.: HelpfulMed : Intelligent searching for medical information over the Internet (2003) 0.01
    0.0067028617 = product of:
      0.020108584 = sum of:
        0.020108584 = product of:
          0.04021717 = sum of:
            0.04021717 = weight(_text_:indexing in 1615) [ClassicSimilarity], result of:
              0.04021717 = score(doc=1615,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.21146181 = fieldWeight in 1615, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1615)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Medical professionals and researchers need information from reputable sources to accomplish their work. Unfortunately, the Web contains a large number of documents that are irrelevant to their work, even documents that purport to be "medically related." This paper describes an architecture designed to integrate advanced searching and indexing algorithms, an automatic thesaurus, or "concept space," and Kohonen-based Self-Organizing Map (SOM) technologies to provide searchers with fine-grained results. Initial results indicate that these systems provide complementary retrieval functionalities. HelpfulMed not only allows users to search Web pages and other online databases, but also lets them build searches through the automatic thesaurus and browse a graphical display of medicine-related topics. Evaluation results for each of the different components are included. Our spidering algorithm outperformed both breadth-first-search and PageRank spiders on a test collection of 100,000 Web pages. The automatically generated thesaurus performed as well as both MeSH and UMLS, systems which require human mediation to stay current. Lastly, a variant of the Kohonen SOM was comparable to MeSH terms in perceived cluster precision and significantly better in perceived cluster recall.
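    As a hedged illustration of the co-occurrence "concept space" idea behind the automatic thesaurus described above (the similarity measure and cutoff are assumptions of this sketch; HelpfulMed's actual algorithms are not reproduced here):

      from collections import Counter
      from itertools import combinations

      def concept_space(docs, top_k=3):
          # count how often two terms co-occur in the same document
          pair_counts = Counter()
          for doc in docs:
              terms = sorted(set(doc.lower().split()))
              pair_counts.update(combinations(terms, 2))
          related = {}
          for (a, b), c in pair_counts.items():
              related.setdefault(a, []).append((c, b))
              related.setdefault(b, []).append((c, a))
          # for each term, keep its most frequently co-occurring neighbours
          return {t: [w for _, w in sorted(ns, reverse=True)[:top_k]]
                  for t, ns in related.items()}

      docs = ["aspirin reduces fever", "aspirin thins blood", "fever and infection"]
      print(concept_space(docs)["aspirin"])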