Search (13 results, page 1 of 1)

  • year_i:[1990 TO 2000}
  • theme_ss:"Retrievalalgorithmen"
  1. Couvreur, T.R.; Benzel, R.N.; Miller, S.F.; Zeitler, D.N.; Lee, D.L.; Singhal, M.; Shivaratri, N.; Wong, W.Y.P.: ¬An analysis of performance and cost factors in searching large text databases using parallel search systems (1994) 0.01
    0.0131395515 = product of:
      0.10511641 = sum of:
        0.10511641 = weight(_text_:supported in 7657) [ClassicSimilarity], result of:
          0.10511641 = score(doc=7657,freq=2.0), product of:
            0.22949564 = queryWeight, product of:
              5.9223356 = idf(docFreq=321, maxDocs=44218)
              0.03875087 = queryNorm
            0.45803228 = fieldWeight in 7657, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.9223356 = idf(docFreq=321, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7657)
      0.125 = coord(1/8)
    
    Abstract
    The results of modelling the performance of searching large text databases (>10 GBytes) via various parallel hardware architectures and search algorithms are discussed. The performance under load and the cost of each configuration are compared. Strengths, weaknesses, performance sensitivities, and search features supported for each configuration are also addressed. In addition, a common search workload used in the modelling is described. The search workload is derived from a set of searches run against the Chemical Abstracts file of bibliographic and abstract text available on STN International. This common workload is applied to all configurations modelled to provide a common basis of comparison
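    The score breakdown shown for this and the following entries follows the structure of Lucene's ClassicSimilarity (TF-IDF) explanation. As a minimal sketch, using only the values printed in the explain tree above, the figures for this entry can be reproduced as follows:
    ```python
    import math

    # Values copied from the explain tree for doc 7657, term "supported".
    freq = 2.0                # termFreq in the field
    idf = 5.9223356           # idf = ln(44218 / (321 + 1)) + 1
    query_norm = 0.03875087   # queryNorm
    field_norm = 0.0546875    # fieldNorm(doc=7657)
    coord = 1 / 8             # coord(1/8): 1 of 8 query clauses matched

    tf = math.sqrt(freq)                  # 1.4142135
    field_weight = tf * idf * field_norm  # 0.45803228
    query_weight = idf * query_norm       # 0.22949564
    weight = field_weight * query_weight  # 0.10511641
    print(weight * coord)                 # ~0.0131395515, the score of entry 1
    ```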
  2. Lee, D.L.; Ren, L.: Document ranking on weight-partitioned signature files (1996) 0.01
    0.0131395515 = product of:
      0.10511641 = sum of:
        0.10511641 = weight(_text_:supported in 2417) [ClassicSimilarity], result of:
          0.10511641 = score(doc=2417,freq=2.0), product of:
            0.22949564 = queryWeight, product of:
              5.9223356 = idf(docFreq=321, maxDocs=44218)
              0.03875087 = queryNorm
            0.45803228 = fieldWeight in 2417, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.9223356 = idf(docFreq=321, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2417)
      0.125 = coord(1/8)
    
    Abstract
    Proposes the weight-partitioned signature file, a signature file organization for supporting document ranking. It uses multiple signature files, each corresponding to one term frequency, to represent terms with different term frequencies. Words with the same term frequency in a document are grouped together and hashed into the signature file corresponding to that term frequency. Investigates the effect of false drops on retrieval effectiveness. Analyses the performance of the weight-partitioned signature file under different search strategies and configurations. Obtains an optimal formula for storage allocation to minimise the effect of false drops on document ranks. Analytical results are supported by experiments on document collections
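    As a rough sketch of the organization described in this abstract (illustrative only, not the authors' implementation): words are grouped by their in-document term frequency, each group is hashed into the signature for that frequency partition, and a query term is matched against each partition's signature.
    ```python
    import hashlib
    from collections import Counter, defaultdict

    SIG_BITS = 64    # signature width (assumed for illustration)
    NUM_HASHES = 3   # bits set per word (assumed for illustration)

    def word_signature(word: str) -> int:
        """Superimposed coding: set NUM_HASHES bits derived from the word."""
        sig = 0
        for i in range(NUM_HASHES):
            digest = hashlib.md5(f"{word}:{i}".encode()).hexdigest()
            sig |= 1 << (int(digest, 16) % SIG_BITS)
        return sig

    def weight_partitioned_signatures(doc_words):
        """One superimposed signature per term-frequency partition of a document."""
        partitions = defaultdict(int)
        for word, tf in Counter(doc_words).items():
            partitions[tf] |= word_signature(word)
        return dict(partitions)

    sigs = weight_partitioned_signatures("ranking with signature files ranking files".split())
    q = word_signature("ranking")
    # A partition matches (possibly as a false drop) if it contains all query-term bits;
    # the matching partition's term frequency supplies the ranking weight.
    print([tf for tf, s in sigs.items() if s & q == q])   # e.g. [2]
    ```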
  3. Robertson, S.E.: OKAPI at TREC-1 (1994) 0.01
    0.007209763 = product of:
      0.057678103 = sum of:
        0.057678103 = weight(_text_:work in 7953) [ClassicSimilarity], result of:
          0.057678103 = score(doc=7953,freq=2.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.40552467 = fieldWeight in 7953, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.078125 = fieldNorm(doc=7953)
      0.125 = coord(1/8)
    
    Abstract
    Describes the work carried out on the TREC-2 project following the results of the TREC-1 project. Experiments were conducted on the OKAPI experimental text information retrieval system which investigated a number of alternative probabilistic term weighting functions in place of the 'standard' Robertson Sparck Jones weighting functions used in TREC-1
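    For reference, the 'standard' Robertson/Sparck Jones relevance weight mentioned here is usually written as below (notation not taken from this abstract: N documents in the collection, n containing the term, R known relevant documents, r of them containing the term):
    ```latex
    w(t) = \log \frac{(r + 0.5)\,(N - n - R + r + 0.5)}{(n - r + 0.5)\,(R - r + 0.5)}
    ```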
  4. Keen, M.: Query reformulation in ranked output interaction (1994) 0.01
    0.005046834 = product of:
      0.04037467 = sum of:
        0.04037467 = weight(_text_:work in 1065) [ClassicSimilarity], result of:
          0.04037467 = score(doc=1065,freq=2.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.28386727 = fieldWeight in 1065, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1065)
      0.125 = coord(1/8)
    
    Abstract
    Reports on a research project to evaluate and compare Boolean searching and methods of query reformulation using ranked output retrieval. Illustrates the design and operating features of the ranked output system, called ROSE (Ranked Output Search Engine), by means of typical results obtained by searching a database of 1239 records on the subject of cystic fibrosis. Concludes that further work is needed to determine the best reformulation tactics for harnessing the professional searcher's intelligence with the much more limited intelligence provided by the search software
  5. Cross-language information retrieval (1998) 0.00
    0.004692697 = product of:
      0.037541576 = sum of:
        0.037541576 = weight(_text_:supported in 6299) [ClassicSimilarity], result of:
          0.037541576 = score(doc=6299,freq=2.0), product of:
            0.22949564 = queryWeight, product of:
              5.9223356 = idf(docFreq=321, maxDocs=44218)
              0.03875087 = queryNorm
            0.16358295 = fieldWeight in 6299, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.9223356 = idf(docFreq=321, maxDocs=44218)
              0.01953125 = fieldNorm(doc=6299)
      0.125 = coord(1/8)
    
    Footnote
    The retrieved output from a query including the phrase 'big rockets' may be, for instance, a sentence containing 'giant rocket' which is semantically ranked above 'military rocket'. David Hull (Xerox Research Centre, Grenoble) describes an implementation of a weighted Boolean model for Spanish-English CLIR. Users construct Boolean-type queries, weighting each term in the query, which is then translated by an on-line dictionary before being applied to the database. Comparisons with the performance of unweighted free-form queries ('vector space' models) proved encouraging. Two contributions consider the evaluation of CLIR systems. In order to by-pass the time-consuming and expensive process of assembling a standard collection of documents and of user queries against which the performance of a CLIR system is manually assessed, Páraic Sheridan et al. (ETH Zurich) propose a method based on retrieving 'seed documents'. This involves identifying a unique document in a database (the 'seed document') and, for a number of queries, measuring how fast it is retrieved. The authors have also assembled a large database of multilingual news documents for testing purposes. By storing the (fairly short) documents in a structured form tagged with descriptor codes (e.g. for topic, country and area), the test suite is easily expanded while remaining consistent for the purposes of testing. Douglas Oard and Bonnie Dorr (University of Maryland) describe an evaluation methodology which appears to apply LSI techniques in order to filter and rank incoming documents designed for testing CLIR systems. The volume provides the reader with an excellent overview of several projects in CLIR. It is well supported with references and is intended as a secondary text for researchers and practitioners. It highlights the need for a good, general tutorial introduction to the field.
  6. Rada, R.; Barlow, J.; Potharst, J.; Zanstra, P.; Bijstra, D.: Document ranking using an enriched thesaurus (1991) 0.00
    0.004325858 = product of:
      0.034606863 = sum of:
        0.034606863 = weight(_text_:work in 6626) [ClassicSimilarity], result of:
          0.034606863 = score(doc=6626,freq=2.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.2433148 = fieldWeight in 6626, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.046875 = fieldNorm(doc=6626)
      0.125 = coord(1/8)
    
    Abstract
    A thesaurus may be viewed as a graph, and document retrieval algorithms can exploit this graph when both the documents and the query are represented by thesaurus terms. These retrieval algorithms measure the distance between the query and documents by using the path lengths in the graph. Previous work with such strategies has shown that the hierarchical relations in the thesaurus are useful but the non-hierarchical relations are not. This paper shows that when the query explicitly mentions a particular non-hierarchical relation, the retrieval algorithm benefits from the presence of such relations in the thesaurus. Our algorithms were applied to the Excerpta Medica bibliographic citation database whose citations are indexed with terms from the EMTREE thesaurus. We also created an enriched EMTREE by systematically adding non-hierarchical relations from a medical knowledge base. Our algorithms used at one time EMTREE and, at another time, the enriched EMTREE in the course of ranking documents from Excerpta Medica against queries. When, and only when, the query specifically mentioned a particular non-hierarchical relation type, did EMTREE enriched with that relation type lead to a ranking that better corresponded to an expert's ranking
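    A minimal sketch of the distance idea described above, under simple assumptions (toy graph, unweighted edges, query-to-document distance taken as the average over query terms of the nearest document term); it is not the authors' EMTREE implementation:
    ```python
    from collections import deque

    # Toy thesaurus graph; edges stand for thesaurus relations (hierarchical or not).
    THESAURUS = {
        "disease": ["lung disease", "genetic disorder"],
        "lung disease": ["cystic fibrosis"],
        "genetic disorder": ["cystic fibrosis"],
        "cystic fibrosis": [],
    }

    def path_length(start, goal):
        """BFS shortest-path length between two terms, graph treated as undirected."""
        edges = {t: set(ns) for t, ns in THESAURUS.items()}
        for t, ns in THESAURUS.items():
            for n in ns:
                edges.setdefault(n, set()).add(t)
        seen, queue = {start}, deque([(start, 0)])
        while queue:
            term, dist = queue.popleft()
            if term == goal:
                return dist
            for nxt in edges.get(term, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
        return float("inf")

    def query_document_distance(query_terms, doc_terms):
        """Average, over query terms, of the distance to the closest document term."""
        return sum(min(path_length(q, d) for d in doc_terms)
                   for q in query_terms) / len(query_terms)

    print(query_document_distance({"genetic disorder"}, {"cystic fibrosis"}))  # 1.0
    ```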
  7. Robertson, S.E.; Sparck Jones, K.: Simple, proven approaches to text retrieval (1997) 0.00
    0.0036048815 = product of:
      0.028839052 = sum of:
        0.028839052 = weight(_text_:work in 4532) [ClassicSimilarity], result of:
          0.028839052 = score(doc=4532,freq=2.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.20276234 = fieldWeight in 4532, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4532)
      0.125 = coord(1/8)
    
    Abstract
    This technical note describes straightforward techniques for document indexing and retrieval that have been solidly established through extensive testing and are easy to apply. They are useful for many different types of text material, are viable for very large files, and have the advantage that they do not require special skills or training for searching, but are easy for end users. The document and text retrieval methods described here have a sound theoretical basis, are well established by extensive testing, and the ideas involved are now implemented in some commercial retrieval systems. Testing in the last few years has, in particular, shown that the methods presented here work very well with full texts, not only titles and abstracts, and with large files of texts containing three quarters of a million documents. These tests, the TREC Tests (see Harman 1993 - 1997; IP&M 1995), have been rigorous comparative evaluations involving many different approaches to information retrieval. These techniques depend on the use of simple terms for indexing both request and document texts; on term weighting exploiting statistical information about term occurrences; on scoring for request-document matching, using these weights, to obtain a ranked search output; and on relevance feedback to modify request weights or term sets in iterative searching. The normal implementation is via an inverted file organisation using a term list with linked document identifiers, plus counting data, and pointers to the actual texts. The user's request can be a word list, phrases, sentences or extended text.
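    The pipeline summarized in this note can be sketched minimally as below, under assumed simplifications (terms are lowercased words, weights are plain tf-idf, and the relevance-feedback step is omitted):
    ```python
    import math
    from collections import Counter, defaultdict

    docs = {
        1: "simple proven approaches to text retrieval",
        2: "signature files and text retrieval systems",
        3: "query expansion for interactive retrieval",
    }

    # Inverted file: term -> postings list of (document id, term frequency).
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term].append((doc_id, tf))

    def search(query):
        """Rank documents by the sum of tf-idf weights of matching query terms."""
        scores = defaultdict(float)
        for term in query.lower().split():
            postings = index.get(term, [])
            if not postings:
                continue
            idf = math.log(len(docs) / len(postings))   # one common idf variant
            for doc_id, tf in postings:
                scores[doc_id] += tf * idf
        return sorted(scores.items(), key=lambda item: -item[1])

    print(search("text retrieval"))   # ranked output, best-scoring document first
    ```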
  8. Faloutsos, C.: Signature files (1992) 0.00
    0.002625104 = product of:
      0.021000832 = sum of:
        0.021000832 = product of:
          0.042001665 = sum of:
            0.042001665 = weight(_text_:22 in 3499) [ClassicSimilarity], result of:
              0.042001665 = score(doc=3499,freq=2.0), product of:
                0.13569894 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03875087 = queryNorm
                0.30952093 = fieldWeight in 3499, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3499)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Date
    7. 5.1999 15:22:48
  9. Chang, C.-H.; Hsu, C.-C.: Integrating query expansion and conceptual relevance feedback for personalized Web information retrieval (1998) 0.00
    0.002296966 = product of:
      0.018375728 = sum of:
        0.018375728 = product of:
          0.036751457 = sum of:
            0.036751457 = weight(_text_:22 in 1319) [ClassicSimilarity], result of:
              0.036751457 = score(doc=1319,freq=2.0), product of:
                0.13569894 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03875087 = queryNorm
                0.2708308 = fieldWeight in 1319, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1319)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Date
    1. 8.1996 22:08:06
  10. Joss, M.W.; Wszola, S.: ¬The engines that can : text search and retrieval software, their strategies, and vendors (1996) 0.00
    0.0019688278 = product of:
      0.015750622 = sum of:
        0.015750622 = product of:
          0.031501245 = sum of:
            0.031501245 = weight(_text_:22 in 5123) [ClassicSimilarity], result of:
              0.031501245 = score(doc=5123,freq=2.0), product of:
                0.13569894 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03875087 = queryNorm
                0.23214069 = fieldWeight in 5123, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5123)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Date
    12. 9.1996 13:56:22
  11. Kelledy, F.; Smeaton, A.F.: Signature files and beyond (1996) 0.00
    0.0019688278 = product of:
      0.015750622 = sum of:
        0.015750622 = product of:
          0.031501245 = sum of:
            0.031501245 = weight(_text_:22 in 6973) [ClassicSimilarity], result of:
              0.031501245 = score(doc=6973,freq=2.0), product of:
                0.13569894 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03875087 = queryNorm
                0.23214069 = fieldWeight in 6973, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6973)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Source
    Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
  12. Burgin, R.: ¬The retrieval effectiveness of 5 clustering algorithms as a function of indexing exhaustivity (1995) 0.00
    0.00164069 = product of:
      0.01312552 = sum of:
        0.01312552 = product of:
          0.02625104 = sum of:
            0.02625104 = weight(_text_:22 in 3365) [ClassicSimilarity], result of:
              0.02625104 = score(doc=3365,freq=2.0), product of:
                0.13569894 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03875087 = queryNorm
                0.19345059 = fieldWeight in 3365, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3365)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Date
    22. 2.1996 11:20:06
  13. Efthimiadis, E.N.: User choices : a new yardstick for the evaluation of ranking algorithms for interactive query expansion (1995) 0.00
    0.00164069 = product of:
      0.01312552 = sum of:
        0.01312552 = product of:
          0.02625104 = sum of:
            0.02625104 = weight(_text_:22 in 5697) [ClassicSimilarity], result of:
              0.02625104 = score(doc=5697,freq=2.0), product of:
                0.13569894 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03875087 = queryNorm
                0.19345059 = fieldWeight in 5697, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5697)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Date
    22. 2.1996 13:14:10