Search (30 results, page 1 of 2)

  • language_ss:"e"
  • theme_ss:"Retrievalalgorithmen"
  • year_i:[1990 TO 2000}
  1. Joss, M.W.; Wszola, S.: ¬The engines that can : text search and retrieval software, their strategies, and vendors (1996) 0.04
    0.042603284 = product of:
      0.12780985 = sum of:
        0.113619536 = weight(_text_:searching in 5123) [ClassicSimilarity], result of:
          0.113619536 = score(doc=5123,freq=18.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.80450237 = fieldWeight in 5123, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.046875 = fieldNorm(doc=5123)
        0.014190319 = product of:
          0.028380638 = sum of:
            0.028380638 = weight(_text_:22 in 5123) [ClassicSimilarity], result of:
              0.028380638 = score(doc=5123,freq=2.0), product of:
                0.1222562 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03491209 = queryNorm
                0.23214069 = fieldWeight in 5123, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5123)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Traces the development of text searching and retrieval software designed to cope with the increasing demands made by the storage and handling of large amounts of data, recorded on high-capacity storage media, from CD-ROM to multi-gigabyte storage media and online information services, with particular reference to the need to cope with graphics as well as conventional ASCII text. Includes details of: Boolean searching; fuzzy searching and matching; relevance ranking; proximity searching; and improved strategies for dealing with text searching in very large databases. Concludes that the best searching tools for CD-ROM publishers are those optimized for searching and retrieval on CD-ROM. CD-ROM drives have relatively slower random seek times than hard discs and so the software most appropriate to the medium is that which can effectively arrange the indexes and text on the CD-ROM to avoid continuous random access searching. Lists and reviews a selection of software packages designed to achieve the sort of results required for rapid CD-ROM searching
    Date
    12. 9.1996 13:56:22
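The score explanation for this first hit can be reproduced by hand. The sketch below assumes the standard Lucene ClassicSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the figures shown in the listing; the function and variable names are illustrative and not taken from the source.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain output
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explanation for doc 5123
QUERY_NORM = 0.03491209
MAX_DOCS = 44218
FIELD_NORM = 0.046875

searching = term_score(18.0, 2103, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# The "22" clause matched 1 of 2 sub-clauses, hence coord(1/2)
twenty_two = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5
# 2 of 6 top-level query clauses matched, hence coord(2/6)
total = (searching + twenty_two) * (2 / 6)
print(total)
```

The result should agree with the listed 0.042603284 to within rounding, since Lucene computes in 32-bit floats and Python in 64-bit.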
  2. Kelledy, F.; Smeaton, A.F.: Signature files and beyond (1996) 0.02
    0.0173545 = product of:
      0.0520635 = sum of:
        0.03787318 = weight(_text_:searching in 6973) [ClassicSimilarity], result of:
          0.03787318 = score(doc=6973,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.26816747 = fieldWeight in 6973, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.046875 = fieldNorm(doc=6973)
        0.014190319 = product of:
          0.028380638 = sum of:
            0.028380638 = weight(_text_:22 in 6973) [ClassicSimilarity], result of:
              0.028380638 = score(doc=6973,freq=2.0), product of:
                0.1222562 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03491209 = queryNorm
                0.23214069 = fieldWeight in 6973, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6973)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Proposes that signature files be used as a viable alternative to other indexing strategies such as inverted files for searching through large volumes of text. Demonstrates through simulation that search times can be further reduced by enhancing the basic signature file concept using deterministic partitioning algorithms which eliminate the need for an exhaustive search of the entire signature file. Reports research to evaluate the performance of some deterministic partitioning algorithms in a non-simulated environment using 276 MB of raw newspaper text (taken from the Wall Street Journal) and real user queries. Presents a selection of results to illustrate trends and highlight important aspects of the performance of these methods under realistic rather than simulated operating conditions. The research shows that certain aspects of this approach to signature files are wanting and require improvement. Suggests lines of future research on the partitioning of signature files
    Source
    Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
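As a rough illustration of the basic signature file concept mentioned in this abstract, the sketch below uses superimposed coding: each word sets a few bits, a document signature is the OR of its word signatures, and a query matches only where all its bits are set. It scans every signature exhaustively and does not implement the deterministic partitioning the authors evaluate. The signature width, bits per word, hash choice, and all function names are assumptions made for this example.

```python
import hashlib

SIGNATURE_BITS = 64   # assumed signature width
BITS_PER_WORD = 3     # assumed number of bits set per word

def word_signature(word):
    # Derive a few bit positions from a stable hash of the word
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    sig = 0
    for i in range(BITS_PER_WORD):
        sig |= 1 << (digest[i] % SIGNATURE_BITS)
    return sig

def text_signature(text):
    # Superimpose (OR together) the signatures of all words
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig

def search(query, documents):
    query_sig = text_signature(query)
    query_words = [w.lower() for w in query.split()]
    hits = []
    for doc_id, text in enumerate(documents):
        # Cheap filter: every query bit must be present in the document signature
        if (text_signature(text) & query_sig) == query_sig:
            # Signatures can give false drops, so confirm against the text
            words = {w.lower() for w in text.split()}
            if all(w in words for w in query_words):
                hits.append(doc_id)
    return hits

documents = [
    "signature files for text",
    "inverted files index",
    "wall street journal text",
]
print(search("text", documents))
```

In a real system the document signatures would be computed once and stored, so that only the signature file is scanned at query time.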
  3. Uratani, N.; Takeda, M.: ¬A fast string-searching algorithm for multiple patterns (1993) 0.02
    0.016832525 = product of:
      0.100995146 = sum of:
        0.100995146 = weight(_text_:searching in 6275) [ClassicSimilarity], result of:
          0.100995146 = score(doc=6275,freq=8.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.7151132 = fieldWeight in 6275, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0625 = fieldNorm(doc=6275)
      0.16666667 = coord(1/6)
    
    Abstract
    The string-searching problem is to find all occurrences of pattern(s) in a text string. The Aho-Corasick string searching algorithm simultaneously finds all occurrences of multiple patterns in one pass through the text. The Boyer-Moore algorithm is the fastest algorithm for a single pattern. Combines the ideas of these two algorithms to present an efficient string searching algorithm for multiple patterns. The algorithm runs in sublinear time on average, as the BM algorithm does, and its preprocessing time is linearly proportional to the sum of the lengths of the patterns, as in the AC algorithm
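For orientation, a minimal sketch of the plain Aho-Corasick algorithm referred to in this abstract follows. It is not the combined Aho-Corasick/Boyer-Moore method the authors present, and the function names and data layout are choices made for this example.

```python
from collections import deque

def build_automaton(patterns):
    # goto[s] maps a character to the next state; state 0 is the root
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass to compute failure links and merge outputs
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_all(text, patterns):
    # Returns (start_index, pattern) pairs found in one pass over the text
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches

print(find_all("ushers", ["he", "she", "his", "hers"]))
# [(1, 'she'), (2, 'he'), (2, 'hers')]
```

Building the automaton takes time linear in the total pattern length, which is the preprocessing property the abstract attributes to the AC algorithm.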
  4. Tenopir, C.: Online databases : natural language searching with WIN (1993) 0.02
    0.016832525 = product of:
      0.100995146 = sum of:
        0.100995146 = weight(_text_:searching in 7038) [ClassicSimilarity], result of:
          0.100995146 = score(doc=7038,freq=8.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.7151132 = fieldWeight in 7038, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0625 = fieldNorm(doc=7038)
      0.16666667 = coord(1/6)
    
    Abstract
    WESTLAW is one of the first major commercial online systems to embrace both natural language input and partial match searching. Provides a background to WESTLAW. Explains how the WESTLAW Is Natural (WIN) search engine works. Some searchers find that when searching with commands and Boolean logic, results differ drastically from those produced by searching with WIN. Discusses exact match Boolean logic search engines
  5. Willett, P.: Best-match text retrieval (1993) 0.01
    0.014877992 = product of:
      0.08926795 = sum of:
        0.08926795 = weight(_text_:searching in 7818) [ClassicSimilarity], result of:
          0.08926795 = score(doc=7818,freq=4.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.6320768 = fieldWeight in 7818, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.078125 = fieldNorm(doc=7818)
      0.16666667 = coord(1/6)
    
    Abstract
    Provides an introduction to the computational techniques that underlie best match searching retrieval systems. Discusses: problems of traditional Boolean systems; characteristics of best-match searching; automatic indexing; term conflation; matching of documents and queries (dealing with similarity measures, initial weights, relevance weights, and the matching algorithm); and describes operational best-match systems
  6. Jones, K.: Linguistic searching versus relevance ranking : DR-LINK and TARGET (1999) 0.01
    0.0147284595 = product of:
      0.088370755 = sum of:
        0.088370755 = weight(_text_:searching in 6423) [ClassicSimilarity], result of:
          0.088370755 = score(doc=6423,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.6257241 = fieldWeight in 6423, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.109375 = fieldNorm(doc=6423)
      0.16666667 = coord(1/6)
    
  7. Belkin, N.J.; Cool, C.; Koenemann, J.; Ng, K.B.; Park, S.: Using relevance feedback and ranking in interactive searching (1996) 0.01
    0.012624393 = product of:
      0.07574636 = sum of:
        0.07574636 = weight(_text_:searching in 7588) [ClassicSimilarity], result of:
          0.07574636 = score(doc=7588,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.53633493 = fieldWeight in 7588, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.09375 = fieldNorm(doc=7588)
      0.16666667 = coord(1/6)
    
  8. Al-Hawamdeh, S.; Smith, G.; Willett, P.; Vere, R. de: Using nearest-neighbour searching techniques to access full-text documents (1991) 0.01
    0.011902392 = product of:
      0.07141435 = sum of:
        0.07141435 = weight(_text_:searching in 2300) [ClassicSimilarity], result of:
          0.07141435 = score(doc=2300,freq=4.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.5056614 = fieldWeight in 2300, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0625 = fieldNorm(doc=2300)
      0.16666667 = coord(1/6)
    
    Abstract
    Summarises the results to date of a continuing programme of research at Sheffield Univ. to investigate the use of nearest-neighbour retrieval algorithms for full text searching. Given a natural language query statement, the research methods result in a ranking of the paragraphs comprising a full text document in order of decreasing similarity with the query, where the similarity for each paragraph is determined by the number of keyword stems that it has in common with the query
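The ranking described in this abstract, where paragraphs are ordered by the number of keyword stems shared with the query, can be sketched as follows. The stop word list and the crude suffix-stripping stemmer are placeholders invented for this example; the Sheffield work would have used its own stemming and stop list.

```python
import re

# Hypothetical, deliberately tiny stop list
STOP_WORDS = {"the", "a", "an", "of", "for", "in", "to", "and", "is", "are", "on", "with"}

def crude_stem(word):
    # Placeholder stemmer: strip one common suffix if at least 3 letters remain
    for suffix in ("ing", "es", "s", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stems(text):
    words = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(w) for w in words if w not in STOP_WORDS}

def rank_paragraphs(query, paragraphs):
    # Similarity = number of stems the paragraph shares with the query
    query_stems = stems(query)
    scored = [(len(query_stems & stems(p)), i) for i, p in enumerate(paragraphs)]
    # Decreasing similarity; ties keep original paragraph order
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored

paragraphs = [
    "Indexing full text documents for retrieval.",
    "Cooking recipes and kitchen tips.",
    "Ranking paragraphs of documents by query similarity.",
]
print(rank_paragraphs("ranking documents by similarity", paragraphs))
```

Each result pair holds the shared-stem count and the paragraph's position in the input list.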
  9. Chang, R.: Keyword searching and indexing (1993) 0.01
    0.011902392 = product of:
      0.07141435 = sum of:
        0.07141435 = weight(_text_:searching in 7223) [ClassicSimilarity], result of:
          0.07141435 = score(doc=7223,freq=4.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.5056614 = fieldWeight in 7223, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0625 = fieldNorm(doc=7223)
      0.16666667 = coord(1/6)
    
    Abstract
    Explains how a computer indexing system works. Reviews fundamentals of how data are stored and retrieved by computers. Describes B-Tree and B+-Tree indexing structures. Gives basic keyword searching techniques that the user must apply to make use of the indexing programs. The demand for keyword retrieval is increasing and librarians should expect to see the keyword-indexing feature become commonly available
  10. O'Leary, M.: DIALOG TARGET's new age searching (1993) 0.01
    0.011902392 = product of:
      0.07141435 = sum of:
        0.07141435 = weight(_text_:searching in 7951) [ClassicSimilarity], result of:
          0.07141435 = score(doc=7951,freq=4.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.5056614 = fieldWeight in 7951, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0625 = fieldNorm(doc=7951)
      0.16666667 = coord(1/6)
    
    Abstract
    Relevance search engines, which measure the occurrence of search terms in a group of retrieved records and rank them accordingly, often produce better results than refined Boolean searches. Relevance searching has emerged from the research stage to be on the verge of becoming the standard retrieval method. Describes and evaluates the operation of DIALOG's TARGET, a major accomplishment, despite some rough edges
  11. Baeza-Yates, R.A.: String searching algorithms (1992) 0.01
    0.011902392 = product of:
      0.07141435 = sum of:
        0.07141435 = weight(_text_:searching in 3505) [ClassicSimilarity], result of:
          0.07141435 = score(doc=3505,freq=4.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.5056614 = fieldWeight in 3505, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0625 = fieldNorm(doc=3505)
      0.16666667 = coord(1/6)
    
    Abstract
    Survey of several algorithms for searching a string in a text. Included are theoretical and empirical results, as well as the actual code of each algorithm. An extensive bibliography is included
  12. Loughran, H.: ¬A review of nearest neighbour information retrieval (1994) 0.01
    0.010520328 = product of:
      0.06312197 = sum of:
        0.06312197 = weight(_text_:searching in 616) [ClassicSimilarity], result of:
          0.06312197 = score(doc=616,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.44694576 = fieldWeight in 616, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.078125 = fieldNorm(doc=616)
      0.16666667 = coord(1/6)
    
    Abstract
    Explains the concept of 'nearest neighbour' searching, also known as best match or ranked output, which it is claimed can overcome many of the inadequacies of traditional Boolean methods. Also points to some of the limitations. Identifies a number of commercial information retrieval systems which feature this search technique
  13. Couvreur, T.R.; Benzel, R.N.; Miller, S.F.; Zeitler, D.N.; Lee, D.L.; Singhal, M.; Shivaratri, N.; Wong, W.Y.P.: ¬An analysis of performance and cost factors in searching large text databases using parallel search systems (1994) 0.01
    0.010414593 = product of:
      0.062487558 = sum of:
        0.062487558 = weight(_text_:searching in 7657) [ClassicSimilarity], result of:
          0.062487558 = score(doc=7657,freq=4.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.44245374 = fieldWeight in 7657, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7657)
      0.16666667 = coord(1/6)
    
    Abstract
    The results of modelling the performance of searching large text databases (>10 GBytes) via various parallel hardware architectures and search algorithms are discussed. The performance under load and the cost of each configuration are compared. Strengths, weaknesses, performance sensitivities, and search features supported for each configuration are also addressed. In addition, a common search workload used in the modelling is described. The search workload is derived from a set of searches run against the Chemical Abstracts file of bibliographic and abstract text available on STN International. This common workload is applied to all configurations modelled to provide a common basis of comparison
  14. Keen, M.: Query reformulation in ranked output interaction (1994) 0.01
    0.010414593 = product of:
      0.062487558 = sum of:
        0.062487558 = weight(_text_:searching in 1065) [ClassicSimilarity], result of:
          0.062487558 = score(doc=1065,freq=4.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.44245374 = fieldWeight in 1065, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1065)
      0.16666667 = coord(1/6)
    
    Abstract
    Reports on a research project to evaluate and compare Boolean searching and methods of query reformulation using ranked output retrieval. Illustrates the design and operating features of the ranked output system, called ROSE (Ranked Output Search Engine), by means of typical results obtained by searching a database of 1239 records on the subject of cystic fibrosis. Concludes that further work is needed to determine the best reformulation tactics needed to harness the professional searcher's intelligence with that much more limited intelligence provided by the search software
  15. Gauch, S.; Smith, J.B.: ¬An expert system for automatic query reformation (1993) 0.01
    0.008926794 = product of:
      0.053560764 = sum of:
        0.053560764 = weight(_text_:searching in 3693) [ClassicSimilarity], result of:
          0.053560764 = score(doc=3693,freq=4.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.37924606 = fieldWeight in 3693, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.046875 = fieldNorm(doc=3693)
      0.16666667 = coord(1/6)
    
    Abstract
    Unfamiliarity with search tactics creates difficulties for many users of online retrieval systems. User observations indicate that even experienced searchers use vocabulary incorrectly and rarely reformulate their queries. To address these problems, an expert system for online search assistance was developed. This prototype automatically reformulates queries to improve the search results, and ranks the retrieved passages to speed the identification of relevant information. Users' search performance using the expert system was compared with their search performance using an online thesaurus. The following conclusions were reached: (1) The expert system significantly reduced the number of queries necessary to find relevant passages compared with the user searching alone or with the thesaurus. (2) The expert system produced marginally significant improvements in precision compared with the user searching on their own. There was no significant difference in the recall achieved by the three system configurations. (3) Overall, the expert system ranked relevant passages above irrelevant passages
  16. Can, F.: Incremental clustering for dynamic information processing (1993) 0.01
    0.008416262 = product of:
      0.050497573 = sum of:
        0.050497573 = weight(_text_:searching in 6627) [ClassicSimilarity], result of:
          0.050497573 = score(doc=6627,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.3575566 = fieldWeight in 6627, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0625 = fieldNorm(doc=6627)
      0.16666667 = coord(1/6)
    
    Abstract
    Clustering of very large document databases is useful for both searching and browsing. The periodic updating of clusters is required due to the dynamic nature of databases. Introduces an algorithm for incremental clustering and discusses the complexity and cost of analysis of the algorithm together with an investigation of its expected behaviour. Shows through empirical testing that the algorithm achieves cost effectiveness and generates statistically valid clusters that are compatible with those of reclustering. The experimental evidence shows that the algorithm creates an effective and efficient retrieval environment
  17. Hofferer, M.: Heuristic search in information retrieval (1994) 0.01
    0.008416262 = product of:
      0.050497573 = sum of:
        0.050497573 = weight(_text_:searching in 1070) [ClassicSimilarity], result of:
          0.050497573 = score(doc=1070,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.3575566 = fieldWeight in 1070, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0625 = fieldNorm(doc=1070)
      0.16666667 = coord(1/6)
    
    Abstract
    Describes an adaptive information retrieval system, the Information Retrieval Algorithm System (IRAS), that uses heuristic searching to sample a document space and retrieve relevant documents according to users' requests. It also includes a learning module, based on a knowledge representation system and an approximate probabilistic characterization of relevant documents, that reproduces a user classification of relevant documents and provides a rule-controlled ranking
  18. Robertson, M.; Willett, P.: ¬An upperbound to the performance of ranked output searching : optimal weighting of query terms using a genetic algorithm (1996) 0.01
    0.008416262 = product of:
      0.050497573 = sum of:
        0.050497573 = weight(_text_:searching in 6977) [ClassicSimilarity], result of:
          0.050497573 = score(doc=6977,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.3575566 = fieldWeight in 6977, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0625 = fieldNorm(doc=6977)
      0.16666667 = coord(1/6)
    
  19. Stanfill, C.: Parallel information retrieval algorithms (1992) 0.01
    0.008416262 = product of:
      0.050497573 = sum of:
        0.050497573 = weight(_text_:searching in 3515) [ClassicSimilarity], result of:
          0.050497573 = score(doc=3515,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.3575566 = fieldWeight in 3515, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0625 = fieldNorm(doc=3515)
      0.16666667 = coord(1/6)
    
    Abstract
    Data parallel computers, such as the Connection Machine CM-2, can provide interactive access to text databases containing tens, hundreds or even thousands of Gigabytes of data. Starts by presenting a brief overview of data parallel computing, a performance model of the CM-2, and a model of the workload involved in searching text databases. Discusses various algorithms used in information retrieval and gives performance estimates based on the data and processing models presented
  20. Robertson, S.E.; Sparck Jones, K.: Simple, proven approaches to text retrieval (1997) 0.01
    0.007438996 = product of:
      0.044633973 = sum of:
        0.044633973 = weight(_text_:searching in 4532) [ClassicSimilarity], result of:
          0.044633973 = score(doc=4532,freq=4.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.3160384 = fieldWeight in 4532, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4532)
      0.16666667 = coord(1/6)
    
    Abstract
    This technical note describes straightforward techniques for document indexing and retrieval that have been solidly established through extensive testing and are easy to apply. They are useful for many different types of text material, are viable for very large files, and have the advantage that they do not require special skills or training for searching, but are easy for end users. The document and text retrieval methods described here have a sound theoretical basis, are well established by extensive testing, and the ideas involved are now implemented in some commercial retrieval systems. Testing in the last few years has, in particular, shown that the methods presented here work very well with full texts, not only titles and abstracts, and with large files of texts containing three quarters of a million documents. These tests, the TREC Tests (see Harman 1993 - 1997; IP&M 1995), have been rigorous comparative evaluations involving many different approaches to information retrieval. These techniques depend on the use of simple terms for indexing both request and document texts; on term weighting exploiting statistical information about term occurrences; on scoring for request-document matching, using these weights, to obtain a ranked search output; and on relevance feedback to modify request weights or term sets in iterative searching. The normal implementation is via an inverted file organisation using a term list with linked document identifiers, plus counting data, and pointers to the actual texts. The user's request can be a word list, phrases, sentences or extended text.
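The inverted file organisation and weighted ranking outlined in this abstract can be illustrated with a small sketch. It uses a plain idf-style weight, log(N / n), multiplied by the within-document count. This is a simplification chosen for the example and not the specific weighting formulas of the technical note; relevance feedback is left out, and all names are illustrative.

```python
import math
from collections import Counter, defaultdict

def build_inverted_file(docs):
    # term -> {doc_id: occurrences}: the term list with linked document
    # identifiers plus counting data
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            index[term][doc_id] = count
    return index

def rank(query, index, n_docs):
    scores = defaultdict(float)
    for term in set(query.lower().split()):
        postings = index.get(term, {})
        if not postings:
            continue
        # Rarer terms get higher weight (assumed idf-style weighting)
        weight = math.log(n_docs / len(postings))
        for doc_id, count in postings.items():
            scores[doc_id] += weight * count
    # Ranked output: highest score first, ties broken by document id
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

docs = {
    1: "text retrieval with ranked output",
    2: "boolean retrieval",
    3: "cooking text",
}
index = build_inverted_file(docs)
print(rank("ranked retrieval", index, len(docs)))
```

Only documents that share at least one term with the request receive a score, so the ranked list is built by walking the postings rather than scanning every text.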