Search (26 results, page 1 of 2)

Sparck Jones, K.: Some thoughts on classification for retrieval (1970) 0.01
```
0.012632976 = product of:
  0.03158244 = sum of:
    0.008315044 = product of:
      0.041575223 = sum of:
        0.041575223 = weight(_text_:problem in 4327) [ClassicSimilarity], result of:
          0.041575223 = score(doc=4327,freq=2.0), product of:
            0.17731056 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.04177434 = queryNorm
            0.23447686 = fieldWeight in 4327, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4327)
      0.2 = coord(1/5)
    0.023267398 = weight(_text_:of in 4327) [ClassicSimilarity], result of:
      0.023267398 = score(doc=4327,freq=34.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.35617945 = fieldWeight in 4327, product of:
          5.8309517 = tf(freq=34.0), with freq of:
            34.0 = termFreq=34.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4327)
  0.4 = coord(2/5)
```
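The score breakdown above follows Lucene's ClassicSimilarity (TF-IDF) scoring. As a rough check, the sketch below recomputes the top-level score for doc 4327 from the raw counts in the tree. The function and variable names are illustrative assumptions. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are ClassicSimilarity's defaults and agree with the numbers shown; queryNorm and fieldNorm are copied from the output rather than derived.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency as used by ClassicSimilarity.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq: float) -> float:
    # Term frequency is damped by a square root.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight

# Values copied from the explain output for doc 4327.
QUERY_NORM = 0.04177434
MAX_DOCS = 44218
FIELD_NORM = 0.0390625

problem = term_score(2.0, 1723, MAX_DOCS, QUERY_NORM, FIELD_NORM)
of = term_score(34.0, 25162, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The "problem" clause sits in a nested query with coord(1/5);
# the outer query matched 2 of 5 clauses, hence coord(2/5).
total = (problem * (1 / 5) + of) * (2 / 5)
print(round(total, 6))
```

Small differences in the last digits are expected, since Lucene computes these values in 32-bit floats.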
Abstract

The suggestion that classifications for retrieval should be constructed automatically raises some serious problems concerning the sorts of classification which are required, and the way in which formal classification theories should be exploited, given that a retrieval classification is required for a purpose. These difficulties have not been sufficiently considered, and the paper therefore attempts an analysis of them, though no solution of immediate application can be suggested. Starting with the illustrative proposition that a polythetic, multiple, unordered classification is required in automatic thesaurus construction, this is considered in the context of classification in general, where eight sorts of classification can be distinguished, each covering a range of class definitions and class-finding algorithms. The problem which follows is that since there is generally no natural or best classification of a set of objects as such, the evaluation of alternative classifications requires either formal criteria of goodness of fit, or, if a classification is required for a purpose, a precise statement of that purpose. In any case a substantive theory of classification is needed, which does not exist; and since sufficiently precise specifications of retrieval requirements are also lacking, the only currently available approach to automatic classification experiments for information retrieval is to do enough of them.

Footnote

Reprinted in: Journal of documentation. 61(2005) no.5, S.571-581.

Source

Journal of documentation. 26(1970), S.89-101

Sparck Jones, K.; Jones, G.J.F.; Foote, J.T.; Young, S.J.: Experiments in spoken document retrieval (1996) 0.01

```
0.0120587675 = product of:
  0.030146917 = sum of:
    0.01646295 = product of:
      0.08231475 = sum of:
        0.08231475 = weight(_text_:problem in 1951) [ClassicSimilarity], result of:
          0.08231475 = score(doc=1951,freq=4.0), product of:
            0.17731056 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.04177434 = queryNorm
            0.46424055 = fieldWeight in 1951, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1951)
      0.2 = coord(1/5)
    0.013683967 = weight(_text_:of in 1951) [ClassicSimilarity], result of:
      0.013683967 = score(doc=1951,freq=6.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.20947541 = fieldWeight in 1951, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1951)
  0.4 = coord(2/5)
```

Abstract: Describes experiments in the retrieval of spoken documents in multimedia systems. Speech documents pose a particular problem for retrieval since their words as well as contents are unknown. Addresses this problem, for a video mail application, by combining state-of-the-art speech recognition with established document retrieval technologies so as to provide an effective and efficient retrieval tool. Tests with a small spoken message collection show that retrieval precision for the spoken file can reach 90% of that obtained when the same file is used, as a benchmark, in text transcription form.

Needham, R.M.; Sparck Jones, K.: Keywords and clumps (1985) 0.01
```
0.009394583 = product of:
  0.023486458 = sum of:
    0.005820531 = product of:
      0.029102655 = sum of:
        0.029102655 = weight(_text_:problem in 3645) [ClassicSimilarity], result of:
          0.029102655 = score(doc=3645,freq=2.0), product of:
            0.17731056 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.04177434 = queryNorm
            0.1641338 = fieldWeight in 3645, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3645)
      0.2 = coord(1/5)
    0.017665926 = weight(_text_:of in 3645) [ClassicSimilarity], result of:
      0.017665926 = score(doc=3645,freq=40.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.2704316 = fieldWeight in 3645, product of:
          6.3245554 = tf(freq=40.0), with freq of:
            40.0 = termFreq=40.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.02734375 = fieldNorm(doc=3645)
  0.4 = coord(2/5)
```
Abstract

The selection that follows was chosen as it represents "a very early paper on the possibilities allowed by computers in documentation." In the early 1960s computers were being used to provide simple automatic indexing systems wherein keywords were extracted from documents. The problem with such systems was that they lacked vocabulary control, thus documents related in subject matter were not always collocated in retrieval. To improve retrieval by improving recall is the raison d'être of vocabulary control tools such as classifications and thesauri. The question arose whether it was possible by automatic means to construct classes of terms, which when substituted, one for another, could be used to improve retrieval performance. One of the first theoretical approaches to this question was initiated by R. M. Needham and Karen Sparck Jones at the Cambridge Language Research Institute in England. The question was later pursued using experimental methodologies by Sparck Jones, who, as a Senior Research Associate in the Computer Laboratory at the University of Cambridge, has devoted her life's work to research in information retrieval and automatic natural language processing. Based on the principles of numerical taxonomy, automatic classification techniques start from the premise that two objects are similar to the degree that they share attributes in common. When these two objects are keywords, their similarity is measured in terms of the number of documents they index in common. Step 1 in automatic classification is to compute mathematically the degree to which two terms are similar. Step 2 is to group together those terms that are "most similar" to each other, forming equivalence classes of intersubstitutable terms. The technique for forming such classes varies and is the factor that characteristically distinguishes different approaches to automatic classification.
The technique used by Needham and Sparck Jones, that of clumping, is described in the selection that follows. Questions that must be asked are whether the use of automatically generated classes really does improve retrieval performance and whether there is a true economic advantage in substituting mechanical for manual labor. Several years after her work with clumping, Sparck Jones was to observe that while it was not wholly satisfactory in itself, it was valuable in that it stimulated research into automatic classification. To this it might be added that it was valuable in that it introduced to library/information science the methods of numerical taxonomy, thus stimulating us to think again about the fundamental nature and purpose of classification. In this connection it might be useful to review how automatically derived classes differ from those of manually constructed classifications: 1) the manner of their derivation is purely a posteriori, the ultimate operationalization of the principle of literary warrant; 2) the relationship between members forming such classes is essentially statistical; the members of a given class are similar to each other not because they possess the class-defining characteristic but by virtue of sharing a family resemblance; and finally, 3) automatically derived classes are not related meaningfully one to another, that is, they are not ordered in traditional hierarchical and precedence relationships.
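The two steps described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions: the toy index and all names are hypothetical, similarity is the raw count of shared documents as the abstract describes, and Step 2 uses simple single-link grouping above a threshold as a stand-in. It is not the clumping technique of Needham and Sparck Jones.

```python
from itertools import combinations

# Hypothetical toy index: keyword -> set of ids of documents it indexes.
index = {
    "thesaurus": {1, 2, 3},
    "vocabulary": {2, 3, 4},
    "clump": {5, 6},
    "cluster": {5, 6, 7},
}

def shared_documents(a, b, index):
    # Step 1: similarity = number of documents both keywords index.
    return len(index[a] & index[b])

def similarity_table(index):
    # Similarity for every unordered pair of keywords.
    return {
        (a, b): shared_documents(a, b, index)
        for a, b in combinations(sorted(index), 2)
    }

def group_terms(index, threshold):
    # Step 2 stand-in: link terms whose similarity reaches the threshold
    # and return the connected groups (single-link, via union-find).
    parent = {term: term for term in index}

    def find(term):
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for (a, b), sim in similarity_table(index).items():
        if sim >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for term in index:
        groups.setdefault(find(term), set()).add(term)
    return sorted(sorted(group) for group in groups.values())

print(group_terms(index, threshold=2))
```

With a threshold of 2 shared documents, the toy data falls into two classes of intersubstitutable terms; a different grouping rule in Step 2 would give a different classification, which is the point the abstract makes about competing approaches.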

Footnote

Original in: Journal of documentation 20(1964) no.1, S.5-15.

Source

Theory of subject analysis: a sourcebook. Ed.: L.M. Chan, et al

Sparck Jones, K.: ¬A statistical interpretation of term specificity and its application in retrieval (1972) 0.01

```
0.005107617 = product of:
  0.025538085 = sum of:
    0.025538085 = weight(_text_:of in 5187) [ClassicSimilarity], result of:
      0.025538085 = score(doc=5187,freq=4.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.39093933 = fieldWeight in 5187, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.125 = fieldNorm(doc=5187)
  0.2 = coord(1/5)
```

Source: Journal of documentation. 28(1972), S.11-21

Sparck Jones, K.: Some thoughts on classification for retrieval (2005) 0.00
```
0.004788391 = product of:
  0.023941955 = sum of:
    0.023941955 = weight(_text_:of in 4392) [ClassicSimilarity], result of:
      0.023941955 = score(doc=4392,freq=36.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.36650562 = fieldWeight in 4392, product of:
          6.0 = tf(freq=36.0), with freq of:
            36.0 = termFreq=36.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4392)
  0.2 = coord(1/5)
```
Abstract

Purpose - This paper, originally published in 1970 (Journal of documentation. 26(1970), S.89-101), considered the suggestion that classifications for retrieval should be constructed automatically and raised some serious problems concerning the sorts of classification which were required, and the way in which formal classification theories should be exploited, given that a retrieval classification is required for a purpose. These difficulties had not been sufficiently considered, and the paper, therefore, aims to attempt an analysis of them, though no solutions of immediate application could be suggested. Design/methodology/approach - Starting with the illustrative proposition that a polythetic, multiple, unordered classification is required in automatic thesaurus construction, this is considered in the context of classification in general, where eight sorts of classification can be distinguished, each covering a range of class definitions and class-finding algorithms. Findings - Since there is generally no natural or best classification of a set of objects as such, the evaluation of alternative classifications requires either formal criteria of goodness of fit, or, if a classification is required for a purpose, a precise statement of that purpose. In any case a substantive theory of classification is needed, which does not exist; and, since sufficiently precise specifications of retrieval requirements are also lacking, the only currently available approach to automatic classification experiments for information retrieval is to do enough of them. Originality/value - Gives insights into the classification of material for information retrieval.

Source

Journal of documentation. 61(2005) no.5, S.571-581

Robertson, S.E.; Sparck Jones, K.: Relevance weighting of search terms (1976) 0.00
```
0.0047777384 = product of:
  0.023888692 = sum of:
    0.023888692 = weight(_text_:of in 71) [ClassicSimilarity], result of:
      0.023888692 = score(doc=71,freq=14.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.36569026 = fieldWeight in 71, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0625 = fieldNorm(doc=71)
  0.2 = coord(1/5)
```
Abstract

Examines statistical techniques for exploiting relevance information to weight search terms. These techniques are presented as a natural extension of weighting methods using information about the distribution of index terms in documents in general. A series of relevance weighting functions is derived and is justified by theoretical considerations. In particular, it is shown that specific weighted search methods are implied by a general probabilistic theory of retrieval. Different applications of relevance weighting are illustrated by experimental results for test collections
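The abstract gives no formulas, so the following is only a sketch of one commonly cited relevance weight associated with this line of work, with 0.5 added to each count to avoid division by zero. The function name and the example counts are assumptions for illustration.

```python
import math

def relevance_weight(r: int, R: int, n: int, N: int) -> float:
    """Relevance weight for one search term.

    r: relevant documents containing the term
    R: relevant documents known for the request
    n: documents containing the term
    N: documents in the collection
    """
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return math.log(numerator / denominator)

# A term concentrated in the relevant documents scores higher than one
# that occurs mostly outside them (hypothetical counts).
strong = relevance_weight(r=8, R=10, n=10, N=100)
weak = relevance_weight(r=1, R=10, n=10, N=100)
print(strong > weak)
```

With no relevance information (r = R = 0) the weight depends only on how many documents contain the term, which is how the abstract's "natural extension" of distribution-based weighting can be read.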

Source

Journal of the American Society for Information Science. 27(1976), S.129-146

Sparck Jones, K.: ¬A statistical interpretation of term specificity and its application in retrieval (2004) 0.00
```
0.004469165 = product of:
  0.022345824 = sum of:
    0.022345824 = weight(_text_:of in 4420) [ClassicSimilarity], result of:
      0.022345824 = score(doc=4420,freq=16.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.34207192 = fieldWeight in 4420, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4420)
  0.2 = coord(1/5)
```
Abstract

The exhaustivity of document descriptions and the specificity of index terms are usually regarded as independent. It is suggested that specificity should be interpreted statistically, as a function of term use rather than of term meaning. The effects on retrieval of variations in term specificity are examined, experiments with three test collections showing, in particular, that frequently-occurring terms are required for good overall performance. It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms. Results for the test collections show that considerable improvements in performance are obtained with this very simple procedure.
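The weighting idea in this abstract can be illustrated with a short sketch. The logarithmic form used here is an assumption chosen for simplicity, as are the function names and the toy frequencies; the abstract only states that matches on less frequent terms should count for more.

```python
import math

def collection_frequency_weight(docs_with_term: int, n_docs: int) -> float:
    # Rarer terms get larger weights (assumed form: ln(N / n)).
    return math.log(n_docs / docs_with_term)

def match_score(query_terms, doc_terms, doc_freq, n_docs):
    # Sum the weights of the query terms that also occur in the document.
    matched = set(query_terms) & set(doc_terms)
    return sum(collection_frequency_weight(doc_freq[t], n_docs) for t in matched)

# Hypothetical collection statistics: term -> number of documents using it.
doc_freq = {"retrieval": 500, "clump": 5}
N_DOCS = 1000

frequent = collection_frequency_weight(doc_freq["retrieval"], N_DOCS)
specific = collection_frequency_weight(doc_freq["clump"], N_DOCS)
print(specific > frequent)
```

Unweighted matching would treat a match on "retrieval" and a match on "clump" as equal; the weighted score lets the more specific term dominate.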

Source

Journal of documentation. 60(2004) no.5, S.493-502

Sparck Jones, K.: ¬The role of artificial intelligence in information retrieval (1991) 0.00
```
0.004423326 = product of:
  0.02211663 = sum of:
    0.02211663 = weight(_text_:of in 4811) [ClassicSimilarity], result of:
      0.02211663 = score(doc=4811,freq=12.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.33856338 = fieldWeight in 4811, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0625 = fieldNorm(doc=4811)
  0.2 = coord(1/5)
```
Abstract

Presents a view of the scope of artificial intelligence (AI) in information retrieval (IR). Considers potential roles of AI and IR, evaluating AI from a realistic point of view and within a wide information management potential, not just because AI is itself insufficiently developed, but because many information management tasks are properly shallow information processing ones. There is nevertheless an important place for specific applications of AI or AI-derived technology when particular constraints can be placed on the information management tasks involved.

Source

Journal of the American Society for Information Science. 42(1991) no.8, S.558-565

Sparck Jones, K.; Tait, J.I.: Automatic search term variant generation (1984) 0.00

```
0.0036116305 = product of:
  0.018058153 = sum of:
    0.018058153 = weight(_text_:of in 2918) [ClassicSimilarity], result of:
      0.018058153 = score(doc=2918,freq=2.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.27643585 = fieldWeight in 2918, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.125 = fieldNorm(doc=2918)
  0.2 = coord(1/5)
```

Source: Journal of documentation. 40(1984), S.50-66

Sparck Jones, K.; Jackson, D.M.: ¬The use of automatically obtained keyword classification for information retrieval (1970) 0.00

```
0.0036116305 = product of:
  0.018058153 = sum of:
    0.018058153 = weight(_text_:of in 5177) [ClassicSimilarity], result of:
      0.018058153 = score(doc=5177,freq=2.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.27643585 = fieldWeight in 5177, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.125 = fieldNorm(doc=5177)
  0.2 = coord(1/5)
```

Sparck Jones, K.: Reflections on TREC : TREC-2 (1995) 0.00
```
0.0036116305 = product of:
  0.018058153 = sum of:
    0.018058153 = weight(_text_:of in 1916) [ClassicSimilarity], result of:
      0.018058153 = score(doc=1916,freq=8.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.27643585 = fieldWeight in 1916, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0625 = fieldNorm(doc=1916)
  0.2 = coord(1/5)
```
Abstract

Discusses the TREC programme as a major enterprise in information retrieval research. It reviews its structure as an evaluation exercise, characterises the methods of indexing and retrieval being tested within it in terms of the approaches to system performance factors these represent; analyses the test results for solid, overall conclusions that can be drawn from them; and, in the light of the particular features of the test data, assesses TREC both for generally applicable findings that emerge from it and for directions it offers for future research

Kay, M.; Sparck Jones, K.: Automated language processing (1971) 0.00

```
0.0036116305 = product of:
  0.018058153 = sum of:
    0.018058153 = weight(_text_:of in 250) [ClassicSimilarity], result of:
      0.018058153 = score(doc=250,freq=2.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.27643585 = fieldWeight in 250, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.125 = fieldNorm(doc=250)
  0.2 = coord(1/5)
```

Source: Annual review of information science and technology. 6(1971), S.141-166

Sparck Jones, K.; Galliers, J.R.: Evaluating natural language processing systems : an analysis and review (1996) 0.00
```
0.003583304 = product of:
  0.01791652 = sum of:
    0.01791652 = weight(_text_:of in 2934) [ClassicSimilarity], result of:
      0.01791652 = score(doc=2934,freq=14.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.2742677 = fieldWeight in 2934, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=2934)
  0.2 = coord(1/5)
```
Abstract

This comprehensive state-of-the-art book is the first devoted to the important and timely issue of evaluating NLP systems. It addresses the whole area of NLP system evaluation, including aims and scope, problems and methodology. The authors provide a wide-ranging and careful analysis of evaluation concepts, reinforced with extensive illustrations; they relate systems to their environments and develop a framework for proper evaluation. The discussion of principles is completed by a detailed review of practice and strategies in the field, covering both systems for specific tasks, like translation, and core language processors. The methodology lessons drawn from the analysis and review are applied in a series of example cases. A comprehensive bibliography, a subject index, and term glossary are included.

Sparck Jones, K.; Rijsbergen, C.J. van: Progress in documentation : Information retrieval test collection (1976) 0.00
```
0.0035331852 = product of:
  0.017665926 = sum of:
    0.017665926 = weight(_text_:of in 4161) [ClassicSimilarity], result of:
      0.017665926 = score(doc=4161,freq=10.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.2704316 = fieldWeight in 4161, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4161)
  0.2 = coord(1/5)
```
Abstract

Many retrieval experiments have been based on inadequate test collections, and current research is hampered by the lack of proper collections. This short review does not attempt a fully documented survey of all the collections used in the past decade: hopefully representative examples have been studied to throw light on the requirements test collections should meet, to show how past collections have been defective, and to suggest guidelines for a future "ideal" test collection. The specifications for this collection can be taken as an indirect comment on our present state of knowledge of major retrieval system variables, and experience in conducting experiments.

Source

Journal of documentation. 32(1976) no.1, S.59-75

Sparck Jones, K.: IDF term weighting and IR research lessons (2004) 0.00

```
0.0031922606 = product of:
  0.015961302 = sum of:
    0.015961302 = weight(_text_:of in 4422) [ClassicSimilarity], result of:
      0.015961302 = score(doc=4422,freq=4.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.24433708 = fieldWeight in 4422, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.078125 = fieldNorm(doc=4422)
  0.2 = coord(1/5)
```

Abstract: Robertson comments on the theoretical status of IDF term weighting. Its history illustrates how ideas develop in a specific research context, in theory/experiment interaction, and in operational practice.
Source: Journal of documentation. 60(2004) no.5, S.521-523

Lewis, D.D.; Sparck Jones, K.: Natural language processing for information retrieval (1996) 0.00

```
0.0031601768 = product of:
  0.015800884 = sum of:
    0.015800884 = weight(_text_:of in 4144) [ClassicSimilarity], result of:
      0.015800884 = score(doc=4144,freq=2.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.24188137 = fieldWeight in 4144, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.109375 = fieldNorm(doc=4144)
  0.2 = coord(1/5)
```

Source: Communications of the Association for Computing Machinery. 39(1996) no.1, S.92-101

Sparck Jones, K.: Reflections on TREC (1997) 0.00
```
0.0030284445 = product of:
  0.015142222 = sum of:
    0.015142222 = weight(_text_:of in 580) [ClassicSimilarity], result of:
      0.015142222 = score(doc=580,freq=10.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.23179851 = fieldWeight in 580, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=580)
  0.2 = coord(1/5)
```
Abstract

This paper discusses the Text REtrieval Conferences (TREC) programme as a major enterprise in information retrieval research. It reviews its structure as an evaluation exercise, characterises the methods of indexing and retrieval being tested within it in terms of the approaches to system performance factors these represent; analyses the test results for solid, overall conclusions that can be drawn from them; and, in the light of the particular features of the test data, assesses TREC both for generally applicable findings that emerge from it and for directions it offers for future research.

Source

From classification to 'knowledge organization': Dorking revisited or 'past is prelude'. A collection of reprints to commemorate the forty year span between the Dorking Conference (First International Study Conference on Classification Research 1957) and the Sixth International Study Conference on Classification Research (London 1997). Ed.: A. Gilchrist

Robertson, S.E.; Sparck Jones, K.: Simple, proven approaches to text retrieval (1997) 0.00
```
0.002764579 = product of:
  0.013822895 = sum of:
    0.013822895 = weight(_text_:of in 4532) [ClassicSimilarity], result of:
      0.013822895 = score(doc=4532,freq=12.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.21160212 = fieldWeight in 4532, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4532)
  0.2 = coord(1/5)
```
Abstract

This technical note describes straightforward techniques for document indexing and retrieval that have been solidly established through extensive testing and are easy to apply. They are useful for many different types of text material, are viable for very large files, and have the advantage that they do not require special skills or training for searching, but are easy for end users. The document and text retrieval methods described here have a sound theoretical basis, are well established by extensive testing, and the ideas involved are now implemented in some commercial retrieval systems. Testing in the last few years has, in particular, shown that the methods presented here work very well with full texts, not only title and abstracts, and with large files of texts containing three quarters of a million documents. These tests, the TREC Tests (see Harman 1993 - 1997; IP&M 1995), have been rigorous comparative evaluations involving many different approaches to information retrieval. These techniques depend on the use of simple terms for indexing both request and document texts; on term weighting exploiting statistical information about term occurrences; on scoring for request-document matching, using these weights, to obtain a ranked search output; and on relevance feedback to modify request weights or term sets in iterative searching. The normal implementation is via an inverted file organisation using a term list with linked document identifiers, plus counting data, and pointers to the actual texts. The user's request can be a word list, phrases, sentences or extended text.
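The inverted file organisation and ranked matching described in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions: terms are whitespace-separated lowercase words, the weight is a simple ln(N / n) multiplied by the within-document count, and all names and sample texts are hypothetical. It is not the weighting scheme of the technical note itself, and relevance feedback is left out.

```python
import math
from collections import Counter, defaultdict

def build_inverted_file(docs):
    # Term list with linked document identifiers plus counting data:
    # term -> {doc_id: occurrences of the term in that document}.
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(text.lower().split()).items():
            postings[term][doc_id] = count
    return postings

def search(request, postings, n_docs):
    # Score each document by summing weight * count over request terms,
    # then return (doc_id, score) pairs ranked by descending score.
    scores = defaultdict(float)
    for term in set(request.lower().split()):
        docs_with_term = postings.get(term, {})
        if not docs_with_term:
            continue
        weight = math.log(n_docs / len(docs_with_term))
        for doc_id, count in docs_with_term.items():
            scores[doc_id] += weight * count
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical three-document collection.
docs = {
    "d1": "term weighting for text retrieval",
    "d2": "speech retrieval experiments",
    "d3": "term term weighting",
}
postings = build_inverted_file(docs)
ranking = search("term weighting", postings, n_docs=len(docs))
print([doc_id for doc_id, _ in ranking])
```

Only documents sharing at least one term with the request are touched, which is what makes the inverted file practical for large collections.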

Issue

May, 1997, Update of 1994 and 1996 versions.

Series

Technical Report TR356, University of Cambridge, Computer Laboratory

Sparck Jones, K.; Walker, S.; Robertson, S.E.: ¬A probabilistic model of information retrieval : development and comparative experiments - part 1 (2000) 0.00

```
0.002708723 = product of:
  0.013543615 = sum of:
    0.013543615 = weight(_text_:of in 4181) [ClassicSimilarity], result of:
      0.013543615 = score(doc=4181,freq=2.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.20732689 = fieldWeight in 4181, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.09375 = fieldNorm(doc=4181)
  0.2 = coord(1/5)
```

Sparck Jones, K.; Walker, S.; Robertson, S.E.: ¬A probabilistic model of information retrieval : development and comparative experiments - part 2 (2000) 0.00

```
0.002708723 = product of:
  0.013543615 = sum of:
    0.013543615 = weight(_text_:of in 4286) [ClassicSimilarity], result of:
      0.013543615 = score(doc=4286,freq=2.0), product of:
        0.06532493 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.04177434 = queryNorm
        0.20732689 = fieldWeight in 4286, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.09375 = fieldNorm(doc=4286)
  0.2 = coord(1/5)
```
