Search (12 results, page 1 of 1)

Sparck Jones, K.; Tait, J.I.: Automatic search term variant generation (1984) 0.02

0.022654874 = product of:
  0.13592924 = sum of:
    0.13592924 = weight(_text_:documentation in 2918) [ClassicSimilarity], result of:
      0.13592924 = score(doc=2918,freq=2.0), product of:
        0.1765992 = queryWeight, product of:
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.040559217 = queryNorm
        0.76970476 = fieldWeight in 2918, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.125 = fieldNorm(doc=2918)
  0.16666667 = coord(1/6)
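
The arithmetic in the explain tree above can be retraced by hand. The following is a minimal sketch, not the search engine's own code: the function name and parameter names are made up for illustration, and the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumed because they reproduce the displayed values 1.4142135 and 4.354108. The values for queryNorm, fieldNorm and coord are copied from the tree rather than derived.

```python
import math


def explain_score(freq, doc_freq, max_docs, field_norm, query_norm, coord):
    """Recompute a single-term score following the layout of the explain tree.

    All parameter names are illustrative; the values come from the tree itself.
    """
    tf = math.sqrt(freq)                             # tf(freq=2.0) -> about 1.4142135
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # idf(docFreq=1544, maxDocs=44218)
    query_weight = idf * query_norm                  # "queryWeight, product of:"
    field_weight = tf * idf * field_norm             # "fieldWeight in 2918, product of:"
    return query_weight * field_weight * coord       # outer "product of:" with coord(1/6)


# Values read off the first result (doc=2918).
score = explain_score(
    freq=2.0,
    doc_freq=1544,
    max_docs=44218,
    field_norm=0.125,
    query_norm=0.040559217,
    coord=1 / 6,
)
print(score)
```

The result should agree with the displayed 0.022654874 up to single-precision rounding. Only freq and fieldNorm differ between the records on this page, which is why the ranking follows the fieldWeight values.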

Source: Journal of documentation. 40(1984), S.50-66

Sparck Jones, K.: ¬A statistical interpretation of term specificity and its application in retrieval (1972) 0.02

0.022654874 = product of:
  0.13592924 = sum of:
    0.13592924 = weight(_text_:documentation in 5187) [ClassicSimilarity], result of:
      0.13592924 = score(doc=5187,freq=2.0), product of:
        0.1765992 = queryWeight, product of:
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.040559217 = queryNorm
        0.76970476 = fieldWeight in 5187, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.125 = fieldNorm(doc=5187)
  0.16666667 = coord(1/6)

Source: Journal of documentation. 28(1972), S.11-21

Sparck Jones, K.: Search term relevance weighting given little relevance information (1979) 0.02

0.016991157 = product of:
  0.101946935 = sum of:
    0.101946935 = weight(_text_:documentation in 1939) [ClassicSimilarity], result of:
      0.101946935 = score(doc=1939,freq=2.0), product of:
        0.1765992 = queryWeight, product of:
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.040559217 = queryNorm
        0.57727855 = fieldWeight in 1939, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.09375 = fieldNorm(doc=1939)
  0.16666667 = coord(1/6)

Source: Journal of documentation. 35(1979), S.30-48

Sparck Jones, K.: IDF term weighting and IR research lessons (2004) 0.01

0.014159298 = product of:
  0.08495578 = sum of:
    0.08495578 = weight(_text_:documentation in 4422) [ClassicSimilarity], result of:
      0.08495578 = score(doc=4422,freq=2.0), product of:
        0.1765992 = queryWeight, product of:
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.040559217 = queryNorm
        0.48106548 = fieldWeight in 4422, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.078125 = fieldNorm(doc=4422)
  0.16666667 = coord(1/6)

Source: Journal of documentation. 60(2004) no.5, S.521-523

Sparck Jones, K.; Rijsbergen, C.J. van: Progress in documentation : Information retrieval test collection (1976) 0.01

0.01401699 = product of:
  0.08410194 = sum of:
    0.08410194 = weight(_text_:documentation in 4161) [ClassicSimilarity], result of:
      0.08410194 = score(doc=4161,freq=4.0), product of:
        0.1765992 = queryWeight, product of:
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.040559217 = queryNorm
        0.47623056 = fieldWeight in 4161, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4161)
  0.16666667 = coord(1/6)

Source: Journal of documentation. 32(1976) no.1, S.59-75

Lewis, D.D.; Sparck Jones, K.: Natural language processing for information retrieval (1997) 0.01

0.011327437 = product of:
  0.06796462 = sum of:
    0.06796462 = weight(_text_:documentation in 575) [ClassicSimilarity], result of:
      0.06796462 = score(doc=575,freq=2.0), product of:
        0.1765992 = queryWeight, product of:
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.040559217 = queryNorm
        0.38485238 = fieldWeight in 575, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.0625 = fieldNorm(doc=575)
  0.16666667 = coord(1/6)

Imprint: The Hague : International Federation for Information and Documentation (FID)

Sparck Jones, K.: Some thoughts on classification for retrieval (1970) 0.01

0.010012135 = product of:
  0.06007281 = sum of:
    0.06007281 = weight(_text_:documentation in 4327) [ClassicSimilarity], result of:
      0.06007281 = score(doc=4327,freq=4.0), product of:
        0.1765992 = queryWeight, product of:
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.040559217 = queryNorm
        0.34016466 = fieldWeight in 4327, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4327)
  0.16666667 = coord(1/6)

Footnote: Reprinted in: Journal of documentation. 61(2005) no.5, S.571-581.
Source: Journal of documentation. 26(1970), S.89-101

Sparck Jones, K.: Some thoughts on classification for retrieval (2005) 0.01
```
0.010012135 = product of:
  0.06007281 = sum of:
    0.06007281 = weight(_text_:documentation in 4392) [ClassicSimilarity], result of:
      0.06007281 = score(doc=4392,freq=4.0), product of:
        0.1765992 = queryWeight, product of:
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.040559217 = queryNorm
        0.34016466 = fieldWeight in 4392, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4392)
  0.16666667 = coord(1/6)
```
Abstract

Purpose - This paper, originally published in 1970 (Journal of documentation. 26(1970), S.89-101), considered the suggestion that classifications for retrieval should be constructed automatically and raised some serious problems concerning the sorts of classification which were required, and the way in which formal classification theories should be exploited, given that a retrieval classification is required for a purpose. These difficulties had not been sufficiently considered, and the paper therefore attempts an analysis of them, though no solutions of immediate application could be suggested. Design/methodology/approach - Starting with the illustrative proposition that a polythetic, multiple, unordered classification is required in automatic thesaurus construction, this is considered in the context of classification in general, where eight sorts of classification can be distinguished, each covering a range of class definitions and class-finding algorithms. Findings - Since there is generally no natural or best classification of a set of objects as such, the evaluation of alternative classifications requires either formal criteria of goodness of fit, or, if a classification is required for a purpose, a precise statement of that purpose. In any case a substantive theory of classification is needed, which does not exist; and, since sufficiently precise specifications of retrieval requirements are also lacking, the only currently available approach to automatic classification experiments for information retrieval is to do enough of them. Originality/value - Gives insights into the classification of material for information retrieval.

Source: Journal of documentation. 61(2005) no.5, S.571-581

Sparck Jones, K.: Revisiting classification for retrieval (2005) 0.01

0.009911507 = product of:
  0.059469044 = sum of:
    0.059469044 = weight(_text_:documentation in 4328) [ClassicSimilarity], result of:
      0.059469044 = score(doc=4328,freq=2.0), product of:
        0.1765992 = queryWeight, product of:
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.040559217 = queryNorm
        0.33674583 = fieldWeight in 4328, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4328)
  0.16666667 = coord(1/6)

Source: Journal of documentation. 61(2005) no.5, S.598-601

Sparck Jones, K.: ¬A statistical interpretation of term specificity and its application in retrieval (2004) 0.01

0.009911507 = product of:
  0.059469044 = sum of:
    0.059469044 = weight(_text_:documentation in 4420) [ClassicSimilarity], result of:
      0.059469044 = score(doc=4420,freq=2.0), product of:
        0.1765992 = queryWeight, product of:
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.040559217 = queryNorm
        0.33674583 = fieldWeight in 4420, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4420)
  0.16666667 = coord(1/6)

Source: Journal of documentation. 60(2004) no.5, S.493-502

Sparck Jones, K.: Reflections on TREC (1997) 0.01

0.008495579 = product of:
  0.050973468 = sum of:
    0.050973468 = weight(_text_:documentation in 580) [ClassicSimilarity], result of:
      0.050973468 = score(doc=580,freq=2.0), product of:
        0.1765992 = queryWeight, product of:
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.040559217 = queryNorm
        0.28863928 = fieldWeight in 580, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.046875 = fieldNorm(doc=580)
  0.16666667 = coord(1/6)

Imprint: The Hague : International Federation for Information and Documentation (FID)

Needham, R.M.; Sparck Jones, K.: Keywords and clumps (1985) 0.01
```
0.007008495 = product of:
  0.04205097 = sum of:
    0.04205097 = weight(_text_:documentation in 3645) [ClassicSimilarity], result of:
      0.04205097 = score(doc=3645,freq=4.0), product of:
        0.1765992 = queryWeight, product of:
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.040559217 = queryNorm
        0.23811528 = fieldWeight in 3645, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.354108 = idf(docFreq=1544, maxDocs=44218)
          0.02734375 = fieldNorm(doc=3645)
  0.16666667 = coord(1/6)
```
Abstract

The selection that follows was chosen as it represents "a very early paper on the possibilities allowed by computers in documentation." In the early 1960s computers were being used to provide simple automatic indexing systems wherein keywords were extracted from documents. The problem with such systems was that they lacked vocabulary control, thus documents related in subject matter were not always collocated in retrieval. To improve retrieval by improving recall is the raison d'être of vocabulary control tools such as classifications and thesauri. The question arose whether it was possible by automatic means to construct classes of terms which, when substituted one for another, could be used to improve retrieval performance. One of the first theoretical approaches to this question was initiated by R. M. Needham and Karen Sparck Jones at the Cambridge Language Research Institute in England. The question was later pursued using experimental methodologies by Sparck Jones, who, as a Senior Research Associate in the Computer Laboratory at the University of Cambridge, has devoted her life's work to research in information retrieval and automatic natural language processing. Based on the principles of numerical taxonomy, automatic classification techniques start from the premise that two objects are similar to the degree that they share attributes in common. When these two objects are keywords, their similarity is measured in terms of the number of documents they index in common. Step 1 in automatic classification is to compute mathematically the degree to which two terms are similar. Step 2 is to group together those terms that are "most similar" to each other, forming equivalence classes of intersubstitutable terms. The technique for forming such classes varies and is the factor that characteristically distinguishes different approaches to automatic classification.
The technique used by Needham and Sparck Jones, that of clumping, is described in the selection that follows. Questions that must be asked are whether the use of automatically generated classes really does improve retrieval performance and whether there is a true economic advantage in substituting mechanical for manual labor. Several years after her work with clumping, Sparck Jones was to observe that while it was not wholly satisfactory in itself, it was valuable in that it stimulated research into automatic classification. To this it might be added that it was valuable in that it introduced to library/information science the methods of numerical taxonomy, thus stimulating us to think again about the fundamental nature and purpose of classification. In this connection it might be useful to review how automatically derived classes differ from those of manually constructed classifications: 1) the manner of their derivation is purely a posteriori, the ultimate operationalization of the principle of literary warrant; 2) the relationship between members forming such classes is essentially statistical; the members of a given class are similar to each other not because they possess the class-defining characteristic but by virtue of sharing a family resemblance; and finally, 3) automatically derived classes are not related meaningfully one to another, that is, they are not ordered in traditional hierarchical and precedence relationships.
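
The two steps named in the abstract (measure how similar two keywords are by the documents they index in common, then group the most similar keywords) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy index, the normalization of the shared-document count (a Tanimoto ratio), the threshold, and the grouping rule. The grouping is plain single-link merging, which is not the clumping technique of Needham and Sparck Jones and, unlike clumps, yields non-overlapping classes.

```python
from itertools import combinations

# Hypothetical toy index: keyword -> set of ids of the documents it indexes.
index = {
    "retrieval": {1, 2, 3, 4},
    "search": {2, 3, 4},
    "thesaurus": {5, 6},
    "vocabulary": {5, 6, 7},
}


def similarity(a, b):
    """Step 1: documents indexed in common, divided by documents indexed by either.

    The division (Tanimoto ratio) is an assumed normalization.
    """
    return len(index[a] & index[b]) / len(index[a] | index[b])


def group_terms(terms, threshold):
    """Step 2 stand-in: merge terms whose similarity reaches the threshold.

    Single-link merging via a small union-find; not the clumping method.
    """
    parent = {t: t for t in terms}

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    for a, b in combinations(terms, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for t in terms:
        groups.setdefault(find(t), set()).add(t)
    return sorted(sorted(g) for g in groups.values())


classes = group_terms(sorted(index), threshold=0.5)
print(classes)
```

With this toy index, "retrieval" and "search" share three of four documents and "thesaurus" and "vocabulary" share two of three, so each pair forms one class. Raising or lowering the threshold changes which terms are treated as intersubstitutable.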

Footnote: Original in: Journal of documentation 20(1964) no.1, S.5-15.
