Search (1 result, page 1 of 1)

Ferber, R.: Automated indexing with thesaurus descriptors : a co-occurrence based approach to multilingual retrieval (1997) 0.01
```
0.010386601 = product of:
  0.062319607 = sum of:
    0.062319607 = product of:
      0.12463921 = sum of:
        0.12463921 = weight(_text_:thesaurus in 4144) [ClassicSimilarity], result of:
          0.12463921 = score(doc=4144,freq=10.0), product of:
            0.21834905 = queryWeight, product of:
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.04725067 = queryNorm
            0.5708255 = fieldWeight in 4144, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.6210785 = idf(docFreq=1182, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4144)
      0.5 = coord(1/2)
  0.16666667 = coord(1/6)
```
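The score tree above can be re-derived by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The values for queryNorm, fieldNorm, and the two coord factors are copied from the explanation rather than recomputed, because they depend on the full query and on index-time data not shown here. The function and variable names are illustrative only.

```python
import math

def classic_tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw term frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency as used by classic TF-IDF scoring.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

# Values read directly from the explanation above.
FREQ = 10.0
DOC_FREQ = 1182
MAX_DOCS = 44218
QUERY_NORM = 0.04725067   # copied; depends on all query terms
FIELD_NORM = 0.0390625    # copied; stored at index time
COORD_INNER = 1 / 2       # 1 of 2 inner clauses matched
COORD_OUTER = 1 / 6       # 1 of 6 outer clauses matched

idf = classic_idf(DOC_FREQ, MAX_DOCS)
query_weight = idf * QUERY_NORM
field_weight = classic_tf(FREQ) * idf * FIELD_NORM
term_score = query_weight * field_weight
final_score = term_score * COORD_INNER * COORD_OUTER

print(f"idf          = {idf:.7f}")
print(f"queryWeight  = {query_weight:.8f}")
print(f"fieldWeight  = {field_weight:.7f}")
print(f"term score   = {term_score:.8f}")
print(f"final score  = {final_score:.9f}")
```

Small differences in the last digits are expected, since the search engine computes in single precision while this sketch uses double precision.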
Abstract

Indexing documents with descriptors from a multilingual thesaurus is one approach to multilingual information retrieval. However, manual indexing is expensive. Automated indexing methods generally use terms found in the document. Thesaurus descriptors are complex terms that often do not occur in documents or have specific meanings within the thesaurus; therefore, most weighting schemes of automated indexing methods are not suited to selecting thesaurus descriptors. This paper describes a linear associative system that uses similarity values extracted from a large corpus of manually indexed documents to construct a rank ordering of the descriptors for a given document title. The system is adaptive and has to be tuned with a training sample of records for the specific task. The system was tested on a corpus of some 80,000 bibliographic records. The results show high variability with changing parameter values. This indicates that it is very important to adapt the model empirically to the specific situation in which it is used. The overall median rank of the manually assigned descriptors in the automatically generated ranked list of all 3,631 descriptors is 14 for the set used to adapt the system and 11 for a test set not used in the optimization process. This result shows that the optimization is not a fit to a specific training set but a real adaptation of the model to the setting.
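The general idea of ranking descriptors from title words and then evaluating by median rank can be illustrated with a toy example. This is only a sketch under stated assumptions, not the model from the paper: the abstract does not give the similarity formula or the tunable parameters, so the association value used here (the share of training titles containing a word that were manually assigned a descriptor) and the plain summation over title words are stand-ins. The training records, descriptor names, and function names are all hypothetical.

```python
from collections import Counter, defaultdict
from statistics import median

# Hypothetical toy corpus: (title, manually assigned descriptors).
TRAIN = [
    ("thesaurus based indexing of documents", {"INDEXING", "THESAURI"}),
    ("multilingual retrieval with a thesaurus", {"THESAURI", "MULTILINGUAL RETRIEVAL"}),
    ("automatic indexing methods", {"INDEXING", "AUTOMATION"}),
]

def train(records):
    """Estimate word-to-descriptor association values from co-occurrence counts."""
    word_count = Counter()
    pair_count = defaultdict(Counter)
    for title, descriptors in records:
        for word in set(title.lower().split()):
            word_count[word] += 1
            for descriptor in descriptors:
                pair_count[word][descriptor] += 1
    # assoc[w][d]: fraction of training titles containing w that carry d.
    # This particular estimate is an assumption made for illustration.
    return {
        w: {d: c / word_count[w] for d, c in counts.items()}
        for w, counts in pair_count.items()
    }

def rank_descriptors(title, assoc, all_descriptors):
    """Linear combination: sum association values over the title's words."""
    scores = dict.fromkeys(all_descriptors, 0.0)
    for word in set(title.lower().split()):
        for descriptor, value in assoc.get(word, {}).items():
            scores[descriptor] += value
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

def median_rank(ranking, assigned):
    """Median 1-based position of the manually assigned descriptors."""
    return median(ranking.index(d) + 1 for d in assigned)

assoc = train(TRAIN)
all_descriptors = sorted({d for _, ds in TRAIN for d in ds})
ranking = rank_descriptors("multilingual thesaurus indexing", assoc, all_descriptors)
print(ranking)
print(median_rank(ranking, {"THESAURI", "MULTILINGUAL RETRIEVAL"}))
```

A lower median rank means the manually assigned descriptors appear nearer the top of the generated list, which is the sense in which the reported values of 14 and 11 out of 3,631 descriptors are to be read.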