Search (92 results, page 1 of 5)

  • theme_ss:"Automatisches Indexieren"
  1. Gomez, I.: Coping with the problem of subject classification diversity (1996) 0.06
    0.06306431 = product of:
      0.09459646 = sum of:
        0.059888236 = weight(_text_:bibliographic in 5074) [ClassicSimilarity], result of:
          0.059888236 = score(doc=5074,freq=2.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.30108726 = fieldWeight in 5074, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5074)
        0.034708228 = product of:
          0.069416456 = sum of:
            0.069416456 = weight(_text_:classification in 5074) [ClassicSimilarity], result of:
              0.069416456 = score(doc=5074,freq=6.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.42661208 = fieldWeight in 5074, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5074)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The delimitation of a research field in bibliometric studies presents the problem of the diversity of subject classifications used in the sources of input and output data. Classification of documents according to thematic codes or keywords is the most accurate method, mainly used in specialized bibliographic or patent databases. Classification of journals into disciplines offers lower specificity and has some shortcomings, such as the change over time of both journals and disciplines and the increasing interdisciplinarity of research. Standardization of subject classifications emerges as an important point in bibliometric studies in order to allow international comparisons, although flexibility is needed to meet the needs of local studies.
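    Editorial note on the relevance figures: the indented breakdowns shown with each result follow Lucene's ClassicSimilarity (TF-IDF) explanation format, where each term weight is queryWeight x fieldWeight, with queryWeight = idf x queryNorm, fieldWeight = sqrt(termFreq) x idf x fieldNorm, and coord() scaling the sum by the fraction of query clauses matched. The following sketch, with variable names of our own choosing, recomputes the 0.06 score of result 1 from the numbers printed above.

      # Recompute the ClassicSimilarity score shown for result 1 (doc 5074).
      #   weight(term) = queryWeight * fieldWeight
      #   queryWeight  = idf * queryNorm
      #   fieldWeight  = sqrt(termFreq) * idf * fieldNorm
      #   score        = coord * sum of matching clause weights
      from math import isclose, sqrt

      QUERY_NORM = 0.051092815

      def term_weight(freq, idf, field_norm):
          query_weight = idf * QUERY_NORM               # e.g. 3.893044 * 0.051092815 = 0.19890657
          field_weight = sqrt(freq) * idf * field_norm  # e.g. 1.4142135 * 3.893044 * 0.0546875
          return query_weight * field_weight

      w_bibliographic  = term_weight(freq=2.0, idf=3.893044, field_norm=0.0546875)
      w_classification = term_weight(freq=6.0, idf=3.1847067, field_norm=0.0546875) * 0.5  # coord(1/2) of the nested clause

      score = (w_bibliographic + w_classification) * (2.0 / 3.0)  # coord(2/3): two of three query clauses matched
      assert isclose(w_bibliographic, 0.059888236, rel_tol=1e-5)
      assert isclose(score, 0.06306431, rel_tol=1e-5)
      print(round(score, 8))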
  2. Golub, K.: Automated subject indexing : an overview (2021) 0.05
    0.053284697 = product of:
      0.07992704 = sum of:
        0.059888236 = weight(_text_:bibliographic in 718) [ClassicSimilarity], result of:
          0.059888236 = score(doc=718,freq=2.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.30108726 = fieldWeight in 718, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=718)
        0.020038802 = product of:
          0.040077604 = sum of:
            0.040077604 = weight(_text_:classification in 718) [ClassicSimilarity], result of:
              0.040077604 = score(doc=718,freq=2.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.24630459 = fieldWeight in 718, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=718)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In the face of the ever-increasing document volume, libraries around the globe are increasingly exploring (semi-)automated approaches to subject indexing. This helps sustain bibliographic objectives, enrich metadata, and establish more connections across documents from various collections, effectively leading to improved information retrieval and access. However, generally accepted automated approaches that are functional in operational systems are lacking. This article aims to provide an overview of basic principles used for automated subject indexing, major approaches in relation to their possible application in actual library systems, existing working examples, as well as related challenges calling for further research.
    Source
    Cataloging and classification quarterly. 59(2021) no.8, p.702-719
  3. Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.04
    0.04201304 = product of:
      0.06301956 = sum of:
        0.04277731 = weight(_text_:bibliographic in 5400) [ClassicSimilarity], result of:
          0.04277731 = score(doc=5400,freq=2.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.21506234 = fieldWeight in 5400, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
        0.020242248 = product of:
          0.040484495 = sum of:
            0.040484495 = weight(_text_:classification in 5400) [ClassicSimilarity], result of:
              0.040484495 = score(doc=5400,freq=4.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.24880521 = fieldWeight in 5400, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5400)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. It is also desirable to be able to identify a small set of entities (e.g., authors, citations, bibliographic records) which are most relevant to a query. This gets more difficult when the amount of data increases dramatically. Data sparsity and model scalability are the major challenges to solving this type of extreme multilabel classification problem automatically. In this paper, we propose to address this problem in two steps: we first embed different types of entities into the same semantic space, where similarity could be computed easily; second, we propose a novel non-parametric method to identify the most relevant entities in addition to direct semantic similarities. We show how effectively this approach predicts even very specialised subjects, which are associated with few documents in the training set and are more problematic for a classifier.
    Footnote
    Contribution to a special issue: Research Information Systems and Science Classifications; including papers from "Trajectories for Research: Fathoming the Promise of the NARCIS Classification," 27-28 September 2018, The Hague, The Netherlands.
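    The two-step approach summarised in this abstract can be illustrated with a minimal sketch (not the authors' system): documents and subject labels are embedded into one vector space, and subjects for a new document are then ranked by cosine similarity. The toy vocabulary, random word vectors, and averaging scheme below are placeholder assumptions.

      # "Embed first, then predict": embed documents and subjects into the same space,
      # then predict subjects for a new document by cosine similarity.
      import numpy as np

      rng = np.random.default_rng(0)
      vocab = {w: rng.normal(size=50) for w in
               "library indexing retrieval clustering embedding subject metadata neural".split()}

      def embed(text):
          # Average the vectors of known words (toy stand-in for a trained embedding model).
          vecs = [vocab[w] for w in text.lower().split() if w in vocab]
          v = np.mean(vecs, axis=0) if vecs else np.zeros(50)
          return v / (np.linalg.norm(v) + 1e-12)

      subjects = {
          "Automatic indexing":  embed("indexing subject metadata"),
          "Document clustering": embed("clustering retrieval"),
          "Word embeddings":     embed("embedding neural"),
      }

      def predict(text, k=2):
          doc = embed(text)
          return sorted(subjects, key=lambda s: -float(doc @ subjects[s]))[:k]

      print(predict("neural embedding models for subject indexing"))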
  4. Milstead, J.L.: Methodologies for subject analysis in bibliographic databases (1992) 0.03
    0.028231587 = product of:
      0.08469476 = sum of:
        0.08469476 = weight(_text_:bibliographic in 2311) [ClassicSimilarity], result of:
          0.08469476 = score(doc=2311,freq=4.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.4258017 = fieldWeight in 2311, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2311)
      0.33333334 = coord(1/3)
    
    Abstract
    The goal of the study was to determine the state of the art of subject analysis as applied to large bibliographic databases. The intent was to gather and evaluate information, casting it in a form that could be applied by management. There was no attempt to determine actual costs or trade-offs among costs and possible benefits. Commercial automatic indexing packages were also reviewed. The overall conclusion was that database producers should begin working seriously on upgrading their thesauri and codifying their indexing policies as a means of moving toward the development of machine aids to indexing, but that fully automatic indexing is not yet ready for wholesale implementation.
  5. Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: ¬The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.03
    0.027928818 = product of:
      0.08378645 = sum of:
        0.08378645 = sum of:
          0.056096964 = weight(_text_:classification in 1441) [ClassicSimilarity], result of:
            0.056096964 = score(doc=1441,freq=12.0), product of:
              0.16271563 = queryWeight, product of:
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.051092815 = queryNorm
              0.3447546 = fieldWeight in 1441, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.03125 = fieldNorm(doc=1441)
          0.027689485 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
            0.027689485 = score(doc=1441,freq=2.0), product of:
              0.17891833 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051092815 = queryNorm
              0.15476047 = fieldWeight in 1441, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=1441)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper presents research on syntactic structures known as noun phrases (NPs) applied to increase the effectiveness and efficiency of document classification mechanisms. Our hypothesis is that NPs can be used instead of single words as semantic aggregators, reducing the number of words fed to the classification system without losing semantic coverage and thus increasing its efficiency. The experiment divided the document classification process into three phases: a) NP preprocessing; b) system training; and c) classification experiments. In the first step, a corpus of digitalized texts was submitted to a natural language processing platform in which part-of-speech tagging was done, and then PERL scripts from the PALAVRAS package were used to extract the noun phrases. The preprocessing also involved a) removing low-meaning NP pre-modifiers, such as quantifiers; b) identification of synonyms and their substitution by common hyperonyms; and c) stemming of the relevant words contained in the NPs, for similarity checking against other NPs. The first tests with the resulting documents demonstrated the effectiveness of this step. We compared the structural similarity of the documents before and after the pre-processing steps of phase one; the texts remained consistent with the originals and kept their readability. The second phase involves submitting the modified documents to an SVM algorithm to identify clusters and classify the documents, with the classification rules established using a machine learning approach. Finally, tests will be conducted to check the effectiveness of the whole process.
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
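    A rough sketch of the kind of pipeline the abstract describes, with noun phrases as classification features: here spaCy's noun_chunks stands in for the PALAVRAS/PERL extraction step and scikit-learn's LinearSVC for the classification phase; the training texts, labels, and model choices are placeholders, not the authors' setup.

      # Classify documents using noun phrases instead of single words as features.
      import spacy
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.pipeline import Pipeline
      from sklearn.svm import LinearSVC

      nlp = spacy.load("en_core_web_sm")

      def noun_phrases(text):
          # Reduce a document to its noun phrases (lemmatized, quantifiers and stopwords dropped).
          doc = nlp(text)
          chunks = []
          for np_ in doc.noun_chunks:
              lemmas = [t.lemma_.lower() for t in np_ if t.pos_ in ("NOUN", "PROPN", "ADJ")]
              if lemmas:
                  chunks.append("_".join(lemmas))
          return " ".join(chunks)

      # Toy training data standing in for the digitalized corpus used in the paper.
      texts  = ["Automatic indexing of library catalogues",
                "Clustering algorithms for document retrieval"]
      labels = ["indexing", "clustering"]

      model = Pipeline([("tfidf", TfidfVectorizer()), ("svm", LinearSVC())])
      model.fit([noun_phrases(t) for t in texts], labels)
      print(model.predict([noun_phrases("subject indexing for digital library catalogues")]))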
  6. Ward, M.L.: ¬The future of the human indexer (1996) 0.03
    0.025295487 = product of:
      0.07588646 = sum of:
        0.07588646 = sum of:
          0.034352235 = weight(_text_:classification in 7244) [ClassicSimilarity], result of:
            0.034352235 = score(doc=7244,freq=2.0), product of:
              0.16271563 = queryWeight, product of:
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.051092815 = queryNorm
              0.21111822 = fieldWeight in 7244, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.046875 = fieldNorm(doc=7244)
          0.041534226 = weight(_text_:22 in 7244) [ClassicSimilarity], result of:
            0.041534226 = score(doc=7244,freq=2.0), product of:
              0.17891833 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051092815 = queryNorm
              0.23214069 = fieldWeight in 7244, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=7244)
      0.33333334 = coord(1/3)
    
    Abstract
    Considers the principles of indexing and the intellectual skills involved in order to determine what automatic indexing systems would be required in order to supplant or complement the human indexer. Good indexing requires: considerable prior knowledge of the literature; judgement as to what to index and to what depth; reading skills; abstracting skills; and classification skills. Illustrates these features with a detailed description of the abstracting and indexing processes involved in generating entries for the mechanical engineering database POWERLINK. Briefly assesses the possibility of replacing human indexers with specialist indexing software, with particular reference to the Object Analyzer from the InTEXT automatic indexing system and using the criteria described for human indexers. At present, it is unlikely that the automatic indexer will replace the human indexer, but when more primary texts are available in electronic form, it may be a useful productivity tool for dealing with large quantities of low-grade texts (should they be wanted in the database).
    Date
    9. 2.1997 18:44:22
  7. Polity, Y.: Vers une ergonomie linguistique (1994) 0.02
    0.022814568 = product of:
      0.0684437 = sum of:
        0.0684437 = weight(_text_:bibliographic in 36) [ClassicSimilarity], result of:
          0.0684437 = score(doc=36,freq=2.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.34409973 = fieldWeight in 36, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0625 = fieldNorm(doc=36)
      0.33333334 = coord(1/3)
    
    Abstract
    Analyzed a special type of man-machine interaction, that of searching an information system with natural language. A model for full-text processing for information retrieval was proposed that considered the system's users and how they employ information. Describes how INIST (the National Institute for Scientific and Technical Information) is developing computer-assisted indexing as an aid to improving relevance when retrieving information from bibliographic data banks.
  8. Hirawa, M.: Role of keywords in the network searching era (1998) 0.02
    0.022814568 = product of:
      0.0684437 = sum of:
        0.0684437 = weight(_text_:bibliographic in 3446) [ClassicSimilarity], result of:
          0.0684437 = score(doc=3446,freq=2.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.34409973 = fieldWeight in 3446, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0625 = fieldNorm(doc=3446)
      0.33333334 = coord(1/3)
    
    Abstract
    A survey of Japanese OPACs available on the Internet was conducted relating to the use of keywords for subject access. The findings suggest that present OPACs are not capable of storing subject-oriented information. Currently available keyword access derives from a merely title-based retrieval system. Contents data should be added to bibliographic records as an efficient way of providing subject access, and costings for this process should be estimated. Word standardisation issues must also be addressed.
  9. Pulgarin, A.; Gil-Leiva, I.: Bibliometric analysis of the automatic indexing literature : 1956-2000 (2004) 0.02
    0.019962747 = product of:
      0.059888236 = sum of:
        0.059888236 = weight(_text_:bibliographic in 2566) [ClassicSimilarity], result of:
          0.059888236 = score(doc=2566,freq=2.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.30108726 = fieldWeight in 2566, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2566)
      0.33333334 = coord(1/3)
    
    Abstract
    We present a bibliometric study of a corpus of 839 bibliographic references about automatic indexing, covering the period 1956-2000. We analyse the distribution of authors and works, the obsolescence and its dispersion, and the distribution of the literature by topic, year, and source type. We conclude that: (i) there has been a constant interest on the part of researchers; (ii) the most studied topics were the techniques and methods employed and the general aspects of automatic indexing; (iii) the productivity of the authors does fit a Lotka distribution (Dmax=0.02 and critical value=0.054); (iv) the annual aging factor is 95%; and (v) the dispersion of the literature is low.
  10. Sparck Jones, K.: Automatic keyword classification for information retrieval (1971) 0.02
    0.019084575 = product of:
      0.057253722 = sum of:
        0.057253722 = product of:
          0.114507444 = sum of:
            0.114507444 = weight(_text_:classification in 5176) [ClassicSimilarity], result of:
              0.114507444 = score(doc=5176,freq=2.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.70372736 = fieldWeight in 5176, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.15625 = fieldNorm(doc=5176)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
  11. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02
    0.018459657 = product of:
      0.05537897 = sum of:
        0.05537897 = product of:
          0.11075794 = sum of:
            0.11075794 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.11075794 = score(doc=402,freq=2.0), product of:
                0.17891833 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051092815 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  12. Humphrey, S.M.: Automatic indexing of documents from journal descriptors : a preliminary investigation (1999) 0.02
    0.017110925 = product of:
      0.05133277 = sum of:
        0.05133277 = weight(_text_:bibliographic in 3769) [ClassicSimilarity], result of:
          0.05133277 = score(doc=3769,freq=2.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.2580748 = fieldWeight in 3769, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.046875 = fieldNorm(doc=3769)
      0.33333334 = coord(1/3)
    
    Abstract
    A new, fully automated approach for indexing documents is presented, based on associating textwords in a training set of bibliographic citations with the indexing of journals. This journal-level indexing is in the form of a consistent, timely set of journal descriptors (JDs) indexing the individual journals themselves. This indexing is maintained in journal records in a serials authority database. The advantage of this novel approach is that the training set does not depend on previous manual indexing of thousands of documents (i.e., any such indexing already in the training set is not used), but rather on the relatively small intellectual effort of indexing at the journal level, usually a matter of a few thousand unique journals for which retrospective indexing to maintain consistency and currency may be feasible. If successful, JD indexing would provide topical categorization of documents outside the training set, i.e., journal articles, monographs, Web documents, reports from the grey literature, etc., and therefore be applied in searching. Because JDs are quite general, corresponding to subject domains, their most probable use would be for improving or refining search results.
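    The approach described above amounts to multi-label text categorization in which every training citation inherits the journal descriptors (JDs) of the journal it appeared in. A minimal sketch of that idea, with invented journals, JDs, and titles (not Humphrey's actual method or data):

      # Journal-descriptor (JD) indexing sketch: citations inherit the JDs of their journal;
      # a multi-label classifier then maps free text to ranked JDs.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.multiclass import OneVsRestClassifier
      from sklearn.pipeline import Pipeline
      from sklearn.preprocessing import MultiLabelBinarizer

      journal_jds = {                           # journal-level indexing (serials authority data)
          "Journal of Cataloging": ["cataloging", "classification"],
          "Neural Computation":    ["machine learning", "neural networks"],
      }
      citations = [("Journal of Cataloging", "Classification and subject cataloging of library records"),
                   ("Neural Computation",    "Neural networks for pattern recognition")]

      texts  = [title for _, title in citations]
      mlb    = MultiLabelBinarizer()
      labels = mlb.fit_transform([journal_jds[j] for j, _ in citations])

      model = Pipeline([("tfidf", TfidfVectorizer()),
                        ("clf", OneVsRestClassifier(LogisticRegression(max_iter=1000)))])
      model.fit(texts, labels)

      probs = model.predict_proba(["A neural network approach to classifying library records"])[0]
      print(sorted(zip(mlb.classes_, probs), key=lambda x: -x[1]))  # JDs ranked for an unseen document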
  13. Fuhr, N.; Niewelt, B.: ¬Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.02
    0.0161522 = product of:
      0.048456598 = sum of:
        0.048456598 = product of:
          0.096913196 = sum of:
            0.096913196 = weight(_text_:22 in 262) [ClassicSimilarity], result of:
              0.096913196 = score(doc=262,freq=2.0), product of:
                0.17891833 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051092815 = queryNorm
                0.5416616 = fieldWeight in 262, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=262)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    20.10.2000 12:22:23
  14. Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.02
    0.0161522 = product of:
      0.048456598 = sum of:
        0.048456598 = product of:
          0.096913196 = sum of:
            0.096913196 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
              0.096913196 = score(doc=6265,freq=2.0), product of:
                0.17891833 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051092815 = queryNorm
                0.5416616 = fieldWeight in 6265, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6265)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information outlook. 9(2005) no.8, S.22-23
  15. Griffiths, A.; Robinson, L.A.; Willett, P.: Hierarchic agglomerative clustering methods for automatic document classification (1984) 0.02
    0.015267659 = product of:
      0.045802977 = sum of:
        0.045802977 = product of:
          0.091605954 = sum of:
            0.091605954 = weight(_text_:classification in 2414) [ClassicSimilarity], result of:
              0.091605954 = score(doc=2414,freq=2.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.5629819 = fieldWeight in 2414, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.125 = fieldNorm(doc=2414)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
  16. Borko, H.; Bernick, M.: Automatic document classification : T.2 (1964) 0.02
    0.015267659 = product of:
      0.045802977 = sum of:
        0.045802977 = product of:
          0.091605954 = sum of:
            0.091605954 = weight(_text_:classification in 4197) [ClassicSimilarity], result of:
              0.091605954 = score(doc=4197,freq=2.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.5629819 = fieldWeight in 4197, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.125 = fieldNorm(doc=4197)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
  17. Frants, V.I.; Kamenoff, N.I.; Shapiro, J.: ¬One approach to classification of users and automatic clustering of documents (1993) 0.02
    0.015267659 = product of:
      0.045802977 = sum of:
        0.045802977 = product of:
          0.091605954 = sum of:
            0.091605954 = weight(_text_:classification in 4569) [ClassicSimilarity], result of:
              0.091605954 = score(doc=4569,freq=8.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.5629819 = fieldWeight in 4569, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4569)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Shows how to automatically construct a classification of users and a clustering of documents on the basis of users' information needs by creating clusters of documents and cross-references among clusters using users' search requests. Examines feedback in the construction of this classification and clustering so that the classification can be changed over time to reflect the changing needs of the users.
  18. Sparck Jones, K.; Jackson, D.M.: ¬The use of automatically obtained keyword classification for information retrieval (1970) 0.02
    0.015267659 = product of:
      0.045802977 = sum of:
        0.045802977 = product of:
          0.091605954 = sum of:
            0.091605954 = weight(_text_:classification in 5177) [ClassicSimilarity], result of:
              0.091605954 = score(doc=5177,freq=2.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.5629819 = fieldWeight in 5177, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.125 = fieldNorm(doc=5177)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
  19. Borko, H.; Bernick, M.: Automatic document classification : T.1 (1963) 0.02
    0.015267659 = product of:
      0.045802977 = sum of:
        0.045802977 = product of:
          0.091605954 = sum of:
            0.091605954 = weight(_text_:classification in 5487) [ClassicSimilarity], result of:
              0.091605954 = score(doc=5487,freq=2.0), product of:
                0.16271563 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.051092815 = queryNorm
                0.5629819 = fieldWeight in 5487, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.125 = fieldNorm(doc=5487)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
  20. Ferber, R.: Automated indexing with thesaurus descriptors : a co-occurrence based approach to multilingual retrieval (1997) 0.01
    0.014259104 = product of:
      0.04277731 = sum of:
        0.04277731 = weight(_text_:bibliographic in 4144) [ClassicSimilarity], result of:
          0.04277731 = score(doc=4144,freq=2.0), product of:
            0.19890657 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.051092815 = queryNorm
            0.21506234 = fieldWeight in 4144, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4144)
      0.33333334 = coord(1/3)
    
    Abstract
    Indexing documents with descriptors from a multilingual thesaurus is an approach to multilingual information retrieval. However, manual indexing is expensive. Automated indexing methods in general use terms found in the document. Thesaurus descriptors are complex terms that are often not used in documents or have specific meanings within the thesaurus; therefore most weighting schemes of automated indexing methods are not suited to selecting thesaurus descriptors. In this paper a linear associative system is described that uses similarity values extracted from a large corpus of manually indexed documents to construct a rank ordering of the descriptors for a given document title. The system is adaptive and has to be tuned with a training sample of records for the specific task. The system was tested on a corpus of some 80,000 bibliographic records. The results show a high variability with changing parameter values, indicating that it is very important to empirically adapt the model to the specific situation it is used in. The overall median rank of the manually assigned descriptors in the automatically generated ranked list of all 3,631 descriptors is 14 for the set used to adapt the system and 11 for a test set not used in the optimization process. This result shows that the optimization is not a fitting to a specific training set but a real adaptation of the model to the setting.
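    A compact illustration of the general mechanism outlined above: word-to-descriptor association values are accumulated from a manually indexed training corpus, and descriptors for a new title are ranked by the summed association of its words. The toy records and the raw co-occurrence counts used as similarity values are placeholder assumptions, not Ferber's tuned linear associative system.

      # Co-occurrence based descriptor ranking: learn word -> descriptor association
      # counts from manually indexed records, then rank descriptors for an unseen title.
      from collections import defaultdict

      # Toy manually indexed records: (title, assigned thesaurus descriptors).
      training = [
          ("Water pollution in coastal regions",        ["Environment", "Hydrology"]),
          ("Groundwater quality and pollution control", ["Hydrology", "Public health"]),
          ("Air pollution and respiratory disease",     ["Environment", "Public health"]),
      ]

      assoc = defaultdict(lambda: defaultdict(float))   # word -> descriptor -> association value
      for title, descriptors in training:
          for word in set(title.lower().split()):
              for d in descriptors:
                  assoc[word][d] += 1.0                 # raw co-occurrence count as similarity

      def rank_descriptors(title):
          scores = defaultdict(float)
          for word in set(title.lower().split()):
              for d, s in assoc.get(word, {}).items():
                  scores[d] += s
          return sorted(scores.items(), key=lambda x: -x[1])

      # Hydrology and Public health outrank Environment for this toy title.
      print(rank_descriptors("Pollution of groundwater resources"))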

Languages

  • e 71
  • d 18
  • f 1
  • ja 1
  • ru 1

Types

  • a 84
  • el 7
  • m 2
  • s 2
  • x 2
  • p 1