Search (263 results, page 1 of 14)

  • theme_ss:"Automatisches Indexieren"
  1. Chou, C.; Chu, T.: An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.07
    Abstract
    In light of AI (artificial intelligence) and NLP (natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used for machine-assisted indexing of the Project Gutenberg collection by suggesting Library of Congress Subject Headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections. (A minimal illustrative sketch follows.)
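    The sketch below illustrates the general idea of suggesting controlled subject headings for a text with a pretrained transformer. It is not the authors' pipeline: it uses an off-the-shelf zero-shot NLI model from Hugging Face transformers, and the candidate headings (standing in for a set pre-filtered by an LCC subclass) are invented placeholders.

    ```python
    # A minimal sketch, assuming the transformers library is installed:
    # score candidate Library of Congress Subject Headings for a text with
    # a pretrained NLI model. Candidates and text are invented examples,
    # not the fine-tuned BERT setup described in the paper.
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification",
                          model="facebook/bart-large-mnli")

    # Hypothetical candidates, as if pre-filtered by an LCC subclass.
    candidate_headings = [
        "Birds",
        "Natural history",
        "Detective and mystery stories",
    ]

    text = ("An ebook describing the habits of North American birds, "
            "with notes on migration and feeding.")

    result = classifier(text, candidate_headings, multi_label=True)
    for heading, score in zip(result["labels"], result["scores"]):
        print(f"{score:.3f}  {heading}")
    ```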
  2. Losee, R.M.: A Gray code based ordering for documents on shelves : classification for browsing and retrieval (1992) 0.05
    Abstract
    A document classifier places documents together in a linear arrangement for browsing or high-speed access by human or computerised information retrieval systems. Requirements for document classification and browsing systems are developed from similarity measures, distance measures, and the notion of subject aboutness. A requirement that documents be arranged in decreasing order of similarity as the distance from a given document increases can often not be met. Based on these requirements, information-theoretic considerations, and the Gray code, a classification system is proposed that can classify documents without human intervention. A measure of classifier performance is developed and used to evaluate experimental results, comparing the distance between subject headings assigned to documents given classifications from the proposed system and from the Library of Congress Classification (LCC) system. (A minimal sketch of the Gray code mapping follows this entry.)
    Source
    Journal of the American Society for Information Science. 43(1992) no.4, S.312-322
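    As a concrete illustration of entry 2's central device, the sketch below shows the binary-reflected Gray code and its inverse: documents whose subject-feature bit vectors are read as Gray codewords and shelved in rank order differ from their neighbours in as few features as possible. The five document profiles are invented, and the paper's full classifier is not reproduced.

    ```python
    # A minimal sketch of the Gray code idea behind the shelf ordering.
    def gray(n: int) -> int:
        """Binary-reflected Gray code of n."""
        return n ^ (n >> 1)

    def gray_rank(bits: int) -> int:
        """Inverse Gray code: shelf position for a Gray-coded feature vector."""
        n = 0
        while bits:
            n ^= bits
            bits >>= 1
        return n

    # Hypothetical 3-bit subject profiles for five documents.
    docs = {"doc_a": 0b000, "doc_b": 0b001, "doc_c": 0b011,
            "doc_d": 0b110, "doc_e": 0b100}

    # Shelving in Gray-rank order keeps bitwise-similar documents adjacent.
    for name, profile in sorted(docs.items(), key=lambda kv: gray_rank(kv[1])):
        print(f"{gray_rank(profile):2d}  {name}  {profile:03b}")
    ```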
  3. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.04
    Abstract
    The German subject headings authority file (Schlagwortnormdatei, SWD) provides a broad controlled vocabulary for indexing documents of all subjects. It has traditionally been used for intellectual subject cataloguing, primarily of books. The Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for the automated assignment of subject headings for online publications. This project, its results, and its problems are sketched in the paper.
  4. Golub, K.; Lykke, M.; Tudhope, D.: Enhancing social tagging with automated keywords from the Dewey Decimal Classification (2014) 0.04
    Abstract
    Purpose - The purpose of this paper is to explore the potential of applying the Dewey Decimal Classification (DDC), an established knowledge organization system (KOS), to enhance social tagging, with the ultimate purpose of improving subject indexing and information retrieval. Design/methodology/approach - Over 11,000 Intute metadata records in politics were used. In total, 28 politics students were each given four tasks, in which a total of 60 resources were tagged in two different configurations: one with uncontrolled social tags only, and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was the DDC, which also comprised mappings from the Library of Congress Subject Headings. Findings - The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: they help produce ideas of which tags to use, make it easier to find a focus for the tagging, ensure consistency, and increase the number of access points in retrieval. The value and usefulness of the suggestions proved to depend on their quality, both in terms of conceptual relevance to the user and appropriateness of the terminology. Originality/value - No research has investigated the enhancement of social tagging with suggestions from the DDC, an established KOS, in a user trial comparing social tagging only with social tagging enhanced by the suggestions. This paper is a final reflection on all aspects of the study.
    Source
    Journal of documentation. 70(2014) no.5, S.801-828
  5. Prasad, A.R.D.: PROMETHEUS: an automatic indexing system (1996) 0.04
    Abstract
    An automatic indexing system using the tools and techniques of artificial intelligence is described. The paper presents the various components of the system, such as the parser, the grammar formalism, the lexicon, and the frame-based knowledge representation used for semantic representation. The semantic representation is based on the Ranganathan school of thought, especially the Deep Structure of Subject Indexing Languages enunciated by Bhattacharyya. The various steps in indexing are demonstrated with an illustration.
    Source
    Knowledge organization and change: Proceedings of the Fourth International ISKO Conference, 15-18 July 1996, Library of Congress, Washington, DC. Ed.: R. Green
  6. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2014) 0.03
    Abstract
    The German Integrated Authority File (Gemeinsame Normdatei, GND) provides a broad controlled vocabulary for indexing documents on all subjects. It has traditionally been used for intellectual subject cataloging, primarily of books. The Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for the automated assignment of subject headings for online publications. This project, its results, and its problems are outlined in this article.
  7. Zeng, L.: Automatic indexing for Chinese text : problems and progress (1992) 0.03
    Source
    Encyclopedia of library and information science. Vol.49, [=Suppl.12]
  8. Malone, L.C.; Wildman-Pepe, J.; Driscoll, J.R.: Evaluation of an automated keywording system (1990) 0.03
    Abstract
    An automated keywording system has been designed to behave artificially as a human "expert" indexer. The system was designed to keyword 100- to 800-word documents representing lessons learned from military exercises and operations. A set of 74 documents can be keyworded on an IBM PS/2 Model 80 in about five minutes. This paper presents a variety of ways of statistically documenting improvements in an automated keywording system over time. It is not only beneficial to have some measure of system performance at a given time; it is also useful, as attempts are made to improve a system, to assess whether statistically significant improvements have actually been made. Furthermore, it is useful to identify the source of any existing problems so that they can be rectified. The specifics of the automated system that was evaluated are described, and the performance measures used are discussed. (A minimal sketch of such a significance check follows this entry.)
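    The paper's own performance measures are not reproduced in this listing. As one hedged illustration of "assessing whether statistically significant improvements have been made", the sketch below runs a paired t-test over invented per-document precision scores; the choice of precision and of a t-test is an assumption, not the authors' procedure.

    ```python
    # A minimal sketch, assuming SciPy is installed and assuming paired
    # per-document precision is the measure of interest (invented numbers).
    from scipy.stats import ttest_rel

    precision_before = [0.55, 0.60, 0.48, 0.72, 0.66, 0.58, 0.61, 0.50]
    precision_after  = [0.63, 0.65, 0.52, 0.75, 0.70, 0.64, 0.66, 0.57]

    stat, p_value = ttest_rel(precision_after, precision_before)
    print(f"t = {stat:.2f}, p = {p_value:.4f}")
    if p_value < 0.05:
        print("Improvement is statistically significant at the 5% level.")
    ```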
  9. Abdul, H.; Khoo, C.: Automatic indexing of medical literature using phrase matching : an exploratory study 0.03
    Abstract
    Reports the first part of a study applying the technique of phrase matching to the automatic assignment of MeSH subject headings and subheadings to abstracts of periodical articles. (A minimal sketch of phrase matching follows this entry.)
    Source
    Health information: new directions. Proceedings of the Joint Conference of the Health Libraries Sections of the Australian Library and Information Association and New Zealand Library Association, Auckland, New Zealand, 12.-16.11.1989
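    A minimal sketch of the phrase-matching idea in entry 9, under the assumption of a tiny invented MeSH-like term list; the study's actual matching rules are not reproduced.

    ```python
    # Scan an abstract for verbatim occurrences of controlled vocabulary
    # phrases and propose the matching headings. Terms are placeholders.
    import re

    mesh_terms = ["myocardial infarction", "aspirin", "drug therapy",
                  "hypertension", "clinical trial"]

    abstract = ("A randomized clinical trial of low-dose aspirin for the "
                "prevention of myocardial infarction in patients with "
                "hypertension.")

    def match_phrases(text: str, vocabulary: list[str]) -> list[str]:
        """Return vocabulary phrases that occur verbatim in the text."""
        found = []
        for term in vocabulary:
            if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
                found.append(term)
        return found

    print(match_phrases(abstract, mesh_terms))
    # ['myocardial infarction', 'aspirin', 'hypertension', 'clinical trial']
    ```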
  10. Plaunt, C.; Norgard, B.A.: ¬An association-based method for automatic indexing with a controlled vocabulary (1998) 0.03
    Abstract
    In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a 'dictionary' of associations between the lexical items contained in the titles, authors, and abstracts and the controlled vocabulary subject headings assigned to those records by human indexers, using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect we have cast this as a classic partial-match information retrieval problem: the problem of 'retrieving' (or assigning) the most probably 'relevant' (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document. (A minimal sketch of the association measure follows this entry.)
    Date
    11. 9.2000 19:53:22
    Source
    Journal of the American Society for Information Science. 49(1998) no.10, S.888-902
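    Entry 10's association dictionary rests on a likelihood ratio statistic. The sketch below computes Dunning's log-likelihood ratio (G2) for one word-heading pair from a 2x2 contingency table; the counts are invented, and the authors' exact statistic and thresholds are not reproduced.

    ```python
    # A minimal sketch of a word-heading association score via the
    # log-likelihood ratio over co-occurrence counts.
    import math

    def g2(k11: int, k12: int, k21: int, k22: int) -> float:
        """Dunning's log-likelihood ratio for a 2x2 contingency table.
        k11: docs with word and heading; k12: word without heading;
        k21: heading without word; k22: neither."""
        def h(*ks):
            n = sum(ks)
            return sum(k * math.log(k / n) for k in ks if k > 0)
        return 2 * (h(k11, k12, k21, k22)
                    - h(k11 + k12, k21 + k22)
                    - h(k11 + k21, k12 + k22))

    # Invented counts: title word 'neural' vs. heading 'Neural networks'
    # in a collection of 4,626 documents. Larger G2 = stronger association.
    print(f"G2 = {g2(120, 80, 40, 4386):.1f}")
    ```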
  11. Shafer, K.: Scorpion Project explores using Dewey to organize the Web (1996) 0.02
    Abstract
    As the amount of accessible information on the WWW increases, so will the cost of accessing it, even if search services remain free, due to the increasing amount of time users will have to spend to find needed items. Considers what the seemingly unorganized Web and the organized world of libraries can offer each other. The OCLC Scorpion Project is attempting to combine indexing and cataloguing, specifically focusing on building tools for automatic subject recognition using the techniques of library science and information retrieval. If subject headings or concept domains can be automatically assigned to electronic items, improved filtering tools for searching can be produced.
  12. Vledutz-Stokolov, N.: Concept recognition in an automatic text-processing system for the life sciences (1987) 0.02
    Abstract
    This article describes a natural-language text-processing system designed as an automatic aid to subject indexing at BIOSIS. The intellectual procedure the system should model is deep indexing with a controlled vocabulary of biological concepts - Concept Headings (CHs). On average, ten CHs are assigned to each article by BIOSIS indexers. The automatic procedure consists of two stages: (1) translation of natural-language biological titles into title-semantic representations in a constructed formalized language of Concept Primitives, and (2) translation of the latter representations into the language of CHs. The first stage is performed by matching the titles against the system's Semantic Vocabulary (SV). The SV currently contains approximately 15,000 biological natural-language terms and their translations in the language of Concept Primitives. For ambiguous terms, the SV contains algorithmic term-disambiguation rules based on semantic analysis of the contexts. The second stage of the automatic procedure is performed by matching the title representations against the CH definitions, formulated as Boolean search strategies in the language of Concept Primitives. Three experiments performed with the system and their results are described. The most typical problems the system encounters, lexical and situational ambiguities, are discussed, and the disambiguation techniques employed are described and demonstrated in many examples. (A minimal sketch of the two-stage matching follows this entry.)
    Source
    Journal of the American Society for Information Science. 38(1987) no.4, S.269-287
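    A minimal sketch of the two-stage procedure described in entry 12, with an invented toy Semantic Vocabulary, and Concept Heading definitions reduced to simple AND requirements (the real definitions were full Boolean search strategies, and the real disambiguation rules are omitted).

    ```python
    # Stage 1: title terms -> Concept Primitives via a semantic vocabulary.
    # Stage 2: Concept Headings whose required primitives are all present.
    semantic_vocabulary = {
        "rat": {"RODENT"},
        "liver": {"LIVER", "ORGAN"},
        "regeneration": {"GROWTH_PROCESS"},
    }

    # Each heading here requires all primitives in its definition.
    concept_headings = {
        "Liver physiology": {"LIVER", "GROWTH_PROCESS"},
        "Rodent anatomy": {"RODENT", "ORGAN"},
    }

    def assign(title: str) -> list[str]:
        primitives = set()
        for word in title.lower().split():
            primitives |= semantic_vocabulary.get(word, set())
        return [ch for ch, required in concept_headings.items()
                if required <= primitives]

    print(assign("Liver regeneration in the rat"))
    # ['Liver physiology', 'Rodent anatomy']
    ```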
  13. Banerjee, K.; Johnson, M.: Improving access to archival collections with automated entity extraction (2015) 0.02
    Abstract
    The complexity and diversity of archival resources make constructing rich metadata records time consuming and expensive, which in turn limits access to these valuable materials. However, significant automation of the metadata creation process would dramatically reduce the cost of providing access points, improve access to individual resources, and establish connections between resources that would otherwise remain unknown. Using a case study at Oregon Health & Science University as a lens to examine the conceptual and technical challenges associated with automated extraction of access points, we discuss using publicly accessible APIs to extract entities (e.g. people, places, and concepts) from digital and digitized objects. We describe why Linked Open Data is not well suited for a use case such as ours. We conclude with recommendations about how this method can be used in archives as well as for other library applications. (A minimal entity-extraction sketch follows this entry.)
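    Entry 13 extracts entities as candidate access points. The sketch below shows the same general step with spaCy's named-entity recognizer rather than the external APIs used in the case study; the sample text is invented.

    ```python
    # A minimal sketch of entity extraction for access points. Requires
    # the small English model: python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")

    text = ("Dr. Esther Pohl Lovejoy practiced medicine in Portland, Oregon, "
            "and served with the American Women's Hospitals in France.")

    doc = nlp(text)
    for ent in doc.ents:
        # PERSON / GPE / ORG spans become candidate access points.
        print(f"{ent.label_:8} {ent.text}")
    ```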
  14. Malone, L.C.; Driscoll, J.R.; Pepe, J.W.: Modeling the performance of an automated keywording system (1991) 0.02
    Abstract
    Presents a model for predicting the performance of a computerised keyword-assigning and indexing system. Statistical procedures were investigated to protect against incorrect keywording by the system, which behaves as an expert system designed to mimic the behaviour of human keyword indexers on documents representing lessons learned from military exercises and operations.
  15. Moulaison-Sandy, H.; Adkins, D.; Bossaller, J.; Cho, H.: An automated approach to describing fiction : a methodology to use book reviews to identify affect (2021) 0.02
    Abstract
    Subject headings and genre terms are notoriously difficult to apply, yet are important for fiction. The current project functions as a proof of concept, using a text-mining methodology to identify affective information (emotion and tone) about fiction titles from professional book reviews as a potential first step in automating the subject analysis process. Findings are presented and discussed, comparing results to the range of aboutness and isness information in library cataloging records. The methodology is likewise presented, and how future work might expand on the current project to enhance catalog records through text-mining is explored.
  16. Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.02
    Abstract
    A study was done to test the effectiveness of retrieval using title-word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate actual searching conditions as closely as possible. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: the titles themselves, ignorance among users and information specialists of the subject vocabulary in use, and general language problems. Across fields, it was found that the social sciences had the best retrieval rate, science the next best, and the arts and humanities the lowest. Ways to enhance and supplement keyword-in-title searching on the computer and in printed indexes are discussed.
    Date
    14. 3.1996 13:22:21
  17. Golub, K.: Automatic subject indexing of text (2019) 0.02
    Abstract
    Automatic subject indexing addresses problems of scale and sustainability and can at the same time be used to enrich existing metadata records, establish more connections across and between resources from various metadata and resource collections, and enhance the consistency of the metadata. In this work, automatic subject indexing focuses on assigning index terms or classes from established knowledge organization systems (KOSs) for subject indexing, such as thesauri, subject headings systems, and classification systems. The following major approaches are discussed in terms of their similarities and differences, advantages and disadvantages for automatic assigned indexing from KOSs: "text categorization," "document clustering," and "document classification." Text categorization is perhaps the most widespread, machine-learning approach, with what seems to be generally good reported performance. Document clustering automatically both creates groups of related documents and extracts names of subjects depicting the group at hand. Document classification reuses the intellectual effort invested into creating a KOS for subject indexing, and even simple string-matching algorithms have been reported to achieve good results, because one concept can be described using a number of different terms, including equivalent, related, narrower, and broader terms. Finally, the applicability of automatic subject indexing to operative information systems and the challenges of evaluation are outlined, suggesting the need for more research. (A minimal string-matching sketch follows this entry.)
    Series
    Reviews of concepts in knowledge organization
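    A minimal sketch of the "document classification" approach outlined in entry 17: matching a document against the entry terms of a small invented KOS, where each concept carries several equivalent or related labels.

    ```python
    # Hypothetical mini-KOS: concept -> entry terms (equivalent/related).
    kos = {
        "Automatic indexing": ["automatic indexing", "machine-aided indexing",
                               "automated subject assignment"],
        "Thesauri": ["thesaurus", "thesauri", "controlled vocabulary"],
    }

    def classify(text: str) -> list[str]:
        """Return KOS concepts any of whose entry terms occur in the text."""
        lowered = text.lower()
        return [concept for concept, terms in kos.items()
                if any(term in lowered for term in terms)]

    doc = ("The paper evaluates machine-aided indexing against a thesaurus "
           "maintained by the documentation centre.")
    print(classify(doc))  # ['Automatic indexing', 'Thesauri']
    ```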
  18. Milstead, J.L.: Thesauri in a full-text world (1998) 0.02
    Abstract
    Despite early claims to the contrary, thesauri continue to find use as access tools for information in the full-text environment. Their mode of use is changing, but this change actually represents an expansion rather than a contradiction of their utility. Thesauri and similar vocabulary tools can complement full-text access by aiding users in focusing their searches, by supplementing the linguistic analysis of the text search engine, and even by serving as one of the tools used by the linguistic engine for its analysis. While human indexing continues to be used for many databases, the trend is to increase the use of machine aids for this purpose. All machine-aided indexing (MAI) systems rely on thesauri as the basis for term selection. In the 21st century, the balance of effort between human and machine will change at both input and output, but thesauri will continue to play an important role for the foreseeable future.
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  19. Olsgaard, J.N.; Evans, E.J.: Improving keyword indexing (1981) 0.01
    Abstract
    This communication examines some of the most frequently cited criticisms of keyword indexing. These criticisms include (1) the absence of general subject headings, (2) limited entry points, and (3) irrelevant indexing. Some solutions are suggested to meet these criticisms.
    Source
    Journal of the American Society for Information Science. 32(1981), S.71-72
  20. Willis, C.; Losee, R.M.: A random walk on an ontology : using thesaurus structure for automatic subject indexing (2013) 0.01
    Abstract
    Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre-indexed with four different thesauri (AGROVOC [UN Food and Agriculture Organization], high-energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and Medical Subject Headings [MeSH]). We also introduce a thesaurus-centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone, with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the four thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri and the manual indexing associated with it is characterized using the methods developed here. (A minimal random-walk sketch follows this entry.)
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.7, S.1330-1344
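    A minimal sketch of entry 20's weighted random walk, on an invented toy term graph rather than any of the four thesauri studied: starting from terms matched in a document, the walk lets strongly connected concepts accumulate visit mass and surface as indexing candidates. The graph, weights, and step count are assumptions.

    ```python
    # Estimate visit frequencies of a weighted walk over thesaurus structure.
    import random

    # Hypothetical thesaurus graph: term -> [(neighbour, edge weight)].
    graph = {
        "maize": [("cereals", 0.6), ("crop yield", 0.4)],
        "cereals": [("maize", 0.3), ("grain", 0.7)],
        "crop yield": [("maize", 0.5), ("agronomy", 0.5)],
        "grain": [("cereals", 1.0)],
        "agronomy": [("crop yield", 1.0)],
    }

    def random_walk(start_terms, steps=10_000, seed=42):
        """Count visits of a weighted walk starting from matched terms."""
        rng = random.Random(seed)
        visits = {t: 0 for t in graph}
        node = rng.choice(start_terms)
        for _ in range(steps):
            visits[node] += 1
            neighbours, weights = zip(*graph[node])
            node = rng.choices(neighbours, weights=weights)[0]
        return visits

    # Terms matched in the document seed the walk; high-visit terms become
    # indexing candidates.
    visits = random_walk(["maize", "crop yield"])
    for term, count in sorted(visits.items(), key=lambda kv: -kv[1]):
        print(f"{count:6d}  {term}")
    ```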

Types

  • a 244
  • el 22
  • x 9
  • m 4
  • s 2