Search (260 results, page 1 of 13)

  • theme_ss:"Automatisches Indexieren"
  1. Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.08
    0.081112646 = product of:
      0.18926284 = sum of:
        0.02372625 = weight(_text_:of in 1442) [ClassicSimilarity], result of:
          0.02372625 = score(doc=1442,freq=50.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.34554482 = fieldWeight in 1442, product of:
              7.071068 = tf(freq=50.0), with freq of:
                50.0 = termFreq=50.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=1442)
        0.15363841 = weight(_text_:distribution in 1442) [ClassicSimilarity], result of:
          0.15363841 = score(doc=1442,freq=14.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.6396306 = fieldWeight in 1442, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.03125 = fieldNorm(doc=1442)
        0.011898177 = product of:
          0.023796353 = sum of:
            0.023796353 = weight(_text_:22 in 1442) [ClassicSimilarity], result of:
              0.023796353 = score(doc=1442,freq=2.0), product of:
                0.15376249 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043909185 = queryNorm
                0.15476047 = fieldWeight in 1442, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1442)
          0.5 = coord(1/2)
      0.42857143 = coord(3/7)
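The explain tree above is Lucene's ClassicSimilarity (TF-IDF) scoring. As a minimal sketch, assuming the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1)), leaf score = queryWeight * fieldWeight), the first leaf can be reproduced:

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def leaf_score(freq, doc_freq, max_docs, query_norm, field_norm):
    tf = math.sqrt(freq)                   # tf = sqrt(term frequency)
    i = idf(doc_freq, max_docs)
    query_weight = i * query_norm          # queryWeight = idf * queryNorm
    field_weight = tf * i * field_norm     # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight

# leaf for _text_:of in doc 1442: freq=50, docFreq=25162, maxDocs=44218
s = leaf_score(50.0, 25162, 44218, 0.043909185, 0.03125)
print(round(s, 8))  # close to the 0.02372625 leaf shown above
```

The per-document score then sums such leaves and multiplies by the coord factor, here coord(3/7), i.e. 3 of 7 query terms matched.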
    
    Abstract
    The main objective of this research was to analyze whether relevant terms exhibit a characteristic distribution over a scientific text that could serve as a criterion for their automatic indexing. The terms considered were only the full noun phrases contained in the texts themselves. The corpus comprised 98 doctoral theses from the eight areas of knowledge at a single university. Initially, 20 full noun phrases were automatically extracted from each text as candidates for the most relevant terms, and the author of each text assigned each of the 20 noun phrases a relevance value from 0 (not relevant) to 6 (highly relevant). Only 22.1% of the noun phrases were judged not relevant. The relevance values assigned by the authors were then associated with the terms' positions in the text, each full noun phrase in the text counting as a valid linear position. The resulting distributions were examined under two notions of position: linear, with values consolidated into ten equal consecutive parts; and structural, based on the parts of the text (such as introduction, development, and conclusion). Notably, all areas of knowledge within the Natural Sciences showed one characteristic distribution of relevant terms, while all areas within the Social Sciences shared a second characteristic distribution, distinct from that of the Natural Sciences. The difference between the two can be clearly visualized in graphs. All behaviors, including the overall behavior across all areas of knowledge, were characterized by polynomial equations and can be applied in the future as criteria for automatic indexing.
    To date, this work is novel for two reasons: it presents a method for characterizing the distribution of relevant terms in a scientific text, and, through this method, it points out a quantitative difference between the Natural and Social Sciences.
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
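The linear-position analysis in the abstract of entry 1 can be sketched as below; the noun-phrase positions and the polynomial degree are invented for illustration and are not taken from the thesis data:

```python
import numpy as np

# hypothetical linear positions (0..1) of relevant noun phrases in one text
positions = np.array([0.02, 0.05, 0.07, 0.12, 0.31,
                      0.48, 0.52, 0.77, 0.91, 0.95])

# consolidate into ten equal consecutive parts, as the study does
counts, edges = np.histogram(positions, bins=10, range=(0.0, 1.0))
print(counts.tolist())  # → [3, 1, 0, 1, 1, 1, 0, 1, 0, 2]

# characterize the distribution with a polynomial, as the authors do;
# the degree chosen here is an assumption
midpoints = (edges[:-1] + edges[1:]) / 2
coeffs = np.polyfit(midpoints, counts, deg=3)
```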
  2. Chou, C.; Chu, T.: ¬An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.08
    0.07885902 = product of:
      0.18400437 = sum of:
        0.021970814 = weight(_text_:of in 1139) [ClassicSimilarity], result of:
          0.021970814 = score(doc=1139,freq=14.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.31997898 = fieldWeight in 1139, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
        0.109286554 = weight(_text_:congress in 1139) [ClassicSimilarity], result of:
          0.109286554 = score(doc=1139,freq=4.0), product of:
            0.20946044 = queryWeight, product of:
              4.7703104 = idf(docFreq=1018, maxDocs=44218)
              0.043909185 = queryNorm
            0.5217527 = fieldWeight in 1139, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.7703104 = idf(docFreq=1018, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
        0.052747004 = weight(_text_:cataloging in 1139) [ClassicSimilarity], result of:
          0.052747004 = score(doc=1139,freq=2.0), product of:
            0.17305137 = queryWeight, product of:
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.043909185 = queryNorm
            0.30480546 = fieldWeight in 1139, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
      0.42857143 = coord(3/7)
    
    Abstract
    In light of AI (Artificial Intelligence) and NLP (Natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used in machine-assisted indexing in the Project Gutenberg collection, through suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.
    Source
    Cataloging and classification quarterly. 60(2022) no.8, S.807-835
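The machine-assisted workflow assessed in entry 2 can be caricatured as nearest-neighbour ranking in an embedding space. In this sketch the three-dimensional "embeddings", the headings, and the optional LCC filter are all made up; a real system would use BERT sentence vectors and genuine Library of Congress data:

```python
import numpy as np

def rank_headings(book_vec, heading_vecs, headings, lcc_filter=None):
    """Rank candidate subject headings by cosine similarity to the book;
    optionally drop headings rejected by an LCC-subclass filter."""
    sims = {}
    for h, v in zip(headings, heading_vecs):
        if lcc_filter is not None and not lcc_filter(h):
            continue
        sims[h] = float(np.dot(book_vec, v)
                        / (np.linalg.norm(book_vec) * np.linalg.norm(v)))
    return sorted(sims, key=sims.get, reverse=True)

headings = ["Whaling", "Sea stories", "Gardening"]
vecs = [np.array([0.9, 0.1, 0.0]),
        np.array([0.7, 0.3, 0.1]),
        np.array([0.0, 0.1, 0.9])]
book = np.array([0.8, 0.2, 0.05])   # "embedding" of one digitized text
ranked = rank_headings(book, vecs, headings)
print(ranked)  # → ['Whaling', 'Sea stories', 'Gardening']
```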
  3. Pulgarin, A.; Gil-Leiva, I.: Bibliometric analysis of the automatic indexing literature : 1956-2000 (2004) 0.06
    0.05740785 = product of:
      0.20092747 = sum of:
        0.02491256 = weight(_text_:of in 2566) [ClassicSimilarity], result of:
          0.02491256 = score(doc=2566,freq=18.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.36282203 = fieldWeight in 2566, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2566)
        0.1760149 = weight(_text_:distribution in 2566) [ClassicSimilarity], result of:
          0.1760149 = score(doc=2566,freq=6.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.7327889 = fieldWeight in 2566, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2566)
      0.2857143 = coord(2/7)
    
    Abstract
    We present a bibliometric study of a corpus of 839 bibliographic references on automatic indexing, covering the period 1956-2000. We analyse the distribution of authors and works, obsolescence and its dispersion, and the distribution of the literature by topic, year, and source type. We conclude that: (i) there has been constant interest on the part of researchers; (ii) the most studied topics were the techniques and methods employed and the general aspects of automatic indexing; (iii) the productivity of the authors fits a Lotka distribution (Dmax=0.02 and critical value=0.054); (iv) the annual aging factor is 95%; and (v) the dispersion of the literature is low.
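Lotka's law, which the author-productivity data in entry 3 fit, predicts that the number of authors with n publications falls off roughly as C/n^2. A sketch with invented author counts and a Kolmogorov-Smirnov-style Dmax, analogous to the Dmax/critical-value comparison reported above:

```python
def lotka_expected(n, c, a=2.0):
    # Lotka's law: authors with n papers ≈ C / n**a (classically a = 2)
    return c / n ** a

# hypothetical observed counts of authors with 1, 2, 3, 4 papers
observed = [600, 150, 67, 38]
expected = [lotka_expected(n, observed[0]) for n in range(1, 5)]

# Dmax = largest gap between the two cumulative proportions
tot_o, tot_e = sum(observed), sum(expected)
cum_o = cum_e = dmax = 0.0
for o, e in zip(observed, expected):
    cum_o += o / tot_o
    cum_e += e / tot_e
    dmax = max(dmax, abs(cum_o - cum_e))
print(round(dmax, 3))  # tiny here; the fit is accepted when Dmax < critical value
```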
  4. Srinivasan, P.: On generalizing the Two-Poisson Model (1990) 0.05
    0.046873312 = product of:
      0.16405658 = sum of:
        0.02034102 = weight(_text_:of in 2880) [ClassicSimilarity], result of:
          0.02034102 = score(doc=2880,freq=12.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.29624295 = fieldWeight in 2880, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2880)
        0.14371556 = weight(_text_:distribution in 2880) [ClassicSimilarity], result of:
          0.14371556 = score(doc=2880,freq=4.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.5983196 = fieldWeight in 2880, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2880)
      0.2857143 = coord(2/7)
    
    Abstract
    Automatic indexing is one of the important functions of a modern document retrieval system. Numerous techniques for this function have been proposed in the literature, ranging from purely statistical to linguistically complex mechanisms, most resulting from examining properties of terms. Examines term distribution within the framework of the Poisson models, specifically the effectiveness of the Two-Poisson and Three-Poisson models, to see whether generalisation results in increased effectiveness. The results show that the Two-Poisson model is only moderately effective in identifying index terms, and that generalisation to the Three-Poisson model does not give any additional power. The only Poisson model which consistently works well is the basic One-Poisson model. Also discusses term distribution information.
    Source
    Journal of the American Society for Information Science. 41(1990) no.1, S.61-66
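The Two-Poisson model in entry 4 assumes a term's within-document frequencies follow a mixture of two Poisson distributions: one for the "elite" documents genuinely about the topic and one for the rest. A sketch with invented parameters:

```python
import math

def poisson(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def two_poisson(k, p_elite, lam_elite, lam_rest):
    """P(term occurs k times in a document) under the Two-Poisson mixture."""
    return (p_elite * poisson(k, lam_elite)
            + (1 - p_elite) * poisson(k, lam_rest))

# assumed parameters: 20% elite documents, mean tf 4.0 there vs 0.3 elsewhere
probs = [two_poisson(k, 0.2, 4.0, 0.3) for k in range(6)]
print(round(sum(probs), 3))  # probability mass on tf <= 5; prints 0.957
```

An index-term decision rule then compares how sharply the elite and non-elite components separate for a given term.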
  5. Mao, J.; Xu, W.; Yang, Y.; Wang, J.; Yuille, A.L.: Explain images with multimodal recurrent neural networks (2014) 0.04
    0.040177125 = product of:
      0.14061993 = sum of:
        0.01743516 = weight(_text_:of in 1557) [ClassicSimilarity], result of:
          0.01743516 = score(doc=1557,freq=12.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.25392252 = fieldWeight in 1557, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1557)
        0.12318477 = weight(_text_:distribution in 1557) [ClassicSimilarity], result of:
          0.12318477 = score(doc=1557,freq=4.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.5128454 = fieldWeight in 1557, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.046875 = fieldNorm(doc=1557)
      0.2857143 = coord(2/7)
    
    Abstract
    In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel sentence descriptions to explain the content of images. It directly models the probability distribution of generating a word given previous words and the image. Image descriptions are generated by sampling from this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on three benchmark datasets (IAPR TC-12 [8], Flickr 8K [28], and Flickr 30K [13]). Our model outperforms the state-of-the-art generative method. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval.
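"Sampling from this distribution" in entry 5 means drawing each next word from the decoder's predicted probabilities. A toy sketch, with a made-up next-word distribution standing in for the m-RNN's output:

```python
import random

def sample_word(probs, rng):
    """Draw one word from a next-word probability distribution."""
    r = rng.random()
    cum = 0.0
    for word, p in probs.items():
        cum += p
        if r < cum:
            return word
    return word  # guard against floating-point leftovers

# hypothetical P(next word | previous words, image)
probs = {"runs": 0.55, "sits": 0.30, "flies": 0.15}
rng = random.Random(0)  # seeded for reproducibility
words = [sample_word(probs, rng) for _ in range(1000)]
print(words.count("runs") / 1000)  # empirical frequency near 0.55
```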
  6. Prasad, A.R.D.: PROMETHEUS: an automatic indexing system (1996) 0.03
    0.032407537 = product of:
      0.11342637 = sum of:
        0.025109503 = weight(_text_:of in 5189) [ClassicSimilarity], result of:
          0.025109503 = score(doc=5189,freq=14.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.36569026 = fieldWeight in 5189, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=5189)
        0.08831687 = weight(_text_:congress in 5189) [ClassicSimilarity], result of:
          0.08831687 = score(doc=5189,freq=2.0), product of:
            0.20946044 = queryWeight, product of:
              4.7703104 = idf(docFreq=1018, maxDocs=44218)
              0.043909185 = queryNorm
            0.42163986 = fieldWeight in 5189, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7703104 = idf(docFreq=1018, maxDocs=44218)
              0.0625 = fieldNorm(doc=5189)
      0.2857143 = coord(2/7)
    
    Abstract
    An automatic indexing system using the tools and techniques of artificial intelligence is described. The paper presents the various components of the system, such as the parser, grammar formalism, lexicon, and the frame-based knowledge representation used for semantic representation. The semantic representation is based on the Ranganathan school of thought, especially the Deep Structure of Subject Indexing Languages enunciated by Bhattacharyya. The various steps in indexing are demonstrated with an illustration.
    Source
    Knowledge organization and change: Proceedings of the Fourth International ISKO Conference, 15-18 July 1996, Library of Congress, Washington, DC. Ed.: R. Green
  7. Matthews, P.; Glitre, K.: Genre analysis of movies using a topic model of plot summaries (2021) 0.03
    0.030639194 = product of:
      0.107237175 = sum of:
        0.020132389 = weight(_text_:of in 412) [ClassicSimilarity], result of:
          0.020132389 = score(doc=412,freq=16.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.2932045 = fieldWeight in 412, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=412)
        0.08710478 = weight(_text_:distribution in 412) [ClassicSimilarity], result of:
          0.08710478 = score(doc=412,freq=2.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.36263645 = fieldWeight in 412, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.046875 = fieldNorm(doc=412)
      0.2857143 = coord(2/7)
    
    Abstract
    Genre plays an important role in the description, navigation, and discovery of movies, but it is rarely studied at large scale using quantitative methods. Such methods allow an analysis of how genre labels are applied, how genres are composed and how these ingredients change, and how genres compare. We apply unsupervised topic modeling to a large collection of textual movie summaries and then use the model's topic proportions to investigate key questions in genre, including recognizability, mapping, canonicity, and change over time. We find that many genres can be quite easily predicted from their lexical signatures, and that this defines their position on the genre landscape. We find significant genre composition changes between periods for westerns, science fiction, and road movies, reflecting changes in production and consumption values. We show that in terms of canonicity, canonical examples are often at the high end of the topic distribution profile for the genre, rather than central as might be predicted by categorization theory.
    Source
    Journal of the Association for Information Science and Technology. 72(2021) no.12, S.1511-1527
  8. Plaunt, C.; Norgard, B.A.: ¬An association-based method for automatic indexing with a controlled vocabulary (1998) 0.03
    0.029711206 = product of:
      0.06932615 = sum of:
        0.016776992 = weight(_text_:of in 1794) [ClassicSimilarity], result of:
          0.016776992 = score(doc=1794,freq=16.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.24433708 = fieldWeight in 1794, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1794)
        0.03767643 = weight(_text_:cataloging in 1794) [ClassicSimilarity], result of:
          0.03767643 = score(doc=1794,freq=2.0), product of:
            0.17305137 = queryWeight, product of:
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.043909185 = queryNorm
            0.21771818 = fieldWeight in 1794, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1794)
        0.014872721 = product of:
          0.029745443 = sum of:
            0.029745443 = weight(_text_:22 in 1794) [ClassicSimilarity], result of:
              0.029745443 = score(doc=1794,freq=2.0), product of:
                0.15376249 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043909185 = queryNorm
                0.19345059 = fieldWeight in 1794, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1794)
          0.5 = coord(1/2)
      0.42857143 = coord(3/7)
    
    Abstract
    In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a 'dictionary' of associations between the lexical items contained in the titles, authors, and abstracts and the controlled vocabulary subject headings assigned to those records by human indexers, using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect, we have cast this as a classic partial-match information retrieval problem: we consider the problem to be one of 'retrieving' (or assigning) the most probably 'relevant' (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document.
    Date
    11. 9.2000 19:53:22
    Source
    Journal of the American Society for Information Science. 49(1998) no.10, S.888-902
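The likelihood ratio statistic used in entry 8 to associate lexical items with subject headings can be sketched as a Dunning-style log-likelihood G² over a 2x2 co-occurrence table; the counts below are hypothetical, sized to the 4,626-document collection mentioned in the abstract:

```python
import math

def g2(k11, k12, k21, k22):
    """Dunning-style log-likelihood ratio for a 2x2 co-occurrence table:
    rows = term present/absent, columns = heading assigned/not assigned."""
    n = k11 + k12 + k21 + k22
    def s(*ks):
        return sum(k * math.log(k) for k in ks if k > 0)
    return 2 * (s(k11, k12, k21, k22)       # cells
                - s(k11 + k12, k21 + k22)   # row marginals
                - s(k11 + k21, k12 + k22)   # column marginals
                + n * math.log(n))

# hypothetical counts: term "neural" vs. the heading "Neural networks"
strong = g2(50, 10, 40, 4526)   # term and heading co-occur heavily
weak = g2(5, 55, 85, 4481)      # term largely independent of heading
print(strong > weak)  # prints True
```

Pairs whose G² exceeds a threshold would enter the association dictionary used in the deployment stage.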
  9. Losee, R.M.: ¬A Gray code based ordering for documents on shelves : classification for browsing and retrieval (1992) 0.03
    0.027384568 = product of:
      0.09584598 = sum of:
        0.018568728 = weight(_text_:of in 2335) [ClassicSimilarity], result of:
          0.018568728 = score(doc=2335,freq=10.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.2704316 = fieldWeight in 2335, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2335)
        0.07727726 = weight(_text_:congress in 2335) [ClassicSimilarity], result of:
          0.07727726 = score(doc=2335,freq=2.0), product of:
            0.20946044 = queryWeight, product of:
              4.7703104 = idf(docFreq=1018, maxDocs=44218)
              0.043909185 = queryNorm
            0.36893487 = fieldWeight in 2335, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7703104 = idf(docFreq=1018, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2335)
      0.2857143 = coord(2/7)
    
    Abstract
    A document classifier places documents together in a linear arrangement for browsing or high-speed access by human or computerised information retrieval systems. Requirements for document classification and browsing systems are developed from similarity measures, distance measures, and the notion of subject aboutness. A requirement that documents be arranged in decreasing order of similarity as the distance from a given document increases can often not be met. Based on these requirements, information-theoretic considerations, and the Gray code, a classification system is proposed that can classify documents without human intervention. A measure of classifier performance is developed, and used to evaluate experimental results comparing the distance between subject headings assigned to documents given classifications from the proposed system and the Library of Congress Classification (LCC) system
    Source
    Journal of the American Society for Information Science. 43(1992) no.4, S.312-322
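The binary-reflected Gray code underlying entry 9's shelf ordering changes exactly one bit between consecutive codes, so adjacent shelf positions differ minimally; a minimal sketch:

```python
def gray(i):
    # binary-reflected Gray code of integer i
    return i ^ (i >> 1)

codes = [gray(i) for i in range(8)]
print([format(c, "03b") for c in codes])
# → ['000', '001', '011', '010', '110', '111', '101', '100']

# adjacent codes differ in exactly one bit: the property the proposed
# classification exploits to keep similar documents adjacent on the shelf
assert all(bin(a ^ b).count("1") == 1 for a, b in zip(codes, codes[1:]))
```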
  10. Short, M.: Text mining and subject analysis for fiction; or, using machine learning and information extraction to assign subject headings to dime novels (2019) 0.03
    0.027124729 = product of:
      0.09493655 = sum of:
        0.02034102 = weight(_text_:of in 5481) [ClassicSimilarity], result of:
          0.02034102 = score(doc=5481,freq=12.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.29624295 = fieldWeight in 5481, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5481)
        0.074595526 = weight(_text_:cataloging in 5481) [ClassicSimilarity], result of:
          0.074595526 = score(doc=5481,freq=4.0), product of:
            0.17305137 = queryWeight, product of:
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.043909185 = queryNorm
            0.43106002 = fieldWeight in 5481, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5481)
      0.2857143 = coord(2/7)
    
    Abstract
    This article describes multiple experiments in text mining at Northern Illinois University that were undertaken to improve the efficiency and accuracy of cataloging. It focuses narrowly on subject analysis of dime novels, a format of inexpensive fiction that was popular in the United States between 1860 and 1915. NIU holds more than 55,000 dime novels in its collections, which it is in the process of comprehensively digitizing. Classification, keyword extraction, named-entity recognition, clustering, and topic modeling are discussed as means of assigning subject headings to improve their discoverability by researchers and to increase the productivity of digitization workflows.
    Source
    Cataloging and classification quarterly. 57(2019) no.5, S.315-336
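Of the techniques entry 10 lists, keyword extraction is the simplest to sketch; a minimal TF-IDF scorer over invented dime-novel-style blurbs (the texts and the cutoff k are illustrative only):

```python
import math
from collections import Counter

docs = [
    "the outlaw robbed the train and the outlaw fled west",
    "the detective searched the city for clues",
    "a quiet romance aboard the riverboat",
]
tokenized = [d.split() for d in docs]
df = Counter(t for doc in tokenized for t in set(doc))  # document frequency
n = len(docs)

def keywords(doc_tokens, k=2):
    """Top-k terms of one document, ranked by tf * idf."""
    tf = Counter(doc_tokens)
    scores = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return sorted(scores, key=scores.get, reverse=True)[:k]

top = keywords(tokenized[0])
print(top)  # 'outlaw' outranks the stopword-like 'the'
```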
  11. Lichtenstein, A.; Plank, M.; Neumann, J.: TIB's portal for audiovisual media : combining manual and automatic indexing (2014) 0.03
    0.025422515 = product of:
      0.0889788 = sum of:
        0.014383274 = weight(_text_:of in 1981) [ClassicSimilarity], result of:
          0.014383274 = score(doc=1981,freq=6.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.20947541 = fieldWeight in 1981, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1981)
        0.074595526 = weight(_text_:cataloging in 1981) [ClassicSimilarity], result of:
          0.074595526 = score(doc=1981,freq=4.0), product of:
            0.17305137 = queryWeight, product of:
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.043909185 = queryNorm
            0.43106002 = fieldWeight in 1981, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1981)
      0.2857143 = coord(2/7)
    
    Abstract
    The German National Library of Science and Technology (TIB) developed a Web-based platform for audiovisual media. The audiovisual portal optimizes access to scientific videos such as computer animations and lecture and conference recordings. TIB's AV-Portal combines traditional cataloging with automatic indexing of audiovisual media. The article describes metadata standards for audiovisual media and introduces TIB's metadata schema in comparison to other metadata standards for non-textual materials. Additionally, we give an overview of the multimedia retrieval technologies used for the portal and present the AV-Portal in detail, as well as its added value for libraries and their users.
    Source
    Cataloging and classification quarterly. 52(2014) no.5, S.562-577
  12. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2014) 0.02
    0.024668407 = product of:
      0.08633942 = sum of:
        0.011743895 = weight(_text_:of in 1969) [ClassicSimilarity], result of:
          0.011743895 = score(doc=1969,freq=4.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.17103596 = fieldWeight in 1969, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1969)
        0.074595526 = weight(_text_:cataloging in 1969) [ClassicSimilarity], result of:
          0.074595526 = score(doc=1969,freq=4.0), product of:
            0.17305137 = queryWeight, product of:
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.043909185 = queryNorm
            0.43106002 = fieldWeight in 1969, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1969)
      0.2857143 = coord(2/7)
    
    Abstract
    The German Integrated Authority File (Gemeinsame Normdatei, GND) provides a broad controlled vocabulary for indexing documents on all subjects. While the GND has traditionally been used for intellectual subject cataloging, primarily of books, the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for the automated assignment of subject headings to online publications. This project, its results, and its problems are outlined in this article.
    Source
    Cataloging and classification quarterly. 52(2014) no.1, S.102-109
  13. Moulaison-Sandy, H.; Adkins, D.; Bossaller, J.; Cho, H.: ¬An automated approach to describing fiction : a methodology to use book reviews to identify affect (2021) 0.02
    0.024668407 = product of:
      0.08633942 = sum of:
        0.011743895 = weight(_text_:of in 710) [ClassicSimilarity], result of:
          0.011743895 = score(doc=710,freq=4.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.17103596 = fieldWeight in 710, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=710)
        0.074595526 = weight(_text_:cataloging in 710) [ClassicSimilarity], result of:
          0.074595526 = score(doc=710,freq=4.0), product of:
            0.17305137 = queryWeight, product of:
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.043909185 = queryNorm
            0.43106002 = fieldWeight in 710, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.0546875 = fieldNorm(doc=710)
      0.2857143 = coord(2/7)
    
    Abstract
    Subject headings and genre terms are notoriously difficult to apply, yet are important for fiction. The current project functions as a proof of concept, using a text-mining methodology to identify affective information (emotion and tone) about fiction titles from professional book reviews, as a potential first step in automating the subject analysis process. Findings are presented and discussed, comparing the results to the range of aboutness and isness information in library cataloging records. The methodology is likewise presented, and ways in which future work might expand on the current project to enhance catalog records through text mining are explored.
    Source
    Cataloging and classification quarterly. 59(2021) no.8, S.794-814
  14. Golub, K.; Lykke, M.; Tudhope, D.: Enhancing social tagging with automated keywords from the Dewey Decimal Classification (2014) 0.02
    Abstract
    Purpose - The purpose of this paper is to explore the potential of applying the Dewey Decimal Classification (DDC) as an established knowledge organization system (KOS) for enhancing social tagging, with the ultimate aim of improving subject indexing and information retrieval. Design/methodology/approach - Over 11,000 Intute metadata records in politics were used. In total, 28 politics students were each given four tasks, in which a total of 60 resources were tagged in two different configurations, one with uncontrolled social tags only and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was DDC, which also comprised mappings from the Library of Congress Subject Headings. Findings - The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: to help produce ideas of which tags to use, to make it easier to find focus for the tagging, to ensure consistency and to increase the number of access points in retrieval. The value and usefulness of the suggestions proved to be dependent on the quality of the suggestions, both as to conceptual relevance to the user and as to appropriateness of the terminology. Originality/value - No research has investigated the enhancement of social tagging with suggestions from the DDC, an established KOS, in a user trial, comparing social tagging only and social tagging enhanced with the suggestions. This paper is a final reflection on all aspects of the study.
    Source
    Journal of documentation. 70(2014) no.5, p.801-828
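The enhanced configuration in this study can be sketched as ranking controlled-vocabulary classes by term overlap with a resource and offering the top matches alongside free tags. The mini-KOS below is a made-up stand-in for DDC captions with mapped entry vocabulary, not actual DDC or LCSH data:

```python
# Made-up mini-KOS standing in for DDC captions with mapped entry terms
# (illustrative only; not actual DDC or LCSH data).
KOS = {
    "320 Political science": {"politics", "government", "state"},
    "324 The political process": {"elections", "parties", "voting"},
    "327 International relations": {"diplomacy", "foreign", "treaties"},
}

def suggest(resource_terms: set) -> list:
    """Rank controlled classes by term overlap with a resource description."""
    scored = [(len(resource_terms & entry_terms), label)
              for label, entry_terms in KOS.items()]
    return [label for n, label in sorted(scored, reverse=True) if n > 0]

# Suggestions shown to the tagger alongside their uncontrolled free tags:
ranked = suggest({"voting", "elections", "politics"})
# ranked == ['324 The political process', '320 Political science']
```

The point of such suggestions, per the findings above, is not to replace free tagging but to supply focus, consistency, and extra access points.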
  15. Oliver, C.: Leveraging KOS to extend our reach with automated processes (2021) 0.02
    Abstract
    This article provides a conclusion to the special issue on Artificial Intelligence (AI) and Automated Processes for Subject Access. The authors who contributed to this special issue have provoked interesting questions and brought attention to important issues. This concluding article looks at common themes and highlights some of the questions raised.
    Source
    Cataloging and classification quarterly. 59(2021) no.8, p.868-874
  16. Medelyan, O.; Witten, I.H.: Domain-independent automatic keyphrase indexing with small training sets (2008) 0.02
    Abstract
    Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain-specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.7, p.1026-1040
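The core idea of thesaurus-backed keyphrase extraction can be sketched in a few lines: harvest only those candidate phrases that occur as descriptors in a domain thesaurus, then score them by tf-idf. The toy thesaurus and corpus statistics below are invented for illustration; the paper itself evaluates against a real agricultural vocabulary and professionally indexed documents:

```python
import math
import re
from collections import Counter

# Invented toy domain thesaurus and corpus statistics (illustrative only).
THESAURUS = {"soil", "soil erosion", "crop rotation", "irrigation"}
DOC_FREQ = {"soil": 80, "soil erosion": 3, "crop rotation": 5, "irrigation": 50}
N_DOCS = 100

def candidates(text, max_len=2):
    """Yield word n-grams (n <= max_len) that are thesaurus descriptors."""
    toks = re.findall(r"[a-z]+", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(toks) - n + 1):
            phrase = " ".join(toks[i:i + n])
            if phrase in THESAURUS:
                yield phrase

def keyphrases(doc, k=2):
    """Score thesaurus-matched candidates by tf * idf and keep the top k."""
    tf = Counter(candidates(doc))
    score = {p: f * math.log(N_DOCS / (1 + DOC_FREQ.get(p, 0)))
             for p, f in tf.items()}
    return sorted(score, key=score.get, reverse=True)[:k]

top = keyphrases("Crop rotation reduces soil erosion; "
                 "crop rotation also helps irrigation.")
# top == ['crop rotation', 'soil erosion']
```

Restricting candidates to thesaurus descriptors is what lets an approach like this work with very little training data: the vocabulary itself carries most of the domain knowledge.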
  17. Bloomfield, M.: Indexing : neglected and poorly understood (2001) 0.02
    Abstract
    The growth of the Internet has highlighted the use of machine indexing. The difficulties in using the Internet as a searching device can be frustrating. The use of the term "Python" is given as an example. Machine indexing is noted as "rotten" and human indexing as "capricious." The problem seems to be a lack of a theoretical foundation for the art of indexing. What librarians have learned over the last hundred years has yet to yield a consistent approach to what really works best in preparing index terms and in the ability of our customers to search the various indexes. An attempt is made to consider the elements of indexing, their pros and cons. The argument is made that machine indexing is far too prolific in its production of index terms. Neither librarians nor computer programmers have made much progress to improve Internet indexing. Human indexing has had the same problems for over fifty years.
    Source
    Cataloging and classification quarterly. 33(2001) no.1, p.63-75
  18. Keller, A.: Attitudes among German- and English-speaking librarians toward (automatic) subject indexing (2015) 0.02
    Abstract
    The survey described in this article investigates the attitudes of librarians in German- and English-speaking countries toward subject indexing in general, and automatic subject indexing in particular. The results show great similarity between attitudes in both language areas. Respondents agree that the current quality standards should be upheld and dismiss critical voices claiming that subject indexing has lost relevance. With regard to automatic subject indexing, respondents demonstrate considerable skepticism, both with regard to the likely timeframe and the expected quality of such systems. The author considers how this low acceptance poses a difficulty for those involved in change management.
    Source
    Cataloging and classification quarterly. 53(2015) no.8, p.895-904
  19. Golub, K.: Automated subject indexing : an overview (2021) 0.02
    Abstract
    In the face of the ever-increasing document volume, libraries around the globe are more and more exploring (semi-) automated approaches to subject indexing. This helps sustain bibliographic objectives, enrich metadata, and establish more connections across documents from various collections, effectively leading to improved information retrieval and access. However, generally accepted automated approaches that are functional in operative systems are lacking. This article aims to provide an overview of basic principles used for automated subject indexing, major approaches in relation to their possible application in actual library systems, existing working examples, as well as related challenges calling for further research.
    Source
    Cataloging and classification quarterly. 59(2021) no.8, p.702-719
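One of the basic approaches such overviews distinguish is supervised text categorization: learn a term profile for each subject heading from already-indexed records, then assign to a new document the headings whose profiles it best matches. A minimal lexical sketch of that idea, using invented headings and training snippets (real systems use far richer features and models):

```python
from collections import Counter, defaultdict

def train(labeled_docs):
    """Build a bag-of-words profile per subject heading from indexed records."""
    profiles = defaultdict(Counter)
    for text, heading in labeled_docs:
        profiles[heading].update(text.lower().split())
    return profiles

def assign(text, profiles, k=1):
    """Assign the k headings whose profiles share the most terms with text."""
    words = Counter(text.lower().split())
    overlap = {h: sum(min(words[w], prof[w]) for w in words)
               for h, prof in profiles.items()}
    return sorted(overlap, key=overlap.get, reverse=True)[:k]

# Invented toy training data standing in for existing catalog records:
profiles = train([
    ("library catalog subject heading records", "Cataloging"),
    ("neural network training data models", "Machine learning"),
])
picked = assign("subject access to the library catalog", profiles)
# picked == ['Cataloging']
```

The operational challenge the article points to is exactly the gap between sketches like this and approaches robust enough for production library systems.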
  20. Lassalle, E.: Text retrieval : from a monolingual system to a multilingual system (1993) 0.02
    Abstract
    Describes the TELMI monolingual text retrieval system and its future extension, a multilingual system. TELMI is designed for medium-sized databases containing short texts. The characteristics of the system are fine-grained natural language processing (NLP); an open domain and a large-scale knowledge base; automated indexing based on conceptual representation of texts; and reusability of the NLP tools. Discusses the French MINITEL service, the MGS information service, and the TELMI research system, covering the full-text system, the NLP architecture, the lexical, syntactic and semantic levels, and an example of the use of a generic system.
    Source
    Journal of document and text management. 1(1993) no.1, p.65-74
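The relevance scores attached to each result on this page (0.08, 0.02, ...) come from Lucene's ClassicSimilarity, a tf-idf scheme: each matching term contributes queryWeight x fieldWeight, the contributions are summed, and the sum is scaled by a coordination factor for the fraction of query terms matched. A minimal sketch that reproduces result 13's score from the constants in the engine's explain output:

```python
import math

def term_score(freq, idf, query_norm, field_norm):
    """One term's contribution under Lucene ClassicSimilarity:
    queryWeight * fieldWeight = (idf * queryNorm) * (sqrt(tf) * idf * fieldNorm).
    """
    query_weight = idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

# Constants taken from the explain output for result 13 (doc 710):
QUERY_NORM = 0.043909185
s_of = term_score(freq=4.0, idf=1.5637573,
                  query_norm=QUERY_NORM, field_norm=0.0546875)
s_cataloging = term_score(freq=4.0, idf=3.9411201,
                          query_norm=QUERY_NORM, field_norm=0.0546875)

# coord(2/7): only 2 of the 7 query terms matched this document.
total = (s_of + s_cataloging) * 2 / 7
# total is approximately 0.0247, i.e. the 0.02 shown beside the title
```

Note how the rare term ("cataloging", high idf) dominates the score, while the ubiquitous term ("of", low idf) contributes little.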


Types

  • a 242
  • el 22
  • x 8
  • m 4
  • s 2
