Search (5 results, page 1 of 1)

Lowe, D.B.; Dollinger, I.; Koster, T.; Herbert, B.E.: Text mining for type of research classification (2021) 0.01

0.011352874 = product of:
  0.022705749 = sum of:
    0.022705749 = product of:
      0.045411497 = sum of:
        0.045411497 = weight(_text_:i in 720) [ClassicSimilarity], result of:
          0.045411497 = score(doc=720,freq=2.0), product of:
            0.18162222 = queryWeight, product of:
              3.7717297 = idf(docFreq=2765, maxDocs=44218)
              0.04815356 = queryNorm
            0.25003272 = fieldWeight in 720, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7717297 = idf(docFreq=2765, maxDocs=44218)
              0.046875 = fieldNorm(doc=720)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
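
The explanation tree above can be checked by hand. The following is a minimal sketch, assuming the tf and idf definitions of Lucene's ClassicSimilarity (tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1))); the function name and parameter names are illustrative and not part of any library. All numeric inputs are copied from the listing.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coords=()):
    """Recompute one term's score from the values shown in an explain tree.

    Hypothetical helper for illustration; it is not a Lucene or Solr API.
    """
    tf = math.sqrt(freq)                              # tf(freq=2.0) = 1.4142135
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                   # "queryWeight, product of"
    field_weight = tf * idf * field_norm              # "fieldWeight, product of"
    score = query_weight * field_weight               # "score(doc=..., freq=...)"
    for coord in coords:                              # each coord(n/m) scales the sum
        score *= coord
    return score


# Values for doc 720 (Lowe et al.), taken from the tree above.
doc_720 = classic_similarity_score(
    freq=2.0,
    doc_freq=2765,
    max_docs=44218,
    query_norm=0.04815356,
    field_norm=0.046875,
    coords=(0.5, 0.5),
)
print(f"{doc_720:.6f}")
```

The result agrees with the listed 0.011352874 to about six decimal places; small differences in later digits are expected because Lucene computes these values in single precision. The same function applies to the other entries by swapping in their docFreq, fieldNorm, and coord values.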

Suominen, O.; Koskenniemi, I.: Annif Analyzer Shootout : comparing text lemmatization methods for automated subject indexing (2022) 0.01

0.009460728 = product of:
  0.018921455 = sum of:
    0.018921455 = product of:
      0.03784291 = sum of:
        0.03784291 = weight(_text_:i in 658) [ClassicSimilarity], result of:
          0.03784291 = score(doc=658,freq=2.0), product of:
            0.18162222 = queryWeight, product of:
              3.7717297 = idf(docFreq=2765, maxDocs=44218)
              0.04815356 = queryNorm
            0.20836058 = fieldWeight in 658, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7717297 = idf(docFreq=2765, maxDocs=44218)
              0.0390625 = fieldNorm(doc=658)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Ahmed, M.: Automatic indexing for agriculture : designing a framework by deploying Agrovoc, Agris and Annif (2023) 0.01
```
0.009460728 = product of:
  0.018921455 = sum of:
    0.018921455 = product of:
      0.03784291 = sum of:
        0.03784291 = weight(_text_:i in 1024) [ClassicSimilarity], result of:
          0.03784291 = score(doc=1024,freq=2.0), product of:
            0.18162222 = queryWeight, product of:
              3.7717297 = idf(docFreq=2765, maxDocs=44218)
              0.04815356 = queryNorm
            0.20836058 = fieldWeight in 1024, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.7717297 = idf(docFreq=2765, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1024)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

There are several ways to employ machine learning for automating subject indexing. One popular strategy is to utilize a supervised learning algorithm to train a model on a set of documents that have been manually indexed by subject matter using a standard vocabulary. The resulting model can then predict the subject of new and previously unseen documents by identifying patterns learned from the training data. To do this, the first step is to gather a large dataset of documents and manually assign each document a set of subject keywords/descriptors from a controlled vocabulary (e.g., from Agrovoc). Next, the dataset (obtained from Agris) can be divided into (i) a training dataset and (ii) a test dataset. The training dataset is used to train the model, while the test dataset is used to evaluate the model's performance. Machine learning can be a powerful tool for automating the process of subject indexing. This research is an attempt to apply Annif (http://annif.org/), an open-source AI/ML framework, to autogenerate subject keywords/descriptors for documentary resources in the domain of agriculture. The training dataset is obtained from Agris, which applies the Agrovoc thesaurus as a vocabulary tool (https://www.fao.org/agris/download).
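
The train/test division described in this abstract can be sketched as follows. This is a minimal illustration in plain Python, not the paper's pipeline and not Annif's API: the function name, the 80/20 ratio, the seed, and the sample records are all assumptions made for the example.

```python
import random


def split_dataset(records, test_fraction=0.2, seed=42):
    """Shuffle records reproducibly and split them into (train, test).

    Hypothetical helper; each record is assumed to be a
    (document_text, [descriptor, ...]) pair.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)                 # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)    # fixed seed gives a repeatable split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up sample data standing in for manually indexed documents.
records = [(f"document {i}", ["descriptor-a", "descriptor-b"]) for i in range(10)]
train, test = split_dataset(records)
print(len(train), len(test))
```

The model is fitted only on `train`; `test` is held back so that predicted descriptors can be compared against the manually assigned ones.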

Oliver, C.: Leveraging KOS to extend our reach with automated processes (2021) 0.01

0.0088455975 = product of:
  0.017691195 = sum of:
    0.017691195 = product of:
      0.088455975 = sum of:
        0.088455975 = weight(_text_:authors in 722) [ClassicSimilarity], result of:
          0.088455975 = score(doc=722,freq=2.0), product of:
            0.21952313 = queryWeight, product of:
              4.558814 = idf(docFreq=1258, maxDocs=44218)
              0.04815356 = queryNorm
            0.40294603 = fieldWeight in 722, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.558814 = idf(docFreq=1258, maxDocs=44218)
              0.0625 = fieldNorm(doc=722)
      0.2 = coord(1/5)
  0.5 = coord(1/2)

Abstract: This article provides a conclusion to the special issue on Artificial Intelligence (AI) and Automated Processes for Subject Access. The authors who contributed to this special issue have provoked interesting questions and brought attention to important issues. This concluding article looks at common themes and highlights some of the questions raised.

Chou, C.; Chu, T.: An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.01
```
0.007739898 = product of:
  0.015479796 = sum of:
    0.015479796 = product of:
      0.07739898 = sum of:
        0.07739898 = weight(_text_:authors in 1139) [ClassicSimilarity], result of:
          0.07739898 = score(doc=1139,freq=2.0), product of:
            0.21952313 = queryWeight, product of:
              4.558814 = idf(docFreq=1258, maxDocs=44218)
              0.04815356 = queryNorm
            0.35257778 = fieldWeight in 1139, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.558814 = idf(docFreq=1258, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
      0.2 = coord(1/5)
  0.5 = coord(1/2)
```
Abstract

In light of AI (Artificial Intelligence) and NLP (Natural Language Processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used for machine-assisted indexing of the Project Gutenberg collection by suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.
