Search (2 results, page 1 of 1)

Ahmed, M.: Automatic indexing for agriculture : designing a framework by deploying Agrovoc, Agris and Annif (2023) 0.00
```
0.002740105 = product of:
  0.00548021 = sum of:
    0.00548021 = product of:
      0.01096042 = sum of:
        0.01096042 = weight(_text_:a in 1024) [ClassicSimilarity], result of:
          0.01096042 = score(doc=1024,freq=26.0), product of:
            0.04772363 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041389145 = queryNorm
            0.22966442 = fieldWeight in 1024, product of:
              5.0990195 = tf(freq=26.0), with freq of:
                26.0 = termFreq=26.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1024)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
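The explain tree above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function name and parameter names are illustrative and not part of the search system. The input values are copied from the listing for doc 1024.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm,
                             coords=(0.5, 0.5)):
    """Recompute a single-term ClassicSimilarity score from explain-output values.

    Hypothetical helper for illustration; it mirrors the nesting of the
    explain tree rather than any Lucene API.
    """
    tf = math.sqrt(freq)                             # tf(freq=...)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                  # queryWeight
    field_weight = tf * idf * field_norm             # fieldWeight
    score = query_weight * field_weight              # weight(_text_:a in doc)
    for coord in coords:                             # the two coord(1/2) factors
        score *= coord
    return score


# Values taken from the explain output for doc 1024.
score_1024 = classic_similarity_score(
    freq=26.0,
    doc_freq=37942,
    max_docs=44218,
    query_norm=0.041389145,
    field_norm=0.0390625,
)
print(f"{score_1024:.9f}")
```

Lucene computes these values in single precision, so the result should agree with the listed 0.002740105 only up to floating-point rounding in the last digits.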
Abstract

There are several ways to employ machine learning for automating subject indexing. One popular strategy is to use a supervised learning algorithm to train a model on a set of documents that have been manually indexed by subject using a standard vocabulary. The resulting model can then predict the subjects of new, previously unseen documents by identifying patterns learned from the training data. To do this, the first step is to gather a large dataset of documents and manually assign each document a set of subject keywords/descriptors from a controlled vocabulary (e.g., Agrovoc). Next, the dataset (obtained from Agris) can be divided into (i) a training dataset and (ii) a test dataset. The training dataset is used to train the model, while the test dataset is used to evaluate the model's performance. Machine learning can be a powerful tool for automating the process of subject indexing. This research is an attempt to apply Annif (http://annif.org/), an open-source AI/ML framework, to autogenerate subject keywords/descriptors for documentary resources in the domain of agriculture. The training dataset is obtained from Agris, which applies the Agrovoc thesaurus as a vocabulary tool (https://www.fao.org/agris/download).
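The training/test division described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the record layout (a text paired with a list of descriptors), the 80/20 ratio, the sample records, and the function name are all hypothetical and are not taken from the paper or from Annif.

```python
import random


def split_records(records, test_fraction=0.2, seed=42):
    """Shuffle indexed records and split them into (training, test) lists.

    Hypothetical helper: the fraction and seed are illustrative defaults.
    """
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a repeatable split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up placeholder records: (document text, manually assigned descriptors).
records = [(f"document {i}", [f"descriptor-{i % 3}"]) for i in range(10)]
training_set, test_set = split_records(records)
print(len(training_set), len(test_set))
```

The held-out test list is never shown to the model during training, which is what allows it to measure performance on unseen documents.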

Type

a
Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.00
```
0.002279905 = product of:
  0.00455981 = sum of:
    0.00455981 = product of:
      0.00911962 = sum of:
        0.00911962 = weight(_text_:a in 977) [ClassicSimilarity], result of:
          0.00911962 = score(doc=977,freq=18.0), product of:
            0.04772363 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041389145 = queryNorm
            0.19109234 = fieldWeight in 977, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=977)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The research study reported here is an attempt to explore the possibilities of an AI/ML-based semi-automated indexing system in a library setup to handle large volumes of documents. It uses a Python virtual environment to install and configure an open-source AI environment (named Annif) and to feed it the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. The training of Annif was conducted with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) and indexed by trained LIS professionals. The training dataset is first treated with MarcEdit to export it in a format suitable for OpenRefine, and then in OpenRefine it undergoes many steps to produce a bibliographic record set suitable to train Annif. The framework, after training, has been tested with a bibliographic dataset to measure indexing efficiencies, and finally, the automated indexing framework is integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.

Type

a