Search (5 results, page 1 of 1)

  • theme_ss:"Automatisches Indexieren" (automatic indexing)
  • year_i:[2020 TO 2030}
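The year filter is a half-open range. In Lucene/Solr range syntax, a square bracket marks an inclusive bound and a curly brace marks an exclusive one, and the two may be mixed, so `[2020 TO 2030}` selects years from 2020 up to but not including 2030. The Python sketch below only illustrates those semantics. The helpers `parse_range` and `matches` are hypothetical, they are not Solr's query parser, and they handle only integer bounds without `*` wildcards.

```python
import re

# Hypothetical helper: parse "field:[lo TO hi}"-style ranges (integer bounds only).
RANGE_RE = re.compile(r"(\w+):([\[{])\s*(-?\d+)\s+TO\s+(-?\d+)\s*([\]}])")

def parse_range(expr: str):
    """Return (field, lo, hi, lo_inclusive, hi_inclusive) for a range expression."""
    m = RANGE_RE.fullmatch(expr)
    if m is None:
        raise ValueError(f"unsupported range expression: {expr!r}")
    field, open_b, lo, hi, close_b = m.groups()
    # "[" / "]" = inclusive bound, "{" / "}" = exclusive bound
    return field, int(lo), int(hi), open_b == "[", close_b == "]"

def matches(value: int, lo: int, hi: int, lo_incl: bool, hi_incl: bool) -> bool:
    """Check a value against the parsed bounds."""
    above = value >= lo if lo_incl else value > lo
    below = value <= hi if hi_incl else value < hi
    return above and below

field, lo, hi, lo_incl, hi_incl = parse_range("year_i:[2020 TO 2030}")
print(field, lo, hi, lo_incl, hi_incl)  # year_i 2020 2030 True False
print([y for y in (2019, 2020, 2022, 2029, 2030) if matches(y, lo, hi, lo_incl, hi_incl)])
# [2020, 2022, 2029]
```

With this upper bound, a 2030 publication would be excluded while 2029 is still included.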
  1. Chou, C.; Chu, T.: ¬An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.14
    0.13541423 = product of:
      0.1805523 = sum of:
        0.085297674 = weight(_text_:digital in 1139) [ClassicSimilarity], result of:
          0.085297674 = score(doc=1139,freq=4.0), product of:
            0.19770671 = queryWeight, product of:
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.050121464 = queryNorm
            0.43143538 = fieldWeight in 1139, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.944552 = idf(docFreq=2326, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
        0.04641878 = weight(_text_:library in 1139) [ClassicSimilarity], result of:
          0.04641878 = score(doc=1139,freq=6.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.3522223 = fieldWeight in 1139, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
        0.048835836 = product of:
          0.09767167 = sum of:
            0.09767167 = weight(_text_:project in 1139) [ClassicSimilarity], result of:
              0.09767167 = score(doc=1139,freq=4.0), product of:
                0.21156175 = queryWeight, product of:
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.050121464 = queryNorm
                0.4616698 = fieldWeight in 1139, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1139)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    In light of AI (artificial intelligence) and NLP (natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can support machine-assisted indexing of the Project Gutenberg collection by suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on using BERT models to assist with automatic subject indexing for digital library collections.
  2. Moulaison-Sandy, H.; Adkins, D.; Bossaller, J.; Cho, H.: ¬An automated approach to describing fiction : a methodology to use book reviews to identify affect (2021) 0.04
    0.037817866 = product of:
      0.07563573 = sum of:
        0.026799891 = weight(_text_:library in 710) [ClassicSimilarity], result of:
          0.026799891 = score(doc=710,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.20335563 = fieldWeight in 710, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=710)
        0.048835836 = product of:
          0.09767167 = sum of:
            0.09767167 = weight(_text_:project in 710) [ClassicSimilarity], result of:
              0.09767167 = score(doc=710,freq=4.0), product of:
                0.21156175 = queryWeight, product of:
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.050121464 = queryNorm
                0.4616698 = fieldWeight in 710, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=710)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Subject headings and genre terms are notoriously difficult to apply, yet they are important for fiction. The current project serves as a proof of concept, using a text-mining methodology to identify affective information (emotion and tone) about fiction titles from professional book reviews as a potential first step in automating the subject analysis process. Findings are presented and discussed, comparing results to the range of aboutness and isness information in library cataloging records. The methodology is also presented, along with a discussion of how future work might expand on the current project to enhance catalog records through text mining.
  3. Lowe, D.B.; Dollinger, I.; Koster, T.; Herbert, B.E.: Text mining for type of research classification (2021) 0.01
    0.010464822 = product of:
      0.041859288 = sum of:
        0.041859288 = product of:
          0.083718576 = sum of:
            0.083718576 = weight(_text_:project in 720) [ClassicSimilarity], result of:
              0.083718576 = score(doc=720,freq=4.0), product of:
                0.21156175 = queryWeight, product of:
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.050121464 = queryNorm
                0.39571697 = fieldWeight in 720, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.220981 = idf(docFreq=1764, maxDocs=44218)
                  0.046875 = fieldNorm(doc=720)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This project brought together undergraduate students in Computer Science with librarians to mine abstracts of articles from the Texas A&M University Libraries' institutional repository, OAKTrust, in order to explore the creation of new metadata that improves discovery and use. The mining task consisted simply of classifying the articles into two categories of research type: basic research ("for understanding," "curiosity-based," or "knowledge-based") and applied research ("use-based"). These categories are fundamental for funders in particular, but are also important to researchers. The mining-to-classification steps took several iterations, but ultimately we achieved good results with BERT (Bidirectional Encoder Representations from Transformers). The project and its workflows offer a preview of how text mining techniques may be used in the future to craft metadata that enhances discoverability.
  4. Asula, M.; Makke, J.; Freienthal, L.; Kuulmets, H.-A.; Sirel, R.: Kratt: developing an automatic subject indexing tool for the National Library of Estonia : how to transfer metadata information among work cluster members (2021) 0.01
    0.008121594 = product of:
      0.032486375 = sum of:
        0.032486375 = weight(_text_:library in 723) [ClassicSimilarity], result of:
          0.032486375 = score(doc=723,freq=4.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.24650425 = fieldWeight in 723, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.046875 = fieldNorm(doc=723)
      0.25 = coord(1/4)
    
    Abstract
    Manual subject indexing in libraries is a time-consuming and costly process, and the quality of the assigned subjects is affected by the cataloger's knowledge of the specific topics covered in the book. To address these issues, we exploited the opportunities arising from artificial intelligence to develop Kratt, a prototype automatic subject indexing tool. Kratt can subject index a book regardless of its extent and genre, assigning a set of keywords from the Estonian Subject Thesaurus. Kratt takes approximately one minute to subject index a book, which is 10 to 15 times faster than a human indexer. Although the catalogers did not consider the resulting keywords satisfactory, ratings from a small sample of regular library users were more promising. We also argue that the results can be improved by training the model on a larger corpus and applying more careful preprocessing techniques.
  5. Golub, K.: Automated subject indexing : an overview (2021) 0.01
    0.006699973 = product of:
      0.026799891 = sum of:
        0.026799891 = weight(_text_:library in 718) [ClassicSimilarity], result of:
          0.026799891 = score(doc=718,freq=2.0), product of:
            0.1317883 = queryWeight, product of:
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.050121464 = queryNorm
            0.20335563 = fieldWeight in 718, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6293786 = idf(docFreq=8668, maxDocs=44218)
              0.0546875 = fieldNorm(doc=718)
      0.25 = coord(1/4)
    
    Abstract
    In the face of ever-increasing document volumes, libraries around the globe are increasingly exploring (semi-)automated approaches to subject indexing. Such approaches help sustain bibliographic objectives, enrich metadata, and establish more connections across documents from various collections, ultimately improving information retrieval and access. However, generally accepted automated approaches that work in operational systems are still lacking. This article aims to provide an overview of the basic principles used for automated subject indexing, the major approaches and their possible application in actual library systems, existing working examples, and related challenges that call for further research.
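The explain trees in this listing follow Lucene's ClassicSimilarity (TF-IDF) scoring. For each matched term, weight = queryWeight * fieldWeight, where queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, tf = sqrt(freq), and idf = 1 + ln(maxDocs / (docFreq + 1)). That idf form reproduces the idf values shown. Matched clauses are summed and then scaled by coord, the fraction of query clauses that matched. The minimal Python sketch below recomputes the five listed scores from those numbers. It rests on these assumptions: queryNorm and fieldNorm are copied from the output rather than derived, query boosts are 1.0, and two clauses never match in these results. Those two clauses are the fourth top-level clause and the second sub-clause of the nested "project" query, and the listing does not identify either one.

```python
import math

# Constants copied from the explain output (queryNorm is not recomputed here).
MAX_DOCS = 44218
QUERY_NORM = 0.050121464
DOC_FREQ = {"digital": 2326, "library": 8668, "project": 1764}
TOP_LEVEL_CLAUSES = 4  # from coord(x/4); the 4th clause never matches in these results
NESTED_COORD = 1 / 2   # "project" sits in a 2-clause sub-query; only 1 sub-clause matched

def idf(doc_freq: int) -> float:
    """ClassicSimilarity-style idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))

def term_weight(term: str, freq: float, field_norm: float) -> float:
    """weight = queryWeight * fieldWeight, assuming a query boost of 1.0."""
    term_idf = idf(DOC_FREQ[term])
    query_weight = term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight

def doc_score(freqs: dict, field_norm: float) -> float:
    """Sum the matched top-level clauses, then multiply by coord = matched / 4."""
    parts = [term_weight(t, freqs[t], field_norm) for t in ("digital", "library") if t in freqs]
    if "project" in freqs:
        parts.append(term_weight("project", freqs["project"], field_norm) * NESTED_COORD)
    return sum(parts) * len(parts) / TOP_LEVEL_CLAUSES

# doc id -> (term frequencies, fieldNorm, score reported in the explain output)
RESULTS = {
    1139: ({"digital": 4, "library": 6, "project": 4}, 0.0546875, 0.13541423),
    710: ({"library": 2, "project": 4}, 0.0546875, 0.037817866),
    720: ({"project": 4}, 0.046875, 0.010464822),
    723: ({"library": 4}, 0.046875, 0.008121594),
    718: ({"library": 2}, 0.0546875, 0.006699973),
}

for doc, (freqs, norm, reported) in RESULTS.items():
    score = doc_score(freqs, norm)
    assert math.isclose(score, reported, rel_tol=1e-5), (doc, score, reported)
    print(doc, f"{score:.2f}")
# 1139 0.14
# 710 0.04
# 720 0.01
# 723 0.01
# 718 0.01
```

The two-decimal values match the scores printed next to each title. The coord factor accounts for much of the spread. Result 1 matches 3 of 4 clauses and receives coord(3/4), while results 3 to 5 match only 1 clause and are multiplied by coord(1/4).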