Document (#44141)

Author
Chou, C.
Chu, T.
Title
An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg
Source
Cataloging and classification quarterly. 60(2022) no.8, p.807-835
Year
2022
Abstract
In light of AI (Artificial Intelligence) and NLP (Natural Language Processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used for machine-assisted indexing of the Project Gutenberg collection by suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.
Content
Cf.: https://www.tandfonline.com/doi/full/10.1080/01639374.2022.2138666.
Theme
Computational linguistics
Automatic indexing
Object
BERT
Project Gutenberg
LCSH
LCC

Similar documents (author)

  1. Chou, D.D.: Developing an Intranet : tool selection and management issues (1998) 5.87
    5.871439 = sum of:
      5.871439 = weight(author_txt:chou in 2425) [ClassicSimilarity], result of:
        5.871439 = fieldWeight in 2425, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.625 = fieldNorm(doc=2425)
    
  2. Chou, L.: Informativ, interaktiv, kollaborativ und selbstbestimmt : Mit digitalen Lernumgebungen verändern sich die Lernprozesse (2000) 5.87
    5.871439 = sum of:
      5.871439 = weight(author_txt:chou in 5211) [ClassicSimilarity], result of:
        5.871439 = fieldWeight in 5211, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.625 = fieldNorm(doc=5211)
    
  3. Chou, C.: Purpose-driven assessment of cataloging and metadata services : transforming broken links into linked data (2019) 5.87
    5.871439 = sum of:
      5.871439 = weight(author_txt:chou in 5280) [ClassicSimilarity], result of:
        5.871439 = fieldWeight in 5280, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.625 = fieldNorm(doc=5280)
    
  4. Chou, S.W.; Tsai, Y.H.: Knowledge creation : individual and organizational perspectives (2005) 4.70
    4.697151 = sum of:
      4.697151 = weight(author_txt:chou in 4648) [ClassicSimilarity], result of:
        4.697151 = fieldWeight in 4648, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.5 = fieldNorm(doc=4648)
    
  5. Kalczynski, P.J.; Chou, A.: Temporal Document Retrieval Model for business news archives (2005) 4.70
    4.697151 = sum of:
      4.697151 = weight(author_txt:chou in 1030) [ClassicSimilarity], result of:
        4.697151 = fieldWeight in 1030, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.5 = fieldNorm(doc=1030)
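
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the listing follows the older Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); that assumption is consistent with the idf value 9.394302 shown for docFreq=9 and maxDocs=44218. The function names are hypothetical and chosen for illustration only; the fieldNorm value is taken from the listing rather than recomputed.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency (assumed older ClassicSimilarity form)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explanation tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: author_txt:chou, docFreq=9, maxDocs=44218.
score_norm_0625 = field_weight(freq=1.0, doc_freq=9, max_docs=44218, field_norm=0.625)
score_norm_05 = field_weight(freq=1.0, doc_freq=9, max_docs=44218, field_norm=0.5)

print(score_norm_0625)  # compare with 5.871439 in the listing
print(score_norm_05)    # compare with 4.697151 in the listing
```

Small differences in the last digits are to be expected, since Lucene computes these values in single precision. The only factor that separates the 5.87 entries from the 4.70 entries is fieldNorm (0.625 versus 0.5).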
    

Similar documents (content)

  1. Lowe, D.B.; Dollinger, I.; Koster, T.; Herbert, B.E.: Text mining for type of research classification (2021) 0.18
    0.17695145 = sum of:
      0.17695145 = product of:
        0.8847572 = sum of:
          0.035187505 = weight(abstract_txt:representations in 720) [ClassicSimilarity], result of:
            0.035187505 = score(doc=720,freq=1.0), product of:
              0.074985094 = queryWeight, product of:
                1.025662 = boost
                6.006528 = idf(docFreq=295, maxDocs=44218)
                0.012171587 = queryNorm
              0.46925998 = fieldWeight in 720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.006528 = idf(docFreq=295, maxDocs=44218)
                0.078125 = fieldNorm(doc=720)
          0.03854728 = weight(abstract_txt:project in 720) [ClassicSimilarity], result of:
            0.03854728 = score(doc=720,freq=2.0), product of:
              0.079685345 = queryWeight, product of:
                1.4952747 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.012171587 = queryNorm
              0.48374367 = fieldWeight in 720, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.078125 = fieldNorm(doc=720)
          0.12365363 = weight(abstract_txt:bidirectional in 720) [ClassicSimilarity], result of:
            0.12365363 = score(doc=720,freq=1.0), product of:
              0.17332207 = queryWeight, product of:
                1.5593503 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.012171587 = queryNorm
              0.71343267 = fieldWeight in 720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.078125 = fieldNorm(doc=720)
          0.13056464 = weight(abstract_txt:encoder in 720) [ClassicSimilarity], result of:
            0.13056464 = score(doc=720,freq=1.0), product of:
              0.17972136 = queryWeight, product of:
                1.5878761 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.012171587 = queryNorm
              0.72648376 = fieldWeight in 720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.078125 = fieldNorm(doc=720)
          0.5568042 = weight(abstract_txt:bert in 720) [ClassicSimilarity], result of:
            0.5568042 = score(doc=720,freq=1.0), product of:
              0.750247 = queryWeight, product of:
                6.4885683 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.012171587 = queryNorm
              0.74216115 = fieldWeight in 720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.078125 = fieldNorm(doc=720)
        0.2 = coord(5/25)
    
  2. Xiang, R.; Chersoni, E.; Lu, Q.; Huang, C.-R.; Li, W.; Long, Y.: Lexical data augmentation for sentiment analysis (2021) 0.16
    0.15966268 = sum of:
      0.15966268 = product of:
        0.66526115 = sum of:
          0.024631254 = weight(abstract_txt:representations in 392) [ClassicSimilarity], result of:
            0.024631254 = score(doc=392,freq=1.0), product of:
              0.074985094 = queryWeight, product of:
                1.025662 = boost
                6.006528 = idf(docFreq=295, maxDocs=44218)
                0.012171587 = queryNorm
              0.328482 = fieldWeight in 392, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.006528 = idf(docFreq=295, maxDocs=44218)
                0.0546875 = fieldNorm(doc=392)
          0.008617713 = weight(abstract_txt:used in 392) [ClassicSimilarity], result of:
            0.008617713 = score(doc=392,freq=1.0), product of:
              0.046908904 = queryWeight, product of:
                1.1472535 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.012171587 = queryNorm
              0.18371168 = fieldWeight in 392, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0546875 = fieldNorm(doc=392)
          0.08655754 = weight(abstract_txt:bidirectional in 392) [ClassicSimilarity], result of:
            0.08655754 = score(doc=392,freq=1.0), product of:
              0.17332207 = queryWeight, product of:
                1.5593503 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.012171587 = queryNorm
              0.49940285 = fieldWeight in 392, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0546875 = fieldNorm(doc=392)
          0.091395244 = weight(abstract_txt:encoder in 392) [ClassicSimilarity], result of:
            0.091395244 = score(doc=392,freq=1.0), product of:
              0.17972136 = queryWeight, product of:
                1.5878761 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.012171587 = queryNorm
              0.5085386 = fieldWeight in 392, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.0546875 = fieldNorm(doc=392)
          0.06429645 = weight(abstract_txt:models in 392) [ClassicSimilarity], result of:
            0.06429645 = score(doc=392,freq=2.0), product of:
              0.17910948 = queryWeight, product of:
                3.1703415 = boost
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.012171587 = queryNorm
              0.35897845 = fieldWeight in 392, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.0546875 = fieldNorm(doc=392)
          0.3897629 = weight(abstract_txt:bert in 392) [ClassicSimilarity], result of:
            0.3897629 = score(doc=392,freq=1.0), product of:
              0.750247 = queryWeight, product of:
                6.4885683 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.012171587 = queryNorm
              0.5195128 = fieldWeight in 392, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.0546875 = fieldNorm(doc=392)
        0.24 = coord(6/25)
    
  3. Humphrey, S.M.: Use and management of classification systems for knowledge-based indexing (1992) 0.12
    0.117243 = sum of:
      0.117243 = product of:
        0.48851252 = sum of:
          0.045656625 = weight(abstract_txt:intelligence in 2094) [ClassicSimilarity], result of:
            0.045656625 = score(doc=2094,freq=1.0), product of:
              0.0712798 = queryWeight, product of:
                5.8562455 = idf(docFreq=343, maxDocs=44218)
                0.012171587 = queryNorm
              0.64052683 = fieldWeight in 2094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8562455 = idf(docFreq=343, maxDocs=44218)
                0.109375 = fieldNorm(doc=2094)
          0.05073218 = weight(abstract_txt:artificial in 2094) [ClassicSimilarity], result of:
            0.05073218 = score(doc=2094,freq=1.0), product of:
              0.07646915 = queryWeight, product of:
                1.0357618 = boost
                6.0656753 = idf(docFreq=278, maxDocs=44218)
                0.012171587 = queryNorm
              0.66343325 = fieldWeight in 2094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0656753 = idf(docFreq=278, maxDocs=44218)
                0.109375 = fieldNorm(doc=2094)
          0.053966194 = weight(abstract_txt:project in 2094) [ClassicSimilarity], result of:
            0.053966194 = score(doc=2094,freq=2.0), product of:
              0.079685345 = queryWeight, product of:
                1.4952747 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.012171587 = queryNorm
              0.67724115 = fieldWeight in 2094, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.109375 = fieldNorm(doc=2094)
          0.022069803 = weight(abstract_txt:library in 2094) [ClassicSimilarity], result of:
            0.022069803 = score(doc=2094,freq=1.0), product of:
              0.06331938 = queryWeight, product of:
                1.6324718 = boost
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.012171587 = queryNorm
              0.34854737 = fieldWeight in 2094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.109375 = fieldNorm(doc=2094)
          0.16643652 = weight(abstract_txt:assisted in 2094) [ClassicSimilarity], result of:
            0.16643652 = score(doc=2094,freq=1.0), product of:
              0.21271904 = queryWeight, product of:
                2.4430645 = boost
                7.1535926 = idf(docFreq=93, maxDocs=44218)
                0.012171587 = queryNorm
              0.7824242 = fieldWeight in 2094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1535926 = idf(docFreq=93, maxDocs=44218)
                0.109375 = fieldNorm(doc=2094)
          0.1496512 = weight(abstract_txt:indexing in 2094) [ClassicSimilarity], result of:
            0.1496512 = score(doc=2094,freq=4.0), product of:
              0.15728383 = queryWeight, product of:
                2.970905 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.012171587 = queryNorm
              0.9514723 = fieldWeight in 2094, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.109375 = fieldNorm(doc=2094)
        0.24 = coord(6/25)
    
  4. From Gutenberg to the global information infrastructure : access to information in the networked world (2000) 0.09
    0.08619995 = sum of:
      0.08619995 = product of:
        0.7183329 = sum of:
          0.021134704 = weight(abstract_txt:digital in 3886) [ClassicSimilarity], result of:
            0.021134704 = score(doc=3886,freq=1.0), product of:
              0.078042306 = queryWeight, product of:
                1.4797788 = boost
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.012171587 = queryNorm
              0.27081087 = fieldWeight in 3886, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.0625 = fieldNorm(doc=3886)
          0.012611316 = weight(abstract_txt:library in 3886) [ClassicSimilarity], result of:
            0.012611316 = score(doc=3886,freq=1.0), product of:
              0.06331938 = queryWeight, product of:
                1.6324718 = boost
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.012171587 = queryNorm
              0.19916992 = fieldWeight in 3886, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.0625 = fieldNorm(doc=3886)
          0.6845869 = weight(title_txt:gutenberg in 3886) [ClassicSimilarity], result of:
            0.6845869 = score(doc=3886,freq=1.0), product of:
              0.31471083 = queryWeight, product of:
                2.9715812 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.012171587 = queryNorm
              2.1752887 = fieldWeight in 3886, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.25 = fieldNorm(doc=3886)
        0.12 = coord(3/25)
    
  5. Meng, K.; Ba, Z.; Ma, Y.; Li, G.: A network coupling approach to detecting hierarchical linkages between science and technology (2024) 0.08
    0.07785815 = sum of:
      0.07785815 = product of:
        0.64881796 = sum of:
          0.0989229 = weight(abstract_txt:bidirectional in 1205) [ClassicSimilarity], result of:
            0.0989229 = score(doc=1205,freq=1.0), product of:
              0.17332207 = queryWeight, product of:
                1.5593503 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.012171587 = queryNorm
              0.5707461 = fieldWeight in 1205, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=1205)
          0.104451716 = weight(abstract_txt:encoder in 1205) [ClassicSimilarity], result of:
            0.104451716 = score(doc=1205,freq=1.0), product of:
              0.17972136 = queryWeight, product of:
                1.5878761 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.012171587 = queryNorm
              0.581187 = fieldWeight in 1205, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.0625 = fieldNorm(doc=1205)
          0.44544333 = weight(abstract_txt:bert in 1205) [ClassicSimilarity], result of:
            0.44544333 = score(doc=1205,freq=1.0), product of:
              0.750247 = queryWeight, product of:
                6.4885683 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.012171587 = queryNorm
              0.5937289 = fieldWeight in 1205, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.0625 = fieldNorm(doc=1205)
        0.12 = coord(3/25)
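
The content-similarity scores combine per-term scores with a coordination factor. The sketch below recomputes the first entry (doc=720) under the same assumptions as the listing appears to use: each term score is queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm, and the sum is multiplied by coord = matched terms / query terms. The boost, docFreq, freq, fieldNorm and queryNorm values are copied from the listing; the helper names are hypothetical.

```python
import math

QUERY_NORM = 0.012171587  # queryNorm as shown in the listing
MAX_DOCS = 44218          # maxDocs as shown in the listing


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed older ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (freq, docFreq, boost) for the five terms matched in doc=720.
terms_720 = [
    (1.0, 295, 1.025662),    # representations
    (2.0, 1507, 1.4952747),  # project
    (1.0, 12, 1.5593503),    # bidirectional
    (1.0, 10, 1.5878761),    # encoder
    (1.0, 8, 6.4885683),     # bert
]
FIELD_NORM_720 = 0.078125

raw_sum = sum(term_score(f, df, b, FIELD_NORM_720) for f, df, b in terms_720)
coord = 5 / 25  # five of the 25 query terms matched
final_score = raw_sum * coord

print(raw_sum)      # compare with 0.8847572 in the listing
print(final_score)  # compare with 0.17695145 in the listing
```

Because idf enters both queryWeight and fieldWeight, rare terms such as "bert" (docFreq=8) dominate the sum, while the coord factor lowers the score of documents that match only a few of the 25 query terms.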