Document (#43725)

Author
Asula, M.
Makke, J.
Freienthal, L.
Kuulmets, H.-A.
Sirel, R.
Title
Kratt: developing an automatic subject indexing tool for the National Library of Estonia : how to transfer metadata information among work cluster members
Source
Cataloging and classification quarterly. 59(2021) no.8, p.775-793
Year
2021
Abstract
Manual subject indexing in libraries is a time-consuming and costly process, and the quality of the assigned subjects is affected by the cataloger's knowledge of the specific topics contained in the book. Trying to solve these issues, we exploited the opportunities arising from artificial intelligence to develop Kratt: a prototype of an automatic subject indexing tool. Kratt is able to subject index a book, independently of its extent and genre, with a set of keywords present in the Estonian Subject Thesaurus. It takes Kratt approximately one minute to subject index a book, which is 10-15 times faster than a human cataloger. Although the resulting keywords were not considered satisfactory by the catalogers, the ratings of a small sample of regular library users showed more promise. We also argue that the results can be enhanced by including a bigger corpus for training the model and applying more careful preprocessing techniques.
Content
Cf.: https://doi.org/10.1080/01639374.2021.1998283.
Footnote
Part of a special issue: Artificial intelligence (AI) and automated processes for subject access
Theme
Automatic indexing
Location
Estonia

Similar documents (content)

  1. Fugmann, R.: Book indexing : the classificatory approach (1994) 0.15
    0.1522457 = sum of:
      0.1522457 = product of:
        0.6343571 = sum of:
          0.019582663 = weight(abstract_txt:more in 6920) [ClassicSimilarity], result of:
            0.019582663 = score(doc=6920,freq=1.0), product of:
              0.073677726 = queryWeight, product of:
                3.402088 = idf(docFreq=4002, maxDocs=44218)
                0.021656621 = queryNorm
              0.2657881 = fieldWeight in 6920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.402088 = idf(docFreq=4002, maxDocs=44218)
                0.078125 = fieldNorm(doc=6920)
          0.112266995 = weight(abstract_txt:careful in 6920) [ClassicSimilarity], result of:
            0.112266995 = score(doc=6920,freq=1.0), product of:
              0.18731812 = queryWeight, product of:
                1.1274747 = boost
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.021656621 = queryNorm
              0.5993387 = fieldWeight in 6920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.078125 = fieldNorm(doc=6920)
          0.07532567 = weight(abstract_txt:index in 6920) [ClassicSimilarity], result of:
            0.07532567 = score(doc=6920,freq=2.0), product of:
              0.14356229 = queryWeight, product of:
                1.3958927 = boost
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.021656621 = queryNorm
              0.5246898 = fieldWeight in 6920, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.078125 = fieldNorm(doc=6920)
          0.106323995 = weight(abstract_txt:indexing in 6920) [ClassicSimilarity], result of:
            0.106323995 = score(doc=6920,freq=3.0), product of:
              0.18064776 = queryWeight, product of:
                1.917758 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.021656621 = queryNorm
              0.5885708 = fieldWeight in 6920, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.078125 = fieldNorm(doc=6920)
          0.08543858 = weight(abstract_txt:book in 6920) [ClassicSimilarity], result of:
            0.08543858 = score(doc=6920,freq=1.0), product of:
              0.22519296 = queryWeight, product of:
                2.1411886 = boost
                4.856341 = idf(docFreq=934, maxDocs=44218)
                0.021656621 = queryNorm
              0.37940162 = fieldWeight in 6920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.856341 = idf(docFreq=934, maxDocs=44218)
                0.078125 = fieldNorm(doc=6920)
          0.23541921 = weight(abstract_txt:subject in 6920) [ClassicSimilarity], result of:
            0.23541921 = score(doc=6920,freq=7.0), product of:
              0.29151264 = queryWeight, product of:
                3.4452536 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.021656621 = queryNorm
              0.8075781 = fieldWeight in 6920, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.078125 = fieldNorm(doc=6920)
        0.24 = coord(6/25)
    
  2. Collier, H.: Cool, cool searching (1996) 0.14
    0.13507158 = sum of:
      0.13507158 = product of:
        0.56279826 = sum of:
          0.101016246 = weight(abstract_txt:humans in 4536) [ClassicSimilarity], result of:
            0.101016246 = score(doc=4536,freq=1.0), product of:
              0.15460315 = queryWeight, product of:
                1.0242974 = boost
                6.9694996 = idf(docFreq=112, maxDocs=44218)
                0.021656621 = queryNorm
              0.6533906 = fieldWeight in 4536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9694996 = idf(docFreq=112, maxDocs=44218)
                0.09375 = fieldNorm(doc=4536)
          0.063915946 = weight(abstract_txt:index in 4536) [ClassicSimilarity], result of:
            0.063915946 = score(doc=4536,freq=1.0), product of:
              0.14356229 = queryWeight, product of:
                1.3958927 = boost
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.021656621 = queryNorm
              0.44521406 = fieldWeight in 4536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.74895 = idf(docFreq=1040, maxDocs=44218)
                0.09375 = fieldNorm(doc=4536)
          0.0724547 = weight(abstract_txt:tool in 4536) [ClassicSimilarity], result of:
            0.0724547 = score(doc=4536,freq=1.0), product of:
              0.15607928 = queryWeight, product of:
                1.4554741 = boost
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.021656621 = queryNorm
              0.4642173 = fieldWeight in 4536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.09375 = fieldNorm(doc=4536)
          0.14497182 = weight(abstract_txt:automatic in 4536) [ClassicSimilarity], result of:
            0.14497182 = score(doc=4536,freq=3.0), product of:
              0.17183681 = queryWeight, product of:
                1.5271791 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.021656621 = queryNorm
              0.8436599 = fieldWeight in 4536, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.09375 = fieldNorm(doc=4536)
          0.07366343 = weight(abstract_txt:indexing in 4536) [ClassicSimilarity], result of:
            0.07366343 = score(doc=4536,freq=1.0), product of:
              0.18064776 = queryWeight, product of:
                1.917758 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.021656621 = queryNorm
              0.40777382 = fieldWeight in 4536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.09375 = fieldNorm(doc=4536)
          0.106776126 = weight(abstract_txt:subject in 4536) [ClassicSimilarity], result of:
            0.106776126 = score(doc=4536,freq=1.0), product of:
              0.29151264 = queryWeight, product of:
                3.4452536 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.021656621 = queryNorm
              0.366283 = fieldWeight in 4536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.09375 = fieldNorm(doc=4536)
        0.24 = coord(6/25)
    
  3. Taylor, A.G.: Enhancing subject access in online systems : the year's work in subject analysis, 1991 (1992) 0.12
    0.12125009 = sum of:
      0.12125009 = product of:
        0.60625046 = sum of:
          0.023499196 = weight(abstract_txt:more in 1504) [ClassicSimilarity], result of:
            0.023499196 = score(doc=1504,freq=1.0), product of:
              0.073677726 = queryWeight, product of:
                3.402088 = idf(docFreq=4002, maxDocs=44218)
                0.021656621 = queryNorm
              0.31894574 = fieldWeight in 1504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.402088 = idf(docFreq=4002, maxDocs=44218)
                0.09375 = fieldNorm(doc=1504)
          0.11990837 = weight(abstract_txt:promise in 1504) [ClassicSimilarity], result of:
            0.11990837 = score(doc=1504,freq=1.0), product of:
              0.1733234 = queryWeight, product of:
                1.0845398 = boost
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.021656621 = queryNorm
              0.6918187 = fieldWeight in 1504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.09375 = fieldNorm(doc=1504)
          0.11990837 = weight(abstract_txt:trying in 1504) [ClassicSimilarity], result of:
            0.11990837 = score(doc=1504,freq=1.0), product of:
              0.1733234 = queryWeight, product of:
                1.0845398 = boost
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.021656621 = queryNorm
              0.6918187 = fieldWeight in 1504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.09375 = fieldNorm(doc=1504)
          0.10417582 = weight(abstract_txt:indexing in 1504) [ClassicSimilarity], result of:
            0.10417582 = score(doc=1504,freq=2.0), product of:
              0.18064776 = queryWeight, product of:
                1.917758 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.021656621 = queryNorm
              0.5766793 = fieldWeight in 1504, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.09375 = fieldNorm(doc=1504)
          0.23875868 = weight(abstract_txt:subject in 1504) [ClassicSimilarity], result of:
            0.23875868 = score(doc=1504,freq=5.0), product of:
              0.29151264 = queryWeight, product of:
                3.4452536 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.021656621 = queryNorm
              0.81903374 = fieldWeight in 1504, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.09375 = fieldNorm(doc=1504)
        0.2 = coord(5/25)
    
  4. Langridge, D.W.: Subject analysis : principles and procedures (1989) 0.11
    0.10640134 = sum of:
      0.10640134 = product of:
        0.66500837 = sum of:
          0.11159936 = weight(abstract_txt:automatic in 2021) [ClassicSimilarity], result of:
            0.11159936 = score(doc=2021,freq=1.0), product of:
              0.17183681 = queryWeight, product of:
                1.5271791 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.021656621 = queryNorm
              0.6494497 = fieldWeight in 2021, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.125 = fieldNorm(doc=2021)
          0.17011839 = weight(abstract_txt:indexing in 2021) [ClassicSimilarity], result of:
            0.17011839 = score(doc=2021,freq=3.0), product of:
              0.18064776 = queryWeight, product of:
                1.917758 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.021656621 = queryNorm
              0.9417133 = fieldWeight in 2021, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.125 = fieldNorm(doc=2021)
          0.13670172 = weight(abstract_txt:book in 2021) [ClassicSimilarity], result of:
            0.13670172 = score(doc=2021,freq=1.0), product of:
              0.22519296 = queryWeight, product of:
                2.1411886 = boost
                4.856341 = idf(docFreq=934, maxDocs=44218)
                0.021656621 = queryNorm
              0.6070426 = fieldWeight in 2021, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.856341 = idf(docFreq=934, maxDocs=44218)
                0.125 = fieldNorm(doc=2021)
          0.24658889 = weight(abstract_txt:subject in 2021) [ClassicSimilarity], result of:
            0.24658889 = score(doc=2021,freq=3.0), product of:
              0.29151264 = queryWeight, product of:
                3.4452536 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.021656621 = queryNorm
              0.84589434 = fieldWeight in 2021, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.125 = fieldNorm(doc=2021)
        0.16 = coord(4/25)
    
  5. Academic research on the Internet : options for scholars & librarians (2001) 0.10
    0.09836296 = sum of:
      0.09836296 = product of:
        0.6147685 = sum of:
          0.23909241 = weight(abstract_txt:minute in 686) [ClassicSimilarity], result of:
            0.23909241 = score(doc=686,freq=1.0), product of:
              0.22666036 = queryWeight, product of:
                1.240237 = boost
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.021656621 = queryNorm
              1.0548488 = fieldWeight in 686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.125 = fieldNorm(doc=686)
          0.09660626 = weight(abstract_txt:tool in 686) [ClassicSimilarity], result of:
            0.09660626 = score(doc=686,freq=1.0), product of:
              0.15607928 = queryWeight, product of:
                1.4554741 = boost
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.021656621 = queryNorm
              0.6189564 = fieldWeight in 686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.125 = fieldNorm(doc=686)
          0.13670172 = weight(abstract_txt:book in 686) [ClassicSimilarity], result of:
            0.13670172 = score(doc=686,freq=1.0), product of:
              0.22519296 = queryWeight, product of:
                2.1411886 = boost
                4.856341 = idf(docFreq=934, maxDocs=44218)
                0.021656621 = queryNorm
              0.6070426 = fieldWeight in 686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.856341 = idf(docFreq=934, maxDocs=44218)
                0.125 = fieldNorm(doc=686)
          0.14236817 = weight(abstract_txt:subject in 686) [ClassicSimilarity], result of:
            0.14236817 = score(doc=686,freq=1.0), product of:
              0.29151264 = queryWeight, product of:
                3.4452536 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.021656621 = queryNorm
              0.48837733 = fieldWeight in 686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.125 = fieldNorm(doc=686)
        0.16 = coord(4/25)
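
The score trees above follow the pattern of Lucene's ClassicSimilarity (TF-IDF): each matched term contributes queryWeight x fieldWeight, the contributions are summed, and the sum is scaled by coord (matched terms / query terms). The following is a minimal sketch that recomputes the score of entry 4 (doc 2021) from the numbers printed in its tree. The function names (idf, term_score, document_score) are hypothetical helpers chosen for illustration, not part of Lucene; the boosts, queryNorm, fieldNorm and the count of 25 query terms are copied from the listing rather than derived.

```python
import math

# Constants copied from the explain output above.
MAX_DOCS = 44218
QUERY_NORM = 0.021656621


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as used by ClassicSimilarity:
    1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, boost=1.0):
    """Score of one matched term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    # tf is the square root of the raw term frequency.
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, total_query_terms=25):
    """Sum the per-term scores, then apply the coord factor."""
    matched = sum(term_score(f, df, field_norm, b) for f, df, b in terms)
    coord = len(terms) / total_query_terms
    return matched * coord


# Entry 4 (doc 2021): (freq, docFreq, boost) per matched term.
langridge_terms = [
    (1.0, 665, 1.5271791),   # automatic
    (3.0, 1551, 1.917758),   # indexing
    (1.0, 934, 2.1411886),   # book
    (3.0, 2415, 3.4452536),  # subject
]

score = document_score(langridge_terms, field_norm=0.125)
print(round(score, 4))
```

The result should land close to the 0.10640134 shown for entry 4. Small deviations in the last digits are expected, since Lucene computes these values in single precision while Python uses double precision.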