Document (#43720)

Author
Golub, K.
Title
Automated subject indexing : an overview
Source
Cataloging and classification quarterly. 59(2021) no.8, p.702-719
Year
2021
Abstract
In the face of the ever-increasing document volume, libraries around the globe are more and more exploring (semi-) automated approaches to subject indexing. This helps sustain bibliographic objectives, enrich metadata, and establish more connections across documents from various collections, effectively leading to improved information retrieval and access. However, generally accepted automated approaches that are functional in operative systems are lacking. This article aims to provide an overview of basic principles used for automated subject indexing, major approaches in relation to their possible application in actual library systems, existing working examples, as well as related challenges calling for further research.
Content
Cf.: https://doi.org/10.1080/01639374.2021.2012311.
Footnote
Part of a special issue: Artificial intelligence (AI) and automated processes for subject access
Theme
Automatic indexing

Similar documents (author)

  1. Golub, K.: Automated subject classification of textual web documents (2006) 5.30
    5.298757 = sum of:
      5.298757 = weight(author_txt:golub in 5600) [ClassicSimilarity], result of:
        5.298757 = fieldWeight in 5600, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.625 = fieldNorm(doc=5600)
    
  2. Golub, K.: Automated subject classification of textual Web pages, based on a controlled vocabulary : challenges and recommendations (2006) 5.30
    5.298757 = sum of:
      5.298757 = weight(author_txt:golub in 5897) [ClassicSimilarity], result of:
        5.298757 = fieldWeight in 5897, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.625 = fieldNorm(doc=5897)
    
  3. Golub, K.: Subject access to information : an interdisciplinary approach (2015) 5.30
    5.298757 = sum of:
      5.298757 = weight(author_txt:golub in 134) [ClassicSimilarity], result of:
        5.298757 = fieldWeight in 134, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.625 = fieldNorm(doc=134)
    
  4. Golub, K.: Automated subject classification of textual documents in the context of Web-based hierarchical browsing (2011) 5.30
    5.298757 = sum of:
      5.298757 = weight(author_txt:golub in 4558) [ClassicSimilarity], result of:
        5.298757 = fieldWeight in 4558, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.625 = fieldNorm(doc=4558)
    
  5. Golub, K.: Subject access in Swedish discovery services (2018) 5.30
    5.298757 = sum of:
      5.298757 = weight(author_txt:golub in 4379) [ClassicSimilarity], result of:
        5.298757 = fieldWeight in 4379, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.625 = fieldNorm(doc=4379)
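
The author-match scores above all follow the same arithmetic. As a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), the 5.298757 value can be reproduced as follows. The function names are hypothetical and are not part of the catalog software.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as defined by Lucene's ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq)."""
    tf = math.sqrt(freq)
    return tf * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the explain output for author_txt:golub
# (termFreq=1.0, docFreq=24, maxDocs=44218, fieldNorm=0.625).
author_score = field_weight(freq=1.0, doc_freq=24, max_docs=44218, field_norm=0.625)
print(round(author_score, 4))
```

Because every listed record matches the single query term "golub" once, in a field with the same norm, all five entries receive an identical score. Lucene computes in single precision, so the last digits may differ slightly from this double-precision sketch.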
    

Similar documents (content)

  1. Golub, K.: Automatic subject indexing of text (2019) 0.25
    0.252932 = sum of:
      0.252932 = product of:
        0.70258886 = sum of:
          0.05513582 = weight(abstract_txt:establish in 5268) [ClassicSimilarity], result of:
            0.05513582 = score(doc=5268,freq=1.0), product of:
              0.14045583 = queryWeight, product of:
                1.0621886 = boost
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.021053487 = queryNorm
              0.3925492 = fieldWeight in 5268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
          0.058990955 = weight(abstract_txt:connections in 5268) [ClassicSimilarity], result of:
            0.058990955 = score(doc=5268,freq=1.0), product of:
              0.14692898 = queryWeight, product of:
                1.0863893 = boost
                6.4238877 = idf(docFreq=194, maxDocs=44218)
                0.021053487 = queryNorm
              0.40149298 = fieldWeight in 5268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4238877 = idf(docFreq=194, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
          0.035353485 = weight(abstract_txt:systems in 5268) [ClassicSimilarity], result of:
            0.035353485 = score(doc=5268,freq=4.0), product of:
              0.08289506 = queryWeight, product of:
                1.154014 = boost
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.021053487 = queryNorm
              0.4264848 = fieldWeight in 5268, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
          0.08942411 = weight(abstract_txt:enrich in 5268) [ClassicSimilarity], result of:
            0.08942411 = score(doc=5268,freq=1.0), product of:
              0.1938892 = queryWeight, product of:
                1.2479827 = boost
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.021053487 = queryNorm
              0.46121246 = fieldWeight in 5268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
          0.1375726 = weight(abstract_txt:operative in 5268) [ClassicSimilarity], result of:
            0.1375726 = score(doc=5268,freq=1.0), product of:
              0.25838768 = queryWeight, product of:
                1.4406805 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.021053487 = queryNorm
              0.5324271 = fieldWeight in 5268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
          0.037176155 = weight(abstract_txt:more in 5268) [ClassicSimilarity], result of:
            0.037176155 = score(doc=5268,freq=2.0), product of:
              0.12363002 = queryWeight, product of:
                1.7260538 = boost
                3.402088 = idf(docFreq=4002, maxDocs=44218)
                0.021053487 = queryNorm
              0.30070493 = fieldWeight in 5268, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.402088 = idf(docFreq=4002, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
          0.08902959 = weight(abstract_txt:subject in 5268) [ClassicSimilarity], result of:
            0.08902959 = score(doc=5268,freq=5.0), product of:
              0.16305114 = queryWeight, product of:
                1.9822311 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.021053487 = queryNorm
              0.5460225 = fieldWeight in 5268, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
          0.13456519 = weight(abstract_txt:indexing in 5268) [ClassicSimilarity], result of:
            0.13456519 = score(doc=5268,freq=6.0), product of:
              0.20208263 = queryWeight, product of:
                2.206769 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.021053487 = queryNorm
              0.6658919 = fieldWeight in 5268, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
          0.06534097 = weight(abstract_txt:approaches in 5268) [ClassicSimilarity], result of:
            0.06534097 = score(doc=5268,freq=1.0), product of:
              0.22685482 = queryWeight, product of:
                2.3381178 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.021053487 = queryNorm
              0.2880299 = fieldWeight in 5268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.0625 = fieldNorm(doc=5268)
        0.36 = coord(9/25)
    
  2. Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.19
    0.19063786 = sum of:
      0.19063786 = product of:
        0.5957433 = sum of:
          0.05513582 = weight(abstract_txt:establish in 2300) [ClassicSimilarity], result of:
            0.05513582 = score(doc=2300,freq=1.0), product of:
              0.14045583 = queryWeight, product of:
                1.0621886 = boost
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.021053487 = queryNorm
              0.3925492 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.0625 = fieldNorm(doc=2300)
          0.058990955 = weight(abstract_txt:connections in 2300) [ClassicSimilarity], result of:
            0.058990955 = score(doc=2300,freq=1.0), product of:
              0.14692898 = queryWeight, product of:
                1.0863893 = boost
                6.4238877 = idf(docFreq=194, maxDocs=44218)
                0.021053487 = queryNorm
              0.40149298 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4238877 = idf(docFreq=194, maxDocs=44218)
                0.0625 = fieldNorm(doc=2300)
          0.08942411 = weight(abstract_txt:enrich in 2300) [ClassicSimilarity], result of:
            0.08942411 = score(doc=2300,freq=1.0), product of:
              0.1938892 = queryWeight, product of:
                1.2479827 = boost
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.021053487 = queryNorm
              0.46121246 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.0625 = fieldNorm(doc=2300)
          0.026287511 = weight(abstract_txt:more in 2300) [ClassicSimilarity], result of:
            0.026287511 = score(doc=2300,freq=1.0), product of:
              0.12363002 = queryWeight, product of:
                1.7260538 = boost
                3.402088 = idf(docFreq=4002, maxDocs=44218)
                0.021053487 = queryNorm
              0.2126305 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.402088 = idf(docFreq=4002, maxDocs=44218)
                0.0625 = fieldNorm(doc=2300)
          0.08902959 = weight(abstract_txt:subject in 2300) [ClassicSimilarity], result of:
            0.08902959 = score(doc=2300,freq=5.0), product of:
              0.16305114 = queryWeight, product of:
                1.9822311 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.021053487 = queryNorm
              0.5460225 = fieldWeight in 2300, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.0625 = fieldNorm(doc=2300)
          0.054936007 = weight(abstract_txt:indexing in 2300) [ClassicSimilarity], result of:
            0.054936007 = score(doc=2300,freq=1.0), product of:
              0.20208263 = queryWeight, product of:
                2.206769 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.021053487 = queryNorm
              0.27184922 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.0625 = fieldNorm(doc=2300)
          0.06534097 = weight(abstract_txt:approaches in 2300) [ClassicSimilarity], result of:
            0.06534097 = score(doc=2300,freq=1.0), product of:
              0.22685482 = queryWeight, product of:
                2.3381178 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.021053487 = queryNorm
              0.2880299 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.0625 = fieldNorm(doc=2300)
          0.15659837 = weight(abstract_txt:automated in 2300) [ClassicSimilarity], result of:
            0.15659837 = score(doc=2300,freq=1.0), product of:
              0.44715905 = queryWeight, product of:
                3.7904675 = boost
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.021053487 = queryNorm
              0.35020733 = fieldWeight in 2300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.0625 = fieldNorm(doc=2300)
        0.32 = coord(8/25)
    
  3. Thiel, T.J.: Automated indexing of document image management systems (1992) 0.16
    0.161656 = sum of:
      0.161656 = product of:
        0.6735667 = sum of:
          0.09759672 = weight(abstract_txt:effectively in 3049) [ClassicSimilarity], result of:
            0.09759672 = score(doc=3049,freq=2.0), product of:
              0.12449058 = queryWeight, product of:
                5.913062 = idf(docFreq=324, maxDocs=44218)
                0.021053487 = queryNorm
              0.7839687 = fieldWeight in 3049, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.913062 = idf(docFreq=324, maxDocs=44218)
                0.09375 = fieldNorm(doc=3049)
          0.090907596 = weight(abstract_txt:ever in 3049) [ClassicSimilarity], result of:
            0.090907596 = score(doc=3049,freq=1.0), product of:
              0.14959708 = queryWeight, product of:
                1.0962089 = boost
                6.481951 = idf(docFreq=183, maxDocs=44218)
                0.021053487 = queryNorm
              0.60768294 = fieldWeight in 3049, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.481951 = idf(docFreq=183, maxDocs=44218)
                0.09375 = fieldNorm(doc=3049)
          0.04592552 = weight(abstract_txt:systems in 3049) [ClassicSimilarity], result of:
            0.04592552 = score(doc=3049,freq=3.0), product of:
              0.08289506 = queryWeight, product of:
                1.154014 = boost
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.021053487 = queryNorm
              0.55402 = fieldWeight in 3049, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.09375 = fieldNorm(doc=3049)
          0.039431266 = weight(abstract_txt:more in 3049) [ClassicSimilarity], result of:
            0.039431266 = score(doc=3049,freq=1.0), product of:
              0.12363002 = queryWeight, product of:
                1.7260538 = boost
                3.402088 = idf(docFreq=4002, maxDocs=44218)
                0.021053487 = queryNorm
              0.31894574 = fieldWeight in 3049, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.402088 = idf(docFreq=4002, maxDocs=44218)
                0.09375 = fieldNorm(doc=3049)
          0.16480802 = weight(abstract_txt:indexing in 3049) [ClassicSimilarity], result of:
            0.16480802 = score(doc=3049,freq=4.0), product of:
              0.20208263 = queryWeight, product of:
                2.206769 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.021053487 = queryNorm
              0.81554765 = fieldWeight in 3049, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.09375 = fieldNorm(doc=3049)
          0.23489757 = weight(abstract_txt:automated in 3049) [ClassicSimilarity], result of:
            0.23489757 = score(doc=3049,freq=1.0), product of:
              0.44715905 = queryWeight, product of:
                3.7904675 = boost
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.021053487 = queryNorm
              0.525311 = fieldWeight in 3049, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.09375 = fieldNorm(doc=3049)
        0.24 = coord(6/25)
    
  4. Rugg, G.: The future of smart systems in information science (1993) 0.12
    0.12047931 = sum of:
      0.12047931 = product of:
        0.5019971 = sum of:
          0.06891978 = weight(abstract_txt:face in 6713) [ClassicSimilarity], result of:
            0.06891978 = score(doc=6713,freq=1.0), product of:
              0.14045583 = queryWeight, product of:
                1.0621886 = boost
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.021053487 = queryNorm
              0.49068648 = fieldWeight in 6713, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.078125 = fieldNorm(doc=6713)
          0.05412375 = weight(abstract_txt:systems in 6713) [ClassicSimilarity], result of:
            0.05412375 = score(doc=6713,freq=6.0), product of:
              0.08289506 = queryWeight, product of:
                1.154014 = boost
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.021053487 = queryNorm
              0.6529189 = fieldWeight in 6713, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.078125 = fieldNorm(doc=6713)
          0.03285939 = weight(abstract_txt:more in 6713) [ClassicSimilarity], result of:
            0.03285939 = score(doc=6713,freq=1.0), product of:
              0.12363002 = queryWeight, product of:
                1.7260538 = boost
                3.402088 = idf(docFreq=4002, maxDocs=44218)
                0.021053487 = queryNorm
              0.2657881 = fieldWeight in 6713, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.402088 = idf(docFreq=4002, maxDocs=44218)
                0.078125 = fieldNorm(doc=6713)
          0.068670005 = weight(abstract_txt:indexing in 6713) [ClassicSimilarity], result of:
            0.068670005 = score(doc=6713,freq=1.0), product of:
              0.20208263 = queryWeight, product of:
                2.206769 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.021053487 = queryNorm
              0.3398115 = fieldWeight in 6713, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.078125 = fieldNorm(doc=6713)
          0.081676215 = weight(abstract_txt:approaches in 6713) [ClassicSimilarity], result of:
            0.081676215 = score(doc=6713,freq=1.0), product of:
              0.22685482 = queryWeight, product of:
                2.3381178 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.021053487 = queryNorm
              0.3600374 = fieldWeight in 6713, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.078125 = fieldNorm(doc=6713)
          0.19574797 = weight(abstract_txt:automated in 6713) [ClassicSimilarity], result of:
            0.19574797 = score(doc=6713,freq=1.0), product of:
              0.44715905 = queryWeight, product of:
                3.7904675 = boost
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.021053487 = queryNorm
              0.43775916 = fieldWeight in 6713, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.078125 = fieldNorm(doc=6713)
        0.24 = coord(6/25)
    
  5. Hahn, J.: Semi-automated methods for BIBFRAME work entity description (2021) 0.11
    0.112846285 = sum of:
      0.112846285 = product of:
        0.7052893 = sum of:
          0.13156992 = weight(abstract_txt:semi in 725) [ClassicSimilarity], result of:
            0.13156992 = score(doc=725,freq=2.0), product of:
              0.15192086 = queryWeight, product of:
                1.1046901 = boost
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.021053487 = queryNorm
              0.8660425 = fieldWeight in 725, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.532101 = idf(docFreq=174, maxDocs=44218)
                0.09375 = fieldNorm(doc=725)
          0.084460884 = weight(abstract_txt:subject in 725) [ClassicSimilarity], result of:
            0.084460884 = score(doc=725,freq=2.0), product of:
              0.16305114 = queryWeight, product of:
                1.9822311 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.021053487 = queryNorm
              0.5180024 = fieldWeight in 725, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.09375 = fieldNorm(doc=725)
          0.08240401 = weight(abstract_txt:indexing in 725) [ClassicSimilarity], result of:
            0.08240401 = score(doc=725,freq=1.0), product of:
              0.20208263 = queryWeight, product of:
                2.206769 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.021053487 = queryNorm
              0.40777382 = fieldWeight in 725, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.09375 = fieldNorm(doc=725)
          0.40685448 = weight(abstract_txt:automated in 725) [ClassicSimilarity], result of:
            0.40685448 = score(doc=725,freq=3.0), product of:
              0.44715905 = queryWeight, product of:
                3.7904675 = boost
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.021053487 = queryNorm
              0.90986526 = fieldWeight in 725, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.09375 = fieldNorm(doc=725)
        0.16 = coord(4/25)
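
The content-similarity scores above combine several per-term scores. As a rough sketch, again assuming the standard Lucene ClassicSimilarity formulas, each term contributes queryWeight * fieldWeight, and the sum is scaled by the coordination factor (matched query terms divided by total query terms). The example below recomputes the score of the last entry (doc=725) from the numbers in its explain tree. The function and variable names are hypothetical.

```python
import math

MAX_DOCS = 44218          # maxDocs from the explain output
QUERY_NORM = 0.021053487  # queryNorm from the explain output


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency as defined by Lucene's ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# (term, boost, termFreq, docFreq, fieldNorm) as listed for doc=725.
matched_terms = [
    ("semi",      1.1046901, 2.0, 174,  0.09375),
    ("subject",   1.9822311, 2.0, 2415, 0.09375),
    ("indexing",  2.206769,  1.0, 1551, 0.09375),
    ("automated", 3.7904675, 3.0, 442,  0.09375),
]

coord = len(matched_terms) / 25  # 4 of the 25 query terms matched
raw_sum = sum(term_score(b, f, df, norm) for _, b, f, df, norm in matched_terms)
total = coord * raw_sum
print(round(total, 4))
```

The coordination factor explains why a record with strong individual term matches can still rank low: doc=725 has the largest raw sum of the five entries but matches only 4 of 25 query terms.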