Document (#42237)

Author
Wolfe, EW.
Title
A case study in automated metadata enhancement : Natural Language Processing in the humanities
Source
Code4Lib journal. Issue 46(2019), [http://journal.code4lib.org]
Year
2019
Abstract
The Black Book Interactive Project at the University of Kansas (KU) is developing an expanded corpus of novels by African American authors, with an emphasis on lesser known writers and a goal of expanding research in this field. Using a custom metadata schema with an emphasis on race-related elements, each novel is analyzed for a variety of elements such as literary style, targeted content analysis, historical context, and other areas. Librarians at KU have worked to develop a variety of computational text analysis processes designed to assist with specific aspects of this metadata collection, including text mining and natural language processing, automated subject extraction based on word sense disambiguation, harvesting data from Wikidata, and other actions.
Content
Cf.: https://journal.code4lib.org/articles/14834.
Theme
Metadata
Automatic indexing
Field
Humanities

Similar documents (content)

  1. Shaalan, K.; Raza, H.: NERA: Named Entity Recognition for Arabic (2009) 0.15
    0.14877859 = sum of:
      0.14877859 = product of:
        0.4649331 = sum of:
          0.07032956 = weight(abstract_txt:worked in 2953) [ClassicSimilarity], result of:
            0.07032956 = score(doc=2953,freq=1.0), product of:
              0.17699063 = queryWeight, product of:
                1.0187215 = boost
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.023910861 = queryNorm
              0.39736322 = fieldWeight in 2953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.07250351 = weight(abstract_txt:disambiguation in 2953) [ClassicSimilarity], result of:
            0.07250351 = score(doc=2953,freq=1.0), product of:
              0.18061937 = queryWeight, product of:
                1.0291116 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.023910861 = queryNorm
              0.401416 = fieldWeight in 2953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.02424709 = weight(abstract_txt:text in 2953) [ClassicSimilarity], result of:
            0.02424709 = score(doc=2953,freq=1.0), product of:
              0.10964144 = queryWeight, product of:
                1.1339207 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.023910861 = queryNorm
              0.22114895 = fieldWeight in 2953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.053638726 = weight(abstract_txt:language in 2953) [ClassicSimilarity], result of:
            0.053638726 = score(doc=2953,freq=4.0), product of:
              0.117264695 = queryWeight, product of:
                1.1726785 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.023910861 = queryNorm
              0.45741582 = fieldWeight in 2953, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.062203266 = weight(abstract_txt:processing in 2953) [ClassicSimilarity], result of:
            0.062203266 = score(doc=2953,freq=2.0), product of:
              0.1630799 = queryWeight, product of:
                1.3829151 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.023910861 = queryNorm
              0.38142815 = fieldWeight in 2953, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.067958385 = weight(abstract_txt:natural in 2953) [ClassicSimilarity], result of:
            0.067958385 = score(doc=2953,freq=2.0), product of:
              0.17298974 = queryWeight, product of:
                1.4243132 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.023910861 = queryNorm
              0.39284632 = fieldWeight in 2953, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.050086025 = weight(abstract_txt:variety in 2953) [ClassicSimilarity], result of:
            0.050086025 = score(doc=2953,freq=1.0), product of:
              0.17783314 = queryWeight, product of:
                1.4441146 = boost
                5.1501017 = idf(docFreq=696, maxDocs=44218)
                0.023910861 = queryNorm
              0.2816462 = fieldWeight in 2953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1501017 = idf(docFreq=696, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
          0.06396653 = weight(abstract_txt:metadata in 2953) [ClassicSimilarity], result of:
            0.06396653 = score(doc=2953,freq=1.0), product of:
              0.23962599 = queryWeight, product of:
                2.0530896 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.023910861 = queryNorm
              0.2669432 = fieldWeight in 2953, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2953)
        0.32 = coord(8/25)
    
  2. Christel, M.G.: Automated metadata in multimedia information systems : creation, refinement, use in surrogates, and evaluation (2009) 0.14
    0.14105915 = sum of:
      0.14105915 = product of:
        0.44080985 = sum of:
          0.020436406 = weight(abstract_txt:analysis in 3086) [ClassicSimilarity], result of:
            0.020436406 = score(doc=3086,freq=1.0), product of:
              0.08949732 = queryWeight, product of:
                1.0244726 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.023910861 = queryNorm
              0.22834657 = fieldWeight in 3086, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.0625 = fieldNorm(doc=3086)
          0.019636473 = weight(abstract_txt:with in 3086) [ClassicSimilarity], result of:
            0.019636473 = score(doc=3086,freq=4.0), product of:
              0.06284341 = queryWeight, product of:
                1.0514069 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.023910861 = queryNorm
              0.31246668 = fieldWeight in 3086, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=3086)
          0.02771096 = weight(abstract_txt:text in 3086) [ClassicSimilarity], result of:
            0.02771096 = score(doc=3086,freq=1.0), product of:
              0.10964144 = queryWeight, product of:
                1.1339207 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.023910861 = queryNorm
              0.25274166 = fieldWeight in 3086, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=3086)
          0.030650701 = weight(abstract_txt:language in 3086) [ClassicSimilarity], result of:
            0.030650701 = score(doc=3086,freq=1.0), product of:
              0.117264695 = queryWeight, product of:
                1.1726785 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.023910861 = queryNorm
              0.26138046 = fieldWeight in 3086, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0625 = fieldNorm(doc=3086)
          0.05026783 = weight(abstract_txt:processing in 3086) [ClassicSimilarity], result of:
            0.05026783 = score(doc=3086,freq=1.0), product of:
              0.1630799 = queryWeight, product of:
                1.3829151 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.023910861 = queryNorm
              0.3082405 = fieldWeight in 3086, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.0625 = fieldNorm(doc=3086)
          0.05491867 = weight(abstract_txt:natural in 3086) [ClassicSimilarity], result of:
            0.05491867 = score(doc=3086,freq=1.0), product of:
              0.17298974 = queryWeight, product of:
                1.4243132 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.023910861 = queryNorm
              0.31746778 = fieldWeight in 3086, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.0625 = fieldNorm(doc=3086)
          0.07372194 = weight(abstract_txt:automated in 3086) [ClassicSimilarity], result of:
            0.07372194 = score(doc=3086,freq=1.0), product of:
              0.21050942 = queryWeight, product of:
                1.5711986 = boost
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.023910861 = queryNorm
              0.35020733 = fieldWeight in 3086, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6033173 = idf(docFreq=442, maxDocs=44218)
                0.0625 = fieldNorm(doc=3086)
          0.16346687 = weight(abstract_txt:metadata in 3086) [ClassicSimilarity], result of:
            0.16346687 = score(doc=3086,freq=5.0), product of:
              0.23962599 = queryWeight, product of:
                2.0530896 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.023910861 = queryNorm
              0.68217504 = fieldWeight in 3086, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.0625 = fieldNorm(doc=3086)
        0.32 = coord(8/25)
    
  3. Jurafsky, D.; Martin, J.H.: Speech and language processing : an introduction to natural language processing, computational linguistics and speech recognition (2009) 0.12
    0.12467628 = sum of:
      0.12467628 = product of:
        0.5194845 = sum of:
          0.017356355 = weight(abstract_txt:with in 1081) [ClassicSimilarity], result of:
            0.017356355 = score(doc=1081,freq=2.0), product of:
              0.06284341 = queryWeight, product of:
                1.0514069 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.023910861 = queryNorm
              0.27618414 = fieldWeight in 1081, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.078125 = fieldNorm(doc=1081)
          0.0346387 = weight(abstract_txt:text in 1081) [ClassicSimilarity], result of:
            0.0346387 = score(doc=1081,freq=1.0), product of:
              0.10964144 = queryWeight, product of:
                1.1339207 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.023910861 = queryNorm
              0.3159271 = fieldWeight in 1081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=1081)
          0.108366586 = weight(abstract_txt:language in 1081) [ClassicSimilarity], result of:
            0.108366586 = score(doc=1081,freq=8.0), product of:
              0.117264695 = queryWeight, product of:
                1.1726785 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.023910861 = queryNorm
              0.9241195 = fieldWeight in 1081, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.078125 = fieldNorm(doc=1081)
          0.15391319 = weight(abstract_txt:processing in 1081) [ClassicSimilarity], result of:
            0.15391319 = score(doc=1081,freq=6.0), product of:
              0.1630799 = queryWeight, product of:
                1.3829151 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.023910861 = queryNorm
              0.94379 = fieldWeight in 1081, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.078125 = fieldNorm(doc=1081)
          0.097083405 = weight(abstract_txt:natural in 1081) [ClassicSimilarity], result of:
            0.097083405 = score(doc=1081,freq=2.0), product of:
              0.17298974 = queryWeight, product of:
                1.4243132 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.023910861 = queryNorm
              0.561209 = fieldWeight in 1081, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.078125 = fieldNorm(doc=1081)
          0.10812629 = weight(abstract_txt:emphasis in 1081) [ClassicSimilarity], result of:
            0.10812629 = score(doc=1081,freq=1.0), product of:
              0.23418255 = queryWeight, product of:
                1.657191 = boost
                5.90999 = idf(docFreq=325, maxDocs=44218)
                0.023910861 = queryNorm
              0.46171796 = fieldWeight in 1081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.90999 = idf(docFreq=325, maxDocs=44218)
                0.078125 = fieldNorm(doc=1081)
        0.24 = coord(6/25)
    
  4. Taylor, S.L.: Integrating natural language understanding with document structure analysis (1994) 0.12
    0.11758592 = sum of:
      0.11758592 = product of:
        0.48994136 = sum of:
          0.05309534 = weight(abstract_txt:analysis in 1794) [ClassicSimilarity], result of:
            0.05309534 = score(doc=1794,freq=3.0), product of:
              0.08949732 = queryWeight, product of:
                1.0244726 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.023910861 = queryNorm
              0.5932618 = fieldWeight in 1794, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.09375 = fieldNorm(doc=1794)
          0.014727354 = weight(abstract_txt:with in 1794) [ClassicSimilarity], result of:
            0.014727354 = score(doc=1794,freq=1.0), product of:
              0.06284341 = queryWeight, product of:
                1.0514069 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.023910861 = queryNorm
              0.23435001 = fieldWeight in 1794, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.09375 = fieldNorm(doc=1794)
          0.071995184 = weight(abstract_txt:text in 1794) [ClassicSimilarity], result of:
            0.071995184 = score(doc=1794,freq=3.0), product of:
              0.10964144 = queryWeight, product of:
                1.1339207 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.023910861 = queryNorm
              0.6566421 = fieldWeight in 1794, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=1794)
          0.06501996 = weight(abstract_txt:language in 1794) [ClassicSimilarity], result of:
            0.06501996 = score(doc=1794,freq=2.0), product of:
              0.117264695 = queryWeight, product of:
                1.1726785 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.023910861 = queryNorm
              0.55447173 = fieldWeight in 1794, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.09375 = fieldNorm(doc=1794)
          0.16860344 = weight(abstract_txt:processing in 1794) [ClassicSimilarity], result of:
            0.16860344 = score(doc=1794,freq=5.0), product of:
              0.1630799 = queryWeight, product of:
                1.3829151 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.023910861 = queryNorm
              1.0338701 = fieldWeight in 1794, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.09375 = fieldNorm(doc=1794)
          0.11650009 = weight(abstract_txt:natural in 1794) [ClassicSimilarity], result of:
            0.11650009 = score(doc=1794,freq=2.0), product of:
              0.17298974 = queryWeight, product of:
                1.4243132 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.023910861 = queryNorm
              0.6734508 = fieldWeight in 1794, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.09375 = fieldNorm(doc=1794)
        0.24 = coord(6/25)
    
  5. Vledutz-Stokolov, N.: Concept recognition in an automatic text-processing system for the life sciences (1987) 0.11
    0.11359591 = sum of:
      0.11359591 = product of:
        0.40569967 = sum of:
          0.020436406 = weight(abstract_txt:analysis in 2849) [ClassicSimilarity], result of:
            0.020436406 = score(doc=2849,freq=1.0), product of:
              0.08949732 = queryWeight, product of:
                1.0244726 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.023910861 = queryNorm
              0.22834657 = fieldWeight in 2849, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.0625 = fieldNorm(doc=2849)
          0.117183365 = weight(abstract_txt:disambiguation in 2849) [ClassicSimilarity], result of:
            0.117183365 = score(doc=2849,freq=2.0), product of:
              0.18061937 = queryWeight, product of:
                1.0291116 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.023910861 = queryNorm
              0.64878625 = fieldWeight in 2849, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0625 = fieldNorm(doc=2849)
          0.013885083 = weight(abstract_txt:with in 2849) [ClassicSimilarity], result of:
            0.013885083 = score(doc=2849,freq=2.0), product of:
              0.06284341 = queryWeight, product of:
                1.0514069 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.023910861 = queryNorm
              0.22094731 = fieldWeight in 2849, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=2849)
          0.02771096 = weight(abstract_txt:text in 2849) [ClassicSimilarity], result of:
            0.02771096 = score(doc=2849,freq=1.0), product of:
              0.10964144 = queryWeight, product of:
                1.1339207 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.023910861 = queryNorm
              0.25274166 = fieldWeight in 2849, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2849)
          0.08109413 = weight(abstract_txt:language in 2849) [ClassicSimilarity], result of:
            0.08109413 = score(doc=2849,freq=7.0), product of:
              0.117264695 = queryWeight, product of:
                1.1726785 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.023910861 = queryNorm
              0.6915477 = fieldWeight in 2849, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0625 = fieldNorm(doc=2849)
          0.05026783 = weight(abstract_txt:processing in 2849) [ClassicSimilarity], result of:
            0.05026783 = score(doc=2849,freq=1.0), product of:
              0.1630799 = queryWeight, product of:
                1.3829151 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.023910861 = queryNorm
              0.3082405 = fieldWeight in 2849, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.0625 = fieldNorm(doc=2849)
          0.09512192 = weight(abstract_txt:natural in 2849) [ClassicSimilarity], result of:
            0.09512192 = score(doc=2849,freq=3.0), product of:
              0.17298974 = queryWeight, product of:
                1.4243132 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.023910861 = queryNorm
              0.5498703 = fieldWeight in 2849, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.0625 = fieldNorm(doc=2849)
        0.28 = coord(7/25)
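
The score explanations above all follow the same arithmetic, which can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the catalog's own code: it assumes the usual Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), takes boost, queryNorm, and fieldNorm as given from the listing, and uses hypothetical names (`idf`, `term_score`, `final_score`). Lucene computes in 32-bit floats, so results agree with the listing only to about six digits.

```python
import math

MAX_DOCS = 44218          # maxDocs, as reported in the explanations above
QUERY_NORM = 0.023910861  # queryNorm, identical for every term in the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def final_score(raw_sum, matched, total=25):
    """Apply the coordination factor coord(matched/total) to the raw sum."""
    return raw_sum * matched / total


# (term, boost, docFreq, freq) copied from the explanation for doc 2953.
DOC_2953_TERMS = [
    ("worked", 1.0187215, 83, 1.0),
    ("disambiguation", 1.0291116, 77, 1.0),
    ("text", 1.1339207, 2106, 1.0),
    ("language", 1.1726785, 1834, 4.0),
    ("processing", 1.3829151, 866, 2.0),
    ("natural", 1.4243132, 747, 2.0),
    ("variety", 1.4441146, 696, 1.0),
    ("metadata", 2.0530896, 911, 1.0),
]
DOC_2953_FIELD_NORM = 0.0546875  # fieldNorm(doc=2953), taken as given

raw_2953 = sum(
    term_score(boost, doc_freq, freq, DOC_2953_FIELD_NORM)
    for _, boost, doc_freq, freq in DOC_2953_TERMS
)
score_2953 = final_score(raw_2953, matched=8)

# (doc id, raw sum of term scores, number of matched query terms),
# copied from the five explanations above.
RAW_SUMS = [
    (2953, 0.4649331, 8),
    (3086, 0.44080985, 8),
    (1081, 0.5194845, 6),
    (1794, 0.48994136, 6),
    (2849, 0.40569967, 7),
]

# Ranking by raw sum alone versus ranking after the coord factor.
by_raw = [doc for doc, _, _ in sorted(RAW_SUMS, key=lambda r: r[1], reverse=True)]
by_final = [
    doc
    for doc, _, _ in sorted(
        RAW_SUMS, key=lambda r: final_score(r[1], r[2]), reverse=True
    )
]

print(score_2953)
print(by_raw)
print(by_final)
```

Comparing the two orderings shows why documents 1081 and 1794 rank third and fourth despite having the largest raw sums: they match only 6 of the 25 query terms, so their coord factor (0.24) is lower than the 0.32 applied to the first two hits.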