Document (#42238)

Author
Wolfe, EW.
Title
Natural language processing in the humanities : a case study in automated metadata enhancement
Source
Code4Lib journal. Issue 46(2019), [http://journal.code4lib.org]
Year
2019
Abstract
The Black Book Interactive Project at the University of Kansas (KU) is developing an expanded corpus of novels by African American authors, with an emphasis on lesser known writers and a goal of expanding research in this field. Using a custom metadata schema with an emphasis on race-related elements, each novel is analyzed for a variety of elements such as literary style, targeted content analysis, historical context, and other areas. Librarians at KU have worked to develop a variety of computational text analysis processes designed to assist with specific aspects of this metadata collection, including text mining and natural language processing, automated subject extraction based on word sense disambiguation, harvesting data from Wikidata, and other actions.
Content
Cf.: https://journal.code4lib.org/articles/14834.
Theme
Metadata
Automatic indexing
Field
Humanities

Similar documents (content)

  1. Jurafsky, D.; Martin, J.H.: Speech and language processing : an introduction to natural language processing, computational linguistics and speech recognition (2009) 0.16
    0.15796411 = sum of:
      0.15796411 = product of:
        0.56415755 = sum of:
          0.023829972 = weight(abstract_txt:other in 3082) [ClassicSimilarity], result of:
            0.023829972 = score(doc=3082,freq=1.0), product of:
              0.0862893 = queryWeight, product of:
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.024410708 = queryNorm
              0.2761637 = fieldWeight in 3082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.078125 = fieldNorm(doc=3082)
          0.018262986 = weight(abstract_txt:with in 3082) [ClassicSimilarity], result of:
            0.018262986 = score(doc=3082,freq=2.0), product of:
              0.06565627 = queryWeight, product of:
                1.0683296 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.024410708 = queryNorm
              0.2781606 = fieldWeight in 3082, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.078125 = fieldNorm(doc=3082)
          0.035840828 = weight(abstract_txt:text in 3082) [ClassicSimilarity], result of:
            0.035840828 = score(doc=3082,freq=1.0), product of:
              0.11327305 = queryWeight, product of:
                1.1457367 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.024410708 = queryNorm
              0.3164109 = fieldWeight in 3082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.078125 = fieldNorm(doc=3082)
          0.11255481 = weight(abstract_txt:language in 3082) [ClassicSimilarity], result of:
            0.11255481 = score(doc=3082,freq=8.0), product of:
              0.12145646 = queryWeight, product of:
                1.1864018 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.024410708 = queryNorm
              0.9267091 = fieldWeight in 3082, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.078125 = fieldNorm(doc=3082)
          0.16027574 = weight(abstract_txt:processing in 3082) [ClassicSimilarity], result of:
            0.16027574 = score(doc=3082,freq=6.0), product of:
              0.16920091 = queryWeight, product of:
                1.4003057 = boost
                4.9499345 = idf(docFreq=822, maxDocs=42740)
                0.024410708 = queryNorm
              0.9472511 = fieldWeight in 3082, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.9499345 = idf(docFreq=822, maxDocs=42740)
                0.078125 = fieldNorm(doc=3082)
          0.10182843 = weight(abstract_txt:natural in 3082) [ClassicSimilarity], result of:
            0.10182843 = score(doc=3082,freq=2.0), product of:
              0.18034771 = queryWeight, product of:
                1.4456955 = boost
                5.1103826 = idf(docFreq=700, maxDocs=42740)
                0.024410708 = queryNorm
              0.5646228 = fieldWeight in 3082, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1103826 = idf(docFreq=700, maxDocs=42740)
                0.078125 = fieldNorm(doc=3082)
          0.11156477 = weight(abstract_txt:emphasis in 3082) [ClassicSimilarity], result of:
            0.11156477 = score(doc=3082,freq=1.0), product of:
              0.24148637 = queryWeight, product of:
                1.6728917 = boost
                5.9134974 = idf(docFreq=313, maxDocs=42740)
                0.024410708 = queryNorm
              0.461992 = fieldWeight in 3082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9134974 = idf(docFreq=313, maxDocs=42740)
                0.078125 = fieldNorm(doc=3082)
        0.28 = coord(7/25)
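
The explanation tree above follows the TF-IDF formulas of Lucene's ClassicSimilarity. As a minimal sketch, assuming the standard definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), the following Python reproduces the weight of the term "language" in document 3082. The function names are illustrative and not part of Lucene; tiny deviations from the listed values are expected because Lucene computes in 32-bit floats.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, field_norm, query_norm, boost=1.0):
    """Score of one query term in one document (hypothetical helper)."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                      # tf(freq=8.0) -> 2.828427
    query_weight = boost * idf * query_norm   # "queryWeight" node
    field_weight = tf * idf * field_norm      # "fieldWeight" node
    return query_weight * field_weight


# Values copied from the "abstract_txt:language in 3082" branch.
score = term_score(
    freq=8.0,
    doc_freq=1752,
    max_docs=42740,
    field_norm=0.078125,
    query_norm=0.024410708,
    boost=1.1864018,
)
print(round(score, 5))  # approximately 0.11255
```

Note that idf enters the product twice, once through queryWeight and once through fieldWeight, which is why rare terms weigh so heavily in this ranking.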
    
  2. Shaalan, K.; Raza, H.: NERA: Named Entity Recognition for Arabic (2009) 0.16
    0.15511638 = sum of:
      0.15511638 = product of:
        0.48473868 = sum of:
          0.07288037 = weight(abstract_txt:worked in 4954) [ClassicSimilarity], result of:
            0.07288037 = score(doc=4954,freq=1.0), product of:
              0.18303731 = queryWeight, product of:
                1.0298556 = boost
                7.280864 = idf(docFreq=79, maxDocs=42740)
                0.024410708 = queryNorm
              0.39817223 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.280864 = idf(docFreq=79, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
          0.07609032 = weight(abstract_txt:disambiguation in 4954) [ClassicSimilarity], result of:
            0.07609032 = score(doc=4954,freq=1.0), product of:
              0.18837307 = queryWeight, product of:
                1.0447586 = boost
                7.3862243 = idf(docFreq=71, maxDocs=42740)
                0.024410708 = queryNorm
              0.40393415 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3862243 = idf(docFreq=71, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
          0.025088577 = weight(abstract_txt:text in 4954) [ClassicSimilarity], result of:
            0.025088577 = score(doc=4954,freq=1.0), product of:
              0.11327305 = queryWeight, product of:
                1.1457367 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.024410708 = queryNorm
              0.22148761 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
          0.055711787 = weight(abstract_txt:language in 4954) [ClassicSimilarity], result of:
            0.055711787 = score(doc=4954,freq=4.0), product of:
              0.12145646 = queryWeight, product of:
                1.1864018 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.024410708 = queryNorm
              0.45869762 = fieldWeight in 4954, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
          0.06477467 = weight(abstract_txt:processing in 4954) [ClassicSimilarity], result of:
            0.06477467 = score(doc=4954,freq=2.0), product of:
              0.16920091 = queryWeight, product of:
                1.4003057 = boost
                4.9499345 = idf(docFreq=822, maxDocs=42740)
                0.024410708 = queryNorm
              0.38282695 = fieldWeight in 4954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.9499345 = idf(docFreq=822, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
          0.071279906 = weight(abstract_txt:natural in 4954) [ClassicSimilarity], result of:
            0.071279906 = score(doc=4954,freq=2.0), product of:
              0.18034771 = queryWeight, product of:
                1.4456955 = boost
                5.1103826 = idf(docFreq=700, maxDocs=42740)
                0.024410708 = queryNorm
              0.395236 = fieldWeight in 4954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1103826 = idf(docFreq=700, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
          0.052024044 = weight(abstract_txt:variety in 4954) [ClassicSimilarity], result of:
            0.052024044 = score(doc=4954,freq=1.0), product of:
              0.18419534 = queryWeight, product of:
                1.4610357 = boost
                5.1646085 = idf(docFreq=663, maxDocs=42740)
                0.024410708 = queryNorm
              0.28243953 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1646085 = idf(docFreq=663, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
          0.066888995 = weight(abstract_txt:metadata in 4954) [ClassicSimilarity], result of:
            0.066888995 = score(doc=4954,freq=1.0), product of:
              0.24931176 = queryWeight, product of:
                2.0817978 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.024410708 = queryNorm
              0.26829457 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
        0.32 = coord(8/25)
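
The same term can contribute differently to two documents even when its frequency is identical, because fieldNorm penalizes longer fields. A small sketch, using only the values listed above for the term "text" (freq=1.0) in documents 3082 and 4954; the helper name is hypothetical:

```python
def field_weight(tf, idf, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as shown in the explanation trees."""
    return tf * idf * field_norm


# Values copied from the "abstract_txt:text" branches.
QUERY_WEIGHT_TEXT = 0.11327305
IDF_TEXT = 4.0500593

score_3082 = QUERY_WEIGHT_TEXT * field_weight(1.0, IDF_TEXT, 0.078125)
score_4954 = QUERY_WEIGHT_TEXT * field_weight(1.0, IDF_TEXT, 0.0546875)

# With tf, idf and queryWeight equal, the ratio of the two scores
# is just the ratio of the field norms.
ratio = score_4954 / score_3082
print(round(score_3082, 6), round(score_4954, 6), round(ratio, 3))
```

The ratio equals 0.0546875 / 0.078125 = 0.7, so the lower norm of document 4954 alone accounts for its lower "text" weight.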
    
  3. Christel, M.G.: Automated metadata in multimedia information systems : creation, refinement, use in surrogates, and evaluation (2009) 0.15
    0.14730631 = sum of:
      0.14730631 = product of:
        0.4603322 = sum of:
          0.02165623 = weight(abstract_txt:analysis in 87) [ClassicSimilarity], result of:
            0.02165623 = score(doc=87,freq=1.0), product of:
              0.09394417 = queryWeight, product of:
                1.0434135 = boost
                3.6883576 = idf(docFreq=2905, maxDocs=42740)
                0.024410708 = queryNorm
              0.23052235 = fieldWeight in 87, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6883576 = idf(docFreq=2905, maxDocs=42740)
                0.0625 = fieldNorm(doc=87)
          0.020662209 = weight(abstract_txt:with in 87) [ClassicSimilarity], result of:
            0.020662209 = score(doc=87,freq=4.0), product of:
              0.06565627 = queryWeight, product of:
                1.0683296 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.024410708 = queryNorm
              0.31470278 = fieldWeight in 87, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=87)
          0.02867266 = weight(abstract_txt:text in 87) [ClassicSimilarity], result of:
            0.02867266 = score(doc=87,freq=1.0), product of:
              0.11327305 = queryWeight, product of:
                1.1457367 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.024410708 = queryNorm
              0.2531287 = fieldWeight in 87, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=87)
          0.031835306 = weight(abstract_txt:language in 87) [ClassicSimilarity], result of:
            0.031835306 = score(doc=87,freq=1.0), product of:
              0.12145646 = queryWeight, product of:
                1.1864018 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.024410708 = queryNorm
              0.26211292 = fieldWeight in 87, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.0625 = fieldNorm(doc=87)
          0.05234584 = weight(abstract_txt:processing in 87) [ClassicSimilarity], result of:
            0.05234584 = score(doc=87,freq=1.0), product of:
              0.16920091 = queryWeight, product of:
                1.4003057 = boost
                4.9499345 = idf(docFreq=822, maxDocs=42740)
                0.024410708 = queryNorm
              0.3093709 = fieldWeight in 87, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9499345 = idf(docFreq=822, maxDocs=42740)
                0.0625 = fieldNorm(doc=87)
          0.057602864 = weight(abstract_txt:natural in 87) [ClassicSimilarity], result of:
            0.057602864 = score(doc=87,freq=1.0), product of:
              0.18034771 = queryWeight, product of:
                1.4456955 = boost
                5.1103826 = idf(docFreq=700, maxDocs=42740)
                0.024410708 = queryNorm
              0.3193989 = fieldWeight in 87, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1103826 = idf(docFreq=700, maxDocs=42740)
                0.0625 = fieldNorm(doc=87)
          0.07662184 = weight(abstract_txt:automated in 87) [ClassicSimilarity], result of:
            0.07662184 = score(doc=87,freq=1.0), product of:
              0.21813045 = queryWeight, product of:
                1.5899361 = boost
                5.620258 = idf(docFreq=420, maxDocs=42740)
                0.024410708 = queryNorm
              0.35126612 = fieldWeight in 87, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.620258 = idf(docFreq=420, maxDocs=42740)
                0.0625 = fieldNorm(doc=87)
          0.17093526 = weight(abstract_txt:metadata in 87) [ClassicSimilarity], result of:
            0.17093526 = score(doc=87,freq=5.0), product of:
              0.24931176 = queryWeight, product of:
                2.0817978 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.024410708 = queryNorm
              0.68562853 = fieldWeight in 87, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.0625 = fieldNorm(doc=87)
        0.32 = coord(8/25)
    
  4. Heidorn, P.B.; Wei, Q.: Automatic metadata extraction from museum specimen labels (2008) 0.13
    0.1338393 = sum of:
      0.1338393 = product of:
        0.47799745 = sum of:
          0.01906398 = weight(abstract_txt:other in 4625) [ClassicSimilarity], result of:
            0.01906398 = score(doc=4625,freq=1.0), product of:
              0.0862893 = queryWeight, product of:
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.024410708 = queryNorm
              0.22093096 = fieldWeight in 4625, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.0625 = fieldNorm(doc=4625)
          0.0103311045 = weight(abstract_txt:with in 4625) [ClassicSimilarity], result of:
            0.0103311045 = score(doc=4625,freq=1.0), product of:
              0.06565627 = queryWeight, product of:
                1.0683296 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.024410708 = queryNorm
              0.15735139 = fieldWeight in 4625, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=4625)
          0.040549263 = weight(abstract_txt:text in 4625) [ClassicSimilarity], result of:
            0.040549263 = score(doc=4625,freq=2.0), product of:
              0.11327305 = queryWeight, product of:
                1.1457367 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.024410708 = queryNorm
              0.35797805 = fieldWeight in 4625, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=4625)
          0.057602864 = weight(abstract_txt:natural in 4625) [ClassicSimilarity], result of:
            0.057602864 = score(doc=4625,freq=1.0), product of:
              0.18034771 = queryWeight, product of:
                1.4456955 = boost
                5.1103826 = idf(docFreq=700, maxDocs=42740)
                0.024410708 = queryNorm
              0.3193989 = fieldWeight in 4625, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1103826 = idf(docFreq=700, maxDocs=42740)
                0.0625 = fieldNorm(doc=4625)
          0.10968472 = weight(abstract_txt:elements in 4625) [ClassicSimilarity], result of:
            0.10968472 = score(doc=4625,freq=3.0), product of:
              0.19210483 = queryWeight, product of:
                1.4920751 = boost
                5.274329 = idf(docFreq=594, maxDocs=42740)
                0.024410708 = queryNorm
              0.57096285 = fieldWeight in 4625, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.274329 = idf(docFreq=594, maxDocs=42740)
                0.0625 = fieldNorm(doc=4625)
          0.108359635 = weight(abstract_txt:automated in 4625) [ClassicSimilarity], result of:
            0.108359635 = score(doc=4625,freq=2.0), product of:
              0.21813045 = queryWeight, product of:
                1.5899361 = boost
                5.620258 = idf(docFreq=420, maxDocs=42740)
                0.024410708 = queryNorm
              0.4967653 = fieldWeight in 4625, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.620258 = idf(docFreq=420, maxDocs=42740)
                0.0625 = fieldNorm(doc=4625)
          0.13240588 = weight(abstract_txt:metadata in 4625) [ClassicSimilarity], result of:
            0.13240588 = score(doc=4625,freq=3.0), product of:
              0.24931176 = queryWeight, product of:
                2.0817978 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.024410708 = queryNorm
              0.53108555 = fieldWeight in 4625, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.0625 = fieldNorm(doc=4625)
        0.28 = coord(7/25)
    
  5. Manning, C.D.; Schütze, H.: Foundations of statistical natural language processing (2000) 0.12
    0.124951124 = sum of:
      0.124951124 = product of:
        0.5206297 = sum of:
          0.02859597 = weight(abstract_txt:other in 3604) [ClassicSimilarity], result of:
            0.02859597 = score(doc=3604,freq=1.0), product of:
              0.0862893 = queryWeight, product of:
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.024410708 = queryNorm
              0.33139646 = fieldWeight in 3604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.09375 = fieldNorm(doc=3604)
          0.13044053 = weight(abstract_txt:disambiguation in 3604) [ClassicSimilarity], result of:
            0.13044053 = score(doc=3604,freq=1.0), product of:
              0.18837307 = queryWeight, product of:
                1.0447586 = boost
                7.3862243 = idf(docFreq=71, maxDocs=42740)
                0.024410708 = queryNorm
              0.6924585 = fieldWeight in 3604, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3862243 = idf(docFreq=71, maxDocs=42740)
                0.09375 = fieldNorm(doc=3604)
          0.06082389 = weight(abstract_txt:text in 3604) [ClassicSimilarity], result of:
            0.06082389 = score(doc=3604,freq=2.0), product of:
              0.11327305 = queryWeight, product of:
                1.1457367 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.024410708 = queryNorm
              0.53696704 = fieldWeight in 3604, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.09375 = fieldNorm(doc=3604)
          0.06753288 = weight(abstract_txt:language in 3604) [ClassicSimilarity], result of:
            0.06753288 = score(doc=3604,freq=2.0), product of:
              0.12145646 = queryWeight, product of:
                1.1864018 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.024410708 = queryNorm
              0.55602545 = fieldWeight in 3604, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.09375 = fieldNorm(doc=3604)
          0.11104229 = weight(abstract_txt:processing in 3604) [ClassicSimilarity], result of:
            0.11104229 = score(doc=3604,freq=2.0), product of:
              0.16920091 = queryWeight, product of:
                1.4003057 = boost
                4.9499345 = idf(docFreq=822, maxDocs=42740)
                0.024410708 = queryNorm
              0.6562748 = fieldWeight in 3604, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.9499345 = idf(docFreq=822, maxDocs=42740)
                0.09375 = fieldNorm(doc=3604)
          0.12219412 = weight(abstract_txt:natural in 3604) [ClassicSimilarity], result of:
            0.12219412 = score(doc=3604,freq=2.0), product of:
              0.18034771 = queryWeight, product of:
                1.4456955 = boost
                5.1103826 = idf(docFreq=700, maxDocs=42740)
                0.024410708 = queryNorm
              0.6775474 = fieldWeight in 3604, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1103826 = idf(docFreq=700, maxDocs=42740)
                0.09375 = fieldNorm(doc=3604)
        0.24 = coord(6/25)
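
The final score of each entry is the sum of its per-term weights multiplied by the coordination factor coord(matched/total), which rewards documents that match more of the 25 query terms. As a sketch under that reading of the tree, the following reproduces the score of document 3604 from its six term weights; the function name is illustrative only.

```python
def coord_score(term_scores, total_query_terms):
    """Sum the matched term weights and scale by the coord factor."""
    coord = len(term_scores) / total_query_terms  # e.g. 6/25 = 0.24
    return coord * sum(term_scores)


# Term weights copied from the explanation tree of document 3604.
doc_3604 = [
    0.02859597,  # other
    0.13044053,  # disambiguation
    0.06082389,  # text
    0.06753288,  # language
    0.11104229,  # processing
    0.12219412,  # natural
]

final = coord_score(doc_3604, total_query_terms=25)
print(round(final, 4))  # approximately 0.125
```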