Document (#42721)

Author
Lynch, J.D.
Gibson, J.
Han, M.-J.
Title
Analyzing and normalizing type metadata for a large aggregated digital library
Source
Code4Lib Journal. Issue 47 (2020). [http://journal.code4lib.org]
Year
2020
Abstract
The Illinois Digital Heritage Hub (IDHH) gathers and enhances metadata from contributing institutions around the state of Illinois and provides this metadata to the Digital Public Library of America (DPLA) for greater access. The IDHH helps contributors shape their metadata to the standards recommended and required by the DPLA, in part by analyzing and enhancing the aggregated metadata. In late 2018, the IDHH undertook a project to address a particularly problematic field: Type metadata. This paper walks through the project, detailing the process of gathering and analyzing metadata using the DPLA API and OpenRefine, data remediation through XSL transformations in conjunction with local improvements by contributing institutions, and the DPLA ingestion system's quality controls.
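The analysis step described above (pulling aggregated records through the DPLA API and profiling the Type field before remediation) can be illustrated with a minimal Python sketch. This is not the IDHH's actual workflow, which used OpenRefine and XSLT. It assumes the records have already been fetched as JSON (for example from the DPLA API's items endpoint) and that each record carries its type value(s) under `sourceResource.type`; the `TYPE_MAP` crosswalk to DCMI Type terms and all function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Hypothetical, illustrative crosswalk: cleaned local values -> DCMI Type terms.
# A real crosswalk would be built from the values actually found in the data.
TYPE_MAP = {
    "image": "Image",
    "photograph": "Image",
    "photographs": "Image",
    "still image": "StillImage",
    "text": "Text",
    "book": "Text",
    "letter": "Text",
    "sound": "Sound",
    "audio": "Sound",
    "oral history": "Sound",
    "moving image": "MovingImage",
    "video": "MovingImage",
    "physical object": "PhysicalObject",
    "artifact": "PhysicalObject",
}


def raw_types(doc: dict) -> list:
    """Return a record's type value(s); the field may be a string, a list, or absent."""
    value = doc.get("sourceResource", {}).get("type", [])
    return [value] if isinstance(value, str) else list(value)


def normalize(value: str) -> str | None:
    """Lower-case, trim trailing punctuation, collapse spaces, then look up the crosswalk."""
    key = " ".join(value.strip().lower().rstrip(".;").split())
    return TYPE_MAP.get(key)


def tally(docs: Iterable[dict]) -> tuple:
    """Count mapped DCMI terms and, separately, raw values that need manual review."""
    mapped, unmapped = Counter(), Counter()
    for doc in docs:
        for value in raw_types(doc):
            term = normalize(value)
            if term is None:
                unmapped[value] += 1
            else:
                mapped[term] += 1
    return mapped, unmapped


if __name__ == "__main__":
    docs = [
        {"sourceResource": {"type": ["Photographs."]}},
        {"sourceResource": {"type": "text"}},
        {"sourceResource": {"type": ["Scrapbooks"]}},
        {"sourceResource": {}},
    ]
    mapped, unmapped = tally(docs)
    print(mapped)    # Counter({'Image': 1, 'Text': 1})
    print(unmapped)  # Counter({'Scrapbooks': 1})
```

Keeping the unmapped values in their own tally mirrors the kind of review the abstract describes: values the crosswalk cannot resolve are surfaced for manual decisions or for correction by the contributing institution, rather than being silently dropped.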
Content
Cf.: https://journal.code4lib.org/articles/14995.
Theme
Metadata

Similar documents (author)

  1. Gibson, J.J.: Wahrnehmung und Umwelt : der ökologische Ansatz in der visuellen Wahrnehmung (1982) 2.24
  2. Gibson, P.: Professionals' perfect Web world in sight : users want more information on the Web, and vendors attempt to provide (1998) 2.24
  3. Gibson, P.: Navigating the Internet road to riches (1998) 2.24
  4. Gibson, P.: HotBot's future is in Lycos' hands : users hope that the search engine won't be hobbled by an acquisition (1999) 2.24
  5. Gibson, R.; Ward, S.: A proposed methodology for studying the function and effectiveness of party and candidate Web sites (2000) 1.79

Similar documents (content)

  1. Shreeves, S.L.; Kaczmarek, J.S.; Cole, T.W.: Harvesting cultural heritage metadata using OAI Protocol (2003) 0.19
  2. Isaac, A.; Raemy, J.A.; Meijers, E.; Valk, S. De; Freire, N.: Metadata aggregation via linked data : results of the Europeana Common Culture project (2020) 0.15
  3. McElfresh, L.K.: Creator name standardization using faceted vocabularies in the BTAA geoportal : Michigan State University libraries digital repository case study (2023) 0.12
  4. Stevens, G.: New metadata recipes for old cookbooks : creating and analyzing a digital collection using the HathiTrust Research Center Portal (2017) 0.12
  5. Valentino, M.L.: Integrating metadata creation into catalog workflow (2010) 0.11