Document (#40894)

Author
Bartczak, J.
Glendon, I.
Title
Python, Google Sheets, and the Thesaurus for Graphic Materials for efficient metadata project workflows
Source
Code4Lib journal. Issue 35(2017), [http://journal.code4lib.org]
Year
2017
Abstract
In 2017, the University of Virginia (U.Va.) will launch a two-year initiative to celebrate the bicentennial anniversary of the University's founding in 1819. The U.Va. Library is participating in this event by digitizing some 20,000 photographs and negatives that document student life on the U.Va. grounds in the 1960s and 1970s. Metadata librarians and archivists are well-versed in the challenges associated with generating digital content and accompanying description within the context of limited resources. This paper describes how technology and new approaches to metadata design have enabled the University of Virginia's Metadata Analysis and Design Department to rapidly and successfully generate accurate description for these digital objects. Python's pandas module improves efficiency by cleaning and repurposing data recorded at digitization, while the lxml module builds MODS XML programmatically from CSV tables. A simplified technique for subject heading selection and assignment in Google Sheets provides a collaborative environment for streamlined metadata creation and data quality control.
Content
Cf.: http://journal.code4lib.org/articles/12182.
Theme
Metadata
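
The abstract describes cleaning tabular data and building MODS XML from CSV rows. The following is only a minimal sketch of that general idea, not the authors' code: it uses Python's standard-library csv and xml.etree.ElementTree modules as stand-ins for pandas and lxml (lxml.etree offers a largely compatible element API). The column names, sample rows, pipe-delimited subject convention, and the "lctgm" authority value for Thesaurus for Graphic Materials terms are all assumptions made for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical sample data; the column names and values are assumptions.
SAMPLE_CSV = """identifier,title,date,subjects
img_0001,Students on the Lawn,1968,Students|Lawns
img_0002,  Basketball game ,1972,Basketball
"""


def q(tag):
    """Return the namespace-qualified name of a MODS element."""
    return f"{{{MODS_NS}}}{tag}"


def row_to_mods(row):
    """Build one <mods> element from a CSV row, trimming stray whitespace."""
    mods = ET.Element(q("mods"))
    ET.SubElement(mods, q("identifier"), type="local").text = row["identifier"].strip()
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = row["title"].strip()
    origin = ET.SubElement(mods, q("originInfo"))
    ET.SubElement(origin, q("dateCreated")).text = row["date"].strip()
    # Assumed convention: several subject terms share one cell, separated by "|".
    for term in row["subjects"].split("|"):
        term = term.strip()
        if term:
            subject = ET.SubElement(mods, q("subject"), authority="lctgm")
            ET.SubElement(subject, q("topic")).text = term
    return mods


records = [row_to_mods(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(ET.tostring(records[0], encoding="unicode"))
```

In a shared-spreadsheet workflow, the CSV text would presumably come from an export of the sheet rather than an inline string.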

Similar documents (content)

  1. Lorenzo, L.; Mak, L.; Smeltekop, N.: FAST Headings in MODS : Michigan State University libraries digital repository case study (2023) 0.16
    0.15526016 = sum of:
      0.15526016 = product of:
        0.64691734 = sum of:
          0.109749846 = weight(abstract_txt:workflows in 1177) [ClassicSimilarity], result of:
            0.109749846 = score(doc=1177,freq=1.0), product of:
              0.15224095 = queryWeight, product of:
                1.026342 = boost
                7.689554 = idf(docFreq=54, maxDocs=44218)
                0.019290265 = queryNorm
              0.7208957 = fieldWeight in 1177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.689554 = idf(docFreq=54, maxDocs=44218)
                0.09375 = fieldNorm(doc=1177)
          0.037452947 = weight(abstract_txt:university in 1177) [ClassicSimilarity], result of:
            0.037452947 = score(doc=1177,freq=1.0), product of:
              0.09366907 = queryWeight, product of:
                1.1385169 = boost
                4.264995 = idf(docFreq=1688, maxDocs=44218)
                0.019290265 = queryNorm
              0.39984328 = fieldWeight in 1177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.264995 = idf(docFreq=1688, maxDocs=44218)
                0.09375 = fieldNorm(doc=1177)
          0.07854502 = weight(abstract_txt:digital in 1177) [ClassicSimilarity], result of:
            0.07854502 = score(doc=1177,freq=4.0), product of:
              0.09667881 = queryWeight, product of:
                1.1566634 = boost
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.019290265 = queryNorm
              0.81243265 = fieldWeight in 1177, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.09375 = fieldNorm(doc=1177)
          0.16808996 = weight(abstract_txt:mods in 1177) [ClassicSimilarity], result of:
            0.16808996 = score(doc=1177,freq=1.0), product of:
              0.20228174 = queryWeight, product of:
                1.1830544 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.019290265 = queryNorm
              0.83096945 = fieldWeight in 1177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.09375 = fieldNorm(doc=1177)
          0.054571673 = weight(abstract_txt:description in 1177) [ClassicSimilarity], result of:
            0.054571673 = score(doc=1177,freq=1.0), product of:
              0.12038814 = queryWeight, product of:
                1.2907236 = boost
                4.835176 = idf(docFreq=954, maxDocs=44218)
                0.019290265 = queryNorm
              0.45329773 = fieldWeight in 1177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.835176 = idf(docFreq=954, maxDocs=44218)
                0.09375 = fieldNorm(doc=1177)
          0.19850788 = weight(abstract_txt:metadata in 1177) [ClassicSimilarity], result of:
            0.19850788 = score(doc=1177,freq=2.0), product of:
              0.30673313 = queryWeight, product of:
                3.2575548 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.019290265 = queryNorm
              0.64716804 = fieldWeight in 1177, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.09375 = fieldNorm(doc=1177)
        0.24 = coord(6/25)
    
  2. Kurth, M.; Ruddy, D.; Rupp, N.: Repurposing MARC metadata : using digital project experience to develop a metadata management design (2004) 0.15
    0.14921708 = sum of:
      0.14921708 = product of:
        0.7460854 = sum of:
          0.024247326 = weight(abstract_txt:design in 4748) [ClassicSimilarity], result of:
            0.024247326 = score(doc=4748,freq=1.0), product of:
              0.07915936 = queryWeight, product of:
                1.0466284 = boost
                3.9207718 = idf(docFreq=2382, maxDocs=44218)
                0.019290265 = queryNorm
              0.3063103 = fieldWeight in 4748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9207718 = idf(docFreq=2382, maxDocs=44218)
                0.078125 = fieldNorm(doc=4748)
          0.031210793 = weight(abstract_txt:university in 4748) [ClassicSimilarity], result of:
            0.031210793 = score(doc=4748,freq=1.0), product of:
              0.09366907 = queryWeight, product of:
                1.1385169 = boost
                4.264995 = idf(docFreq=1688, maxDocs=44218)
                0.019290265 = queryNorm
              0.33320275 = fieldWeight in 4748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.264995 = idf(docFreq=1688, maxDocs=44218)
                0.078125 = fieldNorm(doc=4748)
          0.056684982 = weight(abstract_txt:digital in 4748) [ClassicSimilarity], result of:
            0.056684982 = score(doc=4748,freq=3.0), product of:
              0.09667881 = queryWeight, product of:
                1.1566634 = boost
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.019290265 = queryNorm
              0.5863227 = fieldWeight in 4748, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.078125 = fieldNorm(doc=4748)
          0.22873981 = weight(abstract_txt:repurposing in 4748) [ClassicSimilarity], result of:
            0.22873981 = score(doc=4748,freq=2.0), product of:
              0.22263882 = queryWeight, product of:
                1.2411573 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.019290265 = queryNorm
              1.0274031 = fieldWeight in 4748, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.078125 = fieldNorm(doc=4748)
          0.4052025 = weight(abstract_txt:metadata in 4748) [ClassicSimilarity], result of:
            0.4052025 = score(doc=4748,freq=12.0), product of:
              0.30673313 = queryWeight, product of:
                3.2575548 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.019290265 = queryNorm
              1.3210262 = fieldWeight in 4748, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.078125 = fieldNorm(doc=4748)
        0.2 = coord(5/25)
    
  3. Wacker, M.; Han, M.-J.; Dartt, J.: Testing Resource Description and Access (RDA) with non-MARC standards (2011) 0.14
    0.13607053 = sum of:
      0.13607053 = product of:
        0.6803526 = sum of:
          0.109749846 = weight(abstract_txt:workflows in 1900) [ClassicSimilarity], result of:
            0.109749846 = score(doc=1900,freq=1.0), product of:
              0.15224095 = queryWeight, product of:
                1.026342 = boost
                7.689554 = idf(docFreq=54, maxDocs=44218)
                0.019290265 = queryNorm
              0.7208957 = fieldWeight in 1900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.689554 = idf(docFreq=54, maxDocs=44218)
                0.09375 = fieldNorm(doc=1900)
          0.06487041 = weight(abstract_txt:university in 1900) [ClassicSimilarity], result of:
            0.06487041 = score(doc=1900,freq=3.0), product of:
              0.09366907 = queryWeight, product of:
                1.1385169 = boost
                4.264995 = idf(docFreq=1688, maxDocs=44218)
                0.019290265 = queryNorm
              0.6925489 = fieldWeight in 1900, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.264995 = idf(docFreq=1688, maxDocs=44218)
                0.09375 = fieldNorm(doc=1900)
          0.16808996 = weight(abstract_txt:mods in 1900) [ClassicSimilarity], result of:
            0.16808996 = score(doc=1900,freq=1.0), product of:
              0.20228174 = queryWeight, product of:
                1.1830544 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.019290265 = queryNorm
              0.83096945 = fieldWeight in 1900, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.09375 = fieldNorm(doc=1900)
          0.094520904 = weight(abstract_txt:description in 1900) [ClassicSimilarity], result of:
            0.094520904 = score(doc=1900,freq=3.0), product of:
              0.12038814 = queryWeight, product of:
                1.2907236 = boost
                4.835176 = idf(docFreq=954, maxDocs=44218)
                0.019290265 = queryNorm
              0.7851347 = fieldWeight in 1900, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.835176 = idf(docFreq=954, maxDocs=44218)
                0.09375 = fieldNorm(doc=1900)
          0.24312152 = weight(abstract_txt:metadata in 1900) [ClassicSimilarity], result of:
            0.24312152 = score(doc=1900,freq=3.0), product of:
              0.30673313 = queryWeight, product of:
                3.2575548 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.019290265 = queryNorm
              0.7926158 = fieldWeight in 1900, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.09375 = fieldNorm(doc=1900)
        0.2 = coord(5/25)
    
  4. Kirschenbaum, M.: Documenting digital images : textual meta-data at the Blake Archive (1998) 0.12
    0.12264349 = sum of:
      0.12264349 = product of:
        0.6132175 = sum of:
          0.1195843 = weight(abstract_txt:accompanying in 3287) [ClassicSimilarity], result of:
            0.1195843 = score(doc=3287,freq=1.0), product of:
              0.16120492 = queryWeight, product of:
                1.0561255 = boost
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.019290265 = queryNorm
              0.74181545 = fieldWeight in 3287, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.09375 = fieldNorm(doc=3287)
          0.13617519 = weight(abstract_txt:virginia in 3287) [ClassicSimilarity], result of:
            0.13617519 = score(doc=3287,freq=1.0), product of:
              0.17579001 = queryWeight, product of:
                1.1028678 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.019290265 = queryNorm
              0.7746469 = fieldWeight in 3287, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.09375 = fieldNorm(doc=3287)
          0.037452947 = weight(abstract_txt:university in 3287) [ClassicSimilarity], result of:
            0.037452947 = score(doc=3287,freq=1.0), product of:
              0.09366907 = queryWeight, product of:
                1.1385169 = boost
                4.264995 = idf(docFreq=1688, maxDocs=44218)
                0.019290265 = queryNorm
              0.39984328 = fieldWeight in 3287, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.264995 = idf(docFreq=1688, maxDocs=44218)
                0.09375 = fieldNorm(doc=3287)
          0.03927251 = weight(abstract_txt:digital in 3287) [ClassicSimilarity], result of:
            0.03927251 = score(doc=3287,freq=1.0), product of:
              0.09667881 = queryWeight, product of:
                1.1566634 = boost
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.019290265 = queryNorm
              0.40621632 = fieldWeight in 3287, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.09375 = fieldNorm(doc=3287)
          0.28073254 = weight(abstract_txt:metadata in 3287) [ClassicSimilarity], result of:
            0.28073254 = score(doc=3287,freq=4.0), product of:
              0.30673313 = queryWeight, product of:
                3.2575548 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.019290265 = queryNorm
              0.91523385 = fieldWeight in 3287, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.09375 = fieldNorm(doc=3287)
        0.2 = coord(5/25)
    
  5. Hardesty, J.L.; Young, J.B.: The semantics of metadata : Avalon Media System and the move to RDF (2017) 0.12
    0.12071838 = sum of:
      0.12071838 = product of:
        0.5029933 = sum of:
          0.07972287 = weight(abstract_txt:accompanying in 3896) [ClassicSimilarity], result of:
            0.07972287 = score(doc=3896,freq=1.0), product of:
              0.16120492 = queryWeight, product of:
                1.0561255 = boost
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.019290265 = queryNorm
              0.4945436 = fieldWeight in 3896, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.0625 = fieldNorm(doc=3896)
          0.03531098 = weight(abstract_txt:university in 3896) [ClassicSimilarity], result of:
            0.03531098 = score(doc=3896,freq=2.0), product of:
              0.09366907 = queryWeight, product of:
                1.1385169 = boost
                4.264995 = idf(docFreq=1688, maxDocs=44218)
                0.019290265 = queryNorm
              0.37697586 = fieldWeight in 3896, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.264995 = idf(docFreq=1688, maxDocs=44218)
                0.0625 = fieldNorm(doc=3896)
          0.052363344 = weight(abstract_txt:digital in 3896) [ClassicSimilarity], result of:
            0.052363344 = score(doc=3896,freq=4.0), product of:
              0.09667881 = queryWeight, product of:
                1.1566634 = boost
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.019290265 = queryNorm
              0.54162174 = fieldWeight in 3896, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.332974 = idf(docFreq=1577, maxDocs=44218)
                0.0625 = fieldNorm(doc=3896)
          0.112059966 = weight(abstract_txt:mods in 3896) [ClassicSimilarity], result of:
            0.112059966 = score(doc=3896,freq=1.0), product of:
              0.20228174 = queryWeight, product of:
                1.1830544 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.019290265 = queryNorm
              0.55397964 = fieldWeight in 3896, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.0625 = fieldNorm(doc=3896)
          0.036381114 = weight(abstract_txt:description in 3896) [ClassicSimilarity], result of:
            0.036381114 = score(doc=3896,freq=1.0), product of:
              0.12038814 = queryWeight, product of:
                1.2907236 = boost
                4.835176 = idf(docFreq=954, maxDocs=44218)
                0.019290265 = queryNorm
              0.3021985 = fieldWeight in 3896, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.835176 = idf(docFreq=954, maxDocs=44218)
                0.0625 = fieldNorm(doc=3896)
          0.18715502 = weight(abstract_txt:metadata in 3896) [ClassicSimilarity], result of:
            0.18715502 = score(doc=3896,freq=4.0), product of:
              0.30673313 = queryWeight, product of:
                3.2575548 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.019290265 = queryNorm
              0.6101559 = fieldWeight in 3896, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.0625 = fieldNorm(doc=3896)
        0.24 = coord(6/25)
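
The score breakdowns above follow the pattern of Lucene's ClassicSimilarity: each term score is queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), with tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the term scores are summed and multiplied by the coord factor. As a rough check, the sketch below recomputes the first entry (doc 1177) from the numbers listed in its breakdown. The function and variable names are illustrative, and small differences from the listed values are expected because Lucene computes in single precision.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# Values copied from the breakdown for doc 1177.
MAX_DOCS = 44218
QUERY_NORM = 0.019290265
FIELD_NORM = 0.09375

# term -> (freq, docFreq, boost)
terms = {
    "workflows": (1.0, 54, 1.026342),
    "university": (1.0, 1688, 1.1385169),
    "digital": (4.0, 1577, 1.1566634),
    "mods": (1.0, 16, 1.1830544),
    "description": (1.0, 954, 1.2907236),
    "metadata": (2.0, 911, 3.2575548),
}

scores = {
    term: term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM)
    for term, (freq, doc_freq, boost) in terms.items()
}

# coord(6/25): 6 of the 25 query terms matched this document.
coord = 6 / 25
total = sum(scores.values()) * coord

for term, value in scores.items():
    print(f"{term}: {value:.6f}")
print(f"total: {total:.6f}")
```

The total should agree with the 0.15526016 shown for the first entry to within rounding.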