Document (#40895)

Author
Bartczak, J.
Glendon, I.
Title
Python, Google Sheets, and the Thesaurus for Graphic Materials for efficient metadata project workflows
Source
Code4Lib journal. Issue 35(2017), [http://journal.code4lib.org]
Year
2017
Abstract
In 2017, the University of Virginia (U.Va.) will launch a two-year initiative to celebrate the bicentennial of the University's founding in 1819. The U.Va. Library is participating in this event by digitizing some 20,000 photographs and negatives that document student life on the U.Va. grounds in the 1960s and 1970s. Metadata librarians and archivists are well-versed in the challenges associated with generating digital content and accompanying description within the context of limited resources. This paper describes how technology and new approaches to metadata design have enabled the University of Virginia's Metadata Analysis and Design Department to rapidly and successfully generate accurate description for these digital objects. Python's pandas module improves efficiency by cleaning and repurposing data recorded at digitization, while the lxml module builds MODS XML programmatically from CSV tables. A simplified technique for subject heading selection and assignment in Google Sheets provides a collaborative environment for streamlined metadata creation and data quality control.
Content
Cf. http://journal.code4lib.org/articles/12182.
Theme
Metadata
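
Illustration (not from the article)
The abstract describes cleaning tabular data and building MODS XML from CSV rows. The sketch below shows the general shape of such a step. It is not the authors' code: it swaps in the standard-library csv and xml.etree.ElementTree modules in place of pandas and lxml (lxml.etree offers a largely compatible element API), and the column names, sample rows, and the helper names clean and row_to_mods are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical sample rows; the column names are assumptions for illustration.
SAMPLE_CSV = """identifier,title,subject
img_0001,  Students on the   Lawn ,Students
img_0002,Basketball game,Basketball
"""


def clean(value):
    """Collapse runs of whitespace and strip both ends."""
    return " ".join(value.split())


def row_to_mods(row):
    """Build one minimal MODS record from a CSV row (hypothetical mapping)."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    ident = ET.SubElement(mods, f"{{{MODS_NS}}}identifier", type="local")
    ident.text = clean(row["identifier"])
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = clean(row["title"])
    # "lctgm" is the source code for the Thesaurus for Graphic Materials.
    subject = ET.SubElement(mods, f"{{{MODS_NS}}}subject", authority="lctgm")
    ET.SubElement(subject, f"{{{MODS_NS}}}topic").text = clean(row["subject"])
    return mods


records = [row_to_mods(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for record in records:
    print(ET.tostring(record, encoding="unicode"))
```

Each CSV row becomes one mods element with an identifier, a title, and a single topical subject. A real workflow would map many more columns and validate the result against the MODS schema.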

Similar documents (content)

  1. Kurth, M.; Ruddy, D.; Rupp, N.: Repurposing MARC metadata : using digital project experience to develop a metadata management design (2004) 0.15
    0.15092161 = sum of:
      0.15092161 = product of:
        0.75460804 = sum of:
          0.024644151 = weight(abstract_txt:design in 5749) [ClassicSimilarity], result of:
            0.024644151 = score(doc=5749,freq=1.0), product of:
              0.07985534 = queryWeight, product of:
                1.0466008 = boost
                3.9502072 = idf(docFreq=2228, maxDocs=42596)
                0.019315368 = queryNorm
              0.30860993 = fieldWeight in 5749, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9502072 = idf(docFreq=2228, maxDocs=42596)
                0.078125 = fieldNorm(doc=5749)
          0.030726185 = weight(abstract_txt:university in 5749) [ClassicSimilarity], result of:
            0.030726185 = score(doc=5749,freq=1.0), product of:
              0.09250541 = queryWeight, product of:
                1.126452 = boost
                4.251591 = idf(docFreq=1648, maxDocs=42596)
                0.019315368 = queryNorm
              0.33215556 = fieldWeight in 5749, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.251591 = idf(docFreq=1648, maxDocs=42596)
                0.078125 = fieldNorm(doc=5749)
          0.05847263 = weight(abstract_txt:digital in 5749) [ClassicSimilarity], result of:
            0.05847263 = score(doc=5749,freq=3.0), product of:
              0.098496936 = queryWeight, product of:
                1.1623595 = boost
                4.3871174 = idf(docFreq=1439, maxDocs=42596)
                0.019315368 = queryNorm
              0.5936492 = fieldWeight in 5749, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.3871174 = idf(docFreq=1439, maxDocs=42596)
                0.078125 = fieldNorm(doc=5749)
          0.23160046 = weight(abstract_txt:repurposing in 5749) [ClassicSimilarity], result of:
            0.23160046 = score(doc=5749,freq=2.0), product of:
              0.22402732 = queryWeight, product of:
                1.2395515 = boost
                9.356931 = idf(docFreq=9, maxDocs=42596)
                0.019315368 = queryNorm
              1.0338045 = fieldWeight in 5749, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.356931 = idf(docFreq=9, maxDocs=42596)
                0.078125 = fieldNorm(doc=5749)
          0.40916458 = weight(abstract_txt:metadata in 5749) [ClassicSimilarity], result of:
            0.40916458 = score(doc=5749,freq=12.0), product of:
              0.30809146 = queryWeight, product of:
                3.250416 = boost
                4.907245 = idf(docFreq=855, maxDocs=42596)
                0.019315368 = queryNorm
              1.328062 = fieldWeight in 5749, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                4.907245 = idf(docFreq=855, maxDocs=42596)
                0.078125 = fieldNorm(doc=5749)
        0.2 = coord(5/25)
    
  2. Wacker, M.; Han, M.-J.; Dartt, J.: Testing Resource Description and Access (RDA) with non-MARC standards (2011) 0.14
    0.13847192 = sum of:
      0.13847192 = product of:
        0.69235957 = sum of:
          0.11616836 = weight(abstract_txt:workflows in 2901) [ClassicSimilarity], result of:
            0.11616836 = score(doc=2901,freq=1.0), product of:
              0.15779349 = queryWeight, product of:
                1.0403001 = boost
                7.8528533 = idf(docFreq=44, maxDocs=42596)
                0.019315368 = queryNorm
              0.736205 = fieldWeight in 2901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8528533 = idf(docFreq=44, maxDocs=42596)
                0.09375 = fieldNorm(doc=2901)
          0.06386318 = weight(abstract_txt:university in 2901) [ClassicSimilarity], result of:
            0.06386318 = score(doc=2901,freq=3.0), product of:
              0.09250541 = queryWeight, product of:
                1.126452 = boost
                4.251591 = idf(docFreq=1648, maxDocs=42596)
                0.019315368 = queryNorm
              0.69037235 = fieldWeight in 2901, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.251591 = idf(docFreq=1648, maxDocs=42596)
                0.09375 = fieldNorm(doc=2901)
          0.17206317 = weight(abstract_txt:mods in 2901) [ClassicSimilarity], result of:
            0.17206317 = score(doc=2901,freq=1.0), product of:
              0.20503241 = queryWeight, product of:
                1.185838 = boost
                8.951466 = idf(docFreq=14, maxDocs=42596)
                0.019315368 = queryNorm
              0.8391999 = fieldWeight in 2901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.951466 = idf(docFreq=14, maxDocs=42596)
                0.09375 = fieldNorm(doc=2901)
          0.09476608 = weight(abstract_txt:description in 2901) [ClassicSimilarity], result of:
            0.09476608 = score(doc=2901,freq=3.0), product of:
              0.12034704 = queryWeight, product of:
                1.2848333 = boost
                4.8493733 = idf(docFreq=906, maxDocs=42596)
                0.019315368 = queryNorm
              0.78744006 = fieldWeight in 2901, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.8493733 = idf(docFreq=906, maxDocs=42596)
                0.09375 = fieldNorm(doc=2901)
          0.24549876 = weight(abstract_txt:metadata in 2901) [ClassicSimilarity], result of:
            0.24549876 = score(doc=2901,freq=3.0), product of:
              0.30809146 = queryWeight, product of:
                3.250416 = boost
                4.907245 = idf(docFreq=855, maxDocs=42596)
                0.019315368 = queryNorm
              0.7968373 = fieldWeight in 2901, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.907245 = idf(docFreq=855, maxDocs=42596)
                0.09375 = fieldNorm(doc=2901)
        0.2 = coord(5/25)
    
  3. Kirschenbaum, M.: Documenting digital images : textual meta-data at the Blake Archive (1998) 0.12
    0.1225125 = sum of:
      0.1225125 = product of:
        0.6125625 = sum of:
          0.11819766 = weight(abstract_txt:accompanying in 4288) [ClassicSimilarity], result of:
            0.11819766 = score(doc=4288,freq=1.0), product of:
              0.15962581 = queryWeight, product of:
                1.0463227 = boost
                7.8983154 = idf(docFreq=42, maxDocs=42596)
                0.019315368 = queryNorm
              0.7404671 = fieldWeight in 4288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8983154 = idf(docFreq=42, maxDocs=42596)
                0.09375 = fieldNorm(doc=4288)
          0.13350482 = weight(abstract_txt:virginia in 4288) [ClassicSimilarity], result of:
            0.13350482 = score(doc=4288,freq=1.0), product of:
              0.17312582 = queryWeight, product of:
                1.0896701 = boost
                8.225529 = idf(docFreq=30, maxDocs=42596)
                0.019315368 = queryNorm
              0.7711433 = fieldWeight in 4288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.225529 = idf(docFreq=30, maxDocs=42596)
                0.09375 = fieldNorm(doc=4288)
          0.036871426 = weight(abstract_txt:university in 4288) [ClassicSimilarity], result of:
            0.036871426 = score(doc=4288,freq=1.0), product of:
              0.09250541 = queryWeight, product of:
                1.126452 = boost
                4.251591 = idf(docFreq=1648, maxDocs=42596)
                0.019315368 = queryNorm
              0.3985867 = fieldWeight in 4288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.251591 = idf(docFreq=1648, maxDocs=42596)
                0.09375 = fieldNorm(doc=4288)
          0.040511027 = weight(abstract_txt:digital in 4288) [ClassicSimilarity], result of:
            0.040511027 = score(doc=4288,freq=1.0), product of:
              0.098496936 = queryWeight, product of:
                1.1623595 = boost
                4.3871174 = idf(docFreq=1439, maxDocs=42596)
                0.019315368 = queryNorm
              0.41129225 = fieldWeight in 4288, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3871174 = idf(docFreq=1439, maxDocs=42596)
                0.09375 = fieldNorm(doc=4288)
          0.28347754 = weight(abstract_txt:metadata in 4288) [ClassicSimilarity], result of:
            0.28347754 = score(doc=4288,freq=4.0), product of:
              0.30809146 = queryWeight, product of:
                3.250416 = boost
                4.907245 = idf(docFreq=855, maxDocs=42596)
                0.019315368 = queryNorm
              0.92010844 = fieldWeight in 4288, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.907245 = idf(docFreq=855, maxDocs=42596)
                0.09375 = fieldNorm(doc=4288)
        0.2 = coord(5/25)
    
  4. Hardesty, J.L.; Young, J.B.: The semantics of metadata : Avalon Media System and the move to RDF (2017) 0.12
    0.121858835 = sum of:
      0.121858835 = product of:
        0.50774515 = sum of:
          0.078798436 = weight(abstract_txt:accompanying in 4897) [ClassicSimilarity], result of:
            0.078798436 = score(doc=4897,freq=1.0), product of:
              0.15962581 = queryWeight, product of:
                1.0463227 = boost
                7.8983154 = idf(docFreq=42, maxDocs=42596)
                0.019315368 = queryNorm
              0.4936447 = fieldWeight in 4897, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8983154 = idf(docFreq=42, maxDocs=42596)
                0.0625 = fieldNorm(doc=4897)
          0.03476271 = weight(abstract_txt:university in 4897) [ClassicSimilarity], result of:
            0.03476271 = score(doc=4897,freq=2.0), product of:
              0.09250541 = queryWeight, product of:
                1.126452 = boost
                4.251591 = idf(docFreq=1648, maxDocs=42596)
                0.019315368 = queryNorm
              0.3757911 = fieldWeight in 4897, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.251591 = idf(docFreq=1648, maxDocs=42596)
                0.0625 = fieldNorm(doc=4897)
          0.0540147 = weight(abstract_txt:digital in 4897) [ClassicSimilarity], result of:
            0.0540147 = score(doc=4897,freq=4.0), product of:
              0.098496936 = queryWeight, product of:
                1.1623595 = boost
                4.3871174 = idf(docFreq=1439, maxDocs=42596)
                0.019315368 = queryNorm
              0.5483897 = fieldWeight in 4897, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3871174 = idf(docFreq=1439, maxDocs=42596)
                0.0625 = fieldNorm(doc=4897)
          0.11470878 = weight(abstract_txt:mods in 4897) [ClassicSimilarity], result of:
            0.11470878 = score(doc=4897,freq=1.0), product of:
              0.20503241 = queryWeight, product of:
                1.185838 = boost
                8.951466 = idf(docFreq=14, maxDocs=42596)
                0.019315368 = queryNorm
              0.5594666 = fieldWeight in 4897, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.951466 = idf(docFreq=14, maxDocs=42596)
                0.0625 = fieldNorm(doc=4897)
          0.036475483 = weight(abstract_txt:description in 4897) [ClassicSimilarity], result of:
            0.036475483 = score(doc=4897,freq=1.0), product of:
              0.12034704 = queryWeight, product of:
                1.2848333 = boost
                4.8493733 = idf(docFreq=906, maxDocs=42596)
                0.019315368 = queryNorm
              0.30308583 = fieldWeight in 4897, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8493733 = idf(docFreq=906, maxDocs=42596)
                0.0625 = fieldNorm(doc=4897)
          0.18898503 = weight(abstract_txt:metadata in 4897) [ClassicSimilarity], result of:
            0.18898503 = score(doc=4897,freq=4.0), product of:
              0.30809146 = queryWeight, product of:
                3.250416 = boost
                4.907245 = idf(docFreq=855, maxDocs=42596)
                0.019315368 = queryNorm
              0.61340564 = fieldWeight in 4897, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.907245 = idf(docFreq=855, maxDocs=42596)
                0.0625 = fieldNorm(doc=4897)
        0.24 = coord(6/25)
    
  5. Guenther, R.S.: Using the Metadata Object Description Schema (MODS) for resource description : guidelines and applications (2004) 0.11
    0.11479778 = sum of:
      0.11479778 = product of:
        0.71748614 = sum of:
          0.11819766 = weight(abstract_txt:accompanying in 3838) [ClassicSimilarity], result of:
            0.11819766 = score(doc=3838,freq=1.0), product of:
              0.15962581 = queryWeight, product of:
                1.0463227 = boost
                7.8983154 = idf(docFreq=42, maxDocs=42596)
                0.019315368 = queryNorm
              0.7404671 = fieldWeight in 3838, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8983154 = idf(docFreq=42, maxDocs=42596)
                0.09375 = fieldNorm(doc=3838)
          0.34412634 = weight(abstract_txt:mods in 3838) [ClassicSimilarity], result of:
            0.34412634 = score(doc=3838,freq=4.0), product of:
              0.20503241 = queryWeight, product of:
                1.185838 = boost
                8.951466 = idf(docFreq=14, maxDocs=42596)
                0.019315368 = queryNorm
              1.6783998 = fieldWeight in 3838, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.951466 = idf(docFreq=14, maxDocs=42596)
                0.09375 = fieldNorm(doc=3838)
          0.054713227 = weight(abstract_txt:description in 3838) [ClassicSimilarity], result of:
            0.054713227 = score(doc=3838,freq=1.0), product of:
              0.12034704 = queryWeight, product of:
                1.2848333 = boost
                4.8493733 = idf(docFreq=906, maxDocs=42596)
                0.019315368 = queryNorm
              0.45462877 = fieldWeight in 3838, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8493733 = idf(docFreq=906, maxDocs=42596)
                0.09375 = fieldNorm(doc=3838)
          0.20044892 = weight(abstract_txt:metadata in 3838) [ClassicSimilarity], result of:
            0.20044892 = score(doc=3838,freq=2.0), product of:
              0.30809146 = queryWeight, product of:
                3.250416 = boost
                4.907245 = idf(docFreq=855, maxDocs=42596)
                0.019315368 = queryNorm
              0.650615 = fieldWeight in 3838, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.907245 = idf(docFreq=855, maxDocs=42596)
                0.09375 = fieldNorm(doc=3838)
        0.16 = coord(4/25)
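
Note on reading the score trees
The explanation trees above follow the Lucene ClassicSimilarity (TF-IDF) pattern: each term score is queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), with tf = sqrt(termFreq), and the sum over matching terms is scaled by coord (matched terms / query terms). The sketch below recomputes the score of the first entry (doc 5749) from the raw values printed in its tree. The idf formula 1 + ln(maxDocs / (docFreq + 1)) is assumed from the ClassicSimilarity definition rather than stated in the listing, and the function and constant names are illustrative only.

```python
import math

# Values copied from the explanation tree for doc 5749.
MAX_DOCS = 42596
QUERY_NORM = 0.019315368
QUERY_TERMS = 25        # denominator of coord(5/25)
FIELD_NORM = 0.078125   # fieldNorm(doc=5749)

# (term, termFreq, docFreq, boost)
TERMS = [
    ("design", 1.0, 2228, 1.0466008),
    ("university", 1.0, 1648, 1.126452),
    ("digital", 3.0, 1439, 1.1623595),
    ("repurposing", 2.0, 9, 1.2395515),
    ("metadata", 12.0, 855, 3.250416),
]


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency (assumed ClassicSimilarity formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """queryWeight * fieldWeight for a single term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


raw_sum = sum(term_score(f, df, b, FIELD_NORM) for _, f, df, b in TERMS)
coord = len(TERMS) / QUERY_TERMS
total = raw_sum * coord
print(f"sum={raw_sum:.6f} coord={coord} total={total:.6f}")
```

Up to floating-point rounding (Lucene computes in single precision), the result should agree with the 0.15092161 shown for entry 1. The same function applies to the other entries once their fieldNorm, term frequencies, and coord values are substituted.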