Document (#42445)

Author
Hodges, D.W.
Schlottmann, K.
Title
better archival migration outcomes with Python and the Google Sheets API : Reporting from the archives
Source
Code4Lib journal. Issue 46(2019), [http://journal.code4lib.org]
Year
2019
Abstract
Columbia University Libraries recently embarked on a multi-phase project to migrate nearly 4,000 records describing over 70,000 linear feet of archival material from disparate sources and formats into ArchivesSpace. This paper discusses tools and methods brought to bear in Phase 2 of this project, which required us to look closely at how to integrate a large number of legacy finding aids into the new system and merge descriptive data that had diverged in myriad ways. Using Python, XSLT, and a widely available if underappreciated resource (the Google Sheets API), archival and technical library staff devised ways to efficiently report data from different sources and present it in an accessible, user-friendly way. Responses were then fed back into automated data remediation processes to keep the migration project on track and minimize manual intervention. The scripts and processes developed proved very effective and, moreover, show promise well beyond the ArchivesSpace migration. This paper describes the Python/XSLT/Sheets API processes developed and how they opened a path to move beyond CSV-based reporting with flexible, ad-hoc data interfaces easily adaptable to meet a variety of purposes.
Content
Cf.: https://journal.code4lib.org/articles/14871.
Theme
Metadata
Object
Google Sheets API
Python

Similar documents (author)

  1. Hodges, K.L.: Chronological order (1975) 5.71
    5.7074614 = sum of:
      5.7074614 = weight(author_txt:hodges in 7345) [ClassicSimilarity], result of:
        5.7074614 = fieldWeight in 7345, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.625 = fieldNorm(doc=7345)
    
  2. Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 5.71
    5.7074614 = sum of:
      5.7074614 = weight(author_txt:hodges in 5001) [ClassicSimilarity], result of:
        5.7074614 = fieldWeight in 5001, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.625 = fieldNorm(doc=5001)
    
  3. Hodges, J.E.: Automated systems for the generation of document indexes (2000) 5.71
    5.7074614 = sum of:
      5.7074614 = weight(author_txt:hodges in 4668) [ClassicSimilarity], result of:
        5.7074614 = fieldWeight in 4668, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.625 = fieldNorm(doc=4668)
    
  4. Hodges, A.: ¬Der Mann hinter der Maschine (2012) 5.71
    5.7074614 = sum of:
      5.7074614 = weight(author_txt:hodges in 157) [ClassicSimilarity], result of:
        5.7074614 = fieldWeight in 157, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.625 = fieldNorm(doc=157)
    
  5. Hodges, J.A.: Forensically reconstructing biomedical maintenance labor : PDF metadata under the epistemic conditions of COVID-19 (2021) 5.71
    5.7074614 = sum of:
      5.7074614 = weight(author_txt:hodges in 388) [ClassicSimilarity], result of:
        5.7074614 = fieldWeight in 388, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.131938 = idf(docFreq=12, maxDocs=44218)
          0.625 = fieldNorm(doc=388)
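
The author-match scores above all reduce to the same arithmetic. The sketch below recomputes one of them, assuming the classic TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); these formulas are not stated in the record itself but reproduce the printed idf value. The function names are illustrative only.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed classic formula; it reproduces idf(docFreq=12, maxDocs=44218).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Term-frequency factor: square root of the raw term frequency.
    return math.sqrt(freq)

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain trees above.
    return tf(freq) * idf(doc_freq, max_docs) * field_norm

# Values copied from the author_txt:hodges explanations.
score = field_weight(freq=1.0, doc_freq=12, max_docs=44218, field_norm=0.625)
print(round(score, 4))
```

Because every author match has the same term frequency, document frequency, and field norm, all five entries receive the same score of approximately 5.7075.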
    

Similar documents (content)

  1. Sinn, D.; Soares, N.: Historians' use of digital archival collections : the web, historical scholarship, and archival research (2014) 0.10
    0.09972618 = sum of:
      0.09972618 = product of:
        0.35616493 = sum of:
          0.010427406 = weight(abstract_txt:from in 1349) [ClassicSimilarity], result of:
            0.010427406 = score(doc=1349,freq=2.0), product of:
              0.04268366 = queryWeight, product of:
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.01544337 = queryNorm
              0.24429502 = fieldWeight in 1349, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=1349)
          0.017197175 = weight(abstract_txt:developed in 1349) [ClassicSimilarity], result of:
            0.017197175 = score(doc=1349,freq=1.0), product of:
              0.06557855 = queryWeight, product of:
                1.0120558 = boost
                4.195805 = idf(docFreq=1809, maxDocs=44218)
                0.01544337 = queryNorm
              0.26223782 = fieldWeight in 1349, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.195805 = idf(docFreq=1809, maxDocs=44218)
                0.0625 = fieldNorm(doc=1349)
          0.035198916 = weight(abstract_txt:sources in 1349) [ClassicSimilarity], result of:
            0.035198916 = score(doc=1349,freq=2.0), product of:
              0.083907336 = queryWeight, product of:
                1.1447839 = boost
                4.7460723 = idf(docFreq=1043, maxDocs=44218)
                0.01544337 = queryNorm
              0.4194975 = fieldWeight in 1349, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7460723 = idf(docFreq=1043, maxDocs=44218)
                0.0625 = fieldNorm(doc=1349)
          0.017731233 = weight(abstract_txt:into in 1349) [ClassicSimilarity], result of:
            0.017731233 = score(doc=1349,freq=1.0), product of:
              0.07661493 = queryWeight, product of:
                1.3397565 = boost
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.01544337 = queryNorm
              0.23143311 = fieldWeight in 1349, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.0625 = fieldNorm(doc=1349)
          0.029311178 = weight(abstract_txt:project in 1349) [ClassicSimilarity], result of:
            0.029311178 = score(doc=1349,freq=1.0), product of:
              0.1071132 = queryWeight, product of:
                1.5841295 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.01544337 = queryNorm
              0.27364674 = fieldWeight in 1349, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.0625 = fieldNorm(doc=1349)
          0.047269672 = weight(abstract_txt:processes in 1349) [ClassicSimilarity], result of:
            0.047269672 = score(doc=1349,freq=1.0), product of:
              0.1473022 = queryWeight, product of:
                1.8576922 = boost
                5.1344433 = idf(docFreq=707, maxDocs=44218)
                0.01544337 = queryNorm
              0.3209027 = fieldWeight in 1349, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1344433 = idf(docFreq=707, maxDocs=44218)
                0.0625 = fieldNorm(doc=1349)
          0.19902937 = weight(abstract_txt:archival in 1349) [ClassicSimilarity], result of:
            0.19902937 = score(doc=1349,freq=5.0), product of:
              0.22461681 = queryWeight, product of:
                2.2939835 = boost
                6.340301 = idf(docFreq=211, maxDocs=44218)
                0.01544337 = queryNorm
              0.886084 = fieldWeight in 1349, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.340301 = idf(docFreq=211, maxDocs=44218)
                0.0625 = fieldNorm(doc=1349)
        0.28 = coord(7/25)
    
  2. Godfrey, B.; Johnson, J.: ¬The geospatial metadata manager's toolbox : three techniques for maintaining records (2015) 0.08
    0.084148735 = sum of:
      0.084148735 = product of:
        0.70123947 = sum of:
          0.025795765 = weight(abstract_txt:developed in 2275) [ClassicSimilarity], result of:
            0.025795765 = score(doc=2275,freq=1.0), product of:
              0.06557855 = queryWeight, product of:
                1.0120558 = boost
                4.195805 = idf(docFreq=1809, maxDocs=44218)
                0.01544337 = queryNorm
              0.39335674 = fieldWeight in 2275, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.195805 = idf(docFreq=1809, maxDocs=44218)
                0.09375 = fieldNorm(doc=2275)
          0.26594472 = weight(abstract_txt:xslt in 2275) [ClassicSimilarity], result of:
            0.26594472 = score(doc=2275,freq=1.0), product of:
              0.31063983 = queryWeight, product of:
                2.2026834 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.01544337 = queryNorm
              0.85611916 = fieldWeight in 2275, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.09375 = fieldNorm(doc=2275)
          0.409499 = weight(abstract_txt:python in 2275) [ClassicSimilarity], result of:
            0.409499 = score(doc=2275,freq=1.0), product of:
              0.47416395 = queryWeight, product of:
                3.3329856 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.01544337 = queryNorm
              0.8636232 = fieldWeight in 2275, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.09375 = fieldNorm(doc=2275)
        0.12 = coord(3/25)
    
  3. Suranofsky, M.; McColl, L.: a Google sheets add-on that uses the WorldCat search API : MatchMarc (2019) 0.08
    0.08388994 = sum of:
      0.08388994 = product of:
        0.52431214 = sum of:
          0.009216611 = weight(abstract_txt:from in 5442) [ClassicSimilarity], result of:
            0.009216611 = score(doc=5442,freq=1.0), product of:
              0.04268366 = queryWeight, product of:
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.01544337 = queryNorm
              0.21592833 = fieldWeight in 5442, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=5442)
          0.0304006 = weight(abstract_txt:developed in 5442) [ClassicSimilarity], result of:
            0.0304006 = score(doc=5442,freq=2.0), product of:
              0.06557855 = queryWeight, product of:
                1.0120558 = boost
                4.195805 = idf(docFreq=1809, maxDocs=44218)
                0.01544337 = queryNorm
              0.46357536 = fieldWeight in 5442, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.195805 = idf(docFreq=1809, maxDocs=44218)
                0.078125 = fieldNorm(doc=5442)
          0.078009956 = weight(abstract_txt:google in 5442) [ClassicSimilarity], result of:
            0.078009956 = score(doc=5442,freq=3.0), product of:
              0.10737669 = queryWeight, product of:
                1.2950262 = boost
                5.3689504 = idf(docFreq=559, maxDocs=44218)
                0.01544337 = queryNorm
              0.72650737 = fieldWeight in 5442, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.3689504 = idf(docFreq=559, maxDocs=44218)
                0.078125 = fieldNorm(doc=5442)
          0.406685 = weight(abstract_txt:sheets in 5442) [ClassicSimilarity], result of:
            0.406685 = score(doc=5442,freq=2.0), product of:
              0.42303494 = queryWeight, product of:
                3.1481636 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.01544337 = queryNorm
              0.96135086 = fieldWeight in 5442, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.078125 = fieldNorm(doc=5442)
        0.16 = coord(4/25)
    
  4. Chen, S.-J.: Semantic enrichment of linked archival materials (2019) 0.08
    0.082255125 = sum of:
      0.082255125 = product of:
        0.3427297 = sum of:
          0.010427406 = weight(abstract_txt:from in 5488) [ClassicSimilarity], result of:
            0.010427406 = score(doc=5488,freq=2.0), product of:
              0.04268366 = queryWeight, product of:
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.01544337 = queryNorm
              0.24429502 = fieldWeight in 5488, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=5488)
          0.035198916 = weight(abstract_txt:sources in 5488) [ClassicSimilarity], result of:
            0.035198916 = score(doc=5488,freq=2.0), product of:
              0.083907336 = queryWeight, product of:
                1.1447839 = boost
                4.7460723 = idf(docFreq=1043, maxDocs=44218)
                0.01544337 = queryNorm
              0.4194975 = fieldWeight in 5488, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7460723 = idf(docFreq=1043, maxDocs=44218)
                0.0625 = fieldNorm(doc=5488)
          0.017731233 = weight(abstract_txt:into in 5488) [ClassicSimilarity], result of:
            0.017731233 = score(doc=5488,freq=1.0), product of:
              0.07661493 = queryWeight, product of:
                1.3397565 = boost
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.01544337 = queryNorm
              0.23143311 = fieldWeight in 5488, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.0625 = fieldNorm(doc=5488)
          0.041452263 = weight(abstract_txt:project in 5488) [ClassicSimilarity], result of:
            0.041452263 = score(doc=5488,freq=2.0), product of:
              0.1071132 = queryWeight, product of:
                1.5841295 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.01544337 = queryNorm
              0.38699493 = fieldWeight in 5488, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.0625 = fieldNorm(doc=5488)
          0.059902616 = weight(abstract_txt:data in 5488) [ClassicSimilarity], result of:
            0.059902616 = score(doc=5488,freq=12.0), product of:
              0.08292851 = queryWeight, product of:
                1.609498 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.01544337 = queryNorm
              0.72234046 = fieldWeight in 5488, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=5488)
          0.17801727 = weight(abstract_txt:archival in 5488) [ClassicSimilarity], result of:
            0.17801727 = score(doc=5488,freq=4.0), product of:
              0.22461681 = queryWeight, product of:
                2.2939835 = boost
                6.340301 = idf(docFreq=211, maxDocs=44218)
                0.01544337 = queryNorm
              0.7925376 = fieldWeight in 5488, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.340301 = idf(docFreq=211, maxDocs=44218)
                0.0625 = fieldNorm(doc=5488)
        0.24 = coord(6/25)
    
  5. Alexander, F.; Heather, A.: Transformation of a legacy UDC-based classification system : exploiting and remodelling semantic relationships (2011) 0.08
    0.08028763 = sum of:
      0.08028763 = product of:
        0.28674152 = sum of:
          0.007373289 = weight(abstract_txt:from in 4829) [ClassicSimilarity], result of:
            0.007373289 = score(doc=4829,freq=1.0), product of:
              0.04268366 = queryWeight, product of:
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.01544337 = queryNorm
              0.17274266 = fieldWeight in 4829, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=4829)
          0.025087047 = weight(abstract_txt:ways in 4829) [ClassicSimilarity], result of:
            0.025087047 = score(doc=4829,freq=1.0), product of:
              0.08435097 = queryWeight, product of:
                1.1478063 = boost
                4.7586026 = idf(docFreq=1030, maxDocs=44218)
                0.01544337 = queryNorm
              0.29741266 = fieldWeight in 4829, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7586026 = idf(docFreq=1030, maxDocs=44218)
                0.0625 = fieldNorm(doc=4829)
          0.017731233 = weight(abstract_txt:into in 4829) [ClassicSimilarity], result of:
            0.017731233 = score(doc=4829,freq=1.0), product of:
              0.07661493 = queryWeight, product of:
                1.3397565 = boost
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.01544337 = queryNorm
              0.23143311 = fieldWeight in 4829, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7029297 = idf(docFreq=2962, maxDocs=44218)
                0.0625 = fieldNorm(doc=4829)
          0.029311178 = weight(abstract_txt:project in 4829) [ClassicSimilarity], result of:
            0.029311178 = score(doc=4829,freq=1.0), product of:
              0.1071132 = queryWeight, product of:
                1.5841295 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.01544337 = queryNorm
              0.27364674 = fieldWeight in 4829, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.0625 = fieldNorm(doc=4829)
          0.017292397 = weight(abstract_txt:data in 4829) [ClassicSimilarity], result of:
            0.017292397 = score(doc=4829,freq=1.0), product of:
              0.08292851 = queryWeight, product of:
                1.609498 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.01544337 = queryNorm
              0.20852174 = fieldWeight in 4829, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=4829)
          0.047269672 = weight(abstract_txt:processes in 4829) [ClassicSimilarity], result of:
            0.047269672 = score(doc=4829,freq=1.0), product of:
              0.1473022 = queryWeight, product of:
                1.8576922 = boost
                5.1344433 = idf(docFreq=707, maxDocs=44218)
                0.01544337 = queryNorm
              0.3209027 = fieldWeight in 4829, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1344433 = idf(docFreq=707, maxDocs=44218)
                0.0625 = fieldNorm(doc=4829)
          0.14267671 = weight(abstract_txt:migration in 4829) [ClassicSimilarity], result of:
            0.14267671 = score(doc=4829,freq=1.0), product of:
              0.3076495 = queryWeight, product of:
                2.6847093 = boost
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.01544337 = queryNorm
              0.46376383 = fieldWeight in 4829, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.0625 = fieldNorm(doc=4829)
        0.28 = coord(7/25)
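
The content-match scores combine a query weight, a field weight, and a coordination factor. As a rough check, the sketch below recomputes the total for the second content match (doc=2275), using the boosts, document frequencies, and field norm printed in its explanation. The idf and tf formulas are assumed classic TF-IDF definitions, and the helper name `clause_score` is hypothetical.

```python
import math

QUERY_NORM = 0.01544337   # queryNorm as printed in the explanations
MAX_DOCS = 44218          # maxDocs as printed in the explanations
QUERY_CLAUSES = 25        # denominator of the coord(n/25) factor

def clause_score(freq, doc_freq, field_norm, boost=1.0):
    # Assumed classic definitions: idf = 1 + ln(maxDocs / (docFreq + 1)),
    # tf = sqrt(freq).
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

# Matching terms for doc=2275: developed, xslt, python.
clauses = [
    clause_score(freq=1.0, doc_freq=1809, field_norm=0.09375, boost=1.0120558),
    clause_score(freq=1.0, doc_freq=12, field_norm=0.09375, boost=2.2026834),
    clause_score(freq=1.0, doc_freq=11, field_norm=0.09375, boost=3.3329856),
]

# coord rewards documents that match more of the query's clauses.
coord = len(clauses) / QUERY_CLAUSES
total = coord * sum(clauses)
print(round(total, 4))
```

The result agrees with the listed total of about 0.0841 up to floating-point rounding. The rare terms "xslt" and "python" contribute most of the sum, while the low coord factor (3 of 25 clauses matched) keeps the overall score small.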