Document (#40372)

Author
Mayo, D.
Bowers, K.
Title
¬The devil's shoehorn : a case study of EAD to ArchivesSpace migration at a large university
Source
Code4Lib journal. Issue 35(2017), [http://journal.code4lib.org]
Year
2017
Abstract
A band of archivists and IT professionals at Harvard took on a project to convert nearly two million descriptions of archival collection components from marked-up text into the ArchivesSpace archival metadata management system. Starting in the mid-1990s, Harvard was an alpha implementer of EAD, an SGML (later XML) text markup language for electronic inventories, indexes, and finding aids that archivists use to wend their way through the sometimes quirky filing systems that bureaucracies establish for their records or the utter chaos in which some individuals keep their personal archives. These pathfinder documents, designed to cope with messy reality, can themselves be difficult to classify. Portions of them are rigorously structured, while other parts are narrative. Early documents predate the establishment of the standard; many feature idiosyncratic encoding that had been through several machine conversions, while others were freshly encoded and fairly consistent. In this paper, we will cover the practical and technical challenges involved in preparing a large (900MiB) corpus of XML for ingest into an open-source archival information system (ArchivesSpace). This case study will give an overview of the project, discuss problem discovery and problem solving, address the technical challenges, analysis, solutions, and decisions, and provide information on the tools produced and lessons learned. The authors of this piece are Kate Bowers, Collections Services Archivist for Metadata, Systems, and Standards at the Harvard University Archive, and Dave Mayo, a Digital Library Software Engineer for Harvard's Library and Technology Services. Kate was heavily involved in both metadata analysis and later problem solving, while Dave was the sole full-time developer assigned to the migration project.
Content
Cf.: http://journal.code4lib.org/articles/12239.
Theme
Descriptive cataloging
Markup languages
Object
EAD
Area
Archives

Similar documents (content)

  1. Carini, P.; Shepherd, K.: ¬The MARC standard and encoded archival description (2004) 0.20
    0.19715194 = sum of:
      0.19715194 = product of:
        0.82146645 = sum of:
          0.05694568 = weight(abstract_txt:case in 3828) [ClassicSimilarity], result of:
            0.05694568 = score(doc=3828,freq=1.0), product of:
              0.1080393 = queryWeight, product of:
                1.115871 = boost
                4.819045 = idf(docFreq=955, maxDocs=43556)
                0.020091243 = queryNorm
              0.52708304 = fieldWeight in 3828, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.819045 = idf(docFreq=955, maxDocs=43556)
                0.109375 = fieldNorm(doc=3828)
          0.07149257 = weight(abstract_txt:challenges in 3828) [ClassicSimilarity], result of:
            0.07149257 = score(doc=3828,freq=1.0), product of:
              0.12573276 = queryWeight, product of:
                1.2037805 = boost
                5.198695 = idf(docFreq=653, maxDocs=43556)
                0.020091243 = queryNorm
              0.5686073 = fieldWeight in 3828, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.198695 = idf(docFreq=653, maxDocs=43556)
                0.109375 = fieldNorm(doc=3828)
          0.064461194 = weight(abstract_txt:project in 3828) [ClassicSimilarity], result of:
            0.064461194 = score(doc=3828,freq=1.0), product of:
              0.13432923 = queryWeight, product of:
                1.5238912 = boost
                4.3874254 = idf(docFreq=1471, maxDocs=43556)
                0.020091243 = queryNorm
              0.47987467 = fieldWeight in 3828, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3874254 = idf(docFreq=1471, maxDocs=43556)
                0.109375 = fieldNorm(doc=3828)
          0.19895262 = weight(abstract_txt:archivists in 3828) [ClassicSimilarity], result of:
            0.19895262 = score(doc=3828,freq=1.0), product of:
              0.24875644 = queryWeight, product of:
                1.6932077 = boost
                7.312355 = idf(docFreq=78, maxDocs=43556)
                0.020091243 = queryNorm
              0.79978883 = fieldWeight in 3828, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.312355 = idf(docFreq=78, maxDocs=43556)
                0.109375 = fieldNorm(doc=3828)
          0.089712396 = weight(abstract_txt:metadata in 3828) [ClassicSimilarity], result of:
            0.089712396 = score(doc=3828,freq=1.0), product of:
              0.1674454 = queryWeight, product of:
                1.7013958 = boost
                4.8984776 = idf(docFreq=882, maxDocs=43556)
                0.020091243 = queryNorm
              0.535771 = fieldWeight in 3828, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8984776 = idf(docFreq=882, maxDocs=43556)
                0.109375 = fieldNorm(doc=3828)
          0.339902 = weight(abstract_txt:archival in 3828) [ClassicSimilarity], result of:
            0.339902 = score(doc=3828,freq=3.0), product of:
              0.2821632 = queryWeight, product of:
                2.2086093 = boost
                6.358793 = idf(docFreq=204, maxDocs=43556)
                0.020091243 = queryNorm
              1.2046292 = fieldWeight in 3828, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.358793 = idf(docFreq=204, maxDocs=43556)
                0.109375 = fieldNorm(doc=3828)
        0.24 = coord(6/25)
    
  2. Carpenter, K.E.: End of the war between print and electronics (1996) 0.13
    0.1342498 = sum of:
      0.1342498 = product of:
        0.8390613 = sum of:
          0.21707104 = weight(abstract_txt:inventories in 1597) [ClassicSimilarity], result of:
            0.21707104 = score(doc=1597,freq=1.0), product of:
              0.20925017 = queryWeight, product of:
                1.0980977 = boost
                9.484578 = idf(docFreq=8, maxDocs=43556)
                0.020091243 = queryNorm
              1.0373757 = fieldWeight in 1597, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.484578 = idf(docFreq=8, maxDocs=43556)
                0.109375 = fieldNorm(doc=1597)
          0.024481587 = weight(abstract_txt:their in 1597) [ClassicSimilarity], result of:
            0.024481587 = score(doc=1597,freq=1.0), product of:
              0.070447356 = queryWeight, product of:
                1.103573 = boost
                3.1772897 = idf(docFreq=4936, maxDocs=43556)
                0.020091243 = queryNorm
              0.34751606 = fieldWeight in 1597, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1772897 = idf(docFreq=4936, maxDocs=43556)
                0.109375 = fieldNorm(doc=1597)
          0.19624253 = weight(abstract_txt:archival in 1597) [ClassicSimilarity], result of:
            0.19624253 = score(doc=1597,freq=1.0), product of:
              0.2821632 = queryWeight, product of:
                2.2086093 = boost
                6.358793 = idf(docFreq=204, maxDocs=43556)
                0.020091243 = queryNorm
              0.695493 = fieldWeight in 1597, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.358793 = idf(docFreq=204, maxDocs=43556)
                0.109375 = fieldNorm(doc=1597)
          0.40126616 = weight(abstract_txt:harvard in 1597) [ClassicSimilarity], result of:
            0.40126616 = score(doc=1597,freq=1.0), product of:
              0.45456222 = queryWeight, product of:
                2.803273 = boost
                8.070885 = idf(docFreq=36, maxDocs=43556)
                0.020091243 = queryNorm
              0.882753 = fieldWeight in 1597, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.070885 = idf(docFreq=36, maxDocs=43556)
                0.109375 = fieldNorm(doc=1597)
        0.16 = coord(4/25)
    
  3. Hedstrom, M.: Descriptive practices for electronic records : deciding what is essential and imagining what is possible (1993) 0.12
    0.12102324 = sum of:
      0.12102324 = product of:
        0.6051162 = sum of:
          0.06127934 = weight(abstract_txt:challenges in 329) [ClassicSimilarity], result of:
            0.06127934 = score(doc=329,freq=1.0), product of:
              0.12573276 = queryWeight, product of:
                1.2037805 = boost
                5.198695 = idf(docFreq=653, maxDocs=43556)
                0.020091243 = queryNorm
              0.48737767 = fieldWeight in 329, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.198695 = idf(docFreq=653, maxDocs=43556)
                0.09375 = fieldNorm(doc=329)
          0.058527835 = weight(abstract_txt:while in 329) [ClassicSimilarity], result of:
            0.058527835 = score(doc=329,freq=1.0), product of:
              0.13958684 = queryWeight, product of:
                1.5534273 = boost
                4.4724627 = idf(docFreq=1351, maxDocs=43556)
                0.020091243 = queryNorm
              0.41929337 = fieldWeight in 329, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4724627 = idf(docFreq=1351, maxDocs=43556)
                0.09375 = fieldNorm(doc=329)
          0.17053083 = weight(abstract_txt:archivists in 329) [ClassicSimilarity], result of:
            0.17053083 = score(doc=329,freq=1.0), product of:
              0.24875644 = queryWeight, product of:
                1.6932077 = boost
                7.312355 = idf(docFreq=78, maxDocs=43556)
                0.020091243 = queryNorm
              0.6855333 = fieldWeight in 329, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.312355 = idf(docFreq=78, maxDocs=43556)
                0.09375 = fieldNorm(doc=329)
          0.07689633 = weight(abstract_txt:metadata in 329) [ClassicSimilarity], result of:
            0.07689633 = score(doc=329,freq=1.0), product of:
              0.1674454 = queryWeight, product of:
                1.7013958 = boost
                4.8984776 = idf(docFreq=882, maxDocs=43556)
                0.020091243 = queryNorm
              0.45923227 = fieldWeight in 329, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8984776 = idf(docFreq=882, maxDocs=43556)
                0.09375 = fieldNorm(doc=329)
          0.23788185 = weight(abstract_txt:archival in 329) [ClassicSimilarity], result of:
            0.23788185 = score(doc=329,freq=2.0), product of:
              0.2821632 = queryWeight, product of:
                2.2086093 = boost
                6.358793 = idf(docFreq=204, maxDocs=43556)
                0.020091243 = queryNorm
              0.8430648 = fieldWeight in 329, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.358793 = idf(docFreq=204, maxDocs=43556)
                0.09375 = fieldNorm(doc=329)
        0.2 = coord(5/25)
    
  4. Gracy, K.F.: Enriching and enhancing moving images with Linked Data : an exploration in the alignment of metadata models (2018) 0.11
    0.109218135 = sum of:
      0.109218135 = product of:
        0.45507556 = sum of:
          0.010492109 = weight(abstract_txt:their in 486) [ClassicSimilarity], result of:
            0.010492109 = score(doc=486,freq=1.0), product of:
              0.070447356 = queryWeight, product of:
                1.103573 = boost
                3.1772897 = idf(docFreq=4936, maxDocs=43556)
                0.020091243 = queryNorm
              0.14893545 = fieldWeight in 486, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1772897 = idf(docFreq=4936, maxDocs=43556)
                0.046875 = fieldNorm(doc=486)
          0.03063967 = weight(abstract_txt:challenges in 486) [ClassicSimilarity], result of:
            0.03063967 = score(doc=486,freq=1.0), product of:
              0.12573276 = queryWeight, product of:
                1.2037805 = boost
                5.198695 = idf(docFreq=653, maxDocs=43556)
                0.020091243 = queryNorm
              0.24368884 = fieldWeight in 486, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.198695 = idf(docFreq=653, maxDocs=43556)
                0.046875 = fieldNorm(doc=486)
          0.029263917 = weight(abstract_txt:while in 486) [ClassicSimilarity], result of:
            0.029263917 = score(doc=486,freq=1.0), product of:
              0.13958684 = queryWeight, product of:
                1.5534273 = boost
                4.4724627 = idf(docFreq=1351, maxDocs=43556)
                0.020091243 = queryNorm
              0.20964669 = fieldWeight in 486, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4724627 = idf(docFreq=1351, maxDocs=43556)
                0.046875 = fieldNorm(doc=486)
          0.08526541 = weight(abstract_txt:archivists in 486) [ClassicSimilarity], result of:
            0.08526541 = score(doc=486,freq=1.0), product of:
              0.24875644 = queryWeight, product of:
                1.6932077 = boost
                7.312355 = idf(docFreq=78, maxDocs=43556)
                0.020091243 = queryNorm
              0.34276664 = fieldWeight in 486, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.312355 = idf(docFreq=78, maxDocs=43556)
                0.046875 = fieldNorm(doc=486)
          0.07689633 = weight(abstract_txt:metadata in 486) [ClassicSimilarity], result of:
            0.07689633 = score(doc=486,freq=4.0), product of:
              0.1674454 = queryWeight, product of:
                1.7013958 = boost
                4.8984776 = idf(docFreq=882, maxDocs=43556)
                0.020091243 = queryNorm
              0.45923227 = fieldWeight in 486, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.8984776 = idf(docFreq=882, maxDocs=43556)
                0.046875 = fieldNorm(doc=486)
          0.2225181 = weight(abstract_txt:archival in 486) [ClassicSimilarity], result of:
            0.2225181 = score(doc=486,freq=7.0), product of:
              0.2821632 = queryWeight, product of:
                2.2086093 = boost
                6.358793 = idf(docFreq=204, maxDocs=43556)
                0.020091243 = queryNorm
              0.78861487 = fieldWeight in 486, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.358793 = idf(docFreq=204, maxDocs=43556)
                0.046875 = fieldNorm(doc=486)
        0.24 = coord(6/25)
    
  5. Trace, C.B.; Francisco-Revilla, L.: ¬The value and complexity of collection arrangement for evidentiary work (2015) 0.10
    0.10180712 = sum of:
      0.10180712 = product of:
        0.5090356 = sum of:
          0.01978411 = weight(abstract_txt:their in 4162) [ClassicSimilarity], result of:
            0.01978411 = score(doc=4162,freq=2.0), product of:
              0.070447356 = queryWeight, product of:
                1.103573 = boost
                3.1772897 = idf(docFreq=4936, maxDocs=43556)
                0.020091243 = queryNorm
              0.2808354 = fieldWeight in 4162, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1772897 = idf(docFreq=4936, maxDocs=43556)
                0.0625 = fieldNorm(doc=4162)
          0.040888987 = weight(abstract_txt:involved in 4162) [ClassicSimilarity], result of:
            0.040888987 = score(doc=4162,freq=1.0), product of:
              0.12580681 = queryWeight, product of:
                1.204135 = boost
                5.200226 = idf(docFreq=652, maxDocs=43556)
                0.020091243 = queryNorm
              0.3250141 = fieldWeight in 4162, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.200226 = idf(docFreq=652, maxDocs=43556)
                0.0625 = fieldNorm(doc=4162)
          0.036834966 = weight(abstract_txt:project in 4162) [ClassicSimilarity], result of:
            0.036834966 = score(doc=4162,freq=1.0), product of:
              0.13432923 = queryWeight, product of:
                1.5238912 = boost
                4.3874254 = idf(docFreq=1471, maxDocs=43556)
                0.020091243 = queryNorm
              0.2742141 = fieldWeight in 4162, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3874254 = idf(docFreq=1471, maxDocs=43556)
                0.0625 = fieldNorm(doc=4162)
          0.16077799 = weight(abstract_txt:archivists in 4162) [ClassicSimilarity], result of:
            0.16077799 = score(doc=4162,freq=2.0), product of:
              0.24875644 = queryWeight, product of:
                1.6932077 = boost
                7.312355 = idf(docFreq=78, maxDocs=43556)
                0.020091243 = queryNorm
              0.64632696 = fieldWeight in 4162, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.312355 = idf(docFreq=78, maxDocs=43556)
                0.0625 = fieldNorm(doc=4162)
          0.2507495 = weight(abstract_txt:archival in 4162) [ClassicSimilarity], result of:
            0.2507495 = score(doc=4162,freq=5.0), product of:
              0.2821632 = queryWeight, product of:
                2.2086093 = boost
                6.358793 = idf(docFreq=204, maxDocs=43556)
                0.020091243 = queryNorm
              0.8886683 = fieldWeight in 4162, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.358793 = idf(docFreq=204, maxDocs=43556)
                0.0625 = fieldNorm(doc=4162)
        0.2 = coord(5/25)
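
Note on reading the score breakdowns: the trees above are consistent with classic Lucene TF-IDF scoring, where tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), each term contributes queryWeight * fieldWeight, and the sum is scaled by the coord factor (matched terms / query terms). The sketch below recomputes the score of the first hit (doc=3828) from the boost, docFreq, freq, fieldNorm, and queryNorm values listed in its tree. The function and variable names are illustrative assumptions, not part of the catalog software, and the idf formula is inferred from the listed numbers rather than confirmed for this installation.

```python
import math

# Constants copied from the explain output above.
MAX_DOCS = 43556
QUERY_NORM = 0.020091243


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency (assumed form: 1 + ln(maxDocs / (docFreq + 1)))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, boost, doc_freq, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, query_terms):
    """Sum of the term scores, scaled by coord = matched / query_terms."""
    total = sum(term_score(f, b, df, field_norm) for f, b, df in terms)
    return total * matched / query_terms


# (freq, boost, docFreq) for the six terms matched in doc=3828.
TERMS_3828 = [
    (1.0, 1.115871, 955),    # case
    (1.0, 1.2037805, 653),   # challenges
    (1.0, 1.5238912, 1471),  # project
    (1.0, 1.6932077, 78),    # archivists
    (1.0, 1.7013958, 882),   # metadata
    (3.0, 2.2086093, 204),   # archival
]

score_3828 = document_score(TERMS_3828, field_norm=0.109375, matched=6, query_terms=25)
print(f"{score_3828:.6f}")
```

The printed value should agree with the 0.19715194 reported for the first hit, up to small floating-point differences. The same functions can be applied to the other hits by substituting their term lists, fieldNorm, and coord values.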