Document (#39279)

Author
Harlow, C.
Title
Data munging tools in Preparation for RDF : Catmandu and LODRefine
Source
Code4Lib journal. Issue 30(2015), [http://journal.code4lib.org]
Year
2015
Abstract
Data munging, or the work of remediating, enhancing and transforming library datasets for new or improved uses, has become more important and staff-inclusive in many library technology discussions and projects. Many times we know how we want our data to look, as well as how we want our data to act in discovery interfaces or when exposed, but we are uncertain how to make the data we have into the data we want. This article introduces and compares two library data munging tools that can help: LODRefine (OpenRefine with the DERI RDF Extension) and Catmandu. The strengths and best practices of each tool are discussed in the context of metadata munging use cases for an institution's metadata migration workflow. There is a focus on Linked Open Data modeling and transformation applications of each tool, in particular how metadataists, catalogers, and programmers can create metadata quality reports, enhance existing data with LOD sets, and transform that data to an RDF model. Integration of these tools with other systems and projects, the use of domain-specific transformation languages, and the expansion of vocabulary reconciliation services are mentioned.
Content
Cf.: http://journal.code4lib.org/articles/11013.
Theme
Descriptive cataloging
Semantic Web
Object
Catmandu
LODRefine
RDF

Similar documents (content)

  1. Hooland, S. van; Verborgh, R.; Wilde, M. De; Hercher, J.; Mannens, E.; Walle, R. Van de: Evaluating the success of vocabulary reconciliation for cultural heritage collections (2013) 0.32
    0.3158205 = sum of:
      0.3158205 = product of:
        0.9869391 = sum of:
          0.011020257 = weight(abstract_txt:with in 2663) [ClassicSimilarity], result of:
            0.011020257 = score(doc=2663,freq=1.0), product of:
              0.056028776 = queryWeight, product of:
                1.115219 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.019955399 = queryNorm
              0.19668923 = fieldWeight in 2663, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.078125 = fieldNorm(doc=2663)
          0.251684 = weight(abstract_txt:reconciliation in 2663) [ClassicSimilarity], result of:
            0.251684 = score(doc=2663,freq=2.0), product of:
              0.24820088 = queryWeight, product of:
                1.3551757 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.019955399 = queryNorm
              1.0140336 = fieldWeight in 2663, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.078125 = fieldNorm(doc=2663)
          0.02227452 = weight(abstract_txt:library in 2663) [ClassicSimilarity], result of:
            0.02227452 = score(doc=2663,freq=1.0), product of:
              0.089568555 = queryWeight, product of:
                1.410043 = boost
                3.1831915 = idf(docFreq=4815, maxDocs=42740)
                0.019955399 = queryNorm
              0.24868684 = fieldWeight in 2663, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1831915 = idf(docFreq=4815, maxDocs=42740)
                0.078125 = fieldNorm(doc=2663)
          0.22141115 = weight(abstract_txt:openrefine in 2663) [ClassicSimilarity], result of:
            0.22141115 = score(doc=2663,freq=1.0), product of:
              0.2871062 = queryWeight, product of:
                1.4575224 = boost
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.019955399 = queryNorm
              0.7711821 = fieldWeight in 2663, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.078125 = fieldNorm(doc=2663)
          0.056674343 = weight(abstract_txt:tool in 2663) [ClassicSimilarity], result of:
            0.056674343 = score(doc=2663,freq=1.0), product of:
              0.14582899 = queryWeight, product of:
                1.4690307 = boost
                4.974536 = idf(docFreq=802, maxDocs=42740)
                0.019955399 = queryNorm
              0.38863564 = fieldWeight in 2663, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.974536 = idf(docFreq=802, maxDocs=42740)
                0.078125 = fieldNorm(doc=2663)
          0.12978765 = weight(abstract_txt:transformation in 2663) [ClassicSimilarity], result of:
            0.12978765 = score(doc=2663,freq=1.0), product of:
              0.25336218 = queryWeight, product of:
                1.9363321 = boost
                6.556945 = idf(docFreq=164, maxDocs=42740)
                0.019955399 = queryNorm
              0.51226133 = fieldWeight in 2663, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.556945 = idf(docFreq=164, maxDocs=42740)
                0.078125 = fieldNorm(doc=2663)
          0.1412382 = weight(abstract_txt:metadata in 2663) [ClassicSimilarity], result of:
            0.1412382 = score(doc=2663,freq=3.0), product of:
              0.21275397 = queryWeight, product of:
                2.1731687 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.019955399 = queryNorm
              0.6638569 = fieldWeight in 2663, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.078125 = fieldNorm(doc=2663)
          0.15284893 = weight(abstract_txt:data in 2663) [ClassicSimilarity], result of:
            0.15284893 = score(doc=2663,freq=3.0), product of:
              0.33499944 = queryWeight, product of:
                4.9787006 = boost
                3.3718455 = idf(docFreq=3987, maxDocs=42740)
                0.019955399 = queryNorm
              0.45626622 = fieldWeight in 2663, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3718455 = idf(docFreq=3987, maxDocs=42740)
                0.078125 = fieldNorm(doc=2663)
        0.32 = coord(8/25)
    
  2. Stephens, O.: Introduction to OpenRefine (2014) 0.22
    0.22244877 = sum of:
      0.22244877 = product of:
        0.92686987 = sum of:
          0.017632412 = weight(abstract_txt:with in 4885) [ClassicSimilarity], result of:
            0.017632412 = score(doc=4885,freq=4.0), product of:
              0.056028776 = queryWeight, product of:
                1.115219 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.019955399 = queryNorm
              0.31470278 = fieldWeight in 4885, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=4885)
          0.0254078 = weight(abstract_txt:many in 4885) [ClassicSimilarity], result of:
            0.0254078 = score(doc=4885,freq=1.0), product of:
              0.099122204 = queryWeight, product of:
                1.2111403 = boost
                4.1012487 = idf(docFreq=1922, maxDocs=42740)
                0.019955399 = queryNorm
              0.25632805 = fieldWeight in 4885, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1012487 = idf(docFreq=1922, maxDocs=42740)
                0.0625 = fieldNorm(doc=4885)
          0.30679628 = weight(abstract_txt:openrefine in 4885) [ClassicSimilarity], result of:
            0.30679628 = score(doc=4885,freq=3.0), product of:
              0.2871062 = queryWeight, product of:
                1.4575224 = boost
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.019955399 = queryNorm
              1.0685812 = fieldWeight in 4885, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.0625 = fieldNorm(doc=4885)
          0.045339473 = weight(abstract_txt:tool in 4885) [ClassicSimilarity], result of:
            0.045339473 = score(doc=4885,freq=1.0), product of:
              0.14582899 = queryWeight, product of:
                1.4690307 = boost
                4.974536 = idf(docFreq=802, maxDocs=42740)
                0.019955399 = queryNorm
              0.3109085 = fieldWeight in 4885, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.974536 = idf(docFreq=802, maxDocs=42740)
                0.0625 = fieldNorm(doc=4885)
          0.26754078 = weight(abstract_txt:want in 4885) [ClassicSimilarity], result of:
            0.26754078 = score(doc=4885,freq=3.0), product of:
              0.37795743 = queryWeight, product of:
                2.8965166 = boost
                6.5389266 = idf(docFreq=167, maxDocs=42740)
                0.019955399 = queryNorm
              0.7078596 = fieldWeight in 4885, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5389266 = idf(docFreq=167, maxDocs=42740)
                0.0625 = fieldNorm(doc=4885)
          0.26415315 = weight(abstract_txt:data in 4885) [ClassicSimilarity], result of:
            0.26415315 = score(doc=4885,freq=14.0), product of:
              0.33499944 = queryWeight, product of:
                4.9787006 = boost
                3.3718455 = idf(docFreq=3987, maxDocs=42740)
                0.019955399 = queryNorm
              0.7885182 = fieldWeight in 4885, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                3.3718455 = idf(docFreq=3987, maxDocs=42740)
                0.0625 = fieldNorm(doc=4885)
        0.24 = coord(6/25)
    
  3. Lynch, J.D.; Gibson, J.; Han, M.-J.: Analyzing and normalizing type metadata for a large aggregated digital library (2020) 0.18
    0.17689021 = sum of:
      0.17689021 = product of:
        0.73704255 = sum of:
          0.08580943 = weight(abstract_txt:enhancing in 1721) [ClassicSimilarity], result of:
            0.08580943 = score(doc=1721,freq=1.0), product of:
              0.13514876 = queryWeight, product of:
                6.7725415 = idf(docFreq=132, maxDocs=42740)
                0.019955399 = queryNorm
              0.6349258 = fieldWeight in 1721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7725415 = idf(docFreq=132, maxDocs=42740)
                0.09375 = fieldNorm(doc=1721)
          0.013224309 = weight(abstract_txt:with in 1721) [ClassicSimilarity], result of:
            0.013224309 = score(doc=1721,freq=1.0), product of:
              0.056028776 = queryWeight, product of:
                1.115219 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.019955399 = queryNorm
              0.23602709 = fieldWeight in 1721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.09375 = fieldNorm(doc=1721)
          0.026729425 = weight(abstract_txt:library in 1721) [ClassicSimilarity], result of:
            0.026729425 = score(doc=1721,freq=1.0), product of:
              0.089568555 = queryWeight, product of:
                1.410043 = boost
                3.1831915 = idf(docFreq=4815, maxDocs=42740)
                0.019955399 = queryNorm
              0.2984242 = fieldWeight in 1721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1831915 = idf(docFreq=4815, maxDocs=42740)
                0.09375 = fieldNorm(doc=1721)
          0.26569337 = weight(abstract_txt:openrefine in 1721) [ClassicSimilarity], result of:
            0.26569337 = score(doc=1721,freq=1.0), product of:
              0.2871062 = queryWeight, product of:
                1.4575224 = boost
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.019955399 = queryNorm
              0.9254185 = fieldWeight in 1721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.09375 = fieldNorm(doc=1721)
          0.23968919 = weight(abstract_txt:metadata in 1721) [ClassicSimilarity], result of:
            0.23968919 = score(doc=1721,freq=6.0), product of:
              0.21275397 = queryWeight, product of:
                2.1731687 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.019955399 = queryNorm
              1.1266026 = fieldWeight in 1721, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.09375 = fieldNorm(doc=1721)
          0.105896845 = weight(abstract_txt:data in 1721) [ClassicSimilarity], result of:
            0.105896845 = score(doc=1721,freq=1.0), product of:
              0.33499944 = queryWeight, product of:
                4.9787006 = boost
                3.3718455 = idf(docFreq=3987, maxDocs=42740)
                0.019955399 = queryNorm
              0.31611052 = fieldWeight in 1721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3718455 = idf(docFreq=3987, maxDocs=42740)
                0.09375 = fieldNorm(doc=1721)
        0.24 = coord(6/25)
    
  4. Takhirov, N.; Aalberg, T.; Duchateau, F.; Zumer, M.: FRBR-ML: a FRBR-based framework for semantic interoperability (2012) 0.17
    0.17200316 = sum of:
      0.17200316 = product of:
        0.5375099 = sum of:
          0.050055504 = weight(abstract_txt:enhancing in 2135) [ClassicSimilarity], result of:
            0.050055504 = score(doc=2135,freq=1.0), product of:
              0.13514876 = queryWeight, product of:
                6.7725415 = idf(docFreq=132, maxDocs=42740)
                0.019955399 = queryNorm
              0.37037337 = fieldWeight in 2135, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7725415 = idf(docFreq=132, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2135)
          0.0077141803 = weight(abstract_txt:with in 2135) [ClassicSimilarity], result of:
            0.0077141803 = score(doc=2135,freq=1.0), product of:
              0.056028776 = queryWeight, product of:
                1.115219 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.019955399 = queryNorm
              0.13768247 = fieldWeight in 2135, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2135)
          0.14357772 = weight(abstract_txt:transforming in 2135) [ClassicSimilarity], result of:
            0.14357772 = score(doc=2135,freq=4.0), product of:
              0.17187613 = queryWeight, product of:
                1.1277212 = boost
                7.637539 = idf(docFreq=55, maxDocs=42740)
                0.019955399 = queryNorm
              0.8353558 = fieldWeight in 2135, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.637539 = idf(docFreq=55, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2135)
          0.022231825 = weight(abstract_txt:many in 2135) [ClassicSimilarity], result of:
            0.022231825 = score(doc=2135,freq=1.0), product of:
              0.099122204 = queryWeight, product of:
                1.2111403 = boost
                4.1012487 = idf(docFreq=1922, maxDocs=42740)
                0.019955399 = queryNorm
              0.22428703 = fieldWeight in 2135, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1012487 = idf(docFreq=1922, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2135)
          0.015592164 = weight(abstract_txt:library in 2135) [ClassicSimilarity], result of:
            0.015592164 = score(doc=2135,freq=1.0), product of:
              0.089568555 = queryWeight, product of:
                1.410043 = boost
                3.1831915 = idf(docFreq=4815, maxDocs=42740)
                0.019955399 = queryNorm
              0.17408079 = fieldWeight in 2135, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1831915 = idf(docFreq=4815, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2135)
          0.042893317 = weight(abstract_txt:tools in 2135) [ClassicSimilarity], result of:
            0.042893317 = score(doc=2135,freq=1.0), product of:
              0.17585081 = queryWeight, product of:
                1.9757262 = boost
                4.4602294 = idf(docFreq=1342, maxDocs=42740)
                0.019955399 = queryNorm
              0.24391879 = fieldWeight in 2135, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4602294 = idf(docFreq=1342, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2135)
          0.08072436 = weight(abstract_txt:metadata in 2135) [ClassicSimilarity], result of:
            0.08072436 = score(doc=2135,freq=2.0), product of:
              0.21275397 = queryWeight, product of:
                2.1731687 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.019955399 = queryNorm
              0.37942585 = fieldWeight in 2135, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2135)
          0.17472087 = weight(abstract_txt:data in 2135) [ClassicSimilarity], result of:
            0.17472087 = score(doc=2135,freq=8.0), product of:
              0.33499944 = queryWeight, product of:
                4.9787006 = boost
                3.3718455 = idf(docFreq=3987, maxDocs=42740)
                0.019955399 = queryNorm
              0.5215557 = fieldWeight in 2135, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.3718455 = idf(docFreq=3987, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2135)
        0.32 = coord(8/25)
    
  5. Hooland, S. van; Verborgh, R.: Linked data for libraries, archives and museums : how to clean, link, and publish your metadata (2014) 0.16
    0.16430007 = sum of:
      0.16430007 = product of:
        0.586786 = sum of:
          0.009350998 = weight(abstract_txt:with in 1154) [ClassicSimilarity], result of:
            0.009350998 = score(doc=1154,freq=2.0), product of:
              0.056028776 = queryWeight, product of:
                1.115219 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.019955399 = queryNorm
              0.16689636 = fieldWeight in 1154, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.046875 = fieldNorm(doc=1154)
          0.019501118 = weight(abstract_txt:each in 1154) [ClassicSimilarity], result of:
            0.019501118 = score(doc=1154,freq=1.0), product of:
              0.10066034 = queryWeight, product of:
                1.2205011 = boost
                4.132947 = idf(docFreq=1862, maxDocs=42740)
                0.019955399 = queryNorm
              0.19373189 = fieldWeight in 1154, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.132947 = idf(docFreq=1862, maxDocs=42740)
                0.046875 = fieldNorm(doc=1154)
          0.18494923 = weight(abstract_txt:reconciliation in 1154) [ClassicSimilarity], result of:
            0.18494923 = score(doc=1154,freq=3.0), product of:
              0.24820088 = queryWeight, product of:
                1.3551757 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.019955399 = queryNorm
              0.74515945 = fieldWeight in 1154, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.046875 = fieldNorm(doc=1154)
          0.013364713 = weight(abstract_txt:library in 1154) [ClassicSimilarity], result of:
            0.013364713 = score(doc=1154,freq=1.0), product of:
              0.089568555 = queryWeight, product of:
                1.410043 = boost
                3.1831915 = idf(docFreq=4815, maxDocs=42740)
                0.019955399 = queryNorm
              0.1492121 = fieldWeight in 1154, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1831915 = idf(docFreq=4815, maxDocs=42740)
                0.046875 = fieldNorm(doc=1154)
          0.051994555 = weight(abstract_txt:tools in 1154) [ClassicSimilarity], result of:
            0.051994555 = score(doc=1154,freq=2.0), product of:
              0.17585081 = queryWeight, product of:
                1.9757262 = boost
                4.4602294 = idf(docFreq=1342, maxDocs=42740)
                0.019955399 = queryNorm
              0.29567423 = fieldWeight in 1154, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4602294 = idf(docFreq=1342, maxDocs=42740)
                0.046875 = fieldNorm(doc=1154)
          0.2017285 = weight(abstract_txt:metadata in 1154) [ClassicSimilarity], result of:
            0.2017285 = score(doc=1154,freq=17.0), product of:
              0.21275397 = queryWeight, product of:
                2.1731687 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.019955399 = queryNorm
              0.94817734 = fieldWeight in 1154, product of:
                4.1231055 = tf(freq=17.0), with freq of:
                  17.0 = termFreq=17.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.046875 = fieldNorm(doc=1154)
          0.105896845 = weight(abstract_txt:data in 1154) [ClassicSimilarity], result of:
            0.105896845 = score(doc=1154,freq=4.0), product of:
              0.33499944 = queryWeight, product of:
                4.9787006 = boost
                3.3718455 = idf(docFreq=3987, maxDocs=42740)
                0.019955399 = queryNorm
              0.31611052 = fieldWeight in 1154, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.3718455 = idf(docFreq=3987, maxDocs=42740)
                0.046875 = fieldNorm(doc=1154)
        0.28 = coord(7/25)
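
The score trees above all follow the same arithmetic, which can be hard to see through the nesting. The following is a minimal sketch that reproduces it from the numbers printed in the trees: each term score is queryWeight (boost × idf × queryNorm) times fieldWeight (sqrt(freq) × idf × fieldNorm), and the document score is the sum of the term scores times the coord factor (matched terms / query terms). The function names `term_score` and `document_score` are illustrative assumptions, not part of Lucene's API, and the sketch assumes a boost of 1.0 where the tree prints none.

```python
import math

# queryNorm as printed in every tree above
QUERY_NORM = 0.019955399


def term_score(freq, idf, field_norm, query_norm=QUERY_NORM, boost=1.0):
    """Score of one matching term, following the layout of the trees above.

    Hypothetical helper: mirrors the printed arithmetic, not a Lucene call.
    """
    query_weight = boost * idf * query_norm            # "queryWeight, product of"
    field_weight = math.sqrt(freq) * idf * field_norm  # "fieldWeight ..., product of"
    return query_weight * field_weight


def document_score(term_scores, matched_terms, query_terms):
    """Sum of term scores scaled by coord(matched_terms/query_terms)."""
    return sum(term_scores) * (matched_terms / query_terms)


# "reconciliation" in doc 2663: freq=2.0, idf=9.177984, fieldNorm=0.078125
reconciliation = term_score(
    freq=2.0, idf=9.177984, field_norm=0.078125, boost=1.3551757
)

# The eight term scores printed for doc 2663, combined with coord(8/25)
doc_2663_terms = [
    0.011020257, 0.251684, 0.02227452, 0.22141115,
    0.056674343, 0.12978765, 0.1412382, 0.15284893,
]
doc_2663 = document_score(doc_2663_terms, matched_terms=8, query_terms=25)

print(f"reconciliation term score: {reconciliation:.6f}")
print(f"doc 2663 total score:      {doc_2663:.6f}")
```

Because idf enters both queryWeight and fieldWeight, rare terms such as "openrefine" (docFreq=5) and "reconciliation" (docFreq=11) dominate the rankings, while a common term such as "with" contributes very little.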