Document (#39278)

Author
Harlow, C.
Title
Data munging tools in Preparation for RDF : Catmandu and LODRefine
Source
Code4Lib journal. Issue 30(2015), [http://journal.code4lib.org]
Year
2015
Abstract
Data munging, or the work of remediating, enhancing and transforming library datasets for new or improved uses, has become more important and staff-inclusive in many library technology discussions and projects. Many times we know how we want our data to look, as well as how we want our data to act in discovery interfaces or when exposed, but we are uncertain how to make the data we have into the data we want. This article introduces and compares two library data munging tools that can help: LODRefine (OpenRefine with the DERI RDF Extension) and Catmandu. The strengths and best practices of each tool are discussed in the context of metadata munging use cases for an institution's metadata migration workflow. There is a focus on Linked Open Data modeling and transformation applications of each tool, in particular how metadataists, catalogers, and programmers can create metadata quality reports, enhance existing data with LOD sets, and transform that data to an RDF model. Integration of these tools with other systems and projects, the use of domain-specific transformation languages, and the expansion of vocabulary reconciliation services are mentioned.
Content
See: http://journal.code4lib.org/articles/11013.
Theme
Descriptive cataloging (Formalerschließung)
Semantic Web
Object
Catmandu
LODRefine
RDF

Similar documents (content)

  1. Hooland, S. van; Verborgh, R.; Wilde, M. De; Hercher, J.; Mannens, E.; Wa, R.Van de: Evaluating the success of vocabulary reconciliation for cultural heritage collections (2013) 0.30
    0.304671 = sum of:
      0.304671 = product of:
        0.9520969 = sum of:
          0.010960875 = weight(abstract_txt:with in 662) [ClassicSimilarity], result of:
            0.010960875 = score(doc=662,freq=1.0), product of:
              0.056125667 = queryWeight, product of:
                1.112466 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.020182783 = queryNorm
              0.19529167 = fieldWeight in 662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.078125 = fieldNorm(doc=662)
          0.24025278 = weight(abstract_txt:reconciliation in 662) [ClassicSimilarity], result of:
            0.24025278 = score(doc=662,freq=2.0), product of:
              0.24191338 = queryWeight, product of:
                1.3334457 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.020182783 = queryNorm
              0.9931356 = fieldWeight in 662, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.078125 = fieldNorm(doc=662)
          0.2005245 = weight(abstract_txt:openrefine in 662) [ClassicSimilarity], result of:
            0.2005245 = score(doc=662,freq=1.0), product of:
              0.27018997 = queryWeight, product of:
                1.4092238 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.020182783 = queryNorm
              0.74216115 = fieldWeight in 662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.078125 = fieldNorm(doc=662)
          0.022708857 = weight(abstract_txt:library in 662) [ClassicSimilarity], result of:
            0.022708857 = score(doc=662,freq=1.0), product of:
              0.091214 = queryWeight, product of:
                1.4181976 = boost
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.020182783 = queryNorm
              0.2489624 = fieldWeight in 662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.078125 = fieldNorm(doc=662)
          0.05679668 = weight(abstract_txt:tool in 662) [ClassicSimilarity], result of:
            0.05679668 = score(doc=662,freq=1.0), product of:
              0.14681922 = queryWeight, product of:
                1.4691015 = boost
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.020182783 = queryNorm
              0.38684773 = fieldWeight in 662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.078125 = fieldNorm(doc=662)
          0.12903754 = weight(abstract_txt:transformation in 662) [ClassicSimilarity], result of:
            0.12903754 = score(doc=662,freq=1.0), product of:
              0.25373378 = queryWeight, product of:
                1.931299 = boost
                6.5095015 = idf(docFreq=178, maxDocs=44218)
                0.020182783 = queryNorm
              0.5085548 = fieldWeight in 662, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5095015 = idf(docFreq=178, maxDocs=44218)
                0.078125 = fieldNorm(doc=662)
          0.14135692 = weight(abstract_txt:metadata in 662) [ClassicSimilarity], result of:
            0.14135692 = score(doc=662,freq=3.0), product of:
              0.21401078 = queryWeight, product of:
                2.1723201 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.020182783 = queryNorm
              0.6605131 = fieldWeight in 662, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.078125 = fieldNorm(doc=662)
          0.15045875 = weight(abstract_txt:data in 662) [ClassicSimilarity], result of:
            0.15045875 = score(doc=662,freq=3.0), product of:
              0.33326945 = queryWeight, product of:
                4.9492927 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.020182783 = queryNorm
              0.4514628 = fieldWeight in 662, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.078125 = fieldNorm(doc=662)
        0.32 = coord(8/25)
    
  2. Stephens, O.: Introduction to OpenRefine (2014) 0.22
    0.21586828 = sum of:
      0.21586828 = product of:
        0.8994512 = sum of:
          0.0175374 = weight(abstract_txt:with in 2884) [ClassicSimilarity], result of:
            0.0175374 = score(doc=2884,freq=4.0), product of:
              0.056125667 = queryWeight, product of:
                1.112466 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.020182783 = queryNorm
              0.31246668 = fieldWeight in 2884, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=2884)
          0.02543861 = weight(abstract_txt:many in 2884) [ClassicSimilarity], result of:
            0.02543861 = score(doc=2884,freq=1.0), product of:
              0.09973246 = queryWeight, product of:
                1.2108173 = boost
                4.081096 = idf(docFreq=2029, maxDocs=44218)
                0.020182783 = queryNorm
              0.2550685 = fieldWeight in 2884, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.081096 = idf(docFreq=2029, maxDocs=44218)
                0.0625 = fieldNorm(doc=2884)
          0.2778549 = weight(abstract_txt:openrefine in 2884) [ClassicSimilarity], result of:
            0.2778549 = score(doc=2884,freq=3.0), product of:
              0.27018997 = queryWeight, product of:
                1.4092238 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.020182783 = queryNorm
              1.0283686 = fieldWeight in 2884, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.0625 = fieldNorm(doc=2884)
          0.045437347 = weight(abstract_txt:tool in 2884) [ClassicSimilarity], result of:
            0.045437347 = score(doc=2884,freq=1.0), product of:
              0.14681922 = queryWeight, product of:
                1.4691015 = boost
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.020182783 = queryNorm
              0.3094782 = fieldWeight in 2884, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.951651 = idf(docFreq=849, maxDocs=44218)
                0.0625 = fieldNorm(doc=2884)
          0.27316046 = weight(abstract_txt:want in 2884) [ClassicSimilarity], result of:
            0.27316046 = score(doc=2884,freq=3.0), product of:
              0.38527974 = queryWeight, product of:
                2.9147015 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.020182783 = queryNorm
              0.70899254 = fieldWeight in 2884, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0625 = fieldNorm(doc=2884)
          0.26002246 = weight(abstract_txt:data in 2884) [ClassicSimilarity], result of:
            0.26002246 = score(doc=2884,freq=14.0), product of:
              0.33326945 = queryWeight, product of:
                4.9492927 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.020182783 = queryNorm
              0.78021693 = fieldWeight in 2884, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=2884)
        0.24 = coord(6/25)
    
  3. Takhirov, N.; Aalberg, T.; Duchateau, F.; Zumer, M.: FRBR-ML: a FRBR-based framework for semantic interoperability (2012) 0.17
    0.17213294 = sum of:
      0.17213294 = product of:
        0.53791547 = sum of:
          0.050156306 = weight(abstract_txt:enhancing in 134) [ClassicSimilarity], result of:
            0.050156306 = score(doc=134,freq=1.0), product of:
              0.13605335 = queryWeight, product of:
                6.7410603 = idf(docFreq=141, maxDocs=44218)
                0.020182783 = queryNorm
              0.36865175 = fieldWeight in 134, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7410603 = idf(docFreq=141, maxDocs=44218)
                0.0546875 = fieldNorm(doc=134)
          0.007672613 = weight(abstract_txt:with in 134) [ClassicSimilarity], result of:
            0.007672613 = score(doc=134,freq=1.0), product of:
              0.056125667 = queryWeight, product of:
                1.112466 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.020182783 = queryNorm
              0.13670418 = fieldWeight in 134, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0546875 = fieldNorm(doc=134)
          0.14582919 = weight(abstract_txt:transforming in 134) [ClassicSimilarity], result of:
            0.14582919 = score(doc=134,freq=4.0), product of:
              0.17459637 = queryWeight, product of:
                1.1328254 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.020182783 = queryNorm
              0.8352361 = fieldWeight in 134, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.0546875 = fieldNorm(doc=134)
          0.022258783 = weight(abstract_txt:many in 134) [ClassicSimilarity], result of:
            0.022258783 = score(doc=134,freq=1.0), product of:
              0.09973246 = queryWeight, product of:
                1.2108173 = boost
                4.081096 = idf(docFreq=2029, maxDocs=44218)
                0.020182783 = queryNorm
              0.22318494 = fieldWeight in 134, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.081096 = idf(docFreq=2029, maxDocs=44218)
                0.0546875 = fieldNorm(doc=134)
          0.0158962 = weight(abstract_txt:library in 134) [ClassicSimilarity], result of:
            0.0158962 = score(doc=134,freq=1.0), product of:
              0.091214 = queryWeight, product of:
                1.4181976 = boost
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.020182783 = queryNorm
              0.17427368 = fieldWeight in 134, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.0546875 = fieldNorm(doc=134)
          0.043321434 = weight(abstract_txt:tools in 134) [ClassicSimilarity], result of:
            0.043321434 = score(doc=134,freq=1.0), product of:
              0.17796499 = queryWeight, product of:
                1.9809489 = boost
                4.451232 = idf(docFreq=1401, maxDocs=44218)
                0.020182783 = queryNorm
              0.24342674 = fieldWeight in 134, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.451232 = idf(docFreq=1401, maxDocs=44218)
                0.0546875 = fieldNorm(doc=134)
          0.08079221 = weight(abstract_txt:metadata in 134) [ClassicSimilarity], result of:
            0.08079221 = score(doc=134,freq=2.0), product of:
              0.21401078 = queryWeight, product of:
                2.1723201 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.020182783 = queryNorm
              0.3775147 = fieldWeight in 134, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.0546875 = fieldNorm(doc=134)
          0.1719887 = weight(abstract_txt:data in 134) [ClassicSimilarity], result of:
            0.1719887 = score(doc=134,freq=8.0), product of:
              0.33326945 = queryWeight, product of:
                4.9492927 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.020182783 = queryNorm
              0.516065 = fieldWeight in 134, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0546875 = fieldNorm(doc=134)
        0.32 = coord(8/25)
    
  4. Lynch, J.D.; Gibson, J.; Han, M.-J.: Analyzing and normalizing type metadata for a large aggregated digital library (2020) 0.17
    0.17067523 = sum of:
      0.17067523 = product of:
        0.71114683 = sum of:
          0.08598223 = weight(abstract_txt:enhancing in 5720) [ClassicSimilarity], result of:
            0.08598223 = score(doc=5720,freq=1.0), product of:
              0.13605335 = queryWeight, product of:
                6.7410603 = idf(docFreq=141, maxDocs=44218)
                0.020182783 = queryNorm
              0.6319744 = fieldWeight in 5720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7410603 = idf(docFreq=141, maxDocs=44218)
                0.09375 = fieldNorm(doc=5720)
          0.013153051 = weight(abstract_txt:with in 5720) [ClassicSimilarity], result of:
            0.013153051 = score(doc=5720,freq=1.0), product of:
              0.056125667 = queryWeight, product of:
                1.112466 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.020182783 = queryNorm
              0.23435001 = fieldWeight in 5720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.09375 = fieldNorm(doc=5720)
          0.24062939 = weight(abstract_txt:openrefine in 5720) [ClassicSimilarity], result of:
            0.24062939 = score(doc=5720,freq=1.0), product of:
              0.27018997 = queryWeight, product of:
                1.4092238 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.020182783 = queryNorm
              0.89059335 = fieldWeight in 5720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.09375 = fieldNorm(doc=5720)
          0.027250627 = weight(abstract_txt:library in 5720) [ClassicSimilarity], result of:
            0.027250627 = score(doc=5720,freq=1.0), product of:
              0.091214 = queryWeight, product of:
                1.4181976 = boost
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.020182783 = queryNorm
              0.29875487 = fieldWeight in 5720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.09375 = fieldNorm(doc=5720)
          0.23989065 = weight(abstract_txt:metadata in 5720) [ClassicSimilarity], result of:
            0.23989065 = score(doc=5720,freq=6.0), product of:
              0.21401078 = queryWeight, product of:
                2.1723201 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.020182783 = queryNorm
              1.1209279 = fieldWeight in 5720, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.09375 = fieldNorm(doc=5720)
          0.10424089 = weight(abstract_txt:data in 5720) [ClassicSimilarity], result of:
            0.10424089 = score(doc=5720,freq=1.0), product of:
              0.33326945 = queryWeight, product of:
                4.9492927 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.020182783 = queryNorm
              0.31278262 = fieldWeight in 5720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.09375 = fieldNorm(doc=5720)
        0.24 = coord(6/25)
    
  5. Hooland, S. van; Verborgh, R.: Linked data for libraries, archives and museums : how to clean, link, and publish your metadata (2014) 0.16
    0.16176698 = sum of:
      0.16176698 = product of:
        0.5777392 = sum of:
          0.009300611 = weight(abstract_txt:with in 5153) [ClassicSimilarity], result of:
            0.009300611 = score(doc=5153,freq=2.0), product of:
              0.056125667 = queryWeight, product of:
                1.112466 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.020182783 = queryNorm
              0.16571048 = fieldWeight in 5153, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.046875 = fieldNorm(doc=5153)
          0.019611819 = weight(abstract_txt:each in 5153) [ClassicSimilarity], result of:
            0.019611819 = score(doc=5153,freq=1.0), product of:
              0.10158089 = queryWeight, product of:
                1.2219864 = boost
                4.118742 = idf(docFreq=1954, maxDocs=44218)
                0.020182783 = queryNorm
              0.19306603 = fieldWeight in 5153, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.118742 = idf(docFreq=1954, maxDocs=44218)
                0.046875 = fieldNorm(doc=5153)
          0.17654902 = weight(abstract_txt:reconciliation in 5153) [ClassicSimilarity], result of:
            0.17654902 = score(doc=5153,freq=3.0), product of:
              0.24191338 = queryWeight, product of:
                1.3334457 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.020182783 = queryNorm
              0.7298026 = fieldWeight in 5153, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.046875 = fieldNorm(doc=5153)
          0.0136253135 = weight(abstract_txt:library in 5153) [ClassicSimilarity], result of:
            0.0136253135 = score(doc=5153,freq=1.0), product of:
              0.091214 = queryWeight, product of:
                1.4181976 = boost
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.020182783 = queryNorm
              0.14937744 = fieldWeight in 5153, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1867187 = idf(docFreq=4964, maxDocs=44218)
                0.046875 = fieldNorm(doc=5153)
          0.05251351 = weight(abstract_txt:tools in 5153) [ClassicSimilarity], result of:
            0.05251351 = score(doc=5153,freq=2.0), product of:
              0.17796499 = queryWeight, product of:
                1.9809489 = boost
                4.451232 = idf(docFreq=1401, maxDocs=44218)
                0.020182783 = queryNorm
              0.29507777 = fieldWeight in 5153, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.451232 = idf(docFreq=1401, maxDocs=44218)
                0.046875 = fieldNorm(doc=5153)
          0.20189807 = weight(abstract_txt:metadata in 5153) [ClassicSimilarity], result of:
            0.20189807 = score(doc=5153,freq=17.0), product of:
              0.21401078 = queryWeight, product of:
                2.1723201 = boost
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.020182783 = queryNorm
              0.9434014 = fieldWeight in 5153, product of:
                4.1231055 = tf(freq=17.0), with freq of:
                  17.0 = termFreq=17.0
                4.881247 = idf(docFreq=911, maxDocs=44218)
                0.046875 = fieldNorm(doc=5153)
          0.10424089 = weight(abstract_txt:data in 5153) [ClassicSimilarity], result of:
            0.10424089 = score(doc=5153,freq=4.0), product of:
              0.33326945 = queryWeight, product of:
                4.9492927 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.020182783 = queryNorm
              0.31278262 = fieldWeight in 5153, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.046875 = fieldNorm(doc=5153)
        0.28 = coord(7/25)
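The relevance values in the list above are Lucene "ClassicSimilarity" (TF-IDF) explain trees. The sketch below recomputes them, assuming only the formulas that can be read off the explain lines: tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost × idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and a document score equal to the sum of matching term scores multiplied by coord(matched/total). The function names are hypothetical helpers for illustration, not part of the Lucene API; the inputs are copied from the explain output of entry 1 (doc=662).

```python
import math

# Minimal re-computation of Lucene ClassicSimilarity (TF-IDF) explain values.
# Function names are illustrative only; they are not part of the Lucene API.


def classic_tf(freq: float) -> float:
    """tf = sqrt(termFreq), e.g. tf(freq=2.0) = 1.4142135."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, field_norm, query_norm, boost=1.0):
    """score = queryWeight * fieldWeight for one matching query term."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm              # "queryWeight, product of"
    field_weight = classic_tf(freq) * idf * field_norm   # "fieldWeight, product of"
    return query_weight * field_weight


def document_score(term_scores, matched_terms, total_terms):
    """Sum of term scores scaled by coord(matched/total)."""
    return sum(term_scores) * (matched_terms / total_terms)


# Inputs copied from the explain output of entry 1 (doc=662):
# (term, freq, docFreq, boost); maxDocs, queryNorm and fieldNorm are shared.
MAX_DOCS = 44218
QUERY_NORM = 0.020182783
FIELD_NORM = 0.078125
DOC_662 = [
    ("with", 1.0, 9868, 1.112466),
    ("reconciliation", 2.0, 14, 1.3334457),
    ("openrefine", 1.0, 8, 1.4092238),
    ("library", 1.0, 4964, 1.4181976),
    ("tool", 1.0, 849, 1.4691015),
    ("transformation", 1.0, 178, 1.931299),
    ("metadata", 3.0, 911, 2.1723201),
    ("data", 3.0, 4274, 4.9492927),
]

scores = [
    term_score(freq, df, MAX_DOCS, FIELD_NORM, QUERY_NORM, boost)
    for _, freq, df, boost in DOC_662
]
# 8 of the 25 query terms matched, hence coord(8/25).
total = document_score(scores, matched_terms=len(scores), total_terms=25)
print(round(total, 4))  # prints 0.3047
```

Small differences in the last digits against the printed explain values are expected, since Lucene computes in single precision while this sketch uses Python floats.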