Document (#32917)

Author
Foulonneau, M.
Title
Information redundancy across metadata collections
Source
Information processing and management. 43(2007) no.3, pp.740-751
Year
2007
Abstract
Metadata records made available by content providers often lack the implicit information of their original use environment. Metadata aggregators therefore tend to emphasize completeness as a primary quality for shareable metadata. However, when adding implicit information to item-level records, data providers increase the redundancy of information contained in records from the same collection. The present paper reports on an effort to assess the extent and potential impact of information redundancy in metadata collections aggregated using the Open Archives Initiative Protocol for Metadata Harvesting. The first experiment quantifies the resemblance of metadata records on a collection-by-collection basis across 176 metadata collections aggregated for the CIC metadata portal. A second experiment measures the tendency of items from the same collection to appear together in results lists generated for a set of user queries. Results of the analyses correlate and suggest that within some collections item-level metadata records are not sufficiently differentiated to support certain digital library functions well. Metadata collections have a distinct role when included in larger aggregations, and in that role a minimum level of descriptive granularity is required to support digital library functions implemented by service providers. The experiments suggest possible ways to deal simultaneously with metadata record completeness, consistency, and redundancy.
Footnote
Contribution to: Special issue on Heterogeneous and Distributed IR
Theme
Metadata

Similar documents (content)

  1. Renear, A.H.; Wickett, K.M.; Urban, R.J.; Dubin, D.; Shreeves, S.L.: Collection/item metadata relationships (2008) 0.35
    0.35057494 = sum of:
      0.35057494 = product of:
        1.0955467 = sum of:
          0.024770176 = weight(abstract_txt:support in 4624) [ClassicSimilarity], result of:
            0.024770176 = score(doc=4624,freq=1.0), product of:
              0.07183543 = queryWeight, product of:
                1.0184827 = boost
                4.4136753 = idf(docFreq=1406, maxDocs=42740)
                0.01598029 = queryNorm
              0.34481838 = fieldWeight in 4624, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4136753 = idf(docFreq=1406, maxDocs=42740)
                0.078125 = fieldNorm(doc=4624)
          0.039686263 = weight(abstract_txt:across in 4624) [ClassicSimilarity], result of:
            0.039686263 = score(doc=4624,freq=1.0), product of:
              0.0983587 = queryWeight, product of:
                1.1917652 = boost
                5.1646085 = idf(docFreq=663, maxDocs=42740)
                0.01598029 = queryNorm
              0.40348503 = fieldWeight in 4624, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1646085 = idf(docFreq=663, maxDocs=42740)
                0.078125 = fieldNorm(doc=4624)
          0.0146169765 = weight(abstract_txt:information in 4624) [ClassicSimilarity], result of:
            0.0146169765 = score(doc=4624,freq=2.0), product of:
              0.0544412 = queryWeight, product of:
                1.4019036 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.01598029 = queryNorm
              0.2684911 = fieldWeight in 4624, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.078125 = fieldNorm(doc=4624)
          0.13145833 = weight(abstract_txt:item in 4624) [ClassicSimilarity], result of:
            0.13145833 = score(doc=4624,freq=3.0), product of:
              0.1515436 = queryWeight, product of:
                1.4792893 = boost
                6.410617 = idf(docFreq=190, maxDocs=42740)
                0.01598029 = queryNorm
              0.86746204 = fieldWeight in 4624, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.410617 = idf(docFreq=190, maxDocs=42740)
                0.078125 = fieldNorm(doc=4624)
          0.08994386 = weight(abstract_txt:level in 4624) [ClassicSimilarity], result of:
            0.08994386 = score(doc=4624,freq=5.0), product of:
              0.1136076 = queryWeight, product of:
                1.5686773 = boost
                4.5319915 = idf(docFreq=1249, maxDocs=42740)
                0.01598029 = queryNorm
              0.7917064 = fieldWeight in 4624, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.5319915 = idf(docFreq=1249, maxDocs=42740)
                0.078125 = fieldNorm(doc=4624)
          0.14297207 = weight(abstract_txt:collection in 4624) [ClassicSimilarity], result of:
            0.14297207 = score(doc=4624,freq=6.0), product of:
              0.16026783 = queryWeight, product of:
                2.1514065 = boost
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.01598029 = queryNorm
              0.89208215 = fieldWeight in 4624, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.078125 = fieldNorm(doc=4624)
          0.074806765 = weight(abstract_txt:collections in 4624) [ClassicSimilarity], result of:
            0.074806765 = score(doc=4624,freq=1.0), product of:
              0.20370102 = queryWeight, product of:
                2.711758 = boost
                4.700647 = idf(docFreq=1055, maxDocs=42740)
                0.01598029 = queryNorm
              0.36723804 = fieldWeight in 4624, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.700647 = idf(docFreq=1055, maxDocs=42740)
                0.078125 = fieldNorm(doc=4624)
          0.5772923 = weight(abstract_txt:metadata in 4624) [ClassicSimilarity], result of:
            0.5772923 = score(doc=4624,freq=8.0), product of:
              0.5325212 = queryWeight, product of:
                6.79248 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.01598029 = queryNorm
              1.0840739 = fieldWeight in 4624, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.078125 = fieldNorm(doc=4624)
        0.32 = coord(8/25)
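
The explanation tree above can be checked by hand. The following is a minimal sketch, not the search engine's own code: it assumes that each term score is queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), with tf taken as the square root of the term frequency, and that the sum is scaled by the coord factor, as the tree itself indicates. The function and constant names are hypothetical; the numbers are copied from the explanation for doc 4624.

```python
import math

# Values copied from the explanation for doc 4624.
QUERY_NORM = 0.01598029
FIELD_NORM = 0.078125
COORD = 8 / 25  # 8 of the 25 query terms matched

# (term, termFreq, boost, idf) as listed in the explanation tree.
TERMS = [
    ("support", 1.0, 1.0184827, 4.4136753),
    ("across", 1.0, 1.1917652, 5.1646085),
    ("information", 2.0, 1.4019036, 2.430104),
    ("item", 3.0, 1.4792893, 6.410617),
    ("level", 5.0, 1.5686773, 4.5319915),
    ("collection", 6.0, 2.1514065, 4.661645),
    ("collections", 1.0, 2.711758, 4.700647),
    ("metadata", 8.0, 6.79248, 4.905958),
]


def term_score(freq, boost, idf, query_norm=QUERY_NORM, field_norm=FIELD_NORM):
    """Score of one term: queryWeight * fieldWeight."""
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def document_score(terms, coord=COORD):
    """Sum of the per-term scores, scaled by the coord factor."""
    return coord * sum(term_score(f, b, i) for _, f, b, i in terms)


print(round(document_score(TERMS), 4))  # 0.3506
```

The engine works in 32-bit floats, so a recomputation in double precision agrees with the listed 0.35057494 only to about six decimal places.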
    
  2. Stvilia, B.; Gasser, L.: Value-based metadata quality assessment (2008) 0.27
    0.27189866 = sum of:
      0.27189866 = product of:
        1.1329111 = sum of:
          0.03740663 = weight(abstract_txt:same in 2253) [ClassicSimilarity], result of:
            0.03740663 = score(doc=2253,freq=1.0), product of:
              0.083733164 = queryWeight, product of:
                1.0995958 = boost
                4.7651854 = idf(docFreq=989, maxDocs=42740)
                0.01598029 = queryNorm
              0.44673613 = fieldWeight in 2253, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7651854 = idf(docFreq=989, maxDocs=42740)
                0.09375 = fieldNorm(doc=2253)
          0.14090826 = weight(abstract_txt:aggregated in 2253) [ClassicSimilarity], result of:
            0.14090826 = score(doc=2253,freq=1.0), product of:
              0.20271665 = queryWeight, product of:
                1.7109172 = boost
                7.4143953 = idf(docFreq=69, maxDocs=42740)
                0.01598029 = queryNorm
              0.6950996 = fieldWeight in 2253, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4143953 = idf(docFreq=69, maxDocs=42740)
                0.09375 = fieldNorm(doc=2253)
          0.14008345 = weight(abstract_txt:collection in 2253) [ClassicSimilarity], result of:
            0.14008345 = score(doc=2253,freq=4.0), product of:
              0.16026783 = queryWeight, product of:
                2.1514065 = boost
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.01598029 = queryNorm
              0.8740584 = fieldWeight in 2253, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.09375 = fieldNorm(doc=2253)
          0.1400825 = weight(abstract_txt:providers in 2253) [ClassicSimilarity], result of:
            0.1400825 = score(doc=2253,freq=1.0), product of:
              0.23114514 = queryWeight, product of:
                2.2375479 = boost
                6.4643936 = idf(docFreq=180, maxDocs=42740)
                0.01598029 = queryNorm
              0.6060369 = fieldWeight in 2253, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4643936 = idf(docFreq=180, maxDocs=42740)
                0.09375 = fieldNorm(doc=2253)
          0.0744905 = weight(abstract_txt:records in 2253) [ClassicSimilarity], result of:
            0.0744905 = score(doc=2253,freq=1.0), product of:
              0.17987843 = queryWeight, product of:
                2.5482607 = boost
                4.4172354 = idf(docFreq=1401, maxDocs=42740)
                0.01598029 = queryNorm
              0.41411582 = fieldWeight in 2253, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4172354 = idf(docFreq=1401, maxDocs=42740)
                0.09375 = fieldNorm(doc=2253)
          0.59993976 = weight(abstract_txt:metadata in 2253) [ClassicSimilarity], result of:
            0.59993976 = score(doc=2253,freq=6.0), product of:
              0.5325212 = queryWeight, product of:
                6.79248 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.01598029 = queryNorm
              1.1266026 = fieldWeight in 2253, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.09375 = fieldNorm(doc=2253)
        0.24 = coord(6/25)
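
The idf, tf and coord figures in these explanations can themselves be recomputed from the counts shown beside them. The sketch below is an assumption about the formulas rather than a quotation of the engine's code: it uses idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq) and coord = matched terms / query terms, which reproduce the listed values. The function names are hypothetical.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency, assuming 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    """Term frequency factor, assuming the square root of the raw count."""
    return math.sqrt(freq)


def coord(matched_terms, query_terms):
    """Fraction of the query terms that occur in the document."""
    return matched_terms / query_terms


# Counts taken from the explanation for doc 2253.
print(classic_idf(989, 42740))  # listed as 4.7651854 for "same"
print(classic_idf(69, 42740))   # listed as 7.4143953 for "aggregated"
print(classic_tf(6.0))          # listed as 2.4494898 for "metadata"
print(coord(6, 25))             # listed as 0.24
```

A rare term such as "aggregated" (69 of 42740 documents) therefore receives a much higher idf than a common one such as "information" (10226 documents).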
    
  3. Park, J.-r.: Semantic interoperability and metadata quality : an analysis of metadata item records of digital image collections (2006) 0.25
    0.25355032 = sum of:
      0.25355032 = product of:
        0.90553683 = sum of:
          0.03351434 = weight(abstract_txt:digital in 1298) [ClassicSimilarity], result of:
            0.03351434 = score(doc=1298,freq=3.0), product of:
              0.07070324 = queryWeight, product of:
                1.0104247 = boost
                4.3787556 = idf(docFreq=1456, maxDocs=42740)
                0.01598029 = queryNorm
              0.4740142 = fieldWeight in 1298, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.3787556 = idf(docFreq=1456, maxDocs=42740)
                0.0625 = fieldNorm(doc=1298)
          0.04489988 = weight(abstract_txt:across in 1298) [ClassicSimilarity], result of:
            0.04489988 = score(doc=1298,freq=2.0), product of:
              0.0983587 = queryWeight, product of:
                1.1917652 = boost
                5.1646085 = idf(docFreq=663, maxDocs=42740)
                0.01598029 = queryNorm
              0.4564912 = fieldWeight in 1298, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1646085 = idf(docFreq=663, maxDocs=42740)
                0.0625 = fieldNorm(doc=1298)
          0.04746039 = weight(abstract_txt:suggest in 1298) [ClassicSimilarity], result of:
            0.04746039 = score(doc=1298,freq=2.0), product of:
              0.10206343 = queryWeight, product of:
                1.214002 = boost
                5.2609735 = idf(docFreq=602, maxDocs=42740)
                0.01598029 = queryNorm
              0.46500874 = fieldWeight in 1298, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2609735 = idf(docFreq=602, maxDocs=42740)
                0.0625 = fieldNorm(doc=1298)
          0.060718 = weight(abstract_txt:item in 1298) [ClassicSimilarity], result of:
            0.060718 = score(doc=1298,freq=1.0), product of:
              0.1515436 = queryWeight, product of:
                1.4792893 = boost
                6.410617 = idf(docFreq=190, maxDocs=42740)
                0.01598029 = queryNorm
              0.40066355 = fieldWeight in 1298, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.410617 = idf(docFreq=190, maxDocs=42740)
                0.0625 = fieldNorm(doc=1298)
          0.049660336 = weight(abstract_txt:records in 1298) [ClassicSimilarity], result of:
            0.049660336 = score(doc=1298,freq=1.0), product of:
              0.17987843 = queryWeight, product of:
                2.5482607 = boost
                4.4172354 = idf(docFreq=1401, maxDocs=42740)
                0.01598029 = queryNorm
              0.2760772 = fieldWeight in 1298, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4172354 = idf(docFreq=1401, maxDocs=42740)
                0.0625 = fieldNorm(doc=1298)
          0.103655286 = weight(abstract_txt:collections in 1298) [ClassicSimilarity], result of:
            0.103655286 = score(doc=1298,freq=3.0), product of:
              0.20370102 = queryWeight, product of:
                2.711758 = boost
                4.700647 = idf(docFreq=1055, maxDocs=42740)
                0.01598029 = queryNorm
              0.50885993 = fieldWeight in 1298, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.700647 = idf(docFreq=1055, maxDocs=42740)
                0.0625 = fieldNorm(doc=1298)
          0.5656286 = weight(abstract_txt:metadata in 1298) [ClassicSimilarity], result of:
            0.5656286 = score(doc=1298,freq=12.0), product of:
              0.5325212 = queryWeight, product of:
                6.79248 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.01598029 = queryNorm
              1.0621711 = fieldWeight in 1298, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.0625 = fieldNorm(doc=1298)
        0.28 = coord(7/25)
    
  4. McCallum, S.H.: Library of Congress metadata landscape (2003) 0.22
    0.22451785 = sum of:
      0.22451785 = product of:
        0.7016183 = sum of:
          0.019349512 = weight(abstract_txt:digital in 2761) [ClassicSimilarity], result of:
            0.019349512 = score(doc=2761,freq=1.0), product of:
              0.07070324 = queryWeight, product of:
                1.0104247 = boost
                4.3787556 = idf(docFreq=1456, maxDocs=42740)
                0.01598029 = queryNorm
              0.27367222 = fieldWeight in 2761, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3787556 = idf(docFreq=1456, maxDocs=42740)
                0.0625 = fieldNorm(doc=2761)
          0.019816142 = weight(abstract_txt:support in 2761) [ClassicSimilarity], result of:
            0.019816142 = score(doc=2761,freq=1.0), product of:
              0.07183543 = queryWeight, product of:
                1.0184827 = boost
                4.4136753 = idf(docFreq=1406, maxDocs=42740)
                0.01598029 = queryNorm
              0.2758547 = fieldWeight in 2761, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4136753 = idf(docFreq=1406, maxDocs=42740)
                0.0625 = fieldNorm(doc=2761)
          0.024937753 = weight(abstract_txt:same in 2761) [ClassicSimilarity], result of:
            0.024937753 = score(doc=2761,freq=1.0), product of:
              0.083733164 = queryWeight, product of:
                1.0995958 = boost
                4.7651854 = idf(docFreq=989, maxDocs=42740)
                0.01598029 = queryNorm
              0.29782408 = fieldWeight in 2761, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7651854 = idf(docFreq=989, maxDocs=42740)
                0.0625 = fieldNorm(doc=2761)
          0.060718 = weight(abstract_txt:item in 2761) [ClassicSimilarity], result of:
            0.060718 = score(doc=2761,freq=1.0), product of:
              0.1515436 = queryWeight, product of:
                1.4792893 = boost
                6.410617 = idf(docFreq=190, maxDocs=42740)
                0.01598029 = queryNorm
              0.40066355 = fieldWeight in 2761, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.410617 = idf(docFreq=190, maxDocs=42740)
                0.0625 = fieldNorm(doc=2761)
          0.04550839 = weight(abstract_txt:level in 2761) [ClassicSimilarity], result of:
            0.04550839 = score(doc=2761,freq=2.0), product of:
              0.1136076 = queryWeight, product of:
                1.5686773 = boost
                4.5319915 = idf(docFreq=1249, maxDocs=42740)
                0.01598029 = queryNorm
              0.40057522 = fieldWeight in 2761, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5319915 = idf(docFreq=1249, maxDocs=42740)
                0.0625 = fieldNorm(doc=2761)
          0.046694484 = weight(abstract_txt:collection in 2761) [ClassicSimilarity], result of:
            0.046694484 = score(doc=2761,freq=1.0), product of:
              0.16026783 = queryWeight, product of:
                2.1514065 = boost
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.01598029 = queryNorm
              0.2913528 = fieldWeight in 2761, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.0625 = fieldNorm(doc=2761)
          0.08463419 = weight(abstract_txt:collections in 2761) [ClassicSimilarity], result of:
            0.08463419 = score(doc=2761,freq=2.0), product of:
              0.20370102 = queryWeight, product of:
                2.711758 = boost
                4.700647 = idf(docFreq=1055, maxDocs=42740)
                0.01598029 = queryNorm
              0.4154824 = fieldWeight in 2761, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.700647 = idf(docFreq=1055, maxDocs=42740)
                0.0625 = fieldNorm(doc=2761)
          0.39995983 = weight(abstract_txt:metadata in 2761) [ClassicSimilarity], result of:
            0.39995983 = score(doc=2761,freq=6.0), product of:
              0.5325212 = queryWeight, product of:
                6.79248 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.01598029 = queryNorm
              0.7510684 = fieldWeight in 2761, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.0625 = fieldNorm(doc=2761)
        0.32 = coord(8/25)
    
  5. Zavalina, O.L.: Complementarity in subject metadata in large-scale digital libraries : a comparative analysis (2014) 0.22
    0.21871693 = sum of:
      0.21871693 = product of:
        0.91132057 = sum of:
          0.04837378 = weight(abstract_txt:digital in 3973) [ClassicSimilarity], result of:
            0.04837378 = score(doc=3973,freq=4.0), product of:
              0.07070324 = queryWeight, product of:
                1.0104247 = boost
                4.3787556 = idf(docFreq=1456, maxDocs=42740)
                0.01598029 = queryNorm
              0.68418056 = fieldWeight in 3973, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3787556 = idf(docFreq=1456, maxDocs=42740)
                0.078125 = fieldNorm(doc=3973)
          0.06967021 = weight(abstract_txt:level in 3973) [ClassicSimilarity], result of:
            0.06967021 = score(doc=3973,freq=3.0), product of:
              0.1136076 = queryWeight, product of:
                1.5686773 = boost
                4.5319915 = idf(docFreq=1249, maxDocs=42740)
                0.01598029 = queryNorm
              0.61325306 = fieldWeight in 3973, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.5319915 = idf(docFreq=1249, maxDocs=42740)
                0.078125 = fieldNorm(doc=3973)
          0.11742354 = weight(abstract_txt:aggregated in 3973) [ClassicSimilarity], result of:
            0.11742354 = score(doc=3973,freq=1.0), product of:
              0.20271665 = queryWeight, product of:
                1.7109172 = boost
                7.4143953 = idf(docFreq=69, maxDocs=42740)
                0.01598029 = queryNorm
              0.5792496 = fieldWeight in 3973, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4143953 = idf(docFreq=69, maxDocs=42740)
                0.078125 = fieldNorm(doc=3973)
          0.10109651 = weight(abstract_txt:collection in 3973) [ClassicSimilarity], result of:
            0.10109651 = score(doc=3973,freq=3.0), product of:
              0.16026783 = queryWeight, product of:
                2.1514065 = boost
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.01598029 = queryNorm
              0.63079727 = fieldWeight in 3973, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.078125 = fieldNorm(doc=3973)
          0.074806765 = weight(abstract_txt:collections in 3973) [ClassicSimilarity], result of:
            0.074806765 = score(doc=3973,freq=1.0), product of:
              0.20370102 = queryWeight, product of:
                2.711758 = boost
                4.700647 = idf(docFreq=1055, maxDocs=42740)
                0.01598029 = queryNorm
              0.36723804 = fieldWeight in 3973, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.700647 = idf(docFreq=1055, maxDocs=42740)
                0.078125 = fieldNorm(doc=3973)
          0.49994978 = weight(abstract_txt:metadata in 3973) [ClassicSimilarity], result of:
            0.49994978 = score(doc=3973,freq=6.0), product of:
              0.5325212 = queryWeight, product of:
                6.79248 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.01598029 = queryNorm
              0.9388355 = fieldWeight in 3973, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.078125 = fieldNorm(doc=3973)
        0.24 = coord(6/25)