Document (#38576)

Author
Wisser, K.
Title
¬The errors of our ways : using metadata quality research to understand common error patterns in the application of name headings
Source
Metadata and semantics research: 8th Research Conference, MTSR 2014, Karlsruhe, Germany, November 27-29, 2014, Proceedings. Eds.: S. Closs et al.
Imprint
Cham : Springer
Year
2014
Pages
pp. 83-94
Series
Communications in computer and information science; 478
Abstract
Using data culled during a metadata quality research project for the Social Network and Archival Context (SNAC) project, this article discusses common errors and problems in the use of standardized languages, specifically unambiguous names for persons and corporate bodies. Errors such as misspelling, qualifiers, format, and mis-encoding point to several areas where quality control measures can improve aggregation of data. Results from a large data set indicate that there are predictable problems that can be retrospectively corrected before aggregation. This research looked specifically at name formation and expression in metadata records, but the errors detected could be extended to other controlled vocabularies as well.
Theme
Metadata
Descriptive cataloging

Similar documents (content)

  1. Beall, J.; Kafadar, K.: ¬The effectiveness of copy cataloging at eliminating typographical errors in shared bibliographic records (2004) 0.27
    0.27262744 = sum of:
      0.27262744 = product of:
        1.1359477 = sum of:
          0.083097436 = weight(abstract_txt:error in 850) [ClassicSimilarity], result of:
            0.083097436 = score(doc=850,freq=1.0), product of:
              0.12957717 = queryWeight, product of:
                6.8405 = idf(docFreq=122, maxDocs=42306)
                0.018942647 = queryNorm
              0.64129686 = fieldWeight in 850, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8405 = idf(docFreq=122, maxDocs=42306)
                0.09375 = fieldNorm(doc=850)
          0.041068546 = weight(abstract_txt:problems in 850) [ClassicSimilarity], result of:
            0.041068546 = score(doc=850,freq=1.0), product of:
              0.102051556 = queryWeight, product of:
                1.2550486 = boost
                4.29258 = idf(docFreq=1571, maxDocs=42306)
                0.018942647 = queryNorm
              0.4024294 = fieldWeight in 850, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29258 = idf(docFreq=1571, maxDocs=42306)
                0.09375 = fieldNorm(doc=850)
          0.2518643 = weight(abstract_txt:corrected in 850) [ClassicSimilarity], result of:
            0.2518643 = score(doc=850,freq=2.0), product of:
              0.2153961 = queryWeight, product of:
                1.289302 = boost
                8.81947 = idf(docFreq=16, maxDocs=42306)
                0.018942647 = queryNorm
              1.1693076 = fieldWeight in 850, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.81947 = idf(docFreq=16, maxDocs=42306)
                0.09375 = fieldNorm(doc=850)
          0.03014565 = weight(abstract_txt:data in 850) [ClassicSimilarity], result of:
            0.03014565 = score(doc=850,freq=1.0), product of:
              0.09505908 = queryWeight, product of:
                1.4835192 = boost
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.018942647 = queryNorm
              0.3171254 = fieldWeight in 850, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.09375 = fieldNorm(doc=850)
          0.08002941 = weight(abstract_txt:quality in 850) [ClassicSimilarity], result of:
            0.08002941 = score(doc=850,freq=1.0), product of:
              0.18225394 = queryWeight, product of:
                2.0541627 = boost
                4.6838336 = idf(docFreq=1062, maxDocs=42306)
                0.018942647 = queryNorm
              0.43910939 = fieldWeight in 850, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6838336 = idf(docFreq=1062, maxDocs=42306)
                0.09375 = fieldNorm(doc=850)
          0.6497423 = weight(abstract_txt:errors in 850) [ClassicSimilarity], result of:
            0.6497423 = score(doc=850,freq=5.0), product of:
              0.47387177 = queryWeight, product of:
                3.8246894 = boost
                6.5406966 = idf(docFreq=165, maxDocs=42306)
                0.018942647 = queryNorm
              1.3711352 = fieldWeight in 850, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.5406966 = idf(docFreq=165, maxDocs=42306)
                0.09375 = fieldNorm(doc=850)
        0.24 = coord(6/25)
    
  2. Pope, J.T.; Holley, R.P.: Google Book Search and metadata (2011) 0.16
    0.163018 = sum of:
      0.163018 = product of:
        0.6792417 = sum of:
          0.069247864 = weight(abstract_txt:error in 3888) [ClassicSimilarity], result of:
            0.069247864 = score(doc=3888,freq=1.0), product of:
              0.12957717 = queryWeight, product of:
                6.8405 = idf(docFreq=122, maxDocs=42306)
                0.018942647 = queryNorm
              0.53441405 = fieldWeight in 3888, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8405 = idf(docFreq=122, maxDocs=42306)
                0.078125 = fieldNorm(doc=3888)
          0.034223787 = weight(abstract_txt:problems in 3888) [ClassicSimilarity], result of:
            0.034223787 = score(doc=3888,freq=1.0), product of:
              0.102051556 = queryWeight, product of:
                1.2550486 = boost
                4.29258 = idf(docFreq=1571, maxDocs=42306)
                0.018942647 = queryNorm
              0.33535782 = fieldWeight in 3888, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29258 = idf(docFreq=1571, maxDocs=42306)
                0.078125 = fieldNorm(doc=3888)
          0.03631291 = weight(abstract_txt:project in 3888) [ClassicSimilarity], result of:
            0.03631291 = score(doc=3888,freq=1.0), product of:
              0.106163435 = queryWeight, product of:
                1.2800833 = boost
                4.378205 = idf(docFreq=1442, maxDocs=42306)
                0.018942647 = queryNorm
              0.34204724 = fieldWeight in 3888, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.378205 = idf(docFreq=1442, maxDocs=42306)
                0.078125 = fieldNorm(doc=3888)
          0.025121374 = weight(abstract_txt:data in 3888) [ClassicSimilarity], result of:
            0.025121374 = score(doc=3888,freq=1.0), product of:
              0.09505908 = queryWeight, product of:
                1.4835192 = boost
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.018942647 = queryNorm
              0.26427117 = fieldWeight in 3888, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.078125 = fieldNorm(doc=3888)
          0.17189156 = weight(abstract_txt:metadata in 3888) [ClassicSimilarity], result of:
            0.17189156 = score(doc=3888,freq=5.0), product of:
              0.20036 = queryWeight, product of:
                2.1537826 = boost
                4.9109836 = idf(docFreq=846, maxDocs=42306)
                0.018942647 = queryNorm
              0.85791355 = fieldWeight in 3888, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.9109836 = idf(docFreq=846, maxDocs=42306)
                0.078125 = fieldNorm(doc=3888)
          0.34244424 = weight(abstract_txt:errors in 3888) [ClassicSimilarity], result of:
            0.34244424 = score(doc=3888,freq=2.0), product of:
              0.47387177 = queryWeight, product of:
                3.8246894 = boost
                6.5406966 = idf(docFreq=165, maxDocs=42306)
                0.018942647 = queryNorm
              0.7226517 = fieldWeight in 3888, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5406966 = idf(docFreq=165, maxDocs=42306)
                0.078125 = fieldNorm(doc=3888)
        0.24 = coord(6/25)
    
  3. Lardy, J.P.; Herzhaft, L.: Bibliometric treatments according to bibliographic errors and data heterogeneity : the end-user point of view (1992) 0.16
    0.16159858 = sum of:
      0.16159858 = product of:
        0.8079929 = sum of:
          0.08276406 = weight(abstract_txt:common in 5133) [ClassicSimilarity], result of:
            0.08276406 = score(doc=5133,freq=2.0), product of:
              0.12923038 = queryWeight, product of:
                1.4123198 = boost
                4.830487 = idf(docFreq=917, maxDocs=42306)
                0.018942647 = queryNorm
              0.6404381 = fieldWeight in 5133, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.830487 = idf(docFreq=917, maxDocs=42306)
                0.09375 = fieldNorm(doc=5133)
          0.042632386 = weight(abstract_txt:data in 5133) [ClassicSimilarity], result of:
            0.042632386 = score(doc=5133,freq=2.0), product of:
              0.09505908 = queryWeight, product of:
                1.4835192 = boost
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.018942647 = queryNorm
              0.44848305 = fieldWeight in 5133, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.09375 = fieldNorm(doc=5133)
          0.099278875 = weight(abstract_txt:name in 5133) [ClassicSimilarity], result of:
            0.099278875 = score(doc=5133,freq=1.0), product of:
              0.18381657 = queryWeight, product of:
                1.6843916 = boost
                5.76104 = idf(docFreq=361, maxDocs=42306)
                0.018942647 = queryNorm
              0.54009753 = fieldWeight in 5133, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.76104 = idf(docFreq=361, maxDocs=42306)
                0.09375 = fieldNorm(doc=5133)
          0.08002941 = weight(abstract_txt:quality in 5133) [ClassicSimilarity], result of:
            0.08002941 = score(doc=5133,freq=1.0), product of:
              0.18225394 = queryWeight, product of:
                2.0541627 = boost
                4.6838336 = idf(docFreq=1062, maxDocs=42306)
                0.018942647 = queryNorm
              0.43910939 = fieldWeight in 5133, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6838336 = idf(docFreq=1062, maxDocs=42306)
                0.09375 = fieldNorm(doc=5133)
          0.50328815 = weight(abstract_txt:errors in 5133) [ClassicSimilarity], result of:
            0.50328815 = score(doc=5133,freq=3.0), product of:
              0.47387177 = queryWeight, product of:
                3.8246894 = boost
                6.5406966 = idf(docFreq=165, maxDocs=42306)
                0.018942647 = queryNorm
              1.0620767 = fieldWeight in 5133, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5406966 = idf(docFreq=165, maxDocs=42306)
                0.09375 = fieldNorm(doc=5133)
        0.2 = coord(5/25)
    
  4. Tani, A.; Candela, L.; Castelli, D.: Dealing with metadata quality : the legacy of digital library efforts (2013) 0.16
    0.15519942 = sum of:
      0.15519942 = product of:
        0.64666426 = sum of:
          0.041068546 = weight(abstract_txt:problems in 4663) [ClassicSimilarity], result of:
            0.041068546 = score(doc=4663,freq=1.0), product of:
              0.102051556 = queryWeight, product of:
                1.2550486 = boost
                4.29258 = idf(docFreq=1571, maxDocs=42306)
                0.018942647 = queryNorm
              0.4024294 = fieldWeight in 4663, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.29258 = idf(docFreq=1571, maxDocs=42306)
                0.09375 = fieldNorm(doc=4663)
          0.05852303 = weight(abstract_txt:common in 4663) [ClassicSimilarity], result of:
            0.05852303 = score(doc=4663,freq=1.0), product of:
              0.12923038 = queryWeight, product of:
                1.4123198 = boost
                4.830487 = idf(docFreq=917, maxDocs=42306)
                0.018942647 = queryNorm
              0.45285815 = fieldWeight in 4663, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.830487 = idf(docFreq=917, maxDocs=42306)
                0.09375 = fieldNorm(doc=4663)
          0.042632386 = weight(abstract_txt:data in 4663) [ClassicSimilarity], result of:
            0.042632386 = score(doc=4663,freq=2.0), product of:
              0.09505908 = queryWeight, product of:
                1.4835192 = boost
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.018942647 = queryNorm
              0.44848305 = fieldWeight in 4663, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.09375 = fieldNorm(doc=4663)
          0.13861503 = weight(abstract_txt:quality in 4663) [ClassicSimilarity], result of:
            0.13861503 = score(doc=4663,freq=3.0), product of:
              0.18225394 = queryWeight, product of:
                2.0541627 = boost
                4.6838336 = idf(docFreq=1062, maxDocs=42306)
                0.018942647 = queryNorm
              0.7605598 = fieldWeight in 4663, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6838336 = idf(docFreq=1062, maxDocs=42306)
                0.09375 = fieldNorm(doc=4663)
          0.20604937 = weight(abstract_txt:aggregation in 4663) [ClassicSimilarity], result of:
            0.20604937 = score(doc=4663,freq=1.0), product of:
              0.29908475 = queryWeight, product of:
                2.148562 = boost
                7.348619 = idf(docFreq=73, maxDocs=42306)
                0.018942647 = queryNorm
              0.688933 = fieldWeight in 4663, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.348619 = idf(docFreq=73, maxDocs=42306)
                0.09375 = fieldNorm(doc=4663)
          0.15977594 = weight(abstract_txt:metadata in 4663) [ClassicSimilarity], result of:
            0.15977594 = score(doc=4663,freq=3.0), product of:
              0.20036 = queryWeight, product of:
                2.1537826 = boost
                4.9109836 = idf(docFreq=846, maxDocs=42306)
                0.018942647 = queryNorm
              0.79744434 = fieldWeight in 4663, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.9109836 = idf(docFreq=846, maxDocs=42306)
                0.09375 = fieldNorm(doc=4663)
        0.24 = coord(6/25)
    
  5. Jarke, M.; Lenzerini, M.; Vassiliou, Y.: Fundamentals of data warehousing (1999) 0.14
    0.13813469 = sum of:
      0.13813469 = product of:
        0.5755612 = sum of:
          0.02200986 = weight(abstract_txt:using in 2303) [ClassicSimilarity], result of:
            0.02200986 = score(doc=2303,freq=1.0), product of:
              0.06733253 = queryWeight, product of:
                1.0194436 = boost
                3.486752 = idf(docFreq=3518, maxDocs=42306)
                0.018942647 = queryNorm
              0.32688302 = fieldWeight in 2303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.486752 = idf(docFreq=3518, maxDocs=42306)
                0.09375 = fieldNorm(doc=2303)
          0.043575495 = weight(abstract_txt:project in 2303) [ClassicSimilarity], result of:
            0.043575495 = score(doc=2303,freq=1.0), product of:
              0.106163435 = queryWeight, product of:
                1.2800833 = boost
                4.378205 = idf(docFreq=1442, maxDocs=42306)
                0.018942647 = queryNorm
              0.41045672 = fieldWeight in 2303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.378205 = idf(docFreq=1442, maxDocs=42306)
                0.09375 = fieldNorm(doc=2303)
          0.0602913 = weight(abstract_txt:data in 2303) [ClassicSimilarity], result of:
            0.0602913 = score(doc=2303,freq=4.0), product of:
              0.09505908 = queryWeight, product of:
                1.4835192 = boost
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.018942647 = queryNorm
              0.6342508 = fieldWeight in 2303, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.382671 = idf(docFreq=3904, maxDocs=42306)
                0.09375 = fieldNorm(doc=2303)
          0.113178685 = weight(abstract_txt:quality in 2303) [ClassicSimilarity], result of:
            0.113178685 = score(doc=2303,freq=2.0), product of:
              0.18225394 = queryWeight, product of:
                2.0541627 = boost
                4.6838336 = idf(docFreq=1062, maxDocs=42306)
                0.018942647 = queryNorm
              0.62099445 = fieldWeight in 2303, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6838336 = idf(docFreq=1062, maxDocs=42306)
                0.09375 = fieldNorm(doc=2303)
          0.20604937 = weight(abstract_txt:aggregation in 2303) [ClassicSimilarity], result of:
            0.20604937 = score(doc=2303,freq=1.0), product of:
              0.29908475 = queryWeight, product of:
                2.148562 = boost
                7.348619 = idf(docFreq=73, maxDocs=42306)
                0.018942647 = queryNorm
              0.688933 = fieldWeight in 2303, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.348619 = idf(docFreq=73, maxDocs=42306)
                0.09375 = fieldNorm(doc=2303)
          0.13045652 = weight(abstract_txt:metadata in 2303) [ClassicSimilarity], result of:
            0.13045652 = score(doc=2303,freq=2.0), product of:
              0.20036 = queryWeight, product of:
                2.1537826 = boost
                4.9109836 = idf(docFreq=846, maxDocs=42306)
                0.018942647 = queryNorm
              0.6511106 = fieldWeight in 2303, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.9109836 = idf(docFreq=846, maxDocs=42306)
                0.09375 = fieldNorm(doc=2303)
        0.24 = coord(6/25)
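
Note on the score trees: the arithmetic in the explain output above can be reproduced by hand. The following is a minimal Python sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), coord = matched terms / query terms). The function names and the TERMS_850 list are illustrative and not part of the catalog software; the numbers are copied from the tree for entry 1 (doc=850).

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency as assumed for ClassicSimilarity.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # One "weight(abstract_txt:<term> in <doc>)" node of the tree.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm        # "queryWeight, product of"
    field_weight = math.sqrt(freq) * term_idf * field_norm  # "fieldWeight in <doc>"
    return query_weight * field_weight

def document_score(terms, max_docs, query_norm, field_norm, query_terms):
    # Sum the matching term scores, then scale by coord(matched/query_terms).
    summed = sum(
        term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost)
        for freq, doc_freq, boost in terms
    )
    return summed * (len(terms) / query_terms)

# Values read from the explain tree of entry 1 (doc=850).
MAX_DOCS = 42306
QUERY_NORM = 0.018942647
FIELD_NORM_850 = 0.09375
TERMS_850 = [
    # (termFreq, docFreq, boost); boost is 1.0 where the tree lists none
    (1.0, 122, 1.0),         # error
    (1.0, 1571, 1.2550486),  # problems
    (2.0, 16, 1.289302),     # corrected
    (1.0, 3904, 1.4835192),  # data
    (1.0, 1062, 2.0541627),  # quality
    (5.0, 165, 3.8246894),   # errors
]

score_850 = document_score(TERMS_850, MAX_DOCS, QUERY_NORM, FIELD_NORM_850, query_terms=25)
print(score_850)
```

Up to floating-point rounding, the printed value should agree with the 0.27262744 listed for entry 1. The same functions apply to the other entries once their termFreq, docFreq, boost, and fieldNorm values are substituted.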