Document (#267)

Author
Hustand, S.
Title
Problems of duplicate records
Source
Future of online catalogs. Essen Symposium, 30.9.-3.10.1985. Ed. by A.H. Helal, J.W. Weiss
Imprint
Essen : Gesamthochschulbibliothek
Year
1986
Pages
pp. 169-202
Series
Veröffentlichungen der Gesamthochschulbibliothek Essen; 8
Abstract
Duplicate records are a familiar problem in bibliographic databases. The problem is obvious when a union catalogue is established by automatically merging two or more separate and independent sources of catalogue information. However, even in systems with on-line cataloguing and access to previous records, duplication is a problem. An author/title search prior to cataloguing does not cut duplication to zero. A great deal of effort has been put into developing methods of duplicate detection. A major problem in this work has been efficiency, which is particularly important in the on-line setting. Most studies have dealt with book and article material. The Research Libraries Group Inc. has also described matching algorithms for films, maps, recordings, scores and serials. Various methods of detecting duplicates are discussed.
Theme
Descriptive cataloguing (Formalerschließung)

Similar documents (content)

  1. Cousins, S.A.: Duplicate detection and record consolidation in large bibliographic databases : the COPAC database experience (1998) 0.35
    0.34705567 = sum of:
      0.34705567 = product of:
        1.2394845 = sum of:
          0.09982887 = weight(abstract_txt:detection in 2833) [ClassicSimilarity], result of:
            0.09982887 = score(doc=2833,freq=2.0), product of:
              0.13318351 = queryWeight, product of:
                1.1144685 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.017614972 = queryNorm
              0.7495588 = fieldWeight in 2833, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.078125 = fieldNorm(doc=2833)
          0.021405388 = weight(abstract_txt:been in 2833) [ClassicSimilarity], result of:
            0.021405388 = score(doc=2833,freq=1.0), product of:
              0.075738214 = queryWeight, product of:
                1.1885436 = boost
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.017614972 = queryNorm
              0.28262335 = fieldWeight in 2833, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.617579 = idf(docFreq=3226, maxDocs=44218)
                0.078125 = fieldNorm(doc=2833)
          0.083601825 = weight(abstract_txt:catalogue in 2833) [ClassicSimilarity], result of:
            0.083601825 = score(doc=2833,freq=2.0), product of:
              0.14908485 = queryWeight, product of:
                1.6675326 = boost
                5.0754814 = idf(docFreq=750, maxDocs=44218)
                0.017614972 = queryNorm
              0.56076676 = fieldWeight in 2833, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0754814 = idf(docFreq=750, maxDocs=44218)
                0.078125 = fieldNorm(doc=2833)
          0.10141016 = weight(abstract_txt:records in 2833) [ClassicSimilarity], result of:
            0.10141016 = score(doc=2833,freq=3.0), product of:
              0.16956803 = queryWeight, product of:
                2.1780868 = boost
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.017614972 = queryNorm
              0.59805 = fieldWeight in 2833, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.078125 = fieldNorm(doc=2833)
          0.24399936 = weight(abstract_txt:duplication in 2833) [ClassicSimilarity], result of:
            0.24399936 = score(doc=2833,freq=1.0), product of:
              0.38361195 = queryWeight, product of:
                2.6748757 = boost
                8.14154 = idf(docFreq=34, maxDocs=44218)
                0.017614972 = queryNorm
              0.6360578 = fieldWeight in 2833, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.14154 = idf(docFreq=34, maxDocs=44218)
                0.078125 = fieldNorm(doc=2833)
          0.080253445 = weight(abstract_txt:problem in 2833) [ClassicSimilarity], result of:
            0.080253445 = score(doc=2833,freq=1.0), product of:
              0.2302955 = queryWeight, product of:
                2.9309964 = boost
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.017614972 = queryNorm
              0.3484803 = fieldWeight in 2833, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.078125 = fieldNorm(doc=2833)
          0.6089855 = weight(abstract_txt:duplicate in 2833) [ClassicSimilarity], result of:
            0.6089855 = score(doc=2833,freq=3.0), product of:
              0.56022304 = queryWeight, product of:
                3.9589834 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.017614972 = queryNorm
              1.0870411 = fieldWeight in 2833, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.078125 = fieldNorm(doc=2833)
        0.28 = coord(7/25)
    
  2. Süle, G.: Problems of duplicate records, standards and quality control (1986) 0.28
    0.28322563 = sum of:
      0.28322563 = product of:
        1.4161282 = sum of:
          0.0717873 = weight(abstract_txt:cataloguing in 2060) [ClassicSimilarity], result of:
            0.0717873 = score(doc=2060,freq=1.0), product of:
              0.13559574 = queryWeight, product of:
                1.5903056 = boost
                4.840425 = idf(docFreq=949, maxDocs=44218)
                0.017614972 = queryNorm
              0.5294215 = fieldWeight in 2060, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.840425 = idf(docFreq=949, maxDocs=44218)
                0.109375 = fieldNorm(doc=2060)
          0.08276159 = weight(abstract_txt:catalogue in 2060) [ClassicSimilarity], result of:
            0.08276159 = score(doc=2060,freq=1.0), product of:
              0.14908485 = queryWeight, product of:
                1.6675326 = boost
                5.0754814 = idf(docFreq=750, maxDocs=44218)
                0.017614972 = queryNorm
              0.5551308 = fieldWeight in 2060, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0754814 = idf(docFreq=750, maxDocs=44218)
                0.109375 = fieldNorm(doc=2060)
          0.13513078 = weight(abstract_txt:line in 2060) [ClassicSimilarity], result of:
            0.13513078 = score(doc=2060,freq=1.0), product of:
              0.20672062 = queryWeight, product of:
                1.963584 = boost
                5.9765754 = idf(docFreq=304, maxDocs=44218)
                0.017614972 = queryNorm
              0.65368795 = fieldWeight in 2060, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9765754 = idf(docFreq=304, maxDocs=44218)
                0.109375 = fieldNorm(doc=2060)
          0.14197423 = weight(abstract_txt:records in 2060) [ClassicSimilarity], result of:
            0.14197423 = score(doc=2060,freq=3.0), product of:
              0.16956803 = queryWeight, product of:
                2.1780868 = boost
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.017614972 = queryNorm
              0.83727 = fieldWeight in 2060, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.109375 = fieldNorm(doc=2060)
          0.9844743 = weight(abstract_txt:duplicate in 2060) [ClassicSimilarity], result of:
            0.9844743 = score(doc=2060,freq=4.0), product of:
              0.56022304 = queryWeight, product of:
                3.9589834 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.017614972 = queryNorm
              1.7572899 = fieldWeight in 2060, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.109375 = fieldNorm(doc=2060)
        0.2 = coord(5/25)
    
  3. Sitas, A.; Kapidakis, S.: Duplicate detection algorithms of bibliographic descriptions (2008) 0.23
    0.22990862 = sum of:
      0.22990862 = product of:
        1.149543 = sum of:
          0.14117935 = weight(abstract_txt:detection in 2543) [ClassicSimilarity], result of:
            0.14117935 = score(doc=2543,freq=4.0), product of:
              0.13318351 = queryWeight, product of:
                1.1144685 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.017614972 = queryNorm
              1.0600363 = fieldWeight in 2543, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.078125 = fieldNorm(doc=2543)
          0.09682959 = weight(abstract_txt:merging in 2543) [ClassicSimilarity], result of:
            0.09682959 = score(doc=2543,freq=1.0), product of:
              0.16442269 = queryWeight, product of:
                1.238293 = boost
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.017614972 = queryNorm
              0.5889065 = fieldWeight in 2543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.078125 = fieldNorm(doc=2543)
          0.058549188 = weight(abstract_txt:records in 2543) [ClassicSimilarity], result of:
            0.058549188 = score(doc=2543,freq=1.0), product of:
              0.16956803 = queryWeight, product of:
                2.1780868 = boost
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.017614972 = queryNorm
              0.34528434 = fieldWeight in 2543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.078125 = fieldNorm(doc=2543)
          0.24399936 = weight(abstract_txt:duplication in 2543) [ClassicSimilarity], result of:
            0.24399936 = score(doc=2543,freq=1.0), product of:
              0.38361195 = queryWeight, product of:
                2.6748757 = boost
                8.14154 = idf(docFreq=34, maxDocs=44218)
                0.017614972 = queryNorm
              0.6360578 = fieldWeight in 2543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.14154 = idf(docFreq=34, maxDocs=44218)
                0.078125 = fieldNorm(doc=2543)
          0.6089855 = weight(abstract_txt:duplicate in 2543) [ClassicSimilarity], result of:
            0.6089855 = score(doc=2543,freq=3.0), product of:
              0.56022304 = queryWeight, product of:
                3.9589834 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.017614972 = queryNorm
              1.0870411 = fieldWeight in 2543, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.078125 = fieldNorm(doc=2543)
        0.2 = coord(5/25)
    
  4. Weiss, P.J.: Getting the expert into the system : expert systems and cataloging (1995) 0.20
    0.20392655 = sum of:
      0.20392655 = product of:
        1.0196327 = sum of:
          0.107162476 = weight(abstract_txt:serials in 2397) [ClassicSimilarity], result of:
            0.107162476 = score(doc=2397,freq=1.0), product of:
              0.12859917 = queryWeight, product of:
                1.0951198 = boost
                6.666449 = idf(docFreq=152, maxDocs=44218)
                0.017614972 = queryNorm
              0.83330613 = fieldWeight in 2397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.666449 = idf(docFreq=152, maxDocs=44218)
                0.125 = fieldNorm(doc=2397)
          0.11294348 = weight(abstract_txt:detection in 2397) [ClassicSimilarity], result of:
            0.11294348 = score(doc=2397,freq=1.0), product of:
              0.13318351 = queryWeight, product of:
                1.1144685 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.017614972 = queryNorm
              0.848029 = fieldWeight in 2397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.125 = fieldNorm(doc=2397)
          0.15492736 = weight(abstract_txt:merging in 2397) [ClassicSimilarity], result of:
            0.15492736 = score(doc=2397,freq=1.0), product of:
              0.16442269 = queryWeight, product of:
                1.238293 = boost
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.017614972 = queryNorm
              0.9422505 = fieldWeight in 2397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.125 = fieldNorm(doc=2397)
          0.08204263 = weight(abstract_txt:cataloguing in 2397) [ClassicSimilarity], result of:
            0.08204263 = score(doc=2397,freq=1.0), product of:
              0.13559574 = queryWeight, product of:
                1.5903056 = boost
                4.840425 = idf(docFreq=949, maxDocs=44218)
                0.017614972 = queryNorm
              0.6050531 = fieldWeight in 2397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.840425 = idf(docFreq=949, maxDocs=44218)
                0.125 = fieldNorm(doc=2397)
          0.56255674 = weight(abstract_txt:duplicate in 2397) [ClassicSimilarity], result of:
            0.56255674 = score(doc=2397,freq=1.0), product of:
              0.56022304 = queryWeight, product of:
                3.9589834 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.017614972 = queryNorm
              1.0041656 = fieldWeight in 2397, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.125 = fieldNorm(doc=2397)
        0.2 = coord(5/25)
    
  5. Conrad, J.G.; Schriber, C.P.: Managing déjà vu : collection building for the identification of nonidentical duplicate documents (2006) 0.20
    0.20021224 = sum of:
      0.20021224 = product of:
        1.0010612 = sum of:
          0.07058968 = weight(abstract_txt:detection in 5059) [ClassicSimilarity], result of:
            0.07058968 = score(doc=5059,freq=1.0), product of:
              0.13318351 = queryWeight, product of:
                1.1144685 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.017614972 = queryNorm
              0.53001815 = fieldWeight in 5059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.078125 = fieldNorm(doc=5059)
          0.03129925 = weight(abstract_txt:search in 5059) [ClassicSimilarity], result of:
            0.03129925 = score(doc=5059,freq=2.0), product of:
              0.07744242 = queryWeight, product of:
                1.201841 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.017614972 = queryNorm
              0.4041615 = fieldWeight in 5059, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.078125 = fieldNorm(doc=5059)
          0.03223986 = weight(abstract_txt:methods in 5059) [ClassicSimilarity], result of:
            0.03223986 = score(doc=5059,freq=1.0), product of:
              0.0995165 = queryWeight, product of:
                1.3624015 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.017614972 = queryNorm
              0.32396498 = fieldWeight in 5059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.078125 = fieldNorm(doc=5059)
          0.2579469 = weight(abstract_txt:duplicates in 5059) [ClassicSimilarity], result of:
            0.2579469 = score(doc=5059,freq=3.0), product of:
              0.21908003 = queryWeight, product of:
                1.4293677 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.017614972 = queryNorm
              1.1774095 = fieldWeight in 5059, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.078125 = fieldNorm(doc=5059)
          0.6089855 = weight(abstract_txt:duplicate in 5059) [ClassicSimilarity], result of:
            0.6089855 = score(doc=5059,freq=3.0), product of:
              0.56022304 = queryWeight, product of:
                3.9589834 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.017614972 = queryNorm
              1.0870411 = fieldWeight in 5059, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.078125 = fieldNorm(doc=5059)
        0.2 = coord(5/25)
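
Note on the score breakdowns above: each tree follows the Lucene ClassicSimilarity pattern, in which a term's score is queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), with tf = sqrt(freq), and the document score is the sum of the term scores multiplied by coord (matched terms / query terms). The following is a minimal sketch that recomputes the first entry (doc 2833) from the values printed in its tree. The function and variable names are illustrative assumptions, not part of any library, and fieldNorm and queryNorm are simply taken as given from the listing.

```python
import math


def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, boost, idf_value, query_norm, field_norm):
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf_value * query_norm
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    field_weight = math.sqrt(freq) * idf_value * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched, total_query_terms):
    # The summed term scores are scaled by coord = matched / total query terms.
    return sum(term_scores) * (matched / total_query_terms)


# Values copied from the explain tree of doc 2833 (entry 1 above).
QUERY_NORM = 0.017614972
FIELD_NORM_2833 = 0.078125

# (freq, boost, idf) for the seven matched terms, in the order listed:
# detection, been, catalogue, records, duplication, problem, duplicate.
TERMS_2833 = [
    (2.0, 1.1144685, 6.784232),
    (1.0, 1.1885436, 3.617579),
    (2.0, 1.6675326, 5.0754814),
    (3.0, 2.1780868, 4.4196396),
    (1.0, 2.6748757, 8.14154),
    (1.0, 2.9309964, 4.460548),
    (3.0, 3.9589834, 8.033325),
]

scores_2833 = [
    term_score(freq, boost, idf_value, QUERY_NORM, FIELD_NORM_2833)
    for freq, boost, idf_value in TERMS_2833
]
total_2833 = document_score(scores_2833, matched=7, total_query_terms=25)

if __name__ == "__main__":
    print(total_2833)
```

The recomputed total should agree with the listed 0.34705567 up to floating-point rounding (Lucene computes in single precision). Likewise, idf(38, 44218) should reproduce the idf shown for "duplicate" to about four decimal places.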