Document (#267)

Author
Hustand, S.
Title
Problems of duplicate records
Source
Future of online catalogs. Essen Symposium, 30.9.-3.10.1985. Ed. by A.H. Helal, J.W. Weiss
Imprint
Essen : Gesamthochschulbibliothek
Year
1986
Pages
pp. 169-202
Series
Veröffentlichungen der Gesamthochschulbibliothek Essen; 8
Abstract
Duplicate records are a familiar problem in bibliographic databases. The problem is obvious when a union catalogue is established by automatically merging two or more separate and independent sources of catalogue information. However, even in systems with on-line cataloguing and access to previous records, duplication is a problem. An author/title search prior to cataloguing does not cut duplication to zero. A great deal of effort has been put into developing methods of duplicate detection. A major problem in this work has been efficiency, which is particularly important in the on-line setting. Most studies have dealt with book and article material. The Research Libraries Group Inc. has also described matching algorithms for films, maps, recordings, scores and serials. Various methods of detecting duplicates are discussed.
Theme
Descriptive cataloguing (Formalerschließung)

Similar documents (content)

  1. Cousins, S.A.: Duplicate detection and record consolidation in large bibliographic databases : the COPAC database experience (1998) 0.35
    0.34555805 = sum of:
      0.34555805 = product of:
        1.2341359 = sum of:
          0.10010816 = weight(abstract_txt:detection in 3831) [ClassicSimilarity], result of:
            0.10010816 = score(doc=3831,freq=2.0), product of:
              0.1334141 = queryWeight, product of:
                1.1122857 = boost
                6.791454 = idf(docFreq=132, maxDocs=43556)
                0.0176613 = queryNorm
              0.7503567 = fieldWeight in 3831, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.791454 = idf(docFreq=132, maxDocs=43556)
                0.078125 = fieldNorm(doc=3831)
          0.021609213 = weight(abstract_txt:been in 3831) [ClassicSimilarity], result of:
            0.021609213 = score(doc=3831,freq=1.0), product of:
              0.076208144 = queryWeight, product of:
                1.1888613 = boost
                3.6295063 = idf(docFreq=3140, maxDocs=43556)
                0.0176613 = queryNorm
              0.28355518 = fieldWeight in 3831, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6295063 = idf(docFreq=3140, maxDocs=43556)
                0.078125 = fieldNorm(doc=3831)
          0.082956694 = weight(abstract_txt:catalogue in 3831) [ClassicSimilarity], result of:
            0.082956694 = score(doc=3831,freq=2.0), product of:
              0.14829722 = queryWeight, product of:
                1.6584294 = boost
                5.0630636 = idf(docFreq=748, maxDocs=43556)
                0.0176613 = queryNorm
              0.5593948 = fieldWeight in 3831, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0630636 = idf(docFreq=748, maxDocs=43556)
                0.078125 = fieldNorm(doc=3831)
          0.10167672 = weight(abstract_txt:records in 3831) [ClassicSimilarity], result of:
            0.10167672 = score(doc=3831,freq=3.0), product of:
              0.16984253 = queryWeight, product of:
                2.1736987 = boost
                4.424095 = idf(docFreq=1418, maxDocs=43556)
                0.0176613 = queryNorm
              0.5986529 = fieldWeight in 3831, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.424095 = idf(docFreq=1418, maxDocs=43556)
                0.078125 = fieldNorm(doc=3831)
          0.24254905 = weight(abstract_txt:duplication in 3831) [ClassicSimilarity], result of:
            0.24254905 = score(doc=3831,freq=1.0), product of:
              0.38203967 = queryWeight, product of:
                2.661857 = boost
                8.126454 = idf(docFreq=34, maxDocs=43556)
                0.0176613 = queryNorm
              0.63487923 = fieldWeight in 3831, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.126454 = idf(docFreq=34, maxDocs=43556)
                0.078125 = fieldNorm(doc=3831)
          0.079915404 = weight(abstract_txt:problem in 3831) [ClassicSimilarity], result of:
            0.079915404 = score(doc=3831,freq=1.0), product of:
              0.2296179 = queryWeight, product of:
                2.9184237 = boost
                4.454867 = idf(docFreq=1375, maxDocs=43556)
                0.0176613 = queryNorm
              0.34803647 = fieldWeight in 3831, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.454867 = idf(docFreq=1375, maxDocs=43556)
                0.078125 = fieldNorm(doc=3831)
          0.60532063 = weight(abstract_txt:duplicate in 3831) [ClassicSimilarity], result of:
            0.60532063 = score(doc=3831,freq=3.0), product of:
              0.5578992 = queryWeight, product of:
                3.9396167 = boost
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.0176613 = queryNorm
              1.085 = fieldWeight in 3831, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.078125 = fieldNorm(doc=3831)
        0.28 = coord(7/25)
    
  2. Süle, G.: Problems of duplicate records, standards and quality control (1986) 0.28
    0.28168008 = sum of:
      0.28168008 = product of:
        1.4084003 = sum of:
          0.071323454 = weight(abstract_txt:cataloguing in 4058) [ClassicSimilarity], result of:
            0.071323454 = score(doc=4058,freq=1.0), product of:
              0.13499312 = queryWeight, product of:
                1.5822908 = boost
                4.830618 = idf(docFreq=944, maxDocs=43556)
                0.0176613 = queryNorm
              0.5283488 = fieldWeight in 4058, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.830618 = idf(docFreq=944, maxDocs=43556)
                0.109375 = fieldNorm(doc=4058)
          0.08212293 = weight(abstract_txt:catalogue in 4058) [ClassicSimilarity], result of:
            0.08212293 = score(doc=4058,freq=1.0), product of:
              0.14829722 = queryWeight, product of:
                1.6584294 = boost
                5.0630636 = idf(docFreq=748, maxDocs=43556)
                0.0176613 = queryNorm
              0.55377257 = fieldWeight in 4058, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0630636 = idf(docFreq=748, maxDocs=43556)
                0.109375 = fieldNorm(doc=4058)
          0.13405687 = weight(abstract_txt:line in 4058) [ClassicSimilarity], result of:
            0.13405687 = score(doc=4058,freq=1.0), product of:
              0.20559667 = queryWeight, product of:
                1.9527134 = boost
                5.961491 = idf(docFreq=304, maxDocs=43556)
                0.0176613 = queryNorm
              0.6520381 = fieldWeight in 4058, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.961491 = idf(docFreq=304, maxDocs=43556)
                0.109375 = fieldNorm(doc=4058)
          0.14234741 = weight(abstract_txt:records in 4058) [ClassicSimilarity], result of:
            0.14234741 = score(doc=4058,freq=3.0), product of:
              0.16984253 = queryWeight, product of:
                2.1736987 = boost
                4.424095 = idf(docFreq=1418, maxDocs=43556)
                0.0176613 = queryNorm
              0.8381141 = fieldWeight in 4058, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.424095 = idf(docFreq=1418, maxDocs=43556)
                0.109375 = fieldNorm(doc=4058)
          0.97854966 = weight(abstract_txt:duplicate in 4058) [ClassicSimilarity], result of:
            0.97854966 = score(doc=4058,freq=4.0), product of:
              0.5578992 = queryWeight, product of:
                3.9396167 = boost
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.0176613 = queryNorm
              1.7539902 = fieldWeight in 4058, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.109375 = fieldNorm(doc=4058)
        0.2 = coord(5/25)
    
  3. Sitas, A.; Kapidakis, S.: Duplicate detection algorithms of bibliographic descriptions (2008) 0.23
    0.2291163 = sum of:
      0.2291163 = product of:
        1.1455815 = sum of:
          0.14157432 = weight(abstract_txt:detection in 4541) [ClassicSimilarity], result of:
            0.14157432 = score(doc=4541,freq=4.0), product of:
              0.1334141 = queryWeight, product of:
                1.1122857 = boost
                6.791454 = idf(docFreq=132, maxDocs=43556)
                0.0176613 = queryNorm
              1.0611646 = fieldWeight in 4541, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.791454 = idf(docFreq=132, maxDocs=43556)
                0.078125 = fieldNorm(doc=4541)
          0.097434446 = weight(abstract_txt:merging in 4541) [ClassicSimilarity], result of:
            0.097434446 = score(doc=4541,freq=1.0), product of:
              0.1650848 = queryWeight, product of:
                1.237283 = boost
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.0176613 = queryNorm
              0.5902085 = fieldWeight in 4541, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.078125 = fieldNorm(doc=4541)
          0.058703087 = weight(abstract_txt:records in 4541) [ClassicSimilarity], result of:
            0.058703087 = score(doc=4541,freq=1.0), product of:
              0.16984253 = queryWeight, product of:
                2.1736987 = boost
                4.424095 = idf(docFreq=1418, maxDocs=43556)
                0.0176613 = queryNorm
              0.34563243 = fieldWeight in 4541, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.424095 = idf(docFreq=1418, maxDocs=43556)
                0.078125 = fieldNorm(doc=4541)
          0.24254905 = weight(abstract_txt:duplication in 4541) [ClassicSimilarity], result of:
            0.24254905 = score(doc=4541,freq=1.0), product of:
              0.38203967 = queryWeight, product of:
                2.661857 = boost
                8.126454 = idf(docFreq=34, maxDocs=43556)
                0.0176613 = queryNorm
              0.63487923 = fieldWeight in 4541, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.126454 = idf(docFreq=34, maxDocs=43556)
                0.078125 = fieldNorm(doc=4541)
          0.60532063 = weight(abstract_txt:duplicate in 4541) [ClassicSimilarity], result of:
            0.60532063 = score(doc=4541,freq=3.0), product of:
              0.5578992 = queryWeight, product of:
                3.9396167 = boost
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.0176613 = queryNorm
              1.085 = fieldWeight in 4541, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.078125 = fieldNorm(doc=4541)
        0.2 = coord(5/25)
    
  4. Weiss, P.J.: Getting the expert into the system : expert systems and cataloging (1995) 0.20
    0.20330954 = sum of:
      0.20330954 = product of:
        1.0165477 = sum of:
          0.106709346 = weight(abstract_txt:serials in 2463) [ClassicSimilarity], result of:
            0.106709346 = score(doc=2463,freq=1.0), product of:
              0.1282194 = queryWeight, product of:
                1.0904163 = boost
                6.6579223 = idf(docFreq=151, maxDocs=43556)
                0.0176613 = queryNorm
              0.8322403 = fieldWeight in 2463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6579223 = idf(docFreq=151, maxDocs=43556)
                0.125 = fieldNorm(doc=2463)
          0.113259465 = weight(abstract_txt:detection in 2463) [ClassicSimilarity], result of:
            0.113259465 = score(doc=2463,freq=1.0), product of:
              0.1334141 = queryWeight, product of:
                1.1122857 = boost
                6.791454 = idf(docFreq=132, maxDocs=43556)
                0.0176613 = queryNorm
              0.8489317 = fieldWeight in 2463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.791454 = idf(docFreq=132, maxDocs=43556)
                0.125 = fieldNorm(doc=2463)
          0.15589511 = weight(abstract_txt:merging in 2463) [ClassicSimilarity], result of:
            0.15589511 = score(doc=2463,freq=1.0), product of:
              0.1650848 = queryWeight, product of:
                1.237283 = boost
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.0176613 = queryNorm
              0.94433355 = fieldWeight in 2463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.125 = fieldNorm(doc=2463)
          0.081512526 = weight(abstract_txt:cataloguing in 2463) [ClassicSimilarity], result of:
            0.081512526 = score(doc=2463,freq=1.0), product of:
              0.13499312 = queryWeight, product of:
                1.5822908 = boost
                4.830618 = idf(docFreq=944, maxDocs=43556)
                0.0176613 = queryNorm
              0.60382724 = fieldWeight in 2463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.830618 = idf(docFreq=944, maxDocs=43556)
                0.125 = fieldNorm(doc=2463)
          0.55917126 = weight(abstract_txt:duplicate in 2463) [ClassicSimilarity], result of:
            0.55917126 = score(doc=2463,freq=1.0), product of:
              0.5578992 = queryWeight, product of:
                3.9396167 = boost
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.0176613 = queryNorm
              1.0022801 = fieldWeight in 2463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.125 = fieldNorm(doc=2463)
        0.2 = coord(5/25)
    
  5. Conrad, J.G.; Schriber, C.P.: Managing déjà vu : collection building for the identification of nonidentical duplicate documents (2006) 0.20
    0.20115171 = sum of:
      0.20115171 = product of:
        1.0057585 = sum of:
          0.07078716 = weight(abstract_txt:detection in 57) [ClassicSimilarity], result of:
            0.07078716 = score(doc=57,freq=1.0), product of:
              0.1334141 = queryWeight, product of:
                1.1122857 = boost
                6.791454 = idf(docFreq=132, maxDocs=43556)
                0.0176613 = queryNorm
              0.5305823 = fieldWeight in 57, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.791454 = idf(docFreq=132, maxDocs=43556)
                0.078125 = fieldNorm(doc=57)
          0.031199642 = weight(abstract_txt:search in 57) [ClassicSimilarity], result of:
            0.031199642 = score(doc=57,freq=2.0), product of:
              0.07726779 = queryWeight, product of:
                1.197098 = boost
                3.6546526 = idf(docFreq=3062, maxDocs=43556)
                0.0176613 = queryNorm
              0.40378588 = fieldWeight in 57, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6546526 = idf(docFreq=3062, maxDocs=43556)
                0.078125 = fieldNorm(doc=57)
          0.032497536 = weight(abstract_txt:methods in 57) [ClassicSimilarity], result of:
            0.032497536 = score(doc=57,freq=1.0), product of:
              0.100032784 = queryWeight, product of:
                1.362077 = boost
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.0176613 = queryNorm
              0.32486886 = fieldWeight in 57, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.078125 = fieldNorm(doc=57)
          0.26595348 = weight(abstract_txt:duplicates in 57) [ClassicSimilarity], result of:
            0.26595348 = score(doc=57,freq=3.0), product of:
              0.2235607 = queryWeight, product of:
                1.4398366 = boost
                8.791431 = idf(docFreq=17, maxDocs=43556)
                0.0176613 = queryNorm
              1.1896254 = fieldWeight in 57, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.791431 = idf(docFreq=17, maxDocs=43556)
                0.078125 = fieldNorm(doc=57)
          0.60532063 = weight(abstract_txt:duplicate in 57) [ClassicSimilarity], result of:
            0.60532063 = score(doc=57,freq=3.0), product of:
              0.5578992 = queryWeight, product of:
                3.9396167 = boost
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.0176613 = queryNorm
              1.085 = fieldWeight in 57, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.078125 = fieldNorm(doc=57)
        0.2 = coord(5/25)
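
How to read the score trees above: each matching term contributes queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), where tf is the square root of the term frequency; the contributions are summed and multiplied by coord (matching terms / query terms). The following is a minimal Python sketch that recomputes the first entry (doc 3831) from the values listed in its tree. The function and variable names are illustrative and not part of the catalogue software, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is an assumption that happens to reproduce the listed idf values. Small differences from the listed numbers are expected because the search engine works in single precision.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed classic TF-IDF form; it reproduces e.g. idf(docFreq=132, maxDocs=43556).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, term_idf, boost, query_norm, field_norm):
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * term_idf * query_norm
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq)
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

def document_score(terms, query_norm, field_norm, total_query_terms):
    # Sum the per-term scores, then scale by coord = matched terms / query terms.
    total = sum(term_score(f, i, b, query_norm, field_norm) for _, f, i, b in terms)
    coord = len(terms) / total_query_terms
    return coord * total

QUERY_NORM = 0.0176613
FIELD_NORM_3831 = 0.078125

# (term, termFreq, idf, boost) copied from the tree for doc 3831
TERMS_3831 = [
    ("detection",   2.0, 6.791454,  1.1122857),
    ("been",        1.0, 3.6295063, 1.1888613),
    ("catalogue",   2.0, 5.0630636, 1.6584294),
    ("records",     3.0, 4.424095,  2.1736987),
    ("duplication", 1.0, 8.126454,  2.661857),
    ("problem",     1.0, 4.454867,  2.9184237),
    ("duplicate",   3.0, 8.018241,  3.9396167),
]

score_3831 = document_score(TERMS_3831, QUERY_NORM, FIELD_NORM_3831, 25)
print(f"doc 3831: {score_3831:.4f}")  # close to the listed 0.34555805
```

The same function applies to the other entries once their term rows and fieldNorm are substituted; for example, entry 2 (doc 4058) uses fieldNorm 0.109375 and coord 5/25.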