Document (#21381)

Author
Meir, D.D.
Lazinger, S.S.
Title
Measuring the performance of a merging algorithm : mismatches, missed-matches, and overlap in Israel's union list
Source
Information technology and libraries. 17(1998) no.3, p.116-123
Year
1998
Abstract
Reports results of a survey, undertaken in 1996, to measure the performance of the merging algorithm used to generate the now defunct ALEPH ULM (Union List of Monographs) file. Results showed that although the algorithm created almost no mismatches that would have led to the loss of information, it had a greater proportion of missed matches than was anticipated, especially when matching Hebrew bibliographic records. Discusses the central issues inherent in automatic detection and merging of duplicate records, as well as the main methodologies for measuring the performance of merging algorithms. Recommendations include integrating testing procedures into the initial specifications for any future algorithms and deciding on a performance threshold that the algorithm must exceed in order to be put to use.
Theme
Formalerschließung
Object
ALEPH ULM
Location
Israel

Similar documents (author)

  1. Lazinger, S.S.: To merge or not to merge : Israel's Union List of Monographs in the context of merging algorithms (1994) 5.86
    5.8620114 = sum of:
      5.8620114 = weight(author_txt:lazinger in 3100) [ClassicSimilarity], result of:
        5.8620114 = fieldWeight in 3100, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.379218 = idf(docFreq=9, maxDocs=43556)
          0.625 = fieldNorm(doc=3100)
    
  2. Lazinger, S.S.: Digital preservation and metadata : history, theory, practice (2002) 5.86
    5.8620114 = sum of:
      5.8620114 = weight(author_txt:lazinger in 2260) [ClassicSimilarity], result of:
        5.8620114 = fieldWeight in 2260, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.379218 = idf(docFreq=9, maxDocs=43556)
          0.625 = fieldNorm(doc=2260)
    
  3. Lazinger, S.S.: LC Classification of a library and information science library for maximum shelf retrieval (1984) 5.86
    5.8620114 = sum of:
      5.8620114 = weight(author_txt:lazinger in 1462) [ClassicSimilarity], result of:
        5.8620114 = fieldWeight in 1462, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.379218 = idf(docFreq=9, maxDocs=43556)
          0.625 = fieldNorm(doc=1462)
    
  4. Lazinger, S.S.; Peritz, B.C.: Reader use of a nationwide research library network : local OPAC vs. remote files (1991) 4.69
    4.689609 = sum of:
      4.689609 = weight(author_txt:lazinger in 4011) [ClassicSimilarity], result of:
        4.689609 = fieldWeight in 4011, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.379218 = idf(docFreq=9, maxDocs=43556)
          0.5 = fieldNorm(doc=4011)
    
  5. Shoham, S.; Lazinger, S.S.: ¬The no-main-entry principle and the automated catalog (1991) 4.69
    4.689609 = sum of:
      4.689609 = weight(author_txt:lazinger in 1630) [ClassicSimilarity], result of:
        4.689609 = fieldWeight in 1630, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.379218 = idf(docFreq=9, maxDocs=43556)
          0.5 = fieldNorm(doc=1630)
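
The author scores above can be reproduced by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))); the idf form is inferred from the numbers shown, and the function names are illustrative rather than part of any library API.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency in the form that matches the listed
    # values, e.g. idf(docFreq=9, maxDocs=43556) = 9.379218.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(term_freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq).
    tf = math.sqrt(term_freq)
    return tf * idf(doc_freq, max_docs) * field_norm

# Records 1-3 (fieldNorm 0.625) and records 4-5 (fieldNorm 0.5).
higher_norm = field_weight(1.0, 9, 43556, 0.625)
lower_norm = field_weight(1.0, 9, 43556, 0.5)
print(higher_norm, lower_norm)
```

The only input that differs between the two groups is fieldNorm, which is lower for the two records that list a co-author. This is consistent with length normalization of the author field, although the listing itself does not state how fieldNorm was derived.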
    

Similar documents (content)

  1. Lazinger, S.S.: To merge or not to merge : Israel's Union List of Monographs in the context of merging algorithms (1994) 0.32
    0.323422 = sum of:
      0.323422 = product of:
        1.1550786 = sum of:
          0.15389912 = weight(abstract_txt:monographs in 3100) [ClassicSimilarity], result of:
            0.15389912 = score(doc=3100,freq=4.0), product of:
              0.11428517 = queryWeight, product of:
                1.037916 = boost
                7.181993 = idf(docFreq=89, maxDocs=43556)
                0.015331432 = queryNorm
              1.3466237 = fieldWeight in 3100, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.181993 = idf(docFreq=89, maxDocs=43556)
                0.09375 = fieldNorm(doc=3100)
          0.093229584 = weight(abstract_txt:aleph in 3100) [ClassicSimilarity], result of:
            0.093229584 = score(doc=3100,freq=1.0), product of:
              0.1298838 = queryWeight, product of:
                1.106483 = boost
                7.656451 = idf(docFreq=55, maxDocs=43556)
                0.015331432 = queryNorm
              0.7177923 = fieldWeight in 3100, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.656451 = idf(docFreq=55, maxDocs=43556)
                0.09375 = fieldNorm(doc=3100)
          0.035972826 = weight(abstract_txt:records in 3100) [ClassicSimilarity], result of:
            0.035972826 = score(doc=3100,freq=1.0), product of:
              0.086731896 = queryWeight, product of:
                1.2787088 = boost
                4.424095 = idf(docFreq=1418, maxDocs=43556)
                0.015331432 = queryNorm
              0.41475892 = fieldWeight in 3100, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.424095 = idf(docFreq=1418, maxDocs=43556)
                0.09375 = fieldNorm(doc=3100)
          0.13012294 = weight(abstract_txt:list in 3100) [ClassicSimilarity], result of:
            0.13012294 = score(doc=3100,freq=4.0), product of:
              0.12874936 = queryWeight, product of:
                1.5579545 = boost
                5.3902335 = idf(docFreq=539, maxDocs=43556)
                0.015331432 = queryNorm
              1.0106688 = fieldWeight in 3100, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.3902335 = idf(docFreq=539, maxDocs=43556)
                0.09375 = fieldNorm(doc=3100)
          0.22826025 = weight(abstract_txt:union in 3100) [ClassicSimilarity], result of:
            0.22826025 = score(doc=3100,freq=6.0), product of:
              0.16359332 = queryWeight, product of:
                1.7561638 = boost
                6.0760007 = idf(docFreq=271, maxDocs=43556)
                0.015331432 = queryNorm
              1.3952907 = fieldWeight in 3100, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.0760007 = idf(docFreq=271, maxDocs=43556)
                0.09375 = fieldNorm(doc=3100)
          0.15535118 = weight(abstract_txt:algorithm in 3100) [ClassicSimilarity], result of:
            0.15535118 = score(doc=3100,freq=1.0), product of:
              0.28978917 = queryWeight, product of:
                3.3055089 = boost
                5.7182236 = idf(docFreq=388, maxDocs=43556)
                0.015331432 = queryNorm
              0.53608346 = fieldWeight in 3100, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7182236 = idf(docFreq=388, maxDocs=43556)
                0.09375 = fieldNorm(doc=3100)
          0.35824272 = weight(abstract_txt:merging in 3100) [ClassicSimilarity], result of:
            0.35824272 = score(doc=3100,freq=1.0), product of:
              0.50581384 = queryWeight, product of:
                4.3670945 = boost
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.015331432 = queryNorm
              0.70825016 = fieldWeight in 3100, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.09375 = fieldNorm(doc=3100)
        0.28 = coord(7/25)
    
  2. Paltoglou, G.; Salampasis, M.; Satratzemi, M.: ¬A results merging algorithm for distributed information retrieval environments that combines regression methodologies with a selective download phase (2008) 0.22
    0.21895248 = sum of:
      0.21895248 = product of:
        0.7819731 = sum of:
          0.04675478 = weight(abstract_txt:overlap in 4109) [ClassicSimilarity], result of:
            0.04675478 = score(doc=4109,freq=1.0), product of:
              0.107431255 = queryWeight, product of:
                1.0063118 = boost
                6.963304 = idf(docFreq=111, maxDocs=43556)
                0.015331432 = queryNorm
              0.4352065 = fieldWeight in 4109, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.963304 = idf(docFreq=111, maxDocs=43556)
                0.0625 = fieldNorm(doc=4109)
          0.0118327355 = weight(abstract_txt:results in 4109) [ClassicSimilarity], result of:
            0.0118327355 = score(doc=4109,freq=1.0), product of:
              0.05415602 = queryWeight, product of:
                1.0104285 = boost
                3.4958951 = idf(docFreq=3589, maxDocs=43556)
                0.015331432 = queryNorm
              0.21849345 = fieldWeight in 4109, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4958951 = idf(docFreq=3589, maxDocs=43556)
                0.0625 = fieldNorm(doc=4109)
          0.011225342 = weight(abstract_txt:that in 4109) [ClassicSimilarity], result of:
            0.011225342 = score(doc=4109,freq=4.0), product of:
              0.0377051 = queryWeight, product of:
                1.03259 = boost
                2.3817132 = idf(docFreq=10938, maxDocs=43556)
                0.015331432 = queryNorm
              0.29771414 = fieldWeight in 4109, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3817132 = idf(docFreq=10938, maxDocs=43556)
                0.0625 = fieldNorm(doc=4109)
          0.07393313 = weight(abstract_txt:algorithms in 4109) [ClassicSimilarity], result of:
            0.07393313 = score(doc=4109,freq=2.0), product of:
              0.1458163 = queryWeight, product of:
                1.6580029 = boost
                5.736382 = idf(docFreq=381, maxDocs=43556)
                0.015331432 = queryNorm
              0.5070293 = fieldWeight in 4109, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.736382 = idf(docFreq=381, maxDocs=43556)
                0.0625 = fieldNorm(doc=4109)
          0.07809755 = weight(abstract_txt:performance in 4109) [ClassicSimilarity], result of:
            0.07809755 = score(doc=4109,freq=2.0), product of:
              0.19055262 = queryWeight, product of:
                2.6804314 = boost
                4.6368976 = idf(docFreq=1146, maxDocs=43556)
                0.015331432 = queryNorm
              0.4098477 = fieldWeight in 4109, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6368976 = idf(docFreq=1146, maxDocs=43556)
                0.0625 = fieldNorm(doc=4109)
          0.1464665 = weight(abstract_txt:algorithm in 4109) [ClassicSimilarity], result of:
            0.1464665 = score(doc=4109,freq=2.0), product of:
              0.28978917 = queryWeight, product of:
                3.3055089 = boost
                5.7182236 = idf(docFreq=388, maxDocs=43556)
                0.015331432 = queryNorm
              0.5054243 = fieldWeight in 4109, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7182236 = idf(docFreq=388, maxDocs=43556)
                0.0625 = fieldNorm(doc=4109)
          0.4136631 = weight(abstract_txt:merging in 4109) [ClassicSimilarity], result of:
            0.4136631 = score(doc=4109,freq=3.0), product of:
              0.50581384 = queryWeight, product of:
                4.3670945 = boost
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.015331432 = queryNorm
              0.81781685 = fieldWeight in 4109, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.0625 = fieldNorm(doc=4109)
        0.28 = coord(7/25)
    
  3. Tsai, M.-F.; Chen, H.-H.; Wang, Y.-T.: Learning a merge model for multilingual information retrieval (2011) 0.20
    0.19569719 = sum of:
      0.19569719 = product of:
        0.97848594 = sum of:
          0.0118327355 = weight(abstract_txt:results in 4748) [ClassicSimilarity], result of:
            0.0118327355 = score(doc=4748,freq=1.0), product of:
              0.05415602 = queryWeight, product of:
                1.0104285 = boost
                3.4958951 = idf(docFreq=3589, maxDocs=43556)
                0.015331432 = queryNorm
              0.21849345 = fieldWeight in 4748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4958951 = idf(docFreq=3589, maxDocs=43556)
                0.0625 = fieldNorm(doc=4748)
          0.009721431 = weight(abstract_txt:that in 4748) [ClassicSimilarity], result of:
            0.009721431 = score(doc=4748,freq=3.0), product of:
              0.0377051 = queryWeight, product of:
                1.03259 = boost
                2.3817132 = idf(docFreq=10938, maxDocs=43556)
                0.015331432 = queryNorm
              0.257828 = fieldWeight in 4748, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3817132 = idf(docFreq=10938, maxDocs=43556)
                0.0625 = fieldNorm(doc=4748)
          0.055223312 = weight(abstract_txt:performance in 4748) [ClassicSimilarity], result of:
            0.055223312 = score(doc=4748,freq=1.0), product of:
              0.19055262 = queryWeight, product of:
                2.6804314 = boost
                4.6368976 = idf(docFreq=1146, maxDocs=43556)
                0.015331432 = queryNorm
              0.2898061 = fieldWeight in 4748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6368976 = idf(docFreq=1146, maxDocs=43556)
                0.0625 = fieldNorm(doc=4748)
          0.1464665 = weight(abstract_txt:algorithm in 4748) [ClassicSimilarity], result of:
            0.1464665 = score(doc=4748,freq=2.0), product of:
              0.28978917 = queryWeight, product of:
                3.3055089 = boost
                5.7182236 = idf(docFreq=388, maxDocs=43556)
                0.015331432 = queryNorm
              0.5054243 = fieldWeight in 4748, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7182236 = idf(docFreq=388, maxDocs=43556)
                0.0625 = fieldNorm(doc=4748)
          0.755242 = weight(abstract_txt:merging in 4748) [ClassicSimilarity], result of:
            0.755242 = score(doc=4748,freq=10.0), product of:
              0.50581384 = queryWeight, product of:
                4.3670945 = boost
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.015331432 = queryNorm
              1.4931225 = fieldWeight in 4748, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.0625 = fieldNorm(doc=4748)
        0.2 = coord(5/25)
    
  4. Sitas, A.; Kapidakis, S.: Duplicate detection algorithms of bibliographic descriptions (2008) 0.13
    0.12693675 = sum of:
      0.12693675 = product of:
        0.6346837 = sum of:
          0.020917518 = weight(abstract_txt:results in 4541) [ClassicSimilarity], result of:
            0.020917518 = score(doc=4541,freq=2.0), product of:
              0.05415602 = queryWeight, product of:
                1.0104285 = boost
                3.4958951 = idf(docFreq=3589, maxDocs=43556)
                0.015331432 = queryNorm
              0.3862455 = fieldWeight in 4541, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4958951 = idf(docFreq=3589, maxDocs=43556)
                0.078125 = fieldNorm(doc=4541)
          0.15455674 = weight(abstract_txt:duplicate in 4541) [ClassicSimilarity], result of:
            0.15455674 = score(doc=4541,freq=3.0), product of:
              0.1424486 = queryWeight, product of:
                1.1587676 = boost
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.015331432 = queryNorm
              1.085 = fieldWeight in 4541, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.078125 = fieldNorm(doc=4541)
          0.029977357 = weight(abstract_txt:records in 4541) [ClassicSimilarity], result of:
            0.029977357 = score(doc=4541,freq=1.0), product of:
              0.086731896 = queryWeight, product of:
                1.2787088 = boost
                4.424095 = idf(docFreq=1418, maxDocs=43556)
                0.015331432 = queryNorm
              0.34563243 = fieldWeight in 4541, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.424095 = idf(docFreq=1418, maxDocs=43556)
                0.078125 = fieldNorm(doc=4541)
          0.13069655 = weight(abstract_txt:algorithms in 4541) [ClassicSimilarity], result of:
            0.13069655 = score(doc=4541,freq=4.0), product of:
              0.1458163 = queryWeight, product of:
                1.6580029 = boost
                5.736382 = idf(docFreq=381, maxDocs=43556)
                0.015331432 = queryNorm
              0.8963097 = fieldWeight in 4541, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.736382 = idf(docFreq=381, maxDocs=43556)
                0.078125 = fieldNorm(doc=4541)
          0.29853562 = weight(abstract_txt:merging in 4541) [ClassicSimilarity], result of:
            0.29853562 = score(doc=4541,freq=1.0), product of:
              0.50581384 = queryWeight, product of:
                4.3670945 = boost
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.015331432 = queryNorm
              0.5902085 = fieldWeight in 4541, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.078125 = fieldNorm(doc=4541)
        0.2 = coord(5/25)
    
  5. Hustand, S.: Problems of duplicate records (1986) 0.12
    0.1220258 = sum of:
      0.1220258 = product of:
        0.610129 = sum of:
          0.12619506 = weight(abstract_txt:duplicate in 266) [ClassicSimilarity], result of:
            0.12619506 = score(doc=266,freq=2.0), product of:
              0.1424486 = queryWeight, product of:
                1.1587676 = boost
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.015331432 = queryNorm
              0.8858988 = fieldWeight in 266, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.018241 = idf(docFreq=38, maxDocs=43556)
                0.078125 = fieldNorm(doc=266)
          0.04239438 = weight(abstract_txt:records in 266) [ClassicSimilarity], result of:
            0.04239438 = score(doc=266,freq=2.0), product of:
              0.086731896 = queryWeight, product of:
                1.2787088 = boost
                4.424095 = idf(docFreq=1418, maxDocs=43556)
                0.015331432 = queryNorm
              0.48879805 = fieldWeight in 266, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.424095 = idf(docFreq=1418, maxDocs=43556)
                0.078125 = fieldNorm(doc=266)
          0.065348275 = weight(abstract_txt:algorithms in 266) [ClassicSimilarity], result of:
            0.065348275 = score(doc=266,freq=1.0), product of:
              0.1458163 = queryWeight, product of:
                1.6580029 = boost
                5.736382 = idf(docFreq=381, maxDocs=43556)
                0.015331432 = queryNorm
              0.44815484 = fieldWeight in 266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.736382 = idf(docFreq=381, maxDocs=43556)
                0.078125 = fieldNorm(doc=266)
          0.07765571 = weight(abstract_txt:union in 266) [ClassicSimilarity], result of:
            0.07765571 = score(doc=266,freq=1.0), product of:
              0.16359332 = queryWeight, product of:
                1.7561638 = boost
                6.0760007 = idf(docFreq=271, maxDocs=43556)
                0.015331432 = queryNorm
              0.47468755 = fieldWeight in 266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0760007 = idf(docFreq=271, maxDocs=43556)
                0.078125 = fieldNorm(doc=266)
          0.29853562 = weight(abstract_txt:merging in 266) [ClassicSimilarity], result of:
            0.29853562 = score(doc=266,freq=1.0), product of:
              0.50581384 = queryWeight, product of:
                4.3670945 = boost
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.015331432 = queryNorm
              0.5902085 = fieldWeight in 266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5546684 = idf(docFreq=61, maxDocs=43556)
                0.078125 = fieldNorm(doc=266)
        0.2 = coord(5/25)
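
The content scores combine per-term weights with a coordination factor. As a hedged sketch, the final entry (doc 266) can be recomputed from the boost, docFreq, termFreq, fieldNorm and queryNorm values listed in its tree, assuming the classic Lucene TF-IDF formulas. The helper names and the dictionary layout are illustrative assumptions, not the search engine's own code.

```python
import math

MAX_DOCS = 43556
QUERY_NORM = 0.015331432
QUERY_TERMS = 25        # denominator of coord(5/25)
FIELD_NORM = 0.078125   # fieldNorm(doc=266)

def idf(doc_freq, max_docs=MAX_DOCS):
    # idf form inferred from the listed values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost, doc_freq, term_freq, field_norm):
    # score = queryWeight * fieldWeight
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(term_freq) * term_idf * field_norm
    return query_weight * field_weight

# term: (boost, docFreq, termFreq), copied from the tree for doc 266.
matched = {
    "duplicate": (1.1587676, 38, 2.0),
    "records": (1.2787088, 1418, 2.0),
    "algorithms": (1.6580029, 381, 1.0),
    "union": (1.7561638, 271, 1.0),
    "merging": (4.3670945, 61, 1.0),
}

raw_sum = sum(term_score(b, df, tf, FIELD_NORM) for b, df, tf in matched.values())
coord = len(matched) / QUERY_TERMS  # 5 of 25 query terms matched
total = coord * raw_sum
print(raw_sum, coord, total)
```

Because coord scales the whole sum, a record matching 7 of the 25 query terms (coord 0.28) keeps a larger share of its raw sum than one matching 5 (coord 0.2). That is why the first entry outranks the third even though its raw sum (1.1550786) is not much larger than the third's (0.97848594).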