Document (#21383)

Author
Meir, D.D.
Lazinger, S.S.
Title
Measuring the performance of a merging algorithm : mismatches, missed-matches, and overlap in Israel's union list
Source
Information technology and libraries. 17(1998) no.3, pp. 116-123
Year
1998
Abstract
Reports results of a survey, undertaken in 1996, to measure the performance of the merging algorithm used to generate the now defunct ALEPH ULM (Union List of Monographs) file. Results showed that although the algorithm created almost no mismatches that would have led to the loss of information, it had a greater proportion of missed matches than was anticipated, especially when matching Hebrew bibliographic records. Discusses the central issues inherent in automatic detection and merging of duplicate records, as well as the main methodologies for measuring the performance of merging algorithms. Recommendations include integrating testing procedures into the initial specifications for any future algorithms and deciding on a performance threshold that the algorithm must exceed in order to be put to use.
Theme
Descriptive cataloguing (Formalerschließung)
Object
ALEPH ULM
Location
Israel

Similar documents (author)

  1. Lazinger, S.S.: To merge or not to merge : Israel's Union List of Monographs in the context of merging algorithms (1994) 5.87
    5.871439 = sum of:
      5.871439 = weight(author_txt:lazinger in 3100) [ClassicSimilarity], result of:
        5.871439 = fieldWeight in 3100, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.625 = fieldNorm(doc=3100)
    
  2. Lazinger, S.S.: Digital preservation and metadata : history, theory, practice (2002) 5.87
    5.871439 = sum of:
      5.871439 = weight(author_txt:lazinger in 1262) [ClassicSimilarity], result of:
        5.871439 = fieldWeight in 1262, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.625 = fieldNorm(doc=1262)
    
  3. Lazinger, S.S.: LC Classification of a library and information science library for maximum shelf retrieval (1984) 5.87
    5.871439 = sum of:
      5.871439 = weight(author_txt:lazinger in 339) [ClassicSimilarity], result of:
        5.871439 = fieldWeight in 339, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.625 = fieldNorm(doc=339)
    
  4. Lazinger, S.S.; Peritz, B.C.: Reader use of a nationwide research library network : local OPAC vs. remote files (1991) 4.70
    4.697151 = sum of:
      4.697151 = weight(author_txt:lazinger in 3013) [ClassicSimilarity], result of:
        4.697151 = fieldWeight in 3013, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.5 = fieldNorm(doc=3013)
    
  5. Shoham, S.; Lazinger, S.S.: ¬The no-main-entry principle and the automated catalog (1991) 4.70
    4.697151 = sum of:
      4.697151 = weight(author_txt:lazinger in 507) [ClassicSimilarity], result of:
        4.697151 = fieldWeight in 507, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.5 = fieldNorm(doc=507)
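
The author scores above can be reproduced by hand. The following is a minimal sketch, not the search engine's own code: it assumes tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)), which are the formulas that match the numbers printed in this listing. The function names are hypothetical and chosen for illustration only.

```python
import math


def classic_tf(freq: float) -> float:
    # Assumed tf: square root of the raw term frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf: this form reproduces 9.394302 for docFreq=9, maxDocs=44218.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: the term "lazinger" occurs in 9 of 44218 records.
author_scores = {
    3100: field_weight(1.0, 9, 44218, 0.625),  # single-author record
    3013: field_weight(1.0, 9, 44218, 0.5),    # two-author record
}

for doc_id, score in author_scores.items():
    print(doc_id, round(score, 6))
```

The only value that differs between the two groups of author hits is fieldNorm (0.625 versus 0.5), which is lower for the records that list two authors; tf and idf are identical for all five.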
    

Similar documents (content)

  1. Lazinger, S.S.: To merge or not to merge : Israel's Union List of Monographs in the context of merging algorithms (1994) 0.32
    0.32347062 = sum of:
      0.32347062 = product of:
        1.1552522 = sum of:
          0.15442003 = weight(abstract_txt:monographs in 3100) [ClassicSimilarity], result of:
            0.15442003 = score(doc=3100,freq=4.0), product of:
              0.114607625 = queryWeight, product of:
                1.0412878 = boost
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.0153163 = queryNorm
              1.3473802 = fieldWeight in 3100, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.1860275 = idf(docFreq=90, maxDocs=44218)
                0.09375 = fieldNorm(doc=3100)
          0.09394072 = weight(abstract_txt:aleph in 3100) [ClassicSimilarity], result of:
            0.09394072 = score(doc=3100,freq=1.0), product of:
              0.13061719 = queryWeight, product of:
                1.11164 = boost
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.0153163 = queryNorm
              0.71920645 = fieldWeight in 3100, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.09375 = fieldNorm(doc=3100)
          0.035925068 = weight(abstract_txt:records in 3100) [ClassicSimilarity], result of:
            0.035925068 = score(doc=3100,freq=1.0), product of:
              0.08670407 = queryWeight, product of:
                1.2808514 = boost
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.0153163 = queryNorm
              0.4143412 = fieldWeight in 3100, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.09375 = fieldNorm(doc=3100)
          0.12997535 = weight(abstract_txt:list in 3100) [ClassicSimilarity], result of:
            0.12997535 = score(doc=3100,freq=4.0), product of:
              0.12872465 = queryWeight, product of:
                1.5606656 = boost
                5.3851523 = idf(docFreq=550, maxDocs=44218)
                0.0153163 = queryNorm
              1.009716 = fieldWeight in 3100, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.3851523 = idf(docFreq=550, maxDocs=44218)
                0.09375 = fieldNorm(doc=3100)
          0.22993834 = weight(abstract_txt:union in 3100) [ClassicSimilarity], result of:
            0.22993834 = score(doc=3100,freq=6.0), product of:
              0.16448699 = queryWeight, product of:
                1.7641877 = boost
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.0153163 = queryNorm
              1.397912 = fieldWeight in 3100, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.09375 = fieldNorm(doc=3100)
          0.15457188 = weight(abstract_txt:algorithm in 3100) [ClassicSimilarity], result of:
            0.15457188 = score(doc=3100,freq=1.0), product of:
              0.2889824 = queryWeight, product of:
                3.3069658 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.0153163 = queryNorm
              0.5348834 = fieldWeight in 3100, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.09375 = fieldNorm(doc=3100)
          0.3564808 = weight(abstract_txt:merging in 3100) [ClassicSimilarity], result of:
            0.3564808 = score(doc=3100,freq=1.0), product of:
              0.5044388 = queryWeight, product of:
                4.369163 = boost
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.0153163 = queryNorm
              0.70668787 = fieldWeight in 3100, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.09375 = fieldNorm(doc=3100)
        0.28 = coord(7/25)
    
  2. Paltoglou, G.; Salampasis, M.; Satratzemi, M.: ¬A results merging algorithm for distributed information retrieval environments that combines regression methodologies with a selective download phase (2008) 0.22
    0.21768446 = sum of:
      0.21768446 = product of:
        0.7774445 = sum of:
          0.046431527 = weight(abstract_txt:overlap in 2111) [ClassicSimilarity], result of:
            0.046431527 = score(doc=2111,freq=1.0), product of:
              0.10699592 = queryWeight, product of:
                1.006115 = boost
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.0153163 = queryNorm
              0.43395606 = fieldWeight in 2111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.0625 = fieldNorm(doc=2111)
          0.011716287 = weight(abstract_txt:results in 2111) [ClassicSimilarity], result of:
            0.011716287 = score(doc=2111,freq=1.0), product of:
              0.05383052 = queryWeight, product of:
                1.0092373 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0153163 = queryNorm
              0.21765138 = fieldWeight in 2111, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=2111)
          0.0110719325 = weight(abstract_txt:that in 2111) [ClassicSimilarity], result of:
            0.0110719325 = score(doc=2111,freq=4.0), product of:
              0.03738189 = queryWeight, product of:
                1.0300428 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0153163 = queryNorm
              0.2961844 = fieldWeight in 2111, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=2111)
          0.07296183 = weight(abstract_txt:algorithms in 2111) [ClassicSimilarity], result of:
            0.07296183 = score(doc=2111,freq=2.0), product of:
              0.14461802 = queryWeight, product of:
                1.6542083 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0153163 = queryNorm
              0.5045141 = fieldWeight in 2111, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0625 = fieldNorm(doc=2111)
          0.07790259 = weight(abstract_txt:performance in 2111) [ClassicSimilarity], result of:
            0.07790259 = score(doc=2111,freq=2.0), product of:
              0.19034283 = queryWeight, product of:
                2.6838748 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0153163 = queryNorm
              0.40927517 = fieldWeight in 2111, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0625 = fieldNorm(doc=2111)
          0.14573178 = weight(abstract_txt:algorithm in 2111) [ClassicSimilarity], result of:
            0.14573178 = score(doc=2111,freq=2.0), product of:
              0.2889824 = queryWeight, product of:
                3.3069658 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.0153163 = queryNorm
              0.5042929 = fieldWeight in 2111, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.0625 = fieldNorm(doc=2111)
          0.41162857 = weight(abstract_txt:merging in 2111) [ClassicSimilarity], result of:
            0.41162857 = score(doc=2111,freq=3.0), product of:
              0.5044388 = queryWeight, product of:
                4.369163 = boost
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.0153163 = queryNorm
              0.81601286 = fieldWeight in 2111, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.0625 = fieldNorm(doc=2111)
        0.28 = coord(7/25)
    
  3. Tsai, M.-F.; Chen, H.-H.; Wang, Y.-T.: Learning a merge model for multilingual information retrieval (2011) 0.19
    0.19472991 = sum of:
      0.19472991 = product of:
        0.97364956 = sum of:
          0.011716287 = weight(abstract_txt:results in 2750) [ClassicSimilarity], result of:
            0.011716287 = score(doc=2750,freq=1.0), product of:
              0.05383052 = queryWeight, product of:
                1.0092373 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0153163 = queryNorm
              0.21765138 = fieldWeight in 2750, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=2750)
          0.009588574 = weight(abstract_txt:that in 2750) [ClassicSimilarity], result of:
            0.009588574 = score(doc=2750,freq=3.0), product of:
              0.03738189 = queryWeight, product of:
                1.0300428 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0153163 = queryNorm
              0.2565032 = fieldWeight in 2750, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=2750)
          0.055085454 = weight(abstract_txt:performance in 2750) [ClassicSimilarity], result of:
            0.055085454 = score(doc=2750,freq=1.0), product of:
              0.19034283 = queryWeight, product of:
                2.6838748 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0153163 = queryNorm
              0.28940126 = fieldWeight in 2750, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0625 = fieldNorm(doc=2750)
          0.14573178 = weight(abstract_txt:algorithm in 2750) [ClassicSimilarity], result of:
            0.14573178 = score(doc=2750,freq=2.0), product of:
              0.2889824 = queryWeight, product of:
                3.3069658 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.0153163 = queryNorm
              0.5042929 = fieldWeight in 2750, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.0625 = fieldNorm(doc=2750)
          0.7515275 = weight(abstract_txt:merging in 2750) [ClassicSimilarity], result of:
            0.7515275 = score(doc=2750,freq=10.0), product of:
              0.5044388 = queryWeight, product of:
                4.369163 = boost
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.0153163 = queryNorm
              1.4898288 = fieldWeight in 2750, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.0625 = fieldNorm(doc=2750)
        0.2 = coord(5/25)
    
  4. Sitas, A.; Kapidakis, S.: Duplicate detection algorithms of bibliographic descriptions (2008) 0.13
    0.12647806 = sum of:
      0.12647806 = product of:
        0.63239026 = sum of:
          0.020711664 = weight(abstract_txt:results in 2543) [ClassicSimilarity], result of:
            0.020711664 = score(doc=2543,freq=2.0), product of:
              0.05383052 = queryWeight, product of:
                1.0092373 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0153163 = queryNorm
              0.38475692 = fieldWeight in 2543, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.078125 = fieldNorm(doc=2543)
          0.1556942 = weight(abstract_txt:duplicate in 2543) [ClassicSimilarity], result of:
            0.1556942 = score(doc=2543,freq=3.0), product of:
              0.14322752 = queryWeight, product of:
                1.164065 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.0153163 = queryNorm
              1.0870411 = fieldWeight in 2543, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.078125 = fieldNorm(doc=2543)
          0.029937558 = weight(abstract_txt:records in 2543) [ClassicSimilarity], result of:
            0.029937558 = score(doc=2543,freq=1.0), product of:
              0.08670407 = queryWeight, product of:
                1.2808514 = boost
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.0153163 = queryNorm
              0.34528434 = fieldWeight in 2543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.078125 = fieldNorm(doc=2543)
          0.12897952 = weight(abstract_txt:algorithms in 2543) [ClassicSimilarity], result of:
            0.12897952 = score(doc=2543,freq=4.0), product of:
              0.14461802 = queryWeight, product of:
                1.6542083 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0153163 = queryNorm
              0.8918634 = fieldWeight in 2543, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.078125 = fieldNorm(doc=2543)
          0.2970673 = weight(abstract_txt:merging in 2543) [ClassicSimilarity], result of:
            0.2970673 = score(doc=2543,freq=1.0), product of:
              0.5044388 = queryWeight, product of:
                4.369163 = boost
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.0153163 = queryNorm
              0.5889065 = fieldWeight in 2543, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.078125 = fieldNorm(doc=2543)
        0.2 = coord(5/25)
    
  5. Hustand, S.: Problems of duplicate records (1986) 0.12
    0.12184911 = sum of:
      0.12184911 = product of:
        0.60924554 = sum of:
          0.12712379 = weight(abstract_txt:duplicate in 266) [ClassicSimilarity], result of:
            0.12712379 = score(doc=266,freq=2.0), product of:
              0.14322752 = queryWeight, product of:
                1.164065 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.0153163 = queryNorm
              0.8875654 = fieldWeight in 266, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.078125 = fieldNorm(doc=266)
          0.0423381 = weight(abstract_txt:records in 266) [ClassicSimilarity], result of:
            0.0423381 = score(doc=266,freq=2.0), product of:
              0.08670407 = queryWeight, product of:
                1.2808514 = boost
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.0153163 = queryNorm
              0.4883058 = fieldWeight in 266, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4196396 = idf(docFreq=1446, maxDocs=44218)
                0.078125 = fieldNorm(doc=266)
          0.06448976 = weight(abstract_txt:algorithms in 266) [ClassicSimilarity], result of:
            0.06448976 = score(doc=266,freq=1.0), product of:
              0.14461802 = queryWeight, product of:
                1.6542083 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0153163 = queryNorm
              0.4459317 = fieldWeight in 266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.078125 = fieldNorm(doc=266)
          0.07822661 = weight(abstract_txt:union in 266) [ClassicSimilarity], result of:
            0.07822661 = score(doc=266,freq=1.0), product of:
              0.16448699 = queryWeight, product of:
                1.7641877 = boost
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.0153163 = queryNorm
              0.47557932 = fieldWeight in 266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.087415 = idf(docFreq=272, maxDocs=44218)
                0.078125 = fieldNorm(doc=266)
          0.2970673 = weight(abstract_txt:merging in 266) [ClassicSimilarity], result of:
            0.2970673 = score(doc=266,freq=1.0), product of:
              0.5044388 = queryWeight, product of:
                4.369163 = boost
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.0153163 = queryNorm
              0.5889065 = fieldWeight in 266, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.078125 = fieldNorm(doc=266)
        0.2 = coord(5/25)
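
The content scores follow the same pattern with two extra factors: a per-term queryWeight (boost * idf * queryNorm) and a coord factor for the share of query terms matched. The sketch below recomputes the last entry (doc 266) from the values printed above. It assumes the same tf and idf formulas as the listing implies, and it takes queryNorm as given, because that value depends on all 25 query terms and not all of them appear here. The names are hypothetical.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.0153163   # copied from the listing, not derived here
FIELD_NORM = 0.078125    # fieldNorm(doc=266)
COORD = 5 / 25           # 5 of the 25 query terms matched

# term -> (boost, docFreq, termFreq in doc 266), copied from the listing
TERMS = {
    "duplicate": (1.164065, 38, 2.0),
    "records": (1.2808514, 1446, 2.0),
    "algorithms": (1.6542083, 398, 1.0),
    "union": (1.7641877, 272, 1.0),
    "merging": (4.369163, 63, 1.0),
}


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # Assumed idf form; it matches the idf values printed in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float) -> float:
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


term_scores = {term: term_score(*values) for term, values in TERMS.items()}
total = COORD * sum(term_scores.values())

for term, score in term_scores.items():
    print(f"{term:10s} {score:.6f}")
print(f"total      {total:.6f}")
```

Because idf enters both queryWeight and fieldWeight, rare terms such as "merging" and "duplicate" contribute most of the score, while the coord factor scales every entry down by the fraction of query terms it matches.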