Search (1 result, page 1 of 1)

  • author_ss:"Jele, H."
  • language_ss:"d"
  • theme_ss:"Formalerschließung"
  • year_i:[2000 TO 2010}
  1. Jele, H.: Erkennung bibliographischer Dubletten mittels Trigrammen : Messungen zur Performanz (2009) 0.03
    0.028187582 = product of:
      0.0939586 = sum of:
        0.014868983 = weight(_text_:und in 2562) [ClassicSimilarity], result of:
          0.014868983 = score(doc=2562,freq=4.0), product of:
            0.05366975 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.024215192 = queryNorm
            0.27704588 = fieldWeight in 2562, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0625 = fieldNorm(doc=2562)
        0.014868983 = weight(_text_:und in 2562) [ClassicSimilarity], result of:
          0.014868983 = score(doc=2562,freq=4.0), product of:
            0.05366975 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.024215192 = queryNorm
            0.27704588 = fieldWeight in 2562, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0625 = fieldNorm(doc=2562)
        0.021359377 = weight(_text_:der in 2562) [ClassicSimilarity], result of:
          0.021359377 = score(doc=2562,freq=8.0), product of:
            0.054091092 = queryWeight, product of:
              2.2337668 = idf(docFreq=12875, maxDocs=44218)
              0.024215192 = queryNorm
            0.3948779 = fieldWeight in 2562, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.2337668 = idf(docFreq=12875, maxDocs=44218)
              0.0625 = fieldNorm(doc=2562)
        0.014868983 = weight(_text_:und in 2562) [ClassicSimilarity], result of:
          0.014868983 = score(doc=2562,freq=4.0), product of:
            0.05366975 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.024215192 = queryNorm
            0.27704588 = fieldWeight in 2562, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0625 = fieldNorm(doc=2562)
        0.014868983 = weight(_text_:und in 2562) [ClassicSimilarity], result of:
          0.014868983 = score(doc=2562,freq=4.0), product of:
            0.05366975 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.024215192 = queryNorm
            0.27704588 = fieldWeight in 2562, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0625 = fieldNorm(doc=2562)
        0.013123296 = product of:
          0.026246592 = sum of:
            0.026246592 = weight(_text_:22 in 2562) [ClassicSimilarity], result of:
              0.026246592 = score(doc=2562,freq=2.0), product of:
                0.08479747 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.024215192 = queryNorm
                0.30952093 = fieldWeight in 2562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2562)
          0.5 = coord(1/2)
      0.3 = coord(6/20)
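
    Read from the bottom up, each leaf of the explain tree above is plain Lucene ClassicSimilarity arithmetic. The following minimal Python sketch reproduces the printed numbers; the variable names are ours, the values are copied from one "und" leaf and from the final product:

      import math

      # One "und" leaf: score = queryWeight * fieldWeight
      freq = 4.0
      tf = math.sqrt(freq)                  # 2.0 = tf(freq=4.0)
      idf = 2.216367                        # idf(docFreq=13101, maxDocs=44218)
      query_norm = 0.024215192
      field_norm = 0.0625                   # fieldNorm(doc=2562)
      query_weight = idf * query_norm       # ~0.05366975
      field_weight = tf * idf * field_norm  # ~0.27704588
      leaf = query_weight * field_weight    # ~0.014868983

      # Final score: the six leaves summed, scaled by the coordination factor
      leaf_sum = 4 * 0.014868983 + 0.021359377 + 0.013123296  # ~0.0939586
      score = leaf_sum * (6 / 20)           # coord(6/20) = 0.3 -> ~0.028187582, shown as 0.03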
    
    Abstract
    Trigram formation is frequently used in automated duplicate detection in situations where "very similar" but not identical records are to be identified as duplicates. In this work, three trigram-based detection methods (the Jaccard measure, the Euclidean distance, and the similarity value of the KOBV) are applied in practice, all of the steps this requires are implemented, and finally the consumption of time and resources (= the "performance") is measured. The data set used here comprises 392,616 bibliographic title records produced in the Austrian Library Network (Österreichischer Bibliothekenverbund).
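
    The abstract names the Jaccard measure over trigrams as one of the three methods. A minimal sketch of that measure, assuming character trigrams over padded, lower-cased strings (the tokenization details are our assumption, not necessarily the implementation measured in the paper):

      def trigrams(text: str) -> set[str]:
          """Character trigrams of a padded, lower-cased string (assumed tokenization)."""
          padded = f"  {text.lower().strip()}  "
          return {padded[i:i + 3] for i in range(len(padded) - 2)}

      def jaccard(a: str, b: str) -> float:
          """Jaccard measure: |intersection| / |union| of the two trigram sets."""
          ta, tb = trigrams(a), trigrams(b)
          return len(ta & tb) / len(ta | tb) if ta or tb else 0.0

      # Two "very similar" but not identical strings score close to 1.0:
      print(jaccard("Erkennung bibliographischer Dubletten",
                    "Erkennung bibliografischer Dubletten"))
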
    Date
    21. 6.2010 19:30:22