Document (#34564)

Author
Jele, H.
Title
Erkennung bibliographischer Dubletten mittels Trigrammen : Messungen zur Performanz
Source
B.I.T.online. 12(2009) H.3, S.xxx-xxx
Year
2009
Abstract
Trigram generation is frequently used in automated duplicate detection in situations where "very similar" but not identical records are to be identified as duplicates. This paper applies three trigram-based detection methods in practice (the Jaccard measure, the Euclidean distance, and the KOBV similarity value), implements all the steps they require, and finally measures the consumption of time and resources (the "performance"). The data set used comprises 392,616 bibliographic title records created in the Austrian Library Network (Österreichischer Bibliothekenverbund).
Theme
Descriptive cataloguing (Formalerschließung)
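
Note on the abstract: two of the three trigram-based methods it names, the Jaccard measure and the Euclidean distance, can be illustrated with a short Python sketch. This is not the author's implementation. The normalization (lower-casing, collapsing whitespace, no padding) and the function names `trigrams`, `jaccard` and `euclidean` are assumptions made for illustration. The KOBV similarity value is left out because this record does not give its formula.

```python
from collections import Counter
from math import sqrt


def trigrams(text: str) -> Counter:
    """Count overlapping character trigrams of a normalized string."""
    # Assumed normalization: lower-case and collapse runs of whitespace.
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def jaccard(a: str, b: str) -> float:
    """Jaccard measure on the sets of distinct trigrams (1.0 = same set)."""
    set_a, set_b = set(trigrams(a)), set(trigrams(b))
    union = set_a | set_b
    # Two strings without any trigram are treated as identical here.
    return len(set_a & set_b) / len(union) if union else 1.0


def euclidean(a: str, b: str) -> float:
    """Euclidean distance between trigram count vectors (0.0 = same counts)."""
    count_a, count_b = trigrams(a), trigrams(b)
    keys = set(count_a) | set(count_b)
    # Counter returns 0 for trigrams missing from one of the strings.
    return sqrt(sum((count_a[k] - count_b[k]) ** 2 for k in keys))


if __name__ == "__main__":
    # Hypothetical pair of near-identical title strings.
    left = "Erkennung bibliographischer Dubletten"
    right = "Erkennung bibliografischer Dubletten"
    print(jaccard(left, right), euclidean(left, right))
```

A higher Jaccard value and a lower Euclidean distance both indicate more similar records. Any threshold for declaring two records duplicates would have to be chosen separately and is not specified in this record.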

Similar documents (content)

  1. Schneider, W.: ¬Ein verteiltes Bibliotheks-Informationssystem auf Basis des Z39.50 Protokolls (1999) 0.05
    0.04813518 = sum of:
      0.04813518 = product of:
        0.4011265 = sum of:
          0.09576001 = weight(abstract_txt:bibliographische in 5774) [ClassicSimilarity], result of:
            0.09576001 = score(doc=5774,freq=1.0), product of:
              0.13405009 = queryWeight, product of:
                1.021673 = boost
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.017219057 = queryNorm
              0.71435994 = fieldWeight in 5774, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.09375 = fieldNorm(doc=5774)
          0.106784135 = weight(abstract_txt:datensätze in 5774) [ClassicSimilarity], result of:
            0.106784135 = score(doc=5774,freq=1.0), product of:
              0.14415027 = queryWeight, product of:
                1.0594637 = boost
                7.9016905 = idf(docFreq=42, maxDocs=42740)
                0.017219057 = queryNorm
              0.74078345 = fieldWeight in 5774, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9016905 = idf(docFreq=42, maxDocs=42740)
                0.09375 = fieldNorm(doc=5774)
          0.19858235 = weight(abstract_txt:dubletten in 5774) [ClassicSimilarity], result of:
            0.19858235 = score(doc=5774,freq=1.0), product of:
              0.21799076 = queryWeight, product of:
                1.302859 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.017219057 = queryNorm
              0.9109669 = fieldWeight in 5774, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.09375 = fieldNorm(doc=5774)
        0.12 = coord(3/25)
    
  2. Schaffner, V.: FRBR in MAB2 und Primo - ein kafkaesker Prozess? : Möglichkeiten der FRBRisierung von MAB2-Datensätzen in Primo exemplarisch dargestellt an Datensätzen zu Franz Kafkas "Der Process" (2011) 0.04
    0.036258906 = sum of:
      0.036258906 = product of:
        0.30215755 = sum of:
          0.09675236 = weight(abstract_txt:bibliographische in 2908) [ClassicSimilarity], result of:
            0.09675236 = score(doc=2908,freq=3.0), product of:
              0.13405009 = queryWeight, product of:
                1.021673 = boost
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.017219057 = queryNorm
              0.7217628 = fieldWeight in 2908, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2908)
          0.12458149 = weight(abstract_txt:datensätze in 2908) [ClassicSimilarity], result of:
            0.12458149 = score(doc=2908,freq=4.0), product of:
              0.14415027 = queryWeight, product of:
                1.0594637 = boost
                7.9016905 = idf(docFreq=42, maxDocs=42740)
                0.017219057 = queryNorm
              0.8642474 = fieldWeight in 2908, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.9016905 = idf(docFreq=42, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2908)
          0.08082368 = weight(abstract_txt:bibliothekenverbund in 2908) [ClassicSimilarity], result of:
            0.08082368 = score(doc=2908,freq=1.0), product of:
              0.17148477 = queryWeight, product of:
                1.1555563 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.017219057 = queryNorm
              0.471317 = fieldWeight in 2908, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2908)
        0.12 = coord(3/25)
    
  3. Figge, U.L.: Technische Anleitungen und der Erwerb kohärenten Wissens (2004) 0.03
    0.03178587 = sum of:
      0.03178587 = product of:
        0.39732343 = sum of:
          0.14384265 = weight(abstract_txt:situationen in 4145) [ClassicSimilarity], result of:
            0.14384265 = score(doc=4145,freq=2.0), product of:
              0.15758453 = queryWeight, product of:
                1.1077331 = boost
                8.261693 = idf(docFreq=29, maxDocs=42740)
                0.017219057 = queryNorm
              0.91279674 = fieldWeight in 4145, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.261693 = idf(docFreq=29, maxDocs=42740)
                0.078125 = fieldNorm(doc=4145)
          0.25348076 = weight(abstract_txt:angewandt in 4145) [ClassicSimilarity], result of:
            0.25348076 = score(doc=4145,freq=1.0), product of:
              0.3649543 = queryWeight, product of:
                2.3840349 = boost
                8.890302 = idf(docFreq=15, maxDocs=42740)
                0.017219057 = queryNorm
              0.6945548 = fieldWeight in 4145, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.890302 = idf(docFreq=15, maxDocs=42740)
                0.078125 = fieldNorm(doc=4145)
        0.08 = coord(2/25)
    
  4. Probstmeyer, J.: Analyse von maschinell generierten Korrelationen zwischen der Regensburger Verbundklassifikation (RVK) und der Schlagwortnormdatei (SWD) (2009) 0.03
    0.027397403 = sum of:
      0.027397403 = product of:
        0.34246755 = sum of:
          0.088986784 = weight(abstract_txt:datensätze in 217) [ClassicSimilarity], result of:
            0.088986784 = score(doc=217,freq=1.0), product of:
              0.14415027 = queryWeight, product of:
                1.0594637 = boost
                7.9016905 = idf(docFreq=42, maxDocs=42740)
                0.017219057 = queryNorm
              0.6173196 = fieldWeight in 217, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9016905 = idf(docFreq=42, maxDocs=42740)
                0.078125 = fieldNorm(doc=217)
          0.25348076 = weight(abstract_txt:angewandt in 217) [ClassicSimilarity], result of:
            0.25348076 = score(doc=217,freq=1.0), product of:
              0.3649543 = queryWeight, product of:
                2.3840349 = boost
                8.890302 = idf(docFreq=15, maxDocs=42740)
                0.017219057 = queryNorm
              0.6945548 = fieldWeight in 217, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.890302 = idf(docFreq=15, maxDocs=42740)
                0.078125 = fieldNorm(doc=217)
        0.08 = coord(2/25)
    
  5. Jersek, T.: Automatische DDC-Klassifizierung mit Lingo : Vorgehensweise und Ergebnisse (2012) 0.03
    0.026070742 = sum of:
      0.026070742 = product of:
        0.32588428 = sum of:
          0.11172001 = weight(abstract_txt:bibliographische in 2123) [ClassicSimilarity], result of:
            0.11172001 = score(doc=2123,freq=1.0), product of:
              0.13405009 = queryWeight, product of:
                1.021673 = boost
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.017219057 = queryNorm
              0.8334199 = fieldWeight in 2123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.109375 = fieldNorm(doc=2123)
          0.21416426 = weight(abstract_txt:titeldatensätze in 2123) [ClassicSimilarity], result of:
            0.21416426 = score(doc=2123,freq=1.0), product of:
              0.2068606 = queryWeight, product of:
                1.2691625 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.017219057 = queryNorm
              1.0353072 = fieldWeight in 2123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.109375 = fieldNorm(doc=2123)
        0.08 = coord(2/25)
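
Note on the score trees: each explanation above follows the same arithmetic, which the following Python sketch retraces for entry 1 (doc=5774). The formulas are read off the listing itself: score = queryWeight * fieldWeight, queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, tf = sqrt(freq), and the final value is the sum of term scores times coord. The idf expression 1 + ln(maxDocs / (docFreq + 1)) is an assumption that happens to reproduce the idf values shown. The function names are hypothetical.

```python
from math import isclose, log, sqrt


def idf(doc_freq: int, max_docs: int) -> float:
    # Assumed formula; it is consistent with the idf values in the listing.
    return 1.0 + log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm      # queryWeight
    field_weight = sqrt(freq) * term_idf * field_norm  # fieldWeight
    return query_weight * field_weight


# Constants copied from the explanation of entry 1 (doc=5774).
MAX_DOCS = 42740
QUERY_NORM = 0.017219057
FIELD_NORM = 0.09375

# (term, freq, docFreq, boost) as shown in the listing.
TERMS = [
    ("bibliographische", 1.0, 56, 1.021673),
    ("datensätze", 1.0, 42, 1.0594637),
    ("dubletten", 1.0, 6, 1.302859),
]

term_sum = sum(
    term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM)
    for _, freq, doc_freq, boost in TERMS
)
score = term_sum * (3 / 25)  # coord(3/25): 3 of 25 query terms matched

if __name__ == "__main__":
    reported = 0.04813518  # top-level value reported for entry 1
    print(score, isclose(score, reported, rel_tol=1e-4))
```

The recomputation uses double precision, so it should agree with the reported single-precision values only up to small rounding differences.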