Document (#44002)

Gabler, S.
Vergabe von DDC-Sachgruppen mittels eines Schlagwort-Thesaurus
Wien : Universität / Library and Information Studies
109 pp.
The thesis presents the construction of a thematically ordered thesaurus based on the subject headings (Sachschlagwörter) of the Gemeinsame Normdatei (GND), using the DDC notations they contain. The top level of this thesaurus consists of the DDC subject groups (DDC-Sachgruppen) of the Deutsche Nationalbibliothek (DNB). The thesaurus is constructed by rules, applying Linked Data principles in a SPARQL processor. It serves the automated extraction of metadata from scholarly publications by means of a computational-linguistic extractor that processes digital full texts. The extractor identifies subject headings by comparing character strings against the terms in the thesaurus, ranks the hits by their relevance in the text, and returns the assigned subject groups in ranked order. The underlying assumption is that the sought subject group is returned among the top ranks. The performance of the approach is validated in a three-step procedure. First, a gold standard is compiled from documents available in the DNB online catalogue, based on metadata and the findings of a brief inspection (Kurzautopsie). The documents are spread across 14 of the subject groups, with a batch size of 50 documents each. All documents are then indexed with the extractor and the categorization results are recorded. Finally, the resulting retrieval performance is assessed both for a hard (binary) categorization and for a ranked return of the subject groups.
Master's thesis, Master of Science (Library and Information Studies) (MSc), Universität Wien. Advisor: Christoph Steiner. Cf. DOI: 10.25365/thesis.70030. See also the presentation at:
Relations between verbal and systematic subject indexing
Semantic interoperability
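
The extraction step described in the abstract (match thesaurus terms against the full text, rank the hits, return subject groups in ranked order) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy thesaurus, the term-to-group mapping, the function names, and the use of raw match counts as a stand-in for the relevance measure, which the abstract does not specify. It is not the extractor used in the thesis.

```python
from collections import Counter

# Hypothetical toy thesaurus: lowercase term -> subject group label.
# The real thesaurus is derived from GND subject headings; this mapping is invented.
THESAURUS = {
    "thesaurus": "020",
    "bibliothek": "020",
    "algorithmus": "004",
    "datenbank": "004",
    "enzym": "570",
}


def rank_subject_groups(text, thesaurus):
    """Return subject groups ordered by the number of matched term occurrences."""
    lowered = text.lower()
    counts = Counter()
    for term, group in thesaurus.items():
        hits = lowered.count(term)  # plain substring comparison
        if hits:
            counts[group] += hits
    # most_common() sorts by count, highest first
    return [group for group, _ in counts.most_common()]


def hit_at_k(ranking, gold, k):
    """True if the gold-standard group appears among the top k returned groups."""
    return gold in ranking[:k]


text = "Die Bibliothek pflegt einen Thesaurus in einer Datenbank."
ranking = rank_subject_groups(text, THESAURUS)
print(ranking)
print(hit_at_k(ranking, "020", 1))
```

With k = 1 the check corresponds to a hard (binary) categorization; larger k corresponds to judging the ranked return, where the sought group only has to appear among the top ranks.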

Similar documents (content)

  1. Darstellung der CrissCross-Mappingrelationen im Rahmen des Semantic Web (2010) 0.19
    0.19018918 = sum of:
      0.19018918 = product of:
        0.52830327 = sum of:
          0.015598024 = weight(abstract_txt:einem in 4285) [ClassicSimilarity], result of:
            0.015598024 = score(doc=4285,freq=2.0), product of:
              0.06511642 = queryWeight, product of:
                4.3361473 = idf(docFreq=1572, maxDocs=44218)
                0.015017114 = queryNorm
              0.23954056 = fieldWeight in 4285, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3361473 = idf(docFreq=1572, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4285)
          0.08868944 = weight(abstract_txt:sachgruppe in 4285) [ClassicSimilarity], result of:
            0.08868944 = score(doc=4285,freq=2.0), product of:
              0.16464508 = queryWeight, product of:
                1.1243826 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.015017114 = queryNorm
              0.5386705 = fieldWeight in 4285, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4285)
          0.030300865 = weight(abstract_txt:werden in 4285) [ClassicSimilarity], result of:
            0.030300865 = score(doc=4285,freq=12.0), product of:
              0.063864686 = queryWeight, product of:
                1.2129161 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.015017114 = queryNorm
              0.47445413 = fieldWeight in 4285, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4285)
          0.051209223 = weight(abstract_txt:dokumenten in 4285) [ClassicSimilarity], result of:
            0.051209223 = score(doc=4285,freq=2.0), product of:
              0.14383887 = queryWeight, product of:
                1.4862535 = boost
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.015017114 = queryNorm
              0.35601798 = fieldWeight in 4285, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4285)
          0.040317655 = weight(abstract_txt:mittels in 4285) [ClassicSimilarity], result of:
            0.040317655 = score(doc=4285,freq=1.0), product of:
              0.15451986 = queryWeight, product of:
                1.5404475 = boost
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.015017114 = queryNorm
              0.26092216 = fieldWeight in 4285, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4285)
          0.019449921 = weight(abstract_txt:eines in 4285) [ClassicSimilarity], result of:
            0.019449921 = score(doc=4285,freq=1.0), product of:
              0.10880022 = queryWeight, product of:
                1.5831252 = boost
                4.5764427 = idf(docFreq=1236, maxDocs=44218)
                0.015017114 = queryNorm
              0.1787673 = fieldWeight in 4285, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5764427 = idf(docFreq=1236, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4285)
          0.03142401 = weight(abstract_txt:unter in 4285) [ClassicSimilarity], result of:
            0.03142401 = score(doc=4285,freq=2.0), product of:
              0.11890011 = queryWeight, product of:
                1.6549753 = boost
                4.7841444 = idf(docFreq=1004, maxDocs=44218)
                0.015017114 = queryNorm
              0.26428914 = fieldWeight in 4285, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7841444 = idf(docFreq=1004, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4285)
          0.04806766 = weight(abstract_txt:wird in 4285) [ClassicSimilarity], result of:
            0.04806766 = score(doc=4285,freq=7.0), product of:
              0.12326415 = queryWeight, product of:
                2.1754203 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.015017114 = queryNorm
              0.38995653 = fieldWeight in 4285, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4285)
          0.2032465 = weight(abstract_txt:sachgruppen in 4285) [ClassicSimilarity], result of:
            0.2032465 = score(doc=4285,freq=1.0), product of:
              0.61657065 = queryWeight, product of:
                4.865373 = boost
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.015017114 = queryNorm
              0.32964024 = fieldWeight in 4285, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4285)
        0.36 = coord(9/25)
  2. Hoffmann, R.: Entwicklung einer benutzerunterstützten automatisierten Klassifikation von Web-Dokumenten : Untersuchung gegenwärtiger Methoden zur automatisierten Dokumentklassifikation und Implementierung eines Prototyps zum verbesserten Information Retrieval für das xFIND System (2002) 0.17
    0.16842744 = sum of:
      0.16842744 = product of:
        0.467854 = sum of:
          0.018717628 = weight(abstract_txt:einem in 4197) [ClassicSimilarity], result of:
            0.018717628 = score(doc=4197,freq=2.0), product of:
              0.06511642 = queryWeight, product of:
                4.3361473 = idf(docFreq=1572, maxDocs=44218)
                0.015017114 = queryNorm
              0.28744867 = fieldWeight in 4197, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3361473 = idf(docFreq=1572, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
          0.0277712 = weight(abstract_txt:werden in 4197) [ClassicSimilarity], result of:
            0.0277712 = score(doc=4197,freq=7.0), product of:
              0.063864686 = queryWeight, product of:
                1.2129161 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.015017114 = queryNorm
              0.43484437 = fieldWeight in 4197, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
          0.039718714 = weight(abstract_txt:dokumente in 4197) [ClassicSimilarity], result of:
            0.039718714 = score(doc=4197,freq=1.0), product of:
              0.13547632 = queryWeight, product of:
                1.4424025 = boost
                6.2544694 = idf(docFreq=230, maxDocs=44218)
                0.015017114 = queryNorm
              0.29317826 = fieldWeight in 4197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2544694 = idf(docFreq=230, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
          0.06145107 = weight(abstract_txt:dokumenten in 4197) [ClassicSimilarity], result of:
            0.06145107 = score(doc=4197,freq=2.0), product of:
              0.14383887 = queryWeight, product of:
                1.4862535 = boost
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.015017114 = queryNorm
              0.4272216 = fieldWeight in 4197, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
          0.048381187 = weight(abstract_txt:mittels in 4197) [ClassicSimilarity], result of:
            0.048381187 = score(doc=4197,freq=1.0), product of:
              0.15451986 = queryWeight, product of:
                1.5404475 = boost
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.015017114 = queryNorm
              0.3131066 = fieldWeight in 4197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6796074 = idf(docFreq=150, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
          0.03300761 = weight(abstract_txt:eines in 4197) [ClassicSimilarity], result of:
            0.03300761 = score(doc=4197,freq=2.0), product of:
              0.10880022 = queryWeight, product of:
                1.5831252 = boost
                4.5764427 = idf(docFreq=1236, maxDocs=44218)
                0.015017114 = queryNorm
              0.30337816 = fieldWeight in 4197, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5764427 = idf(docFreq=1236, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
          0.026664155 = weight(abstract_txt:unter in 4197) [ClassicSimilarity], result of:
            0.026664155 = score(doc=4197,freq=1.0), product of:
              0.11890011 = queryWeight, product of:
                1.6549753 = boost
                4.7841444 = idf(docFreq=1004, maxDocs=44218)
                0.015017114 = queryNorm
              0.22425677 = fieldWeight in 4197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7841444 = idf(docFreq=1004, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
          0.19034098 = weight(abstract_txt:kategorisierung in 4197) [ClassicSimilarity], result of:
            0.19034098 = score(doc=4197,freq=2.0), product of:
              0.30564094 = queryWeight, product of:
                2.1665092 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.015017114 = queryNorm
              0.6227601 = fieldWeight in 4197, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
          0.021801442 = weight(abstract_txt:wird in 4197) [ClassicSimilarity], result of:
            0.021801442 = score(doc=4197,freq=1.0), product of:
              0.12326415 = queryWeight, product of:
                2.1754203 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.015017114 = queryNorm
              0.17686766 = fieldWeight in 4197, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.046875 = fieldNorm(doc=4197)
        0.36 = coord(9/25)
  3. Heiner-Freiling, M.: Dewey in der Deutschen Nationalbibliographie? (2002) 0.15
    0.14541864 = sum of:
      0.14541864 = product of:
        0.605911 = sum of:
          0.013235362 = weight(abstract_txt:einem in 1419) [ClassicSimilarity], result of:
            0.013235362 = score(doc=1419,freq=1.0), product of:
              0.06511642 = queryWeight, product of:
                4.3361473 = idf(docFreq=1572, maxDocs=44218)
                0.015017114 = queryNorm
              0.2032569 = fieldWeight in 1419, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3361473 = idf(docFreq=1572, maxDocs=44218)
                0.046875 = fieldNorm(doc=1419)
          0.05347175 = weight(abstract_txt:vergabe in 1419) [ClassicSimilarity], result of:
            0.05347175 = score(doc=1419,freq=1.0), product of:
              0.13110107 = queryWeight, product of:
                1.0033278 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.015017114 = queryNorm
              0.40786663 = fieldWeight in 1419, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.046875 = fieldNorm(doc=1419)
          0.07525549 = weight(abstract_txt:sachgruppe in 1419) [ClassicSimilarity], result of:
            0.07525549 = score(doc=1419,freq=1.0), product of:
              0.16464508 = queryWeight, product of:
                1.1243826 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.015017114 = queryNorm
              0.4570771 = fieldWeight in 1419, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.046875 = fieldNorm(doc=1419)
          0.014844331 = weight(abstract_txt:werden in 1419) [ClassicSimilarity], result of:
            0.014844331 = score(doc=1419,freq=2.0), product of:
              0.063864686 = queryWeight, product of:
                1.2129161 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.015017114 = queryNorm
              0.23243411 = fieldWeight in 1419, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.046875 = fieldNorm(doc=1419)
          0.026664155 = weight(abstract_txt:unter in 1419) [ClassicSimilarity], result of:
            0.026664155 = score(doc=1419,freq=1.0), product of:
              0.11890011 = queryWeight, product of:
                1.6549753 = boost
                4.7841444 = idf(docFreq=1004, maxDocs=44218)
                0.015017114 = queryNorm
              0.22425677 = fieldWeight in 1419, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7841444 = idf(docFreq=1004, maxDocs=44218)
                0.046875 = fieldNorm(doc=1419)
          0.4224399 = weight(abstract_txt:sachgruppen in 1419) [ClassicSimilarity], result of:
            0.4224399 = score(doc=1419,freq=3.0), product of:
              0.61657065 = queryWeight, product of:
                4.865373 = boost
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.015017114 = queryNorm
              0.68514436 = fieldWeight in 1419, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.046875 = fieldNorm(doc=1419)
        0.24 = coord(6/25)
  4. Alex, H.; Heiner-Freiling, M.: DDC-Sachgruppen der Deutschen Nationalbibliografie : Leitfaden zu ihrer Vergabe (2003) 0.13
    0.12687682 = sum of:
      0.12687682 = product of:
        1.5859603 = sum of:
          0.28518268 = weight(abstract_txt:vergabe in 2191) [ClassicSimilarity], result of:
            0.28518268 = score(doc=2191,freq=1.0), product of:
              0.13110107 = queryWeight, product of:
                1.0033278 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.015017114 = queryNorm
              2.1752887 = fieldWeight in 2191, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.25 = fieldNorm(doc=2191)
          1.3007776 = weight(abstract_txt:sachgruppen in 2191) [ClassicSimilarity], result of:
            1.3007776 = score(doc=2191,freq=1.0), product of:
              0.61657065 = queryWeight, product of:
                4.865373 = boost
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.015017114 = queryNorm
              2.1096976 = fieldWeight in 2191, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.43879 = idf(docFreq=25, maxDocs=44218)
                0.25 = fieldNorm(doc=2191)
        0.08 = coord(2/25)
  5. Krischker, U.: Formale Analyse von Dokumenten (1997) 0.13
    0.12664764 = sum of:
      0.12664764 = product of:
        0.452313 = sum of:
          0.030300865 = weight(abstract_txt:werden in 3925) [ClassicSimilarity], result of:
            0.030300865 = score(doc=3925,freq=3.0), product of:
              0.063864686 = queryWeight, product of:
                1.2129161 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.015017114 = queryNorm
              0.47445413 = fieldWeight in 3925, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.078125 = fieldNorm(doc=3925)
          0.0936179 = weight(abstract_txt:dokumente in 3925) [ClassicSimilarity], result of:
            0.0936179 = score(doc=3925,freq=2.0), product of:
              0.13547632 = queryWeight, product of:
                1.4424025 = boost
                6.2544694 = idf(docFreq=230, maxDocs=44218)
                0.015017114 = queryNorm
              0.69102776 = fieldWeight in 3925, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2544694 = idf(docFreq=230, maxDocs=44218)
                0.078125 = fieldNorm(doc=3925)
          0.102418445 = weight(abstract_txt:dokumenten in 3925) [ClassicSimilarity], result of:
            0.102418445 = score(doc=3925,freq=2.0), product of:
              0.14383887 = queryWeight, product of:
                1.4862535 = boost
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.015017114 = queryNorm
              0.71203595 = fieldWeight in 3925, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.078125 = fieldNorm(doc=3925)
          0.038899843 = weight(abstract_txt:eines in 3925) [ClassicSimilarity], result of:
            0.038899843 = score(doc=3925,freq=1.0), product of:
              0.10880022 = queryWeight, product of:
                1.5831252 = boost
                4.5764427 = idf(docFreq=1236, maxDocs=44218)
                0.015017114 = queryNorm
              0.3575346 = fieldWeight in 3925, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5764427 = idf(docFreq=1236, maxDocs=44218)
                0.078125 = fieldNorm(doc=3925)
          0.0912492 = weight(abstract_txt:hierzu in 3925) [ClassicSimilarity], result of:
            0.0912492 = score(doc=3925,freq=1.0), product of:
              0.16779801 = queryWeight, product of:
                1.6052703 = boost
                6.9606886 = idf(docFreq=113, maxDocs=44218)
                0.015017114 = queryNorm
              0.5438038 = fieldWeight in 3925, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9606886 = idf(docFreq=113, maxDocs=44218)
                0.078125 = fieldNorm(doc=3925)
          0.044440262 = weight(abstract_txt:unter in 3925) [ClassicSimilarity], result of:
            0.044440262 = score(doc=3925,freq=1.0), product of:
              0.11890011 = queryWeight, product of:
                1.6549753 = boost
                4.7841444 = idf(docFreq=1004, maxDocs=44218)
                0.015017114 = queryNorm
              0.3737613 = fieldWeight in 3925, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7841444 = idf(docFreq=1004, maxDocs=44218)
                0.078125 = fieldNorm(doc=3925)
          0.05138649 = weight(abstract_txt:wird in 3925) [ClassicSimilarity], result of:
            0.05138649 = score(doc=3925,freq=2.0), product of:
              0.12326415 = queryWeight, product of:
                2.1754203 = boost
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.015017114 = queryNorm
              0.41688108 = fieldWeight in 3925, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.773177 = idf(docFreq=2761, maxDocs=44218)
                0.078125 = fieldNorm(doc=3925)
        0.28 = coord(7/25)
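
The score breakdowns above follow a fixed pattern, and a short sketch may help in reading them. It recomputes the total for document 2191 from the values printed in its tree, assuming tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). These formulas reproduce the figures in the listing; other Lucene versions may compute idf slightly differently. The function names are illustrative and not part of any Lucene API.

```python
import math

MAX_DOCS = 44218          # maxDocs as printed in the listing
QUERY_NORM = 0.015017114  # queryNorm as printed in the listing


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as it appears in the listing."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, boost=1.0):
    """score = queryWeight * fieldWeight for a single query term."""
    idf = classic_idf(doc_freq)
    tf = math.sqrt(freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = tf * idf * field_norm
    return query_weight * field_weight


# Document 2191 matches two of the 25 query terms.
vergabe = term_score(freq=1.0, doc_freq=19, field_norm=0.25, boost=1.0033278)
sachgruppen = term_score(freq=1.0, doc_freq=25, field_norm=0.25, boost=4.865373)

coord = 2 / 25  # fraction of query terms that matched
total = coord * (vergabe + sachgruppen)
print(round(total, 5))
```

The coord factor explains why document 2191 ranks only fourth despite its high per-term scores: it matches 2 of 25 query terms, so the sum is scaled by 0.08.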