Document (#40196)

Author
Bredack, J.
Title
Automatische Extraktion fachterminologischer Mehrwortbegriffe : ein Verfahrensvergleich
Imprint
Trier : Universität / Fachbereich II
Year
2016
Pages
V, 98 pp.
Abstract
In this study, two systems were used to automatically extract multi-word terms (MWT) from a domain-specific document collection (full texts of the ACL Anthology Reference Corpus). The topics covered all areas of natural language processing, in particular computational linguistics (CL) as an interdisciplinary field. The aim was to extract MWT that could serve as potential index terms in information retrieval (IR). These terms were meant to point to, or name, concepts, methods, procedures, and algorithms in CL and in neighboring fields such as linguistics and computer science.
The extraction systems used were the TreeTagger and the indexing software Lingo. The TreeTagger is based on a statistical tagging and chunking algorithm that automatically identifies and extracts noun phrases (NPs). It can be used in a variety of natural language processing scenarios, primarily as a POS tagger for different languages. The indexing system Lingo, in contrast, works with electronic dictionaries and pattern-based matching. Lingo is a system designed for automatic indexing that ships with a large number of modules, which can be adapted individually to a given task and coordinated with one another. The different processing approaches were clearly reflected in the result sets of the two systems. The small overlap between the result sets illustrates how differently the systems work, and this difference could be described by example in a qualitative analysis. The thesis cannot conclusively determine which of the two systems should be preferred for generating index terms.
Content
Written thesis (Masterarbeit) submitted for the degree of Master of Arts at Universität Trier, Fachbereich II, degree program in Computational Linguistics.
Theme
Automatic indexing
Computational linguistics
Object
Lingo
TreeTagger

Similar documents (content)

  1. Bredack, J.: Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten (2013) 0.34
    0.3353264 = sum of:
      0.3353264 = product of:
        0.93146217 = sum of:
          0.028138284 = weight(abstract_txt:können in 3055) [ClassicSimilarity], result of:
            0.028138284 = score(doc=3055,freq=5.0), product of:
              0.07195983 = queryWeight, product of:
                4.476746 = idf(docFreq=1320, maxDocs=42740)
                0.016074138 = queryNorm
              0.39102766 = fieldWeight in 3055, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.476746 = idf(docFreq=1320, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.07468608 = weight(abstract_txt:extrahiert in 3055) [ClassicSimilarity], result of:
            0.07468608 = score(doc=3055,freq=2.0), product of:
              0.14860092 = queryWeight, product of:
                1.0161333 = boost
                9.097941 = idf(docFreq=12, maxDocs=42740)
                0.016074138 = queryNorm
              0.502595 = fieldWeight in 3055, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.097941 = idf(docFreq=12, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.05421722 = weight(abstract_txt:wörterbüchern in 3055) [ClassicSimilarity], result of:
            0.05421722 = score(doc=3055,freq=1.0), product of:
              0.15122719 = queryWeight, product of:
                1.0250732 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.016074138 = queryNorm
              0.35851502 = fieldWeight in 3055, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.059476957 = weight(abstract_txt:indexierungssystem in 3055) [ClassicSimilarity], result of:
            0.059476957 = score(doc=3055,freq=1.0), product of:
              0.1608561 = queryWeight, product of:
                1.0572038 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.016074138 = queryNorm
              0.3697526 = fieldWeight in 3055, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.08729228 = weight(abstract_txt:indexterme in 3055) [ClassicSimilarity], result of:
            0.08729228 = score(doc=3055,freq=2.0), product of:
              0.16488418 = queryWeight, product of:
                1.0703589 = boost
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.016074138 = queryNorm
              0.5294157 = fieldWeight in 3055, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.034341197 = weight(abstract_txt:wurden in 3055) [ClassicSimilarity], result of:
            0.034341197 = score(doc=3055,freq=3.0), product of:
              0.097435735 = queryWeight, product of:
                1.1636277 = boost
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.016074138 = queryNorm
              0.35244972 = fieldWeight in 3055, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.21589608 = weight(abstract_txt:extrahieren in 3055) [ClassicSimilarity], result of:
            0.21589608 = score(doc=3055,freq=5.0), product of:
              0.2799335 = queryWeight, product of:
                1.9723427 = boost
                8.829678 = idf(docFreq=16, maxDocs=42740)
                0.016074138 = queryNorm
              0.7712406 = fieldWeight in 3055, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.829678 = idf(docFreq=16, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.05211078 = weight(abstract_txt:werden in 3055) [ClassicSimilarity], result of:
            0.05211078 = score(doc=3055,freq=8.0), product of:
              0.13381676 = queryWeight, product of:
                2.3619506 = boost
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.016074138 = queryNorm
              0.38941893 = fieldWeight in 3055, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.3253033 = weight(abstract_txt:lingo in 3055) [ClassicSimilarity], result of:
            0.3253033 = score(doc=3055,freq=4.0), product of:
              0.45368153 = queryWeight, product of:
                3.0752194 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.016074138 = queryNorm
              0.71703005 = fieldWeight in 3055, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
        0.36 = coord(9/25)
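
The explain tree for entry 1 can be re-derived from the raw counts. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative and not part of any API; the queryNorm, boost, fieldNorm, and coord values are copied from the listing above.

```python
import math

MAX_DOCS = 42740          # maxDocs from the listing
QUERY_NORM = 0.016074138  # queryNorm from the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as defined by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, boost=1.0):
    """Score of one query term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (termFreq, docFreq, boost) for the nine query terms matched in doc 3055.
# A missing boost line in the listing is treated as boost = 1.0.
TERMS_3055 = [
    (5.0, 1320, 1.0),        # können
    (2.0, 12, 1.0161333),    # extrahiert
    (1.0, 11, 1.0250732),    # wörterbüchern
    (1.0, 8, 1.0572038),     # indexierungssystem
    (2.0, 7, 1.0703589),     # indexterme
    (3.0, 634, 1.1636277),   # wurden
    (5.0, 16, 1.9723427),    # extrahieren
    (8.0, 3422, 2.3619506),  # werden
    (4.0, 11, 3.0752194),    # lingo
]


def document_score(terms, field_norm, matched, query_terms):
    """Sum of the term scores, scaled by coord = matched / query_terms."""
    total = sum(term_score(f, df, field_norm, b) for f, df, b in terms)
    return total * (matched / query_terms)


score_3055 = document_score(TERMS_3055, 0.0390625, matched=9, query_terms=25)
print(round(score_3055, 4))
```

Apart from small floating-point differences (Lucene computes in single precision), the result should agree with the 0.3353264 reported for entry 1.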
    
  2. Glaesener, L.: Automatisches Indexieren einer informationswissenschaftlichen Datenbank mit Mehrwortgruppen (2012) 0.17
    0.16694605 = sum of:
      0.16694605 = product of:
        0.83473027 = sum of:
          0.042112265 = weight(abstract_txt:kann in 2402) [ClassicSimilarity], result of:
            0.042112265 = score(doc=2402,freq=1.0), product of:
              0.074140266 = queryWeight, product of:
                1.0150373 = boost
                4.544064 = idf(docFreq=1234, maxDocs=42740)
                0.016074138 = queryNorm
              0.568008 = fieldWeight in 2402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.544064 = idf(docFreq=1234, maxDocs=42740)
                0.125 = fieldNorm(doc=2402)
          0.06344608 = weight(abstract_txt:wurden in 2402) [ClassicSimilarity], result of:
            0.06344608 = score(doc=2402,freq=1.0), product of:
              0.097435735 = queryWeight, product of:
                1.1636277 = boost
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.016074138 = queryNorm
              0.6511582 = fieldWeight in 2402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.125 = fieldNorm(doc=2402)
          0.14973007 = weight(abstract_txt:automatische in 2402) [ClassicSimilarity], result of:
            0.14973007 = score(doc=2402,freq=1.0), product of:
              0.17271143 = queryWeight, product of:
                1.5492284 = boost
                6.9355025 = idf(docFreq=112, maxDocs=42740)
                0.016074138 = queryNorm
              0.8669378 = fieldWeight in 2402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9355025 = idf(docFreq=112, maxDocs=42740)
                0.125 = fieldNorm(doc=2402)
          0.05895662 = weight(abstract_txt:werden in 2402) [ClassicSimilarity], result of:
            0.05895662 = score(doc=2402,freq=1.0), product of:
              0.13381676 = queryWeight, product of:
                2.3619506 = boost
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.016074138 = queryNorm
              0.44057724 = fieldWeight in 2402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.125 = fieldNorm(doc=2402)
          0.5204852 = weight(abstract_txt:lingo in 2402) [ClassicSimilarity], result of:
            0.5204852 = score(doc=2402,freq=1.0), product of:
              0.45368153 = queryWeight, product of:
                3.0752194 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.016074138 = queryNorm
              1.147248 = fieldWeight in 2402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.125 = fieldNorm(doc=2402)
        0.2 = coord(5/25)
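
In entry 2 the term "lingo" occurs only once, yet its contribution (0.5204852) is larger than in entry 1 (0.3253033), where it occurs four times. The sketch below isolates the cause, assuming the ClassicSimilarity definition fieldWeight = sqrt(freq) * idf * fieldNorm; the helper name is illustrative. The larger fieldNorm of doc 2402 (which, under default length normalization, would indicate a shorter abstract field) outweighs the lower term frequency.

```python
import math


def field_weight(freq, idf, field_norm):
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
    return math.sqrt(freq) * idf * field_norm


IDF_LINGO = 9.177984  # idf(docFreq=11, maxDocs=42740) from the listing

fw_3055 = field_weight(4.0, IDF_LINGO, 0.0390625)  # entry 1: freq=4.0
fw_2402 = field_weight(1.0, IDF_LINGO, 0.125)      # entry 2: freq=1.0

print(fw_3055, fw_2402)
```

Because queryWeight for "lingo" is identical in both entries (0.45368153), the ordering of the two fieldWeight values carries over directly to the term scores.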
    
  3. Grün, S.: Mehrwortbegriffe und Latent Semantic Analysis : Bewertung automatisch extrahierter Mehrwortgruppen mit LSA (2017) 0.16
    0.16171142 = sum of:
      0.16171142 = product of:
        0.6737976 = sum of:
          0.08057287 = weight(abstract_txt:bevorzugt in 5955) [ClassicSimilarity], result of:
            0.08057287 = score(doc=5955,freq=1.0), product of:
              0.14396302 = queryWeight, product of:
                1.0001506 = boost
                8.954841 = idf(docFreq=14, maxDocs=42740)
                0.016074138 = queryNorm
              0.55967754 = fieldWeight in 5955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.954841 = idf(docFreq=14, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
          0.03172304 = weight(abstract_txt:wurden in 5955) [ClassicSimilarity], result of:
            0.03172304 = score(doc=5955,freq=1.0), product of:
              0.097435735 = queryWeight, product of:
                1.1636277 = boost
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.016074138 = queryNorm
              0.3255791 = fieldWeight in 5955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
          0.074865036 = weight(abstract_txt:automatische in 5955) [ClassicSimilarity], result of:
            0.074865036 = score(doc=5955,freq=1.0), product of:
              0.17271143 = queryWeight, product of:
                1.5492284 = boost
                6.9355025 = idf(docFreq=112, maxDocs=42740)
                0.016074138 = queryNorm
              0.4334689 = fieldWeight in 5955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9355025 = idf(docFreq=112, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
          0.08911969 = weight(abstract_txt:eingesetzt in 5955) [ClassicSimilarity], result of:
            0.08911969 = score(doc=5955,freq=1.0), product of:
              0.22206558 = queryWeight, product of:
                2.1514993 = boost
                6.4211435 = idf(docFreq=188, maxDocs=42740)
                0.016074138 = queryNorm
              0.40132147 = fieldWeight in 5955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4211435 = idf(docFreq=188, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
          0.02947831 = weight(abstract_txt:werden in 5955) [ClassicSimilarity], result of:
            0.02947831 = score(doc=5955,freq=1.0), product of:
              0.13381676 = queryWeight, product of:
                2.3619506 = boost
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.016074138 = queryNorm
              0.22028862 = fieldWeight in 5955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
          0.36803862 = weight(abstract_txt:lingo in 5955) [ClassicSimilarity], result of:
            0.36803862 = score(doc=5955,freq=2.0), product of:
              0.45368153 = queryWeight, product of:
                3.0752194 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.016074138 = queryNorm
              0.81122684 = fieldWeight in 5955, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
        0.24 = coord(6/25)
    
  4. Bredack, J.; Lepsky, K.: Automatische Extraktion von Fachterminologie aus Volltexten (2014) 0.15
    0.14542933 = sum of:
      0.14542933 = product of:
        0.9089333 = sum of:
          0.16653547 = weight(abstract_txt:indexierungssystem in 1873) [ClassicSimilarity], result of:
            0.16653547 = score(doc=1873,freq=1.0), product of:
              0.1608561 = queryWeight, product of:
                1.0572038 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.016074138 = queryNorm
              1.0353072 = fieldWeight in 1873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.109375 = fieldNorm(doc=1873)
          0.13101381 = weight(abstract_txt:automatische in 1873) [ClassicSimilarity], result of:
            0.13101381 = score(doc=1873,freq=1.0), product of:
              0.17271143 = queryWeight, product of:
                1.5492284 = boost
                6.9355025 = idf(docFreq=112, maxDocs=42740)
                0.016074138 = queryNorm
              0.7585706 = fieldWeight in 1873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9355025 = idf(docFreq=112, maxDocs=42740)
                0.109375 = fieldNorm(doc=1873)
          0.15595946 = weight(abstract_txt:eingesetzt in 1873) [ClassicSimilarity], result of:
            0.15595946 = score(doc=1873,freq=1.0), product of:
              0.22206558 = queryWeight, product of:
                2.1514993 = boost
                6.4211435 = idf(docFreq=188, maxDocs=42740)
                0.016074138 = queryNorm
              0.7023126 = fieldWeight in 1873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4211435 = idf(docFreq=188, maxDocs=42740)
                0.109375 = fieldNorm(doc=1873)
          0.45542458 = weight(abstract_txt:lingo in 1873) [ClassicSimilarity], result of:
            0.45542458 = score(doc=1873,freq=1.0), product of:
              0.45368153 = queryWeight, product of:
                3.0752194 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.016074138 = queryNorm
              1.003842 = fieldWeight in 1873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.109375 = fieldNorm(doc=1873)
        0.16 = coord(4/25)
    
  5. Grün, S.: Bildung von Komposita-Indextermen auf der Basis einer algorithmischen Mehrwortgruppenanalyse mit Lingo (2015) 0.12
    0.11802466 = sum of:
      0.11802466 = product of:
        0.5901233 = sum of:
          0.035592426 = weight(abstract_txt:können in 2336) [ClassicSimilarity], result of:
            0.035592426 = score(doc=2336,freq=2.0), product of:
              0.07195983 = queryWeight, product of:
                4.476746 = idf(docFreq=1320, maxDocs=42740)
                0.016074138 = queryNorm
              0.4946152 = fieldWeight in 2336, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.476746 = idf(docFreq=1320, maxDocs=42740)
                0.078125 = fieldNorm(doc=2336)
          0.10843444 = weight(abstract_txt:wörterbüchern in 2336) [ClassicSimilarity], result of:
            0.10843444 = score(doc=2336,freq=1.0), product of:
              0.15122719 = queryWeight, product of:
                1.0250732 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.016074138 = queryNorm
              0.71703005 = fieldWeight in 2336, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.078125 = fieldNorm(doc=2336)
          0.068682395 = weight(abstract_txt:wurden in 2336) [ClassicSimilarity], result of:
            0.068682395 = score(doc=2336,freq=3.0), product of:
              0.097435735 = queryWeight, product of:
                1.1636277 = boost
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.016074138 = queryNorm
              0.70489943 = fieldWeight in 2336, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.078125 = fieldNorm(doc=2336)
          0.05211078 = weight(abstract_txt:werden in 2336) [ClassicSimilarity], result of:
            0.05211078 = score(doc=2336,freq=2.0), product of:
              0.13381676 = queryWeight, product of:
                2.3619506 = boost
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.016074138 = queryNorm
              0.38941893 = fieldWeight in 2336, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.524618 = idf(docFreq=3422, maxDocs=42740)
                0.078125 = fieldNorm(doc=2336)
          0.3253033 = weight(abstract_txt:lingo in 2336) [ClassicSimilarity], result of:
            0.3253033 = score(doc=2336,freq=1.0), product of:
              0.45368153 = queryWeight, product of:
                3.0752194 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.016074138 = queryNorm
              0.71703005 = fieldWeight in 2336, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.078125 = fieldNorm(doc=2336)
        0.2 = coord(5/25)