Document (#40195)

Author
Bredack, J.
Title
Automatische Extraktion fachterminologischer Mehrwortbegriffe : ein Verfahrensvergleich
Imprint
Trier : Universität / Fachbereich II
Year
2016
Pages
V, 98 pp.
Abstract
In this study, two systems were used to automatically extract multi-word terms (MWT) from a domain-specific document collection (full texts of the ACL Anthology Reference Corpus). The thematic scope covered all areas of natural language processing, in particular computational linguistics (CL) as an interdisciplinary science. The aim was to extract MWT that could serve as potential index terms in information retrieval (IR). These terms were meant to point to, or name, concepts, methods, procedures, and algorithms in CL and in neighboring fields such as linguistics and computer science.
The extraction systems used were the TreeTagger and the indexing software Lingo. The TreeTagger is based on a statistical tagging and chunking algorithm, with which noun phrases (NPs) are automatically identified and extracted. It can be used in various natural language processing scenarios, primarily as a POS tagger for different languages. In contrast to the TreeTagger, the indexing system Lingo works with electronic dictionaries and pattern-based matching. Lingo is a system designed for automatic indexing that ships with a large number of modules, which can be individually adapted to a particular task and coordinated with one another. The different processing approaches showed clearly in the result sets of the two systems. The small overlap between the result sets illustrates how differently the two systems work, and this difference could be described by way of example in a qualitative analysis. The thesis cannot conclusively determine which of the two systems should be preferred for generating index terms.
Content
Written thesis (Master's thesis) submitted for the degree of Master of Arts at Universität Trier, Fachbereich II, degree program in computational linguistics.
Theme
Automatic indexing
Computational linguistics
Object
Lingo
TreeTagger

Similar documents (content)

  1. Bredack, J.: Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten (2013) 0.34
    0.3366052 = sum of:
      0.3366052 = product of:
        0.93501437 = sum of:
          0.02763487 = weight(abstract_txt:können in 1054) [ClassicSimilarity], result of:
            0.02763487 = score(doc=1054,freq=5.0), product of:
              0.07112302 = queryWeight, product of:
                4.4483833 = idf(docFreq=1405, maxDocs=44218)
                0.01598851 = queryNorm
              0.3885503 = fieldWeight in 1054, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.4483833 = idf(docFreq=1405, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.0737774 = weight(abstract_txt:extrahiert in 1054) [ClassicSimilarity], result of:
            0.0737774 = score(doc=1054,freq=2.0), product of:
              0.147443 = queryWeight, product of:
                1.0181036 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.01598851 = queryNorm
              0.50037915 = fieldWeight in 1054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.05487758 = weight(abstract_txt:wörterbüchern in 1054) [ClassicSimilarity], result of:
            0.05487758 = score(doc=1054,freq=1.0), product of:
              0.15250422 = queryWeight, product of:
                1.0354302 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.01598851 = queryNorm
              0.35984302 = fieldWeight in 1054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.060181133 = weight(abstract_txt:indexierungssystem in 1054) [ClassicSimilarity], result of:
            0.060181133 = score(doc=1054,freq=1.0), product of:
              0.16217807 = queryWeight, product of:
                1.0677657 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.01598851 = queryNorm
              0.37108058 = fieldWeight in 1054, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.08510897 = weight(abstract_txt:indexterme in 1054) [ClassicSimilarity], result of:
            0.08510897 = score(doc=1054,freq=2.0), product of:
              0.16217807 = queryWeight, product of:
                1.0677657 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.01598851 = queryNorm
              0.5247872 = fieldWeight in 1054, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.03419516 = weight(abstract_txt:wurden in 1054) [ClassicSimilarity], result of:
            0.03419516 = score(doc=1054,freq=3.0), product of:
              0.09719216 = queryWeight, product of:
                1.1689893 = boost
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.01598851 = queryNorm
              0.35183042 = fieldWeight in 1054, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.21862116 = weight(abstract_txt:extrahieren in 1054) [ClassicSimilarity], result of:
            0.21862116 = score(doc=1054,freq=5.0), product of:
              0.28237963 = queryWeight, product of:
                1.9925607 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.01598851 = queryNorm
              0.7742101 = fieldWeight in 1054, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.051352646 = weight(abstract_txt:werden in 1054) [ClassicSimilarity], result of:
            0.051352646 = score(doc=1054,freq=8.0), product of:
              0.13256052 = queryWeight, product of:
                2.364627 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.01598851 = queryNorm
              0.3873902 = fieldWeight in 1054, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
          0.32926548 = weight(abstract_txt:lingo in 1054) [ClassicSimilarity], result of:
            0.32926548 = score(doc=1054,freq=4.0), product of:
              0.45751265 = queryWeight, product of:
                3.1062906 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.01598851 = queryNorm
              0.71968603 = fieldWeight in 1054, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1054)
        0.36 = coord(9/25)
    
  2. Glaesener, L.: Automatisches Indexieren einer informationswissenschaftlichen Datenbank mit Mehrwortgruppen (2012) 0.17
    0.16758902 = sum of:
      0.16758902 = product of:
        0.8379451 = sum of:
          0.041110285 = weight(abstract_txt:kann in 401) [ClassicSimilarity], result of:
            0.041110285 = score(doc=401,freq=1.0), product of:
              0.07298421 = queryWeight, product of:
                1.0129998 = boost
                4.5062113 = idf(docFreq=1326, maxDocs=44218)
                0.01598851 = queryNorm
              0.5632764 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5062113 = idf(docFreq=1326, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
          0.06317627 = weight(abstract_txt:wurden in 401) [ClassicSimilarity], result of:
            0.06317627 = score(doc=401,freq=1.0), product of:
              0.09719216 = queryWeight, product of:
                1.1689893 = boost
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.01598851 = queryNorm
              0.65001404 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
          0.14873493 = weight(abstract_txt:automatische in 401) [ClassicSimilarity], result of:
            0.14873493 = score(doc=401,freq=1.0), product of:
              0.17200348 = queryWeight, product of:
                1.5551186 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.01598851 = queryNorm
              0.86472046 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
          0.058098882 = weight(abstract_txt:werden in 401) [ClassicSimilarity], result of:
            0.058098882 = score(doc=401,freq=1.0), product of:
              0.13256052 = queryWeight, product of:
                2.364627 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.01598851 = queryNorm
              0.43828195 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
          0.5268247 = weight(abstract_txt:lingo in 401) [ClassicSimilarity], result of:
            0.5268247 = score(doc=401,freq=1.0), product of:
              0.45751265 = queryWeight, product of:
                3.1062906 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.01598851 = queryNorm
              1.1514976 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.125 = fieldNorm(doc=401)
        0.2 = coord(5/25)
    
  3. Grün, S.: Mehrwortbegriffe und Latent Semantic Analysis : Bewertung automatisch extrahierter Mehrwortgruppen mit LSA (2017) 0.16
    0.16287415 = sum of:
      0.16287415 = product of:
        0.6786423 = sum of:
          0.08157674 = weight(abstract_txt:bevorzugt in 3954) [ClassicSimilarity], result of:
            0.08157674 = score(doc=3954,freq=1.0), product of:
              0.14520542 = queryWeight, product of:
                1.0103488 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.01598851 = queryNorm
              0.5618023 = fieldWeight in 3954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
          0.031588133 = weight(abstract_txt:wurden in 3954) [ClassicSimilarity], result of:
            0.031588133 = score(doc=3954,freq=1.0), product of:
              0.09719216 = queryWeight, product of:
                1.1689893 = boost
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.01598851 = queryNorm
              0.32500702 = fieldWeight in 3954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2001123 = idf(docFreq=662, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
          0.07436746 = weight(abstract_txt:automatische in 3954) [ClassicSimilarity], result of:
            0.07436746 = score(doc=3954,freq=1.0), product of:
              0.17200348 = queryWeight, product of:
                1.5551186 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.01598851 = queryNorm
              0.43236023 = fieldWeight in 3954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
          0.08953921 = weight(abstract_txt:eingesetzt in 3954) [ClassicSimilarity], result of:
            0.08953921 = score(doc=3954,freq=1.0), product of:
              0.22283727 = queryWeight, product of:
                2.167876 = boost
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.01598851 = queryNorm
              0.4018143 = fieldWeight in 3954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
          0.029049441 = weight(abstract_txt:werden in 3954) [ClassicSimilarity], result of:
            0.029049441 = score(doc=3954,freq=1.0), product of:
              0.13256052 = queryWeight, product of:
                2.364627 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.01598851 = queryNorm
              0.21914098 = fieldWeight in 3954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
          0.3725213 = weight(abstract_txt:lingo in 3954) [ClassicSimilarity], result of:
            0.3725213 = score(doc=3954,freq=2.0), product of:
              0.45751265 = queryWeight, product of:
                3.1062906 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.01598851 = queryNorm
              0.81423175 = fieldWeight in 3954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=3954)
        0.24 = coord(6/25)
    
  4. Lepsky, K.: Automatisches Indexieren (2023) 0.16
    0.16167742 = sum of:
      0.16167742 = product of:
        0.6736559 = sum of:
          0.029660854 = weight(abstract_txt:können in 781) [ClassicSimilarity], result of:
            0.029660854 = score(doc=781,freq=1.0), product of:
              0.07112302 = queryWeight, product of:
                4.4483833 = idf(docFreq=1405, maxDocs=44218)
                0.01598851 = queryNorm
              0.41703594 = fieldWeight in 781, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4483833 = idf(docFreq=1405, maxDocs=44218)
                0.09375 = fieldNorm(doc=781)
          0.04360404 = weight(abstract_txt:kann in 781) [ClassicSimilarity], result of:
            0.04360404 = score(doc=781,freq=2.0), product of:
              0.07298421 = queryWeight, product of:
                1.0129998 = boost
                4.5062113 = idf(docFreq=1326, maxDocs=44218)
                0.01598851 = queryNorm
              0.59744483 = fieldWeight in 781, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5062113 = idf(docFreq=1326, maxDocs=44218)
                0.09375 = fieldNorm(doc=781)
          0.25016826 = weight(abstract_txt:indexterme in 781) [ClassicSimilarity], result of:
            0.25016826 = score(doc=781,freq=3.0), product of:
              0.16217807 = queryWeight, product of:
                1.0677657 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.01598851 = queryNorm
              1.542553 = fieldWeight in 781, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.09375 = fieldNorm(doc=781)
          0.1115512 = weight(abstract_txt:automatische in 781) [ClassicSimilarity], result of:
            0.1115512 = score(doc=781,freq=1.0), product of:
              0.17200348 = queryWeight, product of:
                1.5551186 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.01598851 = queryNorm
              0.6485404 = fieldWeight in 781, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.09375 = fieldNorm(doc=781)
          0.16319892 = weight(abstract_txt:automatisch in 781) [ClassicSimilarity], result of:
            0.16319892 = score(doc=781,freq=2.0), product of:
              0.17593649 = queryWeight, product of:
                1.5727977 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.01598851 = queryNorm
              0.92760134 = fieldWeight in 781, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.09375 = fieldNorm(doc=781)
          0.07547266 = weight(abstract_txt:werden in 781) [ClassicSimilarity], result of:
            0.07547266 = score(doc=781,freq=3.0), product of:
              0.13256052 = queryWeight, product of:
                2.364627 = boost
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.01598851 = queryNorm
              0.56934494 = fieldWeight in 781, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.5062556 = idf(docFreq=3606, maxDocs=44218)
                0.09375 = fieldNorm(doc=781)
        0.24 = coord(6/25)
    
  5. Bredack, J.; Lepsky, K.: Automatische Extraktion von Fachterminologie aus Volltexten (2014) 0.15
    0.14661047 = sum of:
      0.14661047 = product of:
        0.91631544 = sum of:
          0.16850716 = weight(abstract_txt:indexierungssystem in 4872) [ClassicSimilarity], result of:
            0.16850716 = score(doc=4872,freq=1.0), product of:
              0.16217807 = queryWeight, product of:
                1.0677657 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.01598851 = queryNorm
              1.0390255 = fieldWeight in 4872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.109375 = fieldNorm(doc=4872)
          0.13014306 = weight(abstract_txt:automatische in 4872) [ClassicSimilarity], result of:
            0.13014306 = score(doc=4872,freq=1.0), product of:
              0.17200348 = queryWeight, product of:
                1.5551186 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.01598851 = queryNorm
              0.7566304 = fieldWeight in 4872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.109375 = fieldNorm(doc=4872)
          0.15669361 = weight(abstract_txt:eingesetzt in 4872) [ClassicSimilarity], result of:
            0.15669361 = score(doc=4872,freq=1.0), product of:
              0.22283727 = queryWeight, product of:
                2.167876 = boost
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.01598851 = queryNorm
              0.70317507 = fieldWeight in 4872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.109375 = fieldNorm(doc=4872)
          0.46097162 = weight(abstract_txt:lingo in 4872) [ClassicSimilarity], result of:
            0.46097162 = score(doc=4872,freq=1.0), product of:
              0.45751265 = queryWeight, product of:
                3.1062906 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.01598851 = queryNorm
              1.0075604 = fieldWeight in 4872, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.109375 = fieldNorm(doc=4872)
        0.16 = coord(4/25)
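
The score listings above follow Lucene's ClassicSimilarity (TF-IDF) scheme: each matching term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is multiplied by the coord factor. The following Python sketch recomputes the score of entry 5 (doc 4872) from the numbers printed in its listing. The function names and the tuple layout are illustrative assumptions, not part of the search system; queryNorm and fieldNorm are taken from the listing as given rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    # tf is the square root of the raw term frequency
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm   # "queryWeight" in the listing
    field_weight = tf * term_idf * field_norm      # "fieldWeight" in the listing
    return query_weight * field_weight

# Values copied from the listing for doc 4872 (entry 5)
MAX_DOCS = 44218
QUERY_NORM = 0.01598851
FIELD_NORM = 0.109375

# (term, boost, docFreq, freq) as shown in the listing
terms_4872 = [
    ("indexierungssystem", 1.0677657, 8, 1.0),
    ("automatische", 1.5551186, 118, 1.0),
    ("eingesetzt", 2.167876, 193, 1.0),
    ("lingo", 3.1062906, 11, 1.0),
]

term_scores = {
    term: term_score(freq, doc_freq, MAX_DOCS, QUERY_NORM, FIELD_NORM, boost)
    for term, boost, doc_freq, freq in terms_4872
}

# coord(4/25): 4 of the 25 query terms matched this document
coord = 4 / 25
total_4872 = sum(term_scores.values()) * coord

# The listing reports 0.14661047 for this document
print(f"{total_4872:.8f}")
```

The same functions apply to the other entries by substituting their boost, docFreq, freq, fieldNorm, and coord values. Terms listed without a boost line (for example `abstract_txt:können`) correspond to the default `boost=1.0`.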