Document (#26337)

Author
Grün, S.
Title
Bildung von Komposita-Indextermen auf der Basis einer algorithmischen Mehrwortgruppenanalyse mit Lingo
Imprint
Köln : Fachhochschule, Institut für Informationswissenschaft
Year
2015
Pages
69 pp.
Abstract
In German, concepts can be expressed both by compounds (Komposita) and by multi-word groups (Mehrwortgruppen). A multi-word group can often also be written as a single compound, in which case both forms refer to the same concept. This study analyses multi-word groups that could also be expressed as compounds. The aim is to identify these word sequences by means of patterns. The data analysed are job advertisements from the career manager Placement24 GmbH. Multi-word groups were extracted algorithmically using the open-source software Lingo. Based on extensions and adjustments to the dictionaries and to the words tagged in them, candidates consisting of three to five words were analysed. Compounds were formed from the positively rated multi-word groups and then compared with the compounds identified in the job advertisements. The comparison showed that a large share of the newly generated compounds had not been produced by compound identification.
Content
Bachelor's thesis, Library Science degree programme, Faculty of Information and Communication Sciences, Fachhochschule Köln
Theme
Automatic indexing
Object
Lingo

Similar documents (content)

  1. Bredack, J.: Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten (2013) 0.57
    0.5689765 = sum of:
      0.5689765 = product of:
        1.1853678 = sum of:
          0.055814113 = weight(abstract_txt:extraktion in 3055) [ClassicSimilarity], result of:
            0.055814113 = score(doc=3055,freq=3.0), product of:
              0.094620235 = queryWeight, product of:
                1.0275332 = boost
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.010562065 = queryNorm
              0.58987504 = fieldWeight in 3055, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.01618515 = weight(abstract_txt:diese in 3055) [ClassicSimilarity], result of:
            0.01618515 = score(doc=3055,freq=4.0), product of:
              0.047452915 = queryWeight, product of:
                1.0290828 = boost
                4.3657994 = idf(docFreq=1475, maxDocs=42740)
                0.010562065 = queryNorm
              0.34107807 = fieldWeight in 3055, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3657994 = idf(docFreq=1475, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.01951047 = weight(abstract_txt:können in 3055) [ClassicSimilarity], result of:
            0.01951047 = score(doc=3055,freq=5.0), product of:
              0.049895372 = queryWeight, product of:
                1.0552344 = boost
                4.476746 = idf(docFreq=1320, maxDocs=42740)
                0.010562065 = queryNorm
              0.39102766 = fieldWeight in 3055, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.476746 = idf(docFreq=1320, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.03759303 = weight(abstract_txt:wörterbüchern in 3055) [ClassicSimilarity], result of:
            0.03759303 = score(doc=3055,freq=1.0), product of:
              0.104857616 = queryWeight, product of:
                1.0816926 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.010562065 = queryNorm
              0.35851502 = fieldWeight in 3055, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.014233507 = weight(abstract_txt:basis in 3055) [ClassicSimilarity], result of:
            0.014233507 = score(doc=3055,freq=2.0), product of:
              0.054878604 = queryWeight, product of:
                1.1066756 = boost
                4.694981 = idf(docFreq=1061, maxDocs=42740)
                0.010562065 = queryNorm
              0.2593635 = fieldWeight in 3055, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.694981 = idf(docFreq=1061, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.06309199 = weight(abstract_txt:algorithmisch in 3055) [ClassicSimilarity], result of:
            0.06309199 = score(doc=3055,freq=2.0), product of:
              0.117535196 = queryWeight, product of:
                1.1452171 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.010562065 = queryNorm
              0.53679234 = fieldWeight in 3055, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.01604339 = weight(abstract_txt:durch in 3055) [ClassicSimilarity], result of:
            0.01604339 = score(doc=3055,freq=2.0), product of:
              0.06803874 = queryWeight, product of:
                1.5091854 = boost
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.010562065 = queryNorm
              0.23579785 = fieldWeight in 3055, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.022728046 = weight(abstract_txt:wurde in 3055) [ClassicSimilarity], result of:
            0.022728046 = score(doc=3055,freq=2.0), product of:
              0.08582232 = queryWeight, product of:
                1.69498 = boost
                4.793876 = idf(docFreq=961, maxDocs=42740)
                0.010562065 = queryNorm
              0.26482674 = fieldWeight in 3055, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.793876 = idf(docFreq=961, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.035717152 = weight(abstract_txt:wurden in 3055) [ClassicSimilarity], result of:
            0.035717152 = score(doc=3055,freq=3.0), product of:
              0.101339705 = queryWeight, product of:
                1.8418502 = boost
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.010562065 = queryNorm
              0.35244972 = fieldWeight in 3055, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.15037212 = weight(abstract_txt:lingo in 3055) [ClassicSimilarity], result of:
            0.15037212 = score(doc=3055,freq=4.0), product of:
              0.20971523 = queryWeight, product of:
                2.1633852 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.010562065 = queryNorm
              0.71703005 = fieldWeight in 3055, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.07955429 = weight(abstract_txt:analysiert in 3055) [ClassicSimilarity], result of:
            0.07955429 = score(doc=3055,freq=3.0), product of:
              0.17283732 = queryWeight, product of:
                2.4053774 = boost
                6.803078 = idf(docFreq=128, maxDocs=42740)
                0.010562065 = queryNorm
              0.46028423 = fieldWeight in 3055, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.803078 = idf(docFreq=128, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
          0.6745246 = weight(abstract_txt:mehrwortgruppen in 3055) [ClassicSimilarity], result of:
            0.6745246 = score(doc=3055,freq=13.0), product of:
              0.48517585 = queryWeight, product of:
                4.6535397 = boost
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.010562065 = queryNorm
              1.3902683 = fieldWeight in 3055, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3055)
        0.48 = coord(12/25)
    
  2. Grün, S.: Mehrwortbegriffe und Latent Semantic Analysis : Bewertung automatisch extrahierter Mehrwortgruppen mit LSA (2017) 0.30
    0.29681402 = sum of:
      0.29681402 = product of:
        0.9275439 = sum of:
          0.016103376 = weight(abstract_txt:basis in 5955) [ClassicSimilarity], result of:
            0.016103376 = score(doc=5955,freq=1.0), product of:
              0.054878604 = queryWeight, product of:
                1.1066756 = boost
                4.694981 = idf(docFreq=1061, maxDocs=42740)
                0.010562065 = queryNorm
              0.29343632 = fieldWeight in 5955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.694981 = idf(docFreq=1061, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
          0.06598403 = weight(abstract_txt:bewerteten in 5955) [ClassicSimilarity], result of:
            0.06598403 = score(doc=5955,freq=1.0), product of:
              0.111534104 = queryWeight, product of:
                1.1155978 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.010562065 = queryNorm
              0.5916041 = fieldWeight in 5955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
          0.13695596 = weight(abstract_txt:kandidaten in 5955) [ClassicSimilarity], result of:
            0.13695596 = score(doc=5955,freq=4.0), product of:
              0.11432707 = queryWeight, product of:
                1.1294795 = boost
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.010562065 = queryNorm
              1.1979312 = fieldWeight in 5955, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
          0.025669422 = weight(abstract_txt:durch in 5955) [ClassicSimilarity], result of:
            0.025669422 = score(doc=5955,freq=2.0), product of:
              0.06803874 = queryWeight, product of:
                1.5091854 = boost
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.010562065 = queryNorm
              0.37727657 = fieldWeight in 5955, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
          0.051427696 = weight(abstract_txt:wurde in 5955) [ClassicSimilarity], result of:
            0.051427696 = score(doc=5955,freq=4.0), product of:
              0.08582232 = queryWeight, product of:
                1.69498 = boost
                4.793876 = idf(docFreq=961, maxDocs=42740)
                0.010562065 = queryNorm
              0.5992345 = fieldWeight in 5955, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.793876 = idf(docFreq=961, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
          0.03299409 = weight(abstract_txt:wurden in 5955) [ClassicSimilarity], result of:
            0.03299409 = score(doc=5955,freq=1.0), product of:
              0.101339705 = queryWeight, product of:
                1.8418502 = boost
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.010562065 = queryNorm
              0.3255791 = fieldWeight in 5955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
          0.17012663 = weight(abstract_txt:lingo in 5955) [ClassicSimilarity], result of:
            0.17012663 = score(doc=5955,freq=2.0), product of:
              0.20971523 = queryWeight, product of:
                2.1633852 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.010562065 = queryNorm
              0.81122684 = fieldWeight in 5955, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
          0.4282827 = weight(abstract_txt:komposita in 5955) [ClassicSimilarity], result of:
            0.4282827 = score(doc=5955,freq=1.0), product of:
              0.7052112 = queryWeight, product of:
                6.8713026 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.010562065 = queryNorm
              0.60731125 = fieldWeight in 5955, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.0625 = fieldNorm(doc=5955)
        0.32 = coord(8/25)
    
  3. Glaesener, L.: Automatisches Indexieren einer informationswissenschaftlichen Datenbank mit Mehrwortgruppen (2012) 0.28
    0.28414366 = sum of:
      0.28414366 = product of:
        1.4207182 = sum of:
          0.025896238 = weight(abstract_txt:diese in 2402) [ClassicSimilarity], result of:
            0.025896238 = score(doc=2402,freq=1.0), product of:
              0.047452915 = queryWeight, product of:
                1.0290828 = boost
                4.3657994 = idf(docFreq=1475, maxDocs=42740)
                0.010562065 = queryNorm
              0.5457249 = fieldWeight in 2402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3657994 = idf(docFreq=1475, maxDocs=42740)
                0.125 = fieldNorm(doc=2402)
          0.051338844 = weight(abstract_txt:durch in 2402) [ClassicSimilarity], result of:
            0.051338844 = score(doc=2402,freq=2.0), product of:
              0.06803874 = queryWeight, product of:
                1.5091854 = boost
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.010562065 = queryNorm
              0.75455314 = fieldWeight in 2402, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.125 = fieldNorm(doc=2402)
          0.06598818 = weight(abstract_txt:wurden in 2402) [ClassicSimilarity], result of:
            0.06598818 = score(doc=2402,freq=1.0), product of:
              0.101339705 = queryWeight, product of:
                1.8418502 = boost
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.010562065 = queryNorm
              0.6511582 = fieldWeight in 2402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.125 = fieldNorm(doc=2402)
          0.24059539 = weight(abstract_txt:lingo in 2402) [ClassicSimilarity], result of:
            0.24059539 = score(doc=2402,freq=1.0), product of:
              0.20971523 = queryWeight, product of:
                2.1633852 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.010562065 = queryNorm
              1.147248 = fieldWeight in 2402, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.125 = fieldNorm(doc=2402)
          1.0368996 = weight(abstract_txt:mehrwortgruppen in 2402) [ClassicSimilarity], result of:
            1.0368996 = score(doc=2402,freq=3.0), product of:
              0.48517585 = queryWeight, product of:
                4.6535397 = boost
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.010562065 = queryNorm
              2.1371624 = fieldWeight in 2402, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.125 = fieldNorm(doc=2402)
        0.2 = coord(5/25)
    
  4. Bredack, J.; Lepsky, K.: Automatische Extraktion von Fachterminologie aus Volltexten (2014) 0.18
    0.18441544 = sum of:
      0.18441544 = product of:
        1.1525966 = sum of:
          0.15627952 = weight(abstract_txt:extraktion in 1873) [ClassicSimilarity], result of:
            0.15627952 = score(doc=1873,freq=3.0), product of:
              0.094620235 = queryWeight, product of:
                1.0275332 = boost
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.010562065 = queryNorm
              1.6516501 = fieldWeight in 1873, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.7184515 = idf(docFreq=18, maxDocs=42740)
                0.109375 = fieldNorm(doc=1873)
          0.044999234 = weight(abstract_txt:wurde in 1873) [ClassicSimilarity], result of:
            0.044999234 = score(doc=1873,freq=1.0), product of:
              0.08582232 = queryWeight, product of:
                1.69498 = boost
                4.793876 = idf(docFreq=961, maxDocs=42740)
                0.010562065 = queryNorm
              0.5243302 = fieldWeight in 1873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.793876 = idf(docFreq=961, maxDocs=42740)
                0.109375 = fieldNorm(doc=1873)
          0.21052095 = weight(abstract_txt:lingo in 1873) [ClassicSimilarity], result of:
            0.21052095 = score(doc=1873,freq=1.0), product of:
              0.20971523 = queryWeight, product of:
                2.1633852 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.010562065 = queryNorm
              1.003842 = fieldWeight in 1873, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.109375 = fieldNorm(doc=1873)
          0.74079686 = weight(abstract_txt:mehrwortgruppen in 1873) [ClassicSimilarity], result of:
            0.74079686 = score(doc=1873,freq=2.0), product of:
              0.48517585 = queryWeight, product of:
                4.6535397 = boost
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.010562065 = queryNorm
              1.5268626 = fieldWeight in 1873, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.109375 = fieldNorm(doc=1873)
        0.16 = coord(4/25)
    
  5. Dzeyk, W.: Effektiv und nutzerfreundlich : Einsatz von semantischen Technologien und Usability-Methoden zur Verbesserung der medizinischen Literatursuche (2010) 0.10
    0.096176825 = sum of:
      0.096176825 = product of:
        0.40073678 = sum of:
          0.011444629 = weight(abstract_txt:diese in 1417) [ClassicSimilarity], result of:
            0.011444629 = score(doc=1417,freq=2.0), product of:
              0.047452915 = queryWeight, product of:
                1.0290828 = boost
                4.3657994 = idf(docFreq=1475, maxDocs=42740)
                0.010562065 = queryNorm
              0.24117863 = fieldWeight in 1417, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3657994 = idf(docFreq=1475, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1417)
          0.010064609 = weight(abstract_txt:basis in 1417) [ClassicSimilarity], result of:
            0.010064609 = score(doc=1417,freq=1.0), product of:
              0.054878604 = queryWeight, product of:
                1.1066756 = boost
                4.694981 = idf(docFreq=1061, maxDocs=42740)
                0.010562065 = queryNorm
              0.1833977 = fieldWeight in 1417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.694981 = idf(docFreq=1061, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1417)
          0.027787967 = weight(abstract_txt:durch in 1417) [ClassicSimilarity], result of:
            0.027787967 = score(doc=1417,freq=6.0), product of:
              0.06803874 = queryWeight, product of:
                1.5091854 = boost
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.010562065 = queryNorm
              0.4084139 = fieldWeight in 1417, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.2683973 = idf(docFreq=1626, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1417)
          0.04252028 = weight(abstract_txt:wurde in 1417) [ClassicSimilarity], result of:
            0.04252028 = score(doc=1417,freq=7.0), product of:
              0.08582232 = queryWeight, product of:
                1.69498 = boost
                4.793876 = idf(docFreq=961, maxDocs=42740)
                0.010562065 = queryNorm
              0.49544546 = fieldWeight in 1417, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.793876 = idf(docFreq=961, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1417)
          0.041242614 = weight(abstract_txt:wurden in 1417) [ClassicSimilarity], result of:
            0.041242614 = score(doc=1417,freq=4.0), product of:
              0.101339705 = queryWeight, product of:
                1.8418502 = boost
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.010562065 = queryNorm
              0.4069739 = fieldWeight in 1417, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.2092657 = idf(docFreq=634, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1417)
          0.26767668 = weight(abstract_txt:komposita in 1417) [ClassicSimilarity], result of:
            0.26767668 = score(doc=1417,freq=1.0), product of:
              0.7052112 = queryWeight, product of:
                6.8713026 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.010562065 = queryNorm
              0.37956953 = fieldWeight in 1417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.0390625 = fieldNorm(doc=1417)
        0.24 = coord(6/25)
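
The score trees above can be retraced by hand. The following is a minimal sketch, assuming the standard TF-IDF formulas of Lucene's ClassicSimilarity as they appear in the trees: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord (matched terms / query terms). The function names are hypothetical and chosen for illustration only; the input values are copied from the tree for document 2402 (entry 3).

```python
import math

# Constants copied from the explain output above.
MAX_DOCS = 42740
QUERY_NORM = 0.010562065
QUERY_TERM_COUNT = 25          # denominator of coord(n/25)
FIELD_NORM_2402 = 0.125        # fieldNorm(doc=2402)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    idf = classic_idf(doc_freq, MAX_DOCS)
    query_weight = boost * idf * QUERY_NORM
    field_weight = tf * idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm: float) -> float:
    """Sum of the term scores, scaled by the coord factor."""
    total = sum(term_score(freq, doc_freq, boost, field_norm)
                for _, freq, doc_freq, boost in terms)
    coord = len(terms) / QUERY_TERM_COUNT
    return total * coord


# (term, freq, docFreq, boost) for the five terms matched in document 2402.
terms_2402 = [
    ("diese", 1.0, 1475, 1.0290828),
    ("durch", 2.0, 1626, 1.5091854),
    ("wurden", 1.0, 634, 1.8418502),
    ("lingo", 1.0, 11, 2.1633852),
    ("mehrwortgruppen", 3.0, 5, 4.6535397),
]

score_2402 = document_score(terms_2402, FIELD_NORM_2402)
print(f"document 2402: {score_2402:.4f}")
```

Under these assumptions the result should land close to the 0.28 listed for entry 3. Small deviations in the last digits are expected, because Lucene computes the values in single precision while Python uses double precision. The sketch takes fieldNorm as given and does not attempt to derive it from the field length.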