Document (#30810)

Author
Granitzer, M.
Title
Statistische Verfahren der Textanalyse
Source
Semantic Web: Wege zur vernetzten Wissensgesellschaft. Ed.: T. Pellegrini and A. Blumauer
Imprint
Berlin : Springer
Year
2006
Pages
pp. 437-451
Series
X.media.press
Abstract
This article provides an overview of statistical methods of text analysis in the context of the Semantic Web. It opens with a discussion of methods and common techniques for preprocessing texts, such as stemming or part-of-speech tagging. The representations introduced in this way serve as the basis for statistical feature analysis and for more advanced techniques such as information extraction and machine learning methods. These specific techniques are presented in overview, with the aspects most relevant to the Semantic Web treated in detail. The article concludes with the application of the presented techniques to the creation and maintenance of ontologies and with pointers to further literature.
Theme
Computational linguistics
Semantic Web

Similar documents (content)

  1. Franke-Maier, M.; Beck, C.; Kasprzik, A.; Maas, J.F.; Pielmeier, S.; Wiesenmüller, H.: Ein Feuerwerk an Algorithmen und der Startschuss zur Bildung eines Kompetenznetzwerks für maschinelle Erschließung : Bericht zur Fachtagung Netzwerk maschinelle Erschließung an der Deutschen Nationalbibliothek am 10. und 11. Oktober 2019 (2020) 0.12
    0.12182379 = sum of:
      0.12182379 = product of:
        0.7613987 = sum of:
          0.19620821 = weight(abstract_txt:maschinelle in 5851) [ClassicSimilarity], result of:
            0.19620821 = score(doc=5851,freq=4.0), product of:
              0.13224867 = queryWeight, product of:
                1.077998 = boost
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.015504179 = queryNorm
              1.4836309 = fieldWeight in 5851, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.09375 = fieldNorm(doc=5851)
          0.03906027 = weight(abstract_txt:sowie in 5851) [ClassicSimilarity], result of:
            0.03906027 = score(doc=5851,freq=1.0), product of:
              0.09017788 = queryWeight, product of:
                1.2588886 = boost
                4.6202335 = idf(docFreq=1183, maxDocs=44218)
                0.015504179 = queryNorm
              0.4331469 = fieldWeight in 5851, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6202335 = idf(docFreq=1183, maxDocs=44218)
                0.09375 = fieldNorm(doc=5851)
          0.07576377 = weight(abstract_txt:verfahren in 5851) [ClassicSimilarity], result of:
            0.07576377 = score(doc=5851,freq=1.0), product of:
              0.14025475 = queryWeight, product of:
                1.5699872 = boost
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.015504179 = queryNorm
              0.5401868 = fieldWeight in 5851, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.09375 = fieldNorm(doc=5851)
          0.45036647 = weight(abstract_txt:textanalyse in 5851) [ClassicSimilarity], result of:
            0.45036647 = score(doc=5851,freq=2.0), product of:
              0.36529514 = queryWeight, product of:
                2.5337238 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.015504179 = queryNorm
              1.2328838 = fieldWeight in 5851, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.09375 = fieldNorm(doc=5851)
        0.16 = coord(4/25)
    
  2. Stollberg, M.: Ontologiebasierte Wissensmodellierung : Verwendung als semantischer Grundbaustein des Semantic Web (2002) 0.11
    0.11097161 = sum of:
      0.11097161 = product of:
        0.46238172 = sum of:
          0.14136153 = weight(abstract_txt:ontologien in 4495) [ClassicSimilarity], result of:
            0.14136153 = score(doc=4495,freq=16.0), product of:
              0.120020345 = queryWeight, product of:
                1.0269511 = boost
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.015504179 = queryNorm
              1.177813 = fieldWeight in 4495, product of:
                4.0 = tf(freq=16.0), with freq of:
                  16.0 = termFreq=16.0
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4495)
          0.043621276 = weight(abstract_txt:abschluss in 4495) [ClassicSimilarity], result of:
            0.043621276 = score(doc=4495,freq=1.0), product of:
              0.138104 = queryWeight, product of:
                1.1016039 = boost
                8.085969 = idf(docFreq=36, maxDocs=44218)
                0.015504179 = queryNorm
              0.31585816 = fieldWeight in 4495, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.085969 = idf(docFreq=36, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4495)
          0.049024142 = weight(abstract_txt:semantic in 4495) [ClassicSimilarity], result of:
            0.049024142 = score(doc=4495,freq=11.0), product of:
              0.08457197 = queryWeight, product of:
                1.2191315 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.015504179 = queryNorm
              0.57967365 = fieldWeight in 4495, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4495)
          0.028189322 = weight(abstract_txt:sowie in 4495) [ClassicSimilarity], result of:
            0.028189322 = score(doc=4495,freq=3.0), product of:
              0.09017788 = queryWeight, product of:
                1.2588886 = boost
                4.6202335 = idf(docFreq=1183, maxDocs=44218)
                0.015504179 = queryNorm
              0.31259686 = fieldWeight in 4495, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6202335 = idf(docFreq=1183, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4495)
          0.09470471 = weight(abstract_txt:verfahren in 4495) [ClassicSimilarity], result of:
            0.09470471 = score(doc=4495,freq=9.0), product of:
              0.14025475 = queryWeight, product of:
                1.5699872 = boost
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.015504179 = queryNorm
              0.67523354 = fieldWeight in 4495, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4495)
          0.105480745 = weight(abstract_txt:techniken in 4495) [ClassicSimilarity], result of:
            0.105480745 = score(doc=4495,freq=1.0), product of:
              0.39495063 = queryWeight, product of:
                3.7258358 = boost
                6.8370748 = idf(docFreq=128, maxDocs=44218)
                0.015504179 = queryNorm
              0.26707324 = fieldWeight in 4495, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8370748 = idf(docFreq=128, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4495)
        0.24 = coord(6/25)
    
  3. Rieger, B.B.: Unscharfe Semantik : die empirische Analyse, quantitative Beschreibung, formale Repräsentation und prozedurale Modellierung vager Wortbedeutungen in Texten (1990) 0.08
    0.07794688 = sum of:
      0.07794688 = product of:
        0.32477868 = sum of:
          0.03242353 = weight(abstract_txt:vorgestellten in 209) [ClassicSimilarity], result of:
            0.03242353 = score(doc=209,freq=1.0), product of:
              0.13149853 = queryWeight, product of:
                1.0749364 = boost
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.015504179 = queryNorm
              0.24656953 = fieldWeight in 209, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.03125 = fieldNorm(doc=209)
          0.022551456 = weight(abstract_txt:sowie in 209) [ClassicSimilarity], result of:
            0.022551456 = score(doc=209,freq=3.0), product of:
              0.09017788 = queryWeight, product of:
                1.2588886 = boost
                4.6202335 = idf(docFreq=1183, maxDocs=44218)
                0.015504179 = queryNorm
              0.2500775 = fieldWeight in 209, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6202335 = idf(docFreq=1183, maxDocs=44218)
                0.03125 = fieldNorm(doc=209)
          0.043742232 = weight(abstract_txt:verfahren in 209) [ClassicSimilarity], result of:
            0.043742232 = score(doc=209,freq=3.0), product of:
              0.14025475 = queryWeight, product of:
                1.5699872 = boost
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.015504179 = queryNorm
              0.311877 = fieldWeight in 209, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.03125 = fieldNorm(doc=209)
          0.025754087 = weight(abstract_txt:überblick in 209) [ClassicSimilarity], result of:
            0.025754087 = score(doc=209,freq=1.0), product of:
              0.14209805 = queryWeight, product of:
                1.5802703 = boost
                5.799733 = idf(docFreq=363, maxDocs=44218)
                0.015504179 = queryNorm
              0.18124166 = fieldWeight in 209, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.799733 = idf(docFreq=363, maxDocs=44218)
                0.03125 = fieldNorm(doc=209)
          0.10615239 = weight(abstract_txt:textanalyse in 209) [ClassicSimilarity], result of:
            0.10615239 = score(doc=209,freq=1.0), product of:
              0.36529514 = queryWeight, product of:
                2.5337238 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.015504179 = queryNorm
              0.2905935 = fieldWeight in 209, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.03125 = fieldNorm(doc=209)
          0.094154984 = weight(abstract_txt:statistische in 209) [ClassicSimilarity], result of:
            0.094154984 = score(doc=209,freq=1.0), product of:
              0.38602608 = queryWeight, product of:
                3.1900043 = boost
                7.805067 = idf(docFreq=48, maxDocs=44218)
                0.015504179 = queryNorm
              0.24390835 = fieldWeight in 209, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.805067 = idf(docFreq=48, maxDocs=44218)
                0.03125 = fieldNorm(doc=209)
        0.24 = coord(6/25)
    
  4. Reichenberger, K.: Kompendium semantische Netze : Konzepte, Technologie, Modellierung (2010) 0.07
    0.07244716 = sum of:
      0.07244716 = product of:
        0.45279473 = sum of:
          0.084816925 = weight(abstract_txt:ontologien in 4413) [ClassicSimilarity], result of:
            0.084816925 = score(doc=4413,freq=1.0), product of:
              0.120020345 = queryWeight, product of:
                1.0269511 = boost
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.015504179 = queryNorm
              0.70668787 = fieldWeight in 4413, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.09375 = fieldNorm(doc=4413)
          0.03906027 = weight(abstract_txt:sowie in 4413) [ClassicSimilarity], result of:
            0.03906027 = score(doc=4413,freq=1.0), product of:
              0.09017788 = queryWeight, product of:
                1.2588886 = boost
                4.6202335 = idf(docFreq=1183, maxDocs=44218)
                0.015504179 = queryNorm
              0.4331469 = fieldWeight in 4413, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6202335 = idf(docFreq=1183, maxDocs=44218)
                0.09375 = fieldNorm(doc=4413)
          0.07576377 = weight(abstract_txt:verfahren in 4413) [ClassicSimilarity], result of:
            0.07576377 = score(doc=4413,freq=1.0), product of:
              0.14025475 = queryWeight, product of:
                1.5699872 = boost
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.015504179 = queryNorm
              0.5401868 = fieldWeight in 4413, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.761993 = idf(docFreq=377, maxDocs=44218)
                0.09375 = fieldNorm(doc=4413)
          0.25315377 = weight(abstract_txt:techniken in 4413) [ClassicSimilarity], result of:
            0.25315377 = score(doc=4413,freq=1.0), product of:
              0.39495063 = queryWeight, product of:
                3.7258358 = boost
                6.8370748 = idf(docFreq=128, maxDocs=44218)
                0.015504179 = queryNorm
              0.6409758 = fieldWeight in 4413, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8370748 = idf(docFreq=128, maxDocs=44218)
                0.09375 = fieldNorm(doc=4413)
        0.16 = coord(4/25)
    
  5. Budin, G.: Kommunikation in Netzwerken : Terminologiemanagement (2006) 0.07
    0.07155018 = sum of:
      0.07155018 = product of:
        0.44718865 = sum of:
          0.04730027 = weight(abstract_txt:semantic in 5700) [ClassicSimilarity], result of:
            0.04730027 = score(doc=5700,freq=1.0), product of:
              0.08457197 = queryWeight, product of:
                1.2191315 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.015504179 = queryNorm
              0.5592902 = fieldWeight in 5700, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.125 = fieldNorm(doc=5700)
          0.052080356 = weight(abstract_txt:sowie in 5700) [ClassicSimilarity], result of:
            0.052080356 = score(doc=5700,freq=1.0), product of:
              0.09017788 = queryWeight, product of:
                1.2588886 = boost
                4.6202335 = idf(docFreq=1183, maxDocs=44218)
                0.015504179 = queryNorm
              0.5775292 = fieldWeight in 5700, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6202335 = idf(docFreq=1183, maxDocs=44218)
                0.125 = fieldNorm(doc=5700)
          0.24479166 = weight(abstract_txt:repräsentationsformen in 5700) [ClassicSimilarity], result of:
            0.24479166 = score(doc=5700,freq=1.0), product of:
              0.20083456 = queryWeight, product of:
                1.3284388 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.015504179 = queryNorm
              1.2188722 = fieldWeight in 5700, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.125 = fieldNorm(doc=5700)
          0.10301635 = weight(abstract_txt:überblick in 5700) [ClassicSimilarity], result of:
            0.10301635 = score(doc=5700,freq=1.0), product of:
              0.14209805 = queryWeight, product of:
                1.5802703 = boost
                5.799733 = idf(docFreq=363, maxDocs=44218)
                0.015504179 = queryNorm
              0.72496665 = fieldWeight in 5700, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.799733 = idf(docFreq=363, maxDocs=44218)
                0.125 = fieldNorm(doc=5700)
        0.16 = coord(4/25)
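
The score trees above all follow the same arithmetic, so a short sketch may help in reading them. The Python below recomputes the 0.12182379 listed for the first similar document (doc 5851) from the boost, docFreq, freq, queryNorm, fieldNorm and coord values shown in its tree. The function names are hypothetical, and the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumptions that happen to reproduce the listed tf and idf values; this is not code taken from the catalog system itself.

```python
import math

# Constants copied from the explain tree for doc 5851.
MAX_DOCS = 44218          # maxDocs
QUERY_NORM = 0.015504179  # queryNorm
FIELD_NORM = 0.09375      # fieldNorm(doc=5851)
QUERY_TERMS = 25          # denominator of coord(4/25)

# (term, boost, docFreq, freq in doc 5851), as listed in the tree.
MATCHED_TERMS = [
    ("maschinelle", 1.077998, 43, 4.0),
    ("sowie", 1.2588886, 1183, 1.0),
    ("verfahren", 1.5699872, 377, 1.0),
    ("textanalyse", 2.5337238, 10, 2.0),
]


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency (assumed formula, matches the listed values)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, n_query_terms):
    """Sum of the term scores, scaled by the coord factor."""
    total = sum(term_score(b, df, f, field_norm) for _, b, df, f in terms)
    coord = len(terms) / n_query_terms
    return total * coord


score = document_score(MATCHED_TERMS, FIELD_NORM, QUERY_TERMS)
print(f"{score:.4f}")
```

Because idf enters both queryWeight and fieldWeight, a rare term such as textanalyse (docFreq=10) contributes far more than a common one such as sowie (docFreq=1183), even though both appear in the same document.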