Document (#38018)

Author
Meyer, A.
Title
wiki2rdf: Automatische Extraktion von RDF-Tripeln aus Artikelvolltexten der Wikipedia (wiki2rdf: Automatic extraction of RDF triples from the full texts of Wikipedia articles)
Source
Information - Wissenschaft und Praxis. 64(2013) no. 2/3, pp. 115-126
Year
2013
Abstract
Among other things, the DBpedia project converts information from Wikipedia articles into RDF triples. However, it does not draw on the article texts themselves. Instead, it relies primarily on the so-called infoboxes, which contain information that is already structured. As part of a master's thesis at the Institut für Bibliotheks- und Informationswissenschaft of Humboldt-Universität zu Berlin, wiki2rdf was developed: software for the rule-based extraction of RDF triples from the unstructured full texts of Wikipedia. The rules are applied after the text has been syntactically parsed with a dependency parser. As a case study, wiki2rdf was applied to 68,820 articles from the category "Wissenschaftler" (Scientists) of the German-language Wikipedia, yielding 244,563 extracted triples.
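The abstract states only that triples are extracted by rules applied after dependency parsing; the paper's actual rules are not reproduced in this record. The Python below is a minimal, hypothetical sketch of that general idea. It assumes a sentence that has already been parsed and labeled with Universal Dependencies relations (`nsubj`, `obj`), and it uses an invented namespace `http://example.org/`. The `Token` class, the single subject-verb-object rule, the sample parse and the namespace are illustrative assumptions, not wiki2rdf's design.

```python
from dataclasses import dataclass
from urllib.parse import quote

# Hypothetical namespace for generated resources (not taken from wiki2rdf).
BASE = "http://example.org/"


@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    form: str    # surface form as it appears in the text
    lemma: str   # base form, used here as the predicate name
    upos: str    # part-of-speech tag (Universal Dependencies tag set assumed)
    head: int    # index of the governing token, 0 for the root
    deprel: str  # dependency relation to the head (UD labels assumed)


def dependents(tokens, head_idx, deprel):
    """Return the tokens attached to head_idx with the given relation."""
    return [t for t in tokens if t.head == head_idx and t.deprel == deprel]


def extract_triples(tokens):
    """Illustrative rule: a VERB with an nsubj and an obj dependent
    yields the triple (subject form, verb lemma, object form)."""
    triples = []
    for verb in tokens:
        if verb.upos != "VERB":
            continue
        for subj in dependents(tokens, verb.idx, "nsubj"):
            for obj in dependents(tokens, verb.idx, "obj"):
                triples.append((subj.form, verb.lemma, obj.form))
    return triples


def iri(name):
    # Percent-encode non-ASCII characters such as umlauts (as UTF-8 bytes).
    return "<" + BASE + quote(name) + ">"


def to_ntriples(triples):
    """Serialize (s, p, o) string triples as N-Triples lines."""
    return "\n".join(f"{iri(s)} {iri(p)} {iri(o)} ." for s, p, o in triples)


# Hand-written parse of "Einstein entwickelte die Relativitätstheorie."
# ("Einstein developed the theory of relativity."); a real pipeline
# would obtain this from a dependency parser.
SENTENCE = [
    Token(1, "Einstein", "Einstein", "PROPN", 2, "nsubj"),
    Token(2, "entwickelte", "entwickeln", "VERB", 0, "root"),
    Token(3, "die", "der", "DET", 4, "det"),
    Token(4, "Relativitätstheorie", "Relativitätstheorie", "NOUN", 2, "obj"),
    Token(5, ".", ".", "PUNCT", 2, "punct"),
]

if __name__ == "__main__":
    print(to_ntriples(extract_triples(SENTENCE)))
```

Running it prints one N-Triples line linking `Einstein` via `entwickeln` to the percent-encoded `Relativit%C3%A4tstheorie`. Running text from real articles would likely need many more rules (passives, copula sentences, coordination) plus a mapping from verb lemmas to ontology properties; this sketch covers only the simplest active transitive case.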
Content
See: http://www.degruyter.com/view/j/iwp.2013.64.issue-2-3/iwp-2013-0015/iwp-2013-0015.xml?format=INT.
Theme
Semantic Web
Object
DBpedia
Wikipedia

Similar documents (author)

  1. Meyer, A.: Der Realkatalog (1923) 4.67
  2. Meyer, T.: Die öffentliche Bibliothek in der Zivilgesellschaft (2001) 4.67
  3. Meyer, A.: Probleme des Realkatalogs (1921) 4.67
  4. Meyer, R.W.: Selecting electronic alternatives (1993) 4.67
  5. Meyer, F.P.: Out with the old, in with the new : why CD-ROM may have a new standard (1992) 4.67

Similar documents (content)

  1. Witschel, H.F.: Terminologie-Extraktion : Möglichkeiten der Kombination statistischer und musterbasierter Verfahren (2004) 0.14
  2. Lussner, W.: Technologien des Wissensmanagements : READWARE als Instrument des Knowledge Retrieval (2000) 0.14
  3. Version 8.08 des Standard-Thesaurus Wirtschaft mit Mapping zu anderen Vokabularen veröffentlicht (2012) 0.11
  4. Bredack, J.: Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten (2013) 0.10
  5. Gross, D.: Maschinelle Bilderkennung mit Big Data und Deep Learning (2017) 0.10