Document (#38016)

Author
Meyer, A.
Title
wiki2rdf: Automatische Extraktion von RDF-Tripeln aus Artikelvolltexten der Wikipedia
Source
Information - Wissenschaft und Praxis. 64(2013) no.2/3, pp.115-126
Year
2013
Abstract
The DBpedia project converts, among other things, information from Wikipedia articles into RDF triples. It does not draw on the article texts, however, but primarily on the so-called infoboxes, which contain information that is already structured. As part of a master's thesis at the Institut für Bibliotheks- und Informationswissenschaft of the Humboldt-Universität zu Berlin, wiki2rdf was developed, a software tool for the rule-based extraction of RDF triples from the unstructured full texts of Wikipedia. Extraction follows syntactic parsing with a dependency parser. As an example, wiki2rdf was applied to 68,820 articles from the category "Wissenschaftler" (scientists) of the German-language Wikipedia, yielding 244,563 extracted triples.
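The rule-based extraction over a dependency parse that the abstract describes can be illustrated with a minimal sketch. Nothing here is taken from wiki2rdf itself: the `Token` structure, the dependency labels (`subj`, `obj`, `root`), the single subject-verb-object rule, the hand-annotated example sentence, and the `ex:` prefix are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    lemma: str
    head: int  # index of the governing token, -1 for the root
    dep: str   # hypothetical dependency label


def extract_triples(tokens):
    """Apply one illustrative rule: a root verb with a subject and an object."""
    triples = []
    for i, tok in enumerate(tokens):
        if tok.dep != "root":
            continue
        # Collect the direct dependents of the root verb by label.
        subjects = [t for t in tokens if t.head == i and t.dep == "subj"]
        objects = [t for t in tokens if t.head == i and t.dep == "obj"]
        for s in subjects:
            for o in objects:
                triples.append((f"ex:{s.text}", f"ex:{tok.lemma}", f"ex:{o.text}"))
    return triples


# Hand-annotated parse (an assumption, not parser output) of
# "Einstein entwickelte Relativitätstheorie".
sentence = [
    Token("Einstein", "Einstein", 1, "subj"),
    Token("entwickelte", "entwickeln", -1, "root"),
    Token("Relativitätstheorie", "Relativitätstheorie", 1, "obj"),
]
triples = extract_triples(sentence)
```

A real system would need many such rules plus entity resolution to map surface forms to URIs; the sketch only shows how a parse tree can be turned into subject-predicate-object tuples.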
Content
Cf.: http://www.degruyter.com/view/j/iwp.2013.64.issue-2-3/iwp-2013-0015/iwp-2013-0015.xml?format=INT.
Theme
Semantic Web
Object
DBpedia
Wikipedia

Similar documents (author)

  1. Meyer, A.: Der Realkatalog (1923) 4.70
    4.7018247 = sum of:
      4.7018247 = weight(author_txt:meyer in 100) [ClassicSimilarity], result of:
        4.7018247 = fieldWeight in 100, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5229197 = idf(docFreq=63, maxDocs=43556)
          0.625 = fieldNorm(doc=100)
    
  2. Meyer, T.: Die öffentliche Bibliothek in der Zivilgesellschaft (2001) 4.70
    4.7018247 = sum of:
      4.7018247 = weight(author_txt:meyer in 235) [ClassicSimilarity], result of:
        4.7018247 = fieldWeight in 235, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5229197 = idf(docFreq=63, maxDocs=43556)
          0.625 = fieldNorm(doc=235)
    
  3. Meyer, A.: Probleme des Realkatalogs (1921) 4.70
    4.7018247 = sum of:
      4.7018247 = weight(author_txt:meyer in 1669) [ClassicSimilarity], result of:
        4.7018247 = fieldWeight in 1669, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5229197 = idf(docFreq=63, maxDocs=43556)
          0.625 = fieldNorm(doc=1669)
    
  4. Meyer, R.W.: Selecting electronic alternatives (1993) 4.70
    4.7018247 = sum of:
      4.7018247 = weight(author_txt:meyer in 5912) [ClassicSimilarity], result of:
        4.7018247 = fieldWeight in 5912, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5229197 = idf(docFreq=63, maxDocs=43556)
          0.625 = fieldNorm(doc=5912)
    
  5. Meyer, F.P.: Out with the old, in with the new : why CD-ROM may have a new standard (1992) 4.70
    4.7018247 = sum of:
      4.7018247 = weight(author_txt:meyer in 6374) [ClassicSimilarity], result of:
        4.7018247 = fieldWeight in 6374, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5229197 = idf(docFreq=63, maxDocs=43556)
          0.625 = fieldNorm(doc=6374)
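
The author-match scores above can be reproduced by hand. The sketch below assumes the classic TF-IDF formulas that match the numbers shown (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and are not Lucene API calls.

```python
import math


def tf(freq):
    # Term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    # Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Values copied from the explain output for author_txt:meyer.
score = field_weight(freq=1.0, doc_freq=63, max_docs=43556, field_norm=0.625)
```

Because every one of the five records matches the single term `meyer` once and has the same field norm, they all receive the same score of about 4.70.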
    

Similar documents (content)

  1. Witschel, H.F.: Terminologie-Extraktion : Möglichkeiten der Kombination statistischer und musterbasierter Verfahren (2004) 0.15
    0.14652339 = sum of:
      0.14652339 = product of:
        0.7326169 = sum of:
          0.028747886 = weight(abstract_txt:werden in 2121) [ClassicSimilarity], result of:
            0.028747886 = score(doc=2121,freq=5.0), product of:
              0.05856845 = queryWeight, product of:
                1.0138786 = boost
                3.5121832 = idf(docFreq=3531, maxDocs=43556)
                0.016447527 = queryNorm
              0.49084252 = fieldWeight in 2121, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.5121832 = idf(docFreq=3531, maxDocs=43556)
                0.0625 = fieldNorm(doc=2121)
          0.06481095 = weight(abstract_txt:sogenannten in 2121) [ClassicSimilarity], result of:
            0.06481095 = score(doc=2121,freq=1.0), product of:
              0.13666964 = queryWeight, product of:
                1.0951538 = boost
                7.587458 = idf(docFreq=59, maxDocs=43556)
                0.016447527 = queryNorm
              0.47421613 = fieldWeight in 2121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.587458 = idf(docFreq=59, maxDocs=43556)
                0.0625 = fieldNorm(doc=2121)
          0.09723653 = weight(abstract_txt:unstrukturierten in 2121) [ClassicSimilarity], result of:
            0.09723653 = score(doc=2121,freq=1.0), product of:
              0.17911258 = queryWeight, product of:
                1.2537246 = boost
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.016447527 = queryNorm
              0.5428794 = fieldWeight in 2121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.0625 = fieldNorm(doc=2121)
          0.03656566 = weight(abstract_txt:informationen in 2121) [ClassicSimilarity], result of:
            0.03656566 = score(doc=2121,freq=1.0), product of:
              0.1175706 = queryWeight, product of:
                1.4364928 = boost
                4.976164 = idf(docFreq=816, maxDocs=43556)
                0.016447527 = queryNorm
              0.31101024 = fieldWeight in 2121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.976164 = idf(docFreq=816, maxDocs=43556)
                0.0625 = fieldNorm(doc=2121)
          0.5052559 = weight(abstract_txt:extraktion in 2121) [ClassicSimilarity], result of:
            0.5052559 = score(doc=2121,freq=3.0), product of:
              0.5373378 = queryWeight, product of:
                3.761174 = boost
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.016447527 = queryNorm
              0.9402947 = fieldWeight in 2121, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.0625 = fieldNorm(doc=2121)
        0.2 = coord(5/25)
    
  2. Lussner, W.: Technologien des Wissensmanagements : READWARE als Instrument des Knowledge Retrieval (2000) 0.14
    0.14027783 = sum of:
      0.14027783 = product of:
        0.87673646 = sum of:
          0.02571289 = weight(abstract_txt:werden in 236) [ClassicSimilarity], result of:
            0.02571289 = score(doc=236,freq=1.0), product of:
              0.05856845 = queryWeight, product of:
                1.0138786 = boost
                3.5121832 = idf(docFreq=3531, maxDocs=43556)
                0.016447527 = queryNorm
              0.4390229 = fieldWeight in 236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5121832 = idf(docFreq=3531, maxDocs=43556)
                0.125 = fieldNorm(doc=236)
          0.19447306 = weight(abstract_txt:unstrukturierten in 236) [ClassicSimilarity], result of:
            0.19447306 = score(doc=236,freq=1.0), product of:
              0.17911258 = queryWeight, product of:
                1.2537246 = boost
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.016447527 = queryNorm
              1.0857588 = fieldWeight in 236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.125 = fieldNorm(doc=236)
          0.07313132 = weight(abstract_txt:informationen in 236) [ClassicSimilarity], result of:
            0.07313132 = score(doc=236,freq=1.0), product of:
              0.1175706 = queryWeight, product of:
                1.4364928 = boost
                4.976164 = idf(docFreq=816, maxDocs=43556)
                0.016447527 = queryNorm
              0.6220205 = fieldWeight in 236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.976164 = idf(docFreq=816, maxDocs=43556)
                0.125 = fieldNorm(doc=236)
          0.5834192 = weight(abstract_txt:extraktion in 236) [ClassicSimilarity], result of:
            0.5834192 = score(doc=236,freq=1.0), product of:
              0.5373378 = queryWeight, product of:
                3.761174 = boost
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.016447527 = queryNorm
              1.0857588 = fieldWeight in 236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.125 = fieldNorm(doc=236)
        0.16 = coord(4/25)
    
  3. Bredack, J.: Terminologieextraktion von Mehrwortgruppen in kunsthistorischen Fachtexten (2013) 0.13
    0.12975448 = sum of:
      0.12975448 = product of:
        0.5406437 = sum of:
          0.030839197 = weight(abstract_txt:enthalten in 3052) [ClassicSimilarity], result of:
            0.030839197 = score(doc=3052,freq=1.0), product of:
              0.113951966 = queryWeight, product of:
                6.9282126 = idf(docFreq=115, maxDocs=43556)
                0.016447527 = queryNorm
              0.2706333 = fieldWeight in 3052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9282126 = idf(docFreq=115, maxDocs=43556)
                0.0390625 = fieldNorm(doc=3052)
          0.022727199 = weight(abstract_txt:werden in 3052) [ClassicSimilarity], result of:
            0.022727199 = score(doc=3052,freq=8.0), product of:
              0.05856845 = queryWeight, product of:
                1.0138786 = boost
                3.5121832 = idf(docFreq=3531, maxDocs=43556)
                0.016447527 = queryNorm
              0.38804507 = fieldWeight in 3052, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.5121832 = idf(docFreq=3531, maxDocs=43556)
                0.0390625 = fieldNorm(doc=3052)
          0.096974015 = weight(abstract_txt:extrahiert in 3052) [ClassicSimilarity], result of:
            0.096974015 = score(doc=3052,freq=2.0), product of:
              0.19412437 = queryWeight, product of:
                1.3052062 = boost
                9.042746 = idf(docFreq=13, maxDocs=43556)
                0.016447527 = queryNorm
              0.4995458 = fieldWeight in 3052, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.042746 = idf(docFreq=13, maxDocs=43556)
                0.0390625 = fieldNorm(doc=3052)
          0.02861133 = weight(abstract_txt:wurde in 3052) [ClassicSimilarity], result of:
            0.02861133 = score(doc=3052,freq=2.0), product of:
              0.108395636 = queryWeight, product of:
                1.3793039 = boost
                4.7780557 = idf(docFreq=995, maxDocs=43556)
                0.016447527 = queryNorm
              0.2639528 = fieldWeight in 3052, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7780557 = idf(docFreq=995, maxDocs=43556)
                0.0390625 = fieldNorm(doc=3052)
          0.045707077 = weight(abstract_txt:informationen in 3052) [ClassicSimilarity], result of:
            0.045707077 = score(doc=3052,freq=4.0), product of:
              0.1175706 = queryWeight, product of:
                1.4364928 = boost
                4.976164 = idf(docFreq=816, maxDocs=43556)
                0.016447527 = queryNorm
              0.3887628 = fieldWeight in 3052, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.976164 = idf(docFreq=816, maxDocs=43556)
                0.0390625 = fieldNorm(doc=3052)
          0.3157849 = weight(abstract_txt:extraktion in 3052) [ClassicSimilarity], result of:
            0.3157849 = score(doc=3052,freq=3.0), product of:
              0.5373378 = queryWeight, product of:
                3.761174 = boost
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.016447527 = queryNorm
              0.58768415 = fieldWeight in 3052, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.0390625 = fieldNorm(doc=3052)
        0.24 = coord(6/25)
    
  4. Version 8.08 des Standard-Thesaurus Wirtschaft mit Mapping zu anderen Vokabularen veröffentlicht (2012) 0.11
    0.11290244 = sum of:
      0.11290244 = product of:
        0.70564026 = sum of:
          0.1124168 = weight(abstract_txt:dbpedia in 2005) [ClassicSimilarity], result of:
            0.1124168 = score(doc=2005,freq=1.0), product of:
              0.17002805 = queryWeight, product of:
                1.2215166 = boost
                8.462927 = idf(docFreq=24, maxDocs=43556)
                0.016447527 = queryNorm
              0.6611662 = fieldWeight in 2005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.462927 = idf(docFreq=24, maxDocs=43556)
                0.078125 = fieldNorm(doc=2005)
          0.045707077 = weight(abstract_txt:informationen in 2005) [ClassicSimilarity], result of:
            0.045707077 = score(doc=2005,freq=1.0), product of:
              0.1175706 = queryWeight, product of:
                1.4364928 = boost
                4.976164 = idf(docFreq=816, maxDocs=43556)
                0.016447527 = queryNorm
              0.3887628 = fieldWeight in 2005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.976164 = idf(docFreq=816, maxDocs=43556)
                0.078125 = fieldNorm(doc=2005)
          0.18287934 = weight(abstract_txt:wikipedia in 2005) [ClassicSimilarity], result of:
            0.18287934 = score(doc=2005,freq=1.0), product of:
              0.37333286 = queryWeight, product of:
                3.620072 = boost
                6.270157 = idf(docFreq=223, maxDocs=43556)
                0.016447527 = queryNorm
              0.489856 = fieldWeight in 2005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.270157 = idf(docFreq=223, maxDocs=43556)
                0.078125 = fieldNorm(doc=2005)
          0.36463702 = weight(abstract_txt:extraktion in 2005) [ClassicSimilarity], result of:
            0.36463702 = score(doc=2005,freq=1.0), product of:
              0.5373378 = queryWeight, product of:
                3.761174 = boost
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.016447527 = queryNorm
              0.67859924 = fieldWeight in 2005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.078125 = fieldNorm(doc=2005)
        0.16 = coord(4/25)
    
  5. Gross, D.: Maschinelle Bilderkennung mit Big Data und Deep Learning (2017) 0.10
    0.09781558 = sum of:
      0.09781558 = product of:
        0.8151299 = sum of:
          0.24064814 = weight(abstract_txt:unstrukturierten in 12) [ClassicSimilarity], result of:
            0.24064814 = score(doc=12,freq=2.0), product of:
              0.17911258 = queryWeight, product of:
                1.2537246 = boost
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.016447527 = queryNorm
              1.343558 = fieldWeight in 12, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.109375 = fieldNorm(doc=12)
          0.0639899 = weight(abstract_txt:informationen in 12) [ClassicSimilarity], result of:
            0.0639899 = score(doc=12,freq=1.0), product of:
              0.1175706 = queryWeight, product of:
                1.4364928 = boost
                4.976164 = idf(docFreq=816, maxDocs=43556)
                0.016447527 = queryNorm
              0.5442679 = fieldWeight in 12, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.976164 = idf(docFreq=816, maxDocs=43556)
                0.109375 = fieldNorm(doc=12)
          0.51049185 = weight(abstract_txt:extraktion in 12) [ClassicSimilarity], result of:
            0.51049185 = score(doc=12,freq=1.0), product of:
              0.5373378 = queryWeight, product of:
                3.761174 = boost
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.016447527 = queryNorm
              0.95003897 = fieldWeight in 12, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.68607 = idf(docFreq=19, maxDocs=43556)
                0.109375 = fieldNorm(doc=12)
        0.12 = coord(3/25)
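
The content-match scores combine per-term weights with a coordination factor. The sketch below recomputes the score of the last record (doc=12) from the values in its explain tree, assuming the same classic TF-IDF formulas that fit the numbers shown: each term contributes queryWeight × fieldWeight, with queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm, and the sum is scaled by coord. The helper names are illustrative, not Lucene API calls.

```python
import math

MAX_DOCS = 43556
QUERY_NORM = 0.016447527


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed form: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    # queryWeight depends only on the query term; fieldWeight on the document.
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (boost, docFreq, termFreq) for the three query terms matched in doc=12.
matched_terms = [
    (1.2537246, 19, 2.0),   # unstrukturierten
    (1.4364928, 816, 1.0),  # informationen
    (3.761174, 19, 1.0),    # extraktion
]
field_norm = 0.109375
coord = 3 / 25  # 3 of the 25 query terms matched

total = coord * sum(term_score(b, df, f, field_norm) for b, df, f in matched_terms)
```

The coord factor explains why a record with strong individual term weights can still rank low: matching only 3 of 25 query terms scales the summed weight of about 0.815 down to about 0.098.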