Document (#38548)

Author
Barbu, E.
Title
What kind of knowledge is in Wikipedia? : unsupervised extraction of properties for similar concepts
Source
Journal of the Association for Information Science and Technology. 65(2014) no.12, pp.2489-2497
Year
2014
Abstract
This article presents a novel method for extracting knowledge from Wikipedia and a classification schema for annotating the extracted knowledge. Unlike the majority of approaches in the literature, we use the raw Wikipedia text for knowledge acquisition. The main assumption made is that the concepts classified under the same node in a taxonomy are described in a comparable way in Wikipedia. The annotation of the extracted knowledge is done at two levels: ontological and logical. The extracted properties are evaluated in the traditional way, that is, by computing the precision of the extraction procedure and in a clustering task. The second method of evaluation is seldom used in the natural language processing community, but it is regularly employed in cognitive psychology.
Theme
Automatic classification
Object
Wikipedia

Similar documents (content)

  1. Boer, V. de; Porter, A.L.; Someren, M. v.: Extracting historical time periods from the Web (2010) 0.21
    0.21308666 = sum of:
      0.21308666 = product of:
        0.88786113 = sum of:
          0.083703525 = weight(abstract_txt:ontological in 3988) [ClassicSimilarity], result of:
            0.083703525 = score(doc=3988,freq=1.0), product of:
              0.13558865 = queryWeight, product of:
                1.0593044 = boost
                6.5848994 = idf(docFreq=165, maxDocs=44218)
                0.019438082 = queryNorm
              0.6173343 = fieldWeight in 3988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5848994 = idf(docFreq=165, maxDocs=44218)
                0.09375 = fieldNorm(doc=3988)
          0.10159226 = weight(abstract_txt:annotation in 3988) [ClassicSimilarity], result of:
            0.10159226 = score(doc=3988,freq=1.0), product of:
              0.15427703 = queryWeight, product of:
                1.1299514 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.019438082 = queryNorm
              0.65850544 = fieldWeight in 3988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.09375 = fieldNorm(doc=3988)
          0.09259754 = weight(abstract_txt:method in 3988) [ClassicSimilarity], result of:
            0.09259754 = score(doc=3988,freq=3.0), product of:
              0.12669614 = queryWeight, product of:
                1.4481242 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.019438082 = queryNorm
              0.73086315 = fieldWeight in 3988, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.09375 = fieldNorm(doc=3988)
          0.07827289 = weight(abstract_txt:concepts in 3988) [ClassicSimilarity], result of:
            0.07827289 = score(doc=3988,freq=2.0), product of:
              0.12965871 = queryWeight, product of:
                1.4649574 = boost
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.019438082 = queryNorm
              0.60368395 = fieldWeight in 3988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.09375 = fieldNorm(doc=3988)
          0.24103838 = weight(abstract_txt:extraction in 3988) [ClassicSimilarity], result of:
            0.24103838 = score(doc=3988,freq=3.0), product of:
              0.23974776 = queryWeight, product of:
                1.9920553 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.019438082 = queryNorm
              1.0053833 = fieldWeight in 3988, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.09375 = fieldNorm(doc=3988)
          0.2906566 = weight(abstract_txt:extracted in 3988) [ClassicSimilarity], result of:
            0.2906566 = score(doc=3988,freq=2.0), product of:
              0.3559137 = queryWeight, product of:
                2.9726384 = boost
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.019438082 = queryNorm
              0.8166491 = fieldWeight in 3988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.09375 = fieldNorm(doc=3988)
        0.24 = coord(6/25)
    
  2. Vlachidis, A.; Binding, C.; Tudhope, D.; May, K.: Excavating grey literature : a case study on the rich indexing of archaeological documents via natural language-processing techniques and knowledge-based resources (2010) 0.14
    0.14147295 = sum of:
      0.14147295 = product of:
        0.5052605 = sum of:
          0.05580235 = weight(abstract_txt:ontological in 3948) [ClassicSimilarity], result of:
            0.05580235 = score(doc=3948,freq=1.0), product of:
              0.13558865 = queryWeight, product of:
                1.0593044 = boost
                6.5848994 = idf(docFreq=165, maxDocs=44218)
                0.019438082 = queryNorm
              0.4115562 = fieldWeight in 3948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5848994 = idf(docFreq=165, maxDocs=44218)
                0.0625 = fieldNorm(doc=3948)
          0.06772818 = weight(abstract_txt:annotation in 3948) [ClassicSimilarity], result of:
            0.06772818 = score(doc=3948,freq=1.0), product of:
              0.15427703 = queryWeight, product of:
                1.1299514 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.019438082 = queryNorm
              0.43900365 = fieldWeight in 3948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.0625 = fieldNorm(doc=3948)
          0.03564081 = weight(abstract_txt:method in 3948) [ClassicSimilarity], result of:
            0.03564081 = score(doc=3948,freq=1.0), product of:
              0.12669614 = queryWeight, product of:
                1.4481242 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.019438082 = queryNorm
              0.28130937 = fieldWeight in 3948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0625 = fieldNorm(doc=3948)
          0.036898192 = weight(abstract_txt:concepts in 3948) [ClassicSimilarity], result of:
            0.036898192 = score(doc=3948,freq=1.0), product of:
              0.12965871 = queryWeight, product of:
                1.4649574 = boost
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.019438082 = queryNorm
              0.28457934 = fieldWeight in 3948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.0625 = fieldNorm(doc=3948)
          0.07981845 = weight(abstract_txt:properties in 3948) [ClassicSimilarity], result of:
            0.07981845 = score(doc=3948,freq=1.0), product of:
              0.21687053 = queryWeight, product of:
                1.8946298 = boost
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.019438082 = queryNorm
              0.36804655 = fieldWeight in 3948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.0625 = fieldNorm(doc=3948)
          0.18555145 = weight(abstract_txt:extraction in 3948) [ClassicSimilarity], result of:
            0.18555145 = score(doc=3948,freq=4.0), product of:
              0.23974776 = queryWeight, product of:
                1.9920553 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.019438082 = queryNorm
              0.77394444 = fieldWeight in 3948, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0625 = fieldNorm(doc=3948)
          0.043821093 = weight(abstract_txt:knowledge in 3948) [ClassicSimilarity], result of:
            0.043821093 = score(doc=3948,freq=1.0), product of:
              0.19734849 = queryWeight, product of:
                2.857663 = boost
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.019438082 = queryNorm
              0.2220493 = fieldWeight in 3948, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.0625 = fieldNorm(doc=3948)
        0.28 = coord(7/25)
    
  3. Zarrad, R.; Doggaz, N.; Zagrouba, E.: Wikipedia HTML structure analysis for ontology construction (2018) 0.13
    0.12785636 = sum of:
      0.12785636 = product of:
        0.6392818 = sum of:
          0.036898192 = weight(abstract_txt:concepts in 4302) [ClassicSimilarity], result of:
            0.036898192 = score(doc=4302,freq=1.0), product of:
              0.12965871 = queryWeight, product of:
                1.4649574 = boost
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.019438082 = queryNorm
              0.28457934 = fieldWeight in 4302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5532694 = idf(docFreq=1265, maxDocs=44218)
                0.0625 = fieldNorm(doc=4302)
          0.13120468 = weight(abstract_txt:extraction in 4302) [ClassicSimilarity], result of:
            0.13120468 = score(doc=4302,freq=2.0), product of:
              0.23974776 = queryWeight, product of:
                1.9920553 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.019438082 = queryNorm
              0.54726136 = fieldWeight in 4302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0625 = fieldNorm(doc=4302)
          0.061972387 = weight(abstract_txt:knowledge in 4302) [ClassicSimilarity], result of:
            0.061972387 = score(doc=4302,freq=2.0), product of:
              0.19734849 = queryWeight, product of:
                2.857663 = boost
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.019438082 = queryNorm
              0.31402513 = fieldWeight in 4302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5527887 = idf(docFreq=3442, maxDocs=44218)
                0.0625 = fieldNorm(doc=4302)
          0.13701683 = weight(abstract_txt:extracted in 4302) [ClassicSimilarity], result of:
            0.13701683 = score(doc=4302,freq=1.0), product of:
              0.3559137 = queryWeight, product of:
                2.9726384 = boost
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.019438082 = queryNorm
              0.38497207 = fieldWeight in 4302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.0625 = fieldNorm(doc=4302)
          0.2721897 = weight(abstract_txt:wikipedia in 4302) [ClassicSimilarity], result of:
            0.2721897 = score(doc=4302,freq=2.0), product of:
              0.491337 = queryWeight, product of:
                4.0330057 = boost
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.019438082 = queryNorm
              0.5539776 = fieldWeight in 4302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.0625 = fieldNorm(doc=4302)
        0.2 = coord(5/25)
    
  4. Auer, S.; Lehmann, J.: What have Innsbruck and Leipzig in common? : extracting semantics from Wiki content (2007) 0.13
    0.12559025 = sum of:
      0.12559025 = product of:
        0.78493905 = sum of:
          0.08147055 = weight(abstract_txt:extracting in 2481) [ClassicSimilarity], result of:
            0.08147055 = score(doc=2481,freq=1.0), product of:
              0.15037723 = queryWeight, product of:
                1.1155785 = boost
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.019438082 = queryNorm
              0.5417745 = fieldWeight in 2481, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.078125 = fieldNorm(doc=2481)
          0.044551015 = weight(abstract_txt:method in 2481) [ClassicSimilarity], result of:
            0.044551015 = score(doc=2481,freq=1.0), product of:
              0.12669614 = queryWeight, product of:
                1.4481242 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.019438082 = queryNorm
              0.3516367 = fieldWeight in 2481, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=2481)
          0.24221382 = weight(abstract_txt:extracted in 2481) [ClassicSimilarity], result of:
            0.24221382 = score(doc=2481,freq=2.0), product of:
              0.3559137 = queryWeight, product of:
                2.9726384 = boost
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.019438082 = queryNorm
              0.68054086 = fieldWeight in 2481, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.078125 = fieldNorm(doc=2481)
          0.41670364 = weight(abstract_txt:wikipedia in 2481) [ClassicSimilarity], result of:
            0.41670364 = score(doc=2481,freq=3.0), product of:
              0.491337 = queryWeight, product of:
                4.0330057 = boost
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.019438082 = queryNorm
              0.8481015 = fieldWeight in 2481, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.078125 = fieldNorm(doc=2481)
        0.16 = coord(4/25)
    
  5. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.12
    0.12407704 = sum of:
      0.12407704 = product of:
        0.62038517 = sum of:
          0.070417635 = weight(abstract_txt:clustering in 2919) [ClassicSimilarity], result of:
            0.070417635 = score(doc=2919,freq=1.0), product of:
              0.12083195 = queryWeight, product of:
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.019438082 = queryNorm
              0.5827733 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.12967315 = weight(abstract_txt:unsupervised in 2919) [ClassicSimilarity], result of:
            0.12967315 = score(doc=2919,freq=1.0), product of:
              0.18153521 = queryWeight, product of:
                1.2257152 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.019438082 = queryNorm
              0.71431404 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.07560558 = weight(abstract_txt:method in 2919) [ClassicSimilarity], result of:
            0.07560558 = score(doc=2919,freq=2.0), product of:
              0.12669614 = queryWeight, product of:
                1.4481242 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.019438082 = queryNorm
              0.5967473 = fieldWeight in 2919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.13916358 = weight(abstract_txt:extraction in 2919) [ClassicSimilarity], result of:
            0.13916358 = score(doc=2919,freq=1.0), product of:
              0.23974776 = queryWeight, product of:
                1.9920553 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.019438082 = queryNorm
              0.58045834 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.20552525 = weight(abstract_txt:extracted in 2919) [ClassicSimilarity], result of:
            0.20552525 = score(doc=2919,freq=1.0), product of:
              0.3559137 = queryWeight, product of:
                2.9726384 = boost
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.019438082 = queryNorm
              0.5774581 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
        0.2 = coord(5/25)
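
The score trees above can be checked by hand. The following is a minimal sketch, not the catalog's actual code: it assumes the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), which is consistent with the idf values listed), and it takes boost, queryNorm, and fieldNorm directly from the trees as given inputs. The function names are hypothetical. It recomputes the total for result 4 (doc 2481).

```python
import math

# Constants read off the explain output above.
MAX_DOCS = 44218
QUERY_NORM = 0.019438082


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assumed classic form: 1 + ln(N / (df + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    """Score of one term: queryWeight * fieldWeight, as in each subtree."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total_clauses):
    """Sum of the term scores, scaled by coord = matched / total_clauses."""
    raw = sum(term_score(b, df, f, field_norm) for b, df, f in terms)
    return raw * matched / total_clauses


# (boost, docFreq, freq) for the four terms matched in doc 2481.
terms_2481 = [
    (1.1155785, 116, 1.0),   # extracting
    (1.4481242, 1333, 1.0),  # method
    (2.9726384, 253, 2.0),   # extracted
    (4.0330057, 227, 3.0),   # wikipedia
]

score_2481 = document_score(terms_2481, field_norm=0.078125,
                            matched=4, total_clauses=25)
print(f"doc 2481: {score_2481:.6f}")
```

Under these assumptions the result should land close to the 0.12559025 listed for result 4. Small differences in the last digits are expected, because Lucene computes in single precision and this sketch uses Python doubles.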