Document (#40623)

Author
Collovini de Abreu, S.
Vieira, R.
Title
RelP: Portuguese open relation extraction
Source
Knowledge organization. 44(2017) no.3, p.163-177
Year
2017
Abstract
Natural language texts are valuable data sources in many human activities. NLP techniques are widely used to help find the right information for specific needs. In this paper, we present one such technique: relation extraction from texts. This task aims at identifying and classifying semantic relations that occur between entities in a text. For example, the sentence "Roberto Marinho is the founder of Rede Globo" expresses a relation occurring between "Roberto Marinho" and "Rede Globo." This work presents a system for Portuguese Open Relation Extraction, named RelP, which extracts any relation descriptor that describes an explicit relation between named entities in the organisation domain by applying Conditional Random Fields. To implement RelP, we define the representation scheme, features based on previous work, and a reference corpus. RelP achieved state-of-the-art results for open relation extraction; the F-measure was around 60% for relations between the named entities person, organisation and place. To make the output easier to understand, we present a way of organizing the output obtained by mining the extracted relation descriptors. This organization can be useful to classify relation types, to cluster the entities involved in a common relation and to populate datasets.
Content
Contribution to a special issue "New Trends for Knowledge Organization", guest editor: Renato Rocha Souza.
Theme
Computational linguistics

Similar documents (author)

  1. Vieira, L.: Modèle d'analyse pour une classification du document iconographique (1999) 5.94
    5.9353776 = sum of:
      5.9353776 = weight(author_txt:vieira in 321) [ClassicSimilarity], result of:
        5.9353776 = score(doc=321,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            9.496605 = idf(docFreq=8, maxDocs=44083)
            0.105300784 = queryNorm
          5.935378 = fieldWeight in 321, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            9.496605 = idf(docFreq=8, maxDocs=44083)
            0.625 = fieldNorm(doc=321)
    
  2. Vieira, S. Bastos => Bastos Vieira, S.: 5.04
    5.0363345 = sum of:
      5.0363345 = weight(author_txt:vieira in 5729) [ClassicSimilarity], result of:
        5.0363345 = score(doc=5729,freq=2.0), product of:
          0.99999994 = queryWeight, product of:
            9.496605 = idf(docFreq=8, maxDocs=44083)
            0.105300784 = queryNorm
          5.036335 = fieldWeight in 5729, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            9.496605 = idf(docFreq=8, maxDocs=44083)
            0.375 = fieldNorm(doc=5729)
    
  3. Vieira, E.S.; Cabral, J.A.S.; Gomes, J.A.N.F.: Definition of a model based on bibliometric indicators for assessing applicants to academic positions (2014) 3.56
    3.5612266 = sum of:
      3.5612266 = weight(author_txt:vieira in 2222) [ClassicSimilarity], result of:
        3.5612266 = score(doc=2222,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            9.496605 = idf(docFreq=8, maxDocs=44083)
            0.105300784 = queryNorm
          3.5612268 = fieldWeight in 2222, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            9.496605 = idf(docFreq=8, maxDocs=44083)
            0.375 = fieldNorm(doc=2222)
    
  4. Carvalho, J.R. de; Cordeiro, M.I.; Lopes, A.; Vieira, M.: Meta-information about MARC : an XML framework for validation, explanation and help systems (2004) 2.97
    2.9676888 = sum of:
      2.9676888 = weight(author_txt:vieira in 3849) [ClassicSimilarity], result of:
        2.9676888 = score(doc=3849,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            9.496605 = idf(docFreq=8, maxDocs=44083)
            0.105300784 = queryNorm
          2.967689 = fieldWeight in 3849, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            9.496605 = idf(docFreq=8, maxDocs=44083)
            0.3125 = fieldNorm(doc=3849)
    
  5. Bastos Vieira, S.; DeBrito, M.; Mustafa El Hadi, W.; Zumer, M.: Developing imaged KOS with the FRSAD Model : a conceptual methodology (2016) 2.37
    2.374151 = sum of:
      2.374151 = weight(author_txt:vieira in 4110) [ClassicSimilarity], result of:
        2.374151 = score(doc=4110,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            9.496605 = idf(docFreq=8, maxDocs=44083)
            0.105300784 = queryNorm
          2.3741512 = fieldWeight in 4110, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            9.496605 = idf(docFreq=8, maxDocs=44083)
            0.25 = fieldNorm(doc=4110)
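
The score trees in this list can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and not part of any Lucene API. The inputs are taken from the first entry (doc=321). The fieldNorm value is copied from the listing rather than derived, since it depends on field length and index-time encoding.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as printed in the explain trees."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, field_norm, query_norm, boost=1.0):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm          # query side
    field_weight = classic_tf(freq) * idf * field_norm  # document side
    return query_weight * field_weight


# Values from entry 1 (author_txt:vieira in doc 321).
score = term_score(freq=1.0, doc_freq=8, max_docs=44083,
                   field_norm=0.625, query_norm=0.105300784)
print(round(score, 2))  # 5.94, the value shown next to the entry
```

Because queryWeight is approximately 1 for every entry in this list, the ranking is driven by tf and fieldNorm alone: entry 2 has freq=2.0 but a smaller fieldNorm (0.375) and therefore scores lower than entry 1.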
    

Similar documents (content)

  1. Vo, D.-T.; Bagheri, E.: Feature-enriched matrix factorization for relation extraction (2019) 0.28
    0.27981773 = sum of:
      0.27981773 = product of:
        0.8744304 = sum of:
          0.022752028 = weight(abstract_txt:work in 106) [ClassicSimilarity], result of:
            0.022752028 = score(doc=106,freq=4.0), product of:
              0.054770246 = queryWeight, product of:
                1.1410819 = boost
                3.798021 = idf(docFreq=2685, maxDocs=44083)
                0.012637772 = queryNorm
              0.41540855 = fieldWeight in 106, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.798021 = idf(docFreq=2685, maxDocs=44083)
                0.0546875 = fieldNorm(doc=106)
          0.008305655 = weight(abstract_txt:this in 106) [ClassicSimilarity], result of:
            0.008305655 = score(doc=106,freq=2.0), product of:
              0.044408612 = queryWeight, product of:
                1.4530919 = boost
                2.4182634 = idf(docFreq=10673, maxDocs=44083)
                0.012637772 = queryNorm
              0.18702802 = fieldWeight in 106, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4182634 = idf(docFreq=10673, maxDocs=44083)
                0.0546875 = fieldNorm(doc=106)
          0.024469204 = weight(abstract_txt:between in 106) [ClassicSimilarity], result of:
            0.024469204 = score(doc=106,freq=2.0), product of:
              0.09126365 = queryWeight, product of:
                2.0830917 = boost
                3.4667213 = idf(docFreq=3740, maxDocs=44083)
                0.012637772 = queryNorm
              0.26811555 = fieldWeight in 106, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4667213 = idf(docFreq=3740, maxDocs=44083)
                0.0546875 = fieldNorm(doc=106)
          0.035187393 = weight(abstract_txt:open in 106) [ClassicSimilarity], result of:
            0.035187393 = score(doc=106,freq=1.0), product of:
              0.13309847 = queryWeight, product of:
                2.1785958 = boost
                4.8342147 = idf(docFreq=952, maxDocs=44083)
                0.012637772 = queryNorm
              0.26437113 = fieldWeight in 106, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8342147 = idf(docFreq=952, maxDocs=44083)
                0.0546875 = fieldNorm(doc=106)
          0.09907509 = weight(abstract_txt:named in 106) [ClassicSimilarity], result of:
            0.09907509 = score(doc=106,freq=1.0), product of:
              0.26539415 = queryWeight, product of:
                3.0763502 = boost
                6.826295 = idf(docFreq=129, maxDocs=44083)
                0.012637772 = queryNorm
              0.373313 = fieldWeight in 106, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.826295 = idf(docFreq=129, maxDocs=44083)
                0.0546875 = fieldNorm(doc=106)
          0.082420185 = weight(abstract_txt:entities in 106) [ClassicSimilarity], result of:
            0.082420185 = score(doc=106,freq=1.0), product of:
              0.2583749 = queryWeight, product of:
                3.5049727 = boost
                5.833043 = idf(docFreq=350, maxDocs=44083)
                0.012637772 = queryNorm
              0.31899455 = fieldWeight in 106, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.833043 = idf(docFreq=350, maxDocs=44083)
                0.0546875 = fieldNorm(doc=106)
          0.17047645 = weight(abstract_txt:extraction in 106) [ClassicSimilarity], result of:
            0.17047645 = score(doc=106,freq=3.0), product of:
              0.29082415 = queryWeight, product of:
                3.7185593 = boost
                6.188498 = idf(docFreq=245, maxDocs=44083)
                0.012637772 = queryNorm
              0.58618397 = fieldWeight in 106, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.188498 = idf(docFreq=245, maxDocs=44083)
                0.0546875 = fieldNorm(doc=106)
          0.4317445 = weight(abstract_txt:relation in 106) [ClassicSimilarity], result of:
            0.4317445 = score(doc=106,freq=7.0), product of:
              0.5529168 = queryWeight, product of:
                8.106985 = boost
                5.3967204 = idf(docFreq=542, maxDocs=44083)
                0.012637772 = queryNorm
              0.78084886 = fieldWeight in 106, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.3967204 = idf(docFreq=542, maxDocs=44083)
                0.0546875 = fieldNorm(doc=106)
        0.32 = coord(8/25)
    
  2. Li, J.; Zhang, Z.; Li, X.; Chen, H.: Kernel-based learning for biomedical relation extraction (2008) 0.27
    0.26566792 = sum of:
      0.26566792 = product of:
        1.1069497 = sum of:
          0.008389979 = weight(abstract_txt:this in 2612) [ClassicSimilarity], result of:
            0.008389979 = score(doc=2612,freq=1.0), product of:
              0.044408612 = queryWeight, product of:
                1.4530919 = boost
                2.4182634 = idf(docFreq=10673, maxDocs=44083)
                0.012637772 = queryNorm
              0.18892683 = fieldWeight in 2612, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4182634 = idf(docFreq=10673, maxDocs=44083)
                0.078125 = fieldNorm(doc=2612)
          0.024717629 = weight(abstract_txt:between in 2612) [ClassicSimilarity], result of:
            0.024717629 = score(doc=2612,freq=1.0), product of:
              0.09126365 = queryWeight, product of:
                2.0830917 = boost
                3.4667213 = idf(docFreq=3740, maxDocs=44083)
                0.012637772 = queryNorm
              0.2708376 = fieldWeight in 2612, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4667213 = idf(docFreq=3740, maxDocs=44083)
                0.078125 = fieldNorm(doc=2612)
          0.14153583 = weight(abstract_txt:named in 2612) [ClassicSimilarity], result of:
            0.14153583 = score(doc=2612,freq=1.0), product of:
              0.26539415 = queryWeight, product of:
                3.0763502 = boost
                6.826295 = idf(docFreq=129, maxDocs=44083)
                0.012637772 = queryNorm
              0.5333043 = fieldWeight in 2612, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.826295 = idf(docFreq=129, maxDocs=44083)
                0.078125 = fieldNorm(doc=2612)
          0.11774311 = weight(abstract_txt:entities in 2612) [ClassicSimilarity], result of:
            0.11774311 = score(doc=2612,freq=1.0), product of:
              0.2583749 = queryWeight, product of:
                3.5049727 = boost
                5.833043 = idf(docFreq=350, maxDocs=44083)
                0.012637772 = queryNorm
              0.45570648 = fieldWeight in 2612, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.833043 = idf(docFreq=350, maxDocs=44083)
                0.078125 = fieldNorm(doc=2612)
          0.2435378 = weight(abstract_txt:extraction in 2612) [ClassicSimilarity], result of:
            0.2435378 = score(doc=2612,freq=3.0), product of:
              0.29082415 = queryWeight, product of:
                3.7185593 = boost
                6.188498 = idf(docFreq=245, maxDocs=44083)
                0.012637772 = queryNorm
              0.8374057 = fieldWeight in 2612, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.188498 = idf(docFreq=245, maxDocs=44083)
                0.078125 = fieldNorm(doc=2612)
          0.5710253 = weight(abstract_txt:relation in 2612) [ClassicSimilarity], result of:
            0.5710253 = score(doc=2612,freq=6.0), product of:
              0.5529168 = queryWeight, product of:
                8.106985 = boost
                5.3967204 = idf(docFreq=542, maxDocs=44083)
                0.012637772 = queryNorm
              1.0327508 = fieldWeight in 2612, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.3967204 = idf(docFreq=542, maxDocs=44083)
                0.078125 = fieldNorm(doc=2612)
        0.24 = coord(6/25)
    
  3. Ru, C.; Tang, J.; Li, S.; Xie, S.; Wang, T.: Using semantic similarity to reduce wrong labels in distant supervision for relation extraction (2018) 0.24
    0.23936048 = sum of:
      0.23936048 = product of:
        0.9973354 = sum of:
          0.053741027 = weight(abstract_txt:sentence in 56) [ClassicSimilarity], result of:
            0.053741027 = score(doc=56,freq=2.0), product of:
              0.08886702 = queryWeight, product of:
                1.0277791 = boost
                6.8417993 = idf(docFreq=127, maxDocs=44083)
                0.012637772 = queryNorm
              0.6047353 = fieldWeight in 56, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8417993 = idf(docFreq=127, maxDocs=44083)
                0.0625 = fieldNorm(doc=56)
          0.0067119827 = weight(abstract_txt:this in 56) [ClassicSimilarity], result of:
            0.0067119827 = score(doc=56,freq=1.0), product of:
              0.044408612 = queryWeight, product of:
                1.4530919 = boost
                2.4182634 = idf(docFreq=10673, maxDocs=44083)
                0.012637772 = queryNorm
              0.15114146 = fieldWeight in 56, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4182634 = idf(docFreq=10673, maxDocs=44083)
                0.0625 = fieldNorm(doc=56)
          0.027964804 = weight(abstract_txt:between in 56) [ClassicSimilarity], result of:
            0.027964804 = score(doc=56,freq=2.0), product of:
              0.09126365 = queryWeight, product of:
                2.0830917 = boost
                3.4667213 = idf(docFreq=3740, maxDocs=44083)
                0.012637772 = queryNorm
              0.30641776 = fieldWeight in 56, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4667213 = idf(docFreq=3740, maxDocs=44083)
                0.0625 = fieldNorm(doc=56)
          0.094194494 = weight(abstract_txt:entities in 56) [ClassicSimilarity], result of:
            0.094194494 = score(doc=56,freq=1.0), product of:
              0.2583749 = queryWeight, product of:
                3.5049727 = boost
                5.833043 = idf(docFreq=350, maxDocs=44083)
                0.012637772 = queryNorm
              0.3645652 = fieldWeight in 56, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.833043 = idf(docFreq=350, maxDocs=44083)
                0.0625 = fieldNorm(doc=56)
          0.22497058 = weight(abstract_txt:extraction in 56) [ClassicSimilarity], result of:
            0.22497058 = score(doc=56,freq=4.0), product of:
              0.29082415 = queryWeight, product of:
                3.7185593 = boost
                6.188498 = idf(docFreq=245, maxDocs=44083)
                0.012637772 = queryNorm
              0.77356225 = fieldWeight in 56, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.188498 = idf(docFreq=245, maxDocs=44083)
                0.0625 = fieldNorm(doc=56)
          0.5897525 = weight(abstract_txt:relation in 56) [ClassicSimilarity], result of:
            0.5897525 = score(doc=56,freq=10.0), product of:
              0.5529168 = queryWeight, product of:
                8.106985 = boost
                5.3967204 = idf(docFreq=542, maxDocs=44083)
                0.012637772 = queryNorm
              1.0666206 = fieldWeight in 56, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                5.3967204 = idf(docFreq=542, maxDocs=44083)
                0.0625 = fieldNorm(doc=56)
        0.24 = coord(6/25)
    
  4. Zhou, G.D.; Zhang, M.: Extracting relation information from text documents by exploring various types of knowledge (2007) 0.18
    0.17636418 = sum of:
      0.17636418 = product of:
        0.8818209 = sum of:
          0.013423965 = weight(abstract_txt:this in 1928) [ClassicSimilarity], result of:
            0.013423965 = score(doc=1928,freq=4.0), product of:
              0.044408612 = queryWeight, product of:
                1.4530919 = boost
                2.4182634 = idf(docFreq=10673, maxDocs=44083)
                0.012637772 = queryNorm
              0.30228293 = fieldWeight in 1928, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4182634 = idf(docFreq=10673, maxDocs=44083)
                0.0625 = fieldNorm(doc=1928)
          0.019774104 = weight(abstract_txt:between in 1928) [ClassicSimilarity], result of:
            0.019774104 = score(doc=1928,freq=1.0), product of:
              0.09126365 = queryWeight, product of:
                2.0830917 = boost
                3.4667213 = idf(docFreq=3740, maxDocs=44083)
                0.012637772 = queryNorm
              0.21667008 = fieldWeight in 1928, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4667213 = idf(docFreq=3740, maxDocs=44083)
                0.0625 = fieldNorm(doc=1928)
          0.094194494 = weight(abstract_txt:entities in 1928) [ClassicSimilarity], result of:
            0.094194494 = score(doc=1928,freq=1.0), product of:
              0.2583749 = queryWeight, product of:
                3.5049727 = boost
                5.833043 = idf(docFreq=350, maxDocs=44083)
                0.012637772 = queryNorm
              0.3645652 = fieldWeight in 1928, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.833043 = idf(docFreq=350, maxDocs=44083)
                0.0625 = fieldNorm(doc=1928)
          0.2976081 = weight(abstract_txt:extraction in 1928) [ClassicSimilarity], result of:
            0.2976081 = score(doc=1928,freq=7.0), product of:
              0.29082415 = queryWeight, product of:
                3.7185593 = boost
                6.188498 = idf(docFreq=245, maxDocs=44083)
                0.012637772 = queryNorm
              1.0233266 = fieldWeight in 1928, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.188498 = idf(docFreq=245, maxDocs=44083)
                0.0625 = fieldNorm(doc=1928)
          0.45682028 = weight(abstract_txt:relation in 1928) [ClassicSimilarity], result of:
            0.45682028 = score(doc=1928,freq=6.0), product of:
              0.5529168 = queryWeight, product of:
                8.106985 = boost
                5.3967204 = idf(docFreq=542, maxDocs=44083)
                0.012637772 = queryNorm
              0.8262007 = fieldWeight in 1928, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.3967204 = idf(docFreq=542, maxDocs=44083)
                0.0625 = fieldNorm(doc=1928)
        0.2 = coord(5/25)
    
  5. Zhang, M.; Zhou, G.D.; Aw, A.: Exploring syntactic structured features over parse trees for relation extraction using kernel methods (2008) 0.16
    0.16320206 = sum of:
      0.16320206 = product of:
        0.8160103 = sum of:
          0.009492177 = weight(abstract_txt:this in 3056) [ClassicSimilarity], result of:
            0.009492177 = score(doc=3056,freq=2.0), product of:
              0.044408612 = queryWeight, product of:
                1.4530919 = boost
                2.4182634 = idf(docFreq=10673, maxDocs=44083)
                0.012637772 = queryNorm
              0.21374631 = fieldWeight in 3056, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4182634 = idf(docFreq=10673, maxDocs=44083)
                0.0625 = fieldNorm(doc=3056)
          0.019774104 = weight(abstract_txt:between in 3056) [ClassicSimilarity], result of:
            0.019774104 = score(doc=3056,freq=1.0), product of:
              0.09126365 = queryWeight, product of:
                2.0830917 = boost
                3.4667213 = idf(docFreq=3740, maxDocs=44083)
                0.012637772 = queryNorm
              0.21667008 = fieldWeight in 3056, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4667213 = idf(docFreq=3740, maxDocs=44083)
                0.0625 = fieldNorm(doc=3056)
          0.094194494 = weight(abstract_txt:entities in 3056) [ClassicSimilarity], result of:
            0.094194494 = score(doc=3056,freq=1.0), product of:
              0.2583749 = queryWeight, product of:
                3.5049727 = boost
                5.833043 = idf(docFreq=350, maxDocs=44083)
                0.012637772 = queryNorm
              0.3645652 = fieldWeight in 3056, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.833043 = idf(docFreq=350, maxDocs=44083)
                0.0625 = fieldNorm(doc=3056)
          0.27553156 = weight(abstract_txt:extraction in 3056) [ClassicSimilarity], result of:
            0.27553156 = score(doc=3056,freq=6.0), product of:
              0.29082415 = queryWeight, product of:
                3.7185593 = boost
                6.188498 = idf(docFreq=245, maxDocs=44083)
                0.012637772 = queryNorm
              0.9474164 = fieldWeight in 3056, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.188498 = idf(docFreq=245, maxDocs=44083)
                0.0625 = fieldNorm(doc=3056)
          0.41701797 = weight(abstract_txt:relation in 3056) [ClassicSimilarity], result of:
            0.41701797 = score(doc=3056,freq=5.0), product of:
              0.5529168 = queryWeight, product of:
                8.106985 = boost
                5.3967204 = idf(docFreq=542, maxDocs=44083)
                0.012637772 = queryNorm
              0.75421464 = fieldWeight in 3056, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.3967204 = idf(docFreq=542, maxDocs=44083)
                0.0625 = fieldNorm(doc=3056)
        0.2 = coord(5/25)
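
The content-based scores combine several term scores and then scale the sum by a coordination factor. The following is a minimal sketch of that combination for the first entry (doc=106), assuming the ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The boost, freq and docFreq values are copied from the listing; the names TERMS and doc_score are illustrative only.

```python
import math

MAX_DOCS = 44083
QUERY_NORM = 0.012637772
FIELD_NORM = 0.0546875   # fieldNorm(doc=106), copied from the listing
QUERY_TERMS = 25         # denominator of coord(8/25)

# (term, boost, freq in doc 106, docFreq), as printed in the explain tree.
TERMS = [
    ("work",       1.1410819, 4.0, 2685),
    ("this",       1.4530919, 2.0, 10673),
    ("between",    2.0830917, 2.0, 3740),
    ("open",       2.1785958, 1.0, 952),
    ("named",      3.0763502, 1.0, 129),
    ("entities",   3.5049727, 1.0, 350),
    ("extraction", 3.7185593, 3.0, 245),
    ("relation",   8.106985,  7.0, 542),
]


def term_score(boost: float, freq: float, doc_freq: int) -> float:
    """queryWeight * fieldWeight for one matching term."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


def doc_score(terms) -> float:
    """Sum of the term scores, scaled by coord = matched terms / query terms."""
    total = sum(term_score(b, f, df) for _, b, f, df in terms)
    coord = len(terms) / QUERY_TERMS
    return total * coord


score = doc_score(TERMS)
print(round(score, 2))  # 0.28, the value shown next to the entry
```

The coord factor explains why entries matching fewer query terms rank lower even when their term sum is larger: entry 2 has a sum of 1.1069497 but only 6 of 25 terms match, which gives 0.27 against 0.28 for entry 1.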