Document (#33010)

Author
Jung, H.
Yi, E.
Kim, D.
Lee, G.G.
Title
Information extraction with automatic knowledge expansion
Source
Information processing and management. 41(2005) no.2, S.217-242
Year
2005
Abstract
POSIE (POSTECH Information Extraction System) is an information extraction system that uses multiple learning strategies, i.e., SmL, user-oriented learning, and separate-context learning, in a question answering framework. POSIE replaces laborious annotation with automatic instance extraction by the SmL from structured Web documents, and places the user at the end of the user-oriented learning cycle. Treating information extraction as question answering simplifies the extraction procedures for a set of slots. We bring techniques verified in the question answering framework, such as domain knowledge and instance rules, to the information extraction problem. To incrementally improve extraction performance, a sequence of user-oriented learning and separate-context learning produces context rules and generalizes them in both the learning and extraction phases. Experiments on the "continuing education" domain show that, with no user training, the F1-measure is initially 0.477 with recall 0.748. As the set of training documents grows, however, the F1-measure exceeds 0.75 with recall 0.772. We also obtain an F-measure of about 0.9 for five out of seven slots in the "job offering" domain.
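
The abstract reports F1 and recall but not precision. As a rough aid for reading those figures, the sketch below uses the standard definition F1 = 2PR / (P + R) and inverts it to get the precision a given F1/recall pair would imply. This assumes each reported pair comes from a single precision/recall measurement, which the abstract does not state; if the scores are averaged over slots, the implied precision is only indicative. The function names are illustrative and not taken from the paper.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def implied_precision(f1_score: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and R.

    Assumption: f1_score and recall describe the same single measurement.
    """
    denom = 2 * recall - f1_score
    if denom <= 0:
        raise ValueError("no valid precision for this F1/recall pair")
    return f1_score * recall / denom


# Figures quoted in the abstract ("continuing education" domain).
p_no_training = implied_precision(0.477, 0.748)
p_with_training = implied_precision(0.75, 0.772)
print(round(p_no_training, 2), round(p_with_training, 2))
```

Under that assumption, the jump in F1 between the two settings would come mostly from precision, since recall moves only from 0.748 to 0.772.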

Similar documents (author)

  1. Jung, R.: Die Reform der alphabetischen Katalogisierung in Deutschland 1908-1976 : eine annotierte Auswahlbibliographie (1976) 5.64
    5.639896 = sum of:
      5.639896 = weight(author_txt:jung in 5323) [ClassicSimilarity], result of:
        5.639896 = fieldWeight in 5323, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.023833 = idf(docFreq=13, maxDocs=42740)
          0.625 = fieldNorm(doc=5323)
    
  2. Jung, V.: Wissen, das produktiv wird : Mit Wissensmanagement zum lernenden Unternehmen (2000) 5.64
    5.639896 = sum of:
      5.639896 = weight(author_txt:jung in 6058) [ClassicSimilarity], result of:
        5.639896 = fieldWeight in 6058, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.023833 = idf(docFreq=13, maxDocs=42740)
          0.625 = fieldNorm(doc=6058)
    
  3. Jung, R.: Bibliographie der Festschriften und Festschriftenbeiträge zum Buch und Bibliothekswesen : Deutschland, Österreich, Schweiz 1976-2000 (2002) 5.64
    5.639896 = sum of:
      5.639896 = weight(author_txt:jung in 2090) [ClassicSimilarity], result of:
        5.639896 = fieldWeight in 2090, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.023833 = idf(docFreq=13, maxDocs=42740)
          0.625 = fieldNorm(doc=2090)
    
  4. Jung, R.: Methodik und Didaktik einer Einführung in die RAK nach vorausgegangenem Unterricht der Titelaufnahme nach den "Preußischen Instruktionen" (1976) 5.64
    5.639896 = sum of:
      5.639896 = weight(author_txt:jung in 2804) [ClassicSimilarity], result of:
        5.639896 = fieldWeight in 2804, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.023833 = idf(docFreq=13, maxDocs=42740)
          0.625 = fieldNorm(doc=2804)
    
  5. Jung, J.J.: Contextualized query sampling to discover semantic resource descriptions on the web (2009) 5.64
    5.639896 = sum of:
      5.639896 = weight(author_txt:jung in 1217) [ClassicSimilarity], result of:
        5.639896 = fieldWeight in 1217, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.023833 = idf(docFreq=13, maxDocs=42740)
          0.625 = fieldNorm(doc=1217)
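
The five author matches above all carry the same score because each is a single-term match with identical tf, idf and fieldNorm. A minimal sketch of that arithmetic follows, assuming the tf and idf formulas of Lucene's ClassicSimilarity that reproduce the figures shown here: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The fieldNorm value is taken from the listing as given, and the Python function names are illustrative, not part of Lucene.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as it appears in the explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int,
                 field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm (fieldNorm copied from the listing)."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values from the author_txt:jung entries above.
idf = classic_idf(doc_freq=13, max_docs=42740)
score = field_weight(freq=1.0, doc_freq=13, max_docs=42740, field_norm=0.625)
print(round(idf, 3), round(score, 2))
```

The result agrees with the listed idf of 9.023833 and score of 5.639896 up to floating-point rounding; Lucene works in single precision, so the last digits may differ.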
    

Similar documents (content)

  1. Vlachidis, A.; Tudhope, D.: A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain (2016) 0.24
    0.23767455 = sum of:
      0.23767455 = product of:
        0.742733 = sum of:
          0.014403658 = weight(abstract_txt:with in 4896) [ClassicSimilarity], result of:
            0.014403658 = score(doc=4896,freq=4.0), product of:
              0.045769084 = queryWeight, product of:
                1.1809571 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.01539386 = queryNorm
              0.31470278 = fieldWeight in 4896, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=4896)
          0.031726073 = weight(abstract_txt:automatic in 4896) [ClassicSimilarity], result of:
            0.031726073 = score(doc=4896,freq=1.0), product of:
              0.0976213 = queryWeight, product of:
                1.219566 = boost
                5.199861 = idf(docFreq=640, maxDocs=42740)
                0.01539386 = queryNorm
              0.32499132 = fieldWeight in 4896, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.199861 = idf(docFreq=640, maxDocs=42740)
                0.0625 = fieldNorm(doc=4896)
          0.045650844 = weight(abstract_txt:rules in 4896) [ClassicSimilarity], result of:
            0.045650844 = score(doc=4896,freq=2.0), product of:
              0.09875435 = queryWeight, product of:
                1.226623 = boost
                5.2299504 = idf(docFreq=621, maxDocs=42740)
                0.01539386 = queryNorm
              0.46226668 = fieldWeight in 4896, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2299504 = idf(docFreq=621, maxDocs=42740)
                0.0625 = fieldNorm(doc=4896)
          0.011449083 = weight(abstract_txt:information in 4896) [ClassicSimilarity], result of:
            0.011449083 = score(doc=4896,freq=2.0), product of:
              0.0533029 = queryWeight, product of:
                1.4248805 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.01539386 = queryNorm
              0.21479288 = fieldWeight in 4896, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.0625 = fieldNorm(doc=4896)
          0.028390715 = weight(abstract_txt:context in 4896) [ClassicSimilarity], result of:
            0.028390715 = score(doc=4896,freq=1.0), product of:
              0.10377235 = queryWeight, product of:
                1.5399956 = boost
                4.377384 = idf(docFreq=1458, maxDocs=42740)
                0.01539386 = queryNorm
              0.2735865 = fieldWeight in 4896, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.377384 = idf(docFreq=1458, maxDocs=42740)
                0.0625 = fieldNorm(doc=4896)
          0.063964866 = weight(abstract_txt:domain in 4896) [ClassicSimilarity], result of:
            0.063964866 = score(doc=4896,freq=3.0), product of:
              0.1236568 = queryWeight, product of:
                1.6810771 = boost
                4.7784038 = idf(docFreq=976, maxDocs=42740)
                0.01539386 = queryNorm
              0.51727736 = fieldWeight in 4896, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.7784038 = idf(docFreq=976, maxDocs=42740)
                0.0625 = fieldNorm(doc=4896)
          0.059340004 = weight(abstract_txt:oriented in 4896) [ClassicSimilarity], result of:
            0.059340004 = score(doc=4896,freq=1.0), product of:
              0.16964035 = queryWeight, product of:
                1.9689887 = boost
                5.596782 = idf(docFreq=430, maxDocs=42740)
                0.01539386 = queryNorm
              0.3497989 = fieldWeight in 4896, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.596782 = idf(docFreq=430, maxDocs=42740)
                0.0625 = fieldNorm(doc=4896)
          0.48780778 = weight(abstract_txt:extraction in 4896) [ClassicSimilarity], result of:
            0.48780778 = score(doc=4896,freq=4.0), product of:
              0.6277938 = queryWeight, product of:
                6.5606637 = boost
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.01539386 = queryNorm
              0.77701914 = fieldWeight in 4896, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.0625 = fieldNorm(doc=4896)
        0.32 = coord(8/25)
    
  2. Wu, T.; Pottenger, W.M.: A semi-supervised active learning algorithm for information extraction from textual data (2005) 0.20
    0.20245464 = sum of:
      0.20245464 = product of:
        0.84356105 = sum of:
          0.010184924 = weight(abstract_txt:with in 4238) [ClassicSimilarity], result of:
            0.010184924 = score(doc=4238,freq=2.0), product of:
              0.045769084 = queryWeight, product of:
                1.1809571 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.01539386 = queryNorm
              0.22252847 = fieldWeight in 4238, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=4238)
          0.052783124 = weight(abstract_txt:training in 4238) [ClassicSimilarity], result of:
            0.052783124 = score(doc=4238,freq=3.0), product of:
              0.0950364 = queryWeight, product of:
                1.2033113 = boost
                5.130556 = idf(docFreq=686, maxDocs=42740)
                0.01539386 = queryNorm
              0.555399 = fieldWeight in 4238, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.130556 = idf(docFreq=686, maxDocs=42740)
                0.0625 = fieldNorm(doc=4238)
          0.016191449 = weight(abstract_txt:information in 4238) [ClassicSimilarity], result of:
            0.016191449 = score(doc=4238,freq=4.0), product of:
              0.0533029 = queryWeight, product of:
                1.4248805 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.01539386 = queryNorm
              0.303763 = fieldWeight in 4238, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.0625 = fieldNorm(doc=4238)
          0.028390715 = weight(abstract_txt:context in 4238) [ClassicSimilarity], result of:
            0.028390715 = score(doc=4238,freq=1.0), product of:
              0.10377235 = queryWeight, product of:
                1.5399956 = boost
                4.377384 = idf(docFreq=1458, maxDocs=42740)
                0.01539386 = queryNorm
              0.2735865 = fieldWeight in 4238, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.377384 = idf(docFreq=1458, maxDocs=42740)
                0.0625 = fieldNorm(doc=4238)
          0.24820304 = weight(abstract_txt:learning in 4238) [ClassicSimilarity], result of:
            0.24820304 = score(doc=4238,freq=8.0), product of:
              0.29205486 = queryWeight, product of:
                3.9463832 = boost
                4.807482 = idf(docFreq=948, maxDocs=42740)
                0.01539386 = queryNorm
              0.8498507 = fieldWeight in 4238, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.807482 = idf(docFreq=948, maxDocs=42740)
                0.0625 = fieldNorm(doc=4238)
          0.48780778 = weight(abstract_txt:extraction in 4238) [ClassicSimilarity], result of:
            0.48780778 = score(doc=4238,freq=4.0), product of:
              0.6277938 = queryWeight, product of:
                6.5606637 = boost
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.01539386 = queryNorm
              0.77701914 = fieldWeight in 4238, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.0625 = fieldNorm(doc=4238)
        0.24 = coord(6/25)
    
  3. El idrissi esserhrouchni, O. et al.; Frikh, B.; Ouhbi, B.: OntologyLine : a new framework for learning non-taxonomic relations of domain ontology (2016) 0.17
    0.17242822 = sum of:
      0.17242822 = product of:
        0.7184509 = sum of:
          0.0531576 = weight(abstract_txt:recall in 5380) [ClassicSimilarity], result of:
            0.0531576 = score(doc=5380,freq=1.0), product of:
              0.1186781 = queryWeight, product of:
                1.344678 = boost
                5.733301 = idf(docFreq=375, maxDocs=42740)
                0.01539386 = queryNorm
              0.44791415 = fieldWeight in 5380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.733301 = idf(docFreq=375, maxDocs=42740)
                0.078125 = fieldNorm(doc=5380)
          0.010119655 = weight(abstract_txt:information in 5380) [ClassicSimilarity], result of:
            0.010119655 = score(doc=5380,freq=1.0), product of:
              0.0533029 = queryWeight, product of:
                1.4248805 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.01539386 = queryNorm
              0.18985188 = fieldWeight in 5380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.078125 = fieldNorm(doc=5380)
          0.09232534 = weight(abstract_txt:domain in 5380) [ClassicSimilarity], result of:
            0.09232534 = score(doc=5380,freq=4.0), product of:
              0.1236568 = queryWeight, product of:
                1.6810771 = boost
                4.7784038 = idf(docFreq=976, maxDocs=42740)
                0.01539386 = queryNorm
              0.7466256 = fieldWeight in 5380, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.7784038 = idf(docFreq=976, maxDocs=42740)
                0.078125 = fieldNorm(doc=5380)
          0.06797755 = weight(abstract_txt:measure in 5380) [ClassicSimilarity], result of:
            0.06797755 = score(doc=5380,freq=1.0), product of:
              0.16005445 = queryWeight, product of:
                1.9125488 = boost
                5.4363537 = idf(docFreq=505, maxDocs=42740)
                0.01539386 = queryNorm
              0.42471513 = fieldWeight in 5380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4363537 = idf(docFreq=505, maxDocs=42740)
                0.078125 = fieldNorm(doc=5380)
          0.18999086 = weight(abstract_txt:learning in 5380) [ClassicSimilarity], result of:
            0.18999086 = score(doc=5380,freq=3.0), product of:
              0.29205486 = queryWeight, product of:
                3.9463832 = boost
                4.807482 = idf(docFreq=948, maxDocs=42740)
                0.01539386 = queryNorm
              0.6505314 = fieldWeight in 5380, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.807482 = idf(docFreq=948, maxDocs=42740)
                0.078125 = fieldNorm(doc=5380)
          0.30487987 = weight(abstract_txt:extraction in 5380) [ClassicSimilarity], result of:
            0.30487987 = score(doc=5380,freq=1.0), product of:
              0.6277938 = queryWeight, product of:
                6.5606637 = boost
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.01539386 = queryNorm
              0.48563695 = fieldWeight in 5380, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.078125 = fieldNorm(doc=5380)
        0.24 = coord(6/25)
    
  4. Conde, A.; Larrañaga, M.; Arruarte, A.; Elorriaga, J.A.; Roth, D.: litewi: a combined term extraction and entity linking method for eliciting educational ontologies from textbooks (2016) 0.17
    0.16761759 = sum of:
      0.16761759 = product of:
        0.69840664 = sum of:
          0.007201829 = weight(abstract_txt:with in 4646) [ClassicSimilarity], result of:
            0.007201829 = score(doc=4646,freq=1.0), product of:
              0.045769084 = queryWeight, product of:
                1.1809571 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.01539386 = queryNorm
              0.15735139 = fieldWeight in 4646, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=4646)
          0.011449083 = weight(abstract_txt:information in 4646) [ClassicSimilarity], result of:
            0.011449083 = score(doc=4646,freq=2.0), product of:
              0.0533029 = queryWeight, product of:
                1.4248805 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.01539386 = queryNorm
              0.21479288 = fieldWeight in 4646, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.0625 = fieldNorm(doc=4646)
          0.073860265 = weight(abstract_txt:domain in 4646) [ClassicSimilarity], result of:
            0.073860265 = score(doc=4646,freq=4.0), product of:
              0.1236568 = queryWeight, product of:
                1.6810771 = boost
                4.7784038 = idf(docFreq=976, maxDocs=42740)
                0.01539386 = queryNorm
              0.59730047 = fieldWeight in 4646, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.7784038 = idf(docFreq=976, maxDocs=42740)
                0.0625 = fieldNorm(doc=4646)
          0.059340004 = weight(abstract_txt:oriented in 4646) [ClassicSimilarity], result of:
            0.059340004 = score(doc=4646,freq=1.0), product of:
              0.16964035 = queryWeight, product of:
                1.9689887 = boost
                5.596782 = idf(docFreq=430, maxDocs=42740)
                0.01539386 = queryNorm
              0.3497989 = fieldWeight in 4646, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.596782 = idf(docFreq=430, maxDocs=42740)
                0.0625 = fieldNorm(doc=4646)
          0.12410152 = weight(abstract_txt:learning in 4646) [ClassicSimilarity], result of:
            0.12410152 = score(doc=4646,freq=2.0), product of:
              0.29205486 = queryWeight, product of:
                3.9463832 = boost
                4.807482 = idf(docFreq=948, maxDocs=42740)
                0.01539386 = queryNorm
              0.42492536 = fieldWeight in 4646, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.807482 = idf(docFreq=948, maxDocs=42740)
                0.0625 = fieldNorm(doc=4646)
          0.42245394 = weight(abstract_txt:extraction in 4646) [ClassicSimilarity], result of:
            0.42245394 = score(doc=4646,freq=3.0), product of:
              0.6277938 = queryWeight, product of:
                6.5606637 = boost
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.01539386 = queryNorm
              0.6729183 = fieldWeight in 4646, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.0625 = fieldNorm(doc=4646)
        0.24 = coord(6/25)
    
  5. Boer, V. de; Porter, A.L.; Someren, M. v.: Extracting historical time periods from the Web (2010) 0.16
    0.1638634 = sum of:
      0.1638634 = product of:
        0.819317 = sum of:
          0.010802744 = weight(abstract_txt:with in 989) [ClassicSimilarity], result of:
            0.010802744 = score(doc=989,freq=1.0), product of:
              0.045769084 = queryWeight, product of:
                1.1809571 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.01539386 = queryNorm
              0.23602709 = fieldWeight in 989, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.09375 = fieldNorm(doc=989)
          0.047589112 = weight(abstract_txt:automatic in 989) [ClassicSimilarity], result of:
            0.047589112 = score(doc=989,freq=1.0), product of:
              0.0976213 = queryWeight, product of:
                1.219566 = boost
                5.199861 = idf(docFreq=640, maxDocs=42740)
                0.01539386 = queryNorm
              0.48748696 = fieldWeight in 989, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.199861 = idf(docFreq=640, maxDocs=42740)
                0.09375 = fieldNorm(doc=989)
          0.021033308 = weight(abstract_txt:information in 989) [ClassicSimilarity], result of:
            0.021033308 = score(doc=989,freq=3.0), product of:
              0.0533029 = queryWeight, product of:
                1.4248805 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.01539386 = queryNorm
              0.3945997 = fieldWeight in 989, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.09375 = fieldNorm(doc=989)
          0.10621089 = weight(abstract_txt:instance in 989) [ClassicSimilarity], result of:
            0.10621089 = score(doc=989,freq=1.0), product of:
              0.16671917 = queryWeight, product of:
                1.5937705 = boost
                6.7953563 = idf(docFreq=129, maxDocs=42740)
                0.01539386 = queryNorm
              0.63706464 = fieldWeight in 989, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7953563 = idf(docFreq=129, maxDocs=42740)
                0.09375 = fieldNorm(doc=989)
          0.63368094 = weight(abstract_txt:extraction in 989) [ClassicSimilarity], result of:
            0.63368094 = score(doc=989,freq=3.0), product of:
              0.6277938 = queryWeight, product of:
                6.5606637 = boost
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.01539386 = queryNorm
              1.0093775 = fieldWeight in 989, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.09375 = fieldNorm(doc=989)
        0.2 = coord(5/25)
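
The content-similarity trees above combine per-term scores as coord * sum(queryWeight * fieldWeight), where queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm. The sketch below recomputes the total for the first hit (doc 4896) from the boosts, frequencies and norms printed in its tree. It assumes the same tf and idf formulas that match the listed values, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the helper names are illustrative and not Lucene API.

```python
import math

MAX_DOCS = 42740
QUERY_NORM = 0.01539386   # queryNorm, copied from the listing
FIELD_NORM = 0.0625       # fieldNorm(doc=4896), copied from the listing


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as it appears in the explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, MAX_DOCS)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


# (term, termFreq, docFreq, boost) for the eight terms matched in doc 4896.
matched_terms = [
    ("with",        4.0,  9369, 1.1809571),
    ("automatic",   1.0,   640, 1.219566),
    ("rules",       2.0,   621, 1.226623),
    ("information", 2.0, 10226, 1.4248805),
    ("context",     1.0,  1458, 1.5399956),
    ("domain",      3.0,   976, 1.6810771),
    ("oriented",    1.0,   430, 1.9689887),
    ("extraction",  4.0,   231, 6.5606637),
]

summed = sum(term_score(freq, df, boost) for _, freq, df, boost in matched_terms)
coord = len(matched_terms) / 25   # 8 of the 25 query terms matched
total = coord * summed
print(round(summed, 3), round(total, 3))
```

The recomputed total agrees with the listed 0.23767455 up to rounding. The breakdown also shows why "extraction" dominates every hit: its boost of about 6.56 and its high idf together account for roughly two thirds of the summed score for this document.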