Jung, H.; Yi, E.; Kim, D.; Lee, G.G.: Information extraction with automatic knowledge expansion (2005)
0.00
0.0020296127 = product of:
0.0040592253 = sum of:
0.0040592253 = product of:
0.008118451 = sum of:
0.008118451 = weight(_text_:a in 1008) [ClassicSimilarity], result of:
0.008118451 = score(doc=1008,freq=8.0), product of:
0.053105544 = queryWeight, product of:
1.153047 = idf(docFreq=37942, maxDocs=44218)
0.046056706 = queryNorm
0.15287387 = fieldWeight in 1008, product of:
2.828427 = tf(freq=8.0), with freq of:
8.0 = termFreq=8.0
1.153047 = idf(docFreq=37942, maxDocs=44218)
0.046875 = fieldNorm(doc=1008)
0.5 = coord(1/2)
0.5 = coord(1/2)
- Abstract
- POSIE (POSTECH Information Extraction System) is an information extraction system which uses multiple learning strategies, i.e., SmL, user-oriented learning, and separate-context learning, in a question answering framework. POSIE replaces laborious annotation with automatic instance extraction by the SmL from structured Web documents, and places the user at the end of the user-oriented learning cycle. Information extraction as question answering simplifies the extraction procedures for a set of slots. We introduce the techniques verified on the question answering framework, such as domain knowledge and instance rules, into an information extraction problem. To incrementally improve extraction performance, a sequence of the user-oriented learning and the separate-context learning produces context rules and generalizes them in both the learning and extraction phases. Experiments on the "continuing education" domain initially show that the F1-measure becomes 0.477 and recall 0.748 with no user training. However, as the size of the training documents grows, the F1-measure reaches beyond 0.75 with recall 0.772. We also obtain F-measure of about 0.9 for five out of seven slots on "job offering" domain.
- Type
- a