-
Hobson, S.P.; Dorr, B.J.; Monz, C.; Schwartz, R.: Task-based evaluation of text summarization using Relevance Prediction (2007)
- Abstract
- This article introduces a new task-based evaluation measure called Relevance Prediction that is a more intuitive measure of an individual's performance on a real-world task than interannotator agreement. Relevance Prediction parallels what a user does in the real-world task of browsing a set of documents using standard search tools, i.e., the user judges relevance based on a short summary and then that same user - not an independent user - decides whether to open (and judge) the corresponding document. This measure is shown to be a more reliable measure of task performance than LDC Agreement, a current gold-standard-based measure used in the summarization evaluation community. Our goal is to provide a stable framework within which developers of new automatic measures may make stronger statistical statements about the effectiveness of their measures in predicting summary usefulness. We demonstrate - as a proof-of-concept methodology for automatic metric developers - that a current automatic evaluation measure has a better correlation with Relevance Prediction than with LDC Agreement and that the significance level for detected differences is higher for the former than for the latter.
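- As the abstract describes, Relevance Prediction compares a user's summary-based relevance judgment with that same user's judgment after reading the full document. A minimal sketch of computing such an agreement score (the function name and interface are hypothetical illustrations, not the authors' implementation) might look like:

```python
def relevance_prediction_accuracy(summary_judgments, document_judgments):
    """Fraction of documents where a user's summary-based relevance
    judgment matches that same user's full-document judgment.

    Hypothetical sketch of the agreement computation; both lists are
    boolean judgments by the SAME user, aligned one-to-one by document.
    """
    if len(summary_judgments) != len(document_judgments):
        raise ValueError("judgment lists must align one-to-one")
    matches = sum(s == d for s, d in zip(summary_judgments, document_judgments))
    return matches / len(summary_judgments)


# Example: one user judged 5 summaries, then the 5 full documents.
score = relevance_prediction_accuracy(
    [True, False, True, True, False],   # judgments from summaries
    [True, False, False, True, False],  # judgments from full documents
)
print(score)  # 4 of 5 judgments agree -> 0.8
```

- Because both judgments come from one user rather than independent annotators, the measure avoids the interannotator-disagreement noise the abstract attributes to LDC Agreement.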
- Source
- Information processing and management. 43(2007) no.6, S.1482-1499
-
Zajic, D.; Dorr, B.J.; Lin, J.; Schwartz, R.: Multi-candidate reduction : sentence compression as a tool for document summarization tasks (2007)
- Source
- Information processing and management. 43(2007) no.6, S.1549-1570
-
Dorr, B.J.; Gaasterland, T.: Exploiting aspectual features and connecting words for summarization-inspired temporal-relation extraction (2007)
- Source
- Information processing and management. 43(2007) no.6, S.1681-1704