Toutanova, K.; Manning, C.D.: Enriching the knowledge sources used in a maximum entropy Part-of-Speech Tagger (2000)
- Abstract
- This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
- Type
- a