-
Toutanova, K.; Klein, D.; Manning, C.D.; Singer, Y.: Feature-rich Part-of-Speech Tagging with a cyclic dependency network (2003)
0.00
0.0026473717 = product of:
  0.0052947435 = sum of:
    0.0052947435 = product of:
      0.010589487 = sum of:
        0.010589487 = weight(_text_:a in 1059) [ClassicSimilarity], result of:
          0.010589487 = score(doc=1059,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.19940455 = fieldWeight in 1059, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1059)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
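The explain tree above follows Lucene's ClassicSimilarity TF-IDF formula. As a minimal sketch (not the actual Lucene source), the per-term weight can be reproduced from the numbers shown, assuming the standard ClassicSimilarity definitions of `tf` and `idf`:

```python
import math

# Sketch of Lucene ClassicSimilarity scoring, with inputs taken from the
# explain output above. The function is illustrative, not Lucene's own API.
def classic_similarity(freq, doc_freq, max_docs, query_norm, field_norm):
    tf = math.sqrt(freq)                               # 3.1622777 for freq=10
    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # ~1.153047
    query_weight = idf * query_norm                    # ~0.053105544
    field_weight = tf * idf * field_norm               # ~0.19940455
    return query_weight * field_weight

score = classic_similarity(freq=10.0, doc_freq=37942, max_docs=44218,
                           query_norm=0.046056706, field_norm=0.0546875)
# score matches the weight(_text_:a) line; the two coord(1/2) factors
# then scale it by 0.25 to give the final 0.0026473717.
final = score * 0.5 * 0.5
```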
- Abstract
- We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional loglinear models, and (iv) fine-grained modeling of unknown word features. Using these ideas together, the resulting tagger gives a 97.24% accuracy on the Penn Treebank WSJ, an error reduction of 4.4% on the best previous single automatically learned tagging result.
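The 4.4% figure in the abstract is a relative error reduction. A quick back-of-the-envelope check of what prior accuracy it implies (the implied figure is an inference, not stated in the abstract):

```python
# Relative error reduction: new_err = prev_err * (1 - reduction).
new_acc = 0.9724
reduction = 0.044
new_err = 1.0 - new_acc                 # 0.0276
prev_err = new_err / (1.0 - reduction)  # error rate before the 4.4% cut
prev_acc = 1.0 - prev_err               # implied prior best, ~97.11%
```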
- Type
- a
-
Toutanova, K.; Manning, C.D.: Enriching the knowledge sources used in a maximum entropy Part-of-Speech Tagger (2000)
0.00
0.0020506454 = product of:
  0.004101291 = sum of:
    0.004101291 = product of:
      0.008202582 = sum of:
        0.008202582 = weight(_text_:a in 1060) [ClassicSimilarity], result of:
          0.008202582 = score(doc=1060,freq=6.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1544581 = fieldWeight in 1060, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1060)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
- Abstract
- This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
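The abstract's point (i) concerns word-shape features for unknown words. A hypothetical sketch of that kind of feature extractor; the feature names and helper below are illustrative, not the paper's actual feature set:

```python
# Illustrative word-level features of the kind used for unknown words in
# maximum entropy taggers (capitalization, digits, crude suffix morphology).
def unknown_word_features(word: str) -> dict:
    return {
        "is_capitalized": word[:1].isupper(),
        "all_caps": word.isupper(),
        "has_digit": any(c.isdigit() for c in word),
        "has_hyphen": "-" in word,
        "suffix3": word[-3:].lower(),  # crude morphology cue
    }

feats = unknown_word_features("Treebank-style")
```

Each such binary/string feature would feed into the tagger's maximum entropy model as a conditioning predicate.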
- Type
- a
-
Manning, C.D.; Raghavan, P.; Schütze, H.: Introduction to information retrieval (2008)
0.00
6.765375E-4 = product of:
  0.001353075 = sum of:
    0.001353075 = product of:
      0.00270615 = sum of:
        0.00270615 = weight(_text_:a in 4041) [ClassicSimilarity], result of:
          0.00270615 = score(doc=4041,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.050957955 = fieldWeight in 4041, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=4041)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
- Content
- Contents: Boolean retrieval - The term vocabulary & postings lists - Dictionaries and tolerant retrieval - Index construction - Index compression - Scoring, term weighting & the vector space model - Computing scores in a complete search system - Evaluation in information retrieval - Relevance feedback & query expansion - XML retrieval - Probabilistic information retrieval - Language models for information retrieval - Text classification & Naive Bayes - Vector space classification - Support vector machines & machine learning on documents - Flat clustering - Hierarchical clustering - Matrix decompositions & latent semantic indexing - Web search basics - Web crawling and indexes - Link analysis. See the digital version at: http://nlp.stanford.edu/IR-book/pdf/irbookprint.pdf.
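The book's opening chapters (Boolean retrieval, postings lists) can be sketched in a few lines. Toy corpus, tokenizer, and function names below are assumptions for illustration, not code from the book:

```python
from collections import defaultdict

# Minimal inverted index with postings lists and Boolean AND retrieval.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}

index = defaultdict(set)  # term -> postings (set of doc ids)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def boolean_and(*terms):
    """Intersect postings lists for a conjunctive Boolean query."""
    postings = [index[t] for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

result = boolean_and("home", "sales", "july")
```

A real system would keep postings as sorted lists and intersect them with a merge walk, but sets keep the sketch short.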