Search (5 results, page 1 of 1)

  • author_ss:"Manning, C.D."
  1. Manning, C.D.; Raghavan, P.; Schütze, H.: Introduction to information retrieval (2008) 0.01
    0.005056957 = product of:
      0.010113914 = sum of:
        0.010113914 = product of:
          0.01517087 = sum of:
            0.0026886058 = weight(_text_:a in 4041) [ClassicSimilarity], result of:
              0.0026886058 = score(doc=4041,freq=2.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.050957955 = fieldWeight in 4041, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4041)
            0.012482265 = weight(_text_:h in 4041) [ClassicSimilarity], result of:
              0.012482265 = score(doc=4041,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.10979818 = fieldWeight in 4041, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4041)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
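    The explain tree above is standard Lucene ClassicSimilarity (TF-IDF) output, and the displayed score can be recomputed from the values it prints. Below is a minimal Python sketch using only the tree's own numbers (tf, idf, fieldNorm, queryNorm, and the two coord factors); the variable names are ours, not Lucene's:

      import math

      # Values copied from the explain tree for doc 4041.
      QUERY_NORM = 0.045758117

      def term_score(freq, idf, field_norm):
          """ClassicSimilarity per-term score: queryWeight * fieldWeight."""
          tf = math.sqrt(freq)                  # 1.4142135 for freq=2.0
          query_weight = idf * QUERY_NORM       # idf * queryNorm
          field_weight = tf * idf * field_norm  # tf * idf * fieldNorm
          return query_weight * field_weight

      score_a = term_score(2.0, 1.153047, 0.03125)   # ~0.0026886058
      score_h = term_score(2.0, 2.4844491, 0.03125)  # ~0.012482265

      # coord(2/3): two of three query clauses matched; coord(1/2): one of two.
      print(round((score_a + score_h) * (2 / 3) * 0.5, 9))  # 0.005056957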
    
    Content
    Contents: Boolean retrieval - The term vocabulary & postings lists - Dictionaries and tolerant retrieval - Index construction - Index compression - Scoring, term weighting & the vector space model - Computing scores in a complete search system - Evaluation in information retrieval - Relevance feedback & query expansion - XML retrieval - Probabilistic information retrieval - Language models for information retrieval - Text classification & Naive Bayes - Vector space classification - Support vector machines & machine learning on documents - Flat clustering - Hierarchical clustering - Matrix decompositions & latent semantic indexing - Web search basics - Web crawling and indexes - Link analysis. Cf. the digital version at: http://nlp.stanford.edu/IR-book/pdf/irbookprint.pdf.
  2. Manning, C.D.; Schütze, H.: Foundations of statistical natural language processing (2000) 0.00
    0.0031205663 = product of:
      0.0062411325 = sum of:
        0.0062411325 = product of:
          0.018723397 = sum of:
            0.018723397 = weight(_text_:h in 1603) [ClassicSimilarity], result of:
              0.018723397 = score(doc=1603,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.16469726 = fieldWeight in 1603, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1603)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
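    The idf values in these trees follow from the printed docFreq/maxDocs pairs: ClassicSimilarity defines idf(t) = 1 + ln(maxDocs / (docFreq + 1)). A quick sketch reproducing both idf values seen above (agreement is to float precision, since Lucene computes in 32-bit floats):

      import math

      def classic_idf(doc_freq, max_docs):
          # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      print(classic_idf(37942, 44218))  # ~1.153047  (term 'a')
      print(classic_idf(10020, 44218))  # ~2.4844491 (term 'h')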
    
  3. Toutanova, K.; Klein, D.; Manning, C.D.; Singer, Y.: Feature-rich Part-of-Speech Tagging with a cyclic dependency network (2003) 0.00
    0.0017534725 = product of:
      0.003506945 = sum of:
        0.003506945 = product of:
          0.0105208345 = sum of:
            0.0105208345 = weight(_text_:a in 1059) [ClassicSimilarity], result of:
              0.0105208345 = score(doc=1059,freq=10.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.19940455 = fieldWeight in 1059, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1059)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional log-linear models, and (iv) fine-grained modeling of unknown word features. Using these ideas together, the resulting tagger gives a 97.24% accuracy on the Penn Treebank WSJ, an error reduction of 4.4% on the best previous single automatically learned tagging result.
    Type
    a
  4. Toutanova, K.; Manning, C.D.: Enriching the knowledge sources used in a maximum entropy Part-of-Speech Tagger (2000) 0.00
    0.0013582341 = product of:
      0.0027164682 = sum of:
        0.0027164682 = product of:
          0.008149404 = sum of:
            0.008149404 = weight(_text_:a in 1060) [ClassicSimilarity], result of:
              0.008149404 = score(doc=1060,freq=6.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.1544581 = fieldWeight in 1060, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1060)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
    Type
    a
  5. Manning, C.D.: Part-of-Speech Tagging from 97% to 100% : is it time for some linguistics? (2011) 0.00
    0.0011202524 = product of:
      0.0022405048 = sum of:
        0.0022405048 = product of:
          0.0067215143 = sum of:
            0.0067215143 = weight(_text_:a in 1121) [ClassicSimilarity], result of:
              0.0067215143 = score(doc=1121,freq=8.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.12739488 = fieldWeight in 1121, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1121)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    I examine what would be necessary to move part-of-speech tagging performance from its current level of about 97.3% token accuracy (56% sentence accuracy) to close to 100% accuracy. I suggest that it must still be possible to greatly increase tagging performance and examine some useful improvements that have recently been made to the Stanford Part-of-Speech Tagger. However, an error analysis of some of the remaining errors suggests that there is limited further mileage to be had either from better machine learning or better features in a discriminative sequence classifier. The prospects for further gains from semi-supervised learning also seem quite limited. Rather, I suggest and begin to demonstrate that the largest opportunity for further progress comes from improving the taxonomic basis of the linguistic resources from which taggers are trained. That is, from improved descriptive linguistics. However, I conclude by suggesting that there are also limits to this process. The status of some words may not be adequately captured by assigning them to one of a small number of categories. While conventions can be used in such cases to improve tagging consistency, they lack a strong linguistic basis.
    Type
    a
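  A note on the token-vs-sentence accuracy gap quoted in the last abstract: under the simplifying assumption that tagging errors are independent across tokens (our assumption, not the paper's), a sentence of n tokens is fully correct with probability 0.973^n, which lands near the quoted 56% at the Penn Treebank WSJ's typical sentence length of roughly 21 tokens. A back-of-the-envelope sketch:

    # Rough check; the independence assumption is a simplification.
    p = 0.973                       # per-token accuracy
    for n in (15, 21, 25):
        print(n, round(p ** n, 2))  # n=21 -> ~0.56, matching 56% sentence accuracy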