Search (4 results, page 1 of 1)

  • × author_ss:"Manning, C.D."
  • × theme_ss:"Computerlinguistik"
  1. Toutanova, K.; Manning, C.D.: Enriching the knowledge sources used in a maximum entropy Part-of-Speech Tagger (2000) 0.01
    0.005513504 = product of:
      0.01378376 = sum of:
        0.008258085 = weight(_text_:a in 1060) [ClassicSimilarity], result of:
          0.008258085 = score(doc=1060,freq=6.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.1544581 = fieldWeight in 1060, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1060)
        0.005525676 = product of:
          0.011051352 = sum of:
            0.011051352 = weight(_text_:information in 1060) [ClassicSimilarity], result of:
              0.011051352 = score(doc=1060,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.13576832 = fieldWeight in 1060, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1060)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    This paper presents results for a maximumentropy-based part of speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
    Type
    a
  2. Toutanova, K.; Klein, D.; Manning, C.D.; Singer, Y.: Feature-rich Part-of-Speech Tagging with a cyclic dependency network (2003) 0.00
    0.0021322283 = product of:
      0.010661141 = sum of:
        0.010661141 = weight(_text_:a in 1059) [ClassicSimilarity], result of:
          0.010661141 = score(doc=1059,freq=10.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.19940455 = fieldWeight in 1059, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1059)
      0.2 = coord(1/5)
    
    Abstract
    We present a new part-of-speech tagger that demonstrates the following ideas: (i) explicit use of both preceding and following tag contexts via a dependency network representation, (ii) broad use of lexical features, including jointly conditioning on multiple consecutive words, (iii) effective use of priors in conditional loglinear models, and (iv) fine-grained modeling of unknown word features. Using these ideas together, the resulting tagger gives a 97.24%accuracy on the Penn TreebankWSJ, an error reduction of 4.4% on the best previous single automatically learned tagging result.
    Type
    a
  3. Manning, C.D.: Part-of-Speech Tagging from 97% to 100% : is it time for some linguistics? (2011) 0.00
    0.0013622305 = product of:
      0.0068111527 = sum of:
        0.0068111527 = weight(_text_:a in 1121) [ClassicSimilarity], result of:
          0.0068111527 = score(doc=1121,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.12739488 = fieldWeight in 1121, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1121)
      0.2 = coord(1/5)
    
    Abstract
    I examine what would be necessary to move part-of-speech tagging performance from its current level of about 97.3% token accuracy (56% sentence accuracy) to close to 100% accuracy. I suggest that it must still be possible to greatly increase tagging performance and examine some useful improvements that have recently been made to the Stanford Part-of-Speech Tagger. However, an error analysis of some of the remaining errors suggests that there is limited further mileage to be had either from better machine learning or better features in a discriminative sequence classifier. The prospects for further gains from semisupervised learning also seem quite limited. Rather, I suggest and begin to demonstrate that the largest opportunity for further progress comes from improving the taxonomic basis of the linguistic resources from which taggers are trained. That is, from improved descriptive linguistics. However, I conclude by suggesting that there are also limits to this process. The status of some words may not be able to be adequately captured by assigning them to one of a small number of categories. While conventions can be used in such cases to improve tagging consistency, they lack a strong linguistic basis.
    Type
    a
  4. Manning, C.D.; Schütze, H.: Foundations of statistical natural language processing (2000) 0.00
    9.472587E-4 = product of:
      0.0047362936 = sum of:
        0.0047362936 = product of:
          0.009472587 = sum of:
            0.009472587 = weight(_text_:information in 1603) [ClassicSimilarity], result of:
              0.009472587 = score(doc=1603,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.116372846 = fieldWeight in 1603, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1603)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Statistical approaches to processing natural language text have become dominant in recent years. This foundational text is the first comprehensive introduction to statistical Natural Language Processing (NLP) to appear. The book contains all the theory and algorithms needed for building NLP tools. It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations. The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications.