Document (#38061)

Author
Toutanova, K.
Manning, C.D.
Title
Enriching the knowledge sources used in a maximum entropy Part-of-Speech Tagger
Source
Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora (EMNLP/VLC-2000)
Year
2000
Pages
pp. 63-70
Abstract
This paper presents results for a maximum-entropy-based part-of-speech tagger, which achieves superior performance principally by enriching the information sources used for tagging. In particular, we get improved results by incorporating these features: (i) more extensive treatment of capitalization for unknown words; (ii) features for the disambiguation of the tense forms of verbs; (iii) features for disambiguating particles from prepositions and adverbs. The best resulting accuracy for the tagger on the Penn Treebank is 96.86% overall, and 86.91% on previously unseen words.
Content
Cf.: http://nlp.stanford.edu/software/tagger.shtml.
Theme
Computational linguistics
Object
Stanford POS Tagger

Similar documents (author)

  1. Manning, R.W.: The Anglo-American Cataloguing Rules and their future (1999) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:manning in 6809) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 6809, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=6809)
    
  2. Manning, R.W.: The Anglo American Cataloguing Rules and their future (2000) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:manning in 189) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 189, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=189)
    
  3. Manning, C.D.: Part-of-Speech Tagging from 97% to 100% : is it time for some linguistics? (2011) 5.76
    5.7574883 = sum of:
      5.7574883 = weight(author_txt:manning in 1121) [ClassicSimilarity], result of:
        5.7574883 = fieldWeight in 1121, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.625 = fieldNorm(doc=1121)
    
  4. Mallett, J.; Manning, C.: Multimedia and database design : a discussion of database technology and its use in multimedia (1993) 4.61
    4.6059904 = sum of:
      4.6059904 = weight(author_txt:manning in 6277) [ClassicSimilarity], result of:
        4.6059904 = fieldWeight in 6277, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.5 = fieldNorm(doc=6277)
    
  5. Manning, C.D.; Schütze, H.: Foundations of statistical natural language processing (2000) 4.61
    4.6059904 = sum of:
      4.6059904 = weight(author_txt:manning in 1603) [ClassicSimilarity], result of:
        4.6059904 = fieldWeight in 1603, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.211981 = idf(docFreq=11, maxDocs=44218)
          0.5 = fieldNorm(doc=1603)
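
The author scores above are each a single fieldWeight, the product of tf, idf and fieldNorm. The following is a minimal Python sketch of that arithmetic, not the search engine's own code. The function names are hypothetical, and the formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))) are assumed because they reproduce the figures printed in the listing.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as it appears in the listing (assumed formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain trees."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the author matches: docFreq=11, maxDocs=44218, freq=1.0.
idf_manning = classic_idf(11, 44218)
score_norm_0625 = field_weight(1.0, 11, 44218, 0.625)  # entries 1 to 3
score_norm_05 = field_weight(1.0, 11, 44218, 0.5)      # entries 4 and 5

print(f"idf={idf_manning:.6f}")
print(f"fieldNorm 0.625 -> {score_norm_0625:.4f}")
print(f"fieldNorm 0.5   -> {score_norm_05:.4f}")
```

Only the fieldNorm differs between the entries, which is why records with more names in the author field (entries 4 and 5) score lower for the same matching term.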
    

Similar documents (content)

  1. Toutanova, K.; Klein, D.; Manning, C.D.; Singer, Y.: Feature-rich Part-of-Speech Tagging with a cyclic dependency network (2003) 0.35
    0.3492241 = sum of:
      0.3492241 = product of:
        1.2472289 = sum of:
          0.085697174 = weight(abstract_txt:unknown in 1059) [ClassicSimilarity], result of:
            0.085697174 = score(doc=1059,freq=1.0), product of:
              0.12661497 = queryWeight, product of:
                1.1071677 = boost
                7.2195506 = idf(docFreq=87, maxDocs=44218)
                0.01584023 = queryNorm
              0.67683285 = fieldWeight in 1059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2195506 = idf(docFreq=87, maxDocs=44218)
                0.09375 = fieldNorm(doc=1059)
          0.17343028 = weight(abstract_txt:penn in 1059) [ClassicSimilarity], result of:
            0.17343028 = score(doc=1059,freq=1.0), product of:
              0.20257726 = queryWeight, product of:
                1.4004456 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.01584023 = queryNorm
              0.85611916 = fieldWeight in 1059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.09375 = fieldNorm(doc=1059)
          0.043772634 = weight(abstract_txt:part in 1059) [ClassicSimilarity], result of:
            0.043772634 = score(doc=1059,freq=1.0), product of:
              0.10193403 = queryWeight, product of:
                1.4049002 = boost
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.01584023 = queryNorm
              0.42942122 = fieldWeight in 1059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.09375 = fieldNorm(doc=1059)
          0.069864966 = weight(abstract_txt:words in 1059) [ClassicSimilarity], result of:
            0.069864966 = score(doc=1059,freq=1.0), product of:
              0.1392164 = queryWeight, product of:
                1.6418409 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.01584023 = queryNorm
              0.5018444 = fieldWeight in 1059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.09375 = fieldNorm(doc=1059)
          0.09036418 = weight(abstract_txt:features in 1059) [ClassicSimilarity], result of:
            0.09036418 = score(doc=1059,freq=2.0), product of:
              0.15015347 = queryWeight, product of:
                2.0883303 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.01584023 = queryNorm
              0.6018121 = fieldWeight in 1059, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.09375 = fieldNorm(doc=1059)
          0.14759299 = weight(abstract_txt:speech in 1059) [ClassicSimilarity], result of:
            0.14759299 = score(doc=1059,freq=1.0), product of:
              0.22920701 = queryWeight, product of:
                2.106686 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.01584023 = queryNorm
              0.64392877 = fieldWeight in 1059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.09375 = fieldNorm(doc=1059)
          0.6365067 = weight(abstract_txt:tagger in 1059) [ClassicSimilarity], result of:
            0.6365067 = score(doc=1059,freq=2.0), product of:
              0.5517467 = queryWeight, product of:
                4.0031457 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.01584023 = queryNorm
              1.1536211 = fieldWeight in 1059, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.09375 = fieldNorm(doc=1059)
        0.28 = coord(7/25)
    
  2. L'Homme, D.; L'Homme, M.-C.; Lemay, C.: Benchmarking the performance of two Part-of-Speech (POS) taggers for terminological purposes (2002) 0.24
    0.24473774 = sum of:
      0.24473774 = product of:
        0.8740634 = sum of:
          0.014389007 = weight(abstract_txt:used in 1855) [ClassicSimilarity], result of:
            0.014389007 = score(doc=1855,freq=1.0), product of:
              0.054826703 = queryWeight, product of:
                1.0303433 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.01584023 = queryNorm
              0.26244524 = fieldWeight in 1855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.078125 = fieldNorm(doc=1855)
          0.016029835 = weight(abstract_txt:results in 1855) [ClassicSimilarity], result of:
            0.016029835 = score(doc=1855,freq=1.0), product of:
              0.0589193 = queryWeight, product of:
                1.0681068 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.01584023 = queryNorm
              0.27206424 = fieldWeight in 1855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.078125 = fieldNorm(doc=1855)
          0.07141431 = weight(abstract_txt:unknown in 1855) [ClassicSimilarity], result of:
            0.07141431 = score(doc=1855,freq=1.0), product of:
              0.12661497 = queryWeight, product of:
                1.1071677 = boost
                7.2195506 = idf(docFreq=87, maxDocs=44218)
                0.01584023 = queryNorm
              0.56402737 = fieldWeight in 1855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2195506 = idf(docFreq=87, maxDocs=44218)
                0.078125 = fieldNorm(doc=1855)
          0.036477197 = weight(abstract_txt:part in 1855) [ClassicSimilarity], result of:
            0.036477197 = score(doc=1855,freq=1.0), product of:
              0.10193403 = queryWeight, product of:
                1.4049002 = boost
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.01584023 = queryNorm
              0.35785103 = fieldWeight in 1855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.078125 = fieldNorm(doc=1855)
          0.08233666 = weight(abstract_txt:words in 1855) [ClassicSimilarity], result of:
            0.08233666 = score(doc=1855,freq=2.0), product of:
              0.1392164 = queryWeight, product of:
                1.6418409 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.01584023 = queryNorm
              0.5914293 = fieldWeight in 1855, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.078125 = fieldNorm(doc=1855)
          0.12299416 = weight(abstract_txt:speech in 1855) [ClassicSimilarity], result of:
            0.12299416 = score(doc=1855,freq=1.0), product of:
              0.22920701 = queryWeight, product of:
                2.106686 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.01584023 = queryNorm
              0.5366073 = fieldWeight in 1855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.078125 = fieldNorm(doc=1855)
          0.5304222 = weight(abstract_txt:tagger in 1855) [ClassicSimilarity], result of:
            0.5304222 = score(doc=1855,freq=2.0), product of:
              0.5517467 = queryWeight, product of:
                4.0031457 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.01584023 = queryNorm
              0.96135086 = fieldWeight in 1855, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.078125 = fieldNorm(doc=1855)
        0.28 = coord(7/25)
    
  3. Rishel, T.; Perkins, L.A.; Yenduri, S.; Zand, F.: Determining the context of text using augmented latent semantic indexing (2007) 0.16
    0.15938023 = sum of:
      0.15938023 = product of:
        0.7969011 = sum of:
          0.024922492 = weight(abstract_txt:used in 1316) [ClassicSimilarity], result of:
            0.024922492 = score(doc=1316,freq=3.0), product of:
              0.054826703 = queryWeight, product of:
                1.0303433 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.01584023 = queryNorm
              0.4545685 = fieldWeight in 1316, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.078125 = fieldNorm(doc=1316)
          0.016029835 = weight(abstract_txt:results in 1316) [ClassicSimilarity], result of:
            0.016029835 = score(doc=1316,freq=1.0), product of:
              0.0589193 = queryWeight, product of:
                1.0681068 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.01584023 = queryNorm
              0.27206424 = fieldWeight in 1316, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.078125 = fieldNorm(doc=1316)
          0.051586546 = weight(abstract_txt:part in 1316) [ClassicSimilarity], result of:
            0.051586546 = score(doc=1316,freq=2.0), product of:
              0.10193403 = queryWeight, product of:
                1.4049002 = boost
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.01584023 = queryNorm
              0.50607777 = fieldWeight in 1316, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.078125 = fieldNorm(doc=1316)
          0.17394 = weight(abstract_txt:speech in 1316) [ClassicSimilarity], result of:
            0.17394 = score(doc=1316,freq=2.0), product of:
              0.22920701 = queryWeight, product of:
                2.106686 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.01584023 = queryNorm
              0.75887734 = fieldWeight in 1316, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.078125 = fieldNorm(doc=1316)
          0.5304222 = weight(abstract_txt:tagger in 1316) [ClassicSimilarity], result of:
            0.5304222 = score(doc=1316,freq=2.0), product of:
              0.5517467 = queryWeight, product of:
                4.0031457 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.01584023 = queryNorm
              0.96135086 = fieldWeight in 1316, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.078125 = fieldNorm(doc=1316)
        0.2 = coord(5/25)
    
  4. Manning, C.D.: Part-of-Speech Tagging from 97% to 100% : is it time for some linguistics? (2011) 0.14
    0.13947822 = sum of:
      0.13947822 = product of:
        0.5811593 = sum of:
          0.011511206 = weight(abstract_txt:used in 1121) [ClassicSimilarity], result of:
            0.011511206 = score(doc=1121,freq=1.0), product of:
              0.054826703 = queryWeight, product of:
                1.0303433 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.01584023 = queryNorm
              0.2099562 = fieldWeight in 1121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0625 = fieldNorm(doc=1121)
          0.041269235 = weight(abstract_txt:part in 1121) [ClassicSimilarity], result of:
            0.041269235 = score(doc=1121,freq=2.0), product of:
              0.10193403 = queryWeight, product of:
                1.4049002 = boost
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.01584023 = queryNorm
              0.4048622 = fieldWeight in 1121, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.0625 = fieldNorm(doc=1121)
          0.046576645 = weight(abstract_txt:words in 1121) [ClassicSimilarity], result of:
            0.046576645 = score(doc=1121,freq=1.0), product of:
              0.1392164 = queryWeight, product of:
                1.6418409 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.01584023 = queryNorm
              0.33456293 = fieldWeight in 1121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=1121)
          0.042598087 = weight(abstract_txt:features in 1121) [ClassicSimilarity], result of:
            0.042598087 = score(doc=1121,freq=1.0), product of:
              0.15015347 = queryWeight, product of:
                2.0883303 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.01584023 = queryNorm
              0.28369698 = fieldWeight in 1121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0625 = fieldNorm(doc=1121)
          0.139152 = weight(abstract_txt:speech in 1121) [ClassicSimilarity], result of:
            0.139152 = score(doc=1121,freq=2.0), product of:
              0.22920701 = queryWeight, product of:
                2.106686 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.01584023 = queryNorm
              0.60710186 = fieldWeight in 1121, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.0625 = fieldNorm(doc=1121)
          0.3000521 = weight(abstract_txt:tagger in 1121) [ClassicSimilarity], result of:
            0.3000521 = score(doc=1121,freq=1.0), product of:
              0.5517467 = queryWeight, product of:
                4.0031457 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.01584023 = queryNorm
              0.54382217 = fieldWeight in 1121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.0625 = fieldNorm(doc=1121)
        0.24 = coord(6/25)
    
  5. Brychcín, T.; Konopík, M.: HPS: High precision stemmer (2015) 0.11
    0.11411043 = sum of:
      0.11411043 = product of:
        0.40753725 = sum of:
          0.016279303 = weight(abstract_txt:used in 2686) [ClassicSimilarity], result of:
            0.016279303 = score(doc=2686,freq=2.0), product of:
              0.054826703 = queryWeight, product of:
                1.0303433 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.01584023 = queryNorm
              0.2969229 = fieldWeight in 2686, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0625 = fieldNorm(doc=2686)
          0.018135687 = weight(abstract_txt:results in 2686) [ClassicSimilarity], result of:
            0.018135687 = score(doc=2686,freq=2.0), product of:
              0.0589193 = queryWeight, product of:
                1.0681068 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.01584023 = queryNorm
              0.30780554 = fieldWeight in 2686, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=2686)
          0.057678968 = weight(abstract_txt:maximum in 2686) [ClassicSimilarity], result of:
            0.057678968 = score(doc=2686,freq=1.0), product of:
              0.12742263 = queryWeight, product of:
                1.1106933 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.01584023 = queryNorm
              0.45265874 = fieldWeight in 2686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.0625 = fieldNorm(doc=2686)
          0.0765519 = weight(abstract_txt:entropy in 2686) [ClassicSimilarity], result of:
            0.0765519 = score(doc=2686,freq=1.0), product of:
              0.15388829 = queryWeight, product of:
                1.2206008 = boost
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.01584023 = queryNorm
              0.4974511 = fieldWeight in 2686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.0625 = fieldNorm(doc=2686)
          0.11562019 = weight(abstract_txt:unseen in 2686) [ClassicSimilarity], result of:
            0.11562019 = score(doc=2686,freq=1.0), product of:
              0.20257726 = queryWeight, product of:
                1.4004456 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.01584023 = queryNorm
              0.5707461 = fieldWeight in 2686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=2686)
          0.08067311 = weight(abstract_txt:words in 2686) [ClassicSimilarity], result of:
            0.08067311 = score(doc=2686,freq=3.0), product of:
              0.1392164 = queryWeight, product of:
                1.6418409 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.01584023 = queryNorm
              0.57948 = fieldWeight in 2686, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=2686)
          0.042598087 = weight(abstract_txt:features in 2686) [ClassicSimilarity], result of:
            0.042598087 = score(doc=2686,freq=1.0), product of:
              0.15015347 = queryWeight, product of:
                2.0883303 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.01584023 = queryNorm
              0.28369698 = fieldWeight in 2686, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0625 = fieldNorm(doc=2686)
        0.28 = coord(7/25)
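
The content scores above combine several matching abstract terms. Each term contributes queryWeight times fieldWeight, the contributions are summed, and the sum is scaled by coord (the fraction of query terms that matched). The sketch below recomputes the score of the first content match (doc 1059) from the boosts, document frequencies and term frequencies printed in its tree. It is an illustrative reconstruction under the same assumed tf and idf formulas, with hypothetical function and variable names, not the engine's implementation.

```python
import math

MAX_DOCS = 44218          # maxDocs from the listing
QUERY_NORM = 0.01584023   # queryNorm from the listing
FIELD_NORM_1059 = 0.09375 # fieldNorm(doc=1059)


def idf(doc_freq: int) -> float:
    """Assumed idf formula that reproduces the listed values."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (term, boost, docFreq, freq) copied from the tree for doc 1059.
TERMS_1059 = [
    ("unknown", 1.1071677, 87, 1.0),
    ("penn", 1.4004456, 12, 1.0),
    ("part", 1.4049002, 1231, 1.0),
    ("words", 1.6418409, 568, 1.0),
    ("features", 2.0883303, 1283, 2.0),
    ("speech", 2.106686, 124, 1.0),
    ("tagger", 4.0031457, 19, 2.0),
]

per_term = {
    term: term_score(boost, doc_freq, freq, FIELD_NORM_1059)
    for term, boost, doc_freq, freq in TERMS_1059
}
term_sum = sum(per_term.values())
coord = len(TERMS_1059) / 25   # 7 of 25 query terms matched
final_score = term_sum * coord

for term, value in per_term.items():
    print(f"{term:10s} {value:.6f}")
print(f"sum={term_sum:.6f} coord={coord:.2f} score={final_score:.6f}")
```

The rare, heavily boosted term "tagger" accounts for roughly half of the summed weight, which matches its dominance in every content tree above.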