Document (#37125)

Author
Schmid, H.
Title
Improvements in Part-of-Speech tagging with an application to German
Source
ftp://ftp.ims.uni-stuttgart.de/pub/corpora/tree-tagger2.pdf
Year
1995
Abstract
This paper presents a couple of extensions to a basic Markov model tagger (called TreeTagger) which improve its accuracy when trained on small corpora. The basic tagger was originally developed for English (Schmid, 1994). Together, the extensions reduced error rates on a German test corpus by more than a third.
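To illustrate what a "basic Markov model tagger" does, here is a minimal sketch of Viterbi decoding over a first-order hidden Markov model. It is not TreeTagger's method. TreeTagger estimates its transition probabilities with a decision tree, which is where its name comes from. This sketch instead uses plain probability tables. The tag names (`ART`, `NN`, `VVFIN`), the probabilities, the `UNSEEN` floor and the function `viterbi` are all hypothetical and were chosen only for illustration.

```python
import math

# Hypothetical toy model: all tags and probabilities are invented for illustration.
TAGS = ["ART", "NN", "VVFIN"]
START = "<s>"

# Transition probabilities P(tag | previous tag).
TRANS = {
    START: {"ART": 0.6, "NN": 0.3, "VVFIN": 0.1},
    "ART": {"ART": 0.01, "NN": 0.9, "VVFIN": 0.09},
    "NN": {"ART": 0.1, "NN": 0.2, "VVFIN": 0.7},
    "VVFIN": {"ART": 0.5, "NN": 0.3, "VVFIN": 0.2},
}

# Emission probabilities P(word | tag).
EMIT = {
    "ART": {"der": 0.5, "die": 0.5},
    "NN": {"Hund": 0.4, "Katze": 0.4, "bellt": 0.01},
    "VVFIN": {"bellt": 0.6, "schläft": 0.4},
}

UNSEEN = 1e-6  # crude floor probability for unseen word/tag pairs (assumption)


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy model."""
    # best[t] = (log-probability of the best path ending in tag t, that path)
    best = {}
    for t in TAGS:
        lp = math.log(TRANS[START][t]) + math.log(EMIT[t].get(words[0], UNSEEN))
        best[t] = (lp, [t])
    for w in words[1:]:
        new = {}
        for t in TAGS:
            e = math.log(EMIT[t].get(w, UNSEEN))
            # Extend every previous path by tag t and keep the best one.
            lp, path = max(
                (best[p][0] + math.log(TRANS[p][t]) + e, best[p][1]) for p in TAGS
            )
            new[t] = (lp, path + [t])
        best = new
    # Pick the highest-scoring complete path.
    return max(best.values())[1]


print(viterbi(["der", "Hund", "bellt"]))  # ['ART', 'NN', 'VVFIN']
```

Log-probabilities are used so that long sentences do not underflow. With the toy numbers, the path ART NN VVFIN scores 0.3 × 0.9 × 0.4 × 0.7 × 0.6 ≈ 0.045, which is far above any alternative path.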
Content
Contribution to: Proceedings of the ACL SIGDAT Workshop. Dublin, Ireland, 1995. For the TreeTagger software, see: http://www.ims.uni-stuttgart.de/~schmid/.
Theme
Computational linguistics
Object
TreeTagger

Similar documents (author)

  1. Schmid, F.: Weitere Betrachtungen zum alphabetischen Sachkatalog (1925) 5.38
    
  2. Schmid, F.: ¬Der alphabetische Sachkatalog mit besonderer Beziehung auf die Landesbibliothek in Stuttgart (1924) 5.38
    
  3. Schmid, F.: Mein letztes Wort zum alphabetischen Sachkatalog (1927) 5.38
    
  4. Schmid, B.: ¬Der Information Highway als Infrastruktur der Informationsgesellschaft (1996) 5.38
    
  5. Schmid, B.: Elektronische Märkte (1993) 5.38
    

Similar documents (content)

  1. L'Homme, D.; L'Homme, M.-C.; Lemay, C.: Benchmarking the performance of two Part-of-Speech (POS) taggers for terminological purposes (2002) 0.27
    
  2. Manning, C.D.: Part-of-Speech Tagging from 97% to 100% : is it time for some linguistics? (2011) 0.24
    
  3. Toutanova, K.; Klein, D.; Manning, C.D.; Singer, Y.: Feature-rich Part-of-Speech Tagging with a cyclic dependency network (2003) 0.23
    
  4. Bergler, S.: Generative lexicon principles for machine translation : a case for meta-lexical structure (1994/95) 0.16
    
  5. Toutanova, K.; Manning, C.D.: Enriching the knowledge sources used in a maximum entropy Part-of-Speech Tagger (2000) 0.13