Document (#26856)

Author
L'Homme, D.
L'Homme, M.-C.
Lemay, C.
Title
Benchmarking the performance of two Part-of-Speech (POS) taggers for terminological purposes
Source
Knowledge organization. 29(2002) nos.3/4, pp.204-216
Year
2002
Abstract
Part-of-Speech (POS) taggers are used in an increasing number of terminology applications. However, terminologists do not know exactly how they perform on specialized texts since most POS taggers have been trained on "general" corpora, that is, corpora containing all sorts of undifferentiated texts. In this article, we evaluate the performance of two POS taggers on French and English medical texts. The taggers are TnT (a statistical tagger developed at Saarland University (Brants 2000)) and WinBrill (the Windows version of the tagger initially developed by Eric Brill (1992)). Ten extracts from medical texts were submitted to the taggers and the outputs scanned manually. Results pertain to the accuracy of tagging in terms of correctly and incorrectly tagged words. We also study the handling of unknown words from different viewpoints.
Theme
Computational linguistics

Similar documents (content)

  1. Manning, C.D.: Part-of-Speech Tagging from 97% to 100% : is it time for some linguistics? (2011) 0.18
    0.17779315 = sum of:
      0.17779315 = product of:
        0.7408048 = sum of:
          0.028578512 = weight(abstract_txt:part in 1121) [ClassicSimilarity], result of:
            0.028578512 = score(doc=1121,freq=2.0), product of:
              0.070588246 = queryWeight, product of:
                1.2523801 = boost
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.012305068 = queryNorm
              0.4048622 = fieldWeight in 1121, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.0625 = fieldNorm(doc=1121)
          0.029523252 = weight(abstract_txt:performance in 1121) [ClassicSimilarity], result of:
            0.029523252 = score(doc=1121,freq=2.0), product of:
              0.072135456 = queryWeight, product of:
                1.266031 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.012305068 = queryNorm
              0.40927517 = fieldWeight in 1121, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0625 = fieldNorm(doc=1121)
          0.032253835 = weight(abstract_txt:words in 1121) [ClassicSimilarity], result of:
            0.032253835 = score(doc=1121,freq=1.0), product of:
              0.096405886 = queryWeight, product of:
                1.4635978 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.012305068 = queryNorm
              0.33456293 = fieldWeight in 1121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=1121)
          0.09636129 = weight(abstract_txt:speech in 1121) [ClassicSimilarity], result of:
            0.09636129 = score(doc=1121,freq=2.0), product of:
              0.15872343 = queryWeight, product of:
                1.877978 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.012305068 = queryNorm
              0.60710186 = fieldWeight in 1121, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.0625 = fieldNorm(doc=1121)
          0.13852197 = weight(abstract_txt:tagger in 1121) [ClassicSimilarity], result of:
            0.13852197 = score(doc=1121,freq=1.0), product of:
              0.25471923 = queryWeight, product of:
                2.379035 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.012305068 = queryNorm
              0.54382217 = fieldWeight in 1121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.0625 = fieldNorm(doc=1121)
          0.4155659 = weight(abstract_txt:taggers in 1121) [ClassicSimilarity], result of:
            0.4155659 = score(doc=1121,freq=1.0), product of:
              0.7641577 = queryWeight, product of:
                7.137105 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.012305068 = queryNorm
              0.54382217 = fieldWeight in 1121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.0625 = fieldNorm(doc=1121)
        0.24 = coord(6/25)
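
The explain tree above can be recomputed by hand. The sketch below is not taken from the catalog software; it assumes the usual ClassicSimilarity (TF-IDF) formulas, which agree with the numbers shown: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, and fieldWeight = tf * idf * fieldNorm. The function names are illustrative only, and the input values are copied from the tree for doc 1121.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in each "weight(...)" node
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

MAX_DOCS = 44218
QUERY_NORM = 0.012305068
FIELD_NORM = 0.0625  # fieldNorm(doc=1121)

# (term, freq, docFreq, boost) copied from the tree for doc 1121
terms = [
    ("part", 2.0, 1231, 1.2523801),
    ("performance", 2.0, 1171, 1.266031),
    ("words", 1.0, 568, 1.4635978),
    ("speech", 2.0, 124, 1.877978),
    ("tagger", 1.0, 19, 2.379035),
    ("taggers", 1.0, 19, 7.137105),
]

raw_sum = sum(
    term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM)
    for _, freq, doc_freq, boost in terms
)
coord = 6 / 25           # 6 of the 25 query terms matched
total = raw_sum * coord  # should be close to the listed 0.17779315

print(f"raw sum = {raw_sum:.7f}, total = {total:.8f}")
```

Small differences in the last digits are to be expected, since the search engine works in single precision while Python uses doubles.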
    
  2. Perovsek, M.; Kranjca, J.; Erjaveca, T.; Cestnika, B.; Lavraca, N.: TextFlows : a visual programming platform for text mining and natural language processing (2016) 0.12
    0.11535646 = sum of:
      0.11535646 = product of:
        0.7209779 = sum of:
          0.025260076 = weight(abstract_txt:part in 2697) [ClassicSimilarity], result of:
            0.025260076 = score(doc=2697,freq=1.0), product of:
              0.070588246 = queryWeight, product of:
                1.2523801 = boost
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.012305068 = queryNorm
              0.35785103 = fieldWeight in 2697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.085172154 = weight(abstract_txt:speech in 2697) [ClassicSimilarity], result of:
            0.085172154 = score(doc=2697,freq=1.0), product of:
              0.15872343 = queryWeight, product of:
                1.877978 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.012305068 = queryNorm
              0.5366073 = fieldWeight in 2697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.09108824 = weight(abstract_txt:corpora in 2697) [ClassicSimilarity], result of:
            0.09108824 = score(doc=2697,freq=1.0), product of:
              0.16599086 = queryWeight, product of:
                1.9204899 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.012305068 = queryNorm
              0.5487546 = fieldWeight in 2697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
          0.5194574 = weight(abstract_txt:taggers in 2697) [ClassicSimilarity], result of:
            0.5194574 = score(doc=2697,freq=1.0), product of:
              0.7641577 = queryWeight, product of:
                7.137105 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.012305068 = queryNorm
              0.67977774 = fieldWeight in 2697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.078125 = fieldNorm(doc=2697)
        0.16 = coord(4/25)
    
  3. Toutanova, K.; Manning, C.D.: Enriching the knowledge sources used in a maximum entropy Part-of-Speech Tagger (2000) 0.11
    0.10522061 = sum of:
      0.10522061 = product of:
        0.526103 = sum of:
          0.030312091 = weight(abstract_txt:part in 1060) [ClassicSimilarity], result of:
            0.030312091 = score(doc=1060,freq=1.0), product of:
              0.070588246 = queryWeight, product of:
                1.2523801 = boost
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.012305068 = queryNorm
              0.42942122 = fieldWeight in 1060, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.09375 = fieldNorm(doc=1060)
          0.03131414 = weight(abstract_txt:performance in 1060) [ClassicSimilarity], result of:
            0.03131414 = score(doc=1060,freq=1.0), product of:
              0.072135456 = queryWeight, product of:
                1.266031 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.012305068 = queryNorm
              0.43410188 = fieldWeight in 1060, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.09375 = fieldNorm(doc=1060)
          0.068420716 = weight(abstract_txt:words in 1060) [ClassicSimilarity], result of:
            0.068420716 = score(doc=1060,freq=2.0), product of:
              0.096405886 = queryWeight, product of:
                1.4635978 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.012305068 = queryNorm
              0.7097151 = fieldWeight in 1060, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.09375 = fieldNorm(doc=1060)
          0.10220658 = weight(abstract_txt:speech in 1060) [ClassicSimilarity], result of:
            0.10220658 = score(doc=1060,freq=1.0), product of:
              0.15872343 = queryWeight, product of:
                1.877978 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.012305068 = queryNorm
              0.64392877 = fieldWeight in 1060, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.09375 = fieldNorm(doc=1060)
          0.29384947 = weight(abstract_txt:tagger in 1060) [ClassicSimilarity], result of:
            0.29384947 = score(doc=1060,freq=2.0), product of:
              0.25471923 = queryWeight, product of:
                2.379035 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.012305068 = queryNorm
              1.1536211 = fieldWeight in 1060, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.09375 = fieldNorm(doc=1060)
        0.2 = coord(5/25)
    
  4. Dias, G.: Multiword unit hybrid extraction (n.d.) 0.10
    0.096854426 = sum of:
      0.096854426 = product of:
        0.4035601 = sum of:
          0.056644294 = weight(abstract_txt:extracts in 643) [ClassicSimilarity], result of:
            0.056644294 = score(doc=643,freq=1.0), product of:
              0.09598501 = queryWeight, product of:
                1.0326583 = boost
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.012305068 = queryNorm
              0.5901369 = fieldWeight in 643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
          0.059335537 = weight(abstract_txt:tagged in 643) [ClassicSimilarity], result of:
            0.059335537 = score(doc=643,freq=1.0), product of:
              0.099001676 = queryWeight, product of:
                1.0487603 = boost
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.012305068 = queryNorm
              0.5993387 = fieldWeight in 643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6715355 = idf(docFreq=55, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
          0.035723142 = weight(abstract_txt:part in 643) [ClassicSimilarity], result of:
            0.035723142 = score(doc=643,freq=2.0), product of:
              0.070588246 = queryWeight, product of:
                1.2523801 = boost
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.012305068 = queryNorm
              0.50607777 = fieldWeight in 643, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
          0.040317293 = weight(abstract_txt:words in 643) [ClassicSimilarity], result of:
            0.040317293 = score(doc=643,freq=1.0), product of:
              0.096405886 = queryWeight, product of:
                1.4635978 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.012305068 = queryNorm
              0.41820365 = fieldWeight in 643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
          0.120451614 = weight(abstract_txt:speech in 643) [ClassicSimilarity], result of:
            0.120451614 = score(doc=643,freq=2.0), product of:
              0.15872343 = queryWeight, product of:
                1.877978 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.012305068 = queryNorm
              0.75887734 = fieldWeight in 643, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
          0.09108824 = weight(abstract_txt:corpora in 643) [ClassicSimilarity], result of:
            0.09108824 = score(doc=643,freq=1.0), product of:
              0.16599086 = queryWeight, product of:
                1.9204899 = boost
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.012305068 = queryNorm
              0.5487546 = fieldWeight in 643, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.078125 = fieldNorm(doc=643)
        0.24 = coord(6/25)
    
  5. Lin, N.; Li, D.; Ding, Y.; He, B.; Qin, Z.; Tang, J.; Li, J.; Dong, T.: The dynamic features of Delicious, Flickr, and YouTube (2012) 0.08
    0.0775723 = sum of:
      0.0775723 = product of:
        0.9696538 = sum of:
          0.13852197 = weight(abstract_txt:tagger in 4970) [ClassicSimilarity], result of:
            0.13852197 = score(doc=4970,freq=1.0), product of:
              0.25471923 = queryWeight, product of:
                2.379035 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.012305068 = queryNorm
              0.54382217 = fieldWeight in 4970, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.0625 = fieldNorm(doc=4970)
          0.8311318 = weight(abstract_txt:taggers in 4970) [ClassicSimilarity], result of:
            0.8311318 = score(doc=4970,freq=4.0), product of:
              0.7641577 = queryWeight, product of:
                7.137105 = boost
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.012305068 = queryNorm
              1.0876443 = fieldWeight in 4970, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.701155 = idf(docFreq=19, maxDocs=44218)
                0.0625 = fieldNorm(doc=4970)
        0.08 = coord(2/25)
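
The coord factor explains the ordering of this list: the last entry has the largest raw sum of term weights but matches only 2 of the 25 query terms, so it ends up ranked lowest. The sketch below illustrates this with the sums and coord values copied from the five trees above; the variable and function names are illustrative, not part of the catalog software.

```python
QUERY_TERMS = 25  # denominator of every coord(n/25) above

# (doc id, raw sum of term weights, number of matched query terms)
results = [
    (1121, 0.7408048, 6),
    (2697, 0.7209779, 4),
    (1060, 0.526103, 5),
    (643, 0.4035601, 6),
    (4970, 0.9696538, 2),
]

def final_score(raw_sum, matched, total=QUERY_TERMS):
    # final score = raw sum * coord, where coord = matched / total
    return raw_sum * matched / total

by_raw = [doc for doc, _, _ in sorted(results, key=lambda r: r[1], reverse=True)]
by_final = [
    doc
    for doc, _, _ in sorted(
        results, key=lambda r: final_score(r[1], r[2]), reverse=True
    )
]

print("ranked by raw sum:    ", by_raw)
print("ranked by final score:", by_final)
```

Ranking by final score reproduces the order of the list, whereas ranking by raw sum alone would put doc 4970 first.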