Document (#37644)

Author
Dias, G.
Title
Multiword unit hybrid extraction
Source
http://acl.ldc.upenn.edu/W/W03/W03-1806.pdf
Year
n.d.
Abstract
This paper describes an original hybrid system that extracts multiword unit candidates from part-of-speech tagged corpora. While classical hybrid systems manually define local part-of-speech patterns that lead to the identification of well-known multiword units (mainly compound nouns), our solution automatically identifies relevant syntactic patterns from the corpus. Word statistics are then combined with the endogenously acquired linguistic information to extract the most relevant sequences of words. As a result, (1) human intervention is avoided, which provides total flexibility in the use of the system, and (2) different multiword units such as phrasal verbs, adverbial locutions and prepositional locutions may be identified. The system has been tested on the Brown Corpus, leading to encouraging results.
Theme
Computational linguistics

Similar documents (content)

  1. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.19
    0.18518342 = sum of:
      0.18518342 = product of:
        0.9259171 = sum of:
          0.036072332 = weight(abstract_txt:compound in 1536) [ClassicSimilarity], result of:
            0.036072332 = score(doc=1536,freq=1.0), product of:
              0.12325517 = queryWeight, product of:
                1.0666475 = boost
                7.4921947 = idf(docFreq=66, maxDocs=44218)
                0.015423224 = queryNorm
              0.29266384 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4921947 = idf(docFreq=66, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.041781195 = weight(abstract_txt:candidates in 1536) [ClassicSimilarity], result of:
            0.041781195 = score(doc=1536,freq=1.0), product of:
              0.13593863 = queryWeight, product of:
                1.1201851 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.015423224 = queryNorm
              0.30735335 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.03890799 = weight(abstract_txt:corpus in 1536) [ClassicSimilarity], result of:
            0.03890799 = score(doc=1536,freq=1.0), product of:
              0.16332708 = queryWeight, product of:
                1.7364507 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.015423224 = queryNorm
              0.2382213 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.07208697 = weight(abstract_txt:unit in 1536) [ClassicSimilarity], result of:
            0.07208697 = score(doc=1536,freq=2.0), product of:
              0.19555105 = queryWeight, product of:
                1.900043 = boost
                6.6730065 = idf(docFreq=151, maxDocs=44218)
                0.015423224 = queryNorm
              0.36863503 = fieldWeight in 1536, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6730065 = idf(docFreq=151, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.7370686 = weight(abstract_txt:multiword in 1536) [ClassicSimilarity], result of:
            0.7370686 = score(doc=1536,freq=11.0), product of:
              0.6575317 = queryWeight, product of:
                4.9272738 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.015423224 = queryNorm
              1.1209629 = fieldWeight in 1536, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
        0.2 = coord(5/25)
    
  2. Warner, J.: Analogies between linguistics and information theory (2007) 0.16
    0.1622596 = sum of:
      0.1622596 = product of:
        0.81129795 = sum of:
          0.11213668 = weight(abstract_txt:sequences in 138) [ClassicSimilarity], result of:
            0.11213668 = score(doc=138,freq=4.0), product of:
              0.12089847 = queryWeight, product of:
                1.0564009 = boost
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.015423224 = queryNorm
              0.92752767 = fieldWeight in 138, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4202213 = idf(docFreq=71, maxDocs=44218)
                0.0625 = fieldNorm(doc=138)
          0.04036772 = weight(abstract_txt:patterns in 138) [ClassicSimilarity], result of:
            0.04036772 = score(doc=138,freq=1.0), product of:
              0.12236066 = queryWeight, product of:
                1.5029837 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.015423224 = queryNorm
              0.32990766 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.0625 = fieldNorm(doc=138)
          0.07437716 = weight(abstract_txt:units in 138) [ClassicSimilarity], result of:
            0.07437716 = score(doc=138,freq=1.0), product of:
              0.18389873 = queryWeight, product of:
                1.8425646 = boost
                6.4711404 = idf(docFreq=185, maxDocs=44218)
                0.015423224 = queryNorm
              0.40444627 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4711404 = idf(docFreq=185, maxDocs=44218)
                0.0625 = fieldNorm(doc=138)
          0.08155709 = weight(abstract_txt:unit in 138) [ClassicSimilarity], result of:
            0.08155709 = score(doc=138,freq=1.0), product of:
              0.19555105 = queryWeight, product of:
                1.900043 = boost
                6.6730065 = idf(docFreq=151, maxDocs=44218)
                0.015423224 = queryNorm
              0.4170629 = fieldWeight in 138, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6730065 = idf(docFreq=151, maxDocs=44218)
                0.0625 = fieldNorm(doc=138)
          0.5028593 = weight(abstract_txt:multiword in 138) [ClassicSimilarity], result of:
            0.5028593 = score(doc=138,freq=2.0), product of:
              0.6575317 = queryWeight, product of:
                4.9272738 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.015423224 = queryNorm
              0.7647682 = fieldWeight in 138, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.0625 = fieldNorm(doc=138)
        0.2 = coord(5/25)
    
  3. Nissim, M.; Zaninello, A.: Modeling the internal variability of multiword expressions through a pattern-based method (2013) 0.13
    0.13379653 = sum of:
      0.13379653 = product of:
        0.6689826 = sum of:
          0.026377505 = weight(abstract_txt:part in 990) [ClassicSimilarity], result of:
            0.026377505 = score(doc=990,freq=1.0), product of:
              0.09213857 = queryWeight, product of:
                1.3042297 = boost
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.015423224 = queryNorm
              0.2862808 = fieldWeight in 990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
          0.09026496 = weight(abstract_txt:patterns in 990) [ClassicSimilarity], result of:
            0.09026496 = score(doc=990,freq=5.0), product of:
              0.12236066 = queryWeight, product of:
                1.5029837 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.015423224 = queryNorm
              0.73769593 = fieldWeight in 990, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
          0.10782499 = weight(abstract_txt:corpus in 990) [ClassicSimilarity], result of:
            0.10782499 = score(doc=990,freq=3.0), product of:
              0.16332708 = queryWeight, product of:
                1.7364507 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.015423224 = queryNorm
              0.66017824 = fieldWeight in 990, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
          0.08893993 = weight(abstract_txt:speech in 990) [ClassicSimilarity], result of:
            0.08893993 = score(doc=990,freq=1.0), product of:
              0.20718113 = queryWeight, product of:
                1.9557279 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.015423224 = queryNorm
              0.42928585 = fieldWeight in 990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
          0.35557523 = weight(abstract_txt:multiword in 990) [ClassicSimilarity], result of:
            0.35557523 = score(doc=990,freq=1.0), product of:
              0.6575317 = queryWeight, product of:
                4.9272738 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.015423224 = queryNorm
              0.5407728 = fieldWeight in 990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
        0.2 = coord(5/25)
    
  4. Gödert, W.: Detecting multiword phrases in mathematical text corpora (2012) 0.11
    0.11176093 = sum of:
      0.11176093 = product of:
        0.9313411 = sum of:
          0.07133828 = weight(abstract_txt:corpora in 466) [ClassicSimilarity], result of:
            0.07133828 = score(doc=466,freq=1.0), product of:
              0.108333625 = queryWeight, product of:
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.015423224 = queryNorm
              0.65850544 = fieldWeight in 466, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0240583 = idf(docFreq=106, maxDocs=44218)
                0.09375 = fieldNorm(doc=466)
          0.105713844 = weight(abstract_txt:nouns in 466) [ClassicSimilarity], result of:
            0.105713844 = score(doc=466,freq=1.0), product of:
              0.14081083 = queryWeight, product of:
                1.1400828 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.015423224 = queryNorm
              0.7507508 = fieldWeight in 466, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.09375 = fieldNorm(doc=466)
          0.754289 = weight(abstract_txt:multiword in 466) [ClassicSimilarity], result of:
            0.754289 = score(doc=466,freq=2.0), product of:
              0.6575317 = queryWeight, product of:
                4.9272738 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.015423224 = queryNorm
              1.1471523 = fieldWeight in 466, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.09375 = fieldNorm(doc=466)
        0.12 = coord(3/25)
    
  5. Ramisch, C.: Multiword expressions acquisition : a generic and open framework (2015) 0.11
    0.10836364 = sum of:
      0.10836364 = product of:
        0.9030304 = sum of:
          0.045687176 = weight(abstract_txt:part in 1649) [ClassicSimilarity], result of:
            0.045687176 = score(doc=1649,freq=3.0), product of:
              0.09213857 = queryWeight, product of:
                1.3042297 = boost
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.015423224 = queryNorm
              0.4958529 = fieldWeight in 1649, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.580493 = idf(docFreq=1231, maxDocs=44218)
                0.0625 = fieldNorm(doc=1649)
          0.062252786 = weight(abstract_txt:corpus in 1649) [ClassicSimilarity], result of:
            0.062252786 = score(doc=1649,freq=1.0), product of:
              0.16332708 = queryWeight, product of:
                1.7364507 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.015423224 = queryNorm
              0.3811541 = fieldWeight in 1649, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0625 = fieldNorm(doc=1649)
          0.79509044 = weight(abstract_txt:multiword in 1649) [ClassicSimilarity], result of:
            0.79509044 = score(doc=1649,freq=5.0), product of:
              0.6575317 = queryWeight, product of:
                4.9272738 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.015423224 = queryNorm
              1.2092048 = fieldWeight in 1649, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.0625 = fieldNorm(doc=1649)
        0.12 = coord(3/25)
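
The score explanations above follow the classic Lucene TF-IDF scheme: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), each term contributes queryWeight * fieldWeight, and the sum is scaled by the coord factor (matching terms / query terms). The following is a minimal sketch that recomputes the first hit (doc=1536) from the numbers printed in its tree. The function names are illustrative assumptions, not part of Lucene, and the sketch uses double precision where Lucene uses 32-bit floats, so the last digits may differ slightly.

```python
import math

# Constants copied from the explanation trees above.
MAX_DOCS = 44218
QUERY_NORM = 0.015423224


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity-style inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, total_query_terms=25):
    """Sum the term scores and apply the coord factor."""
    total = sum(term_score(f, df, b, field_norm) for f, df, b in terms)
    coord = len(terms) / total_query_terms
    return total * coord


# (freq, docFreq, boost) for the five matching terms of hit 1 (doc=1536).
hit1 = [
    (1.0, 66, 1.0666475),   # compound
    (1.0, 45, 1.1201851),   # candidates
    (1.0, 269, 1.7364507),  # corpus
    (2.0, 151, 1.900043),   # unit
    (11.0, 20, 4.9272738),  # multiword
]

score = document_score(hit1, field_norm=0.0390625)
print(f"{score:.6f}")
```

The printed value should agree with the 0.18518342 listed for the first hit up to floating-point rounding. A term with no boost line in its tree (for example "corpora" in hit 4) can be handled by passing a boost of 1.0.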