Document (#37645)

Author
Dias, G.
Title
Multiword unit hybrid extraction
Source
http://acl.ldc.upenn.edu/W/W03/W03-1806.pdf
Year
n.d.
Abstract
This paper describes an original hybrid system that extracts multiword unit candidates from part-of-speech tagged corpora. While classical hybrid systems manually define local part-of-speech patterns that lead to the identification of well-known multiword units (mainly compound nouns), our solution automatically identifies relevant syntactic patterns from the corpus. Word statistics are then combined with the endogenously acquired linguistic information in order to extract the most relevant sequences of words. As a result, (1) human intervention is avoided, which provides total flexibility in the use of the system, and (2) different multiword units such as phrasal verbs, adverbial locutions and prepositional locutions may be identified. The system has been tested on the Brown Corpus, leading to encouraging results.
Theme
Computational linguistics

Similar documents (content)

  1. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.18
    0.1842751 = sum of:
      0.1842751 = product of:
        0.9213755 = sum of:
          0.0362021 = weight(abstract_txt:compound in 3001) [ClassicSimilarity], result of:
            0.0362021 = score(doc=3001,freq=1.0), product of:
              0.12356229 = queryWeight, product of:
                1.0563825 = boost
                7.500458 = idf(docFreq=64, maxDocs=43254)
                0.015594698 = queryNorm
              0.29298663 = fieldWeight in 3001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.500458 = idf(docFreq=64, maxDocs=43254)
                0.0390625 = fieldNorm(doc=3001)
          0.04144281 = weight(abstract_txt:candidates in 3001) [ClassicSimilarity], result of:
            0.04144281 = score(doc=3001,freq=1.0), product of:
              0.13521647 = queryWeight, product of:
                1.1050783 = boost
                7.846204 = idf(docFreq=45, maxDocs=43254)
                0.015594698 = queryNorm
              0.30649233 = fieldWeight in 3001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.846204 = idf(docFreq=45, maxDocs=43254)
                0.0390625 = fieldNorm(doc=3001)
          0.039595235 = weight(abstract_txt:corpus in 3001) [ClassicSimilarity], result of:
            0.039595235 = score(doc=3001,freq=1.0), product of:
              0.16526037 = queryWeight, product of:
                1.727737 = boost
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.015594698 = queryNorm
              0.23959303 = fieldWeight in 3001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.0390625 = fieldNorm(doc=3001)
          0.072477825 = weight(abstract_txt:unit in 3001) [ClassicSimilarity], result of:
            0.072477825 = score(doc=3001,freq=2.0), product of:
              0.19627586 = queryWeight, product of:
                1.8828976 = boost
                6.6844125 = idf(docFreq=146, maxDocs=43254)
                0.015594698 = queryNorm
              0.3692651 = fieldWeight in 3001, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6844125 = idf(docFreq=146, maxDocs=43254)
                0.0390625 = fieldNorm(doc=3001)
          0.73165756 = weight(abstract_txt:multiword in 3001) [ClassicSimilarity], result of:
            0.73165756 = score(doc=3001,freq=11.0), product of:
              0.6543716 = queryWeight, product of:
                4.862062 = boost
                8.630322 = idf(docFreq=20, maxDocs=43254)
                0.015594698 = queryNorm
              1.1181071 = fieldWeight in 3001, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.630322 = idf(docFreq=20, maxDocs=43254)
                0.0390625 = fieldNorm(doc=3001)
        0.2 = coord(5/25)
    
  2. Warner, J.: Analogies between linguistics and information theory (2007) 0.16
    0.16211024 = sum of:
      0.16211024 = product of:
        0.81055117 = sum of:
          0.11244667 = weight(abstract_txt:sequences in 2139) [ClassicSimilarity], result of:
            0.11244667 = score(doc=2139,freq=4.0), product of:
              0.12113265 = queryWeight, product of:
                1.045945 = boost
                7.4263496 = idf(docFreq=69, maxDocs=43254)
                0.015594698 = queryNorm
              0.9282937 = fieldWeight in 2139, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4263496 = idf(docFreq=69, maxDocs=43254)
                0.0625 = fieldNorm(doc=2139)
          0.04098858 = weight(abstract_txt:patterns in 2139) [ClassicSimilarity], result of:
            0.04098858 = score(doc=2139,freq=1.0), product of:
              0.12362379 = queryWeight, product of:
                1.4943223 = boost
                5.304944 = idf(docFreq=583, maxDocs=43254)
                0.015594698 = queryNorm
              0.331559 = fieldWeight in 2139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.304944 = idf(docFreq=583, maxDocs=43254)
                0.0625 = fieldNorm(doc=2139)
          0.07594891 = weight(abstract_txt:units in 2139) [ClassicSimilarity], result of:
            0.07594891 = score(doc=2139,freq=1.0), product of:
              0.18649814 = queryWeight, product of:
                1.835399 = boost
                6.5157895 = idf(docFreq=173, maxDocs=43254)
                0.015594698 = queryNorm
              0.40723684 = fieldWeight in 2139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5157895 = idf(docFreq=173, maxDocs=43254)
                0.0625 = fieldNorm(doc=2139)
          0.0819993 = weight(abstract_txt:unit in 2139) [ClassicSimilarity], result of:
            0.0819993 = score(doc=2139,freq=1.0), product of:
              0.19627586 = queryWeight, product of:
                1.8828976 = boost
                6.6844125 = idf(docFreq=146, maxDocs=43254)
                0.015594698 = queryNorm
              0.41777578 = fieldWeight in 2139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6844125 = idf(docFreq=146, maxDocs=43254)
                0.0625 = fieldNorm(doc=2139)
          0.4991677 = weight(abstract_txt:multiword in 2139) [ClassicSimilarity], result of:
            0.4991677 = score(doc=2139,freq=2.0), product of:
              0.6543716 = queryWeight, product of:
                4.862062 = boost
                8.630322 = idf(docFreq=20, maxDocs=43254)
                0.015594698 = queryNorm
              0.76281995 = fieldWeight in 2139, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.630322 = idf(docFreq=20, maxDocs=43254)
                0.0625 = fieldNorm(doc=2139)
        0.2 = coord(5/25)
    
  3. Nissim, M.; Zaninello, A.: Modeling the internal variability of multiword expressions through a pattern-based method (2013) 0.13
    0.13407774 = sum of:
      0.13407774 = product of:
        0.6703887 = sum of:
          0.026344076 = weight(abstract_txt:part in 2455) [ClassicSimilarity], result of:
            0.026344076 = score(doc=2455,freq=1.0), product of:
              0.092069425 = queryWeight, product of:
                1.289588 = boost
                4.5781236 = idf(docFreq=1207, maxDocs=43254)
                0.015594698 = queryNorm
              0.28613272 = fieldWeight in 2455, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5781236 = idf(docFreq=1207, maxDocs=43254)
                0.0625 = fieldNorm(doc=2455)
          0.09165326 = weight(abstract_txt:patterns in 2455) [ClassicSimilarity], result of:
            0.09165326 = score(doc=2455,freq=5.0), product of:
              0.12362379 = queryWeight, product of:
                1.4943223 = boost
                5.304944 = idf(docFreq=583, maxDocs=43254)
                0.015594698 = queryNorm
              0.7413885 = fieldWeight in 2455, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.304944 = idf(docFreq=583, maxDocs=43254)
                0.0625 = fieldNorm(doc=2455)
          0.10972953 = weight(abstract_txt:corpus in 2455) [ClassicSimilarity], result of:
            0.10972953 = score(doc=2455,freq=3.0), product of:
              0.16526037 = queryWeight, product of:
                1.727737 = boost
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.015594698 = queryNorm
              0.66397965 = fieldWeight in 2455, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.0625 = fieldNorm(doc=2455)
          0.08969692 = weight(abstract_txt:speech in 2455) [ClassicSimilarity], result of:
            0.08969692 = score(doc=2455,freq=1.0), product of:
              0.20837478 = queryWeight, product of:
                1.940063 = boost
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.015594698 = queryNorm
              0.4304596 = fieldWeight in 2455, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.0625 = fieldNorm(doc=2455)
          0.35296488 = weight(abstract_txt:multiword in 2455) [ClassicSimilarity], result of:
            0.35296488 = score(doc=2455,freq=1.0), product of:
              0.6543716 = queryWeight, product of:
                4.862062 = boost
                8.630322 = idf(docFreq=20, maxDocs=43254)
                0.015594698 = queryNorm
              0.53939515 = fieldWeight in 2455, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.630322 = idf(docFreq=20, maxDocs=43254)
                0.0625 = fieldNorm(doc=2455)
        0.2 = coord(5/25)
    
  4. Gödert, W.: Detecting multiword phrases in mathematical text corpora (2012) 0.11
    0.11127924 = sum of:
      0.11127924 = product of:
        0.92732704 = sum of:
          0.07370234 = weight(abstract_txt:corpora in 1931) [ClassicSimilarity], result of:
            0.07370234 = score(doc=1931,freq=1.0), product of:
              0.11072444 = queryWeight, product of:
                7.100134 = idf(docFreq=96, maxDocs=43254)
                0.015594698 = queryNorm
              0.66563755 = fieldWeight in 1931, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.100134 = idf(docFreq=96, maxDocs=43254)
                0.09375 = fieldNorm(doc=1931)
          0.104873076 = weight(abstract_txt:nouns in 1931) [ClassicSimilarity], result of:
            0.104873076 = score(doc=1931,freq=1.0), product of:
              0.1400765 = queryWeight, product of:
                1.1247627 = boost
                7.9859657 = idf(docFreq=39, maxDocs=43254)
                0.015594698 = queryNorm
              0.7486843 = fieldWeight in 1931, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9859657 = idf(docFreq=39, maxDocs=43254)
                0.09375 = fieldNorm(doc=1931)
          0.7487516 = weight(abstract_txt:multiword in 1931) [ClassicSimilarity], result of:
            0.7487516 = score(doc=1931,freq=2.0), product of:
              0.6543716 = queryWeight, product of:
                4.862062 = boost
                8.630322 = idf(docFreq=20, maxDocs=43254)
                0.015594698 = queryNorm
              1.1442299 = fieldWeight in 1931, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.630322 = idf(docFreq=20, maxDocs=43254)
                0.09375 = fieldNorm(doc=1931)
        0.12 = coord(3/25)
    
  5. Ramisch, C.: Multiword expressions acquisition : a generic and open framework (2015) 0.11
    0.10778822 = sum of:
      0.10778822 = product of:
        0.8982352 = sum of:
          0.045629278 = weight(abstract_txt:part in 3114) [ClassicSimilarity], result of:
            0.045629278 = score(doc=3114,freq=3.0), product of:
              0.092069425 = queryWeight, product of:
                1.289588 = boost
                4.5781236 = idf(docFreq=1207, maxDocs=43254)
                0.015594698 = queryNorm
              0.4955964 = fieldWeight in 3114, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.5781236 = idf(docFreq=1207, maxDocs=43254)
                0.0625 = fieldNorm(doc=3114)
          0.063352376 = weight(abstract_txt:corpus in 3114) [ClassicSimilarity], result of:
            0.063352376 = score(doc=3114,freq=1.0), product of:
              0.16526037 = queryWeight, product of:
                1.727737 = boost
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.015594698 = queryNorm
              0.38334885 = fieldWeight in 3114, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.0625 = fieldNorm(doc=3114)
          0.78925353 = weight(abstract_txt:multiword in 3114) [ClassicSimilarity], result of:
            0.78925353 = score(doc=3114,freq=5.0), product of:
              0.6543716 = queryWeight, product of:
                4.862062 = boost
                8.630322 = idf(docFreq=20, maxDocs=43254)
                0.015594698 = queryNorm
              1.2061243 = fieldWeight in 3114, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.630322 = idf(docFreq=20, maxDocs=43254)
                0.0625 = fieldNorm(doc=3114)
        0.12 = coord(3/25)
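
The score trees above all follow the same arithmetic: each term score is queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), with tf equal to the square root of the term frequency, and the sum over matching terms is multiplied by the coord factor. The following Python sketch recomputes the score of entry 1 from the values listed there. The function names are hypothetical, and the idf formula, 1 + ln(maxDocs / (docFreq + 1)), is an assumption inferred from the idf values shown in the listing, not taken from the catalog software itself.

```python
import math

# Values copied from the listing above.
MAX_DOCS = 43254
QUERY_NORM = 0.015594698


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed formula; it reproduces the idf values shown in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    # score = queryWeight * fieldWeight, as in each "product of" node.
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def document_score(terms, matched, total):
    # Sum of term scores, scaled by coord(matched/total).
    return sum(term_score(*t) for t in terms) * (matched / total)


# Entry 1 (doc 3001): (boost, docFreq, freq, fieldNorm) per matching term.
ENTRY_1 = [
    (1.0563825, 64, 1.0, 0.0390625),    # compound
    (1.1050783, 45, 1.0, 0.0390625),    # candidates
    (1.727737, 254, 1.0, 0.0390625),    # corpus
    (1.8828976, 146, 2.0, 0.0390625),   # unit
    (4.862062, 20, 11.0, 0.0390625),    # multiword
]

score = document_score(ENTRY_1, matched=5, total=25)
print(f"{score:.4f}")
```

The result should agree with the 0.1842751 listed for entry 1 up to floating-point rounding, since the listing uses single-precision values. Terms shown without a boost line, such as "corpora" in entry 4, can be passed with a boost of 1.0.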