Document (#37991)

Author
Nissim, M.
Zaninello, A.
Title
Modeling the internal variability of multiword expressions through a pattern-based method
Source
ACM Transactions on Speech and Language Processing. 10(2013) no.2, Article 7, pp. 1-26
Year
2013
Series
Special issue on multiword expressions: from theory to practice and use
Abstract
The internal variability of multiword expressions (MWEs) is crucial to their identification and extraction in running text. We present a corpus-supported, computational study of Italian MWEs, aimed at defining an automatic method for modeling internal variation that exploits frequency and part-of-speech (POS) information. We do so by deriving an XML-encoded lexicon of MWEs from a manually compiled dictionary, which is then projected onto a large corpus. Since a search for fixed forms suffers from low recall, while an unconstrained flexible search for lemmas yields a loss in precision, we suggest a procedure aimed at maximizing precision in the identification of MWEs within a flexible search. Our method builds on the idea that internal variability can be modeled via the novel introduction of variation patterns, which work over POS patterns and can be used as working tools for controlling precision. We also compare the performance of variation patterns to that of association measures, and explore the possibility of using variation patterns in MWE extraction in addition to identification. Finally, we suggest that corpus-derived, pattern-related information can be included in the original MWE lexicon by means of an enriched coding and the creation of an XML-based repository of patterns.
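The trade-off described in the abstract (fixed-form search has low recall, unconstrained lemma search loses precision, POS-based patterns restore precision) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Token` type, the function names, the gap limit, the toy sentences, and the two allowed POS sequences are all assumptions made up for illustration, and treating a variation pattern as the POS sequence of a matched span is only one possible reading of the abstract.

```python
from typing import List, NamedTuple, Sequence, Set, Tuple

class Token(NamedTuple):
    form: str   # surface form as it appears in the text
    lemma: str  # dictionary form
    pos: str    # part-of-speech tag

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive

def fixed_search(tokens: List[Token], forms: Sequence[str]) -> List[Span]:
    """Match the MWE only as a contiguous run of exact surface forms."""
    n = len(forms)
    return [(i, i + n) for i in range(len(tokens) - n + 1)
            if [t.form for t in tokens[i:i + n]] == list(forms)]

def flexible_search(tokens: List[Token], lemmas: Sequence[str],
                    max_gap: int = 1) -> List[Span]:
    """Match lemmas in order, allowing up to max_gap tokens between components."""
    spans = []
    for start, tok in enumerate(tokens):
        if tok.lemma != lemmas[0]:
            continue
        pos, ok = start, True
        for lemma in lemmas[1:]:
            window = range(pos + 1, min(pos + 2 + max_gap, len(tokens)))
            nxt = next((j for j in window if tokens[j].lemma == lemma), None)
            if nxt is None:
                ok = False
                break
            pos = nxt
        if ok:
            spans.append((start, pos + 1))
    return spans

def pos_pattern(tokens: List[Token], span: Span) -> Tuple[str, ...]:
    """POS sequence covered by a matched span."""
    return tuple(t.pos for t in tokens[span[0]:span[1]])

def identify(tokens: List[Token], lemmas: Sequence[str],
             allowed: Set[Tuple[str, ...]], max_gap: int = 1) -> List[Span]:
    """Flexible search, keeping only spans whose POS sequence is allowed."""
    return [s for s in flexible_search(tokens, lemmas, max_gap)
            if pos_pattern(tokens, s) in allowed]

# Toy data (invented): the MWE "carta di credito" in three contexts.
FORMS = LEMMAS = ["carta", "di", "credito"]
PLURAL = [Token("due", "due", "NUM"), Token("carte", "carta", "NOUN"),
          Token("di", "di", "ADP"), Token("credito", "credito", "NOUN")]
WITH_ADJ = [Token("una", "uno", "DET"), Token("carta", "carta", "NOUN"),
            Token("nuova", "nuovo", "ADJ"), Token("di", "di", "ADP"),
            Token("credito", "credito", "NOUN")]
WITH_VERB = [Token("la", "il", "DET"), Token("carta", "carta", "NOUN"),
             Token("parla", "parlare", "VERB"), Token("di", "di", "ADP"),
             Token("credito", "credito", "NOUN")]
# Hypothetical variation patterns: POS sequences accepted as true variants.
ALLOWED = {("NOUN", "ADP", "NOUN"), ("NOUN", "ADJ", "ADP", "NOUN")}
```

In this toy setting the fixed search misses the plural form, the flexible search matches all three sentences including the spurious verb reading, and the pattern filter keeps only the first two.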
Content
For the special issue, see: http://doi.acm.org/10.1145/2483691.2483692.
Theme
Computational linguistics

Similar documents (content)

  1. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.31
    0.31087065 = sum of:
      0.31087065 = product of:
        0.97147083 = sum of:
          0.01705569 = weight(abstract_txt:based in 1536) [ClassicSimilarity], result of:
            0.01705569 = score(doc=1536,freq=8.0), product of:
              0.048423458 = queryWeight, product of:
                1.1481695 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.013229436 = queryNorm
              0.35221958 = fieldWeight in 1536, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.051011443 = weight(abstract_txt:extraction in 1536) [ClassicSimilarity], result of:
            0.051011443 = score(doc=1536,freq=3.0), product of:
              0.121771924 = queryWeight, product of:
                1.4866408 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.013229436 = queryNorm
              0.41890973 = fieldWeight in 1536, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.03794866 = weight(abstract_txt:method in 1536) [ClassicSimilarity], result of:
            0.03794866 = score(doc=1536,freq=5.0), product of:
              0.096526645 = queryWeight, product of:
                1.6210698 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.013229436 = queryNorm
              0.3931418 = fieldWeight in 1536, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.109943956 = weight(abstract_txt:expressions in 1536) [ClassicSimilarity], result of:
            0.109943956 = score(doc=1536,freq=8.0), product of:
              0.14651883 = queryWeight, product of:
                1.6307192 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.013229436 = queryNorm
              0.75037426 = fieldWeight in 1536, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.26656827 = weight(abstract_txt:multiword in 1536) [ClassicSimilarity], result of:
            0.26656827 = score(doc=1536,freq=11.0), product of:
              0.23780295 = queryWeight, product of:
                2.0775003 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.013229436 = queryNorm
              1.1209629 = fieldWeight in 1536, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.06436396 = weight(abstract_txt:identification in 1536) [ClassicSimilarity], result of:
            0.06436396 = score(doc=1536,freq=3.0), product of:
              0.162765 = queryWeight, product of:
                2.1050315 = boost
                5.8446846 = idf(docFreq=347, maxDocs=44218)
                0.013229436 = queryNorm
              0.39544103 = fieldWeight in 1536, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.8446846 = idf(docFreq=347, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.042214397 = weight(abstract_txt:corpus in 1536) [ClassicSimilarity], result of:
            0.042214397 = score(doc=1536,freq=1.0), product of:
              0.17720665 = queryWeight, product of:
                2.1964338 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.013229436 = queryNorm
              0.2382213 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.38236448 = weight(abstract_txt:mwes in 1536) [ClassicSimilarity], result of:
            0.38236448 = score(doc=1536,freq=3.0), product of:
              0.58762074 = queryWeight, product of:
                4.618448 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.013229436 = queryNorm
              0.65069944 = fieldWeight in 1536, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
        0.32 = coord(8/25)
    
  2. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.28
    0.28277344 = sum of:
      0.28277344 = product of:
        1.1782227 = sum of:
          0.014472233 = weight(abstract_txt:based in 2919) [ClassicSimilarity], result of:
            0.014472233 = score(doc=2919,freq=1.0), product of:
              0.048423458 = queryWeight, product of:
                1.1481695 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.013229436 = queryNorm
              0.29886824 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.07068353 = weight(abstract_txt:extraction in 2919) [ClassicSimilarity], result of:
            0.07068353 = score(doc=2919,freq=1.0), product of:
              0.121771924 = queryWeight, product of:
                1.4866408 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.013229436 = queryNorm
              0.58045834 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.057602014 = weight(abstract_txt:method in 2919) [ClassicSimilarity], result of:
            0.057602014 = score(doc=2919,freq=2.0), product of:
              0.096526645 = queryWeight, product of:
                1.6210698 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.013229436 = queryNorm
              0.5967473 = fieldWeight in 2919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.093290545 = weight(abstract_txt:expressions in 2919) [ClassicSimilarity], result of:
            0.093290545 = score(doc=2919,freq=1.0), product of:
              0.14651883 = queryWeight, product of:
                1.6307192 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.013229436 = queryNorm
              0.6367137 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.19289605 = weight(abstract_txt:multiword in 2919) [ClassicSimilarity], result of:
            0.19289605 = score(doc=2919,freq=1.0), product of:
              0.23780295 = queryWeight, product of:
                2.0775003 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.013229436 = queryNorm
              0.8111592 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.7492783 = weight(abstract_txt:mwes in 2919) [ClassicSimilarity], result of:
            0.7492783 = score(doc=2919,freq=2.0), product of:
              0.58762074 = queryWeight, product of:
                4.618448 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.013229436 = queryNorm
              1.2751052 = fieldWeight in 2919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
        0.24 = coord(6/25)
    
  3. Snajder, J.; Almic, P.: Modeling semantic compositionality of Croatian multiword expressions (2015) 0.26
    0.2587362 = sum of:
      0.2587362 = product of:
        1.2936809 = sum of:
          0.025066642 = weight(abstract_txt:based in 2920) [ClassicSimilarity], result of:
            0.025066642 = score(doc=2920,freq=3.0), product of:
              0.048423458 = queryWeight, product of:
                1.1481695 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.013229436 = queryNorm
              0.51765496 = fieldWeight in 2920, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.09375 = fieldNorm(doc=2920)
          0.064752884 = weight(abstract_txt:modeling in 2920) [ClassicSimilarity], result of:
            0.064752884 = score(doc=2920,freq=1.0), product of:
              0.11486149 = queryWeight, product of:
                1.443842 = boost
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.013229436 = queryNorm
              0.5637476 = fieldWeight in 2920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.09375 = fieldNorm(doc=2920)
          0.093290545 = weight(abstract_txt:expressions in 2920) [ClassicSimilarity], result of:
            0.093290545 = score(doc=2920,freq=1.0), product of:
              0.14651883 = queryWeight, product of:
                1.6307192 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.013229436 = queryNorm
              0.6367137 = fieldWeight in 2920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.09375 = fieldNorm(doc=2920)
          0.19289605 = weight(abstract_txt:multiword in 2920) [ClassicSimilarity], result of:
            0.19289605 = score(doc=2920,freq=1.0), product of:
              0.23780295 = queryWeight, product of:
                2.0775003 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.013229436 = queryNorm
              0.8111592 = fieldWeight in 2920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.09375 = fieldNorm(doc=2920)
          0.9176748 = weight(abstract_txt:mwes in 2920) [ClassicSimilarity], result of:
            0.9176748 = score(doc=2920,freq=3.0), product of:
              0.58762074 = queryWeight, product of:
                4.618448 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.013229436 = queryNorm
              1.5616786 = fieldWeight in 2920, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.09375 = fieldNorm(doc=2920)
        0.2 = coord(5/25)
    
  4. Ramisch, C.; Schreiner, P.; Idiart, M.; Villavicencio, A.: An evaluation of methods for the extraction of multiword expressions (20xx) 0.14
    0.14496508 = sum of:
      0.14496508 = product of:
        1.2080424 = sum of:
          0.10883897 = weight(abstract_txt:expressions in 962) [ClassicSimilarity], result of:
            0.10883897 = score(doc=962,freq=1.0), product of:
              0.14651883 = queryWeight, product of:
                1.6307192 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.013229436 = queryNorm
              0.74283266 = fieldWeight in 962, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.109375 = fieldNorm(doc=962)
          0.22504538 = weight(abstract_txt:multiword in 962) [ClassicSimilarity], result of:
            0.22504538 = score(doc=962,freq=1.0), product of:
              0.23780295 = queryWeight, product of:
                2.0775003 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.013229436 = queryNorm
              0.94635236 = fieldWeight in 962, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.109375 = fieldNorm(doc=962)
          0.87415797 = weight(abstract_txt:mwes in 962) [ClassicSimilarity], result of:
            0.87415797 = score(doc=962,freq=2.0), product of:
              0.58762074 = queryWeight, product of:
                4.618448 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.013229436 = queryNorm
              1.4876227 = fieldWeight in 962, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.109375 = fieldNorm(doc=962)
        0.12 = coord(3/25)
    
  5. Ferret, O.; Grau, B.; Hurault-Plantet, M.; Illouz, G.; Jacquemin, C.; Monceaux, L.; Robba, I.; Vilnat, A.: How NLP can improve question answering (2002) 0.12
    0.117829755 = sum of:
      0.117829755 = product of:
        0.58914876 = sum of:
          0.058902938 = weight(abstract_txt:extraction in 1850) [ClassicSimilarity], result of:
            0.058902938 = score(doc=1850,freq=1.0), product of:
              0.121771924 = queryWeight, product of:
                1.4866408 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.013229436 = queryNorm
              0.48371527 = fieldWeight in 1850, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.078125 = fieldNorm(doc=1850)
          0.16074672 = weight(abstract_txt:multiword in 1850) [ClassicSimilarity], result of:
            0.16074672 = score(doc=1850,freq=1.0), product of:
              0.23780295 = queryWeight, product of:
                2.0775003 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.013229436 = queryNorm
              0.675966 = fieldWeight in 1850, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.078125 = fieldNorm(doc=1850)
          0.0743211 = weight(abstract_txt:identification in 1850) [ClassicSimilarity], result of:
            0.0743211 = score(doc=1850,freq=1.0), product of:
              0.162765 = queryWeight, product of:
                2.1050315 = boost
                5.8446846 = idf(docFreq=347, maxDocs=44218)
                0.013229436 = queryNorm
              0.45661598 = fieldWeight in 1850, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8446846 = idf(docFreq=347, maxDocs=44218)
                0.078125 = fieldNorm(doc=1850)
          0.12904161 = weight(abstract_txt:patterns in 1850) [ClassicSimilarity], result of:
            0.12904161 = score(doc=1850,freq=2.0), product of:
              0.22126482 = queryWeight, product of:
                3.1685362 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.013229436 = queryNorm
              0.58319986 = fieldWeight in 1850, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.078125 = fieldNorm(doc=1850)
          0.16613641 = weight(abstract_txt:variation in 1850) [ClassicSimilarity], result of:
            0.16613641 = score(doc=1850,freq=1.0), product of:
              0.30627325 = queryWeight, product of:
                3.3342795 = boost
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.013229436 = queryNorm
              0.54244506 = fieldWeight in 1850, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.078125 = fieldNorm(doc=1850)
        0.2 = coord(5/25)
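
The score trees above follow the arithmetic of Lucene's ClassicSimilarity: each term score is queryWeight times fieldWeight, and the document score is the sum of term scores multiplied by the coord factor. A minimal Python sketch of that arithmetic, using numbers copied from the listing, is given below. The function names are illustrative and are not Lucene API; Lucene computes with 32-bit floats, so the last digits may differ slightly.

```python
import math
from typing import Sequence

def tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)

def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               boost: float, query_norm: float, field_norm: float) -> float:
    """Score of one query term in one document."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm      # query side
    field_weight = tf(freq) * term_idf * field_norm   # document side
    return query_weight * field_weight

def document_score(term_scores: Sequence[float], query_terms: int) -> float:
    """Sum of matching term scores, scaled by coord = matched / total terms."""
    coord = len(term_scores) / query_terms
    return sum(term_scores) * coord

# Term "based" in document 1536 (first entry of result 1); the listing
# gives 0.01705569 for this term.
based_1536 = term_score(freq=8.0, doc_freq=4958, max_docs=44218,
                        boost=1.1481695, query_norm=0.013229436,
                        field_norm=0.0390625)

# Result 4 (document 962): three of 25 query terms match; the listing
# gives 0.14496508 for the document.
doc_962 = document_score([0.10883897, 0.22504538, 0.87415797], query_terms=25)
```

The coord factor explains why result 4 ranks below result 3 even though its summed term scores are of similar size: only 3 of the 25 query terms match, against 5 of 25 for result 3.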