Document (#37992)

Author
Nissim, M.
Zaninello, A.
Title
Modeling the internal variability of multiword expressions through a pattern-based method
Source
ACM Transactions on Speech and Language Processing. 10(2013) no.2, Article 7, pp. 1-26
Year
2013
Series
Special issue on multiword expressions: from theory to practice and use
Abstract
The issue of internal variability of multiword expressions (MWEs) is crucial to their identification and extraction in running text. We present a corpus-supported and computational study on Italian MWEs, aimed at defining an automatic method for modeling internal variation, exploiting frequency and part-of-speech (POS) information. We do so by deriving an XML-encoded lexicon of MWEs based on a manually compiled dictionary, which is then projected onto a large corpus. Since a search for fixed forms suffers from low recall, while an unconstrained flexible search for lemmas yields a loss in precision, we suggest a procedure aimed at maximizing precision in the identification of MWEs within a flexible search. Our method builds on the idea that internal variability can be modeled via the novel introduction of variation patterns, which work over POS patterns, and can be used as working tools for controlling precision. We also compare the performance of variation patterns to that of association measures, and explore the possibility of using variation patterns in MWE extraction in addition to identification. Finally, we suggest that corpus-derived, pattern-related information can be included in the original MWE lexicon by means of an enriched coding and the creation of an XML-based repository of patterns.
Content
On the special issue, cf. http://doi.acm.org/10.1145/2483691.2483692.
Theme
Computational linguistics

Similar documents (content)

  1. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.31
    0.3098125 = sum of:
      0.3098125 = product of:
        0.9681641 = sum of:
          0.017367333 = weight(abstract_txt:based in 3537) [ClassicSimilarity], result of:
            0.017367333 = score(doc=3537,freq=8.0), product of:
              0.04898676 = queryWeight, product of:
                1.1453643 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.013328634 = queryNorm
              0.35453117 = fieldWeight in 3537, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.051543552 = weight(abstract_txt:extraction in 3537) [ClassicSimilarity], result of:
            0.051543552 = score(doc=3537,freq=3.0), product of:
              0.12255526 = queryWeight, product of:
                1.4791923 = boost
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.013328634 = queryNorm
              0.42057395 = fieldWeight in 3537, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.038416207 = weight(abstract_txt:method in 3537) [ClassicSimilarity], result of:
            0.038416207 = score(doc=3537,freq=5.0), product of:
              0.097268656 = queryWeight, product of:
                1.6139524 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.013328634 = queryNorm
              0.3949495 = fieldWeight in 3537, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.112268716 = weight(abstract_txt:expressions in 3537) [ClassicSimilarity], result of:
            0.112268716 = score(doc=3537,freq=8.0), product of:
              0.1485019 = queryWeight, product of:
                1.6282634 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.013328634 = queryNorm
              0.7560086 = fieldWeight in 3537, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.263039 = weight(abstract_txt:multiword in 3537) [ClassicSimilarity], result of:
            0.263039 = score(doc=3537,freq=11.0), product of:
              0.23558015 = queryWeight, product of:
                2.050822 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.013328634 = queryNorm
              1.1165584 = fieldWeight in 3537, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.064898916 = weight(abstract_txt:identification in 3537) [ClassicSimilarity], result of:
            0.064898916 = score(doc=3537,freq=3.0), product of:
              0.16358286 = queryWeight, product of:
                2.093019 = boost
                5.8637977 = idf(docFreq=329, maxDocs=42740)
                0.013328634 = queryNorm
              0.39673418 = fieldWeight in 3537, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.8637977 = idf(docFreq=329, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.0428802 = weight(abstract_txt:corpus in 3537) [ClassicSimilarity], result of:
            0.0428802 = score(doc=3537,freq=1.0), product of:
              0.1789745 = queryWeight, product of:
                2.1892726 = boost
                6.1334615 = idf(docFreq=251, maxDocs=42740)
                0.013328634 = queryNorm
              0.23958834 = fieldWeight in 3537, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1334615 = idf(docFreq=251, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.37775025 = weight(abstract_txt:mwes in 3537) [ClassicSimilarity], result of:
            0.37775025 = score(doc=3537,freq=3.0), product of:
              0.582589 = queryWeight, product of:
                4.5609446 = boost
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.013328634 = queryNorm
              0.64839923 = fieldWeight in 3537, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
        0.32 = coord(8/25)
    
  2. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.28
    0.2808746 = sum of:
      0.2808746 = product of:
        1.1703109 = sum of:
          0.014736673 = weight(abstract_txt:based in 4920) [ClassicSimilarity], result of:
            0.014736673 = score(doc=4920,freq=1.0), product of:
              0.04898676 = queryWeight, product of:
                1.1453643 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.013328634 = queryNorm
              0.3008297 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.07142084 = weight(abstract_txt:extraction in 4920) [ClassicSimilarity], result of:
            0.07142084 = score(doc=4920,freq=1.0), product of:
              0.12255526 = queryWeight, product of:
                1.4791923 = boost
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.013328634 = queryNorm
              0.5827644 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.058311697 = weight(abstract_txt:method in 4920) [ClassicSimilarity], result of:
            0.058311697 = score(doc=4920,freq=2.0), product of:
              0.097268656 = queryWeight, product of:
                1.6139524 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.013328634 = queryNorm
              0.5994911 = fieldWeight in 4920, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.09526317 = weight(abstract_txt:expressions in 4920) [ClassicSimilarity], result of:
            0.09526317 = score(doc=4920,freq=1.0), product of:
              0.1485019 = queryWeight, product of:
                1.6282634 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.013328634 = queryNorm
              0.6414946 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.19034216 = weight(abstract_txt:multiword in 4920) [ClassicSimilarity], result of:
            0.19034216 = score(doc=4920,freq=1.0), product of:
              0.23558015 = queryWeight, product of:
                2.050822 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.013328634 = queryNorm
              0.807972 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.7402363 = weight(abstract_txt:mwes in 4920) [ClassicSimilarity], result of:
            0.7402363 = score(doc=4920,freq=2.0), product of:
              0.582589 = queryWeight, product of:
                4.5609446 = boost
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.013328634 = queryNorm
              1.2705978 = fieldWeight in 4920, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
        0.24 = coord(6/25)
    
  3. Snajder, J.; Almic, P.: Modeling semantic compositionality of Croatian multiword expressions (2015) 0.26
    0.25693294 = sum of:
      0.25693294 = product of:
        1.2846646 = sum of:
          0.025524663 = weight(abstract_txt:based in 4921) [ClassicSimilarity], result of:
            0.025524663 = score(doc=4921,freq=3.0), product of:
              0.04898676 = queryWeight, product of:
                1.1453643 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.013328634 = queryNorm
              0.5210523 = fieldWeight in 4921, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.09375 = fieldNorm(doc=4921)
          0.06693415 = weight(abstract_txt:modeling in 4921) [ClassicSimilarity], result of:
            0.06693415 = score(doc=4921,freq=1.0), product of:
              0.11736732 = queryWeight, product of:
                1.4475455 = boost
                6.083161 = idf(docFreq=264, maxDocs=42740)
                0.013328634 = queryNorm
              0.57029635 = fieldWeight in 4921, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.083161 = idf(docFreq=264, maxDocs=42740)
                0.09375 = fieldNorm(doc=4921)
          0.09526317 = weight(abstract_txt:expressions in 4921) [ClassicSimilarity], result of:
            0.09526317 = score(doc=4921,freq=1.0), product of:
              0.1485019 = queryWeight, product of:
                1.6282634 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.013328634 = queryNorm
              0.6414946 = fieldWeight in 4921, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.09375 = fieldNorm(doc=4921)
          0.19034216 = weight(abstract_txt:multiword in 4921) [ClassicSimilarity], result of:
            0.19034216 = score(doc=4921,freq=1.0), product of:
              0.23558015 = queryWeight, product of:
                2.050822 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.013328634 = queryNorm
              0.807972 = fieldWeight in 4921, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.09375 = fieldNorm(doc=4921)
          0.90660053 = weight(abstract_txt:mwes in 4921) [ClassicSimilarity], result of:
            0.90660053 = score(doc=4921,freq=3.0), product of:
              0.582589 = queryWeight, product of:
                4.5609446 = boost
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.013328634 = queryNorm
              1.5561581 = fieldWeight in 4921, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.09375 = fieldNorm(doc=4921)
        0.2 = coord(5/25)
    
  4. Ramisch, C.; Schreiner, P.; Idiart, M.; Villavicencio, A.: An evaluation of methods for the extraction of multiword expressions (20xx) 0.14
    0.14361782 = sum of:
      0.14361782 = product of:
        1.1968153 = sum of:
          0.11114036 = weight(abstract_txt:expressions in 2963) [ClassicSimilarity], result of:
            0.11114036 = score(doc=2963,freq=1.0), product of:
              0.1485019 = queryWeight, product of:
                1.6282634 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.013328634 = queryNorm
              0.74841034 = fieldWeight in 2963, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.109375 = fieldNorm(doc=2963)
          0.22206585 = weight(abstract_txt:multiword in 2963) [ClassicSimilarity], result of:
            0.22206585 = score(doc=2963,freq=1.0), product of:
              0.23558015 = queryWeight, product of:
                2.050822 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.013328634 = queryNorm
              0.942634 = fieldWeight in 2963, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.109375 = fieldNorm(doc=2963)
          0.863609 = weight(abstract_txt:mwes in 2963) [ClassicSimilarity], result of:
            0.863609 = score(doc=2963,freq=2.0), product of:
              0.582589 = queryWeight, product of:
                4.5609446 = boost
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.013328634 = queryNorm
              1.4823642 = fieldWeight in 2963, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.109375 = fieldNorm(doc=2963)
        0.12 = coord(3/25)
    
  5. Ferret, O.; Grau, B.; Hurault-Plantet, M.; Illouz, G.; Jacquemin, C.; Monceaux, L.; Robba, I.; Vilnat, A.: How NLP can improve question answering (2002) 0.12
    0.11843522 = sum of:
      0.11843522 = product of:
        0.5921761 = sum of:
          0.059517365 = weight(abstract_txt:extraction in 2851) [ClassicSimilarity], result of:
            0.059517365 = score(doc=2851,freq=1.0), product of:
              0.12255526 = queryWeight, product of:
                1.4791923 = boost
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.013328634 = queryNorm
              0.48563695 = fieldWeight in 2851, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.078125 = fieldNorm(doc=2851)
          0.15861848 = weight(abstract_txt:multiword in 2851) [ClassicSimilarity], result of:
            0.15861848 = score(doc=2851,freq=1.0), product of:
              0.23558015 = queryWeight, product of:
                2.050822 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.013328634 = queryNorm
              0.67331004 = fieldWeight in 2851, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.078125 = fieldNorm(doc=2851)
          0.07493881 = weight(abstract_txt:identification in 2851) [ClassicSimilarity], result of:
            0.07493881 = score(doc=2851,freq=1.0), product of:
              0.16358286 = queryWeight, product of:
                2.093019 = boost
                5.8637977 = idf(docFreq=329, maxDocs=42740)
                0.013328634 = queryNorm
              0.4581092 = fieldWeight in 2851, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8637977 = idf(docFreq=329, maxDocs=42740)
                0.078125 = fieldNorm(doc=2851)
          0.13118413 = weight(abstract_txt:patterns in 2851) [ClassicSimilarity], result of:
            0.13118413 = score(doc=2851,freq=2.0), product of:
              0.22359411 = queryWeight, product of:
                3.159067 = boost
                5.3102612 = idf(docFreq=573, maxDocs=42740)
                0.013328634 = queryNorm
              0.5867065 = fieldWeight in 2851, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3102612 = idf(docFreq=573, maxDocs=42740)
                0.078125 = fieldNorm(doc=2851)
          0.16791725 = weight(abstract_txt:variation in 2851) [ClassicSimilarity], result of:
            0.16791725 = score(doc=2851,freq=1.0), product of:
              0.30830202 = queryWeight, product of:
                3.3178887 = boost
                6.971543 = idf(docFreq=108, maxDocs=42740)
                0.013328634 = queryNorm
              0.5446518 = fieldWeight in 2851, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.971543 = idf(docFreq=108, maxDocs=42740)
                0.078125 = fieldNorm(doc=2851)
        0.2 = coord(5/25)
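
The score breakdowns above can be recomputed by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the figures in the listing. The function names are hypothetical and chosen only for illustration. The boost, queryNorm and fieldNorm values are copied from the breakdown for entry 4 (doc 2963) rather than derived.

```python
import math

MAX_DOCS = 42740          # maxDocs, as shown in the listing
QUERY_NORM = 0.013328634  # queryNorm, as shown in the listing


def tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = tf(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, query_terms=25):
    """Sum of the term scores, scaled by coord = matched terms / query terms."""
    total = sum(term_score(f, df, b, field_norm) for f, df, b in terms)
    coord = len(terms) / query_terms
    return total * coord


# (freq, docFreq, boost) for the three terms matched in doc 2963 (entry 4)
entry_4_terms = [
    (1.0, 123, 1.6282634),  # expressions
    (1.0, 20, 2.050822),    # multiword
    (2.0, 7, 4.5609446),    # mwes
]

entry_4_score = document_score(entry_4_terms, field_norm=0.109375)
print(round(entry_4_score, 4))
```

The result should land close to the 0.14361782 reported for entry 4. Small differences in the last digits are expected, because the listing was produced with single-precision arithmetic.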