Document (#39922)

Author
Snajder, J.
Almic, P.
Title
Modeling semantic compositionality of Croatian multiword expressions
Source
Informatica. 39(2015) H.3, S.301-309
Year
2015
Abstract
A distinguishing feature of many multiword expressions (MWEs) is their semantic non-compositionality. Determining the semantic compositionality of MWEs is important for many natural language processing tasks. We address the task of modeling semantic compositionality of Croatian MWEs. We adopt a composition-based approach within the distributional semantics framework. We build and evaluate models based on Latent Semantic Analysis and the recently proposed neural network-based Skip-gram model, and experiment with different composition functions. We show that the compositionality scores predicted by the Skip-gram additive models correlate well with human judgments (ρ = 0.50). When framed as a classification task, the model achieves an accuracy of 0.64.
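The composition-based approach named in the abstract can be illustrated with a minimal sketch. It assumes an additive composition function (the vectors of the constituent words are summed) and, as a further assumption not stated in the abstract, scores compositionality as the cosine similarity between the composed vector and the vector observed for the whole expression. The function names and the toy vectors are hypothetical; real vectors would come from a trained LSA or Skip-gram model.

```python
import math

def additive(u, v):
    """Additive composition: element-wise sum of two word vectors."""
    return [a + b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def compositionality(mwe_vec, word1_vec, word2_vec):
    """Assumed score: similarity of the observed MWE vector to the composed one."""
    return cosine(mwe_vec, additive(word1_vec, word2_vec))

# Toy two-dimensional vectors (made up for illustration only).
print(compositionality([1.0, 1.0], [1.0, 0.0], [0.0, 1.0]))  # expression close to the sum of its parts
print(compositionality([1.0, 0.0], [0.0, 1.0], [0.0, 1.0]))  # expression unrelated to its parts
```

Under this assumed scoring, a high value suggests that the meaning of the expression follows from its parts, and a low value suggests a non-compositional (idiomatic) expression.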
Content
See: http://takelab.fer.hr/data/cromwesc/. The dataset is available as TakeLab-CroMWEsc.tar.gz. The archive contains a single file listing 200 Croatian multiword expressions annotated with semantic compositionality scores. Twenty expressions (marked with "*") were annotated by 24 annotators; the remaining 180 were annotated by 6 annotators. Besides the median, the file provides the mode, mean, and standard deviation for each expression. Consult the above-mentioned paper for details.
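The per-expression statistics listed above (median, mode, mean, standard deviation) can be sketched as follows. The scores are made up, the function name `summarize` is hypothetical, and the use of the population standard deviation is an assumption, since the record does not say which variant the dataset uses.

```python
import statistics

def summarize(scores):
    """Summary statistics for one expression's annotator scores."""
    return {
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "mean": statistics.mean(scores),
        # Assumption: population standard deviation; the dataset may use the sample variant.
        "stdev": statistics.pstdev(scores),
    }

# Hypothetical judgments from six annotators for a single expression.
example_scores = [1, 2, 2, 3, 4, 2]
print(summarize(example_scores))
```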
Theme
Computational linguistics

Similar documents (content)

  1. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.44
    0.43514276 = sum of:
      0.43514276 = product of:
        1.8130949 = sum of:
          0.11383611 = weight(abstract_txt:distributional in 4920) [ClassicSimilarity], result of:
            0.11383611 = score(doc=4920,freq=1.0), product of:
              0.12972352 = queryWeight, product of:
                1.3564252 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.010217222 = queryNorm
              0.87752867 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.013758766 = weight(abstract_txt:based in 4920) [ClassicSimilarity], result of:
            0.013758766 = score(doc=4920,freq=1.0), product of:
              0.045736063 = queryWeight, product of:
                1.3950074 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.010217222 = queryNorm
              0.3008297 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.08894163 = weight(abstract_txt:expressions in 4920) [ClassicSimilarity], result of:
            0.08894163 = score(doc=4920,freq=1.0), product of:
              0.13864753 = queryWeight, product of:
                1.983159 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.010217222 = queryNorm
              0.6414946 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.1777113 = weight(abstract_txt:multiword in 4920) [ClassicSimilarity], result of:
            0.1777113 = score(doc=4920,freq=1.0), product of:
              0.21994734 = queryWeight, product of:
                2.4978182 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.010217222 = queryNorm
              0.807972 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.51833636 = weight(abstract_txt:mwes in 4920) [ClassicSimilarity], result of:
            0.51833636 = score(doc=4920,freq=2.0), product of:
              0.40794683 = queryWeight, product of:
                4.166284 = boost
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.010217222 = queryNorm
              1.2705978 = fieldWeight in 4920, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.9005106 = weight(abstract_txt:compositionality in 4920) [ClassicSimilarity], result of:
            0.9005106 = score(doc=4920,freq=2.0), product of:
              0.69899046 = queryWeight, product of:
                7.0405583 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.010217222 = queryNorm
              1.2883017 = fieldWeight in 4920, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
        0.24 = coord(6/25)
    
  2. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.19
    0.1938226 = sum of:
      0.1938226 = product of:
        0.6922236 = sum of:
          0.007527449 = weight(abstract_txt:model in 3537) [ClassicSimilarity], result of:
            0.007527449 = score(doc=3537,freq=1.0), product of:
              0.04790874 = queryWeight, product of:
                1.1657592 = boost
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.010217222 = queryNorm
              0.15712059 = fieldWeight in 3537, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.007979525 = weight(abstract_txt:many in 3537) [ClassicSimilarity], result of:
            0.007979525 = score(doc=3537,freq=1.0), product of:
              0.049808204 = queryWeight, product of:
                1.1886443 = boost
                4.1012487 = idf(docFreq=1922, maxDocs=42740)
                0.010217222 = queryNorm
              0.16020504 = fieldWeight in 3537, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1012487 = idf(docFreq=1922, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.01621486 = weight(abstract_txt:based in 3537) [ClassicSimilarity], result of:
            0.01621486 = score(doc=3537,freq=8.0), product of:
              0.045736063 = queryWeight, product of:
                1.3950074 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.010217222 = queryNorm
              0.35453117 = fieldWeight in 3537, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.104818724 = weight(abstract_txt:expressions in 3537) [ClassicSimilarity], result of:
            0.104818724 = score(doc=3537,freq=8.0), product of:
              0.13864753 = queryWeight, product of:
                1.983159 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.010217222 = queryNorm
              0.7560086 = fieldWeight in 3537, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.24558406 = weight(abstract_txt:multiword in 3537) [ClassicSimilarity], result of:
            0.24558406 = score(doc=3537,freq=11.0), product of:
              0.21994734 = queryWeight, product of:
                2.4978182 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.010217222 = queryNorm
              1.1165584 = fieldWeight in 3537, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.045586582 = weight(abstract_txt:semantic in 3537) [ClassicSimilarity], result of:
            0.045586582 = score(doc=3537,freq=3.0), product of:
              0.14978918 = queryWeight, product of:
                3.2592053 = boost
                4.4981704 = idf(docFreq=1292, maxDocs=42740)
                0.010217222 = queryNorm
              0.30433828 = fieldWeight in 3537, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4981704 = idf(docFreq=1292, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.26451242 = weight(abstract_txt:mwes in 3537) [ClassicSimilarity], result of:
            0.26451242 = score(doc=3537,freq=3.0), product of:
              0.40794683 = queryWeight, product of:
                4.166284 = boost
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.010217222 = queryNorm
              0.64839923 = fieldWeight in 3537, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
        0.28 = coord(7/25)
    
  3. Rayson, P.; Piao, S.; Sharoff, S.; Evert, S.; Moiron, B.V.: Multiword expressions : hard going or plain sailing? (2015) 0.18
    0.17920566 = sum of:
      0.17920566 = product of:
        1.1200354 = sum of:
          0.104818724 = weight(abstract_txt:expressions in 4919) [ClassicSimilarity], result of:
            0.104818724 = score(doc=4919,freq=2.0), product of:
              0.13864753 = queryWeight, product of:
                1.983159 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.010217222 = queryNorm
              0.7560086 = fieldWeight in 4919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.078125 = fieldNorm(doc=4919)
          0.05263885 = weight(abstract_txt:semantic in 4919) [ClassicSimilarity], result of:
            0.05263885 = score(doc=4919,freq=1.0), product of:
              0.14978918 = queryWeight, product of:
                3.2592053 = boost
                4.4981704 = idf(docFreq=1292, maxDocs=42740)
                0.010217222 = queryNorm
              0.35141957 = fieldWeight in 4919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4981704 = idf(docFreq=1292, maxDocs=42740)
                0.078125 = fieldNorm(doc=4919)
          0.43194693 = weight(abstract_txt:mwes in 4919) [ClassicSimilarity], result of:
            0.43194693 = score(doc=4919,freq=2.0), product of:
              0.40794683 = queryWeight, product of:
                4.166284 = boost
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.010217222 = queryNorm
              1.0588315 = fieldWeight in 4919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.078125 = fieldNorm(doc=4919)
          0.53063095 = weight(abstract_txt:compositionality in 4919) [ClassicSimilarity], result of:
            0.53063095 = score(doc=4919,freq=1.0), product of:
              0.69899046 = queryWeight, product of:
                7.0405583 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.010217222 = queryNorm
              0.75913906 = fieldWeight in 4919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.078125 = fieldNorm(doc=4919)
        0.16 = coord(4/25)
    
  4. Kiela, D.; Clark, S.: Detecting compositionality of multi-word expressions using nearest neighbours in vector space models (2013) 0.18
    0.17570055 = sum of:
      0.17570055 = product of:
        1.4641713 = sum of:
          0.10376524 = weight(abstract_txt:expressions in 3162) [ClassicSimilarity], result of:
            0.10376524 = score(doc=3162,freq=1.0), product of:
              0.13864753 = queryWeight, product of:
                1.983159 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.010217222 = queryNorm
              0.74841034 = fieldWeight in 3162, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.109375 = fieldNorm(doc=3162)
          0.073694386 = weight(abstract_txt:semantic in 3162) [ClassicSimilarity], result of:
            0.073694386 = score(doc=3162,freq=1.0), product of:
              0.14978918 = queryWeight, product of:
                3.2592053 = boost
                4.4981704 = idf(docFreq=1292, maxDocs=42740)
                0.010217222 = queryNorm
              0.49198738 = fieldWeight in 3162, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4981704 = idf(docFreq=1292, maxDocs=42740)
                0.109375 = fieldNorm(doc=3162)
          1.2867117 = weight(abstract_txt:compositionality in 3162) [ClassicSimilarity], result of:
            1.2867117 = score(doc=3162,freq=3.0), product of:
              0.69899046 = queryWeight, product of:
                7.0405583 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.010217222 = queryNorm
              1.8408144 = fieldWeight in 3162, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.109375 = fieldNorm(doc=3162)
        0.12 = coord(3/25)
    
  5. Nissim, M.; Zaninello, A.: Modeling the internal variability of multiword expressions through a pattern-based method (2013) 0.14
    0.14421888 = sum of:
      0.14421888 = product of:
        0.72109437 = sum of:
          0.012971888 = weight(abstract_txt:based in 2991) [ClassicSimilarity], result of:
            0.012971888 = score(doc=2991,freq=2.0), product of:
              0.045736063 = queryWeight, product of:
                1.3950074 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.010217222 = queryNorm
              0.28362495 = fieldWeight in 2991, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.0625 = fieldNorm(doc=2991)
          0.041661657 = weight(abstract_txt:modeling in 2991) [ClassicSimilarity], result of:
            0.041661657 = score(doc=2991,freq=1.0), product of:
              0.109578975 = queryWeight, product of:
                1.7630519 = boost
                6.083161 = idf(docFreq=264, maxDocs=42740)
                0.010217222 = queryNorm
              0.38019755 = fieldWeight in 2991, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.083161 = idf(docFreq=264, maxDocs=42740)
                0.0625 = fieldNorm(doc=2991)
          0.059294425 = weight(abstract_txt:expressions in 2991) [ClassicSimilarity], result of:
            0.059294425 = score(doc=2991,freq=1.0), product of:
              0.13864753 = queryWeight, product of:
                1.983159 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.010217222 = queryNorm
              0.42766306 = fieldWeight in 2991, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.0625 = fieldNorm(doc=2991)
          0.11847419 = weight(abstract_txt:multiword in 2991) [ClassicSimilarity], result of:
            0.11847419 = score(doc=2991,freq=1.0), product of:
              0.21994734 = queryWeight, product of:
                2.4978182 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.010217222 = queryNorm
              0.538648 = fieldWeight in 2991, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.0625 = fieldNorm(doc=2991)
          0.48869222 = weight(abstract_txt:mwes in 2991) [ClassicSimilarity], result of:
            0.48869222 = score(doc=2991,freq=4.0), product of:
              0.40794683 = queryWeight, product of:
                4.166284 = boost
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.010217222 = queryNorm
              1.1979312 = fieldWeight in 2991, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.583449 = idf(docFreq=7, maxDocs=42740)
                0.0625 = fieldNorm(doc=2991)
        0.2 = coord(5/25)
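
The score traces above all follow the same arithmetic: each matching term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm, with tf = sqrt(termFreq); the contributions are summed and multiplied by coord (matched terms / query terms). The following sketch recomputes the score of the first entry (doc 4920) from the values printed in its trace. The idf formula, 1 + ln(maxDocs / (docFreq + 1)), is inferred from the printed numbers and should be read as an assumption about the similarity class in use; the function names are hypothetical.

```python
import math

QUERY_NORM = 0.010217222  # queryNorm as printed in the traces
MAX_DOCS = 42740          # maxDocs as printed in the traces

def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assumed to be 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, boost, idf_value, query_norm, field_norm):
    """One term's contribution: queryWeight * fieldWeight."""
    query_weight = boost * idf_value * query_norm
    field_weight = math.sqrt(freq) * idf_value * field_norm
    return query_weight * field_weight

def document_score(terms, field_norm, query_terms_total):
    """Sum of term contributions, scaled by coord = matched / total query terms."""
    total = sum(
        term_score(freq, boost, idf(doc_freq), QUERY_NORM, field_norm)
        for _term, freq, boost, doc_freq in terms
    )
    coord = len(terms) / query_terms_total
    return coord * total

# (term, termFreq, boost, docFreq) copied from the trace for doc 4920.
DOC_4920_TERMS = [
    ("distributional", 1.0, 1.3564252, 9),
    ("based", 1.0, 1.3950074, 4693),
    ("expressions", 1.0, 1.983159, 123),
    ("multiword", 1.0, 2.4978182, 20),
    ("mwes", 2.0, 4.166284, 7),
    ("compositionality", 2.0, 7.0405583, 6),
]

# fieldNorm = 0.09375 and coord = 6/25, both taken from the same trace.
print(document_score(DOC_4920_TERMS, field_norm=0.09375, query_terms_total=25))
```

The result can be compared with the 0.43514276 reported for that entry; small differences in the last digits are expected because the traces were presumably produced with single-precision arithmetic.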