Document (#39921)

Author
Snajder, J.
Almic, P.
Title
Modeling semantic compositionality of Croatian multiword expressions
Source
Informatica. 39(2015) no.3, pp.301-309
Year
2015
Abstract
A distinguishing feature of many multiword expressions (MWEs) is their semantic non-compositionality. Determining the semantic compositionality of MWEs is important for many natural language processing tasks. We address the task of modeling semantic compositionality of Croatian MWEs. We adopt a composition-based approach within the distributional semantics framework. We build and evaluate models based on Latent Semantic Analysis and the recently proposed neural network-based Skip-gram model, and experiment with different composition functions. We show that the compositionality scores predicted by the Skip-gram additive models correlate well with human judgments (ρ = 0.50). When framed as a classification task, the model achieves an accuracy of 0.64.
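The composition-based idea in the abstract can be illustrated with a minimal sketch. It assumes one common reading of the approach: compose the constituent word vectors (here by addition) and take the cosine similarity between the composed vector and the vector observed for the whole expression as the compositionality score. The function names and the toy vectors are invented for illustration and are not taken from the paper.

```python
import math

def add_compose(u, v):
    """Additive composition: element-wise sum of the constituent vectors."""
    return [a + b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def compositionality(mwe_vec, w1_vec, w2_vec):
    """Score an expression by comparing its own vector with the composed one."""
    return cosine(mwe_vec, add_compose(w1_vec, w2_vec))

# Toy 3-dimensional vectors, invented for illustration only.
w1 = [1.0, 0.0, 1.0]
w2 = [0.0, 1.0, 1.0]
compositional_mwe = [1.0, 1.0, 2.0]  # same direction as w1 + w2
idiomatic_mwe = [1.0, -1.0, 0.0]     # orthogonal to w1 + w2

score_comp = compositionality(compositional_mwe, w1, w2)
score_idiom = compositionality(idiomatic_mwe, w1, w2)
print(score_comp, score_idiom)
```

In this toy setting the expression whose vector matches the sum of its parts scores close to 1, and the one whose vector is unrelated to the sum scores close to 0. A classification variant would threshold the score; the threshold itself would have to be chosen on annotated data.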
Content
See: http://takelab.fer.hr/data/cromwesc/. The dataset is available there as TakeLab-CroMWEsc.tar.gz. The archive contains a single file listing 200 Croatian multiword expressions annotated with semantic compositionality scores. Twenty expressions (marked with "*") were annotated by 24 annotators; the remaining 180 were annotated by 6 annotators. In addition to the median, the mode, mean, and standard deviation are provided for each expression. Consult the above-mentioned paper for details.
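The per-expression summary statistics mentioned above can be reproduced from raw annotator ratings with the Python standard library. The sketch below is only an illustration: the ratings are hypothetical, the rating scale is not taken from the dataset, and the use of the sample (rather than population) standard deviation is an assumption.

```python
import statistics

def summarize(scores):
    """Return median, mode, mean, and sample standard deviation of ratings."""
    return {
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # assumption: sample std. deviation
    }

# Hypothetical ratings from 6 annotators for one expression.
scores = [2, 3, 3, 4, 3, 5]
summary = summarize(scores)
print(summary)
```

If the dataset reports the population standard deviation instead, `statistics.pstdev` would be the matching call.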
Theme
Computational linguistics

Similar documents (content)

  1. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.44
    0.4377681 = sum of:
      0.4377681 = product of:
        1.8240337 = sum of:
          0.114754006 = weight(abstract_txt:distributional in 2919) [ClassicSimilarity], result of:
            0.114754006 = score(doc=2919,freq=1.0), product of:
              0.13029629 = queryWeight, product of:
                1.3629022 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.010176603 = queryNorm
              0.88071585 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.013453056 = weight(abstract_txt:based in 2919) [ClassicSimilarity], result of:
            0.013453056 = score(doc=2919,freq=1.0), product of:
              0.045013335 = queryWeight, product of:
                1.3874902 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.010176603 = queryNorm
              0.29886824 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.086720735 = weight(abstract_txt:expressions in 2919) [ClassicSimilarity], result of:
            0.086720735 = score(doc=2919,freq=1.0), product of:
              0.13620052 = queryWeight, product of:
                1.9706208 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.010176603 = queryNorm
              0.6367137 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.17931172 = weight(abstract_txt:multiword in 2919) [ClassicSimilarity], result of:
            0.17931172 = score(doc=2919,freq=1.0), product of:
              0.22105615 = queryWeight, product of:
                2.5105274 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.010176603 = queryNorm
              0.8111592 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.52238387 = weight(abstract_txt:mwes in 2919) [ClassicSimilarity], result of:
            0.52238387 = score(doc=2919,freq=2.0), product of:
              0.40967903 = queryWeight, product of:
                4.185826 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.010176603 = queryNorm
              1.2751052 = fieldWeight in 2919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.9074103 = weight(abstract_txt:compositionality in 2919) [ClassicSimilarity], result of:
            0.9074103 = score(doc=2919,freq=2.0), product of:
              0.70189035 = queryWeight, product of:
                7.0732384 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.010176603 = queryNorm
              1.2928092 = fieldWeight in 2919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
        0.24 = coord(6/25)
    
  2. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.19
    0.19384779 = sum of:
      0.19384779 = product of:
        0.69231355 = sum of:
          0.0073060286 = weight(abstract_txt:model in 1536) [ClassicSimilarity], result of:
            0.0073060286 = score(doc=1536,freq=1.0), product of:
              0.04692006 = queryWeight, product of:
                1.156626 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.010176603 = queryNorm
              0.15571226 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.007840134 = weight(abstract_txt:many in 1536) [ClassicSimilarity], result of:
            0.007840134 = score(doc=1536,freq=1.0), product of:
              0.049179785 = queryWeight, product of:
                1.1841507 = boost
                4.081096 = idf(docFreq=2029, maxDocs=44218)
                0.010176603 = queryNorm
              0.15941782 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.081096 = idf(docFreq=2029, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.015854578 = weight(abstract_txt:based in 1536) [ClassicSimilarity], result of:
            0.015854578 = score(doc=1536,freq=8.0), product of:
              0.045013335 = queryWeight, product of:
                1.3874902 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.010176603 = queryNorm
              0.35221958 = fieldWeight in 1536, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.102201365 = weight(abstract_txt:expressions in 1536) [ClassicSimilarity], result of:
            0.102201365 = score(doc=1536,freq=8.0), product of:
              0.13620052 = queryWeight, product of:
                1.9706208 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.010176603 = queryNorm
              0.75037426 = fieldWeight in 1536, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.24779573 = weight(abstract_txt:multiword in 1536) [ClassicSimilarity], result of:
            0.24779573 = score(doc=1536,freq=11.0), product of:
              0.22105615 = queryWeight, product of:
                2.5105274 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.010176603 = queryNorm
              1.1209629 = fieldWeight in 1536, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.04473785 = weight(abstract_txt:semantic in 1536) [ClassicSimilarity], result of:
            0.04473785 = score(doc=1536,freq=3.0), product of:
              0.14778396 = queryWeight, product of:
                3.2456174 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.010176603 = queryNorm
              0.30272466 = fieldWeight in 1536, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.2665779 = weight(abstract_txt:mwes in 1536) [ClassicSimilarity], result of:
            0.2665779 = score(doc=1536,freq=3.0), product of:
              0.40967903 = queryWeight, product of:
                4.185826 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.010176603 = queryNorm
              0.65069944 = fieldWeight in 1536, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
        0.28 = coord(7/25)
    
  3. Rayson, P.; Piao, S.; Sharoff, S.; Evert, S.; Moiron, B.V.: Multiword expressions : hard going or plain sailing? (2015) 0.18
    0.17982027 = sum of:
      0.17982027 = product of:
        1.1238767 = sum of:
          0.102201365 = weight(abstract_txt:expressions in 2918) [ClassicSimilarity], result of:
            0.102201365 = score(doc=2918,freq=2.0), product of:
              0.13620052 = queryWeight, product of:
                1.9706208 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.010176603 = queryNorm
              0.75037426 = fieldWeight in 2918, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.078125 = fieldNorm(doc=2918)
          0.051658824 = weight(abstract_txt:semantic in 2918) [ClassicSimilarity], result of:
            0.051658824 = score(doc=2918,freq=1.0), product of:
              0.14778396 = queryWeight, product of:
                3.2456174 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.010176603 = queryNorm
              0.34955636 = fieldWeight in 2918, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.078125 = fieldNorm(doc=2918)
          0.43531987 = weight(abstract_txt:mwes in 2918) [ClassicSimilarity], result of:
            0.43531987 = score(doc=2918,freq=2.0), product of:
              0.40967903 = queryWeight, product of:
                4.185826 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.010176603 = queryNorm
              1.0625876 = fieldWeight in 2918, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.078125 = fieldNorm(doc=2918)
          0.53469664 = weight(abstract_txt:compositionality in 2918) [ClassicSimilarity], result of:
            0.53469664 = score(doc=2918,freq=1.0), product of:
              0.70189035 = queryWeight, product of:
                7.0732384 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.010176603 = queryNorm
              0.7617951 = fieldWeight in 2918, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.078125 = fieldNorm(doc=2918)
        0.16 = coord(4/25)
    
  4. Kiela, D.; Clark, S.: Detecting compositionality of multi-word expressions using nearest neighbours in vector space models (2013) 0.18
    0.17640804 = sum of:
      0.17640804 = product of:
        1.470067 = sum of:
          0.10117419 = weight(abstract_txt:expressions in 1161) [ClassicSimilarity], result of:
            0.10117419 = score(doc=1161,freq=1.0), product of:
              0.13620052 = queryWeight, product of:
                1.9706208 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.010176603 = queryNorm
              0.74283266 = fieldWeight in 1161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.109375 = fieldNorm(doc=1161)
          0.07232235 = weight(abstract_txt:semantic in 1161) [ClassicSimilarity], result of:
            0.07232235 = score(doc=1161,freq=1.0), product of:
              0.14778396 = queryWeight, product of:
                3.2456174 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.010176603 = queryNorm
              0.4893789 = fieldWeight in 1161, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.109375 = fieldNorm(doc=1161)
          1.2965704 = weight(abstract_txt:compositionality in 1161) [ClassicSimilarity], result of:
            1.2965704 = score(doc=1161,freq=3.0), product of:
              0.70189035 = queryWeight, product of:
                7.0732384 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.010176603 = queryNorm
              1.847255 = fieldWeight in 1161, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.109375 = fieldNorm(doc=1161)
        0.12 = coord(3/25)
    
  5. Nissim, M.; Zaninello, A.: Modeling the internal variability of multiword expressions through a pattern-based method (2013) 0.14
    0.14453508 = sum of:
      0.14453508 = product of:
        0.7226754 = sum of:
          0.012683662 = weight(abstract_txt:based in 990) [ClassicSimilarity], result of:
            0.012683662 = score(doc=990,freq=2.0), product of:
              0.045013335 = queryWeight, product of:
                1.3874902 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.010176603 = queryNorm
              0.28177565 = fieldWeight in 990, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
          0.04012853 = weight(abstract_txt:modeling in 990) [ClassicSimilarity], result of:
            0.04012853 = score(doc=990,freq=1.0), product of:
              0.1067726 = queryWeight, product of:
                1.7447916 = boost
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.010176603 = queryNorm
              0.37583172 = fieldWeight in 990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
          0.057813823 = weight(abstract_txt:expressions in 990) [ClassicSimilarity], result of:
            0.057813823 = score(doc=990,freq=1.0), product of:
              0.13620052 = queryWeight, product of:
                1.9706208 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.010176603 = queryNorm
              0.4244758 = fieldWeight in 990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
          0.11954115 = weight(abstract_txt:multiword in 990) [ClassicSimilarity], result of:
            0.11954115 = score(doc=990,freq=1.0), product of:
              0.22105615 = queryWeight, product of:
                2.5105274 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.010176603 = queryNorm
              0.5407728 = fieldWeight in 990, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
          0.49250823 = weight(abstract_txt:mwes in 990) [ClassicSimilarity], result of:
            0.49250823 = score(doc=990,freq=4.0), product of:
              0.40967903 = queryWeight, product of:
                4.185826 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.010176603 = queryNorm
              1.2021807 = fieldWeight in 990, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.0625 = fieldNorm(doc=990)
        0.2 = coord(5/25)
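
The score breakdowns above follow a regular pattern that can be checked by hand. The sketch below recomputes the first entry (doc 2919) from the printed inputs, assuming the formulas that the listed values are consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost × idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and a final score equal to the sum of the term scores times the coord factor. The function names are illustrative; the numbers are copied from the listing.

```python
import math

def idf(doc_freq, max_docs):
    """Inverse document frequency as implied by the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

MAX_DOCS = 44218
QUERY_NORM = 0.010176603
FIELD_NORM = 0.09375  # fieldNorm(doc=2919)

# term: (freq, docFreq, boost), copied from the breakdown for doc 2919
terms = {
    "distributional": (1.0, 9, 1.3629022),
    "based": (1.0, 4958, 1.3874902),
    "expressions": (1.0, 134, 1.9706208),
    "multiword": (1.0, 20, 2.5105274),
    "mwes": (2.0, 7, 4.185826),
    "compositionality": (2.0, 6, 7.0732384),
}

total = sum(
    term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM)
    for freq, doc_freq, boost in terms.values()
)
coord = 6 / 25  # 6 of the 25 query terms matched
final_score = total * coord
print(round(total, 4), round(final_score, 4))
```

Small differences in the last digits are expected, since the listing appears to use single-precision arithmetic while Python uses double precision.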