Document (#38163)

Author
Kiela, D.
Clark, S.
Title
Detecting compositionality of multi-word expressions using nearest neighbours in vector space models
Source
http://www.cl.cam.ac.uk/~dk427/papers/emnlp2013.pdf
Year
2013
Abstract
We present a novel unsupervised approach to detecting the compositionality of multi-word expressions. We compute the compositionality of a phrase through substituting the constituent words with their "neighbours" in a semantic vector space and averaging over the distance between the original phrase and the substituted neighbour phrases. Several methods of obtaining neighbours are presented. The results are compared to existing supervised results and achieve state-of-the-art performance on a verb-object dataset of human compositionality ratings.
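The substitution procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Several choices are assumptions made only for this example:

- the toy vectors;
- a hypothetical `phrase_vecs` table of precomputed phrase vectors;
- cosine similarity as the measure;
- the neighbour count `k`;
- reporting the score as mean similarity, where higher means more compositional.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Phrase = Tuple[str, ...]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(word: str, word_vecs: Dict[str, Vector], k: int) -> List[str]:
    """Return the k words most cosine-similar to `word`, excluding the word itself."""
    target = word_vecs[word]
    scored = [(cosine(target, vec), other)
              for other, vec in word_vecs.items() if other != word]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def compositionality(phrase: Phrase, word_vecs: Dict[str, Vector],
                     phrase_vecs: Dict[Phrase, Vector], k: int = 1) -> float:
    """Mean similarity between a phrase and its neighbour-substituted variants.

    ASSUMPTION: phrase vectors come from a precomputed table (the hypothetical
    `phrase_vecs`); substituted phrases missing from it are skipped.
    """
    original = phrase_vecs[phrase]
    sims = []
    for i, word in enumerate(phrase):
        # Replace one constituent at a time with each of its k nearest neighbours.
        for neighbour in nearest_neighbours(word, word_vecs, k):
            variant = phrase[:i] + (neighbour,) + phrase[i + 1:]
            vec = phrase_vecs.get(variant)
            if vec is not None:
                sims.append(cosine(original, vec))
    return sum(sims) / len(sims) if sims else 0.0


# Toy 4-dimensional vectors, invented purely for illustration.
WORD_VECS = {
    "eat": [1.0, 0.0, 0.0, 0.0], "devour": [0.9, 0.1, 0.0, 0.0],
    "apple": [0.0, 1.0, 0.0, 0.0], "pear": [0.0, 0.9, 0.1, 0.0],
    "kick": [0.0, 0.0, 1.0, 0.0], "boot": [0.0, 0.0, 0.9, 0.1],
    "bucket": [0.0, 0.0, 0.0, 1.0], "pail": [0.0, 0.0, 0.1, 0.9],
}
PHRASE_VECS = {
    ("eat", "apple"): [1.0, 1.0, 0.0, 0.0],
    ("devour", "apple"): [0.9, 1.1, 0.0, 0.0],
    ("eat", "pear"): [1.0, 0.9, 0.1, 0.0],
    # The idiom's vector sits away from its literal substitutes.
    ("kick", "bucket"): [0.0, 0.0, 0.0, 1.0],
    ("boot", "bucket"): [0.0, 0.0, 0.9, 0.2],
    ("kick", "pail"): [0.0, 0.0, 1.0, 0.1],
}

literal = compositionality(("eat", "apple"), WORD_VECS, PHRASE_VECS, k=1)
idiom = compositionality(("kick", "bucket"), WORD_VECS, PHRASE_VECS, k=1)
print(f"eat apple: {literal:.3f}  kick bucket: {idiom:.3f}")
```

With these toy vectors, the literal phrase "eat apple" scores close to 1 because its substituted variants stay near the original. The idiom "kick bucket" scores low because "boot bucket" and "kick pail" drift away from it.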
Theme
Computational linguistics

Similar documents (author)

  1. Clark, K.: CD-ROM retrieval software : the year in review (1992) 5.12
  2. Clark, K.: CD-ROM retrieval software : the year 1992 in review (1993) 5.12
  3. Clark, A.J.: Education and training for librarianship and information work : annual bibliography, 1990 (1991) 5.12
  4. Clark, D.: Mad cows, metathesauri, and meaning (1999) 5.12
  5. Clark, K.: To cancel or not to cancel (print indexes) (1992) 5.12

Similar documents (content)

  1. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.12
  2. Rayson, P.; Piao, S.; Sharoff, S.; Evert, S.; Moiron, B.V.: Multiword expressions : hard going or plain sailing? (2015) 0.12
  3. Snajder, J.; Almic, P.: Modeling semantic compositionality of Croatian multiword expressions (2015) 0.10
  4. Mohan, K.C.: Boolean and nearest neighbour text searching in a multi-strategy retrieval system (1996) 0.09
  5. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.07