Document (#38162)

Author
Kiela, D.
Clark, S.
Title
Detecting compositionality of multi-word expressions using nearest neighbours in vector space models
Source
http://www.cl.cam.ac.uk/~dk427/papers/emnlp2013.pdf
Year
2013
Abstract
We present a novel unsupervised approach to detecting the compositionality of multi-word expressions. We compute the compositionality of a phrase through substituting the constituent words with their "neighbours" in a semantic vector space and averaging over the distance between the original phrase and the substituted neighbour phrases. Several methods of obtaining neighbours are presented. The results are compared to existing supervised results and achieve state-of-the-art performance on a verb-object dataset of human compositionality ratings.
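The substitution procedure described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say how phrase vectors are built, how neighbours are chosen, or which distance is used, so element-wise addition, cosine-ranked neighbours, cosine distance, the toy vectors, and all function names here are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def compose(u, v):
    # Assumption: a phrase vector is the element-wise sum of its word vectors.
    return [a + b for a, b in zip(u, v)]


def nearest_neighbours(word, vectors, k):
    """Return the k words most similar to `word` (assumption: cosine ranking)."""
    target = vectors[word]
    candidates = [w for w in vectors if w != word]
    candidates.sort(key=lambda w: cosine_similarity(target, vectors[w]), reverse=True)
    return candidates[:k]


def compositionality_distance(first, second, vectors, k=1):
    """Average cosine distance between a two-word phrase and the phrases
    obtained by replacing one constituent at a time with a neighbour."""
    original = compose(vectors[first], vectors[second])
    distances = []
    for neighbour in nearest_neighbours(first, vectors, k):
        substituted = compose(vectors[neighbour], vectors[second])
        distances.append(1.0 - cosine_similarity(original, substituted))
    for neighbour in nearest_neighbours(second, vectors, k):
        substituted = compose(vectors[first], vectors[neighbour])
        distances.append(1.0 - cosine_similarity(original, substituted))
    return sum(distances) / len(distances)


# Hypothetical two-dimensional vectors, invented for this example.
toy_vectors = {
    "eat": [1.0, 0.0],
    "consume": [1.0, 0.0],
    "apple": [0.0, 1.0],
    "pear": [0.0, 1.0],
}
print(compositionality_distance("eat", "apple", toy_vectors, k=1))
```

In this toy space each word has a neighbour with an identical vector, so the substituted phrases coincide with the original and the average distance is close to zero.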
Theme
Computational linguistics

Similar documents (author)

  1. Clark, K.: CD-ROM retrieval software : the year in review (1992) 5.13
    5.125237 = sum of:
      5.125237 = weight(author_txt:clark in 2338) [ClassicSimilarity], result of:
        5.125237 = fieldWeight in 2338, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.200379 = idf(docFreq=32, maxDocs=44218)
          0.625 = fieldNorm(doc=2338)
    
  2. Clark, K.: CD-ROM retrieval software : the year 1992 in review (1993) 5.13
    5.125237 = sum of:
      5.125237 = weight(author_txt:clark in 2354) [ClassicSimilarity], result of:
        5.125237 = fieldWeight in 2354, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.200379 = idf(docFreq=32, maxDocs=44218)
          0.625 = fieldNorm(doc=2354)
    
  3. Clark, A.J.: Education and training for librarianship and information work : annual bibliography, 1990 (1991) 5.13
    5.125237 = sum of:
      5.125237 = weight(author_txt:clark in 2692) [ClassicSimilarity], result of:
        5.125237 = fieldWeight in 2692, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.200379 = idf(docFreq=32, maxDocs=44218)
          0.625 = fieldNorm(doc=2692)
    
  4. Clark, D.: Mad cows, metathesauri, and meaning (1999) 5.13
    5.125237 = sum of:
      5.125237 = weight(author_txt:clark in 2728) [ClassicSimilarity], result of:
        5.125237 = fieldWeight in 2728, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.200379 = idf(docFreq=32, maxDocs=44218)
          0.625 = fieldNorm(doc=2728)
    
  5. Clark, K.: To cancel or not to cancel (print indexes) (1992) 5.13
    5.125237 = sum of:
      5.125237 = weight(author_txt:clark in 3685) [ClassicSimilarity], result of:
        5.125237 = fieldWeight in 3685, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.200379 = idf(docFreq=32, maxDocs=44218)
          0.625 = fieldNorm(doc=3685)
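
The author-match scores above all follow the same pattern: fieldWeight = tf × idf × fieldNorm. A minimal sketch that recomputes the value is given below. It assumes the classic Lucene TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the figures printed in the trees; the function names are made up for this example.

```python
import math


def classic_tf(freq):
    # Assumed ClassicSimilarity term-frequency factor: square root of the raw count.
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight as shown in the explain trees: tf * idf * fieldNorm."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the author_txt:clark entries above.
author_score = field_weight(freq=1.0, doc_freq=32, max_docs=44218, field_norm=0.625)
print(round(author_score, 4))
```

Because all five records contain the term once in a field with the same norm, they receive the same score; small differences from the printed 5.125237 can arise because Lucene computes in single precision.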
    

Similar documents (content)

  1. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.12
    0.12069381 = sum of:
      0.12069381 = product of:
        1.0057818 = sum of:
          0.07046654 = weight(abstract_txt:unsupervised in 2919) [ClassicSimilarity], result of:
            0.07046654 = score(doc=2919,freq=1.0), product of:
              0.09864925 = queryWeight, product of:
                1.1793846 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.0109779285 = queryNorm
              0.71431404 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.09981089 = weight(abstract_txt:expressions in 2919) [ClassicSimilarity], result of:
            0.09981089 = score(doc=2919,freq=1.0), product of:
              0.15675946 = queryWeight, product of:
                2.1025214 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.0109779285 = queryNorm
              0.6367137 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.8355043 = weight(abstract_txt:compositionality in 2919) [ClassicSimilarity], result of:
            0.8355043 = score(doc=2919,freq=2.0), product of:
              0.64627033 = queryWeight, product of:
                6.0373406 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0109779285 = queryNorm
              1.2928092 = fieldWeight in 2919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
        0.12 = coord(3/25)
    
  2. Rayson, P.; Piao, S.; Sharoff, S.; Evert, S.; Moiron, B.V.: Multiword expressions : hard going or plain sailing? (2015) 0.12
    0.11985874 = sum of:
      0.11985874 = product of:
        0.74911714 = sum of:
          0.06029572 = weight(abstract_txt:word in 2918) [ClassicSimilarity], result of:
            0.06029572 = score(doc=2918,freq=2.0), product of:
              0.10040383 = queryWeight, product of:
                1.682669 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0109779285 = queryNorm
              0.60053205 = fieldWeight in 2918, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.078125 = fieldNorm(doc=2918)
          0.07886755 = weight(abstract_txt:multi in 2918) [ClassicSimilarity], result of:
            0.07886755 = score(doc=2918,freq=2.0), product of:
              0.12008575 = queryWeight, product of:
                1.8402182 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.0109779285 = queryNorm
              0.6567602 = fieldWeight in 2918, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.078125 = fieldNorm(doc=2918)
          0.11762826 = weight(abstract_txt:expressions in 2918) [ClassicSimilarity], result of:
            0.11762826 = score(doc=2918,freq=2.0), product of:
              0.15675946 = queryWeight, product of:
                2.1025214 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.0109779285 = queryNorm
              0.75037426 = fieldWeight in 2918, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.078125 = fieldNorm(doc=2918)
          0.49232557 = weight(abstract_txt:compositionality in 2918) [ClassicSimilarity], result of:
            0.49232557 = score(doc=2918,freq=1.0), product of:
              0.64627033 = queryWeight, product of:
                6.0373406 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0109779285 = queryNorm
              0.7617951 = fieldWeight in 2918, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.078125 = fieldNorm(doc=2918)
        0.16 = coord(4/25)
    
  3. Snajder, J.; Almic, P.: Modeling semantic compositionality of Croatian multiword expressions (2015) 0.10
    0.10251138 = sum of:
      0.10251138 = product of:
        1.2813923 = sum of:
          0.09981089 = weight(abstract_txt:expressions in 2920) [ClassicSimilarity], result of:
            0.09981089 = score(doc=2920,freq=1.0), product of:
              0.15675946 = queryWeight, product of:
                2.1025214 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.0109779285 = queryNorm
              0.6367137 = fieldWeight in 2920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.09375 = fieldNorm(doc=2920)
          1.1815815 = weight(abstract_txt:compositionality in 2920) [ClassicSimilarity], result of:
            1.1815815 = score(doc=2920,freq=4.0), product of:
              0.64627033 = queryWeight, product of:
                6.0373406 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0109779285 = queryNorm
              1.8283083 = fieldWeight in 2920, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.09375 = fieldNorm(doc=2920)
        0.08 = coord(2/25)
    
  4. Mohan, K.C.: Boolean and nearest neighbour text searching in a multi-strategy retrieval system (1996) 0.09
    0.08848462 = sum of:
      0.08848462 = product of:
        0.4424231 = sum of:
          0.015698215 = weight(abstract_txt:results in 7255) [ClassicSimilarity], result of:
            0.015698215 = score(doc=7255,freq=1.0), product of:
              0.04121457 = queryWeight, product of:
                1.0780749 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0109779285 = queryNorm
              0.38088992 = fieldWeight in 7255, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.109375 = fieldNorm(doc=7255)
          0.10029878 = weight(abstract_txt:nearest in 7255) [ClassicSimilarity], result of:
            0.10029878 = score(doc=7255,freq=1.0), product of:
              0.1126344 = queryWeight, product of:
                1.2602134 = boost
                8.14154 = idf(docFreq=34, maxDocs=44218)
                0.0109779285 = queryNorm
              0.8904809 = fieldWeight in 7255, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.14154 = idf(docFreq=34, maxDocs=44218)
                0.109375 = fieldNorm(doc=7255)
          0.14528978 = weight(abstract_txt:neighbour in 7255) [ClassicSimilarity], result of:
            0.14528978 = score(doc=7255,freq=1.0), product of:
              0.14419958 = queryWeight, product of:
                1.4259049 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0109779285 = queryNorm
              1.0075604 = fieldWeight in 7255, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.109375 = fieldNorm(doc=7255)
          0.07807489 = weight(abstract_txt:multi in 7255) [ClassicSimilarity], result of:
            0.07807489 = score(doc=7255,freq=1.0), product of:
              0.12008575 = queryWeight, product of:
                1.8402182 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.0109779285 = queryNorm
              0.6501594 = fieldWeight in 7255, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.109375 = fieldNorm(doc=7255)
          0.10306144 = weight(abstract_txt:vector in 7255) [ClassicSimilarity], result of:
            0.10306144 = score(doc=7255,freq=1.0), product of:
              0.1445045 = queryWeight, product of:
                2.018665 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.0109779285 = queryNorm
              0.7132057 = fieldWeight in 7255, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.109375 = fieldNorm(doc=7255)
        0.2 = coord(5/25)
    
  5. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.07
    0.06757537 = sum of:
      0.06757537 = product of:
        0.28156406 = sum of:
          0.017898034 = weight(abstract_txt:distance in 1536) [ClassicSimilarity], result of:
            0.017898034 = score(doc=1536,freq=1.0), product of:
              0.07092231 = queryWeight, product of:
                6.4604454 = idf(docFreq=187, maxDocs=44218)
                0.0109779285 = queryNorm
              0.25236115 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4604454 = idf(docFreq=187, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.0056065056 = weight(abstract_txt:results in 1536) [ClassicSimilarity], result of:
            0.0056065056 = score(doc=1536,freq=1.0), product of:
              0.04121457 = queryWeight, product of:
                1.0780749 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0109779285 = queryNorm
              0.13603212 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.039015308 = weight(abstract_txt:supervised in 1536) [ClassicSimilarity], result of:
            0.039015308 = score(doc=1536,freq=2.0), product of:
              0.09463664 = queryWeight, product of:
                1.1551496 = boost
                7.462781 = idf(docFreq=68, maxDocs=44218)
                0.0109779285 = queryNorm
              0.4122643 = fieldWeight in 1536, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.462781 = idf(docFreq=68, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.080098175 = weight(abstract_txt:verb in 1536) [ClassicSimilarity], result of:
            0.080098175 = score(doc=1536,freq=5.0), product of:
              0.1126344 = queryWeight, product of:
                1.2602134 = boost
                8.14154 = idf(docFreq=34, maxDocs=44218)
                0.0109779285 = queryNorm
              0.7111342 = fieldWeight in 1536, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.14154 = idf(docFreq=34, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.021317756 = weight(abstract_txt:word in 1536) [ClassicSimilarity], result of:
            0.021317756 = score(doc=1536,freq=1.0), product of:
              0.10040383 = queryWeight, product of:
                1.682669 = boost
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0109779285 = queryNorm
              0.21232015 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4353957 = idf(docFreq=523, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.11762826 = weight(abstract_txt:expressions in 1536) [ClassicSimilarity], result of:
            0.11762826 = score(doc=1536,freq=8.0), product of:
              0.15675946 = queryWeight, product of:
                2.1025214 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.0109779285 = queryNorm
              0.75037426 = fieldWeight in 1536, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
        0.24 = coord(6/25)
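
The content-match trees combine three factors per term (queryWeight × fieldWeight), sum them, and scale the sum by coord, the fraction of query terms that matched. A minimal sketch recomputing the first result (doc 2919) is shown below. It assumes the same classic TF-IDF definitions that the printed numbers are consistent with; the helper name and the tuple layout are invented for this example.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.0109779285  # copied from the explain output


def term_score(freq, doc_freq, boost, field_norm):
    """One term's contribution: queryWeight * fieldWeight."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))  # assumed classic idf
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# (freq, docFreq, boost) for the three terms matched in doc 2919.
matched_terms = [
    (1.0, 58, 1.1793846),   # unsupervised
    (1.0, 134, 2.1025214),  # expressions
    (2.0, 6, 6.0373406),    # compositionality
]
FIELD_NORM = 0.09375

raw_sum = sum(term_score(f, df, b, FIELD_NORM) for f, df, b in matched_terms)
coord = len(matched_terms) / 25  # 3 of 25 query terms matched
final_score = raw_sum * coord
print(round(final_score, 4))
```

The coord factor explains why doc 2920, whose raw sum (1.2813923) is the largest in the list, ranks only third: it matches just 2 of the 25 query terms.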