Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007)
- Abstract
- This paper describes a fully unsupervised and automated method for large-scale extraction of multiword expressions (MWEs) from large corpora. The method aims at capturing the non-compositionality of MWEs; the intuition is that a noun within an MWE cannot easily be replaced by a semantically similar noun. To implement this intuition, a noun clustering is automatically extracted (using distributional similarity measures), which gives us clusters of semantically related nouns. Next, a number of statistical measures, based on selectional preferences, are developed that formalize the intuition of non-compositionality. Our approach has been tested on Dutch, and automatically evaluated using Dutch lexical resources.
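The substitution intuition in the abstract can be illustrated with a minimal sketch. This is not the paper's measure: the function name `substitution_resistance`, the ratio it computes, and the toy counts are all assumptions made for illustration. Given verb-noun co-occurrence counts and a set of noun clusters, it asks what share of a verb's co-occurrences with a noun's cluster falls on that one noun. A share near 1 suggests the noun is hard to replace with a similar noun, which would be consistent with an MWE reading.

```python
from collections import Counter


def substitution_resistance(pair_counts, clusters, verb, noun):
    """Hypothetical score: share of the verb's co-occurrences with the
    noun's cluster that fall on this particular noun (0.0 to 1.0)."""
    # Find the cluster containing the noun; fall back to a singleton cluster.
    cluster = next((c for c in clusters if noun in c), {noun})
    # Counter returns 0 for unseen (verb, noun) pairs.
    total = sum(pair_counts[(verb, n)] for n in cluster)
    if total == 0:
        return 0.0
    return pair_counts[(verb, noun)] / total


# Invented toy counts, for illustration only (not corpus data).
pair_counts = Counter({
    ("kick", "bucket"): 40, ("kick", "pail"): 1, ("kick", "basket"): 1,
    ("fill", "bucket"): 10, ("fill", "pail"): 9, ("fill", "basket"): 11,
})
# One assumed cluster of semantically related nouns.
clusters = [{"bucket", "pail", "basket"}]

idiomatic = substitution_resistance(pair_counts, clusters, "kick", "bucket")
literal = substitution_resistance(pair_counts, clusters, "fill", "bucket")
print(f"kick+bucket: {idiomatic:.2f}, fill+bucket: {literal:.2f}")
```

In this toy setting the idiomatic pair concentrates almost all of its cluster mass on one noun, while the literal pair spreads it evenly across the cluster. A real system would derive the clusters from distributional similarity and would likely use a measure grounded in selectional preferences, as the abstract describes.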