WordHoard: finding multiword units (20??)
- Abstract
- WordHoard defines a multiword unit as a special type of collocate in which the component words comprise a meaningful phrase. For example, "Knight of the Round Table" is a meaningful multiword unit or phrase. WordHoard uses the notion of a pseudo-bigram to generalize the computation of bigram (two-word) statistical measures to phrases (n-grams) longer than two words, and to allow comparisons of these measures for phrases with different word counts. WordHoard applies the LocalMaxs algorithm of Silva et al. to the pseudo-bigrams to identify potential compositional phrases that "stand out" in a text. WordHoard can also filter two- and three-word phrases using the word class filters suggested by Justeson and Katz.
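The abstract names three techniques without showing how they fit together. What follows is a minimal Python sketch under stated assumptions, not WordHoard's implementation. It uses symmetric conditional probability (SCP) as the "glue" measure. It realizes the pseudo-bigram idea through Silva and Lopes' fair dispersion point normalization: split the n-gram into a left and a right part at every split point, then average the left × right probabilities. It applies the strict form of LocalMaxs and encodes the Justeson and Katz patterns as one-letter tag strings. The function names, the `max_n` and `min_freq` parameters, and the toy corpus are hypothetical, and WordHoard's own measures and thresholds may differ.

```python
# Hypothetical sketch: SCP glue + pseudo-bigram averaging + LocalMaxs.
# Names and parameters are illustrative, not WordHoard's API.
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Gram = Tuple[str, ...]


def count_ngrams(tokens: Sequence[str], max_len: int) -> Counter:
    """Count every contiguous n-gram of length 1..max_len."""
    counts: Counter = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def scp_f(gram: Gram, counts: Counter, total: int) -> float:
    """SCP with fair dispersion point normalization: treat the n-gram as a
    (left, right) pseudo-bigram at each of its n-1 split points and average
    the left * right probability products."""
    n = len(gram)
    p = counts[gram] / total
    avp = sum((counts[gram[:i]] / total) * (counts[gram[i:]] / total)
              for i in range(1, n)) / (n - 1)
    return p * p / avp


def extract_mwus(tokens: Sequence[str], max_n: int = 4,
                 min_freq: int = 2) -> List[Gram]:
    """Strict LocalMaxs: keep an n-gram whose glue is >= that of its two
    (n-1)-gram parts and > that of every observed (n+1)-gram containing it."""
    total = len(tokens)
    # Count one word beyond max_n so the longest candidates have supersets.
    counts = count_ngrams(tokens, max_n + 1)
    glue = {g: scp_f(g, counts, total) for g in counts if len(g) >= 2}

    # Map each n-gram to the (n+1)-grams that extend it by one word.
    supersets: Dict[Gram, List[Gram]] = {}
    for g in glue:
        if len(g) >= 3:
            for sub in (g[:-1], g[1:]):
                supersets.setdefault(sub, []).append(g)

    selected: List[Gram] = []
    for g, score in glue.items():
        # min_freq is an assumed threshold to skip one-off n-grams.
        if len(g) > max_n or counts[g] < min_freq:
            continue
        bigger = max((glue[y] for y in supersets.get(g, [])),
                     default=float("-inf"))
        if score <= bigger:
            continue
        if len(g) > 2 and max(glue[g[:-1]], glue[g[1:]]) > score:
            continue
        selected.append(g)
    return selected


# Justeson and Katz's patterns for two- and three-word terms
# (A = adjective, N = noun, P = preposition); one-letter tags are assumed.
JK_PATTERNS = {"AN", "NN", "AAN", "ANN", "NAN", "NNN", "NPN"}


def passes_jk_filter(tags: Sequence[str]) -> bool:
    """True if a phrase's coarse tags match a Justeson-Katz pattern."""
    return "".join(tags) in JK_PATTERNS


if __name__ == "__main__":
    text = "x new york city y new york city z new york city"
    print(extract_mwus(text.split()))  # prints [('new', 'york', 'city')]
    print(passes_jk_filter(["N", "P", "N"]))  # prints True
```

On the toy corpus, "new york" and "york city" are rejected because the trigram containing them has equal glue. "new york city" is kept because it beats every four-word extension. This is the sense in which LocalMaxs picks out phrases that "stand out."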