WordHoard: finding multiword units (20??)
- Abstract
- WordHoard defines a multiword unit as a special type of collocate in which the component words form a meaningful phrase. For example, "Knight of the Round Table" is a meaningful multiword unit or phrase. WordHoard uses the notion of a pseudo-bigram to generalize the computation of bigram (two-word) statistical measures to phrases (n-grams) longer than two words, and to allow these measures to be compared across phrases with different word counts. WordHoard applies the LocalMaxs algorithm of Silva et al. to the pseudo-bigrams to identify potential compositional phrases that "stand out" in a text. WordHoard can also filter two- and three-word phrases using the word class filters suggested by Justeson and Katz.
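The pseudo-bigram idea and the LocalMaxs selection rule described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not WordHoard's implementation: the association measure used here is the "fair" symmetric conditional probability commonly attributed to Silva et al., which averages over every way of splitting an n-gram into a left and a right part; the function names (`ngram_counts`, `scp_f`, `local_maxs`) and the `min_count` threshold are hypothetical; and probabilities are estimated simply as count divided by the number of tokens.

```python
from collections import Counter


def ngram_counts(tokens, max_n):
    """Count every n-gram of length 1..max_n in the token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def scp_f(gram, counts, total):
    """Pseudo-bigram association score for an n-gram (n >= 2).

    The n-gram is treated as a bigram (left part, right part) at each of its
    n-1 split points, and the products of the part probabilities are averaged.
    Assumption: probabilities are approximated as count / total tokens.
    """
    n = len(gram)

    def p(g):
        return counts[g] / total

    avg_split = sum(p(gram[:i]) * p(gram[i:]) for i in range(1, n)) / (n - 1)
    p_gram = p(gram)
    return p_gram * p_gram / avg_split


def local_maxs(tokens, max_n=4, min_count=2):
    """Return n-grams whose score is a local maximum.

    An n-gram is kept when its score is >= the score of each (n-1)-gram it
    contains and > the score of each (n+1)-gram that contains it. Bigrams
    have no scored sub-grams, so only the second condition applies to them.
    """
    counts = ngram_counts(tokens, max_n + 1)
    total = len(tokens)
    glue = {g: scp_f(g, counts, total) for g in counts if len(g) >= 2}

    units = []
    for g, score in glue.items():
        n = len(g)
        if n > max_n or counts[g] < min_count:
            continue
        subs = [g[:-1], g[1:]] if n > 2 else []
        supers = [h for h in glue
                  if len(h) == n + 1 and (h[:-1] == g or h[1:] == g)]
        if (all(score >= glue[s] for s in subs)
                and all(score > glue[h] for h in supers)):
            units.append(g)
    return units


tokens = "the round table a round table of round table".split()
print(local_maxs(tokens))  # [('round', 'table')]
```

In this toy input, "round table" scores 1.0 while every three-word phrase containing it scores 1/3, so the bigram stands out as a local maximum. Because every phrase length is reduced to the same pseudo-bigram form, scores for two-, three-, and four-word phrases can be compared directly, which is what the LocalMaxs comparison relies on.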