Sun, Q.; Shaw, D.; Davis, C.H.: A model for estimating the occurrence of same-frequency words and the boundary between high- and low-frequency words in texts (1999)
- Abstract
- A simpler model is proposed for estimating the frequency of any same-frequency words and identifying the boundary point between high-frequency words and low-frequency words in a text. The model, based on a 'maximum-ranking method', assigns ranks to the words and estimates word frequency by a formula. The boundary value between high-frequency and low-frequency words is obtained by taking the square root of the number of different words in the text. This straightforward model was used successfully with both English and Chinese texts.
- Source
- Journal of the American Society for Information Science. 50(1999) no.3, p.280-286
- Type
- a
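
The boundary rule stated in the abstract (square root of the number of different words) can be illustrated with a minimal Python sketch. Several things here are assumptions that the abstract does not confirm: the function name `split_by_frequency` and the regex tokenizer are hypothetical, the boundary is read as a frequency threshold, and words whose count equals the boundary are placed on the low-frequency side. The 'maximum-ranking method' and the frequency-estimation formula are not reproduced, since the abstract does not give them.

```python
import math
import re
from collections import Counter


def split_by_frequency(text):
    """Split the vocabulary of `text` at sqrt(number of distinct words).

    Assumption: the boundary is interpreted as a frequency threshold;
    the abstract only says how the boundary value is computed.
    """
    # Hypothetical tokenizer: lowercase runs of ASCII letters.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)

    # Boundary value as described in the abstract.
    boundary = math.sqrt(len(counts))

    # Assumption: ties (count == boundary) are treated as low-frequency.
    high = {w for w, c in counts.items() if c > boundary}
    low = {w for w, c in counts.items() if c <= boundary}
    return boundary, high, low


boundary, high, low = split_by_frequency("a a a b b c d")
print(boundary, sorted(high), sorted(low))
```

For the sample string there are four distinct words, so the boundary is 2.0; only the word occurring three times falls above it. A tokenizer for Chinese text would need a segmentation step that this sketch does not attempt.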