Brin, S.: Extracting patterns and relations from the World Wide Web (1999)
- Abstract
- The WWW is a vast resource for information. At the same time, it is extremely distributed. A particular type of data, such as restaurant lists, may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of extracting a relation for such a data type from all of these sources automatically. We present a technique which exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique, we use it to extract a relation of (author, title) pairs from the WWW.
- Source
- The World Wide Web and Databases: International Workshop WebDB'98, Valencia, Spain, March 27-28, 1998, Selected papers. Eds.: P. Atzeni et al.
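
The pattern/relation duality described in the abstract can be illustrated with a toy bootstrapping loop. The following is only a minimal sketch and not the paper's actual implementation: the corpus, the seed pair, the function names (`find_contexts`, `extract_pairs`, `grow_relation`), the fixed context width, and the assumption that the title always precedes the author are all hypothetical simplifications made for illustration.

```python
import re

# Hypothetical toy corpus: two "sites" that format book listings differently.
CORPUS = [
    "<li><b>Foundation</b> by Isaac Asimov</li>",
    "<li><b>Dune</b> by Frank Herbert</li>",
    "Title: Dune | Author: Frank Herbert;",
    "Title: Neuromancer | Author: William Gibson;",
]

# A small seed sample of (author, title) pairs.
SEEDS = {("Isaac Asimov", "Foundation")}


def find_contexts(corpus, pairs, width=4):
    """Relation -> patterns: collect (prefix, middle, suffix) contexts
    around every occurrence of a known pair.
    Assumption: the title appears before the author in the text."""
    contexts = set()
    for text in corpus:
        for author, title in pairs:
            occurrence = re.escape(title) + r"(.{1,20}?)" + re.escape(author)
            for m in re.finditer(occurrence, text):
                prefix = text[max(0, m.start() - width):m.start()]
                suffix = text[m.end():m.end() + width]
                # Skip contexts with no delimiter on either side; they
                # would give the pattern nothing to anchor on.
                if prefix and suffix:
                    contexts.add((prefix, m.group(1), suffix))
    return contexts


def extract_pairs(corpus, contexts):
    """Patterns -> relation: apply each context as a regular expression
    and collect the (author, title) pairs it matches."""
    pairs = set()
    for prefix, middle, suffix in contexts:
        regex = re.compile(
            re.escape(prefix) + r"(.+?)" + re.escape(middle)
            + r"(.+?)" + re.escape(suffix)
        )
        for text in corpus:
            for m in regex.finditer(text):
                pairs.add((m.group(2), m.group(1)))
    return pairs


def grow_relation(corpus, seeds, max_rounds=5):
    """Alternate between the two steps until no new pairs are found."""
    relation = set(seeds)
    for _ in range(max_rounds):
        found = extract_pairs(corpus, find_contexts(corpus, relation))
        if found <= relation:
            break
        relation |= found
    return relation


relation = grow_relation(CORPUS, SEEDS)
```

In this sketch the single seed yields a pattern for the first listing format, which finds a second pair; that pair also occurs in the other format, which yields a second pattern and a third pair. A realistic system would also need to guard against overly general patterns, which this toy version does not attempt.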