Document (#21971)

Author
Brin, S.
Title
Extracting patterns and relations from the World Wide Web
Source
The World Wide Web and Databases: International Workshop WebDB'98, Valencia, Spain, March 27-28, 1998, Selected papers. Eds.: P. Atzeni et al.
Imprint
Berlin : Springer
Year
1999
Pages
pp. 172-183
Series
Lecture notes in computer science; vol.1590
Abstract
The WWW is a vast resource for information. At the same time it is extremely distributed. A particular type of data such as restaurant lists may be scattered across thousands of independent information sources in many different formats. In this paper, we consider the problem of extracting a relation for such a data type from all of these sources automatically. We present a technique which exploits the duality between sets of patterns and relations to grow the target relation starting from a small sample. To test our technique we use it to extract a relation of (author, title) pairs from the WWW.
Theme
Internet
Object
WWW
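
Illustration (not part of the catalogue record): the pattern/relation duality mentioned in the abstract can be pictured as a loop that alternates between two steps. Known pairs are located in text to induce patterns, and those patterns are then matched against the text to find new pairs. The sketch below is a minimal toy under strong assumptions and is not the paper's implementation: the corpus, the seed pair, the function names, and the choice of whole-line (prefix, middle, suffix) patterns with the title preceding the author are all hypothetical, and no pattern-specificity filtering is attempted.

```python
import re

# Hypothetical toy corpus: two page formats that share one book.
TOY_CORPUS = [
    "<li><b>Foundation</b> by Isaac Asimov</li>",
    "<li><b>Dune</b> by Frank Herbert</li>",
    "<p>Dune, written by Frank Herbert.</p>",
    "<p>Neuromancer, written by William Gibson.</p>",
]

# Hypothetical seed relation of (author, title) pairs.
SEED = {("Isaac Asimov", "Foundation")}


def induce_patterns(corpus, relation):
    """Relation -> patterns: record the text around each known pair."""
    patterns = set()
    for line in corpus:
        for author, title in relation:
            i = line.find(title)
            if i < 0:
                continue
            # Simplifying assumption: the author follows the title.
            j = line.find(author, i + len(title))
            if j < 0:
                continue
            prefix = line[:i]
            middle = line[i + len(title):j]
            suffix = line[j + len(author):]
            patterns.add((prefix, middle, suffix))
    return patterns


def apply_patterns(corpus, patterns):
    """Patterns -> relation: extract every pair that a pattern matches."""
    found = set()
    for prefix, middle, suffix in patterns:
        regex = re.compile(
            re.escape(prefix) + r"(.+?)" + re.escape(middle)
            + r"(.+?)" + re.escape(suffix)
        )
        for line in corpus:
            m = regex.fullmatch(line)
            if m:
                # Group 1 is the title, group 2 the author.
                found.add((m.group(2), m.group(1)))
    return found


def grow_relation(corpus, seed, max_rounds=10):
    """Alternate the two steps until no new pairs appear."""
    relation = set(seed)
    for _ in range(max_rounds):
        patterns = induce_patterns(corpus, relation)
        grown = relation | apply_patterns(corpus, patterns)
        if grown == relation:
            break
        relation = grown
    return relation


if __name__ == "__main__":
    for author, title in sorted(grow_relation(TOY_CORPUS, SEED)):
        print(author, "-", title)
```

In this toy run the seed pair yields the list-item pattern, which finds the second book. That book also appears in the paragraph format, which yields a second pattern and, through it, the third book. A realistic system would need to guard against overly general patterns, which this sketch does not do.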

Similar documents (content)

  1. Tsuji, K.; Kageura, K.: Automatic generation of Japanese-English bilingual thesauri based on bilingual corpora (2006) 0.17
    0.1683959 = sum of:
      0.1683959 = product of:
        0.60141397 = sum of:
          0.043844048 = weight(abstract_txt:independent in 5061) [ClassicSimilarity], result of:
            0.043844048 = score(doc=5061,freq=1.0), product of:
              0.12066689 = queryWeight, product of:
                1.0158108 = boost
                5.813565 = idf(docFreq=358, maxDocs=44218)
                0.02043303 = queryNorm
              0.3633478 = fieldWeight in 5061, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.813565 = idf(docFreq=358, maxDocs=44218)
                0.0625 = fieldNorm(doc=5061)
          0.09060281 = weight(abstract_txt:extract in 5061) [ClassicSimilarity], result of:
            0.09060281 = score(doc=5061,freq=2.0), product of:
              0.15538132 = queryWeight, product of:
                1.1527051 = boost
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.02043303 = queryNorm
              0.5830997 = fieldWeight in 5061, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.0625 = fieldNorm(doc=5061)
          0.14026761 = weight(abstract_txt:pairs in 5061) [ClassicSimilarity], result of:
            0.14026761 = score(doc=5061,freq=4.0), product of:
              0.16504383 = queryWeight, product of:
                1.1880054 = boost
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.02043303 = queryNorm
              0.84988093 = fieldWeight in 5061, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.0625 = fieldNorm(doc=5061)
          0.017939942 = weight(abstract_txt:such in 5061) [ClassicSimilarity], result of:
            0.017939942 = score(doc=5061,freq=1.0), product of:
              0.08379248 = queryWeight, product of:
                1.1971163 = boost
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.02043303 = queryNorm
              0.21409966 = fieldWeight in 5061, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.0625 = fieldNorm(doc=5061)
          0.06563723 = weight(abstract_txt:patterns in 5061) [ClassicSimilarity], result of:
            0.06563723 = score(doc=5061,freq=1.0), product of:
              0.19895637 = queryWeight, product of:
                1.8446447 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.02043303 = queryNorm
              0.32990766 = fieldWeight in 5061, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.0625 = fieldNorm(doc=5061)
          0.032640934 = weight(abstract_txt:from in 5061) [ClassicSimilarity], result of:
            0.032640934 = score(doc=5061,freq=3.0), product of:
              0.10909437 = queryWeight, product of:
                1.9317456 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.02043303 = queryNorm
              0.29919907 = fieldWeight in 5061, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=5061)
          0.21048135 = weight(abstract_txt:extracting in 5061) [ClassicSimilarity], result of:
            0.21048135 = score(doc=5061,freq=2.0), product of:
              0.34339195 = queryWeight, product of:
                2.423421 = boost
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.02043303 = queryNorm
              0.6129478 = fieldWeight in 5061, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.0625 = fieldNorm(doc=5061)
        0.28 = coord(7/25)
    
  2. Collovini de Abreu, S.; Vieira, R.: RelP: Portuguese open relation extraction (2017) 0.16
    0.16180456 = sum of:
      0.16180456 = product of:
        0.5778734 = sum of:
          0.016573992 = weight(abstract_txt:data in 3621) [ClassicSimilarity], result of:
            0.016573992 = score(doc=3621,freq=1.0), product of:
              0.07948328 = queryWeight, product of:
                1.1659279 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.02043303 = queryNorm
              0.20852174 = fieldWeight in 3621, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=3621)
          0.017939942 = weight(abstract_txt:such in 3621) [ClassicSimilarity], result of:
            0.017939942 = score(doc=3621,freq=1.0), product of:
              0.08379248 = queryWeight, product of:
                1.1971163 = boost
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.02043303 = queryNorm
              0.21409966 = fieldWeight in 3621, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.0625 = fieldNorm(doc=3621)
          0.047710743 = weight(abstract_txt:sources in 3621) [ClassicSimilarity], result of:
            0.047710743 = score(doc=3621,freq=1.0), product of:
              0.16084287 = queryWeight, product of:
                1.6585734 = boost
                4.7460723 = idf(docFreq=1043, maxDocs=44218)
                0.02043303 = queryNorm
              0.29662952 = fieldWeight in 3621, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7460723 = idf(docFreq=1043, maxDocs=44218)
                0.0625 = fieldNorm(doc=3621)
          0.02665121 = weight(abstract_txt:from in 3621) [ClassicSimilarity], result of:
            0.02665121 = score(doc=3621,freq=2.0), product of:
              0.10909437 = queryWeight, product of:
                1.9317456 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.02043303 = queryNorm
              0.24429502 = fieldWeight in 3621, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=3621)
          0.076141186 = weight(abstract_txt:relations in 3621) [ClassicSimilarity], result of:
            0.076141186 = score(doc=3621,freq=1.0), product of:
              0.21965316 = queryWeight, product of:
                1.9382175 = boost
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.02043303 = queryNorm
              0.3466428 = fieldWeight in 3621, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.0625 = fieldNorm(doc=3621)
          0.07795014 = weight(abstract_txt:technique in 3621) [ClassicSimilarity], result of:
            0.07795014 = score(doc=3621,freq=1.0), product of:
              0.22311853 = queryWeight, product of:
                1.9534469 = boost
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.02043303 = queryNorm
              0.34936652 = fieldWeight in 3621, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5898643 = idf(docFreq=448, maxDocs=44218)
                0.0625 = fieldNorm(doc=3621)
          0.31490618 = weight(abstract_txt:relation in 3621) [ClassicSimilarity], result of:
            0.31490618 = score(doc=3621,freq=9.0), product of:
              0.3114546 = queryWeight, product of:
                2.826681 = boost
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.02043303 = queryNorm
              1.0110822 = fieldWeight in 3621, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.0625 = fieldNorm(doc=3621)
        0.28 = coord(7/25)
    
  3. Li, J.; Zhang, Z.; Li, X.; Chen, H.: Kernel-based learning for biomedical relation extraction (2008) 0.15
    0.1510874 = sum of:
      0.1510874 = product of:
        0.755437 = sum of:
          0.08008233 = weight(abstract_txt:extract in 1611) [ClassicSimilarity], result of:
            0.08008233 = score(doc=1611,freq=1.0), product of:
              0.15538132 = queryWeight, product of:
                1.1527051 = boost
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.02043303 = queryNorm
              0.51539224 = fieldWeight in 1611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.033314012 = weight(abstract_txt:from in 1611) [ClassicSimilarity], result of:
            0.033314012 = score(doc=1611,freq=2.0), product of:
              0.10909437 = queryWeight, product of:
                1.9317456 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.02043303 = queryNorm
              0.30536878 = fieldWeight in 1611, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.13459986 = weight(abstract_txt:relations in 1611) [ClassicSimilarity], result of:
            0.13459986 = score(doc=1611,freq=2.0), product of:
              0.21965316 = queryWeight, product of:
                1.9382175 = boost
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.02043303 = queryNorm
              0.6127837 = fieldWeight in 1611, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.18604101 = weight(abstract_txt:extracting in 1611) [ClassicSimilarity], result of:
            0.18604101 = score(doc=1611,freq=1.0), product of:
              0.34339195 = queryWeight, product of:
                2.423421 = boost
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.02043303 = queryNorm
              0.5417745 = fieldWeight in 1611, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
          0.3213998 = weight(abstract_txt:relation in 1611) [ClassicSimilarity], result of:
            0.3213998 = score(doc=1611,freq=6.0), product of:
              0.3114546 = queryWeight, product of:
                2.826681 = boost
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.02043303 = queryNorm
              1.0319315 = fieldWeight in 1611, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.078125 = fieldNorm(doc=1611)
        0.2 = coord(5/25)
    
  4. Blanco, E.; Moldovan, D.: A model for composing semantic relations (2011) 0.15
    0.14869599 = sum of:
      0.14869599 = product of:
        0.7434799 = sum of:
          0.05480506 = weight(abstract_txt:independent in 4762) [ClassicSimilarity], result of:
            0.05480506 = score(doc=4762,freq=1.0), product of:
              0.12066689 = queryWeight, product of:
                1.0158108 = boost
                5.813565 = idf(docFreq=358, maxDocs=44218)
                0.02043303 = queryNorm
              0.45418474 = fieldWeight in 4762, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.813565 = idf(docFreq=358, maxDocs=44218)
                0.078125 = fieldNorm(doc=4762)
          0.023556564 = weight(abstract_txt:from in 4762) [ClassicSimilarity], result of:
            0.023556564 = score(doc=4762,freq=1.0), product of:
              0.10909437 = queryWeight, product of:
                1.9317456 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.02043303 = queryNorm
              0.21592833 = fieldWeight in 4762, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=4762)
          0.2518133 = weight(abstract_txt:relations in 4762) [ClassicSimilarity], result of:
            0.2518133 = score(doc=4762,freq=7.0), product of:
              0.21965316 = queryWeight, product of:
                1.9382175 = boost
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.02043303 = queryNorm
              1.1464132 = fieldWeight in 4762, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.078125 = fieldNorm(doc=4762)
          0.18604101 = weight(abstract_txt:extracting in 4762) [ClassicSimilarity], result of:
            0.18604101 = score(doc=4762,freq=1.0), product of:
              0.34339195 = queryWeight, product of:
                2.423421 = boost
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.02043303 = queryNorm
              0.5417745 = fieldWeight in 4762, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.078125 = fieldNorm(doc=4762)
          0.22726397 = weight(abstract_txt:relation in 4762) [ClassicSimilarity], result of:
            0.22726397 = score(doc=4762,freq=3.0), product of:
              0.3114546 = queryWeight, product of:
                2.826681 = boost
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.02043303 = queryNorm
              0.7296857 = fieldWeight in 4762, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.078125 = fieldNorm(doc=4762)
        0.2 = coord(5/25)
    
  5. Wang, P.; Hao, T.; Yan, J.; Jin, L.: Large-scale extraction of drug-disease pairs from the medical literature (2017) 0.15
    0.1477937 = sum of:
      0.1477937 = product of:
        0.73896843 = sum of:
          0.11096533 = weight(abstract_txt:extract in 3927) [ClassicSimilarity], result of:
            0.11096533 = score(doc=3927,freq=3.0), product of:
              0.15538132 = queryWeight, product of:
                1.1527051 = boost
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.02043303 = queryNorm
              0.7141484 = fieldWeight in 3927, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5970206 = idf(docFreq=163, maxDocs=44218)
                0.0625 = fieldNorm(doc=3927)
          0.23260751 = weight(abstract_txt:pairs in 3927) [ClassicSimilarity], result of:
            0.23260751 = score(doc=3927,freq=11.0), product of:
              0.16504383 = queryWeight, product of:
                1.1880054 = boost
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.02043303 = queryNorm
              1.4093682 = fieldWeight in 3927, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                6.7990475 = idf(docFreq=133, maxDocs=44218)
                0.0625 = fieldNorm(doc=3927)
          0.032640934 = weight(abstract_txt:from in 3927) [ClassicSimilarity], result of:
            0.032640934 = score(doc=3927,freq=3.0), product of:
              0.10909437 = queryWeight, product of:
                1.9317456 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.02043303 = queryNorm
              0.29919907 = fieldWeight in 3927, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=3927)
          0.25778595 = weight(abstract_txt:extracting in 3927) [ClassicSimilarity], result of:
            0.25778595 = score(doc=3927,freq=3.0), product of:
              0.34339195 = queryWeight, product of:
                2.423421 = boost
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.02043303 = queryNorm
              0.7507047 = fieldWeight in 3927, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.0625 = fieldNorm(doc=3927)
          0.104968734 = weight(abstract_txt:relation in 3927) [ClassicSimilarity], result of:
            0.104968734 = score(doc=3927,freq=1.0), product of:
              0.3114546 = queryWeight, product of:
                2.826681 = boost
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.02043303 = queryNorm
              0.3370274 = fieldWeight in 3927, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.0625 = fieldNorm(doc=3927)
        0.2 = coord(5/25)
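
Illustration (not part of the catalogue record): the score trees above follow the classic Lucene TF-IDF formulas, where tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and the document score is coord times the sum of the per-term products. The sketch below recomputes the first entry (doc 5061) from the boost, docFreq, freq, and fieldNorm values printed in the listing. The function and variable names are hypothetical, and fieldNorm and queryNorm are taken from the listing as given rather than derived.

```python
import math

MAX_DOCS = 44218         # maxDocs as printed in the listing
QUERY_NORM = 0.02043303  # queryNorm as printed in the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (term, boost, docFreq, freq) for doc 5061, copied from the listing.
DOC_5061_TERMS = [
    ("independent", 1.0158108, 358, 1.0),
    ("extract", 1.1527051, 163, 2.0),
    ("pairs", 1.1880054, 133, 4.0),
    ("such", 1.1971163, 3909, 1.0),
    ("patterns", 1.8446447, 612, 1.0),
    ("from", 1.9317456, 7577, 3.0),
    ("extracting", 2.423421, 116, 2.0),
]
FIELD_NORM_5061 = 0.0625  # fieldNorm(doc=5061)
COORD_5061 = 7 / 25       # 7 of the 25 query terms matched

term_sum_5061 = sum(
    term_score(boost, doc_freq, freq, FIELD_NORM_5061)
    for _, boost, doc_freq, freq in DOC_5061_TERMS
)
total_5061 = COORD_5061 * term_sum_5061

if __name__ == "__main__":
    # The listing reports 0.1683959 for this document.
    print(round(total_5061, 7))
```

Small differences in the last digits are expected, since the listing's values appear to have been computed and printed in single precision while this sketch uses Python's double-precision floats.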