Search (1 result, page 1 of 1)

  • author_ss:"Blanco, L."
  • theme_ss:"Semantic Web"
  1. Blanco, L.; Bronzi, M.; Crescenzi, V.; Merialdo, P.; Papotti, P.: Flint: from Web pages to probabilistic semantic data (2012) 0.00
    0.0029745363 = product of:
      0.011898145 = sum of:
        0.011898145 = weight(_text_:information in 437) [ClassicSimilarity], result of:
          0.011898145 = score(doc=437,freq=8.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.19395474 = fieldWeight in 437, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=437)
      0.25 = coord(1/4)
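
    The arithmetic in the explain tree above can be reproduced with a short sketch. It assumes the classic TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which match the numbers shown here. The function names are hypothetical helpers, not part of any library, and the search engine computes in single precision, so the trailing digits may differ slightly.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed formula: reproduces idf(docFreq=20772, maxDocs=44218) = 1.7554779
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # Assumed formula: reproduces tf(freq=8.0) = 2.828427
    return math.sqrt(freq)


def explain_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute the final score from the leaf values of the explain tree."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm   # tf * idf * fieldNorm
    term_score = query_weight * field_weight             # score(doc=437, freq=8.0)
    return term_score * coord                            # scaled by coord(1/4)


# Leaf values copied from the explain output for doc 437, term "information".
score = explain_score(
    freq=8.0,
    doc_freq=20772,
    max_docs=44218,
    query_norm=0.034944877,
    field_norm=0.0390625,
    coord=1 / 4,
)
print(f"{score:.7f}")  # should agree with 0.0029745363 to about six decimal places
```

    Only one of the four query clauses matched, so the coord factor of 1/4 scales the term score of roughly 0.0119 down to the final value of roughly 0.0030, which the result list displays rounded as 0.00.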
    
    Abstract
    The Web is a surprisingly extensive source of information: it offers a huge number of sites containing data about a disparate range of topics. Although Web pages are built for human consumption, not for automatic processing of the data, we observe that an increasing number of Web sites deliver pages containing structured information about recognizable concepts, relevant to specific application domains, such as movies, finance, sport, products, etc. The development of scalable techniques to discover, extract, and integrate data from fairly structured large corpora available on the Web is a challenging issue, because, to cope with the scale of the Web, these activities should be accomplished automatically by domain-independent techniques. To cope with the complexity and the heterogeneity of Web data, state-of-the-art approaches focus on information organized according to specific patterns that frequently occur on the Web. Meaningful examples are WebTables, which focuses on data published in HTML tables, and information extraction systems, such as TextRunner, which exploits lexical-syntactic patterns. As noted by Cafarella et al., even if only a small fraction of the Web is organized according to these patterns, the scale of the Web means that the amount of data involved is impressive. In this chapter, we focus on methods and techniques to wring out value from the data delivered by large data-intensive Web sites.