Search (2 results, page 1 of 1)

  • theme_ss:"Internet"
  • theme_ss:"Suchtaktik"
  • year_i:[2010 TO 2020}
  1. Sanchiz, M.; Chin, J.; Chevalier, A.; Fu, W.T.; Amadieu, F.; He, J.: Searching for information on the web : impact of cognitive aging, prior domain knowledge and complexity of the search problems (2017) 0.01
    0.010669115 = product of:
      0.032007344 = sum of:
        0.032007344 = weight(_text_:on in 3294) [ClassicSimilarity], result of:
          0.032007344 = score(doc=3294,freq=8.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.29160398 = fieldWeight in 3294, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=3294)
      0.33333334 = coord(1/3)
    
    Abstract
    This study focuses on the impact of age, prior domain knowledge and cognitive abilities on performance, query production and navigation strategies during information searching. Twenty older adults and nineteen young adults had to answer 12 information search problems of varying nature within two domain knowledge: health and manga. In each domain, participants had to perform two simple fact-finding problems (keywords provided and answer directly accessible on the search engine results page), two difficult fact-finding problems (keywords had to be inferred) and two open-ended information search problems (multiple answers possible and navigation necessary). Results showed that prior domain knowledge helped older adults improve navigation (i.e. reduced the number of webpages visited and thus decreased the feeling of disorientation), query production and reformulation (i.e. they formulated semantically more specific queries, and they inferred a greater number of new keywords).
  2. Barrio, P.; Gravano, L.: Sampling strategies for information extraction over the deep web (2017) 0.01
    0.008711295 = product of:
      0.026133886 = sum of:
        0.026133886 = weight(_text_:on in 3412) [ClassicSimilarity], result of:
          0.026133886 = score(doc=3412,freq=12.0), product of:
            0.109763056 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.04990557 = queryNorm
            0.23809364 = fieldWeight in 3412, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.03125 = fieldNorm(doc=3412)
      0.33333334 = coord(1/3)
    
    Abstract
    Information extraction systems discover structured information in natural language text. Having information in structured form enables much richer querying and data mining than possible over the natural language text. However, information extraction is a computationally expensive task, and hence improving the efficiency of the extraction process over large text collections is of critical interest. In this paper, we focus on an especially valuable family of text collections, namely, the so-called deep-web text collections, whose contents are not crawlable and are only available via querying. Important steps for efficient information extraction over deep-web text collections (e.g., selecting the collections on which to focus the extraction effort, based on their contents; or learning which documents within these collections, and in which order, to process, based on their words and phrases) require having a representative document sample from each collection. These document samples have to be collected by querying the deep-web text collections, an expensive process that renders impractical the existing sampling approaches developed for other data scenarios. In this paper, we systematically study the space of query-based document sampling techniques for information extraction over the deep web. Specifically, we consider (i) alternative query execution schedules, which vary on how they account for the query effectiveness, and (ii) alternative document retrieval and processing schedules, which vary on how they distribute the extraction effort over documents. We report the results of the first large-scale experimental evaluation of sampling techniques for information extraction over the deep web. Our results show the merits and limitations of the alternative query execution and document retrieval and processing strategies, and provide a roadmap for addressing this critically important building block for efficient, scalable information extraction.
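
The two score breakdowns above can be reproduced by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas that match the printed numbers: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function name `classic_score` and the `COMMON` dictionary are hypothetical helpers introduced only for illustration; all numeric inputs are copied from the explain output for documents 3294 and 3412.

```python
import math


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute a single-term score from the values shown in an explain tree.

    Hypothetical helper: it mirrors the structure of the explain output
    (queryWeight * fieldWeight * coord), not any actual Lucene API.
    """
    tf = math.sqrt(freq)                              # tf(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                   # queryWeight
    field_weight = tf * idf * field_norm              # fieldWeight
    return query_weight * field_weight * coord


# Values shared by both results (same term, same index, same query).
COMMON = dict(doc_freq=13325, max_docs=44218, query_norm=0.04990557, coord=1 / 3)

score_3294 = classic_score(freq=8.0, field_norm=0.046875, **COMMON)
score_3412 = classic_score(freq=12.0, field_norm=0.03125, **COMMON)

print(f"doc 3294: {score_3294:.6f}")
print(f"doc 3412: {score_3412:.6f}")
```

Under these assumptions, document 3412 ranks second even though the term occurs more often in it (12 versus 8 times): its smaller fieldNorm (0.03125 versus 0.046875) outweighs the gain in tf, since sqrt(12) × 0.03125 is smaller than sqrt(8) × 0.046875.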