Search (3 results, page 1 of 1)

  • author_ss:"Baeza-Yates, R."
  • author_ss:"Navarro, G."
  1. Navarro, G.; Baeza-Yates, R.; Azevedo Arcoverde, J.M.: Matchsimile : a flexible approximate matching tool for searching proper names (2003) 0.00
    0.0024128247 = product of:
      0.0048256493 = sum of:
        0.0048256493 = product of:
          0.009651299 = sum of:
            0.009651299 = weight(_text_:a in 1420) [ClassicSimilarity], result of:
              0.009651299 = score(doc=1420,freq=14.0), product of:
                0.04772363 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.041389145 = queryNorm
                0.20223314 = fieldWeight in 1420, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1420)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We present the architecture and algorithms behind Matchsimile, an approximate string matching lookup tool especially designed for extracting person and company names from large texts. Part of a larger information extraction environment, this specific engine receives a large set of proper names to search for, a text to search, and search options; and outputs all the occurrences of the names found in the text. Beyond the similarity search capabilities applied at the intraword level, the tool considers a set of specific person name formation rules at the word level, such as combination, abbreviation, duplicity detections, ordering, word omission and insertion, among others. This engine is used in a successful commercial application (also named Matchsimile), which allows searching for lawyer names in official law publications.
    Type
    a
  2. Baeza-Yates, R.; Navarro, G.: XQL and proximal nodes (2002) 0.00
    0.0018428253 = product of:
      0.0036856506 = sum of:
        0.0036856506 = product of:
          0.0073713013 = sum of:
            0.0073713013 = weight(_text_:a in 454) [ClassicSimilarity], result of:
              0.0073713013 = score(doc=454,freq=6.0), product of:
                0.04772363 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.041389145 = queryNorm
                0.1544581 = fieldWeight in 454, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=454)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Despite the fact that several models to structure text documents and to query on this structure have been proposed in the past, a standard has emerged only relatively recently with the introduction of XML and its proposed query language XQL, on which we focus in this article. Although there exist some implementations of XQL, efficiency of the query engine is still a problem. We show in this article that an already existing model, Proximal Nodes, which was defined with the goal of efficiency in mind, can be used as an efficient query engine behind an XQL front-end.
    Type
    a
  3. Baeza-Yates, R.; Navarro, G.: Block addressing indices for approximate text retrieval (2000) 0.00
    0.001289709 = product of:
      0.002579418 = sum of:
        0.002579418 = product of:
          0.005158836 = sum of:
            0.005158836 = weight(_text_:a in 4295) [ClassicSimilarity], result of:
              0.005158836 = score(doc=4295,freq=4.0), product of:
                0.04772363 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.041389145 = queryNorm
                0.10809815 = fieldWeight in 4295, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4295)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The issue of reducing the space overhead when indexing large text databases is becoming more and more important, as text collections grow in size. Another subject, which is gaining importance as text databases grow and get more heterogeneous and error prone, is that of flexible string matching. One of the best tools to make the search more flexible is to allow a limited number of differences between the words found and those sought. This is called 'approximate text searching', which is becoming more and more popular. In recent years some indexing schemes with very low space overhead have appeared, some of them dealing with approximate searching. These low overhead indices (whose most notorious exponent is Glimpse) are modified inverted files, where space is saved by making the lists of occurrences point to text blocks instead of exact word positions. Despite their existence, little is known about the expected behaviour of these 'block addressing' indices, and even less is known when it comes to coping with approximate search. Our main contribution is an analytical study of the space-time trade-offs for indexed text searching.
    Type
    a
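
Note on the score explanations: the nested figures under each result follow the ClassicSimilarity (TF-IDF) breakdown, where tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, and fieldWeight = tf * idf * fieldNorm. The sketch below recomputes the three listed scores from the values printed above. It is only an illustration: queryNorm, fieldNorm, and the two coord(1/2) factors are copied from the listing rather than derived, the function names are hypothetical, and Python doubles will differ from Lucene's 32-bit floats in the last digits.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    """Weight of one term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight


# Values copied from the explanations above (not derived here).
DOC_FREQ = 37942
MAX_DOCS = 44218
QUERY_NORM = 0.041389145
COORD = 0.5 * 0.5  # the two coord(1/2) factors shown in each explanation

# (doc id, term frequency, fieldNorm, score reported in the listing)
RESULTS = [
    (1420, 14.0, 0.046875, 0.0024128247),
    (454, 6.0, 0.0546875, 0.0018428253),
    (4295, 4.0, 0.046875, 0.001289709),
]


def recompute_scores():
    """Return {doc id: (recomputed score, reported score)}."""
    scores = {}
    for doc_id, freq, field_norm, reported in RESULTS:
        weight = classic_term_weight(
            freq, DOC_FREQ, MAX_DOCS, QUERY_NORM, field_norm
        )
        scores[doc_id] = (weight * COORD, reported)
    return scores


if __name__ == "__main__":
    for doc_id, (computed, reported) in recompute_scores().items():
        print(f"doc={doc_id} computed={computed:.10f} reported={reported:.10f}")
```

Because the matched term is "a" (docFreq 37942 of 44218 documents), its idf is close to 1, so the ranking here is driven almost entirely by term frequency and field length rather than by topical relevance.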