Search (9 results, page 1 of 1)

Castillo, C.; Baeza-Yates, R.: Web retrieval and mining (2009) 0.02

0.02143507 = product of:
  0.10003033 = sum of:
    0.054580662 = weight(_text_:web in 3904) [ClassicSimilarity], result of:
      0.054580662 = score(doc=3904,freq=10.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.5643819 = fieldWeight in 3904, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3904)
    0.015792815 = weight(_text_:information in 3904) [ClassicSimilarity], result of:
      0.015792815 = score(doc=3904,freq=10.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.3035872 = fieldWeight in 3904, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3904)
    0.029656855 = weight(_text_:retrieval in 3904) [ClassicSimilarity], result of:
      0.029656855 = score(doc=3904,freq=4.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.33085006 = fieldWeight in 3904, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3904)
  0.21428572 = coord(3/14)
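
The breakdown above follows the TF-IDF arithmetic of Lucene's ClassicSimilarity. As a minimal sketch, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), the score for doc 3904 can be recomputed from the printed inputs. The function names are illustrative and not part of Lucene; queryNorm and fieldNorm are copied from the output rather than derived.

```python
import math

QUERY_NORM = 0.029633347  # queryNorm as printed in the explain output
MAX_DOCS = 44218          # maxDocs as printed in the explain output


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total_clauses):
    # Sum the per-term scores, then scale by coord(matched/total_clauses)
    total = sum(term_score(freq, doc_freq, field_norm) for freq, doc_freq in terms)
    return total * matched / total_clauses


# (freq, docFreq) for "web", "information", "retrieval" in doc 3904
terms_3904 = [(10.0, 4597), (10.0, 20772), (4.0, 5836)]
score_3904 = document_score(terms_3904, field_norm=0.0546875, matched=3, total_clauses=14)
print(f"{score_3904:.8f}")  # should land close to the 0.02143507 shown above
```

Lucene computes these values in 32-bit floats, so small differences in the last digits are expected when recomputing in double precision.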

Abstract: The advent of the Web in the mid-1990s, followed by its fast adoption in a relatively short time, posed significant challenges to classical information retrieval methods developed in the 1970s and the 1980s. The major challenges include that the Web is massive, dynamic, and distributed. The two main types of tasks that are carried out on the Web are searching and mining. Searching is locating information given an information need, and mining is extracting information and/or knowledge from a corpus. The metrics for success when carrying out these tasks on the Web include precision, recall (completeness), freshness, and efficiency.
Source: Encyclopedia of library and information sciences. 3rd ed. Ed.: M.J. Bates

Baeza-Yates, R.; Hurtado, C.; Mendoza, M.: Improving search engines by query clustering (2007) 0.02

0.016174633 = product of:
  0.07548162 = sum of:
    0.042278 = weight(_text_:web in 601) [ClassicSimilarity], result of:
      0.042278 = score(doc=601,freq=6.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.43716836 = fieldWeight in 601, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=601)
    0.012233062 = weight(_text_:information in 601) [ClassicSimilarity], result of:
      0.012233062 = score(doc=601,freq=6.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.23515764 = fieldWeight in 601, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=601)
    0.020970564 = weight(_text_:retrieval in 601) [ClassicSimilarity], result of:
      0.020970564 = score(doc=601,freq=2.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.23394634 = fieldWeight in 601, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=601)
  0.21428572 = coord(3/14)

Abstract: In this paper, we present a framework for clustering Web search engine queries whose aim is to identify groups of queries used to search for similar information on the Web. The framework is based on a novel term vector model of queries that integrates user selections and the content of selected documents extracted from the logs of a search engine. The query representation obtained allows us to treat query clustering similarly to standard document clustering. We study the application of the clustering framework to two problems: relevance ranking boosting and query recommendation. Finally, we evaluate with experiments the effectiveness of our approach.
Footnote: Contribution to a special topic section on "Mining Web resources for enhancing information retrieval"
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.12, pp.1793-1804

Baeza-Yates, R.; Boldi, P.; Castillo, C.: Generalizing PageRank : damping functions for link-based ranking algorithms (2006) 0.01

0.012614421 = product of:
  0.04415047 = sum of:
    0.017435152 = weight(_text_:web in 2565) [ClassicSimilarity], result of:
      0.017435152 = score(doc=2565,freq=2.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.18028519 = fieldWeight in 2565, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2565)
    0.0050448296 = weight(_text_:information in 2565) [ClassicSimilarity], result of:
      0.0050448296 = score(doc=2565,freq=2.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.09697737 = fieldWeight in 2565, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2565)
    0.014978974 = weight(_text_:retrieval in 2565) [ClassicSimilarity], result of:
      0.014978974 = score(doc=2565,freq=2.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.16710453 = fieldWeight in 2565, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2565)
    0.0066915164 = product of:
      0.020074548 = sum of:
        0.020074548 = weight(_text_:22 in 2565) [ClassicSimilarity], result of:
          0.020074548 = score(doc=2565,freq=2.0), product of:
            0.103770934 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.029633347 = queryNorm
            0.19345059 = fieldWeight in 2565, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2565)
      0.33333334 = coord(1/3)
  0.2857143 = coord(4/14)
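
This breakdown differs from the others in that one summand is itself a nested clause with its own coord(1/3) factor. A minimal sketch of that two-level arithmetic, again assuming the standard ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the helper name is illustrative and the constants are copied from the output above.

```python
import math

QUERY_NORM = 0.029633347  # queryNorm as printed
MAX_DOCS = 44218          # maxDocs as printed
FIELD_NORM = 0.0390625    # fieldNorm(doc=2565) as printed


def term_score(freq, doc_freq):
    # queryWeight * fieldWeight under the assumed ClassicSimilarity formulas
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    return (idf * QUERY_NORM) * (math.sqrt(freq) * idf * FIELD_NORM)


# Nested clause: only the term "22" matched, one of three sub-clauses
nested = term_score(2.0, 3622) * (1 / 3)

# Top level: "web", "information", "retrieval", plus the nested clause
top_level = [
    term_score(2.0, 4597),
    term_score(2.0, 20772),
    term_score(2.0, 5836),
    nested,
]
score_2565 = sum(top_level) * (4 / 14)  # coord(4/14)
print(f"{score_2565:.8f}")  # should land close to the 0.012614421 shown above
```

The inner coord is applied before the nested value enters the outer sum, which is why the "22" match contributes only about a third of its raw term score.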

Abstract: This paper introduces a family of link-based ranking algorithms that propagate page importance through links. In these algorithms there is a damping function that decreases with distance, so a direct link implies more endorsement than a link through a long path. PageRank is the most widely known ranking function of this family. The main objective of this paper is to determine whether this family of ranking techniques has some interest per se, and how different choices for the damping function impact on rank quality and on convergence speed. Even though our results suggest that PageRank can be approximated with other simpler forms of rankings that may be computed more efficiently, our focus is of more speculative nature, in that it aims at separating the kernel of PageRank, that is, link-based importance propagation, from the way propagation decays over paths. We focus on three damping functions, having linear, exponential, and hyperbolic decay on the lengths of the paths. The exponential decay corresponds to PageRank, and the other functions are new. Our presentation includes algorithms, analysis, comparisons and experiments that study their behavior under different parameters in real Web graph data. Among other results, we show how to calculate a linear approximation that induces a page ordering that is almost identical to PageRank's using a fixed small number of iterations; comparisons were performed using Kendall's tau on large domain datasets.
Date: 16. 1.2016 10:22:28
Source: http://chato.cl/papers/baeza06_general_pagerank_damping_functions_link_ranking.pdf [Proceedings of the ACM Special Interest Group on Information Retrieval (SIGIR) Conference, SIGIR'06, August 6-10, 2006, Seattle, Washington, USA]
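
The abstract's idea of a damping function over path lengths can be illustrated with a small sketch. This is not the authors' implementation: the toy graph, the function names, and the particular linear weighting are assumptions made for illustration. The sketch sums, over path lengths t, the probability mass reaching each page after t steps, weighted by damping(t). With the exponential choice (1 - alpha) * alpha^t the result agrees with an ordinary PageRank power iteration on a graph without dangling nodes.

```python
def damped_rank(out_links, damping, max_len):
    """Sum over path lengths t of damping(t) * mass reaching each node after t steps."""
    nodes = sorted(out_links)
    mass = {v: 1.0 / len(nodes) for v in nodes}  # uniform start (paths of length 0)
    rank = {v: 0.0 for v in nodes}
    for t in range(max_len + 1):
        for v in nodes:
            rank[v] += damping(t) * mass[v]
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            share = mass[u] / len(out_links[u])  # split mass evenly over out-links
            for v in out_links[u]:
                nxt[v] += share
        mass = nxt
    return rank


def pagerank(out_links, alpha, iterations):
    """Plain power iteration, assuming every node has at least one out-link."""
    nodes = sorted(out_links)
    rank = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(iterations):
        nxt = {v: (1.0 - alpha) / len(nodes) for v in nodes}
        for u in nodes:
            share = alpha * rank[u] / len(out_links[u])
            for v in out_links[u]:
                nxt[v] += share
        rank = nxt
    return rank


# Hypothetical toy graph with no dangling nodes
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
alpha = 0.85

exp_rank = damped_rank(graph, lambda t: (1 - alpha) * alpha ** t, 200)
pr = pagerank(graph, alpha, 200)

# One possible linear decay, normalized to sum to 1 over t = 0..L-1 (an assumption)
L = 10
lin_rank = damped_rank(graph, lambda t: 2 * (L - t) / (L * (L + 1)) if t < L else 0.0, L)

print(sorted(exp_rank, key=exp_rank.get, reverse=True))
print(sorted(lin_rank, key=lin_rank.get, reverse=True))
```

Swapping the damping function changes only how quickly the contribution of long paths fades; the propagation step itself stays the same.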

Harman, D.; Fox, E.; Baeza-Yates, R.; Lee, W.: Inverted files (1992) 0.00

0.0045768693 = product of:
  0.032038085 = sum of:
    0.008071727 = weight(_text_:information in 3497) [ClassicSimilarity], result of:
      0.008071727 = score(doc=3497,freq=2.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.1551638 = fieldWeight in 3497, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=3497)
    0.023966359 = weight(_text_:retrieval in 3497) [ClassicSimilarity], result of:
      0.023966359 = score(doc=3497,freq=2.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.26736724 = fieldWeight in 3497, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0625 = fieldNorm(doc=3497)
  0.14285715 = coord(2/14)

Source: Information retrieval: data structures and algorithms. Ed.: W.B. Frakes and R. Baeza-Yates

Kucukyilmaz, T.; Cambazoglu, B.B.; Aykanat, C.; Baeza-Yates, R.: A machine learning approach for result caching in web search engines (2017) 0.00

0.0038537113 = product of:
  0.026975978 = sum of:
    0.020922182 = weight(_text_:web in 5100) [ClassicSimilarity], result of:
      0.020922182 = score(doc=5100,freq=2.0), product of:
        0.09670874 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.029633347 = queryNorm
        0.21634221 = fieldWeight in 5100, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=5100)
    0.0060537956 = weight(_text_:information in 5100) [ClassicSimilarity], result of:
      0.0060537956 = score(doc=5100,freq=2.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.116372846 = fieldWeight in 5100, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=5100)
  0.14285715 = coord(2/14)

Source: Information processing and management. 53(2017) no.4, pp.834-850

Baeza-Yates, R.; Navarro, G.: Block addressing indices for approximate text retrieval (2000) 0.00

0.0034326524 = product of:
  0.024028566 = sum of:
    0.0060537956 = weight(_text_:information in 4295) [ClassicSimilarity], result of:
      0.0060537956 = score(doc=4295,freq=2.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.116372846 = fieldWeight in 4295, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=4295)
    0.01797477 = weight(_text_:retrieval in 4295) [ClassicSimilarity], result of:
      0.01797477 = score(doc=4295,freq=2.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.20052543 = fieldWeight in 4295, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=4295)
  0.14285715 = coord(2/14)

Source: Journal of the American Society for Information Science. 51(2000) no.1, pp.69-82

Navarro, G.; Baeza-Yates, R.; Azevedo Arcoverde, J.M.: Matchsimile : a flexible approximate matching tool for searching proper names (2003) 0.00

6.115257E-4 = product of:
  0.00856136 = sum of:
    0.00856136 = weight(_text_:information in 1420) [ClassicSimilarity], result of:
      0.00856136 = score(doc=1420,freq=4.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.16457605 = fieldWeight in 1420, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=1420)
  0.071428575 = coord(1/14)

Abstract: We present the architecture and algorithms behind Matchsimile, an approximate string matching lookup tool especially designed for extracting person and company names from large texts. Part of a larger information extraction environment, this specific engine receives a large set of proper names to search for, a text to search, and search options; and outputs all the occurrences of the names found in the text. Beyond the similarity search capabilities applied at the intraword level, the tool considers a set of specific person name formation rules at the word level, such as combination, abbreviation, duplicity detections, ordering, word omission and insertion, among others. This engine is used in a successful commercial application (also named Matchsimile), which allows searching for lawyer names in official law publications.
Source: Journal of the American Society for Information Science and Technology. 54(2003) no.1, pp.3-15

Baeza-Yates, R.; Navarro, G.: XQL and proximal nodes (2002) 0.00

5.04483E-4 = product of:
  0.0070627616 = sum of:
    0.0070627616 = weight(_text_:information in 454) [ClassicSimilarity], result of:
      0.0070627616 = score(doc=454,freq=2.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.13576832 = fieldWeight in 454, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=454)
  0.071428575 = coord(1/14)

Source: Journal of the American Society for Information Science and Technology. 53(2002) no.6, pp.504-514

Lehmann, J.; Castillo, C.; Lalmas, M.; Baeza-Yates, R.: Story-focused reading in online news and its potential for user engagement (2017) 0.00

3.6034497E-4 = product of:
  0.0050448296 = sum of:
    0.0050448296 = weight(_text_:information in 3529) [ClassicSimilarity], result of:
      0.0050448296 = score(doc=3529,freq=2.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.09697737 = fieldWeight in 3529, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3529)
  0.071428575 = coord(1/14)

Source: Journal of the Association for Information Science and Technology. 68(2017) no.4, pp.869-883
