Search (2 results, page 1 of 1)

  • author_ss:"Cacheda, F."
  • type_ss:"a"
  • year_i:[2000 TO 2010}
  1. Cacheda, F.; Carneiro, V.; Plachouras, V.; Ounis, I.: Performance analysis of distributed information retrieval architectures using an improved network simulation model (2007) 0.03
    0.034526248 = product of:
      0.069052495 = sum of:
        0.069052495 = product of:
          0.13810499 = sum of:
            0.13810499 = weight(_text_:network in 903) [ClassicSimilarity], result of:
              0.13810499 = score(doc=903,freq=12.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.6026149 = fieldWeight in 903, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=903)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The increasing number of documents that have to be indexed in different environments, particularly on the Web, and the lack of scalability of a single centralised index lead to the use of distributed information retrieval systems to effectively search for and locate the required information. In this study, we present several improvements over the two main bottlenecks in a distributed information retrieval system (the network and the brokers). We extend a simulation network model in order to represent a switched network. The new simulation model is validated by comparing the estimated response times with those obtained using a real system. We show that the use of a switched network reduces the saturation of the interconnection network, especially in a replicated system, and some improvements may be achieved using multicast messages and faster connections with the brokers. We also demonstrate that reducing the partial results sets will improve the response time of a distributed system by 53%, with a negligible probability of changing the system's precision and recall values. Finally, we present a simple hierarchical distributed broker model that will reduce the response times for a distributed system by 55%.
  2. Cacheda, F.; Plachouras, V.; Ounis, I.: A case study of distributed information retrieval architectures to index one terabyte of text (2005) 0.02
    0.01993374 = product of:
      0.03986748 = sum of:
        0.03986748 = product of:
          0.07973496 = sum of:
            0.07973496 = weight(_text_:network in 1042) [ClassicSimilarity], result of:
              0.07973496 = score(doc=1042,freq=4.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.34791988 = fieldWeight in 1042, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1042)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The increasing number of documents to be indexed in many environments (Web, intranets, digital libraries) and the limitations of a single centralised index (lack of scalability, server overloading and failures) lead to the use of distributed information retrieval systems to efficiently search and locate the desired information. This work is a case study of different architectures for a distributed information retrieval system, in order to provide a guide to approximate the optimal architecture with a specific set of resources. We analyse the effectiveness of a distributed, replicated and clustered architecture simulating a variable number of workstations (from 1 up to 4096). A collection of approximately 94 million documents and 1 terabyte (TB) of text is used to test the performance of the different architectures. In a purely distributed information retrieval system, the brokers become the bottleneck due to the high number of local answer sets to be sorted. In a replicated system, the network is the bottleneck due to the high number of query servers and the continuous data interchange with the brokers. Finally, we demonstrate that a clustered system will outperform a replicated system if a high number of query servers is used, essentially due to the reduction of the network load. However, a change in the distribution of the users' queries could reduce the performance of a clustered system.
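
Note on the score breakdowns above: both trees can be recomputed from their leaf values. The sketch below is illustrative only. It assumes the usual ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes queryNorm, fieldNorm and the two coord factors directly from the listing. The function name `classic_similarity_score` is hypothetical and not part of any library.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    """Recompute a single-term score from the leaf values of an explain tree.

    Hypothetical helper: the name and signature are made up for illustration.
    """
    tf = math.sqrt(freq)                              # tf(freq=...)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq=..., maxDocs=...)
    query_weight = idf * query_norm                   # queryWeight
    field_weight = tf * idf * field_norm              # fieldWeight
    score = query_weight * field_weight               # weight(_text_:network ...)
    for c in coords:                                  # each coord(1/2) factor
        score *= c
    return score


# Leaf values copied from the two explain trees in the listing.
shared = dict(doc_freq=1398, max_docs=44218, query_norm=0.05146125,
              field_norm=0.0390625, coords=(0.5, 0.5))
score_903 = classic_similarity_score(freq=12.0, **shared)
score_1042 = classic_similarity_score(freq=4.0, **shared)

print(f"doc 903:  {score_903:.6f}")
print(f"doc 1042: {score_1042:.6f}")
print(f"ratio:    {score_903 / score_1042:.4f}")
```

Every factor except the term frequency is identical for the two results, so under these assumptions the ratio of the two scores reduces to sqrt(12) / sqrt(4) = sqrt(3). The first record ranks higher solely because "network" occurs 12 times in it rather than 4.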