Search (2 results, page 1 of 1)

  • author_ss:"Can, F."
  1. Carterette, B.; Can, F.: Comparing inverted files and signature files for searching a large lexicon (2005) 0.02
    0.024285134 = product of:
      0.048570268 = sum of:
        0.048570268 = product of:
          0.097140536 = sum of:
            0.097140536 = weight(_text_:searching in 1029) [ClassicSimilarity], result of:
              0.097140536 = score(doc=1029,freq=6.0), product of:
                0.2091384 = queryWeight, product of:
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.051699217 = queryNorm
                0.46447968 = fieldWeight in 1029, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1029)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
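    The score breakdown above is Lucene ClassicSimilarity (TF-IDF) explain output: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), and the per-term weight is queryWeight (idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), halved once per coord(1/2) factor. A minimal Python sketch that reproduces the numbers for this result (the function and argument names are mine, not part of the catalogue):

      import math

      def classic_similarity_score(freq, doc_freq, max_docs,
                                   query_norm, field_norm, coord_factors):
          """Recompute a single-term ClassicSimilarity explain tree."""
          tf = math.sqrt(freq)                             # 2.4494898 for freq=6
          idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 4.0452914
          query_weight = idf * query_norm                  # 0.2091384
          field_weight = tf * idf * field_norm             # 0.46447968
          score = query_weight * field_weight              # 0.097140536
          for c in coord_factors:                          # each coord(1/2) halves the score
              score *= c
          return score

      # Result 1: weight(_text_:searching in 1029)
      print(classic_similarity_score(freq=6.0, doc_freq=2103, max_docs=44218,
                                     query_norm=0.051699217, field_norm=0.046875,
                                     coord_factors=[0.5, 0.5]))
      # -> 0.02428..., i.e. the 0.02 shown next to the result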
    
    Abstract
    Signature files and inverted files are well-known index structures. In this paper we undertake a direct comparison of the two for searching for partially-specified queries in a large lexicon stored in main memory. Using n-grams to index lexicon terms, a bit-sliced signature file can be compressed to a smaller size than an inverted file if each n-gram sets only one bit in the term signature. With a signature width less than half the number of unique n-grams in the lexicon, the signature file method is about as fast as the inverted file method, and significantly smaller. Greater flexibility in memory usage and faster index generation time make signature files appropriate for searching large lexicons or other collections in an environment where memory is at a premium.
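
    The indexing idea in the abstract, each n-gram of a lexicon term setting exactly one bit in a fixed-width term signature, can be sketched as follows; the 2-gram size, signature width, and hash-based bit assignment are illustrative assumptions, not the paper's parameters:

      def term_signature(term, width=1024, n=2):
          """Each n-gram of the term sets one bit in a width-bit signature."""
          sig = 0
          for i in range(len(term) - n + 1):
              sig |= 1 << (hash(term[i:i + n]) % width)
          return sig

      def may_match(specified_part, candidate_sig, width=1024, n=2):
          """Signature test for a partially-specified query: every n-gram of the
          specified part must be set. Bit collisions can give false positives,
          so surviving candidates are verified against the term itself."""
          q = term_signature(specified_part, width, n)
          return (q & candidate_sig) == q

      lexicon = ["searching", "search", "signature", "lexicon"]
      signatures = {t: term_signature(t) for t in lexicon}
      # Query "sea*": filter by signature, then confirm on the actual term.
      print([t for t in lexicon if may_match("sea", signatures[t]) and t.startswith("sea")])
      # -> ['searching', 'search']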
  2. Can, F.: Incremental clustering for dynamic information processing (1993) 0.02
    0.018694704 = product of:
      0.03738941 = sum of:
        0.03738941 = product of:
          0.07477882 = sum of:
            0.07477882 = weight(_text_:searching in 6627) [ClassicSimilarity], result of:
              0.07477882 = score(doc=6627,freq=2.0), product of:
                0.2091384 = queryWeight, product of:
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.051699217 = queryNorm
                0.3575566 = fieldWeight in 6627, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0452914 = idf(docFreq=2103, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6627)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
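    If the scoring helper sketched under the first result is available, the same formula reproduces this entry's score as well; only the term frequency (2 instead of 6) and the fieldNorm (0.0625 instead of 0.046875) differ:

      print(classic_similarity_score(freq=2.0, doc_freq=2103, max_docs=44218,
                                     query_norm=0.051699217, field_norm=0.0625,
                                     coord_factors=[0.5, 0.5]))
      # -> 0.01869..., i.e. the 0.02 shown next to the result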
    
    Abstract
    Clustering of very large document databases is useful for both searching and browsing. The periodic updating of clusters is required due to the dynamic nature of databases. Introduces an algorithm for incremental clustering and discusses the complexity and cost analysis of the algorithm together with an investigation of its expected behaviour. Shows through empirical testing that the algorithm achieves cost effectiveness and generates statistically valid clusters that are compatible with those of reclustering. The experimental evidence shows that the algorithm creates an effective and efficient retrieval environment.
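
    The abstract does not spell out the algorithm, and the paper's own method is not reproduced here; purely as an illustration of the incremental idea, the sketch below assigns each newly arriving document to its most similar existing cluster (or starts a new one) and recomputes only the touched centroid, so the full collection never has to be reclustered:

      import math

      def cosine(a, b):
          """Cosine similarity of two sparse {term: weight} vectors."""
          dot = sum(w * b.get(t, 0.0) for t, w in a.items())
          na = math.sqrt(sum(w * w for w in a.values()))
          nb = math.sqrt(sum(w * w for w in b.values()))
          return dot / (na * nb) if na and nb else 0.0

      def incremental_cluster(new_docs, clusters, threshold=0.3):
          """Generic incremental clustering step (illustrative, not the paper's algorithm)."""
          for doc in new_docs:                                  # doc: {term: weight}
              best = max(clusters, key=lambda c: cosine(doc, c["centroid"]), default=None)
              if best is not None and cosine(doc, best["centroid"]) >= threshold:
                  best["docs"].append(doc)
                  n = len(best["docs"])                         # recompute only this centroid
                  centroid = {}
                  for d in best["docs"]:
                      for t, w in d.items():
                          centroid[t] = centroid.get(t, 0.0) + w / n
                  best["centroid"] = centroid
              else:
                  clusters.append({"centroid": dict(doc), "docs": [doc]})
          return clusters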