Search (58 results, page 2 of 3)

  • theme_ss:"Retrievalalgorithmen"
  • type_ss:"a"
  1. Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.01
    0.0134631805 = product of:
      0.04038954 = sum of:
        0.04038954 = product of:
          0.08077908 = sum of:
            0.08077908 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
              0.08077908 = score(doc=58,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.46428138 = fieldWeight in 58, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=58)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    14. 6.2015 22:12:44
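
    Note: the score trees shown with each entry are Lucene "explain" output for the ClassicSimilarity (tf-idf) model. As a reading aid, entry 1's tree reconstructs to the worked equation below, using the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the two coord factors record that one of two clauses and one of three query parts matched:

```latex
\begin{aligned}
\mathrm{idf} &= 1 + \ln\tfrac{44218}{3622 + 1} = 3.5018296 \\
\text{queryWeight} &= \mathrm{idf} \cdot \text{queryNorm} = 3.5018296 \times 0.049684696 = 0.17398734 \\
\text{fieldWeight} &= \sqrt{2} \cdot \mathrm{idf} \cdot \text{fieldNorm} = 1.4142135 \times 3.5018296 \times 0.09375 = 0.46428138 \\
\text{score} &= \text{queryWeight} \cdot \text{fieldWeight} \cdot \tfrac{1}{2} \cdot \tfrac{1}{3} \approx 0.0134631805
\end{aligned}
```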
  2. Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.01
    0.0134631805 = product of:
      0.04038954 = sum of:
        0.04038954 = product of:
          0.08077908 = sum of:
            0.08077908 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
              0.08077908 = score(doc=2051,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.46428138 = fieldWeight in 2051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=2051)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    14. 6.2015 22:12:56
  3. Willett, P.: Best-match text retrieval (1993) 0.01
    0.0134057235 = product of:
      0.04021717 = sum of:
        0.04021717 = product of:
          0.08043434 = sum of:
            0.08043434 = weight(_text_:indexing in 7818) [ClassicSimilarity], result of:
              0.08043434 = score(doc=7818,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.42292362 = fieldWeight in 7818, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.078125 = fieldNorm(doc=7818)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Provides an introduction to the computational techniques that underlie best-match retrieval systems. Discusses: problems of traditional Boolean systems; characteristics of best-match searching; automatic indexing; term conflation; matching of documents and queries (covering similarity measures, initial weights, relevance weights, and the matching algorithm); and describes operational best-match systems.
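
    The ingredients the abstract enumerates (automatic indexing, weights, a similarity measure, a matching algorithm) reduce to a few lines in the simplest case. A minimal sketch, assuming raw term-frequency weights and an inner-product similarity; all names are illustrative, not taken from the article:

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def best_match(query, documents):
    """Rank every document by an inner-product similarity to the query,
    using raw term-frequency weights; no Boolean cutoff is applied."""
    q = Counter(tokenize(query))
    ranking = []
    for doc_id, text in documents.items():
        d = Counter(tokenize(text))
        score = sum(qw * d[t] for t, qw in q.items())  # only shared terms contribute
        ranking.append((score, doc_id))
    return sorted(ranking, reverse=True)
```

    Unlike a Boolean AND, every document sharing at least one query term is scored and ranked, which is the essence of best-match searching.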
  4. Liu, A.; Zou, Q.; Chu, W.W.: Configurable indexing and ranking for XML information retrieval (2004) 0.01
    0.0134057235 = product of:
      0.04021717 = sum of:
        0.04021717 = product of:
          0.08043434 = sum of:
            0.08043434 = weight(_text_:indexing in 4114) [ClassicSimilarity], result of:
              0.08043434 = score(doc=4114,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.42292362 = fieldWeight in 4114, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4114)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
  5. Lee, C.; Lee, G.G.: Probabilistic information retrieval model for a dependence structured indexing system (2005) 0.01
    0.013270989 = product of:
      0.039812967 = sum of:
        0.039812967 = product of:
          0.079625934 = sum of:
            0.079625934 = weight(_text_:indexing in 1004) [ClassicSimilarity], result of:
              0.079625934 = score(doc=1004,freq=4.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.41867304 = fieldWeight in 1004, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1004)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Most previous information retrieval (IR) models assume that the terms of queries and documents are statistically independent of each other. However, this conditional independence assumption is widely understood to be wrong, so we present a new method of incorporating term dependence into a probabilistic retrieval model by adapting a dependency-structured indexing system, using a dependency parse tree and the Chow Expansion to compensate for the weakness of the assumption. In this paper, we describe a theoretical process for applying the Chow Expansion to general probabilistic models and to the state-of-the-art 2-Poisson model. Through experiments on document collections in English and Korean, we demonstrate that incorporating term dependencies via the Chow Expansion improves the performance of probabilistic IR systems.
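
    The Chow Expansion builds on approximating a joint term distribution by a tree of pairwise dependencies. As a self-contained illustration of that underlying idea only, here is a sketch of Chow-Liu dependence-tree construction from binary term occurrences; the paper's actual method works on dependency parse trees and the 2-Poisson model, which this does not reproduce, and all names are illustrative:

```python
import math
from itertools import combinations

def chow_liu_tree(docs, terms):
    """Pick the spanning tree over terms that maximizes pairwise mutual
    information, estimated from binary term occurrences in the documents.
    docs: iterable of sets of terms; terms: list of vocabulary terms."""
    docs = list(docs)
    n = len(docs)
    occ = {t: [1 if t in d else 0 for d in docs] for t in terms}

    def mutual_information(a, b):
        total = 0.0
        for va in (0, 1):
            for vb in (0, 1):
                p_ab = sum(1 for x, y in zip(occ[a], occ[b])
                           if x == va and y == vb) / n
                p_a = occ[a].count(va) / n
                p_b = occ[b].count(vb) / n
                if p_ab > 0:
                    total += p_ab * math.log(p_ab / (p_a * p_b))
        return total

    edges = sorted(((mutual_information(a, b), a, b)
                    for a, b in combinations(terms, 2)), reverse=True)
    parent = {t: t for t in terms}            # union-find for Kruskal

    def find(t):
        while parent[t] != t:
            t = parent[t]
        return t

    tree = []
    for w, a, b in edges:                     # maximum-weight spanning tree
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b, w))
    return tree
```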
  6. Maron, M.E.; Kuhns, J.L.: On relevance, probabilistic indexing and information retrieval (1960) 0.01
    0.011609698 = product of:
      0.03482909 = sum of:
        0.03482909 = product of:
          0.06965818 = sum of:
            0.06965818 = weight(_text_:indexing in 1928) [ClassicSimilarity], result of:
              0.06965818 = score(doc=1928,freq=6.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.3662626 = fieldWeight in 1928, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1928)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Reports on a novel technique for literature indexing and searching in a mechanized library system. The notion of relevance is taken as the key concept in the theory of information retrieval, and a comparative concept of relevance is explicated in terms of the theory of probability. The resulting technique, called 'probabilistic indexing', allows a computing machine, given a request for information, to make a statistical inference and derive a number (called the 'relevance number') for each document, which is a measure of the probability that the document will satisfy the given request. The result of a search is an ordered list of those documents which satisfy the request, ranked according to their probable relevance. The paper goes on to show that whereas in a conventional library system the cross-referencing ('see' and 'see also') is based solely on the 'semantic closeness' between index terms, statistical measures of closeness between index terms can be defined and computed. Thus, given an arbitrary request consisting of one (or many) index term(s), a machine can elaborate on it to increase the probability of selecting relevant documents that would not otherwise have been selected. Finally, the paper suggests an interpretation of the whole library problem as one where the request is considered as a clue on the basis of which the library system makes a concatenated statistical inference in order to provide as an output an ordered list of those documents which most probably satisfy the information needs of the user.
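
    A minimal sketch of the ranking rule the abstract describes, with indexer-assigned probabilities standing in for real index data; names and data structures are illustrative:

```python
def relevance_number(request_terms, term_probs, prior):
    """Relevance number in the spirit of Maron and Kuhns: the a priori
    probability of the document times the indexer-assigned probabilities
    that a user wanting this document would phrase the request with each
    given term (Bayes' rule, up to a normalizing constant).
    term_probs: {index term: P(term used | document wanted)}."""
    p = prior
    for t in request_terms:
        p *= term_probs.get(t, 0.0)
    return p

def rank_by_relevance(request_terms, index, priors):
    """index: {doc: term_probs}; returns documents with a nonzero relevance
    number, best first (the 'ordered list' the abstract describes)."""
    scored = ((relevance_number(request_terms, tp, priors[d]), d)
              for d, tp in index.items())
    return sorted((s, d) for s, d in scored if s > 0)[::-1]
```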
  7. Hoenkamp, E.: Unitary operators on the document space (2003) 0.01
    0.011609698 = product of:
      0.03482909 = sum of:
        0.03482909 = product of:
          0.06965818 = sum of:
            0.06965818 = weight(_text_:indexing in 3457) [ClassicSimilarity], result of:
              0.06965818 = score(doc=3457,freq=6.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.3662626 = fieldWeight in 3457, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3457)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    When people search for documents, they eventually want content, not words. Hence, search engines should relate documents more by their underlying concepts than by the words they contain. One promising technique for doing so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts needed to represent the documents is far smaller than the great variety of words in the textual representation. Although this almost obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article accomplishes several things. First, it shows how the technique underlying LSI is just one example of a unitary operator, for which there are computationally more attractive alternatives. Second, it proposes the Haar transform as such an alternative, as it is memory efficient and can be computed in linear to sublinear time. Third, it generalizes LSI by a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational costs, it also opens a spectrum of possibilities for new research.
    Object
    Latent Semantic Indexing
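
    Plain LSI, the article's point of departure, is a truncated SVD of the term-document matrix; the unitary-operator view replaces U by another orthogonal transform such as the Haar wavelet. A minimal numpy sketch of the LSI baseline only, assuming rows are terms and columns are documents:

```python
import numpy as np

def lsi(term_doc_matrix, k):
    """Map documents into a k-dimensional concept space via truncated SVD.
    Returns the concept basis, concept strengths, and document coordinates."""
    U, s, Vt = np.linalg.svd(term_doc_matrix, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def query_in_concept_space(q, U_k, s_k):
    """Fold a query term-vector into the same k-dimensional concept space."""
    return (q @ U_k) / s_k
```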
  8. Baeza-Yates, R.; Navarro, G.: Block addressing indices for approximate text retrieval (2000) 0.01
    0.011375135 = product of:
      0.034125403 = sum of:
        0.034125403 = product of:
          0.068250805 = sum of:
            0.068250805 = weight(_text_:indexing in 4295) [ClassicSimilarity], result of:
              0.068250805 = score(doc=4295,freq=4.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.3588626 = fieldWeight in 4295, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4295)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The issue of reducing the space overhead when indexing large text databases is becoming more and more important as text collections grow in size. Another subject, which is gaining importance as text databases grow and become more heterogeneous and error-prone, is that of flexible string matching. One of the best tools for making the search more flexible is to allow a limited number of differences between the words found and those sought. This is called 'approximate text searching', which is becoming more and more popular. In recent years some indexing schemes with very low space overhead have appeared, some of them dealing with approximate searching. These low-overhead indices (whose best-known exponent is Glimpse) are modified inverted files, where space is saved by making the lists of occurrences point to text blocks instead of exact word positions. Despite their existence, little is known about the expected behaviour of these 'block addressing' indices, and even less is known when it comes to coping with approximate search. Our main contribution is an analytical study of the space-time trade-offs for indexed text searching.
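
    The space saving of block addressing comes from pointing postings at coarse text blocks and recovering exact positions by re-scanning only the matched blocks. A minimal exact-match sketch (an approximate variant in the Glimpse style would replace the equality test with an edit-distance check); names are illustrative:

```python
from collections import defaultdict

def build_block_index(text, block_size):
    """Block-addressing inverted file: postings record block numbers only,
    trading index space for a re-scan of matched blocks at query time."""
    words = text.split()
    blocks = [words[i:i + block_size] for i in range(0, len(words), block_size)]
    index = defaultdict(set)
    for b, block in enumerate(blocks):
        for w in block:
            index[w.lower()].add(b)           # no exact word positions stored
    return index, blocks

def search(index, blocks, word):
    """Recover exact (block, offset) positions by scanning matched blocks."""
    w = word.lower()
    hits = []
    for b in sorted(index.get(w, ())):
        hits.extend((b, i) for i, x in enumerate(blocks[b]) if x.lower() == w)
    return hits
```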
  9. Chen, H.; Zhang, Y.; Houston, A.L.: Semantic indexing and searching using a Hopfield net (1998) 0.01
    0.011375135 = product of:
      0.034125403 = sum of:
        0.034125403 = product of:
          0.068250805 = sum of:
            0.068250805 = weight(_text_:indexing in 5704) [ClassicSimilarity], result of:
              0.068250805 = score(doc=5704,freq=4.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.3588626 = fieldWeight in 5704, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5704)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Presents a neural network approach to document semantic indexing. Reports results of a study applying a Hopfield net algorithm to simulate human associative memory for concept exploration in the domain of computer science and engineering. The INSPEC database, consisting of 320,000 abstracts from leading periodical articles, was used as the document test bed. Benchmark tests confirmed that three parameters (maximum number of activated nodes, maximum allowable error, and maximum number of iterations) were useful in positively influencing network convergence behaviour without degrading central processing unit performance. Another series of benchmark tests was performed to determine the effectiveness of various filtering techniques in reducing the negative impact of noisy input terms. Preliminary user tests confirmed expectations that the Hopfield net is potentially useful as an associative memory technique for improving document recall and precision by resolving discrepancies between indexer vocabularies and end-user vocabularies.
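
    A sketch of Hopfield-style spreading activation over a term network, as the abstract outlines it: the weight matrix would come from term co-occurrence, and the threshold, error bound, and iteration cap correspond to the three reported tuning parameters, with purely illustrative values here:

```python
import math

def hopfield_expand(query_terms, weights, terms, max_iter=50, theta=0.5, eps=1e-3):
    """Clamp query terms on, push activation through co-occurrence weights
    with a sigmoidal threshold, and stop when activation converges.
    weights: {(term_u, term_t): strength}."""
    act = {t: 1.0 if t in query_terms else 0.0 for t in terms}
    for _ in range(max_iter):
        new = {}
        for t in terms:
            net = sum(weights.get((u, t), 0.0) * act[u] for u in terms)
            new[t] = 1.0 / (1.0 + math.exp(-(net - theta)))  # sigmoid threshold
        for q in query_terms:
            new[q] = 1.0                                     # keep query clamped
        converged = sum(abs(new[t] - act[t]) for t in terms) < eps
        act = new
        if converged:
            break
    return sorted(act.items(), key=lambda kv: -kv[1])        # candidate index terms
```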
  10. Ojala, M.: Commands that RANKle (1997) 0.01
    0.01072458 = product of:
      0.032173738 = sum of:
        0.032173738 = product of:
          0.064347476 = sum of:
            0.064347476 = weight(_text_:indexing in 428) [ClassicSimilarity], result of:
              0.064347476 = score(doc=428,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.3383389 = fieldWeight in 428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0625 = fieldNorm(doc=428)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Examines the RANK command on DIALOG, using a statistical analysis of articles in DATABASE as an example. The RANK command was used to find authors, company names, and lengths of articles. Use of the command revealed a number of complexities, as well as some problematic indexing on the part of the database producers. The LEXIS-NEXIS RANK command was also used, but it fulfils a different function from the command of the same name in DIALOG.
  11. Longshu, L.; Xia, Z.: On an approximate fuzzy information retrieval agent (1998) 0.01
    0.01072458 = product of:
      0.032173738 = sum of:
        0.032173738 = product of:
          0.064347476 = sum of:
            0.064347476 = weight(_text_:indexing in 3294) [ClassicSimilarity], result of:
              0.064347476 = score(doc=3294,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.3383389 = fieldWeight in 3294, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3294)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Discusses online approximate information retrieval based on fuzzy mathematics. Defines fuzzy semantics. Presents an approximate fuzzy matching algorithm and an algorithm for a fuzzy word-indexing agent for approximate retrieval. Also presents a case study demonstrating approximate fuzzy matching.
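
    The abstract does not spell out the matching algorithm; as a generic illustration of fuzzy retrieval, here is a sketch that scores documents by max-min composition over graded term memberships and keeps those above an alpha-cut. All names and parameter values are illustrative, not the article's:

```python
def fuzzy_retrieve(query, index, alpha=0.5):
    """Documents carry graded term memberships in [0, 1] rather than binary
    postings; score each document by max-min composition with the query and
    keep results above the alpha-cut.
    index: {doc: {term: membership}}; query: {term: weight in [0, 1]}."""
    results = []
    for doc, memberships in index.items():
        score = max((min(qw, memberships.get(t, 0.0))
                     for t, qw in query.items()), default=0.0)
        if score >= alpha:
            results.append((score, doc))
    return sorted(results, reverse=True)
```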
  12. Costa Carvalho, A. da; Rossi, C.; Moura, E.S. de; Silva, A.S. da; Fernandes, D.: LePrEF: Learn to precompute evidence fusion for efficient query evaluation (2012) 0.01
    0.009479279 = product of:
      0.028437834 = sum of:
        0.028437834 = product of:
          0.05687567 = sum of:
            0.05687567 = weight(_text_:indexing in 278) [ClassicSimilarity], result of:
              0.05687567 = score(doc=278,freq=4.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.29905218 = fieldWeight in 278, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=278)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    State-of-the-art search engine ranking methods combine several distinct sources of relevance evidence to produce a high-quality ranking of results for each query. The fusion of information is currently done at query-processing time, which has a direct effect on the response time of search systems. Previous research also shows that an alternative way to improve search efficiency in textual databases is to precompute term impacts at indexing time. In this article, we propose a novel alternative for precomputing term impacts, providing a generic framework for combining any distinct set of sources of evidence by using a machine-learning technique. This method retains the advantage of producing high-quality results, but avoids the cost of combining evidence at query-processing time. Our method, called Learn to Precompute Evidence Fusion (LePrEF), uses genetic programming to compute a unified precomputed impact value for each term found in each document prior to query processing, at indexing time. Compared with previous research on precomputing term impacts, our method offers the advantage of providing a generic framework to precompute impacts using any set of relevance evidence on any text collection, whereas previous approaches do not. The precomputed impact values are indexed and used later to compute document rankings at query-processing time. By doing so, our method effectively reduces query processing to simple additions of such impacts. We show that this approach, while leading to results comparable to state-of-the-art ranking methods, can also lead to a significant decrease in computational costs during query processing.
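
    The core move, fusing all evidence for each (term, document) pair into a single impact at indexing time so that query evaluation reduces to additions, can be sketched independently of the genetic-programming learner. Here 'fuse' is a placeholder for any learned or hand-written fusion function; all names are illustrative:

```python
def index_with_impacts(docs, fuse):
    """At indexing time, collapse per-term evidence (tf, anchor text,
    link scores, ...) into one precomputed impact per (term, document).
    docs: {doc_id: {term: {evidence_source: value}}}."""
    impacts = {}
    for doc_id, evidence in docs.items():
        for term, sources in evidence.items():
            impacts.setdefault(term, {})[doc_id] = fuse(sources)
    return impacts

def query_eval(query_terms, impacts):
    """Query processing reduces to simple additions of precomputed impacts."""
    scores = {}
    for t in query_terms:
        for doc_id, imp in impacts.get(t, {}).items():
            scores[doc_id] = scores.get(doc_id, 0.0) + imp
    return sorted(scores.items(), key=lambda kv: -kv[1])
```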
  13. Carpineto, C.; Romano, G.: Information retrieval through hybrid navigation of lattice representations (1996) 0.01
    0.009384007 = product of:
      0.02815202 = sum of:
        0.02815202 = product of:
          0.05630404 = sum of:
            0.05630404 = weight(_text_:indexing in 7434) [ClassicSimilarity], result of:
              0.05630404 = score(doc=7434,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.29604656 = fieldWeight in 7434, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=7434)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Presents a comprehensive approach to automatic organization and hybrid navigation of text databases. An organizing stage builds a particular lattice representation of the data, through text indexing followed by lattice clustering of the indexed texts. The lattice representation supports the navigation stage of the system, a visual retrieval interface that combines three main retrieval strategies: browsing, querying, and bounding. Such a hybrid paradigm permits high flexibility in trading off information exploration and retrieval, and has good retrieval performance. Compares information retrieval using lattice-based hybrid navigation with conventional Boolean querying. Experiments conducted on two medium-sized bibliographic databases showed that the performance of lattice retrieval was comparable to or better than Boolean retrieval.
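
    One standard construction behind such lattice representations is the formal concept lattice, whose nodes correspond to term sets closed under intersection. A naive sketch of intent enumeration; this is illustrative only, and the article's own construction and clustering are not reproduced:

```python
def all_intents(docs):
    """Enumerate the intents (closed term sets) of a concept lattice by
    closing the documents' term sets under pairwise intersection; each
    intent is one lattice node usable for clustering and browsing.
    docs: {doc_id: iterable of index terms}."""
    intents = {frozenset(ts) for ts in docs.values()}
    changed = True
    while changed:
        changed = False
        for a in list(intents):
            for b in list(intents):
                c = a & b
                if c not in intents:
                    intents.add(c)
                    changed = True
    return intents
```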
  14. Lalmas, M.; Ruthven, I.: Representing and retrieving structured documents using the Dempster-Shafer theory of evidence : modelling and evaluation (1998) 0.01
    0.009384007 = product of:
      0.02815202 = sum of:
        0.02815202 = product of:
          0.05630404 = sum of:
            0.05630404 = weight(_text_:indexing in 1076) [ClassicSimilarity], result of:
              0.05630404 = score(doc=1076,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.29604656 = fieldWeight in 1076, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1076)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Reports on a theoretical model of structured document indexing and retrieval based on the Dempster-Shafer theory of evidence. Includes a description of the model of structured document retrieval, the representation of structured documents, the representation of individual components, how components are combined, details of the combination process, and how relevance is captured within the model. Also presents a detailed account of an implementation of the model, and an evaluation scheme designed to test its effectiveness.
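
    The model's evidence-combination step rests on Dempster's rule of combination. A self-contained sketch of the rule itself, with focal sets as frozensets over the frame of discernment; the structured-document representation around it is not reproduced:

```python
def dempster_combine(m1, m2):
    """Dempster's rule: masses on intersecting focal sets reinforce each
    other, and mass assigned to conflicting (empty) intersections is
    normalized away. m1, m2: {frozenset of hypotheses: mass}, each summing
    to 1 over its focal sets."""
    combined, conflict = {}, 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict; sources cannot be combined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}
```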
  15. MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.01
    0.008975455 = product of:
      0.026926363 = sum of:
        0.026926363 = product of:
          0.053852726 = sum of:
            0.053852726 = weight(_text_:22 in 5108) [ClassicSimilarity], result of:
              0.053852726 = score(doc=5108,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.30952093 = fieldWeight in 5108, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5108)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    20. 1.2007 18:30:22
  16. Faloutsos, C.: Signature files (1992) 0.01
    0.008975455 = product of:
      0.026926363 = sum of:
        0.026926363 = product of:
          0.053852726 = sum of:
            0.053852726 = weight(_text_:22 in 3499) [ClassicSimilarity], result of:
              0.053852726 = score(doc=3499,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.30952093 = fieldWeight in 3499, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3499)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    7. 5.1999 15:22:48
  17. Losada, D.E.; Barreiro, A.: Embedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.01
    0.008975455 = product of:
      0.026926363 = sum of:
        0.026926363 = product of:
          0.053852726 = sum of:
            0.053852726 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
              0.053852726 = score(doc=1422,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.30952093 = fieldWeight in 1422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1422)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    22. 3.2003 19:27:23
  18. Bornmann, L.; Mutz, R.: From P100 to P100' : a new citation-rank approach (2014) 0.01
    0.008975455 = product of:
      0.026926363 = sum of:
        0.026926363 = product of:
          0.053852726 = sum of:
            0.053852726 = weight(_text_:22 in 1431) [ClassicSimilarity], result of:
              0.053852726 = score(doc=1431,freq=2.0), product of:
                0.17398734 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049684696 = queryNorm
                0.30952093 = fieldWeight in 1431, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1431)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    22. 8.2014 17:05:18
  19. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.01
    0.0080434345 = product of:
      0.024130303 = sum of:
        0.024130303 = product of:
          0.048260607 = sum of:
            0.048260607 = weight(_text_:indexing in 2648) [ClassicSimilarity], result of:
              0.048260607 = score(doc=2648,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2537542 = fieldWeight in 2648, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2648)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    An inverted index stores, for each term that appears in a collection of documents, a list of the document numbers containing that term. Such an index is indispensable when Boolean or informal ranked queries are to be answered. Construction of the index is, however, a non-trivial task. Simple methods using in-memory data structures cannot be used for large collections because they require too much random-access storage, and traditional disc-based methods require large amounts of temporary file space. Describes a new indexing algorithm designed to create large compressed inverted indexes in situ. It makes use of simple compression codes for the positive integers and an in-place external multi-way merge sort. The new technique has been used to invert a 2-gigabyte text collection in under 4 hours, using less than 40 megabytes of temporary disc space and less than 20 megabytes of main memory.
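
    The 'simple compression codes for the positive integers' mentioned here are codes such as Elias gamma applied to d-gaps, the differences between successive document numbers in a postings list. A minimal sketch assuming Elias gamma; the in-place multi-way merge itself is not shown:

```python
def gamma_encode(n):
    """Elias gamma code for a positive integer: a unary length prefix of
    len(binary) - 1 zeros, followed by the binary representation."""
    assert n >= 1
    body = bin(n)[2:]                 # binary body, leading 1 included
    return "0" * (len(body) - 1) + body

def encode_postings(doc_numbers):
    """Encode an ascending postings list as gamma-coded d-gaps; small gaps
    (frequent terms) yield very short codes."""
    gaps = [doc_numbers[0]] + [b - a for a, b in zip(doc_numbers, doc_numbers[1:])]
    return "".join(gamma_encode(g) for g in gaps)
```

    For example, the postings list [3, 5, 6, 10] becomes the gaps [3, 2, 1, 4] before coding.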
  20. Savoy, J.: Ranking schemes in hybrid Boolean systems : a new approach (1997) 0.01
    0.0080434345 = product of:
      0.024130303 = sum of:
        0.024130303 = product of:
          0.048260607 = sum of:
            0.048260607 = weight(_text_:indexing in 393) [ClassicSimilarity], result of:
              0.048260607 = score(doc=393,freq=2.0), product of:
                0.19018644 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.049684696 = queryNorm
                0.2537542 = fieldWeight in 393, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.046875 = fieldNorm(doc=393)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    In most commercial online systems, the retrieval system is based on the Boolean model and its inverted file organization. Since the investment in these systems is so great and changing them could be economically infeasible, this article suggests a new ranking scheme, especially adapted for hypertext environments, in order to produce more effective retrieval results while maintaining the investment made to date in the Boolean model. To select the retrieved documents, the suggested ranking strategy uses multiple sources of document content evidence. The proposed scheme integrates both the information provided by the index and query terms, and the inherent relationships between documents, such as bibliographic references or hypertext links. We will demonstrate that our scheme represents an integration of both subject and citation indexing, and results in a significant improvement over classical ranking schemes used in hybrid Boolean systems, while preserving their efficiency. Moreover, since nearest neighbours and hypertext links constitute additional sources of evidence, our strategy takes them into account to further improve retrieval effectiveness and to provide 'good' starting points for browsing in a hypertext or hypermedia environment.
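
    The combination the abstract proposes, content evidence plus evidence spread from hypertext neighbours and bibliographically related documents, can be sketched as a weighted sum. The coefficients and the averaging scheme below are illustrative, not the article's:

```python
def hybrid_score(doc, content, links, citations, neighbour_scores,
                 a=0.6, b=0.2, c=0.2):
    """Combine a document's own content score with the average scores of
    its hypertext neighbours and of bibliographically related documents.
    links/citations: {doc: collection of related doc ids};
    neighbour_scores: {doc: content score}."""
    ln = links.get(doc, ())
    cn = citations.get(doc, ())
    link_ev = sum(neighbour_scores.get(d, 0.0) for d in ln) / max(1, len(ln))
    cite_ev = sum(neighbour_scores.get(d, 0.0) for d in cn) / max(1, len(cn))
    return a * content + b * link_ev + c * cite_ev
```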

Languages

  • e 54
  • d 3
  • chi 1