Search (75 results, page 1 of 4)

  • theme_ss:"Retrievalalgorithmen"
  • type_ss:"a"
  • year_i:[2000 TO 2010}
  1. Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: ¬The effects of fitness functions on genetic programming-based ranking discovery for Web search (2004) 0.02
    0.019552121 = product of:
      0.06843242 = sum of:
        0.052557457 = weight(_text_:based in 2239) [ClassicSimilarity], result of:
          0.052557457 = score(doc=2239,freq=10.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.44662142 = fieldWeight in 2239, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=2239)
        0.015874967 = product of:
          0.031749934 = sum of:
            0.031749934 = weight(_text_:22 in 2239) [ClassicSimilarity], result of:
              0.031749934 = score(doc=2239,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.23214069 = fieldWeight in 2239, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2239)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
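    The indented tree above is Lucene's "explain" output for this entry's relevance score. As a reading aid, the following sketch reproduces the arithmetic under Lucene's ClassicSimilarity (TF-IDF) formulas; the constants (queryNorm, fieldNorm, docFreq, maxDocs and the coordination factors) are taken verbatim from the tree, while the helper function itself is only a rough reconstruction, not the indexer's actual code.

    ```python
    import math

    def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
        """Reproduce one weight(...) node of a Lucene ClassicSimilarity explain tree."""
        idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))   # idf(docFreq, maxDocs)
        query_weight = idf * boost * query_norm              # queryWeight
        tf = math.sqrt(freq)                                  # tf(freq)
        field_weight = tf * idf * field_norm                  # fieldWeight
        return query_weight * field_weight                    # score contribution of this term

    QUERY_NORM = 0.03905679  # taken from the explain tree above

    # term "based" in doc 2239: freq=10, docFreq=5906, maxDocs=44218, fieldNorm=0.046875
    s_based = classic_term_score(10, 5906, 44218, QUERY_NORM, 0.046875)

    # term "22" in doc 2239: freq=2, docFreq=3622; its sub-sum is scaled by coord(1/2)
    s_22 = classic_term_score(2, 3622, 44218, QUERY_NORM, 0.046875) * (1 / 2)

    # only 2 of the 7 query clauses matched, hence the outer coord(2/7)
    total = (s_based + s_22) * (2 / 7)
    print(round(total, 9))  # approximately 0.019552121, the document score shown above
    ```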
    
    Abstract
    Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task - discovery of ranking functions for Web search - and has achieved very promising results. However, in our prior research, only one fitness function has been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs on GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
    Date
    31. 5.2004 19:22:06
  2. Kaszkiel, M.; Zobel, J.: Effective ranking with arbitrary passages (2001) 0.01
    0.011727133 = product of:
      0.08208992 = sum of:
        0.08208992 = weight(_text_:great in 5764) [ClassicSimilarity], result of:
          0.08208992 = score(doc=5764,freq=2.0), product of:
            0.21992016 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.03905679 = queryNorm
            0.37327147 = fieldWeight in 5764, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.046875 = fieldNorm(doc=5764)
      0.14285715 = coord(1/7)
    
    Abstract
    Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and Web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material among otherwise irrelevant text. In this article, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage, overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents.
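    The abstract above argues for ranking a document by its best-matching passage rather than by the whole document. A minimal sketch of that idea, assuming overlapping fixed-length word windows and a simple TF-IDF-style passage scorer (window length, overlap and the scorer are illustrative assumptions, not the weighting evaluated in the paper):

    ```python
    import math
    from collections import Counter

    def passages(tokens, length=150, step=75):
        """Yield overlapping fixed-length word windows (step = length/2 gives 50% overlap)."""
        if len(tokens) <= length:
            yield tokens
            return
        for start in range(0, len(tokens) - length + 1, step):
            yield tokens[start:start + length]

    def passage_score(passage, query_terms, idf):
        """Score one passage with a plain TF-IDF sum over the query terms."""
        tf = Counter(passage)
        return sum(math.sqrt(tf[t]) * idf.get(t, 0.0) for t in query_terms)

    def best_passage_score(doc_tokens, query_terms, idf, length=150, step=75):
        """Rank a document by the score of its best passage instead of its full text."""
        return max(passage_score(p, query_terms, idf)
                   for p in passages(doc_tokens, length, step))
    ```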
  3. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.01
    0.0116941 = product of:
      0.040929347 = sum of:
        0.02770021 = weight(_text_:based in 1428) [ClassicSimilarity], result of:
          0.02770021 = score(doc=1428,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.23539014 = fieldWeight in 1428, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.013229139 = product of:
          0.026458278 = sum of:
            0.026458278 = weight(_text_:22 in 1428) [ClassicSimilarity], result of:
              0.026458278 = score(doc=1428,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.19345059 = fieldWeight in 1428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1428)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Humans can make hasty, but generally robust judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high dimensional semantic space, which is automatically constructed from a text corpus. Two approaches were presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
    Date
    22. 3.2003 19:35:46
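    The semantic space mentioned in the abstract above is built automatically from a text corpus. The authors' construction and the information-flow computation itself are not reproduced here; the sketch below only illustrates the generic first step such approaches share, a sliding-window co-occurrence space with cosine comparison (window size and all names are assumptions):

    ```python
    from collections import defaultdict

    def cooccurrence_space(sentences, window=5):
        """Build simple word vectors: space[w][c] counts how often c appears near w."""
        space = defaultdict(lambda: defaultdict(int))
        for tokens in sentences:
            for i, w in enumerate(tokens):
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                for j in range(lo, hi):
                    if j != i:
                        space[w][tokens[j]] += 1
        return space

    def cosine(u, v):
        """Cosine similarity between two sparse (dict-valued) word vectors."""
        dot = sum(weight * v.get(term, 0) for term, weight in u.items())
        nu = sum(x * x for x in u.values()) ** 0.5
        nv = sum(x * x for x in v.values()) ** 0.5
        return dot / (nu * nv) if nu and nv else 0.0
    ```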
  4. Furner, J.: ¬A unifying model of document relatedness for hybrid search engines (2003) 0.01
    0.01125125 = product of:
      0.039379373 = sum of:
        0.023504408 = weight(_text_:based in 2717) [ClassicSimilarity], result of:
          0.023504408 = score(doc=2717,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.19973516 = fieldWeight in 2717, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=2717)
        0.015874967 = product of:
          0.031749934 = sum of:
            0.031749934 = weight(_text_:22 in 2717) [ClassicSimilarity], result of:
              0.031749934 = score(doc=2717,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.23214069 = fieldWeight in 2717, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2717)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Previous work on search-engine design has indicated that information-seekers may benefit from being given the opportunity to exploit multiple sources of evidence of document relatedness. Few existing systems, however, give users more than minimal control over the selections that may be made among methods of exploitation. By applying the methods of "document network analysis" (DNA), a unifying, graph-theoretic model of content-, collaboration-, and context-based systems (CCC) may be developed in which the nature of the similarities between types of document relatedness and document ranking is clarified. The usefulness of the approach to system design suggested by this model may be tested by constructing and evaluating a prototype system (UCXtra) that allows searchers to maintain control over the multiple ways in which document collections may be ranked and re-ranked.
    Date
    11. 9.2004 17:32:22
  5. Witschel, H.F.: Global term weights in distributed environments (2008) 0.01
    0.01125125 = product of:
      0.039379373 = sum of:
        0.023504408 = weight(_text_:based in 2096) [ClassicSimilarity], result of:
          0.023504408 = score(doc=2096,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.19973516 = fieldWeight in 2096, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=2096)
        0.015874967 = product of:
          0.031749934 = sum of:
            0.031749934 = weight(_text_:22 in 2096) [ClassicSimilarity], result of:
              0.031749934 = score(doc=2096,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.23214069 = fieldWeight in 2096, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2096)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view on the collection is not available. In particular, the two options of either sampling documents or of using a reference corpus independent of the target retrieval collection are compared using standard IR test collections. In addition, the possibility of pruning term lists based on frequency is evaluated. The results show that very good retrieval performance can be reached when just the most frequent terms of a collection - an "extended stop word list" - are known and all terms which are not in that list are treated equally. However, the list cannot always be fully estimated from a general-purpose reference corpus, but some "domain-specific stop words" need to be added. A good solution for achieving this is to mix estimates from small samples of the target retrieval collection with ones derived from a reference corpus.
    Date
    1. 8.2008 9:44:22
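    A minimal sketch of the practical recommendation summarized above: keep an "extended stop word list" of the most frequent terms, estimate their document frequencies by mixing a small sample of the target collection with an independent reference corpus, and treat all remaining terms equally. The mixing weight, list handling and function names are illustrative assumptions, not values from the paper:

    ```python
    import math

    def mixed_idf(term, sample_df, sample_size, ref_df, ref_size,
                  n_target, lam=0.5, top_terms=None):
        """Estimate IDF when no global view of the target collection is available.

        sample_df / ref_df: document frequencies in a small sample of the target
        collection and in an independent reference corpus. Terms outside the
        'extended stop word list' (top_terms) all receive the same default weight.
        """
        if top_terms is not None and term not in top_terms:
            return math.log(n_target)               # treat all rarer terms equally
        p_sample = sample_df.get(term, 0) / sample_size
        p_ref = ref_df.get(term, 0) / ref_size
        p = lam * p_sample + (1 - lam) * p_ref      # mixed document-frequency estimate
        p = max(p, 1.0 / n_target)                  # avoid log of zero for unseen terms
        return math.log(1.0 / p)
    ```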
  6. Campos, L.M. de; Fernández-Luna, J.M.; Huete, J.F.: Implementing relevance feedback in the Bayesian network retrieval model (2003) 0.01
    0.01125125 = product of:
      0.039379373 = sum of:
        0.023504408 = weight(_text_:based in 825) [ClassicSimilarity], result of:
          0.023504408 = score(doc=825,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.19973516 = fieldWeight in 825, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=825)
        0.015874967 = product of:
          0.031749934 = sum of:
            0.031749934 = weight(_text_:22 in 825) [ClassicSimilarity], result of:
              0.031749934 = score(doc=825,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.23214069 = fieldWeight in 825, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=825)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Relevance Feedback consists in automatically formulating a new query according to the relevance judgments provided by the user after evaluating a set of retrieved documents. In this article, we introduce several relevance feedback methods for the Bayesian Network Retrieval Model. The theoretical frame on which our methods are based uses the concept of partial evidences, which summarize the new pieces of information gathered after evaluating the results obtained by the original query. These partial evidences are inserted into the underlying Bayesian network and a new inference process (probabilities propagation) is run to compute the posterior relevance probabilities of the documents in the collection given the new query. The quality of the proposed methods is tested using a preliminary experimentation with different standard document collections.
    Date
    22. 3.2003 19:30:19
  7. Lopez-Pujalte, C.; Guerrero Bote, V.P.; Moya-Anegón, F. de: Evaluation of the application of genetic algorithms to relevance feedback (2003) 0.01
    0.00977261 = product of:
      0.068408266 = sum of:
        0.068408266 = weight(_text_:great in 2756) [ClassicSimilarity], result of:
          0.068408266 = score(doc=2756,freq=2.0), product of:
            0.21992016 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.03905679 = queryNorm
            0.31105953 = fieldWeight in 2756, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2756)
      0.14285715 = coord(1/7)
    
    Abstract
    We evaluated the different genetic algorithms applied to relevance feedback that are to be found in the literature and which follow the vector space model (the most commonly used model in this type of application). They were compared with a traditional relevance feedback algorithm - the Ide dec-hi method - since this had given the best results in the study of Salton & Buckley (1990) on this subject. The experiment was performed on the Cranfield collection, and the different algorithms were evaluated using the residual collection method (one of the most suitable methods for evaluating relevance feedback techniques). The results varied greatly depending on the fitness function that was used, from no improvement in some of the genetic algorithms, to a more than 127% improvement with one algorithm, surpassing even the traditional Ide dec-hi method. One can therefore conclude that genetic algorithms show great promise as an aid to implementing a truly effective information retrieval system.
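    The baseline named in the abstract above, the Ide dec-hi relevance-feedback formula, is compact enough to state directly: add the vectors of all judged-relevant documents to the query and subtract only the single highest-ranked non-relevant one. A minimal sketch over dense term vectors (clipping negative weights to zero is a common convention, not necessarily the paper's):

    ```python
    import numpy as np

    def ide_dec_hi(query_vec, relevant_docs, top_nonrelevant):
        """Ide dec-hi: add every judged-relevant document vector to the query and
        subtract the single highest-ranked non-relevant document vector."""
        new_q = query_vec.astype(float).copy()
        for d in relevant_docs:
            new_q += d
        if top_nonrelevant is not None:
            new_q -= top_nonrelevant
        return np.maximum(new_q, 0.0)   # negative term weights are usually clipped
    ```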
  8. Hoenkamp, E.: Unitary operators on the document space (2003) 0.01
    0.00977261 = product of:
      0.068408266 = sum of:
        0.068408266 = weight(_text_:great in 3457) [ClassicSimilarity], result of:
          0.068408266 = score(doc=3457,freq=2.0), product of:
            0.21992016 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.03905679 = queryNorm
            0.31105953 = fieldWeight in 3457, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3457)
      0.14285715 = coord(1/7)
    
    Abstract
    When people search for documents, they eventually want content, not words. Hence, search engines should relate documents more by their underlying concepts than by the words they contain. One promising technique to do so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts that can represent the documents is far fewer than the great variety of words in the textual representation. Although this almost obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article accomplishes several things. First, it shows how the technique underlying LSI is just one example of a unitary operator, for which there are computationally more attractive alternatives. Second, it proposes the Haar transform as such an alternative, as it is memory efficient, and can be computed in linear to sublinear time. Third, it generalizes LSI by a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational costs, it also opens a spectrum of possibilities for new research.
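    The Haar transform proposed above as a cheaper alternative to the SVD behind LSI can be computed in linear time with nothing more than pairwise sums and differences. A minimal sketch of the orthonormal 1-D Haar transform applied to a document's term vector (zero-padding to a power of two is an implementation choice, not taken from the paper):

    ```python
    import numpy as np

    def haar_1d(x):
        """Orthonormal 1-D Haar wavelet transform (vector zero-padded to a power of two)."""
        n = 1
        while n < len(x):
            n *= 2
        out = np.zeros(n)
        out[:len(x)] = x
        length = n
        while length > 1:
            half = length // 2
            a = (out[0:length:2] + out[1:length:2]) / np.sqrt(2)   # averages (coarse part)
            d = (out[0:length:2] - out[1:length:2]) / np.sqrt(2)   # differences (detail part)
            out[:half], out[half:length] = a, d
            length = half
        return out

    # Keeping only the first k coefficients of haar_1d(doc_vector) yields a coarse,
    # multiresolution representation of the document, analogous in spirit to truncated LSI.
    ```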
  9. Na, S.-H.; Kang, I.-S.; Roh, J.-E.; Lee, J.-H.: ¬An empirical study of query expansion and cluster-based retrieval in language modeling approach (2007) 0.01
    0.008759576 = product of:
      0.06131703 = sum of:
        0.06131703 = weight(_text_:based in 906) [ClassicSimilarity], result of:
          0.06131703 = score(doc=906,freq=10.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.5210583 = fieldWeight in 906, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0546875 = fieldNorm(doc=906)
      0.14285715 = coord(1/7)
    
    Abstract
    The term mismatch problem in information retrieval is a critical problem, and several techniques have been developed, such as query expansion, cluster-based retrieval and dimensionality reduction to resolve this issue. Of these techniques, this paper performs an empirical study on query expansion and cluster-based retrieval. We examine the effect of using parsimony in query expansion and the effect of clustering algorithms in cluster-based retrieval. In addition, query expansion and cluster-based retrieval are compared, and their combinations are evaluated in terms of retrieval performance by performing experimentations on seven test collections of NTCIR and TREC.
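    For context, the language-modeling approach referred to above scores a document by the likelihood of the query under a smoothed document model. A minimal query-likelihood sketch with Dirichlet smoothing; cluster-based retrieval would additionally interpolate the document model with its cluster's model, a step omitted here, and the smoothing parameter is an illustrative default:

    ```python
    import math
    from collections import Counter

    def query_likelihood(query_terms, doc_tokens, collection_tf, collection_len, mu=2000):
        """Score = sum over query terms of log p(q|d), with Dirichlet-smoothed p(q|d)."""
        tf = Counter(doc_tokens)
        dlen = len(doc_tokens)
        score = 0.0
        for q in query_terms:
            p_coll = collection_tf.get(q, 0) / collection_len   # background model p(q|C)
            if p_coll == 0:
                continue                                        # skip terms unseen in the collection
            p = (tf.get(q, 0) + mu * p_coll) / (dlen + mu)      # Dirichlet-smoothed estimate
            score += math.log(p)
        return score
    ```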
  10. Khoo, C.S.G.; Wan, K.-W.: ¬A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.01
    0.00818587 = product of:
      0.028650545 = sum of:
        0.019390147 = weight(_text_:based in 2509) [ClassicSimilarity], result of:
          0.019390147 = score(doc=2509,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.1647731 = fieldWeight in 2509, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2509)
        0.009260397 = product of:
          0.018520795 = sum of:
            0.018520795 = weight(_text_:22 in 2509) [ClassicSimilarity], result of:
              0.018520795 = score(doc=2509,freq=2.0), product of:
                0.13677022 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03905679 = queryNorm
                0.1354154 = fieldWeight in 2509, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=2509)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    A relevancy-ranking algorithm for a natural language interface to Boolean online public access catalogs (OPACs) was formulated and compared with that currently used in a knowledge-based search interface called the E-Referencer, being developed by the authors. The algorithm makes use of seven well-known ranking criteria: breadth of match, section weighting, proximity of query words, variant word forms (stemming), document frequency, term frequency and document length. The algorithm converts a natural language query into a series of increasingly broader Boolean search statements. In a small experiment with ten subjects in which the algorithm was simulated by hand, the algorithm obtained good results with a mean overall precision of 0.42 and mean average precision of 0.62, representing a 27 percent improvement in precision and 41 percent improvement in average precision compared to the E-Referencer. The usefulness of each step in the algorithm was analyzed and suggestions are made for improving the algorithm.
    Content
    "Most Web search engines accept natural language queries, perform some kind of fuzzy matching and produce ranked output, displaying first the documents that are most likely to be relevant. On the other hand, most library online public access catalogs (OPACs) an the Web are still Boolean retrieval systems that perform exact matching, and require users to express their search requests precisely in a Boolean search language and to refine their search statements to improve the search results. It is well-documented that users have difficulty searching Boolean OPACs effectively (e.g. Borgman, 1996; Ensor, 1992; Wallace, 1993). One approach to making OPACs easier to use is to develop a natural language search interface that acts as a middleware between the user's Web browser and the OPAC system. The search interface can accept a natural language query from the user and reformulate it as a series of Boolean search statements that are then submitted to the OPAC. The records retrieved by the OPAC are ranked by the search interface before forwarding them to the user's Web browser. The user, then, does not need to interact directly with the Boolean OPAC but with the natural language search interface or search intermediary. The search interface interacts with the OPAC system an the user's behalf. The advantage of this approach is that no modification to the OPAC or library system is required. Furthermore, the search interface can access multiple OPACs, acting as a meta search engine, and integrate search results from various OPACs before sending them to the user. The search interface needs to incorporate a method for converting the user's natural language query into a series of Boolean search statements, and for ranking the OPAC records retrieved. The purpose of this study was to develop a relevancyranking algorithm for a search interface to Boolean OPAC systems. This is part of an on-going effort to develop a knowledge-based search interface to OPACs called the E-Referencer (Khoo et al., 1998, 1999; Poo et al., 2000). E-Referencer v. 2 that has been implemented applies a repertoire of initial search strategies and reformulation strategies to retrieve records from OPACs using the Z39.50 protocol, and also assists users in mapping query keywords to the Library of Congress subject headings."
    Source
    Electronic library. 22(2004) no.2, S.112-120
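    A sketch of the central mechanism described in this entry, turning the term list of a natural language query into a series of increasingly broader Boolean statements. The concrete broadening sequence used here (require all terms, then progressively fewer, finally any single term) is an illustrative assumption, not the E-Referencer's actual strategy:

    ```python
    from itertools import combinations

    def broadening_statements(terms):
        """Yield Boolean search statements from strictest to broadest."""
        terms = list(dict.fromkeys(terms))          # de-duplicate, keep order
        for k in range(len(terms), 1, -1):          # require k terms, then k-1, ...
            clauses = [" AND ".join(c) for c in combinations(terms, k)]
            yield " OR ".join(f"({c})" for c in clauses)
        yield " OR ".join(terms)                    # broadest: any single term

    for stmt in broadening_statements(["ranking", "boolean", "opac"]):
        print(stmt)
    # (ranking AND boolean AND opac)
    # (ranking AND boolean) OR (ranking AND opac) OR (boolean AND opac)
    # ranking OR boolean OR opac
    ```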
  11. Daniłowicz, C.; Baliński, J.: Document ranking based upon Markov chains (2001) 0.01
    0.007834803 = product of:
      0.05484362 = sum of:
        0.05484362 = weight(_text_:based in 5388) [ClassicSimilarity], result of:
          0.05484362 = score(doc=5388,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.46604872 = fieldWeight in 5388, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.109375 = fieldNorm(doc=5388)
      0.14285715 = coord(1/7)
    
  12. Sakai, T.: On the reliability of information retrieval metrics based on graded relevance (2007) 0.01
    0.0075082085 = product of:
      0.052557457 = sum of:
        0.052557457 = weight(_text_:based in 910) [ClassicSimilarity], result of:
          0.052557457 = score(doc=910,freq=10.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.44662142 = fieldWeight in 910, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=910)
      0.14285715 = coord(1/7)
    
    Abstract
    This paper compares 14 information retrieval metrics based on graded relevance, together with 10 traditional metrics based on binary relevance, in terms of stability, sensitivity and resemblance of system rankings. More specifically, we compare these metrics using the Buckley/Voorhees stability method, the Voorhees/Buckley swap method and Kendall's rank correlation, with three data sets comprising test collections and submitted runs from NTCIR. Our experiments show that (Average) Normalised Discounted Cumulative Gain at document cut-off l are the best among the rank-based graded-relevance metrics, provided that l is large. On the other hand, if one requires a recall-based graded-relevance metric that is highly correlated with Average Precision, then Q-measure is the best choice. Moreover, these best graded-relevance metrics are at least as stable and sensitive as Average Precision, and are fairly robust to the choice of gain values.
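    Of the graded-relevance metrics compared above, normalised discounted cumulative gain at a document cut-off is the simplest to state. A minimal sketch using the common log2 discount; the gain values and discount are the usual textbook conventions, not necessarily the exact NTCIR variant evaluated in the paper:

    ```python
    import math

    def dcg_at(gains, l):
        """Discounted cumulative gain over the first l ranked gains."""
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:l]))

    def ndcg_at(ranked_gains, all_gains, l):
        """Normalise DCG by the DCG of an ideal (descending-gain) ranking."""
        ideal = sorted(all_gains, reverse=True)
        denom = dcg_at(ideal, l)
        return dcg_at(ranked_gains, l) / denom if denom > 0 else 0.0

    # e.g. graded judgments 0-3 for the top-5 retrieved documents of one topic:
    print(ndcg_at([3, 0, 2, 1, 0], [3, 2, 1, 1, 0, 0], l=5))
    ```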
  13. Silveira, M.; Ribeiro-Neto, B.: Concept-based ranking : a case study in the juridical domain (2004) 0.01
    0.0067155454 = product of:
      0.047008816 = sum of:
        0.047008816 = weight(_text_:based in 2339) [ClassicSimilarity], result of:
          0.047008816 = score(doc=2339,freq=2.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.39947033 = fieldWeight in 2339, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.09375 = fieldNorm(doc=2339)
      0.14285715 = coord(1/7)
    
  14. Thelwall, M.; Vaughan, L.: New versions of PageRank employing alternative Web document models (2004) 0.01
    0.0067155454 = product of:
      0.047008816 = sum of:
        0.047008816 = weight(_text_:based in 674) [ClassicSimilarity], result of:
          0.047008816 = score(doc=674,freq=8.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.39947033 = fieldWeight in 674, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.046875 = fieldNorm(doc=674)
      0.14285715 = coord(1/7)
    
    Abstract
    Introduces several new versions of PageRank (the link based Web page ranking algorithm), based on an information science perspective on the concept of the Web document. Although the Web page is the typical indivisible unit of information in search engine results and most Web information retrieval algorithms, other research has suggested that aggregating pages based on directories and domains gives promising alternatives, particularly when Web links are the object of study. The new algorithms introduced based on these alternatives were used to rank four sets of Web pages. The ranking results were compared with human subjects' rankings. The results of the tests were somewhat inconclusive: the new approach worked well for the set that includes pages from different Web sites; however, it does not work well in ranking pages that are from the same site. It seems that the new algorithms may be effective for some tasks but not for others, especially when only low numbers of links are involved or the pages to be ranked are from the same site or directory.
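    The modification described above amounts to running PageRank over aggregated units (directories or domains) instead of individual pages. A minimal power-iteration sketch with a simple directory-level aggregation; the damping factor, iteration count and aggregation key are illustrative assumptions, and dangling nodes are handled naively:

    ```python
    from collections import defaultdict

    def pagerank(links, d=0.85, iterations=50):
        """links: dict mapping a node to the list of nodes it links to.
        Dangling nodes simply leak rank in this simplified version."""
        nodes = set(links) | {t for targets in links.values() for t in targets}
        n = len(nodes)
        rank = {u: 1.0 / n for u in nodes}
        for _ in range(iterations):
            new = {u: (1 - d) / n for u in nodes}
            for u, targets in links.items():
                if not targets:
                    continue
                share = d * rank[u] / len(targets)
                for t in targets:
                    new[t] += share
            rank = new
        return rank

    def aggregate_by_directory(page_links):
        """Collapse page-level links into directory-level links before ranking."""
        key = lambda url: url.rsplit("/", 1)[0]
        agg = defaultdict(set)
        for src, targets in page_links.items():
            for t in targets:
                if key(src) != key(t):          # ignore links within the same directory
                    agg[key(src)].add(key(t))
        return {u: sorted(v) for u, v in agg.items()}
    ```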
  15. Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.01
    0.006331477 = product of:
      0.044320337 = sum of:
        0.044320337 = weight(_text_:based in 2564) [ClassicSimilarity], result of:
          0.044320337 = score(doc=2564,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.37662423 = fieldWeight in 2564, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0625 = fieldNorm(doc=2564)
      0.14285715 = coord(1/7)
    
    Abstract
    The classification of documents from a bibliographic database is a task that is linked to processes of information retrieval based on partial matching. A method is described of vectorizing reference documents from LISA which permits their topological organization using Kohonen's algorithm. As an example a map is generated of 202 documents from LISA, and an analysis is made of the possibilities of this type of neural network with respect to the development of information retrieval systems based on graphical browsing.
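    A minimal sketch of Kohonen's self-organizing map as it is typically applied to document vectors like those described above; the grid size, learning-rate and neighbourhood schedules are illustrative assumptions, not the parameters used for the 202 LISA documents:

    ```python
    import numpy as np

    def train_som(docs, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
        """docs: (n_docs, n_terms) array of document vectors; returns the trained weight grid."""
        rng = np.random.default_rng(seed)
        h, w = grid
        weights = rng.random((h, w, docs.shape[1]))
        coords = np.dstack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"))  # (h, w, 2)
        for epoch in range(epochs):
            lr = lr0 * (1 - epoch / epochs)                       # decaying learning rate
            sigma = max(sigma0 * (1 - epoch / epochs), 0.5)       # shrinking neighbourhood
            for x in docs[rng.permutation(len(docs))]:
                dists = np.linalg.norm(weights - x, axis=2)
                bmu = np.unravel_index(np.argmin(dists), dists.shape)   # best-matching unit
                grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=2)
                influence = np.exp(-grid_d2 / (2 * sigma ** 2))         # neighbourhood function
                weights += lr * influence[..., None] * (x - weights)
        return weights
    ```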
  16. Crouch, C.J.; Crouch, D.B.; Chen, Q.; Holtz, S.J.: Improving the retrieval effectiveness of very short queries (2002) 0.01
    0.005596288 = product of:
      0.039174013 = sum of:
        0.039174013 = weight(_text_:based in 2572) [ClassicSimilarity], result of:
          0.039174013 = score(doc=2572,freq=8.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.33289194 = fieldWeight in 2572, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2572)
      0.14285715 = coord(1/7)
    
    Abstract
    This paper describes an automatic approach designed to improve the retrieval effectiveness of very short queries such as those used in web searching. The method is based on the observation that stemming, which is designed to maximize recall, often results in depressed precision. Our approach is based on pseudo-feedback and attempts to increase the number of relevant documents in the pseudo-relevant set by reranking those documents based on the presence of unstemmed query terms in the document text. The original experiments underlying this work were carried out using Smart 11.0 and the lnc.ltc weighting scheme on three sets of documents from the TREC collection with corresponding TREC (title only) topics as queries. (The average length of these queries after stoplisting ranges from 2.4 to 4.5 terms.) Results, evaluated in terms of P@20 and non-interpolated average precision, showed clearly that pseudo-feedback (PF) based on this approach was effective in increasing the number of relevant documents in the top ranks. Subsequent experiments, performed on the same data sets using Smart 13.0 and the improved Lnu.ltu weighting scheme, indicate that these results hold up even over the much higher baseline provided by the new weights. Query drift analysis presents a more detailed picture of the improvements produced by this process.
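    A sketch of the reranking step described above: after the initial (stemmed) retrieval run, documents in the pseudo-relevant set are promoted according to how many of the original, unstemmed query terms they actually contain. The set size, tokenization and tie-breaking are illustrative assumptions:

    ```python
    def rerank_pseudo_relevant(initial_ranking, doc_texts, unstemmed_query_terms, top_k=20):
        """initial_ranking: list of (doc_id, score) pairs from the stemmed retrieval run."""
        head, tail = initial_ranking[:top_k], initial_ranking[top_k:]

        def unstemmed_matches(doc_id):
            tokens = set(doc_texts[doc_id].lower().split())
            return sum(1 for t in unstemmed_query_terms if t.lower() in tokens)

        # more exact (unstemmed) query-term matches first; the original score breaks ties
        head = sorted(head, key=lambda pair: (-unstemmed_matches(pair[0]), -pair[1]))
        return head + tail
    ```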
  17. López-Pujalte, C.; Guerrero-Bote, V.P.; Moya-Anegón, F. de: Order-based fitness functions for genetic algorithms applied to relevance feedback (2003) 0.01
    0.005596288 = product of:
      0.039174013 = sum of:
        0.039174013 = weight(_text_:based in 5154) [ClassicSimilarity], result of:
          0.039174013 = score(doc=5154,freq=8.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.33289194 = fieldWeight in 5154, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5154)
      0.14285715 = coord(1/7)
    
    Abstract
    Lopez-Pujalte and Guerrero-Bote test a relevance feedback genetic algorithm while varying its order based fitness functions and generating a function based upon the Ide dec-hi method as a base line. Using the non-zero weighted term types assigned to the query, and to the initially retrieved set of documents, as genes, a chromosome of equal length is created for each. The algorithm is provided with the chromosomes for judged relevant documents, for judged irrelevant documents, and for the irrelevant documents with their terms negated. The algorithm uses random selection of all possible genes, but gives greater likelihood to those with higher fitness values. When the fittest chromosome of a previous population is eliminated it is restored while the least fittest of the new population is eliminated in its stead. A crossover probability of .8 and a mutation probability of .2 were used with 20 generations. Three fitness functions were utilized: the Horng and Yeh function, which takes into account the position of relevant documents, and two new functions, one based on accumulating the cosine similarity for retrieved documents, the other on stored fixed-recall-interval precisions. The Cranfield collection was used with the first 15 documents retrieved from 33 queries chosen to have at least 3 relevant documents in the first 15 and at least 5 relevant documents not initially retrieved. Precision was calculated at fixed recall levels using the residual collection method which removes viewed documents. One of the three functions improved the original retrieval by 127 percent, while the Ide dec-hi method provided a 120 percent improvement.
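    The fitness functions discussed above depend only on where the judged-relevant documents end up in the ranking produced by a candidate query. A deliberately generic sketch of such an order-based reward (a reciprocal-rank sum, not the Horng and Yeh formula or the fixed-recall-interval precision function from the paper):

    ```python
    def order_based_fitness(ranking, relevant_ids):
        """Reward candidate queries that place judged-relevant documents near the top.
        ranking: document ids ordered by the candidate query's retrieval scores."""
        return sum(1.0 / (rank + 1)
                   for rank, doc_id in enumerate(ranking)
                   if doc_id in relevant_ids)
    ```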
  18. Shah, B.; Raghavan, V.; Dhatric, P.; Zhao, X.: ¬A cluster-based approach for efficient content-based image retrieval using a similarity-preserving space transformation method (2006) 0.01
    0.005596288 = product of:
      0.039174013 = sum of:
        0.039174013 = weight(_text_:based in 6118) [ClassicSimilarity], result of:
          0.039174013 = score(doc=6118,freq=8.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.33289194 = fieldWeight in 6118, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6118)
      0.14285715 = coord(1/7)
    
    Abstract
    The techniques of clustering and space transformation have been successfully used in the past to solve a number of pattern recognition problems. In this article, the authors propose a new approach to content-based image retrieval (CBIR) that uses (a) a newly proposed similarity-preserving space transformation method to transform the original low-level image space into a high-level vector space that enables efficient query processing, and (b) a clustering scheme that further improves the efficiency of our retrieval system. This combination is unique and the resulting system provides synergistic advantages of using both clustering and space transformation. The proposed space transformation method is shown to preserve the order of the distances in the transformed feature space. This strategy makes this approach to retrieval generic as it can be applied to object types, other than images, and feature spaces more general than metric spaces. The CBIR approach uses the inexpensive "estimated" distance in the transformed space, as opposed to the computationally inefficient "real" distance in the original space, to retrieve the desired results for a given query image. The authors also provide a theoretical analysis of the complexity of their CBIR approach when used for color-based retrieval, which shows that it is computationally more efficient than other comparable approaches. An extensive set of experiments to test the efficiency and effectiveness of the proposed approach has been performed. The results show that the approach offers superior response time (improvement of 1-2 orders of magnitude compared to retrieval approaches that either use pruning techniques like indexing, clustering, etc., or space transformation, but not both) with sufficiently high retrieval accuracy.
  19. Bhogal, J.; Macfarlane, A.; Smith, P.: ¬A review of ontology based query expansion (2007) 0.01
    0.005540042 = product of:
      0.038780294 = sum of:
        0.038780294 = weight(_text_:based in 919) [ClassicSimilarity], result of:
          0.038780294 = score(doc=919,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.3295462 = fieldWeight in 919, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0546875 = fieldNorm(doc=919)
      0.14285715 = coord(1/7)
    
    Abstract
    This paper examines the meaning of context in relation to ontology based query expansion and contains a review of query expansion approaches. The various query expansion approaches include relevance feedback, corpus dependent knowledge models and corpus independent knowledge models. Case studies detailing query expansion using domain-specific and domain-independent ontologies are also included. The penultimate section attempts to synthesise the information obtained from the review and provide success factors in using an ontology for query expansion. Finally the area of further research in applying context from an ontology to query expansion within a newswire domain is described.
  20. Torra, V.; Miyamoto, S.; Lanau, S.: Exploration of textual document archives using a fuzzy hierarchical clustering algorithm in the GAMBAL system (2005) 0.01
    0.005540042 = product of:
      0.038780294 = sum of:
        0.038780294 = weight(_text_:based in 1028) [ClassicSimilarity], result of:
          0.038780294 = score(doc=1028,freq=4.0), product of:
            0.11767787 = queryWeight, product of:
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.03905679 = queryNorm
            0.3295462 = fieldWeight in 1028, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0129938 = idf(docFreq=5906, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1028)
      0.14285715 = coord(1/7)
    
    Abstract
    The Internet, together with the large amount of textual information available in document archives, has increased the relevance of information retrieval related tools. In this work we present an extension of the Gambal system for clustering and visualization of documents based on fuzzy clustering techniques. The tool makes it possible to structure the set of documents in a hierarchical way (using a fuzzy hierarchical structure) and to represent this structure in a graphical interface (a 3D sphere) over which the user can navigate. Gambal allows the analysis of the documents and the computation of their similarity not only on the basis of the syntactic similarity between words but also based on a dictionary (Wordnet 1.7) and latent semantic analysis.
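    The fuzzy clustering underlying such a hierarchy assigns each document a graded membership in every cluster rather than a single label. A minimal fuzzy c-means sketch; the hierarchical extension and the Gambal-specific processing are not reproduced, and the fuzzifier and stopping criterion are illustrative defaults:

    ```python
    import numpy as np

    def fuzzy_c_means(X, c, m=2.0, iters=100, eps=1e-5, seed=0):
        """X: (n, d) document vectors; returns (membership matrix U of shape (n, c), centroids)."""
        rng = np.random.default_rng(seed)
        U = rng.random((len(X), c))
        U /= U.sum(axis=1, keepdims=True)               # rows are fuzzy memberships summing to 1
        for _ in range(iters):
            Um = U ** m
            centroids = (Um.T @ X) / Um.sum(axis=0)[:, None]
            dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2) + 1e-12
            new_U = 1.0 / (dist ** (2 / (m - 1)))       # standard FCM membership update
            new_U /= new_U.sum(axis=1, keepdims=True)
            if np.abs(new_U - U).max() < eps:
                U = new_U
                break
            U = new_U
        return U, centroids
    ```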

Languages

  • e 74
  • d 1