Search (111 results, page 1 of 6)

  • × theme_ss:"Retrievalalgorithmen"
  1. Faloutsos, C.: Signature files (1992) 0.05
    0.053570747 = product of:
      0.13392687 = sum of:
        0.07728181 = weight(_text_:it in 3499) [ClassicSimilarity], result of:
          0.07728181 = score(doc=3499,freq=8.0), product of:
            0.15115225 = queryWeight, product of:
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.052260913 = queryNorm
            0.51128453 = fieldWeight in 3499, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.0625 = fieldNorm(doc=3499)
        0.05664506 = weight(_text_:22 in 3499) [ClassicSimilarity], result of:
          0.05664506 = score(doc=3499,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.30952093 = fieldWeight in 3499, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=3499)
      0.4 = coord(2/5)
    
    Abstract
    Presents a survey and discussion on signature-based text retrieval methods. It describes the main idea behind the signature approach and its advantages over other text retrieval methods, it provides a classification of the signature methods that have appeared in the literature, it describes the main representatives of each class, together with the relative advantages and drawbacks, and it gives a list of applications as well as commercial or university prototypes that use the signature approach
    Date
    7. 5.1999 15:22:48
  2. Crestani, F.; Dominich, S.; Lalmas, M.; Rijsbergen, C.J.K. van: Mathematical, logical, and formal methods in information retrieval : an introduction to the special issue (2003) 0.03
    0.033387464 = product of:
      0.08346866 = sum of:
        0.04098487 = weight(_text_:it in 1451) [ClassicSimilarity], result of:
          0.04098487 = score(doc=1451,freq=4.0), product of:
            0.15115225 = queryWeight, product of:
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.052260913 = queryNorm
            0.27114958 = fieldWeight in 1451, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.046875 = fieldNorm(doc=1451)
        0.042483795 = weight(_text_:22 in 1451) [ClassicSimilarity], result of:
          0.042483795 = score(doc=1451,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.23214069 = fieldWeight in 1451, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1451)
      0.4 = coord(2/5)
    
    Abstract
    Research an the use of mathematical, logical, and formal methods, has been central to Information Retrieval research for a long time. Research in this area is important not only because it helps enhancing retrieval effectiveness, but also because it helps clarifying the underlying concepts of Information Retrieval. In this article we outline some of the major aspects of the subject, and summarize the papers of this special issue with respect to how they relate to these aspects. We conclude by highlighting some directions of future research, which are needed to better understand the formal characteristics of Information Retrieval.
    Date
    22. 3.2003 19:27:36
  3. Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: ¬The effects of fitness functions an genetic programming-based ranking discovery for Web search (2004) 0.03
    0.033387464 = product of:
      0.08346866 = sum of:
        0.04098487 = weight(_text_:it in 2239) [ClassicSimilarity], result of:
          0.04098487 = score(doc=2239,freq=4.0), product of:
            0.15115225 = queryWeight, product of:
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.052260913 = queryNorm
            0.27114958 = fieldWeight in 2239, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.046875 = fieldNorm(doc=2239)
        0.042483795 = weight(_text_:22 in 2239) [ClassicSimilarity], result of:
          0.042483795 = score(doc=2239,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.23214069 = fieldWeight in 2239, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2239)
      0.4 = coord(2/5)
    
    Abstract
    Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR taskdiscovery of ranking functions for Web search-and has achieved very promising results. However, in our prior research, only one fitness function has been used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is weIl known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs an GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations an the design of fitness functions for genetic-based information retrieval experiments.
    Date
    31. 5.2004 19:22:06
  4. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.03
    0.02858579 = product of:
      0.07146447 = sum of:
        0.028980678 = weight(_text_:it in 2419) [ClassicSimilarity], result of:
          0.028980678 = score(doc=2419,freq=2.0), product of:
            0.15115225 = queryWeight, product of:
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.052260913 = queryNorm
            0.19173169 = fieldWeight in 2419, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.046875 = fieldNorm(doc=2419)
        0.042483795 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
          0.042483795 = score(doc=2419,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.23214069 = fieldWeight in 2419, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2419)
      0.4 = coord(2/5)
    
    Abstract
    The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection over information seeking to the representation, organisation and reuse of information. By embedding high level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil followed by a qualitative evaluation. The evaluation has been conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
  5. Baloh, P.; Desouza, K.C.; Hackney, R.: Contextualizing organizational interventions of knowledge management systems : a design science perspectiveA domain analysis (2012) 0.03
    0.027822888 = product of:
      0.06955722 = sum of:
        0.034154054 = weight(_text_:it in 241) [ClassicSimilarity], result of:
          0.034154054 = score(doc=241,freq=4.0), product of:
            0.15115225 = queryWeight, product of:
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.052260913 = queryNorm
            0.22595796 = fieldWeight in 241, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.0390625 = fieldNorm(doc=241)
        0.035403162 = weight(_text_:22 in 241) [ClassicSimilarity], result of:
          0.035403162 = score(doc=241,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.19345059 = fieldWeight in 241, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=241)
      0.4 = coord(2/5)
    
    Abstract
    We address how individuals' (workers) knowledge needs influence the design of knowledge management systems (KMS), enabling knowledge creation and utilization. It is evident that KMS technologies and activities are indiscriminately deployed in most organizations with little regard to the actual context of their adoption. Moreover, it is apparent that the extant literature pertaining to knowledge management projects is frequently deficient in identifying the variety of factors indicative for successful KMS. This presents an obvious business practice and research gap that requires a critical analysis of the necessary intervention that will actually improve how workers can leverage and form organization-wide knowledge. This research involved an extensive review of the literature, a grounded theory methodological approach and rigorous data collection and synthesis through an empirical case analysis (Parsons Brinckerhoff and Samsung). The contribution of this study is the formulation of a model for designing KMS based upon the design science paradigm, which aspires to create artifacts that are interdependent of people and organizations. The essential proposition is that KMS design and implementation must be contextualized in relation to knowledge needs and that these will differ for various organizational settings. The findings present valuable insights and further understanding of the way in which KMS design efforts should be focused.
    Date
    11. 6.2012 14:22:34
  6. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.02
    0.023821492 = product of:
      0.059553728 = sum of:
        0.024150565 = weight(_text_:it in 1428) [ClassicSimilarity], result of:
          0.024150565 = score(doc=1428,freq=2.0), product of:
            0.15115225 = queryWeight, product of:
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.052260913 = queryNorm
            0.15977642 = fieldWeight in 1428, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.035403162 = weight(_text_:22 in 1428) [ClassicSimilarity], result of:
          0.035403162 = score(doc=1428,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.19345059 = fieldWeight in 1428, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
      0.4 = coord(2/5)
    
    Abstract
    Humans can make hasty, but generally robust judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing an theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high dimensional semantic space, which is automatically constructed from a text corpus. Two approaches were presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept an the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and another based an semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
    Date
    22. 3.2003 19:35:46
  7. Dominich, S.: Mathematical foundations of information retrieval (2001) 0.02
    0.023821492 = product of:
      0.059553728 = sum of:
        0.024150565 = weight(_text_:it in 1753) [ClassicSimilarity], result of:
          0.024150565 = score(doc=1753,freq=2.0), product of:
            0.15115225 = queryWeight, product of:
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.052260913 = queryNorm
            0.15977642 = fieldWeight in 1753, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1753)
        0.035403162 = weight(_text_:22 in 1753) [ClassicSimilarity], result of:
          0.035403162 = score(doc=1753,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.19345059 = fieldWeight in 1753, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1753)
      0.4 = coord(2/5)
    
    Abstract
    This book offers a comprehensive and consistent mathematical approach to information retrieval (IR) without which no implementation is possible, and sheds an entirely new light upon the structure of IR models. It contains the descriptions of all IR models in a unified formal style and language, along with examples for each, thus offering a comprehensive overview of them. The book also creates mathematical foundations and a consistent mathematical theory (including all mathematical results achieved so far) of IR as a stand-alone mathematical discipline, which thus can be read and taught independently. Also, the book contains all necessary mathematical knowledge on which IR relies, to help the reader avoid searching different sources. The book will be of interest to computer or information scientists, librarians, mathematicians, undergraduate students and researchers whose work involves information retrieval.
    Date
    22. 3.2008 12:26:32
  8. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02
    0.022658026 = product of:
      0.11329012 = sum of:
        0.11329012 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.11329012 = score(doc=402,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.2 = coord(1/5)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  9. Smeaton, A.F.; Rijsbergen, C.J. van: ¬The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.02
    0.019825771 = product of:
      0.09912886 = sum of:
        0.09912886 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
          0.09912886 = score(doc=2134,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.5416616 = fieldWeight in 2134, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=2134)
      0.2 = coord(1/5)
    
    Date
    30. 3.2001 13:32:22
  10. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.02
    0.019825771 = product of:
      0.09912886 = sum of:
        0.09912886 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
          0.09912886 = score(doc=3445,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.5416616 = fieldWeight in 3445, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=3445)
      0.2 = coord(1/5)
    
    Date
    25. 8.2005 17:42:22
  11. Khoo, C.S.G.; Wan, K.-W.: ¬A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.02
    0.019476023 = product of:
      0.048690055 = sum of:
        0.02390784 = weight(_text_:it in 2509) [ClassicSimilarity], result of:
          0.02390784 = score(doc=2509,freq=4.0), product of:
            0.15115225 = queryWeight, product of:
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.052260913 = queryNorm
            0.15817058 = fieldWeight in 2509, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2509)
        0.024782214 = weight(_text_:22 in 2509) [ClassicSimilarity], result of:
          0.024782214 = score(doc=2509,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.1354154 = fieldWeight in 2509, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2509)
      0.4 = coord(2/5)
    
    Content
    "Most Web search engines accept natural language queries, perform some kind of fuzzy matching and produce ranked output, displaying first the documents that are most likely to be relevant. On the other hand, most library online public access catalogs (OPACs) an the Web are still Boolean retrieval systems that perform exact matching, and require users to express their search requests precisely in a Boolean search language and to refine their search statements to improve the search results. It is well-documented that users have difficulty searching Boolean OPACs effectively (e.g. Borgman, 1996; Ensor, 1992; Wallace, 1993). One approach to making OPACs easier to use is to develop a natural language search interface that acts as a middleware between the user's Web browser and the OPAC system. The search interface can accept a natural language query from the user and reformulate it as a series of Boolean search statements that are then submitted to the OPAC. The records retrieved by the OPAC are ranked by the search interface before forwarding them to the user's Web browser. The user, then, does not need to interact directly with the Boolean OPAC but with the natural language search interface or search intermediary. The search interface interacts with the OPAC system an the user's behalf. The advantage of this approach is that no modification to the OPAC or library system is required. Furthermore, the search interface can access multiple OPACs, acting as a meta search engine, and integrate search results from various OPACs before sending them to the user. The search interface needs to incorporate a method for converting the user's natural language query into a series of Boolean search statements, and for ranking the OPAC records retrieved. The purpose of this study was to develop a relevancyranking algorithm for a search interface to Boolean OPAC systems. This is part of an on-going effort to develop a knowledge-based search interface to OPACs called the E-Referencer (Khoo et al., 1998, 1999; Poo et al., 2000). E-Referencer v. 2 that has been implemented applies a repertoire of initial search strategies and reformulation strategies to retrieve records from OPACs using the Z39.50 protocol, and also assists users in mapping query keywords to the Library of Congress subject headings."
    Source
    Electronic library. 22(2004) no.2, S.112-120
  12. Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.02
    0.016993519 = product of:
      0.08496759 = sum of:
        0.08496759 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
          0.08496759 = score(doc=58,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.46428138 = fieldWeight in 58, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=58)
      0.2 = coord(1/5)
    
    Date
    14. 6.2015 22:12:44
  13. Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.02
    0.016993519 = product of:
      0.08496759 = sum of:
        0.08496759 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
          0.08496759 = score(doc=2051,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.46428138 = fieldWeight in 2051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=2051)
      0.2 = coord(1/5)
    
    Date
    14. 6.2015 22:12:56
  14. Bodoff, D.; Robertson, S.: ¬A new unified probabilistic model (2004) 0.01
    0.012960553 = product of:
      0.064802766 = sum of:
        0.064802766 = weight(_text_:it in 2129) [ClassicSimilarity], result of:
          0.064802766 = score(doc=2129,freq=10.0), product of:
            0.15115225 = queryWeight, product of:
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.052260913 = queryNorm
            0.4287251 = fieldWeight in 2129, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.046875 = fieldNorm(doc=2129)
      0.2 = coord(1/5)
    
    Abstract
    This paper proposes a new unified probabilistic model. Two previous models, Robertson et al.'s "Model 0" and "Model 3," each have strengths and weaknesses. The strength of Model 0 not found in Model 3, is that it does not require relevance data about the particular document or query, and, related to that, its probability estimates are straightforward. The strength of Model 3 not found in Model 0 is that it can utilize feedback information about the particular document and query in question. In this paper we introduce a new unified probabilistic model that combines these strengths: the expression of its probabilities is straightforward, it does not require that data must be available for the particular document or query in question, but it can utilize such specific data if it is available. The model is one way to resolve the difficulty of combining two marginal views in probabilistic retrieval.
  15. Hoenkamp, E.: Unitary operators on the document space (2003) 0.01
    0.011831312 = product of:
      0.05915656 = sum of:
        0.05915656 = weight(_text_:it in 3457) [ClassicSimilarity], result of:
          0.05915656 = score(doc=3457,freq=12.0), product of:
            0.15115225 = queryWeight, product of:
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.052260913 = queryNorm
            0.39137068 = fieldWeight in 3457, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3457)
      0.2 = coord(1/5)
    
    Abstract
    When people search for documents, they eventually want content, not words. Hence, search engines should relate documents more by their underlying concepts than by the words they contain. One promising technique to do so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts that can represent the documents are far fewer than the great variety of words in the textual representation. Although this almost obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article accomplishes several things. First, it shows how the technique underlying LSI is just one example of a unitary operator, for which there are computationally more attractive alternatives. Second, it proposes the Haar transform as such an alternative, as it is memory efficient, and can be computed in linear to sublinear time. Third, it generalizes LSI by a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational costs, it also opens a spectrum of possibilities for new research.
  16. Robertson, S.: Understanding inverse document frequency : on theoretical arguments for IDF (2004) 0.01
    0.011712402 = product of:
      0.05856201 = sum of:
        0.05856201 = weight(_text_:it in 4421) [ClassicSimilarity], result of:
          0.05856201 = score(doc=4421,freq=6.0), product of:
            0.15115225 = queryWeight, product of:
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.052260913 = queryNorm
            0.38743722 = fieldWeight in 4421, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4421)
      0.2 = coord(1/5)
    
    Abstract
    The term-weighting function known as IDF was proposed in 1972, and has since been extremely widely used, usually as part of a TF*IDF function. It is often described as a heuristic, and many papers have been written (some based on Shannon's Information Theory) seeking to establish some theoretical basis for it. Some of these attempts are reviewed, and it is shown that the Information Theory approaches are problematic, but that there are good theoretical justifications of both IDF and TF*IDF in the traditional probabilistic model of information retrieval.
  17. Thelwall, M.: Can Google's PageRank be used to find the most important academic Web pages? (2003) 0.01
    0.011592272 = product of:
      0.057961356 = sum of:
        0.057961356 = weight(_text_:it in 4457) [ClassicSimilarity], result of:
          0.057961356 = score(doc=4457,freq=8.0), product of:
            0.15115225 = queryWeight, product of:
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.052260913 = queryNorm
            0.38346338 = fieldWeight in 4457, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.892262 = idf(docFreq=6664, maxDocs=44218)
              0.046875 = fieldNorm(doc=4457)
      0.2 = coord(1/5)
    
    Abstract
    Google's PageRank is an influential algorithm that uses a model of Web use that is dominated by its link structure in order to rank pages by their estimated value to the Web community. This paper reports on the outcome of applying the algorithm to the Web sites of three national university systems in order to test whether it is capable of identifying the most important Web pages. The results are also compared with simple inlink counts. It was discovered that the highest inlinked pages do not always have the highest PageRank, indicating that the two metrics are genuinely different, even for the top pages. More significantly, however, internal links dominated external links for the high ranks in either method and superficial reasons accounted for high scores in both cases. It is concluded that PageRank is not useful for identifying the top pages in a site and that it must be combined with a powerful text matching techniques in order to get the quality of information retrieval results provided by Google.
  18. MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.01
    0.011329013 = product of:
      0.05664506 = sum of:
        0.05664506 = weight(_text_:22 in 5108) [ClassicSimilarity], result of:
          0.05664506 = score(doc=5108,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.30952093 = fieldWeight in 5108, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=5108)
      0.2 = coord(1/5)
    
    Date
    20. 1.2007 18:30:22
  19. Losada, D.E.; Barreiro, A.: Emebedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.01
    0.011329013 = product of:
      0.05664506 = sum of:
        0.05664506 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
          0.05664506 = score(doc=1422,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.30952093 = fieldWeight in 1422, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1422)
      0.2 = coord(1/5)
    
    Date
    22. 3.2003 19:27:23
  20. Bornmann, L.; Mutz, R.: From P100 to P100' : a new citation-rank approach (2014) 0.01
    0.011329013 = product of:
      0.05664506 = sum of:
        0.05664506 = weight(_text_:22 in 1431) [ClassicSimilarity], result of:
          0.05664506 = score(doc=1431,freq=2.0), product of:
            0.18300882 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.052260913 = queryNorm
            0.30952093 = fieldWeight in 1431, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1431)
      0.2 = coord(1/5)
    
    Date
    22. 8.2014 17:05:18

Years

Languages

  • e 105
  • d 5
  • More… Less…

Types

  • a 102
  • el 4
  • m 4
  • s 2
  • r 1
  • More… Less…