Search (354 results, page 1 of 18)

  • theme_ss:"Retrievalalgorithmen"
  1. Ruthven, I.; Lalmas, M.: Selective relevance feedback using term characteristics (1999) 0.13
    0.12566501 = product of:
      0.16755334 = sum of:
        0.009158926 = weight(_text_:a in 3824) [ClassicSimilarity], result of:
          0.009158926 = score(doc=3824,freq=4.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.18016359 = fieldWeight in 3824, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=3824)
        0.10723893 = weight(_text_:et in 3824) [ClassicSimilarity], result of:
          0.10723893 = score(doc=3824,freq=2.0), product of:
            0.20686594 = queryWeight, product of:
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.044089027 = queryNorm
            0.5183982 = fieldWeight in 3824, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.078125 = fieldNorm(doc=3824)
        0.051155485 = product of:
          0.10231097 = sum of:
            0.10231097 = weight(_text_:al in 3824) [ClassicSimilarity], result of:
              0.10231097 = score(doc=3824,freq=2.0), product of:
                0.20205697 = queryWeight, product of:
                  4.582931 = idf(docFreq=1228, maxDocs=44218)
                  0.044089027 = queryNorm
                0.5063471 = fieldWeight in 3824, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.582931 = idf(docFreq=1228, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3824)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Source
    Vocabulary as a central concept in digital libraries: interdisciplinary concepts, challenges, and opportunities : proceedings of the Third International Conference on Conceptions of Library and Information Science (COLIS3), Dubrovnik, Croatia, 23-26 May 1999. Ed. by T. Aparac et al
    Type
    a
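    Code sketch
    The indented tree under each hit is Lucene's explain() output for its ClassicSimilarity (TF-IDF) scorer. Below is a minimal Python sketch of that arithmetic, reproducing the weight of the term "et" in document 3824 above; the helper names are ours, not Lucene's.
      import math

      def idf(doc_freq, max_docs):
          # ClassicSimilarity: idf(t) = 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
          # score(t, d) = queryWeight * fieldWeight, where
          #   queryWeight = idf(t) * queryNorm
          #   fieldWeight = tf(t, d) * idf(t) * fieldNorm(d), with tf = sqrt(freq)
          i = idf(doc_freq, max_docs)
          return (i * query_norm) * (math.sqrt(freq) * i * field_norm)

      # Reproduces the "et" leaf of hit 1 (doc 3824): ~0.10723893
      print(term_weight(freq=2.0, doc_freq=1101, max_docs=44218,
                        query_norm=0.044089027, field_norm=0.078125))
    The per-hit score then multiplies the sum of these leaves by the coordination factor coord(matching clauses / total clauses), e.g. 0.75 = coord(3/4) in hit 1.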
  2. Khoo, C.S.G.; Wan, K.-W.: A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.10
    0.10144564 = product of:
      0.13526085 = sum of:
        0.010631852 = weight(_text_:a in 2509) [ClassicSimilarity], result of:
          0.010631852 = score(doc=2509,freq=44.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.20913726 = fieldWeight in 2509, product of:
              6.6332498 = tf(freq=44.0), with freq of:
                44.0 = termFreq=44.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2509)
        0.053080566 = weight(_text_:et in 2509) [ClassicSimilarity], result of:
          0.053080566 = score(doc=2509,freq=4.0), product of:
            0.20686594 = queryWeight, product of:
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.044089027 = queryNorm
            0.25659403 = fieldWeight in 2509, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2509)
        0.07154844 = sum of:
          0.050641347 = weight(_text_:al in 2509) [ClassicSimilarity], result of:
            0.050641347 = score(doc=2509,freq=4.0), product of:
              0.20205697 = queryWeight, product of:
                4.582931 = idf(docFreq=1228, maxDocs=44218)
                0.044089027 = queryNorm
              0.25062904 = fieldWeight in 2509, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.582931 = idf(docFreq=1228, maxDocs=44218)
                0.02734375 = fieldNorm(doc=2509)
          0.020907091 = weight(_text_:22 in 2509) [ClassicSimilarity], result of:
            0.020907091 = score(doc=2509,freq=2.0), product of:
              0.15439226 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.044089027 = queryNorm
              0.1354154 = fieldWeight in 2509, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.02734375 = fieldNorm(doc=2509)
      0.75 = coord(3/4)
    
    Abstract
    A relevancy-ranking algorithm for a natural language interface to Boolean online public access catalogs (OPACs) was formulated and compared with that currently used in a knowledge-based search interface called the E-Referencer, being developed by the authors. The algorithm makes use of seven well-known ranking criteria: breadth of match, section weighting, proximity of query words, variant word forms (stemming), document frequency, term frequency and document length. The algorithm converts a natural language query into a series of increasingly broader Boolean search statements. In a small experiment with ten subjects in which the algorithm was simulated by hand, the algorithm obtained good results with a mean overall precision of 0.42 and mean average precision of 0.62, representing a 27 percent improvement in precision and 41 percent improvement in average precision compared to the E-Referencer. The usefulness of each step in the algorithm was analyzed and suggestions are made for improving the algorithm.
    Content
    "Most Web search engines accept natural language queries, perform some kind of fuzzy matching and produce ranked output, displaying first the documents that are most likely to be relevant. On the other hand, most library online public access catalogs (OPACs) an the Web are still Boolean retrieval systems that perform exact matching, and require users to express their search requests precisely in a Boolean search language and to refine their search statements to improve the search results. It is well-documented that users have difficulty searching Boolean OPACs effectively (e.g. Borgman, 1996; Ensor, 1992; Wallace, 1993). One approach to making OPACs easier to use is to develop a natural language search interface that acts as a middleware between the user's Web browser and the OPAC system. The search interface can accept a natural language query from the user and reformulate it as a series of Boolean search statements that are then submitted to the OPAC. The records retrieved by the OPAC are ranked by the search interface before forwarding them to the user's Web browser. The user, then, does not need to interact directly with the Boolean OPAC but with the natural language search interface or search intermediary. The search interface interacts with the OPAC system an the user's behalf. The advantage of this approach is that no modification to the OPAC or library system is required. Furthermore, the search interface can access multiple OPACs, acting as a meta search engine, and integrate search results from various OPACs before sending them to the user. The search interface needs to incorporate a method for converting the user's natural language query into a series of Boolean search statements, and for ranking the OPAC records retrieved. The purpose of this study was to develop a relevancyranking algorithm for a search interface to Boolean OPAC systems. This is part of an on-going effort to develop a knowledge-based search interface to OPACs called the E-Referencer (Khoo et al., 1998, 1999; Poo et al., 2000). E-Referencer v. 2 that has been implemented applies a repertoire of initial search strategies and reformulation strategies to retrieve records from OPACs using the Z39.50 protocol, and also assists users in mapping query keywords to the Library of Congress subject headings."
    Source
    Electronic library. 22(2004) no.2, S.112-120
    Type
    a
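    Code sketch
    The abstract and content above describe turning a natural language query into a series of increasingly broader Boolean search statements. A minimal sketch of that idea under our own simplifications; the actual E-Referencer strategies are more elaborate:
      def broadening_boolean_statements(query):
          """Strictest statement first: all terms ANDed, then drop one
          term at a time, then any term (OR)."""
          stopwords = {"a", "an", "the", "of", "for", "to", "in"}
          terms = [t for t in query.lower().split() if t not in stopwords]
          statements = [" AND ".join(terms)]
          if len(terms) > 2:
              for i in range(len(terms)):
                  statements.append(" AND ".join(terms[:i] + terms[i + 1:]))
          statements.append(" OR ".join(terms))
          return statements

      for s in broadening_boolean_statements("ranking algorithms for Boolean OPACs"):
          print(s)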
  3. Courtois, M.P.; Berry, M.W.: Results ranking in Web search engines (1999) 0.08
    0.07906755 = product of:
      0.1581351 = sum of:
        0.006476338 = weight(_text_:a in 3726) [ClassicSimilarity], result of:
          0.006476338 = score(doc=3726,freq=2.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.12739488 = fieldWeight in 3726, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=3726)
        0.15165876 = weight(_text_:et in 3726) [ClassicSimilarity], result of:
          0.15165876 = score(doc=3726,freq=4.0), product of:
            0.20686594 = queryWeight, product of:
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.044089027 = queryNorm
            0.7331258 = fieldWeight in 3726, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.078125 = fieldNorm(doc=3726)
      0.5 = coord(2/4)
    
    Abstract
    A comparison of the ranking methods of five search engines (AltaVista, HotBot, Excite, Infoseek and Lycos). The features tested are the presence of all query words, their proximity, and their location.
    Type
    a
  4. Ding, Y.; Chowdhury, G.; Foo, S.: Organising keywords in a Web search environment : a methodology based on co-word analysis (2000) 0.08
    0.07779418 = product of:
      0.10372557 = sum of:
        0.00868892 = weight(_text_:a in 105) [ClassicSimilarity], result of:
          0.00868892 = score(doc=105,freq=10.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.1709182 = fieldWeight in 105, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=105)
        0.064343356 = weight(_text_:et in 105) [ClassicSimilarity], result of:
          0.064343356 = score(doc=105,freq=2.0), product of:
            0.20686594 = queryWeight, product of:
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.044089027 = queryNorm
            0.3110389 = fieldWeight in 105, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.046875 = fieldNorm(doc=105)
        0.03069329 = product of:
          0.06138658 = sum of:
            0.06138658 = weight(_text_:al in 105) [ClassicSimilarity], result of:
              0.06138658 = score(doc=105,freq=2.0), product of:
                0.20205697 = queryWeight, product of:
                  4.582931 = idf(docFreq=1228, maxDocs=44218)
                  0.044089027 = queryNorm
                0.30380827 = fieldWeight in 105, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.582931 = idf(docFreq=1228, maxDocs=44218)
                  0.046875 = fieldNorm(doc=105)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The rapid development of the Internet and World Wide Web has caused some critical problems for information retrieval. Researchers have made several attempts to solve these problems. Thesauri and subject heading lists, as traditional information retrieval tools, have been criticised as inadequate for tackling these newly emerging problems. This paper proposes an information retrieval tool generated by co-word analysis, comprising keyword clusters with relationships based on the co-occurrences of keywords in the literature. Such a tool can play the role of an associative thesaurus that can provide information about the keywords in a domain that might be useful for information searching and query expansion.
    Source
    Dynamism and stability in knowledge organization: Proceedings of the 6th International ISKO-Conference, 10-13 July 2000, Toronto, Canada. Ed.: C. Beghtol et al
    Type
    a
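    Code sketch
    A minimal sketch of the co-word step described above: count keyword co-occurrences across documents and keep the strongest associations as thesaurus-like links. The cosine-style association measure is one common choice, not necessarily the authors':
      from collections import Counter
      from itertools import combinations
      from math import sqrt

      docs = [{"retrieval", "ranking", "web"},
              {"retrieval", "thesaurus", "query"},
              {"ranking", "web", "query"}]

      freq = Counter(k for d in docs for k in d)
      pairs = Counter(p for d in docs for p in combinations(sorted(d), 2))

      # Normalized co-occurrence strength between keyword pairs.
      links = {(a, b): c / sqrt(freq[a] * freq[b]) for (a, b), c in pairs.items()}
      for (a, b), s in sorted(links.items(), key=lambda x: -x[1]):
          print(f"{a} -- {b}: {s:.2f}")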
  5. Bodoff, D.; Robertson, S.: A new unified probabilistic model (2004) 0.08
    0.07710619 = product of:
      0.10280825 = sum of:
        0.007771606 = weight(_text_:a in 2129) [ClassicSimilarity], result of:
          0.007771606 = score(doc=2129,freq=8.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.15287387 = fieldWeight in 2129, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2129)
        0.064343356 = weight(_text_:et in 2129) [ClassicSimilarity], result of:
          0.064343356 = score(doc=2129,freq=2.0), product of:
            0.20686594 = queryWeight, product of:
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.044089027 = queryNorm
            0.3110389 = fieldWeight in 2129, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.046875 = fieldNorm(doc=2129)
        0.03069329 = product of:
          0.06138658 = sum of:
            0.06138658 = weight(_text_:al in 2129) [ClassicSimilarity], result of:
              0.06138658 = score(doc=2129,freq=2.0), product of:
                0.20205697 = queryWeight, product of:
                  4.582931 = idf(docFreq=1228, maxDocs=44218)
                  0.044089027 = queryNorm
                0.30380827 = fieldWeight in 2129, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.582931 = idf(docFreq=1228, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2129)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    This paper proposes a new unified probabilistic model. Two previous models, Robertson et al.'s "Model 0" and "Model 3," each have strengths and weaknesses. The strength of Model 0 not found in Model 3 is that it does not require relevance data about the particular document or query, and, related to that, its probability estimates are straightforward. The strength of Model 3 not found in Model 0 is that it can utilize feedback information about the particular document and query in question. In this paper we introduce a new unified probabilistic model that combines these strengths: the expression of its probabilities is straightforward, it does not require that data be available for the particular document or query in question, but it can utilize such specific data if it is available. The model is one way to resolve the difficulty of combining two marginal views in probabilistic retrieval.
    Type
    a
  6. Cross-language information retrieval (1998) 0.08
    0.075842336 = product of:
      0.10112311 = sum of:
        0.009848507 = weight(_text_:a in 6299) [ClassicSimilarity], result of:
          0.009848507 = score(doc=6299,freq=74.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.19372822 = fieldWeight in 6299, product of:
              8.602325 = tf(freq=74.0), with freq of:
                74.0 = termFreq=74.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.01953125 = fieldNorm(doc=6299)
        0.05994839 = weight(_text_:et in 6299) [ClassicSimilarity], result of:
          0.05994839 = score(doc=6299,freq=10.0), product of:
            0.20686594 = queryWeight, product of:
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.044089027 = queryNorm
            0.28979343 = fieldWeight in 6299, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.01953125 = fieldNorm(doc=6299)
        0.03132621 = product of:
          0.06265242 = sum of:
            0.06265242 = weight(_text_:al in 6299) [ClassicSimilarity], result of:
              0.06265242 = score(doc=6299,freq=12.0), product of:
                0.20205697 = queryWeight, product of:
                  4.582931 = idf(docFreq=1228, maxDocs=44218)
                  0.044089027 = queryNorm
                0.31007302 = fieldWeight in 6299, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  4.582931 = idf(docFreq=1228, maxDocs=44218)
                  0.01953125 = fieldNorm(doc=6299)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Content
    Contains the contributions: GREFENSTETTE, G.: The Problem of Cross-Language Information Retrieval; DAVIS, M.W.: On the Effective Use of Large Parallel Corpora in Cross-Language Text Retrieval; BALLESTEROS, L. and W.B. CROFT: Statistical Methods for Cross-Language Information Retrieval; Distributed Cross-Lingual Information Retrieval; Automatic Cross-Language Information Retrieval Using Latent Semantic Indexing; EVANS, D.A. et al.: Mapping Vocabularies Using Latent Semantics; PICCHI, E. and C. PETERS: Cross-Language Information Retrieval: A System for Comparable Corpus Querying; YAMABANA, K. et al.: A Language Conversion Front-End for Cross-Language Information Retrieval; GACHOT, D.A. et al.: The Systran NLP Browser: An Application of Machine Translation Technology in Cross-Language Information Retrieval; HULL, D.: A Weighted Boolean Model for Cross-Language Text Retrieval; SHERIDAN, P. et al.: Building a Large Multilingual Test Collection from Comparable News Documents; OARD, D.W. and B.J. DORR: Evaluating Cross-Language Text Filtering Effectiveness
    Footnote
    Rez. in: Machine translation review: 1999, no.10, S.26-27 (D. Lewis): "Cross Language Information Retrieval (CLIR) addresses the growing need to access large volumes of data across language boundaries. The typical requirement is for the user to input a free form query, usually a brief description of a topic, into a search or retrieval engine which returns a list, in ranked order, of documents or web pages that are relevant to the topic. The search engine matches the terms in the query to indexed terms, usually keywords previously derived from the target documents. Unlike monolingual information retrieval, CLIR requires query terms in one language to be matched to indexed terms in another. Matching can be done by bilingual dictionary lookup, full machine translation, or by applying statistical methods. A query's success is measured in terms of recall (how many potentially relevant target documents are found) and precision (what proportion of documents found are relevant). Issues in CLIR are how to translate query terms into index terms, how to eliminate alternative translations (e.g. to decide that French 'traitement' in a query means 'treatment' and not 'salary'), and how to rank or weight translation alternatives that are retained (e.g. how to order the French terms 'aventure', 'business', 'affaire', and 'liaison' as relevant translations of English 'affair'). Grefenstette provides a lucid and useful overview of the field and the problems. The volume brings together a number of experiments and projects in CLIR. Mark Davies (New Mexico State University) describes Recuerdo, a Spanish retrieval engine which reduces translation ambiguities by scanning indexes for parallel texts; it also uses either a bilingual dictionary or direct equivalents from a parallel corpus in order to compare results for queries on parallel texts. Lisa Ballesteros and Bruce Croft (University of Massachusetts) use a 'local feedback' technique which automatically enhances a query by adding extra terms to it both before and after translation; such terms can be derived from documents known to be relevant to the query.
    Christian Fluhr et al (DIST/SMTI, France) outline the EMIR (European Multilingual Information Retrieval) and ESPRIT projects. They found that using SYSTRAN to machine translate queries and to access material from various multilingual databases produced less relevant results than a method referred to as 'multilingual reformulation' (the mechanics of which are only hinted at). An interesting technique is Latent Semantic Indexing (LSI), described by Michael Littman et al (Brown University) and, most clearly, by David Evans et al (Carnegie Mellon University). LSI involves creating matrices of documents and the terms they contain and 'fitting' related documents into a reduced matrix space. This effectively allows queries to be mapped onto a common semantic representation of the documents. Eugenio Picchi and Carol Peters (Pisa) report on a procedure to create links between translation equivalents in an Italian-English parallel corpus. The links are used to construct parallel linguistic contexts in real-time for any term or combination of terms that is being searched for in either language. Their interest is primarily lexicographic but they plan to apply the same procedure to comparable corpora, i.e. to texts which are not translations of each other but which share the same domain. Kiyoshi Yamabana et al (NEC, Japan) address the issue of how to disambiguate between alternative translations of query terms. Their DMAX (double maximise) method looks at co-occurrence frequencies between both source language words and target language words in order to arrive at the most probable translation. The statistical data for the decision are derived not from the translation texts but independently from monolingual corpora in each language. An interactive user interface allows the user to influence the selection of terms during the matching process. Denis Gachot et al (SYSTRAN) describe the SYSTRAN NLP browser, a prototype tool which collects parsing information derived from a text or corpus previously translated with SYSTRAN. The user enters queries into the browser in either a structured or free form and receives grammatical and lexical information about the source text and/or its translation.
    The retrieved output from a query including the phrase 'big rockets' may be, for instance, a sentence containing 'giant rocket' which is semantically ranked above 'military rocket'. David Hull (Xerox Research Centre, Grenoble) describes an implementation of a weighted Boolean model for Spanish-English CLIR. Users construct Boolean-type queries, weighting each term in the query, which is then translated by an on-line dictionary before being applied to the database. Comparisons with the performance of unweighted free-form queries ('vector space' models) proved encouraging. Two contributions consider the evaluation of CLIR systems. In order to by-pass the time-consuming and expensive process of assembling a standard collection of documents and of user queries against which the performance of a CLIR system is manually assessed, Páraic Sheridan et al (ETH Zurich) propose a method based on retrieving 'seed documents'. This involves identifying a unique document in a database (the 'seed document') and, for a number of queries, measuring how fast it is retrieved. The authors have also assembled a large database of multilingual news documents for testing purposes. By storing the (fairly short) documents in a structured form tagged with descriptor codes (e.g. for topic, country and area), the test suite is easily expanded while remaining consistent for the purposes of testing. Douglas Oard and Bonnie Dorr (University of Maryland) describe an evaluation methodology which appears to apply LSI techniques in order to filter and rank incoming documents designed for testing CLIR systems. The volume provides the reader with an excellent overview of several projects in CLIR. It is well supported with references and is intended as a secondary text for researchers and practitioners. It highlights the need for a good, general tutorial introduction to the field."
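    Code sketch
    The review above describes Latent Semantic Indexing as building term/document matrices and fitting documents into a reduced matrix space. A minimal numpy sketch of that step with toy data; the query folding q_k = q^T U_k S_k^{-1} is the standard LSI construction:
      import numpy as np

      # Term-document matrix (rows = terms, columns = documents).
      A = np.array([[2, 0, 1, 0],   # "rocket"
                    [1, 0, 2, 0],   # "launch"
                    [0, 3, 0, 1],   # "salary"
                    [0, 1, 0, 2]],  # "treatment"
                   dtype=float)

      U, s, Vt = np.linalg.svd(A, full_matrices=False)
      k = 2
      Uk, Sk = U[:, :k], np.diag(s[:k])
      docs_k = Vt[:k].T                        # documents in the latent space

      q = np.array([1, 1, 0, 0], dtype=float)  # query "rocket launch"
      q_k = q @ Uk @ np.linalg.inv(Sk)          # fold query into the same space

      # Rank documents by cosine similarity in the latent space.
      sims = docs_k @ q_k / (np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k))
      print(np.argsort(-sims))  # documents 0 and 2 rank first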
  7. Agosti, M.; Pretto, L.: A theoretical study of a generalized version of Kleinberg's HITS algorithm (2005) 0.07
    0.066683784 = product of:
      0.08891171 = sum of:
        0.0097145075 = weight(_text_:a in 4) [ClassicSimilarity], result of:
          0.0097145075 = score(doc=4,freq=18.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.19109234 = fieldWeight in 4, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4)
        0.053619467 = weight(_text_:et in 4) [ClassicSimilarity], result of:
          0.053619467 = score(doc=4,freq=2.0), product of:
            0.20686594 = queryWeight, product of:
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.044089027 = queryNorm
            0.2591991 = fieldWeight in 4, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4)
        0.025577743 = product of:
          0.051155485 = sum of:
            0.051155485 = weight(_text_:al in 4) [ClassicSimilarity], result of:
              0.051155485 = score(doc=4,freq=2.0), product of:
                0.20205697 = queryWeight, product of:
                  4.582931 = idf(docFreq=1228, maxDocs=44218)
                  0.044089027 = queryNorm
                0.25317356 = fieldWeight in 4, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.582931 = idf(docFreq=1228, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Kleinberg's HITS (Hyperlink-Induced Topic Search) algorithm (Kleinberg 1999), which was originally developed in a Web context, tries to infer the authoritativeness of a Web page in relation to a specific query, using the structure of a subgraph of the Web graph that is obtained by considering this specific query. Recent applications of this algorithm in contexts far removed from that of Web searching (Bacchin, Ferro and Melucci 2002, Ng et al. 2001) inspired us to study the algorithm in the abstract, independently of its particular applications, trying to mathematically illuminate its behaviour. In the present paper we detail this theoretical analysis. The original work starts from the definition of a revised and more general version of the algorithm, which includes the classic one as a particular case. We perform an analysis of the structure of two particular matrices, essential to studying the behaviour of the algorithm, and we prove the convergence of the algorithm in the most general case, finding the analytic expression of the vectors to which it converges. Then we study the symmetry of the algorithm and prove the equivalence between the existence of symmetry and the independence from the order of execution of some basic operations on initial vectors. Finally, we expound some interesting consequences of our theoretical results.
    Type
    a
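    Code sketch
    A minimal sketch of the classic HITS iteration that the paper generalizes: alternate authority and hub updates over the link matrix and renormalize until the vectors settle. The generalized version analyzed by Agosti and Pretto is not reproduced here:
      import numpy as np

      def hits(adj, max_iter=100, tol=1e-10):
          """Classic HITS: a = A^T h, h = A a, renormalized each step."""
          hubs = np.ones(adj.shape[0])
          auth = np.ones(adj.shape[0])
          for _ in range(max_iter):
              new_auth = adj.T @ hubs
              new_auth /= np.linalg.norm(new_auth)
              new_hubs = adj @ new_auth
              new_hubs /= np.linalg.norm(new_hubs)
              converged = (np.allclose(new_auth, auth, atol=tol)
                           and np.allclose(new_hubs, hubs, atol=tol))
              auth, hubs = new_auth, new_hubs
              if converged:
                  break
          return hubs, auth

      # Tiny link graph: adj[i, j] = 1 if page i links to page j.
      adj = np.array([[0, 1, 1], [0, 0, 1], [1, 0, 0]], dtype=float)
      print(hits(adj))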
  8. Widyantoro, D.H.; Ioerger, T.R.; Yen, J.: Learning user interest dynamics with a three-descriptor representation (2001) 0.03
    0.031929728 = product of:
      0.063859455 = sum of:
        0.01023999 = weight(_text_:a in 5185) [ClassicSimilarity], result of:
          0.01023999 = score(doc=5185,freq=20.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.20142901 = fieldWeight in 5185, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5185)
        0.053619467 = weight(_text_:et in 5185) [ClassicSimilarity], result of:
          0.053619467 = score(doc=5185,freq=2.0), product of:
            0.20686594 = queryWeight, product of:
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.044089027 = queryNorm
            0.2591991 = fieldWeight in 5185, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5185)
      0.5 = coord(2/4)
    
    Abstract
    The use of documents ranked high by user feedback to profile user interests is commonly done with Rocchio's algorithm, which uses a single list of attribute-value pairs, called a descriptor, to carry term weights for an individual. Negative feedback on old preferences or positive feedback on new preferences adjusts the descriptor at a fixed, predetermined, and often slow pace. Widyantoro et al. suggest a three-descriptor model which adds two short-term interest descriptors, one each for positive and negative feedback. User short-term interest in a particular document is computed by subtracting the similarity measure with the negative descriptor from the similarity measure with the positive descriptor. Using a constant to represent the desired impact of long- and short-term interests, these values may be summed for a single interest value. Using the Reuters-21578 1.0 test collection split into training and test sets, topics with at least 100 documents in a tight cluster were chosen. The TDR handles change well, showing better recovery speed and accuracy than the single-descriptor model. The nearest-neighbor update strategy appears to keep the category concept relatively consistent when multiple TDRs are used.
    Type
    a
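    Code sketch
    A minimal sketch of the interest computation summarized above: short-term interest is the similarity to the positive descriptor minus the similarity to the negative one, and a constant blends long- and short-term interest. The vectors and the blending constant are illustrative:
      import numpy as np

      def cosine(u, v):
          return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

      def interest(doc, long_term, short_pos, short_neg, alpha=0.5):
          """Three-descriptor interest value for one document."""
          short = cosine(doc, short_pos) - cosine(doc, short_neg)
          return alpha * cosine(doc, long_term) + (1 - alpha) * short

      doc = np.array([0.2, 0.8, 0.1])
      print(interest(doc,
                     long_term=np.array([0.3, 0.6, 0.2]),
                     short_pos=np.array([0.1, 0.9, 0.0]),
                     short_neg=np.array([0.9, 0.1, 0.3])))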
  9. Chen, Z.; Meng, X.; Fowler, R.H.; Zhu, B.: Real-time adaptive feature and document learning for Web search (2001) 0.03
    0.031929728 = product of:
      0.063859455 = sum of:
        0.01023999 = weight(_text_:a in 5209) [ClassicSimilarity], result of:
          0.01023999 = score(doc=5209,freq=20.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.20142901 = fieldWeight in 5209, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5209)
        0.053619467 = weight(_text_:et in 5209) [ClassicSimilarity], result of:
          0.053619467 = score(doc=5209,freq=2.0), product of:
            0.20686594 = queryWeight, product of:
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.044089027 = queryNorm
            0.2591991 = fieldWeight in 5209, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.692005 = idf(docFreq=1101, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5209)
      0.5 = coord(2/4)
    
    Abstract
    Chen et al. report on the design of FEATURES, a Web search engine with adaptive features based on minimal relevance feedback. Rather than developing user profiles from previous searcher activity either at the server or client location, or updating indexes after search completion, FEATURES allows index and user characterization files to be updated during query modification on retrieval from a general-purpose search engine. Indexing terms relevant to a query are defined as the union of all terms assigned to documents retrieved by the initial search run and are used to build a vector space model on this retrieved set. The top ten weighted terms are presented to the user for a relevant/non-relevant choice, which is used to modify the term weights. Documents are chosen if their summed term weights are greater than some threshold. A user evaluation of the top ten ranked documents as non-relevant will decrease these term weights, and a positive judgement will increase them. A new ordering of the retrieved set will generate new display lists of terms and documents. Precision is improved in a test on AltaVista searches.
    Type
    a
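    Code sketch
    A minimal sketch of the feedback loop described above: raise the weights of top terms the user marks relevant, lower the others, then keep documents whose summed term weights clear a threshold. The update factor and threshold are illustrative:
      def update_term_weights(weights, judgements, step=0.25):
          """Scale each judged term's weight up (relevant) or down (not)."""
          for term, relevant in judgements.items():
              weights[term] = weights.get(term, 0.0) * ((1 + step) if relevant else (1 - step))
          return weights

      def select_documents(docs, weights, threshold):
          """Keep documents whose summed term weights exceed the threshold."""
          scores = {d: sum(weights.get(t, 0.0) for t in terms)
                    for d, terms in docs.items()}
          return sorted((d for d, s in scores.items() if s > threshold),
                        key=lambda d: -scores[d])

      weights = {"adaptive": 1.0, "web": 0.8, "search": 0.6}
      weights = update_term_weights(weights, {"adaptive": True, "web": False})
      docs = {"d1": {"adaptive", "search"}, "d2": {"web"}}
      print(select_documents(docs, weights, threshold=1.0))  # ['d1']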
  10. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.03
    0.029074889 = product of:
      0.058149777 = sum of:
        0.010362141 = weight(_text_:a in 402) [ClassicSimilarity], result of:
          0.010362141 = score(doc=402,freq=2.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.20383182 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
        0.047787637 = product of:
          0.09557527 = sum of:
            0.09557527 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.09557527 = score(doc=402,freq=2.0), product of:
                0.15439226 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044089027 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
    Type
    a
  11. Smeaton, A.F.; Rijsbergen, C.J. van: The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.03
    0.02731834 = product of:
      0.05463668 = sum of:
        0.012822497 = weight(_text_:a in 2134) [ClassicSimilarity], result of:
          0.012822497 = score(doc=2134,freq=4.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.25222903 = fieldWeight in 2134, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=2134)
        0.041814182 = product of:
          0.083628364 = sum of:
            0.083628364 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
              0.083628364 = score(doc=2134,freq=2.0), product of:
                0.15439226 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044089027 = queryNorm
                0.5416616 = fieldWeight in 2134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    30. 3.2001 13:32:22
    Type
    a
  12. Al-Hawamdeh, S.; Smith, G.; Willett, P.; Vere, R. de: Using nearest-neighbour searching techniques to access full-text documents (1991) 0.03
    0.026254807 = product of:
      0.052509613 = sum of:
        0.011585227 = weight(_text_:a in 2300) [ClassicSimilarity], result of:
          0.011585227 = score(doc=2300,freq=10.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.22789092 = fieldWeight in 2300, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=2300)
        0.040924385 = product of:
          0.08184877 = sum of:
            0.08184877 = weight(_text_:al in 2300) [ClassicSimilarity], result of:
              0.08184877 = score(doc=2300,freq=2.0), product of:
                0.20205697 = queryWeight, product of:
                  4.582931 = idf(docFreq=1228, maxDocs=44218)
                  0.044089027 = queryNorm
                0.4050777 = fieldWeight in 2300, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.582931 = idf(docFreq=1228, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2300)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Summarises the results to date of a continuing programme of research at Sheffield Univ. to investigate the use of nearest-neighbour retrieval algorithms for full text searching. Given a natural language query statement, the research methods result in a ranking of the paragraphs comprising a full text document in order of decreasing similarity with the query, where the similarity for each paragraph is determined by the number of keyword stems that it has in common with the query
    Type
    a
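    Code sketch
    A minimal sketch of the paragraph ranking described above: the similarity of each paragraph to the query is the number of keyword stems the two have in common. The crude suffix-stripping stemmer is a stand-in for a real one (e.g. Porter):
      import re

      def stem(word):
          # Crude suffix stripping; a real system would use a proper stemmer.
          for suffix in ("ing", "ed", "es", "s"):
              if word.endswith(suffix) and len(word) > len(suffix) + 2:
                  return word[: -len(suffix)]
          return word

      def stems(text):
          return {stem(w) for w in re.findall(r"[a-z]+", text.lower())}

      def rank_paragraphs(query, paragraphs):
          """Rank paragraphs by the number of stems shared with the query."""
          q = stems(query)
          scored = [(len(q & stems(p)), p) for p in paragraphs]
          return sorted(scored, key=lambda x: -x[0])

      paragraphs = ["Nearest neighbour searching ranks paragraphs.",
                    "Unrelated text about libraries."]
      print(rank_paragraphs("ranking paragraph searches", paragraphs))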
  13. Back, J.: An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.03
    0.025440529 = product of:
      0.050881058 = sum of:
        0.009066874 = weight(_text_:a in 3445) [ClassicSimilarity], result of:
          0.009066874 = score(doc=3445,freq=2.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.17835285 = fieldWeight in 3445, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=3445)
        0.041814182 = product of:
          0.083628364 = sum of:
            0.083628364 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.083628364 = score(doc=3445,freq=2.0), product of:
                0.15439226 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044089027 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    25. 8.2005 17:42:22
    Type
    a
  14. Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.02
    0.021806166 = product of:
      0.04361233 = sum of:
        0.007771606 = weight(_text_:a in 58) [ClassicSimilarity], result of:
          0.007771606 = score(doc=58,freq=2.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.15287387 = fieldWeight in 58, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=58)
        0.035840724 = product of:
          0.07168145 = sum of:
            0.07168145 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
              0.07168145 = score(doc=58,freq=2.0), product of:
                0.15439226 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044089027 = queryNorm
                0.46428138 = fieldWeight in 58, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=58)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    14. 6.2015 22:12:44
    Type
    a
  15. Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.02
    0.021806166 = product of:
      0.04361233 = sum of:
        0.007771606 = weight(_text_:a in 2051) [ClassicSimilarity], result of:
          0.007771606 = score(doc=2051,freq=2.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.15287387 = fieldWeight in 2051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=2051)
        0.035840724 = product of:
          0.07168145 = sum of:
            0.07168145 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
              0.07168145 = score(doc=2051,freq=2.0), product of:
                0.15439226 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044089027 = queryNorm
                0.46428138 = fieldWeight in 2051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=2051)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    14. 6.2015 22:12:56
    Type
    a
  16. Losada, D.E.; Barreiro, A.: Embedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.02
    0.017739523 = product of:
      0.035479046 = sum of:
        0.011585227 = weight(_text_:a in 1422) [ClassicSimilarity], result of:
          0.011585227 = score(doc=1422,freq=10.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.22789092 = fieldWeight in 1422, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=1422)
        0.023893818 = product of:
          0.047787637 = sum of:
            0.047787637 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
              0.047787637 = score(doc=1422,freq=2.0), product of:
                0.15439226 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044089027 = queryNorm
                0.30952093 = fieldWeight in 1422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1422)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    We propose a novel approach to incorporate term similarity and inverse document frequency into a logical model of information retrieval. The ability of the logic to handle expressive representations, along with the use of such classical notions, is promising for IR systems. The approach proposed here has been efficiently implemented, and experiments against test collections are presented.
    Date
    22. 3.2003 19:27:23
    Type
    a
  17. Faloutsos, C.: Signature files (1992) 0.02
    0.01712798 = product of:
      0.03425596 = sum of:
        0.010362141 = weight(_text_:a in 3499) [ClassicSimilarity], result of:
          0.010362141 = score(doc=3499,freq=8.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.20383182 = fieldWeight in 3499, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=3499)
        0.023893818 = product of:
          0.047787637 = sum of:
            0.047787637 = weight(_text_:22 in 3499) [ClassicSimilarity], result of:
              0.047787637 = score(doc=3499,freq=2.0), product of:
                0.15439226 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044089027 = queryNorm
                0.30952093 = fieldWeight in 3499, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3499)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Presents a survey and discussion of signature-based text retrieval methods. It describes the main idea behind the signature approach and its advantages over other text retrieval methods; it provides a classification of the signature methods that have appeared in the literature; it describes the main representatives of each class, together with their relative advantages and drawbacks; and it gives a list of applications as well as commercial or university prototypes that use the signature approach.
    Date
    7. 5.1999 15:22:48
    Type
    a
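    Code sketch
    A minimal sketch of the superimposed-coding variant of the signature approach surveyed above: each word hashes to a few bit positions, a text block's signature is the OR of its word signatures, and a query signature filters blocks before any exact check (false drops are possible by design). The parameters are illustrative:
      import hashlib

      BITS, HASHES = 64, 3

      def word_signature(word):
          """Set HASHES bits of a BITS-wide signature from word hashes."""
          sig = 0
          for i in range(HASHES):
              h = hashlib.sha1(f"{i}:{word}".encode()).digest()
              sig |= 1 << (int.from_bytes(h[:4], "big") % BITS)
          return sig

      def block_signature(words):
          sig = 0
          for w in words:
              sig |= word_signature(w)   # superimposed coding
          return sig

      def may_contain(block_sig, query_words):
          q = block_signature(query_words)
          return block_sig & q == q      # all query bits set -> candidate

      blocks = [["signature", "files", "text"], ["boolean", "model"]]
      sigs = [block_signature(b) for b in blocks]
      print([i for i, s in enumerate(sigs) if may_contain(s, ["signature"])])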
  18. Bornmann, L.; Mutz, R.: From P100 to P100' : a new citation-rank approach (2014) 0.02
    0.01712798 = product of:
      0.03425596 = sum of:
        0.010362141 = weight(_text_:a in 1431) [ClassicSimilarity], result of:
          0.010362141 = score(doc=1431,freq=8.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.20383182 = fieldWeight in 1431, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=1431)
        0.023893818 = product of:
          0.047787637 = sum of:
            0.047787637 = weight(_text_:22 in 1431) [ClassicSimilarity], result of:
              0.047787637 = score(doc=1431,freq=2.0), product of:
                0.15439226 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044089027 = queryNorm
                0.30952093 = fieldWeight in 1431, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1431)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Properties of a percentile-based rating scale needed in bibliometrics are formulated. Based on these properties, P100 was recently introduced as a new citation-rank approach (Bornmann, Leydesdorff, & Wang, 2013). In this paper, we conceptualize P100 and propose an improvement which we call P100'. Advantages and disadvantages of citation-rank indicators are noted.
    Date
    22. 8.2014 17:05:18
    Type
    a
  19. MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.02
    0.01561048 = product of:
      0.03122096 = sum of:
        0.007327141 = weight(_text_:a in 5108) [ClassicSimilarity], result of:
          0.007327141 = score(doc=5108,freq=4.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.14413087 = fieldWeight in 5108, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=5108)
        0.023893818 = product of:
          0.047787637 = sum of:
            0.047787637 = weight(_text_:22 in 5108) [ClassicSimilarity], result of:
              0.047787637 = score(doc=5108,freq=2.0), product of:
                0.15439226 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044089027 = queryNorm
                0.30952093 = fieldWeight in 5108, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5108)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    20. 1.2007 18:30:22
    Type
    a
  20. Abu-Salem, H.; Al-Omari, M.; Evens, M.W.: Stemming methodologies over individual query words for an Arabic information retrieval system (1999) 0.02
    0.015593208 = product of:
      0.031186417 = sum of:
        0.005608674 = weight(_text_:a in 3672) [ClassicSimilarity], result of:
          0.005608674 = score(doc=3672,freq=6.0), product of:
            0.05083672 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.044089027 = queryNorm
            0.11032722 = fieldWeight in 3672, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3672)
        0.025577743 = product of:
          0.051155485 = sum of:
            0.051155485 = weight(_text_:al in 3672) [ClassicSimilarity], result of:
              0.051155485 = score(doc=3672,freq=2.0), product of:
                0.20205697 = queryWeight, product of:
                  4.582931 = idf(docFreq=1228, maxDocs=44218)
                  0.044089027 = queryNorm
                0.25317356 = fieldWeight in 3672, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.582931 = idf(docFreq=1228, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3672)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Stemming is one of the most important factors that affect the performance of information retrieval systems. This article investigates how to improve the performance of an Arabic information retrieval system by imposing the retrieval method over individual words of a query depending on the importance of the WORD, the STEM, or the ROOT of the query terms in the database. This method, called Mixed Stemming, computes term importance using a weighting scheme that uses the Term Frequency (TF) and the Inverse Document Frequency (IDF), called TFxIDF. An extended version of the Arabic IRS system is designed, implemented, and evaluated to reduce the number of irrelevant documents retrieved. The results of the experiment suggest that the proposed method outperforms the Word index method using the TFxIDF weighting scheme. It also outperforms the Stem index method using the Binary weighting scheme but does not outperform the Stem index method using the TFxIDF weighting scheme, and again it outperforms the Root index method using the Binary weighting scheme but does not outperform the Root index method using the TFxIDF weighting scheme.
    Type
    a
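    Code sketch
    A minimal sketch of the TFxIDF weighting the abstract builds on, with a per-term choice among word, stem, and root index forms; picking the form with the highest weight is our illustration of the word-level selection, not necessarily the authors' exact rule:
      from math import log

      def tf_idf(tf, df, n_docs):
          # TFxIDF with the usual log-scaled inverse document frequency.
          return tf * log(n_docs / df)

      def pick_index_form(term_stats, n_docs):
          """Choose word, stem, or root index per query term by weight."""
          return {term: max(forms, key=lambda f: tf_idf(*forms[f], n_docs))
                  for term, forms in term_stats.items()}

      # (term frequency, document frequency) per index form -- toy numbers.
      term_stats = {"kitab": {"word": (40, 30),
                              "stem": (120, 400),
                              "root": (300, 2000)}}
      print(pick_index_form(term_stats, n_docs=10000))  # {'kitab': 'root'}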

Types

  • a 337
  • el 8
  • m 7
  • s 3
  • p 2
  • r 2