Search (67 results, page 1 of 4)

  • theme_ss:"Retrievalalgorithmen"
  1. Khoo, C.S.G.; Wan, K.-W.: A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.06
    0.055848993 = product of:
      0.11169799 = sum of:
        0.11169799 = sum of:
          0.087150216 = weight(_text_:language in 2509) [ClassicSimilarity], result of:
            0.087150216 = score(doc=2509,freq=16.0), product of:
              0.2030952 = queryWeight, product of:
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.051766515 = queryNorm
              0.42911017 = fieldWeight in 2509, product of:
                4.0 = tf(freq=16.0), with freq of:
                  16.0 = termFreq=16.0
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.02734375 = fieldNorm(doc=2509)
          0.024547769 = weight(_text_:22 in 2509) [ClassicSimilarity], result of:
            0.024547769 = score(doc=2509,freq=2.0), product of:
              0.18127751 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051766515 = queryNorm
              0.1354154 = fieldWeight in 2509, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.02734375 = fieldNorm(doc=2509)
      0.5 = coord(1/2)
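    The indented breakdown above is Lucene "explain" output for the ClassicSimilarity (TF-IDF) scorer, and the same pattern repeats for every entry below. As a minimal sketch of the arithmetic, the following Python recomputes the first weight(_text_:language ...) leaf from the values shown; the function name is ours, but the formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1)), leaf score = queryWeight * fieldWeight) are the standard ClassicSimilarity definitions.

    ```python
    import math

    def classic_tfidf_leaf(freq, doc_freq, max_docs, query_norm, field_norm):
        """Recompute one weight(...) leaf of a Lucene ClassicSimilarity explain tree."""
        tf = math.sqrt(freq)                               # 16.0 -> 4.0 = tf(freq=16.0)
        idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # 3.9232929 = idf(docFreq=2376)
        query_weight = idf * query_norm                    # 0.2030952  = queryWeight
        field_weight = tf * idf * field_norm               # 0.42911017 = fieldWeight
        return query_weight * field_weight

    leaf = classic_tfidf_leaf(freq=16.0, doc_freq=2376, max_docs=44218,
                              query_norm=0.051766515, field_norm=0.02734375)
    print(leaf)  # ~0.087150216; the entry total sums the leaves and applies coord(1/2)
    ```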
    
    Abstract
    A relevancy-ranking algorithm for a natural language interface to Boolean online public access catalogs (OPACs) was formulated and compared with the one currently used in the E-Referencer, a knowledge-based search interface being developed by the authors. The algorithm makes use of seven well-known ranking criteria: breadth of match, section weighting, proximity of query words, variant word forms (stemming), document frequency, term frequency and document length. It converts a natural language query into a series of increasingly broader Boolean search statements. In a small experiment with ten subjects in which the algorithm was simulated by hand, it obtained good results, with a mean overall precision of 0.42 and a mean average precision of 0.62, representing a 27 percent improvement in precision and a 41 percent improvement in average precision compared to the E-Referencer. The usefulness of each step in the algorithm was analyzed, and suggestions are made for improving it.
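    The central conversion step, rewriting a natural language query as a series of increasingly broader Boolean statements, might look roughly like the sketch below. This is a hypothetical illustration of the general idea, not the authors' actual procedure; the stopword list and the drop-one-term broadening schedule are our assumptions.

    ```python
    def broadening_boolean_statements(query,
                                      stopwords=frozenset({"a", "an", "the", "of", "for", "to"})):
        """Turn a natural language query into successively broader Boolean
        searches: AND all content words, then drop one word at a time, and
        finally OR everything together as the broadest fallback."""
        terms = [w.lower() for w in query.split() if w.lower() not in stopwords]
        statements = [" AND ".join(terms)]              # narrowest: all terms
        for i in range(len(terms)):                     # broader: leave one term out
            rest = terms[:i] + terms[i + 1:]
            if rest:
                statements.append(" AND ".join(rest))
        statements.append(" OR ".join(terms))           # broadest: any term
        return statements

    for s in broadening_boolean_statements("ranking algorithms for Boolean OPACs"):
        print(s)
    ```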
    Content
    "Most Web search engines accept natural language queries, perform some kind of fuzzy matching and produce ranked output, displaying first the documents that are most likely to be relevant. On the other hand, most library online public access catalogs (OPACs) an the Web are still Boolean retrieval systems that perform exact matching, and require users to express their search requests precisely in a Boolean search language and to refine their search statements to improve the search results. It is well-documented that users have difficulty searching Boolean OPACs effectively (e.g. Borgman, 1996; Ensor, 1992; Wallace, 1993). One approach to making OPACs easier to use is to develop a natural language search interface that acts as a middleware between the user's Web browser and the OPAC system. The search interface can accept a natural language query from the user and reformulate it as a series of Boolean search statements that are then submitted to the OPAC. The records retrieved by the OPAC are ranked by the search interface before forwarding them to the user's Web browser. The user, then, does not need to interact directly with the Boolean OPAC but with the natural language search interface or search intermediary. The search interface interacts with the OPAC system an the user's behalf. The advantage of this approach is that no modification to the OPAC or library system is required. Furthermore, the search interface can access multiple OPACs, acting as a meta search engine, and integrate search results from various OPACs before sending them to the user. The search interface needs to incorporate a method for converting the user's natural language query into a series of Boolean search statements, and for ranking the OPAC records retrieved. The purpose of this study was to develop a relevancyranking algorithm for a search interface to Boolean OPAC systems. This is part of an on-going effort to develop a knowledge-based search interface to OPACs called the E-Referencer (Khoo et al., 1998, 1999; Poo et al., 2000). E-Referencer v. 2 that has been implemented applies a repertoire of initial search strategies and reformulation strategies to retrieve records from OPACs using the Z39.50 protocol, and also assists users in mapping query keywords to the Library of Congress subject headings."
    Source
    Electronic library. 22(2004) no.2, S.112-120
  2. Soulier, L.; Jabeur, L.B.; Tamine, L.; Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model (2013) 0.05
    0.048659198 = product of:
      0.097318396 = sum of:
        0.097318396 = sum of:
          0.062250152 = weight(_text_:language in 664) [ClassicSimilarity], result of:
            0.062250152 = score(doc=664,freq=4.0), product of:
              0.2030952 = queryWeight, product of:
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.051766515 = queryNorm
              0.30650726 = fieldWeight in 664, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.0390625 = fieldNorm(doc=664)
          0.03506824 = weight(_text_:22 in 664) [ClassicSimilarity], result of:
            0.03506824 = score(doc=664,freq=2.0), product of:
              0.18127751 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051766515 = queryNorm
              0.19345059 = fieldWeight in 664, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=664)
      0.5 = coord(1/2)
    
    Abstract
    A new challenge, accessing multiple relevant entities, arises from the availability of linked heterogeneous data. In this article, we address more specifically the problem of accessing relevant entities, such as publications and authors within a bibliographic network, given an information need. We propose a novel algorithm, called BibRank, that estimates a joint relevance of documents and authors within a bibliographic network. This model ranks each type of entity using a score propagation algorithm with respect to the query topic and the structure of the underlying bi-type information entity network. Evidence sources, namely content-based and network-based scores, are both used to estimate the topical similarity between connected entities. For this purpose, authorship relationships are analyzed through a language model-based score on the one hand; on the other hand, non-topically related entities of the same type are detected through marginal citations. The article reports the results of experiments using the BibRank algorithm for an information retrieval task. The CiteSeerX bibliographic data set forms the basis for the automatic generation and evaluation of topical queries. We show that a statistically significant improvement over closely related ranking models is achieved.
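    The abstract describes score propagation over a bi-type document-author network. The sketch below shows generic propagation on such a bipartite graph, not the actual BibRank estimator; the mean aggregation, the damping factor and the iteration count are our assumptions.

    ```python
    from collections import defaultdict

    def propagate_scores(content_scores, authorship, iterations=10, damping=0.85):
        """Joint score propagation on a bipartite document-author graph.
        content_scores: {doc: content-based score}; authorship: [(doc, author)]."""
        docs_of = defaultdict(list)     # author -> documents
        authors_of = defaultdict(list)  # document -> authors
        for doc, author in authorship:
            docs_of[author].append(doc)
            authors_of[doc].append(author)

        doc_score = dict(content_scores)
        author_score = {a: 0.0 for a in docs_of}
        for _ in range(iterations):
            # authors inherit the mean score of their documents ...
            for a, docs in docs_of.items():
                author_score[a] = sum(doc_score[d] for d in docs) / len(docs)
            # ... and documents mix their own content score with their authors' scores
            for d, authors in authors_of.items():
                neighbours = sum(author_score[a] for a in authors) / len(authors)
                doc_score[d] = (1 - damping) * content_scores[d] + damping * neighbours
        return doc_score, author_score
    ```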
    Date
    22. 3.2013 19:34:49
  3. Dominich, S.: Mathematical foundations of information retrieval (2001) 0.04
    0.03954287 = product of:
      0.07908574 = sum of:
        0.07908574 = sum of:
          0.0440175 = weight(_text_:language in 1753) [ClassicSimilarity], result of:
            0.0440175 = score(doc=1753,freq=2.0), product of:
              0.2030952 = queryWeight, product of:
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.051766515 = queryNorm
              0.21673335 = fieldWeight in 1753, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9232929 = idf(docFreq=2376, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1753)
          0.03506824 = weight(_text_:22 in 1753) [ClassicSimilarity], result of:
            0.03506824 = score(doc=1753,freq=2.0), product of:
              0.18127751 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051766515 = queryNorm
              0.19345059 = fieldWeight in 1753, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1753)
      0.5 = coord(1/2)
    
    Abstract
    This book offers a comprehensive and consistent mathematical approach to information retrieval (IR), without which no implementation is possible, and sheds entirely new light on the structure of IR models. It describes all IR models in a unified formal style and language, with examples for each, thus offering a comprehensive overview of them. The book also lays mathematical foundations and develops a consistent mathematical theory of IR (including all mathematical results achieved so far) as a stand-alone mathematical discipline, which can thus be read and taught independently. In addition, it contains all the mathematical background on which IR relies, so that the reader need not consult other sources. The book will be of interest to computer or information scientists, librarians, mathematicians, undergraduate students and researchers whose work involves information retrieval.
    Date
    22. 3.2008 12:26:32
  4. Hoenkamp, E.; Bruza, P.: How everyday language can and will boost effective information retrieval (2015) 0.03
    0.031125076 = product of:
      0.062250152 = sum of:
        0.062250152 = product of:
          0.124500304 = sum of:
            0.124500304 = weight(_text_:language in 2123) [ClassicSimilarity], result of:
              0.124500304 = score(doc=2123,freq=16.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.6130145 = fieldWeight in 2123, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2123)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Typing 2 or 3 keywords into a browser has become an easy and efficient way to find information. Yet typing even short queries becomes tedious on ever-shrinking (virtual) keyboards. Meanwhile, speech processing is maturing rapidly, facilitating everyday language input. Also, wearable technology can inform users proactively by listening in on their conversations or processing their social media interactions. Given these developments, everyday language may soon become the new input of choice. We present an information retrieval (IR) algorithm specifically designed to accept everyday language. It integrates two paradigms of information retrieval, previously studied in isolation; one directed mainly at the surface structure of language, the other primarily at the underlying meaning. The integration was achieved by a Markov machine that encodes meaning by its transition graph, and surface structure by the language it generates. A rigorous evaluation of the approach showed, first, that it can compete with the quality of existing language models; second, that it is more effective the more verbose the input; and third, as a consequence, that it is promising for an imminent transition from keyword input, where the onus is on the user to formulate concise queries, to a modality where users can express their need for information more freely, more informally, and more naturally in everyday language.
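    As a toy version of the integration described above, the sketch below estimates a first-order Markov transition graph per document and scores a verbose query by the likelihood that the graph generates it. The real model in the paper is richer; the bigram order and the interpolation constant are our assumptions.

    ```python
    import math
    from collections import defaultdict

    def transition_graph(text):
        """Estimate a first-order Markov transition graph from one document."""
        words = text.lower().split()
        counts = defaultdict(lambda: defaultdict(int))
        for prev, cur in zip(words, words[1:]):
            counts[prev][cur] += 1
        return {prev: {w: c / sum(nxt.values()) for w, c in nxt.items()}
                for prev, nxt in counts.items()}

    def query_log_likelihood(query, graph, vocab_size, lam=0.2):
        """Log-probability that the document's transition graph generates the
        (possibly verbose) query, interpolated with a uniform model so that
        unseen transitions do not zero the score out."""
        words = query.lower().split()
        return sum(math.log((1 - lam) * graph.get(p, {}).get(c, 0.0) + lam / vocab_size)
                   for p, c in zip(words, words[1:]))
    ```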
  5. Niemi, T.; Junkkari, M.; Järvelin, K.; Viita, S.: Advanced query language for manipulating complex entities (2004) 0.03
    0.03081225 = product of:
      0.0616245 = sum of:
        0.0616245 = product of:
          0.123249 = sum of:
            0.123249 = weight(_text_:language in 4218) [ClassicSimilarity], result of:
              0.123249 = score(doc=4218,freq=2.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.60685337 = fieldWeight in 4218, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4218)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  6. Ponte, J.M.: Language models for relevance feedback (2000) 0.03
    0.029527843 = product of:
      0.059055686 = sum of:
        0.059055686 = product of:
          0.11811137 = sum of:
            0.11811137 = weight(_text_:language in 35) [ClassicSimilarity], result of:
              0.11811137 = score(doc=35,freq=10.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.5815567 = fieldWeight in 35, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=35)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The language modeling approach to Information Retrieval (IR) is a conceptually simple model of IR originally developed by Ponte and Croft (1998). In this approach, the query is treated as a random event and documents are ranked according to the likelihood that the query would be generated via a language model estimated for each document. The intuition behind this approach is that users have a prototypical document in mind and will choose query terms accordingly. Its appeal is that no inferences about the semantic content of documents need to be made, which keeps the model simple. In this paper, techniques for relevance feedback and routing are derived from the language modeling approach in a straightforward manner and their effectiveness is demonstrated empirically. These experiments provide further proof of concept for the language modeling approach to retrieval.
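    In this spirit, one simple way to derive feedback terms is to rank the terms of the relevant documents by how much more likely they are under the document models than under the collection model, and append the top-ranked terms to the query. The sketch below is our own illustration, not Ponte's exact estimator; the log-ratio heuristic and the floor for unseen terms are assumptions.

    ```python
    import math
    from collections import Counter

    def expand_query(query_terms, feedback_docs, collection_prob, k=5):
        """Rank terms from the feedback documents by their log probability
        ratio against the collection model and add the top k to the query."""
        scores = Counter()
        for doc in feedback_docs:
            counts = Counter(doc.lower().split())
            total = sum(counts.values())
            for term, c in counts.items():
                p_doc = c / total
                p_coll = collection_prob.get(term, 1e-6)   # floor for unseen terms
                scores[term] += math.log(p_doc / p_coll)
        expansion = [t for t, _ in scores.most_common() if t not in query_terms][:k]
        return list(query_terms) + expansion
    ```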
  7. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.03
    0.028054593 = product of:
      0.056109186 = sum of:
        0.056109186 = product of:
          0.11221837 = sum of:
            0.11221837 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.11221837 = score(doc=402,freq=2.0), product of:
                0.18127751 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051766515 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  8. Jacsó, P.: Mapping algorithms to translate natural language questions into search queries for Web databases (1997) 0.03
    0.0264105 = product of:
      0.052821 = sum of:
        0.052821 = product of:
          0.105642 = sum of:
            0.105642 = weight(_text_:language in 314) [ClassicSimilarity], result of:
              0.105642 = score(doc=314,freq=2.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.52016 = fieldWeight in 314, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.09375 = fieldNorm(doc=314)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  9. Tenopir, C.: Online databases : natural language searching with WIN (1993) 0.02
    0.02490006 = product of:
      0.04980012 = sum of:
        0.04980012 = product of:
          0.09960024 = sum of:
            0.09960024 = weight(_text_:language in 7038) [ClassicSimilarity], result of:
              0.09960024 = score(doc=7038,freq=4.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.4904116 = fieldWeight in 7038, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0625 = fieldNorm(doc=7038)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    WESTLAW is one of the first major commercial online systems to embrace both natural language input and partial match searching. Provides background on WESTLAW. Explains how the WESTLAW Is Natural (WIN) search engine works. Some searchers find that when searching with commands and Boolean logic, results differ drastically from those produced by searching with WIN. Discusses exact match Boolean logic search engines.
  10. Lalmas, M.: XML retrieval (2009) 0.02
    0.024606533 = product of:
      0.049213067 = sum of:
        0.049213067 = product of:
          0.09842613 = sum of:
            0.09842613 = weight(_text_:language in 4998) [ClassicSimilarity], result of:
              0.09842613 = score(doc=4998,freq=10.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.48463053 = fieldWeight in 4998, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4998)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Documents usually have a content and a structure. The content refers to the text of the document, whereas the structure refers to how a document is logically organized. An increasingly common way to encode the structure is through the use of a mark-up language. Nowadays, the most widely used mark-up language for representing structure is the eXtensible Mark-up Language (XML). XML can be used to provide focused access to documents, i.e. returning XML elements, such as sections and paragraphs, instead of whole documents in response to a query. Such focused strategies are of particular benefit for information repositories containing long documents, or documents covering a wide variety of topics, where users are directed to the most relevant content within a document. The increased adoption of XML to represent document structure requires the development of tools to effectively access documents marked up in XML. This book provides a detailed description of the query languages, indexing strategies, ranking algorithms, and presentation scenarios developed to access XML documents. Major advances in XML retrieval have been seen since 2002 as a result of INEX, the Initiative for the Evaluation of XML Retrieval. INEX, also described in this book, provided test sets for evaluating XML retrieval effectiveness. Many of the developments and results described in this book were investigated within INEX.
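    A minimal illustration of focused access: score every element of an XML document, rather than the document as a whole, and return the best-scoring elements. The term-count scoring below is a deliberate simplification; note that nested elements overlap (an ancestor contains its descendants' text), which is exactly the kind of issue the INEX evaluations addressed.

    ```python
    import xml.etree.ElementTree as ET

    def focused_elements(xml_text, query_terms, min_score=1):
        """Score each XML element by query-term occurrences in its subtree
        and return elements, not whole documents."""
        hits = []
        for elem in ET.fromstring(xml_text).iter():
            text = " ".join(elem.itertext()).lower()
            score = sum(text.count(t.lower()) for t in query_terms)
            if score >= min_score:
                hits.append((score, elem.tag))
        return sorted(hits, reverse=True)

    doc = "<article><sec>xml element retrieval</sec><sec>unrelated text</sec></article>"
    print(focused_elements(doc, ["xml", "retrieval"]))
    # the matching <sec> and its <article> ancestor both score
    ```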
    LCSH
    XML (Document markup language)
    Subject
    XML (Document markup language)
  11. Smeaton, A.F.; Rijsbergen, C.J. van: The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.02
    0.024547769 = product of:
      0.049095538 = sum of:
        0.049095538 = product of:
          0.098191075 = sum of:
            0.098191075 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
              0.098191075 = score(doc=2134,freq=2.0), product of:
                0.18127751 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051766515 = queryNorm
                0.5416616 = fieldWeight in 2134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    30. 3.2001 13:32:22
  12. Back, J.: An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.02
    0.024547769 = product of:
      0.049095538 = sum of:
        0.049095538 = product of:
          0.098191075 = sum of:
            0.098191075 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.098191075 = score(doc=3445,freq=2.0), product of:
                0.18127751 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051766515 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    25. 8.2005 17:42:22
  13. Cross-language information retrieval (1998) 0.02
    0.023343807 = product of:
      0.046687614 = sum of:
        0.046687614 = product of:
          0.09337523 = sum of:
            0.09337523 = weight(_text_:language in 6299) [ClassicSimilarity], result of:
              0.09337523 = score(doc=6299,freq=36.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.45976087 = fieldWeight in 6299, product of:
                  6.0 = tf(freq=36.0), with freq of:
                    36.0 = termFreq=36.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.01953125 = fieldNorm(doc=6299)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Content
    Contains the contributions: GREFENSTETTE, G.: The Problem of Cross-Language Information Retrieval; DAVIS, M.W.: On the Effective Use of Large Parallel Corpora in Cross-Language Text Retrieval; BALLESTEROS, L. and W.B. CROFT: Statistical Methods for Cross-Language Information Retrieval; Distributed Cross-Lingual Information Retrieval; Automatic Cross-Language Information Retrieval Using Latent Semantic Indexing; EVANS, D.A. et al.: Mapping Vocabularies Using Latent Semantics; PICCHI, E. and C. PETERS: Cross-Language Information Retrieval: A System for Comparable Corpus Querying; YAMABANA, K. et al.: A Language Conversion Front-End for Cross-Language Information Retrieval; GACHOT, D.A. et al.: The Systran NLP Browser: An Application of Machine Translation Technology in Cross-Language Information Retrieval; HULL, D.: A Weighted Boolean Model for Cross-Language Text Retrieval; SHERIDAN, P. et al.: Building a Large Multilingual Test Collection from Comparable News Documents; OARD, D.W. and B.J. DORR: Evaluating Cross-Language Text Filtering Effectiveness
    Footnote
    Review in: Machine translation review: 1999, no.10, S.26-27 (D. Lewis): "Cross Language Information Retrieval (CLIR) addresses the growing need to access large volumes of data across language boundaries. The typical requirement is for the user to input a free form query, usually a brief description of a topic, into a search or retrieval engine which returns a list, in ranked order, of documents or web pages that are relevant to the topic. The search engine matches the terms in the query to indexed terms, usually keywords previously derived from the target documents. Unlike monolingual information retrieval, CLIR requires query terms in one language to be matched to indexed terms in another. Matching can be done by bilingual dictionary lookup, full machine translation, or by applying statistical methods. A query's success is measured in terms of recall (how many potentially relevant target documents are found) and precision (what proportion of documents found are relevant). Issues in CLIR are how to translate query terms into index terms, how to eliminate alternative translations (e.g. to decide that French 'traitement' in a query means 'treatment' and not 'salary'), and how to rank or weight translation alternatives that are retained (e.g. how to order the French terms 'aventure', 'business', 'affaire', and 'liaison' as relevant translations of English 'affair'). Grefenstette provides a lucid and useful overview of the field and the problems. The volume brings together a number of experiments and projects in CLIR. Mark Davies (New Mexico State University) describes Recuerdo, a Spanish retrieval engine which reduces translation ambiguities by scanning indexes for parallel texts; it also uses either a bilingual dictionary or direct equivalents from a parallel corpus in order to compare results for queries on parallel texts. Lisa Ballesteros and Bruce Croft (University of Massachusetts) use a 'local feedback' technique which automatically enhances a query by adding extra terms to it both before and after translation; such terms can be derived from documents known to be relevant to the query.
    Christian Fluhr et al. (DIST/SMTI, France) outline the EMIR (European Multilingual Information Retrieval) and ESPRIT projects. They found that using SYSTRAN to machine-translate queries and to access material from various multilingual databases produced less relevant results than a method referred to as 'multilingual reformulation' (the mechanics of which are only hinted at). An interesting technique is Latent Semantic Indexing (LSI), described by Michael Littman et al (Brown University) and, most clearly, by David Evans et al (Carnegie Mellon University). LSI involves creating matrices of documents and the terms they contain and 'fitting' related documents into a reduced matrix space. This effectively allows queries to be mapped onto a common semantic representation of the documents. Eugenio Picchi and Carol Peters (Pisa) report on a procedure to create links between translation equivalents in an Italian-English parallel corpus. The links are used to construct parallel linguistic contexts in real-time for any term or combination of terms that is being searched for in either language. Their interest is primarily lexicographic but they plan to apply the same procedure to comparable corpora, i.e. to texts which are not translations of each other but which share the same domain. Kiyoshi Yamabana et al (NEC, Japan) address the issue of how to disambiguate between alternative translations of query terms. Their DMAX (double maximise) method looks at co-occurrence frequencies between both source language words and target language words in order to arrive at the most probable translation. The statistical data for the decision are derived, not from the translation texts but independently from monolingual corpora in each language. An interactive user interface allows the user to influence the selection of terms during the matching process. Denis Gachot et al (SYSTRAN) describe the SYSTRAN NLP browser, a prototype tool which collects parsing information derived from a text or corpus previously translated with SYSTRAN. The user enters queries into the browser in either a structured or free form and receives grammatical and lexical information about the source text and/or its translation.
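    The dictionary lookup route that the review describes can be pictured in a few lines of code: each source term maps to all of its dictionary translations, the alternatives for one term are ORed together, and the groups are ANDed. The weighting of alternatives discussed above is omitted here; the tiny dictionary reuses the review's own examples.

    ```python
    def translate_query(terms, bilingual_dict):
        """Dictionary-based CLIR query translation: OR the alternative
        translations of each source term, AND the per-term groups."""
        groups = []
        for term in terms:
            alts = bilingual_dict.get(term, [term])   # untranslatable terms pass through
            groups.append("(" + " OR ".join(alts) + ")")
        return " AND ".join(groups)

    en_fr = {"treatment": ["traitement"],
             "affair": ["affaire", "aventure", "liaison"]}
    print(translate_query(["treatment", "affair"], en_fr))
    # (traitement) AND (affaire OR aventure OR liaison)
    ```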
  14. Liddy, E.D.; Paik, W.; McKenna, M.; Yu, E.S.: A natural language text retrieval system with relevance feedback (1995) 0.02
    0.021787554 = product of:
      0.043575108 = sum of:
        0.043575108 = product of:
          0.087150216 = sum of:
            0.087150216 = weight(_text_:language in 3131) [ClassicSimilarity], result of:
              0.087150216 = score(doc=3131,freq=4.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.42911017 = fieldWeight in 3131, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3131)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Outlines a fully integrated retrieval engine that processes documents and queries at the multiple, complex linguistic levels that humans use to construe meaning. Currently undergoing beta-site trials, the DR-LINK natural language text retrieval system allows searchers to state queries as fully formed, natural sentences. The meaning and matching of both queries and documents are accomplished at the conceptual level of human expression, not by the simple co-occurrence of keywords. Furthermore, the natural browsing behaviour of information searchers is accommodated by allowing documents identified as potentially relevant by the explicit semantics of the system to be used as relevance feedback queries, which provide an appropriate implicit semantic representation of the information seeker's need.
  15. Paris, L.A.H.; Tibbo, H.R.: Freestyle vs. Boolean : a comparison of partial and exact match retrieval systems (1998) 0.02
    0.021787554 = product of:
      0.043575108 = sum of:
        0.043575108 = product of:
          0.087150216 = sum of:
            0.087150216 = weight(_text_:language in 3329) [ClassicSimilarity], result of:
              0.087150216 = score(doc=3329,freq=4.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.42911017 = fieldWeight in 3329, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3329)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Compares the performance of a partial match option, LEXIS/NEXIS's Freestyle, with that of traditional Boolean retrieval. Defines natural language and the natural language search engines currently available. Although the Boolean searches had better results more often than the Freestyle searches, neither mechanism demonstrated superior performance for every query. These results do not in any way prove the superiority of either partial match or exact match techniques, but they do suggest that different queries demand different techniques. Further study and analysis are needed to determine which elements of a query make it best suited for partial match or exact match retrieval.
  16. Sánchez-de-Madariaga, R.; Fernández-del-Castillo, J.R.: ¬The bootstrapping of the Yarowsky algorithm in real corpora (2009) 0.02
    0.021787554 = product of:
      0.043575108 = sum of:
        0.043575108 = product of:
          0.087150216 = sum of:
            0.087150216 = weight(_text_:language in 2451) [ClassicSimilarity], result of:
              0.087150216 = score(doc=2451,freq=4.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.42911017 = fieldWeight in 2451, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2451)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The Yarowsky bootstrapping algorithm resolves the homograph-level word sense disambiguation (WSD) problem, which is the sense granularity level required for real natural language processing (NLP) applications. At the same time it resolves the knowledge acquisition bottleneck problem affecting most WSD algorithms, and it can be easily applied to foreign-language corpora. However, this paper shows that the Yarowsky algorithm is significantly less accurate when applied to domain-fluctuating real corpora. This paper also introduces a new bootstrapping methodology that performs much better when applied to these corpora. The accuracy achieved on non-domain-fluctuating corpora is not reached, due to inherent domain-fluctuation ambiguities.
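    For orientation, a stripped-down version of the bootstrapping loop: start from seed collocations per sense, label the contexts the current rules can decide, and promote collocations that occur markedly more often under one sense. The frequency-ratio promotion below stands in for Yarowsky's log-likelihood-ranked decision list, and the one-sense-per-discourse constraint is omitted.

    ```python
    from collections import Counter

    def yarowsky_bootstrap(contexts, seeds, rounds=5, ratio=2.0):
        """Sketch of Yarowsky-style bootstrapping for word sense disambiguation.
        contexts: list of token lists around the ambiguous word;
        seeds: {sense: set of seed collocations}."""
        rules = {s: set(colls) for s, colls in seeds.items()}
        labels = {}
        for _ in range(rounds):
            # label every context that one sense's collocations clearly dominate
            for i, ctx in enumerate(contexts):
                hits = {s: len(rules[s] & set(ctx)) for s in rules}
                best = max(hits, key=hits.get)
                if hits[best] and all(hits[best] > h for s, h in hits.items() if s != best):
                    labels[i] = best
            # promote words that are far more frequent under one sense's contexts
            freq = {s: Counter() for s in rules}
            for i, s in labels.items():
                freq[s].update(set(contexts[i]))
            for s in rules:
                others = sum((freq[t] for t in rules if t != s), Counter())
                rules[s] |= {w for w, c in freq[s].items() if c >= ratio * (others[w] + 1)}
        return labels
    ```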
  17. Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.02
    0.021040944 = product of:
      0.04208189 = sum of:
        0.04208189 = product of:
          0.08416378 = sum of:
            0.08416378 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
              0.08416378 = score(doc=58,freq=2.0), product of:
                0.18127751 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051766515 = queryNorm
                0.46428138 = fieldWeight in 58, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=58)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    14. 6.2015 22:12:44
  18. Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.02
    0.021040944 = product of:
      0.04208189 = sum of:
        0.04208189 = product of:
          0.08416378 = sum of:
            0.08416378 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
              0.08416378 = score(doc=2051,freq=2.0), product of:
                0.18127751 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051766515 = queryNorm
                0.46428138 = fieldWeight in 2051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=2051)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    14. 6.2015 22:12:56
  19. Li, X.: A new robust relevance model in the language model framework (2008) 0.02
    0.01906014 = product of:
      0.03812028 = sum of:
        0.03812028 = product of:
          0.07624056 = sum of:
            0.07624056 = weight(_text_:language in 2076) [ClassicSimilarity], result of:
              0.07624056 = score(doc=2076,freq=6.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.3753932 = fieldWeight in 2076, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2076)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper, a new robust relevance model is proposed that can be applied to both pseudo and true relevance feedback in the language-modeling framework for document retrieval. There are at least three main differences between our new relevance model and other relevance models. First, the proposed model brings the original query back into the relevance model by treating it as a short, special document, in addition to a number of top-ranked documents returned from the first-round retrieval for pseudo feedback, or a number of relevant documents for true relevance feedback. Second, instead of using a uniform prior as in the original relevance model proposed by Lavrenko and Croft, documents are assigned different priors according to their lengths (in terms) and their ranks in the first-round retrieval. Third, the probability of a term in the relevance model is further adjusted by its probability in a background language model. In both the pseudo and the true relevance cases, we have compared the performance of our model to that of two baselines: the original relevance model and a linear combination model. Our experimental results show that the proposed new model outperforms both baselines in terms of mean average precision.
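    The three modifications can be pictured compactly. The sketch below mixes the query in as a short pseudo-document, weights each document by a rank- and length-dependent prior, and adjusts terms by their background probability; the particular prior and adjustment forms are our assumptions, not the paper's estimators.

    ```python
    import math
    from collections import Counter

    def robust_relevance_model(query, feedback_docs, background_prob, beta=0.5):
        """Estimate a relevance model from the query (as a short, special
        document) plus feedback documents, with non-uniform priors and a
        background-model adjustment."""
        docs = [query] + list(feedback_docs)    # (1) query as a pseudo-document
        model = Counter()
        for rank, doc in enumerate(docs):
            words = doc.lower().split()
            prior = 1.0 / ((rank + 1) * math.sqrt(len(words)))   # (2) rank/length prior
            for term, c in Counter(words).items():
                model[term] += prior * c / len(words)
        # (3) adjust by the background language model, then renormalize
        adjusted = {t: p / (background_prob.get(t, 1e-6) ** beta)
                    for t, p in model.items()}
        z = sum(adjusted.values())
        return {t: p / z for t, p in adjusted.items()}
    ```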
  20. Liddy, E.D.: An alternative representation for documents and queries (1993) 0.02
    0.018675046 = product of:
      0.037350092 = sum of:
        0.037350092 = product of:
          0.074700184 = sum of:
            0.074700184 = weight(_text_:language in 7813) [ClassicSimilarity], result of:
              0.074700184 = score(doc=7813,freq=4.0), product of:
                0.2030952 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.051766515 = queryNorm
                0.3678087 = fieldWeight in 7813, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7813)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Describes an alternative to the two most common methods of representing documents and queries in information retrieval systems: free-text natural language representation and controlled language representation. The alternative method combines the advantages of both traditional approaches and overcomes the difficulties associated with each. The scheme was developed for use with Longman's Dictionary of Contemporary English and uses a computerized version of the dictionary for the automatic generation of summary-level semantic representations of each document and query. The system tags each word in a document with the appropriate Subject Field Code (SFC) from the dictionary, and the SFCs are summed and normalized to produce a weighted, fixed-length vector of SFCs. The search system matches the query SFC vector to the document SFC vectors in the database. The documents are then ranked on the basis of their vectors' similarity to the query.
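    The SFC pipeline reduces to: tag, sum, normalize, match. The sketch below uses a toy lexicon in place of the LDOCE lookup; the cosine match is an assumption, since the abstract only says vectors are compared for similarity.

    ```python
    import math
    from collections import Counter

    def sfc_vector(text, sfc_of_word, codes):
        """Tag each word with its Subject Field Code, sum the tags, and
        normalize into a fixed-length vector over the code set."""
        tags = Counter(sfc_of_word[w] for w in text.lower().split() if w in sfc_of_word)
        total = sum(tags.values()) or 1
        return [tags[c] / total for c in codes]

    def cosine(v1, v2):
        dot = sum(a * b for a, b in zip(v1, v2))
        norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
        return dot / norm if norm else 0.0

    codes = ["economics", "medicine", "computing"]
    lexicon = {"bank": "economics", "loan": "economics", "virus": "medicine"}
    doc_vec = sfc_vector("the bank approved the loan", lexicon, codes)
    query_vec = sfc_vector("bank loan rates", lexicon, codes)
    print(cosine(query_vec, doc_vec))   # 1.0: both map entirely to economics
    ```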

Languages

  • e 62
  • d 4
  • m 1

Types

  • a 60
  • m 6
  • s 2
  • r 1