Search (63 results, page 2 of 4)

  • × theme_ss:"Retrievalalgorithmen"
  • × type_ss:"a"
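Each result below lists its Lucene ClassicSimilarity score as a one-line breakdown: a coordination factor coord(matched/total clauses) multiplied by the sum of per-term weights, where each weight is queryWeight (idf · queryNorm) times fieldWeight (√tf · idf · fieldNorm). As a reading aid, here is a minimal sketch of that arithmetic in plain Python, using the numbers reported for result 1; it mirrors the ClassicSimilarity formula but is not Lucene itself.

```python
# Minimal sketch of Lucene ClassicSimilarity scoring, reproducing the
# per-result score breakdowns shown below (numbers taken from result 1).

def term_weight(idf, query_norm, raw_freq, field_norm):
    """Contribution of one query term: queryWeight * fieldWeight."""
    query_weight = idf * query_norm          # query-side normalization
    tf = raw_freq ** 0.5                     # ClassicSimilarity: sqrt(termFreq)
    field_weight = tf * idf * field_norm     # document-side weight
    return query_weight * field_weight

QUERY_NORM = 0.03002521                      # constant across this result page

w_texts = term_weight(idf=5.4822793, query_norm=QUERY_NORM,
                      raw_freq=2.0, field_norm=0.0390625)
w_22 = term_weight(idf=3.5018296, query_norm=QUERY_NORM,
                   raw_freq=2.0, field_norm=0.0390625)

score = (1 / 14) * (w_texts + w_22)          # coord(1/14), as the explain output reports
print(f"{score:.10f}")                       # ~0.0050137215, as shown for result 1
```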
  1. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.01
    Score 0.0050137 = coord(1/14) · [w(texts) 0.0498521 + w(22) 0.0203400] (ClassicSimilarity, doc 1428)
    
    Abstract
    Humans can make hasty, but generally robust, judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space, information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high-dimensional semantic space, which is automatically constructed from a text corpus. Two approaches are presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and with another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
    Date
    22. 3.2003 19:35:46
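    The second priming approach in the abstract lends itself to a toy illustration: build a prototypical concept from exemplar vectors, then shift it toward the context. The sketch below is a loose reading under stated assumptions (mean prototype, convex shift rule, made-up vectors), not the authors' actual operators.

    ```python
    import numpy as np

    # Toy sketch of the second priming approach described above: build a
    # prototypical concept from exemplar texts and shift it toward the
    # context. The mean-and-shift rule and all vectors are illustrative
    # assumptions, not the authors' exact operators.

    rng = np.random.default_rng(0)
    DIM = 300                                  # "high dimensional semantic space"

    exemplars = rng.normal(size=(5, DIM))      # stand-ins for exemplar trace texts
    context = rng.normal(size=DIM)             # stand-in for the current context

    prototype = exemplars.mean(axis=0)         # prototypical concept
    alpha = 0.3                                # assumed priming strength
    primed = (1 - alpha) * prototype + alpha * context
    primed /= np.linalg.norm(primed)           # keep vectors comparable in length
    print(primed.shape)                        # (300,)
    ```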
  2. Jacucci, G.; Barral, O.; Daee, P.; Wenzel, M.; Serim, B.; Ruotsalo, T.; Pluchino, P.; Freeman, J.; Gamberini, L.; Kaski, S.; Blankertz, B.: Integrating neurophysiologic relevance feedback in intent modeling for information retrieval (2019) 0.00
    Score 0.0048065 = coord(2/14) · [w(classification) 0.0168229 + w(classification) 0.0168229] (ClassicSimilarity, doc 5356)
    
    Abstract
    The use of implicit relevance feedback from neurophysiology could deliver effortless information retrieval. However, both computing neurophysiologic responses and retrieving documents are characterized by uncertainty because of noisy signals and incomplete or inconsistent representations of the data. We present a first-of-its-kind, fully integrated information retrieval system that makes use of online implicit relevance feedback generated from brain activity, as measured through electroencephalography (EEG), and from eye movements. The findings of the evaluation experiment (N = 16) show that we are able to compute online neurophysiology-based relevance feedback with performance significantly better than chance in complex data domains and realistic search tasks. We contribute by demonstrating how to integrate this inherently noisy implicit relevance feedback, combined with scarce explicit feedback, into interactive intent modeling. Although experimental measures of task performance did not allow us to demonstrate how the classification outcomes translated into search task performance, the experiment proved that our approach is able to generate relevance feedback from brain signals and eye movements in a realistic scenario, thus providing promising implications for future work in neuroadaptive information retrieval (IR).
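    How noisy implicit feedback might be blended with scarce explicit feedback can be pictured with a small sketch. The confidence-damping rule below is an assumption for illustration; the paper's actual intent-modeling integration is more involved.

    ```python
    # Hypothetical sketch of fusing noisy implicit (EEG/eye-movement) relevance
    # feedback with scarce explicit feedback. The weighting rule is an
    # assumption, not the paper's method.

    def fused_relevance(implicit_score, implicit_confidence, explicit_score=None):
        """Blend a noisy implicit score in [0,1] with optional explicit feedback."""
        if explicit_score is not None:
            return explicit_score                  # explicit feedback dominates
        # Shrink the implicit signal toward 0.5 (chance) when confidence is low.
        return 0.5 + implicit_confidence * (implicit_score - 0.5)

    print(fused_relevance(0.9, implicit_confidence=0.4))   # 0.66: damped EEG signal
    print(fused_relevance(0.9, 0.4, explicit_score=1.0))   # 1.0: user said relevant
    ```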
  3. Savoy, J.; Ndarugendamwo, M.; Vrajitoru, D.: Report on the TREC-4 experiment : combining probabilistic and vector-space schemes (1996) 0.00
    Score 0.0040712 = coord(1/14) · coord(1/2) · w(schemes) 0.1139942 (ClassicSimilarity, doc 7574)
    
  4. Liu, R.-L.; Huang, Y.-C.: Ranker enhancement for proximity-based ranking of biomedical texts (2011) 0.00
    Score 0.0035609 = coord(1/14) · coord(1/2) · w(texts) 0.0997042 (ClassicSimilarity, doc 4947)
    
    Abstract
    Biomedical decision making often requires relevant evidence from the biomedical literature. Retrieval of the evidence calls for a system that receives a natural language query for a biomedical information need and, among the huge number of texts retrieved for the query, ranks relevant texts higher for further processing. However, state-of-the-art text rankers have weaknesses in dealing with biomedical queries, which often consist of several correlated concepts and favor those texts that discuss all of the concepts. In this article, we present a technique, Proximity-Based Ranker Enhancer (PRE), to enhance text rankers with term-proximity information. PRE assesses the term frequency (TF) of each term in the text by integrating three types of term proximity, measuring the contextual completeness of query terms appearing in nearby areas of the text being ranked. Therefore, PRE may serve as a preprocessor for (or supplement to) those rankers that consider TF in ranking, without the need to change the algorithms and development processes of the rankers. Empirical evaluation shows that PRE significantly improves various kinds of text rankers, and that, when compared with several state-of-the-art techniques that enhance rankers with term-proximity information, PRE may more stably and significantly enhance the rankers.
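    A rough sketch of the idea of proximity-weighted term frequency: occurrences of a query term count for more when other query terms occur nearby. The decay function, window size, and boost rule below are assumptions, not PRE's exact formulas.

    ```python
    # Illustrative proximity-weighted term frequency: a term occurrence is
    # boosted when other query terms appear within a window. All constants
    # here are assumptions, not the PRE formulas.

    def proximity_tf(tokens, term, query_terms, window=10):
        positions = {t: [i for i, tok in enumerate(tokens) if tok == t]
                     for t in query_terms}
        tf = 0.0
        for i in positions.get(term, []):
            boost = 0.0
            for other in query_terms:
                if other == term or not positions[other]:
                    continue
                dist = min(abs(i - j) for j in positions[other])
                if dist <= window:
                    boost += 1.0 / (1 + dist)    # closer co-occurrence, bigger boost
            tf += 1.0 + boost
        return tf

    doc = "expression of the target gene in tumor cells near the gene locus".split()
    print(proximity_tf(doc, "gene", {"gene", "tumor"}))   # > plain count of 2
    ```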
  5. Khoo, C.S.G.; Wan, K.-W.: A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.00
    Score 0.0031388 = coord(2/14) · [w(subject) 0.0148525 + coord(1/2) · w(22) 0.0142380] (ClassicSimilarity, doc 2509)
    
    Content
    "Most Web search engines accept natural language queries, perform some kind of fuzzy matching and produce ranked output, displaying first the documents that are most likely to be relevant. On the other hand, most library online public access catalogs (OPACs) an the Web are still Boolean retrieval systems that perform exact matching, and require users to express their search requests precisely in a Boolean search language and to refine their search statements to improve the search results. It is well-documented that users have difficulty searching Boolean OPACs effectively (e.g. Borgman, 1996; Ensor, 1992; Wallace, 1993). One approach to making OPACs easier to use is to develop a natural language search interface that acts as a middleware between the user's Web browser and the OPAC system. The search interface can accept a natural language query from the user and reformulate it as a series of Boolean search statements that are then submitted to the OPAC. The records retrieved by the OPAC are ranked by the search interface before forwarding them to the user's Web browser. The user, then, does not need to interact directly with the Boolean OPAC but with the natural language search interface or search intermediary. The search interface interacts with the OPAC system an the user's behalf. The advantage of this approach is that no modification to the OPAC or library system is required. Furthermore, the search interface can access multiple OPACs, acting as a meta search engine, and integrate search results from various OPACs before sending them to the user. The search interface needs to incorporate a method for converting the user's natural language query into a series of Boolean search statements, and for ranking the OPAC records retrieved. The purpose of this study was to develop a relevancyranking algorithm for a search interface to Boolean OPAC systems. This is part of an on-going effort to develop a knowledge-based search interface to OPACs called the E-Referencer (Khoo et al., 1998, 1999; Poo et al., 2000). E-Referencer v. 2 that has been implemented applies a repertoire of initial search strategies and reformulation strategies to retrieve records from OPACs using the Z39.50 protocol, and also assists users in mapping query keywords to the Library of Congress subject headings."
    Source
    Electronic library. 22(2004) no.2, S.112-120
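    The reformulate-and-rank loop described in the Content note can be sketched as follows: derive a series of Boolean statements from the query keywords, from strict (all terms ANDed) to relaxed, and rank records by the strictest statement that retrieved them. The function names and the OPAC stand-in are hypothetical; a real deployment would issue Z39.50 searches as the paper does.

    ```python
    from itertools import combinations

    # Sketch of ranking via progressively relaxed Boolean statements, in the
    # spirit of the interface described above. `opac_search` is a stand-in
    # for a real Z39.50 search; all names here are hypothetical.

    def boolean_statements(keywords):
        """Yield Boolean queries from strictest (all ANDed) to most relaxed."""
        for size in range(len(keywords), 0, -1):
            for combo in combinations(keywords, size):
                yield " AND ".join(combo)

    def ranked_search(keywords, opac_search):
        seen, ranked = set(), []
        for statement in boolean_statements(keywords):
            for record in opac_search(statement):
                if record not in seen:           # first (strictest) hit wins
                    seen.add(record)
                    ranked.append(record)
        return ranked

    fake_opac = lambda q: {"cystic AND fibrosis AND therapy": ["rec1"],
                           "cystic AND fibrosis": ["rec1", "rec2"]}.get(q, [])
    print(ranked_search(["cystic", "fibrosis", "therapy"], fake_opac))
    # ['rec1', 'rec2']: rec1 matched the stricter statement, so it ranks first
    ```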
  6. Kekäläinen, J.: Binary and graded relevance in IR evaluations : comparison of the effects on ranking of IR systems (2005) 0.00
    Score 0.0028788 = coord(1/14) · coord(1/2) · w(schemes) 0.0806061 (ClassicSimilarity, doc 1036)
    
    Abstract
    In this study the rankings of IR systems based on binary and graded relevance in TREC 7 and 8 data are compared. The relevance of a sample of TREC results is reassessed using a relevance scale with four levels: non-relevant, marginally relevant, fairly relevant, and highly relevant. Twenty-one topics and 90 systems from TREC 7, and 20 topics and 121 systems from TREC 8, form the data. The measures compared are binary precision, cumulated gain, discounted cumulated gain, and normalised discounted cumulated gain. Different weighting schemes for relevance levels are tested with the cumulated gain measures. Kendall's rank correlations are computed to determine to what extent the rankings produced by different measures are similar. Weighting schemes from binary to emphasising highly relevant documents form a continuum, where the measures correlate strongly at the binary end and less at the heavily weighted end. The results show the different character of the measures.
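    The cumulated gain family compared in this study is easy to state concretely. Below is a small sketch of CG, DCG (in the original Järvelin-Kekäläinen form), and nDCG with one possible weighting scheme for the four relevance levels; the weights and log base are free choices, which is exactly what the study varies.

    ```python
    import math

    # Sketch of the gain measures compared above. The level weights and the
    # log base are assumptions; varying them is the point of the study.

    WEIGHTS = {0: 0, 1: 1, 2: 5, 3: 10}   # 0=non-, 1=marginally, 2=fairly, 3=highly

    def cumulated_gain(levels):
        gains = [WEIGHTS[l] for l in levels]
        return [sum(gains[:i + 1]) for i in range(len(gains))]

    def discounted_cg(levels, base=2):
        total, curve = 0.0, []
        for rank, level in enumerate(levels, start=1):
            gain = WEIGHTS[level]
            total += gain if rank < base else gain / math.log(rank, base)
            curve.append(total)
        return curve

    run = [3, 2, 0, 1, 3]                 # relevance levels at ranks 1..5
    ideal = sorted(run, reverse=True)     # best possible ordering of the same set
    ndcg = [d / i for d, i in zip(discounted_cg(run), discounted_cg(ideal))]
    print([round(v, 3) for v in ndcg])
    ```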
  7. Van der Veer Martens, B.; Fleet, C. van: Opening the black box of "relevance work" : a domain analysis (2012) 0.00
    Score 0.0025720 = coord(1/14) · w(subject) 0.0360078 (ClassicSimilarity, doc 247)
    
    Abstract
    In response to Hjørland's recent call for a reconceptualization of the foundations of relevance, we suggest that the sociocognitive aspects of intermediation by information agencies, such as archives and libraries, are a necessary and unexplored part of the infrastructure of the subject knowledge domains central to his recommended "view of relevance informed by a social paradigm" (2010, p. 217). From a comparative analysis of documents from 39 graduate-level introductory courses in archives, reference, and strategic/competitive intelligence taught in 13 American Library Association-accredited library and information science (LIS) programs, we identify four defining sociocognitive dimensions of "relevance work" in information agencies within Hjørland's proposed framework for relevance: tasks, time, systems, and assessors. This study is intended to supply sociocognitive content from within the relevance work domain to support further domain analytic research, and to emphasize the importance of intermediary relevance work for all subject knowledge domains.
  8. Couvreur, T.R.; Benzel, R.N.; Miller, S.F.; Zeitler, D.N.; Lee, D.L.; Singhal, M.; Shivaratri, N.; Wong, W.Y.P.: An analysis of performance and cost factors in searching large text databases using parallel search systems (1994) 0.00
    Score 0.0025139 = coord(1/14) · w(bibliographic) 0.0351939 (ClassicSimilarity, doc 7657)
    
    Abstract
    The results of modelling the performance of searching large text databases (>10 GBytes) via various parallel hardware architectures and search algorithms are discussed. The performance under load and the cost of each configuration are compared. Strengths, weaknesses, performance sensitivities, and search features supported for each configuration are also addressed. In addition, a common search workload used in the modelling is described. The search workload is derived from a set of searches run against the Chemical Abstracts file of bibliographic and abstract text available on STN International. This common workload is applied to all configurations modelled to provide a common basis of comparison
  9. Wong, S.K.M.; Yao, Y.Y.: Query formulation in linear retrieval models (1990) 0.00
    Score 0.0024249 = coord(1/14) · w(subject) 0.0339485 (ClassicSimilarity, doc 3571)
    
    Abstract
    The subject of query formulation is analysed within the framework of adaptive linear models. The study is based on the notions of user preference and an acceptable ranking strategy. A gradient descent algorithm is used to formulate the query vector by an inductive process. Presents a critical analysis of the existing relevance feedback and probabilistic approaches.
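    The inductive process can be illustrated with a perceptron-style sketch: whenever the current query vector ranks a less-preferred document at or above a preferred one, move the query toward their difference. The learning rate, update rule, and toy vectors below are assumptions.

    ```python
    import numpy as np

    # Toy sketch of query formulation in an adaptive linear model: learn a
    # query vector q so that preferred documents score higher (q . d) than
    # non-preferred ones, via perceptron-style gradient descent.

    def learn_query(preference_pairs, dim, lr=0.1, epochs=50):
        q = np.zeros(dim)
        for _ in range(epochs):
            for preferred, other in preference_pairs:
                if q @ preferred <= q @ other:      # acceptable ranking violated
                    q += lr * (preferred - other)   # move q toward the preferred doc
        return q

    d1, d2, d3 = np.eye(3)                          # toy document vectors
    q = learn_query([(d1, d2), (d1, d3), (d2, d3)], dim=3)
    print(q @ d1 > q @ d2 > q @ d3)                 # True once learning converges
    ```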
  10. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.00
    Score 0.0023246 = coord(1/14) · coord(1/2) · w(22) 0.0650880 (ClassicSimilarity, doc 402)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  11. Rada, R.; Barlow, J.; Potharst, J.; Zanstra, P.; Bijstra, D.: Document ranking using an enriched thesaurus (1991) 0.00
    Score 0.0021547 = coord(1/14) · w(bibliographic) 0.0301662 (ClassicSimilarity, doc 6626)
    
    Abstract
    A thesaurus may be viewed as a graph, and document retrieval algorithms can exploit this graph when both the documents and the query are represented by thesaurus terms. These retrieval algorithms measure the distance between the query and documents by using the path lengths in the graph. Previous work with such strategies has shown that the hierarchical relations in the thesaurus are useful but the non-hierarchical relations are not. This paper shows that when the query explicitly mentions a particular non-hierarchical relation, the retrieval algorithm benefits from the presence of such relations in the thesaurus. Our algorithms were applied to the Excerpta Medica bibliographic citation database, whose citations are indexed with terms from the EMTREE thesaurus. We also created an enriched EMTREE by systematically adding non-hierarchical relations from a medical knowledge base. Our algorithms used at one time EMTREE and, at another time, the enriched EMTREE in the course of ranking documents from Excerpta Medica against queries. When, and only when, the query specifically mentioned a particular non-hierarchical relation type did EMTREE enriched with that relation type lead to a ranking that better corresponded to an expert's ranking.
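    The path-length idea is simple to sketch: measure query-document distance as shortest paths over the thesaurus graph, with smaller sums ranking higher. The tiny graph below is invented; enriching it would mean adding non-hierarchical edges of the relation type the query mentions.

    ```python
    from collections import deque

    # Sketch of distance-based ranking over a thesaurus graph as described
    # above. The toy term graph and edges are made up for illustration.

    THESAURUS = {                           # undirected term graph (toy data)
        "neoplasm": {"tumor", "disease"},
        "tumor": {"neoplasm", "carcinoma"},
        "carcinoma": {"tumor"},
        "disease": {"neoplasm"},
    }

    def path_length(a, b):
        """Breadth-first shortest path between two thesaurus terms."""
        frontier, seen = deque([(a, 0)]), {a}
        while frontier:
            term, dist = frontier.popleft()
            if term == b:
                return dist
            for nxt in THESAURUS.get(term, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, dist + 1))
        return float("inf")

    def doc_distance(query_terms, doc_terms):
        return sum(min(path_length(q, d) for d in doc_terms) for q in query_terms)

    print(doc_distance({"neoplasm"}, {"carcinoma"}))   # 2: neoplasm->tumor->carcinoma
    ```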
  12. White, K.J.; Sutcliffe, R.F.E.: Applying incremental tree induction to retrieval : from manuals and medical texts (2006) 0.00
    Score 0.0021365 = coord(1/14) · coord(1/2) · w(texts) 0.0598225 (ClassicSimilarity, doc 5044)
    
  13. Keen, M.: Query reformulation in ranked output interaction (1994) 0.00
    Score 0.0021218 = coord(1/14) · w(subject) 0.0297050 (ClassicSimilarity, doc 1065)
    
    Abstract
    Reports on a research project to evaluate and compare Boolean searching and methods of query reformulation using ranked output retrieval. Illustrates the design and operating features of the ranked output system, called ROSE (Ranked Output Search Engine), by means of typical results obtained by searching a database of 1239 records on the subject of cystic fibrosis. Concludes that further work is needed to determine the best reformulation tactics for harnessing the professional searcher's intelligence together with the much more limited intelligence provided by the search software.
  14. Cannane, A.; Williams, H.E.: General-purpose compression for efficient retrieval (2001) 0.00
    Score 0.0020356 = coord(1/14) · coord(1/2) · w(schemes) 0.0569971 (ClassicSimilarity, doc 5705)
    
    Abstract
    Compression of databases not only reduces space requirements but can also reduce overall retrieval times. In text databases, compression of documents based on semistatic modeling with words has been shown to be both practical and fast. Similarly, for specific applications, such as databases of integers or scientific databases, specially designed semistatic compression schemes work well. We propose a scheme for general-purpose compression that can be applied to all types of data stored in large collections. We describe our approach, which we call RAY, in detail, and show experimentally the compression available, compression and decompression costs, and performance as a stream and random-access technique. We show that, in many cases, RAY achieves better compression than an efficient Huffman scheme and popular adaptive compression techniques, and that it can be used as an efficient general-purpose compression scheme.
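    "Semistatic modeling with words" can be illustrated in a few lines: one pass fixes a word-frequency model over the whole collection, a second pass encodes with codes derived from that fixed model (Huffman-style below). This only illustrates the semistatic idea; RAY itself is a different, general-purpose scheme.

    ```python
    import heapq
    from collections import Counter

    # Minimal semistatic word-model compressor: the code table is built once
    # over the whole collection, then every document is encoded against it.
    # Illustrative only; not the RAY algorithm.

    def build_codes(texts):
        freqs = Counter(w for t in texts for w in t.split())
        heap = [(f, i, {w: ""}) for i, (w, f) in enumerate(freqs.items())]
        heapq.heapify(heap)
        i = len(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)
            f2, _, c2 = heapq.heappop(heap)
            merged = {w: "0" + c for w, c in c1.items()}
            merged.update({w: "1" + c for w, c in c2.items()})
            heapq.heappush(heap, (f1 + f2, i, merged))
            i += 1
        return heap[0][2]

    docs = ["the cat sat", "the cat ran", "the dog sat"]
    codes = build_codes(docs)            # model stays fixed across the collection
    encoded = [" ".join(codes[w] for w in d.split()) for d in docs]
    print(encoded)
    ```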
  15. Kaszkiel, M.; Zobel, J.: Effective ranking with arbitrary passages (2001) 0.00
    Score 0.0020356 = coord(1/14) · coord(1/2) · w(schemes) 0.0569971 (ClassicSimilarity, doc 5764)
    
    Abstract
    Text retrieval systems store a great variety of documents, from abstracts, newspaper articles, and Web pages to journal articles, books, court transcripts, and legislation. Collections of diverse types of documents expose shortcomings in current approaches to ranking. Use of short fragments of documents, called passages, instead of whole documents can overcome these shortcomings: passage ranking provides convenient units of text to return to the user, can avoid the difficulties of comparing documents of different length, and enables identification of short blocks of relevant material among otherwise irrelevant text. In this article, we compare several kinds of passage in an extensive series of experiments. We introduce a new type of passage, overlapping fragments of either fixed or variable length. We show that ranking with these arbitrary passages gives substantial improvements in retrieval effectiveness over traditional document ranking schemes, particularly for queries on collections of long documents. Ranking with arbitrary passages shows consistent improvements compared to ranking with whole documents, and to ranking with previous passage types that depend on document structure or topic shifts in documents
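    Ranking with arbitrary overlapping passages reduces to: slide fixed-length windows with a smaller step, score each window, and let a document's score be its best window. Window length, step, and the toy scorer below are assumptions.

    ```python
    # Sketch of ranking with overlapping fixed-length passages as described
    # above: a document scores as well as its best passage. Length, step,
    # and the scorer are illustrative choices.

    def passages(tokens, length=50, step=25):
        """Overlapping fixed-length word windows (step < length => overlap)."""
        for start in range(0, max(len(tokens) - length, 0) + 1, step):
            yield tokens[start:start + length]

    def doc_score(tokens, query_terms, scorer):
        return max(scorer(p, query_terms) for p in passages(tokens))

    def overlap_scorer(passage, query_terms):
        return sum(passage.count(t) for t in query_terms)

    doc = ("long irrelevant preamble " * 20 + "passage ranking of passages " * 5).split()
    print(doc_score(doc, {"passage", "ranking"}, overlap_scorer))
    # High despite the long irrelevant prefix: the best passage carries the doc.
    ```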
  16. Smeaton, A.F.; Rijsbergen, C.J. van: The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.00
    Score 0.0020340 = coord(1/14) · coord(1/2) · w(22) 0.0569520 (ClassicSimilarity, doc 2134)
    
    Date
    30. 3.2001 13:32:22
  17. Back, J.: An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.00
    Score 0.0020340 = coord(1/14) · coord(1/2) · w(22) 0.0569520 (ClassicSimilarity, doc 3445)
    
    Date
    25. 8.2005 17:42:22
  18. Liddy, E.D.: An alternative representation for documents and queries (1993) 0.00
    Score 0.0018187 = coord(1/14) · w(subject) 0.0254614 (ClassicSimilarity, doc 7813)
    
    Abstract
    Describes an alternative to the two most common methods of representing documents and queries in information retrieval systems: free-text, natural language representation and controlled language representation. The alternative method combines the advantages of both traditional approaches and overcomes the difficulties associated with each. The scheme was developed for use with Longman's Dictionary of Contemporary English and uses a computerized version of the dictionary for the automatic generation of summary-level semantic representations of each document and query. The system tags each word in a document with the appropriate Subject Field Code (SFC) from the dictionary, and the SFCs are summed and normalized to produce a weighted, fixed-length vector of SFCs. The search system matches the query SFC vector to the document SFC vectors in the database. The documents are then ranked on the basis of their vectors' similarity to the query.
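    The SFC pipeline in the abstract maps directly onto a few lines of code: tag words with codes, sum and normalize into a fixed-length vector, rank by vector similarity. The mini dictionary and the cosine match below are illustrative stand-ins for LDOCE and the actual matcher.

    ```python
    import math
    from collections import Counter

    # Toy sketch of the Subject Field Code representation described above.
    # The tiny word->code "dictionary" is invented for illustration.

    SFC = {"heart": "MD", "artery": "MD", "loan": "EC", "bank": "EC",
           "river": "GO"}                      # word -> Subject Field Code

    def sfc_vector(text):
        counts = Counter(SFC[w] for w in text.lower().split() if w in SFC)
        norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
        return {code: c / norm for code, c in counts.items()}

    def cosine(u, v):
        return sum(u.get(k, 0.0) * v.get(k, 0.0) for k in u)

    query = sfc_vector("bank loan rates")
    doc = sfc_vector("the bank approved the loan for the heart clinic")
    print(round(cosine(query, doc), 3))        # high: both are mostly EC-coded
    ```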
  19. Ding, Y.; Chowdhury, G.; Foo, S.: Organising keywords in a Web search environment : a methodology based on co-word analysis (2000) 0.00
    Score 0.0018187 = coord(1/14) · w(subject) 0.0254614 (ClassicSimilarity, doc 105)
    
    Abstract
    The rapid development of the Internet and World Wide Web has caused some critical problems for information retrieval. Researchers have made several attempts to solve these problems. Thesauri and subject heading lists, as traditional information retrieval tools, have been criticised as inefficient for tackling these newly emerging problems. This paper proposes an information retrieval tool generated by co-word analysis, comprising keyword clusters with relationships based on the co-occurrences of keywords in the literature. Such a tool can play the role of an associative thesaurus that can provide information about the keywords in a domain that might be useful for information searching and query expansion.
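    Co-word analysis starts from a keyword co-occurrence matrix over documents; an association strength between keyword pairs can then seed clusters. The sketch below uses a Jaccard/equivalence-style index, which is one common choice rather than necessarily the paper's.

    ```python
    from collections import Counter
    from itertools import combinations

    # Minimal co-word analysis sketch: count keyword co-occurrences across
    # documents, then derive a pairwise association strength. The strength
    # formula (equivalence index) is one common choice, not the paper's.

    docs_keywords = [
        {"retrieval", "ranking", "web"},
        {"retrieval", "ranking"},
        {"web", "crawling"},
    ]

    occ = Counter(k for kws in docs_keywords for k in kws)
    cooc = Counter(frozenset(p) for kws in docs_keywords
                   for p in combinations(sorted(kws), 2))

    def strength(a, b):
        c = cooc[frozenset((a, b))]
        return c * c / (occ[a] * occ[b])       # equivalence index in [0, 1]

    print(strength("retrieval", "ranking"))    # 1.0: always co-occur
    print(strength("retrieval", "web"))        # 0.25: weaker association
    ```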
  20. Liu, X.; Turtle, H.: Real-time user interest modeling for real-time ranking (2013) 0.00
    Score 0.0018187 = coord(1/14) · w(subject) 0.0254614 (ClassicSimilarity, doc 1035)
    
    Abstract
    User interest, as a very dynamic information need, is often ignored in most existing information retrieval systems. In this research, we present the results of experiments designed to evaluate the performance of a real-time interest model (RIM) that attempts to identify the dynamic and changing query-level interests regarding social media outputs. Unlike most existing ranking methods, our ranking approach targets the calculation of the probability of user interest in the content of a document, where that interest is subject to very dynamic change. We describe 2 formulations of the model (real-time interest vector space and real-time interest language model) stemming from classical relevance ranking methods and develop a novel methodology for evaluating the performance of RIM, using Amazon Mechanical Turk to collect (interest-based) relevance judgments on a daily basis. Our results show that the model usually, although not always, performs better than baseline results obtained from commercial web search engines. We identify factors that affect RIM performance and outline plans for future research.
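    A real-time interest vector space model can be caricatured as a time-decayed term vector: recent interactions weigh more, old interests fade, and incoming documents are scored against the current vector. The half-life and scoring rule below are assumptions, not RIM's parameters.

    ```python
    import time

    # Hypothetical sketch of a time-decayed interest vector in the spirit of
    # RIM. The half-life and scoring rule are assumptions for illustration.

    HALF_LIFE = 3600.0                          # interest halves every hour (assumed)

    def decayed_interest(interactions, now):
        """interactions: list of (timestamp, {term: weight}) user signals."""
        interest = {}
        for ts, terms in interactions:
            decay = 0.5 ** ((now - ts) / HALF_LIFE)
            for term, w in terms.items():
                interest[term] = interest.get(term, 0.0) + decay * w
        return interest

    def score(doc_terms, interest):
        return sum(interest.get(t, 0.0) for t in doc_terms)

    now = time.time()
    log = [(now - 7200, {"elections": 1.0}), (now - 60, {"earthquake": 1.0})]
    iv = decayed_interest(log, now)
    print(score({"earthquake", "japan"}, iv) > score({"elections"}, iv))   # True
    ```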
