Search (60 results, page 1 of 3)

  • theme_ss:"Retrievalalgorithmen"
  1. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.02
    0.021241087 = product of:
      0.08496435 = sum of:
        0.069213726 = weight(_text_:work in 2419) [ClassicSimilarity], result of:
          0.069213726 = score(doc=2419,freq=8.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.4866296 = fieldWeight in 2419, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.046875 = fieldNorm(doc=2419)
        0.015750622 = product of:
          0.031501245 = sum of:
            0.031501245 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
              0.031501245 = score(doc=2419,freq=2.0), product of:
                0.13569894 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03875087 = queryNorm
                0.23214069 = fieldWeight in 2419, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2419)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
     The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring and managing digital library objects it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper evaluation results with respect to retrieval effectiveness, efficiency and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection through information seeking to the representation, organisation and reuse of information. By embedding high-level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil and followed by a qualitative evaluation. The evaluation was conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
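     The indented breakdown above is Lucene "explain" output for the ClassicSimilarity (TF-IDF) model: tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and coord(m/n) scales a sum by the fraction of query clauses that matched. A minimal Python sketch that reproduces the score of result 1 from exactly those factors (function names are mine; all constants are copied from the explanation above):

     import math

     def idf(doc_freq: int, max_docs: int) -> float:
         """Classic Lucene IDF: 1 + ln(maxDocs / (docFreq + 1))."""
         return 1.0 + math.log(max_docs / (doc_freq + 1))

     def clause_score(freq: float, doc_freq: int, max_docs: int,
                      query_norm: float, field_norm: float) -> float:
         """score = queryWeight * fieldWeight
                  = (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)."""
         i = idf(doc_freq, max_docs)
         return (i * query_norm) * (math.sqrt(freq) * i * field_norm)

     # Constants copied from the explanation of result 1 (doc 2419).
     QUERY_NORM, MAX_DOCS = 0.03875087, 44218
     work = clause_score(freq=8.0, doc_freq=3060, max_docs=MAX_DOCS,
                         query_norm=QUERY_NORM, field_norm=0.046875)
     t22 = clause_score(freq=2.0, doc_freq=3622, max_docs=MAX_DOCS,
                        query_norm=QUERY_NORM, field_norm=0.046875) * 0.5  # coord(1/2)
     total = (work + t22) * 2 / 8   # coord(2/8): 2 of 8 query clauses matched
     print(total)                   # ~0.021241087, the displayed score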
  2. Couvreur, T.R.; Benzel, R.N.; Miller, S.F.; Zeitler, D.N.; Lee, D.L.; Singhal, M.; Shivaratri, N.; Wong, W.Y.P.: An analysis of performance and cost factors in searching large text databases using parallel search systems (1994) 0.01
    0.0131395515 = product of:
      0.10511641 = sum of:
        0.10511641 = weight(_text_:supported in 7657) [ClassicSimilarity], result of:
          0.10511641 = score(doc=7657,freq=2.0), product of:
            0.22949564 = queryWeight, product of:
              5.9223356 = idf(docFreq=321, maxDocs=44218)
              0.03875087 = queryNorm
            0.45803228 = fieldWeight in 7657, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.9223356 = idf(docFreq=321, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7657)
      0.125 = coord(1/8)
    
    Abstract
    The results of modelling the performance of searching large text databases (>10 GBytes) via various parallel hardware architectures and search algorithms are discussed. The performance under load and the cost of each configuration are compared. Strengths, weaknesses, performance sensitivities, and search features supported for each configuration are also addressed. In addition, a common search workload used in the modelling is described. The search workload is derived from a set of searches run against the Chemical Abstracts file of bibliographic and abstract text available on STN International. This common workload is applied to all configurations modelled to provide a common basis of comparison
  3. Lee, D.L.; Ren, L.: Document ranking on weight-partitioned signature files (1996) 0.01
    0.0131395515 = product of:
      0.10511641 = sum of:
        0.10511641 = weight(_text_:supported in 2417) [ClassicSimilarity], result of:
          0.10511641 = score(doc=2417,freq=2.0), product of:
            0.22949564 = queryWeight, product of:
              5.9223356 = idf(docFreq=321, maxDocs=44218)
              0.03875087 = queryNorm
            0.45803228 = fieldWeight in 2417, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.9223356 = idf(docFreq=321, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2417)
      0.125 = coord(1/8)
    
    Abstract
     Proposes the weight-partitioned signature file, a signature file organization for supporting document ranking. It uses multiple signature files, each corresponding to one term frequency, to represent terms with different term frequencies. Words with the same term frequency in a document are grouped together and hashed into the signature file corresponding to that term frequency. Investigates the effect of false drops on retrieval effectiveness. Analyses the performance of the weight-partitioned signature file under different search strategies and configurations. Obtains an optimal formula for storage allocation to minimise the effect of false drops on document ranks. Analytical results are supported by experiments on document collections.
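     A rough sketch of the organization described above, under illustrative assumptions of mine (signature width, number of hash bits, and all names): words that share a within-document term frequency are superimposed into one signature per (document, frequency) partition, and a query word that matches a partition contributes that frequency to the document's score. False drops are possible by construction, as the abstract discusses.

     import hashlib
     from collections import Counter

     SIG_BITS, NUM_HASHES = 64, 3          # illustrative sizes, not from the paper

     def word_signature(word: str) -> int:
         """Superimposed coding: set NUM_HASHES bits chosen by hashing the word."""
         sig = 0
         for i in range(NUM_HASHES):
             h = int(hashlib.md5(f"{i}:{word}".encode()).hexdigest(), 16)
             sig |= 1 << (h % SIG_BITS)
         return sig

     def build_partitions(doc_terms):
         """doc_terms: {doc_id: [words]}; returns {tf: {doc_id: signature}}."""
         partitions = {}
         for doc_id, words in doc_terms.items():
             for word, tf in Counter(words).items():
                 part = partitions.setdefault(tf, {})
                 part[doc_id] = part.get(doc_id, 0) | word_signature(word)
         return partitions

     def rank(partitions, query_words):
         """Score a document by summing the tf of every partition that matches
         a query word (false drops can inflate the score)."""
         scores = Counter()
         for word in query_words:
             q = word_signature(word)
             for tf, docs in partitions.items():
                 for doc_id, sig in docs.items():
                     if sig & q == q:      # all query bits present in the signature
                         scores[doc_id] += tf
         return scores.most_common()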
  4. Furner, J.: A unifying model of document relatedness for hybrid search engines (2003) 0.01
    0.012589371 = product of:
      0.050357483 = sum of:
        0.034606863 = weight(_text_:work in 2717) [ClassicSimilarity], result of:
          0.034606863 = score(doc=2717,freq=2.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.2433148 = fieldWeight in 2717, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.046875 = fieldNorm(doc=2717)
        0.015750622 = product of:
          0.031501245 = sum of:
            0.031501245 = weight(_text_:22 in 2717) [ClassicSimilarity], result of:
              0.031501245 = score(doc=2717,freq=2.0), product of:
                0.13569894 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03875087 = queryNorm
                0.23214069 = fieldWeight in 2717, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2717)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
     Previous work on search-engine design has indicated that information-seekers may benefit from being given the opportunity to exploit multiple sources of evidence of document relatedness. Few existing systems, however, give users more than minimal control over the selections that may be made among methods of exploitation. By applying the methods of "document network analysis" (DNA), a unifying, graph-theoretic model of content-, collaboration-, and context-based systems (CCC) may be developed in which the nature of the similarities between types of document relatedness and document ranking is clarified. The usefulness of the approach to system design suggested by this model may be tested by constructing and evaluating a prototype system (UCXtra) that allows searchers to maintain control over the multiple ways in which document collections may be ranked and re-ranked.
    Date
    11. 9.2004 17:32:22
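     As a rough illustration of the hybrid view sketched in the abstract (not Furner's DNA formalism itself), content-, collaboration-, and context-based evidence can each be expressed as a document-by-document similarity matrix and combined under searcher-controlled weights before ranking. The weights and names below are assumptions of mine.

     import numpy as np

     def combined_relatedness(content, collaboration, context,
                              weights=(0.5, 0.3, 0.2)):
         """Weighted sum of three doc-by-doc similarity matrices (user-tunable)."""
         mats = [np.asarray(m, dtype=float) for m in (content, collaboration, context)]
         return sum(w * m for w, m in zip(weights, mats))

     def rank_against(combined, doc_index):
         """Rank all documents by combined relatedness to one seed document."""
         sims = combined[doc_index].copy()
         sims[doc_index] = -np.inf          # exclude the seed itself
         return np.argsort(-sims)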
  5. Walz, J.: Analyse der Übertragbarkeit allgemeiner Rankingfaktoren von Web-Suchmaschinen auf Discovery-Systeme (2018) 0.01
    0.012000851 = product of:
      0.09600681 = sum of:
        0.09600681 = weight(_text_:hochschule in 5744) [ClassicSimilarity], result of:
          0.09600681 = score(doc=5744,freq=2.0), product of:
            0.23689921 = queryWeight, product of:
              6.113391 = idf(docFreq=265, maxDocs=44218)
              0.03875087 = queryNorm
            0.40526438 = fieldWeight in 5744, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.113391 = idf(docFreq=265, maxDocs=44218)
              0.046875 = fieldNorm(doc=5744)
      0.125 = coord(1/8)
    
    Footnote
     Bachelor's thesis in library science, Technische Hochschule Köln.
  6. Dominich, S.: Mathematical foundations of information retrieval (2001) 0.01
    0.010491143 = product of:
      0.041964572 = sum of:
        0.028839052 = weight(_text_:work in 1753) [ClassicSimilarity], result of:
          0.028839052 = score(doc=1753,freq=2.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.20276234 = fieldWeight in 1753, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1753)
        0.01312552 = product of:
          0.02625104 = sum of:
            0.02625104 = weight(_text_:22 in 1753) [ClassicSimilarity], result of:
              0.02625104 = score(doc=1753,freq=2.0), product of:
                0.13569894 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03875087 = queryNorm
                0.19345059 = fieldWeight in 1753, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1753)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
    This book offers a comprehensive and consistent mathematical approach to information retrieval (IR) without which no implementation is possible, and sheds an entirely new light upon the structure of IR models. It contains the descriptions of all IR models in a unified formal style and language, along with examples for each, thus offering a comprehensive overview of them. The book also creates mathematical foundations and a consistent mathematical theory (including all mathematical results achieved so far) of IR as a stand-alone mathematical discipline, which thus can be read and taught independently. Also, the book contains all necessary mathematical knowledge on which IR relies, to help the reader avoid searching different sources. The book will be of interest to computer or information scientists, librarians, mathematicians, undergraduate students and researchers whose work involves information retrieval.
    Date
    22. 3.2008 12:26:32
  7. Van der Veer Martens, B.; Fleet, C. van: Opening the black box of "relevance work" : a domain analysis (2012) 0.01
    0.008651716 = product of:
      0.069213726 = sum of:
        0.069213726 = weight(_text_:work in 247) [ClassicSimilarity], result of:
          0.069213726 = score(doc=247,freq=8.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.4866296 = fieldWeight in 247, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.046875 = fieldNorm(doc=247)
      0.125 = coord(1/8)
    
    Abstract
    In response to Hjørland's recent call for a reconceptualization of the foundations of relevance, we suggest that the sociocognitive aspects of intermediation by information agencies, such as archives and libraries, are a necessary and unexplored part of the infrastructure of the subject knowledge domains central to his recommended "view of relevance informed by a social paradigm" (2010, p. 217). From a comparative analysis of documents from 39 graduate-level introductory courses in archives, reference, and strategic/competitive intelligence taught in 13 American Library Association-accredited library and information science (LIS) programs, we identify four defining sociocognitive dimensions of "relevance work" in information agencies within Hjørland's proposed framework for relevance: tasks, time, systems, and assessors. This study is intended to supply sociocognitive content from within the relevance work domain to support further domain analytic research, and to emphasize the importance of intermediary relevance work for all subject knowledge domains.
  8. Robertson, S.E.: OKAPI at TREC-1 (1994) 0.01
    0.007209763 = product of:
      0.057678103 = sum of:
        0.057678103 = weight(_text_:work in 7953) [ClassicSimilarity], result of:
          0.057678103 = score(doc=7953,freq=2.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.40552467 = fieldWeight in 7953, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.078125 = fieldNorm(doc=7953)
      0.125 = coord(1/8)
    
    Abstract
    Describes the work carried out on the TREC-2 project following the results of the TREC-1 project. Experiments were conducted on the OKAPI experimental text information retrieval system which investigated a number of alternative probabilistic term weighting functions in place of the 'standard' Robertson Sparck Jones weighting functions used in TREC-1
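     For orientation, the "standard" Robertson/Sparck Jones relevance weight referred to above has a compact closed form (with the usual 0.5 correction). This sketches that baseline only, not the alternative probabilistic weighting functions the paper actually tests.

     import math

     def rsj_weight(r: int, R: int, n: int, N: int) -> float:
         """Robertson/Sparck Jones term weight with 0.5 correction:
         r = relevant docs containing the term, R = number of known relevant docs,
         n = docs containing the term,          N = docs in the collection."""
         return math.log(((r + 0.5) / (R - r + 0.5)) /
                         ((n - r + 0.5) / (N - n - R + r + 0.5)))

     # With no relevance information (r = R = 0) the weight reduces to an IDF-like value.
     print(rsj_weight(r=0, R=0, n=321, N=44218))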
  9. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.01
    0.005250208 = product of:
      0.042001665 = sum of:
        0.042001665 = product of:
          0.08400333 = sum of:
            0.08400333 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.08400333 = score(doc=402,freq=2.0), product of:
                0.13569894 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03875087 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  10. Chen, H.; Lally, A.M.; Zhu, B.; Chau, M.: HelpfulMed : Intelligent searching for medical information over the Internet (2003) 0.01
    0.005098072 = product of:
      0.040784575 = sum of:
        0.040784575 = weight(_text_:work in 1615) [ClassicSimilarity], result of:
          0.040784575 = score(doc=1615,freq=4.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.28674924 = fieldWeight in 1615, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1615)
      0.125 = coord(1/8)
    
    Abstract
     Medical professionals and researchers need information from reputable sources to accomplish their work. Unfortunately, the Web has a large number of documents that are irrelevant to their work, even those documents that purport to be "medically-related." This paper describes an architecture designed to integrate advanced searching and indexing algorithms, an automatic thesaurus, or "concept space," and Kohonen-based Self-Organizing Map (SOM) technologies to provide searchers with fine-grained results. Initial results indicate that these systems provide complementary retrieval functionalities. HelpfulMed not only allows users to search Web pages and other online databases, but also allows them to build searches through the use of an automatic thesaurus and browse a graphical display of medical-related topics. Evaluation results for each of the different components are included. Our spidering algorithm outperformed both breadth-first search and PageRank spiders on a test collection of 100,000 Web pages. The automatically generated thesaurus performed as well as both MeSH and UMLS, systems which require human mediation for currency. Lastly, a variant of the Kohonen SOM was comparable to MeSH terms in perceived cluster precision and significantly better at perceived cluster recall.
  11. Keen, M.: Query reformulation in ranked output interaction (1994) 0.01
    0.005046834 = product of:
      0.04037467 = sum of:
        0.04037467 = weight(_text_:work in 1065) [ClassicSimilarity], result of:
          0.04037467 = score(doc=1065,freq=2.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.28386727 = fieldWeight in 1065, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1065)
      0.125 = coord(1/8)
    
    Abstract
     Reports on a research project to evaluate and compare Boolean searching and methods of query reformulation using ranked output retrieval. Illustrates the design and operating features of the ranked output system, called ROSE (Ranked Output Search Engine), by means of typical results obtained by searching a database of 1239 records on the subject of cystic fibrosis. Concludes that further work is needed to determine the best reformulation tactics for harnessing the professional searcher's intelligence with the much more limited intelligence provided by the search software.
  12. Torra, V.; Miyamoto, S.; Lanau, S.: Exploration of textual document archives using a fuzzy hierarchical clustering algorithm in the GAMBAL system (2005) 0.01
    0.005046834 = product of:
      0.04037467 = sum of:
        0.04037467 = weight(_text_:work in 1028) [ClassicSimilarity], result of:
          0.04037467 = score(doc=1028,freq=2.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.28386727 = fieldWeight in 1028, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1028)
      0.125 = coord(1/8)
    
    Abstract
     The Internet, together with the large amount of textual information available in document archives, has increased the relevance of information retrieval related tools. In this work we present an extension of the Gambal system for clustering and visualization of documents based on fuzzy clustering techniques. The tool allows the user to structure the set of documents hierarchically (using a fuzzy hierarchical structure) and represents this structure in a graphical interface (a 3D sphere) over which the user can navigate. Gambal allows the analysis of the documents and the computation of their similarity not only on the basis of the syntactic similarity between words but also based on a dictionary (WordNet 1.7) and latent semantic analysis.
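     A minimal sketch of the flat fuzzy-clustering step such a system builds on (plain fuzzy c-means over document vectors; a hierarchy can be obtained by re-clustering the documents assigned to each cluster). Parameter choices and names are illustrative, not Gambal's.

     import numpy as np

     def fuzzy_c_means(X, c=3, m=2.0, iters=100, seed=0):
         """X: (n_docs, n_dims) document vectors; returns cluster centers and the
         (c, n_docs) membership matrix, whose columns each sum to 1."""
         rng = np.random.default_rng(seed)
         U = rng.random((c, len(X)))
         U /= U.sum(axis=0)
         for _ in range(iters):
             Um = U ** m
             centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
             d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-12
             U = 1.0 / (d ** (2 / (m - 1)) * (1.0 / d ** (2 / (m - 1))).sum(axis=0))
         return centers, U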
  13. López-Pujalte, C.; Guerrero-Bote, V.P.; Moya-Anegón, F. de: Genetic algorithms in relevance feedback : a second test and new contributions (2003) 0.01
    0.005046834 = product of:
      0.04037467 = sum of:
        0.04037467 = weight(_text_:work in 1076) [ClassicSimilarity], result of:
          0.04037467 = score(doc=1076,freq=2.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.28386727 = fieldWeight in 1076, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1076)
      0.125 = coord(1/8)
    
    Abstract
     The present work is the continuation of an earlier study which reviewed the literature on genetic relevance-feedback techniques that follow the vector space model (the model most commonly used in this type of application) and implemented them so that they could be compared with each other as well as with one of the best traditional relevance-feedback methods, the Ide dec-hi method. Here we carry out the comparisons on more test collections (Cranfield, CISI, Medline, and NPL), using the residual collection method for their evaluation, as is recommended for this type of technique. We also add some fitness functions of our own design.
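     The Ide dec-hi baseline mentioned above has a compact vector-space form: add the vectors of all judged-relevant documents to the query and subtract the single highest-ranked non-relevant document. A sketch under those standard definitions (not of the paper's genetic algorithms):

     import numpy as np

     def ide_dec_hi(query, relevant_docs, top_nonrelevant=None):
         """q' = q + sum(relevant doc vectors) - highest-ranked non-relevant doc."""
         q = np.asarray(query, dtype=float)
         if len(relevant_docs):
             q = q + np.sum(np.asarray(relevant_docs, dtype=float), axis=0)
         if top_nonrelevant is not None:
             q = q - np.asarray(top_nonrelevant, dtype=float)
         return q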
  14. Otterbacher, J.; Erkan, G.; Radev, D.R.: Biased LexRank : passage retrieval using random walks with question-based priors (2009) 0.01
    0.005046834 = product of:
      0.04037467 = sum of:
        0.04037467 = weight(_text_:work in 2450) [ClassicSimilarity], result of:
          0.04037467 = score(doc=2450,freq=2.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.28386727 = fieldWeight in 2450, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2450)
      0.125 = coord(1/8)
    
    Abstract
    We present Biased LexRank, a method for semi-supervised passage retrieval in the context of question answering. We represent a text as a graph of passages linked based on their pairwise lexical similarity. We use traditional passage retrieval techniques to identify passages that are likely to be relevant to a user's natural language question. We then perform a random walk on the lexical similarity graph in order to recursively retrieve additional passages that are similar to other relevant passages. We present results on several benchmarks that show the applicability of our work to question answering and topic-focused text summarization.
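     A minimal sketch of the random-walk recursion described above: each passage's score mixes a question-based prior with the scores of lexically similar passages, computed by power iteration. The damping parameter and all names are illustrative assumptions.

     import numpy as np

     def biased_lexrank(sim, prior, d=0.85, iters=100, tol=1e-8):
         """sim:   (n, n) pairwise lexical similarity between passages (assumed to
                   include self-similarity, so no row is all zeros)
         prior: length-n relevance of each passage to the question
         d:     weight of the similarity graph versus the question prior."""
         sim = np.asarray(sim, dtype=float)
         prior = np.asarray(prior, dtype=float)
         prior = prior / prior.sum()
         W = sim / sim.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
         p = np.full(len(prior), 1.0 / len(prior))
         for _ in range(iters):
             p_new = (1 - d) * prior + d * (W.T @ p)
             delta = np.abs(p_new - p).sum()
             p = p_new
             if delta < tol:
                 break
         return p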
  15. Cross-language information retrieval (1998) 0.00
    0.004692697 = product of:
      0.037541576 = sum of:
        0.037541576 = weight(_text_:supported in 6299) [ClassicSimilarity], result of:
          0.037541576 = score(doc=6299,freq=2.0), product of:
            0.22949564 = queryWeight, product of:
              5.9223356 = idf(docFreq=321, maxDocs=44218)
              0.03875087 = queryNorm
            0.16358295 = fieldWeight in 6299, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.9223356 = idf(docFreq=321, maxDocs=44218)
              0.01953125 = fieldNorm(doc=6299)
      0.125 = coord(1/8)
    
    Footnote
     The retrieved output from a query including the phrase 'big rockets' may be, for instance, a sentence containing 'giant rocket' which is semantically ranked above 'military rocket'. David Hull (Xerox Research Centre, Grenoble) describes an implementation of a weighted Boolean model for Spanish-English CLIR. Users construct Boolean-type queries, weighting each term in the query, which is then translated by an on-line dictionary before being applied to the database. Comparisons with the performance of unweighted free-form queries ('vector space' models) proved encouraging. Two contributions consider the evaluation of CLIR systems. In order to by-pass the time-consuming and expensive process of assembling a standard collection of documents and of user queries against which the performance of a CLIR system is manually assessed, Páraic Sheridan et al. (ETH Zurich) propose a method based on retrieving 'seed documents'. This involves identifying a unique document in a database (the 'seed document') and, for a number of queries, measuring how fast it is retrieved. The authors have also assembled a large database of multilingual news documents for testing purposes. By storing the (fairly short) documents in a structured form tagged with descriptor codes (e.g. for topic, country and area), the test suite is easily expanded while remaining consistent for the purposes of testing. Douglas Oard and Bonnie Dorr (University of Maryland) describe an evaluation methodology which appears to apply LSI techniques in order to filter and rank incoming documents designed for testing CLIR systems. The volume provides the reader with an excellent overview of several projects in CLIR. It is well supported with references and is intended as a secondary text for researchers and practitioners. It highlights the need for a good, general tutorial introduction to the field."
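     A rough sketch of the weighted Boolean CLIR idea attributed to Hull above, under illustrative assumptions of mine (toy bilingual dictionary, a soft-Boolean scoring rule): each weighted query term is replaced by its dictionary translations, and a document earns a term's weight if any translation occurs in it.

     def translate_query(weighted_terms, bilingual_dict):
         """Map each (term, weight) pair to (translations, weight), keeping the
         user-assigned weight; untranslatable terms pass through unchanged."""
         return [(bilingual_dict.get(term, [term]), weight)
                 for term, weight in weighted_terms]

     def soft_boolean_score(doc_tokens, translated_query):
         """A document earns a query term's weight if any of its translations occur."""
         tokens = set(doc_tokens)
         return sum(weight for translations, weight in translated_query
                    if any(t in tokens for t in translations))

     # Illustrative usage with a toy English-Spanish dictionary.
     dictionary = {"big": ["grande", "gran"], "rocket": ["cohete"]}
     query = translate_query([("big", 0.4), ("rocket", 0.6)], dictionary)
     print(soft_boolean_score(["un", "gran", "cohete"], query))   # 1.0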
  16. Smeaton, A.F.; Rijsbergen, C.J. van: The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.00
    0.004593932 = product of:
      0.036751457 = sum of:
        0.036751457 = product of:
          0.07350291 = sum of:
            0.07350291 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
              0.07350291 = score(doc=2134,freq=2.0), product of:
                0.13569894 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03875087 = queryNorm
                0.5416616 = fieldWeight in 2134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Date
    30. 3.2001 13:32:22
  17. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.00
    0.004593932 = product of:
      0.036751457 = sum of:
        0.036751457 = product of:
          0.07350291 = sum of:
            0.07350291 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.07350291 = score(doc=3445,freq=2.0), product of:
                0.13569894 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03875087 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Date
    25. 8.2005 17:42:22
  18. Rada, R.; Barlow, J.; Potharst, J.; Zanstra, P.; Bijstra, D.: Document ranking using an enriched thesaurus (1991) 0.00
    0.004325858 = product of:
      0.034606863 = sum of:
        0.034606863 = weight(_text_:work in 6626) [ClassicSimilarity], result of:
          0.034606863 = score(doc=6626,freq=2.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.2433148 = fieldWeight in 6626, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.046875 = fieldNorm(doc=6626)
      0.125 = coord(1/8)
    
    Abstract
     A thesaurus may be viewed as a graph, and document retrieval algorithms can exploit this graph when both the documents and the query are represented by thesaurus terms. These retrieval algorithms measure the distance between the query and documents by using the path lengths in the graph. Previous work with such strategies has shown that the hierarchical relations in the thesaurus are useful but the non-hierarchical are not. This paper shows that when the query explicitly mentions a particular non-hierarchical relation, the retrieval algorithm benefits from the presence of such relations in the thesaurus. Our algorithms were applied to the Excerpta Medica bibliographic citation database, whose citations are indexed with terms from the EMTREE thesaurus. We also created an enriched EMTREE by systematically adding non-hierarchical relations from a medical knowledge base. Our algorithms used EMTREE at one time and the enriched EMTREE at another in the course of ranking documents from Excerpta Medica against queries. When, and only when, the query specifically mentioned a particular non-hierarchical relation type, did EMTREE enriched with that relation type lead to a ranking that better corresponded to an expert's ranking.
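     A minimal sketch of path-length ranking over a thesaurus graph, as described above: term-to-term distance is the shortest path through the relation graph (hierarchical and, optionally, non-hierarchical edges), and documents are ranked by their average distance to the query terms. The averaging rule and all names are illustrative assumptions, not the paper's exact measure.

     from collections import deque
     from itertools import product

     def shortest_path(graph, start, goal):
         """Breadth-first shortest path length between two thesaurus terms."""
         if start == goal:
             return 0
         seen, queue = {start}, deque([(start, 0)])
         while queue:
             term, dist = queue.popleft()
             for nbr in graph.get(term, ()):
                 if nbr == goal:
                     return dist + 1
                 if nbr not in seen:
                     seen.add(nbr)
                     queue.append((nbr, dist + 1))
         return float("inf")                # terms not connected in the thesaurus

     def doc_distance(graph, query_terms, doc_terms):
         """Average shortest-path distance between query and document terms."""
         pairs = list(product(query_terms, doc_terms))
         return sum(shortest_path(graph, q, d) for q, d in pairs) / len(pairs)

     def rank_docs(graph, query_terms, docs):
         """docs: {doc_id: [thesaurus terms]}; smaller distance ranks higher."""
         return sorted(docs, key=lambda d: doc_distance(graph, query_terms, docs[d]))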
  19. French, J.C.; Powell, A.L.; Schulman, E.: Using clustering strategies for creating authority files (2000) 0.00
    0.004325858 = product of:
      0.034606863 = sum of:
        0.034606863 = weight(_text_:work in 4811) [ClassicSimilarity], result of:
          0.034606863 = score(doc=4811,freq=2.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.2433148 = fieldWeight in 4811, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.046875 = fieldNorm(doc=4811)
      0.125 = coord(1/8)
    
    Abstract
    As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. Authority work, the need to discover and reconcile variant forms of strings in bibliographical entries, will become more critical in the future. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. We investigate a number of approximate string matching techniques that have traditionally been used to help with this problem. We then introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms. We demonstrate the utility of these approaches using data from the Astrophysics Data System and show how we can reduce the human effort involved in the creation of authority files
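     A minimal sketch of the approximate string and word matching described above: name strings are compared by edit distance, and two multi-word forms match approximately when every word of one has a close counterpart in the other. The distance threshold is an illustrative assumption.

     def edit_distance(a: str, b: str) -> int:
         """Standard Levenshtein distance via dynamic programming."""
         prev = list(range(len(b) + 1))
         for i, ca in enumerate(a, 1):
             cur = [i]
             for j, cb in enumerate(b, 1):
                 cur.append(min(prev[j] + 1,                  # deletion
                                cur[j - 1] + 1,               # insertion
                                prev[j - 1] + (ca != cb)))    # substitution
             prev = cur
         return prev[-1]

     def words_match(a: str, b: str, max_dist: int = 1) -> bool:
         return edit_distance(a.lower(), b.lower()) <= max_dist

     def approx_name_match(name1: str, name2: str) -> bool:
         """Every word of the shorter name must approximately match a word of the longer."""
         w1, w2 = name1.split(), name2.split()
         short, long_ = (w1, w2) if len(w1) <= len(w2) else (w2, w1)
         return all(any(words_match(s, l) for l in long_) for s in short)

     print(approx_name_match("Schulman, E.", "Schulmann E"))   # True: spelling variants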
  20. Cannane, A.; Williams, H.E.: General-purpose compression for efficient retrieval (2001) 0.00
    0.004325858 = product of:
      0.034606863 = sum of:
        0.034606863 = weight(_text_:work in 5705) [ClassicSimilarity], result of:
          0.034606863 = score(doc=5705,freq=2.0), product of:
            0.14223081 = queryWeight, product of:
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.03875087 = queryNorm
            0.2433148 = fieldWeight in 5705, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6703904 = idf(docFreq=3060, maxDocs=44218)
              0.046875 = fieldNorm(doc=5705)
      0.125 = coord(1/8)
    
    Abstract
     Compression of databases not only reduces space requirements but can also reduce overall retrieval times. In text databases, compression of documents based on semistatic modeling with words has been shown to be both practical and fast. Similarly, for specific applications, such as databases of integers or scientific databases, specially designed semistatic compression schemes work well. We propose a scheme for general-purpose compression that can be applied to all types of data stored in large collections. We describe our approach, which we call RAY, in detail, and show experimentally the compression available, compression and decompression costs, and performance as a stream and random-access technique. We show that, in many cases, RAY achieves better compression than an efficient Huffman scheme and popular adaptive compression techniques, and that it can be used as an efficient general-purpose compression scheme.
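     For contrast with RAY, a minimal sketch of the semistatic word-based modeling mentioned above: a first pass counts word frequencies, a Huffman code is built over the vocabulary, and a second pass encodes the text with it. This illustrates that baseline technique only, not the RAY scheme itself.

     import heapq
     from collections import Counter
     from itertools import count

     def build_huffman_codes(freqs):
         """Semistatic model: one Huffman codeword per distinct word."""
         tie = count()                      # unique tiebreaker for equal frequencies
         heap = [(f, next(tie), {w: ""}) for w, f in freqs.items()]
         heapq.heapify(heap)
         if len(heap) == 1:                 # degenerate single-word vocabulary
             return {w: "0" for w in freqs}
         while len(heap) > 1:
             f1, _, c1 = heapq.heappop(heap)
             f2, _, c2 = heapq.heappop(heap)
             merged = {w: "0" + code for w, code in c1.items()}
             merged.update({w: "1" + code for w, code in c2.items()})
             heapq.heappush(heap, (f1 + f2, next(tie), merged))
         return heap[0][2]

     def compress(words):
         """Two passes: model the word frequencies, then emit the bit string."""
         codes = build_huffman_codes(Counter(words))
         return codes, "".join(codes[w] for w in words)

     codes, bits = compress("to be or not to be".split())
     print(codes, bits)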

Languages

  • e 53
  • d 6

Types

  • a 52
  • el 3
  • m 2
  • p 1
  • r 1
  • s 1
  • x 1