Search (50 results, page 1 of 3)

  • theme_ss:"Retrievalstudien"
  1. Hallet, K.S.: Separate but equal? : A system comparison study of MEDLINE's controlled vocabulary MeSH (1998) 0.05
    Abstract
    Reports the results of a study testing the effect of controlled vocabulary search feature implementation on 2 online systems. Specifically, the study examined retrieval rates using 4 unique controlled vocabulary search features (Explode, major descriptor, descriptor, subheadings). Two questions were addressed: what, if any, are the general differences between the controlled vocabulary implementations in DIALOG and Ovid; and what, if any, are the impacts of each implementation on the retrieval rates of the different controlled vocabulary search features? Each search feature was applied to 9 search queries obtained from a medical reference librarian. The same queries were searched in the complete MEDLINE file on the DIALOG and Ovid online host systems. The unique records (those retrieved in only 1 of the 2 systems) were identified and analyzed. DIALOG produced equal or more records than Ovid in nearly 20% of the queries. Concludes that users need to be aware of system-specific designs that may require differing input strategies across systems for the same controlled vocabulary search features. Makes recommendations and suggestions for future research.
  2. Leiva-Mederos, A.; Senso, J.A.; Hidalgo-Delgado, Y.; Hipola, P.: Working framework of semantic interoperability for CRIS with heterogeneous data sources (2017) 0.03
    Abstract
    Purpose Information from Current Research Information Systems (CRIS) is stored in different formats, on platforms that are not compatible, or even in independent networks. It would be helpful to have a well-defined methodology that allows data processing to be managed from a single site, so as to take advantage of the capacity to link dispersed data found in different systems, platforms, sources and/or formats. Based on the functionalities and materials of the VLIR project, the purpose of this paper is to present a model that provides interoperability by means of semantic alignment techniques and metadata crosswalks, and facilitates the fusion of information stored in diverse sources. Design/methodology/approach After reviewing the state of the art regarding the diverse mechanisms for achieving semantic interoperability, the paper analyzes the following: the specific coverage of the data sets (type of data, thematic coverage and geographic coverage); the technical specifications needed to retrieve and analyze a distribution of the data set (format, protocol, etc.); the conditions of re-utilization (copyright and licenses); and the "dimensions" included in the data set as well as the semantics of these dimensions (the syntax and the taxonomies of reference). The semantic interoperability framework presented here implements semantic alignment and metadata crosswalks to convert information from three different systems (ABCD, Moodle and DSpace) and integrate all the databases in a single RDF file. Findings The paper also includes an evaluation based on the comparison - by means of calculations of recall and precision - of the proposed model with identical queries made on Open Archives Initiative and SQL, in order to estimate its efficiency. The results were satisfactory, since semantic interoperability facilitates the exact retrieval of information. Originality/value The proposed model enhances management of the syntactic and semantic interoperability of the CRIS system designed. In a real usage setting it achieves very positive results.
  3. Ravana, S.D.; Taheri, M.S.; Rajagopal, P.: Document-based approach to improve the accuracy of pairwise comparison in evaluating information retrieval systems (2015) 0.03
    Abstract
    Purpose The purpose of this paper is to propose a method to obtain more accurate results when comparing the performance of paired information retrieval (IR) systems, with reference to the current method, which is based on the mean effectiveness scores of the systems across a set of identified topics/queries. Design/methodology/approach In the proposed approach, instead of the classic method of using a set of topic scores, document-level scores are used as the evaluation unit. These document scores are the defined document weights, which play the role of the systems' mean average precision (MAP) scores as a significance test's statistics. The experiments were conducted using the TREC 9 Web track collection. Findings The p-values generated through two types of significance tests, namely Student's t-test and the Mann-Whitney test, show that by using document-level scores as the evaluation unit, the difference between IR systems is more significant compared with utilizing topic scores. Originality/value Utilizing a suitable test collection is a primary prerequisite for comparative evaluation of IR systems. However, in addition to reusable test collections, accurate statistical testing is a necessity for these evaluations. The findings of this study will assist IR researchers in evaluating their retrieval systems and algorithms more accurately.
    Date
    20. 1.2015 18:30:22
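    The significance testing this abstract describes can be illustrated with a small sketch. The scores and helper names below are hypothetical, not taken from the paper; the paired t statistic and Mann-Whitney U are computed from scratch so the example stays self-contained.

    ```python
    import math
    from statistics import mean, stdev

    def paired_t(scores_a, scores_b):
        """Paired t statistic over per-unit effectiveness scores of two IR systems."""
        diffs = [a - b for a, b in zip(scores_a, scores_b)]
        return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

    def mann_whitney_u(scores_a, scores_b):
        """Mann-Whitney U statistic for system A: count of (A, B) score pairs
        where A beats B; ties count one half."""
        return sum(1.0 if a > b else 0.5 if a == b else 0.0
                   for a in scores_a for b in scores_b)

    # Hypothetical evaluation-unit scores (topic- or document-level) for two systems.
    sys_a = [0.61, 0.55, 0.70, 0.48, 0.66]
    sys_b = [0.58, 0.50, 0.69, 0.40, 0.60]

    print(paired_t(sys_a, sys_b))       # larger magnitude -> stronger evidence of a difference
    print(mann_whitney_u(sys_a, sys_b))
    ```

    The paper's point is that the choice of evaluation unit (topic scores vs. document-level weights) feeds directly into statistics like these, and can change how significant the difference between two systems appears.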
  4. Scherer, B.: Automatische Indexierung und ihre Anwendung im DFG-Projekt "Gemeinsames Portal für Bibliotheken, Archive und Museen (BAM)" (2003) 0.03
    Footnote
    Master's thesis in the degree programme Information Engineering, submitted for the degree of Master of Science in Information Science
  5. Spink, A.; Goodrum, A.; Robins, D.: Search intermediary elicitations during mediated online searching (1995) 0.02
    Abstract
    Investigates search intermediary elicitations during mediated online searching. A study of 40 online reference interviews, involving 1,557 search intermediary elicitations, found 15 different types of elicitation addressed by search intermediaries to users. The purposes of the elicitations included search terms and strategies, database selection, relevance of retrieved items, and users' knowledge and previous information seeking. Analysis of the patterns in the types and sequencing of elicitations showed significant strings of multiple elicitations regarding search terms and strategies, and relevance judgements. Discusses the implications of the findings for the training of search intermediaries and the design of interfaces that elicit information from end users.
  6. Schultz Jr., W.N.; Braddy, L.: ¬A librarian-centered study of perceptions of subject terms and controlled vocabulary (2017) 0.02
    Abstract
    Controlled vocabulary and subject headings in OPAC records have proven to be useful in improving search results. The authors used a survey to gather information about librarian opinions and professional use of controlled vocabulary. Data from a range of backgrounds and expertise were examined, including academic and public libraries, and technical services as well as public services professionals. Responses overall demonstrated positive opinions of the value of controlled vocabulary, including in reference interactions as well as during bibliographic instruction sessions. Results are also examined based upon factors such as age and type of librarian.
  7. Prasher, R.G.: Evaluation of indexing system (1989) 0.02
    Abstract
    Describes an information system and its various components: index file construction, query formulation, and searching. Discusses an indexing system and brings out the need for its evaluation. Explains the concept of the efficiency of indexing systems and discusses the factors that control this efficiency. Gives criteria for evaluation. Discusses recall and precision ratios, as well as noise ratio, novelty ratio, and exhaustivity and specificity, and the impact of each on the efficiency of an indexing system. Also mentions various steps for evaluation.
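    The evaluation ratios named in the abstract can be written down directly. A minimal sketch, with illustrative document-ID sets of my own (the `known` set stands in for relevant items already familiar to the user):

    ```python
    def precision(retrieved, relevant):
        """Share of retrieved documents that are relevant."""
        return len(retrieved & relevant) / len(retrieved)

    def recall(retrieved, relevant):
        """Share of relevant documents that were retrieved."""
        return len(retrieved & relevant) / len(relevant)

    def noise_ratio(retrieved, relevant):
        """Share of retrieved documents that are NOT relevant (1 - precision)."""
        return 1.0 - precision(retrieved, relevant)

    def novelty_ratio(retrieved, relevant, known):
        """Share of relevant retrieved documents previously unknown to the user."""
        hits = retrieved & relevant
        return len(hits - known) / len(hits)

    retrieved = {1, 2, 3, 4, 5}      # documents the system returned
    relevant  = {2, 3, 5, 7, 8, 9}   # documents judged relevant
    known     = {3}                  # relevant items the user already knew

    print(precision(retrieved, relevant))   # 0.6
    print(recall(retrieved, relevant))      # 0.5
    print(novelty_ratio(retrieved, relevant, known))
    ```

    Exhaustivity and specificity are properties of the indexing itself rather than of a single result set, so they are left out of this sketch.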
  8. Bhattacharyya, K.: ¬The effectiveness of natural language in science indexing and retrieval (1974) 0.01
    Abstract
    This paper examines the implications of the findings of evaluative tests regarding the retrieval performance of natural language in various subject fields. It suggests parallel investigations into the structure of natural language, with particular reference to terminology, as used in the different branches of basic science. The criteria for defining the terminological consistency of a subject are formulated, and a measure is suggested for determining the degree of terminological consistency. The terminological and information structures of specific disciplines such as chemistry, physics, botany, zoology, and geology; the circumstances in which terms originate; and the efforts made by the international scientific community to standardize the terminology in their respective disciplines are examined in detail. This investigation shows why and how an artificially created scientific language finds it impossible to keep pace with current developments, and thus points to the source of strength of natural language.
  9. Bar-Ilan, J.: ¬The Web as an information source on informetrics? : A content analysis (2000) 0.01
    Abstract
    This article addresses the question of whether the Web can serve as an information source for research. Specifically, it analyzes by way of content analysis the Web pages retrieved by the major search engines on a particular date (June 7, 1998) as a result of the query 'informetrics OR informetric'. In 807 of the 942 retrieved pages, the search terms were mentioned in the context of information science. Over 70% of the pages contained only indirect information on the topic, in the form of hypertext links and bibliographical references without annotation. The bibliographical references extracted from the Web pages were analyzed, and lists of the most productive authors, most cited authors, works, and sources were compiled. The list of references obtained from the Web was also compared to data retrieved from commercial databases. In most cases, the list of references extracted from the Web outperformed the commercial bibliographic databases. The results of these comparisons indicate that valuable, freely available data is hidden in the Web, waiting to be extracted from the millions of Web pages.
  10. Wood, F.; Ford, N.; Walsh, C.: ¬The effect of postings information on search behaviour (1994) 0.01
    Abstract
    How postings information is used for inverted file searching was investigated by comparing searches of the LISA database on CD-ROM, made by postgraduate students at the Dept. of Information Studies, with and without postings information. Performance (the number of relevant references, precision and recall) was not significantly different, but searches with postings information took more time, and more sets were viewed, than searches without postings. Postings information was used to make decisions to narrow or broaden the search, and to view or print the references. The same techniques were used to amend searches whether or not postings information was available. Users decided that a search was satisfactory on the basis of the search results, and consequently many searches done without postings were still considered satisfactory. However, searchers thought that the lack of postings information had affected 90% of their searches. Differences in search performance and searching behaviour were found among participants who were shown to have different learning styles using Witkin's Embedded Figures Test and the Lancaster Short Inventory of Approaches to Learning. These differences were, in part, explained by the differences in behaviour indicated by their learning styles.
  11. Fuhr, N.; Niewelt, B.: ¬Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.01
    Date
    20.10.2000 12:22:23
  12. Tomaiuolo, N.G.; Parker, J.: Maximizing relevant retrieval : keyword and natural language searching (1998) 0.01
    Source
    Online. 22(1998) no.6, S.57-58
  13. Voorhees, E.M.; Harman, D.: Overview of the Sixth Text REtrieval Conference (TREC-6) (2000) 0.01
    Date
    11. 8.2001 16:22:19
  14. Dalrymple, P.W.: Retrieval by reformulation in two library catalogs : toward a cognitive model of searching behavior (1990) 0.01
    Date
    22. 7.2006 18:43:54
  15. Losada, D.E.; Parapar, J.; Barreiro, A.: Multi-armed bandits for adjudicating documents in pooling-based evaluation of information retrieval systems (2017) 0.01
    Abstract
    Evaluating Information Retrieval systems is crucial to making progress in search technologies. Evaluation is often based on assembling reference collections consisting of documents, queries and relevance judgments done by humans. In large-scale environments, exhaustively judging relevance becomes infeasible. Instead, only a pool of documents is judged for relevance. By selectively choosing documents from the pool we can optimize the number of judgments required to identify a given number of relevant documents. We argue that this iterative selection process can be naturally modeled as a reinforcement learning problem and propose innovative and formal adjudication methods based on multi-armed bandits. Casting document judging as a multi-armed bandit problem is not only theoretically appealing, but also leads to highly effective adjudication methods. Under this bandit allocation framework, we consider stationary and non-stationary models and propose seven new document adjudication methods (five stationary methods and two non-stationary variants). Our paper also reports a series of experiments performed to thoroughly compare our new methods against current adjudication methods. This comparative study includes existing methods designed for pooling-based evaluation and existing methods designed for metasearch. Our experiments show that our theoretically grounded adjudication methods can substantially minimize the assessment effort.
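    The bandit framing can be sketched with a plain epsilon-greedy allocation. This is a toy version with made-up run and relevance data, not the authors' stationary or non-stationary adjudication methods; it only shows the loop structure: each system's ranked run is an arm, a pull judges that run's next unjudged document, and a relevant document is the reward.

    ```python
    import random

    def bandit_pool_judging(runs, is_relevant, budget, eps=0.1, seed=0):
        """Epsilon-greedy adjudication over pooled runs: pulling an arm judges
        its next unjudged document; relevance is the arm's reward."""
        rng = random.Random(seed)
        queues = [list(run) for run in runs]
        pulls = [0] * len(runs)
        wins = [0] * len(runs)
        judged = {}                      # doc id -> relevance judgment
        for _ in range(budget):
            # discard already-judged documents from the head of every queue
            for q in queues:
                while q and q[0] in judged:
                    q.pop(0)
            live = [i for i, q in enumerate(queues) if q]
            if not live:
                break                    # the whole pool has been judged
            if rng.random() < eps:       # explore a random arm...
                arm = rng.choice(live)
            else:                        # ...or exploit the best-looking one
                arm = max(live, key=lambda i: wins[i] / pulls[i] if pulls[i] else 1.0)
            doc = queues[arm].pop(0)
            judged[doc] = is_relevant(doc)
            pulls[arm] += 1
            wins[arm] += judged[doc]
        return judged

    # Two hypothetical ranked runs and a made-up relevance oracle.
    judged = bandit_pool_judging(
        [["d1", "d2", "d3"], ["d2", "d4", "d5"]],
        lambda d: d in {"d1", "d4"},
        budget=10)
    print(sorted(judged))
    ```

    Runs that keep surfacing relevant documents get pulled more often, so the judging budget concentrates where relevant documents are most likely to be found.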
  16. Robertson, S.E.; Sparck Jones, K.: Simple, proven approaches to text retrieval (1997) 0.01
    Abstract
    This technical note describes straightforward techniques for document indexing and retrieval that have been solidly established through extensive testing and are easy to apply. They are useful for many different types of text material, are viable for very large files, and have the advantage that they do not require special skills or training for searching, but are easy for end users. The document and text retrieval methods described here have a sound theoretical basis, are well established by extensive testing, and the ideas involved are now implemented in some commercial retrieval systems. Testing in the last few years has, in particular, shown that the methods presented here work very well with full texts, not only titles and abstracts, and with large files of texts containing three quarters of a million documents. These tests, the TREC tests (see Harman 1993-1997; IP&M 1995), have been rigorous comparative evaluations involving many different approaches to information retrieval. The techniques depend on the use of simple terms for indexing both request and document texts; on term weighting exploiting statistical information about term occurrences; on scoring for request-document matching, using these weights, to obtain a ranked search output; and on relevance feedback to modify request weights or term sets in iterative searching. The normal implementation is via an inverted file organisation using a term list with linked document identifiers, plus counting data, and pointers to the actual texts. The user's request can be a word list, phrases, sentences or extended text.
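    As a concrete illustration of the inverted file organisation and statistical term weighting the note describes, here is a minimal sketch. The toy documents are my own, and the weighting is a bare tf x idf rather than the exact formula of any particular system:

    ```python
    import math
    from collections import Counter, defaultdict

    def build_index(docs):
        """Inverted file: term -> {doc_id: term frequency in that document}."""
        index = defaultdict(dict)
        for doc_id, text in docs.items():
            for term, tf in Counter(text.lower().split()).items():
                index[term][doc_id] = tf
        return index

    def search(index, n_docs, query):
        """Rank documents by summed tf x idf weights of the query terms."""
        scores = defaultdict(float)
        for term in query.lower().split():
            postings = index.get(term, {})
            if not postings:
                continue
            idf = math.log(n_docs / len(postings))   # rarer terms weigh more
            for doc_id, tf in postings.items():
                scores[doc_id] += tf * idf
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    docs = {1: "term weighting for ranked retrieval",
            2: "inverted file organisation with term lists",
            3: "relevance feedback modifies query term weights"}
    index = build_index(docs)
    print(search(index, len(docs), "inverted file"))
    ```

    A query is scored by accumulating, for each query term, tf x idf over that term's postings list, and the matching documents are emitted as a ranked output, just as the note outlines.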
  17. Cleverdon, C.W.; Mills, J.: ¬The testing of index language devices (1985) 0.01
    Abstract
    A landmark event in the twentieth-century development of subject analysis theory was a retrieval experiment, begun in 1957, by Cyril Cleverdon, Librarian of the Cranfield Institute of Technology. For this work he received the Professional Award of the Special Libraries Association in 1962 and the Award of Merit of the American Society for Information Science in 1970. The objective of the experiment, called Cranfield I, was to test the ability of four indexing systems - UDC, Facet, Uniterm, and Alphabetic-Subject Headings - to retrieve material responsive to questions addressed to a collection of documents. The experiment was ambitious in scale, consisting of eighteen thousand documents and twelve hundred questions. Prior to Cranfield I, the question of what constitutes good indexing was approached subjectively, and reference was made to assumptions in the form of principles that should be observed or user needs that should be met. Cranfield I was the first large-scale effort to use objective criteria for determining the parameters of good indexing. Its creative impetus was the definition of user satisfaction in terms of precision and recall. Out of the experiment emerged the definition of recall as the percentage of relevant documents retrieved, and precision as the percentage of retrieved documents that were relevant. Operationalizing the concept of user satisfaction, that is, making it measurable, meant that it could be studied empirically and manipulated as a variable in mathematical equations. Much has been made of the fact that the experimental methodology of Cranfield I was seriously flawed. This is unfortunate, as it tends to diminish Cleverdon's contribution, which was not methodological - such contributions can be left to benchmark researchers - but rather creative: the introduction of a new paradigm, one that proved to be eminently productive.
    The criticism leveled at the methodological shortcomings of Cranfield I underscored the need for more precise definitions of the variables involved in information retrieval. Particularly important was the need for a definition of the dependent variable, index language. Like the definitions of precision and recall, that of index language provided a new way of looking at the indexing process. It was a re-visioning that stimulated research activity and led not only to a better understanding of indexing but also to the design of better retrieval systems. Cranfield I was followed by Cranfield II. While Cranfield I was a wholesale comparison of four indexing "systems," Cranfield II aimed to single out various individual factors in index languages, called "indexing devices," and to measure how variations in these affected retrieval performance. The following selection represents the thinking at Cranfield midway between these two notable retrieval experiments.
  18. Allan, J.; Callan, J.P.; Croft, W.B.; Ballesteros, L.; Broglio, J.; Xu, J.; Shu, H.: INQUERY at TREC-5 (1997) 0.01
    Date
    27. 2.1999 20:55:22
  19. Ng, K.B.; Loewenstern, D.; Basu, C.; Hirsh, H.; Kantor, P.B.: Data fusion of machine-learning methods for the TREC5 routing tak (and other work) (1997) 0.01
    Date
    27. 2.1999 20:59:22
  20. Saracevic, T.: On a method for studying the structure and nature of requests in information retrieval (1983) 0.01
    Pages
    S.22-25

Languages

  • e 44
  • d 4
  • f 1

Types

  • a 43
  • s 4
  • m 3
  • el 1
  • x 1