Search (45 results, page 1 of 3)

  • type_ss:"a"
  • theme_ss:"Retrievalstudien"
  1. Iivonen, M.: Consistency in the selection of search concepts and search terms (1995) 0.09
    Abstract
    Considers intersearcher and intrasearcher consistency in the selection of search terms. Based on an empirical study where 22 searchers from 4 different types of search environments analyzed altogether 12 search requests of 4 different types in 2 separate test situations between which 2 months elapsed. Statistically very significant differences in consistency were found according to the types of search environments and search requests. Consistency was also considered according to the scope of the search concept. At level I search terms were compared character by character. At level II different search terms were accepted as the same search concept with a rather simple evaluation of linguistic expressions. At level III, in addition to level II, the hierarchical approach of the search request was also controlled. At level IV different search terms were accepted as the same search concept with a broad interpretation of the search concept. Both intersearcher and intrasearcher consistency grew most immediately after a rather simple evaluation of linguistic expressions.
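To make the notion of term-selection consistency concrete, here is a minimal sketch; the pairwise overlap measure and the toy data are assumptions for illustration, not the computation actually used in the study:

```python
# Hypothetical sketch: pairwise inter-searcher consistency of selected search terms.
# Level I in the study compared terms character by character; this exact-match
# overlap (intersection over union) and its averaging are assumptions, not the
# formula actually used by Iivonen.
from itertools import combinations

def consistency(terms_a, terms_b):
    """Share of search terms two searchers have in common (exact string match)."""
    if not terms_a and not terms_b:
        return 1.0
    return len(terms_a & terms_b) / len(terms_a | terms_b)

def mean_intersearcher_consistency(selections):
    """Average pairwise consistency across all searchers for one search request."""
    pairs = list(combinations(selections, 2))
    return sum(consistency(a, b) for a, b in pairs) / len(pairs)

# Three searchers analysing the same request
print(mean_intersearcher_consistency([
    {"information retrieval", "evaluation"},
    {"information retrieval", "retrieval test"},
    {"evaluation", "retrieval test"},
]))  # 0.333... (each pair shares 1 of 3 distinct terms)
```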
  2. Harter, S.P.: The Cranfield II relevance assessments : a critical evaluation (1971) 0.07
  3. Bauer, G.; Schneider, C.: PADOK-II : Untersuchungen zur Volltextproblematik und zur interpretativen Analyse der Retrievalprotokolle (1990) 0.06
    Abstract
    This contribution builds on the report on the methodological concept, the execution and the results of the PADOK-II retrieval tests (Krause/Wormser-Hacker). It describes the results of additional tests on the influence of the extent of the underlying documents (full text vs. title+abstract), which show a marked deterioration of recall values when the document extent is reduced. For the interpretative analysis of the retrieval protocols, the methodological integration, the starting points of the analysis and first results are presented.
    Object
    PADOK-II
  4. Gödert, W.; Liebig, M.: Maschinelle Indexierung auf dem Prüfstand : Ergebnisse eines Retrievaltests zum MILOS II Projekt (1997) 0.05
    Abstract
    The test ran between Nov 95 and Aug 96 at the Fachhochschule für Bibliothekswesen in Cologne (College of Librarianship). The test basis was a database of 190,000 book titles published between 1990 and 1995. The MILOS II mechanized indexing methods proved helpful in avoiding or reducing the number of unsatisfied/no-result retrieval searches. Retrieval from mechanized indexing was three times more successful than from title keyword data. MILOS II also used a standardized semantic vocabulary. Mechanized indexing demands high-quality software and output data.
  5. Krause, J.; Womser-Hacker, C.: PADOK-II : Retrievaltests zur Bewertung von Volltextindexierungsvarianten für das deutsche Patentinformationssystem (1990) 0.03
  6. Robertson, S.E.: The parametric description of retrieval tests : Part II: Overall measures (1969) 0.03
  7. Munkelt, J.: Erstellung einer DNB-Retrieval-Testkollektion (2018) 0.03
    Pages
    II, 79 S
  8. Bodoff, D.; Kambil, A.: Partial coordination : II. A preliminary evaluation and failure analysis (1998) 0.02
  9. Oberhauser, O.; Labner, J.: OPAC-Erweiterung durch automatische Indexierung : Empirische Untersuchung mit Daten aus dem Österreichischen Verbundkatalog (2002) 0.02
    Abstract
    Following the MILOS I and MILOS II indexing projects carried out in the 1990s, which examined the suitability of an automatic indexing procedure for library catalogues, an empirical study was conducted on a representative sample of title records from the Austrian union catalogue (Österreichischer Verbundkatalog). The aim was to test and assess the applicability of this procedure in the union's online catalogues. In keeping with real OPAC usage, only the effect on the Basic Index ("All fields") enriched with automatically generated terms was examined. For this purpose, 100 search queries were run first against the original Basic Index and then against the enriched Basic Index in an OPAC under Aleph 500. The tests showed an increase in relevant hits with only slight losses in precision, a reduction in zero-hit results, and insights into the effect of existing verbal subject indexing.
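The before/after comparison described above can be illustrated with a short sketch; the data layout and toy judgements are assumptions, not the study's actual evaluation code:

```python
# Hypothetical sketch of the comparison described above: the same queries are run
# against the original and the enriched Basic Index; relevant hits, precision and
# zero-hit queries are then compared. Toy data, not the study's own figures.

def evaluate(runs, relevant):
    """runs: query -> list of result ids; relevant: query -> set of ids judged relevant."""
    rel_hits = retrieved = zero_hits = 0
    for query, results in runs.items():
        if not results:
            zero_hits += 1
            continue
        retrieved += len(results)
        rel_hits += len(set(results) & relevant.get(query, set()))
    precision = rel_hits / retrieved if retrieved else 0.0
    return {"relevant_hits": rel_hits, "precision": round(precision, 2), "zero_hits": zero_hits}

judgements = {"q1": {"d1", "d3"}, "q2": {"d7"}}
baseline = evaluate({"q1": ["d1", "d2"], "q2": []}, judgements)
enriched = evaluate({"q1": ["d1", "d2", "d3"], "q2": ["d7", "d9"]}, judgements)
print(baseline)  # one relevant hit, one zero-hit query
print(enriched)  # more relevant hits, no zero-hit queries
```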
  10. Fuhr, N.; Niewelt, B.: Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.02
    Date
    20.10.2000 12:22:23
  11. Tomaiuolo, N.G.; Parker, J.: Maximizing relevant retrieval : keyword and natural language searching (1998) 0.02
    Source
    Online. 22(1998) no.6, S.57-58
  12. Voorhees, E.M.; Harman, D.: Overview of the Sixth Text REtrieval Conference (TREC-6) (2000) 0.02
    Date
    11. 8.2001 16:22:19
  13. Dalrymple, P.W.: Retrieval by reformulation in two library catalogs : toward a cognitive model of searching behavior (1990) 0.02
    Date
    22. 7.2006 18:43:54
  14. Cleverdon, C.W.; Mills, J.: The testing of index language devices (1985) 0.02
    Abstract
    A landmark event in the twentieth-century development of subject analysis theory was a retrieval experiment, begun in 1957, by Cyril Cleverdon, Librarian of the Cranfield Institute of Technology. For this work he received the Professional Award of the Special Libraries Association in 1962 and the Award of Merit of the American Society for Information Science in 1970. The objective of the experiment, called Cranfield I, was to test the ability of four indexing systems-UDC, Facet, Uniterm, and Alphabetic-Subject Headings-to retrieve material responsive to questions addressed to a collection of documents. The experiment was ambitious in scale, consisting of eighteen thousand documents and twelve hundred questions. Prior to Cranfield I, the question of what constitutes good indexing was approached subjectively and reference was made to assumptions in the form of principles that should be observed or user needs that should be met. Cranfield I was the first large-scale effort to use objective criteria for determining the parameters of good indexing. Its creative impetus was the definition of user satisfaction in terms of precision and recall. Out of the experiment emerged the definition of recall as the percentage of relevant documents retrieved and precision as the percentage of retrieved documents that were relevant. Operationalizing the concept of user satisfaction, that is, making it measurable, meant that it could be studied empirically and manipulated as a variable in mathematical equations. Much has been made of the fact that the experimental methodology of Cranfield I was seriously flawed. This is unfortunate as it tends to diminish Cleverdon's contribution, which was not methodological-such contributions can be left to benchmark researchers-but rather creative: the introduction of a new paradigm, one that proved to be eminently productive. The criticism leveled at the methodological shortcomings of Cranfield I underscored the need for more precise definitions of the variables involved in information retrieval. Particularly important was the need for a definition of the dependent variable, index language. Like the definitions of precision and recall, that of index language provided a new way of looking at the indexing process. It was a re-visioning that stimulated research activity and led not only to a better understanding of indexing but also the design of better retrieval systems. Cranfield I was followed by Cranfield II. While Cranfield I was a wholesale comparison of four indexing "systems," Cranfield II aimed to single out various individual factors in index languages, called "indexing devices," and to measure how variations in these affected retrieval performance. The following selection represents the thinking at Cranfield midway between these two notable retrieval experiments.
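The two definitions that emerged from Cranfield I can be stated compactly; a minimal illustration of the standard set-based formulation (not code from the experiments themselves):

```python
# Minimal illustration of the recall and precision definitions quoted above
# (standard set-based formulation; not code from the Cranfield experiments).
def precision_recall(retrieved, relevant):
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0  # retrieved documents that are relevant
    recall = len(hits) / len(relevant) if relevant else 0.0       # relevant documents that were retrieved
    return precision, recall

print(precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d3", "d9"}))  # precision 0.5, recall ~0.67
```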
  15. Angelini, M.; Fazzini, V.; Ferro, N.; Santucci, G.; Silvello, G.: CLAIRE: A combinatorial visual analytics system for information retrieval evaluation (2018) 0.02
    Abstract
    Information Retrieval (IR) develops complex systems, constituted of several components, which aim at returning and optimally ranking the most relevant documents in response to user queries. In this context, experimental evaluation plays a central role, since it allows for measuring IR systems' effectiveness, increasing the understanding of their functioning, and better directing the efforts for improving them. Current evaluation methodologies are limited by two major factors: (i) IR systems are evaluated as "black boxes", since it is not possible to decompose the contributions of the different components, e.g., stop lists, stemmers, and IR models; (ii) given that it is not possible to predict the effectiveness of an IR system, both academia and industry need to explore huge numbers of systems, originated by large combinatorial compositions of their components, to understand how they perform and how these components interact together. We propose a Combinatorial visuaL Analytics system for Information Retrieval Evaluation (CLAIRE) which allows for exploring and making sense of the performance of a large number of IR systems, in order to quickly and intuitively grasp which system configurations are preferred, what the contributions of the different components are, and how these components interact together. The CLAIRE system is then validated against use cases based on several test collections using a wide set of systems, generated by a combinatorial composition of several off-the-shelf components, representing the most common denominator almost always present in English IR systems. In particular, we validate the findings enabled by CLAIRE with respect to consolidated deep statistical analyses and we show that the CLAIRE system allows the generation of new insights, which were not detectable with traditional approaches.
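The combinatorial composition of components that CLAIRE explores can be sketched as follows; the component lists and the evaluate() placeholder are assumptions for illustration, not CLAIRE's actual interface:

```python
# Sketch of generating a grid of IR system configurations from off-the-shelf
# components, as described above. Component names and the evaluate() placeholder
# are assumptions, not the actual CLAIRE interface.
from itertools import product

stop_lists = ["none", "smart", "terrier"]
stemmers   = ["none", "porter", "krovetz"]
ir_models  = ["bm25", "tfidf", "lm_dirichlet"]

def evaluate(stop_list, stemmer, model):
    """Placeholder: would index a test collection with this configuration,
    run the topics and return an effectiveness score such as MAP."""
    return 0.0

runs = {(s, st, m): evaluate(s, st, m)
        for s, st, m in product(stop_lists, stemmers, ir_models)}
print(len(runs), "system configurations")  # 3 * 3 * 3 = 27
```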
  16. Allan, J.; Callan, J.P.; Croft, W.B.; Ballesteros, L.; Broglio, J.; Xu, J.; Shu, H.: INQUERY at TREC-5 (1997) 0.02
    Date
    27. 2.1999 20:55:22
  17. Ng, K.B.; Loewenstern, D.; Basu, C.; Hirsh, H.; Kantor, P.B.: Data fusion of machine-learning methods for the TREC5 routing task (and other work) (1997) 0.02
    Date
    27. 2.1999 20:59:22
  18. Saracevic, T.: On a method for studying the structure and nature of requests in information retrieval (1983) 0.02
    Pages
    S.22-25
  19. Womser-Hacker, C.: Theorie des Information Retrieval III : Evaluierung (2004) 0.02
    Abstract
    Information retrieval systems were viewed from an evaluative perspective very early on. Every newly developed component was expected to increase the effectiveness of the system as a whole and had to prove its functionality or stand comparison with existing procedures (e.g. automatic indexing vs. manual subject cataloguing of information objects). The Cranfield II experiments took place in 1963 and established the principles of evaluation in information retrieval. Evaluation procedures, approaches and methods therefore also have a long tradition. Sparck Jones's observation that the exact reasons for the behaviour of information retrieval systems often remain in the dark led to the demand for an exact and explicit evaluation methodology and experimental verifiability. As a general approach, an indirect procedure for evaluating information retrieval systems has become established, in which the system itself is treated as a black box and only the retrieval output is used as the basis for the assessment. In these experiments the system perspective was in the foreground in order to arrive at an evaluative statement: it was measured how well the systems were able to meet the requirements placed on them, to deliver relevant documents and to hold back non-relevant ones. With the growing complexity of the systems and the ever stronger involvement of users who do not have the competence and professionalism of information specialists, it became increasingly difficult to isolate individual properties of the overall system and evaluate them experimentally. Only in the age of search engines did the view take hold that the users of the systems play a decisive role in the evaluation. As this example shows, quality assessment procedures must be continuously developed further. User characteristics can be heterogeneous and elude precise knowledge, which makes complete formalisation or quantification difficult. More recent are studies that specialise in interactive information retrieval systems or in determining the quality of particular subcomponents, such as the indexing or visualisation component, the design of the user interface from a software-ergonomics point of view, or multilingual capability.
  20. Kutlu, M.; Elsayed, T.; Lease, M.: Intelligent topic selection for low-cost information retrieval evaluation : a new perspective on deep vs. shallow judging (2018) 0.02
    Abstract
    While test collections provide the cornerstone for Cranfield-based evaluation of information retrieval (IR) systems, it has become practically infeasible to rely on traditional pooling techniques to construct test collections at the scale of today's massive document collections (e.g., ClueWeb12's 700M+ Webpages). This has motivated a flurry of studies proposing more cost-effective yet reliable IR evaluation methods. In this paper, we propose a new intelligent topic selection method which reduces the number of search topics (and thereby costly human relevance judgments) needed for reliable IR evaluation. To rigorously assess our method, we integrate previously disparate lines of research on intelligent topic selection and deep vs. shallow judging (i.e., whether it is more cost-effective to collect many relevance judgments for a few topics or a few judgments for many topics). While prior work on intelligent topic selection has never been evaluated against shallow judging baselines, prior work on deep vs. shallow judging has largely argued for shallow judging, but assuming random topic selection. We argue that for evaluating any topic selection method, ultimately one must ask whether it is actually useful to select topics, or should one simply perform shallow judging over many topics? In seeking a rigorous answer to this over-arching question, we conduct a comprehensive investigation over a set of relevant factors never previously studied together: 1) method of topic selection; 2) the effect of topic familiarity on human judging speed; and 3) how different topic generation processes (requiring varying human effort) impact (i) budget utilization and (ii) the resultant quality of judgments. Experiments on NIST TREC Robust 2003 and Robust 2004 test collections show that not only can we reliably evaluate IR systems with fewer topics, but also that: 1) when topics are intelligently selected, deep judging is often more cost-effective than shallow judging in evaluation reliability; and 2) topic familiarity and topic generation costs greatly impact the evaluation cost vs. reliability trade-off. Our findings challenge conventional wisdom in showing that deep judging is often preferable to shallow judging when topics are selected intelligently.
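The budget trade-off behind "deep vs. shallow judging" can be made concrete with a small sketch; the budget figure and per-topic judging depths are made-up numbers, not values from the paper:

```python
# Sketch of the deep-vs-shallow trade-off discussed above: with a fixed budget of
# human relevance judgments, one can either judge many topics shallowly or few
# topics deeply. The budget and depths below are made-up numbers.

def affordable_topics(total_judgments, judgments_per_topic):
    return total_judgments // judgments_per_topic

budget = 10_000                     # total relevance judgments available (assumed)
for depth in (20, 50, 100, 250):    # judgments collected per topic
    print(f"depth {depth:3d}: {affordable_topics(budget, depth):3d} topics")
```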

Languages

  • e 36
  • d 8
  • f 1