Search (6 results, page 1 of 1)

Wartena, C.; Golub, K.: Evaluierung von Verschlagwortung im Kontext des Information Retrievals (2021) 0.02
```
0.016422477 = product of:
  0.049267426 = sum of:
    0.009977593 = weight(_text_:in in 376) [ClassicSimilarity], result of:
      0.009977593 = score(doc=376,freq=10.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.16802745 = fieldWeight in 376, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=376)
    0.039289832 = weight(_text_:und in 376) [ClassicSimilarity], result of:
      0.039289832 = score(doc=376,freq=22.0), product of:
        0.09675359 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.043654136 = queryNorm
        0.40608138 = fieldWeight in 376, product of:
          4.690416 = tf(freq=22.0), with freq of:
            22.0 = termFreq=22.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0390625 = fieldNorm(doc=376)
  0.33333334 = coord(2/6)
```
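The score breakdown above can be reproduced by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (`tf = sqrt(freq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`), which are consistent with the numbers shown. The function names are hypothetical and not part of any library; `queryNorm`, `fieldNorm`, and `coord` are copied from the explain output rather than derived.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree
    tf = math.sqrt(freq)                      # tf(freq) = sqrt(termFreq)
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm           # idf * queryNorm
    field_weight = tf * idf * field_norm      # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explain output for doc 376
MAX_DOCS = 44218
QUERY_NORM = 0.043654136
FIELD_NORM = 0.0390625

score_in = term_score(10.0, 30841, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_und = term_score(22.0, 13101, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(2/6): two of the six query clauses matched this document
coord = 2 / 6
total = (score_in + score_und) * coord

print(f"in:    {score_in:.9f}")
print(f"und:   {score_und:.9f}")
print(f"total: {total:.9f}")
```

Small differences in the last digits are expected, because Lucene computes these values in single precision while Python uses doubles.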
Abstract

This chapter gives an overview of the possibilities, challenges, and limits, as discussed in the literature, of using retrieval as an extrinsic evaluation method for the results of verbal subject indexing. Subject indexing in general, and keyword assignment in particular, can be evaluated intrinsically or extrinsically. Intrinsic evaluation refers to properties of the indexing that are presumed to be suitable indicators of its quality, such as formal uniformity (with regard to the number of descriptors assigned per document, granularity, etc.), consistency, or agreement between the results of different indexers. Extrinsic evaluation measures the quality of the chosen descriptors by how well they actually perform in search. Although extrinsic evaluation gives more direct information about whether the indexing fulfills its purpose, and should therefore be preferred, it is complicated and often problematic. In a retrieval system, various algorithms and data sources interlock in complex ways and, during evaluation, additionally interact with users and search tasks. A component of the system cannot be evaluated simply by swapping it out and comparing it with another component, because the same resource or the same algorithm can behave differently in different environments. We present and discuss relevant evaluation approaches and conclude with some recommendations for evaluating keyword assignment in the context of retrieval.

Series

Bibliotheks- und Informationspraxis; 70

Source

Qualität in der Inhaltserschließung. Ed.: M. Franke-Maier et al.
Petras, V.; Womser-Hacker, C.: Evaluation im Information Retrieval (2023) 0.01
```
0.011261909 = product of:
  0.033785727 = sum of:
    0.005354538 = weight(_text_:in in 808) [ClassicSimilarity], result of:
      0.005354538 = score(doc=808,freq=2.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.09017298 = fieldWeight in 808, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=808)
    0.02843119 = weight(_text_:und in 808) [ClassicSimilarity], result of:
      0.02843119 = score(doc=808,freq=8.0), product of:
        0.09675359 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.043654136 = queryNorm
        0.29385152 = fieldWeight in 808, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.046875 = fieldNorm(doc=808)
  0.33333334 = coord(2/6)
```
Abstract

The goal of an evaluation is to check whether, or to what extent, an information system meets the requirements placed on it. Information systems can be evaluated from different perspectives. For a holistic evaluation (the German term Evaluierung is used synonymously) that considers different quality aspects (e.g., how well a system ranks relevant documents, how fast it performs the search, how the result presentation is designed, or how searchers are guided through the system) and checks that several requirements are met, it is advisable to triangulate both perspectives and methods (i.e., to use several approaches to quality assessment). In information retrieval (IR), evaluation concentrates on assessing the quality of the search function of an information retrieval system (IRS), and a distinction is often drawn between system-centered and user-centered evaluation. This chapter focuses on system-centered evaluation, while other chapters of this handbook discuss other evaluation approaches (see chapters C 4 Interaktives Information Retrieval, C 7 Cross-Language Information Retrieval, and D 1 Information Behavior).

Source

Grundlagen der Informationswissenschaft. Ed.: Rainer Kuhlen, Dirk Lewandowski, Wolfgang Semar and Christa Womser-Hacker. 7th, completely revised ed.
Meyer, O.C.: Retrievalexperimente mit bibliothekarischen Daten : Historischer Überblick und aktueller Forschungsstand (2022) 0.01
```
0.008308224 = product of:
  0.024924671 = sum of:
    0.010709076 = weight(_text_:in in 655) [ClassicSimilarity], result of:
      0.010709076 = score(doc=655,freq=8.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.18034597 = fieldWeight in 655, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=655)
    0.014215595 = weight(_text_:und in 655) [ClassicSimilarity], result of:
      0.014215595 = score(doc=655,freq=2.0), product of:
        0.09675359 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.043654136 = queryNorm
        0.14692576 = fieldWeight in 655, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.046875 = fieldNorm(doc=655)
  0.33333334 = coord(2/6)
```
Abstract

Retrieval research in library science has made considerable progress in recent decades. Automatic indexing methods are used more and more frequently, although the topic is controversial in the library world. The results of machine indexing are recorded by means of retrieval tests. The subject of this work is a presentation of retrieval experiments with library data. It begins by explaining the fundamentals of such retrieval tests and the Cranfield paradigm. This is followed by a presentation of various scholarly projects from this research field in chronological order. Where connections or influences exist between individual projects, they are highlighted. The GELIC retrieval project at TH Köln, in which the author of this work participated, is described in particular detail. Although there are isolated retrieval projects, from a methodological point of view a line can be drawn from the earliest experiments to today's retrieval experiments. This development is not yet complete.
Breuer, T.; Tavakolpoursaleh, N.; Schaer, P.; Hienert, D.; Schaible, J.; Castro, L.J.: Online Information Retrieval Evaluation using the STELLA Framework (2022) 0.00
```
0.0015457221 = product of:
  0.009274333 = sum of:
    0.009274333 = weight(_text_:in in 640) [ClassicSimilarity], result of:
      0.009274333 = score(doc=640,freq=6.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.1561842 = fieldWeight in 640, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=640)
  0.16666667 = coord(1/6)
```
Abstract

Involving users in early phases of software development has become a common strategy as it enables developers to consider user needs from the beginning. Once a system is in production, new opportunities to observe, evaluate and learn from users emerge as more information becomes available. Gathering information from users to continuously evaluate their behavior is a common practice for commercial software, while the Cranfield paradigm remains the preferred option for Information Retrieval (IR) and recommendation systems in the academic world. Here we introduce the Infrastructures for Living Labs STELLA project which aims to create an evaluation infrastructure allowing experimental systems to run along production web-based academic search systems with real users. STELLA combines user interactions and log files analyses to enable large-scale A/B experiments for academic search.
Parapar, J.; Losada, D.E.; Presedo-Quindimil, M.A.; Barreiro, A.: Using score distributions to compare statistical significance tests for information retrieval evaluation (2020) 0.00
```
0.0014873719 = product of:
  0.008924231 = sum of:
    0.008924231 = weight(_text_:in in 5506) [ClassicSimilarity], result of:
      0.008924231 = score(doc=5506,freq=8.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.15028831 = fieldWeight in 5506, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5506)
  0.16666667 = coord(1/6)
```
Abstract

Statistical significance tests can provide evidence that the observed difference in performance between 2 methods is not due to chance. In information retrieval (IR), some studies have examined the validity and suitability of such tests for comparing search systems. We argue here that current methods for assessing the reliability of statistical tests suffer from some methodological weaknesses, and we propose a novel way to study significance tests for retrieval evaluation. Using Score Distributions, we model the output of multiple search systems, produce simulated search results from such models, and compare them using various significance tests. A key strength of this approach is that we assess statistical tests under perfect knowledge about the truth or falseness of the null hypothesis. This new method for studying the power of significance tests in IR evaluation is formal and innovative. Following this type of analysis, we found that both the sign test and Wilcoxon signed test have more power than the permutation test and the t-test. The sign test and Wilcoxon signed test also have good behavior in terms of type I errors. The bootstrap test shows few type I errors, but it has less power than the other methods tested.
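To make the kind of paired comparison discussed in this abstract concrete, here is a minimal sketch of two of the named tests, a two-sided sign test and an exact sign-flip permutation test, applied to per-topic scores of two systems. The scores are invented toy values and the function names are hypothetical; this is not the authors' simulation method, and a single toy example says nothing about the relative power of the tests.

```python
import math
from itertools import product

def sign_test_p(a, b):
    # Two-sided sign test on paired scores; tied pairs are dropped.
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    k = min(sum(d > 0 for d in diffs), sum(d < 0 for d in diffs))
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def permutation_test_p(a, b, tol=1e-12):
    # Exact paired permutation test: enumerate every sign assignment of
    # the per-topic differences (feasible only for small topic counts).
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    for signs in product((1, -1), repeat=len(diffs)):
        permuted = abs(sum(s * d for s, d in zip(signs, diffs)))
        if permuted >= observed - tol:  # tolerance guards float ties
            hits += 1
    return hits / 2 ** len(diffs)

# Hypothetical per-topic average precision for two systems (toy data)
system_a = [0.30, 0.45, 0.50, 0.20, 0.65, 0.40, 0.55, 0.35]
system_b = [0.25, 0.40, 0.52, 0.10, 0.60, 0.30, 0.50, 0.33]

p_sign = sign_test_p(system_a, system_b)
p_perm = permutation_test_p(system_a, system_b)
print(f"sign test p = {p_sign:.4f}, permutation test p = {p_perm:.4f}")
```

The sign test only uses the direction of each per-topic difference, while the permutation test also uses its magnitude, which is why the two can disagree on the same data.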
Vegt, A. van der; Zuccon, G.; Koopman, B.: Do better search engines really equate to better clinical decisions? : If not, why not? (2021) 0.00
```
0.0012881019 = product of:
  0.007728611 = sum of:
    0.007728611 = weight(_text_:in in 150) [ClassicSimilarity], result of:
      0.007728611 = score(doc=150,freq=6.0), product of:
        0.059380736 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.043654136 = queryNorm
        0.1301535 = fieldWeight in 150, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=150)
  0.16666667 = coord(1/6)
```
Abstract

Previous research has found that improved search engine effectiveness, evaluated using a batch-style approach, does not always translate to significant improvements in user task performance; however, these prior studies focused on simple recall and precision-based search tasks. We investigated the same relationship, but for realistic, complex search tasks required in clinical decision making. One hundred and nine clinicians and final year medical students answered 16 clinical questions. Although the search engine did improve answer accuracy by 20 percentage points, there was no significant difference when participants used a more effective, state-of-the-art search engine. We also found that the search engine effectiveness difference, identified in the lab, was diminished by around 70% when the search engines were used with real users. Despite the aid of the search engine, half of the clinical questions were answered incorrectly. We further identified the relative contribution of search engine effectiveness to the overall end task success. We found that the ability to interpret documents correctly was a much more important factor impacting task success. If these findings are representative, information retrieval research may need to reorient its emphasis towards helping users to better understand information, rather than just finding it for them.
