Search (6 results, page 1 of 1)

Gao, R.; Ge, Y.; Sha, C.: FAIR: Fairness-aware information retrieval evaluation (2022) 0.00
```
0.0020106873 = product of:
  0.0040213745 = sum of:
    0.0040213745 = product of:
      0.008042749 = sum of:
        0.008042749 = weight(_text_:a in 669) [ClassicSimilarity], result of:
          0.008042749 = score(doc=669,freq=14.0), product of:
            0.04772363 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041389145 = queryNorm
            0.1685276 = fieldWeight in 669, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=669)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

With the emerging needs of creating fairness-aware solutions for search and recommendation systems, a daunting challenge exists of evaluating such solutions. While many of the traditional information retrieval (IR) metrics can capture the relevance, diversity, and novelty for the utility with respect to users, they are not suitable for inferring whether the presented results are fair from the perspective of responsible information exposure. On the other hand, existing fairness metrics do not account for user utility or do not measure it adequately. To address this problem, we propose a new metric called FAIR. By unifying standard IR metrics and fairness measures into an integrated metric, this metric offers a new perspective for evaluating fairness-aware ranking results. Based on this metric, we developed an effective ranking algorithm that jointly optimized user utility and fairness. The experimental results showed that our FAIR metric could highlight results with good user utility and fair information exposure. We showed how FAIR related to a set of existing utility and fairness metrics and demonstrated the effectiveness of our FAIR-based algorithm. We believe our work opens up a new direction of pursuing a metric for evaluating and implementing the FAIR systems.

Type

a
Breuer, T.; Tavakolpoursaleh, N.; Schaer, P.; Hienert, D.; Schaible, J.; Castro, L.J.: Online Information Retrieval Evaluation using the STELLA Framework (2022) 0.00
```
0.001823924 = product of:
  0.003647848 = sum of:
    0.003647848 = product of:
      0.007295696 = sum of:
        0.007295696 = weight(_text_:a in 640) [ClassicSimilarity], result of:
          0.007295696 = score(doc=640,freq=8.0), product of:
            0.04772363 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041389145 = queryNorm
            0.15287387 = fieldWeight in 640, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=640)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Involving users in early phases of software development has become a common strategy as it enables developers to consider user needs from the beginning. Once a system is in production, new opportunities to observe, evaluate and learn from users emerge as more information becomes available. Gathering information from users to continuously evaluate their behavior is a common practice for commercial software, while the Cranfield paradigm remains the preferred option for Information Retrieval (IR) and recommendation systems in the academic world. Here we introduce the Infrastructures for Living Labs STELLA project which aims to create an evaluation infrastructure allowing experimental systems to run along production web-based academic search systems with real users. STELLA combines user interactions and log files analyses to enable large-scale A/B experiments for academic search.
Vegt, A. van der; Zuccon, G.; Koopman, B.: Do better search engines really equate to better clinical decisions? : If not, why not? (2021) 0.00
```
0.0016993409 = product of:
  0.0033986818 = sum of:
    0.0033986818 = product of:
      0.0067973635 = sum of:
        0.0067973635 = weight(_text_:a in 150) [ClassicSimilarity], result of:
          0.0067973635 = score(doc=150,freq=10.0), product of:
            0.04772363 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041389145 = queryNorm
            0.14243183 = fieldWeight in 150, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=150)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Previous research has found that improved search engine effectiveness-evaluated using a batch-style approach-does not always translate to significant improvements in user task performance; however, these prior studies focused on simple recall and precision-based search tasks. We investigated the same relationship, but for realistic, complex search tasks required in clinical decision making. One hundred and nine clinicians and final year medical students answered 16 clinical questions. Although the search engine did improve answer accuracy by 20 percentage points, there was no significant difference when participants used a more effective, state-of-the-art search engine. We also found that the search engine effectiveness difference, identified in the lab, was diminished by around 70% when the search engines were used with real users. Despite the aid of the search engine, half of the clinical questions were answered incorrectly. We further identified the relative contribution of search engine effectiveness to the overall end task success. We found that the ability to interpret documents correctly was a much more important factor impacting task success. If these findings are representative, information retrieval research may need to reorient its emphasis towards helping users to better understand information, rather than just finding it for them.

Type

a
Parapar, J.; Losada, D.E.; Presedo-Quindimil, M.A.; Barreiro, A.: Using score distributions to compare statistical significance tests for information retrieval evaluation (2020) 0.00
```
0.0015199365 = product of:
  0.003039873 = sum of:
    0.003039873 = product of:
      0.006079746 = sum of:
        0.006079746 = weight(_text_:a in 5506) [ClassicSimilarity], result of:
          0.006079746 = score(doc=5506,freq=8.0), product of:
            0.04772363 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041389145 = queryNorm
            0.12739488 = fieldWeight in 5506, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5506)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Statistical significance tests can provide evidence that the observed difference in performance between 2 methods is not due to chance. In information retrieval (IR), some studies have examined the validity and suitability of such tests for comparing search systems. We argue here that current methods for assessing the reliability of statistical tests suffer from some methodological weaknesses, and we propose a novel way to study significance tests for retrieval evaluation. Using Score Distributions, we model the output of multiple search systems, produce simulated search results from such models, and compare them using various significance tests. A key strength of this approach is that we assess statistical tests under perfect knowledge about the truth or falseness of the null hypothesis. This new method for studying the power of significance tests in IR evaluation is formal and innovative. Following this type of analysis, we found that both the sign test and Wilcoxon signed test have more power than the permutation test and the t-test. The sign test and Wilcoxon signed test also have good behavior in terms of type I errors. The bootstrap test shows few type I errors, but it has less power than the other methods tested.

Type

a

Petras, V.; Womser-Hacker, C.: Evaluation im Information Retrieval (2023) 0.00

9.11962E-4 = product of:
  0.001823924 = sum of:
    0.001823924 = product of:
      0.003647848 = sum of:
        0.003647848 = weight(_text_:a in 808) [ClassicSimilarity], result of:
          0.003647848 = score(doc=808,freq=2.0), product of:
            0.04772363 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041389145 = queryNorm
            0.07643694 = fieldWeight in 808, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=808)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Wartena, C.; Golub, K.: Evaluierung von Verschlagwortung im Kontext des Information Retrievals (2021) 0.00

7.5996824E-4 = product of:
  0.0015199365 = sum of:
    0.0015199365 = product of:
      0.003039873 = sum of:
        0.003039873 = weight(_text_:a in 376) [ClassicSimilarity], result of:
          0.003039873 = score(doc=376,freq=2.0), product of:
            0.04772363 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.041389145 = queryNorm
            0.06369744 = fieldWeight in 376, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=376)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Type: a

Search (6 results, page 1 of 1)

Authors

Languages

Types