Search (3 results, page 1 of 1)

  • theme_ss:"Retrievalstudien"
  • type_ss:"a"
  • year_i:[2020 TO 2030}
  1. Parapar, J.; Losada, D.E.; Presedo-Quindimil, M.A.; Barreiro, A.: Using score distributions to compare statistical significance tests for information retrieval evaluation (2020) 0.00
    0.0026372964 = product of:
      0.010549186 = sum of:
        0.010549186 = product of:
          0.031647556 = sum of:
            0.031647556 = weight(_text_:systems in 5506) [ClassicSimilarity], result of:
              0.031647556 = score(doc=5506,freq=4.0), product of:
                0.13181444 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.04289195 = queryNorm
                0.24009174 = fieldWeight in 5506, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5506)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    Statistical significance tests can provide evidence that the observed difference in performance between 2 methods is not due to chance. In information retrieval (IR), some studies have examined the validity and suitability of such tests for comparing search systems. We argue here that current methods for assessing the reliability of statistical tests suffer from some methodological weaknesses, and we propose a novel way to study significance tests for retrieval evaluation. Using Score Distributions, we model the output of multiple search systems, produce simulated search results from such models, and compare them using various significance tests. A key strength of this approach is that we assess statistical tests under perfect knowledge about the truth or falseness of the null hypothesis. This new method for studying the power of significance tests in IR evaluation is formal and innovative. Following this type of analysis, we found that both the sign test and Wilcoxon signed test have more power than the permutation test and the t-test. The sign test and Wilcoxon signed test also have good behavior in terms of type I errors. The bootstrap test shows few type I errors, but it has less power than the other methods tested.
  2. Gao, R.; Ge, Y.; Sha, C.: FAIR: Fairness-aware information retrieval evaluation (2022) 0.00
    0.0026372964 = product of:
      0.010549186 = sum of:
        0.010549186 = product of:
          0.031647556 = sum of:
            0.031647556 = weight(_text_:systems in 669) [ClassicSimilarity], result of:
              0.031647556 = score(doc=669,freq=4.0), product of:
                0.13181444 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.04289195 = queryNorm
                0.24009174 = fieldWeight in 669, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=669)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    With the emerging needs of creating fairness-aware solutions for search and recommendation systems, a daunting challenge exists of evaluating such solutions. While many of the traditional information retrieval (IR) metrics can capture the relevance, diversity, and novelty for the utility with respect to users, they are not suitable for inferring whether the presented results are fair from the perspective of responsible information exposure. On the other hand, existing fairness metrics do not account for user utility or do not measure it adequately. To address this problem, we propose a new metric called FAIR. By unifying standard IR metrics and fairness measures into an integrated metric, this metric offers a new perspective for evaluating fairness-aware ranking results. Based on this metric, we developed an effective ranking algorithm that jointly optimized user utility and fairness. The experimental results showed that our FAIR metric could highlight results with good user utility and fair information exposure. We showed how FAIR related to a set of existing utility and fairness metrics and demonstrated the effectiveness of our FAIR-based algorithm. We believe our work opens up a new direction of pursuing a metric for evaluating and implementing the FAIR systems.
  3. Petras, V.; Womser-Hacker, C.: Evaluation im Information Retrieval (2023) 0.00
    0.0022378203 = product of:
      0.008951281 = sum of:
        0.008951281 = product of:
          0.026853843 = sum of:
            0.026853843 = weight(_text_:systems in 808) [ClassicSimilarity], result of:
              0.026853843 = score(doc=808,freq=2.0), product of:
                0.13181444 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.04289195 = queryNorm
                0.2037246 = fieldWeight in 808, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.046875 = fieldNorm(doc=808)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    The goal of an evaluation is to check whether, and to what extent, an information system meets the requirements placed on it. Information systems can be evaluated from different perspectives. For a holistic evaluation (the German term Evaluierung is also used as a synonym) that considers different quality aspects (e.g., how well a system ranks relevant documents, how quickly a system performs the search, how the result presentation is designed, or how searchers are guided through the system) and checks whether several requirements are met, it is advisable to triangulate both perspectives and methods (i.e., to use several approaches to quality assessment). In information retrieval (IR), evaluation concentrates on assessing the quality of the search function of an information retrieval system (IRS), and a distinction is often drawn between system-centered and user-centered evaluation. This chapter focuses on system-centered evaluation, while other chapters of this handbook discuss other evaluation approaches (see chapters C 4 Interaktives Information Retrieval, C 7 Cross-Language Information Retrieval, and D 1 Information Behavior).
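
    Note on the score breakdowns
    The score trees listed under each of the three results can be reproduced by hand. The sketch below is a minimal illustration, not the search engine's own code: the function names (classic_idf, explain_score) are hypothetical, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is an assumption that happens to agree with the idf value printed in the listing. queryNorm and fieldNorm are copied from the listing as inputs rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf form: 1 + ln(maxDocs / (docFreq + 1)).
    # It matches the idf=3.0731742 shown in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def explain_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    """Rebuild one score tree from the listing (hypothetical helper)."""
    tf = math.sqrt(freq)                   # tf(freq) = sqrt(termFreq)
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm        # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm   # fieldWeight = tf * idf * fieldNorm
    score = query_weight * field_weight
    for matched, total in coords:          # coord(1/3), then coord(1/4)
        score *= matched / total
    return score


# Values copied from the listing above.
QUERY_NORM = 0.04289195
COORDS = [(1, 3), (1, 4)]
DOC_FREQ, MAX_DOCS = 5561, 44218

results = {
    5506: explain_score(4.0, DOC_FREQ, MAX_DOCS, QUERY_NORM, 0.0390625, COORDS),
    669: explain_score(4.0, DOC_FREQ, MAX_DOCS, QUERY_NORM, 0.0390625, COORDS),
    808: explain_score(2.0, DOC_FREQ, MAX_DOCS, QUERY_NORM, 0.046875, COORDS),
}

for doc_id, score in results.items():
    print(doc_id, f"{score:.10f}")
```

    Results 1 and 2 share the same term frequency and fieldNorm, so they tie. Result 3 has a larger fieldNorm (0.046875 against 0.0390625) but only half the term frequency, and the smaller tf outweighs the larger norm, which is why it ranks last. Small differences in the last digits against the listing are expected, since this sketch uses double precision throughout.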