Search (2 results, page 1 of 1)

Efron, M.: Eigenvalue-based model selection during Latent Semantic Indexing (2005) 0.02
```
0.01559464 = product of:
  0.03118928 = sum of:
    0.03118928 = product of:
      0.06237856 = sum of:
        0.06237856 = weight(_text_:k in 3685) [ClassicSimilarity], result of:
          0.06237856 = score(doc=3685,freq=4.0), product of:
            0.18639012 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.052213363 = queryNorm
            0.33466667 = fieldWeight in 3685, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.046875 = fieldNorm(doc=3685)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
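The explain tree above can be reproduced by hand. The following is a minimal sketch, assuming the usual Lucene ClassicSimilarity definitions (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative and not part of any library; the numeric inputs are copied from the tree.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def classic_tf(freq):
    # Assumed ClassicSimilarity tf: square root of the raw term frequency
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, mirroring the explain tree
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


# Values taken from the explain output for the term "k"
weight = term_score(freq=4.0, doc_freq=3384, max_docs=44218,
                    query_norm=0.052213363, field_norm=0.046875)

# Two nested coord(1/2) factors scale the term weight to the final score
final_score = weight * 0.5 * 0.5

print(f"term weight ~ {weight:.6f}, final score ~ {final_score:.6f}")
```

Small differences in the last digits relative to the listing are expected, since Lucene computes these values in single precision.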
Abstract

In this study amended parallel analysis (APA), a novel method for model selection in unsupervised learning problems such as information retrieval (IR), is described. At issue is the selection of k, the number of dimensions retained under latent semantic indexing (LSI). Amended parallel analysis is an elaboration of Horn's parallel analysis, which advocates retaining eigenvalues larger than those that we would expect under term independence. Amended parallel analysis operates by deriving confidence intervals on these "null" eigenvalues. The technique amounts to a series of nonparametric hypothesis tests on the correlation matrix eigenvalues. In the study, APA is tested along with four established dimensionality estimators on six standard IR test collections. These estimates are evaluated with regard to two IR performance metrics. Additionally, results from simulated data are reported. In both rounds of experimentation APA performs well, predicting the best values of k on 3 of 12 observations, with good predictions on several others, and never offering the worst estimate of optimal dimensionality.
Efron, M.; Winget, M.: Query polyrepresentation for ranking retrieval systems without relevance judgments (2010) 0.02
```
0.01559464 = product of:
  0.03118928 = sum of:
    0.03118928 = product of:
      0.06237856 = sum of:
        0.06237856 = weight(_text_:k in 3469) [ClassicSimilarity], result of:
          0.06237856 = score(doc=3469,freq=4.0), product of:
            0.18639012 = queryWeight, product of:
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.052213363 = queryNorm
            0.33466667 = fieldWeight in 3469, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.569778 = idf(docFreq=3384, maxDocs=44218)
              0.046875 = fieldNorm(doc=3469)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Ranking information retrieval (IR) systems with respect to their effectiveness is a crucial operation during IR evaluation, as well as during data fusion. This article offers a novel method of approaching the system-ranking problem, based on the widely studied idea of polyrepresentation. The principle of polyrepresentation suggests that a single information need can be represented by many query articulations (what we call query aspects). By skimming the top k (where k is small) documents retrieved by a single system for multiple query aspects, we collect a set of documents that are likely to be relevant to a given test topic. Labeling these skimmed documents as putatively relevant lets us build pseudorelevance judgments without undue human intervention. We report experiments where using these pseudorelevance judgments delivers a rank ordering of IR systems that correlates highly with rankings based on human relevance judgments.
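The skimming step described in this abstract can be illustrated with a short sketch. The abstract does not say how the top-k lists are combined, so the plain set union used here is an assumption, and the function name, aspect labels, and document IDs are all hypothetical.

```python
def skim_pseudo_relevant(rankings_by_aspect, k):
    """Pool the top-k documents from each query aspect's ranked list.

    rankings_by_aspect maps a query-aspect label to the ranked list of
    document IDs that one system returned for that aspect.
    Assumption: pooled documents are combined by set union.
    """
    pooled = set()
    for ranked_docs in rankings_by_aspect.values():
        pooled.update(ranked_docs[:k])  # keep only the top k per aspect
    return pooled


# Hypothetical rankings from a single system for three aspects of one topic
rankings = {
    "aspect_a": ["d1", "d2", "d3", "d4"],
    "aspect_b": ["d2", "d5", "d1", "d6"],
    "aspect_c": ["d7", "d2", "d8", "d9"],
}

pseudo_qrels = skim_pseudo_relevant(rankings, k=2)
print(sorted(pseudo_qrels))
```

The resulting set plays the role of the putatively relevant documents for the topic; scoring other systems against it is outside the scope of this sketch.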
