Efron, M.: Eigenvalue-based model selection during Latent Semantic Indexing (2005)
- Abstract
- In this study, amended parallel analysis (APA), a novel method for model selection in unsupervised learning problems such as information retrieval (IR), is described. At issue is the selection of k, the number of dimensions retained under latent semantic indexing (LSI). Amended parallel analysis is an elaboration of Horn's parallel analysis, which advocates retaining eigenvalues larger than those we would expect under term independence. Amended parallel analysis operates by deriving confidence intervals on these "null" eigenvalues. The technique amounts to a series of nonparametric hypothesis tests on the correlation matrix eigenvalues. In the study, APA is tested along with four established dimensionality estimators on six standard IR test collections. These estimates are evaluated with regard to two IR performance metrics. Additionally, results from simulated data are reported. In both rounds of experimentation, APA performs well, predicting the best value of k in 3 of 12 observations, with good predictions in several others, and never offering the worst estimate of optimal dimensionality.
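The core idea of Horn-style parallel analysis can be illustrated with a small sketch. It is not Efron's APA procedure. The abstract does not say how the null intervals are built, so this version assumes two things: term independence is simulated by shuffling each column of a documents-by-terms matrix independently, and the cutoff for each eigenvalue rank is the empirical 95th percentile of the permuted eigenvalues. Comparing each observed eigenvalue to that upper bound acts as a rough one-sided test per rank, and k is the number of leading eigenvalues that pass. All function names are hypothetical. The eigenvalues come from a plain Jacobi routine so the sketch needs only the standard library.

```python
import math
import random

# Sketch of Horn-style parallel analysis with a per-rank percentile cutoff.
# Assumptions (not from the paper): the null is simulated by permuting each
# column independently, and the cutoff is an empirical upper percentile.


def correlation_matrix(data):
    """Pearson correlation matrix of the columns of `data` (a list of rows).

    Assumes every column has nonzero variance.
    """
    n_rows, n_cols = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n_rows for j in range(n_cols)]
    centered = [[row[j] - means[j] for j in range(n_cols)] for row in data]
    norms = [math.sqrt(sum(r[j] ** 2 for r in centered)) for j in range(n_cols)]
    corr = [[0.0] * n_cols for _ in range(n_cols)]
    for a in range(n_cols):
        for b in range(a, n_cols):
            dot = sum(r[a] * r[b] for r in centered)
            corr[a][b] = corr[b][a] = dot / (norms[a] * norms[b])
    return corr


def symmetric_eigenvalues(matrix, max_sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def null_eigenvalue_bounds(data, n_permutations=200, quantile=0.95, seed=0):
    """Per-rank upper percentile of eigenvalues under simulated independence.

    Shuffling each column keeps its marginal distribution but breaks the
    correlations between columns (the hypothetical "term independence" null).
    """
    rng = random.Random(seed)
    n_cols = len(data[0])
    columns = [[row[j] for row in data] for j in range(n_cols)]
    samples = [[] for _ in range(n_cols)]  # samples[i]: null draws of the i-th largest eigenvalue
    for _ in range(n_permutations):
        shuffled = []
        for col in columns:
            shuffled_col = col[:]
            rng.shuffle(shuffled_col)
            shuffled.append(shuffled_col)
        permuted = [list(row) for row in zip(*shuffled)]
        for i, value in enumerate(symmetric_eigenvalues(correlation_matrix(permuted))):
            samples[i].append(value)
    bounds = []
    for values in samples:
        values.sort()
        # Approximate empirical quantile of the null distribution at this rank.
        bounds.append(values[min(len(values) - 1, int(quantile * len(values)))])
    return bounds


def parallel_analysis_k(data, **kwargs):
    """Number of leading eigenvalues that exceed their null upper bound."""
    observed = symmetric_eigenvalues(correlation_matrix(data))
    bounds = null_eigenvalue_bounds(data, **kwargs)
    k = 0
    for obs, bound in zip(observed, bounds):
        if obs <= bound:
            break
        k += 1
    return k
```

On data built from two uncorrelated latent factors, each driving four noisy columns, this sketch should keep k = 2. The two large eigenvalues clear their null bounds easily, and the third falls well below its bound. A real LSI term-document matrix would normally call for a numerical library instead of the hand-rolled Jacobi loop.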