Document (#28686)

Author
Efron, M.
Title
Eigenvalue-based model selection during Latent Semantic Indexing
Source
Journal of the American Society for Information Science and Technology. 56(2005) no.9, p.969-988
Year
2005
Abstract
In this study amended parallel analysis (APA), a novel method for model selection in unsupervised learning problems such as information retrieval (IR), is described. At issue is the selection of k, the number of dimensions retained under latent semantic indexing (LSI). Amended parallel analysis is an elaboration of Horn's parallel analysis, which advocates retaining eigenvalues larger than those that we would expect under term independence. Amended parallel analysis operates by deriving confidence intervals on these "null" eigenvalues. The technique amounts to a series of nonparametric hypothesis tests on the correlation matrix eigenvalues. In the study, APA is tested along with four established dimensionality estimators on six standard IR test collections. These estimates are evaluated with regard to two IR performance metrics. Additionally, results from simulated data are reported. In both rounds of experimentation APA performs well, predicting the best values of k on 3 of 12 observations, with good predictions on several others, and never offering the worst estimate of optimal dimensionality.
Object
Latent Semantic Indexing
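
The abstract describes the idea behind parallel analysis: keep only those correlation-matrix eigenvalues that are larger than expected when the variables are independent. The following is a minimal sketch of that general idea, not the paper's APA procedure. It assumes a permutation-based null distribution and a simple upper-percentile cutoff; the function names, the 95th percentile, the number of null replicates, and the synthetic data are all illustrative choices that do not come from the record.

```python
import numpy as np


def correlation_eigenvalues(x):
    """Eigenvalues of the column correlation matrix, largest first."""
    corr = np.corrcoef(x, rowvar=False)
    return np.linalg.eigvalsh(corr)[::-1]


def parallel_analysis(x, n_null=200, percentile=95.0, seed=0):
    """Estimate k by comparing observed eigenvalues with 'null' eigenvalues.

    Assumption: the null is simulated by permuting every column
    independently, which keeps each column's distribution but destroys
    any dependence between columns.
    """
    rng = np.random.default_rng(seed)
    observed = correlation_eigenvalues(x)
    null = np.empty((n_null, x.shape[1]))
    for b in range(n_null):
        shuffled = np.column_stack([rng.permutation(col) for col in x.T])
        null[b] = correlation_eigenvalues(shuffled)
    # Upper bound of the null eigenvalues at each rank.
    threshold = np.percentile(null, percentile, axis=0)
    exceeds = observed > threshold
    # Retain leading eigenvalues up to the first one that fails the test.
    k = len(exceeds) if exceeds.all() else int(np.argmin(exceeds))
    return k, observed, threshold


# Synthetic data (illustrative only): two latent factors, four columns each.
rng = np.random.default_rng(42)
factors = rng.normal(size=(300, 2))
loadings = np.zeros((2, 8))
loadings[0, :4] = 1.0
loadings[1, 4:] = 1.0
data = factors @ loadings + 0.5 * rng.normal(size=(300, 8))

k, observed, threshold = parallel_analysis(data)
print("estimated k:", k)
```

With two strongly planted factors, the first two observed eigenvalues sit well above the null threshold and the rest fall below it. On real term-document data the columns would be terms, and the choice of null model and cutoff would matter far more than in this toy setup.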

Similar documents (author)

  1. Efron, M.: Shannon meets Shortz : a probabilistic model of crossword puzzle difficulty (2008) 6.09
    6.094361 = sum of:
      6.094361 = weight(author_txt:efron in 1620) [ClassicSimilarity], result of:
        6.094361 = fieldWeight in 1620, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.625 = fieldNorm(doc=1620)
    
  2. Efron, M.: Query expansion and dimensionality reduction : Notions of optimality in Rocchio relevance feedback and latent semantic indexing (2008) 6.09
    6.094361 = sum of:
      6.094361 = weight(author_txt:efron in 2020) [ClassicSimilarity], result of:
        6.094361 = fieldWeight in 2020, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.625 = fieldNorm(doc=2020)
    
  3. Efron, M.: Linear time series models for term weighting in information retrieval (2010) 6.09
    6.094361 = sum of:
      6.094361 = weight(author_txt:efron in 3688) [ClassicSimilarity], result of:
        6.094361 = fieldWeight in 3688, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.625 = fieldNorm(doc=3688)
    
  4. Efron, M.: Information search and retrieval in microblogs (2011) 6.09
    6.094361 = sum of:
      6.094361 = weight(author_txt:efron in 4455) [ClassicSimilarity], result of:
        6.094361 = fieldWeight in 4455, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.625 = fieldNorm(doc=4455)
    
  5. Efron, M.; Winget, M.: Query polyrepresentation for ranking retrieval systems without relevance judgments (2010) 4.88
    4.8754888 = sum of:
      4.8754888 = weight(author_txt:efron in 3469) [ClassicSimilarity], result of:
        4.8754888 = fieldWeight in 3469, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.5 = fieldNorm(doc=3469)
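
The score trees above can be reproduced by hand. The sketch below assumes the classic Lucene TF-IDF formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). That assumption is inferred from the numbers shown, not from any knowledge of how this catalog is configured. The function names are illustrative.

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as it appears in the trees above."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Entries 1-4: author term matched once, fieldNorm 0.625.
print(round(field_weight(1.0, 6, 44218, 0.625), 4))  # 6.0944
# Entry 5: two authors, so the shorter-field bonus drops to fieldNorm 0.5.
print(round(field_weight(1.0, 6, 44218, 0.5), 4))    # 4.8755
```

The only difference between the 6.09 and 4.88 scores is fieldNorm: the fifth record lists two authors, so its author field is longer and its length normalization is lower.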
    

Similar documents (content)

  1. Kumar, C.A.; Radvansky, M.; Annapurna, J.: Analysis of Vector Space Model, Latent Semantic Indexing and Formal Concept Analysis for information retrieval (2012) 0.10
    0.09677279 = sum of:
      0.09677279 = product of:
        0.40321997 = sum of:
          0.035499677 = weight(abstract_txt:model in 2710) [ClassicSimilarity], result of:
            0.035499677 = score(doc=2710,freq=2.0), product of:
              0.06717 = queryWeight, product of:
                1.0343238 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.016291311 = queryNorm
              0.5285049 = fieldWeight in 2710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.09375 = fieldNorm(doc=2710)
          0.032611083 = weight(abstract_txt:indexing in 2710) [ClassicSimilarity], result of:
            0.032611083 = score(doc=2710,freq=1.0), product of:
              0.07997346 = queryWeight, product of:
                1.1286045 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.016291311 = queryNorm
              0.40777382 = fieldWeight in 2710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.09375 = fieldNorm(doc=2710)
          0.050201595 = weight(abstract_txt:semantic in 2710) [ClassicSimilarity], result of:
            0.050201595 = score(doc=2710,freq=2.0), product of:
              0.084626056 = queryWeight, product of:
                1.1609697 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.016291311 = queryNorm
              0.5932168 = fieldWeight in 2710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.09375 = fieldNorm(doc=2710)
          0.054315485 = weight(abstract_txt:under in 2710) [ClassicSimilarity], result of:
            0.054315485 = score(doc=2710,freq=1.0), product of:
              0.11237029 = queryWeight, product of:
                1.3378105 = boost
                5.155857 = idf(docFreq=692, maxDocs=44218)
                0.016291311 = queryNorm
              0.4833616 = fieldWeight in 2710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.155857 = idf(docFreq=692, maxDocs=44218)
                0.09375 = fieldNorm(doc=2710)
          0.19193813 = weight(abstract_txt:latent in 2710) [ClassicSimilarity], result of:
            0.19193813 = score(doc=2710,freq=2.0), product of:
              0.20691878 = queryWeight, product of:
                1.8153852 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.016291311 = queryNorm
              0.92760134 = fieldWeight in 2710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.09375 = fieldNorm(doc=2710)
          0.038654003 = weight(abstract_txt:analysis in 2710) [ClassicSimilarity], result of:
            0.038654003 = score(doc=2710,freq=1.0), product of:
              0.112851866 = queryWeight, product of:
                1.8959994 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.016291311 = queryNorm
              0.34251985 = fieldWeight in 2710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.09375 = fieldNorm(doc=2710)
        0.24 = coord(6/25)
    
  2. Cribbin, T.: Discovering latent topical structure by second-order similarity analysis (2011) 0.10
    0.09504058 = sum of:
      0.09504058 = product of:
        0.39600244 = sum of:
          0.06544456 = weight(abstract_txt:deriving in 4470) [ClassicSimilarity], result of:
            0.06544456 = score(doc=4470,freq=1.0), product of:
              0.13233323 = queryWeight, product of:
                1.0265694 = boost
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.016291311 = queryNorm
              0.4945436 = fieldWeight in 4470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.0625 = fieldNorm(doc=4470)
          0.06601664 = weight(abstract_txt:independence in 4470) [ClassicSimilarity], result of:
            0.06601664 = score(doc=4470,freq=1.0), product of:
              0.13310331 = queryWeight, product of:
                1.029552 = boost
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.016291311 = queryNorm
              0.49598044 = fieldWeight in 4470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.0625 = fieldNorm(doc=4470)
          0.016734708 = weight(abstract_txt:model in 4470) [ClassicSimilarity], result of:
            0.016734708 = score(doc=4470,freq=1.0), product of:
              0.06717 = queryWeight, product of:
                1.0343238 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.016291311 = queryNorm
              0.24913962 = fieldWeight in 4470, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.0625 = fieldNorm(doc=4470)
          0.033467732 = weight(abstract_txt:semantic in 4470) [ClassicSimilarity], result of:
            0.033467732 = score(doc=4470,freq=2.0), product of:
              0.084626056 = queryWeight, product of:
                1.1609697 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.016291311 = queryNorm
              0.39547786 = fieldWeight in 4470, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0625 = fieldNorm(doc=4470)
          0.15671682 = weight(abstract_txt:latent in 4470) [ClassicSimilarity], result of:
            0.15671682 = score(doc=4470,freq=3.0), product of:
              0.20691878 = queryWeight, product of:
                1.8153852 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.016291311 = queryNorm
              0.7573833 = fieldWeight in 4470, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0625 = fieldNorm(doc=4470)
          0.05762199 = weight(abstract_txt:analysis in 4470) [ClassicSimilarity], result of:
            0.05762199 = score(doc=4470,freq=5.0), product of:
              0.112851866 = queryWeight, product of:
                1.8959994 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.016291311 = queryNorm
              0.5105985 = fieldWeight in 4470, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.0625 = fieldNorm(doc=4470)
        0.24 = coord(6/25)
    
  3. He, X.; Cai, D.; Liu, H.; Ma, W.Y.: Locality preserving indexing for document representation (2004) 0.09
    0.08693507 = sum of:
      0.08693507 = product of:
        0.72445893 = sum of:
          0.15373012 = weight(abstract_txt:indexing in 4079) [ClassicSimilarity], result of:
            0.15373012 = score(doc=4079,freq=2.0), product of:
              0.07997346 = queryWeight, product of:
                1.1286045 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.016291311 = queryNorm
              1.9222642 = fieldWeight in 4079, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.3125 = fieldNorm(doc=4079)
          0.11832631 = weight(abstract_txt:semantic in 4079) [ClassicSimilarity], result of:
            0.11832631 = score(doc=4079,freq=1.0), product of:
              0.084626056 = queryWeight, product of:
                1.1609697 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.016291311 = queryNorm
              1.3982254 = fieldWeight in 4079, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.3125 = fieldNorm(doc=4079)
          0.4524025 = weight(abstract_txt:latent in 4079) [ClassicSimilarity], result of:
            0.4524025 = score(doc=4079,freq=1.0), product of:
              0.20691878 = queryWeight, product of:
                1.8153852 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.016291311 = queryNorm
              2.1863773 = fieldWeight in 4079, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.3125 = fieldNorm(doc=4079)
        0.12 = coord(3/25)
    
  4. Cheung, C.M.K.; Lee, M.K.O.: The structure of Web-based information systems satisfaction : testing of competing models (2008) 0.08
    0.08032217 = sum of:
      0.08032217 = product of:
        0.50201356 = sum of:
          0.029583065 = weight(abstract_txt:model in 2005) [ClassicSimilarity], result of:
            0.029583065 = score(doc=2005,freq=2.0), product of:
              0.06717 = queryWeight, product of:
                1.0343238 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.016291311 = queryNorm
              0.44042078 = fieldWeight in 2005, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.078125 = fieldNorm(doc=2005)
          0.10524229 = weight(abstract_txt:retained in 2005) [ClassicSimilarity], result of:
            0.10524229 = score(doc=2005,freq=1.0), product of:
              0.15653332 = queryWeight, product of:
                1.1164962 = boost
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.016291311 = queryNorm
              0.6723316 = fieldWeight in 2005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.078125 = fieldNorm(doc=2005)
          0.15994844 = weight(abstract_txt:latent in 2005) [ClassicSimilarity], result of:
            0.15994844 = score(doc=2005,freq=2.0), product of:
              0.20691878 = queryWeight, product of:
                1.8153852 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.016291311 = queryNorm
              0.7730011 = fieldWeight in 2005, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.078125 = fieldNorm(doc=2005)
          0.20723975 = weight(abstract_txt:dimensionality in 2005) [ClassicSimilarity], result of:
            0.20723975 = score(doc=2005,freq=1.0), product of:
              0.3098408 = queryWeight, product of:
                2.2214582 = boost
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.016291311 = queryNorm
              0.6688588 = fieldWeight in 2005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.078125 = fieldNorm(doc=2005)
        0.16 = coord(4/25)
    
  5. Zhan, J.; Loh, H.T.: Using latent semantic indexing to improve the accuracy of document clustering (2007) 0.08
    0.07960325 = sum of:
      0.07960325 = product of:
        0.39801624 = sum of:
          0.020918386 = weight(abstract_txt:model in 264) [ClassicSimilarity], result of:
            0.020918386 = score(doc=264,freq=1.0), product of:
              0.06717 = queryWeight, product of:
                1.0343238 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.016291311 = queryNorm
              0.31142452 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.078125 = fieldNorm(doc=264)
          0.027175901 = weight(abstract_txt:indexing in 264) [ClassicSimilarity], result of:
            0.027175901 = score(doc=264,freq=1.0), product of:
              0.07997346 = queryWeight, product of:
                1.1286045 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.016291311 = queryNorm
              0.3398115 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.078125 = fieldNorm(doc=264)
          0.029581577 = weight(abstract_txt:semantic in 264) [ClassicSimilarity], result of:
            0.029581577 = score(doc=264,freq=1.0), product of:
              0.084626056 = queryWeight, product of:
                1.1609697 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.016291311 = queryNorm
              0.34955636 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.078125 = fieldNorm(doc=264)
          0.113100626 = weight(abstract_txt:latent in 264) [ClassicSimilarity], result of:
            0.113100626 = score(doc=264,freq=1.0), product of:
              0.20691878 = queryWeight, product of:
                1.8153852 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.016291311 = queryNorm
              0.5465943 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.078125 = fieldNorm(doc=264)
          0.20723975 = weight(abstract_txt:dimensionality in 264) [ClassicSimilarity], result of:
            0.20723975 = score(doc=264,freq=1.0), product of:
              0.3098408 = queryWeight, product of:
                2.2214582 = boost
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.016291311 = queryNorm
              0.6688588 = fieldWeight in 264, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.078125 = fieldNorm(doc=264)
        0.2 = coord(5/25)
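
The content-similarity scores combine per-term weights with a coordination factor. As a rough check, the sketch below recomputes two of the totals above from the listed boost, docFreq, freq, fieldNorm, and queryNorm values. It assumes the same classic TF-IDF formulas the trees appear to use; the function and constant names are illustrative.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.016291311
QUERY_TERMS = 25  # denominator of the coord(n/25) factors


def idf(doc_freq, max_docs=MAX_DOCS):
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(freq, doc_freq, boost, field_norm):
    """score = queryWeight * fieldWeight for one matching term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(matches, field_norm):
    """Sum the term scores, then scale by the share of query terms matched."""
    total = sum(term_score(freq, doc_freq, boost, field_norm)
                for freq, doc_freq, boost in matches)
    coord = len(matches) / QUERY_TERMS
    return coord * total


# (freq, docFreq, boost) for entry 5, Zhan and Loh (doc 264).
zhan_loh = [
    (1.0, 2231, 1.0343238),  # model
    (1.0, 1551, 1.1286045),  # indexing
    (1.0, 1369, 1.1609697),  # semantic
    (1.0, 109, 1.8153852),   # latent
    (1.0, 22, 2.2214582),    # dimensionality
]
print(round(document_score(zhan_loh, field_norm=0.078125), 4))  # 0.0796

# (freq, docFreq, boost) for entry 3, He et al. (doc 4079).
he_et_al = [
    (2.0, 1551, 1.1286045),  # indexing
    (1.0, 1369, 1.1609697),  # semantic
    (1.0, 109, 1.8153852),   # latent
]
print(round(document_score(he_et_al, field_norm=0.3125), 4))  # 0.0869
```

Entry 3 matches only 3 of the 25 query terms, but its short abstract gives a high fieldNorm (0.3125), which is why it still ranks close to records that match five or six terms.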