Document (#28687)

Author
Efron, M.
Title
Eigenvalue-based model selection during Latent Semantic Indexing
Source
Journal of the American Society for Information Science and Technology. 56(2005) no.9, pp.969-988
Year
2005
Abstract
This study describes amended parallel analysis (APA), a novel method for model selection in unsupervised learning problems such as information retrieval (IR). At issue is the selection of k, the number of dimensions retained under latent semantic indexing (LSI). Amended parallel analysis is an elaboration of Horn's parallel analysis, which advocates retaining eigenvalues larger than those we would expect under term independence. Amended parallel analysis operates by deriving confidence intervals on these "null" eigenvalues. The technique amounts to a series of nonparametric hypothesis tests on the correlation matrix eigenvalues. In the study, APA is tested along with four established dimensionality estimators on six standard IR test collections. These estimates are evaluated with regard to two IR performance metrics. Additionally, results from simulated data are reported. In both rounds of experimentation APA performs well, predicting the best value of k on 3 of 12 observations, giving good predictions on several others, and never offering the worst estimate of optimal dimensionality.
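The abstract describes the selection rule only at a high level. The sketch below illustrates the general idea of Horn-style parallel analysis on correlation matrix eigenvalues. It is not Efron's exact APA procedure. It assumes that the "term independence" null is simulated by shuffling each term's column independently across documents, which keeps each term's distribution but breaks inter-term correlation. It also assumes that a per-position 95th percentile of the null eigenvalues serves as a rough stand-in for APA's confidence intervals, and that leading dimensions are retained until the first eigenvalue fails the test. The function names and the parameters `n_null` and `percentile` are hypothetical.

```python
import numpy as np


def correlation_eigenvalues(x: np.ndarray) -> np.ndarray:
    """Eigenvalues of the term-term correlation matrix, largest first.

    x is a (documents x terms) matrix; each column is one term.
    Columns with zero variance would yield NaN correlations, so they
    are assumed to be removed beforehand.
    """
    corr = np.corrcoef(x, rowvar=False)
    # eigvalsh is for symmetric matrices and returns ascending order.
    return np.linalg.eigvalsh(corr)[::-1]


def parallel_analysis_k(x: np.ndarray, n_null: int = 200,
                        percentile: float = 95.0, seed: int = 0):
    """Choose k by comparing observed eigenvalues with a permutation null.

    Hypothetical sketch: each null matrix shuffles every term column
    independently (an assumed model of "term independence").
    """
    rng = np.random.default_rng(seed)
    observed = correlation_eigenvalues(x)
    null = np.empty((n_null, x.shape[1]))
    for i in range(n_null):
        shuffled = np.column_stack([rng.permutation(col) for col in x.T])
        null[i] = correlation_eigenvalues(shuffled)
    # Upper percentile of the i-th largest null eigenvalue, for each i.
    threshold = np.percentile(null, percentile, axis=0)
    exceeds = observed > threshold
    # Keep leading dimensions until the first eigenvalue that fails.
    k = len(exceeds) if exceeds.all() else int(np.argmin(exceeds))
    return k, observed, threshold


# Usage on synthetic data with two latent "topics" spread over 12 terms.
rng = np.random.default_rng(42)
n_docs, n_terms = 300, 12
factors = rng.normal(size=(n_docs, 2))
loadings = np.zeros((2, n_terms))
loadings[0, :6] = 1.0   # terms 0-5 load on topic 1
loadings[1, 6:] = 1.0   # terms 6-11 load on topic 2
x = factors @ loadings + 0.5 * rng.normal(size=(n_docs, n_terms))

k, observed, threshold = parallel_analysis_k(x)
print("retained k =", k)
```

Shuffling columns rather than drawing Gaussian noise is one common way to build the null: it preserves each term's marginal distribution, which matters for sparse, skewed term-document data. The percentile cutoff makes each comparison behave like a one-sided test against the null eigenvalue at the same rank.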
Object
Latent Semantic Indexing

Similar documents (author)

  1. Efron, M.: Shannon meets Shortz : a probabilistic model of crossword puzzle difficulty (2008)
  2. Efron, M.: Query expansion and dimensionality reduction : Notions of optimality in Rocchio relevance feedback and latent semantic indexing (2008)
  3. Efron, M.: Linear time series models for term weighting in information retrieval (2010)
  4. Efron, M.: Information search and retrieval in microblogs (2011)
  5. Efron, M.; Winget, M.: Query polyrepresentation for ranking retrieval systems without relevance judgments (2010)

Similar documents (content)

  1. Kumar, C.A.; Radvansky, M.; Annapurna, J.: Analysis of Vector Space Model, Latent Semantic Indexing and Formal Concept Analysis for information retrieval (2012)
  2. Cribbin, T.: Discovering latent topical structure by second-order similarity analysis (2011)
  3. He, X.; Cai, D.; Liu, H.; Ma, W.Y.: Locality preserving indexing for document representation (2004)
  4. Cheung, C.M.K.; Lee, M.K.O.: The structure of Web-based information systems satisfaction : testing of competing models (2008)
  5. Zhan, J.; Loh, H.T.: Using latent semantic indexing to improve the accuracy of document clustering (2007)