Document (#34019)

Author
Efron, M.
Title
Query expansion and dimensionality reduction : Notions of optimality in Rocchio relevance feedback and latent semantic indexing
Source
Information Processing and Management. 44(2008) no.1, pp.163-180
Year
2008
Abstract
Rocchio relevance feedback and latent semantic indexing (LSI) are well-known extensions of the vector space model for information retrieval (IR). This paper analyzes the statistical relationship between these extensions. The analysis focuses on each method's basis in least-squares optimization. Noting that LSI and Rocchio relevance feedback both alter the vector space model in a way that is in some sense least-squares optimal, we ask: what is the relationship between LSI's and Rocchio's notions of optimality? What does this relationship imply for IR? Using an analytical approach, we argue that Rocchio relevance feedback is optimal if we understand retrieval as a simplified classification problem. On the other hand, LSI's motivation comes to the fore if we understand it as a biased regression technique, where projection onto a low-dimensional orthogonal subspace of the documents reduces model variance.
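A minimal sketch, assuming Python with NumPy, of the two textbook constructions the abstract compares. It uses a made-up 4×4 term-document matrix and hypothetical helpers (`rocchio`, `sse`, `lsi`). The Rocchio weights alpha=1.0, beta=0.75, gamma=0.15 and the LSI rank k=2 are conventional illustrative choices, not values from Efron's paper. The sketch checks only two standard least-squares properties: the centroid of the relevant documents minimizes the sum of squared distances to them, and the rank-k truncated SVD used by LSI minimizes the Frobenius-norm reconstruction error (Eckart-Young). The paper's classification and bias-variance arguments are not reproduced here.

```python
import numpy as np

# Hypothetical toy data: rows = terms, columns = documents (values are made up).
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 1.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 1.0, 1.0, 2.0],
])

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Textbook Rocchio update; documents are the rows of 2-D arrays.
    The default weights are conventional values assumed for illustration."""
    q = alpha * query
    if len(relevant):
        q = q + beta * relevant.mean(axis=0)      # move toward relevant centroid
    if len(nonrelevant):
        q = q - gamma * nonrelevant.mean(axis=0)  # move away from non-relevant centroid
    return q

def sse(points, c):
    """Sum of squared Euclidean distances from each row of points to c."""
    return float(((points - c) ** 2).sum())

def lsi(term_doc, k):
    """Rank-k truncated SVD of the term-document matrix (LSI)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

docs = A.T                                   # one row per document
query = np.array([1.0, 0.0, 0.0, 0.0])
rel, nonrel = docs[[0, 1]], docs[[2]]        # hypothetical relevance judgments
q_new = rocchio(query, rel, nonrel)

# The relevant-set centroid is the least-squares point: nudging it in any
# direction increases the sum of squared distances.
centroid = rel.mean(axis=0)
rng = np.random.default_rng(0)
for _ in range(5):
    assert sse(rel, centroid) < sse(rel, centroid + 0.1 * rng.standard_normal(4))

# LSI: the rank-k reconstruction error equals the norm of the discarded
# singular values (Eckart-Young), so the truncation is least-squares optimal.
k = 2
U_k, s_k, Vt_k = lsi(A, k)
A_k = U_k @ np.diag(s_k) @ Vt_k
s_all = np.linalg.svd(A, compute_uv=False)
assert np.isclose(np.linalg.norm(A - A_k), np.sqrt((s_all[k:] ** 2).sum()))

# Fold the query into the k-dimensional space and rank documents by cosine.
q_k = np.diag(1.0 / s_k) @ U_k.T @ query
doc_k = Vt_k.T                               # one row per document in LSI space
cos = doc_k @ q_k / (np.linalg.norm(doc_k, axis=1) * np.linalg.norm(q_k))
print(np.argsort(-cos))                      # document indices, best match first
```

Some Rocchio implementations clip negative query weights to zero; the sketch keeps them so that the update stays a plain linear combination of the query and the two centroids.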
Theme
Retrieval algorithms
Object
Rocchio algorithm
Latent semantic indexing

Similar documents (author)

  1. Efron, M.: Eigenvalue-based model selection during Latent Semantic Indexing (2005) 6.08
  2. Efron, M.: Shannon meets Shortz : a probabilistic model of crossword puzzle difficulty (2008) 6.08
  3. Efron, M.: Linear time series models for term weighting in information retrieval (2010) 6.08
  4. Efron, M.: Information search and retrieval in microblogs (2011) 6.08
  5. Efron, M.; Winget, M.: Query polyrepresentation for ranking retrieval systems without relevance judgments (2010) 4.87

Similar documents (content)

  1. Tsai, C.-F.; Hu, Y.-H.; Chen, Z.-Y.: Factors affecting rocchio-based pseudorelevance feedback in image retrieval (2015) 0.19
  2. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.18
  3. Pan, M.; Huang, J.X.; He, T.; Mao, Z.; Ying, Z.; Tu, X.: A simple kernel co-occurrence-based enhancement for pseudo-relevance feedback (2020) 0.16
  4. Kumar, C.A.; Radvansky, M.; Annapurna, J.: Analysis of Vector Space Model, Latent Semantic Indexing and Formal Concept Analysis for information retrieval (2012) 0.14
  5. Layfield, C.; Azzopardi, J.; Staff, C.: Experiments with document retrieval from small text collections using Latent Semantic Analysis or term similarity with query coordination and automatic relevance feedback (2017) 0.13