Document (#34021)

Efron, M.
Query expansion and dimensionality reduction : Notions of optimality in Rocchio relevance feedback and latent semantic indexing
Information processing and management. 44(2008) no.1, pp. 163-180
Rocchio relevance feedback and latent semantic indexing (LSI) are well-known extensions of the vector space model for information retrieval (IR). This paper analyzes the statistical relationship between these extensions. The analysis focuses on each method's basis in least-squares optimization. Noting that LSI and Rocchio relevance feedback both alter the vector space model in a way that is in some sense least-squares optimal, we ask: what is the relationship between LSI's and Rocchio's notions of optimality? What does this relationship imply for IR? Using an analytical approach, we argue that Rocchio relevance feedback is optimal if we understand retrieval as a simplified classification problem. On the other hand, LSI's motivation comes to the fore if we understand it as a biased regression technique, where projection onto a low-dimensional orthogonal subspace of the documents reduces model variance.
Latent semantic indexing
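
The abstract contrasts two least-squares views of the vector space model. As a rough illustration only, the sketch below uses the common textbook forms of both techniques, not the paper's own derivation: a Rocchio query update, and a rank-k LSI approximation via truncated SVD. The function names, the weights alpha, beta and gamma, and the toy matrix are all assumptions made for this example.

```python
import numpy as np

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Textbook Rocchio update (weights are illustrative assumptions):
    move the query toward the centroid of relevant documents and away
    from the centroid of non-relevant ones."""
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q = q - gamma * np.mean(nonrelevant, axis=0)
    return q

def lsi_approximation(term_doc, k):
    """Rank-k LSI approximation of a term-document matrix via truncated SVD.
    Among all rank-k matrices, this one minimizes the squared (Frobenius) error."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

# Toy term-document matrix (rows = terms, columns = documents); values are made up.
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])

query = np.array([1.0, 0.0, 0.0, 0.0])
# Documents are passed as rows: the first two are judged relevant, the last one not.
new_query = rocchio(query, relevant=A[:, :2].T, nonrelevant=A[:, 3:].T)
A_k = lsi_approximation(A, k=2)
```

Rocchio changes the query and leaves the document space untouched, whereas LSI replaces the document space with a lower-rank one. That asymmetry is the point of comparison the abstract raises.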

Similar documents (author)

  1. Efron, M.: Eigenvalue-based model selection during Latent Semantic Indexing (2005) 6.09
    6.094361 = sum of:
      6.094361 = weight(author_txt:efron in 3685) [ClassicSimilarity], result of:
        6.094361 = fieldWeight in 3685, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.625 = fieldNorm(doc=3685)
  2. Efron, M.: Shannon meets Shortz : a probabilistic model of crossword puzzle difficulty (2008) 6.09
    6.094361 = sum of:
      6.094361 = weight(author_txt:efron in 1620) [ClassicSimilarity], result of:
        6.094361 = fieldWeight in 1620, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.625 = fieldNorm(doc=1620)
  3. Efron, M.: Linear time series models for term weighting in information retrieval (2010) 6.09
    6.094361 = sum of:
      6.094361 = weight(author_txt:efron in 3688) [ClassicSimilarity], result of:
        6.094361 = fieldWeight in 3688, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.625 = fieldNorm(doc=3688)
  4. Efron, M.: Information search and retrieval in microblogs (2011) 6.09
    6.094361 = sum of:
      6.094361 = weight(author_txt:efron in 4455) [ClassicSimilarity], result of:
        6.094361 = fieldWeight in 4455, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.625 = fieldNorm(doc=4455)
  5. Efron, M.; Winget, M.: Query polyrepresentation for ranking retrieval systems without relevance judgments (2010) 4.88
    4.8754888 = sum of:
      4.8754888 = weight(author_txt:efron in 3469) [ClassicSimilarity], result of:
        4.8754888 = fieldWeight in 3469, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.5 = fieldNorm(doc=3469)
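
The author scores above follow one pattern: tf times idf times fieldNorm. The sketch below reproduces them, assuming tf is the square root of termFreq and idf is 1 + ln(maxDocs / (docFreq + 1)); that idf form matches the values printed in the listing. The function names are hypothetical helpers, not Lucene API calls.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed form: idf = 1 + ln(maxDocs / (docFreq + 1)).
    # It reproduces the 9.7509775 reported for docFreq=6, maxDocs=44218.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(term_freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq)
    return math.sqrt(term_freq) * classic_idf(doc_freq, max_docs) * field_norm

idf = classic_idf(doc_freq=6, max_docs=44218)
score_hits_1_to_4 = field_weight(1.0, 6, 44218, field_norm=0.625)
score_hit_5 = field_weight(1.0, 6, 44218, field_norm=0.5)
print(idf, score_hits_1_to_4, score_hit_5)
```

Only fieldNorm differs between the hits, so the lower score of hit 5 comes entirely from its fieldNorm of 0.5 rather than 0.625.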

Similar documents (content)

  1. Tsai, C.-F.; Hu, Y.-H.; Chen, Z.-Y.: Factors affecting rocchio-based pseudorelevance feedback in image retrieval (2015) 0.19
    0.18712366 = sum of:
      0.18712366 = product of:
        1.1695229 = sum of:
          0.09489738 = weight(abstract_txt:optimal in 1607) [ClassicSimilarity], result of:
            0.09489738 = score(doc=1607,freq=2.0), product of:
              0.160414 = queryWeight, product of:
                1.7012537 = boost
                6.6929407 = idf(docFreq=148, maxDocs=44218)
                0.014088223 = queryNorm
              0.59157795 = fieldWeight in 1607, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6929407 = idf(docFreq=148, maxDocs=44218)
                0.0625 = fieldNorm(doc=1607)
          0.053696677 = weight(abstract_txt:relevance in 1607) [ClassicSimilarity], result of:
            0.053696677 = score(doc=1607,freq=1.0), product of:
              0.17420383 = queryWeight, product of:
                2.5072162 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.014088223 = queryNorm
              0.3082405 = fieldWeight in 1607, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.0625 = fieldNorm(doc=1607)
          0.16284868 = weight(abstract_txt:feedback in 1607) [ClassicSimilarity], result of:
            0.16284868 = score(doc=1607,freq=3.0), product of:
              0.25307068 = queryWeight, product of:
                3.0219262 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.014088223 = queryNorm
              0.6434909 = fieldWeight in 1607, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.0625 = fieldNorm(doc=1607)
          0.85808015 = weight(abstract_txt:rocchio in 1607) [ClassicSimilarity], result of:
            0.85808015 = score(doc=1607,freq=5.0), product of:
              0.6463305 = queryWeight, product of:
                4.8293676 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.014088223 = queryNorm
              1.3276182 = fieldWeight in 1607, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.0625 = fieldNorm(doc=1607)
        0.16 = coord(4/25)
  2. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.18
    0.18255211 = sum of:
      0.18255211 = product of:
        0.6519718 = sum of:
          0.032557983 = weight(abstract_txt:indexing in 690) [ClassicSimilarity], result of:
            0.032557983 = score(doc=690,freq=2.0), product of:
              0.06774923 = queryWeight, product of:
                1.1056054 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.014088223 = queryNorm
              0.48056605 = fieldWeight in 690, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
          0.04340507 = weight(abstract_txt:semantic in 690) [ClassicSimilarity], result of:
            0.04340507 = score(doc=690,freq=3.0), product of:
              0.07169068 = queryWeight, product of:
                1.1373111 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.014088223 = queryNorm
              0.6054493 = fieldWeight in 690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
          0.23545566 = weight(abstract_txt:subspace in 690) [ClassicSimilarity], result of:
            0.23545566 = score(doc=690,freq=3.0), product of:
              0.17567034 = queryWeight, product of:
                1.2588737 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.014088223 = queryNorm
              1.3403268 = fieldWeight in 690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
          0.043868612 = weight(abstract_txt:space in 690) [ClassicSimilarity], result of:
            0.043868612 = score(doc=690,freq=1.0), product of:
              0.10413067 = queryWeight, product of:
                1.3706838 = boost
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.014088223 = queryNorm
              0.42128426 = fieldWeight in 690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
          0.05316281 = weight(abstract_txt:model in 690) [ClassicSimilarity], result of:
            0.05316281 = score(doc=690,freq=4.0), product of:
              0.085354246 = queryWeight, product of:
                1.519869 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.014088223 = queryNorm
              0.62284905 = fieldWeight in 690, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
          0.07756906 = weight(abstract_txt:vector in 690) [ClassicSimilarity], result of:
            0.07756906 = score(doc=690,freq=1.0), product of:
              0.1522656 = queryWeight, product of:
                1.657482 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.014088223 = queryNorm
              0.5094326 = fieldWeight in 690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
          0.16595264 = weight(abstract_txt:latent in 690) [ClassicSimilarity], result of:
            0.16595264 = score(doc=690,freq=3.0), product of:
              0.17529052 = queryWeight, product of:
                1.7783906 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.014088223 = queryNorm
              0.9467291 = fieldWeight in 690, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.078125 = fieldNorm(doc=690)
        0.28 = coord(7/25)
  3. Pan, M.; Huang, J.X.; He, T.; Mao, Z.; Ying, Z.; Tu, X.: A simple kernel co-occurrence-based enhancement for pseudo-relevance feedback (2020) 0.16
    0.15741797 = sum of:
      0.15741797 = product of:
        0.7870898 = sum of:
          0.0443596 = weight(abstract_txt:least in 5678) [ClassicSimilarity], result of:
            0.0443596 = score(doc=5678,freq=1.0), product of:
              0.12173286 = queryWeight, product of:
                1.4820125 = boost
                5.830419 = idf(docFreq=352, maxDocs=44218)
                0.014088223 = queryNorm
              0.3644012 = fieldWeight in 5678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.830419 = idf(docFreq=352, maxDocs=44218)
                0.0625 = fieldNorm(doc=5678)
          0.030073427 = weight(abstract_txt:model in 5678) [ClassicSimilarity], result of:
            0.030073427 = score(doc=5678,freq=2.0), product of:
              0.085354246 = queryWeight, product of:
                1.519869 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.014088223 = queryNorm
              0.35233662 = fieldWeight in 5678, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.0625 = fieldNorm(doc=5678)
          0.07593857 = weight(abstract_txt:relevance in 5678) [ClassicSimilarity], result of:
            0.07593857 = score(doc=5678,freq=2.0), product of:
              0.17420383 = queryWeight, product of:
                2.5072162 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.014088223 = queryNorm
              0.43591788 = fieldWeight in 5678, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.0625 = fieldNorm(doc=5678)
          0.09402073 = weight(abstract_txt:feedback in 5678) [ClassicSimilarity], result of:
            0.09402073 = score(doc=5678,freq=1.0), product of:
              0.25307068 = queryWeight, product of:
                3.0219262 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.014088223 = queryNorm
              0.37151965 = fieldWeight in 5678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.0625 = fieldNorm(doc=5678)
          0.5426975 = weight(abstract_txt:rocchio in 5678) [ClassicSimilarity], result of:
            0.5426975 = score(doc=5678,freq=2.0), product of:
              0.6463305 = queryWeight, product of:
                4.8293676 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.014088223 = queryNorm
              0.83965945 = fieldWeight in 5678, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.0625 = fieldNorm(doc=5678)
        0.2 = coord(5/25)
  4. Kumar, C.A.; Radvansky, M.; Annapurna, J.: Analysis of Vector Space Model, Latent Semantic Indexing and Formal Concept Analysis for information retrieval (2012) 0.14
    0.13576078 = sum of:
      0.13576078 = product of:
        0.48485994 = sum of:
          0.027626364 = weight(abstract_txt:indexing in 2710) [ClassicSimilarity], result of:
            0.027626364 = score(doc=2710,freq=1.0), product of:
              0.06774923 = queryWeight, product of:
                1.1056054 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.014088223 = queryNorm
              0.40777382 = fieldWeight in 2710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.09375 = fieldNorm(doc=2710)
          0.04252811 = weight(abstract_txt:semantic in 2710) [ClassicSimilarity], result of:
            0.04252811 = score(doc=2710,freq=2.0), product of:
              0.07169068 = queryWeight, product of:
                1.1373111 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.014088223 = queryNorm
              0.5932168 = fieldWeight in 2710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.09375 = fieldNorm(doc=2710)
          0.05264233 = weight(abstract_txt:space in 2710) [ClassicSimilarity], result of:
            0.05264233 = score(doc=2710,freq=1.0), product of:
              0.10413067 = queryWeight, product of:
                1.3706838 = boost
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.014088223 = queryNorm
              0.5055411 = fieldWeight in 2710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.09375 = fieldNorm(doc=2710)
          0.045110136 = weight(abstract_txt:model in 2710) [ClassicSimilarity], result of:
            0.045110136 = score(doc=2710,freq=2.0), product of:
              0.085354246 = queryWeight, product of:
                1.519869 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.014088223 = queryNorm
              0.5285049 = fieldWeight in 2710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.09375 = fieldNorm(doc=2710)
          0.093082875 = weight(abstract_txt:vector in 2710) [ClassicSimilarity], result of:
            0.093082875 = score(doc=2710,freq=1.0), product of:
              0.1522656 = queryWeight, product of:
                1.657482 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.014088223 = queryNorm
              0.6113192 = fieldWeight in 2710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.09375 = fieldNorm(doc=2710)
          0.16259973 = weight(abstract_txt:latent in 2710) [ClassicSimilarity], result of:
            0.16259973 = score(doc=2710,freq=2.0), product of:
              0.17529052 = queryWeight, product of:
                1.7783906 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.014088223 = queryNorm
              0.92760134 = fieldWeight in 2710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.09375 = fieldNorm(doc=2710)
          0.06127041 = weight(abstract_txt:relationship in 2710) [ClassicSimilarity], result of:
            0.06127041 = score(doc=2710,freq=1.0), product of:
              0.13189232 = queryWeight, product of:
                1.8893105 = boost
                4.9551864 = idf(docFreq=846, maxDocs=44218)
                0.014088223 = queryNorm
              0.4645487 = fieldWeight in 2710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9551864 = idf(docFreq=846, maxDocs=44218)
                0.09375 = fieldNorm(doc=2710)
        0.28 = coord(7/25)
  5. Layfield, C.; Azzopardi, J.; Staff, C.: Experiments with document retrieval from small text collections using Latent Semantic Analysis or term similarity with query coordination and automatic relevance feedback (2017) 0.13
    0.13045743 = sum of:
      0.13045743 = product of:
        0.40767947 = sum of:
          0.016115379 = weight(abstract_txt:indexing in 3478) [ClassicSimilarity], result of:
            0.016115379 = score(doc=3478,freq=1.0), product of:
              0.06774923 = queryWeight, product of:
                1.1056054 = boost
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.014088223 = queryNorm
              0.23786807 = fieldWeight in 3478, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3495874 = idf(docFreq=1551, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3478)
          0.030383551 = weight(abstract_txt:semantic in 3478) [ClassicSimilarity], result of:
            0.030383551 = score(doc=3478,freq=3.0), product of:
              0.07169068 = queryWeight, product of:
                1.1373111 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.014088223 = queryNorm
              0.42381454 = fieldWeight in 3478, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3478)
          0.03070803 = weight(abstract_txt:space in 3478) [ClassicSimilarity], result of:
            0.03070803 = score(doc=3478,freq=1.0), product of:
              0.10413067 = queryWeight, product of:
                1.3706838 = boost
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.014088223 = queryNorm
              0.294899 = fieldWeight in 3478, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3924384 = idf(docFreq=546, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3478)
          0.026314247 = weight(abstract_txt:model in 3478) [ClassicSimilarity], result of:
            0.026314247 = score(doc=3478,freq=2.0), product of:
              0.085354246 = queryWeight, product of:
                1.519869 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.014088223 = queryNorm
              0.30829453 = fieldWeight in 3478, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3478)
          0.054298345 = weight(abstract_txt:vector in 3478) [ClassicSimilarity], result of:
            0.054298345 = score(doc=3478,freq=1.0), product of:
              0.1522656 = queryWeight, product of:
                1.657482 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.014088223 = queryNorm
              0.35660285 = fieldWeight in 3478, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3478)
          0.067068964 = weight(abstract_txt:latent in 3478) [ClassicSimilarity], result of:
            0.067068964 = score(doc=3478,freq=1.0), product of:
              0.17529052 = queryWeight, product of:
                1.7783906 = boost
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.014088223 = queryNorm
              0.382616 = fieldWeight in 3478, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.996407 = idf(docFreq=109, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3478)
          0.066446245 = weight(abstract_txt:relevance in 3478) [ClassicSimilarity], result of:
            0.066446245 = score(doc=3478,freq=2.0), product of:
              0.17420383 = queryWeight, product of:
                2.5072162 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.014088223 = queryNorm
              0.38142815 = fieldWeight in 3478, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3478)
          0.11634472 = weight(abstract_txt:feedback in 3478) [ClassicSimilarity], result of:
            0.11634472 = score(doc=3478,freq=2.0), product of:
              0.25307068 = queryWeight, product of:
                3.0219262 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.014088223 = queryNorm
              0.45973212 = fieldWeight in 3478, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3478)
        0.32 = coord(8/25)
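
The content scores combine more factors than the author scores. Each matching term contributes queryWeight (boost times idf times queryNorm) multiplied by fieldWeight (tf times idf times fieldNorm), and the sum is scaled by coord, the fraction of query terms matched. As a check, the sketch below recomputes the total for hit 1 (doc 1607) from the numbers in its listing. The helper name and data layout are assumptions made for this example, and tf is assumed to be the square root of termFreq.

```python
import math

QUERY_NORM = 0.014088223  # queryNorm as reported in the listing

def term_score(boost, idf, term_freq, field_norm, query_norm=QUERY_NORM):
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf * query_norm
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq)
    field_weight = math.sqrt(term_freq) * idf * field_norm
    return query_weight * field_weight

# (boost, idf, termFreq, fieldNorm) for the four matching terms of hit 1
hit1_terms = {
    "optimal":   (1.7012537, 6.6929407, 2.0, 0.0625),
    "relevance": (2.5072162, 4.931848, 1.0, 0.0625),
    "feedback":  (3.0219262, 5.9443145, 3.0, 0.0625),
    "rocchio":   (4.8293676, 9.499662, 5.0, 0.0625),
}

coord = 4 / 25  # 4 of the 25 query terms matched
total = coord * sum(term_score(*values) for values in hit1_terms.values())
print(total)
```

The term "rocchio" dominates the sum because it is rare in the collection (docFreq=8) and idf enters the product twice, once in queryWeight and once in fieldWeight.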