Document (#34022)

Author
Efron, M.
Title
Query expansion and dimensionality reduction : Notions of optimality in Rocchio relevance feedback and latent semantic indexing
Source
Information processing and management. 44(2008) no.1, pp. 163-180
Year
2008
Abstract
Rocchio relevance feedback and latent semantic indexing (LSI) are well-known extensions of the vector space model for information retrieval (IR). This paper analyzes the statistical relationship between these extensions. The analysis focuses on each method's basis in least-squares optimization. Noting that LSI and Rocchio relevance feedback both alter the vector space model in a way that is in some sense least-squares optimal, we ask: what is the relationship between LSI's and Rocchio's notions of optimality? What does this relationship imply for IR? Using an analytical approach, we argue that Rocchio relevance feedback is optimal if we understand retrieval as a simplified classification problem. On the other hand, LSI's motivation comes to the fore if we understand it as a biased regression technique, where projection onto a low-dimensional orthogonal subspace of the documents reduces model variance.
Theme
Retrieval algorithms
Object
Rocchio algorithm
Latent semantic indexing
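
For orientation, the Rocchio relevance feedback named in the abstract is usually written as a query update that moves the query vector toward the centroid of relevant documents and away from the centroid of non-relevant ones. The sketch below is a minimal, textbook-style illustration and not the formulation analyzed in the paper; the function names and the default weights `alpha`, `beta`, and `gamma` are assumptions chosen for illustration.

```python
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]], dim: int) -> List[float]:
    """Component-wise mean of the vectors; a zero vector if there are none."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def rocchio(
    query: Sequence[float],
    relevant: Sequence[Sequence[float]],
    nonrelevant: Sequence[Sequence[float]],
    alpha: float = 1.0,   # weight of the original query (illustrative default)
    beta: float = 0.75,   # weight of the relevant centroid (illustrative default)
    gamma: float = 0.15,  # weight of the non-relevant centroid (illustrative default)
) -> List[float]:
    """Return alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant)."""
    dim = len(query)
    rel_c = centroid(relevant, dim)
    non_c = centroid(nonrelevant, dim)
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(query, rel_c, non_c)]


# Toy three-term vocabulary: two judged-relevant documents, one non-relevant.
updated = rocchio(
    query=[1.0, 0.0, 0.0],
    relevant=[[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]],
    nonrelevant=[[0.0, 0.0, 1.0]],
    alpha=1.0, beta=0.5, gamma=0.25,
)
print(updated)
```

Some implementations additionally clip negative term weights to zero; this sketch leaves them as computed.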

Similar documents (author)

  1. Efron, M.: Eigenvalue-based model selection during Latent Semantic Indexing (2005) 6.07
    6.0731125 = sum of:
      6.0731125 = weight(author_txt:efron in 4686) [ClassicSimilarity], result of:
        6.0731125 = fieldWeight in 4686, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.71698 = idf(docFreq=6, maxDocs=42740)
          0.625 = fieldNorm(doc=4686)
    
  2. Efron, M.: Shannon meets Shortz : a probabilistic model of crossword puzzle difficulty (2008) 6.07
    6.0731125 = sum of:
      6.0731125 = weight(author_txt:efron in 3621) [ClassicSimilarity], result of:
        6.0731125 = fieldWeight in 3621, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.71698 = idf(docFreq=6, maxDocs=42740)
          0.625 = fieldNorm(doc=3621)
    
  3. Efron, M.: Linear time series models for term weighting in information retrieval (2010) 6.07
    6.0731125 = sum of:
      6.0731125 = weight(author_txt:efron in 689) [ClassicSimilarity], result of:
        6.0731125 = fieldWeight in 689, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.71698 = idf(docFreq=6, maxDocs=42740)
          0.625 = fieldNorm(doc=689)
    
  4. Efron, M.: Information search and retrieval in microblogs (2011) 6.07
    6.0731125 = sum of:
      6.0731125 = weight(author_txt:efron in 1456) [ClassicSimilarity], result of:
        6.0731125 = fieldWeight in 1456, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.71698 = idf(docFreq=6, maxDocs=42740)
          0.625 = fieldNorm(doc=1456)
    
  5. Efron, M.; Winget, M.: Query polyrepresentation for ranking retrieval systems without relevance judgments (2010) 4.86
    4.85849 = sum of:
      4.85849 = weight(author_txt:efron in 470) [ClassicSimilarity], result of:
        4.85849 = fieldWeight in 470, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.71698 = idf(docFreq=6, maxDocs=42740)
          0.5 = fieldNorm(doc=470)
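
The author-match scores above can be reproduced by hand. The sketch below assumes the tf and idf formulas of Lucene's ClassicSimilarity, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes the fieldNorm values as given from the listing. The function names are hypothetical helpers, not Lucene API calls.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as ClassicSimilarity defines it."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int,
                 field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain trees above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: docFreq=6, maxDocs=42740, termFreq=1.0.
single_author = field_weight(1.0, 6, 42740, 0.625)  # hits 1 to 4
two_authors = field_weight(1.0, 6, 42740, 0.5)      # hit 5 (Efron; Winget)

print(round(classic_idf(6, 42740), 5))
print(round(single_author, 4), round(two_authors, 4))
```

Lucene computes these values in single precision, so the last digits can differ slightly from a double-precision recomputation. The lower fieldNorm for hit 5 reflects the longer author field, which is why the same term match scores 4.86 rather than 6.07.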
    

Similar documents (content)

  1. Tsai, C.-F.; Hu, Y.-H.; Chen, Z.-Y.: Factors affecting rocchio-based pseudorelevance feedback in image retrieval (2015) 0.18
    0.18483718 = sum of:
      0.18483718 = product of:
        1.1552324 = sum of:
          0.09395434 = weight(abstract_txt:optimal in 3608) [ClassicSimilarity], result of:
            0.09395434 = score(doc=3608,freq=2.0), product of:
              0.1591446 = queryWeight, product of:
                1.6905962 = boost
                6.679284 = idf(docFreq=145, maxDocs=42740)
                0.014093605 = queryNorm
              0.5903709 = fieldWeight in 3608, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.679284 = idf(docFreq=145, maxDocs=42740)
                0.0625 = fieldNorm(doc=3608)
          0.053568393 = weight(abstract_txt:relevance in 3608) [ClassicSimilarity], result of:
            0.053568393 = score(doc=3608,freq=1.0), product of:
              0.17370264 = queryWeight, product of:
                2.4978259 = boost
                4.934262 = idf(docFreq=835, maxDocs=42740)
                0.014093605 = queryNorm
              0.30839136 = fieldWeight in 3608, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.934262 = idf(docFreq=835, maxDocs=42740)
                0.0625 = fieldNorm(doc=3608)
          0.16207932 = weight(abstract_txt:feedback in 3608) [ClassicSimilarity], result of:
            0.16207932 = score(doc=3608,freq=3.0), product of:
              0.25194862 = queryWeight, product of:
                3.008257 = boost
                5.942579 = idf(docFreq=304, maxDocs=42740)
                0.014093605 = queryNorm
              0.64330304 = fieldWeight in 3608, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.942579 = idf(docFreq=304, maxDocs=42740)
                0.0625 = fieldNorm(doc=3608)
          0.8456304 = weight(abstract_txt:rocchio in 3608) [ClassicSimilarity], result of:
            0.8456304 = score(doc=3608,freq=5.0), product of:
              0.6392407 = queryWeight, product of:
                4.791717 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.014093605 = queryNorm
              1.322867 = fieldWeight in 3608, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.0625 = fieldNorm(doc=3608)
        0.16 = coord(4/25)
    
  2. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.18
    0.18251586 = sum of:
      0.18251586 = product of:
        0.65184236 = sum of:
          0.03222704 = weight(abstract_txt:indexing in 2691) [ClassicSimilarity], result of:
            0.03222704 = score(doc=2691,freq=2.0), product of:
              0.06720284 = queryWeight, product of:
                1.0985954 = boost
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.014093605 = queryNorm
              0.4795488 = fieldWeight in 2691, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.078125 = fieldNorm(doc=2691)
          0.04393296 = weight(abstract_txt:semantic in 2691) [ClassicSimilarity], result of:
            0.04393296 = score(doc=2691,freq=3.0), product of:
              0.07217784 = queryWeight, product of:
                1.1385337 = boost
                4.4981704 = idf(docFreq=1292, maxDocs=42740)
                0.014093605 = queryNorm
              0.60867655 = fieldWeight in 2691, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4981704 = idf(docFreq=1292, maxDocs=42740)
                0.078125 = fieldNorm(doc=2691)
          0.23214185 = weight(abstract_txt:subspace in 2691) [ClassicSimilarity], result of:
            0.23214185 = score(doc=2691,freq=3.0), product of:
              0.17379445 = queryWeight, product of:
                1.2492429 = boost
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.014093605 = queryNorm
              1.3357265 = fieldWeight in 2691, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.078125 = fieldNorm(doc=2691)
          0.043778725 = weight(abstract_txt:space in 2691) [ClassicSimilarity], result of:
            0.043778725 = score(doc=2691,freq=1.0), product of:
              0.10385468 = queryWeight, product of:
                1.3657053 = boost
                5.39569 = idf(docFreq=526, maxDocs=42740)
                0.014093605 = queryNorm
              0.4215383 = fieldWeight in 2691, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.39569 = idf(docFreq=526, maxDocs=42740)
                0.078125 = fieldNorm(doc=2691)
          0.054407965 = weight(abstract_txt:model in 2691) [ClassicSimilarity], result of:
            0.054407965 = score(doc=2691,freq=4.0), product of:
              0.0865704 = queryWeight, product of:
                1.5271239 = boost
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.014093605 = queryNorm
              0.62848234 = fieldWeight in 2691, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.078125 = fieldNorm(doc=2691)
          0.0770805 = weight(abstract_txt:vector in 2691) [ClassicSimilarity], result of:
            0.0770805 = score(doc=2691,freq=1.0), product of:
              0.15143062 = queryWeight, product of:
                1.6491145 = boost
                6.515396 = idf(docFreq=171, maxDocs=42740)
                0.014093605 = queryNorm
              0.5090153 = fieldWeight in 2691, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.515396 = idf(docFreq=171, maxDocs=42740)
                0.078125 = fieldNorm(doc=2691)
          0.16827333 = weight(abstract_txt:latent in 2691) [ClassicSimilarity], result of:
            0.16827333 = score(doc=2691,freq=3.0), product of:
              0.17669344 = queryWeight, product of:
                1.7813702 = boost
                7.0379176 = idf(docFreq=101, maxDocs=42740)
                0.014093605 = queryNorm
              0.9523462 = fieldWeight in 2691, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0379176 = idf(docFreq=101, maxDocs=42740)
                0.078125 = fieldNorm(doc=2691)
        0.28 = coord(7/25)
    
  3. Pan, M.; Huang, J.X.; He, T.; Mao, Z.; Ying, Z.; Tu, X.: A simple kernel co-occurrence-based enhancement for pseudo-relevance feedback (2020) 0.16
    0.15594965 = sum of:
      0.15594965 = product of:
        0.77974826 = sum of:
          0.044813186 = weight(abstract_txt:least in 1679) [ClassicSimilarity], result of:
            0.044813186 = score(doc=1679,freq=1.0), product of:
              0.1224037 = queryWeight, product of:
                1.4826589 = boost
                5.8577557 = idf(docFreq=331, maxDocs=42740)
                0.014093605 = queryNorm
              0.36610973 = fieldWeight in 1679, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8577557 = idf(docFreq=331, maxDocs=42740)
                0.0625 = fieldNorm(doc=1679)
          0.030777792 = weight(abstract_txt:model in 1679) [ClassicSimilarity], result of:
            0.030777792 = score(doc=1679,freq=2.0), product of:
              0.0865704 = queryWeight, product of:
                1.5271239 = boost
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.014093605 = queryNorm
              0.3555233 = fieldWeight in 1679, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.0625 = fieldNorm(doc=1679)
          0.075757146 = weight(abstract_txt:relevance in 1679) [ClassicSimilarity], result of:
            0.075757146 = score(doc=1679,freq=2.0), product of:
              0.17370264 = queryWeight, product of:
                2.4978259 = boost
                4.934262 = idf(docFreq=835, maxDocs=42740)
                0.014093605 = queryNorm
              0.43613124 = fieldWeight in 1679, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.934262 = idf(docFreq=835, maxDocs=42740)
                0.0625 = fieldNorm(doc=1679)
          0.093576536 = weight(abstract_txt:feedback in 1679) [ClassicSimilarity], result of:
            0.093576536 = score(doc=1679,freq=1.0), product of:
              0.25194862 = queryWeight, product of:
                3.008257 = boost
                5.942579 = idf(docFreq=304, maxDocs=42740)
                0.014093605 = queryNorm
              0.37141117 = fieldWeight in 1679, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.942579 = idf(docFreq=304, maxDocs=42740)
                0.0625 = fieldNorm(doc=1679)
          0.5348236 = weight(abstract_txt:rocchio in 1679) [ClassicSimilarity], result of:
            0.5348236 = score(doc=1679,freq=2.0), product of:
              0.6392407 = queryWeight, product of:
                4.791717 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.014093605 = queryNorm
              0.83665454 = fieldWeight in 1679, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.0625 = fieldNorm(doc=1679)
        0.2 = coord(5/25)
    
  4. Kumar, C.A.; Radvansky, M.; Annapurna, J.: Analysis of Vector Space Model, Latent Semantic Indexing and Formal Concept Analysis for information retrieval (2012) 0.14
    0.13671301 = sum of:
      0.13671301 = product of:
        0.48826075 = sum of:
          0.027345551 = weight(abstract_txt:indexing in 4711) [ClassicSimilarity], result of:
            0.027345551 = score(doc=4711,freq=1.0), product of:
              0.06720284 = queryWeight, product of:
                1.0985954 = boost
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.014093605 = queryNorm
              0.40691066 = fieldWeight in 4711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.09375 = fieldNorm(doc=4711)
          0.04304533 = weight(abstract_txt:semantic in 4711) [ClassicSimilarity], result of:
            0.04304533 = score(doc=4711,freq=2.0), product of:
              0.07217784 = queryWeight, product of:
                1.1385337 = boost
                4.4981704 = idf(docFreq=1292, maxDocs=42740)
                0.014093605 = queryNorm
              0.59637874 = fieldWeight in 4711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4981704 = idf(docFreq=1292, maxDocs=42740)
                0.09375 = fieldNorm(doc=4711)
          0.052534465 = weight(abstract_txt:space in 4711) [ClassicSimilarity], result of:
            0.052534465 = score(doc=4711,freq=1.0), product of:
              0.10385468 = queryWeight, product of:
                1.3657053 = boost
                5.39569 = idf(docFreq=526, maxDocs=42740)
                0.014093605 = queryNorm
              0.5058459 = fieldWeight in 4711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.39569 = idf(docFreq=526, maxDocs=42740)
                0.09375 = fieldNorm(doc=4711)
          0.046166684 = weight(abstract_txt:model in 4711) [ClassicSimilarity], result of:
            0.046166684 = score(doc=4711,freq=2.0), product of:
              0.0865704 = queryWeight, product of:
                1.5271239 = boost
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.014093605 = queryNorm
              0.5332849 = fieldWeight in 4711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.09375 = fieldNorm(doc=4711)
          0.09249661 = weight(abstract_txt:vector in 4711) [ClassicSimilarity], result of:
            0.09249661 = score(doc=4711,freq=1.0), product of:
              0.15143062 = queryWeight, product of:
                1.6491145 = boost
                6.515396 = idf(docFreq=171, maxDocs=42740)
                0.014093605 = queryNorm
              0.6108184 = fieldWeight in 4711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.515396 = idf(docFreq=171, maxDocs=42740)
                0.09375 = fieldNorm(doc=4711)
          0.16487351 = weight(abstract_txt:latent in 4711) [ClassicSimilarity], result of:
            0.16487351 = score(doc=4711,freq=2.0), product of:
              0.17669344 = queryWeight, product of:
                1.7813702 = boost
                7.0379176 = idf(docFreq=101, maxDocs=42740)
                0.014093605 = queryNorm
              0.9331049 = fieldWeight in 4711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0379176 = idf(docFreq=101, maxDocs=42740)
                0.09375 = fieldNorm(doc=4711)
          0.061798595 = weight(abstract_txt:relationship in 4711) [ClassicSimilarity], result of:
            0.061798595 = score(doc=4711,freq=1.0), product of:
              0.13247868 = queryWeight, product of:
                1.8891331 = boost
                4.975782 = idf(docFreq=801, maxDocs=42740)
                0.014093605 = queryNorm
              0.46647954 = fieldWeight in 4711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.975782 = idf(docFreq=801, maxDocs=42740)
                0.09375 = fieldNorm(doc=4711)
        0.28 = coord(7/25)
    
  5. Layfield, C.; Azzopardi, J.; Staff, C.: Experiments with document retrieval from small text collections using Latent Semantic Analysis or term similarity with query coordination and automatic relevance feedback (2017) 0.13
    0.13066433 = sum of:
      0.13066433 = product of:
        0.40832606 = sum of:
          0.01595157 = weight(abstract_txt:indexing in 5479) [ClassicSimilarity], result of:
            0.01595157 = score(doc=5479,freq=1.0), product of:
              0.06720284 = queryWeight, product of:
                1.0985954 = boost
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.014093605 = queryNorm
              0.23736455 = fieldWeight in 5479, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.34038 = idf(docFreq=1513, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.03075307 = weight(abstract_txt:semantic in 5479) [ClassicSimilarity], result of:
            0.03075307 = score(doc=5479,freq=3.0), product of:
              0.07217784 = queryWeight, product of:
                1.1385337 = boost
                4.4981704 = idf(docFreq=1292, maxDocs=42740)
                0.014093605 = queryNorm
              0.42607355 = fieldWeight in 5479, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.4981704 = idf(docFreq=1292, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.030645104 = weight(abstract_txt:space in 5479) [ClassicSimilarity], result of:
            0.030645104 = score(doc=5479,freq=1.0), product of:
              0.10385468 = queryWeight, product of:
                1.3657053 = boost
                5.39569 = idf(docFreq=526, maxDocs=42740)
                0.014093605 = queryNorm
              0.2950768 = fieldWeight in 5479, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.39569 = idf(docFreq=526, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.026930567 = weight(abstract_txt:model in 5479) [ClassicSimilarity], result of:
            0.026930567 = score(doc=5479,freq=2.0), product of:
              0.0865704 = queryWeight, product of:
                1.5271239 = boost
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.014093605 = queryNorm
              0.31108287 = fieldWeight in 5479, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.053956356 = weight(abstract_txt:vector in 5479) [ClassicSimilarity], result of:
            0.053956356 = score(doc=5479,freq=1.0), product of:
              0.15143062 = queryWeight, product of:
                1.6491145 = boost
                6.515396 = idf(docFreq=171, maxDocs=42740)
                0.014093605 = queryNorm
              0.35631073 = fieldWeight in 5479, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.515396 = idf(docFreq=171, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.06800685 = weight(abstract_txt:latent in 5479) [ClassicSimilarity], result of:
            0.06800685 = score(doc=5479,freq=1.0), product of:
              0.17669344 = queryWeight, product of:
                1.7813702 = boost
                7.0379176 = idf(docFreq=101, maxDocs=42740)
                0.014093605 = queryNorm
              0.38488612 = fieldWeight in 5479, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0379176 = idf(docFreq=101, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.0662875 = weight(abstract_txt:relevance in 5479) [ClassicSimilarity], result of:
            0.0662875 = score(doc=5479,freq=2.0), product of:
              0.17370264 = queryWeight, product of:
                2.4978259 = boost
                4.934262 = idf(docFreq=835, maxDocs=42740)
                0.014093605 = queryNorm
              0.38161483 = fieldWeight in 5479, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.934262 = idf(docFreq=835, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.11579505 = weight(abstract_txt:feedback in 5479) [ClassicSimilarity], result of:
            0.11579505 = score(doc=5479,freq=2.0), product of:
              0.25194862 = queryWeight, product of:
                3.008257 = boost
                5.942579 = idf(docFreq=304, maxDocs=42740)
                0.014093605 = queryNorm
              0.4595979 = fieldWeight in 5479, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.942579 = idf(docFreq=304, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
        0.32 = coord(8/25)
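
The content-similarity scores combine several matched terms. As a worked check, the sketch below recomputes the score of the first hit (Tsai et al., doc 3608) from the numbers in its explain tree. It assumes the ClassicSimilarity structure visible there: each term contributes queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm, and the sum is scaled by coord, the fraction of query terms matched. The helper names are hypothetical.

```python
import math

MAX_DOCS = 42740
QUERY_NORM = 0.014093605   # copied from the listing
FIELD_NORM = 0.0625        # fieldNorm(doc=3608), copied from the listing


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float) -> float:
    """One term's contribution: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


# (term, termFreq, docFreq, boost) for the four matched terms in doc 3608.
matched_terms = [
    ("optimal",   2.0, 145, 1.6905962),
    ("relevance", 1.0, 835, 2.4978259),
    ("feedback",  3.0, 304, 3.008257),
    ("rocchio",   5.0,   8, 4.791717),
]

total = sum(term_score(freq, df, boost) for _, freq, df, boost in matched_terms)
coord = 4 / 25             # 4 of the 25 query terms matched
score = total * coord

print(round(total, 4), round(score, 4))
```

Because idf enters both queryWeight and fieldWeight, rare terms are weighted by idf squared. This is why "rocchio" (docFreq=8) accounts for most of the total even though "feedback" and "relevance" also match.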