Document (#27582)

Author
Drucker, H.
Shahrary, B.
Gibbon, D.C.
Title
Support vector machines : relevance feedback and information retrieval
Source
Information processing and management. 38(2002) no.3, pp.305-323
Year
2002
Abstract
We compare support vector machines (SVMs) to Rocchio, Ide regular and Ide dec-hi algorithms in information retrieval (IR) of text documents using relevancy feedback. It is assumed that a preliminary search finds a set of documents that the user marks as relevant or not, and then feedback iterations commence. Particular attention is paid to IR searches where the number of relevant documents in the database is low and the preliminary set of documents used to start the search has few relevant documents. Experiments show that if inverse document frequency (IDF) weighting is not used because one is unwilling to pay the time penalty needed to obtain these features, then SVMs are better whether using term-frequency (TF) or binary weighting. SVM performance is marginally better than Ide dec-hi if TF-IDF weighting is used and there is a reasonable number of relevant documents found in the preliminary search. If the preliminary search is so poor that one has to search through many documents to find at least one relevant document, then SVM is preferred.
Theme
Retrieval algorithms

Similar documents (content)

  1. Sormunen, E.; Kekäläinen, J.; Koivisto, J.; Järvelin, K.: Document text characteristics affect the ranking of the most relevant documents by expanded structured queries (2001) 0.25
    0.24521197 = sum of:
      0.24521197 = product of:
        0.76628745 = sum of:
          0.020783808 = weight(abstract_txt:number in 4487) [ClassicSimilarity], result of:
            0.020783808 = score(doc=4487,freq=1.0), product of:
              0.08046678 = queryWeight, product of:
                1.1375197 = boost
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.017117059 = queryNorm
              0.25829056 = fieldWeight in 4487, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.0625 = fieldNorm(doc=4487)
          0.023291813 = weight(abstract_txt:document in 4487) [ClassicSimilarity], result of:
            0.023291813 = score(doc=4487,freq=1.0), product of:
              0.08681645 = queryWeight, product of:
                1.1815487 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.017117059 = queryNorm
              0.26828802 = fieldWeight in 4487, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=4487)
          0.17758714 = weight(abstract_txt:marginally in 4487) [ClassicSimilarity], result of:
            0.17758714 = score(doc=4487,freq=3.0), product of:
              0.18507898 = queryWeight, product of:
                1.2198718 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.017117059 = queryNorm
              0.9595209 = fieldWeight in 4487, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.0625 = fieldNorm(doc=4487)
          0.031808395 = weight(abstract_txt:better in 4487) [ClassicSimilarity], result of:
            0.031808395 = score(doc=4487,freq=1.0), product of:
              0.106863074 = queryWeight, product of:
                1.3108846 = boost
                4.76249 = idf(docFreq=1026, maxDocs=44218)
                0.017117059 = queryNorm
              0.2976556 = fieldWeight in 4487, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.76249 = idf(docFreq=1026, maxDocs=44218)
                0.0625 = fieldNorm(doc=4487)
          0.01674468 = weight(abstract_txt:used in 4487) [ClassicSimilarity], result of:
            0.01674468 = score(doc=4487,freq=1.0), product of:
              0.07975321 = queryWeight, product of:
                1.3869805 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.017117059 = queryNorm
              0.2099562 = fieldWeight in 4487, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0625 = fieldNorm(doc=4487)
          0.036035396 = weight(abstract_txt:search in 4487) [ClassicSimilarity], result of:
            0.036035396 = score(doc=4487,freq=1.0), product of:
              0.15761566 = queryWeight, product of:
                2.517215 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.017117059 = queryNorm
              0.22862828 = fieldWeight in 4487, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.0625 = fieldNorm(doc=4487)
          0.23189116 = weight(abstract_txt:relevant in 4487) [ClassicSimilarity], result of:
            0.23189116 = score(doc=4487,freq=10.0), product of:
              0.25310612 = queryWeight, product of:
                3.1898625 = boost
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.017117059 = queryNorm
              0.91618156 = fieldWeight in 4487, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.0625 = fieldNorm(doc=4487)
          0.2281451 = weight(abstract_txt:documents in 4487) [ClassicSimilarity], result of:
            0.2281451 = score(doc=4487,freq=10.0), product of:
              0.28008935 = queryWeight, product of:
                3.970388 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.017117059 = queryNorm
              0.81454396 = fieldWeight in 4487, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=4487)
        0.32 = coord(8/25)
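
The tree above can be recomputed by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function and variable names are illustrative and not part of the listing. The boost, docFreq, freq, queryNorm, fieldNorm and coord values are copied from the entry for doc 4487.

```python
import math

MAX_DOCS = 44218          # maxDocs from the listing
QUERY_NORM = 0.017117059  # queryNorm from the listing
FIELD_NORM = 0.0625       # fieldNorm(doc=4487)


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    # Assumed ClassicSimilarity tf: square root of the term frequency
    return math.sqrt(freq)


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    # score = queryWeight * fieldWeight, as in each "score(...), product of:" node
    i = idf(doc_freq)
    query_weight = boost * i * query_norm
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight


# (term, boost, docFreq, freq) copied from the eight matching terms of doc 4487
TERMS = [
    ("number", 1.1375197, 1927, 1.0),
    ("document", 1.1815487, 1642, 1.0),
    ("marginally", 1.2198718, 16, 3.0),
    ("better", 1.3108846, 1026, 1.0),
    ("used", 1.3869805, 4177, 1.0),
    ("search", 2.517215, 3098, 1.0),
    ("relevant", 3.1898625, 1165, 10.0),
    ("documents", 3.970388, 1949, 10.0),
]

total = sum(term_score(b, df, f, FIELD_NORM) for _, b, df, f in TERMS)
coord = 8 / 25  # 8 of the 25 query terms matched
score = total * coord
print(f"sum={total:.6f} score={score:.6f}")
```

The result should agree with the reported 0.24521197 up to small rounding differences, since the listing prints single-precision values while Python computes in double precision.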
    
  2. Ruthven, I.; Lalmas, M.; Rijsbergen, K. van: Combining and selecting characteristics of information use (2002) 0.23
    0.23205194 = sum of:
      0.23205194 = product of:
        0.7251623 = sum of:
          0.050208163 = weight(abstract_txt:inverse in 5208) [ClassicSimilarity], result of:
            0.050208163 = score(doc=5208,freq=1.0), product of:
              0.13929383 = queryWeight, product of:
                1.0582824 = boost
                7.689554 = idf(docFreq=54, maxDocs=44218)
                0.017117059 = queryNorm
              0.36044785 = fieldWeight in 5208, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.689554 = idf(docFreq=54, maxDocs=44218)
                0.046875 = fieldNorm(doc=5208)
          0.049409397 = weight(abstract_txt:document in 5208) [ClassicSimilarity], result of:
            0.049409397 = score(doc=5208,freq=8.0), product of:
              0.08681645 = queryWeight, product of:
                1.1815487 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.017117059 = queryNorm
              0.5691248 = fieldWeight in 5208, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.046875 = fieldNorm(doc=5208)
          0.080475464 = weight(abstract_txt:frequency in 5208) [ClassicSimilarity], result of:
            0.080475464 = score(doc=5208,freq=3.0), product of:
              0.16665854 = queryWeight, product of:
                1.6370593 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.017117059 = queryNorm
              0.48287636 = fieldWeight in 5208, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.046875 = fieldNorm(doc=5208)
          0.05646678 = weight(abstract_txt:then in 5208) [ClassicSimilarity], result of:
            0.05646678 = score(doc=5208,freq=3.0), product of:
              0.1506414 = queryWeight, product of:
                1.9061998 = boost
                4.616861 = idf(docFreq=1187, maxDocs=44218)
                0.017117059 = queryNorm
              0.37484238 = fieldWeight in 5208, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.616861 = idf(docFreq=1187, maxDocs=44218)
                0.046875 = fieldNorm(doc=5208)
          0.09840392 = weight(abstract_txt:feedback in 5208) [ClassicSimilarity], result of:
            0.09840392 = score(doc=5208,freq=2.0), product of:
              0.24972059 = queryWeight, product of:
                2.454276 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.017117059 = queryNorm
              0.3940561 = fieldWeight in 5208, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.046875 = fieldNorm(doc=5208)
          0.15921077 = weight(abstract_txt:weighting in 5208) [ClassicSimilarity], result of:
            0.15921077 = score(doc=5208,freq=2.0), product of:
              0.34416053 = queryWeight, product of:
                2.8812225 = boost
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.017117059 = queryNorm
              0.46260613 = fieldWeight in 5208, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.046875 = fieldNorm(doc=5208)
          0.10999563 = weight(abstract_txt:relevant in 5208) [ClassicSimilarity], result of:
            0.10999563 = score(doc=5208,freq=4.0), product of:
              0.25310612 = queryWeight, product of:
                3.1898625 = boost
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.017117059 = queryNorm
              0.43458307 = fieldWeight in 5208, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.046875 = fieldNorm(doc=5208)
          0.120992206 = weight(abstract_txt:documents in 5208) [ClassicSimilarity], result of:
            0.120992206 = score(doc=5208,freq=5.0), product of:
              0.28008935 = queryWeight, product of:
                3.970388 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.017117059 = queryNorm
              0.43197718 = fieldWeight in 5208, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.046875 = fieldNorm(doc=5208)
        0.32 = coord(8/25)
    
  3. Bodoff, D.; Wu, B.; Wong, K.Y.M.: Relevance data for language models using maximum likelihood (2003) 0.22
    0.2150718 = sum of:
      0.2150718 = product of:
        0.76811355 = sum of:
          0.12316367 = weight(abstract_txt:relevancy in 1822) [ClassicSimilarity], result of:
            0.12316367 = score(doc=1822,freq=1.0), product of:
              0.15960656 = queryWeight, product of:
                1.1328202 = boost
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.017117059 = queryNorm
              0.77167046 = fieldWeight in 1822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.09375 = fieldNorm(doc=1822)
          0.049409397 = weight(abstract_txt:document in 1822) [ClassicSimilarity], result of:
            0.049409397 = score(doc=1822,freq=2.0), product of:
              0.08681645 = queryWeight, product of:
                1.1815487 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.017117059 = queryNorm
              0.5691248 = fieldWeight in 1822, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.09375 = fieldNorm(doc=1822)
          0.18933256 = weight(abstract_txt:rocchio in 1822) [ClassicSimilarity], result of:
            0.18933256 = score(doc=1822,freq=1.0), product of:
              0.21259148 = queryWeight, product of:
                1.3074002 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.017117059 = queryNorm
              0.89059335 = fieldWeight in 1822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.09375 = fieldNorm(doc=1822)
          0.04771259 = weight(abstract_txt:better in 1822) [ClassicSimilarity], result of:
            0.04771259 = score(doc=1822,freq=1.0), product of:
              0.106863074 = queryWeight, product of:
                1.3108846 = boost
                4.76249 = idf(docFreq=1026, maxDocs=44218)
                0.017117059 = queryNorm
              0.44648343 = fieldWeight in 1822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.76249 = idf(docFreq=1026, maxDocs=44218)
                0.09375 = fieldNorm(doc=1822)
          0.025117023 = weight(abstract_txt:used in 1822) [ClassicSimilarity], result of:
            0.025117023 = score(doc=1822,freq=1.0), product of:
              0.07975321 = queryWeight, product of:
                1.3869805 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.017117059 = queryNorm
              0.3149343 = fieldWeight in 1822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.09375 = fieldNorm(doc=1822)
          0.22515959 = weight(abstract_txt:preliminary in 1822) [ClassicSimilarity], result of:
            0.22515959 = score(doc=1822,freq=1.0), product of:
              0.3787994 = queryWeight, product of:
                3.4903605 = boost
                6.340301 = idf(docFreq=211, maxDocs=44218)
                0.017117059 = queryNorm
              0.5944032 = fieldWeight in 1822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.340301 = idf(docFreq=211, maxDocs=44218)
                0.09375 = fieldNorm(doc=1822)
          0.108218715 = weight(abstract_txt:documents in 1822) [ClassicSimilarity], result of:
            0.108218715 = score(doc=1822,freq=1.0), product of:
              0.28008935 = queryWeight, product of:
                3.970388 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.017117059 = queryNorm
              0.38637212 = fieldWeight in 1822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.09375 = fieldNorm(doc=1822)
        0.28 = coord(7/25)
    
  4. Smith, M.P.; Pollitt, S.A.: A comparison of ranking formulae and their ranks (1995) 0.21
    0.21080443 = sum of:
      0.21080443 = product of:
        0.75287294 = sum of:
          0.05809251 = weight(abstract_txt:number in 5802) [ClassicSimilarity], result of:
            0.05809251 = score(doc=5802,freq=5.0), product of:
              0.08046678 = queryWeight, product of:
                1.1375197 = boost
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.017117059 = queryNorm
              0.72194403 = fieldWeight in 5802, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.078125 = fieldNorm(doc=5802)
          0.029114768 = weight(abstract_txt:document in 5802) [ClassicSimilarity], result of:
            0.029114768 = score(doc=5802,freq=1.0), product of:
              0.08681645 = queryWeight, product of:
                1.1815487 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.017117059 = queryNorm
              0.33536002 = fieldWeight in 5802, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=5802)
          0.02093085 = weight(abstract_txt:used in 5802) [ClassicSimilarity], result of:
            0.02093085 = score(doc=5802,freq=1.0), product of:
              0.07975321 = queryWeight, product of:
                1.3869805 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.017117059 = queryNorm
              0.26244524 = fieldWeight in 5802, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.078125 = fieldNorm(doc=5802)
          0.07743755 = weight(abstract_txt:frequency in 5802) [ClassicSimilarity], result of:
            0.07743755 = score(doc=5802,freq=1.0), product of:
              0.16665854 = queryWeight, product of:
                1.6370593 = boost
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.017117059 = queryNorm
              0.46464798 = fieldWeight in 5802, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.947494 = idf(docFreq=313, maxDocs=44218)
                0.078125 = fieldNorm(doc=5802)
          0.1876317 = weight(abstract_txt:weighting in 5802) [ClassicSimilarity], result of:
            0.1876317 = score(doc=5802,freq=1.0), product of:
              0.34416053 = queryWeight, product of:
                2.8812225 = boost
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.017117059 = queryNorm
              0.5451866 = fieldWeight in 5802, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.078125 = fieldNorm(doc=5802)
          0.15876502 = weight(abstract_txt:relevant in 5802) [ClassicSimilarity], result of:
            0.15876502 = score(doc=5802,freq=3.0), product of:
              0.25310612 = queryWeight, product of:
                3.1898625 = boost
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.017117059 = queryNorm
              0.62726665 = fieldWeight in 5802, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.078125 = fieldNorm(doc=5802)
          0.22090054 = weight(abstract_txt:documents in 5802) [ClassicSimilarity], result of:
            0.22090054 = score(doc=5802,freq=6.0), product of:
              0.28008935 = queryWeight, product of:
                3.970388 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.017117059 = queryNorm
              0.7886788 = fieldWeight in 5802, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=5802)
        0.28 = coord(7/25)
    
  5. Ye, Z.; Huang, J.X.: A learning to rank approach for quality-aware pseudo-relevance feedback (2016) 0.20
    0.20375128 = sum of:
      0.20375128 = product of:
        0.7276831 = sum of:
          0.05648178 = weight(abstract_txt:reasonable in 2855) [ClassicSimilarity], result of:
            0.05648178 = score(doc=2855,freq=1.0), product of:
              0.12437376 = queryWeight, product of:
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.017117059 = queryNorm
              0.4541294 = fieldWeight in 2855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2660704 = idf(docFreq=83, maxDocs=44218)
                0.0625 = fieldNorm(doc=2855)
          0.05208208 = weight(abstract_txt:document in 2855) [ClassicSimilarity], result of:
            0.05208208 = score(doc=2855,freq=5.0), product of:
              0.08681645 = queryWeight, product of:
                1.1815487 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.017117059 = queryNorm
              0.59991026 = fieldWeight in 2855, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=2855)
          0.043468148 = weight(abstract_txt:then in 2855) [ClassicSimilarity], result of:
            0.043468148 = score(doc=2855,freq=1.0), product of:
              0.1506414 = queryWeight, product of:
                1.9061998 = boost
                4.616861 = idf(docFreq=1187, maxDocs=44218)
                0.017117059 = queryNorm
              0.2885538 = fieldWeight in 2855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.616861 = idf(docFreq=1187, maxDocs=44218)
                0.0625 = fieldNorm(doc=2855)
          0.22725414 = weight(abstract_txt:feedback in 2855) [ClassicSimilarity], result of:
            0.22725414 = score(doc=2855,freq=6.0), product of:
              0.24972059 = queryWeight, product of:
                2.454276 = boost
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.017117059 = queryNorm
              0.91003364 = fieldWeight in 2855, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.9443145 = idf(docFreq=314, maxDocs=44218)
                0.0625 = fieldNorm(doc=2855)
          0.073330425 = weight(abstract_txt:relevant in 2855) [ClassicSimilarity], result of:
            0.073330425 = score(doc=2855,freq=1.0), product of:
              0.25310612 = queryWeight, product of:
                3.1898625 = boost
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.017117059 = queryNorm
              0.28972206 = fieldWeight in 2855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.0625 = fieldNorm(doc=2855)
          0.15010639 = weight(abstract_txt:preliminary in 2855) [ClassicSimilarity], result of:
            0.15010639 = score(doc=2855,freq=1.0), product of:
              0.3787994 = queryWeight, product of:
                3.4903605 = boost
                6.340301 = idf(docFreq=211, maxDocs=44218)
                0.017117059 = queryNorm
              0.3962688 = fieldWeight in 2855, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.340301 = idf(docFreq=211, maxDocs=44218)
                0.0625 = fieldNorm(doc=2855)
          0.12496021 = weight(abstract_txt:documents in 2855) [ClassicSimilarity], result of:
            0.12496021 = score(doc=2855,freq=3.0), product of:
              0.28008935 = queryWeight, product of:
                3.970388 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.017117059 = queryNorm
              0.44614407 = fieldWeight in 2855, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.0625 = fieldNorm(doc=2855)
        0.28 = coord(7/25)
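
Because every "product of:" and "sum of:" node in these trees is derived from the nodes indented beneath it, the listings can be checked mechanically. The following is a rough sketch of such a check; `parse`, `direct_children` and `check` are hypothetical helpers written for this illustration, and the parsing relies only on the "value = description" layout and indentation visible in the listings above.

```python
import math
import re

# One explain node per line: leading indent, numeric value, " = ", description.
NODE = re.compile(r"^(\s*)(\d+(?:\.\d+)?(?:E-?\d+)?) = (.*)$")


def parse(text):
    """Return a list of (indent, value, description) for each explain line."""
    nodes = []
    for raw in text.splitlines():
        m = NODE.match(raw)
        if m:
            nodes.append((len(m.group(1)), float(m.group(2)), m.group(3)))
    return nodes


def direct_children(nodes, i):
    """Values of the nodes nested exactly one level below node i."""
    indent = nodes[i][0]
    values, child_indent = [], None
    for ind, val, _ in nodes[i + 1:]:
        if ind <= indent:
            break  # left the subtree of node i
        if child_indent is None:
            child_indent = ind
        if ind == child_indent:
            values.append(val)
    return values


def check(nodes, rel_tol=1e-4):
    """List the product/sum nodes whose value disagrees with their children."""
    problems = []
    for i, (_, value, desc) in enumerate(nodes):
        kids = direct_children(nodes, i)
        if not kids:
            continue
        if desc.endswith("product of:"):
            expected = math.prod(kids)
        elif desc.endswith("sum of:"):
            expected = sum(kids)
        else:
            continue  # e.g. "result of:" or "with freq of:" nodes
        if not math.isclose(value, expected, rel_tol=rel_tol):
            problems.append((desc, value, expected))
    return problems


# Excerpt of the "number" term for doc 4487, re-indented for brevity.
SAMPLE = """\
0.020783808 = score(doc=4487,freq=1.0), product of:
  0.08046678 = queryWeight, product of:
    1.1375197 = boost
    4.132649 = idf(docFreq=1927, maxDocs=44218)
    0.017117059 = queryNorm
  0.25829056 = fieldWeight in 4487, product of:
    1.0 = tf(freq=1.0), with freq of:
      1.0 = termFreq=1.0
    4.132649 = idf(docFreq=1927, maxDocs=44218)
    0.0625 = fieldNorm(doc=4487)
"""

print(check(parse(SAMPLE)))
```

The relative tolerance is a judgment call: it needs to be loose enough to absorb the single-precision rounding in the printed values, yet tight enough to flag a mistyped number.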