Document (#33371)

Author
Spoerri, A.
Title
Authority and ranking effects in data fusion
Source
Journal of the American Society for Information Science and Technology. 59(2008) no.3, p.450-460
Year
2008
Abstract
This paper provides empirical support for some of the key assumptions guiding the design of data fusion methods. It computes and analyzes the overlap structures between the search results of retrieval systems that participated in the short, long, and manual tracks in TREC 3, 6, 7, and 8 to examine what can be learned to infer a document's probability of being relevant. This paper shows that the potential relevance of a document increases exponentially as the number of systems retrieving it increases - called the Authority Effect. It also shows that documents higher up in ranked lists and found by more systems are more likely to be relevant - called the Ranking Effect. A contribution of this paper is that it shows that the Authority and Ranking Effects can be observed regardless of whether a query is generated manually or automatically and short or long queries are used. Further, it is illustrated that the Authority and Ranking Effects can be observed if the result sets of random groupings of five retrieval systems are compared and only the top 50 results are used in the overlap computation. Also discussed is how the Authority and Ranking Effects can help explain why major data fusion methods perform well.
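
The overlap computation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's actual procedure: the function names `overlap_counts` and `relevance_by_overlap`, the toy result lists, and the relevance judgments are all hypothetical. Only the top-50 cutoff is taken from the abstract.

```python
from collections import Counter, defaultdict


def overlap_counts(runs, depth=50):
    """Count, for each document ID, how many systems retrieve it in their top `depth`."""
    counts = Counter()
    for ranked in runs.values():
        # A set guards against a system listing the same document twice.
        counts.update(set(ranked[:depth]))
    return counts


def relevance_by_overlap(runs, relevant, depth=50):
    """Fraction of relevant documents among those found by exactly n systems."""
    found = defaultdict(int)
    hits = defaultdict(int)
    for doc, n in overlap_counts(runs, depth).items():
        found[n] += 1
        hits[n] += doc in relevant
    return {n: hits[n] / found[n] for n in sorted(found)}


# Hypothetical toy data: three systems, one query, invented judgments.
runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d1", "d3", "d5", "d6"],
    "sysC": ["d1", "d2", "d7", "d8"],
}
relevant = {"d1", "d2", "d5"}

for n, fraction in relevance_by_overlap(runs, relevant).items():
    print(f"found by {n} system(s): {fraction:.2f} relevant")
```

On this invented data the relevant fraction rises with the number of systems retrieving a document, which is the shape of the Authority Effect. The toy numbers are not evidence for it; the paper's claim rests on the TREC runs.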

Similar documents (content)

  1. Costa Carvalho, A. da; Rossi, C.; Moura, E.S. de; Silva, A.S. da; Fernandes, D.: LePrEF: Learn to precompute evidence fusion for efficient query evaluation (2012) 0.25
    0.24859375 = sum of:
      0.24859375 = product of:
        0.77685547 = sum of:
          0.030022642 = weight(abstract_txt:methods in 278) [ClassicSimilarity], result of:
            0.030022642 = score(doc=278,freq=2.0), product of:
              0.08191168 = queryWeight, product of:
                1.075974 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.01835845 = queryNorm
              0.36652455 = fieldWeight in 278, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=278)
          0.044364642 = weight(abstract_txt:called in 278) [ClassicSimilarity], result of:
            0.044364642 = score(doc=278,freq=1.0), product of:
              0.13388993 = queryWeight, product of:
                1.3756337 = boost
                5.3016257 = idf(docFreq=598, maxDocs=44218)
                0.01835845 = queryNorm
              0.3313516 = fieldWeight in 278, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3016257 = idf(docFreq=598, maxDocs=44218)
                0.0625 = fieldNorm(doc=278)
          0.047657806 = weight(abstract_txt:effect in 278) [ClassicSimilarity], result of:
            0.047657806 = score(doc=278,freq=1.0), product of:
              0.14043626 = queryWeight, product of:
                1.4088621 = boost
                5.4296865 = idf(docFreq=526, maxDocs=44218)
                0.01835845 = queryNorm
              0.3393554 = fieldWeight in 278, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4296865 = idf(docFreq=526, maxDocs=44218)
                0.0625 = fieldNorm(doc=278)
          0.023649415 = weight(abstract_txt:systems in 278) [ClassicSimilarity], result of:
            0.023649415 = score(doc=278,freq=1.0), product of:
              0.110903904 = queryWeight, product of:
                1.770587 = boost
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.01835845 = queryNorm
              0.2132424 = fieldWeight in 278, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.0625 = fieldNorm(doc=278)
          0.016803605 = weight(abstract_txt:that in 278) [ClassicSimilarity], result of:
            0.016803605 = score(doc=278,freq=2.0), product of:
              0.08023342 = queryWeight, product of:
                1.8444511 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01835845 = queryNorm
              0.20943399 = fieldWeight in 278, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=278)
          0.059613705 = weight(abstract_txt:shows in 278) [ClassicSimilarity], result of:
            0.059613705 = score(doc=278,freq=1.0), product of:
              0.18663126 = queryWeight, product of:
                1.9891461 = boost
                5.1107154 = idf(docFreq=724, maxDocs=44218)
                0.01835845 = queryNorm
              0.3194197 = fieldWeight in 278, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1107154 = idf(docFreq=724, maxDocs=44218)
                0.0625 = fieldNorm(doc=278)
          0.29348677 = weight(abstract_txt:fusion in 278) [ClassicSimilarity], result of:
            0.29348677 = score(doc=278,freq=2.0), product of:
              0.42868277 = queryWeight, product of:
                3.014689 = boost
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.01835845 = queryNorm
              0.6846246 = fieldWeight in 278, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.0625 = fieldNorm(doc=278)
          0.26125684 = weight(abstract_txt:ranking in 278) [ClassicSimilarity], result of:
            0.26125684 = score(doc=278,freq=4.0), product of:
              0.37330317 = queryWeight, product of:
                3.631865 = boost
                5.598813 = idf(docFreq=444, maxDocs=44218)
                0.01835845 = queryNorm
              0.69985163 = fieldWeight in 278, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.598813 = idf(docFreq=444, maxDocs=44218)
                0.0625 = fieldNorm(doc=278)
        0.32 = coord(8/25)
    
  2. Kaszkiel, M.; Zobel, J.: Effective ranking with arbitrary passages (2001) 0.21
    0.20640853 = sum of:
      0.20640853 = product of:
        0.6450267 = sum of:
          0.029656125 = weight(abstract_txt:relevant in 5764) [ClassicSimilarity], result of:
            0.029656125 = score(doc=5764,freq=1.0), product of:
              0.102360606 = queryWeight, product of:
                1.2028052 = boost
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.01835845 = queryNorm
              0.28972206 = fieldWeight in 5764, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.0625 = fieldNorm(doc=5764)
          0.04330648 = weight(abstract_txt:long in 5764) [ClassicSimilarity], result of:
            0.04330648 = score(doc=5764,freq=1.0), product of:
              0.13175239 = queryWeight, product of:
                1.3646086 = boost
                5.2591357 = idf(docFreq=624, maxDocs=44218)
                0.01835845 = queryNorm
              0.32869598 = fieldWeight in 5764, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2591357 = idf(docFreq=624, maxDocs=44218)
                0.0625 = fieldNorm(doc=5764)
          0.044364642 = weight(abstract_txt:called in 5764) [ClassicSimilarity], result of:
            0.044364642 = score(doc=5764,freq=1.0), product of:
              0.13388993 = queryWeight, product of:
                1.3756337 = boost
                5.3016257 = idf(docFreq=598, maxDocs=44218)
                0.01835845 = queryNorm
              0.3313516 = fieldWeight in 5764, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3016257 = idf(docFreq=598, maxDocs=44218)
                0.0625 = fieldNorm(doc=5764)
          0.082022384 = weight(abstract_txt:short in 5764) [ClassicSimilarity], result of:
            0.082022384 = score(doc=5764,freq=2.0), product of:
              0.16007918 = queryWeight, product of:
                1.5041678 = boost
                5.79699 = idf(docFreq=364, maxDocs=44218)
                0.01835845 = queryNorm
              0.5123863 = fieldWeight in 5764, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.79699 = idf(docFreq=364, maxDocs=44218)
                0.0625 = fieldNorm(doc=5764)
          0.023649415 = weight(abstract_txt:systems in 5764) [ClassicSimilarity], result of:
            0.023649415 = score(doc=5764,freq=1.0), product of:
              0.110903904 = queryWeight, product of:
                1.770587 = boost
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.01835845 = queryNorm
              0.2132424 = fieldWeight in 5764, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.0625 = fieldNorm(doc=5764)
          0.016803605 = weight(abstract_txt:that in 5764) [ClassicSimilarity], result of:
            0.016803605 = score(doc=5764,freq=2.0), product of:
              0.08023342 = queryWeight, product of:
                1.8444511 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01835845 = queryNorm
              0.20943399 = fieldWeight in 5764, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=5764)
          0.059613705 = weight(abstract_txt:shows in 5764) [ClassicSimilarity], result of:
            0.059613705 = score(doc=5764,freq=1.0), product of:
              0.18663126 = queryWeight, product of:
                1.9891461 = boost
                5.1107154 = idf(docFreq=724, maxDocs=44218)
                0.01835845 = queryNorm
              0.3194197 = fieldWeight in 5764, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1107154 = idf(docFreq=724, maxDocs=44218)
                0.0625 = fieldNorm(doc=5764)
          0.3456103 = weight(abstract_txt:ranking in 5764) [ClassicSimilarity], result of:
            0.3456103 = score(doc=5764,freq=7.0), product of:
              0.37330317 = queryWeight, product of:
                3.631865 = boost
                5.598813 = idf(docFreq=444, maxDocs=44218)
                0.01835845 = queryNorm
              0.92581666 = fieldWeight in 5764, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.598813 = idf(docFreq=444, maxDocs=44218)
                0.0625 = fieldNorm(doc=5764)
        0.32 = coord(8/25)
    
  3. Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.19
    0.18510428 = sum of:
      0.18510428 = product of:
        0.77126783 = sum of:
          0.03590754 = weight(abstract_txt:data in 2502) [ClassicSimilarity], result of:
            0.03590754 = score(doc=2502,freq=3.0), product of:
              0.07953599 = queryWeight, product of:
                1.298543 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.01835845 = queryNorm
              0.4514628 = fieldWeight in 2502, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.078125 = fieldNorm(doc=2502)
          0.08276031 = weight(abstract_txt:observed in 2502) [ClassicSimilarity], result of:
            0.08276031 = score(doc=2502,freq=1.0), product of:
              0.1748496 = queryWeight, product of:
                1.5720313 = boost
                6.0585327 = idf(docFreq=280, maxDocs=44218)
                0.01835845 = queryNorm
              0.47332287 = fieldWeight in 2502, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0585327 = idf(docFreq=280, maxDocs=44218)
                0.078125 = fieldNorm(doc=2502)
          0.029561767 = weight(abstract_txt:systems in 2502) [ClassicSimilarity], result of:
            0.029561767 = score(doc=2502,freq=1.0), product of:
              0.110903904 = queryWeight, product of:
                1.770587 = boost
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.01835845 = queryNorm
              0.26655298 = fieldWeight in 2502, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.078125 = fieldNorm(doc=2502)
          0.029704858 = weight(abstract_txt:that in 2502) [ClassicSimilarity], result of:
            0.029704858 = score(doc=2502,freq=4.0), product of:
              0.08023342 = queryWeight, product of:
                1.8444511 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01835845 = queryNorm
              0.3702305 = fieldWeight in 2502, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=2502)
          0.07451713 = weight(abstract_txt:shows in 2502) [ClassicSimilarity], result of:
            0.07451713 = score(doc=2502,freq=1.0), product of:
              0.18663126 = queryWeight, product of:
                1.9891461 = boost
                5.1107154 = idf(docFreq=724, maxDocs=44218)
                0.01835845 = queryNorm
              0.39927465 = fieldWeight in 2502, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1107154 = idf(docFreq=724, maxDocs=44218)
                0.078125 = fieldNorm(doc=2502)
          0.51881623 = weight(abstract_txt:fusion in 2502) [ClassicSimilarity], result of:
            0.51881623 = score(doc=2502,freq=4.0), product of:
              0.42868277 = queryWeight, product of:
                3.014689 = boost
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.01835845 = queryNorm
              1.2102568 = fieldWeight in 2502, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.078125 = fieldNorm(doc=2502)
        0.24 = coord(6/25)
    
  4. Wu, S.; McClean, S.I.: Improving high accuracy retrieval by eliminating the uneven correlation effect in data fusion (2006) 0.16
    0.16489671 = sum of:
      0.16489671 = product of:
        0.68706965 = sum of:
          0.030022642 = weight(abstract_txt:methods in 219) [ClassicSimilarity], result of:
            0.030022642 = score(doc=219,freq=2.0), product of:
              0.08191168 = queryWeight, product of:
                1.075974 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.01835845 = queryNorm
              0.36652455 = fieldWeight in 219, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=219)
          0.03708515 = weight(abstract_txt:data in 219) [ClassicSimilarity], result of:
            0.03708515 = score(doc=219,freq=5.0), product of:
              0.07953599 = queryWeight, product of:
                1.298543 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.01835845 = queryNorm
              0.46626878 = fieldWeight in 219, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=219)
          0.06739831 = weight(abstract_txt:effect in 219) [ClassicSimilarity], result of:
            0.06739831 = score(doc=219,freq=2.0), product of:
              0.14043626 = queryWeight, product of:
                1.4088621 = boost
                5.4296865 = idf(docFreq=526, maxDocs=44218)
                0.01835845 = queryNorm
              0.479921 = fieldWeight in 219, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.4296865 = idf(docFreq=526, maxDocs=44218)
                0.0625 = fieldNorm(doc=219)
          0.023649415 = weight(abstract_txt:systems in 219) [ClassicSimilarity], result of:
            0.023649415 = score(doc=219,freq=1.0), product of:
              0.110903904 = queryWeight, product of:
                1.770587 = boost
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.01835845 = queryNorm
              0.2132424 = fieldWeight in 219, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4118783 = idf(docFreq=3963, maxDocs=44218)
                0.0625 = fieldNorm(doc=219)
          0.020580128 = weight(abstract_txt:that in 219) [ClassicSimilarity], result of:
            0.020580128 = score(doc=219,freq=3.0), product of:
              0.08023342 = queryWeight, product of:
                1.8444511 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01835845 = queryNorm
              0.2565032 = fieldWeight in 219, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=219)
          0.50833404 = weight(abstract_txt:fusion in 219) [ClassicSimilarity], result of:
            0.50833404 = score(doc=219,freq=6.0), product of:
              0.42868277 = queryWeight, product of:
                3.014689 = boost
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.01835845 = queryNorm
              1.1858047 = fieldWeight in 219, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.0625 = fieldNorm(doc=219)
        0.24 = coord(6/25)
    
  5. Larsen, B.; Ingwersen, P.; Lund, B.: Data fusion according to the principle of polyrepresentation (2009) 0.16
    0.16266367 = sum of:
      0.16266367 = product of:
        0.6777653 = sum of:
          0.026269814 = weight(abstract_txt:methods in 2752) [ClassicSimilarity], result of:
            0.026269814 = score(doc=2752,freq=2.0), product of:
              0.08191168 = queryWeight, product of:
                1.075974 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.01835845 = queryNorm
              0.320709 = fieldWeight in 2752, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2752)
          0.025949111 = weight(abstract_txt:relevant in 2752) [ClassicSimilarity], result of:
            0.025949111 = score(doc=2752,freq=1.0), product of:
              0.102360606 = queryWeight, product of:
                1.2028052 = boost
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.01835845 = queryNorm
              0.2535068 = fieldWeight in 2752, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2752)
          0.032449506 = weight(abstract_txt:data in 2752) [ClassicSimilarity], result of:
            0.032449506 = score(doc=2752,freq=5.0), product of:
              0.07953599 = queryWeight, product of:
                1.298543 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.01835845 = queryNorm
              0.40798518 = fieldWeight in 2752, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2752)
          0.0871997 = weight(abstract_txt:overlap in 2752) [ClassicSimilarity], result of:
            0.0871997 = score(doc=2752,freq=1.0), product of:
              0.22964723 = queryWeight, product of:
                1.8016045 = boost
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.01835845 = queryNorm
              0.37971154 = fieldWeight in 2752, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2752)
          0.025466612 = weight(abstract_txt:that in 2752) [ClassicSimilarity], result of:
            0.025466612 = score(doc=2752,freq=6.0), product of:
              0.08023342 = queryWeight, product of:
                1.8444511 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01835845 = queryNorm
              0.31740654 = fieldWeight in 2752, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2752)
          0.48043057 = weight(abstract_txt:fusion in 2752) [ClassicSimilarity], result of:
            0.48043057 = score(doc=2752,freq=7.0), product of:
              0.42868277 = queryWeight, product of:
                3.014689 = boost
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.01835845 = queryNorm
              1.1207135 = fieldWeight in 2752, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.7456436 = idf(docFreq=51, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2752)
        0.24 = coord(6/25)
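
The score trees above all follow the same arithmetic: a term's score is queryWeight times fieldWeight, and a document's score is the sum of its term scores times the coord factor. The sketch below reproduces the first entry (doc 278) from the inputs printed in its tree. The function name `classic_term_score` is hypothetical, and the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumptions inferred from the listed values, which they match to the printed precision.

```python
import math


def classic_term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Recompute one term's score from the inputs shown in an explain tree."""
    # Assumed formulas; they reproduce the idf and tf values listed above.
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    tf = math.sqrt(freq)
    query_weight = boost * idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight


# Term "methods" in doc 278, using the inputs listed in its tree.
methods_score = classic_term_score(
    freq=2.0, doc_freq=1900, max_docs=44218,
    boost=1.075974, query_norm=0.01835845, field_norm=0.0625,
)

# The eight term scores listed for doc 278, scaled by coord(8/25).
term_scores_278 = [
    0.030022642, 0.044364642, 0.047657806, 0.023649415,
    0.016803605, 0.059613705, 0.29348677, 0.26125684,
]
total_278 = sum(term_scores_278) * (8 / 25)

print(f"methods term score: {methods_score:.6f}")
print(f"doc 278 total:      {total_278:.6f}")
```

The coord factor is the share of query terms the document matches (8 of 25 for doc 278), which is why entries matching only 6 terms are scaled by 0.24 instead of 0.32.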