Document (#32925)

Author
He, B.
Ounis, I.
Title
Combining fields for query expansion and adaptive query expansion
Source
Information processing and management. 43(2007) no.5, p.1294-1307
Year
2007
Abstract
In this paper, we aim to improve query expansion for ad-hoc retrieval by proposing a more fine-grained term reweighting process. This fine-grained process uses statistics from the representation of documents in various fields, such as their titles, the anchor text of their incoming links, and their body content. The contribution of this paper is twofold: First, we propose a novel query expansion mechanism on fields by combining field evidence available in a corpus. Second, we propose an adaptive query expansion mechanism that selects an appropriate collection resource, either the local collection, or a high-quality external resource, for query expansion on a per-query basis. The two proposed query expansion approaches are thoroughly evaluated using two standard Text Retrieval Conference (TREC) Web collections, namely the WT10G collection and the large-scale .GOV2 collection. From the experimental results, we observe a statistically significant improvement compared with the baselines. Moreover, we conclude that the adaptive query expansion mechanism is very effective when the external collection used is much larger than the local collection.

Similar documents (author)

  1. Lioma, C.; Ounis, I.: ¬A syntactically-based query reformulation technique for information retrieval (2008) 4.80
    4.801181 = sum of:
      4.801181 = weight(author_txt:ounis in 4029) [ClassicSimilarity], result of:
        4.801181 = fieldWeight in 4029, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.602362 = idf(docFreq=7, maxDocs=43556)
          0.5 = fieldNorm(doc=4029)
    
  2. Chang, Y.; Ounis, I.; Kim, M.: Query reformulation using automatically generated query concepts from a document space (2006) 3.60
    3.6008856 = sum of:
      3.6008856 = weight(author_txt:ounis in 2970) [ClassicSimilarity], result of:
        3.6008856 = fieldWeight in 2970, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.602362 = idf(docFreq=7, maxDocs=43556)
          0.375 = fieldNorm(doc=2970)
    
  3. Cacheda, F.; Plachouras, V.; Ounis, I.: ¬A case study of distributed information retrieval architectures to index one terabyte of text (2005) 3.60
    3.6008856 = sum of:
      3.6008856 = weight(author_txt:ounis in 3040) [ClassicSimilarity], result of:
        3.6008856 = fieldWeight in 3040, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.602362 = idf(docFreq=7, maxDocs=43556)
          0.375 = fieldNorm(doc=3040)
    
  4. Cacheda, F.; Carneiro, V.; Plachouras, V.; Ounis, I.: Performance analysis of distributed information retrieval architectures using an improved network simulation model (2007) 3.00
    3.0007381 = sum of:
      3.0007381 = weight(author_txt:ounis in 2901) [ClassicSimilarity], result of:
        3.0007381 = fieldWeight in 2901, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.602362 = idf(docFreq=7, maxDocs=43556)
          0.3125 = fieldNorm(doc=2901)
    
  5. Gray, A.J.G.; Gray, N.; Hall, C.W.; Ounis, I.: Finding the right term : retrieving and exploring semantic concepts in astronomical vocabularies (2010) 3.00
    3.0007381 = sum of:
      3.0007381 = weight(author_txt:ounis in 1233) [ClassicSimilarity], result of:
        3.0007381 = fieldWeight in 1233, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.602362 = idf(docFreq=7, maxDocs=43556)
          0.3125 = fieldNorm(doc=1233)
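
The author scores above can be reproduced by hand. The following is a minimal sketch, assuming the listing uses the classic TF-IDF formulas that the numbers are consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), and fieldWeight = tf * idf * fieldNorm. The function names are hypothetical and chosen for illustration only; the fieldNorm value is copied from the listing rather than derived.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed idf form; it matches idf(docFreq=7, maxDocs=43556) = 9.602362 above.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def classic_tf(freq):
    # Term frequency is dampened by a square root: tf(freq=1.0) = 1.0.
    return math.sqrt(freq)

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight is the product of tf, idf and the stored field norm.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Entry 1 (doc 4029): author_txt:ounis, freq=1, docFreq=7, fieldNorm=0.5
author_idf = classic_idf(7, 43556)
author_score = field_weight(1.0, 7, 43556, 0.5)
print(author_idf, author_score)
```

Because the query has a single author term, the score is the fieldWeight itself. The five entries differ only in fieldNorm (0.5, 0.375, 0.3125), which is why records with fewer authors rank higher.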
    

Similar documents (content)

  1. Sah, M.; Wade, V.: Personalized concept-based search on the Linked Open Data (2015) 0.27
    0.2712676 = sum of:
      0.2712676 = product of:
        0.7535211 = sum of:
          0.0063345046 = weight(abstract_txt:this in 4509) [ClassicSimilarity], result of:
            0.0063345046 = score(doc=4509,freq=2.0), product of:
              0.03374117 = queryWeight, product of:
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.013899867 = queryNorm
              0.18773815 = fieldWeight in 4509, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.0546875 = fieldNorm(doc=4509)
          0.060760967 = weight(abstract_txt:selects in 4509) [ClassicSimilarity], result of:
            0.060760967 = score(doc=4509,freq=1.0), product of:
              0.1330672 = queryWeight, product of:
                1.1465548 = boost
                8.349598 = idf(docFreq=27, maxDocs=43556)
                0.013899867 = queryNorm
              0.45661864 = fieldWeight in 4509, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.349598 = idf(docFreq=27, maxDocs=43556)
                0.0546875 = fieldNorm(doc=4509)
          0.010044301 = weight(abstract_txt:their in 4509) [ClassicSimilarity], result of:
            0.010044301 = score(doc=4509,freq=1.0), product of:
              0.057806253 = queryWeight, product of:
                1.3089026 = boost
                3.1772897 = idf(docFreq=4936, maxDocs=43556)
                0.013899867 = queryNorm
              0.17375803 = fieldWeight in 4509, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1772897 = idf(docFreq=4936, maxDocs=43556)
                0.0546875 = fieldNorm(doc=4509)
          0.029887868 = weight(abstract_txt:local in 4509) [ClassicSimilarity], result of:
            0.029887868 = score(doc=4509,freq=1.0), product of:
              0.10447071 = queryWeight, product of:
                1.4367181 = boost
                5.2313323 = idf(docFreq=632, maxDocs=43556)
                0.013899867 = queryNorm
              0.2860885 = fieldWeight in 4509, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2313323 = idf(docFreq=632, maxDocs=43556)
                0.0546875 = fieldNorm(doc=4509)
          0.050921112 = weight(abstract_txt:combining in 4509) [ClassicSimilarity], result of:
            0.050921112 = score(doc=4509,freq=1.0), product of:
              0.1490264 = queryWeight, product of:
                1.715955 = boost
                6.2480807 = idf(docFreq=228, maxDocs=43556)
                0.013899867 = queryNorm
              0.3416919 = fieldWeight in 4509, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2480807 = idf(docFreq=228, maxDocs=43556)
                0.0546875 = fieldNorm(doc=4509)
          0.07960246 = weight(abstract_txt:mechanism in 4509) [ClassicSimilarity], result of:
            0.07960246 = score(doc=4509,freq=1.0), product of:
              0.22978023 = queryWeight, product of:
                2.609614 = boost
                6.3346953 = idf(docFreq=209, maxDocs=43556)
                0.013899867 = queryNorm
              0.34642866 = fieldWeight in 4509, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3346953 = idf(docFreq=209, maxDocs=43556)
                0.0546875 = fieldNorm(doc=4509)
          0.10492511 = weight(abstract_txt:adaptive in 4509) [ClassicSimilarity], result of:
            0.10492511 = score(doc=4509,freq=1.0), product of:
              0.27623665 = queryWeight, product of:
                2.8612814 = boost
                6.9456043 = idf(docFreq=113, maxDocs=43556)
                0.013899867 = queryNorm
              0.37983775 = fieldWeight in 4509, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9456043 = idf(docFreq=113, maxDocs=43556)
                0.0546875 = fieldNorm(doc=4509)
          0.14171667 = weight(abstract_txt:query in 4509) [ClassicSimilarity], result of:
            0.14171667 = score(doc=4509,freq=2.0), product of:
              0.3863724 = queryWeight, product of:
                5.8611603 = boost
                4.742549 = idf(docFreq=1031, maxDocs=43556)
                0.013899867 = queryNorm
              0.3667878 = fieldWeight in 4509, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.742549 = idf(docFreq=1031, maxDocs=43556)
                0.0546875 = fieldNorm(doc=4509)
          0.26932812 = weight(abstract_txt:expansion in 4509) [ClassicSimilarity], result of:
            0.26932812 = score(doc=4509,freq=2.0), product of:
              0.56998366 = queryWeight, product of:
                6.7117457 = boost
                6.1096487 = idf(docFreq=262, maxDocs=43556)
                0.013899867 = queryNorm
              0.472519 = fieldWeight in 4509, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1096487 = idf(docFreq=262, maxDocs=43556)
                0.0546875 = fieldNorm(doc=4509)
        0.36 = coord(9/25)
    
  2. Efthimiadis, E.N.: End-users' understanding of thesaural knowledge structures in interactive query expansion (1994) 0.20
    0.20095377 = sum of:
      0.20095377 = product of:
        1.0047688 = sum of:
          0.010238105 = weight(abstract_txt:this in 691) [ClassicSimilarity], result of:
            0.010238105 = score(doc=691,freq=1.0), product of:
              0.03374117 = queryWeight, product of:
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.013899867 = queryNorm
              0.30343068 = fieldWeight in 691, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.125 = fieldNorm(doc=691)
          0.032041483 = weight(abstract_txt:process in 691) [ClassicSimilarity], result of:
            0.032041483 = score(doc=691,freq=1.0), product of:
              0.06306548 = queryWeight, product of:
                1.1162723 = boost
                4.064535 = idf(docFreq=2032, maxDocs=43556)
                0.013899867 = queryNorm
              0.5080669 = fieldWeight in 691, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.064535 = idf(docFreq=2032, maxDocs=43556)
                0.125 = fieldNorm(doc=691)
          0.022958402 = weight(abstract_txt:their in 691) [ClassicSimilarity], result of:
            0.022958402 = score(doc=691,freq=1.0), product of:
              0.057806253 = queryWeight, product of:
                1.3089026 = boost
                3.1772897 = idf(docFreq=4936, maxDocs=43556)
                0.013899867 = queryNorm
              0.39716122 = fieldWeight in 691, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1772897 = idf(docFreq=4936, maxDocs=43556)
                0.125 = fieldNorm(doc=691)
          0.32392383 = weight(abstract_txt:query in 691) [ClassicSimilarity], result of:
            0.32392383 = score(doc=691,freq=2.0), product of:
              0.3863724 = queryWeight, product of:
                5.8611603 = boost
                4.742549 = idf(docFreq=1031, maxDocs=43556)
                0.013899867 = queryNorm
              0.8383721 = fieldWeight in 691, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.742549 = idf(docFreq=1031, maxDocs=43556)
                0.125 = fieldNorm(doc=691)
          0.6156071 = weight(abstract_txt:expansion in 691) [ClassicSimilarity], result of:
            0.6156071 = score(doc=691,freq=2.0), product of:
              0.56998366 = queryWeight, product of:
                6.7117457 = boost
                6.1096487 = idf(docFreq=262, maxDocs=43556)
                0.013899867 = queryNorm
              1.0800434 = fieldWeight in 691, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1096487 = idf(docFreq=262, maxDocs=43556)
                0.125 = fieldNorm(doc=691)
        0.2 = coord(5/25)
    
  3. Qiu, Y.; Frei, H.P.: Concept based query expansion (1993) 0.20
    0.1994793 = sum of:
      0.1994793 = product of:
        1.2467456 = sum of:
          0.0344376 = weight(abstract_txt:their in 2678) [ClassicSimilarity], result of:
            0.0344376 = score(doc=2678,freq=1.0), product of:
              0.057806253 = queryWeight, product of:
                1.3089026 = boost
                3.1772897 = idf(docFreq=4936, maxDocs=43556)
                0.013899867 = queryNorm
              0.5957418 = fieldWeight in 2678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1772897 = idf(docFreq=4936, maxDocs=43556)
                0.1875 = fieldNorm(doc=2678)
          0.21578494 = weight(abstract_txt:collection in 2678) [ClassicSimilarity], result of:
            0.21578494 = score(doc=2678,freq=1.0), product of:
              0.247539 = queryWeight, product of:
                3.8305113 = boost
                4.6491785 = idf(docFreq=1132, maxDocs=43556)
                0.013899867 = queryNorm
              0.87172097 = fieldWeight in 2678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6491785 = idf(docFreq=1132, maxDocs=43556)
                0.1875 = fieldNorm(doc=2678)
          0.34357312 = weight(abstract_txt:query in 2678) [ClassicSimilarity], result of:
            0.34357312 = score(doc=2678,freq=1.0), product of:
              0.3863724 = queryWeight, product of:
                5.8611603 = boost
                4.742549 = idf(docFreq=1031, maxDocs=43556)
                0.013899867 = queryNorm
              0.8892279 = fieldWeight in 2678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.742549 = idf(docFreq=1031, maxDocs=43556)
                0.1875 = fieldNorm(doc=2678)
          0.6529499 = weight(abstract_txt:expansion in 2678) [ClassicSimilarity], result of:
            0.6529499 = score(doc=2678,freq=1.0), product of:
              0.56998366 = queryWeight, product of:
                6.7117457 = boost
                6.1096487 = idf(docFreq=262, maxDocs=43556)
                0.013899867 = queryNorm
              1.1455591 = fieldWeight in 2678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1096487 = idf(docFreq=262, maxDocs=43556)
                0.1875 = fieldNorm(doc=2678)
        0.16 = coord(4/25)
    
  4. Efthimiadis, E.N.: Query expansion (1996) 0.20
    0.19912285 = sum of:
      0.19912285 = product of:
        1.6593571 = sum of:
          0.032041483 = weight(abstract_txt:process in 4913) [ClassicSimilarity], result of:
            0.032041483 = score(doc=4913,freq=1.0), product of:
              0.06306548 = queryWeight, product of:
                1.1162723 = boost
                4.064535 = idf(docFreq=2032, maxDocs=43556)
                0.013899867 = queryNorm
              0.5080669 = fieldWeight in 4913, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.064535 = idf(docFreq=2032, maxDocs=43556)
                0.125 = fieldNorm(doc=4913)
          0.56105256 = weight(abstract_txt:query in 4913) [ClassicSimilarity], result of:
            0.56105256 = score(doc=4913,freq=6.0), product of:
              0.3863724 = queryWeight, product of:
                5.8611603 = boost
                4.742549 = idf(docFreq=1031, maxDocs=43556)
                0.013899867 = queryNorm
              1.4521031 = fieldWeight in 4913, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.742549 = idf(docFreq=1031, maxDocs=43556)
                0.125 = fieldNorm(doc=4913)
          1.066263 = weight(abstract_txt:expansion in 4913) [ClassicSimilarity], result of:
            1.066263 = score(doc=4913,freq=6.0), product of:
              0.56998366 = queryWeight, product of:
                6.7117457 = boost
                6.1096487 = idf(docFreq=262, maxDocs=43556)
                0.013899867 = queryNorm
              1.8706903 = fieldWeight in 4913, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.1096487 = idf(docFreq=262, maxDocs=43556)
                0.125 = fieldNorm(doc=4913)
        0.12 = coord(3/25)
    
  5. Brandão, W.C.; Santos, R.L.T.; Ziviani, N.; Moura, E.S. de; Silva, A.S. da: Learning to expand queries using entities (2014) 0.20
    0.197636 = sum of:
      0.197636 = product of:
        0.82348335 = sum of:
          0.007239434 = weight(abstract_txt:this in 3341) [ClassicSimilarity], result of:
            0.007239434 = score(doc=3341,freq=2.0), product of:
              0.03374117 = queryWeight, product of:
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.013899867 = queryNorm
              0.21455789 = fieldWeight in 3341, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4274454 = idf(docFreq=10449, maxDocs=43556)
                0.0625 = fieldNorm(doc=3341)
          0.015842456 = weight(abstract_txt:text in 3341) [ClassicSimilarity], result of:
            0.015842456 = score(doc=3341,freq=1.0), product of:
              0.06259673 = queryWeight, product of:
                1.1121161 = boost
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.013899867 = queryNorm
              0.2530876 = fieldWeight in 3341, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.0625 = fieldNorm(doc=3341)
          0.011479201 = weight(abstract_txt:their in 3341) [ClassicSimilarity], result of:
            0.011479201 = score(doc=3341,freq=1.0), product of:
              0.057806253 = queryWeight, product of:
                1.3089026 = boost
                3.1772897 = idf(docFreq=4936, maxDocs=43556)
                0.013899867 = queryNorm
              0.19858061 = fieldWeight in 3341, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1772897 = idf(docFreq=4936, maxDocs=43556)
                0.0625 = fieldNorm(doc=3341)
          0.046157748 = weight(abstract_txt:fields in 3341) [ClassicSimilarity], result of:
            0.046157748 = score(doc=3341,freq=1.0), product of:
              0.14617175 = queryWeight, product of:
                2.0813813 = boost
                5.0524397 = idf(docFreq=756, maxDocs=43556)
                0.013899867 = queryNorm
              0.31577748 = fieldWeight in 3341, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0524397 = idf(docFreq=756, maxDocs=43556)
                0.0625 = fieldNorm(doc=3341)
          0.2560843 = weight(abstract_txt:query in 3341) [ClassicSimilarity], result of:
            0.2560843 = score(doc=3341,freq=5.0), product of:
              0.3863724 = queryWeight, product of:
                5.8611603 = boost
                4.742549 = idf(docFreq=1031, maxDocs=43556)
                0.013899867 = queryNorm
              0.6627914 = fieldWeight in 3341, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.742549 = idf(docFreq=1031, maxDocs=43556)
                0.0625 = fieldNorm(doc=3341)
          0.48668018 = weight(abstract_txt:expansion in 3341) [ClassicSimilarity], result of:
            0.48668018 = score(doc=3341,freq=5.0), product of:
              0.56998366 = queryWeight, product of:
                6.7117457 = boost
                6.1096487 = idf(docFreq=262, maxDocs=43556)
                0.013899867 = queryNorm
              0.85384935 = fieldWeight in 3341, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.1096487 = idf(docFreq=262, maxDocs=43556)
                0.0625 = fieldNorm(doc=3341)
        0.24 = coord(6/25)
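
The content scores above combine several matching terms. As a rough sketch under the same assumed formulas, each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm, and the sum is scaled by coord (matched query terms divided by the 25 query terms). The helper names are hypothetical; the boost, queryNorm and fieldNorm values are copied from the listing for entry 4 (doc 4913), not computed.

```python
import math

QUERY_NORM = 0.013899867  # queryNorm as printed in the listing
MAX_DOCS = 43556          # maxDocs as printed in the listing

def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed classic idf form, consistent with the printed idf values.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def term_score(freq, doc_freq, field_norm, boost=1.0, query_norm=QUERY_NORM):
    # Per-term score = queryWeight * fieldWeight.
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

def document_score(terms, matched, total):
    # coord rewards documents that match a larger share of the query terms.
    coord = matched / total
    return coord * sum(term_score(*t) for t in terms)

# (freq, docFreq, fieldNorm, boost) for doc 4913, taken from the listing
terms_4913 = [
    (1.0, 2032, 0.125, 1.1162723),  # abstract_txt:process
    (6.0, 1031, 0.125, 5.8611603),  # abstract_txt:query
    (6.0, 262, 0.125, 6.7117457),   # abstract_txt:expansion
]
score_4913 = document_score(terms_4913, matched=3, total=25)
print(score_4913)
```

The coord factor explains the ordering: doc 4913 has the largest raw sum (1.6593571) but matches only 3 of 25 terms, so it ends up below entry 1, which has a smaller raw sum but matches 9 of 25.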