Document (#32928)

Author
He, B.
Ounis, I.
Title
Combining fields for query expansion and adaptive query expansion
Source
Information processing and management. 43(2007) no.5, p.1294-1307
Year
2007
Abstract
In this paper, we aim to improve query expansion for ad-hoc retrieval, by proposing a more fine-grained term reweighting process. This fine-grained process uses statistics from the representation of documents in various fields, such as their titles, the anchor text of their incoming links, and their body content. The contribution of this paper is twofold: First, we propose a novel query expansion mechanism on fields by combining field evidence available in a corpus. Second, we propose an adaptive query expansion mechanism that selects an appropriate collection resource, either the local collection, or a high-quality external resource, for query expansion on a per-query basis. The two proposed query expansion approaches are thoroughly evaluated using two standard Text Retrieval Conference (TREC) Web collections, namely the WT10G collection and the large-scale .GOV2 collection. From the experimental results, we observe a statistically significant improvement compared with the baselines. Moreover, we conclude that the adaptive query expansion mechanism is very effective when the external collection used is much larger than the local collection.

Similar documents (author)

  1. Lioma, C.; Ounis, I.: A syntactically-based query reformulation technique for information retrieval (2008) 4.86
    4.85849 = sum of:
      4.85849 = weight(author_txt:ounis in 4032) [ClassicSimilarity], result of:
        4.85849 = fieldWeight in 4032, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.71698 = idf(docFreq=6, maxDocs=42740)
          0.5 = fieldNorm(doc=4032)
    
  2. Chang, Y.; Ounis, I.; Kim, M.: Query reformulation using automatically generated query concepts from a document space (2006) 3.64
    3.6438675 = sum of:
      3.6438675 = weight(author_txt:ounis in 2973) [ClassicSimilarity], result of:
        3.6438675 = fieldWeight in 2973, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.71698 = idf(docFreq=6, maxDocs=42740)
          0.375 = fieldNorm(doc=2973)
    
  3. Cacheda, F.; Plachouras, V.; Ounis, I.: A case study of distributed information retrieval architectures to index one terabyte of text (2005) 3.64
    3.6438675 = sum of:
      3.6438675 = weight(author_txt:ounis in 3043) [ClassicSimilarity], result of:
        3.6438675 = fieldWeight in 3043, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.71698 = idf(docFreq=6, maxDocs=42740)
          0.375 = fieldNorm(doc=3043)
    
  4. Cacheda, F.; Carneiro, V.; Plachouras, V.; Ounis, I.: Performance analysis of distributed information retrieval architectures using an improved network simulation model (2007) 3.04
    3.0365562 = sum of:
      3.0365562 = weight(author_txt:ounis in 2904) [ClassicSimilarity], result of:
        3.0365562 = fieldWeight in 2904, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.71698 = idf(docFreq=6, maxDocs=42740)
          0.3125 = fieldNorm(doc=2904)
    
  5. Gray, A.J.G.; Gray, N.; Hall, C.W.; Ounis, I.: Finding the right term : retrieving and exploring semantic concepts in astronomical vocabularies (2010) 3.04
    3.0365562 = sum of:
      3.0365562 = weight(author_txt:ounis in 1236) [ClassicSimilarity], result of:
        3.0365562 = fieldWeight in 1236, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.71698 = idf(docFreq=6, maxDocs=42740)
          0.3125 = fieldNorm(doc=1236)
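
The author scores above are each a single fieldWeight, the product of tf, idf, and fieldNorm. The following is a minimal Python sketch of that arithmetic, not the search engine's own code. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is the variant that reproduces the idf value listed here. The function names `classic_idf` and `field_weight` are illustrative, and fieldNorm is taken as given from the listing rather than recomputed.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Assumed idf variant: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
    return math.sqrt(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: author_txt:ounis, docFreq=6, maxDocs=42740.
# Only fieldNorm differs between the matched documents.
for doc_id, norm in [(4032, 0.5), (2973, 0.375), (2904, 0.3125)]:
    print(doc_id, round(field_weight(1.0, 6, 42740, norm), 4))
```

Because every match has termFreq=1.0 and the same idf, the ranking in this list is decided by fieldNorm alone, which is larger for shorter author fields.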
    

Similar documents (content)

  1. Sah, M.; Wade, V.: Personalized concept-based search on the Linked Open Data (2015) 0.27
    0.2706026 = sum of:
      0.2706026 = product of:
        0.75167394 = sum of:
          0.0064262166 = weight(abstract_txt:this in 4512) [ClassicSimilarity], result of:
            0.0064262166 = score(doc=4512,freq=2.0), product of:
              0.034011792 = queryWeight, product of:
                1.001353 = boost
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.013903352 = queryNorm
              0.18894084 = fieldWeight in 4512, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4512)
          0.06006107 = weight(abstract_txt:selects in 4512) [ClassicSimilarity], result of:
            0.06006107 = score(doc=4512,freq=1.0), product of:
              0.13183303 = queryWeight, product of:
                1.1382141 = boost
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.013903352 = queryNorm
              0.45558438 = fieldWeight in 4512, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4512)
          0.0101373885 = weight(abstract_txt:their in 4512) [ClassicSimilarity], result of:
            0.0101373885 = score(doc=4512,freq=1.0), product of:
              0.0580702 = queryWeight, product of:
                1.3084259 = boost
                3.1921601 = idf(docFreq=4772, maxDocs=42740)
                0.013903352 = queryNorm
              0.17457126 = fieldWeight in 4512, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1921601 = idf(docFreq=4772, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4512)
          0.02980419 = weight(abstract_txt:local in 4512) [ClassicSimilarity], result of:
            0.02980419 = score(doc=4512,freq=1.0), product of:
              0.10410952 = queryWeight, product of:
                1.4304479 = boost
                5.234785 = idf(docFreq=618, maxDocs=42740)
                0.013903352 = queryNorm
              0.2862773 = fieldWeight in 4512, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.234785 = idf(docFreq=618, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4512)
          0.05141938 = weight(abstract_txt:combining in 4512) [ClassicSimilarity], result of:
            0.05141938 = score(doc=4512,freq=1.0), product of:
              0.14975803 = queryWeight, product of:
                1.715623 = boost
                6.2783957 = idf(docFreq=217, maxDocs=42740)
                0.013903352 = queryNorm
              0.34334975 = fieldWeight in 4512, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2783957 = idf(docFreq=217, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4512)
          0.080922656 = weight(abstract_txt:mechanism in 4512) [ClassicSimilarity], result of:
            0.080922656 = score(doc=4512,freq=1.0), product of:
              0.23194377 = queryWeight, product of:
                2.6149526 = boost
                6.379687 = idf(docFreq=196, maxDocs=42740)
                0.013903352 = queryNorm
              0.3488891 = fieldWeight in 4512, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.379687 = idf(docFreq=196, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4512)
          0.10357373 = weight(abstract_txt:adaptive in 4512) [ClassicSimilarity], result of:
            0.10357373 = score(doc=4512,freq=1.0), product of:
              0.2734234 = queryWeight, product of:
                2.8391628 = boost
                6.926692 = idf(docFreq=113, maxDocs=42740)
                0.013903352 = queryNorm
              0.37880346 = fieldWeight in 4512, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.926692 = idf(docFreq=113, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4512)
          0.14022347 = weight(abstract_txt:query in 4512) [ClassicSimilarity], result of:
            0.14022347 = score(doc=4512,freq=2.0), product of:
              0.3830424 = queryWeight, product of:
                5.8204494 = boost
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.013903352 = queryNorm
              0.3660782 = fieldWeight in 4512, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4512)
          0.2691058 = weight(abstract_txt:expansion in 4512) [ClassicSimilarity], result of:
            0.2691058 = score(doc=4512,freq=2.0), product of:
              0.5687624 = queryWeight, product of:
                6.686861 = boost
                6.117713 = idf(docFreq=255, maxDocs=42740)
                0.013903352 = queryNorm
              0.4731427 = fieldWeight in 4512, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.117713 = idf(docFreq=255, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4512)
        0.36 = coord(9/25)
    
  2. Efthimiadis, E.N.: End-users' understanding of thesaural knowledge structures in interactive query expansion (1994) 0.20
    0.2002633 = sum of:
      0.2002633 = product of:
        1.0013165 = sum of:
          0.010386334 = weight(abstract_txt:this in 609) [ClassicSimilarity], result of:
            0.010386334 = score(doc=609,freq=1.0), product of:
              0.034011792 = queryWeight, product of:
                1.001353 = boost
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.013903352 = queryNorm
              0.3053745 = fieldWeight in 609, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.125 = fieldNorm(doc=609)
          0.032149266 = weight(abstract_txt:process in 609) [ClassicSimilarity], result of:
            0.032149266 = score(doc=609,freq=1.0), product of:
              0.06310614 = queryWeight, product of:
                1.1136857 = boost
                4.07558 = idf(docFreq=1972, maxDocs=42740)
                0.013903352 = queryNorm
              0.5094475 = fieldWeight in 609, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.07558 = idf(docFreq=1972, maxDocs=42740)
                0.125 = fieldNorm(doc=609)
          0.023171173 = weight(abstract_txt:their in 609) [ClassicSimilarity], result of:
            0.023171173 = score(doc=609,freq=1.0), product of:
              0.0580702 = queryWeight, product of:
                1.3084259 = boost
                3.1921601 = idf(docFreq=4772, maxDocs=42740)
                0.013903352 = queryNorm
              0.39902002 = fieldWeight in 609, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1921601 = idf(docFreq=4772, maxDocs=42740)
                0.125 = fieldNorm(doc=609)
          0.32051077 = weight(abstract_txt:query in 609) [ClassicSimilarity], result of:
            0.32051077 = score(doc=609,freq=2.0), product of:
              0.3830424 = queryWeight, product of:
                5.8204494 = boost
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.013903352 = queryNorm
              0.83675015 = fieldWeight in 609, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.125 = fieldNorm(doc=609)
          0.61509895 = weight(abstract_txt:expansion in 609) [ClassicSimilarity], result of:
            0.61509895 = score(doc=609,freq=2.0), product of:
              0.5687624 = queryWeight, product of:
                6.686861 = boost
                6.117713 = idf(docFreq=255, maxDocs=42740)
                0.013903352 = queryNorm
              1.081469 = fieldWeight in 609, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.117713 = idf(docFreq=255, maxDocs=42740)
                0.125 = fieldNorm(doc=609)
        0.2 = coord(5/25)
    
  3. Qiu, Y.; Frei, H.P.: Concept based query expansion (1993) 0.20
    0.1989773 = sum of:
      0.1989773 = product of:
        1.2436082 = sum of:
          0.03475676 = weight(abstract_txt:their in 2678) [ClassicSimilarity], result of:
            0.03475676 = score(doc=2678,freq=1.0), product of:
              0.0580702 = queryWeight, product of:
                1.3084259 = boost
                3.1921601 = idf(docFreq=4772, maxDocs=42740)
                0.013903352 = queryNorm
              0.59853005 = fieldWeight in 2678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1921601 = idf(docFreq=4772, maxDocs=42740)
                0.1875 = fieldNorm(doc=2678)
          0.21648751 = weight(abstract_txt:collection in 2678) [ClassicSimilarity], result of:
            0.21648751 = score(doc=2678,freq=1.0), product of:
              0.24768083 = queryWeight, product of:
                3.8214982 = boost
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.013903352 = queryNorm
              0.8740584 = fieldWeight in 2678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.1875 = fieldNorm(doc=2678)
          0.33995304 = weight(abstract_txt:query in 2678) [ClassicSimilarity], result of:
            0.33995304 = score(doc=2678,freq=1.0), product of:
              0.3830424 = queryWeight, product of:
                5.8204494 = boost
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.013903352 = queryNorm
              0.88750756 = fieldWeight in 2678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.1875 = fieldNorm(doc=2678)
          0.6524109 = weight(abstract_txt:expansion in 2678) [ClassicSimilarity], result of:
            0.6524109 = score(doc=2678,freq=1.0), product of:
              0.5687624 = queryWeight, product of:
                6.686861 = boost
                6.117713 = idf(docFreq=255, maxDocs=42740)
                0.013903352 = queryNorm
              1.1470711 = fieldWeight in 2678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.117713 = idf(docFreq=255, maxDocs=42740)
                0.1875 = fieldNorm(doc=2678)
        0.16 = coord(4/25)
    
  4. Efthimiadis, E.N.: Query expansion (1996) 0.20
    0.19832076 = sum of:
      0.19832076 = product of:
        1.652673 = sum of:
          0.032149266 = weight(abstract_txt:process in 4916) [ClassicSimilarity], result of:
            0.032149266 = score(doc=4916,freq=1.0), product of:
              0.06310614 = queryWeight, product of:
                1.1136857 = boost
                4.07558 = idf(docFreq=1972, maxDocs=42740)
                0.013903352 = queryNorm
              0.5094475 = fieldWeight in 4916, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.07558 = idf(docFreq=1972, maxDocs=42740)
                0.125 = fieldNorm(doc=4916)
          0.555141 = weight(abstract_txt:query in 4916) [ClassicSimilarity], result of:
            0.555141 = score(doc=4916,freq=6.0), product of:
              0.3830424 = queryWeight, product of:
                5.8204494 = boost
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.013903352 = queryNorm
              1.4492939 = fieldWeight in 4916, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.125 = fieldNorm(doc=4916)
          1.0653827 = weight(abstract_txt:expansion in 4916) [ClassicSimilarity], result of:
            1.0653827 = score(doc=4916,freq=6.0), product of:
              0.5687624 = queryWeight, product of:
                6.686861 = boost
                6.117713 = idf(docFreq=255, maxDocs=42740)
                0.013903352 = queryNorm
              1.8731594 = fieldWeight in 4916, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.117713 = idf(docFreq=255, maxDocs=42740)
                0.125 = fieldNorm(doc=4916)
        0.12 = coord(3/25)
    
  5. Brandão, W.C.; Santos, R.L.T.; Ziviani, N.; Moura, E.S. de; Silva, A.S. da: Learning to expand queries using entities (2014) 0.20
    0.19697885 = sum of:
      0.19697885 = product of:
        0.8207452 = sum of:
          0.0073442473 = weight(abstract_txt:this in 3344) [ClassicSimilarity], result of:
            0.0073442473 = score(doc=3344,freq=2.0), product of:
              0.034011792 = queryWeight, product of:
                1.001353 = boost
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.013903352 = queryNorm
              0.21593238 = fieldWeight in 3344, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.0625 = fieldNorm(doc=3344)
          0.015774546 = weight(abstract_txt:text in 3344) [ClassicSimilarity], result of:
            0.015774546 = score(doc=3344,freq=1.0), product of:
              0.062318284 = queryWeight, product of:
                1.1067119 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.013903352 = queryNorm
              0.2531287 = fieldWeight in 3344, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=3344)
          0.011585587 = weight(abstract_txt:their in 3344) [ClassicSimilarity], result of:
            0.011585587 = score(doc=3344,freq=1.0), product of:
              0.0580702 = queryWeight, product of:
                1.3084259 = boost
                3.1921601 = idf(docFreq=4772, maxDocs=42740)
                0.013903352 = queryNorm
              0.19951001 = fieldWeight in 3344, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1921601 = idf(docFreq=4772, maxDocs=42740)
                0.0625 = fieldNorm(doc=3344)
          0.04637634 = weight(abstract_txt:fields in 3344) [ClassicSimilarity], result of:
            0.04637634 = score(doc=3344,freq=1.0), product of:
              0.14639929 = queryWeight, product of:
                2.0775044 = boost
                5.068477 = idf(docFreq=730, maxDocs=42740)
                0.013903352 = queryNorm
              0.31677982 = fieldWeight in 3344, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.068477 = idf(docFreq=730, maxDocs=42740)
                0.0625 = fieldNorm(doc=3344)
          0.25338602 = weight(abstract_txt:query in 3344) [ClassicSimilarity], result of:
            0.25338602 = score(doc=3344,freq=5.0), product of:
              0.3830424 = queryWeight, product of:
                5.8204494 = boost
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.013903352 = queryNorm
              0.6615091 = fieldWeight in 3344, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.0625 = fieldNorm(doc=3344)
          0.48627844 = weight(abstract_txt:expansion in 3344) [ClassicSimilarity], result of:
            0.48627844 = score(doc=3344,freq=5.0), product of:
              0.5687624 = queryWeight, product of:
                6.686861 = boost
                6.117713 = idf(docFreq=255, maxDocs=42740)
                0.013903352 = queryNorm
              0.8549764 = fieldWeight in 3344, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.117713 = idf(docFreq=255, maxDocs=42740)
                0.0625 = fieldNorm(doc=3344)
        0.24 = coord(6/25)
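
The content scores above combine three steps: each matching term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is scaled by coord (matched query terms over total query terms). The sketch below reworks entry 4 (doc 4916) under those assumptions. It is an illustration of the arithmetic shown in the listing, not the engine's implementation; the names `term_score`, `MAX_DOCS`, and `QUERY_NORM` are hypothetical, and the idf formula is the variant that matches the listed idf values.

```python
import math

MAX_DOCS = 42740          # maxDocs from the listing
QUERY_NORM = 0.013903352  # queryNorm from the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Assumed idf variant: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(freq, doc_freq, boost, field_norm):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (termFreq, docFreq, boost) for the three terms matched in doc 4916.
terms = [
    (1.0, 1972, 1.1136857),  # process
    (6.0, 1021, 5.8204494),  # query
    (6.0, 255, 6.686861),    # expansion
]

raw = sum(term_score(f, df, b, field_norm=0.125) for f, df, b in terms)
coord = 3 / 25            # 3 of the 25 query terms matched
score = raw * coord
print(round(raw, 4), round(score, 4))
```

Under these assumptions the result agrees with the listed 0.19832076 to about four decimal places; small differences are expected because the listing uses single-precision arithmetic. The coord factor explains why entry 1, which matches 9 of 25 terms, outranks entries with larger raw sums but fewer matched terms.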