Document (#32927)

Author
He, B.
Ounis, I.
Title
Combining fields for query expansion and adaptive query expansion
Source
Information processing and management. 43(2007) no.5, pp.1294-1307
Year
2007
Abstract
In this paper, we aim to improve query expansion for ad-hoc retrieval by proposing a more fine-grained term reweighting process. This fine-grained process uses statistics from the representation of documents in various fields, such as their titles, the anchor text of their incoming links, and their body content. The contribution of this paper is twofold. First, we propose a novel query expansion mechanism on fields that combines the field evidence available in a corpus. Second, we propose an adaptive query expansion mechanism that selects an appropriate collection resource for query expansion on a per-query basis: either the local collection or a high-quality external resource. The two proposed query expansion approaches are thoroughly evaluated using two standard Text Retrieval Conference (TREC) Web collections, namely the WT10G collection and the large-scale .GOV2 collection. From the experimental results, we observe a statistically significant improvement compared with the baselines. Moreover, we conclude that the adaptive query expansion mechanism is very effective when the external collection used is much larger than the local collection.
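The record gives no formulas for the field-based reweighting, so the following is only a minimal sketch of the general idea of combining field evidence when choosing expansion terms; it is not He and Ounis's method. It assumes a per-field, length-normalised term frequency summed with hypothetical field weights (`FIELD_WEIGHTS`), and scores candidate terms drawn from a pseudo-relevant feedback set with a Lucene-style idf. The names `combined_tf` and `rank_expansion_terms`, and all weights, are illustrative assumptions.

```python
import math

# Hypothetical field weights (assumption): the paper's actual settings are not given here.
FIELD_WEIGHTS = {"title": 3.0, "anchor": 2.0, "body": 1.0}


def combined_tf(doc, term, avg_len, weights=FIELD_WEIGHTS):
    """Weighted sum of per-field term frequencies, each normalised by field length.

    doc maps field name -> list of tokens; avg_len maps field name -> the
    average length of that field in the collection.
    """
    total = 0.0
    for field, tokens in doc.items():
        if not tokens:
            continue  # an empty field contributes no evidence
        tf = tokens.count(term)
        # Rescale the raw count by how long this field is relative to the average.
        norm_tf = tf * avg_len[field] / len(tokens)
        total += weights.get(field, 0.0) * norm_tf
    return total


def rank_expansion_terms(feedback_docs, avg_len, doc_freq, n_docs, k=5):
    """Rank candidate expansion terms found in the (pseudo-)relevant documents."""
    candidates = {t for d in feedback_docs for toks in d.values() for t in toks}
    scores = {}
    for term in candidates:
        tf_sum = sum(combined_tf(d, term, avg_len) for d in feedback_docs)
        # Lucene-style idf (assumption): 1 + ln(N / (df + 1)).
        idf = 1.0 + math.log(n_docs / (doc_freq.get(term, 0) + 1))
        scores[term] = tf_sum * idf
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

A larger weight makes an occurrence in that field count for more than one in the body. For brevity the sketch neither removes stopwords nor excludes the original query terms from the candidates.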

Similar documents (author)

  1. Lioma, C.; Ounis, I.: A syntactically-based query reformulation technique for information retrieval (2008) 4.81
    4.808723 = sum of:
      4.808723 = weight(author_txt:ounis in 2031) [ClassicSimilarity], result of:
        4.808723 = fieldWeight in 2031, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.5 = fieldNorm(doc=2031)
    
  2. Chang, Y.; Ounis, I.; Kim, M.: Query reformulation using automatically generated query concepts from a document space (2006) 3.61
    3.606542 = sum of:
      3.606542 = weight(author_txt:ounis in 972) [ClassicSimilarity], result of:
        3.606542 = fieldWeight in 972, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.375 = fieldNorm(doc=972)
    
  3. Cacheda, F.; Plachouras, V.; Ounis, I.: A case study of distributed information retrieval architectures to index one terabyte of text (2005) 3.61
    3.606542 = sum of:
      3.606542 = weight(author_txt:ounis in 1042) [ClassicSimilarity], result of:
        3.606542 = fieldWeight in 1042, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.375 = fieldNorm(doc=1042)
    
  4. Cacheda, F.; Carneiro, V.; Plachouras, V.; Ounis, I.: Performance analysis of distributed information retrieval architectures using an improved network simulation model (2007) 3.01
    3.005452 = sum of:
      3.005452 = weight(author_txt:ounis in 903) [ClassicSimilarity], result of:
        3.005452 = fieldWeight in 903, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.3125 = fieldNorm(doc=903)
    
  5. Gray, A.J.G.; Gray, N.; Hall, C.W.; Ounis, I.: Finding the right term : retrieving and exploring semantic concepts in astronomical vocabularies (2010) 3.01
    3.005452 = sum of:
      3.005452 = weight(author_txt:ounis in 4235) [ClassicSimilarity], result of:
        3.005452 = fieldWeight in 4235, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.3125 = fieldNorm(doc=4235)
    
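Each author-field explanation above is a single ClassicSimilarity term score: fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq). The sketch below recomputes the three distinct values listed, assuming the idf form 1 + ln(maxDocs / (docFreq + 1)), which reproduces the printed 9.617446 for docFreq=7 and maxDocs=44218. fieldNorm is copied from the output rather than derived, and the function names are illustrative.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf form that reproduces the printed values: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # ClassicSimilarity tf is the square root of the raw term frequency.
    tf = math.sqrt(freq)
    return tf * classic_idf(doc_freq, max_docs) * field_norm


# "ounis" occurs once in each author field; docFreq=7, maxDocs=44218.
# The fieldNorm values 0.5, 0.375 and 0.3125 are copied from the explain output.
for norm in (0.5, 0.375, 0.3125):
    print(norm, field_weight(1.0, 7, 44218, norm))
```

Since tf and idf are identical for every hit, the order of this list depends only on fieldNorm. In ClassicSimilarity that norm is roughly 1/sqrt(number of terms in the field), stored with one-byte precision, which is consistent with entries listing fewer co-authors scoring higher.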

Similar documents (content)

  1. Sah, M.; Wade, V.: Personalized concept-based search on the Linked Open Data (2015) 0.27
    0.27189368 = sum of:
      0.27189368 = product of:
        0.7552602 = sum of:
          0.0062425253 = weight(abstract_txt:this in 2511) [ClassicSimilarity], result of:
            0.0062425253 = score(doc=2511,freq=2.0), product of:
              0.03345005 = queryWeight, product of:
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.013862331 = queryNorm
              0.1866223 = fieldWeight in 2511, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2511)
          0.061290205 = weight(abstract_txt:selects in 2511) [ClassicSimilarity], result of:
            0.061290205 = score(doc=2511,freq=1.0), product of:
              0.13398418 = queryWeight, product of:
                1.1554941 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.013862331 = queryNorm
              0.4574436 = fieldWeight in 2511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2511)
          0.009908745 = weight(abstract_txt:their in 2511) [ClassicSimilarity], result of:
            0.009908745 = score(doc=2511,freq=1.0), product of:
              0.057347216 = queryWeight, product of:
                1.3093562 = boost
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.013862331 = queryNorm
              0.17278512 = fieldWeight in 2511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2511)
          0.02984227 = weight(abstract_txt:local in 2511) [ClassicSimilarity], result of:
            0.02984227 = score(doc=2511,freq=1.0), product of:
              0.1044778 = queryWeight, product of:
                1.4430056 = boost
                5.2229967 = idf(docFreq=647, maxDocs=44218)
                0.013862331 = queryNorm
              0.28563264 = fieldWeight in 2511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2229967 = idf(docFreq=647, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2511)
          0.050719574 = weight(abstract_txt:combining in 2511) [ClassicSimilarity], result of:
            0.050719574 = score(doc=2511,freq=1.0), product of:
              0.14879438 = queryWeight, product of:
                1.7220639 = boost
                6.2330556 = idf(docFreq=235, maxDocs=44218)
                0.013862331 = queryNorm
              0.34087023 = fieldWeight in 2511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2330556 = idf(docFreq=235, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2511)
          0.0791944 = weight(abstract_txt:mechanism in 2511) [ClassicSimilarity], result of:
            0.0791944 = score(doc=2511,freq=1.0), product of:
              0.22924305 = queryWeight, product of:
                2.6178799 = boost
                6.31699 = idf(docFreq=216, maxDocs=44218)
                0.013862331 = queryNorm
              0.34546039 = fieldWeight in 2511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.31699 = idf(docFreq=216, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2511)
          0.105162546 = weight(abstract_txt:adaptive in 2511) [ClassicSimilarity], result of:
            0.105162546 = score(doc=2511,freq=1.0), product of:
              0.2769538 = queryWeight, product of:
                2.8774333 = boost
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.013862331 = queryNorm
              0.37971154 = fieldWeight in 2511, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.943297 = idf(docFreq=115, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2511)
          0.14319015 = weight(abstract_txt:query in 2511) [ClassicSimilarity], result of:
            0.14319015 = score(doc=2511,freq=2.0), product of:
              0.38946855 = queryWeight, product of:
                5.910149 = boost
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.013862331 = queryNorm
              0.36765522 = fieldWeight in 2511, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2511)
          0.2697098 = weight(abstract_txt:expansion in 2511) [ClassicSimilarity], result of:
            0.2697098 = score(doc=2511,freq=2.0), product of:
              0.57114184 = queryWeight, product of:
                6.747734 = boost
                6.1059003 = idf(docFreq=267, maxDocs=44218)
                0.013862331 = queryNorm
              0.47222912 = fieldWeight in 2511, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1059003 = idf(docFreq=267, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2511)
        0.36 = coord(9/25)
    
  2. Efthimiadis, E.N.: End-users' understanding of thesaural knowledge structures in interactive query expansion (1994) 0.20
    0.20166712 = sum of:
      0.20166712 = product of:
        1.0083356 = sum of:
          0.010089444 = weight(abstract_txt:this in 5693) [ClassicSimilarity], result of:
            0.010089444 = score(doc=5693,freq=1.0), product of:
              0.03345005 = queryWeight, product of:
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.013862331 = queryNorm
              0.3016272 = fieldWeight in 5693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.125 = fieldNorm(doc=5693)
          0.031826217 = weight(abstract_txt:process in 5693) [ClassicSimilarity], result of:
            0.031826217 = score(doc=5693,freq=1.0), product of:
              0.0628509 = queryWeight, product of:
                1.1192104 = boost
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.013862331 = queryNorm
              0.50637645 = fieldWeight in 5693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.125 = fieldNorm(doc=5693)
          0.022648562 = weight(abstract_txt:their in 5693) [ClassicSimilarity], result of:
            0.022648562 = score(doc=5693,freq=1.0), product of:
              0.057347216 = queryWeight, product of:
                1.3093562 = boost
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.013862331 = queryNorm
              0.39493743 = fieldWeight in 5693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.125 = fieldNorm(doc=5693)
          0.32729176 = weight(abstract_txt:query in 5693) [ClassicSimilarity], result of:
            0.32729176 = score(doc=5693,freq=2.0), product of:
              0.38946855 = queryWeight, product of:
                5.910149 = boost
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.013862331 = queryNorm
              0.8403548 = fieldWeight in 5693, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.125 = fieldNorm(doc=5693)
          0.6164796 = weight(abstract_txt:expansion in 5693) [ClassicSimilarity], result of:
            0.6164796 = score(doc=5693,freq=2.0), product of:
              0.57114184 = queryWeight, product of:
                6.747734 = boost
                6.1059003 = idf(docFreq=267, maxDocs=44218)
                0.013862331 = queryNorm
              1.0793809 = fieldWeight in 5693, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1059003 = idf(docFreq=267, maxDocs=44218)
                0.125 = fieldNorm(doc=5693)
        0.2 = coord(5/25)
    
  3. Qiu, Y.; Frei, H.P.: Concept based query expansion (1993) 0.20
    0.20022205 = sum of:
      0.20022205 = product of:
        1.2513878 = sum of:
          0.033972844 = weight(abstract_txt:their in 2678) [ClassicSimilarity], result of:
            0.033972844 = score(doc=2678,freq=1.0), product of:
              0.057347216 = queryWeight, product of:
                1.3093562 = boost
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.013862331 = queryNorm
              0.59240615 = fieldWeight in 2678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.1875 = fieldNorm(doc=2678)
          0.2163943 = weight(abstract_txt:collection in 2678) [ClassicSimilarity], result of:
            0.2163943 = score(doc=2678,freq=1.0), product of:
              0.24827422 = queryWeight, product of:
                3.852853 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.013862331 = queryNorm
              0.87159395 = fieldWeight in 2678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.1875 = fieldNorm(doc=2678)
          0.34714532 = weight(abstract_txt:query in 2678) [ClassicSimilarity], result of:
            0.34714532 = score(doc=2678,freq=1.0), product of:
              0.38946855 = queryWeight, product of:
                5.910149 = boost
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.013862331 = queryNorm
              0.89133084 = fieldWeight in 2678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.1875 = fieldNorm(doc=2678)
          0.65387535 = weight(abstract_txt:expansion in 2678) [ClassicSimilarity], result of:
            0.65387535 = score(doc=2678,freq=1.0), product of:
              0.57114184 = queryWeight, product of:
                6.747734 = boost
                6.1059003 = idf(docFreq=267, maxDocs=44218)
                0.013862331 = queryNorm
              1.1448563 = fieldWeight in 2678, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1059003 = idf(docFreq=267, maxDocs=44218)
                0.1875 = fieldNorm(doc=2678)
        0.16 = coord(4/25)
    
  4. Efthimiadis, E.N.: Query expansion (1996) 0.20
    0.19997835 = sum of:
      0.19997835 = product of:
        1.6664863 = sum of:
          0.031826217 = weight(abstract_txt:process in 4847) [ClassicSimilarity], result of:
            0.031826217 = score(doc=4847,freq=1.0), product of:
              0.0628509 = queryWeight, product of:
                1.1192104 = boost
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.013862331 = queryNorm
              0.50637645 = fieldWeight in 4847, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.125 = fieldNorm(doc=4847)
          0.566886 = weight(abstract_txt:query in 4847) [ClassicSimilarity], result of:
            0.566886 = score(doc=4847,freq=6.0), product of:
              0.38946855 = queryWeight, product of:
                5.910149 = boost
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.013862331 = queryNorm
              1.4555373 = fieldWeight in 4847, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.125 = fieldNorm(doc=4847)
          1.067774 = weight(abstract_txt:expansion in 4847) [ClassicSimilarity], result of:
            1.067774 = score(doc=4847,freq=6.0), product of:
              0.57114184 = queryWeight, product of:
                6.747734 = boost
                6.1059003 = idf(docFreq=267, maxDocs=44218)
                0.013862331 = queryNorm
              1.8695426 = fieldWeight in 4847, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.1059003 = idf(docFreq=267, maxDocs=44218)
                0.125 = fieldNorm(doc=4847)
        0.12 = coord(3/25)
    
  5. Brandão, W.C.; Santos, R.L.T.; Ziviani, N.; Moura, E.S. de; Silva, A.S. da: Learning to expand queries using entities (2014) 0.20
    0.19827151 = sum of:
      0.19827151 = product of:
        0.82613134 = sum of:
          0.007134314 = weight(abstract_txt:this in 1343) [ClassicSimilarity], result of:
            0.007134314 = score(doc=1343,freq=2.0), product of:
              0.03345005 = queryWeight, product of:
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.013862331 = queryNorm
              0.21328263 = fieldWeight in 1343, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=1343)
          0.015829055 = weight(abstract_txt:text in 1343) [ClassicSimilarity], result of:
            0.015829055 = score(doc=1343,freq=1.0), product of:
              0.06262939 = queryWeight, product of:
                1.1172364 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.013862331 = queryNorm
              0.25274166 = fieldWeight in 1343, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=1343)
          0.011324281 = weight(abstract_txt:their in 1343) [ClassicSimilarity], result of:
            0.011324281 = score(doc=1343,freq=1.0), product of:
              0.057347216 = queryWeight, product of:
                1.3093562 = boost
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.013862331 = queryNorm
              0.19746871 = fieldWeight in 1343, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.0625 = fieldNorm(doc=1343)
          0.04572686 = weight(abstract_txt:fields in 1343) [ClassicSimilarity], result of:
            0.04572686 = score(doc=1343,freq=1.0), product of:
              0.14541845 = queryWeight, product of:
                2.0850255 = boost
                5.0312033 = idf(docFreq=784, maxDocs=44218)
                0.013862331 = queryNorm
              0.3144502 = fieldWeight in 1343, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0312033 = idf(docFreq=784, maxDocs=44218)
                0.0625 = fieldNorm(doc=1343)
          0.25874686 = weight(abstract_txt:query in 1343) [ClassicSimilarity], result of:
            0.25874686 = score(doc=1343,freq=5.0), product of:
              0.38946855 = queryWeight, product of:
                5.910149 = boost
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.013862331 = queryNorm
              0.6643588 = fieldWeight in 1343, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.0625 = fieldNorm(doc=1343)
          0.48736992 = weight(abstract_txt:expansion in 1343) [ClassicSimilarity], result of:
            0.48736992 = score(doc=1343,freq=5.0), product of:
              0.57114184 = queryWeight, product of:
                6.747734 = boost
                6.1059003 = idf(docFreq=267, maxDocs=44218)
                0.013862331 = queryNorm
              0.85332555 = fieldWeight in 1343, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.1059003 = idf(docFreq=267, maxDocs=44218)
                0.0625 = fieldNorm(doc=1343)
        0.24 = coord(6/25)
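The content-based explanations add two ClassicSimilarity factors. Each matching term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm, and the summed contributions are multiplied by coord, the fraction of the 25 query clauses that matched. The sketch below recomputes entry 4 (Efthimiadis, 1996) from its printed inputs, assuming idf = 1 + ln(maxDocs / (docFreq + 1)), which matches the printed idf values. queryNorm and the per-term boosts are copied from the output; the function names are illustrative.

```python
import math

MAX_DOCS = 44218          # maxDocs, copied from the explain output
QUERY_NORM = 0.013862331  # queryNorm, copied from the explain output
NUM_CLAUSES = 25          # denominator of coord(n/25)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # Assumed idf form: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM            # queryWeight
    field_weight = math.sqrt(freq) * term_idf * field_norm  # fieldWeight, tf = sqrt(freq)
    return query_weight * field_weight


def doc_score(matches: list[tuple[float, int, float, float]], num_clauses: int = NUM_CLAUSES) -> float:
    # matches holds (freq, docFreq, boost, fieldNorm) for each clause that matched.
    total = sum(term_score(*m) for m in matches)
    coord = len(matches) / num_clauses  # fraction of query clauses matched
    return coord * total


# Entry 4 (doc 4847): the terms "process", "query" and "expansion" matched.
matches = [
    (1.0, 2091, 1.1192104, 0.125),
    (6.0, 1035, 5.910149, 0.125),
    (6.0, 267, 6.747734, 0.125),
]
print(doc_score(matches))
```

The result should agree with the listed 0.19997835 up to floating-point rounding. Clauses whose explanation shows no boost line, such as "this", correspond to a boost of 1.0.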