Document (#39696)

Author
Buccio, E. Di
Melucci, M.
Moro, F.
Title
Detecting verbose queries and improving information retrieval
Source
Information processing and management. 50(2014) no.2, pp. 342-360
Year
2014
Abstract
Although most of the queries submitted to search engines are composed of a few keywords and have a length that ranges from three to six words, more than 15% of the total volume of queries are verbose, introduce ambiguity and cause topic drift. We consider verbosity a property of queries distinct from length: a verbose query is not necessarily long, a long query may be succinct, and a short query may be verbose. This paper proposes a methodology to automatically detect verbose queries and conditionally modify them. The methodology exploits state-of-the-art classification algorithms, combines concepts from a large linguistic database and uses a topic gisting algorithm we designed for verbose query modification. Our experimental results were obtained using the TREC Robust track collection, thirty topics classified by degree of difficulty, four queries per topic classified by verbosity and length, and human assessment of query verbosity. Our results suggest that query modification conditioned on query verbosity detection and topic gisting is significantly effective, and that query modification should be refined when topic difficulty and query verbosity are considered, since these two properties interact and query verbosity is not straightforwardly related to query length.
Content
Cf.: doi: 10.1016/j.ipm.2013.09.003.
Theme
Semantic context in indexing and retrieval

Similar documents (author)

  1. Melucci, M.: Passage retrieval : a probabilistic technique (1998) 5.81
    5.81187 = sum of:
      5.81187 = weight(author_txt:melucci in 1150) [ClassicSimilarity], result of:
        5.81187 = fieldWeight in 1150, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.298992 = idf(docFreq=10, maxDocs=44218)
          0.625 = fieldNorm(doc=1150)
    
  2. Melucci, M.: Making digital libraries effective : automatic generation of links for similarity search across hyper-textbooks (2004) 5.81
    5.81187 = sum of:
      5.81187 = weight(author_txt:melucci in 2226) [ClassicSimilarity], result of:
        5.81187 = fieldWeight in 2226, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.298992 = idf(docFreq=10, maxDocs=44218)
          0.625 = fieldNorm(doc=2226)
    
  3. Melucci, M.: Contextual search : a computational framework (2012) 5.81
    5.81187 = sum of:
      5.81187 = weight(author_txt:melucci in 4913) [ClassicSimilarity], result of:
        5.81187 = fieldWeight in 4913, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.298992 = idf(docFreq=10, maxDocs=44218)
          0.625 = fieldNorm(doc=4913)
    
  4. Agosti, M.; Melucci, M.: Information retrieval techniques for the automatic construction of hypertext (2000) 4.65
    4.649496 = sum of:
      4.649496 = weight(author_txt:melucci in 4671) [ClassicSimilarity], result of:
        4.649496 = fieldWeight in 4671, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.298992 = idf(docFreq=10, maxDocs=44218)
          0.5 = fieldNorm(doc=4671)
    
  5. Melucci, M.; Orio, N.: Combining melody processing and information retrieval techniques : methodology, evaluation, and system implementation (2004) 4.65
    4.649496 = sum of:
      4.649496 = weight(author_txt:melucci in 3087) [ClassicSimilarity], result of:
        4.649496 = fieldWeight in 3087, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.298992 = idf(docFreq=10, maxDocs=44218)
          0.5 = fieldNorm(doc=3087)
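
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the numbers in the listing; the function names are illustrative and are not part of any library. The fieldNorm value is taken from the listing as given rather than derived.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def classic_tf(freq):
    """Term frequency factor, assumed form: sqrt(freq)."""
    return math.sqrt(freq)


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the explain listing."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from entry 1 (doc=1150) of the author listing.
idf_melucci = classic_idf(doc_freq=10, max_docs=44218)
score_1150 = field_weight(freq=1.0, doc_freq=10, max_docs=44218, field_norm=0.625)

# Entry 4 (doc=4671) differs only in its fieldNorm of 0.5.
score_4671 = field_weight(freq=1.0, doc_freq=10, max_docs=44218, field_norm=0.5)

print(idf_melucci, score_1150, score_4671)
```

Under these assumptions the only factor that separates the 5.81 entries from the 4.65 entries is fieldNorm (0.625 versus 0.5): the index stores a smaller norm for records with a longer author field, and entries 4 and 5 each list two authors.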
    

Similar documents (content)

  1. Hoenkamp, E.; Bruza, P.D.; Song, D.; Huang, Q.: An effective approach to verbose queries using a limited dependencies language model (2009) 0.20
    0.19819129 = sum of:
      0.19819129 = product of:
        0.9909564 = sum of:
          0.0047600693 = weight(abstract_txt:that in 2122) [ClassicSimilarity], result of:
            0.0047600693 = score(doc=2122,freq=2.0), product of:
              0.022728255 = queryWeight, product of:
                1.0186342 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.009416634 = queryNorm
              0.20943399 = fieldWeight in 2122, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=2122)
          0.00534196 = weight(abstract_txt:from in 2122) [ClassicSimilarity], result of:
            0.00534196 = score(doc=2122,freq=1.0), product of:
              0.030924382 = queryWeight, product of:
                1.1881895 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.009416634 = queryNorm
              0.17274266 = fieldWeight in 2122, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=2122)
          0.13616678 = weight(abstract_txt:queries in 2122) [ClassicSimilarity], result of:
            0.13616678 = score(doc=2122,freq=3.0), product of:
              0.24632013 = queryWeight, product of:
                5.1224008 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.009416634 = queryNorm
              0.5528041 = fieldWeight in 2122, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.0625 = fieldNorm(doc=2122)
          0.18120226 = weight(abstract_txt:query in 2122) [ClassicSimilarity], result of:
            0.18120226 = score(doc=2122,freq=4.0), product of:
              0.3049411 = queryWeight, product of:
                6.8121243 = boost
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.009416634 = queryNorm
              0.5942206 = fieldWeight in 2122, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.0625 = fieldNorm(doc=2122)
          0.66348535 = weight(abstract_txt:verbose in 2122) [ClassicSimilarity], result of:
            0.66348535 = score(doc=2122,freq=2.0), product of:
              0.7698182 = queryWeight, product of:
                8.383865 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.009416634 = queryNorm
              0.8618728 = fieldWeight in 2122, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0625 = fieldNorm(doc=2122)
        0.2 = coord(5/25)
    
  2. Sa, N.; Yuan, X.J.: Examining users' partial query modification patterns in voice search (2020) 0.18
    0.18265475 = sum of:
      0.18265475 = product of:
        0.6523384 = sum of:
          0.0058298702 = weight(abstract_txt:that in 5675) [ClassicSimilarity], result of:
            0.0058298702 = score(doc=5675,freq=3.0), product of:
              0.022728255 = queryWeight, product of:
                1.0186342 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.009416634 = queryNorm
              0.2565032 = fieldWeight in 5675, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=5675)
          0.035682905 = weight(abstract_txt:thirty in 5675) [ClassicSimilarity], result of:
            0.035682905 = score(doc=5675,freq=1.0), product of:
              0.07605019 = queryWeight, product of:
                1.0757831 = boost
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.009416634 = queryNorm
              0.46920204 = fieldWeight in 5675, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.0625 = fieldNorm(doc=5675)
          0.04178249 = weight(abstract_txt:modify in 5675) [ClassicSimilarity], result of:
            0.04178249 = score(doc=5675,freq=1.0), product of:
              0.08448697 = queryWeight, product of:
                1.1338861 = boost
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.009416634 = queryNorm
              0.4945436 = fieldWeight in 5675, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.912698 = idf(docFreq=43, maxDocs=44218)
                0.0625 = fieldNorm(doc=5675)
          0.00534196 = weight(abstract_txt:from in 5675) [ClassicSimilarity], result of:
            0.00534196 = score(doc=5675,freq=1.0), product of:
              0.030924382 = queryWeight, product of:
                1.1881895 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.009416634 = queryNorm
              0.17274266 = fieldWeight in 5675, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=5675)
          0.2128134 = weight(abstract_txt:modification in 5675) [ClassicSimilarity], result of:
            0.2128134 = score(doc=5675,freq=4.0), product of:
              0.22723746 = queryWeight, product of:
                3.2208846 = boost
                7.4921947 = idf(docFreq=66, maxDocs=44218)
                0.009416634 = queryNorm
              0.93652433 = fieldWeight in 5675, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.4921947 = idf(docFreq=66, maxDocs=44218)
                0.0625 = fieldNorm(doc=5675)
          0.11117972 = weight(abstract_txt:queries in 5675) [ClassicSimilarity], result of:
            0.11117972 = score(doc=5675,freq=2.0), product of:
              0.24632013 = queryWeight, product of:
                5.1224008 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.009416634 = queryNorm
              0.4513627 = fieldWeight in 5675, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.0625 = fieldNorm(doc=5675)
          0.23970807 = weight(abstract_txt:query in 5675) [ClassicSimilarity], result of:
            0.23970807 = score(doc=5675,freq=7.0), product of:
              0.3049411 = queryWeight, product of:
                6.8121243 = boost
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.009416634 = queryNorm
              0.78607994 = fieldWeight in 5675, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.0625 = fieldNorm(doc=5675)
        0.28 = coord(7/25)
    
  3. Spink, A.; Ozmutlu, H.C.: Characteristics of question format web queries : an exploratory study (2002) 0.18
    0.17896324 = sum of:
      0.17896324 = product of:
        0.63915443 = sum of:
          0.028996482 = weight(abstract_txt:submitted in 3910) [ClassicSimilarity], result of:
            0.028996482 = score(doc=3910,freq=1.0), product of:
              0.06622527 = queryWeight, product of:
                1.0038908 = boost
                7.0055394 = idf(docFreq=108, maxDocs=44218)
                0.009416634 = queryNorm
              0.4378462 = fieldWeight in 3910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0055394 = idf(docFreq=108, maxDocs=44218)
                0.0625 = fieldNorm(doc=3910)
          0.0047600693 = weight(abstract_txt:that in 3910) [ClassicSimilarity], result of:
            0.0047600693 = score(doc=3910,freq=2.0), product of:
              0.022728255 = queryWeight, product of:
                1.0186342 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.009416634 = queryNorm
              0.20943399 = fieldWeight in 3910, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=3910)
          0.00534196 = weight(abstract_txt:from in 3910) [ClassicSimilarity], result of:
            0.00534196 = score(doc=3910,freq=1.0), product of:
              0.030924382 = queryWeight, product of:
                1.1881895 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.009416634 = queryNorm
              0.17274266 = fieldWeight in 3910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=3910)
          0.054704417 = weight(abstract_txt:topic in 3910) [ClassicSimilarity], result of:
            0.054704417 = score(doc=3910,freq=1.0), product of:
              0.17290138 = queryWeight, product of:
                3.627094 = boost
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.009416634 = queryNorm
              0.31639087 = fieldWeight in 3910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.0625 = fieldNorm(doc=3910)
          0.0942718 = weight(abstract_txt:length in 3910) [ClassicSimilarity], result of:
            0.0942718 = score(doc=3910,freq=1.0), product of:
              0.23071086 = queryWeight, product of:
                3.7474737 = boost
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.009416634 = queryNorm
              0.4086145 = fieldWeight in 3910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.0625 = fieldNorm(doc=3910)
          0.2941539 = weight(abstract_txt:queries in 3910) [ClassicSimilarity], result of:
            0.2941539 = score(doc=3910,freq=14.0), product of:
              0.24632013 = queryWeight, product of:
                5.1224008 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.009416634 = queryNorm
              1.1941935 = fieldWeight in 3910, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.0625 = fieldNorm(doc=3910)
          0.15692577 = weight(abstract_txt:query in 3910) [ClassicSimilarity], result of:
            0.15692577 = score(doc=3910,freq=3.0), product of:
              0.3049411 = queryWeight, product of:
                6.8121243 = boost
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.009416634 = queryNorm
              0.5146101 = fieldWeight in 3910, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.0625 = fieldNorm(doc=3910)
        0.28 = coord(7/25)
    
  4. Koopman, B.; Zuccon, G.; Bruza, P.; Nguyen, A.: What makes an effective clinical query and querier? (2017) 0.16
    0.16353823 = sum of:
      0.16353823 = product of:
        0.81769115 = sum of:
          0.0047600693 = weight(abstract_txt:that in 3922) [ClassicSimilarity], result of:
            0.0047600693 = score(doc=3922,freq=2.0), product of:
              0.022728255 = queryWeight, product of:
                1.0186342 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.009416634 = queryNorm
              0.20943399 = fieldWeight in 3922, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=3922)
          0.00534196 = weight(abstract_txt:from in 3922) [ClassicSimilarity], result of:
            0.00534196 = score(doc=3922,freq=1.0), product of:
              0.030924382 = queryWeight, product of:
                1.1881895 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.009416634 = queryNorm
              0.17274266 = fieldWeight in 3922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=3922)
          0.15723187 = weight(abstract_txt:queries in 3922) [ClassicSimilarity], result of:
            0.15723187 = score(doc=3922,freq=4.0), product of:
              0.24632013 = queryWeight, product of:
                5.1224008 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.009416634 = queryNorm
              0.63832325 = fieldWeight in 3922, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.0625 = fieldNorm(doc=3922)
          0.18120226 = weight(abstract_txt:query in 3922) [ClassicSimilarity], result of:
            0.18120226 = score(doc=3922,freq=4.0), product of:
              0.3049411 = queryWeight, product of:
                6.8121243 = boost
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.009416634 = queryNorm
              0.5942206 = fieldWeight in 3922, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.0625 = fieldNorm(doc=3922)
          0.46915498 = weight(abstract_txt:verbose in 3922) [ClassicSimilarity], result of:
            0.46915498 = score(doc=3922,freq=1.0), product of:
              0.7698182 = queryWeight, product of:
                8.383865 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.009416634 = queryNorm
              0.6094361 = fieldWeight in 3922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0625 = fieldNorm(doc=3922)
        0.2 = coord(5/25)
    
  5. Li, X.; Schijvenaars, B.J.A.; Rijke, M. de: Investigating queries and search failures in academic search (2017) 0.16
    0.1605607 = sum of:
      0.1605607 = product of:
        0.66900295 = sum of:
          0.0065855384 = weight(abstract_txt:that in 5033) [ClassicSimilarity], result of:
            0.0065855384 = score(doc=5033,freq=5.0), product of:
              0.022728255 = queryWeight, product of:
                1.0186342 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.009416634 = queryNorm
              0.28975117 = fieldWeight in 5033, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5033)
          0.006610338 = weight(abstract_txt:from in 5033) [ClassicSimilarity], result of:
            0.006610338 = score(doc=5033,freq=2.0), product of:
              0.030924382 = queryWeight, product of:
                1.1881895 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.009416634 = queryNorm
              0.21375814 = fieldWeight in 5033, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5033)
          0.059338458 = weight(abstract_txt:conditioned in 5033) [ClassicSimilarity], result of:
            0.059338458 = score(doc=5033,freq=1.0), product of:
              0.11668427 = queryWeight, product of:
                1.3325415 = boost
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.009416634 = queryNorm
              0.5085386 = fieldWeight in 5033, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5033)
          0.08248783 = weight(abstract_txt:length in 5033) [ClassicSimilarity], result of:
            0.08248783 = score(doc=5033,freq=1.0), product of:
              0.23071086 = queryWeight, product of:
                3.7474737 = boost
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.009416634 = queryNorm
              0.3575377 = fieldWeight in 5033, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5033)
          0.22814712 = weight(abstract_txt:queries in 5033) [ClassicSimilarity], result of:
            0.22814712 = score(doc=5033,freq=11.0), product of:
              0.24632013 = queryWeight, product of:
                5.1224008 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.009416634 = queryNorm
              0.92622197 = fieldWeight in 5033, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5033)
          0.28583366 = weight(abstract_txt:query in 5033) [ClassicSimilarity], result of:
            0.28583366 = score(doc=5033,freq=13.0), product of:
              0.3049411 = queryWeight, product of:
                6.8121243 = boost
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.009416634 = queryNorm
              0.9373406 = fieldWeight in 5033, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                4.7537646 = idf(docFreq=1035, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5033)
        0.24 = coord(6/25)
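
The content-similarity scores combine three pieces shown in each listing: a per-term score (queryWeight times fieldWeight), the sum over matched terms, and a coord factor equal to the share of query terms that matched. The sketch below recomputes the total for entry 1 (doc=2122) under the same assumed tf and idf formulas as before. The boost, docFreq, freq, fieldNorm and queryNorm values are copied from the listing; the helper names are hypothetical.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.009416634  # copied from the listing, not derived here
FIELD_NORM = 0.0625       # fieldNorm(doc=2122), copied from the listing
QUERY_TERMS = 25          # denominator of coord(5/25)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Assumed classic form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(boost, doc_freq, freq, field_norm):
    """score = queryWeight * fieldWeight for one matched term."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


# (term, boost, docFreq, freq) for the five terms matched in doc 2122.
matched_terms = [
    ("that", 1.0186342, 11241, 2.0),
    ("from", 1.1881895, 7577, 1.0),
    ("queries", 5.1224008, 727, 3.0),
    ("query", 6.8121243, 1035, 4.0),
    ("verbose", 8.383865, 6, 2.0),
]

raw_sum = sum(term_score(b, df, f, FIELD_NORM) for _, b, df, f in matched_terms)
coord = len(matched_terms) / QUERY_TERMS  # 5 of 25 query terms matched
total = raw_sum * coord

# Compare with 0.9909564 (sum) and 0.19819129 (final score) in the listing.
print(raw_sum, coord, total)
```

The breakdown also shows why the ranking looks the way it does: the rare term "verbose" (docFreq=6) contributes about two thirds of the raw sum for doc 2122, while common words such as "that" and "from" contribute well under one percent each.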