Document (#34442)

Author
Song, R.
Luo, Z.
Nie, J.-Y.
Yu, Y.
Hon, H.-W.
Title
Identification of ambiguous queries in web search
Source
Information processing and management. 45(2009) no.2, pp.216-229
Year
2009
Abstract
It is widely believed that many queries submitted to search engines are inherently ambiguous (e.g., java and apple). However, few studies have tried to classify queries based on ambiguity and to answer "what the proportion of ambiguous queries is". This paper deals with these issues. First, we clarify the definition of ambiguous queries by constructing the taxonomy of queries from being ambiguous to specific. Second, we ask human annotators to manually classify queries. From manually labeled results, we observe that query ambiguity is to some extent predictable. Third, we propose a supervised learning approach to automatically identify ambiguous queries. Experimental results show that we can correctly identify 87% of labeled queries with the approach. Finally, by using our approach, we estimate that about 16% of queries in a real search log are ambiguous.
Theme
Search engines
Search tactics

Similar documents (author)

  1. Song, F.W.: Virtual communities : bowling alone, online together (2009) 4.99
    4.989572 = sum of:
      4.989572 = weight(author_txt:song in 3287) [ClassicSimilarity], result of:
        4.989572 = fieldWeight in 3287, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.983315 = idf(docFreq=40, maxDocs=44218)
          0.625 = fieldNorm(doc=3287)
    
  2. Song, S.-f.: Rethinking of the development of reference service (1997) 3.99
    3.9916575 = sum of:
      3.9916575 = weight(author_txt:song in 859) [ClassicSimilarity], result of:
        3.9916575 = fieldWeight in 859, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.983315 = idf(docFreq=40, maxDocs=44218)
          0.5 = fieldNorm(doc=859)
    
  3. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 3.99
    3.9916575 = sum of:
      3.9916575 = weight(author_txt:song in 1428) [ClassicSimilarity], result of:
        3.9916575 = fieldWeight in 1428, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.983315 = idf(docFreq=40, maxDocs=44218)
          0.5 = fieldNorm(doc=1428)
    
  4. Song, Y.-S.: International business students : a study on their use of electronic library services (2004) 3.99
    3.9916575 = sum of:
      3.9916575 = weight(author_txt:song in 546) [ClassicSimilarity], result of:
        3.9916575 = fieldWeight in 546, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.983315 = idf(docFreq=40, maxDocs=44218)
          0.5 = fieldNorm(doc=546)
    
  5. Lau, R.Y.K.; Bruza, P.D.; Song, D.: Belief revision for adaptive information retrieval (2004) 2.99
    2.9937432 = sum of:
      2.9937432 = weight(author_txt:song in 4077) [ClassicSimilarity], result of:
        2.9937432 = fieldWeight in 4077, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.983315 = idf(docFreq=40, maxDocs=44218)
          0.375 = fieldNorm(doc=4077)
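The author matches above are scored with Lucene's ClassicSimilarity (TF-IDF). The following is a minimal sketch, assuming the standard ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). It recomputes the listed fieldWeight values from the numbers shown in the explain output. The function names are illustrative only and are not part of the Lucene API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """ClassicSimilarity tf: square root of the raw term frequency."""
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as printed in the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


if __name__ == "__main__":
    # "song" occurs in 40 of 44218 documents; fieldNorm differs per record.
    for doc, norm in [(3287, 0.625), (859, 0.5), (4077, 0.375)]:
        print(doc, round(field_weight(1.0, 40, 44218, norm), 6))
```

Because every author hit has freq = 1 and shares the same idf, the ranking among these five records is decided entirely by fieldNorm.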
    

Similar documents (content)

  1. Liu, W.; Doğan, R.I.; Kim, S.; Comeau, D.C.; Kim, W.; Yeganova, L.; Lu, Z.; Wilbur, W.J.: Author name disambiguation for PubMed (2014) 0.17
    0.17123163 = sum of:
      0.17123163 = product of:
        0.6115415 = sum of:
          0.016292177 = weight(abstract_txt:results in 1240) [ClassicSimilarity], result of:
            0.016292177 = score(doc=1240,freq=4.0), product of:
              0.042773977 = queryWeight, product of:
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.012282823 = queryNorm
              0.38088992 = fieldWeight in 1240, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.03595369 = weight(abstract_txt:estimate in 1240) [ClassicSimilarity], result of:
            0.03595369 = score(doc=1240,freq=1.0), product of:
              0.09134803 = queryWeight, product of:
                1.0333437 = boost
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.012282823 = queryNorm
              0.39359018 = fieldWeight in 1240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.007257822 = weight(abstract_txt:that in 1240) [ClassicSimilarity], result of:
            0.007257822 = score(doc=1240,freq=2.0), product of:
              0.0396051 = queryWeight, product of:
                1.3608202 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.012282823 = queryNorm
              0.18325473 = fieldWeight in 1240, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.014162694 = weight(abstract_txt:search in 1240) [ClassicSimilarity], result of:
            0.014162694 = score(doc=1240,freq=1.0), product of:
              0.070795864 = queryWeight, product of:
                1.5756501 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.012282823 = queryNorm
              0.20004974 = fieldWeight in 1240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.07061184 = weight(abstract_txt:ambiguity in 1240) [ClassicSimilarity], result of:
            0.07061184 = score(doc=1240,freq=1.0), product of:
              0.18049502 = queryWeight, product of:
                2.0542004 = boost
                7.1535926 = idf(docFreq=93, maxDocs=44218)
                0.012282823 = queryNorm
              0.3912121 = fieldWeight in 1240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1535926 = idf(docFreq=93, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.18162759 = weight(abstract_txt:queries in 1240) [ClassicSimilarity], result of:
            0.18162759 = score(doc=1240,freq=2.0), product of:
              0.45988378 = queryWeight, product of:
                7.331946 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.012282823 = queryNorm
              0.39494237 = fieldWeight in 1240, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.28563574 = weight(abstract_txt:ambiguous in 1240) [ClassicSimilarity], result of:
            0.28563574 = score(doc=1240,freq=1.0), product of:
              0.6957362 = queryWeight, product of:
                7.5451264 = boost
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.012282823 = queryNorm
              0.4105518 = fieldWeight in 1240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
        0.28 = coord(7/25)
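For the content matches, each matched term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm. The sum over matched terms is then multiplied by the coord factor (matched terms / query terms, here 7/25). The sketch below makes that assumption and recomputes the score of the PubMed paper (doc 1240). queryNorm is copied from the explain output rather than derived, because it depends on all 25 query terms, and those terms are not listed here. Names such as `term_score` and `doc_score` are hypothetical helpers, not Lucene methods.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.012282823  # copied from the explain output (depends on the full query)


def idf(doc_freq: int) -> float:
    """ClassicSimilarity idf for this index."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    """Score of one matched term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (term, freq, docFreq, boost) for doc 1240, taken from the explain tree above.
TERMS_1240 = [
    ("results", 4.0, 3693, 1.0),
    ("estimate", 1.0, 89, 1.0333437),
    ("that", 2.0, 11241, 1.3608202),
    ("search", 1.0, 3098, 1.5756501),
    ("ambiguity", 1.0, 93, 2.0542004),
    ("queries", 2.0, 727, 7.331946),
    ("ambiguous", 1.0, 65, 7.5451264),
]


def doc_score(terms, field_norm: float, query_terms: int = 25) -> float:
    """Sum of matched-term scores, scaled by coord = matched / query terms."""
    total = sum(term_score(f, df, b, field_norm) for _, f, df, b in terms)
    coord = len(terms) / query_terms
    return total * coord


if __name__ == "__main__":
    print(round(doc_score(TERMS_1240, 0.0546875), 6))
```

The heavily boosted terms "queries" and "ambiguous" account for most of the sum, which is why documents that share those two words rank highest in this list.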
    
  2. Bouidghaghen, O.; Tamine, L.: Spatio-temporal based personalization for mobile search (2012) 0.17
    0.16547507 = sum of:
      0.16547507 = product of:
        0.6894795 = sum of:
          0.0116372695 = weight(abstract_txt:results in 108) [ClassicSimilarity], result of:
            0.0116372695 = score(doc=108,freq=1.0), product of:
              0.042773977 = queryWeight, product of:
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.012282823 = queryNorm
              0.27206424 = fieldWeight in 108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.078125 = fieldNorm(doc=108)
          0.010368317 = weight(abstract_txt:that in 108) [ClassicSimilarity], result of:
            0.010368317 = score(doc=108,freq=2.0), product of:
              0.0396051 = queryWeight, product of:
                1.3608202 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.012282823 = queryNorm
              0.26179248 = fieldWeight in 108, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.078125 = fieldNorm(doc=108)
          0.04524107 = weight(abstract_txt:search in 108) [ClassicSimilarity], result of:
            0.04524107 = score(doc=108,freq=5.0), product of:
              0.070795864 = queryWeight, product of:
                1.5756501 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.012282823 = queryNorm
              0.63903546 = fieldWeight in 108, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.078125 = fieldNorm(doc=108)
          0.030710198 = weight(abstract_txt:approach in 108) [ClassicSimilarity], result of:
            0.030710198 = score(doc=108,freq=2.0), product of:
              0.074214324 = queryWeight, product of:
                1.6132426 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.012282823 = queryNorm
              0.41380417 = fieldWeight in 108, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.078125 = fieldNorm(doc=108)
          0.18347158 = weight(abstract_txt:queries in 108) [ClassicSimilarity], result of:
            0.18347158 = score(doc=108,freq=1.0), product of:
              0.45988378 = queryWeight, product of:
                7.331946 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.012282823 = queryNorm
              0.39895204 = fieldWeight in 108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.078125 = fieldNorm(doc=108)
          0.40805107 = weight(abstract_txt:ambiguous in 108) [ClassicSimilarity], result of:
            0.40805107 = score(doc=108,freq=1.0), product of:
              0.6957362 = queryWeight, product of:
                7.5451264 = boost
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.012282823 = queryNorm
              0.58650255 = fieldWeight in 108, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5072327 = idf(docFreq=65, maxDocs=44218)
                0.078125 = fieldNorm(doc=108)
        0.24 = coord(6/25)
    
  3. Spink, A.; Ozmutlu, H.C.: Characteristics of question format web queries : an exploratory study (2002) 0.16
    0.16101447 = sum of:
      0.16101447 = product of:
        0.6708936 = sum of:
          0.009309815 = weight(abstract_txt:results in 3910) [ClassicSimilarity], result of:
            0.009309815 = score(doc=3910,freq=1.0), product of:
              0.042773977 = queryWeight, product of:
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.012282823 = queryNorm
              0.21765138 = fieldWeight in 3910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=3910)
          0.03789584 = weight(abstract_txt:submitted in 3910) [ClassicSimilarity], result of:
            0.03789584 = score(doc=3910,freq=1.0), product of:
              0.08655057 = queryWeight, product of:
                1.0058429 = boost
                7.0055394 = idf(docFreq=108, maxDocs=44218)
                0.012282823 = queryNorm
              0.4378462 = fieldWeight in 3910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0055394 = idf(docFreq=108, maxDocs=44218)
                0.0625 = fieldNorm(doc=3910)
          0.008294654 = weight(abstract_txt:that in 3910) [ClassicSimilarity], result of:
            0.008294654 = score(doc=3910,freq=2.0), product of:
              0.0396051 = queryWeight, product of:
                1.3608202 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.012282823 = queryNorm
              0.20943399 = fieldWeight in 3910, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=3910)
          0.026555788 = weight(abstract_txt:identify in 3910) [ClassicSimilarity], result of:
            0.026555788 = score(doc=3910,freq=1.0), product of:
              0.086031675 = queryWeight, product of:
                1.4182062 = boost
                4.9387927 = idf(docFreq=860, maxDocs=44218)
                0.012282823 = queryNorm
              0.30867454 = fieldWeight in 3910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9387927 = idf(docFreq=860, maxDocs=44218)
                0.0625 = fieldNorm(doc=3910)
          0.03964729 = weight(abstract_txt:search in 3910) [ClassicSimilarity], result of:
            0.03964729 = score(doc=3910,freq=6.0), product of:
              0.070795864 = queryWeight, product of:
                1.5756501 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.012282823 = queryNorm
              0.56002265 = fieldWeight in 3910, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.0625 = fieldNorm(doc=3910)
          0.5491902 = weight(abstract_txt:queries in 3910) [ClassicSimilarity], result of:
            0.5491902 = score(doc=3910,freq=14.0), product of:
              0.45988378 = queryWeight, product of:
                7.331946 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.012282823 = queryNorm
              1.1941935 = fieldWeight in 3910, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.0625 = fieldNorm(doc=3910)
        0.24 = coord(6/25)
    
  4. Li, M.; Li, H.; Zhou, Z.-H.: Semi-supervised document retrieval (2009) 0.15
    0.1529793 = sum of:
      0.1529793 = product of:
        0.54635465 = sum of:
          0.009309815 = weight(abstract_txt:results in 4218) [ClassicSimilarity], result of:
            0.009309815 = score(doc=4218,freq=1.0), product of:
              0.042773977 = queryWeight, product of:
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.012282823 = queryNorm
              0.21765138 = fieldWeight in 4218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=4218)
          0.03804561 = weight(abstract_txt:constructing in 4218) [ClassicSimilarity], result of:
            0.03804561 = score(doc=4218,freq=1.0), product of:
              0.08677846 = queryWeight, product of:
                1.0071663 = boost
                7.014756 = idf(docFreq=107, maxDocs=44218)
                0.012282823 = queryNorm
              0.43842226 = fieldWeight in 4218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.014756 = idf(docFreq=107, maxDocs=44218)
                0.0625 = fieldNorm(doc=4218)
          0.07934682 = weight(abstract_txt:supervised in 4218) [ClassicSimilarity], result of:
            0.07934682 = score(doc=4218,freq=3.0), product of:
              0.098217346 = queryWeight, product of:
                1.0714929 = boost
                7.462781 = idf(docFreq=68, maxDocs=44218)
                0.012282823 = queryNorm
              0.80786973 = fieldWeight in 4218, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.462781 = idf(docFreq=68, maxDocs=44218)
                0.0625 = fieldNorm(doc=4218)
          0.005865206 = weight(abstract_txt:that in 4218) [ClassicSimilarity], result of:
            0.005865206 = score(doc=4218,freq=1.0), product of:
              0.0396051 = queryWeight, product of:
                1.3608202 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.012282823 = queryNorm
              0.1480922 = fieldWeight in 4218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=4218)
          0.016185936 = weight(abstract_txt:search in 4218) [ClassicSimilarity], result of:
            0.016185936 = score(doc=4218,freq=1.0), product of:
              0.070795864 = queryWeight, product of:
                1.5756501 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.012282823 = queryNorm
              0.22862828 = fieldWeight in 4218, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.0625 = fieldNorm(doc=4218)
          0.19002688 = weight(abstract_txt:labeled in 4218) [ClassicSimilarity], result of:
            0.19002688 = score(doc=4218,freq=4.0), product of:
              0.20125297 = queryWeight, product of:
                2.1691089 = boost
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.012282823 = queryNorm
              0.94421905 = fieldWeight in 4218, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.5537524 = idf(docFreq=62, maxDocs=44218)
                0.0625 = fieldNorm(doc=4218)
          0.20757438 = weight(abstract_txt:queries in 4218) [ClassicSimilarity], result of:
            0.20757438 = score(doc=4218,freq=2.0), product of:
              0.45988378 = queryWeight, product of:
                7.331946 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.012282823 = queryNorm
              0.4513627 = fieldWeight in 4218, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.0625 = fieldNorm(doc=4218)
        0.28 = coord(7/25)
    
  5. Ortiz-Cordova, A.; Yang, Y.; Jansen, B.J.: External to internal search : associating searching on search engines with searching on sites (2015) 0.15
    0.14569269 = sum of:
      0.14569269 = product of:
        0.6070529 = sum of:
          0.05359281 = weight(abstract_txt:submitted in 2675) [ClassicSimilarity], result of:
            0.05359281 = score(doc=2675,freq=2.0), product of:
              0.08655057 = queryWeight, product of:
                1.0058429 = boost
                7.0055394 = idf(docFreq=108, maxDocs=44218)
                0.012282823 = queryNorm
              0.61920804 = fieldWeight in 2675, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0055394 = idf(docFreq=108, maxDocs=44218)
                0.0625 = fieldNorm(doc=2675)
          0.008294654 = weight(abstract_txt:that in 2675) [ClassicSimilarity], result of:
            0.008294654 = score(doc=2675,freq=2.0), product of:
              0.0396051 = queryWeight, product of:
                1.3608202 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.012282823 = queryNorm
              0.20943399 = fieldWeight in 2675, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=2675)
          0.026555788 = weight(abstract_txt:identify in 2675) [ClassicSimilarity], result of:
            0.026555788 = score(doc=2675,freq=1.0), product of:
              0.086031675 = queryWeight, product of:
                1.4182062 = boost
                4.9387927 = idf(docFreq=860, maxDocs=44218)
                0.012282823 = queryNorm
              0.30867454 = fieldWeight in 2675, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9387927 = idf(docFreq=860, maxDocs=44218)
                0.0625 = fieldNorm(doc=2675)
          0.06867111 = weight(abstract_txt:search in 2675) [ClassicSimilarity], result of:
            0.06867111 = score(doc=2675,freq=18.0), product of:
              0.070795864 = queryWeight, product of:
                1.5756501 = boost
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.012282823 = queryNorm
              0.9699876 = fieldWeight in 2675, product of:
                4.2426405 = tf(freq=18.0), with freq of:
                  18.0 = termFreq=18.0
                3.6580524 = idf(docFreq=3098, maxDocs=44218)
                0.0625 = fieldNorm(doc=2675)
          0.061602455 = weight(abstract_txt:classify in 2675) [ClassicSimilarity], result of:
            0.061602455 = score(doc=2675,freq=1.0), product of:
              0.15075935 = queryWeight, product of:
                1.8773806 = boost
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.012282823 = queryNorm
              0.4086145 = fieldWeight in 2675, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.0625 = fieldNorm(doc=2675)
          0.38833612 = weight(abstract_txt:queries in 2675) [ClassicSimilarity], result of:
            0.38833612 = score(doc=2675,freq=7.0), product of:
              0.45988378 = queryWeight, product of:
                7.331946 = boost
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.012282823 = queryNorm
              0.8444223 = fieldWeight in 2675, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.106586 = idf(docFreq=727, maxDocs=44218)
                0.0625 = fieldNorm(doc=2675)
        0.24 = coord(6/25)