Document (#34443)

Author
Song, R.
Luo, Z.
Nie, J.-Y.
Yu, Y.
Hon, H.-W.
Title
Identification of ambiguous queries in web search
Source
Information processing and management. 45(2009) no.2, pp.216-229
Year
2009
Abstract
It is widely believed that many queries submitted to search engines are inherently ambiguous (e.g., java and apple). However, few studies have tried to classify queries based on ambiguity and to answer "what the proportion of ambiguous queries is". This paper deals with these issues. First, we clarify the definition of ambiguous queries by constructing the taxonomy of queries from being ambiguous to specific. Second, we ask human annotators to manually classify queries. From manually labeled results, we observe that query ambiguity is to some extent predictable. Third, we propose a supervised learning approach to automatically identify ambiguous queries. Experimental results show that we can correctly identify 87% of labeled queries with the approach. Finally, by using our approach, we estimate that about 16% of queries in a real search log are ambiguous.
Theme
Search engines
Search tactics

Similar documents (author)

  1. Song, F.W.: Virtual communities : bowling alone, online together (2009) 5.09
    5.085331 = sum of:
      5.085331 = weight(author_txt:song in 288) [ClassicSimilarity], result of:
        5.085331 = fieldWeight in 288, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.13653 = idf(docFreq=33, maxDocs=42740)
          0.625 = fieldNorm(doc=288)
    
  2. Song, S.-f.: Rethinking of the development of reference service (1997) 4.07
    4.068265 = sum of:
      4.068265 = weight(author_txt:song in 860) [ClassicSimilarity], result of:
        4.068265 = fieldWeight in 860, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.13653 = idf(docFreq=33, maxDocs=42740)
          0.5 = fieldNorm(doc=860)
    
  3. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 4.07
    4.068265 = sum of:
      4.068265 = weight(author_txt:song in 2429) [ClassicSimilarity], result of:
        4.068265 = fieldWeight in 2429, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.13653 = idf(docFreq=33, maxDocs=42740)
          0.5 = fieldNorm(doc=2429)
    
  4. Song, Y.-S.: International business students : a study on their use of electronic library services (2004) 4.07
    4.068265 = sum of:
      4.068265 = weight(author_txt:song in 2547) [ClassicSimilarity], result of:
        4.068265 = fieldWeight in 2547, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.13653 = idf(docFreq=33, maxDocs=42740)
          0.5 = fieldNorm(doc=2547)
    
  5. Lau, R.Y.K.; Bruza, P.D.; Song, D.: Belief revision for adaptive information retrieval (2004) 3.05
    3.0511987 = sum of:
      3.0511987 = weight(author_txt:song in 78) [ClassicSimilarity], result of:
        3.0511987 = fieldWeight in 78, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.13653 = idf(docFreq=33, maxDocs=42740)
          0.375 = fieldNorm(doc=78)
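
The author-match scores above can be reproduced from the values shown in each tree. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and not part of the catalog software. Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as ClassicSimilarity defines it."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq: float) -> float:
    """Term frequency factor: square root of the raw count."""
    return math.sqrt(freq)

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain trees."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the trees: author_txt:song, docFreq=33, maxDocs=42740.
song_idf = classic_idf(33, 42740)                     # tree shows 8.13653
score_doc_288 = field_weight(1.0, 33, 42740, 0.625)   # entry 1, tree shows 5.085331
score_doc_78 = field_weight(1.0, 33, 42740, 0.375)    # entry 5, tree shows 3.0511987

print(round(song_idf, 5), round(score_doc_288, 4), round(score_doc_78, 4))
```

Because every entry matches the single term "song" once, the ranking among these five records is determined entirely by fieldNorm, which is larger for shorter author fields.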
    

Similar documents (content)

  1. Liu, W.; Doğan, R.I.; Kim, S.; Comeau, D.C.; Kim, W.; Yeganova, L.; Lu, Z.; Wilbur, W.J.: Author name disambiguation for PubMed (2014) 0.17
    0.17121018 = sum of:
      0.17121018 = product of:
        0.6114649 = sum of:
          0.0164895 = weight(abstract_txt:results in 3241) [ClassicSimilarity], result of:
            0.0164895 = score(doc=3241,freq=4.0), product of:
              0.042985678 = queryWeight, product of:
                1.0007921 = boost
                3.5072412 = idf(docFreq=3482, maxDocs=42740)
                0.012246564 = queryNorm
              0.38360453 = fieldWeight in 3241, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5072412 = idf(docFreq=3482, maxDocs=42740)
                0.0546875 = fieldNorm(doc=3241)
          0.03579259 = weight(abstract_txt:estimate in 3241) [ClassicSimilarity], result of:
            0.03579259 = score(doc=3241,freq=1.0), product of:
              0.09079408 = queryWeight, product of:
                1.0284798 = boost
                7.2085433 = idf(docFreq=85, maxDocs=42740)
                0.012246564 = queryNorm
              0.39421722 = fieldWeight in 3241, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2085433 = idf(docFreq=85, maxDocs=42740)
                0.0546875 = fieldNorm(doc=3241)
          0.007422605 = weight(abstract_txt:that in 3241) [ClassicSimilarity], result of:
            0.007422605 = score(doc=3241,freq=2.0), product of:
              0.04007834 = queryWeight, product of:
                1.3666328 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.012246564 = queryNorm
              0.18520242 = fieldWeight in 3241, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0546875 = fieldNorm(doc=3241)
          0.013957205 = weight(abstract_txt:search in 3241) [ClassicSimilarity], result of:
            0.013957205 = score(doc=3241,freq=1.0), product of:
              0.06989319 = queryWeight, product of:
                1.5629499 = boost
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.012246564 = queryNorm
              0.19969335 = fieldWeight in 3241, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.0546875 = fieldNorm(doc=3241)
          0.07158518 = weight(abstract_txt:ambiguity in 3241) [ClassicSimilarity], result of:
            0.07158518 = score(doc=3241,freq=1.0), product of:
              0.18158816 = queryWeight, product of:
                2.0569596 = boost
                7.2085433 = idf(docFreq=85, maxDocs=42740)
                0.012246564 = queryNorm
              0.39421722 = fieldWeight in 3241, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2085433 = idf(docFreq=85, maxDocs=42740)
                0.0546875 = fieldNorm(doc=3241)
          0.17812169 = weight(abstract_txt:queries in 3241) [ClassicSimilarity], result of:
            0.17812169 = score(doc=3241,freq=2.0), product of:
              0.45254663 = queryWeight, product of:
                7.2610373 = boost
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.012246564 = queryNorm
              0.39359853 = fieldWeight in 3241, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.0546875 = fieldNorm(doc=3241)
          0.28809616 = weight(abstract_txt:ambiguous in 3241) [ClassicSimilarity], result of:
            0.28809616 = score(doc=3241,freq=1.0), product of:
              0.6975678 = queryWeight, product of:
                7.5423946 = boost
                7.5520167 = idf(docFreq=60, maxDocs=42740)
                0.012246564 = queryNorm
              0.4130009 = fieldWeight in 3241, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5520167 = idf(docFreq=60, maxDocs=42740)
                0.0546875 = fieldNorm(doc=3241)
        0.28 = coord(7/25)
    
  2. Bouidghaghen, O.; Tamine, L.: Spatio-temporal based personalization for mobile search (2012) 0.17
    0.16549243 = sum of:
      0.16549243 = product of:
        0.68955183 = sum of:
          0.011778214 = weight(abstract_txt:results in 2109) [ClassicSimilarity], result of:
            0.011778214 = score(doc=2109,freq=1.0), product of:
              0.042985678 = queryWeight, product of:
                1.0007921 = boost
                3.5072412 = idf(docFreq=3482, maxDocs=42740)
                0.012246564 = queryNorm
              0.2740032 = fieldWeight in 2109, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5072412 = idf(docFreq=3482, maxDocs=42740)
                0.078125 = fieldNorm(doc=2109)
          0.010603722 = weight(abstract_txt:that in 2109) [ClassicSimilarity], result of:
            0.010603722 = score(doc=2109,freq=2.0), product of:
              0.04007834 = queryWeight, product of:
                1.3666328 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.012246564 = queryNorm
              0.2645749 = fieldWeight in 2109, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.078125 = fieldNorm(doc=2109)
          0.044584658 = weight(abstract_txt:search in 2109) [ClassicSimilarity], result of:
            0.044584658 = score(doc=2109,freq=5.0), product of:
              0.06989319 = queryWeight, product of:
                1.5629499 = boost
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.012246564 = queryNorm
              0.637897 = fieldWeight in 2109, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.078125 = fieldNorm(doc=2109)
          0.031089252 = weight(abstract_txt:approach in 2109) [ClassicSimilarity], result of:
            0.031089252 = score(doc=2109,freq=2.0), product of:
              0.074593 = queryWeight, product of:
                1.6146436 = boost
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.012246564 = queryNorm
              0.41678512 = fieldWeight in 2109, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.078125 = fieldNorm(doc=2109)
          0.17993008 = weight(abstract_txt:queries in 2109) [ClassicSimilarity], result of:
            0.17993008 = score(doc=2109,freq=1.0), product of:
              0.45254663 = queryWeight, product of:
                7.2610373 = boost
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.012246564 = queryNorm
              0.39759457 = fieldWeight in 2109, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.078125 = fieldNorm(doc=2109)
          0.4115659 = weight(abstract_txt:ambiguous in 2109) [ClassicSimilarity], result of:
            0.4115659 = score(doc=2109,freq=1.0), product of:
              0.6975678 = queryWeight, product of:
                7.5423946 = boost
                7.5520167 = idf(docFreq=60, maxDocs=42740)
                0.012246564 = queryNorm
              0.5900013 = fieldWeight in 2109, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5520167 = idf(docFreq=60, maxDocs=42740)
                0.078125 = fieldNorm(doc=2109)
        0.24 = coord(6/25)
    
  3. Spink, A.; Ozmutlu, H.C.: Characteristics of question format web queries : an exploratory study (2002) 0.16
    0.15843731 = sum of:
      0.15843731 = product of:
        0.6601555 = sum of:
          0.03760086 = weight(abstract_txt:submitted in 4911) [ClassicSimilarity], result of:
            0.03760086 = score(doc=4911,freq=1.0), product of:
              0.085835315 = queryWeight, product of:
                7.00893 = idf(docFreq=104, maxDocs=42740)
                0.012246564 = queryNorm
              0.43805814 = fieldWeight in 4911, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.00893 = idf(docFreq=104, maxDocs=42740)
                0.0625 = fieldNorm(doc=4911)
          0.009422571 = weight(abstract_txt:results in 4911) [ClassicSimilarity], result of:
            0.009422571 = score(doc=4911,freq=1.0), product of:
              0.042985678 = queryWeight, product of:
                1.0007921 = boost
                3.5072412 = idf(docFreq=3482, maxDocs=42740)
                0.012246564 = queryNorm
              0.21920258 = fieldWeight in 4911, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5072412 = idf(docFreq=3482, maxDocs=42740)
                0.0625 = fieldNorm(doc=4911)
          0.008482978 = weight(abstract_txt:that in 4911) [ClassicSimilarity], result of:
            0.008482978 = score(doc=4911,freq=2.0), product of:
              0.04007834 = queryWeight, product of:
                1.3666328 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.012246564 = queryNorm
              0.21165991 = fieldWeight in 4911, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=4911)
          0.026987653 = weight(abstract_txt:identify in 4911) [ClassicSimilarity], result of:
            0.026987653 = score(doc=4911,freq=1.0), product of:
              0.086693704 = queryWeight, product of:
                1.4212674 = boost
                4.980782 = idf(docFreq=797, maxDocs=42740)
                0.012246564 = queryNorm
              0.31129888 = fieldWeight in 4911, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.980782 = idf(docFreq=797, maxDocs=42740)
                0.0625 = fieldNorm(doc=4911)
          0.039072037 = weight(abstract_txt:search in 4911) [ClassicSimilarity], result of:
            0.039072037 = score(doc=4911,freq=6.0), product of:
              0.06989319 = queryWeight, product of:
                1.5629499 = boost
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.012246564 = queryNorm
              0.55902493 = fieldWeight in 4911, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.0625 = fieldNorm(doc=4911)
          0.53858936 = weight(abstract_txt:queries in 4911) [ClassicSimilarity], result of:
            0.53858936 = score(doc=4911,freq=14.0), product of:
              0.45254663 = queryWeight, product of:
                7.2610373 = boost
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.012246564 = queryNorm
              1.1901301 = fieldWeight in 4911, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.0625 = fieldNorm(doc=4911)
        0.24 = coord(6/25)
    
  4. Li, M.; Li, H.; Zhou, Z.-H.: Semi-supervised document retrieval (2009) 0.16
    0.15727277 = sum of:
      0.15727277 = product of:
        0.5616885 = sum of:
          0.009422571 = weight(abstract_txt:results in 1219) [ClassicSimilarity], result of:
            0.009422571 = score(doc=1219,freq=1.0), product of:
              0.042985678 = queryWeight, product of:
                1.0007921 = boost
                3.5072412 = idf(docFreq=3482, maxDocs=42740)
                0.012246564 = queryNorm
              0.21920258 = fieldWeight in 1219, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5072412 = idf(docFreq=3482, maxDocs=42740)
                0.0625 = fieldNorm(doc=1219)
          0.038069315 = weight(abstract_txt:constructing in 1219) [ClassicSimilarity], result of:
            0.038069315 = score(doc=1219,freq=1.0), product of:
              0.08654677 = queryWeight, product of:
                1.0041357 = boost
                7.0379176 = idf(docFreq=101, maxDocs=42740)
                0.012246564 = queryNorm
              0.43986985 = fieldWeight in 1219, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0379176 = idf(docFreq=101, maxDocs=42740)
                0.0625 = fieldNorm(doc=1219)
          0.08368365 = weight(abstract_txt:supervised in 1219) [ClassicSimilarity], result of:
            0.08368365 = score(doc=1219,freq=3.0), product of:
              0.101450495 = queryWeight, product of:
                1.0871615 = boost
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.012246564 = queryNorm
              0.8248718 = fieldWeight in 1219, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.0625 = fieldNorm(doc=1219)
          0.005998371 = weight(abstract_txt:that in 1219) [ClassicSimilarity], result of:
            0.005998371 = score(doc=1219,freq=1.0), product of:
              0.04007834 = queryWeight, product of:
                1.3666328 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.012246564 = queryNorm
              0.14966616 = fieldWeight in 1219, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=1219)
          0.015951091 = weight(abstract_txt:search in 1219) [ClassicSimilarity], result of:
            0.015951091 = score(doc=1219,freq=1.0), product of:
              0.06989319 = queryWeight, product of:
                1.5629499 = boost
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.012246564 = queryNorm
              0.22822097 = fieldWeight in 1219, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.0625 = fieldNorm(doc=1219)
          0.20499583 = weight(abstract_txt:labeled in 1219) [ClassicSimilarity], result of:
            0.20499583 = score(doc=1219,freq=4.0), product of:
              0.21103485 = queryWeight, product of:
                2.2174768 = boost
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.012246564 = queryNorm
              0.97138375 = fieldWeight in 1219, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.0625 = fieldNorm(doc=1219)
          0.20356764 = weight(abstract_txt:queries in 1219) [ClassicSimilarity], result of:
            0.20356764 = score(doc=1219,freq=2.0), product of:
              0.45254663 = queryWeight, product of:
                7.2610373 = boost
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.012246564 = queryNorm
              0.4498269 = fieldWeight in 1219, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.0625 = fieldNorm(doc=1219)
        0.28 = coord(7/25)
    
  5. Ortiz-Cordova, A.; Yang, Y.; Jansen, B.J.: External to internal search : associating searching on search engines with searching on sites (2015) 0.14
    0.14390479 = sum of:
      0.14390479 = product of:
        0.5996033 = sum of:
          0.053175643 = weight(abstract_txt:submitted in 4676) [ClassicSimilarity], result of:
            0.053175643 = score(doc=4676,freq=2.0), product of:
              0.085835315 = queryWeight, product of:
                7.00893 = idf(docFreq=104, maxDocs=42740)
                0.012246564 = queryNorm
              0.61950773 = fieldWeight in 4676, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.00893 = idf(docFreq=104, maxDocs=42740)
                0.0625 = fieldNorm(doc=4676)
          0.008482978 = weight(abstract_txt:that in 4676) [ClassicSimilarity], result of:
            0.008482978 = score(doc=4676,freq=2.0), product of:
              0.04007834 = queryWeight, product of:
                1.3666328 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.012246564 = queryNorm
              0.21165991 = fieldWeight in 4676, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=4676)
          0.026987653 = weight(abstract_txt:identify in 4676) [ClassicSimilarity], result of:
            0.026987653 = score(doc=4676,freq=1.0), product of:
              0.086693704 = queryWeight, product of:
                1.4212674 = boost
                4.980782 = idf(docFreq=797, maxDocs=42740)
                0.012246564 = queryNorm
              0.31129888 = fieldWeight in 4676, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.980782 = idf(docFreq=797, maxDocs=42740)
                0.0625 = fieldNorm(doc=4676)
          0.06767475 = weight(abstract_txt:search in 4676) [ClassicSimilarity], result of:
            0.06767475 = score(doc=4676,freq=18.0), product of:
              0.06989319 = queryWeight, product of:
                1.5629499 = boost
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.012246564 = queryNorm
              0.9682595 = fieldWeight in 4676, product of:
                4.2426405 = tf(freq=18.0), with freq of:
                  18.0 = termFreq=18.0
                3.6515355 = idf(docFreq=3014, maxDocs=42740)
                0.0625 = fieldNorm(doc=4676)
          0.062442064 = weight(abstract_txt:classify in 4676) [ClassicSimilarity], result of:
            0.062442064 = score(doc=4676,freq=1.0), product of:
              0.15165696 = queryWeight, product of:
                1.8798066 = boost
                6.5877166 = idf(docFreq=159, maxDocs=42740)
                0.012246564 = queryNorm
              0.4117323 = fieldWeight in 4676, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5877166 = idf(docFreq=159, maxDocs=42740)
                0.0625 = fieldNorm(doc=4676)
          0.38084018 = weight(abstract_txt:queries in 4676) [ClassicSimilarity], result of:
            0.38084018 = score(doc=4676,freq=7.0), product of:
              0.45254663 = queryWeight, product of:
                7.2610373 = boost
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.012246564 = queryNorm
              0.84154904 = fieldWeight in 4676, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.0625 = fieldNorm(doc=4676)
        0.24 = coord(6/25)
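
The content-match scores combine three layers: a per-term score (queryWeight times fieldWeight), the sum over matched terms, and a coord factor for the share of query terms that matched. The sketch below recomputes the last entry (doc 4676) from the numbers in its tree, assuming the standard ClassicSimilarity formulas. The helper names are illustrative, and the boost of 1.0 for "submitted" is an assumption, since the tree shows no boost line for that term.

```python
import math

MAX_DOCS = 42740          # maxDocs from the explain tree
QUERY_NORM = 0.012246564  # queryNorm from the explain tree

def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """Score of one matched term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

# (term, boost, docFreq, freq) copied from the tree for doc 4676.
# The boost of 1.0 for "submitted" is assumed (no boost line is shown).
matched_terms = [
    ("submitted", 1.0, 104, 2.0),
    ("that", 1.3666328, 10595, 2.0),
    ("identify", 1.4212674, 797, 1.0),
    ("search", 1.5629499, 3014, 18.0),
    ("classify", 1.8798066, 159, 1.0),
    ("queries", 7.2610373, 715, 7.0),
]
FIELD_NORM = 0.0625   # fieldNorm(doc=4676)
QUERY_TERMS = 25      # denominator of coord(6/25)

raw_sum = sum(term_score(b, df, f, FIELD_NORM) for _, b, df, f in matched_terms)
coord = len(matched_terms) / QUERY_TERMS
final_score = coord * raw_sum   # tree shows 0.14390479

print(round(raw_sum, 5), coord, round(final_score, 5))
```

The coord factor explains why a record matching seven of the 25 query terms (coord 0.28) can outrank one with a larger raw sum but only six matches (coord 0.24), as with entries 1 and 2 above.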