Document (#38128)

Author
Ye, Z.
He, B.
Wang, L.
Luo, T.
Title
Utilizing term proximity for blog post retrieval
Source
Journal of the American Society for Information Science and Technology. 64(2013) no.11, pp. 2278-2298
Year
2013
Abstract
Term proximity has proven effective in many areas of information retrieval (IR) research, yet it remains unexplored in blogosphere IR. The blogosphere is characterized by large amounts of noise, including incohesive content, off-topic content and spam. Consequently, the classical bag-of-words unigram IR models are not reliable enough to provide robust and effective retrieval performance. In this article, we propose to improve blog post retrieval performance by employing term proximity information. We investigate a variety of popular and state-of-the-art proximity-based statistical IR models, including a proximity-based counting model, the Markov random field (MRF) model, and the divergence from randomness (DFR) multinomial model. Extensive experimentation on the standard TREC Blog06 test dataset demonstrates that introducing term proximity information is indeed beneficial to retrieval from the blogosphere. Results also indicate that the unordered bi-gram model with sequential-dependence phrases outperforms the other variants of the proximity-based models. Finally, inspired by the effectiveness of proximity models, we extend our study by exploring the proximity evidence between query terms and opinionated terms. The resulting opinionated proximity model shows promising performance in the experiments.
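The abstract contrasts bag-of-words unigram models with an unordered bi-gram model built on sequential-dependence phrases. As a rough illustration only, the sketch below pairs each query term with its neighbour (the sequential-dependence idea) and counts how often each pair co-occurs, in either order, within a small window of a post. The function names, the window size of 8 and the exact counting rule are assumptions made for illustration. This is not the authors' implementation. The article's models turn such evidence into weights using MRF or DFR formulas, which are not reproduced here.

```python
from typing import Dict, List, Tuple


def sequential_pairs(query_terms: List[str]) -> List[Tuple[str, str]]:
    # Sequential dependence: pair each query term only with the next one.
    return list(zip(query_terms, query_terms[1:]))


def unordered_pair_count(doc_tokens: List[str], t1: str, t2: str, window: int = 8) -> int:
    # Hypothetical counting rule: number of position pairs where t1 and t2
    # occur in either order and fewer than `window` tokens apart.
    pos1 = [i for i, tok in enumerate(doc_tokens) if tok == t1]
    pos2 = [i for i, tok in enumerate(doc_tokens) if tok == t2]
    return sum(1 for i in pos1 for j in pos2 if i != j and abs(i - j) < window)


def proximity_features(query_terms: List[str], doc_tokens: List[str],
                       window: int = 8) -> Dict[Tuple[str, str], int]:
    # One unordered-window count per adjacent query-term pair.
    return {pair: unordered_pair_count(doc_tokens, pair[0], pair[1], window)
            for pair in sequential_pairs(query_terms)}


if __name__ == "__main__":
    post = "blog post retrieval uses term proximity".split()
    print(proximity_features(["term", "proximity", "blog"], post))
    print(proximity_features(["term", "proximity", "blog"], post, window=4))
```

With the smaller window, "proximity" and "blog" (five tokens apart) no longer count as a co-occurrence. This is the kind of local evidence that a unigram model ignores.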
Theme
Computational linguistics

Similar documents (author)

  1. Wang, H.; Wang, C.: Ontologies for universal information systems (1995) 4.77
    4.7679567 = sum of:
      4.7679567 = weight(author_txt:wang in 3263) [ClassicSimilarity], result of:
        4.7679567 = fieldWeight in 3263, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          6.7429094 = idf(docFreq=136, maxDocs=42740)
          0.5 = fieldNorm(doc=3263)
    
  2. Wang, C.: The online catalogue, subject access and user reactions : a review (1985) 4.21
    4.2143183 = sum of:
      4.2143183 = weight(author_txt:wang in 986) [ClassicSimilarity], result of:
        4.2143183 = fieldWeight in 986, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.7429094 = idf(docFreq=136, maxDocs=42740)
          0.625 = fieldNorm(doc=986)
    
  3. Wang, C.: Bibliometrics : a textbook (1990) 4.21
    4.2143183 = sum of:
      4.2143183 = weight(author_txt:wang in 5109) [ClassicSimilarity], result of:
        4.2143183 = fieldWeight in 5109, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.7429094 = idf(docFreq=136, maxDocs=42740)
          0.625 = fieldNorm(doc=5109)
    
  4. Wang, P.: Users' information needs at different stages of a research project : a cognitive view (1997) 4.21
    4.2143183 = sum of:
      4.2143183 = weight(author_txt:wang in 1321) [ClassicSimilarity], result of:
        4.2143183 = fieldWeight in 1321, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.7429094 = idf(docFreq=136, maxDocs=42740)
          0.625 = fieldNorm(doc=1321)
    
  5. Wang, D.: Cataloger appraises keyword searching in WorldCat (1997) 4.21
    4.2143183 = sum of:
      4.2143183 = weight(author_txt:wang in 2443) [ClassicSimilarity], result of:
        4.2143183 = fieldWeight in 2443, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.7429094 = idf(docFreq=136, maxDocs=42740)
          0.625 = fieldNorm(doc=2443)
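The scores above are Lucene ClassicSimilarity explanations. For a single-term match on the author field, the score is fieldWeight = tf × idf × fieldNorm, with tf = √freq. The reported idf values agree with idf = 1 + ln(maxDocs / (docFreq + 1)). Here ln(42740 / 137) + 1 ≈ 6.7429. The minimal sketch below assumes that formula. It reproduces 4.7679567 (entry 1) and 4.2143183 (entries 2 to 5) up to float32 rounding.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed ClassicSimilarity idf, matching the values in the explanations.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # ClassicSimilarity tf: square root of the raw term frequency.
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


if __name__ == "__main__":
    print(field_weight(2.0, 136, 42740, 0.5))    # entry 1: "wang" twice, fieldNorm 0.5
    print(field_weight(1.0, 136, 42740, 0.625))  # entries 2-5: "wang" once, fieldNorm 0.625
```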
    

Similar documents (content)

  1. Vechtomova, O.: Facet-based opinion retrieval from blogs (2010) 0.26
    0.2550452 = sum of:
      0.2550452 = product of:
        0.9108757 = sum of:
          0.027023207 = weight(abstract_txt:terms in 1226) [ClassicSimilarity], result of:
            0.027023207 = score(doc=1226,freq=2.0), product of:
              0.050202537 = queryWeight, product of:
                1.0421335 = boost
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.011865263 = queryNorm
              0.5382837 = fieldWeight in 1226, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.09375 = fieldNorm(doc=1226)
          0.078633286 = weight(abstract_txt:divergence in 1226) [ClassicSimilarity], result of:
            0.078633286 = score(doc=1226,freq=1.0), product of:
              0.102322705 = queryWeight, product of:
                1.0520382 = boost
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.011865263 = queryNorm
              0.7684833 = fieldWeight in 1226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.09375 = fieldNorm(doc=1226)
          0.014150983 = weight(abstract_txt:based in 1226) [ClassicSimilarity], result of:
            0.014150983 = score(doc=1226,freq=1.0), product of:
              0.047039844 = queryWeight, product of:
                1.2354895 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.011865263 = queryNorm
              0.3008297 = fieldWeight in 1226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.09375 = fieldNorm(doc=1226)
          0.032386918 = weight(abstract_txt:effective in 1226) [ClassicSimilarity], result of:
            0.032386918 = score(doc=1226,freq=1.0), product of:
              0.07136591 = queryWeight, product of:
                1.2425272 = boost
                4.840693 = idf(docFreq=917, maxDocs=42740)
                0.011865263 = queryNorm
              0.45381498 = fieldWeight in 1226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.840693 = idf(docFreq=917, maxDocs=42740)
                0.09375 = fieldNorm(doc=1226)
          0.023752645 = weight(abstract_txt:retrieval in 1226) [ClassicSimilarity], result of:
            0.023752645 = score(doc=1226,freq=1.0), product of:
              0.07312441 = queryWeight, product of:
                1.7787163 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.011865263 = queryNorm
              0.3248251 = fieldWeight in 1226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.09375 = fieldNorm(doc=1226)
          0.18949793 = weight(abstract_txt:blog in 1226) [ClassicSimilarity], result of:
            0.18949793 = score(doc=1226,freq=2.0), product of:
              0.18392357 = queryWeight, product of:
                1.9947073 = boost
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.011865263 = queryNorm
              1.030308 = fieldWeight in 1226, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.09375 = fieldNorm(doc=1226)
          0.5454307 = weight(abstract_txt:proximity in 1226) [ClassicSimilarity], result of:
            0.5454307 = score(doc=1226,freq=1.0), product of:
              0.8017903 = queryWeight, product of:
                9.312706 = boost
                7.256171 = idf(docFreq=81, maxDocs=42740)
                0.011865263 = queryNorm
              0.680266 = fieldWeight in 1226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.256171 = idf(docFreq=81, maxDocs=42740)
                0.09375 = fieldNorm(doc=1226)
        0.28 = coord(7/25)
    
  2. Guo, L.; Wan, X.: Exploiting syntactic and semantic relationships between terms for opinion retrieval (2012) 0.20
    0.1964369 = sum of:
      0.1964369 = product of:
        0.8184871 = sum of:
          0.027580447 = weight(abstract_txt:terms in 2493) [ClassicSimilarity], result of:
            0.027580447 = score(doc=2493,freq=3.0), product of:
              0.050202537 = queryWeight, product of:
                1.0421335 = boost
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.011865263 = queryNorm
              0.5493835 = fieldWeight in 2493, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.078125 = fieldNorm(doc=2493)
          0.011792485 = weight(abstract_txt:based in 2493) [ClassicSimilarity], result of:
            0.011792485 = score(doc=2493,freq=1.0), product of:
              0.047039844 = queryWeight, product of:
                1.2354895 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.011865263 = queryNorm
              0.2506914 = fieldWeight in 2493, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.078125 = fieldNorm(doc=2493)
          0.02799276 = weight(abstract_txt:retrieval in 2493) [ClassicSimilarity], result of:
            0.02799276 = score(doc=2493,freq=2.0), product of:
              0.07312441 = queryWeight, product of:
                1.7787163 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.011865263 = queryNorm
              0.3828101 = fieldWeight in 2493, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.078125 = fieldNorm(doc=2493)
          0.05358071 = weight(abstract_txt:term in 2493) [ClassicSimilarity], result of:
            0.05358071 = score(doc=2493,freq=1.0), product of:
              0.14203025 = queryWeight, product of:
                2.4789395 = boost
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.011865263 = queryNorm
              0.37724856 = fieldWeight in 2493, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.078125 = fieldNorm(doc=2493)
          0.054744426 = weight(abstract_txt:model in 2493) [ClassicSimilarity], result of:
            0.054744426 = score(doc=2493,freq=2.0), product of:
              0.123186134 = queryWeight, product of:
                2.581139 = boost
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.011865263 = queryNorm
              0.44440413 = fieldWeight in 2493, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.078125 = fieldNorm(doc=2493)
          0.6427963 = weight(abstract_txt:proximity in 2493) [ClassicSimilarity], result of:
            0.6427963 = score(doc=2493,freq=2.0), product of:
              0.8017903 = queryWeight, product of:
                9.312706 = boost
                7.256171 = idf(docFreq=81, maxDocs=42740)
                0.011865263 = queryNorm
              0.80170125 = fieldWeight in 2493, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.256171 = idf(docFreq=81, maxDocs=42740)
                0.078125 = fieldNorm(doc=2493)
        0.24 = coord(6/25)
    
  3. Hawking, D.; Thistlewaite, P.: Proximity operators : so near and yet so far (1996) 0.19
    0.19278127 = sum of:
      0.19278127 = product of:
        1.6065106 = sum of:
          0.037735954 = weight(abstract_txt:based in 617) [ClassicSimilarity], result of:
            0.037735954 = score(doc=617,freq=1.0), product of:
              0.047039844 = queryWeight, product of:
                1.2354895 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.011865263 = queryNorm
              0.80221254 = fieldWeight in 617, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.25 = fieldNorm(doc=617)
          0.11429277 = weight(abstract_txt:performance in 617) [ClassicSimilarity], result of:
            0.11429277 = score(doc=617,freq=1.0), product of:
              0.098470956 = queryWeight, product of:
                1.7875582 = boost
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.011865263 = queryNorm
              1.1606749 = fieldWeight in 617, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.25 = fieldNorm(doc=617)
          1.454482 = weight(abstract_txt:proximity in 617) [ClassicSimilarity], result of:
            1.454482 = score(doc=617,freq=1.0), product of:
              0.8017903 = queryWeight, product of:
                9.312706 = boost
                7.256171 = idf(docFreq=81, maxDocs=42740)
                0.011865263 = queryNorm
              1.8140428 = fieldWeight in 617, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.256171 = idf(docFreq=81, maxDocs=42740)
                0.25 = fieldNorm(doc=617)
        0.12 = coord(3/25)
    
  4. Keen, E.M.: Some aspects of proximity searching in text retrieval systems (1992) 0.19
    0.19088991 = sum of:
      0.19088991 = product of:
        1.193062 = sum of:
          0.014150983 = weight(abstract_txt:based in 6190) [ClassicSimilarity], result of:
            0.014150983 = score(doc=6190,freq=1.0), product of:
              0.047039844 = queryWeight, product of:
                1.2354895 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.011865263 = queryNorm
              0.3008297 = fieldWeight in 6190, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.09375 = fieldNorm(doc=6190)
          0.023752645 = weight(abstract_txt:retrieval in 6190) [ClassicSimilarity], result of:
            0.023752645 = score(doc=6190,freq=1.0), product of:
              0.07312441 = queryWeight, product of:
                1.7787163 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.011865263 = queryNorm
              0.3248251 = fieldWeight in 6190, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.09375 = fieldNorm(doc=6190)
          0.06429686 = weight(abstract_txt:term in 6190) [ClassicSimilarity], result of:
            0.06429686 = score(doc=6190,freq=1.0), product of:
              0.14203025 = queryWeight, product of:
                2.4789395 = boost
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.011865263 = queryNorm
              0.4526983 = fieldWeight in 6190, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8287816 = idf(docFreq=928, maxDocs=42740)
                0.09375 = fieldNorm(doc=6190)
          1.0908614 = weight(abstract_txt:proximity in 6190) [ClassicSimilarity], result of:
            1.0908614 = score(doc=6190,freq=4.0), product of:
              0.8017903 = queryWeight, product of:
                9.312706 = boost
                7.256171 = idf(docFreq=81, maxDocs=42740)
                0.011865263 = queryNorm
              1.360532 = fieldWeight in 6190, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.256171 = idf(docFreq=81, maxDocs=42740)
                0.09375 = fieldNorm(doc=6190)
        0.16 = coord(4/25)
    
  5. Abdou, S.; Savoy, J.: Searching in Medline : query expansion and manual indexing evaluation (2008) 0.19
    0.19010058 = sum of:
      0.19010058 = product of:
        0.52805716 = sum of:
          0.015923578 = weight(abstract_txt:terms in 4063) [ClassicSimilarity], result of:
            0.015923578 = score(doc=4063,freq=1.0), product of:
              0.050202537 = queryWeight, product of:
                1.0421335 = boost
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.011865263 = queryNorm
              0.3171867 = fieldWeight in 4063, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.078125 = fieldNorm(doc=4063)
          0.09267022 = weight(abstract_txt:divergence in 4063) [ClassicSimilarity], result of:
            0.09267022 = score(doc=4063,freq=2.0), product of:
              0.102322705 = queryWeight, product of:
                1.0520382 = boost
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.011865263 = queryNorm
              0.90566623 = fieldWeight in 4063, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.078125 = fieldNorm(doc=4063)
          0.028304188 = weight(abstract_txt:including in 4063) [ClassicSimilarity], result of:
            0.028304188 = score(doc=4063,freq=2.0), product of:
              0.058468554 = queryWeight, product of:
                1.1246611 = boost
                4.381505 = idf(docFreq=1452, maxDocs=42740)
                0.011865263 = queryNorm
              0.48409247 = fieldWeight in 4063, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.381505 = idf(docFreq=1452, maxDocs=42740)
                0.078125 = fieldNorm(doc=4063)
          0.13380933 = weight(abstract_txt:randomness in 4063) [ClassicSimilarity], result of:
            0.13380933 = score(doc=4063,freq=2.0), product of:
              0.13071823 = queryWeight, product of:
                1.1890869 = boost
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.011865263 = queryNorm
              1.0236471 = fieldWeight in 4063, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.078125 = fieldNorm(doc=4063)
          0.01667709 = weight(abstract_txt:based in 4063) [ClassicSimilarity], result of:
            0.01667709 = score(doc=4063,freq=2.0), product of:
              0.047039844 = queryWeight, product of:
                1.2354895 = boost
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.011865263 = queryNorm
              0.35453117 = fieldWeight in 4063, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2088501 = idf(docFreq=4693, maxDocs=42740)
                0.078125 = fieldNorm(doc=4063)
          0.039587744 = weight(abstract_txt:retrieval in 4063) [ClassicSimilarity], result of:
            0.039587744 = score(doc=4063,freq=4.0), product of:
              0.07312441 = queryWeight, product of:
                1.7787163 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.011865263 = queryNorm
              0.5413752 = fieldWeight in 4063, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.078125 = fieldNorm(doc=4063)
          0.035716493 = weight(abstract_txt:performance in 4063) [ClassicSimilarity], result of:
            0.035716493 = score(doc=4063,freq=1.0), product of:
              0.098470956 = queryWeight, product of:
                1.7875582 = boost
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.011865263 = queryNorm
              0.36271092 = fieldWeight in 4063, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.078125 = fieldNorm(doc=4063)
          0.09832057 = weight(abstract_txt:models in 4063) [ClassicSimilarity], result of:
            0.09832057 = score(doc=4063,freq=4.0), product of:
              0.13410701 = queryWeight, product of:
                2.4088027 = boost
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.011865263 = queryNorm
              0.7331501 = fieldWeight in 4063, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.078125 = fieldNorm(doc=4063)
          0.06704795 = weight(abstract_txt:model in 4063) [ClassicSimilarity], result of:
            0.06704795 = score(doc=4063,freq=3.0), product of:
              0.123186134 = queryWeight, product of:
                2.581139 = boost
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.011865263 = queryNorm
              0.54428166 = fieldWeight in 4063, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.078125 = fieldNorm(doc=4063)
        0.36 = coord(9/25)
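Each content-similarity score is assembled in three steps:

1. For every matched term, multiply queryWeight (boost × idf × queryNorm) by fieldWeight (√freq × idf × fieldNorm).
2. Sum these per-term products.
3. Scale the sum by coord, the fraction of the 25 query clauses that matched.

The sketch below takes boost, idf, queryNorm and fieldNorm as reported in the listing rather than recomputing them. It checks the result against entry 3 (Hawking and Thistlewaite, doc 617), whose explanation reports 0.19278127.

```python
import math


def term_score(boost: float, idf: float, query_norm: float,
               freq: float, field_norm: float) -> float:
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf * query_norm
    # fieldWeight = sqrt(freq) * idf * fieldNorm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def doc_score(term_scores, matched: int, total: int) -> float:
    # coord(matched/total) scales the summed per-term scores.
    return sum(term_scores) * matched / total


QUERY_NORM = 0.011865263  # value reported in every content explanation above

# Entry 3 (doc 617): (boost, idf, freq, fieldNorm) for each matched term.
hawking_terms = {
    "based": (1.2354895, 3.2088501, 1.0, 0.25),
    "performance": (1.7875582, 4.6426997, 1.0, 0.25),
    "proximity": (9.312706, 7.256171, 1.0, 0.25),
}

scores = [term_score(b, idf, QUERY_NORM, f, n) for b, idf, f, n in hawking_terms.values()]
total = doc_score(scores, matched=len(scores), total=25)

if __name__ == "__main__":
    print(round(total, 4))
```

The heavy boost on "proximity" dominates. That single term contributes about 1.45 of the 1.61 summed before coord.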