Document (#42046)

Author
Belbachir, F.
Boughanem, M.
Title
Using language models to improve opinion detection
Source
Information processing and management. 54(2018) no.6, pp.958-968
Year
2018
Abstract
Opinion mining is one of the most important research tasks in the information retrieval research community. With the huge volume of opinionated data available on the Web, approaches must be developed to differentiate opinion from fact. In this paper, we present a lexicon-based approach for opinion retrieval. Generally, opinion retrieval consists of two stages: relevance to the query and opinion detection. In our work, we focus on the second stage, which concerns detecting opinionated documents. We compare the document to be analyzed with opinionated sources that contain subjective information. We hypothesize that a document with a strong similarity to opinionated sources is more likely to be opinionated itself. Typical lexicon-based approaches treat and choose their opinion sources according to their test collection, then calculate the opinion score based on the frequency of subjective terms in the document. In our work, we use different open opinion collections without any specific treatment and consider them as a reference collection. We then use language models to determine opinion scores. The analysis document and reference collection are represented by different language models (i.e., Dirichlet, Jelinek-Mercer and two-stage models). These language models are generally used in information retrieval to represent the relationship between documents and queries. However, in our study, we modify these language models to represent opinionated documents. We carry out several experiments using Text REtrieval Conference (TREC) Blogs 06 as our analysis collection and Internet Movie Database (IMDB), Multi-Perspective Question Answering (MPQA) and CHESLY as our reference collection. To improve opinion detection, we study the impact of using different language models to represent the document and reference collection alongside different combinations of opinion and retrieval scores. We then use this data to deduce the best opinion detection models. Using the best models, our approach improves on the best baseline of TREC Blog (baseline4) by 30%.
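The abstract names Dirichlet and Jelinek-Mercer language models but gives no formulas. As an illustration only, the sketch below shows the standard textbook forms of these two smoothing methods, not the modified models the authors describe. The function names, the toy document, the collection probabilities and the parameter values are all assumptions made for this example; here `lam` is taken to weight the collection model.

```python
from collections import Counter

def jelinek_mercer(term, doc_tf, doc_len, coll_prob, lam=0.5):
    """Linear interpolation of the document model with the collection model.

    Assumption: lam is the weight given to the collection probability.
    """
    ml = doc_tf.get(term, 0) / doc_len if doc_len else 0.0
    return (1.0 - lam) * ml + lam * coll_prob.get(term, 0.0)

def dirichlet(term, doc_tf, doc_len, coll_prob, mu=2000.0):
    """Dirichlet-prior smoothing: the collection acts as mu pseudo-counts."""
    return (doc_tf.get(term, 0) + mu * coll_prob.get(term, 0.0)) / (doc_len + mu)

# Hypothetical toy data: one short document and made-up collection probabilities.
doc_tf = Counter("great film great cast".split())
doc_len = sum(doc_tf.values())
coll_prob = {"great": 0.1, "film": 0.05, "cast": 0.02}

p_jm = jelinek_mercer("great", doc_tf, doc_len, coll_prob, lam=0.5)
p_dir = dirichlet("great", doc_tf, doc_len, coll_prob, mu=4.0)
```

In a setting like the one the abstract outlines, the collection probabilities would presumably be estimated from an opinionated reference collection, but how the scores are combined is not stated in this record.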
Content
Cf.: https://doi.org/10.1016/j.ipm.2018.07.001.
Theme
Computational linguistics

Similar documents (content)

  1. Seki, K.; Uehara, K.: Opinionated document retrieval using subjective triggers (2011) 0.35
    0.3509646 = sum of:
      0.3509646 = product of:
        1.253445 = sum of:
          0.024103435 = weight(abstract_txt:itself in 911) [ClassicSimilarity], result of:
            0.024103435 = score(doc=911,freq=1.0), product of:
              0.06826985 = queryWeight, product of:
                1.1761276 = boost
                5.648979 = idf(docFreq=413, maxDocs=43254)
                0.010275537 = queryNorm
              0.3530612 = fieldWeight in 911, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.648979 = idf(docFreq=413, maxDocs=43254)
                0.0625 = fieldNorm(doc=911)
          0.066012174 = weight(abstract_txt:subjective in 911) [ClassicSimilarity], result of:
            0.066012174 = score(doc=911,freq=3.0), product of:
              0.0926585 = queryWeight, product of:
                1.3701956 = boost
                6.5810947 = idf(docFreq=162, maxDocs=43254)
                0.010275537 = queryNorm
              0.7124244 = fieldWeight in 911, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5810947 = idf(docFreq=162, maxDocs=43254)
                0.0625 = fieldNorm(doc=911)
          0.033517454 = weight(abstract_txt:retrieval in 911) [ClassicSimilarity], result of:
            0.033517454 = score(doc=911,freq=4.0), product of:
              0.0772759 = queryWeight, product of:
                2.1673179 = boost
                3.4699 = idf(docFreq=3658, maxDocs=43254)
                0.010275537 = queryNorm
              0.4337375 = fieldWeight in 911, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4699 = idf(docFreq=3658, maxDocs=43254)
                0.0625 = fieldNorm(doc=911)
          0.037169483 = weight(abstract_txt:document in 911) [ClassicSimilarity], result of:
            0.037169483 = score(doc=911,freq=2.0), product of:
              0.09816063 = queryWeight, product of:
                2.2298653 = boost
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.010275537 = queryNorm
              0.37865978 = fieldWeight in 911, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.0625 = fieldNorm(doc=911)
          0.041794557 = weight(abstract_txt:language in 911) [ClassicSimilarity], result of:
            0.041794557 = score(doc=911,freq=2.0), product of:
              0.112794146 = queryWeight, product of:
                2.6184473 = boost
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.010275537 = queryNorm
              0.37053835 = fieldWeight in 911, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.0625 = fieldNorm(doc=911)
          0.48295107 = weight(abstract_txt:opinionated in 911) [ClassicSimilarity], result of:
            0.48295107 = score(doc=911,freq=2.0), product of:
              0.57651263 = queryWeight, product of:
                5.919772 = boost
                9.47762 = idf(docFreq=8, maxDocs=43254)
                0.010275537 = queryNorm
              0.83771116 = fieldWeight in 911, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.47762 = idf(docFreq=8, maxDocs=43254)
                0.0625 = fieldNorm(doc=911)
          0.56789684 = weight(abstract_txt:opinion in 911) [ClassicSimilarity], result of:
            0.56789684 = score(doc=911,freq=4.0), product of:
              0.65964013 = queryWeight, product of:
                9.320735 = boost
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.010275537 = queryNorm
              0.8609192 = fieldWeight in 911, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.0625 = fieldNorm(doc=911)
        0.28 = coord(7/25)
    
  2. Guo, L.; Wan, X.: Exploiting syntactic and semantic relationships between terms for opinion retrieval (2012) 0.32
    0.3157627 = sum of:
      0.3157627 = product of:
        1.1277239 = sum of:
          0.00823082 = weight(abstract_txt:based in 1957) [ClassicSimilarity], result of:
            0.00823082 = score(doc=1957,freq=1.0), product of:
              0.032902494 = queryWeight, product of:
                3.2020218 = idf(docFreq=4782, maxDocs=43254)
                0.010275537 = queryNorm
              0.25015795 = fieldWeight in 1957, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2020218 = idf(docFreq=4782, maxDocs=43254)
                0.078125 = fieldNorm(doc=1957)
          0.017486395 = weight(abstract_txt:documents in 1957) [ClassicSimilarity], result of:
            0.017486395 = score(doc=1957,freq=1.0), product of:
              0.054375123 = queryWeight, product of:
                1.2855403 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.010275537 = queryNorm
              0.32158816 = fieldWeight in 1957, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.078125 = fieldNorm(doc=1957)
          0.049919326 = weight(abstract_txt:trec in 1957) [ClassicSimilarity], result of:
            0.049919326 = score(doc=1957,freq=1.0), product of:
              0.09559066 = queryWeight, product of:
                1.3917066 = boost
                6.6844125 = idf(docFreq=146, maxDocs=43254)
                0.010275537 = queryNorm
              0.5222197 = fieldWeight in 1957, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6844125 = idf(docFreq=146, maxDocs=43254)
                0.078125 = fieldNorm(doc=1957)
          0.029625524 = weight(abstract_txt:retrieval in 1957) [ClassicSimilarity], result of:
            0.029625524 = score(doc=1957,freq=2.0), product of:
              0.0772759 = queryWeight, product of:
                2.1673179 = boost
                3.4699 = idf(docFreq=3658, maxDocs=43254)
                0.010275537 = queryNorm
              0.38337338 = fieldWeight in 1957, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4699 = idf(docFreq=3658, maxDocs=43254)
                0.078125 = fieldNorm(doc=1957)
          0.032853495 = weight(abstract_txt:document in 1957) [ClassicSimilarity], result of:
            0.032853495 = score(doc=1957,freq=1.0), product of:
              0.09816063 = queryWeight, product of:
                2.2298653 = boost
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.010275537 = queryNorm
              0.33469114 = fieldWeight in 1957, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.078125 = fieldNorm(doc=1957)
          0.05053735 = weight(abstract_txt:collection in 1957) [ClassicSimilarity], result of:
            0.05053735 = score(doc=1957,freq=1.0), product of:
              0.13900115 = queryWeight, product of:
                2.906764 = boost
                4.653761 = idf(docFreq=1119, maxDocs=43254)
                0.010275537 = queryNorm
              0.36357507 = fieldWeight in 1957, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.653761 = idf(docFreq=1119, maxDocs=43254)
                0.078125 = fieldNorm(doc=1957)
          0.93907106 = weight(abstract_txt:opinion in 1957) [ClassicSimilarity], result of:
            0.93907106 = score(doc=1957,freq=7.0), product of:
              0.65964013 = queryWeight, product of:
                9.320735 = boost
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.010275537 = queryNorm
              1.4236112 = fieldWeight in 1957, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.078125 = fieldNorm(doc=1957)
        0.28 = coord(7/25)
    
  3. Ku, L.-W.; Ho, H.-W.; Chen, H.-H.: Opinion mining and relationship discovery using CopeOpi opinion analysis system (2009) 0.30
    0.30008158 = sum of:
      0.30008158 = product of:
        1.0717199 = sum of:
          0.0147237405 = weight(abstract_txt:based in 4939) [ClassicSimilarity], result of:
            0.0147237405 = score(doc=4939,freq=5.0), product of:
              0.032902494 = queryWeight, product of:
                3.2020218 = idf(docFreq=4782, maxDocs=43254)
                0.010275537 = queryNorm
              0.44749618 = fieldWeight in 4939, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.2020218 = idf(docFreq=4782, maxDocs=43254)
                0.0625 = fieldNorm(doc=4939)
          0.013989116 = weight(abstract_txt:documents in 4939) [ClassicSimilarity], result of:
            0.013989116 = score(doc=4939,freq=1.0), product of:
              0.054375123 = queryWeight, product of:
                1.2855403 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.010275537 = queryNorm
              0.25727051 = fieldWeight in 4939, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.0625 = fieldNorm(doc=4939)
          0.021565324 = weight(abstract_txt:sources in 4939) [ClassicSimilarity], result of:
            0.021565324 = score(doc=4939,freq=1.0), product of:
              0.07256225 = queryWeight, product of:
                1.4850496 = boost
                4.7551613 = idf(docFreq=1011, maxDocs=43254)
                0.010275537 = queryNorm
              0.29719758 = fieldWeight in 4939, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7551613 = idf(docFreq=1011, maxDocs=43254)
                0.0625 = fieldNorm(doc=4939)
          0.025600635 = weight(abstract_txt:best in 4939) [ClassicSimilarity], result of:
            0.025600635 = score(doc=4939,freq=1.0), product of:
              0.08135306 = queryWeight, product of:
                1.5724344 = boost
                5.0349693 = idf(docFreq=764, maxDocs=43254)
                0.010275537 = queryNorm
              0.31468558 = fieldWeight in 4939, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0349693 = idf(docFreq=764, maxDocs=43254)
                0.0625 = fieldNorm(doc=4939)
          0.037169483 = weight(abstract_txt:document in 4939) [ClassicSimilarity], result of:
            0.037169483 = score(doc=4939,freq=2.0), product of:
              0.09816063 = queryWeight, product of:
                2.2298653 = boost
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.010275537 = queryNorm
              0.37865978 = fieldWeight in 4939, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.0625 = fieldNorm(doc=4939)
          0.10682635 = weight(abstract_txt:models in 4939) [ClassicSimilarity], result of:
            0.10682635 = score(doc=4939,freq=3.0), product of:
              0.21085909 = queryWeight, product of:
                4.384725 = boost
                4.679995 = idf(docFreq=1090, maxDocs=43254)
                0.010275537 = queryNorm
              0.50662434 = fieldWeight in 4939, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.679995 = idf(docFreq=1090, maxDocs=43254)
                0.0625 = fieldNorm(doc=4939)
          0.85184526 = weight(abstract_txt:opinion in 4939) [ClassicSimilarity], result of:
            0.85184526 = score(doc=4939,freq=9.0), product of:
              0.65964013 = queryWeight, product of:
                9.320735 = boost
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.010275537 = queryNorm
              1.2913787 = fieldWeight in 4939, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.0625 = fieldNorm(doc=4939)
        0.28 = coord(7/25)
    
  4. Ku, L.-W.; Chen, H.-H.: Mining opinions from the Web : beyond relevance retrieval (2007) 0.30
    0.2973234 = sum of:
      0.2973234 = product of:
        0.9291357 = sum of:
          0.016591262 = weight(abstract_txt:improve in 2606) [ClassicSimilarity], result of:
            0.016591262 = score(doc=2606,freq=1.0), product of:
              0.05322258 = queryWeight, product of:
                1.0384556 = boost
                4.987736 = idf(docFreq=801, maxDocs=43254)
                0.010275537 = queryNorm
              0.3117335 = fieldWeight in 2606, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.987736 = idf(docFreq=801, maxDocs=43254)
                0.0625 = fieldNorm(doc=2606)
          0.0076015717 = weight(abstract_txt:data in 2606) [ClassicSimilarity], result of:
            0.0076015717 = score(doc=2606,freq=1.0), product of:
              0.036208373 = queryWeight, product of:
                1.0490353 = boost
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.010275537 = queryNorm
              0.20993961 = fieldWeight in 2606, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.0625 = fieldNorm(doc=2606)
          0.031280614 = weight(abstract_txt:documents in 2606) [ClassicSimilarity], result of:
            0.031280614 = score(doc=2606,freq=5.0), product of:
              0.054375123 = queryWeight, product of:
                1.2855403 = boost
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.010275537 = queryNorm
              0.57527435 = fieldWeight in 2606, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.1163282 = idf(docFreq=1916, maxDocs=43254)
                0.0625 = fieldNorm(doc=2606)
          0.066012174 = weight(abstract_txt:subjective in 2606) [ClassicSimilarity], result of:
            0.066012174 = score(doc=2606,freq=3.0), product of:
              0.0926585 = queryWeight, product of:
                1.3701956 = boost
                6.5810947 = idf(docFreq=162, maxDocs=43254)
                0.010275537 = queryNorm
              0.7124244 = fieldWeight in 2606, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5810947 = idf(docFreq=162, maxDocs=43254)
                0.0625 = fieldNorm(doc=2606)
          0.013351642 = weight(abstract_txt:different in 2606) [ClassicSimilarity], result of:
            0.013351642 = score(doc=2606,freq=1.0), product of:
              0.058015328 = queryWeight, product of:
                1.5332972 = boost
                3.6822383 = idf(docFreq=2958, maxDocs=43254)
                0.010275537 = queryNorm
              0.2301399 = fieldWeight in 2606, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6822383 = idf(docFreq=2958, maxDocs=43254)
                0.0625 = fieldNorm(doc=2606)
          0.016758727 = weight(abstract_txt:retrieval in 2606) [ClassicSimilarity], result of:
            0.016758727 = score(doc=2606,freq=1.0), product of:
              0.0772759 = queryWeight, product of:
                2.1673179 = boost
                3.4699 = idf(docFreq=3658, maxDocs=43254)
                0.010275537 = queryNorm
              0.21686874 = fieldWeight in 2606, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4699 = idf(docFreq=3658, maxDocs=43254)
                0.0625 = fieldNorm(doc=2606)
          0.026282795 = weight(abstract_txt:document in 2606) [ClassicSimilarity], result of:
            0.026282795 = score(doc=2606,freq=1.0), product of:
              0.09816063 = queryWeight, product of:
                2.2298653 = boost
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.010275537 = queryNorm
              0.26775292 = fieldWeight in 2606, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2840466 = idf(docFreq=1620, maxDocs=43254)
                0.0625 = fieldNorm(doc=2606)
          0.7512569 = weight(abstract_txt:opinion in 2606) [ClassicSimilarity], result of:
            0.7512569 = score(doc=2606,freq=7.0), product of:
              0.65964013 = queryWeight, product of:
                9.320735 = boost
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.010275537 = queryNorm
              1.138889 = fieldWeight in 2606, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.0625 = fieldNorm(doc=2606)
        0.32 = coord(8/25)
    
  5. Varathan, K.D.; Giachanou, A.; Crestani, F.: Comparative opinion mining : a review (2017) 0.25
    0.2542803 = sum of:
      0.2542803 = product of:
        1.0595013 = sum of:
          0.0076015717 = weight(abstract_txt:data in 5005) [ClassicSimilarity], result of:
            0.0076015717 = score(doc=5005,freq=1.0), product of:
              0.036208373 = queryWeight, product of:
                1.0490353 = boost
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.010275537 = queryNorm
              0.20993961 = fieldWeight in 5005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3590338 = idf(docFreq=4087, maxDocs=43254)
                0.0625 = fieldNorm(doc=5005)
          0.03811215 = weight(abstract_txt:subjective in 5005) [ClassicSimilarity], result of:
            0.03811215 = score(doc=5005,freq=1.0), product of:
              0.0926585 = queryWeight, product of:
                1.3701956 = boost
                6.5810947 = idf(docFreq=162, maxDocs=43254)
                0.010275537 = queryNorm
              0.41131842 = fieldWeight in 5005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5810947 = idf(docFreq=162, maxDocs=43254)
                0.0625 = fieldNorm(doc=5005)
          0.018882072 = weight(abstract_txt:different in 5005) [ClassicSimilarity], result of:
            0.018882072 = score(doc=5005,freq=2.0), product of:
              0.058015328 = queryWeight, product of:
                1.5332972 = boost
                3.6822383 = idf(docFreq=2958, maxDocs=43254)
                0.010275537 = queryNorm
              0.32546696 = fieldWeight in 5005, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6822383 = idf(docFreq=2958, maxDocs=43254)
                0.0625 = fieldNorm(doc=5005)
          0.023601856 = weight(abstract_txt:reference in 5005) [ClassicSimilarity], result of:
            0.023601856 = score(doc=5005,freq=1.0), product of:
              0.08481716 = queryWeight, product of:
                1.8539449 = boost
                4.452279 = idf(docFreq=1369, maxDocs=43254)
                0.010275537 = queryNorm
              0.27826744 = fieldWeight in 5005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.452279 = idf(docFreq=1369, maxDocs=43254)
                0.0625 = fieldNorm(doc=5005)
          0.029553216 = weight(abstract_txt:language in 5005) [ClassicSimilarity], result of:
            0.029553216 = score(doc=5005,freq=1.0), product of:
              0.112794146 = queryWeight, product of:
                2.6184473 = boost
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.010275537 = queryNorm
              0.2620102 = fieldWeight in 5005, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.0625 = fieldNorm(doc=5005)
          0.9417504 = weight(abstract_txt:opinion in 5005) [ClassicSimilarity], result of:
            0.9417504 = score(doc=5005,freq=11.0), product of:
              0.65964013 = queryWeight, product of:
                9.320735 = boost
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.010275537 = queryNorm
              1.427673 = fieldWeight in 5005, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                6.8873534 = idf(docFreq=119, maxDocs=43254)
                0.0625 = fieldNorm(doc=5005)
        0.24 = coord(6/25)
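
The score breakdowns above follow the arithmetic of Lucene's ClassicSimilarity (TF-IDF). As a reading aid, the following minimal sketch recomputes one term weight and the final score of the first similar document from the values shown. The function names are made up for this illustration, and the `queryNorm`, `boost` and `fieldNorm` values are simply copied from the listing rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_weight(freq, doc_freq, max_docs, field_norm, query_norm, boost=1.0):
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm          # queryWeight line
    field_weight = math.sqrt(freq) * i * field_norm  # tf = sqrt(termFreq)
    return query_weight * field_weight

def document_score(term_weights, matched_terms, query_terms):
    # coord(matched/total) scales the sum of the per-term weights
    return sum(term_weights) * (matched_terms / query_terms)

# Term "itself" in doc 911, values taken from the first explanation tree.
w_itself = term_weight(freq=1.0, doc_freq=413, max_docs=43254,
                       field_norm=0.0625, query_norm=0.010275537,
                       boost=1.1761276)

# The seven term weights listed for doc 911, combined with coord(7/25).
weights_911 = [0.024103435, 0.066012174, 0.033517454, 0.037169483,
               0.041794557, 0.48295107, 0.56789684]
score_911 = document_score(weights_911, 7, 25)
```

Small differences in the last digits are expected, since the listing appears to use single-precision floats while Python uses double precision.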