Document (#38306)

Author
Vechtomova, O.
Title
A method for automatic extraction of multiword units representing business aspects from user reviews
Source
Journal of the Association for Information Science and Technology. 65(2014) no.7, p.1463-1477
Year
2014
Abstract
The article describes a semi-supervised approach to extracting multiword aspects of user-written reviews that belong to a given category. The method starts with a small set of seed words, representing the target category, and calculates distributional similarity between the candidate and seed words. We compare 3 distributional similarity measures (Lin's, Weeds's, and balAPinc), and a document retrieval function, BM25, adapted as a word similarity measure. We then introduce a method for identifying multiword aspects by using a combination of syntactic rules and a co-occurrence association measure. Finally, we describe a method for ranking multiword aspects by the likelihood of belonging to the target aspect category. The task used for evaluation is extraction of restaurant dish names from a corpus of restaurant reviews.
Theme
Computational linguistics

Similar documents (author)

  1. Vechtomova, O.: Facet-based opinion retrieval from blogs (2010) 5.98
    5.9832764 = sum of:
      5.9832764 = weight(author_txt:vechtomova in 1226) [ClassicSimilarity], result of:
        5.9832764 = fieldWeight in 1226, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.573242 = idf(docFreq=7, maxDocs=42306)
          0.625 = fieldNorm(doc=1226)
    
  2. Vechtomova, O.; Karamuftuoglu, M.: Query expansion with terms selected using lexical cohesion analysis of documents (2007) 4.79
    4.786621 = sum of:
      4.786621 = weight(author_txt:vechtomova in 2909) [ClassicSimilarity], result of:
        4.786621 = fieldWeight in 2909, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.573242 = idf(docFreq=7, maxDocs=42306)
          0.5 = fieldNorm(doc=2909)
    
  3. Vechtomova, O.; Karamuftuoglu, M.: Elicitation and use of relevance feedback information (2006) 4.79
    4.786621 = sum of:
      4.786621 = weight(author_txt:vechtomova in 2967) [ClassicSimilarity], result of:
        4.786621 = fieldWeight in 2967, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.573242 = idf(docFreq=7, maxDocs=42306)
          0.5 = fieldNorm(doc=2967)
    
  4. Vechtomova, O.; Karamuftuoglu, M.: Lexical cohesion and term proximity in document ranking (2008) 4.79
    4.786621 = sum of:
      4.786621 = weight(author_txt:vechtomova in 4102) [ClassicSimilarity], result of:
        4.786621 = fieldWeight in 4102, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.573242 = idf(docFreq=7, maxDocs=42306)
          0.5 = fieldNorm(doc=4102)
    
  5. Vechtomova, O.; Robertson, S.E.: A domain-independent approach to finding related entities (2012) 4.79
    4.786621 = sum of:
      4.786621 = weight(author_txt:vechtomova in 4734) [ClassicSimilarity], result of:
        4.786621 = fieldWeight in 4734, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.573242 = idf(docFreq=7, maxDocs=42306)
          0.5 = fieldNorm(doc=4734)
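
The author-match scores above each reduce to a single product, fieldWeight = tf * idf * fieldNorm. The sketch below recomputes them from the values shown in the listing. It assumes the tf and idf definitions that the listed numbers appear consistent with (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and not part of any library API, and small differences in the last digits are expected because the listing uses single-precision floats.

```python
import math

def classic_tf(freq):
    # Term-frequency factor: square root of the raw frequency (assumed definition).
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # Inverse document frequency (assumed definition, consistent with the listing).
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as shown in each explanation tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values copied from the listing: termFreq=1.0, docFreq=7, maxDocs=42306.
score_first = field_weight(1.0, 7, 42306, 0.625)  # listing reports 5.9832764
score_rest = field_weight(1.0, 7, 42306, 0.5)     # listing reports 4.786621

print(round(score_first, 4), round(score_rest, 4))
```

Only fieldNorm differs between the first hit and the other four; all five share the same tf and idf.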
    

Similar documents (content)

  1. Vechtomova, O.; Robertson, S.E.: A domain-independent approach to finding related entities (2012) 0.30
    0.3004825 = sum of:
      0.3004825 = product of:
        0.9390079 = sum of:
          0.116441905 = weight(abstract_txt:candidate in 4734) [ClassicSimilarity], result of:
            0.116441905 = score(doc=4734,freq=4.0), product of:
              0.10159622 = queryWeight, product of:
                1.0786016 = boost
                7.335196 = idf(docFreq=74, maxDocs=42306)
                0.012841175 = queryNorm
              1.1461244 = fieldWeight in 4734, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.335196 = idf(docFreq=74, maxDocs=42306)
                0.078125 = fieldNorm(doc=4734)
          0.064123504 = weight(abstract_txt:likelihood in 4734) [ClassicSimilarity], result of:
            0.064123504 = score(doc=4734,freq=1.0), product of:
              0.10835181 = queryWeight, product of:
                1.113885 = boost
                7.5751467 = idf(docFreq=58, maxDocs=42306)
                0.012841175 = queryNorm
              0.5918083 = fieldWeight in 4734, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5751467 = idf(docFreq=58, maxDocs=42306)
                0.078125 = fieldNorm(doc=4734)
          0.04744713 = weight(abstract_txt:measure in 4734) [ClassicSimilarity], result of:
            0.04744713 = score(doc=4734,freq=1.0), product of:
              0.11167981 = queryWeight, product of:
                1.5992804 = boost
                5.438076 = idf(docFreq=499, maxDocs=42306)
                0.012841175 = queryNorm
              0.4248497 = fieldWeight in 4734, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.438076 = idf(docFreq=499, maxDocs=42306)
                0.078125 = fieldNorm(doc=4734)
          0.11941614 = weight(abstract_txt:target in 4734) [ClassicSimilarity], result of:
            0.11941614 = score(doc=4734,freq=2.0), product of:
              0.16400863 = queryWeight, product of:
                1.9380752 = boost
                6.5900893 = idf(docFreq=157, maxDocs=42306)
                0.012841175 = queryNorm
              0.7281089 = fieldWeight in 4734, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5900893 = idf(docFreq=157, maxDocs=42306)
                0.078125 = fieldNorm(doc=4734)
          0.25779715 = weight(abstract_txt:seed in 4734) [ClassicSimilarity], result of:
            0.25779715 = score(doc=4734,freq=2.0), product of:
              0.27395347 = queryWeight, product of:
                2.504815 = boost
                8.51719 = idf(docFreq=22, maxDocs=42306)
                0.012841175 = queryNorm
              0.9410254 = fieldWeight in 4734, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.51719 = idf(docFreq=22, maxDocs=42306)
                0.078125 = fieldNorm(doc=4734)
          0.08701702 = weight(abstract_txt:similarity in 4734) [ClassicSimilarity], result of:
            0.08701702 = score(doc=4734,freq=1.0), product of:
              0.19154373 = queryWeight, product of:
                2.5651743 = boost
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.012841175 = queryNorm
              0.45429325 = fieldWeight in 4734, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.078125 = fieldNorm(doc=4734)
          0.055022534 = weight(abstract_txt:method in 4734) [ClassicSimilarity], result of:
            0.055022534 = score(doc=4734,freq=1.0), product of:
              0.15531202 = queryWeight, product of:
                2.6671953 = boost
                4.534668 = idf(docFreq=1233, maxDocs=42306)
                0.012841175 = queryNorm
              0.35427094 = fieldWeight in 4734, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.534668 = idf(docFreq=1233, maxDocs=42306)
                0.078125 = fieldNorm(doc=4734)
          0.19174254 = weight(abstract_txt:category in 4734) [ClassicSimilarity], result of:
            0.19174254 = score(doc=4734,freq=3.0), product of:
              0.22489008 = queryWeight, product of:
                2.7795088 = boost
                6.300826 = idf(docFreq=210, maxDocs=42306)
                0.012841175 = queryNorm
              0.8526056 = fieldWeight in 4734, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.300826 = idf(docFreq=210, maxDocs=42306)
                0.078125 = fieldNorm(doc=4734)
        0.32 = coord(8/25)
    
  2. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.20
    0.20484006 = sum of:
      0.20484006 = product of:
        1.0242003 = sum of:
          0.085334696 = weight(abstract_txt:extraction in 4920) [ClassicSimilarity], result of:
            0.085334696 = score(doc=4920,freq=1.0), product of:
              0.14626181 = queryWeight, product of:
                1.8302177 = boost
                6.2233386 = idf(docFreq=227, maxDocs=42306)
                0.012841175 = queryNorm
              0.583438 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2233386 = idf(docFreq=227, maxDocs=42306)
                0.09375 = fieldNorm(doc=4920)
          0.10442044 = weight(abstract_txt:similarity in 4920) [ClassicSimilarity], result of:
            0.10442044 = score(doc=4920,freq=1.0), product of:
              0.19154373 = queryWeight, product of:
                2.5651743 = boost
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.012841175 = queryNorm
              0.54515195 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.09375 = fieldNorm(doc=4920)
          0.09337633 = weight(abstract_txt:method in 4920) [ClassicSimilarity], result of:
            0.09337633 = score(doc=4920,freq=2.0), product of:
              0.15531202 = queryWeight, product of:
                2.6671953 = boost
                4.534668 = idf(docFreq=1233, maxDocs=42306)
                0.012841175 = queryNorm
              0.6012177 = fieldWeight in 4920, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.534668 = idf(docFreq=1233, maxDocs=42306)
                0.09375 = fieldNorm(doc=4920)
          0.2894037 = weight(abstract_txt:distributional in 4920) [ClassicSimilarity], result of:
            0.2894037 = score(doc=4920,freq=1.0), product of:
              0.330154 = queryWeight, product of:
                2.749765 = boost
                9.3501 = idf(docFreq=9, maxDocs=42306)
                0.012841175 = queryNorm
              0.87657183 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.3501 = idf(docFreq=9, maxDocs=42306)
                0.09375 = fieldNorm(doc=4920)
          0.4516652 = weight(abstract_txt:multiword in 4920) [ClassicSimilarity], result of:
            0.4516652 = score(doc=4920,freq=1.0), product of:
              0.5596737 = queryWeight, product of:
                5.0631375 = boost
                8.608162 = idf(docFreq=20, maxDocs=42306)
                0.012841175 = queryNorm
              0.8070152 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.608162 = idf(docFreq=20, maxDocs=42306)
                0.09375 = fieldNorm(doc=4920)
        0.2 = coord(5/25)
    
  3. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.20
    0.2028953 = sum of:
      0.2028953 = product of:
        0.8453971 = sum of:
          0.029110476 = weight(abstract_txt:candidate in 3537) [ClassicSimilarity], result of:
            0.029110476 = score(doc=3537,freq=1.0), product of:
              0.10159622 = queryWeight, product of:
                1.0786016 = boost
                7.335196 = idf(docFreq=74, maxDocs=42306)
                0.012841175 = queryNorm
              0.2865311 = fieldWeight in 3537, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.335196 = idf(docFreq=74, maxDocs=42306)
                0.0390625 = fieldNorm(doc=3537)
          0.04628574 = weight(abstract_txt:supervised in 3537) [ClassicSimilarity], result of:
            0.04628574 = score(doc=3537,freq=2.0), product of:
              0.109849855 = queryWeight, product of:
                1.1215587 = boost
                7.6273327 = idf(docFreq=55, maxDocs=42306)
                0.012841175 = queryNorm
              0.4213546 = fieldWeight in 3537, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.6273327 = idf(docFreq=55, maxDocs=42306)
                0.0390625 = fieldNorm(doc=3537)
          0.022730526 = weight(abstract_txt:words in 3537) [ClassicSimilarity], result of:
            0.022730526 = score(doc=3537,freq=1.0), product of:
              0.10854113 = queryWeight, product of:
                1.5766469 = boost
                5.361115 = idf(docFreq=539, maxDocs=42306)
                0.012841175 = queryNorm
              0.20941855 = fieldWeight in 3537, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.361115 = idf(docFreq=539, maxDocs=42306)
                0.0390625 = fieldNorm(doc=3537)
          0.06158501 = weight(abstract_txt:extraction in 3537) [ClassicSimilarity], result of:
            0.06158501 = score(doc=3537,freq=3.0), product of:
              0.14626181 = queryWeight, product of:
                1.8302177 = boost
                6.2233386 = idf(docFreq=227, maxDocs=42306)
                0.012841175 = queryNorm
              0.4210601 = fieldWeight in 3537, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.2233386 = idf(docFreq=227, maxDocs=42306)
                0.0390625 = fieldNorm(doc=3537)
          0.06151706 = weight(abstract_txt:method in 3537) [ClassicSimilarity], result of:
            0.06151706 = score(doc=3537,freq=5.0), product of:
              0.15531202 = queryWeight, product of:
                2.6671953 = boost
                4.534668 = idf(docFreq=1233, maxDocs=42306)
                0.012841175 = queryNorm
              0.39608693 = fieldWeight in 3537, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.534668 = idf(docFreq=1233, maxDocs=42306)
                0.0390625 = fieldNorm(doc=3537)
          0.62416834 = weight(abstract_txt:multiword in 3537) [ClassicSimilarity], result of:
            0.62416834 = score(doc=3537,freq=11.0), product of:
              0.5596737 = queryWeight, product of:
                5.0631375 = boost
                8.608162 = idf(docFreq=20, maxDocs=42306)
                0.012841175 = queryNorm
              1.115236 = fieldWeight in 3537, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.608162 = idf(docFreq=20, maxDocs=42306)
                0.0390625 = fieldNorm(doc=3537)
        0.24 = coord(6/25)
    
  4. Landauer, T.K.; Foltz, P.W.; Laham, D.: An introduction to Latent Semantic Analysis (1998) 0.11
    0.10616275 = sum of:
      0.10616275 = product of:
        0.44234478 = sum of:
          0.049569163 = weight(abstract_txt:extracting in 3163) [ClassicSimilarity], result of:
            0.049569163 = score(doc=3163,freq=1.0), product of:
              0.09126391 = queryWeight, product of:
                1.0222846 = boost
                6.9522038 = idf(docFreq=109, maxDocs=42306)
                0.012841175 = queryNorm
              0.5431409 = fieldWeight in 3163, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9522038 = idf(docFreq=109, maxDocs=42306)
                0.078125 = fieldNorm(doc=3163)
          0.07874086 = weight(abstract_txt:words in 3163) [ClassicSimilarity], result of:
            0.07874086 = score(doc=3163,freq=3.0), product of:
              0.10854113 = queryWeight, product of:
                1.5766469 = boost
                5.361115 = idf(docFreq=539, maxDocs=42306)
                0.012841175 = queryNorm
              0.7254472 = fieldWeight in 3163, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.361115 = idf(docFreq=539, maxDocs=42306)
                0.078125 = fieldNorm(doc=3163)
          0.061292585 = weight(abstract_txt:representing in 3163) [ClassicSimilarity], result of:
            0.061292585 = score(doc=3163,freq=1.0), product of:
              0.13246667 = queryWeight, product of:
                1.7417691 = boost
                5.9225845 = idf(docFreq=307, maxDocs=42306)
                0.012841175 = queryNorm
              0.46270192 = fieldWeight in 3163, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9225845 = idf(docFreq=307, maxDocs=42306)
                0.078125 = fieldNorm(doc=3163)
          0.08701702 = weight(abstract_txt:similarity in 3163) [ClassicSimilarity], result of:
            0.08701702 = score(doc=3163,freq=1.0), product of:
              0.19154373 = queryWeight, product of:
                2.5651743 = boost
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.012841175 = queryNorm
              0.45429325 = fieldWeight in 3163, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.814954 = idf(docFreq=342, maxDocs=42306)
                0.078125 = fieldNorm(doc=3163)
          0.055022534 = weight(abstract_txt:method in 3163) [ClassicSimilarity], result of:
            0.055022534 = score(doc=3163,freq=1.0), product of:
              0.15531202 = queryWeight, product of:
                2.6671953 = boost
                4.534668 = idf(docFreq=1233, maxDocs=42306)
                0.012841175 = queryNorm
              0.35427094 = fieldWeight in 3163, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.534668 = idf(docFreq=1233, maxDocs=42306)
                0.078125 = fieldNorm(doc=3163)
          0.110702604 = weight(abstract_txt:category in 3163) [ClassicSimilarity], result of:
            0.110702604 = score(doc=3163,freq=1.0), product of:
              0.22489008 = queryWeight, product of:
                2.7795088 = boost
                6.300826 = idf(docFreq=210, maxDocs=42306)
                0.012841175 = queryNorm
              0.49225205 = fieldWeight in 3163, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.300826 = idf(docFreq=210, maxDocs=42306)
                0.078125 = fieldNorm(doc=3163)
        0.24 = coord(6/25)
    
  5. Gödert, W.: Detecting multiword phrases in mathematical text corpora (2012) 0.09
    0.09440168 = sum of:
      0.09440168 = product of:
        0.7866807 = sum of:
          0.054553267 = weight(abstract_txt:words in 2467) [ClassicSimilarity], result of:
            0.054553267 = score(doc=2467,freq=1.0), product of:
              0.10854113 = queryWeight, product of:
                1.5766469 = boost
                5.361115 = idf(docFreq=539, maxDocs=42306)
                0.012841175 = queryNorm
              0.50260454 = fieldWeight in 2467, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.361115 = idf(docFreq=539, maxDocs=42306)
                0.09375 = fieldNorm(doc=2467)
          0.09337633 = weight(abstract_txt:method in 2467) [ClassicSimilarity], result of:
            0.09337633 = score(doc=2467,freq=2.0), product of:
              0.15531202 = queryWeight, product of:
                2.6671953 = boost
                4.534668 = idf(docFreq=1233, maxDocs=42306)
                0.012841175 = queryNorm
              0.6012177 = fieldWeight in 2467, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.534668 = idf(docFreq=1233, maxDocs=42306)
                0.09375 = fieldNorm(doc=2467)
          0.6387511 = weight(abstract_txt:multiword in 2467) [ClassicSimilarity], result of:
            0.6387511 = score(doc=2467,freq=2.0), product of:
              0.5596737 = queryWeight, product of:
                5.0631375 = boost
                8.608162 = idf(docFreq=20, maxDocs=42306)
                0.012841175 = queryNorm
              1.1412919 = fieldWeight in 2467, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.608162 = idf(docFreq=20, maxDocs=42306)
                0.09375 = fieldNorm(doc=2467)
        0.12 = coord(3/25)
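
The content-similarity scores combine several per-term scores. Each term contributes queryWeight * fieldWeight, where queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm; the contributions are summed and multiplied by coord (matched query terms / total query terms). The sketch below recomputes the last entry (doc=2467) from the numbers in its explanation tree. The tf and idf definitions are assumptions chosen because they reproduce the listed values, and the helper names are hypothetical.

```python
import math

QUERY_NORM = 0.012841175  # queryNorm, copied from the listing
MAX_DOCS = 42306          # maxDocs, copied from the listing

def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed idf definition, consistent with the listed idf values.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))

def term_score(boost, freq, doc_freq, field_norm):
    # score = queryWeight * fieldWeight, following the explanation tree.
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight

# (boost, termFreq, docFreq, fieldNorm) for the three matched terms of doc=2467.
matched_terms = {
    "words":     (1.5766469, 1.0, 539, 0.09375),
    "method":    (2.6671953, 2.0, 1233, 0.09375),
    "multiword": (5.0631375, 2.0, 20, 0.09375),
}

term_sum = sum(term_score(*params) for params in matched_terms.values())
coord = len(matched_terms) / 25  # 3 of the 25 query terms matched
total = term_sum * coord         # listing reports 0.09440168

print(round(term_sum, 4), round(total, 4))
```

The coord factor explains why this entry ranks last despite a high "multiword" contribution: only 3 of the 25 query terms match, so the summed term score is scaled by 0.12.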