Document (#38306)

Author
Vechtomova, O.
Title
¬A method for automatic extraction of multiword units representing business aspects from user reviews
Source
Journal of the Association for Information Science and Technology. 65(2014) no.7, pp. 1463-1477
Year
2014
Abstract
The article describes a semi-supervised approach to extracting multiword aspects of user-written reviews that belong to a given category. The method starts with a small set of seed words, representing the target category, and calculates distributional similarity between the candidate and seed words. We compare 3 distributional similarity measures (Lin's, Weeds's, and balAPinc), and a document retrieval function, BM25, adapted as a word similarity measure. We then introduce a method for identifying multiword aspects by using a combination of syntactic rules and a co-occurrence association measure. Finally, we describe a method for ranking multiword aspects by the likelihood of belonging to the target aspect category. The task used for evaluation is extraction of restaurant dish names from a corpus of restaurant reviews.
Theme
Computational linguistics

Similar documents (author)

  1. Vechtomova, O.: Facet-based opinion retrieval from blogs (2010) 5.99
    5.989656 = sum of:
      5.989656 = weight(author_txt:vechtomova in 1226) [ClassicSimilarity], result of:
        5.989656 = fieldWeight in 1226, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.583449 = idf(docFreq=7, maxDocs=42740)
          0.625 = fieldNorm(doc=1226)
    
  2. Vechtomova, O.; Karamuftuoglu, M.: Query expansion with terms selected using lexical cohesion analysis of documents (2007) 4.79
    4.7917247 = sum of:
      4.7917247 = weight(author_txt:vechtomova in 2909) [ClassicSimilarity], result of:
        4.7917247 = fieldWeight in 2909, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.583449 = idf(docFreq=7, maxDocs=42740)
          0.5 = fieldNorm(doc=2909)
    
  3. Vechtomova, O.; Karamuftuoglu, M.: Elicitation and use of relevance feedback information (2006) 4.79
    4.7917247 = sum of:
      4.7917247 = weight(author_txt:vechtomova in 2967) [ClassicSimilarity], result of:
        4.7917247 = fieldWeight in 2967, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.583449 = idf(docFreq=7, maxDocs=42740)
          0.5 = fieldNorm(doc=2967)
    
  4. Vechtomova, O.; Karamuftuoglu, M.: Lexical cohesion and term proximity in document ranking (2008) 4.79
    4.7917247 = sum of:
      4.7917247 = weight(author_txt:vechtomova in 4102) [ClassicSimilarity], result of:
        4.7917247 = fieldWeight in 4102, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.583449 = idf(docFreq=7, maxDocs=42740)
          0.5 = fieldNorm(doc=4102)
    
  5. Vechtomova, O.; Robertson, S.E.: ¬A domain-independent approach to finding related entities (2012) 4.79
    4.7917247 = sum of:
      4.7917247 = weight(author_txt:vechtomova in 4734) [ClassicSimilarity], result of:
        4.7917247 = fieldWeight in 4734, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.583449 = idf(docFreq=7, maxDocs=42740)
          0.5 = fieldNorm(doc=4734)
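
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the classic TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the figures shown; the function names are hypothetical, and fieldNorm is taken as given from the listing rather than recomputed.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency (assumed form)."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)) (assumed form)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain trees above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Entry 1 in the author list: freq=1, docFreq=7, maxDocs=42740, fieldNorm=0.625
score_entry_1 = field_weight(1.0, 7, 42740, 0.625)
# Entries 2 to 5 differ only in fieldNorm (0.5)
score_entry_2 = field_weight(1.0, 7, 42740, 0.5)
print(round(score_entry_1, 4), round(score_entry_2, 4))
```

Because every entry matches the same single author term, the ranking is driven entirely by fieldNorm: the 2010 record has the larger norm (0.625 against 0.5) and so ranks first.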
    

Similar documents (content)

  1. Vechtomova, O.; Robertson, S.E.: ¬A domain-independent approach to finding related entities (2012) 0.30
    0.29927784 = sum of:
      0.29927784 = product of:
        0.93524325 = sum of:
          0.11500559 = weight(abstract_txt:candidate in 4734) [ClassicSimilarity], result of:
            0.11500559 = score(doc=4734,freq=4.0), product of:
              0.100741506 = queryWeight, product of:
                1.0739523 = boost
                7.306182 = idf(docFreq=77, maxDocs=42740)
                0.012839052 = queryNorm
              1.141591 = fieldWeight in 4734, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.306182 = idf(docFreq=77, maxDocs=42740)
                0.078125 = fieldNorm(doc=4734)
          0.06434946 = weight(abstract_txt:likelihood in 4734) [ClassicSimilarity], result of:
            0.06434946 = score(doc=4734,freq=1.0), product of:
              0.1085873 = queryWeight, product of:
                1.1149883 = boost
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.012839052 = queryNorm
              0.5926057 = fieldWeight in 4734, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.078125 = fieldNorm(doc=4734)
          0.047377337 = weight(abstract_txt:measure in 4734) [ClassicSimilarity], result of:
            0.047377337 = score(doc=4734,freq=1.0), product of:
              0.11155086 = queryWeight, product of:
                1.5982041 = boost
                5.4363537 = idf(docFreq=505, maxDocs=42740)
                0.012839052 = queryNorm
              0.42471513 = fieldWeight in 4734, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4363537 = idf(docFreq=505, maxDocs=42740)
                0.078125 = fieldNorm(doc=4734)
          0.11888701 = weight(abstract_txt:target in 4734) [ClassicSimilarity], result of:
            0.11888701 = score(doc=4734,freq=2.0), product of:
              0.16349536 = queryWeight, product of:
                1.9348553 = boost
                6.581486 = idf(docFreq=160, maxDocs=42740)
                0.012839052 = queryNorm
              0.7271583 = fieldWeight in 4734, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.581486 = idf(docFreq=160, maxDocs=42740)
                0.078125 = fieldNorm(doc=4734)
          0.25859007 = weight(abstract_txt:seed in 4734) [ClassicSimilarity], result of:
            0.25859007 = score(doc=4734,freq=2.0), product of:
              0.27446717 = queryWeight, product of:
                2.506923 = boost
                8.527396 = idf(docFreq=22, maxDocs=42740)
                0.012839052 = queryNorm
              0.94215304 = fieldWeight in 4734, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.527396 = idf(docFreq=22, maxDocs=42740)
                0.078125 = fieldNorm(doc=4734)
          0.08690935 = weight(abstract_txt:similarity in 4734) [ClassicSimilarity], result of:
            0.08690935 = score(doc=4734,freq=1.0), product of:
              0.19135238 = queryWeight, product of:
                2.5636477 = boost
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.012839052 = queryNorm
              0.45418483 = fieldWeight in 4734, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.078125 = fieldNorm(doc=4734)
          0.054521404 = weight(abstract_txt:method in 4734) [ClassicSimilarity], result of:
            0.054521404 = score(doc=4734,freq=1.0), product of:
              0.15434071 = queryWeight, product of:
                2.658588 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.012839052 = queryNorm
              0.35325354 = fieldWeight in 4734, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.078125 = fieldNorm(doc=4734)
          0.1896031 = weight(abstract_txt:category in 4734) [ClassicSimilarity], result of:
            0.1896031 = score(doc=4734,freq=3.0), product of:
              0.22317529 = queryWeight, product of:
                2.768627 = boost
                6.2783957 = idf(docFreq=217, maxDocs=42740)
                0.012839052 = queryNorm
              0.84957033 = fieldWeight in 4734, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.2783957 = idf(docFreq=217, maxDocs=42740)
                0.078125 = fieldNorm(doc=4734)
        0.32 = coord(8/25)
    
  2. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.21
    0.20501009 = sum of:
      0.20501009 = product of:
        1.0250504 = sum of:
          0.08499511 = weight(abstract_txt:extraction in 4920) [ClassicSimilarity], result of:
            0.08499511 = score(doc=4920,freq=1.0), product of:
              0.14584816 = queryWeight, product of:
                1.8274531 = boost
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.012839052 = queryNorm
              0.5827644 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.104291216 = weight(abstract_txt:similarity in 4920) [ClassicSimilarity], result of:
            0.104291216 = score(doc=4920,freq=1.0), product of:
              0.19135238 = queryWeight, product of:
                2.5636477 = boost
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.012839052 = queryNorm
              0.5450218 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.092525885 = weight(abstract_txt:method in 4920) [ClassicSimilarity], result of:
            0.092525885 = score(doc=4920,freq=2.0), product of:
              0.15434071 = queryWeight, product of:
                2.658588 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.012839052 = queryNorm
              0.5994911 = fieldWeight in 4920, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.29020098 = weight(abstract_txt:distributional in 4920) [ClassicSimilarity], result of:
            0.29020098 = score(doc=4920,freq=1.0), product of:
              0.33070257 = queryWeight, product of:
                2.7517855 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.012839052 = queryNorm
              0.87752867 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
          0.45303726 = weight(abstract_txt:multiword in 4920) [ClassicSimilarity], result of:
            0.45303726 = score(doc=4920,freq=1.0), product of:
              0.5607091 = queryWeight, product of:
                5.0673347 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.012839052 = queryNorm
              0.807972 = fieldWeight in 4920, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.09375 = fieldNorm(doc=4920)
        0.2 = coord(5/25)
    
  3. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.20
    0.20302145 = sum of:
      0.20302145 = product of:
        0.8459227 = sum of:
          0.028751398 = weight(abstract_txt:candidate in 3537) [ClassicSimilarity], result of:
            0.028751398 = score(doc=3537,freq=1.0), product of:
              0.100741506 = queryWeight, product of:
                1.0739523 = boost
                7.306182 = idf(docFreq=77, maxDocs=42740)
                0.012839052 = queryNorm
              0.28539774 = fieldWeight in 3537, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.306182 = idf(docFreq=77, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.046125382 = weight(abstract_txt:supervised in 3537) [ClassicSimilarity], result of:
            0.046125382 = score(doc=3537,freq=2.0), product of:
              0.109576926 = queryWeight, product of:
                1.1200576 = boost
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.012839052 = queryNorm
              0.42094064 = fieldWeight in 3537, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.022684705 = weight(abstract_txt:words in 3537) [ClassicSimilarity], result of:
            0.022684705 = score(doc=3537,freq=1.0), product of:
              0.10837636 = queryWeight, product of:
                1.5752993 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.012839052 = queryNorm
              0.20931414 = fieldWeight in 3537, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.061339933 = weight(abstract_txt:extraction in 3537) [ClassicSimilarity], result of:
            0.061339933 = score(doc=3537,freq=3.0), product of:
              0.14584816 = queryWeight, product of:
                1.8274531 = boost
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.012839052 = queryNorm
              0.42057395 = fieldWeight in 3537, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.216153 = idf(docFreq=231, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.060956787 = weight(abstract_txt:method in 3537) [ClassicSimilarity], result of:
            0.060956787 = score(doc=3537,freq=5.0), product of:
              0.15434071 = queryWeight, product of:
                2.658588 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.012839052 = queryNorm
              0.3949495 = fieldWeight in 3537, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
          0.6260645 = weight(abstract_txt:multiword in 3537) [ClassicSimilarity], result of:
            0.6260645 = score(doc=3537,freq=11.0), product of:
              0.5607091 = queryWeight, product of:
                5.0673347 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.012839052 = queryNorm
              1.1165584 = fieldWeight in 3537, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.0390625 = fieldNorm(doc=3537)
        0.24 = coord(6/25)
    
  4. Landauer, T.K.; Foltz, P.W.; Laham, D.: ¬An introduction to Latent Semantic Analysis (1998) 0.11
    0.10572448 = sum of:
      0.10572448 = product of:
        0.44051865 = sum of:
          0.049761824 = weight(abstract_txt:extracting in 3163) [ClassicSimilarity], result of:
            0.049761824 = score(doc=3163,freq=1.0), product of:
              0.09148432 = queryWeight, product of:
                1.0234206 = boost
                6.96241 = idf(docFreq=109, maxDocs=42740)
                0.012839052 = queryNorm
              0.5439383 = fieldWeight in 3163, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.96241 = idf(docFreq=109, maxDocs=42740)
                0.078125 = fieldNorm(doc=3163)
          0.07858212 = weight(abstract_txt:words in 3163) [ClassicSimilarity], result of:
            0.07858212 = score(doc=3163,freq=3.0), product of:
              0.10837636 = queryWeight, product of:
                1.5752993 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.012839052 = queryNorm
              0.72508544 = fieldWeight in 3163, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.078125 = fieldNorm(doc=3163)
          0.061276533 = weight(abstract_txt:representing in 3163) [ClassicSimilarity], result of:
            0.061276533 = score(doc=3163,freq=1.0), product of:
              0.13242051 = queryWeight, product of:
                1.7412993 = boost
                5.9230976 = idf(docFreq=310, maxDocs=42740)
                0.012839052 = queryNorm
              0.462742 = fieldWeight in 3163, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9230976 = idf(docFreq=310, maxDocs=42740)
                0.078125 = fieldNorm(doc=3163)
          0.08690935 = weight(abstract_txt:similarity in 3163) [ClassicSimilarity], result of:
            0.08690935 = score(doc=3163,freq=1.0), product of:
              0.19135238 = queryWeight, product of:
                2.5636477 = boost
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.012839052 = queryNorm
              0.45418483 = fieldWeight in 3163, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.078125 = fieldNorm(doc=3163)
          0.054521404 = weight(abstract_txt:method in 3163) [ClassicSimilarity], result of:
            0.054521404 = score(doc=3163,freq=1.0), product of:
              0.15434071 = queryWeight, product of:
                2.658588 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.012839052 = queryNorm
              0.35325354 = fieldWeight in 3163, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.078125 = fieldNorm(doc=3163)
          0.10946741 = weight(abstract_txt:category in 3163) [ClassicSimilarity], result of:
            0.10946741 = score(doc=3163,freq=1.0), product of:
              0.22317529 = queryWeight, product of:
                2.768627 = boost
                6.2783957 = idf(docFreq=217, maxDocs=42740)
                0.012839052 = queryNorm
              0.49049968 = fieldWeight in 3163, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2783957 = idf(docFreq=217, maxDocs=42740)
                0.078125 = fieldNorm(doc=3163)
        0.24 = coord(6/25)
    
  5. Gödert, W.: Detecting multiword phrases in mathematical text corpora (2012) 0.09
    0.09451927 = sum of:
      0.09451927 = product of:
        0.7876606 = sum of:
          0.05444329 = weight(abstract_txt:words in 2467) [ClassicSimilarity], result of:
            0.05444329 = score(doc=2467,freq=1.0), product of:
              0.10837636 = queryWeight, product of:
                1.5752993 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.012839052 = queryNorm
              0.5023539 = fieldWeight in 2467, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.09375 = fieldNorm(doc=2467)
          0.092525885 = weight(abstract_txt:method in 2467) [ClassicSimilarity], result of:
            0.092525885 = score(doc=2467,freq=2.0), product of:
              0.15434071 = queryWeight, product of:
                2.658588 = boost
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.012839052 = queryNorm
              0.5994911 = fieldWeight in 2467, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5216455 = idf(docFreq=1262, maxDocs=42740)
                0.09375 = fieldNorm(doc=2467)
          0.64069146 = weight(abstract_txt:multiword in 2467) [ClassicSimilarity], result of:
            0.64069146 = score(doc=2467,freq=2.0), product of:
              0.5607091 = queryWeight, product of:
                5.0673347 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.012839052 = queryNorm
              1.142645 = fieldWeight in 2467, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.09375 = fieldNorm(doc=2467)
        0.12 = coord(3/25)
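
The content-based scores combine several term weights. As a rough check, the sketch below recomputes the score of entry 5 (doc 2467) from the values in its explain tree, assuming each term contributes queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm, and that the sum is scaled by coord (matched terms over query terms). The helper names are hypothetical; the numbers are copied from the listing.

```python
import math

MAX_DOCS = 42740
QUERY_NORM = 0.012839052


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed idf form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """Per-term score = queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# (boost, docFreq, termFreq) for the three terms matched in doc 2467
matched_terms = {
    "words": (1.5752993, 546, 1.0),
    "method": (2.658588, 1262, 2.0),
    "multiword": (5.0673347, 20, 2.0),
}
FIELD_NORM = 0.09375
coord = 3 / 25  # 3 of the 25 query terms matched

term_sum = sum(term_score(b, df, f, FIELD_NORM) for b, df, f in matched_terms.values())
total = coord * term_sum
print(round(term_sum, 4), round(total, 4))
```

The coord factor explains why this record ranks last despite a high "multiword" weight: only 3 of 25 query terms match, so the summed term score is scaled by 0.12.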