Document (#38305)

Author
Vechtomova, O.
Title
A method for automatic extraction of multiword units representing business aspects from user reviews
Source
Journal of the Association for Information Science and Technology. 65(2014) no.7, pp. 1463-1477
Year
2014
Abstract
The article describes a semi-supervised approach to extracting multiword aspects of user-written reviews that belong to a given category. The method starts with a small set of seed words, representing the target category, and calculates distributional similarity between the candidate and seed words. We compare 3 distributional similarity measures (Lin's, Weeds's, and balAPinc), and a document retrieval function, BM25, adapted as a word similarity measure. We then introduce a method for identifying multiword aspects by using a combination of syntactic rules and a co-occurrence association measure. Finally, we describe a method for ranking multiword aspects by the likelihood of belonging to the target aspect category. The task used for evaluation is extraction of restaurant dish names from a corpus of restaurant reviews.
Theme
Computational linguistics
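
The seed-based ranking step described in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: plain cosine similarity over co-occurrence counts stands in for the measures the article compares (Lin's, Weeds's, balAPinc, and the adapted BM25), and the toy corpus, the window size, and the function names `context_vectors`, `cosine`, and `rank_candidates` are all assumptions made for illustration.

```python
import math
from collections import Counter

def context_vectors(sentences, window=2):
    """Count, for each word, the words seen within `window` tokens of it."""
    vectors = {}
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx = vectors.setdefault(word, Counter())
            for j in range(lo, hi):
                if j != i:
                    ctx[tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_candidates(vectors, seeds, candidates):
    """Order candidates by mean similarity to the seed words (assumed scoring)."""
    scores = {}
    for cand in candidates:
        sims = [cosine(vectors.get(cand, Counter()), vectors[s])
                for s in seeds if s in vectors]
        scores[cand] = sum(sims) / len(sims) if sims else 0.0
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy corpus; the article uses a corpus of restaurant reviews.
sentences = [
    "we ordered the pizza".split(),
    "we ordered the risotto".split(),
    "we paid the waiter".split(),
]
vectors = context_vectors(sentences)
ranked = rank_candidates(vectors, seeds=["pizza"], candidates=["risotto", "waiter"])
print(ranked)
```

In this toy data "risotto" shares both of its context words with the seed "pizza", while "waiter" shares only one, so "risotto" ranks first.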

Similar documents (author)

  1. Vechtomova, O.: Facet-based opinion retrieval from blogs (2010) 6.01
    6.010904 = sum of:
      6.010904 = weight(author_txt:vechtomova in 4225) [ClassicSimilarity], result of:
        6.010904 = fieldWeight in 4225, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.625 = fieldNorm(doc=4225)
    
  2. Vechtomova, O.; Karamuftuoglu, M.: Query expansion with terms selected using lexical cohesion analysis of documents (2007) 4.81
    4.808723 = sum of:
      4.808723 = weight(author_txt:vechtomova in 908) [ClassicSimilarity], result of:
        4.808723 = fieldWeight in 908, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.5 = fieldNorm(doc=908)
    
  3. Vechtomova, O.; Karamuftuoglu, M.: Elicitation and use of relevance feedback information (2006) 4.81
    4.808723 = sum of:
      4.808723 = weight(author_txt:vechtomova in 966) [ClassicSimilarity], result of:
        4.808723 = fieldWeight in 966, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.5 = fieldNorm(doc=966)
    
  4. Vechtomova, O.; Karamuftuoglu, M.: Lexical cohesion and term proximity in document ranking (2008) 4.81
    4.808723 = sum of:
      4.808723 = weight(author_txt:vechtomova in 2101) [ClassicSimilarity], result of:
        4.808723 = fieldWeight in 2101, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.5 = fieldNorm(doc=2101)
    
  5. Vechtomova, O.; Robertson, S.E.: A domain-independent approach to finding related entities (2012) 4.81
    4.808723 = sum of:
      4.808723 = weight(author_txt:vechtomova in 2733) [ClassicSimilarity], result of:
        4.808723 = fieldWeight in 2733, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.5 = fieldNorm(doc=2733)
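
The author-match scores above can be recomputed from the values in the traces. The sketch below assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the tf and idf lines shown; fieldNorm is copied from the trace rather than derived, and the function names are illustrative only.

```python
import math

def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as it appears in the traces (assumed formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)

def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as listed in each trace."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Entry 1 has fieldNorm 0.625; entries 2 to 5 have fieldNorm 0.5.
top = field_weight(1.0, 7, 44218, 0.625)
others = field_weight(1.0, 7, 44218, 0.5)
print(round(top, 2), round(others, 2))
```

The two results should agree with the listed scores of 6.01 and 4.81 up to single-precision rounding.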
    

Similar documents (content)

  1. Vechtomova, O.; Robertson, S.E.: A domain-independent approach to finding related entities (2012) 0.30
    0.29823753 = sum of:
      0.29823753 = product of:
        0.93199235 = sum of:
          0.11567074 = weight(abstract_txt:candidate in 2733) [ClassicSimilarity], result of:
            0.11567074 = score(doc=2733,freq=4.0), product of:
              0.101030216 = queryWeight, product of:
                1.0800691 = boost
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.012765785 = queryNorm
              1.1449124 = fieldWeight in 2733, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.078125 = fieldNorm(doc=2733)
          0.062965974 = weight(abstract_txt:likelihood in 2733) [ClassicSimilarity], result of:
            0.062965974 = score(doc=2733,freq=1.0), product of:
              0.106920145 = queryWeight, product of:
                1.1111064 = boost
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.012765785 = queryNorm
              0.5889065 = fieldWeight in 2733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.538004 = idf(docFreq=63, maxDocs=44218)
                0.078125 = fieldNorm(doc=2733)
          0.047262657 = weight(abstract_txt:measure in 2733) [ClassicSimilarity], result of:
            0.047262657 = score(doc=2733,freq=1.0), product of:
              0.11126135 = queryWeight, product of:
                1.6029245 = boost
                5.437306 = idf(docFreq=522, maxDocs=44218)
                0.012765785 = queryNorm
              0.42478952 = fieldWeight in 2733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.437306 = idf(docFreq=522, maxDocs=44218)
                0.078125 = fieldNorm(doc=2733)
          0.11775534 = weight(abstract_txt:target in 2733) [ClassicSimilarity], result of:
            0.11775534 = score(doc=2733,freq=2.0), product of:
              0.16229656 = queryWeight, product of:
                1.9359562 = boost
                6.5669885 = idf(docFreq=168, maxDocs=44218)
                0.012765785 = queryNorm
              0.72555655 = fieldWeight in 2733, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5669885 = idf(docFreq=168, maxDocs=44218)
                0.078125 = fieldNorm(doc=2733)
          0.26092464 = weight(abstract_txt:seed in 2733) [ClassicSimilarity], result of:
            0.26092464 = score(doc=2733,freq=2.0), product of:
              0.27584535 = queryWeight, product of:
                2.523909 = boost
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.012765785 = queryNorm
              0.94590914 = fieldWeight in 2733, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.078125 = fieldNorm(doc=2733)
          0.08690346 = weight(abstract_txt:similarity in 2733) [ClassicSimilarity], result of:
            0.08690346 = score(doc=2733,freq=1.0), product of:
              0.19115576 = queryWeight, product of:
                2.5732396 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.012765785 = queryNorm
              0.4546212 = fieldWeight in 2733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.078125 = fieldNorm(doc=2733)
          0.05361785 = weight(abstract_txt:method in 2733) [ClassicSimilarity], result of:
            0.05361785 = score(doc=2733,freq=1.0), product of:
              0.15248081 = queryWeight, product of:
                2.6537712 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.012765785 = queryNorm
              0.3516367 = fieldWeight in 2733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=2733)
          0.18689175 = weight(abstract_txt:category in 2733) [ClassicSimilarity], result of:
            0.18689175 = score(doc=2733,freq=3.0), product of:
              0.2208254 = queryWeight, product of:
                2.765738 = boost
                6.2544694 = idf(docFreq=230, maxDocs=44218)
                0.012765785 = queryNorm
              0.84633267 = fieldWeight in 2733, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.2544694 = idf(docFreq=230, maxDocs=44218)
                0.078125 = fieldNorm(doc=2733)
        0.32 = coord(8/25)
    
  2. Cruys, T. van de; Moirón, B.V.: Semantics-based multiword expression extraction (2007) 0.21
    0.20571999 = sum of:
      0.20571999 = product of:
        1.0286 = sum of:
          0.08374279 = weight(abstract_txt:extraction in 2919) [ClassicSimilarity], result of:
            0.08374279 = score(doc=2919,freq=1.0), product of:
              0.1442701 = queryWeight, product of:
                1.8252782 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.012765785 = queryNorm
              0.58045834 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.10428416 = weight(abstract_txt:similarity in 2919) [ClassicSimilarity], result of:
            0.10428416 = score(doc=2919,freq=1.0), product of:
              0.19115576 = queryWeight, product of:
                2.5732396 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.012765785 = queryNorm
              0.54554546 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.09099251 = weight(abstract_txt:method in 2919) [ClassicSimilarity], result of:
            0.09099251 = score(doc=2919,freq=2.0), product of:
              0.15248081 = queryWeight, product of:
                2.6537712 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.012765785 = queryNorm
              0.5967473 = fieldWeight in 2919, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.29251066 = weight(abstract_txt:distributional in 2919) [ClassicSimilarity], result of:
            0.29251066 = score(doc=2919,freq=1.0), product of:
              0.3321283 = queryWeight, product of:
                2.7694519 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.012765785 = queryNorm
              0.88071585 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
          0.4570698 = weight(abstract_txt:multiword in 2919) [ClassicSimilarity], result of:
            0.4570698 = score(doc=2919,freq=1.0), product of:
              0.56347734 = queryWeight, product of:
                5.1014557 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.012765785 = queryNorm
              0.8111592 = fieldWeight in 2919, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.09375 = fieldNorm(doc=2919)
        0.2 = coord(5/25)
    
  3. Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.20
    0.20320576 = sum of:
      0.20320576 = product of:
        0.8466907 = sum of:
          0.028917685 = weight(abstract_txt:candidate in 1536) [ClassicSimilarity], result of:
            0.028917685 = score(doc=1536,freq=1.0), product of:
              0.101030216 = queryWeight, product of:
                1.0800691 = boost
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.012765785 = queryNorm
              0.2862281 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.043203995 = weight(abstract_txt:supervised in 1536) [ClassicSimilarity], result of:
            0.043203995 = score(doc=1536,freq=2.0), product of:
              0.10479684 = queryWeight, product of:
                1.1000185 = boost
                7.462781 = idf(docFreq=68, maxDocs=44218)
                0.012765785 = queryNorm
              0.4122643 = fieldWeight in 1536, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.462781 = idf(docFreq=68, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.022549152 = weight(abstract_txt:words in 1536) [ClassicSimilarity], result of:
            0.022549152 = score(doc=1536,freq=1.0), product of:
              0.10783814 = queryWeight, product of:
                1.578073 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.012765785 = queryNorm
              0.20910183 = fieldWeight in 1536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.060436152 = weight(abstract_txt:extraction in 1536) [ClassicSimilarity], result of:
            0.060436152 = score(doc=1536,freq=3.0), product of:
              0.1442701 = queryWeight, product of:
                1.8252782 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.012765785 = queryNorm
              0.41890973 = fieldWeight in 1536, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.05994658 = weight(abstract_txt:method in 1536) [ClassicSimilarity], result of:
            0.05994658 = score(doc=1536,freq=5.0), product of:
              0.15248081 = queryWeight, product of:
                2.6537712 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.012765785 = queryNorm
              0.3931418 = fieldWeight in 1536, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
          0.63163716 = weight(abstract_txt:multiword in 1536) [ClassicSimilarity], result of:
            0.63163716 = score(doc=1536,freq=11.0), product of:
              0.56347734 = queryWeight, product of:
                5.1014557 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.012765785 = queryNorm
              1.1209629 = fieldWeight in 1536, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1536)
        0.24 = coord(6/25)
    
  4. Landauer, T.K.; Foltz, P.W.; Laham, D.: An introduction to Latent Semantic Analysis (1998) 0.10
    0.10443502 = sum of:
      0.10443502 = product of:
        0.4351459 = sum of:
          0.049025543 = weight(abstract_txt:extracting in 1162) [ClassicSimilarity], result of:
            0.049025543 = score(doc=1162,freq=1.0), product of:
              0.09049068 = queryWeight, product of:
                1.022181 = boost
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.012765785 = queryNorm
              0.5417745 = fieldWeight in 1162, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9347134 = idf(docFreq=116, maxDocs=44218)
                0.078125 = fieldNorm(doc=1162)
          0.07811255 = weight(abstract_txt:words in 1162) [ClassicSimilarity], result of:
            0.07811255 = score(doc=1162,freq=3.0), product of:
              0.10783814 = queryWeight, product of:
                1.578073 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.012765785 = queryNorm
              0.72435 = fieldWeight in 1162, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.078125 = fieldNorm(doc=1162)
          0.05958453 = weight(abstract_txt:representing in 1162) [ClassicSimilarity], result of:
            0.05958453 = score(doc=1162,freq=1.0), product of:
              0.12984382 = queryWeight, product of:
                1.7316157 = boost
                5.8738413 = idf(docFreq=337, maxDocs=44218)
                0.012765785 = queryNorm
              0.45889384 = fieldWeight in 1162, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8738413 = idf(docFreq=337, maxDocs=44218)
                0.078125 = fieldNorm(doc=1162)
          0.08690346 = weight(abstract_txt:similarity in 1162) [ClassicSimilarity], result of:
            0.08690346 = score(doc=1162,freq=1.0), product of:
              0.19115576 = queryWeight, product of:
                2.5732396 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.012765785 = queryNorm
              0.4546212 = fieldWeight in 1162, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.078125 = fieldNorm(doc=1162)
          0.05361785 = weight(abstract_txt:method in 1162) [ClassicSimilarity], result of:
            0.05361785 = score(doc=1162,freq=1.0), product of:
              0.15248081 = queryWeight, product of:
                2.6537712 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.012765785 = queryNorm
              0.3516367 = fieldWeight in 1162, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=1162)
          0.107902005 = weight(abstract_txt:category in 1162) [ClassicSimilarity], result of:
            0.107902005 = score(doc=1162,freq=1.0), product of:
              0.2208254 = queryWeight, product of:
                2.765738 = boost
                6.2544694 = idf(docFreq=230, maxDocs=44218)
                0.012765785 = queryNorm
              0.4886304 = fieldWeight in 1162, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2544694 = idf(docFreq=230, maxDocs=44218)
                0.078125 = fieldNorm(doc=1162)
        0.24 = coord(6/25)
    
  5. Gödert, W.: Detecting multiword phrases in mathematical text corpora (2012) 0.09
    0.094980575 = sum of:
      0.094980575 = product of:
        0.7915048 = sum of:
          0.054117966 = weight(abstract_txt:words in 466) [ClassicSimilarity], result of:
            0.054117966 = score(doc=466,freq=1.0), product of:
              0.10783814 = queryWeight, product of:
                1.578073 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.012765785 = queryNorm
              0.5018444 = fieldWeight in 466, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.09375 = fieldNorm(doc=466)
          0.09099251 = weight(abstract_txt:method in 466) [ClassicSimilarity], result of:
            0.09099251 = score(doc=466,freq=2.0), product of:
              0.15248081 = queryWeight, product of:
                2.6537712 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.012765785 = queryNorm
              0.5967473 = fieldWeight in 466, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.09375 = fieldNorm(doc=466)
          0.6463943 = weight(abstract_txt:multiword in 466) [ClassicSimilarity], result of:
            0.6463943 = score(doc=466,freq=2.0), product of:
              0.56347734 = queryWeight, product of:
                5.1014557 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.012765785 = queryNorm
              1.1471523 = fieldWeight in 466, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.09375 = fieldNorm(doc=466)
        0.12 = coord(3/25)
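
The content-match scores combine per-term products with a coordination factor. As a minimal sketch, the score of entry 5 (doc 466) can be recomputed from the boost, termFreq, docFreq, fieldNorm, and queryNorm values in its trace, assuming the same tf and idf formulas that reproduce the traces above. The helper names are illustrative, and the query size of 25 terms is read off the `coord(3/25)` line.

```python
import math

QUERY_NORM = 0.012765785  # queryNorm, copied from the trace
MAX_DOCS = 44218          # maxDocs, copied from the trace
FIELD_NORM = 0.09375      # fieldNorm(doc=466), copied from the trace
QUERY_TERMS = 25          # denominator of coord(3/25)

def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as it appears in the traces (assumed formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost, freq, doc_freq, field_norm):
    """score = queryWeight * fieldWeight for one matching term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

# (boost, termFreq, docFreq) for the three terms matched in doc 466
terms = {
    "words": (1.578073, 1.0, 568),
    "method": (2.6537712, 2.0, 1333),
    "multiword": (5.1014557, 2.0, 20),
}

matched = sum(term_score(b, f, df, FIELD_NORM) for b, f, df in terms.values())
coord = len(terms) / QUERY_TERMS  # fraction of query terms found in the document
score = coord * matched
print(round(score, 2))
```

The result should agree with the listed 0.09 for entry 5; the other entries follow the same pattern with their own term lists and fieldNorm values.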