Vechtomova, O.: A method for automatic extraction of multiword units representing business aspects from user reviews (2014)
- Abstract
- The article describes a semi-supervised approach to extracting multiword aspects of a given category from user-written reviews. The method starts with a small set of seed words representing the target category and calculates the distributional similarity between each candidate word and the seed words. We compare three distributional similarity measures (Lin's, Weeds's, and balAPinc) and a document retrieval function, BM25, adapted as a word similarity measure. We then introduce a method for identifying multiword aspects that combines syntactic rules with a co-occurrence association measure. Finally, we describe a method for ranking multiword aspects by their likelihood of belonging to the target aspect category. The method is evaluated on the task of extracting restaurant dish names from a corpus of restaurant reviews.
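The abstract names Lin's measure without defining it. The sketch below is a minimal illustration, assuming each word is represented by a vector of context features weighted with positive pointwise mutual information (PMI), which is the usual setting for Lin's measure. The feature strings (e.g. `dobj:order`), the helper names `pmi_vectors`, `lin_similarity` and `rank_candidates`, and the choice to score a candidate by its mean similarity to the seeds are hypothetical illustrations, not the article's implementation.

```python
import math
from collections import Counter, defaultdict


def pmi_vectors(pairs):
    """Build positive-PMI feature vectors from (word, context_feature) pairs."""
    pair_counts = Counter(pairs)
    word_counts, feat_counts = Counter(), Counter()
    for (word, feat), count in pair_counts.items():
        word_counts[word] += count
        feat_counts[feat] += count
    total = sum(pair_counts.values())
    vectors = defaultdict(dict)
    for (word, feat), count in pair_counts.items():
        pmi = math.log((count * total) / (word_counts[word] * feat_counts[feat]))
        if pmi > 0:  # Lin's measure sums only positive association weights
            vectors[word][feat] = pmi
    return vectors


def lin_similarity(v1, v2):
    """Lin's measure: weight of shared features over total weight of both words."""
    shared = v1.keys() & v2.keys()
    numerator = sum(v1[f] + v2[f] for f in shared)
    denominator = sum(v1.values()) + sum(v2.values())
    return numerator / denominator if denominator else 0.0


def rank_candidates(candidates, seeds, vectors):
    """Hypothetical ranking: mean Lin similarity of each candidate to the seed words."""
    scores = {}
    for cand in candidates:
        sims = [lin_similarity(vectors.get(cand, {}), vectors.get(s, {})) for s in seeds]
        scores[cand] = sum(sims) / len(sims)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy (word, dependency-context) pairs; invented for illustration only.
    pairs = [
        ("pizza", "dobj:order"), ("pizza", "amod:delicious"),
        ("pasta", "dobj:order"), ("pasta", "amod:delicious"),
        ("waiter", "amod:friendly"), ("waiter", "nsubj:served"),
    ]
    vectors = pmi_vectors(pairs)
    print(rank_candidates(["waiter", "pasta"], ["pizza"], vectors))
    # prints [('pasta', 1.0), ('waiter', 0.0)]
```

With the seed "pizza", the candidate "pasta" shares all of its contexts with the seed and scores 1.0, while "waiter" shares none and scores 0.0. Weeds's measure, balAPinc and the BM25 adaptation would replace `lin_similarity` in this scheme; their exact forms in the article are not given in the abstract.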