Lhadj, L.S.; Boughanem, M.; Amrouche, K.: Enhancing information retrieval through concept-based language modeling and semantic smoothing (2016)
- Abstract
- Traditionally, many information retrieval models assume that terms occur in documents independently. Although these models have shown good performance, the term-independence assumption is unrealistic from a natural-language point of view, in which terms are related to each other. This assumption leads to two well-known problems in information retrieval (IR), namely polysemy (term mismatch) and synonymy. In language models, these issues have been addressed by modeling dependencies such as bigrams, phrasal concepts, or word relationships, but such models are estimated from simple n-gram or concept counts. In this paper, we address the polysemy and synonymy mismatch problems with a concept-based language modeling approach that combines ontological concepts from external resources with collocations frequently found in the document collection. In addition, the concept-based model is enriched with subconcepts and semantic relationships through a semantic smoothing technique so as to perform semantic matching. Experiments carried out on TREC collections show that our model achieves significant improvements over a single word-based model and the Markov Random Field model (using a Markov classifier).
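The semantic smoothing the abstract describes can be illustrated with a generic Jelinek-Mercer-style interpolation between a maximum-likelihood document model and a concept-based model. The sketch below is only a minimal illustration of that general idea, not the authors' estimator; the toy document, the `concept_terms` mapping, and all probabilities are invented for the example.

```python
from collections import Counter

# Hypothetical toy data: a short document and a small concept "ontology"
# mapping each concept to related terms with P(w|c). All names and
# numbers here are illustrative, not taken from the paper.
doc = "retrieval model ranks documents by query term likelihood".split()
concept_terms = {
    "search": {"retrieval": 0.5, "query": 0.3, "ranks": 0.2},
    "corpus": {"documents": 0.6, "model": 0.4},
}

def smoothed_prob(word, doc, concept_terms, lam=0.7):
    """P(w|d) = lam * P_ml(w|d) + (1 - lam) * sum_c P(w|c) * P(c|d).

    A Jelinek-Mercer-style interpolation between the maximum-likelihood
    document model and a concept-based model; a sketch of semantic
    smoothing under the stated assumptions, not the paper's exact method.
    """
    counts = Counter(doc)
    p_ml = counts[word] / len(doc)          # maximum-likelihood P(w|d)
    # For simplicity, assume a uniform P(c|d) over all concepts.
    p_concept = sum(terms.get(word, 0.0) for terms in concept_terms.values())
    p_concept /= len(concept_terms)
    return lam * p_ml + (1 - lam) * p_concept

# A term supported by a concept gets probability mass beyond its raw count.
print(round(smoothed_prob("retrieval", doc, concept_terms), 4))  # → 0.1625
```

In this toy setup, "retrieval" and "by" each occur once, yet "retrieval" scores higher because the concept model contributes extra mass, which is the kind of semantic matching the abstract attributes to smoothing.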