Ahmed, F.; Nürnberger, A.: Evaluation of n-gram conflation approaches for Arabic text retrieval (2009)
- Abstract
- In this paper we present a language-independent approach to conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method is based on an enhancement of the pure n-gram model that can group related words based on various string-similarity measures, while restricting the search to specific locations of the target word by taking into account the order of n-grams. We show that the method achieves high similarity scores for all word-form variations and reduces ambiguity, i.e., it obtains higher precision and recall than pure n-gram-based approaches for English, Portuguese, and Arabic. The proposed method is especially suited to conflation in Arabic, since Arabic is a highly inflectional language. We therefore also present an adaptive user interface for Arabic text retrieval called araSearch, which serves as a metasearch interface to existing search engines. The system can extend a query using the proposed conflation approach so that additional results for relevant subwords are found automatically.
- Object
- n-grams
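
The contrast the abstract draws between a pure n-gram model and one that takes n-gram order into account can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the choice of the Dice coefficient as the string-similarity measure, the bigram size, and the `window` parameter are all assumptions made for illustration.

```python
def ngrams(word, n=2):
    """Return the character n-grams of `word` in order of occurrence."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def dice_pure(a, b, n=2):
    """Dice coefficient over the *sets* of n-grams; position is ignored."""
    ga, gb = set(ngrams(a, n)), set(ngrams(b, n))
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def dice_positional(a, b, n=2, window=1):
    """Dice-style score that counts an n-gram as shared only if it occurs
    in both words at positions differing by at most `window`
    (hypothetical parameter, not taken from the paper)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga or not gb:
        return 0.0
    used = set()   # positions in gb that are already matched
    matches = 0
    for i, gram in enumerate(ga):
        lo = max(0, i - window)
        hi = min(len(gb), i + window + 1)
        for j in range(lo, hi):
            if j not in used and gb[j] == gram:
                used.add(j)
                matches += 1
                break
    return 2 * matches / (len(ga) + len(gb))


if __name__ == "__main__":
    # "stare" and "rest" share the bigrams "st" and "re", but at very
    # different positions, so the two scores diverge.
    print(dice_pure("stare", "rest"))
    print(dice_positional("stare", "rest"))
```

In this sketch the pure measure rewards "stare" and "rest" for sharing two bigrams, while the positional variant rejects both matches because the bigrams occur at opposite ends of the words. That is one way order information could reduce spurious conflations; the paper itself may combine position and similarity differently.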