Airio, E.; Kettunen, K.: Does dictionary based bilingual retrieval work in a non-normalized index? (2009)
0.02
0.018442601 = product of:
  0.092213005 = sum of:
    0.092213005 = weight(_text_:index in 4224) [ClassicSimilarity], result of:
      0.092213005 = score(doc=4224,freq=4.0), product of:
        0.2250935 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.051511593 = queryNorm
        0.40966535 = fieldWeight in 4224, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.046875 = fieldNorm(doc=4224)
  0.2 = coord(1/5)
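The score breakdown above can be re-derived by hand. The following is a minimal sketch, assuming the classic TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is the variant that reproduces the idf value shown here. The variable names are illustrative and are not taken from the record; queryNorm and fieldNorm are copied from the output rather than recomputed.

```python
import math

# Inputs copied from the score explanation (not recomputed).
freq = 4.0               # termFreq of "index" in doc 4224
doc_freq = 1520          # documents containing the term
max_docs = 44218         # documents in the index
query_norm = 0.051511593
field_norm = 0.046875
coord = 1 / 5            # 1 of 5 query clauses matched

# Assumed formulas: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)).
tf = math.sqrt(freq)
idf = 1.0 + math.log(max_docs / (doc_freq + 1))

query_weight = idf * query_norm        # idf * queryNorm
field_weight = tf * idf * field_norm   # tf * idf * fieldNorm
term_score = query_weight * field_weight
final_score = term_score * coord

print(f"idf={idf:.6f} queryWeight={query_weight:.7f} "
      f"fieldWeight={field_weight:.8f} score={final_score:.9f}")
```

Small differences in the last digits are to be expected, since the search engine computes these values in single precision.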
- Abstract
- Many operational IR indexes are non-normalized, i.e., no lemmatization, stemming, or similar techniques were applied during indexing. This poses a challenge for dictionary-based cross-language information retrieval (CLIR), because dictionary translations are mostly lemmas. In this study, we address the challenge of dictionary-based CLIR in a non-normalized index. We test two alternative approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms of a given lemma. FCG has been tested in monolingual retrieval and has been shown to be a good method for retrieval of inflected forms, especially in highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English-Finnish, English-Swedish, Swedish-Finnish and Finnish-Swedish. Both approaches performed quite well, but the results varied depending on the language pair. S-gramming and FCG performed about equally well in all language pairs except Finnish-Swedish, where s-gramming outperformed FCG.
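
To illustrate the general idea of s-gramming mentioned in the abstract, here is a minimal sketch. It assumes that s-grams are character digrams formed by skipping a fixed number of characters, and it compares two words with a Jaccard coefficient over the grams of each skip length. The function names, the skip lengths (0 and 1), and the similarity measure are assumptions made for illustration; the abstract does not specify the configuration used in the study.

```python
def s_grams(word: str, skip: int) -> set[str]:
    """Character digrams whose two characters are `skip` positions apart.

    skip=0 gives ordinary adjacent digrams (plain n-gramming with n=2).
    """
    step = skip + 1
    return {word[i] + word[i + step] for i in range(len(word) - step)}


def s_gram_similarity(a: str, b: str, skips=(0, 1)) -> float:
    """Jaccard similarity over s-grams, kept separate per skip length.

    Hypothetical measure chosen for this sketch only.
    """
    grams_a = {(k, g) for k in skips for g in s_grams(a, k)}
    grams_b = {(k, g) for k in skips for g in s_grams(b, k)}
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


# A lemma matched against an inflected form found in a non-normalized index.
print(sorted(s_grams("talo", 0)))   # adjacent digrams
print(sorted(s_grams("talo", 1)))   # digrams skipping one character
print(s_gram_similarity("talo", "talossa"))
```

In this kind of setup, a translated lemma can be matched approximately against the inflected word forms stored in the index, without generating the inflected forms explicitly as FCG does.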