Oard, D.W.: Alternative approaches for cross-language text retrieval (1997)
- Abstract
- Multilingual text retrieval can be defined as the selection of useful documents from collections that may contain several languages (English, French, Chinese, etc.). This formulation allows for the possibility that individual documents might contain more than one language, a common occurrence in some applications. Both cross-language and within-language retrieval are included in this formulation, but it is the cross-language aspect of the problem that distinguishes multilingual text retrieval from its well-studied monolingual counterpart. At the SIGIR 96 workshop on "Cross-Linguistic Information Retrieval", the participants discussed the proliferation of terminology being used to describe the field and settled on "cross-language" as the best single description of the salient aspect of the problem. "Multilingual" was felt to be too broad, since that term has also been used to describe systems that can perform within-language retrieval in more than one language but lack any cross-language capability. "Cross-lingual" and "cross-linguistic" were felt to be equally good descriptions of the field, but "cross-language" was selected as the preferred term in the interest of standardization. Unfortunately, at about the same time the U.S. Defense Advanced Research Projects Agency (DARPA) introduced "translingual" as its preferred term, so we are still some distance from reaching consensus on this matter.
I will not attempt to draw a sharp distinction between retrieval and filtering in this survey. Although my own work on adaptive cross-language text filtering has led me to make this distinction fairly carefully in other presentations (cf. Oard 1997b), such an approach does little to help understand the fundamental techniques that have been applied or the results that have been obtained in this case. Since it is still common to view filtering (detection of useful documents in dynamic document streams) as a kind of retrieval, I will simply adopt that perspective here.