Leppanen, E.: Homografiongelma tekstihaussa ja homografien disambiguoinnin vaikutukset [The homograph problem in text retrieval and the effects of homograph disambiguation] (1996)
- Abstract
- Homonymy is known to cause frequent false drops in free-text searching of full-text databases. The problem is common and difficult to avoid in Finnish, but it has not been examined before. Reports on a study of the frequency of, and solutions to, the homonymy problem, based on searches made in a Finnish full-text database containing about 55,000 newspaper articles. The results indicate that homonymy is not a very serious problem in full-text searching: only about 1 search result set in 4 contained false drops caused by homonymy, and several other causes of nonrelevance were much more common. However, some result sets contained a considerable number of homonymy errors, so the number appears to vary unpredictably between searches. A study was also made into whether homonyms can be disambiguated by syntactic analysis; 75.2% of homonyms were disambiguated by this method. Verb homonyms were considerably easier to disambiguate than nouns. Although homonymy is not a major problem, it could perhaps be eliminated easily if the IR system included a suitable syntactic analyzer.
- Source
- Informaatiotutkimus. 15(1996) no.4, S.133-144
- Type
- a