Document (#18029)

Author
Leppanen, E.
Title
Homografiongelma tekstihaussa ja homografien disambiguoinnin vaikutukset
Source
Informaatiotutkimus. 15(1996) no.4, pp. 133-144
Year
1996
Abstract
Homonymy is known to be a frequent cause of false drops in free-text searching of full-text databases. The problem is common and difficult to avoid in Finnish, but it had not been examined before. Reports on a study of the frequency of, and solutions to, the homonymy problem, based on searches made in a Finnish full-text database containing about 55,000 newspaper articles. The results indicate that homonymy is not a very serious problem in full-text searching: only about 1 search result set in 4 contained false drops caused by homonymy, and several other causes of nonrelevance were much more common. However, some result sets contained a considerable number of homonymy errors, so the number appears to vary unpredictably between searches. A study was also made of whether homonyms can be disambiguated by syntactic analysis; 75.2% of homonyms were disambiguated by this method. Verb homonyms were considerably easier to disambiguate than substantives. Although homonymy is not a very big problem, it could perhaps be eliminated easily if the IR system included a suitable syntactic analyzer.
Footnote
Translated title: The homonymy problem in free text searching and the results of homonymy disambiguation
Theme
Full-text retrieval
Retrieval studies

Similar documents (content)

  1. Gillaspie, L.: The role of linguistic phenomena in retrieval performance (1995) 0.17
    0.16589306 = sum of:
      0.16589306 = product of:
        0.82946527 = sum of:
          0.037161235 = weight(abstract_txt:number in 3930) [ClassicSimilarity], result of:
            0.037161235 = score(doc=3930,freq=1.0), product of:
              0.07207459 = queryWeight, product of:
                1.0105964 = boost
                4.124753 = idf(docFreq=1871, maxDocs=42596)
                0.017290458 = queryNorm
              0.5155941 = fieldWeight in 3930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.124753 = idf(docFreq=1871, maxDocs=42596)
                0.125 = fieldNorm(doc=3930)
          0.094336346 = weight(abstract_txt:full in 3930) [ClassicSimilarity], result of:
            0.094336346 = score(doc=3930,freq=1.0), product of:
              0.15353422 = queryWeight, product of:
                1.8064871 = boost
                4.9154563 = idf(docFreq=848, maxDocs=42596)
                0.017290458 = queryNorm
              0.61443204 = fieldWeight in 3930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9154563 = idf(docFreq=848, maxDocs=42596)
                0.125 = fieldNorm(doc=3930)
          0.23898615 = weight(abstract_txt:false in 3930) [ClassicSimilarity], result of:
            0.23898615 = score(doc=3930,freq=1.0), product of:
              0.24925119 = queryWeight, product of:
                1.8793396 = boost
                7.6705317 = idf(docFreq=53, maxDocs=42596)
                0.017290458 = queryNorm
              0.95881647 = fieldWeight in 3930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6705317 = idf(docFreq=53, maxDocs=42596)
                0.125 = fieldNorm(doc=3930)
          0.07031054 = weight(abstract_txt:text in 3930) [ClassicSimilarity], result of:
            0.07031054 = score(doc=3930,freq=1.0), product of:
              0.1389139 = queryWeight, product of:
                1.98415 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.017290458 = queryNorm
              0.50614476 = fieldWeight in 3930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.125 = fieldNorm(doc=3930)
          0.38867098 = weight(abstract_txt:drops in 3930) [ClassicSimilarity], result of:
            0.38867098 = score(doc=3930,freq=1.0), product of:
              0.34470177 = queryWeight, product of:
                2.210082 = boost
                9.020458 = idf(docFreq=13, maxDocs=42596)
                0.017290458 = queryNorm
              1.1275573 = fieldWeight in 3930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.020458 = idf(docFreq=13, maxDocs=42596)
                0.125 = fieldNorm(doc=3930)
        0.2 = coord(5/25)
    
  2. Vaughan, L.; Shaw, D.: Bibliographic and Web citations : what is the difference? (2003) 0.12
    0.11812699 = sum of:
      0.11812699 = product of:
        0.4218821 = sum of:
          0.016258042 = weight(abstract_txt:number in 177) [ClassicSimilarity], result of:
            0.016258042 = score(doc=177,freq=1.0), product of:
              0.07207459 = queryWeight, product of:
                1.0105964 = boost
                4.124753 = idf(docFreq=1871, maxDocs=42596)
                0.017290458 = queryNorm
              0.22557244 = fieldWeight in 177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.124753 = idf(docFreq=1871, maxDocs=42596)
                0.0546875 = fieldNorm(doc=177)
          0.016283346 = weight(abstract_txt:there in 177) [ClassicSimilarity], result of:
            0.016283346 = score(doc=177,freq=1.0), product of:
              0.07214936 = queryWeight, product of:
                1.0111204 = boost
                4.126892 = idf(docFreq=1867, maxDocs=42596)
                0.017290458 = queryNorm
              0.22568941 = fieldWeight in 177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.126892 = idf(docFreq=1867, maxDocs=42596)
                0.0546875 = fieldNorm(doc=177)
          0.033016082 = weight(abstract_txt:examined in 177) [ClassicSimilarity], result of:
            0.033016082 = score(doc=177,freq=1.0), product of:
              0.11558117 = queryWeight, product of:
                1.2797649 = boost
                5.2233653 = idf(docFreq=623, maxDocs=42596)
                0.017290458 = queryNorm
              0.2856528 = fieldWeight in 177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2233653 = idf(docFreq=623, maxDocs=42596)
                0.0546875 = fieldNorm(doc=177)
          0.041272152 = weight(abstract_txt:full in 177) [ClassicSimilarity], result of:
            0.041272152 = score(doc=177,freq=1.0), product of:
              0.15353422 = queryWeight, product of:
                1.8064871 = boost
                4.9154563 = idf(docFreq=848, maxDocs=42596)
                0.017290458 = queryNorm
              0.26881403 = fieldWeight in 177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9154563 = idf(docFreq=848, maxDocs=42596)
                0.0546875 = fieldNorm(doc=177)
          0.040452473 = weight(abstract_txt:were in 177) [ClassicSimilarity], result of:
            0.040452473 = score(doc=177,freq=3.0), product of:
              0.11561202 = queryWeight, product of:
                1.8101023 = boost
                3.69397 = idf(docFreq=2879, maxDocs=42596)
                0.017290458 = queryNorm
              0.3498985 = fieldWeight in 177, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.69397 = idf(docFreq=2879, maxDocs=42596)
                0.0546875 = fieldNorm(doc=177)
          0.104556434 = weight(abstract_txt:false in 177) [ClassicSimilarity], result of:
            0.104556434 = score(doc=177,freq=1.0), product of:
              0.24925119 = queryWeight, product of:
                1.8793396 = boost
                7.6705317 = idf(docFreq=53, maxDocs=42596)
                0.017290458 = queryNorm
              0.4194822 = fieldWeight in 177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6705317 = idf(docFreq=53, maxDocs=42596)
                0.0546875 = fieldNorm(doc=177)
          0.17004356 = weight(abstract_txt:drops in 177) [ClassicSimilarity], result of:
            0.17004356 = score(doc=177,freq=1.0), product of:
              0.34470177 = queryWeight, product of:
                2.210082 = boost
                9.020458 = idf(docFreq=13, maxDocs=42596)
                0.017290458 = queryNorm
              0.4933063 = fieldWeight in 177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.020458 = idf(docFreq=13, maxDocs=42596)
                0.0546875 = fieldNorm(doc=177)
        0.28 = coord(7/25)
    
  3. Shuman, B.A.: One false drop deserves another : file selection as a means of increasing precision in online searches (1992) 0.12
    0.117930256 = sum of:
      0.117930256 = product of:
        0.5896513 = sum of:
          0.036312435 = weight(abstract_txt:searching in 4031) [ClassicSimilarity], result of:
            0.036312435 = score(doc=4031,freq=2.0), product of:
              0.07706 = queryWeight, product of:
                1.0449636 = boost
                4.2650228 = idf(docFreq=1626, maxDocs=42596)
                0.017290458 = queryNorm
              0.47122288 = fieldWeight in 4031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2650228 = idf(docFreq=1626, maxDocs=42596)
                0.078125 = fieldNorm(doc=4031)
          0.037037253 = weight(abstract_txt:common in 4031) [ClassicSimilarity], result of:
            0.037037253 = score(doc=4031,freq=1.0), product of:
              0.09837723 = queryWeight, product of:
                1.1806847 = boost
                4.8189692 = idf(docFreq=934, maxDocs=42596)
                0.017290458 = queryNorm
              0.37648198 = fieldWeight in 4031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8189692 = idf(docFreq=934, maxDocs=42596)
                0.078125 = fieldNorm(doc=4031)
          0.2112359 = weight(abstract_txt:false in 4031) [ClassicSimilarity], result of:
            0.2112359 = score(doc=4031,freq=2.0), product of:
              0.24925119 = queryWeight, product of:
                1.8793396 = boost
                7.6705317 = idf(docFreq=53, maxDocs=42596)
                0.017290458 = queryNorm
              0.847482 = fieldWeight in 4031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.6705317 = idf(docFreq=53, maxDocs=42596)
                0.078125 = fieldNorm(doc=4031)
          0.062146325 = weight(abstract_txt:text in 4031) [ClassicSimilarity], result of:
            0.062146325 = score(doc=4031,freq=2.0), product of:
              0.1389139 = queryWeight, product of:
                1.98415 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.017290458 = queryNorm
              0.44737297 = fieldWeight in 4031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.078125 = fieldNorm(doc=4031)
          0.24291937 = weight(abstract_txt:drops in 4031) [ClassicSimilarity], result of:
            0.24291937 = score(doc=4031,freq=1.0), product of:
              0.34470177 = queryWeight, product of:
                2.210082 = boost
                9.020458 = idf(docFreq=13, maxDocs=42596)
                0.017290458 = queryNorm
              0.7047233 = fieldWeight in 4031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.020458 = idf(docFreq=13, maxDocs=42596)
                0.078125 = fieldNorm(doc=4031)
        0.2 = coord(5/25)
    
  4. McBride, J.L.: Faceted subject access for music through USMARC : a case for linked fields (2000) 0.11
    0.10829246 = sum of:
      0.10829246 = product of:
        0.5414623 = sum of:
          0.023225771 = weight(abstract_txt:number in 404) [ClassicSimilarity], result of:
            0.023225771 = score(doc=404,freq=1.0), product of:
              0.07207459 = queryWeight, product of:
                1.0105964 = boost
                4.124753 = idf(docFreq=1871, maxDocs=42596)
                0.017290458 = queryNorm
              0.3222463 = fieldWeight in 404, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.124753 = idf(docFreq=1871, maxDocs=42596)
                0.078125 = fieldNorm(doc=404)
          0.044532128 = weight(abstract_txt:result in 404) [ClassicSimilarity], result of:
            0.044532128 = score(doc=404,freq=1.0), product of:
              0.11123745 = queryWeight, product of:
                1.2554868 = boost
                5.1242743 = idf(docFreq=688, maxDocs=42596)
                0.017290458 = queryNorm
              0.40033394 = fieldWeight in 404, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1242743 = idf(docFreq=688, maxDocs=42596)
                0.078125 = fieldNorm(doc=404)
          0.08141871 = weight(abstract_txt:containing in 404) [ClassicSimilarity], result of:
            0.08141871 = score(doc=404,freq=1.0), product of:
              0.16632271 = queryWeight, product of:
                1.535191 = boost
                6.265888 = idf(docFreq=219, maxDocs=42596)
                0.017290458 = queryNorm
              0.48952252 = fieldWeight in 404, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.265888 = idf(docFreq=219, maxDocs=42596)
                0.078125 = fieldNorm(doc=404)
          0.14936633 = weight(abstract_txt:false in 404) [ClassicSimilarity], result of:
            0.14936633 = score(doc=404,freq=1.0), product of:
              0.24925119 = queryWeight, product of:
                1.8793396 = boost
                7.6705317 = idf(docFreq=53, maxDocs=42596)
                0.017290458 = queryNorm
              0.5992603 = fieldWeight in 404, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6705317 = idf(docFreq=53, maxDocs=42596)
                0.078125 = fieldNorm(doc=404)
          0.24291937 = weight(abstract_txt:drops in 404) [ClassicSimilarity], result of:
            0.24291937 = score(doc=404,freq=1.0), product of:
              0.34470177 = queryWeight, product of:
                2.210082 = boost
                9.020458 = idf(docFreq=13, maxDocs=42596)
                0.017290458 = queryNorm
              0.7047233 = fieldWeight in 404, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.020458 = idf(docFreq=13, maxDocs=42596)
                0.078125 = fieldNorm(doc=404)
        0.2 = coord(5/25)
    
  5. Ahlgren, P.; Kekäläinen, J.: Indexing strategies for Swedish full text retrieval under different user scenarios (2007) 0.10
    0.10125196 = sum of:
      0.10125196 = product of:
        0.36161414 = sum of:
          0.028974794 = weight(abstract_txt:database in 2076) [ClassicSimilarity], result of:
            0.028974794 = score(doc=2076,freq=2.0), product of:
              0.076927036 = queryWeight, product of:
                1.0440617 = boost
                4.2613416 = idf(docFreq=1632, maxDocs=42596)
                0.017290458 = queryNorm
              0.37665293 = fieldWeight in 2076, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2613416 = idf(docFreq=1632, maxDocs=42596)
                0.0625 = fieldNorm(doc=2076)
          0.025330564 = weight(abstract_txt:made in 2076) [ClassicSimilarity], result of:
            0.025330564 = score(doc=2076,freq=1.0), product of:
              0.088614605 = queryWeight, product of:
                1.1205708 = boost
                4.573614 = idf(docFreq=1194, maxDocs=42596)
                0.017290458 = queryNorm
              0.28585088 = fieldWeight in 2076, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.573614 = idf(docFreq=1194, maxDocs=42596)
                0.0625 = fieldNorm(doc=2076)
          0.1051713 = weight(abstract_txt:analyzer in 2076) [ClassicSimilarity], result of:
            0.1051713 = score(doc=2076,freq=1.0), product of:
              0.18168968 = queryWeight, product of:
                1.1345843 = boost
                9.2616205 = idf(docFreq=10, maxDocs=42596)
                0.017290458 = queryNorm
              0.5788513 = fieldWeight in 2076, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.2616205 = idf(docFreq=10, maxDocs=42596)
                0.0625 = fieldNorm(doc=2076)
          0.047168173 = weight(abstract_txt:full in 2076) [ClassicSimilarity], result of:
            0.047168173 = score(doc=2076,freq=1.0), product of:
              0.15353422 = queryWeight, product of:
                1.8064871 = boost
                4.9154563 = idf(docFreq=848, maxDocs=42596)
                0.017290458 = queryNorm
              0.30721602 = fieldWeight in 2076, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9154563 = idf(docFreq=848, maxDocs=42596)
                0.0625 = fieldNorm(doc=2076)
          0.053383417 = weight(abstract_txt:were in 2076) [ClassicSimilarity], result of:
            0.053383417 = score(doc=2076,freq=4.0), product of:
              0.11561202 = queryWeight, product of:
                1.8101023 = boost
                3.69397 = idf(docFreq=2879, maxDocs=42596)
                0.017290458 = queryNorm
              0.46174625 = fieldWeight in 2076, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.69397 = idf(docFreq=2879, maxDocs=42596)
                0.0625 = fieldNorm(doc=2076)
          0.03515527 = weight(abstract_txt:text in 2076) [ClassicSimilarity], result of:
            0.03515527 = score(doc=2076,freq=1.0), product of:
              0.1389139 = queryWeight, product of:
                1.98415 = boost
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.017290458 = queryNorm
              0.25307238 = fieldWeight in 2076, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.049158 = idf(docFreq=2018, maxDocs=42596)
                0.0625 = fieldNorm(doc=2076)
          0.06643063 = weight(abstract_txt:problem in 2076) [ClassicSimilarity], result of:
            0.06643063 = score(doc=2076,freq=2.0), product of:
              0.16852112 = queryWeight, product of:
                2.1853893 = boost
                4.4598374 = idf(docFreq=1338, maxDocs=42596)
                0.017290458 = queryNorm
              0.39419764 = fieldWeight in 2076, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4598374 = idf(docFreq=1338, maxDocs=42596)
                0.0625 = fieldNorm(doc=2076)
        0.28 = coord(7/25)
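
The score explanations above all follow the same arithmetic: for each matching term, queryWeight (boost × idf × queryNorm) is multiplied by fieldWeight (tf × idf × fieldNorm), the per-term scores are summed, and the sum is scaled by the coord factor. The Python sketch below recomputes the first entry (doc 3930) from the numbers printed in its explanation. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is consistent with the listed values; the function and variable names are illustrative and are not part of the retrieval system.

```python
import math

# Collection-wide constants copied from the explanations above.
MAX_DOCS = 42596
QUERY_NORM = 0.017290458


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency; assumed form that matches the listed idf values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    """Score of one term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)  # e.g. freq=3.0 gives 1.7320508, as in the listing
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total):
    """Sum of the term scores, scaled by coord = matched / total query terms."""
    raw = sum(term_score(freq, doc_freq, boost, field_norm)
              for _term, freq, doc_freq, boost in terms)
    return raw * (matched / total)


# (term, freq, docFreq, boost) for doc 3930, taken from the first explanation.
TERMS_3930 = [
    ("number", 1.0, 1871, 1.0105964),
    ("full", 1.0, 848, 1.8064871),
    ("false", 1.0, 53, 1.8793396),
    ("text", 1.0, 2018, 1.98415),
    ("drops", 1.0, 13, 2.210082),
]

drops_3930 = term_score(freq=1.0, doc_freq=13, boost=2.210082, field_norm=0.125)
score_3930 = document_score(TERMS_3930, field_norm=0.125, matched=5, total=25)

print(f"drops term score: {drops_3930:.6f}")
print(f"doc 3930 score:   {score_3930:.6f}")
```

The results should agree with the listed 0.38867098 for the term "drops" and 0.16589306 for the document up to small rounding differences, since the listing appears to use single-precision arithmetic while Python uses double precision. The rare term "drops" (docFreq=13) contributes nearly half of the pre-coord sum, which explains why documents sharing only the phrase "false drops" rank as similar despite covering unrelated topics.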