Document (#18028)

Author
Leppanen, E.
Title
Homografiongelma tekstihaussa ja homografien disambiguoinnin vaikutukset
Source
Informaatiotutkimus. 15(1996) no.4, pp. 133-144
Year
1996
Abstract
Homonymy is known to cause false drops frequently in free text searching of full text databases. The problem is quite common and difficult to avoid in Finnish, but it has not been examined before. Reports on a study of the frequency of, and solutions to, the homonymy problem, based on searches made in a Finnish full text database containing about 55,000 newspaper articles. The results indicate that homonymy is not a very serious problem in full text searching: only about 1 search result set in 4 contained false drops caused by homonymy, and several other causes of nonrelevance were much more common. However, some result sets contained a considerable number of homonymy errors, so the number varies widely from one result set to another. The study also examined whether homonyms can be disambiguated by syntactic analysis; 75.2% of homonyms were disambiguated by this method. Verb homonyms were considerably easier to disambiguate than noun homonyms. Although homonymy is not a major problem, it could perhaps be eliminated easily if the IR system included a suitable syntactic analyzer.
Footnote
Translated title: The homonymy problem in free text searching and the results of homonymy disambiguation
Theme
Full-text retrieval
Retrieval studies

Similar documents (content)

  1. Gillaspie, L.: The role of linguistic phenomena in retrieval performance (1995) 0.17
    0.16719492 = sum of:
      0.16719492 = product of:
        0.8359746 = sum of:
          0.03757627 = weight(abstract_txt:number in 3861) [ClassicSimilarity], result of:
            0.03757627 = score(doc=3861,freq=1.0), product of:
              0.07274031 = queryWeight, product of:
                1.0152009 = boost
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.017337825 = queryNorm
              0.5165811 = fieldWeight in 3861, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.125 = fieldNorm(doc=3861)
          0.095262006 = weight(abstract_txt:full in 3861) [ClassicSimilarity], result of:
            0.095262006 = score(doc=3861,freq=1.0), product of:
              0.15481377 = queryWeight, product of:
                1.8139063 = boost
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.017337825 = queryNorm
              0.6153329 = fieldWeight in 3861, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.125 = fieldNorm(doc=3861)
          0.23708366 = weight(abstract_txt:false in 3861) [ClassicSimilarity], result of:
            0.23708366 = score(doc=3861,freq=1.0), product of:
              0.24837074 = queryWeight, product of:
                1.8759214 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.017337825 = queryNorm
              0.9545555 = fieldWeight in 3861, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.125 = fieldNorm(doc=3861)
          0.07041231 = weight(abstract_txt:text in 3861) [ClassicSimilarity], result of:
            0.07041231 = score(doc=3861,freq=1.0), product of:
              0.139297 = queryWeight, product of:
                1.9867823 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017337825 = queryNorm
              0.5054833 = fieldWeight in 3861, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.125 = fieldNorm(doc=3861)
          0.39564034 = weight(abstract_txt:drops in 3861) [ClassicSimilarity], result of:
            0.39564034 = score(doc=3861,freq=1.0), product of:
              0.349435 = queryWeight, product of:
                2.2250903 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.017337825 = queryNorm
              1.1322287 = fieldWeight in 3861, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.125 = fieldNorm(doc=3861)
        0.2 = coord(5/25)
    
  2. Shuman, B.A.: One false drop deserves another : file selection as a means of increasing precision in online searches (1992) 0.12
    0.118605055 = sum of:
      0.118605055 = product of:
        0.59302527 = sum of:
          0.03701626 = weight(abstract_txt:searching in 4031) [ClassicSimilarity], result of:
            0.03701626 = score(doc=4031,freq=2.0), product of:
              0.07819237 = queryWeight, product of:
                1.0525594 = boost
                4.284727 = idf(docFreq=1655, maxDocs=44218)
                0.017337825 = queryNorm
              0.47339994 = fieldWeight in 4031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.284727 = idf(docFreq=1655, maxDocs=44218)
                0.078125 = fieldNorm(doc=4031)
          0.036943223 = weight(abstract_txt:common in 4031) [ClassicSimilarity], result of:
            0.036943223 = score(doc=4031,freq=1.0), product of:
              0.09838658 = queryWeight, product of:
                1.1806804 = boost
                4.806278 = idf(docFreq=982, maxDocs=44218)
                0.017337825 = queryNorm
              0.3754905 = fieldWeight in 4031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.806278 = idf(docFreq=982, maxDocs=44218)
                0.078125 = fieldNorm(doc=4031)
          0.20955431 = weight(abstract_txt:false in 4031) [ClassicSimilarity], result of:
            0.20955431 = score(doc=4031,freq=2.0), product of:
              0.24837074 = queryWeight, product of:
                1.8759214 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.017337825 = queryNorm
              0.8437158 = fieldWeight in 4031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.078125 = fieldNorm(doc=4031)
          0.062236276 = weight(abstract_txt:text in 4031) [ClassicSimilarity], result of:
            0.062236276 = score(doc=4031,freq=2.0), product of:
              0.139297 = queryWeight, product of:
                1.9867823 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017337825 = queryNorm
              0.44678837 = fieldWeight in 4031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=4031)
          0.24727522 = weight(abstract_txt:drops in 4031) [ClassicSimilarity], result of:
            0.24727522 = score(doc=4031,freq=1.0), product of:
              0.349435 = queryWeight, product of:
                2.2250903 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.017337825 = queryNorm
              0.707643 = fieldWeight in 4031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.078125 = fieldNorm(doc=4031)
        0.2 = coord(5/25)
    
  3. Vaughan, L.; Shaw, D.: Bibliographic and Web citations : what is the difference? (2003) 0.12
    0.11858147 = sum of:
      0.11858147 = product of:
        0.42350525 = sum of:
          0.01604708 = weight(abstract_txt:there in 5176) [ClassicSimilarity], result of:
            0.01604708 = score(doc=5176,freq=1.0), product of:
              0.07157774 = queryWeight, product of:
                1.0070555 = boost
                4.099491 = idf(docFreq=1992, maxDocs=44218)
                0.017337825 = queryNorm
              0.22419092 = fieldWeight in 5176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.099491 = idf(docFreq=1992, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5176)
          0.016439619 = weight(abstract_txt:number in 5176) [ClassicSimilarity], result of:
            0.016439619 = score(doc=5176,freq=1.0), product of:
              0.07274031 = queryWeight, product of:
                1.0152009 = boost
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.017337825 = queryNorm
              0.22600424 = fieldWeight in 5176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5176)
          0.032638963 = weight(abstract_txt:examined in 5176) [ClassicSimilarity], result of:
            0.032638963 = score(doc=5176,freq=1.0), product of:
              0.114904806 = queryWeight, product of:
                1.2759496 = boost
                5.194097 = idf(docFreq=666, maxDocs=44218)
                0.017337825 = queryNorm
              0.2840522 = fieldWeight in 5176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.194097 = idf(docFreq=666, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5176)
          0.039885737 = weight(abstract_txt:were in 5176) [ClassicSimilarity], result of:
            0.039885737 = score(doc=5176,freq=3.0), product of:
              0.114734836 = queryWeight, product of:
                1.80313 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.017337825 = queryNorm
              0.34763405 = fieldWeight in 5176, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5176)
          0.041677125 = weight(abstract_txt:full in 5176) [ClassicSimilarity], result of:
            0.041677125 = score(doc=5176,freq=1.0), product of:
              0.15481377 = queryWeight, product of:
                1.8139063 = boost
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.017337825 = queryNorm
              0.26920813 = fieldWeight in 5176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5176)
          0.1037241 = weight(abstract_txt:false in 5176) [ClassicSimilarity], result of:
            0.1037241 = score(doc=5176,freq=1.0), product of:
              0.24837074 = queryWeight, product of:
                1.8759214 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.017337825 = queryNorm
              0.41761804 = fieldWeight in 5176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5176)
          0.17309265 = weight(abstract_txt:drops in 5176) [ClassicSimilarity], result of:
            0.17309265 = score(doc=5176,freq=1.0), product of:
              0.349435 = queryWeight, product of:
                2.2250903 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.017337825 = queryNorm
              0.49535006 = fieldWeight in 5176, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5176)
        0.28 = coord(7/25)
    
  4. McBride, J.L.: Faceted subject access for music through USMARC : a case for linked fields (2000) 0.11
    0.10911658 = sum of:
      0.10911658 = product of:
        0.5455829 = sum of:
          0.023485169 = weight(abstract_txt:number in 5403) [ClassicSimilarity], result of:
            0.023485169 = score(doc=5403,freq=1.0), product of:
              0.07274031 = queryWeight, product of:
                1.0152009 = boost
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.017337825 = queryNorm
              0.3228632 = fieldWeight in 5403, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.132649 = idf(docFreq=1927, maxDocs=44218)
                0.078125 = fieldNorm(doc=5403)
          0.044202868 = weight(abstract_txt:result in 5403) [ClassicSimilarity], result of:
            0.044202868 = score(doc=5403,freq=1.0), product of:
              0.110886745 = queryWeight, product of:
                1.2534419 = boost
                5.1024737 = idf(docFreq=730, maxDocs=44218)
                0.017337825 = queryNorm
              0.39863077 = fieldWeight in 5403, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1024737 = idf(docFreq=730, maxDocs=44218)
                0.078125 = fieldNorm(doc=5403)
          0.082442336 = weight(abstract_txt:containing in 5403) [ClassicSimilarity], result of:
            0.082442336 = score(doc=5403,freq=1.0), product of:
              0.16801429 = queryWeight, product of:
                1.5428991 = boost
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.017337825 = queryNorm
              0.49068648 = fieldWeight in 5403, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.078125 = fieldNorm(doc=5403)
          0.14817728 = weight(abstract_txt:false in 5403) [ClassicSimilarity], result of:
            0.14817728 = score(doc=5403,freq=1.0), product of:
              0.24837074 = queryWeight, product of:
                1.8759214 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.017337825 = queryNorm
              0.5965972 = fieldWeight in 5403, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.078125 = fieldNorm(doc=5403)
          0.24727522 = weight(abstract_txt:drops in 5403) [ClassicSimilarity], result of:
            0.24727522 = score(doc=5403,freq=1.0), product of:
              0.349435 = queryWeight, product of:
                2.2250903 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.017337825 = queryNorm
              0.707643 = fieldWeight in 5403, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.078125 = fieldNorm(doc=5403)
        0.2 = coord(5/25)
    
  5. Ahlgren, P.; Kekäläinen, J.: Indexing strategies for Swedish full text retrieval under different user scenarios (2007) 0.10
    0.10115253 = sum of:
      0.10115253 = product of:
        0.36125904 = sum of:
          0.0295132 = weight(abstract_txt:database in 896) [ClassicSimilarity], result of:
            0.0295132 = score(doc=896,freq=2.0), product of:
              0.07801658 = queryWeight, product of:
                1.0513755 = boost
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.017337825 = queryNorm
              0.37829396 = fieldWeight in 896, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2799077 = idf(docFreq=1663, maxDocs=44218)
                0.0625 = fieldNorm(doc=896)
          0.025406547 = weight(abstract_txt:made in 896) [ClassicSimilarity], result of:
            0.025406547 = score(doc=896,freq=1.0), product of:
              0.08895078 = queryWeight, product of:
                1.1226369 = boost
                4.5699964 = idf(docFreq=1244, maxDocs=44218)
                0.017337825 = queryNorm
              0.28562477 = fieldWeight in 896, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5699964 = idf(docFreq=1244, maxDocs=44218)
                0.0625 = fieldNorm(doc=896)
          0.104046434 = weight(abstract_txt:analyzer in 896) [ClassicSimilarity], result of:
            0.104046434 = score(doc=896,freq=1.0), product of:
              0.18071498 = queryWeight, product of:
                1.131479 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.017337825 = queryNorm
              0.5757488 = fieldWeight in 896, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=896)
          0.05263552 = weight(abstract_txt:were in 896) [ClassicSimilarity], result of:
            0.05263552 = score(doc=896,freq=4.0), product of:
              0.114734836 = queryWeight, product of:
                1.80313 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.017337825 = queryNorm
              0.45875797 = fieldWeight in 896, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.0625 = fieldNorm(doc=896)
          0.047631003 = weight(abstract_txt:full in 896) [ClassicSimilarity], result of:
            0.047631003 = score(doc=896,freq=1.0), product of:
              0.15481377 = queryWeight, product of:
                1.8139063 = boost
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.017337825 = queryNorm
              0.30766645 = fieldWeight in 896, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.0625 = fieldNorm(doc=896)
          0.035206154 = weight(abstract_txt:text in 896) [ClassicSimilarity], result of:
            0.035206154 = score(doc=896,freq=1.0), product of:
              0.139297 = queryWeight, product of:
                1.9867823 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017337825 = queryNorm
              0.25274166 = fieldWeight in 896, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=896)
          0.0668202 = weight(abstract_txt:problem in 896) [ClassicSimilarity], result of:
            0.0668202 = score(doc=896,freq=2.0), product of:
              0.16948237 = queryWeight, product of:
                2.191501 = boost
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.017337825 = queryNorm
              0.39426047 = fieldWeight in 896, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.0625 = fieldNorm(doc=896)
        0.28 = coord(7/25)
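
The score trees above appear to follow the usual TF-IDF pattern that Lucene labels ClassicSimilarity: each matching term contributes queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), and the sum is scaled by coord (matched terms / query terms). The sketch below recomputes entry 1 from the numbers printed in its tree. The function names and the ENTRY_1_TERMS list are illustrative assumptions, not part of Lucene, and the tf and idf formulas are inferred from the values shown here (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). Lucene works in single precision, so small rounding differences from the listed figures are to be expected.

```python
import math

MAX_DOCS = 44218          # maxDocs as printed in the trees above
QUERY_NORM = 0.017337825  # queryNorm as printed in the trees above


def idf(doc_freq, max_docs=MAX_DOCS):
    # Inverse document frequency, inferred from the listed idf values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm,
               query_norm=QUERY_NORM, max_docs=MAX_DOCS):
    # One "weight(abstract_txt:term in doc)" node:
    # queryWeight * fieldWeight.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched_terms, query_terms):
    # Sum of the term scores, scaled by coord(matched/total).
    return sum(term_scores) * (matched_terms / query_terms)


# Hypothetical container holding the values copied from entry 1 (doc 3861):
# (term, freq, docFreq, boost, fieldNorm)
ENTRY_1_TERMS = [
    ("number", 1.0, 1927, 1.0152009, 0.125),
    ("full",   1.0,  874, 1.8139063, 0.125),
    ("false",  1.0,   57, 1.8759214, 0.125),
    ("text",   1.0, 2106, 1.9867823, 0.125),
    ("drops",  1.0,   13, 2.2250903, 0.125),
]

entry_1_scores = [
    term_score(freq, doc_freq, boost, field_norm)
    for _term, freq, doc_freq, boost, field_norm in ENTRY_1_TERMS
]
entry_1_total = document_score(entry_1_scores, matched_terms=5, query_terms=25)
print(round(entry_1_total, 4))
```

The printed value should land close to the 0.16719492 listed for entry 1. The same functions can be applied to the other entries by copying their freq, docFreq, boost, and fieldNorm values and using the coord fraction shown at the end of each tree.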