Document (#18026)

Author
Leppanen, E.
Title
Homografiongelma tekstihaussa ja homografien disambiguoinnin vaikutukset
Source
Informaatiotutkimus. 15(1996) no.4, pp. 133-144
Year
1996
Abstract
Homonymy is known to cause false drops frequently in free-text searching of full-text databases. The problem is common and difficult to avoid in Finnish, but it has not been examined before. Reports on a study that examined the frequency of, and solutions to, the homonymy problem, based on searches made in a Finnish full-text database containing about 55,000 newspaper articles. The results indicate that homonymy is not a very serious problem in full-text searching: only about 1 search result set in 4 contained false drops caused by homonymy. Several other causes of nonrelevance were much more common. However, some result sets contained a considerable number of homonymy errors, so the number appears to vary unpredictably from one search to another. A study was also made of whether homonyms can be disambiguated by syntactic analysis: 75.2% of homonyms were disambiguated by this method. Verb homonyms were considerably easier to disambiguate than nouns. Although homonymy is not a major problem, it could perhaps be eliminated easily if the IR system included a suitable syntactic analyzer.
Footnote
Title translation: The homonymy problem in free text searching and the results of homonymy disambiguation
Theme
Full-text retrieval
Retrieval studies

Similar documents (content)

  1. Gillaspie, L.: The role of linguistic phenomena in retrieval performance (1995) 0.17
    0.16678277 = sum of:
      0.16678277 = product of:
        0.8339138 = sum of:
          0.03743189 = weight(abstract_txt:number in 3927) [ClassicSimilarity], result of:
            0.03743189 = score(doc=3927,freq=1.0), product of:
              0.07259778 = queryWeight, product of:
                1.0151665 = boost
                4.124852 = idf(docFreq=1913, maxDocs=43556)
                0.017337149 = queryNorm
              0.5156065 = fieldWeight in 3927, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.124852 = idf(docFreq=1913, maxDocs=43556)
                0.125 = fieldNorm(doc=3927)
          0.09515939 = weight(abstract_txt:full in 3927) [ClassicSimilarity], result of:
            0.09515939 = score(doc=3927,freq=1.0), product of:
              0.15479623 = queryWeight, product of:
                1.8155216 = boost
                4.9179177 = idf(docFreq=865, maxDocs=43556)
                0.017337149 = queryNorm
              0.6147397 = fieldWeight in 3927, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9179177 = idf(docFreq=865, maxDocs=43556)
                0.125 = fieldNorm(doc=3927)
          0.2361098 = weight(abstract_txt:false in 3927) [ClassicSimilarity], result of:
            0.2361098 = score(doc=3927,freq=1.0), product of:
              0.24784006 = queryWeight, product of:
                1.8756913 = boost
                7.62136 = idf(docFreq=57, maxDocs=43556)
                0.017337149 = queryNorm
              0.95267 = fieldWeight in 3927, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62136 = idf(docFreq=57, maxDocs=43556)
                0.125 = fieldNorm(doc=3927)
          0.07083031 = weight(abstract_txt:text in 3927) [ClassicSimilarity], result of:
            0.07083031 = score(doc=3927,freq=1.0), product of:
              0.1399324 = queryWeight, product of:
                1.9931948 = boost
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.017337149 = queryNorm
              0.5061752 = fieldWeight in 3927, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.125 = fieldNorm(doc=3927)
          0.3943824 = weight(abstract_txt:drops in 3927) [ClassicSimilarity], result of:
            0.3943824 = score(doc=3927,freq=1.0), product of:
              0.348905 = queryWeight, product of:
                2.2255082 = boost
                9.042746 = idf(docFreq=13, maxDocs=43556)
                0.017337149 = queryNorm
              1.1303432 = fieldWeight in 3927, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.042746 = idf(docFreq=13, maxDocs=43556)
                0.125 = fieldNorm(doc=3927)
        0.2 = coord(5/25)
    
  2. Vaughan, L.; Shaw, D.: Bibliographic and Web citations : what is the difference? (2003) 0.12
    0.11853025 = sum of:
      0.11853025 = product of:
        0.42332232 = sum of:
          0.016204081 = weight(abstract_txt:there in 174) [ClassicSimilarity], result of:
            0.016204081 = score(doc=174,freq=1.0), product of:
              0.07208747 = queryWeight, product of:
                1.0115923 = boost
                4.110329 = idf(docFreq=1941, maxDocs=43556)
                0.017337149 = queryNorm
              0.22478363 = fieldWeight in 174, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.110329 = idf(docFreq=1941, maxDocs=43556)
                0.0546875 = fieldNorm(doc=174)
          0.016376453 = weight(abstract_txt:number in 174) [ClassicSimilarity], result of:
            0.016376453 = score(doc=174,freq=1.0), product of:
              0.07259778 = queryWeight, product of:
                1.0151665 = boost
                4.124852 = idf(docFreq=1913, maxDocs=43556)
                0.017337149 = queryNorm
              0.22557786 = fieldWeight in 174, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.124852 = idf(docFreq=1913, maxDocs=43556)
                0.0546875 = fieldNorm(doc=174)
          0.03295987 = weight(abstract_txt:examined in 174) [ClassicSimilarity], result of:
            0.03295987 = score(doc=174,freq=1.0), product of:
              0.11572677 = queryWeight, product of:
                1.2817181 = boost
                5.207912 = idf(docFreq=647, maxDocs=43556)
                0.017337149 = queryNorm
              0.28480768 = fieldWeight in 174, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.207912 = idf(docFreq=647, maxDocs=43556)
                0.0546875 = fieldNorm(doc=174)
          0.040309355 = weight(abstract_txt:were in 174) [ClassicSimilarity], result of:
            0.040309355 = score(doc=174,freq=3.0), product of:
              0.11561574 = queryWeight, product of:
                1.8117534 = boost
                3.6807828 = idf(docFreq=2983, maxDocs=43556)
                0.017337149 = queryNorm
              0.34864938 = fieldWeight in 174, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6807828 = idf(docFreq=2983, maxDocs=43556)
                0.0546875 = fieldNorm(doc=174)
          0.04163223 = weight(abstract_txt:full in 174) [ClassicSimilarity], result of:
            0.04163223 = score(doc=174,freq=1.0), product of:
              0.15479623 = queryWeight, product of:
                1.8155216 = boost
                4.9179177 = idf(docFreq=865, maxDocs=43556)
                0.017337149 = queryNorm
              0.2689486 = fieldWeight in 174, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9179177 = idf(docFreq=865, maxDocs=43556)
                0.0546875 = fieldNorm(doc=174)
          0.10329803 = weight(abstract_txt:false in 174) [ClassicSimilarity], result of:
            0.10329803 = score(doc=174,freq=1.0), product of:
              0.24784006 = queryWeight, product of:
                1.8756913 = boost
                7.62136 = idf(docFreq=57, maxDocs=43556)
                0.017337149 = queryNorm
              0.4167931 = fieldWeight in 174, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62136 = idf(docFreq=57, maxDocs=43556)
                0.0546875 = fieldNorm(doc=174)
          0.17254229 = weight(abstract_txt:drops in 174) [ClassicSimilarity], result of:
            0.17254229 = score(doc=174,freq=1.0), product of:
              0.348905 = queryWeight, product of:
                2.2255082 = boost
                9.042746 = idf(docFreq=13, maxDocs=43556)
                0.017337149 = queryNorm
              0.49452513 = fieldWeight in 174, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.042746 = idf(docFreq=13, maxDocs=43556)
                0.0546875 = fieldNorm(doc=174)
        0.28 = coord(7/25)
    
  3. Shuman, B.A.: One false drop deserves another : file selection as a means of increasing precision in online searches (1992) 0.12
    0.11838853 = sum of:
      0.11838853 = product of:
        0.59194267 = sum of:
          0.03689681 = weight(abstract_txt:searching in 4031) [ClassicSimilarity], result of:
            0.03689681 = score(doc=4031,freq=2.0), product of:
              0.078071296 = queryWeight, product of:
                1.0527405 = boost
                4.2775235 = idf(docFreq=1642, maxDocs=43556)
                0.017337149 = queryNorm
              0.47260404 = fieldWeight in 4031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2775235 = idf(docFreq=1642, maxDocs=43556)
                0.078125 = fieldNorm(doc=4031)
          0.03725756 = weight(abstract_txt:common in 4031) [ClassicSimilarity], result of:
            0.03725756 = score(doc=4031,freq=1.0), product of:
              0.09900378 = queryWeight, product of:
                1.1854998 = boost
                4.816955 = idf(docFreq=957, maxDocs=43556)
                0.017337149 = queryNorm
              0.37632462 = fieldWeight in 4031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.816955 = idf(docFreq=957, maxDocs=43556)
                0.078125 = fieldNorm(doc=4031)
          0.20869353 = weight(abstract_txt:false in 4031) [ClassicSimilarity], result of:
            0.20869353 = score(doc=4031,freq=2.0), product of:
              0.24784006 = queryWeight, product of:
                1.8756913 = boost
                7.62136 = idf(docFreq=57, maxDocs=43556)
                0.017337149 = queryNorm
              0.84204924 = fieldWeight in 4031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.62136 = idf(docFreq=57, maxDocs=43556)
                0.078125 = fieldNorm(doc=4031)
          0.06260574 = weight(abstract_txt:text in 4031) [ClassicSimilarity], result of:
            0.06260574 = score(doc=4031,freq=2.0), product of:
              0.1399324 = queryWeight, product of:
                1.9931948 = boost
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.017337149 = queryNorm
              0.4473999 = fieldWeight in 4031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.078125 = fieldNorm(doc=4031)
          0.246489 = weight(abstract_txt:drops in 4031) [ClassicSimilarity], result of:
            0.246489 = score(doc=4031,freq=1.0), product of:
              0.348905 = queryWeight, product of:
                2.2255082 = boost
                9.042746 = idf(docFreq=13, maxDocs=43556)
                0.017337149 = queryNorm
              0.7064645 = fieldWeight in 4031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.042746 = idf(docFreq=13, maxDocs=43556)
                0.078125 = fieldNorm(doc=4031)
        0.2 = coord(5/25)
    
  4. McBride, J.L.: Faceted subject access for music through USMARC : a case for linked fields (2000) 0.11
    0.10888922 = sum of:
      0.10888922 = product of:
        0.5444461 = sum of:
          0.02339493 = weight(abstract_txt:number in 401) [ClassicSimilarity], result of:
            0.02339493 = score(doc=401,freq=1.0), product of:
              0.07259778 = queryWeight, product of:
                1.0151665 = boost
                4.124852 = idf(docFreq=1913, maxDocs=43556)
                0.017337149 = queryNorm
              0.32225406 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.124852 = idf(docFreq=1913, maxDocs=43556)
                0.078125 = fieldNorm(doc=401)
          0.044466916 = weight(abstract_txt:result in 401) [ClassicSimilarity], result of:
            0.044466916 = score(doc=401,freq=1.0), product of:
              0.1113953 = queryWeight, product of:
                1.257503 = boost
                5.1095204 = idf(docFreq=714, maxDocs=43556)
                0.017337149 = queryNorm
              0.39918128 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1095204 = idf(docFreq=714, maxDocs=43556)
                0.078125 = fieldNorm(doc=401)
          0.082526624 = weight(abstract_txt:containing in 401) [ClassicSimilarity], result of:
            0.082526624 = score(doc=401,freq=1.0), product of:
              0.16823056 = queryWeight, product of:
                1.5453542 = boost
                6.279125 = idf(docFreq=221, maxDocs=43556)
                0.017337149 = queryNorm
              0.49055666 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.279125 = idf(docFreq=221, maxDocs=43556)
                0.078125 = fieldNorm(doc=401)
          0.14756861 = weight(abstract_txt:false in 401) [ClassicSimilarity], result of:
            0.14756861 = score(doc=401,freq=1.0), product of:
              0.24784006 = queryWeight, product of:
                1.8756913 = boost
                7.62136 = idf(docFreq=57, maxDocs=43556)
                0.017337149 = queryNorm
              0.59541875 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.62136 = idf(docFreq=57, maxDocs=43556)
                0.078125 = fieldNorm(doc=401)
          0.246489 = weight(abstract_txt:drops in 401) [ClassicSimilarity], result of:
            0.246489 = score(doc=401,freq=1.0), product of:
              0.348905 = queryWeight, product of:
                2.2255082 = boost
                9.042746 = idf(docFreq=13, maxDocs=43556)
                0.017337149 = queryNorm
              0.7064645 = fieldWeight in 401, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.042746 = idf(docFreq=13, maxDocs=43556)
                0.078125 = fieldNorm(doc=401)
        0.2 = coord(5/25)
    
  5. Ahlgren, P.; Kekäläinen, J.: Indexing strategies for Swedish full text retrieval under different user scenarios (2007) 0.10
    0.101247385 = sum of:
      0.101247385 = product of:
        0.3615978 = sum of:
          0.029417012 = weight(abstract_txt:database in 2894) [ClassicSimilarity], result of:
            0.029417012 = score(doc=2894,freq=2.0), product of:
              0.0778941 = queryWeight, product of:
                1.051545 = boost
                4.2726665 = idf(docFreq=1650, maxDocs=43556)
                0.017337149 = queryNorm
              0.37765393 = fieldWeight in 2894, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2726665 = idf(docFreq=1650, maxDocs=43556)
                0.0625 = fieldNorm(doc=2894)
          0.025580958 = weight(abstract_txt:made in 2894) [ClassicSimilarity], result of:
            0.025580958 = score(doc=2894,freq=1.0), product of:
              0.0894115 = queryWeight, product of:
                1.1266066 = boost
                4.5776587 = idf(docFreq=1216, maxDocs=43556)
                0.017337149 = queryNorm
              0.28610367 = fieldWeight in 2894, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5776587 = idf(docFreq=1216, maxDocs=43556)
                0.0625 = fieldNorm(doc=2894)
          0.10372429 = weight(abstract_txt:analyzer in 2894) [ClassicSimilarity], result of:
            0.10372429 = score(doc=2894,freq=1.0), product of:
              0.18045095 = queryWeight, product of:
                1.1317232 = boost
                9.196897 = idf(docFreq=11, maxDocs=43556)
                0.017337149 = queryNorm
              0.57480603 = fieldWeight in 2894, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.196897 = idf(docFreq=11, maxDocs=43556)
                0.0625 = fieldNorm(doc=2894)
          0.053194553 = weight(abstract_txt:were in 2894) [ClassicSimilarity], result of:
            0.053194553 = score(doc=2894,freq=4.0), product of:
              0.11561574 = queryWeight, product of:
                1.8117534 = boost
                3.6807828 = idf(docFreq=2983, maxDocs=43556)
                0.017337149 = queryNorm
              0.46009785 = fieldWeight in 2894, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6807828 = idf(docFreq=2983, maxDocs=43556)
                0.0625 = fieldNorm(doc=2894)
          0.047579695 = weight(abstract_txt:full in 2894) [ClassicSimilarity], result of:
            0.047579695 = score(doc=2894,freq=1.0), product of:
              0.15479623 = queryWeight, product of:
                1.8155216 = boost
                4.9179177 = idf(docFreq=865, maxDocs=43556)
                0.017337149 = queryNorm
              0.30736986 = fieldWeight in 2894, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9179177 = idf(docFreq=865, maxDocs=43556)
                0.0625 = fieldNorm(doc=2894)
          0.035415154 = weight(abstract_txt:text in 2894) [ClassicSimilarity], result of:
            0.035415154 = score(doc=2894,freq=1.0), product of:
              0.1399324 = queryWeight, product of:
                1.9931948 = boost
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.017337149 = queryNorm
              0.2530876 = fieldWeight in 2894, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.0625 = fieldNorm(doc=2894)
          0.06668616 = weight(abstract_txt:problem in 2894) [ClassicSimilarity], result of:
            0.06668616 = score(doc=2894,freq=2.0), product of:
              0.1693581 = queryWeight, product of:
                2.1927726 = boost
                4.454867 = idf(docFreq=1375, maxDocs=43556)
                0.017337149 = queryNorm
              0.39375833 = fieldWeight in 2894, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.454867 = idf(docFreq=1375, maxDocs=43556)
                0.0625 = fieldNorm(doc=2894)
        0.28 = coord(7/25)
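
The score trees above all follow the same arithmetic, which can be reproduced from the leaf values. The following is a minimal sketch, not the search engine's own code: the function names (`idf`, `term_score`, `document_score`) are illustrative assumptions, and the formulas are the ones the listing itself is consistent with, namely tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord (matched query terms / total query terms). The input numbers are copied from the tree for doc 3927.

```python
import math

# Constants copied from the listing above.
MAX_DOCS = 43556
QUERY_NORM = 0.017337149


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, in the form that matches the listed idf values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                           # tf(freq) in the listing
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm   # "queryWeight, product of"
    field_weight = tf * term_idf * field_norm      # "fieldWeight, product of"
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total):
    """Sum of the term scores, scaled by coord = matched / total."""
    summed = sum(
        term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, field_norm)
        for _term, freq, doc_freq, boost in terms
    )
    return summed * (matched / total)


# (term, freq, docFreq, boost) for doc 3927, copied from the first tree.
TERMS_DOC_3927 = [
    ("number", 1.0, 1913, 1.0151665),
    ("full", 1.0, 865, 1.8155216),
    ("false", 1.0, 57, 1.8756913),
    ("text", 1.0, 2063, 1.9931948),
    ("drops", 1.0, 13, 2.2255082),
]

score_3927 = document_score(TERMS_DOC_3927, field_norm=0.125, matched=5, total=25)
print(f"doc 3927: {score_3927:.6f}")  # compare with 0.16678277 in the listing
```

Because the listing was produced with single-precision arithmetic and this sketch uses double precision, the recomputed value should agree with the listed one only to about the displayed number of digits. The rare term "drops" (docFreq=13) contributes the largest share of every score shown, which is why documents sharing only that phrase rank as "similar".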