Document (#18029)

Author
Leppanen, E.
Title
Homografiongelma tekstihaussa ja homografien disambiguoinnin vaikutukset
Source
Informaatiotutkimus. 15(1996) no.4, pp.133-144
Year
1996
Abstract
Homonymy is known to cause false drops in free-text searching of full-text databases. The problem is common and difficult to avoid in Finnish, but it had not been studied before. Reports on a study of the frequency of, and solutions to, the homonymy problem, based on searches in a Finnish full-text database of about 55,000 newspaper articles. The results indicate that homonymy is not a very serious problem in full-text searching: only about 1 result set in 4 contained false drops caused by homonymy, and several other causes of nonrelevance were much more common. However, some result sets contained a considerable number of homonymy errors, so their occurrence appears largely random. The study also examined whether homonyms can be disambiguated by syntactic analysis: 75.2% of homonyms were disambiguated by this method, and verb homonyms were considerably easier to disambiguate than nouns. Although homonymy is not a very big problem, it could perhaps be eliminated easily if a suitable syntactic analyzer were available in the IR system.
Footnote
Translation of title: The homonymy problem in free text searching and the results of homonymy disambiguation
Theme
Full-text retrieval
Retrieval studies

Similar documents (content)
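
The scores in the list of similar documents are Lucene explain trees for the classic TF-IDF model (tagged `[ClassicSimilarity]`). Read off the nested lines, each tree appears to follow the formulas below. This is a summary inferred from the printed values (every idf shown, for instance, matches 1 + ln(maxDocs/(docFreq+1))), not a quotation of Lucene's documentation. Here m is the number of query terms a document matches; every coord line shows 25 query terms in total. queryNorm and fieldNorm are taken as printed rather than derived.

```latex
\begin{aligned}
\mathrm{idf}(t) &= 1 + \ln\frac{\mathrm{maxDocs}}{\mathrm{docFreq}(t) + 1},
  \qquad \mathrm{tf}(t,d) = \sqrt{\mathrm{freq}(t,d)} \\
\mathrm{weight}(t,d) &= \underbrace{\mathrm{boost}(t)\,\mathrm{idf}(t)\,\mathrm{queryNorm}}_{\mathrm{queryWeight}}
  \cdot \underbrace{\mathrm{tf}(t,d)\,\mathrm{idf}(t)\,\mathrm{fieldNorm}(d)}_{\mathrm{fieldWeight}} \\
\mathrm{score}(q,d) &= \mathrm{coord}(q,d) \sum_{t \in q \cap d} \mathrm{weight}(t,d),
  \qquad \mathrm{coord}(q,d) = \frac{m}{25}
\end{aligned}
```

For example, for "drops" in doc 3930: idf = 1 + ln(41962/13) ≈ 9.0796, queryWeight = 2.2202742 × 9.079571 × 0.017269647 ≈ 0.3481, fieldWeight = 1.0 × 9.079571 × 0.125 ≈ 1.1349, and their product is about 0.3951.
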

  1. Gillaspie, L.: The role of linguistic phenomena in retrieval performance (1995) 0.17
    0.16706805 = sum of:
      0.16706805 = product of:
        0.8353402 = sum of:
          0.037208352 = weight(abstract_txt:number in 3930) [ClassicSimilarity], result of:
            0.037208352 = score(doc=3930,freq=1.0), product of:
              0.07206015 = queryWeight, product of:
                1.0101283 = boost
                4.1308103 = idf(docFreq=1832, maxDocs=41962)
                0.017269647 = queryNorm
              0.5163513 = fieldWeight in 3930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1308103 = idf(docFreq=1832, maxDocs=41962)
                0.125 = fieldNorm(doc=3930)
          0.09399742 = weight(abstract_txt:full in 3930) [ClassicSimilarity], result of:
            0.09399742 = score(doc=3930,freq=1.0), product of:
              0.15300629 = queryWeight, product of:
                1.8027238 = boost
                4.9146957 = idf(docFreq=836, maxDocs=41962)
                0.017269647 = queryNorm
              0.61433697 = fieldWeight in 3930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9146957 = idf(docFreq=836, maxDocs=41962)
                0.125 = fieldNorm(doc=3930)
          0.23858263 = weight(abstract_txt:false in 3930) [ClassicSimilarity], result of:
            0.23858263 = score(doc=3930,freq=1.0), product of:
              0.2487105 = queryWeight, product of:
                1.8766184 = boost
                7.6742277 = idf(docFreq=52, maxDocs=41962)
                0.017269647 = queryNorm
              0.95927846 = fieldWeight in 3930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6742277 = idf(docFreq=52, maxDocs=41962)
                0.125 = fieldNorm(doc=3930)
          0.070430204 = weight(abstract_txt:text in 3930) [ClassicSimilarity], result of:
            0.070430204 = score(doc=3930,freq=1.0), product of:
              0.13892621 = queryWeight, product of:
                1.9835174 = boost
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.017269647 = queryNorm
              0.5069612 = fieldWeight in 3930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.125 = fieldNorm(doc=3930)
          0.39512157 = weight(abstract_txt:drops in 3930) [ClassicSimilarity], result of:
            0.39512157 = score(doc=3930,freq=1.0), product of:
              0.3481412 = queryWeight, product of:
                2.2202742 = boost
                9.079571 = idf(docFreq=12, maxDocs=41962)
                0.017269647 = queryNorm
              1.1349463 = fieldWeight in 3930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.079571 = idf(docFreq=12, maxDocs=41962)
                0.125 = fieldNorm(doc=3930)
        0.2 = coord(5/25)
    
  2. Vaughan, L.; Shaw, D.: Bibliographic and Web citations : what is the difference? (2003) 0.12
    0.119023636 = sum of:
      0.119023636 = product of:
        0.4250844 = sum of:
          0.016278654 = weight(abstract_txt:number in 177) [ClassicSimilarity], result of:
            0.016278654 = score(doc=177,freq=1.0), product of:
              0.07206015 = queryWeight, product of:
                1.0101283 = boost
                4.1308103 = idf(docFreq=1832, maxDocs=41962)
                0.017269647 = queryNorm
              0.22590369 = fieldWeight in 177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1308103 = idf(docFreq=1832, maxDocs=41962)
                0.0546875 = fieldNorm(doc=177)
          0.016356435 = weight(abstract_txt:there in 177) [ClassicSimilarity], result of:
            0.016356435 = score(doc=177,freq=1.0), product of:
              0.07228951 = queryWeight, product of:
                1.0117345 = boost
                4.1373787 = idf(docFreq=1820, maxDocs=41962)
                0.017269647 = queryNorm
              0.2262629 = fieldWeight in 177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1373787 = idf(docFreq=1820, maxDocs=41962)
                0.0546875 = fieldNorm(doc=177)
          0.033214707 = weight(abstract_txt:examined in 177) [ClassicSimilarity], result of:
            0.033214707 = score(doc=177,freq=1.0), product of:
              0.115923055 = queryWeight, product of:
                1.2811909 = boost
                5.239291 = idf(docFreq=604, maxDocs=41962)
                0.017269647 = queryNorm
              0.28652373 = fieldWeight in 177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.239291 = idf(docFreq=604, maxDocs=41962)
                0.0546875 = fieldNorm(doc=177)
          0.04112387 = weight(abstract_txt:full in 177) [ClassicSimilarity], result of:
            0.04112387 = score(doc=177,freq=1.0), product of:
              0.15300629 = queryWeight, product of:
                1.8027238 = boost
                4.9146957 = idf(docFreq=836, maxDocs=41962)
                0.017269647 = queryNorm
              0.26877242 = fieldWeight in 177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9146957 = idf(docFreq=836, maxDocs=41962)
                0.0546875 = fieldNorm(doc=177)
          0.040865157 = weight(abstract_txt:were in 177) [ClassicSimilarity], result of:
            0.040865157 = score(doc=177,freq=3.0), product of:
              0.116275415 = queryWeight, product of:
                1.8146291 = boost
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.017269647 = queryNorm
              0.3514514 = fieldWeight in 177, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.0546875 = fieldNorm(doc=177)
          0.1043799 = weight(abstract_txt:false in 177) [ClassicSimilarity], result of:
            0.1043799 = score(doc=177,freq=1.0), product of:
              0.2487105 = queryWeight, product of:
                1.8766184 = boost
                7.6742277 = idf(docFreq=52, maxDocs=41962)
                0.017269647 = queryNorm
              0.41968432 = fieldWeight in 177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6742277 = idf(docFreq=52, maxDocs=41962)
                0.0546875 = fieldNorm(doc=177)
          0.17286569 = weight(abstract_txt:drops in 177) [ClassicSimilarity], result of:
            0.17286569 = score(doc=177,freq=1.0), product of:
              0.3481412 = queryWeight, product of:
                2.2202742 = boost
                9.079571 = idf(docFreq=12, maxDocs=41962)
                0.017269647 = queryNorm
              0.49653903 = fieldWeight in 177, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.079571 = idf(docFreq=12, maxDocs=41962)
                0.0546875 = fieldNorm(doc=177)
        0.28 = coord(7/25)
    
  3. Shuman, B.A.: One false drop deserves another : file selection as a means of increasing precision in online searches (1992) 0.12
    0.11870523 = sum of:
      0.11870523 = product of:
        0.5935261 = sum of:
          0.036116093 = weight(abstract_txt:searching in 4031) [ClassicSimilarity], result of:
            0.036116093 = score(doc=4031,freq=2.0), product of:
              0.076701775 = queryWeight, product of:
                1.0421534 = boost
                4.261773 = idf(docFreq=1607, maxDocs=41962)
                0.017269647 = queryNorm
              0.47086385 = fieldWeight in 4031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.261773 = idf(docFreq=1607, maxDocs=41962)
                0.078125 = fieldNorm(doc=4031)
          0.03732769 = weight(abstract_txt:common in 4031) [ClassicSimilarity], result of:
            0.03732769 = score(doc=4031,freq=1.0), product of:
              0.09878757 = queryWeight, product of:
                1.1827149 = boost
                4.8365846 = idf(docFreq=904, maxDocs=41962)
                0.017269647 = queryNorm
              0.37785816 = fieldWeight in 4031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8365846 = idf(docFreq=904, maxDocs=41962)
                0.078125 = fieldNorm(doc=4031)
          0.21087924 = weight(abstract_txt:false in 4031) [ClassicSimilarity], result of:
            0.21087924 = score(doc=4031,freq=2.0), product of:
              0.2487105 = queryWeight, product of:
                1.8766184 = boost
                7.6742277 = idf(docFreq=52, maxDocs=41962)
                0.017269647 = queryNorm
              0.8478904 = fieldWeight in 4031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.6742277 = idf(docFreq=52, maxDocs=41962)
                0.078125 = fieldNorm(doc=4031)
          0.06225209 = weight(abstract_txt:text in 4031) [ClassicSimilarity], result of:
            0.06225209 = score(doc=4031,freq=2.0), product of:
              0.13892621 = queryWeight, product of:
                1.9835174 = boost
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.017269647 = queryNorm
              0.44809464 = fieldWeight in 4031, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.078125 = fieldNorm(doc=4031)
          0.24695098 = weight(abstract_txt:drops in 4031) [ClassicSimilarity], result of:
            0.24695098 = score(doc=4031,freq=1.0), product of:
              0.3481412 = queryWeight, product of:
                2.2202742 = boost
                9.079571 = idf(docFreq=12, maxDocs=41962)
                0.017269647 = queryNorm
              0.70934147 = fieldWeight in 4031, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.079571 = idf(docFreq=12, maxDocs=41962)
                0.078125 = fieldNorm(doc=4031)
        0.2 = coord(5/25)
    
  4. McBride, J.L.: Faceted subject access for music through USMARC : a case for linked fields (2000) 0.11
    0.10916499 = sum of:
      0.10916499 = product of:
        0.54582494 = sum of:
          0.02325522 = weight(abstract_txt:number in 404) [ClassicSimilarity], result of:
            0.02325522 = score(doc=404,freq=1.0), product of:
              0.07206015 = queryWeight, product of:
                1.0101283 = boost
                4.1308103 = idf(docFreq=1832, maxDocs=41962)
                0.017269647 = queryNorm
              0.32271954 = fieldWeight in 404, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1308103 = idf(docFreq=1832, maxDocs=41962)
                0.078125 = fieldNorm(doc=404)
          0.044847943 = weight(abstract_txt:result in 404) [ClassicSimilarity], result of:
            0.044847943 = score(doc=404,freq=1.0), product of:
              0.11164602 = queryWeight, product of:
                1.2573336 = boost
                5.14173 = idf(docFreq=666, maxDocs=41962)
                0.017269647 = queryNorm
              0.40169764 = fieldWeight in 404, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.14173 = idf(docFreq=666, maxDocs=41962)
                0.078125 = fieldNorm(doc=404)
          0.08165666 = weight(abstract_txt:containing in 404) [ClassicSimilarity], result of:
            0.08165666 = score(doc=404,freq=1.0), product of:
              0.16647255 = queryWeight, product of:
                1.5353247 = boost
                6.278544 = idf(docFreq=213, maxDocs=41962)
                0.017269647 = queryNorm
              0.49051124 = fieldWeight in 404, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.278544 = idf(docFreq=213, maxDocs=41962)
                0.078125 = fieldNorm(doc=404)
          0.14911415 = weight(abstract_txt:false in 404) [ClassicSimilarity], result of:
            0.14911415 = score(doc=404,freq=1.0), product of:
              0.2487105 = queryWeight, product of:
                1.8766184 = boost
                7.6742277 = idf(docFreq=52, maxDocs=41962)
                0.017269647 = queryNorm
              0.59954906 = fieldWeight in 404, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6742277 = idf(docFreq=52, maxDocs=41962)
                0.078125 = fieldNorm(doc=404)
          0.24695098 = weight(abstract_txt:drops in 404) [ClassicSimilarity], result of:
            0.24695098 = score(doc=404,freq=1.0), product of:
              0.3481412 = queryWeight, product of:
                2.2202742 = boost
                9.079571 = idf(docFreq=12, maxDocs=41962)
                0.017269647 = queryNorm
              0.70934147 = fieldWeight in 404, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.079571 = idf(docFreq=12, maxDocs=41962)
                0.078125 = fieldNorm(doc=404)
        0.2 = coord(5/25)
    
  5. Ahlgren, P.; Kekäläinen, J.: Indexing strategies for Swedish full text retrieval under different user scenarios (2007) 0.10
    0.10120829 = sum of:
      0.10120829 = product of:
        0.36145818 = sum of:
          0.028804623 = weight(abstract_txt:database in 2897) [ClassicSimilarity], result of:
            0.028804623 = score(doc=2897,freq=2.0), product of:
              0.07654551 = queryWeight, product of:
                1.0410912 = boost
                4.2574296 = idf(docFreq=1614, maxDocs=41962)
                0.017269647 = queryNorm
              0.37630716 = fieldWeight in 2897, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2574296 = idf(docFreq=1614, maxDocs=41962)
                0.0625 = fieldNorm(doc=2897)
          0.025339037 = weight(abstract_txt:made in 2897) [ClassicSimilarity], result of:
            0.025339037 = score(doc=2897,freq=1.0), product of:
              0.08854179 = queryWeight, product of:
                1.1197035 = boost
                4.5789065 = idf(docFreq=1170, maxDocs=41962)
                0.017269647 = queryNorm
              0.28618166 = fieldWeight in 2897, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5789065 = idf(docFreq=1170, maxDocs=41962)
                0.0625 = fieldNorm(doc=2897)
          0.10433364 = weight(abstract_txt:analyzer in 2897) [ClassicSimilarity], result of:
            0.10433364 = score(doc=2897,freq=1.0), product of:
              0.18053488 = queryWeight, product of:
                1.1305623 = boost
                9.246624 = idf(docFreq=10, maxDocs=41962)
                0.017269647 = queryNorm
              0.577914 = fieldWeight in 2897, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.246624 = idf(docFreq=10, maxDocs=41962)
                0.0625 = fieldNorm(doc=2897)
          0.04699871 = weight(abstract_txt:full in 2897) [ClassicSimilarity], result of:
            0.04699871 = score(doc=2897,freq=1.0), product of:
              0.15300629 = queryWeight, product of:
                1.8027238 = boost
                4.9146957 = idf(docFreq=836, maxDocs=41962)
                0.017269647 = queryNorm
              0.30716848 = fieldWeight in 2897, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9146957 = idf(docFreq=836, maxDocs=41962)
                0.0625 = fieldNorm(doc=2897)
          0.05392802 = weight(abstract_txt:were in 2897) [ClassicSimilarity], result of:
            0.05392802 = score(doc=2897,freq=4.0), product of:
              0.116275415 = queryWeight, product of:
                1.8146291 = boost
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.017269647 = queryNorm
              0.46379557 = fieldWeight in 2897, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.7103646 = idf(docFreq=2790, maxDocs=41962)
                0.0625 = fieldNorm(doc=2897)
          0.035215102 = weight(abstract_txt:text in 2897) [ClassicSimilarity], result of:
            0.035215102 = score(doc=2897,freq=1.0), product of:
              0.13892621 = queryWeight, product of:
                1.9835174 = boost
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.017269647 = queryNorm
              0.2534806 = fieldWeight in 2897, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.0625 = fieldNorm(doc=2897)
          0.066839054 = weight(abstract_txt:problem in 2897) [ClassicSimilarity], result of:
            0.066839054 = score(doc=2897,freq=2.0), product of:
              0.16903439 = queryWeight, product of:
                2.18792 = boost
                4.4736314 = idf(docFreq=1300, maxDocs=41962)
                0.017269647 = queryNorm
              0.3954169 = fieldWeight in 2897, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4736314 = idf(docFreq=1300, maxDocs=41962)
                0.0625 = fieldNorm(doc=2897)
        0.28 = coord(7/25)
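
As a cross-check of the explanations, the sketch below recomputes the totals for the first two similar documents (doc 3930 and doc 177) from the printed docFreq, boost, queryNorm and fieldNorm values. It assumes the ClassicSimilarity formulas implied by the trees: idf = 1 + ln(maxDocs/(docFreq+1)), tf = sqrt(freq), weight = queryWeight × fieldWeight, and document score = coord × sum of weights. The helper names `idf`, `tf`, `term_weight` and `doc_score` are hypothetical and are not Lucene's API. fieldNorm is taken as given rather than derived from field length.

```python
import math

# Constants copied from the explanations above; they describe the index and
# query that produced this page, not Lucene defaults.
MAX_DOCS = 41962
QUERY_NORM = 0.017269647
QUERY_TERM_COUNT = 25  # denominator of every coord(m/25) line

# term -> (query boost, docFreq), as printed in the explanations.
QUERY_TERMS = {
    "number": (1.0101283, 1832),
    "there": (1.0117345, 1820),
    "examined": (1.2811909, 604),
    "full": (1.8027238, 836),
    "were": (1.8146291, 2790),
    "false": (1.8766184, 52),
    "text": (1.9835174, 1975),
    "drops": (2.2202742, 12),
}


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """tf = sqrt(freq), e.g. about 1.7320508 for freq=3.0."""
    return math.sqrt(freq)


def term_weight(term: str, freq: float, field_norm: float) -> float:
    """weight(t in d) = queryWeight * fieldWeight."""
    boost, doc_freq = QUERY_TERMS[term]
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


def doc_score(term_freqs: dict, field_norm: float) -> float:
    """Sum the weights of the matched terms, then multiply by coord(m/25)."""
    total = sum(term_weight(t, f, field_norm) for t, f in term_freqs.items())
    coord = len(term_freqs) / QUERY_TERM_COUNT
    return total * coord


# Matched terms and their in-field frequencies for doc 3930 and doc 177.
doc_3930 = doc_score({"number": 1, "full": 1, "false": 1, "text": 1, "drops": 1}, 0.125)
doc_177 = doc_score(
    {"number": 1, "there": 1, "examined": 1, "full": 1, "were": 3, "false": 1, "drops": 1},
    0.0546875,
)
print(f"doc 3930: {doc_3930:.8f}")
print(f"doc 177:  {doc_177:.8f}")
```

Run as a script, it prints both totals; they should agree with the printed 0.16706805 and 0.119023636 up to rounding, since the printed values appear to be 32-bit floats while the sketch computes in double precision. Adding the remaining terms (searching, common, result, containing, database, made, analyzer, problem) to `QUERY_TERMS` would allow the other three entries to be checked the same way.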