Document (#7228)

Taghva, K.
¬The effects of noisy data on text retrieval
Journal of the American Society for Information Science. 45(1994) no.1, pp.50-58
Reports the results of experiments on query evaluation in the presence of noisy data. In particular, an OCR-generated database and its corresponding 99.8 % correct version are used to process a set of queries to determine the effect the degraded version will have on retrieval. With the set of scientific documents used in the testing, the effect is insignificant. Improves the result by applying an automatic postprocessing system designed to correct the kinds of errors generated by recognition devices.

Similar documents (content)

  1. Taghva, K.; Borsack, J.; Condit, A.: Effects of OCR errors on ranking and feedback using the vector space model (1996) 0.23
    0.22723691 = sum of:
      0.22723691 = product of:
        0.8115604 = sum of:
          0.08780046 = weight(abstract_txt:recognition in 4951) [ClassicSimilarity], result of:
            0.08780046 = score(doc=4951,freq=1.0), product of:
              0.13114771 = queryWeight, product of:
                1.198636 = boost
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.017875385 = queryNorm
              0.66947764 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1209383 = idf(docFreq=263, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
          0.10272928 = weight(abstract_txt:presence in 4951) [ClassicSimilarity], result of:
            0.10272928 = score(doc=4951,freq=1.0), product of:
              0.14562157 = queryWeight, product of:
                1.2630479 = boost
                6.449863 = idf(docFreq=189, maxDocs=44218)
                0.017875385 = queryNorm
              0.70545375 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.449863 = idf(docFreq=189, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
          0.1521111 = weight(abstract_txt:errors in 4951) [ClassicSimilarity], result of:
            0.1521111 = score(doc=4951,freq=2.0), product of:
              0.15015051 = queryWeight, product of:
                1.2825384 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.017875385 = queryNorm
              1.0130575 = fieldWeight in 4951, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
          0.02902812 = weight(abstract_txt:used in 4951) [ClassicSimilarity], result of:
            0.02902812 = score(doc=4951,freq=1.0), product of:
              0.079004556 = queryWeight, product of:
                1.3156732 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.017875385 = queryNorm
              0.36742336 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
          0.1161931 = weight(abstract_txt:improves in 4951) [ClassicSimilarity], result of:
            0.1161931 = score(doc=4951,freq=1.0), product of:
              0.15808225 = queryWeight, product of:
                1.3159777 = boost
                6.7201533 = idf(docFreq=144, maxDocs=44218)
                0.017875385 = queryNorm
              0.73501676 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7201533 = idf(docFreq=144, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
          0.032135826 = weight(abstract_txt:retrieval in 4951) [ClassicSimilarity], result of:
            0.032135826 = score(doc=4951,freq=1.0), product of:
              0.08454719 = queryWeight, product of:
                1.3610421 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.017875385 = queryNorm
              0.38009337 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
          0.2915625 = weight(abstract_txt:degraded in 4951) [ClassicSimilarity], result of:
            0.2915625 = score(doc=4951,freq=1.0), product of:
              0.29191113 = queryWeight, product of:
                1.7882668 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.017875385 = queryNorm
              0.9988057 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
        0.28 = coord(7/25)
  2. Li, D.; Tang, J.; Ding, Y.; Shuai, X.; Chambers, T.; Sun, G.; Luo, Z.; Zhang, J.: Topic-level opinion influence model (TOIM) : an investigation using tencent microblogging (2015) 0.11
    0.10735042 = sum of:
      0.10735042 = product of:
        0.4472934 = sum of:
          0.0331351 = weight(abstract_txt:experiments in 2345) [ClassicSimilarity], result of:
            0.0331351 = score(doc=2345,freq=1.0), product of:
              0.09945969 = queryWeight, product of:
                1.0438318 = boost
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.017875385 = queryNorm
              0.33315104 = fieldWeight in 2345, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.0625 = fieldNorm(doc=2345)
          0.016249826 = weight(abstract_txt:data in 2345) [ClassicSimilarity], result of:
            0.016249826 = score(doc=2345,freq=1.0), product of:
              0.07792869 = queryWeight, product of:
                1.3066843 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.017875385 = queryNorm
              0.20852174 = fieldWeight in 2345, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=2345)
          0.016587496 = weight(abstract_txt:used in 2345) [ClassicSimilarity], result of:
            0.016587496 = score(doc=2345,freq=1.0), product of:
              0.079004556 = queryWeight, product of:
                1.3156732 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.017875385 = queryNorm
              0.2099562 = fieldWeight in 2345, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0625 = fieldNorm(doc=2345)
          0.06639606 = weight(abstract_txt:improves in 2345) [ClassicSimilarity], result of:
            0.06639606 = score(doc=2345,freq=1.0), product of:
              0.15808225 = queryWeight, product of:
                1.3159777 = boost
                6.7201533 = idf(docFreq=144, maxDocs=44218)
                0.017875385 = queryNorm
              0.42000958 = fieldWeight in 2345, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7201533 = idf(docFreq=144, maxDocs=44218)
                0.0625 = fieldNorm(doc=2345)
          0.07363643 = weight(abstract_txt:generated in 2345) [ClassicSimilarity], result of:
            0.07363643 = score(doc=2345,freq=1.0), product of:
              0.21339948 = queryWeight, product of:
                2.1623135 = boost
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.017875385 = queryNorm
              0.34506375 = fieldWeight in 2345, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.0625 = fieldNorm(doc=2345)
          0.24128848 = weight(abstract_txt:noisy in 2345) [ClassicSimilarity], result of:
            0.24128848 = score(doc=2345,freq=1.0), product of:
              0.47078502 = queryWeight, product of:
                3.2116876 = boost
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.017875385 = queryNorm
              0.5125237 = fieldWeight in 2345, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.0625 = fieldNorm(doc=2345)
        0.24 = coord(6/25)
  3. Beall, J.; Kafadar, K.: Measuring typographical errors' impact on retrieval in bibliographic databases (2007) 0.11
    0.105169706 = sum of:
      0.105169706 = product of:
        0.5258485 = sum of:
          0.07337806 = weight(abstract_txt:presence in 261) [ClassicSimilarity], result of:
            0.07337806 = score(doc=261,freq=1.0), product of:
              0.14562157 = queryWeight, product of:
                1.2630479 = boost
                6.449863 = idf(docFreq=189, maxDocs=44218)
                0.017875385 = queryNorm
              0.5038955 = fieldWeight in 261, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.449863 = idf(docFreq=189, maxDocs=44218)
                0.078125 = fieldNorm(doc=261)
          0.108650774 = weight(abstract_txt:errors in 261) [ClassicSimilarity], result of:
            0.108650774 = score(doc=261,freq=2.0), product of:
              0.15015051 = queryWeight, product of:
                1.2825384 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.017875385 = queryNorm
              0.7236124 = fieldWeight in 261, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.078125 = fieldNorm(doc=261)
          0.02295416 = weight(abstract_txt:retrieval in 261) [ClassicSimilarity], result of:
            0.02295416 = score(doc=261,freq=1.0), product of:
              0.08454719 = queryWeight, product of:
                1.3610421 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.017875385 = queryNorm
              0.27149525 = fieldWeight in 261, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.078125 = fieldNorm(doc=261)
          0.08755261 = weight(abstract_txt:effect in 261) [ClassicSimilarity], result of:
            0.08755261 = score(doc=261,freq=1.0), product of:
              0.20639743 = queryWeight, product of:
                2.1265428 = boost
                5.4296865 = idf(docFreq=526, maxDocs=44218)
                0.017875385 = queryNorm
              0.42419428 = fieldWeight in 261, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4296865 = idf(docFreq=526, maxDocs=44218)
                0.078125 = fieldNorm(doc=261)
          0.23331288 = weight(abstract_txt:correct in 261) [ClassicSimilarity], result of:
            0.23331288 = score(doc=261,freq=2.0), product of:
              0.31487682 = queryWeight, product of:
                2.6265903 = boost
                6.7064548 = idf(docFreq=146, maxDocs=44218)
                0.017875385 = queryNorm
              0.74096555 = fieldWeight in 261, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7064548 = idf(docFreq=146, maxDocs=44218)
                0.078125 = fieldNorm(doc=261)
        0.2 = coord(5/25)
  4. Taghva, K.; Borsack, J.; Condit, A.: Evaluation of model-based retrieval effectiveness with OCR text (1996) 0.10
    0.10171864 = sum of:
      0.10171864 = product of:
        0.5085932 = sum of:
          0.07029016 = weight(abstract_txt:experiments in 4485) [ClassicSimilarity], result of:
            0.07029016 = score(doc=4485,freq=2.0), product of:
              0.09945969 = queryWeight, product of:
                1.0438318 = boost
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.017875385 = queryNorm
              0.70672005 = fieldWeight in 4485, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3304167 = idf(docFreq=581, maxDocs=44218)
                0.09375 = fieldNorm(doc=4485)
          0.06732324 = weight(abstract_txt:applying in 4485) [ClassicSimilarity], result of:
            0.06732324 = score(doc=4485,freq=1.0), product of:
              0.12175984 = queryWeight, product of:
                1.1549389 = boost
                5.8977947 = idf(docFreq=329, maxDocs=44218)
                0.017875385 = queryNorm
              0.55291826 = fieldWeight in 4485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8977947 = idf(docFreq=329, maxDocs=44218)
                0.09375 = fieldNorm(doc=4485)
          0.15968339 = weight(abstract_txt:errors in 4485) [ClassicSimilarity], result of:
            0.15968339 = score(doc=4485,freq=3.0), product of:
              0.15015051 = queryWeight, product of:
                1.2825384 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.017875385 = queryNorm
              1.0634888 = fieldWeight in 4485, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.09375 = fieldNorm(doc=4485)
          0.055089988 = weight(abstract_txt:retrieval in 4485) [ClassicSimilarity], result of:
            0.055089988 = score(doc=4485,freq=4.0), product of:
              0.08454719 = queryWeight, product of:
                1.3610421 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.017875385 = queryNorm
              0.6515886 = fieldWeight in 4485, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.09375 = fieldNorm(doc=4485)
          0.15620644 = weight(abstract_txt:generated in 4485) [ClassicSimilarity], result of:
            0.15620644 = score(doc=4485,freq=2.0), product of:
              0.21339948 = queryWeight, product of:
                2.1623135 = boost
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.017875385 = queryNorm
              0.7319907 = fieldWeight in 4485, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.09375 = fieldNorm(doc=4485)
        0.2 = coord(5/25)
  5. Tüür-Fröhlich, T.: ¬The non-trivial effects of trivial errors in scientific communication and evaluation (2016) 0.10
    0.10071975 = sum of:
      0.10071975 = product of:
        0.41966563 = sum of:
          0.1422869 = weight(abstract_txt:errors in 3137) [ClassicSimilarity], result of:
            0.1422869 = score(doc=3137,freq=7.0), product of:
              0.15015051 = queryWeight, product of:
                1.2825384 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.017875385 = queryNorm
              0.9476285 = fieldWeight in 3137, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3137)
          0.020108135 = weight(abstract_txt:data in 3137) [ClassicSimilarity], result of:
            0.020108135 = score(doc=3137,freq=2.0), product of:
              0.07792869 = queryWeight, product of:
                1.3066843 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.017875385 = queryNorm
              0.2580325 = fieldWeight in 3137, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3137)
          0.016067913 = weight(abstract_txt:retrieval in 3137) [ClassicSimilarity], result of:
            0.016067913 = score(doc=3137,freq=1.0), product of:
              0.08454719 = queryWeight, product of:
                1.3610421 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.017875385 = queryNorm
              0.19004668 = fieldWeight in 3137, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3137)
          0.061286822 = weight(abstract_txt:effect in 3137) [ClassicSimilarity], result of:
            0.061286822 = score(doc=3137,freq=1.0), product of:
              0.20639743 = queryWeight, product of:
                2.1265428 = boost
                5.4296865 = idf(docFreq=526, maxDocs=44218)
                0.017875385 = queryNorm
              0.29693598 = fieldWeight in 3137, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4296865 = idf(docFreq=526, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3137)
          0.064431876 = weight(abstract_txt:generated in 3137) [ClassicSimilarity], result of:
            0.064431876 = score(doc=3137,freq=1.0), product of:
              0.21339948 = queryWeight, product of:
                2.1623135 = boost
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.017875385 = queryNorm
              0.3019308 = fieldWeight in 3137, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3137)
          0.115483984 = weight(abstract_txt:correct in 3137) [ClassicSimilarity], result of:
            0.115483984 = score(doc=3137,freq=1.0), product of:
              0.31487682 = queryWeight, product of:
                2.6265903 = boost
                6.7064548 = idf(docFreq=146, maxDocs=44218)
                0.017875385 = queryNorm
              0.36675924 = fieldWeight in 3137, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7064548 = idf(docFreq=146, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3137)
        0.24 = coord(6/25)
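
How the scores above are composed: each score tree follows the same pattern. A term score is queryWeight (boost * idf * queryNorm) times fieldWeight (tf * idf * fieldNorm), the term scores are summed, and the sum is multiplied by coord (matched terms / query terms). The listed numbers are consistent with tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The following is a minimal sketch that recomputes the score of entry 3 (doc=261) from the values printed above. The function names are hypothetical and chosen for illustration only; queryNorm and the boosts are copied from the listing rather than derived, and the code is not the search engine's own implementation.

```python
import math

# Values copied from the listing above (assumed constant for this query).
MAX_DOCS = 44218
QUERY_NORM = 0.017875385


def classic_idf(doc_freq, max_docs):
    """idf as it appears in the listing: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(freq, doc_freq, boost, field_norm,
               max_docs=MAX_DOCS, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm          # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


def document_score(terms, matched, total, field_norm):
    """coord(matched/total) times the sum of the term scores."""
    coord = matched / total
    return coord * sum(
        term_score(freq, doc_freq, boost, field_norm)
        for freq, doc_freq, boost in terms
    )


# Entry 3 (doc=261): (termFreq, docFreq, boost) for each matching term.
terms_261 = [
    (1.0, 189, 1.2630479),   # presence
    (2.0, 171, 1.2825384),   # errors
    (1.0, 3720, 1.3610421),  # retrieval
    (1.0, 526, 2.1265428),   # effect
    (2.0, 146, 2.6265903),   # correct
]

score_261 = document_score(terms_261, matched=5, total=25, field_norm=0.078125)
print(round(score_261, 4))
```

The result should agree with the listed 0.105169706 up to small rounding differences, since the listing appears to use single-precision arithmetic while this sketch uses Python's double-precision floats.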