Document (#13141)

Author
Pearce, C.
Nicholas, C.
Title
TELLTALE: Experiments in a dynamic hypertext environment for degraded and multilingual data
Source
Journal of the American Society for Information Science. 47(1996) no.4, pp.263-275
Year
1996
Abstract
Methods and tools for finding documents relevant to a user's needs in document corpora can be found in the information retrieval, library science, and hypertext communities. Typically, these systems provide retrieval capabilities for fairly static corpora, their algorithms are dependent on the language for which they are written, e.g. English, and they do not perform well when presented with misspelled words or text that has been degraded by OCR techniques. In this article, we present experimental results for the TELLTALE system. TELLTALE is a dynamic hypertext environment that provides full-text search from a hypertext-style user interface for text corpora that may be garbled by OCR or transmission errors, and that may contain languages other than English. TELLTALE uses several techniques based on n-grams (n-character sequences of text). With these results we show that the dynamic linkage mechanisms in TELLTALE are tolerant of garbles in up to 30% of the characters in the body of the texts.
Theme
Full-text retrieval
Multilingual problems
Object
TELLTALE
OCR
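
The abstract describes n-gram techniques only in general terms. As a rough illustration of why character n-grams tolerate garbled text, the sketch below builds n-gram count profiles and compares them with cosine similarity. It is not TELLTALE's actual method: the n-gram length of 5, the cosine measure, the whitespace and case normalization, and the helper names `ngram_profile`, `cosine_similarity`, and `garble` are all assumptions made for this example.

```python
from collections import Counter
from math import sqrt
import random


def ngram_profile(text: str, n: int = 5) -> Counter:
    """Count overlapping character n-grams (n=5 is an assumed default)."""
    # Assumed normalization: lowercase and collapse runs of whitespace.
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def garble(text: str, rate: float, seed: int = 0) -> str:
    """Hypothetical noise model: replace each character with a random
    lowercase letter with probability `rate`, to mimic OCR damage."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    return "".join(
        rng.choice(letters) if rng.random() < rate else ch for ch in text
    )
```

For example, `cosine_similarity(ngram_profile(doc), ngram_profile(garble(doc, 0.1)))` compares a text with a copy in which roughly 10% of the characters are damaged. Because a single corrupted character affects only the few n-grams that overlap it, most of the profile survives. A word-based index, by contrast, loses every word that contains an error.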

Similar documents (author)

  1. Pearce, T.: Draft IIS guidelines for professional ethics for information professionals (1998) 2.55
    2.553584 = sum of:
      2.553584 = product of:
        5.107168 = sum of:
          5.107168 = weight(author_txt:pearce in 2443) [ClassicSimilarity], result of:
            5.107168 = score(doc=2443,freq=1.0), product of:
              0.8380154 = queryWeight, product of:
                1.2392823 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.06934795 = queryNorm
              6.094361 = fieldWeight in 2443, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.625 = fieldNorm(doc=2443)
        0.5 = coord(1/2)
    
  2. Pearce, H.J.: Thesaurus of disability index terms (1996) 2.55
    2.553584 = sum of:
      2.553584 = product of:
        5.107168 = sum of:
          5.107168 = weight(author_txt:pearce in 3634) [ClassicSimilarity], result of:
            5.107168 = score(doc=3634,freq=1.0), product of:
              0.8380154 = queryWeight, product of:
                1.2392823 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.06934795 = queryNorm
              6.094361 = fieldWeight in 3634, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.625 = fieldNorm(doc=3634)
        0.5 = coord(1/2)
    
  3. Calvert, P.J.; Pearce, B.: Expanding the Fiji numbers in the Dewey Decimal Classification (1979) 2.04
    2.0428672 = sum of:
      2.0428672 = product of:
        4.0857344 = sum of:
          4.0857344 = weight(author_txt:pearce in 2547) [ClassicSimilarity], result of:
            4.0857344 = score(doc=2547,freq=1.0), product of:
              0.8380154 = queryWeight, product of:
                1.2392823 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.06934795 = queryNorm
              4.8754888 = fieldWeight in 2547, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.5 = fieldNorm(doc=2547)
        0.5 = coord(1/2)
    
  4. Weibel, S.; Pearce, J.: The changing landscape of networked resource description (1996) 2.04
    2.0428672 = sum of:
      2.0428672 = product of:
        4.0857344 = sum of:
          4.0857344 = weight(author_txt:pearce in 5533) [ClassicSimilarity], result of:
            4.0857344 = score(doc=5533,freq=1.0), product of:
              0.8380154 = queryWeight, product of:
                1.2392823 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.06934795 = queryNorm
              4.8754888 = fieldWeight in 5533, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.5 = fieldNorm(doc=5533)
        0.5 = coord(1/2)
    
  5. Moloney, J.; Pearce, F.: Workshop 3 : Cataloguing standards - can we afford them? (1997) 2.04
    2.0428672 = sum of:
      2.0428672 = product of:
        4.0857344 = sum of:
          4.0857344 = weight(author_txt:pearce in 2837) [ClassicSimilarity], result of:
            4.0857344 = score(doc=2837,freq=1.0), product of:
              0.8380154 = queryWeight, product of:
                1.2392823 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.06934795 = queryNorm
              4.8754888 = fieldWeight in 2837, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.5 = fieldNorm(doc=2837)
        0.5 = coord(1/2)
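
The score breakdowns in these lists have the shape of Lucene `ClassicSimilarity` (TF-IDF) explanations. The sketch below recomputes one of them from its printed inputs. The formulas `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))` are inferred from the numbers shown (they reproduce the listed `tf` and `idf` values). The function names are hypothetical, and small differences from the printed figures are expected because Lucene computes in single precision.

```python
from math import log, sqrt


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, consistent with the idf(...) lines."""
    return 1.0 + log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """Term-frequency factor, consistent with the tf(freq=...) lines."""
    return sqrt(freq)


def term_score(freq: float, doc_freq: int, max_docs: int,
               boost: float, query_norm: float, field_norm: float) -> float:
    """One weight(...) entry: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores: list[float], matched: int, total: int) -> float:
    """Sum of matching term scores scaled by coord(matched/total)."""
    return sum(term_scores) * (matched / total)


# Inputs copied from entry 1 of the author list (doc 2443).
pearce = term_score(freq=1.0, doc_freq=6, max_docs=44218,
                    boost=1.2392823, query_norm=0.06934795, field_norm=0.625)
score = document_score([pearce], matched=1, total=2)
```

With these inputs `score` should come out close to the 2.55 listed for entry 1. Entries 3 to 5 use the same query term but `fieldNorm` 0.5 instead of 0.625, which would account for their lower score of 2.04.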
    

Similar documents (content)

  1. Intelligent hypertext : Advanced techniques for the World Wide Web (1997) 0.19
    0.19087864 = sum of:
      0.19087864 = product of:
        0.9543932 = sum of:
          0.025704024 = weight(abstract_txt:retrieval in 975) [ClassicSimilarity], result of:
            0.025704024 = score(doc=975,freq=1.0), product of:
              0.06762555 = queryWeight, product of:
                1.0667948 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.018241381 = queryNorm
              0.38009337 = fieldWeight in 975, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.109375 = fieldNorm(doc=975)
          0.098604426 = weight(abstract_txt:techniques in 975) [ClassicSimilarity], result of:
            0.098604426 = score(doc=975,freq=3.0), product of:
              0.11490367 = queryWeight, product of:
                1.3905686 = boost
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.018241381 = queryNorm
              0.8581486 = fieldWeight in 975, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.109375 = fieldNorm(doc=975)
          0.061008323 = weight(abstract_txt:environment in 975) [ClassicSimilarity], result of:
            0.061008323 = score(doc=975,freq=1.0), product of:
              0.12032877 = queryWeight, product of:
                1.4230174 = boost
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.018241381 = queryNorm
              0.5070136 = fieldWeight in 975, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.635553 = idf(docFreq=1165, maxDocs=44218)
                0.109375 = fieldNorm(doc=975)
          0.17647953 = weight(abstract_txt:dynamic in 975) [ClassicSimilarity], result of:
            0.17647953 = score(doc=975,freq=1.0), product of:
              0.27964264 = queryWeight, product of:
                2.6568851 = boost
                5.7699614 = idf(docFreq=374, maxDocs=44218)
                0.018241381 = queryNorm
              0.6310895 = fieldWeight in 975, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7699614 = idf(docFreq=374, maxDocs=44218)
                0.109375 = fieldNorm(doc=975)
          0.5925969 = weight(abstract_txt:hypertext in 975) [ClassicSimilarity], result of:
            0.5925969 = score(doc=975,freq=7.0), product of:
              0.36079484 = queryWeight, product of:
                3.484742 = boost
                5.6758637 = idf(docFreq=411, maxDocs=44218)
                0.018241381 = queryNorm
              1.6424761 = fieldWeight in 975, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.6758637 = idf(docFreq=411, maxDocs=44218)
                0.109375 = fieldNorm(doc=975)
        0.2 = coord(5/25)
    
  2. Taghva, K.; Borsack, J.; Condit, A.: Effects of OCR errors on ranking and feedback using the vector space model (1996) 0.16
    0.15589558 = sum of:
      0.15589558 = product of:
        0.77947783 = sum of:
          0.08468734 = weight(abstract_txt:character in 4951) [ClassicSimilarity], result of:
            0.08468734 = score(doc=4951,freq=1.0), product of:
              0.118844494 = queryWeight, product of:
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.018241381 = queryNorm
              0.7125895 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
          0.121666916 = weight(abstract_txt:errors in 4951) [ClassicSimilarity], result of:
            0.121666916 = score(doc=4951,freq=2.0), product of:
              0.12009873 = queryWeight, product of:
                1.005263 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.018241381 = queryNorm
              1.0130575 = fieldWeight in 4951, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
          0.025704024 = weight(abstract_txt:retrieval in 4951) [ClassicSimilarity], result of:
            0.025704024 = score(doc=4951,freq=1.0), product of:
              0.06762555 = queryWeight, product of:
                1.0667948 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.018241381 = queryNorm
              0.38009337 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
          0.08100371 = weight(abstract_txt:text in 4951) [ClassicSimilarity], result of:
            0.08100371 = score(doc=4951,freq=1.0), product of:
              0.18314287 = queryWeight, product of:
                2.4827642 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.018241381 = queryNorm
              0.4422979 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
          0.46641582 = weight(abstract_txt:degraded in 4951) [ClassicSimilarity], result of:
            0.46641582 = score(doc=4951,freq=1.0), product of:
              0.4669735 = queryWeight, product of:
                2.803313 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.018241381 = queryNorm
              0.9988057 = fieldWeight in 4951, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.109375 = fieldNorm(doc=4951)
        0.2 = coord(5/25)
    
  3. Khoo, C.S.G.; Dai, D.; Loh, T.E.: Using statistical and contextual information to identify two- and three-character words in Chinese text (2002) 0.15
    0.14993148 = sum of:
      0.14993148 = product of:
        0.53546953 = sum of:
          0.11853758 = weight(abstract_txt:character in 5206) [ClassicSimilarity], result of:
            0.11853758 = score(doc=5206,freq=6.0), product of:
              0.118844494 = queryWeight, product of:
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.018241381 = queryNorm
              0.9974175 = fieldWeight in 5206, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.020902853 = weight(abstract_txt:results in 5206) [ClassicSimilarity], result of:
            0.020902853 = score(doc=5206,freq=2.0), product of:
              0.067909285 = queryWeight, product of:
                1.0690304 = boost
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.018241381 = queryNorm
              0.30780554 = fieldWeight in 5206, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.482422 = idf(docFreq=3693, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.121797964 = weight(abstract_txt:characters in 5206) [ClassicSimilarity], result of:
            0.121797964 = score(doc=5206,freq=3.0), product of:
              0.15246789 = queryWeight, product of:
                1.1326603 = boost
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.018241381 = queryNorm
              0.7988434 = fieldWeight in 5206, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.018486258 = weight(abstract_txt:they in 5206) [ClassicSimilarity], result of:
            0.018486258 = score(doc=5206,freq=1.0), product of:
              0.0788318 = queryWeight, product of:
                1.1517977 = boost
                3.7520406 = idf(docFreq=2820, maxDocs=44218)
                0.018241381 = queryNorm
              0.23450254 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.7520406 = idf(docFreq=2820, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.19299597 = weight(abstract_txt:grams in 5206) [ClassicSimilarity], result of:
            0.19299597 = score(doc=5206,freq=4.0), product of:
              0.18828003 = queryWeight, product of:
                1.2586721 = boost
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.018241381 = queryNorm
              1.0250474 = fieldWeight in 5206, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.016461063 = weight(abstract_txt:that in 5206) [ClassicSimilarity], result of:
            0.016461063 = score(doc=5206,freq=2.0), product of:
              0.07859786 = queryWeight, product of:
                1.8184477 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.018241381 = queryNorm
              0.20943399 = fieldWeight in 5206, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
          0.046287835 = weight(abstract_txt:text in 5206) [ClassicSimilarity], result of:
            0.046287835 = score(doc=5206,freq=1.0), product of:
              0.18314287 = queryWeight, product of:
                2.4827642 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.018241381 = queryNorm
              0.25274166 = fieldWeight in 5206, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=5206)
        0.28 = coord(7/25)
    
  4. Robertson, A.M.; Willett, P.: Applications of n-grams in textual information systems (1998) 0.15
    0.1460501 = sum of:
      0.1460501 = product of:
        0.6085421 = sum of:
          0.086031504 = weight(abstract_txt:errors in 4715) [ClassicSimilarity], result of:
            0.086031504 = score(doc=4715,freq=1.0), product of:
              0.12009873 = queryWeight, product of:
                1.005263 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.018241381 = queryNorm
              0.7163398 = fieldWeight in 4715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.109375 = fieldNorm(doc=4715)
          0.025704024 = weight(abstract_txt:retrieval in 4715) [ClassicSimilarity], result of:
            0.025704024 = score(doc=4715,freq=1.0), product of:
              0.06762555 = queryWeight, product of:
                1.0667948 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.018241381 = queryNorm
              0.38009337 = fieldWeight in 4715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.109375 = fieldNorm(doc=4715)
          0.12306016 = weight(abstract_txt:characters in 4715) [ClassicSimilarity], result of:
            0.12306016 = score(doc=4715,freq=1.0), product of:
              0.15246789 = queryWeight, product of:
                1.1326603 = boost
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.018241381 = queryNorm
              0.8071218 = fieldWeight in 4715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.109375 = fieldNorm(doc=4715)
          0.23882031 = weight(abstract_txt:grams in 4715) [ClassicSimilarity], result of:
            0.23882031 = score(doc=4715,freq=2.0), product of:
              0.18828003 = queryWeight, product of:
                1.2586721 = boost
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.018241381 = queryNorm
              1.2684314 = fieldWeight in 4715, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.109375 = fieldNorm(doc=4715)
          0.020369528 = weight(abstract_txt:that in 4715) [ClassicSimilarity], result of:
            0.020369528 = score(doc=4715,freq=1.0), product of:
              0.07859786 = queryWeight, product of:
                1.8184477 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.018241381 = queryNorm
              0.25916135 = fieldWeight in 4715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.109375 = fieldNorm(doc=4715)
          0.11455654 = weight(abstract_txt:text in 4715) [ClassicSimilarity], result of:
            0.11455654 = score(doc=4715,freq=2.0), product of:
              0.18314287 = queryWeight, product of:
                2.4827642 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.018241381 = queryNorm
              0.6255037 = fieldWeight in 4715, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.109375 = fieldNorm(doc=4715)
        0.24 = coord(6/25)
    
  5. Yang, C.C.; Li, K.W.: A heuristic method based on a statistical approach for Chinese text segmentation (2005) 0.14
    0.13735436 = sum of:
      0.13735436 = product of:
        0.49055126 = sum of:
          0.06952395 = weight(abstract_txt:errors in 4580) [ClassicSimilarity], result of:
            0.06952395 = score(doc=4580,freq=2.0), product of:
              0.12009873 = queryWeight, product of:
                1.005263 = boost
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.018241381 = queryNorm
              0.57888997 = fieldWeight in 4580, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5493927 = idf(docFreq=171, maxDocs=44218)
                0.0625 = fieldNorm(doc=4580)
          0.014688014 = weight(abstract_txt:retrieval in 4580) [ClassicSimilarity], result of:
            0.014688014 = score(doc=4580,freq=1.0), product of:
              0.06762555 = queryWeight, product of:
                1.0667948 = boost
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.018241381 = queryNorm
              0.21719621 = fieldWeight in 4580, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4751394 = idf(docFreq=3720, maxDocs=44218)
                0.0625 = fieldNorm(doc=4580)
          0.07032009 = weight(abstract_txt:characters in 4580) [ClassicSimilarity], result of:
            0.07032009 = score(doc=4580,freq=1.0), product of:
              0.15246789 = queryWeight, product of:
                1.1326603 = boost
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.018241381 = queryNorm
              0.46121246 = fieldWeight in 4580, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.0625 = fieldNorm(doc=4580)
          0.13646875 = weight(abstract_txt:grams in 4580) [ClassicSimilarity], result of:
            0.13646875 = score(doc=4580,freq=2.0), product of:
              0.18828003 = queryWeight, product of:
                1.2586721 = boost
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.018241381 = queryNorm
              0.724818 = fieldWeight in 4580, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.0625 = fieldNorm(doc=4580)
          0.06062328 = weight(abstract_txt:english in 4580) [ClassicSimilarity], result of:
            0.06062328 = score(doc=4580,freq=1.0), product of:
              0.174005 = queryWeight, product of:
                1.7112219 = boost
                5.574394 = idf(docFreq=455, maxDocs=44218)
                0.018241381 = queryNorm
              0.34839964 = fieldWeight in 4580, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.574394 = idf(docFreq=455, maxDocs=44218)
                0.0625 = fieldNorm(doc=4580)
          0.016461063 = weight(abstract_txt:that in 4580) [ClassicSimilarity], result of:
            0.016461063 = score(doc=4580,freq=2.0), product of:
              0.07859786 = queryWeight, product of:
                1.8184477 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.018241381 = queryNorm
              0.20943399 = fieldWeight in 4580, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=4580)
          0.122466095 = weight(abstract_txt:text in 4580) [ClassicSimilarity], result of:
            0.122466095 = score(doc=4580,freq=7.0), product of:
              0.18314287 = queryWeight, product of:
                2.4827642 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.018241381 = queryNorm
              0.6686916 = fieldWeight in 4580, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=4580)
        0.28 = coord(7/25)