Document (#27542)

Author
Doszkocs, T.E.
Zamora, A.
Title
Dictionary services and spelling aids for Web searching
Source
Online. 28(2004) no.3, pp.22-29
Year
2004
Abstract
The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon, and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes the development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aids, natural language processing, and information storage and retrieval. The design of the technology is modular, so complex applications can be built from the components developed in the earlier phases of the work without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have online or Web-based interfaces, the dictionaries and other computer components must respond quickly and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language System (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implement the retrieval of words and their attributes and determine similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
Theme
Computational linguistics
Field
Chemistry
Object
TOXNET

Similar documents (author)

  1. Doszkocs, T.E.: CITE NLM: Natural language searching in an online catalog (1983) 6.01
    6.010904 = sum of:
      6.010904 = weight(author_txt:doszkocs in 784) [ClassicSimilarity], result of:
        6.010904 = fieldWeight in 784, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.625 = fieldNorm(doc=784)
    
  2. Doszkocs, T.E.: Natural language processing in information retrieval (1986) 6.01
    6.010904 = sum of:
      6.010904 = weight(author_txt:doszkocs in 2696) [ClassicSimilarity], result of:
        6.010904 = fieldWeight in 2696, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.625 = fieldNorm(doc=2696)
    
  3. Doszkocs, T.E.: Simultaneous searching of distributed information and subject repositories on the World Wide Web (1998) 6.01
    6.010904 = sum of:
      6.010904 = weight(author_txt:doszkocs in 2334) [ClassicSimilarity], result of:
        6.010904 = fieldWeight in 2334, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.625 = fieldNorm(doc=2334)
    
  4. Doszkocs, T.E.: Virtual hypertext searching of online databases via the World Wide Web (1996) 6.01
    6.010904 = sum of:
      6.010904 = weight(author_txt:doszkocs in 2416) [ClassicSimilarity], result of:
        6.010904 = fieldWeight in 2416, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.625 = fieldNorm(doc=2416)
    
  5. Doszkocs, T.E.; Weinberg, B.H.: Natural language interfaces for information retrieval (1988) 4.81
    4.808723 = sum of:
      4.808723 = weight(author_txt:doszkocs in 2697) [ClassicSimilarity], result of:
        4.808723 = fieldWeight in 2697, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.5 = fieldNorm(doc=2697)
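
The author-match scores listed above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions (tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and not part of this catalogue's software.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the listings above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the listings: the author term occurs in 7 of 44218 records.
single_author = field_weight(1.0, 7, 44218, 0.625)  # entries 1-4
two_authors = field_weight(1.0, 7, 44218, 0.5)      # entry 5
print(round(single_author, 4), round(two_authors, 4))
```

Under these assumptions the two values should come out close to 6.0109 and 4.8087. The lower score of entry 5 comes only from its smaller fieldNorm, which reflects the longer author field (two authors instead of one).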
    

Similar documents (content)

  1. Bellaachia, A.; Amor-Tijani, G.: Proper nouns in English-Arabic cross language information retrieval (2008) 0.23
    0.23025396 = sum of:
      0.23025396 = product of:
        0.8223356 = sum of:
          0.024480812 = weight(abstract_txt:other in 2372) [ClassicSimilarity], result of:
            0.024480812 = score(doc=2372,freq=2.0), product of:
              0.078673236 = queryWeight, product of:
                1.1357669 = boost
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.019675873 = queryNorm
              0.3111708 = fieldWeight in 2372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.0625 = fieldNorm(doc=2372)
          0.040689986 = weight(abstract_txt:vocabulary in 2372) [ClassicSimilarity], result of:
            0.040689986 = score(doc=2372,freq=1.0), product of:
              0.12150134 = queryWeight, product of:
                1.152446 = boost
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.019675873 = queryNorm
              0.33489332 = fieldWeight in 2372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.358293 = idf(docFreq=565, maxDocs=44218)
                0.0625 = fieldNorm(doc=2372)
          0.041038714 = weight(abstract_txt:language in 2372) [ClassicSimilarity], result of:
            0.041038714 = score(doc=2372,freq=2.0), product of:
              0.111021124 = queryWeight, product of:
                1.3492067 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.019675873 = queryNorm
              0.3696478 = fieldWeight in 2372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0625 = fieldNorm(doc=2372)
          0.08570534 = weight(abstract_txt:speech in 2372) [ClassicSimilarity], result of:
            0.08570534 = score(doc=2372,freq=1.0), product of:
              0.19964631 = queryWeight, product of:
                1.4772727 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.019675873 = queryNorm
              0.42928585 = fieldWeight in 2372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.0625 = fieldNorm(doc=2372)
          0.14053747 = weight(abstract_txt:words in 2372) [ClassicSimilarity], result of:
            0.14053747 = score(doc=2372,freq=3.0), product of:
              0.24252343 = queryWeight, product of:
                2.302618 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.019675873 = queryNorm
              0.57948 = fieldWeight in 2372, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.0625 = fieldNorm(doc=2372)
          0.1544621 = weight(abstract_txt:dictionary in 2372) [ClassicSimilarity], result of:
            0.1544621 = score(doc=2372,freq=1.0), product of:
              0.37251806 = queryWeight, product of:
                2.8537683 = boost
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.019675873 = queryNorm
              0.41464326 = fieldWeight in 2372, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.0625 = fieldNorm(doc=2372)
          0.3354212 = weight(abstract_txt:spelling in 2372) [ClassicSimilarity], result of:
            0.3354212 = score(doc=2372,freq=2.0), product of:
              0.49581125 = queryWeight, product of:
                3.292329 = boost
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.019675873 = queryNorm
              0.67650986 = fieldWeight in 2372, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.0625 = fieldNorm(doc=2372)
        0.28 = coord(7/25)
    
  2. Toivonen, J.; Pirkola, A.; Keskustalo, H.; Visala, K.; Järvelin, K.: Translating cross-lingual spelling variants using transformation rules (2005) 0.20
    0.19869561 = sum of:
      0.19869561 = product of:
        0.8278984 = sum of:
          0.019935051 = weight(abstract_txt:such in 1052) [ClassicSimilarity], result of:
            0.019935051 = score(doc=1052,freq=1.0), product of:
              0.07448886 = queryWeight, product of:
                1.1051503 = boost
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.019675873 = queryNorm
              0.2676246 = fieldWeight in 1052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.078125 = fieldNorm(doc=1052)
          0.021638187 = weight(abstract_txt:other in 1052) [ClassicSimilarity], result of:
            0.021638187 = score(doc=1052,freq=1.0), product of:
              0.078673236 = queryWeight, product of:
                1.1357669 = boost
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.019675873 = queryNorm
              0.27503872 = fieldWeight in 1052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.078125 = fieldNorm(doc=1052)
          0.07254688 = weight(abstract_txt:language in 1052) [ClassicSimilarity], result of:
            0.07254688 = score(doc=1052,freq=4.0), product of:
              0.111021124 = queryWeight, product of:
                1.3492067 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.019675873 = queryNorm
              0.65345114 = fieldWeight in 1052, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.078125 = fieldNorm(doc=1052)
          0.10142419 = weight(abstract_txt:words in 1052) [ClassicSimilarity], result of:
            0.10142419 = score(doc=1052,freq=1.0), product of:
              0.24252343 = queryWeight, product of:
                2.302618 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.019675873 = queryNorm
              0.41820365 = fieldWeight in 1052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.078125 = fieldNorm(doc=1052)
          0.19307762 = weight(abstract_txt:dictionary in 1052) [ClassicSimilarity], result of:
            0.19307762 = score(doc=1052,freq=1.0), product of:
              0.37251806 = queryWeight, product of:
                2.8537683 = boost
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.019675873 = queryNorm
              0.51830405 = fieldWeight in 1052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.078125 = fieldNorm(doc=1052)
          0.4192765 = weight(abstract_txt:spelling in 1052) [ClassicSimilarity], result of:
            0.4192765 = score(doc=1052,freq=2.0), product of:
              0.49581125 = queryWeight, product of:
                3.292329 = boost
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.019675873 = queryNorm
              0.8456373 = fieldWeight in 1052, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.078125 = fieldNorm(doc=1052)
        0.24 = coord(6/25)
    
  3. Zimmermann, H.H.: Language and language technology (1991) 0.19
    0.18576282 = sum of:
      0.18576282 = product of:
        1.1610177 = sum of:
          0.079319 = weight(abstract_txt:processing in 2568) [ClassicSimilarity], result of:
            0.079319 = score(doc=2568,freq=1.0), product of:
              0.10293131 = queryWeight, product of:
                1.0607275 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.019675873 = queryNorm
              0.7706013 = fieldWeight in 2568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.15625 = fieldNorm(doc=2568)
          0.10259678 = weight(abstract_txt:language in 2568) [ClassicSimilarity], result of:
            0.10259678 = score(doc=2568,freq=2.0), product of:
              0.111021124 = queryWeight, product of:
                1.3492067 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.019675873 = queryNorm
              0.9241195 = fieldWeight in 2568, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.15625 = fieldNorm(doc=2568)
          0.38615525 = weight(abstract_txt:dictionary in 2568) [ClassicSimilarity], result of:
            0.38615525 = score(doc=2568,freq=1.0), product of:
              0.37251806 = queryWeight, product of:
                2.8537683 = boost
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.019675873 = queryNorm
              1.0366081 = fieldWeight in 2568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.634292 = idf(docFreq=157, maxDocs=44218)
                0.15625 = fieldNorm(doc=2568)
          0.5929466 = weight(abstract_txt:spelling in 2568) [ClassicSimilarity], result of:
            0.5929466 = score(doc=2568,freq=1.0), product of:
              0.49581125 = queryWeight, product of:
                3.292329 = boost
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.019675873 = queryNorm
              1.1959119 = fieldWeight in 2568, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.15625 = fieldNorm(doc=2568)
        0.16 = coord(4/25)
    
  4. Wacholder, N.; Byrd, R.J.: Retrieving information from full text using linguistic knowledge (1994) 0.18
    0.17957857 = sum of:
      0.17957857 = product of:
        0.64135206 = sum of:
          0.056087002 = weight(abstract_txt:processing in 8524) [ClassicSimilarity], result of:
            0.056087002 = score(doc=8524,freq=2.0), product of:
              0.10293131 = queryWeight, product of:
                1.0607275 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.019675873 = queryNorm
              0.5448974 = fieldWeight in 8524, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.078125 = fieldNorm(doc=8524)
          0.061276235 = weight(abstract_txt:natural in 8524) [ClassicSimilarity], result of:
            0.061276235 = score(doc=8524,freq=2.0), product of:
              0.10918612 = queryWeight, product of:
                1.0924807 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.019675873 = queryNorm
              0.561209 = fieldWeight in 8524, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.078125 = fieldNorm(doc=8524)
          0.043328844 = weight(abstract_txt:must in 8524) [ClassicSimilarity], result of:
            0.043328844 = score(doc=8524,freq=1.0), product of:
              0.10918612 = queryWeight, product of:
                1.0924807 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.019675873 = queryNorm
              0.39683473 = fieldWeight in 8524, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.078125 = fieldNorm(doc=8524)
          0.019935051 = weight(abstract_txt:such in 8524) [ClassicSimilarity], result of:
            0.019935051 = score(doc=8524,freq=1.0), product of:
              0.07448886 = queryWeight, product of:
                1.1051503 = boost
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.019675873 = queryNorm
              0.2676246 = fieldWeight in 8524, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.078125 = fieldNorm(doc=8524)
          0.062827446 = weight(abstract_txt:language in 8524) [ClassicSimilarity], result of:
            0.062827446 = score(doc=8524,freq=3.0), product of:
              0.111021124 = queryWeight, product of:
                1.3492067 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.019675873 = queryNorm
              0.56590533 = fieldWeight in 8524, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.078125 = fieldNorm(doc=8524)
          0.10142419 = weight(abstract_txt:words in 8524) [ClassicSimilarity], result of:
            0.10142419 = score(doc=8524,freq=1.0), product of:
              0.24252343 = queryWeight, product of:
                2.302618 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.019675873 = queryNorm
              0.41820365 = fieldWeight in 8524, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.078125 = fieldNorm(doc=8524)
          0.2964733 = weight(abstract_txt:spelling in 8524) [ClassicSimilarity], result of:
            0.2964733 = score(doc=8524,freq=1.0), product of:
              0.49581125 = queryWeight, product of:
                3.292329 = boost
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.019675873 = queryNorm
              0.59795594 = fieldWeight in 8524, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.078125 = fieldNorm(doc=8524)
        0.28 = coord(7/25)
    
  5. Ballard, T.; Lifshin, A.: Prediction of OPAC spelling errors through a keyword inventory (1992) 0.18
    0.17655948 = sum of:
      0.17655948 = product of:
        0.7356645 = sum of:
          0.14705081 = weight(abstract_txt:misspelled in 1499) [ClassicSimilarity], result of:
            0.14705081 = score(doc=1499,freq=1.0), product of:
              0.19571209 = queryWeight, product of:
                1.0342461 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.019675873 = queryNorm
              0.751363 = fieldWeight in 1499, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.078125 = fieldNorm(doc=1499)
          0.019935051 = weight(abstract_txt:such in 1499) [ClassicSimilarity], result of:
            0.019935051 = score(doc=1499,freq=1.0), product of:
              0.07448886 = queryWeight, product of:
                1.1051503 = boost
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.019675873 = queryNorm
              0.2676246 = fieldWeight in 1499, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4255946 = idf(docFreq=3909, maxDocs=44218)
                0.078125 = fieldNorm(doc=1499)
          0.021638187 = weight(abstract_txt:other in 1499) [ClassicSimilarity], result of:
            0.021638187 = score(doc=1499,freq=1.0), product of:
              0.078673236 = queryWeight, product of:
                1.1357669 = boost
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.019675873 = queryNorm
              0.27503872 = fieldWeight in 1499, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5204957 = idf(docFreq=3555, maxDocs=44218)
                0.078125 = fieldNorm(doc=1499)
          0.107131675 = weight(abstract_txt:speech in 1499) [ClassicSimilarity], result of:
            0.107131675 = score(doc=1499,freq=1.0), product of:
              0.19964631 = queryWeight, product of:
                1.4772727 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.019675873 = queryNorm
              0.5366073 = fieldWeight in 1499, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.078125 = fieldNorm(doc=1499)
          0.14343546 = weight(abstract_txt:words in 1499) [ClassicSimilarity], result of:
            0.14343546 = score(doc=1499,freq=2.0), product of:
              0.24252343 = queryWeight, product of:
                2.302618 = boost
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.019675873 = queryNorm
              0.5914293 = fieldWeight in 1499, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.353007 = idf(docFreq=568, maxDocs=44218)
                0.078125 = fieldNorm(doc=1499)
          0.2964733 = weight(abstract_txt:spelling in 1499) [ClassicSimilarity], result of:
            0.2964733 = score(doc=1499,freq=1.0), product of:
              0.49581125 = queryWeight, product of:
                3.292329 = boost
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.019675873 = queryNorm
              0.59795594 = fieldWeight in 1499, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.653836 = idf(docFreq=56, maxDocs=44218)
                0.078125 = fieldNorm(doc=1499)
        0.24 = coord(6/25)
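
The content-match scores above combine several terms. As a minimal sketch, assuming the ClassicSimilarity structure shown in the listings (per-term score = queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm, then the sum scaled by coord), the total for entry 3 (doc 2568) can be recomputed from the printed values. The helper names are illustrative only.

```python
import math

QUERY_NORM = 0.019675873  # queryNorm as printed in the listings


def term_score(boost: float, idf: float, freq: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(terms, matched: int, total_query_terms: int) -> float:
    """Sum of term scores, scaled by coord = matched / total_query_terms."""
    coord = matched / total_query_terms
    return coord * sum(term_score(*t) for t in terms)


# (boost, idf, freq, fieldNorm) for the four matching terms of doc 2568.
doc_2568_terms = [
    (1.0607275, 4.931848, 1.0, 0.15625),   # processing
    (1.3492067, 4.1820874, 2.0, 0.15625),  # language
    (2.8537683, 6.634292, 1.0, 0.15625),   # dictionary
    (3.292329, 7.653836, 1.0, 0.15625),    # spelling
]

score = document_score(doc_2568_terms, matched=4, total_query_terms=25)
print(round(score, 4))
```

With these inputs the result should be close to the listed 0.1858. The coord factor explains why a record matching only 4 of the 25 query terms ranks below records with weaker per-term matches but broader overlap.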