Document (#27543)

Author
Doszkocs, T.E.
Zamora, A.
Title
Dictionary services and spelling aids for Web searching
Source
Online. 28(2004) no.3, pp.22-29
Year
2004
Abstract
The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon, and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes the development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aids, natural language processing, and information storage and retrieval. The design of the technology allows complex applications to be built in a modular fashion from the components developed in the earlier phases of the work, without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have online or Web-based interfaces, the dictionaries and other computer components must respond quickly and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language System (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes, and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
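The abstract does not say how ChemSpell measures similarity between words, so the following is only a minimal sketch of a dictionary-backed spelling "suggestion" step. It assumes a toy vocabulary and swaps in the generic sequence matching from Python's standard `difflib` module; the names `VOCABULARY` and `suggest` are hypothetical and are not part of AZdict.

```python
from difflib import get_close_matches

# Hypothetical toy vocabulary; the real AZdict word list is not reproduced here.
VOCABULARY = ["toluene", "toluidine", "benzene", "xylene", "acetone"]


def suggest(query, vocabulary=VOCABULARY, limit=3, cutoff=0.7):
    """Return the query itself if it is a known word, else close matches.

    Similarity is difflib's generic sequence ratio, an assumption made for
    illustration; it is not the ChemSpell similarity measure.
    """
    word = query.lower()
    if word in vocabulary:
        return [word]
    # Candidates are returned best match first; words scoring below
    # `cutoff` are dropped.
    return get_close_matches(word, vocabulary, n=limit, cutoff=cutoff)


print(suggest("tolulene"))
```

A search front end could show the first suggestion as a "did you mean" prompt and fall back to the original query when the list is empty.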
Theme
Computational linguistics
Field
Chemistry
Object
TOXNET

Similar documents (author)

  1. Doszkocs, T.E.: CITE NLM: Natural language searching in an online catalog (1983) 5.99
    5.989656 = sum of:
      5.989656 = weight(author_txt:doszkocs in 784) [ClassicSimilarity], result of:
        5.989656 = fieldWeight in 784, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.583449 = idf(docFreq=7, maxDocs=42740)
          0.625 = fieldNorm(doc=784)
    
  2. Doszkocs, T.E.: Natural language processing in information retrieval (1986) 5.99
    5.989656 = sum of:
      5.989656 = weight(author_txt:doszkocs in 2696) [ClassicSimilarity], result of:
        5.989656 = fieldWeight in 2696, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.583449 = idf(docFreq=7, maxDocs=42740)
          0.625 = fieldNorm(doc=2696)
    
  3. Doszkocs, T.E.: Simultaneous searching of distributed information and subject repositories on the World Wide Web (1998) 5.99
    5.989656 = sum of:
      5.989656 = weight(author_txt:doszkocs in 3335) [ClassicSimilarity], result of:
        5.989656 = fieldWeight in 3335, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.583449 = idf(docFreq=7, maxDocs=42740)
          0.625 = fieldNorm(doc=3335)
    
  4. Doszkocs, T.E.: Virtual hypertext searching of online databases via the World Wide Web (1996) 5.99
    5.989656 = sum of:
      5.989656 = weight(author_txt:doszkocs in 3417) [ClassicSimilarity], result of:
        5.989656 = fieldWeight in 3417, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.583449 = idf(docFreq=7, maxDocs=42740)
          0.625 = fieldNorm(doc=3417)
    
  5. Doszkocs, T.E.; Weinberg, B.H.: Natural language interfaces for information retrieval (1988) 4.79
    4.7917247 = sum of:
      4.7917247 = weight(author_txt:doszkocs in 2697) [ClassicSimilarity], result of:
        4.7917247 = fieldWeight in 2697, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.583449 = idf(docFreq=7, maxDocs=42740)
          0.5 = fieldNorm(doc=2697)
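
The author scores above can be reproduced by hand. The sketch below assumes the classic TF-IDF definitions that the listed numbers are consistent with, namely tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and are not Lucene API calls.

```python
import math


def classic_tf(freq):
    # Term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    # Inverse document frequency, in the form that matches the listed values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the first author match (doc 784).
idf_value = classic_idf(7, 42740)
score = field_weight(1.0, 7, 42740, 0.625)
print(round(idf_value, 4), round(score, 4))
```

The fifth entry differs only in its fieldNorm (0.5 instead of 0.625), which is why a record with two authors scores lower than the single-author records.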
    

Similar documents (content)

  1. Bellaachia, A.; Amor-Tijani, G.: Proper nouns in English-Arabic cross language information retrieval (2008) 0.23
    0.2324783 = sum of:
      0.2324783 = product of:
        0.83027965 = sum of:
          0.025070477 = weight(abstract_txt:other in 4373) [ClassicSimilarity], result of:
            0.025070477 = score(doc=4373,freq=2.0), product of:
              0.08024002 = queryWeight, product of:
                1.1320734 = boost
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.020051178 = queryNorm
              0.31244355 = fieldWeight in 4373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.0625 = fieldNorm(doc=4373)
          0.041208517 = weight(abstract_txt:vocabulary in 4373) [ClassicSimilarity], result of:
            0.041208517 = score(doc=4373,freq=1.0), product of:
              0.123004265 = queryWeight, product of:
                1.1444412 = boost
                5.3602715 = idf(docFreq=545, maxDocs=42740)
                0.020051178 = queryNorm
              0.33501697 = fieldWeight in 4373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3602715 = idf(docFreq=545, maxDocs=42740)
                0.0625 = fieldNorm(doc=4373)
          0.041865673 = weight(abstract_txt:language in 4373) [ClassicSimilarity], result of:
            0.041865673 = score(doc=4373,freq=2.0), product of:
              0.11294179 = queryWeight, product of:
                1.3430939 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.020051178 = queryNorm
              0.37068364 = fieldWeight in 4373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.0625 = fieldNorm(doc=4373)
          0.08858505 = weight(abstract_txt:speech in 4373) [ClassicSimilarity], result of:
            0.08858505 = score(doc=4373,freq=1.0), product of:
              0.20488137 = queryWeight, product of:
                1.4770142 = boost
                6.9179583 = idf(docFreq=114, maxDocs=42740)
                0.020051178 = queryNorm
              0.4323724 = fieldWeight in 4373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9179583 = idf(docFreq=114, maxDocs=42740)
                0.0625 = fieldNorm(doc=4373)
          0.14260437 = weight(abstract_txt:words in 4373) [ClassicSimilarity], result of:
            0.14260437 = score(doc=4373,freq=3.0), product of:
              0.24584062 = queryWeight, product of:
                2.2881012 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.020051178 = queryNorm
              0.58006835 = fieldWeight in 4373, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0625 = fieldNorm(doc=4373)
          0.15612738 = weight(abstract_txt:dictionary in 4373) [ClassicSimilarity], result of:
            0.15612738 = score(doc=4373,freq=1.0), product of:
              0.37663865 = queryWeight, product of:
                2.832115 = boost
                6.6324525 = idf(docFreq=152, maxDocs=42740)
                0.020051178 = queryNorm
              0.41452828 = fieldWeight in 4373, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6324525 = idf(docFreq=152, maxDocs=42740)
                0.0625 = fieldNorm(doc=4373)
          0.33481818 = weight(abstract_txt:spelling in 4373) [ClassicSimilarity], result of:
            0.33481818 = score(doc=4373,freq=2.0), product of:
              0.49712798 = queryWeight, product of:
                3.2537377 = boost
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.020051178 = queryNorm
              0.673505 = fieldWeight in 4373, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.0625 = fieldNorm(doc=4373)
        0.28 = coord(7/25)
    
  2. Jurafsky, D.; Martin, J.H.: Speech and language processing : an introduction to natural language processing, computational linguistics and speech recognition (2009) 0.20
    0.20040447 = sum of:
      0.20040447 = product of:
        0.626264 = sum of:
          0.034365013 = weight(abstract_txt:scientific in 3082) [ClassicSimilarity], result of:
            0.034365013 = score(doc=3082,freq=1.0), product of:
              0.09391461 = queryWeight, product of:
                4.6837454 = idf(docFreq=1073, maxDocs=42740)
                0.020051178 = queryNorm
              0.36591762 = fieldWeight in 3082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6837454 = idf(docFreq=1073, maxDocs=42740)
                0.078125 = fieldNorm(doc=3082)
          0.09935978 = weight(abstract_txt:processing in 3082) [ClassicSimilarity], result of:
            0.09935978 = score(doc=3082,freq=6.0), product of:
              0.10489276 = queryWeight, product of:
                1.0568326 = boost
                4.9499345 = idf(docFreq=822, maxDocs=42740)
                0.020051178 = queryNorm
              0.9472511 = fieldWeight in 3082, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.9499345 = idf(docFreq=822, maxDocs=42740)
                0.078125 = fieldNorm(doc=3082)
          0.06312652 = weight(abstract_txt:natural in 3082) [ClassicSimilarity], result of:
            0.06312652 = score(doc=3082,freq=2.0), product of:
              0.11180299 = queryWeight, product of:
                1.0910889 = boost
                5.1103826 = idf(docFreq=700, maxDocs=42740)
                0.020051178 = queryNorm
              0.5646228 = fieldWeight in 3082, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1103826 = idf(docFreq=700, maxDocs=42740)
                0.078125 = fieldNorm(doc=3082)
          0.02215938 = weight(abstract_txt:other in 3082) [ClassicSimilarity], result of:
            0.02215938 = score(doc=3082,freq=1.0), product of:
              0.08024002 = queryWeight, product of:
                1.1320734 = boost
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.020051178 = queryNorm
              0.2761637 = fieldWeight in 3082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.078125 = fieldNorm(doc=3082)
          0.104664184 = weight(abstract_txt:language in 3082) [ClassicSimilarity], result of:
            0.104664184 = score(doc=3082,freq=8.0), product of:
              0.11294179 = queryWeight, product of:
                1.3430939 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.020051178 = queryNorm
              0.9267091 = fieldWeight in 3082, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.078125 = fieldNorm(doc=3082)
          0.03977079 = weight(abstract_txt:technology in 3082) [ClassicSimilarity], result of:
            0.03977079 = score(doc=3082,freq=1.0), product of:
              0.11850283 = queryWeight, product of:
                1.3757623 = boost
                4.2958136 = idf(docFreq=1582, maxDocs=42740)
                0.020051178 = queryNorm
              0.33561045 = fieldWeight in 3082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2958136 = idf(docFreq=1582, maxDocs=42740)
                0.078125 = fieldNorm(doc=3082)
          0.19179225 = weight(abstract_txt:speech in 3082) [ClassicSimilarity], result of:
            0.19179225 = score(doc=3082,freq=3.0), product of:
              0.20488137 = queryWeight, product of:
                1.4770142 = boost
                6.9179583 = idf(docFreq=114, maxDocs=42740)
                0.020051178 = queryNorm
              0.93611366 = fieldWeight in 3082, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9179583 = idf(docFreq=114, maxDocs=42740)
                0.078125 = fieldNorm(doc=3082)
          0.07102612 = weight(abstract_txt:applications in 3082) [ClassicSimilarity], result of:
            0.07102612 = score(doc=3082,freq=1.0), product of:
              0.19198954 = queryWeight, product of:
                2.0220282 = boost
                4.7353325 = idf(docFreq=1019, maxDocs=42740)
                0.020051178 = queryNorm
              0.36994785 = fieldWeight in 3082, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7353325 = idf(docFreq=1019, maxDocs=42740)
                0.078125 = fieldNorm(doc=3082)
        0.32 = coord(8/25)
    
  3. Toivonen, J.; Pirkola, A.; Keskustalo, H.; Visala, K.; Järvelin, K.: Translating cross-lingual spelling variants using transformation rules (2005) 0.20
    0.2000129 = sum of:
      0.2000129 = product of:
        0.83338714 = sum of:
          0.02062121 = weight(abstract_txt:such in 3053) [ClassicSimilarity], result of:
            0.02062121 = score(doc=3053,freq=1.0), product of:
              0.0764825 = queryWeight, product of:
                1.1052489 = boost
                3.451136 = idf(docFreq=3683, maxDocs=42740)
                0.020051178 = queryNorm
              0.26962 = fieldWeight in 3053, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.451136 = idf(docFreq=3683, maxDocs=42740)
                0.078125 = fieldNorm(doc=3053)
          0.02215938 = weight(abstract_txt:other in 3053) [ClassicSimilarity], result of:
            0.02215938 = score(doc=3053,freq=1.0), product of:
              0.08024002 = queryWeight, product of:
                1.1320734 = boost
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.020051178 = queryNorm
              0.2761637 = fieldWeight in 3053, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.5348954 = idf(docFreq=3387, maxDocs=42740)
                0.078125 = fieldNorm(doc=3053)
          0.07400875 = weight(abstract_txt:language in 3053) [ClassicSimilarity], result of:
            0.07400875 = score(doc=3053,freq=4.0), product of:
              0.11294179 = queryWeight, product of:
                1.3430939 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.020051178 = queryNorm
              0.65528226 = fieldWeight in 3053, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.078125 = fieldNorm(doc=3053)
          0.10291584 = weight(abstract_txt:words in 3053) [ClassicSimilarity], result of:
            0.10291584 = score(doc=3053,freq=1.0), product of:
              0.24584062 = queryWeight, product of:
                2.2881012 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.020051178 = queryNorm
              0.41862828 = fieldWeight in 3053, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.078125 = fieldNorm(doc=3053)
          0.19515921 = weight(abstract_txt:dictionary in 3053) [ClassicSimilarity], result of:
            0.19515921 = score(doc=3053,freq=1.0), product of:
              0.37663865 = queryWeight, product of:
                2.832115 = boost
                6.6324525 = idf(docFreq=152, maxDocs=42740)
                0.020051178 = queryNorm
              0.51816034 = fieldWeight in 3053, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6324525 = idf(docFreq=152, maxDocs=42740)
                0.078125 = fieldNorm(doc=3053)
          0.41852275 = weight(abstract_txt:spelling in 3053) [ClassicSimilarity], result of:
            0.41852275 = score(doc=3053,freq=2.0), product of:
              0.49712798 = queryWeight, product of:
                3.2537377 = boost
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.020051178 = queryNorm
              0.8418813 = fieldWeight in 3053, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.078125 = fieldNorm(doc=3053)
        0.24 = coord(6/25)
    
  4. Webster, N.: Webster's third new international dictionary of the English language unabridged : utilizing all the experience and resources of more than one hundred years of Merriam-Webster dictionaries (1993) 0.19
    0.19196641 = sum of:
      0.19196641 = product of:
        0.6855943 = sum of:
          0.02749201 = weight(abstract_txt:scientific in 3416) [ClassicSimilarity], result of:
            0.02749201 = score(doc=3416,freq=1.0), product of:
              0.09391461 = queryWeight, product of:
                4.6837454 = idf(docFreq=1073, maxDocs=42740)
                0.020051178 = queryNorm
              0.2927341 = fieldWeight in 3416, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6837454 = idf(docFreq=1073, maxDocs=42740)
                0.0625 = fieldNorm(doc=3416)
          0.01649697 = weight(abstract_txt:such in 3416) [ClassicSimilarity], result of:
            0.01649697 = score(doc=3416,freq=1.0), product of:
              0.0764825 = queryWeight, product of:
                1.1052489 = boost
                3.451136 = idf(docFreq=3683, maxDocs=42740)
                0.020051178 = queryNorm
              0.215696 = fieldWeight in 3416, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.451136 = idf(docFreq=3683, maxDocs=42740)
                0.0625 = fieldNorm(doc=3416)
          0.05827764 = weight(abstract_txt:vocabulary in 3416) [ClassicSimilarity], result of:
            0.05827764 = score(doc=3416,freq=2.0), product of:
              0.123004265 = queryWeight, product of:
                1.1444412 = boost
                5.3602715 = idf(docFreq=545, maxDocs=42740)
                0.020051178 = queryNorm
              0.47378552 = fieldWeight in 3416, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.3602715 = idf(docFreq=545, maxDocs=42740)
                0.0625 = fieldNorm(doc=3416)
          0.022493895 = weight(abstract_txt:work in 3416) [ClassicSimilarity], result of:
            0.022493895 = score(doc=3416,freq=1.0), product of:
              0.094045006 = queryWeight, product of:
                1.2255949 = boost
                3.826916 = idf(docFreq=2529, maxDocs=42740)
                0.020051178 = queryNorm
              0.23918225 = fieldWeight in 3416, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.826916 = idf(docFreq=2529, maxDocs=42740)
                0.0625 = fieldNorm(doc=3416)
          0.09528637 = weight(abstract_txt:chemical in 3416) [ClassicSimilarity], result of:
            0.09528637 = score(doc=3416,freq=1.0), product of:
              0.21508794 = queryWeight, product of:
                1.5133572 = boost
                7.0881796 = idf(docFreq=96, maxDocs=42740)
                0.020051178 = queryNorm
              0.44301122 = fieldWeight in 3416, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0881796 = idf(docFreq=96, maxDocs=42740)
                0.0625 = fieldNorm(doc=3416)
          0.116435975 = weight(abstract_txt:words in 3416) [ClassicSimilarity], result of:
            0.116435975 = score(doc=3416,freq=2.0), product of:
              0.24584062 = queryWeight, product of:
                2.2881012 = boost
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.020051178 = queryNorm
              0.4736238 = fieldWeight in 3416, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.358442 = idf(docFreq=546, maxDocs=42740)
                0.0625 = fieldNorm(doc=3416)
          0.34911144 = weight(abstract_txt:dictionary in 3416) [ClassicSimilarity], result of:
            0.34911144 = score(doc=3416,freq=5.0), product of:
              0.37663865 = queryWeight, product of:
                2.832115 = boost
                6.6324525 = idf(docFreq=152, maxDocs=42740)
                0.020051178 = queryNorm
              0.92691344 = fieldWeight in 3416, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.6324525 = idf(docFreq=152, maxDocs=42740)
                0.0625 = fieldNorm(doc=3416)
        0.28 = coord(7/25)
    
  5. Zimmermann, H.H.: Language and language technology (1991) 0.19
    0.18687841 = sum of:
      0.18687841 = product of:
        1.1679901 = sum of:
          0.08112692 = weight(abstract_txt:processing in 3569) [ClassicSimilarity], result of:
            0.08112692 = score(doc=3569,freq=1.0), product of:
              0.10489276 = queryWeight, product of:
                1.0568326 = boost
                4.9499345 = idf(docFreq=822, maxDocs=42740)
                0.020051178 = queryNorm
              0.77342725 = fieldWeight in 3569, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9499345 = idf(docFreq=822, maxDocs=42740)
                0.15625 = fieldNorm(doc=3569)
          0.104664184 = weight(abstract_txt:language in 3569) [ClassicSimilarity], result of:
            0.104664184 = score(doc=3569,freq=2.0), product of:
              0.11294179 = queryWeight, product of:
                1.3430939 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.020051178 = queryNorm
              0.9267091 = fieldWeight in 3569, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.15625 = fieldNorm(doc=3569)
          0.39031842 = weight(abstract_txt:dictionary in 3569) [ClassicSimilarity], result of:
            0.39031842 = score(doc=3569,freq=1.0), product of:
              0.37663865 = queryWeight, product of:
                2.832115 = boost
                6.6324525 = idf(docFreq=152, maxDocs=42740)
                0.020051178 = queryNorm
              1.0363207 = fieldWeight in 3569, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6324525 = idf(docFreq=152, maxDocs=42740)
                0.15625 = fieldNorm(doc=3569)
          0.59188056 = weight(abstract_txt:spelling in 3569) [ClassicSimilarity], result of:
            0.59188056 = score(doc=3569,freq=1.0), product of:
              0.49712798 = queryWeight, product of:
                3.2537377 = boost
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.020051178 = queryNorm
              1.1905999 = fieldWeight in 3569, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.619839 = idf(docFreq=56, maxDocs=42740)
                0.15625 = fieldNorm(doc=3569)
        0.16 = coord(4/25)
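
The content scores combine more factors than the author scores: each matching term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is scaled by coord (matched terms divided by query terms). As a hedged check, the sketch below recomputes the last entry (doc 3569) from the boosts, document frequencies, queryNorm and fieldNorm printed above. The helper names are illustrative, and the idf formula is the one the listed values are consistent with.

```python
import math

QUERY_NORM = 0.020051178  # queryNorm, as printed in the explain tree
MAX_DOCS = 42740          # maxDocs, as printed in the explain tree


def idf(doc_freq):
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    i = idf(doc_freq)
    query_weight = boost * i * QUERY_NORM           # query side
    field_weight = math.sqrt(freq) * i * field_norm  # document side
    return query_weight * field_weight


# (freq, docFreq, boost) for the four terms matched in doc 3569.
terms = [
    (1.0, 822, 1.0568326),   # processing
    (2.0, 1752, 1.3430939),  # language
    (1.0, 152, 2.832115),    # dictionary
    (1.0, 56, 3.2537377),    # spelling
]

raw_sum = sum(term_score(f, df, b, 0.15625) for f, df, b in terms)
final_score = raw_sum * (4 / 25)  # coord(4/25)
print(round(raw_sum, 4), round(final_score, 4))
```

The coord factor explains the ranking: doc 3569 has the largest raw sum of the five entries but matches only 4 of the 25 query terms, so it ends up last.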