Document (#37756)

Author
Justeson, J.S.
Katz, S.M.
Title
Technical terminology : some linguistic properties and an algorithm for identification in text
Source
Natural language engineering. 1(1995) no.1, pp.9-27
Year
1995
Abstract
This paper identifies some linguistic properties of technical terminology, and uses them to formulate an algorithm for identifying technical terms in running text. The grammatical properties discussed are preferred phrase structures: technical terms consist mostly of noun phrases containing adjectives, nouns, and occasionally prepositions; rarely do terms contain verbs, adverbs, or conjunctions. The discourse properties are patterns of repetition that distinguish noun phrases that are technical terms, especially those multi-word phrases that constitute a substantial majority of all technical vocabulary, from other types of noun phrase. The paper presents a terminology identification algorithm that is motivated by these linguistic properties. An implementation of the algorithm is described; it recovers a high proportion of the technical terms in a text, and a high proportion of the recovered strings are valid technical terms. The algorithm proves to be effective regardless of the domain of the text to which it is applied.
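The two properties named in the abstract (a preferred phrase structure and repetition) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not quoted from the abstract: the one-letter tag codes, the exact regular expression, the phrase-length limits, the repetition threshold, and the function name `candidate_terms`. Input is assumed to be already part-of-speech tagged.

```python
import re
from collections import Counter

# Hypothetical coarse tag codes (an assumption for this sketch):
# A = adjective, N = noun, P = preposition, X = anything else.
# Assumed pattern: a run of adjectives and nouns, optionally containing
# one noun-preposition pair, and always ending in a noun.
TERM_PATTERN = re.compile(r"(?:[AN]+|[AN]*NP[AN]*)N")

def candidate_terms(tagged, min_len=2, max_len=4, min_count=2):
    """Return multi-word phrases whose tag sequence matches TERM_PATTERN
    and that occur at least min_count times in the tagged text."""
    words = [word.lower() for word, _ in tagged]
    tags = "".join(tag for _, tag in tagged)
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tags) - n + 1):
            # Grammatical filter: the whole n-gram must fit the pattern.
            if TERM_PATTERN.fullmatch(tags[i:i + n]):
                counts[" ".join(words[i:i + n])] += 1
    # Discourse filter: keep only phrases that are repeated.
    return {phrase: c for phrase, c in counts.items() if c >= min_count}

# Toy input, tagged by hand for the example.
sample = [
    ("central", "A"), ("processing", "N"), ("unit", "N"),
    ("runs", "X"), ("fast", "X"), ("the", "X"),
    ("central", "A"), ("processing", "N"), ("unit", "N"),
    ("has", "X"), ("degrees", "N"), ("of", "P"), ("freedom", "N"),
]
repeated = candidate_terms(sample)
```

On this toy input, "central processing unit" passes both filters, while "degrees of freedom" fits the pattern but is dropped because it occurs only once. The sketch also returns sub-phrases such as "processing unit"; how nested candidates should be handled is left open here.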
Theme
Computational linguistics

Similar documents (author)

  1. Katz, M.: Multimedia: the future of information delivery to homes and business (1993) 5.54
    5.5397964 = sum of:
      5.5397964 = weight(author_txt:katz in 6646) [ClassicSimilarity], result of:
        5.5397964 = fieldWeight in 6646, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.863674 = idf(docFreq=16, maxDocs=44218)
          0.625 = fieldNorm(doc=6646)
    
  2. Katz, B.: Community college reference services : a working guide for and by librarians (1992) 5.54
    5.5397964 = sum of:
      5.5397964 = weight(author_txt:katz in 661) [ClassicSimilarity], result of:
        5.5397964 = fieldWeight in 661, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.863674 = idf(docFreq=16, maxDocs=44218)
          0.625 = fieldNorm(doc=661)
    
  3. Katz, J.S.: Bibliometric standards : personal experience and lessons learned (1996) 5.54
    5.5397964 = sum of:
      5.5397964 = weight(author_txt:katz in 5058) [ClassicSimilarity], result of:
        5.5397964 = fieldWeight in 5058, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.863674 = idf(docFreq=16, maxDocs=44218)
          0.625 = fieldNorm(doc=5058)
    
  4. Katz, W.A.: Introduction to reference work (1997) 5.54
    5.5397964 = sum of:
      5.5397964 = weight(author_txt:katz in 1188) [ClassicSimilarity], result of:
        5.5397964 = fieldWeight in 1188, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.863674 = idf(docFreq=16, maxDocs=44218)
          0.625 = fieldNorm(doc=1188)
    
  5. Katz, W.A.: Introduction to reference work : Vol.1: Basic information sources; vol.2: Reference services and reference processes (1992) 5.54
    5.5397964 = sum of:
      5.5397964 = weight(author_txt:katz in 3364) [ClassicSimilarity], result of:
        5.5397964 = fieldWeight in 3364, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.863674 = idf(docFreq=16, maxDocs=44218)
          0.625 = fieldNorm(doc=3364)
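
The author scores above all follow the same arithmetic: fieldWeight = tf × idf × fieldNorm. The sketch below reproduces it, assuming the classic Lucene TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are consistent with the numbers in the listing. The function names are chosen for this example, and the result agrees with the listing only up to single-precision rounding.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed form: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq):
    # Assumed form: tf = sqrt(term frequency in the field)
    return math.sqrt(freq)

def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explanation tree
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Values taken from the listing for author_txt:katz
idf_katz = classic_idf(doc_freq=16, max_docs=44218)
weight_katz = field_weight(freq=1.0, doc_freq=16, max_docs=44218, field_norm=0.625)
```

Because every hit has termFreq=1.0 and fieldNorm=0.625, all five author matches receive the same score, which is why the list shows 5.54 throughout.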
    

Similar documents (content)

  1. Szostak, R.: Basic Concepts Classification (BCC) (2020) 0.17
    0.17449193 = sum of:
      0.17449193 = product of:
        0.62318546 = sum of:
          0.047928207 = weight(abstract_txt:strings in 5883) [ClassicSimilarity], result of:
            0.047928207 = score(doc=5883,freq=1.0), product of:
              0.104654744 = queryWeight, product of:
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.01428258 = queryNorm
              0.45796496 = fieldWeight in 5883, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.0625 = fieldNorm(doc=5883)
          0.08483514 = weight(abstract_txt:adjectives in 5883) [ClassicSimilarity], result of:
            0.08483514 = score(doc=5883,freq=1.0), product of:
              0.15313765 = queryWeight, product of:
                1.209655 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.01428258 = queryNorm
              0.55397964 = fieldWeight in 5883, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.0625 = fieldNorm(doc=5883)
          0.015879124 = weight(abstract_txt:that in 5883) [ClassicSimilarity], result of:
            0.015879124 = score(doc=5883,freq=6.0), product of:
              0.043774255 = queryWeight, product of:
                1.2934806 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01428258 = queryNorm
              0.36275032 = fieldWeight in 5883, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=5883)
          0.10837116 = weight(abstract_txt:adverbs in 5883) [ClassicSimilarity], result of:
            0.10837116 = score(doc=5883,freq=1.0), product of:
              0.18029097 = queryWeight, product of:
                1.3125248 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.01428258 = queryNorm
              0.6010904 = fieldWeight in 5883, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.0625 = fieldNorm(doc=5883)
          0.06906251 = weight(abstract_txt:terminology in 5883) [ClassicSimilarity], result of:
            0.06906251 = score(doc=5883,freq=1.0), product of:
              0.19256032 = queryWeight, product of:
                2.349441 = boost
                5.7384624 = idf(docFreq=386, maxDocs=44218)
                0.01428258 = queryNorm
              0.3586539 = fieldWeight in 5883, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7384624 = idf(docFreq=386, maxDocs=44218)
                0.0625 = fieldNorm(doc=5883)
          0.04833661 = weight(abstract_txt:terms in 5883) [ClassicSimilarity], result of:
            0.04833661 = score(doc=5883,freq=1.0), product of:
              0.19124907 = queryWeight, product of:
                3.3112793 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01428258 = queryNorm
              0.25274166 = fieldWeight in 5883, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=5883)
          0.24877268 = weight(abstract_txt:properties in 5883) [ClassicSimilarity], result of:
            0.24877268 = score(doc=5883,freq=4.0), product of:
              0.3379636 = queryWeight, product of:
                4.018283 = boost
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.01428258 = queryNorm
              0.7360931 = fieldWeight in 5883, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.0625 = fieldNorm(doc=5883)
        0.28 = coord(7/25)
    
  2. Vilares, J.; Alonso, M.A.; Vilares, M.: Extraction of complex index terms in non-English IR : a shallow parsing based approach (2008) 0.16
    0.16283429 = sum of:
      0.16283429 = product of:
        0.6784762 = sum of:
          0.011228236 = weight(abstract_txt:that in 2107) [ClassicSimilarity], result of:
            0.011228236 = score(doc=2107,freq=3.0), product of:
              0.043774255 = queryWeight, product of:
                1.2934806 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01428258 = queryNorm
              0.2565032 = fieldWeight in 2107, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=2107)
          0.08726976 = weight(abstract_txt:phrase in 2107) [ClassicSimilarity], result of:
            0.08726976 = score(doc=2107,freq=1.0), product of:
              0.19661531 = queryWeight, product of:
                1.9384036 = boost
                7.1017675 = idf(docFreq=98, maxDocs=44218)
                0.01428258 = queryNorm
              0.44386047 = fieldWeight in 2107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1017675 = idf(docFreq=98, maxDocs=44218)
                0.0625 = fieldNorm(doc=2107)
          0.1021426 = weight(abstract_txt:linguistic in 2107) [ClassicSimilarity], result of:
            0.1021426 = score(doc=2107,freq=2.0), product of:
              0.19839612 = queryWeight, product of:
                2.3847768 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.01428258 = queryNorm
              0.51484174 = fieldWeight in 2107, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.0625 = fieldNorm(doc=2107)
          0.16748287 = weight(abstract_txt:phrases in 2107) [ClassicSimilarity], result of:
            0.16748287 = score(doc=2107,freq=2.0), product of:
              0.27587277 = queryWeight, product of:
                2.8121312 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.01428258 = queryNorm
              0.60710186 = fieldWeight in 2107, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.0625 = fieldNorm(doc=2107)
          0.24199446 = weight(abstract_txt:noun in 2107) [ClassicSimilarity], result of:
            0.24199446 = score(doc=2107,freq=2.0), product of:
              0.35258636 = queryWeight, product of:
                3.1791713 = boost
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.01428258 = queryNorm
              0.6863409 = fieldWeight in 2107, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.0625 = fieldNorm(doc=2107)
          0.06835829 = weight(abstract_txt:terms in 2107) [ClassicSimilarity], result of:
            0.06835829 = score(doc=2107,freq=2.0), product of:
              0.19124907 = queryWeight, product of:
                3.3112793 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01428258 = queryNorm
              0.3574307 = fieldWeight in 2107, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2107)
        0.24 = coord(6/25)
    
  3. Bounhas, I.; Elayeb, B.; Evrard, F.; Slimani, Y.: Organizing contextual knowledge for Arabic text disambiguation and terminology extraction (2011) 0.16
    0.16009896 = sum of:
      0.16009896 = product of:
        0.667079 = sum of:
          0.108359896 = weight(abstract_txt:nouns in 4846) [ClassicSimilarity], result of:
            0.108359896 = score(doc=4846,freq=3.0), product of:
              0.12499811 = queryWeight, product of:
                1.0928794 = boost
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.01428258 = queryNorm
              0.8668923 = fieldWeight in 4846, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.008008 = idf(docFreq=39, maxDocs=44218)
                0.0625 = fieldNorm(doc=4846)
          0.06906251 = weight(abstract_txt:terminology in 4846) [ClassicSimilarity], result of:
            0.06906251 = score(doc=4846,freq=1.0), product of:
              0.19256032 = queryWeight, product of:
                2.349441 = boost
                5.7384624 = idf(docFreq=386, maxDocs=44218)
                0.01428258 = queryNorm
              0.3586539 = fieldWeight in 4846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7384624 = idf(docFreq=386, maxDocs=44218)
                0.0625 = fieldNorm(doc=4846)
          0.07222573 = weight(abstract_txt:linguistic in 4846) [ClassicSimilarity], result of:
            0.07222573 = score(doc=4846,freq=1.0), product of:
              0.19839612 = queryWeight, product of:
                2.3847768 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.01428258 = queryNorm
              0.3640481 = fieldWeight in 4846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.0625 = fieldNorm(doc=4846)
          0.118428275 = weight(abstract_txt:phrases in 4846) [ClassicSimilarity], result of:
            0.118428275 = score(doc=4846,freq=1.0), product of:
              0.27587277 = queryWeight, product of:
                2.8121312 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.01428258 = queryNorm
              0.42928585 = fieldWeight in 4846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.0625 = fieldNorm(doc=4846)
          0.17111592 = weight(abstract_txt:noun in 4846) [ClassicSimilarity], result of:
            0.17111592 = score(doc=4846,freq=1.0), product of:
              0.35258636 = queryWeight, product of:
                3.1791713 = boost
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.01428258 = queryNorm
              0.48531634 = fieldWeight in 4846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.0625 = fieldNorm(doc=4846)
          0.12788664 = weight(abstract_txt:terms in 4846) [ClassicSimilarity], result of:
            0.12788664 = score(doc=4846,freq=7.0), product of:
              0.19124907 = queryWeight, product of:
                3.3112793 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01428258 = queryNorm
              0.6686916 = fieldWeight in 4846, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=4846)
        0.24 = coord(6/25)
    
  4. Godby, J.: WordSmith research project bridges gap between tokens and indexes (1998) 0.16
    0.15885329 = sum of:
      0.15885329 = product of:
        0.6618887 = sum of:
          0.013751726 = weight(abstract_txt:that in 4729) [ClassicSimilarity], result of:
            0.013751726 = score(doc=4729,freq=2.0), product of:
              0.043774255 = queryWeight, product of:
                1.2934806 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01428258 = queryNorm
              0.314151 = fieldWeight in 4729, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.09375 = fieldNorm(doc=4729)
          0.068358295 = weight(abstract_txt:text in 4729) [ClassicSimilarity], result of:
            0.068358295 = score(doc=4729,freq=2.0), product of:
              0.12749939 = queryWeight, product of:
                2.2075198 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01428258 = queryNorm
              0.53614604 = fieldWeight in 4729, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=4729)
          0.14650372 = weight(abstract_txt:terminology in 4729) [ClassicSimilarity], result of:
            0.14650372 = score(doc=4729,freq=2.0), product of:
              0.19256032 = queryWeight, product of:
                2.349441 = boost
                5.7384624 = idf(docFreq=386, maxDocs=44218)
                0.01428258 = queryNorm
              0.76081985 = fieldWeight in 4729, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7384624 = idf(docFreq=386, maxDocs=44218)
                0.09375 = fieldNorm(doc=4729)
          0.1776424 = weight(abstract_txt:phrases in 4729) [ClassicSimilarity], result of:
            0.1776424 = score(doc=4729,freq=1.0), product of:
              0.27587277 = queryWeight, product of:
                2.8121312 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.01428258 = queryNorm
              0.64392877 = fieldWeight in 4729, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.09375 = fieldNorm(doc=4729)
          0.07250491 = weight(abstract_txt:terms in 4729) [ClassicSimilarity], result of:
            0.07250491 = score(doc=4729,freq=1.0), product of:
              0.19124907 = queryWeight, product of:
                3.3112793 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01428258 = queryNorm
              0.37911248 = fieldWeight in 4729, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=4729)
          0.18312767 = weight(abstract_txt:technical in 4729) [ClassicSimilarity], result of:
            0.18312767 = score(doc=4729,freq=1.0), product of:
              0.39039412 = queryWeight, product of:
                5.4628234 = boost
                5.0035634 = idf(docFreq=806, maxDocs=44218)
                0.01428258 = queryNorm
              0.46908408 = fieldWeight in 4729, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0035634 = idf(docFreq=806, maxDocs=44218)
                0.09375 = fieldNorm(doc=4729)
        0.24 = coord(6/25)
    
  5. Austin, D.; Sørensen, J.: PRECIS in a multilingual context : Pt.2: A linguistic and logical explanation of the syntax. (1976) 0.15
    0.15415207 = sum of:
      0.15415207 = product of:
        0.55054307 = sum of:
          0.047928207 = weight(abstract_txt:strings in 981) [ClassicSimilarity], result of:
            0.047928207 = score(doc=981,freq=1.0), product of:
              0.104654744 = queryWeight, product of:
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.01428258 = queryNorm
              0.45796496 = fieldWeight in 981, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3274393 = idf(docFreq=78, maxDocs=44218)
                0.0625 = fieldNorm(doc=981)
          0.012122041 = weight(abstract_txt:some in 981) [ClassicSimilarity], result of:
            0.012122041 = score(doc=981,freq=1.0), product of:
              0.05273415 = queryWeight, product of:
                1.0038793 = boost
                3.6779325 = idf(docFreq=3037, maxDocs=44218)
                0.01428258 = queryNorm
              0.22987078 = fieldWeight in 981, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6779325 = idf(docFreq=3037, maxDocs=44218)
                0.0625 = fieldNorm(doc=981)
          0.014495591 = weight(abstract_txt:that in 981) [ClassicSimilarity], result of:
            0.014495591 = score(doc=981,freq=5.0), product of:
              0.043774255 = queryWeight, product of:
                1.2934806 = boost
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.01428258 = queryNorm
              0.3311442 = fieldWeight in 981, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3694751 = idf(docFreq=11241, maxDocs=44218)
                0.0625 = fieldNorm(doc=981)
          0.12509863 = weight(abstract_txt:linguistic in 981) [ClassicSimilarity], result of:
            0.12509863 = score(doc=981,freq=3.0), product of:
              0.19839612 = queryWeight, product of:
                2.3847768 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.01428258 = queryNorm
              0.6305498 = fieldWeight in 981, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.0625 = fieldNorm(doc=981)
          0.118428275 = weight(abstract_txt:phrases in 981) [ClassicSimilarity], result of:
            0.118428275 = score(doc=981,freq=1.0), product of:
              0.27587277 = queryWeight, product of:
                2.8121312 = boost
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.01428258 = queryNorm
              0.42928585 = fieldWeight in 981, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.0625 = fieldNorm(doc=981)
          0.10808395 = weight(abstract_txt:terms in 981) [ClassicSimilarity], result of:
            0.10808395 = score(doc=981,freq=5.0), product of:
              0.19124907 = queryWeight, product of:
                3.3112793 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.01428258 = queryNorm
              0.5651476 = fieldWeight in 981, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=981)
          0.12438634 = weight(abstract_txt:properties in 981) [ClassicSimilarity], result of:
            0.12438634 = score(doc=981,freq=1.0), product of:
              0.3379636 = queryWeight, product of:
                4.018283 = boost
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.01428258 = queryNorm
              0.36804655 = fieldWeight in 981, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.0625 = fieldNorm(doc=981)
        0.28 = coord(7/25)
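
The content scores combine several term matches. Per term, score = queryWeight × fieldWeight, with queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm; the per-term scores are summed and multiplied by coord (matched query terms divided by total query terms). The sketch below recomputes the first content hit (doc=5883) from the values in its listing. The tf and idf formulas are the assumed classic Lucene definitions, the names are chosen for this example, and agreement with the listing is only up to single-precision rounding.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.01428258
FIELD_NORM = 0.0625  # fieldNorm(doc=5883)

# (term, boost, docFreq, termFreq) as shown in the listing for doc 5883;
# "strings" has no boost line there, so a boost of 1.0 is assumed.
TERMS = [
    ("strings", 1.0, 78, 1.0),
    ("adjectives", 1.209655, 16, 1.0),
    ("that", 1.2934806, 11241, 6.0),
    ("adverbs", 1.3125248, 7, 1.0),
    ("terminology", 2.349441, 386, 1.0),
    ("terms", 3.3112793, 2106, 1.0),
    ("properties", 4.018283, 332, 4.0),
]

def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed form: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost, doc_freq, freq):
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight

def document_score(terms, matched, total):
    # Sum of per-term scores, scaled by coord = matched / total
    summed = sum(term_score(b, df, f) for _, b, df, f in terms)
    return summed * (matched / total)

score_5883 = document_score(TERMS, matched=7, total=25)
```

Note that idf enters each term score twice, once through queryWeight and once through fieldWeight, so rare terms such as "adverbs" contribute far more than the frequent word "that" even though the latter occurs six times in the abstract.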