Document (#11720)

Author
Losee, R.M.
Haas, S.W.
Title
Sublanguage terms : dictionaries, usage, and automatic classification
Source
Journal of the American Society for Information Science. 46(1995) no.7, pp.519-529
Year
1995
Abstract
The use of terms from natural and social science titles and abstracts is studied from the perspective of sublanguages and their specialized dictionaries. Explores different notions of sublanguage distinctiveness. Objective methods for separating hard and soft sciences are suggested based on measures of sublanguage use, dictionary characteristics, and sublanguage distinctiveness. Abstracts were automatically classified with a high degree of accuracy by using a formula that considers the degree of uniqueness of terms in each sublanguage. This may prove useful for text filtering or information retrieval systems.
Theme
Automatic classification

Similar documents (author)

  1. Haas, S.W.; Losee, R.M.: Looking in text windows : their size and composition (1994) 6.05
    6.046562 = sum of:
      6.046562 = sum of:
        2.7530165 = weight(author_txt:losee in 8525) [ClassicSimilarity], result of:
          2.7530165 = score(doc=8525,freq=1.0), product of:
            0.66372216 = queryWeight, product of:
              8.29569 = idf(docFreq=29, maxDocs=44218)
              0.080008075 = queryNorm
            4.147845 = fieldWeight in 8525, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.29569 = idf(docFreq=29, maxDocs=44218)
              0.5 = fieldNorm(doc=8525)
        3.2935457 = weight(author_txt:haas in 8525) [ClassicSimilarity], result of:
          3.2935457 = score(doc=8525,freq=1.0), product of:
            0.7479793 = queryWeight, product of:
              1.0615773 = boost
              8.806516 = idf(docFreq=17, maxDocs=44218)
              0.080008075 = queryNorm
            4.403258 = fieldWeight in 8525, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.806516 = idf(docFreq=17, maxDocs=44218)
              0.5 = fieldNorm(doc=8525)
    
  2. Haas, S.W.: A feasibility study of the case hierarchy model for the construction and porting of natural language interfaces (1990) 2.06
    2.058466 = sum of:
      2.058466 = product of:
        4.116932 = sum of:
          4.116932 = weight(author_txt:haas in 8071) [ClassicSimilarity], result of:
            4.116932 = score(doc=8071,freq=1.0), product of:
              0.7479793 = queryWeight, product of:
                1.0615773 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.080008075 = queryNorm
              5.504072 = fieldWeight in 8071, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.625 = fieldNorm(doc=8071)
        0.5 = coord(1/2)
    
  3. Haas, S.W.: Disciplinary variation in automatic sublanguage term identification (1997) 2.06
    2.058466 = sum of:
      2.058466 = product of:
        4.116932 = sum of:
          4.116932 = weight(author_txt:haas in 6500) [ClassicSimilarity], result of:
            4.116932 = score(doc=6500,freq=1.0), product of:
              0.7479793 = queryWeight, product of:
                1.0615773 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.080008075 = queryNorm
              5.504072 = fieldWeight in 6500, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.625 = fieldNorm(doc=6500)
        0.5 = coord(1/2)
    
  4. Haas, S.W.: A text filter for the automatic identification of empirical articles (1996) 2.06
    2.058466 = sum of:
      2.058466 = product of:
        4.116932 = sum of:
          4.116932 = weight(author_txt:haas in 6798) [ClassicSimilarity], result of:
            4.116932 = score(doc=6798,freq=1.0), product of:
              0.7479793 = queryWeight, product of:
                1.0615773 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.080008075 = queryNorm
              5.504072 = fieldWeight in 6798, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.625 = fieldNorm(doc=6798)
        0.5 = coord(1/2)
    
  5. Haas, S.W.: Natural language processing : toward large-scale, robust systems (1996) 2.06
    2.058466 = sum of:
      2.058466 = product of:
        4.116932 = sum of:
          4.116932 = weight(author_txt:haas in 7415) [ClassicSimilarity], result of:
            4.116932 = score(doc=7415,freq=1.0), product of:
              0.7479793 = queryWeight, product of:
                1.0615773 = boost
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.080008075 = queryNorm
              5.504072 = fieldWeight in 7415, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.806516 = idf(docFreq=17, maxDocs=44218)
                0.625 = fieldNorm(doc=7415)
        0.5 = coord(1/2)
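
The author-similarity scores above can be recomputed from the numbers shown in the trees. The following is a minimal Python sketch, not the search engine's own code: the function names are hypothetical, `queryNorm`, `boost`, and `fieldNorm` are copied from the listing as given, and the idf formula is the classic TF-IDF form that happens to reproduce the printed idf values. Each term contributes queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), and the sum is scaled by coord, the fraction of query terms that matched.

```python
import math

MAX_DOCS = 44218          # maxDocs, copied from the listing
QUERY_NORM = 0.080008075  # queryNorm for the author query, copied from the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency; this form matches the idf values printed above."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, boost=1.0):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def coord(matched, total):
    """Fraction of query terms that the document matched."""
    return matched / total


# Entry 1 (doc 8525): both author names match, so coord is 2/2 = 1.
losee = term_score(freq=1.0, doc_freq=29, field_norm=0.5)
haas = term_score(freq=1.0, doc_freq=17, field_norm=0.5, boost=1.0615773)
entry1 = coord(2, 2) * (losee + haas)

# Entry 2 (doc 8071): only "haas" matches, so the sum is halved by coord(1/2).
entry2 = coord(1, 2) * term_score(freq=1.0, doc_freq=17, field_norm=0.625,
                                  boost=1.0615773)

print(f"entry 1: {entry1:.4f}")
print(f"entry 2: {entry2:.4f}")
```

The results should agree with the listed 6.046562 and 2.058466 up to floating-point rounding. Entries 2 to 5 share one score because they match the same single term with the same fieldNorm.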
    

Similar documents (content)

  1. Haas, S.; He, S.: Toward the automatic identification of sublanguage vocabulary (1993) 0.24
    0.23870006 = sum of:
      0.23870006 = product of:
        1.9891672 = sum of:
          0.009686642 = weight(abstract_txt:from in 4891) [ClassicSimilarity], result of:
            0.009686642 = score(doc=4891,freq=1.0), product of:
              0.028037783 = queryWeight, product of:
                1.0019748 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.010124353 = queryNorm
              0.34548533 = fieldWeight in 4891, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.125 = fieldNorm(doc=4891)
          0.13760784 = weight(abstract_txt:abstracts in 4891) [ClassicSimilarity], result of:
            0.13760784 = score(doc=4891,freq=2.0), product of:
              0.13053098 = queryWeight, product of:
                2.1619308 = boost
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.010124353 = queryNorm
              1.0542159 = fieldWeight in 4891, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.125 = fieldNorm(doc=4891)
          1.8418727 = weight(abstract_txt:sublanguage in 4891) [ClassicSimilarity], result of:
            1.8418727 = score(doc=4891,freq=3.0), product of:
              0.87245053 = queryWeight, product of:
                8.837418 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.010124353 = queryNorm
              2.1111486 = fieldWeight in 4891, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.125 = fieldNorm(doc=4891)
        0.12 = coord(3/25)
    
  2. Haas, S.W.: Disciplinary variation in automatic sublanguage term identification (1997) 0.18
    0.1751523 = sum of:
      0.1751523 = product of:
        0.7298013 = sum of:
          0.019258961 = weight(abstract_txt:automatically in 6500) [ClassicSimilarity], result of:
            0.019258961 = score(doc=6500,freq=1.0), product of:
              0.05585474 = queryWeight, product of:
                5.5168705 = idf(docFreq=482, maxDocs=44218)
                0.010124353 = queryNorm
              0.3448044 = fieldWeight in 6500, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5168705 = idf(docFreq=482, maxDocs=44218)
                0.0625 = fieldNorm(doc=6500)
          0.004843321 = weight(abstract_txt:from in 6500) [ClassicSimilarity], result of:
            0.004843321 = score(doc=6500,freq=1.0), product of:
              0.028037783 = queryWeight, product of:
                1.0019748 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.010124353 = queryNorm
              0.17274266 = fieldWeight in 6500, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=6500)
          0.06514186 = weight(abstract_txt:hard in 6500) [ClassicSimilarity], result of:
            0.06514186 = score(doc=6500,freq=4.0), product of:
              0.0792851 = queryWeight, product of:
                1.1914225 = boost
                6.572923 = idf(docFreq=167, maxDocs=44218)
                0.010124353 = queryNorm
              0.8216154 = fieldWeight in 6500, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.572923 = idf(docFreq=167, maxDocs=44218)
                0.0625 = fieldNorm(doc=6500)
          0.048651718 = weight(abstract_txt:abstracts in 6500) [ClassicSimilarity], result of:
            0.048651718 = score(doc=6500,freq=1.0), product of:
              0.13053098 = queryWeight, product of:
                2.1619308 = boost
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.010124353 = queryNorm
              0.3727216 = fieldWeight in 6500, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.963546 = idf(docFreq=308, maxDocs=44218)
                0.0625 = fieldNorm(doc=6500)
          0.060202595 = weight(abstract_txt:terms in 6500) [ClassicSimilarity], result of:
            0.060202595 = score(doc=6500,freq=7.0), product of:
              0.09003044 = queryWeight, product of:
                2.1990001 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.010124353 = queryNorm
              0.6686916 = fieldWeight in 6500, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=6500)
          0.5317028 = weight(abstract_txt:sublanguage in 6500) [ClassicSimilarity], result of:
            0.5317028 = score(doc=6500,freq=1.0), product of:
              0.87245053 = queryWeight, product of:
                8.837418 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.010124353 = queryNorm
              0.6094361 = fieldWeight in 6500, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0625 = fieldNorm(doc=6500)
        0.24 = coord(6/25)
    
  3. Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.09
    0.08584738 = sum of:
      0.08584738 = product of:
        1.0730922 = sum of:
          0.009686642 = weight(abstract_txt:from in 4709) [ClassicSimilarity], result of:
            0.009686642 = score(doc=4709,freq=1.0), product of:
              0.028037783 = queryWeight, product of:
                1.0019748 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.010124353 = queryNorm
              0.34548533 = fieldWeight in 4709, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.125 = fieldNorm(doc=4709)
          1.0634056 = weight(abstract_txt:sublanguage in 4709) [ClassicSimilarity], result of:
            1.0634056 = score(doc=4709,freq=1.0), product of:
              0.87245053 = queryWeight, product of:
                8.837418 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.010124353 = queryNorm
              1.2188722 = fieldWeight in 4709, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.125 = fieldNorm(doc=4709)
        0.08 = coord(2/25)
    
  4. Hutchins, J.: A new era in machine translation research (1995) 0.08
    0.08468258 = sum of:
      0.08468258 = product of:
        0.7056882 = sum of:
          0.006054152 = weight(abstract_txt:from in 3846) [ClassicSimilarity], result of:
            0.006054152 = score(doc=3846,freq=1.0), product of:
              0.028037783 = queryWeight, product of:
                1.0019748 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.010124353 = queryNorm
              0.21592833 = fieldWeight in 3846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=3846)
          0.035005458 = weight(abstract_txt:specialized in 3846) [ClassicSimilarity], result of:
            0.035005458 = score(doc=3846,freq=1.0), product of:
              0.071689464 = queryWeight, product of:
                1.1329159 = boost
                6.2501497 = idf(docFreq=231, maxDocs=44218)
                0.010124353 = queryNorm
              0.48829293 = fieldWeight in 3846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2501497 = idf(docFreq=231, maxDocs=44218)
                0.078125 = fieldNorm(doc=3846)
          0.66462857 = weight(abstract_txt:sublanguage in 3846) [ClassicSimilarity], result of:
            0.66462857 = score(doc=3846,freq=1.0), product of:
              0.87245053 = queryWeight, product of:
                8.837418 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.010124353 = queryNorm
              0.7617951 = fieldWeight in 3846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.078125 = fieldNorm(doc=3846)
        0.12 = coord(3/25)
    
  5. Ananiadou, S.; McNaught, J.: Terms are not alone : term choice and choice terms (1995) 0.08
    0.08025857 = sum of:
      0.08025857 = product of:
        1.0032321 = sum of:
          0.07275207 = weight(abstract_txt:degree in 1791) [ClassicSimilarity], result of:
            0.07275207 = score(doc=1791,freq=1.0), product of:
              0.11754018 = queryWeight, product of:
                2.0515313 = boost
                5.659016 = idf(docFreq=418, maxDocs=44218)
                0.010124353 = queryNorm
              0.6189549 = fieldWeight in 1791, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.659016 = idf(docFreq=418, maxDocs=44218)
                0.109375 = fieldNorm(doc=1791)
          0.93048 = weight(abstract_txt:sublanguage in 1791) [ClassicSimilarity], result of:
            0.93048 = score(doc=1791,freq=1.0), product of:
              0.87245053 = queryWeight, product of:
                8.837418 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.010124353 = queryNorm
              1.0665132 = fieldWeight in 1791, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.109375 = fieldNorm(doc=1791)
        0.08 = coord(2/25)
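
The content-similarity scores combine several abstract terms, each with its own boost and term frequency, and then scale the sum by the share of the 25 query terms that matched. As an illustration, the sketch below recomputes the first content entry (doc 4891) from the values printed in its tree. It is an assumption-laden sketch: the variable and function names are hypothetical, and the boost, idf, queryNorm, and fieldNorm values are simply copied from the listing rather than derived.

```python
import math

QUERY_NORM = 0.010124353  # queryNorm for the content query, copied from the listing
FIELD_NORM = 0.125        # fieldNorm(doc=4891), copied from the listing

# (term, boost, idf, termFreq) for the three terms matched in doc 4891
matched_terms = [
    ("from",        1.0019748, 2.7638826, 1.0),
    ("abstracts",   2.1619308, 5.963546,  2.0),
    ("sublanguage", 8.837418,  9.7509775, 3.0),
]
TOTAL_QUERY_TERMS = 25    # denominator of coord(3/25)


def weight(boost, idf, freq, field_norm=FIELD_NORM, query_norm=QUERY_NORM):
    """One term's contribution: queryWeight * fieldWeight, with tf = sqrt(freq)."""
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


contributions = {term: weight(boost, idf, freq)
                 for term, boost, idf, freq in matched_terms}
raw_sum = sum(contributions.values())
coord = len(matched_terms) / TOTAL_QUERY_TERMS
score = coord * raw_sum

for term, value in contributions.items():
    print(f"{term:12s} {value:.6f}")
print(f"score        {score:.6f}")
```

The rare, heavily boosted term "sublanguage" supplies almost all of the raw sum, which is why the second entry (doc 6500) ranks lower despite matching six terms instead of three: there "sublanguage" occurs once rather than three times and the fieldNorm is 0.0625 rather than 0.125.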