Search (4 results, page 1 of 1)

Haas, S.W.: Natural language processing : toward large-scale, robust systems (1996) 0.04

0.04443473 = product of:
  0.08886946 = sum of:
    0.08886946 = sum of:
      0.03211372 = weight(_text_:science in 7415) [ClassicSimilarity], result of:
        0.03211372 = score(doc=7415,freq=2.0), product of:
          0.13793045 = queryWeight, product of:
            2.6341193 = idf(docFreq=8627, maxDocs=44218)
            0.052363027 = queryNorm
          0.23282544 = fieldWeight in 7415, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            2.6341193 = idf(docFreq=8627, maxDocs=44218)
            0.0625 = fieldNorm(doc=7415)
      0.05675574 = weight(_text_:22 in 7415) [ClassicSimilarity], result of:
        0.05675574 = score(doc=7415,freq=2.0), product of:
          0.1833664 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.052363027 = queryNorm
          0.30952093 = fieldWeight in 7415, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0625 = fieldNorm(doc=7415)
  0.5 = coord(1/2)
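
The score tree above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function and variable names are illustrative and do not come from the search system. All inputs are copied from the explanation for doc 7415. Lucene computes in single precision, so the last digits may differ slightly.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as defined by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """Score of one query term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                     # 1.4142135 for freq=2.0
    idf = classic_idf(doc_freq, max_docs)    # 2.6341193 for docFreq=8627
    query_weight = idf * query_norm          # query-side factor
    field_weight = tf * idf * field_norm     # document-side factor
    return query_weight * field_weight

# Values taken from the explanation for doc 7415.
QUERY_NORM = 0.052363027
MAX_DOCS = 44218

science = term_score(2.0, 8627, MAX_DOCS, QUERY_NORM, 0.0625)
twenty_two = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, 0.0625)

# coord(1/2) halves the sum because one of two top-level clauses matched.
total = 0.5 * (science + twenty_two)

print(f"science={science:.8f} 22={twenty_two:.8f} total={total:.8f}")
```

The remaining three results match only the term "science" and carry two nested coord(1/2) factors, so their final score is one quarter of the single term score.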

Abstract: State-of-the-art review of natural language processing, updating an earlier review published in ARIST 22(1987). Discusses important developments that have allowed for significant advances in the field of natural language processing: materials and resources; knowledge-based systems and statistical approaches; and a strong emphasis on evaluation. Reviews some natural language processing applications and common problems still awaiting solution. Considers closely related applications, such as language generation and the generation phase of machine translation, which face the same problems as natural language processing. Covers natural language methodologies for information retrieval only briefly
Source: Annual review of information science and technology. 31(1996), S.83-119

Haas, S.W.: ¬A text filter for the automatic identification of empirical articles (1996) 0.01

0.014049753 = product of:
  0.028099505 = sum of:
    0.028099505 = product of:
      0.05619901 = sum of:
        0.05619901 = weight(_text_:science in 6798) [ClassicSimilarity], result of:
          0.05619901 = score(doc=6798,freq=2.0), product of:
            0.13793045 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.052363027 = queryNorm
            0.40744454 = fieldWeight in 6798, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.109375 = fieldNorm(doc=6798)
      0.5 = coord(1/2)
  0.5 = coord(1/2)

Source: Journal of the American Society for Information Science. 47(1996) no.2, S.167-169

Losee, R.M.; Haas, S.W.: Sublanguage terms : dictionaries, usage, and automatic classification (1995) 0.01
```
0.011353915 = product of:
  0.02270783 = sum of:
    0.02270783 = product of:
      0.04541566 = sum of:
        0.04541566 = weight(_text_:science in 2650) [ClassicSimilarity], result of:
          0.04541566 = score(doc=2650,freq=4.0), product of:
            0.13793045 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.052363027 = queryNorm
            0.3292649 = fieldWeight in 2650, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0625 = fieldNorm(doc=2650)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: The use of terms from natural and social science titles and abstracts is studied from the perspective of sublanguages and their specialized dictionaries. Explores different notions of sublanguage distinctiveness. Objective methods for separating hard and soft sciences are suggested based on measures of sublanguage use, dictionary characteristics, and sublanguage distinctiveness. Abstracts were automatically classified with a high degree of accuracy by using a formula that considers the degree of uniqueness of terms in each sublanguage. This may prove useful for text filtering of information retrieval systems
Source: Journal of the American Society for Information Science. 46(1995) no.7, S.519-529

Haas, S.W.: Disciplinary variation in automatic sublanguage term identification (1997) 0.01
```
0.0070961965 = product of:
  0.014192393 = sum of:
    0.014192393 = product of:
      0.028384786 = sum of:
        0.028384786 = weight(_text_:science in 6500) [ClassicSimilarity], result of:
          0.028384786 = score(doc=6500,freq=4.0), product of:
            0.13793045 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.052363027 = queryNorm
            0.20579056 = fieldWeight in 6500, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6500)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract: The research presented here describes a method for automatically identifying sublanguage (SL) domain terms and revealing the patterns in which they occur in text. By applying this method to abstracts from a variety of disciplines, differences in how SL domain terminology occurs can be discerned. Results of this research have both practical and theoretical implications. These include 1) the identification of patterns of domain term occurrence, 2) a step toward the identification of families of SLs that share term occurrence patterns, 3) a domain term extraction procedure that can exploit the term occurrence patterns, and 4) evidence to support the intuitive notion of a continuum of 'technicality' of disciplines and their SLs. The investigation revealed relatively consistent differences between the hard sciences, such as physics or biology, and the social sciences and humanities, such as history or sociology. The hard sciences tended to have more domain terms, and more of these terms occurred in sequences than in the social sciences and humanities. The seed terms used in this research occurred adjacent to domain terms more often in the hard sciences than in the social sciences. The extraction process was more successful in the hard science disciplines than in the social sciences, identifying more of the domain terms while extracting fewer general terms
Source: Journal of the American Society for Information Science. 48(1997) no.1, S.67-79
