Search (10 results, page 1 of 1)

  • Filter: author_ss:"Haas, S.W."
  1. Haas, S.W.: Natural language processing : toward large-scale, robust systems (1996) 0.03
    0.028787265 = coord(1/2) × (0.007654148 + 0.04992038)  [ClassicSimilarity; queryNorm=0.046056706]
      0.007654148 = weight(_text_:a in 7415): tf=2.0 (freq=4.0), idf=1.153047 (docFreq=37942, maxDocs=44218), fieldNorm=0.0625
      0.04992038 = weight(_text_:22 in 7415): tf=1.4142135 (freq=2.0), idf=3.5018296 (docFreq=3622, maxDocs=44218), fieldNorm=0.0625
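    The breakdown above is Lucene ClassicSimilarity (tf-idf) explain output. As a minimal sketch, assuming the standard ClassicSimilarity definitions (tf = sqrt(freq); idf = 1 + ln(maxDocs / (docFreq + 1)); per-term score = queryWeight × fieldWeight), the numbers can be reproduced in a few lines of Python. The queryNorm constant is copied from the explain output; everything else is the textbook formula:

      import math

      def classic_term_score(freq, doc_freq, max_docs, field_norm, query_norm):
          # queryWeight = idf * queryNorm; fieldWeight = tf * idf * fieldNorm
          idf = 1.0 + math.log(max_docs / (doc_freq + 1))
          tf = math.sqrt(freq)
          return (idf * query_norm) * (tf * idf * field_norm)

      QUERY_NORM = 0.046056706  # from the explain output above

      w_a = classic_term_score(4.0, 37942, 44218, 0.0625, QUERY_NORM)   # term "a"
      w_22 = classic_term_score(2.0, 3622, 44218, 0.0625, QUERY_NORM)   # term "22"
      total = 0.5 * (w_a + w_22)  # coord(1/2): one of two top-level clauses matched
      print(w_a, w_22, total)     # ~0.007654148, 0.04992038, 0.028787265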
    
    Abstract
    State of the art review of natural language processing, updating an earlier review published in ARIST 22 (1987). Discusses important developments that have allowed for significant advances in the field of natural language processing: materials and resources; knowledge based systems and statistical approaches; and a strong emphasis on evaluation. Reviews some natural language processing applications and common problems still awaiting solution. Considers closely related applications such as language generation and the generation phase of machine translation, which face the same problems as natural language processing. Covers natural language methodologies for information retrieval only briefly.
    Type
    a
  2. Haas, S.W.: A feasibility study of the case hierarchy model for the construction and porting of natural language interfaces (1990) 0.00
    0.00334869 = coord(1/2) × coord(1/2) × 0.01339476
      0.01339476 = weight(_text_:a in 8071): tf=2.0 (freq=4.0), idf=1.153047, fieldNorm=0.109375
    
    Type
    a
  3. Haas, S.W.: A text filter for the automatic identification of empirical articles (1996) 0.00
    0.00334869 = coord(1/2) × coord(1/2) × 0.01339476
      0.01339476 = weight(_text_:a in 6798): tf=2.0 (freq=4.0), idf=1.153047, fieldNorm=0.109375
    
    Type
    a
  4. Haas, S.W.; Grams, E.S.: Readers, authors, and page structure : a discussion of four questions arising from a content analysis of Web pages (2000) 0.00
    0.0032090992 = coord(1/2) × coord(1/2) × 0.012836397
      0.012836397 = weight(_text_:a in 4387): tf=4.472136 (freq=20.0), idf=1.153047, fieldNorm=0.046875
    
    Abstract
    Previous research describing Web page and link classification systems resulting from a content analysis of over 75 Web pages left us with four unanswered questions: (1) What is the most useful application of page types: as descriptions of entire pages, or as components that are combined to create pages? (2) Is there a kind of analysis that we can perform on isolated anchors, which can be text, icons, or both together, that is equivalent to the syntactic analysis for embedded and labeled anchors? (3) How explicitly are readers informed about what can be found by traversing a link, especially for the relatively broad categories of expansion and resource links? (4) Is there a relationship between the type of link and whether its target is a whole page or a fragment, or whether its target is in the same site or a different site than its source? This article examines these questions.
    Type
    a
  5. Haas, S.W.: Improving the search environment : informed decision making in the search for statistical information (2003) 0.00
    0.0023919214 = coord(1/2) × coord(1/2) × 0.009567685
      0.009567685 = weight(_text_:a in 1687): tf=4.0 (freq=16.0), idf=1.153047, fieldNorm=0.0390625
    
    Abstract
    A search for information can be viewed as a series of decisions made by the searcher. Two dimensions of the search environment affect a user's decisions: the user's knowledge, and the configuration of the information retrieval system. Drawing on previous findings on users' lack of search or domain knowledge, this article investigates what the user needs to know to make informed search decisions at the United States Bureau of Labor Statistics (BLS) Web site, which provides statistical information on labor and related topics. Its extensive Web site is a rich collection of statistical information, ranging from individual statistics such as the current Consumer Price Index (CPI), to a large statistical database called LABSTAT that can be queried to construct a table or time series on the fly. Two models of the search environment and the query process in LABSTAT are presented. They provide complementary views of the decision points at which help may be needed, and also suggest useful help content. Extensive examples based on the industry concept illustrate how the information could assist users' search decisions. The article concludes with a discussion of the role of help facilities in Web searching, and the interesting question of how to initiate the provision of help.
    Type
    a
  6. Losee, R.M.; Haas, S.W.: Sublanguage terms : dictionaries, usage, and automatic classification (1995) 0.00
    0.0023435948 = coord(1/2) × coord(1/2) × 0.009374379
      0.009374379 = weight(_text_:a in 2650): tf=2.4494898 (freq=6.0), idf=1.153047, fieldNorm=0.0625
    
    Abstract
    The use of terms from natural and social science titles and abstracts is studied from the perspective of sublanguages and their specialized dictionaries. Explores different notions of sublanguage distinctiveness. Objective methods for separating hard and soft sciences are suggested based on measures of sublanguage use, dictionary characteristics, and sublanguage distinctiveness. Abstracts were automatically classified with a high degree of accuracy by using a formula that considers the degree of uniqueness of terms in each sublanguage. This may prove useful for text filtering in information retrieval systems.
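    The classification formula itself is not given in the abstract; purely as an illustration of "degree of uniqueness of terms in each sublanguage", a hypothetical scorer might weight each term by the inverse of the number of sublanguage dictionaries it appears in (the names and the weighting rule below are assumptions, not the authors' method):

      from collections import Counter

      def classify_abstract(tokens, sublanguage_dicts):
          # A term found in only one dictionary counts fully; a term shared
          # by k dictionaries contributes 1/k of its frequency.
          counts = Counter(tokens)
          spread = {t: sum(t in d for d in sublanguage_dicts.values()) for t in counts}
          scores = {name: sum(freq / spread[t] for t, freq in counts.items() if t in vocab)
                    for name, vocab in sublanguage_dicts.items()}
          return max(scores, key=scores.get)

      dicts = {"physics": {"quark", "lattice", "momentum"},
               "sociology": {"cohort", "survey", "momentum"}}
      print(classify_abstract("lattice momentum quark".split(), dicts))  # -> physics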
    Type
    a
  7. Metzler, D.P.; Haas, S.W.; Cosic, C.L.; Wheeler, L.H.: Constituent object parsing for information retrieval and similar text processing problems (1989) 0.00
    0.0023435948 = coord(1/2) × coord(1/2) × 0.009374379
      0.009374379 = weight(_text_:a in 2858): tf=2.4494898 (freq=6.0), idf=1.153047, fieldNorm=0.0625
    
    Abstract
    Describes the architecture and functioning of the Constituent Object Parser. This system has been developed specially for text processing applications such as information retrieval, which can benefit from structural comparisons between elements of text such as a query and a potentially relevant abstract. Describes the general way in which this objective influenced the design of the system.
    Type
    a
  8. Haas, S.W.; Losee, R.M.: Looking in text windows : their size and composition (1994) 0.00
    0.002269176 = coord(1/2) × coord(1/2) × 0.009076704
      0.009076704 = weight(_text_:a in 8525): tf=3.1622777 (freq=10.0), idf=1.153047, fieldNorm=0.046875
    
    Abstract
    A text window is a group of words appearing in contiguous positions in text, used to exploit a variety of lexical, syntactic, and semantic relationships without having to analyze the text explicitly for its structure. This supports the previously suggested idea that natural groupings of words are best treated as a unit of size 7 to 11 words, that is, plus or minus 3 to 5 words. The text retrieval experiments varying the size of windows, both with full text and with stopwords removed, support these size ranges. The characteristics of windows that best match terms in queries are examined in detail, revealing interesting differences between those for queries with good results and those for queries with poorer results. Queries with good results tend to contain more content word phrases and fewer terms with high frequency of use in the database. Information retrieval systems may benefit from expanding thesaurus-style relationships or incorporating statistical dependencies for terms within these windows.
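    As a toy sketch of the windowing idea (not the authors' implementation), one can enumerate fixed-size windows after optional stopword removal; the size of 9 below sits in the 7-to-11 range reported above:

      def text_windows(text, size=9, stopwords=frozenset()):
          # Yield successive groups of `size` contiguous words; removing
          # stopwords first lets a window span a wider stretch of the text.
          # Texts shorter than `size` yield a single truncated window.
          words = [w for w in text.lower().split() if w not in stopwords]
          for i in range(max(len(words) - size + 1, 1)):
              yield words[i:i + size]

      sample = "a text window is a group of words appearing in contiguous positions in text"
      for w in text_windows(sample, size=9, stopwords={"a", "of", "in", "is"}):
          print(" ".join(w))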
    Type
    a
  9. Haas, S.W.: Disciplinary variation in automatic sublanguage term identification (1997) 0.00
    0.0020714647 = coord(1/2) × coord(1/2) × 0.008285859
      0.008285859 = weight(_text_:a in 6500): tf=3.4641016 (freq=12.0), idf=1.153047, fieldNorm=0.0390625
    
    Abstract
    The research presented here describes a method for automatically identifying sublanguage (SL) domain terms and revealing the patterns in which they occur in text. By applying this method to abstracts from a variety of disciplines, differences in how SL domain terminology occurs can be discerned. Results of this research have both practical and theoretical implications. These include 1) the identification of patterns of domain term occurrence, 2) a step toward the identification of families of SLs that share term occurrence patterns, 3) a domain term extraction procedure that can exploit the term occurrence patterns, and 4) evidence to support the intuitive notion of a continuum of 'technicality' of disciplines and their SLs. The investigation revealed relatively consistent differences between the hard sciences, such as physics or biology, and the social sciences and humanities, such as history or sociology. The hard sciences tended to have more domain terms, and more of these terms occurred in sequences than in the social sciences and humanities. The seed terms used in this research occurred adjacent to domain terms more often in the hard sciences than in the social sciences. The extraction process was more successful in the hard science disciplines than in the social sciences, identifying more of the domain terms while extracting fewer general terms.
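    A rough sketch of the adjacency pattern, under the assumption (made here for illustration, not stated in the abstract) that any non-stopword immediately next to a seed term becomes a domain-term candidate:

      def candidate_domain_terms(tokens, seeds, stopwords=frozenset()):
          # Collect words adjacent to seed terms, exploiting the reported
          # tendency of seeds to occur next to domain terms.
          candidates = set()
          for i, tok in enumerate(tokens):
              if tok in seeds:
                  for j in (i - 1, i + 1):
                      if 0 <= j < len(tokens) and tokens[j] not in seeds | stopwords:
                          candidates.add(tokens[j])
          return candidates

      tokens = "we measure the hadron spectrum using lattice methods".split()
      print(candidate_domain_terms(tokens, {"spectrum", "methods"}, {"we", "the", "using"}))
      # -> {'hadron', 'lattice'}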
    Type
    a
  10. Metzler, D.P.; Haas, S.W.: The constituent object parser : syntactic structure matching for information retrieval (1989) 0.00
    0.0016913437 = coord(1/2) × coord(1/2) × 0.006765375
      0.006765375 = weight(_text_:a in 3607): tf=1.4142135 (freq=2.0), idf=1.153047, fieldNorm=0.078125
    
    Type
    a