Search (2 results, page 1 of 1)

  • author_ss:"Sebastiani, F."
  • theme_ss:"Automatisches Klassifizieren"
  • theme_ss:"Computerlinguistik"
  1. Sebastiani, F.: ¬A tutorial on automated text categorisation (1999) 0.00
    0.0024924895 = product of:
      0.004984979 = sum of:
        0.004984979 = product of:
          0.009969958 = sum of:
            0.009969958 = weight(_text_:a in 3390) [ClassicSimilarity], result of:
              0.009969958 = score(doc=3390,freq=18.0), product of:
                0.043477926 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.037706986 = queryNorm
                0.22931081 = fieldWeight in 3390, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3390)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
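
The score tree above can be recomputed by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and taking queryNorm, fieldNorm, and the two coord factors directly from the listing. The function name and its parameters are hypothetical, chosen for illustration only.

```python
import math

def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm,
                             coords=(0.5, 0.5)):
    """Recompute a single-term score from the values in an explain tree.

    Assumes the ClassicSimilarity definitions of tf and idf; query_norm,
    field_norm and the coord factors are copied from the listing, not derived.
    """
    tf = math.sqrt(freq)                               # tf(freq=18.0) -> 4.2426...
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))    # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                    # "queryWeight" node
    field_weight = tf * idf * field_norm               # "fieldWeight" node
    score = query_weight * field_weight                # "score(doc=..., freq=...)" node
    for coord in coords:                               # the two coord(1/2) factors
        score *= coord
    return score

# Values copied from the explain output for documents 3390 and 3389.
score_3390 = classic_similarity_score(18.0, 37942, 44218, 0.037706986, 0.046875)
score_3389 = classic_similarity_score(14.0, 37942, 44218, 0.037706986, 0.046875)
print(f"{score_3390:.10f}  {score_3389:.10f}")
```

Only the term frequency differs between the two hits (18 versus 14 occurrences of the term "a"), which is why the first record ranks marginally higher. Lucene computes in single precision, so the last digits may differ slightly from the listing.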
    
    Abstract
    The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to 1960. Until the late '80s, the dominant approach to the problem involved knowledge-engineering automatic categorisers, i.e. manually building a set of rules encoding expert knowledge on how to classify documents. In the '90s, with the booming production and availability of on-line documents, automated text categorisation has witnessed an increased and renewed interest. A newer paradigm based on machine learning has superseded the previous approach. Within this paradigm, a general inductive process automatically builds a classifier by "learning", from a set of previously classified documents, the characteristics of one or more categories; the advantages are very good effectiveness, considerable savings in terms of expert manpower, and domain independence. In this tutorial we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm. Issues of document indexing, classifier construction, and classifier evaluation will be touched upon.
  2. Sebastiani, F.: Machine learning in automated text categorization (2002) 0.00
    0.0021981692 = product of:
      0.0043963385 = sum of:
        0.0043963385 = product of:
          0.008792677 = sum of:
            0.008792677 = weight(_text_:a in 3389) [ClassicSimilarity], result of:
              0.008792677 = score(doc=3389,freq=14.0), product of:
                0.043477926 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.037706986 = queryNorm
                0.20223314 = fieldWeight in 3389, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3389)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
    Type
    a
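
The three problems named in both abstracts (document representation, classifier construction, classifier evaluation) can be illustrated with a short sketch. It assumes a multinomial Naive Bayes learner with add-one smoothing, chosen here purely for brevity and not taken from the records above. All function names and the toy documents are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Document representation: a lowercase bag of whitespace-separated words.
    return text.lower().split()

def train(labelled_docs):
    # Classifier construction: count documents per category and terms per category.
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, category in labelled_docs:
        class_counts[category] += 1
        for token in tokenize(text):
            term_counts[category][token] += 1
            vocab.add(token)
    return class_counts, term_counts, vocab

def classify(model, text):
    # Pick the category with the highest log posterior (add-one smoothing).
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_category, best_log_prob = None, -math.inf
    for category, n_docs in class_counts.items():
        log_prob = math.log(n_docs / total_docs)
        denom = sum(term_counts[category].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # terms never seen in training are ignored
                log_prob += math.log((term_counts[category][token] + 1) / denom)
        if log_prob > best_log_prob:
            best_category, best_log_prob = category, log_prob
    return best_category

def accuracy(model, test_docs):
    # Classifier evaluation: fraction of held-out documents labelled correctly.
    hits = sum(classify(model, text) == category for text, category in test_docs)
    return hits / len(test_docs)

# Toy, hand-made data (hypothetical): preclassified training and test documents.
training_docs = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("team wins football match", "sport"),
    ("player scores late goal", "sport"),
]
test_docs = [
    ("interest rates and shares", "finance"),
    ("football player scores", "sport"),
]

model = train(training_docs)
print(classify(model, "bank shares fall"), accuracy(model, test_docs))
```

No rule is written by hand for either category: the classifier is induced entirely from the preclassified examples, which is the contrast with the knowledge-engineering approach that the abstracts draw.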

Types