Search (1 result, page 1 of 1)

  • author_ss:"Giorgetti, D."
  • theme_ss:"Automatisches Klassifizieren" (automatic classification)
  1. Giorgetti, D.; Sebastiani, F.: Automating survey coding by multiclass text categorization techniques (2003) 0.01
    0.0051961667 = product of:
      0.020784667 = sum of:
        0.020784667 = product of:
          0.051961668 = sum of:
            0.0264534 = weight(_text_:28 in 5172) [ClassicSimilarity], result of:
              0.0264534 = score(doc=5172,freq=2.0), product of:
                0.13367462 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03731569 = queryNorm
                0.19789396 = fieldWeight in 5172, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5172)
            0.025508268 = weight(_text_:29 in 5172) [ClassicSimilarity], result of:
              0.025508268 = score(doc=5172,freq=2.0), product of:
                0.13126493 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03731569 = queryNorm
                0.19432661 = fieldWeight in 5172, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5172)
          0.4 = coord(2/5)
      0.25 = coord(1/4)
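The explain tree above follows Lucene's ClassicSimilarity (TF-IDF) scoring. The Python below is a minimal sketch that recomputes it from the numbers shown. It assumes the standard ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) and a query boost of 1, and it takes queryNorm and fieldNorm directly from the output rather than recomputing them. The function names are illustrative only. Lucene itself computes in 32-bit floats, so the last digits may differ slightly.

```python
import math

# Illustrative helpers (hypothetical names) mirroring ClassicSimilarity's TF-IDF formulas.
def idf(doc_freq: int, max_docs: int) -> float:
    # idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq: float) -> float:
    # tf = sqrt(term frequency in the field)
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    i = idf(doc_freq, max_docs)
    query_weight = i * query_norm            # assumes query boost = 1
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight

# Constants copied from the explain output for doc 5172.
QUERY_NORM = 0.03731569
FIELD_NORM = 0.0390625
MAX_DOCS = 44218

s28 = term_score(2.0, 3342, MAX_DOCS, QUERY_NORM, FIELD_NORM)  # term "28"
s29 = term_score(2.0, 3565, MAX_DOCS, QUERY_NORM, FIELD_NORM)  # term "29"

inner = (s28 + s29) * (2 / 5)  # coord(2/5): 2 of 5 inner clauses matched
total = inner * (1 / 4)        # coord(1/4): 1 of 4 outer clauses matched

print(f"term 28: {s28:.6f}, term 29: {s29:.6f}, total: {total:.7f}")
```

The final value comes out at about 0.0052, which the result list displays rounded as 0.01.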
    
    Abstract
    In this issue Giorgetti and Sebastiani suggest that answers to open-ended questions in survey instruments can be coded automatically by creating classifiers that learn from training sets of manually coded answers. The only manual effort required is classifying a representative set of documents, not building a dictionary of words that trigger a category assignment. They use a naive Bayesian probabilistic learner from McCallum's RAINBOW package and the multiclass support vector machine learner from Hsu and Lin's BSVM package, both examples of text categorization techniques. Data from the 1996 General Social Survey by the U.S. National Opinion Research Center provided a set of answers to three questions (previously tested by Viechnicki using a dictionary approach), their associated manually assigned category codes, and a complete set of predefined category codes. The learners were run on three random disjoint subsets of the answer sets to create the classifiers, and a remaining set was used as the test set. RAINBOW outperforms the dictionary approach by 18% and BSVM by 17%, while the standard deviation of the results is reduced by 28% and 34%, respectively, relative to the dictionary approach.
    Date
    9. 7.2006 10:29:12
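The core idea in the abstract is learning a coder from manually coded answers instead of hand-building a trigger dictionary. It can be illustrated with a toy multinomial naive Bayes classifier. This is a minimal sketch under stated assumptions: it is not the RAINBOW implementation or the authors' experimental setup. The whitespace tokenizer and Laplace smoothing are choices made here. The example answers and category codes (CRIME, ECONOMY) are invented for illustration and are not taken from the General Social Survey data.

```python
import math
from collections import Counter, defaultdict

# Toy multinomial naive Bayes with Laplace (add-one) smoothing.
# Generic sketch only, NOT RAINBOW; training answers and codes are invented.

def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase + whitespace split.
    return text.lower().split()

def train(examples):
    """examples: list of (answer_text, code) pairs assigned by human coders."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, code in examples:
        class_counts[code] += 1
        for w in tokenize(text):
            word_counts[code][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def classify(model, text):
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_code, best_score = None, -math.inf
    for code, n in class_counts.items():
        total = sum(word_counts[code].values())
        score = math.log(n / n_docs)  # log prior P(code)
        for w in tokenize(text):
            # Smoothed log likelihood log P(w | code); unseen words get count 0 + 1.
            score += math.log((word_counts[code][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best_code, best_score = code, score
    return best_code

training = [
    ("too much crime in the streets", "CRIME"),
    ("crime and violence are rising", "CRIME"),
    ("no jobs and high unemployment", "ECONOMY"),
    ("prices and unemployment keep going up", "ECONOMY"),
]
model = train(training)
print(classify(model, "unemployment is the biggest worry"))
```

Here the only manual input is the coded training list, which mirrors the abstract's point that no trigger-word dictionary has to be written. A real coder would need far more training answers per code.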