Search (1 result, page 1 of 1)

  • × author_ss:"Goller, C."
  • × theme_ss:"Automatisches Indexieren"
  • × type_ss:"a"
  1. Goller, C.; Löning, J.; Will, T.; Wolff, W.: Automatic document classification : a thorough evaluation of various methods (2000) 0.00
    0.004308666 = product of:
      0.049549658 = sum of:
        0.04483575 = weight(_text_:informationswissenschaft in 5480) [ClassicSimilarity], result of:
          0.04483575 = score(doc=5480,freq=4.0), product of:
            0.10616633 = queryWeight, product of:
              4.504705 = idf(docFreq=1328, maxDocs=44218)
              0.023567878 = queryNorm
            0.42231607 = fieldWeight in 5480, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.504705 = idf(docFreq=1328, maxDocs=44218)
              0.046875 = fieldNorm(doc=5480)
        0.0047139092 = product of:
          0.0094278185 = sum of:
            0.0094278185 = weight(_text_:1 in 5480) [ClassicSimilarity], result of:
              0.0094278185 = score(doc=5480,freq=2.0), product of:
                0.057894554 = queryWeight, product of:
                  2.4565027 = idf(docFreq=10304, maxDocs=44218)
                  0.023567878 = queryNorm
                0.16284466 = fieldWeight in 5480, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4565027 = idf(docFreq=10304, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5480)
          0.5 = coord(1/2)
      0.08695652 = coord(2/23)
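
The score breakdown above can be retraced by hand. The following is a minimal sketch, not the search engine's own code: it assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are the forms that reproduce the printed values here, and it takes queryNorm, fieldNorm, and the coord factors directly from the listing instead of deriving them. The function and variable names are illustrative only.

```python
import math

# Constants copied from the explanation tree above.
QUERY_NORM = 0.023567878
FIELD_NORM = 0.046875
MAX_DOCS = 44218


def idf(doc_freq, max_docs):
    """Inverse document frequency in the form that matches the listing (assumed)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(freq, doc_freq):
    """score = queryWeight * fieldWeight for a single term in doc 5480."""
    tf = math.sqrt(freq)                          # 2.0 for freq=4, ~1.414 for freq=2
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM          # idf * queryNorm
    field_weight = tf * term_idf * FIELD_NORM     # tf * idf * fieldNorm
    return query_weight * field_weight


# Term "informationswissenschaft": freq=4, docFreq=1328.
score_info = term_score(4.0, 1328)

# Term "1": freq=2, docFreq=10304, scaled by the inner coord(1/2).
score_one = term_score(2.0, 10304) * (1 / 2)

# The outer sum is scaled by coord(2/23): 2 of 23 query clauses matched.
total = (score_info + score_one) * (2 / 23)

print(f"informationswissenschaft: {score_info:.8f}")
print(f"1 (after coord 1/2):      {score_one:.8f}")
print(f"document score:           {total:.9f}")
```

Under these assumptions the three printed values should agree with 0.04483575, 0.0047139092, and 0.004308666 in the listing to within single-precision rounding, since the listing was computed with 32-bit floats and this sketch uses 64-bit floats.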
    
    Abstract
    (Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. In this paper we thoroughly evaluate a wide variety of these methods on a document classification task for German text. We evaluate different feature construction and selection methods and various classifiers. Our main results are: (1) feature selection is necessary not only to reduce learning and classification time, but also to avoid overfitting (even for Support Vector Machines); (2) surprisingly, our morphological analysis does not improve classification quality compared to a letter 5-gram approach; (3) Support Vector Machines are significantly better than all other classification methods.
    Series
    Schriften zur Informationswissenschaft; Vol. 38
    Source
    Informationskompetenz - Basiskompetenz in der Informationsgesellschaft: Proceedings des 7. Internationalen Symposiums für Informationswissenschaft (ISI 2000), ed. by G. Knorz and R. Kuhlen