Search (3 results, page 1 of 1)

  • × theme_ss:"Automatisches Indexieren"
  • × theme_ss:"Computerlinguistik"
  • × year_i:[2000 TO 2010}
  1. Goller, C.; Löning, J.; Will, T.; Wolff, W.: Automatic document classification : a thourough evaluation of various methods (2000) 0.00
    0.004827458 = product of:
      0.038619664 = sum of:
        0.038619664 = weight(_text_:wide in 5480) [ClassicSimilarity], result of:
          0.038619664 = score(doc=5480,freq=2.0), product of:
            0.13148437 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029675366 = queryNorm
            0.29372054 = fieldWeight in 5480, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=5480)
      0.125 = coord(1/8)
    
    Abstract
    (Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. In this paper we thoroughly evaluate a wide variety of these methods on a document classification task for German text. We evaluate different feature construction and selection methods and various classifiers. Our main results are: (1) feature selection is necessary not only to reduce learning and classification time, but also to avoid overfitting (even for Support Vector Machines); (2) surprisingly, our morphological analysis does not improve classification quality compared to a letter 5-gram approach; (3) Support Vector Machines are significantly better than all other classification methods
  2. Snajder, J.; Dalbelo Basic, B.D.; Tadic, M.: Automatic acquisition of inflectional lexica for morphological normalisation (2008) 0.00
    0.003914421 = product of:
      0.031315368 = sum of:
        0.031315368 = product of:
          0.062630735 = sum of:
            0.062630735 = weight(_text_:mining in 2910) [ClassicSimilarity], result of:
              0.062630735 = score(doc=2910,freq=2.0), product of:
                0.16744171 = queryWeight, product of:
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.029675366 = queryNorm
                0.37404498 = fieldWeight in 2910, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2910)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Abstract
    Due to natural language morphology, words can take on various morphological forms. Morphological normalisation - often used in information retrieval and text mining systems - conflates morphological variants of a word to a single representative form. In this paper, we describe an approach to lexicon-based inflectional normalisation. This approach is in between stemming and lemmatisation, and is suitable for morphological normalisation of inflectionally complex languages. To eliminate the immense effort required to compile the lexicon by hand, we focus on the problem of acquiring automatically an inflectional morphological lexicon from raw corpora. We propose a convenient and highly expressive morphology representation formalism on which the acquisition procedure is based. Our approach is applied to the morphologically complex Croatian language, but it should be equally applicable to other languages of similar morphological complexity. Experimental results show that our approach can be used to acquire a lexicon whose linguistic quality allows for rather good normalisation performance.
  3. Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.00
    0.0015077259 = product of:
      0.012061807 = sum of:
        0.012061807 = product of:
          0.024123615 = sum of:
            0.024123615 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
              0.024123615 = score(doc=1746,freq=2.0), product of:
                0.103918076 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.029675366 = queryNorm
                0.23214069 = fieldWeight in 1746, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1746)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Date
    22. 3.2015 9:17:30

Languages

Types