Search (4 results, page 1 of 1)

  • theme_ss:"Automatisches Indexieren"
  • theme_ss:"Computerlinguistik"
  • year_i:[2000 TO 2010}
  1. Goller, C.; Löning, J.; Will, T.; Wolff, W.: Automatic document classification : a thorough evaluation of various methods (2000) 0.01
    0.008683693 = product of:
      0.052102152 = sum of:
        0.052102152 = weight(_text_:wide in 5480) [ClassicSimilarity], result of:
          0.052102152 = score(doc=5480,freq=2.0), product of:
            0.17738682 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.04003532 = queryNorm
            0.29372054 = fieldWeight in 5480, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=5480)
      0.16666667 = coord(1/6)
    
    Abstract
    (Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. In this paper we thoroughly evaluate a wide variety of these methods on a document classification task for German text. We evaluate different feature construction and selection methods and various classifiers. Our main results are: (1) feature selection is necessary not only to reduce learning and classification time, but also to avoid overfitting (even for Support Vector Machines); (2) surprisingly, our morphological analysis does not improve classification quality compared to a letter 5-gram approach; (3) Support Vector Machines are significantly better than all other classification methods.
  2. Pirkola, A.: Morphological typology of languages for IR (2001) 0.01
    0.0065349266 = product of:
      0.03920956 = sum of:
        0.03920956 = weight(_text_:world in 4476) [ClassicSimilarity], result of:
          0.03920956 = score(doc=4476,freq=2.0), product of:
            0.1538826 = queryWeight, product of:
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.04003532 = queryNorm
            0.25480178 = fieldWeight in 4476, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.8436708 = idf(docFreq=2573, maxDocs=44218)
              0.046875 = fieldNorm(doc=4476)
      0.16666667 = coord(1/6)
    
    Abstract
    This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of every language in the world can be described by two variables, index of synthesis and index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is needed in particular because of the increasing significance of cross-language retrieval research and CLIR systems processing different languages. The paper elaborates the linguistic morphological typology for the purposes of IR research. It studies how the indexes of synthesis and fusion could be used as practical tools in mono- and cross-lingual IR research. The need for semantic and syntactic typologies is discussed. The paper also reviews studies made in different languages on the effects of morphology and stemming in IR.
  3. Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.00
    0.0018080776 = product of:
      0.010848465 = sum of:
        0.010848465 = product of:
          0.032545395 = sum of:
            0.032545395 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
              0.032545395 = score(doc=1746,freq=2.0), product of:
                0.14019686 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04003532 = queryNorm
                0.23214069 = fieldWeight in 1746, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1746)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    22. 3.2015 9:17:30
  4. Li, W.; Wong, K.-F.; Yuan, C.: Toward automatic Chinese temporal information extraction (2001) 0.00
    0.0015204087 = product of:
      0.009122452 = sum of:
        0.009122452 = product of:
          0.027367353 = sum of:
            0.027367353 = weight(_text_:29 in 6029) [ClassicSimilarity], result of:
              0.027367353 = score(doc=6029,freq=2.0), product of:
                0.14083174 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04003532 = queryNorm
                0.19432661 = fieldWeight in 6029, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6029)
          0.33333334 = coord(1/3)
      0.16666667 = coord(1/6)
    
    Date
    29. 9.2001 14:02:50
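
Note on the score breakdowns
The score trees above follow the pattern of Lucene's ClassicSimilarity (TF-IDF). The sketch below recomputes two of the listed scores from the values shown in the trees. It assumes the usual ClassicSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which match the numbers printed above. The function names classic_idf and classic_score are illustrative only and are not part of any library. Lucene works in 32-bit floats, so the results here agree with the listed scores only to about six decimal places.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assuming the ClassicSimilarity form."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords=()):
    """Recompute one term's contribution as laid out in the explain trees.

    score = queryWeight * fieldWeight * (product of coord factors)
    queryWeight = idf * queryNorm
    fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    """
    tf = math.sqrt(freq)
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    score = query_weight * field_weight
    for c in coords:  # hypothetical helper argument: coord factors, innermost first
        score *= c
    return score


# Values copied from the explain trees above.
QUERY_NORM = 0.04003532
MAX_DOCS = 44218

# Result 1 (doc 5480): one coord factor, coord(1/6).
score_5480 = classic_score(2.0, 1430, MAX_DOCS, QUERY_NORM, 0.046875,
                           coords=(1 / 6,))

# Result 3 (doc 1746): nested coord factors, coord(1/3) then coord(1/6).
score_1746 = classic_score(2.0, 3622, MAX_DOCS, QUERY_NORM, 0.046875,
                           coords=(1 / 3, 1 / 6))

print(f"doc 5480: {score_5480:.9f}")
print(f"doc 1746: {score_1746:.9f}")
```

Results 3 and 4 score lower than results 1 and 2 mainly because their single matching term sits inside a nested clause, so the extra coord(1/3) factor is applied on top of coord(1/6).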
