Search (3 results, page 1 of 1)

  • × theme_ss:"Automatisches Indexieren"
  • × theme_ss:"Computerlinguistik"
  • × year_i:[2000 TO 2010}
  1. Witschel, H.F.: Terminology extraction and automatic indexing : comparison and qualitative evaluation of methods (2005) 0.02
    0.018822905 = product of:
      0.09411452 = sum of:
        0.09411452 = weight(_text_:index in 1842) [ClassicSimilarity], result of:
          0.09411452 = score(doc=1842,freq=6.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.418113 = fieldWeight in 1842, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1842)
      0.2 = coord(1/5)
    
    Abstract
    Many terminology engineering processes involve the task of automatic terminology extraction: before the terminology of a given domain can be modelled, organised or standardised, important concepts (or terms) of this domain have to be identified and fed into terminological databases. These serve in further steps as a starting point for compiling dictionaries, thesauri or maybe even terminological ontologies for the domain. For the extraction of the initial concepts, extraction methods are needed that operate on specialised language texts. On the other hand, many machine learning or information retrieval applications require automatic indexing techniques. In Machine Learning applications concerned with the automatic clustering or classification of texts, often feature vectors are needed that describe the contents of a given text briefly but meaningfully. These feature vectors typically consist of a fairly small set of index terms together with weights indicating their importance. Short but meaningful descriptions of document contents as provided by good index terms are also useful to humans: some knowledge management applications (e.g. topic maps) use them as a set of basic concepts (topics). The author believes that the tasks of terminology extraction and automatic indexing have much in common and can thus benefit from the same set of basic algorithms. It is the goal of this paper to outline some methods that may be used in both contexts, but also to find the discriminating factors between the two tasks that call for the variation of parameters or application of different techniques. The discussion of these methods will be based on statistical, syntactical and especially morphological properties of (index) terms. The paper is concluded by the presentation of some qualitative and quantitative results comparing statistical and morphological methods.
  2. Pirkola, A.: Morphological typology of languages for IR (2001) 0.02
    0.018442601 = product of:
      0.092213005 = sum of:
        0.092213005 = weight(_text_:index in 4476) [ClassicSimilarity], result of:
          0.092213005 = score(doc=4476,freq=4.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.40966535 = fieldWeight in 4476, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=4476)
      0.2 = coord(1/5)
    
    Abstract
    This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of every language in the world can be described by two variables, index of synthesis and index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is needed in particular because of the increasing significance of cross-language retrieval research and CLIR systems processing different languages. The paper elaborates the linguistic morphological typology for the purposes of IR research. It studies how the indexes of synthesis and fusion could be used as practical tools in mono- and cross-lingual IR research. The need for semantic and syntactic typologies is discussed. The paper also reviews studies made in different languages on the effects of morphology and stemming in IR.
  3. Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.01
    0.008374932 = product of:
      0.04187466 = sum of:
        0.04187466 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
          0.04187466 = score(doc=1746,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.23214069 = fieldWeight in 1746, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1746)
      0.2 = coord(1/5)
    
    Date
    22. 3.2015 9:17:30

Languages

Types