Search (5 results, page 1 of 1)

  • × theme_ss:"Computerlinguistik"
  • × theme_ss:"Automatisches Indexieren"
  1. Cheng, K.-H.: Automatic identification for topics of electronic documents (1997) 0.03
    0.027969502 = product of:
      0.083908506 = sum of:
        0.083908506 = weight(_text_:electronic in 1811) [ClassicSimilarity], result of:
          0.083908506 = score(doc=1811,freq=4.0), product of:
            0.19623034 = queryWeight, product of:
              3.9095051 = idf(docFreq=2409, maxDocs=44218)
              0.05019314 = queryNorm
            0.4276021 = fieldWeight in 1811, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.9095051 = idf(docFreq=2409, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1811)
      0.33333334 = coord(1/3)
    
    Abstract
    With the rapid rise in numbers of electronic documents on the Internet, how to effectively assign topics to documents become an important issue. Current research in this area focuses on the behaviour of nouns in documents. Proposes, however, that nouns and verbs together contribute to the process of topic identification. Constructs a mathematical model taking into account the following factors: word importance, word frequency, word co-occurence, and word distance. Preliminary experiments ahow that the performance of the proposed model is equivalent to that of a human being
  2. Renouf, A.: Sticking to the text : a corpus linguist's view of language (1993) 0.02
    0.019777425 = product of:
      0.059332274 = sum of:
        0.059332274 = weight(_text_:electronic in 2314) [ClassicSimilarity], result of:
          0.059332274 = score(doc=2314,freq=2.0), product of:
            0.19623034 = queryWeight, product of:
              3.9095051 = idf(docFreq=2409, maxDocs=44218)
              0.05019314 = queryNorm
            0.30236036 = fieldWeight in 2314, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9095051 = idf(docFreq=2409, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2314)
      0.33333334 = coord(1/3)
    
    Abstract
    Corpus linguistics is the study of large, computer held bodies of text. Some corpus linguists are concerned with language descriptions for its own sake. On the corpus-linguistic continuum, the study of raw ASCII text is situated at one end, and the study of heavily pre-coded text at the other. Discusses the use of word frequency to identify changes in the lexicon; word repetition and word positioning in automatic abstracting and word clusters in automatic text retrieval. Compares the machine extract with manual abstracts. Abstractors and indexers may find themselves taking the original wording of the text more into account as the focus moves towards the electronic medium and away from the hard copy
  3. Galvez, C.; Moya-Anegón, F. de: ¬An evaluation of conflation accuracy using finite-state transducers (2006) 0.02
    0.016952079 = product of:
      0.050856233 = sum of:
        0.050856233 = weight(_text_:electronic in 5599) [ClassicSimilarity], result of:
          0.050856233 = score(doc=5599,freq=2.0), product of:
            0.19623034 = queryWeight, product of:
              3.9095051 = idf(docFreq=2409, maxDocs=44218)
              0.05019314 = queryNorm
            0.259166 = fieldWeight in 5599, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9095051 = idf(docFreq=2409, maxDocs=44218)
              0.046875 = fieldNorm(doc=5599)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - To evaluate the accuracy of conflation methods based on finite-state transducers (FSTs). Design/methodology/approach - Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, but very few on conflation performance. The process of normalization we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus for merging term variants in canonical lemmatized forms. Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage, not actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm. Findings - The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms. Originality/value - The report outlines the potential of transducers in their application to normalization processes.
  4. Riloff, E.: ¬An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.01
    0.0090673035 = product of:
      0.02720191 = sum of:
        0.02720191 = product of:
          0.05440382 = sum of:
            0.05440382 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
              0.05440382 = score(doc=6752,freq=2.0), product of:
                0.17576782 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05019314 = queryNorm
                0.30952093 = fieldWeight in 6752, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6752)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    6. 3.1997 16:22:15
  5. Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.01
    0.0068004774 = product of:
      0.020401431 = sum of:
        0.020401431 = product of:
          0.040802862 = sum of:
            0.040802862 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
              0.040802862 = score(doc=1746,freq=2.0), product of:
                0.17576782 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05019314 = queryNorm
                0.23214069 = fieldWeight in 1746, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1746)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    22. 3.2015 9:17:30

Languages

Types