Search (3 results, page 1 of 1)

  • theme_ss:"Computerlinguistik"
  • theme_ss:"Konzeption und Anwendung des Prinzips Thesaurus"
  • year_i:[2000 TO 2010}
  1. Schneider, J.W.; Borlund, P.: A bibliometric-based semiautomatic approach to identification of candidate thesaurus terms : parsing and filtering of noun phrases from citation contexts (2005) 0.02
    0.023312453 = product of:
      0.046624906 = sum of:
        0.046624906 = sum of:
          0.0073713013 = weight(_text_:a in 156) [ClassicSimilarity], result of:
            0.0073713013 = score(doc=156,freq=6.0), product of:
              0.04772363 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.041389145 = queryNorm
              0.1544581 = fieldWeight in 156, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0546875 = fieldNorm(doc=156)
          0.039253604 = weight(_text_:22 in 156) [ClassicSimilarity], result of:
            0.039253604 = score(doc=156,freq=2.0), product of:
              0.14493774 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.041389145 = queryNorm
              0.2708308 = fieldWeight in 156, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=156)
      0.5 = coord(1/2)
    
    Abstract
    The present study investigates the ability of a bibliometric-based semi-automatic method to select candidate thesaurus terms from citation contexts. The method consists of document co-citation analysis, citation context analysis, and noun phrase parsing. The investigation is carried out within the specialty area of periodontology. The results clearly demonstrate that the method is able to select important candidate thesaurus terms within the chosen specialty area.
    Date
    8. 3.2007 19:55:22
    Type
    a
  2. Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 0.00
    0.0024032309 = product of:
      0.0048064617 = sum of:
        0.0048064617 = product of:
          0.0096129235 = sum of:
            0.0096129235 = weight(_text_:a in 5226) [ClassicSimilarity], result of:
              0.0096129235 = score(doc=5226,freq=20.0), product of:
                0.04772363 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.041389145 = queryNorm
                0.20142901 = fieldWeight in 5226, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5226)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Tseng constructs a thesaurus based on word co-occurrence through automatic analysis of Chinese text. Words are identified by longest dictionary match, supplemented by a keyword extraction algorithm that merges back nearby tokens and accepts shorter strings of characters if they occur more often than the longest string. Single-character auxiliary words are a major source of error, but this error can be greatly reduced by using a 70-character, 2680-word stop list. Extracted terms with their associated document weights are sorted by decreasing frequency, and the terms at the top of this list are associated using a Dice coefficient, applied to the weights of term pairs and modified to account for longer documents. Co-occurrence is counted not over the document as a whole but within paragraph- or sentence-sized sections in order to reduce computation time. A window of 29 characters or 11 words was found to be sufficient. A thesaurus was produced from 25,230 Chinese news articles, and judges were asked to review the top 50 terms associated with each of 30 single-word query terms. They judged 69% to be relevant.
    Type
    a
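
    The association step described in the abstract above can be illustrated with a minimal sketch. It computes the plain Dice coefficient, 2·n(a,b) / (n(a) + n(b)), from window-level co-occurrence counts. This is an assumption-laden simplification: it does not reproduce Tseng's weighted variant or the adjustment for longer documents, and the function name `dice_associations` and the sample windows are hypothetical.

```python
from collections import Counter
from itertools import combinations


def dice_associations(windows):
    """Return the plain Dice coefficient for every term pair that
    co-occurs in at least one window (e.g. a sentence-sized section).

    Simplification: counts windows, not the document weights used in
    the abstract, and applies no correction for document length.
    """
    term_counts = Counter()   # number of windows containing each term
    pair_counts = Counter()   # number of windows containing both terms
    for window in windows:
        terms = sorted(set(window))          # ignore repeats inside a window
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))
    return {
        (a, b): 2 * n_ab / (term_counts[a] + term_counts[b])
        for (a, b), n_ab in pair_counts.items()
    }


# Hypothetical, already-segmented windows of extracted terms.
sample_windows = [
    ["thesaurus", "term", "index"],
    ["thesaurus", "term"],
    ["term", "query"],
]
associations = dice_associations(sample_windows)
for pair, value in sorted(associations.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

    Counting within small windows rather than whole documents keeps the number of candidate pairs low, which matches the abstract's stated motive of reducing computation time.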
  3. Pimenov, E.N.: Normativnost' i nekotorye problem razrabotki tezauruzov i drugikh lingvistiicheskikh sredstv IPS (2000) 0.00
    0.0015199365 = product of:
      0.003039873 = sum of:
        0.003039873 = product of:
          0.006079746 = sum of:
            0.006079746 = weight(_text_:a in 3281) [ClassicSimilarity], result of:
              0.006079746 = score(doc=3281,freq=2.0), product of:
                0.04772363 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.041389145 = queryNorm
                0.12739488 = fieldWeight in 3281, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3281)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
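
The score explanations above follow the TF-IDF scheme of Lucene's ClassicSimilarity: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), and each term score is queryWeight × fieldWeight, scaled by the coord factors. The sketch below recomputes the listed values from the numbers in the explanations. The helper names `idf` and `term_score` are illustrative, not part of any library, and small differences in the last digits are expected because Lucene works in single precision.

```python
import math


def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explanation trees.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


MAX_DOCS = 44218          # maxDocs from the explanations
QUERY_NORM = 0.041389145  # queryNorm from the explanations

# Result 1 (doc 156): two matching terms, one coord(1/2) factor.
score_a = term_score(6.0, 37942, MAX_DOCS, QUERY_NORM, 0.0546875)
score_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, 0.0546875)
result_1 = (score_a + score_22) * 0.5

# Result 2 (doc 5226): one matching term, two nested coord(1/2) factors.
result_2 = term_score(20.0, 37942, MAX_DOCS, QUERY_NORM, 0.0390625) * 0.5 * 0.5

print(f"result 1: {result_1:.7f}")  # listed above as 0.023312453
print(f"result 2: {result_2:.7f}")  # listed above as 0.0024032309
```

Most of result 1's score comes from the term "22", which has a much higher idf than "a" (3.50 against 1.15).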
