Search (3 results, page 1 of 1)

Schneider, J.W.; Borlund, P.: A bibliometric-based semiautomatic approach to identification of candidate thesaurus terms : parsing and filtering of noun phrases from citation contexts (2005) 0.01

```
0.010907775 = product of:
  0.016361661 = sum of:
    0.0036702133 = weight(_text_:s in 156) [ClassicSimilarity], result of:
      0.0036702133 = score(doc=156,freq=2.0), product of:
        0.043647945 = queryWeight, product of:
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.04014573 = queryNorm
        0.08408674 = fieldWeight in 156, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.0546875 = fieldNorm(doc=156)
    0.012691448 = product of:
      0.038074344 = sum of:
        0.038074344 = weight(_text_:22 in 156) [ClassicSimilarity], result of:
          0.038074344 = score(doc=156,freq=2.0), product of:
            0.1405835 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04014573 = queryNorm
            0.2708308 = fieldWeight in 156, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=156)
      0.33333334 = coord(1/3)
  0.6666667 = coord(2/3)
```

Date: 8.3.2007 19:55:22
Pages: S.226-237
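
The score breakdown for the Schneider and Borlund record can be reproduced by hand. The sketch below assumes the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and takes every input value from the explain tree; the function and variable names are illustrative and not part of any Lucene API.

```python
import math

# Values read from the explain tree for doc 156.
QUERY_NORM = 0.04014573
MAX_DOCS = 44218
FIELD_NORM = 0.0546875


def idf(doc_freq, max_docs):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Score of one term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# Term "s": freq=2.0, docFreq=40523.
score_s = term_score(2.0, 40523, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# Term "22": freq=2.0, docFreq=3622, inside a clause scaled by coord(1/3).
score_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Outer sum, scaled by coord(2/3) because two of three clauses matched.
total = (score_s + score_22 * (1 / 3)) * (2 / 3)
print(f"s={score_s:.7f} 22={score_22:.7f} total={total:.7f}")
```

The total should land close to the 0.0109 shown at the top of the tree. Small differences in the last digits are expected, since Lucene computes in single precision while Python uses doubles.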

Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 0.01
```
0.007846127 = product of:
  0.01176919 = sum of:
    0.002621581 = weight(_text_:s in 5226) [ClassicSimilarity], result of:
      0.002621581 = score(doc=5226,freq=2.0), product of:
        0.043647945 = queryWeight, product of:
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.04014573 = queryNorm
        0.060061958 = fieldWeight in 5226, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5226)
    0.00914761 = product of:
      0.027442828 = sum of:
        0.027442828 = weight(_text_:29 in 5226) [ClassicSimilarity], result of:
          0.027442828 = score(doc=5226,freq=2.0), product of:
            0.14122012 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.04014573 = queryNorm
            0.19432661 = fieldWeight in 5226, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5226)
      0.33333334 = coord(1/3)
  0.6666667 = coord(2/3)
```
Abstract

Tseng constructs a thesaurus based on word co-occurrence through automatic analysis of Chinese text. Words are identified by longest dictionary match, supplemented by a keyword extraction algorithm that merges nearby tokens back together and accepts shorter character strings if they occur more often than the longest string. Single-character auxiliary words are a major source of error, but this error can be greatly reduced by using a stop list of 70 characters and 2,680 words. Extracted terms, with their associated document weights, are sorted by decreasing frequency, and the terms at the top of this list are associated using a Dice coefficient, modified to account for longer documents, applied to the weights of term pairs. Co-occurrence is measured not over the document as a whole but over paragraph- or sentence-sized sections, in order to reduce computation time. A window of 29 characters or 11 words was found to be sufficient. A thesaurus was produced from 25,230 Chinese news articles, and judges were asked to review the top 50 terms associated with each of 30 single-word query terms. They judged 69% to be relevant.

Source

Journal of the American Society for Information Science and Technology. 53(2002) no.13, S.1130-1138
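
The term association step described in the abstract can be illustrated with the plain Dice coefficient over windowed co-occurrence. This is a minimal sketch, not Tseng's method: the abstract does not give the modification for longer documents or the term weighting, so the code uses the unmodified coefficient 2 * n(a,b) / (n(a) + n(b)) on simple window counts. The function name `dice_associations` and the sample windows are hypothetical.

```python
from collections import Counter
from itertools import combinations


def dice_associations(windows):
    """Plain Dice coefficient for every term pair sharing a window.

    `windows` is an iterable of term lists, one per text window
    (for example a sentence-sized section). Counts are per window,
    so a term repeated inside one window is counted once.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for window in windows:
        terms = sorted(set(window))  # sorted so each pair has one key order
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))
    return {
        (a, b): 2 * n_ab / (term_counts[a] + term_counts[b])
        for (a, b), n_ab in pair_counts.items()
    }


# Hypothetical windows of already-segmented terms.
windows = [
    ["thesaurus", "term", "weight"],
    ["thesaurus", "term"],
    ["term", "query"],
]
scores = dice_associations(windows)
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

Restricting counts to small windows rather than whole documents keeps the number of candidate pairs low, which matches the computation-time motivation given in the abstract.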

Pimenov, E.N.: Normativnost' i nekotorye problemy razrabotki tezaurusov i drugikh lingvisticheskikh sredstv IPS (2000) 0.00

```
0.0017477208 = product of:
  0.005243162 = sum of:
    0.005243162 = weight(_text_:s in 3281) [ClassicSimilarity], result of:
      0.005243162 = score(doc=3281,freq=2.0), product of:
        0.043647945 = queryWeight, product of:
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.04014573 = queryNorm
        0.120123915 = fieldWeight in 3281, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.0872376 = idf(docFreq=40523, maxDocs=44218)
          0.078125 = fieldNorm(doc=3281)
  0.33333334 = coord(1/3)
```

Source: Nauchno-Tekhnicheskaya Informatsiya; Series 1. 2000, no.5, S.7-16
