Search (3 results, page 1 of 1)

Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 0.00
```
0.0012565941 = product of:
  0.014450832 = sum of:
    0.0063955644 = weight(_text_:und in 5226) [ClassicSimilarity], result of:
      0.0063955644 = score(doc=5226,freq=2.0), product of:
        0.052235067 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.023567878 = queryNorm
        0.12243814 = fieldWeight in 5226, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5226)
    0.008055268 = product of:
      0.016110536 = sum of:
        0.016110536 = weight(_text_:29 in 5226) [ClassicSimilarity], result of:
          0.016110536 = score(doc=5226,freq=2.0), product of:
            0.08290443 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.023567878 = queryNorm
            0.19432661 = fieldWeight in 5226, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5226)
      0.5 = coord(1/2)
  0.08695652 = coord(2/23)
```
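The arithmetic in the explain tree above can be checked by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and not part of Lucene, and the constants are copied from the output for doc 5226.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency (assumed formula).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    i = idf(doc_freq, max_docs)
    query_weight = i * query_norm
    field_weight = math.sqrt(freq) * i * field_norm
    return query_weight * field_weight

# Values read from the explain output for doc 5226.
QUERY_NORM = 0.023567878
MAX_DOCS = 44218
FIELD_NORM = 0.0390625

score_und = term_score(2.0, 13101, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# The "29" clause sits in a nested query where 1 of 2 clauses matched.
score_29 = term_score(2.0, 3565, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 2)

# Outer coord: 2 of 23 top-level clauses matched.
total = (score_und + score_29) * (2 / 23)
print(total)
```

Lucene computes these values in single precision, so the result should agree with the listed 0.0012565941 only to a few significant digits.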
Abstract

Tseng constructs a thesaurus based on word co-occurrence through the automatic analysis of Chinese text. Words are identified by a longest dictionary match, supplemented by a keyword extraction algorithm that merges back nearby tokens and accepts shorter strings of characters if they occur more often than the longest string. Single-character auxiliary words are a major source of error, but this error can be greatly reduced with a stop list of 70 characters and 2,680 words. Extracted terms, with their associated document weights, are sorted by decreasing frequency, and the terms at the top of this list are associated using a Dice coefficient, modified to account for longer documents, applied to the weights of term pairs. Co-occurrence is measured not over the document as a whole but within paragraph- or sentence-sized sections, which reduces computation time. A window of 29 characters or 11 words was found to be sufficient. A thesaurus was produced from 25,230 Chinese news articles, and judges were asked to review the top 50 terms associated with each of 30 single-word query terms. They determined 69% to be relevant.
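As a rough illustration of section-level co-occurrence, the sketch below scores term pairs with the plain Dice coefficient, 2 * n(a, b) / (n(a) + n(b)), counted over sections. This is not Tseng's method: the abstract describes a modified, weight-based coefficient whose exact form is not given here, so the unweighted version is swapped in. The function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def dice_associations(sections):
    """Plain Dice coefficient per term pair, counted over sections.

    `sections` is a list of term lists, one per sentence- or
    paragraph-sized window (an assumption made for this sketch).
    """
    term_count = Counter()
    pair_count = Counter()
    for terms in sections:
        unique = sorted(set(terms))  # count each term once per section
        term_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        (a, b): 2 * n / (term_count[a] + term_count[b])
        for (a, b), n in pair_count.items()
    }

# Hypothetical toy input: three sections of extracted terms.
sections = [
    ["thesaurus", "term", "weight"],
    ["thesaurus", "term"],
    ["term", "query"],
]
scores = dice_associations(sections)
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 3))
```

Counting within short sections rather than whole documents keeps the number of candidate pairs per unit small, which is the computational saving the abstract mentions.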

Theme

Conception and application of the thesaurus principle (Konzeption und Anwendung des Prinzips Thesaurus)

Tseng, Y.-H.: Automatic cataloguing and searching for retrospective data by use of OCR text (2001) 0.00

```
4.2027488E-4 = product of:
  0.009666322 = sum of:
    0.009666322 = product of:
      0.019332644 = sum of:
        0.019332644 = weight(_text_:29 in 5421) [ClassicSimilarity], result of:
          0.019332644 = score(doc=5421,freq=2.0), product of:
            0.08290443 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.023567878 = queryNorm
            0.23319192 = fieldWeight in 5421, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.046875 = fieldNorm(doc=5421)
      0.5 = coord(1/2)
  0.04347826 = coord(1/23)
```

Date: 29. 9.2001 13:58:18

Tseng, Y.-H.: Solving vocabulary problems with interactive query expansion (1998) 0.00

```
2.4153895E-4 = product of:
  0.0055553955 = sum of:
    0.0055553955 = product of:
      0.011110791 = sum of:
        0.011110791 = weight(_text_:1 in 5159) [ClassicSimilarity], result of:
          0.011110791 = score(doc=5159,freq=4.0), product of:
            0.057894554 = queryWeight, product of:
              2.4565027 = idf(docFreq=10304, maxDocs=44218)
              0.023567878 = queryNorm
            0.19191428 = fieldWeight in 5159, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.4565027 = idf(docFreq=10304, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5159)
      0.5 = coord(1/2)
  0.04347826 = coord(1/23)
```

Source: Journal of library and information science. 24(1998) no.1, pp. 1-18
