-
Tseng, Y.-H.: Automatic cataloguing and searching for retrospective data by use of OCR text (2001)
0.00
0.0045070197 = product of:
0.0135210585 = sum of:
0.0045123748 = weight(_text_:e in 5421) [ClassicSimilarity], result of:
0.0045123748 = score(doc=5421,freq=2.0), product of:
0.047356583 = queryWeight, product of:
1.43737 = idf(docFreq=28552, maxDocs=44218)
0.03294669 = queryNorm
0.09528506 = fieldWeight in 5421, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
1.43737 = idf(docFreq=28552, maxDocs=44218)
0.046875 = fieldNorm(doc=5421)
0.009008683 = product of:
0.027026048 = sum of:
0.027026048 = weight(_text_:29 in 5421) [ClassicSimilarity], result of:
0.027026048 = score(doc=5421,freq=2.0), product of:
0.11589616 = queryWeight, product of:
3.5176873 = idf(docFreq=3565, maxDocs=44218)
0.03294669 = queryNorm
0.23319192 = fieldWeight in 5421, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5176873 = idf(docFreq=3565, maxDocs=44218)
0.046875 = fieldNorm(doc=5421)
0.33333334 = coord(1/3)
0.33333334 = coord(2/6)
- Date
- 29. 9.2001 13:58:18
- Language
- e
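The score breakdown for this record can be checked by hand. The following is a minimal sketch, assuming the classic TF-IDF formulas `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`, which reproduce the values shown in the listing. The function names are illustrative and not part of the search system; `queryNorm`, `fieldNorm`, and the `coord` factors are copied from the listing rather than derived.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed classic inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_tf(freq):
    # Assumed classic term frequency: square root of the raw frequency
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as laid out in the explain output
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight

# Values copied from the listing for doc 5421
MAX_DOCS = 44218
QUERY_NORM = 0.03294669
FIELD_NORM = 0.046875

score_e = term_score(2.0, 28552, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_29 = term_score(2.0, 3565, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The "29" clause is scaled by coord(1/3); the sum is then scaled by coord(2/6)
total = (score_e + score_29 * (1 / 3)) * (2 / 6)
print(score_e, score_29, total)
```

The same function applies to the other records by swapping in their `fieldNorm` and `coord` values.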
-
Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002)
0.00
0.0037558496 = product of:
0.011267548 = sum of:
0.0037603125 = weight(_text_:e in 5226) [ClassicSimilarity], result of:
0.0037603125 = score(doc=5226,freq=2.0), product of:
0.047356583 = queryWeight, product of:
1.43737 = idf(docFreq=28552, maxDocs=44218)
0.03294669 = queryNorm
0.07940422 = fieldWeight in 5226, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
1.43737 = idf(docFreq=28552, maxDocs=44218)
0.0390625 = fieldNorm(doc=5226)
0.007507236 = product of:
0.022521708 = sum of:
0.022521708 = weight(_text_:29 in 5226) [ClassicSimilarity], result of:
0.022521708 = score(doc=5226,freq=2.0), product of:
0.11589616 = queryWeight, product of:
3.5176873 = idf(docFreq=3565, maxDocs=44218)
0.03294669 = queryNorm
0.19432661 = fieldWeight in 5226, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5176873 = idf(docFreq=3565, maxDocs=44218)
0.0390625 = fieldNorm(doc=5226)
0.33333334 = coord(1/3)
0.33333334 = coord(2/6)
- Abstract
- Tseng constructs a word co-occurrence based thesaurus by automatic analysis of Chinese text. Words are identified by a longest dictionary match, supplemented by a keyword extraction algorithm that merges nearby tokens back together and accepts shorter strings of characters if they occur more often than the longest string. Single-character auxiliary words are a major source of error, but this error can be greatly reduced with a stop list of 70 characters and 2,680 words. Extracted terms, with their associated document weights, are sorted by decreasing frequency, and the terms at the top of this list are associated using a Dice coefficient that is computed on the weights of term pairs and modified to account for longer documents. To reduce computation time, co-occurrence is counted not over the document as a whole but within paragraph- or sentence-sized sections. A window of 29 characters or 11 words was found to be sufficient. A thesaurus was produced from 25,230 Chinese news articles, and judges were asked to review the top 50 terms associated with each of 30 single-word query terms. They judged 69% to be relevant.
- Language
- e
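The association step described in the abstract can be illustrated with a plain Dice coefficient over small text sections. This is a minimal sketch only: it uses the unweighted Dice coefficient, not the document-length-adjusted, weight-based variant the abstract mentions, and the function name and sample terms are hypothetical.

```python
from collections import Counter
from itertools import combinations

def dice_associations(sections):
    """Score term pairs by the plain Dice coefficient over sections.

    Each section is a list of terms taken from one paragraph- or
    sentence-sized window. Dice(a, b) = 2 * n_ab / (n_a + n_b), where
    n_a counts the sections containing a and n_ab those containing both.
    """
    term_count = Counter()
    pair_count = Counter()
    for terms in sections:
        unique = sorted(set(terms))  # count each term once per section
        term_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        (a, b): 2.0 * n / (term_count[a] + term_count[b])
        for (a, b), n in pair_count.items()
    }

# Hypothetical sections, already segmented into terms
sections = [
    ["patent", "mining", "text"],
    ["patent", "text"],
    ["thesaurus", "text"],
]
scores = dice_associations(sections)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

Counting within short sections keeps the number of candidate pairs small, which is the computational saving the abstract refers to.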
-
Tseng, Y.-H.: Solving vocabulary problems with interactive query expansion (1998)
0.00
0.0032524513 = product of:
0.019514708 = sum of:
0.019514708 = weight(_text_:u in 5159) [ClassicSimilarity], result of:
0.019514708 = score(doc=5159,freq=2.0), product of:
0.107882105 = queryWeight, product of:
3.2744443 = idf(docFreq=4547, maxDocs=44218)
0.03294669 = queryNorm
0.1808892 = fieldWeight in 5159, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.2744443 = idf(docFreq=4547, maxDocs=44218)
0.0390625 = fieldNorm(doc=5159)
0.16666667 = coord(1/6)
- Theme
- Semantisches Umfeld in Indexierung u. Retrieval
-
Tseng, Y.-H.; Lin, C.-J.; Lin, Y.-I.: Text mining techniques for patent analysis (2007)
0.00
6.2671874E-4 = product of:
0.0037603125 = sum of:
0.0037603125 = weight(_text_:e in 935) [ClassicSimilarity], result of:
0.0037603125 = score(doc=935,freq=2.0), product of:
0.047356583 = queryWeight, product of:
1.43737 = idf(docFreq=28552, maxDocs=44218)
0.03294669 = queryNorm
0.07940422 = fieldWeight in 935, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
1.43737 = idf(docFreq=28552, maxDocs=44218)
0.0390625 = fieldNorm(doc=935)
0.16666667 = coord(1/6)
- Language
- e
-
Lee, L.-H.; Juan, Y.-C.; Tseng, W.-L.; Chen, H.-H.; Tseng, Y.-H.: Mining browsing behaviors for objectionable content filtering (2015)
0.00
6.2671874E-4 = product of:
0.0037603125 = sum of:
0.0037603125 = weight(_text_:e in 1818) [ClassicSimilarity], result of:
0.0037603125 = score(doc=1818,freq=2.0), product of:
0.047356583 = queryWeight, product of:
1.43737 = idf(docFreq=28552, maxDocs=44218)
0.03294669 = queryNorm
0.07940422 = fieldWeight in 1818, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
1.43737 = idf(docFreq=28552, maxDocs=44218)
0.0390625 = fieldNorm(doc=1818)
0.16666667 = coord(1/6)
- Language
- e