-
Chen, H.; Ng, T.D.; Martinez, J.; Schatz, B.R.: A concept space approach to addressing the vocabulary problem in scientific information retrieval : an experiment on the Worm Community System (1997)
0.02
0.022759551 = product of:
  0.060692135 = sum of:
    0.036153924 = weight(_text_:retrieval in 6492) [ClassicSimilarity], result of:
      0.036153924 = score(doc=6492,freq=6.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.28943354 = fieldWeight in 6492, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=6492)
    0.0167351 = weight(_text_:of in 6492) [ClassicSimilarity], result of:
      0.0167351 = score(doc=6492,freq=18.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.25915858 = fieldWeight in 6492, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=6492)
    0.007803111 = product of:
      0.015606222 = sum of:
        0.015606222 = weight(_text_:on in 6492) [ClassicSimilarity], result of:
          0.015606222 = score(doc=6492,freq=4.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.1718293 = fieldWeight in 6492, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6492)
      0.5 = coord(1/2)
  0.375 = coord(3/8)
- Abstract
- This research presents an algorithmic approach to addressing the vocabulary problem in scientific information retrieval and information sharing, using the molecular biology domain as an example. We first present a literature review of cognitive studies related to the vocabulary problem and vocabulary-based search aids (thesauri) and then discuss techniques for building robust and domain-specific thesauri to assist in cross-domain scientific information retrieval. Using a variation of the automatic thesaurus generation techniques, which we refer to as the concept space approach, we recently conducted an experiment in the molecular biology domain in which we created a C. elegans worm thesaurus of 7,657 worm-specific terms and a Drosophila fly thesaurus of 15,626 terms. About 30% of these terms overlapped, which created vocabulary paths from one subject domain to the other. Based on a cognitive study of term association involving 4 biologists, we found that a large percentage (59.6-85.6%) of the terms suggested by the subjects were identified in the conjoined fly-worm thesaurus. However, we found only a small percentage (8.4-18.1%) of the associations suggested by the subjects in the thesaurus.
- Source
- Journal of the American Society for Information Science. 48(1997) no.1, pp. 17-31
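The score explanation for document 6492 above can be rechecked by hand. The following is a minimal Python sketch, assuming the tf and idf definitions of Lucene's ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function and variable names are illustrative and not part of the listing, and queryNorm and fieldNorm are copied from the explanation rather than derived.

```python
import math

# Values copied from the explanation for doc 6492 (not derived here).
MAX_DOCS = 44218
QUERY_NORM = 0.041294612
FIELD_NORM = 0.0390625


def idf(doc_freq, max_docs):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq):
    """Score of one term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM
    field_weight = tf * term_idf * FIELD_NORM
    return query_weight * field_weight


# Term statistics as shown in the listing: (termFreq, docFreq).
retrieval = term_score(6.0, 5836)
of_term = term_score(18.0, 25162)
# "on" sits in a nested clause where 1 of 2 sub-clauses matched: coord(1/2).
on_term = term_score(4.0, 13325) * 0.5

# 3 of the 8 top-level query clauses matched: coord(3/8).
total = (retrieval + of_term + on_term) * (3 / 8)

print(f"retrieval={retrieval:.9f} of={of_term:.9f} on={on_term:.9f}")
print(f"total={total:.9f}")
```

Lucene computes these values in single precision, so the last digits printed by this double-precision sketch may differ slightly from the listing.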
-
Chen, H.; Martinez, J.; Kirchhoff, A.; Ng, T.D.; Schatz, B.R.: Alleviating search uncertainty through concept associations : automatic indexing, co-occurrence analysis, and parallel computing (1998)
0.02
0.016869199 = product of:
  0.04498453 = sum of:
    0.025048172 = weight(_text_:retrieval in 5202) [ClassicSimilarity], result of:
      0.025048172 = score(doc=5202,freq=2.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.20052543 = fieldWeight in 5202, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=5202)
    0.0066940407 = weight(_text_:of in 5202) [ClassicSimilarity], result of:
      0.0066940407 = score(doc=5202,freq=2.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.103663445 = fieldWeight in 5202, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=5202)
    0.013242318 = product of:
      0.026484637 = sum of:
        0.026484637 = weight(_text_:on in 5202) [ClassicSimilarity], result of:
          0.026484637 = score(doc=5202,freq=8.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.29160398 = fieldWeight in 5202, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=5202)
      0.5 = coord(1/2)
  0.375 = coord(3/8)
- Abstract
- In this article, we report research on an algorithmic approach to alleviating search uncertainty in a large information space. Grounded on object filtering, automatic indexing, and co-occurrence analysis, we performed a large-scale experiment using a parallel supercomputer (SGI Power Challenge) to analyze 400,000+ abstracts in an INSPEC computer engineering collection. Two system-generated thesauri, one based on a combined object filtering and automatic indexing method, and the other based on automatic indexing only, were compared with the human-generated INSPEC subject thesaurus. Our user evaluation revealed that the system-generated thesauri were better than the INSPEC thesaurus in 'concept recall', but in 'concept precision' the 3 thesauri were comparable. Our analysis also revealed that the terms suggested by the 3 thesauri were complementary and could be used to significantly increase 'variety' in search terms and thereby reduce search uncertainty.
- Source
- Journal of the American Society for Information Science. 49(1998) no.3, pp. 206-216
- Theme
- Semantic context in indexing and retrieval
-
Chen, H.; Lynch, K.J.; Bashu, K.; Ng, T.D.: Generating, integrating, and activating thesauri for concept-based document retrieval (1993)
0.01
0.008349391 = product of:
  0.066795126 = sum of:
    0.066795126 = weight(_text_:retrieval in 8549) [ClassicSimilarity], result of:
      0.066795126 = score(doc=8549,freq=2.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.5347345 = fieldWeight in 8549, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.125 = fieldNorm(doc=8549)
  0.125 = coord(1/8)