Li, D.; Kwong, C.-P.: Understanding latent semantic indexing : a topological structure analysis using Q-analysis (2010)
0.01
0.005565266 = product of:
  0.038956862 = sum of:
    0.013536699 = weight(_text_:information in 3427) [ClassicSimilarity], result of:
      0.013536699 = score(doc=3427,freq=10.0), product of:
        0.052020688 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.029633347 = queryNorm
        0.2602176 = fieldWeight in 3427, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=3427)
    0.025420163 = weight(_text_:retrieval in 3427) [ClassicSimilarity], result of:
      0.025420163 = score(doc=3427,freq=4.0), product of:
        0.08963835 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.029633347 = queryNorm
        0.2835858 = fieldWeight in 3427, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=3427)
  0.14285715 = coord(2/14)
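The arithmetic in the score explanation above can be checked by hand. The following is a minimal sketch, assuming the usual Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The values of queryNorm, fieldNorm, and coord are copied from the listing rather than derived, and the function names are illustrative, not part of any library. Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency (assumed formula).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explanation tree.
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm                   # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Values taken verbatim from the listing, not recomputed here.
QUERY_NORM = 0.029633347
FIELD_NORM = 0.046875
MAX_DOCS = 44218
COORD = 2 / 14  # 2 of the 14 query terms matched

information = term_score(10.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)
retrieval = term_score(4.0, 5836, MAX_DOCS, QUERY_NORM, FIELD_NORM)
total = (information + retrieval) * COORD

print(f"information: {information:.9f}")
print(f"retrieval:   {retrieval:.9f}")
print(f"total:       {total:.9f}")
```

The two term scores are summed and then scaled by coord, which is why the final score is much smaller than either term weight.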
- Abstract
- The method of latent semantic indexing (LSI) is well-known for tackling the synonymy and polysemy problems in information retrieval; however, its performance can be very different for various datasets, and the questions of what characteristics of a dataset and why these characteristics contribute to this difference have not been fully understood. In this article, we propose that the mathematical structure of simplexes can be attached to a term-document matrix in the vector space model (VSM) for information retrieval. The Q-analysis devised by R.H. Atkin ([1974]) may then be applied to effect an analysis of the topological structure of the simplexes and their corresponding dataset. Experimental results of this analysis reveal that there is a correlation between the effectiveness of LSI and the topological structure of the dataset. By using the information obtained from the topological analysis, we develop a new method to explore the semantic information in a dataset. Experimental results show that our method can enhance the performance of VSM for datasets over which LSI is not effective.
- Source
- Journal of the American Society for Information Science and Technology. 61(2010) no.3, pp. 592-608
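The idea in the abstract of attaching simplexes to a term-document matrix can be illustrated with a small sketch. It assumes the textbook form of Q-analysis: each document is a simplex whose vertices are its terms, two simplexes are q-near when they share at least q + 1 vertices, and q-connectivity is the transitive closure of q-nearness. The toy documents and all names below are hypothetical, and this is not claimed to be the authors' exact procedure.

```python
# Toy collection: each document is the set of terms it contains (hypothetical data).
docs = {
    "d0": {"t0", "t1", "t2"},
    "d1": {"t0", "t1", "t2"},
    "d2": {"t2", "t3"},
    "d3": {"t3", "t4"},
}

def shared_dim(a, b):
    # Dimension of the face shared by two simplexes: shared vertices minus one.
    # For a == b this is the dimension of the simplex itself; -1 means disjoint.
    return len(docs[a] & docs[b]) - 1

def q_components(q):
    # Count the q-connected components among simplexes of dimension >= q,
    # using a small union-find over q-near pairs.
    alive = [d for d in docs if shared_dim(d, d) >= q]
    parent = {d: d for d in alive}

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for i, a in enumerate(alive):
        for b in alive[i + 1:]:
            if shared_dim(a, b) >= q:
                parent[find(a)] = find(b)
    return len({find(d) for d in alive})

# Structure vector: number of components at each level, from the top q down to 0.
top_q = max(shared_dim(d, d) for d in docs)
structure_vector = [q_components(q) for q in range(top_q, -1, -1)]
print(structure_vector)
```

In this toy case the documents form one component at q = 0 but split apart at q = 1, which is the kind of topological information the abstract relates to the effectiveness of LSI.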