Li, D.; Kwong, C.-P.: Understanding latent semantic indexing: a topological structure analysis using Q-analysis (2010)
- Abstract
- The method of latent semantic indexing (LSI) is well known for tackling the synonymy and polysemy problems in information retrieval; however, its performance can vary widely across datasets, and it is not yet fully understood which characteristics of a dataset cause this variation, or why. In this article, we propose that the mathematical structure of simplexes can be attached to a term-document matrix in the vector space model (VSM) for information retrieval. The Q-analysis devised by R.H. Atkin (1974) may then be applied to analyze the topological structure of the simplexes and their corresponding dataset. Experimental results of this analysis reveal a correlation between the effectiveness of LSI and the topological structure of the dataset. Using the information obtained from the topological analysis, we develop a new method to explore the semantic information in a dataset. Experimental results show that our method can enhance the performance of VSM on datasets for which LSI is not effective.
- Type
- a
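Atkin's Q-analysis, which the abstract applies to a term-document matrix, can be illustrated on a toy binary matrix. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each document is a simplex whose vertices are the terms it contains (the abstract does not say which way round the paper attaches simplexes). It builds the shared-face matrix, whose entry (i, j) is the dimension of the largest face shared by simplexes i and j, and computes the structure vector, whose entry Q_q counts the q-connected components among simplexes of dimension at least q. The function names `shared_face_matrix`, `q_components` and `structure_vector` are hypothetical.

```python
from itertools import combinations


def shared_face_matrix(incidence):
    """Assumption: incidence[d][t] == 1 if term t occurs in document d,
    so each document is a simplex whose vertices are its terms.
    Entry (i, j) is (number of shared vertices) - 1, i.e. the dimension
    of the largest shared face; -1 means no vertex is shared.
    Diagonal entries are each simplex's own dimension."""
    n = len(incidence)
    sf = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(a & b for a, b in zip(incidence[i], incidence[j]))
            sf[i][j] = shared - 1
    return sf


def q_components(sf, q):
    """Count q-connected components among simplexes of dimension >= q.
    Two simplexes are q-connected if a chain links them in which each
    consecutive pair shares a face of dimension >= q (union-find)."""
    active = [i for i in range(len(sf)) if sf[i][i] >= q]
    parent = {i: i for i in active}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(active, 2):
        if sf[i][j] >= q:
            parent[find(i)] = find(j)
    return len({find(i) for i in active})


def structure_vector(incidence):
    """Return [Q_top, ..., Q_0], from the highest simplex dimension down to 0."""
    sf = shared_face_matrix(incidence)
    top = max(sf[i][i] for i in range(len(sf)))
    return [q_components(sf, q) for q in range(top, -1, -1)]


# Toy example: 5 terms, 3 documents (rows are documents).
docs = [
    [1, 1, 1, 0, 0],  # d0: terms t0, t1, t2 -> a 2-simplex
    [0, 1, 1, 1, 0],  # d1: terms t1, t2, t3 -> a 2-simplex
    [0, 0, 0, 0, 1],  # d2: term t4 only     -> a 0-simplex
]
print(structure_vector(docs))  # [Q_2, Q_1, Q_0]
```

In this toy matrix, d0 and d1 share the two terms t1 and t2 (a 1-face), so they are separate components at q = 2 but merge at q = 1; the single-term document d2 only enters at q = 0, where it forms its own component. The structure vector is therefore [2, 1, 2]. Union-find is used here simply because q-connectivity is the transitive closure of "shares a q-face".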