This database contains more than 40,000 documents on topics from the fields of descriptive cataloguing, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of June 4, 2021)
1. Li, D. ; Kwong, C.-P.: Understanding latent semantic indexing : a topological structure analysis using Q-analysis.
In: Journal of the American Society for Information Science and Technology. 61(2010) no.3, pp.592-608.
Abstract: The method of latent semantic indexing (LSI) is well-known for tackling the synonymy and polysemy problems in information retrieval; however, its performance can be very different for various datasets, and the questions of which characteristics of a dataset contribute to this difference, and why, are not yet fully understood. In this article, we propose that the mathematical structure of simplexes can be attached to a term-document matrix in the vector space model (VSM) for information retrieval. The Q-analysis devised by R.H. Atkin may then be applied to analyse the topological structure of the simplexes and their corresponding dataset. Experimental results of this analysis reveal that there is a correlation between the effectiveness of LSI and the topological structure of the dataset. By using the information obtained from the topological analysis, we develop a new method to explore the semantic information in a dataset. Experimental results show that our method can enhance the performance of VSM for datasets over which LSI is not effective.
Object: Latent Semantic Indexing ; Q-analysis
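The LSI step that the abstract refers to is a truncated singular value decomposition (SVD) of the term-document matrix, which maps terms and documents into a lower-dimensional latent space where synonymous terms (and the documents using them) become similar. A minimal sketch with NumPy; the matrix, the term labels, and the rank k are illustrative assumptions, not taken from the paper:

```python
import numpy as np

# Toy term-document matrix (rows: terms, columns: documents);
# entries are raw term counts. Terms and counts are hypothetical.
A = np.array([
    [2., 0., 1., 0.],   # "car"
    [0., 2., 1., 0.],   # "automobile"  (synonym of "car")
    [0., 0., 0., 1.],   # "flower"
])

# SVD: A = U @ diag(s) @ Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncate to rank k to obtain the latent semantic approximation A_k.
k = 1
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Document 0 uses only "car", document 1 only "automobile":
# literal term matching sees no overlap, but in the rank-k
# latent space the two documents become highly similar.
sim_raw = cosine(A[:, 0], A[:, 1])     # similarity in the raw VSM
sim_lsi = cosine(A_k[:, 0], A_k[:, 1]) # similarity in the LSI space
```

Whether such a truncation helps depends on the dataset, which is exactly the question the Q-analysis in this paper addresses.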
2. Li, D. ; Kwong, C.-P. ; Lee, D.L.: Unified linear subspace approach to semantic analysis.
In: Journal of the American Society for Information Science and Technology. 61(2010) no.1, pp.175-189.
Abstract: The Basic Vector Space Model (BVSM) is well known in information retrieval. Unfortunately, its retrieval effectiveness is limited because it is based on literal term matching. The Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) are two prominent semantic retrieval methods, both of which assume there is some underlying latent semantic structure in a dataset that can be used to improve retrieval performance. However, while this structure may be derived from both the term space and the document space, GVSM exploits only the former and LSI the latter. In this article, the latent semantic structure of a dataset is examined from a dual perspective; namely, we consider the term space and the document space simultaneously. This new viewpoint has a natural connection to the notion of kernels. Specifically, a unified kernel function can be derived for a class of vector space models. The dual perspective provides a deeper understanding of the semantic space and makes transparent the geometrical meaning of the unified kernel function. New semantic analysis methods based on the unified kernel function are developed, which combine the advantages of LSI and GVSM. We also prove that the new methods are stable: even when the selected rank of the truncated Singular Value Decomposition (SVD) is far from the optimum, retrieval performance is not significantly degraded. Experiments performed on standard test collections show that our methods are promising.
Topic area: Semantic context in indexing and retrieval
Object: Latent Semantic Indexing ; Generalized Vector Space Model
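One way to see how BVSM, GVSM, and LSI can share a single kernel form, as the abstract describes, is as spectral filters applied to the term co-occurrence matrix A·Aᵀ: the document kernel is K = Aᵀ U f(Λ) Uᵀ A, and the choice of filter f recovers each model. This is a hedged sketch of that general idea, not the authors' exact unified kernel; the matrix and filters are illustrative assumptions:

```python
import numpy as np

# Toy term-document matrix (rows: terms, columns: documents); hypothetical counts.
A = np.array([
    [2., 0., 1., 0.],
    [0., 2., 1., 0.],
    [0., 0., 0., 1.],
])

AAt = A @ A.T                  # term-term co-occurrence (Gram) matrix
w, U = np.linalg.eigh(AAt)     # eigendecomposition, eigenvalues ascending

def spectral_kernel(filt):
    """Document kernel K = A^T U diag(filt(w)) U^T A for a spectral filter."""
    return A.T @ (U * filt(w)) @ U.T @ A

# BVSM: constant filter -> K = A^T A (literal term matching).
K_bvsm = spectral_kernel(lambda w: np.ones_like(w))

# GVSM: linear filter -> K = A^T (A A^T) A (term co-occurrence smoothing).
K_gvsm = spectral_kernel(lambda w: w)

# LSI-style rank-k projection: step filter keeping the k largest eigenvalues.
k = 1
def step(w):
    keep = np.zeros_like(w)
    keep[np.argsort(w)[-k:]] = 1.0
    return keep
K_lsi = spectral_kernel(step)
```

Under this view, documents 0 and 1 (which share no terms literally) have zero similarity in K_bvsm but positive similarity in K_lsi, illustrating why filters that mix term and document structure can outperform literal matching.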