Zhou, D.; Lawless, S.; Wu, X.; Zhao, W.; Liu, J.: ¬A study of user profile representation for personalized cross-language information retrieval (2016)
0.02
0.020948619 = product of:
  0.041897237 = sum of:
    0.041897237 = sum of:
      0.010696997 = weight(_text_:a in 3167) [ClassicSimilarity], result of:
        0.010696997 = score(doc=3167,freq=20.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.20142901 = fieldWeight in 3167, product of:
            4.472136 = tf(freq=20.0), with freq of:
              20.0 = termFreq=20.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0390625 = fieldNorm(doc=3167)
      0.03120024 = weight(_text_:22 in 3167) [ClassicSimilarity], result of:
        0.03120024 = score(doc=3167,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.19345059 = fieldWeight in 3167, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=3167)
  0.5 = coord(1/2)
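The arithmetic in the score explanation above can be re-derived with a short sketch. It assumes the classic Lucene TF-IDF formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the figures shown in the record. The function and constant names are illustrative only and are not part of any Lucene API; the numeric inputs are copied from the explanation.

```python
import math

# Values copied from the score explanation for doc 3167.
QUERY_NORM = 0.046056706
FIELD_NORM = 0.0390625
MAX_DOCS = 44218
COORD = 0.5  # coord(1/2): one of two query clauses matched


def idf(doc_freq, max_docs):
    # Assumed classic TF-IDF form: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq):
    # score = queryWeight * fieldWeight
    tf = math.sqrt(freq)                           # tf(freq)
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM           # idf * queryNorm
    field_weight = tf * term_idf * FIELD_NORM      # tf * idf * fieldNorm
    return query_weight * field_weight


score_a = term_score(freq=20.0, doc_freq=37942)    # term "a"
score_22 = term_score(freq=2.0, doc_freq=3622)     # term "22"
total = (score_a + score_22) * COORD

print(f"a: {score_a:.9f}  22: {score_22:.9f}  total: {total:.9f}")
```

The results agree with the listed values only up to rounding, since Lucene computes these scores in single-precision floats while Python uses doubles.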
- Abstract
- Purpose - With the increase in the amount of multilingual content on the World Wide Web, users often try to access information provided in a language of which they are not native speakers. The purpose of this paper is to present a comprehensive study of user profile representation techniques and investigate their use in personalized cross-language information retrieval (CLIR) systems by means of personalized query expansion. Design/methodology/approach - The user profiles consist of weighted terms computed by using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and cross-lingual comparable documents. This paper also proposes an automatic evaluation method for comparing various user profile generation techniques and query expansion methods. Findings - Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods, and are particularly suitable for users with a sufficient amount of historical data. The study also confirmed that user profiles represented by latent semantic models trained on a cross-lingual level performed better than the models trained on a monolingual level. Originality/value - Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies on a monolingual level. The effect of utilizing such monolingual profiles for personalized CLIR remains unclear. The current study fills this gap with a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology that allows repeatable and controlled experiments to be conducted.
- Date
- 20. 1.2015 18:30:22
- Type
- a