Aizawa, A.
An information-theoretic perspective of tf-idf measures
Information processing and management. 39(2003) no.1, pp.45-65
2003
This paper presents a mathematical definition of the "probability-weighted amount of information" (PWI), a measure of specificity of terms in documents that is based on an information-theoretic view of retrieval events. The proposed PWI is expressed as a product of the occurrence probabilities of terms and their amounts of information, and corresponds well with the conventional term frequency - inverse document frequency measures that are commonly used in today's information retrieval systems. The mathematical definition of the PWI is shown, together with some illustrative examples of the calculation.
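The product form described in the abstract can be illustrated with a minimal sketch. This is not the paper's definition of the PWI. It assumes that the occurrence probability of a term in a document is estimated by its relative frequency over the whole collection, and that the amount of information is the inverse document frequency in bits, log2(N/df). The function name `pwi_like_weights` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def pwi_like_weights(docs):
    """Return {(doc_index, term): weight} for a list of tokenized documents.

    Assumption (not taken from the paper): weight = P(term, doc) * log2(N / df),
    i.e. an occurrence probability multiplied by an amount of information.
    """
    n_docs = len(docs)
    total_tokens = sum(len(doc) for doc in docs)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = {}
    for j, doc in enumerate(docs):
        for term, tf in Counter(doc).items():
            p_occurrence = tf / total_tokens             # estimated P(term, doc)
            information = math.log2(n_docs / df[term])   # idf, in bits
            weights[(j, term)] = p_occurrence * information
    return weights


docs = [["retrieval", "term", "term"], ["retrieval", "index"]]
weights = pwi_like_weights(docs)
```

Under these assumptions a term that occurs in every document ("retrieval" here) carries zero information and gets weight zero, while a term concentrated in one document is weighted in proportion to how often it occurs there, which mirrors the behaviour of conventional tf-idf.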
Retrieval algorithms
TF/iDF

Bruza, P.D.; Huibers, T.W.C.: A study of aboutness in information retrieval (1996)
Wong, S.K.M.; Yao, Y.Y.: An information-theoretic measure of term specificity (1992)
Rölleke, T.; Tsikrika, T.; Kazai, G.: A general matrix framework for modelling Information Retrieval (2006)
Dang, E.K.F.; Luk, R.W.P.; Allan, J.: Beyond bag-of-words : bigram-enhanced context-dependent term weights (2014)
Wong, S.K.M.: On modelling information retrieval with probabilistic inference (1995)
