Ahlgren, P.; Jarneving, B.; Rousseau, R.: Requirements for a cocitation similarity measure, with special reference to Pearson's correlation coefficient (2003)
- Abstract
- Ahlgren, Jarneving, and Rousseau review accepted procedures for author co-citation analysis, first pointing out that since in the raw data matrix the row and column values are identical, i.e., the co-citation count of two authors, there is no clear choice for diagonal values. They suggest using the number of times an author has been co-cited with himself, excluding self-citation, rather than the common treatment of diagonal values as zeros or as missing values. When the matrix is converted to a similarity matrix, the normal procedure is to create a matrix of Pearson's r coefficients between data vectors. Ranking by r, by co-citation frequency, and by intuition can easily yield three different orders. It would seem a necessary requirement that adding zeros to the matrix should not affect the value or the relative order of similarity measures, but it is shown that Pearson's r does not meet this requirement. Using 913 bibliographic descriptions from the Web of Science of articles from JASIS and Scientometrics, authors' names were extracted and edited, and 12 information retrieval authors and 12 bibliometric authors, each from the top 100 most cited, were selected. Co-citation and r value (diagonal elements treated as missing) matrices were constructed, and then reconstructed in expanded form. Adding zeros can change both the r value and the ordering of the authors based upon that value. A chi-squared distance measure would not violate these requirements, nor would the cosine coefficient. It is also argued that co-citation data is ordinal data, since there is no assurance of an absolute zero number of co-citations, and thus Pearson's r is not appropriate. The number of ties in co-citation data makes the use of the Spearman rank order coefficient problematic.
- Date
- 9. 7.2006 10:22:35
- Type
- a
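
The abstract's central claim, that appending zeros to the data vectors can change Pearson's r while leaving the cosine coefficient untouched, can be checked with a small sketch. The two count vectors below are hypothetical illustration values, not data from the paper, and the helper functions `pearson` and `cosine` are written here only for this example.

```python
import math


def pearson(x, y):
    """Pearson's r between two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def cosine(x, y):
    """Cosine coefficient between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Hypothetical co-citation counts of two authors with three other authors.
x = [3, 1, 2]
y = [1, 3, 2]

# Expanded form: three further authors with whom neither is co-cited.
x_ext = x + [0, 0, 0]
y_ext = y + [0, 0, 0]

r_before, r_after = pearson(x, y), pearson(x_ext, y_ext)
cos_before, cos_after = cosine(x, y), cosine(x_ext, y_ext)

print(f"Pearson r: {r_before:.3f} -> {r_after:.3f}")
print(f"Cosine:    {cos_before:.3f} -> {cos_after:.3f}")
```

With these toy values the appended zeros shift both means, so r moves from about -1 to about 0.5, whereas the zeros contribute nothing to the dot product or the norms and the cosine stays the same.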