Jiang, Y.; Bai, W.; Zhang, X.; Hu, J.: Wikipedia-based information content and semantic similarity computation (2017)
- Abstract
- The Information Content (IC) of a concept is a fundamental dimension in computational linguistics. It enables a better understanding of a concept's semantics. Several approaches to computing the IC of a concept have been proposed in the past. However, existing methods have limitations: they rely on corpus availability, manual tagging, or predefined ontologies, and they fit only non-dynamic domains. Wikipedia provides a very large, domain-independent encyclopedic repository and semantic network for computing the IC of concepts, with more coverage than usual ontologies. In this paper, we propose novel methods for computing the IC of a concept that address the shortcomings of existing approaches. The presented methods focus on computing the IC of a concept (i.e., a Wikipedia category) drawn from the Wikipedia category structure. We also propose several new IC-based measures to compute the semantic similarity between concepts. The evaluation, based on several widely used benchmarks and a benchmark we developed ourselves, supports these intuitions with respect to human judgments. Overall, some of the methods proposed in this paper correlate well with human judgments and constitute effective ways of determining IC values for concepts and the semantic similarity between concepts.
- Theme
- Semantic context in indexing and retrieval
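
The abstract describes computing IC from a category hierarchy and deriving similarity from it, but does not give the formulas. As a rough illustration only, the sketch below uses two well-known stand-ins rather than the measures proposed in the paper: an intrinsic IC in the style of Seco et al. (based on the number of descendants) and Lin's similarity. The hierarchy `PARENT` and all function names are hypothetical, and the real Wikipedia category structure is a graph with multiple parents, which this single-parent toy tree ignores.

```python
import math

# Hypothetical toy hierarchy (child -> parent); not real Wikipedia data.
PARENT = {
    "Science": None,
    "Biology": "Science",
    "Physics": "Science",
    "Genetics": "Biology",
    "Ecology": "Biology",
    "Optics": "Physics",
}


def ancestors(concept):
    """Return the concept followed by its ancestors, ending at the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def descendant_count(concept):
    """Count the categories strictly below the given concept."""
    return sum(
        1 for other in PARENT
        if other != concept and concept in ancestors(other)
    )


def intrinsic_ic(concept):
    """Assumed Seco-style intrinsic IC: 1 - log(desc + 1) / log(N)."""
    total = len(PARENT)
    return 1.0 - math.log(descendant_count(concept) + 1) / math.log(total)


def lowest_common_subsumer(a, b):
    """Return the deepest concept that is an ancestor of both a and b."""
    ancestors_of_b = set(ancestors(b))
    for candidate in ancestors(a):
        if candidate in ancestors_of_b:
            return candidate
    raise ValueError("concepts share no ancestor")


def lin_similarity(a, b):
    """Lin similarity: 2 * IC(lcs) / (IC(a) + IC(b)); 1.0 for identical concepts."""
    if a == b:
        return 1.0
    denominator = intrinsic_ic(a) + intrinsic_ic(b)
    if denominator == 0.0:
        return 0.0
    return 2.0 * intrinsic_ic(lowest_common_subsumer(a, b)) / denominator


if __name__ == "__main__":
    print(lin_similarity("Genetics", "Ecology"))
    print(lin_similarity("Genetics", "Optics"))
```

In this toy tree the root carries no information (its IC is 0) and leaves carry the maximum (IC of 1), so two siblings under a specific category score higher than two concepts that only share the root.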