Pons-Porrata, A.; Berlanga-Llavori, R.; Ruiz-Shulcloper, J.: Topic discovery based on text mining techniques (2007)
- Abstract
- In this paper, we present a topic discovery system aimed at revealing the implicit knowledge present in news streams. This knowledge is expressed as a hierarchy of topics and subtopics, where each topic contains the set of documents related to it and a summary extracted from these documents. The summaries built this way help users browse the generated hierarchies and select topics of interest. Our proposal consists of a new incremental hierarchical clustering algorithm that combines partitional and agglomerative approaches, drawing on the main benefits of both. Finally, a new summarization method based on Testor Theory is proposed to build the topic summaries. Experimental results on the TDT2 collection demonstrate the system's usefulness and effectiveness not only as a topic detection system, but also as a classification and summarization tool.
- Type
- a