Lee, Y.-H.; Wei, C.-P.; Hu, P.J.-H.: An ontology-based technique for preserving user preferences in document-category evolutions (2011)
- Abstract
- Influxes of new documents over time necessitate reorganization of document categories that a user has created previously. As documents become available in increasing quantities and at accelerating frequencies, the manual approach to reorganizing document categories becomes prohibitively tedious and ineffective, thus making a system-oriented approach appealing. Previous research (Larsen & Aone, 1999; Pantel & Lin, 2002) has largely followed the category-discovery approach, which groups documents by using a document-clustering technique to partition a document corpus. This approach does not consider the existing categories a user created previously, which in effect reflect his or her document-grouping preference. A handful of studies (Wei, Hu, & Dong, 2002; Wei, Hu, & Lee, 2009) have taken a category-evolution approach to develop lexicon-based techniques for preserving user preference in document-category reorganizations, but these techniques have serious limitations. Responding to the significance of document-category reorganizations and addressing the fundamental problems of salient, lexicon-based techniques, we develop ontology-based category evolution (ONCE), a technique that first enriches a concept hierarchy by incorporating important concept descriptors (jointly referred to as an ontology) and then employs the resulting enriched ontology to support category evolutions at the concept level rather than analyzing and comparing feature vectors at the lexicon level. We empirically evaluate our proposed technique and compare it with two benchmark techniques: CE2 (a lexicon-based category-evolution technique) and hierarchical agglomerative clustering (HAC; a conventional hierarchical document-clustering technique). Overall, our results show that the ONCE technique is more effective than CE2 and HAC across all the scenarios studied. Furthermore, the completeness of a concept hierarchy has an important impact on the performance of the proposed technique. Our results have some important implications for further research.
- Type
- a