Diese Datenbank enthält über 40.000 Dokumente zu Themen aus den Bereichen Formalerschließung – Inhaltserschließung – Information Retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (Stand: 23. Dezember 2017)
1Negm, E. ; AbdelRahman, S. ; Bahgat, R.: PREFCA: a portal retrieval engine based on formal concept analysis.
In: Information processing and management. 53(2017) no.1, S.203-222.
Abstract: The web is a network of linked sites whereby each site either forms a physical portal or a standalone page. In the former case, the portal presents an access point to its embedded web pages that coherently present a specific topic. In the latter case, there are millions of standalone web pages, that are scattered throughout the web, having the same topic and could be conceptually linked together to form virtual portals. Search engines have been developed to help users in reaching the adequate pages in an efficient and effective manner. All the known current search engine techniques rely on the web page as the basic atomic search unit. They ignore the conceptual links, that reveal the implicit web related meanings, among the retrieved pages. However, building a semantic model for the whole portal may contain more semantic information than a model of scattered individual pages. In addition, user queries can be poor and contain imprecise terms that do not reflect the real user intention. Consequently, retrieving the standalone individual pages that are directly related to the query may not satisfy the user's need. In this paper, we propose PREFCA, a Portal Retrieval Engine based on Formal Concept Analysis that relies on the portal as the main search unit. PREFCA consists of three phases: First, the information extraction phase that is concerned with extracting portal's semantic data. Second, the formal concept analysis phase that utilizes formal concept analysis to discover the conceptual links among portal and attributes. Finally, the information retrieval phase where we propose a portal ranking method to retrieve ranked pairs of portals and embedded pages. Additionally, we apply the network analysis rules to output some portal characteristics. We evaluated PREFCA using two data sets, namely the Forum for Information Retrieval Evaluation 2010 and ClueWeb09 (category B) test data, for physical and virtual portals respectively. PREFCA proves higher F-measure accuracy, better Mean Average Precision ranking and comparable network analysis and efficiency results than other search engine approaches, namely Term Frequency Inverse Document Frequency (TF-IDF), Latent Semantic Analysis (LSA), and BM25 techniques. As well, it gains high Mean Average Precision in comparison with learning to rank techniques. Moreover, PREFCA also gains better reach time than Carrot as a well-known topic-based search engine.
Inhalt: Vgl.: http://www.sciencedirect.com/science/article/pii/S0306457316303569 [http://dx.doi.org/10.1016/j.ipm.2016.08.002].
Themenfeld: Formale Begriffsanalyse