This database contains over 40,000 documents on topics in descriptive cataloging, subject indexing, and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of: 3 March 2020)
1. Lee, L.-H. ; Juan, Y.-C. ; Tseng, W.-L. ; Chen, H.-H. ; Tseng, Y.-H.: Mining browsing behaviors for objectionable content filtering.
In: Journal of the Association for Information Science and Technology. 66(2015) no.5, pp. 930-942.
Abstract: This article explores users' browsing intents to predict the category of a user's next access during web surfing and applies the results to filter objectionable content, such as pornography, gambling, violence, and drugs. Users' access trails, in terms of category sequences in click-through data, are employed to mine users' web browsing behaviors. Contextual relationships of URL categories are learned by a hidden Markov model. The top-level domains (TLDs) extracted from the URLs and their corresponding categories are captured by a TLD model. Given a URL to be predicted, its TLD and current context are empirically combined in an aggregation model. In addition to the current context, the predictions made for the same URL when it was accessed previously in different contexts by various users are also considered by majority rule to improve the aggregation model. Large-scale experiments show that the advanced aggregation approach achieves promising performance while maintaining an acceptably low false positive rate. Different strategies are introduced to integrate the model with the blacklist it generates, so that objectionable web pages can be filtered without analyzing their content. In practice, this behavior-based approach complements existing content analysis.
Note: See http://onlinelibrary.wiley.com/doi/10.1002/asi.23217/abstract.
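The abstract above combines a context model learned from category sequences with a TLD model and a majority vote over earlier predictions. The following is only a simplified sketch of that idea, not the authors' method: it replaces the hidden Markov model with a first-order Markov chain over observed categories, and it combines the two models by linear interpolation with a hypothetical weight. All function names and the toy data are illustrative assumptions.

```python
from collections import Counter, defaultdict


def train_transitions(trails):
    """Count category-to-category transitions in users' click-through trails.

    Simplification: a first-order Markov chain, not the article's hidden Markov model.
    """
    counts = defaultdict(Counter)
    for trail in trails:
        for prev, nxt in zip(trail, trail[1:]):
            counts[prev][nxt] += 1
    return counts


def train_tld_model(pairs):
    """Count how often each top-level domain co-occurs with each URL category."""
    counts = defaultdict(Counter)
    for tld, category in pairs:
        counts[tld][category] += 1
    return counts


def normalize(counter):
    """Turn raw counts into a probability distribution (empty dict if no counts)."""
    total = sum(counter.values())
    return {k: v / total for k, v in counter.items()} if total else {}


def predict(current_category, tld, transitions, tld_model, weight=0.5):
    """Pick the next category by interpolating the context and TLD distributions.

    `weight` is a hypothetical mixing parameter, not a value from the article.
    Returns None when neither model has seen the context or the TLD.
    """
    context_dist = normalize(transitions.get(current_category, Counter()))
    tld_dist = normalize(tld_model.get(tld, Counter()))
    candidates = set(context_dist) | set(tld_dist)
    if not candidates:
        return None
    scores = {
        c: weight * context_dist.get(c, 0.0) + (1 - weight) * tld_dist.get(c, 0.0)
        for c in candidates
    }
    return max(scores, key=scores.get)


def majority_vote(previous_predictions):
    """Return the category predicted most often for a URL across earlier contexts/users."""
    if not previous_predictions:
        return None
    return Counter(previous_predictions).most_common(1)[0][0]


# Invented toy data for illustration only.
trails = [["news", "sports", "gambling"], ["news", "gambling"],
          ["sports", "gambling"], ["news", "sports"]]
pairs = [("bet", "gambling"), ("bet", "gambling"), ("com", "news"), ("com", "sports")]
transitions = train_transitions(trails)
tld_model = train_tld_model(pairs)
print(predict("news", "bet", transitions, tld_model))  # gambling
```

Linear interpolation is used here because it is the simplest way to merge two probability distributions; the article describes its own empirical aggregation, which may weight the models differently.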
2. Lee, L.-H. ; Chen, H.-H.: Mining search intents for collaborative cyberporn filtering.
In: Journal of the American Society for Information Science and Technology. 63(2012) no.2, pp. 366-376.
Abstract: This article presents a search-intent-based method to generate pornographic blacklists for collaborative cyberporn filtering. A novel porn-detection framework is proposed that finds newly appearing pornographic web pages by mining search query logs. First, suspected queries, along with their clicked URLs, are identified by an automatically constructed lexicon. Then, a URL is selected as a candidate if its number of clicks satisfies majority voting rules. Finally, a candidate whose URL contains at least one categorical keyword is included in the blacklist. Several experiments are conducted on an MSN search porn dataset to demonstrate the effectiveness of our method. The blacklist generated by our search-intent-based method achieves high precision (0.701) while maintaining a favorably low false-positive rate (0.086). A real-life filtering simulation shows that the proposed method, with its accumulative update strategy, achieves a macro-averaged blocking rate of 44.15% when the update frequency is set to 1 day. In addition, the overblocking rates remain below 9% over time, owing to the strengths of our search-intent-based method. This user-behavior-oriented method can easily be applied to search engines, since it relies only on the implicit collective intelligence in query logs and requires no other effort. In practice, it complements intelligent content analysis in keeping up with the changing trails of objectionable websites from users' perspectives.
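The three steps in this abstract (lexicon-based query flagging, majority voting on clicks, categorical keyword check on the URL) can be sketched as a small pipeline. This is an illustrative assumption, not the authors' implementation: in particular, "majority voting rules" is interpreted here as "suspected queries account for more than half of a URL's clicks", and the lexicon is given by hand rather than constructed automatically. The function name and data are hypothetical.

```python
from collections import Counter


def build_blacklist(click_log, lexicon, url_keywords):
    """Sketch of a search-intent blacklist: lexicon -> majority vote -> URL keyword check.

    click_log:    iterable of (query, clicked_url) pairs from a hypothetical query log.
    lexicon:      set of lowercase terms that mark a query as suspected.
    url_keywords: set of lowercase categorical keywords that must appear in the URL.
    """
    suspect_clicks = Counter()
    total_clicks = Counter()
    for query, url in click_log:
        total_clicks[url] += 1
        # Step 1: a query is suspected if any of its tokens is in the lexicon.
        if lexicon & set(query.lower().split()):
            suspect_clicks[url] += 1

    blacklist = set()
    for url, n_suspect in suspect_clicks.items():
        # Step 2 (assumed rule): a URL becomes a candidate only when suspected
        # queries account for a strict majority of its clicks.
        if 2 * n_suspect <= total_clicks[url]:
            continue
        # Step 3: keep only candidates whose URL contains a categorical keyword.
        if any(keyword in url.lower() for keyword in url_keywords):
            blacklist.add(url)
    return blacklist
```

Requiring both a click majority and a URL keyword mirrors the abstract's two-stage filter: the vote screens out popular general-purpose pages that occasionally receive suspect clicks, and the keyword check trades some recall for precision.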
3. Lee, L.-H. ; Luh, C.-J.: Generation of pornographic blacklist and its incremental update using an inverse chi-square based method.
In: Information processing and management. 44(2008) no.5, pp. 1698-1706.
Abstract: This study presents an inverse chi-square based web content classification system that works with an incremental update mechanism for the incremental generation of a pornographic blacklist. The experimental results indicate that the proposed system can classify bilingual (English and Chinese) web pages at an average precision rate of 97.11% while maintaining a favorably low false positive rate. This performance was obtained under a cost-effective parameter configuration for the inverse chi-square calculations. The proposed incremental update mechanism operates on the linking structure of pornographic hubs to locate newly added pornographic sites. The resulting blacklist was empirically verified to be more responsive to the growth dynamics of pornographic sites than three public-domain blacklists.
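The abstract does not spell out the inverse chi-square calculation. A well-known technique under that name is Fisher's method of combining probabilities as adapted by Gary Robinson for Bayesian spam filtering; the sketch below assumes the classifier works that way, which the source does not confirm. Given per-term probabilities p_1 … p_n that a page is pornographic, it computes H = C^{-1}(-2 Σ ln p_i, 2n) and S = C^{-1}(-2 Σ ln(1 - p_i), 2n), where C^{-1} returns the upper-tail probability of a chi-square distribution, and reports (1 + H - S) / 2. The function names and the clamping constant `eps` are hypothetical.

```python
import math


def chi2_survival_even_df(x2, df):
    """P(X >= x2) for a chi-square variable X with an even number of degrees of freedom.

    Uses the closed-form series exp(-m) * sum(m**i / i!) for i < df/2, with m = x2/2.
    For very large x2, exp(-m) underflows to 0.0, which is acceptable for a sketch.
    """
    if df % 2:
        raise ValueError("df must be even")
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def inverse_chi_square_score(term_probs, eps=0.01):
    """Combine per-term probabilities into one score in [0, 1].

    term_probs: hypothetical estimates of P(pornographic | term) for a page's terms.
    Scores near 1 suggest pornographic, near 0 non-pornographic, near 0.5 uncertain.
    """
    # Clamp away from 0 and 1 so that log() stays finite.
    ps = [min(max(p, eps), 1 - eps) for p in term_probs]
    n = len(ps)
    if n == 0:
        return 0.5  # no evidence either way
    h = chi2_survival_even_df(-2.0 * sum(math.log(p) for p in ps), 2 * n)
    s = chi2_survival_even_df(-2.0 * sum(math.log(1 - p) for p in ps), 2 * n)
    return (1.0 + h - s) / 2.0
```

A practical property of this combination is that it returns values near 0.5 when the evidence conflicts, which gives a classifier a natural "unsure" band instead of forcing every page into one of two classes.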
4. Pang, B. ; Lee, L.: Opinion mining and sentiment analysis.
Boston, MA : Now Publ., 2008. IX, 137 pp.
(Foundations and trends® in information retrieval; 2,1/2)
Abstract: An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. Opinion Mining and Sentiment Analysis covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. The focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. The survey enumerates the various applications, looks at general challenges, and discusses categorization, extraction, and summarization. Finally, it moves beyond the purely technical issues, devoting significant attention to the broader implications of developing opinion-oriented information-access services: questions of privacy, vulnerability to manipulation, and whether or not reviews can have a measurable economic impact. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided. Opinion Mining and Sentiment Analysis is the first such comprehensive survey of this vibrant and important research area and will be of interest to anyone with an interest in opinion-oriented information-seeking systems.
Contents: 1. Introduction; 2. Applications; 3. General Challenges; 4. Classification and Extraction; 5. Summarization; 6. Broader Implications; 7. Publicly Available Resources; 8. Concluding Remarks; References
LCSH: Information behavior ; Research ; Information retrieval ; Public opinion ; Text processing (Computer science)
RSWK: World Wide Web / Meinungsäußerung [expression of opinion] / Data Mining ; Data Mining / Psycholinguistik [psycholinguistics] (BVB)
BK: 54.72 (Artificial intelligence)
RVK: ST 530