Search (2 results, page 1 of 1)

  • × author_ss:"Lee, L.-H."
  • × author_ss:"Chen, H.-H."
  1. Lee, L.-H.; Juan, Y.-C.; Tseng, W.-L.; Chen, H.-H.; Tseng, Y.-H.: Mining browsing behaviors for objectionable content filtering (2015) 0.02
    0.016453793 = product of:
      0.049361378 = sum of:
        0.049361378 = weight(_text_:web in 1818) [ClassicSimilarity], result of:
          0.049361378 = score(doc=1818,freq=6.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.3122631 = fieldWeight in 1818, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1818)
      0.33333334 = coord(1/3)
    
    Abstract
    This article explores users' browsing intents to predict the category of a user's next access during web surfing and applies the results to filter objectionable content, such as pornography, gambling, violence, and drugs. Users' access trails in terms of category sequences in click-through data are employed to mine users' web browsing behaviors. Contextual relationships of URL categories are learned by the hidden Markov model. The top-level domains (TLDs) extracted from URLs themselves and the corresponding categories are caught by the TLD model. Given a URL to be predicted, its TLD and current context are empirically combined in an aggregation model. In addition to the uses of the current context, the predictions of the URL accessed previously in different contexts by various users are also considered by majority rule to improve the aggregation model. Large-scale experiments show that the advanced aggregation approach achieves promising performance while maintaining an acceptably low false positive rate. Different strategies are introduced to integrate the model with the blacklist it generates for filtering objectionable web pages without analyzing their content. In practice, this is complementary to the existing content analysis from users' behavioral perspectives.
  2. Lee, L.-H.; Chen, H.-H.: Mining search intents for collaborative cyberporn filtering (2012) 0.01
    0.009499602 = product of:
      0.028498804 = sum of:
        0.028498804 = weight(_text_:web in 4988) [ClassicSimilarity], result of:
          0.028498804 = score(doc=4988,freq=2.0), product of:
            0.15807624 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.048437484 = queryNorm
            0.18028519 = fieldWeight in 4988, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4988)
      0.33333334 = coord(1/3)
    
    Abstract
    This article presents a search-intent-based method to generate pornographic blacklists for collaborative cyberporn filtering. A novel porn-detection framework that can find newly appearing pornographic web pages by mining search query logs is proposed. First, suspected queries are identified along with their clicked URLs by an automatically constructed lexicon. Then, a candidate URL is determined if the number of clicks satisfies majority voting rules. Finally, a candidate whose URL contains at least one categorical keyword will be included in a blacklist. Several experiments are conducted on an MSN search porn dataset to demonstrate the effectiveness of our method. The resulting blacklist generated by our search-intent-based method achieves high precision (0.701) while maintaining a favorably low false-positive rate (0.086). The experiments of a real-life filtering simulation reveal that our proposed method with its accumulative update strategy can achieve 44.15% of a macro-averaging blocking rate, when the update frequency is set to 1 day. In addition, the overblocking rates are less than 9% with time change due to the strong advantages of our search-intent-based method. This user-behavior-oriented method can be easily applied to search engines for incorporating only implicit collective intelligence from query logs without other efforts. In practice, it is complementary to intelligent content analysis for keeping up with the changing trails of objectionable websites from users' perspectives.