Document (#38819)

Author
Lee, L.-H.
Juan, Y.-C.
Tseng, W.-L.
Chen, H.-H.
Tseng, Y.-H.
Title
Mining browsing behaviors for objectionable content filtering
Source
Journal of the Association for Information Science and Technology. 66(2015) no.5, pp. 930-942
Year
2015
Abstract
This article explores users' browsing intents to predict the category of a user's next access during web surfing and applies the results to filter objectionable content, such as pornography, gambling, violence, and drugs. Users' access trails in terms of category sequences in click-through data are employed to mine users' web browsing behaviors. Contextual relationships of URL categories are learned by the hidden Markov model. The top-level domains (TLDs) extracted from URLs themselves and the corresponding categories are caught by the TLD model. Given a URL to be predicted, its TLD and current context are empirically combined in an aggregation model. In addition to the uses of the current context, the predictions of the URL accessed previously in different contexts by various users are also considered by majority rule to improve the aggregation model. Large-scale experiments show that the advanced aggregation approach achieves promising performance while maintaining an acceptably low false positive rate. Different strategies are introduced to integrate the model with the blacklist it generates for filtering objectionable web pages without analyzing their content. In practice, this is complementary to the existing content analysis from users' behavioral perspectives.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23217/abstract.

Similar documents (author)

  1. Tseng, Y.-H.: Automatic cataloguing and searching for retrospective data by use of OCR text (2001) 2.23
    2.2263923 = sum of:
      2.2263923 = product of:
        4.4527845 = sum of:
          4.4527845 = weight(author_txt:tseng in 5421) [ClassicSimilarity], result of:
            4.4527845 = score(doc=5421,freq=1.0), product of:
              0.97521126 = queryWeight, product of:
                2.9689107 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.03596985 = queryNorm
              4.565969 = fieldWeight in 5421, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.5 = fieldNorm(doc=5421)
        0.5 = coord(1/2)
    
  2. Tseng, Y.-H.: Keyword extraction techniques and relevance feedback (1997) 2.23
    2.2263923 = sum of:
      2.2263923 = product of:
        4.4527845 = sum of:
          4.4527845 = weight(author_txt:tseng in 1830) [ClassicSimilarity], result of:
            4.4527845 = score(doc=1830,freq=1.0), product of:
              0.97521126 = queryWeight, product of:
                2.9689107 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.03596985 = queryNorm
              4.565969 = fieldWeight in 1830, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.5 = fieldNorm(doc=1830)
        0.5 = coord(1/2)
    
  3. Tseng, Y.-H.: Solving vocabulary problems with interactive query expansion (1998) 2.23
    2.2263923 = sum of:
      2.2263923 = product of:
        4.4527845 = sum of:
          4.4527845 = weight(author_txt:tseng in 5159) [ClassicSimilarity], result of:
            4.4527845 = score(doc=5159,freq=1.0), product of:
              0.97521126 = queryWeight, product of:
                2.9689107 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.03596985 = queryNorm
              4.565969 = fieldWeight in 5159, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.5 = fieldNorm(doc=5159)
        0.5 = coord(1/2)
    
  4. Tseng, Y.H.; Lin, Y.I.: Evaluation of fuzzy search, term suggestion, and term relevance feedback in an OPAC system (1998) 2.23
    2.2263923 = sum of:
      2.2263923 = product of:
        4.4527845 = sum of:
          4.4527845 = weight(author_txt:tseng in 6430) [ClassicSimilarity], result of:
            4.4527845 = score(doc=6430,freq=1.0), product of:
              0.97521126 = queryWeight, product of:
                2.9689107 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.03596985 = queryNorm
              4.565969 = fieldWeight in 6430, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.5 = fieldNorm(doc=6430)
        0.5 = coord(1/2)
    
  5. Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 2.23
    2.2263923 = sum of:
      2.2263923 = product of:
        4.4527845 = sum of:
          4.4527845 = weight(author_txt:tseng in 5226) [ClassicSimilarity], result of:
            4.4527845 = score(doc=5226,freq=1.0), product of:
              0.97521126 = queryWeight, product of:
                2.9689107 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.03596985 = queryNorm
              4.565969 = fieldWeight in 5226, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.5 = fieldNorm(doc=5226)
        0.5 = coord(1/2)
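
The score trees above all follow the same arithmetic, which can be reproduced with a short sketch. It assumes the usual Lucene ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes boost, queryNorm, fieldNorm and coord directly from the listing. The function names are hypothetical and this is not the catalogue's own code.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)
    query_weight = boost * idf * query_norm   # 0.97521126 in the listing
    field_weight = tf * idf * field_norm      # 4.565969 in the listing
    return query_weight * field_weight


# Values copied from the entry for doc 5421 (author_txt:tseng).
term_score = classic_term_score(
    freq=1.0, doc_freq=12, max_docs=44218,
    boost=2.9689107, query_norm=0.03596985, field_norm=0.5,
)

# coord(1/2): one of the two query clauses matched.
final_score = term_score * (1 / 2)
print(round(final_score, 4))
```

Because all five author matches share the same freq, docFreq and fieldNorm, they end up with the same score, which is why every entry in the list shows 2.23.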
    

Similar documents (content)

  1. Lee, L.-H.; Chen, H.-H.: Mining search intents for collaborative cyberporn filtering (2012) 0.15
    0.14609823 = sum of:
      0.14609823 = product of:
        0.6087426 = sum of:
          0.07058806 = weight(abstract_txt:false in 4988) [ClassicSimilarity], result of:
            0.07058806 = score(doc=4988,freq=1.0), product of:
              0.14789723 = queryWeight, product of:
                1.0066478 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.01923939 = queryNorm
              0.47727776 = fieldWeight in 4988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.0625 = fieldNorm(doc=4988)
          0.11266265 = weight(abstract_txt:trails in 4988) [ClassicSimilarity], result of:
            0.11266265 = score(doc=4988,freq=1.0), product of:
              0.20198815 = queryWeight, product of:
                1.1764148 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.01923939 = queryNorm
              0.55776864 = fieldWeight in 4988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.0625 = fieldNorm(doc=4988)
          0.21784766 = weight(abstract_txt:blacklist in 4988) [ClassicSimilarity], result of:
            0.21784766 = score(doc=4988,freq=2.0), product of:
              0.24882719 = queryWeight, product of:
                1.3057092 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.01923939 = queryNorm
              0.8754978 = fieldWeight in 4988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.0625 = fieldNorm(doc=4988)
          0.1252864 = weight(abstract_txt:filtering in 4988) [ClassicSimilarity], result of:
            0.1252864 = score(doc=4988,freq=2.0), product of:
              0.21680793 = queryWeight, product of:
                1.7236542 = boost
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.01923939 = queryNorm
              0.57786816 = fieldWeight in 4988, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.0625 = fieldNorm(doc=4988)
          0.046304002 = weight(abstract_txt:content in 4988) [ClassicSimilarity], result of:
            0.046304002 = score(doc=4988,freq=1.0), product of:
              0.17724401 = queryWeight, product of:
                2.2040088 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.01923939 = queryNorm
              0.2612444 = fieldWeight in 4988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.0625 = fieldNorm(doc=4988)
          0.036053825 = weight(abstract_txt:users in 4988) [ClassicSimilarity], result of:
            0.036053825 = score(doc=4988,freq=1.0), product of:
              0.16159582 = queryWeight, product of:
                2.3528683 = boost
                3.569778 = idf(docFreq=3384, maxDocs=44218)
                0.01923939 = queryNorm
              0.22311112 = fieldWeight in 4988, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.569778 = idf(docFreq=3384, maxDocs=44218)
                0.0625 = fieldNorm(doc=4988)
        0.24 = coord(6/25)
    
  2. Herrera-Viedma, E.; Pasi, G.; Lopez-Herrera, A.G.; Porcel, C.: Evaluating the information quality of Web sites : a methodology based on fuzzy computing with words (2006) 0.10
    0.09885234 = sum of:
      0.09885234 = product of:
        0.49426168 = sum of:
          0.10495602 = weight(abstract_txt:generates in 5286) [ClassicSimilarity], result of:
            0.10495602 = score(doc=5286,freq=2.0), product of:
              0.15292113 = queryWeight, product of:
                1.0236024 = boost
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.01923939 = queryNorm
              0.6863409 = fieldWeight in 5286, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.0625 = fieldNorm(doc=5286)
          0.08859086 = weight(abstract_txt:filtering in 5286) [ClassicSimilarity], result of:
            0.08859086 = score(doc=5286,freq=1.0), product of:
              0.21680793 = queryWeight, product of:
                1.7236542 = boost
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.01923939 = queryNorm
              0.4086145 = fieldWeight in 5286, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.537832 = idf(docFreq=173, maxDocs=44218)
                0.0625 = fieldNorm(doc=5286)
          0.06548374 = weight(abstract_txt:content in 5286) [ClassicSimilarity], result of:
            0.06548374 = score(doc=5286,freq=2.0), product of:
              0.17724401 = queryWeight, product of:
                2.2040088 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.01923939 = queryNorm
              0.36945534 = fieldWeight in 5286, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.0625 = fieldNorm(doc=5286)
          0.05098781 = weight(abstract_txt:users in 5286) [ClassicSimilarity], result of:
            0.05098781 = score(doc=5286,freq=2.0), product of:
              0.16159582 = queryWeight, product of:
                2.3528683 = boost
                3.569778 = idf(docFreq=3384, maxDocs=44218)
                0.01923939 = queryNorm
              0.31552678 = fieldWeight in 5286, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.569778 = idf(docFreq=3384, maxDocs=44218)
                0.0625 = fieldNorm(doc=5286)
          0.18424323 = weight(abstract_txt:aggregation in 5286) [ClassicSimilarity], result of:
            0.18424323 = score(doc=5286,freq=1.0), product of:
              0.40436542 = queryWeight, product of:
                2.8830035 = boost
                7.290168 = idf(docFreq=81, maxDocs=44218)
                0.01923939 = queryNorm
              0.4556355 = fieldWeight in 5286, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.290168 = idf(docFreq=81, maxDocs=44218)
                0.0625 = fieldNorm(doc=5286)
        0.2 = coord(5/25)
    
  3. Rorissa, A.; Iyer, H.: Theories of cognition and image categorization : what category labels reveal about basic level theory (2008) 0.09
    0.088205166 = sum of:
      0.088205166 = product of:
        0.44102582 = sum of:
          0.055007067 = weight(abstract_txt:categories in 1958) [ClassicSimilarity], result of:
            0.055007067 = score(doc=1958,freq=1.0), product of:
              0.13598412 = queryWeight, product of:
                1.3650753 = boost
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.01923939 = queryNorm
              0.40451095 = fieldWeight in 1958, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.078125 = fieldNorm(doc=1958)
          0.13711473 = weight(abstract_txt:category in 1958) [ClassicSimilarity], result of:
            0.13711473 = score(doc=1958,freq=2.0), product of:
              0.19842146 = queryWeight, product of:
                1.6489476 = boost
                6.2544694 = idf(docFreq=230, maxDocs=44218)
                0.01923939 = queryNorm
              0.69102776 = fieldWeight in 1958, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2544694 = idf(docFreq=230, maxDocs=44218)
                0.078125 = fieldNorm(doc=1958)
          0.057880003 = weight(abstract_txt:content in 1958) [ClassicSimilarity], result of:
            0.057880003 = score(doc=1958,freq=1.0), product of:
              0.17724401 = queryWeight, product of:
                2.2040088 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.01923939 = queryNorm
              0.3265555 = fieldWeight in 1958, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.078125 = fieldNorm(doc=1958)
          0.14595672 = weight(abstract_txt:browsing in 1958) [ClassicSimilarity], result of:
            0.14595672 = score(doc=1958,freq=2.0), product of:
              0.23679854 = queryWeight, product of:
                2.2062142 = boost
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.01923939 = queryNorm
              0.6163751 = fieldWeight in 1958, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.57879 = idf(docFreq=453, maxDocs=44218)
                0.078125 = fieldNorm(doc=1958)
          0.045067284 = weight(abstract_txt:users in 1958) [ClassicSimilarity], result of:
            0.045067284 = score(doc=1958,freq=1.0), product of:
              0.16159582 = queryWeight, product of:
                2.3528683 = boost
                3.569778 = idf(docFreq=3384, maxDocs=44218)
                0.01923939 = queryNorm
              0.2788889 = fieldWeight in 1958, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.569778 = idf(docFreq=3384, maxDocs=44218)
                0.078125 = fieldNorm(doc=1958)
        0.2 = coord(5/25)
    
  4. Lee, L.-H.; Luh, C.-J.: Generation of pornographic blacklist and its incremental update using an inverse chi-square based method (2008) 0.09
    0.087482825 = sum of:
      0.087482825 = product of:
        0.54676765 = sum of:
          0.08823507 = weight(abstract_txt:false in 1340) [ClassicSimilarity], result of:
            0.08823507 = score(doc=1340,freq=1.0), product of:
              0.14789723 = queryWeight, product of:
                1.0066478 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.01923939 = queryNorm
              0.5965972 = fieldWeight in 1340, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.078125 = fieldNorm(doc=1340)
          0.12834302 = weight(abstract_txt:pornography in 1340) [ClassicSimilarity], result of:
            0.12834302 = score(doc=1340,freq=1.0), product of:
              0.18986607 = queryWeight, product of:
                1.1405681 = boost
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.01923939 = queryNorm
              0.675966 = fieldWeight in 1340, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.652365 = idf(docFreq=20, maxDocs=44218)
                0.078125 = fieldNorm(doc=1340)
          0.27230957 = weight(abstract_txt:blacklist in 1340) [ClassicSimilarity], result of:
            0.27230957 = score(doc=1340,freq=2.0), product of:
              0.24882719 = queryWeight, product of:
                1.3057092 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.01923939 = queryNorm
              1.0943723 = fieldWeight in 1340, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.078125 = fieldNorm(doc=1340)
          0.057880003 = weight(abstract_txt:content in 1340) [ClassicSimilarity], result of:
            0.057880003 = score(doc=1340,freq=1.0), product of:
              0.17724401 = queryWeight, product of:
                2.2040088 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.01923939 = queryNorm
              0.3265555 = fieldWeight in 1340, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.078125 = fieldNorm(doc=1340)
        0.16 = coord(4/25)
    
  5. Bondarenko, O.; Janssen, R.: Connecting visual cues to semantic judgments in the context of the office environment (2009) 0.09
    0.08683436 = sum of:
      0.08683436 = product of:
        0.4341718 = sum of:
          0.044885725 = weight(abstract_txt:context in 2797) [ClassicSimilarity], result of:
            0.044885725 = score(doc=2797,freq=3.0), product of:
              0.09553906 = queryWeight, product of:
                1.1442028 = boost
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.01923939 = queryNorm
              0.46981543 = fieldWeight in 2797, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.339969 = idf(docFreq=1566, maxDocs=44218)
                0.0625 = fieldNorm(doc=2797)
          0.06223339 = weight(abstract_txt:categories in 2797) [ClassicSimilarity], result of:
            0.06223339 = score(doc=2797,freq=2.0), product of:
              0.13598412 = queryWeight, product of:
                1.3650753 = boost
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.01923939 = queryNorm
              0.45765188 = fieldWeight in 2797, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.0625 = fieldNorm(doc=2797)
          0.092608005 = weight(abstract_txt:content in 2797) [ClassicSimilarity], result of:
            0.092608005 = score(doc=2797,freq=4.0), product of:
              0.17724401 = queryWeight, product of:
                2.2040088 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.01923939 = queryNorm
              0.5224888 = fieldWeight in 2797, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.0625 = fieldNorm(doc=2797)
          0.05020143 = weight(abstract_txt:model in 2797) [ClassicSimilarity], result of:
            0.05020143 = score(doc=2797,freq=1.0), product of:
              0.20149918 = queryWeight, product of:
                2.6273575 = boost
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.01923939 = queryNorm
              0.24913962 = fieldWeight in 2797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.986234 = idf(docFreq=2231, maxDocs=44218)
                0.0625 = fieldNorm(doc=2797)
          0.18424323 = weight(abstract_txt:aggregation in 2797) [ClassicSimilarity], result of:
            0.18424323 = score(doc=2797,freq=1.0), product of:
              0.40436542 = queryWeight, product of:
                2.8830035 = boost
                7.290168 = idf(docFreq=81, maxDocs=44218)
                0.01923939 = queryNorm
              0.4556355 = fieldWeight in 2797, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.290168 = idf(docFreq=81, maxDocs=44218)
                0.0625 = fieldNorm(doc=2797)
        0.2 = coord(5/25)
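
For the content-based matches, the final score is the sum of the per-term scores multiplied by the coord factor (matched terms divided by the 25 query terms). The sketch below recomputes the first entry (doc 4988) under the same ClassicSimilarity assumptions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The helper name and the tuple layout are hypothetical; the numbers are copied from the listing.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.01923939   # queryNorm shown in the content-based trees
QUERY_TERMS = 25          # denominator of coord(n/25)


def term_score(freq, doc_freq, boost, field_norm):
    """queryWeight * fieldWeight for one matching abstract term."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# (term, freq, docFreq, boost) for doc 4988; fieldNorm is 0.0625 throughout.
matched = [
    ("false", 1.0, 57, 1.0066478),
    ("trails", 1.0, 15, 1.1764148),
    ("blacklist", 2.0, 5, 1.3057092),
    ("filtering", 2.0, 173, 1.7236542),
    ("content", 1.0, 1838, 2.2040088),
    ("users", 1.0, 3384, 2.3528683),
]

raw_sum = sum(term_score(f, df, b, 0.0625) for _, f, df, b in matched)
coord = len(matched) / QUERY_TERMS      # 6/25 = 0.24
score = raw_sum * coord
print(round(score, 4))
```

Rare terms such as "blacklist" (docFreq=5) dominate the sum because idf enters the product twice, once in queryWeight and once in fieldWeight.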