Search (4 results, page 1 of 1)

  • author_ss:"Chen, H.-H."
  • language_ss:"e"
  • year_i:[2010 TO 2020}
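The filters above use Lucene/Solr query syntax. In the range filter `year_i:[2010 TO 2020}`, the square bracket makes 2010 inclusive and the curly brace makes 2020 exclusive. The sketch below mirrors the three filters as a plain Python predicate. It assumes that the `_ss` and `_i` suffixes follow Solr's default dynamic-field conventions (multi-valued string, integer). `Record` and `matches_filters` are hypothetical names for illustration and are not part of any Solr API.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical record shape; field names follow the Solr fields in the filters.
    authors: list[str]    # author_ss: multi-valued string field
    languages: list[str]  # language_ss: multi-valued string field (codes as indexed, e.g. "e")
    year: int             # year_i: integer field


def matches_filters(rec: Record) -> bool:
    """Plain-Python mirror of the three filter queries (illustrative only)."""
    return (
        "Chen, H.-H." in rec.authors   # author_ss:"Chen, H.-H." -> exact value match
        and "e" in rec.languages       # language_ss:"e"
        and 2010 <= rec.year < 2020    # year_i:[2010 TO 2020} -> '[' inclusive, '}' exclusive
    )


# The four hits on this page, reduced to the filtered fields.
hits = [
    Record(["Lee, L.-H.", "Chen, H.-H."], ["e"], 2012),
    Record(["Liu, J.S.", "Chen, H.-H.", "Ho, M.H.-C.", "Li, Y.-C."], ["e"], 2014),
    Record(["Tsai, M.-F.", "Chen, H.-H.", "Wang, Y.-T."], ["e"], 2011),
    Record(["Hsu, M.-H.", "Chen, H.-H."], ["e"], 2011),
]
```

All four hits pass. A record from 2020 would be rejected by the exclusive upper bound.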
  1. Lee, L.-H.; Chen, H.-H.: Mining search intents for collaborative cyberporn filtering (2012) 0.05
    0.050061207 = product of:
      0.100122415 = sum of:
        0.010194084 = product of:
          0.040776335 = sum of:
            0.040776335 = weight(_text_:based in 4988) [ClassicSimilarity], result of:
              0.040776335 = score(doc=4988,freq=6.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.28829288 = fieldWeight in 4988, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4988)
          0.25 = coord(1/4)
        0.08992833 = weight(_text_:frequency in 4988) [ClassicSimilarity], result of:
          0.08992833 = score(doc=4988,freq=2.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.32531026 = fieldWeight in 4988, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4988)
      0.5 = coord(2/4)
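Each explain tree multiplies out the TF-IDF components of ClassicSimilarity:

- tf = sqrt(freq)
- idf = 1 + ln(maxDocs / (docFreq + 1))
- queryWeight = idf × queryNorm
- fieldWeight = tf × idf × fieldNorm
- a coord factor for the fraction of query clauses that matched

The sketch below recomputes the 0.050061207 score of doc 4988 from the inputs listed in its tree. queryNorm and fieldNorm are copied from the explain output rather than derived, because the remaining query clauses and the encoded field length are not shown. The helper names are illustrative and are not Lucene's API.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    # ClassicSimilarity: tf = sqrt(termFreq)
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    i = idf(doc_freq, max_docs)
    query_weight = i * query_norm              # queryWeight = idf * queryNorm
    field_weight = tf(freq) * i * field_norm   # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight


MAX_DOCS = 44218
QUERY_NORM = 0.04694356   # copied from the explain output, not recomputed
FIELD_NORM = 0.0390625    # fieldNorm(doc=4988), copied from the explain output

based = term_score(6.0, 5906, MAX_DOCS, QUERY_NORM, FIELD_NORM)
frequency = term_score(2.0, 332, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# "based" sits in a nested clause that matched 1 of its 4 sub-clauses -> coord(1/4);
# the top-level query matched 2 of its 4 clauses -> coord(2/4).
doc_4988 = (based * 1 / 4 + frequency) * 2 / 4
print(f"{doc_4988:.9f}")
```

The printed value agrees with the tree's 0.050061207 to within float rounding. The other three hits follow the same pattern. Only the "based" clause matches, so their scores are that clause's weight multiplied by coord(1/4) twice.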
    
    Abstract
    This article presents a search-intent-based method to generate pornographic blacklists for collaborative cyberporn filtering. We propose a novel porn-detection framework that finds newly appearing pornographic web pages by mining search query logs. First, suspected queries are identified, along with their clicked URLs, using an automatically constructed lexicon. Then, a clicked URL becomes a candidate if its number of clicks satisfies majority voting rules. Finally, a candidate whose URL contains at least one categorical keyword is included in the blacklist. Several experiments are conducted on an MSN search porn dataset to demonstrate the effectiveness of our method. The blacklist generated by our search-intent-based method achieves high precision (0.701) while maintaining a favorably low false-positive rate (0.086). Experiments in a real-life filtering simulation reveal that our proposed method, with its accumulative update strategy, achieves a macro-averaged blocking rate of 44.15% when the update frequency is set to 1 day. In addition, the overblocking rates remain below 9% over time, owing to the strengths of our search-intent-based method. This user-behavior-oriented method can easily be applied in search engines, since it relies only on implicit collective intelligence from query logs and requires no additional effort. In practice, it complements intelligent content analysis by keeping up with the changing trails of objectionable websites from users' perspectives.
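The three filtering steps in the abstract can be sketched as a small pipeline. This is not the authors' implementation. Several pieces are assumptions made for illustration: the lexicon, the categorical keywords, the `build_blacklist` name, and the reading of "majority voting" as "more than half of a URL's clicks come from suspected queries".

```python
from collections import Counter

# Illustrative stand-ins, not values from the article.
LEXICON = {"porn", "xxx"}                 # plays the role of the automatically built lexicon
CATEGORICAL_KEYWORDS = {"adult", "sex"}   # plays the role of the categorical keywords
MAJORITY = 0.5                            # assumed voting rule: > 50% of clicks must be suspected


def build_blacklist(click_log):
    """click_log: iterable of (query, clicked_url) pairs from a search engine log."""
    suspected, total = Counter(), Counter()
    for query, url in click_log:
        total[url] += 1
        # Step 1: a query is suspected if any of its terms appears in the lexicon.
        if any(term in LEXICON for term in query.lower().split()):
            suspected[url] += 1
    blacklist = set()
    for url, votes in suspected.items():
        # Step 2: majority voting over the clicks each URL received.
        is_candidate = votes / total[url] > MAJORITY
        # Step 3: keep a candidate only if its URL contains a categorical keyword.
        if is_candidate and any(kw in url.lower() for kw in CATEGORICAL_KEYWORDS):
            blacklist.add(url)
    return blacklist


log = [
    ("free porn videos", "http://adult-site.example/a"),
    ("xxx clips", "http://adult-site.example/a"),
    ("news today", "http://adult-site.example/a"),
    ("porn", "http://video.example/b"),
    ("weather", "http://news.example/c"),
]
blacklist = build_blacklist(log)
```

In this toy log, only the first URL is blacklisted. Two of its three clicks come from suspected queries, and its URL contains a categorical keyword. The second URL passes the vote but fails the keyword check.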
  2. Liu, J.S.; Chen, H.-H.; Ho, M.H.-C.; Li, Y.-C.: Citations with different levels of relevancy : tracing the main paths of legal opinions (2014) 0.00
    0.002548521 = product of:
      0.010194084 = sum of:
        0.010194084 = product of:
          0.040776335 = sum of:
            0.040776335 = weight(_text_:based in 1546) [ClassicSimilarity], result of:
              0.040776335 = score(doc=1546,freq=6.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.28829288 = fieldWeight in 1546, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1546)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    This study explores the effect of considering citation relevancy in main path analysis. Traditional citation-based analyses treat all citations equally, even though a document may cite another for various reasons and with different levels of relevancy. Taking the relevancy level into consideration is intuitively advantageous because it uses more accurate information and should thus make the results of a citation-based analysis more trustworthy. This is nevertheless a challenging task: we are aware of no citation-based analysis that has taken the relevancy level into consideration. The difficulty lies in the fact that existing patent or patent citation databases provide no readily available relevancy-level information. We overcome this issue by obtaining citation relevancy information from a legal database in which relevancy levels are ranked by legal experts. This paper selects trademark dilution, a legal concept that has been the subject of many lawsuits, as the target for exploration. We apply main path analysis, taking citation relevancy into consideration, and verify the results against a set of test cases mentioned in an authoritative trademark book. The findings show that relevancy information helps main path analysis uncover legal cases of higher importance. Nevertheless, in terms of the number of significant cases retrieved, relevancy information does not seem to make a noticeable difference.
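Main path analysis typically weights each citation link by a traversal count and then follows the heaviest links. A common choice is the search path count (SPC), the number of source-to-sink paths that pass through the link. The abstract does not say how relevancy levels enter the computation. The sketch below therefore makes two hypothetical choices: each link's SPC is multiplied by its expert-assigned relevancy level, and a greedy forward (local) main path search is used. Edges point from the cited decision to the citing one. The toy network and the `main_path` helper are illustrative only.

```python
from collections import defaultdict
from functools import lru_cache


def main_path(edges, relevancy=None):
    """Greedy forward main path over a citation DAG.

    edges: (cited, citing) pairs, oriented along the flow of knowledge.
    relevancy: optional {(cited, citing): level}; multiplying SPC by it is an assumption.
    """
    succ, pred, nodes = defaultdict(list), defaultdict(list), set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    @lru_cache(maxsize=None)
    def paths_in(n):   # number of paths from any source to n
        return 1 if not pred[n] else sum(paths_in(p) for p in pred[n])

    @lru_cache(maxsize=None)
    def paths_out(n):  # number of paths from n to any sink
        return 1 if not succ[n] else sum(paths_out(s) for s in succ[n])

    def weight(u, v):
        spc = paths_in(u) * paths_out(v)   # search path count of link u -> v
        level = 1.0 if relevancy is None else relevancy.get((u, v), 1.0)
        return spc * level

    sources = sorted(n for n in nodes if not pred[n])
    # Start on the heaviest link leaving a source; ties go to the first link listed.
    path = list(max(((s, t) for s in sources for t in succ[s]), key=lambda e: weight(*e)))
    while succ[path[-1]]:
        u = path[-1]
        path.append(max(succ[u], key=lambda v: weight(u, v)))
    return path


# Hypothetical toy network: decision B is cited by both C and D, whose links have equal SPC.
cites = [("A", "B"), ("A", "G"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E"), ("E", "F")]
```

Without relevancy weights, the links B→C and B→D tie, and the path runs through C only because that link is listed first. Giving B→D a higher relevancy level routes the main path through D instead. This is the kind of effect relevancy information can have on which cases appear on the path.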
  3. Tsai, M.-F.; Chen, H.-H.; Wang, Y.-T.: Learning a merge model for multilingual information retrieval (2011) 0.00
    0.0020808585 = product of:
      0.008323434 = sum of:
        0.008323434 = product of:
          0.033293735 = sum of:
            0.033293735 = weight(_text_:based in 2750) [ClassicSimilarity], result of:
              0.033293735 = score(doc=2750,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23539014 = fieldWeight in 2750, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2750)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    This paper proposes a learning approach for the merging process in multilingual information retrieval (MLIR). To support the learning approach, we present a number of features that may influence the MLIR merging process. These features are mainly extracted from three levels: query, document, and translation. After feature extraction, we use the FRank ranking algorithm to construct a merge model. To the best of our knowledge, this is the first attempt to use a learning-based ranking algorithm to construct a merge model for MLIR merging. In our experiments, three test collections from the crosslingual information retrieval (CLIR) task in NTCIR-3, NTCIR-4, and NTCIR-5 are used to assess the performance of our proposed method. Several merging methods are also evaluated for comparison, including traditional merging methods, the 2-step merging strategy, and a merging method based on logistic regression. The experimental results show that our proposed method can significantly improve merging quality on two different types of datasets. Beyond its effectiveness, our method can use the merge model generated by FRank to identify key factors that influence the merging process. This information may provide further insight into MLIR merging.
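The abstract names FRank as the learner behind the merge model. FRank minimizes a pairwise fidelity loss:

F = 1 - sqrt(P* P) - sqrt((1 - P*)(1 - P))

Here P is the model's probability that one document should outrank another, and P* is the target probability. The sketch below keeps that loss but swaps in a plain linear model trained by gradient descent, so it is not FRank's own training procedure. Several parts are hypothetical stand-ins: the two features, the toy documents, and the `train` and `merge` helpers. They substitute for the query-, document-, and translation-level features the paper describes.

```python
import math


def fidelity_loss(p_target, p):
    # Pairwise fidelity loss (as used by FRank): 1 - sqrt(P* P) - sqrt((1 - P*)(1 - P))
    return 1.0 - math.sqrt(p_target * p) - math.sqrt((1.0 - p_target) * (1.0 - p))


def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def pair_prob(w, x_better, x_worse):
    # P_ij = logistic(o_ij), with o_ij = f(x_i) - f(x_j) for a linear f
    return 1.0 / (1.0 + math.exp(-(score(w, x_better) - score(w, x_worse))))


def train(pairs, n_features, lr=0.5, epochs=200):
    """Linear ranker minimizing fidelity loss by gradient descent (not FRank's own training).

    pairs: (x_better, x_worse) feature vectors, i.e. target P* = 1 for every pair.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for xb, xw in pairs:
            p = pair_prob(w, xb, xw)
            # With P* = 1 the loss is 1 - sqrt(p); its derivative w.r.t. o is -0.5*sqrt(p)*(1-p).
            step = lr * 0.5 * math.sqrt(p) * (1.0 - p)
            w = [wi + step * (b - c) for wi, b, c in zip(w, xb, xw)]
    return w


def total_loss(w, pairs):
    return sum(fidelity_loss(1.0, pair_prob(w, b, c)) for b, c in pairs)


def merge(runs, w):
    """Pool per-language result lists and order them by the learned merge model."""
    pooled = [item for run in runs for item in run]
    return [doc for doc, x in sorted(pooled, key=lambda item: score(w, item[1]), reverse=True)]


# Hypothetical features: [normalized retrieval score within its run, translation confidence].
relevant = [[0.9, 0.9], [0.6, 0.8]]
nonrelevant = [[0.8, 0.2], [0.7, 0.1]]
pairs = [(r, n) for r in relevant for n in nonrelevant]
w = train(pairs, n_features=2)

english_run = [("en-1", [0.95, 1.0]), ("en-2", [0.40, 1.0])]
chinese_run = [("zh-1", [0.99, 0.1]), ("zh-2", [0.70, 0.9])]
merged = merge([english_run, chinese_run], w)
```

In this toy setting, the learned weight on translation confidence dominates. The poorly translated "zh-1" therefore lands at the bottom of the merged list despite having the highest raw score in its run. Inspecting `w` plays a role similar to identifying the key factors in the merging process, as the abstract mentions.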
  4. Hsu, M.-H.; Chen, H.-H.: Efficient and effective prediction of social tags to enhance Web search (2011) 0.00
    0.0014713892 = product of:
      0.005885557 = sum of:
        0.005885557 = product of:
          0.023542227 = sum of:
            0.023542227 = weight(_text_:based in 4625) [ClassicSimilarity], result of:
              0.023542227 = score(doc=4625,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.16644597 = fieldWeight in 4625, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4625)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    As the web has grown into an integral part of daily life, social annotation has become a popular way for web users to manage resources. This method of management has many potential applications, but its applicability is limited by the cold-start problem, especially for new resources on the web. In this article, we comprehensively study automatic tag prediction for web pages and use the predicted tags to improve search performance. First, we explore the stabilizing phenomenon of tag usage in a social bookmarking system. Then, we propose a two-stage tag prediction approach that is efficient and makes effective use of early annotations from users. In the first stage, content-based ranking, candidate tags are selected and ranked to generate an initial tag list. In the second stage, random-walk re-ranking, we adopt a random-walk model that uses tag co-occurrence information to re-rank the initial list. The experimental results show that our algorithm effectively proposes appropriate tags for target web pages. In addition, we present a framework that incorporates tag prediction into general web search. The web search experiments validate the hypothesis that the proposed framework significantly enhances the typical retrieval model.
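The two-stage idea can be sketched as a random walk with restart over a tag co-occurrence graph. The walk restarts in proportion to the stage-one content-based scores, so tags that co-occur with strong candidates gain weight. The abstract does not specify the exact random-walk model. The restart formulation, the `alpha` value, the `rerank_tags` name, and the toy data are all assumptions for illustration.

```python
def rerank_tags(initial, cooccur, alpha=0.8, iters=50):
    """Random walk with restart over a tag co-occurrence graph.

    initial: {tag: stage-1 content-based score}; the walk restarts in proportion to it.
    cooccur: {tag: {other_tag: co-occurrence count}}, assumed symmetric.
    alpha:   probability of following a co-occurrence edge (hypothetical value).
    """
    tags = sorted(set(initial) | set(cooccur) | {u for nbrs in cooccur.values() for u in nbrs})
    total = sum(initial.values())
    restart = {t: initial.get(t, 0.0) / total for t in tags}
    r = dict(restart)
    for _ in range(iters):
        nxt = {t: (1.0 - alpha) * restart[t] for t in tags}
        for t in tags:
            nbrs = cooccur.get(t, {})
            out = sum(nbrs.values())
            if out == 0:
                # Tag with no co-occurrences: return its walk mass to the restart distribution.
                for u in tags:
                    nxt[u] += alpha * r[t] * restart[u]
            else:
                for u, count in nbrs.items():
                    nxt[u] += alpha * r[t] * count / out
        r = nxt
    ranked = sorted(tags, key=lambda t: r[t], reverse=True)
    return ranked, r


# Hypothetical stage-1 output and co-occurrence counts for one web page.
initial = {"python": 0.5, "code": 0.3, "snake": 0.2}
cooccur = {
    "python": {"programming": 5, "code": 3},
    "code": {"programming": 4, "python": 3},
    "programming": {"python": 5, "code": 4},
    "snake": {"reptile": 1},
    "reptile": {"snake": 1},
}
ranked, scores = rerank_tags(initial, cooccur)
```

Here "programming" is absent from the stage-one list. It still ends up ranked above "snake" because it co-occurs heavily with the top candidates, and the scores stay a probability distribution throughout.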