Search (7 results, page 1 of 1)

  • × author_ss:"Chen, H.-H."
  1. Ku, L.-W.; Ho, H.-W.; Chen, H.-H.: Opinion mining and relationship discovery using CopeOpi opinion analysis system (2009) 0.02
    0.017020667 = product of:
      0.042551666 = sum of:
        0.032950602 = weight(_text_:system in 2938) [ClassicSimilarity], result of:
          0.032950602 = score(doc=2938,freq=4.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.24605882 = fieldWeight in 2938, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2938)
        0.009601062 = product of:
          0.028803186 = sum of:
            0.028803186 = weight(_text_:22 in 2938) [ClassicSimilarity], result of:
              0.028803186 = score(doc=2938,freq=2.0), product of:
                0.1488917 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04251826 = queryNorm
                0.19345059 = fieldWeight in 2938, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2938)
          0.33333334 = coord(1/3)
      0.4 = coord(2/5)
    
    Abstract
    We present CopeOpi, an opinion-analysis system, which extracts from the Web opinions about specific targets, summarizes the polarity and strength of these opinions, and tracks opinion variations over time. Objects that yield similar opinion tendencies over a certain time period may be correlated due to the latent causal events. CopeOpi discovers relationships among objects based on their opinion-tracking plots and collocations. Event bursts are detected from the tracking plots, and the strength of opinion relationships is determined by the coverage of these plots. To evaluate opinion mining, we use the NTCIR corpus annotated with opinion information at sentence and document levels. CopeOpi achieves sentence- and document-level f-measures of 62% and 74%. For relationship discovery, we collected 1.3M economics-related documents from 93 Web sources over 22 months, and analyzed collocation-based, opinion-based, and hybrid models. We consider as correlated company pairs that demonstrate similar stock-price variations, and selected these as the gold standard for evaluation. Results show that opinion-based and collocation-based models complement each other, and that integrated models perform the best. The top 25, 50, and 100 pairs discovered achieve precision rates of 1, 0.92, and 0.79, respectively.
  2. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.02
    0.015792316 = product of:
      0.039480787 = sum of:
        0.027959513 = weight(_text_:system in 4436) [ClassicSimilarity], result of:
          0.027959513 = score(doc=4436,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.20878783 = fieldWeight in 4436, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=4436)
        0.011521274 = product of:
          0.03456382 = sum of:
            0.03456382 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
              0.03456382 = score(doc=4436,freq=2.0), product of:
                0.1488917 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04251826 = queryNorm
                0.23214069 = fieldWeight in 4436, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4436)
          0.33333334 = coord(1/3)
      0.4 = coord(2/5)
    
    Abstract
    Language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable tranlated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between the speed performance and the translation performance, and what from the translated result is presented in. About 100.000 Web pages translated in the last 4 months of 1997 are used for quantitative study of online and real-time Web page translation
    Date
    16. 2.2000 14:22:39
  3. Lee, L.-H.; Juan, Y.-C.; Tseng, W.-L.; Chen, H.-H.; Tseng, Y.-H.: Mining browsing behaviors for objectionable content filtering (2015) 0.01
    0.011412249 = product of:
      0.057061244 = sum of:
        0.057061244 = weight(_text_:context in 1818) [ClassicSimilarity], result of:
          0.057061244 = score(doc=1818,freq=4.0), product of:
            0.17622331 = queryWeight, product of:
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.04251826 = queryNorm
            0.32380077 = fieldWeight in 1818, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.14465 = idf(docFreq=1904, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1818)
      0.2 = coord(1/5)
    
    Abstract
    This article explores users' browsing intents to predict the category of a user's next access during web surfing and applies the results to filter objectionable content, such as pornography, gambling, violence, and drugs. Users' access trails in terms of category sequences in click-through data are employed to mine users' web browsing behaviors. Contextual relationships of URL categories are learned by the hidden Markov model. The top-level domains (TLDs) extracted from URLs themselves and the corresponding categories are caught by the TLD model. Given a URL to be predicted, its TLD and current context are empirically combined in an aggregation model. In addition to the uses of the current context, the predictions of the URL accessed previously in different contexts by various users are also considered by majority rule to improve the aggregation model. Large-scale experiments show that the advanced aggregation approach achieves promising performance while maintaining an acceptably low false positive rate. Different strategies are introduced to integrate the model with the blacklist it generates for filtering objectionable web pages without analyzing their content. In practice, this is complementary to the existing content analysis from users' behavioral perspectives.
  4. Chen, H.-H.; Kuo, J.-J.; Huang, S.-J.; Lin, C.-J.; Wung, H.-C.: ¬A summarization system for Chinese news from multiple sources (2003) 0.01
    0.011183805 = product of:
      0.055919025 = sum of:
        0.055919025 = weight(_text_:system in 2115) [ClassicSimilarity], result of:
          0.055919025 = score(doc=2115,freq=8.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.41757566 = fieldWeight in 2115, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=2115)
      0.2 = coord(1/5)
    
    Abstract
    This article proposes a summarization system for multiple documents. It employs not only named entities and other signatures to cluster news from different sources, but also employs punctuation marks, linking elements, and topic chains to identify the meaningful units (MUs). Using nouns and verbs to identify the similar MUs, focusing and browsing models are applied to represent the summarization results. To reduce information loss during summarization, informative words in a document are introduced. For the evaluation, a question answering system (QA system) is proposed to substitute the human assessors. In large-scale experiments containing 140 questions to 17,877 documents, the results show that those models using informative words outperform pure heuristic voting-only strategy by news reporters. This model can be easily further applied to summarize multilingual news from multiple sources.
  5. Hsu, M.-H.; Chen, H.-H.: Efficient and effective prediction of social tags to enhance Web search (2011) 0.00
    0.0046599186 = product of:
      0.023299592 = sum of:
        0.023299592 = weight(_text_:system in 4625) [ClassicSimilarity], result of:
          0.023299592 = score(doc=4625,freq=2.0), product of:
            0.13391352 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.04251826 = queryNorm
            0.17398985 = fieldWeight in 4625, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4625)
      0.2 = coord(1/5)
    
    Abstract
    As the web has grown into an integral part of daily life, social annotation has become a popular manner for web users to manage resources. This method of management has many potential applications, but it is limited in applicability by the cold-start problem, especially for new resources on the web. In this article, we study automatic tag prediction for web pages comprehensively and utilize the predicted tags to improve search performance. First, we explore the stabilizing phenomenon of tag usage in a social bookmarking system. Then, we propose a two-stage tag prediction approach, which is efficient and is effective in making use of early annotations from users. In the first stage, content-based ranking, candidate tags are selected and ranked to generate an initial tag list. In the second stage, random-walk re-ranking, we adopt a random-walk model that utilizes tag co-occurrence information to re-rank the initial list. The experimental results show that our algorithm effectively proposes appropriate tags for target web pages. In addition, we present a framework to incorporate tag prediction in a general web search. The experimental results of the web search validate the hypothesis that the proposed framework significantly enhances the typical retrieval model.
  6. Chen, H.-H.; Lin, W.-C.; Yang, C.; Lin, W.-H.: Translating-transliterating named entities for multilingual information access (2006) 0.00
    0.0026882975 = product of:
      0.013441487 = sum of:
        0.013441487 = product of:
          0.04032446 = sum of:
            0.04032446 = weight(_text_:22 in 1080) [ClassicSimilarity], result of:
              0.04032446 = score(doc=1080,freq=2.0), product of:
                0.1488917 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04251826 = queryNorm
                0.2708308 = fieldWeight in 1080, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1080)
          0.33333334 = coord(1/3)
      0.2 = coord(1/5)
    
    Date
    4. 6.2006 19:52:22
  7. Tsai, M.-.F.; Chen, H.-H.; Wang, Y.-T.: Learning a merge model for multilingual information retrieval (2011) 0.00
    0.001937643 = product of:
      0.009688215 = sum of:
        0.009688215 = product of:
          0.029064644 = sum of:
            0.029064644 = weight(_text_:29 in 2750) [ClassicSimilarity], result of:
              0.029064644 = score(doc=2750,freq=2.0), product of:
                0.14956595 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.04251826 = queryNorm
                0.19432661 = fieldWeight in 2750, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2750)
          0.33333334 = coord(1/3)
      0.2 = coord(1/5)
    
    Date
    29. 1.2016 20:34:33