Search (13 results, page 1 of 1)

  • Filter: author_ss:"Chen, H.-H."
  1. Chen, H.-H.; Lin, W.-C.; Yang, C.; Lin, W.-H.: Translating-transliterating named entities for multilingual information access (2006) 0.03
    Abstract
    Named entities are major constituents of a document but are usually unknown words. This work proposes a systematic way of dealing with the formulation, transformation, translation, and transliteration of multilingual named entities. The rules and similarity matrices for translation and transliteration are learned automatically from parallel named-entity corpora. The results are applied in cross-language access to collections of images with captions. Experimental results demonstrate that the similarity-based transliteration of named entities is effective, and runs in which transliteration is considered outperform the runs in which it is neglected.
    Date
    4. 6.2006 19:52:22
    Footnote
    Contribution to a special topic section on multilingual information systems
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.5, S.645-659
    Type
    a
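
As a rough illustration of the similarity-based transliteration idea in entry 1, the sketch below ranks candidate transliterations by aligning two romanized names under a character-similarity matrix. It is a minimal sketch, assuming toy similarity values; the paper learns its matrices automatically from parallel named-entity corpora.

```python
# Minimal sketch of similarity-based transliteration matching. The similarity
# values below are toy numbers, not the matrices the paper learns from
# parallel named-entity corpora.

SIM = {  # toy similarity between romanized characters
    ("k", "c"): 0.8, ("c", "k"): 0.8,
    ("s", "z"): 0.7, ("z", "s"): 0.7,
}

def char_sim(a: str, b: str) -> float:
    if a == b:
        return 1.0
    return SIM.get((a, b), 0.0)

def transliteration_score(src: str, tgt: str, gap: float = -0.5) -> float:
    """Global alignment (Needleman-Wunsch style) under the similarity matrix."""
    n, m = len(src), len(tgt)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + char_sim(src[i - 1], tgt[j - 1]),  # substitution/match
                dp[i - 1][j] + gap,                                   # deletion
                dp[i][j - 1] + gap,                                   # insertion
            )
    return dp[n][m] / max(n, m)  # length-normalized alignment score

# Rank candidate transliterations of a source name (toy candidates).
candidates = ["kolumbus", "columbus", "kalambos"]
print(sorted(candidates, key=lambda c: -transliteration_score("columbus", c)))
```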
  2. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.02
    Abstract
    The language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between the speed performance and the translation performance, and what form the translated result is presented in. About 100,000 Web pages translated in the last 4 months of 1997 are used for a quantitative study of online and real-time Web page translation.
    Date
    16. 2.2000 14:22:39
    Source
    Journal of the American Society for Information Science. 51(2000) no.3, S.281-296
    Type
    a
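
A minimal sketch of the query translation step described in entry 2: each query term's dictionary candidates are disambiguated by how strongly the chosen translations co-occur in a monolingual corpus. The dictionary, co-occurrence counts, and term names are invented for illustration; MTIR's actual resources and selection criteria are not reproduced here.

```python
# Hedged sketch of dictionary-based query translation with corpus
# co-occurrence disambiguation. All data below are toy stand-ins.
from itertools import product

DICTIONARY = {            # source term -> candidate translations
    "bank": ["riverbank", "bank_institution"],
    "interest": ["interest_rate", "hobby"],
}
COOCCUR = {               # monolingual corpus co-occurrence counts (toy)
    ("bank_institution", "interest_rate"): 120,
    ("riverbank", "interest_rate"): 2,
    ("bank_institution", "hobby"): 5,
    ("riverbank", "hobby"): 3,
}

def cooc(a: str, b: str) -> int:
    return COOCCUR.get((a, b), COOCCUR.get((b, a), 0))

def translate_query(terms):
    """Pick the translation combination whose terms co-occur most often."""
    candidate_sets = [DICTIONARY[t] for t in terms]
    def cohesion(combo):
        return sum(cooc(a, b) for i, a in enumerate(combo) for b in combo[i + 1:])
    return max(product(*candidate_sets), key=cohesion)

print(translate_query(["bank", "interest"]))  # ('bank_institution', 'interest_rate')
```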
  3. Ku, L.-W.; Ho, H.-W.; Chen, H.-H.: Opinion mining and relationship discovery using CopeOpi opinion analysis system (2009) 0.02
    Abstract
    We present CopeOpi, an opinion-analysis system, which extracts from the Web opinions about specific targets, summarizes the polarity and strength of these opinions, and tracks opinion variations over time. Objects that yield similar opinion tendencies over a certain time period may be correlated due to latent causal events. CopeOpi discovers relationships among objects based on their opinion-tracking plots and collocations. Event bursts are detected from the tracking plots, and the strength of opinion relationships is determined by the coverage of these plots. To evaluate opinion mining, we use the NTCIR corpus annotated with opinion information at sentence and document levels. CopeOpi achieves sentence- and document-level f-measures of 62% and 74%. For relationship discovery, we collected 1.3M economics-related documents from 93 Web sources over 22 months, and analyzed collocation-based, opinion-based, and hybrid models. We considered company pairs that demonstrate similar stock-price variations to be correlated, and selected these as the gold standard for evaluation. Results show that opinion-based and collocation-based models complement each other, and that integrated models perform the best. The top 25, 50, and 100 pairs discovered achieve precision rates of 1, 0.92, and 0.79, respectively.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.7, S.1486-1503
    Type
    a
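
Entry 3 evaluates relationship discovery by the precision of the top 25, 50, and 100 discovered pairs against a gold standard of correlated companies. A small sketch of that metric, with placeholder pair names:

```python
# Minimal sketch of precision@k for discovered company pairs, checked
# against a gold standard. Pair names are placeholders.

def precision_at_k(ranked_pairs, gold_pairs, k):
    """Fraction of the top-k discovered pairs that appear in the gold standard."""
    gold = {frozenset(p) for p in gold_pairs}   # pairs are unordered
    hits = sum(1 for p in ranked_pairs[:k] if frozenset(p) in gold)
    return hits / k

ranked = [("AcmeBank", "AcmeInsurance"), ("FooSteel", "BarMotors"), ("X", "Y")]
gold = [("AcmeInsurance", "AcmeBank"), ("BarMotors", "FooSteel")]
print(precision_at_k(ranked, gold, k=2))   # 1.0
print(precision_at_k(ranked, gold, k=3))   # ~0.67
```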
  4. Liu, J.S.; Chen, H.-H.; Ho, M.H.-C.; Li, Y.-C.: Citations with different levels of relevancy : tracing the main paths of legal opinions (2014) 0.01
    Abstract
    This study explores the effect of considering citation relevancy in main path analysis. Traditional citation-based analyses treat all citations equally even though there can be various reasons and different levels of relevancy for one document to reference another. Taking the relevancy level into consideration is intuitively advantageous because it adopts more accurate information and will thus make the results of a citation-based analysis more trustworthy. This is nevertheless a challenging task. We are aware of no citation-based analysis that has taken the relevancy level into consideration. The difficulty lies in the fact that the existing patent or patent citation database provides no readily available relevancy level information. We overcome this issue by obtaining citation relevancy information from a legal database in which relevancy levels are ranked by legal experts. This paper selects trademark dilution, a legal concept that has been the subject of many lawsuit cases, as the target for exploration. We apply main path analysis, taking citation relevancy into consideration, and verify the results against a set of test cases that are mentioned in an authoritative trademark book. The findings show that relevancy information helps main path analysis uncover legal cases of higher importance. Nevertheless, in terms of the number of significant cases retrieved, relevancy information does not seem to make a noticeable difference.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.12, S.2479-2488
    Type
    a
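
A hedged sketch of main path analysis over a toy citation network, with edge traversal weights scaled by an expert-assigned relevancy level. The search path count (SPC) weighting and the greedy path extraction are standard choices; scaling SPC by relevancy is one plausible reading of entry 4, not necessarily the authors' exact formulation.

```python
# Hedged sketch: relevancy-weighted main path analysis on a toy citation DAG.
# Network topology and relevancy levels (1..4) are invented.
from functools import lru_cache

EDGES = {  # citing case -> [(cited case, relevancy level)]
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 3)],
    "C": [("D", 2)],
    "D": [],
}

@lru_cache(maxsize=None)
def paths_to_sink(node):
    """Number of citation paths from node to any sink (memoized)."""
    if not EDGES[node]:
        return 1
    return sum(paths_to_sink(v) for v, _ in EDGES[node])

@lru_cache(maxsize=None)
def paths_from_source(node):
    """Number of citation paths from any source to node (memoized)."""
    preds = [u for u, outs in EDGES.items() for v, _ in outs if v == node]
    if not preds:
        return 1
    return sum(paths_from_source(u) for u in preds)

def main_path(start):
    """Greedy main path: follow the highest relevancy-weighted SPC edge."""
    path = [start]
    while EDGES[path[-1]]:
        v, _ = max(
            EDGES[path[-1]],
            # SPC(u->v) = paths into u * paths out of v, scaled by relevancy.
            key=lambda e: paths_from_source(path[-1]) * paths_to_sink(e[0]) * e[1],
        )
        path.append(v)
    return path

print(main_path("A"))   # ['A', 'B', 'D']
```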
  5. Tsai, M.-.F.; Chen, H.-H.; Wang, Y.-T.: Learning a merge model for multilingual information retrieval (2011) 0.01
    Abstract
    This paper proposes a learning approach for the merging process in multilingual information retrieval (MLIR). To conduct the learning approach, we present a number of features that may influence the MLIR merging process. These features are mainly extracted from three levels: query, document, and translation. After the feature extraction, we then use the FRank ranking algorithm to construct a merge model. To the best of our knowledge, this practice is the first attempt to use a learning-based ranking algorithm to construct a merge model for MLIR merging. In our experiments, three test collections for the task of cross-lingual information retrieval (CLIR) in NTCIR3, 4, and 5 are employed to assess the performance of our proposed method. Moreover, several merging methods are also carried out for comparison, including traditional merging methods, the 2-step merging strategy, and the merging method based on logistic regression. The experimental results show that our proposed method can significantly improve merging quality on two different types of datasets. In addition to this effectiveness, through the merge model generated by FRank, our method can further identify key factors that influence the merging process. This information might provide more insight into and understanding of MLIR merging.
    Source
    Information processing and management. 47(2011) no.5, S.635-646
    Type
    a
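
A minimal sketch of the merging step in entry 5: documents from per-language result lists are rescored by a model over query-, document-, and translation-level features and merged into a single ranking. A linear scorer stands in here for the FRank ranker used in the paper; the feature names and weights are illustrative.

```python
# Hedged sketch of MLIR merging. A linear model substitutes for the FRank
# ranker; features and weights below are invented for illustration.

FEATURE_WEIGHTS = {            # stand-in for learned ranker parameters
    "orig_score_norm": 0.6,    # document level: normalized retrieval score
    "query_overlap": 0.3,      # query level: translated-term overlap
    "translation_conf": 0.1,   # translation level: dictionary confidence
}

def merge_score(features):
    return sum(FEATURE_WEIGHTS[k] * v for k, v in features.items())

def merge(result_lists):
    """Flatten per-language ranked lists and rerank by the merge model."""
    pool = [doc for lst in result_lists for doc in lst]
    return sorted(pool, key=lambda d: -merge_score(d["features"]))

english = [{"id": "en1", "features": {"orig_score_norm": 0.9,
                                      "query_overlap": 0.8,
                                      "translation_conf": 1.0}}]
chinese = [{"id": "zh1", "features": {"orig_score_norm": 0.7,
                                      "query_overlap": 0.9,
                                      "translation_conf": 0.6}}]
print([d["id"] for d in merge([english, chinese])])  # ['en1', 'zh1']
```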
  6. Chen, H.-H.; Kuo, J.-J.; Huang, S.-J.; Lin, C.-J.; Wung, H.-C.: ¬A summarization system for Chinese news from multiple sources (2003) 0.01
    Abstract
    This article proposes a summarization system for multiple documents. It employs named entities and other signatures to cluster news from different sources, as well as punctuation marks, linking elements, and topic chains to identify the meaningful units (MUs). Using nouns and verbs to identify similar MUs, focusing and browsing models are applied to represent the summarization results. To reduce information loss during summarization, informative words in a document are introduced. For the evaluation, a question answering system (QA system) is proposed to substitute for human assessors. In large-scale experiments containing 140 questions to 17,877 documents, the results show that the models using informative words outperform the pure heuristic voting-only strategy by news reporters. This model can easily be further applied to summarize multilingual news from multiple sources.
    Source
    Journal of the American Society for Information Science and Technology. 54(2003) no.13, S.1224-1236
    Type
    a
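
To make the MU-similarity step in entry 6 concrete, the sketch below groups meaningful units by the overlap of their nouns and verbs, using Jaccard similarity and a greedy pass. The tokenized MUs and the threshold are toy assumptions; the paper's actual models are richer.

```python
# Minimal sketch of grouping meaningful units (MUs) by noun/verb overlap.
# The MU word sets and the threshold are toy assumptions.

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_mus(mus, threshold=0.3):
    """Greedy single-pass clustering on noun/verb set similarity."""
    clusters = []   # each cluster: list of (mu_id, content word set)
    for mu_id, words in mus:
        for cluster in clusters:
            if any(jaccard(words, w) >= threshold for _, w in cluster):
                cluster.append((mu_id, words))
                break
        else:
            clusters.append([(mu_id, words)])
    return clusters

mus = [
    ("a1", {"earthquake", "strike", "city"}),
    ("b1", {"earthquake", "damage", "city"}),
    ("c1", {"election", "win", "party"}),
]
for c in cluster_mus(mus):
    print([mu_id for mu_id, _ in c])   # ['a1', 'b1'] then ['c1']
```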
  7. Lin, W.-C.; Chang, Y.-C.; Chen, H.-H.: Integrating textual and visual information for cross-language image retrieval : a trans-media dictionary approach (2007) 0.01
    Abstract
    This paper explores the integration of textual and visual information for cross-language image retrieval. An approach which automatically transforms textual queries into visual representations is proposed. First, we mine the relationships between text and images and employ the mined relationships to construct visual queries from textual ones. Then, the retrieval results of the textual and visual queries are combined. To evaluate the proposed approach, we conduct English monolingual and Chinese-English cross-language retrieval experiments. The selection of suitable textual query terms to construct visual queries is the major issue. Experimental results show that the proposed approach improves retrieval performance, and that the use of nouns is appropriate for generating visual queries.
    Footnote
    Contribution to: Special issue on AIRS2005: Information Retrieval Research in Asia
    Source
    Information processing and management. 43(2007) no.2, S.488-502
    Type
    a
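
Entry 7 combines the retrieval results of the textual query and the constructed visual query. A weighted linear fusion of normalized scores, as sketched below, is one common way to do this; the weight alpha and the score values are assumptions, not the paper's reported settings.

```python
# Hedged sketch of combining textual and visual retrieval runs by a
# weighted sum of normalized scores. Weights and scores are illustrative.

def combine_runs(text_scores, visual_scores, alpha=0.7):
    """score(d) = alpha * text(d) + (1 - alpha) * visual(d)."""
    docs = set(text_scores) | set(visual_scores)
    fused = {
        d: alpha * text_scores.get(d, 0.0) + (1 - alpha) * visual_scores.get(d, 0.0)
        for d in docs
    }
    return sorted(fused.items(), key=lambda kv: -kv[1])

text_run = {"img_12": 0.82, "img_40": 0.55}
visual_run = {"img_40": 0.90, "img_77": 0.35}
print(combine_runs(text_run, visual_run))
```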
  8. Hsu, M.-H.; Chen, H.-H.: Efficient and effective prediction of social tags to enhance Web search (2011) 0.01
    Abstract
    As the web has grown into an integral part of daily life, social annotation has become a popular way for web users to manage resources. This method of management has many potential applications, but its applicability is limited by the cold-start problem, especially for new resources on the web. In this article, we study automatic tag prediction for web pages comprehensively and utilize the predicted tags to improve search performance. First, we explore the stabilizing phenomenon of tag usage in a social bookmarking system. Then, we propose a two-stage tag prediction approach, which is efficient and effective in making use of early annotations from users. In the first stage, content-based ranking, candidate tags are selected and ranked to generate an initial tag list. In the second stage, random-walk re-ranking, we adopt a random-walk model that utilizes tag co-occurrence information to re-rank the initial list. The experimental results show that our algorithm effectively proposes appropriate tags for target web pages. In addition, we present a framework to incorporate tag prediction in general web search. The experimental results of the web search validate the hypothesis that the proposed framework significantly enhances the typical retrieval model.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.8, S.1473-1487
    Type
    a
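
A minimal sketch of the second stage in entry 8, random-walk re-ranking: initial content-based tag scores are iteratively smoothed over a row-normalized tag co-occurrence graph. The personalized-PageRank-style update rule, damping factor, and toy co-occurrence counts are assumptions about the details.

```python
# Hedged sketch of random-walk re-ranking over a tag co-occurrence graph.
# Update rule, alpha, and co-occurrence counts are toy assumptions.

def random_walk_rerank(initial, cooccur, alpha=0.5, iterations=20):
    """s <- (1 - alpha) * s0 + alpha * P^T s, P = row-normalized co-occurrence."""
    tags = list(initial)
    P = {}
    for t in tags:
        row = {u: cooccur.get((t, u), 0.0) for u in tags}
        total = sum(row.values())
        P[t] = {u: (w / total if total else 0.0) for u, w in row.items()}
    s = dict(initial)
    for _ in range(iterations):
        s = {
            u: (1 - alpha) * initial[u]
               + alpha * sum(s[t] * P[t][u] for t in tags)
            for u in tags
        }
    return sorted(s.items(), key=lambda kv: -kv[1])

initial = {"python": 0.9, "programming": 0.5, "snake": 0.4}
cooccur = {("python", "programming"): 30, ("programming", "python"): 30,
           ("python", "snake"): 2, ("snake", "python"): 2}
print(random_walk_rerank(initial, cooccur))
```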
  9. Lee, L.-H.; Chen, H.-H.: Mining search intents for collaborative cyberporn filtering (2012) 0.01
    Abstract
    This article presents a search-intent-based method to generate pornographic blacklists for collaborative cyberporn filtering. A novel porn-detection framework that can find newly appearing pornographic web pages by mining search query logs is proposed. First, suspected queries are identified along with their clicked URLs by an automatically constructed lexicon. Then, a candidate URL is determined if the number of clicks satisfies majority voting rules. Finally, a candidate whose URL contains at least one categorical keyword is included in the blacklist. Several experiments are conducted on an MSN search porn dataset to demonstrate the effectiveness of our method. The resulting blacklist generated by our search-intent-based method achieves high precision (0.701) while maintaining a favorably low false-positive rate (0.086). The experiments of a real-life filtering simulation reveal that our proposed method with its accumulative update strategy can achieve a macro-averaged blocking rate of 44.15% when the update frequency is set to 1 day. In addition, the overblocking rates remain below 9% over time thanks to the strong advantages of our search-intent-based method. This user-behavior-oriented method can easily be applied to search engines, incorporating only implicit collective intelligence from query logs without other efforts. In practice, it is complementary to intelligent content analysis for keeping up with the changing trails of objectionable websites from users' perspectives.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.2, S.366-376
    Type
    a
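
The pipeline in entry 9 can be sketched end to end: identify suspected queries with a lexicon, collect their clicked URLs, apply a voting threshold, then keep candidates whose URL contains a categorical keyword. The lexicon, keywords, thresholds, and log entries below are all toy stand-ins.

```python
# Hedged sketch of blacklist generation from query logs with majority
# voting and a categorical keyword check. All data are toy stand-ins.
from collections import defaultdict

SUSPECT_LEXICON = {"xxx", "adult"}         # suspected-query lexicon (toy)
CATEGORICAL_KEYWORDS = {"porn", "xxx"}     # URL keywords (toy)

def build_blacklist(query_log, min_votes=2):
    votes = defaultdict(set)               # url -> set of suspected queries
    for query, clicked_url in query_log:
        if any(term in SUSPECT_LEXICON for term in query.split()):
            votes[clicked_url].add(query)
    return {
        url for url, qs in votes.items()
        if len(qs) >= min_votes                              # majority voting
        and any(kw in url for kw in CATEGORICAL_KEYWORDS)    # keyword check
    }

log = [
    ("free xxx videos", "http://example-xxx.test/a"),
    ("adult clips", "http://example-xxx.test/a"),
    ("adult education", "http://university.test/courses"),
]
print(build_blacklist(log))   # {'http://example-xxx.test/a'}
```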
  10. Ku, L.-W.; Chen, H.-H.: Mining opinions from the Web : beyond relevance retrieval (2007) 0.01
    Abstract
    Documents discussing public affairs, common themes, interesting products, and so on, are reported and distributed on the Web. Positive and negative opinions embedded in documents are useful references and feedback for governments to improve their services, for companies to market their products, and for customers to decide on their purchases. Web opinion mining aims to extract, summarize, and track various aspects of subjective information on the Web. Mining subjective information enables traditional information retrieval (IR) systems to retrieve more data from human viewpoints and provide information with finer granularity. Opinion extraction identifies opinion holders, extracts the relevant opinion sentences, and decides their polarities. Opinion summarization recognizes the major events embedded in documents and summarizes the supportive and the nonsupportive evidence. Opinion tracking captures subjective information from various genres and monitors the development of opinions along spatial and temporal dimensions. To demonstrate and evaluate the proposed opinion mining algorithms, news and bloggers' articles are adopted. Documents in the evaluation corpora are tagged at different granularities, from words and sentences to documents. In the experiments, positive and negative sentiment words and their weights are mined on the basis of Chinese word structures. The f-measure is 73.18% and 63.75% for verbs and nouns, respectively. Utilizing the mined sentiment words together with topical words, we achieve an f-measure of 62.16% at the sentence level and 74.37% at the document level.
    Footnote
    Contribution to a special topic section "Mining Web resources for enhancing information retrieval"
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1838-1850
    Type
    a
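
A minimal sketch of lexicon-based opinion scoring at the sentence and document levels, in the spirit of entry 10. The weighted lexicon is toy data standing in for the sentiment words and weights mined from Chinese word structures in the paper.

```python
# Minimal sketch of lexicon-based polarity scoring. The weighted lexicon
# is a toy stand-in for the mined sentiment words in the paper.

LEXICON = {"excellent": 0.9, "good": 0.5, "poor": -0.6, "terrible": -0.9}

def sentence_polarity(tokens):
    """Sum of sentiment-word weights; the sign gives the polarity."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)

def document_polarity(sentences):
    """Aggregate sentence scores into a document-level judgment."""
    score = sum(sentence_polarity(s) for s in sentences)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

doc = [["the", "camera", "is", "excellent"],
       ["battery", "life", "is", "poor"]]
print([sentence_polarity(s) for s in doc])   # [0.9, -0.6]
print(document_polarity(doc))                # positive
```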
  11. Huang, H.-H.; Wang, J.-J.; Chen, H.-H.: Implicit opinion analysis : extraction and polarity labelling (2017) 0.01
    Abstract
    Opinion words are crucial information for sentiment analysis. In some text, however, opinion words are absent or highly ambiguous. The resulting implicit opinions are more difficult to extract and label than explicit ones. In this paper, cutting-edge machine-learning approaches - deep neural networks and word embeddings - are adopted for implicit opinion mining at the snippet and clause levels. Hotel reviews written in Chinese are collected and annotated as the experimental data set. Results show that the convolutional neural network models not only outperform traditional support vector machine models, but also capture hidden knowledge within the raw text. The strength of word embeddings is also analyzed.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.9, S.2076-2087
    Type
    a
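
For entry 11, a compact convolutional classifier for clause-level polarity can look like the following PyTorch sketch. The architecture, hyperparameters, and framing are assumptions of the general kind described, not the authors' exact model.

```python
# Hedged sketch of a CNN for clause-level polarity classification.
# Vocabulary size, filter counts, and the PyTorch framing are assumptions.
import torch
import torch.nn as nn

class ClauseCNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=100, n_filters=64,
                 kernel_size=3, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size, padding=1)
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))               # (batch, n_filters, seq_len)
        x = x.max(dim=2).values                    # max-over-time pooling
        return self.fc(x)                          # (batch, n_classes) logits

model = ClauseCNN()
dummy_clause = torch.randint(0, 5000, (1, 12))    # one 12-token clause
print(model(dummy_clause).shape)                  # torch.Size([1, 2])
```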
  12. Lee, Y.-Y.; Ke, H.; Yen, T.-Y.; Huang, H.-H.; Chen, H.-H.: Combining and learning word embedding with WordNet for semantic relatedness and similarity measurement (2020) 0.00
    Abstract
    In this research, we propose 3 different approaches to measure the semantic relatedness between 2 words: (i) boost the performance of the GloVe word embedding model by removing or transforming abnormal dimensions; (ii) linearly combine the information extracted from WordNet and word embeddings; and (iii) utilize word embeddings and 12 linguistic features extracted from WordNet as features for Support Vector Regression. We conducted our experiments on 8 benchmark data sets and computed Spearman correlations between the outputs of our methods and the ground truth. We report our results together with those of 3 state-of-the-art approaches. The experimental results show that our method can outperform the state-of-the-art approaches on all the selected English benchmark data sets.
    Source
    Journal of the Association for Information Science and Technology. 71(2020) no.6, S.657-670
    Type
    a
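
Approach (ii) of entry 12, the linear combination of embedding-based and WordNet-based similarity, together with the Spearman evaluation, can be sketched directly. The similarity values and the mixing weight lambda are illustrative; the paper tunes such parameters on benchmark data sets.

```python
# Minimal sketch of approach (ii): linearly combine embedding and WordNet
# similarities, then evaluate with Spearman correlation. Toy numbers only.
from scipy.stats import spearmanr

def combined_similarity(embed_sim, wordnet_sim, lam=0.6):
    """lam * embedding similarity + (1 - lam) * WordNet similarity."""
    return lam * embed_sim + (1 - lam) * wordnet_sim

# (embedding sim, WordNet sim, human rating) per word pair -- toy values.
pairs = [(0.91, 0.80, 9.2), (0.55, 0.60, 6.1), (0.12, 0.05, 1.3)]
predicted = [combined_similarity(e, w) for e, w, _ in pairs]
gold = [r for _, _, r in pairs]
rho, _ = spearmanr(predicted, gold)
print(f"Spearman correlation: {rho:.2f}")
```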
  13. Lee, L.-H.; Juan, Y.-C.; Tseng, W.-L.; Chen, H.-H.; Tseng, Y.-H.: Mining browsing behaviors for objectionable content filtering (2015) 0.00
    Abstract
    This article explores users' browsing intents to predict the category of a user's next access during web surfing and applies the results to filter objectionable content, such as pornography, gambling, violence, and drugs. Users' access trails, in terms of category sequences in click-through data, are employed to mine users' web browsing behaviors. Contextual relationships of URL categories are learned by a hidden Markov model. The top-level domains (TLDs) extracted from the URLs themselves and the corresponding categories are captured by the TLD model. Given a URL to be predicted, its TLD and current context are empirically combined in an aggregation model. In addition to the use of the current context, the predictions for the URL accessed previously in different contexts by various users are also considered by majority rule to improve the aggregation model. Large-scale experiments show that the advanced aggregation approach achieves promising performance while maintaining an acceptably low false-positive rate. Different strategies are introduced to integrate the model with the blacklist it generates for filtering objectionable web pages without analyzing their content. In practice, this is complementary to existing content analysis from users' behavioral perspectives.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.5, S.930-942
    Type
    a
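
A hedged sketch of the prediction step in entry 13: a first-order Markov model over category trails (standing in for the paper's hidden Markov model) is combined with a TLD-based prior in an aggregation model. The trails, TLD statistics, and combination weight are toy assumptions.

```python
# Hedged sketch of next-category prediction from browsing trails. A plain
# first-order Markov model stands in for the paper's HMM; all data are toys.
from collections import Counter, defaultdict

def train_transitions(trails):
    counts = defaultdict(Counter)
    for trail in trails:
        for cur, nxt in zip(trail, trail[1:]):
            counts[cur][nxt] += 1
    return {
        c: {n: k / sum(cnt.values()) for n, k in cnt.items()}
        for c, cnt in counts.items()
    }

def predict(context_cat, tld, transitions, tld_prior, beta=0.7):
    """Aggregation: P(cat) = beta * P(cat | context) + (1 - beta) * P(cat | TLD)."""
    cats = set(transitions.get(context_cat, {})) | set(tld_prior.get(tld, {}))
    scores = {
        c: beta * transitions.get(context_cat, {}).get(c, 0.0)
           + (1 - beta) * tld_prior.get(tld, {}).get(c, 0.0)
        for c in cats
    }
    return max(scores, key=scores.get)

trails = [["news", "sports", "gambling"], ["news", "gambling", "gambling"]]
tld_prior = {".bet": {"gambling": 0.9, "sports": 0.1}}
transitions = train_transitions(trails)
print(predict("news", ".bet", transitions, tld_prior))   # gambling
```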