Search (29 results, page 1 of 2)

  • Filter: author_ss:"Chen, H."
  • Filter: year_i:[2000 TO 2010}
  1. Chung, W.; Chen, H.: Browsing the underdeveloped Web : an experiment on the Arabic Medical Web Directory (2009) 0.03
    0.033295516 = product of:
      0.06659103 = sum of:
        0.06659103 = sum of:
          0.010670074 = weight(_text_:a in 2733) [ClassicSimilarity], result of:
            0.010670074 = score(doc=2733,freq=14.0), product of:
              0.052761257 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.045758117 = queryNorm
              0.20223314 = fieldWeight in 2733, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046875 = fieldNorm(doc=2733)
          0.018723397 = weight(_text_:h in 2733) [ClassicSimilarity], result of:
            0.018723397 = score(doc=2733,freq=2.0), product of:
              0.113683715 = queryWeight, product of:
                2.4844491 = idf(docFreq=10020, maxDocs=44218)
                0.045758117 = queryNorm
              0.16469726 = fieldWeight in 2733, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4844491 = idf(docFreq=10020, maxDocs=44218)
                0.046875 = fieldNorm(doc=2733)
          0.03719756 = weight(_text_:22 in 2733) [ClassicSimilarity], result of:
            0.03719756 = score(doc=2733,freq=2.0), product of:
              0.16023713 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045758117 = queryNorm
              0.23214069 = fieldWeight in 2733, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2733)
      0.5 = coord(1/2)
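    The indented breakdown above is Lucene ClassicSimilarity "explain" output: each leaf scores one query term as queryWeight × fieldWeight, where queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, tf = sqrt(freq), and idf = 1 + ln(maxDocs / (docFreq + 1)); coord(1/2) then halves the sum because only one of the two top-level query clauses matched. A minimal sketch that recomputes the first leaf from the numbers shown in the tree:

    ```python
    import math

    def leaf_score(freq, idf, query_norm, field_norm):
        """One leaf of a ClassicSimilarity explain tree: queryWeight * fieldWeight."""
        query_weight = idf * query_norm                     # idf * queryNorm
        field_weight = math.sqrt(freq) * idf * field_norm   # tf * idf * fieldNorm
        return query_weight * field_weight

    # Values taken from the weight(_text_:a ...) leaf above (doc 2733):
    s = leaf_score(freq=14.0, idf=1.153047,
                   query_norm=0.045758117, field_norm=0.046875)
    print(round(s, 9))  # ~0.010670074, the leaf's score
    # The entry's total, 0.033295516, is the sum of the three leaves times coord(1/2).
    ```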
    
    Abstract
    While the Web has grown significantly in recent years, some portions of it remain largely underdeveloped, as shown by a lack of high-quality content and functionality. An example is the Arabic Web, where the absence of well-structured Web directories limits users' ability to browse for Arabic resources. In this research, we proposed an approach to building Web directories for the underdeveloped Web and developed a proof-of-concept prototype, the Arabic Medical Web Directory (AMedDir), which supports browsing of over 5,000 Arabic medical Web sites and pages organized in a hierarchical structure. We conducted an experiment involving Arab participants and found that the AMedDir significantly outperformed two benchmark Arabic Web directories in terms of browsing effectiveness, efficiency, information quality, and user satisfaction. Participants expressed a strong preference for the AMedDir and provided many positive comments. This research thus contributes a useful Web directory for organizing information in the Arabic medical domain and a better understanding of how to support browsing on the underdeveloped Web.
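    The hierarchical organization the prototype supports can be sketched as a small category tree; the categories and URLs below are invented, not AMedDir's:

    ```python
    # A Web directory as a tree of categories, each holding a few sites.
    directory = {
        "Medicine": {
            "_sites": ["http://example.org/arab-health"],
            "Cardiology": {"_sites": ["http://example.org/cardio-ar"]},
            "Pediatrics": {"_sites": ["http://example.org/peds-ar"]},
        }
    }

    def browse(node, path=""):
        """Depth-first listing of categories and the sites filed under each."""
        for name, child in node.items():
            if name == "_sites":
                for url in child:
                    print(f"{path}  {url}")
            else:
                print(f"{path}/{name}")
                browse(child, f"{path}/{name}")

    browse(directory)
    ```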
    Date
    22. 3.2009 17:57:50
    Type
    a
  2. Zheng, R.; Li, J.; Chen, H.; Huang, Z.: A framework for authorship identification of online messages : writing-style features and classification techniques (2006) 0.03
    0.027416471 = product of:
      0.054832943 = sum of:
        0.054832943 = sum of:
          0.008232141 = weight(_text_:a in 5276) [ClassicSimilarity], result of:
            0.008232141 = score(doc=5276,freq=12.0), product of:
              0.052761257 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.045758117 = queryNorm
              0.15602624 = fieldWeight in 5276, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5276)
          0.015602832 = weight(_text_:h in 5276) [ClassicSimilarity], result of:
            0.015602832 = score(doc=5276,freq=2.0), product of:
              0.113683715 = queryWeight, product of:
                2.4844491 = idf(docFreq=10020, maxDocs=44218)
                0.045758117 = queryNorm
              0.13724773 = fieldWeight in 5276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4844491 = idf(docFreq=10020, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5276)
          0.030997967 = weight(_text_:22 in 5276) [ClassicSimilarity], result of:
            0.030997967 = score(doc=5276,freq=2.0), product of:
              0.16023713 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045758117 = queryNorm
              0.19345059 = fieldWeight in 5276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5276)
      0.5 = coord(1/2)
    
    Abstract
    With the rapid proliferation of Internet technologies and applications, misuse of online messages for inappropriate or illegal purposes has become a major concern for society. The anonymous nature of online-message distribution makes identity tracing a critical problem. We developed a framework for authorship identification of online messages to address the identity-tracing problem. In this framework, four types of writing-style features (lexical, syntactic, structural, and content-specific features) are extracted and inductive learning algorithms are used to build feature-based classification models to identify authorship of online messages. To examine this framework, we conducted experiments on English and Chinese online-newsgroup messages. We compared the discriminating power of the four types of features and of three classification techniques: decision trees, backpropagation neural networks, and support vector machines. The experimental results showed that the proposed approach was able to identify authors of online messages with satisfactory accuracy of 70 to 95%. All four types of message features contributed to discriminating authors of online messages. Support vector machines outperformed the other two classification techniques in our experiments. The high performance we achieved for both the English and Chinese datasets showed the potential of this approach in a multiple-language context.
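    The pipeline the abstract describes — extract writing-style features, then train a classifier — can be sketched briefly. A minimal illustration assuming scikit-learn, with a toy feature set (a few lexical and structural cues) standing in for the paper's full lexical/syntactic/structural/content-specific features:

    ```python
    import re
    from sklearn.svm import SVC

    def style_features(message):
        """A few illustrative style cues (not the paper's full feature set)."""
        words = re.findall(r"\w+", message)
        n = max(len(words), 1)
        return [
            sum(len(w) for w in words) / n,             # mean word length (lexical)
            len(set(w.lower() for w in words)) / n,     # type-token ratio (lexical)
            message.count("!") / max(len(message), 1),  # punctuation rate (lexical)
            message.count("\n") + 1,                    # line count (structural)
        ]

    # Toy training data: (message, author) pairs.
    train = [("Buy now!!! Limited offer!!!", "a"), ("Regards,\nJohn", "b"),
             ("Act fast!!! Great deals!!!", "a"), ("Best wishes,\nJohn Smith", "b")]
    X = [style_features(m) for m, _ in train]
    y = [author for _, author in train]

    clf = SVC(kernel="rbf", gamma="scale").fit(X, y)  # SVMs performed best in the study
    print(clf.predict([style_features("Hurry!!! Last chance!!!")]))  # -> ['a']
    ```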
    Date
    22. 7.2006 16:14:37
    Type
    a
  3. Leroy, G.; Chen, H.: Genescene: an ontology-enhanced integration of linguistic and co-occurrence based relations in biomedical texts (2005) 0.03
    0.027057841 = product of:
      0.054115683 = sum of:
        0.054115683 = sum of:
          0.007514882 = weight(_text_:a in 5259) [ClassicSimilarity], result of:
            0.007514882 = score(doc=5259,freq=10.0), product of:
              0.052761257 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.045758117 = queryNorm
              0.14243183 = fieldWeight in 5259, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5259)
          0.015602832 = weight(_text_:h in 5259) [ClassicSimilarity], result of:
            0.015602832 = score(doc=5259,freq=2.0), product of:
              0.113683715 = queryWeight, product of:
                2.4844491 = idf(docFreq=10020, maxDocs=44218)
                0.045758117 = queryNorm
              0.13724773 = fieldWeight in 5259, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4844491 = idf(docFreq=10020, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5259)
          0.030997967 = weight(_text_:22 in 5259) [ClassicSimilarity], result of:
            0.030997967 = score(doc=5259,freq=2.0), product of:
              0.16023713 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045758117 = queryNorm
              0.19345059 = fieldWeight in 5259, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5259)
      0.5 = coord(1/2)
    
    Abstract
    The increasing amount of publicly available literature and experimental data in biomedicine makes it hard for biomedical researchers to stay up-to-date. Genescene is a toolkit that will help alleviate this problem by providing an overview of published literature content. We combined a linguistic parser with Concept Space, a co-occurrence-based semantic net. Both techniques extract complementary biomedical relations between noun phrases from MEDLINE abstracts. The parser extracts precise and semantically rich relations from individual abstracts. Concept Space extracts relations that hold true for the collection of abstracts. The Gene Ontology, the Human Genome Nomenclature, and the Unified Medical Language System are also integrated in Genescene. Currently, they are used to facilitate the integration of the two relation types and to select the more interesting and high-quality relations for presentation. A user study focusing on p53 literature is discussed. All MEDLINE abstracts discussing p53 were processed in Genescene. Two researchers evaluated the terms and relations from several abstracts of interest to them. The results show that the terms were precise (precision 93%) and relevant, as were the parser relations (precision 95%). The Concept Space relations were more precise when selected with ontological knowledge (precision 78%) than without (60%).
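    Of the two relation types, the co-occurrence side (Concept Space) is the easier to sketch: noun-phrase pairs that recur across abstracts become candidate relations, in contrast to the parser's per-sentence relations. A minimal illustration with invented phrases:

    ```python
    from collections import Counter
    from itertools import combinations

    # One set of extracted noun phrases per abstract (invented examples).
    abstracts = [
        {"p53", "apoptosis", "dna damage"},
        {"p53", "apoptosis"},
        {"p53", "cell cycle"},
    ]

    # Co-occurrence relations: phrase pairs that recur across the collection.
    pair_counts = Counter()
    for phrases in abstracts:
        pair_counts.update(combinations(sorted(phrases), 2))

    relations = [(a, b, n) for (a, b), n in pair_counts.items() if n >= 2]
    print(relations)  # -> [('apoptosis', 'p53', 2)]
    ```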
    Date
    22. 7.2006 14:26:01
    Type
    a
  4. Hu, D.; Kaza, S.; Chen, H.: Identifying significant facilitators of dark network evolution (2009) 0.03
    0.027057841 = product of:
      0.054115683 = sum of:
        0.054115683 = sum of:
          0.007514882 = weight(_text_:a in 2753) [ClassicSimilarity], result of:
            0.007514882 = score(doc=2753,freq=10.0), product of:
              0.052761257 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.045758117 = queryNorm
              0.14243183 = fieldWeight in 2753, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2753)
          0.015602832 = weight(_text_:h in 2753) [ClassicSimilarity], result of:
            0.015602832 = score(doc=2753,freq=2.0), product of:
              0.113683715 = queryWeight, product of:
                2.4844491 = idf(docFreq=10020, maxDocs=44218)
                0.045758117 = queryNorm
              0.13724773 = fieldWeight in 2753, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4844491 = idf(docFreq=10020, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2753)
          0.030997967 = weight(_text_:22 in 2753) [ClassicSimilarity], result of:
            0.030997967 = score(doc=2753,freq=2.0), product of:
              0.16023713 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045758117 = queryNorm
              0.19345059 = fieldWeight in 2753, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2753)
      0.5 = coord(1/2)
    
    Abstract
    Social networks evolve over time with the addition and removal of nodes and links to survive and thrive in their environments. Previous studies have shown that the link-formation process in such networks is influenced by a set of facilitators. However, there have been few empirical evaluations to determine the important facilitators. In a research partnership with law enforcement agencies, we used dynamic social-network analysis methods to examine several plausible facilitators of co-offending relationships in a large-scale narcotics network consisting of individuals and vehicles. Multivariate Cox regression and a two-proportion z-test on cyclic and focal closures of the network showed that mutual acquaintance and vehicle affiliations were significant facilitators for the network under study. We also found that homophily with respect to age, race, and gender was not a good predictor of future link formation in these networks. Moreover, we examined the social causes and policy implications of the significance or insignificance of various facilitators, including common jails, for future co-offending. These findings provide important insights into the link-formation processes and the resilience of social networks. In addition, they can be used to aid in the prediction of future links. The methods described can also help in understanding the driving forces behind the formation and evolution of social networks facilitated by mobile and Web technologies.
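    The two-proportion z-test applied to the cyclic and focal closures is a standard test; a minimal sketch with invented counts (not the paper's data):

    ```python
    import math

    def two_proportion_z(success1, n1, success2, n2):
        """Two-proportion z-test using the pooled standard error."""
        p1, p2 = success1 / n1, success2 / n2
        pooled = (success1 + success2) / (n1 + n2)
        se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
        return (p1 - p2) / se

    # Illustrative counts: closure rates among dyads with vs. without
    # a mutual acquaintance (invented numbers).
    z = two_proportion_z(success1=120, n1=400, success2=60, n2=400)
    print(round(z, 3))  # |z| > 1.96 would be significant at the 5% level
    ```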
    Date
    22. 3.2009 18:50:30
    Type
    a
  5. Dumais, S.; Chen, H.: Hierarchical classification of Web content (2000) 0.02
    0.01517087 = product of:
      0.03034174 = sum of:
        0.03034174 = product of:
          0.04551261 = sum of:
            0.008065818 = weight(_text_:a in 492) [ClassicSimilarity], result of:
              0.008065818 = score(doc=492,freq=2.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.15287387 = fieldWeight in 492, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.09375 = fieldNorm(doc=492)
            0.037446793 = weight(_text_:h in 492) [ClassicSimilarity], result of:
              0.037446793 = score(doc=492,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.32939452 = fieldWeight in 492, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.09375 = fieldNorm(doc=492)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Type
    a
  6. Schumaker, R.P.; Chen, H.: Evaluating a news-aware quantitative trader : the effect of momentum and contrarian stock selection strategies (2008) 0.01
    0.012482962 = product of:
      0.024965923 = sum of:
        0.024965923 = product of:
          0.037448883 = sum of:
            0.01560492 = weight(_text_:a in 1352) [ClassicSimilarity], result of:
              0.01560492 = score(doc=1352,freq=22.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.29576474 = fieldWeight in 1352, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1352)
            0.021843962 = weight(_text_:h in 1352) [ClassicSimilarity], result of:
              0.021843962 = score(doc=1352,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.19214681 = fieldWeight in 1352, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1352)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    We study the coupling of basic quantitative portfolio selection strategies with a financial news article prediction system, AZFinText. By varying the portfolio formation period, we found that a hybrid system using both a quantitative strategy and a full set of financial news articles performed best. With a 1-week portfolio formation period, we achieved a 20.79% trading return using a Momentum strategy and a 4.54% return using a Contrarian strategy over a 5-week holding period. We also found that trader overreaction to news events led AZFinText to capitalize on short-term surges in price.
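    The two selection rules differ only in sort order: Momentum buys the recent winners, Contrarian the recent losers. A minimal sketch with invented formation-period returns:

    ```python
    def select(returns, k, strategy):
        """Momentum buys the biggest recent winners; Contrarian the biggest losers."""
        ranked = sorted(returns, key=returns.get, reverse=(strategy == "momentum"))
        return ranked[:k]

    # Illustrative formation-period returns (invented, not the paper's data):
    formation = {"AAA": 0.12, "BBB": -0.08, "CCC": 0.05, "DDD": -0.02}
    print(select(formation, k=2, strategy="momentum"))    # -> ['AAA', 'CCC']
    print(select(formation, k=2, strategy="contrarian"))  # -> ['BBB', 'DDD']
    ```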
    Type
    a
  7. Hu, P.J.-H.; Lin, C.; Chen, H.: User acceptance of intelligence and security informatics technology : a study of COPLINK (2005) 0.01
    0.0115149 = product of:
      0.0230298 = sum of:
        0.0230298 = product of:
          0.0345447 = sum of:
            0.008065818 = weight(_text_:a in 3233) [ClassicSimilarity], result of:
              0.008065818 = score(doc=3233,freq=8.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.15287387 = fieldWeight in 3233, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3233)
            0.026478881 = weight(_text_:h in 3233) [ClassicSimilarity], result of:
              0.026478881 = score(doc=3233,freq=4.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.2329171 = fieldWeight in 3233, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3233)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    The importance of Intelligence and Security Informatics (ISI) has significantly increased with the rapid and large-scale migration of local/national security information from physical media to electronic platforms, including the Internet and information systems. Motivated by the significance of ISI in law enforcement (particularly in the digital government context) and the limited investigations of officers' technology-acceptance decision making, we developed and empirically tested a factor model for explaining law-enforcement officers' technology acceptance. Specifically, our empirical examination targeted the COPLINK technology and involved more than 280 police officers. Overall, our model shows a good fit to the data collected and exhibits satisfactory power for explaining law-enforcement officers' technology-acceptance decisions. Our findings have several implications for research and technology-management practices in law enforcement, which are also discussed.
    Type
    a
  8. Chung, W.; Zhang, Y.; Huang, Z.; Wang, G.; Ong, T.-H.; Chen, H.: Internet searching and browsing in a multilingual world : an experiment an the Chinese Business Intelligence Portal (CBizPort) (2004) 0.01
    0.010523799 = product of:
      0.021047598 = sum of:
        0.021047598 = product of:
          0.031571396 = sum of:
            0.0095056575 = weight(_text_:a in 2393) [ClassicSimilarity], result of:
              0.0095056575 = score(doc=2393,freq=16.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.18016359 = fieldWeight in 2393, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2393)
            0.022065736 = weight(_text_:h in 2393) [ClassicSimilarity], result of:
              0.022065736 = score(doc=2393,freq=4.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.1940976 = fieldWeight in 2393, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2393)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    The rapid growth of the non-English-speaking Internet population has created a need for better searching and browsing capabilities in languages other than English. However, existing search engines may not serve the needs of many non-English-speaking Internet users. In this paper, we propose a generic and integrated approach to searching and browsing the Internet in a multilingual world. Based on this approach, we have developed the Chinese Business Intelligence Portal (CBizPort), a meta-search engine that searches for business information about mainland China, Taiwan, and Hong Kong. Additional functions provided by CBizPort include encoding conversion (between Simplified Chinese and Traditional Chinese), summarization, and categorization. Experimental results of our user evaluation study show that the searching and browsing performance of CBizPort was comparable to that of regional Chinese search engines, and that CBizPort could significantly augment these search engines. Subjects' verbal comments indicate that CBizPort performed best in terms of analysis functions, cross-regional searching, and user-friendliness, whereas regional search engines were more efficient and more popular. Subjects especially liked CBizPort's summarizer and categorizer, which helped in understanding search results. These encouraging results suggest a promising future for our approach to Internet searching and browsing in a multilingual world.
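    The encoding-conversion function maps between Simplified and Traditional Chinese; a minimal character-table sketch (a real system would use a full conversion library such as OpenCC, since one-to-many mappings need context this toy table ignores):

    ```python
    # A tiny illustrative Simplified -> Traditional mapping (four characters only).
    S2T = str.maketrans({"汉": "漢", "语": "語", "简": "簡", "体": "體"})

    def to_traditional(text):
        """Character-level conversion over the toy table above."""
        return text.translate(S2T)

    print(to_traditional("汉语简体"))  # -> "漢語簡體"
    ```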
    Type
    a
  9. Chen, H.: An analysis of image queries in the field of art history (2001) 0.01
    0.010418028 = product of:
      0.020836055 = sum of:
        0.020836055 = product of:
          0.031254083 = sum of:
            0.009410121 = weight(_text_:a in 5187) [ClassicSimilarity], result of:
              0.009410121 = score(doc=5187,freq=8.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.17835285 = fieldWeight in 5187, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5187)
            0.021843962 = weight(_text_:h in 5187) [ClassicSimilarity], result of:
              0.021843962 = score(doc=5187,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.19214681 = fieldWeight in 5187, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5187)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Chen arranged with an Art History instructor to require 20 medieval art images in papers received from 29 students. Participants completed self-administered presearch and postsearch questionnaires and were interviewed after questionnaire analysis, in order to collect both the keywords and phrases they planned to use and those they actually used. Three MLIS student reviewers then mapped the queries to Enser and McGregor's four categories, Jorgensen's 12 classes, and Fidel's 12 feature data and object poles, rating the degree of match on a seven-point scale (1 = not at all, 7 = exact). The reviewers gave the highest scores to Enser and McGregor's categories. Modifications to both the Enser and McGregor and the Jorgensen schemes are suggested.
    Type
    a
  10. Chen, H.; Fan, H.; Chau, M.; Zeng, D.: MetaSpider : meta-searching and categorization on the Web (2001) 0.01
    0.009595751 = product of:
      0.019191502 = sum of:
        0.019191502 = product of:
          0.028787252 = sum of:
            0.0067215143 = weight(_text_:a in 6849) [ClassicSimilarity], result of:
              0.0067215143 = score(doc=6849,freq=8.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.12739488 = fieldWeight in 6849, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6849)
            0.022065736 = weight(_text_:h in 6849) [ClassicSimilarity], result of:
              0.022065736 = score(doc=6849,freq=4.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.1940976 = fieldWeight in 6849, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6849)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    It has become increasingly difficult to locate relevant information on the Web, even with the help of Web search engines. Two approaches to addressing the low precision and poor presentation of search results of current search tools are studied: meta-search and document categorization. Meta-search engines improve precision by selecting and integrating search results from generic or domain-specific Web search engines or other resources. Document categorization promises better organization and presentation of retrieved results. This article introduces MetaSpider, a meta-search engine that has real-time indexing and categorizing functions. We report in this paper the major components of MetaSpider and discuss related technical approaches. Initial results of a user evaluation study comparing MetaSpider, NorthernLight, and MetaCrawler in terms of clustering performance and of time and effort expended show that MetaSpider performed best in precision rate but disclosed no statistically significant differences in recall rate and time requirements. Our experimental study also reveals that MetaSpider exhibited a higher level of automation than the other two systems and facilitated efficient searching by providing the user with an organized, comprehensive view of the retrieved documents.
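    The result-integration step of a meta-search engine can be sketched with reciprocal-rank fusion, one common merging scheme (a stand-in here, not necessarily MetaSpider's actual method; URLs invented):

    ```python
    def merge_results(ranked_lists, k=5):
        """Reciprocal-rank fusion: sum 1/rank across engines, highest total first."""
        scores = {}
        for engine, urls in ranked_lists.items():
            for rank, url in enumerate(urls, start=1):
                scores[url] = scores.get(url, 0.0) + 1.0 / rank
        return sorted(scores, key=scores.get, reverse=True)[:k]

    results = {
        "engine_a": ["http://x.example", "http://y.example", "http://z.example"],
        "engine_b": ["http://y.example", "http://w.example", "http://x.example"],
    }
    print(merge_results(results))
    # -> ['http://y.example', 'http://x.example', 'http://w.example', 'http://z.example']
    ```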
    Type
    a
  11. Li, J.; Zhang, Z.; Li, X.; Chen, H.: Kernel-based learning for biomedical relation extraction (2008) 0.01
    0.009533988 = product of:
      0.019067977 = sum of:
        0.019067977 = product of:
          0.028601965 = sum of:
            0.009878568 = weight(_text_:a in 1611) [ClassicSimilarity], result of:
              0.009878568 = score(doc=1611,freq=12.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.18723148 = fieldWeight in 1611, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1611)
            0.018723397 = weight(_text_:h in 1611) [ClassicSimilarity], result of:
              0.018723397 = score(doc=1611,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.16469726 = fieldWeight in 1611, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1611)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Relation extraction is the process of scanning text for relationships between named entities. Recently, significant studies have focused on automatically extracting relations from biomedical corpora. Most existing biomedical relation extractors require manual creation of biomedical lexicons or parsing templates based on domain knowledge. In this study, we propose to use kernel-based learning methods to automatically extract biomedical relations from literature text. We develop a framework of kernel-based learning for biomedical relation extraction. In particular, we modified the standard tree kernel function by incorporating a trace kernel to capture richer contextual information. In our experiments on a biomedical corpus, we compare different kernel functions for biomedical relation detection and classification. The experimental results show that a tree kernel outperforms word and sequence kernels for relation detection, our trace-tree kernel outperforms the standard tree kernel, and a composite kernel outperforms individual kernels for relation extraction.
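    The composite-kernel idea — summing kernels that see different structure — can be sketched with scikit-learn's precomputed-kernel interface. The two base kernels below (word overlap and shared word bigrams) are deliberately simple stand-ins for the paper's tree and trace kernels, and the sentences and labels are toy data:

    ```python
    import numpy as np
    from sklearn.svm import SVC

    def overlap_kernel(a, b):
        """Word-overlap count (a simple stand-in for the paper's tree kernel)."""
        wa, wb = a.split(), b.split()
        return float(sum(min(wa.count(w), wb.count(w)) for w in set(wa)))

    def bigram_kernel(a, b):
        """Shared word-bigram count (a simple stand-in for the trace kernel)."""
        ba = set(zip(a.split(), a.split()[1:]))
        bb = set(zip(b.split(), b.split()[1:]))
        return float(len(ba & bb))

    def gram(X, Y, alpha=0.5):
        """Composite kernel: a convex combination of the two base kernels."""
        return np.array([[alpha * overlap_kernel(x, y) + (1 - alpha) * bigram_kernel(x, y)
                          for y in Y] for x in X])

    sents = ["protein A activates gene B", "protein A binds protein C",
             "gene B is mentioned near gene D", "gene D appears with protein C"]
    labels = [1, 1, 0, 0]  # 1 = sentence asserts an interaction (toy labels)

    clf = SVC(kernel="precomputed").fit(gram(sents, sents), labels)
    print(clf.predict(gram(["protein X activates gene Y"], sents)))  # -> [1]
    ```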
    Type
    a
  12. Fu, T.; Abbasi, A.; Chen, H.: A hybrid approach to Web forum interactional coherence analysis (2008) 0.01
    0.009533988 = product of:
      0.019067977 = sum of:
        0.019067977 = product of:
          0.028601965 = sum of:
            0.009878568 = weight(_text_:a in 1872) [ClassicSimilarity], result of:
              0.009878568 = score(doc=1872,freq=12.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.18723148 = fieldWeight in 1872, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1872)
            0.018723397 = weight(_text_:h in 1872) [ClassicSimilarity], result of:
              0.018723397 = score(doc=1872,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.16469726 = fieldWeight in 1872, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1872)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Despite the rapid growth of text-based computer-mediated communication (CMC), its limitations have rendered the medium highly incoherent. This poses problems for content analysis of online discourse archives. Interactional coherence analysis (ICA) attempts to accurately identify and construct CMC interaction networks. In this study, we propose the Hybrid Interactional Coherence (HIC) algorithm for identification of Web forum interaction. HIC utilizes a bevy of system and linguistic features, including message header information, quotations, direct address, and lexical relations. Furthermore, several similarity-based methods, including a Lexical Match Algorithm (LMA) and a sliding window method, are utilized to account for interactional idiosyncrasies. Experimental results on two Web forums revealed that the proposed HIC algorithm significantly outperformed comparison techniques in terms of precision, recall, and F-measure at both the forum and thread levels. Additionally, an example was used to illustrate how the improved ICA results can facilitate enhanced social network and role analysis capabilities.
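    The lexical-match and sliding-window ideas can be sketched together: slide a window over a candidate post and link it to the earlier post with the best word overlap. The scoring and thresholds here are illustrative, not HIC's:

    ```python
    def lexical_match(candidate, earlier_posts, window=5, threshold=0.5):
        """Link a post to the earlier post it most likely replies to.

        Slides a `window`-word window over the candidate and scores each
        earlier post by its best word-overlap ratio (thresholds illustrative).
        """
        words = candidate.lower().split()
        best_idx, best_score = None, threshold
        for i, post in enumerate(earlier_posts):
            post_words = set(post.lower().split())
            for start in range(max(len(words) - window, 0) + 1):
                chunk = words[start:start + window]
                score = sum(w in post_words for w in chunk) / max(len(chunk), 1)
                if score > best_score:
                    best_idx, best_score = i, score
        return best_idx

    posts = ["The new firmware breaks wifi on older routers",
             "Has anyone tried the beta release yet?"]
    reply = "firmware breaks wifi here too on an older router"
    print(lexical_match(reply, posts))  # -> 0 (replies to the firmware post)
    ```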
    Type
    a
  13. Chen, H.; Chung, W.; Qin, J.; Reid, E.; Sageman, M.; Weimann, G.: Uncovering the dark Web : a case study of Jihad on the Web (2008) 0.01
    0.009533988 = product of:
      0.019067977 = sum of:
        0.019067977 = product of:
          0.028601965 = sum of:
            0.009878568 = weight(_text_:a in 1880) [ClassicSimilarity], result of:
              0.009878568 = score(doc=1880,freq=12.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.18723148 = fieldWeight in 1880, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1880)
            0.018723397 = weight(_text_:h in 1880) [ClassicSimilarity], result of:
              0.018723397 = score(doc=1880,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.16469726 = fieldWeight in 1880, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1880)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    While the Web has become a worldwide platform for communication, terrorists share their ideology and communicate with members on the Dark Web - the reverse side of the Web used by terrorists. Currently, the problems of information overload and the difficulty of obtaining a comprehensive picture of terrorist activities hinder effective and efficient analysis of terrorist information on the Web. To improve understanding of terrorist activities, we have developed a novel methodology for collecting and analyzing Dark Web information. The methodology incorporates information collection, analysis, and visualization techniques, and exploits various Web information sources. We applied it to collecting and analyzing information on 39 Jihad Web sites and developed visualizations of their site contents, relationships, and activity levels. An expert evaluation showed that the methodology is very useful and promising, having a high potential to assist in the investigation and understanding of terrorist activities by producing results that could potentially help guide both policymaking and intelligence research.
    Type
    a
  14. Dang, Y.; Zhang, Y.; Chen, H.; Hu, P.J.-H.; Brown, S.A.; Larson, C.: Arizona Literature Mapper : an integrated approach to monitor and analyze global bioterrorism research literature (2009) 0.01
    0.00929558 = product of:
      0.01859116 = sum of:
        0.01859116 = product of:
          0.027886739 = sum of:
            0.0058210026 = weight(_text_:a in 2943) [ClassicSimilarity], result of:
              0.0058210026 = score(doc=2943,freq=6.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.11032722 = fieldWeight in 2943, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2943)
            0.022065736 = weight(_text_:h in 2943) [ClassicSimilarity], result of:
              0.022065736 = score(doc=2943,freq=4.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.1940976 = fieldWeight in 2943, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2943)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Biomedical research is critical to biodefense, which is drawing increasing attention from governments globally as well as from various research communities. The U.S. government has been closely monitoring and regulating biomedical research activities, particularly those studying or involving bioterrorism agents or diseases. Effective surveillance requires comprehensive understanding of extant biomedical research and timely detection of new developments or emerging trends. The rapid knowledge expansion, technical breakthroughs, and spiraling collaboration networks demand greater support for literature search and sharing, which cannot be effectively supported by conventional literature search mechanisms or systems. In this study, we propose an integrated approach that integrates advanced techniques for content analysis, network analysis, and information visualization. We design and implement Arizona Literature Mapper, a Web-based portal that allows users to gain timely, comprehensive understanding of bioterrorism research, including leading scientists, research groups, institutions as well as insights about current mainstream interests or emerging trends. We conduct two user studies to evaluate Arizona Literature Mapper and include a well-known system for benchmarking purposes. According to our results, Arizona Literature Mapper is significantly more effective for supporting users' search of bioterrorism publications than PubMed. Users consider Arizona Literature Mapper more useful and easier to use than PubMed. Users are also more satisfied with Arizona Literature Mapper and show stronger intentions to use it in the future. Assessments of Arizona Literature Mapper's analysis functions are also positive, as our subjects consider them useful, easy to use, and satisfactory. Our results have important implications that are also discussed in the article.
    Type
    a
  15. Zhu, B.; Chen, H.: Validating a geographical image retrieval system (2000) 0.01
    0.009247085 = product of:
      0.01849417 = sum of:
        0.01849417 = product of:
          0.027741255 = sum of:
            0.009017859 = weight(_text_:a in 4769) [ClassicSimilarity], result of:
              0.009017859 = score(doc=4769,freq=10.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.1709182 = fieldWeight in 4769, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4769)
            0.018723397 = weight(_text_:h in 4769) [ClassicSimilarity], result of:
              0.018723397 = score(doc=4769,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.16469726 = fieldWeight in 4769, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4769)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    This paper summarizes a prototype geographical image retrieval system that demonstrates how to integrate image processing and information analysis techniques to support large-scale content-based image retrieval. By using an image as its interface, the prototype system addresses a troublesome aspect of traditional retrieval models, which require users to have complete knowledge of the low-level features of an image. In addition, we describe an experiment that validates the system's performance against that of human subjects, in an effort to address the scarcity of research evaluating the performance of an algorithm against that of human beings. The results of the experiment indicate that the system could do as well as human subjects in accomplishing the tasks of similarity analysis and image categorization. We also found that under some circumstances the texture features of an image are insufficient to represent a geographic image. We believe, however, that our image retrieval system provides a promising approach to integrating image processing techniques and information retrieval algorithms.
    Type
    a
  16. Huang, Z.; Chung, Z.W.; Chen, H.: A graph model for e-commerce recommender systems (2004) 0.01
    0.009247085 = product of:
      0.01849417 = sum of:
        0.01849417 = product of:
          0.027741255 = sum of:
            0.009017859 = weight(_text_:a in 501) [ClassicSimilarity], result of:
              0.009017859 = score(doc=501,freq=10.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.1709182 = fieldWeight in 501, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=501)
            0.018723397 = weight(_text_:h in 501) [ClassicSimilarity], result of:
              0.018723397 = score(doc=501,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.16469726 = fieldWeight in 501, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.046875 = fieldNorm(doc=501)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Information overload on the Web has created enormous challenges to customers selecting products for online purchases and to online businesses attempting to identify customers' preferences efficiently. Various recommender systems employing different data representations and recommendation methods are currently used to address these challenges. In this research, we developed a graph model that provides a generic data representation and can support different recommendation methods. To demonstrate its usefulness and flexibility, we developed three recommendation methods: direct retrieval, association mining, and high-degree association retrieval. We used a data set from an online bookstore as our research test-bed. Evaluation results showed that combining product content information and historical customer transaction information achieved more accurate predictions and relevant recommendations than using only collaborative information. However, comparisons among different methods showed that high-degree association retrieval did not perform significantly better than the association mining method or the direct retrieval method in our test-bed.
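    On the graph representation, the simplest of the three methods amounts to a two-hop walk over the customer-product bipartite graph; a minimal sketch with invented purchase data:

    ```python
    from collections import Counter

    # Bipartite purchase graph: customer -> set of products bought (invented).
    purchases = {
        "u1": {"book_a", "book_b"},
        "u2": {"book_a", "book_c"},
        "u3": {"book_b", "book_c", "book_d"},
    }

    def recommend(user, k=2):
        """Two-hop walk: user -> products -> co-purchasing users -> their products."""
        owned = purchases[user]
        scores = Counter()
        for other, items in purchases.items():
            if other == user:
                continue
            overlap = len(owned & items)  # strength of the user-user association
            for item in items - owned:
                scores[item] += overlap
        return [item for item, _ in scores.most_common(k)]

    print(recommend("u1"))  # -> ['book_c', 'book_d'] (book_c reachable via u2 and u3)
    ```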
    Type
    a
  17. Vishwanath, A.; Chen, H.: Personal communication technologies as an extension of the self : a cross-cultural comparison of people's associations with technology and their symbolic proximity with others (2008) 0.01
    0.009247085 = product of:
      0.01849417 = sum of:
        0.01849417 = product of:
          0.027741255 = sum of:
            0.009017859 = weight(_text_:a in 2355) [ClassicSimilarity], result of:
              0.009017859 = score(doc=2355,freq=10.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.1709182 = fieldWeight in 2355, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2355)
            0.018723397 = weight(_text_:h in 2355) [ClassicSimilarity], result of:
              0.018723397 = score(doc=2355,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.16469726 = fieldWeight in 2355, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2355)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Increasingly, individuals use communication technologies such as e-mail, IMs, blogs, and cell phones to locate, learn about, and communicate with one another. Not much, however, is known about how individuals relate to various personal technologies, their preferences for each, or their extensional associations with them. Even less is known about the cultural differences in these preferences. The current study used the Galileo system of multidimensional scaling to systematically map the extensional associations with nine personal communication technologies across three cultures: U.S., Germany, and Singapore. Across the three cultures, the technologies closest to the self were similar, suggesting a universality of associations with certain technologies. In contrast, the technologies farther from the self were significantly different across cultures. Moreover, the magnitude of associations with each technology differed based on the extensional association or distance from the self. Also, and more importantly, the antecedents to these associations differed significantly across cultures, suggesting a stronger influence of cultural norms on personal-technology choice.
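    The Galileo procedure places concepts in a metric space from pairwise symbolic distances; the same idea can be sketched with metric multidimensional scaling from scikit-learn (the study used the Galileo system, not sklearn, and the distances below are invented):

    ```python
    import numpy as np
    from sklearn.manifold import MDS

    # Pairwise symbolic distances between the self and three technologies
    # (invented numbers, not the study's measurements).
    labels = ["self", "cell phone", "e-mail", "blog"]
    D = np.array([[0.0, 1.0, 1.5, 3.0],
                  [1.0, 0.0, 1.2, 2.8],
                  [1.5, 1.2, 0.0, 2.5],
                  [3.0, 2.8, 2.5, 0.0]])

    # Metric MDS embeds the distance matrix in 2-D, giving a Galileo-style map
    # in which nearby points are symbolically close to one another.
    coords = MDS(n_components=2, dissimilarity="precomputed",
                 random_state=0).fit_transform(D)
    for name, (x, y) in zip(labels, coords):
        print(f"{name:10s} ({x:+.2f}, {y:+.2f})")
    ```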
    Type
    a
  18. Chen, H.; Chau, M.: Web mining : machine learning for Web applications (2003) 0.01
    0.008929739 = product of:
      0.017859478 = sum of:
        0.017859478 = product of:
          0.026789214 = sum of:
            0.008065818 = weight(_text_:a in 4242) [ClassicSimilarity], result of:
              0.008065818 = score(doc=4242,freq=8.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.15287387 = fieldWeight in 4242, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4242)
            0.018723397 = weight(_text_:h in 4242) [ClassicSimilarity], result of:
              0.018723397 = score(doc=4242,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.16469726 = fieldWeight in 4242, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4242)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    With more than two billion pages created by millions of Web page authors and organizations, the World Wide Web is a tremendously rich knowledge base. The knowledge comes not only from the content of the pages themselves, but also from the unique characteristics of the Web, such as its hyperlink structure and its diversity of content and languages. Analysis of these characteristics often reveals interesting patterns and new knowledge. Such knowledge can be used to improve users' efficiency and effectiveness in searching for information on the Web, and also for applications unrelated to the Web, such as support for decision making or business management. The Web's size and its unstructured and dynamic content, as well as its multilingual nature, make the extraction of useful knowledge a challenging research problem. Furthermore, the Web generates a large amount of data in other formats that contain valuable information. For example, information about user access patterns in Web server logs can be used for information personalization or for improving Web page design.
    Type
    a
  19. Schroeder, J.; Xu, J.; Chen, H.; Chau, M.: Automated criminal link analysis based on domain knowledge (2007) 0.01
    0.008929739 = product of:
      0.017859478 = sum of:
        0.017859478 = product of:
          0.026789214 = sum of:
            0.008065818 = weight(_text_:a in 275) [ClassicSimilarity], result of:
              0.008065818 = score(doc=275,freq=8.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.15287387 = fieldWeight in 275, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=275)
            0.018723397 = weight(_text_:h in 275) [ClassicSimilarity], result of:
              0.018723397 = score(doc=275,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.16469726 = fieldWeight in 275, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.046875 = fieldNorm(doc=275)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Link (association) analysis has been used in the criminal justice domain to search large datasets for associations between crime entities in order to facilitate crime investigations. However, link analysis still faces many challenging problems, such as information overload, high search complexity, and heavy reliance on domain knowledge. To address these challenges, this article proposes several techniques for automated, effective, and efficient link analysis. These techniques include the co-occurrence analysis, the shortest path algorithm, and a heuristic approach to identifying associations and determining their importance. We developed a prototype system called CrimeLink Explorer based on the proposed techniques. Results of a user study with 10 crime investigators from the Tucson Police Department showed that our system could help subjects conduct link analysis more efficiently than traditional single-level link analysis tools. Moreover, subjects believed that association paths found based on the heuristic approach were more accurate than those found based solely on the co-occurrence analysis and that the automated link analysis system would be of great help in crime investigations.
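    The shortest-path step can be sketched by turning co-occurrence counts into edge costs (stronger association = cheaper edge) and running Dijkstra; the data and the 1/count transform are illustrative, not CrimeLink Explorer's exact weighting:

    ```python
    import heapq

    # Co-occurrence counts between crime entities (invented data).
    cooccur = {("smith", "jones"): 8, ("jones", "brown"): 5, ("smith", "brown"): 1}

    # Stronger association -> cheaper edge (one common transform: 1/count).
    graph = {}
    for (a, b), n in cooccur.items():
        graph.setdefault(a, []).append((b, 1 / n))
        graph.setdefault(b, []).append((a, 1 / n))

    def strongest_path(src, dst):
        """Dijkstra over 1/count costs: the cheapest path is the strongest chain."""
        heap, seen = [(0.0, src, [src])], set()
        while heap:
            cost, node, path = heapq.heappop(heap)
            if node == dst:
                return path, cost
            if node in seen:
                continue
            seen.add(node)
            for nxt, w in graph.get(node, []):
                if nxt not in seen:
                    heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
        return None, float("inf")

    print(strongest_path("smith", "brown"))
    # -> (['smith', 'jones', 'brown'], 0.325): two strong links beat one weak one
    ```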
    Type
    a
  20. Qin, J.; Zhou, Y.; Chau, M.; Chen, H.: Multilingual Web retrieval : an experiment in English-Chinese business intelligence (2006) 0.01
    0.008164853 = product of:
      0.016329706 = sum of:
        0.016329706 = product of:
          0.024494559 = sum of:
            0.008891728 = weight(_text_:a in 5054) [ClassicSimilarity], result of:
              0.008891728 = score(doc=5054,freq=14.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.1685276 = fieldWeight in 5054, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5054)
            0.015602832 = weight(_text_:h in 5054) [ClassicSimilarity], result of:
              0.015602832 = score(doc=5054,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.13724773 = fieldWeight in 5054, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5054)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    As increasing numbers of non-English resources have become available on the Web, the interesting and important issue of how Web users can retrieve documents in different languages has arisen. Cross-language information retrieval (CLIR), the study of retrieving information in one language by queries expressed in another language, is a promising approach to the problem. Cross-language information retrieval has attracted much attention in recent years. Most research systems have achieved satisfactory performance on standard Text REtrieval Conference (TREC) collections such as news articles, but CLIR techniques have not been widely studied and evaluated for applications such as Web portals. In this article, the authors present their research in developing and evaluating a multilingual English-Chinese Web portal that incorporates various CLIR techniques for use in the business domain. A dictionary-based approach was adopted that combines phrasal translation, co-occurrence analysis, and pre- and posttranslation query expansion. The portal was evaluated by domain experts, using a set of queries in both English and Chinese. The experimental results showed that co-occurrence-based phrasal translation achieved a 74.6% improvement in precision over simple word-by-word translation. When used together, pre- and posttranslation query expansion improved the performance slightly, achieving a 78.0% improvement over the baseline word-by-word translation approach. In general, applying CLIR techniques in Web applications shows promise.
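    The co-occurrence-based disambiguation step can be sketched as follows: each query word has several dictionary translations, and the combination whose target-language words co-occur most often wins. The dictionary and counts below are invented:

    ```python
    from itertools import product

    # Tiny bilingual dictionary and target-side co-occurrence counts (invented).
    dictionary = {"bank": ["银行", "河岸"], "interest": ["利息", "兴趣"]}
    cooccurrence = {("银行", "利息"): 50, ("银行", "兴趣"): 3,
                    ("河岸", "利息"): 1, ("河岸", "兴趣"): 2}

    def translate(query_words):
        """Pick the translation combination with the highest pairwise co-occurrence."""
        candidates = product(*(dictionary[w] for w in query_words))
        def score(combo):
            return sum(cooccurrence.get((a, b), 0) + cooccurrence.get((b, a), 0)
                       for i, a in enumerate(combo) for b in combo[i + 1:])
        return max(candidates, key=score)

    print(translate(["bank", "interest"]))  # -> ('银行', '利息'), the financial senses
    ```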
    Type
    a