Search (9 results, page 1 of 1)

Suakkaphong, N.; Zhang, Z.; Chen, H.: Disease named entity recognition using semisupervised learning and conditional random fields (2011) 0.05
```
0.04727441 = product of:
  0.09454882 = sum of:
    0.057835944 = weight(_text_:data in 4367) [ClassicSimilarity], result of:
      0.057835944 = score(doc=4367,freq=10.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.39059696 = fieldWeight in 4367, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4367)
    0.036712877 = product of:
      0.073425755 = sum of:
        0.073425755 = weight(_text_:processing in 4367) [ClassicSimilarity], result of:
          0.073425755 = score(doc=4367,freq=6.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.38733965 = fieldWeight in 4367, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4367)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Information extraction is an important text-mining task that aims at extracting prespecified types of information from large text collections and making them available in structured representations such as databases. In the biomedical domain, information extraction can be applied to help biologists make the most use of their digital-literature archives. Currently, there are large amounts of biomedical literature that contain rich information about biomedical substances. Extracting such knowledge requires a good named entity recognition technique. In this article, we combine conditional random fields (CRFs), a state-of-the-art sequence-labeling algorithm, with two semisupervised learning techniques, bootstrapping and feature sampling, to recognize disease names from biomedical literature. Two data-processing strategies for each technique also were analyzed: one sequentially processing unlabeled data partitions and another one processing unlabeled data partitions in a round-robin fashion. The experimental results showed the advantage of semisupervised learning techniques given limited labeled training data. Specifically, CRFs with bootstrapping implemented in sequential fashion outperformed strictly supervised CRFs for disease name recognition. The project was supported by NIH/NLM Grant R33 LM07299-01, 2002-2005.

Theme

Data Mining
Hu, P.J.-H.; Hsu, F.-M.; Hu, H.-f.; Chen, H.: Agency satisfaction with electronic record management systems : a large-scale survey (2010) 0.02
```
0.023530604 = product of:
  0.04706121 = sum of:
    0.02586502 = weight(_text_:data in 4115) [ClassicSimilarity], result of:
      0.02586502 = score(doc=4115,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.17468026 = fieldWeight in 4115, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4115)
    0.021196188 = product of:
      0.042392377 = sum of:
        0.042392377 = weight(_text_:processing in 4115) [ClassicSimilarity], result of:
          0.042392377 = score(doc=4115,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.22363065 = fieldWeight in 4115, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4115)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

We investigated agency satisfaction with an electronic record management system (ERMS) that supports the electronic creation, archival, processing, transmittal, and sharing of records (documents) among autonomous government agencies. A factor model, explaining agency satisfaction with ERMS functionalities, offers hypotheses, which we tested empirically with a large-scale survey that involved more than 1,600 government agencies in Taiwan. The data showed a good fit to our model and supported all the hypotheses. Overall, agency satisfaction with ERMS functionalities appears jointly determined by regulatory compliance, job relevance, and satisfaction with support services. Among the determinants we studied, agency satisfaction with support services seems the strongest predictor of agency satisfaction with ERMS functionalities. Regulatory compliance also has important influences on agency satisfaction with ERMS, through its influence on job relevance and satisfaction with support services. Further analyses showed that satisfaction with support services partially mediated the impact of regulatory compliance on satisfaction with ERMS functionalities, and job relevance partially mediated the influence of regulatory compliance on satisfaction with ERMS functionalities. Our findings have important implications for research and practice, which we also discuss.
Huang, C.; Fu, T.; Chen, H.: Text-based video content classification for online video-sharing sites (2010) 0.01
```
0.014458986 = product of:
  0.057835944 = sum of:
    0.057835944 = weight(_text_:data in 3452) [ClassicSimilarity], result of:
      0.057835944 = score(doc=3452,freq=10.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.39059696 = fieldWeight in 3452, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3452)
  0.25 = coord(1/4)
```
Abstract

With the emergence of Web 2.0, sharing personal content, communicating ideas, and interacting with other online users in Web 2.0 communities have become daily routines for online users. User-generated data from Web 2.0 sites provide rich personal information (e.g., personal preferences and interests) and can be utilized to obtain insight about cyber communities and their social networks. Many studies have focused on leveraging user-generated information to analyze blogs and forums, but few studies have applied this approach to video-sharing Web sites. In this study, we propose a text-based framework for video content classification of online-video sharing Web sites. Different types of user-generated data (e.g., titles, descriptions, and comments) were used as proxies for online videos, and three types of text features (lexical, syntactic, and content-specific features) were extracted. Three feature-based classification techniques (C4.5, Naïve Bayes, and Support Vector Machine) were used to classify videos. To evaluate the proposed framework, user-generated data from candidate videos, which were identified by searching user-given keywords on YouTube, were first collected. Then, a subset of the collected data was randomly selected and manually tagged by users as our experiment data. The experimental results showed that the proposed approach was able to classify online videos based on users' interests with accuracy rates up to 87.2%, and all three types of text features contributed to discriminating videos. Support Vector Machine outperformed C4.5 and Naïve Bayes techniques in our experiments. In addition, our case study further demonstrated that accurate video-classification results are very useful for identifying implicit cyber communities on video-sharing Web sites.
Qu, B.; Cong, G.; Li, C.; Sun, A.; Chen, H.: ¬An evaluation of classification models for question topic categorization (2012) 0.01
```
0.009144665 = product of:
  0.03657866 = sum of:
    0.03657866 = weight(_text_:data in 237) [ClassicSimilarity], result of:
      0.03657866 = score(doc=237,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24703519 = fieldWeight in 237, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=237)
  0.25 = coord(1/4)
```
Abstract

We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions and these questions are organized into more than 1,000 categories in a hierarchy. To the best knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification as well as short texts. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-word features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of the state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show what aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.
Benjamin, V.; Chen, H.; Zimbra, D.: Bridging the virtual and real : the relationship between web content, linkage, and geographical proximity of social movements (2014) 0.01
```
0.009144665 = product of:
  0.03657866 = sum of:
    0.03657866 = weight(_text_:data in 1527) [ClassicSimilarity], result of:
      0.03657866 = score(doc=1527,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24703519 = fieldWeight in 1527, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1527)
  0.25 = coord(1/4)
```
Abstract

As the Internet becomes ubiquitous, it has advanced to more closely represent aspects of the real world. Due to this trend, researchers in various disciplines have become interested in studying relationships between real-world phenomena and their virtual representations. One such area of emerging research seeks to study relationships between real-world and virtual activism of social movement organization (SMOs). In particular, SMOs holding extreme social perspectives are often studied due to their tendency to have robust virtual presences to circumvent real-world social barriers preventing information dissemination. However, many previous studies have been limited in scope because they utilize manual data-collection and analysis methods. They also often have failed to consider the real-world aspects of groups that partake in virtual activism. We utilize automated data-collection and analysis methods to identify significant relationships between aspects of SMO virtual communities and their respective real-world locations and ideological perspectives. Our results also demonstrate that the interconnectedness of SMO virtual communities is affected specifically by aspects of the real world. These observations provide insight into the behaviors of SMOs within virtual environments, suggesting that the virtual communities of SMOs are strongly affected by aspects of the real world.
Jiang, S.; Gao, Q.; Chen, H.; Roco, M.C.: ¬The roles of sharing, transfer, and public funding in nanotechnology knowledge-diffusion networks (2015) 0.01
```
0.0077595054 = product of:
  0.031038022 = sum of:
    0.031038022 = weight(_text_:data in 1823) [ClassicSimilarity], result of:
      0.031038022 = score(doc=1823,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.2096163 = fieldWeight in 1823, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=1823)
  0.25 = coord(1/4)
```
Abstract

Understanding the knowledge-diffusion networks of patent inventors can help governments and businesses effectively use their investment to stimulate commercial science and technology development. Such inventor networks are usually large and complex. This study proposes a multidimensional network analysis framework that utilizes Exponential Random Graph Models (ERGMs) to simultaneously model knowledge-sharing and knowledge-transfer processes, examine their interactions, and evaluate the impacts of network structures and public funding on knowledge-diffusion networks. Experiments are conducted on a longitudinal data set that covers 2 decades (1991-2010) of nanotechnology-related US Patent and Trademark Office (USPTO) patents. The results show that knowledge sharing and knowledge transfer are closely interrelated. High degree centrality or boundary inventors play significant roles in the network, and National Science Foundation (NSF) public funding positively affects knowledge sharing despite its small fraction in overall funding and upstream research topics.
Chen, H.; Beaudoin, C.E.; Hong, H.: Teen online information disclosure : empirical testing of a protection motivation and social capital model (2016) 0.01
```
0.0077595054 = product of:
  0.031038022 = sum of:
    0.031038022 = weight(_text_:data in 3203) [ClassicSimilarity], result of:
      0.031038022 = score(doc=3203,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.2096163 = fieldWeight in 3203, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=3203)
  0.25 = coord(1/4)
```
Abstract

With bases in protection motivation theory and social capital theory, this study investigates teen and parental factors that determine teens' online privacy concerns, online privacy protection behaviors, and subsequent online information disclosure on social network sites. With secondary data from a 2012 survey (N?=?622), the final well-fitting structural equation model revealed that teen online privacy concerns were primarily influenced by parental interpersonal trust and parental concerns about teens' online privacy, whereas teen privacy protection behaviors were primarily predicted by teen cost-benefit appraisal of online interactions. In turn, teen online privacy concerns predicted increased privacy protection behaviors and lower teen information disclosure. Finally, restrictive and instructive parental mediation exerted differential influences on teens' privacy protection behaviors and online information disclosure.
Liu, X.; Kaza, S.; Zhang, P.; Chen, H.: Determining inventor status and its effect on knowledge diffusion : a study on nanotechnology literature from China, Russia, and India (2011) 0.01
```
0.006466255 = product of:
  0.02586502 = sum of:
    0.02586502 = weight(_text_:data in 4468) [ClassicSimilarity], result of:
      0.02586502 = score(doc=4468,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.17468026 = fieldWeight in 4468, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4468)
  0.25 = coord(1/4)
```
Abstract

In an increasingly global research landscape, it is important to identify the most prolific researchers in various institutions and their influence on the diffusion of knowledge. Knowledge diffusion within institutions is influenced by not just the status of individual researchers but also the collaborative culture that determines status. There are various methods to measure individual status, but few studies have compared them or explored the possible effects of different cultures on the status measures. In this article, we examine knowledge diffusion within science and technology-oriented research organizations. Using social network analysis metrics to measure individual status in large-scale coauthorship networks, we studied an individual's impact on the recombination of knowledge to produce innovation in nanotechnology. Data from the most productive and high-impact institutions in China (Chinese Academy of Sciences), Russia (Russian Academy of Sciences), and India (Indian Institutes of Technology) were used. We found that boundary-spanning individuals influenced knowledge diffusion in all countries. However, our results also indicate that cultural and institutional differences may influence knowledge diffusion.
Chen, H.; Baptista Nunes, J.M.; Ragsdell, G.; An, X.: Somatic and cultural knowledge : drivers of a habitus-driven model of tacit knowledge acquisition (2019) 0.00
```
0.004526378 = product of:
  0.018105512 = sum of:
    0.018105512 = weight(_text_:data in 5460) [ClassicSimilarity], result of:
      0.018105512 = score(doc=5460,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.12227618 = fieldWeight in 5460, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.02734375 = fieldNorm(doc=5460)
  0.25 = coord(1/4)
```
Abstract

The purpose of this paper is to identify and explain the role of individual learning and development in acquiring tacit knowledge in the context of the inexorable and intense continuous change (technological and otherwise) that characterizes our society today, and also to investigate the software (SW) sector, which is at the core of contemporary continuous change and is a paradigm of effective and intrinsic knowledge sharing (KS). This makes the SW sector unique and different from others where KS is so hard to implement. Design/methodology/approach The study employed an inductive qualitative approach based on a multi-case study approach, composed of three successful SW companies in China. These companies are representative of the fabric of the sector, namely a small- and medium-sized enterprise, a large private company and a large state-owned enterprise. The fieldwork included 44 participants who were interviewed using a semi-structured script. The interview data were coded and interpreted following the Straussian grounded theory pattern of open coding, axial coding and selective coding. The process of interviewing was stopped when theoretical saturation was achieved after a careful process of theoretical sampling.

Search (9 results, page 1 of 1)

Authors

Themes