Search (3 results, page 1 of 1)

  • author_ss:"Zhang, C."
  • theme_ss:"Computerlinguistik"
  1. Lu, C.; Bu, Y.; Wang, J.; Ding, Y.; Torvik, V.; Schnaars, M.; Zhang, C.: Examining scientific writing styles from the perspective of linguistic complexity : a cross-level moderation model (2019) 0.01
    0.005549766 = product of:
      0.013874415 = sum of:
        0.009138121 = weight(_text_:a in 5219) [ClassicSimilarity], result of:
          0.009138121 = score(doc=5219,freq=10.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.1709182 = fieldWeight in 5219, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=5219)
        0.0047362936 = product of:
          0.009472587 = sum of:
            0.009472587 = weight(_text_:information in 5219) [ClassicSimilarity], result of:
              0.009472587 = score(doc=5219,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.116372846 = fieldWeight in 5219, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5219)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
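
    The score tree above can be checked by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF definitions that the printed values are consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, and fieldWeight = tf * idf * fieldNorm. The function names are illustrative and not part of any library; queryNorm, fieldNorm, and the coord factors are copied from the listing rather than derived.

```python
import math

# Constants copied from the explain output for doc 5219.
MAX_DOCS = 44218
QUERY_NORM = 0.046368346
FIELD_NORM = 0.046875


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int) -> float:
    """Score of one term: queryWeight * fieldWeight, as in the explain tree."""
    tf = math.sqrt(freq)                        # tf(freq) = sqrt(termFreq)
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM        # idf * queryNorm
    field_weight = tf * term_idf * FIELD_NORM   # tf * idf * fieldNorm
    return query_weight * field_weight


# Term "a": freq=10, docFreq=37942.
score_a = term_score(10.0, 37942)

# Term "information": freq=2, docFreq=20772, scaled by the inner coord(1/2).
score_information = term_score(2.0, 20772) * 0.5

# Sum of the matching clauses, scaled by the outer coord(2/5).
total = (score_a + score_information) * 0.4

print(f"a:           {score_a:.9f}")
print(f"information: {score_information:.9f}")
print(f"total:       {total:.9f}")
```

    Up to single-precision rounding in the listing, the three printed values should agree with 0.009138121, 0.0047362936, and 0.005549766 above. The same function applies to the other two results after swapping in their freq and fieldNorm values.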
    
    Abstract
    Publishing articles in high-impact English journals is difficult for scholars around the world, especially for non-native English-speaking scholars (NNESs), most of whom struggle with proficiency in English. To uncover the differences in English scientific writing between native English-speaking scholars (NESs) and NNESs, we collected a large-scale data set containing more than 150,000 full-text articles published in PLoS between 2006 and 2015. We divided these articles into three groups according to the ethnic backgrounds of the first and corresponding authors, obtained by Ethnea, and examined the scientific writing styles in English from a two-fold perspective of linguistic complexity: (a) syntactic complexity, including measurements of sentence length and sentence complexity; and (b) lexical complexity, including measurements of lexical diversity, lexical density, and lexical sophistication. The observations suggest marginal differences between groups in syntactic and lexical complexity.
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.5, S.462-475
    Type
    a
  2. Zhang, Y.; Zhang, C.; Li, J.: Joint modeling of characters, words, and conversation contexts for microblog keyphrase extraction (2020) 0.00
    0.004303226 = product of:
      0.010758064 = sum of:
        0.0068111527 = weight(_text_:a in 5816) [ClassicSimilarity], result of:
          0.0068111527 = score(doc=5816,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.12739488 = fieldWeight in 5816, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5816)
        0.003946911 = product of:
          0.007893822 = sum of:
            0.007893822 = weight(_text_:information in 5816) [ClassicSimilarity], result of:
              0.007893822 = score(doc=5816,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.09697737 = fieldWeight in 5816, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5816)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Millions of messages are produced on microblog platforms every day, leading to a pressing need for automatic identification of key points from the massive texts. To absorb salient content from the vast bulk of microblog posts, this article focuses on the task of microblog keyphrase extraction. In previous work, most efforts treat messages as independent documents and might suffer from the data sparsity problem exhibited in short and informal microblog posts. In contrast, we propose to enrich contexts by exploiting conversations initiated by target posts and formed by their replies, which are generally centered around topics relevant to the target posts and are therefore helpful for keyphrase identification. Concretely, we present a neural keyphrase extraction framework, which has 2 modules: a conversation context encoder and a keyphrase tagger. The conversation context encoder captures indicative representations from the conversation contexts of target posts and feeds them into the keyphrase tagger, and the keyphrase tagger extracts salient words from target posts. The 2 modules were trained jointly to optimize the conversation context encoding and keyphrase extraction processes. In the conversation context encoder, we leverage hierarchical structures to capture word-level and message-level indicative representations. In both modules, we apply character-level representations, which enable the model to explore morphological features and deal with the out-of-vocabulary problem caused by the informal language style of microblog messages. Extensive comparison results on real-life data sets indicate that our model outperforms state-of-the-art models from previous studies.
    Source
    Journal of the Association for Information Science and Technology. 71(2020) no.5, S.553-567
    Type
    a
  3. Zhang, C.; Zeng, D.; Li, J.; Wang, F.-Y.; Zuo, W.: Sentiment analysis of Chinese documents : from sentence to document level (2009) 0.00
    0.0042062993 = product of:
      0.0105157485 = sum of:
        0.005779455 = weight(_text_:a in 3296) [ClassicSimilarity], result of:
          0.005779455 = score(doc=3296,freq=4.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.10809815 = fieldWeight in 3296, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3296)
        0.0047362936 = product of:
          0.009472587 = sum of:
            0.009472587 = weight(_text_:information in 3296) [ClassicSimilarity], result of:
              0.009472587 = score(doc=3296,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.116372846 = fieldWeight in 3296, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3296)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    User-generated content on the Web has become an extremely valuable source for mining and analyzing user opinions on any topic. Recent years have seen an increasing body of work investigating methods to recognize favorable and unfavorable sentiments toward specific subjects from online text. However, most of these efforts focus on English, and there have been very few studies on sentiment analysis of Chinese content. This paper aims to address the unique challenges posed by Chinese sentiment analysis. We propose a rule-based approach including two phases: (1) determining each sentence's sentiment based on word dependency, and (2) aggregating sentences to predict the document sentiment. We report the results of an experimental study comparing our approach with three machine learning-based approaches using two sets of Chinese articles. These results illustrate the effectiveness of our proposed method and its advantages over learning-based approaches.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.12, S.2474-2487
    Type
    a