Search (8 results, page 1 of 1)

  • × author_ss:"Li, W."
  1. Ouyang, Y.; Li, W.; Li, S.; Lu, Q.: Intertopic information mining for query-based summarization (2010) 0.00
    4.9332716E-4 = product of:
      0.007399907 = sum of:
        0.007399907 = product of:
          0.014799814 = sum of:
            0.014799814 = weight(_text_:information in 3459) [ClassicSimilarity], result of:
              0.014799814 = score(doc=3459,freq=18.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.2909321 = fieldWeight in 3459, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3459)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    In this article, the authors address the problem of sentence ranking in summarization. Although most existing summarization approaches are concerned with the information embodied in a particular topic (including a set of documents and an associated query) for sentence ranking, they propose a novel ranking approach that incorporates intertopic information mining. Intertopic information, in contrast to intratopic information, is able to reveal pairwise topic relationships and thus can be considered as the bridge across different topics. In this article, the intertopic information is used for transferring word importance learned from known topics to unknown topics under a learning-based summarization framework. To mine this information, the authors model the topic relationship by clustering all the words in both known and unknown topics according to various kinds of word conceptual labels, which indicate the roles of the words in the topic. Based on the mined relationships, we develop a probabilistic model using manually generated summaries provided for known topics to predict ranking scores for sentences in unknown topics. A series of experiments have been conducted on the Document Understanding Conference (DUC) 2006 data set. The evaluation results show that intertopic information is indeed effective for sentence ranking and the resultant summarization system performs comparably well to the best-performing DUC participating systems on the same data set.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.5, S.1062-1072
  2. Li, W.; Wong, K.-F.; Yuan, C.: Toward automatic Chinese temporal information extraction (2001) 0.00
    4.0279995E-4 = product of:
      0.006041999 = sum of:
        0.006041999 = product of:
          0.012083998 = sum of:
            0.012083998 = weight(_text_:information in 6029) [ClassicSimilarity], result of:
              0.012083998 = score(doc=6029,freq=12.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.23754507 = fieldWeight in 6029, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6029)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    Over the past few years, temporal information processing and temporal database management have increasingly become hot topics. Nevertheless, only a few researchers have investigated these areas in the Chinese language. This lays down the objective of our research: to exploit Chinese language processing techniques for temporal information extraction and concept reasoning. In this article, we first study the mechanism for expressing time in Chinese. On the basis of the study, we then design a general frame structure for maintaining the extracted temporal concepts and propose a system for extracting time-dependent information from Hong Kong financial news. In the system, temporal knowledge is represented by different types of temporal concepts (TTC) and different temporal relations, including absolute and relative relations, which are used to correlate between action times and reference times. In analyzing a sentence, the algorithm first determines the situation related to the verb. This in turn will identify the type of temporal concept associated with the verb. After that, the relevant temporal information is extracted and the temporal relations are derived. These relations link relevant concept frames together in chronological order, which in turn provide the knowledge to fulfill users' queries, e.g., for question-answering (i.e., Q&A) applications
    Source
    Journal of the American Society for Information Science and technology. 52(2001) no.9, S.748-762
  3. Wei, F.; Li, W.; Lu, Q.; He, Y.: Applying two-level reinforcement ranking in query-oriented multidocument summarization (2009) 0.00
    2.3255666E-4 = product of:
      0.0034883497 = sum of:
        0.0034883497 = product of:
          0.0069766995 = sum of:
            0.0069766995 = weight(_text_:information in 3120) [ClassicSimilarity], result of:
              0.0069766995 = score(doc=3120,freq=4.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.13714671 = fieldWeight in 3120, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3120)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    Sentence ranking is the issue of most concern in document summarization today. While traditional feature-based approaches evaluate sentence significance and rank the sentences relying on the features that are particularly designed to characterize the different aspects of the individual sentences, the newly emerging graph-based ranking algorithms (such as the PageRank-like algorithms) recursively compute sentence significance using the global information in a text graph that links sentences together. In general, the existing PageRank-like algorithms can model well the phenomena that a sentence is important if it is linked by many other important sentences. Or they are capable of modeling the mutual reinforcement among the sentences in the text graph. However, when dealing with multidocument summarization these algorithms often assemble a set of documents into one large file. The document dimension is totally ignored. In this article we present a framework to model the two-level mutual reinforcement among sentences as well as documents. Under this framework we design and develop a novel ranking algorithm such that the document reinforcement is taken into account in the process of sentence ranking. The convergence issue is examined. We also explore an interesting and important property of the proposed algorithm. When evaluated on the DUC 2005 and 2006 query-oriented multidocument summarization datasets, significant results are achieved.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.10, S.2119-2131
  4. Cai, X.; Li, W.: Enhancing sentence-level clustering with integrated and interactive frameworks for theme-based summarization (2011) 0.00
    2.3255666E-4 = product of:
      0.0034883497 = sum of:
        0.0034883497 = product of:
          0.0069766995 = sum of:
            0.0069766995 = weight(_text_:information in 4770) [ClassicSimilarity], result of:
              0.0069766995 = score(doc=4770,freq=4.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.13714671 = fieldWeight in 4770, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4770)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    Sentence clustering plays a pivotal role in theme-based summarization, which discovers topic themes defined as the clusters of highly related sentences to avoid redundancy and cover more diverse information. As the length of sentences is short and the content it contains is limited, the bag-of-words cosine similarity traditionally used for document clustering is no longer suitable. Special treatment for measuring sentence similarity is necessary. In this article, we study the sentence-level clustering problem. After exploiting concept- and context-enriched sentence vector representations, we develop two co-clustering frameworks to enhance sentence-level clustering for theme-based summarization-integrated clustering and interactive clustering-both allowing word and document to play an explicit role in sentence clustering as independent text objects rather than using word or concept as features of a sentence in a document set. In each framework, we experiment with two-level co-clustering (i.e., sentence-word co-clustering or sentence-document co-clustering) and three-level co-clustering (i.e., document-sentence-word co-clustering). Compared against concept- and context-oriented sentence-representation reformation, co-clustering shows a clear advantage in both intrinsic clustering quality evaluation and extrinsic summarization evaluation conducted on the Document Understanding Conferences (DUC) datasets.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.10, S.2067-2082
  5. Liu, Y.; Li, W.; Huang, Z.; Fang, Q.: ¬A fast method based on multiple clustering for name disambiguation in bibliographic citations (2015) 0.00
    2.3255666E-4 = product of:
      0.0034883497 = sum of:
        0.0034883497 = product of:
          0.0069766995 = sum of:
            0.0069766995 = weight(_text_:information in 1672) [ClassicSimilarity], result of:
              0.0069766995 = score(doc=1672,freq=4.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.13714671 = fieldWeight in 1672, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1672)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    Name ambiguity in the context of bibliographic citation affects the quality of services in digital libraries. Previous methods are not widely applied in practice because of their high computational complexity and their strong dependency on excessive attributes, such as institutional affiliation, research area, address, etc., which are difficult to obtain in practice. To solve this problem, we propose a novel coarse-to-fine framework for name disambiguation which sequentially employs 3 common and easily accessible attributes (i.e., coauthor name, article title, and publication venue). Our proposed framework is based on multiple clustering and consists of 3 steps: (a) clustering articles by coauthorship and obtaining rough clusters, that is fragments; (b) clustering fragments obtained in step 1 by title information and getting bigger fragments; (c) and clustering fragments obtained in step 2 by the latent relations among venues. Experimental results on a Digital Bibliography and Library Project (DBLP) data set show that our method outperforms the existing state-of-the-art methods by 2.4% to 22.7% on the average pairwise F1 score and is 10 to 100 times faster in terms of execution time.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.3, S.634-644
  6. Xiang, R.; Chersoni, E.; Lu, Q.; Huang, C.-R.; Li, W.; Long, Y.: Lexical data augmentation for sentiment analysis (2021) 0.00
    2.3255666E-4 = product of:
      0.0034883497 = sum of:
        0.0034883497 = product of:
          0.0069766995 = sum of:
            0.0069766995 = weight(_text_:information in 392) [ClassicSimilarity], result of:
              0.0069766995 = score(doc=392,freq=4.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.13714671 = fieldWeight in 392, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=392)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    Machine learning methods, especially deep learning models, have achieved impressive performance in various natural language processing tasks including sentiment analysis. However, deep learning models are more demanding for training data. Data augmentation techniques are widely used to generate new instances based on modifications to existing data or relying on external knowledge bases to address annotated data scarcity, which hinders the full potential of machine learning techniques. This paper presents our work using part-of-speech (POS) focused lexical substitution for data augmentation (PLSDA) to enhance the performance of machine learning algorithms in sentiment analysis. We exploit POS information to identify words to be replaced and investigate different augmentation strategies to find semantically related substitutions when generating new instances. The choice of POS tags as well as a variety of strategies such as semantic-based substitution methods and sampling methods are discussed in detail. Performance evaluation focuses on the comparison between PLSDA and two previous lexical substitution-based data augmentation methods, one of which is thesaurus-based, and the other is lexicon manipulation based. Our approach is tested on five English sentiment analysis benchmarks: SST-2, MR, IMDB, Twitter, and AirRecord. Hyperparameters such as the candidate similarity threshold and number of newly generated instances are optimized. Results show that six classifiers (SVM, LSTM, BiLSTM-AT, bidirectional encoder representations from transformers [BERT], XLNet, and RoBERTa) trained with PLSDA achieve accuracy improvement of more than 0.6% comparing to two previous lexical substitution methods averaged on five benchmarks. Introducing POS constraint and well-designed augmentation strategies can improve the reliability of lexical data augmentation methods. Consequently, PLSDA significantly improves the performance of sentiment analysis algorithms.
    Source
    Journal of the Association for Information Science and Technology. 72(2021) no.11, S.1432-1447
  7. Li, W.; Zheng, Y.; Zhan, Y.; Feng, R.; Zhang, T.; Fan, W.: Cross-modal retrieval with dual multi-angle self-attention (2021) 0.00
    1.9733087E-4 = product of:
      0.002959963 = sum of:
        0.002959963 = product of:
          0.005919926 = sum of:
            0.005919926 = weight(_text_:information in 67) [ClassicSimilarity], result of:
              0.005919926 = score(doc=67,freq=2.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.116372846 = fieldWeight in 67, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=67)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Source
    Journal of the Association for Information Science and Technology. 72(2021) no.1, S.46-65
  8. Wei, F.; Li, W.; Liu, S.: iRANK: a rank-learn-combine framework for unsupervised ensemble ranking (2010) 0.00
    1.6444239E-4 = product of:
      0.0024666358 = sum of:
        0.0024666358 = product of:
          0.0049332716 = sum of:
            0.0049332716 = weight(_text_:information in 3472) [ClassicSimilarity], result of:
              0.0049332716 = score(doc=3472,freq=2.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.09697737 = fieldWeight in 3472, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3472)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1232-1243