Search (11 results, page 1 of 1)

  • author_ss:"Li, D."
  1. Li, D.; Kwong, C.-P.; Lee, D.L.: Unified linear subspace approach to semantic analysis (2009) 0.35
    0.3499609 = product of:
      0.4666145 = sum of:
        0.21809667 = weight(_text_:vector in 3321) [ClassicSimilarity], result of:
          0.21809667 = score(doc=3321,freq=8.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.711459 = fieldWeight in 3321, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3321)
        0.21484317 = weight(_text_:space in 3321) [ClassicSimilarity], result of:
          0.21484317 = score(doc=3321,freq=18.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.86483204 = fieldWeight in 3321, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3321)
        0.0336747 = product of:
          0.0673494 = sum of:
            0.0673494 = weight(_text_:model in 3321) [ClassicSimilarity], result of:
              0.0673494 = score(doc=3321,freq=6.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.36792353 = fieldWeight in 3321, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3321)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
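
    The explain tree above can be re-derived by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are hypothetical, and queryNorm and fieldNorm are copied from the listing rather than computed. Lucene works in 32-bit floats, so the last digits may differ slightly.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as ClassicSimilarity defines it."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one term in one document."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm                   # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explain output for doc=3321 (assumed inputs).
MAX_DOCS = 44218
QUERY_NORM = 0.047605187
FIELD_NORM = 0.0390625

vector = term_score(8.0, 191, MAX_DOCS, QUERY_NORM, FIELD_NORM)
space = term_score(18.0, 650, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# "model" sits in a nested clause that matched 1 of 2 sub-clauses: coord(1/2).
model = term_score(6.0, 2569, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5

# Three of the four top-level clauses matched: coord(3/4).
total = (vector + space + model) * 0.75
print(f"vector={vector:.6f} space={space:.6f} model={model:.6f} total={total:.6f}")
```

    The same function applies to every other result on this page; only freq, fieldNorm, and the coord factors change from record to record.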
    
    Abstract
    The Basic Vector Space Model (BVSM) is well known in information retrieval. Unfortunately, its retrieval effectiveness is limited because it is based on literal term matching. The Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) are two prominent semantic retrieval methods, both of which assume there is some underlying latent semantic structure in a dataset that can be used to improve retrieval performance. However, while this structure may be derived from both the term space and the document space, GVSM exploits only the former and LSI the latter. In this article, the latent semantic structure of a dataset is examined from a dual perspective; namely, we consider the term space and the document space simultaneously. This new viewpoint has a natural connection to the notion of kernels. Specifically, a unified kernel function can be derived for a class of vector space models. The dual perspective provides a deeper understanding of the semantic space and makes transparent the geometrical meaning of the unified kernel function. New semantic analysis methods based on the unified kernel function are developed, which combine the advantages of LSI and GVSM. We also prove that the new methods are stable because although the selected rank of the truncated Singular Value Decomposition (SVD) is far from the optimum, the retrieval performance will not be degraded significantly. Experiments performed on standard test collections show that our methods are promising.
    Object
    Generalized Vector Space Model
  2. Li, D.; Kwong, C.-P.: Understanding latent semantic indexing : a topological structure analysis using Q-analysis (2010) 0.18
    0.18009433 = product of:
      0.24012578 = sum of:
        0.130858 = weight(_text_:vector in 3427) [ClassicSimilarity], result of:
          0.130858 = score(doc=3427,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.4268754 = fieldWeight in 3427, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=3427)
        0.08593727 = weight(_text_:space in 3427) [ClassicSimilarity], result of:
          0.08593727 = score(doc=3427,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.34593284 = fieldWeight in 3427, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=3427)
        0.023330513 = product of:
          0.046661027 = sum of:
            0.046661027 = weight(_text_:model in 3427) [ClassicSimilarity], result of:
              0.046661027 = score(doc=3427,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.25490487 = fieldWeight in 3427, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3427)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The method of latent semantic indexing (LSI) is well known for tackling the synonymy and polysemy problems in information retrieval; however, its performance can differ greatly across datasets, and the questions of which characteristics of a dataset contribute to this difference, and why, have not been fully answered. In this article, we propose that the mathematical structure of simplexes can be attached to a term-document matrix in the vector space model (VSM) for information retrieval. The Q-analysis devised by R.H. Atkin ([1974]) may then be applied to effect an analysis of the topological structure of the simplexes and their corresponding dataset. Experimental results of this analysis reveal that there is a correlation between the effectiveness of LSI and the topological structure of the dataset. By using the information obtained from the topological analysis, we develop a new method to explore the semantic information in a dataset. Experimental results show that our method can enhance the performance of VSM for datasets over which LSI is not effective.
  3. Shen, X.; Li, D.; Shen, C.: Evaluating China's university library Web sites using correspondence analysis (2006) 0.03
    0.028453367 = product of:
      0.11381347 = sum of:
        0.11381347 = sum of:
          0.062214702 = weight(_text_:model in 5277) [ClassicSimilarity], result of:
            0.062214702 = score(doc=5277,freq=2.0), product of:
              0.1830527 = queryWeight, product of:
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.047605187 = queryNorm
              0.33987316 = fieldWeight in 5277, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.0625 = fieldNorm(doc=5277)
          0.051598765 = weight(_text_:22 in 5277) [ClassicSimilarity], result of:
            0.051598765 = score(doc=5277,freq=2.0), product of:
              0.16670525 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.047605187 = queryNorm
              0.30952093 = fieldWeight in 5277, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=5277)
      0.25 = coord(1/4)
    
    Abstract
    In recent years, many evaluations of Web sites have been conducted, and relevant research has also been carried out in academic circles. Correspondence analysis is introduced in this paper to evaluate university library Web sites through building a correspondence analysis model. This paper gives suggestions as to how to construct university library Web sites based on analysis and summary of evaluation results, in a bid to strengthen the construction of university library Web sites.
    Date
    22. 7.2006 16:40:18
  4. Li, D.: Knowledge representation and discovery based on linguistic atoms (1998) 0.03
    0.026171934 = product of:
      0.104687735 = sum of:
        0.104687735 = sum of:
          0.06598866 = weight(_text_:model in 3836) [ClassicSimilarity], result of:
            0.06598866 = score(doc=3836,freq=4.0), product of:
              0.1830527 = queryWeight, product of:
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.047605187 = queryNorm
              0.36048993 = fieldWeight in 3836, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.046875 = fieldNorm(doc=3836)
          0.03869907 = weight(_text_:22 in 3836) [ClassicSimilarity], result of:
            0.03869907 = score(doc=3836,freq=2.0), product of:
              0.16670525 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.047605187 = queryNorm
              0.23214069 = fieldWeight in 3836, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=3836)
      0.25 = coord(1/4)
    
    Abstract
    Describes a new concept of linguistic atoms with 3 digital characteristics: expected value Ex, entropy En, and deviation D. The mathematical description has effectively integrated the fuzziness and randomness of linguistic terms in a unified way. Develops a method of knowledge representation in KDD, which bridges the gap between quantitative and qualitative knowledge. Mapping between quantities and qualities becomes much easier and interchangeable. In order to discover generalised knowledge from a database, uses virtual linguistic terms and cloud transfer for the auto-generation of concept hierarchies to attributes. Predictive data mining with the cloud model is given for implementation. Illustrates the advantages of this linguistic model in KDD.
    Footnote
    Contribution to a special issue of selected papers from the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), held Singapore, 22-23 Feb 1997
  5. Li, D.; Ding, Y.; Sugimoto, C.; He, B.; Tang, J.; Yan, E.; Lin, N.; Qin, Z.; Dong, T.: Modeling topic and community structure in social tagging : the TTR-LDA-Community model (2011) 0.01
    0.011905802 = product of:
      0.04762321 = sum of:
        0.04762321 = product of:
          0.09524642 = sum of:
            0.09524642 = weight(_text_:model in 4759) [ClassicSimilarity], result of:
              0.09524642 = score(doc=4759,freq=12.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.5203224 = fieldWeight in 4759, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4759)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The presence of social networks in complex systems has made networks and community structure a focal point of study in many domains. Previous studies have focused on the structural emergence and growth of communities and on the topics displayed within the network. However, few scholars have closely examined the relationship between the thematic and structural properties of networks. Therefore, this article proposes the Tagger Tag Resource-Latent Dirichlet Allocation-Community model (TTR-LDA-Community model), which combines the Latent Dirichlet Allocation (LDA) model with the Girvan-Newman community detection algorithm through an inference mechanism. Using social tagging data from Delicious, this article demonstrates the clustering of active taggers into communities, the topic distributions within communities, and the ranking of taggers, tags, and resources within these communities. The data analysis evaluates patterns in community structure and topical affiliations diachronically. The article evaluates the effectiveness of community detection and the inference mechanism embedded in the model and finds that the TTR-LDA-Community model outperforms other traditional models in tag prediction. This has implications for scholars in domains interested in community detection, profiling, and recommender systems.
  6. Li, D.; Tang, J.; Ding, Y.; Shuai, X.; Chambers, T.; Sun, G.; Luo, Z.; Zhang, J.: Topic-level opinion influence model (TOIM) : an investigation using tencent microblogging (2015) 0.01
    0.009721047 = product of:
      0.03888419 = sum of:
        0.03888419 = product of:
          0.07776838 = sum of:
            0.07776838 = weight(_text_:model in 2345) [ClassicSimilarity], result of:
              0.07776838 = score(doc=2345,freq=8.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.42484146 = fieldWeight in 2345, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2345)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Text mining has been widely used in multiple types of user-generated data to infer user opinion, but its application to microblogging is difficult because text messages are short and noisy, providing limited information about user opinion. Given that microblogging users communicate with each other to form a social network, we hypothesize that user opinion is influenced by its neighbors in the network. In this paper, we infer user opinion on a topic by combining two factors: the user's historical opinion about relevant topics and opinion influence from his/her neighbors. We thus build a topic-level opinion influence model (TOIM) by integrating both topic factor and opinion influence factor into a unified probabilistic model. We evaluate our model in one of the largest microblogging sites in China, Tencent Weibo, and the experiments show that TOIM outperforms baseline methods in opinion inference accuracy. Moreover, incorporating indirect influence further improves inference recall and f1-measure. Finally, we demonstrate some useful applications of TOIM in analyzing users' behaviors in Tencent Weibo.
  7. Li, D.; Wang, Y.; Madden, A.; Ding, Y.; Sun, G.G.; Zhang, N.; Zhou, E.: Analyzing stock market trends using social media user moods and social influence (2019) 0.01
    0.009721047 = product of:
      0.03888419 = sum of:
        0.03888419 = product of:
          0.07776838 = sum of:
            0.07776838 = weight(_text_:model in 5362) [ClassicSimilarity], result of:
              0.07776838 = score(doc=5362,freq=8.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.42484146 = fieldWeight in 5362, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5362)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Information from microblogs is gaining increasing attention from researchers interested in analyzing fluctuations in stock markets. Behavioral financial theory draws on social psychology to explain some of the irrational behaviors associated with financial decisions to help explain some of the fluctuations. In this study we argue that social media users who demonstrate an interest in finance can offer insights into ways in which irrational behaviors may affect a stock market. To test this, we analyzed all the data collected over a 3-month period in 2011 from Tencent Weibo (one of the largest microblogging websites in China). We designed a social influence (SI)-based Tencent finance-related moods model to simulate investors' irrational behaviors, and designed a Tencent Moods-based Stock Trend Analysis (TM_STA) model to detect correlations between Tencent moods and the Hushen-300 index (one of the most important financial indexes in China). Experimental results show that the proposed method can help explain the data fluctuation. The findings support the existing behavioral financial theory, and can help to understand short-term rises and falls in a stock market. We use behavioral financial theory to further explain our findings, and to propose a trading model to verify the proposed model.
  8. Li, D.; Luo, Z.; Ding, Y.; Tang, J.; Sun, G.G.-Z.; Dai, X.; Du, J.; Zhang, J.; Kong, S.: User-level microblogging recommendation incorporating social influence (2017) 0.01
    0.008418675 = product of:
      0.0336747 = sum of:
        0.0336747 = product of:
          0.0673494 = sum of:
            0.0673494 = weight(_text_:model in 3426) [ClassicSimilarity], result of:
              0.0673494 = score(doc=3426,freq=6.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.36792353 = fieldWeight in 3426, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3426)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    With the information overload of user-generated content in microblogging, users find it extremely challenging to browse and find valuable information in their first attempt. In this paper we propose a microblogging recommendation algorithm, TSI-MR (Topic-Level Social Influence-based Microblogging Recommendation), which can significantly improve users' microblogging experiences. The main innovation of this proposed algorithm is that we consider social influences and their indirect structural relationships, which are largely based on social status theory, from the topic level. The primary advantage of this approach is that it can build an accurate description of latent relationships between two users with weak connections, which can improve the performance of the model; furthermore, it can solve sparsity problems of training data to a certain extent. The realization of the model is mainly based on Factor Graph. We also applied a distributed strategy to further improve the efficiency of the model. Finally, we use data from Tencent Weibo, one of the most popular microblogging services in China, to evaluate our methods. The results show that incorporating social influence can improve microblogging performance considerably, and outperform the baseline methods.
  9. Li, H.; Wu, H.; Li, D.; Lin, S.; Su, Z.; Luo, X.: PSI: A probabilistic semantic interpretable framework for fine-grained image ranking (2018) 0.01
    0.0058326283 = product of:
      0.023330513 = sum of:
        0.023330513 = product of:
          0.046661027 = sum of:
            0.046661027 = weight(_text_:model in 4577) [ClassicSimilarity], result of:
              0.046661027 = score(doc=4577,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.25490487 = fieldWeight in 4577, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4577)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Image ranking is one of the key problems in the information science research area. However, most current methods focus on increasing performance and leave intact the semantic gap problem, that is, the difficulty of understanding the learned ranking models. Therefore, in this article, we aim at learning an interpretable ranking model to tackle the semantic gap in fine-grained image ranking. We propose to combine attribute-based representation and online passive-aggressive (PA) learning based ranking models to achieve this goal. Besides, considering the highly localized instances in fine-grained image ranking, we introduce a supervised constrained clustering method to gather class-balanced training instances for local PA-based models, and incorporate the learned local models into a unified probabilistic framework. Extensive experiments on the benchmark demonstrate that the proposed framework outperforms state-of-the-art methods in terms of accuracy and speed.
  10. Lin, N.; Li, D.; Ding, Y.; He, B.; Qin, Z.; Tang, J.; Li, J.; Dong, T.: The dynamic features of Delicious, Flickr, and YouTube (2012) 0.00
    0.0048605236 = product of:
      0.019442094 = sum of:
        0.019442094 = product of:
          0.03888419 = sum of:
            0.03888419 = weight(_text_:model in 4970) [ClassicSimilarity], result of:
              0.03888419 = score(doc=4970,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.21242073 = fieldWeight in 4970, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4970)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This article investigates the dynamic features of social tagging vocabularies in Delicious, Flickr, and YouTube from 2003 to 2008. Three algorithms are designed to study the macro- and micro-tag growth as well as the dynamics of taggers' activities, respectively. Moreover, we propose a Tagger Tag Resource Latent Dirichlet Allocation (TTR-LDA) model to explore the evolution of topics emerging from those social vocabularies. Our results show that (a) at the macro level, tag growth in all the three tagging systems obeys power law distribution with exponents lower than 1; at the micro level, the tag growth of popular resources in all three tagging systems follows a similar power law distribution; (b) the exponents of tag growth vary in different evolving stages of resources; (c) the growth of number of taggers associated with different popular resources presents a feature of convergence over time; (d) the active level of taggers has a positive correlation with the macro-tag growth of different tagging systems; and (e) some topics evolve into several subtopics over time while others experience relatively stable stages in which their contents do not change much, and certain groups of taggers continue their interests in them.
  11. Liu, M.; Bu, Y.; Chen, C.; Xu, J.; Li, D.; Leng, Y.; Freeman, R.B.; Meyer, E.T.; Yoon, W.; Sung, M.; Jeong, M.; Lee, J.; Kang, J.; Min, C.; Zhai, Y.; Song, M.; Ding, Y.: Pandemics are catalysts of scientific novelty : evidence from COVID-19 (2022) 0.00
    0.0048605236 = product of:
      0.019442094 = sum of:
        0.019442094 = product of:
          0.03888419 = sum of:
            0.03888419 = weight(_text_:model in 633) [ClassicSimilarity], result of:
              0.03888419 = score(doc=633,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.21242073 = fieldWeight in 633, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=633)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Scientific novelty drives the efforts to invent new vaccines and solutions during the pandemic. First-time collaboration and international collaboration are two pivotal channels to expand teams' search activities for a broader scope of resources required to address the global challenge, which might facilitate the generation of novel ideas. Our analysis of 98,981 coronavirus papers suggests that scientific novelty measured by the BioBERT model that is pretrained on 29 million PubMed articles, and first-time collaboration increased after the outbreak of COVID-19, and international collaboration witnessed a sudden decrease. During COVID-19, papers with more first-time collaboration were found to be more novel and international collaboration did not hamper novelty as it had done in the normal periods. The findings suggest the necessity of reaching out for distant resources and the importance of maintaining a collaborative scientific community beyond nationalism during a pandemic.