Search (5 results, page 1 of 1)

  • × author_ss:"Li, X."
  1. Li, X.; Zhang, A.; Li, C.; Ouyang, J.; Cai, Y.: Exploring coherent topics by topic modeling with term weighting (2018) 0.02
    0.016854495 = product of:
      0.03370899 = sum of:
        0.03370899 = product of:
          0.06741798 = sum of:
            0.06741798 = weight(_text_:e.g in 5045) [ClassicSimilarity], result of:
              0.06741798 = score(doc=5045,freq=2.0), product of:
                0.23393378 = queryWeight, product of:
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.044842023 = queryNorm
                0.28819257 = fieldWeight in 5045, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5045)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Topic models often produce unexplainable topics that are filled with noisy words. The reason is that words in topic modeling have equal weights. High frequency words dominate the top topic word lists, but most of them are meaningless words, e.g., domain-specific stopwords. To address this issue, in this paper we aim to investigate how to weight words, and then develop a straightforward but effective term weighting scheme, namely entropy weighting (EW). The proposed EW scheme is based on conditional entropy measured by word co-occurrences. Compared with existing term weighting schemes, the highlight of EW is that it can automatically reward informative words. For more robust word weight, we further suggest a combination form of EW (CEW) with two existing weighting schemes. Basically, our CEW assigns meaningless words lower weights and informative words higher weights, leading to more coherent topics during topic modeling inference. We apply CEW to Dirichlet multinomial mixture and latent Dirichlet allocation, and evaluate it by topic quality, document clustering and classification tasks on 8 real world data sets. Experimental results show that weighting words can effectively improve the topic modeling performance over both short texts and normal long texts. More importantly, the proposed CEW significantly outperforms the existing term weighting schemes, since it further considers which words are informative.
  2. Li, X.; Rijke, M.de: Characterizing and predicting downloads in academic search (2019) 0.02
    0.016854495 = product of:
      0.03370899 = sum of:
        0.03370899 = product of:
          0.06741798 = sum of:
            0.06741798 = weight(_text_:e.g in 5103) [ClassicSimilarity], result of:
              0.06741798 = score(doc=5103,freq=2.0), product of:
                0.23393378 = queryWeight, product of:
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.044842023 = queryNorm
                0.28819257 = fieldWeight in 5103, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5103)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Numerous studies have been conducted on the information interaction behavior of search engine users. Few studies have considered information interactions in the domain of academic search. We focus on conversion behavior in this domain. Conversions have been widely studied in the e-commerce domain, e.g., for online shopping and hotel booking, but little is known about conversions in academic search. We start with a description of a unique dataset of a particular type of conversion in academic search, viz. users' downloads of scientific papers. Then we move to an observational analysis of users' download actions. We first characterize user actions and show their statistics in sessions. Then we focus on behavioral and topical aspects of downloads, revealing behavioral correlations across download sessions. We discover unique properties that differ from other conversion settings such as online shopping. Using insights gained from these observations, we consider the task of predicting the next download. In particular, we focus on predicting the time until the next download session, and on predicting the number of downloads. We cast these as time series prediction problems and model them using LSTMs. We develop a specialized model built on user segmentations that achieves significant improvements over the state-of-the art.
  3. Li, X.; Schijvenaars, B.J.A.; Rijke, M.de: Investigating queries and search failures in academic search (2017) 0.01
    0.013483594 = product of:
      0.026967188 = sum of:
        0.026967188 = product of:
          0.053934377 = sum of:
            0.053934377 = weight(_text_:e.g in 5033) [ClassicSimilarity], result of:
              0.053934377 = score(doc=5033,freq=2.0), product of:
                0.23393378 = queryWeight, product of:
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.044842023 = queryNorm
                0.23055404 = fieldWeight in 5033, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2168427 = idf(docFreq=651, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5033)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Academic search concerns the retrieval and profiling of information objects in the domain of academic research. In this paper we reveal important observations of academic search queries, and provide an algorithmic solution to address a type of failure during search sessions: null queries. We start by providing a general characterization of academic search queries, by analyzing a large-scale transaction log of a leading academic search engine. Unlike previous small-scale analyses of academic search queries, we find important differences with query characteristics known from web search. E.g., in academic search there is a substantially bigger proportion of entity queries, and a heavier tail in query length distribution. We then focus on search failures and, in particular, on null queries that lead to an empty search engine result page, on null sessions that contain such null queries, and on users who are prone to issue null queries. In academic search approximately 1 in 10 queries is a null query, and 25% of the sessions contain a null query. They appear in different types of search sessions, and prevent users from achieving their search goal. To address the high rate of null queries in academic search, we consider the task of providing query suggestions. Specifically we focus on a highly frequent query type: non-boolean informational queries. To this end we need to overcome query sparsity and make effective use of session information. We find that using entities helps to surface more relevant query suggestions in the face of query sparsity. We also find that query suggestions should be conditioned on the type of session in which they are offered to be more effective. After casting the session classification problem as a multi-label classification problem, we generate session-conditional query suggestions based on predicted session type. We find that this session-conditional method leads to significant improvements over a generic query suggestion method. Personalization yields very little further improvements over session-conditional query suggestions.
  4. Li, X.: Designing an interactive Web tutorial with cross-browser dynamic HTML (2000) 0.01
    0.009113212 = product of:
      0.018226424 = sum of:
        0.018226424 = product of:
          0.03645285 = sum of:
            0.03645285 = weight(_text_:22 in 4897) [ClassicSimilarity], result of:
              0.03645285 = score(doc=4897,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.23214069 = fieldWeight in 4897, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4897)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    28. 1.2006 19:21:22
  5. Li, X.; Thelwall, M.; Kousha, K.: ¬The role of arXiv, RePEc, SSRN and PMC in formal scholarly communication (2015) 0.01
    0.0075943437 = product of:
      0.0151886875 = sum of:
        0.0151886875 = product of:
          0.030377375 = sum of:
            0.030377375 = weight(_text_:22 in 2593) [ClassicSimilarity], result of:
              0.030377375 = score(doc=2593,freq=2.0), product of:
                0.15702912 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044842023 = queryNorm
                0.19345059 = fieldWeight in 2593, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2593)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    20. 1.2015 18:30:22