Search (3 results, page 1 of 1)

  • × author_ss:"Lu, W."
  • × year_i:[2020 TO 2030}
  1. Jiang, Y.; Meng, R.; Huang, Y.; Lu, W.; Liu, J.: Generating keyphrases for readers : a controllable keyphrase generation framework (2023) 0.02
    0.01586826 = product of:
      0.03173652 = sum of:
        0.03173652 = sum of:
          0.0061926404 = weight(_text_:a in 1012) [ClassicSimilarity], result of:
            0.0061926404 = score(doc=1012,freq=10.0), product of:
              0.043477926 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.037706986 = queryNorm
              0.14243183 = fieldWeight in 1012, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1012)
          0.02554388 = weight(_text_:22 in 1012) [ClassicSimilarity], result of:
            0.02554388 = score(doc=1012,freq=2.0), product of:
              0.13204344 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.037706986 = queryNorm
              0.19345059 = fieldWeight in 1012, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1012)
      0.5 = coord(1/2)
    
    Abstract
    With the wide application of keyphrases in many Information Retrieval (IR) and Natural Language Processing (NLP) tasks, automatic keyphrase prediction has been emerging. However, these statistically important phrases are contributing increasingly less to the related tasks because the end-to-end learning mechanism enables models to learn the important semantic information of the text directly. Similarly, keyphrases are of little help for readers to quickly grasp the paper's main idea because the relationship between the keyphrase and the paper is not explicit to readers. Therefore, we propose to generate keyphrases with specific functions for readers to bridge the semantic gap between them and the information producers, and verify the effectiveness of the keyphrase function for assisting users' comprehension with a user experiment. A controllable keyphrase generation framework (the CKPG) that uses the keyphrase function as a control code to generate categorized keyphrases is proposed and implemented based on Transformer, BART, and T5, respectively. For the Computer Science domain, the Macro-avgs of , , and on the Paper with Code dataset are up to 0.680, 0.535, and 0.558, respectively. Our experimental results indicate the effectiveness of the CKPG models.
    Date
    22. 6.2023 14:55:20
    Type
    a
  2. Zhang, L.; Lu, W.; Yang, J.: LAGOS-AND : a large gold standard dataset for scholarly author name disambiguation (2023) 0.02
    0.0151703395 = product of:
      0.030340679 = sum of:
        0.030340679 = sum of:
          0.004796799 = weight(_text_:a in 883) [ClassicSimilarity], result of:
            0.004796799 = score(doc=883,freq=6.0), product of:
              0.043477926 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.037706986 = queryNorm
              0.11032722 = fieldWeight in 883, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0390625 = fieldNorm(doc=883)
          0.02554388 = weight(_text_:22 in 883) [ClassicSimilarity], result of:
            0.02554388 = score(doc=883,freq=2.0), product of:
              0.13204344 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.037706986 = queryNorm
              0.19345059 = fieldWeight in 883, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=883)
      0.5 = coord(1/2)
    
    Abstract
    In this article, we present a method to automatically build large labeled datasets for the author ambiguity problem in the academic world by leveraging the authoritative academic resources, ORCID and DOI. Using the method, we built LAGOS-AND, two large, gold-standard sub-datasets for author name disambiguation (AND), of which LAGOS-AND-BLOCK is created for clustering-based AND research and LAGOS-AND-PAIRWISE is created for classification-based AND research. Our LAGOS-AND datasets are substantially different from the existing ones. The initial versions of the datasets (v1.0, released in February 2021) include 7.5 M citations authored by 798 K unique authors (LAGOS-AND-BLOCK) and close to 1 M instances (LAGOS-AND-PAIRWISE). And both datasets show close similarities to the whole Microsoft Academic Graph (MAG) across validations of six facets. In building the datasets, we reveal the variation degrees of last names in three literature databases, PubMed, MAG, and Semantic Scholar, by comparing author names hosted to the authors' official last names shown on the ORCID pages. Furthermore, we evaluate several baseline disambiguation methods as well as the MAG's author IDs system on our datasets, and the evaluation helps identify several interesting findings. We hope the datasets and findings will bring new insights for future studies. The code and datasets are publicly available.
    Date
    22. 1.2023 18:40:36
    Type
    a
  3. Huang, S.; Qian, J.; Huang, Y.; Lu, W.; Bu, Y.; Yang, J.; Cheng, Q.: Disclosing the relationship between citation structure and future impact of a publication (2022) 0.00
    0.0018318077 = product of:
      0.0036636153 = sum of:
        0.0036636153 = product of:
          0.0073272306 = sum of:
            0.0073272306 = weight(_text_:a in 621) [ClassicSimilarity], result of:
              0.0073272306 = score(doc=621,freq=14.0), product of:
                0.043477926 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.037706986 = queryNorm
                0.1685276 = fieldWeight in 621, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=621)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Each section header of an article has its distinct communicative function. Citations from distinct sections may be different regarding citing motivation. In this paper, we grouped section headers with similar functions as a structural function and defined the distribution of citations from structural functions for a paper as its citation structure. We aim to explore the relationship between citation structure and the future impact of a publication and disclose the relative importance among citations from different structural functions. Specifically, we proposed two citation counting methods and a citation life cycle identification method, by which the regression data were built. Subsequently, we employed a ridge regression model to predict the future impact of the paper and analyzed the relative weights of regressors. Based on documents collected from the Association for Computational Linguistics Anthology website, our empirical experiments disclosed that functional structure features improve the prediction accuracy of citation count prediction and that there exist differences among citations from different structural functions. Specifically, at the early stage of citation lifetime, citations from Introduction and Method are particularly important for perceiving future impact of papers, and citations from Result and Conclusion are also vital. However, early accumulation of citations from the Background seems less important.
    Type
    a