Search (8 results, page 1 of 1)

  • theme_ss:"Formalerschließung"
  • year_i:[2020 TO 2030}
  1. Zhang, L.; Lu, W.; Yang, J.: LAGOS-AND : a large gold standard dataset for scholarly author name disambiguation (2023) 0.04
    0.037325222 = sum of:
      0.0203468 = product of:
        0.0813872 = sum of:
          0.0813872 = weight(_text_:authors in 883) [ClassicSimilarity], result of:
            0.0813872 = score(doc=883,freq=4.0), product of:
              0.22851472 = queryWeight, product of:
                4.558814 = idf(docFreq=1258, maxDocs=44218)
                0.05012591 = queryNorm
              0.35615736 = fieldWeight in 883, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.558814 = idf(docFreq=1258, maxDocs=44218)
                0.0390625 = fieldNorm(doc=883)
        0.25 = coord(1/4)
      0.016978422 = product of:
        0.033956844 = sum of:
          0.033956844 = weight(_text_:22 in 883) [ClassicSimilarity], result of:
            0.033956844 = score(doc=883,freq=2.0), product of:
              0.1755324 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05012591 = queryNorm
              0.19345059 = fieldWeight in 883, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=883)
        0.5 = coord(1/2)
    
    Abstract
    In this article, we present a method to automatically build large labeled datasets for the author ambiguity problem in the academic world by leveraging the authoritative academic resources ORCID and DOI. Using the method, we built LAGOS-AND, two large, gold-standard sub-datasets for author name disambiguation (AND), of which LAGOS-AND-BLOCK is created for clustering-based AND research and LAGOS-AND-PAIRWISE is created for classification-based AND research. Our LAGOS-AND datasets are substantially different from the existing ones. The initial versions of the datasets (v1.0, released in February 2021) include 7.5 M citations authored by 798 K unique authors (LAGOS-AND-BLOCK) and close to 1 M instances (LAGOS-AND-PAIRWISE). Both datasets closely resemble the whole Microsoft Academic Graph (MAG) across validations of six facets. In building the datasets, we reveal the degree of variation in last names in three literature databases, PubMed, MAG, and Semantic Scholar, by comparing the author names they host with the authors' official last names shown on their ORCID pages. Furthermore, we evaluate several baseline disambiguation methods as well as MAG's author ID system on our datasets, and the evaluation yields several interesting findings. We hope the datasets and findings will bring new insights for future studies. The code and datasets are publicly available.
    Date
    22. 1.2023 18:40:36
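
The score breakdown shown for result 1 can be reproduced by hand. The following is a minimal sketch in Python, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which agree with the tf and idf values printed in the explain tree. The function and variable names are illustrative and not part of Lucene or Solr; queryNorm, fieldNorm, and the coord factors are copied from the listing rather than derived.

```python
import math

def tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw term frequency.
    return math.sqrt(freq)

def idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency as printed in the explain output.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explain tree of result 1 (doc 883).
MAX_DOCS = 44218
QUERY_NORM = 0.05012591
FIELD_NORM = 0.0390625

# Each clause is scaled by its coord factor (matched clauses / total clauses).
authors_part = term_score(4.0, 1258, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.25
term_22_part = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5

total = authors_part + term_22_part
print(f"authors: {authors_part:.7f}  22: {term_22_part:.7f}  total: {total:.7f}")
```

Lucene computes these values in 32-bit floats, so the last digits may differ slightly from the listing. The same functions apply to the other results; only freq, fieldNorm, and the coord factors change.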
  2. Kim, J.(im); Kim, J.(enna): Effect of forename string on author name disambiguation (2020) 0.03
    0.031365782 = sum of:
      0.01438736 = product of:
        0.05754944 = sum of:
          0.05754944 = weight(_text_:authors in 5930) [ClassicSimilarity], result of:
            0.05754944 = score(doc=5930,freq=2.0), product of:
              0.22851472 = queryWeight, product of:
                4.558814 = idf(docFreq=1258, maxDocs=44218)
                0.05012591 = queryNorm
              0.25184128 = fieldWeight in 5930, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.558814 = idf(docFreq=1258, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5930)
        0.25 = coord(1/4)
      0.016978422 = product of:
        0.033956844 = sum of:
          0.033956844 = weight(_text_:22 in 5930) [ClassicSimilarity], result of:
            0.033956844 = score(doc=5930,freq=2.0), product of:
              0.1755324 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05012591 = queryNorm
              0.19345059 = fieldWeight in 5930, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5930)
        0.5 = coord(1/2)
    
    Abstract
    In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how likely they are to refer to the same author. Despite this crucial role of forenames, their effect on the performance of heuristic (string matching) and algorithmic disambiguation is not well understood. This study assesses the contributions of forenames in author name disambiguation using multiple labeled data sets under varying ratios and lengths of full forenames, reflecting real-world scenarios in which an author is represented by forename variants (synonyms) and some authors share the same forenames (homonyms). The results show that increasing the ratio of full forenames substantially improves both heuristic and machine-learning-based disambiguation. Performance gains from algorithmic disambiguation are pronounced when many forenames are initialized or homonyms are prevalent. As the ratio of full forenames increases, however, these gains become marginal compared to those from string matching. Using a small portion of the forename string does not much reduce the performance of either heuristic or algorithmic disambiguation methods compared to using full-length strings. These findings provide practical suggestions, such as restoring initialized forenames to a full-string format via record linkage for improved disambiguation performance.
    Date
    11. 7.2020 13:22:58
  3. Yon, A.; Willey, E.: Using the Cataloguing Code of Ethics principles for a retrospective project analysis (2022) 0.01
    0.014242759 = product of:
      0.028485518 = sum of:
        0.028485518 = product of:
          0.11394207 = sum of:
            0.11394207 = weight(_text_:authors in 729) [ClassicSimilarity], result of:
              0.11394207 = score(doc=729,freq=4.0), product of:
                0.22851472 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.05012591 = queryNorm
                0.49862027 = fieldWeight in 729, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=729)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    This study uses the recently released Cataloguing Code of Ethics to evaluate a project which explored how to ethically, efficiently, and accurately add demographic terms for African-American authors to catalog records. By reviewing the project through the lens of these principles the authors were able to examine how their practice was ethical in some ways but could have been improved in others. This helped them identify areas of potential improvement in their current and future research and practice and explore ethical difficulties in cataloging resources with records that are used globally, especially in a linked data environment.
  4. Morris, V.: Automated language identification of bibliographic resources (2020) 0.01
    0.013582738 = product of:
      0.027165476 = sum of:
        0.027165476 = product of:
          0.054330952 = sum of:
            0.054330952 = weight(_text_:22 in 5749) [ClassicSimilarity], result of:
              0.054330952 = score(doc=5749,freq=2.0), product of:
                0.1755324 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05012591 = queryNorm
                0.30952093 = fieldWeight in 5749, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5749)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    2. 3.2020 19:04:22
  5. Pooja, K.M.; Mondal, S.; Chandra, J.: A graph combination with edge pruning-based approach for author name disambiguation (2020) 0.01
    0.01245982 = product of:
      0.02491964 = sum of:
        0.02491964 = product of:
          0.09967856 = sum of:
            0.09967856 = weight(_text_:authors in 59) [ClassicSimilarity], result of:
              0.09967856 = score(doc=59,freq=6.0), product of:
                0.22851472 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.05012591 = queryNorm
                0.43620193 = fieldWeight in 59, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=59)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    Author name disambiguation (AND) is a challenging problem due to several issues, such as missing key identifiers, the same name corresponding to multiple authors, and inconsistent representation. Several techniques have been proposed, but maintaining consistent accuracy levels over all data sets is still a major challenge. We identify two major issues associated with the AND problem. First, the namesake problem, in which two or more authors with the same name publish in a similar domain. Second, the diverse topic problem, in which one author publishes in diverse topical domains with different sets of coauthors. In this work, we initially propose a method named ATGEP for AND that addresses the namesake issue. We evaluate the performance of ATGEP using various ambiguous name references collected from the Arnetminer Citation (AC) and Web of Science (WoS) data sets. We empirically show that the two aforementioned problems are crucial to addressing the AND problem and are difficult to handle using state-of-the-art techniques. To handle the diverse topic issue, we extend ATGEP to a new variant named ATGEP-web that considers external web information about the authors. Experiments show that, with enough information available from external web sources, ATGEP-web can significantly improve the results compared with ATGEP.
  6. Corbara, S.; Moreo, A.; Sebastiani, F.: Syllabic quantity patterns as rhythmic features for Latin authorship attribution (2023) 0.01
    0.012208079 = product of:
      0.024416158 = sum of:
        0.024416158 = product of:
          0.09766463 = sum of:
            0.09766463 = weight(_text_:authors in 846) [ClassicSimilarity], result of:
              0.09766463 = score(doc=846,freq=4.0), product of:
                0.22851472 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.05012591 = queryNorm
                0.42738882 = fieldWeight in 846, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.046875 = fieldNorm(doc=846)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    It is well known that, within the Latin production of written text, peculiar metric schemes were followed not only in poetic compositions, but also in many prose works. Such metric patterns were based on so-called syllabic quantity, that is, on the length of the involved syllables, and there is substantial evidence suggesting that certain authors had a preference for certain metric patterns over others. In this research we investigate the possibility of employing syllabic quantity as a basis for deriving rhythmic features for the task of computational authorship attribution of Latin prose texts. We test the impact of these features on the authorship attribution task when combined with other topic-agnostic features. Our experiments, carried out on three different datasets using support vector machines (SVMs), show that rhythmic features based on syllabic quantity are beneficial in discriminating among Latin prose authors.
  7. Das, S.; Paik, J.H.: Gender tagging of named entities using retrieval-assisted multi-context aggregation : an unsupervised approach (2023) 0.01
    0.010187053 = product of:
      0.020374106 = sum of:
        0.020374106 = product of:
          0.040748212 = sum of:
            0.040748212 = weight(_text_:22 in 941) [ClassicSimilarity], result of:
              0.040748212 = score(doc=941,freq=2.0), product of:
                0.1755324 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05012591 = queryNorm
                0.23214069 = fieldWeight in 941, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=941)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 3.2023 12:00:14
  8. Abrahamse, B.: Corporate bodies : access points and authority control (2021) 0.01
    0.010071151 = product of:
      0.020142302 = sum of:
        0.020142302 = product of:
          0.08056921 = sum of:
            0.08056921 = weight(_text_:authors in 698) [ClassicSimilarity], result of:
              0.08056921 = score(doc=698,freq=2.0), product of:
                0.22851472 = queryWeight, product of:
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.05012591 = queryNorm
                0.35257778 = fieldWeight in 698, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.558814 = idf(docFreq=1258, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=698)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    The concept of authorship is central to how libraries organize their collections. But libraries do not only collect resources created by individuals, they also collect documents issued by organizations. Library catalogers use the concept of a "corporate body" to treat organizations as authors for the purpose of making their documents discoverable to users. This essay looks at the key features of establishing authorized access points (AAPs) and applying authority control for corporate bodies. It examines how practices with regard to corporate bodies have changed over time and considers the changes catalogers might expect to see in the future.