Search (6 results, page 1 of 1)

  • theme_ss:"Formalerschließung"
  • type_ss:"a"
  • year_i:[2020 TO 2030}
  1. Das, S.; Paik, J.H.: Gender tagging of named entities using retrieval-assisted multi-context aggregation : an unsupervised approach (2023) 0.03
    0.026304178 = product of:
      0.10521671 = sum of:
        0.10521671 = sum of:
          0.068480164 = weight(_text_:methods in 941) [ClassicSimilarity], result of:
            0.068480164 = score(doc=941,freq=4.0), product of:
              0.18168657 = queryWeight, product of:
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.045191016 = queryNorm
              0.37691376 = fieldWeight in 941, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.046875 = fieldNorm(doc=941)
          0.03673655 = weight(_text_:22 in 941) [ClassicSimilarity], result of:
            0.03673655 = score(doc=941,freq=2.0), product of:
              0.15825124 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045191016 = queryNorm
              0.23214069 = fieldWeight in 941, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=941)
      0.25 = coord(1/4)
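The score tree above can be checked by hand. The following is a minimal sketch, assuming the usual Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function and variable names are illustrative and do not come from the search system. All inputs are copied from the explanation for doc 941.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    # weight = queryWeight * fieldWeight, as in the explain tree
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm                    # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Values taken from the listing for doc 941
MAX_DOCS = 44218
QUERY_NORM = 0.045191016
FIELD_NORM_941 = 0.046875

methods = term_weight(4.0, 2156, MAX_DOCS, QUERY_NORM, FIELD_NORM_941)
term_22 = term_weight(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM_941)

# coord(1/4): one of four top-level query clauses matched
score_941 = (methods + term_22) * (1 / 4)

print(f"methods weight: {methods:.6f}")
print(f"22 weight:      {term_22:.6f}")
print(f"final score:    {score_941:.6f}")
```

Small differences in the last digits are expected, since the listing reports single-precision values while Python computes in double precision.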
    
    Abstract
    Inferring the gender of named entities present in a text has several practical applications in information sciences. Existing approaches toward name gender identification rely exclusively on using the gender distributions from labeled data. In the absence of such labeled data, these methods fail. In this article, we propose a two-stage model that is able to infer the gender of names present in text without requiring explicit name-gender labels. We use coreference resolution as the backbone for our proposed model. To aid coreference resolution where the existing contextual information does not suffice, we use a retrieval-assisted context aggregation framework. We demonstrate that state-of-the-art name gender inference is possible without supervision. Our proposed method matches or outperforms several supervised approaches and commercially used methods on five English language datasets from different domains.
    Date
    22. 3.2023 12:00:14
  2. Kim, J.(im); Kim, J.(enna): Effect of forename string on author name disambiguation (2020) 0.02
    0.01774153 = product of:
      0.07096612 = sum of:
        0.07096612 = sum of:
          0.040352322 = weight(_text_:methods in 5930) [ClassicSimilarity], result of:
            0.040352322 = score(doc=5930,freq=2.0), product of:
              0.18168657 = queryWeight, product of:
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.045191016 = queryNorm
              0.22209854 = fieldWeight in 5930, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5930)
          0.030613795 = weight(_text_:22 in 5930) [ClassicSimilarity], result of:
            0.030613795 = score(doc=5930,freq=2.0), product of:
              0.15825124 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045191016 = queryNorm
              0.19345059 = fieldWeight in 5930, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5930)
      0.25 = coord(1/4)
    
    Abstract
    In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how likely they are to refer to the same author. Despite this crucial role of forenames, their effect on the performance of heuristic (string matching) and algorithmic disambiguation is not well understood. This study assesses the contributions of forenames in author name disambiguation using multiple labeled data sets under varying ratios and lengths of full forenames, reflecting real-world scenarios in which an author is represented by forename variants (synonyms) and some authors share the same forenames (homonyms). The results show that increasing the ratios of full forenames substantially improves both heuristic and machine-learning-based disambiguation. Performance gains by algorithmic disambiguation are pronounced when many forenames are initialized or homonyms are prevalent. As the ratios of full forenames increase, however, these gains become marginal compared to those from string matching. Using a small portion of forename strings does not much reduce the performance of either heuristic or algorithmic disambiguation methods compared to using full-length strings. These findings provide practical suggestions, such as restoring initialized forenames into a full-string format via record linkage for improved disambiguation performance.
    Date
    11. 7.2020 13:22:58
  3. Zhang, L.; Lu, W.; Yang, J.: LAGOS-AND : a large gold standard dataset for scholarly author name disambiguation (2023) 0.02
    0.01774153 = product of:
      0.07096612 = sum of:
        0.07096612 = sum of:
          0.040352322 = weight(_text_:methods in 883) [ClassicSimilarity], result of:
            0.040352322 = score(doc=883,freq=2.0), product of:
              0.18168657 = queryWeight, product of:
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.045191016 = queryNorm
              0.22209854 = fieldWeight in 883, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.0390625 = fieldNorm(doc=883)
          0.030613795 = weight(_text_:22 in 883) [ClassicSimilarity], result of:
            0.030613795 = score(doc=883,freq=2.0), product of:
              0.15825124 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045191016 = queryNorm
              0.19345059 = fieldWeight in 883, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=883)
      0.25 = coord(1/4)
    
    Abstract
    In this article, we present a method to automatically build large labeled datasets for the author ambiguity problem in the academic world by leveraging the authoritative academic resources ORCID and DOI. Using the method, we built LAGOS-AND, two large, gold-standard sub-datasets for author name disambiguation (AND), of which LAGOS-AND-BLOCK is created for clustering-based AND research and LAGOS-AND-PAIRWISE is created for classification-based AND research. Our LAGOS-AND datasets are substantially different from the existing ones. The initial versions of the datasets (v1.0, released in February 2021) include 7.5 M citations authored by 798 K unique authors (LAGOS-AND-BLOCK) and close to 1 M instances (LAGOS-AND-PAIRWISE). Both datasets show close similarities to the whole Microsoft Academic Graph (MAG) across validations of six facets. In building the datasets, we reveal the degrees of variation of last names in three literature databases, PubMed, MAG, and Semantic Scholar, by comparing the author names they host with the authors' official last names shown on the ORCID pages. Furthermore, we evaluate several baseline disambiguation methods as well as the MAG's author ID system on our datasets, and the evaluation yields several interesting findings. We hope the datasets and findings will bring new insights for future studies. The code and datasets are publicly available.
    Date
    22. 1.2023 18:40:36
  4. Hahn, J.: Semi-automated methods for BIBFRAME work entity description (2021) 0.01
    0.009986691 = product of:
      0.039946765 = sum of:
        0.039946765 = product of:
          0.07989353 = sum of:
            0.07989353 = weight(_text_:methods in 725) [ClassicSimilarity], result of:
              0.07989353 = score(doc=725,freq=4.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.43973273 = fieldWeight in 725, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=725)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
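This entry shows two nested coord factors: coord(1/2) for the inner clause, where only "methods" matched, and coord(1/4) for the outer query. A small sketch of how the factors combine, using the term weights exactly as reported in the listing; the helper name `apply_coords` is hypothetical and only mirrors the arithmetic of the tree.

```python
def apply_coords(term_weights, coords):
    # Sum the matched term weights, then multiply by each
    # coord(matched/total) factor, innermost first.
    total = sum(term_weights)
    for matched, clause_count in coords:
        total *= matched / clause_count
    return total

# doc 725: one matching term, coord(1/2) then coord(1/4)
score_725 = apply_coords([0.07989353], [(1, 2), (1, 4)])

# doc 941: two matching terms, only the outer coord(1/4)
score_941 = apply_coords([0.068480164, 0.03673655], [(1, 4)])

print(f"doc 725: {score_725:.9f}")
print(f"doc 941: {score_941:.9f}")
```

Although "methods" weighs more in doc 725 than in doc 941, the extra coord(1/2) for the unmatched "22" clause halves the result, which is why this entry ranks lower.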
    
    Abstract
    This paper reports an investigation of machine learning methods for the semi-automated creation of a BIBFRAME Work entity description within the RDF linked data editor Sinopia (https://sinopia.io). The automated subject indexing software Annif was configured with the Library of Congress Subject Headings (LCSH) vocabulary from the Linked Data Service at https://id.loc.gov/. The training corpus comprised 9.3 million titles and LCSH linked data references from the IvyPlus POD project (https://pod.stanford.edu/) and from Share-VDE (https://wiki.share-vde.org). Semi-automated processes were explored to support and extend, not replace, professional expertise.
  5. Serra, L.G.; Schneider, J.A.; Santarém Segundo, J.E.: Person identifiers in MARC 21 records in a semantic environment (2020) 0.01
    0.0070616566 = product of:
      0.028246626 = sum of:
        0.028246626 = product of:
          0.056493253 = sum of:
            0.056493253 = weight(_text_:methods in 127) [ClassicSimilarity], result of:
              0.056493253 = score(doc=127,freq=2.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.31093797 = fieldWeight in 127, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=127)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This article discusses how libraries can include person identifiers in the MARC format. It suggests using URIs in fields and subfields to help transition the data to an RDF model, and to help prepare the catalog for Linked Data. It analyzes the selection of URIs and Real-World Objects, and the use of tag 024 to describe person identifiers in authority records. When a creator or collaborator is identified in a work, the identifiers are transferred from the authority record to the bibliographic record. The article concludes that URI-based descriptions can provide a better experience for users, offering other methods of discovery.
  6. Morris, V.: Automated language identification of bibliographic resources (2020) 0.01
    0.006122759 = product of:
      0.024491036 = sum of:
        0.024491036 = product of:
          0.048982073 = sum of:
            0.048982073 = weight(_text_:22 in 5749) [ClassicSimilarity], result of:
              0.048982073 = score(doc=5749,freq=2.0), product of:
                0.15825124 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045191016 = queryNorm
                0.30952093 = fieldWeight in 5749, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5749)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    2. 3.2020 19:04:22