Search (13 results, page 1 of 1)

Zhang, L.; Lu, W.; Yang, J.: LAGOS-AND : a large gold standard dataset for scholarly author name disambiguation (2023) 0.00
```
0.004296265 = product of:
  0.01718506 = sum of:
    0.010225092 = product of:
      0.030675275 = sum of:
        0.030675275 = weight(_text_:problem in 883) [ClassicSimilarity], result of:
          0.030675275 = score(doc=883,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.23447686 = fieldWeight in 883, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0390625 = fieldNorm(doc=883)
      0.33333334 = coord(1/3)
    0.0069599687 = product of:
      0.020879906 = sum of:
        0.020879906 = weight(_text_:22 in 883) [ClassicSimilarity], result of:
          0.020879906 = score(doc=883,freq=2.0), product of:
            0.10793405 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.030822188 = queryNorm
            0.19345059 = fieldWeight in 883, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=883)
      0.33333334 = coord(1/3)
  0.25 = coord(2/8)
```
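The explain tree above can be re-derived by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function and constant names are illustrative and not part of the search system, and `queryNorm`, `fieldNorm`, and the `coord` factors are copied from the listing rather than recomputed.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    # score = queryWeight * fieldWeight, as in the explain tree.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explain output for doc 883 (assumed exact).
MAX_DOCS = 44218
QUERY_NORM = 0.030822188
FIELD_NORM = 0.0390625

score_problem = term_score(2.0, 1723, MAX_DOCS, QUERY_NORM, FIELD_NORM)
score_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Each term sits in a sub-query where 1 of 3 clauses matched: coord(1/3).
# At the top level, 2 of 8 clauses matched: coord(2/8).
total = (score_problem / 3 + score_22 / 3) * (2 / 8)

# Lucene computes in 32-bit floats, so expect agreement with the
# listed 0.004296265 only up to rounding.
print(f"problem: {score_problem:.6f}  22: {score_22:.6f}  total: {total:.6f}")
```

The same helper reproduces the single-term results further down by swapping in their `freq`, `docFreq`, and `fieldNorm` values and using `coord(1/8)` at the top level.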
Abstract

In this article, we present a method to automatically build large labeled datasets for the author ambiguity problem in the academic world by leveraging the authoritative academic resources ORCID and DOI. Using the method, we built LAGOS-AND, two large, gold-standard sub-datasets for author name disambiguation (AND): LAGOS-AND-BLOCK, created for clustering-based AND research, and LAGOS-AND-PAIRWISE, created for classification-based AND research. Our LAGOS-AND datasets are substantially different from the existing ones. The initial versions of the datasets (v1.0, released in February 2021) include 7.5 M citations authored by 798 K unique authors (LAGOS-AND-BLOCK) and close to 1 M instances (LAGOS-AND-PAIRWISE). Both datasets show close similarities to the whole Microsoft Academic Graph (MAG) across validations of six facets. In building the datasets, we reveal the degree of last-name variation in three literature databases, PubMed, MAG, and Semantic Scholar, by comparing the author names they host to the authors' official last names shown on their ORCID pages. Furthermore, we evaluate several baseline disambiguation methods as well as MAG's author ID system on our datasets, and the evaluation yields several interesting findings. We hope the datasets and findings will bring new insights for future studies. The code and datasets are publicly available.

Date: 22. 1.2023 18:40:36

Pooja, K.M.; Mondal, S.; Chandra, J.: A graph combination with edge pruning-based approach for author name disambiguation (2020) 0.00
```
0.0028580003 = product of:
  0.022864003 = sum of:
    0.022864003 = product of:
      0.068592004 = sum of:
        0.068592004 = weight(_text_:problem in 59) [ClassicSimilarity], result of:
          0.068592004 = score(doc=59,freq=10.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.52430624 = fieldWeight in 59, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0390625 = fieldNorm(doc=59)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

Author name disambiguation (AND) is a challenging problem due to several issues such as missing key identifiers, the same name corresponding to multiple authors, and inconsistent name representation. Several techniques have been proposed, but maintaining consistent accuracy levels over all data sets is still a major challenge. We identify two major issues associated with the AND problem. First, the namesake problem, in which two or more authors with the same name publish in a similar domain. Second, the diverse topic problem, in which one author publishes in diverse topical domains with a different set of coauthors. In this work, we initially propose a method named ATGEP for AND that addresses the namesake issue. We evaluate the performance of ATGEP using various ambiguous name references collected from the Arnetminer Citation (AC) and Web of Science (WoS) data sets. We empirically show that the two aforementioned problems are crucial to addressing the AND problem and are difficult to handle using state-of-the-art techniques. To handle the diverse topic issue, we extend ATGEP to a new variant named ATGEP-web that considers external web information about the authors. Experiments show that, with enough information available from external web sources, ATGEP-web can significantly improve the results compared with ATGEP.

Morris, V.: Automated language identification of bibliographic resources (2020) 0.00

```
0.0013919937 = product of:
  0.01113595 = sum of:
    0.01113595 = product of:
      0.03340785 = sum of:
        0.03340785 = weight(_text_:22 in 5749) [ClassicSimilarity], result of:
          0.03340785 = score(doc=5749,freq=2.0), product of:
            0.10793405 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.030822188 = queryNorm
            0.30952093 = fieldWeight in 5749, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=5749)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```

Date: 2. 3.2020 19:04:22

Kim, J.; Kim, J.; Owen-Smith, J.: Ethnicity-based name partitioning for author name disambiguation using supervised machine learning (2021) 0.00
```
0.0012781365 = product of:
  0.010225092 = sum of:
    0.010225092 = product of:
      0.030675275 = sum of:
        0.030675275 = weight(_text_:problem in 311) [ClassicSimilarity], result of:
          0.030675275 = score(doc=311,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.23447686 = fieldWeight in 311, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0390625 = fieldNorm(doc=311)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

In several author name disambiguation studies, some ethnic name groups such as East Asian names are reported to be more difficult to disambiguate than others. This implies that disambiguation approaches might be improved if ethnic name groups are distinguished before disambiguation. We explore the potential of ethnic name partitioning by comparing the performance of four machine learning algorithms trained and tested on the entire data or specifically on individual name groups. Results show that ethnicity-based name partitioning can substantially improve disambiguation performance because the individual models are better suited for their respective name group. The improvements occur across all ethnic name groups with different magnitudes. Performance gains in predicting matched name pairs outweigh losses in predicting nonmatched pairs. Feature (e.g., coauthor name) similarities of name pairs vary across ethnic name groups. Such differences may enable the development of ethnicity-specific feature weights to improve prediction for specific ethnic name categories. These findings are observed for three labeled datasets with a natural distribution of problem sizes as well as one in which all ethnic name groups are controlled for the same sizes of ambiguous names. This study is expected to motivate scholars to group author names based on ethnicity prior to disambiguation.

Preminger, M.; Rype, I.; Ådland, M.K.; Massey, D.; Tallerås, K.: The public library metadata landscape : the case of Norway 2017-2018 (2020) 0.00

```
0.0012290506 = product of:
  0.009832405 = sum of:
    0.009832405 = product of:
      0.029497212 = sum of:
        0.029497212 = weight(_text_:29 in 5802) [ClassicSimilarity], result of:
          0.029497212 = score(doc=5802,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.27205724 = fieldWeight in 5802, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5802)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```

Date: 30. 3.2020 19:29:18

Holden, C.: The bibliographic work : history, theory, and practice (2021) 0.00

```
0.0012290506 = product of:
  0.009832405 = sum of:
    0.009832405 = product of:
      0.029497212 = sum of:
        0.029497212 = weight(_text_:29 in 120) [ClassicSimilarity], result of:
          0.029497212 = score(doc=120,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.27205724 = fieldWeight in 120, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=120)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```

Date: 25. 9.2022 19:54:29

Aalberg, T.; O'Neill, E.; Zumer, M.: Extending the LRM Model to integrating resources (2021) 0.00

```
0.0012290506 = product of:
  0.009832405 = sum of:
    0.009832405 = product of:
      0.029497212 = sum of:
        0.029497212 = weight(_text_:29 in 295) [ClassicSimilarity], result of:
          0.029497212 = score(doc=295,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.27205724 = fieldWeight in 295, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=295)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```

Date: 28. 6.2021 19:29:58

Yon, A.; Willey, E.: Using the Cataloguing Code of Ethics principles for a retrospective project analysis (2022) 0.00

```
0.0012290506 = product of:
  0.009832405 = sum of:
    0.009832405 = product of:
      0.029497212 = sum of:
        0.029497212 = weight(_text_:29 in 729) [ClassicSimilarity], result of:
          0.029497212 = score(doc=729,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.27205724 = fieldWeight in 729, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=729)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```

Date: 29. 9.2022 17:15:25

Perera, T.: Description specialists and inclusive description work and/or initiatives : an exploratory study (2022) 0.00

```
0.0012290506 = product of:
  0.009832405 = sum of:
    0.009832405 = product of:
      0.029497212 = sum of:
        0.029497212 = weight(_text_:29 in 974) [ClassicSimilarity], result of:
          0.029497212 = score(doc=974,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.27205724 = fieldWeight in 974, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=974)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```

Date: 29. 9.2022 18:01:16

Oudenaar, H.; Bullard, J.: NOT A BOOK : goodreads and the risks of social cataloging with insufficient direction (2024) 0.00

```
0.0012290506 = product of:
  0.009832405 = sum of:
    0.009832405 = product of:
      0.029497212 = sum of:
        0.029497212 = weight(_text_:29 in 1156) [ClassicSimilarity], result of:
          0.029497212 = score(doc=1156,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.27205724 = fieldWeight in 1156, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1156)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```

Date: 22.11.2023 18:29:56

Das, S.; Paik, J.H.: Gender tagging of named entities using retrieval-assisted multi-context aggregation : an unsupervised approach (2023) 0.00

```
0.0010439953 = product of:
  0.008351962 = sum of:
    0.008351962 = product of:
      0.025055885 = sum of:
        0.025055885 = weight(_text_:22 in 941) [ClassicSimilarity], result of:
          0.025055885 = score(doc=941,freq=2.0), product of:
            0.10793405 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.030822188 = queryNorm
            0.23214069 = fieldWeight in 941, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=941)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```

Date: 22. 3.2023 12:00:14

Soos, C.; Leazer, H.H.: Presentations of authorship in knowledge organization (2020) 0.00

```
8.7789324E-4 = product of:
  0.007023146 = sum of:
    0.007023146 = product of:
      0.021069437 = sum of:
        0.021069437 = weight(_text_:29 in 21) [ClassicSimilarity], result of:
          0.021069437 = score(doc=21,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.19432661 = fieldWeight in 21, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=21)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```

Date: 31.10.2020 18:53:29

Kim, J.(im); Kim, J.(enna): Effect of forename string on author name disambiguation (2020) 0.00

```
8.699961E-4 = product of:
  0.0069599687 = sum of:
    0.0069599687 = product of:
      0.020879906 = sum of:
        0.020879906 = weight(_text_:22 in 5930) [ClassicSimilarity], result of:
          0.020879906 = score(doc=5930,freq=2.0), product of:
            0.10793405 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.030822188 = queryNorm
            0.19345059 = fieldWeight in 5930, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5930)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```

Date: 11. 7.2020 13:22:58
