Search (1 results, page 1 of 1)

Kim, J.(im); Kim, J.(enna): Effect of forename string on author name disambiguation (2020) 0.02
```
0.018982807 = product of:
  0.037965614 = sum of:
    0.037965614 = sum of:
      0.006765375 = weight(_text_:a in 5930) [ClassicSimilarity], result of:
        0.006765375 = score(doc=5930,freq=8.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.12739488 = fieldWeight in 5930, product of:
            2.828427 = tf(freq=8.0), with freq of:
              8.0 = termFreq=8.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0390625 = fieldNorm(doc=5930)
      0.03120024 = weight(_text_:22 in 5930) [ClassicSimilarity], result of:
        0.03120024 = score(doc=5930,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.19345059 = fieldWeight in 5930, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=5930)
  0.5 = coord(1/2)
```
Abstract

In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how much they are likely to refer to the same author. Despite such a crucial role of forenames, their effect on the performance of heuristic (string matching) and algorithmic disambiguation is not well understood. This study assesses the contributions of forenames in author name disambiguation using multiple labeled data sets under varying ratios and lengths of full forenames, reflecting real-world scenarios in which an author is represented by forename variants (synonym) and some authors share the same forenames (homonym). The results show that increasing the ratios of full forenames substantially improves both heuristic and machine-learning-based disambiguation. Performance gains by algorithmic disambiguation are pronounced when many forenames are initialized or homonyms are prevalent. As the ratios of full forenames increase, however, they become marginal compared to those by string matching. Using a small portion of forename strings does not reduce much the performances of both heuristic and algorithmic disambiguation methods compared to using full-length strings. These findings provide practical suggestions, such as restoring initialized forenames into a full-string format via record linkage for improved disambiguation performances.

Date

11. 7.2020 13:22:58

Type

a