Document (#42931)

Author
Kim, J.(im)
Kim, J.(enna)
Title
Effect of forename string on author name disambiguation
Source
Journal of the Association for Information Science and Technology. 71(2020) no.7, p.839-855
Year
2020
Abstract
In author name disambiguation, author forenames are used to decide which name instances are disambiguated together and how likely they are to refer to the same author. Despite this crucial role, the effect of forenames on the performance of heuristic (string matching) and algorithmic disambiguation is not well understood. This study assesses the contributions of forenames in author name disambiguation using multiple labeled data sets under varying ratios and lengths of full forenames, reflecting real-world scenarios in which an author is represented by forename variants (synonyms) and some authors share the same forenames (homonyms). The results show that increasing the ratio of full forenames substantially improves both heuristic and machine-learning-based disambiguation. Performance gains from algorithmic disambiguation are pronounced when many forenames are initialized or homonyms are prevalent. As the ratio of full forenames increases, however, these gains become marginal compared with those from string matching. Using only a small portion of each forename string does not much reduce the performance of either heuristic or algorithmic disambiguation methods compared with using full-length strings. These findings provide practical suggestions, such as restoring initialized forenames to a full-string format via record linkage to improve disambiguation performance.
Content
https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24298.
Theme
Descriptive cataloguing (Formalerschließung)
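
The string-matching heuristic mentioned in the abstract can be illustrated with a minimal sketch. It is not the method used in the paper: the function names `match_key` and `group_by_key`, the `n_chars` parameter, and the sample names are all hypothetical and chosen only for illustration. The sketch groups name instances by surname plus the first `n_chars` characters of the forename, so that `n_chars=1` mimics initial-based matching and `n_chars=None` uses the full forename string.

```python
from collections import defaultdict


def match_key(surname, forename, n_chars=None):
    """Return a matching key: surname plus the first n_chars of the forename.

    n_chars=None keeps the full forename; n_chars=1 mimics initial-based matching.
    (Hypothetical helper, not taken from the paper.)
    """
    cleaned = forename.strip().lower().replace(".", "")
    if n_chars is not None:
        cleaned = cleaned[:n_chars]
    return (surname.strip().lower(), cleaned)


def group_by_key(names, n_chars=None):
    """Group (surname, forename) instances that share the same matching key."""
    groups = defaultdict(list)
    for surname, forename in names:
        groups[match_key(surname, forename, n_chars)].append((surname, forename))
    return dict(groups)


# Illustrative name instances (made up for this sketch).
names = [("Kim", "Jenna"), ("Kim", "Jen"), ("Kim", "J."), ("Kim", "Jimin")]

by_initial = group_by_key(names, n_chars=1)   # every instance falls under ("kim", "j")
by_prefix = group_by_key(names, n_chars=3)    # "Jenna" and "Jen" share the prefix "jen"
by_full = group_by_key(names)                 # every distinct string is its own group
```

In this toy input, initial-based matching merges all four instances (the homonym risk), full-string matching separates "Jenna" from "Jen" (the synonym risk), and a three-character prefix sits in between.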

Similar documents (content)

  1. Kim, J.; Diesner, J.: Distortive effects of initial-based name disambiguation on measurements of large-scale coauthorship networks (2016) 0.32
    0.31536824 = sum of:
      0.31536824 = product of:
        1.1263151 = sum of:
          0.09301065 = weight(abstract_txt:disambiguated in 2936) [ClassicSimilarity], result of:
            0.09301065 = score(doc=2936,freq=2.0), product of:
              0.117066905 = queryWeight, product of:
                1.1392372 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.011431849 = queryNorm
              0.79450846 = fieldWeight in 2936, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.0625 = fieldNorm(doc=2936)
          0.017980358 = weight(abstract_txt:performance in 2936) [ClassicSimilarity], result of:
            0.017980358 = score(doc=2936,freq=1.0), product of:
              0.06212951 = queryWeight, product of:
                1.1737106 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.011431849 = queryNorm
              0.28940126 = fieldWeight in 2936, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0625 = fieldNorm(doc=2936)
          0.022306796 = weight(abstract_txt:much in 2936) [ClassicSimilarity], result of:
            0.022306796 = score(doc=2936,freq=1.0), product of:
              0.071733795 = queryWeight, product of:
                1.261171 = boost
                4.9754615 = idf(docFreq=829, maxDocs=44218)
                0.011431849 = queryNorm
              0.31096634 = fieldWeight in 2936, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9754615 = idf(docFreq=829, maxDocs=44218)
                0.0625 = fieldNorm(doc=2936)
          0.120166145 = weight(abstract_txt:algorithmic in 2936) [ClassicSimilarity], result of:
            0.120166145 = score(doc=2936,freq=1.0), product of:
              0.2523389 = queryWeight, product of:
                2.8970087 = boost
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.011431849 = queryNorm
              0.47620937 = fieldWeight in 2936, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.61935 = idf(docFreq=58, maxDocs=44218)
                0.0625 = fieldNorm(doc=2936)
          0.1374515 = weight(abstract_txt:name in 2936) [ClassicSimilarity], result of:
            0.1374515 = score(doc=2936,freq=4.0), product of:
              0.19136184 = queryWeight, product of:
                2.9130957 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.011431849 = queryNorm
              0.7182806 = fieldWeight in 2936, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.0625 = fieldNorm(doc=2936)
          0.09477747 = weight(abstract_txt:author in 2936) [ClassicSimilarity], result of:
            0.09477747 = score(doc=2936,freq=2.0), product of:
              0.21541016 = queryWeight, product of:
                3.7853477 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.011431849 = queryNorm
              0.43998608 = fieldWeight in 2936, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.0625 = fieldNorm(doc=2936)
          0.64062214 = weight(abstract_txt:disambiguation in 2936) [ClassicSimilarity], result of:
            0.64062214 = score(doc=2936,freq=5.0), product of:
              0.62449694 = queryWeight, product of:
                7.442301 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.011431849 = queryNorm
              1.0258211 = fieldWeight in 2936, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0625 = fieldNorm(doc=2936)
        0.28 = coord(7/25)
    
  2. Kim, J.; Kim, J.; Owen-Smith, J.: Ethnicity-based name partitioning for author name disambiguation using supervised machine learning (2021) 0.26
    0.260994 = sum of:
      0.260994 = product of:
        1.087475 = sum of:
          0.044481073 = weight(abstract_txt:gains in 311) [ClassicSimilarity], result of:
            0.044481073 = score(doc=311,freq=1.0), product of:
              0.09019986 = queryWeight, product of:
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.011431849 = queryNorm
              0.49313906 = fieldWeight in 311, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.890225 = idf(docFreq=44, maxDocs=44218)
                0.0625 = fieldNorm(doc=311)
          0.031142892 = weight(abstract_txt:performance in 311) [ClassicSimilarity], result of:
            0.031142892 = score(doc=311,freq=3.0), product of:
              0.06212951 = queryWeight, product of:
                1.1737106 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.011431849 = queryNorm
              0.50125766 = fieldWeight in 311, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0625 = fieldNorm(doc=311)
          0.019303136 = weight(abstract_txt:same in 311) [ClassicSimilarity], result of:
            0.019303136 = score(doc=311,freq=1.0), product of:
              0.06514048 = queryWeight, product of:
                1.2018148 = boost
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.011431849 = queryNorm
              0.2963309 = fieldWeight in 311, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.0625 = fieldNorm(doc=311)
          0.25714824 = weight(abstract_txt:name in 311) [ClassicSimilarity], result of:
            0.25714824 = score(doc=311,freq=14.0), product of:
              0.19136184 = queryWeight, product of:
                2.9130957 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.011431849 = queryNorm
              1.34378 = fieldWeight in 311, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.0625 = fieldNorm(doc=311)
          0.09477747 = weight(abstract_txt:author in 311) [ClassicSimilarity], result of:
            0.09477747 = score(doc=311,freq=2.0), product of:
              0.21541016 = queryWeight, product of:
                3.7853477 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.011431849 = queryNorm
              0.43998608 = fieldWeight in 311, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.0625 = fieldNorm(doc=311)
          0.64062214 = weight(abstract_txt:disambiguation in 311) [ClassicSimilarity], result of:
            0.64062214 = score(doc=311,freq=5.0), product of:
              0.62449694 = queryWeight, product of:
                7.442301 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.011431849 = queryNorm
              1.0258211 = fieldWeight in 311, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0625 = fieldNorm(doc=311)
        0.24 = coord(6/25)
    
  3. Kang, I.-S.; Na, S.-H.; Lee, S.; Jung, H.; Kim, P.; Sung, W.-K.; Lee, J.-H.: On co-authorship for author disambiguation (2009) 0.23
    0.2262777 = sum of:
      0.2262777 = product of:
        1.1313884 = sum of:
          0.08221058 = weight(abstract_txt:disambiguated in 2453) [ClassicSimilarity], result of:
            0.08221058 = score(doc=2453,freq=1.0), product of:
              0.117066905 = queryWeight, product of:
                1.1392372 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.011431849 = queryNorm
              0.7022529 = fieldWeight in 2453, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.078125 = fieldNorm(doc=2453)
          0.024128921 = weight(abstract_txt:same in 2453) [ClassicSimilarity], result of:
            0.024128921 = score(doc=2453,freq=1.0), product of:
              0.06514048 = queryWeight, product of:
                1.2018148 = boost
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.011431849 = queryNorm
              0.37041363 = fieldWeight in 2453, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.078125 = fieldNorm(doc=2453)
          0.12149111 = weight(abstract_txt:name in 2453) [ClassicSimilarity], result of:
            0.12149111 = score(doc=2453,freq=2.0), product of:
              0.19136184 = queryWeight, product of:
                2.9130957 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.011431849 = queryNorm
              0.6348764 = fieldWeight in 2453, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.078125 = fieldNorm(doc=2453)
          0.18732043 = weight(abstract_txt:author in 2453) [ClassicSimilarity], result of:
            0.18732043 = score(doc=2453,freq=5.0), product of:
              0.21541016 = queryWeight, product of:
                3.7853477 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.011431849 = queryNorm
              0.86959887 = fieldWeight in 2453, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.078125 = fieldNorm(doc=2453)
          0.71623737 = weight(abstract_txt:disambiguation in 2453) [ClassicSimilarity], result of:
            0.71623737 = score(doc=2453,freq=4.0), product of:
              0.62449694 = queryWeight, product of:
                7.442301 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.011431849 = queryNorm
              1.1469029 = fieldWeight in 2453, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.078125 = fieldNorm(doc=2453)
        0.2 = coord(5/25)
    
  4. Kim, J.: Scale-free collaboration networks : an author name disambiguation perspective (2019) 0.20
    0.20130964 = sum of:
      0.20130964 = product of:
        0.8387902 = sum of:
          0.09301065 = weight(abstract_txt:disambiguated in 5297) [ClassicSimilarity], result of:
            0.09301065 = score(doc=5297,freq=2.0), product of:
              0.117066905 = queryWeight, product of:
                1.1392372 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.011431849 = queryNorm
              0.79450846 = fieldWeight in 5297, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.0625 = fieldNorm(doc=5297)
          0.040063944 = weight(abstract_txt:matching in 5297) [ClassicSimilarity], result of:
            0.040063944 = score(doc=5297,freq=1.0), product of:
              0.1059908 = queryWeight, product of:
                1.533014 = boost
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.011431849 = queryNorm
              0.37799457 = fieldWeight in 5297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.0625 = fieldNorm(doc=5297)
          0.10858021 = weight(abstract_txt:heuristic in 5297) [ClassicSimilarity], result of:
            0.10858021 = score(doc=5297,freq=1.0), product of:
              0.23584674 = queryWeight, product of:
                2.800739 = boost
                7.3661537 = idf(docFreq=75, maxDocs=44218)
                0.011431849 = queryNorm
              0.4603846 = fieldWeight in 5297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3661537 = idf(docFreq=75, maxDocs=44218)
                0.0625 = fieldNorm(doc=5297)
          0.09719288 = weight(abstract_txt:name in 5297) [ClassicSimilarity], result of:
            0.09719288 = score(doc=5297,freq=2.0), product of:
              0.19136184 = queryWeight, product of:
                2.9130957 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.011431849 = queryNorm
              0.5079011 = fieldWeight in 5297, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.0625 = fieldNorm(doc=5297)
          0.09477747 = weight(abstract_txt:author in 5297) [ClassicSimilarity], result of:
            0.09477747 = score(doc=5297,freq=2.0), product of:
              0.21541016 = queryWeight, product of:
                3.7853477 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.011431849 = queryNorm
              0.43998608 = fieldWeight in 5297, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.0625 = fieldNorm(doc=5297)
          0.40516502 = weight(abstract_txt:disambiguation in 5297) [ClassicSimilarity], result of:
            0.40516502 = score(doc=5297,freq=2.0), product of:
              0.62449694 = queryWeight, product of:
                7.442301 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.011431849 = queryNorm
              0.64878625 = fieldWeight in 5297, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0625 = fieldNorm(doc=5297)
        0.24 = coord(6/25)
    
  5. Liu, W.; Doğan, R.I.; Kim, S.; Comeau, D.C.; Kim, W.; Yeganova, L.; Lu, Z.; Wilbur, W.J.: Author name disambiguation for PubMed (2014) 0.20
    0.19837981 = sum of:
      0.19837981 = product of:
        0.82658255 = sum of:
          0.022249559 = weight(abstract_txt:performance in 1240) [ClassicSimilarity], result of:
            0.022249559 = score(doc=1240,freq=2.0), product of:
              0.06212951 = queryWeight, product of:
                1.1737106 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.011431849 = queryNorm
              0.3581158 = fieldWeight in 1240, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.016890245 = weight(abstract_txt:same in 1240) [ClassicSimilarity], result of:
            0.016890245 = score(doc=1240,freq=1.0), product of:
              0.06514048 = queryWeight, product of:
                1.2018148 = boost
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.011431849 = queryNorm
              0.25928953 = fieldWeight in 1240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7412944 = idf(docFreq=1048, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.028284024 = weight(abstract_txt:compared in 1240) [ClassicSimilarity], result of:
            0.028284024 = score(doc=1240,freq=2.0), product of:
              0.07290844 = queryWeight, product of:
                1.2714549 = boost
                5.0160327 = idf(docFreq=796, maxDocs=44218)
                0.011431849 = queryNorm
              0.38793898 = fieldWeight in 1240, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0160327 = idf(docFreq=796, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.15910234 = weight(abstract_txt:name in 1240) [ClassicSimilarity], result of:
            0.15910234 = score(doc=1240,freq=7.0), product of:
              0.19136184 = queryWeight, product of:
                2.9130957 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.011431849 = queryNorm
              0.83142143 = fieldWeight in 1240, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.16586058 = weight(abstract_txt:author in 1240) [ClassicSimilarity], result of:
            0.16586058 = score(doc=1240,freq=8.0), product of:
              0.21541016 = queryWeight, product of:
                3.7853477 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.011431849 = queryNorm
              0.76997566 = fieldWeight in 1240, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.43419582 = weight(abstract_txt:disambiguation in 1240) [ClassicSimilarity], result of:
            0.43419582 = score(doc=1240,freq=3.0), product of:
              0.62449694 = queryWeight, product of:
                7.442301 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.011431849 = queryNorm
              0.6952729 = fieldWeight in 1240, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
        0.24 = coord(6/25)
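
The score breakdowns above can be recomputed by hand. The following is a minimal sketch, assuming the formulas implied by the labels in the listing (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1)), and a final multiplication by the coord factor); the function names are hypothetical and the input numbers are copied from the first entry (doc 2936).

```python
import math


def idf(doc_freq, max_docs):
    # Inverse document frequency as implied by the listing:
    # idf(docFreq=14, maxDocs=44218) is about 8.988837.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # score = queryWeight * fieldWeight, following the tree in the listing.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# Term "disambiguated" in doc 2936, values taken from the listing.
disambiguated = term_score(
    freq=2.0,
    doc_freq=14,
    max_docs=44218,
    boost=1.1392372,
    query_norm=0.011431849,
    field_norm=0.0625,
)

# Document score = (sum of the term scores) * coord, where coord is the
# fraction of query terms matched (7 of 25 for doc 2936).
sum_of_terms = 1.1263151
total = sum_of_terms * (7 / 25)

print(round(disambiguated, 6), round(total, 6))
```

The recomputed values agree with the listing (0.09301065 for the term and 0.31536824 for the document) up to floating-point rounding.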