Document (#39937)

Author
Kim, J.
Diesner, J.
Title
Distortive effects of initial-based name disambiguation on measurements of large-scale coauthorship networks
Source
Journal of the Association for Information Science and Technology. 67(2016) no.6, pp.1446-1461
Year
2016
Abstract
Scholars have often relied on name initials to resolve name ambiguities in large-scale coauthorship network research. This approach bears the risk of incorrectly merging or splitting author identities. The use of initial-based disambiguation has been justified by the assumption that such errors would not affect research findings too much. This paper tests that assumption by analyzing coauthorship networks from five academic fields (biology, computer science, nanoscience, neuroscience, and physics) and an interdisciplinary journal, PNAS. Name instances in the data sets of this study were disambiguated based on heuristics gained from previous algorithmic disambiguation solutions. We use the disambiguated data as a proxy for ground truth to test the performance of three types of initial-based disambiguation. Our results show that initial-based disambiguation can misrepresent statistical properties of coauthorship networks: It deflates the number of unique authors, number of components, average shortest paths, clustering coefficient, and assortativity, while it inflates average productivity, density, average coauthor number per author, and largest component size. Also, on average, more than half of the top 10 productive or collaborative authors drop off the lists. Asian names were found to account for the majority of misidentification by initial-based disambiguation due to their common surnames and given name initials.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23489/abstract.
Theme
Descriptive cataloguing (Formalerschließung)
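
The merging risk described in the abstract can be illustrated with a minimal sketch. It assumes two illustrative matching keys (surname plus first initial, and surname plus all given-name initials); these are not necessarily the three types tested in the paper, and the names and helper functions below are hypothetical.

```python
from collections import defaultdict

def first_initial_key(surname, given):
    # Assumed key: surname plus the first given-name initial.
    return f"{surname.lower()}, {given[0].lower()}"

def all_initials_key(surname, given):
    # Assumed key: surname plus the initials of every given-name part.
    parts = given.replace("-", " ").split()
    initials = "".join(part[0].lower() for part in parts)
    return f"{surname.lower()}, {initials}"

def group_by(key_fn, names):
    # Name instances that share a key are treated as one author identity.
    groups = defaultdict(set)
    for surname, given in names:
        groups[key_fn(surname, given)].add((surname, given))
    return dict(groups)

# Hypothetical name instances, invented for illustration only.
names = [
    ("Kim", "Jiwon"),
    ("Kim", "Junho"),
    ("Kim", "Ji Ho"),
    ("Lee", "Sang"),
]

for key_fn in (first_initial_key, all_initials_key):
    groups = group_by(key_fn, names)
    print(key_fn.__name__, len(groups), "identities")
```

Under the first-initial key the three distinct "Kim" instances collapse into one identity, which is the kind of merging that would deflate the count of unique authors; the all-initials key separates one of them but still merges the other two.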

Similar documents (content)

  1. Kim, J.: Scale-free collaboration networks : an author name disambiguation perspective (2019) 0.32
    0.3234267 = sum of:
      0.3234267 = product of:
        1.0107085 = sum of:
          0.035622608 = weight(abstract_txt:author in 5297) [ClassicSimilarity], result of:
            0.035622608 = score(doc=5297,freq=2.0), product of:
              0.08096304 = queryWeight, product of:
                1.1952264 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.013607949 = queryNorm
              0.43998608 = fieldWeight in 5297, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.0625 = fieldNorm(doc=5297)
          0.047430195 = weight(abstract_txt:scale in 5297) [ClassicSimilarity], result of:
            0.047430195 = score(doc=5297,freq=2.0), product of:
              0.097988 = queryWeight, product of:
                1.3149016 = boost
                5.476297 = idf(docFreq=502, maxDocs=44218)
                0.013607949 = queryNorm
              0.48404083 = fieldWeight in 5297, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.476297 = idf(docFreq=502, maxDocs=44218)
                0.0625 = fieldNorm(doc=5297)
          0.10960772 = weight(abstract_txt:networks in 5297) [ClassicSimilarity], result of:
            0.10960772 = score(doc=5297,freq=7.0), product of:
              0.12913327 = queryWeight, product of:
                1.8487215 = boost
                5.133032 = idf(docFreq=708, maxDocs=44218)
                0.013607949 = queryNorm
              0.84879535 = fieldWeight in 5297, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.133032 = idf(docFreq=708, maxDocs=44218)
                0.0625 = fieldNorm(doc=5297)
          0.20975123 = weight(abstract_txt:disambiguated in 5297) [ClassicSimilarity], result of:
            0.20975123 = score(doc=5297,freq=2.0), product of:
              0.26400125 = queryWeight, product of:
                2.15829 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.013607949 = queryNorm
              0.79450846 = fieldWeight in 5297, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.0625 = fieldNorm(doc=5297)
          0.028069884 = weight(abstract_txt:based in 5297) [ClassicSimilarity], result of:
            0.028069884 = score(doc=5297,freq=2.0), product of:
              0.09961785 = queryWeight, product of:
                2.296339 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.013607949 = queryNorm
              0.28177565 = fieldWeight in 5297, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=5297)
          0.1369892 = weight(abstract_txt:name in 5297) [ClassicSimilarity], result of:
            0.1369892 = score(doc=5297,freq=2.0), product of:
              0.26971632 = queryWeight, product of:
                3.4492955 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.013607949 = queryNorm
              0.5079011 = fieldWeight in 5297, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.0625 = fieldNorm(doc=5297)
          0.10059999 = weight(abstract_txt:initial in 5297) [ClassicSimilarity], result of:
            0.10059999 = score(doc=5297,freq=1.0), product of:
              0.27660388 = queryWeight, product of:
                3.4930592 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.013607949 = queryNorm
              0.36369696 = fieldWeight in 5297, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.0625 = fieldNorm(doc=5297)
          0.34263763 = weight(abstract_txt:disambiguation in 5297) [ClassicSimilarity], result of:
            0.34263763 = score(doc=5297,freq=2.0), product of:
              0.528121 = queryWeight, product of:
                5.287302 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.013607949 = queryNorm
              0.64878625 = fieldWeight in 5297, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0625 = fieldNorm(doc=5297)
        0.32 = coord(8/25)
    
  2. Kang, I.-S.; Na, S.-H.; Lee, S.; Jung, H.; Kim, P.; Sung, W.-K.; Lee, J.-H.: On co-authorship for author disambiguation (2009) 0.28
    0.27880803 = sum of:
      0.27880803 = product of:
        1.1617001 = sum of:
          0.09269782 = weight(abstract_txt:coauthor in 2453) [ClassicSimilarity], result of:
            0.09269782 = score(doc=2453,freq=1.0), product of:
              0.13200063 = queryWeight, product of:
                1.079145 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.013607949 = queryNorm
              0.7022529 = fieldWeight in 2453, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.078125 = fieldNorm(doc=2453)
          0.03626125 = weight(abstract_txt:authors in 2453) [ClassicSimilarity], result of:
            0.03626125 = score(doc=2453,freq=2.0), product of:
              0.07060327 = queryWeight, product of:
                1.1161413 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.013607949 = queryNorm
              0.51359165 = fieldWeight in 2453, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.078125 = fieldNorm(doc=2453)
          0.070405364 = weight(abstract_txt:author in 2453) [ClassicSimilarity], result of:
            0.070405364 = score(doc=2453,freq=5.0), product of:
              0.08096304 = queryWeight, product of:
                1.1952264 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.013607949 = queryNorm
              0.86959887 = fieldWeight in 2453, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.078125 = fieldNorm(doc=2453)
          0.18539564 = weight(abstract_txt:disambiguated in 2453) [ClassicSimilarity], result of:
            0.18539564 = score(doc=2453,freq=1.0), product of:
              0.26400125 = queryWeight, product of:
                2.15829 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.013607949 = queryNorm
              0.7022529 = fieldWeight in 2453, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.078125 = fieldNorm(doc=2453)
          0.17123652 = weight(abstract_txt:name in 2453) [ClassicSimilarity], result of:
            0.17123652 = score(doc=2453,freq=2.0), product of:
              0.26971632 = queryWeight, product of:
                3.4492955 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.013607949 = queryNorm
              0.6348764 = fieldWeight in 2453, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.078125 = fieldNorm(doc=2453)
          0.60570353 = weight(abstract_txt:disambiguation in 2453) [ClassicSimilarity], result of:
            0.60570353 = score(doc=2453,freq=4.0), product of:
              0.528121 = queryWeight, product of:
                5.287302 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.013607949 = queryNorm
              1.1469029 = fieldWeight in 2453, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.078125 = fieldNorm(doc=2453)
        0.24 = coord(6/25)
    
  3. Kim, J.(im); Kim, J.(enna): Effect of forename string on author name disambiguation (2020) 0.25
    0.25291076 = sum of:
      0.25291076 = product of:
        1.0537949 = sum of:
          0.02051246 = weight(abstract_txt:authors in 5930) [ClassicSimilarity], result of:
            0.02051246 = score(doc=5930,freq=1.0), product of:
              0.07060327 = queryWeight, product of:
                1.1161413 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.013607949 = queryNorm
              0.2905313 = fieldWeight in 5930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.0625 = fieldNorm(doc=5930)
          0.056324292 = weight(abstract_txt:author in 5930) [ClassicSimilarity], result of:
            0.056324292 = score(doc=5930,freq=5.0), product of:
              0.08096304 = queryWeight, product of:
                1.1952264 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.013607949 = queryNorm
              0.69567907 = fieldWeight in 5930, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.0625 = fieldNorm(doc=5930)
          0.14831652 = weight(abstract_txt:disambiguated in 5930) [ClassicSimilarity], result of:
            0.14831652 = score(doc=5930,freq=1.0), product of:
              0.26400125 = queryWeight, product of:
                2.15829 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.013607949 = queryNorm
              0.5618023 = fieldWeight in 5930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.0625 = fieldNorm(doc=5930)
          0.019848406 = weight(abstract_txt:based in 5930) [ClassicSimilarity], result of:
            0.019848406 = score(doc=5930,freq=1.0), product of:
              0.09961785 = queryWeight, product of:
                2.296339 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.013607949 = queryNorm
              0.19924548 = fieldWeight in 5930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=5930)
          0.16777684 = weight(abstract_txt:name in 5930) [ClassicSimilarity], result of:
            0.16777684 = score(doc=5930,freq=3.0), product of:
              0.26971632 = queryWeight, product of:
                3.4492955 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.013607949 = queryNorm
              0.6220493 = fieldWeight in 5930, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.0625 = fieldNorm(doc=5930)
          0.6410163 = weight(abstract_txt:disambiguation in 5930) [ClassicSimilarity], result of:
            0.6410163 = score(doc=5930,freq=7.0), product of:
              0.528121 = queryWeight, product of:
                5.287302 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.013607949 = queryNorm
              1.2137679 = fieldWeight in 5930, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0625 = fieldNorm(doc=5930)
        0.24 = coord(6/25)
    
  4. Strotmann, A.; Zhao, D.: Author name disambiguation : what difference does it make in author-based citation analysis? (2012) 0.25
    0.24560629 = sum of:
      0.24560629 = product of:
        0.8771653 = sum of:
          0.07981933 = weight(abstract_txt:surname in 389) [ClassicSimilarity], result of:
            0.07981933 = score(doc=389,freq=1.0), product of:
              0.13863568 = queryWeight, product of:
                1.1059343 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.013607949 = queryNorm
              0.5757488 = fieldWeight in 389, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=389)
          0.04102492 = weight(abstract_txt:authors in 389) [ClassicSimilarity], result of:
            0.04102492 = score(doc=389,freq=4.0), product of:
              0.07060327 = queryWeight, product of:
                1.1161413 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.013607949 = queryNorm
              0.5810626 = fieldWeight in 389, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.0625 = fieldNorm(doc=389)
          0.079654574 = weight(abstract_txt:author in 389) [ClassicSimilarity], result of:
            0.079654574 = score(doc=389,freq=10.0), product of:
              0.08096304 = queryWeight, product of:
                1.1952264 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.013607949 = queryNorm
              0.9838388 = fieldWeight in 389, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.0625 = fieldNorm(doc=389)
          0.039696813 = weight(abstract_txt:based in 389) [ClassicSimilarity], result of:
            0.039696813 = score(doc=389,freq=4.0), product of:
              0.09961785 = queryWeight, product of:
                2.296339 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.013607949 = queryNorm
              0.39849097 = fieldWeight in 389, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=389)
          0.19373201 = weight(abstract_txt:name in 389) [ClassicSimilarity], result of:
            0.19373201 = score(doc=389,freq=4.0), product of:
              0.26971632 = queryWeight, product of:
                3.4492955 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.013607949 = queryNorm
              0.7182806 = fieldWeight in 389, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.0625 = fieldNorm(doc=389)
          0.10059999 = weight(abstract_txt:initial in 389) [ClassicSimilarity], result of:
            0.10059999 = score(doc=389,freq=1.0), product of:
              0.27660388 = queryWeight, product of:
                3.4930592 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.013607949 = queryNorm
              0.36369696 = fieldWeight in 389, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.0625 = fieldNorm(doc=389)
          0.34263763 = weight(abstract_txt:disambiguation in 389) [ClassicSimilarity], result of:
            0.34263763 = score(doc=389,freq=2.0), product of:
              0.528121 = queryWeight, product of:
                5.287302 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.013607949 = queryNorm
              0.64878625 = fieldWeight in 389, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0625 = fieldNorm(doc=389)
        0.28 = coord(7/25)
    
  5. Liu, W.; Doğan, R.I.; Kim, S.; Comeau, D.C.; Kim, W.; Yeganova, L.; Lu, Z.; Wilbur, W.J.: Author name disambiguation for PubMed (2014) 0.22
    0.21914828 = sum of:
      0.21914828 = product of:
        0.7826724 = sum of:
          0.01578933 = weight(abstract_txt:large in 1240) [ClassicSimilarity], result of:
            0.01578933 = score(doc=1240,freq=1.0), product of:
              0.064821154 = queryWeight, product of:
                1.0694616 = boost
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.013607949 = queryNorm
              0.243583 = fieldWeight in 1240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.454089 = idf(docFreq=1397, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.0663941 = weight(abstract_txt:splitting in 1240) [ClassicSimilarity], result of:
            0.0663941 = score(doc=1240,freq=1.0), product of:
              0.13403471 = queryWeight, product of:
                1.0874279 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.013607949 = queryNorm
              0.49535006 = fieldWeight in 1240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.06233957 = weight(abstract_txt:author in 1240) [ClassicSimilarity], result of:
            0.06233957 = score(doc=1240,freq=8.0), product of:
              0.08096304 = queryWeight, product of:
                1.1952264 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.013607949 = queryNorm
              0.76997566 = fieldWeight in 1240, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.029345937 = weight(abstract_txt:scale in 1240) [ClassicSimilarity], result of:
            0.029345937 = score(doc=1240,freq=1.0), product of:
              0.097988 = queryWeight, product of:
                1.3149016 = boost
                5.476297 = idf(docFreq=502, maxDocs=44218)
                0.013607949 = queryNorm
              0.299485 = fieldWeight in 1240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.476297 = idf(docFreq=502, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.017367356 = weight(abstract_txt:based in 1240) [ClassicSimilarity], result of:
            0.017367356 = score(doc=1240,freq=1.0), product of:
              0.09961785 = queryWeight, product of:
                2.296339 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.013607949 = queryNorm
              0.1743398 = fieldWeight in 1240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.22424793 = weight(abstract_txt:name in 1240) [ClassicSimilarity], result of:
            0.22424793 = score(doc=1240,freq=7.0), product of:
              0.26971632 = queryWeight, product of:
                3.4492955 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.013607949 = queryNorm
              0.83142143 = fieldWeight in 1240, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
          0.36718822 = weight(abstract_txt:disambiguation in 1240) [ClassicSimilarity], result of:
            0.36718822 = score(doc=1240,freq=3.0), product of:
              0.528121 = queryWeight, product of:
                5.287302 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.013607949 = queryNorm
              0.6952729 = fieldWeight in 1240, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1240)
        0.28 = coord(7/25)
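
The score breakdowns above follow the TF-IDF pattern of Lucene's ClassicSimilarity. As a rough sketch, the Python below recomputes the score of the first entry (doc 5297) from the values printed in the listing. The formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))) are inferred from the printed numbers; the function and variable names are illustrative and are not part of any Lucene API.

```python
import math

QUERY_NORM = 0.013607949   # queryNorm as printed in the listing
MAX_DOCS = 44218           # maxDocs as printed in the listing

def classic_idf(doc_freq, max_docs=MAX_DOCS):
    # idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * query_norm          # queryWeight
    field_weight = math.sqrt(freq) * idf * field_norm  # fieldWeight, tf = sqrt(freq)
    return query_weight * field_weight

def document_score(terms, field_norm, matched, total_query_terms):
    # Sum the per-term scores, then scale by coord(matched / total).
    summed = sum(term_score(f, df, b, field_norm) for f, df, b in terms)
    return summed * (matched / total_query_terms)

# (termFreq, docFreq, boost) for the eight matching terms of doc 5297.
terms_5297 = [
    (2.0, 827, 1.1952264),   # author
    (2.0, 502, 1.3149016),   # scale
    (7.0, 708, 1.8487215),   # networks
    (2.0, 14, 2.15829),      # disambiguated
    (2.0, 4958, 2.296339),   # based
    (2.0, 383, 3.4492955),   # name
    (1.0, 356, 3.4930592),   # initial
    (2.0, 77, 5.287302),     # disambiguation
]

score = document_score(terms_5297, field_norm=0.0625, matched=8, total_query_terms=25)
print(round(score, 4))  # should be close to the 0.3234267 shown for entry 1
```

Small differences in the last digits are expected, since the listing was produced with single-precision arithmetic while this sketch uses Python floats.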