Document (#39777)

Author
Donner, P.
Title
Enhanced self-citation detection by fuzzy author name matching and complementary error estimates
Source
Journal of the Association for Information Science and Technology. 67(2016) no.3, p.662-670
Year
2016
Abstract
In this article I investigate the shortcomings of exact string match-based author self-citation detection methods. The contributions of this study are twofold. First, I apply a fuzzy string matching algorithm for self-citation detection and benchmark this approach and other common methods of exclusively author name-based self-citation detection against a manually curated ground truth sample. Near full recall can be achieved with the proposed method while incurring only negligible precision loss. Second, I report some important observations from the results about the extent of latent self-citations and their characteristics and give an example of the effect of improved self-citation detection on the document level self-citation rate of real data.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23399/abstract.
Theme
Informetrics
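
The abstract contrasts exact string matching with fuzzy author name matching for self-citation detection. The record does not say which fuzzy algorithm the article uses, so the following is only a minimal sketch of the general idea, assuming the similarity ratio from Python's standard `difflib` module and a hypothetical threshold of 0.85. The function names and the normalization steps are illustrative assumptions, not the author's method.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name, strip diacritics, and replace punctuation with spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(
        c if c.isalnum() or c.isspace() else " " for c in without_marks.lower()
    )
    return " ".join(cleaned.split())


def names_match(a, b, threshold=0.85):
    """Fuzzy comparison; the threshold is a hypothetical value, not from the article."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def is_self_citation(citing_authors, cited_authors, threshold=0.85):
    """Flag a citation when any citing author fuzzily matches any cited author."""
    return any(
        names_match(a, b, threshold)
        for a in citing_authors
        for b in cited_authors
    )


# An exact comparison misses spelling variants that the fuzzy comparison catches.
print("Moya-Anegón, F." == "Moya Anegon, F")             # False
print(names_match("Moya-Anegón, F.", "Moya Anegon, F"))  # True
```

A lower threshold would raise recall at the cost of precision, which is the trade-off the abstract reports on.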

Similar documents (content)

  1. Gipp, B.; Meuschke, N.; Breitinger, C.: Citation-based plagiarism detection : practicability on a large-scale scientific corpus (2014) 0.26
    0.2585967 = sum of:
      0.2585967 = product of:
        0.9235596 = sum of:
          0.045768477 = weight(abstract_txt:ground in 3332) [ClassicSimilarity], result of:
            0.045768477 = score(doc=3332,freq=1.0), product of:
              0.10493764 = queryWeight, product of:
                1.0183414 = boost
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.014766676 = queryNorm
              0.43614927 = fieldWeight in 3332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9783883 = idf(docFreq=111, maxDocs=44218)
                0.0625 = fieldNorm(doc=3332)
          0.0056768013 = weight(abstract_txt:this in 3332) [ClassicSimilarity], result of:
            0.0056768013 = score(doc=3332,freq=1.0), product of:
              0.03764118 = queryWeight, product of:
                1.0563796 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.014766676 = queryNorm
              0.1508136 = fieldWeight in 3332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=3332)
          0.052713178 = weight(abstract_txt:benchmark in 3332) [ClassicSimilarity], result of:
            0.052713178 = score(doc=3332,freq=1.0), product of:
              0.11530101 = queryWeight, product of:
                1.067442 = boost
                7.314861 = idf(docFreq=79, maxDocs=44218)
                0.014766676 = queryNorm
              0.4571788 = fieldWeight in 3332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.314861 = idf(docFreq=79, maxDocs=44218)
                0.0625 = fieldNorm(doc=3332)
          0.054120783 = weight(abstract_txt:truth in 3332) [ClassicSimilarity], result of:
            0.054120783 = score(doc=3332,freq=1.0), product of:
              0.11734458 = queryWeight, product of:
                1.0768601 = boost
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.014766676 = queryNorm
              0.46121246 = fieldWeight in 3332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3793993 = idf(docFreq=74, maxDocs=44218)
                0.0625 = fieldNorm(doc=3332)
          0.019206809 = weight(abstract_txt:methods in 3332) [ClassicSimilarity], result of:
            0.019206809 = score(doc=3332,freq=1.0), product of:
              0.07410835 = queryWeight, product of:
                1.2102534 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.014766676 = queryNorm
              0.259172 = fieldWeight in 3332, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=3332)
          0.18975678 = weight(abstract_txt:citation in 3332) [ClassicSimilarity], result of:
            0.18975678 = score(doc=3332,freq=4.0), product of:
              0.3100147 = queryWeight, product of:
                4.287405 = boost
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.014766676 = queryNorm
              0.61208963 = fieldWeight in 3332, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.0625 = fieldNorm(doc=3332)
          0.5563168 = weight(abstract_txt:detection in 3332) [ClassicSimilarity], result of:
            0.5563168 = score(doc=3332,freq=7.0), product of:
              0.4958981 = queryWeight, product of:
                4.950043 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.014766676 = queryNorm
              1.1218369 = fieldWeight in 3332, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.0625 = fieldNorm(doc=3332)
        0.28 = coord(7/25)
    
  2. Davarpanah, M.R.; Amel, F.: Author self-citation pattern in science (2009) 0.16
    0.155178 = sum of:
      0.155178 = product of:
        0.9698625 = sum of:
          0.010035262 = weight(abstract_txt:this in 2968) [ClassicSimilarity], result of:
            0.010035262 = score(doc=2968,freq=2.0), product of:
              0.03764118 = queryWeight, product of:
                1.0563796 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.014766676 = queryNorm
              0.2666033 = fieldWeight in 2968, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.078125 = fieldNorm(doc=2968)
          0.08810064 = weight(abstract_txt:author in 2968) [ClassicSimilarity], result of:
            0.08810064 = score(doc=2968,freq=2.0), product of:
              0.16018805 = queryWeight, product of:
                2.179232 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.014766676 = queryNorm
              0.5499826 = fieldWeight in 2968, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.078125 = fieldNorm(doc=2968)
          0.33544576 = weight(abstract_txt:citation in 2968) [ClassicSimilarity], result of:
            0.33544576 = score(doc=2968,freq=8.0), product of:
              0.3100147 = queryWeight, product of:
                4.287405 = boost
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.014766676 = queryNorm
              1.0820318 = fieldWeight in 2968, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.078125 = fieldNorm(doc=2968)
          0.5362809 = weight(abstract_txt:self in 2968) [ClassicSimilarity], result of:
            0.5362809 = score(doc=2968,freq=7.0), product of:
              0.46652532 = queryWeight, product of:
                5.680864 = boost
                5.561322 = idf(docFreq=461, maxDocs=44218)
                0.014766676 = queryNorm
              1.1495215 = fieldWeight in 2968, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.561322 = idf(docFreq=461, maxDocs=44218)
                0.078125 = fieldNorm(doc=2968)
        0.16 = coord(4/25)
    
  3. Galvez, C.; Moya-Anegón, F.: Approximate personal name-matching through finite-state graphs (2007) 0.14
    0.13510776 = sum of:
      0.13510776 = product of:
        0.48252767 = sum of:
          0.009832508 = weight(abstract_txt:this in 614) [ClassicSimilarity], result of:
            0.009832508 = score(doc=614,freq=3.0), product of:
              0.03764118 = queryWeight, product of:
                1.0563796 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.014766676 = queryNorm
              0.2612168 = fieldWeight in 614, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=614)
          0.027162528 = weight(abstract_txt:methods in 614) [ClassicSimilarity], result of:
            0.027162528 = score(doc=614,freq=2.0), product of:
              0.07410835 = queryWeight, product of:
                1.2102534 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.014766676 = queryNorm
              0.36652455 = fieldWeight in 614, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=614)
          0.07227672 = weight(abstract_txt:name in 614) [ClassicSimilarity], result of:
            0.07227672 = score(doc=614,freq=2.0), product of:
              0.14230472 = queryWeight, product of:
                1.6770747 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.014766676 = queryNorm
              0.5079011 = fieldWeight in 614, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.0625 = fieldNorm(doc=614)
          0.05958647 = weight(abstract_txt:matching in 614) [ClassicSimilarity], result of:
            0.05958647 = score(doc=614,freq=1.0), product of:
              0.15763843 = queryWeight, product of:
                1.7651182 = boost
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.014766676 = queryNorm
              0.37799457 = fieldWeight in 614, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.0625 = fieldNorm(doc=614)
          0.099498056 = weight(abstract_txt:string in 614) [ClassicSimilarity], result of:
            0.099498056 = score(doc=614,freq=1.0), product of:
              0.22187416 = queryWeight, product of:
                2.0940938 = boost
                7.1750984 = idf(docFreq=91, maxDocs=44218)
                0.014766676 = queryNorm
              0.44844365 = fieldWeight in 614, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1750984 = idf(docFreq=91, maxDocs=44218)
                0.0625 = fieldNorm(doc=614)
          0.049837247 = weight(abstract_txt:author in 614) [ClassicSimilarity], result of:
            0.049837247 = score(doc=614,freq=1.0), product of:
              0.16018805 = queryWeight, product of:
                2.179232 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.014766676 = queryNorm
              0.31111714 = fieldWeight in 614, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.0625 = fieldNorm(doc=614)
          0.16433418 = weight(abstract_txt:citation in 614) [ClassicSimilarity], result of:
            0.16433418 = score(doc=614,freq=3.0), product of:
              0.3100147 = queryWeight, product of:
                4.287405 = boost
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.014766676 = queryNorm
              0.53008515 = fieldWeight in 614, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.0625 = fieldNorm(doc=614)
        0.28 = coord(7/25)
    
  4. Ferreira, A.A.; Veloso, A.; Gonçalves, M.A.; Laender, A.H.F.: Self-training author name disambiguation for information scarce scenarios (2014) 0.13
    0.12640487 = sum of:
      0.12640487 = product of:
        0.6320243 = sum of:
          0.033267166 = weight(abstract_txt:methods in 1292) [ClassicSimilarity], result of:
            0.033267166 = score(doc=1292,freq=3.0), product of:
              0.07410835 = queryWeight, product of:
                1.2102534 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.014766676 = queryNorm
              0.44889906 = fieldWeight in 1292, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=1292)
          0.07227672 = weight(abstract_txt:name in 1292) [ClassicSimilarity], result of:
            0.07227672 = score(doc=1292,freq=2.0), product of:
              0.14230472 = queryWeight, product of:
                1.6770747 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.014766676 = queryNorm
              0.5079011 = fieldWeight in 1292, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.0625 = fieldNorm(doc=1292)
          0.111439474 = weight(abstract_txt:author in 1292) [ClassicSimilarity], result of:
            0.111439474 = score(doc=1292,freq=5.0), product of:
              0.16018805 = queryWeight, product of:
                2.179232 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.014766676 = queryNorm
              0.69567907 = fieldWeight in 1292, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.0625 = fieldNorm(doc=1292)
          0.13417831 = weight(abstract_txt:citation in 1292) [ClassicSimilarity], result of:
            0.13417831 = score(doc=1292,freq=2.0), product of:
              0.3100147 = queryWeight, product of:
                4.287405 = boost
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.014766676 = queryNorm
              0.43281272 = fieldWeight in 1292, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.0625 = fieldNorm(doc=1292)
          0.2808626 = weight(abstract_txt:self in 1292) [ClassicSimilarity], result of:
            0.2808626 = score(doc=1292,freq=3.0), product of:
              0.46652532 = queryWeight, product of:
                5.680864 = boost
                5.561322 = idf(docFreq=461, maxDocs=44218)
                0.014766676 = queryNorm
              0.60203075 = fieldWeight in 1292, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.561322 = idf(docFreq=461, maxDocs=44218)
                0.0625 = fieldNorm(doc=1292)
        0.2 = coord(5/25)
    
  5. Kim, J.(im); Kim, J.(enna): Effect of forename string on author name disambiguation (2020) 0.12
    0.11554736 = sum of:
      0.11554736 = product of:
        0.48144734 = sum of:
          0.0056768013 = weight(abstract_txt:this in 5930) [ClassicSimilarity], result of:
            0.0056768013 = score(doc=5930,freq=1.0), product of:
              0.03764118 = queryWeight, product of:
                1.0563796 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.014766676 = queryNorm
              0.1508136 = fieldWeight in 5930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=5930)
          0.019206809 = weight(abstract_txt:methods in 5930) [ClassicSimilarity], result of:
            0.019206809 = score(doc=5930,freq=1.0), product of:
              0.07410835 = queryWeight, product of:
                1.2102534 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.014766676 = queryNorm
              0.259172 = fieldWeight in 5930, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=5930)
          0.08852055 = weight(abstract_txt:name in 5930) [ClassicSimilarity], result of:
            0.08852055 = score(doc=5930,freq=3.0), product of:
              0.14230472 = queryWeight, product of:
                1.6770747 = boost
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.014766676 = queryNorm
              0.6220493 = fieldWeight in 5930, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.746245 = idf(docFreq=383, maxDocs=44218)
                0.0625 = fieldNorm(doc=5930)
          0.084267996 = weight(abstract_txt:matching in 5930) [ClassicSimilarity], result of:
            0.084267996 = score(doc=5930,freq=2.0), product of:
              0.15763843 = queryWeight, product of:
                1.7651182 = boost
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.014766676 = queryNorm
              0.53456503 = fieldWeight in 5930, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.047913 = idf(docFreq=283, maxDocs=44218)
                0.0625 = fieldNorm(doc=5930)
          0.1723357 = weight(abstract_txt:string in 5930) [ClassicSimilarity], result of:
            0.1723357 = score(doc=5930,freq=3.0), product of:
              0.22187416 = queryWeight, product of:
                2.0940938 = boost
                7.1750984 = idf(docFreq=91, maxDocs=44218)
                0.014766676 = queryNorm
              0.7767272 = fieldWeight in 5930, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.1750984 = idf(docFreq=91, maxDocs=44218)
                0.0625 = fieldNorm(doc=5930)
          0.111439474 = weight(abstract_txt:author in 5930) [ClassicSimilarity], result of:
            0.111439474 = score(doc=5930,freq=5.0), product of:
              0.16018805 = queryWeight, product of:
                2.179232 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.014766676 = queryNorm
              0.69567907 = fieldWeight in 5930, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.0625 = fieldNorm(doc=5930)
        0.24 = coord(6/25)
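
The score breakdowns above follow the usual Lucene ClassicSimilarity (TF-IDF) pattern: each term contributes queryWeight times fieldWeight, the contributions are summed, and the sum is scaled by the coord factor. As a sketch of that arithmetic, the following recomputes the "detection" term and the total for the first similar document (doc=3332) from the values listed. The function names are hypothetical; the formulas `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))` are assumed from the figures shown, with which they agree.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as it appears in the explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    """Score of one term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched_terms, query_terms):
    """Sum of the term scores scaled by coord = matched / total query terms."""
    return sum(term_scores) * (matched_terms / query_terms)


# Term "detection" in doc 3332, using the values listed in the first entry.
detection = term_score(
    freq=7.0, doc_freq=135, max_docs=44218,
    boost=4.950043, query_norm=0.014766676, field_norm=0.0625,
)

# The seven per-term scores listed for doc 3332, then coord(7/25).
listed = [0.045768477, 0.0056768013, 0.052713178, 0.054120783,
          0.019206809, 0.18975678, 0.5563168]
total = document_score(listed, matched_terms=7, query_terms=25)

print(round(detection, 4), round(total, 4))
```

Small differences in the last digits are to be expected, since the listed values appear to be single-precision floats while this sketch computes in double precision.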