Document (#43412)

Author
Cabanac, G.
Labbé, C.
Title
Prevalence of nonsensical algorithmically generated papers in the scientific literature
Source
Journal of the Association for Information Science and Technology. 72(2021) no.12, pp. 1461-1476
Year
2021
Abstract
In 2014 leading publishers withdrew more than 120 nonsensical publications automatically generated with the SCIgen program. Casual observations suggested that similar problematic papers are still published and sold, without follow-up retractions. No systematic screening has been performed and the prevalence of such nonsensical publications in the scientific literature is unknown. Our contribution is twofold. First, we designed a detector that combs the scientific literature for grammar-based computer-generated papers. Applied to SCIgen, it has an 83.6% precision. Second, we performed a scientometric study of the 243 detected SCIgen-papers from 19 publishers. We estimate the prevalence of SCIgen-papers to be 75 per million papers in Information and Computing Sciences. Only 19% of the 243 problematic papers were dealt with: formal retraction (12) or silent removal (34). Publishers still serve and sometimes sell the remaining 197 papers without any caveat. We found evidence of citation manipulation via edited SCIgen bibliographies. This work reveals metric gaming up to the point of absurdity: fraudsters publish nonsensical algorithmically generated papers featuring genuine references. It stresses the need to screen papers for nonsense before peer-review and chase citation manipulation in published papers. Overall, this is yet another illustration of the harmful effects of the pressure to publish or perish.
Content
Cf.: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24495.
Theme
Electronic publishing
Informetrics
Field
Communication sciences
Object
SCIgen

Similar documents (author)

  1. Cabanac, G.: Shaping the landscape of research in information systems from the perspective of editorial boards : a scientometric study of 77 leading journals (2012) 5.94
    5.937289 = sum of:
      5.937289 = weight(author_txt:cabanac in 242) [ClassicSimilarity], result of:
        5.937289 = fieldWeight in 242, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.625 = fieldNorm(doc=242)
    
  2. Cabanac, G.: Bibliogifts in LibGen? : a study of a text-sharing platform driven by biblioleaks and crowdsourcing (2016) 5.94
    5.937289 = sum of:
      5.937289 = weight(author_txt:cabanac in 2850) [ClassicSimilarity], result of:
        5.937289 = fieldWeight in 2850, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.625 = fieldNorm(doc=2850)
    
  3. Cabanac, G.; Preuss, T.: Capitalizing on order effects in the bids of peer-reviewed conferences to secure reviews by expert referees (2013) 4.75
    4.749831 = sum of:
      4.749831 = weight(author_txt:cabanac in 619) [ClassicSimilarity], result of:
        4.749831 = fieldWeight in 619, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.5 = fieldNorm(doc=619)
    
  4. Cabanac, G.; Hartley, J.: Issues of work-life balance among JASIST authors and editors (2013) 4.75
    4.749831 = sum of:
      4.749831 = weight(author_txt:cabanac in 996) [ClassicSimilarity], result of:
        4.749831 = fieldWeight in 996, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.5 = fieldNorm(doc=996)
    
  5. Cabanac, G.; Hubert, G.; Hartley, J.: Solo versus collaborative writing : discrepancies in the use of tables and graphs in academic articles (2014) 3.56
    3.5623734 = sum of:
      3.5623734 = weight(author_txt:cabanac in 1242) [ClassicSimilarity], result of:
        3.5623734 = fieldWeight in 1242, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.375 = fieldNorm(doc=1242)
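
The author-match scores above follow a simple pattern: fieldWeight = tf * idf * fieldNorm. The sketch below reproduces that arithmetic in Python. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are inferred from the numbers in the listing and match the classic Lucene TF-IDF definition; the function names are illustrative and not part of any library. Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as it appears in the listing (assumed form)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the listing: term "cabanac", docFreq=8, maxDocs=44218.
idf_cabanac = classic_idf(8, 44218)
score_doc_242 = field_weight(1.0, 8, 44218, 0.625)    # listing shows 5.937289
score_doc_1242 = field_weight(1.0, 8, 44218, 0.375)   # listing shows 3.5623734

print(round(idf_cabanac, 4), round(score_doc_242, 4), round(score_doc_1242, 4))
```

Only fieldNorm differs between the five author hits, which is why records with shorter author fields (larger fieldNorm) rank higher.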
    

Similar documents (content)

  1. Larivière, V.; Gingras, Y.: On the prevalence and scientific impact of duplicate publications in different scientific fields (1980-2007) (2010) 0.25
    0.24529575 = sum of:
      0.24529575 = product of:
        1.0220656 = sum of:
          0.046193518 = weight(abstract_txt:published in 3622) [ClassicSimilarity], result of:
            0.046193518 = score(doc=3622,freq=3.0), product of:
              0.086824805 = queryWeight, product of:
                1.0851816 = boost
                4.9146953 = idf(docFreq=881, maxDocs=44218)
                0.01627964 = queryNorm
              0.53203136 = fieldWeight in 3622, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.9146953 = idf(docFreq=881, maxDocs=44218)
                0.0625 = fieldNorm(doc=3622)
          0.046470396 = weight(abstract_txt:publications in 3622) [ClassicSimilarity], result of:
            0.046470396 = score(doc=3622,freq=2.0), product of:
              0.09978635 = queryWeight, product of:
                1.1633652 = boost
                5.268782 = idf(docFreq=618, maxDocs=44218)
                0.01627964 = queryNorm
              0.46569893 = fieldWeight in 3622, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.268782 = idf(docFreq=618, maxDocs=44218)
                0.0625 = fieldNorm(doc=3622)
          0.028970316 = weight(abstract_txt:literature in 3622) [ClassicSimilarity], result of:
            0.028970316 = score(doc=3622,freq=1.0), product of:
              0.10502583 = queryWeight, product of:
                1.4617537 = boost
                4.413439 = idf(docFreq=1455, maxDocs=44218)
                0.01627964 = queryNorm
              0.27583992 = fieldWeight in 3622, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.413439 = idf(docFreq=1455, maxDocs=44218)
                0.0625 = fieldNorm(doc=3622)
          0.047657672 = weight(abstract_txt:scientific in 3622) [ClassicSimilarity], result of:
            0.047657672 = score(doc=3622,freq=2.0), product of:
              0.11616425 = queryWeight, product of:
                1.5373133 = boost
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.01627964 = queryNorm
              0.4102611 = fieldWeight in 3622, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.0625 = fieldNorm(doc=3622)
          0.30858776 = weight(abstract_txt:prevalence in 3622) [ClassicSimilarity], result of:
            0.30858776 = score(doc=3622,freq=3.0), product of:
              0.35253805 = queryWeight, product of:
                2.6781147 = boost
                8.085969 = idf(docFreq=36, maxDocs=44218)
                0.01627964 = queryNorm
              0.8753318 = fieldWeight in 3622, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.085969 = idf(docFreq=36, maxDocs=44218)
                0.0625 = fieldNorm(doc=3622)
          0.54418594 = weight(abstract_txt:papers in 3622) [ClassicSimilarity], result of:
            0.54418594 = score(doc=3622,freq=9.0), product of:
              0.5501762 = queryWeight, product of:
                6.406381 = boost
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.01627964 = queryNorm
              0.98911214 = fieldWeight in 3622, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.0625 = fieldNorm(doc=3622)
        0.24 = coord(6/25)
    
  2. Jiao, H.; Qiu, Y.; Ma, X.; Yang, B.: Dissemination effect of data papers on scientific datasets (2024) 0.21
    0.2051551 = sum of:
      0.2051551 = product of:
        0.854813 = sum of:
          0.04568843 = weight(abstract_txt:citation in 1204) [ClassicSimilarity], result of:
            0.04568843 = score(doc=1204,freq=3.0), product of:
              0.08619074 = queryWeight, product of:
                1.0812119 = boost
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.01627964 = queryNorm
              0.53008515 = fieldWeight in 1204, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.0625 = fieldNorm(doc=1204)
          0.026669841 = weight(abstract_txt:published in 1204) [ClassicSimilarity], result of:
            0.026669841 = score(doc=1204,freq=1.0), product of:
              0.086824805 = queryWeight, product of:
                1.0851816 = boost
                4.9146953 = idf(docFreq=881, maxDocs=44218)
                0.01627964 = queryNorm
              0.30716845 = fieldWeight in 1204, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9146953 = idf(docFreq=881, maxDocs=44218)
                0.0625 = fieldNorm(doc=1204)
          0.032859534 = weight(abstract_txt:publications in 1204) [ClassicSimilarity], result of:
            0.032859534 = score(doc=1204,freq=1.0), product of:
              0.09978635 = queryWeight, product of:
                1.1633652 = boost
                5.268782 = idf(docFreq=618, maxDocs=44218)
                0.01627964 = queryNorm
              0.32929888 = fieldWeight in 1204, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.268782 = idf(docFreq=618, maxDocs=44218)
                0.0625 = fieldNorm(doc=1204)
          0.058368485 = weight(abstract_txt:scientific in 1204) [ClassicSimilarity], result of:
            0.058368485 = score(doc=1204,freq=3.0), product of:
              0.11616425 = queryWeight, product of:
                1.5373133 = boost
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.01627964 = queryNorm
              0.5024651 = fieldWeight in 1204, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.0625 = fieldNorm(doc=1204)
          0.17816323 = weight(abstract_txt:prevalence in 1204) [ClassicSimilarity], result of:
            0.17816323 = score(doc=1204,freq=1.0), product of:
              0.35253805 = queryWeight, product of:
                2.6781147 = boost
                8.085969 = idf(docFreq=36, maxDocs=44218)
                0.01627964 = queryNorm
              0.50537306 = fieldWeight in 1204, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.085969 = idf(docFreq=36, maxDocs=44218)
                0.0625 = fieldNorm(doc=1204)
          0.51306343 = weight(abstract_txt:papers in 1204) [ClassicSimilarity], result of:
            0.51306343 = score(doc=1204,freq=8.0), product of:
              0.5501762 = queryWeight, product of:
                6.406381 = boost
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.01627964 = queryNorm
              0.9325439 = fieldWeight in 1204, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.0625 = fieldNorm(doc=1204)
        0.24 = coord(6/25)
    
  3. Pertile, S. de L.; Moreira, V.P.: Comparing and combining content- and citation-based approaches for plagiarism detection (2016) 0.14
    0.1402484 = sum of:
      0.1402484 = product of:
        0.58436835 = sum of:
          0.02637823 = weight(abstract_txt:citation in 3123) [ClassicSimilarity], result of:
            0.02637823 = score(doc=3123,freq=1.0), product of:
              0.08619074 = queryWeight, product of:
                1.0812119 = boost
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.01627964 = queryNorm
              0.30604482 = fieldWeight in 3123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.896717 = idf(docFreq=897, maxDocs=44218)
                0.0625 = fieldNorm(doc=3123)
          0.032067176 = weight(abstract_txt:without in 3123) [ClassicSimilarity], result of:
            0.032067176 = score(doc=3123,freq=1.0), product of:
              0.098175704 = queryWeight, product of:
                1.1539382 = boost
                5.2260876 = idf(docFreq=645, maxDocs=44218)
                0.01627964 = queryNorm
              0.32663047 = fieldWeight in 3123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2260876 = idf(docFreq=645, maxDocs=44218)
                0.0625 = fieldNorm(doc=3123)
          0.032859534 = weight(abstract_txt:publications in 3123) [ClassicSimilarity], result of:
            0.032859534 = score(doc=3123,freq=1.0), product of:
              0.09978635 = queryWeight, product of:
                1.1633652 = boost
                5.268782 = idf(docFreq=618, maxDocs=44218)
                0.01627964 = queryNorm
              0.32929888 = fieldWeight in 3123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.268782 = idf(docFreq=618, maxDocs=44218)
                0.0625 = fieldNorm(doc=3123)
          0.058368485 = weight(abstract_txt:scientific in 3123) [ClassicSimilarity], result of:
            0.058368485 = score(doc=3123,freq=3.0), product of:
              0.11616425 = queryWeight, product of:
                1.5373133 = boost
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.01627964 = queryNorm
              0.5024651 = fieldWeight in 3123, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.0625 = fieldNorm(doc=3123)
          0.17816323 = weight(abstract_txt:prevalence in 3123) [ClassicSimilarity], result of:
            0.17816323 = score(doc=3123,freq=1.0), product of:
              0.35253805 = queryWeight, product of:
                2.6781147 = boost
                8.085969 = idf(docFreq=36, maxDocs=44218)
                0.01627964 = queryNorm
              0.50537306 = fieldWeight in 3123, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.085969 = idf(docFreq=36, maxDocs=44218)
                0.0625 = fieldNorm(doc=3123)
          0.25653172 = weight(abstract_txt:papers in 3123) [ClassicSimilarity], result of:
            0.25653172 = score(doc=3123,freq=2.0), product of:
              0.5501762 = queryWeight, product of:
                6.406381 = boost
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.01627964 = queryNorm
              0.46627194 = fieldWeight in 3123, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.0625 = fieldNorm(doc=3123)
        0.24 = coord(6/25)
    
  4. Shuai, X.; Rollins, J.; Moulinier, I.; Custis, T.; Edmunds, M.; Schilder, F.: A multidimensional investigation of the effects of publication retraction on scholarly impact (2017) 0.13
    0.1265177 = sum of:
      0.1265177 = product of:
        0.79073566 = sum of:
          0.13618842 = weight(abstract_txt:retraction in 3798) [ClassicSimilarity], result of:
            0.13618842 = score(doc=3798,freq=2.0), product of:
              0.16219483 = queryWeight, product of:
                1.0487791 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.01627964 = queryNorm
              0.83965945 = fieldWeight in 3798, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.0625 = fieldNorm(doc=3798)
          0.24409893 = weight(abstract_txt:retractions in 3798) [ClassicSimilarity], result of:
            0.24409893 = score(doc=3798,freq=5.0), product of:
              0.17633592 = queryWeight, product of:
                1.0935432 = boost
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.01627964 = queryNorm
              1.3842837 = fieldWeight in 3798, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.905128 = idf(docFreq=5, maxDocs=44218)
                0.0625 = fieldNorm(doc=3798)
          0.047657672 = weight(abstract_txt:scientific in 3798) [ClassicSimilarity], result of:
            0.047657672 = score(doc=3798,freq=2.0), product of:
              0.11616425 = queryWeight, product of:
                1.5373133 = boost
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.01627964 = queryNorm
              0.4102611 = fieldWeight in 3798, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.0625 = fieldNorm(doc=3798)
          0.36279064 = weight(abstract_txt:papers in 3798) [ClassicSimilarity], result of:
            0.36279064 = score(doc=3798,freq=4.0), product of:
              0.5501762 = queryWeight, product of:
                6.406381 = boost
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.01627964 = queryNorm
              0.6594081 = fieldWeight in 3798, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.0625 = fieldNorm(doc=3798)
        0.16 = coord(4/25)
    
  5. Schöpfel, J.; Farace, D.; Prost, H.; Zane, A.: Data papers as a new form of knowledge organization in the field of research data (2019) 0.13
    0.1256019 = sum of:
      0.1256019 = product of:
        0.7850119 = sum of:
          0.041074418 = weight(abstract_txt:publications in 5639) [ClassicSimilarity], result of:
            0.041074418 = score(doc=5639,freq=1.0), product of:
              0.09978635 = queryWeight, product of:
                1.1633652 = boost
                5.268782 = idf(docFreq=618, maxDocs=44218)
                0.01627964 = queryNorm
              0.4116236 = fieldWeight in 5639, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.268782 = idf(docFreq=618, maxDocs=44218)
                0.078125 = fieldNorm(doc=5639)
          0.09400899 = weight(abstract_txt:publishers in 5639) [ClassicSimilarity], result of:
            0.09400899 = score(doc=5639,freq=1.0), product of:
              0.19838105 = queryWeight, product of:
                2.008983 = boost
                6.0656753 = idf(docFreq=278, maxDocs=44218)
                0.01627964 = queryNorm
              0.4738809 = fieldWeight in 5639, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0656753 = idf(docFreq=278, maxDocs=44218)
                0.078125 = fieldNorm(doc=5639)
          0.09452105 = weight(abstract_txt:generated in 5639) [ClassicSimilarity], result of:
            0.09452105 = score(doc=5639,freq=1.0), product of:
              0.21913877 = queryWeight, product of:
                2.4381204 = boost
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.01627964 = queryNorm
              0.43132967 = fieldWeight in 5639, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.078125 = fieldNorm(doc=5639)
          0.55540746 = weight(abstract_txt:papers in 5639) [ClassicSimilarity], result of:
            0.55540746 = score(doc=5639,freq=6.0), product of:
              0.5501762 = queryWeight, product of:
                6.406381 = boost
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.01627964 = queryNorm
              1.0095084 = fieldWeight in 5639, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.078125 = fieldNorm(doc=5639)
        0.16 = coord(4/25)
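
The content-match scores above combine several terms: each term contributes queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm and fieldWeight = tf * idf * fieldNorm, and the sum is multiplied by coord (matched terms / query terms). The Python sketch below recomputes entry 4 (doc=3798) from the values in the listing. The tf and idf formulas are inferred from the listed numbers, and the helper names are illustrative assumptions, not a library API.

```python
import math

MAX_DOCS = 44218          # maxDocs from the listing
QUERY_NORM = 0.01627964   # queryNorm from the listing


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1)), the form the listed values follow."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# (term, boost, docFreq, termFreq) for doc=3798, copied from the listing.
terms_doc_3798 = [
    ("retraction", 1.0487791, 8, 2.0),
    ("retractions", 1.0935432, 5, 5.0),
    ("scientific", 1.5373133, 1158, 2.0),
    ("papers", 6.406381, 614, 4.0),
]

field_norm = 0.0625
coord = 4 / 25  # 4 of the 25 query terms matched

term_sum = sum(term_score(b, df, f, field_norm) for _, b, df, f in terms_doc_3798)
total = coord * term_sum  # listing shows 0.1265177

print(round(term_sum, 4), round(total, 4))
```

The coord factor explains why entries 4 and 5 rank below entries 1 to 3 despite comparable term sums: they match 4 of 25 query terms rather than 6.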