Document (#43746)

Pech, G.
Delgado, C.
Sorella, S.P.
Classifying papers into subfields using Abstracts, Titles, Keywords and KeyWords Plus through pattern detection and optimization procedures : an application in Physics
Journal of the Association for Information Science and Technology. 73(2022) no.11, p.1513-1528
Classifying papers by field of knowledge is critical to understanding the dynamics of scientific (sub)fields, their leading questions, and their trends. Most studies rely on journal categories defined by popular databases such as WoS or Scopus, but some experts find that those categories neither map the existing subfields correctly nor identify the subfield of a specific article. This study addresses the classification problem using data from each paper (Abstract, Title, Keywords, and KeyWords Plus) and the help of experts to identify the existing subfields and the journals exclusive to each subfield. These "exclusive journals" are critical for obtaining, through a pattern detection procedure that uses machine learning techniques (from the NVivo software), a list of the frequent terms that are specific to each subfield. With that list of terms and with the help of optimization procedures, we can identify the subfield to which each paper most likely belongs. This study can support scientific policy-makers and funding and research institutions (via more accurate academic performance evaluations), editors in redefining the scopes of journals, and popular databases in refining their categories.
Automatic classification

Similar documents (author)

  1. Delgado, Y. Hidalgo- => Hidalgo-Delgado, Y.: 5.04
    5.0379567 = sum of:
      5.0379567 = weight(author_txt:delgado in 3705) [ClassicSimilarity], result of:
        5.0379567 = fieldWeight in 3705, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.375 = fieldNorm(doc=3705)
  2. Quirós, L. Delgado- => Delgado-Quirós, L.: 5.04
    5.0379567 = sum of:
      5.0379567 = weight(author_txt:delgado in 840) [ClassicSimilarity], result of:
        5.0379567 = fieldWeight in 840, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.375 = fieldNorm(doc=840)
  3. Thelwall, M.; Delgado, M.M.: Arts and humanities research evaluation : no metrics please, just data (2015) 4.75
    4.749831 = sum of:
      4.749831 = weight(author_txt:delgado in 2313) [ClassicSimilarity], result of:
        4.749831 = fieldWeight in 2313, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.5 = fieldNorm(doc=2313)
  4. Montalvo, S.; Martínez, R.; Fresno, V.; Delgado, A.: Exploiting named entities for bilingual news clustering (2015) 2.97
    2.9686446 = sum of:
      2.9686446 = weight(author_txt:delgado in 1642) [ClassicSimilarity], result of:
        2.9686446 = fieldWeight in 1642, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.3125 = fieldNorm(doc=1642)
  5. Delgado, A.D.; Martínez, R.; Montalvo, S.; Fresno, V.: Person name disambiguation in the Web using adaptive threshold clustering (2017) 2.97
    2.9686446 = sum of:
      2.9686446 = weight(author_txt:delgado in 3694) [ClassicSimilarity], result of:
        2.9686446 = fieldWeight in 3694, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.3125 = fieldNorm(doc=3694)
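
The author scores above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are hypothetical, and fieldNorm is copied from the listing rather than recomputed, because Lucene stores it in a lossy encoded form.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as shown in the explain tree."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: the term "delgado" occurs in 8 of 44218 documents.
hit1 = field_weight(freq=2.0, doc_freq=8, max_docs=44218, field_norm=0.375)
hit3 = field_weight(freq=1.0, doc_freq=8, max_docs=44218, field_norm=0.5)
hit4 = field_weight(freq=1.0, doc_freq=8, max_docs=44218, field_norm=0.3125)

print(f"hit 1: {hit1:.4f}  hit 3: {hit3:.4f}  hit 4: {hit4:.4f}")
```

The results should agree with the listed scores up to the 32-bit float rounding that Lucene applies internally. Hits 1 and 2 outrank hit 3 only because the term frequency is 2 there; hit 3 has the larger fieldNorm (0.5 against 0.375).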

Similar documents (content)

  1. Chen, Y.-N.; Ke, H.-R.: A study on mental models of taggers and experts for article indexing based on analysis of keyword usage (2014) 0.17
    0.16761464 = sum of:
      0.16761464 = product of:
        0.5986237 = sum of:
          0.016610947 = weight(abstract_txt:their in 1334) [ClassicSimilarity], result of:
            0.016610947 = score(doc=1334,freq=2.0), product of:
              0.059481386 = queryWeight, product of:
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.018826205 = queryNorm
              0.27926293 = fieldWeight in 1334, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.0625 = fieldNorm(doc=1334)
          0.06772973 = weight(abstract_txt:popular in 1334) [ClassicSimilarity], result of:
            0.06772973 = score(doc=1334,freq=2.0), product of:
              0.13261947 = queryWeight, product of:
                1.219179 = boost
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.018826205 = queryNorm
              0.51070726 = fieldWeight in 1334, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.0625 = fieldNorm(doc=1334)
          0.117170796 = weight(abstract_txt:experts in 1334) [ClassicSimilarity], result of:
            0.117170796 = score(doc=1334,freq=5.0), product of:
              0.14081664 = queryWeight, product of:
                1.2562927 = boost
                5.953884 = idf(docFreq=311, maxDocs=44218)
                0.018826205 = queryNorm
              0.8320806 = fieldWeight in 1334, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.953884 = idf(docFreq=311, maxDocs=44218)
                0.0625 = fieldNorm(doc=1334)
          0.120986685 = weight(abstract_txt:pattern in 1334) [ClassicSimilarity], result of:
            0.120986685 = score(doc=1334,freq=4.0), product of:
              0.15496589 = queryWeight, product of:
                1.3178983 = boost
                6.2458487 = idf(docFreq=232, maxDocs=44218)
                0.018826205 = queryNorm
              0.7807311 = fieldWeight in 1334, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.2458487 = idf(docFreq=232, maxDocs=44218)
                0.0625 = fieldNorm(doc=1334)
          0.050325558 = weight(abstract_txt:journals in 1334) [ClassicSimilarity], result of:
            0.050325558 = score(doc=1334,freq=1.0), product of:
              0.15691118 = queryWeight, product of:
                1.6241884 = boost
                5.1316223 = idf(docFreq=709, maxDocs=44218)
                0.018826205 = queryNorm
              0.3207264 = fieldWeight in 1334, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1316223 = idf(docFreq=709, maxDocs=44218)
                0.0625 = fieldNorm(doc=1334)
          0.07310721 = weight(abstract_txt:categories in 1334) [ClassicSimilarity], result of:
            0.07310721 = score(doc=1334,freq=2.0), product of:
              0.15974416 = queryWeight, product of:
                1.6387849 = boost
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.018826205 = queryNorm
              0.45765188 = fieldWeight in 1334, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.0625 = fieldNorm(doc=1334)
          0.15269278 = weight(abstract_txt:keywords in 1334) [ClassicSimilarity], result of:
            0.15269278 = score(doc=1334,freq=2.0), product of:
              0.2872831 = queryWeight, product of:
                2.5376625 = boost
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.018826205 = queryNorm
              0.5315063 = fieldWeight in 1334, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.0625 = fieldNorm(doc=1334)
        0.28 = coord(7/25)
  2. Zhang, J.; Yu, Q.; Zheng, F.; Long, C.; Lu, Z.; Duan, Z.: Comparing keywords plus of WOS and author keywords : a case study of patient adherence research (2016) 0.17
    0.16696838 = sum of:
      0.16696838 = product of:
        0.83484185 = sum of:
          0.055895112 = weight(abstract_txt:fields in 2857) [ClassicSimilarity], result of:
            0.055895112 = score(doc=2857,freq=2.0), product of:
              0.100553446 = queryWeight, product of:
                1.0616034 = boost
                5.0312033 = idf(docFreq=784, maxDocs=44218)
                0.018826205 = queryNorm
              0.55587465 = fieldWeight in 2857, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0312033 = idf(docFreq=784, maxDocs=44218)
                0.078125 = fieldNorm(doc=2857)
          0.045559198 = weight(abstract_txt:papers in 2857) [ClassicSimilarity], result of:
            0.045559198 = score(doc=2857,freq=1.0), product of:
              0.11054568 = queryWeight, product of:
                1.1131014 = boost
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.018826205 = queryNorm
              0.41213006 = fieldWeight in 2857, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.078125 = fieldNorm(doc=2857)
          0.059865184 = weight(abstract_txt:popular in 2857) [ClassicSimilarity], result of:
            0.059865184 = score(doc=2857,freq=1.0), product of:
              0.13261947 = queryWeight, product of:
                1.219179 = boost
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.018826205 = queryNorm
              0.4514057 = fieldWeight in 2857, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.777993 = idf(docFreq=371, maxDocs=44218)
                0.078125 = fieldNorm(doc=2857)
          0.22590193 = weight(abstract_txt:plus in 2857) [ClassicSimilarity], result of:
            0.22590193 = score(doc=2857,freq=7.0), product of:
              0.16803704 = queryWeight, product of:
                1.3723546 = boost
                6.5039306 = idf(docFreq=179, maxDocs=44218)
                0.018826205 = queryNorm
              1.344358 = fieldWeight in 2857, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.5039306 = idf(docFreq=179, maxDocs=44218)
                0.078125 = fieldNorm(doc=2857)
          0.4476204 = weight(abstract_txt:keywords in 2857) [ClassicSimilarity], result of:
            0.4476204 = score(doc=2857,freq=11.0), product of:
              0.2872831 = queryWeight, product of:
                2.5376625 = boost
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.018826205 = queryNorm
              1.5581161 = fieldWeight in 2857, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.078125 = fieldNorm(doc=2857)
        0.2 = coord(5/25)
  3. Hjoerland, B.: Citation analysis : a social and dynamic approach to knowledge organization (2013) 0.13
    0.1342894 = sum of:
      0.1342894 = product of:
        0.5595392 = sum of:
          0.011745713 = weight(abstract_txt:their in 2710) [ClassicSimilarity], result of:
            0.011745713 = score(doc=2710,freq=1.0), product of:
              0.059481386 = queryWeight, product of:
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.018826205 = queryNorm
              0.19746871 = fieldWeight in 2710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1594994 = idf(docFreq=5101, maxDocs=44218)
                0.0625 = fieldNorm(doc=2710)
          0.036447357 = weight(abstract_txt:papers in 2710) [ClassicSimilarity], result of:
            0.036447357 = score(doc=2710,freq=1.0), product of:
              0.11054568 = queryWeight, product of:
                1.1131014 = boost
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.018826205 = queryNorm
              0.32970405 = fieldWeight in 2710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.0625 = fieldNorm(doc=2710)
          0.044862855 = weight(abstract_txt:identify in 2710) [ClassicSimilarity], result of:
            0.044862855 = score(doc=2710,freq=1.0), product of:
              0.14534031 = queryWeight, product of:
                1.5631566 = boost
                4.9387927 = idf(docFreq=860, maxDocs=44218)
                0.018826205 = queryNorm
              0.30867454 = fieldWeight in 2710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9387927 = idf(docFreq=860, maxDocs=44218)
                0.0625 = fieldNorm(doc=2710)
          0.03469421 = weight(abstract_txt:each in 2710) [ClassicSimilarity], result of:
            0.03469421 = score(doc=2710,freq=1.0), product of:
              0.13477595 = queryWeight, product of:
                1.7381412 = boost
                4.118742 = idf(docFreq=1954, maxDocs=44218)
                0.018826205 = queryNorm
              0.25742137 = fieldWeight in 2710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.118742 = idf(docFreq=1954, maxDocs=44218)
                0.0625 = fieldNorm(doc=2710)
          0.17436497 = weight(abstract_txt:subfields in 2710) [ClassicSimilarity], result of:
            0.17436497 = score(doc=2710,freq=1.0), product of:
              0.35928106 = queryWeight, product of:
                2.4576874 = boost
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.018826205 = queryNorm
              0.48531634 = fieldWeight in 2710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.0625 = fieldNorm(doc=2710)
          0.25742415 = weight(abstract_txt:subfield in 2710) [ClassicSimilarity], result of:
            0.25742415 = score(doc=2710,freq=1.0), product of:
              0.51271254 = queryWeight, product of:
                3.3901258 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.018826205 = queryNorm
              0.5020828 = fieldWeight in 2710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.0625 = fieldNorm(doc=2710)
        0.24 = coord(6/25)
  4. Bar-Ilan, J.: Informetrics (2009) 0.12
    0.11815727 = sum of:
      0.11815727 = product of:
        0.73848295 = sum of:
          0.045559198 = weight(abstract_txt:papers in 3822) [ClassicSimilarity], result of:
            0.045559198 = score(doc=3822,freq=1.0), product of:
              0.11054568 = queryWeight, product of:
                1.1131014 = boost
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.018826205 = queryNorm
              0.41213006 = fieldWeight in 3822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2752647 = idf(docFreq=614, maxDocs=44218)
                0.078125 = fieldNorm(doc=3822)
          0.06290694 = weight(abstract_txt:journals in 3822) [ClassicSimilarity], result of:
            0.06290694 = score(doc=3822,freq=1.0), product of:
              0.15691118 = queryWeight, product of:
                1.6241884 = boost
                5.1316223 = idf(docFreq=709, maxDocs=44218)
                0.018826205 = queryNorm
              0.400908 = fieldWeight in 3822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1316223 = idf(docFreq=709, maxDocs=44218)
                0.078125 = fieldNorm(doc=3822)
          0.3082366 = weight(abstract_txt:subfields in 3822) [ClassicSimilarity], result of:
            0.3082366 = score(doc=3822,freq=2.0), product of:
              0.35928106 = queryWeight, product of:
                2.4576874 = boost
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.018826205 = queryNorm
              0.85792613 = fieldWeight in 3822, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.078125 = fieldNorm(doc=3822)
          0.3217802 = weight(abstract_txt:subfield in 3822) [ClassicSimilarity], result of:
            0.3217802 = score(doc=3822,freq=1.0), product of:
              0.51271254 = queryWeight, product of:
                3.3901258 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.018826205 = queryNorm
              0.62760353 = fieldWeight in 3822, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.078125 = fieldNorm(doc=3822)
        0.16 = coord(4/25)
  5. Gwak, J.H.; Sohn, S.J.: A novel approach to explore patent development paths for subfield technologies (2018) 0.11
    0.11212153 = sum of:
      0.11212153 = product of:
        0.9343461 = sum of:
          0.043367762 = weight(abstract_txt:each in 4120) [ClassicSimilarity], result of:
            0.043367762 = score(doc=4120,freq=1.0), product of:
              0.13477595 = queryWeight, product of:
                1.7381412 = boost
                4.118742 = idf(docFreq=1954, maxDocs=44218)
                0.018826205 = queryNorm
              0.32177672 = fieldWeight in 4120, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.118742 = idf(docFreq=1954, maxDocs=44218)
                0.078125 = fieldNorm(doc=4120)
          0.4359124 = weight(abstract_txt:subfields in 4120) [ClassicSimilarity], result of:
            0.4359124 = score(doc=4120,freq=4.0), product of:
              0.35928106 = queryWeight, product of:
                2.4576874 = boost
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.018826205 = queryNorm
              1.2132908 = fieldWeight in 4120, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.078125 = fieldNorm(doc=4120)
          0.4550659 = weight(abstract_txt:subfield in 4120) [ClassicSimilarity], result of:
            0.4550659 = score(doc=4120,freq=2.0), product of:
              0.51271254 = queryWeight, product of:
                3.3901258 = boost
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.018826205 = queryNorm
              0.8875654 = fieldWeight in 4120, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.033325 = idf(docFreq=38, maxDocs=44218)
                0.078125 = fieldNorm(doc=4120)
        0.12 = coord(3/25)
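
The content scores combine per-term weights with a coordination factor. The sketch below recomputes the score of hit 5 (doc 4120) under the same ClassicSimilarity assumptions: queryWeight = boost * idf * queryNorm, fieldWeight = sqrt(freq) * idf * fieldNorm, and final score = coord * sum of per-term scores. The helper names are hypothetical; boost, queryNorm, and fieldNorm are copied from the listing, not derived.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.018826205  # copied from the listing
QUERY_TERM_COUNT = 25     # denominator of coord(n/25) in the listing


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    """Per-term score = queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# Matching terms for doc 4120: (term, freq, docFreq, boost), copied from the listing.
matched = [
    ("each", 1.0, 1954, 1.7381412),
    ("subfields", 4.0, 50, 2.4576874),
    ("subfield", 2.0, 38, 3.3901258),
]
FIELD_NORM = 0.078125

partial = {t: term_score(f, df, b, FIELD_NORM) for t, f, df, b in matched}
coord = len(matched) / QUERY_TERM_COUNT  # 3 of 25 query terms matched
total = coord * sum(partial.values())

for term, score in partial.items():
    print(f"{term:10s} {score:.6f}")
print(f"coord = {coord:.2f}, total = {total:.6f}")
```

The coord factor explains the ordering of the list: hit 5 has the largest raw sum (0.934) but matches only 3 of the 25 query terms, so it ranks below hit 1, whose smaller sum (0.599) is scaled by 7/25.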