Document (#38665)

Author
Moohebat, M.
Raj, R.G.
Kareem, S.B.A.
Thorleuchter, D.
Title
Identifying ISI-indexed articles by their lexical usage : a text analysis approach
Source
Journal of the Association for Information Science and Technology. 66(2015) no.3, pp.501-511
Year
2015
Abstract
This research creates an architecture for investigating the existence of probable lexical divergences between articles, categorized as Institute for Scientific Information (ISI) and non-ISI, and consequently, if such a difference is discovered, to propose the best available classification method. Based on a collection of ISI- and non-ISI-indexed articles in the areas of business and computer science, three classification models are trained. A sensitivity analysis is applied to demonstrate the impact of words in different syntactical forms on the classification decision. The results demonstrate that the lexical domains of ISI and non-ISI articles are distinguishable by machine learning techniques. Our findings indicate that the support vector machine identifies ISI-indexed articles in both disciplines with higher precision than do the Naïve Bayesian and K-Nearest Neighbors techniques.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23194/abstract.
Theme
Informetrics
Computational linguistics

Similar documents (content)

  1. Xiang, R.; Chersoni, E.; Lu, Q.; Huang, C.-R.; Li, W.; Long, Y.: Lexical data augmentation for sentiment analysis (2021) 0.12
    0.12180702 = sum of:
      0.12180702 = product of:
        0.50752926 = sum of:
          0.051656764 = weight(abstract_txt:consequently in 392) [ClassicSimilarity], result of:
            0.051656764 = score(doc=392,freq=1.0), product of:
              0.13553065 = queryWeight, product of:
                1.0823226 = boost
                6.9694996 = idf(docFreq=112, maxDocs=44218)
                0.017967151 = queryNorm
              0.38114452 = fieldWeight in 392, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9694996 = idf(docFreq=112, maxDocs=44218)
                0.0546875 = fieldNorm(doc=392)
          0.053744584 = weight(abstract_txt:trained in 392) [ClassicSimilarity], result of:
            0.053744584 = score(doc=392,freq=1.0), product of:
              0.13915832 = queryWeight, product of:
                1.0967119 = boost
                7.062158 = idf(docFreq=102, maxDocs=44218)
                0.017967151 = queryNorm
              0.38621178 = fieldWeight in 392, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.062158 = idf(docFreq=102, maxDocs=44218)
                0.0546875 = fieldNorm(doc=392)
          0.029766371 = weight(abstract_txt:analysis in 392) [ClassicSimilarity], result of:
            0.029766371 = score(doc=392,freq=4.0), product of:
              0.074489206 = queryWeight, product of:
                1.1347485 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.017967151 = queryNorm
              0.3996065 = fieldWeight in 392, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.0546875 = fieldNorm(doc=392)
          0.04011611 = weight(abstract_txt:techniques in 392) [ClassicSimilarity], result of:
            0.04011611 = score(doc=392,freq=2.0), product of:
              0.114506975 = queryWeight, product of:
                1.4069184 = boost
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.017967151 = queryNorm
              0.35033768 = fieldWeight in 392, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.0546875 = fieldNorm(doc=392)
          0.077741005 = weight(abstract_txt:machine in 392) [ClassicSimilarity], result of:
            0.077741005 = score(doc=392,freq=3.0), product of:
              0.15548521 = queryWeight, product of:
                1.6394475 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.017967151 = queryNorm
              0.4999897 = fieldWeight in 392, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.0546875 = fieldNorm(doc=392)
          0.25450444 = weight(abstract_txt:lexical in 392) [ClassicSimilarity], result of:
            0.25450444 = score(doc=392,freq=4.0), product of:
              0.35653603 = queryWeight, product of:
                3.040537 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.017967151 = queryNorm
              0.71382535 = fieldWeight in 392, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.0546875 = fieldNorm(doc=392)
        0.24 = coord(6/25)
    
  2. Ikae, C.; Savoy, J.: Gender identification on Twitter (2022) 0.12
    0.117101975 = sum of:
      0.117101975 = product of:
        0.4879249 = sum of:
          0.048350897 = weight(abstract_txt:vector in 445) [ClassicSimilarity], result of:
            0.048350897 = score(doc=445,freq=1.0), product of:
              0.11863909 = queryWeight, product of:
                1.0126325 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.017967151 = queryNorm
              0.4075461 = fieldWeight in 445, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.0625 = fieldNorm(doc=445)
          0.017009355 = weight(abstract_txt:analysis in 445) [ClassicSimilarity], result of:
            0.017009355 = score(doc=445,freq=1.0), product of:
              0.074489206 = queryWeight, product of:
                1.1347485 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.017967151 = queryNorm
              0.22834657 = fieldWeight in 445, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.0625 = fieldNorm(doc=445)
          0.09410962 = weight(abstract_txt:nearest in 445) [ClassicSimilarity], result of:
            0.09410962 = score(doc=445,freq=1.0), product of:
              0.18494707 = queryWeight, product of:
                1.2643336 = boost
                8.14154 = idf(docFreq=34, maxDocs=44218)
                0.017967151 = queryNorm
              0.5088462 = fieldWeight in 445, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.14154 = idf(docFreq=34, maxDocs=44218)
                0.0625 = fieldNorm(doc=445)
          0.100782596 = weight(abstract_txt:naïve in 445) [ClassicSimilarity], result of:
            0.100782596 = score(doc=445,freq=1.0), product of:
              0.19358951 = queryWeight, product of:
                1.293537 = boost
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.017967151 = queryNorm
              0.5205995 = fieldWeight in 445, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.0625 = fieldNorm(doc=445)
          0.15512924 = weight(abstract_txt:neighbors in 445) [ClassicSimilarity], result of:
            0.15512924 = score(doc=445,freq=1.0), product of:
              0.25807974 = queryWeight, product of:
                1.4935333 = boost
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.017967151 = queryNorm
              0.6010904 = fieldWeight in 445, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.617446 = idf(docFreq=7, maxDocs=44218)
                0.0625 = fieldNorm(doc=445)
          0.07254317 = weight(abstract_txt:machine in 445) [ClassicSimilarity], result of:
            0.07254317 = score(doc=445,freq=2.0), product of:
              0.15548521 = queryWeight, product of:
                1.6394475 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.017967151 = queryNorm
              0.4665599 = fieldWeight in 445, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.0625 = fieldNorm(doc=445)
        0.24 = coord(6/25)
    
  3. Huang, C.; Fu, T.; Chen, H.: Text-based video content classification for online video-sharing sites (2010) 0.11
    0.11179915 = sum of:
      0.11179915 = product of:
        0.4658298 = sum of:
          0.059831183 = weight(abstract_txt:vector in 3452) [ClassicSimilarity], result of:
            0.059831183 = score(doc=3452,freq=2.0), product of:
              0.11863909 = queryWeight, product of:
                1.0126325 = boost
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.017967151 = queryNorm
              0.5043126 = fieldWeight in 3452, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5207376 = idf(docFreq=176, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3452)
          0.124712095 = weight(abstract_txt:naïve in 3452) [ClassicSimilarity], result of:
            0.124712095 = score(doc=3452,freq=2.0), product of:
              0.19358951 = queryWeight, product of:
                1.293537 = boost
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.017967151 = queryNorm
              0.64420897 = fieldWeight in 3452, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.329592 = idf(docFreq=28, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3452)
          0.04011611 = weight(abstract_txt:techniques in 3452) [ClassicSimilarity], result of:
            0.04011611 = score(doc=3452,freq=2.0), product of:
              0.114506975 = queryWeight, product of:
                1.4069184 = boost
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.017967151 = queryNorm
              0.35033768 = fieldWeight in 3452, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5298495 = idf(docFreq=1295, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3452)
          0.063475266 = weight(abstract_txt:machine in 3452) [ClassicSimilarity], result of:
            0.063475266 = score(doc=3452,freq=2.0), product of:
              0.15548521 = queryWeight, product of:
                1.6394475 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.017967151 = queryNorm
              0.4082399 = fieldWeight in 3452, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3452)
          0.050442945 = weight(abstract_txt:classification in 3452) [ClassicSimilarity], result of:
            0.050442945 = score(doc=3452,freq=3.0), product of:
              0.13339914 = queryWeight, product of:
                1.859838 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.017967151 = queryNorm
              0.37813544 = fieldWeight in 3452, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3452)
          0.12725222 = weight(abstract_txt:lexical in 3452) [ClassicSimilarity], result of:
            0.12725222 = score(doc=3452,freq=1.0), product of:
              0.35653603 = queryWeight, product of:
                3.040537 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.017967151 = queryNorm
              0.35691267 = fieldWeight in 3452, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3452)
        0.24 = coord(6/25)
    
  4. Sabourin, C.F. (Comp.): Computational lexicology and lexicography : bibliography (1994) 0.10
    0.09605976 = sum of:
      0.09605976 = product of:
        0.800498 = sum of:
          0.042523384 = weight(abstract_txt:analysis in 8871) [ClassicSimilarity], result of:
            0.042523384 = score(doc=8871,freq=1.0), product of:
              0.074489206 = queryWeight, product of:
                1.1347485 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.017967151 = queryNorm
              0.5708664 = fieldWeight in 8871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.15625 = fieldNorm(doc=8871)
          0.12823941 = weight(abstract_txt:machine in 8871) [ClassicSimilarity], result of:
            0.12823941 = score(doc=8871,freq=1.0), product of:
              0.15548521 = queryWeight, product of:
                1.6394475 = boost
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.017967151 = queryNorm
              0.82476914 = fieldWeight in 8871, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2785225 = idf(docFreq=612, maxDocs=44218)
                0.15625 = fieldNorm(doc=8871)
          0.62973523 = weight(abstract_txt:lexical in 8871) [ClassicSimilarity], result of:
            0.62973523 = score(doc=8871,freq=3.0), product of:
              0.35653603 = queryWeight, product of:
                3.040537 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.017967151 = queryNorm
              1.7662597 = fieldWeight in 8871, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.15625 = fieldNorm(doc=8871)
        0.12 = coord(3/25)
    
  5. Lu, C.; Bu, Y.; Wang, J.; Ding, Y.; Torvik, V.; Schnaars, M.; Zhang, C.: Examining scientific writing styles from the perspective of linguistic complexity : a cross-level moderation model (2019) 0.09
    0.08996866 = sum of:
      0.08996866 = product of:
        0.7497388 = sum of:
          0.13679104 = weight(abstract_txt:syntactical in 5219) [ClassicSimilarity], result of:
            0.13679104 = score(doc=5219,freq=1.0), product of:
              0.20451407 = queryWeight, product of:
                1.3295343 = boost
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.017967151 = queryNorm
              0.6688588 = fieldWeight in 5219, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.078125 = fieldNorm(doc=5219)
          0.40649235 = weight(abstract_txt:lexical in 5219) [ClassicSimilarity], result of:
            0.40649235 = score(doc=5219,freq=5.0), product of:
              0.35653603 = queryWeight, product of:
                3.040537 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.017967151 = queryNorm
              1.1401157 = fieldWeight in 5219, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.078125 = fieldNorm(doc=5219)
          0.20645544 = weight(abstract_txt:articles in 5219) [ClassicSimilarity], result of:
            0.20645544 = score(doc=5219,freq=3.0), product of:
              0.31904498 = queryWeight, product of:
                3.7132049 = boost
                4.7821565 = idf(docFreq=1006, maxDocs=44218)
                0.017967151 = queryNorm
              0.6471045 = fieldWeight in 5219, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.7821565 = idf(docFreq=1006, maxDocs=44218)
                0.078125 = fieldNorm(doc=5219)
        0.12 = coord(3/25)
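
How the score breakdowns above fit together can be sketched in a few lines of Python. This is a minimal illustration, not the catalog's actual code. It assumes the classic Lucene TF-IDF formulas that the printed numbers are consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord(matched/total). The function names and the DOC_392_TERMS list are hypothetical; the numeric inputs are copied from the first entry (doc=392).

```python
import math

MAX_DOCS = 44218          # maxDocs as printed in the breakdowns
QUERY_NORM = 0.017967151  # queryNorm as printed in the breakdowns


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assuming the classic formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm,
               query_norm=QUERY_NORM, max_docs=MAX_DOCS):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched, total):
    """Sum of the term scores, scaled by coord(matched/total)."""
    return sum(term_scores) * (matched / total)


# (term, freq, docFreq, boost) for doc=392, fieldNorm = 0.0546875
DOC_392_TERMS = [
    ("consequently", 1.0, 112, 1.0823226),
    ("trained", 1.0, 102, 1.0967119),
    ("analysis", 4.0, 3112, 1.1347485),
    ("techniques", 2.0, 1295, 1.4069184),
    ("machine", 3.0, 612, 1.6394475),
    ("lexical", 4.0, 175, 3.040537),
]

scores_392 = [term_score(freq, doc_freq, boost, 0.0546875)
              for _, freq, doc_freq, boost in DOC_392_TERMS]
total_392 = document_score(scores_392, matched=6, total=25)
print(round(total_392, 4))  # 0.1218
```

The small differences in the last digits against the listing come from Lucene computing in single precision, whereas Python floats are double precision.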