Document (#40016)

Author
Teich, E.
Degaetano-Ortlieb, S.
Fankhauser, P.
Kermes, H.
Lapshinova-Koltunski, E.
Title
The linguistic construal of disciplinarity : a data-mining approach using register features
Source
Journal of the Association for Information Science and Technology. 67(2016) no.7, pp. 1668-1678
Year
2016
Abstract
We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question of whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23452/abstract.
Theme
Automatic classification
Data Mining

Similar documents (content)

  1. Altinel, B.; Ganiz, M.C.: Semantic text classification : a survey of past and recent advances (2018) 0.11
    0.11274475 = sum of:
      0.11274475 = product of:
        0.4026598 = sum of:
          0.039869625 = weight(abstract_txt:methods in 5051) [ClassicSimilarity], result of:
            0.039869625 = score(doc=5051,freq=4.0), product of:
              0.0879055 = queryWeight, product of:
                1.0810748 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.019608853 = queryNorm
              0.453551 = fieldWeight in 5051, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.020448774 = weight(abstract_txt:language in 5051) [ClassicSimilarity], result of:
            0.020448774 = score(doc=5051,freq=1.0), product of:
              0.08941 = queryWeight, product of:
                1.090287 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.019608853 = queryNorm
              0.22870791 = fieldWeight in 5051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.030232767 = weight(abstract_txt:over in 5051) [ClassicSimilarity], result of:
            0.030232767 = score(doc=5051,freq=2.0), product of:
              0.09209793 = queryWeight, product of:
                1.1065542 = boost
                4.244485 = idf(docFreq=1723, maxDocs=44218)
                0.019608853 = queryNorm
              0.3282676 = fieldWeight in 5051, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.244485 = idf(docFreq=1723, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.099986486 = weight(abstract_txt:text in 5051) [ClassicSimilarity], result of:
            0.099986486 = score(doc=5051,freq=13.0), product of:
              0.12539631 = queryWeight, product of:
                1.5813782 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.019608853 = queryNorm
              0.7973639 = fieldWeight in 5051, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.063408814 = weight(abstract_txt:corpus in 5051) [ClassicSimilarity], result of:
            0.063408814 = score(doc=5051,freq=1.0), product of:
              0.19012578 = queryWeight, product of:
                1.5898943 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.019608853 = queryNorm
              0.33350983 = fieldWeight in 5051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.06583985 = weight(abstract_txt:mining in 5051) [ClassicSimilarity], result of:
            0.06583985 = score(doc=5051,freq=1.0), product of:
              0.19495474 = queryWeight, product of:
                1.6099584 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.019608853 = queryNorm
              0.33771864 = fieldWeight in 5051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
          0.082873456 = weight(abstract_txt:linguistic in 5051) [ClassicSimilarity], result of:
            0.082873456 = score(doc=5051,freq=1.0), product of:
              0.26016486 = queryWeight, product of:
                2.277811 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.019608853 = queryNorm
              0.3185421 = fieldWeight in 5051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5051)
        0.28 = coord(7/25)
    
  2. Castillo, C.; Baeza-Yates, R.: Web retrieval and mining (2009) 0.11
    0.10790044 = sum of:
      0.10790044 = product of:
        0.5395022 = sum of:
          0.034173965 = weight(abstract_txt:methods in 3904) [ClassicSimilarity], result of:
            0.034173965 = score(doc=3904,freq=1.0), product of:
              0.0879055 = queryWeight, product of:
                1.0810748 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.019608853 = queryNorm
              0.388758 = fieldWeight in 3904, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.09375 = fieldNorm(doc=3904)
          0.10870083 = weight(abstract_txt:corpus in 3904) [ClassicSimilarity], result of:
            0.10870083 = score(doc=3904,freq=1.0), product of:
              0.19012578 = queryWeight, product of:
                1.5898943 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.019608853 = queryNorm
              0.57173115 = fieldWeight in 3904, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.09375 = fieldNorm(doc=3904)
          0.1596199 = weight(abstract_txt:mining in 3904) [ClassicSimilarity], result of:
            0.1596199 = score(doc=3904,freq=2.0), product of:
              0.19495474 = queryWeight, product of:
                1.6099584 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.019608853 = queryNorm
              0.8187536 = fieldWeight in 3904, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.09375 = fieldNorm(doc=3904)
          0.051319536 = weight(abstract_txt:time in 3904) [ClassicSimilarity], result of:
            0.051319536 = score(doc=3904,freq=1.0), product of:
              0.1319587 = queryWeight, product of:
                1.6222298 = boost
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.019608853 = queryNorm
              0.38890606 = fieldWeight in 3904, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.09375 = fieldNorm(doc=3904)
          0.18568796 = weight(abstract_txt:1970s in 3904) [ClassicSimilarity], result of:
            0.18568796 = score(doc=3904,freq=1.0), product of:
              0.27169082 = queryWeight, product of:
                1.9005759 = boost
                7.290168 = idf(docFreq=81, maxDocs=44218)
                0.019608853 = queryNorm
              0.6834532 = fieldWeight in 3904, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.290168 = idf(docFreq=81, maxDocs=44218)
                0.09375 = fieldNorm(doc=3904)
        0.2 = coord(5/25)
    
  3. Ibekwe-SanJuan, F.: Semantic metadata annotation : tagging Medline abstracts for enhanced information access (2010) 0.09
    0.092746876 = sum of:
      0.092746876 = product of:
        0.4637344 = sum of:
          0.028192082 = weight(abstract_txt:methods in 3949) [ClassicSimilarity], result of:
            0.028192082 = score(doc=3949,freq=2.0), product of:
              0.0879055 = queryWeight, product of:
                1.0810748 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.019608853 = queryNorm
              0.320709 = fieldWeight in 3949, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
          0.039536566 = weight(abstract_txt:scientific in 3949) [ClassicSimilarity], result of:
            0.039536566 = score(doc=3949,freq=2.0), product of:
              0.110136315 = queryWeight, product of:
                1.210077 = boost
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.019608853 = queryNorm
              0.35897845 = fieldWeight in 3949, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
          0.12959923 = weight(abstract_txt:lexico in 3949) [ClassicSimilarity], result of:
            0.12959923 = score(doc=3949,freq=1.0), product of:
              0.24303353 = queryWeight, product of:
                1.2710594 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.019608853 = queryNorm
              0.5332566 = fieldWeight in 3949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
          0.063408814 = weight(abstract_txt:corpus in 3949) [ClassicSimilarity], result of:
            0.063408814 = score(doc=3949,freq=1.0), product of:
              0.19012578 = queryWeight, product of:
                1.5898943 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.019608853 = queryNorm
              0.33350983 = fieldWeight in 3949, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
          0.20299768 = weight(abstract_txt:linguistic in 3949) [ClassicSimilarity], result of:
            0.20299768 = score(doc=3949,freq=6.0), product of:
              0.26016486 = queryWeight, product of:
                2.277811 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.019608853 = queryNorm
              0.78026557 = fieldWeight in 3949, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3949)
        0.2 = coord(5/25)
    
  4. Mao, J.; Cui, H.: Identifying bacterial biotope entities using sequence labeling : performance and feature analysis (2018) 0.09
    0.09252298 = sum of:
      0.09252298 = product of:
        0.3304392 = sum of:
          0.03221952 = weight(abstract_txt:methods in 4462) [ClassicSimilarity], result of:
            0.03221952 = score(doc=4462,freq=2.0), product of:
              0.0879055 = queryWeight, product of:
                1.0810748 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.019608853 = queryNorm
              0.36652455 = fieldWeight in 4462, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=4462)
          0.027100435 = weight(abstract_txt:various in 4462) [ClassicSimilarity], result of:
            0.027100435 = score(doc=4462,freq=1.0), product of:
              0.09868795 = queryWeight, product of:
                1.1454597 = boost
                4.3937173 = idf(docFreq=1484, maxDocs=44218)
                0.019608853 = queryNorm
              0.27460733 = fieldWeight in 4462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3937173 = idf(docFreq=1484, maxDocs=44218)
                0.0625 = fieldNorm(doc=4462)
          0.05976322 = weight(abstract_txt:features in 4462) [ClassicSimilarity], result of:
            0.05976322 = score(doc=4462,freq=4.0), product of:
              0.10532932 = queryWeight, product of:
                1.183375 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.019608853 = queryNorm
              0.56739396 = fieldWeight in 4462, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0625 = fieldNorm(doc=4462)
          0.03195037 = weight(abstract_txt:scientific in 4462) [ClassicSimilarity], result of:
            0.03195037 = score(doc=4462,freq=1.0), product of:
              0.110136315 = queryWeight, product of:
                1.210077 = boost
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.019608853 = queryNorm
              0.2900984 = fieldWeight in 4462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6415744 = idf(docFreq=1158, maxDocs=44218)
                0.0625 = fieldNorm(doc=4462)
          0.031692874 = weight(abstract_txt:text in 4462) [ClassicSimilarity], result of:
            0.031692874 = score(doc=4462,freq=1.0), product of:
              0.12539631 = queryWeight, product of:
                1.5813782 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.019608853 = queryNorm
              0.25274166 = fieldWeight in 4462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=4462)
          0.072467215 = weight(abstract_txt:corpus in 4462) [ClassicSimilarity], result of:
            0.072467215 = score(doc=4462,freq=1.0), product of:
              0.19012578 = queryWeight, product of:
                1.5898943 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.019608853 = queryNorm
              0.3811541 = fieldWeight in 4462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0625 = fieldNorm(doc=4462)
          0.075245544 = weight(abstract_txt:mining in 4462) [ClassicSimilarity], result of:
            0.075245544 = score(doc=4462,freq=1.0), product of:
              0.19495474 = queryWeight, product of:
                1.6099584 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.019608853 = queryNorm
              0.38596416 = fieldWeight in 4462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0625 = fieldNorm(doc=4462)
        0.28 = coord(7/25)
    
  5. HaCohen-Kerner, Y.; Kass, A.; Peretz, A.: Initialism disambiguation : man versus machine (2013) 0.09
    0.092384025 = sum of:
      0.092384025 = product of:
        0.32994294 = sum of:
          0.075370796 = weight(abstract_txt:individually in 1094) [ClassicSimilarity], result of:
            0.075370796 = score(doc=1094,freq=1.0), product of:
              0.15490735 = queryWeight, product of:
                1.0147727 = boost
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.019608853 = queryNorm
              0.48655403 = fieldWeight in 1094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7848644 = idf(docFreq=49, maxDocs=44218)
                0.0625 = fieldNorm(doc=1094)
          0.022782642 = weight(abstract_txt:methods in 1094) [ClassicSimilarity], result of:
            0.022782642 = score(doc=1094,freq=1.0), product of:
              0.0879055 = queryWeight, product of:
                1.0810748 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.019608853 = queryNorm
              0.259172 = fieldWeight in 1094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=1094)
          0.023370028 = weight(abstract_txt:language in 1094) [ClassicSimilarity], result of:
            0.023370028 = score(doc=1094,freq=1.0), product of:
              0.08941 = queryWeight, product of:
                1.090287 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.019608853 = queryNorm
              0.26138046 = fieldWeight in 1094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0625 = fieldNorm(doc=1094)
          0.024431767 = weight(abstract_txt:over in 1094) [ClassicSimilarity], result of:
            0.024431767 = score(doc=1094,freq=1.0), product of:
              0.09209793 = queryWeight, product of:
                1.1065542 = boost
                4.244485 = idf(docFreq=1723, maxDocs=44218)
                0.019608853 = queryNorm
              0.2652803 = fieldWeight in 1094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.244485 = idf(docFreq=1723, maxDocs=44218)
                0.0625 = fieldNorm(doc=1094)
          0.0383258 = weight(abstract_txt:various in 1094) [ClassicSimilarity], result of:
            0.0383258 = score(doc=1094,freq=2.0), product of:
              0.09868795 = queryWeight, product of:
                1.1454597 = boost
                4.3937173 = idf(docFreq=1484, maxDocs=44218)
                0.019608853 = queryNorm
              0.3883534 = fieldWeight in 1094, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3937173 = idf(docFreq=1484, maxDocs=44218)
                0.0625 = fieldNorm(doc=1094)
          0.0731947 = weight(abstract_txt:features in 1094) [ClassicSimilarity], result of:
            0.0731947 = score(doc=1094,freq=6.0), product of:
              0.10532932 = queryWeight, product of:
                1.183375 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.019608853 = queryNorm
              0.69491285 = fieldWeight in 1094, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0625 = fieldNorm(doc=1094)
          0.072467215 = weight(abstract_txt:corpus in 1094) [ClassicSimilarity], result of:
            0.072467215 = score(doc=1094,freq=1.0), product of:
              0.19012578 = queryWeight, product of:
                1.5898943 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.019608853 = queryNorm
              0.3811541 = fieldWeight in 1094, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0625 = fieldNorm(doc=1094)
        0.28 = coord(7/25)
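
The score trees above all follow the same pattern: each matching term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is scaled by coord (matched query terms over total query terms). The sketch below recomputes the score of entry 1 (doc 5051) from the values printed in its tree. It assumes the usual ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the numbers shown here. The function and variable names are illustrative and not part of any library.

```python
import math

# Constants copied from the explain tree of entry 1 (doc 5051).
MAX_DOCS = 44218
QUERY_NORM = 0.019608853
FIELD_NORM = 0.0546875

# (term, termFreq, docFreq, boost) for the seven matching terms of entry 1.
TERMS = [
    ("methods", 4.0, 1900, 1.0810748),
    ("language", 1.0, 1834, 1.090287),
    ("over", 2.0, 1723, 1.1065542),
    ("text", 13.0, 2106, 1.5813782),
    ("corpus", 1.0, 269, 1.5898943),
    ("mining", 1.0, 249, 1.6099584),
    ("linguistic", 1.0, 354, 2.277811),
]


def idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    """Term frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, boost, field_norm,
               max_docs=MAX_DOCS, query_norm=QUERY_NORM):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, total_query_terms):
    """Sum of the term scores, scaled by coord = matched / total."""
    total = sum(term_score(freq, doc_freq, boost, field_norm)
                for _, freq, doc_freq, boost in terms)
    coord = len(terms) / total_query_terms
    return total * coord


if __name__ == "__main__":
    # 25 is the total number of query terms, taken from coord(7/25).
    print(round(document_score(TERMS, FIELD_NORM, 25), 4))
```

Up to floating-point rounding, the result should agree with the 0.11274475 shown for entry 1. The other entries can be checked the same way by substituting their own term list and fieldNorm.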