Document (#40014)

Author
Teich, E.
Degaetano-Ortlieb, S.
Fankhauser, P.
Kermes, H.
Lapshinova-Koltunski, E.
Title
The linguistic construal of disciplinarity : a data-mining approach using register features
Source
Journal of the Association for Information Science and Technology. 67(2016) no.7, pp. 1668-1678
Year
2016
Abstract
We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question of whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23452/abstract.
Theme
Automatic classification
Data Mining

Similar documents (content)

  1. Altinel, B.; Ganiz, M.C.: Semantic text classification : a survey of past and recent advances (2018) 0.11
    0.11339271 = sum of:
      0.11339271 = product of:
        0.40497395 = sum of:
          0.0401399 = weight(abstract_txt:methods in 1337) [ClassicSimilarity], result of:
            0.0401399 = score(doc=1337,freq=4.0), product of:
              0.08825517 = queryWeight, product of:
                1.0784713 = boost
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.019679476 = queryNorm
              0.4548164 = fieldWeight in 1337, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.0546875 = fieldNorm(doc=1337)
          0.020534452 = weight(abstract_txt:language in 1337) [ClassicSimilarity], result of:
            0.020534452 = score(doc=1337,freq=1.0), product of:
              0.089611694 = queryWeight, product of:
                1.086728 = boost
                4.1901574 = idf(docFreq=1792, maxDocs=43556)
                0.019679476 = queryNorm
              0.22914924 = fieldWeight in 1337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1901574 = idf(docFreq=1792, maxDocs=43556)
                0.0546875 = fieldNorm(doc=1337)
          0.0304147 = weight(abstract_txt:over in 1337) [ClassicSimilarity], result of:
            0.0304147 = score(doc=1337,freq=2.0), product of:
              0.09241767 = queryWeight, product of:
                1.103611 = boost
                4.255254 = idf(docFreq=1679, maxDocs=43556)
                0.019679476 = queryNorm
              0.3291005 = fieldWeight in 1337, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.255254 = idf(docFreq=1679, maxDocs=43556)
                0.0546875 = fieldNorm(doc=1337)
          0.1002369 = weight(abstract_txt:text in 1337) [ClassicSimilarity], result of:
            0.1002369 = score(doc=1337,freq=13.0), product of:
              0.12553853 = queryWeight, product of:
                1.575334 = boost
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.019679476 = queryNorm
              0.7984553 = fieldWeight in 1337, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.0546875 = fieldNorm(doc=1337)
          0.06389513 = weight(abstract_txt:corpus in 1337) [ClassicSimilarity], result of:
            0.06389513 = score(doc=1337,freq=1.0), product of:
              0.19099462 = queryWeight, product of:
                1.5865328 = boost
                6.1172824 = idf(docFreq=260, maxDocs=43556)
                0.019679476 = queryNorm
              0.33453888 = fieldWeight in 1337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1172824 = idf(docFreq=260, maxDocs=43556)
                0.0546875 = fieldNorm(doc=1337)
          0.066559754 = weight(abstract_txt:mining in 1337) [ClassicSimilarity], result of:
            0.066559754 = score(doc=1337,freq=1.0), product of:
              0.19626842 = queryWeight, product of:
                1.6082877 = boost
                6.201164 = idf(docFreq=239, maxDocs=43556)
                0.019679476 = queryNorm
              0.33912614 = fieldWeight in 1337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.201164 = idf(docFreq=239, maxDocs=43556)
                0.0546875 = fieldNorm(doc=1337)
          0.083193086 = weight(abstract_txt:linguistic in 1337) [ClassicSimilarity], result of:
            0.083193086 = score(doc=1337,freq=1.0), product of:
              0.26069412 = queryWeight, product of:
                2.270125 = boost
                5.835364 = idf(docFreq=345, maxDocs=43556)
                0.019679476 = queryNorm
              0.31912145 = fieldWeight in 1337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.835364 = idf(docFreq=345, maxDocs=43556)
                0.0546875 = fieldNorm(doc=1337)
        0.28 = coord(7/25)
    
  2. Castillo, C.; Baeza-Yates, R.: Web retrieval and mining (2009) 0.11
    0.108996354 = sum of:
      0.108996354 = product of:
        0.5449818 = sum of:
          0.034405626 = weight(abstract_txt:methods in 902) [ClassicSimilarity], result of:
            0.034405626 = score(doc=902,freq=1.0), product of:
              0.08825517 = queryWeight, product of:
                1.0784713 = boost
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.019679476 = queryNorm
              0.38984263 = fieldWeight in 902, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.09375 = fieldNorm(doc=902)
          0.1095345 = weight(abstract_txt:corpus in 902) [ClassicSimilarity], result of:
            0.1095345 = score(doc=902,freq=1.0), product of:
              0.19099462 = queryWeight, product of:
                1.5865328 = boost
                6.1172824 = idf(docFreq=260, maxDocs=43556)
                0.019679476 = queryNorm
              0.5734952 = fieldWeight in 902, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1172824 = idf(docFreq=260, maxDocs=43556)
                0.09375 = fieldNorm(doc=902)
          0.16136521 = weight(abstract_txt:mining in 902) [ClassicSimilarity], result of:
            0.16136521 = score(doc=902,freq=2.0), product of:
              0.19626842 = queryWeight, product of:
                1.6082877 = boost
                6.201164 = idf(docFreq=239, maxDocs=43556)
                0.019679476 = queryNorm
              0.8221659 = fieldWeight in 902, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.201164 = idf(docFreq=239, maxDocs=43556)
                0.09375 = fieldNorm(doc=902)
          0.05160844 = weight(abstract_txt:time in 902) [ClassicSimilarity], result of:
            0.05160844 = score(doc=902,freq=1.0), product of:
              0.13238275 = queryWeight, product of:
                1.6177069 = boost
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.019679476 = queryNorm
              0.38984263 = fieldWeight in 902, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.09375 = fieldNorm(doc=902)
          0.18806799 = weight(abstract_txt:1970s in 902) [ClassicSimilarity], result of:
            0.18806799 = score(doc=902,freq=1.0), product of:
              0.27386114 = queryWeight, product of:
                1.8997818 = boost
                7.3250937 = idf(docFreq=77, maxDocs=43556)
                0.019679476 = queryNorm
              0.6867275 = fieldWeight in 902, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3250937 = idf(docFreq=77, maxDocs=43556)
                0.09375 = fieldNorm(doc=902)
        0.2 = coord(5/25)
    
  3. Mao, J.; Cui, H.: Identifying bacterial biotope entities using sequence labeling : performance and feature analysis (2018) 0.09
    0.093276456 = sum of:
      0.093276456 = product of:
        0.3331302 = sum of:
          0.032437935 = weight(abstract_txt:methods in 748) [ClassicSimilarity], result of:
            0.032437935 = score(doc=748,freq=2.0), product of:
              0.08825517 = queryWeight, product of:
                1.0784713 = boost
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.019679476 = queryNorm
              0.36754715 = fieldWeight in 748, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.0625 = fieldNorm(doc=748)
          0.027283449 = weight(abstract_txt:various in 748) [ClassicSimilarity], result of:
            0.027283449 = score(doc=748,freq=1.0), product of:
              0.09907883 = queryWeight, product of:
                1.1426914 = boost
                4.405938 = idf(docFreq=1444, maxDocs=43556)
                0.019679476 = queryNorm
              0.27537113 = fieldWeight in 748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.405938 = idf(docFreq=1444, maxDocs=43556)
                0.0625 = fieldNorm(doc=748)
          0.0601005 = weight(abstract_txt:features in 748) [ClassicSimilarity], result of:
            0.0601005 = score(doc=748,freq=4.0), product of:
              0.10566879 = queryWeight, product of:
                1.1800811 = boost
                4.550104 = idf(docFreq=1250, maxDocs=43556)
                0.019679476 = queryNorm
              0.568763 = fieldWeight in 748, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.550104 = idf(docFreq=1250, maxDocs=43556)
                0.0625 = fieldNorm(doc=748)
          0.0324448 = weight(abstract_txt:scientific in 748) [ClassicSimilarity], result of:
            0.0324448 = score(doc=748,freq=1.0), product of:
              0.111210234 = queryWeight, product of:
                1.2106285 = boost
                4.667887 = idf(docFreq=1111, maxDocs=43556)
                0.019679476 = queryNorm
              0.29174295 = fieldWeight in 748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.667887 = idf(docFreq=1111, maxDocs=43556)
                0.0625 = fieldNorm(doc=748)
          0.031772245 = weight(abstract_txt:text in 748) [ClassicSimilarity], result of:
            0.031772245 = score(doc=748,freq=1.0), product of:
              0.12553853 = queryWeight, product of:
                1.575334 = boost
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.019679476 = queryNorm
              0.2530876 = fieldWeight in 748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0494018 = idf(docFreq=2063, maxDocs=43556)
                0.0625 = fieldNorm(doc=748)
          0.073023 = weight(abstract_txt:corpus in 748) [ClassicSimilarity], result of:
            0.073023 = score(doc=748,freq=1.0), product of:
              0.19099462 = queryWeight, product of:
                1.5865328 = boost
                6.1172824 = idf(docFreq=260, maxDocs=43556)
                0.019679476 = queryNorm
              0.38233015 = fieldWeight in 748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1172824 = idf(docFreq=260, maxDocs=43556)
                0.0625 = fieldNorm(doc=748)
          0.07606829 = weight(abstract_txt:mining in 748) [ClassicSimilarity], result of:
            0.07606829 = score(doc=748,freq=1.0), product of:
              0.19626842 = queryWeight, product of:
                1.6082877 = boost
                6.201164 = idf(docFreq=239, maxDocs=43556)
                0.019679476 = queryNorm
              0.38757274 = fieldWeight in 748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.201164 = idf(docFreq=239, maxDocs=43556)
                0.0625 = fieldNorm(doc=748)
        0.28 = coord(7/25)
    
  4. Ibekwe-SanJuan, F.: Semantic metadata annotation : tagging Medline abstracts for enhanced information access (2010) 0.09
    0.0929999 = sum of:
      0.0929999 = product of:
        0.4649995 = sum of:
          0.028383195 = weight(abstract_txt:methods in 947) [ClassicSimilarity], result of:
            0.028383195 = score(doc=947,freq=2.0), product of:
              0.08825517 = queryWeight, product of:
                1.0784713 = boost
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.019679476 = queryNorm
              0.32160378 = fieldWeight in 947, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.0546875 = fieldNorm(doc=947)
          0.040148392 = weight(abstract_txt:scientific in 947) [ClassicSimilarity], result of:
            0.040148392 = score(doc=947,freq=2.0), product of:
              0.111210234 = queryWeight, product of:
                1.2106285 = boost
                4.667887 = idf(docFreq=1111, maxDocs=43556)
                0.019679476 = queryNorm
              0.36101347 = fieldWeight in 947, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.667887 = idf(docFreq=1111, maxDocs=43556)
                0.0546875 = fieldNorm(doc=947)
          0.12879215 = weight(abstract_txt:lexico in 947) [ClassicSimilarity], result of:
            0.12879215 = score(doc=947,freq=1.0), product of:
              0.24189426 = queryWeight, product of:
                1.2625142 = boost
                9.735892 = idf(docFreq=6, maxDocs=43556)
                0.019679476 = queryNorm
              0.5324316 = fieldWeight in 947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.735892 = idf(docFreq=6, maxDocs=43556)
                0.0546875 = fieldNorm(doc=947)
          0.06389513 = weight(abstract_txt:corpus in 947) [ClassicSimilarity], result of:
            0.06389513 = score(doc=947,freq=1.0), product of:
              0.19099462 = queryWeight, product of:
                1.5865328 = boost
                6.1172824 = idf(docFreq=260, maxDocs=43556)
                0.019679476 = queryNorm
              0.33453888 = fieldWeight in 947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1172824 = idf(docFreq=260, maxDocs=43556)
                0.0546875 = fieldNorm(doc=947)
          0.20378062 = weight(abstract_txt:linguistic in 947) [ClassicSimilarity], result of:
            0.20378062 = score(doc=947,freq=6.0), product of:
              0.26069412 = queryWeight, product of:
                2.270125 = boost
                5.835364 = idf(docFreq=345, maxDocs=43556)
                0.019679476 = queryNorm
              0.78168476 = fieldWeight in 947, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.835364 = idf(docFreq=345, maxDocs=43556)
                0.0546875 = fieldNorm(doc=947)
        0.2 = coord(5/25)
    
  5. HaCohen-Kerner, Y.; Kass, A.; Peretz, A.: Initialism disambiguation : man versus machine (2013) 0.09
    0.09284739 = sum of:
      0.09284739 = product of:
        0.33159783 = sum of:
          0.07539864 = weight(abstract_txt:individually in 3092) [ClassicSimilarity], result of:
            0.07539864 = score(doc=3092,freq=1.0), product of:
              0.15486276 = queryWeight, product of:
                1.010176 = boost
                7.7899823 = idf(docFreq=48, maxDocs=43556)
                0.019679476 = queryNorm
              0.4868739 = fieldWeight in 3092, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7899823 = idf(docFreq=48, maxDocs=43556)
                0.0625 = fieldNorm(doc=3092)
          0.022937084 = weight(abstract_txt:methods in 3092) [ClassicSimilarity], result of:
            0.022937084 = score(doc=3092,freq=1.0), product of:
              0.08825517 = queryWeight, product of:
                1.0784713 = boost
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.019679476 = queryNorm
              0.2598951 = fieldWeight in 3092, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1583214 = idf(docFreq=1850, maxDocs=43556)
                0.0625 = fieldNorm(doc=3092)
          0.023467945 = weight(abstract_txt:language in 3092) [ClassicSimilarity], result of:
            0.023467945 = score(doc=3092,freq=1.0), product of:
              0.089611694 = queryWeight, product of:
                1.086728 = boost
                4.1901574 = idf(docFreq=1792, maxDocs=43556)
                0.019679476 = queryNorm
              0.26188484 = fieldWeight in 3092, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1901574 = idf(docFreq=1792, maxDocs=43556)
                0.0625 = fieldNorm(doc=3092)
          0.024578791 = weight(abstract_txt:over in 3092) [ClassicSimilarity], result of:
            0.024578791 = score(doc=3092,freq=1.0), product of:
              0.09241767 = queryWeight, product of:
                1.103611 = boost
                4.255254 = idf(docFreq=1679, maxDocs=43556)
                0.019679476 = queryNorm
              0.26595336 = fieldWeight in 3092, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.255254 = idf(docFreq=1679, maxDocs=43556)
                0.0625 = fieldNorm(doc=3092)
          0.038584623 = weight(abstract_txt:various in 3092) [ClassicSimilarity], result of:
            0.038584623 = score(doc=3092,freq=2.0), product of:
              0.09907883 = queryWeight, product of:
                1.1426914 = boost
                4.405938 = idf(docFreq=1444, maxDocs=43556)
                0.019679476 = queryNorm
              0.3894336 = fieldWeight in 3092, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.405938 = idf(docFreq=1444, maxDocs=43556)
                0.0625 = fieldNorm(doc=3092)
          0.07360778 = weight(abstract_txt:features in 3092) [ClassicSimilarity], result of:
            0.07360778 = score(doc=3092,freq=6.0), product of:
              0.10566879 = queryWeight, product of:
                1.1800811 = boost
                4.550104 = idf(docFreq=1250, maxDocs=43556)
                0.019679476 = queryNorm
              0.6965896 = fieldWeight in 3092, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.550104 = idf(docFreq=1250, maxDocs=43556)
                0.0625 = fieldNorm(doc=3092)
          0.073023 = weight(abstract_txt:corpus in 3092) [ClassicSimilarity], result of:
            0.073023 = score(doc=3092,freq=1.0), product of:
              0.19099462 = queryWeight, product of:
                1.5865328 = boost
                6.1172824 = idf(docFreq=260, maxDocs=43556)
                0.019679476 = queryNorm
              0.38233015 = fieldWeight in 3092, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1172824 = idf(docFreq=260, maxDocs=43556)
                0.0625 = fieldNorm(doc=3092)
        0.28 = coord(7/25)
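
The score breakdowns above follow the arithmetic of Lucene's ClassicSimilarity: each matching term contributes queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), with tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and the sum is scaled by coord (matched terms / query terms). The following is a minimal sketch that recomputes the first entry (doc 1337) from the values printed in the listing. The function and variable names are illustrative assumptions, not part of the catalog software, and fieldNorm is copied from the listing rather than derived from the field length.

```python
import math

# Constants copied from the explain output above.
MAX_DOCS = 43556
QUERY_NORM = 0.019679476


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm, query_norm=QUERY_NORM):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def document_score(terms, field_norm, total_query_terms):
    """Sum of term scores scaled by coord = matched / total query terms."""
    summed = sum(
        term_score(freq, doc_freq, boost, field_norm)
        for _term, freq, doc_freq, boost in terms
    )
    coord = len(terms) / total_query_terms
    return summed * coord


# (term, freq, docFreq, boost) for doc 1337, taken from the first entry.
DOC_1337_TERMS = [
    ("methods", 4.0, 1850, 1.0784713),
    ("language", 1.0, 1792, 1.086728),
    ("over", 2.0, 1679, 1.103611),
    ("text", 13.0, 2063, 1.575334),
    ("corpus", 1.0, 260, 1.5865328),
    ("mining", 1.0, 239, 1.6082877),
    ("linguistic", 1.0, 345, 2.270125),
]

if __name__ == "__main__":
    score = document_score(DOC_1337_TERMS, field_norm=0.0546875,
                           total_query_terms=25)
    print(f"recomputed score for doc 1337: {score:.8f}")
```

Because Lucene computes in single precision and this sketch uses double precision, the result should agree with the listed 0.11339271 only up to small rounding differences.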