Document (#40017)

Author
Teich, E.
Degaetano-Ortlieb, S.
Fankhauser, P.
Kermes, H.
Lapshinova-Koltunski, E.
Title
The linguistic construal of disciplinarity : a data-mining approach using register features
Source
Journal of the Association for Information Science and Technology. 67(2016) no.7, pp. 1668-1678
Year
2016
Abstract
We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question of whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23452/abstract.
Theme
Automatic classification
Data Mining

Similar documents (content)

  1. Altinel, B.; Ganiz, M.C.: Semantic text classification : a survey of past and recent advances (2018) 0.11
    0.11367263 = sum of:
      0.11367263 = product of:
        0.40597367 = sum of:
          0.04035678 = weight(abstract_txt:methods in 52) [ClassicSimilarity], result of:
            0.04035678 = score(doc=52,freq=4.0), product of:
              0.08857891 = queryWeight, product of:
                1.0813112 = boost
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.019665793 = queryNorm
              0.45560262 = fieldWeight in 52, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.0546875 = fieldNorm(doc=52)
          0.020568212 = weight(abstract_txt:language in 52) [ClassicSimilarity], result of:
            0.020568212 = score(doc=52,freq=1.0), product of:
              0.089716084 = queryWeight, product of:
                1.08823 = boost
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.019665793 = queryNorm
              0.22925891 = fieldWeight in 52, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.0546875 = fieldNorm(doc=52)
          0.03045126 = weight(abstract_txt:over in 52) [ClassicSimilarity], result of:
            0.03045126 = score(doc=52,freq=2.0), product of:
              0.0924981 = queryWeight, product of:
                1.1049737 = boost
                4.2566643 = idf(docFreq=1665, maxDocs=43254)
                0.019665793 = queryNorm
              0.32920957 = fieldWeight in 52, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2566643 = idf(docFreq=1665, maxDocs=43254)
                0.0546875 = fieldNorm(doc=52)
          0.10028263 = weight(abstract_txt:text in 52) [ClassicSimilarity], result of:
            0.10028263 = score(doc=52,freq=13.0), product of:
              0.12558538 = queryWeight, product of:
                1.5768875 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.019665793 = queryNorm
              0.7985216 = fieldWeight in 52, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.0546875 = fieldNorm(doc=52)
          0.064420566 = weight(abstract_txt:corpus in 52) [ClassicSimilarity], result of:
            0.064420566 = score(doc=52,freq=1.0), product of:
              0.19205354 = queryWeight, product of:
                1.5921966 = boost
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.019665793 = queryNorm
              0.33543023 = fieldWeight in 52, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.0546875 = fieldNorm(doc=52)
          0.066483974 = weight(abstract_txt:mining in 52) [ClassicSimilarity], result of:
            0.066483974 = score(doc=52,freq=1.0), product of:
              0.19613296 = queryWeight, product of:
                1.6090177 = boost
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.019665793 = queryNorm
              0.338974 = fieldWeight in 52, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.0546875 = fieldNorm(doc=52)
          0.08341027 = weight(abstract_txt:linguistic in 52) [ClassicSimilarity], result of:
            0.08341027 = score(doc=52,freq=1.0), product of:
              0.26116565 = queryWeight, product of:
                2.2739935 = boost
                5.8400345 = idf(docFreq=341, maxDocs=43254)
                0.019665793 = queryNorm
              0.3193769 = fieldWeight in 52, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8400345 = idf(docFreq=341, maxDocs=43254)
                0.0546875 = fieldNorm(doc=52)
        0.28 = coord(7/25)
    
  2. Castillo, C.; Baeza-Yates, R.: Web retrieval and mining (2009) 0.11
    0.1090845 = sum of:
      0.1090845 = product of:
        0.5454225 = sum of:
          0.034591526 = weight(abstract_txt:methods in 369) [ClassicSimilarity], result of:
            0.034591526 = score(doc=369,freq=1.0), product of:
              0.08857891 = queryWeight, product of:
                1.0813112 = boost
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.019665793 = queryNorm
              0.39051652 = fieldWeight in 369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.09375 = fieldNorm(doc=369)
          0.11043526 = weight(abstract_txt:corpus in 369) [ClassicSimilarity], result of:
            0.11043526 = score(doc=369,freq=1.0), product of:
              0.19205354 = queryWeight, product of:
                1.5921966 = boost
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.019665793 = queryNorm
              0.5750233 = fieldWeight in 369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.09375 = fieldNorm(doc=369)
          0.16118148 = weight(abstract_txt:mining in 369) [ClassicSimilarity], result of:
            0.16118148 = score(doc=369,freq=2.0), product of:
              0.19613296 = queryWeight, product of:
                1.6090177 = boost
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.019665793 = queryNorm
              0.821797 = fieldWeight in 369, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.09375 = fieldNorm(doc=369)
          0.05164276 = weight(abstract_txt:time in 369) [ClassicSimilarity], result of:
            0.05164276 = score(doc=369,freq=1.0), product of:
              0.13245058 = queryWeight, product of:
                1.6194148 = boost
                4.158956 = idf(docFreq=1836, maxDocs=43254)
                0.019665793 = queryNorm
              0.3899021 = fieldWeight in 369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.158956 = idf(docFreq=1836, maxDocs=43254)
                0.09375 = fieldNorm(doc=369)
          0.18757145 = weight(abstract_txt:1970s in 369) [ClassicSimilarity], result of:
            0.18757145 = score(doc=369,freq=1.0), product of:
              0.27339777 = queryWeight, product of:
                1.8996913 = boost
                7.318136 = idf(docFreq=77, maxDocs=43254)
                0.019665793 = queryNorm
              0.68607527 = fieldWeight in 369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.318136 = idf(docFreq=77, maxDocs=43254)
                0.09375 = fieldNorm(doc=369)
        0.2 = coord(5/25)
    
  3. Mao, J.; Cui, H.: Identifying bacterial biotope entities using sequence labeling : performance and feature analysis (2018) 0.09
    0.09352088 = sum of:
      0.09352088 = product of:
        0.33400315 = sum of:
          0.032613207 = weight(abstract_txt:methods in 463) [ClassicSimilarity], result of:
            0.032613207 = score(doc=463,freq=2.0), product of:
              0.08857891 = queryWeight, product of:
                1.0813112 = boost
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.019665793 = queryNorm
              0.3681825 = fieldWeight in 463, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.0625 = fieldNorm(doc=463)
          0.027379813 = weight(abstract_txt:various in 463) [ClassicSimilarity], result of:
            0.027379813 = score(doc=463,freq=1.0), product of:
              0.09931884 = queryWeight, product of:
                1.1449891 = boost
                4.410815 = idf(docFreq=1427, maxDocs=43254)
                0.019665793 = queryNorm
              0.27567592 = fieldWeight in 463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.410815 = idf(docFreq=1427, maxDocs=43254)
                0.0625 = fieldNorm(doc=463)
          0.06012336 = weight(abstract_txt:features in 463) [ClassicSimilarity], result of:
            0.06012336 = score(doc=463,freq=4.0), product of:
              0.105702884 = queryWeight, product of:
                1.181215 = boost
                4.550367 = idf(docFreq=1241, maxDocs=43254)
                0.019665793 = queryNorm
              0.56879586 = fieldWeight in 463, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.550367 = idf(docFreq=1241, maxDocs=43254)
                0.0625 = fieldNorm(doc=463)
          0.032494828 = weight(abstract_txt:scientific in 463) [ClassicSimilarity], result of:
            0.032494828 = score(doc=463,freq=1.0), product of:
              0.11133221 = queryWeight, product of:
                1.2122605 = boost
                4.669963 = idf(docFreq=1101, maxDocs=43254)
                0.019665793 = queryNorm
              0.29187268 = fieldWeight in 463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.669963 = idf(docFreq=1101, maxDocs=43254)
                0.0625 = fieldNorm(doc=463)
          0.03178674 = weight(abstract_txt:text in 463) [ClassicSimilarity], result of:
            0.03178674 = score(doc=463,freq=1.0), product of:
              0.12558538 = queryWeight, product of:
                1.5768875 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.019665793 = queryNorm
              0.25310862 = fieldWeight in 463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.0625 = fieldNorm(doc=463)
          0.07362351 = weight(abstract_txt:corpus in 463) [ClassicSimilarity], result of:
            0.07362351 = score(doc=463,freq=1.0), product of:
              0.19205354 = queryWeight, product of:
                1.5921966 = boost
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.019665793 = queryNorm
              0.38334885 = fieldWeight in 463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.0625 = fieldNorm(doc=463)
          0.07598168 = weight(abstract_txt:mining in 463) [ClassicSimilarity], result of:
            0.07598168 = score(doc=463,freq=1.0), product of:
              0.19613296 = queryWeight, product of:
                1.6090177 = boost
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.019665793 = queryNorm
              0.38739884 = fieldWeight in 463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.0625 = fieldNorm(doc=463)
        0.28 = coord(7/25)
    
  4. Ibekwe-SanJuan, F.: Semantic metadata annotation : tagging Medline abstracts for enhanced information access (2010) 0.09
    0.09320458 = sum of:
      0.09320458 = product of:
        0.4660229 = sum of:
          0.028536556 = weight(abstract_txt:methods in 414) [ClassicSimilarity], result of:
            0.028536556 = score(doc=414,freq=2.0), product of:
              0.08857891 = queryWeight, product of:
                1.0813112 = boost
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.019665793 = queryNorm
              0.3221597 = fieldWeight in 414, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.0546875 = fieldNorm(doc=414)
          0.0402103 = weight(abstract_txt:scientific in 414) [ClassicSimilarity], result of:
            0.0402103 = score(doc=414,freq=2.0), product of:
              0.11133221 = queryWeight, product of:
                1.2122605 = boost
                4.669963 = idf(docFreq=1101, maxDocs=43254)
                0.019665793 = queryNorm
              0.36117402 = fieldWeight in 414, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.669963 = idf(docFreq=1101, maxDocs=43254)
                0.0546875 = fieldNorm(doc=414)
          0.12854287 = weight(abstract_txt:lexico in 414) [ClassicSimilarity], result of:
            0.12854287 = score(doc=414,freq=1.0), product of:
              0.24159871 = queryWeight, product of:
                1.2627513 = boost
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.019665793 = queryNorm
              0.53205115 = fieldWeight in 414, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.728935 = idf(docFreq=6, maxDocs=43254)
                0.0546875 = fieldNorm(doc=414)
          0.064420566 = weight(abstract_txt:corpus in 414) [ClassicSimilarity], result of:
            0.064420566 = score(doc=414,freq=1.0), product of:
              0.19205354 = queryWeight, product of:
                1.5921966 = boost
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.019665793 = queryNorm
              0.33543023 = fieldWeight in 414, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.0546875 = fieldNorm(doc=414)
          0.20431261 = weight(abstract_txt:linguistic in 414) [ClassicSimilarity], result of:
            0.20431261 = score(doc=414,freq=6.0), product of:
              0.26116565 = queryWeight, product of:
                2.2739935 = boost
                5.8400345 = idf(docFreq=341, maxDocs=43254)
                0.019665793 = queryNorm
              0.7823104 = fieldWeight in 414, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.8400345 = idf(docFreq=341, maxDocs=43254)
                0.0546875 = fieldNorm(doc=414)
        0.2 = coord(5/25)
    
  5. HaCohen-Kerner, Y.; Kass, A.; Peretz, A.: Initialism disambiguation : man versus machine (2013) 0.09
    0.09306316 = sum of:
      0.09306316 = product of:
        0.33236843 = sum of:
          0.07521237 = weight(abstract_txt:individually in 2559) [ClassicSimilarity], result of:
            0.07521237 = score(doc=2559,freq=1.0), product of:
              0.15461828 = queryWeight, product of:
                1.0101851 = boost
                7.783025 = idf(docFreq=48, maxDocs=43254)
                0.019665793 = queryNorm
              0.48643905 = fieldWeight in 2559, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.783025 = idf(docFreq=48, maxDocs=43254)
                0.0625 = fieldNorm(doc=2559)
          0.023061018 = weight(abstract_txt:methods in 2559) [ClassicSimilarity], result of:
            0.023061018 = score(doc=2559,freq=1.0), product of:
              0.08857891 = queryWeight, product of:
                1.0813112 = boost
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.019665793 = queryNorm
              0.26034436 = fieldWeight in 2559, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1655097 = idf(docFreq=1824, maxDocs=43254)
                0.0625 = fieldNorm(doc=2559)
          0.023506528 = weight(abstract_txt:language in 2559) [ClassicSimilarity], result of:
            0.023506528 = score(doc=2559,freq=1.0), product of:
              0.089716084 = queryWeight, product of:
                1.08823 = boost
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.019665793 = queryNorm
              0.2620102 = fieldWeight in 2559, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.192163 = idf(docFreq=1776, maxDocs=43254)
                0.0625 = fieldNorm(doc=2559)
          0.024608335 = weight(abstract_txt:over in 2559) [ClassicSimilarity], result of:
            0.024608335 = score(doc=2559,freq=1.0), product of:
              0.0924981 = queryWeight, product of:
                1.1049737 = boost
                4.2566643 = idf(docFreq=1665, maxDocs=43254)
                0.019665793 = queryNorm
              0.26604152 = fieldWeight in 2559, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2566643 = idf(docFreq=1665, maxDocs=43254)
                0.0625 = fieldNorm(doc=2559)
          0.038720902 = weight(abstract_txt:various in 2559) [ClassicSimilarity], result of:
            0.038720902 = score(doc=2559,freq=2.0), product of:
              0.09931884 = queryWeight, product of:
                1.1449891 = boost
                4.410815 = idf(docFreq=1427, maxDocs=43254)
                0.019665793 = queryNorm
              0.38986462 = fieldWeight in 2559, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.410815 = idf(docFreq=1427, maxDocs=43254)
                0.0625 = fieldNorm(doc=2559)
          0.07363578 = weight(abstract_txt:features in 2559) [ClassicSimilarity], result of:
            0.07363578 = score(doc=2559,freq=6.0), product of:
              0.105702884 = queryWeight, product of:
                1.181215 = boost
                4.550367 = idf(docFreq=1241, maxDocs=43254)
                0.019665793 = queryNorm
              0.6966298 = fieldWeight in 2559, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.550367 = idf(docFreq=1241, maxDocs=43254)
                0.0625 = fieldNorm(doc=2559)
          0.07362351 = weight(abstract_txt:corpus in 2559) [ClassicSimilarity], result of:
            0.07362351 = score(doc=2559,freq=1.0), product of:
              0.19205354 = queryWeight, product of:
                1.5921966 = boost
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.019665793 = queryNorm
              0.38334885 = fieldWeight in 2559, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1335816 = idf(docFreq=254, maxDocs=43254)
                0.0625 = fieldNorm(doc=2559)
        0.28 = coord(7/25)
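
The score breakdowns above follow the classic Lucene TF-IDF pattern: each matching term contributes queryWeight times fieldWeight, the contributions are summed, and the sum is scaled by the coord factor (matched query terms over total query terms). The following is a minimal Python sketch that recomputes the first entry (doc 52) from the numbers listed in its breakdown. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which is consistent with the values shown; the function names are illustrative and not part of any library, and the search engine itself works in single-precision floats, so the last digits may differ slightly.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity-style idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in each "weight(...)" node above
    tf = math.sqrt(freq)                          # tf(freq) = sqrt(termFreq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm  # boost * idf * queryNorm
    field_weight = tf * term_idf * field_norm     # tf * idf * fieldNorm
    return query_weight * field_weight

def document_score(terms, matched, total_query_terms,
                   max_docs, query_norm, field_norm):
    # Sum the per-term scores, then scale by coord(matched / total).
    raw = sum(
        term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm)
        for _term, freq, doc_freq, boost in terms
    )
    return raw * (matched / total_query_terms)

# Values copied from the breakdown for doc 52 (entry 1 above).
MAX_DOCS = 43254
QUERY_NORM = 0.019665793
FIELD_NORM_DOC_52 = 0.0546875

# (term, termFreq, docFreq, boost)
TERMS_DOC_52 = [
    ("methods", 4.0, 1824, 1.0813112),
    ("language", 1.0, 1776, 1.08823),
    ("over", 2.0, 1665, 1.1049737),
    ("text", 13.0, 2048, 1.5768875),
    ("corpus", 1.0, 254, 1.5921966),
    ("mining", 1.0, 238, 1.6090177),
    ("linguistic", 1.0, 341, 2.2739935),
]

score_doc_52 = document_score(
    TERMS_DOC_52, matched=7, total_query_terms=25,
    max_docs=MAX_DOCS, query_norm=QUERY_NORM, field_norm=FIELD_NORM_DOC_52,
)
print(f"recomputed score for doc 52: {score_doc_52:.6f}")
```

The recomputed value should land close to the 0.11367263 reported for entry 1. The same functions can be pointed at the other entries by swapping in their term lists, fieldNorm, and coord values (for example 5/25 for entries 2 and 4).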