Search (8 results, page 1 of 1)

  • × theme_ss:"Data Mining"
  • × year_i:[2010 TO 2020}
  1. Maaten, L. van den: Accelerating t-SNE using Tree-Based Algorithms (2014) 0.02
    0.016259817 = product of:
      0.032519635 = sum of:
        0.032519635 = product of:
          0.06503927 = sum of:
            0.06503927 = weight(_text_:n in 3886) [ClassicSimilarity], result of:
              0.06503927 = score(doc=3886,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33346266 = fieldWeight in 3886, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3886)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The paper investigates the acceleration of t-SNE-an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots-using two tree-based algorithms. In particular, the paper develops variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N*logN). Our experiments show that the resulting algorithms substantially accelerate t-SNE, and that they make it possible to learn embeddings of data sets with millions of objects. Somewhat counterintuitively, the Barnes-Hut variant of t-SNE appears to outperform the dual-tree variant.
  2. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: ¬The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.01
    0.013936987 = product of:
      0.027873974 = sum of:
        0.027873974 = product of:
          0.05574795 = sum of:
            0.05574795 = weight(_text_:n in 3015) [ClassicSimilarity], result of:
              0.05574795 = score(doc=3015,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.28582513 = fieldWeight in 3015, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3015)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use-both individually and collectively-over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
  3. Suakkaphong, N.; Zhang, Z.; Chen, H.: Disease named entity recognition using semisupervised learning and conditional random fields (2011) 0.01
    0.011614156 = product of:
      0.023228312 = sum of:
        0.023228312 = product of:
          0.046456624 = sum of:
            0.046456624 = weight(_text_:n in 4367) [ClassicSimilarity], result of:
              0.046456624 = score(doc=4367,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.23818761 = fieldWeight in 4367, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4367)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  4. Tu, Y.-N.; Hsu, S.-L.: Constructing conceptual trajectory maps to trace the development of research fields (2016) 0.01
    0.011614156 = product of:
      0.023228312 = sum of:
        0.023228312 = product of:
          0.046456624 = sum of:
            0.046456624 = weight(_text_:n in 3059) [ClassicSimilarity], result of:
              0.046456624 = score(doc=3059,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.23818761 = fieldWeight in 3059, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3059)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  5. Hallonsten, O.; Holmberg, D.: Analyzing structural stratification in the Swedish higher education system : data contextualization with policy-history analysis (2013) 0.01
    0.0076610697 = product of:
      0.0153221395 = sum of:
        0.0153221395 = product of:
          0.030644279 = sum of:
            0.030644279 = weight(_text_:22 in 668) [ClassicSimilarity], result of:
              0.030644279 = score(doc=668,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.19345059 = fieldWeight in 668, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=668)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 3.2013 19:43:01
  6. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.01
    0.0076610697 = product of:
      0.0153221395 = sum of:
        0.0153221395 = product of:
          0.030644279 = sum of:
            0.030644279 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
              0.030644279 = score(doc=1605,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.19345059 = fieldWeight in 1605, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1605)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22
  7. Fonseca, F.; Marcinkowski, M.; Davis, C.: Cyber-human systems of thought and understanding (2019) 0.01
    0.0076610697 = product of:
      0.0153221395 = sum of:
        0.0153221395 = product of:
          0.030644279 = sum of:
            0.030644279 = weight(_text_:22 in 5011) [ClassicSimilarity], result of:
              0.030644279 = score(doc=5011,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.19345059 = fieldWeight in 5011, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5011)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    7. 3.2019 16:32:22
  8. Jäger, L.: Von Big Data zu Big Brother (2018) 0.01
    0.0061288555 = product of:
      0.012257711 = sum of:
        0.012257711 = product of:
          0.024515422 = sum of:
            0.024515422 = weight(_text_:22 in 5234) [ClassicSimilarity], result of:
              0.024515422 = score(doc=5234,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.15476047 = fieldWeight in 5234, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5234)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 1.2018 11:33:49