Search (7 results, page 1 of 1)

  • × year_i:[2010 TO 2020}
  • × theme_ss:"Data Mining"
  1. Zhang, Z.; Li, Q.; Zeng, D.; Ga, H.: Extracting evolutionary communities in community question answering (2014) 0.03
    0.028653976 = product of:
      0.1146159 = sum of:
        0.1146159 = weight(_text_:evolution in 1286) [ClassicSimilarity], result of:
          0.1146159 = score(doc=1286,freq=8.0), product of:
            0.19585751 = queryWeight, product of:
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.03697776 = queryNorm
            0.5852004 = fieldWeight in 1286, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1286)
      0.25 = coord(1/4)
    
    Abstract
    With the rapid growth of Web 2.0, community question answering (CQA) has become a prevalent information seeking channel, in which users form interactive communities by posting questions and providing answers. Communities may evolve over time, because of changes in users' interests, activities, and new users joining the network. To better understand user interactions in CQA communities, it is necessary to analyze the community structures and track community evolution over time. Existing work in CQA focuses on question searching or content quality detection, and the important problems of community extraction and evolutionary pattern detection have not been studied. In this article, we propose a probabilistic community model (PCM) to extract overlapping community structures and capture their evolution patterns in CQA. The empirical results show that our algorithm appears to improve the community extraction quality. We show empirically, using the iPhone data set, that interesting community evolution patterns can be discovered, with each evolution pattern reflecting the variation of users' interests over time. Our analysis suggests that individual users could benefit to gain comprehensive information from tracking the transition of products. We also show that the communities provide a decision-making basis for business.
  2. Song, J.; Huang, Y.; Qi, X.; Li, Y.; Li, F.; Fu, K.; Huang, T.: Discovering hierarchical topic evolution in time-stamped documents (2016) 0.02
    0.024313705 = product of:
      0.09725482 = sum of:
        0.09725482 = weight(_text_:evolution in 2853) [ClassicSimilarity], result of:
          0.09725482 = score(doc=2853,freq=4.0), product of:
            0.19585751 = queryWeight, product of:
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.03697776 = queryNorm
            0.49655905 = fieldWeight in 2853, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.046875 = fieldNorm(doc=2853)
      0.25 = coord(1/4)
    
    Abstract
    The objective of this paper is to propose a hierarchical topic evolution model (HTEM) that can organize time-varying topics in a hierarchy and discover their evolutions with multiple timescales. In the proposed HTEM, topics near the root of the hierarchy are more abstract and also evolve in the longer timescales than those near the leaves. To achieve this goal, the distance-dependent Chinese restaurant process (ddCRP) is extended to a new nested process that is able to simultaneously model the dependencies among data and the relationship between clusters. The HTEM is proposed based on the new process for time-stamped documents, in which the timestamp is utilized to measure the dependencies among documents. Moreover, an efficient Gibbs sampler is developed for the proposed HTEM. Our experimental results on two popular real-world data sets verify that the proposed HTEM can capture coherent topics and discover their hierarchical evolutions. It also outperforms the baseline model in terms of likelihood on held-out data.
  3. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: ¬The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.02
    0.017192384 = product of:
      0.06876954 = sum of:
        0.06876954 = weight(_text_:evolution in 3015) [ClassicSimilarity], result of:
          0.06876954 = score(doc=3015,freq=2.0), product of:
            0.19585751 = queryWeight, product of:
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.03697776 = queryNorm
            0.35112026 = fieldWeight in 3015, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.29663 = idf(docFreq=601, maxDocs=44218)
              0.046875 = fieldNorm(doc=3015)
      0.25 = coord(1/4)
    
    Abstract
    We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use-both individually and collectively-over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.
  4. Hallonsten, O.; Holmberg, D.: Analyzing structural stratification in the Swedish higher education system : data contextualization with policy-history analysis (2013) 0.00
    0.0020874902 = product of:
      0.008349961 = sum of:
        0.008349961 = product of:
          0.025049882 = sum of:
            0.025049882 = weight(_text_:22 in 668) [ClassicSimilarity], result of:
              0.025049882 = score(doc=668,freq=2.0), product of:
                0.12948982 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03697776 = queryNorm
                0.19345059 = fieldWeight in 668, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=668)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Date
    22. 3.2013 19:43:01
  5. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.00
    0.0020874902 = product of:
      0.008349961 = sum of:
        0.008349961 = product of:
          0.025049882 = sum of:
            0.025049882 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
              0.025049882 = score(doc=1605,freq=2.0), product of:
                0.12948982 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03697776 = queryNorm
                0.19345059 = fieldWeight in 1605, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1605)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22
  6. Fonseca, F.; Marcinkowski, M.; Davis, C.: Cyber-human systems of thought and understanding (2019) 0.00
    0.0020874902 = product of:
      0.008349961 = sum of:
        0.008349961 = product of:
          0.025049882 = sum of:
            0.025049882 = weight(_text_:22 in 5011) [ClassicSimilarity], result of:
              0.025049882 = score(doc=5011,freq=2.0), product of:
                0.12948982 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03697776 = queryNorm
                0.19345059 = fieldWeight in 5011, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5011)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Date
    7. 3.2019 16:32:22
  7. Jäger, L.: Von Big Data zu Big Brother (2018) 0.00
    0.0016699921 = product of:
      0.0066799684 = sum of:
        0.0066799684 = product of:
          0.020039905 = sum of:
            0.020039905 = weight(_text_:22 in 5234) [ClassicSimilarity], result of:
              0.020039905 = score(doc=5234,freq=2.0), product of:
                0.12948982 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03697776 = queryNorm
                0.15476047 = fieldWeight in 5234, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5234)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Date
    22. 1.2018 11:33:49