Search (5 results, page 1 of 1)

  • theme_ss:"Data Mining"
  • theme_ss:"Automatisches Klassifizieren"
  1. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.00
    0.0025239778 = product of:
      0.010095911 = sum of:
        0.010095911 = weight(_text_:information in 3464) [ClassicSimilarity], result of:
          0.010095911 = score(doc=3464,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.16457605 = fieldWeight in 3464, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3464)
      0.25 = coord(1/4)
    
    Abstract
    We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, pp. 1105-1119
  2. Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.00
    0.0021033147 = product of:
      0.008413259 = sum of:
        0.008413259 = weight(_text_:information in 5997) [ClassicSimilarity], result of:
          0.008413259 = score(doc=5997,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.13714671 = fieldWeight in 5997, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5997)
      0.25 = coord(1/4)
    
    Abstract
    Given the huge amount of information on the internet and in practically every domain of knowledge today, knowledge discovery calls for automation. The book deals with methods from classification and data analysis that respond effectively to this rapidly growing challenge. The interested reader will find new methodological insights as well as applications in economics, management science, finance, and marketing, and in pattern recognition, biology, health, and archaeology.
    Content
    Data Analysis, Statistics, and Classification.- Pattern Recognition and Automation.- Data Mining, Information Processing, and Automation.- New Media, Web Mining, and Automation.- Applications in Management Science, Finance, and Marketing.- Applications in Medicine, Biology, Archaeology, and Others.- Author Index.- Subject Index.
  3. Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.00
    0.0021033147 = product of:
      0.008413259 = sum of:
        0.008413259 = weight(_text_:information in 967) [ClassicSimilarity], result of:
          0.008413259 = score(doc=967,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.13714671 = fieldWeight in 967, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.25 = coord(1/4)
    
    Abstract
    Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., Naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.7, pp. 1399-1410
  4. Wu, K.J.; Chen, M.-C.; Sun, Y.: Automatic topics discovery from hyperlinked documents (2004) 0.00
    0.0017847219 = product of:
      0.0071388874 = sum of:
        0.0071388874 = weight(_text_:information in 2563) [ClassicSimilarity], result of:
          0.0071388874 = score(doc=2563,freq=2.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.116372846 = fieldWeight in 2563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2563)
      0.25 = coord(1/4)
    
    Source
    Information processing and management. 40(2004) no.2, pp. 239-255
  5. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.00
    0.0017847219 = product of:
      0.0071388874 = sum of:
        0.0071388874 = weight(_text_:information in 3015) [ClassicSimilarity], result of:
          0.0071388874 = score(doc=3015,freq=2.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.116372846 = fieldWeight in 3015, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3015)
      0.25 = coord(1/4)
    
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.7, pp. 1668-1678
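
The score breakdown listed with each result can be recomputed by hand. The sketch below assumes the default Lucene ClassicSimilarity formulas, tf = sqrt(termFreq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are consistent with the numbers shown above. The function name and parameter names are hypothetical, and queryNorm, fieldNorm and coord are simply copied from the listing rather than derived.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute a single-term score from the values shown in one explain tree.

    Assumes the default ClassicSimilarity tf and idf formulas; query_norm,
    field_norm and coord are taken as given from the listing.
    """
    tf = math.sqrt(freq)                             # tf(freq) = sqrt(termFreq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                  # "queryWeight" line
    field_weight = tf * idf * field_norm             # "fieldWeight" line
    return coord * query_weight * field_weight       # "product of" at the top


# Values copied from result 1 (doc 3464) in the listing.
score_doc_3464 = classic_similarity_score(
    freq=4.0, doc_freq=20772, max_docs=44218,
    query_norm=0.034944877, field_norm=0.046875, coord=0.25,
)

# Values copied from result 4 (doc 2563): same term, lower term frequency.
score_doc_2563 = classic_similarity_score(
    freq=2.0, doc_freq=20772, max_docs=44218,
    query_norm=0.034944877, field_norm=0.046875, coord=0.25,
)

print(score_doc_3464, score_doc_2563)
```

The results should agree with the listed 0.0025239778 and 0.0017847219 up to single-precision rounding, since Lucene computes these values as 32-bit floats. Results 2 and 3 have the same term frequency as result 1 but a smaller fieldNorm (0.0390625), which is the only reason they rank lower.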