Document (#37969)

Author
Ma, Z.
Sun, A.
Cong, G.
Title
On predicting the popularity of newly emerging hashtags in Twitter
Source
Journal of the American Society for Information Science and Technology. 64(2013) no.7, p.1399-1410
Year
2013
Abstract
Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
Theme
Automatic classification
Data Mining
Object
Twitter

Similar documents (content)

  1. Çelebi, A.; Özgür, A.: Segmenting hashtags and analyzing their grammatical structure (2018) 0.53
    0.5294567 = sum of:
      0.5294567 = product of:
        1.8909167 = sum of:
          0.021599608 = weight(abstract_txt:task in 222) [ClassicSimilarity], result of:
            0.021599608 = score(doc=222,freq=1.0), product of:
              0.07000562 = queryWeight, product of:
                1.2090192 = boost
                4.936657 = idf(docFreq=833, maxDocs=42740)
                0.011729156 = queryNorm
              0.30854106 = fieldWeight in 222, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.936657 = idf(docFreq=833, maxDocs=42740)
                0.0625 = fieldNorm(doc=222)
          0.063113056 = weight(abstract_txt:million in 222) [ClassicSimilarity], result of:
            0.063113056 = score(doc=222,freq=2.0), product of:
              0.11356342 = queryWeight, product of:
                1.5398768 = boost
                6.287612 = idf(docFreq=215, maxDocs=42740)
                0.011729156 = queryNorm
              0.5557516 = fieldWeight in 222, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.287612 = idf(docFreq=215, maxDocs=42740)
                0.0625 = fieldNorm(doc=222)
          0.084253795 = weight(abstract_txt:tweets in 222) [ClassicSimilarity], result of:
            0.084253795 = score(doc=222,freq=1.0), product of:
              0.1734717 = queryWeight, product of:
                1.9031854 = boost
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.011729156 = queryNorm
              0.48569188 = fieldWeight in 222, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.0625 = fieldNorm(doc=222)
          0.57919395 = weight(abstract_txt:hashtag in 222) [ClassicSimilarity], result of:
            0.57919395 = score(doc=222,freq=5.0), product of:
              0.41984802 = queryWeight, product of:
                3.6262558 = boost
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.011729156 = queryNorm
              1.3795325 = fieldWeight in 222, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.0625 = fieldNorm(doc=222)
          0.08437365 = weight(abstract_txt:features in 222) [ClassicSimilarity], result of:
            0.08437365 = score(doc=222,freq=2.0), product of:
              0.20924395 = queryWeight, product of:
                3.9104545 = boost
                4.5620384 = idf(docFreq=1212, maxDocs=42740)
                0.011729156 = queryNorm
              0.40323102 = fieldWeight in 222, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5620384 = idf(docFreq=1212, maxDocs=42740)
                0.0625 = fieldNorm(doc=222)
          0.8613417 = weight(abstract_txt:hashtags in 222) [ClassicSimilarity], result of:
            0.8613417 = score(doc=222,freq=8.0), product of:
              0.5147535 = queryWeight, product of:
                4.6364055 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.011729156 = queryNorm
              1.6733091 = fieldWeight in 222, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.0625 = fieldNorm(doc=222)
          0.19704093 = weight(abstract_txt:twitter in 222) [ClassicSimilarity], result of:
            0.19704093 = score(doc=222,freq=1.0), product of:
              0.44080555 = queryWeight, product of:
                5.254736 = boost
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.011729156 = queryNorm
              0.44700193 = fieldWeight in 222, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.0625 = fieldNorm(doc=222)
        0.28 = coord(7/25)
    
  2. Chang, H.-C.; Iyer, I.: Trends in Twitter hashtag applications : design features for value-added dimensions to future library catalogues (2012) 0.49
    0.48704028 = sum of:
      0.48704028 = product of:
        2.0293345 = sum of:
          0.01660743 = weight(abstract_txt:content in 1575) [ClassicSimilarity], result of:
            0.01660743 = score(doc=1575,freq=1.0), product of:
              0.050632644 = queryWeight, product of:
                1.0282105 = boost
                4.1983805 = idf(docFreq=1744, maxDocs=42740)
                0.011729156 = queryNorm
              0.32799846 = fieldWeight in 1575, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1983805 = idf(docFreq=1744, maxDocs=42740)
                0.078125 = fieldNorm(doc=1575)
          0.14894107 = weight(abstract_txt:tweets in 1575) [ClassicSimilarity], result of:
            0.14894107 = score(doc=1575,freq=2.0), product of:
              0.1734717 = queryWeight, product of:
                1.9031854 = boost
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.011729156 = queryNorm
              0.85859 = fieldWeight in 1575, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.078125 = fieldNorm(doc=1575)
          0.64755857 = weight(abstract_txt:hashtag in 1575) [ClassicSimilarity], result of:
            0.64755857 = score(doc=1575,freq=4.0), product of:
              0.41984802 = queryWeight, product of:
                3.6262558 = boost
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.011729156 = queryNorm
              1.5423642 = fieldWeight in 1575, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.078125 = fieldNorm(doc=1575)
          0.07457648 = weight(abstract_txt:features in 1575) [ClassicSimilarity], result of:
            0.07457648 = score(doc=1575,freq=1.0), product of:
              0.20924395 = queryWeight, product of:
                3.9104545 = boost
                4.5620384 = idf(docFreq=1212, maxDocs=42740)
                0.011729156 = queryNorm
              0.35640925 = fieldWeight in 1575, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5620384 = idf(docFreq=1212, maxDocs=42740)
                0.078125 = fieldNorm(doc=1575)
          0.5383386 = weight(abstract_txt:hashtags in 1575) [ClassicSimilarity], result of:
            0.5383386 = score(doc=1575,freq=2.0), product of:
              0.5147535 = queryWeight, product of:
                4.6364055 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.011729156 = queryNorm
              1.0458182 = fieldWeight in 1575, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.078125 = fieldNorm(doc=1575)
          0.60331225 = weight(abstract_txt:twitter in 1575) [ClassicSimilarity], result of:
            0.60331225 = score(doc=1575,freq=6.0), product of:
              0.44080555 = queryWeight, product of:
                5.254736 = boost
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.011729156 = queryNorm
              1.3686584 = fieldWeight in 1575, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.078125 = fieldNorm(doc=1575)
        0.24 = coord(6/25)
    
  3. Kong, S.; Ye, F.; Feng, L.; Zhao, Z.: Towards the prediction problems of bursting hashtags on Twitter (2015) 0.42
    0.42460564 = sum of:
      0.42460564 = product of:
        1.7691902 = sum of:
          0.008760833 = weight(abstract_txt:from in 4339) [ClassicSimilarity], result of:
            0.008760833 = score(doc=4339,freq=1.0), product of:
              0.03350957 = queryWeight, product of:
                1.0244642 = boost
                2.7887225 = idf(docFreq=7144, maxDocs=42740)
                0.011729156 = queryNorm
              0.26144272 = fieldWeight in 4339, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7887225 = idf(docFreq=7144, maxDocs=42740)
                0.09375 = fieldNorm(doc=4339)
          0.03615386 = weight(abstract_txt:topics in 4339) [ClassicSimilarity], result of:
            0.03615386 = score(doc=4339,freq=1.0), product of:
              0.07531439 = queryWeight, product of:
                1.2540237 = boost
                5.1204185 = idf(docFreq=693, maxDocs=42740)
                0.011729156 = queryNorm
              0.48003924 = fieldWeight in 4339, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1204185 = idf(docFreq=693, maxDocs=42740)
                0.09375 = fieldNorm(doc=4339)
          0.38853514 = weight(abstract_txt:hashtag in 4339) [ClassicSimilarity], result of:
            0.38853514 = score(doc=4339,freq=1.0), product of:
              0.41984802 = queryWeight, product of:
                3.6262558 = boost
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.011729156 = queryNorm
              0.9254185 = fieldWeight in 4339, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.09375 = fieldNorm(doc=4339)
          0.12656048 = weight(abstract_txt:features in 4339) [ClassicSimilarity], result of:
            0.12656048 = score(doc=4339,freq=2.0), product of:
              0.20924395 = queryWeight, product of:
                3.9104545 = boost
                4.5620384 = idf(docFreq=1212, maxDocs=42740)
                0.011729156 = queryNorm
              0.60484654 = fieldWeight in 4339, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5620384 = idf(docFreq=1212, maxDocs=42740)
                0.09375 = fieldNorm(doc=4339)
          0.79119295 = weight(abstract_txt:hashtags in 4339) [ClassicSimilarity], result of:
            0.79119295 = score(doc=4339,freq=3.0), product of:
              0.5147535 = queryWeight, product of:
                4.6364055 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.011729156 = queryNorm
              1.5370326 = fieldWeight in 4339, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.09375 = fieldNorm(doc=4339)
          0.41798696 = weight(abstract_txt:twitter in 4339) [ClassicSimilarity], result of:
            0.41798696 = score(doc=4339,freq=2.0), product of:
              0.44080555 = queryWeight, product of:
                5.254736 = boost
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.011729156 = queryNorm
              0.9482343 = fieldWeight in 4339, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.09375 = fieldNorm(doc=4339)
        0.24 = coord(6/25)
    
  4. Sedhai, S.; Sun, A.: An analysis of 14 Million tweets on hashtag-oriented spamming (2017) 0.36
    0.35864472 = sum of:
      0.35864472 = product of:
        1.494353 = sum of:
          0.013285944 = weight(abstract_txt:content in 5684) [ClassicSimilarity], result of:
            0.013285944 = score(doc=5684,freq=1.0), product of:
              0.050632644 = queryWeight, product of:
                1.0282105 = boost
                4.1983805 = idf(docFreq=1744, maxDocs=42740)
                0.011729156 = queryNorm
              0.26239878 = fieldWeight in 5684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1983805 = idf(docFreq=1744, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
          0.04462767 = weight(abstract_txt:million in 5684) [ClassicSimilarity], result of:
            0.04462767 = score(doc=5684,freq=1.0), product of:
              0.11356342 = queryWeight, product of:
                1.5398768 = boost
                6.287612 = idf(docFreq=215, maxDocs=42740)
                0.011729156 = queryNorm
              0.39297575 = fieldWeight in 5684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.287612 = idf(docFreq=215, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
          0.22291459 = weight(abstract_txt:tweets in 5684) [ClassicSimilarity], result of:
            0.22291459 = score(doc=5684,freq=7.0), product of:
              0.1734717 = queryWeight, product of:
                1.9031854 = boost
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.011729156 = queryNorm
              1.2850199 = fieldWeight in 5684, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
          0.09197744 = weight(abstract_txt:popularity in 5684) [ClassicSimilarity], result of:
            0.09197744 = score(doc=5684,freq=1.0), product of:
              0.21053305 = queryWeight, product of:
                2.567867 = boost
                6.9900618 = idf(docFreq=106, maxDocs=42740)
                0.011729156 = queryNorm
              0.43687886 = fieldWeight in 5684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9900618 = idf(docFreq=106, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
          0.68095046 = weight(abstract_txt:hashtags in 5684) [ClassicSimilarity], result of:
            0.68095046 = score(doc=5684,freq=5.0), product of:
              0.5147535 = queryWeight, product of:
                4.6364055 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.011729156 = queryNorm
              1.322867 = fieldWeight in 5684, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
          0.44059694 = weight(abstract_txt:twitter in 5684) [ClassicSimilarity], result of:
            0.44059694 = score(doc=5684,freq=5.0), product of:
              0.44080555 = queryWeight, product of:
                5.254736 = boost
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.011729156 = queryNorm
              0.99952674 = fieldWeight in 5684, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
        0.24 = coord(6/25)
    
  5. Yi, K.; Choi, N.; Kim, Y.S.: A content analysis of Twitter hyperlinks and their application in web resource indexing (2016) 0.23
    0.22584592 = sum of:
      0.22584592 = product of:
        1.1292295 = sum of:
          0.0058405558 = weight(abstract_txt:from in 5076) [ClassicSimilarity], result of:
            0.0058405558 = score(doc=5076,freq=1.0), product of:
              0.03350957 = queryWeight, product of:
                1.0244642 = boost
                2.7887225 = idf(docFreq=7144, maxDocs=42740)
                0.011729156 = queryNorm
              0.17429516 = fieldWeight in 5076, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7887225 = idf(docFreq=7144, maxDocs=42740)
                0.0625 = fieldNorm(doc=5076)
          0.04462767 = weight(abstract_txt:million in 5076) [ClassicSimilarity], result of:
            0.04462767 = score(doc=5076,freq=1.0), product of:
              0.11356342 = queryWeight, product of:
                1.5398768 = boost
                6.287612 = idf(docFreq=215, maxDocs=42740)
                0.011729156 = queryNorm
              0.39297575 = fieldWeight in 5076, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.287612 = idf(docFreq=215, maxDocs=42740)
                0.0625 = fieldNorm(doc=5076)
          0.11915286 = weight(abstract_txt:tweets in 5076) [ClassicSimilarity], result of:
            0.11915286 = score(doc=5076,freq=2.0), product of:
              0.1734717 = queryWeight, product of:
                1.9031854 = boost
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.011729156 = queryNorm
              0.686872 = fieldWeight in 5076, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.0625 = fieldNorm(doc=5076)
          0.68095046 = weight(abstract_txt:hashtags in 5076) [ClassicSimilarity], result of:
            0.68095046 = score(doc=5076,freq=5.0), product of:
              0.5147535 = queryWeight, product of:
                4.6364055 = boost
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.011729156 = queryNorm
              1.322867 = fieldWeight in 5076, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.465666 = idf(docFreq=8, maxDocs=42740)
                0.0625 = fieldNorm(doc=5076)
          0.27865797 = weight(abstract_txt:twitter in 5076) [ClassicSimilarity], result of:
            0.27865797 = score(doc=5076,freq=2.0), product of:
              0.44080555 = queryWeight, product of:
                5.254736 = boost
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.011729156 = queryNorm
              0.6321562 = fieldWeight in 5076, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.0625 = fieldNorm(doc=5076)
        0.2 = coord(5/25)
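
The score trees above can be recomputed by hand. The following is a minimal sketch, not the search engine's own code, that rebuilds the score of the first similar document (doc 222) from the numbers in its listing. It assumes the usual ClassicSimilarity formulas: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord (matched terms / query terms). The function and variable names are illustrative only, and small differences from the listed values are expected because the listing uses single-precision floats.

```python
import math

MAX_DOCS = 42740          # maxDocs from the listing
QUERY_NORM = 0.011729156  # queryNorm from the listing
QUERY_TERMS = 25          # denominator of coord(7/25)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as assumed above."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (term, boost, docFreq, freq) copied from the listing for doc 222
DOC_222_TERMS = [
    ("task", 1.2090192, 833, 1.0),
    ("million", 1.5398768, 215, 2.0),
    ("tweets", 1.9031854, 48, 1.0),
    ("hashtag", 3.6262558, 5, 5.0),
    ("features", 3.9104545, 1212, 2.0),
    ("hashtags", 4.6364055, 8, 8.0),
    ("twitter", 5.254736, 90, 1.0),
]
DOC_222_FIELD_NORM = 0.0625  # fieldNorm(doc=222)

term_sum = sum(
    term_score(boost, doc_freq, freq, DOC_222_FIELD_NORM)
    for _, boost, doc_freq, freq in DOC_222_TERMS
)
coord = len(DOC_222_TERMS) / QUERY_TERMS  # 7 of 25 query terms matched
doc_score = term_sum * coord

print(f"sum of term scores: {term_sum:.7f}")
print(f"coord: {coord:.2f}")
print(f"document score: {doc_score:.7f}")
```

Read this way, the ranking is driven mostly by the rare terms "hashtag" and "hashtags" (docFreq 5 and 8), whose high idf enters both queryWeight and fieldWeight. The coord factor then scales the sum down according to how few of the 25 query terms the document matches.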