Document (#37968)

Author
Ma, Z.
Sun, A.
Cong, G.
Title
On predicting the popularity of newly emerging hashtags in Twitter
Source
Journal of the American Society for Information Science and Technology. 64(2013) no.7, pp. 1399-1410
Year
2013
Abstract
Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
Theme
Automatic classification
Data Mining
Object
Twitter

Similar documents (content)

  1. Çelebi, A.; Özgür, A.: Segmenting hashtags and analyzing their grammatical structure (2018) 0.51
    0.51333064 = sum of:
      0.51333064 = product of:
        1.8333237 = sum of:
          0.02223335 = weight(abstract_txt:task in 4221) [ClassicSimilarity], result of:
            0.02223335 = score(doc=4221,freq=1.0), product of:
              0.07243166 = queryWeight, product of:
                1.2064793 = boost
                4.9112997 = idf(docFreq=884, maxDocs=44218)
                0.0122239655 = queryNorm
              0.30695623 = fieldWeight in 4221, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9112997 = idf(docFreq=884, maxDocs=44218)
                0.0625 = fieldNorm(doc=4221)
          0.063755214 = weight(abstract_txt:million in 4221) [ClassicSimilarity], result of:
            0.063755214 = score(doc=4221,freq=2.0), product of:
              0.11603589 = queryWeight, product of:
                1.527045 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0122239655 = queryNorm
              0.5494439 = fieldWeight in 4221, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0625 = fieldNorm(doc=4221)
          0.08193254 = weight(abstract_txt:tweets in 4221) [ClassicSimilarity], result of:
            0.08193254 = score(doc=4221,freq=1.0), product of:
              0.17280757 = queryWeight, product of:
                1.8635329 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0122239655 = queryNorm
              0.47412583 = fieldWeight in 4221, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0625 = fieldNorm(doc=4221)
          0.5836283 = weight(abstract_txt:hashtag in 4221) [ClassicSimilarity], result of:
            0.5836283 = score(doc=4221,freq=5.0), product of:
              0.42827544 = queryWeight, product of:
                3.593047 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0122239655 = queryNorm
              1.3627405 = fieldWeight in 4221, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0625 = fieldNorm(doc=4221)
          0.08688058 = weight(abstract_txt:features in 4221) [ClassicSimilarity], result of:
            0.08688058 = score(doc=4221,freq=2.0), product of:
              0.21654741 = queryWeight, product of:
                3.9027092 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0122239655 = queryNorm
              0.4012081 = fieldWeight in 4221, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0625 = fieldNorm(doc=4221)
          0.8084989 = weight(abstract_txt:hashtags in 4221) [ClassicSimilarity], result of:
            0.8084989 = score(doc=4221,freq=8.0), product of:
              0.5008313 = queryWeight, product of:
                4.4865904 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0122239655 = queryNorm
              1.6143138 = fieldWeight in 4221, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=4221)
          0.1863949 = weight(abstract_txt:twitter in 4221) [ClassicSimilarity], result of:
            0.1863949 = score(doc=4221,freq=1.0), product of:
              0.43111017 = queryWeight, product of:
                5.0981245 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0122239655 = queryNorm
              0.43236023 = fieldWeight in 4221, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=4221)
        0.28 = coord(7/25)
    
  2. Chang, H.-C.; Iyer, I.: Trends in Twitter hashtag applications : design features for value-added dimensions to future library catalogues (2012) 0.47
    0.47215346 = sum of:
      0.47215346 = product of:
        1.9673061 = sum of:
          0.017132726 = weight(abstract_txt:content in 5574) [ClassicSimilarity], result of:
            0.017132726 = score(doc=5574,freq=1.0), product of:
              0.052464977 = queryWeight, product of:
                1.0268108 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.0122239655 = queryNorm
              0.3265555 = fieldWeight in 5574, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.078125 = fieldNorm(doc=5574)
          0.14483762 = weight(abstract_txt:tweets in 5574) [ClassicSimilarity], result of:
            0.14483762 = score(doc=5574,freq=2.0), product of:
              0.17280757 = queryWeight, product of:
                1.8635329 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0122239655 = queryNorm
              0.83814394 = fieldWeight in 5574, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.078125 = fieldNorm(doc=5574)
          0.65251625 = weight(abstract_txt:hashtag in 5574) [ClassicSimilarity], result of:
            0.65251625 = score(doc=5574,freq=4.0), product of:
              0.42827544 = queryWeight, product of:
                3.593047 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0122239655 = queryNorm
              1.5235902 = fieldWeight in 5574, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.078125 = fieldNorm(doc=5574)
          0.07679231 = weight(abstract_txt:features in 5574) [ClassicSimilarity], result of:
            0.07679231 = score(doc=5574,freq=1.0), product of:
              0.21654741 = queryWeight, product of:
                3.9027092 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0122239655 = queryNorm
              0.35462123 = fieldWeight in 5574, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.078125 = fieldNorm(doc=5574)
          0.50531185 = weight(abstract_txt:hashtags in 5574) [ClassicSimilarity], result of:
            0.50531185 = score(doc=5574,freq=2.0), product of:
              0.5008313 = queryWeight, product of:
                4.4865904 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0122239655 = queryNorm
              1.0089462 = fieldWeight in 5574, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.078125 = fieldNorm(doc=5574)
          0.5707155 = weight(abstract_txt:twitter in 5574) [ClassicSimilarity], result of:
            0.5707155 = score(doc=5574,freq=6.0), product of:
              0.43111017 = queryWeight, product of:
                5.0981245 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0122239655 = queryNorm
              1.3238275 = fieldWeight in 5574, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.078125 = fieldNorm(doc=5574)
        0.24 = coord(6/25)
    
  3. Kong, S.; Ye, F.; Feng, L.; Zhao, Z.: Towards the prediction problems of bursting hashtags on Twitter (2015) 0.41
    0.40940267 = sum of:
      0.40940267 = product of:
        1.7058445 = sum of:
          0.008915731 = weight(abstract_txt:from in 2338) [ClassicSimilarity], result of:
            0.008915731 = score(doc=2338,freq=1.0), product of:
              0.034408525 = queryWeight, product of:
                1.0184374 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0122239655 = queryNorm
              0.259114 = fieldWeight in 2338, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.09375 = fieldNorm(doc=2338)
          0.037041184 = weight(abstract_txt:topics in 2338) [ClassicSimilarity], result of:
            0.037041184 = score(doc=2338,freq=1.0), product of:
              0.07768209 = queryWeight, product of:
                1.2494421 = boost
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.0122239655 = queryNorm
              0.47683042 = fieldWeight in 2338, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.09375 = fieldNorm(doc=2338)
          0.39150977 = weight(abstract_txt:hashtag in 2338) [ClassicSimilarity], result of:
            0.39150977 = score(doc=2338,freq=1.0), product of:
              0.42827544 = queryWeight, product of:
                3.593047 = boost
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.0122239655 = queryNorm
              0.9141542 = fieldWeight in 2338, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.7509775 = idf(docFreq=6, maxDocs=44218)
                0.09375 = fieldNorm(doc=2338)
          0.13032086 = weight(abstract_txt:features in 2338) [ClassicSimilarity], result of:
            0.13032086 = score(doc=2338,freq=2.0), product of:
              0.21654741 = queryWeight, product of:
                3.9027092 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0122239655 = queryNorm
              0.6018121 = fieldWeight in 2338, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.09375 = fieldNorm(doc=2338)
          0.74265367 = weight(abstract_txt:hashtags in 2338) [ClassicSimilarity], result of:
            0.74265367 = score(doc=2338,freq=3.0), product of:
              0.5008313 = queryWeight, product of:
                4.4865904 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0122239655 = queryNorm
              1.482842 = fieldWeight in 2338, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.09375 = fieldNorm(doc=2338)
          0.3954033 = weight(abstract_txt:twitter in 2338) [ClassicSimilarity], result of:
            0.3954033 = score(doc=2338,freq=2.0), product of:
              0.43111017 = queryWeight, product of:
                5.0981245 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0122239655 = queryNorm
              0.9171746 = fieldWeight in 2338, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.09375 = fieldNorm(doc=2338)
        0.24 = coord(6/25)
    
  4. Sedhai, S.; Sun, A.: An analysis of 14 Million tweets on hashtag-oriented spamming* (2017) 0.34
    0.34169328 = sum of:
      0.34169328 = product of:
        1.423722 = sum of:
          0.01370618 = weight(abstract_txt:content in 3683) [ClassicSimilarity], result of:
            0.01370618 = score(doc=3683,freq=1.0), product of:
              0.052464977 = queryWeight, product of:
                1.0268108 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.0122239655 = queryNorm
              0.2612444 = fieldWeight in 3683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.045081746 = weight(abstract_txt:million in 3683) [ClassicSimilarity], result of:
            0.045081746 = score(doc=3683,freq=1.0), product of:
              0.11603589 = queryWeight, product of:
                1.527045 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0122239655 = queryNorm
              0.38851553 = fieldWeight in 3683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.2167731 = weight(abstract_txt:tweets in 3683) [ClassicSimilarity], result of:
            0.2167731 = score(doc=3683,freq=7.0), product of:
              0.17280757 = queryWeight, product of:
                1.8635329 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0122239655 = queryNorm
              1.254419 = fieldWeight in 3683, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.09219479 = weight(abstract_txt:popularity in 3683) [ClassicSimilarity], result of:
            0.09219479 = score(doc=3683,freq=1.0), product of:
              0.21400627 = queryWeight, product of:
                2.539888 = boost
                6.892866 = idf(docFreq=121, maxDocs=44218)
                0.0122239655 = queryNorm
              0.43080413 = fieldWeight in 3683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.892866 = idf(docFreq=121, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.6391745 = weight(abstract_txt:hashtags in 3683) [ClassicSimilarity], result of:
            0.6391745 = score(doc=3683,freq=5.0), product of:
              0.5008313 = queryWeight, product of:
                4.4865904 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0122239655 = queryNorm
              1.2762271 = fieldWeight in 3683, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.41679165 = weight(abstract_txt:twitter in 3683) [ClassicSimilarity], result of:
            0.41679165 = score(doc=3683,freq=5.0), product of:
              0.43111017 = queryWeight, product of:
                5.0981245 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0122239655 = queryNorm
              0.96678686 = fieldWeight in 3683, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
        0.24 = coord(6/25)
    
  5. Yi, K.; Choi, N.; Kim, Y.S.: A content analysis of Twitter hyperlinks and their application in web resource indexing (2016) 0.21
    0.21393447 = sum of:
      0.21393447 = product of:
        1.0696723 = sum of:
          0.0059438203 = weight(abstract_txt:from in 3075) [ClassicSimilarity], result of:
            0.0059438203 = score(doc=3075,freq=1.0), product of:
              0.034408525 = queryWeight, product of:
                1.0184374 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0122239655 = queryNorm
              0.17274266 = fieldWeight in 3075, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=3075)
          0.045081746 = weight(abstract_txt:million in 3075) [ClassicSimilarity], result of:
            0.045081746 = score(doc=3075,freq=1.0), product of:
              0.11603589 = queryWeight, product of:
                1.527045 = boost
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0122239655 = queryNorm
              0.38851553 = fieldWeight in 3075, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2162485 = idf(docFreq=239, maxDocs=44218)
                0.0625 = fieldNorm(doc=3075)
          0.1158701 = weight(abstract_txt:tweets in 3075) [ClassicSimilarity], result of:
            0.1158701 = score(doc=3075,freq=2.0), product of:
              0.17280757 = queryWeight, product of:
                1.8635329 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0122239655 = queryNorm
              0.6705152 = fieldWeight in 3075, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0625 = fieldNorm(doc=3075)
          0.6391745 = weight(abstract_txt:hashtags in 3075) [ClassicSimilarity], result of:
            0.6391745 = score(doc=3075,freq=5.0), product of:
              0.5008313 = queryWeight, product of:
                4.4865904 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0122239655 = queryNorm
              1.2762271 = fieldWeight in 3075, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=3075)
          0.2636022 = weight(abstract_txt:twitter in 3075) [ClassicSimilarity], result of:
            0.2636022 = score(doc=3075,freq=2.0), product of:
              0.43111017 = queryWeight, product of:
                5.0981245 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0122239655 = queryNorm
              0.6114497 = fieldWeight in 3075, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0625 = fieldNorm(doc=3075)
        0.2 = coord(5/25)
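
The score breakdowns above follow the classic Lucene TF-IDF scheme: each matching term contributes queryWeight x fieldWeight, and the sum is scaled by the coord factor. As a minimal sketch, the Python below recomputes the score of entry 1 (doc 4221) from the freq, docFreq, boost, queryNorm and fieldNorm values shown in its breakdown. The function and variable names are illustrative assumptions, not part of the catalog software, and the result should agree with the listed 0.51333064 only up to floating-point rounding.

```python
import math

# Constants copied from the breakdown for entry 1 (doc 4221).
MAX_DOCS = 44218
QUERY_NORM = 0.0122239655
FIELD_NORM = 0.0625

# (term, freq in document, docFreq in index, query boost), as listed above.
TERMS = [
    ("task", 1.0, 884, 1.2064793),
    ("million", 2.0, 239, 1.527045),
    ("tweets", 1.0, 60, 1.8635329),
    ("hashtag", 5.0, 6, 3.593047),
    ("features", 2.0, 1283, 3.9027092),
    ("hashtags", 8.0, 12, 4.4865904),
    ("twitter", 1.0, 118, 5.0981245),
]


def idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = boost * term_idf * QUERY_NORM
    # tf is the square root of the raw term frequency.
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


def document_score(terms, matched, total):
    """Sum of the term scores, scaled by coord = matched / total query terms."""
    raw = sum(term_score(freq, df, boost) for _, freq, df, boost in terms)
    return raw * (matched / total)


score = document_score(TERMS, matched=7, total=25)
print(round(score, 4))
```

The coord factor (7/25 here) rewards documents that match more of the 25 query terms, which is why entry 1 outranks entry 2 even though entry 2 has the larger raw sum (1.9673061 against 1.8333237).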