Document (#40827)

Author
Bandaragoda, T.R.
Silva, D. de
Alahakoon, D.
Title
Automatic event detection in microblogs using incremental machine learning
Source
Journal of the Association for Information Science and Technology. 68(2017) no.10, p.2394-2411
Year
2017
Abstract
The global popularity of microblogs has led to an increasing accumulation of large volumes of text data on microblogging platforms such as Twitter. These corpora are untapped resources for understanding social expressions on diverse subjects. Microblog analysis aims to unlock the value of such expressions by discovering insights and events of significance hidden among swathes of text. Besides velocity, the key challenges in microblog analysis are diversity of content, brevity, absence of structure and time-sensitivity. In this paper, we propose an unsupervised incremental machine learning and event detection technique to address these challenges. The proposed technique separates a microblog discussion into topics to address the key problem of diversity. It maintains a record of the evolution of each topic over time. Brevity, time-sensitivity and unstructured nature are addressed by these individual topic pathways, which together generate a temporal, topic-driven structure of a microblog discussion. The proposed event detection method continuously monitors these topic pathways for events of significance, using multiple domain-independent event indicators. The autonomous nature of topic separation, topic pathway generation, new topic identification and event detection makes the proposed technique suitable for extensive applications in microblog analysis. We demonstrate these capabilities on tweets containing #microsoft and tweets containing #obama.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23896/full.
Theme
Internet

Similar documents (author)

  1. Silva, M.: Creating electronic environments for learning (1998) 4.69
    4.6920204 = sum of:
      4.6920204 = weight(author_txt:silva in 2785) [ClassicSimilarity], result of:
        4.6920204 = fieldWeight in 2785, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5072327 = idf(docFreq=65, maxDocs=44218)
          0.625 = fieldNorm(doc=2785)
    
  2. Silva, A.J.: Ein Netz von Erinnerungen (2018) 4.69
    4.6920204 = sum of:
      4.6920204 = weight(author_txt:silva in 4194) [ClassicSimilarity], result of:
        4.6920204 = fieldWeight in 4194, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5072327 = idf(docFreq=65, maxDocs=44218)
          0.625 = fieldNorm(doc=4194)
    
  3. Silva, A.J.: Das Gedächtnisnetz (2018) 4.69
    4.6920204 = sum of:
      4.6920204 = weight(author_txt:silva in 4421) [ClassicSimilarity], result of:
        4.6920204 = fieldWeight in 4421, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.5072327 = idf(docFreq=65, maxDocs=44218)
          0.625 = fieldNorm(doc=4421)
    
  4. Silva, A.M. Da -> Da Silva, A.M.: 3.98
    3.9813113 = sum of:
      3.9813113 = weight(author_txt:silva in 2168) [ClassicSimilarity], result of:
        3.9813113 = fieldWeight in 2168, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.5072327 = idf(docFreq=65, maxDocs=44218)
          0.375 = fieldNorm(doc=2168)
    
  5. Lucas da Silva, D. -> Silva, D.L da: 3.98
    3.9813113 = sum of:
      3.9813113 = weight(author_txt:silva in 885) [ClassicSimilarity], result of:
        3.9813113 = fieldWeight in 885, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.5072327 = idf(docFreq=65, maxDocs=44218)
          0.375 = fieldNorm(doc=885)
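
The author scores listed above can be reproduced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are hypothetical, and fieldNorm is copied from the listing rather than derived.

```python
import math

def classic_tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw term frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency as assumed for ClassicSimilarity.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm (fieldNorm taken from the listing as-is).
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Entries 1-3: one occurrence of "silva", fieldNorm 0.625.
w1 = field_weight(1.0, 65, 44218, 0.625)
# Entries 4-5: two occurrences of "silva", fieldNorm 0.375.
w2 = field_weight(2.0, 65, 44218, 0.375)

print(round(w1, 3))  # ≈ 4.692, as in entries 1-3
print(round(w2, 3))  # ≈ 3.981, as in entries 4-5
```

The listing uses single-precision arithmetic, so the last digits may differ slightly from this double-precision sketch.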
    

Similar documents (content)

  1. Efron, M.: Information search and retrieval in microblogs (2011) 0.26
    0.2595842 = sum of:
      0.2595842 = product of:
        1.0816008 = sum of:
          0.023737501 = weight(abstract_txt:discussion in 4455) [ClassicSimilarity], result of:
            0.023737501 = score(doc=4455,freq=1.0), product of:
              0.07382815 = queryWeight, product of:
                1.0387305 = boost
                5.144379 = idf(docFreq=700, maxDocs=44218)
                0.013816121 = queryNorm
              0.3215237 = fieldWeight in 4455, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.144379 = idf(docFreq=700, maxDocs=44218)
                0.0625 = fieldNorm(doc=4455)
          0.012754715 = weight(abstract_txt:analysis in 4455) [ClassicSimilarity], result of:
            0.012754715 = score(doc=4455,freq=1.0), product of:
              0.055856828 = queryWeight, product of:
                1.1065618 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.013816121 = queryNorm
              0.22834657 = fieldWeight in 4455, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.0625 = fieldNorm(doc=4455)
          0.01867014 = weight(abstract_txt:time in 4455) [ClassicSimilarity], result of:
            0.01867014 = score(doc=4455,freq=1.0), product of:
              0.07201022 = queryWeight, product of:
                1.2564193 = boost
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.013816121 = queryNorm
              0.2592707 = fieldWeight in 4455, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.0625 = fieldNorm(doc=4455)
          0.014124851 = weight(abstract_txt:these in 4455) [ClassicSimilarity], result of:
            0.014124851 = score(doc=4455,freq=1.0), product of:
              0.070887215 = queryWeight, product of:
                1.6093328 = boost
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.013816121 = queryNorm
              0.19925809 = fieldWeight in 4455, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.0625 = fieldNorm(doc=4455)
          0.25037462 = weight(abstract_txt:microblogs in 4455) [ClassicSimilarity], result of:
            0.25037462 = score(doc=4455,freq=3.0), product of:
              0.24619834 = queryWeight, product of:
                1.8968564 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.013816121 = queryNorm
              1.016963 = fieldWeight in 4455, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=4455)
          0.761939 = weight(abstract_txt:microblog in 4455) [ClassicSimilarity], result of:
            0.761939 = score(doc=4455,freq=5.0), product of:
              0.59183705 = queryWeight, product of:
                4.6501074 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.013816121 = queryNorm
              1.2874135 = fieldWeight in 4455, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=4455)
        0.24 = coord(6/25)
    
  2. Jansen, B.J.; Zhang, M.; Sobel, K.; Chowdury, A.: Twitter power : tweets as electronic word of mouth (2009) 0.25
    0.24565297 = sum of:
      0.24565297 = product of:
        1.0235541 = sum of:
          0.043199457 = weight(abstract_txt:containing in 3157) [ClassicSimilarity], result of:
            0.043199457 = score(doc=3157,freq=1.0), product of:
              0.11004852 = queryWeight, product of:
                1.2681891 = boost
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.013816121 = queryNorm
              0.3925492 = fieldWeight in 3157, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.280787 = idf(docFreq=224, maxDocs=44218)
                0.0625 = fieldNorm(doc=3157)
          0.07724489 = weight(abstract_txt:expressions in 3157) [ClassicSimilarity], result of:
            0.07724489 = score(doc=3157,freq=2.0), product of:
              0.12867728 = queryWeight, product of:
                1.3713326 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.013816121 = queryNorm
              0.6002994 = fieldWeight in 3157, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.0625 = fieldNorm(doc=3157)
          0.1076445 = weight(abstract_txt:tweets in 3157) [ClassicSimilarity], result of:
            0.1076445 = score(doc=3157,freq=2.0), product of:
              0.16053998 = queryWeight, product of:
                1.5317346 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.013816121 = queryNorm
              0.6705152 = fieldWeight in 3157, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0625 = fieldNorm(doc=3157)
          0.02446496 = weight(abstract_txt:these in 3157) [ClassicSimilarity], result of:
            0.02446496 = score(doc=3157,freq=3.0), product of:
              0.070887215 = queryWeight, product of:
                1.6093328 = boost
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.013816121 = queryNorm
              0.34512514 = fieldWeight in 3157, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.1881294 = idf(docFreq=4957, maxDocs=44218)
                0.0625 = fieldNorm(doc=3157)
          0.2891077 = weight(abstract_txt:microblogs in 3157) [ClassicSimilarity], result of:
            0.2891077 = score(doc=3157,freq=4.0), product of:
              0.24619834 = queryWeight, product of:
                1.8968564 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.013816121 = queryNorm
              1.1742878 = fieldWeight in 3157, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=3157)
          0.48189253 = weight(abstract_txt:microblog in 3157) [ClassicSimilarity], result of:
            0.48189253 = score(doc=3157,freq=2.0), product of:
              0.59183705 = queryWeight, product of:
                4.6501074 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.013816121 = queryNorm
              0.81423175 = fieldWeight in 3157, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.0625 = fieldNorm(doc=3157)
        0.24 = coord(6/25)
    
  3. Paltoglou, G.: Sentiment-based event detection in Twitter (2016) 0.18
    0.18218243 = sum of:
      0.18218243 = product of:
        0.6506515 = sum of:
          0.018037891 = weight(abstract_txt:analysis in 3010) [ClassicSimilarity], result of:
            0.018037891 = score(doc=3010,freq=2.0), product of:
              0.055856828 = queryWeight, product of:
                1.1065618 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.013816121 = queryNorm
              0.3229308 = fieldWeight in 3010, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.0625 = fieldNorm(doc=3010)
          0.01867014 = weight(abstract_txt:time in 3010) [ClassicSimilarity], result of:
            0.01867014 = score(doc=3010,freq=1.0), product of:
              0.07201022 = queryWeight, product of:
                1.2564193 = boost
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.013816121 = queryNorm
              0.2592707 = fieldWeight in 3010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.0625 = fieldNorm(doc=3010)
          0.07466527 = weight(abstract_txt:events in 3010) [ClassicSimilarity], result of:
            0.07466527 = score(doc=3010,freq=3.0), product of:
              0.109893166 = queryWeight, product of:
                1.2672936 = boost
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.013816121 = queryNorm
              0.6794351 = fieldWeight in 3010, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.0625 = fieldNorm(doc=3010)
          0.07611615 = weight(abstract_txt:tweets in 3010) [ClassicSimilarity], result of:
            0.07611615 = score(doc=3010,freq=1.0), product of:
              0.16053998 = queryWeight, product of:
                1.5317346 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.013816121 = queryNorm
              0.47412583 = fieldWeight in 3010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0625 = fieldNorm(doc=3010)
          0.09723394 = weight(abstract_txt:sensitivity in 3010) [ClassicSimilarity], result of:
            0.09723394 = score(doc=3010,freq=1.0), product of:
              0.18900673 = queryWeight, product of:
                1.6619982 = boost
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.013816121 = queryNorm
              0.514447 = fieldWeight in 3010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.0625 = fieldNorm(doc=3010)
          0.15398668 = weight(abstract_txt:detection in 3010) [ClassicSimilarity], result of:
            0.15398668 = score(doc=3010,freq=2.0), product of:
              0.25679553 = queryWeight, product of:
                2.7396848 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.013816121 = queryNorm
              0.59964705 = fieldWeight in 3010, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.0625 = fieldNorm(doc=3010)
          0.2119414 = weight(abstract_txt:event in 3010) [ClassicSimilarity], result of:
            0.2119414 = score(doc=3010,freq=2.0), product of:
              0.34227818 = queryWeight, product of:
                3.5363197 = boost
                7.0055394 = idf(docFreq=108, maxDocs=44218)
                0.013816121 = queryNorm
              0.61920804 = fieldWeight in 3010, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0055394 = idf(docFreq=108, maxDocs=44218)
                0.0625 = fieldNorm(doc=3010)
        0.28 = coord(7/25)
    
  4. Kim, H.H.; Kim, Y.H.: ERP/MMR algorithm for classifying topic-relevant and topic-irrelevant visual shots of documentary videos (2019) 0.17
    0.17413783 = sum of:
      0.17413783 = product of:
        0.6219208 = sum of:
          0.012754715 = weight(abstract_txt:analysis in 5358) [ClassicSimilarity], result of:
            0.012754715 = score(doc=5358,freq=1.0), product of:
              0.055856828 = queryWeight, product of:
                1.1065618 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.013816121 = queryNorm
              0.22834657 = fieldWeight in 5358, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.0625 = fieldNorm(doc=5358)
          0.04395078 = weight(abstract_txt:significance in 5358) [ClassicSimilarity], result of:
            0.04395078 = score(doc=5358,freq=1.0), product of:
              0.11132082 = queryWeight, product of:
                1.275499 = boost
                6.31699 = idf(docFreq=216, maxDocs=44218)
                0.013816121 = queryNorm
              0.39481187 = fieldWeight in 5358, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.31699 = idf(docFreq=216, maxDocs=44218)
                0.0625 = fieldNorm(doc=5358)
          0.04633106 = weight(abstract_txt:diversity in 5358) [ClassicSimilarity], result of:
            0.04633106 = score(doc=5358,freq=1.0), product of:
              0.11530465 = queryWeight, product of:
                1.2981215 = boost
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.013816121 = queryNorm
              0.4018143 = fieldWeight in 5358, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.0625 = fieldNorm(doc=5358)
          0.036220215 = weight(abstract_txt:proposed in 5358) [ClassicSimilarity], result of:
            0.036220215 = score(doc=5358,freq=2.0), product of:
              0.08890369 = queryWeight, product of:
                1.3960385 = boost
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.013816121 = queryNorm
              0.4074096 = fieldWeight in 5358, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6093135 = idf(docFreq=1196, maxDocs=44218)
                0.0625 = fieldNorm(doc=5358)
          0.10888503 = weight(abstract_txt:detection in 5358) [ClassicSimilarity], result of:
            0.10888503 = score(doc=5358,freq=1.0), product of:
              0.25679553 = queryWeight, product of:
                2.7396848 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.013816121 = queryNorm
              0.4240145 = fieldWeight in 5358, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.0625 = fieldNorm(doc=5358)
          0.14986521 = weight(abstract_txt:event in 5358) [ClassicSimilarity], result of:
            0.14986521 = score(doc=5358,freq=1.0), product of:
              0.34227818 = queryWeight, product of:
                3.5363197 = boost
                7.0055394 = idf(docFreq=108, maxDocs=44218)
                0.013816121 = queryNorm
              0.4378462 = fieldWeight in 5358, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0055394 = idf(docFreq=108, maxDocs=44218)
                0.0625 = fieldNorm(doc=5358)
          0.2239138 = weight(abstract_txt:topic in 5358) [ClassicSimilarity], result of:
            0.2239138 = score(doc=5358,freq=8.0), product of:
              0.2502142 = queryWeight, product of:
                3.5775185 = boost
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.013816121 = queryNorm
              0.8948885 = fieldWeight in 5358, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.0625 = fieldNorm(doc=5358)
        0.28 = coord(7/25)
    
  5. Aksoy, C.; Can, F.; Kocberber, S.: Novelty detection for topic tracking (2012) 0.14
    0.14235651 = sum of:
      0.14235651 = product of:
        0.59315217 = sum of:
          0.028327294 = weight(abstract_txt:address in 51) [ClassicSimilarity], result of:
            0.028327294 = score(doc=51,freq=1.0), product of:
              0.08306194 = queryWeight, product of:
                1.101775 = boost
                5.456611 = idf(docFreq=512, maxDocs=44218)
                0.013816121 = queryNorm
              0.3410382 = fieldWeight in 51, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.456611 = idf(docFreq=512, maxDocs=44218)
                0.0625 = fieldNorm(doc=51)
          0.01867014 = weight(abstract_txt:time in 51) [ClassicSimilarity], result of:
            0.01867014 = score(doc=51,freq=1.0), product of:
              0.07201022 = queryWeight, product of:
                1.2564193 = boost
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.013816121 = queryNorm
              0.2592707 = fieldWeight in 51, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.0625 = fieldNorm(doc=51)
          0.043108016 = weight(abstract_txt:events in 51) [ClassicSimilarity], result of:
            0.043108016 = score(doc=51,freq=1.0), product of:
              0.109893166 = queryWeight, product of:
                1.2672936 = boost
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.013816121 = queryNorm
              0.39227203 = fieldWeight in 51, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.0625 = fieldNorm(doc=51)
          0.15398668 = weight(abstract_txt:detection in 51) [ClassicSimilarity], result of:
            0.15398668 = score(doc=51,freq=2.0), product of:
              0.25679553 = queryWeight, product of:
                2.7396848 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.013816121 = queryNorm
              0.59964705 = fieldWeight in 51, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.0625 = fieldNorm(doc=51)
          0.2119414 = weight(abstract_txt:event in 51) [ClassicSimilarity], result of:
            0.2119414 = score(doc=51,freq=2.0), product of:
              0.34227818 = queryWeight, product of:
                3.5363197 = boost
                7.0055394 = idf(docFreq=108, maxDocs=44218)
                0.013816121 = queryNorm
              0.61920804 = fieldWeight in 51, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0055394 = idf(docFreq=108, maxDocs=44218)
                0.0625 = fieldNorm(doc=51)
          0.13711864 = weight(abstract_txt:topic in 51) [ClassicSimilarity], result of:
            0.13711864 = score(doc=51,freq=3.0), product of:
              0.2502142 = queryWeight, product of:
                3.5775185 = boost
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.013816121 = queryNorm
              0.54800504 = fieldWeight in 51, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.0625 = fieldNorm(doc=51)
        0.24 = coord(6/25)
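
The content scores combine a query-side and a document-side weight per matched term, then scale the sum by the coordination factor. The sketch below recomputes entry 1 (Efron 2011) from the values in its listing, under the same assumed ClassicSimilarity formulas; `term_score` and `matched_terms` are hypothetical names, while the boosts, document frequencies, queryNorm and fieldNorm are copied from the listing.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.013816121   # from the listing
FIELD_NORM = 0.0625        # fieldNorm(doc=4455), from the listing
QUERY_TERMS = 25           # denominator of coord(6/25)

def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(boost: float, doc_freq: int, freq: float) -> float:
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM            # queryWeight
    field_weight = math.sqrt(freq) * idf * FIELD_NORM  # fieldWeight
    return query_weight * field_weight

# (term, boost, docFreq, termFreq) for the six terms matched in doc 4455.
matched_terms = [
    ("discussion", 1.0387305, 700, 1.0),
    ("analysis",   1.1065618, 3112, 1.0),
    ("time",       1.2564193, 1897, 1.0),
    ("these",      1.6093328, 4957, 1.0),
    ("microblogs", 1.8968564, 9, 3.0),
    ("microblog",  4.6501074, 11, 5.0),
]

raw_sum = sum(term_score(b, df, f) for _, b, df, f in matched_terms)
coord = len(matched_terms) / QUERY_TERMS  # 6/25 = 0.24
doc_score = raw_sum * coord

print(round(doc_score, 4))  # ≈ 0.2596, shown rounded as 0.26 in the list
```

Because idf enters both queryWeight and fieldWeight, rare terms such as "microblog" (docFreq=11) dominate the sum, which is why entries 1 and 2 rank highest despite matching only 6 of 25 query terms.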