Document (#40828)

Author
Bandaragoda, T.R.
Silva, D. de
Alahakoon, D.
Title
Automatic event detection in microblogs using incremental machine learning
Source
Journal of the Association for Information Science and Technology. 68(2017) no.10, p.2394-2411
Year
2017
Abstract
The global popularity of microblogs has led to an increasing accumulation of large volumes of text data on microblogging platforms such as Twitter. These corpora are untapped resources for understanding social expressions on diverse subjects. Microblog analysis aims to unlock the value of such expressions by discovering insights and events of significance hidden among swathes of text. Besides velocity, the key challenges in microblog analysis are diversity of content, brevity, absence of structure and time-sensitivity. In this paper, we propose an unsupervised incremental machine learning and event detection technique to address these challenges. The proposed technique separates a microblog discussion into topics to address the key problem of diversity. It maintains a record of the evolution of each topic over time. Brevity, time-sensitivity and unstructured nature are addressed by these individual topic pathways, which together generate a temporal, topic-driven structure of a microblog discussion. The proposed event detection method continuously monitors these topic pathways for events of significance, using multiple domain-independent event indicators. The autonomous nature of topic separation, topic pathway generation, new topic identification and event detection makes the proposed technique suitable for extensive applications in microblog analysis. We demonstrate these capabilities on tweets containing #microsoft and tweets containing #obama.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23896/full.
Theme
Internet

Similar documents (author)

  1. Silva, M.: Creating electronic environments for learning (1998) 4.71
    4.7098475 = sum of:
      4.7098475 = weight(author_txt:silva in 3786) [ClassicSimilarity], result of:
        4.7098475 = fieldWeight in 3786, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.535756 = idf(docFreq=61, maxDocs=42740)
          0.625 = fieldNorm(doc=3786)
    
  2. Silva, A.J.: ¬Ein Netz von Erinnerungen (2018) 4.71
    4.7098475 = sum of:
      4.7098475 = weight(author_txt:silva in 195) [ClassicSimilarity], result of:
        4.7098475 = fieldWeight in 195, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.535756 = idf(docFreq=61, maxDocs=42740)
          0.625 = fieldNorm(doc=195)
    
  3. Silva, A.J.: ¬Das Gedächtnisnetz (2018) 4.71
    4.7098475 = sum of:
      4.7098475 = weight(author_txt:silva in 422) [ClassicSimilarity], result of:
        4.7098475 = fieldWeight in 422, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          7.535756 = idf(docFreq=61, maxDocs=42740)
          0.625 = fieldNorm(doc=422)
    
  4. Silva, A.M. Da -> Da Silva, A.M.: 4.00
    3.996438 = sum of:
      3.996438 = weight(author_txt:silva in 2168) [ClassicSimilarity], result of:
        3.996438 = fieldWeight in 2168, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.535756 = idf(docFreq=61, maxDocs=42740)
          0.375 = fieldNorm(doc=2168)
    
  5. Lucas da Silva, D. -> Silva, D.L da: 4.00
    3.996438 = sum of:
      3.996438 = weight(author_txt:silva in 2886) [ClassicSimilarity], result of:
        3.996438 = fieldWeight in 2886, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          7.535756 = idf(docFreq=61, maxDocs=42740)
          0.375 = fieldNorm(doc=2886)
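
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and not part of this record. The fieldNorm values are taken directly from the listing rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as defined by Lucene's ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(term frequency)."""
    tf = math.sqrt(freq)
    return tf * classic_idf(doc_freq, max_docs) * field_norm


# Entries 1-3: "silva" occurs once in author_txt, fieldNorm 0.625 (from the listing).
single = field_weight(1.0, doc_freq=61, max_docs=42740, field_norm=0.625)

# Entries 4-5: "silva" occurs twice, fieldNorm 0.375 (from the listing).
double = field_weight(2.0, doc_freq=61, max_docs=42740, field_norm=0.375)

print(single, double)
```

The results should agree with the listed 4.7098475 and 3.996438 up to single-precision rounding, since Lucene computes these values as 32-bit floats.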
    

Similar documents (content)

  1. Efron, M.: Information search and retrieval in microblogs (2011) 0.26
    0.26326615 = sum of:
      0.26326615 = product of:
        1.0969423 = sum of:
          0.023574876 = weight(abstract_txt:discussion in 1456) [ClassicSimilarity], result of:
            0.023574876 = score(doc=1456,freq=1.0), product of:
              0.07303516 = queryWeight, product of:
                1.0464066 = boost
                5.1646085 = idf(docFreq=663, maxDocs=42740)
                0.013514317 = queryNorm
              0.32278803 = fieldWeight in 1456, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1646085 = idf(docFreq=663, maxDocs=42740)
                0.0625 = fieldNorm(doc=1456)
          0.012880346 = weight(abstract_txt:analysis in 1456) [ClassicSimilarity], result of:
            0.012880346 = score(doc=1456,freq=1.0), product of:
              0.05587461 = queryWeight, product of:
                1.120953 = boost
                3.6883576 = idf(docFreq=2905, maxDocs=42740)
                0.013514317 = queryNorm
              0.23052235 = fieldWeight in 1456, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6883576 = idf(docFreq=2905, maxDocs=42740)
                0.0625 = fieldNorm(doc=1456)
          0.01855612 = weight(abstract_txt:time in 1456) [ClassicSimilarity], result of:
            0.01855612 = score(doc=1456,freq=1.0), product of:
              0.07127232 = queryWeight, product of:
                1.26602 = boost
                4.1656833 = idf(docFreq=1802, maxDocs=42740)
                0.013514317 = queryNorm
              0.2603552 = fieldWeight in 1456, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1656833 = idf(docFreq=1802, maxDocs=42740)
                0.0625 = fieldNorm(doc=1456)
          0.014269378 = weight(abstract_txt:these in 1456) [ClassicSimilarity], result of:
            0.014269378 = score(doc=1456,freq=1.0), product of:
              0.07092768 = queryWeight, product of:
                1.6304684 = boost
                3.2189133 = idf(docFreq=4646, maxDocs=42740)
                0.013514317 = queryNorm
              0.20118208 = fieldWeight in 1456, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2189133 = idf(docFreq=4646, maxDocs=42740)
                0.0625 = fieldNorm(doc=1456)
          0.24309045 = weight(abstract_txt:microblogs in 1456) [ClassicSimilarity], result of:
            0.24309045 = score(doc=1456,freq=3.0), product of:
              0.23990387 = queryWeight, product of:
                1.8965012 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.013514317 = queryNorm
              1.0132828 = fieldWeight in 1456, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.0625 = fieldNorm(doc=1456)
          0.7845711 = weight(abstract_txt:microblog in 1456) [ClassicSimilarity], result of:
            0.7845711 = score(doc=1456,freq=5.0), product of:
              0.59975964 = queryWeight, product of:
                4.741253 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.013514317 = queryNorm
              1.3081425 = fieldWeight in 1456, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.0625 = fieldNorm(doc=1456)
        0.24 = coord(6/25)
    
  2. Jansen, B.J.; Zhang, M.; Sobel, K.; Chowdury, A.: Twitter power : tweets as electronic word of mouth (2009) 0.25
    0.24837679 = sum of:
      0.24837679 = product of:
        1.0349033 = sum of:
          0.042168353 = weight(abstract_txt:containing in 158) [ClassicSimilarity], result of:
            0.042168353 = score(doc=158,freq=1.0), product of:
              0.10761929 = queryWeight, product of:
                1.2702218 = boost
                6.269263 = idf(docFreq=219, maxDocs=42740)
                0.013514317 = queryNorm
              0.39182892 = fieldWeight in 158, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.269263 = idf(docFreq=219, maxDocs=42740)
                0.0625 = fieldNorm(doc=158)
          0.0775385 = weight(abstract_txt:expressions in 158) [ClassicSimilarity], result of:
            0.0775385 = score(doc=158,freq=2.0), product of:
              0.12820372 = queryWeight, product of:
                1.3863881 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.013514317 = queryNorm
              0.6048069 = fieldWeight in 158, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.0625 = fieldNorm(doc=158)
          0.11357811 = weight(abstract_txt:tweets in 158) [ClassicSimilarity], result of:
            0.11357811 = score(doc=158,freq=2.0), product of:
              0.16535556 = queryWeight, product of:
                1.5745045 = boost
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.013514317 = queryNorm
              0.686872 = fieldWeight in 158, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.0625 = fieldNorm(doc=158)
          0.024715288 = weight(abstract_txt:these in 158) [ClassicSimilarity], result of:
            0.024715288 = score(doc=158,freq=3.0), product of:
              0.07092768 = queryWeight, product of:
                1.6304684 = boost
                3.2189133 = idf(docFreq=4646, maxDocs=42740)
                0.013514317 = queryNorm
              0.34845757 = fieldWeight in 158, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.2189133 = idf(docFreq=4646, maxDocs=42740)
                0.0625 = fieldNorm(doc=158)
          0.2806967 = weight(abstract_txt:microblogs in 158) [ClassicSimilarity], result of:
            0.2806967 = score(doc=158,freq=4.0), product of:
              0.23990387 = queryWeight, product of:
                1.8965012 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.013514317 = queryNorm
              1.1700382 = fieldWeight in 158, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.0625 = fieldNorm(doc=158)
          0.4962063 = weight(abstract_txt:microblog in 158) [ClassicSimilarity], result of:
            0.4962063 = score(doc=158,freq=2.0), product of:
              0.59975964 = queryWeight, product of:
                4.741253 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.013514317 = queryNorm
              0.827342 = fieldWeight in 158, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.0625 = fieldNorm(doc=158)
        0.24 = coord(6/25)
    
  3. Paltoglou, G.: Sentiment-based event detection in Twitter (2016) 0.18
    0.18390271 = sum of:
      0.18390271 = product of:
        0.6567954 = sum of:
          0.01821556 = weight(abstract_txt:analysis in 5011) [ClassicSimilarity], result of:
            0.01821556 = score(doc=5011,freq=2.0), product of:
              0.05587461 = queryWeight, product of:
                1.120953 = boost
                3.6883576 = idf(docFreq=2905, maxDocs=42740)
                0.013514317 = queryNorm
              0.3260078 = fieldWeight in 5011, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6883576 = idf(docFreq=2905, maxDocs=42740)
                0.0625 = fieldNorm(doc=5011)
          0.01855612 = weight(abstract_txt:time in 5011) [ClassicSimilarity], result of:
            0.01855612 = score(doc=5011,freq=1.0), product of:
              0.07127232 = queryWeight, product of:
                1.26602 = boost
                4.1656833 = idf(docFreq=1802, maxDocs=42740)
                0.013514317 = queryNorm
              0.2603552 = fieldWeight in 5011, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1656833 = idf(docFreq=1802, maxDocs=42740)
                0.0625 = fieldNorm(doc=5011)
          0.07467572 = weight(abstract_txt:events in 5011) [ClassicSimilarity], result of:
            0.07467572 = score(doc=5011,freq=3.0), product of:
              0.10922237 = queryWeight, product of:
                1.2796474 = boost
                6.315783 = idf(docFreq=209, maxDocs=42740)
                0.013514317 = queryNorm
              0.68370354 = fieldWeight in 5011, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.315783 = idf(docFreq=209, maxDocs=42740)
                0.0625 = fieldNorm(doc=5011)
          0.08031186 = weight(abstract_txt:tweets in 5011) [ClassicSimilarity], result of:
            0.08031186 = score(doc=5011,freq=1.0), product of:
              0.16535556 = queryWeight, product of:
                1.5745045 = boost
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.013514317 = queryNorm
              0.48569188 = fieldWeight in 5011, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.0625 = fieldNorm(doc=5011)
          0.09650379 = weight(abstract_txt:sensitivity in 5011) [ClassicSimilarity], result of:
            0.09650379 = score(doc=5011,freq=1.0), product of:
              0.18689397 = queryWeight, product of:
                1.6739101 = boost
                8.261693 = idf(docFreq=29, maxDocs=42740)
                0.013514317 = queryNorm
              0.5163558 = fieldWeight in 5011, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.261693 = idf(docFreq=29, maxDocs=42740)
                0.0625 = fieldNorm(doc=5011)
          0.15399167 = weight(abstract_txt:detection in 5011) [ClassicSimilarity], result of:
            0.15399167 = score(doc=5011,freq=2.0), product of:
              0.2552097 = queryWeight, product of:
                2.7662923 = boost
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.013514317 = queryNorm
              0.60339266 = fieldWeight in 5011, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.0625 = fieldNorm(doc=5011)
          0.21454068 = weight(abstract_txt:event in 5011) [ClassicSimilarity], result of:
            0.21454068 = score(doc=5011,freq=2.0), product of:
              0.34293264 = queryWeight, product of:
                3.585163 = boost
                7.077923 = idf(docFreq=97, maxDocs=42740)
                0.013514317 = queryNorm
              0.6256059 = fieldWeight in 5011, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.077923 = idf(docFreq=97, maxDocs=42740)
                0.0625 = fieldNorm(doc=5011)
        0.28 = coord(7/25)
    
  4. Kim, H.H.; Kim, Y.H.: ERP/MMR algorithm for classifying topic-relevant and topic-irrelevant visual shots of documentary videos (2019) 0.18
    0.17521624 = sum of:
      0.17521624 = product of:
        0.6257723 = sum of:
          0.012880346 = weight(abstract_txt:analysis in 1359) [ClassicSimilarity], result of:
            0.012880346 = score(doc=1359,freq=1.0), product of:
              0.05587461 = queryWeight, product of:
                1.120953 = boost
                3.6883576 = idf(docFreq=2905, maxDocs=42740)
                0.013514317 = queryNorm
              0.23052235 = fieldWeight in 1359, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6883576 = idf(docFreq=2905, maxDocs=42740)
                0.0625 = fieldNorm(doc=1359)
          0.044120967 = weight(abstract_txt:significance in 1359) [ClassicSimilarity], result of:
            0.044120967 = score(doc=1359,freq=1.0), product of:
              0.11091639 = queryWeight, product of:
                1.2895327 = boost
                6.364573 = idf(docFreq=199, maxDocs=42740)
                0.013514317 = queryNorm
              0.3977858 = fieldWeight in 1359, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.364573 = idf(docFreq=199, maxDocs=42740)
                0.0625 = fieldNorm(doc=1359)
          0.047847204 = weight(abstract_txt:diversity in 1359) [ClassicSimilarity], result of:
            0.047847204 = score(doc=1359,freq=1.0), product of:
              0.1170766 = queryWeight, product of:
                1.3248587 = boost
                6.5389266 = idf(docFreq=167, maxDocs=42740)
                0.013514317 = queryNorm
              0.4086829 = fieldWeight in 1359, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5389266 = idf(docFreq=167, maxDocs=42740)
                0.0625 = fieldNorm(doc=1359)
          0.036287248 = weight(abstract_txt:proposed in 1359) [ClassicSimilarity], result of:
            0.036287248 = score(doc=1359,freq=2.0), product of:
              0.08846174 = queryWeight, product of:
                1.4104506 = boost
                4.640914 = idf(docFreq=1120, maxDocs=42740)
                0.013514317 = queryNorm
              0.4102027 = fieldWeight in 1359, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.640914 = idf(docFreq=1120, maxDocs=42740)
                0.0625 = fieldNorm(doc=1359)
          0.10888855 = weight(abstract_txt:detection in 1359) [ClassicSimilarity], result of:
            0.10888855 = score(doc=1359,freq=1.0), product of:
              0.2552097 = queryWeight, product of:
                2.7662923 = boost
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.013514317 = queryNorm
              0.42666304 = fieldWeight in 1359, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.0625 = fieldNorm(doc=1359)
          0.15170318 = weight(abstract_txt:event in 1359) [ClassicSimilarity], result of:
            0.15170318 = score(doc=1359,freq=1.0), product of:
              0.34293264 = queryWeight, product of:
                3.585163 = boost
                7.077923 = idf(docFreq=97, maxDocs=42740)
                0.013514317 = queryNorm
              0.44237018 = fieldWeight in 1359, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.077923 = idf(docFreq=97, maxDocs=42740)
                0.0625 = fieldNorm(doc=1359)
          0.22404478 = weight(abstract_txt:topic in 1359) [ClassicSimilarity], result of:
            0.22404478 = score(doc=1359,freq=8.0), product of:
              0.24876063 = queryWeight, product of:
                3.6129284 = boost
                5.0948124 = idf(docFreq=711, maxDocs=42740)
                0.013514317 = queryNorm
              0.90064406 = fieldWeight in 1359, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.0948124 = idf(docFreq=711, maxDocs=42740)
                0.0625 = fieldNorm(doc=1359)
        0.28 = coord(7/25)
    
  5. Aksoy, C.; Can, F.; Kocberber, S.: Novelty detection for topic tracking (2012) 0.14
    0.14307164 = sum of:
      0.14307164 = product of:
        0.59613186 = sum of:
          0.028730461 = weight(abstract_txt:address in 2052) [ClassicSimilarity], result of:
            0.028730461 = score(doc=2052,freq=1.0), product of:
              0.083328605 = queryWeight, product of:
                1.1177162 = boost
                5.5165615 = idf(docFreq=466, maxDocs=42740)
                0.013514317 = queryNorm
              0.3447851 = fieldWeight in 2052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5165615 = idf(docFreq=466, maxDocs=42740)
                0.0625 = fieldNorm(doc=2052)
          0.01855612 = weight(abstract_txt:time in 2052) [ClassicSimilarity], result of:
            0.01855612 = score(doc=2052,freq=1.0), product of:
              0.07127232 = queryWeight, product of:
                1.26602 = boost
                4.1656833 = idf(docFreq=1802, maxDocs=42740)
                0.013514317 = queryNorm
              0.2603552 = fieldWeight in 2052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1656833 = idf(docFreq=1802, maxDocs=42740)
                0.0625 = fieldNorm(doc=2052)
          0.043114047 = weight(abstract_txt:events in 2052) [ClassicSimilarity], result of:
            0.043114047 = score(doc=2052,freq=1.0), product of:
              0.10922237 = queryWeight, product of:
                1.2796474 = boost
                6.315783 = idf(docFreq=209, maxDocs=42740)
                0.013514317 = queryNorm
              0.39473644 = fieldWeight in 2052, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.315783 = idf(docFreq=209, maxDocs=42740)
                0.0625 = fieldNorm(doc=2052)
          0.15399167 = weight(abstract_txt:detection in 2052) [ClassicSimilarity], result of:
            0.15399167 = score(doc=2052,freq=2.0), product of:
              0.2552097 = queryWeight, product of:
                2.7662923 = boost
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.013514317 = queryNorm
              0.60339266 = fieldWeight in 2052, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.0625 = fieldNorm(doc=2052)
          0.21454068 = weight(abstract_txt:event in 2052) [ClassicSimilarity], result of:
            0.21454068 = score(doc=2052,freq=2.0), product of:
              0.34293264 = queryWeight, product of:
                3.585163 = boost
                7.077923 = idf(docFreq=97, maxDocs=42740)
                0.013514317 = queryNorm
              0.6256059 = fieldWeight in 2052, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.077923 = idf(docFreq=97, maxDocs=42740)
                0.0625 = fieldNorm(doc=2052)
          0.13719885 = weight(abstract_txt:topic in 2052) [ClassicSimilarity], result of:
            0.13719885 = score(doc=2052,freq=3.0), product of:
              0.24876063 = queryWeight, product of:
                3.6129284 = boost
                5.0948124 = idf(docFreq=711, maxDocs=42740)
                0.013514317 = queryNorm
              0.5515296 = fieldWeight in 2052, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.0948124 = idf(docFreq=711, maxDocs=42740)
                0.0625 = fieldNorm(doc=2052)
        0.24 = coord(6/25)
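
The content score trees follow the same pattern, with a query-side weight and a coordination factor added. The sketch below recomputes the total for entry 5 (doc 2052), assuming the standard ClassicSimilarity formulas; the boost, queryNorm, fieldNorm and coord values are copied from the listing and treated as given inputs, and the helper names are illustrative.

```python
import math

MAX_DOCS = 42740
QUERY_NORM = 0.013514317   # taken from the listing
FIELD_NORM = 0.0625        # taken from the listing (doc 2052)
QUERY_TERMS = 25           # denominator of coord(6/25)


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


# (term, freq in abstract_txt, docFreq, boost), as shown for doc 2052.
TERMS = [
    ("address", 1.0, 466, 1.1177162),
    ("time", 1.0, 1802, 1.26602),
    ("events", 1.0, 209, 1.2796474),
    ("detection", 2.0, 125, 2.7662923),
    ("event", 2.0, 97, 3.585163),
    ("topic", 3.0, 711, 3.6129284),
]

raw_sum = sum(term_score(freq, df, boost) for _, freq, df, boost in TERMS)
coord = len(TERMS) / QUERY_TERMS   # 6 of 25 query terms matched
total = raw_sum * coord

print(raw_sum, total)
```

The totals should match the listed 0.59613186 (sum) and 0.14307164 (final score) up to single-precision rounding. The coord factor explains why a document matching only a few query terms scores low even when individual term weights are high.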