Document (#42508)

Editor
Hale, C.
Schmidt, V.
Author
Pekar, V.
Binner, J.
Najafi, H.
Title
Early detection of heterogeneous disaster events using social media
Source
Journal of the Association for Information Science and Technology. 71(2020) no.1, pp.43-54
Year
2020
Abstract
This article addresses the problem of detecting crisis-related messages on social media, in order to improve the situational awareness of emergency services. Previous work focused on developing machine-learning classifiers restricted to specific disasters, such as storms or wildfires. We investigate for the first time methods to detect such messages where the type of the crisis is not known in advance, that is, where the data are highly heterogeneous. Data heterogeneity makes it significantly harder for learning algorithms to generalize and accurately label incoming data. Our main contributions are as follows. First, we evaluate the extent of this problem in the context of disaster management, finding that the performance of traditional learners drops by up to 40% when trained and tested on heterogeneous data vis-à-vis homogeneous data. Then, in order to overcome data heterogeneity, we propose a new ensemble learning method and find that it performs on a par with the Gradient Boosting and AdaBoost ensemble learners. The methods are studied on a benchmark data set comprising 26 disaster events and four classification problems: detection of relevant messages, informative messages, eyewitness reports, and topical classification of messages. Finally, in a case study, we evaluate the proposed methods on a real-world data set to assess their practical value.
Content
Cf.: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24208.

Similar documents (content)

  1. Rahmi, R.; Joho, H.; Shirai, T.: An analysis of natural disaster-related information-seeking behavior using temporal stages (2019) 0.21
    0.21100307 = sum of:
      0.21100307 = product of:
        0.8791795 = sum of:
          0.017461125 = weight(abstract_txt:first in 5298) [ClassicSimilarity], result of:
            0.017461125 = score(doc=5298,freq=1.0), product of:
              0.067028984 = queryWeight, product of:
                4.168018 = idf(docFreq=1860, maxDocs=44218)
                0.016081741 = queryNorm
              0.26050112 = fieldWeight in 5298, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.168018 = idf(docFreq=1860, maxDocs=44218)
                0.0625 = fieldNorm(doc=5298)
          0.018091552 = weight(abstract_txt:social in 5298) [ClassicSimilarity], result of:
            0.018091552 = score(doc=5298,freq=1.0), product of:
              0.0686328 = queryWeight, product of:
                1.0118928 = boost
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.016081741 = queryNorm
              0.26359922 = fieldWeight in 5298, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.0625 = fieldNorm(doc=5298)
          0.12671879 = weight(abstract_txt:disasters in 5298) [ClassicSimilarity], result of:
            0.12671879 = score(doc=5298,freq=2.0), product of:
              0.15827847 = queryWeight, product of:
                1.0865872 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.016081741 = queryNorm
              0.8006066 = fieldWeight in 5298, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.0625 = fieldNorm(doc=5298)
          0.059621852 = weight(abstract_txt:events in 5298) [ClassicSimilarity], result of:
            0.059621852 = score(doc=5298,freq=1.0), product of:
              0.15199108 = queryWeight, product of:
                1.5058362 = boost
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.016081741 = queryNorm
              0.39227203 = fieldWeight in 5298, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.0625 = fieldNorm(doc=5298)
          0.6066253 = weight(abstract_txt:disaster in 5298) [ClassicSimilarity], result of:
            0.6066253 = score(doc=5298,freq=8.0), product of:
              0.40847167 = queryWeight, product of:
                3.0233977 = boost
                8.401051 = idf(docFreq=26, maxDocs=44218)
                0.016081741 = queryNorm
              1.4851099 = fieldWeight in 5298, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                8.401051 = idf(docFreq=26, maxDocs=44218)
                0.0625 = fieldNorm(doc=5298)
          0.05066085 = weight(abstract_txt:data in 5298) [ClassicSimilarity], result of:
            0.05066085 = score(doc=5298,freq=2.0), product of:
              0.17179325 = queryWeight, product of:
                3.2018557 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.016081741 = queryNorm
              0.29489428 = fieldWeight in 5298, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=5298)
        0.24 = coord(6/25)
    
  2. Paltoglou, G.: Sentiment-based event detection in Twitter (2016) 0.15
    0.1501929 = sum of:
      0.1501929 = product of:
        0.53640324 = sum of:
          0.025585316 = weight(abstract_txt:social in 3010) [ClassicSimilarity], result of:
            0.025585316 = score(doc=3010,freq=2.0), product of:
              0.0686328 = queryWeight, product of:
                1.0118928 = boost
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.016081741 = queryNorm
              0.37278557 = fieldWeight in 3010, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.0625 = fieldNorm(doc=3010)
          0.021401694 = weight(abstract_txt:problem in 3010) [ClassicSimilarity], result of:
            0.021401694 = score(doc=3010,freq=1.0), product of:
              0.076767944 = queryWeight, product of:
                1.0701845 = boost
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.016081741 = queryNorm
              0.27878425 = fieldWeight in 3010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.0625 = fieldNorm(doc=3010)
          0.044979285 = weight(abstract_txt:media in 3010) [ClassicSimilarity], result of:
            0.044979285 = score(doc=3010,freq=2.0), product of:
              0.09997226 = queryWeight, product of:
                1.2212609 = boost
                5.090237 = idf(docFreq=739, maxDocs=44218)
                0.016081741 = queryNorm
              0.44991764 = fieldWeight in 3010, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.090237 = idf(docFreq=739, maxDocs=44218)
                0.0625 = fieldNorm(doc=3010)
          0.10326807 = weight(abstract_txt:events in 3010) [ClassicSimilarity], result of:
            0.10326807 = score(doc=3010,freq=3.0), product of:
              0.15199108 = queryWeight, product of:
                1.5058362 = boost
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.016081741 = queryNorm
              0.6794351 = fieldWeight in 3010, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.0625 = fieldNorm(doc=3010)
          0.10648798 = weight(abstract_txt:detection in 3010) [ClassicSimilarity], result of:
            0.10648798 = score(doc=3010,freq=2.0), product of:
              0.17758444 = queryWeight, product of:
                1.6276879 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.016081741 = queryNorm
              0.59964705 = fieldWeight in 3010, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.0625 = fieldNorm(doc=3010)
          0.035822626 = weight(abstract_txt:data in 3010) [ClassicSimilarity], result of:
            0.035822626 = score(doc=3010,freq=1.0), product of:
              0.17179325 = queryWeight, product of:
                3.2018557 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.016081741 = queryNorm
              0.20852174 = fieldWeight in 3010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.0625 = fieldNorm(doc=3010)
          0.19885828 = weight(abstract_txt:messages in 3010) [ClassicSimilarity], result of:
            0.19885828 = score(doc=3010,freq=1.0), product of:
              0.4604936 = queryWeight, product of:
                4.1442933 = boost
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.016081741 = queryNorm
              0.43183723 = fieldWeight in 3010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.0625 = fieldNorm(doc=3010)
        0.28 = coord(7/25)
    
  3. Weessies, K.W.: The publishing dynamics of catastrophic events (2007) 0.12
    0.124762654 = sum of:
      0.124762654 = product of:
        0.7797666 = sum of:
          0.25045 = weight(abstract_txt:disasters in 283) [ClassicSimilarity], result of:
            0.25045 = score(doc=283,freq=5.0), product of:
              0.15827847 = queryWeight, product of:
                1.0865872 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.016081741 = queryNorm
              1.5823377 = fieldWeight in 283, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.078125 = fieldNorm(doc=283)
          0.10539754 = weight(abstract_txt:events in 283) [ClassicSimilarity], result of:
            0.10539754 = score(doc=283,freq=2.0), product of:
              0.15199108 = queryWeight, product of:
                1.5058362 = boost
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.016081741 = queryNorm
              0.6934455 = fieldWeight in 283, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.078125 = fieldNorm(doc=283)
          0.37914082 = weight(abstract_txt:disaster in 283) [ClassicSimilarity], result of:
            0.37914082 = score(doc=283,freq=2.0), product of:
              0.40847167 = queryWeight, product of:
                3.0233977 = boost
                8.401051 = idf(docFreq=26, maxDocs=44218)
                0.016081741 = queryNorm
              0.9281937 = fieldWeight in 283, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.401051 = idf(docFreq=26, maxDocs=44218)
                0.078125 = fieldNorm(doc=283)
          0.044778287 = weight(abstract_txt:data in 283) [ClassicSimilarity], result of:
            0.044778287 = score(doc=283,freq=1.0), product of:
              0.17179325 = queryWeight, product of:
                3.2018557 = boost
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.016081741 = queryNorm
              0.26065218 = fieldWeight in 283, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3363478 = idf(docFreq=4274, maxDocs=44218)
                0.078125 = fieldNorm(doc=283)
        0.16 = coord(4/25)
    
  4. Du, H.; Hao, J.-X.; Kwok, R.; Wagner, C.: Can a lean medium enhance large-group communication? : Examining the impact of interactive mobile learning (2010) 0.11
    0.10770487 = sum of:
      0.10770487 = product of:
        0.5385243 = sum of:
          0.031981643 = weight(abstract_txt:social in 4003) [ClassicSimilarity], result of:
            0.031981643 = score(doc=4003,freq=2.0), product of:
              0.0686328 = queryWeight, product of:
                1.0118928 = boost
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.016081741 = queryNorm
              0.46598196 = fieldWeight in 4003, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.078125 = fieldNorm(doc=4003)
          0.039756447 = weight(abstract_txt:media in 4003) [ClassicSimilarity], result of:
            0.039756447 = score(doc=4003,freq=1.0), product of:
              0.09997226 = queryWeight, product of:
                1.2212609 = boost
                5.090237 = idf(docFreq=739, maxDocs=44218)
                0.016081741 = queryNorm
              0.39767477 = fieldWeight in 4003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.090237 = idf(docFreq=739, maxDocs=44218)
                0.078125 = fieldNorm(doc=4003)
          0.083978035 = weight(abstract_txt:learning in 4003) [ClassicSimilarity], result of:
            0.083978035 = score(doc=4003,freq=3.0), product of:
              0.13062961 = queryWeight, product of:
                1.7097598 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.016081741 = queryNorm
              0.6428713 = fieldWeight in 4003, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.078125 = fieldNorm(doc=4003)
          0.1342354 = weight(abstract_txt:learners in 4003) [ClassicSimilarity], result of:
            0.1342354 = score(doc=4003,freq=1.0), product of:
              0.22500172 = queryWeight, product of:
                1.8321525 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.016081741 = queryNorm
              0.5965972 = fieldWeight in 4003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.078125 = fieldNorm(doc=4003)
          0.24857284 = weight(abstract_txt:messages in 4003) [ClassicSimilarity], result of:
            0.24857284 = score(doc=4003,freq=1.0), product of:
              0.4604936 = queryWeight, product of:
                4.1442933 = boost
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.016081741 = queryNorm
              0.53979653 = fieldWeight in 4003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.078125 = fieldNorm(doc=4003)
        0.2 = coord(5/25)
    
  5. Muresan, S.; Gonzalez-Ibanez, R.; Ghosh, D.; Wacholder, N.: Identification of nonliteral language in social media : a case study on sarcasm (2016) 0.10
    0.102650955 = sum of:
      0.102650955 = product of:
        0.42771232 = sum of:
          0.018091552 = weight(abstract_txt:social in 3155) [ClassicSimilarity], result of:
            0.018091552 = score(doc=3155,freq=1.0), product of:
              0.0686328 = queryWeight, product of:
                1.0118928 = boost
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.016081741 = queryNorm
              0.26359922 = fieldWeight in 3155, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.0625 = fieldNorm(doc=3155)
          0.031805158 = weight(abstract_txt:media in 3155) [ClassicSimilarity], result of:
            0.031805158 = score(doc=3155,freq=1.0), product of:
              0.09997226 = queryWeight, product of:
                1.2212609 = boost
                5.090237 = idf(docFreq=739, maxDocs=44218)
                0.016081741 = queryNorm
              0.31813982 = fieldWeight in 3155, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.090237 = idf(docFreq=739, maxDocs=44218)
                0.0625 = fieldNorm(doc=3155)
          0.036476564 = weight(abstract_txt:methods in 3155) [ClassicSimilarity], result of:
            0.036476564 = score(doc=3155,freq=2.0), product of:
              0.09952011 = queryWeight, product of:
                1.4923468 = boost
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.016081741 = queryNorm
              0.36652455 = fieldWeight in 3155, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.146752 = idf(docFreq=1900, maxDocs=44218)
                0.0625 = fieldNorm(doc=3155)
          0.07529838 = weight(abstract_txt:detection in 3155) [ClassicSimilarity], result of:
            0.07529838 = score(doc=3155,freq=1.0), product of:
              0.17758444 = queryWeight, product of:
                1.6276879 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.016081741 = queryNorm
              0.4240145 = fieldWeight in 3155, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.0625 = fieldNorm(doc=3155)
          0.06718243 = weight(abstract_txt:learning in 3155) [ClassicSimilarity], result of:
            0.06718243 = score(doc=3155,freq=3.0), product of:
              0.13062961 = queryWeight, product of:
                1.7097598 = boost
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.016081741 = queryNorm
              0.51429707 = fieldWeight in 3155, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.750873 = idf(docFreq=1038, maxDocs=44218)
                0.0625 = fieldNorm(doc=3155)
          0.19885828 = weight(abstract_txt:messages in 3155) [ClassicSimilarity], result of:
            0.19885828 = score(doc=3155,freq=1.0), product of:
              0.4604936 = queryWeight, product of:
                4.1442933 = boost
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.016081741 = queryNorm
              0.43183723 = fieldWeight in 3155, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.0625 = fieldNorm(doc=3155)
        0.24 = coord(6/25)
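
The score trees above follow Lucene's ClassicSimilarity (TF-IDF) scheme, in which tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), each term contributes queryWeight * fieldWeight, and the sum is scaled by the coord factor. As a minimal sketch, the following Python recomputes the score of the first similar document (doc 5298) from the numbers printed in its tree. The function names and the `TERMS_DOC_5298` table are hypothetical helpers introduced only for this illustration; queryNorm and fieldNorm are copied from the listing rather than derived, and small differences in the last digits are to be expected because Lucene computes in single precision.

```python
import math

MAX_DOCS = 44218          # maxDocs, as printed in the listing
QUERY_NORM = 0.016081741  # queryNorm, copied from the listing (not derived here)


def classic_idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, boost=1.0):
    """Score of one term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


# (term, freq, docFreq, boost) for doc 5298, read off the first tree above.
# The term "first" has no boost line in the listing, so boost 1.0 is assumed.
TERMS_DOC_5298 = [
    ("first", 1.0, 1860, 1.0),
    ("social", 1.0, 1770, 1.0118928),
    ("disasters", 2.0, 13, 1.0865872),
    ("events", 1.0, 225, 1.5058362),
    ("disaster", 8.0, 26, 3.0233977),
    ("data", 2.0, 4274, 3.2018557),
]


def document_score(terms, field_norm, matched, query_terms):
    """Sum of the term scores, scaled by coord = matched / query_terms."""
    total = sum(term_score(f, df, field_norm, b) for _, f, df, b in terms)
    return total * (matched / query_terms)


score_5298 = document_score(TERMS_DOC_5298, field_norm=0.0625,
                            matched=6, query_terms=25)
print(f"{score_5298:.6f}")  # compare with 0.21100307 in the listing
```

Running the same function with the term rows and fieldNorm of any other entry in the list should reproduce that entry's score in the same way, which makes it easy to see that the ranking is driven mainly by the highly boosted, rare terms such as "disaster" and "messages".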