Document (#41370)

Author
Zubiaga, A.
Title
A longitudinal assessment of the persistence of Twitter datasets
Source
Journal of the Association for Information Science and Technology. 69(2018) no.8, pp. 974-984
Year
2018
Abstract
Social media datasets are not always completely replicable. Having to adhere to requirements of platforms such as Twitter, researchers can only release a list of unique identifiers, which others can then use to recollect the data themselves. This leads to subsets of the data no longer being available, as content can be deleted or user accounts deactivated. To quantify the long-term impact of this on the replicability of datasets, we perform a longitudinal analysis of the persistence of 30 Twitter datasets, which include more than 147 million tweets. By recollecting Twitter datasets ranging from 0 to 4 years old by using the tweet IDs, we look at four different factors quantifying the extent to which recollected datasets resemble original ones: completeness, representativity, similarity, and changingness. Although the ratio of available tweets keeps decreasing as the dataset gets older, we find that the textual content of the recollected subset is still largely representative of the original dataset. The representativity of the metadata, however, keeps fading over time, both because the dataset shrinks and because certain metadata, such as the users' number of followers, keeps changing. Our study has important implications for researchers sharing and using publicly shared Twitter datasets in their research.
Content
Cf.: https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.24026.
Theme
Informetrics
Object
Twitter

Similar documents (content)

  1. Sedhai, S.; Sun, A.: An analysis of 14 Million tweets on hashtag-oriented spamming* (2017) 0.26
    0.26205263 = sum of:
      0.26205263 = product of:
        0.9359022 = sum of:
          0.015045818 = weight(abstract_txt:content in 5684) [ClassicSimilarity], result of:
            0.015045818 = score(doc=5684,freq=1.0), product of:
              0.05733951 = queryWeight, product of:
                1.0718017 = boost
                4.1983805 = idf(docFreq=1744, maxDocs=42740)
                0.012742592 = queryNorm
              0.26239878 = fieldWeight in 5684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1983805 = idf(docFreq=1744, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
          0.11271355 = weight(abstract_txt:tweet in 5684) [ClassicSimilarity], result of:
            0.11271355 = score(doc=5684,freq=3.0), product of:
              0.12081211 = queryWeight, product of:
                1.1000886 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.012742592 = queryNorm
              0.9329657 = fieldWeight in 5684, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
          0.010887596 = weight(abstract_txt:which in 5684) [ClassicSimilarity], result of:
            0.010887596 = score(doc=5684,freq=2.0), product of:
              0.04199058 = queryWeight, product of:
                1.1233342 = boost
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.012742592 = queryNorm
              0.25928664 = fieldWeight in 5684, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
          0.024007224 = weight(abstract_txt:metadata in 5684) [ClassicSimilarity], result of:
            0.024007224 = score(doc=5684,freq=1.0), product of:
              0.07829573 = queryWeight, product of:
                1.2524387 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.012742592 = queryNorm
              0.3066224 = fieldWeight in 5684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
          0.25244218 = weight(abstract_txt:tweets in 5684) [ClassicSimilarity], result of:
            0.25244218 = score(doc=5684,freq=7.0), product of:
              0.19645001 = queryWeight, product of:
                1.9838711 = boost
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.012742592 = queryNorm
              1.2850199 = fieldWeight in 5684, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
          0.10500667 = weight(abstract_txt:dataset in 5684) [ClassicSimilarity], result of:
            0.10500667 = score(doc=5684,freq=1.0), product of:
              0.23970944 = queryWeight, product of:
                2.6839573 = boost
                7.00893 = idf(docFreq=104, maxDocs=42740)
                0.012742592 = queryNorm
              0.43805814 = fieldWeight in 5684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.00893 = idf(docFreq=104, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
          0.41579914 = weight(abstract_txt:twitter in 5684) [ClassicSimilarity], result of:
            0.41579914 = score(doc=5684,freq=5.0), product of:
              0.41599602 = queryWeight, product of:
                4.5645924 = boost
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.012742592 = queryNorm
              0.99952674 = fieldWeight in 5684, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
        0.28 = coord(7/25)
    
  2. Saif, H.; He, Y.; Fernandez, M.; Alani, H.: Contextual semantics for sentiment analysis of Twitter (2016) 0.22
    0.22411302 = sum of:
      0.22411302 = product of:
        0.9338043 = sum of:
          0.092030235 = weight(abstract_txt:tweet in 4668) [ClassicSimilarity], result of:
            0.092030235 = score(doc=4668,freq=2.0), product of:
              0.12081211 = queryWeight, product of:
                1.1000886 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.012742592 = queryNorm
              0.76176333 = fieldWeight in 4668, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.0625 = fieldNorm(doc=4668)
          0.007698693 = weight(abstract_txt:which in 4668) [ClassicSimilarity], result of:
            0.007698693 = score(doc=4668,freq=1.0), product of:
              0.04199058 = queryWeight, product of:
                1.1233342 = boost
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.012742592 = queryNorm
              0.18334334 = fieldWeight in 4668, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.0625 = fieldNorm(doc=4668)
          0.09541418 = weight(abstract_txt:tweets in 4668) [ClassicSimilarity], result of:
            0.09541418 = score(doc=4668,freq=1.0), product of:
              0.19645001 = queryWeight, product of:
                1.9838711 = boost
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.012742592 = queryNorm
              0.48569188 = fieldWeight in 4668, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.0625 = fieldNorm(doc=4668)
          0.10500667 = weight(abstract_txt:dataset in 4668) [ClassicSimilarity], result of:
            0.10500667 = score(doc=4668,freq=1.0), product of:
              0.23970944 = queryWeight, product of:
                2.6839573 = boost
                7.00893 = idf(docFreq=104, maxDocs=42740)
                0.012742592 = queryNorm
              0.43805814 = fieldWeight in 4668, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.00893 = idf(docFreq=104, maxDocs=42740)
                0.0625 = fieldNorm(doc=4668)
          0.32207662 = weight(abstract_txt:twitter in 4668) [ClassicSimilarity], result of:
            0.32207662 = score(doc=4668,freq=3.0), product of:
              0.41599602 = queryWeight, product of:
                4.5645924 = boost
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.012742592 = queryNorm
              0.77423006 = fieldWeight in 4668, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.0625 = fieldNorm(doc=4668)
          0.31157792 = weight(abstract_txt:datasets in 4668) [ClassicSimilarity], result of:
            0.31157792 = score(doc=4668,freq=2.0), product of:
              0.5210754 = queryWeight, product of:
                6.0446577 = boost
                6.765051 = idf(docFreq=133, maxDocs=42740)
                0.012742592 = queryNorm
              0.59795165 = fieldWeight in 4668, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.765051 = idf(docFreq=133, maxDocs=42740)
                0.0625 = fieldNorm(doc=4668)
        0.24 = coord(6/25)
    
  3. Ortega, J.L.: The presence of academic journals on Twitter and its relationship with dissemination (tweets) and research impact (citations) (2017) 0.16
    0.15770002 = sum of:
      0.15770002 = product of:
        0.7885001 = sum of:
          0.015045818 = weight(abstract_txt:content in 411) [ClassicSimilarity], result of:
            0.015045818 = score(doc=411,freq=1.0), product of:
              0.05733951 = queryWeight, product of:
                1.0718017 = boost
                4.1983805 = idf(docFreq=1744, maxDocs=42740)
                0.012742592 = queryNorm
              0.26239878 = fieldWeight in 411, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1983805 = idf(docFreq=1744, maxDocs=42740)
                0.0625 = fieldNorm(doc=411)
          0.007698693 = weight(abstract_txt:which in 411) [ClassicSimilarity], result of:
            0.007698693 = score(doc=411,freq=1.0), product of:
              0.04199058 = queryWeight, product of:
                1.1233342 = boost
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.012742592 = queryNorm
              0.18334334 = fieldWeight in 411, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.0625 = fieldNorm(doc=411)
          0.07655434 = weight(abstract_txt:followers in 411) [ClassicSimilarity], result of:
            0.07655434 = score(doc=411,freq=1.0), product of:
              0.1346315 = queryWeight, product of:
                1.1613035 = boost
                9.097941 = idf(docFreq=12, maxDocs=42740)
                0.012742592 = queryNorm
              0.56862134 = fieldWeight in 411, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.097941 = idf(docFreq=12, maxDocs=42740)
                0.0625 = fieldNorm(doc=411)
          0.23371604 = weight(abstract_txt:tweets in 411) [ClassicSimilarity], result of:
            0.23371604 = score(doc=411,freq=6.0), product of:
              0.19645001 = queryWeight, product of:
                1.9838711 = boost
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.012742592 = queryNorm
              1.1896973 = fieldWeight in 411, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.0625 = fieldNorm(doc=411)
          0.45548514 = weight(abstract_txt:twitter in 411) [ClassicSimilarity], result of:
            0.45548514 = score(doc=411,freq=6.0), product of:
              0.41599602 = queryWeight, product of:
                4.5645924 = boost
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.012742592 = queryNorm
              1.0949267 = fieldWeight in 411, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.0625 = fieldNorm(doc=411)
        0.2 = coord(5/25)
    
  4. Haustein, S.; Bowman, T.D.; Holmberg, K.; Tsou, A.; Sugimoto, C.R.; Larivière, V.: Tweets as impact indicators : Examining the implications of automated "bot" accounts on Twitter (2016) 0.14
    0.13878398 = sum of:
      0.13878398 = product of:
        0.6939199 = sum of:
          0.018807271 = weight(abstract_txt:content in 4503) [ClassicSimilarity], result of:
            0.018807271 = score(doc=4503,freq=1.0), product of:
              0.05733951 = queryWeight, product of:
                1.0718017 = boost
                4.1983805 = idf(docFreq=1744, maxDocs=42740)
                0.012742592 = queryNorm
              0.32799846 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1983805 = idf(docFreq=1744, maxDocs=42740)
                0.078125 = fieldNorm(doc=4503)
          0.08134401 = weight(abstract_txt:tweet in 4503) [ClassicSimilarity], result of:
            0.08134401 = score(doc=4503,freq=1.0), product of:
              0.12081211 = queryWeight, product of:
                1.1000886 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.012742592 = queryNorm
              0.67331004 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.078125 = fieldNorm(doc=4503)
          0.009623366 = weight(abstract_txt:which in 4503) [ClassicSimilarity], result of:
            0.009623366 = score(doc=4503,freq=1.0), product of:
              0.04199058 = queryWeight, product of:
                1.1233342 = boost
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.012742592 = queryNorm
              0.22917917 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.078125 = fieldNorm(doc=4503)
          0.11926772 = weight(abstract_txt:tweets in 4503) [ClassicSimilarity], result of:
            0.11926772 = score(doc=4503,freq=1.0), product of:
              0.19645001 = queryWeight, product of:
                1.9838711 = boost
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.012742592 = queryNorm
              0.60711485 = fieldWeight in 4503, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.078125 = fieldNorm(doc=4503)
          0.46487755 = weight(abstract_txt:twitter in 4503) [ClassicSimilarity], result of:
            0.46487755 = score(doc=4503,freq=4.0), product of:
              0.41599602 = queryWeight, product of:
                4.5645924 = boost
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.012742592 = queryNorm
              1.1175048 = fieldWeight in 4503, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.078125 = fieldNorm(doc=4503)
        0.2 = coord(5/25)
    
  5. Arakawa, Y.; Kameda, A.; Aizawa, A.; Suzuki, T.: Adding Twitter-specific features to stylistic features for classifying tweets by user type and number of retweets (2014) 0.13
    0.13044058 = sum of:
      0.13044058 = product of:
        0.65220284 = sum of:
          0.065075204 = weight(abstract_txt:tweet in 3308) [ClassicSimilarity], result of:
            0.065075204 = score(doc=3308,freq=1.0), product of:
              0.12081211 = queryWeight, product of:
                1.1000886 = boost
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.012742592 = queryNorm
              0.538648 = fieldWeight in 3308, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.618368 = idf(docFreq=20, maxDocs=42740)
                0.0625 = fieldNorm(doc=3308)
          0.07655434 = weight(abstract_txt:followers in 3308) [ClassicSimilarity], result of:
            0.07655434 = score(doc=3308,freq=1.0), product of:
              0.1346315 = queryWeight, product of:
                1.1613035 = boost
                9.097941 = idf(docFreq=12, maxDocs=42740)
                0.012742592 = queryNorm
              0.56862134 = fieldWeight in 3308, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.097941 = idf(docFreq=12, maxDocs=42740)
                0.0625 = fieldNorm(doc=3308)
          0.023234515 = weight(abstract_txt:researchers in 3308) [ClassicSimilarity], result of:
            0.023234515 = score(doc=3308,freq=1.0), product of:
              0.07660654 = queryWeight, product of:
                1.2388546 = boost
                4.852748 = idf(docFreq=906, maxDocs=42740)
                0.012742592 = queryNorm
              0.30329674 = fieldWeight in 3308, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.852748 = idf(docFreq=906, maxDocs=42740)
                0.0625 = fieldNorm(doc=3308)
          0.16526219 = weight(abstract_txt:tweets in 3308) [ClassicSimilarity], result of:
            0.16526219 = score(doc=3308,freq=3.0), product of:
              0.19645001 = queryWeight, product of:
                1.9838711 = boost
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.012742592 = queryNorm
              0.84124297 = fieldWeight in 3308, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.77107 = idf(docFreq=48, maxDocs=42740)
                0.0625 = fieldNorm(doc=3308)
          0.32207662 = weight(abstract_txt:twitter in 3308) [ClassicSimilarity], result of:
            0.32207662 = score(doc=3308,freq=3.0), product of:
              0.41599602 = queryWeight, product of:
                4.5645924 = boost
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.012742592 = queryNorm
              0.77423006 = fieldWeight in 3308, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.152031 = idf(docFreq=90, maxDocs=42740)
                0.0625 = fieldNorm(doc=3308)
        0.2 = coord(5/25)
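
The score trees above all follow the same arithmetic, which can be sketched in a few lines. This is a minimal illustration and not the search engine's own code: the function names `classic_idf` and `term_score` are hypothetical, and the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumed because they reproduce the values printed in the listing. The input numbers are copied from entry 1 (doc 5684).

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed idf formula; it reproduces e.g. 8.618368 for docFreq=20, maxDocs=42740.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in each "score(doc=..., freq=...)" node.
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm          # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


MAX_DOCS = 42740
QUERY_NORM = 0.012742592
FIELD_NORM = 0.0625  # fieldNorm(doc=5684)

# (term, termFreq, docFreq, boost) for the seven matched terms of doc 5684
matched_terms = [
    ("content", 1, 1744, 1.0718017),
    ("tweet", 3, 20, 1.1000886),
    ("which", 2, 6181, 1.1233342),
    ("metadata", 1, 859, 1.2524387),
    ("tweets", 7, 48, 1.9838711),
    ("dataset", 1, 104, 2.6839573),
    ("twitter", 5, 90, 4.5645924),
]

# coord(7/25): 7 of the 25 query terms occur in the document's abstract.
coord = 7 / 25

term_sum = sum(
    term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM)
    for _, freq, doc_freq, boost in matched_terms
)
total = term_sum * coord

print(f"sum of term scores: {term_sum:.7f}")
print(f"document score:     {total:.7f}")
```

Under these assumptions the result agrees with the 0.26205263 reported for entry 1 up to small rounding differences, since the listing appears to use single-precision arithmetic. The same calculation applies to the other entries with their own frequencies, `fieldNorm`, and `coord` values.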