Search (3 results, page 1 of 1)

  • theme_ss:"Informetrie"
  • author_ss:"Sun, A."
  1. Sedhai, S.; Sun, A.: An analysis of 14 million tweets on hashtag-oriented spamming (2017) 0.05
    0.05084972 = product of:
      0.10169944 = sum of:
        0.011771114 = product of:
          0.047084454 = sum of:
            0.047084454 = weight(_text_:based in 3683) [ClassicSimilarity], result of:
              0.047084454 = score(doc=3683,freq=8.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.33289194 = fieldWeight in 3683, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3683)
          0.25 = coord(1/4)
        0.08992833 = weight(_text_:frequency in 3683) [ClassicSimilarity], result of:
          0.08992833 = score(doc=3683,freq=2.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.32531026 = fieldWeight in 3683, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3683)
      0.5 = coord(2/4)
    
    Abstract
    Over the years, Twitter has become a popular platform for information dissemination and information gathering. However, the popularity of Twitter has attracted not only legitimate users but also spammers who exploit social graphs, popular keywords, and hashtags for malicious purposes. In this paper, we present a detailed analysis of the HSpam14 dataset, which contains 14 million tweets with spam and ham (i.e., nonspam) labels, to understand spamming activities on Twitter. The primary focus of this paper is to analyze various aspects of spam on Twitter based on hashtags, tweet contents, and user profiles, which are useful for both tweet-level and user-level spam detection. First, we compare the usage of hashtags in spam and ham tweets based on frequency, position, orthography, and co-occurrence. Second, for content-based analysis, we analyze the variations in word usage, metadata, and near-duplicate tweets. Third, for user-based analysis, we investigate user profile information. In our study, we validate that spammers use popular hashtags to promote their tweets. We also observe differences in the usage of words in spam and ham tweets. Spam tweets are more likely to be emphasized using exclamation points and capitalized words. Furthermore, we observe that spammers use multiple accounts to post near-duplicate tweets to promote their services and products. Unlike spammers, legitimate users are likely to provide more information such as their locations and personal descriptions in their profiles. In summary, this study presents a comprehensive analysis of hashtags, tweet contents, and user profiles in Twitter spamming.
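    The score breakdown above is a Lucene ClassicSimilarity explain tree. Each matched term contributes queryWeight × fieldWeight, where queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, tf = √freq, and idf = 1 + ln(maxDocs / (docFreq + 1)); the coord(m/n) factors scale by the fraction of query clauses matched. A minimal sketch, using the values from the tree for doc 3683, that reproduces the 0.05 score:

    ```python
    import math

    def idf(doc_freq, max_docs):
        # ClassicSimilarity inverse document frequency:
        # idf = 1 + ln(maxDocs / (docFreq + 1))
        return 1.0 + math.log(max_docs / (doc_freq + 1))

    def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
        # weight = queryWeight * fieldWeight, with tf = sqrt(freq)
        query_weight = idf(doc_freq, max_docs) * query_norm
        field_weight = math.sqrt(freq) * idf(doc_freq, max_docs) * field_norm
        return query_weight * field_weight

    # Constants copied from the explain tree above (queryNorm and fieldNorm).
    based = term_score(8.0, 5906, 44218, 0.04694356, 0.0390625)     # ~0.04708
    frequency = term_score(2.0, 332, 44218, 0.04694356, 0.0390625)  # ~0.08993
    # Inner coord(1/4) on the "based" clause, outer coord(2/4) on the sum.
    total = (based * 0.25 + frequency) * 0.5
    print(f"{total:.8f}")  # ~0.05084972
    ```

    Small discrepancies in the last decimal places are expected, since Lucene computes these factors in single-precision floats.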
  2. Li, C.; Sun, A.: Extracting fine-grained location with temporal awareness in tweets : a two-stage approach (2017) 0.03
    0.034293205 = product of:
      0.06858641 = sum of:
        0.0047084456 = product of:
          0.018833783 = sum of:
            0.018833783 = weight(_text_:based in 3686) [ClassicSimilarity], result of:
              0.018833783 = score(doc=3686,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.13315678 = fieldWeight in 3686, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3686)
          0.25 = coord(1/4)
        0.06387796 = weight(_text_:term in 3686) [ClassicSimilarity], result of:
          0.06387796 = score(doc=3686,freq=4.0), product of:
            0.21904005 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.04694356 = queryNorm
            0.29162687 = fieldWeight in 3686, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.03125 = fieldNorm(doc=3686)
      0.5 = coord(2/4)
    
    Abstract
    Twitter has attracted billions of users for life logging and sharing activities and opinions. In their tweets, users often reveal their location information and short-term visiting histories or plans. Capturing user's short-term activities could benefit many applications for providing the right context at the right time and location. In this paper we are interested in extracting locations mentioned in tweets at fine-grained granularity, with temporal awareness. Specifically, we recognize the points-of-interest (POIs) mentioned in a tweet and predict whether the user has visited, is currently at, or will soon visit the mentioned POIs. A POI can be a restaurant, a shopping mall, a bookstore, or any other fine-grained location. Our proposed framework, named TS-Petar (Two-Stage POI Extractor with Temporal Awareness), consists of two main components: a POI inventory and a two-stage time-aware POI tagger. The POI inventory is built by exploiting the crowd wisdom of the Foursquare community. It contains both POIs' formal names and their informal abbreviations, commonly observed in Foursquare check-ins. The time-aware POI tagger, based on the Conditional Random Field (CRF) model, is devised to disambiguate the POI mentions and to resolve their associated temporal awareness accordingly. Three sets of contextual features (linguistic, temporal, and inventory features) and two labeling schema features (OP and BILOU schemas) are explored for the time-aware POI extraction task. Our empirical study shows that the subtask of POI disambiguation and the subtask of temporal awareness resolution call for different feature settings for best performance. We have also evaluated the proposed TS-Petar against several strong baseline methods. The experimental results demonstrate that the two-stage approach achieves the best accuracy and outperforms all baseline methods in terms of both effectiveness and efficiency.
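    The BILOU schema mentioned in the abstract is a standard span-encoding scheme for sequence labeling (Beginning, Inside, Last, Outside, Unit). A minimal sketch of how a POI mention would be encoded under it, using a made-up token sequence (not an example from the paper):

    ```python
    def bilou_tags(n_tokens, span_start, span_end, label="POI"):
        # Encode one mention spanning tokens [span_start, span_end] (inclusive).
        tags = ["O"] * n_tokens          # everything outside the span is O
        if span_start == span_end:
            tags[span_start] = f"U-{label}"   # single-token mention: Unit
        else:
            tags[span_start] = f"B-{label}"   # first token: Beginning
            for i in range(span_start + 1, span_end):
                tags[i] = f"I-{label}"        # middle tokens: Inside
            tags[span_end] = f"L-{label}"     # final token: Last
        return tags

    tokens = ["meet", "me", "at", "marina", "bay", "sands", "tonight"]
    print(bilou_tags(len(tokens), 3, 5))
    # ['O', 'O', 'O', 'B-POI', 'I-POI', 'L-POI', 'O']
    ```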
  3. Zheng, X.; Sun, A.: Collecting event-related tweets from twitter stream (2019) 0.00
    0.0017656671 = product of:
      0.0070626684 = sum of:
        0.0070626684 = product of:
          0.028250674 = sum of:
            0.028250674 = weight(_text_:based in 4672) [ClassicSimilarity], result of:
              0.028250674 = score(doc=4672,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.19973516 = fieldWeight in 4672, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4672)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Twitter provides a channel for collecting and publishing instant information on major events like natural disasters. However, the volume of information flowing through Twitter is enormous. For a specific event, messages collected from the Twitter Stream based on either a location constraint or predefined keywords would contain a lot of noise. In this article, we propose a method to achieve both high precision and high recall in collecting event-related tweets. Our method involves an automatic keyword generation component and an event-related tweet identification component. For keyword generation, we consider three properties of candidate keywords, namely relevance, coverage, and evolvement. The keyword updating mechanism enables our method to track the main topics of tweets along event development. To minimize annotation effort in identifying event-related tweets, we adopt active learning and incorporate multiple-instance learning, which assigns labels to bags instead of instances (that is, individual tweets). Through experiments on two real-world events, we demonstrate the superiority of our method against state-of-the-art alternatives.
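    Multiple-instance learning, as described in the abstract, attaches labels to bags of tweets rather than to individual tweets. A generic sketch of the usual bag-labeling convention (a bag is positive if at least one of its instances is positive), illustrating the idea only and not the authors' implementation:

    ```python
    def bag_label(instance_labels):
        # Standard MIL assumption: a bag is positive iff any instance is positive.
        return any(instance_labels)

    # Hypothetical bags of per-tweet relevance labels (True = event-related).
    bags = {
        "bag1": [False, True, False],  # one event-related tweet -> positive bag
        "bag2": [False, False],        # no event-related tweets  -> negative bag
    }
    labels = {name: bag_label(tweets) for name, tweets in bags.items()}
    print(labels)  # {'bag1': True, 'bag2': False}
    ```

    The annotation saving comes from labeling at the bag level: an annotator judges a whole bag at once instead of every tweet inside it.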
