Search (7 results, page 1 of 1)

  • author_ss:"Sun, A."
  1. Sedhai, S.; Sun, A.: An analysis of 14 million tweets on hashtag-oriented spamming (2017) 0.03
    0.03341625 = product of:
      0.0668325 = sum of:
        0.040692065 = weight(_text_:social in 3683) [ClassicSimilarity], result of:
          0.040692065 = score(doc=3683,freq=2.0), product of:
            0.1847249 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.046325076 = queryNorm
            0.22028469 = fieldWeight in 3683, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3683)
        0.026140431 = product of:
          0.052280862 = sum of:
            0.052280862 = weight(_text_:aspects in 3683) [ClassicSimilarity], result of:
              0.052280862 = score(doc=3683,freq=2.0), product of:
                0.20938325 = queryWeight, product of:
                  4.5198684 = idf(docFreq=1308, maxDocs=44218)
                  0.046325076 = queryNorm
                0.2496898 = fieldWeight in 3683, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.5198684 = idf(docFreq=1308, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3683)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Over the years, Twitter has become a popular platform for information dissemination and information gathering. However, the popularity of Twitter has attracted not only legitimate users but also spammers who exploit social graphs, popular keywords, and hashtags for malicious purposes. In this paper, we present a detailed analysis of the HSpam14 dataset, which contains 14 million tweets with spam and ham (i.e., nonspam) labels, to understand spamming activities on Twitter. The primary focus of this paper is to analyze various aspects of spam on Twitter based on hashtags, tweet contents, and user profiles, which are useful for both tweet-level and user-level spam detection. First, we compare the usage of hashtags in spam and ham tweets based on frequency, position, orthography, and co-occurrence. Second, for content-based analysis, we analyze the variations in word usage, metadata, and near-duplicate tweets. Third, for user-based analysis, we investigate user profile information. In our study, we validate that spammers use popular hashtags to promote their tweets. We also observe differences in the usage of words in spam and ham tweets. Spam tweets are more likely to be emphasized using exclamation points and capitalized words. Furthermore, we observe that spammers use multiple accounts to post near-duplicate tweets to promote their services and products. Unlike spammers, legitimate users are likely to provide more information such as their locations and personal descriptions in their profiles. In summary, this study presents a comprehensive analysis of hashtags, tweet contents, and user profiles in Twitter spamming.
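    The score breakdown listed for this result can be reproduced by hand. The sketch below assumes the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and copies queryNorm, fieldNorm, docFreq, and maxDocs from the explanation for doc 3683. The function names are illustrative and are not part of any Lucene API.

```python
import math

# Constants copied from the explanation for doc 3683.
QUERY_NORM = 0.046325076
FIELD_NORM = 0.0390625
MAX_DOCS = 44218


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int) -> float:
    """Score of one term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM
    field_weight = tf * term_idf * FIELD_NORM
    return query_weight * field_weight


# "social": freq=2.0, docFreq=2228
social = term_score(2.0, 2228)
# "aspects": freq=2.0, docFreq=1308, inside a nested clause with coord(1/2)
aspects = term_score(2.0, 1308)

# Outer sum of both clauses, then the outer coord(2/4) factor.
total = (social + aspects * 0.5) * 0.5

print(f"social={social:.8f} aspects={aspects:.8f} total={total:.8f}")
```

    Because Lucene computes in 32-bit floats and this sketch uses Python doubles, the results should agree with the listed values only up to rounding in the last digits.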
  2. Li, H.; Bhowmick, S.S.; Sun, A.: AffRank: affinity-driven ranking of products in online social rating networks (2011) 0.02
    0.017620182 = product of:
      0.07048073 = sum of:
        0.07048073 = weight(_text_:social in 4483) [ClassicSimilarity], result of:
          0.07048073 = score(doc=4483,freq=6.0), product of:
            0.1847249 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.046325076 = queryNorm
            0.3815443 = fieldWeight in 4483, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4483)
      0.25 = coord(1/4)
    
    Abstract
    Large online social rating networks (e.g., Epinions, Blippr) have recently come into being containing information related to various types of products. Typically, each product in these networks is associated with a group of members who have provided ratings and comments on it. These people form a product community. A potential member can join a product community by giving a new rating to the product. We refer to this phenomenon of a product community's ability to "attract" new members as product affinity. The knowledge of a ranked list of products based on product affinity is of much importance for implementing policies, marketing research, online advertisement, and other applications. In this article, we identify and analyze an array of features that exert effect on product affinity and propose a novel model, called AffRank, that utilizes these features to predict the future rank of products according to their affinities. Evaluated on two real-world datasets, we demonstrate the effectiveness and superior prediction quality of AffRank compared with baseline methods. Our experiments show that features such as affinity rank history, affinity evolution distance, and average rating are the most important factors affecting future rank of products. At the same time, interestingly, traditional community features (e.g., community size, member connectivity, and social context) have negligible influence on product affinities.
  3. Sun, A.; Bhowmick, S.S.; Nguyen, K.T.N.; Bai, G.: Tag-based social image retrieval: an empirical evaluation (2011) 0.02
    0.017620182 = product of:
      0.07048073 = sum of:
        0.07048073 = weight(_text_:social in 4938) [ClassicSimilarity], result of:
          0.07048073 = score(doc=4938,freq=6.0), product of:
            0.1847249 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.046325076 = queryNorm
            0.3815443 = fieldWeight in 4938, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4938)
      0.25 = coord(1/4)
    
    Abstract
    Tags associated with social images are a valuable information source for superior image search and retrieval experiences. Although various heuristics are valuable to boost tag-based search for images, there is a lack of a general framework to study the impact of these heuristics. Specifically, the task of ranking images matching a given tag query based on their associated tags in descending order of relevance has not been well studied. In this article, we take the first step to propose a generic, flexible, and extensible framework for this task and exploit it for a systematic and comprehensive empirical evaluation of various methods for ranking images. To this end, we identified five orthogonal dimensions to quantify the matching score between a tagged image and a tag query. These five dimensions are: (i) tag relatedness to measure the degree of effectiveness of a tag describing the tagged image; (ii) tag discrimination to quantify the degree of discrimination of a tag with respect to the entire tagged image collection; (iii) tag length normalization analogous to document length normalization in web search; (iv) tag-query matching model for the matching score computation between an image tag and a query tag; and (v) query model for tag query rewriting. For each dimension, we identify a few implementations and evaluate their impact on the NUS-WIDE dataset, the largest human-annotated dataset consisting of more than 269K tagged images from Flickr. We evaluated 81 single-tag queries and 443 multi-tag queries over 288 search methods and systematically compared their performance using standard metrics including Precision at top-K, Mean Average Precision (MAP), Recall, and Normalized Discounted Cumulative Gain (NDCG).
    Theme
    Social tagging
  4. Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.01
    0.010173016 = product of:
      0.040692065 = sum of:
        0.040692065 = weight(_text_:social in 967) [ClassicSimilarity], result of:
          0.040692065 = score(doc=967,freq=2.0), product of:
            0.1847249 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.046325076 = queryNorm
            0.22028469 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.25 = coord(1/4)
    
    Abstract
    Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
  5. Phan, M.C.; Sun, A.: Collective named entity recognition in user comments via parameterized label propagation (2020) 0.01
    0.010173016 = product of:
      0.040692065 = sum of:
        0.040692065 = weight(_text_:social in 5815) [ClassicSimilarity], result of:
          0.040692065 = score(doc=5815,freq=2.0), product of:
            0.1847249 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.046325076 = queryNorm
            0.22028469 = fieldWeight in 5815, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5815)
      0.25 = coord(1/4)
    
    Abstract
    Named entity recognition (NER) in the past has focused on extracting mentions in a local region, within a sentence or short paragraph. When dealing with user-generated text, the diverse and informal writing style makes traditional approaches much less effective. On the other hand, in many types of text on social media such as user comments, tweets, or question-answer posts, contextual connections between documents do exist. Examples include posts in a thread discussing the same topic and tweets that share a hashtag about the same entity. Our idea in this work is to utilize the related contexts across documents to perform mention recognition in a collective manner. Intuitively, within a mention coreference graph, the labels of mentions are expected to propagate from more confident cases to less confident ones. To this end, we propose a novel semisupervised inference algorithm named parameterized label propagation. In our model, the propagation weights between mentions are learned by an attention-like mechanism, given their local contexts and the initial labels as input. We study the performance of our approach on the Yahoo! News data set, where comments and articles within a thread share similar context. The results show that our model significantly outperforms all other noncollective NER baselines.
  6. Qu, B.; Cong, G.; Li, C.; Sun, A.; Chen, H.: An evaluation of classification models for question topic categorization (2012) 0.01
    0.0065351077 = product of:
      0.026140431 = sum of:
        0.026140431 = product of:
          0.052280862 = sum of:
            0.052280862 = weight(_text_:aspects in 237) [ClassicSimilarity], result of:
              0.052280862 = score(doc=237,freq=2.0), product of:
                0.20938325 = queryWeight, product of:
                  4.5198684 = idf(docFreq=1308, maxDocs=44218)
                  0.046325076 = queryNorm
                0.2496898 = fieldWeight in 237, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.5198684 = idf(docFreq=1308, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=237)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions and these questions are organized into more than 1,000 categories in a hierarchy. To the best of our knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification as well as short texts. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-words features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of the state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show what aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.
  7. Sun, A.; Lim, E.-P.: Web unit-based mining of homepage relationships (2006) 0.00
    0.0039227554 = product of:
      0.015691021 = sum of:
        0.015691021 = product of:
          0.031382043 = sum of:
            0.031382043 = weight(_text_:22 in 5274) [ClassicSimilarity], result of:
              0.031382043 = score(doc=5274,freq=2.0), product of:
                0.16222252 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046325076 = queryNorm
                0.19345059 = fieldWeight in 5274, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5274)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 7.2006 16:18:25