Search (3 results, page 1 of 1)

  • author_ss:"Cong, G."
  • language_ss:"e"
  • year_i:[2010 TO 2020}
  1. Liu, B.; Yuan, Q.; Cong, G.; Xu, D.: Where your photo is taken : geolocation prediction for social images (2014) 0.02
    0.017620182 = product of:
      0.07048073 = sum of:
        0.07048073 = weight(_text_:social in 1290) [ClassicSimilarity], result of:
          0.07048073 = score(doc=1290,freq=6.0), product of:
            0.1847249 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.046325076 = queryNorm
            0.3815443 = fieldWeight in 1290, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1290)
      0.25 = coord(1/4)
    
    Abstract
    Social image-sharing websites have attracted a large number of users. These systems allow users to associate geolocation information with their images, which is essential for many interesting applications. However, only a small fraction of social images have geolocation information. Thus, an automated tool for suggesting geolocation is essential to help users geotag their images. In this article, we use a large data set consisting of 221 million Flickr images uploaded by 2.2 million users. For the first time, we analyze user uploading patterns, user geotagging behaviors, and the relationship between the taken-time gap and the geographical distance between two images from the same user. Based on the findings, we represent a user profile by historical tags for the user and build a multinomial model on the user profile for geotagging. We further propose a unified framework to suggest geolocations for images, which combines the information from both image tags and the user profile. Experimental results show that for images uploaded by users who have never done geotagging, our method outperforms the state-of-the-art method by 10.6 to 34.2%, depending on the granularity of the prediction. For images from users who have done geotagging, a simple method is able to achieve very high accuracy.
  2. Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.01
    0.010173016 = product of:
      0.040692065 = sum of:
        0.040692065 = weight(_text_:social in 967) [ClassicSimilarity], result of:
          0.040692065 = score(doc=967,freq=2.0), product of:
            0.1847249 = queryWeight, product of:
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.046325076 = queryNorm
            0.22028469 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9875789 = idf(docFreq=2228, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.25 = coord(1/4)
    
    Abstract
    Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
  3. Qu, B.; Cong, G.; Li, C.; Sun, A.; Chen, H.: An evaluation of classification models for question topic categorization (2012) 0.01
    0.0065351077 = product of:
      0.026140431 = sum of:
        0.026140431 = product of:
          0.052280862 = sum of:
            0.052280862 = weight(_text_:aspects in 237) [ClassicSimilarity], result of:
              0.052280862 = score(doc=237,freq=2.0), product of:
                0.20938325 = queryWeight, product of:
                  4.5198684 = idf(docFreq=1308, maxDocs=44218)
                  0.046325076 = queryNorm
                0.2496898 = fieldWeight in 237, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.5198684 = idf(docFreq=1308, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=237)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions, organized into more than 1,000 categories in a hierarchy. To the best of our knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification as well as short texts. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-words features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of the state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show what aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.
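
The score explanations listed with the three results above follow the TF-IDF arithmetic of ClassicSimilarity. The following is a minimal sketch of that arithmetic, assuming the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the tf and idf values shown in the listings. The function names (`idf`, `term_score`) are illustrative and not part of any library API; queryNorm, fieldNorm, and the coord factors are copied from the listings rather than derived.

```python
import math

# Constants copied from the explain output above (not derived here).
QUERY_NORM = 0.046325076
FIELD_NORM = 0.0390625
MAX_DOCS = 44218


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency (assumed form: 1 + ln(maxDocs / (docFreq + 1)))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                        # tf(freq) = sqrt(termFreq)
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM        # idf * queryNorm
    field_weight = tf * term_idf * FIELD_NORM   # tf * idf * fieldNorm
    return query_weight * field_weight


# Result 1 (doc 1290): term "social", freq=6, docFreq=2228, coord(1/4).
score_1290 = term_score(6.0, 2228) * 0.25

# Result 2 (doc 967): term "social", freq=2, docFreq=2228, coord(1/4).
score_967 = term_score(2.0, 2228) * 0.25

# Result 3 (doc 237): term "aspects", freq=2, docFreq=1308,
# with a nested coord(1/2) applied before the outer coord(1/4).
score_237 = term_score(2.0, 1308) * 0.5 * 0.25

for doc_id, score in [(1290, score_1290), (967, score_967), (237, score_237)]:
    print(f"doc={doc_id} score={score:.9f}")
```

The printed values should agree with the top-level scores in the listings up to small rounding differences, since the listings appear to use single-precision floats while Python uses double precision. The higher rank of result 1 comes entirely from the larger term frequency (6 versus 2); result 3 ranks last because its single match is halved by the inner coord factor.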
