This database contains more than 40,000 documents on topics from the fields of descriptive cataloguing, subject indexing and information retrieval.
© 2015 W. Gödert, TH Köln, Institut für Informationswissenschaft / Powered by litecat, BIS Oldenburg (as of 28 April 2022)
1. Liu, B. ; Yuan, Q. ; Cong, G. ; Xu, D.: Where your photo is taken : geolocation prediction for social images.
In: Journal of the Association for Information Science and Technology. 65(2014) no.6, S.1232-1243.
Abstract: Social image-sharing websites have attracted a large number of users. These systems allow users to associate geolocation information with their images, which is essential for many interesting applications. However, only a small fraction of social images have geolocation information. Thus, an automated tool for suggesting geolocation is essential to help users geotag their images. In this article, we use a large data set consisting of 221 million Flickr images uploaded by 2.2 million users. For the first time, we analyze user uploading patterns, user geotagging behaviors, and the relationship between the taken-time gap and the geographical distance between two images from the same user. Based on the findings, we represent a user profile by historical tags for the user and build a multinomial model on the user profile for geotagging. We further propose a unified framework to suggest geolocations for images, which combines the information from both image tags and the user profile. Experimental results show that for images uploaded by users who have never done geotagging, our method outperforms the state-of-the-art method by 10.6 to 34.2%, depending on the granularity of the prediction. For images from users who have done geotagging, a simple method is able to achieve very high accuracy.
Form covered: Images
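The multinomial user-profile model summarized in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that locations have already been discretized into named regions, estimates a Laplace-smoothed tag distribution per region from geotagged photos, and picks the region that maximizes log P(region) + Σ log P(tag | region) over the tags in a user's history. The region ids, the toy tags and the function names `train_region_models` and `predict_region` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_region_models(geotagged_photos):
    """Count tags per region from (region_id, tags) pairs.

    Hypothetical helper: regions are assumed to be precomputed grid cells or
    place names; the paper's own discretization is not reproduced here.
    """
    tag_counts = defaultdict(Counter)  # region -> tag frequencies
    photo_counts = Counter()           # region -> number of photos (used as prior)
    vocab = set()
    for region, tags in geotagged_photos:
        tag_counts[region].update(tags)
        photo_counts[region] += 1
        vocab.update(tags)
    return tag_counts, photo_counts, vocab


def predict_region(profile_tags, tag_counts, photo_counts, vocab, alpha=1.0):
    """Return the region with the highest multinomial log-likelihood.

    profile_tags: the user's historical tags (the "user profile").
    alpha: Laplace smoothing constant (an assumed value, not from the paper).
    """
    total_photos = sum(photo_counts.values())
    v = len(vocab)
    profile = Counter(profile_tags)
    best, best_score = None, -math.inf
    for region, counts in tag_counts.items():
        n = sum(counts.values())
        score = math.log(photo_counts[region] / total_photos)  # log prior
        for tag, freq in profile.items():
            p = (counts[tag] + alpha) / (n + alpha * v)  # smoothed P(tag | region)
            score += freq * math.log(p)
        if score > best_score:
            best, best_score = region, score
    return best


if __name__ == "__main__":
    photos = [
        ("paris", ["eiffel", "louvre", "seine"]),
        ("paris", ["eiffel", "cafe"]),
        ("tokyo", ["shibuya", "sushi", "temple"]),
        ("tokyo", ["sushi", "ramen"]),
    ]
    model = train_region_models(photos)
    print(predict_region(["eiffel", "cafe"], *model))  # prints "paris"
```

Smoothing keeps a tag that never occurred in a region from driving that region's score to minus infinity; the abstract's unified framework, which also combines the tags of the image itself, is not modeled here.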
2. Ma, Z. ; Sun, A. ; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter.
In: Journal of the American Society for Information Science and Technology. 64(2013) no.7, S.1399-1410.
Abstract: Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag, and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
Subject area: Automatic classification ; Data mining
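Micro-F1, the measure used to rank the five classifiers in this abstract, pools true positives, false positives and false negatives over all classes before computing precision and recall. A minimal sketch follows; the label names ("pop", "not") and the optional `labels` restriction are hypothetical, and the abstract does not say how the popularity classes were defined.

```python
def micro_f1(y_true, y_pred, labels=None):
    """Micro-averaged F1 for single-label classification.

    TP, FP and FN are summed over every class in `labels` (default: all
    classes seen) before precision and recall are computed once.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == p:
            if t in labels:
                tp += 1
        else:
            if p in labels:   # predicted class p gains a false positive
                fp += 1
            if t in labels:   # true class t gains a false negative
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    y_true = ["pop", "not", "not", "pop", "not"]
    y_pred = ["pop", "not", "pop", "not", "not"]
    print(micro_f1(y_true, y_pred))
```

When every item receives exactly one predicted label and all classes are pooled, each error counts once as a false positive and once as a false negative, so micro-F1 reduces to accuracy (3 of 5 correct in the toy data gives 0.6).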
3. Qu, B. ; Cong, G. ; Li, C. ; Sun, A. ; Chen, H.: An evaluation of classification models for question topic categorization.
In: Journal of the American Society for Information Science and Technology. 63(2012) no.5, S.889-903.
Abstract: We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions, and these questions are organized into more than 1,000 categories in a hierarchy. To the best of our knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification as well as short texts. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-words features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of the state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show what aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.
Subject area: Automatic classification
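Points (a) and (b) of this abstract, n-gram features and a naive Bayes classifier, can be sketched together in a few lines. This is an illustrative multinomial naive Bayes over word unigrams and bigrams, not the authors' setup: the tokenizer, the smoothing constant, the flat (non-hierarchical) category set and the toy questions are all assumptions, and the class name `NaiveBayes` is hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def ngram_features(text, n_max=2):
    """Bag of word n-grams (n = 1..n_max) from a lowercased question."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())  # assumed simple tokenizer
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing over n-gram counts."""

    def __init__(self, n_max=2, alpha=1.0):
        self.n_max, self.alpha = n_max, alpha

    def fit(self, questions, categories):
        self.class_docs = Counter(categories)          # documents per category (prior)
        self.feat_counts = defaultdict(Counter)        # category -> n-gram counts
        for q, c in zip(questions, categories):
            self.feat_counts[c].update(ngram_features(q, self.n_max))
        self.vocab = {f for counts in self.feat_counts.values() for f in counts}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.feat_counts.items()}
        return self

    def predict(self, question):
        feats = ngram_features(question, self.n_max)
        n_docs = sum(self.class_docs.values())
        v = len(self.vocab)

        def log_posterior(c):
            score = math.log(self.class_docs[c] / n_docs)
            for f, k in feats.items():
                if f in self.vocab:  # n-grams never seen in training are ignored
                    p = (self.feat_counts[c][f] + self.alpha) / (self.totals[c] + self.alpha * v)
                    score += k * math.log(p)
            return score

        return max(self.class_docs, key=log_posterior)


if __name__ == "__main__":
    qs = ["How do I fix my car engine", "Best car engine oil",
          "How do I bake bread", "Best bread recipe with yeast"]
    cats = ["Cars", "Cars", "Food", "Food"]
    clf = NaiveBayes().fit(qs, cats)
    print(clf.predict("car engine noise"))  # prints "Cars"
```

Setting `n_max=1` gives a plain bag-of-words model, so the same class can be used to compare the two feature sets the abstract mentions; the hierarchical classifiers in point (c) are not covered by this sketch.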