Search (4 results, page 1 of 1)

  • author_ss:"Sun, A."
  • year_i:[2010 TO 2020}
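  Both filters use Lucene/Solr field and range query syntax. In the year filter, the opening square bracket makes 2010 inclusive while the closing curly brace makes 2020 exclusive, so the filter matches publication years 2010 through 2019. A hypothetical variant with both bounds inclusive would read:

      year_i:[2010 TO 2020]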
  1. Qu, B.; Cong, G.; Li, C.; Sun, A.; Chen, H.: An evaluation of classification models for question topic categorization (2012) 0.01
    0.0069548762 = product of:
      0.0139097525 = sum of:
        0.0139097525 = product of:
          0.041729257 = sum of:
            0.041729257 = weight(_text_:c in 237) [ClassicSimilarity], result of:
              0.041729257 = score(doc=237,freq=4.0), product of:
                0.15484828 = queryWeight, product of:
                  3.4494052 = idf(docFreq=3817, maxDocs=44218)
                  0.044891298 = queryNorm
                0.2694848 = fieldWeight in 237, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.4494052 = idf(docFreq=3817, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=237)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions organized into a hierarchy of more than 1,000 categories. To the best of our knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification, and on short texts more generally. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-words features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show which aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.
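    The classifiers and features named in this abstract are standard text-classification baselines. As a rough illustration only, here is a minimal bag-of-words question-classification sketch using scikit-learn; the example questions, category labels, and the choice of a linear SVM are placeholders, not the Yahoo! Answers data or the exact setup evaluated in the paper.

        # Minimal sketch: unigram + bigram bag-of-words features with a linear SVM.
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        # Toy questions and categories, invented for illustration.
        questions = [
            "How do I replace the battery in my laptop?",
            "What is a good recipe for banana bread?",
            "Why does my code throw a null pointer exception?",
            "How long should I roast a chicken?",
        ]
        categories = ["Computers", "Food", "Computers", "Food"]

        model = make_pipeline(
            CountVectorizer(ngram_range=(1, 2)),   # n-gram / bag-of-words features
            LinearSVC(),                           # one of the standard classifiers compared
        )
        model.fit(questions, categories)
        print(model.predict(["How do I install more memory in my PC?"]))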
  2. Sun, A.; Bhowmick, S.S.; Nguyen, K.T.N.; Bai, G.: Tag-based social image retrieval : an empirical evaluation (2011) 0.01
    0.0058798613 = product of:
      0.011759723 = sum of:
        0.011759723 = product of:
          0.035279166 = sum of:
            0.035279166 = weight(_text_:i in 4938) [ClassicSimilarity], result of:
              0.035279166 = score(doc=4938,freq=2.0), product of:
                0.16931784 = queryWeight, product of:
                  3.7717297 = idf(docFreq=2765, maxDocs=44218)
                  0.044891298 = queryNorm
                0.20836058 = fieldWeight in 4938, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7717297 = idf(docFreq=2765, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4938)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Tags associated with social images are a valuable information source for superior image search and retrieval experiences. Although various heuristics are valuable for boosting tag-based image search, a general framework for studying the impact of these heuristics has been lacking. Specifically, the task of ranking images matching a given tag query, based on their associated tags, in descending order of relevance has not been well studied. In this article, we take the first step and propose a generic, flexible, and extensible framework for this task, and exploit it for a systematic and comprehensive empirical evaluation of various methods for ranking images. To this end, we identify five orthogonal dimensions to quantify the matching score between a tagged image and a tag query. These five dimensions are: (i) tag relatedness, to measure how effectively a tag describes the tagged image; (ii) tag discrimination, to quantify the degree of discrimination of a tag with respect to the entire tagged image collection; (iii) tag length normalization, analogous to document length normalization in web search; (iv) tag-query matching model, for computing the matching score between an image tag and a query tag; and (v) query model, for tag query rewriting. For each dimension, we identify a few implementations and evaluate their impact on the NUS-WIDE dataset, the largest human-annotated dataset, consisting of more than 269K tagged images from Flickr. We evaluate 81 single-tag queries and 443 multi-tag queries over 288 search methods and systematically compare their performance using standard metrics, including Precision at top-K, Mean Average Precision (MAP), Recall, and Normalized Discounted Cumulative Gain (NDCG).
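    The evaluation metrics listed at the end of this abstract are standard ranking measures. As a small illustration, the sketch below computes one of them, NDCG, over a hypothetical list of graded relevance judgments for the top-ranked images of a tag query; the relevance values are invented, and this is not the authors' evaluation code.

        import math

        def dcg(relevances):
            # Discounted cumulative gain of a ranked list of graded relevances.
            return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

        def ndcg(relevances):
            # Normalize by the DCG of the ideal (descending) ordering.
            ideal = dcg(sorted(relevances, reverse=True))
            return dcg(relevances) / ideal if ideal > 0 else 0.0

        # Hypothetical relevance grades (0-3) of the top five images returned for a query.
        print(ndcg([3, 2, 3, 0, 1]))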
  3. Li, C.; Sun, A.; Datta, A.: TSDW: Two-stage word sense disambiguation using Wikipedia (2013) 0.00
    0.0049178395 = product of:
      0.009835679 = sum of:
        0.009835679 = product of:
          0.029507035 = sum of:
            0.029507035 = weight(_text_:c in 956) [ClassicSimilarity], result of:
              0.029507035 = score(doc=956,freq=2.0), product of:
                0.15484828 = queryWeight, product of:
                  3.4494052 = idf(docFreq=3817, maxDocs=44218)
                  0.044891298 = queryNorm
                0.1905545 = fieldWeight in 956, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.4494052 = idf(docFreq=3817, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=956)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  4. Li, C.; Sun, A.: Extracting fine-grained location with temporal awareness in tweets : a two-stage approach (2017) 0.00
    0.0039342716 = product of:
      0.007868543 = sum of:
        0.007868543 = product of:
          0.02360563 = sum of:
            0.02360563 = weight(_text_:c in 3686) [ClassicSimilarity], result of:
              0.02360563 = score(doc=3686,freq=2.0), product of:
                0.15484828 = queryWeight, product of:
                  3.4494052 = idf(docFreq=3817, maxDocs=44218)
                  0.044891298 = queryNorm
                0.1524436 = fieldWeight in 3686, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.4494052 = idf(docFreq=3817, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3686)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
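  Each score breakdown above is a Lucene ClassicSimilarity (TF-IDF) explanation. The sketch below reproduces the arithmetic for the first result using the figures shown in its breakdown; the variable names are illustrative, and queryNorm and fieldNorm are taken directly from the breakdown rather than recomputed.

      import math

      # Figures from the first breakdown (doc 237, query term "c", freq=4).
      max_docs   = 44218
      doc_freq   = 3817
      query_norm = 0.044891298   # depends on the full query; taken from the breakdown
      field_norm = 0.0390625     # index-time length normalization; taken from the breakdown
      freq       = 4.0

      idf          = 1.0 + math.log(max_docs / (doc_freq + 1))  # 3.4494052
      tf           = math.sqrt(freq)                            # 2.0
      query_weight = idf * query_norm                           # 0.15484828
      field_weight = tf * idf * field_norm                      # 0.2694848
      term_score   = query_weight * field_weight                # 0.041729257

      # coord(1/3) and coord(1/2): only one of three query terms and one of two
      # outer clauses matched this document.
      final_score = term_score / 3.0 / 2.0                      # ~0.0069549
      print(final_score)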