Search (7 results, page 1 of 1)

  • author_ss:"Li, C."
  • type_ss:"a"
  1. Cheang, B.; Chu, S.K.W.; Li, C.; Lim, A.: A multidimensional approach to evaluating management journals : refining PageRank via the differentiation of citation types and identifying the roles that management journals play (2014) 0.00
    0.0026849252 = product of:
      0.0053698504 = sum of:
        0.0053698504 = product of:
          0.010739701 = sum of:
            0.010739701 = weight(_text_:a in 1551) [ClassicSimilarity], result of:
              0.010739701 = score(doc=1551,freq=14.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20223314 = fieldWeight in 1551, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1551)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
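     The nested breakdown above (and in the entries below) is standard Lucene/Solr "explain" output for the classic TF-IDF similarity. A minimal Python sketch that reproduces the figures for this entry, assuming Lucene's ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))):

     import math

     # Figures copied from the explain tree for result 1 (doc 1551, term "a").
     doc_freq, max_docs = 37942, 44218
     freq = 14.0                                          # term frequency in the field
     query_norm = 0.046056706
     field_norm = 0.046875                                # length norm stored at index time

     idf = 1 + math.log(max_docs / (doc_freq + 1))        # ~1.153047
     query_weight = idf * query_norm                      # ~0.053105544
     field_weight = math.sqrt(freq) * idf * field_norm    # ~0.20223314
     term_score = query_weight * field_weight             # ~0.010739701

     # Two coord(1/2) factors halve the score twice, giving the listed total.
     print(term_score * 0.5 * 0.5)                        # ~0.0026849252

     The same arithmetic, with different freq and fieldNorm values, accounts for the scores of the remaining six entries.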
    
    Abstract
    In this article, the authors introduce two citation-based approaches to facilitate a multidimensional evaluation of 39 selected management journals. The first is a refined application of PageRank via the differentiation of citation types. The second is a form of mathematical manipulation to identify the roles that the selected management journals play. Their findings reveal that Academy of Management Journal, Academy of Management Review, and Administrative Science Quarterly are the top three management journals, respectively. They also discovered that these three journals play the role of a knowledge hub in the domain. Finally, when compared with Journal Citation Reports (Thomson Reuters, Philadelphia, PA), their results closely match expert opinions.
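     The citation-type weights used by the authors are not given in this abstract; purely as an illustration of the general idea, here is a small power-iteration PageRank over a citation graph whose edges carry hypothetical type weights (the journals and weights below are placeholders, not the paper's data):

     # Hypothetical citation-type weights; not the values used by Cheang et al.
     TYPE_WEIGHT = {"substantive": 1.0, "perfunctory": 0.3}

     # citations[(citing, cited)] = citation type (toy graph)
     citations = {
         ("J1", "J2"): "substantive",
         ("J2", "J3"): "substantive",
         ("J3", "J1"): "perfunctory",
         ("J1", "J3"): "substantive",
     }
     journals = sorted({j for edge in citations for j in edge})

     def weighted_pagerank(damping=0.85, iters=50):
         rank = {j: 1.0 / len(journals) for j in journals}
         out_weight = {j: 0.0 for j in journals}
         for (src, _), ctype in citations.items():
             out_weight[src] += TYPE_WEIGHT[ctype]
         for _ in range(iters):
             new = {j: (1 - damping) / len(journals) for j in journals}
             for (src, dst), ctype in citations.items():
                 new[dst] += damping * rank[src] * TYPE_WEIGHT[ctype] / out_weight[src]
             rank = new
         return rank

     print(weighted_pagerank())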
    Type
    a
  2. Li, C.; Sun, A.; Datta, A.: TSDW: Two-stage word sense disambiguation using Wikipedia (2013) 0.00
    0.0023919214 = product of:
      0.0047838427 = sum of:
        0.0047838427 = product of:
          0.009567685 = sum of:
            0.009567685 = weight(_text_:a in 956) [ClassicSimilarity], result of:
              0.009567685 = score(doc=956,freq=16.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.18016359 = fieldWeight in 956, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=956)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
     The semantic knowledge of Wikipedia has proved to be useful for many tasks, for example, named entity disambiguation. Among these applications, the task of identifying the word sense based on Wikipedia is a crucial component because the output of this component is often used in subsequent tasks. In this article, we present a two-stage framework (called TSDW) for word sense disambiguation using knowledge latent in Wikipedia. A given phrase is disambiguated through a two-stage process: (a) the first-stage disambiguation explores the contextual semantic information, where the noisy information is pruned for better effectiveness and efficiency; and (b) the second-stage disambiguation explores the disambiguated phrases of high confidence from the first stage to achieve better re-disambiguation decisions for the phrases that are difficult to disambiguate in the first stage. Moreover, existing studies have addressed the disambiguation problem for English text only. Considering the popular usage of Wikipedia in different languages, we study the performance of TSDW and the existing state-of-the-art approaches over both English and Traditional Chinese articles. The experimental results show that TSDW generalizes well to different semantic relatedness measures and to text in different languages. More importantly, TSDW significantly outperforms the state-of-the-art approaches in terms of both effectiveness and efficiency.
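     A structural sketch of the two-stage idea, with a placeholder relatedness measure and confidence threshold (neither is the paper's actual method; the example phrases and sense labels are invented):

     def relatedness(sense, context_senses):
         # Placeholder: token overlap between sense labels stands in for a real
         # Wikipedia-based semantic relatedness measure.
         return sum(len(set(sense.split()) & set(c.split())) for c in context_senses)

     def disambiguate(phrases, candidate_senses, threshold=2):
         resolved, pending = {}, []
         # Stage 1: keep only phrases whose best sense is confidently related
         # to the candidate senses of the surrounding phrases.
         for p in phrases:
             context = [s for q in phrases if q != p for s in candidate_senses[q]]
             best = max(candidate_senses[p], key=lambda s: relatedness(s, context))
             if relatedness(best, context) >= threshold:
                 resolved[p] = best
             else:
                 pending.append(p)
         # Stage 2: re-disambiguate the remaining phrases against the senses
         # resolved with high confidence in stage 1.
         anchors = list(resolved.values())
         for p in pending:
             resolved[p] = max(candidate_senses[p], key=lambda s: relatedness(s, anchors))
         return resolved

     print(disambiguate(
         ["java", "indonesia"],
         {"java": ["Java island Indonesia", "Java programming language"],
          "indonesia": ["Indonesia country Java Sumatra"]},
     ))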
    Type
    a
  3. Toms, E.G.; Freund, L.; Li, C.: WiIRE: the Web Interactive information retrieval experimentation system prototype (2004) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 2534) [ClassicSimilarity], result of:
              0.009076704 = score(doc=2534,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 2534, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2534)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
     We introduce WiIRE, a prototype system for conducting interactive information retrieval (IIR) experiments via the Internet. We conceived WiIRE to increase validity while streamlining procedures and adding efficiencies to the conduct of IIR experiments. The system incorporates password-controlled access, online questionnaires, study instructions and tutorials, conditional interface assignment, and conditional query assignment, as well as provision for data collection. As an initial evaluation, we used WiIRE in-house to conduct a Web-based IIR experiment using an external search engine with customized search interfaces and the TREC 11 Interactive Track search queries. Our evaluation of the prototype indicated significant cost efficiencies in the conduct of IIR studies and also yielded some novel findings about the human perspective: about half of the participants would have preferred some personal contact with the researcher, and participants spent a significantly decreasing amount of time on tasks over the course of a session.
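     As a rough illustration of what conditional interface and query assignment can look like in such a system (the interfaces and query blocks below are invented placeholders, not WiIRE's own conditions):

     # Sketch of conditional interface/query assignment for a web-based IIR study.
     INTERFACES = ["baseline", "categorised"]
     QUERY_BLOCKS = [["q1", "q2"], ["q3", "q4"]]

     def assign(participant_id: int):
         """Counterbalance participants across interface and query-block conditions."""
         interface = INTERFACES[participant_id % len(INTERFACES)]
         block = QUERY_BLOCKS[(participant_id // len(INTERFACES)) % len(QUERY_BLOCKS)]
         return interface, block

     for pid in range(4):
         print(pid, assign(pid))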
    Type
    a
  4. Li, C.; Sun, A.: Extracting fine-grained location with temporal awareness in tweets : a two-stage approach (2017) 0.00
    0.0021393995 = product of:
      0.004278799 = sum of:
        0.004278799 = product of:
          0.008557598 = sum of:
            0.008557598 = weight(_text_:a in 3686) [ClassicSimilarity], result of:
              0.008557598 = score(doc=3686,freq=20.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.16114321 = fieldWeight in 3686, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3686)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
     Twitter has attracted billions of users for life logging and sharing activities and opinions. In their tweets, users often reveal their location information and short-term visiting histories or plans. Capturing users' short-term activities could benefit many applications for providing the right context at the right time and location. In this paper, we are interested in extracting locations mentioned in tweets at fine-grained granularity, with temporal awareness. Specifically, we recognize the points-of-interest (POIs) mentioned in a tweet and predict whether the user has visited, is currently at, or will soon visit the mentioned POIs. A POI can be a restaurant, a shopping mall, a bookstore, or any other fine-grained location. Our proposed framework, named TS-Petar (Two-Stage POI Extractor with Temporal Awareness), consists of two main components: a POI inventory and a two-stage time-aware POI tagger. The POI inventory is built by exploiting the crowd wisdom of the Foursquare community. It contains both POIs' formal names and their informal abbreviations, commonly observed in Foursquare check-ins. The time-aware POI tagger, based on the Conditional Random Field (CRF) model, is devised to disambiguate the POI mentions and to resolve their associated temporal awareness accordingly. Three sets of contextual features (linguistic, temporal, and inventory features) and two labeling schema features (OP and BILOU schemas) are explored for the time-aware POI extraction task. Our empirical study shows that the subtask of POI disambiguation and the subtask of temporal awareness resolution call for different feature settings for best performance. We have also evaluated the proposed TS-Petar against several strong baseline methods. The experimental results demonstrate that the two-stage approach achieves the best accuracy and outperforms all baseline methods in terms of both effectiveness and efficiency.
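     The BILOU labelling scheme mentioned above can be illustrated on a toy tweet; the example tokens, spans, and the "POI" label below are ours, not taken from the TS-Petar data:

     def bilou_tags(tokens, mention_spans):
         """mention_spans: list of (start, end, label) token ranges, end exclusive."""
         tags = ["O"] * len(tokens)
         for start, end, label in mention_spans:
             if end - start == 1:
                 tags[start] = f"U-{label}"           # unit-length mention
             else:
                 tags[start] = f"B-{label}"           # beginning of the mention
                 for i in range(start + 1, end - 1):
                     tags[i] = f"I-{label}"           # inside the mention
                 tags[end - 1] = f"L-{label}"         # last token of the mention
         return tags

     tokens = "heading to marina bay sands tonight".split()
     print(list(zip(tokens, bilou_tags(tokens, [(2, 5, "POI")]))))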
    Type
    a
  5. Qu, B.; Cong, G.; Li, C.; Sun, A.; Chen, H.: An evaluation of classification models for question topic categorization (2012) 0.00
    0.0018909799 = product of:
      0.0037819599 = sum of:
        0.0037819599 = product of:
          0.0075639198 = sum of:
            0.0075639198 = weight(_text_:a in 237) [ClassicSimilarity], result of:
              0.0075639198 = score(doc=237,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.14243183 = fieldWeight in 237, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=237)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
     We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions organized into more than 1,000 categories in a hierarchy. To the best of our knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification, and on short texts more generally. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-word features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show which aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.
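     A minimal sketch of this kind of feature/classifier comparison using scikit-learn on invented toy questions (the paper's Yahoo! Answers data and its maximum-entropy and hierarchical classifiers are not reproduced here):

     from sklearn.feature_extraction.text import CountVectorizer
     from sklearn.naive_bayes import MultinomialNB
     from sklearn.pipeline import make_pipeline
     from sklearn.svm import LinearSVC

     # Toy questions and category labels; placeholders, not CQA data.
     questions = [
         "how do I renew my passport",
         "which laptop is best for gaming",
         "what is the capital of peru",
         "how to fix a leaking tap",
     ]
     labels = ["government", "electronics", "geography", "home"]

     for name, clf in [("naive Bayes", MultinomialNB()), ("linear SVM", LinearSVC())]:
         for ngrams in [(1, 1), (1, 2)]:          # bag of words vs. word n-grams
             model = make_pipeline(CountVectorizer(ngram_range=ngrams), clf)
             model.fit(questions, labels)
             print(name, ngrams, model.predict(["best budget gaming laptop"]))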
    Type
    a
  6. Li, C.; Sugimoto, S.: Provenance description of metadata application profiles for long-term maintenance of metadata schemas (2018) 0.00
    0.0018909799 = product of:
      0.0037819599 = sum of:
        0.0037819599 = product of:
          0.0075639198 = sum of:
            0.0075639198 = weight(_text_:a in 4048) [ClassicSimilarity], result of:
              0.0075639198 = score(doc=4048,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.14243183 = fieldWeight in 4048, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4048)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
     Purpose: Provenance information is crucial for consistent maintenance of metadata schemas over time. The purpose of this paper is to propose a provenance model named DSP-PROV to keep track of structural changes of metadata schemas.
     Design/methodology/approach: The DSP-PROV model is developed by applying the World Wide Web Consortium's general provenance description standard PROV to the Dublin Core Application Profile. The Metadata Application Profile of the Digital Public Library of America is selected as a case study for applying the DSP-PROV model. Finally, the paper evaluates the proposed model by comparing the formal provenance description in DSP-PROV with a semi-formal change-log description in English.
     Findings: Formal provenance description in the DSP-PROV model has advantages over semi-formal provenance description in English for keeping metadata schemas consistent over time.
     Research limitations/implications: The DSP-PROV model is applicable to keeping track of the structural changes of a metadata schema over time. Provenance description of other features of a metadata schema, such as vocabulary and encoding syntax, is not covered.
     Originality/value: This study proposes a simple model for provenance description of structural features of metadata schemas, based on a few standards widely accepted on the Web, and shows the advantage of the proposed model over conventional semi-formal provenance description.
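     A rough sketch of describing one structural change of an application profile with W3C PROV terms via rdflib; the URIs and the shape of the description are illustrative assumptions, not the DSP-PROV vocabulary itself:

     from rdflib import Graph, Literal, Namespace, RDF, RDFS

     PROV = Namespace("http://www.w3.org/ns/prov#")
     EX = Namespace("http://example.org/profile/")        # hypothetical URIs

     g = Graph()
     g.bind("prov", PROV)
     g.bind("ex", EX)

     v1, v2, change = EX["dsp-v1"], EX["dsp-v2"], EX["revision-2018-01"]

     g.add((v1, RDF.type, PROV.Entity))                   # old schema version
     g.add((v2, RDF.type, PROV.Entity))                   # new schema version
     g.add((change, RDF.type, PROV.Activity))             # documented revision
     g.add((change, RDFS.label, Literal("add an optional dcterms:creator statement")))
     g.add((change, PROV.used, v1))                       # the revision took v1 as input
     g.add((v2, PROV.wasGeneratedBy, change))             # and produced v2
     g.add((v2, PROV.wasDerivedFrom, v1))                 # so v2 is derived from v1

     print(g.serialize(format="turtle"))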
    Type
    a
  7. Li, X.; Zhang, A.; Li, C.; Ouyang, J.; Cai, Y.: Exploring coherent topics by topic modeling with term weighting (2018) 0.00
    0.0016913437 = product of:
      0.0033826875 = sum of:
        0.0033826875 = product of:
          0.006765375 = sum of:
            0.006765375 = weight(_text_:a in 5045) [ClassicSimilarity], result of:
              0.006765375 = score(doc=5045,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.12739488 = fieldWeight in 5045, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5045)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
     Topic models often produce unexplainable topics that are filled with noisy words. The reason is that words in topic modeling have equal weights. High-frequency words dominate the top topic word lists, but most of them are meaningless words, e.g., domain-specific stopwords. To address this issue, in this paper we investigate how to weight words, and then develop a straightforward but effective term weighting scheme, namely entropy weighting (EW). The proposed EW scheme is based on conditional entropy measured by word co-occurrences. Compared with existing term weighting schemes, the highlight of EW is that it can automatically reward informative words. For more robust word weights, we further suggest a combined form of EW (CEW) that incorporates two existing weighting schemes. Basically, our CEW assigns meaningless words lower weights and informative words higher weights, leading to more coherent topics during topic modeling inference. We apply CEW to Dirichlet multinomial mixture and latent Dirichlet allocation, and evaluate it by topic quality, document clustering, and classification tasks on eight real-world data sets. Experimental results show that weighting words can effectively improve topic modeling performance over both short texts and normal long texts. More importantly, the proposed CEW significantly outperforms the existing term weighting schemes, since it further considers which words are informative.
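     A simplified sketch of the co-occurrence counting and conditional-entropy bookkeeping such a weighting scheme builds on (the toy corpus is invented and the paper's exact EW/CEW formulas are not reproduced):

     import math
     from collections import Counter, defaultdict
     from itertools import combinations

     docs = [
         "topic models produce topics",
         "topic models need word weights",
         "stopwords dominate topic word lists",
     ]

     cooc = defaultdict(Counter)          # cooc[w][v] = documents where v co-occurs with w
     for doc in docs:
         for w, v in combinations(set(doc.split()), 2):
             cooc[w][v] += 1
             cooc[v][w] += 1

     def conditional_entropy(w):
         """Entropy of the distribution of words co-occurring with w."""
         total = sum(cooc[w].values())
         return -sum((c / total) * math.log(c / total) for c in cooc[w].values())

     for w in ["topic", "stopwords"]:
         print(w, round(conditional_entropy(w), 3))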
    Type
    a