Search (4 results, page 1 of 1)

  • Filter: author_ss:"Cai, X."
  1. Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.01
    0.008231445 = coord(3/14) × [0.017879 (web) + 0.005173 (information) + 0.015361 (retrieval)] (Lucene ClassicSimilarity)
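    For readers unfamiliar with the breakdown above: the engine scores hits with Lucene's classic TF-IDF formula, coord × Σ over matched terms of (idf × queryNorm) × (tf × idf × fieldNorm). A minimal sketch reproducing this hit's score, with the constants copied from the engine's full explain output (fieldNorm bundles length normalization and index-time boosts and is taken as given):

```python
import math

# Constants from Lucene's ClassicSimilarity explain output for hit 1.
QUERY_NORM = 0.030388402
FIELD_NORM = 0.0390625   # length norm x index-time boost, taken as given
COORD = 3 / 14           # 3 of the 14 query terms matched this document

def term_score(freq: float, idf: float) -> float:
    """Per-term contribution: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)            # tf(freq) = sqrt(freq)
    query_weight = idf * QUERY_NORM
    field_weight = tf * idf * FIELD_NORM
    return query_weight * field_weight

# (term frequency, inverse document frequency) for web, information, retrieval.
terms = [(2.0, 3.2635105), (2.0, 1.7554779), (2.0, 3.024915)]
score = COORD * sum(term_score(freq, idf) for freq, idf in terms)
print(f"{score:.9f}")  # ~0.008231445, the score shown for this hit
```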
    
    Abstract
    An important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in computer science for automatic concept extraction and for grouping, categorizing, clustering, and otherwise organizing knowledge by mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain-analytical case studies that ask whether these techniques form a coherent research front within KO. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and Scopus. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases, but we find no coherence, no common activity, and no common social semantics. We find neither a research front nor a common teleology within the KO domain. We do, however, find a lively group of authors who contributed papers to this special issue, and their work aligns in interesting ways with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).
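    Co-word analysis, the method applied in case three, is straightforward to sketch: count how often pairs of indexing terms co-occur across a set of papers and inspect the strongest pairs. A minimal illustration, with hypothetical keyword lists standing in for the special-issue papers:

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, one per paper (not the actual special-issue data).
paper_keywords = [
    ["clustering", "machine learning", "knowledge organization"],
    ["automatic indexing", "clustering", "knowledge organization"],
    ["automatic classification", "machine learning"],
    ["clustering", "automatic classification"],
]

# Count each unordered keyword pair once per paper.
cooccurrence = Counter()
for keywords in paper_keywords:
    for a, b in combinations(sorted(set(keywords)), 2):
        cooccurrence[(a, b)] += 1

for pair, count in cooccurrence.most_common(3):
    print(count, pair)
```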
  2. Lu, K.; Cai, X.; Ajiferuke, I.; Wolfram, D.: Vocabulary size and its effect on topic representation (2017) 0.00
    0.0041693454 = coord(2/14) × [0.010753 (information) + 0.018433 (retrieval)] (Lucene ClassicSimilarity)
    
    Abstract
    This study investigates how the computational overhead of topic model training may be reduced by selectively removing terms from the vocabulary of the text corpora being modeled. We compare the impact of removing singly occurring terms; the top 0.5%, 1%, and 5% most frequently occurring terms; and the combination of the top 0.5% most frequent terms and singly occurring terms, while also varying the number of topics modeled (10, 20, 30, 40, 50, 100), across three datasets. Four outcome measures are compared. Removing singly occurring terms has little impact on any of the measures tested. Document discriminative capacity, as measured by document space density, is reduced by the removal of frequently occurring terms but increases with higher numbers of topics. Vocabulary size does not greatly influence entropy, but entropy is affected by the number of topics. Finally, topic similarity, as measured by pairwise topic similarity and Jensen-Shannon divergence, decreases with the removal of frequent terms. The findings have implications for information science research in information retrieval and informetrics that makes use of topic modeling.
    Source
    Information processing and management. 53(2017) no.3, S.653-665
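    The pruning procedure the authors describe is easy to approximate. A minimal sketch, assuming a tokenized corpus and scikit-learn/SciPy (the toy corpus, thresholds, and library choices are illustrative, not the authors' actual pipeline):

```python
from collections import Counter

from scipy.spatial.distance import jensenshannon
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "topic models learn topics from word counts",
    "word counts drive topic models",
    "pruning the vocabulary speeds up topic models",
    "frequent words and rare words distort topics",
]  # toy corpus for illustration

# Corpus-wide term frequencies.
freq = Counter(t for d in docs for t in d.lower().split())

# Drop singly occurring terms and the top 0.5% most frequent terms
# (on this toy vocabulary the 0.5% cut rounds down to zero terms).
n_top = int(len(freq) * 0.005)
top_terms = {t for t, _ in freq.most_common(n_top)}
vocab = sorted(t for t, c in freq.items() if c > 1 and t not in top_terms)

X = CountVectorizer(vocabulary=vocab).fit_transform(docs)
lda = LatentDirichletAllocation(n_components=5, random_state=0).fit(X)

# Pairwise topic similarity via Jensen-Shannon divergence
# (SciPy returns the JS *distance*; squaring gives the divergence).
topics = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
jsd = jensenshannon(topics[0], topics[1]) ** 2
print(f"JS divergence between topics 0 and 1: {jsd:.4f}")
```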
  3. Liu, Q.; Yang, Z.; Cai, X.; Du, Q.; Fan, W.: The more, the better? : The effect of feedback and user's past successes on idea implementation in open innovation communities (2022) 0.00
    6.4003875E-4 = coord(1/14) × 0.008961 (information) (Lucene ClassicSimilarity)
    
    Abstract
    Establishing open innovation communities has evolved into an important product innovation and development strategy for companies. Yet the success of such communities relies on the successful implementation of many user-submitted ideas. Although the extant literature has examined the impact of user experience and idea characteristics on idea implementation, little is known from the information-input perspective, for example, feedback. Based on information overload theory and the knowledge content framework, we propose that the amount and type of feedback content have different effects on the likelihood of subsequent idea implementation, and that these effects depend on the level of a user's success experience. We tested the research model using a panel logistic model on data from the MIUI Forum. The results reveal that the amount of feedback has an inverted U-shaped effect on idea implementation, and that this effect is moderated by a user's past success. Moreover, the type of feedback content (cost- and benefit-related feedback and functionality-related feedback) positively affects idea implementation, and a user's past success positively moderates these effects. Finally, we discuss the theoretical and practical implications and limitations of our research, along with suggestions for future research.
    Source
    Journal of the Association for Information Science and Technology. 73(2022) no.3, S.376-392
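    The inverted U-shape plus moderation claim translates directly into a logistic specification with a quadratic feedback term and a past-success interaction. A simplified pooled sketch on simulated data (the actual study uses a panel logistic model on MIUI Forum data; the variable names and coefficients below are invented for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500

# Simulated stand-in for the MIUI Forum data (illustrative only).
df = pd.DataFrame({
    "feedback": rng.poisson(5, n),          # amount of feedback an idea received
    "past_success": rng.integers(0, 2, n),  # user had prior implemented ideas
})

# Inverted U: odds of implementation rise, then fall, with feedback amount.
logit_p = -1.0 + 0.8 * df.feedback - 0.08 * df.feedback**2 + 0.5 * df.past_success
df["implemented"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

model = smf.logit(
    "implemented ~ feedback + I(feedback**2)"
    " + past_success + feedback:past_success",
    data=df,
).fit(disp=False)
print(model.params)  # a negative I(feedback**2) coefficient signals the U-shape
```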
  4. Cai, X.; Li, W.: Enhancing sentence-level clustering with integrated and interactive frameworks for theme-based summarization (2011) 0.00
    5.225894E-4 = coord(1/14) × 0.007316 (information) (Lucene ClassicSimilarity)
    
    Abstract
    Sentence clustering plays a pivotal role in theme-based summarization, which discovers topic themes, defined as clusters of highly related sentences, in order to avoid redundancy and cover more diverse information. Because sentences are short and their content is limited, the bag-of-words cosine similarity traditionally used for document clustering is no longer suitable, and sentence similarity requires special treatment. In this article, we study the sentence-level clustering problem. After exploiting concept- and context-enriched sentence vector representations, we develop two co-clustering frameworks to enhance sentence-level clustering for theme-based summarization, integrated clustering and interactive clustering, both of which allow words and documents to play an explicit role in sentence clustering as independent text objects rather than serving merely as features of sentences in a document set. In each framework, we experiment with two-level co-clustering (i.e., sentence-word or sentence-document co-clustering) and three-level co-clustering (i.e., document-sentence-word co-clustering). Compared against concept- and context-oriented sentence-representation reformation, co-clustering shows a clear advantage in both intrinsic clustering-quality evaluation and extrinsic summarization evaluation conducted on the Document Understanding Conferences (DUC) datasets.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.10, S.2067-2082
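    The paper's integrated and interactive frameworks are its own contribution, but the underlying idea of sentence-word co-clustering can be illustrated generically, e.g. with spectral co-clustering on a sentence-term matrix (a rough analogue, not the authors' method):

```python
from sklearn.cluster import SpectralCoclustering
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = [
    "the summary should avoid redundant sentences",
    "redundant sentences hurt summary quality",
    "word clusters reveal the theme of a sentence",
    "sentence themes emerge from word clusters",
]  # toy input for illustration

# Rows are sentences, columns are words.
X = TfidfVectorizer().fit_transform(sentences)

# Simultaneously cluster sentences (rows) and words (columns).
model = SpectralCoclustering(n_clusters=2, random_state=0).fit(X)

for label, sentence in zip(model.row_labels_, sentences):
    print(label, sentence)
```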