Search (3 results, page 1 of 1)

  • author_ss:"Huang, X."
  1. Liu, Y.; Huang, X.; An, A.: Personalized recommendation with adaptive mixture of markov models (2007) 0.01
    0.008710952 = product of:
      0.052265707 = sum of:
        0.052265707 = weight(_text_:web in 606) [ClassicSimilarity], result of:
          0.052265707 = score(doc=606,freq=8.0), product of:
            0.14495286 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.044416238 = queryNorm
            0.36057037 = fieldWeight in 606, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=606)
      0.16666667 = coord(1/6)
    
    Abstract
    With more and more information available on the Internet, the task of making personalized recommendations to assist the user's navigation has become increasingly important. Considering there might be millions of users with different backgrounds accessing a Web site every day, it is infeasible to build a separate recommendation system for each user. To address this problem, clustering techniques can first be employed to discover user groups. Then, user navigation patterns for each group can be discovered, to allow the adaptation of a Web site to the interest of each individual group. In this paper, we propose to model user access sequences as stochastic processes, and a mixture of Markov models based approach is taken to cluster users and to capture the sequential relationships inherent in user access histories. Several important issues that arise in constructing the Markov models are also addressed. The first issue lies in the complexity of the mixture of Markov models. To improve the efficiency of building/maintaining the mixture of Markov models, we develop a lightweight adaptive algorithm to update the model parameters without recomputing model parameters from scratch. The second issue concerns the proper selection of training data for building the mixture of Markov models. We investigate two different training data selection strategies and perform extensive experiments to compare their effectiveness on a real dataset that is generated by a Web-based knowledge management system, Livelink.
    Footnote
    Part of a special topic section "Mining Web resources for enhancing information retrieval"
  2. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.01
    0.006159573 = product of:
      0.036957435 = sum of:
        0.036957435 = weight(_text_:web in 831) [ClassicSimilarity], result of:
          0.036957435 = score(doc=831,freq=4.0), product of:
            0.14495286 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.044416238 = queryNorm
            0.25496176 = fieldWeight in 831, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=831)
      0.16666667 = coord(1/6)
    
    Abstract
    Purpose - The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, such as Chinese and Japanese, where no word boundary information is available in written text. The paper advocates a simple language modeling based approach for this task. Design/methodology/approach - Naïve Bayes, maximum entropy model, support vector machines, and language modeling approaches were implemented and were applied to Chinese and Japanese text classification. To investigate the influence of word segmentation, different word segmentation approaches were investigated and applied to Chinese text. A segmentation-based approach was compared with the non-segmentation-based approach. Findings - There were two findings: the experiments show that statistical language modeling can significantly outperform standard techniques, given the same set of features; and it was found that classification with word level features normally yields improved classification performance, but that classification performance is not monotonically related to segmentation accuracy. In particular, classification performance may initially improve with increased segmentation accuracy, but eventually classification performance stops improving, and can in fact even decrease, after a certain level of segmentation accuracy. Practical implications - Applying the findings to real web text classification is ongoing work. Originality/value - The paper is very relevant to Chinese and Japanese information processing, e.g. webpage classification, web search.
  3. Huang, X.; Peng, F.; An, A.; Schuurmans, D.: Dynamic Web log session identification with statistical language models (2004) 0.01
    0.0052265706 = product of:
      0.031359423 = sum of:
        0.031359423 = weight(_text_:web in 3096) [ClassicSimilarity], result of:
          0.031359423 = score(doc=3096,freq=2.0), product of:
            0.14495286 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.044416238 = queryNorm
            0.21634221 = fieldWeight in 3096, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=3096)
      0.16666667 = coord(1/6)
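
  Note on the score breakdowns
    The explain trees above can be recomputed by hand. The following is a minimal sketch, not the search engine's own code: the function name and argument names are hypothetical, and the tf and idf formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) are assumed because they reproduce the tf and idf values printed in the listing. queryNorm, fieldNorm and coord are copied from the listing rather than derived.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute one single-term explain tree (hypothetical helper).

    Assumed formulas, chosen because they match the listed values:
      tf  = sqrt(freq)
      idf = 1 + ln(max_docs / (doc_freq + 1))
    """
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    query_weight = idf * query_norm        # "queryWeight" in the listing
    field_weight = tf * idf * field_norm   # "fieldWeight" in the listing
    # Only one of the six query clauses matched, hence the coord factor.
    return query_weight * field_weight * coord


# (termFreq, fieldNorm) per document, copied from the three explain trees.
listed = {606: (8.0, 0.0390625), 831: (4.0, 0.0390625), 3096: (2.0, 0.046875)}

scores = {
    doc_id: classic_similarity_score(freq, 4597, 44218, 0.044416238, norm, 1 / 6)
    for doc_id, (freq, norm) in listed.items()
}

for doc_id, score in scores.items():
    print(doc_id, score)
```

    Up to floating-point rounding, the three results agree with the top-level values in the listing (0.008710952, 0.006159573 and 0.0052265706). Since idf and queryNorm are identical across the three hits, the ranking is driven entirely by term frequency and fieldNorm.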