Search (4 results, page 1 of 1)

  • Active filter: author_ss:"Tseng, Y.-H."
  1. Tseng, Y.-H.: Keyword extraction techniques and relevance feedback (1997) 0.00
    0.004473776 = product of:
      0.040263984 = sum of:
        0.040263984 = weight(_text_:data in 1830) [ClassicSimilarity], result of:
          0.040263984 = score(doc=1830,freq=4.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.34584928 = fieldWeight in 1830, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1830)
      0.11111111 = coord(1/9)
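
    Note: the explain tree above is Lucene's ClassicSimilarity (TF-IDF) breakdown, and its figures can be reproduced directly. A minimal Python sketch, assuming the standard defaults tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); queryNorm and fieldNorm are copied verbatim from the tree, since they depend on the full query and the indexed field length:

      import math

      # Figures copied from the explain tree for doc 1830 above.
      max_docs = 44218
      doc_freq = 5088           # documents containing the term "data"
      freq = 4.0                # occurrences of "data" in this document's field
      query_norm = 0.036818076  # query-dependent normalizer, taken as given
      field_norm = 0.0546875    # encoded field-length norm, taken as given

      idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # ~ 3.1620505
      tf = math.sqrt(freq)                             # = 2.0
      query_weight = idf * query_norm                  # ~ 0.11642061
      field_weight = tf * idf * field_norm             # ~ 0.34584928
      coord = 1.0 / 9.0         # only 1 of 9 query clauses matched

      print(coord * query_weight * field_weight)       # ~ 0.004473776

    The same formula explains the score trees of the remaining results below.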
    
    Abstract
    Automatic keyword extraction is an important and fundamental technology in advanced information retrieval systems. Briefly compares several major keyword extraction methods, lists their advantages and disadvantages, and reports recent research progress in Taiwan. Also describes the application of a keyword extraction algorithm in an information retrieval system for relevance feedback. Preliminary analysis shows that the error rate of extracting relevant keywords is 18% and that the precision rate is over 50%. The main disadvantage of this approach is that the extraction results depend on the retrieval results, which in turn depend on the data held by the database. Apart from collecting more data, this problem can be alleviated by applying a thesaurus constructed with the same keyword extraction algorithm.
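
    The abstract does not spell the algorithm out; purely as an illustration, here is a toy Python sketch of a feedback loop in that spirit, where repeated word n-grams extracted from the retrieved documents serve as candidate keywords for expanding the next query. Every name in it is hypothetical, not from the paper:

      from collections import Counter

      def extract_keywords(docs, min_freq=2, max_n=3):
          # Toy extractor: any word n-gram that recurs across the
          # retrieved documents is treated as a candidate keyword.
          counts = Counter()
          for doc in docs:
              words = doc.lower().split()
              for n in range(1, max_n + 1):
                  for i in range(len(words) - n + 1):
                      counts[" ".join(words[i:i + n])] += 1
          return [t for t, c in counts.most_common() if c >= min_freq]

      def feedback_query(query_terms, retrieved_docs, k=5):
          # Relevance feedback: expand the query with the top-k keywords
          # extracted from the previous round's retrieval results.
          return query_terms + extract_keywords(retrieved_docs)[:k]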
  2. Tseng, Y.-H.: Solving vocabulary problems with interactive query expansion (1998) 0.00
    0.0034250922 = product of:
      0.03082583 = sum of:
        0.03082583 = weight(_text_:bibliographic in 5159) [ClassicSimilarity], result of:
          0.03082583 = score(doc=5159,freq=2.0), product of:
            0.14333439 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.036818076 = queryNorm
            0.21506234 = fieldWeight in 5159, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5159)
      0.11111111 = coord(1/9)
    
    Abstract
    One of the major causes of search failures in information retrieval systems is vocabulary mismatch. Presents a solution to the vocabulary problem through 2 strategies known as term suggestion (TS) and term relevance feedback (TRF). In TS, collection-specific terms are extracted from the text collection. These terms and their frequencies constitute the keyword database for suggesting terms in response to users' queries. One effect of this term suggestion is that it functions as a dynamic directory if the query is a general term with a broad meaning. In TRF, terms extracted from the top-ranked documents retrieved by the previous query are shown to users for relevance feedback. In the experiment, interactive TS yields very high precision rates while achieving recall rates similar to those of n-gram matching. Local TRF improves both precision and recall in a full-text news database, but degrades slightly in recall in bibliographic databases owing to the very limited source of information for feedback. In terms of van Rijsbergen's combined measure of recall and precision, both TS and TRF outperform n-gram matching, which implies that for TS and TRF the greater improvement in precision compensates for the slight degradation in recall.
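
    A hedged sketch of the two strategies as the abstract describes them, plus van Rijsbergen's combined measure used for the comparison; the function names, ranking details, and tokenization below are illustrative assumptions, not taken from the paper:

      from collections import Counter

      def suggest_terms(query, keyword_db, k=10):
          # TS: rank the collection-specific terms that contain the query
          # string by their frequency in the keyword database.
          hits = Counter({t: f for t, f in keyword_db.items() if query in t})
          return [t for t, _ in hits.most_common(k)]

      def feedback_terms(top_ranked_docs, k=10):
          # TRF: extract candidate terms from the top-ranked documents of
          # the previous query and show them to the user for feedback.
          counts = Counter(w for d in top_ranked_docs for w in d.lower().split())
          return [t for t, _ in counts.most_common(k)]

      def combined_measure(precision, recall, alpha=0.5):
          # van Rijsbergen's combined measure; alpha = 0.5 yields the
          # familiar harmonic-mean F-measure, F = 2PR / (P + R).
          return 1.0 / (alpha / precision + (1.0 - alpha) / recall)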
  3. Tseng, Y.-H.: Automatic cataloguing and searching for retrospective data by use of OCR text (2001) 0.00
    0.0027115175 = product of:
      0.024403658 = sum of:
        0.024403658 = weight(_text_:data in 5421) [ClassicSimilarity], result of:
          0.024403658 = score(doc=5421,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.2096163 = fieldWeight in 5421, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=5421)
      0.11111111 = coord(1/9)
    
  4. Lee, L.-H.; Juan, Y.-C.; Tseng, W.-L.; Chen, H.-H.; Tseng, Y.-H.: Mining browsing behaviors for objectionable content filtering (2015) 0.00
    0.0022595983 = product of:
      0.020336384 = sum of:
        0.020336384 = weight(_text_:data in 1818) [ClassicSimilarity], result of:
          0.020336384 = score(doc=1818,freq=2.0), product of:
            0.11642061 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.036818076 = queryNorm
            0.17468026 = fieldWeight in 1818, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1818)
      0.11111111 = coord(1/9)
    
    Abstract
    This article explores users' browsing intents to predict the category of a user's next access during web surfing, and applies the results to filter objectionable content such as pornography, gambling, violence, and drugs. Users' access trails, in the form of category sequences in click-through data, are employed to mine users' web browsing behaviors. Contextual relationships between URL categories are learned by a hidden Markov model. The top-level domains (TLDs) extracted from the URLs themselves, together with their corresponding categories, are captured by a TLD model. Given a URL to be predicted, its TLD and the current context are empirically combined in an aggregation model. In addition to the current context, the predictions for the URL accessed previously in different contexts by various users are also combined by majority rule to improve the aggregation model. Large-scale experiments show that the advanced aggregation approach achieves promising performance while maintaining an acceptably low false-positive rate. Different strategies are introduced for integrating the model with the blacklist it generates, so that objectionable web pages can be filtered without analyzing their content. In practice, this behavioral perspective is complementary to existing content analysis.
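
    A minimal sketch of the context model's core idea, reduced here to a first-order Markov approximation over category sequences; the paper itself uses a hidden Markov model, and the training data and category names below are purely illustrative:

      from collections import Counter, defaultdict

      OBJECTIONABLE = {"pornography", "gambling", "violence", "drugs"}

      def train_transitions(category_sequences):
          # Estimate P(next category | current category) from users'
          # click-through trails, e.g. ["news", "sports", "gambling"].
          counts = defaultdict(Counter)
          for seq in category_sequences:
              for cur, nxt in zip(seq, seq[1:]):
                  counts[cur][nxt] += 1
          return {cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
                  for cur, nxts in counts.items()}

      def predict_next(model, current):
          # Flag the access pre-emptively if the most likely next
          # category is objectionable.
          dist = model.get(current, {})
          best = max(dist, key=dist.get) if dist else None
          return best, best in OBJECTIONABLE

      model = train_transitions([["news", "sports", "gambling"],
                                 ["news", "gambling", "drugs"],
                                 ["news", "gambling", "gambling"]])
      print(predict_next(model, "news"))  # ('gambling', True) on this toy data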