Search (2 results, page 1 of 1)

  • author_ss:"Tseng, Y.-H."
  1. Tseng, Y.-H.: Solving vocabulary problems with interactive query expansion (1998) 0.02
    0.016424898 = product of:
      0.032849796 = sum of:
        0.032849796 = product of:
          0.06569959 = sum of:
            0.06569959 = weight(_text_:n in 5159) [ClassicSimilarity], result of:
              0.06569959 = score(doc=5159,freq=4.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.33684817 = fieldWeight in 5159, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5159)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
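    The explain tree above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function and parameter names are illustrative and not part of the search system. queryNorm, fieldNorm, and the coord factors are taken directly from the tree rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as defined by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords=()):
    """Recompute one term's contribution, following the explain tree.

    Names are hypothetical; query_norm, field_norm and coords are
    copied from the explain output, not computed here.
    """
    tf = math.sqrt(freq)                    # 2.0 for freq=4.0
    idf = classic_idf(doc_freq, max_docs)   # about 4.3116565
    query_weight = idf * query_norm         # queryWeight
    field_weight = tf * idf * field_norm    # fieldWeight
    score = query_weight * field_weight     # weight(_text_:n in doc)
    for c in coords:                        # each coord(1/2) halves the score
        score *= c
    return score


# Values copied from the explain trees for documents 5159 and 5421.
score_5159 = classic_score(4.0, 1611, 44218, 0.045236014, 0.0390625, coords=(0.5, 0.5))
score_5421 = classic_score(2.0, 1611, 44218, 0.045236014, 0.046875, coords=(0.5, 0.5))
print(f"{score_5159:.9f} {score_5421:.9f}")
```

    Both documents share the same idf and queryNorm, so the first result ranks higher only because its term frequency (4 versus 2) outweighs its smaller fieldNorm. Results may differ from the listing in the last digits, since Lucene computes in single precision.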
    
    Abstract
    One of the major causes of search failures in information retrieval systems is vocabulary mismatch. Presents a solution to the vocabulary problem through two strategies known as term suggestion (TS) and term relevance feedback (TRF). In TS, collection-specific terms are extracted from the text collection. These terms and their frequencies constitute the keyword database for suggesting terms in response to users' queries. One effect of this term suggestion is that it functions as a dynamic directory if the query is a general term with a broad meaning. In TRF, terms extracted from the top-ranked documents retrieved by the previous query are shown to users for relevance feedback. In the experiment, interactive TS provides very high precision rates while achieving recall rates similar to those of n-gram matching. Local TRF improves both precision and recall in a full-text news database but degrades recall slightly in bibliographic databases, owing to the very limited source of information for feedback. In terms of Rijsbergen's combined measure of recall and precision, both TS and TRF achieve better performance than n-gram matching, which implies that the greater improvement in precision rate compensates for the slight degradation in recall rate for TS and TRF.
  2. Tseng, Y.-H.: Automatic cataloguing and searching for retrospective data by use of OCR text (2001) 0.01
    0.013936987 = product of:
      0.027873974 = sum of:
        0.027873974 = product of:
          0.05574795 = sum of:
            0.05574795 = weight(_text_:n in 5421) [ClassicSimilarity], result of:
              0.05574795 = score(doc=5421,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.28582513 = fieldWeight in 5421, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5421)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article describes our efforts in supporting information retrieval from OCR-degraded text. In particular, we report our approach to an automatic cataloging and searching contest for books in multiple languages. In this contest, 500 books in English, German, French, and Italian published from the 1770s to the 1970s are scanned into images and OCRed to digital text. The goal is to use only automatic means to extract information for sophisticated searching. We adopted the vector space retrieval model, an n-gram indexing method, and a special weighting scheme to tackle this problem. Although the performance of this approach is slightly inferior to that of the best approach, which is mainly based on regular expression matching, one advantage of our approach is that it is less language dependent and less layout sensitive, and is thus readily applicable to other languages and document collections. Problems of OCR text retrieval for some Asian languages are also discussed in this article, and solutions are suggested.