Search (3 results, page 1 of 1)

  • author_ss:"Tseng, Y.-H."
  1. Tseng, Y.-H.: Automatic cataloguing and searching for retrospective data by use of OCR text (2001) 0.00
    9.660233E-4 = product of:
      0.0057961396 = sum of:
        0.0057961396 = product of:
          0.028980698 = sum of:
            0.028980698 = weight(_text_:29 in 5421) [ClassicSimilarity], result of:
              0.028980698 = score(doc=5421,freq=2.0), product of:
                0.124278314 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03532955 = queryNorm
                0.23319192 = fieldWeight in 5421, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5421)
          0.2 = coord(1/5)
      0.16666667 = coord(1/6)
    
    Date
    29. 9.2001 13:58:18
  2. Tseng, Y.-H.; Lin, C.-J.; Lin, Y.-I.: Text mining techniques for patent analysis (2007) 0.00
    8.3484704E-4 = product of:
      0.005009082 = sum of:
        0.005009082 = product of:
          0.02504541 = sum of:
            0.02504541 = weight(_text_:28 in 935) [ClassicSimilarity], result of:
              0.02504541 = score(doc=935,freq=2.0), product of:
                0.12655975 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03532955 = queryNorm
                0.19789396 = fieldWeight in 935, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=935)
          0.2 = coord(1/5)
      0.16666667 = coord(1/6)
    
    Date
    26.12.2007 11:28:39
  3. Tseng, Y.-H.: Automatic thesaurus generation for Chinese documents (2002) 0.00
    8.050195E-4 = product of:
      0.004830117 = sum of:
        0.004830117 = product of:
          0.024150584 = sum of:
            0.024150584 = weight(_text_:29 in 5226) [ClassicSimilarity], result of:
              0.024150584 = score(doc=5226,freq=2.0), product of:
                0.124278314 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03532955 = queryNorm
                0.19432661 = fieldWeight in 5226, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5226)
          0.2 = coord(1/5)
      0.16666667 = coord(1/6)
    
    Abstract
    Tseng constructs a word co-occurrence-based thesaurus through automatic analysis of Chinese text. Words are identified by longest dictionary match, supplemented by a keyword extraction algorithm that merges back nearby tokens and accepts shorter character strings if they occur more often than the longest string. Single-character auxiliary words are a major source of error, but this error can be greatly reduced with a 70-character, 2,680-word stop list. Extracted terms, with their associated document weights, are sorted by decreasing frequency, and the terms at the top of this list are associated using a Dice coefficient, modified to account for longer documents, on the weights of term pairs. Co-occurrence is counted not in the document as a whole but in paragraph- or sentence-sized sections, in order to reduce computation time. A window of 29 characters or 11 words was found to be sufficient. A thesaurus was produced from 25,230 Chinese news articles, and judges were asked to review the top 50 terms associated with each of 30 single-word query terms. They determined 69% to be relevant.
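The association step described in the abstract can be illustrated with a minimal sketch. It uses the standard, unmodified Dice coefficient over segment-level co-occurrence counts. The abstract does not specify the length-adjusted variant or the term weights, so those are omitted here. The function name `dice_associations` and the toy segments are hypothetical and serve only as illustration.

```python
from collections import Counter
from itertools import combinations


def dice_associations(segments):
    """Standard Dice coefficient for every term pair that co-occurs in a segment.

    `segments` is a list of token lists, one per sentence- or paragraph-sized
    window. This is the plain Dice coefficient, not the modified version the
    abstract mentions (that modification is not specified in the abstract).
    """
    term_count = Counter()  # number of segments containing each term
    pair_count = Counter()  # number of segments containing both terms of a pair
    for segment in segments:
        terms = sorted(set(segment))  # count each term once per segment
        term_count.update(terms)
        pair_count.update(combinations(terms, 2))
    return {
        (a, b): 2 * n / (term_count[a] + term_count[b])
        for (a, b), n in pair_count.items()
    }


# Hypothetical toy data: three short segments.
segments = [
    ["stock", "market", "bank"],
    ["stock", "market"],
    ["bank", "loan"],
]
scores = dice_associations(segments)
for pair, value in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(value, 3))
```

Counting within small segments rather than whole documents keeps the number of candidate pairs per unit small, which is the computational saving the abstract refers to.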
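The score explanations attached to each result can be reproduced by hand. The sketch below recomputes the score of document 5226 from the values printed in its explanation tree. The formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))) are inferred from the printed numbers and agree with the classic Lucene TF-IDF formulation. The function name `classic_similarity_score` is hypothetical and is not part of any library.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coords=()):
    """Recompute a single-term score from the values shown in an explanation tree.

    Assumed formulas, inferred from the printed numbers:
      tf  = sqrt(freq)
      idf = 1 + ln(max_docs / (doc_freq + 1))
    """
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    query_weight = idf * query_norm        # "queryWeight" in the tree
    field_weight = tf * idf * field_norm   # "fieldWeight" in the tree
    score = query_weight * field_weight    # "weight(_text_:29 in 5226)"
    for coord in coords:                   # coord(1/5), then coord(1/6)
        score *= coord
    return score


# Values copied from the explanation for doc 5226 (result 3).
score_5226 = classic_similarity_score(
    freq=2.0,
    doc_freq=3565,
    max_docs=44218,
    query_norm=0.03532955,
    field_norm=0.0390625,
    coords=(1 / 5, 1 / 6),
)
print(f"{score_5226:.6e}")  # should agree with the listed 8.050195E-4 up to rounding
```

The two coord factors explain why all three results display as 0.00: only one of five and one of six query clauses matched, and the matching term (29 or 28) appears only in each record's date field.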