Search (5 results, page 1 of 1)

  • × author_ss:"Chen, H.-H."
  • × theme_ss:"Multilinguale Probleme"
  1. Chen, H.-H.; Lin, W.-C.; Yang, C.; Lin, W.-H.: Translating-transliterating named entities for multilingual information access (2006) 0.03
    0.025941458 = product of:
      0.051882915 = sum of:
        0.051882915 = sum of:
          0.008202582 = weight(_text_:a in 1080) [ClassicSimilarity], result of:
            0.008202582 = score(doc=1080,freq=6.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.1544581 = fieldWeight in 1080, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1080)
          0.043680333 = weight(_text_:22 in 1080) [ClassicSimilarity], result of:
            0.043680333 = score(doc=1080,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.2708308 = fieldWeight in 1080, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1080)
      0.5 = coord(1/2)
    
    Abstract
    Named entities are major constituents of a document but are usually unknown words. This work proposes a systematic way of dealing with formulation, transformation, translation, and transliteration of multilingual-named entities. The rules and similarity matrices for translation and transliteration are learned automatically from parallel-named-entity corpora. The results are applied in cross-language access to collections of images with captions. Experimental results demonstrate that the similarity-based transliteration of named entities is effective, and runs in which transliteration is considered outperform the runs in which it is neglected.
    Date
    4. 6.2006 19:52:22
    Type
    a
  2. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.02
    0.022235535 = product of:
      0.04447107 = sum of:
        0.04447107 = sum of:
          0.007030784 = weight(_text_:a in 4436) [ClassicSimilarity], result of:
            0.007030784 = score(doc=4436,freq=6.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.13239266 = fieldWeight in 4436, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046875 = fieldNorm(doc=4436)
          0.037440285 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
            0.037440285 = score(doc=4436,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.23214069 = fieldWeight in 4436, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=4436)
      0.5 = coord(1/2)
    
    Abstract
    Language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable tranlated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between the speed performance and the translation performance, and what from the translated result is presented in. About 100.000 Web pages translated in the last 4 months of 1997 are used for quantitative study of online and real-time Web page translation
    Date
    16. 2.2000 14:22:39
    Type
    a
  3. Tsai, M.-.F.; Chen, H.-H.; Wang, Y.-T.: Learning a merge model for multilingual information retrieval (2011) 0.00
    0.0023919214 = product of:
      0.0047838427 = sum of:
        0.0047838427 = product of:
          0.009567685 = sum of:
            0.009567685 = weight(_text_:a in 2750) [ClassicSimilarity], result of:
              0.009567685 = score(doc=2750,freq=16.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.18016359 = fieldWeight in 2750, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2750)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper proposes a learning approach for the merging process in multilingual information retrieval (MLIR). To conduct the learning approach, we present a number of features that may influence the MLIR merging process. These features are mainly extracted from three levels: query, document, and translation. After the feature extraction, we then use the FRank ranking algorithm to construct a merge model. To the best of our knowledge, this practice is the first attempt to use a learning-based ranking algorithm to construct a merge model for MLIR merging. In our experiments, three test collections for the task of crosslingual information retrieval (CLIR) in NTCIR3, 4, and 5 are employed to assess the performance of our proposed method. Moreover, several merging methods are also carried out for a comparison, including traditional merging methods, the 2-step merging strategy, and the merging method based on logistic regression. The experimental results show that our proposed method can significantly improve merging quality on two different types of datasets. In addition to the effectiveness, through the merge model generated by FRank, our method can further identify key factors that influence the merging process. This information might provide us more insight and understanding into MLIR merging.
    Type
    a
  4. Chen, H.-H.; Kuo, J.-J.; Huang, S.-J.; Lin, C.-J.; Wung, H.-C.: ¬A summarization system for Chinese news from multiple sources (2003) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 2115) [ClassicSimilarity], result of:
              0.009076704 = score(doc=2115,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 2115, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2115)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article proposes a summarization system for multiple documents. It employs not only named entities and other signatures to cluster news from different sources, but also employs punctuation marks, linking elements, and topic chains to identify the meaningful units (MUs). Using nouns and verbs to identify the similar MUs, focusing and browsing models are applied to represent the summarization results. To reduce information loss during summarization, informative words in a document are introduced. For the evaluation, a question answering system (QA system) is proposed to substitute the human assessors. In large-scale experiments containing 140 questions to 17,877 documents, the results show that those models using informative words outperform pure heuristic voting-only strategy by news reporters. This model can be easily further applied to summarize multilingual news from multiple sources.
    Type
    a
  5. Lin, W.-C.; Chang, Y.-C.; Chen, H.-H.: Integrating textual and visual information for cross-language image retrieval : a trans-media dictionary approach (2007) 0.00
    0.0014351527 = product of:
      0.0028703054 = sum of:
        0.0028703054 = product of:
          0.005740611 = sum of:
            0.005740611 = weight(_text_:a in 904) [ClassicSimilarity], result of:
              0.005740611 = score(doc=904,freq=4.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.10809815 = fieldWeight in 904, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=904)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a