Search (4 results, page 1 of 1)

Yang, C.C.; Lam, W.: Introduction to the special topic section on multilingual information systems (2006) 0.00
```
0.0047219303 = product of:
  0.018887721 = sum of:
    0.018887721 = weight(_text_:information in 5043) [ClassicSimilarity], result of:
      0.018887721 = score(doc=5043,freq=14.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.3078936 = fieldWeight in 5043, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=5043)
  0.25 = coord(1/4)
```
Abstract

The information available in languages other than English on the World Wide Web and global information systems is increasing significantly. According to some recent reports. the growth of non-English speaking Internet users is significantly higher than the growth of English-speaking Internet users. Asia and Europe have become the two most-populated regions of Internet users. However, there are many different languages in the many different countries of Asia and Europe. And there are many countries in the world using more than one language as their official languages. For example, Chinese and English are official languages in Hong Kong SAR; English and French are official languages in Canada. In the global economy, information systems are no longer utilized by users in a single geographical region but all over the world. Information can be generated, stored, processed, and accessed in several different languages. All of this reveals the importance of research in multilingual information systems.

Source

Journal of the American Society for Information Science and Technology. 57(2006) no.5, S.629-631
Li, K.W.; Yang, C.C.: Automatic crosslingual thesaurus generated from the Hong Kong SAR Police Department Web Corpus for Crime Analysis (2005) 0.00
```
0.0035694435 = product of:
  0.014277774 = sum of:
    0.014277774 = weight(_text_:information in 3391) [ClassicSimilarity], result of:
      0.014277774 = score(doc=3391,freq=18.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.23274568 = fieldWeight in 3391, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=3391)
  0.25 = coord(1/4)
```
Abstract

For the sake of national security, very large volumes of data and information are generated and gathered daily. Much of this data and information is written in different languages, stored in different locations, and may be seemingly unconnected. Crosslingual semantic interoperability is a major challenge to generate an overview of this disparate data and information so that it can be analyzed, shared, searched, and summarized. The recent terrorist attacks and the tragic events of September 11, 2001 have prompted increased attention an national security and criminal analysis. Many Asian countries and cities, such as Japan, Taiwan, and Singapore, have been advised that they may become the next targets of terrorist attacks. Semantic interoperability has been a focus in digital library research. Traditional information retrieval (IR) approaches normally require a document to share some common keywords with the query. Generating the associations for the related terms between the two term spaces of users and documents is an important issue. The problem can be viewed as the creation of a thesaurus. Apart from this, terrorists and criminals may communicate through letters, e-mails, and faxes in languages other than English. The translation ambiguity significantly exacerbates the retrieval problem. The problem is expanded to crosslingual semantic interoperability. In this paper, we focus an the English/Chinese crosslingual semantic interoperability problem. However, the developed techniques are not limited to English and Chinese languages but can be applied to many other languages. English and Chinese are popular languages in the Asian region. Much information about national security or crime is communicated in these languages. An efficient automatically generated thesaurus between these languages is important to crosslingual information retrieval between English and Chinese languages. To facilitate crosslingual information retrieval, a corpus-based approach uses the term co-occurrence statistics in parallel or comparable corpora to construct a statistical translation model to cross the language boundary. In this paper, the text based approach to align English/Chinese Hong Kong Police press release documents from the Web is first presented. We also introduce an algorithmic approach to generate a robust knowledge base based an statistical correlation analysis of the semantics (knowledge) embedded in the bilingual press release corpus. The research output consisted of a thesaurus-like, semantic network knowledge base, which can aid in semanticsbased crosslingual information management and retrieval.

Source

Journal of the American Society for Information Science and Technology. 56(2005) no.3, S.272-281
Li, K.W.; Yang, C.C.: Conceptual analysis of parallel corpus collected from the Web (2006) 0.00
```
0.0033256328 = product of:
  0.013302531 = sum of:
    0.013302531 = weight(_text_:information in 5051) [ClassicSimilarity], result of:
      0.013302531 = score(doc=5051,freq=10.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.21684799 = fieldWeight in 5051, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5051)
  0.25 = coord(1/4)
```
Abstract

As illustrated by the World Wide Web, the volume of information in languages other than English has grown significantly in recent years. This highlights the importance of multilingual corpora. Much effort has been devoted to the compilation of multilingual corpora for the purpose of cross-lingual information retrieval and machine translation. Existing parallel corpora mostly involve European languages, such as English-French and English-Spanish. There is still a lack of parallel corpora between European languages and Asian. languages. In the authors' previous work, an alignment method to identify one-to-one Chinese and English title pairs was developed to construct an English-Chinese parallel corpus that works automatically from the World Wide Web, and a 100% precision and 87% recall were obtained. Careful analysis of these results has helped the authors to understand how the alignment method can be improved. A conceptual analysis was conducted, which includes the analysis of conceptual equivalent and conceptual information alternation in the aligned and nonaligned English-Chinese title pairs that are obtained by the alignment method. The result of the analysis not only reflects the characteristics of parallel corpora, but also gives insight into the strengths and weaknesses of the alignment method. In particular, conceptual alternation, such as omission and addition, is found to have a significant impact on the performance of the alignment method.

Footnote

Beitrag einer special topic section on multilingual information systems

Source

Journal of the American Society for Information Science and Technology. 57(2006) no.5, S.632-644
Wang, F.L.; Yang, C.C.: ¬The impact analysis of language differences on an automatic multilingual text summarization system (2006) 0.00
```
0.0029745363 = product of:
  0.011898145 = sum of:
    0.011898145 = weight(_text_:information in 5049) [ClassicSimilarity], result of:
      0.011898145 = score(doc=5049,freq=8.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.19395474 = fieldWeight in 5049, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5049)
  0.25 = coord(1/4)
```
Abstract

Based on the salient features of the documents, automatic text summarization systems extract the key sentences from source documents. This process supports the users in evaluating the relevance of the extracted documents returned by information retrieval systems. Because of this tool, efficient filtering can be achieved. Indirectly, these systems help to resolve the problem of information overloading. Many automatic text summarization systems have been implemented for use with different languages. It has been established that the grammatical and lexical differences between languages have a significant effect on text processing. However, the impact of the language differences on the automatic text summarization systems has not yet been investigated. The authors provide an impact analysis of language difference on automatic text summarization. It includes the effect on the extraction processes, the scoring mechanisms, the performance, and the matching of the extracted sentences, using the parallel corpus in English and Chinese as the tested object. The analysis results provide a greater understanding of language differences and promote the future development of more advanced text summarization techniques.

Footnote

Beitrag einer special topic section on multilingual information systems

Source

Journal of the American Society for Information Science and Technology. 57(2006) no.5, S.684-696

Search (4 results, page 1 of 1)

Authors

Themes