Search (24 results, page 1 of 2)

  • × theme_ss:"Multilinguale Probleme"
  • × theme_ss:"Computerlinguistik"
  1. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.02
    0.022235535 = product of:
      0.04447107 = sum of:
        0.04447107 = sum of:
          0.007030784 = weight(_text_:a in 4436) [ClassicSimilarity], result of:
            0.007030784 = score(doc=4436,freq=6.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.13239266 = fieldWeight in 4436, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046875 = fieldNorm(doc=4436)
          0.037440285 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
            0.037440285 = score(doc=4436,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.23214069 = fieldWeight in 4436, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=4436)
      0.5 = coord(1/2)
    
    Abstract
    Language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable tranlated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between the speed performance and the translation performance, and what from the translated result is presented in. About 100.000 Web pages translated in the last 4 months of 1997 are used for quantitative study of online and real-time Web page translation
    Date
    16. 2.2000 14:22:39
    Type
    a
  2. Rettinger, A.; Schumilin, A.; Thoma, S.; Ell, B.: Learning a cross-lingual semantic representation of relations expressed in text (2015) 0.00
    0.0033826875 = product of:
      0.006765375 = sum of:
        0.006765375 = product of:
          0.01353075 = sum of:
            0.01353075 = weight(_text_:a in 2027) [ClassicSimilarity], result of:
              0.01353075 = score(doc=2027,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.25478977 = fieldWeight in 2027, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2027)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  3. Senez, D.: Developments in Systran (1995) 0.00
    0.0030255679 = product of:
      0.0060511357 = sum of:
        0.0060511357 = product of:
          0.012102271 = sum of:
            0.012102271 = weight(_text_:a in 8546) [ClassicSimilarity], result of:
              0.012102271 = score(doc=8546,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.22789092 = fieldWeight in 8546, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=8546)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Systran, the European Commission's multilingual machine translation system, is a fast service which is available to all Commission officials. The computer cannot match the skills of the professional translator, who must continue to be responsible for all texts which are legally binding or which are for publication. But machine translation can deal, in a matter of minutes, with short-lived documents, designed, say, for information or preparatory work, and which are required urgently. It can also give a broad view of a paper in an unfamiliar language, so that an official can decide how much, if any, of it needs to go to translators
    Type
    a
  4. Gopestake, A.: Acquisition of lexical translation relations from MRDS (1994/95) 0.00
    0.0030255679 = product of:
      0.0060511357 = sum of:
        0.0060511357 = product of:
          0.012102271 = sum of:
            0.012102271 = weight(_text_:a in 4073) [ClassicSimilarity], result of:
              0.012102271 = score(doc=4073,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.22789092 = fieldWeight in 4073, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4073)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Presents a methodology for extracting information about lexical translation equivalences from the machine readable versions of conventional dictionaries (MRDs), and describes a series of experiments on semi automatic construction of a linked multilingual lexical knowledge base for English, Dutch and Spanish. Discusses the advantage and limitations of using MRDs that this has revealed, and some strategies developed to cover gaps where direct translation can be found
    Type
    a
  5. Kim, S.; Ko, Y.; Oard, D.W.: Combining lexical and statistical translation evidence for cross-language information retrieval (2015) 0.00
    0.0028703054 = product of:
      0.005740611 = sum of:
        0.005740611 = product of:
          0.011481222 = sum of:
            0.011481222 = weight(_text_:a in 1606) [ClassicSimilarity], result of:
              0.011481222 = score(doc=1606,freq=16.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.2161963 = fieldWeight in 1606, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1606)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This article explores how best to use lexical and statistical translation evidence together for cross-language information retrieval (CLIR). Lexical translation evidence is assembled from Wikipedia and from a large machine-readable dictionary, statistical translation evidence is drawn from parallel corpora, and evidence from co-occurrence in the document language provides a basis for limiting the adverse effect of translation ambiguity. Coverage statistics for NII Testbeds and Community for Information Access Research (NTCIR) queries confirm that these resources have complementary strengths. Experiments with translation evidence from a small parallel corpus indicate that even rather rough estimates of translation probabilities can yield further improvements over a strong technique for translation weighting based on using Jensen-Shannon divergence as a term-association measure. Finally, a novel approach to posttranslation query expansion using a random walk over the Wikipedia concept link graph is shown to yield further improvements over alternative techniques for posttranslation query expansion. Evaluation results on the NTCIR-5 English-Korean test collection show statistically significant improvements over strong baselines.
    Type
    a
  6. Mustafa el Hadi, W.: Dynamics of the linguistic paradigm in information retrieval (2000) 0.00
    0.0026849252 = product of:
      0.0053698504 = sum of:
        0.0053698504 = product of:
          0.010739701 = sum of:
            0.010739701 = weight(_text_:a in 151) [ClassicSimilarity], result of:
              0.010739701 = score(doc=151,freq=14.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20223314 = fieldWeight in 151, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=151)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper we briefly sketch the dynamics of the linguistic paradigm in Information Retrieval (IR) and its adaptation to the Internet. The emergence of Natural Language Processing (NLP) techniques has been a major factor leading to this adaptation. These techniques and tools try to adapt to the current needs, i.e. retrieving information from documents written and indexed in a foreign language by using a native language query to express the information need. This process, known as cross-language IR (CLIR), is a field at the cross roads of both Machine Translation and IR. This field represents a real challenge to the IR community and will require a solid cooperation with the NLP community.
    Type
    a
  7. Ballesteros, L.A.: Cross-language retrieval via transitive relation (2000) 0.00
    0.0025370158 = product of:
      0.0050740317 = sum of:
        0.0050740317 = product of:
          0.010148063 = sum of:
            0.010148063 = weight(_text_:a in 30) [ClassicSimilarity], result of:
              0.010148063 = score(doc=30,freq=18.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.19109234 = fieldWeight in 30, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=30)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The growth in availability of multi-lingual data in all areas of the public and private sector is driving an increasing need for systems that facilitate access to multi-lingual resources. Cross-language Retrieval (CLR) technology is a means of addressing this need. A CLR system must address two main hurdles to effective cross-language retrieval. First, it must address the ambiguity that arises when trying to map the meaning of text across languages. That is, it must address both within-language ambiguity and cross-language ambiguity. Second, it has to incorporate multilingual resources that will enable it to perform the mapping across languages. The difficulty here is that there is a limited number of lexical resources and virtually none for some pairs of languages. This work focuses on a dictionary approach to addressing the problem of limited lexical resources. A dictionary approach is taken since bilingual dictionaries are more prevalent and simpler to apply than other resources. We show that a transitive translation approach, where a third language is employed as an interlingua between the source and target languages, is a viable means of performing CLR between languages for which no bilingual dictionary is available
    Type
    a
  8. Airio, E.; Kettunen, K.: Does dictionary based bilingual retrieval work in a non-normalized index? (2009) 0.00
    0.0024857575 = product of:
      0.004971515 = sum of:
        0.004971515 = product of:
          0.00994303 = sum of:
            0.00994303 = weight(_text_:a in 4224) [ClassicSimilarity], result of:
              0.00994303 = score(doc=4224,freq=12.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.18723148 = fieldWeight in 4224, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4224)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Many operational IR indexes are non-normalized, i.e. no lemmatization or stemming techniques, etc. have been employed in indexing. This poses a challenge for dictionary-based cross-language retrieval (CLIR), because translations are mostly lemmas. In this study, we face the challenge of dictionary-based CLIR in a non-normalized index. We test two optional approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms for a given lemma. FCG has been tested in monolingual retrieval and has been shown to be a good method for inflected retrieval, especially for highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English-Finnish, English-Swedish, Swedish-Finnish and Finnish-Swedish. Both our approaches performed quite well, but the results varied depending on the language pair. S-gramming and FCG performed quite equally in all the other language pairs except Finnish-Swedish, where s-gramming outperformed FCG.
    Type
    a
  9. Gonzalo, J.; Verdejo, F.; Peters, C.; Calzolari, N.: Applying EuroWordNet to cross-language text retrieval (1998) 0.00
    0.0023678814 = product of:
      0.0047357627 = sum of:
        0.0047357627 = product of:
          0.009471525 = sum of:
            0.009471525 = weight(_text_:a in 6445) [ClassicSimilarity], result of:
              0.009471525 = score(doc=6445,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.17835285 = fieldWeight in 6445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6445)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  10. Airio, E.: Who benefits from CLIR in web retrieval? (2008) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 2342) [ClassicSimilarity], result of:
              0.009076704 = score(doc=2342,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 2342, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2342)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - The aim of the current paper is to test whether query translation is beneficial in web retrieval. Design/methodology/approach - The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language utilizing a dictionary-based system. In English-German, also machine translation was utilized. The author used Google as the search engine. Findings - The results differed depending on the language pair. The author concluded that the dictionary coverage had an effect on the results. On average, the results of query-translation were better than in the traditional laboratory tests. Originality/value - This research shows that query translation in web is beneficial especially for users with moderate and non-active language skills. This is valuable information for developers of cross-language information retrieval systems.
    Type
    a
  11. Kishida, K.: Term disambiguation techniques based on target document collection for cross-language information retrieval : an empirical comparison of performance between techniques (2007) 0.00
    0.0020714647 = product of:
      0.0041429293 = sum of:
        0.0041429293 = product of:
          0.008285859 = sum of:
            0.008285859 = weight(_text_:a in 897) [ClassicSimilarity], result of:
              0.008285859 = score(doc=897,freq=12.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.15602624 = fieldWeight in 897, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=897)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Dictionary-based query translation for cross-language information retrieval often yields various translation candidates having different meanings for a source term in the query. This paper examines methods for solving the ambiguity of translations based on only the target document collections. First, we discuss two kinds of disambiguation technique: (1) one is a method using term co-occurrence statistics in the collection, and (2) a technique based on pseudo-relevance feedback. Next, these techniques are empirically compared using the CLEF 2003 test collection for German to Italian bilingual searches, which are executed by using English language as a pivot. The experiments showed that a variation of term co-occurrence based techniques, in which the best sequence algorithm for selecting translations is used with the Cosine coefficient, is dominant, and that the PRF method shows comparable high search performance, although statistical tests did not sufficiently support these conclusions. Furthermore, we repeat the same experiments for the case of French to Italian (pivot) and English to Italian (non-pivot) searches on the same CLEF 2003 test collection in order to verity our findings. Again, similar results were observed except that the Dice coefficient outperforms slightly the Cosine coefficient in the case of disambiguation based on term co-occurrence for English to Italian searches.
    Type
    a
  12. Chen, K.-H.: Evaluating Chinese text retrieval with multilingual queries (2002) 0.00
    0.0020506454 = product of:
      0.004101291 = sum of:
        0.004101291 = product of:
          0.008202582 = sum of:
            0.008202582 = weight(_text_:a in 1851) [ClassicSimilarity], result of:
              0.008202582 = score(doc=1851,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1544581 = fieldWeight in 1851, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1851)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper reports the design of a Chinese test collection with multilingual queries and the application of this test collection to evaluate information retrieval Systems. The effective indexing units, IR models, translation techniques, and query expansion for Chinese text retrieval are identified. The collaboration of East Asian countries for construction of test collections for cross-language multilingual text retrieval is also discussed in this paper. As well, a tool is designed to help assessors judge relevante and gather the events of relevante judgment. The log file created by this tool will be used to analyze the behaviors of assessors in the future.
    Type
    a
  13. Lonsdale, D.; Mitamura, T.; Nyberg, E.: Acquisition of large lexicons for practical knowledge-based MT (1994/95) 0.00
    0.0020296127 = product of:
      0.0040592253 = sum of:
        0.0040592253 = product of:
          0.008118451 = sum of:
            0.008118451 = weight(_text_:a in 7409) [ClassicSimilarity], result of:
              0.008118451 = score(doc=7409,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.15287387 = fieldWeight in 7409, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7409)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Although knowledge based MT systems have the potential to achieve high translation accuracy, each successful application system requires a large amount of hand coded lexical knowledge. Systems like KBMT-89 and its descendants have demonstarted how knowledge based translation can produce good results in technical domains with tractable domain semantics. Nevertheless, the magnitude of the development task for large scale applications with 10s of 1000s of of domain concepts precludes a purely hand crafted approach. The current challenge for the next generation of knowledge based MT systems is to utilize online textual resources and corpus analysis software in order to automate the most laborious aspects of the knowledge acquisition process. This partial automation can in turn maximize the productivity of human knowledge engineers and help to make large scale applications of knowledge based MT an viable approach. Discusses the corpus based knowledge acquisition methodology used in KANT, a knowledge based translation system for multilingual document production. This methodology can be generalized beyond the KANT interlinhua approach for use with any system that requires similar kinds of knowledge
    Type
    a
  14. Peters, W.; Vossen, P.; Diez-Orzas, P.; Adriaens, G.: Cross-linguistic alignment of WordNets with an inter-lingual-index (1998) 0.00
    0.0020296127 = product of:
      0.0040592253 = sum of:
        0.0040592253 = product of:
          0.008118451 = sum of:
            0.008118451 = weight(_text_:a in 6446) [ClassicSimilarity], result of:
              0.008118451 = score(doc=6446,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.15287387 = fieldWeight in 6446, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6446)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  15. Oard, D.W.; He, D.; Wang, J.: User-assisted query translation for interactive cross-language information retrieval (2008) 0.00
    0.0020296127 = product of:
      0.0040592253 = sum of:
        0.0040592253 = product of:
          0.008118451 = sum of:
            0.008118451 = weight(_text_:a in 2030) [ClassicSimilarity], result of:
              0.008118451 = score(doc=2030,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.15287387 = fieldWeight in 2030, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2030)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Interactive Cross-Language Information Retrieval (CLIR), a process in which searcher and system collaborate to find documents that satisfy an information need regardless of the language in which those documents are written, calls for designs in which synergies between searcher and system can be leveraged so that the strengths of one can cover weaknesses of the other. This paper describes an approach that employs user-assisted query translation to help searchers better understand the system's operation. Supporting interaction and interface designs are introduced, and results from three user studies are presented. The results indicate that experienced searchers presented with this new system evolve new search strategies that make effective use of the new capabilities, that they achieve retrieval effectiveness comparable to results obtained using fully automatic techniques, and that reported satisfaction with support for cross-language searching increased. The paper concludes with a description of a freely available interactive CLIR system that incorporates lessons learned from this research.
    Type
    a
  16. Pollitt, A.S.; Ellis, G.: Multilingual access to document databases (1993) 0.00
    0.001757696 = product of:
      0.003515392 = sum of:
        0.003515392 = product of:
          0.007030784 = sum of:
            0.007030784 = weight(_text_:a in 1302) [ClassicSimilarity], result of:
              0.007030784 = score(doc=1302,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.13239266 = fieldWeight in 1302, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1302)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper examines the reasons why approaches to facilitate document retrieval which apply AI (Artificial Intelligence) or Expert Systems techniques, relying on so-called "natural language" query statements from the end-user will result in sub-optimal solutions. It does so by reflecting on the nature of language and the fundamental problems in document retrieval. Support is given to the work of thesaurus builders and indexers with illustrations of how their work may be utilised in a generally applicable computer-based document retrieval system using Multilingual MenUSE software. The EuroMenUSE interface providing multilingual document access to EPOQUE, the European Parliament's Online Query System is described.
    Source
    Information as a Global Commodity - Communication, Processing and Use (CAIS/ACSI '93) : 21st Annual Conference Canadian Association for Information Science, Antigonish, Nova Scotia, Canada. July 1993
    Type
    a
  17. He, S.: Translingual alteration of conceptual information in medical translation : a crosslanguage analysis between English and chinese (2000) 0.00
    0.0016913437 = product of:
      0.0033826875 = sum of:
        0.0033826875 = product of:
          0.006765375 = sum of:
            0.006765375 = weight(_text_:a in 5162) [ClassicSimilarity], result of:
              0.006765375 = score(doc=5162,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.12739488 = fieldWeight in 5162, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5162)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This research investigated conceptual alteration in medical article titles translation between English and Chinese with a twofold purpose: one was to further justify the findings from a pilot study, and the other was to further investigate how the concepts were altered in translation. The research corpus of 800 medical article titles in English and Chinese was selected from two English medical journals and two Chinese medical journals. The analysis was based on the pairing of concepts in English and Chinese and their conceptual similarity/ dissimilarity via translation between English and Chinese. Two kinds of conceptual alteration were discussed: one was apparent conceptual alteration that was obvious with addition or omission of concepts in translation. The other was latent conceptual alteration that was not obvious, and can only be recognized by the differences between the original and translated concepts. The findings from the pilot study were verified with the findings from this research. Additional findings, for example, the addition/omission of single-word and multiword concepts in the general and medical domain and, implicit information vs. explicit information, were also discussed. The findings provided useful insights into future studies on crosslanguage information retrieval via medical translation between English and Chinese, and other languages as well
    Type
    a
  18. Bellaachia, A.; Amor-Tijani, G.: Proper nouns in English-Arabic cross language information retrieval (2008) 0.00
    0.0016913437 = product of:
      0.0033826875 = sum of:
        0.0033826875 = product of:
          0.006765375 = sum of:
            0.006765375 = weight(_text_:a in 2372) [ClassicSimilarity], result of:
              0.006765375 = score(doc=2372,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.12739488 = fieldWeight in 2372, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2372)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Out of vocabulary words, mostly proper nouns and technical terms, are one main source of performance degradation in Cross Language Information Retrieval (CLIR) systems. Those are words not found in the dictionary. Bilingual dictionaries in general do not cover most proper nouns, which are usually primary keys in the query. As they are spelling variants of each other in most languages, using an approximate string matching technique against the target database index is the common approach taken to find the target language correspondents of the original query key. N-gram technique proved to be the most effective among other string matching techniques. The issue arises when the languages dealt with have different alphabets. Transliteration is then applied based on phonetic similarities between the languages involved. In this study, both transliteration and the n-gram technique are combined to generate possible transliterations in an English-Arabic CLIR system. We refer to this technique as Transliteration N-Gram (TNG). We further enhance TNG by applying Part Of Speech disambiguation on the set of transliterations so that words with a similar spelling, but a different meaning, are excluded. Experimental results show that TNG gives promising results, and enhanced TNG further improves performance.
    Type
    a
  19. Mustafa el Hadi, W.: Human language technology and its role in information access and management (2003) 0.00
    0.0014647468 = product of:
      0.0029294936 = sum of:
        0.0029294936 = product of:
          0.005858987 = sum of:
            0.005858987 = weight(_text_:a in 5524) [ClassicSimilarity], result of:
              0.005858987 = score(doc=5524,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.11032722 = fieldWeight in 5524, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5524)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The role of linguistics in information access, extraction and dissemination is essential. Radical changes in the techniques of information and communication at the end of the twentieth century have had a significant effect on the function of the linguistic paradigm and its applications in all forms of communication. The introduction of new technical means have deeply changed the possibilities for the distribution of information. In this situation, what is the role of the linguistic paradigm and its practical applications, i.e., natural language processing (NLP) techniques when applied to information access? What solutions can linguistics offer in human computer interaction, extraction and management? Many fields show the relevance of the linguistic paradigm through the various technologies that require NLP, such as document and message understanding, information detection, extraction, and retrieval, question and answer, cross-language information retrieval (CLIR), text summarization, filtering, and spoken document retrieval. This paper focuses on the central role of human language technologies in the information society, surveys the current situation, describes the benefits of the above mentioned applications, outlines successes and challenges, and discusses solutions. It reviews the resources and means needed to advance information access and dissemination across language boundaries in the twenty-first century. Multilingualism, which is a natural result of globalization, requires more effort in the direction of language technology. The scope of human language technology (HLT) is large, so we limit our review to applications that involve multilinguality.
    Type
    a
  20. Li, Q.; Chen, Y.P.; Myaeng, S.-H.; Jin, Y.; Kang, B.-Y.: Concept unification of terms in different languages via web mining for Information Retrieval (2009) 0.00
    0.0014647468 = product of:
      0.0029294936 = sum of:
        0.0029294936 = product of:
          0.005858987 = sum of:
            0.005858987 = weight(_text_:a in 4215) [ClassicSimilarity], result of:
              0.005858987 = score(doc=4215,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.11032722 = fieldWeight in 4215, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4215)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    For historical and cultural reasons, English phrases, especially proper nouns and new words, frequently appear in Web pages written primarily in East Asian languages such as Chinese, Korean, and Japanese. Although such English terms and their equivalences in these East Asian languages refer to the same concept, they are often erroneously treated as independent index units in traditional Information Retrieval (IR). This paper describes the degree to which the problem arises in IR and proposes a novel technique to solve it. Our method first extracts English terms from native Web documents in an East Asian language, and then unifies the extracted terms and their equivalences in the native language as one index unit. For Cross-Language Information Retrieval (CLIR), one of the major hindrances to achieving retrieval performance at the level of Mono-Lingual Information Retrieval (MLIR) is the translation of terms in search queries which can not be found in a bilingual dictionary. The Web mining approach proposed in this paper for concept unification of terms in different languages can also be applied to solve this well-known challenge in CLIR. Experimental results based on NTCIR and KT-Set test collections show that the high translation precision of our approach greatly improves performance of both Mono-Lingual and Cross-Language Information Retrieval.
    Type
    a