Search (24 results, page 1 of 2)

  • × theme_ss:"Computerlinguistik"
  • × theme_ss:"Multilinguale Probleme"
  1. Tartakovski, O.; Shramko, M.: Implementierung eines Werkzeugs zur Sprachidentifikation in mono- und multilingualen Texten (2006) 0.02
    0.019508056 = product of:
      0.039016113 = sum of:
        0.03737085 = weight(_text_:von in 5978) [ClassicSimilarity], result of:
          0.03737085 = score(doc=5978,freq=4.0), product of:
            0.12806706 = queryWeight, product of:
              2.6679487 = idf(docFreq=8340, maxDocs=44218)
              0.04800207 = queryNorm
            0.29180688 = fieldWeight in 5978, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.6679487 = idf(docFreq=8340, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5978)
        0.0016452647 = product of:
          0.004935794 = sum of:
            0.004935794 = weight(_text_:a in 5978) [ClassicSimilarity], result of:
              0.004935794 = score(doc=5978,freq=2.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.089176424 = fieldWeight in 5978, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5978)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Abstract
    Die Identifikation der Sprache bzw. der Sprachen in Textdokumenten ist einer der wichtigsten Schritte maschineller Textverarbeitung für das Information Retrieval. Der vorliegende Artikel stellt Langldent vor, ein System zur Sprachidentifikation von mono- und multilingualen elektronischen Textdokumenten. Das System bietet sowohl eine Auswahl von gängigen Algorithmen für die Sprachidentifikation monolingualer Textdokumente als auch einen neuen Algorithmus für die Sprachidentifikation multilingualer Textdokumente.
    Type
    a
  2. Carter-Sigglow, J.: ¬Die Rolle der Sprache bei der Informationsvermittlung (2001) 0.01
    0.012030191 = product of:
      0.024060382 = sum of:
        0.022650154 = weight(_text_:von in 5882) [ClassicSimilarity], result of:
          0.022650154 = score(doc=5882,freq=2.0), product of:
            0.12806706 = queryWeight, product of:
              2.6679487 = idf(docFreq=8340, maxDocs=44218)
              0.04800207 = queryNorm
            0.17686167 = fieldWeight in 5882, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6679487 = idf(docFreq=8340, maxDocs=44218)
              0.046875 = fieldNorm(doc=5882)
        0.001410227 = product of:
          0.004230681 = sum of:
            0.004230681 = weight(_text_:a in 5882) [ClassicSimilarity], result of:
              0.004230681 = score(doc=5882,freq=2.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.07643694 = fieldWeight in 5882, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5882)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Abstract
    In der Zeit des Internets und E-Commerce müssen auch deutsche Informationsfachleute ihre Dienste auf Englisch anbieten und sogar auf Englisch gestalten, um die internationale Community zu erreichen. Auf der anderen Seite spielt gerade auf dem Wissensmarkt Europa die sprachliche Identität der einzelnen Nationen eine große Rolle. In diesem Spannungsfeld zwischen Globalisierung und Lokalisierung arbeiten Informationsvermittler und werden dabei von Sprachspezialisten unterstützt. Man muss sich darüber im Klaren sein, dass jede Sprache - auch die für international gehaltene Sprache Englisch - eine Sprachgemeinschaft darstellt. In diesem Beitrag wird anhand aktueller Beispiele gezeigt, dass Sprache nicht nur grammatikalisch und terminologisch korrekt sein muss, sie soll auch den sprachlichen Erwartungen der Rezipienten gerecht werden, um die Grenzen der Sprachwelt nicht zu verletzen. Die Rolle der Sprachspezialisten besteht daher darin, die Informationsvermittlung zwischen diesen Welten reibungslos zu gestalten
    Type
    a
  3. Jensen, N.: Evaluierung von mehrsprachigem Web-Retrieval : Experimente mit dem EuroGOV-Korpus im Rahmen des Cross Language Evaluation Forum (CLEF) (2006) 0.01
    0.012030191 = product of:
      0.024060382 = sum of:
        0.022650154 = weight(_text_:von in 5964) [ClassicSimilarity], result of:
          0.022650154 = score(doc=5964,freq=2.0), product of:
            0.12806706 = queryWeight, product of:
              2.6679487 = idf(docFreq=8340, maxDocs=44218)
              0.04800207 = queryNorm
            0.17686167 = fieldWeight in 5964, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6679487 = idf(docFreq=8340, maxDocs=44218)
              0.046875 = fieldNorm(doc=5964)
        0.001410227 = product of:
          0.004230681 = sum of:
            0.004230681 = weight(_text_:a in 5964) [ClassicSimilarity], result of:
              0.004230681 = score(doc=5964,freq=2.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.07643694 = fieldWeight in 5964, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5964)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Type
    a
  4. Strötgen, R.; Mandl, T.; Schneider, R.: Entwicklung und Evaluierung eines Question Answering Systems im Rahmen des Cross Language Evaluation Forum (CLEF) (2006) 0.01
    0.012030191 = product of:
      0.024060382 = sum of:
        0.022650154 = weight(_text_:von in 5981) [ClassicSimilarity], result of:
          0.022650154 = score(doc=5981,freq=2.0), product of:
            0.12806706 = queryWeight, product of:
              2.6679487 = idf(docFreq=8340, maxDocs=44218)
              0.04800207 = queryNorm
            0.17686167 = fieldWeight in 5981, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6679487 = idf(docFreq=8340, maxDocs=44218)
              0.046875 = fieldNorm(doc=5981)
        0.001410227 = product of:
          0.004230681 = sum of:
            0.004230681 = weight(_text_:a in 5981) [ClassicSimilarity], result of:
              0.004230681 = score(doc=5981,freq=2.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.07643694 = fieldWeight in 5981, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5981)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Abstract
    Question Answering Systeme versuchen, zu konkreten Fragen eine korrekte Antwort zu liefern. Dazu durchsuchen sie einen Dokumentenbestand und extrahieren einen Bruchteil eines Dokuments. Dieser Beitrag beschreibt die Entwicklung eines modularen Systems zum multilingualen Question Answering. Die Strategie bei der Entwicklung zielte auf eine schnellstmögliche Verwendbarkeit eines modularen Systems, das auf viele frei verfügbare Ressourcen zugreift. Das System integriert Module zur Erkennung von Eigennamen, zu Indexierung und Retrieval, elektronische Wörterbücher, Online-Übersetzungswerkzeuge sowie Textkorpora zu Trainings- und Testzwecken und implementiert eigene Ansätze zu den Bereichen der Frage- und AntwortTaxonomien, zum Passagenretrieval und zum Ranking alternativer Antworten.
    Type
    a
  5. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.01
    0.0077249105 = product of:
      0.030899642 = sum of:
        0.030899642 = product of:
          0.046349462 = sum of:
            0.007327754 = weight(_text_:a in 4436) [ClassicSimilarity], result of:
              0.007327754 = score(doc=4436,freq=6.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.13239266 = fieldWeight in 4436, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4436)
            0.039021708 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
              0.039021708 = score(doc=4436,freq=2.0), product of:
                0.16809508 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04800207 = queryNorm
                0.23214069 = fieldWeight in 4436, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4436)
          0.6666667 = coord(2/3)
      0.25 = coord(1/4)
    
    Abstract
    Language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable tranlated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between the speed performance and the translation performance, and what from the translated result is presented in. About 100.000 Web pages translated in the last 4 months of 1997 are used for quantitative study of online and real-time Web page translation
    Date
    16. 2.2000 14:22:39
    Type
    a
  6. Rettinger, A.; Schumilin, A.; Thoma, S.; Ell, B.: Learning a cross-lingual semantic representation of relations expressed in text (2015) 0.00
    0.001175189 = product of:
      0.004700756 = sum of:
        0.004700756 = product of:
          0.014102268 = sum of:
            0.014102268 = weight(_text_:a in 2027) [ClassicSimilarity], result of:
              0.014102268 = score(doc=2027,freq=8.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.25478977 = fieldWeight in 2027, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2027)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Type
    a
  7. Senez, D.: Developments in Systran (1995) 0.00
    0.0010511212 = product of:
      0.0042044846 = sum of:
        0.0042044846 = product of:
          0.012613453 = sum of:
            0.012613453 = weight(_text_:a in 8546) [ClassicSimilarity], result of:
              0.012613453 = score(doc=8546,freq=10.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.22789092 = fieldWeight in 8546, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=8546)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    Systran, the European Commission's multilingual machine translation system, is a fast service which is available to all Commission officials. The computer cannot match the skills of the professional translator, who must continue to be responsible for all texts which are legally binding or which are for publication. But machine translation can deal, in a matter of minutes, with short-lived documents, designed, say, for information or preparatory work, and which are required urgently. It can also give a broad view of a paper in an unfamiliar language, so that an official can decide how much, if any, of it needs to go to translators
    Type
    a
  8. Gopestake, A.: Acquisition of lexical translation relations from MRDS (1994/95) 0.00
    0.0010511212 = product of:
      0.0042044846 = sum of:
        0.0042044846 = product of:
          0.012613453 = sum of:
            0.012613453 = weight(_text_:a in 4073) [ClassicSimilarity], result of:
              0.012613453 = score(doc=4073,freq=10.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.22789092 = fieldWeight in 4073, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4073)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    Presents a methodology for extracting information about lexical translation equivalences from the machine readable versions of conventional dictionaries (MRDs), and describes a series of experiments on semi automatic construction of a linked multilingual lexical knowledge base for English, Dutch and Spanish. Discusses the advantage and limitations of using MRDs that this has revealed, and some strategies developed to cover gaps where direct translation can be found
    Type
    a
  9. Kim, S.; Ko, Y.; Oard, D.W.: Combining lexical and statistical translation evidence for cross-language information retrieval (2015) 0.00
    9.97181E-4 = product of:
      0.003988724 = sum of:
        0.003988724 = product of:
          0.011966172 = sum of:
            0.011966172 = weight(_text_:a in 1606) [ClassicSimilarity], result of:
              0.011966172 = score(doc=1606,freq=16.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.2161963 = fieldWeight in 1606, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1606)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    This article explores how best to use lexical and statistical translation evidence together for cross-language information retrieval (CLIR). Lexical translation evidence is assembled from Wikipedia and from a large machine-readable dictionary, statistical translation evidence is drawn from parallel corpora, and evidence from co-occurrence in the document language provides a basis for limiting the adverse effect of translation ambiguity. Coverage statistics for NII Testbeds and Community for Information Access Research (NTCIR) queries confirm that these resources have complementary strengths. Experiments with translation evidence from a small parallel corpus indicate that even rather rough estimates of translation probabilities can yield further improvements over a strong technique for translation weighting based on using Jensen-Shannon divergence as a term-association measure. Finally, a novel approach to posttranslation query expansion using a random walk over the Wikipedia concept link graph is shown to yield further improvements over alternative techniques for posttranslation query expansion. Evaluation results on the NTCIR-5 English-Korean test collection show statistically significant improvements over strong baselines.
    Type
    a
  10. Mustafa el Hadi, W.: Dynamics of the linguistic paradigm in information retrieval (2000) 0.00
    9.327775E-4 = product of:
      0.00373111 = sum of:
        0.00373111 = product of:
          0.0111933295 = sum of:
            0.0111933295 = weight(_text_:a in 151) [ClassicSimilarity], result of:
              0.0111933295 = score(doc=151,freq=14.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.20223314 = fieldWeight in 151, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=151)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    In this paper we briefly sketch the dynamics of the linguistic paradigm in Information Retrieval (IR) and its adaptation to the Internet. The emergence of Natural Language Processing (NLP) techniques has been a major factor leading to this adaptation. These techniques and tools try to adapt to the current needs, i.e. retrieving information from documents written and indexed in a foreign language by using a native language query to express the information need. This process, known as cross-language IR (CLIR), is a field at the cross roads of both Machine Translation and IR. This field represents a real challenge to the IR community and will require a solid cooperation with the NLP community.
    Type
    a
  11. Ballesteros, L.A.: Cross-language retrieval via transitive relation (2000) 0.00
    8.813918E-4 = product of:
      0.0035255672 = sum of:
        0.0035255672 = product of:
          0.010576702 = sum of:
            0.010576702 = weight(_text_:a in 30) [ClassicSimilarity], result of:
              0.010576702 = score(doc=30,freq=18.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.19109234 = fieldWeight in 30, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=30)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    The growth in availability of multi-lingual data in all areas of the public and private sector is driving an increasing need for systems that facilitate access to multi-lingual resources. Cross-language Retrieval (CLR) technology is a means of addressing this need. A CLR system must address two main hurdles to effective cross-language retrieval. First, it must address the ambiguity that arises when trying to map the meaning of text across languages. That is, it must address both within-language ambiguity and cross-language ambiguity. Second, it has to incorporate multilingual resources that will enable it to perform the mapping across languages. The difficulty here is that there is a limited number of lexical resources and virtually none for some pairs of languages. This work focuses on a dictionary approach to addressing the problem of limited lexical resources. A dictionary approach is taken since bilingual dictionaries are more prevalent and simpler to apply than other resources. We show that a transitive translation approach, where a third language is employed as an interlingua between the source and target languages, is a viable means of performing CLR between languages for which no bilingual dictionary is available
    Type
    a
  12. Airio, E.; Kettunen, K.: Does dictionary based bilingual retrieval work in a non-normalized index? (2009) 0.00
    8.63584E-4 = product of:
      0.003454336 = sum of:
        0.003454336 = product of:
          0.010363008 = sum of:
            0.010363008 = weight(_text_:a in 4224) [ClassicSimilarity], result of:
              0.010363008 = score(doc=4224,freq=12.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.18723148 = fieldWeight in 4224, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4224)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    Many operational IR indexes are non-normalized, i.e. no lemmatization or stemming techniques, etc. have been employed in indexing. This poses a challenge for dictionary-based cross-language retrieval (CLIR), because translations are mostly lemmas. In this study, we face the challenge of dictionary-based CLIR in a non-normalized index. We test two optional approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms for a given lemma. FCG has been tested in monolingual retrieval and has been shown to be a good method for inflected retrieval, especially for highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English-Finnish, English-Swedish, Swedish-Finnish and Finnish-Swedish. Both our approaches performed quite well, but the results varied depending on the language pair. S-gramming and FCG performed quite equally in all the other language pairs except Finnish-Swedish, where s-gramming outperformed FCG.
    Type
    a
  13. Gonzalo, J.; Verdejo, F.; Peters, C.; Calzolari, N.: Applying EuroWordNet to cross-language text retrieval (1998) 0.00
    8.2263234E-4 = product of:
      0.0032905294 = sum of:
        0.0032905294 = product of:
          0.009871588 = sum of:
            0.009871588 = weight(_text_:a in 6445) [ClassicSimilarity], result of:
              0.009871588 = score(doc=6445,freq=2.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.17835285 = fieldWeight in 6445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6445)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Type
    a
  14. Airio, E.: Who benefits from CLIR in web retrieval? (2008) 0.00
    7.883408E-4 = product of:
      0.0031533632 = sum of:
        0.0031533632 = product of:
          0.00946009 = sum of:
            0.00946009 = weight(_text_:a in 2342) [ClassicSimilarity], result of:
              0.00946009 = score(doc=2342,freq=10.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.1709182 = fieldWeight in 2342, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2342)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - The aim of the current paper is to test whether query translation is beneficial in web retrieval. Design/methodology/approach - The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language utilizing a dictionary-based system. In English-German, also machine translation was utilized. The author used Google as the search engine. Findings - The results differed depending on the language pair. The author concluded that the dictionary coverage had an effect on the results. On average, the results of query-translation were better than in the traditional laboratory tests. Originality/value - This research shows that query translation in web is beneficial especially for users with moderate and non-active language skills. This is valuable information for developers of cross-language information retrieval systems.
    Type
    a
  15. Kishida, K.: Term disambiguation techniques based on target document collection for cross-language information retrieval : an empirical comparison of performance between techniques (2007) 0.00
    7.196534E-4 = product of:
      0.0028786135 = sum of:
        0.0028786135 = product of:
          0.00863584 = sum of:
            0.00863584 = weight(_text_:a in 897) [ClassicSimilarity], result of:
              0.00863584 = score(doc=897,freq=12.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.15602624 = fieldWeight in 897, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=897)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    Dictionary-based query translation for cross-language information retrieval often yields various translation candidates having different meanings for a source term in the query. This paper examines methods for solving the ambiguity of translations based on only the target document collections. First, we discuss two kinds of disambiguation technique: (1) one is a method using term co-occurrence statistics in the collection, and (2) a technique based on pseudo-relevance feedback. Next, these techniques are empirically compared using the CLEF 2003 test collection for German to Italian bilingual searches, which are executed by using English language as a pivot. The experiments showed that a variation of term co-occurrence based techniques, in which the best sequence algorithm for selecting translations is used with the Cosine coefficient, is dominant, and that the PRF method shows comparable high search performance, although statistical tests did not sufficiently support these conclusions. Furthermore, we repeat the same experiments for the case of French to Italian (pivot) and English to Italian (non-pivot) searches on the same CLEF 2003 test collection in order to verity our findings. Again, similar results were observed except that the Dice coefficient outperforms slightly the Cosine coefficient in the case of disambiguation based on term co-occurrence for English to Italian searches.
    Type
    a
  16. Chen, K.-H.: Evaluating Chinese text retrieval with multilingual queries (2002) 0.00
    7.1242056E-4 = product of:
      0.0028496822 = sum of:
        0.0028496822 = product of:
          0.008549047 = sum of:
            0.008549047 = weight(_text_:a in 1851) [ClassicSimilarity], result of:
              0.008549047 = score(doc=1851,freq=6.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.1544581 = fieldWeight in 1851, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1851)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    This paper reports the design of a Chinese test collection with multilingual queries and the application of this test collection to evaluate information retrieval Systems. The effective indexing units, IR models, translation techniques, and query expansion for Chinese text retrieval are identified. The collaboration of East Asian countries for construction of test collections for cross-language multilingual text retrieval is also discussed in this paper. As well, a tool is designed to help assessors judge relevante and gather the events of relevante judgment. The log file created by this tool will be used to analyze the behaviors of assessors in the future.
    Type
    a
  17. Lonsdale, D.; Mitamura, T.; Nyberg, E.: Acquisition of large lexicons for practical knowledge-based MT (1994/95) 0.00
    7.051135E-4 = product of:
      0.002820454 = sum of:
        0.002820454 = product of:
          0.008461362 = sum of:
            0.008461362 = weight(_text_:a in 7409) [ClassicSimilarity], result of:
              0.008461362 = score(doc=7409,freq=8.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.15287387 = fieldWeight in 7409, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7409)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    Although knowledge based MT systems have the potential to achieve high translation accuracy, each successful application system requires a large amount of hand coded lexical knowledge. Systems like KBMT-89 and its descendants have demonstarted how knowledge based translation can produce good results in technical domains with tractable domain semantics. Nevertheless, the magnitude of the development task for large scale applications with 10s of 1000s of of domain concepts precludes a purely hand crafted approach. The current challenge for the next generation of knowledge based MT systems is to utilize online textual resources and corpus analysis software in order to automate the most laborious aspects of the knowledge acquisition process. This partial automation can in turn maximize the productivity of human knowledge engineers and help to make large scale applications of knowledge based MT an viable approach. Discusses the corpus based knowledge acquisition methodology used in KANT, a knowledge based translation system for multilingual document production. This methodology can be generalized beyond the KANT interlinhua approach for use with any system that requires similar kinds of knowledge
    Type
    a
  18. Peters, W.; Vossen, P.; Diez-Orzas, P.; Adriaens, G.: Cross-linguistic alignment of WordNets with an inter-lingual-index (1998) 0.00
    7.051135E-4 = product of:
      0.002820454 = sum of:
        0.002820454 = product of:
          0.008461362 = sum of:
            0.008461362 = weight(_text_:a in 6446) [ClassicSimilarity], result of:
              0.008461362 = score(doc=6446,freq=2.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.15287387 = fieldWeight in 6446, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6446)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Type
    a
  19. Oard, D.W.; He, D.; Wang, J.: User-assisted query translation for interactive cross-language information retrieval (2008) 0.00
    7.051135E-4 = product of:
      0.002820454 = sum of:
        0.002820454 = product of:
          0.008461362 = sum of:
            0.008461362 = weight(_text_:a in 2030) [ClassicSimilarity], result of:
              0.008461362 = score(doc=2030,freq=8.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.15287387 = fieldWeight in 2030, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2030)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    Interactive Cross-Language Information Retrieval (CLIR), a process in which searcher and system collaborate to find documents that satisfy an information need regardless of the language in which those documents are written, calls for designs in which synergies between searcher and system can be leveraged so that the strengths of one can cover weaknesses of the other. This paper describes an approach that employs user-assisted query translation to help searchers better understand the system's operation. Supporting interaction and interface designs are introduced, and results from three user studies are presented. The results indicate that experienced searchers presented with this new system evolve new search strategies that make effective use of the new capabilities, that they achieve retrieval effectiveness comparable to results obtained using fully automatic techniques, and that reported satisfaction with support for cross-language searching increased. The paper concludes with a description of a freely available interactive CLIR system that incorporates lessons learned from this research.
    Type
    a
  20. Pollitt, A.S.; Ellis, G.: Multilingual access to document databases (1993) 0.00
    6.106462E-4 = product of:
      0.0024425848 = sum of:
        0.0024425848 = product of:
          0.007327754 = sum of:
            0.007327754 = weight(_text_:a in 1302) [ClassicSimilarity], result of:
              0.007327754 = score(doc=1302,freq=6.0), product of:
                0.055348642 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.04800207 = queryNorm
                0.13239266 = fieldWeight in 1302, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1302)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    This paper examines the reasons why approaches to facilitate document retrieval which apply AI (Artificial Intelligence) or Expert Systems techniques, relying on so-called "natural language" query statements from the end-user will result in sub-optimal solutions. It does so by reflecting on the nature of language and the fundamental problems in document retrieval. Support is given to the work of thesaurus builders and indexers with illustrations of how their work may be utilised in a generally applicable computer-based document retrieval system using Multilingual MenUSE software. The EuroMenUSE interface providing multilingual document access to EPOQUE, the European Parliament's Online Query System is described.
    Source
    Information as a Global Commodity - Communication, Processing and Use (CAIS/ACSI '93) : 21st Annual Conference Canadian Association for Information Science, Antigonish, Nova Scotia, Canada. July 1993
    Type
    a