Search (5 results, page 1 of 1)

  • × language_ss:"e"
  • × theme_ss:"Computerlinguistik"
  • × theme_ss:"Multilinguale Probleme"
  1. Gonzalo, J.; Verdejo, F.; Peters, C.; Calzolari, N.: Applying EuroWordNet to cross-language text retrieval (1998) 0.04
    0.03607856 = product of:
      0.07215712 = sum of:
        0.07215712 = product of:
          0.14431424 = sum of:
            0.14431424 = weight(_text_:n in 6445) [ClassicSimilarity], result of:
              0.14431424 = score(doc=6445,freq=2.0), product of:
                0.2163874 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.050186608 = queryNorm
                0.6669253 = fieldWeight in 6445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6445)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  2. Bellaachia, A.; Amor-Tijani, G.: Proper nouns in English-Arabic cross language information retrieval (2008) 0.02
    0.022317823 = product of:
      0.044635646 = sum of:
        0.044635646 = product of:
          0.08927129 = sum of:
            0.08927129 = weight(_text_:n in 2372) [ClassicSimilarity], result of:
              0.08927129 = score(doc=2372,freq=6.0), product of:
                0.2163874 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.050186608 = queryNorm
                0.41255307 = fieldWeight in 2372, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2372)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Out-of-vocabulary words, mostly proper nouns and technical terms, are a main source of performance degradation in Cross Language Information Retrieval (CLIR) systems. These are words not found in the dictionary. Bilingual dictionaries generally do not cover most proper nouns, which are usually primary keys in the query. Because proper nouns are spelling variants of each other in most languages, the common approach to finding the target-language correspondents of the original query key is approximate string matching against the target database index. The n-gram technique has proved to be the most effective of the string matching techniques. The issue arises when the languages involved have different alphabets. Transliteration is then applied based on phonetic similarities between the languages. In this study, transliteration and the n-gram technique are combined to generate possible transliterations in an English-Arabic CLIR system. We refer to this technique as Transliteration N-Gram (TNG). We further enhance TNG by applying Part Of Speech disambiguation to the set of transliterations so that words with a similar spelling but a different meaning are excluded. Experimental results show that TNG gives promising results, and enhanced TNG further improves performance.
  3. Airio, E.: Who benefits from CLIR in web retrieval? (2008) 0.02
    0.018780733 = product of:
      0.037561465 = sum of:
        0.037561465 = product of:
          0.15024586 = sum of:
            0.15024586 = weight(_text_:author's in 2342) [ClassicSimilarity], result of:
              0.15024586 = score(doc=2342,freq=2.0), product of:
                0.3372617 = queryWeight, product of:
                  6.7201533 = idf(docFreq=144, maxDocs=44218)
                  0.050186608 = queryNorm
                0.44548744 = fieldWeight in 2342, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.7201533 = idf(docFreq=144, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2342)
          0.25 = coord(1/4)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - The aim of the current paper is to test whether query translation is beneficial in web retrieval. Design/methodology/approach - The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language using a dictionary-based system. For English-German, machine translation was also used. The author used Google as the search engine. Findings - The results differed depending on the language pair. The author concluded that the dictionary coverage had an effect on the results. On average, the results of query translation were better than in the traditional laboratory tests. Originality/value - This research shows that query translation on the web is beneficial especially for users with moderate and non-active language skills. This is valuable information for developers of cross-language information retrieval systems.
  4. Airio, E.; Kettunen, K.: Does dictionary based bilingual retrieval work in a non-normalized index? (2009) 0.02
    0.01546224 = product of:
      0.03092448 = sum of:
        0.03092448 = product of:
          0.06184896 = sum of:
            0.06184896 = weight(_text_:n in 4224) [ClassicSimilarity], result of:
              0.06184896 = score(doc=4224,freq=2.0), product of:
                0.2163874 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.050186608 = queryNorm
                0.28582513 = fieldWeight in 4224, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4224)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Many operational IR indexes are non-normalized, i.e. no lemmatization, stemming or similar techniques have been employed in indexing. This poses a challenge for dictionary-based cross-language retrieval (CLIR), because translations are mostly lemmas. In this study, we face the challenge of dictionary-based CLIR in a non-normalized index. We test two alternative approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms for a given lemma. FCG has been tested in monolingual retrieval and has been shown to be a good method for inflected retrieval, especially for highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English-Finnish, English-Swedish, Swedish-Finnish and Finnish-Swedish. Both approaches performed quite well, but the results varied depending on the language pair. S-gramming and FCG performed about equally in all language pairs except Finnish-Swedish, where s-gramming outperformed FCG.
  5. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.01
    0.0101993885 = product of:
      0.020398777 = sum of:
        0.020398777 = product of:
          0.040797554 = sum of:
            0.040797554 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
              0.040797554 = score(doc=4436,freq=2.0), product of:
                0.17574495 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050186608 = queryNorm
                0.23214069 = fieldWeight in 4436, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4436)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    16. 2.2000 14:22:39
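
The score explanations listed under each result can be reproduced by hand. The following is a minimal sketch in Python, not the search engine's own code. It assumes the usual ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are consistent with the idf and tf values printed above. The function names are hypothetical. The queryNorm, fieldNorm and coord values are copied from the listing rather than derived.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def classic_tf(freq):
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    """Rebuild one explain tree: queryWeight * fieldWeight * coord factors."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm   # tf * idf * fieldNorm
    score = query_weight * field_weight
    for coord in coords:                                 # e.g. coord(1/2) = 0.5
        score *= coord
    return score


# Result 1 (doc 6445): term "n", freq=2.0, docFreq=1611, fieldNorm=0.109375
score_6445 = classic_score(2.0, 1611, 44218, 0.050186608, 0.109375, (0.5, 0.5))

# Result 3 (doc 2342): term "author's", freq=2.0, docFreq=144, coord(1/4)
score_2342 = classic_score(2.0, 144, 44218, 0.050186608, 0.046875, (0.25, 0.5))

print(round(score_6445, 5), round(score_2342, 5))
```

Values computed this way agree with the listed scores only to several decimal places, because the engine works in single-precision floats while this sketch uses doubles.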