Search (5 results, page 1 of 1)

  • × theme_ss:"Computerlinguistik"
  • × theme_ss:"Multilinguale Probleme"
  1. Gonzalo, J.; Verdejo, F.; Peters, C.; Calzolari, N.: Applying EuroWordNet to cross-language text retrieval (1998) 0.03
    0.032519635 = product of:
      0.06503927 = sum of:
        0.06503927 = product of:
          0.13007854 = sum of:
            0.13007854 = weight(_text_:n in 6445) [ClassicSimilarity], result of:
              0.13007854 = score(doc=6445,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.6669253 = fieldWeight in 6445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6445)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  2. Bellaachia, A.; Amor-Tijani, G.: Proper nouns in English-Arabic cross language information retrieval (2008) 0.02
    0.020116309 = product of:
      0.040232617 = sum of:
        0.040232617 = product of:
          0.080465235 = sum of:
            0.080465235 = weight(_text_:n in 2372) [ClassicSimilarity], result of:
              0.080465235 = score(doc=2372,freq=6.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.41255307 = fieldWeight in 2372, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2372)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Out of vocabulary words, mostly proper nouns and technical terms, are one main source of performance degradation in Cross Language Information Retrieval (CLIR) systems. Those are words not found in the dictionary. Bilingual dictionaries in general do not cover most proper nouns, which are usually primary keys in the query. As they are spelling variants of each other in most languages, using an approximate string matching technique against the target database index is the common approach taken to find the target language correspondents of the original query key. N-gram technique proved to be the most effective among other string matching techniques. The issue arises when the languages dealt with have different alphabets. Transliteration is then applied based on phonetic similarities between the languages involved. In this study, both transliteration and the n-gram technique are combined to generate possible transliterations in an English-Arabic CLIR system. We refer to this technique as Transliteration N-Gram (TNG). We further enhance TNG by applying Part Of Speech disambiguation on the set of transliterations so that words with a similar spelling, but a different meaning, are excluded. Experimental results show that TNG gives promising results, and enhanced TNG further improves performance.
  3. Jensen, N.: Evaluierung von mehrsprachigem Web-Retrieval : Experimente mit dem EuroGOV-Korpus im Rahmen des Cross Language Evaluation Forum (CLEF) (2006) 0.01
    0.013936987 = product of:
      0.027873974 = sum of:
        0.027873974 = product of:
          0.05574795 = sum of:
            0.05574795 = weight(_text_:n in 5964) [ClassicSimilarity], result of:
              0.05574795 = score(doc=5964,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.28582513 = fieldWeight in 5964, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5964)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  4. Airio, E.; Kettunen, K.: Does dictionary based bilingual retrieval work in a non-normalized index? (2009) 0.01
    0.013936987 = product of:
      0.027873974 = sum of:
        0.027873974 = product of:
          0.05574795 = sum of:
            0.05574795 = weight(_text_:n in 4224) [ClassicSimilarity], result of:
              0.05574795 = score(doc=4224,freq=2.0), product of:
                0.19504215 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.045236014 = queryNorm
                0.28582513 = fieldWeight in 4224, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4224)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Many operational IR indexes are non-normalized, i.e. no lemmatization or stemming techniques, etc. have been employed in indexing. This poses a challenge for dictionary-based cross-language retrieval (CLIR), because translations are mostly lemmas. In this study, we face the challenge of dictionary-based CLIR in a non-normalized index. We test two optional approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms for a given lemma. FCG has been tested in monolingual retrieval and has been shown to be a good method for inflected retrieval, especially for highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English-Finnish, English-Swedish, Swedish-Finnish and Finnish-Swedish. Both our approaches performed quite well, but the results varied depending on the language pair. S-gramming and FCG performed quite equally in all the other language pairs except Finnish-Swedish, where s-gramming outperformed FCG.
  5. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.01
    0.0091932835 = product of:
      0.018386567 = sum of:
        0.018386567 = product of:
          0.036773134 = sum of:
            0.036773134 = weight(_text_:22 in 4436) [ClassicSimilarity], result of:
              0.036773134 = score(doc=4436,freq=2.0), product of:
                0.15840882 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045236014 = queryNorm
                0.23214069 = fieldWeight in 4436, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4436)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    16. 2.2000 14:22:39