Search (5 results)

  • theme_ss:"Multilinguale Probleme" (multilingual problems)
  • author_ss:"Järvelin, K."
  1. Lehtokangas, R.; Keskustalo, H.; Järvelin, K.: Experiments with transitive dictionary translation and pseudo-relevance feedback using graded relevance assessments (2008) 0.00
    0.003091229 = product of:
      0.012364916 = sum of:
        0.012364916 = weight(_text_:information in 1349) [ClassicSimilarity], result of:
          0.012364916 = score(doc=1349,freq=6.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.20156369 = fieldWeight in 1349, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1349)
      0.25 = coord(1/4)
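The explain tree above can be reproduced by hand. The sketch below assumes the standard Lucene ClassicSimilarity definitions, idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), and copies queryNorm, fieldNorm, and coord from the output rather than recomputing them. The helper names are hypothetical; Lucene computes in 32-bit floats, so trailing digits can differ slightly.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # ClassicSimilarity tf: square root of the raw term frequency
    return math.sqrt(freq)


def explain_single_term(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute a single-clause explain tree (hypothetical helper)."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # queryWeight = idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm   # fieldWeight = tf * idf * fieldNorm
    term_weight = query_weight * field_weight            # weight(_text_:information)
    return {
        "idf": idf,
        "queryWeight": query_weight,
        "fieldWeight": field_weight,
        "weight": term_weight,
        # only one clause matched, so the "sum of" node equals term_weight
        "score": term_weight * coord,
    }


# Values copied from the explain tree for doc 1349
parts = explain_single_term(freq=6.0, doc_freq=20772, max_docs=44218,
                            query_norm=0.034944877, field_norm=0.046875, coord=1 / 4)
for name, value in parts.items():
    print(f"{name:12s} {value:.9f}")
```

The final score is coord * queryWeight * fieldWeight: one of the four query clauses matched, so the single term weight is scaled by 1/4.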
    
    Abstract
    In this article, the authors present evaluation results for transitive dictionary-based cross-language information retrieval (CLIR) using graded relevance assessments in a best match retrieval environment. A text database of newspaper articles and a related set of 35 search topics were used in the tests. Source language topics (in English, German, and Swedish) were automatically translated into the target language (Finnish) via an intermediate (or pivot) language. The effectiveness of the transitively translated queries was compared to that of the directly translated and monolingual Finnish queries. Pseudo-relevance feedback (PRF) was also used to expand the original transitive target queries. Cross-language information retrieval performance was evaluated at three relevance thresholds: stringent, regular, and liberal. The transitive translations performed well, achieving on average 85-93% of direct translation performance and 66-72% of monolingual performance. Moreover, PRF raised the performance of the transitive translation routes both in absolute terms and relative to the monolingual and direct translation runs that also applied PRF.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.3, pp. 476-488
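A transitive route translates each source key into the pivot language and then translates every pivot word into the target language. The toy sketch below shows that composition. The dictionaries are hypothetical, and English is assumed as the pivot only for illustration; the abstract does not name the pivot language or the dictionaries used.

```python
from itertools import chain

# Hypothetical toy dictionaries: German -> English (assumed pivot) -> Finnish (target).
DE_EN = {"haus": ["house", "home"], "zeitung": ["newspaper"]}
EN_FI = {"house": ["talo"], "home": ["koti"], "newspaper": ["sanomalehti"]}


def lookup(word, dictionary):
    # Untranslatable words are passed through unchanged (a common CLIR fallback).
    return dictionary.get(word, [word])


def transitive_translate(source_words, src_to_pivot, pivot_to_target):
    """Translate each source key via the pivot; return one equivalent list per key."""
    translated = []
    for word in source_words:
        pivot_words = lookup(word, src_to_pivot)
        targets = chain.from_iterable(lookup(p, pivot_to_target) for p in pivot_words)
        translated.append(list(dict.fromkeys(targets)))  # de-duplicate, keep order
    return translated


print(transitive_translate(["haus", "zeitung"], DE_EN, EN_FI))
# [['talo', 'koti'], ['sanomalehti']]
```

Each source key can fan out into several target words, because ambiguity introduced at the pivot step is carried into the target query. Keeping the equivalents grouped per source key leaves room for query structuring later.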
  2. Talvensaari, T.; Juhola, M.; Laurikkala, J.; Järvelin, K.: Corpus-based cross-language information retrieval in retrieval of highly relevant documents (2007) 0.00
    0.0029745363 = product of:
      0.011898145 = sum of:
        0.011898145 = weight(_text_:information in 139) [ClassicSimilarity], result of:
          0.011898145 = score(doc=139,freq=8.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.19395474 = fieldWeight in 139, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=139)
      0.25 = coord(1/4)
    
    Abstract
    Information retrieval systems' ability to retrieve highly relevant documents has become increasingly important in the age of extremely large collections, such as the World Wide Web (WWW). The authors' aim was to find out how well corpus-based cross-language information retrieval (CLIR) retrieves highly relevant documents. They created a Finnish-Swedish comparable corpus from two loosely related document collections and used it as a source of knowledge for query translation. Finnish test queries were translated into Swedish and run against a Swedish test collection. Graded relevance assessments were used in evaluating the results, and three relevance criterion levels (liberal, regular, and stringent) were applied. The runs were also evaluated with generalized recall and precision, which weight the retrieved documents according to their relevance level. The performance of the Comparable Corpus Translation system (COCOT) was compared to that of a dictionary-based query translation program; the two translation methods were also combined. The results indicate that corpus-based CLIR performs particularly well with highly relevant documents. In average precision, COCOT even matched the monolingual baseline at the highest relevance level. The performance of the different query translation methods was further analyzed by identifying the reasons for poor rankings of highly relevant documents.
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.3, pp. 322-334
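The difference between threshold-based evaluation and generalized precision and recall can be sketched on a toy ranking. This assumes a four-point scale (0 irrelevant, 1 marginal, 2 fairly, 3 highly relevant), maps liberal, regular, and stringent to thresholds 1, 2, and 3, and uses relevance scores grade / 3. It follows the commonly cited formulation in which generalized precision divides the summed relevance scores of the retrieved documents by the number retrieved, and generalized recall divides them by the summed scores of all judged documents. The mappings and data are assumptions, not taken from the paper.

```python
from fractions import Fraction

# Assumed four-point scale: 0 = irrelevant, 1 = marginal, 2 = fairly, 3 = highly relevant.
THRESHOLDS = {"liberal": 1, "regular": 2, "stringent": 3}


def binary_pr(ranked, grades, threshold, k):
    """Precision and recall at k after binarizing grades at `threshold`."""
    relevant = {d for d, g in grades.items() if g >= threshold}
    hits = sum(1 for d in ranked[:k] if d in relevant)
    recall = Fraction(hits, len(relevant)) if relevant else Fraction(0)
    return Fraction(hits, k), recall


def generalized_pr(ranked, grades, k, max_grade=3):
    """Generalized precision/recall at k with relevance scores grade / max_grade."""
    score = {d: Fraction(g, max_grade) for d, g in grades.items()}
    retrieved = sum((score.get(d, Fraction(0)) for d in ranked[:k]), Fraction(0))
    total = sum(score.values(), Fraction(0))
    g_recall = retrieved / total if total else Fraction(0)
    return retrieved / k, g_recall


# Hypothetical judgments and ranking; unjudged documents count as irrelevant.
grades = {"d1": 3, "d2": 1, "d3": 0, "d4": 2, "d5": 3}
ranked = ["d2", "d1", "d3", "d6"]
for level, t in THRESHOLDS.items():
    print(level, binary_pr(ranked, grades, t, k=4))
print("generalized", generalized_pr(ranked, grades, k=4))
```

Exact fractions keep the toy numbers readable. A highly relevant hit counts fully under every threshold, while a marginal hit only counts under the liberal one. The generalized measures give it partial credit instead.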
  3. Toivonen, J.; Pirkola, A.; Keskustalo, H.; Visala, K.; Järvelin, K.: Translating cross-lingual spelling variants using transformation rules (2005) 0.00
    0.0025239778 = product of:
      0.010095911 = sum of:
        0.010095911 = weight(_text_:information in 1052) [ClassicSimilarity], result of:
          0.010095911 = score(doc=1052,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.16457605 = fieldWeight in 1052, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1052)
      0.25 = coord(1/4)
    
    Abstract
    Technical terms and proper names constitute a major problem in dictionary-based cross-language information retrieval (CLIR). However, technical terms and proper names in different languages often share the same Latin or Greek origin and are thus spelling variants of each other. In this paper we present a novel two-step fuzzy translation technique for cross-lingual spelling variants. In the first step, transformation rules are applied to source words to make them more similar to their target language equivalents. The rules are generated automatically using translation dictionaries as source data. In the second step, the intermediate forms obtained in the first step are translated into the target language using fuzzy matching. The effectiveness of the technique was evaluated empirically using five source languages and English as the target language. The two-step technique performed better, in some cases considerably better, than fuzzy matching alone. Even the first step on its own showed promising results.
    Source
    Information Processing and Management. 41(2005) no.4, pp. 859-872
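The two steps described in the abstract can be sketched on a single Finnish word. The sketch makes several assumptions. The two rules are hand-written for illustration, whereas the paper generates its rules automatically from translation dictionaries. Character-bigram Dice similarity stands in for the paper's fuzzy matching method. The three-word English lexicon is hypothetical.

```python
import re

# Hypothetical transformation rules (regex pattern, replacement); the paper
# derives its rules automatically, these two are hand-written examples.
RULES = [(r"^kr", "chr"), (r"i$", "e")]


def apply_rules(word, rules):
    """Step 1: rewrite the source word toward target-language spelling."""
    for pattern, replacement in rules:
        word = re.sub(pattern, replacement, word)
    return word


def bigrams(word):
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a, b):
    """Dice coefficient over character bigram sets (stand-in fuzzy matcher)."""
    x, y = bigrams(a), bigrams(b)
    return 2 * len(x & y) / (len(x) + len(y)) if x or y else 0.0


def fuzzy_translate(word, lexicon):
    """Step 2: pick the target-language word most similar to `word`."""
    return max(lexicon, key=lambda candidate: dice(word, candidate))


LEXICON = ["chromosome", "chromium", "promise"]  # hypothetical target vocabulary
source = "kromosomi"                              # Finnish source word
intermediate = apply_rules(source, RULES)
print(intermediate, fuzzy_translate(intermediate, LEXICON))  # chromosome chromosome
print(round(dice(source, "chromosome"), 3), round(dice(intermediate, "chromosome"), 3))  # 0.667 1.0
```

Here fuzzy matching alone would already pick the right word, but only with similarity 0.667. After the rules are applied, the intermediate form matches exactly. A wider margin between the correct candidate and the distractors is what should help in larger target vocabularies.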
  4. Pirkola, A.; Puolamäki, D.; Järvelin, K.: Applying query structuring in cross-language retrieval (2003) 0.00
    0.0025239778 = product of:
      0.010095911 = sum of:
        0.010095911 = weight(_text_:information in 1074) [ClassicSimilarity], result of:
          0.010095911 = score(doc=1074,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.16457605 = fieldWeight in 1074, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1074)
      0.25 = coord(1/4)
    
    Abstract
    We explore various ways to apply query structuring in cross-language information retrieval. In the first test, English queries were translated into Finnish using an electronic dictionary and were run against a Finnish newspaper database of 55,000 articles. Queries were structured by combining the Finnish translation equivalents of the same English query key using the syn-operator of the InQuery retrieval system. Structured queries performed markedly better than unstructured queries. Second, the effects of compound-based structuring, which uses a proximity operator for the translation equivalents of query language compound components, were tested. The method was not useful in syn-based queries; instead, it decreased retrieval effectiveness. Proper names are often non-identical spelling variants in different languages. This allows n-gram based translation of names not included in a dictionary. In the third test, a query structuring method in which the Boolean and-operator was used to assign more weight to keys translated through n-gram matching gave good results.
    Source
    Information Processing and Management. 39(2003) no.3, pp. 391-402
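Syn-based structuring treats all translation equivalents of one source key as if they were a single term. The sketch below imitates this by pooling term frequencies across the equivalents and counting document frequency over documents that contain any of them. It then compares the result with an unstructured query in which every equivalent is an independent key. The plain tf*idf scoring is an assumption standing in for InQuery's own belief functions, and the documents and query are hypothetical.

```python
import math

# Toy Finnish "documents" as token lists (hypothetical data).
DOCS = [
    ["talo", "talo", "sanomalehti"],
    ["koti", "auto"],
    ["auto"],
    ["sanomalehti", "auto"],
]
# One group of translation equivalents per English query key (e.g. "house", "newspaper").
QUERY = [["talo", "koti"], ["sanomalehti"]]


def pooled_tf(doc, key):
    # syn-style key: occurrences of any member count toward one frequency
    return sum(doc.count(t) for t in key)


def pooled_df(docs, key):
    # documents containing at least one member of the key
    return sum(1 for d in docs if any(t in d for t in key))


def score(doc, docs, query, structured):
    """Simple tf*idf sum (assumed scoring, not InQuery's belief functions)."""
    keys = query if structured else [[t] for group in query for t in group]
    total = 0.0
    for key in keys:
        df = pooled_df(docs, key)
        if df:
            total += pooled_tf(doc, key) * math.log(len(docs) / df)
    return total


for structured in (False, True):
    print("structured" if structured else "unstructured",
          [round(score(d, DOCS, QUERY, structured), 3) for d in DOCS])
```

In the unstructured query, each rare equivalent ("talo" and "koti" each occur in one document) gets a high idf of its own. This inflates the weight of the "house" concept relative to "newspaper". Pooling gives the whole key a single idf based on the union of its equivalents.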
  5. Talvensaari, T.; Laurikkala, J.; Järvelin, K.; Juhola, M.: A study on automatic creation of a comparable document collection in cross-language information retrieval (2006) 0.00
    0.0021033147 = product of:
      0.008413259 = sum of:
        0.008413259 = weight(_text_:information in 5601) [ClassicSimilarity], result of:
          0.008413259 = score(doc=5601,freq=4.0), product of:
            0.06134496 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.034944877 = queryNorm
            0.13714671 = fieldWeight in 5601, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5601)
      0.25 = coord(1/4)
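All five explain trees share idf, queryNorm, and coord, so the ranking depends only on sqrt(termFreq) * fieldNorm. The sketch below recomputes the five scores from the (termFreq, fieldNorm) pairs shown above. It assumes that equal scores are broken by ascending internal document number, which is how Lucene's default top-docs collection behaves to our understanding. This would explain why results 3 and 4 appear in the order shown despite identical scores.

```python
import math

MAX_DOCS, DOC_FREQ = 44218, 20772
QUERY_NORM = 0.034944877   # copied from the explain output
COORD = 1 / 4              # one of four query clauses matched in every hit
IDF = 1.0 + math.log(MAX_DOCS / (DOC_FREQ + 1))

# (internal doc number, termFreq, fieldNorm) taken from the five explain trees
HITS = [(1349, 6.0, 0.046875), (139, 8.0, 0.0390625), (1052, 4.0, 0.046875),
        (1074, 4.0, 0.046875), (5601, 4.0, 0.0390625)]


def classic_score(freq: float, field_norm: float) -> float:
    query_weight = IDF * QUERY_NORM
    field_weight = math.sqrt(freq) * IDF * field_norm
    return COORD * query_weight * field_weight


# Descending score; equal scores fall back to ascending doc number (assumed tie-break).
ranked = sorted(HITS, key=lambda hit: (-classic_score(hit[1], hit[2]), hit[0]))
for doc, freq, norm in ranked:
    print(doc, f"{classic_score(freq, norm):.10f}")
```

Doc 139 has more occurrences (8 vs. 6) than doc 1349 but a smaller fieldNorm, which indicates a longer field. As a result, it ranks second.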
    
    Abstract
    Purpose - To present a method for creating a comparable document collection from two document collections in different languages. Design/methodology/approach - The best query keys were extracted from a Finnish source collection (articles of the newspaper Aamulehti) with the relative average term frequency formula. The keys were translated into English with a dictionary-based query translation program. The resulting word lists were used as queries that were run against the target collection (Los Angeles Times articles) with the nearest neighbor method. The documents were aligned with unrestricted and date-restricted alignment schemes, which were also combined. Findings - The combined alignment scheme was found to be the best when the relatedness of the document pairs was assessed on a five-point relevance scale. Of the 400 document pairs, roughly 40 percent were highly or fairly related, and 75 percent showed at least lexical similarity. Research limitations/implications - The number of alignment pairs was small because of the short time period common to the two collections and their geographical (and thus topical) remoteness. In the future, our aim is to build larger comparable corpora in various languages and use them as a source of translation knowledge for cross-language information retrieval (CLIR). Practical implications - Readily available parallel corpora are scarce. With this method, two unrelated document collections can be aligned relatively easily to create a CLIR resource. Originality/value - The method can be applied to weakly linked collections and morphologically complex languages, such as Finnish.
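The alignment step can be sketched as a nearest-neighbor search over target documents, with an optional date window for the restricted scheme. The sketch makes several assumptions. The query terms stand in for already translated keys, and the key-extraction step with the relative average term frequency formula is not shown. Cosine similarity over raw term counts is assumed as the similarity measure. The documents and dates are hypothetical. The combined scheme is not shown because the abstract does not say how the two schemes were merged.

```python
import math
from collections import Counter
from datetime import date


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbor(query, targets, max_days=None):
    """Return (doc_id, similarity) of the most similar target, or None.

    max_days=None gives unrestricted alignment; an integer restricts candidates
    to documents published within that many days of the source document.
    """
    best = None
    for doc_id, (published, terms) in targets.items():
        if max_days is not None and abs((published - query["date"]).days) > max_days:
            continue
        sim = cosine(query["terms"], terms)
        if best is None or sim > best[1]:
            best = (doc_id, sim)
    return best


# Hypothetical translated query keys of one source article and two target articles.
query = {"date": date(1994, 3, 1), "terms": Counter(["election", "finland", "president"])}
targets = {
    "la1": (date(1994, 3, 2), Counter(["election", "president", "vote"])),
    "la2": (date(1994, 9, 15), Counter(["election", "finland", "finland", "president"])),
}
print(nearest_neighbor(query, targets))              # unrestricted
print(nearest_neighbor(query, targets, max_days=7))  # date-restricted
```

The unrestricted scheme picks the lexically closest article even if it was published months later. The date window trades some lexical similarity for a better chance that both articles report the same event.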