Search (3 results, page 1 of 1)

  • author_ss:"Dolamic, L."
  • author_ss:"Savoy, J."
  1. Dolamic, L.; Savoy, J.: Retrieval effectiveness of machine translated queries (2010) 0.00
    0.0036413912 = product of:
      0.014565565 = sum of:
        0.014565565 = weight(_text_:information in 4102) [ClassicSimilarity], result of:
          0.014565565 = score(doc=4102,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.16457605 = fieldWeight in 4102, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4102)
      0.25 = coord(1/4)
    
    Abstract
    This article describes and evaluates various information retrieval models used to search document collections written in English by submitting queries written in various other languages, either members of the Indo-European family (English, French, German, and Spanish) or radically different language groups such as Chinese. This evaluation method involves searching a rather large number of topics (around 300) and using two commercial machine translation systems to translate across the language barriers. In this study, mean average precision is used to measure variances in retrieval effectiveness when a query language differs from the document language. Although performance differences are rather large for certain language pairs, this does not mean that bilingual search methods are not commercially viable. Causes of the difficulties incurred when searching or during translation are analyzed and the results of concrete examples are explained.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.11, S.2266-2273
  2. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.00
    0.0030344925 = product of:
      0.01213797 = sum of:
        0.01213797 = weight(_text_:information in 3301) [ClassicSimilarity], result of:
          0.01213797 = score(doc=3301,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.13714671 = fieldWeight in 3301, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3301)
      0.25 = coord(1/4)
    
    Abstract
    This paper describes and evaluates various stemming and indexing strategies for the Russian language. We design and evaluate two stemming approaches, a light and a more aggressive one, and compare these stemmers to the Snowball stemmer, to no stemming, and also to a language-independent approach (n-gram). To evaluate the suggested stemming strategies we apply various probabilistic information retrieval (IR) models, including the Okapi, the Divergence from Randomness (DFR), a statistical language model (LM), as well as two vector-space approaches, namely, the classical tf-idf scheme and the dtu-dtn model. We find that the vector-space dtu-dtn and the DFR models tend to result in better retrieval effectiveness than the Okapi, LM, or tf-idf models, while only the latter two IR approaches result in statistically significant performance differences. Ignoring stemming generally reduces the MAP by more than 50%, and these differences are always significant. When applying an n-gram approach, performance differences are usually lower than with an approach involving stemming. Finally, our light stemmer tends to perform best, although performance differences between the light, aggressive, and Snowball stemmers are not statistically significant.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.12, S.2540-2547
  3. Dolamic, L.; Savoy, J.: When stopword lists make the difference (2009) 0.00
    0.0030039945 = product of:
      0.012015978 = sum of:
        0.012015978 = weight(_text_:information in 3319) [ClassicSimilarity], result of:
          0.012015978 = score(doc=3319,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.13576832 = fieldWeight in 3319, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3319)
      0.25 = coord(1/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.1, S.200-203
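
The score breakdowns listed under each result can be checked by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function and variable names are illustrative and not part of the search system. The inputs (freq, docFreq, maxDocs, queryNorm, fieldNorm, coord) are copied from the listings above.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as assumed for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute one single-term explain tree: queryWeight * fieldWeight * coord."""
    tf = math.sqrt(freq)                   # e.g. 2.0 for freq=4.0
    idf = classic_idf(doc_freq, max_docs)  # about 1.7554779 for the values above
    query_weight = idf * query_norm        # "queryWeight" line in the listing
    field_weight = tf * idf * field_norm   # "fieldWeight" line in the listing
    return query_weight * field_weight * coord

# Values shared by all three results, taken from the listings.
DOC_FREQ, MAX_DOCS, QUERY_NORM, COORD = 20772, 44218, 0.050415643, 1 / 4

# Per-document (termFreq, fieldNorm) pairs, keyed by the doc ids shown above.
per_doc = {
    4102: (4.0, 0.046875),
    3301: (4.0, 0.0390625),
    3319: (2.0, 0.0546875),
}

scores = {
    doc: classic_score(freq, DOC_FREQ, MAX_DOCS, QUERY_NORM, norm, COORD)
    for doc, (freq, norm) in per_doc.items()
}

for doc, score in scores.items():
    print(doc, f"{score:.10f}")
```

Because Lucene computes these values in 32-bit floats and the sketch uses 64-bit floats, the recomputed totals should agree with the listed ones only up to small rounding differences in the last digits.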