Search (8 results, page 1 of 1)

Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.03
```
0.03000176 = product of:
  0.06000352 = sum of:
    0.06000352 = product of:
      0.09000528 = sum of:
        0.024974043 = weight(_text_:j in 3301) [ClassicSimilarity], result of:
          0.024974043 = score(doc=3301,freq=2.0), product of:
            0.14227505 = queryWeight, product of:
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.044775832 = queryNorm
            0.17553353 = fieldWeight in 3301, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3301)
        0.06503124 = weight(_text_:n in 3301) [ClassicSimilarity], result of:
          0.06503124 = score(doc=3301,freq=4.0), product of:
            0.19305801 = queryWeight, product of:
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.044775832 = queryNorm
            0.33684817 = fieldWeight in 3301, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.3116565 = idf(docFreq=1611, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3301)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)
```
Abstract

This paper describes and evaluates various stemming and indexing strategies for the Russian language. We design and evaluate two stemming approaches, a light and a more aggressive one, and compare these stemmers to the Snowball stemmer, to no stemming, and also to a language-independent approach (n-gram). To evaluate the suggested stemming strategies we apply various probabilistic information retrieval (IR) models, including the Okapi, the Divergence from Randomness (DFR), a statistical language model (LM), as well as two vector-space approaches, namely, the classical tf-idf scheme and the dtu-dtn model. We find that the vector-space dtu-dtn and the DFR models tend to result in better retrieval effectiveness than the Okapi, LM, or tf-idf models, although only the differences relative to the latter two (LM and tf-idf) are statistically significant. Ignoring stemming generally reduces the mean average precision (MAP) by more than 50%, and these differences are always significant. The n-gram approach usually yields lower performance than the approaches involving stemming. Finally, our light stemmer tends to perform best, although performance differences between the light, aggressive, and Snowball stemmers are not statistically significant.
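The language-independent baseline in the abstract replaces stemming with overlapping character n-grams, so that inflected forms of the same word still share most of their index terms. The following is a minimal Python sketch, assuming whitespace tokenization, lowercasing, n = 4, and that words of at most n characters are indexed whole. These are illustrative choices, not the settings reported in the paper, and `char_ngrams` is a hypothetical helper.

```python
def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Overlapping character n-grams per word (illustrative settings, see above)."""
    grams: list[str] = []
    for word in text.lower().split():  # naive whitespace tokenization
        if len(word) <= n:
            grams.append(word)  # short words are kept whole (assumption)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


# Two inflected forms of "книга" (book) share their first two 4-grams.
print(char_ngrams("книга"))    # ['книг', 'нига']
print(char_ngrams("книгами"))  # ['книг', 'нига', 'игам', 'гами']
```

No Russian-specific suffix rules are needed; the overlap between the two forms is what lets a query on one form retrieve documents containing the other.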

Savoy, J.; Picard, J.: Retrieval effectiveness on the web (2001) 0.02

```
0.01648203 = product of:
  0.03296406 = sum of:
    0.03296406 = product of:
      0.09889217 = sum of:
        0.09889217 = weight(_text_:j in 775) [ClassicSimilarity], result of:
          0.09889217 = score(doc=775,freq=4.0), product of:
            0.14227505 = queryWeight, product of:
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.044775832 = queryNorm
            0.69507736 = fieldWeight in 775, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.109375 = fieldNorm(doc=775)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)
```
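The explain trees above can be checked by hand. Each matching term contributes queryWeight × fieldWeight, where queryWeight = idf · queryNorm, fieldWeight = tf · idf · fieldNorm, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); each Boolean level then multiplies its summed clause scores by coord(matched/total). The Python sketch below recomputes the scores of doc 3301 and doc 775 from the numbers printed in their trees. It assumes a query boost of 1 and takes queryNorm and the fieldNorm values as given rather than deriving them; the helper names are illustrative, not Lucene's API.

```python
import math

# Values copied from the explain output.
MAX_DOCS = 44218
QUERY_NORM = 0.044775832  # taken as given, not recomputed


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf as printed in the trees: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """queryWeight * fieldWeight for one matching term (query boost assumed 1)."""
    idf = classic_idf(doc_freq, MAX_DOCS)
    query_weight = idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def coord(matched: int, total: int) -> float:
    """Fraction of a Boolean query's clauses that matched."""
    return matched / total


# doc 3301: "j" (freq 2) and "n" (freq 4) match 2 of 3 inner clauses;
# the inner query is 1 of 2 outer clauses.
score_3301 = (
    (term_score(2.0, 5010, 0.0390625) + term_score(4.0, 1611, 0.0390625))
    * coord(2, 3)
    * coord(1, 2)
)

# doc 775: only "j" (freq 4) matches, 1 of 3 inner clauses.
score_775 = term_score(4.0, 5010, 0.109375) * coord(1, 3) * coord(1, 2)

print(f"doc 3301: {score_3301:.8f}")  # the tree reports 0.03000176
print(f"doc 775:  {score_775:.8f}")   # the tree reports 0.01648203
```

Small differences in the last digits are expected, since Lucene computes in single precision. The coord factors explain why doc 775, despite the largest single-term score (0.0989), ranks below doc 3301: it matches only one of the three inner clauses.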

Picard, J.; Savoy, J.: Enhancing retrieval with hyperlinks : a general model based on propositional argumentation systems (2003) 0.01

```
0.0058864383 = product of:
  0.011772877 = sum of:
    0.011772877 = product of:
      0.035318628 = sum of:
        0.035318628 = weight(_text_:j in 1427) [ClassicSimilarity], result of:
          0.035318628 = score(doc=1427,freq=4.0), product of:
            0.14227505 = queryWeight, product of:
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.044775832 = queryNorm
            0.2482419 = fieldWeight in 1427, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1427)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)
```

Savoy, J.: Bibliographic database access using free-text and controlled vocabulary : an evaluation (2005) 0.01

```
0.005827277 = product of:
  0.011654554 = sum of:
    0.011654554 = product of:
      0.03496366 = sum of:
        0.03496366 = weight(_text_:j in 1053) [ClassicSimilarity], result of:
          0.03496366 = score(doc=1053,freq=2.0), product of:
            0.14227505 = queryWeight, product of:
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.044775832 = queryNorm
            0.24574696 = fieldWeight in 1053, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1053)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)
```

Dolamic, L.; Savoy, J.: When stopword lists make the difference (2009) 0.01

```
0.005827277 = product of:
  0.011654554 = sum of:
    0.011654554 = product of:
      0.03496366 = sum of:
        0.03496366 = weight(_text_:j in 3319) [ClassicSimilarity], result of:
          0.03496366 = score(doc=3319,freq=2.0), product of:
            0.14227505 = queryWeight, product of:
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.044775832 = queryNorm
            0.24574696 = fieldWeight in 3319, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3319)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)
```

Savoy, J.: Searching strategies for the Hungarian language (2008) 0.00

```
0.0049948087 = product of:
  0.009989617 = sum of:
    0.009989617 = product of:
      0.029968852 = sum of:
        0.029968852 = weight(_text_:j in 2037) [ClassicSimilarity], result of:
          0.029968852 = score(doc=2037,freq=2.0), product of:
            0.14227505 = queryWeight, product of:
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.044775832 = queryNorm
            0.21064025 = fieldWeight in 2037, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.046875 = fieldNorm(doc=2037)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)
```

Abdou, S.; Savoy, J.: Searching in Medline : query expansion and manual indexing evaluation (2008) 0.00

```
0.0049948087 = product of:
  0.009989617 = sum of:
    0.009989617 = product of:
      0.029968852 = sum of:
        0.029968852 = weight(_text_:j in 2062) [ClassicSimilarity], result of:
          0.029968852 = score(doc=2062,freq=2.0), product of:
            0.14227505 = queryWeight, product of:
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.044775832 = queryNorm
            0.21064025 = fieldWeight in 2062, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.046875 = fieldNorm(doc=2062)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)
```

Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.00

```
0.0049948087 = product of:
  0.009989617 = sum of:
    0.009989617 = product of:
      0.029968852 = sum of:
        0.029968852 = weight(_text_:j in 2950) [ClassicSimilarity], result of:
          0.029968852 = score(doc=2950,freq=2.0), product of:
            0.14227505 = queryWeight, product of:
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.044775832 = queryNorm
            0.21064025 = fieldWeight in 2950, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1774964 = idf(docFreq=5010, maxDocs=44218)
              0.046875 = fieldNorm(doc=2950)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)
```
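The fieldNorm factors in these trees take only a handful of values (0.0390625, 0.046875, 0.0546875, 0.109375), all short binary fractions such as 5/128 and 7/64. That is consistent with the older Lucene scheme (the coord factors suggest a release before 7.0), in which the length norm, 1/sqrt(number of terms in the field) times any index-time boost, was stored in a single byte by `SmallFloat.floatToByte315` and decoded by `byte315ToFloat`, keeping only three significant bits. The sketch below is a Python port of that byte encoding written for illustration; the Python function names are illustrative, and it ignores index-time boosts and overlap discounting.

```python
import math
import struct

# Offset corresponding to a zero exponent of 15 in the 8-bit encoding.
_OFFSET = (63 - 15) << 3  # 384


def _float_bits(f: float) -> int:
    """Raw IEEE-754 single-precision bits of f, as a signed 32-bit int."""
    return struct.unpack(">i", struct.pack(">f", f))[0]


def float_to_byte315(f: float) -> int:
    """Keep the top 11 bits of the float32 (truncating) and shift them into 0..255."""
    bits = _float_bits(f)
    small = bits >> (24 - 3)
    if small <= _OFFSET:
        return 0 if bits <= 0 else 1  # underflow: zero or smallest nonzero code
    if small >= _OFFSET + 0x100:
        return 255  # overflow: largest code
    return small - _OFFSET


def byte315_to_float(b: int) -> float:
    """Map an 8-bit code back to the float it represents."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def field_norm(num_terms: int) -> float:
    """Assumed lengthNorm = 1/sqrt(num_terms) with no boost, quantized to one byte."""
    return byte315_to_float(float_to_byte315(1.0 / math.sqrt(num_terms)))


for n in (70, 300, 400, 500):
    print(n, field_norm(n))
# 70 0.109375
# 300 0.0546875
# 400 0.046875
# 500 0.0390625
```

Under these assumptions the quantization is coarse: a fieldNorm of 0.0390625 covers every field length from 456 to 655 terms, so the factor separates short records from long ones only roughly.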
