Search (27 results, page 1 of 2)

  • author_ss:"Savoy, J."
  1. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.03
    0.03000176 = product of:
      0.06000352 = sum of:
        0.06000352 = product of:
          0.09000528 = sum of:
            0.024974043 = weight(_text_:j in 3301) [ClassicSimilarity], result of:
              0.024974043 = score(doc=3301,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.17553353 = fieldWeight in 3301, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3301)
            0.06503124 = weight(_text_:n in 3301) [ClassicSimilarity], result of:
              0.06503124 = score(doc=3301,freq=4.0), product of:
                0.19305801 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.044775832 = queryNorm
                0.33684817 = fieldWeight in 3301, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3301)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    This paper describes and evaluates various stemming and indexing strategies for the Russian language. We design and evaluate two stemming approaches, a light and a more aggressive one, and compare these stemmers to the Snowball stemmer, to no stemming, and also to a language-independent approach (n-gram). To evaluate the suggested stemming strategies we apply various probabilistic information retrieval (IR) models, including the Okapi, the Divergence from Randomness (DFR), a statistical language model (LM), as well as two vector-space approaches, namely, the classical tf-idf scheme and the dtu-dtn model. We find that the vector-space dtu-dtn and the DFR models tend to result in better retrieval effectiveness than the Okapi, LM, or tf-idf models, while only the latter two IR approaches result in statistically significant performance differences. Ignoring stemming generally reduces the MAP by more than 50%, and these differences are always significant. When applying an n-gram approach, performance differences are usually lower than with an approach involving stemming. Finally, our light stemmer tends to perform best, although performance differences between the light, aggressive, and Snowball stemmers are not statistically significant.
  2. Ikae, C.; Savoy, J.: Gender identification on Twitter (2022) 0.02
    0.023652691 = product of:
      0.047305383 = sum of:
        0.047305383 = product of:
          0.07095807 = sum of:
            0.024974043 = weight(_text_:j in 445) [ClassicSimilarity], result of:
              0.024974043 = score(doc=445,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.17553353 = fieldWeight in 445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=445)
            0.045984026 = weight(_text_:n in 445) [ClassicSimilarity], result of:
              0.045984026 = score(doc=445,freq=2.0), product of:
                0.19305801 = queryWeight, product of:
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.044775832 = queryNorm
                0.23818761 = fieldWeight in 445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.3116565 = idf(docFreq=1611, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=445)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    To determine the gender of a text's author, various feature types have been suggested (e.g., function words, n-grams of letters, etc.), leading to a huge number of stylistic markers. To determine the target category, different machine learning models have been suggested (e.g., logistic regression, decision tree, k nearest-neighbors, support vector machine, naïve Bayes, neural networks, and random forest). In this study, our first objective is to know whether or not the same model always achieves the best effectiveness when considering similar corpora under the same conditions. Thus, based on 7 CLEF-PAN collections, this study analyzes the effectiveness of 10 different classifiers. Our second aim is to propose a 2-stage feature selection to reduce the feature size to a few hundred terms without any significant change in the performance level compared to approaches using all the attributes (increase of around 5% after applying the proposed feature selection). Based on our experiments, neural networks or random forests tend, on average, to produce the highest effectiveness. Moreover, empirical evidence indicates that reducing the feature set size to around 300 without penalizing the effectiveness is possible. Finally, based on such reduced feature sizes, an analysis reveals some of the specific terms that clearly discriminate between the 2 genders.
  3. Savoy, J.: Estimating the probability of an authorship attribution (2016) 0.02
    0.018435527 = product of:
      0.036871053 = sum of:
        0.036871053 = product of:
          0.055306576 = sum of:
            0.024974043 = weight(_text_:j in 2937) [ClassicSimilarity], result of:
              0.024974043 = score(doc=2937,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.17553353 = fieldWeight in 2937, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2937)
            0.030332536 = weight(_text_:22 in 2937) [ClassicSimilarity], result of:
              0.030332536 = score(doc=2937,freq=2.0), product of:
                0.15679733 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044775832 = queryNorm
                0.19345059 = fieldWeight in 2937, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2937)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Date
    7. 5.2016 21:22:27
  4. Savoy, J.; Picard, J.: Retrieval effectiveness on the web (2001) 0.02
    0.01648203 = product of:
      0.03296406 = sum of:
        0.03296406 = product of:
          0.09889217 = sum of:
            0.09889217 = weight(_text_:j in 775) [ClassicSimilarity], result of:
              0.09889217 = score(doc=775,freq=4.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.69507736 = fieldWeight in 775, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.109375 = fieldNorm(doc=775)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  5. Savoy, J.: Stemming of French words based on grammatical categories (1993) 0.01
    0.0133194905 = product of:
      0.026638981 = sum of:
        0.026638981 = product of:
          0.07991694 = sum of:
            0.07991694 = weight(_text_:j in 4650) [ClassicSimilarity], result of:
              0.07991694 = score(doc=4650,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.5617073 = fieldWeight in 4650, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.125 = fieldNorm(doc=4650)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  6. Savoy, J.: Bayesian inference networks and spreading activation in hypertext systems (1992) 0.01
    0.0133194905 = product of:
      0.026638981 = sum of:
        0.026638981 = product of:
          0.07991694 = sum of:
            0.07991694 = weight(_text_:j in 192) [ClassicSimilarity], result of:
              0.07991694 = score(doc=192,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.5617073 = fieldWeight in 192, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.125 = fieldNorm(doc=192)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  7. Savoy, J.; Ndarugendamwo, M.; Vrajitoru, D.: Report on the TREC-4 experiment : combining probabilistic and vector-space schemes (1996) 0.01
    0.009989617 = product of:
      0.019979235 = sum of:
        0.019979235 = product of:
          0.059937704 = sum of:
            0.059937704 = weight(_text_:j in 7574) [ClassicSimilarity], result of:
              0.059937704 = score(doc=7574,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.4212805 = fieldWeight in 7574, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.09375 = fieldNorm(doc=7574)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  8. Savoy, J.; Calvé, A. le; Vrajitoru, D.: Report on the TREC5 experiment : data fusion and collection fusion (1997) 0.01
    0.009989617 = product of:
      0.019979235 = sum of:
        0.019979235 = product of:
          0.059937704 = sum of:
            0.059937704 = weight(_text_:j in 3108) [ClassicSimilarity], result of:
              0.059937704 = score(doc=3108,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.4212805 = fieldWeight in 3108, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.09375 = fieldNorm(doc=3108)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  9. Savoy, J.: A stemming procedure and stopword list for general French Corpora (1999) 0.01
    0.009989617 = product of:
      0.019979235 = sum of:
        0.019979235 = product of:
          0.059937704 = sum of:
            0.059937704 = weight(_text_:j in 4314) [ClassicSimilarity], result of:
              0.059937704 = score(doc=4314,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.4212805 = fieldWeight in 4314, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4314)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  10. Savoy, J.; Desbois, D.: Information retrieval in hypertext systems (1991) 0.01
    0.0066597452 = product of:
      0.0133194905 = sum of:
        0.0133194905 = product of:
          0.03995847 = sum of:
            0.03995847 = weight(_text_:j in 4452) [ClassicSimilarity], result of:
              0.03995847 = score(doc=4452,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.28085366 = fieldWeight in 4452, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4452)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  11. Savoy, J.: Effectiveness of information retrieval systems used in a hypertext environment (1993) 0.01
    0.0066597452 = product of:
      0.0133194905 = sum of:
        0.0133194905 = product of:
          0.03995847 = sum of:
            0.03995847 = weight(_text_:j in 6511) [ClassicSimilarity], result of:
              0.03995847 = score(doc=6511,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.28085366 = fieldWeight in 6511, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6511)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  12. Savoy, J.: A learning scheme for information retrieval in hypertext (1994) 0.01
    0.0066597452 = product of:
      0.0133194905 = sum of:
        0.0133194905 = product of:
          0.03995847 = sum of:
            0.03995847 = weight(_text_:j in 7292) [ClassicSimilarity], result of:
              0.03995847 = score(doc=7292,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.28085366 = fieldWeight in 7292, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0625 = fieldNorm(doc=7292)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  13. Savoy, J.: Searching information in legal hypertext systems (1993/94) 0.01
    0.0066597452 = product of:
      0.0133194905 = sum of:
        0.0133194905 = product of:
          0.03995847 = sum of:
            0.03995847 = weight(_text_:j in 757) [ClassicSimilarity], result of:
              0.03995847 = score(doc=757,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.28085366 = fieldWeight in 757, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0625 = fieldNorm(doc=757)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  14. Picard, J.; Savoy, J.: Enhancing retrieval with hyperlinks : a general model based on propositional argumentation systems (2003) 0.01
    0.0058864383 = product of:
      0.011772877 = sum of:
        0.011772877 = product of:
          0.035318628 = sum of:
            0.035318628 = weight(_text_:j in 1427) [ClassicSimilarity], result of:
              0.035318628 = score(doc=1427,freq=4.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.2482419 = fieldWeight in 1427, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1427)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  15. Savoy, J.: A new probabilistic scheme for information retrieval in hypertext (1995) 0.01
    0.005827277 = product of:
      0.011654554 = sum of:
        0.011654554 = product of:
          0.03496366 = sum of:
            0.03496366 = weight(_text_:j in 7254) [ClassicSimilarity], result of:
              0.03496366 = score(doc=7254,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.24574696 = fieldWeight in 7254, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=7254)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  16. Savoy, J.: Bibliographic database access using free-text and controlled vocabulary : an evaluation (2005) 0.01
    0.005827277 = product of:
      0.011654554 = sum of:
        0.011654554 = product of:
          0.03496366 = sum of:
            0.03496366 = weight(_text_:j in 1053) [ClassicSimilarity], result of:
              0.03496366 = score(doc=1053,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.24574696 = fieldWeight in 1053, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1053)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  17. Dolamic, L.; Savoy, J.: When stopword lists make the difference (2009) 0.01
    0.005827277 = product of:
      0.011654554 = sum of:
        0.011654554 = product of:
          0.03496366 = sum of:
            0.03496366 = weight(_text_:j in 3319) [ClassicSimilarity], result of:
              0.03496366 = score(doc=3319,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.24574696 = fieldWeight in 3319, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3319)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  18. Savoy, J.: Ranking schemes in hybrid Boolean systems : a new approach (1997) 0.00
    0.0049948087 = product of:
      0.009989617 = sum of:
        0.009989617 = product of:
          0.029968852 = sum of:
            0.029968852 = weight(_text_:j in 393) [ClassicSimilarity], result of:
              0.029968852 = score(doc=393,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.21064025 = fieldWeight in 393, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.046875 = fieldNorm(doc=393)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  19. Savoy, J.: Searching strategies for the Hungarian language (2008) 0.00
    0.0049948087 = product of:
      0.009989617 = sum of:
        0.009989617 = product of:
          0.029968852 = sum of:
            0.029968852 = weight(_text_:j in 2037) [ClassicSimilarity], result of:
              0.029968852 = score(doc=2037,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.21064025 = fieldWeight in 2037, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2037)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
  20. Abdou, S.; Savoy, J.: Searching in Medline : query expansion and manual indexing evaluation (2008) 0.00
    0.0049948087 = product of:
      0.009989617 = sum of:
        0.009989617 = product of:
          0.029968852 = sum of:
            0.029968852 = weight(_text_:j in 2062) [ClassicSimilarity], result of:
              0.029968852 = score(doc=2062,freq=2.0), product of:
                0.14227505 = queryWeight, product of:
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.044775832 = queryNorm
                0.21064025 = fieldWeight in 2062, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1774964 = idf(docFreq=5010, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2062)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
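
The score trees above all follow the same arithmetic, so a short sketch may help in reading them. It assumes the default Lucene ClassicSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the tf and idf values printed in the listing. The function names are hypothetical, and queryNorm and fieldNorm are simply copied from the explain output rather than derived.

```python
import math

# Collection-wide constants copied from the explain output above.
MAX_DOCS = 44218
QUERY_NORM = 0.044775832


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assuming ClassicSimilarity's default."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                          # tf(freq)
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm          # queryWeight
    field_weight = tf * term_idf * field_norm     # fieldWeight
    return query_weight * field_weight


def document_score(term_scores, inner_coord, outer_coord):
    """Sum the term scores, then apply both coord factors."""
    return sum(term_scores) * inner_coord * outer_coord


# Result 1 (doc 3301): terms "j" and "n" match, coord(2/3) then coord(1/2).
result_1 = document_score(
    [
        term_score(freq=2.0, doc_freq=5010, field_norm=0.0390625),
        term_score(freq=4.0, doc_freq=1611, field_norm=0.0390625),
    ],
    inner_coord=2 / 3,
    outer_coord=1 / 2,
)

# Result 4 (doc 775): only "j" matches, coord(1/3) then coord(1/2).
result_4 = document_score(
    [term_score(freq=4.0, doc_freq=5010, field_norm=0.109375)],
    inner_coord=1 / 3,
    outer_coord=1 / 2,
)

print(f"result 1: {result_1:.6f}")
print(f"result 4: {result_4:.6f}")
```

Comparing the two calls shows why result 1 ranks first even though its fieldNorm is smaller: two of the three query terms match, so it is scaled by coord(2/3) instead of coord(1/3). The listing's values come from single-precision arithmetic, so the last digits of a double-precision recomputation may differ slightly.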