Document (#33003)

Author
Bacchin, M.
Ferro, N.
Melucci, M.
Title
A probabilistic model for stemmer generation
Source
Information processing and management. 41(2005) no.1, pp.121-137
Year
2005
Abstract
In this paper we present a language-independent probabilistic model that can automatically generate stemmers. Stemmers can improve the retrieval effectiveness of information retrieval systems; however, designing and implementing stemmers requires a laborious amount of effort, because documents and queries are often written or spoken in several different languages. The probabilistic model proposed in this paper aims at developing stemmers for several languages. The model describes the mutual reinforcement relationship between stems and derivations and then provides a probabilistic interpretation. A series of experiments shows that the stemmers generated by the probabilistic model are as effective as those based on linguistic knowledge.
Theme
Computational linguistics
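The abstract names the idea of mutual reinforcement between stems and derivations without spelling it out. To make the general idea concrete, the sketch below runs a generic HITS-style iteration over a bipartite graph that links every candidate prefix of a word to its remaining suffix. It is not the probabilistic model the paper proposes; the cut rule, the `min_stem` length, the iteration count, the toy vocabulary and all function names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def splits(word, min_stem=2):
    """Every (prefix, suffix) cut of `word` whose prefix has >= min_stem letters.
    The empty suffix is included, so a word may be its own stem (assumption)."""
    return [(word[:i], word[i:]) for i in range(min_stem, len(word) + 1)]


def reinforce(words, iterations=20):
    """HITS-style mutual reinforcement on the prefix/suffix bipartite graph.

    A prefix scores well when it combines with well-scoring suffixes, and a
    suffix scores well when it follows well-scoring prefixes."""
    edges = sorted({cut for w in set(words) for cut in splits(w)})
    prefix = {p: 1.0 for p, _ in edges}
    suffix = {s: 1.0 for _, s in edges}
    for _ in range(iterations):
        new_prefix = defaultdict(float)
        for p, s in edges:
            new_prefix[p] += suffix[s]
        new_suffix = defaultdict(float)
        for p, s in edges:
            new_suffix[s] += new_prefix[p]
        # L2-normalise each side so the scores stay bounded between rounds.
        p_norm = math.sqrt(sum(v * v for v in new_prefix.values())) or 1.0
        s_norm = math.sqrt(sum(v * v for v in new_suffix.values())) or 1.0
        prefix = {k: v / p_norm for k, v in new_prefix.items()}
        suffix = {k: v / s_norm for k, v in new_suffix.items()}
    return prefix, suffix


def stem(word, prefix, suffix, min_stem=2):
    """Return the prefix of the cut with the highest prefix * suffix score.
    Prefixes or suffixes never seen during training score 0."""
    cuts = splits(word, min_stem)
    if not cuts:
        return word
    best = max(cuts, key=lambda c: prefix.get(c[0], 0.0) * suffix.get(c[1], 0.0))
    return best[0]


# Hypothetical toy vocabulary, not data from the paper.
vocabulary = ["connect", "connected", "connecting", "connection",
              "walk", "walked", "walking", "walks"]
prefix_scores, suffix_scores = reinforce(vocabulary)
print(stem("connecting", prefix_scores, suffix_scores))
```

The chosen cut depends heavily on the vocabulary and is not guaranteed to match a linguistic stem on a list this small.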

Similar documents (author)

  1. Melucci, M.: Passage retrieval : a probabilistic technique (1998) 1.95
    1.9476473 = sum of:
      1.9476473 = product of:
        3.8952947 = sum of:
          3.8952947 = weight(author_txt:melucci in 2151) [ClassicSimilarity], result of:
            3.8952947 = score(doc=2151,freq=1.0), product of:
              0.67269015 = queryWeight, product of:
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.07260556 = queryNorm
              5.790622 = fieldWeight in 2151, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.625 = fieldNorm(doc=2151)
        0.5 = coord(1/2)
    
  2. Melucci, M.: Making digital libraries effective : automatic generation of links for similarity search across hyper-textbooks (2004) 1.95
    1.9476473 = sum of:
      1.9476473 = product of:
        3.8952947 = sum of:
          3.8952947 = weight(author_txt:melucci in 3227) [ClassicSimilarity], result of:
            3.8952947 = score(doc=3227,freq=1.0), product of:
              0.67269015 = queryWeight, product of:
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.07260556 = queryNorm
              5.790622 = fieldWeight in 3227, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.625 = fieldNorm(doc=3227)
        0.5 = coord(1/2)
    
  3. Melucci, M.: Contextual search : a computational framework (2012) 1.95
    1.9476473 = sum of:
      1.9476473 = product of:
        3.8952947 = sum of:
          3.8952947 = weight(author_txt:melucci in 1914) [ClassicSimilarity], result of:
            3.8952947 = score(doc=1914,freq=1.0), product of:
              0.67269015 = queryWeight, product of:
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.07260556 = queryNorm
              5.790622 = fieldWeight in 1914, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.625 = fieldNorm(doc=1914)
        0.5 = coord(1/2)
    
  4. Ferro, N.; Silvello, G.: NESTOR: a formal model for digital archives (2013) 1.80
    1.7974573 = sum of:
      1.7974573 = product of:
        3.5949147 = sum of:
          3.5949147 = weight(author_txt:ferro in 4708) [ClassicSimilarity], result of:
            3.5949147 = score(doc=4708,freq=1.0), product of:
              0.73992425 = queryWeight, product of:
                1.0487841 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.07260556 = queryNorm
              4.85849 = fieldWeight in 4708, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.5 = fieldNorm(doc=4708)
        0.5 = coord(1/2)
    
  5. Ferro, N.; Silvello, G.: Toward an anatomy of IR system component performances (2018) 1.80
    1.7974573 = sum of:
      1.7974573 = product of:
        3.5949147 = sum of:
          3.5949147 = weight(author_txt:ferro in 36) [ClassicSimilarity], result of:
            3.5949147 = score(doc=36,freq=1.0), product of:
              0.73992425 = queryWeight, product of:
                1.0487841 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.07260556 = queryNorm
              4.85849 = fieldWeight in 36, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.5 = fieldNorm(doc=36)
        0.5 = coord(1/2)
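The explain trees above follow Lucene's ClassicSimilarity (TF-IDF) scoring. As a minimal sketch, assuming the ClassicSimilarity definitions that these numbers bear out (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), coord = matched clauses / total clauses), the scores for doc 2151 and doc 4708 can be recomputed from the values printed in their trees. The helper names are illustrative, not Lucene API; queryNorm, fieldNorm and boost are copied from the trees rather than derived, and Lucene's single-precision arithmetic differs in the last digits.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1)), the form these trees use."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """tf = sqrt(termFreq)."""
    return math.sqrt(freq)


def term_weight(freq: float, doc_freq: int, max_docs: int,
                query_norm: float, field_norm: float, boost: float = 1.0) -> float:
    """weight(term in doc) = queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm              # "queryWeight, product of"
    field_weight = classic_tf(freq) * idf * field_norm   # "fieldWeight, product of"
    return query_weight * field_weight


def coord(matched: int, total: int) -> float:
    """Fraction of the query's clauses that the document matched."""
    return matched / total


# Doc 2151: author_txt:melucci, 1 of 2 clauses matched, no boost line.
w_2151 = term_weight(1.0, 10, 42740, query_norm=0.07260556, field_norm=0.625)
# Doc 4708: author_txt:ferro, with the 1.0487841 boost shown in its tree.
w_4708 = term_weight(1.0, 6, 42740, query_norm=0.07260556, field_norm=0.5,
                     boost=1.0487841)

print(round(w_2151 * coord(1, 2), 3))  # 1.948
print(round(w_4708 * coord(1, 2), 3))  # 1.797
```

Because every author match here hits exactly one of two clauses, coord(1/2) halves each weight, which is why the listed totals (1.95 and 1.80) are half of the single-term weights.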
    

Similar documents (content)

  1. Melucci, M.; Orio, N.: Design, implementation, and evaluation of a methodology for automatic stemmer generation (2007) 0.33
    0.326718 = sum of:
      0.326718 = product of:
        1.16685 = sum of:
          0.033705417 = weight(abstract_txt:requires in 2269) [ClassicSimilarity], result of:
            0.033705417 = score(doc=2269,freq=1.0), product of:
              0.062265962 = queryWeight, product of:
                5.7740126 = idf(docFreq=360, maxDocs=42740)
                0.010783829 = queryNorm
              0.54131365 = fieldWeight in 2269, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7740126 = idf(docFreq=360, maxDocs=42740)
                0.09375 = fieldNorm(doc=2269)
          0.03385177 = weight(abstract_txt:written in 2269) [ClassicSimilarity], result of:
            0.03385177 = score(doc=2269,freq=1.0), product of:
              0.062446076 = queryWeight, product of:
                1.0014453 = boost
                5.7823577 = idf(docFreq=357, maxDocs=42740)
                0.010783829 = queryNorm
              0.542096 = fieldWeight in 2269, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7823577 = idf(docFreq=357, maxDocs=42740)
                0.09375 = fieldNorm(doc=2269)
          0.04969426 = weight(abstract_txt:linguistic in 2269) [ClassicSimilarity], result of:
            0.04969426 = score(doc=2269,freq=2.0), product of:
              0.06401942 = queryWeight, product of:
                1.0139827 = boost
                5.8547482 = idf(docFreq=332, maxDocs=42740)
                0.010783829 = queryNorm
              0.77623725 = fieldWeight in 2269, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8547482 = idf(docFreq=332, maxDocs=42740)
                0.09375 = fieldNorm(doc=2269)
          0.03552306 = weight(abstract_txt:amount in 2269) [ClassicSimilarity], result of:
            0.03552306 = score(doc=2269,freq=1.0), product of:
              0.06448487 = queryWeight, product of:
                1.017662 = boost
                5.8759933 = idf(docFreq=325, maxDocs=42740)
                0.010783829 = queryNorm
              0.55087435 = fieldWeight in 2269, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8759933 = idf(docFreq=325, maxDocs=42740)
                0.09375 = fieldNorm(doc=2269)
          0.035003148 = weight(abstract_txt:proposed in 2269) [ClassicSimilarity], result of:
            0.035003148 = score(doc=2269,freq=1.0), product of:
              0.08045116 = queryWeight, product of:
                1.6075178 = boost
                4.640914 = idf(docFreq=1120, maxDocs=42740)
                0.010783829 = queryNorm
              0.43508568 = fieldWeight in 2269, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.640914 = idf(docFreq=1120, maxDocs=42740)
                0.09375 = fieldNorm(doc=2269)
          0.06931678 = weight(abstract_txt:languages in 2269) [ClassicSimilarity], result of:
            0.06931678 = score(doc=2269,freq=2.0), product of:
              0.100695446 = queryWeight, product of:
                1.7984343 = boost
                5.192091 = idf(docFreq=645, maxDocs=42740)
                0.010783829 = queryNorm
              0.6883805 = fieldWeight in 2269, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.192091 = idf(docFreq=645, maxDocs=42740)
                0.09375 = fieldNorm(doc=2269)
          0.9097555 = weight(abstract_txt:stemmers in 2269) [ClassicSimilarity], result of:
            0.9097555 = score(doc=2269,freq=2.0), product of:
              0.76040924 = queryWeight, product of:
                7.814179 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.010783829 = queryNorm
              1.1964025 = fieldWeight in 2269, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.09375 = fieldNorm(doc=2269)
        0.28 = coord(7/25)
    
  2. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.28
    0.2848034 = sum of:
      0.2848034 = product of:
        1.0171549 = sum of:
          0.023072721 = weight(abstract_txt:independent in 302) [ClassicSimilarity], result of:
            0.023072721 = score(doc=302,freq=1.0), product of:
              0.063373975 = queryWeight, product of:
                1.0088582 = boost
                5.82516 = idf(docFreq=342, maxDocs=42740)
                0.010783829 = queryNorm
              0.3640725 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.82516 = idf(docFreq=342, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.013732679 = weight(abstract_txt:retrieval in 302) [ClassicSimilarity], result of:
            0.013732679 = score(doc=302,freq=2.0), product of:
              0.044841684 = queryWeight, product of:
                1.2001364 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.010783829 = queryNorm
              0.30624807 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.009988229 = weight(abstract_txt:paper in 302) [ClassicSimilarity], result of:
            0.009988229 = score(doc=302,freq=1.0), product of:
              0.04569276 = queryWeight, product of:
                1.2114719 = boost
                3.497527 = idf(docFreq=3516, maxDocs=42740)
                0.010783829 = queryNorm
              0.21859543 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.497527 = idf(docFreq=3516, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.12762396 = weight(abstract_txt:stemmer in 302) [ClassicSimilarity], result of:
            0.12762396 = score(doc=302,freq=2.0), product of:
              0.15732215 = queryWeight, product of:
                1.5895331 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.010783829 = queryNorm
              0.81122684 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.053713 = weight(abstract_txt:model in 302) [ClassicSimilarity], result of:
            0.053713 = score(doc=302,freq=2.0), product of:
              0.15108153 = queryWeight, product of:
                3.4830952 = boost
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.010783829 = queryNorm
              0.3555233 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.18252058 = weight(abstract_txt:probabilistic in 302) [ClassicSimilarity], result of:
            0.18252058 = score(doc=302,freq=1.0), product of:
              0.43023887 = queryWeight, product of:
                5.8777957 = boost
                6.787693 = idf(docFreq=130, maxDocs=42740)
                0.010783829 = queryNorm
              0.4242308 = fieldWeight in 302, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.787693 = idf(docFreq=130, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
          0.6065037 = weight(abstract_txt:stemmers in 302) [ClassicSimilarity], result of:
            0.6065037 = score(doc=302,freq=2.0), product of:
              0.76040924 = queryWeight, product of:
                7.814179 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.010783829 = queryNorm
              0.7976017 = fieldWeight in 302, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.0625 = fieldNorm(doc=302)
        0.28 = coord(7/25)
    
  3. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.28
    0.27622208 = sum of:
      0.27622208 = product of:
        1.726388 = sum of:
          0.055852566 = weight(abstract_txt:written in 3586) [ClassicSimilarity], result of:
            0.055852566 = score(doc=3586,freq=2.0), product of:
              0.062446076 = queryWeight, product of:
                1.0014453 = boost
                5.7823577 = idf(docFreq=357, maxDocs=42740)
                0.010783829 = queryNorm
              0.89441276 = fieldWeight in 3586, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7823577 = idf(docFreq=357, maxDocs=42740)
                0.109375 = fieldNorm(doc=3586)
          0.017479401 = weight(abstract_txt:paper in 3586) [ClassicSimilarity], result of:
            0.017479401 = score(doc=3586,freq=1.0), product of:
              0.04569276 = queryWeight, product of:
                1.2114719 = boost
                3.497527 = idf(docFreq=3516, maxDocs=42740)
                0.010783829 = queryNorm
              0.382542 = fieldWeight in 3586, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.497527 = idf(docFreq=3516, maxDocs=42740)
                0.109375 = fieldNorm(doc=3586)
          0.35313457 = weight(abstract_txt:stemmer in 3586) [ClassicSimilarity], result of:
            0.35313457 = score(doc=3586,freq=5.0), product of:
              0.15732215 = queryWeight, product of:
                1.5895331 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.010783829 = queryNorm
              2.244659 = fieldWeight in 3586, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.109375 = fieldNorm(doc=3586)
          1.2999215 = weight(abstract_txt:stemmers in 3586) [ClassicSimilarity], result of:
            1.2999215 = score(doc=3586,freq=3.0), product of:
              0.76040924 = queryWeight, product of:
                7.814179 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.010783829 = queryNorm
              1.7095026 = fieldWeight in 3586, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.109375 = fieldNorm(doc=3586)
        0.16 = coord(4/25)
    
  4. Xu, J.; Weischedel, R.: Empirical studies on the impact of lexical resources on CLIR performance (2005) 0.21
    0.20735423 = sum of:
      0.20735423 = product of:
        0.863976 = sum of:
          0.012138088 = weight(abstract_txt:retrieval in 3021) [ClassicSimilarity], result of:
            0.012138088 = score(doc=3021,freq=1.0), product of:
              0.044841684 = queryWeight, product of:
                1.2001364 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.010783829 = queryNorm
              0.2706876 = fieldWeight in 3021, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.078125 = fieldNorm(doc=3021)
          0.012485286 = weight(abstract_txt:paper in 3021) [ClassicSimilarity], result of:
            0.012485286 = score(doc=3021,freq=1.0), product of:
              0.04569276 = queryWeight, product of:
                1.2114719 = boost
                3.497527 = idf(docFreq=3516, maxDocs=42740)
                0.010783829 = queryNorm
              0.2732443 = fieldWeight in 3021, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.497527 = idf(docFreq=3516, maxDocs=42740)
                0.078125 = fieldNorm(doc=3021)
          0.027647227 = weight(abstract_txt:several in 3021) [ClassicSimilarity], result of:
            0.027647227 = score(doc=3021,freq=1.0), product of:
              0.07762759 = queryWeight, product of:
                1.5790566 = boost
                4.5587463 = idf(docFreq=1216, maxDocs=42740)
                0.010783829 = queryNorm
              0.35615206 = fieldWeight in 3021, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5587463 = idf(docFreq=1216, maxDocs=42740)
                0.078125 = fieldNorm(doc=3021)
          0.04747604 = weight(abstract_txt:model in 3021) [ClassicSimilarity], result of:
            0.04747604 = score(doc=3021,freq=1.0), product of:
              0.15108153 = queryWeight, product of:
                3.4830952 = boost
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.010783829 = queryNorm
              0.31424117 = fieldWeight in 3021, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.022287 = idf(docFreq=2080, maxDocs=42740)
                0.078125 = fieldNorm(doc=3021)
          0.22815074 = weight(abstract_txt:probabilistic in 3021) [ClassicSimilarity], result of:
            0.22815074 = score(doc=3021,freq=1.0), product of:
              0.43023887 = queryWeight, product of:
                5.8777957 = boost
                6.787693 = idf(docFreq=130, maxDocs=42740)
                0.010783829 = queryNorm
              0.5302885 = fieldWeight in 3021, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.787693 = idf(docFreq=130, maxDocs=42740)
                0.078125 = fieldNorm(doc=3021)
          0.53607863 = weight(abstract_txt:stemmers in 3021) [ClassicSimilarity], result of:
            0.53607863 = score(doc=3021,freq=1.0), product of:
              0.76040924 = queryWeight, product of:
                7.814179 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.010783829 = queryNorm
              0.704987 = fieldWeight in 3021, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.078125 = fieldNorm(doc=3021)
        0.24 = coord(6/25)
    
  5. Flores, F.N.; Moreira, V.P.: Assessing the impact of stemming accuracy on information retrieval : a multilingual perspective (2016) 0.17
    0.16550218 = sum of:
      0.16550218 = product of:
        1.0343887 = sum of:
          0.0378864 = weight(abstract_txt:ones in 5188) [ClassicSimilarity], result of:
            0.0378864 = score(doc=5188,freq=1.0), product of:
              0.076014064 = queryWeight, product of:
                1.1048965 = boost
                6.379687 = idf(docFreq=196, maxDocs=42740)
                0.010783829 = queryNorm
              0.49841303 = fieldWeight in 5188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.379687 = idf(docFreq=196, maxDocs=42740)
                0.078125 = fieldNorm(doc=5188)
          0.02714159 = weight(abstract_txt:retrieval in 5188) [ClassicSimilarity], result of:
            0.02714159 = score(doc=5188,freq=5.0), product of:
              0.044841684 = queryWeight, product of:
                1.2001364 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.010783829 = queryNorm
              0.60527587 = fieldWeight in 5188, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.078125 = fieldNorm(doc=5188)
          0.040845305 = weight(abstract_txt:languages in 5188) [ClassicSimilarity], result of:
            0.040845305 = score(doc=5188,freq=1.0), product of:
              0.100695446 = queryWeight, product of:
                1.7984343 = boost
                5.192091 = idf(docFreq=645, maxDocs=42740)
                0.010783829 = queryNorm
              0.4056321 = fieldWeight in 5188, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.192091 = idf(docFreq=645, maxDocs=42740)
                0.078125 = fieldNorm(doc=5188)
          0.9285154 = weight(abstract_txt:stemmers in 5188) [ClassicSimilarity], result of:
            0.9285154 = score(doc=5188,freq=3.0), product of:
              0.76040924 = queryWeight, product of:
                7.814179 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.010783829 = queryNorm
              1.2210733 = fieldWeight in 5188, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.078125 = fieldNorm(doc=5188)
        0.16 = coord(4/25)
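Each content-based tree sums one ClassicSimilarity weight per matched query term and then multiplies by coord, the share of the 25 query clauses that matched. A minimal sketch, assuming the same idf and tf definitions as in the author trees, recomputes the total for doc 2269 from the freq, docFreq and boost values listed in its tree. The function and constant names are illustrative only; queryNorm, fieldNorm and the clause count are copied from the tree.

```python
import math

MAX_DOCS = 42740          # maxDocs from the explain trees
QUERY_NORM = 0.010783829  # queryNorm shared by every term of this query
QUERY_CLAUSES = 25        # denominator of coord(7/25)


def term_weight(freq, doc_freq, boost, field_norm):
    """One ClassicSimilarity term weight: queryWeight * fieldWeight."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def document_score(matches, field_norm):
    """Sum the matched term weights, then scale by coord(matched / total)."""
    total = sum(term_weight(f, df, b, field_norm) for _, f, df, b in matches)
    return total * len(matches) / QUERY_CLAUSES


# (term, freq, docFreq, boost) for doc 2269, copied from its explain tree;
# "requires" shows no boost line, so its boost is taken as 1.0.
doc_2269 = [
    ("requires", 1.0, 360, 1.0),
    ("written", 1.0, 357, 1.0014453),
    ("linguistic", 2.0, 332, 1.0139827),
    ("amount", 1.0, 325, 1.017662),
    ("proposed", 1.0, 1120, 1.6075178),
    ("languages", 2.0, 645, 1.7984343),
    ("stemmers", 2.0, 13, 7.814179),
]

print(round(document_score(doc_2269, field_norm=0.09375), 4))  # 0.3267
```

In every tree of this list the rare, heavily boosted term stemmers (docFreq=13, boost 7.814179) contributes the largest single weight; for doc 2269 it accounts for 0.9098 of the 1.1669 sum.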