Search (2 results, page 1 of 1)

Bacchin, M.; Ferro, N.; Melucci, M.: ¬A probabilistic model for stemmer generation (2005) 0.02

0.024943165 = product of:
  0.06651511 = sum of:
    0.041327372 = weight(_text_:retrieval in 1001) [ClassicSimilarity], result of:
      0.041327372 = score(doc=1001,freq=4.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.33085006 = fieldWeight in 1001, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1001)
    0.017463053 = weight(_text_:of in 1001) [ClassicSimilarity], result of:
      0.017463053 = score(doc=1001,freq=10.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.2704316 = fieldWeight in 1001, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1001)
    0.007724685 = product of:
      0.01544937 = sum of:
        0.01544937 = weight(_text_:on in 1001) [ClassicSimilarity], result of:
          0.01544937 = score(doc=1001,freq=2.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.17010231 = fieldWeight in 1001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1001)
      0.5 = coord(1/2)
  0.375 = coord(3/8)

Abstract: In this paper we will present a language-independent probabilistic model which can automatically generate stemmers. Stemmers can improve the retrieval effectiveness of information retrieval systems, however the designing and the implementation of stemmers requires a laborious amount of effort due to the fact that documents and queries are often written or spoken in several different languages. The probabilistic model proposed in this paper aims at the development of stemmers used for several languages. The proposed model describes the mutual reinforcement relationship between stems and derivations and then provides a probabilistic interpretation. A series of experiments shows that the stemmers generated by the probabilistic model are as effective as the ones based on linguistic knowledge.

Melucci, M.; Orio, N.: Design, implementation, and evaluation of a methodology for automatic stemmer generation (2007) 0.01

0.006635946 = product of:
  0.026543785 = sum of:
    0.015619429 = weight(_text_:of in 268) [ClassicSimilarity], result of:
      0.015619429 = score(doc=268,freq=8.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.24188137 = fieldWeight in 268, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=268)
    0.010924355 = product of:
      0.02184871 = sum of:
        0.02184871 = weight(_text_:on in 268) [ClassicSimilarity], result of:
          0.02184871 = score(doc=268,freq=4.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.24056101 = fieldWeight in 268, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=268)
      0.5 = coord(1/2)
  0.25 = coord(2/8)

Abstract: The authors describe a statistical approach based on hidden Markov models (HMMs), for generating stemmers automatically. The proposed approach requires little effort to insert new languages in the system even if minimal linguistic knowledge is available. This is a key advantage especially for digital libraries, which are often developed for a specific institution or government because the program can manage a great amount of documents written in local languages. The evaluation described in the article shows that the stemmers implemented by means of HMMs are as effective as those based on linguistic rules.
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.5, S.673-686

Search (2 results, page 1 of 1)

Authors