Document (#32269)

Author
Melucci, M.
Orio, N.
Title
Design, implementation, and evaluation of a methodology for automatic stemmer generation
Source
Journal of the American Society for Information Science and Technology. 58(2007) no.5, pp.673-686
Year
2007
Abstract
The authors describe a statistical approach based on hidden Markov models (HMMs) for generating stemmers automatically. The proposed approach requires little effort to add new languages to the system, even if only minimal linguistic knowledge is available. This is a key advantage especially for digital libraries, which are often developed for a specific institution or government, because the program can manage a great amount of documents written in local languages. The evaluation described in the article shows that the stemmers implemented by means of HMMs are as effective as those based on linguistic rules.
Theme
Computational linguistics

Similar documents (author)

  1. Melucci, M.: Passage retrieval : a probabilistic technique (1998) 5.81
    5.81187 = sum of:
      5.81187 = weight(author_txt:melucci in 1150) [ClassicSimilarity], result of:
        5.81187 = fieldWeight in 1150, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.298992 = idf(docFreq=10, maxDocs=44218)
          0.625 = fieldNorm(doc=1150)
    
  2. Melucci, M.: Making digital libraries effective : automatic generation of links for similarity search across hyper-textbooks (2004) 5.81
    5.81187 = sum of:
      5.81187 = weight(author_txt:melucci in 2226) [ClassicSimilarity], result of:
        5.81187 = fieldWeight in 2226, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.298992 = idf(docFreq=10, maxDocs=44218)
          0.625 = fieldNorm(doc=2226)
    
  3. Melucci, M.: Contextual search : a computational framework (2012) 5.81
    5.81187 = sum of:
      5.81187 = weight(author_txt:melucci in 4913) [ClassicSimilarity], result of:
        5.81187 = fieldWeight in 4913, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.298992 = idf(docFreq=10, maxDocs=44218)
          0.625 = fieldNorm(doc=4913)
    
  4. Agosti, M.; Melucci, M.: Information retrieval techniques for the automatic construction of hypertext (2000) 4.65
    4.649496 = sum of:
      4.649496 = weight(author_txt:melucci in 4671) [ClassicSimilarity], result of:
        4.649496 = fieldWeight in 4671, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.298992 = idf(docFreq=10, maxDocs=44218)
          0.5 = fieldNorm(doc=4671)
    
  5. Melucci, M.; Orio, N.: Combining melody processing and information retrieval techniques : methodology, evaluation, and system implementation (2004) 4.65
    4.649496 = sum of:
      4.649496 = weight(author_txt:melucci in 3087) [ClassicSimilarity], result of:
        4.649496 = fieldWeight in 3087, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.298992 = idf(docFreq=10, maxDocs=44218)
          0.5 = fieldNorm(doc=3087)
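
Note on the author scores above: each one is a single fieldWeight, the product of tf, idf, and fieldNorm. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are hypothetical and chosen only for illustration. The inputs are copied from the first explanation in the list (freq=1.0, docFreq=10, maxDocs=44218, fieldNorm=0.625).

```python
import math


def tf(freq: float) -> float:
    # Term-frequency factor: square root of the raw term frequency.
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency (assumed ClassicSimilarity form).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explanation tree.
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Values taken from the explanation for doc=1150 (author_txt:melucci).
author_idf = idf(10, 44218)
author_score = field_weight(1.0, 10, 44218, 0.625)
print(author_idf, author_score)
```

With fieldNorm=0.5 instead of 0.625, the same function should give a value close to the 4.649496 listed for entries 4 and 5, which is why records with more author names rank lower here.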
    

Similar documents (content)

  1. Fox, B.; Fox, C.J.: Efficient stemmer generation (2002) 0.49
    0.49013522 = sum of:
      0.49013522 = product of:
        2.0422301 = sum of:
          0.06899438 = weight(abstract_txt:generation in 2585) [ClassicSimilarity], result of:
            0.06899438 = score(doc=2585,freq=1.0), product of:
              0.11212032 = queryWeight, product of:
                1.0143998 = boost
                5.6261497 = idf(docFreq=432, maxDocs=44218)
                0.019645538 = queryNorm
              0.61536014 = fieldWeight in 2585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6261497 = idf(docFreq=432, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          0.07148312 = weight(abstract_txt:program in 2585) [ClassicSimilarity], result of:
            0.07148312 = score(doc=2585,freq=1.0), product of:
              0.11480062 = queryWeight, product of:
                1.026453 = boost
                5.6930003 = idf(docFreq=404, maxDocs=44218)
                0.019645538 = queryNorm
              0.6226719 = fieldWeight in 2585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6930003 = idf(docFreq=404, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          0.10583583 = weight(abstract_txt:written in 2585) [ClassicSimilarity], result of:
            0.10583583 = score(doc=2585,freq=2.0), product of:
              0.1183642 = queryWeight, product of:
                1.0422626 = boost
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.019645538 = queryNorm
              0.8941541 = fieldWeight in 2585, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          0.121366434 = weight(abstract_txt:generating in 2585) [ClassicSimilarity], result of:
            0.121366434 = score(doc=2585,freq=1.0), product of:
              0.16338328 = queryWeight, product of:
                1.2245337 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.019645538 = queryNorm
              0.74283266 = fieldWeight in 2585, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          0.67721087 = weight(abstract_txt:stemmer in 2585) [ClassicSimilarity], result of:
            0.67721087 = score(doc=2585,freq=5.0), product of:
              0.30058536 = queryWeight, product of:
                1.6609282 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.019645538 = queryNorm
              2.2529736 = fieldWeight in 2585, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
          0.9973394 = weight(abstract_txt:stemmers in 2585) [ClassicSimilarity], result of:
            0.9973394 = score(doc=2585,freq=3.0), product of:
              0.58121943 = queryWeight, product of:
                3.2662694 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.019645538 = queryNorm
              1.715943 = fieldWeight in 2585, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.109375 = fieldNorm(doc=2585)
        0.24 = coord(6/25)
    
  2. Bacchin, M.; Ferro, N.; Melucci, M.: A probabilistic model for stemmer generation (2005) 0.44
    0.43854427 = sum of:
      0.43854427 = product of:
        1.3704509 = sum of:
          0.051872928 = weight(abstract_txt:effort in 1001) [ClassicSimilarity], result of:
            0.051872928 = score(doc=1001,freq=1.0), product of:
              0.11601686 = queryWeight, product of:
                1.031876 = boost
                5.723078 = idf(docFreq=392, maxDocs=44218)
                0.019645538 = queryNorm
              0.44711545 = fieldWeight in 1001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.723078 = idf(docFreq=392, maxDocs=44218)
                0.078125 = fieldNorm(doc=1001)
          0.053084657 = weight(abstract_txt:requires in 1001) [ClassicSimilarity], result of:
            0.053084657 = score(doc=1001,freq=1.0), product of:
              0.11781663 = queryWeight, product of:
                1.0398489 = boost
                5.767298 = idf(docFreq=375, maxDocs=44218)
                0.019645538 = queryNorm
              0.45057017 = fieldWeight in 1001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.767298 = idf(docFreq=375, maxDocs=44218)
                0.078125 = fieldNorm(doc=1001)
          0.05345517 = weight(abstract_txt:written in 1001) [ClassicSimilarity], result of:
            0.05345517 = score(doc=1001,freq=1.0), product of:
              0.1183642 = queryWeight, product of:
                1.0422626 = boost
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.019645538 = queryNorm
              0.45161602 = fieldWeight in 1001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.078125 = fieldNorm(doc=1001)
          0.05574503 = weight(abstract_txt:amount in 1001) [ClassicSimilarity], result of:
            0.05574503 = score(doc=1001,freq=1.0), product of:
              0.12172077 = queryWeight, product of:
                1.0569375 = boost
                5.8620763 = idf(docFreq=341, maxDocs=44218)
                0.019645538 = queryNorm
              0.4579747 = fieldWeight in 1001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8620763 = idf(docFreq=341, maxDocs=44218)
                0.078125 = fieldNorm(doc=1001)
          0.017931111 = weight(abstract_txt:based in 1001) [ClassicSimilarity], result of:
            0.017931111 = score(doc=1001,freq=1.0), product of:
              0.071996056 = queryWeight, product of:
                1.1495724 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.019645538 = queryNorm
              0.24905685 = fieldWeight in 1001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.078125 = fieldNorm(doc=1001)
          0.109301545 = weight(abstract_txt:languages in 1001) [ClassicSimilarity], result of:
            0.109301545 = score(doc=1001,freq=2.0), product of:
              0.19068277 = queryWeight, product of:
                1.8708445 = boost
                5.188118 = idf(docFreq=670, maxDocs=44218)
                0.019645538 = queryNorm
              0.57321143 = fieldWeight in 1001, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.188118 = idf(docFreq=670, maxDocs=44218)
                0.078125 = fieldNorm(doc=1001)
          0.109375 = weight(abstract_txt:linguistic in 1001) [ClassicSimilarity], result of:
            0.109375 = score(doc=1001,freq=1.0), product of:
              0.24035285 = queryWeight, product of:
                2.1004221 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.019645538 = queryNorm
              0.45506012 = fieldWeight in 1001, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.078125 = fieldNorm(doc=1001)
          0.9196854 = weight(abstract_txt:stemmers in 1001) [ClassicSimilarity], result of:
            0.9196854 = score(doc=1001,freq=5.0), product of:
              0.58121943 = queryWeight, product of:
                3.2662694 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.019645538 = queryNorm
              1.5823377 = fieldWeight in 1001, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.078125 = fieldNorm(doc=1001)
        0.32 = coord(8/25)
    
  3. Dunning, T.: Statistical identification of language (1994) 0.14
    0.13951543 = sum of:
      0.13951543 = product of:
        0.49826938 = sum of:
          0.047212638 = weight(abstract_txt:statistical in 3627) [ClassicSimilarity], result of:
            0.047212638 = score(doc=3627,freq=1.0), product of:
              0.10895975 = queryWeight, product of:
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.019645538 = queryNorm
              0.43330348 = fieldWeight in 3627, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.078125 = fieldNorm(doc=3627)
          0.11417222 = weight(abstract_txt:program in 3627) [ClassicSimilarity], result of:
            0.11417222 = score(doc=3627,freq=5.0), product of:
              0.11480062 = queryWeight, product of:
                1.026453 = boost
                5.6930003 = idf(docFreq=404, maxDocs=44218)
                0.019645538 = queryNorm
              0.9945262 = fieldWeight in 3627, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.6930003 = idf(docFreq=404, maxDocs=44218)
                0.078125 = fieldNorm(doc=3627)
          0.05345517 = weight(abstract_txt:written in 3627) [ClassicSimilarity], result of:
            0.05345517 = score(doc=3627,freq=1.0), product of:
              0.1183642 = queryWeight, product of:
                1.0422626 = boost
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.019645538 = queryNorm
              0.45161602 = fieldWeight in 3627, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.078125 = fieldNorm(doc=3627)
          0.07883539 = weight(abstract_txt:amount in 3627) [ClassicSimilarity], result of:
            0.07883539 = score(doc=3627,freq=2.0), product of:
              0.12172077 = queryWeight, product of:
                1.0569375 = boost
                5.8620763 = idf(docFreq=341, maxDocs=44218)
                0.019645538 = queryNorm
              0.6476741 = fieldWeight in 3627, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8620763 = idf(docFreq=341, maxDocs=44218)
                0.078125 = fieldNorm(doc=3627)
          0.017931111 = weight(abstract_txt:based in 3627) [ClassicSimilarity], result of:
            0.017931111 = score(doc=3627,freq=1.0), product of:
              0.071996056 = queryWeight, product of:
                1.1495724 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.019645538 = queryNorm
              0.24905685 = fieldWeight in 3627, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.078125 = fieldNorm(doc=3627)
          0.07728787 = weight(abstract_txt:languages in 3627) [ClassicSimilarity], result of:
            0.07728787 = score(doc=3627,freq=1.0), product of:
              0.19068277 = queryWeight, product of:
                1.8708445 = boost
                5.188118 = idf(docFreq=670, maxDocs=44218)
                0.019645538 = queryNorm
              0.40532172 = fieldWeight in 3627, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.188118 = idf(docFreq=670, maxDocs=44218)
                0.078125 = fieldNorm(doc=3627)
          0.109375 = weight(abstract_txt:linguistic in 3627) [ClassicSimilarity], result of:
            0.109375 = score(doc=3627,freq=1.0), product of:
              0.24035285 = queryWeight, product of:
                2.1004221 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.019645538 = queryNorm
              0.45506012 = fieldWeight in 3627, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.078125 = fieldNorm(doc=3627)
        0.28 = coord(7/25)
    
  4. Aldebei, K.; He, X.; Jia, W.; Yeh, W.: SUDMAD: Sequential and unsupervised decomposition of a multi-author document based on a hidden markov model (2018) 0.14
    0.138831 = sum of:
      0.138831 = product of:
        0.4338469 = sum of:
          0.03304885 = weight(abstract_txt:statistical in 4037) [ClassicSimilarity], result of:
            0.03304885 = score(doc=4037,freq=1.0), product of:
              0.10895975 = queryWeight, product of:
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.019645538 = queryNorm
              0.30331245 = fieldWeight in 4037, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5462847 = idf(docFreq=468, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4037)
          0.052917916 = weight(abstract_txt:written in 4037) [ClassicSimilarity], result of:
            0.052917916 = score(doc=4037,freq=2.0), product of:
              0.1183642 = queryWeight, product of:
                1.0422626 = boost
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.019645538 = queryNorm
              0.44707704 = fieldWeight in 4037, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.780685 = idf(docFreq=370, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4037)
          0.040174134 = weight(abstract_txt:great in 4037) [ClassicSimilarity], result of:
            0.040174134 = score(doc=4037,freq=1.0), product of:
              0.124106035 = queryWeight, product of:
                1.0672432 = boost
                5.9192348 = idf(docFreq=322, maxDocs=44218)
                0.019645538 = queryNorm
              0.32370815 = fieldWeight in 4037, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9192348 = idf(docFreq=322, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4037)
          0.012551778 = weight(abstract_txt:based in 4037) [ClassicSimilarity], result of:
            0.012551778 = score(doc=4037,freq=1.0), product of:
              0.071996056 = queryWeight, product of:
                1.1495724 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.019645538 = queryNorm
              0.1743398 = fieldWeight in 4037, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4037)
          0.096889295 = weight(abstract_txt:hidden in 4037) [ClassicSimilarity], result of:
            0.096889295 = score(doc=4037,freq=2.0), product of:
              0.17714779 = queryWeight, product of:
                1.2750723 = boost
                7.071914 = idf(docFreq=101, maxDocs=44218)
                0.019645538 = queryNorm
              0.54694045 = fieldWeight in 4037, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.071914 = idf(docFreq=101, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4037)
          0.040707964 = weight(abstract_txt:approach in 4037) [ClassicSimilarity], result of:
            0.040707964 = score(doc=4037,freq=4.0), product of:
              0.0993737 = queryWeight, product of:
                1.3505719 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.019645538 = queryNorm
              0.40964526 = fieldWeight in 4037, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4037)
          0.10345545 = weight(abstract_txt:markov in 4037) [ClassicSimilarity], result of:
            0.10345545 = score(doc=4037,freq=1.0), product of:
              0.23316541 = queryWeight, product of:
                1.4628474 = boost
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.019645538 = queryNorm
              0.4436998 = fieldWeight in 4037, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.113368 = idf(docFreq=35, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4037)
          0.054101508 = weight(abstract_txt:languages in 4037) [ClassicSimilarity], result of:
            0.054101508 = score(doc=4037,freq=1.0), product of:
              0.19068277 = queryWeight, product of:
                1.8708445 = boost
                5.188118 = idf(docFreq=670, maxDocs=44218)
                0.019645538 = queryNorm
              0.2837252 = fieldWeight in 4037, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.188118 = idf(docFreq=670, maxDocs=44218)
                0.0546875 = fieldNorm(doc=4037)
        0.32 = coord(8/25)
    
  5. Fautsch, C.; Savoy, J.: Algorithmic stemmers or morphological analysis? : an evaluation (2009) 0.14
    0.13854158 = sum of:
      0.13854158 = product of:
        0.8658849 = sum of:
          0.017931111 = weight(abstract_txt:based in 2950) [ClassicSimilarity], result of:
            0.017931111 = score(doc=2950,freq=1.0), product of:
              0.071996056 = queryWeight, product of:
                1.1495724 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.019645538 = queryNorm
              0.24905685 = fieldWeight in 2950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
          0.049966574 = weight(abstract_txt:evaluation in 2950) [ClassicSimilarity], result of:
            0.049966574 = score(doc=2950,freq=1.0), product of:
              0.14256851 = queryWeight, product of:
                1.6176842 = boost
                4.4860687 = idf(docFreq=1353, maxDocs=44218)
                0.019645538 = queryNorm
              0.35047412 = fieldWeight in 2950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4860687 = idf(docFreq=1353, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
          0.21632709 = weight(abstract_txt:stemmer in 2950) [ClassicSimilarity], result of:
            0.21632709 = score(doc=2950,freq=1.0), product of:
              0.30058536 = queryWeight, product of:
                1.6609282 = boost
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.019645538 = queryNorm
              0.71968603 = fieldWeight in 2950, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.211981 = idf(docFreq=11, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
          0.58166015 = weight(abstract_txt:stemmers in 2950) [ClassicSimilarity], result of:
            0.58166015 = score(doc=2950,freq=2.0), product of:
              0.58121943 = queryWeight, product of:
                3.2662694 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.019645538 = queryNorm
              1.0007583 = fieldWeight in 2950, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.078125 = fieldNorm(doc=2950)
        0.16 = coord(4/25)
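
Note on the content scores above: each matching term contributes queryWeight * fieldWeight, the contributions are summed, and the sum is multiplied by coord (matching terms / query terms). The following is a minimal sketch, assuming the same tf and idf definitions as ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function and variable names are hypothetical. The boost, queryNorm, fieldNorm, and frequency values are copied from the explanation for entry 5 (doc=2950).

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.019645538   # copied from the explanation output
FIELD_NORM = 0.078125      # fieldNorm(doc=2950)


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # Inverse document frequency (assumed ClassicSimilarity form).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float) -> float:
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight


# (term, boost, docFreq, freq) for the four terms matched in doc=2950.
matched_terms = [
    ("based", 1.1495724, 4958, 1.0),
    ("evaluation", 1.6176842, 1353, 1.0),
    ("stemmer", 1.6609282, 11, 1.0),
    ("stemmers", 3.2662694, 13, 2.0),
]

term_sum = sum(term_score(boost, df, freq) for _, boost, df, freq in matched_terms)
coord = len(matched_terms) / 25   # 4 of 25 query terms matched
doc_score = term_sum * coord
print(term_sum, doc_score)
```

The result should land close to the listed 0.13854158; small differences in the last digits are expected because the listing was produced with single-precision arithmetic.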