Document (#35458)

Author
Al-Shawakfa, E.
Al-Badarneh, A.
Shatnawi, S.
Al-Rabab'ah, K.
Bani-Ismail, B.
Title
A comparison study of some Arabic root finding algorithms
Source
Journal of the American Society for Information Science and Technology. 61(2010) no.5, pp.1015-1024
Year
2010
Abstract
Arabic has a complex structure, which makes it difficult to apply natural language processing (NLP). Much research on Arabic NLP (ANLP) exists; however, it is not as mature as that on other languages. Finding Arabic roots is an important step toward conducting effective research on most ANLP applications. The authors have studied and compared six root-finding algorithms with success rates of over 90%. The algorithms in this study were not originally evaluated with the same testing corpus and/or benchmarking measures. The authors unified the testing process by implementing the algorithms from their descriptions and by building a corpus from 3823 triliteral roots, applying 73 triliteral patterns and 18 affixes, which produced around 27.6 million words. They tested the algorithms on the generated corpus and obtained interesting results; they offer to share the corpus freely for benchmarking and ANLP research.
Theme
Computational linguistics

Similar documents (content)

  1. Hmeidi, I.I.; Al-Shalabi, R.F.; Al-Taani, A.T.; Najadat, H.; Al-Hazaimeh, S.A.: A novel approach to the extraction of roots from Arabic words using bigrams (2010) 0.40
    0.39641252 = sum of:
      0.39641252 = product of:
        1.2387892 = sum of:
          0.009458879 = weight(abstract_txt:have in 3426) [ClassicSimilarity], result of:
            0.009458879 = score(doc=3426,freq=1.0), product of:
              0.04722648 = queryWeight, product of:
                1.0528612 = boost
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.013997176 = queryNorm
              0.20028761 = fieldWeight in 3426, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.0625 = fieldNorm(doc=3426)
          0.016313981 = weight(abstract_txt:study in 3426) [ClassicSimilarity], result of:
            0.016313981 = score(doc=3426,freq=2.0), product of:
              0.05390832 = queryWeight, product of:
                1.1248801 = boost
                3.423806 = idf(docFreq=3916, maxDocs=44218)
                0.013997176 = queryNorm
              0.30262455 = fieldWeight in 3426, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.423806 = idf(docFreq=3916, maxDocs=44218)
                0.0625 = fieldNorm(doc=3426)
          0.0116640795 = weight(abstract_txt:with in 3426) [ClassicSimilarity], result of:
            0.0116640795 = score(doc=3426,freq=3.0), product of:
              0.04310386 = queryWeight, product of:
                1.2319187 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.013997176 = queryNorm
              0.27060407 = fieldWeight in 3426, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=3426)
          0.21732962 = weight(abstract_txt:roots in 3426) [ClassicSimilarity], result of:
            0.21732962 = score(doc=3426,freq=3.0), product of:
              0.2646456 = queryWeight, product of:
                2.4923594 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.013997176 = queryNorm
              0.82121 = fieldWeight in 3426, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.0625 = fieldNorm(doc=3426)
          0.14492011 = weight(abstract_txt:root in 3426) [ClassicSimilarity], result of:
            0.14492011 = score(doc=3426,freq=1.0), product of:
              0.29132533 = queryWeight, product of:
                2.6149745 = boost
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.013997176 = queryNorm
              0.4974511 = fieldWeight in 3426, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.0625 = fieldNorm(doc=3426)
          0.15118107 = weight(abstract_txt:algorithms in 3426) [ClassicSimilarity], result of:
            0.15118107 = score(doc=3426,freq=2.0), product of:
              0.29965678 = queryWeight, product of:
                3.75064 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.013997176 = queryNorm
              0.5045141 = fieldWeight in 3426, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0625 = fieldNorm(doc=3426)
          0.13037945 = weight(abstract_txt:corpus in 3426) [ClassicSimilarity], result of:
            0.13037945 = score(doc=3426,freq=1.0), product of:
              0.34206495 = queryWeight, product of:
                4.007261 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.013997176 = queryNorm
              0.3811541 = fieldWeight in 3426, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0625 = fieldNorm(doc=3426)
          0.557542 = weight(abstract_txt:arabic in 3426) [ClassicSimilarity], result of:
            0.557542 = score(doc=3426,freq=5.0), product of:
              0.5270246 = queryWeight, product of:
                4.9740343 = boost
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.013997176 = queryNorm
              1.0579051 = fieldWeight in 3426, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.0625 = fieldNorm(doc=3426)
        0.32 = coord(8/25)
    
  2. Kanaan, G.; Al-Shalabi, R.; Ghwanmeh, S.; Al-Ma'adeed, H.: A comparison of text-classification techniques applied to Arabic text (2009) 0.26
    0.260397 = sum of:
      0.260397 = product of:
        1.3019849 = sum of:
          0.014188319 = weight(abstract_txt:have in 3096) [ClassicSimilarity], result of:
            0.014188319 = score(doc=3096,freq=1.0), product of:
              0.04722648 = queryWeight, product of:
                1.0528612 = boost
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.013997176 = queryNorm
              0.30043143 = fieldWeight in 3096, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.09375 = fieldNorm(doc=3096)
          0.029142827 = weight(abstract_txt:research in 3096) [ClassicSimilarity], result of:
            0.029142827 = score(doc=3096,freq=2.0), product of:
              0.06933298 = queryWeight, product of:
                1.562406 = boost
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.013997176 = queryNorm
              0.4203314 = fieldWeight in 3096, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.09375 = fieldNorm(doc=3096)
          0.22677161 = weight(abstract_txt:algorithms in 3096) [ClassicSimilarity], result of:
            0.22677161 = score(doc=3096,freq=2.0), product of:
              0.29965678 = queryWeight, product of:
                3.75064 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.013997176 = queryNorm
              0.75677115 = fieldWeight in 3096, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.09375 = fieldNorm(doc=3096)
          0.19556919 = weight(abstract_txt:corpus in 3096) [ClassicSimilarity], result of:
            0.19556919 = score(doc=3096,freq=1.0), product of:
              0.34206495 = queryWeight, product of:
                4.007261 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.013997176 = queryNorm
              0.57173115 = fieldWeight in 3096, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.09375 = fieldNorm(doc=3096)
          0.836313 = weight(abstract_txt:arabic in 3096) [ClassicSimilarity], result of:
            0.836313 = score(doc=3096,freq=5.0), product of:
              0.5270246 = queryWeight, product of:
                4.9740343 = boost
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.013997176 = queryNorm
              1.5868576 = fieldWeight in 3096, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.09375 = fieldNorm(doc=3096)
        0.2 = coord(5/25)
    
  3. Hmeidi, I.; Kanaan, G.; Evens, M.: Design and implementation of automatic indexing for information retrieval with Arabic documents (1997) 0.19
    0.19126487 = sum of:
      0.19126487 = product of:
        0.95632434 = sum of:
          0.014580101 = weight(abstract_txt:with in 1660) [ClassicSimilarity], result of:
            0.014580101 = score(doc=1660,freq=3.0), product of:
              0.04310386 = queryWeight, product of:
                1.2319187 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.013997176 = queryNorm
              0.3382551 = fieldWeight in 1660, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.078125 = fieldNorm(doc=1660)
          0.15684414 = weight(abstract_txt:roots in 1660) [ClassicSimilarity], result of:
            0.15684414 = score(doc=1660,freq=1.0), product of:
              0.2646456 = queryWeight, product of:
                2.4923594 = boost
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.013997176 = queryNorm
              0.59265727 = fieldWeight in 1660, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5860133 = idf(docFreq=60, maxDocs=44218)
                0.078125 = fieldNorm(doc=1660)
          0.18115014 = weight(abstract_txt:root in 1660) [ClassicSimilarity], result of:
            0.18115014 = score(doc=1660,freq=1.0), product of:
              0.29132533 = queryWeight, product of:
                2.6149745 = boost
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.013997176 = queryNorm
              0.6218139 = fieldWeight in 1660, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.078125 = fieldNorm(doc=1660)
          0.16297431 = weight(abstract_txt:corpus in 1660) [ClassicSimilarity], result of:
            0.16297431 = score(doc=1660,freq=1.0), product of:
              0.34206495 = queryWeight, product of:
                4.007261 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.013997176 = queryNorm
              0.4764426 = fieldWeight in 1660, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.078125 = fieldNorm(doc=1660)
          0.44077566 = weight(abstract_txt:arabic in 1660) [ClassicSimilarity], result of:
            0.44077566 = score(doc=1660,freq=2.0), product of:
              0.5270246 = queryWeight, product of:
                4.9740343 = boost
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.013997176 = queryNorm
              0.8363474 = fieldWeight in 1660, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.078125 = fieldNorm(doc=1660)
        0.2 = coord(5/25)
    
  4. Abdelali, A.: Localization in modern standard Arabic (2004) 0.18
    0.18058944 = sum of:
      0.18058944 = product of:
        0.9029472 = sum of:
          0.016721092 = weight(abstract_txt:have in 2066) [ClassicSimilarity], result of:
            0.016721092 = score(doc=2066,freq=2.0), product of:
              0.04722648 = queryWeight, product of:
                1.0528612 = boost
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.013997176 = queryNorm
              0.35406178 = fieldWeight in 2066, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.078125 = fieldNorm(doc=2066)
          0.014419658 = weight(abstract_txt:study in 2066) [ClassicSimilarity], result of:
            0.014419658 = score(doc=2066,freq=1.0), product of:
              0.05390832 = queryWeight, product of:
                1.1248801 = boost
                3.423806 = idf(docFreq=3916, maxDocs=44218)
                0.013997176 = queryNorm
              0.26748484 = fieldWeight in 2066, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.423806 = idf(docFreq=3916, maxDocs=44218)
                0.078125 = fieldNorm(doc=2066)
          0.011904602 = weight(abstract_txt:with in 2066) [ClassicSimilarity], result of:
            0.011904602 = score(doc=2066,freq=2.0), product of:
              0.04310386 = queryWeight, product of:
                1.2319187 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.013997176 = queryNorm
              0.27618414 = fieldWeight in 2066, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.078125 = fieldNorm(doc=2066)
          0.16297431 = weight(abstract_txt:corpus in 2066) [ClassicSimilarity], result of:
            0.16297431 = score(doc=2066,freq=1.0), product of:
              0.34206495 = queryWeight, product of:
                4.007261 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.013997176 = queryNorm
              0.4764426 = fieldWeight in 2066, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.078125 = fieldNorm(doc=2066)
          0.69692755 = weight(abstract_txt:arabic in 2066) [ClassicSimilarity], result of:
            0.69692755 = score(doc=2066,freq=5.0), product of:
              0.5270246 = queryWeight, product of:
                4.9740343 = boost
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.013997176 = queryNorm
              1.3223814 = fieldWeight in 2066, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.078125 = fieldNorm(doc=2066)
        0.2 = coord(5/25)
    
  5. Rushdi-Saleh, M.; Martín-Valdivia, M.T.; Ureña-López, L.A.; Perea-Ortega, J.M.: OCA: Opinion corpus for Arabic (2011) 0.18
    0.17855825 = sum of:
      0.17855825 = product of:
        0.8927912 = sum of:
          0.011823598 = weight(abstract_txt:have in 4360) [ClassicSimilarity], result of:
            0.011823598 = score(doc=4360,freq=1.0), product of:
              0.04722648 = queryWeight, product of:
                1.0528612 = boost
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.013997176 = queryNorm
              0.2503595 = fieldWeight in 4360, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.078125 = fieldNorm(doc=4360)
          0.02428569 = weight(abstract_txt:research in 4360) [ClassicSimilarity], result of:
            0.02428569 = score(doc=4360,freq=2.0), product of:
              0.06933298 = queryWeight, product of:
                1.562406 = boost
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.013997176 = queryNorm
              0.35027617 = fieldWeight in 4360, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.170338 = idf(docFreq=5046, maxDocs=44218)
                0.078125 = fieldNorm(doc=4360)
          0.13362646 = weight(abstract_txt:algorithms in 4360) [ClassicSimilarity], result of:
            0.13362646 = score(doc=4360,freq=1.0), product of:
              0.29965678 = queryWeight, product of:
                3.75064 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.013997176 = queryNorm
              0.4459317 = fieldWeight in 4360, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.078125 = fieldNorm(doc=4360)
          0.2822798 = weight(abstract_txt:corpus in 4360) [ClassicSimilarity], result of:
            0.2822798 = score(doc=4360,freq=3.0), product of:
              0.34206495 = queryWeight, product of:
                4.007261 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.013997176 = queryNorm
              0.8252228 = fieldWeight in 4360, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.078125 = fieldNorm(doc=4360)
          0.44077566 = weight(abstract_txt:arabic in 4360) [ClassicSimilarity], result of:
            0.44077566 = score(doc=4360,freq=2.0), product of:
              0.5270246 = queryWeight, product of:
                4.9740343 = boost
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.013997176 = queryNorm
              0.8363474 = fieldWeight in 4360, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.5697527 = idf(docFreq=61, maxDocs=44218)
                0.078125 = fieldNorm(doc=4360)
        0.2 = coord(5/25)
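
The score trees above follow the TF-IDF formulas of Lucene's ClassicSimilarity: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and the summed term scores are multiplied by coord (matching terms / query terms). The following is a minimal sketch that recomputes the first entry (doc 3426) from the values printed in its tree. The function and variable names are illustrative assumptions, not part of Lucene or of this catalog, and Python's double-precision arithmetic may differ from Lucene's single-precision output in the last digits.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # Score of one query term in one document: queryWeight * fieldWeight.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Constants copied from the explanation tree of doc 3426.
MAX_DOCS = 44218
QUERY_NORM = 0.013997176
FIELD_NORM = 0.0625
QUERY_TERMS = 25  # denominator of coord(8/25)

# (term, freq, docFreq, boost) for the eight matching terms.
terms = [
    ("have", 1.0, 4876, 1.0528612),
    ("study", 2.0, 3916, 1.1248801),
    ("with", 3.0, 9868, 1.2319187),
    ("roots", 3.0, 60, 2.4923594),
    ("root", 1.0, 41, 2.6149745),
    ("algorithms", 2.0, 398, 3.75064),
    ("corpus", 1.0, 269, 4.007261),
    ("arabic", 5.0, 61, 4.9740343),
]

scores = {
    term: term_score(freq, df, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM)
    for term, freq, df, boost in terms
}
coord = len(terms) / QUERY_TERMS
total = sum(scores.values()) * coord

for term, value in scores.items():
    print(f"{term:>10}: {value:.6f}")
print(f"coord = {coord:.2f}, total = {total:.4f}")
```

The same functions apply to the other four entries once fieldNorm, the term frequencies, and the coord numerator are swapped for the values in the corresponding tree.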