Document (#27068)

Author
Abdelali, A.
Title
Localization in modern standard Arabic
Source
Journal of the American Society for Information Science and Technology. 55(2004) no.1, pp. 23-28
Year
2004
Abstract
Modern Standard Arabic (MSA) is the official language used in all Arabic countries. In this paper we describe an investigation of the uniformity of MSA across different countries. Many studies have been carried out locally or regionally on Arabic and its dialects. Here we look on a more global scale by studying language variations between countries. The source material used in this investigation was derived from national newspapers available on the Web, which provided samples of common media usage in each country. This corpus has been used to investigate the lexical characteristics of Modern Standard Arabic as found in 10 different Arabic-speaking countries. We describe our collection methods, the types of lexical analysis performed, and the results of our investigations. With respect to newspaper articles, MSA seems to be very uniform across all the countries included in the study, but we have detected various types of differences, with implications for computational processing of MSA.
Theme
Computational linguistics

Similar documents (content)

  1. Hmeidi, I.I.; Al-Shalabi, R.F.; Al-Taani, A.T.; Najadat, H.; Al-Hazaimeh, S.A.: A novel approach to the extraction of roots from Arabic words using bigrams (2010) 0.22
    0.22159661 = sum of:
      0.22159661 = product of:
        0.9233192 = sum of:
          0.012845525 = weight(abstract_txt:been in 427) [ClassicSimilarity], result of:
            0.012845525 = score(doc=427,freq=1.0), product of:
              0.056514762 = queryWeight, product of:
                1.0751494 = boost
                3.6367204 = idf(docFreq=3059, maxDocs=42740)
                0.014453838 = queryNorm
              0.22729503 = fieldWeight in 427, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6367204 = idf(docFreq=3059, maxDocs=42740)
                0.0625 = fieldNorm(doc=427)
          0.008260317 = weight(abstract_txt:this in 427) [ClassicSimilarity], result of:
            0.008260317 = score(doc=427,freq=2.0), product of:
              0.038254183 = queryWeight, product of:
                1.0833604 = boost
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.014453838 = queryNorm
              0.21593238 = fieldWeight in 427, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.0625 = fieldNorm(doc=427)
          0.019699143 = weight(abstract_txt:language in 427) [ClassicSimilarity], result of:
            0.019699143 = score(doc=427,freq=1.0), product of:
              0.075155176 = queryWeight, product of:
                1.2398448 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.014453838 = queryNorm
              0.26211292 = fieldWeight in 427, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.0625 = fieldNorm(doc=427)
          0.021776855 = weight(abstract_txt:used in 427) [ClassicSimilarity], result of:
            0.021776855 = score(doc=427,freq=2.0), product of:
              0.07300365 = queryWeight, product of:
                1.4966002 = boost
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.014453838 = queryNorm
              0.29829818 = fieldWeight in 427, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.0625 = fieldNorm(doc=427)
          0.0788293 = weight(abstract_txt:modern in 427) [ClassicSimilarity], result of:
            0.0788293 = score(doc=427,freq=1.0), product of:
              0.21684508 = queryWeight, product of:
                2.5793383 = boost
                5.8164515 = idf(docFreq=345, maxDocs=42740)
                0.014453838 = queryNorm
              0.36352822 = fieldWeight in 427, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8164515 = idf(docFreq=345, maxDocs=42740)
                0.0625 = fieldNorm(doc=427)
          0.7819081 = weight(abstract_txt:arabic in 427) [ClassicSimilarity], result of:
            0.7819081 = score(doc=427,freq=5.0), product of:
              0.7375898 = queryWeight, product of:
                6.727535 = boost
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.014453838 = queryNorm
              1.0600853 = fieldWeight in 427, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.0625 = fieldNorm(doc=427)
        0.24 = coord(6/25)
    
  2. Shaalan, K.; Raza, H.: NERA: Named Entity Recognition for Arabic (2009) 0.22
    0.21619907 = sum of:
      0.21619907 = product of:
        0.9008295 = sum of:
          0.019467963 = weight(abstract_txt:been in 4954) [ClassicSimilarity], result of:
            0.019467963 = score(doc=4954,freq=3.0), product of:
              0.056514762 = queryWeight, product of:
                1.0751494 = boost
                3.6367204 = idf(docFreq=3059, maxDocs=42740)
                0.014453838 = queryNorm
              0.34447572 = fieldWeight in 4954, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6367204 = idf(docFreq=3059, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
          0.0072277775 = weight(abstract_txt:this in 4954) [ClassicSimilarity], result of:
            0.0072277775 = score(doc=4954,freq=2.0), product of:
              0.038254183 = queryWeight, product of:
                1.0833604 = boost
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.014453838 = queryNorm
              0.18894084 = fieldWeight in 4954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
          0.016666226 = weight(abstract_txt:different in 4954) [ClassicSimilarity], result of:
            0.016666226 = score(doc=4954,freq=2.0), product of:
              0.05832707 = queryWeight, product of:
                1.0922523 = boost
                3.694571 = idf(docFreq=2887, maxDocs=42740)
                0.014453838 = queryNorm
              0.2857374 = fieldWeight in 4954, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.694571 = idf(docFreq=2887, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
          0.0344735 = weight(abstract_txt:language in 4954) [ClassicSimilarity], result of:
            0.0344735 = score(doc=4954,freq=4.0), product of:
              0.075155176 = queryWeight, product of:
                1.2398448 = boost
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.014453838 = queryNorm
              0.45869762 = fieldWeight in 4954, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.1938066 = idf(docFreq=1752, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
          0.013473743 = weight(abstract_txt:used in 4954) [ClassicSimilarity], result of:
            0.013473743 = score(doc=4954,freq=1.0), product of:
              0.07300365 = queryWeight, product of:
                1.4966002 = boost
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.014453838 = queryNorm
              0.1845626 = fieldWeight in 4954, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3748589 = idf(docFreq=3975, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
          0.8095203 = weight(abstract_txt:arabic in 4954) [ClassicSimilarity], result of:
            0.8095203 = score(doc=4954,freq=7.0), product of:
              0.7375898 = queryWeight, product of:
                6.727535 = boost
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.014453838 = queryNorm
              1.0975211 = fieldWeight in 4954, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4954)
        0.24 = coord(6/25)
    
  3. Mutawa, F.; Alnajem, S.; Alzhouri, F.: An HPSG approach to Arabic nominal sentences (2008) 0.21
    0.20832036 = sum of:
      0.20832036 = product of:
        1.3020023 = sum of:
          0.02569105 = weight(abstract_txt:been in 3369) [ClassicSimilarity], result of:
            0.02569105 = score(doc=3369,freq=1.0), product of:
              0.056514762 = queryWeight, product of:
                1.0751494 = boost
                3.6367204 = idf(docFreq=3059, maxDocs=42740)
                0.014453838 = queryNorm
              0.45459005 = fieldWeight in 3369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6367204 = idf(docFreq=3059, maxDocs=42740)
                0.125 = fieldNorm(doc=3369)
          0.016520634 = weight(abstract_txt:this in 3369) [ClassicSimilarity], result of:
            0.016520634 = score(doc=3369,freq=2.0), product of:
              0.038254183 = queryWeight, product of:
                1.0833604 = boost
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.014453838 = queryNorm
              0.43186477 = fieldWeight in 3369, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.125 = fieldNorm(doc=3369)
          0.048463866 = weight(abstract_txt:types in 3369) [ClassicSimilarity], result of:
            0.048463866 = score(doc=3369,freq=1.0), product of:
              0.08628184 = queryWeight, product of:
                1.3284572 = boost
                4.4935403 = idf(docFreq=1298, maxDocs=42740)
                0.014453838 = queryNorm
              0.56169254 = fieldWeight in 3369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4935403 = idf(docFreq=1298, maxDocs=42740)
                0.125 = fieldNorm(doc=3369)
          1.2113267 = weight(abstract_txt:arabic in 3369) [ClassicSimilarity], result of:
            1.2113267 = score(doc=3369,freq=3.0), product of:
              0.7375898 = queryWeight, product of:
                6.727535 = boost
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.014453838 = queryNorm
              1.642277 = fieldWeight in 3369, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.125 = fieldNorm(doc=3369)
        0.16 = coord(4/25)
    
  4. Kanaan, G.; Al-Shalabi, R.; Ghwanmeh, S.; Al-Ma'adeed, H.: A comparison of text-classification techniques applied to Arabic text (2009) 0.20
    0.1982126 = sum of:
      0.1982126 = product of:
        1.2388288 = sum of:
          0.03337365 = weight(abstract_txt:been in 97) [ClassicSimilarity], result of:
            0.03337365 = score(doc=97,freq=3.0), product of:
              0.056514762 = queryWeight, product of:
                1.0751494 = boost
                3.6367204 = idf(docFreq=3059, maxDocs=42740)
                0.014453838 = queryNorm
              0.5905298 = fieldWeight in 97, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6367204 = idf(docFreq=3059, maxDocs=42740)
                0.09375 = fieldNorm(doc=97)
          0.012390476 = weight(abstract_txt:this in 97) [ClassicSimilarity], result of:
            0.012390476 = score(doc=97,freq=2.0), product of:
              0.038254183 = queryWeight, product of:
                1.0833604 = boost
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.014453838 = queryNorm
              0.32389858 = fieldWeight in 97, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.09375 = fieldNorm(doc=97)
          0.020202518 = weight(abstract_txt:different in 97) [ClassicSimilarity], result of:
            0.020202518 = score(doc=97,freq=1.0), product of:
              0.05832707 = queryWeight, product of:
                1.0922523 = boost
                3.694571 = idf(docFreq=2887, maxDocs=42740)
                0.014453838 = queryNorm
              0.34636605 = fieldWeight in 97, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.694571 = idf(docFreq=2887, maxDocs=42740)
                0.09375 = fieldNorm(doc=97)
          1.1728622 = weight(abstract_txt:arabic in 97) [ClassicSimilarity], result of:
            1.1728622 = score(doc=97,freq=5.0), product of:
              0.7375898 = queryWeight, product of:
                6.727535 = boost
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.014453838 = queryNorm
              1.590128 = fieldWeight in 97, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.09375 = fieldNorm(doc=97)
        0.16 = coord(4/25)
    
  5. Aqeel, S.U.; Beitzel, S.M.; Jensen, E.C.; Grossman, D.; Frieder, O.: On the development of name search techniques for Arabic (2006) 0.17
    0.17060201 = sum of:
      0.17060201 = product of:
        0.85301006 = sum of:
          0.016056905 = weight(abstract_txt:been in 290) [ClassicSimilarity], result of:
            0.016056905 = score(doc=290,freq=1.0), product of:
              0.056514762 = queryWeight, product of:
                1.0751494 = boost
                3.6367204 = idf(docFreq=3059, maxDocs=42740)
                0.014453838 = queryNorm
              0.28411877 = fieldWeight in 290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6367204 = idf(docFreq=3059, maxDocs=42740)
                0.078125 = fieldNorm(doc=290)
          0.010325396 = weight(abstract_txt:this in 290) [ClassicSimilarity], result of:
            0.010325396 = score(doc=290,freq=2.0), product of:
              0.038254183 = queryWeight, product of:
                1.0833604 = boost
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.014453838 = queryNorm
              0.2699155 = fieldWeight in 290, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.078125 = fieldNorm(doc=290)
          0.01683543 = weight(abstract_txt:different in 290) [ClassicSimilarity], result of:
            0.01683543 = score(doc=290,freq=1.0), product of:
              0.05832707 = queryWeight, product of:
                1.0922523 = boost
                3.694571 = idf(docFreq=2887, maxDocs=42740)
                0.014453838 = queryNorm
              0.28863835 = fieldWeight in 290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.694571 = idf(docFreq=2887, maxDocs=42740)
                0.078125 = fieldNorm(doc=290)
          0.052713126 = weight(abstract_txt:standard in 290) [ClassicSimilarity], result of:
            0.052713126 = score(doc=290,freq=1.0), product of:
              0.14289936 = queryWeight, product of:
                2.093865 = boost
                4.7217007 = idf(docFreq=1033, maxDocs=42740)
                0.014453838 = queryNorm
              0.36888286 = fieldWeight in 290, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7217007 = idf(docFreq=1033, maxDocs=42740)
                0.078125 = fieldNorm(doc=290)
          0.7570792 = weight(abstract_txt:arabic in 290) [ClassicSimilarity], result of:
            0.7570792 = score(doc=290,freq=3.0), product of:
              0.7375898 = queryWeight, product of:
                6.727535 = boost
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.014453838 = queryNorm
              1.0264231 = fieldWeight in 290, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.585353 = idf(docFreq=58, maxDocs=42740)
                0.078125 = fieldNorm(doc=290)
        0.2 = coord(5/25)
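
The score trees above can be checked by hand. The following is a minimal sketch, not the catalog's actual code. It assumes the listed values follow the classic TF-IDF form that the numbers themselves suggest: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final score of coord times the sum of the per-term scores. The function and variable names are illustrative only. The inputs are copied from the first entry (doc 427).

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency in the form implied by the listing above.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Term frequency is dampened with a square root.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in each "product of" node.
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight

MAX_DOCS = 42740
QUERY_NORM = 0.014453838
FIELD_NORM_427 = 0.0625  # fieldNorm(doc=427)

# (term, freq, docFreq, boost), copied from the tree for doc 427
TERMS_427 = [
    ("been", 1.0, 3059, 1.0751494),
    ("this", 2.0, 10095, 1.0833604),
    ("language", 1.0, 1752, 1.2398448),
    ("used", 2.0, 3975, 1.4966002),
    ("modern", 1.0, 345, 2.5793383),
    ("arabic", 5.0, 58, 6.727535),
]

scores = {
    term: term_score(freq, df, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM_427)
    for term, freq, df, boost in TERMS_427
}
total = sum(scores.values())
coord = 6 / 25  # 6 of the 25 query terms matched this document
final = coord * total

print(f"sum of term scores: {total:.4f}")
print(f"final score:        {final:.4f}")
```

Under these assumptions the result agrees with the listed 0.22159661 to within floating-point rounding (the listing appears to use single-precision values). The term "arabic" dominates the sum because it combines the highest boost with the highest idf (docFreq=58).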