Document (#20603)

Author
Ucoluk, G.
Toroslu, I.H.
Title
¬A genetic algorithm approach for verification of the syllable-based text compression technique
Source
Journal of information science. 23(1997) no.5, S.365-372.
Year
1997
Abstract
It is possible to decompose any text into strings that have length greater than 1 and occur frequently, provided that an easy mechanism exists for it. With the set of such frequently occurring strings on the one hand and the set of letters and symbols on the other, it is possible to compress the text using Huffman coding over an alphabet which is a subset of the union of these two sets. Observations reveal that, in most cases, the maximal inclusion of the strings leads to an optimal length of the compressed text. However, the verification of this prediction requires the consideration of all subsets in order to find the one that leads to the best compression. Describes a genetic algorithm devised and used for this process and concludes that Turkish texts, because of the agglutinative nature of the language and the highly regular syllable formation, provide a useful test bed for this technique.
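
The quantity being optimised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the greedy longest-match tokenizer, the function names (`tokenize`, `huffman_bits`, `best_subset`) and the decision to ignore the cost of storing the code table are all assumptions made for illustration. The exhaustive search over subsets stands in for the genetic algorithm, which would use the same bit count as its fitness value when the number of candidate strings makes enumeration impractical.

```python
import heapq
from collections import Counter
from itertools import combinations


def tokenize(text, strings):
    """Greedy longest-match split into chosen strings and single characters.

    Assumption: the paper's decomposition mechanism is not described in the
    abstract; greedy matching is used here only as a simple stand-in.
    """
    ordered = sorted(strings, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(text):
        for s in ordered:
            if text.startswith(s, i):
                tokens.append(s)
                i += len(s)
                break
        else:
            # No chosen string starts here, so emit a single character.
            tokens.append(text[i])
            i += 1
    return tokens


def huffman_bits(tokens):
    """Total encoded length in bits under a Huffman code for the tokens.

    The sum of all merged weights equals the sum of frequency times code
    length, so the tree itself never has to be built.
    """
    freqs = list(Counter(tokens).values())
    if len(freqs) == 1:
        return freqs[0]  # assumption: one bit per occurrence of a lone symbol
    heapq.heapify(freqs)
    total = 0
    while len(freqs) > 1:
        a, b = heapq.heappop(freqs), heapq.heappop(freqs)
        total += a + b
        heapq.heappush(freqs, a + b)
    return total


def best_subset(text, candidates):
    """Try every subset of candidate strings; return (bits, subset).

    This is the 2**n enumeration that a genetic algorithm would replace.
    """
    best = None
    for r in range(len(candidates) + 1):
        for subset in combinations(candidates, r):
            bits = huffman_bits(tokenize(text, subset))
            if best is None or bits < best[0]:
                best = (bits, subset)
    return best
```

For the toy text "banana", coding single letters costs 9 bits, while adding the string "an" to the alphabet reduces the cost to 6 bits under these assumptions.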
Object
Huffman codes

Similar documents (content)

  1. Robertson, A.M.; Willett, P.: Generation of equifrequent groups of words using a genetic algorithm (1994) 0.14
    0.13883986 = sum of:
      0.13883986 = product of:
        0.69419926 = sum of:
          0.15532368 = weight(abstract_txt:turkish in 158) [ClassicSimilarity], result of:
            0.15532368 = score(doc=158,freq=1.0), product of:
              0.18360113 = queryWeight, product of:
                1.2393905 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.016416332 = queryNorm
              0.84598434 = fieldWeight in 158, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.09375 = fieldNorm(doc=158)
          0.11189698 = weight(abstract_txt:algorithm in 158) [ClassicSimilarity], result of:
            0.11189698 = score(doc=158,freq=2.0), product of:
              0.14754657 = queryWeight, product of:
                1.5712672 = boost
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.016416332 = queryNorm
              0.7583841 = fieldWeight in 158, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.09375 = fieldNorm(doc=158)
          0.03245266 = weight(abstract_txt:that in 158) [ClassicSimilarity], result of:
            0.03245266 = score(doc=158,freq=5.0), product of:
              0.06464731 = queryWeight, product of:
                1.644488 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.016416332 = queryNorm
              0.50199556 = fieldWeight in 158, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.09375 = fieldNorm(doc=158)
          0.3383554 = weight(abstract_txt:genetic in 158) [ClassicSimilarity], result of:
            0.3383554 = score(doc=158,freq=3.0), product of:
              0.26952675 = queryWeight, product of:
                2.123667 = boost
                7.731065 = idf(docFreq=50, maxDocs=42740)
                0.016416332 = queryNorm
              1.2553685 = fieldWeight in 158, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.731065 = idf(docFreq=50, maxDocs=42740)
                0.09375 = fieldNorm(doc=158)
          0.056170546 = weight(abstract_txt:text in 158) [ClassicSimilarity], result of:
            0.056170546 = score(doc=158,freq=1.0), product of:
              0.14793672 = queryWeight, product of:
                2.2250433 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.016416332 = queryNorm
              0.37969306 = fieldWeight in 158, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.09375 = fieldNorm(doc=158)
        0.2 = coord(5/25)
    
  2. Cheng, K.-S.; Young, G.H.; Wong, K.-F.: ¬A study on word-based and integral-bit Chinese text compression algorithms (1999) 0.12
    0.12468147 = sum of:
      0.12468147 = product of:
        0.62340736 = sum of:
          0.010787003 = weight(abstract_txt:this in 4057) [ClassicSimilarity], result of:
            0.010787003 = score(doc=4057,freq=1.0), product of:
              0.040370114 = queryWeight, product of:
                1.0066097 = boost
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.016416332 = queryNorm
              0.26720268 = fieldWeight in 4057, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.109375 = fieldNorm(doc=4057)
          0.15988614 = weight(abstract_txt:algorithm in 4057) [ClassicSimilarity], result of:
            0.15988614 = score(doc=4057,freq=3.0), product of:
              0.14754657 = queryWeight, product of:
                1.5712672 = boost
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.016416332 = queryNorm
              1.0836316 = fieldWeight in 4057, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.109375 = fieldNorm(doc=4057)
          0.023945676 = weight(abstract_txt:that in 4057) [ClassicSimilarity], result of:
            0.023945676 = score(doc=4057,freq=2.0), product of:
              0.06464731 = queryWeight, product of:
                1.644488 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.016416332 = queryNorm
              0.37040484 = fieldWeight in 4057, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.109375 = fieldNorm(doc=4057)
          0.36325628 = weight(abstract_txt:compression in 4057) [ClassicSimilarity], result of:
            0.36325628 = score(doc=4057,freq=3.0), product of:
              0.25499442 = queryWeight, product of:
                2.0656219 = boost
                7.519756 = idf(docFreq=62, maxDocs=42740)
                0.016416332 = queryNorm
              1.4245656 = fieldWeight in 4057, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.519756 = idf(docFreq=62, maxDocs=42740)
                0.109375 = fieldNorm(doc=4057)
          0.0655323 = weight(abstract_txt:text in 4057) [ClassicSimilarity], result of:
            0.0655323 = score(doc=4057,freq=1.0), product of:
              0.14793672 = queryWeight, product of:
                2.2250433 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.016416332 = queryNorm
              0.44297522 = fieldWeight in 4057, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.109375 = fieldNorm(doc=4057)
        0.2 = coord(5/25)
    
  3. Karakos, A.: Greeklish : an experimental interface for automatic transliteration (2003) 0.11
    0.109391406 = sum of:
      0.109391406 = product of:
        0.45579752 = sum of:
          0.017228909 = weight(abstract_txt:this in 2821) [ClassicSimilarity], result of:
            0.017228909 = score(doc=2821,freq=5.0), product of:
              0.040370114 = queryWeight, product of:
                1.0066097 = boost
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.016416332 = queryNorm
              0.42677385 = fieldWeight in 2821, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.078125 = fieldNorm(doc=2821)
          0.12183221 = weight(abstract_txt:letters in 2821) [ClassicSimilarity], result of:
            0.12183221 = score(doc=2821,freq=2.0), product of:
              0.13995953 = queryWeight, product of:
                1.0821108 = boost
                7.8787007 = idf(docFreq=43, maxDocs=42740)
                0.016416332 = queryNorm
              0.87048167 = fieldWeight in 2821, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.8787007 = idf(docFreq=43, maxDocs=42740)
                0.078125 = fieldNorm(doc=2821)
          0.19864336 = weight(abstract_txt:alphabet in 2821) [ClassicSimilarity], result of:
            0.19864336 = score(doc=2821,freq=3.0), product of:
              0.16937397 = queryWeight, product of:
                1.1904025 = boost
                8.667158 = idf(docFreq=19, maxDocs=42740)
                0.016416332 = queryNorm
              1.1728092 = fieldWeight in 2821, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.667158 = idf(docFreq=19, maxDocs=42740)
                0.078125 = fieldNorm(doc=2821)
          0.035053056 = weight(abstract_txt:possible in 2821) [ClassicSimilarity], result of:
            0.035053056 = score(doc=2821,freq=1.0), product of:
              0.09682741 = queryWeight, product of:
                1.2728717 = boost
                4.633803 = idf(docFreq=1128, maxDocs=42740)
                0.016416332 = queryNorm
              0.36201584 = fieldWeight in 2821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.633803 = idf(docFreq=1128, maxDocs=42740)
                0.078125 = fieldNorm(doc=2821)
          0.065935925 = weight(abstract_txt:algorithm in 2821) [ClassicSimilarity], result of:
            0.065935925 = score(doc=2821,freq=1.0), product of:
              0.14754657 = queryWeight, product of:
                1.5712672 = boost
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.016416332 = queryNorm
              0.44688213 = fieldWeight in 2821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.078125 = fieldNorm(doc=2821)
          0.017104054 = weight(abstract_txt:that in 2821) [ClassicSimilarity], result of:
            0.017104054 = score(doc=2821,freq=2.0), product of:
              0.06464731 = queryWeight, product of:
                1.644488 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.016416332 = queryNorm
              0.2645749 = fieldWeight in 2821, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.078125 = fieldNorm(doc=2821)
        0.24 = coord(6/25)
    
  4. Akman, K.I.: ¬A new text compression technique based on natural language structure (1995) 0.11
    0.10859158 = sum of:
      0.10859158 = product of:
        0.5429579 = sum of:
          0.1294364 = weight(abstract_txt:turkish in 1929) [ClassicSimilarity], result of:
            0.1294364 = score(doc=1929,freq=1.0), product of:
              0.18360113 = queryWeight, product of:
                1.2393905 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.016416332 = queryNorm
              0.704987 = fieldWeight in 1929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.078125 = fieldNorm(doc=1929)
          0.061609868 = weight(abstract_txt:technique in 1929) [ClassicSimilarity], result of:
            0.061609868 = score(doc=1929,freq=1.0), product of:
              0.14102018 = queryWeight, product of:
                1.5361234 = boost
                5.5921526 = idf(docFreq=432, maxDocs=42740)
                0.016416332 = queryNorm
              0.4368869 = fieldWeight in 1929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5921526 = idf(docFreq=432, maxDocs=42740)
                0.078125 = fieldNorm(doc=1929)
          0.09324749 = weight(abstract_txt:algorithm in 1929) [ClassicSimilarity], result of:
            0.09324749 = score(doc=1929,freq=2.0), product of:
              0.14754657 = queryWeight, product of:
                1.5712672 = boost
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.016416332 = queryNorm
              0.6319868 = fieldWeight in 1929, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.078125 = fieldNorm(doc=1929)
          0.21185535 = weight(abstract_txt:compression in 1929) [ClassicSimilarity], result of:
            0.21185535 = score(doc=1929,freq=2.0), product of:
              0.25499442 = queryWeight, product of:
                2.0656219 = boost
                7.519756 = idf(docFreq=62, maxDocs=42740)
                0.016416332 = queryNorm
              0.8308235 = fieldWeight in 1929, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.519756 = idf(docFreq=62, maxDocs=42740)
                0.078125 = fieldNorm(doc=1929)
          0.04680879 = weight(abstract_txt:text in 1929) [ClassicSimilarity], result of:
            0.04680879 = score(doc=1929,freq=1.0), product of:
              0.14793672 = queryWeight, product of:
                2.2250433 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.016416332 = queryNorm
              0.3164109 = fieldWeight in 1929, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.078125 = fieldNorm(doc=1929)
        0.2 = coord(5/25)
    
  5. Kokol, P.; Podgorelec, V.; Zorman, M.; Kokol, T.; Njivar, T.: Computer and natural language texts : a comparison based on long-range correlations (1999) 0.10
    0.10079987 = sum of:
      0.10079987 = product of:
        0.50399935 = sum of:
          0.08200914 = weight(abstract_txt:symbols in 5300) [ClassicSimilarity], result of:
            0.08200914 = score(doc=5300,freq=1.0), product of:
              0.11993843 = queryWeight, product of:
                1.0017277 = boost
                7.2934427 = idf(docFreq=78, maxDocs=42740)
                0.016416332 = queryNorm
              0.6837603 = fieldWeight in 5300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2934427 = idf(docFreq=78, maxDocs=42740)
                0.09375 = fieldNorm(doc=5300)
          0.013075823 = weight(abstract_txt:this in 5300) [ClassicSimilarity], result of:
            0.013075823 = score(doc=5300,freq=2.0), product of:
              0.040370114 = queryWeight, product of:
                1.0066097 = boost
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.016416332 = queryNorm
              0.32389858 = fieldWeight in 5300, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.442996 = idf(docFreq=10095, maxDocs=42740)
                0.09375 = fieldNorm(doc=5300)
          0.13107318 = weight(abstract_txt:maximal in 5300) [ClassicSimilarity], result of:
            0.13107318 = score(doc=5300,freq=1.0), product of:
              0.16395555 = queryWeight, product of:
                1.1712067 = boost
                8.527396 = idf(docFreq=22, maxDocs=42740)
                0.016416332 = queryNorm
              0.79944336 = fieldWeight in 5300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.527396 = idf(docFreq=22, maxDocs=42740)
                0.09375 = fieldNorm(doc=5300)
          0.025137722 = weight(abstract_txt:that in 5300) [ClassicSimilarity], result of:
            0.025137722 = score(doc=5300,freq=3.0), product of:
              0.06464731 = queryWeight, product of:
                1.644488 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.016416332 = queryNorm
              0.38884407 = fieldWeight in 5300, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.09375 = fieldNorm(doc=5300)
          0.2527035 = weight(abstract_txt:strings in 5300) [ClassicSimilarity], result of:
            0.2527035 = score(doc=5300,freq=1.0), product of:
              0.3662954 = queryWeight, product of:
                3.032123 = boost
                7.358825 = idf(docFreq=73, maxDocs=42740)
                0.016416332 = queryNorm
              0.68988985 = fieldWeight in 5300, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.358825 = idf(docFreq=73, maxDocs=42740)
                0.09375 = fieldNorm(doc=5300)
        0.2 = coord(5/25)
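
The score breakdowns above all follow the same arithmetic, which can be reproduced with a short Python sketch. The function names `term_score` and `doc_score` are hypothetical; the formulas are read directly off the listed trees (queryWeight = boost × idf × queryNorm, fieldWeight = tf × idf × fieldNorm with tf equal to the square root of the term frequency, and the final score equal to the sum of term weights times the coord factor).

```python
import math


def term_score(boost, idf, query_norm, freq, field_norm):
    """Weight of one matching term, as laid out in the explanation trees."""
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def doc_score(term_scores, matched, total):
    """Sum of term weights scaled by coord(matched/total)."""
    return sum(term_scores) * matched / total


# Values copied from entry 1 (doc=158), term "turkish".
turkish = term_score(1.2393905, 9.023833, 0.016416332, 1.0, 0.09375)

# The five term weights listed for entry 1, with coord(5/25).
entry_1 = doc_score(
    [0.15532368, 0.11189698, 0.03245266, 0.3383554, 0.056170546], 5, 25
)
```

Here `turkish` comes out at about 0.1553 and `entry_1` at about 0.1388, agreeing with the listed values up to floating-point rounding.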