Document (#42089)

Author
Ruano-Ordás, D.
Fdez-Riverola, F.
Méndez, J.R.
Title
Using evolutionary computation for discovering spam patterns from e-mail samples
Source
Information processing and management. 54(2018) no.2, pp.303-317
Year
2018
Abstract
One of the most relevant problems affecting the efficient use of e-mail to communicate worldwide is the spam phenomenon. Spamming involves flooding the Internet with undesired messages aimed at promoting illegal or low-value products and services. Beyond the existence of different well-known machine learning techniques, collaborative schemes and other complementary approaches, some popular anti-spam frameworks such as SpamAssassin or Wirebrush4SPAM have made it possible to use regular expressions to effectively improve filter performance. In this work, we provide a review of existing proposals to automatically generate fully functional regular expressions from any input dataset combining spam and ham messages. Due to configuration difficulties and the low performance achieved by the analysed schemes, we also introduce DiscoverRegex, a novel automatic spam pattern-finding tool. Patterns generated by DiscoverRegex outperform those created by existing approaches (they are able to avoid false positive (FP) errors) whilst minimising the computational resources required for its proper operation. DiscoverRegex source code is publicly available at https://github.com/sing-group/DiscoverRegex.
Content
Cf.: https://doi.org/10.1016/j.ipm.2017.12.001.
Object
EMail

Similar documents (author)

  1. Martinez Méndez, F.J.: Aproximacion general a la evaluacion de la recuperacion mediante motores de busqueda en Internet (2001) 1.75
    1.7497653 = sum of:
      1.7497653 = product of:
        3.4995306 = sum of:
          3.4995306 = weight(author_txt:méndez in 4804) [ClassicSimilarity], result of:
            3.4995306 = score(doc=4804,freq=1.0), product of:
              0.7305853 = queryWeight, product of:
                1.0343843 = boost
                9.580074 = idf(docFreq=7, maxDocs=42596)
                0.07372591 = queryNorm
              4.790037 = fieldWeight in 4804, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.580074 = idf(docFreq=7, maxDocs=42596)
                0.5 = fieldNorm(doc=4804)
        0.5 = coord(1/2)
    
  2. Valdivia, J. Fdez- -> Fdez-Valdivia, J.: 1.68
    1.6769124 = sum of:
      1.6769124 = product of:
        3.3538249 = sum of:
          3.3538249 = weight(author_txt:fdez in 1042) [ClassicSimilarity], result of:
            3.3538249 = score(doc=1042,freq=2.0), product of:
              0.6828214 = queryWeight, product of:
                9.2616205 = idf(docFreq=10, maxDocs=42596)
                0.07372591 = queryNorm
              4.911716 = fieldWeight in 1042, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.2616205 = idf(docFreq=10, maxDocs=42596)
                0.375 = fieldNorm(doc=1042)
        0.5 = coord(1/2)
    
  3. Valdivia, J. Fdez -> Fdez-Valdivia, J.: 1.68
    1.6769124 = sum of:
      1.6769124 = product of:
        3.3538249 = sum of:
          3.3538249 = weight(author_txt:fdez in 1503) [ClassicSimilarity], result of:
            3.3538249 = score(doc=1503,freq=2.0), product of:
              0.6828214 = queryWeight, product of:
                9.2616205 = idf(docFreq=10, maxDocs=42596)
                0.07372591 = queryNorm
              4.911716 = fieldWeight in 1503, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.2616205 = idf(docFreq=10, maxDocs=42596)
                0.375 = fieldNorm(doc=1503)
        0.5 = coord(1/2)
    
  4. Rodríguez, E.M.M. -> Méndez Rodríguez, E.M.: 1.53
    1.5310446 = sum of:
      1.5310446 = product of:
        3.0620892 = sum of:
          3.0620892 = weight(author_txt:méndez in 2856) [ClassicSimilarity], result of:
            3.0620892 = score(doc=2856,freq=1.0), product of:
              0.7305853 = queryWeight, product of:
                1.0343843 = boost
                9.580074 = idf(docFreq=7, maxDocs=42596)
                0.07372591 = queryNorm
              4.1912823 = fieldWeight in 2856, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.580074 = idf(docFreq=7, maxDocs=42596)
                0.4375 = fieldNorm(doc=2856)
        0.5 = coord(1/2)
    
  5. Greenberg, J.; Méndez Rodríguez, E.M.: Introduction: toward a more library-like Web via semantic knitting (2006) 1.53
    1.5310446 = sum of:
      1.5310446 = product of:
        3.0620892 = sum of:
          3.0620892 = weight(author_txt:méndez in 529) [ClassicSimilarity], result of:
            3.0620892 = score(doc=529,freq=1.0), product of:
              0.7305853 = queryWeight, product of:
                1.0343843 = boost
                9.580074 = idf(docFreq=7, maxDocs=42596)
                0.07372591 = queryNorm
              4.1912823 = fieldWeight in 529, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.580074 = idf(docFreq=7, maxDocs=42596)
                0.4375 = fieldNorm(doc=529)
        0.5 = coord(1/2)
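
The score trees above follow Lucene's ClassicSimilarity (TF-IDF) arithmetic. As a minimal sketch, assuming the usual ClassicSimilarity definitions idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), the first two entries can be recomputed as shown below. The function names `idf` and `term_score` are hypothetical; boost, queryNorm and fieldNorm are copied from the trees rather than derived.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency, assuming the ClassicSimilarity formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """Score of one matched term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm            # boost * idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Entry 1 (doc 4804): author_txt:méndez, one of two query terms matched.
entry_1 = 0.5 * term_score(freq=1.0, doc_freq=7, max_docs=42596,
                           query_norm=0.07372591, field_norm=0.5,
                           boost=1.0343843)

# Entry 2 (doc 1042): author_txt:fdez, no boost line in the tree (boost = 1).
entry_2 = 0.5 * term_score(freq=2.0, doc_freq=10, max_docs=42596,
                           query_norm=0.07372591, field_norm=0.375)

print(round(entry_1, 2), round(entry_2, 2))  # compare with the listed 1.75 and 1.68
```

Small differences in the last digits are expected, since Lucene computes these values in single precision.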
    

Similar documents (content)

  1. Pera, M.S.; Ng, Y.-K.: SpamED : a spam E-mail detection approach based on phrase similarity (2009) 0.35
    0.35477334 = sum of:
      0.35477334 = product of:
        1.7738667 = sum of:
          0.032255594 = weight(abstract_txt:approaches in 3901) [ClassicSimilarity], result of:
            0.032255594 = score(doc=3901,freq=1.0), product of:
              0.0885452 = queryWeight, product of:
                1.3004916 = boost
                4.662834 = idf(docFreq=1092, maxDocs=42596)
                0.014601838 = queryNorm
              0.36428392 = fieldWeight in 3901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.662834 = idf(docFreq=1092, maxDocs=42596)
                0.078125 = fieldNorm(doc=3901)
          0.033095714 = weight(abstract_txt:existing in 3901) [ClassicSimilarity], result of:
            0.033095714 = score(doc=3901,freq=1.0), product of:
              0.09007609 = queryWeight, product of:
                1.3116857 = boost
                4.70297 = idf(docFreq=1049, maxDocs=42596)
                0.014601838 = queryNorm
              0.36741954 = fieldWeight in 3901, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.70297 = idf(docFreq=1049, maxDocs=42596)
                0.078125 = fieldNorm(doc=3901)
          0.16689217 = weight(abstract_txt:mail in 3901) [ClassicSimilarity], result of:
            0.16689217 = score(doc=3901,freq=6.0), product of:
              0.14577013 = queryWeight, product of:
                1.6686271 = boost
                5.982762 = idf(docFreq=291, maxDocs=42596)
                0.014601838 = queryNorm
              1.1448996 = fieldWeight in 3901, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.982762 = idf(docFreq=291, maxDocs=42596)
                0.078125 = fieldNorm(doc=3901)
          0.25764927 = weight(abstract_txt:messages in 3901) [ClassicSimilarity], result of:
            0.25764927 = score(doc=3901,freq=6.0), product of:
              0.19471401 = queryWeight, product of:
                1.9285177 = boost
                6.9145837 = idf(docFreq=114, maxDocs=42596)
                0.014601838 = queryNorm
              1.323219 = fieldWeight in 3901, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.9145837 = idf(docFreq=114, maxDocs=42596)
                0.078125 = fieldNorm(doc=3901)
          1.2839739 = weight(abstract_txt:spam in 3901) [ClassicSimilarity], result of:
            1.2839739 = score(doc=3901,freq=7.0), product of:
              0.7323969 = queryWeight, product of:
                5.913822 = boost
                8.481462 = idf(docFreq=23, maxDocs=42596)
                0.014601838 = queryNorm
              1.7531122 = fieldWeight in 3901, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                8.481462 = idf(docFreq=23, maxDocs=42596)
                0.078125 = fieldNorm(doc=3901)
        0.2 = coord(5/25)
    
  2. Sedhai, S.; Sun, A.: An analysis of 14 Million tweets on hashtag-oriented spamming (2017) 0.09
    0.08827722 = sum of:
      0.08827722 = product of:
        1.1034653 = sum of:
          0.15248214 = weight(abstract_txt:spamming in 4684) [ClassicSimilarity], result of:
            0.15248214 = score(doc=4684,freq=2.0), product of:
              0.18231721 = queryWeight, product of:
                1.319544 = boost
                9.462291 = idf(docFreq=8, maxDocs=42596)
                0.014601838 = queryNorm
              0.8363562 = fieldWeight in 4684, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.462291 = idf(docFreq=8, maxDocs=42596)
                0.0625 = fieldNorm(doc=4684)
          0.95098317 = weight(abstract_txt:spam in 4684) [ClassicSimilarity], result of:
            0.95098317 = score(doc=4684,freq=6.0), product of:
              0.7323969 = queryWeight, product of:
                5.913822 = boost
                8.481462 = idf(docFreq=23, maxDocs=42596)
                0.014601838 = queryNorm
              1.2984533 = fieldWeight in 4684, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.481462 = idf(docFreq=23, maxDocs=42596)
                0.0625 = fieldNorm(doc=4684)
        0.08 = coord(2/25)
    
  3. Goodman, J.; Heckerman, D.; Rounthwaite, R.: Schutzwälle gegen Spam (2005) 0.08
    0.08068252 = sum of:
      0.08068252 = product of:
        1.0085316 = sum of:
          0.047693405 = weight(abstract_txt:mail in 4697) [ClassicSimilarity], result of:
            0.047693405 = score(doc=4697,freq=1.0), product of:
              0.14577013 = queryWeight, product of:
                1.6686271 = boost
                5.982762 = idf(docFreq=291, maxDocs=42596)
                0.014601838 = queryNorm
              0.3271823 = fieldWeight in 4697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.982762 = idf(docFreq=291, maxDocs=42596)
                0.0546875 = fieldNorm(doc=4697)
          0.96083814 = weight(abstract_txt:spam in 4697) [ClassicSimilarity], result of:
            0.96083814 = score(doc=4697,freq=8.0), product of:
              0.7323969 = queryWeight, product of:
                5.913822 = boost
                8.481462 = idf(docFreq=23, maxDocs=42596)
                0.014601838 = queryNorm
              1.3119091 = fieldWeight in 4697, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                8.481462 = idf(docFreq=23, maxDocs=42596)
                0.0546875 = fieldNorm(doc=4697)
        0.08 = coord(2/25)
    
  4. Krüger, K.: Suchmaschinen-Spamming : Vergleichend-kritische Analysen zur Wirkung kommerzieller Strategien der Website-Optimierung auf das Ranking in www-Suchmaschinen (2004) 0.08
    0.07882458 = sum of:
      0.07882458 = product of:
        0.9853073 = sum of:
          0.16173172 = weight(abstract_txt:spamming in 4701) [ClassicSimilarity], result of:
            0.16173172 = score(doc=4701,freq=1.0), product of:
              0.18231721 = queryWeight, product of:
                1.319544 = boost
                9.462291 = idf(docFreq=8, maxDocs=42596)
                0.014601838 = queryNorm
              0.8870897 = fieldWeight in 4701, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.462291 = idf(docFreq=8, maxDocs=42596)
                0.09375 = fieldNorm(doc=4701)
          0.82357556 = weight(abstract_txt:spam in 4701) [ClassicSimilarity], result of:
            0.82357556 = score(doc=4701,freq=2.0), product of:
              0.7323969 = queryWeight, product of:
                5.913822 = boost
                8.481462 = idf(docFreq=23, maxDocs=42596)
                0.014601838 = queryNorm
              1.1244935 = fieldWeight in 4701, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.481462 = idf(docFreq=23, maxDocs=42596)
                0.09375 = fieldNorm(doc=4701)
        0.08 = coord(2/25)
    
  5. Heidrich, J.: Illegale E-Mail-Filterung : Eigenmächtiges Unterdrücken elektronischer Post ist strafbar (2005) 0.07
    0.07083904 = sum of:
      0.07083904 = product of:
        0.88548803 = sum of:
          0.1090135 = weight(abstract_txt:mail in 4240) [ClassicSimilarity], result of:
            0.1090135 = score(doc=4240,freq=1.0), product of:
              0.14577013 = queryWeight, product of:
                1.6686271 = boost
                5.982762 = idf(docFreq=291, maxDocs=42596)
                0.014601838 = queryNorm
              0.74784523 = fieldWeight in 4240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.982762 = idf(docFreq=291, maxDocs=42596)
                0.125 = fieldNorm(doc=4240)
          0.77647454 = weight(abstract_txt:spam in 4240) [ClassicSimilarity], result of:
            0.77647454 = score(doc=4240,freq=1.0), product of:
              0.7323969 = queryWeight, product of:
                5.913822 = boost
                8.481462 = idf(docFreq=23, maxDocs=42596)
                0.014601838 = queryNorm
              1.0601827 = fieldWeight in 4240, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.481462 = idf(docFreq=23, maxDocs=42596)
                0.125 = fieldNorm(doc=4240)
        0.08 = coord(2/25)
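
For the content-based matches, each final score is the sum of the matched term weights multiplied by coord, the fraction of the 25 query terms that matched. The sketch below uses the per-term weights exactly as printed in the trees above; the helper name `combine` is hypothetical. It illustrates why doc 3901, which matches 5 of 25 terms, ranks well above the documents that match only 2.

```python
def combine(term_weights, matched, total_clauses):
    """Sum the per-term weights and scale by coord = matched / total_clauses."""
    return sum(term_weights) * (matched / total_clauses)


# Per-term weights copied from the explanation trees (doc id -> weights, matched terms).
listed = {
    3901: ([0.032255594, 0.033095714, 0.16689217, 0.25764927, 1.2839739], 5),
    4684: ([0.15248214, 0.95098317], 2),
    4697: ([0.047693405, 0.96083814], 2),
    4701: ([0.16173172, 0.82357556], 2),
    4240: ([0.1090135, 0.77647454], 2),
}

scores = {doc_id: combine(weights, matched, 25)
          for doc_id, (weights, matched) in listed.items()}

for doc_id, score in scores.items():
    print(doc_id, round(score, 2))
```

In every case the "spam" term contributes most of the sum, so the ranking among the two-term matches is driven mainly by that term's frequency and the field norm.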