Document (#42089)

Author
Ruano-Ordás, D.
Fdez-Riverola, F.
Méndez, J.R.
Title
Using evolutionary computation for discovering spam patterns from e-mail samples
Source
Information Processing and Management. 54(2018) no.2, pp. 303-317
Year
2018
Abstract
One of the most relevant problems affecting the efficient use of e-mail for worldwide communication is the spam phenomenon. Spamming involves flooding the Internet with undesired messages aimed at promoting illegal or low-value products and services. Beyond well-known machine learning techniques, collaborative schemes and other complementary approaches, some popular anti-spam frameworks such as SpamAssassin or Wirebrush4SPAM have enabled the use of regular expressions to effectively improve filter performance. In this work, we provide a review of existing proposals to automatically generate fully functional regular expressions from any input dataset combining spam and ham messages. Due to the configuration difficulties and low performance of the analysed schemes, we introduce DiscoverRegex, a novel automatic spam pattern-finding tool. Patterns generated by DiscoverRegex outperform those created by existing approaches, being able to avoid false positive (FP) errors, whilst minimising the computational resources required for its proper operation. DiscoverRegex source code is publicly available at https://github.com/sing-group/DiscoverRegex.
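The abstract's criterion of regular expressions that catch spam without causing false positive (FP) errors on ham can be illustrated with a minimal Python sketch. It is not DiscoverRegex's algorithm, which the abstract does not detail. The toy messages, the two patterns and the helper names (`evaluate_pattern`, `is_fp_free`) are hypothetical; the check simply counts how many messages of each class a candidate pattern matches:

```python
import re
from dataclasses import dataclass


@dataclass
class PatternStats:
    spam_hits: int  # spam messages matched (true positives)
    ham_hits: int   # ham messages matched (false positives)


def evaluate_pattern(pattern: str, spam: list[str], ham: list[str]) -> PatternStats:
    """Count how many spam and ham messages a candidate regex matches."""
    regex = re.compile(pattern, re.IGNORECASE)
    spam_hits = sum(1 for msg in spam if regex.search(msg))
    ham_hits = sum(1 for msg in ham if regex.search(msg))
    return PatternStats(spam_hits, ham_hits)


def is_fp_free(stats: PatternStats) -> bool:
    """Keep a pattern only if it catches some spam and never fires on ham."""
    return stats.spam_hits > 0 and stats.ham_hits == 0


# Hypothetical toy corpus, invented for illustration only.
SPAM = ["Buy cheap v1agra now!!!", "CHEAP meds, buy now"]
HAM = ["Meeting moved to 3pm", "Can you buy milk on the way home?"]

naive = evaluate_pattern(r"\bbuy\b", SPAM, HAM)
specific = evaluate_pattern(r"\bbuy\s+cheap\b|\bcheap\b.*\bbuy\s+now\b", SPAM, HAM)
print(naive, is_fp_free(naive))
print(specific, is_fp_free(specific))
```

The naive pattern also fires on a legitimate message, so a filter relying on it would produce a false positive; the more specific pattern catches the same spam without touching ham. Finding patterns of the second kind automatically, rather than writing them by hand, is the problem the abstract addresses.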
Content
Cf.: https://doi.org/10.1016/j.ipm.2017.12.001.
Object
EMail

Similar documents (author)

  1. Martinez Méndez, F.J.: Aproximacion general a la evaluacion de la recuperacion mediante motores de busqueda en Internet [General approach to the evaluation of retrieval by means of Internet search engines] (2001) 1.71
    1.7147765 = sum of:
      1.7147765 = product of:
        3.429553 = sum of:
          3.429553 = weight(author_txt:méndez in 3803) [ClassicSimilarity], result of:
            3.429553 = score(doc=3803,freq=1.0), product of:
              0.72203684 = queryWeight, product of:
                1.0215797 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.07440103 = queryNorm
              4.749831 = fieldWeight in 3803, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.5 = fieldNorm(doc=3803)
        0.5 = coord(1/2)
    
  2. Valdivia, J. Fdez- -> Fdez-Valdivia, J.: 1.71
    1.7059526 = sum of:
      1.7059526 = product of:
        3.4119053 = sum of:
          3.4119053 = weight(author_txt:fdez in 41) [ClassicSimilarity], result of:
            3.4119053 = score(doc=41,freq=2.0), product of:
              0.6918546 = queryWeight, product of:
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.07440103 = queryNorm
              4.9315352 = fieldWeight in 41, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.375 = fieldNorm(doc=41)
        0.5 = coord(1/2)
    
  3. Valdivia, J. Fdez -> Fdez-Valdivia, J.: 1.71
    1.7059526 = sum of:
      1.7059526 = product of:
        3.4119053 = sum of:
          3.4119053 = weight(author_txt:fdez in 502) [ClassicSimilarity], result of:
            3.4119053 = score(doc=502,freq=2.0), product of:
              0.6918546 = queryWeight, product of:
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.07440103 = queryNorm
              4.9315352 = fieldWeight in 502, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.298992 = idf(docFreq=10, maxDocs=44218)
                0.375 = fieldNorm(doc=502)
        0.5 = coord(1/2)
    
  4. Rodríguez, E.M.M. -> Méndez Rodríguez, E.M.: 1.50
    1.5004294 = sum of:
      1.5004294 = product of:
        3.0008588 = sum of:
          3.0008588 = weight(author_txt:méndez in 2856) [ClassicSimilarity], result of:
            3.0008588 = score(doc=2856,freq=1.0), product of:
              0.72203684 = queryWeight, product of:
                1.0215797 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.07440103 = queryNorm
              4.156102 = fieldWeight in 2856, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.4375 = fieldNorm(doc=2856)
        0.5 = coord(1/2)
    
  5. Greenberg, J.; Méndez Rodríguez, E.M.: Introduction: toward a more library-like Web via semantic knitting (2006) 1.50
    1.5004294 = sum of:
      1.5004294 = product of:
        3.0008588 = sum of:
          3.0008588 = weight(author_txt:méndez in 224) [ClassicSimilarity], result of:
            3.0008588 = score(doc=224,freq=1.0), product of:
              0.72203684 = queryWeight, product of:
                1.0215797 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.07440103 = queryNorm
              4.156102 = fieldWeight in 224, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.4375 = fieldNorm(doc=224)
        0.5 = coord(1/2)
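The explanation trees above follow Lucene's ClassicSimilarity (TF-IDF) scoring, in the older form that still reports queryNorm and coord. Each matching term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm, fieldWeight = tf × idf × fieldNorm, tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)); the result is then scaled by coord. A minimal Python sketch recomputing entries 1 and 2 from the parameters printed in their traces (the function names are illustrative, not Lucene's API, and the last digits can differ slightly because Lucene computes in 32-bit floats):

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def term_weight(freq: float, doc_freq: int, max_docs: int,
                field_norm: float, query_norm: float,
                boost: float = 1.0) -> float:
    """weight = queryWeight * fieldWeight for one matching term."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


# Entry 1: one of two query clauses matched, so coord(1/2) = 0.5.
entry1 = 0.5 * term_weight(freq=1.0, doc_freq=8, max_docs=44218,
                           field_norm=0.5, query_norm=0.07440103,
                           boost=1.0215797)
# Entry 2: the term occurs twice (tf = sqrt(2)) and carries no extra boost.
entry2 = 0.5 * term_weight(freq=2.0, doc_freq=10, max_docs=44218,
                           field_norm=0.375, query_norm=0.07440103)

print(f"{entry1:.4f} {entry2:.4f}")  # prints: 1.7148 1.7060
```

Both entries display as 1.71 because the higher tf of entry 2 (√2) roughly offsets its smaller idf and fieldNorm.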
    

Similar documents (content)

  1. Pera, M.S.; Ng, Y.-K.: SpamED : a spam E-mail detection approach based on phrase similarity (2009) 0.36
    0.35829756 = sum of:
      0.35829756 = product of:
        1.7914877 = sum of:
          0.031169167 = weight(abstract_txt:approaches in 2721) [ClassicSimilarity], result of:
            0.031169167 = score(doc=2721,freq=1.0), product of:
              0.08657203 = queryWeight, product of:
                1.2865142 = boost
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.014601768 = queryNorm
              0.3600374 = fieldWeight in 2721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6084785 = idf(docFreq=1197, maxDocs=44218)
                0.078125 = fieldNorm(doc=2721)
          0.032042213 = weight(abstract_txt:existing in 2721) [ClassicSimilarity], result of:
            0.032042213 = score(doc=2721,freq=1.0), product of:
              0.08818115 = queryWeight, product of:
                1.2984154 = boost
                4.6511106 = idf(docFreq=1147, maxDocs=44218)
                0.014601768 = queryNorm
              0.36336803 = fieldWeight in 2721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6511106 = idf(docFreq=1147, maxDocs=44218)
                0.078125 = fieldNorm(doc=2721)
          0.16875897 = weight(abstract_txt:mail in 2721) [ClassicSimilarity], result of:
            0.16875897 = score(doc=2721,freq=6.0), product of:
              0.14689994 = queryWeight, product of:
                1.6758555 = boost
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.014601768 = queryNorm
              1.1488022 = fieldWeight in 2721, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.078125 = fieldNorm(doc=2721)
          0.25730503 = weight(abstract_txt:messages in 2721) [ClassicSimilarity], result of:
            0.25730503 = score(doc=2721,freq=6.0), product of:
              0.19459987 = queryWeight, product of:
                1.9288437 = boost
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.014601768 = queryNorm
              1.322226 = fieldWeight in 2721, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.078125 = fieldNorm(doc=2721)
          1.3022122 = weight(abstract_txt:spam in 2721) [ClassicSimilarity], result of:
            1.3022122 = score(doc=2721,freq=7.0), product of:
              0.73954165 = queryWeight, product of:
                5.9453454 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.014601768 = queryNorm
              1.760837 = fieldWeight in 2721, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.078125 = fieldNorm(doc=2721)
        0.2 = coord(5/25)
    
  2. Sedhai, S.; Sun, A.: An analysis of 14 Million tweets on hashtag-oriented spamming* (2017) 0.09
    0.08951429 = sum of:
      0.08951429 = product of:
        1.1189287 = sum of:
          0.15443721 = weight(abstract_txt:spamming in 3683) [ClassicSimilarity], result of:
            0.15443721 = score(doc=3683,freq=2.0), product of:
              0.18392839 = queryWeight, product of:
                1.3259745 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.014601768 = queryNorm
              0.83965945 = fieldWeight in 3683, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.9644915 = weight(abstract_txt:spam in 3683) [ClassicSimilarity], result of:
            0.9644915 = score(doc=3683,freq=6.0), product of:
              0.73954165 = queryWeight, product of:
                5.9453454 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.014601768 = queryNorm
              1.3041747 = fieldWeight in 3683, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
        0.08 = coord(2/25)
    
  3. Goodman, J.; Heckerman, D.; Rounthwaite, R.: Schutzwälle gegen Spam [Barriers against spam] (2005) 0.08
    0.08181706 = sum of:
      0.08181706 = product of:
        1.0227133 = sum of:
          0.04822689 = weight(abstract_txt:mail in 3696) [ClassicSimilarity], result of:
            0.04822689 = score(doc=3696,freq=1.0), product of:
              0.14689994 = queryWeight, product of:
                1.6758555 = boost
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.014601768 = queryNorm
              0.32829756 = fieldWeight in 3696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3696)
          0.9744865 = weight(abstract_txt:spam in 3696) [ClassicSimilarity], result of:
            0.9744865 = score(doc=3696,freq=8.0), product of:
              0.73954165 = queryWeight, product of:
                5.9453454 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.014601768 = queryNorm
              1.3176898 = fieldWeight in 3696, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3696)
        0.08 = coord(2/25)
    
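  4. Krüger, K.: Suchmaschinen-Spamming : Vergleichend-kritische Analysen zur Wirkung kommerzieller Strategien der Website-Optimierung auf das Ranking in www-Suchmaschinen [Search engine spamming: comparative critical analyses of the effect of commercial website optimisation strategies on ranking in WWW search engines] (2004) 0.08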
    0.079926364 = sum of:
      0.079926364 = product of:
        0.9990796 = sum of:
          0.1638054 = weight(abstract_txt:spamming in 3700) [ClassicSimilarity], result of:
            0.1638054 = score(doc=3700,freq=1.0), product of:
              0.18392839 = queryWeight, product of:
                1.3259745 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.014601768 = queryNorm
              0.89059335 = fieldWeight in 3700, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.09375 = fieldNorm(doc=3700)
          0.83527416 = weight(abstract_txt:spam in 3700) [ClassicSimilarity], result of:
            0.83527416 = score(doc=3700,freq=2.0), product of:
              0.73954165 = queryWeight, product of:
                5.9453454 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.014601768 = queryNorm
              1.1294484 = fieldWeight in 3700, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.09375 = fieldNorm(doc=3700)
        0.08 = coord(2/25)
    
  5. Heidrich, J.: Illegale E-Mail-Filterung : Eigenmächtiges Unterdrücken elektronischer Post ist strafbar [Illegal e-mail filtering: unauthorised suppression of electronic mail is a criminal offence] (2005) 0.07
    0.07181895 = sum of:
      0.07181895 = product of:
        0.8977369 = sum of:
          0.11023289 = weight(abstract_txt:mail in 3239) [ClassicSimilarity], result of:
            0.11023289 = score(doc=3239,freq=1.0), product of:
              0.14689994 = queryWeight, product of:
                1.6758555 = boost
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.014601768 = queryNorm
              0.7503944 = fieldWeight in 3239, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.125 = fieldNorm(doc=3239)
          0.787504 = weight(abstract_txt:spam in 3239) [ClassicSimilarity], result of:
            0.787504 = score(doc=3239,freq=1.0), product of:
              0.73954165 = queryWeight, product of:
                5.9453454 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.014601768 = queryNorm
              1.0648541 = fieldWeight in 3239, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.125 = fieldNorm(doc=3239)
        0.08 = coord(2/25)
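In the content-based list, a document's score is the sum of its matching terms' weights multiplied by coord, the fraction of the 25 query clauses it matched. A minimal Python sketch of that final combination step, reusing the per-term weights printed in the traces for entries 1 and 2 (the helper names are illustrative, and the sketch assumes, as holds for these two traces, that each listed term corresponds to exactly one matched clause):

```python
# Per-term weights copied from the explain traces for content entries 1 and 2.
PERA_NG_TERMS = [0.031169167, 0.032042213, 0.16875897, 0.25730503, 1.3022122]
SEDHAI_SUN_TERMS = [0.15443721, 0.9644915]

QUERY_CLAUSES = 25  # the denominator shown in coord(5/25) and coord(2/25)


def coord(matched: int, total: int) -> float:
    """Fraction of query clauses that matched the document."""
    return matched / total


def document_score(term_weights: list[float], total_clauses: int) -> float:
    """Sum the matching terms' weights, then scale by the coord factor."""
    return sum(term_weights) * coord(len(term_weights), total_clauses)


print(f"{document_score(PERA_NG_TERMS, QUERY_CLAUSES):.4f}")     # prints: 0.3583
print(f"{document_score(SEDHAI_SUN_TERMS, QUERY_CLAUSES):.4f}")  # prints: 0.0895
```

The coord factor accounts for much of the gap between the first content match and the rest: Pera and Ng match 5 of 25 clauses (0.2), whereas the other four documents match only 2 (0.08), even though their "spam" term weights are of the same order.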