Document (#34723)

Author
Pera, M.S.
Ng, Y.-K.
Title
SpamED : a spam E-mail detection approach based on phrase similarity
Source
Journal of the American Society for Information Science and Technology. 60(2009) no.2, pp. 393-409
Year
2009
Abstract
E-mail messages are unquestionably one of the most popular communication media these days. Not only are they fast and reliable, but they are also generally free. Unfortunately, a significant number of the e-mail messages that users receive daily are spam. This is annoying because users waste time reviewing and deleting spam messages. In addition, spam messages consume resources such as storage, bandwidth, and computer-processing time. Many attempts have been made in the past to eradicate spam; however, none has proven highly effective. In this article, we propose a spam e-mail detection approach, called SpamED, which uses the similarity of phrases in messages to detect spam. Experiments verify not only that SpamED, using trigrams in e-mail messages, is capable of minimizing false positives and false negatives in spam detection, but also that it outperforms a number of existing e-mail filtering approaches, with a 96% accuracy rate.
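
The abstract does not spell out how SpamED compares phrases, so the following is only an illustrative sketch of one common way to measure trigram overlap between two messages (Jaccard similarity over word trigrams). The function names and the choice of Jaccard similarity are assumptions made for illustration; this is not the authors' algorithm.

```python
def word_trigrams(text):
    """Return the set of consecutive three-word sequences in a message."""
    words = text.lower().split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def trigram_jaccard(message_a, message_b):
    """Jaccard overlap of word trigrams: |A & B| / |A | B| (assumed measure)."""
    a, b = word_trigrams(message_a), word_trigrams(message_b)
    if not a or not b:
        # A message shorter than three words has no trigrams to compare.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical example messages that share two of their three trigrams each.
similarity = trigram_jaccard("win a free prize now", "win a free prize today")
```

A filter built on such a measure would compare an incoming message against known spam and flag it when the similarity exceeds some threshold; how SpamED weights and thresholds phrases is described in the article itself.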

Similar documents (author)

  1. Pera, M. Soledad => Soledad Pera, M.: 5.23
    5.2349577 = sum of:
      5.2349577 = weight(author_txt:pera in 3876) [ClassicSimilarity], result of:
        5.2349577 = fieldWeight in 3876, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.871131 = idf(docFreq=5, maxDocs=42740)
          0.375 = fieldNorm(doc=3876)
    
  2. Pera, M.S.; Lund, W.; Ng, Y.-K.: A sophisticated library search strategy using folksonomies and similarity matching (2009) 3.70
    3.701674 = sum of:
      3.701674 = weight(author_txt:pera in 4940) [ClassicSimilarity], result of:
        3.701674 = fieldWeight in 4940, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.871131 = idf(docFreq=5, maxDocs=42740)
          0.375 = fieldNorm(doc=4940)
    
  3. Denning, J.; Pera, M.S.; Ng, Y.-K.: A readability level prediction tool for K-12 books (2016) 3.70
    3.701674 = sum of:
      3.701674 = weight(author_txt:pera in 4773) [ClassicSimilarity], result of:
        3.701674 = fieldWeight in 4773, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.871131 = idf(docFreq=5, maxDocs=42740)
          0.375 = fieldNorm(doc=4773)
    
  4. Soledad Pera, M.; Ng, Y.-K.: Recommending books to be exchanged online in the absence of wish lists (2018) 3.70
    3.701674 = sum of:
      3.701674 = weight(author_txt:pera in 183) [ClassicSimilarity], result of:
        3.701674 = fieldWeight in 183, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.871131 = idf(docFreq=5, maxDocs=42740)
          0.375 = fieldNorm(doc=183)
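
The author-match scores above are labeled ClassicSimilarity, Lucene's TF-IDF scoring, in which tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)). A minimal Python sketch, assuming those formulas and taking fieldNorm as given from the listing (function names are hypothetical), reproduces the field weights:

```python
import math


def classic_tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the breakdowns above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Entry 1: "pera" occurs twice in the author field (listing reports 5.2349577).
entry_1 = field_weight(freq=2.0, doc_freq=5, max_docs=42740, field_norm=0.375)

# Entries 2-4: "pera" occurs once (listing reports 3.701674).
entry_2 = field_weight(freq=1.0, doc_freq=5, max_docs=42740, field_norm=0.375)
```

Small differences in the last digits are expected, since Lucene computes these values in single precision.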
    

Similar documents (content)

  1. Ruano-Ordás, D.; Fdez-Riverola, F.; Méndez, J.R.: Using evolutionary computation for discovering spam patterns from e-mail samples (2018) 0.17
    0.17173657 = sum of:
      0.17173657 = product of:
        1.431138 = sum of:
          0.2855634 = weight(abstract_txt:messages in 1089) [ClassicSimilarity], result of:
            0.2855634 = score(doc=1089,freq=2.0), product of:
              0.37361097 = queryWeight, product of:
                5.758133 = boost
                6.9179583 = idf(docFreq=114, maxDocs=42740)
                0.009379075 = queryNorm
              0.7643336 = fieldWeight in 1089, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9179583 = idf(docFreq=114, maxDocs=42740)
                0.078125 = fieldNorm(doc=1089)
          0.15210867 = weight(abstract_txt:mail in 1089) [ClassicSimilarity], result of:
            0.15210867 = score(doc=1089,freq=1.0), product of:
              0.3256213 = queryWeight, product of:
                5.8063297 = boost
                5.979311 = idf(docFreq=293, maxDocs=42740)
                0.009379075 = queryNorm
              0.46713367 = fieldWeight in 1089, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.979311 = idf(docFreq=293, maxDocs=42740)
                0.078125 = fieldNorm(doc=1089)
          0.9934659 = weight(abstract_txt:spam in 1089) [ClassicSimilarity], result of:
            0.9934659 = score(doc=1089,freq=4.0), product of:
              0.7493582 = queryWeight, product of:
                9.416423 = boost
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.009379075 = queryNorm
              1.3257557 = fieldWeight in 1089, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.078125 = fieldNorm(doc=1089)
        0.12 = coord(3/25)
    
  2. Sedhai, S.; Sun, A.: An analysis of 14 Million tweets on hashtag-oriented spamming* (2017) 0.13
    0.12762803 = sum of:
      0.12762803 = product of:
        1.0635669 = sum of:
          0.012561066 = weight(abstract_txt:only in 5684) [ClassicSimilarity], result of:
            0.012561066 = score(doc=5684,freq=1.0), product of:
              0.0471931 = queryWeight, product of:
                1.1815456 = boost
                4.258611 = idf(docFreq=1642, maxDocs=42740)
                0.009379075 = queryNorm
              0.2661632 = fieldWeight in 5684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.258611 = idf(docFreq=1642, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
          0.07761199 = weight(abstract_txt:detection in 5684) [ClassicSimilarity], result of:
            0.07761199 = score(doc=5684,freq=1.0), product of:
              0.18190464 = queryWeight, product of:
                2.8410492 = boost
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.009379075 = queryNorm
              0.42666304 = fieldWeight in 5684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
          0.97339386 = weight(abstract_txt:spam in 5684) [ClassicSimilarity], result of:
            0.97339386 = score(doc=5684,freq=6.0), product of:
              0.7493582 = queryWeight, product of:
                9.416423 = boost
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.009379075 = queryNorm
              1.2989701 = fieldWeight in 5684, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.0625 = fieldNorm(doc=5684)
        0.12 = coord(3/25)
    
  3. Zilberman, P.; Katz, G.; Shabtai, A.; Elovici, Y.: Analyzing group E-mail exchange to detect data leakage (2013) 0.13
    0.1255457 = sum of:
      0.1255457 = product of:
        0.52310705 = sum of:
          0.017461194 = weight(abstract_txt:approach in 3051) [ClassicSimilarity], result of:
            0.017461194 = score(doc=3051,freq=4.0), product of:
              0.03703026 = queryWeight, product of:
                1.0466214 = boost
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.009379075 = queryNorm
              0.4715385 = fieldWeight in 3051, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.0625 = fieldNorm(doc=3051)
          0.011756587 = weight(abstract_txt:time in 3051) [ClassicSimilarity], result of:
            0.011756587 = score(doc=3051,freq=1.0), product of:
              0.04515595 = queryWeight, product of:
                1.1557628 = boost
                4.1656833 = idf(docFreq=1802, maxDocs=42740)
                0.009379075 = queryNorm
              0.2603552 = fieldWeight in 3051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1656833 = idf(docFreq=1802, maxDocs=42740)
                0.0625 = fieldNorm(doc=3051)
          0.012561066 = weight(abstract_txt:only in 3051) [ClassicSimilarity], result of:
            0.012561066 = score(doc=3051,freq=1.0), product of:
              0.0471931 = queryWeight, product of:
                1.1815456 = boost
                4.258611 = idf(docFreq=1642, maxDocs=42740)
                0.009379075 = queryNorm
              0.2661632 = fieldWeight in 3051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.258611 = idf(docFreq=1642, maxDocs=42740)
                0.0625 = fieldNorm(doc=3051)
          0.07349741 = weight(abstract_txt:false in 3051) [ClassicSimilarity], result of:
            0.07349741 = score(doc=3051,freq=1.0), product of:
              0.15324119 = queryWeight, product of:
                2.1291144 = boost
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.009379075 = queryNorm
              0.47961915 = fieldWeight in 3051, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.6739063 = idf(docFreq=53, maxDocs=42740)
                0.0625 = fieldNorm(doc=3051)
          0.10975993 = weight(abstract_txt:detection in 3051) [ClassicSimilarity], result of:
            0.10975993 = score(doc=3051,freq=2.0), product of:
              0.18190464 = queryWeight, product of:
                2.8410492 = boost
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.009379075 = queryNorm
              0.60339266 = fieldWeight in 3051, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8266087 = idf(docFreq=125, maxDocs=42740)
                0.0625 = fieldNorm(doc=3051)
          0.2980709 = weight(abstract_txt:mail in 3051) [ClassicSimilarity], result of:
            0.2980709 = score(doc=3051,freq=6.0), product of:
              0.3256213 = queryWeight, product of:
                5.8063297 = boost
                5.979311 = idf(docFreq=293, maxDocs=42740)
                0.009379075 = queryNorm
              0.9153913 = fieldWeight in 3051, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.979311 = idf(docFreq=293, maxDocs=42740)
                0.0625 = fieldNorm(doc=3051)
        0.24 = coord(6/25)
    
  4. Sebastiani, F.: Classification of text, automatic (2006) 0.10
    0.0950047 = sum of:
      0.0950047 = product of:
        0.79170585 = sum of:
          0.013095896 = weight(abstract_txt:approach in 4) [ClassicSimilarity], result of:
            0.013095896 = score(doc=4,freq=1.0), product of:
              0.03703026 = queryWeight, product of:
                1.0466214 = boost
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.009379075 = queryNorm
              0.35365388 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.09375 = fieldNorm(doc=4)
          0.18253042 = weight(abstract_txt:mail in 4) [ClassicSimilarity], result of:
            0.18253042 = score(doc=4,freq=1.0), product of:
              0.3256213 = queryWeight, product of:
                5.8063297 = boost
                5.979311 = idf(docFreq=293, maxDocs=42740)
                0.009379075 = queryNorm
              0.5605604 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.979311 = idf(docFreq=293, maxDocs=42740)
                0.09375 = fieldNorm(doc=4)
          0.5960795 = weight(abstract_txt:spam in 4) [ClassicSimilarity], result of:
            0.5960795 = score(doc=4,freq=1.0), product of:
              0.7493582 = queryWeight, product of:
                9.416423 = boost
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.009379075 = queryNorm
              0.7954534 = fieldWeight in 4, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.09375 = fieldNorm(doc=4)
        0.12 = coord(3/25)
    
  5. Goodman, J.; Heckerman, D.; Rounthwaite, R.: Schutzwälle gegen Spam (2005) 0.09
    0.087196566 = sum of:
      0.087196566 = product of:
        1.0899571 = sum of:
          0.106476076 = weight(abstract_txt:mail in 4697) [ClassicSimilarity], result of:
            0.106476076 = score(doc=4697,freq=1.0), product of:
              0.3256213 = queryWeight, product of:
                5.8063297 = boost
                5.979311 = idf(docFreq=293, maxDocs=42740)
                0.009379075 = queryNorm
              0.32699358 = fieldWeight in 4697, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.979311 = idf(docFreq=293, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4697)
          0.983481 = weight(abstract_txt:spam in 4697) [ClassicSimilarity], result of:
            0.983481 = score(doc=4697,freq=8.0), product of:
              0.7493582 = queryWeight, product of:
                9.416423 = boost
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.009379075 = queryNorm
              1.3124311 = fieldWeight in 4697, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.0546875 = fieldNorm(doc=4697)
        0.08 = coord(2/25)
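
The content-match breakdowns combine three pieces per matching term: a queryWeight (boost * idf * queryNorm), a fieldWeight (tf * idf * fieldNorm), and a final coord factor equal to the fraction of query terms that matched. The sketch below, assuming the ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), recomputes the score of the first content result (doc 1089) from the values shown in the listing. Function names are hypothetical, and boost, queryNorm, and fieldNorm are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 42740
QUERY_NORM = 0.009379075  # copied from the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, total_query_terms):
    """Sum of term scores scaled by coord = matched / total query terms."""
    coord = len(term_scores) / total_query_terms
    return sum(term_scores) * coord


# Doc 1089 matches "messages", "mail", and "spam" out of 25 query terms.
# Each tuple is (freq, docFreq, boost), with fieldNorm 0.078125 for all three.
matches_1089 = [(2.0, 114, 5.758133), (1.0, 293, 5.8063297), (4.0, 23, 9.416423)]
scores_1089 = [term_score(f, df, b, 0.078125) for f, df, b in matches_1089]
total_1089 = document_score(scores_1089, total_query_terms=25)
# The listing reports 0.17173657 for this document.
```

The coord factor explains why result 3 ranks below results 1 and 2 despite matching six terms: its matches are mostly low-boost, common words, so doubling coord to 6/25 does not make up for the weaker "spam" contribution.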