Document (#34722)

Author
Pera, M.S.
Ng, Y.-K.
Title
SpamED : a spam E-mail detection approach based on phrase similarity
Source
Journal of the American Society for Information Science and Technology. 60(2009) no.2, pp.393-409
Year
2009
Abstract
E-mail messages are unquestionably one of the most popular communication media these days. They are not only fast and reliable but also generally free. Unfortunately, a significant number of the e-mail messages that users receive on a daily basis are spam. This is annoying, since users waste time reviewing and deleting spam messages. In addition, spam messages consume resources such as storage, bandwidth, and computer-processing time. Many attempts have been made in the past to eradicate spam; however, none has proven highly effective. In this article, we propose a spam e-mail detection approach, called SpamED, which uses the similarity of phrases in messages to detect spam. Conducted experiments verify that SpamED, using trigrams in e-mail messages, is not only capable of minimizing false positives and false negatives in spam detection but also outperforms a number of existing e-mail filtering approaches, with a 96% accuracy rate.

Similar documents (author)

  1. Pera, M. Soledad => Soledad Pera, M.: 5.04
    5.0379567 = sum of:
      5.0379567 = weight(author_txt:pera in 3876) [ClassicSimilarity], result of:
        5.0379567 = fieldWeight in 3876, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.375 = fieldNorm(doc=3876)
    
  2. Azpiazu, I.M.; Soledad Pera, M.: Is cross-lingual readability assessment possible? (2020) 4.16
    4.156102 = sum of:
      4.156102 = weight(author_txt:pera in 5868) [ClassicSimilarity], result of:
        4.156102 = fieldWeight in 5868, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.4375 = fieldNorm(doc=5868)
    
  3. Pera, M.S.; Lund, W.; Ng, Y.-K.: ¬A sophisticated library search strategy using folksonomies and similarity matching (2009) 3.56
    3.5623734 = sum of:
      3.5623734 = weight(author_txt:pera in 2939) [ClassicSimilarity], result of:
        3.5623734 = fieldWeight in 2939, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.375 = fieldNorm(doc=2939)
    
  4. Denning, J.; Pera, M.S.; Ng, Y.-K.: ¬A readability level prediction tool for K-12 books (2016) 3.56
    3.5623734 = sum of:
      3.5623734 = weight(author_txt:pera in 2772) [ClassicSimilarity], result of:
        3.5623734 = fieldWeight in 2772, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.375 = fieldNorm(doc=2772)
    
  5. Soledad Pera, M.; Ng, Y.-K.: Recommending books to be exchanged online in the absence of wish lists (2018) 3.56
    3.5623734 = sum of:
      3.5623734 = weight(author_txt:pera in 4182) [ClassicSimilarity], result of:
        3.5623734 = fieldWeight in 4182, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.499662 = idf(docFreq=8, maxDocs=44218)
          0.375 = fieldNorm(doc=4182)
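
The author-match scores above can be reproduced from the values printed in each tree. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and are not part of the catalog software. Lucene computes in single precision, so the last digits may differ slightly.

```python
import math


def tf(freq: float) -> float:
    # ClassicSimilarity term frequency: square root of the raw frequency.
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as listed in each explain tree.
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Values taken from result 1 of the author list (doc 3876).
author_score = field_weight(freq=2.0, doc_freq=8, max_docs=44218, field_norm=0.375)
print(f"{author_score:.4f}")
```

With the inputs of results 3 to 5 (freq=1.0, fieldNorm=0.375) the same function gives the 3.56 shown for those entries.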
    

Similar documents (content)

  1. Ruano-Ordás, D.; Fdez-Riverola, F.; Méndez, J.R.: Using evolutionary computation for discovering spam patterns from e-mail samples (2018) 0.17
    0.17270854 = sum of:
      0.17270854 = product of:
        1.4392378 = sum of:
          0.28358665 = weight(abstract_txt:messages in 5088) [ClassicSimilarity], result of:
            0.28358665 = score(doc=5088,freq=2.0), product of:
              0.37148446 = queryWeight, product of:
                5.8291397 = boost
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.009223508 = queryNorm
              0.7633876 = fieldWeight in 5088, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.078125 = fieldNorm(doc=5088)
          0.15343909 = weight(abstract_txt:mail in 5088) [ClassicSimilarity], result of:
            0.15343909 = score(doc=5088,freq=1.0), product of:
              0.32716468 = queryWeight, product of:
                5.908684 = boost
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.009223508 = queryNorm
              0.4689965 = fieldWeight in 5088, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.078125 = fieldNorm(doc=5088)
          1.0022122 = weight(abstract_txt:spam in 5088) [ClassicSimilarity], result of:
            1.0022122 = score(doc=5088,freq=4.0), product of:
              0.7529385 = queryWeight, product of:
                9.582598 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.009223508 = queryNorm
              1.3310677 = fieldWeight in 5088, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.078125 = fieldNorm(doc=5088)
        0.12 = coord(3/25)
    
  2. Sedhai, S.; Sun, A.: ¬An analysis of 14 Million tweets on hashtag-oriented spamming* (2017) 0.13
    0.12842445 = sum of:
      0.12842445 = product of:
        1.0702038 = sum of:
          0.012310583 = weight(abstract_txt:only in 3683) [ClassicSimilarity], result of:
            0.012310583 = score(doc=3683,freq=1.0), product of:
              0.04651348 = queryWeight, product of:
                1.1908661 = boost
                4.234672 = idf(docFreq=1740, maxDocs=44218)
                0.009223508 = queryNorm
              0.264667 = fieldWeight in 3683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.234672 = idf(docFreq=1740, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.075929865 = weight(abstract_txt:detection in 3683) [ClassicSimilarity], result of:
            0.075929865 = score(doc=3683,freq=1.0), product of:
              0.17907374 = queryWeight, product of:
                2.8617723 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.009223508 = queryNorm
              0.4240145 = fieldWeight in 3683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
          0.98196334 = weight(abstract_txt:spam in 3683) [ClassicSimilarity], result of:
            0.98196334 = score(doc=3683,freq=6.0), product of:
              0.7529385 = queryWeight, product of:
                9.582598 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.009223508 = queryNorm
              1.3041747 = fieldWeight in 3683, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0625 = fieldNorm(doc=3683)
        0.12 = coord(3/25)
    
  3. Zilberman, P.; Katz, G.; Shabtai, A.; Elovici, Y.: Analyzing group E-mail exchange to detect data leakage (2013) 0.13
    0.1250806 = sum of:
      0.1250806 = product of:
        0.5211692 = sum of:
          0.017034085 = weight(abstract_txt:approach in 1050) [ClassicSimilarity], result of:
            0.017034085 = score(doc=1050,freq=4.0), product of:
              0.036384713 = queryWeight, product of:
                1.0532537 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.009223508 = queryNorm
              0.468166 = fieldWeight in 1050, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0625 = fieldNorm(doc=1050)
          0.0115728285 = weight(abstract_txt:time in 1050) [ClassicSimilarity], result of:
            0.0115728285 = score(doc=1050,freq=1.0), product of:
              0.044636082 = queryWeight, product of:
                1.1665854 = boost
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.009223508 = queryNorm
              0.2592707 = fieldWeight in 1050, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.0625 = fieldNorm(doc=1050)
          0.012310583 = weight(abstract_txt:only in 1050) [ClassicSimilarity], result of:
            0.012310583 = score(doc=1050,freq=1.0), product of:
              0.04651348 = queryWeight, product of:
                1.1908661 = boost
                4.234672 = idf(docFreq=1740, maxDocs=44218)
                0.009223508 = queryNorm
              0.264667 = fieldWeight in 1050, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.234672 = idf(docFreq=1740, maxDocs=44218)
                0.0625 = fieldNorm(doc=1050)
          0.072192624 = weight(abstract_txt:false in 1050) [ClassicSimilarity], result of:
            0.072192624 = score(doc=1050,freq=1.0), product of:
              0.15125914 = queryWeight, product of:
                2.1475058 = boost
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.009223508 = queryNorm
              0.47727776 = fieldWeight in 1050, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.0625 = fieldNorm(doc=1050)
          0.10738104 = weight(abstract_txt:detection in 1050) [ClassicSimilarity], result of:
            0.10738104 = score(doc=1050,freq=2.0), product of:
              0.17907374 = queryWeight, product of:
                2.8617723 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.009223508 = queryNorm
              0.59964705 = fieldWeight in 1050, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.0625 = fieldNorm(doc=1050)
          0.300678 = weight(abstract_txt:mail in 1050) [ClassicSimilarity], result of:
            0.300678 = score(doc=1050,freq=6.0), product of:
              0.32716468 = queryWeight, product of:
                5.908684 = boost
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.009223508 = queryNorm
              0.91904175 = fieldWeight in 1050, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.0625 = fieldNorm(doc=1050)
        0.24 = coord(6/25)
    
  4. Sebastiani, F.: Classification of text, automatic (2006) 0.10
    0.09578756 = sum of:
      0.09578756 = product of:
        0.7982297 = sum of:
          0.012775564 = weight(abstract_txt:approach in 5003) [ClassicSimilarity], result of:
            0.012775564 = score(doc=5003,freq=1.0), product of:
              0.036384713 = queryWeight, product of:
                1.0532537 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.009223508 = queryNorm
              0.3511245 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.18412691 = weight(abstract_txt:mail in 5003) [ClassicSimilarity], result of:
            0.18412691 = score(doc=5003,freq=1.0), product of:
              0.32716468 = queryWeight, product of:
                5.908684 = boost
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.009223508 = queryNorm
              0.5627958 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
          0.60132724 = weight(abstract_txt:spam in 5003) [ClassicSimilarity], result of:
            0.60132724 = score(doc=5003,freq=1.0), product of:
              0.7529385 = queryWeight, product of:
                9.582598 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.009223508 = queryNorm
              0.7986406 = fieldWeight in 5003, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.09375 = fieldNorm(doc=5003)
        0.12 = coord(3/25)
    
  5. Goodman, J.; Heckerman, D.; Rounthwaite, R.: Schutzwälle gegen Spam (2005) 0.09
    0.08796374 = sum of:
      0.08796374 = product of:
        1.0995468 = sum of:
          0.10740736 = weight(abstract_txt:mail in 3696) [ClassicSimilarity], result of:
            0.10740736 = score(doc=3696,freq=1.0), product of:
              0.32716468 = queryWeight, product of:
                5.908684 = boost
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.009223508 = queryNorm
              0.32829756 = fieldWeight in 3696, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3696)
          0.9921394 = weight(abstract_txt:spam in 3696) [ClassicSimilarity], result of:
            0.9921394 = score(doc=3696,freq=8.0), product of:
              0.7529385 = queryWeight, product of:
                9.582598 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.009223508 = queryNorm
              1.3176898 = fieldWeight in 3696, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3696)
        0.08 = coord(2/25)
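
The content-match scores combine several term scores and a coordination factor. As a rough sketch under the same ClassicSimilarity assumptions (queryWeight = boost * idf * queryNorm, fieldWeight = sqrt(freq) * idf * fieldNorm, coord = matched terms / query terms), the score of content result 1 (doc 5088) can be recomputed as follows; the helper names and the dictionary layout are hypothetical, and the boosts and queryNorm are copied from the trees above.

```python
import math

QUERY_NORM = 0.009223508  # queryNorm as printed in the explain output
MAX_DOCS = 44218          # maxDocs as printed in the explain output


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int, field_norm: float) -> float:
    # Score of one matching term: queryWeight * fieldWeight.
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms: list, query_term_count: int) -> float:
    # Sum of the term scores, scaled by coord(matched / total query terms).
    coord = len(terms) / query_term_count
    return coord * sum(term_score(**t) for t in terms)


# Terms "messages", "mail", and "spam" from content result 1 (doc 5088).
result_1 = [
    dict(boost=5.8291397, freq=2.0, doc_freq=119, field_norm=0.078125),
    dict(boost=5.908684, freq=1.0, doc_freq=296, field_norm=0.078125),
    dict(boost=9.582598, freq=4.0, doc_freq=23, field_norm=0.078125),
]
score_1 = document_score(result_1, query_term_count=25)
print(f"{score_1:.4f}")
```

The coord factor explains why result 3, which matches six of the 25 query terms, can rank close to documents whose individual term scores are much higher.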