Document (#38051)

Author
Zilberman, P.
Katz, G.
Shabtai, A.
Elovici, Y.
Title
Analyzing group E-mail exchange to detect data leakage
Source
Journal of the American Society for Information Science and Technology. 64(2013) no.9, p.1768-1779
Year
2013
Abstract
Today's organizations spend a great deal of time and effort on e-mail leakage prevention. However, there are still no satisfactory solutions; addressing mistakes are not detected and in some cases correct recipients are wrongly marked as potential mistakes. In this article we present a new approach for preventing e-mail addressing mistakes in organizations. The approach is based on an analysis of e-mail exchanges among members of an organization and the identification of groups based on common topics. When a new e-mail is about to be sent, each recipient is analyzed. A recipient is approved if the e-mail's content belongs to at least one topic common to both the sender and the recipient. This can be applied even if the sender and recipient have never communicated directly before. The new approach was evaluated using the Enron e-mail data set and was compared with a well-known method for the detection of e-mail addressing mistakes. The results show that the proposed approach is capable of detecting 87% of nonlegitimate recipients while incorrectly classifying only 0.5% of the legitimate recipients. These results outperform previous work, which reports a detection rate of 82% without reference to the false positive rate.
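
The approval rule stated in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how topics are identified or how an e-mail is assigned to them, so the sketch assumes that step has already produced a set of topic labels per user and per e-mail. The function name check_recipients, the user_topics mapping, and all topic labels are hypothetical.

```python
from typing import Dict, Iterable, Set


def check_recipients(
    email_topics: Set[str],
    sender: str,
    recipients: Iterable[str],
    user_topics: Dict[str, Set[str]],
) -> Dict[str, bool]:
    """Approve a recipient if the e-mail matches a topic shared with the sender."""
    sender_topics = user_topics.get(sender, set())
    decisions = {}
    for recipient in recipients:
        # Topics that both the sender and this recipient belong to.
        shared = sender_topics & user_topics.get(recipient, set())
        # Approved only if the e-mail falls under at least one shared topic.
        decisions[recipient] = bool(shared & email_topics)
    return decisions


# Hypothetical topic assignments, for illustration only.
user_topics = {
    "alice": {"gas-trading", "legal"},
    "bob": {"legal"},
    "carol": {"hr"},
}
decisions = check_recipients({"legal"}, "alice", ["bob", "carol"], user_topics)
print(decisions)
```

Because the check relies only on topic membership, it needs no record of earlier direct messages between sender and recipient, in line with the abstract's remark that the approach applies even when the two have never communicated directly.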

Similar documents (author)

  1. Katz, M.: Multimedia: the future of information delivery to homes and business (1993) 5.54
    5.5397964 = sum of:
      5.5397964 = weight(author_txt:katz in 6646) [ClassicSimilarity], result of:
        5.5397964 = fieldWeight in 6646, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.863674 = idf(docFreq=16, maxDocs=44218)
          0.625 = fieldNorm(doc=6646)
    
  2. Katz, B.: Community college reference services : a working guide for and by librarians (1992) 5.54
    5.5397964 = sum of:
      5.5397964 = weight(author_txt:katz in 661) [ClassicSimilarity], result of:
        5.5397964 = fieldWeight in 661, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.863674 = idf(docFreq=16, maxDocs=44218)
          0.625 = fieldNorm(doc=661)
    
  3. Katz, J.S.: Bibliometric standards : personal experience and lessons learned (1996) 5.54
    5.5397964 = sum of:
      5.5397964 = weight(author_txt:katz in 5058) [ClassicSimilarity], result of:
        5.5397964 = fieldWeight in 5058, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.863674 = idf(docFreq=16, maxDocs=44218)
          0.625 = fieldNorm(doc=5058)
    
  4. Katz, W.A.: Introduction to reference work (1997) 5.54
    5.5397964 = sum of:
      5.5397964 = weight(author_txt:katz in 1188) [ClassicSimilarity], result of:
        5.5397964 = fieldWeight in 1188, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.863674 = idf(docFreq=16, maxDocs=44218)
          0.625 = fieldNorm(doc=1188)
    
  5. Katz, W.A.: Introduction to reference work : Vol.1: Basic information sources; vol.2: Reference services and reference processes (1992) 5.54
    5.5397964 = sum of:
      5.5397964 = weight(author_txt:katz in 3364) [ClassicSimilarity], result of:
        5.5397964 = fieldWeight in 3364, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.863674 = idf(docFreq=16, maxDocs=44218)
          0.625 = fieldNorm(doc=3364)
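
The author scores above can be reproduced by hand. The sketch below assumes the classic TF-IDF form idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq); this is inferred from the numbers in the listing, which it matches, and is not taken from the catalog's configuration. The function names are illustrative.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency in the form that reproduces the listed values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # Term frequency factor: square root of the raw count.
    return math.sqrt(freq)


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, as in the explanation tree.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: author term "katz", docFreq=16, maxDocs=44218.
idf = classic_idf(16, 44218)
score = field_weight(1.0, 16, 44218, 0.625)
print(round(idf, 4), round(score, 4))
```

All five author hits share the same term frequency, idf, and fieldNorm, which is why they all receive the identical score of about 5.54.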
    

Similar documents (content)

  1. Pera, M.S.; Ng, Y.-K.: SpamED : a spam E-mail detection approach based on phrase similarity (2009) 0.14
    0.1379746 = sum of:
      0.1379746 = product of:
        0.68987304 = sum of:
          0.07373753 = weight(abstract_txt:false in 2721) [ClassicSimilarity], result of:
            0.07373753 = score(doc=2721,freq=2.0), product of:
              0.087396175 = queryWeight, product of:
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.011444616 = queryNorm
              0.8437158 = fieldWeight in 2721, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.636444 = idf(docFreq=57, maxDocs=44218)
                0.078125 = fieldNorm(doc=2721)
          0.053801153 = weight(abstract_txt:rate in 2721) [ClassicSimilarity], result of:
            0.053801153 = score(doc=2721,freq=1.0), product of:
              0.11243833 = queryWeight, product of:
                1.6040798 = boost
                6.124733 = idf(docFreq=262, maxDocs=44218)
                0.011444616 = queryNorm
              0.47849476 = fieldWeight in 2721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.124733 = idf(docFreq=262, maxDocs=44218)
                0.078125 = fieldNorm(doc=2721)
          0.10340634 = weight(abstract_txt:detection in 2721) [ClassicSimilarity], result of:
            0.10340634 = score(doc=2721,freq=2.0), product of:
              0.13795628 = queryWeight, product of:
                1.776804 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.011444616 = queryNorm
              0.7495588 = fieldWeight in 2721, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.078125 = fieldNorm(doc=2721)
          0.024605354 = weight(abstract_txt:approach in 2721) [ClassicSimilarity], result of:
            0.024605354 = score(doc=2721,freq=1.0), product of:
              0.084091045 = queryWeight, product of:
                1.9618177 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.011444616 = queryNorm
              0.29260373 = fieldWeight in 2721, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.078125 = fieldNorm(doc=2721)
          0.43432266 = weight(abstract_txt:mail in 2721) [ClassicSimilarity], result of:
            0.43432266 = score(doc=2721,freq=6.0), product of:
              0.37806568 = queryWeight, product of:
                5.5028343 = boost
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.011444616 = queryNorm
              1.1488022 = fieldWeight in 2721, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.003155 = idf(docFreq=296, maxDocs=44218)
                0.078125 = fieldNorm(doc=2721)
        0.2 = coord(5/25)
    
  2. Cyr, S.; Choo, C.W.: The individual and social dynamics of knowledge sharing : an exploratory study (2010) 0.07
    0.0653389 = sum of:
      0.0653389 = product of:
        0.5444909 = sum of:
          0.01722375 = weight(abstract_txt:approach in 3619) [ClassicSimilarity], result of:
            0.01722375 = score(doc=3619,freq=1.0), product of:
              0.084091045 = queryWeight, product of:
                1.9618177 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.011444616 = queryNorm
              0.20482263 = fieldWeight in 3619, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3619)
          0.18272243 = weight(abstract_txt:recipients in 3619) [ClassicSimilarity], result of:
            0.18272243 = score(doc=3619,freq=1.0), product of:
              0.36887535 = queryWeight, product of:
                3.5583956 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.011444616 = queryNorm
              0.49535006 = fieldWeight in 3619, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3619)
          0.34454468 = weight(abstract_txt:recipient in 3619) [ClassicSimilarity], result of:
            0.34454468 = score(doc=3619,freq=2.0), product of:
              0.49183375 = queryWeight, product of:
                4.7445273 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.011444616 = queryNorm
              0.70053077 = fieldWeight in 3619, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3619)
        0.12 = coord(3/25)
    
  3. MacFarlane, A.; Missaoui, S.; Makri, S.; Gutierrez Lopez, M.: Sender vs. recipient-orientated information systems revisited (2022) 0.06
    0.060473002 = sum of:
      0.060473002 = product of:
        0.37795627 = sum of:
          0.020878334 = weight(abstract_txt:approach in 607) [ClassicSimilarity], result of:
            0.020878334 = score(doc=607,freq=2.0), product of:
              0.084091045 = queryWeight, product of:
                1.9618177 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.011444616 = queryNorm
              0.24828249 = fieldWeight in 607, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.046875 = fieldNorm(doc=607)
          0.08222991 = weight(abstract_txt:sender in 607) [ClassicSimilarity], result of:
            0.08222991 = score(doc=607,freq=1.0), product of:
              0.20971961 = queryWeight, product of:
                2.1907272 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.011444616 = queryNorm
              0.39209452 = fieldWeight in 607, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.046875 = fieldNorm(doc=607)
          0.0660224 = weight(abstract_txt:addressing in 607) [ClassicSimilarity], result of:
            0.0660224 = score(doc=607,freq=1.0), product of:
              0.2073849 = queryWeight, product of:
                2.6681054 = boost
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.011444616 = queryNorm
              0.31835684 = fieldWeight in 607, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7916126 = idf(docFreq=134, maxDocs=44218)
                0.046875 = fieldNorm(doc=607)
          0.2088256 = weight(abstract_txt:recipient in 607) [ClassicSimilarity], result of:
            0.2088256 = score(doc=607,freq=1.0), product of:
              0.49183375 = queryWeight, product of:
                4.7445273 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.011444616 = queryNorm
              0.42458576 = fieldWeight in 607, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.046875 = fieldNorm(doc=607)
        0.16 = coord(4/25)
    
  4. Alberts, I.; Forest, D.: Email pragmatics and automatic classification : a study in the organizational context (2012) 0.06
    0.05922028 = sum of:
      0.05922028 = product of:
        0.49350235 = sum of:
          0.1054283 = weight(abstract_txt:rate in 238) [ClassicSimilarity], result of:
            0.1054283 = score(doc=238,freq=6.0), product of:
              0.11243833 = queryWeight, product of:
                1.6040798 = boost
                6.124733 = idf(docFreq=262, maxDocs=44218)
                0.011444616 = queryNorm
              0.93765444 = fieldWeight in 238, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.124733 = idf(docFreq=262, maxDocs=44218)
                0.0625 = fieldNorm(doc=238)
          0.10963988 = weight(abstract_txt:sender in 238) [ClassicSimilarity], result of:
            0.10963988 = score(doc=238,freq=1.0), product of:
              0.20971961 = queryWeight, product of:
                2.1907272 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.011444616 = queryNorm
              0.5227927 = fieldWeight in 238, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.0625 = fieldNorm(doc=238)
          0.27843416 = weight(abstract_txt:recipient in 238) [ClassicSimilarity], result of:
            0.27843416 = score(doc=238,freq=1.0), product of:
              0.49183375 = queryWeight, product of:
                4.7445273 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.011444616 = queryNorm
              0.56611437 = fieldWeight in 238, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.0625 = fieldNorm(doc=238)
        0.12 = coord(3/25)
    
  5. Foster, J.: On the interpretative authority of information systems (1999) 0.06
    0.058017243 = sum of:
      0.058017243 = product of:
        0.48347703 = sum of:
          0.057124715 = weight(abstract_txt:approach in 274) [ClassicSimilarity], result of:
            0.057124715 = score(doc=274,freq=11.0), product of:
              0.084091045 = queryWeight, product of:
                1.9618177 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.011444616 = queryNorm
              0.67931986 = fieldWeight in 274, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0546875 = fieldNorm(doc=274)
          0.18272243 = weight(abstract_txt:recipients in 274) [ClassicSimilarity], result of:
            0.18272243 = score(doc=274,freq=1.0), product of:
              0.36887535 = queryWeight, product of:
                3.5583956 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.011444616 = queryNorm
              0.49535006 = fieldWeight in 274, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.0546875 = fieldNorm(doc=274)
          0.24362987 = weight(abstract_txt:recipient in 274) [ClassicSimilarity], result of:
            0.24362987 = score(doc=274,freq=1.0), product of:
              0.49183375 = queryWeight, product of:
                4.7445273 = boost
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.011444616 = queryNorm
              0.49535006 = fieldWeight in 274, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.05783 = idf(docFreq=13, maxDocs=44218)
                0.0546875 = fieldNorm(doc=274)
        0.12 = coord(3/25)
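
The content scores combine per-term products with a coordination factor. The sketch below recomputes the score of hit 5 (Foster, doc=274) from the figures in its listing, assuming each term contributes queryWeight * fieldWeight, with queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm, and that the sum is multiplied by coord = matched terms / query terms. The boost and queryNorm values are copied from the listing as given; how they were derived is not shown there. The function and variable names are illustrative.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.011444616  # copied from the listing
FIELD_NORM = 0.0546875    # fieldNorm(doc=274)


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # Form that reproduces the idf values shown in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, boost: float) -> float:
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


# (term, freq, docFreq, boost) for the three matching terms of hit 5.
terms = [
    ("approach", 11.0, 2839, 1.9618177),
    ("recipients", 1.0, 13, 3.5583956),
    ("recipient", 1.0, 13, 4.7445273),
]

term_sum = sum(term_score(freq, df, boost) for _, freq, df, boost in terms)
coord = 3 / 25  # 3 of the 25 query terms matched
total = term_sum * coord
print(round(term_sum, 5), round(total, 5))
```

The same arithmetic applies to the other four hits; only the matching terms, their frequencies, the fieldNorm, and the coord fraction change.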