Document (#36977)

Author
Sood, S.O.
Churchill, E.F.
Antin, J.
Title
Automatic identification of personal insults on social news sites
Source
Journal of the American Society for Information Science and Technology. 63(2012) no.2, pp.270-285
Year
2012
Abstract
As online communities grow and the volume of user-generated content increases, the need for community management also rises. Community management has three main purposes: to create a positive experience for existing participants, to promote appropriate, socionormative behaviors, and to encourage potential participants to make contributions. Research indicates that the quality of content a potential participant sees on a site is highly influential; off-topic, negative comments with malicious intent are a particularly strong barrier to participation or set the tone for encouraging similar contributions. A problem for community managers, therefore, is the detection and elimination of such undesirable content. As a community grows, this undertaking becomes more daunting. Can an automated system aid community managers in this task? In this paper, we address this question through a machine learning approach to automatic detection of inappropriate negative user contributions. Our training corpus is a set of comments from a news commenting site that we tasked Amazon Mechanical Turk workers with labeling. Each comment is labeled for the presence of profanity, insults, and the object of the insults. Support vector machines trained on these data are combined with relevance and valence analysis systems in a multistep approach to the detection of inappropriate negative user contributions. The system shows great potential for semiautomated community management.
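The abstract describes a multistep pipeline: support vector machines trained on crowd-labeled comments, combined with valence and relevance analysis, in support of semiautomated moderation. The sketch below illustrates only that overall shape and is not the authors' system. Everything in it is a hypothetical assumption: bag-of-words features, a Pegasos-style linear SVM without a bias term standing in for the paper's SVMs, a six-comment toy training set in place of the Mechanical Turk corpus, a toy negative-word lexicon (`NEGATIVE_WORDS`) for valence, Jaccard word overlap with the article text for relevance, and the rule (`needs_review`) that a comment is queued for a moderator if the SVM fires or if it is both negative and off-topic.

```python
import random
import re
from collections import Counter

# Hypothetical toy valence lexicon (not the paper's valence analysis system).
NEGATIVE_WORDS = {"idiot", "moron", "stupid", "awful", "garbage"}


def featurize(text):
    """Bag-of-words term counts; an assumed feature set."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(examples, lam=0.01, epochs=100, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss.

    `examples` is a list of (features, label) pairs with labels in {-1, +1}.
    No bias term, to keep the sketch short.
    """
    rng = random.Random(seed)
    data = list(examples)
    w = {}
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0  # hinge loss active at the current w
            for k in w:                     # shrink step from the L2 penalty
                w[k] *= 1.0 - eta * lam
            if violated:                    # subgradient step from the hinge loss
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def valence(text):
    """Share of tokens found in the toy negative lexicon (0.0 means neutral)."""
    counts = featurize(text)
    total = sum(counts.values())
    negative = sum(c for tok, c in counts.items() if tok in NEGATIVE_WORDS)
    return negative / total if total else 0.0


def relevance(text, article):
    """Jaccard overlap between the comment's and the article's vocabularies."""
    a, b = set(featurize(text)), set(featurize(article))
    return len(a & b) / len(a | b) if a | b else 0.0


def needs_review(text, article, w, neg_threshold=0.2, rel_threshold=0.05):
    """Hypothetical multistep rule: queue for a human moderator, never auto-delete."""
    if dot(w, featurize(text)) > 0:  # step 1: SVM predicts 'insult'
        return True
    # steps 2 and 3: strongly negative AND off-topic comments are queued too
    return valence(text) >= neg_threshold and relevance(text, article) < rel_threshold


# Toy stand-in for the crowd-labeled corpus: +1 = insult, -1 = acceptable.
TRAIN = [
    ("total idiot", +1),
    ("stupid moron", +1),
    ("idiot and moron", +1),
    ("great point thanks", -1),
    ("i agree completely", -1),
    ("thanks for sharing", -1),
]
w = train_linear_svm([(featurize(text), label) for text, label in TRAIN])
ARTICLE = "city council approves new budget for parks"

for comment in ["idiot moron stupid", "great point thanks",
                "awful garbage", "awful budget for parks"]:
    print(comment, "->", needs_review(comment, ARTICLE, w))
```

With this toy data the first comment is caught by the SVM and the third by the valence-plus-relevance step; the second is not flagged, and neither is the fourth, which is negative but on-topic. Routing flagged comments to a human queue rather than deleting them reflects the abstract's framing of the system as an aid to community managers.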
Theme
Internet

Similar documents (content)

  1. Zheng, H.; Goh, D.H.-L.; Lee, E.W.J.; Lee, C.S.; Theng, Y.-L.: Understanding the effects of message cues on COVID-19 information sharing on Twitter (2022) 0.14
    0.14157332 = sum of:
      0.14157332 = product of:
        0.505619 = sum of:
          0.16452365 = weight(abstract_txt:valence in 564) [ClassicSimilarity], result of:
            0.16452365 = score(doc=564,freq=2.0), product of:
              0.19813846 = queryWeight, product of:
                1.0667446 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.01977169 = queryNorm
              0.8303468 = fieldWeight in 564, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.0625 = fieldNorm(doc=564)
          0.011152599 = weight(abstract_txt:this in 564) [ClassicSimilarity], result of:
            0.011152599 = score(doc=564,freq=2.0), product of:
              0.052290235 = queryWeight, product of:
                1.0960146 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.01977169 = queryNorm
              0.21328263 = fieldWeight in 564, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=564)
          0.021039654 = weight(abstract_txt:user in 564) [ClassicSimilarity], result of:
            0.021039654 = score(doc=564,freq=1.0), product of:
              0.09138874 = queryWeight, product of:
                1.2548246 = boost
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.01977169 = queryNorm
              0.23022151 = fieldWeight in 564, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.0625 = fieldNorm(doc=564)
          0.08390118 = weight(abstract_txt:news in 564) [ClassicSimilarity], result of:
            0.08390118 = score(doc=564,freq=2.0), product of:
              0.15934506 = queryWeight, product of:
                1.3528833 = boost
                5.957094 = idf(docFreq=310, maxDocs=44218)
                0.01977169 = queryNorm
              0.5265377 = fieldWeight in 564, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.957094 = idf(docFreq=310, maxDocs=44218)
                0.0625 = fieldNorm(doc=564)
          0.04347672 = weight(abstract_txt:content in 564) [ClassicSimilarity], result of:
            0.04347672 = score(doc=564,freq=2.0), product of:
              0.117677875 = queryWeight, product of:
                1.423915 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.01977169 = queryNorm
              0.36945534 = fieldWeight in 564, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.0625 = fieldNorm(doc=564)
          0.032110643 = weight(abstract_txt:management in 564) [ClassicSimilarity], result of:
            0.032110643 = score(doc=564,freq=1.0), product of:
              0.12114336 = queryWeight, product of:
                1.4447293 = boost
                4.2410107 = idf(docFreq=1729, maxDocs=44218)
                0.01977169 = queryNorm
              0.26506317 = fieldWeight in 564, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2410107 = idf(docFreq=1729, maxDocs=44218)
                0.0625 = fieldNorm(doc=564)
          0.14941455 = weight(abstract_txt:negative in 564) [ClassicSimilarity], result of:
            0.14941455 = score(doc=564,freq=2.0), product of:
              0.2679902 = queryWeight, product of:
                2.1488006 = boost
                6.3078156 = idf(docFreq=218, maxDocs=44218)
                0.01977169 = queryNorm
              0.5575374 = fieldWeight in 564, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.3078156 = idf(docFreq=218, maxDocs=44218)
                0.0625 = fieldNorm(doc=564)
        0.28 = coord(7/25)
    
  2. Bullard, J.; Howison, J.: Learning from Elitist Jerks : creating high-quality knowledge resources from ongoing conversations (2015) 0.13
    0.12509546 = sum of:
      0.12509546 = product of:
        0.44676948 = sum of:
          0.007886078 = weight(abstract_txt:this in 2268) [ClassicSimilarity], result of:
            0.007886078 = score(doc=2268,freq=1.0), product of:
              0.052290235 = queryWeight, product of:
                1.0960146 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.01977169 = queryNorm
              0.1508136 = fieldWeight in 2268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=2268)
          0.05178134 = weight(abstract_txt:site in 2268) [ClassicSimilarity], result of:
            0.05178134 = score(doc=2268,freq=1.0), product of:
              0.14552985 = queryWeight, product of:
                1.2929064 = boost
                5.6930003 = idf(docFreq=404, maxDocs=44218)
                0.01977169 = queryNorm
              0.35581252 = fieldWeight in 2268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6930003 = idf(docFreq=404, maxDocs=44218)
                0.0625 = fieldNorm(doc=2268)
          0.030742684 = weight(abstract_txt:content in 2268) [ClassicSimilarity], result of:
            0.030742684 = score(doc=2268,freq=1.0), product of:
              0.117677875 = queryWeight, product of:
                1.423915 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.01977169 = queryNorm
              0.2612444 = fieldWeight in 2268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.0625 = fieldNorm(doc=2268)
          0.032110643 = weight(abstract_txt:management in 2268) [ClassicSimilarity], result of:
            0.032110643 = score(doc=2268,freq=1.0), product of:
              0.12114336 = queryWeight, product of:
                1.4447293 = boost
                4.2410107 = idf(docFreq=1729, maxDocs=44218)
                0.01977169 = queryNorm
              0.26506317 = fieldWeight in 2268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2410107 = idf(docFreq=1729, maxDocs=44218)
                0.0625 = fieldNorm(doc=2268)
          0.07421739 = weight(abstract_txt:comments in 2268) [ClassicSimilarity], result of:
            0.07421739 = score(doc=2268,freq=1.0), product of:
              0.18500082 = queryWeight, product of:
                1.4577327 = boost
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.01977169 = queryNorm
              0.4011733 = fieldWeight in 2268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.0625 = fieldNorm(doc=2268)
          0.12204498 = weight(abstract_txt:contributions in 2268) [ClassicSimilarity], result of:
            0.12204498 = score(doc=2268,freq=1.0), product of:
              0.32473305 = queryWeight, product of:
                2.7312994 = boost
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.01977169 = queryNorm
              0.37583172 = fieldWeight in 2268, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0133076 = idf(docFreq=293, maxDocs=44218)
                0.0625 = fieldNorm(doc=2268)
          0.12798637 = weight(abstract_txt:community in 2268) [ClassicSimilarity], result of:
            0.12798637 = score(doc=2268,freq=2.0), product of:
              0.3045389 = queryWeight, product of:
                3.2394633 = boost
                4.7547307 = idf(docFreq=1034, maxDocs=44218)
                0.01977169 = queryNorm
              0.42026278 = fieldWeight in 2268, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7547307 = idf(docFreq=1034, maxDocs=44218)
                0.0625 = fieldNorm(doc=2268)
        0.28 = coord(7/25)
    
  3. Song, M.; Jeong, Y.K.; Kim, H.J.: Identifying the topology of the K-pop video community on YouTube : a combined co-comment analysis approach (2015) 0.12
    0.1243486 = sum of:
      0.1243486 = product of:
        0.621743 = sum of:
          0.14412697 = weight(abstract_txt:commenting in 2273) [ClassicSimilarity], result of:
            0.14412697 = score(doc=2273,freq=2.0), product of:
              0.18140393 = queryWeight, product of:
                1.0207031 = boost
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.01977169 = queryNorm
              0.79450846 = fieldWeight in 2273, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.988837 = idf(docFreq=14, maxDocs=44218)
                0.0625 = fieldNorm(doc=2273)
          0.015772156 = weight(abstract_txt:this in 2273) [ClassicSimilarity], result of:
            0.015772156 = score(doc=2273,freq=4.0), product of:
              0.052290235 = queryWeight, product of:
                1.0960146 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.01977169 = queryNorm
              0.3016272 = fieldWeight in 2273, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=2273)
          0.063118964 = weight(abstract_txt:user in 2273) [ClassicSimilarity], result of:
            0.063118964 = score(doc=2273,freq=9.0), product of:
              0.09138874 = queryWeight, product of:
                1.2548246 = boost
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.01977169 = queryNorm
              0.6906645 = fieldWeight in 2273, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.0625 = fieldNorm(doc=2273)
          0.19636074 = weight(abstract_txt:comments in 2273) [ClassicSimilarity], result of:
            0.19636074 = score(doc=2273,freq=7.0), product of:
              0.18500082 = queryWeight, product of:
                1.4577327 = boost
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.01977169 = queryNorm
              1.0614047 = fieldWeight in 2273, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.0625 = fieldNorm(doc=2273)
          0.20236422 = weight(abstract_txt:community in 2273) [ClassicSimilarity], result of:
            0.20236422 = score(doc=2273,freq=5.0), product of:
              0.3045389 = queryWeight, product of:
                3.2394633 = boost
                4.7547307 = idf(docFreq=1034, maxDocs=44218)
                0.01977169 = queryNorm
              0.6644938 = fieldWeight in 2273, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.7547307 = idf(docFreq=1034, maxDocs=44218)
                0.0625 = fieldNorm(doc=2273)
        0.2 = coord(5/25)
    
  4. Thelwall, M.; Buckley, K.; Paltoglou, G.; Cai, D.; Kappas, A.: Sentiment strength detection in short informal text (2010) 0.12
    0.11700343 = sum of:
      0.11700343 = product of:
        0.48751432 = sum of:
          0.013659087 = weight(abstract_txt:this in 4200) [ClassicSimilarity], result of:
            0.013659087 = score(doc=4200,freq=3.0), product of:
              0.052290235 = queryWeight, product of:
                1.0960146 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.01977169 = queryNorm
              0.2612168 = fieldWeight in 4200, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=4200)
          0.021039654 = weight(abstract_txt:user in 4200) [ClassicSimilarity], result of:
            0.021039654 = score(doc=4200,freq=1.0), product of:
              0.09138874 = queryWeight, product of:
                1.2548246 = boost
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.01977169 = queryNorm
              0.23022151 = fieldWeight in 4200, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.0625 = fieldNorm(doc=4200)
          0.07421739 = weight(abstract_txt:comments in 4200) [ClassicSimilarity], result of:
            0.07421739 = score(doc=4200,freq=1.0), product of:
              0.18500082 = queryWeight, product of:
                1.4577327 = boost
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.01977169 = queryNorm
              0.4011733 = fieldWeight in 4200, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.0625 = fieldNorm(doc=4200)
          0.14150143 = weight(abstract_txt:inappropriate in 4200) [ClassicSimilarity], result of:
            0.14150143 = score(doc=4200,freq=1.0), product of:
              0.28445294 = queryWeight, product of:
                1.8075747 = boost
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.01977169 = queryNorm
              0.4974511 = fieldWeight in 4200, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.9592175 = idf(docFreq=41, maxDocs=44218)
                0.0625 = fieldNorm(doc=4200)
          0.10565205 = weight(abstract_txt:negative in 4200) [ClassicSimilarity], result of:
            0.10565205 = score(doc=4200,freq=1.0), product of:
              0.2679902 = queryWeight, product of:
                2.1488006 = boost
                6.3078156 = idf(docFreq=218, maxDocs=44218)
                0.01977169 = queryNorm
              0.39423847 = fieldWeight in 4200, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3078156 = idf(docFreq=218, maxDocs=44218)
                0.0625 = fieldNorm(doc=4200)
          0.1314447 = weight(abstract_txt:detection in 4200) [ClassicSimilarity], result of:
            0.1314447 = score(doc=4200,freq=1.0), product of:
              0.31000045 = queryWeight, product of:
                2.3110952 = boost
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.01977169 = queryNorm
              0.4240145 = fieldWeight in 4200, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.784232 = idf(docFreq=135, maxDocs=44218)
                0.0625 = fieldNorm(doc=4200)
        0.24 = coord(6/25)
    
  5. Thelwall, M.; Wilkinson, D.; Uppal, S.: Data mining emotion in social network communication : gender differences in MySpace (2009) 0.11
    0.10983999 = sum of:
      0.10983999 = product of:
        0.54919994 = sum of:
          0.009857598 = weight(abstract_txt:this in 3322) [ClassicSimilarity], result of:
            0.009857598 = score(doc=3322,freq=1.0), product of:
              0.052290235 = queryWeight, product of:
                1.0960146 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.01977169 = queryNorm
              0.18851699 = fieldWeight in 3322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.078125 = fieldNorm(doc=3322)
          0.06472668 = weight(abstract_txt:site in 3322) [ClassicSimilarity], result of:
            0.06472668 = score(doc=3322,freq=1.0), product of:
              0.14552985 = queryWeight, product of:
                1.2929064 = boost
                5.6930003 = idf(docFreq=404, maxDocs=44218)
                0.01977169 = queryNorm
              0.44476566 = fieldWeight in 3322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6930003 = idf(docFreq=404, maxDocs=44218)
                0.078125 = fieldNorm(doc=3322)
          0.038428355 = weight(abstract_txt:content in 3322) [ClassicSimilarity], result of:
            0.038428355 = score(doc=3322,freq=1.0), product of:
              0.117677875 = queryWeight, product of:
                1.423915 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.01977169 = queryNorm
              0.3265555 = fieldWeight in 3322, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.078125 = fieldNorm(doc=3322)
          0.20744391 = weight(abstract_txt:comments in 3322) [ClassicSimilarity], result of:
            0.20744391 = score(doc=3322,freq=5.0), product of:
              0.18500082 = queryWeight, product of:
                1.4577327 = boost
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.01977169 = queryNorm
              1.1213135 = fieldWeight in 3322, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.4187727 = idf(docFreq=195, maxDocs=44218)
                0.078125 = fieldNorm(doc=3322)
          0.22874339 = weight(abstract_txt:negative in 3322) [ClassicSimilarity], result of:
            0.22874339 = score(doc=3322,freq=3.0), product of:
              0.2679902 = queryWeight, product of:
                2.1488006 = boost
                6.3078156 = idf(docFreq=218, maxDocs=44218)
                0.01977169 = queryNorm
              0.8535513 = fieldWeight in 3322, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.3078156 = idf(docFreq=218, maxDocs=44218)
                0.078125 = fieldNorm(doc=3322)
        0.2 = coord(5/25)
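Each entry's explain tree is a Lucene ClassicSimilarity (TF-IDF) breakdown: for every matched term, queryWeight = boost · idf · queryNorm and fieldWeight = tf · idf · fieldNorm with tf = √freq; the term contributes the product of the two, and the document score is coord (matched terms / query terms, here out of 25) times the sum of those contributions. The Python sketch below recomputes the first entry (doc 564) from the numbers printed in its tree. It assumes idf = 1 + ln(maxDocs / (docFreq + 1)), the form that reproduces the printed idf values (for example 9.394302 for docFreq=9); the function names are illustrative and are not Lucene's API.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.01977169  # copied from the explain output
FIELD_NORM = 0.0625      # fieldNorm(doc=564)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm=FIELD_NORM, query_norm=QUERY_NORM):
    """queryWeight * fieldWeight for one matched term."""
    i = idf(doc_freq)
    query_weight = boost * i * query_norm
    field_weight = math.sqrt(freq) * i * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


def doc_score(term_scores, total_query_terms):
    """coord(matched/total) times the sum of the per-term scores."""
    coord = len(term_scores) / total_query_terms
    return coord * sum(term_scores)


# (term, freq, docFreq, boost) for doc 564, read off its explain tree
DOC_564 = [
    ("valence", 2, 9, 1.0667446),
    ("this", 2, 10762, 1.0960146),
    ("user", 1, 3020, 1.2548246),
    ("news", 2, 310, 1.3528833),
    ("content", 2, 1838, 1.423915),
    ("management", 1, 1729, 1.4447293),
    ("negative", 2, 218, 2.1488006),
]

scores = [term_score(f, df, b) for _, f, df, b in DOC_564]
total = doc_score(scores, total_query_terms=25)
print(f"{total:.8f}")
```

The printed total agrees with the 0.14157332 at the top of the first tree to within the rounding of the printed inputs; the same functions apply to the other entries by swapping in their term rows and fieldNorm (0.078125 for doc 3322).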