Document (#37956)

Author
Xiao, C.
Zhou, F.
Wu, Y.
Title
Predicting audience gender in online content-sharing social networks
Source
Journal of the American Society for Information Science and Technology. 64(2013) no.6, pp. 1284-1297
Year
2013
Abstract
Understanding the behavior and characteristics of web users is valuable when improving information dissemination, designing recommendation systems, and so on. In this work, we explore various methods of predicting the ratio of male viewers to female viewers on YouTube. First, we propose and examine two hypotheses relating to audience consistency and topic consistency. The former means that videos made by the same authors tend to have similar male-to-female audience ratios, whereas the latter means that videos with similar topics tend to have similar audience gender ratios. To predict the audience gender ratio before video publication, two features based on these two hypotheses and other features are used in multiple linear regression (MLR) and support vector regression (SVR). We find that these two features are the key indicators of audience gender, whereas other features, such as gender of the user and duration of the video, have limited relationships. Second, another method is explored to predict the audience gender ratio. Specifically, we use the early comments collected after video publication to predict the ratio via simple linear regression (SLR). The experiments indicate that this model can achieve better performance by using a few early comments. We also observe that the correlation between the number of early comments (cost) and the predictive accuracy (gain) follows the law of diminishing marginal utility. We build the functions of these elements via curve fitting to find the appropriate number of early comments (approximately 250) that can achieve maximum gain at minimum cost.
Theme
Internet

Similar documents (author)

  1. Zhou, H.; Xiao, L.; Liu, Y.; Chen, X.: The effect of prediscussion note-taking in hidden profile tasks (2018) 3.79
    3.7920473 = sum of:
      3.7920473 = sum of:
        1.6039932 = weight(author_txt:zhou in 185) [ClassicSimilarity], result of:
          1.6039932 = score(doc=185,freq=1.0), product of:
            0.63083136 = queryWeight, product of:
              8.13653 = idf(docFreq=33, maxDocs=42740)
              0.077530764 = queryNorm
            2.5426655 = fieldWeight in 185, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              8.13653 = idf(docFreq=33, maxDocs=42740)
              0.3125 = fieldNorm(doc=185)
        2.188054 = weight(author_txt:xiao in 185) [ClassicSimilarity], result of:
          2.188054 = score(doc=185,freq=1.0), product of:
            0.77592003 = queryWeight, product of:
              1.1090518 = boost
              9.023833 = idf(docFreq=13, maxDocs=42740)
              0.077530764 = queryNorm
            2.819948 = fieldWeight in 185, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.023833 = idf(docFreq=13, maxDocs=42740)
              0.3125 = fieldNorm(doc=185)
    
  2. Xiao, Y.: Modern development of classification : research and practice in the People's Republic of China (1992) 2.19
    2.188054 = sum of:
      2.188054 = product of:
        4.376108 = sum of:
          4.376108 = weight(author_txt:xiao in 1909) [ClassicSimilarity], result of:
            4.376108 = score(doc=1909,freq=1.0), product of:
              0.77592003 = queryWeight, product of:
                1.1090518 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.077530764 = queryNorm
              5.639896 = fieldWeight in 1909, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.625 = fieldNorm(doc=1909)
        0.5 = coord(1/2)
    
  3. Xiao, Y.: Faceted classification : a consideration of its features as a paradigm of knowledge organization (1994) 2.19
    2.188054 = sum of:
      2.188054 = product of:
        4.376108 = sum of:
          4.376108 = weight(author_txt:xiao in 7547) [ClassicSimilarity], result of:
            4.376108 = score(doc=7547,freq=1.0), product of:
              0.77592003 = queryWeight, product of:
                1.1090518 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.077530764 = queryNorm
              5.639896 = fieldWeight in 7547, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.625 = fieldNorm(doc=7547)
        0.5 = coord(1/2)
    
  4. Xiao, G.: A knowledge classification model based on the relationship between science and human needs (2013) 2.19
    2.188054 = sum of:
      2.188054 = product of:
        4.376108 = sum of:
          4.376108 = weight(author_txt:xiao in 2139) [ClassicSimilarity], result of:
            4.376108 = score(doc=2139,freq=1.0), product of:
              0.77592003 = queryWeight, product of:
                1.1090518 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.077530764 = queryNorm
              5.639896 = fieldWeight in 2139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.625 = fieldNorm(doc=2139)
        0.5 = coord(1/2)
    
  5. Xiao, L.: Effects of rationale awareness in online ideation crowdsourcing tasks (2014) 2.19
    2.188054 = sum of:
      2.188054 = product of:
        4.376108 = sum of:
          4.376108 = weight(author_txt:xiao in 3330) [ClassicSimilarity], result of:
            4.376108 = score(doc=3330,freq=1.0), product of:
              0.77592003 = queryWeight, product of:
                1.1090518 = boost
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.077530764 = queryNorm
              5.639896 = fieldWeight in 3330, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.023833 = idf(docFreq=13, maxDocs=42740)
                0.625 = fieldNorm(doc=3330)
        0.5 = coord(1/2)
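
The author-similarity scores above can be checked by hand. The sketch below assumes the classic Lucene TF-IDF formulas that the [ClassicSimilarity] label suggests: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryWeight = boost * idf * queryNorm, and fieldWeight = tf * idf * fieldNorm. The function names are hypothetical; queryNorm, fieldNorm, boost, and coord are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 42740          # maxDocs as shown in the listing
QUERY_NORM = 0.077530764  # queryNorm as shown for the author query


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency (assumed classic Lucene form)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float,
               boost: float = 1.0) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    i = idf(doc_freq, MAX_DOCS)
    query_weight = boost * i * QUERY_NORM
    field_weight = math.sqrt(freq) * i * field_norm
    return query_weight * field_weight


# Hit 1 matches both "zhou" and "xiao"; fieldNorm is 0.3125 for doc 185.
zhou = term_score(freq=1.0, doc_freq=33, field_norm=0.3125)
xiao = term_score(freq=1.0, doc_freq=13, field_norm=0.3125, boost=1.1090518)
hit1 = zhou + xiao  # the listing reports 3.7920473

# Hit 2 matches only "xiao" (fieldNorm 0.625) and is scaled by coord(1/2).
hit2 = 0.5 * term_score(freq=1.0, doc_freq=13, field_norm=0.625,
                        boost=1.1090518)  # the listing reports 2.188054

print(f"hit 1: {hit1:.4f}  hit 2: {hit2:.4f}")
```

Small differences in the last digits are expected, because the listing was presumably computed in single precision while Python uses double precision.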
    

Similar documents (content)

  1. Thelwall, M.; Sud, P.; Vis, F.: Commenting on YouTube videos : From Guatemalan rock to El Big Bang (2012) 0.18
    0.18027341 = sum of:
      0.18027341 = product of:
        0.90136707 = sum of:
          0.041904293 = weight(abstract_txt:whereas in 2064) [ClassicSimilarity], result of:
            0.041904293 = score(doc=2064,freq=1.0), product of:
              0.10678981 = queryWeight, product of:
                1.0940574 = boost
                6.2783957 = idf(docFreq=217, maxDocs=42740)
                0.015546801 = queryNorm
              0.39239973 = fieldWeight in 2064, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2783957 = idf(docFreq=217, maxDocs=42740)
                0.0625 = fieldNorm(doc=2064)
          0.16291663 = weight(abstract_txt:videos in 2064) [ClassicSimilarity], result of:
            0.16291663 = score(doc=2064,freq=8.0), product of:
              0.1320194 = queryWeight, product of:
                1.2164495 = boost
                6.980759 = idf(docFreq=107, maxDocs=42740)
                0.015546801 = queryNorm
              1.2340355 = fieldWeight in 2064, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.980759 = idf(docFreq=107, maxDocs=42740)
                0.0625 = fieldNorm(doc=2064)
          0.09789369 = weight(abstract_txt:male in 2064) [ClassicSimilarity], result of:
            0.09789369 = score(doc=2064,freq=1.0), product of:
              0.18801562 = queryWeight, product of:
                1.4516842 = boost
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.015546801 = queryNorm
              0.52066785 = fieldWeight in 2064, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.0625 = fieldNorm(doc=2064)
          0.237796 = weight(abstract_txt:comments in 2064) [ClassicSimilarity], result of:
            0.237796 = score(doc=2064,freq=7.0), product of:
              0.22377136 = queryWeight, product of:
                2.2397134 = boost
                6.4264483 = idf(docFreq=187, maxDocs=42740)
                0.015546801 = queryNorm
              1.0626739 = fieldWeight in 2064, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.4264483 = idf(docFreq=187, maxDocs=42740)
                0.0625 = fieldNorm(doc=2064)
          0.36085644 = weight(abstract_txt:audience in 2064) [ClassicSimilarity], result of:
            0.36085644 = score(doc=2064,freq=3.0), product of:
              0.47231245 = queryWeight, product of:
                4.304512 = boost
                7.05772 = idf(docFreq=99, maxDocs=42740)
                0.015546801 = queryNorm
              0.7640206 = fieldWeight in 2064, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.05772 = idf(docFreq=99, maxDocs=42740)
                0.0625 = fieldNorm(doc=2064)
        0.2 = coord(5/25)
    
  2. Aksnes, D.W.; Rorstad, K.; Piro, F.; Sivertsen, G.: Are female researchers less cited? : a large-scale study of Norwegian scientists (2011) 0.15
    0.15484671 = sum of:
      0.15484671 = product of:
        0.64519465 = sum of:
          0.057769965 = weight(abstract_txt:tend in 708) [ClassicSimilarity], result of:
            0.057769965 = score(doc=708,freq=1.0), product of:
              0.113994926 = queryWeight, product of:
                1.1303631 = boost
                6.4867406 = idf(docFreq=176, maxDocs=42740)
                0.015546801 = queryNorm
              0.50677663 = fieldWeight in 708, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4867406 = idf(docFreq=176, maxDocs=42740)
                0.078125 = fieldNorm(doc=708)
          0.012330751 = weight(abstract_txt:that in 708) [ClassicSimilarity], result of:
            0.012330751 = score(doc=708,freq=2.0), product of:
              0.0466059 = queryWeight, product of:
                1.2518615 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.015546801 = queryNorm
              0.2645749 = fieldWeight in 708, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.078125 = fieldNorm(doc=708)
          0.045711227 = weight(abstract_txt:similar in 708) [ClassicSimilarity], result of:
            0.045711227 = score(doc=708,freq=1.0), product of:
              0.11163399 = queryWeight, product of:
                1.3699952 = boost
                5.241268 = idf(docFreq=614, maxDocs=42740)
                0.015546801 = queryNorm
              0.40947407 = fieldWeight in 708, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.241268 = idf(docFreq=614, maxDocs=42740)
                0.078125 = fieldNorm(doc=708)
          0.122367114 = weight(abstract_txt:male in 708) [ClassicSimilarity], result of:
            0.122367114 = score(doc=708,freq=1.0), product of:
              0.18801562 = queryWeight, product of:
                1.4516842 = boost
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.015546801 = queryNorm
              0.6508348 = fieldWeight in 708, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.078125 = fieldNorm(doc=708)
          0.1828386 = weight(abstract_txt:female in 708) [ClassicSimilarity], result of:
            0.1828386 = score(doc=708,freq=2.0), product of:
              0.19503808 = queryWeight, product of:
                1.4785463 = boost
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.015546801 = queryNorm
              0.9374508 = fieldWeight in 708, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.078125 = fieldNorm(doc=708)
          0.224177 = weight(abstract_txt:gender in 708) [ClassicSimilarity], result of:
            0.224177 = score(doc=708,freq=1.0), product of:
              0.40599304 = queryWeight, product of:
                3.6948357 = boost
                7.0677705 = idf(docFreq=98, maxDocs=42740)
                0.015546801 = queryNorm
              0.55216956 = fieldWeight in 708, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0677705 = idf(docFreq=98, maxDocs=42740)
                0.078125 = fieldNorm(doc=708)
        0.24 = coord(6/25)
    
  3. Liu, Z.; Huang, X.: Gender differences in the online reading environment (2008) 0.15
    0.15034504 = sum of:
      0.15034504 = product of:
        0.7517252 = sum of:
          0.052380368 = weight(abstract_txt:whereas in 4216) [ClassicSimilarity], result of:
            0.052380368 = score(doc=4216,freq=1.0), product of:
              0.10678981 = queryWeight, product of:
                1.0940574 = boost
                6.2783957 = idf(docFreq=217, maxDocs=42740)
                0.015546801 = queryNorm
              0.49049968 = fieldWeight in 4216, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2783957 = idf(docFreq=217, maxDocs=42740)
                0.078125 = fieldNorm(doc=4216)
          0.008719158 = weight(abstract_txt:that in 4216) [ClassicSimilarity], result of:
            0.008719158 = score(doc=4216,freq=1.0), product of:
              0.0466059 = queryWeight, product of:
                1.2518615 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.015546801 = queryNorm
              0.18708271 = fieldWeight in 4216, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.078125 = fieldNorm(doc=4216)
          0.17305323 = weight(abstract_txt:male in 4216) [ClassicSimilarity], result of:
            0.17305323 = score(doc=4216,freq=2.0), product of:
              0.18801562 = queryWeight, product of:
                1.4516842 = boost
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.015546801 = queryNorm
              0.9204194 = fieldWeight in 4216, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.078125 = fieldNorm(doc=4216)
          0.12928642 = weight(abstract_txt:female in 4216) [ClassicSimilarity], result of:
            0.12928642 = score(doc=4216,freq=1.0), product of:
              0.19503808 = queryWeight, product of:
                1.4785463 = boost
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.015546801 = queryNorm
              0.66287786 = fieldWeight in 4216, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.484837 = idf(docFreq=23, maxDocs=42740)
                0.078125 = fieldNorm(doc=4216)
          0.38828596 = weight(abstract_txt:gender in 4216) [ClassicSimilarity], result of:
            0.38828596 = score(doc=4216,freq=3.0), product of:
              0.40599304 = queryWeight, product of:
                3.6948357 = boost
                7.0677705 = idf(docFreq=98, maxDocs=42740)
                0.015546801 = queryNorm
              0.95638573 = fieldWeight in 4216, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.0677705 = idf(docFreq=98, maxDocs=42740)
                0.078125 = fieldNorm(doc=4216)
        0.2 = coord(5/25)
    
  4. Pan, X.; Yan, E.; Hua, W.: Science communication and dissemination in different cultures : an analysis of the audience for TED videos in China and abroad (2016) 0.14
    0.13841315 = sum of:
      0.13841315 = product of:
        0.57672143 = sum of:
          0.07258037 = weight(abstract_txt:whereas in 4939) [ClassicSimilarity], result of:
            0.07258037 = score(doc=4939,freq=3.0), product of:
              0.10678981 = queryWeight, product of:
                1.0940574 = boost
                6.2783957 = idf(docFreq=217, maxDocs=42740)
                0.015546801 = queryNorm
              0.67965627 = fieldWeight in 4939, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.2783957 = idf(docFreq=217, maxDocs=42740)
                0.0625 = fieldNorm(doc=4939)
          0.09976565 = weight(abstract_txt:videos in 4939) [ClassicSimilarity], result of:
            0.09976565 = score(doc=4939,freq=3.0), product of:
              0.1320194 = queryWeight, product of:
                1.2164495 = boost
                6.980759 = idf(docFreq=107, maxDocs=42740)
                0.015546801 = queryNorm
              0.7556893 = fieldWeight in 4939, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.980759 = idf(docFreq=107, maxDocs=42740)
                0.0625 = fieldNorm(doc=4939)
          0.0098646 = weight(abstract_txt:that in 4939) [ClassicSimilarity], result of:
            0.0098646 = score(doc=4939,freq=2.0), product of:
              0.0466059 = queryWeight, product of:
                1.2518615 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.015546801 = queryNorm
              0.21165991 = fieldWeight in 4939, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=4939)
          0.05906294 = weight(abstract_txt:video in 4939) [ClassicSimilarity], result of:
            0.05906294 = score(doc=4939,freq=1.0), product of:
              0.15367313 = queryWeight, product of:
                1.6073846 = boost
                6.1494617 = idf(docFreq=247, maxDocs=42740)
                0.015546801 = queryNorm
              0.38434136 = fieldWeight in 4939, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1494617 = idf(docFreq=247, maxDocs=42740)
                0.0625 = fieldNorm(doc=4939)
          0.1271073 = weight(abstract_txt:comments in 4939) [ClassicSimilarity], result of:
            0.1271073 = score(doc=4939,freq=2.0), product of:
              0.22377136 = queryWeight, product of:
                2.2397134 = boost
                6.4264483 = idf(docFreq=187, maxDocs=42740)
                0.015546801 = queryNorm
              0.56802315 = fieldWeight in 4939, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.4264483 = idf(docFreq=187, maxDocs=42740)
                0.0625 = fieldNorm(doc=4939)
          0.20834057 = weight(abstract_txt:audience in 4939) [ClassicSimilarity], result of:
            0.20834057 = score(doc=4939,freq=1.0), product of:
              0.47231245 = queryWeight, product of:
                4.304512 = boost
                7.05772 = idf(docFreq=99, maxDocs=42740)
                0.015546801 = queryNorm
              0.4411075 = fieldWeight in 4939, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.05772 = idf(docFreq=99, maxDocs=42740)
                0.0625 = fieldNorm(doc=4939)
        0.24 = coord(6/25)
    
  5. Song, M.; Jeong, Y.K.; Kim, H.J.: Identifying the topology of the K-pop video community on YouTube : a combined co-comment analysis approach (2015) 0.13
    0.13374722 = sum of:
      0.13374722 = product of:
        0.5572801 = sum of:
          0.04621597 = weight(abstract_txt:tend in 4274) [ClassicSimilarity], result of:
            0.04621597 = score(doc=4274,freq=1.0), product of:
              0.113994926 = queryWeight, product of:
                1.1303631 = boost
                6.4867406 = idf(docFreq=176, maxDocs=42740)
                0.015546801 = queryNorm
              0.4054213 = fieldWeight in 4274, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4867406 = idf(docFreq=176, maxDocs=42740)
                0.0625 = fieldNorm(doc=4274)
          0.14108995 = weight(abstract_txt:videos in 4274) [ClassicSimilarity], result of:
            0.14108995 = score(doc=4274,freq=6.0), product of:
              0.1320194 = queryWeight, product of:
                1.2164495 = boost
                6.980759 = idf(docFreq=107, maxDocs=42740)
                0.015546801 = queryNorm
              1.0687062 = fieldWeight in 4274, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.980759 = idf(docFreq=107, maxDocs=42740)
                0.0625 = fieldNorm(doc=4274)
          0.012081619 = weight(abstract_txt:that in 4274) [ClassicSimilarity], result of:
            0.012081619 = score(doc=4274,freq=3.0), product of:
              0.0466059 = queryWeight, product of:
                1.2518615 = boost
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.015546801 = queryNorm
              0.2592294 = fieldWeight in 4274, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.3946586 = idf(docFreq=10595, maxDocs=42740)
                0.0625 = fieldNorm(doc=4274)
          0.03656898 = weight(abstract_txt:similar in 4274) [ClassicSimilarity], result of:
            0.03656898 = score(doc=4274,freq=1.0), product of:
              0.11163399 = queryWeight, product of:
                1.3699952 = boost
                5.241268 = idf(docFreq=614, maxDocs=42740)
                0.015546801 = queryNorm
              0.32757926 = fieldWeight in 4274, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.241268 = idf(docFreq=614, maxDocs=42740)
                0.0625 = fieldNorm(doc=4274)
          0.08352761 = weight(abstract_txt:video in 4274) [ClassicSimilarity], result of:
            0.08352761 = score(doc=4274,freq=2.0), product of:
              0.15367313 = queryWeight, product of:
                1.6073846 = boost
                6.1494617 = idf(docFreq=247, maxDocs=42740)
                0.015546801 = queryNorm
              0.5435408 = fieldWeight in 4274, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1494617 = idf(docFreq=247, maxDocs=42740)
                0.0625 = fieldNorm(doc=4274)
          0.237796 = weight(abstract_txt:comments in 4274) [ClassicSimilarity], result of:
            0.237796 = score(doc=4274,freq=7.0), product of:
              0.22377136 = queryWeight, product of:
                2.2397134 = boost
                6.4264483 = idf(docFreq=187, maxDocs=42740)
                0.015546801 = queryNorm
              1.0626739 = fieldWeight in 4274, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.4264483 = idf(docFreq=187, maxDocs=42740)
                0.0625 = fieldNorm(doc=4274)
        0.24 = coord(6/25)
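
The content-similarity scores combine several abstract terms, each with its own boost and term frequency, and then scale the sum by the coordination factor. As a worked check, the sketch below recomputes the first content hit (doc 2064) from the values shown in its score tree, again assuming the classic Lucene formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The helper name and the tuple layout are illustrative only.

```python
import math

MAX_DOCS = 42740
QUERY_NORM = 0.015546801   # queryNorm shown for the content query
FIELD_NORM = 0.0625        # fieldNorm(doc=2064)
COORD = 5 / 25             # 5 of 25 query terms matched

# (term, freq in doc 2064, docFreq, boost), copied from the score tree.
TERMS = [
    ("whereas",  1.0, 217, 1.0940574),
    ("videos",   8.0, 107, 1.2164495),
    ("male",     1.0,  27, 1.4516842),
    ("comments", 7.0, 187, 2.2397134),
    ("audience", 3.0,  99, 4.304512),
]


def term_score(freq: float, doc_freq: int, boost: float) -> float:
    """queryWeight * fieldWeight for one term (assumed classic TF-IDF)."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


per_term = {t: term_score(f, df, b) for t, f, df, b in TERMS}
raw_sum = sum(per_term.values())   # the listing reports 0.90136707
final_score = COORD * raw_sum      # the listing reports 0.18027341

for term, score in per_term.items():
    print(f"{term:10s} {score:.6f}")
print(f"sum={raw_sum:.6f} final={final_score:.6f}")
```

The breakdown shows why this record ranks first: "audience" and "comments" carry the largest boosts and occur repeatedly, so together they contribute about two thirds of the pre-coord sum.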