Document (#37955)

Author
Xiao, C.
Zhou, F.
Wu, Y.
Title
Predicting audience gender in online content-sharing social networks
Source
Journal of the American Society for Information Science and Technology. 64(2013) no.6, pp. 1284-1297
Year
2013
Abstract
Understanding the behavior and characteristics of web users is valuable for improving information dissemination, designing recommendation systems, and similar tasks. In this work, we explore methods of predicting the ratio of male to female viewers of YouTube videos. First, we propose and examine two hypotheses: audience consistency and topic consistency. The former holds that videos made by the same author tend to have similar male-to-female audience ratios; the latter holds that videos on similar topics tend to have similar audience gender ratios. To predict the audience gender ratio before a video is published, two features based on these hypotheses, together with other features, are used in multiple linear regression (MLR) and support vector regression (SVR). We find that these two features are the key indicators of audience gender, whereas other features, such as the gender of the user and the duration of the video, are only weakly related to it. Second, we explore another method that predicts the audience gender ratio from early comments collected after video publication, using simple linear regression (SLR). The experiments indicate that this model achieves better performance using only a few early comments. We also observe that the relationship between the number of early comments (cost) and the predictive accuracy (gain) follows the law of diminishing marginal utility. We model cost and gain as functions via curve fitting and use them to find the appropriate number of early comments (approximately 250) that achieves maximum gain at minimum cost.
Theme
Internet
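The early-comment approach summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the helper names (`male_share`, `fit_slr`, `predict`), the assumption that each commenter's gender is known, and the choice to model the male share of viewers instead of the raw male-to-female ratio are all hypothetical. The sketch only shows the general shape of the method: summarize the first k comments as one number, then map that number to the audience gender share with a least-squares line fitted on training videos.

```python
# Hedged sketch, not the authors' implementation: predict a video's audience
# gender share from its early comments with simple linear regression (SLR).
# Hypothetical assumptions: each commenter's gender is known ("M"/"F"), and the
# target is the male share of viewers (a fraction in [0, 1]) rather than the
# raw male-to-female ratio.
from __future__ import annotations

from statistics import fmean


def male_share(commenter_genders: list[str], k: int) -> float:
    """Fraction of the first k comments whose author is labelled "M"."""
    early = commenter_genders[:k]
    if not early:
        raise ValueError("need at least one early comment")
    return sum(1 for g in early if g == "M") / len(early)


def fit_slr(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def predict(a: float, b: float, x: float) -> float:
    """Predicted male share of viewers for an early-comment male share x."""
    return a + b * x


if __name__ == "__main__":
    # Hypothetical training videos: early-comment male share vs. observed viewer male share.
    train_x = [0.2, 0.5, 0.8]
    train_y = [0.3, 0.5, 0.7]
    a, b = fit_slr(train_x, train_y)

    # A new video: use only its first k comments (k plays the role of "cost").
    new_video = ["M", "F", "M", "M", "F", "M"]
    x_new = male_share(new_video, k=4)
    print(f"a={a:.3f}, b={b:.3f}, predicted male share={predict(a, b, x_new):.3f}")
```

Raising `k` makes the early-comment estimate less noisy but requires waiting for more comments. The abstract reports that accuracy gains diminish as more comments are used and settles on approximately 250; this sketch does not attempt to reproduce that curve fit.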

Similar documents (author)

  1. Zhou, H.; Xiao, L.; Liu, Y.; Chen, X.: ¬The effect of prediscussion note-taking in hidden profile tasks (2018) 3.69
  2. Xiao, Y.: Modern development of classification : research and practice in the People's Republic of China (1992) 2.18
  3. Xiao, Y.: Faceted classification : a consideration of its features as a paradigm of knowledge organization (1994) 2.18
  4. Xiao, G.: ¬A knowledge classification model based on the relationship between science and human needs (2013) 2.18
  5. Xiao, L.: Effects of rationale awareness in online ideation crowdsourcing tasks (2014) 2.18

Similar documents (content)

  1. Thelwall, M.; Foster, D.: Male or female gender-polarized YouTube videos are less viewed (2021) 0.33
  2. Thelwall, M.; Sud, P.; Vis, F.: Commenting on YouTube videos : From Guatemalan rock to El Big Bang (2012) 0.18
  3. Aksnes, D.W.; Rorstad, K.; Piro, F.; Sivertsen, G.: Are female researchers less cited? : a large-scale study of Norwegian scientists (2011) 0.15
  4. Liu, Z.; Huang, X.: Gender differences in the online reading environment (2008) 0.14
  5. Pan, X.; Yan, E.; Hua, W.: Science communication and dissemination in different cultures : an analysis of the audience for TED videos in China and abroad (2016) 0.14