Document (#38612)

Author
Ofek, N.
Rokach, L.
Title
¬A classifier to determine which Wikipedia biographies will be accepted
Source
Journal of the Association for Information Science and Technology. 66(2015) no.1, pp.213-218
Year
2015
Series
Brief communication
Abstract
Wikipedia, like other encyclopedias, includes biographies of notable people. However, because it is jointly written by many contributors, it is subject to constant manipulation by contributors attempting to add biographies of non-notable people. Over time, Wikipedia has developed inclusion criteria for notable people (e.g., receiving a significant award) against which newly contributed biographies are evaluated. In this paper we present and analyze a set of simple indicators that can be used to predict which articles will eventually be accepted. These indicators do not refer to the content itself but to meta-content features (such as the number of categories the biography is associated with) and to author-based features (such as whether the author is a first-time contributor). By training a classifier on these features, we reached high predictive performance (area under the receiver operating characteristic [ROC] curve [AUC] of 0.97) even though we ignored the actual biography text.
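To make the reported AUC of 0.97 concrete, here is a minimal sketch of what the metric measures: the probability that a randomly chosen accepted biography receives a higher classifier score than a randomly chosen rejected one, with ties counted as one half. The scores and labels below are hypothetical, and the function name `roc_auc` is illustrative; this does not reproduce the authors' features, model, or data.

```python
from itertools import product


def roc_auc(scores, labels):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    scores: classifier outputs, higher means "more likely accepted".
    labels: 1 for accepted (positive), 0 for rejected (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # Compare every (positive, negative) pair; ties contribute half a win.
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for three accepted and two rejected biographies.
print(roc_auc([0.9, 0.8, 0.35, 0.6, 0.2], [1, 1, 1, 0, 0]))
```

Here five of the six positive/negative pairs are ranked correctly, giving 5/6. An AUC of 0.97 would mean that roughly 97% of such pairs are ordered correctly. The pairwise loop is quadratic and meant only for illustration; rank-based formulas compute the same value more efficiently on large sets.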
Content
See: http://onlinelibrary.wiley.com/doi/10.1002/asi.23199/abstract.
Theme
Information resources
Object
Wikipedia

Similar documents (author)

  1. Rokach, L.; Mitra, P.: Parsimonious citer-based measures : the artificial intelligence domain as a case study (2013) 4.79
    4.786621 = sum of:
      4.786621 = weight(author_txt:rokach in 2213) [ClassicSimilarity], result of:
        4.786621 = fieldWeight in 2213, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.573242 = idf(docFreq=7, maxDocs=42306)
          0.5 = fieldNorm(doc=2213)
    
  2. Blank, I.; Rokach, L.; Shani, G.: Leveraging metadata to recommend keywords for academic papers (2016) 3.59
    3.5899658 = sum of:
      3.5899658 = weight(author_txt:rokach in 151) [ClassicSimilarity], result of:
        3.5899658 = fieldWeight in 151, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.573242 = idf(docFreq=7, maxDocs=42306)
          0.375 = fieldNorm(doc=151)
    
  3. Greenstein-Messica, A.; Rokach, L.; Shabtai, A.: Personal-discount sensitivity prediction for mobile coupon conversion optimization (2017) 3.59
    3.5899658 = sum of:
      3.5899658 = weight(author_txt:rokach in 670) [ClassicSimilarity], result of:
        3.5899658 = fieldWeight in 670, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.573242 = idf(docFreq=7, maxDocs=42306)
          0.375 = fieldNorm(doc=670)
    
  4. Greenstein-Messica, A.; Rokach, L.; Shabtai, A.: Personal-discount sensitivity prediction for mobile coupon conversion optimization (2017) 3.59
    3.5899658 = sum of:
      3.5899658 = weight(author_txt:rokach in 680) [ClassicSimilarity], result of:
        3.5899658 = fieldWeight in 680, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.573242 = idf(docFreq=7, maxDocs=42306)
          0.375 = fieldNorm(doc=680)
    
  5. Rokach, L.; Kalech, M.; Blank, I.; Stern, R.: Who is going to win the next Association for the Advancement of Artificial Intelligence Fellowship Award? : evaluating researchers by mining bibliographic data (2011) 2.99
    2.9916382 = sum of:
      2.9916382 = weight(author_txt:rokach in 1946) [ClassicSimilarity], result of:
        2.9916382 = fieldWeight in 1946, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.573242 = idf(docFreq=7, maxDocs=42306)
          0.3125 = fieldNorm(doc=1946)
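The author-match scores above equal their fieldWeight alone, which is the product tf × idf × fieldNorm printed in each explanation. The following is a minimal sketch that reproduces those numbers, assuming the ClassicSimilarity formulas tf = √freq and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the printed values. The fieldNorm values are taken as printed rather than recomputed from field lengths, and the function names are illustrative.

```python
import math


def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity-style idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    # Term frequency factor: square root of the raw count in the field.
    return math.sqrt(freq)


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explanations above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# "rokach" occurs once in each author field; docFreq=7, maxDocs=42306.
for norm in (0.5, 0.375, 0.3125):
    print(round(field_weight(1.0, 7, 42306, norm), 6))
```

With the three fieldNorm values printed above, this yields about 4.786621, 3.589966, and 2.991638, so the differences in ranking among these author matches come entirely from fieldNorm.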
    

Similar documents (content)

  1. Tyrwhitt-Drake, B.: ¬The DNB on CD-ROM (1996) 0.11
    0.1107792 = sum of:
      0.1107792 = product of:
        0.9231601 = sum of:
          0.030921284 = weight(abstract_txt:content in 6707) [ClassicSimilarity], result of:
            0.030921284 = score(doc=6707,freq=1.0), product of:
              0.06718605 = queryWeight, product of:
                1.1276523 = boost
                4.207851 = idf(docFreq=1710, maxDocs=42306)
                0.014159358 = queryNorm
              0.4602337 = fieldWeight in 6707, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.207851 = idf(docFreq=1710, maxDocs=42306)
                0.109375 = fieldNorm(doc=6707)
          0.26925978 = weight(abstract_txt:biography in 6707) [ClassicSimilarity], result of:
            0.26925978 = score(doc=6707,freq=1.0), product of:
              0.28437304 = queryWeight, product of:
                2.3199565 = boost
                8.656952 = idf(docFreq=19, maxDocs=42306)
                0.014159358 = queryNorm
              0.9468541 = fieldWeight in 6707, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.656952 = idf(docFreq=19, maxDocs=42306)
                0.109375 = fieldNorm(doc=6707)
          0.62297904 = weight(abstract_txt:biographies in 6707) [ClassicSimilarity], result of:
            0.62297904 = score(doc=6707,freq=1.0), product of:
              0.62675774 = queryWeight, product of:
                4.8708024 = boost
                9.087735 = idf(docFreq=12, maxDocs=42306)
                0.014159358 = queryNorm
              0.99397105 = fieldWeight in 6707, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.087735 = idf(docFreq=12, maxDocs=42306)
                0.109375 = fieldNorm(doc=6707)
        0.12 = coord(3/25)
    
  2. Veelen, I. van: ¬The truth according to Wikipedia (2008) 0.09
    0.09450404 = sum of:
      0.09450404 = product of:
        0.33751443 = sum of:
          0.010291769 = weight(abstract_txt:will in 4140) [ClassicSimilarity], result of:
            0.010291769 = score(doc=4140,freq=1.0), product of:
              0.05676561 = queryWeight, product of:
                1.0365214 = boost
                3.867795 = idf(docFreq=2403, maxDocs=42306)
                0.014159358 = queryNorm
              0.18130289 = fieldWeight in 4140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.867795 = idf(docFreq=2403, maxDocs=42306)
                0.046875 = fieldNorm(doc=4140)
          0.012939502 = weight(abstract_txt:time in 4140) [ClassicSimilarity], result of:
            0.012939502 = score(doc=4140,freq=1.0), product of:
              0.066125706 = queryWeight, product of:
                1.1187184 = boost
                4.1745143 = idf(docFreq=1768, maxDocs=42306)
                0.014159358 = queryNorm
              0.19568035 = fieldWeight in 4140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1745143 = idf(docFreq=1768, maxDocs=42306)
                0.046875 = fieldNorm(doc=4140)
          0.018741129 = weight(abstract_txt:content in 4140) [ClassicSimilarity], result of:
            0.018741129 = score(doc=4140,freq=2.0), product of:
              0.06718605 = queryWeight, product of:
                1.1276523 = boost
                4.207851 = idf(docFreq=1710, maxDocs=42306)
                0.014159358 = queryNorm
              0.27894375 = fieldWeight in 4140, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.207851 = idf(docFreq=1710, maxDocs=42306)
                0.046875 = fieldNorm(doc=4140)
          0.0067727054 = weight(abstract_txt:which in 4140) [ClassicSimilarity], result of:
            0.0067727054 = score(doc=4140,freq=1.0), product of:
              0.04916211 = queryWeight, product of:
                1.1813987 = boost
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.014159358 = queryNorm
              0.13776271 = fieldWeight in 4140, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.046875 = fieldNorm(doc=4140)
          0.03136685 = weight(abstract_txt:author in 4140) [ClassicSimilarity], result of:
            0.03136685 = score(doc=4140,freq=2.0), product of:
              0.094710015 = queryWeight, product of:
                1.3388551 = boost
                4.995958 = idf(docFreq=777, maxDocs=42306)
                0.014159358 = queryNorm
              0.33118832 = fieldWeight in 4140, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.995958 = idf(docFreq=777, maxDocs=42306)
                0.046875 = fieldNorm(doc=4140)
          0.057183787 = weight(abstract_txt:people in 4140) [ClassicSimilarity], result of:
            0.057183787 = score(doc=4140,freq=3.0), product of:
              0.14133961 = queryWeight, product of:
                2.0031488 = boost
                4.9831862 = idf(docFreq=787, maxDocs=42306)
                0.014159358 = queryNorm
              0.4045843 = fieldWeight in 4140, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.9831862 = idf(docFreq=787, maxDocs=42306)
                0.046875 = fieldNorm(doc=4140)
          0.20021869 = weight(abstract_txt:wikipedia in 4140) [ClassicSimilarity], result of:
            0.20021869 = score(doc=4140,freq=9.0), product of:
              0.2259668 = queryWeight, product of:
                2.5328157 = boost
                6.300826 = idf(docFreq=210, maxDocs=42306)
                0.014159358 = queryNorm
              0.8860536 = fieldWeight in 4140, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.300826 = idf(docFreq=210, maxDocs=42306)
                0.046875 = fieldNorm(doc=4140)
        0.28 = coord(7/25)
    
  3. Bounhas, I.; Elayeb, B.; Evrard, F.; Slimani, Y.: Toward a computer study of the reliability of Arabic stories (2010) 0.07
    0.07232381 = sum of:
      0.07232381 = product of:
        0.6026984 = sum of:
          0.009030274 = weight(abstract_txt:which in 710) [ClassicSimilarity], result of:
            0.009030274 = score(doc=710,freq=1.0), product of:
              0.04916211 = queryWeight, product of:
                1.1813987 = boost
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.014159358 = queryNorm
              0.18368362 = fieldWeight in 710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.0625 = fieldNorm(doc=710)
          0.09022505 = weight(abstract_txt:classifier in 710) [ClassicSimilarity], result of:
            0.09022505 = score(doc=710,freq=1.0), product of:
              0.19922823 = queryWeight, product of:
                1.9418294 = boost
                7.245965 = idf(docFreq=81, maxDocs=42306)
                0.014159358 = queryNorm
              0.4528728 = fieldWeight in 710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.245965 = idf(docFreq=81, maxDocs=42306)
                0.0625 = fieldNorm(doc=710)
          0.50344306 = weight(abstract_txt:biographies in 710) [ClassicSimilarity], result of:
            0.50344306 = score(doc=710,freq=2.0), product of:
              0.62675774 = queryWeight, product of:
                4.8708024 = boost
                9.087735 = idf(docFreq=12, maxDocs=42306)
                0.014159358 = queryNorm
              0.8032499 = fieldWeight in 710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.087735 = idf(docFreq=12, maxDocs=42306)
                0.0625 = fieldNorm(doc=710)
        0.12 = coord(3/25)
    
  4. Fallis, D.: Toward an epistemology of Wikipedia (2008) 0.07
    0.06844185 = sum of:
      0.06844185 = product of:
        0.42776158 = sum of:
          0.012007064 = weight(abstract_txt:will in 4011) [ClassicSimilarity], result of:
            0.012007064 = score(doc=4011,freq=1.0), product of:
              0.05676561 = queryWeight, product of:
                1.0365214 = boost
                3.867795 = idf(docFreq=2403, maxDocs=42306)
                0.014159358 = queryNorm
              0.21152005 = fieldWeight in 4011, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.867795 = idf(docFreq=2403, maxDocs=42306)
                0.0546875 = fieldNorm(doc=4011)
          0.048888154 = weight(abstract_txt:encyclopedias in 4011) [ClassicSimilarity], result of:
            0.048888154 = score(doc=4011,freq=1.0), product of:
              0.11488231 = queryWeight, product of:
                1.0426708 = boost
                7.781483 = idf(docFreq=47, maxDocs=42306)
                0.014159358 = queryNorm
              0.42554986 = fieldWeight in 4011, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.781483 = idf(docFreq=47, maxDocs=42306)
                0.0546875 = fieldNorm(doc=4011)
          0.086127944 = weight(abstract_txt:people in 4011) [ClassicSimilarity], result of:
            0.086127944 = score(doc=4011,freq=5.0), product of:
              0.14133961 = queryWeight, product of:
                2.0031488 = boost
                4.9831862 = idf(docFreq=787, maxDocs=42306)
                0.014159358 = queryNorm
              0.60936874 = fieldWeight in 4011, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.9831862 = idf(docFreq=787, maxDocs=42306)
                0.0546875 = fieldNorm(doc=4011)
          0.2807384 = weight(abstract_txt:wikipedia in 4011) [ClassicSimilarity], result of:
            0.2807384 = score(doc=4011,freq=13.0), product of:
              0.2259668 = queryWeight, product of:
                2.5328157 = boost
                6.300826 = idf(docFreq=210, maxDocs=42306)
                0.014159358 = queryNorm
              1.2423879 = fieldWeight in 4011, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                6.300826 = idf(docFreq=210, maxDocs=42306)
                0.0546875 = fieldNorm(doc=4011)
        0.16 = coord(4/25)
    
  5. Kubiszewski, I.; Cleveland, C.J.: ¬The Encyclopedia of Earth (2007) 0.06
    0.058641355 = sum of:
      0.058641355 = product of:
        0.29320678 = sum of:
          0.019177578 = weight(abstract_txt:will in 3171) [ClassicSimilarity], result of:
            0.019177578 = score(doc=3171,freq=5.0), product of:
              0.05676561 = queryWeight, product of:
                1.0365214 = boost
                3.867795 = idf(docFreq=2403, maxDocs=42306)
                0.014159358 = queryNorm
              0.337838 = fieldWeight in 3171, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.867795 = idf(docFreq=2403, maxDocs=42306)
                0.0390625 = fieldNorm(doc=3171)
          0.024693605 = weight(abstract_txt:content in 3171) [ClassicSimilarity], result of:
            0.024693605 = score(doc=3171,freq=5.0), product of:
              0.06718605 = queryWeight, product of:
                1.1276523 = boost
                4.207851 = idf(docFreq=1710, maxDocs=42306)
                0.014159358 = queryNorm
              0.36754066 = fieldWeight in 3171, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.207851 = idf(docFreq=1710, maxDocs=42306)
                0.0390625 = fieldNorm(doc=3171)
          0.0056439214 = weight(abstract_txt:which in 3171) [ClassicSimilarity], result of:
            0.0056439214 = score(doc=3171,freq=1.0), product of:
              0.04916211 = queryWeight, product of:
                1.1813987 = boost
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.014159358 = queryNorm
              0.11480226 = fieldWeight in 3171, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.0390625 = fieldNorm(doc=3171)
          0.021199152 = weight(abstract_txt:features in 3171) [ClassicSimilarity], result of:
            0.021199152 = score(doc=3171,freq=1.0), product of:
              0.11879245 = queryWeight, product of:
                1.8364356 = boost
                4.5684576 = idf(docFreq=1192, maxDocs=42306)
                0.014159358 = queryNorm
              0.17845538 = fieldWeight in 3171, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5684576 = idf(docFreq=1192, maxDocs=42306)
                0.0390625 = fieldNorm(doc=3171)
          0.22249252 = weight(abstract_txt:biographies in 3171) [ClassicSimilarity], result of:
            0.22249252 = score(doc=3171,freq=1.0), product of:
              0.62675774 = queryWeight, product of:
                4.8708024 = boost
                9.087735 = idf(docFreq=12, maxDocs=42306)
                0.014159358 = queryNorm
              0.35498965 = fieldWeight in 3171, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.087735 = idf(docFreq=12, maxDocs=42306)
                0.0390625 = fieldNorm(doc=3171)
        0.2 = coord(5/25)
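The content-similarity scores combine more factors: each matched term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm. The per-term contributions are summed, and the sum is multiplied by coord, the fraction of the 25 query terms that matched. The sketch below recomputes entry 1 (doc 6707) from the numbers printed in its explanation. It assumes the same tf and idf formulas as the printed output and takes boost, queryNorm, and fieldNorm as given. The helper names are hypothetical.

```python
import math


def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity-style idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # One matched term: queryWeight * fieldWeight.
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


def doc_score(matches, max_docs, query_norm, field_norm, n_query_terms):
    # matches: list of (freq, doc_freq, boost) for the query terms found in the doc.
    total = sum(
        term_score(freq, df, max_docs, boost, query_norm, field_norm)
        for freq, df, boost in matches
    )
    coord = len(matches) / n_query_terms  # e.g. coord(3/25) = 0.12
    return total * coord


# Doc 6707 matched "content", "biography", and "biographies" (3 of 25 terms).
matches_6707 = [(1.0, 1710, 1.1276523), (1.0, 19, 2.3199565), (1.0, 12, 4.8708024)]
print(round(doc_score(matches_6707, 42306, 0.014159358, 0.109375, 25), 7))
```

This gives approximately 0.1107792, matching the listed score. The highly boosted, rare term "biographies" contributes about two thirds of the pre-coord sum, which helps explain why a short record about a biographical dictionary ranks first.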