Document (#38611)

Author
Ofek, N.
Rokach, L.
Title
A classifier to determine which Wikipedia biographies will be accepted
Source
Journal of the Association for Information Science and Technology. 66(2015) no.1, pp.213-218
Year
2015
Series
Brief communication
Abstract
Wikipedia, like other encyclopedias, includes biographies of notable people. However, because it is jointly written by many contributors, it is subject to constant manipulation by contributors attempting to add biographies of non-notable people. Over time, Wikipedia has developed inclusion criteria for notable people (e.g., receiving a significant award) based on which newly contributed biographies are evaluated. In this paper we present and analyze a set of simple indicators that can be used to predict which article will eventually be accepted. These indicators do not refer to the content itself, but to meta-content features (such as the number of categories that the biography is associated with) and to author-based features (such as if it is a first-time author). By training a classifier on these features, we successfully reached a high predictive performance (area under the receiver operating characteristic [ROC] curve [AUC] of 0.97) even though we overlooked the actual biography text.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23199/abstract.
Theme
Informationsmittel (information resources)
Object
Wikipedia

Similar documents (author)

  1. Rokach, L.; Mitra, P.: Parsimonious citer-based measures : the artificial intelligence domain as a case study (2013) 4.81
    4.808723 = sum of:
      4.808723 = weight(author_txt:rokach in 212) [ClassicSimilarity], result of:
        4.808723 = fieldWeight in 212, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.5 = fieldNorm(doc=212)
    
  2. Blank, I.; Rokach, L.; Shani, G.: Leveraging metadata to recommend keywords for academic papers (2016) 3.61
    3.606542 = sum of:
      3.606542 = weight(author_txt:rokach in 3232) [ClassicSimilarity], result of:
        3.606542 = fieldWeight in 3232, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.375 = fieldNorm(doc=3232)
    
  3. Greenstein-Messica, A.; Rokach, L.; Shabtai, A.: Personal-discount sensitivity prediction for mobile coupon conversion optimization (2017) 3.61
    3.606542 = sum of:
      3.606542 = weight(author_txt:rokach in 3751) [ClassicSimilarity], result of:
        3.606542 = fieldWeight in 3751, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.375 = fieldNorm(doc=3751)
    
  4. Greenstein-Messica, A.; Rokach, L.; Shabtai, A.: Personal-discount sensitivity prediction for mobile coupon conversion optimization (2017) 3.61
    3.606542 = sum of:
      3.606542 = weight(author_txt:rokach in 3761) [ClassicSimilarity], result of:
        3.606542 = fieldWeight in 3761, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.375 = fieldNorm(doc=3761)
    
  5. Rokach, L.; Kalech, M.; Blank, I.; Stern, R.: Who is going to win the next Association for the Advancement of Artificial Intelligence Fellowship Award? : evaluating researchers by mining bibliographic data (2011) 3.01
    3.005452 = sum of:
      3.005452 = weight(author_txt:rokach in 4945) [ClassicSimilarity], result of:
        3.005452 = fieldWeight in 4945, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.3125 = fieldNorm(doc=4945)
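
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the TF-IDF definitions of older Lucene ClassicSimilarity versions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); that idf form is inferred here because it reproduces the printed value 9.617446. The function names are hypothetical and are not part of Lucene.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency (assumed form)."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)) (assumed form)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the score trees above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing: author term "rokach", docFreq=7, maxDocs=44218.
for doc_id, norm in [(212, 0.5), (3232, 0.375), (4945, 0.3125)]:
    print(doc_id, field_weight(1.0, 7, 44218, norm))
```

All five hits share the same tf and idf, so the ranking among them is decided by fieldNorm alone, which is larger for records with shorter author fields.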
    

Similar documents (content)

  1. Tyrwhitt-Drake, B.: The DNB on CD-ROM (1996) 0.11
    0.10847274 = sum of:
      0.10847274 = product of:
        0.90393955 = sum of:
          0.030965146 = weight(abstract_txt:content in 6638) [ClassicSimilarity], result of:
            0.030965146 = score(doc=6638,freq=1.0), product of:
              0.0677311 = queryWeight, product of:
                1.1135688 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.014551378 = queryNorm
              0.45717767 = fieldWeight in 6638, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.109375 = fieldNorm(doc=6638)
          0.27024198 = weight(abstract_txt:biography in 6638) [ClassicSimilarity], result of:
            0.27024198 = score(doc=6638,freq=1.0), product of:
              0.28710532 = queryWeight, product of:
                2.292681 = boost
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.014551378 = queryNorm
              0.9412643 = fieldWeight in 6638, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.109375 = fieldNorm(doc=6638)
          0.6027324 = weight(abstract_txt:biographies in 6638) [ClassicSimilarity], result of:
            0.6027324 = score(doc=6638,freq=1.0), product of:
              0.6174935 = queryWeight, product of:
                4.75504 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.014551378 = queryNorm
              0.97609514 = fieldWeight in 6638, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.109375 = fieldNorm(doc=6638)
        0.12 = coord(3/25)
    
  2. Veelen, I. van: The truth according to Wikipedia (2008) 0.09
    0.09472618 = sum of:
      0.09472618 = product of:
        0.3383078 = sum of:
          0.010461616 = weight(abstract_txt:will in 2139) [ClassicSimilarity], result of:
            0.010461616 = score(doc=2139,freq=1.0), product of:
              0.05779937 = queryWeight, product of:
                1.0286901 = boost
                3.8613079 = idf(docFreq=2528, maxDocs=44218)
                0.014551378 = queryNorm
              0.1809988 = fieldWeight in 2139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8613079 = idf(docFreq=2528, maxDocs=44218)
                0.046875 = fieldNorm(doc=2139)
          0.012972264 = weight(abstract_txt:time in 2139) [ClassicSimilarity], result of:
            0.012972264 = score(doc=2139,freq=1.0), product of:
              0.06671155 = queryWeight, product of:
                1.1051558 = boost
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.014551378 = queryNorm
              0.19445303 = fieldWeight in 2139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.148331 = idf(docFreq=1897, maxDocs=44218)
                0.046875 = fieldNorm(doc=2139)
          0.01876771 = weight(abstract_txt:content in 2139) [ClassicSimilarity], result of:
            0.01876771 = score(doc=2139,freq=2.0), product of:
              0.0677311 = queryWeight, product of:
                1.1135688 = boost
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.014551378 = queryNorm
              0.2770915 = fieldWeight in 2139, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.17991 = idf(docFreq=1838, maxDocs=44218)
                0.046875 = fieldNorm(doc=2139)
          0.006763455 = weight(abstract_txt:which in 2139) [ClassicSimilarity], result of:
            0.006763455 = score(doc=2139,freq=1.0), product of:
              0.049469028 = queryWeight, product of:
                1.1655618 = boost
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.014551378 = queryNorm
              0.136721 = fieldWeight in 2139, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.046875 = fieldNorm(doc=2139)
          0.03169876 = weight(abstract_txt:author in 2139) [ClassicSimilarity], result of:
            0.03169876 = score(doc=2139,freq=2.0), product of:
              0.09605989 = queryWeight, product of:
                1.3261542 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.014551378 = queryNorm
              0.32998955 = fieldWeight in 2139, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.046875 = fieldNorm(doc=2139)
          0.05631807 = weight(abstract_txt:people in 2139) [ClassicSimilarity], result of:
            0.05631807 = score(doc=2139,freq=3.0), product of:
              0.14091128 = queryWeight, product of:
                1.9671683 = boost
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.014551378 = queryNorm
              0.39967042 = fieldWeight in 2139, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.046875 = fieldNorm(doc=2139)
          0.20132591 = weight(abstract_txt:wikipedia in 2139) [ClassicSimilarity], result of:
            0.20132591 = score(doc=2139,freq=9.0), product of:
              0.22842304 = queryWeight, product of:
                2.5046012 = boost
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.014551378 = queryNorm
              0.881373 = fieldWeight in 2139, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.046875 = fieldNorm(doc=2139)
        0.28 = coord(7/25)
    
  3. Bounhas, I.; Elayeb, B.; Evrard, F.; Slimani, Y.: Toward a computer study of the reliability of Arabic stories (2010) 0.07
    0.07057749 = sum of:
      0.07057749 = product of:
        0.58814573 = sum of:
          0.00901794 = weight(abstract_txt:which in 3709) [ClassicSimilarity], result of:
            0.00901794 = score(doc=3709,freq=1.0), product of:
              0.049469028 = queryWeight, product of:
                1.1655618 = boost
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.014551378 = queryNorm
              0.18229467 = fieldWeight in 3709, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.0625 = fieldNorm(doc=3709)
          0.09204643 = weight(abstract_txt:classifier in 3709) [ClassicSimilarity], result of:
            0.09204643 = score(doc=3709,freq=1.0), product of:
              0.2033462 = queryWeight, product of:
                1.9294833 = boost
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.014551378 = queryNorm
              0.45265874 = fieldWeight in 3709, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.24254 = idf(docFreq=85, maxDocs=44218)
                0.0625 = fieldNorm(doc=3709)
          0.48708135 = weight(abstract_txt:biographies in 3709) [ClassicSimilarity], result of:
            0.48708135 = score(doc=3709,freq=2.0), product of:
              0.6174935 = queryWeight, product of:
                4.75504 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.014551378 = queryNorm
              0.788804 = fieldWeight in 3709, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.0625 = fieldNorm(doc=3709)
        0.12 = coord(3/25)
    
  4. Fallis, D.: Toward an epistemology of Wikipedia (2008) 0.07
    0.06875544 = sum of:
      0.06875544 = product of:
        0.42972153 = sum of:
          0.01220522 = weight(abstract_txt:will in 2010) [ClassicSimilarity], result of:
            0.01220522 = score(doc=2010,freq=1.0), product of:
              0.05779937 = queryWeight, product of:
                1.0286901 = boost
                3.8613079 = idf(docFreq=2528, maxDocs=44218)
                0.014551378 = queryNorm
              0.21116528 = fieldWeight in 2010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8613079 = idf(docFreq=2528, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2010)
          0.050401356 = weight(abstract_txt:encyclopedias in 2010) [ClassicSimilarity], result of:
            0.050401356 = score(doc=2010,freq=1.0), product of:
              0.11808032 = queryWeight, product of:
                1.039673 = boost
                7.805067 = idf(docFreq=48, maxDocs=44218)
                0.014551378 = queryNorm
              0.4268396 = fieldWeight in 2010, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.805067 = idf(docFreq=48, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2010)
          0.08482405 = weight(abstract_txt:people in 2010) [ClassicSimilarity], result of:
            0.08482405 = score(doc=2010,freq=5.0), product of:
              0.14091128 = queryWeight, product of:
                1.9671683 = boost
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.014551378 = queryNorm
              0.60196775 = fieldWeight in 2010, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2010)
          0.2822909 = weight(abstract_txt:wikipedia in 2010) [ClassicSimilarity], result of:
            0.2822909 = score(doc=2010,freq=13.0), product of:
              0.22842304 = queryWeight, product of:
                2.5046012 = boost
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.014551378 = queryNorm
              1.235825 = fieldWeight in 2010, product of:
                3.6055512 = tf(freq=13.0), with freq of:
                  13.0 = termFreq=13.0
                6.2675414 = idf(docFreq=227, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2010)
        0.16 = coord(4/25)
    
  5. Hyvönen, E.; Leskinen, P.; Tamper, M.; Keravuori, K.; Rantala, H.; Ikkala, E.; Tuominen, J.: BiographySampo - publishing and enriching biographies on the Semantic Web for digital humanities research (2019) 0.06
    0.0639904 = sum of:
      0.0639904 = product of:
        0.79988 = sum of:
          0.054192092 = weight(abstract_txt:people in 5799) [ClassicSimilarity], result of:
            0.054192092 = score(doc=5799,freq=1.0), product of:
              0.14091128 = queryWeight, product of:
                1.9671683 = boost
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.014551378 = queryNorm
              0.38458306 = fieldWeight in 5799, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.922663 = idf(docFreq=874, maxDocs=44218)
                0.078125 = fieldNorm(doc=5799)
          0.74568796 = weight(abstract_txt:biographies in 5799) [ClassicSimilarity], result of:
            0.74568796 = score(doc=5799,freq=3.0), product of:
              0.6174935 = queryWeight, product of:
                4.75504 = boost
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.014551378 = queryNorm
              1.2076045 = fieldWeight in 5799, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                8.924298 = idf(docFreq=15, maxDocs=44218)
                0.078125 = fieldNorm(doc=5799)
        0.08 = coord(2/25)
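
The content-match scores above combine several matching terms per hit. As a rough sketch under the same assumptions about ClassicSimilarity (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), each term contributes queryWeight * fieldWeight, and the sum is scaled by coord, the fraction of the 25 query terms that matched. The helper names are hypothetical; the boosts, frequencies, document frequencies, norms and queryNorm are copied from the listing.

```python
import math

QUERY_NORM = 0.014551378  # queryNorm as printed in the listing
MAX_DOCS = 44218          # maxDocs as printed in the listing
QUERY_TERMS = 25          # denominator of every coord(n/25) line


def idf(doc_freq: int) -> float:
    """Assumed idf form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm: float) -> float:
    """Sum the per-term scores, then apply coord = matched terms / query terms."""
    total = sum(term_score(boost, freq, df, field_norm) for boost, freq, df in terms)
    return total * len(terms) / QUERY_TERMS


# Hit 1 (doc 6638): (boost, freq, docFreq) for content, biography, biographies.
dnb_terms = [(1.1135688, 1.0, 1838), (2.292681, 1.0, 21), (4.75504, 1.0, 15)]
print(document_score(dnb_terms, 0.109375))

# Hit 5 (doc 5799): people, biographies.
sampo_terms = [(1.9671683, 1.0, 874), (4.75504, 3.0, 15)]
print(document_score(sampo_terms, 0.078125))
```

The coord factor explains why hit 2, with seven matching terms, outranks hits 3 to 5 even though its per-term sum (0.3383078) is the smallest in the list.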