Document (#42507)

Author
Wang, P.
Li, X.
Title
Assessing the quality of information on Wikipedia : a deep-learning approach
Source
Journal of the Association for Information Science and Technology. 71(2020) no.1, pp. 16-28
Year
2020
Abstract
Currently, web document repositories have been collaboratively created and edited. One of these repositories, Wikipedia, is facing an important problem: assessing the quality of Wikipedia. Existing approaches exploit techniques such as statistical models or machine learning algorithms to assess Wikipedia article quality. However, existing models do not provide satisfactory results. Furthermore, these models fail to adopt a comprehensive feature framework. In this article, we conduct an extensive survey of previous studies and summarize a comprehensive feature framework, including text statistics, writing style, readability, article structure, network, and editing history. Selected state-of-the-art deep-learning models, including the convolutional neural network (CNN), deep neural network (DNN), long short-term memory (LSTMs) network, CNN-LSTMs, bidirectional LSTMs, and stacked LSTMs, are applied to assess the quality of Wikipedia. A detailed comparison of deep-learning models is conducted with regard to different aspects: classification performance and training performance. We include an importance analysis of different features and feature sets to determine which features or feature sets are most effective in distinguishing Wikipedia article quality. This extensive experiment validates the effectiveness of the proposed model.
Content
Cf.: https://asistdl.onlinelibrary.wiley.com/doi/10.1002/asi.24210.
Theme
Information resources
Object
Wikipedia

Similar documents (author)

  1. Wang, H.; Wang, C.: Ontologies for universal information systems (1995) 4.77
    4.7679567 = sum of:
      4.7679567 = weight(author_txt:wang in 3263) [ClassicSimilarity], result of:
        4.7679567 = fieldWeight in 3263, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          6.7429094 = idf(docFreq=136, maxDocs=42740)
          0.5 = fieldNorm(doc=3263)
    
  2. Wang, C.: ¬The online catalogue, subject access and user reactions : a review (1985) 4.21
    4.2143183 = sum of:
      4.2143183 = weight(author_txt:wang in 986) [ClassicSimilarity], result of:
        4.2143183 = fieldWeight in 986, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.7429094 = idf(docFreq=136, maxDocs=42740)
          0.625 = fieldNorm(doc=986)
    
  3. Wang, C.: Bibliometrics : a textbook (1990) 4.21
    4.2143183 = sum of:
      4.2143183 = weight(author_txt:wang in 5109) [ClassicSimilarity], result of:
        4.2143183 = fieldWeight in 5109, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.7429094 = idf(docFreq=136, maxDocs=42740)
          0.625 = fieldNorm(doc=5109)
    
  4. Wang, P.: Users' information needs at different stages of a research project : a cognitive view (1997) 4.21
    4.2143183 = sum of:
      4.2143183 = weight(author_txt:wang in 1321) [ClassicSimilarity], result of:
        4.2143183 = fieldWeight in 1321, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.7429094 = idf(docFreq=136, maxDocs=42740)
          0.625 = fieldNorm(doc=1321)
    
  5. Wang, D.: Cataloger appraises keyword searching in WorldCat (1997) 4.21
    4.2143183 = sum of:
      4.2143183 = weight(author_txt:wang in 2443) [ClassicSimilarity], result of:
        4.2143183 = fieldWeight in 2443, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          6.7429094 = idf(docFreq=136, maxDocs=42740)
          0.625 = fieldNorm(doc=2443)
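
The author scores above can be reproduced by hand. The following is a minimal sketch in Python, not the search engine's own code. It assumes that tf is the square root of the term frequency and that idf is 1 + ln(maxDocs / (docFreq + 1)), which is the form that matches the idf value printed in the explanations. The function names classic_idf and field_weight are illustrative only.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf form: 1 + ln(maxDocs / (docFreq + 1)).
    # It reproduces idf(docFreq=136, maxDocs=42740) from the listing to about 1e-6.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq).
    tf = math.sqrt(freq)
    return tf * classic_idf(doc_freq, max_docs) * field_norm


# Entry 1 (doc=3263): "wang" occurs twice in the author field, fieldNorm 0.5.
entry_1 = field_weight(2.0, 136, 42740, 0.5)
# Entry 5 (doc=2443): "wang" occurs once, fieldNorm 0.625.
entry_5 = field_weight(1.0, 136, 42740, 0.625)

print(f"entry 1: {entry_1:.4f}")
print(f"entry 5: {entry_5:.4f}")
```

Entry 1 outranks entries 2 to 5 because the author term occurs twice in it (tf of about 1.414 instead of 1.0), which more than offsets its lower fieldNorm of 0.5 against 0.625.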
    

Similar documents (content)

  1. Huang, H.-H.; Wang, J.-J.; Chen, H.-H.: Implicit opinion analysis : extraction and polarity labelling (2017) 0.22
    0.21752104 = sum of:
      0.21752104 = product of:
        0.9063377 = sum of:
          0.1599592 = weight(abstract_txt:convolutional in 5821) [ClassicSimilarity], result of:
            0.1599592 = score(doc=5821,freq=1.0), product of:
              0.18228373 = queryWeight, product of:
                1.1545184 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.016867744 = queryNorm
              0.87752867 = fieldWeight in 5821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.09375 = fieldNorm(doc=5821)
          0.19151784 = weight(abstract_txt:neural in 5821) [ClassicSimilarity], result of:
            0.19151784 = score(doc=5821,freq=2.0), product of:
              0.20553285 = queryWeight, product of:
                1.7337343 = boost
                7.0281615 = idf(docFreq=102, maxDocs=42740)
                0.016867744 = queryNorm
              0.9318114 = fieldWeight in 5821, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0281615 = idf(docFreq=102, maxDocs=42740)
                0.09375 = fieldNorm(doc=5821)
          0.06501495 = weight(abstract_txt:learning in 5821) [ClassicSimilarity], result of:
            0.06501495 = score(doc=5821,freq=1.0), product of:
              0.14425282 = queryWeight, product of:
                1.7788924 = boost
                4.807482 = idf(docFreq=948, maxDocs=42740)
                0.016867744 = queryNorm
              0.45070142 = fieldWeight in 5821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.807482 = idf(docFreq=948, maxDocs=42740)
                0.09375 = fieldNorm(doc=5821)
          0.11047823 = weight(abstract_txt:network in 5821) [ClassicSimilarity], result of:
            0.11047823 = score(doc=5821,freq=2.0), product of:
              0.17944701 = queryWeight, product of:
                2.2909997 = boost
                4.643594 = idf(docFreq=1117, maxDocs=42740)
                0.016867744 = queryNorm
              0.61565936 = fieldWeight in 5821, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.643594 = idf(docFreq=1117, maxDocs=42740)
                0.09375 = fieldNorm(doc=5821)
          0.1424763 = weight(abstract_txt:models in 5821) [ClassicSimilarity], result of:
            0.1424763 = score(doc=5821,freq=2.0), product of:
              0.22902533 = queryWeight, product of:
                2.893701 = boost
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.016867744 = queryNorm
              0.62209845 = fieldWeight in 5821, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.09375 = fieldNorm(doc=5821)
          0.23689114 = weight(abstract_txt:deep in 5821) [ClassicSimilarity], result of:
            0.23689114 = score(doc=5821,freq=1.0), product of:
              0.37594786 = queryWeight, product of:
                3.3160472 = boost
                6.721248 = idf(docFreq=139, maxDocs=42740)
                0.016867744 = queryNorm
              0.630117 = fieldWeight in 5821, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.721248 = idf(docFreq=139, maxDocs=42740)
                0.09375 = fieldNorm(doc=5821)
        0.24 = coord(6/25)
    
  2. Jiang, Y.; Zhang, X.; Tang, Y.; Nie, R.: Feature-based approaches to semantic similarity assessment of concepts using Wikipedia (2015) 0.21
    0.20767853 = sum of:
      0.20767853 = product of:
        0.86532724 = sum of:
          0.03617919 = weight(abstract_txt:framework in 4683) [ClassicSimilarity], result of:
            0.03617919 = score(doc=4683,freq=2.0), product of:
              0.08866968 = queryWeight, product of:
                1.1387528 = boost
                4.6162434 = idf(docFreq=1148, maxDocs=42740)
                0.016867744 = queryNorm
              0.4080221 = fieldWeight in 4683, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6162434 = idf(docFreq=1148, maxDocs=42740)
                0.0625 = fieldNorm(doc=4683)
          0.02699531 = weight(abstract_txt:existing in 4683) [ClassicSimilarity], result of:
            0.02699531 = score(doc=4683,freq=1.0), product of:
              0.09190479 = queryWeight, product of:
                1.1593404 = boost
                4.6997004 = idf(docFreq=1056, maxDocs=42740)
                0.016867744 = queryNorm
              0.29373127 = fieldWeight in 4683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6997004 = idf(docFreq=1056, maxDocs=42740)
                0.0625 = fieldNorm(doc=4683)
          0.055311706 = weight(abstract_txt:assess in 4683) [ClassicSimilarity], result of:
            0.055311706 = score(doc=4683,freq=1.0), product of:
              0.14825998 = queryWeight, product of:
                1.4724952 = boost
                5.969158 = idf(docFreq=296, maxDocs=42740)
                0.016867744 = queryNorm
              0.3730724 = fieldWeight in 4683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.969158 = idf(docFreq=296, maxDocs=42740)
                0.0625 = fieldNorm(doc=4683)
          0.05207994 = weight(abstract_txt:network in 4683) [ClassicSimilarity], result of:
            0.05207994 = score(doc=4683,freq=1.0), product of:
              0.17944701 = queryWeight, product of:
                2.2909997 = boost
                4.643594 = idf(docFreq=1117, maxDocs=42740)
                0.016867744 = queryNorm
              0.2902246 = fieldWeight in 4683, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.643594 = idf(docFreq=1117, maxDocs=42740)
                0.0625 = fieldNorm(doc=4683)
          0.2186666 = weight(abstract_txt:feature in 4683) [ClassicSimilarity], result of:
            0.2186666 = score(doc=4683,freq=4.0), product of:
              0.29421008 = queryWeight, product of:
                2.9334972 = boost
                5.945863 = idf(docFreq=303, maxDocs=42740)
                0.016867744 = queryNorm
              0.74323285 = fieldWeight in 4683, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.945863 = idf(docFreq=303, maxDocs=42740)
                0.0625 = fieldNorm(doc=4683)
          0.47609448 = weight(abstract_txt:wikipedia in 4683) [ClassicSimilarity], result of:
            0.47609448 = score(doc=4683,freq=6.0), product of:
              0.49423257 = queryWeight, product of:
                4.6565924 = boost
                6.2922525 = idf(docFreq=214, maxDocs=42740)
                0.016867744 = queryNorm
              0.9633005 = fieldWeight in 4683, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.2922525 = idf(docFreq=214, maxDocs=42740)
                0.0625 = fieldNorm(doc=4683)
        0.24 = coord(6/25)
    
  3. Mao, J.; Xu, W.; Yang, Y.; Wang, J.; Yuille, A.L.: Explain images with multimodal recurrent neural networks (2014) 0.19
    0.19231656 = sum of:
      0.19231656 = product of:
        0.801319 = sum of:
          0.032531165 = weight(abstract_txt:performance in 3558) [ClassicSimilarity], result of:
            0.032531165 = score(doc=3558,freq=1.0), product of:
              0.08968896 = queryWeight, product of:
                1.1452792 = boost
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.016867744 = queryNorm
              0.36271092 = fieldWeight in 3558, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.078125 = fieldNorm(doc=3558)
          0.13329934 = weight(abstract_txt:convolutional in 3558) [ClassicSimilarity], result of:
            0.13329934 = score(doc=3558,freq=1.0), product of:
              0.18228373 = queryWeight, product of:
                1.1545184 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.016867744 = queryNorm
              0.7312739 = fieldWeight in 3558, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.078125 = fieldNorm(doc=3558)
          0.15959822 = weight(abstract_txt:neural in 3558) [ClassicSimilarity], result of:
            0.15959822 = score(doc=3558,freq=2.0), product of:
              0.20553285 = queryWeight, product of:
                1.7337343 = boost
                7.0281615 = idf(docFreq=102, maxDocs=42740)
                0.016867744 = queryNorm
              0.7765095 = fieldWeight in 3558, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0281615 = idf(docFreq=102, maxDocs=42740)
                0.078125 = fieldNorm(doc=3558)
          0.112756364 = weight(abstract_txt:network in 3558) [ClassicSimilarity], result of:
            0.112756364 = score(doc=3558,freq=3.0), product of:
              0.17944701 = queryWeight, product of:
                2.2909997 = boost
                4.643594 = idf(docFreq=1117, maxDocs=42740)
                0.016867744 = queryNorm
              0.62835467 = fieldWeight in 3558, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.643594 = idf(docFreq=1117, maxDocs=42740)
                0.078125 = fieldNorm(doc=3558)
          0.083954975 = weight(abstract_txt:models in 3558) [ClassicSimilarity], result of:
            0.083954975 = score(doc=3558,freq=1.0), product of:
              0.22902533 = queryWeight, product of:
                2.893701 = boost
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.016867744 = queryNorm
              0.36657506 = fieldWeight in 3558, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.078125 = fieldNorm(doc=3558)
          0.2791789 = weight(abstract_txt:deep in 3558) [ClassicSimilarity], result of:
            0.2791789 = score(doc=3558,freq=2.0), product of:
              0.37594786 = queryWeight, product of:
                3.3160472 = boost
                6.721248 = idf(docFreq=139, maxDocs=42740)
                0.016867744 = queryNorm
              0.7426 = fieldWeight in 3558, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.721248 = idf(docFreq=139, maxDocs=42740)
                0.078125 = fieldNorm(doc=3558)
        0.24 = coord(6/25)
    
  4. Arazy, O.; Yeo, L.; Nov, O.: Stay on the Wikipedia task : when task-related disagreements slip into personal and procedural conflicts (2013) 0.19
    0.18516989 = sum of:
      0.18516989 = product of:
        0.66132104 = sum of:
          0.09800098 = weight(abstract_txt:collaboratively in 3007) [ClassicSimilarity], result of:
            0.09800098 = score(doc=3007,freq=2.0), product of:
              0.13675594 = queryWeight, product of:
                8.107542 = idf(docFreq=34, maxDocs=42740)
                0.016867744 = queryNorm
              0.7166122 = fieldWeight in 3007, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.107542 = idf(docFreq=34, maxDocs=42740)
                0.0625 = fieldNorm(doc=3007)
          0.021874981 = weight(abstract_txt:including in 3007) [ClassicSimilarity], result of:
            0.021874981 = score(doc=3007,freq=1.0), product of:
              0.07988116 = queryWeight, product of:
                1.0808467 = boost
                4.381505 = idf(docFreq=1452, maxDocs=42740)
                0.016867744 = queryNorm
              0.27384406 = fieldWeight in 3007, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.381505 = idf(docFreq=1452, maxDocs=42740)
                0.0625 = fieldNorm(doc=3007)
          0.03680481 = weight(abstract_txt:performance in 3007) [ClassicSimilarity], result of:
            0.03680481 = score(doc=3007,freq=2.0), product of:
              0.08968896 = queryWeight, product of:
                1.1452792 = boost
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.016867744 = queryNorm
              0.41036054 = fieldWeight in 3007, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6426997 = idf(docFreq=1118, maxDocs=42740)
                0.0625 = fieldNorm(doc=3007)
          0.044175897 = weight(abstract_txt:comprehensive in 3007) [ClassicSimilarity], result of:
            0.044175897 = score(doc=3007,freq=1.0), product of:
              0.12762512 = queryWeight, product of:
                1.3661865 = boost
                5.538207 = idf(docFreq=456, maxDocs=42740)
                0.016867744 = queryNorm
              0.34613794 = fieldWeight in 3007, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.538207 = idf(docFreq=456, maxDocs=42740)
                0.0625 = fieldNorm(doc=3007)
          0.02939695 = weight(abstract_txt:article in 3007) [ClassicSimilarity], result of:
            0.02939695 = score(doc=3007,freq=1.0), product of:
              0.122562446 = queryWeight, product of:
                1.8933705 = boost
                3.8376453 = idf(docFreq=2502, maxDocs=42740)
                0.016867744 = queryNorm
              0.23985283 = fieldWeight in 3007, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8376453 = idf(docFreq=2502, maxDocs=42740)
                0.0625 = fieldNorm(doc=3007)
          0.094417766 = weight(abstract_txt:quality in 3007) [ClassicSimilarity], result of:
            0.094417766 = score(doc=3007,freq=2.0), product of:
              0.22811389 = queryWeight, product of:
                2.8879373 = boost
                4.6828146 = idf(docFreq=1074, maxDocs=42740)
                0.016867744 = queryNorm
              0.41390625 = fieldWeight in 3007, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6828146 = idf(docFreq=1074, maxDocs=42740)
                0.0625 = fieldNorm(doc=3007)
          0.33664963 = weight(abstract_txt:wikipedia in 3007) [ClassicSimilarity], result of:
            0.33664963 = score(doc=3007,freq=3.0), product of:
              0.49423257 = queryWeight, product of:
                4.6565924 = boost
                6.2922525 = idf(docFreq=214, maxDocs=42740)
                0.016867744 = queryNorm
              0.6811563 = fieldWeight in 3007, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.2922525 = idf(docFreq=214, maxDocs=42740)
                0.0625 = fieldNorm(doc=3007)
        0.28 = coord(7/25)
    
  5. Gupta, P.; Banchs, R.E.; Rosso, P.: Continuous space models for CLIR (2017) 0.18
    0.1801652 = sum of:
      0.1801652 = product of:
        0.64344716 = sum of:
          0.03617919 = weight(abstract_txt:framework in 5296) [ClassicSimilarity], result of:
            0.03617919 = score(doc=5296,freq=2.0), product of:
              0.08866968 = queryWeight, product of:
                1.1387528 = boost
                4.6162434 = idf(docFreq=1148, maxDocs=42740)
                0.016867744 = queryNorm
              0.4080221 = fieldWeight in 5296, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.6162434 = idf(docFreq=1148, maxDocs=42740)
                0.0625 = fieldNorm(doc=5296)
          0.02699531 = weight(abstract_txt:existing in 5296) [ClassicSimilarity], result of:
            0.02699531 = score(doc=5296,freq=1.0), product of:
              0.09190479 = queryWeight, product of:
                1.1593404 = boost
                4.6997004 = idf(docFreq=1056, maxDocs=42740)
                0.016867744 = queryNorm
              0.29373127 = fieldWeight in 5296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6997004 = idf(docFreq=1056, maxDocs=42740)
                0.0625 = fieldNorm(doc=5296)
          0.12767857 = weight(abstract_txt:neural in 5296) [ClassicSimilarity], result of:
            0.12767857 = score(doc=5296,freq=2.0), product of:
              0.20553285 = queryWeight, product of:
                1.7337343 = boost
                7.0281615 = idf(docFreq=102, maxDocs=42740)
                0.016867744 = queryNorm
              0.6212076 = fieldWeight in 5296, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.0281615 = idf(docFreq=102, maxDocs=42740)
                0.0625 = fieldNorm(doc=5296)
          0.0866866 = weight(abstract_txt:learning in 5296) [ClassicSimilarity], result of:
            0.0866866 = score(doc=5296,freq=4.0), product of:
              0.14425282 = queryWeight, product of:
                1.7788924 = boost
                4.807482 = idf(docFreq=948, maxDocs=42740)
                0.016867744 = queryNorm
              0.6009352 = fieldWeight in 5296, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.807482 = idf(docFreq=948, maxDocs=42740)
                0.0625 = fieldNorm(doc=5296)
          0.073652156 = weight(abstract_txt:network in 5296) [ClassicSimilarity], result of:
            0.073652156 = score(doc=5296,freq=2.0), product of:
              0.17944701 = queryWeight, product of:
                2.2909997 = boost
                4.643594 = idf(docFreq=1117, maxDocs=42740)
                0.016867744 = queryNorm
              0.41043958 = fieldWeight in 5296, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.643594 = idf(docFreq=1117, maxDocs=42740)
                0.0625 = fieldNorm(doc=5296)
          0.13432796 = weight(abstract_txt:models in 5296) [ClassicSimilarity], result of:
            0.13432796 = score(doc=5296,freq=4.0), product of:
              0.22902533 = queryWeight, product of:
                2.893701 = boost
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.016867744 = queryNorm
              0.5865201 = fieldWeight in 5296, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.6921606 = idf(docFreq=1064, maxDocs=42740)
                0.0625 = fieldNorm(doc=5296)
          0.15792742 = weight(abstract_txt:deep in 5296) [ClassicSimilarity], result of:
            0.15792742 = score(doc=5296,freq=1.0), product of:
              0.37594786 = queryWeight, product of:
                3.3160472 = boost
                6.721248 = idf(docFreq=139, maxDocs=42740)
                0.016867744 = queryNorm
              0.420078 = fieldWeight in 5296, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.721248 = idf(docFreq=139, maxDocs=42740)
                0.0625 = fieldNorm(doc=5296)
        0.28 = coord(7/25)
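
The content scores combine several term scores with a coordination factor. As a rough sketch under stated assumptions, the Python below recomputes the total for entry 5 (doc=5296) from the numbers printed in its explanation: each term score is queryWeight (boost * idf * queryNorm) times fieldWeight (sqrt(freq) * idf * fieldNorm), and the sum is scaled by coord (matched terms / 25 query terms). The names TERMS and term_score are illustrative and do not come from the search engine.

```python
import math

QUERY_NORM = 0.016867744  # queryNorm, identical for every term of this query
FIELD_NORM = 0.0625       # fieldNorm(doc=5296)
QUERY_TERMS = 25          # denominator of coord(7/25)

# (term, boost, idf, termFreq) copied from the explanation for doc=5296.
TERMS = [
    ("framework", 1.1387528, 4.6162434, 2.0),
    ("existing", 1.1593404, 4.6997004, 1.0),
    ("neural", 1.7337343, 7.0281615, 2.0),
    ("learning", 1.7788924, 4.807482, 4.0),
    ("network", 2.2909997, 4.643594, 2.0),
    ("models", 2.893701, 4.6921606, 4.0),
    ("deep", 3.3160472, 6.721248, 1.0),
]


def term_score(boost: float, idf: float, freq: float) -> float:
    # score = queryWeight * fieldWeight; note that idf enters both factors.
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


term_sum = sum(term_score(boost, idf, freq) for _, boost, idf, freq in TERMS)
coord = len(TERMS) / QUERY_TERMS  # 7 of the 25 query terms matched
total = term_sum * coord

print(f"sum of term scores: {term_sum:.6f}")
print(f"coord: {coord:.2f}")
print(f"total: {total:.6f}")
```

Small differences in the last digits are expected, since the listing appears to be computed in single precision while Python uses double precision.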