Document (#34000)

Author
Leroy, G.
Miller, T.
Rosemblat, G.
Browne, A.
Title
¬A balanced approach to health information evaluation : a vocabulary-based naïve Bayes classifier and readability formulas
Source
Journal of the American Society for Information Science and Technology. 59(2008) no.9, p.1409-1419
Year
2008
Abstract
Since millions seek health information online, it is vital for this information to be comprehensible. Most studies use readability formulas, which ignore vocabulary, and conclude that online health information is too difficult. We developed a vocabulary-based, naïve Bayes classifier to distinguish between three difficulty levels in text. It proved 98% accurate in a 250-document evaluation. We compared our classifier with readability formulas for 90 new documents with different origins and asked representative human evaluators, an expert and a consumer, to judge each document. Average readability grade levels for educational and commercial pages were 10th grade or higher, too difficult according to current literature. In contrast, the classifier showed that 70-90% of these pages were written at an intermediate, appropriate level, indicating that vocabulary usage is frequently appropriate in text considered too difficult by readability formula evaluations. The expert considered the pages more difficult for a consumer than the consumer did.
Theme
Automatic classification
Field
Medicine

Similar documents (author)

  1. Browne, G.: Scope notes for LISA subject headings (1992) 0.72
    0.72095233 = sum of:
      0.72095233 = product of:
        2.8838093 = sum of:
          2.8838093 = weight(author_txt:browne in 1499) [ClassicSimilarity], result of:
            2.8838093 = score(doc=1499,freq=1.0), product of:
              0.5027351 = queryWeight, product of:
                1.2891159 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.04249129 = queryNorm
              5.7362404 = fieldWeight in 1499, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.625 = fieldNorm(doc=1499)
        0.25 = coord(1/4)
    
  2. Browne, G.: Professional liability of indexers (1996) 0.72
    0.72095233 = sum of:
      0.72095233 = product of:
        2.8838093 = sum of:
          2.8838093 = weight(author_txt:browne in 4644) [ClassicSimilarity], result of:
            2.8838093 = score(doc=4644,freq=1.0), product of:
              0.5027351 = queryWeight, product of:
                1.2891159 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.04249129 = queryNorm
              5.7362404 = fieldWeight in 4644, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.625 = fieldNorm(doc=4644)
        0.25 = coord(1/4)
    
  3. Browne, G.: ¬The definite article : acknowledging The in index entries (2001) 0.72
    0.72095233 = sum of:
      0.72095233 = product of:
        2.8838093 = sum of:
          2.8838093 = weight(author_txt:browne in 1013) [ClassicSimilarity], result of:
            2.8838093 = score(doc=1013,freq=1.0), product of:
              0.5027351 = queryWeight, product of:
                1.2891159 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.04249129 = queryNorm
              5.7362404 = fieldWeight in 1013, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.625 = fieldNorm(doc=1013)
        0.25 = coord(1/4)
    
  4. Browne, G.: Changes in website indexing (2007) 0.72
    0.72095233 = sum of:
      0.72095233 = product of:
        2.8838093 = sum of:
          2.8838093 = weight(author_txt:browne in 2748) [ClassicSimilarity], result of:
            2.8838093 = score(doc=2748,freq=1.0), product of:
              0.5027351 = queryWeight, product of:
                1.2891159 = boost
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.04249129 = queryNorm
              5.7362404 = fieldWeight in 2748, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.177984 = idf(docFreq=11, maxDocs=42740)
                0.625 = fieldNorm(doc=2748)
        0.25 = coord(1/4)
    
  5. Rosemblat, G.; Graham, L.: Cross-language search in a monolingual health information system : flexible designs and lexical processes (2006) 0.72
    0.7175553 = sum of:
      0.7175553 = product of:
        2.8702211 = sum of:
          2.8702211 = weight(author_txt:rosemblat in 2242) [ClassicSimilarity], result of:
            2.8702211 = score(doc=2242,freq=1.0), product of:
              0.58153844 = queryWeight, product of:
                1.3864735 = boost
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.04249129 = queryNorm
              4.9355655 = fieldWeight in 2242, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.5 = fieldNorm(doc=2242)
        0.25 = coord(1/4)
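
The author-match trees above all follow the same arithmetic: queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm), scaled by coord. The sketch below recomputes the score of entry 1 from the values printed in its tree. The function name classic_term_score is hypothetical and chosen only for illustration; it mirrors the printed tree and is not a call into Lucene itself.

```python
def classic_term_score(boost, idf, query_norm, tf, field_norm, coord):
    """Recompute a single-term score from the numbers in an explain tree."""
    query_weight = boost * idf * query_norm     # "queryWeight, product of"
    field_weight = tf * idf * field_norm        # "fieldWeight ..., product of"
    return query_weight * field_weight * coord  # outer product with coord

# Values copied from entry 1 (author_txt:browne in doc 1499).
score = classic_term_score(
    boost=1.2891159,
    idf=9.177984,
    query_norm=0.04249129,
    tf=1.0,
    field_norm=0.625,
    coord=0.25,  # coord(1/4): one of four author terms matched
)
print(round(score, 2))  # the two-decimal value shown next to the title
```

Entry 5 has a higher idf than the Browne entries but a smaller fieldNorm (0.5 instead of 0.625), which is why its score ends up slightly lower.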
    

Similar documents (content)

  1. Collins-Thompson, K.; Callan, J.: Predicting reading difficulty with statistical language models (2005) 0.28
    0.2790697 = sum of:
      0.2790697 = product of:
        0.87209284 = sum of:
          0.023647804 = weight(abstract_txt:text in 580) [ClassicSimilarity], result of:
            0.023647804 = score(doc=580,freq=3.0), product of:
              0.05393725 = queryWeight, product of:
                1.0251122 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.012991402 = queryNorm
              0.43843177 = fieldWeight in 580, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=580)
          0.016121687 = weight(abstract_txt:document in 580) [ClassicSimilarity], result of:
            0.016121687 = score(doc=580,freq=1.0), product of:
              0.060257204 = queryWeight, product of:
                1.0835063 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.012991402 = queryNorm
              0.26754788 = fieldWeight in 580, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=580)
          0.0058986223 = weight(abstract_txt:information in 580) [ClassicSimilarity], result of:
            0.0058986223 = score(doc=580,freq=1.0), product of:
              0.038837004 = queryWeight, product of:
                1.2301692 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.012991402 = queryNorm
              0.1518815 = fieldWeight in 580, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.0625 = fieldNorm(doc=580)
          0.04192563 = weight(abstract_txt:levels in 580) [ClassicSimilarity], result of:
            0.04192563 = score(doc=580,freq=2.0), product of:
              0.09044372 = queryWeight, product of:
                1.327444 = boost
                5.2445254 = idf(docFreq=612, maxDocs=42740)
                0.012991402 = queryNorm
              0.46355492 = fieldWeight in 580, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2445254 = idf(docFreq=612, maxDocs=42740)
                0.0625 = fieldNorm(doc=580)
          0.13748519 = weight(abstract_txt:grade in 580) [ClassicSimilarity], result of:
            0.13748519 = score(doc=580,freq=2.0), product of:
              0.1996316 = queryWeight, product of:
                1.9721576 = boost
                7.7916894 = idf(docFreq=47, maxDocs=42740)
                0.012991402 = queryNorm
              0.68869454 = fieldWeight in 580, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.7916894 = idf(docFreq=47, maxDocs=42740)
                0.0625 = fieldNorm(doc=580)
          0.075959094 = weight(abstract_txt:pages in 580) [ClassicSimilarity], result of:
            0.075959094 = score(doc=580,freq=2.0), product of:
              0.15386586 = queryWeight, product of:
                2.1205266 = boost
                5.5852485 = idf(docFreq=435, maxDocs=42740)
                0.012991402 = queryNorm
              0.49367088 = fieldWeight in 580, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5852485 = idf(docFreq=435, maxDocs=42740)
                0.0625 = fieldNorm(doc=580)
          0.15625012 = weight(abstract_txt:classifier in 580) [ClassicSimilarity], result of:
            0.15625012 = score(doc=580,freq=1.0), product of:
              0.34511107 = queryWeight, product of:
                3.6670887 = boost
                7.24405 = idf(docFreq=82, maxDocs=42740)
                0.012991402 = queryNorm
              0.45275313 = fieldWeight in 580, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.24405 = idf(docFreq=82, maxDocs=42740)
                0.0625 = fieldNorm(doc=580)
          0.41480473 = weight(abstract_txt:readability in 580) [ClassicSimilarity], result of:
            0.41480473 = score(doc=580,freq=2.0), product of:
              0.5657195 = queryWeight, product of:
                5.2492533 = boost
                8.295595 = idf(docFreq=28, maxDocs=42740)
                0.012991402 = queryNorm
              0.7332339 = fieldWeight in 580, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.295595 = idf(docFreq=28, maxDocs=42740)
                0.0625 = fieldNorm(doc=580)
        0.32 = coord(8/25)
    
  2. Denning, J.; Pera, M.S.; Ng, Y.-K.: ¬A readability level prediction tool for K-12 books (2016) 0.24
    0.24298728 = sum of:
      0.24298728 = product of:
        1.2149364 = sum of:
          0.013653066 = weight(abstract_txt:text in 4773) [ClassicSimilarity], result of:
            0.013653066 = score(doc=4773,freq=1.0), product of:
              0.05393725 = queryWeight, product of:
                1.0251122 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.012991402 = queryNorm
              0.2531287 = fieldWeight in 4773, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=4773)
          0.04192563 = weight(abstract_txt:levels in 4773) [ClassicSimilarity], result of:
            0.04192563 = score(doc=4773,freq=2.0), product of:
              0.09044372 = queryWeight, product of:
                1.327444 = boost
                5.2445254 = idf(docFreq=612, maxDocs=42740)
                0.012991402 = queryNorm
              0.46355492 = fieldWeight in 4773, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2445254 = idf(docFreq=612, maxDocs=42740)
                0.0625 = fieldNorm(doc=4773)
          0.13748519 = weight(abstract_txt:grade in 4773) [ClassicSimilarity], result of:
            0.13748519 = score(doc=4773,freq=2.0), product of:
              0.1996316 = queryWeight, product of:
                1.9721576 = boost
                7.7916894 = idf(docFreq=47, maxDocs=42740)
                0.012991402 = queryNorm
              0.68869454 = fieldWeight in 4773, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.7916894 = idf(docFreq=47, maxDocs=42740)
                0.0625 = fieldNorm(doc=4773)
          0.24584389 = weight(abstract_txt:formulas in 4773) [ClassicSimilarity], result of:
            0.24584389 = score(doc=4773,freq=2.0), product of:
              0.33666298 = queryWeight, product of:
                3.1366806 = boost
                8.261693 = idf(docFreq=29, maxDocs=42740)
                0.012991402 = queryNorm
              0.73023736 = fieldWeight in 4773, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.261693 = idf(docFreq=29, maxDocs=42740)
                0.0625 = fieldNorm(doc=4773)
          0.7760286 = weight(abstract_txt:readability in 4773) [ClassicSimilarity], result of:
            0.7760286 = score(doc=4773,freq=7.0), product of:
              0.5657195 = queryWeight, product of:
                5.2492533 = boost
                8.295595 = idf(docFreq=28, maxDocs=42740)
                0.012991402 = queryNorm
              1.3717551 = fieldWeight in 4773, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                8.295595 = idf(docFreq=28, maxDocs=42740)
                0.0625 = fieldNorm(doc=4773)
        0.2 = coord(5/25)
    
  3. Mengle, S.S.R.; Goharian, N.: Ambiguity measure feature-selection algorithm (2009) 0.20
    0.1972112 = sum of:
      0.1972112 = product of:
        0.8217134 = sum of:
          0.033443045 = weight(abstract_txt:text in 4805) [ClassicSimilarity], result of:
            0.033443045 = score(doc=4805,freq=6.0), product of:
              0.05393725 = queryWeight, product of:
                1.0251122 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.012991402 = queryNorm
              0.6200362 = fieldWeight in 4805, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.0625 = fieldNorm(doc=4805)
          0.022799509 = weight(abstract_txt:document in 4805) [ClassicSimilarity], result of:
            0.022799509 = score(doc=4805,freq=2.0), product of:
              0.060257204 = queryWeight, product of:
                1.0835063 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.012991402 = queryNorm
              0.37836984 = fieldWeight in 4805, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=4805)
          0.16803637 = weight(abstract_txt:naïve in 4805) [ClassicSimilarity], result of:
            0.16803637 = score(doc=4805,freq=2.0), product of:
              0.22820625 = queryWeight, product of:
                2.1085832 = boost
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.012991402 = queryNorm
              0.7363355 = fieldWeight in 4805, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.0625 = fieldNorm(doc=4805)
          0.18022305 = weight(abstract_txt:bayes in 4805) [ClassicSimilarity], result of:
            0.18022305 = score(doc=4805,freq=2.0), product of:
              0.23911063 = queryWeight, product of:
                2.1583726 = boost
                8.527396 = idf(docFreq=22, maxDocs=42740)
                0.012991402 = queryNorm
              0.7537224 = fieldWeight in 4805, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.527396 = idf(docFreq=22, maxDocs=42740)
                0.0625 = fieldNorm(doc=4805)
          0.067825526 = weight(abstract_txt:difficult in 4805) [ClassicSimilarity], result of:
            0.067825526 = score(doc=4805,freq=1.0), product of:
              0.19785215 = queryWeight, product of:
                2.776594 = boost
                5.4849463 = idf(docFreq=481, maxDocs=42740)
                0.012991402 = queryNorm
              0.34280914 = fieldWeight in 4805, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4849463 = idf(docFreq=481, maxDocs=42740)
                0.0625 = fieldNorm(doc=4805)
          0.3493859 = weight(abstract_txt:classifier in 4805) [ClassicSimilarity], result of:
            0.3493859 = score(doc=4805,freq=5.0), product of:
              0.34511107 = queryWeight, product of:
                3.6670887 = boost
                7.24405 = idf(docFreq=82, maxDocs=42740)
                0.012991402 = queryNorm
              1.0123868 = fieldWeight in 4805, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                7.24405 = idf(docFreq=82, maxDocs=42740)
                0.0625 = fieldNorm(doc=4805)
        0.24 = coord(6/25)
    
  4. Lantz, C.: Evaluating the readability of instructional visuals (1996) 0.10
    0.102811545 = sum of:
      0.102811545 = product of:
        0.8567629 = sum of:
          0.029559756 = weight(abstract_txt:text in 550) [ClassicSimilarity], result of:
            0.029559756 = score(doc=550,freq=3.0), product of:
              0.05393725 = queryWeight, product of:
                1.0251122 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.012991402 = queryNorm
              0.54803973 = fieldWeight in 550, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.078125 = fieldNorm(doc=550)
          0.0073732785 = weight(abstract_txt:information in 550) [ClassicSimilarity], result of:
            0.0073732785 = score(doc=550,freq=1.0), product of:
              0.038837004 = queryWeight, product of:
                1.2301692 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.012991402 = queryNorm
              0.18985188 = fieldWeight in 550, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.078125 = fieldNorm(doc=550)
          0.8198298 = weight(abstract_txt:readability in 550) [ClassicSimilarity], result of:
            0.8198298 = score(doc=550,freq=5.0), product of:
              0.5657195 = queryWeight, product of:
                5.2492533 = boost
                8.295595 = idf(docFreq=28, maxDocs=42740)
                0.012991402 = queryNorm
              1.4491808 = fieldWeight in 550, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.295595 = idf(docFreq=28, maxDocs=42740)
                0.078125 = fieldNorm(doc=550)
        0.12 = coord(3/25)
    
  5. Sebastiani, F.: Machine learning in automated text categorization (2002) 0.10
    0.099870704 = sum of:
      0.099870704 = product of:
        0.49935353 = sum of:
          0.017066333 = weight(abstract_txt:text in 4390) [ClassicSimilarity], result of:
            0.017066333 = score(doc=4390,freq=1.0), product of:
              0.05393725 = queryWeight, product of:
                1.0251122 = boost
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.012991402 = queryNorm
              0.3164109 = fieldWeight in 4390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0500593 = idf(docFreq=2023, maxDocs=42740)
                0.078125 = fieldNorm(doc=4390)
          0.020152109 = weight(abstract_txt:document in 4390) [ClassicSimilarity], result of:
            0.020152109 = score(doc=4390,freq=1.0), product of:
              0.060257204 = queryWeight, product of:
                1.0835063 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.012991402 = queryNorm
              0.33443484 = fieldWeight in 4390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.078125 = fieldNorm(doc=4390)
          0.023296943 = weight(abstract_txt:evaluation in 4390) [ClassicSimilarity], result of:
            0.023296943 = score(doc=4390,freq=1.0), product of:
              0.06637348 = queryWeight, product of:
                1.1371671 = boost
                4.492771 = idf(docFreq=1299, maxDocs=42740)
                0.012991402 = queryNorm
              0.35099775 = fieldWeight in 4390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.492771 = idf(docFreq=1299, maxDocs=42740)
                0.078125 = fieldNorm(doc=4390)
          0.048212882 = weight(abstract_txt:expert in 4390) [ClassicSimilarity], result of:
            0.048212882 = score(doc=4390,freq=1.0), product of:
              0.10778808 = queryWeight, product of:
                1.4491466 = boost
                5.725354 = idf(docFreq=378, maxDocs=42740)
                0.012991402 = queryNorm
              0.44729328 = fieldWeight in 4390, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.725354 = idf(docFreq=378, maxDocs=42740)
                0.078125 = fieldNorm(doc=4390)
          0.39062527 = weight(abstract_txt:classifier in 4390) [ClassicSimilarity], result of:
            0.39062527 = score(doc=4390,freq=4.0), product of:
              0.34511107 = queryWeight, product of:
                3.6670887 = boost
                7.24405 = idf(docFreq=82, maxDocs=42740)
                0.012991402 = queryNorm
              1.1318828 = fieldWeight in 4390, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.24405 = idf(docFreq=82, maxDocs=42740)
                0.078125 = fieldNorm(doc=4390)
        0.2 = coord(5/25)
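
The tf, idf and coord values in the content-match trees can be reproduced from the raw counts. The sketch below assumes tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)) and coord = matched / total; these forms are inferred from the printed numbers, and the helper names are hypothetical. It then rebuilds the total for entry 1 (Collins-Thompson and Callan) from its eight per-term scores.

```python
import math

def classic_tf(freq):
    # tf(freq=3.0) is printed as 1.7320508, i.e. the square root of freq.
    return math.sqrt(freq)

def classic_idf(doc_freq, max_docs):
    # Assumed form; it matches the printed values, e.g. idf(docFreq=2023).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def coord(matched, total):
    # Fraction of query terms found in the document, e.g. coord(8/25).
    return matched / total

# Per-term scores copied from entry 1 (doc 580), in the order listed.
term_scores = [
    0.023647804,   # text
    0.016121687,   # document
    0.0058986223,  # information
    0.04192563,    # levels
    0.13748519,    # grade
    0.075959094,   # pages
    0.15625012,    # classifier
    0.41480473,    # readability
]
total = sum(term_scores) * coord(8, 25)

print(classic_tf(3.0))
print(classic_idf(2023, 42740))
print(total)
```

The coord factor explains much of the ranking: entry 4 has the largest single-term score for "readability", but it matches only 3 of 25 query terms, so its total is well below that of entry 1.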