Search (4 results, page 1 of 1)

  • author_ss:"Leroy, G."
  1. Leroy, G.; Miller, T.; Rosemblat, G.; Browne, A.: A balanced approach to health information evaluation : a vocabulary-based naïve Bayes classifier and readability formulas (2008) 0.01
    
    Abstract
    Since millions seek health information online, it is vital for this information to be comprehensible. Most studies use readability formulas, which ignore vocabulary, and conclude that online health information is too difficult. We developed a vocabulary-based naïve Bayes classifier to distinguish between three difficulty levels in text. It proved 98% accurate in a 250-document evaluation. We compared our classifier with readability formulas for 90 new documents with different origins and asked representative human evaluators, an expert and a consumer, to judge each document. Average readability grade levels for educational and commercial pages were 10th grade or higher, too difficult according to the current literature. In contrast, the classifier showed that 70-90% of these pages were written at an intermediate, appropriate level, indicating that vocabulary usage is frequently appropriate in text considered too difficult by readability formula evaluations. The expert considered the pages more difficult for a consumer than the consumer did.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.9, S.1409-1419
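  The vocabulary-based naïve Bayes classifier described in the abstract above can be sketched roughly as follows. This is a minimal, hypothetical illustration of a multinomial naïve Bayes over word counts with add-one smoothing; the paper's actual feature set, vocabulary resources, and three-level labeling scheme are not reproduced here.

```python
# Hypothetical sketch of a vocabulary-based naive Bayes difficulty
# classifier in the spirit of Leroy et al. (2008). The bag-of-words
# features and add-one smoothing are illustrative assumptions, not
# the paper's exact design.
import math
from collections import Counter, defaultdict

class NaiveBayesDifficulty:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.class_word_counts = defaultdict(Counter)
        self.class_doc_counts = Counter()
        self.vocab = set()

    def fit(self, docs, labels):
        # docs: list of token lists; labels: difficulty level per document
        for words, label in zip(docs, labels):
            self.class_doc_counts[label] += 1
            self.class_word_counts[label].update(words)
            self.vocab.update(words)

    def predict(self, words):
        n_docs = sum(self.class_doc_counts.values())
        best, best_lp = None, float("-inf")
        for label in self.class_doc_counts:
            # log prior + smoothed log likelihood of each word
            lp = math.log(self.class_doc_counts[label] / n_docs)
            total = sum(self.class_word_counts[label].values())
            for w in words:
                lp += math.log((self.class_word_counts[label][w] + 1)
                               / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = label, lp
        return best
```

  Training on documents labeled by difficulty level and predicting the level of unseen text follows the usual fit/predict pattern; the paper's evaluation used three levels rather than the two shown in this toy setup.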
  2. Leroy, G.; Chen, H.: Genescene: an ontology-enhanced integration of linguistic and co-occurrence based relations in biomedical texts (2005) 0.01
    
    Abstract
    The increasing amount of publicly available literature and experimental data in biomedicine makes it hard for biomedical researchers to stay up-to-date. Genescene is a toolkit that will help alleviate this problem by providing an overview of published literature content. We combined a linguistic parser with Concept Space, a co-occurrence based semantic net. Both techniques extract complementary biomedical relations between noun phrases from MEDLINE abstracts. The parser extracts precise and semantically rich relations from individual abstracts. Concept Space extracts relations that hold true for the collection of abstracts. The Gene Ontology, the Human Genome Nomenclature, and the Unified Medical Language System, are also integrated in Genescene. Currently, they are used to facilitate the integration of the two relation types, and to select the more interesting and high-quality relations for presentation. A user study focusing on p53 literature is discussed. All MEDLINE abstracts discussing p53 were processed in Genescene. Two researchers evaluated the terms and relations from several abstracts of interest to them. The results show that the terms were precise (precision 93%) and relevant, as were the parser relations (precision 95%). The Concept Space relations were more precise when selected with ontological knowledge (precision 78%) than without (60%).
    Date
    22. 7.2006 14:26:01
    Footnote
    Contribution to a special issue on bioinformatics
    Source
    Journal of the American Society for Information Science and Technology. 56(2005) no.5, S.457-468
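  The co-occurrence component described in the abstract above (relations that hold across the collection of abstracts) can be illustrated with a minimal sketch. This is a stand-in assumption: Genescene's Concept Space works over extracted noun phrases with ontology-based filtering, whereas here the terms and the count threshold are purely illustrative.

```python
# Illustrative co-occurrence relation extraction over a set of abstracts,
# loosely analogous to the Concept Space component described above.
# The example terms and the min_count threshold are assumptions.
from itertools import combinations
from collections import Counter

def cooccurrence_relations(abstracts, min_count=2):
    """Return term pairs that co-occur in at least `min_count` abstracts.

    abstracts: iterable of term collections, one per abstract.
    """
    pair_counts = Counter()
    for terms in abstracts:
        # canonical ordering so (a, b) and (b, a) count as one pair
        for a, b in combinations(sorted(set(terms)), 2):
            pair_counts[(a, b)] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}
```

  Thresholding by collection-level frequency is what distinguishes these relations from the per-abstract relations produced by the linguistic parser.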
  3. Kauchak, D.; Leroy, G.; Hogue, A.: Measuring text difficulty using parse-tree frequency (2017) 0.00
    
    Abstract
    Text simplification often relies on dated, unproven readability formulas. As an alternative and motivated by the success of term familiarity, we test a complementary measure: grammar familiarity. Grammar familiarity is measured as the frequency of the 3rd level sentence parse tree and is useful for evaluating individual sentences. We created a database of 140K unique 3rd level parse structures by parsing and binning all 5.4M sentences in English Wikipedia. We then calculated the grammar frequencies across the corpus and created 11 frequency bins. We evaluate the measure with a user study and corpus analysis. For the user study, we selected 20 sentences randomly from each bin, controlling for sentence length and term frequency, and recruited 30 readers per sentence (N = 6,600) on Amazon Mechanical Turk. We measured actual difficulty (comprehension) using a Cloze test, perceived difficulty using a 5-point Likert scale, and time taken. Sentences with more frequent grammatical structures, even with very different surface presentations, were easier to understand, perceived as easier, and took less time to read. Outcomes from readability formulas correlated with perceived but not with actual difficulty. Our corpus analysis shows how the metric can be used to understand grammar regularity in a broad range of corpora.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.9, S.2088-2100
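  The grammar-familiarity measure above, frequency bins over level-3 parse structures, can be sketched once parse signatures are in hand. This sketch assumes the signatures have already been produced by a constituency parser (the study parsed 5.4M Wikipedia sentences); the equal-width log-scale binning shown here is an illustrative assumption, not necessarily the paper's binning scheme.

```python
# Illustrative sketch of grammar-frequency binning: count how often each
# (pre-computed) level-3 parse signature occurs in a corpus, then assign
# signatures to frequency bins. The 11 equal-width log-scale bins are a
# stand-in assumption.
import math
from collections import Counter

def grammar_bins(signatures, n_bins=11):
    """Map each parse signature to a frequency bin (0 = rarest)."""
    freq = Counter(signatures)
    logs = {sig: math.log(n) for sig, n in freq.items()}
    lo, hi = min(logs.values()), max(logs.values())
    width = (hi - lo) / n_bins or 1.0  # guard against a degenerate range
    return {sig: min(int((v - lo) / width), n_bins - 1)
            for sig, v in logs.items()}
```

  A sentence's bin then serves as its grammar-familiarity score: sentences whose structures fall in high-frequency bins were easier to understand in the study's Mechanical Turk evaluation.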
  4. Ku, C.-H.; Leroy, G.: A crime reports analysis system to identify related crimes (2011) 0.00
    
    Abstract
    The popularity of online and anonymous options to report crimes, such as tips websites and text messaging, has led to an increasing amount of textual information available to law enforcement personnel. However, locating, filtering, extracting, and combining information to solve crimes is a time-consuming task. In response, we are developing entity and document similarity algorithms to automatically identify overlapping and complementary information. These are essential components for systems that combine and contrast crime information. The entity similarity algorithm integrates a domain-specific hierarchical lexicon with Jaccard coefficients. The document similarity algorithm combines the entity similarity scores using a Dice coefficient. We describe the evaluation of both components. To evaluate the entity similarity algorithm, we compared the new algorithm and four generic algorithms with a gold standard. The strongest correlation with the gold standard, r = 0.710, was found with our entity similarity algorithm. To evaluate the document similarity algorithm, we first developed a test bed containing witness reports for 17 crimes shown in video clips. We evaluated five versions of the algorithm that differ in how much importance is assigned to different entity types. Cosine similarity is then used as a baseline comparison to evaluate the performance of the document similarity algorithms for accuracy in recognizing reports describing the same crime and distinguishing them from reports on different crimes. The best version achieved 92% accuracy.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.8, S.1533-1547
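  The two similarity layers described in the abstract above can be sketched as follows. This is a hedged illustration: Jaccard overlap is computed per entity type, then combined with a weighted average across types (the paper combines entity scores with a Dice coefficient and additionally integrates a hierarchical crime lexicon, neither of which is reproduced here).

```python
# Hedged sketch of entity and document similarity for crime reports.
# The entity types, weights, and weighted-average combination are
# illustrative assumptions; the paper's algorithm uses a domain lexicon
# with Jaccard coefficients and a Dice-based combination.
def jaccard(a, b):
    """Jaccard coefficient of two entity sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def document_similarity(doc1, doc2, weights=None):
    """Weighted average of per-entity-type Jaccard scores.

    doc1/doc2: dicts mapping entity type -> set of entity strings.
    weights: optional dict mapping entity type -> importance weight.
    """
    types = set(doc1) | set(doc2)
    weights = weights or {t: 1.0 for t in types}
    total = sum(weights.get(t, 0.0) for t in types)
    if total == 0:
        return 0.0
    return sum(weights.get(t, 0.0)
               * jaccard(doc1.get(t, set()), doc2.get(t, set()))
               for t in types) / total
```

  Varying the per-type weights corresponds to the five algorithm versions the paper evaluates, which differ in how much importance is assigned to each entity type.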