Search (4 results, page 1 of 1)

  • × author_ss:"Ribeiro, C."
  • × year_i:[2010 TO 2020}
  1. Kar, M.; Nunes, S.; Ribeiro, C.: Summarization of changes in dynamic text collections using Latent Dirichlet Allocation model (2015) 0.02
    0.024915472 = product of:
      0.074746415 = sum of:
        0.03853567 = weight(_text_:wide in 2676) [ClassicSimilarity], result of:
          0.03853567 = score(doc=2676,freq=2.0), product of:
            0.19679762 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.044416238 = queryNorm
            0.1958137 = fieldWeight in 2676, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.03125 = fieldNorm(doc=2676)
        0.036210746 = weight(_text_:web in 2676) [ClassicSimilarity], result of:
          0.036210746 = score(doc=2676,freq=6.0), product of:
            0.14495286 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.044416238 = queryNorm
            0.24981049 = fieldWeight in 2676, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=2676)
      0.33333334 = coord(2/6)
    
    Abstract
    In the area of Information Retrieval, the task of automatic text summarization usually assumes a static underlying collection of documents, disregarding the temporal dimension of each document. However, in real-world settings, collections and individual documents rarely stay unchanged over time. The World Wide Web is a prime example of a collection where information changes both frequently and significantly over time, with documents being added, modified or just deleted at different times. In this context, previous work addressing the summarization of web documents has simply discarded the dynamic nature of the web, considering only the latest published version of each individual document. This paper proposes and addresses a new challenge - the automatic summarization of changes in dynamic text collections. In standard text summarization, retrieval techniques present a summary to the user by capturing the major points expressed in the most recent version of an entire document in a condensed form. In this new task, the goal is to obtain a summary that describes the most significant changes made to a document during a given period. In other words, the idea is to have a summary of the revisions made to a document over a specific period of time. This paper proposes different approaches to generate summaries using extractive summarization techniques. First, individual terms are scored and then this information is used to rank and select sentences to produce the final summary. A system based on the Latent Dirichlet Allocation (LDA) model is used to find the hidden topic structures of changes. The purpose of using the LDA model is to identify separate topics where the changed terms from each topic are likely to carry at least one significant change. The different approaches are then compared with the previous work in this area. A collection of articles from Wikipedia, including their revision history, is used to evaluate the proposed system. For each article, a temporal interval and a reference summary from the article's content are selected manually. The articles and intervals in which a significant event occurred are carefully selected. The summaries produced by each of the approaches are evaluated against the manual summaries using ROUGE metrics. It is observed that the approach using the LDA model outperforms all the other approaches. Statistical tests reveal that the differences in ROUGE scores between the LDA-based approach and the baseline are statistically significant at the 99% level.
  2. Amorim, R.C.; Castro, J.A.; Silva, J.R. da; Ribeiro, C.: LabTablet: semantic metadata collection on a multi-domain laboratory notebook (2014) 0.01
    0.0065539777 = product of:
      0.039323866 = sum of:
        0.039323866 = weight(_text_:computer in 1583) [ClassicSimilarity], result of:
          0.039323866 = score(doc=1583,freq=2.0), product of:
            0.16231956 = queryWeight, product of:
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.044416238 = queryNorm
            0.24226204 = fieldWeight in 1583, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.046875 = fieldNorm(doc=1583)
      0.16666667 = coord(1/6)
    
    Series
    Communications in computer and information science; 478
  3. Teixeira Lopes, C.; Ribeiro, C.: Measuring the value of health query translation : An analysis by user language proficiency (2013) 0.01
    0.006159573 = product of:
      0.036957435 = sum of:
        0.036957435 = weight(_text_:web in 739) [ClassicSimilarity], result of:
          0.036957435 = score(doc=739,freq=4.0), product of:
            0.14495286 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.044416238 = queryNorm
            0.25496176 = fieldWeight in 739, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=739)
      0.16666667 = coord(1/6)
    
    Abstract
    English is by far the most used language on the web. In some domains, the existence of less content in the users' native language may not be problematic and may even help them cope with information overload. Yet, in domains such as health, where information quality is critical, a larger quantity of information may mean easier access to higher quality content. Query translation may be a good strategy to access content in other languages, but the presence of medical terms in health queries makes the translation process more difficult, even for users with very good language proficiency. In this study, we evaluate how translating a health query affects users with different language proficiencies. We chose English as the non-native language because it is a widely spoken language and it is the most used language on the web. Our findings suggest that non-English-speaking users having at least elementary English proficiency can benefit from a system that suggests English alternatives for their queries, or automatically retrieves English content from a non-English query. This awareness of the user profile results in higher precision, more accurate medical knowledge, and better access to high-quality content. Moreover, the suggestions of English-translated queries may also trigger new health search strategies.
  4. Teixeira Lopes, C.; Paiva, D.; Ribeiro, C.: Effects of language and terminology of query suggestions on medical accuracy considering different user characteristics (2017) 0.00
    0.004355476 = product of:
      0.026132854 = sum of:
        0.026132854 = weight(_text_:web in 3783) [ClassicSimilarity], result of:
          0.026132854 = score(doc=3783,freq=2.0), product of:
            0.14495286 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.044416238 = queryNorm
            0.18028519 = fieldWeight in 3783, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3783)
      0.16666667 = coord(1/6)
    
    Abstract
    Searching for health information is one of the most popular activities on the web. In this domain, users often misspell or lack knowledge of the proper medical terms to use in queries. To overcome these difficulties and attempt to retrieve higher-quality content, we developed a query suggestion system that provides alternative queries combining the Portuguese or English language with lay or medico-scientific terminology. Here we evaluate this system's impact on the medical accuracy of the knowledge acquired during the search. Evaluation shows that simply providing these suggestions contributes to reducing the quantity of incorrect content. This indicates that even when suggestions are not clicked, they are useful either for formulating subsequent queries or for interpreting search results. Clicking on suggestions, regardless of type, leads to answers with more correct content. An analysis by type of suggestion and user characteristics showed that the benefits of certain languages and terminologies are more perceptible in users with certain levels of English proficiency and health literacy. This suggests personalizing the suggestion system according to these characteristics. Overall, the effect of language is stronger than the effect of terminology. Clicks on English suggestions are clearly preferable to clicks on Portuguese ones.
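
The score explanations listed under each result follow the classic Lucene TF-IDF scheme: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), queryWeight is idf times queryNorm, fieldWeight is tf times idf times fieldNorm, and the document score is the sum of the per-term products multiplied by coord. The sketch below recomputes the first result's score from the numbers printed in its explanation. The function names (`idf`, `term_score`, `document_score`) are illustrative and not part of any Lucene or Solr API, and the arithmetic uses Python doubles, whereas the listing reflects 32-bit floats, so agreement is only to about six or seven significant digits.

```python
import math

# Values copied from the explain output of result 1 (doc 2676).
QUERY_NORM = 0.044416238
MAX_DOCS = 44218


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as shown in the explain tree."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """Score of one matching term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores: list[float], matched: int, total: int) -> float:
    """Sum of term scores scaled by the coordination factor matched/total."""
    return sum(term_scores) * (matched / total)


# Result 1 matches two of the six query terms: "wide" and "web".
wide = term_score(2.0, 1430, MAX_DOCS, QUERY_NORM, 0.03125)
web = term_score(6.0, 4597, MAX_DOCS, QUERY_NORM, 0.03125)
result_1 = document_score([wide, web], matched=2, total=6)

print(f"wide={wide:.6f} web={web:.6f} total={result_1:.6f}")
```

Under these assumptions the recomputed total agrees with the listed 0.024915472 up to floating-point rounding. The same functions applied to result 3 (freq=4.0, docFreq=4597, fieldNorm=0.0390625, coord 1/6) reproduce its listed score in the same way.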