Search (2 results, page 1 of 1)

  • × author_ss:"Nguyen, A."
  • × author_ss:"Zuccon, G."
  1. Kholghi, M.; Vine, L.D.; Sitbon, L.; Zuccon, G.; Nguyen, A.: Clinical information extraction using small data : an active learning approach based on sequence representations and word embeddings (2017) 0.00
    0.0018909799 = product of:
      0.0037819599 = sum of:
        0.0037819599 = product of:
          0.0075639198 = sum of:
            0.0075639198 = weight(_text_:a in 3920) [ClassicSimilarity], result of:
              0.0075639198 = score(doc=3920,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.14243183 = fieldWeight in 3920, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3920)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
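
The score breakdown above can be reproduced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the values printed in the tree. The function name and parameter names are hypothetical and chosen for illustration only; queryNorm, fieldNorm, and the two coord factors are copied from the explanation, not derived.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm,
                             coords=(0.5, 0.5)):
    """Recompute a single-term score from the figures in the explain tree.

    Hypothetical helper for illustration; it is not part of any library.
    """
    tf = math.sqrt(freq)                              # 3.1622777 for freq=10
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # about 1.153047
    query_weight = idf * query_norm                   # queryWeight
    field_weight = tf * idf * field_norm              # fieldWeight
    score = query_weight * field_weight
    # Each coord(1/2) factor in the tree halves the score.
    for coord in coords:
        score *= coord
    return score


# Values taken from the explanation for doc 3920.
score = classic_similarity_score(
    freq=10.0,
    doc_freq=37942,
    max_docs=44218,
    query_norm=0.046056706,
    field_norm=0.0390625,
)
print(f"{score:.10f}")
```

Lucene computes these values in single precision, so the result should agree with the listed score only up to small rounding differences.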
    
    Abstract
    This article demonstrates the benefits of using sequence representations based on word embeddings to inform the seed selection and sample selection processes in an active learning pipeline for clinical information extraction. Seed selection refers to choosing an initial sample set to label to form an initial learning model. Sample selection refers to selecting informative samples to update the model at each iteration of the active learning process. Compared to supervised machine learning approaches, active learning offers the opportunity to build statistical classifiers with a reduced amount of training samples that require manual annotation. Reducing the manual annotation effort can support automating the clinical information extraction process. This is particularly beneficial in the clinical domain, where manual annotation is a time-consuming and costly task, as it requires extensive labor from clinical experts. Our empirical findings demonstrate that (a) using sequence representations along with the length of sequence for seed selection shows potential towards more effective initial models, and (b) using sequence representations for sample selection leads to significantly lower manual annotation efforts, with up to 3% and 6% fewer tokens and concepts requiring annotation, respectively, compared to state-of-the-art query strategies.
    Type
    a
  2. Koopman, B.; Zuccon, G.; Bruza, P.; Nguyen, A.: What makes an effective clinical query and querier? (2017) 0.00
    0.0016913437 = product of:
      0.0033826875 = sum of:
        0.0033826875 = product of:
          0.006765375 = sum of:
            0.006765375 = weight(_text_:a in 3922) [ClassicSimilarity], result of:
              0.006765375 = score(doc=3922,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.12739488 = fieldWeight in 3922, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3922)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper, we perform an in-depth study into how clinicians represent their information needs and the influence this has on information retrieval (IR) effectiveness. While much research in IR has considered the effectiveness of IR systems, there is still a significant gap in the understanding of how users contribute to the effectiveness of these systems. The paper aims to contribute to this by studying how clinicians search for information. Multiple representations of an information need, from verbose patient case descriptions to ad-hoc queries, were considered in order to understand their effect on retrieval. Four clinicians provided queries and performed relevance assessment to form a test collection used in this study. The different query formulation strategies of each clinician, and their effectiveness, were investigated. The results show that query formulation had more impact on retrieval effectiveness than the particular retrieval systems used. The most effective queries were short, ad-hoc keyword queries. Different clinicians were observed to consistently adopt specific query formulation strategies. The most effective queriers were those who, given their information need, inferred novel keywords most likely to appear in relevant documents. This study reveals aspects of how people search within the clinical domain. This can help inform the development of new models and methods that specifically focus on the query formulation process to improve retrieval effectiveness.
    Type
    a