Search (5 results, page 1 of 1)

  • author_ss:"Kelly, D."
  • year_i:[2000 TO 2010}
  1. Kelly, D.; Wacholder, N.; Rittman, R.; Sun, Y.; Kantor, P.; Small, S.; Strzalkowski, T.: Using interview data to identify evaluation criteria for interactive, analytical question-answering systems (2007) 0.02
    0.0182639 = product of:
      0.0365278 = sum of:
        0.0365278 = product of:
          0.0730556 = sum of:
            0.0730556 = weight(_text_:systems in 332) [ClassicSimilarity], result of:
              0.0730556 = score(doc=332,freq=10.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.45554203 = fieldWeight in 332, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.046875 = fieldNorm(doc=332)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The purpose of this work is to identify potential evaluation criteria for interactive, analytical question-answering (QA) systems by analyzing evaluative comments made by users of such a system. Qualitative data collected from intelligence analysts during interviews and focus groups were analyzed to identify common themes related to performance, use, and usability. These data were collected as part of an intensive, three-day evaluation workshop of the High-Quality Interactive Question Answering (HITIQA) system. Inductive coding and memoing were used to identify and categorize these data. Results suggest potential evaluation criteria for interactive, analytical QA systems, which can be used to guide the development and design of future systems and evaluations. This work contributes to studies of QA systems, information seeking and use behaviors, and interactive searching.
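    The indented breakdown under each hit is the Lucene/Solr "explain" trace for the query term "systems" scored with ClassicSimilarity (TF-IDF). As a minimal sketch of how the figures in the first trace (doc 332) combine - every constant below is copied from that trace, and the formulas are the usual ClassicSimilarity ones:

      import math

      # Figures copied from the explain trace for hit 1 (doc 332), term "systems".
      freq = 10.0               # occurrences of "systems" in the matched field
      doc_freq = 5561           # documents containing "systems"
      max_docs = 44218          # documents in the index
      query_norm = 0.052184064  # queryNorm reported in the trace
      field_norm = 0.046875     # stored length norm for this field
      coord = 0.5 * 0.5         # the two coord(1/2) factors wrapping the weight

      tf = math.sqrt(freq)                           # 3.1622777
      idf = 1 + math.log(max_docs / (doc_freq + 1))  # 3.0731742
      query_weight = idf * query_norm                # 0.16037072 = queryWeight
      field_weight = tf * idf * field_norm           # 0.45554203 = fieldWeight

      score = query_weight * field_weight * coord
      print(f"{score:.7f}")  # ~0.0182639, the value reported above (rounded to 0.02 in the hit list)

    The remaining hits follow the same computation with their own freq and fieldNorm values, which is why the ranking tracks how often "systems" occurs in each record.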
  2. Wacholder, N.; Kelly, D.; Kantor, P.; Rittman, R.; Sun, Y.; Bai, B.; Small, S.; Yamrom, B.; Strzalkowski, T.: A model for quantitative evaluation of an end-to-end question-answering system (2007) 0.02
    0.01633573 = product of:
      0.03267146 = sum of:
        0.03267146 = product of:
          0.06534292 = sum of:
            0.06534292 = weight(_text_:systems in 435) [ClassicSimilarity], result of:
              0.06534292 = score(doc=435,freq=8.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.4074492 = fieldWeight in 435, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.046875 = fieldNorm(doc=435)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We describe a procedure for quantitative evaluation of interactive question-answering systems and illustrate it with application to the High-Quality Interactive Question Answering (HITIQA) system. Our objectives were (a) to design a method to realistically and reliably assess interactive question-answering systems by comparing the quality of reports produced using different systems, (b) to conduct a pilot test of this method, and (c) to perform a formative evaluation of the HITIQA system. Far more important than the specific information gathered from this pilot evaluation is the development of (a) a protocol for evaluating an emerging technology, (b) reusable assessment instruments, and (c) the knowledge gained in conducting the evaluation. We conclude that this method, which uses a surprisingly small number of subjects and does not rely on predetermined relevance judgments, measures the impact of system change on work produced by users. Therefore, this method can be used to compare the product of interactive systems that use different underlying technologies.
  3. Kelly, D.; Harper, D.J.; Landau, B.: Questionnaire mode effects in interactive information retrieval experiments (2008) 0.01
    0.013613109 = product of:
      0.027226217 = sum of:
        0.027226217 = product of:
          0.054452434 = sum of:
            0.054452434 = weight(_text_:systems in 2029) [ClassicSimilarity], result of:
              0.054452434 = score(doc=2029,freq=8.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.339541 = fieldWeight in 2029, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2029)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The questionnaire is an important technique for gathering data from subjects during interactive information retrieval (IR) experiments. Research in survey methodology, public opinion polling and psychology has demonstrated a number of response biases and behaviors that subjects exhibit when responding to questionnaires. Furthermore, research in human-computer interaction has demonstrated that subjects tend to inflate their ratings of systems when completing usability questionnaires. In this study we investigate the relationship between questionnaire mode and subjects' responses to a usability questionnaire comprised of closed and open questions administered during an interactive IR experiment. Three questionnaire modes (pen-and-paper, electronic and interview) were explored with 51 subjects who used one of two information retrieval systems. Results showed that subjects' quantitative evaluations of systems were significantly lower in the interview mode than in the electronic mode. With respect to open questions, subjects in the interview mode used significantly more words than subjects in the pen-and-paper or electronic modes to communicate their responses, and communicated a significantly higher number of response units, even though the total number of unique response units was roughly the same across conditions. Finally, results showed that subjects in the pen-and-paper mode were the most efficient in communicating their responses to open questions. These results suggest that researchers should use the interview mode to elicit responses to closed questions from subjects and either pen-and-paper or electronic modes to elicit responses to open questions.
    Footnote
    Contribution to a thematic section: Evaluation of Interactive Information Retrieval Systems
  4. Kelly, D.; Fu, X.: Eliciting better information need descriptions from users of information search systems (2007) 0.01
    0.013476291 = product of:
      0.026952581 = sum of:
        0.026952581 = product of:
          0.053905163 = sum of:
            0.053905163 = weight(_text_:systems in 893) [ClassicSimilarity], result of:
              0.053905163 = score(doc=893,freq=4.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.33612844 = fieldWeight in 893, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=893)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper we investigate the effectiveness of a technique for eliciting more robust information need descriptions from users of information systems. We propose that such a technique can be used to elicit terms from users for use in query expansion and as a follow-up when ambiguous queries are initially posed. We design a feedback form to obtain additional information from users, administer the form to users after initial querying, and create a series of experimental runs based on the information that we obtained from the form. Results demonstrate that the form was successful at eliciting more information from users and that this additional information significantly improved retrieval performance. Our results further demonstrate a strong relationship between query length and performance.
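    As an illustrative sketch only (not the authors' implementation; the field name and boost value are assumptions): terms elicited with such a follow-up form could be folded into the original request, for instance by boosting the user's own terms and OR-ing in the elicited ones in Lucene/Solr query syntax.

      def expand_query(original_terms, elicited_terms, field="txt", boost=2.0):
          """Build a Lucene-style query string that keeps the user's original terms
          (boosted) and ORs in the terms elicited by the follow-up form."""
          boosted = [f"{field}:{t}^{boost}" for t in original_terms]
          extra = [f"{field}:{t}" for t in elicited_terms]
          return " OR ".join(boosted + extra)

      # Hypothetical example: an ambiguous initial query plus terms from the form.
      print(expand_query(["jaguar"], ["car", "dealer", "saloon"]))
      # txt:jaguar^2.0 OR txt:car OR txt:dealer OR txt:saloon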
  5. Kelly, D.: Implicit feedback : using behavior to infer relevance (2005) 0.01
    0.0054452433 = product of:
      0.010890487 = sum of:
        0.010890487 = product of:
          0.021780973 = sum of:
            0.021780973 = weight(_text_:systems in 645) [ClassicSimilarity], result of:
              0.021780973 = score(doc=645,freq=2.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.1358164 = fieldWeight in 645, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.03125 = fieldNorm(doc=645)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The concept of relevance has a rich history in information retrieval (IR) that dates back well over 40 years (Borlund, 2003) and is necessarily a part of any theory of information seeking and retrieval. Relevance feedback also has a long history in IR (Salton, 1971) and is considered an important part of interactive IR (Spink and Losee, 1996). Relevance feedback techniques often require users to explicitly provide feedback to the system, by, for instance, specifying keywords, selecting, evaluating and marking documents, or answering questions about their interests. The feedback that users provide during these interactions has been used for a variety of IR techniques and applications including query expansion, term disambiguation, user profiling, filtering and personalization. Empirical studies have led to the general finding that users of interactive IR systems desire explicit relevance feedback features and, in particular, term suggestion features (Beaulieu, 1997; Belkin et al., 2001; Koenemann and Belkin, 1996). However, much of the evidence from laboratory studies has indicated that relevance feedback features are not used. While users often report a desire for relevance feedback and term suggestion, they do not actually use these features during their searching activities. Several reasons can be given for why this disparity exists. Users may not have additional cognitive resources available to operate the relevance feedback feature. While the extra effort required to operate the feature may seem trivial, the user is already potentially involved in a complex and cognitively burdensome task. Increased effort would be required for both learning the new system and operating its features. When features require more effort and additional cognitive processing than they appear to be worth, they may be abandoned altogether. Furthermore, if relevance feedback features are not implemented as part of the routine search activity, they may be forgotten, no matter how helpful they are. This research, in part, has led to the general belief that users are unwilling to engage in explicit relevance feedback. Recently, Anick (2003) demonstrated in a web-based study that users made use of a term suggestion feature to expand and refine their queries, so things may be changing. These results suggest the potential of term suggestion features in some types of information-seeking environments, especially for single session interactions. Hence it may just be the case that traditional relevance feedback interfaces have not effectively elicited feedback from users or optimally integrated relevance feedback features into current information interaction models.
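    As a hedged sketch of the implicit-feedback idea named in the title (the signals, thresholds, and helper names below are assumptions, not the author's method): behavioral evidence such as clicks and dwell time can stand in for explicit judgments, and terms drawn from the implicitly relevant documents can then drive conventional relevance-feedback machinery such as query expansion.

      from collections import Counter

      def infer_relevant(interactions, min_dwell=30.0):
          """Treat a clicked result with a long dwell time as implicitly relevant.
          `interactions` holds dicts of the form {"doc": id, "clicked": bool, "dwell": seconds}."""
          return [i["doc"] for i in interactions if i["clicked"] and i["dwell"] >= min_dwell]

      def expansion_terms(doc_texts, original_query, k=3):
          """Pick the k most frequent terms from the implicitly relevant documents
          that are not already in the query (a crude stand-in for real term selection)."""
          query_terms = set(original_query.lower().split())
          counts = Counter(w for text in doc_texts
                             for w in text.lower().split() if w not in query_terms)
          return [term for term, _ in counts.most_common(k)]

      # Hypothetical session data.
      session = [
          {"doc": "d1", "clicked": True, "dwell": 95.0},
          {"doc": "d2", "clicked": True, "dwell": 4.0},
          {"doc": "d3", "clicked": False, "dwell": 0.0},
      ]
      texts = {"d1": "interactive question answering evaluation criteria evaluation"}
      relevant = infer_relevant(session)  # ["d1"] - only the long-dwell click counts
      print(expansion_terms([texts[d] for d in relevant], "question answering"))
      # ['evaluation', 'interactive', 'criteria'] (ties keep first-seen order)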