Search (9 results, page 1 of 1)

Wacholder, N.; Kelly, D.; Kantor, P.; Rittman, R.; Sun, Y.; Bai, B.; Small, S.; Yamrom, B.; Strzalkowski, T.: A model for quantitative evaluation of an end-to-end question-answering system (2007) 0.00
```
0.0032090992 = product of:
  0.0064181983 = sum of:
    0.0064181983 = product of:
      0.012836397 = sum of:
        0.012836397 = weight(_text_:a in 435) [ClassicSimilarity], result of:
          0.012836397 = score(doc=435,freq=20.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.24171482 = fieldWeight in 435, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=435)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
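The explain tree above can be checked by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`; the function names are illustrative and not part of Lucene. The `queryNorm`, `fieldNorm`, and `coord` values are copied from the output rather than derived.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as assumed for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm,
                       coords=(0.5, 0.5)):
    """Rebuild the single-term score shown in the explain tree."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                    # tf(freq=20.0) is about 4.472136
    query_weight = idf * query_norm         # "queryWeight" line
    field_weight = tf * idf * field_norm    # "fieldWeight" line
    score = query_weight * field_weight     # "weight(_text_:a in 435)" line
    for coord in coords:                    # the two coord(1/2) factors
        score *= coord
    return score


# Values copied from the explain output for doc 435.
score_435 = classic_term_score(
    freq=20.0,
    doc_freq=37942,
    max_docs=44218,
    query_norm=0.046056706,
    field_norm=0.046875,
)
print(f"{score_435:.7f}")
```

The result should agree with the reported 0.0032090992 up to single-precision rounding, since Lucene computes these values in 32-bit floats while Python uses doubles.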
Abstract

We describe a procedure for quantitative evaluation of interactive question-answering systems and illustrate it with application to the High-Quality Interactive Question Answering (HITIQA) system. Our objectives were (a) to design a method to realistically and reliably assess interactive question-answering systems by comparing the quality of reports produced using different systems, (b) to conduct a pilot test of this method, and (c) to perform a formative evaluation of the HITIQA system. Far more important than the specific information gathered from this pilot evaluation is the development of (a) a protocol for evaluating an emerging technology, (b) reusable assessment instruments, and (c) the knowledge gained in conducting the evaluation. We conclude that this method, which uses a surprisingly small number of subjects and does not rely on predetermined relevance judgments, measures the impact of system change on work produced by users. Therefore this method can be used to compare the product of interactive systems that use different underlying technologies.

Type

a
Liu, Y.-H.; Wacholder, N.: Evaluating the impact of MeSH (Medical Subject Headings) terms on different types of searchers (2017) 0.00
```
0.0028703054 = product of:
  0.005740611 = sum of:
    0.005740611 = product of:
      0.011481222 = sum of:
        0.011481222 = weight(_text_:a in 5096) [ClassicSimilarity], result of:
          0.011481222 = score(doc=5096,freq=16.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.2161963 = fieldWeight in 5096, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=5096)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

A commonly used technique for improving search engine performance is result caching. In result caching, precomputed results (e.g., URLs and snippets of best matching pages) of certain queries are stored in a fast-access storage. The future occurrences of a query whose results are already stored in the cache can be directly served by the result cache, eliminating the need to process the query using costly computing resources. Although other performance metrics are possible, the main performance metric for evaluating the success of a result cache is hit rate. In this work, we present a machine learning approach to improve the hit rate of a result cache by facilitating a large number of features extracted from search engine query logs. We then apply the proposed machine learning approach to static, dynamic, and static-dynamic caching. Compared to the previous methods in the literature, the proposed approach improves the hit rate of the result cache up to 0.66%, which corresponds to 9.60% of the potential room for improvement.

Type

a
Wacholder, N.; Liu, L.: User preference : a measure of query-term quality (2006) 0.00
```
0.0026742492 = product of:
  0.0053484985 = sum of:
    0.0053484985 = product of:
      0.010696997 = sum of:
        0.010696997 = weight(_text_:a in 19) [ClassicSimilarity], result of:
          0.010696997 = score(doc=19,freq=20.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.20142901 = fieldWeight in 19, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=19)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
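This entry and the first result both report `freq=20.0`, yet their scores differ. In the explain output the only factor that changes is `fieldNorm` (0.0390625 here, 0.046875 for doc 435), which in ClassicSimilarity acts as a length normalization, so a longer field yields a lower score. The sketch below is an illustration under that assumption; `term_score` is a hypothetical helper, with `idf`, `queryNorm`, and the combined `coord` factor copied from the output.

```python
import math


def term_score(freq, field_norm, idf=1.153047, query_norm=0.046056706,
               coord=0.25):
    """Single-term score; coord=0.25 is the product of the two coord(1/2)."""
    query_weight = idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return coord * query_weight * field_weight


score_doc_435 = term_score(freq=20.0, field_norm=0.046875)
score_doc_19 = term_score(freq=20.0, field_norm=0.0390625)

# With freq held constant, the score ratio equals the fieldNorm ratio.
ratio = score_doc_19 / score_doc_435
print(round(ratio, 4))
```

Because every other factor is identical, the ratio of the two scores reduces to 0.0390625 / 0.046875, which matches the ratio of the reported scores 0.0026742492 and 0.0032090992.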
Abstract

The goal of this research is to understand what characteristics, if any, lead users engaged in interactive information seeking to prefer certain sets of query terms. Underlying this work is the assumption that query terms that information seekers prefer induce a kind of cognitive efficiency: They require less mental effort to process and therefore reduce the energy required in the interactive information-seeking process. Conceptually, this work applies insights from linguistics and cognitive science to the study of query-term quality. We report on an experiment in which we compare user preference for three sets of terms; one had been preconstructed by a human indexer, and two were identified automatically. Twenty-four participants used a merged list of all terms to answer a carefully created set of questions. By design, the interface constrained users to access the text exclusively via the displayed list of query terms. We found that participants displayed a preference for the human-constructed set of terms eight times greater than the preference for either set of automatically identified terms. We speculate about reasons for this strong preference and discuss the implications for information access. The primary contributions of this research are (a) explication of the concept of user preference as a measure of query-term quality and (b) identification of a replicable procedure for measuring preference for sets of query terms created by different methods, whether human or automatic. All other factors being equal, query terms that users prefer clearly are the best choice for real-world information-access systems.

Type

a
Wacholder, N.; Liu, L.: Assessing term effectiveness in the interactive information access process (2008) 0.00
```
0.0023919214 = product of:
  0.0047838427 = sum of:
    0.0047838427 = product of:
      0.009567685 = sum of:
        0.009567685 = weight(_text_:a in 2079) [ClassicSimilarity], result of:
          0.009567685 = score(doc=2079,freq=16.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.18016359 = fieldWeight in 2079, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2079)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This study addresses the question of whether the way in which sets of query terms are identified has an impact on the effectiveness of users' information seeking efforts. Query terms are text strings used as input to an information access system; they are products of a method or grammar that identifies a set of query terms. We conducted an experiment that compared the effectiveness of sets of query terms identified for a single book by three different methods. One had been previously prepared by a human indexer for a back-of-the-book index. The other two were identified by computer programs that used a combination of linguistic and statistical criteria to extract terms from full text. Effectiveness was measured by (1) whether selected query terms led participants to correct answers and (2) how long it took participants to obtain correct answers. Our results show that two sets of terms - the human terms and the set selected according to the linguistically more sophisticated criteria - were significantly more effective than the third set of terms. This single case demonstrates that query languages do have a measurable impact on the effectiveness of query term languages in the interactive information access process. The procedure described in this paper can be used to assess the effectiveness for information seekers of query terms identified by any query language.

Type

a

Wacholder, N.: Interactive query formulation (2011) 0.00
```
0.0023678814 = product of:
  0.0047357627 = sum of:
    0.0047357627 = product of:
      0.009471525 = sum of:
        0.009471525 = weight(_text_:a in 4196) [ClassicSimilarity], result of:
          0.009471525 = score(doc=4196,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.17835285 = fieldWeight in 4196, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.109375 = fieldNorm(doc=4196)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Type

a

Ng, K.B.; Kantor, P.B.; Strzalkowski, T.; Wacholder, N.; Tang, R.; Bai, B.; Rittman, R.; Song, P.; Sun, Y.: Automated judgment of document qualities (2006) 0.00
```
0.002269176 = product of:
  0.004538352 = sum of:
    0.004538352 = product of:
      0.009076704 = sum of:
        0.009076704 = weight(_text_:a in 182) [ClassicSimilarity], result of:
          0.009076704 = score(doc=182,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1709182 = fieldWeight in 182, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=182)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The authors report on a series of experiments to automate the assessment of document qualities such as depth and objectivity. The primary purpose is to develop a quality-sensitive functionality, orthogonal to relevance, to select documents for an interactive question-answering system. The study consisted of two stages. In the classifier construction stage, nine document qualities deemed important by information professionals were identified and classifiers were developed to predict their values. In the confirmative evaluation stage, the performance of the developed methods was checked using a different document collection. The quality prediction methods worked well in the second stage. The results strongly suggest that the best way to predict document qualities automatically is to construct classifiers on a person-by-person basis.

Type

a
Muresan, S.; Gonzalez-Ibanez, R.; Ghosh, D.; Wacholder, N.: Identification of nonliteral language in social media : a case study on sarcasm (2016) 0.00
```
0.0018909799 = product of:
  0.0037819599 = sum of:
    0.0037819599 = product of:
      0.0075639198 = sum of:
        0.0075639198 = weight(_text_:a in 3155) [ClassicSimilarity], result of:
          0.0075639198 = score(doc=3155,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.14243183 = fieldWeight in 3155, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3155)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

With the rapid development of social media, spontaneous user-generated content such as tweets and forum posts has become important material for tracking people's opinions and sentiments online. A major hurdle for current state-of-the-art automatic methods for sentiment analysis is the fact that human communication often involves the use of sarcasm or irony, where the author means the opposite of what she/he says. Sarcasm transforms the polarity of an apparently positive or negative utterance into its opposite. Lack of naturally occurring utterances labeled for sarcasm is one of the key problems for the development of machine-learning methods for sarcasm detection. We report on a method for constructing a corpus of sarcastic Twitter messages in which determination of the sarcasm of each message has been made by its author. We use this reliable corpus to compare sarcastic utterances in Twitter to utterances that express positive or negative attitudes without sarcasm. We investigate the impact of lexical and pragmatic factors on machine-learning effectiveness for identifying sarcastic utterances and we compare the performance of machine-learning techniques and human judges on this task.

Type

a
Kelly, D.; Wacholder, N.; Rittman, R.; Sun, Y.; Kantor, P.; Small, S.; Strzalkowski, T.: Using interview data to identify evaluation criteria for interactive, analytical question-answering systems (2007) 0.00
```
0.0014351527 = product of:
  0.0028703054 = sum of:
    0.0028703054 = product of:
      0.005740611 = sum of:
        0.005740611 = weight(_text_:a in 332) [ClassicSimilarity], result of:
          0.005740611 = score(doc=332,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.10809815 = fieldWeight in 332, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=332)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The purpose of this work is to identify potential evaluation criteria for interactive, analytical question-answering (QA) systems by analyzing evaluative comments made by users of such a system. Qualitative data collected from intelligence analysts during interviews and focus groups were analyzed to identify common themes related to performance, use, and usability. These data were collected as part of an intensive, three-day evaluation workshop of the High-Quality Interactive Question Answering (HITIQA) system. Inductive coding and memoing were used to identify and categorize these data. Results suggest potential evaluation criteria for interactive, analytical QA systems, which can be used to guide the development and design of future systems and evaluations. This work contributes to studies of QA systems, information seeking and use behaviors, and interactive searching.

Type

a

Wacholder, N.; Byrd, R.J.: Retrieving information from full text using linguistic knowledge (1994) 0.00
```
0.0010148063 = product of:
  0.0020296127 = sum of:
    0.0020296127 = product of:
      0.0040592253 = sum of:
        0.0040592253 = weight(_text_:a in 8524) [ClassicSimilarity], result of:
          0.0040592253 = score(doc=8524,freq=2.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.07643694 = fieldWeight in 8524, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=8524)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Type

a
