Search (3 results, page 1 of 1)

  • author_ss:"Elsayed, T."
  • language_ss:"e"
  1. Hasanain, M.; Elsayed, T.: Studying effectiveness of Web search for fact checking (2022) 0.01
    0.011789299 = product of:
      0.023578597 = sum of:
        0.023578597 = product of:
          0.047157194 = sum of:
            0.047157194 = weight(_text_:systems in 558) [ClassicSimilarity], result of:
              0.047157194 = score(doc=558,freq=6.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.29405114 = fieldWeight in 558, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=558)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
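    The tree above is Lucene's ClassicSimilarity (TF-IDF) explain output for the query term "systems". A minimal sketch, assuming the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))) and taking queryNorm and fieldNorm as given from the tree, reproduces the displayed score:

    ```python
    import math

    # Leaf values copied from the explain tree for doc 558 and term "systems".
    freq       = 6.0          # occurrences of the term in the field
    doc_freq   = 5561         # documents containing the term
    max_docs   = 44218        # documents in the index
    query_norm = 0.052184064  # query normalization factor (taken as given)
    field_norm = 0.0390625    # stored field-length normalization (taken as given)

    # Standard ClassicSimilarity components.
    tf  = math.sqrt(freq)                            # ≈ 2.4494898
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # ≈ 3.0731742

    query_weight = idf * query_norm                  # ≈ 0.16037072 (queryWeight)
    field_weight = tf * idf * field_norm             # ≈ 0.29405114 (fieldWeight)
    weight       = query_weight * field_weight       # ≈ 0.047157194

    # The two coord(1/2) factors each halve the score on the way up the tree.
    score = weight * 0.5 * 0.5                       # ≈ 0.011789299 (the score shown above)
    print(score)
    ```

    The same arithmetic with freq=2.0 and fieldNorm=0.046875, or with freq=4.0 and fieldNorm=0.03125, reproduces the scores of results 2 and 3 below.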
    
    Abstract
    Web search is commonly used by fact checking systems as a source of evidence for claim verification. In this work, we demonstrate that the task of retrieving pages useful for fact checking, called evidential pages, is indeed different from the task of retrieving topically relevant pages that search engines typically optimize for; thus, it should be handled differently. We conduct a comprehensive study on the performance of retrieving evidential pages over a test collection we developed for the task of re-ranking Web pages by usefulness for fact-checking. Results show that pages (retrieved by a commercial search engine) that are topically relevant to a claim are not always useful for verifying it, and that the engine's performance in retrieving evidential pages is only weakly correlated with its retrieval of topically relevant pages. Additionally, we identify types of evidence in evidential pages and some linguistic cues that can help predict page usefulness. Moreover, preliminary experiments show that a retrieval model leveraging those cues outperforms the search engine. Finally, we show that existing systems have a long way to go to support effective fact checking. To that end, our work provides insights to guide the design of better future systems for the task.
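    The abstract mentions a retrieval model that leverages linguistic cues of usefulness; the paper's actual features and model are not reproduced here. The sketch below only illustrates the general idea of re-ranking engine results by blending topical relevance with cue evidence, with all feature names and weights hypothetical.

    ```python
    # Hypothetical cue-based re-ranking sketch; feature names and weights are
    # illustrative only and not taken from Hasanain & Elsayed's model.
    from dataclasses import dataclass

    @dataclass
    class Page:
        url: str
        topical_score: float   # e.g. a rank-based score from the search engine
        cue_features: dict     # strength of usefulness cues found on the page

    CUE_WEIGHTS = {            # assumed weights, e.g. learned from labeled pages
        "quotes_claim": 1.0,
        "cites_source": 0.8,
        "stance_language": 0.5,
    }

    def usefulness_score(page: Page, alpha: float = 0.5) -> float:
        """Blend topical relevance with linguistic-cue evidence (illustrative)."""
        cue_score = sum(CUE_WEIGHTS.get(name, 0.0) * value
                        for name, value in page.cue_features.items())
        return alpha * page.topical_score + (1 - alpha) * cue_score

    def rerank(pages: list[Page]) -> list[Page]:
        """Order engine results by estimated usefulness for fact checking."""
        return sorted(pages, key=usefulness_score, reverse=True)
    ```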
  2. Kang, H.; Plaisant, C.; Elsayed, T.; Oard, D.W.: Making sense of archived e-mail : exploring the Enron collection with NetLens (2010) 0.01
    0.008167865 = product of:
      0.01633573 = sum of:
        0.01633573 = product of:
          0.03267146 = sum of:
            0.03267146 = weight(_text_:systems in 3446) [ClassicSimilarity], result of:
              0.03267146 = score(doc=3446,freq=2.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.2037246 = fieldWeight in 3446, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3446)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Informal communications media pose new challenges for information-systems design, but the nature of informal interaction offers new opportunities as well. This paper describes NetLens-E-mail, a system designed to support exploration of the content-actor network in large e-mail collections. Unique features of NetLens-E-mail include close coupling of orientation, specification, restriction, and expansion, as well as a novel capability for iterative projection between content and actor networks within the same collection. Scenarios are presented to illustrate the intended use of NetLens-E-mail, and design walkthroughs with two domain experts provide an initial basis for assessing the suitability of the design for scholars and analysts.
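    The "iterative projection between content and actor networks" can be read as moving back and forth across a bipartite mapping from messages to the people on them. The sketch below illustrates that idea on toy data; the data model and identifiers are hypothetical and not the NetLens-E-mail implementation.

    ```python
    # Toy illustration of content <-> actor projection over an e-mail collection.
    from collections import defaultdict

    # message id -> actors (senders/recipients) appearing on that message
    message_actors = {
        "m1": {"lay-k", "skilling-j"},
        "m2": {"skilling-j", "fastow-a"},
        "m3": {"lay-k"},
    }

    # Build the reverse mapping: actor -> messages they appear on.
    actor_messages = defaultdict(set)
    for msg, actors in message_actors.items():
        for actor in actors:
            actor_messages[actor].add(msg)

    def project_to_actors(messages: set[str]) -> set[str]:
        """Content -> actor projection: who appears on the selected messages?"""
        return set().union(*(message_actors[m] for m in messages)) if messages else set()

    def project_to_messages(actors: set[str]) -> set[str]:
        """Actor -> content projection: which messages involve the selected actors?"""
        return set().union(*(actor_messages[a] for a in actors)) if actors else set()

    # Iterative exploration: start from one message, expand to its actors,
    # then to everything those actors sent or received.
    selection = {"m1"}
    actors = project_to_actors(selection)     # {'lay-k', 'skilling-j'}
    expanded = project_to_messages(actors)    # {'m1', 'm2', 'm3'}
    print(actors, expanded)
    ```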
  3. Kutlu, M.; Elsayed, T.; Lease, M.: Intelligent topic selection for low-cost information retrieval evaluation : a new perspective on deep vs. shallow judging (2018) 0.01
    0.007700737 = product of:
      0.015401474 = sum of:
        0.015401474 = product of:
          0.030802948 = sum of:
            0.030802948 = weight(_text_:systems in 5092) [ClassicSimilarity], result of:
              0.030802948 = score(doc=5092,freq=4.0), product of:
                0.16037072 = queryWeight, product of:
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.052184064 = queryNorm
                0.19207339 = fieldWeight in 5092, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0731742 = idf(docFreq=5561, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5092)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    While test collections provide the cornerstone for Cranfield-based evaluation of information retrieval (IR) systems, it has become practically infeasible to rely on traditional pooling techniques to construct test collections at the scale of today's massive document collections (e.g., ClueWeb12's 700M+ Web pages). This has motivated a flurry of studies proposing more cost-effective yet reliable IR evaluation methods. In this paper, we propose a new intelligent topic selection method which reduces the number of search topics (and thereby costly human relevance judgments) needed for reliable IR evaluation. To rigorously assess our method, we integrate previously disparate lines of research on intelligent topic selection and deep vs. shallow judging (i.e., whether it is more cost-effective to collect many relevance judgments for a few topics or a few judgments for many topics). While prior work on intelligent topic selection has never been evaluated against shallow judging baselines, prior work on deep vs. shallow judging has largely argued for shallow judging, but assumed random topic selection. We argue that to evaluate any topic selection method, one must ultimately ask whether it is actually useful to select topics at all, or whether one should simply perform shallow judging over many topics. In seeking a rigorous answer to this over-arching question, we conduct a comprehensive investigation over a set of relevant factors never previously studied together: 1) the method of topic selection; 2) the effect of topic familiarity on human judging speed; and 3) how different topic generation processes (requiring varying human effort) impact (i) budget utilization and (ii) the resultant quality of judgments. Experiments on the NIST TREC Robust 2003 and Robust 2004 test collections show not only that we can reliably evaluate IR systems with fewer topics, but also that: 1) when topics are intelligently selected, deep judging is often more cost-effective than shallow judging in terms of evaluation reliability; and 2) topic familiarity and topic generation costs greatly impact the evaluation cost vs. reliability trade-off. Our findings challenge conventional wisdom by showing that deep judging is often preferable to shallow judging when topics are selected intelligently.
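    As a back-of-the-envelope illustration of the budget trade-off discussed above (not the paper's experimental setup), the following sketch shows how a fixed judging budget divides between many shallowly judged topics and a few deeply judged ones; the budget and pool depths are hypothetical.

    ```python
    # Illustrative budget arithmetic for the deep-vs-shallow judging trade-off.
    def judging_plans(budget_judgments: int, pool_depths: list[int]) -> dict[int, int]:
        """For each pool depth (judgments per topic), how many topics fit the budget?"""
        return {depth: budget_judgments // depth for depth in pool_depths}

    budget = 10_000                      # total relevance judgments we can afford
    plans = judging_plans(budget, [25, 100, 400])
    print(plans)                         # {25: 400, 100: 100, 400: 25}
    ```

    The paper's over-arching question is then whether intelligently selecting the few deeply judged topics yields a more reliable evaluation than random shallow judging spread over many topics.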