Search (4 results, page 1 of 1)

  • Filter: author_ss:"Xu, J."
  • Filter: year_i:[2000 TO 2010}
  1. Xu, J.; Croft, W.B.: Topic-based language models for distributed retrieval (2000) 0.00
    Score 0.0024924895 = coord(1/2) · coord(1/2) · tf-idf weight of term "a" in doc 38 [ClassicSimilarity], where weight = √18 (tf) · 1.153047² (idf; docFreq=37942, maxDocs=44218) · 0.046875 (fieldNorm) · 0.037706986 (queryNorm) = 0.009969958
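    For readers unfamiliar with this breakdown: it is Lucene's ClassicSimilarity (TF-IDF) explain output for the stopword-like term "a". A minimal sketch reconstructing the score above, using only the constants the explain tree itself reports:

```python
import math

# Lucene ClassicSimilarity (TF-IDF) score for term "a" in doc 38,
# recomputed from the constants in the explain output above.
freq       = 18.0         # termFreq: occurrences of "a" in the field
idf        = 1.153047     # = 1 + ln(maxDocs / (docFreq + 1)) = 1 + ln(44218 / 37943)
query_norm = 0.037706986  # queryNorm
field_norm = 0.046875     # fieldNorm(doc=38)

tf           = math.sqrt(freq)               # 4.2426405
query_weight = idf * query_norm              # 0.043477926
field_weight = tf * idf * field_norm         # 0.22931081
weight       = query_weight * field_weight   # 0.009969958

score = weight * 0.5 * 0.5                   # the two nested coord(1/2) factors
print(f"{score:.10f}")                       # 0.0024924895
```

    The three trees below follow the same formula; only freq, fieldNorm, and the document id change.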
    
    Abstract
    Effective retrieval in a distributed environment is an important but difficult problem. The lack of effectiveness appears to have two major causes. First, existing collection selection algorithms do not work well on heterogeneous collections. Second, relevant documents are scattered over many collections, and searching only a few collections misses many relevant documents. We propose a topic-oriented approach to distributed retrieval. With this approach, we structure the document set of a distributed retrieval environment around a set of topics. Retrieval for a query involves first selecting the right topics for the query and then dispatching the search process to collections that contain those topics. The content of a topic is characterized by a language model. In environments where the labeling of documents by topic is unavailable, document clustering is employed for topic identification. Based on these ideas, three methods are proposed to suit different environments. We show that all three methods improve the effectiveness of distributed retrieval.
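    To make the topic-selection step concrete, here is a hypothetical sketch; the function names and the Jelinek-Mercer smoothing choice are illustrative assumptions, not the paper's exact method. Each topic is characterized by a unigram language model, and topics are ranked by the smoothed likelihood they assign to the query:

```python
from math import log

def topic_log_likelihood(query_terms, topic_lm, background_lm, lam=0.5):
    """Log-likelihood of the query under a topic's unigram language model,
    Jelinek-Mercer-smoothed with a collection-wide background model."""
    return sum(
        log(lam * topic_lm.get(t, 0.0) + (1 - lam) * background_lm.get(t, 1e-9))
        for t in query_terms
    )

def select_topics(query_terms, topic_lms, background_lm, k=3):
    """Rank topics by query likelihood and keep the top k; the search would
    then be dispatched only to collections containing these topics."""
    ranked = sorted(
        topic_lms,
        key=lambda name: topic_log_likelihood(query_terms, topic_lms[name], background_lm),
        reverse=True,
    )
    return ranked[:k]

# Toy usage: topic models built, e.g., by document clustering when topic
# labels are unavailable, plus a background model over the whole collection.
topics = {
    "databases": {"query": 0.02, "index": 0.03, "join": 0.04},
    "networks":  {"packet": 0.05, "router": 0.04, "query": 0.001},
}
background = {"query": 0.005, "index": 0.002, "join": 0.001,
              "packet": 0.003, "router": 0.002}
print(select_topics(["query", "index"], topics, background, k=1))  # ['databases']
```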
    Type
    a
  2. Xu, J.; Weischedel, R.: Empirical studies on the impact of lexical resources on CLIR performance (2005) 0.00
    Score 0.0021981692 = coord(1/2) · coord(1/2) · tf-idf weight of term "a" in doc 1020 [ClassicSimilarity], where weight = √14 (tf) · 1.153047² (idf; docFreq=37942, maxDocs=44218) · 0.046875 (fieldNorm) · 0.037706986 (queryNorm) = 0.008792677
    
    Abstract
    In this paper, we compile and review several experiments measuring cross-lingual information retrieval (CLIR) performance as a function of the following resources: bilingual term lists, parallel corpora, machine translation (MT), and stemmers. Our CLIR system uses a simple probabilistic language model; the studies used TREC test corpora in Chinese, Spanish, and Arabic. Our findings include: one can achieve acceptable CLIR performance using only a bilingual term list (70-80% of monolingual performance on the Chinese and Arabic corpora). However, if a bilingual term list and parallel corpora are both available, CLIR performance can rival monolingual performance. If no parallel corpus is available, pseudo-parallel texts produced by an MT system can partially overcome the lack of parallel text. And while stemming is normally useful, with a very large parallel corpus for Arabic-English, stemming hurt performance in our experiments on Arabic, a highly inflected language.
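    As a rough illustration of the kind of probabilistic model involved (a hypothetical sketch; the mixture form and parameter names are assumptions, not the authors' exact formulation): an English query term is scored against a foreign-language document through a translation-weighted mixture over the document's terms, with translation probabilities estimable from a bilingual term list or a parallel corpus.

```python
from math import log

def clir_log_likelihood(query_terms, doc_lm, trans_probs, background_en, alpha=0.3):
    """log P(English query | foreign-language document).

    trans_probs[e] maps an English term e to {foreign_term: P(e | f)},
    e.g. uniform probabilities from a bilingual term list, or estimates
    from an aligned parallel corpus.
    """
    score = 0.0
    for e in query_terms:
        p_translate = sum(
            p_ef * doc_lm.get(f, 0.0)
            for f, p_ef in trans_probs.get(e, {}).items()
        )
        # Smooth with a general-English background model so unseen terms
        # do not zero out the whole query.
        score += log(alpha * background_en.get(e, 1e-9) + (1 - alpha) * p_translate)
    return score

# Toy usage with a Spanish document model and a tiny bilingual term list.
doc_lm = {"gato": 0.04, "perro": 0.02}
trans  = {"cat": {"gato": 0.9}, "dog": {"perro": 0.8}}
bg_en  = {"cat": 0.001, "dog": 0.001}
print(clir_log_likelihood(["cat"], doc_lm, trans, bg_en))
```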
    Type
    a
  3. Xu, J.; Weischedel, R.; Licuanan, A.: Evaluation of an extraction-based approach to answering definitional questions (2004) 0.00
    Score 0.0019582848 = coord(1/2) · coord(1/2) · tf-idf weight of term "a" in doc 4107 [ClassicSimilarity], where weight = √4 (tf) · 1.153047² (idf; docFreq=37942, maxDocs=44218) · 0.078125 (fieldNorm) · 0.037706986 (queryNorm) = 0.007833139
    
    Type
    a
  4. Schroeder, J.; Xu, J.; Chen, H.; Chau, M.: Automated criminal link analysis based on domain knowledge (2007) 0.00
    Score 0.0016616598 = coord(1/2) · coord(1/2) · tf-idf weight of term "a" in doc 275 [ClassicSimilarity], where weight = √8 (tf) · 1.153047² (idf; docFreq=37942, maxDocs=44218) · 0.046875 (fieldNorm) · 0.037706986 (queryNorm) = 0.006646639
    
    Abstract
    Link (association) analysis has been used in the criminal justice domain to search large datasets for associations between crime entities in order to facilitate crime investigations. However, link analysis still faces many challenging problems, such as information overload, high search complexity, and heavy reliance on domain knowledge. To address these challenges, this article proposes several techniques for automated, effective, and efficient link analysis, including co-occurrence analysis, a shortest-path algorithm, and a heuristic approach to identifying associations and determining their importance. We developed a prototype system called CrimeLink Explorer based on the proposed techniques. Results of a user study with 10 crime investigators from the Tucson Police Department showed that our system could help subjects conduct link analysis more efficiently than traditional single-level link-analysis tools. Moreover, subjects believed that association paths found by the heuristic approach were more accurate than those found solely by co-occurrence analysis, and that automated link analysis would be of great help in crime investigations.
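    A hypothetical sketch of the shortest-path component (the graph representation and the -log transform are illustrative assumptions, not the article's exact algorithm): crime entities are nodes, co-occurrence strength in (0, 1] weights the edges, and the strongest association path between two entities is a Dijkstra shortest path under -log(strength), so that the product of link strengths along the path is maximized.

```python
import heapq
from math import exp, log

def strongest_path(graph, src, dst):
    """graph[u] = {v: strength in (0, 1]}.  Returns (path_strength, path),
    where path_strength is the product of edge strengths along the path."""
    dist, prev = {src: 0.0}, {}
    heap, done = [(0.0, src)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            break
        for v, s in graph.get(u, {}).items():
            nd = d - log(s)  # -log turns strength products into additive costs
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in done:
        return 0.0, []     # no association path exists
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return exp(-dist[dst]), path[::-1]

# Toy usage: is suspect A linked to C more strongly via B or directly?
g = {"A": {"B": 0.9, "C": 0.4}, "B": {"C": 0.8}, "C": {}}
print(strongest_path(g, "A", "C"))  # (0.72, ['A', 'B', 'C'])
```

    The -log transform is the standard trick for turning a "most reliable path" problem into a shortest-path problem, which keeps the search complexity at Dijkstra's O(E log V) even on large association networks.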
    Type
    a