Search (2 results, page 1 of 1)

Ye, Z.; Huang, J.X.; He, B.; Lin, H.: Mining a multilingual association dictionary from Wikipedia for cross-language information retrieval (2012) 0.00
```
0.0033183135 = product of:
  0.0099549405 = sum of:
    0.0099549405 = weight(_text_:a in 513) [ClassicSimilarity], result of:
      0.0099549405 = score(doc=513,freq=18.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.19109234 = fieldWeight in 513, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0390625 = fieldNorm(doc=513)
  0.33333334 = coord(1/3)
```
Abstract

Wikipedia is characterized by its dense link structure and a large number of articles in different languages, which make it a notable Web corpus for knowledge extraction and mining, in particular for mining the multilingual associations. In this paper, motivated by a psychological theory of word meaning, we propose a graph-based approach to constructing a cross-language association dictionary (CLAD) from Wikipedia, which can be used in a variety of cross-language accessing and processing applications. In order to evaluate the quality of the mined CLAD, and to demonstrate how the mined CLAD can be used in practice, we explore two different applications of the mined CLAD to cross-language information retrieval (CLIR). First, we use the mined CLAD to conduct cross-language query expansion; and, second, we use it to filter out translation candidates with low translation probabilities. Experimental results on a variety of standard CLIR test collections show that the CLIR retrieval performance can be substantially improved with the above two applications of CLAD, which indicates that the mined CLAD is of sound quality.

Type

a
Ye, Z.; Huang, J.X.; Lin, H.: Finding a good query-related topic for boosting pseudo-relevance feedback (2011) 0.00
```
0.0029264777 = product of:
  0.008779433 = sum of:
    0.008779433 = weight(_text_:a in 4372) [ClassicSimilarity], result of:
      0.008779433 = score(doc=4372,freq=14.0), product of:
        0.05209492 = queryWeight, product of:
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.045180224 = queryNorm
        0.1685276 = fieldWeight in 4372, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.153047 = idf(docFreq=37942, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4372)
  0.33333334 = coord(1/3)
```
Abstract

Pseudo-relevance feedback (PRF) via query expansion (QE) assumes that the top-ranked documents from the first-pass retrieval are relevant. The most informative terms in the pseudo-relevant feedback documents are then used to update the original query representation in order to boost the retrieval performance. Most current PRF approaches estimate the importance of the candidate expansion terms based on their statistics on document level. However, a document for PRF may consist of different topics, which may not be all related to the query even if the document is judged relevant. The main argument of this article is the proposal to conduct PRF on a granularity smaller than on the document level. In this article, we propose a topic-based feedback model with three different strategies for finding a good query-related topic based on the Latent Dirichlet Allocation model. The experimental results on four representative TREC collections show that QE based on the derived topic achieves statistically significant improvements over a strong feedback model in the language modeling framework, which updates the query representation based on the top-ranked documents.

Type

a

Search (2 results, page 1 of 1)

Authors