Search (6 results, page 1 of 1)

  • × author_ss:"Luk, R.W.P."
  1. Dang, E.K.F.; Luk, R.W.P.; Allan, J.: ¬A context-dependent relevance model (2016) 0.03
    0.03400038 = product of:
      0.10200114 = sum of:
        0.10200114 = weight(_text_:query in 2778) [ClassicSimilarity], result of:
          0.10200114 = score(doc=2778,freq=6.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.44470036 = fieldWeight in 2778, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2778)
      0.33333334 = coord(1/3)
    
    Abstract
    Numerous past studies have demonstrated the effectiveness of the relevance model (RM) for information retrieval (IR). This approach enables relevance or pseudo-relevance feedback to be incorporated within the language modeling framework of IR. In the traditional RM, the feedback information is used to improve the estimate of the query language model. In this article, we introduce an extension of RM in the setting of relevance feedback. Our method provides an additional way to incorporate feedback via the improvement of the document language models. Specifically, we make use of the context information of known relevant and nonrelevant documents to obtain weighted counts of query terms for estimating the document language models. The context information is based on the words (unigrams or bigrams) appearing within a text window centered on query terms. Experiments on several Text REtrieval Conference (TREC) collections show that our context-dependent relevance model can improve retrieval performance over the baseline RM. Together with previous studies within the BM25 framework, our current study demonstrates that the effectiveness of our method for using context information in IR is quite general and not limited to any specific retrieval model.
  2. Wu, H.C.; Luk, R.W.P.; Wong, K.F,; Kwok, K.L.: ¬A retrospective study of a hybrid document-context based retrieval model (2007) 0.03
    0.027761191 = product of:
      0.08328357 = sum of:
        0.08328357 = weight(_text_:query in 936) [ClassicSimilarity], result of:
          0.08328357 = score(doc=936,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.3630963 = fieldWeight in 936, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=936)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper describes our novel retrieval model that is based on contexts of query terms in documents (i.e., document contexts). Our model is novel because it explicitly takes into account of the document contexts instead of implicitly using the document contexts to find query expansion terms. Our model is based on simulating a user making relevance decisions, and it is a hybrid of various existing effective models and techniques. It estimates the relevance decision preference of a document context as the log-odds and uses smoothing techniques as found in language models to solve the problem of zero probabilities. It combines these estimated preferences of document contexts using different types of aggregation operators that comply with different relevance decision principles (e.g., aggregate relevance principle). Our model is evaluated using retrospective experiments (i.e., with full relevance information), because such experiments can (a) reveal the potential of our model, (b) isolate the problems of the model from those of the parameter estimation, (c) provide information about the major factors affecting the retrieval effectiveness of the model, and (d) show that whether the model obeys the probability ranking principle. Our model is promising as its mean average precision is 60-80% in our experiments using different TREC ad hoc English collections and the NTCIR-5 ad hoc Chinese collection. Our experiments showed that (a) the operators that are consistent with aggregate relevance principle were effective in combining the estimated preferences, and (b) that estimating probabilities using the contexts in the relevant documents can produce better retrieval effectiveness than using the entire relevant documents.
  3. Wong, W.S.; Luk, R.W.P.; Leong, H.V.; Ho, K.S.; Lee, D.L.: Re-examining the effects of adding relevance information in a relevance feedback environment (2008) 0.03
    0.027761191 = product of:
      0.08328357 = sum of:
        0.08328357 = weight(_text_:query in 2084) [ClassicSimilarity], result of:
          0.08328357 = score(doc=2084,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.3630963 = fieldWeight in 2084, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2084)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper presents an investigation about how to automatically formulate effective queries using full or partial relevance information (i.e., the terms that are in relevant documents) in the context of relevance feedback (RF). The effects of adding relevance information in the RF environment are studied via controlled experiments. The conditions of these controlled experiments are formalized into a set of assumptions that form the framework of our study. This framework is called idealized relevance feedback (IRF) framework. In our IRF settings, we confirm the previous findings of relevance feedback studies. In addition, our experiments show that better retrieval effectiveness can be obtained when (i) we normalize the term weights by their ranks, (ii) we select weighted terms in the top K retrieved documents, (iii) we include terms in the initial title queries, and (iv) we use the best query sizes for each topic instead of the average best query size where they produce at most five percentage points improvement in the mean average precision (MAP) value. We have also achieved a new level of retrieval effectiveness which is about 55-60% MAP instead of 40+% in the previous findings. This new level of retrieval effectiveness was found to be similar to a level using a TREC ad hoc test collection that is about double the number of documents in the TREC-3 test collection used in previous works.
  4. Dang, E.K.F.; Luk, R.W.P.; Allan, J.: Beyond bag-of-words : bigram-enhanced context-dependent term weights (2014) 0.03
    0.027761191 = product of:
      0.08328357 = sum of:
        0.08328357 = weight(_text_:query in 1283) [ClassicSimilarity], result of:
          0.08328357 = score(doc=1283,freq=4.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.3630963 = fieldWeight in 1283, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1283)
      0.33333334 = coord(1/3)
    
    Abstract
    While term independence is a widely held assumption in most of the established information retrieval approaches, it is clearly not true and various works in the past have investigated a relaxation of the assumption. One approach is to use n-grams in document representation instead of unigrams. However, the majority of early works on n-grams obtained only modest performance improvement. On the other hand, the use of information based on supporting terms or "contexts" of queries has been found to be promising. In particular, recent studies showed that using new context-dependent term weights improved the performance of relevance feedback (RF) retrieval compared with using traditional bag-of-words BM25 term weights. Calculation of the new term weights requires an estimation of the local probability of relevance of each query term occurrence. In previous studies, the estimation of this probability was based on unigrams that occur in the neighborhood of a query term. We explore an integration of the n-gram and context approaches by computing context-dependent term weights based on a mixture of unigrams and bigrams. Extensive experiments are performed using the title queries of the Text Retrieval Conference (TREC)-6, TREC-7, TREC-8, and TREC-2005 collections, for RF with relevance judgment of either the top 10 or top 20 documents of an initial retrieval. We identify some crucial elements needed in the use of bigrams in our methods, such as proper inverse document frequency (IDF) weighting of the bigrams and noise reduction by pruning bigrams with large document frequency values. We show that enhancing context-dependent term weights with bigrams is effective in further improving retrieval performance.
  5. Luk, R.W.P.; Leong, H.V.; Dillon, T.S.; Chan, A.T.S.; Croft, W.B.; Allen, J.: ¬A survey in indexing and searching XML documents (2002) 0.02
    0.023556154 = product of:
      0.07066846 = sum of:
        0.07066846 = weight(_text_:query in 460) [ClassicSimilarity], result of:
          0.07066846 = score(doc=460,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.30809742 = fieldWeight in 460, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.046875 = fieldNorm(doc=460)
      0.33333334 = coord(1/3)
    
    Abstract
    XML holds the promise to yield (1) a more precise search by providing additional information in the elements, (2) a better integrated search of documents from heterogeneous sources, (3) a powerful search paradigm using structural as well as content specifications, and (4) data and information exchange to share resources and to support cooperative search. We survey several indexing techniques for XML documents, grouping them into flatfile, semistructured, and structured indexing paradigms. Searching techniques and supporting techniques for searching are reviewed, including full text search and multistage search. Because searching XML documents can be very flexible, various search result presentations are discussed, as well as database and information retrieval system integration and XML query languages. We also survey various retrieval models, examining how they would be used or extended for retrieving XML documents. To conclude the article, we discuss various open issues that XML poses with respect to information retrieval and database research.
  6. Dang, E.K.F.; Luk, R.W.P.; Allan, J.; Ho, K.S.; Chung, K.F.L.; Lee, D.L.: ¬A new context-dependent term weight computed by boost and discount using relevance information (2010) 0.02
    0.019630127 = product of:
      0.05889038 = sum of:
        0.05889038 = weight(_text_:query in 4120) [ClassicSimilarity], result of:
          0.05889038 = score(doc=4120,freq=2.0), product of:
            0.22937049 = queryWeight, product of:
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.049352113 = queryNorm
            0.25674784 = fieldWeight in 4120, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6476326 = idf(docFreq=1151, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4120)
      0.33333334 = coord(1/3)
    
    Abstract
    We studied the effectiveness of a new class of context-dependent term weights for information retrieval. Unlike the traditional term frequency-inverse document frequency (TF-IDF), the new weighting of a term t in a document d depends not only on the occurrence statistics of t alone but also on the terms found within a text window (or "document-context") centered on t. We introduce a Boost and Discount (B&D) procedure which utilizes partial relevance information to compute the context-dependent term weights of query terms according to a logistic regression model. We investigate the effectiveness of the new term weights compared with the context-independent BM25 weights in the setting of relevance feedback. We performed experiments with title queries of the TREC-6, -7, -8, and 2005 collections, comparing the residual Mean Average Precision (MAP) measures obtained using B&D term weights and those obtained by a baseline using BM25 weights. Given either 10 or 20 relevance judgments of the top retrieved documents, using the new term weights yields improvement over the baseline for all collections tested. The MAP obtained with the new weights has relative improvement over the baseline by 3.3 to 15.2%, with statistical significance at the 95% confidence level across all four collections.