Search (3 results, page 1 of 1)

Did you mean:
lcsh's%3a%22Cataloging %2f data processing%22 3
lcshs%3a%22Cataloging %2f data processing%22 3

Pan, M.; Huang, J.X.; He, T.; Mao, Z.; Ying, Z.; Tu, X.: ¬A simple kernel co-occurrence-based enhancement for pseudo-relevance feedback (2020) 0.01
```
0.011199882 = product of:
  0.04479953 = sum of:
    0.04479953 = weight(_text_:data in 5678) [ClassicSimilarity], result of:
      0.04479953 = score(doc=5678,freq=6.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.30255508 = fieldWeight in 5678, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5678)
  0.25 = coord(1/4)
```
Abstract

Pseudo-relevance feedback is a well-studied query expansion technique in which it is assumed that the top-ranked documents in an initial set of retrieval results are relevant and expansion terms are then extracted from those documents. When selecting expansion terms, most traditional models do not simultaneously consider term frequency and the co-occurrence relationships between candidate terms and query terms. Intuitively, however, a term that has a higher co-occurrence with a query term is more likely to be related to the query topic. In this article, we propose a kernel co-occurrence-based framework to enhance retrieval performance by integrating term co-occurrence information into the Rocchio model and a relevance language model (RM3). Specifically, a kernel co-occurrence-based Rocchio method (KRoc) and a kernel co-occurrence-based RM3 method (KRM3) are proposed. In our framework, co-occurrence information is incorporated into both the factor of the term discrimination power and the factor of the within-document term weight to boost retrieval performance. The results of a series of experiments show that our proposed methods significantly outperform the corresponding strong baselines over all data sets in terms of the mean average precision and over most data sets in terms of P@10. A direct comparison of standard Text Retrieval Conference data sets indicates that our proposed methods are at least comparable to state-of-the-art approaches.
Qi, Q.; Hessen, D.J.; Heijden, P.G.M. van der: Improving information retrieval through correspondenceanalysis instead of latent semantic analysis (2023) 0.01
```
0.010973599 = product of:
  0.043894395 = sum of:
    0.043894395 = weight(_text_:data in 1045) [ClassicSimilarity], result of:
      0.043894395 = score(doc=1045,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.29644224 = fieldWeight in 1045, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=1045)
  0.25 = coord(1/4)
```
Abstract

The initial dimensions extracted by latent semantic analysis (LSA) of a document-term matrixhave been shown to mainly display marginal effects, which are irrelevant for informationretrieval. To improve the performance of LSA, usually the elements of the raw document-term matrix are weighted and the weighting exponent of singular values can be adjusted.An alternative information retrieval technique that ignores the marginal effects is correspon-dence analysis (CA). In this paper, the information retrieval performance of LSA and CA isempirically compared. Moreover, it is explored whether the two weightings also improve theperformance of CA. The results for four empirical datasets show that CA always performsbetter than LSA. Weighting the elements of the raw data matrix can improve CA; however,it is data dependent and the improvement is small. Adjusting the singular value weightingexponent often improves the performance of CA; however, the extent of the improvementdepends on the dataset and the number of dimensions. (PDF) Improving information retrieval through correspondence analysis instead of latent semantic analysis.
Wiggers, G.; Verberne, S.; Loon, W. van; Zwenne, G.-J.: Bibliometric-enhanced legal information retrieval : combining usage and citations as flavors of impact relevance (2023) 0.01
```
0.009144665 = product of:
  0.03657866 = sum of:
    0.03657866 = weight(_text_:data in 1022) [ClassicSimilarity], result of:
      0.03657866 = score(doc=1022,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24703519 = fieldWeight in 1022, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1022)
  0.25 = coord(1/4)
```
Abstract

Bibliometric-enhanced information retrieval uses bibliometrics (e.g., citations) to improve ranking algorithms. Using a data-driven approach, this article describes the development of a bibliometric-enhanced ranking algorithm for legal information retrieval, and the evaluation thereof. We statistically analyze the correlation between usage of documents and citations over time, using data from a commercial legal search engine. We then propose a bibliometric boost function that combines usage of documents with citation counts. The core of this function is an impact variable based on usage and citations that increases in influence as citations and usage counts become more reliable over time. We evaluate our ranking function by comparing search sessions before and after the introduction of the new ranking in the search engine. Using a cost model applied to 129,571 sessions before and 143,864 sessions after the intervention, we show that our bibliometric-enhanced ranking algorithm reduces the time of a search session of legal professionals by 2 to 3% on average for use cases other than known-item retrieval or updating behavior. Given the high hourly tariff of legal professionals and the limited time they can spend on research, this is expected to lead to increased efficiency, especially for users with extremely long search sessions.

Search (3 results, page 1 of 1)

Authors

Types