Search (99 results, page 2 of 5)

Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.01

```
0.0073257135 = product of:
  0.036628567 = sum of:
    0.036628567 = product of:
      0.07325713 = sum of:
        0.07325713 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
          0.07325713 = score(doc=2051,freq=2.0), product of:
            0.15778607 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04505818 = queryNorm
            0.46428138 = fieldWeight in 2051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=2051)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```

Date: 14. 6.2015 22:12:56
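The explain trees in these results come from Lucene's ClassicSimilarity (TF-IDF). The minimal Python sketch below recomputes the score for doc 2051 from the printed factors. It assumes idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), and the printed numbers agree with both. It takes queryNorm, fieldNorm and the coord fractions from the output as given rather than recomputing them. `explain_score` is a hypothetical helper of our own, not a Lucene API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity-style idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """ClassicSimilarity-style tf: square root of the raw term frequency."""
    return math.sqrt(freq)


def explain_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    """Hypothetical helper: rebuild an explain tree bottom-up (not a Lucene API)."""
    # Assumption: query_norm and field_norm are copied from the explain output.
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # queryWeight
    field_weight = classic_tf(freq) * idf * field_norm  # fieldWeight
    score = query_weight * field_weight                 # weight(_text_:term ...)
    for matched, total in coords:                       # coord(1/2), coord(1/5)
        score *= matched / total
    return score


# Factors copied from the explain tree for doc 2051 (term "22", freq=2).
score = explain_score(freq=2.0, doc_freq=3622, max_docs=44218,
                      query_norm=0.04505818, field_norm=0.09375,
                      coords=[(1, 2), (1, 5)])
print(f"{score:.10f}")
```

The nested "sum of" nodes each have a single child, so they pass the value through unchanged. Calling the same function with freq=10.0, doc_freq=5088 and field_norm=0.046875 should reproduce the 0.006678 shown for doc 174.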

Bodoff, D.; Enache, D.; Kambil, A.; Simon, G.; Yukhimets, A.: A unified maximum likelihood approach to document retrieval (2001) 0.01
```
0.006678094 = product of:
  0.03339047 = sum of:
    0.03339047 = product of:
      0.06678094 = sum of:
        0.06678094 = weight(_text_:data in 174) [ClassicSimilarity], result of:
          0.06678094 = score(doc=174,freq=10.0), product of:
            0.14247625 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.04505818 = queryNorm
            0.46871632 = fieldWeight in 174, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=174)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```
Abstract

Empirical work shows significant benefits from using relevance feedback data to improve information retrieval (IR) performance. Still, one fundamental difficulty has limited the ability to fully exploit this valuable data. The problem is that it is not clear whether the relevance feedback data should be used to train the system about what the users really mean, or about what the documents really mean. In this paper, we resolve the question using a maximum likelihood framework. We show how all the available data can be used to simultaneously estimate both documents and queries in proportions that are optimal in a maximum likelihood sense. The resulting algorithm is directly applicable to many approaches to IR, and the unified framework can help explain previously reported results as well as guide the search for new methods that utilize feedback data in IR.

Li, M.; Li, H.; Zhou, Z.-H.: Semi-supervised document retrieval (2009) 0.01
```
0.006096238 = product of:
  0.03048119 = sum of:
    0.03048119 = product of:
      0.06096238 = sum of:
        0.06096238 = weight(_text_:data in 4218) [ClassicSimilarity], result of:
          0.06096238 = score(doc=4218,freq=12.0), product of:
            0.14247625 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.04505818 = queryNorm
            0.4278775 = fieldWeight in 4218, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4218)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```
Abstract

This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, which is referred to as SSRank, aims to use the advantages of both the traditional Information Retrieval (IR) methods and the supervised learning methods for IR proposed recently. The advantages include the use of a limited amount of labeled data and rich model representation. To do so, the method adopts a semi-supervised learning framework in ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for the queries. It then uses all the labeled data to train a machine learning model (in our case, Neural Network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently and almost always significantly outperforms the baseline methods (unsupervised and supervised learning methods), given the same amount of labeled data. This is because SSRank can effectively leverage unlabeled data in learning.
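The abstract says SSRank uses BM25 as its traditional IR model during data labeling. For background, here is a minimal Python sketch of Okapi BM25 scoring over tokenized documents. The values k1 = 1.2 and b = 0.75, and the idf variant with +1 inside the logarithm, are common defaults we assume rather than settings reported in the paper. The sketch does not reproduce SSRank's labeling procedure.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score every tokenized document in `docs` against `query_terms` with Okapi BM25."""
    # Assumption: k1 and b are common defaults, not values from the SSRank paper.
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # idf variant with +1 inside the log so it never goes negative.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Saturating tf, normalized by document length relative to the average.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / denom
        scores.append(s)
    return scores


docs = [["semi", "supervised", "ranking"],
        ["bm25", "ranking", "model"],
        ["neural", "network"]]
print(bm25_scores(["ranking", "bm25"], docs))
```

In a semi-supervised setting, scores like these could rank unlabeled documents before a labeling step. The paper describes how SSRank turns model output into labels and when labeling stops; this sketch does not.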

Harman, D.; Fox, E.; Baeza-Yates, R.; Lee, W.: Inverted files (1992) 0.01

```
0.0056314636 = product of:
  0.028157318 = sum of:
    0.028157318 = product of:
      0.056314636 = sum of:
        0.056314636 = weight(_text_:data in 3497) [ClassicSimilarity], result of:
          0.056314636 = score(doc=3497,freq=4.0), product of:
            0.14247625 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.04505818 = queryNorm
            0.3952563 = fieldWeight in 3497, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=3497)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```

Abstract: This chapter presents a survey of the various structures (techniques) that can be used in building inverted files, and gives the details for producing an inverted file using sorted arrays. The chapter ends with two modifications to this basic method that are effective for large data collections.
Source: Information retrieval: data structures and algorithms. Ed.: W.B. Frakes u. R. Baeza-Yates
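The sort-based construction the chapter describes can be sketched in Python. The code emits (term, doc id) pairs, sorts them, collapses each run into postings, and keeps the vocabulary as a sorted array searched by binary search. This is a minimal in-memory sketch under our own assumptions: documents are already tokenized, and postings are stored as (doc id, term frequency). It omits the disk-based details and the two large-collection modifications covered in the chapter.

```python
from bisect import bisect_left
from itertools import groupby


def build_inverted_file(docs):
    """docs: dict doc_id -> token list. Returns (sorted vocabulary, parallel postings lists)."""
    # Assumption: everything fits in memory; the chapter's disk handling is omitted.
    # 1. One (term, doc_id) pair per token occurrence.
    pairs = [(term, doc_id) for doc_id, tokens in docs.items() for term in tokens]
    # 2. Sorting groups each term's pairs together, ordered by doc id.
    pairs.sort()
    vocabulary, postings = [], []
    # 3. Collapse each term's run into (doc_id, term frequency) postings.
    for term, term_pairs in groupby(pairs, key=lambda p: p[0]):
        plist = [(doc_id, sum(1 for _ in run))
                 for doc_id, run in groupby(term_pairs, key=lambda p: p[1])]
        vocabulary.append(term)  # terms arrive in sorted order
        postings.append(plist)
    return vocabulary, postings


def lookup(vocabulary, postings, term):
    """Binary-search the sorted vocabulary array; return the term's postings or []."""
    i = bisect_left(vocabulary, term)
    if i < len(vocabulary) and vocabulary[i] == term:
        return postings[i]
    return []


vocab, plists = build_inverted_file({1: ["data", "retrieval", "data"], 2: ["retrieval"]})
print(lookup(vocab, plists, "data"))       # [(1, 2)]
print(lookup(vocab, plists, "retrieval"))  # [(1, 1), (2, 1)]
```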

French, J.C.; Powell, A.L.; Schulman, E.: Using clustering strategies for creating authority files (2000) 0.01
```
0.0051728296 = product of:
  0.025864149 = sum of:
    0.025864149 = product of:
      0.051728297 = sum of:
        0.051728297 = weight(_text_:data in 4811) [ClassicSimilarity], result of:
          0.051728297 = score(doc=4811,freq=6.0), product of:
            0.14247625 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.04505818 = queryNorm
            0.3630661 = fieldWeight in 4811, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=4811)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```
Abstract

As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. Authority work, the need to discover and reconcile variant forms of strings in bibliographical entries, will become more critical in the future. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. We investigate a number of approximate string matching techniques that have traditionally been used to help with this problem. We then introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms. We demonstrate the utility of these approaches using data from the Astrophysics Data System and show how we can reduce the human effort involved in the creation of authority files.

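Levenshtein edit distance is one of the traditional approximate string matching techniques of the kind the abstract mentions. Below is a minimal Python sketch. It includes a hypothetical `find_variants` helper that flags candidate authority-file variants within a distance threshold. The threshold and the helper are our own illustration, not the authors' approximate word matching method.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def find_variants(name, candidates, max_distance=2):
    """Hypothetical helper: candidates within max_distance edits of name (case-insensitive)."""
    # Assumption: max_distance=2 is an illustrative threshold, not one from the paper.
    key = name.lower()
    return [c for c in candidates if edit_distance(key, c.lower()) <= max_distance]


print(edit_distance("kitten", "sitting"))  # 3
print(find_variants("Tchaikovsky", ["Chaikovsky", "Tschaikowsky", "Rachmaninoff"]))
```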
Bodoff, D.; Robertson, S.: A new unified probabilistic model (2004) 0.01
```
0.0051728296 = product of:
  0.025864149 = sum of:
    0.025864149 = product of:
      0.051728297 = sum of:
        0.051728297 = weight(_text_:data in 2129) [ClassicSimilarity], result of:
          0.051728297 = score(doc=2129,freq=6.0), product of:
            0.14247625 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.04505818 = queryNorm
            0.3630661 = fieldWeight in 2129, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2129)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```
Abstract

This paper proposes a new unified probabilistic model. Two previous models, Robertson et al.'s "Model 0" and "Model 3," each have strengths and weaknesses. The strength of Model 0 not found in Model 3 is that it does not require relevance data about the particular document or query, and, related to that, its probability estimates are straightforward. The strength of Model 3 not found in Model 0 is that it can utilize feedback information about the particular document and query in question. In this paper we introduce a new unified probabilistic model that combines these strengths: the expression of its probabilities is straightforward, it does not require that data must be available for the particular document or query in question, but it can utilize such specific data if it is available. The model is one way to resolve the difficulty of combining two marginal views in probabilistic retrieval.

Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O.; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.01
```
0.0051728296 = product of:
  0.025864149 = sum of:
    0.025864149 = product of:
      0.051728297 = sum of:
        0.051728297 = weight(_text_:data in 2502) [ClassicSimilarity], result of:
          0.051728297 = score(doc=2502,freq=6.0), product of:
            0.14247625 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.04505818 = queryNorm
            0.3630661 = fieldWeight in 2502, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```
Abstract

Prior efforts have shown that under certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches, and demonstrate the reasons why they do not perform well when applied to this problem. Detailed results and analyses are included to support our conclusions.

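CombMNZ is a classic score-based fusion method of the general kind discussed here. Below is a minimal Python sketch of CombMNZ with min-max score normalization. It assumes each run is a non-empty dict mapping doc ids to scores. The run names `bm25_run` and `lm_run` are hypothetical. The sketch illustrates the general technique, not the specific fusion configurations evaluated in the paper.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale one run's scores to [0, 1]; a run with constant scores maps to 1.0."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def comb_mnz(runs):
    """CombMNZ: sum of normalized scores times the number of runs that retrieved the doc."""
    # Assumption: every run is a non-empty {doc_id: score} dict.
    total = defaultdict(float)
    hits = defaultdict(int)
    for run in runs:
        for doc, s in min_max_normalize(run).items():
            total[doc] += s
            hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    return sorted(fused, key=fused.get, reverse=True)


bm25_run = {"d1": 3.0, "d2": 2.0, "d3": 1.0}  # hypothetical run from one strategy
lm_run = {"d2": 5.0, "d4": 4.0, "d1": 1.0}    # hypothetical run from another strategy
print(comb_mnz([bm25_run, lm_run]))  # ['d2', 'd1', 'd4', 'd3']
```

Dropping the `hits[doc]` multiplier gives CombSUM. Both methods reward documents that several runs retrieve, so their benefit depends on how much the fused runs differ.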
Biskri, I.; Rompré, L.: Using association rules for query reformulation (2012) 0.01
```
0.0051728296 = product of:
  0.025864149 = sum of:
    0.025864149 = product of:
      0.051728297 = sum of:
        0.051728297 = weight(_text_:data in 92) [ClassicSimilarity], result of:
          0.051728297 = score(doc=92,freq=6.0), product of:
            0.14247625 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.04505818 = queryNorm
            0.3630661 = fieldWeight in 92, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=92)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```
Abstract

In this paper the authors will present research on the combination of two methods of data mining: text classification and maximal association rules. Text classification has been the focus of interest of many researchers for a long time. However, the results take the form of lists of words (classes) that people often do not know what to do with. The use of maximal association rules induced a number of advantages: (1) the detection of dependencies and correlations between the relevant units of information (words) of different classes, (2) the extraction of hidden knowledge, often relevant, from a large volume of data. The authors will show how this combination can improve the process of information retrieval.

Theme

Data Mining
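To make support and confidence concrete, here is a minimal Python sketch of ordinary pairwise association rules over words, treating each document as a transaction. Note the assumption: the paper uses maximal association rules, which are defined differently. This sketch shows only standard support/confidence mining as a simplification, not the authors' method.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Mine rules x -> y between single words; returns (x, y, support, confidence)."""
    # Assumption: ordinary (not maximal) association rules, one document = one transaction.
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for words in transactions:
        items = set(words)
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of documents containing both words
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]  # share of x-documents that also contain y
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


docs = [["query", "expansion", "retrieval"],
        ["query", "expansion"],
        ["query", "retrieval"],
        ["classification"]]
for rule in pair_rules(docs, min_support=0.5, min_confidence=0.9):
    print(rule)
```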

Sachs, W.M.: An approach to associative retrieval through the theory of fuzzy sets (1976) 0.00

```
0.0049775573 = product of:
  0.024887787 = sum of:
    0.024887787 = product of:
      0.049775574 = sum of:
        0.049775574 = weight(_text_:data in 7) [ClassicSimilarity], result of:
          0.049775574 = score(doc=7,freq=2.0), product of:
            0.14247625 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.04505818 = queryNorm
            0.34936053 = fieldWeight in 7, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.078125 = fieldNorm(doc=7)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```

Abstract: The theory of fuzzy sets is used to provide a rigorous formulation of the problem of associative retrieval. This formulation suggests the idea of using fuzzy clustering to organize data for retrieval.
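The abstract proposes fuzzy clustering to organize data for retrieval but does not name an algorithm. As one illustration, here is a minimal Python sketch of fuzzy c-means with fuzziness exponent m = 2 and a fixed iteration count. Each document vector gets a degree of membership in every cluster instead of a single assignment. Fuzzy c-means is our choice of algorithm; this is not Sachs's formulation.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Membership u[i][k] of point i in cluster k; each row sums to 1."""
    memberships = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if 0.0 in dists:
            # Point sits exactly on a center: give it crisp membership there.
            k = dists.index(0.0)
            memberships.append([1.0 if j == k else 0.0 for j in range(len(centers))])
            continue
        exp = 2.0 / (m - 1.0)
        memberships.append([1.0 / sum((d_k / d_j) ** exp for d_j in dists) for d_k in dists])
    return memberships


def fcm_centers(points, memberships, m=2.0):
    """New centers: means of all points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for k in range(len(memberships[0])):
        w = [u[k] ** m for u in memberships]
        total = sum(w)
        centers.append(tuple(sum(wi * p[d] for wi, p in zip(w, points)) / total
                             for d in range(dim)))
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, iterations=20):
    """Alternate membership and center updates for a fixed number of iterations."""
    # Assumption: fixed iteration count instead of a convergence test, for brevity.
    centers = list(init_centers)
    for _ in range(iterations):
        memberships = fcm_memberships(points, centers, m)
        centers = fcm_centers(points, memberships, m)
    return fcm_memberships(points, centers, m), centers


docs = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]  # toy 2-d document vectors
u, centers = fuzzy_c_means(docs, init_centers=[(1.0, 0.0), (9.0, 0.0)])
for point, row in zip(docs, u):
    print(point, [round(x, 3) for x in row])
```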

Heinz, S.; Zobel, J.: Efficient single-pass index construction for text databases (2003) 0.00
```
0.0049275304 = product of:
  0.024637653 = sum of:
    0.024637653 = product of:
      0.049275305 = sum of:
        0.049275305 = weight(_text_:data in 1678) [ClassicSimilarity], result of:
          0.049275305 = score(doc=1678,freq=4.0), product of:
            0.14247625 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.04505818 = queryNorm
            0.34584928 = fieldWeight in 1678, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1678)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```
Abstract

Efficient construction of inverted indexes is essential to provision of search over large collections of text data. In this article, we review the principal approaches to inversion, analyze their theoretical cost, and present experimental results. We identify the drawbacks of existing inversion approaches and propose a single-pass inversion method that, in contrast to previous approaches, does not require the complete vocabulary of the indexed collection in main memory, can operate within limited resources, and does not sacrifice speed with high temporary storage requirements. We show that the performance of the single-pass approach can be improved by constructing inverted files in segments, reducing the cost of disk accesses during inversion of large volumes of data.
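A minimal Python sketch of the segment idea follows. Postings accumulate in memory per term until a hypothetical budget (`max_postings`) is reached. Each full buffer is then flushed as a term-sorted run, which would go to disk in a real system and is a plain list here. At the end, the runs are merged term by term as a stream. The sketch assumes documents arrive in increasing doc-id order. It is a simplification for illustration, not the authors' implementation.

```python
import heapq
from collections import Counter, defaultdict


def build_runs(doc_stream, max_postings):
    """Single pass over (doc_id, tokens); flush a sorted run whenever the buffer fills."""
    # Assumption: max_postings is a hypothetical memory budget counted in postings.
    runs, buffer, buffered = [], defaultdict(list), 0
    for doc_id, tokens in doc_stream:
        for term, tf in Counter(tokens).items():
            buffer[term].append((doc_id, tf))
            buffered += 1
        if buffered >= max_postings:
            runs.append(sorted(buffer.items()))  # a real system writes this run to disk
            buffer, buffered = defaultdict(list), 0
    if buffer:
        runs.append(sorted(buffer.items()))
    return runs


def merge_runs(runs):
    """Stream (term, postings) in term order, concatenating a term's postings across runs."""
    current_term, current = None, []
    # key= compares terms only; equal terms come out in run order, keeping doc ids ascending.
    for term, postings in heapq.merge(*runs, key=lambda entry: entry[0]):
        if current and term != current_term:
            yield current_term, current
            current = []
        current_term = term
        current.extend(postings)
    if current:
        yield current_term, current


stream = [(1, ["a", "b"]), (2, ["b", "c"]), (3, ["a", "a"])]
for term, postings in merge_runs(build_runs(stream, max_postings=2)):
    print(term, postings)
```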

MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.00

```
0.004883809 = product of:
  0.024419045 = sum of:
    0.024419045 = product of:
      0.04883809 = sum of:
        0.04883809 = weight(_text_:22 in 5108) [ClassicSimilarity], result of:
          0.04883809 = score(doc=5108,freq=2.0), product of:
            0.15778607 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04505818 = queryNorm
            0.30952093 = fieldWeight in 5108, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=5108)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```

Date: 20. 1.2007 18:30:22

Losada, D.E.; Barreiro, A.: Embedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.00

```
0.004883809 = product of:
  0.024419045 = sum of:
    0.024419045 = product of:
      0.04883809 = sum of:
        0.04883809 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
          0.04883809 = score(doc=1422,freq=2.0), product of:
            0.15778607 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04505818 = queryNorm
            0.30952093 = fieldWeight in 1422, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1422)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```

Date: 22. 3.2003 19:27:23

Bornmann, L.; Mutz, R.: From P100 to P100' : a new citation-rank approach (2014) 0.00

```
0.004883809 = product of:
  0.024419045 = sum of:
    0.024419045 = product of:
      0.04883809 = sum of:
        0.04883809 = weight(_text_:22 in 1431) [ClassicSimilarity], result of:
          0.04883809 = score(doc=1431,freq=2.0), product of:
            0.15778607 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04505818 = queryNorm
            0.30952093 = fieldWeight in 1431, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1431)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```

Date: 22. 8.2014 17:05:18

Ravana, S.D.; Rajagopal, P.; Balakrishnan, V.: Ranking retrieval systems using pseudo relevance judgments (2015) 0.00

```
0.0043167183 = product of:
  0.02158359 = sum of:
    0.02158359 = product of:
      0.04316718 = sum of:
        0.04316718 = weight(_text_:22 in 2591) [ClassicSimilarity], result of:
          0.04316718 = score(doc=2591,freq=4.0), product of:
            0.15778607 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04505818 = queryNorm
            0.27358043 = fieldWeight in 2591, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2591)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```

Date: 20. 1.2015 18:30:22
18. 9.2018 18:22:56

Pan, M.; Huang, J.X.; He, T.; Mao, Z.; Ying, Z.; Tu, X.: A simple kernel co-occurrence-based enhancement for pseudo-relevance feedback (2020) 0.00
```
0.0043106913 = product of:
  0.021553457 = sum of:
    0.021553457 = product of:
      0.043106914 = sum of:
        0.043106914 = weight(_text_:data in 5678) [ClassicSimilarity], result of:
          0.043106914 = score(doc=5678,freq=6.0), product of:
            0.14247625 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.04505818 = queryNorm
            0.30255508 = fieldWeight in 5678, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5678)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```
Abstract

Pseudo-relevance feedback is a well-studied query expansion technique in which it is assumed that the top-ranked documents in an initial set of retrieval results are relevant and expansion terms are then extracted from those documents. When selecting expansion terms, most traditional models do not simultaneously consider term frequency and the co-occurrence relationships between candidate terms and query terms. Intuitively, however, a term that has a higher co-occurrence with a query term is more likely to be related to the query topic. In this article, we propose a kernel co-occurrence-based framework to enhance retrieval performance by integrating term co-occurrence information into the Rocchio model and a relevance language model (RM3). Specifically, a kernel co-occurrence-based Rocchio method (KRoc) and a kernel co-occurrence-based RM3 method (KRM3) are proposed. In our framework, co-occurrence information is incorporated into both the factor of the term discrimination power and the factor of the within-document term weight to boost retrieval performance. The results of a series of experiments show that our proposed methods significantly outperform the corresponding strong baselines over all data sets in terms of the mean average precision and over most data sets in terms of P@10. A direct comparison of standard Text Retrieval Conference data sets indicates that our proposed methods are at least comparable to state-of-the-art approaches.
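For reference, the baseline that KRoc extends is Rocchio-style pseudo-relevance feedback. It treats the top-k documents as relevant and moves the query toward their centroid: q' = alpha * q + beta * centroid(top-k). Below is a minimal Python sketch under our own assumptions: length-normalized term frequencies as document weights, alpha = 1.0, beta = 0.75, no negative feedback, and only the top-n new terms kept. It leaves out the kernel co-occurrence weighting that is the paper's contribution.

```python
from collections import Counter, defaultdict


def rocchio_prf(query, ranked_docs, k=10, alpha=1.0, beta=0.75, n_terms=10):
    """Expand a weighted query {term: weight} using the top-k tokenized documents."""
    # Assumption: illustrative defaults for k, alpha, beta, n_terms; no negative feedback.
    feedback = ranked_docs[:k]
    # Centroid of the pseudo-relevant documents, each as length-normalized term frequencies.
    centroid = defaultdict(float)
    for doc in feedback:
        tf = Counter(doc)
        length = sum(tf.values())
        for term, count in tf.items():
            centroid[term] += (count / length) / len(feedback)
    # Original terms: alpha * q + beta * centroid.
    expanded = {t: alpha * w + beta * centroid.get(t, 0.0) for t, w in query.items()}
    # New terms: the n_terms highest-weighted centroid terms not already in the query.
    candidates = sorted((t for t in centroid if t not in query),
                        key=lambda t: centroid[t], reverse=True)
    for t in candidates[:n_terms]:
        expanded[t] = beta * centroid[t]
    return expanded


ranked = [["relevance", "feedback", "relevance"],
          ["relevance", "expansion"],
          ["cooking"]]
print(rocchio_prf({"feedback": 1.0}, ranked, k=2, n_terms=1))
```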

Chang, C.-H.; Hsu, C.-C.: Integrating query expansion and conceptual relevance feedback for personalized Web information retrieval (1998) 0.00

```
0.004273333 = product of:
  0.021366665 = sum of:
    0.021366665 = product of:
      0.04273333 = sum of:
        0.04273333 = weight(_text_:22 in 1319) [ClassicSimilarity], result of:
          0.04273333 = score(doc=1319,freq=2.0), product of:
            0.15778607 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04505818 = queryNorm
            0.2708308 = fieldWeight in 1319, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1319)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```

Date: 1. 8.1996 22:08:06

Kanaeva, Z.: Ranking: Google und CiteSeer (2005) 0.00

```
0.004273333 = product of:
  0.021366665 = sum of:
    0.021366665 = product of:
      0.04273333 = sum of:
        0.04273333 = weight(_text_:22 in 3276) [ClassicSimilarity], result of:
          0.04273333 = score(doc=3276,freq=2.0), product of:
            0.15778607 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.04505818 = queryNorm
            0.2708308 = fieldWeight in 3276, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3276)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```

Date: 20. 3.2005 16:23:22

White, K.J.; Sutcliffe, R.F.E.: Applying incremental tree induction to retrieval : from manuals and medical texts (2006) 0.00
```
0.0042235977 = product of:
  0.021117989 = sum of:
    0.021117989 = product of:
      0.042235978 = sum of:
        0.042235978 = weight(_text_:data in 5044) [ClassicSimilarity], result of:
          0.042235978 = score(doc=5044,freq=4.0), product of:
            0.14247625 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.04505818 = queryNorm
            0.29644224 = fieldWeight in 5044, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=5044)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```
Abstract

The Decision Tree Forest (DTF) is an architecture for information retrieval that uses a separate decision tree for each document in a collection. Experiments were conducted in which DTFs working with the incremental tree induction (ITI) algorithm of Utgoff, Berkman, and Clouse (1997) were trained and evaluated in the medical and word processing domains using the Cystic Fibrosis and SIFT collections. Performance was compared with that of a conventional inverted index system (IIS) using a BM25-derived probabilistic matching function. Initial results using DTF were poor compared to those obtained with IIS. We then simulated scenarios in which large quantities of training data were available, by using only those parts of the document collection that were well covered by the data sets. Consequently, the retrieval effectiveness of DTF improved substantially. In one particular experiment, precision and recall for DTF were 0.65 and 0.67 respectively, values that compared favorably with values of 0.49 and 0.56 for IIS.

Kekäläinen, J.: Binary and graded relevance in IR evaluations : comparison of the effects on ranking of IR systems (2005) 0.00
```
0.0042235977 = product of:
  0.021117989 = sum of:
    0.021117989 = product of:
      0.042235978 = sum of:
        0.042235978 = weight(_text_:data in 1036) [ClassicSimilarity], result of:
          0.042235978 = score(doc=1036,freq=4.0), product of:
            0.14247625 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.04505818 = queryNorm
            0.29644224 = fieldWeight in 1036, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=1036)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```
Abstract

In this study the rankings of IR systems based on binary and graded relevance in TREC 7 and 8 data are compared. The relevance of a sample of TREC results is reassessed using a relevance scale with four levels: non-relevant, marginally relevant, fairly relevant, and highly relevant. Twenty-one topics and 90 systems from TREC 7 and 20 topics and 121 systems from TREC 8 form the data. Binary precision, cumulated gain, discounted cumulated gain, and normalised discounted cumulated gain are the measures compared. Different weighting schemes for relevance levels are tested with the cumulated gain measures. Kendall's rank correlations are computed to determine to what extent the rankings produced by different measures are similar. Weighting schemes from binary to emphasising highly relevant documents form a continuum, where the measures correlate strongly at the binary end and less at the heavily weighted end. The results show the different character of the measures.

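A minimal Python sketch of the measures named above follows, under stated assumptions:

- Graded relevance levels map to gains 0-3, which is one possible weighting scheme.
- The cumulated-gain discount uses log base b = 2 and applies from rank b onward.
- The ideal vector is approximated by re-sorting the retrieved gains. In the standard definition it is built from all judged relevant documents for the topic.
- Kendall's tau-a compares two system orderings without ties.

```python
import math
from itertools import combinations


def dcg(gains, b=2):
    """Discounted cumulated gain vector: ranks below b are not discounted."""
    total, vector = 0.0, []
    for i, g in enumerate(gains, start=1):
        total += g if i < b else g / math.log(i, b)
        vector.append(total)
    return vector


def ndcg(gains, b=2):
    """DCG divided position-wise by the DCG of the ideally ordered gains."""
    # Assumption: ideal vector = retrieved gains re-sorted, not the full recall base.
    ideal = dcg(sorted(gains, reverse=True), b)
    return [d / i if i > 0 else 0.0 for d, i in zip(dcg(gains, b), ideal)]


def kendall_tau(order_a, order_b):
    """Kendall's tau-a between two rankings of the same systems (no ties)."""
    position = {system: r for r, system in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):  # x precedes y in order_a
        if position[x] < position[y]:
            concordant += 1
        else:
            discordant += 1
    n = len(order_a)
    return (concordant - discordant) / (n * (n - 1) / 2)


gains = [3, 2, 3, 0, 1]  # assumed mapping: 0 = non-relevant ... 3 = highly relevant
print([round(x, 3) for x in dcg(gains)])
print(round(ndcg(gains)[-1], 3))
print(kendall_tau(["s1", "s2", "s3", "s4"], ["s1", "s3", "s2", "s4"]))
```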
Qi, Q.; Hessen, D.J.; Heijden, P.G.M. van der: Improving information retrieval through correspondence analysis instead of latent semantic analysis (2023) 0.00
```
0.0042235977 = product of:
  0.021117989 = sum of:
    0.021117989 = product of:
      0.042235978 = sum of:
        0.042235978 = weight(_text_:data in 1045) [ClassicSimilarity], result of:
          0.042235978 = score(doc=1045,freq=4.0), product of:
            0.14247625 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.04505818 = queryNorm
            0.29644224 = fieldWeight in 1045, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=1045)
      0.5 = coord(1/2)
  0.2 = coord(1/5)
```
Abstract

The initial dimensions extracted by latent semantic analysis (LSA) of a document-term matrix have been shown to mainly display marginal effects, which are irrelevant for information retrieval. To improve the performance of LSA, usually the elements of the raw document-term matrix are weighted and the weighting exponent of singular values can be adjusted. An alternative information retrieval technique that ignores the marginal effects is correspondence analysis (CA). In this paper, the information retrieval performance of LSA and CA is empirically compared. Moreover, it is explored whether the two weightings also improve the performance of CA. The results for four empirical datasets show that CA always performs better than LSA. Weighting the elements of the raw data matrix can improve CA; however, it is data dependent and the improvement is small. Adjusting the singular value weighting exponent often improves the performance of CA; however, the extent of the improvement depends on the dataset and the number of dimensions.
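The contrast the abstract draws shows up in what each method decomposes. LSA applies the SVD to the (possibly weighted) document-term matrix itself. CA applies it to the matrix of standardized residuals S = D_r^(-1/2) (P - r c^T) D_c^(-1/2), where P is the matrix divided by its grand total and r, c are its row and column masses. Subtracting r c^T is what removes the marginal effects. Below is a minimal pure-Python sketch of that preprocessing step, assuming no empty rows or columns. It leaves out the SVD itself (for example via numpy.linalg.svd) and the weighting schemes compared in the paper.

```python
import math


def ca_standardized_residuals(matrix):
    """Standardized residuals of a nonnegative document-term count matrix (list of rows)."""
    # Assumption: no all-zero rows or columns (their masses would be zero).
    total = sum(sum(row) for row in matrix)
    P = [[x / total for x in row] for row in matrix]  # correspondence matrix
    r = [sum(row) for row in P]                        # row (document) masses
    c = [sum(col) for col in zip(*P)]                  # column (term) masses
    # (p_ij - r_i c_j) / sqrt(r_i c_j): deviation from the independence model r c^T.
    return [[(P[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(len(c))]
            for i in range(len(r))]


counts = [[10, 0],
          [0, 10]]
S = ca_standardized_residuals(counts)
print(S)
# Total inertia (sum of squared residuals) times the grand total is Pearson's chi-square.
print(20 * sum(x * x for row in S for x in row))
```

For a matrix whose rows are proportional, such as [[1, 2], [2, 4]], every residual is zero. In that case the matrix carries no document-term association beyond its margins, so CA has nothing left to decompose.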
