Search (109 results, page 1 of 6)

  • theme_ss:"Retrievalalgorithmen"
  1. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.28
    0.2764349 = product of:
      0.36857986 = sum of:
        0.15421765 = weight(_text_:vector in 1428) [ClassicSimilarity], result of:
          0.15421765 = score(doc=1428,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.5030775 = fieldWeight in 1428, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.14322878 = weight(_text_:space in 1428) [ClassicSimilarity], result of:
          0.14322878 = score(doc=1428,freq=8.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.5765547 = fieldWeight in 1428, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.07113342 = sum of:
          0.03888419 = weight(_text_:model in 1428) [ClassicSimilarity], result of:
            0.03888419 = score(doc=1428,freq=2.0), product of:
              0.1830527 = queryWeight, product of:
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.047605187 = queryNorm
              0.21242073 = fieldWeight in 1428, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1428)
          0.032249227 = weight(_text_:22 in 1428) [ClassicSimilarity], result of:
            0.032249227 = score(doc=1428,freq=2.0), product of:
              0.16670525 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.047605187 = queryNorm
              0.19345059 = fieldWeight in 1428, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1428)
      0.75 = coord(3/4)
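
    The breakdown above is Lucene's ClassicSimilarity (TF-IDF) explain output: each matching term contributes a fieldWeight (tf x idf x fieldNorm) times a queryWeight (idf x queryNorm), the contributions are summed, and the sum is scaled by the coordination factor. A minimal Python sketch that reproduces the top score purely from the factors listed above (every constant is copied from the explain output, nothing is recomputed from the collection):

      import math

      # Per-term factors copied from the explain output for doc 1428.
      query_norm = 0.047605187
      field_norm = 0.0390625
      terms = {
          "vector": {"freq": 4.0, "idf": 6.439392},
          "space":  {"freq": 8.0, "idf": 5.2183776},
          "model":  {"freq": 2.0, "idf": 3.845226},
          "22":     {"freq": 2.0, "idf": 3.5018296},
      }

      def term_weight(freq, idf):
          tf = math.sqrt(freq)                  # tf(freq) = sqrt(freq)
          query_weight = idf * query_norm       # idf * queryNorm
          field_weight = tf * idf * field_norm  # tf * idf * fieldNorm
          return query_weight * field_weight

      total = sum(term_weight(t["freq"], t["idf"]) for t in terms.values())
      score = total * 0.75                      # coord(3/4), taken from the explain output
      print(round(score, 7))                    # ~0.2764349, matching the listed score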
    
    Abstract
    Humans can make hasty, but generally robust judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high-dimensional semantic space, which is automatically constructed from a text corpus. Two approaches are presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
    Date
    22. 3.2003 19:35:46
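
    The mechanism sketched in the abstract of entry 1 builds word vectors from corpus co-occurrence and then re-weights them in context. The paper's concept-combination heuristic is considerably more elaborate; the following toy sketch (invented mini-corpus, with a simple dimension re-weighting rule standing in for the heuristic) only illustrates the general shape of the idea:

      from collections import defaultdict

      import numpy as np

      corpus = ["information flow in a conceptual space",          # toy corpus (invented)
                "vector representations of words in semantic space",
                "query models derived by information flow"]

      # Build a word-by-word co-occurrence space (window = whole sentence here).
      vocab = sorted({w for line in corpus for w in line.split()})
      index = {w: i for i, w in enumerate(vocab)}
      space = defaultdict(lambda: np.zeros(len(vocab)))
      for line in corpus:
          words = line.split()
          for w in words:
              for c in words:
                  if c != w:
                      space[w][index[c]] += 1.0

      def combine(head, modifier, boost=2.0):
          """Crude stand-in for a concept-combination heuristic:
          re-weight the head's dimensions that the modifier also activates."""
          v = space[head].copy()
          v[space[modifier] > 0] *= boost
          return v / np.linalg.norm(v)

      primed = combine("information", "flow")   # 'information' primed by the context 'flow'
      print(primed.round(2))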
  2. Efron, M.: Query expansion and dimensionality reduction : Notions of optimality in Rocchio relevance feedback and latent semantic indexing (2008) 0.26
    0.26025337 = product of:
      0.34700447 = sum of:
        0.18506117 = weight(_text_:vector in 2020) [ClassicSimilarity], result of:
          0.18506117 = score(doc=2020,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.603693 = fieldWeight in 2020, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=2020)
        0.12153365 = weight(_text_:space in 2020) [ClassicSimilarity], result of:
          0.12153365 = score(doc=2020,freq=4.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.48922288 = fieldWeight in 2020, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=2020)
        0.04040964 = product of:
          0.08081928 = sum of:
            0.08081928 = weight(_text_:model in 2020) [ClassicSimilarity], result of:
              0.08081928 = score(doc=2020,freq=6.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.44150823 = fieldWeight in 2020, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2020)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Rocchio relevance feedback and latent semantic indexing (LSI) are well-known extensions of the vector space model for information retrieval (IR). This paper analyzes the statistical relationship between these extensions. The analysis focuses on each method's basis in least-squares optimization. Noting that LSI and Rocchio relevance feedback both alter the vector space model in a way that is in some sense least-squares optimal, we ask: what is the relationship between LSI's and Rocchio's notions of optimality? What does this relationship imply for IR? Using an analytical approach, we argue that Rocchio relevance feedback is optimal if we understand retrieval as a simplified classification problem. On the other hand, LSI's motivation comes to the fore if we understand it as a biased regression technique, where projection onto a low-dimensional orthogonal subspace of the documents reduces model variance.
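
    For reference, both extensions discussed in entry 2 have compact textbook formulations: Rocchio moves the query vector toward the centroid of the relevant documents and away from the non-relevant ones, while LSI replaces the term-by-document matrix with a rank-k truncated SVD. A minimal numpy sketch of those standard forms (illustrative alpha/beta/gamma values and a toy matrix, not the paper's analysis):

      import numpy as np

      def rocchio(q, rel, nonrel, alpha=1.0, beta=0.75, gamma=0.15):
          """Rocchio feedback: shift the query toward relevant docs, away from non-relevant ones."""
          q_new = alpha * q
          if len(rel):
              q_new += beta * np.mean(rel, axis=0)
          if len(nonrel):
              q_new -= gamma * np.mean(nonrel, axis=0)
          return np.clip(q_new, 0, None)        # negative weights are usually dropped

      def lsi_project(A, k):
          """LSI: rank-k approximation of the term-by-document matrix A via truncated SVD."""
          U, s, Vt = np.linalg.svd(A, full_matrices=False)
          return U[:, :k], np.diag(s[:k]), Vt[:k, :]

      # Tiny illustrative term-by-document matrix (rows = terms, columns = documents).
      A = np.array([[2.0, 0, 1], [0, 1, 1], [1, 1, 0], [0, 2, 1]])
      U, S, Vt = lsi_project(A, k=2)
      q = np.array([1.0, 0, 1, 0])
      q_fb = rocchio(q, rel=A[:, [0]].T, nonrel=A[:, [1]].T)
      print(q_fb, (U @ S @ Vt).round(2))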
  3. Wong, S.K.M.: On modelling information retrieval with probabilistic inference (1995) 0.24
    0.24012579 = product of:
      0.32016772 = sum of:
        0.17447734 = weight(_text_:vector in 1938) [ClassicSimilarity], result of:
          0.17447734 = score(doc=1938,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.5691672 = fieldWeight in 1938, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0625 = fieldNorm(doc=1938)
        0.11458302 = weight(_text_:space in 1938) [ClassicSimilarity], result of:
          0.11458302 = score(doc=1938,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.46124378 = fieldWeight in 1938, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0625 = fieldNorm(doc=1938)
        0.031107351 = product of:
          0.062214702 = sum of:
            0.062214702 = weight(_text_:model in 1938) [ClassicSimilarity], result of:
              0.062214702 = score(doc=1938,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.33987316 = fieldWeight in 1938, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1938)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Examines the logical models of information retrieval in the context of probability theory and extends the application of these fundamental ideas to term weighting and relevance. Develops a unified framework for modelling the retrieval process with probabilistic inference to provide a common conceptual and mathematical basis for many retrieval models, such as Boolean, fuzzy sets, vector space, and conventional probabilistic models. Employs this framework to identify the underlying assumptions made by each model and analyzes the inherent relationships between them. Although the treatment is primarily theoretical, practical methods for estimating the required probabilities are provided through simple examples.
  4. López-Pujalte, C.; Guerrero-Bote, V.P.; Moya-Anegón, F. de: Genetic algorithms in relevance feedback : a second test and new contributions (2003) 0.22
    0.21856591 = product of:
      0.2914212 = sum of:
        0.15266767 = weight(_text_:vector in 1076) [ClassicSimilarity], result of:
          0.15266767 = score(doc=1076,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.4980213 = fieldWeight in 1076, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1076)
        0.100260146 = weight(_text_:space in 1076) [ClassicSimilarity], result of:
          0.100260146 = score(doc=1076,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.4035883 = fieldWeight in 1076, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1076)
        0.038493384 = product of:
          0.07698677 = sum of:
            0.07698677 = weight(_text_:model in 1076) [ClassicSimilarity], result of:
              0.07698677 = score(doc=1076,freq=4.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.4205716 = fieldWeight in 1076, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1076)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The present work is the continuation of an earlier study which reviewed the literature on relevance feedback genetic techniques that follow the vector space model (the model that is most commonly used in this type of application), and implemented them so that they could be compared with each other as well as with one of the best traditional methods of relevance feedback--the Ide dec-hi method. We here carry out the comparisons on more test collections (Cranfield, CISI, Medline, and NPL), using the residual collection method for their evaluation as is recommended in this type of technique. We also add some fitness functions of our own design.
  5. Savoy, J.; Ndarugendamwo, M.; Vrajitoru, D.: Report on the TREC-4 experiment : combining probabilistic and vector-space schemes (1996) 0.22
    0.21679527 = product of:
      0.43359053 = sum of:
        0.261716 = weight(_text_:vector in 7574) [ClassicSimilarity], result of:
          0.261716 = score(doc=7574,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.8537508 = fieldWeight in 7574, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.09375 = fieldNorm(doc=7574)
        0.17187454 = weight(_text_:space in 7574) [ClassicSimilarity], result of:
          0.17187454 = score(doc=7574,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.6918657 = fieldWeight in 7574, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.09375 = fieldNorm(doc=7574)
      0.5 = coord(2/4)
    
  6. Liu, X.; Turtle, H.: Real-time user interest modeling for real-time ranking (2013) 0.20
    0.19759221 = product of:
      0.26345628 = sum of:
        0.130858 = weight(_text_:vector in 1035) [ClassicSimilarity], result of:
          0.130858 = score(doc=1035,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.4268754 = fieldWeight in 1035, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=1035)
        0.08593727 = weight(_text_:space in 1035) [ClassicSimilarity], result of:
          0.08593727 = score(doc=1035,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.34593284 = fieldWeight in 1035, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=1035)
        0.046661027 = product of:
          0.09332205 = sum of:
            0.09332205 = weight(_text_:model in 1035) [ClassicSimilarity], result of:
              0.09332205 = score(doc=1035,freq=8.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.50980973 = fieldWeight in 1035, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1035)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    User interest as a very dynamic information need is often ignored in most existing information retrieval systems. In this research, we present the results of experiments designed to evaluate the performance of a real-time interest model (RIM) that attempts to identify the dynamic and changing query-level interests regarding social media outputs. Unlike most existing ranking methods, our ranking approach targets calculation of the probability of user interest in the content of the document, an interest that is subject to very dynamic change. We describe 2 formulations of the model (real-time interest vector space and real-time interest language model) stemming from classical relevance ranking methods and develop a novel methodology for evaluating the performance of RIM using Amazon Mechanical Turk to collect (interest-based) relevance judgments on a daily basis. Our results show that the model usually, although not always, performs better than baseline results obtained from commercial web search engines. We identify factors that affect RIM performance and outline plans for future research.
  7. Shah, B.; Raghavan, V.; Dhatric, P.; Zhao, X.: ¬A cluster-based approach for efficient content-based image retrieval using a similarity-preserving space transformation method (2006) 0.17
    0.1732832 = product of:
      0.3465664 = sum of:
        0.10904834 = weight(_text_:vector in 6118) [ClassicSimilarity], result of:
          0.10904834 = score(doc=6118,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.3557295 = fieldWeight in 6118, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6118)
        0.23751807 = weight(_text_:space in 6118) [ClassicSimilarity], result of:
          0.23751807 = score(doc=6118,freq=22.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.95610785 = fieldWeight in 6118, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6118)
      0.5 = coord(2/4)
    
    Abstract
    The techniques of clustering and space transformation have been successfully used in the past to solve a number of pattern recognition problems. In this article, the authors propose a new approach to content-based image retrieval (CBIR) that uses (a) a newly proposed similarity-preserving space transformation method to transform the original low-level image space into a high-level vector space that enables efficient query processing, and (b) a clustering scheme that further improves the efficiency of our retrieval system. This combination is unique and the resulting system provides synergistic advantages of using both clustering and space transformation. The proposed space transformation method is shown to preserve the order of the distances in the transformed feature space. This strategy makes this approach to retrieval generic, as it can be applied to object types other than images and to feature spaces more general than metric spaces. The CBIR approach uses the inexpensive "estimated" distance in the transformed space, as opposed to the computationally inefficient "real" distance in the original space, to retrieve the desired results for a given query image. The authors also provide a theoretical analysis of the complexity of their CBIR approach when used for color-based retrieval, which shows that it is computationally more efficient than other comparable approaches. An extensive set of experiments to test the efficiency and effectiveness of the proposed approach has been performed. The results show that the approach offers superior response time (improvement of 1-2 orders of magnitude compared to retrieval approaches that either use pruning techniques like indexing, clustering, etc., or space transformation, but not both) with sufficiently high retrieval accuracy.
  8. Lopez-Pujalte, C.; Guerrero Bote, V.P.; Moya-Anegón, F. de: Evaluation of the application of genetic algorithms to relevance feedback (2003) 0.16
    0.15611848 = product of:
      0.20815799 = sum of:
        0.10904834 = weight(_text_:vector in 2756) [ClassicSimilarity], result of:
          0.10904834 = score(doc=2756,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.3557295 = fieldWeight in 2756, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2756)
        0.07161439 = weight(_text_:space in 2756) [ClassicSimilarity], result of:
          0.07161439 = score(doc=2756,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.28827736 = fieldWeight in 2756, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2756)
        0.027495272 = product of:
          0.054990545 = sum of:
            0.054990545 = weight(_text_:model in 2756) [ClassicSimilarity], result of:
              0.054990545 = score(doc=2756,freq=4.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.30040827 = fieldWeight in 2756, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2756)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    We evaluated the different genetic algorithms applied to relevance feedback that are to be found in the literature and which follow the vector space model (the most commonly used model in this type of application). They were compared with a traditional relevance feedback algorithm - the Ide dec-hi method - since this had given the best results in the study of Salton & Buckley (1990) on this subject. The experiment was performed on the Cranfield collection, and the different algorithms were evaluated using the residual collection method (one of the most suitable methods for evaluating relevance feedback techniques). The results varied greatly depending on the fitness function that was used, from no improvement in some of the genetic algorithms, to a more than 127% improvement with one algorithm, surpassing even the traditional Ide dec-hi method. One can therefore conclude that genetic algorithms show great promise as an aid to implementing a truly effective information retrieval system.
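
    The Ide dec-hi baseline referred to in entries 4 and 8 is a simple additive feedback rule: add every judged-relevant document vector to the query and subtract only the highest-ranked non-relevant one. A minimal sketch under the vector space model (illustrative vectors, not the Cranfield data):

      import numpy as np

      def ide_dec_hi(q, relevant, nonrelevant_top):
          """Ide dec-hi: q' = q + sum(relevant docs) - highest-ranked non-relevant doc."""
          q_new = q + np.sum(relevant, axis=0)
          if nonrelevant_top is not None:
              q_new -= nonrelevant_top
          return np.clip(q_new, 0, None)        # keep the reformulated query non-negative

      q = np.array([1.0, 0.0, 1.0, 0.0])
      rel = np.array([[0.5, 0.2, 1.0, 0.0],
                      [0.8, 0.0, 0.6, 0.1]])       # documents judged relevant
      top_nonrel = np.array([0.0, 0.9, 0.1, 0.7])  # top-ranked non-relevant document
      print(ide_dec_hi(q, rel, top_nonrel))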
  9. Chen, Z.; Meng, X.; Fowler, R.H.; Zhu, B.: Real-time adaptive feature and document learning for Web search (2001) 0.15
    0.15007861 = product of:
      0.20010482 = sum of:
        0.10904834 = weight(_text_:vector in 5209) [ClassicSimilarity], result of:
          0.10904834 = score(doc=5209,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.3557295 = fieldWeight in 5209, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5209)
        0.07161439 = weight(_text_:space in 5209) [ClassicSimilarity], result of:
          0.07161439 = score(doc=5209,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.28827736 = fieldWeight in 5209, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5209)
        0.019442094 = product of:
          0.03888419 = sum of:
            0.03888419 = weight(_text_:model in 5209) [ClassicSimilarity], result of:
              0.03888419 = score(doc=5209,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.21242073 = fieldWeight in 5209, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5209)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Chen et al. report on the design of FEATURES, a web search engine with adaptive features based on minimal relevance feedback. Rather than developing user profiles from previous searcher activity either at the server or client location, or updating indexes after search completion, FEATURES allows for index and user characterization files to be updated during query modification on retrieval from a general-purpose search engine. Indexing terms relevant to a query are defined as the union of all terms assigned to documents retrieved by the initial search run and are used to build a vector space model on this retrieved set. The top ten weighted terms are presented to the user for a relevant/non-relevant choice, which is used to modify the term weights. Documents are chosen if their summed term weights are greater than some threshold. A user evaluation of the top ten ranked documents as non-relevant will decrease these term weights and a positive judgement will increase them. A new ordering of the retrieved set will generate new display lists of terms and documents. Precision is improved in a test on Alta Vista searches.
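
    Entry 9 describes a loop in which the top weighted terms of the retrieved set are shown to the user, their weights are raised or lowered according to the user's relevant/non-relevant choices, and documents are kept when their summed term weights exceed a threshold. A rough sketch of that loop follows; the adjustment step size and the threshold are invented here, since the abstract does not specify them:

      def adjust_weights(term_weights, judgements, step=0.2):
          """Raise weights of terms judged relevant, lower those judged non-relevant."""
          for term, relevant in judgements.items():
              term_weights[term] = max(0.0, term_weights[term] + (step if relevant else -step))
          return term_weights

      def filter_documents(docs, term_weights, threshold=1.0):
          """Keep documents whose summed weights of matching terms exceed the threshold."""
          scored = {d: sum(term_weights.get(t, 0.0) for t in terms) for d, terms in docs.items()}
          return sorted((d for d, s in scored.items() if s > threshold),
                        key=lambda d: scored[d], reverse=True)

      weights = {"retrieval": 0.9, "feedback": 0.7, "vector": 0.5, "web": 0.3}
      docs = {"d1": {"retrieval", "vector"}, "d2": {"web"}, "d3": {"retrieval", "feedback"}}
      weights = adjust_weights(weights, {"vector": True, "web": False})
      print(filter_documents(docs, weights))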
  10. Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (2005) 0.13
    0.12774989 = product of:
      0.25549978 = sum of:
        0.17447734 = weight(_text_:vector in 7) [ClassicSimilarity], result of:
          0.17447734 = score(doc=7,freq=8.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.5691672 = fieldWeight in 7, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.03125 = fieldNorm(doc=7)
        0.081022434 = weight(_text_:space in 7) [ClassicSimilarity], result of:
          0.081022434 = score(doc=7,freq=4.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.3261486 = fieldWeight in 7, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.03125 = fieldNorm(doc=7)
      0.5 = coord(2/4)
    
    Content
    Contents: Introduction Document File Preparation - Manual Indexing - Information Extraction - Vector Space Modeling - Matrix Decompositions - Query Representations - Ranking and Relevance Feedback - Searching by Link Structure - User Interface - Book Format Document File Preparation Document Purification and Analysis - Text Formatting - Validation - Manual Indexing - Automatic Indexing - Item Normalization - Inverted File Structures - Document File - Dictionary List - Inversion List - Other File Structures Vector Space Models Construction - Term-by-Document Matrices - Simple Query Matching - Design Issues - Term Weighting - Sparse Matrix Storage - Low-Rank Approximations Matrix Decompositions QR Factorization - Singular Value Decomposition - Low-Rank Approximations - Query Matching - Software - Semidiscrete Decomposition - Updating Techniques Query Management Query Binding - Types of Queries - Boolean Queries - Natural Language Queries - Thesaurus Queries - Fuzzy Queries - Term Searches - Probabilistic Queries Ranking and Relevance Feedback Performance Evaluation - Precision - Recall - Average Precision - Genetic Algorithms - Relevance Feedback Searching by Link Structure HITS Method - HITS Implementation - HITS Summary - PageRank Method - PageRank Adjustments - PageRank Implementation - PageRank Summary User Interface Considerations General Guidelines - Search Engine Interfaces - Form Fill-in - Display Considerations - Progress Indication - No Penalties for Error - Results - Test and Retest - Final Considerations Further Reading
    LCSH
    Vector spaces
    Subject
    Vector spaces
  11. Chen, Z.; Fu, B.: On the complexity of Rocchio's similarity-based relevance feedback algorithm (2007) 0.13
    0.12774785 = product of:
      0.2554957 = sum of:
        0.15421765 = weight(_text_:vector in 578) [ClassicSimilarity], result of:
          0.15421765 = score(doc=578,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.5030775 = fieldWeight in 578, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0390625 = fieldNorm(doc=578)
        0.101278044 = weight(_text_:space in 578) [ClassicSimilarity], result of:
          0.101278044 = score(doc=578,freq=4.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.40768576 = fieldWeight in 578, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=578)
      0.5 = coord(2/4)
    
    Abstract
    Rocchio's similarity-based relevance feedback algorithm, one of the most important query reformation methods in information retrieval, is essentially an adaptive learning algorithm from examples in searching for documents represented by a linear classifier. Despite its popularity in various applications, there is little rigorous analysis of its learning complexity in the literature. In this article, the authors prove for the first time that the learning complexity of Rocchio's algorithm is O(d + d^2(log d + log n)) over the discretized vector space {0, ..., n-1}^d when the inner product similarity measure is used. The upper bound on the learning complexity for searching for documents represented by a monotone linear classifier (q, 0) over {0, ..., n-1}^d can be improved to, at most, 1 + 2k(n - 1)(log d + log(n - 1)), where k is the number of nonzero components in q. Several lower bounds on the learning complexity are also obtained for Rocchio's algorithm. For example, the authors prove that Rocchio's algorithm has a lower bound Omega((d choose 2) log n) on its learning complexity over the Boolean vector space {0,1}^d.
  12. Wilbur, W.J.: ¬A retrieval system based on automatic relevance weighting of search terms (1992) 0.12
    0.11834602 = product of:
      0.23669204 = sum of:
        0.17447734 = weight(_text_:vector in 5269) [ClassicSimilarity], result of:
          0.17447734 = score(doc=5269,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.5691672 = fieldWeight in 5269, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0625 = fieldNorm(doc=5269)
        0.062214702 = product of:
          0.124429405 = sum of:
            0.124429405 = weight(_text_:model in 5269) [ClassicSimilarity], result of:
              0.124429405 = score(doc=5269,freq=8.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.6797463 = fieldWeight in 5269, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5269)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Describes the development of a retrieval system based on automatic relevance weighting of search terms and founded on the Bayesian formulation of the probability of relevance as a function of term occurrence, where the contribution from individual terms is assumed to be independent. The relevance pair (RP) model and the vector cosine (VC) model were compared, and in the test environment improved retrieval was obtained with the RP model when compared with the VC model.
  13. Computational information retrieval (2001) 0.11
    0.10839763 = product of:
      0.21679527 = sum of:
        0.130858 = weight(_text_:vector in 4167) [ClassicSimilarity], result of:
          0.130858 = score(doc=4167,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.4268754 = fieldWeight in 4167, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=4167)
        0.08593727 = weight(_text_:space in 4167) [ClassicSimilarity], result of:
          0.08593727 = score(doc=4167,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.34593284 = fieldWeight in 4167, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=4167)
      0.5 = coord(2/4)
    
    Abstract
    This volume contains selected papers that focus on the use of linear algebra, computational statistics, and computer science in the development of algorithms and software systems for text retrieval. Experts in information modeling and retrieval share their perspectives on the design of scalable but precise text retrieval systems, revealing many of the challenges and obstacles that mathematical and statistical models must overcome to be viable for automated text processing. This very useful proceedings is an excellent companion for courses in information retrieval, applied linear algebra, and applied statistics. Computational Information Retrieval provides background material on vector space models for text retrieval that applied mathematicians, statisticians, and computer scientists may not be familiar with. For graduate students in these areas, several research questions in information modeling are exposed. In addition, several case studies concerning the efficacy of the popular Latent Semantic Analysis (or Indexing) approach are provided.
  14. Cross-language information retrieval (1998) 0.09
    0.089183114 = product of:
      0.11891082 = sum of:
        0.05452417 = weight(_text_:vector in 6299) [ClassicSimilarity], result of:
          0.05452417 = score(doc=6299,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.17786475 = fieldWeight in 6299, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.01953125 = fieldNorm(doc=6299)
        0.050639022 = weight(_text_:space in 6299) [ClassicSimilarity], result of:
          0.050639022 = score(doc=6299,freq=4.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.20384288 = fieldWeight in 6299, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.01953125 = fieldNorm(doc=6299)
        0.013747636 = product of:
          0.027495272 = sum of:
            0.027495272 = weight(_text_:model in 6299) [ClassicSimilarity], result of:
              0.027495272 = score(doc=6299,freq=4.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.15020414 = fieldWeight in 6299, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.01953125 = fieldNorm(doc=6299)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Content
    Contains the contributions: GREFENSTETTE, G.: The Problem of Cross-Language Information Retrieval; DAVIS, M.W.: On the Effective Use of Large Parallel Corpora in Cross-Language Text Retrieval; BALLESTEROS, L. and W.B. CROFT: Statistical Methods for Cross-Language Information Retrieval; Distributed Cross-Lingual Information Retrieval; Automatic Cross-Language Information Retrieval Using Latent Semantic Indexing; EVANS, D.A. et al.: Mapping Vocabularies Using Latent Semantics; PICCHI, E. and C. PETERS: Cross-Language Information Retrieval: A System for Comparable Corpus Querying; YAMABANA, K. et al.: A Language Conversion Front-End for Cross-Language Information Retrieval; GACHOT, D.A. et al.: The Systran NLP Browser: An Application of Machine Translation Technology in Cross-Language Information Retrieval; HULL, D.: A Weighted Boolean Model for Cross-Language Text Retrieval; SHERIDAN, P. et al.: Building a Large Multilingual Test Collection from Comparable News Documents; OARD, D.W. and B.J. DORR: Evaluating Cross-Language Text Filtering Effectiveness
    Footnote
    Christian Fluhr et al. (DIST/SMTI, France) outline the EMIR (European Multilingual Information Retrieval) and ESPRIT projects. They found that using SYSTRAN to machine translate queries and to access material from various multilingual databases produced less relevant results than a method referred to as 'multilingual reformulation' (the mechanics of which are only hinted at). An interesting technique is Latent Semantic Indexing (LSI), described by Michael Littman et al. (Brown University) and, most clearly, by David Evans et al. (Carnegie Mellon University). LSI involves creating matrices of documents and the terms they contain and 'fitting' related documents into a reduced matrix space. This effectively allows queries to be mapped onto a common semantic representation of the documents. Eugenio Picchi and Carol Peters (Pisa) report on a procedure to create links between translation equivalents in an Italian-English parallel corpus. The links are used to construct parallel linguistic contexts in real-time for any term or combination of terms that is being searched for in either language. Their interest is primarily lexicographic but they plan to apply the same procedure to comparable corpora, i.e. to texts which are not translations of each other but which share the same domain. Kiyoshi Yamabana et al. (NEC, Japan) address the issue of how to disambiguate between alternative translations of query terms. Their DMAX (double maximise) method looks at co-occurrence frequencies between both source language words and target language words in order to arrive at the most probable translation. The statistical data for the decision are derived, not from the translation texts but independently from monolingual corpora in each language. An interactive user interface allows the user to influence the selection of terms during the matching process. Denis Gachot et al. (SYSTRAN) describe the SYSTRAN NLP browser, a prototype tool which collects parsing information derived from a text or corpus previously translated with SYSTRAN. The user enters queries into the browser in either a structured or free form and receives grammatical and lexical information about the source text and/or its translation.
    The retrieved output from a query including the phrase 'big rockets' may be, for instance, a sentence containing 'giant rocket' which is semantically ranked above 'military rocket'. David Hull (Xerox Research Centre, Grenoble) describes an implementation of a weighted Boolean model for Spanish-English CLIR. Users construct Boolean-type queries, weighting each term in the query, which is then translated by an on-line dictionary before being applied to the database. Comparisons with the performance of unweighted free-form queries ('vector space' models) proved encouraging. Two contributions consider the evaluation of CLIR systems. In order to by-pass the time-consuming and expensive process of assembling a standard collection of documents and of user queries against which the performance of a CLIR system is manually assessed, Páraic Sheridan et al. (ETH Zurich) propose a method based on retrieving 'seed documents'. This involves identifying a unique document in a database (the 'seed document') and, for a number of queries, measuring how fast it is retrieved. The authors have also assembled a large database of multilingual news documents for testing purposes. By storing the (fairly short) documents in a structured form tagged with descriptor codes (e.g. for topic, country and area), the test suite is easily expanded while remaining consistent for the purposes of testing. Douglas Oard and Bonnie Dorr (University of Maryland) describe an evaluation methodology which appears to apply LSI techniques in order to filter and rank incoming documents designed for testing CLIR systems. The volume provides the reader with an excellent overview of several projects in CLIR. It is well supported with references and is intended as a secondary text for researchers and practitioners. It highlights the need for a good, general tutorial introduction to the field.
  15. Ding, Y.: Topic-based PageRank on author cocitation networks (2011) 0.09
    0.08563382 = product of:
      0.17126764 = sum of:
        0.130858 = weight(_text_:vector in 4348) [ClassicSimilarity], result of:
          0.130858 = score(doc=4348,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.4268754 = fieldWeight in 4348, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=4348)
        0.04040964 = product of:
          0.08081928 = sum of:
            0.08081928 = weight(_text_:model in 4348) [ClassicSimilarity], result of:
              0.08081928 = score(doc=4348,freq=6.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.44150823 = fieldWeight in 4348, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4348)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Ranking authors is vital for identifying a researcher's impact and standing within a scientific field. There are many different ranking methods (e.g., citations, publications, h-index, PageRank, and weighted PageRank), but most of them are topic-independent. This paper proposes topic-dependent ranks based on the combination of a topic model and a weighted PageRank algorithm. The author-conference-topic (ACT) model was used to extract topic distribution of individual authors. Two ways for combining the ACT model with the PageRank algorithm are proposed: simple combination (I_PR) or using a topic distribution as a weighted vector for PageRank (PR_t). Information retrieval was chosen as the test field and representative authors for different topics at different time phases were identified. Principal component analysis (PCA) was applied to analyze the ranking difference between I_PR and PR_t.
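
    Entry 15 combines per-author topic weights with PageRank; using a topic distribution as the teleportation (personalization) vector is the usual way to make PageRank topic-dependent, and is presumably close in spirit to the paper's PR_t variant. A minimal power-iteration sketch along those lines (toy cocitation matrix and topic weights, not the ACT model itself):

      import numpy as np

      def topic_pagerank(A, topic_weights, alpha=0.85, iters=100):
          """Power iteration with a topic distribution as the teleportation vector."""
          P = A / A.sum(axis=1, keepdims=True)  # row-stochastic transitions (no dangling rows in this toy matrix)
          t = topic_weights / topic_weights.sum()
          r = np.full(len(A), 1.0 / len(A))
          for _ in range(iters):
              r = alpha * (r @ P) + (1 - alpha) * t   # teleport according to the topic distribution
          return r

      A = np.array([[0, 1, 1, 0],               # toy author cocitation counts
                    [1, 0, 1, 1],
                    [1, 1, 0, 0],
                    [0, 1, 0, 0]], dtype=float)
      topic = np.array([0.6, 0.2, 0.1, 0.1])    # per-author weight for one topic
      print(topic_pagerank(A, topic).round(3))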
  16. Zhang, W.; Korf, R.E.: Performance of linear-space search algorithms (1995) 0.06
    0.062019885 = product of:
      0.24807954 = sum of:
        0.24807954 = weight(_text_:space in 4744) [ClassicSimilarity], result of:
          0.24807954 = score(doc=4744,freq=6.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.9986221 = fieldWeight in 4744, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.078125 = fieldNorm(doc=4744)
      0.25 = coord(1/4)
    
    Abstract
    Search algorithms in artificial intelligence systems that use space linear in the search depth are employed in practice to solve difficult problems optimally, such as planning and scheduling. Studies the average-case performance of linear-space search algorithms, including depth-first branch-and-bound, iterative-deepening, and recursive best-first search
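
    The algorithms studied in entry 16 all keep memory linear in the search depth; iterative deepening is the simplest of them, re-running a depth-limited depth-first search with a growing limit. A small generic sketch on a toy graph (not tied to the paper's experiments):

      def depth_limited(node, goal, limit, neighbours):
          """Depth-first search that refuses to descend below the given limit."""
          if node == goal:
              return [node]
          if limit == 0:
              return None
          for child in neighbours(node):
              path = depth_limited(child, goal, limit - 1, neighbours)
              if path is not None:
                  return [node] + path
          return None

      def iterative_deepening(start, goal, neighbours, max_depth=20):
          """Run depth-limited searches with growing limits; space stays O(depth)."""
          for limit in range(max_depth + 1):
              path = depth_limited(start, goal, limit, neighbours)
              if path is not None:
                  return path
          return None

      graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": ["f"], "f": []}
      print(iterative_deepening("a", "f", lambda n: graph[n]))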
  17. Kulyukin, V.A.; Settle, A.: Ranked retrieval with semantic networks and vector spaces (2001) 0.06
    0.061687056 = product of:
      0.24674822 = sum of:
        0.24674822 = weight(_text_:vector in 6934) [ClassicSimilarity], result of:
          0.24674822 = score(doc=6934,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.804924 = fieldWeight in 6934, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0625 = fieldNorm(doc=6934)
      0.25 = coord(1/4)
    
    Abstract
    The equivalence of semantic networks with spreading activation and vector spaces with dot product is investigated under ranked retrieval. Semantic networks are viewed as networks of concepts organized in terms of abstraction and packaging relations. It is shown that the two models can be effectively constructed from each other. A formal method is suggested to analyze the models in terms of their relative performance in the same universe of objects
  18. Langville, A.N.; Meyer, C.D.: Google's PageRank and beyond : the science of search engine rankings (2006) 0.05
    0.054513875 = product of:
      0.10902775 = sum of:
        0.092530586 = weight(_text_:vector in 6) [ClassicSimilarity], result of:
          0.092530586 = score(doc=6,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.3018465 = fieldWeight in 6, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0234375 = fieldNorm(doc=6)
        0.016497165 = product of:
          0.03299433 = sum of:
            0.03299433 = weight(_text_:model in 6) [ClassicSimilarity], result of:
              0.03299433 = score(doc=6,freq=4.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.18024497 = fieldWeight in 6, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=6)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Content
    Contents: Chapter 1. Introduction to Web Search Engines: 1.1 A Short History of Information Retrieval - 1.2 An Overview of Traditional Information Retrieval - 1.3 Web Information Retrieval Chapter 2. Crawling, Indexing, and Query Processing: 2.1 Crawling - 2.2 The Content Index - 2.3 Query Processing Chapter 3. Ranking Webpages by Popularity: 3.1 The Scene in 1998 - 3.2 Two Theses - 3.3 Query-Independence Chapter 4. The Mathematics of Google's PageRank: 4.1 The Original Summation Formula for PageRank - 4.2 Matrix Representation of the Summation Equations - 4.3 Problems with the Iterative Process - 4.4 A Little Markov Chain Theory - 4.5 Early Adjustments to the Basic Model - 4.6 Computation of the PageRank Vector - 4.7 Theorem and Proof for Spectrum of the Google Matrix Chapter 5. Parameters in the PageRank Model: 5.1 The alpha Factor - 5.2 The Hyperlink Matrix H - 5.3 The Teleportation Matrix E Chapter 6. The Sensitivity of PageRank: 6.1 Sensitivity with respect to alpha - 6.2 Sensitivity with respect to H - 6.3 Sensitivity with respect to vT - 6.4 Other Analyses of Sensitivity - 6.5 Sensitivity Theorems and Proofs Chapter 7. The PageRank Problem as a Linear System: 7.1 Properties of (I - alphaS) - 7.2 Properties of (I - alphaH) - 7.3 Proof of the PageRank Sparse Linear System Chapter 8. Issues in Large-Scale Implementation of PageRank: 8.1 Storage Issues - 8.2 Convergence Criterion - 8.3 Accuracy - 8.4 Dangling Nodes - 8.5 Back Button Modeling
    Chapter 9. Accelerating the Computation of PageRank: 9.1 An Adaptive Power Method - 9.2 Extrapolation - 9.3 Aggregation - 9.4 Other Numerical Methods Chapter 10. Updating the PageRank Vector: 10.1 The Two Updating Problems and their History - 10.2 Restarting the Power Method - 10.3 Approximate Updating Using Approximate Aggregation - 10.4 Exact Aggregation - 10.5 Exact vs. Approximate Aggregation - 10.6 Updating with Iterative Aggregation - 10.7 Determining the Partition - 10.8 Conclusions Chapter 11. The HITS Method for Ranking Webpages: 11.1 The HITS Algorithm - 11.2 HITS Implementation - 11.3 HITS Convergence - 11.4 HITS Example - 11.5 Strengths and Weaknesses of HITS - 11.6 HITS's Relationship to Bibliometrics - 11.7 Query-Independent HITS - 11.8 Accelerating HITS - 11.9 HITS Sensitivity Chapter 12. Other Link Methods for Ranking Webpages: 12.1 SALSA - 12.2 Hybrid Ranking Methods - 12.3 Rankings based on Traffic Flow Chapter 13. The Future of Web Information Retrieval: 13.1 Spam - 13.2 Personalization - 13.3 Clustering - 13.4 Intelligent Agents - 13.5 Trends and Time-Sensitive Search - 13.6 Privacy and Censorship - 13.7 Library Classification Schemes - 13.8 Data Fusion Chapter 14. Resources for Web Information Retrieval: 14.1 Resources for Getting Started - 14.2 Resources for Serious Study Chapter 15. The Mathematics Guide: 15.1 Linear Algebra - 15.2 Perron-Frobenius Theory - 15.3 Markov Chains - 15.4 Perron Complementation - 15.5 Stochastic Complementation - 15.6 Censoring - 15.7 Aggregation - 15.8 Disaggregation
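
    Chapter 11 of the book in entry 18 covers the HITS method, in which authority scores are recomputed from the hubs that link to a page and hub scores from the authorities it links to, with normalization after each round. A compact sketch of that iteration on a toy link matrix:

      import numpy as np

      def hits(adj, iters=50):
          """HITS: authorities = L^T hubs, hubs = L authorities, normalized each iteration."""
          n = len(adj)
          hubs = np.ones(n)
          auths = np.ones(n)
          for _ in range(iters):
              auths = adj.T @ hubs
              auths /= np.linalg.norm(auths)
              hubs = adj @ auths
              hubs /= np.linalg.norm(hubs)
          return hubs, auths

      L = np.array([[0, 1, 1, 0],               # toy link matrix: L[i, j] = 1 if page i links to j
                    [0, 0, 1, 0],
                    [1, 0, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
      h, a = hits(L)
      print(h.round(3), a.round(3))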
  19. Liddy, E.D.: ¬An alternative representation for documents and queries (1993) 0.05
    0.046265293 = product of:
      0.18506117 = sum of:
        0.18506117 = weight(_text_:vector in 7813) [ClassicSimilarity], result of:
          0.18506117 = score(doc=7813,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.603693 = fieldWeight in 7813, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=7813)
      0.25 = coord(1/4)
    
    Abstract
    Describes an alternative to the 2 most common methods of representing documents and queries in information retrieval systems: free-text, natural language representation and controlled language representation. The alternative method combines the advantages of both traditional approaches and overcomes the difficulties associated with each. The scheme was developed for use with Longman's Dictionary of Contemporary English and uses a computerized version of the dictionary for the automatic generation of summary-level semantic representations of each document and query. The system tags each word in a document with the appropriate Subject Field Code (SFC) from the dictionary and the SFCs are summed and normalized to produce a weighted, fixed-length vector of SFCs. The search system matches the query SFC vector to the document SFC vectors in the database. The documents are then ranked on the basis of their vector's similarity to the query.
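
    Entry 19's representation boils down to looking up a Subject Field Code for each word, summing the codes into a fixed-length vector, normalizing it, and matching query and document vectors by similarity. A minimal sketch with an invented mini-dictionary (the real scheme uses the Longman code set plus sense disambiguation, which is omitted here):

      import numpy as np

      SFC = ["economics", "medicine", "computing"]            # invented code set
      DICT = {"bank": "economics", "loan": "economics",       # invented word -> SFC mapping
              "virus": "medicine", "patient": "medicine",
              "query": "computing", "index": "computing"}

      def sfc_vector(text):
          """Sum the SFCs of the words we can tag, then normalize to unit length."""
          v = np.zeros(len(SFC))
          for w in text.lower().split():
              if w in DICT:
                  v[SFC.index(DICT[w])] += 1.0
          n = np.linalg.norm(v)
          return v / n if n else v

      docs = {"d1": "the bank approved the loan", "d2": "the patient had a virus"}
      q = sfc_vector("bank loan query")
      ranked = sorted(docs, key=lambda d: float(q @ sfc_vector(docs[d])), reverse=True)
      print(ranked)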
  20. Drucker, H.; Shahrary, B.; Gibbon, D.C.: Support vector machines : relevance feedback and information retrieval (2002) 0.05
    0.046265293 = product of:
      0.18506117 = sum of:
        0.18506117 = weight(_text_:vector in 2581) [ClassicSimilarity], result of:
          0.18506117 = score(doc=2581,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.603693 = fieldWeight in 2581, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=2581)
      0.25 = coord(1/4)
    
    Abstract
    We compare support vector machines (SVMs) to Rocchio, Ide regular and Ide dec-hi algorithms in information retrieval (IR) of text documents using relevancy feedback. It is assumed a preliminary search finds a set of documents that the user marks as relevant or not and then feedback iterations commence. Particular attention is paid to IR searches where the number of relevant documents in the database is low and the preliminary set of documents used to start the search has few relevant documents. Experiments show that if inverse document frequency (IDF) weighting is not used because one is unwilling to pay the time penalty needed to obtain these features, then SVMs are better whether using term-frequency (TF) or binary weighting. SVM performance is marginally better than Ide dec-hi if TF-IDF weighting is used and there is a reasonable number of relevant documents found in the preliminary search. If the preliminary search is so poor that one has to search through many documents to find at least one relevant document, then SVM is preferred.
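
    Entry 20 treats relevance feedback as training a classifier on the judged documents and re-ranking the collection by its decision value. A minimal sketch with a linear SVM on term-frequency vectors (scikit-learn assumed to be available; the toy documents and judgements are invented):

      import numpy as np
      from sklearn.svm import LinearSVC

      # Term-frequency vectors for five toy documents (rows = documents, columns = terms).
      X = np.array([[3, 0, 1, 0],
                    [2, 1, 0, 0],
                    [0, 2, 0, 3],
                    [0, 3, 1, 2],
                    [1, 0, 2, 0]], dtype=float)
      judged = [0, 1, 2]                 # documents the user has already seen
      y = [1, 1, 0]                      # 1 = relevant, 0 = non-relevant

      clf = LinearSVC()                  # linear SVM trained on the judged documents
      clf.fit(X[judged], y)

      scores = clf.decision_function(X)  # re-rank the whole collection by SVM margin
      ranking = np.argsort(-scores)
      print(ranking)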

Languages

  • e 104
  • d 4

Types

  • a 100
  • m 7
  • el 2
  • s 2
  • r 1