Search (6445 results, page 1 of 323)

  1. Li, D.; Kwong, C.-P.; Lee, D.L.: Unified linear subspace approach to semantic analysis (2009) 0.35
    0.3499609 = product of:
      0.4666145 = sum of:
        0.21809667 = weight(_text_:vector in 3321) [ClassicSimilarity], result of:
          0.21809667 = score(doc=3321,freq=8.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.711459 = fieldWeight in 3321, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3321)
        0.21484317 = weight(_text_:space in 3321) [ClassicSimilarity], result of:
          0.21484317 = score(doc=3321,freq=18.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.86483204 = fieldWeight in 3321, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3321)
        0.0336747 = product of:
          0.0673494 = sum of:
            0.0673494 = weight(_text_:model in 3321) [ClassicSimilarity], result of:
              0.0673494 = score(doc=3321,freq=6.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.36792353 = fieldWeight in 3321, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3321)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The Basic Vector Space Model (BVSM) is well known in information retrieval. Unfortunately, its retrieval effectiveness is limited because it is based on literal term matching. The Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) are two prominent semantic retrieval methods, both of which assume there is some underlying latent semantic structure in a dataset that can be used to improve retrieval performance. However, while this structure may be derived from both the term space and the document space, GVSM exploits only the former and LSI the latter. In this article, the latent semantic structure of a dataset is examined from a dual perspective; namely, we consider the term space and the document space simultaneously. This new viewpoint has a natural connection to the notion of kernels. Specifically, a unified kernel function can be derived for a class of vector space models. The dual perspective provides a deeper understanding of the semantic space and makes transparent the geometrical meaning of the unified kernel function. New semantic analysis methods based on the unified kernel function are developed, which combine the advantages of LSI and GVSM. We also prove that the new methods are stable: even if the selected rank of the truncated Singular Value Decomposition (SVD) is far from the optimum, retrieval performance will not be degraded significantly. Experiments performed on standard test collections show that our methods are promising.
    Object
    Generalized Vector Space Model
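The score breakdown in entry 1 follows Lucene's ClassicSimilarity (tf-idf) formula: each term contributes queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = √tf × idf × fieldNorm, and coord() scales a (sub)query's sum by the fraction of its clauses that matched. A minimal sketch that reproduces entry 1's score from the figures shown above (the function name is ours, not Lucene's API):

```python
import math

def term_score(freq, idf, field_norm, query_norm):
    """One term's contribution under Lucene ClassicSimilarity:
    queryWeight * fieldWeight = (idf * queryNorm) * (sqrt(tf) * idf * fieldNorm)."""
    query_weight = idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

# Figures taken from the explanation tree for doc 3321 above.
qn, fn = 0.047605187, 0.0390625
vector = term_score(8.0,  6.439392,  fn, qn)        # ~0.21809667
space  = term_score(18.0, 5.2183776, fn, qn)        # ~0.21484317
model  = term_score(6.0,  3.845226,  fn, qn) * 0.5  # nested coord(1/2)
score  = (vector + space + model) * 0.75            # top-level coord(3/4)
print(f"{score:.7f}")                               # ~0.3499609, as listed
```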
  2. Taghva, K.; Borsack, J.; Condit, A.: Effects of OCR errors on ranking and feedback using the vector space model (1996) 0.34
    0.33958915 = product of:
      0.45278552 = sum of:
        0.24674822 = weight(_text_:vector in 4951) [ClassicSimilarity], result of:
          0.24674822 = score(doc=4951,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.804924 = fieldWeight in 4951, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0625 = fieldNorm(doc=4951)
        0.16204487 = weight(_text_:space in 4951) [ClassicSimilarity], result of:
          0.16204487 = score(doc=4951,freq=4.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.6522972 = fieldWeight in 4951, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0625 = fieldNorm(doc=4951)
        0.043992437 = product of:
          0.087984875 = sum of:
            0.087984875 = weight(_text_:model in 4951) [ClassicSimilarity], result of:
              0.087984875 = score(doc=4951,freq=4.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.48065326 = fieldWeight in 4951, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4951)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Reports on the performance of the vector space model in the presence of optical character recognition (OCR) errors. Average precision and recall are not affected for full-text document rankings of the OCR and corrected collections with different weighting combinations. Cosine normalization plays a considerable role in the disparity seen between the collections. Even though feedback improves retrieval for both collections, it cannot be used to compensate for OCR errors caused by badly degraded documents.
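Cosine normalization, the factor this abstract singles out, divides the document-query dot product by the vectors' lengths, so rankings compare direction rather than raw magnitude; spurious OCR-induced terms inflate a document's length and depress its score. A toy sketch of that effect (the vectors are invented, not the paper's collections):

```python
import math

def cosine_sim(d, q):
    """Cosine of the angle between a document and a query term-weight vector."""
    dot = sum(di * qi for di, qi in zip(d, q))
    norm = math.sqrt(sum(di * di for di in d)) * math.sqrt(sum(qi * qi for qi in q))
    return dot / norm if norm else 0.0

clean = [2.0, 1.0, 0.0]   # term weights from a correctly recognized document
ocr   = [2.0, 1.0, 0.5]   # same document with one spurious OCR-induced term
query = [1.0, 1.0, 0.0]

# The spurious term lengthens the OCR vector, lowering its cosine score.
print(cosine_sim(clean, query), cosine_sim(ocr, query))
```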
  3. Bollmann-Sdorra, P.; Raghavan, V.V.: On the delusiveness of adopting a common space for modelling IR objects : are queries documents? (1993) 0.34
    0.33859426 = product of:
      0.451459 = sum of:
        0.18506117 = weight(_text_:vector in 6180) [ClassicSimilarity], result of:
          0.18506117 = score(doc=6180,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.603693 = fieldWeight in 6180, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=6180)
        0.2430673 = weight(_text_:space in 6180) [ClassicSimilarity], result of:
          0.2430673 = score(doc=6180,freq=16.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.97844577 = fieldWeight in 6180, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=6180)
        0.023330513 = product of:
          0.046661027 = sum of:
            0.046661027 = weight(_text_:model in 6180) [ClassicSimilarity], result of:
              0.046661027 = score(doc=6180,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.25490487 = fieldWeight in 6180, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6180)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Many authors, who adopt the vector space model, take the view that documents, terms, queries, etc., are all elements within the same (conceptual) space. This view seems to be a natural one, given that documents and queries have the same vector notation. We show, however, that the structure of the query space can be very different from that of the document space. To this end, concepts like preference, similarity, term independence, and linearity, both in the document space and in the query space, are discussed. Our conclusion is that a more realistic and complete view of IR is obtained if we do not consider documents and queries to be elements of the same space. This conclusion implies that certain restrictions usually applied in the design of an IR system are obviated. For example, the retrieval function need not be interpreted as a similarity measure
  4. Bartell, B.T.; Cottrell, G.W.; Belew, R.K.: Representing documents using an explicit model of their similarities (1995) 0.34
    0.3359205 = product of:
      0.44789404 = sum of:
        0.30220366 = weight(_text_:vector in 1426) [ClassicSimilarity], result of:
          0.30220366 = score(doc=1426,freq=6.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.9858266 = fieldWeight in 1426, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0625 = fieldNorm(doc=1426)
        0.11458302 = weight(_text_:space in 1426) [ClassicSimilarity], result of:
          0.11458302 = score(doc=1426,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.46124378 = fieldWeight in 1426, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0625 = fieldNorm(doc=1426)
        0.031107351 = product of:
          0.062214702 = sum of:
            0.062214702 = weight(_text_:model in 1426) [ClassicSimilarity], result of:
              0.062214702 = score(doc=1426,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.33987316 = fieldWeight in 1426, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1426)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Proposes a method for creating vector space representations of documents based on modelling target interdocument similarity values. The target similarity values are assumed to capture semantic relationships, or associations, between the documents. The vector representations are chosen so that the inner product similarities between document vector pairs closely match their target interdocument similarities. The method is closely related to the Latent Semantic Indexing approach.
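One standard way to realize the idea in this abstract, choosing vectors whose inner products match given target similarities, is an eigendecomposition of the symmetric target similarity matrix, keeping the largest eigenvalues. This is an illustrative sketch with made-up numbers, not Bartell et al.'s exact procedure:

```python
import numpy as np

# Hypothetical 3x3 target interdocument similarity matrix (symmetric).
S = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])

w, V = np.linalg.eigh(S)            # eigenvalues (ascending) and eigenvectors
top = np.argsort(w)[::-1][:2]       # keep the two largest eigenvalues
X = V[:, top] * np.sqrt(w[top])     # one 2-d document vector per row

# Inner products of the fitted vectors approximate the target similarities.
print(np.round(X @ X.T, 2))
```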
  5. Xie, Y.; Raghavan, V.V.: Language-modeling kernel based approach for information retrieval (2007) 0.33
    0.32542068 = product of:
      0.43389422 = sum of:
        0.261716 = weight(_text_:vector in 1326) [ClassicSimilarity], result of:
          0.261716 = score(doc=1326,freq=8.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.8537508 = fieldWeight in 1326, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=1326)
        0.14884771 = weight(_text_:space in 1326) [ClassicSimilarity], result of:
          0.14884771 = score(doc=1326,freq=6.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.59917325 = fieldWeight in 1326, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=1326)
        0.023330513 = product of:
          0.046661027 = sum of:
            0.046661027 = weight(_text_:model in 1326) [ClassicSimilarity], result of:
              0.046661027 = score(doc=1326,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.25490487 = fieldWeight in 1326, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1326)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    In this presentation, we propose a novel integrated information retrieval approach that provides a unified solution for two challenging problems in the field of information retrieval. The first problem is how to build an optimal vector space corresponding to users' different information needs when applying the vector space model. The second one is how to smoothly incorporate the advantages of machine learning techniques into the language modeling approach. To solve these problems, we designed the language-modeling kernel function, which has all the modeling powers provided by language modeling techniques. In addition, for each information need, this kernel function automatically determines an optimal vector space, for which a discriminative learning machine, such as the support vector machine, can be applied to find an optimal decision boundary between relevant and nonrelevant documents. Large-scale experiments on standard test-beds show that our approach makes significant improvements over other state-of-the-art information retrieval methods.
  6. Dominich, S.; Kiezer, T.: ¬A measure theoretic approach to information retrieval (2007) 0.32
    0.32321647 = product of:
      0.4309553 = sum of:
        0.23081185 = weight(_text_:vector in 445) [ClassicSimilarity], result of:
          0.23081185 = score(doc=445,freq=14.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.7529375 = fieldWeight in 445, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.03125 = fieldNorm(doc=445)
        0.16204487 = weight(_text_:space in 445) [ClassicSimilarity], result of:
          0.16204487 = score(doc=445,freq=16.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.6522972 = fieldWeight in 445, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.03125 = fieldNorm(doc=445)
        0.03809857 = product of:
          0.07619714 = sum of:
            0.07619714 = weight(_text_:model in 445) [ClassicSimilarity], result of:
              0.07619714 = score(doc=445,freq=12.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.41625792 = fieldWeight in 445, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.03125 = fieldNorm(doc=445)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The vector space model of information retrieval is one of the classical and widely applied retrieval models. Paradoxically, it has been characterized by a discrepancy between its formal framework and implementable form. The underlying concepts of the vector space model are mathematical terms: linear space, vector, and inner product. However, in the vector space model, the mathematical meaning of these concepts is not preserved. They are used as mere computational constructs or metaphors. Thus, the vector space model actually does not follow formally from the mathematical concepts on which it has been claimed to rest. This problem has been recognized for more than two decades, but no proper solution has emerged so far. The present article proposes a solution to this problem. First, the concept of retrieval is defined based on the mathematical measure theory. Then, retrieval is particularized using fuzzy set theory. As a result, the retrieval function is conceived as the cardinality of the intersection of two fuzzy sets. This view makes it possible to build a connection to linear spaces. It is shown that the classical and the generalized vector space models, as well as the latent semantic indexing model, gain a correct formal background with which they are consistent. At the same time it becomes clear that the inner product is not a necessary ingredient of the vector space model, and hence of Information Retrieval (IR). The Principle of Object Invariance is introduced to handle this situation. Moreover, this view makes it possible to consistently formulate new retrieval methods: in linear space with general basis, entropy-based, and probability-based. It is also shown that Information Retrieval may be viewed as integral calculus, and thus it gains a very compact and elegant mathematical way of writing. Also, Information Retrieval may thus be conceived as an application of mathematical measure theory.
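The retrieval function this abstract arrives at, the cardinality of the intersection of two fuzzy sets, has a direct computational reading: with min as the intersection and the sigma-count as cardinality, a document's score against a query is the sum of pointwise minima of their membership values. A toy sketch under that reading (the membership weights are invented):

```python
def fuzzy_retrieval_value(doc, query):
    """Sigma-count cardinality of the fuzzy intersection (pointwise min)
    of two fuzzy sets over the same term vocabulary, memberships in [0, 1]."""
    return sum(min(d, q) for d, q in zip(doc, query))

doc   = [0.9, 0.4, 0.0, 0.7]   # fuzzy membership of each term in the document
query = [1.0, 0.0, 0.5, 0.6]   # fuzzy membership of each term in the query
print(fuzzy_retrieval_value(doc, query))
```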
  7. Liu, G.Z.: Semantic vector space model : implementation and evaluation (1997) 0.32
    0.3166211 = product of:
      0.42216146 = sum of:
        0.22665274 = weight(_text_:vector in 161) [ClassicSimilarity], result of:
          0.22665274 = score(doc=161,freq=6.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.73937 = fieldWeight in 161, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=161)
        0.14884771 = weight(_text_:space in 161) [ClassicSimilarity], result of:
          0.14884771 = score(doc=161,freq=6.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.59917325 = fieldWeight in 161, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=161)
        0.046661027 = product of:
          0.09332205 = sum of:
            0.09332205 = weight(_text_:model in 161) [ClassicSimilarity], result of:
              0.09332205 = score(doc=161,freq=8.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.50980973 = fieldWeight in 161, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=161)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Presents the Semantic Vector Space Model (SVSM), a text representation and searching technique based on the combination of the Vector Space Model (VSM) with heuristic syntax parsing and distributed representation of semantic case structures. Both documents and queries are represented as semantic matrices. A search mechanism is designed to compute the similarity between two semantic matrices to predict relevancy. A prototype system was built to implement this model by modifying the SMART system and using the Xerox Part-of-Speech tagger as the pre-processor for indexing. The prototype system was used in an experimental study to evaluate this technique in terms of precision, recall, and effectiveness of relevance ranking. Results show that if documents and queries were too short, the technique was less effective than VSM. But with longer documents and queries, especially when original documents were used as queries, the system based on this technique was found to perform better than SMART.
  8. Billhardt, H.; Borrajo, D.; Maojo, V.: ¬A context vector model for information retrieval (2002) 0.31
    0.31448868 = product of:
      0.41931823 = sum of:
        0.2438395 = weight(_text_:vector in 251) [ClassicSimilarity], result of:
          0.2438395 = score(doc=251,freq=10.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.79543537 = fieldWeight in 251, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0390625 = fieldNorm(doc=251)
        0.12403977 = weight(_text_:space in 251) [ClassicSimilarity], result of:
          0.12403977 = score(doc=251,freq=6.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.49931106 = fieldWeight in 251, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=251)
        0.051438946 = product of:
          0.10287789 = sum of:
            0.10287789 = weight(_text_:model in 251) [ClassicSimilarity], result of:
              0.10287789 = score(doc=251,freq=14.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.56201243 = fieldWeight in 251, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=251)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    In the vector space model for information retrieval, term vectors are pair-wise orthogonal, that is, terms are assumed to be independent. It is well known that this assumption is too restrictive. In this article, we present our work on an indexing and retrieval method that, based on the vector space model, incorporates term dependencies and thus obtains semantically richer representations of documents. First, we generate term context vectors based on the co-occurrence of terms in the same documents. These vectors are used to calculate context vectors for documents. We present different techniques for estimating the dependencies among terms. We also define term weights that can be employed in the model. Experimental results on four text collections (MED, CRANFIELD, CISI, and CACM) show that the incorporation of term dependencies in the retrieval process performs statistically significantly better than the classical vector space model with IDF weights. We also show that the degree of semantic matching versus direct word matching that performs best varies on the four collections. We conclude that the model performs well for certain types of queries and, generally, for information tasks with high recall requirements. Therefore, we propose the use of the context vector model in combination with other, direct word-matching methods.
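The first step this abstract describes, term context vectors built from co-occurrence of terms in the same documents, can be sketched in a few lines. A toy corpus stands in for MED/CRANFIELD/CISI/CACM, and the paper's specific dependency-estimation and weighting schemes are not reproduced:

```python
import numpy as np

# Toy term-document incidence matrix: rows = terms, columns = documents.
A = np.array([[1, 1, 0],    # "vector"
              [1, 0, 1],    # "space"
              [0, 1, 1]],   # "model"
             dtype=float)

C = A @ A.T                                          # term-term co-occurrence counts
ctx = C / np.linalg.norm(C, axis=1, keepdims=True)   # normalized term context vectors

# A document's context vector: normalized sum of its terms' context vectors.
doc0 = ctx[A[:, 0] > 0].sum(axis=0)
doc0 /= np.linalg.norm(doc0)
print(np.round(doc0, 3))
```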
  9. Lochbaum, K.E.; Streeter, A.R.: Comparing and combining the effectiveness of latent semantic indexing and the ordinary vector space model for information retrieval (1989) 0.31
    0.30766279 = product of:
      0.41021705 = sum of:
        0.18506117 = weight(_text_:vector in 3458) [ClassicSimilarity], result of:
          0.18506117 = score(doc=3458,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.603693 = fieldWeight in 3458, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=3458)
        0.19216156 = weight(_text_:space in 3458) [ClassicSimilarity], result of:
          0.19216156 = score(doc=3458,freq=10.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.7735293 = fieldWeight in 3458, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=3458)
        0.03299433 = product of:
          0.06598866 = sum of:
            0.06598866 = weight(_text_:model in 3458) [ClassicSimilarity], result of:
              0.06598866 = score(doc=3458,freq=4.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.36048993 = fieldWeight in 3458, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3458)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    A retrieval system was built to find individuals with appropriate expertise within a large research establishment on the basis of their authored documents. The expert-locating system uses a new method for automatic indexing and retrieval based on singular value decomposition, a matrix decomposition technique related to factor analysis. Organizational groups, represented by the documents they write, and the terms contained in those documents, are fit simultaneously into a 100-dimensional "semantic" space. User queries are positioned in the semantic space, and the most similar groups are returned to the user. Here we compared the standard vector-space model with this new technique and found that combining the two methods improved performance over either alone. We also examined the effects of various experimental variables on the system's retrieval accuracy; in particular, the effects of term weighting functions in the semantic space construction and in query construction, suffix stripping, and using lexical units larger than a single word were studied.
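The construction this abstract describes, fitting terms and document groups into a k-dimensional semantic space and then positioning queries in it, is the standard truncated-SVD recipe with query fold-in. A sketch with k = 2 and a toy matrix (the paper uses k = 100 and real authored documents):

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents (groups).
A = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                                      # stand-in for the paper's 100 dimensions
docs = (np.diag(s[:k]) @ Vt[:k]).T         # document coordinates, one per row

# Position a query in the same semantic space ("fold-in"): q_hat = q @ U_k / s_k.
q = np.array([1.0, 1.0, 0.0, 0.0])         # query using the first two terms
q_hat = q @ U[:, :k] / s[:k]

# Cosine similarity of each document to the folded-in query.
sims = docs @ q_hat / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat))
print(np.round(sims, 3))
```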
  10. Everett, D.M.; Cater, S.C.: Topology of document retrieval systems (1992) 0.30
    0.30456027 = product of:
      0.40608037 = sum of:
        0.17447734 = weight(_text_:vector in 3309) [ClassicSimilarity], result of:
          0.17447734 = score(doc=3309,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.5691672 = fieldWeight in 3309, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0625 = fieldNorm(doc=3309)
        0.16204487 = weight(_text_:space in 3309) [ClassicSimilarity], result of:
          0.16204487 = score(doc=3309,freq=4.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.6522972 = fieldWeight in 3309, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0625 = fieldNorm(doc=3309)
        0.06955816 = product of:
          0.13911632 = sum of:
            0.13911632 = weight(_text_:model in 3309) [ClassicSimilarity], result of:
              0.13911632 = score(doc=3309,freq=10.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.75997955 = fieldWeight in 3309, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3309)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Demonstrates that a wide class of document retrieval systems induce natural topologies on their underlying document spaces and that different topologies can be induced on the same document space by two retrieval systems. Argues that this difference in topology reflects a meaningful difference in the performance of document retrieval systems. Compares the topological structures of the following document retrieval systems: the vector space model, the fuzzy set model, the extended Boolean model, the probabilistic model, and the TIRS model (topological information retrieval system).
  11. Kumar, C.A.; Radvansky, M.; Annapurna, J.: Analysis of Vector Space Model, Latent Semantic Indexing and Formal Concept Analysis for information retrieval (2012) 0.30
    0.30362892 = product of:
      0.40483856 = sum of:
        0.2159047 = weight(_text_:vector in 2710) [ClassicSimilarity], result of:
          0.2159047 = score(doc=2710,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.7043085 = fieldWeight in 2710, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2710)
        0.14178926 = weight(_text_:space in 2710) [ClassicSimilarity], result of:
          0.14178926 = score(doc=2710,freq=4.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.5707601 = fieldWeight in 2710, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2710)
        0.04714458 = product of:
          0.09428916 = sum of:
            0.09428916 = weight(_text_:model in 2710) [ClassicSimilarity], result of:
              0.09428916 = score(doc=2710,freq=6.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.51509297 = fieldWeight in 2710, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2710)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Latent Semantic Indexing (LSI), a variant of the classical Vector Space Model (VSM), is an Information Retrieval (IR) model that attempts to capture the latent semantic relationships between data items. Mathematical lattices, under the framework of Formal Concept Analysis (FCA), represent conceptual hierarchies in data and retrieve the information. However, both LSI and FCA use data represented in the form of matrices. The objective of this paper is to systematically analyze VSM, LSI, and FCA for the task of IR using standard and real-life datasets.
  12. Salton, G.; Wong, A.; Yang, C.S.: ¬A vector space model for automatic indexing (1975) 0.30
    0.30015722 = product of:
      0.40020964 = sum of:
        0.21809667 = weight(_text_:vector in 1934) [ClassicSimilarity], result of:
          0.21809667 = score(doc=1934,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.711459 = fieldWeight in 1934, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.078125 = fieldNorm(doc=1934)
        0.14322878 = weight(_text_:space in 1934) [ClassicSimilarity], result of:
          0.14322878 = score(doc=1934,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.5765547 = fieldWeight in 1934, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.078125 = fieldNorm(doc=1934)
        0.03888419 = product of:
          0.07776838 = sum of:
            0.07776838 = weight(_text_:model in 1934) [ClassicSimilarity], result of:
              0.07776838 = score(doc=1934,freq=2.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.42484146 = fieldWeight in 1934, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1934)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
  13. Dominich, S.; Góth, J.; Kiezer, T.; Szlávik, Z.: ¬An entropy-based interpretation of retrieval status value-based retrieval, and its application to the computation of term and query discrimination value (2004) 0.29
    0.28576547 = product of:
      0.38102064 = sum of:
        0.21809667 = weight(_text_:vector in 2237) [ClassicSimilarity], result of:
          0.21809667 = score(doc=2237,freq=8.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.711459 = fieldWeight in 2237, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2237)
        0.12403977 = weight(_text_:space in 2237) [ClassicSimilarity], result of:
          0.12403977 = score(doc=2237,freq=6.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.49931106 = fieldWeight in 2237, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2237)
        0.03888419 = product of:
          0.07776838 = sum of:
            0.07776838 = weight(_text_:model in 2237) [ClassicSimilarity], result of:
              0.07776838 = score(doc=2237,freq=8.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.42484146 = fieldWeight in 2237, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2237)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The concepts of Shannon information and entropy have been applied to a number of information retrieval tasks, such as formalizing the probabilistic model, designing practical retrieval systems, clustering documents, and modeling texture in image retrieval. In this report, the concept of entropy is used for a different purpose. It is shown that any positive Retrieval Status Value (RSV)-based retrieval system may be conceived as a special probability space in which the amount of the associated Shannon information is being reduced; in this view, the retrieval system is referred to as an Uncertainty Decreasing Operation (UDO). The concept of UDO is then proposed as a theoretical background for term and query discrimination power, and it is applied to the computation of term and query discrimination values in the vector space retrieval model. Experimental evidence is given regarding such computation; the results obtained compare well to those obtained using vector-based calculation of term discrimination values. The UDO-based computation, however, presents advantages over the vector-based calculation: it is faster, easier to assess and handle in practice, and its application is not restricted to the vector space model. Based on the ADI test collection, it is shown that the UDO-based Term Discrimination Value (TDV) weighting scheme yields better retrieval effectiveness than the vector-based TDV weighting scheme. Also, experimental evidence is given for the intuition that the choice of an appropriate weighting scheme and similarity measure depends on collection properties, and thus the UDO approach may be used as a theoretical basis for this intuition.
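A toy illustration of why entropy can serve as a discrimination measure (a simplification, not the authors' UDO formulation): a term spread evenly across documents has high entropy and discriminates poorly, while a term concentrated in few documents has low entropy.

```python
import math

def entropy(probs):
    # Shannon entropy in bits.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def term_entropy(freqs):
    # Normalize a term's per-document frequencies into a distribution.
    total = sum(freqs)
    return entropy(f / total for f in freqs)

broad = term_entropy([5, 5, 5, 5])    # uniform spread: maximal entropy
narrow = term_entropy([20, 0, 0, 0])  # concentrated in one document: zero entropy
```

Here `broad` is 2 bits (uniform over four documents) and `narrow` is 0 bits, so under this reading the concentrated term is the better discriminator.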
  14. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.28
    0.2764349 = product of:
      0.36857986 = sum of:
        0.15421765 = weight(_text_:vector in 1428) [ClassicSimilarity], result of:
          0.15421765 = score(doc=1428,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.5030775 = fieldWeight in 1428, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.14322878 = weight(_text_:space in 1428) [ClassicSimilarity], result of:
          0.14322878 = score(doc=1428,freq=8.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.5765547 = fieldWeight in 1428, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.07113342 = sum of:
          0.03888419 = weight(_text_:model in 1428) [ClassicSimilarity], result of:
            0.03888419 = score(doc=1428,freq=2.0), product of:
              0.1830527 = queryWeight, product of:
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.047605187 = queryNorm
              0.21242073 = fieldWeight in 1428, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1428)
          0.032249227 = weight(_text_:22 in 1428) [ClassicSimilarity], result of:
            0.032249227 = score(doc=1428,freq=2.0), product of:
              0.16670525 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.047605187 = queryNorm
              0.19345059 = fieldWeight in 1428, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1428)
      0.75 = coord(3/4)
    
    Abstract
    Humans can make hasty but generally robust judgments about what a text fragment is, or is not, about. Such judgments are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space, information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high-dimensional semantic space, which is automatically constructed from a text corpus. Two approaches are presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
    Date
    22. 3.2003 19:35:46
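The concept-combination step can be pictured as moving one word's vector toward another's in the semantic space. A hand-rolled 2-d sketch (both the heuristic and the vectors here are illustrative assumptions, not the authors' construction):

```python
import math

def combine(u, v, weight=0.5):
    # Prime vector u toward v by convex combination, then renormalize
    # to unit length. A toy stand-in for a concept-combination heuristic.
    mixed = [(1 - weight) * a + weight * b for a, b in zip(u, v)]
    norm = math.sqrt(sum(x * x for x in mixed))
    return [x / norm for x in mixed]

bank = [1.0, 0.0]    # toy 2-d semantic vector
river = [0.0, 1.0]
bank_in_river_context = combine(bank, river)
```

The primed vector ends up midway between the two concepts and stays on the unit sphere, so downstream cosine comparisons remain meaningful.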
  15. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.27
    0.26987368 = product of:
      0.35983157 = sum of:
        0.130858 = weight(_text_:vector in 690) [ClassicSimilarity], result of:
          0.130858 = score(doc=690,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.4268754 = fieldWeight in 690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=690)
        0.08593727 = weight(_text_:space in 690) [ClassicSimilarity], result of:
          0.08593727 = score(doc=690,freq=2.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.34593284 = fieldWeight in 690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=690)
        0.1430363 = sum of:
          0.10433724 = weight(_text_:model in 690) [ClassicSimilarity], result of:
            0.10433724 = score(doc=690,freq=10.0), product of:
              0.1830527 = queryWeight, product of:
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.047605187 = queryNorm
              0.5699847 = fieldWeight in 690, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.046875 = fieldNorm(doc=690)
          0.03869907 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
            0.03869907 = score(doc=690,freq=2.0), product of:
              0.16670525 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.047605187 = queryNorm
              0.23214069 = fieldWeight in 690, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=690)
      0.75 = coord(3/4)
    
    Abstract
    We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded on singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top-ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between latent semantic indexing (LSI) term subspace and LSI document subspace. LSISSM does feature reduction and finds a low-rank approximation of scalable and sparse term-document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard K-means and self-organizing maps compared with the vector space model and the traditional LSI model. The unique contribution ranking mechanism in LSISSM also improves the initialization of standard K-means compared with random seeding procedure, which sometimes causes low efficiency and effectiveness of clustering. A two-stage initialization strategy based on LSISSM significantly reduces the running time of standard K-means procedures.
    Date
    23. 3.2013 13:22:36
  16. Schlieder, T.; Meuss, H.: Querying and ranking XML documents (2002) 0.27
    0.26907256 = product of:
      0.35876343 = sum of:
        0.18506117 = weight(_text_:vector in 459) [ClassicSimilarity], result of:
          0.18506117 = score(doc=459,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.603693 = fieldWeight in 459, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=459)
        0.12153365 = weight(_text_:space in 459) [ClassicSimilarity], result of:
          0.12153365 = score(doc=459,freq=4.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.48922288 = fieldWeight in 459, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=459)
        0.05216862 = product of:
          0.10433724 = sum of:
            0.10433724 = weight(_text_:model in 459) [ClassicSimilarity], result of:
              0.10433724 = score(doc=459,freq=10.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.5699847 = fieldWeight in 459, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=459)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    XML represents both content and structure of documents. Taking advantage of the document structure promises to greatly improve the retrieval precision. In this article, we present a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Our query model is based on tree matching as a simple and elegant means to formulate queries without knowing the exact structure of the data. Using this query model we propose a logical document concept by deciding on the document boundaries at query time. We combine structured queries and term-based ranking by extending the term concept to structural terms that include substructures of queries and documents. The notions of term frequency and inverse document frequency are adapted to logical documents and structural terms. We introduce an efficient technique to calculate all necessary term frequencies and inverse document frequencies at query time. By adjusting parameters of the retrieval process we are able to model two contrary approaches: the classical vector space model, and the original tree matching approach.
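The inverse document frequency component that the authors adapt to structural terms can be sketched in a few lines (the set-of-terms documents below are toy data, not structural XML terms):

```python
import math

def idf(term, docs):
    # Inverse document frequency: terms appearing in fewer documents
    # receive a higher weight.
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

docs = [
    {"xml", "query", "ranking"},
    {"xml", "documents"},
    {"vector", "space", "model"},
]
```

Here "xml" occurs in two of three documents and "ranking" in only one, so "ranking" carries the larger weight; the paper extends this idea from plain terms to substructures of queries and documents.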
  17. Efron, M.: Query expansion and dimensionality reduction : Notions of optimality in Rocchio relevance feedback and latent semantic indexing (2008) 0.26
    0.26025337 = product of:
      0.34700447 = sum of:
        0.18506117 = weight(_text_:vector in 2020) [ClassicSimilarity], result of:
          0.18506117 = score(doc=2020,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.603693 = fieldWeight in 2020, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=2020)
        0.12153365 = weight(_text_:space in 2020) [ClassicSimilarity], result of:
          0.12153365 = score(doc=2020,freq=4.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.48922288 = fieldWeight in 2020, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=2020)
        0.04040964 = product of:
          0.08081928 = sum of:
            0.08081928 = weight(_text_:model in 2020) [ClassicSimilarity], result of:
              0.08081928 = score(doc=2020,freq=6.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.44150823 = fieldWeight in 2020, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2020)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Rocchio relevance feedback and latent semantic indexing (LSI) are well-known extensions of the vector space model for information retrieval (IR). This paper analyzes the statistical relationship between these extensions. The analysis focuses on each method's basis in least-squares optimization. Noting that LSI and Rocchio relevance feedback both alter the vector space model in a way that is in some sense least-squares optimal, we ask: what is the relationship between LSI's and Rocchio's notions of optimality? What does this relationship imply for IR? Using an analytical approach, we argue that Rocchio relevance feedback is optimal if we understand retrieval as a simplified classification problem. On the other hand, LSI's motivation comes to the fore if we understand it as a biased regression technique, where projection onto a low-dimensional orthogonal subspace of the documents reduces model variance.
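The Rocchio update the paper analyzes, in its textbook form (the alpha, beta, gamma defaults below are the commonly quoted values, not parameters taken from this paper):

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    # Textbook Rocchio relevance-feedback update: move the query toward
    # the centroid of relevant documents and away from non-relevant ones.
    dims = len(query)
    def mean(vectors):
        if not vectors:
            return [0.0] * dims
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]
    mr, mn = mean(relevant), mean(nonrelevant)
    return [alpha * query[i] + beta * mr[i] - gamma * mn[i] for i in range(dims)]

q = [1.0, 0.0]
new_q = rocchio(q, relevant=[[0.0, 1.0]], nonrelevant=[])
```

With one relevant document on the second axis, the updated query picks up weight on that axis while keeping its original component, which is the least-squares-flavored behavior the paper relates to LSI's biased regression view.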
  18. Burgin, R.: ¬The effect of indexing exhaustivity on retrieval performance (1991) 0.25
    0.25469187 = product of:
      0.33958915 = sum of:
        0.18506117 = weight(_text_:vector in 5262) [ClassicSimilarity], result of:
          0.18506117 = score(doc=5262,freq=4.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.603693 = fieldWeight in 5262, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=5262)
        0.12153365 = weight(_text_:space in 5262) [ClassicSimilarity], result of:
          0.12153365 = score(doc=5262,freq=4.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.48922288 = fieldWeight in 5262, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=5262)
        0.03299433 = product of:
          0.06598866 = sum of:
            0.06598866 = weight(_text_:model in 5262) [ClassicSimilarity], result of:
              0.06598866 = score(doc=5262,freq=4.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.36048993 = fieldWeight in 5262, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5262)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The study was based on the collection examined by W.H. Shaw (Inf. proc. man. 26(1990) no.6, S.693-703, 705-718), a test collection of 1239 articles indexed with the term cystic fibrosis, and 100 queries with 3 sets of relevance evaluations from subject experts. The effect of variations in indexing exhaustivity on retrieval performance in a vector space retrieval system was investigated by using a term weight threshold to construct different document representations for the test collection. Retrieval results showed that retrieval performance, as measured by the mean optimal measure for all queries at a term weight threshold, was highest at the most exhaustive representation and decreased slightly as terms were eliminated and the indexing representation became less exhaustive. The findings suggest that the vector space model is more robust against variations in indexing exhaustivity than is the single-link clustering model.
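The exhaustivity manipulation described above, dropping index terms whose weight falls below a threshold, can be sketched as follows; the term weights are invented for illustration:

```python
def prune(doc_vector, threshold):
    # Keep only index terms whose weight meets the threshold; raising
    # the threshold yields a less exhaustive document representation.
    return {t: w for t, w in doc_vector.items() if w >= threshold}

doc = {"cystic": 0.9, "fibrosis": 0.8, "patient": 0.3, "study": 0.1}
```

Calling `prune(doc, 0.0)` keeps all four terms (the most exhaustive representation), while `prune(doc, 0.5)` keeps only the two strongest; the study compares retrieval quality across such representations.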
  19. Dubin, D.: ¬The most influential paper Gerard Salton never wrote (2004) 0.25
    0.2533339 = product of:
      0.33777854 = sum of:
        0.18887727 = weight(_text_:vector in 26) [ClassicSimilarity], result of:
          0.18887727 = score(doc=26,freq=6.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.6161416 = fieldWeight in 26, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.0390625 = fieldNorm(doc=26)
        0.101278044 = weight(_text_:space in 26) [ClassicSimilarity], result of:
          0.101278044 = score(doc=26,freq=4.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.40768576 = fieldWeight in 26, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=26)
        0.04762321 = product of:
          0.09524642 = sum of:
            0.09524642 = weight(_text_:model in 26) [ClassicSimilarity], result of:
              0.09524642 = score(doc=26,freq=12.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.5203224 = fieldWeight in 26, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=26)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Gerard Salton is often credited with developing the vector space model (VSM) for information retrieval (IR). Citations to Salton give the impression that the VSM must have been articulated as an IR model sometime between 1970 and 1975. However, the VSM as it is understood today evolved over a longer time period than is usually acknowledged, and an articulation of the model and its assumptions did not appear in print until several years after those assumptions had been criticized and alternative models proposed. An often cited overview paper titled "A Vector Space Model for Information Retrieval" (alleged to have been published in 1975) does not exist, and citations to it represent a confusion of two 1975 articles, neither of which were overviews of the VSM as a model of information retrieval. Until the late 1970s, Salton did not present vector spaces as models of IR generally but rather as models of specific computations. Citations to the phantom paper reflect an apparently widely held misconception that the operational features and explanatory devices now associated with the VSM must have been introduced at the same time it was first proposed as an IR model.
  20. Kiros, R.; Salakhutdinov, R.; Zemel, R.S.: Unifying visual-semantic embeddings with multimodal neural language models (2014) 0.25
    0.25179514 = product of:
      0.33572686 = sum of:
        0.130858 = weight(_text_:vector in 1871) [ClassicSimilarity], result of:
          0.130858 = score(doc=1871,freq=2.0), product of:
            0.30654848 = queryWeight, product of:
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.047605187 = queryNorm
            0.4268754 = fieldWeight in 1871, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.439392 = idf(docFreq=191, maxDocs=44218)
              0.046875 = fieldNorm(doc=1871)
        0.17187454 = weight(_text_:space in 1871) [ClassicSimilarity], result of:
          0.17187454 = score(doc=1871,freq=8.0), product of:
            0.24842183 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.047605187 = queryNorm
            0.6918657 = fieldWeight in 1871, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=1871)
        0.03299433 = product of:
          0.06598866 = sum of:
            0.06598866 = weight(_text_:model in 1871) [ClassicSimilarity], result of:
              0.06598866 = score(doc=1871,freq=4.0), product of:
                0.1830527 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.047605187 = queryNorm
                0.36048993 = fieldWeight in 1871, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1871)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Inspired by recent advances in multimodal learning and machine translation, we introduce an encoder-decoder pipeline that learns (a): a multimodal joint embedding space with images and text and (b): a novel language model for decoding distributed representations from our space. Our pipeline effectively unifies joint image-text embedding models with multimodal neural language models. We introduce the structure-content neural language model that disentangles the structure of a sentence from its content, conditioned on representations produced by the encoder. The encoder allows one to rank images and sentences while the decoder can generate novel descriptions from scratch. Using LSTM to encode sentences, we match the state-of-the-art performance on Flickr8K and Flickr30K without using object detections. We also set new best results when using the 19-layer Oxford convolutional network. Furthermore we show that with linear encoders, the learned embedding space captures multimodal regularities in terms of vector space arithmetic, e.g. *image of a blue car* - "blue" + "red" is near images of red cars. Sample captions generated for 800 images are made available for comparison.
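The vector-arithmetic regularity quoted in the abstract (*image of a blue car* - "blue" + "red" lands near images of red cars) can be reproduced with hand-picked toy vectors; real joint embeddings are learned, so the 3-d numbers below are purely illustrative:

```python
import math

def nearest(target, vocab):
    # Return the vocabulary item whose vector is most cosine-similar
    # to the target vector.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    return max(vocab, key=lambda w: cos(vocab[w], target))

# Toy 3-d embeddings chosen by hand: axis 0 = "car-ness",
# axis 1 = "blue-ness", axis 2 = "red-ness".
vocab = {
    "blue_car": [1.0, 1.0, 0.0],
    "red_car":  [1.0, 0.0, 1.0],
    "blue":     [0.0, 1.0, 0.0],
    "red":      [0.0, 0.0, 1.0],
}
result = [a - b + c for a, b, c in zip(vocab["blue_car"], vocab["blue"], vocab["red"])]
```

Subtracting the "blue" direction and adding the "red" direction moves the blue-car vector onto the red-car vector, which is the multimodal regularity the paper demonstrates in its learned space.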
