Search (6820 results, page 1 of 341)

  1. Lund, K.; Burgess, C.; Atchley, R.A.: Semantic and associative priming in high-dimensional semantic space (1995) 0.47
    0.46663892 = product of:
      0.62218523 = sum of:
        0.35174897 = weight(_text_:vectors in 2151) [ClassicSimilarity], result of:
          0.35174897 = score(doc=2151,freq=4.0), product of:
            0.41416162 = queryWeight, product of:
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.053336553 = queryNorm
            0.8493036 = fieldWeight in 2151, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2151)
        0.1588598 = weight(_text_:space in 2151) [ClassicSimilarity], result of:
          0.1588598 = score(doc=2151,freq=4.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.5707601 = fieldWeight in 2151, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2151)
        0.11157641 = sum of:
          0.060991846 = weight(_text_:model in 2151) [ClassicSimilarity], result of:
            0.060991846 = score(doc=2151,freq=2.0), product of:
              0.2050911 = queryWeight, product of:
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.053336553 = queryNorm
              0.29738903 = fieldWeight in 2151, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2151)
          0.050584566 = weight(_text_:22 in 2151) [ClassicSimilarity], result of:
            0.050584566 = score(doc=2151,freq=2.0), product of:
              0.18677552 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.053336553 = queryNorm
              0.2708308 = fieldWeight in 2151, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2151)
      0.75 = coord(3/4)
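    The breakdown above is raw Lucene "explain" output for ClassicSimilarity (TF-IDF) scoring. As a sanity check, the final score can be recomputed from the leaves of the tree; a minimal sketch, with the freq/idf/fieldNorm/queryNorm values copied from the explanation (the function name is ours):

```python
import math

def classic_similarity_score(term_stats, coord, query_norm):
    """Recompute a Lucene ClassicSimilarity score from 'explain' leaves."""
    score = 0.0
    for freq, idf, field_norm in term_stats:
        tf = math.sqrt(freq)                  # tf(freq) = sqrt(freq)
        query_weight = idf * query_norm       # queryWeight = idf * queryNorm
        field_weight = tf * idf * field_norm  # fieldWeight = tf * idf * fieldNorm
        score += query_weight * field_weight  # weight(term) = queryWeight * fieldWeight
    return coord * score                      # scaled by coord(matched clauses / clauses)

# Leaves of result 1 (doc 2151): vectors, space, model, 22
leaves = [(4.0, 7.7650614, 0.0546875),
          (4.0, 5.2183776, 0.0546875),
          (2.0, 3.845226, 0.0546875),
          (2.0, 3.5018296, 0.0546875)]
print(classic_similarity_score(leaves, coord=0.75, query_norm=0.053336553))
# ~0.4666389, matching the 0.46663892 reported above
```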
    
    Abstract
    We present a model of semantic memory that utilizes a high dimensional semantic space constructed from a co-occurrence matrix. This matrix was formed by analyzing a 160 million word corpus. Word vectors were then obtained by extracting rows and columns of this matrix. These vectors were subjected to multidimensional scaling. Words were found to cluster semantically, suggesting that interword distance may be interpretable as a measure of semantic similarity. In attempting to replicate with our simulation the semantic and ...
    Source
    Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society: July 22 - 25, 1995, University of Pittsburgh / ed. by Johanna D. Moore and Jill Fain Lehman
  2. Lund, K.; Burgess, C.: Producing high-dimensional semantic spaces from lexical co-occurrence (1996) 0.45
    0.44935113 = product of:
      0.59913486 = sum of:
        0.47671193 = weight(_text_:vectors in 1704) [ClassicSimilarity], result of:
          0.47671193 = score(doc=1704,freq=10.0), product of:
            0.41416162 = queryWeight, product of:
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.053336553 = queryNorm
            1.1510288 = fieldWeight in 1704, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.046875 = fieldNorm(doc=1704)
        0.09628358 = weight(_text_:space in 1704) [ClassicSimilarity], result of:
          0.09628358 = score(doc=1704,freq=2.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.34593284 = fieldWeight in 1704, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=1704)
        0.02613936 = product of:
          0.05227872 = sum of:
            0.05227872 = weight(_text_:model in 1704) [ClassicSimilarity], result of:
              0.05227872 = score(doc=1704,freq=2.0), product of:
                0.2050911 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.053336553 = queryNorm
                0.25490487 = fieldWeight in 1704, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1704)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    A procedure that processes a corpus of text and produces, for each word, a numeric vector containing information about its meaning is presented. This procedure is applied to a large corpus of natural language text taken from Usenet, and the resulting vectors are examined to determine what information is contained within them. These vectors provide the coordinates in a high-dimensional space in which word relationships can be analyzed. Analyses of both vector similarity and multidimensional scaling demonstrate that there is significant semantic information carried in the vectors. A comparison of vector similarity with human reaction times in a single-word priming experiment is presented. These vectors provide the basis for a representational model of semantic memory, hyperspace analogue to language (HAL).
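    To make the HAL procedure concrete, here is a toy sketch: co-occurrence counts are weighted inversely by distance within a sliding window, and a word's vector is its matrix row (following words) concatenated with its column (preceding words). The ramp weighting and the miniature corpus are our assumptions; Lund and Burgess used a 10-word window over a far larger corpus.

```python
import numpy as np

def hal_matrix(tokens, window=10):
    """Toy HAL-style matrix: m[j, i] accumulates weight when word j
    follows word i within `window` positions; closer pairs weigh more."""
    vocab = sorted(set(tokens))
    index = {w: k for k, w in enumerate(vocab)}
    m = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d >= len(tokens):
                break
            m[index[tokens[i + d]], index[w]] += window + 1 - d
    return m, index

tokens = "the cat sat on the mat while the dog sat on the rug".split()
m, index = hal_matrix(tokens, window=5)
# Word vector = row (what follows "cat") + column (what precedes "cat")
cat_vec = np.concatenate([m[index["cat"]], m[:, index["cat"]]])
```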
  3. Billhardt, H.; Borrajo, D.; Maojo, V.: ¬A context vector model for information retrieval (2002) 0.41
    0.41394398 = product of:
      0.5519253 = sum of:
        0.35532007 = weight(_text_:vectors in 251) [ClassicSimilarity], result of:
          0.35532007 = score(doc=251,freq=8.0), product of:
            0.41416162 = queryWeight, product of:
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.053336553 = queryNorm
            0.85792613 = fieldWeight in 251, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.0390625 = fieldNorm(doc=251)
        0.13897339 = weight(_text_:space in 251) [ClassicSimilarity], result of:
          0.13897339 = score(doc=251,freq=6.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.49931106 = fieldWeight in 251, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=251)
        0.057631876 = product of:
          0.11526375 = sum of:
            0.11526375 = weight(_text_:model in 251) [ClassicSimilarity], result of:
              0.11526375 = score(doc=251,freq=14.0), product of:
                0.2050911 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.053336553 = queryNorm
                0.56201243 = fieldWeight in 251, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=251)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    In the vector space model for information retrieval, term vectors are pair-wise orthogonal, that is, terms are assumed to be independent. It is well known that this assumption is too restrictive. In this article, we present our work on an indexing and retrieval method that, based on the vector space model, incorporates term dependencies and thus obtains semantically richer representations of documents. First, we generate term context vectors based on the co-occurrence of terms in the same documents. These vectors are used to calculate context vectors for documents. We present different techniques for estimating the dependencies among terms. We also define term weights that can be employed in the model. Experimental results on four text collections (MED, CRANFIELD, CISI, and CACM) show that the incorporation of term dependencies in the retrieval process performs statistically significantly better than the classical vector space model with IDF weights. We also show that the degree of semantic matching versus direct word matching that performs best varies across the four collections. We conclude that the model performs well for certain types of queries and, generally, for information tasks with high recall requirements. Therefore, we propose the use of the context vector model in combination with other, direct word-matching methods.
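    A minimal sketch of the general idea — term context vectors from document-level co-occurrence, document vectors as weighted sums of their terms' context vectors — not the authors' specific dependency-estimation techniques or weighting schemes:

```python
import numpy as np

# Rows = documents, columns = terms (toy term-document incidence matrix).
A = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)

# Term context vectors: co-occurrence of terms within the same documents,
# each row normalised so a term's vector sums to 1.
C = A.T @ A
C /= C.sum(axis=1, keepdims=True)

# Document context vectors: weighted sum of the context vectors of the
# document's terms (plain term frequencies as weights here).
docs = A @ C
query = np.array([0, 1, 0, 0], dtype=float) @ C  # query treated like a document

# Rank by cosine similarity in the context space.
cos = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
print(np.argsort(-cos))
```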
  4. Meincke, P.P.M.; Atherton, P.: Knowledge space : a conceptual basis for the organization of knowledge (1976) 0.36
    0.35751185 = product of:
      0.7150237 = sum of:
        0.5561639 = weight(_text_:vectors in 78) [ClassicSimilarity], result of:
          0.5561639 = score(doc=78,freq=10.0), product of:
            0.41416162 = queryWeight, product of:
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.053336553 = queryNorm
            1.3428669 = fieldWeight in 78, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.0546875 = fieldNorm(doc=78)
        0.1588598 = weight(_text_:space in 78) [ClassicSimilarity], result of:
          0.1588598 = score(doc=78,freq=4.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.5707601 = fieldWeight in 78, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0546875 = fieldNorm(doc=78)
      0.5 = coord(2/4)
    
    Abstract
    We propose a new conceptual basis for visualizing the organization of information, or knowledge, which differentiates among the concept 'vectors' for a field of knowledge represented in a multidimensional space, the state 'vectors' for a person based on his understanding of these concepts, and the representational 'vectors' for information items which might be in a retrieval system which covers a subspace of knowledge. This accommodates the notion of a search volume in which the user of a retrieval system can expand or reduce the subspace he searches for relevant information items which have representational vectors with components on basic concept vectors similar to his state vector. The benefits of such a new conceptual framework are explored in this article.
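    The search-volume notion can be pictured as a cosine threshold around the user's state vector; a small illustrative sketch in which all vectors and thresholds are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
items = rng.random((100, 8))   # representational vectors of information items
state = rng.random(8)          # the user's state vector

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Expanding the search volume = lowering the similarity threshold.
for threshold in (0.95, 0.90, 0.80):
    hits = [i for i, item in enumerate(items) if cosine(item, state) >= threshold]
    print(threshold, len(hits))
```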
  5. Wang, Y.; Lee, J.-S.; Choi, I.-C.: Indexing by Latent Dirichlet Allocation and an Ensemble Model (2016) 0.32
    0.32007533 = product of:
      0.4267671 = sum of:
        0.21319206 = weight(_text_:vectors in 3019) [ClassicSimilarity], result of:
          0.21319206 = score(doc=3019,freq=2.0), product of:
            0.41416162 = queryWeight, product of:
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.053336553 = queryNorm
            0.5147557 = fieldWeight in 3019, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.046875 = fieldNorm(doc=3019)
        0.09628358 = weight(_text_:space in 3019) [ClassicSimilarity], result of:
          0.09628358 = score(doc=3019,freq=2.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.34593284 = fieldWeight in 3019, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=3019)
        0.11729148 = sum of:
          0.07393328 = weight(_text_:model in 3019) [ClassicSimilarity], result of:
            0.07393328 = score(doc=3019,freq=4.0), product of:
              0.2050911 = queryWeight, product of:
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.053336553 = queryNorm
              0.36048993 = fieldWeight in 3019, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.046875 = fieldNorm(doc=3019)
          0.0433582 = weight(_text_:22 in 3019) [ClassicSimilarity], result of:
            0.0433582 = score(doc=3019,freq=2.0), product of:
              0.18677552 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.053336553 = queryNorm
              0.23214069 = fieldWeight in 3019, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=3019)
      0.75 = coord(3/4)
    
    Abstract
    The contribution of this article is twofold. First, we present Indexing by latent Dirichlet allocation (LDI), an automatic document indexing method. Many ad hoc applications, or their variants with smoothing techniques suggested in LDA-based language modeling, can result in unsatisfactory performance as the document representations do not accurately reflect concept space. To improve document retrieval performance, we introduce a new definition of document probability vectors in the context of LDA and present a novel scheme for automatic document indexing based on LDA. Second, we propose an Ensemble Model (EnM) for document retrieval. EnM combines basic indexing models by assigning different weights and attempts to uncover the optimal weights to maximize the mean average precision. To solve the optimization problem, we propose an algorithm, which is derived based on the boosting method. The results of our computational experiments on benchmark data sets indicate that both the proposed approaches are viable options for document retrieval.
    Date
    12. 6.2016 21:39:22
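    A sketch of the retrieval step once documents and query are expressed as topic-probability vectors; the LDA inference itself is elided, the vectors are made up, and the ensemble weights below are arbitrary (the paper learns them with a boosting-derived algorithm):

```python
import numpy as np

# Topic-probability vectors (rows sum to 1), as LDA inference might yield.
doc_topics = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.3, 0.3, 0.4]])
query_topics = np.array([0.6, 0.3, 0.1])

# One common choice: rank documents by cosine similarity of topic vectors.
scores = doc_topics @ query_topics / (
    np.linalg.norm(doc_topics, axis=1) * np.linalg.norm(query_topics))
print(np.argsort(-scores))  # document 0 ranks first

# An ensemble in the EnM spirit: a weighted mix of two basic scorers.
tfidf_scores = np.array([0.2, 0.5, 0.4])   # stand-in for a second model
ensemble = 0.6 * scores + 0.4 * tfidf_scores
```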
  6. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.31
    0.31337258 = product of:
      0.4178301 = sum of:
        0.17766003 = weight(_text_:vectors in 1428) [ClassicSimilarity], result of:
          0.17766003 = score(doc=1428,freq=2.0), product of:
            0.41416162 = queryWeight, product of:
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.053336553 = queryNorm
            0.42896307 = fieldWeight in 1428, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.16047263 = weight(_text_:space in 1428) [ClassicSimilarity], result of:
          0.16047263 = score(doc=1428,freq=8.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.5765547 = fieldWeight in 1428, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.07969743 = sum of:
          0.0435656 = weight(_text_:model in 1428) [ClassicSimilarity], result of:
            0.0435656 = score(doc=1428,freq=2.0), product of:
              0.2050911 = queryWeight, product of:
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.053336553 = queryNorm
              0.21242073 = fieldWeight in 1428, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1428)
          0.036131833 = weight(_text_:22 in 1428) [ClassicSimilarity], result of:
            0.036131833 = score(doc=1428,freq=2.0), product of:
              0.18677552 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.053336553 = queryNorm
              0.19345059 = fieldWeight in 1428, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1428)
      0.75 = coord(3/4)
    
    Abstract
    Humans can make hasty, but generally robust judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space, information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high dimensional semantic space, which is automatically constructed from a text corpus. Two approaches were presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
    Date
    22. 3.2003 19:35:46
  7. Bollmann-Sdorra, P.; Raghavan, V.V.: On the delusiveness of adopting a common space for modelling IR objects : are queries documents? (1993) 0.25
    0.248824 = product of:
      0.33176532 = sum of:
        0.03329493 = product of:
          0.099884786 = sum of:
            0.099884786 = weight(_text_:objects in 6180) [ClassicSimilarity], result of:
              0.099884786 = score(doc=6180,freq=2.0), product of:
                0.28348756 = queryWeight, product of:
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.053336553 = queryNorm
                0.35234275 = fieldWeight in 6180, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6180)
          0.33333334 = coord(1/3)
        0.27233106 = weight(_text_:space in 6180) [ClassicSimilarity], result of:
          0.27233106 = score(doc=6180,freq=16.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.97844577 = fieldWeight in 6180, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=6180)
        0.02613936 = product of:
          0.05227872 = sum of:
            0.05227872 = weight(_text_:model in 6180) [ClassicSimilarity], result of:
              0.05227872 = score(doc=6180,freq=2.0), product of:
                0.2050911 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.053336553 = queryNorm
                0.25490487 = fieldWeight in 6180, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6180)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Many authors who adopt the vector space model take the view that documents, terms, queries, etc., are all elements within the same (conceptual) space. This view seems to be a natural one, given that documents and queries have the same vector notation. We show, however, that the structure of the query space can be very different from that of the document space. To this end, concepts like preference, similarity, term independence, and linearity, both in the document space and in the query space, are discussed. Our conclusion is that a more realistic and complete view of IR is obtained if we do not consider documents and queries to be elements of the same space. This conclusion implies that certain restrictions usually applied in the design of an IR system are obviated. For example, the retrieval function need not be interpreted as a similarity measure.
  8. Schutze, H.; Pederson, J.O.: ¬A cooccurrence-based thesaurus and two applications to information retrieval (1997) 0.23
    0.23290506 = product of:
      0.46581012 = sum of:
        0.28425607 = weight(_text_:vectors in 153) [ClassicSimilarity], result of:
          0.28425607 = score(doc=153,freq=2.0), product of:
            0.41416162 = queryWeight, product of:
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.053336553 = queryNorm
            0.6863409 = fieldWeight in 153, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.0625 = fieldNorm(doc=153)
        0.18155405 = weight(_text_:space in 153) [ClassicSimilarity], result of:
          0.18155405 = score(doc=153,freq=4.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.6522972 = fieldWeight in 153, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0625 = fieldNorm(doc=153)
      0.5 = coord(2/4)
    
    Abstract
    Presents a new method for computing a thesaurus from a text corpus. Each word is represented as a vector in a multi-dimensional space that captures cooccurrence information. Words are defined to be similar if they have similar cooccurrence patterns. 2 different methods for using these thesaurus vectors in information retrieval are shown to significantly improve performance over the Tipster reference corpus as compared to a vector space baseline
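    The core step — words are similar if their co-occurrence rows are similar — is compact to sketch; the windowed counting and SVD compression used in the paper are omitted:

```python
import numpy as np

docs = ["car engine fuel", "car road driver", "engine fuel oil",
        "road driver traffic", "oil fuel engine"]
vocab = sorted({w for d in docs for w in d.split()})
idx = {w: i for i, w in enumerate(vocab)}

# Term-term co-occurrence within a document (a crude stand-in for the
# paper's windowed counts).
C = np.zeros((len(vocab), len(vocab)))
for d in docs:
    ws = d.split()
    for a in ws:
        for b in ws:
            if a != b:
                C[idx[a], idx[b]] += 1

def neighbours(word, k=3):
    """Words whose co-occurrence pattern (row of C) is most similar."""
    v = C[idx[word]]
    sims = C @ v / (np.linalg.norm(C, axis=1) * np.linalg.norm(v) + 1e-12)
    sims[idx[word]] = -1
    return [vocab[i] for i in np.argsort(-sims)[:k]]

print(neighbours("fuel"))
```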
  9. Bradford, R.B.: Relationship discovery in large text collections using Latent Semantic Indexing (2006) 0.23
    0.22967187 = product of:
      0.30622914 = sum of:
        0.2009994 = weight(_text_:vectors in 1163) [ClassicSimilarity], result of:
          0.2009994 = score(doc=1163,freq=4.0), product of:
            0.41416162 = queryWeight, product of:
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.053336553 = queryNorm
            0.48531634 = fieldWeight in 1163, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.03125 = fieldNorm(doc=1163)
        0.090777025 = weight(_text_:space in 1163) [ClassicSimilarity], result of:
          0.090777025 = score(doc=1163,freq=4.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.3261486 = fieldWeight in 1163, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.03125 = fieldNorm(doc=1163)
        0.014452733 = product of:
          0.028905466 = sum of:
            0.028905466 = weight(_text_:22 in 1163) [ClassicSimilarity], result of:
              0.028905466 = score(doc=1163,freq=2.0), product of:
                0.18677552 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.053336553 = queryNorm
                0.15476047 = fieldWeight in 1163, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1163)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    This paper addresses the problem of information discovery in large collections of text. For users, one of the key problems in working with such collections is determining where to focus their attention. In selecting documents for examination, users must be able to formulate reasonably precise queries. Queries that are too broad will greatly reduce the efficiency of information discovery efforts by overwhelming the users with peripheral information. In order to formulate efficient queries, a mechanism is needed to automatically alert users regarding potentially interesting information contained within the collection. This paper presents the results of an experiment designed to test one approach to generation of such alerts. The technique of latent semantic indexing (LSI) is used to identify relationships among entities of interest. Entity extraction software is used to pre-process the text of the collection so that the LSI space contains representation vectors for named entities in addition to those for individual terms. In the LSI space, the cosine of the angle between the representation vectors for two entities captures important information regarding the degree of association of those two entities. For appropriate choices of entities, determining the entity pairs with the highest mutual cosine values yields valuable information regarding the contents of the text collection. The test database used for the experiment consists of 150,000 news articles. The proposed approach for alert generation is tested using a counterterrorism analysis example. The approach is shown to have significant potential for aiding users in rapidly focusing on information of potential importance in large text collections. The approach also has value in identifying possible use of aliases.
    Source
    Proceedings of the Fourth Workshop on Link Analysis, Counterterrorism, and Security, SIAM Data Mining Conference, Bethesda, MD, 20-22 April, 2006. [http://www.siam.org/meetings/sdm06/workproceed/Link%20Analysis/15.pdf]
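    The cosine-in-LSI-space step is easy to sketch with a truncated SVD; the entity extraction and the 150,000-article collection are naturally not reproduced, and the matrix below is random:

```python
import numpy as np

# Toy term/entity-by-document matrix (rows = terms and named entities).
A = np.random.default_rng(1).poisson(0.5, size=(20, 12)).astype(float)

# LSI: truncated SVD; rows of U*S are representation vectors in k dims.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 4
reps = U[:, :k] * S[:k]

def assoc(i, j):
    """Cosine between two representation vectors in the LSI space."""
    return reps[i] @ reps[j] / (
        np.linalg.norm(reps[i]) * np.linalg.norm(reps[j]) + 1e-12)

# Highest-cosine pairs flag potentially related entities.
pairs = [(i, j, assoc(i, j)) for i in range(20) for j in range(i + 1, 20)]
print(sorted(pairs, key=lambda p: -p[2])[:5])
```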
  10. Batorowska, H.; Kaminska-Czubala, B.: Information retrieval support : visualisation of the information space of a document (2014) 0.20
    0.20011163 = product of:
      0.2668155 = sum of:
        0.031390764 = product of:
          0.094172284 = sum of:
            0.094172284 = weight(_text_:objects in 1444) [ClassicSimilarity], result of:
              0.094172284 = score(doc=1444,freq=4.0), product of:
                0.28348756 = queryWeight, product of:
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.053336553 = queryNorm
                0.33219194 = fieldWeight in 1444, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1444)
          0.33333334 = coord(1/3)
        0.15723042 = weight(_text_:space in 1444) [ClassicSimilarity], result of:
          0.15723042 = score(doc=1444,freq=12.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.56490594 = fieldWeight in 1444, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.03125 = fieldNorm(doc=1444)
        0.07819432 = sum of:
          0.049288854 = weight(_text_:model in 1444) [ClassicSimilarity], result of:
            0.049288854 = score(doc=1444,freq=4.0), product of:
              0.2050911 = queryWeight, product of:
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.053336553 = queryNorm
              0.24032663 = fieldWeight in 1444, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.845226 = idf(docFreq=2569, maxDocs=44218)
                0.03125 = fieldNorm(doc=1444)
          0.028905466 = weight(_text_:22 in 1444) [ClassicSimilarity], result of:
            0.028905466 = score(doc=1444,freq=2.0), product of:
              0.18677552 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.053336553 = queryNorm
              0.15476047 = fieldWeight in 1444, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=1444)
      0.75 = coord(3/4)
    
    Abstract
    Acquiring knowledge in any field involves information retrieval, i.e. searching the available documents to identify answers to the queries concerning the selected objects. Knowing the keywords which are the names of the objects will enable situating the user's query in the information space organized as a thesaurus or faceted classification. Objectives: Identification of the areas in the information space which correspond to gaps in the user's personal knowledge or in the domain knowledge might become useful in theory or practice. The aim of this paper is to present a realistic information-space model of a self-authored full-text document on information culture, indexed by the author of this article. Methodology: Having established the relations between the terms, particular modules (sets of terms connected by relations used in facet classification) are situated on a plane, similar to a communication map. Conclusions drawn from the "journey" on the map, which is a visualization of the knowledge contained in the analysed document, are the crucial part of this paper. Results: The direct result of the research is the created model of information-space visualization of a given document (book, article, website). The proposed procedure can practically be used as a new form of representation in order to map the contents of academic books and articles, besides the traditional index form, especially as an e-book auxiliary tool. In teaching, visualization of the information space of a document can be used to help students understand the issues of classification, categorization and representation of new knowledge emerging in the human mind.
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  11. Guerrero, V.P.; Moya Anegón, F. de: Reduction of the dimension of a document space using the fuzzified output of a Kohonen network (2001) 0.20
    0.19889136 = product of:
      0.3977827 = sum of:
        0.30149913 = weight(_text_:vectors in 6935) [ClassicSimilarity], result of:
          0.30149913 = score(doc=6935,freq=4.0), product of:
            0.41416162 = queryWeight, product of:
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.053336553 = queryNorm
            0.72797453 = fieldWeight in 6935, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.046875 = fieldNorm(doc=6935)
        0.09628358 = weight(_text_:space in 6935) [ClassicSimilarity], result of:
          0.09628358 = score(doc=6935,freq=2.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.34593284 = fieldWeight in 6935, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=6935)
      0.5 = coord(2/4)
    
    Abstract
    The vectors used in IR, whether they represent the documents or the terms, are high dimensional, and their dimensionality increases as one approaches real problems. The algorithms used to manipulate them, however, consume rapidly increasing amounts of computational capacity as that dimension grows. We used the Kohonen algorithm and a fuzzification module to perform a fuzzy clustering of the terms. The degrees of membership obtained were used to represent the terms and, by extension, the documents, yielding a smaller number of components that are still endowed with meaning. To test the results, we use a topological classification of sets of transformed and untransformed vectors to check that the same structure underlies both.
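    One way to picture the fuzzified output, assuming fuzzy-c-means-style membership degrees computed from a term's distances to the trained Kohonen units (the paper's exact fuzzification module may differ; the "trained" units below are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(2)
terms = rng.random((50, 300))    # original high-dimensional term vectors
units = rng.random((10, 300))    # stand-ins for trained Kohonen map units

def fuzzy_memberships(x, units, m=2.0):
    """Fuzzy-c-means-style membership of x in each unit (sums to 1)."""
    d = np.linalg.norm(units - x, axis=1) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum()

# Each term is re-represented by 10 membership degrees instead of 300
# dimensions; documents inherit the reduced representation from their terms.
reduced = np.array([fuzzy_memberships(t, units) for t in terms])
print(reduced.shape)  # (50, 10)
```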
  12. Duwairi, R.M.: Machine learning for Arabic text categorization (2006) 0.20
    0.19889136 = product of:
      0.3977827 = sum of:
        0.30149913 = weight(_text_:vectors in 5115) [ClassicSimilarity], result of:
          0.30149913 = score(doc=5115,freq=4.0), product of:
            0.41416162 = queryWeight, product of:
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.053336553 = queryNorm
            0.72797453 = fieldWeight in 5115, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.046875 = fieldNorm(doc=5115)
        0.09628358 = weight(_text_:space in 5115) [ClassicSimilarity], result of:
          0.09628358 = score(doc=5115,freq=2.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.34593284 = fieldWeight in 5115, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=5115)
      0.5 = coord(2/4)
    
    Abstract
    In this article we propose a distance-based classifier for categorizing Arabic text. Each category is represented as a vector of words in an m-dimensional space, and documents are classified on the basis of their closeness to feature vectors of categories. The classifier, in its learning phase, scans the set of training documents to extract features of categories that capture inherent category-specific properties; in its testing phase the classifier uses previously determined category-specific features to categorize unclassified documents. Stemming was used to reduce the dimensionality of feature vectors of documents. The accuracy of the classifier was tested by carrying out several categorization tasks on an in-house collected Arabic corpus. The results show that the proposed classifier is very accurate and robust.
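    The classifier reduces to a nearest-centroid scheme, sketched below with random vectors standing in for stemmed Arabic documents; the cosine distance and the omitted preprocessing are our assumptions:

```python
import numpy as np

def train_centroids(X, y):
    """Category feature vectors = mean of each class's training vectors."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def classify(x, centroids):
    """Assign x to the category whose centroid is closest (highest cosine)."""
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    return max(centroids, key=lambda c: cos(x, centroids[c]))

rng = np.random.default_rng(3)
X = rng.random((30, 100))            # toy document-term vectors
y = np.array([0] * 15 + [1] * 15)    # two categories
centroids = train_centroids(X, y)
print(classify(rng.random(100), centroids))
```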
  13. Wong, S.K.M.; Yao, Y.Y.; Salton, G.; Buckley, C.: Evaluation of an adaptive linear model (1991) 0.17
    0.16677245 = product of:
      0.3335449 = sum of:
        0.28425607 = weight(_text_:vectors in 4836) [ClassicSimilarity], result of:
          0.28425607 = score(doc=4836,freq=2.0), product of:
            0.41416162 = queryWeight, product of:
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.053336553 = queryNorm
            0.6863409 = fieldWeight in 4836, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.0625 = fieldNorm(doc=4836)
        0.049288854 = product of:
          0.09857771 = sum of:
            0.09857771 = weight(_text_:model in 4836) [ClassicSimilarity], result of:
              0.09857771 = score(doc=4836,freq=4.0), product of:
                0.2050911 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.053336553 = queryNorm
                0.48065326 = fieldWeight in 4836, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4836)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Reports on the experimental evaluation of an adaptive linear model that constructs improved user query vectors from user preference judgements on a sample set of documents. The performance of this method is compared with that of the standard relevance feedback techniques. The experimental results seem to demonstrate the effectiveness of the adaptive method.
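    The adaptive step can be read as a perceptron-style update over pairwise preference judgements: whenever a preferred document scores no higher than a less-preferred one, the query vector moves toward their difference. This is a sketch of the general approach, not the authors' exact algorithm:

```python
import numpy as np

def adapt_query(q, prefs, docs, lr=0.5, epochs=50):
    """prefs: (i, j) pairs meaning the user prefers docs[i] over docs[j]."""
    q = q.copy()
    for _ in range(epochs):
        for i, j in prefs:
            if q @ docs[i] <= q @ docs[j]:     # preference violated
                q += lr * (docs[i] - docs[j])  # nudge q toward the preferred doc
    return q

rng = np.random.default_rng(4)
docs = rng.random((6, 20))
q0 = rng.random(20)
q1 = adapt_query(q0, prefs=[(0, 3), (1, 4), (2, 5)], docs=docs)
print(sorted(range(6), key=lambda k: -(q1 @ docs[k])))  # preferred docs rank higher
```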
  14. Ozkarahan, E.: Multimedia document retrieval (1995) 0.16
    0.16452442 = product of:
      0.2193659 = sum of:
        0.038844086 = product of:
          0.11653225 = sum of:
            0.11653225 = weight(_text_:objects in 1492) [ClassicSimilarity], result of:
              0.11653225 = score(doc=1492,freq=2.0), product of:
                0.28348756 = queryWeight, product of:
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.053336553 = queryNorm
                0.41106653 = fieldWeight in 1492, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1492)
          0.33333334 = coord(1/3)
        0.11233084 = weight(_text_:space in 1492) [ClassicSimilarity], result of:
          0.11233084 = score(doc=1492,freq=2.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.4035883 = fieldWeight in 1492, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1492)
        0.068190955 = product of:
          0.13638191 = sum of:
            0.13638191 = weight(_text_:model in 1492) [ClassicSimilarity], result of:
              0.13638191 = score(doc=1492,freq=10.0), product of:
                0.2050911 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.053336553 = queryNorm
                0.6649821 = fieldWeight in 1492, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1492)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Develops an integrated conceptual representation scheme for multimedia documents that are viewed as comprising an object-oriented database. Develops the necessary abstractions for the conceptual model and extensions to the RM/T relational model used as the search structure. Develops a retrieval model in which the database search space is first narrowed down, based on the user query, by an associative search. The associative search is followed by semantic and media-specific searches. A query language called SQLX is introduced to formulate these searches directly from the conceptual model. In SQLX, connector attributes replace joins, and abstract data types enable the use of objects and their methods in query formulation. Describes a temporal model for time-dependent presentations, along with directions for future work.
  15. Tomassen, S.L.: Research on ontology-driven information retrieval (2006 (?)) 0.15
    0.15473782 = product of:
      0.30947563 = sum of:
        0.21319206 = weight(_text_:vectors in 4328) [ClassicSimilarity], result of:
          0.21319206 = score(doc=4328,freq=2.0), product of:
            0.41416162 = queryWeight, product of:
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.053336553 = queryNorm
            0.5147557 = fieldWeight in 4328, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.046875 = fieldNorm(doc=4328)
        0.09628358 = weight(_text_:space in 4328) [ClassicSimilarity], result of:
          0.09628358 = score(doc=4328,freq=2.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.34593284 = fieldWeight in 4328, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.046875 = fieldNorm(doc=4328)
      0.5 = coord(2/4)
    
    Abstract
    An increasing number of recent information retrieval systems make use of ontologies to help the users clarify their information needs and come up with semantic representations of documents. A particular concern here is the integration of these semantic approaches with traditional search technology. The research presented in this paper examines how ontologies can be efficiently applied to large-scale search systems for the web. We describe how these systems can be enriched with adapted ontologies to provide both an in-depth understanding of the user's needs as well as an easy integration with standard vector-space retrieval systems. The ontology concepts are adapted to the domain terminology by computing a feature vector for each concept. Later, the feature vectors are used to enrich a provided query. The whole retrieval system is under development as part of a larger Semantic Web standardization project for the Norwegian oil & gas sector.
  16. Xiong, C.: Knowledge based text representations for information retrieval (2016) 0.14
    0.14422366 = product of:
      0.1922982 = sum of:
        0.05647506 = product of:
          0.16942517 = sum of:
            0.16942517 = weight(_text_:3a in 5820) [ClassicSimilarity], result of:
              0.16942517 = score(doc=5820,freq=2.0), product of:
                0.4521879 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.053336553 = queryNorm
                0.3746787 = fieldWeight in 5820, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5820)
          0.33333334 = coord(1/3)
        0.1111787 = weight(_text_:space in 5820) [ClassicSimilarity], result of:
          0.1111787 = score(doc=5820,freq=6.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.39944884 = fieldWeight in 5820, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.03125 = fieldNorm(doc=5820)
        0.024644427 = product of:
          0.049288854 = sum of:
            0.049288854 = weight(_text_:model in 5820) [ClassicSimilarity], result of:
              0.049288854 = score(doc=5820,freq=4.0), product of:
                0.2050911 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.053336553 = queryNorm
                0.24032663 = fieldWeight in 5820, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5820)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The successes of information retrieval (IR) in recent decades were built upon bag-of-words representations. Effective as it is, bag-of-words is only a shallow text understanding; there is a limited amount of information for document ranking in the word space. This dissertation goes beyond words and builds knowledge based text representations, which embed the external and carefully curated information from knowledge bases, and provide richer and structured evidence for more advanced information retrieval systems. This thesis research first builds query representations with entities associated with the query. Entities' descriptions are used by query expansion techniques that enrich the query with explanation terms. Then we present a general framework that represents a query with entities that appear in the query, are retrieved by the query, or frequently show up in the top retrieved documents. A latent space model is developed to jointly learn the connections from query to entities and the ranking of documents, modeling the external evidence from knowledge bases and internal ranking features cooperatively. To further improve the quality of relevant entities, a defining factor of our query representations, we introduce learning to rank to entity search and retrieve better entities from knowledge bases. In the document representation part, this thesis research also moves one step forward with a bag-of-entities model, in which documents are represented by their automatic entity annotations, and the ranking is performed in the entity space.
    Content
    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Language and Information Technologies. Cf.: https://www.cs.cmu.edu/~cx/papers/knowledge_based_text_representation.pdf
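    The bag-of-entities ranking the thesis describes reduces to scoring in entity space; a sketch with hand-made entity annotations (the thesis learns considerably richer representations than this):

```python
import numpy as np

entities = ["Barack_Obama", "Hawaii", "United_States", "Harvard"]  # column labels
# Rows = documents, columns = entity-annotation frequencies.
docs = np.array([[3, 1, 2, 0],
                 [0, 0, 1, 4],
                 [1, 2, 0, 0]], dtype=float)
query = np.array([1, 0, 1, 0], dtype=float)   # entities linked in the query

# Ranking performed in the entity space (a plain dot product here).
scores = docs @ query
print(np.argsort(-scores))  # document 0 matches the query entities best
```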
  17. Mu, T.; Goulermas, J.Y.; Korkontzelos, I.; Ananiadou, S.: Descriptive document clustering via discriminant learning in a co-embedded space of multilevel similarities (2016) 0.14
    0.14137648 = product of:
      0.18850197 = sum of:
        0.027745776 = product of:
          0.08323733 = sum of:
            0.08323733 = weight(_text_:objects in 2496) [ClassicSimilarity], result of:
              0.08323733 = score(doc=2496,freq=2.0), product of:
                0.28348756 = queryWeight, product of:
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.053336553 = queryNorm
                0.29361898 = fieldWeight in 2496, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2496)
          0.33333334 = coord(1/3)
        0.13897339 = weight(_text_:space in 2496) [ClassicSimilarity], result of:
          0.13897339 = score(doc=2496,freq=6.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.49931106 = fieldWeight in 2496, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2496)
        0.0217828 = product of:
          0.0435656 = sum of:
            0.0435656 = weight(_text_:model in 2496) [ClassicSimilarity], result of:
              0.0435656 = score(doc=2496,freq=2.0), product of:
                0.2050911 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.053336553 = queryNorm
                0.21242073 = fieldWeight in 2496, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2496)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Descriptive document clustering aims at discovering clusters of semantically interrelated documents together with meaningful labels to summarize the content of each document cluster. In this work, we propose a novel descriptive clustering framework, referred to as CEDL. It relies on the formulation and generation of 2 types of heterogeneous objects, which correspond to documents and candidate phrases, using multilevel similarity information. CEDL is composed of 5 main processing stages. First, it simultaneously maps the documents and candidate phrases into a common co-embedded space that preserves higher-order, neighbor-based proximities between the combined sets of documents and phrases. Then, it discovers an approximate cluster structure of documents in the common space. The third stage extracts promising topic phrases by constructing a discriminant model where documents along with their cluster memberships are used as training instances. Subsequently, the final cluster labels are selected from the topic phrases using a ranking scheme using multiple scores based on the extracted co-embedding information and the discriminant output. The final stage polishes the initial clusters to reduce noise and accommodate the multitopic nature of documents. The effectiveness and competitiveness of CEDL is demonstrated qualitatively and quantitatively with experiments using document databases from different application fields.
  18. Dominich, S.: Interaction information retrieval (1994) 0.14
    0.13960999 = product of:
      0.27921999 = sum of:
        0.24872407 = weight(_text_:vectors in 8157) [ClassicSimilarity], result of:
          0.24872407 = score(doc=8157,freq=2.0), product of:
            0.41416162 = queryWeight, product of:
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.053336553 = queryNorm
            0.6005483 = fieldWeight in 8157, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.0546875 = fieldNorm(doc=8157)
        0.030495923 = product of:
          0.060991846 = sum of:
            0.060991846 = weight(_text_:model in 8157) [ClassicSimilarity], result of:
              0.060991846 = score(doc=8157,freq=2.0), product of:
                0.2050911 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.053336553 = queryNorm
                0.29738903 = fieldWeight in 8157, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=8157)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    In existing information retrieval models there are three different ways documents are represented for retrieval purposes: vectors of weights, collections of sentences, and artificial neurons. Accordingly, retrieval depends on a similarity function, or means an inference, or is a spreading of activation. Relevancy is considered to be a critical modelling parameter which is either modelled a priori or not treated at all. Assuming that relevancy may equally be an emergent entity, thus not requiring any a priori modelling, the paper proposes the Interaction Information Retrieval model in which documents are interconnected, queries and documents are treated in the same way, and retrieval is the result of the interconnection between query and documents. Algorithms and experiences gained with practical applications are presented. A theoretical mathematical formulation of this type of retrieval is also given.
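    Retrieval as interconnection can be sketched as activation spreading from a query node over a query-document graph until it settles; using cosines as connection weights is our simplification:

```python
import numpy as np

rng = np.random.default_rng(5)
vecs = rng.random((7, 12))                      # row 0 = query, rows 1..6 = documents
norms = np.linalg.norm(vecs, axis=1)
W = (vecs @ vecs.T) / np.outer(norms, norms)    # interconnection weights (cosine)
np.fill_diagonal(W, 0.0)

a = np.zeros(7)
a[0] = 1.0                                      # activate the query node
for _ in range(10):                             # spread and decay until quasi-stable
    a = 0.5 * a + 0.5 * (W @ a) / W.sum(axis=1)
print(np.argsort(-a[1:]) + 1)                   # documents by settled activation
```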
  19. Kontostathis, A.; Pottenger, W.M.: ¬A framework for understanding Latent Semantic Indexing (LSI) performance (2006) 0.14
    0.13960999 = product of:
      0.27921999 = sum of:
        0.24872407 = weight(_text_:vectors in 959) [ClassicSimilarity], result of:
          0.24872407 = score(doc=959,freq=2.0), product of:
            0.41416162 = queryWeight, product of:
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.053336553 = queryNorm
            0.6005483 = fieldWeight in 959, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.7650614 = idf(docFreq=50, maxDocs=44218)
              0.0546875 = fieldNorm(doc=959)
        0.030495923 = product of:
          0.060991846 = sum of:
            0.060991846 = weight(_text_:model in 959) [ClassicSimilarity], result of:
              0.060991846 = score(doc=959,freq=2.0), product of:
                0.2050911 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.053336553 = queryNorm
                0.29738903 = fieldWeight in 959, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=959)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    In this paper we present a theoretical model for understanding the performance of Latent Semantic Indexing (LSI) search and retrieval applications. Many models for understanding LSI have been proposed. Ours is the first to study the values produced by LSI in the term-by-dimension vectors. The framework presented here is based on term co-occurrence data. We show a strong correlation between second-order term co-occurrence and the values produced by the Singular Value Decomposition (SVD) algorithm that forms the foundation for LSI. We also present a mathematical proof that the SVD algorithm encapsulates term co-occurrence information.
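    Second-order co-occurrence — two terms sharing co-occurrence partners without necessarily co-occurring themselves — is straightforward to compute; a sketch on a toy incidence matrix (the paper's correlation with the SVD values is not reproduced here):

```python
import numpy as np

# First-order co-occurrence: C1[i, j] = number of documents where terms
# i and j co-occur (A is a toy term-document incidence matrix).
A = np.array([[1, 1, 0, 0],
              [0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1]], dtype=float)
C1 = A @ A.T
np.fill_diagonal(C1, 0)

# Second-order co-occurrence: terms i and j share a co-occurrence partner k.
C2 = (C1 @ C1 > 0).astype(float)
np.fill_diagonal(C2, 0)
print(C2)  # terms 0 and 2 never co-occur yet are linked through shared neighbours
```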
  20. Li, D.; Kwong, C.-P.; Lee, D.L.: Unified linear subspace approach to semantic analysis (2009) 0.14
    0.13921893 = product of:
      0.27843785 = sum of:
        0.24070893 = weight(_text_:space in 3321) [ClassicSimilarity], result of:
          0.24070893 = score(doc=3321,freq=18.0), product of:
            0.27833027 = queryWeight, product of:
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.053336553 = queryNorm
            0.86483204 = fieldWeight in 3321, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              5.2183776 = idf(docFreq=650, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3321)
        0.03772892 = product of:
          0.07545784 = sum of:
            0.07545784 = weight(_text_:model in 3321) [ClassicSimilarity], result of:
              0.07545784 = score(doc=3321,freq=6.0), product of:
                0.2050911 = queryWeight, product of:
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.053336553 = queryNorm
                0.36792353 = fieldWeight in 3321, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.845226 = idf(docFreq=2569, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3321)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The Basic Vector Space Model (BVSM) is well known in information retrieval. Unfortunately, its retrieval effectiveness is limited because it is based on literal term matching. The Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) are two prominent semantic retrieval methods, both of which assume there is some underlying latent semantic structure in a dataset that can be used to improve retrieval performance. However, while this structure may be derived from both the term space and the document space, GVSM exploits only the former and LSI the latter. In this article, the latent semantic structure of a dataset is examined from a dual perspective; namely, we consider the term space and the document space simultaneously. This new viewpoint has a natural connection to the notion of kernels. Specifically, a unified kernel function can be derived for a class of vector space models. The dual perspective provides a deeper understanding of the semantic space and makes transparent the geometrical meaning of the unified kernel function. New semantic analysis methods based on the unified kernel function are developed, which combine the advantages of LSI and GVSM. We also prove that the new methods are stable because although the selected rank of the truncated Singular Value Decomposition (SVD) is far from the optimum, the retrieval performance will not be degraded significantly. Experiments performed on standard test collections show that our methods are promising.
    Object
    Generalized Vector Space Model
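    The dual view is concrete in matrix terms: with a term-document matrix A, the term-space kernel is A A^T and the document-space kernel is A^T A; GVSM scores documents through term correlations, while LSI truncates the SVD. A small sketch contrasting both with the basic VSM baseline, on random data:

```python
import numpy as np

A = np.random.default_rng(6).poisson(0.6, size=(15, 8)).astype(float)  # terms x docs
q = np.zeros(15)
q[[2, 5]] = 1.0                     # toy query over terms 2 and 5

bvsm = A.T @ q                      # BVSM: literal term matching
gvsm = A.T @ (A @ A.T) @ q          # GVSM: matching through term correlations A A^T

U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 3
Ak = (U[:, :k] * S[:k]) @ Vt[:k]    # LSI: rank-k reconstruction of A
lsi = Ak.T @ q

for name, s in [("BVSM", bvsm), ("GVSM", gvsm), ("LSI", lsi)]:
    print(name, np.argsort(-s))
```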

Types

  • a 5888
  • m 508
  • el 378
  • s 233
  • x 55
  • b 42
  • r 34
  • i 25
  • n 13
  • ? 8
  • p 7
  • d 4
  • u 2
  • z 2
  • au 1
  • h 1