Search (293 results, page 1 of 15)

  • Active filter: theme_ss:"Retrievalalgorithmen"
  1. Lee, J.; Min, J.-K.; Oh, A.; Chung, C.-W.: Effective ranking and search techniques for Web resources considering semantic relationships (2014) 0.11
    0.11097769 = product of:
      0.18496281 = sum of:
        0.033088673 = weight(_text_:retrieval in 2670) [ClassicSimilarity], result of:
          0.033088673 = score(doc=2670,freq=4.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.23632148 = fieldWeight in 2670, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2670)
        0.13261695 = weight(_text_:semantic in 2670) [ClassicSimilarity], result of:
          0.13261695 = score(doc=2670,freq=18.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.68907446 = fieldWeight in 2670, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2670)
        0.019257195 = product of:
          0.03851439 = sum of:
            0.03851439 = weight(_text_:web in 2670) [ClassicSimilarity], result of:
              0.03851439 = score(doc=2670,freq=4.0), product of:
                0.15105948 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.04628742 = queryNorm
                0.25496176 = fieldWeight in 2670, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2670)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
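The indented breakdown above is Lucene's explain() output for its ClassicSimilarity (TF-IDF) scoring. A minimal sketch of the arithmetic, assuming the standard ClassicSimilarity definitions (tf = sqrt(termFreq), idf = 1 + ln(maxDocs/(docFreq+1))), reproduces the first term's contribution; queryNorm and fieldNorm are taken as given, since they are fixed at query time and index time respectively:

```python
import math

# Sketch of Lucene ClassicSimilarity (TF-IDF) arithmetic, reproducing
# the "retrieval" branch of the explain() tree above.

def idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    tf = math.sqrt(freq)                       # tf = sqrt(termFreq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm       # query-side weight
    field_weight = tf * term_idf * field_norm  # document-side weight
    return query_weight * field_weight

# "retrieval" in doc 2670: freq=4, idf(5836, 44218), fieldNorm=0.0390625
print(term_score(4.0, 5836, 44218, 0.04628742, 0.0390625))  # ~0.03308867
```

The displayed document score is then the coord-scaled sum of such per-term products: 0.18496281 × 0.6 ≈ 0.11097769 for this entry.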
    
    Abstract
    On the Semantic Web, the types of resources and the semantic relationships between resources are defined in an ontology. By using that information, the accuracy of information retrieval can be improved. In this paper, we present effective ranking and search techniques that consider the semantic relationships in an ontology. Our technique retrieves the top-k resources most relevant to the query keywords through semantic relationships. To do this, we propose a weighting measure for semantic relationships. Based on this measure, we propose a novel ranking method that considers the number of meaningful semantic relationships between a resource and the keywords, as well as the coverage and discriminating power of the keywords. In order to improve the efficiency of the search, we prune unnecessary search space using length and weight thresholds on semantic relationship paths. In addition, we exploit the Threshold Algorithm, based on an extended inverted index, to answer top-k queries efficiently. The experimental results using real data sets demonstrate that our retrieval method, which uses the semantic information, generates accurate results more efficiently than the traditional methods.
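The top-k step in the abstract above builds on the Threshold Algorithm. As a rough illustration of that family, here is a minimal sketch of Fagin-style TA over per-keyword posting lists sorted by descending weight; the sum aggregate and the data layout are assumptions for the sketch, not the paper's extended inverted index:

```python
import heapq

# Minimal Threshold Algorithm (TA) sketch for top-k aggregation. Each
# keyword has a list of (resource, weight) pairs sorted by descending
# weight; the aggregate score is a plain sum (an assumption).

def threshold_topk(sorted_lists, k):
    lookup = [dict(lst) for lst in sorted_lists]  # random-access index
    seen, topk = set(), []    # topk: min-heap of (score, resource)
    for depth in range(max(len(l) for l in sorted_lists)):
        frontier = []
        for lst in sorted_lists:
            if depth >= len(lst):
                frontier.append(0.0)
                continue
            res, w = lst[depth]
            frontier.append(w)
            if res not in seen:                   # random access: full score
                seen.add(res)
                score = sum(d.get(res, 0.0) for d in lookup)
                heapq.heappush(topk, (score, res))
                if len(topk) > k:
                    heapq.heappop(topk)
        threshold = sum(frontier)  # best score any unseen resource can reach
        if len(topk) == k and topk[0][0] >= threshold:
            break                  # k-th best already beats the threshold
    return sorted(topk, reverse=True)

lists = [[("r1", 0.9), ("r2", 0.6), ("r3", 0.2)],
         [("r2", 0.8), ("r1", 0.3), ("r3", 0.1)]]
print(threshold_topk(lists, 2))  # [(1.4, 'r2'), (1.2, 'r1')]
```

Sorted access stops as soon as the current k-th best score matches or exceeds the threshold, i.e. the best score any still-unseen resource could possibly achieve.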
  2. Koumenides, C.L.; Shadbolt, N.R.: Ranking methods for entity-oriented semantic web search (2014) 0.10
    0.10134516 = product of:
      0.1689086 = sum of:
        0.03970641 = weight(_text_:retrieval in 1280) [ClassicSimilarity], result of:
          0.03970641 = score(doc=1280,freq=4.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.2835858 = fieldWeight in 1280, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1280)
        0.10609356 = weight(_text_:semantic in 1280) [ClassicSimilarity], result of:
          0.10609356 = score(doc=1280,freq=8.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.5512596 = fieldWeight in 1280, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.046875 = fieldNorm(doc=1280)
        0.023108633 = product of:
          0.046217266 = sum of:
            0.046217266 = weight(_text_:web in 1280) [ClassicSimilarity], result of:
              0.046217266 = score(doc=1280,freq=4.0), product of:
                0.15105948 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.04628742 = queryNorm
                0.3059541 = fieldWeight in 1280, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1280)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    This article provides a technical review of semantic search methods used to support text-based search over formal Semantic Web knowledge bases. Our focus is on ranking methods and auxiliary processes explored by existing semantic search systems, outlined within broad areas of classification. We present reflective examples from the literature in some detail, which should appeal to readers interested in a deeper perspective on the various methods and systems surveyed. The presentation covers graph exploration and propagation methods, adaptations of classic probabilistic retrieval models, and query-independent link analysis via flexible extensions to the PageRank algorithm. Future research directions are discussed, including the development of more cohesive retrieval models to unlock further potential and uses, data indexing schemes, integration with user interfaces, and building community consensus for more systematic evaluation and gradual development.
  3. Ning, X.; Jin, H.; Jia, W.; Yuan, P.: Practical and effective IR-style keyword search over semantic web (2009) 0.10
    0.09648418 = product of:
      0.16080695 = sum of:
        0.04632414 = weight(_text_:retrieval in 4213) [ClassicSimilarity], result of:
          0.04632414 = score(doc=4213,freq=4.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.33085006 = fieldWeight in 4213, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4213)
        0.08752273 = weight(_text_:semantic in 4213) [ClassicSimilarity], result of:
          0.08752273 = score(doc=4213,freq=4.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.45476598 = fieldWeight in 4213, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4213)
        0.026960073 = product of:
          0.053920146 = sum of:
            0.053920146 = weight(_text_:web in 4213) [ClassicSimilarity], result of:
              0.053920146 = score(doc=4213,freq=4.0), product of:
                0.15105948 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.04628742 = queryNorm
                0.35694647 = fieldWeight in 4213, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4213)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    This paper presents a novel IR-style keyword search model for semantic web data retrieval, distinguished from current retrieval methods. In this model, an answer to a keyword query is a connected subgraph that contains all the query keywords. In addition, the answer is minimal: no proper subgraph of it can itself be an answer to the query. We provide an approximation algorithm to retrieve these answers efficiently. A special ranking strategy is also proposed so that answers can be appropriately ordered. Experimental results over real datasets show that our model outperforms existing solutions with respect to both effectiveness and efficiency.
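A common way to approximate such minimal connected answers is to root the answer tree at a node minimizing the total shortest-path distance to one keyword match per query keyword, a standard heuristic for the underlying group Steiner problem. A sketch under that assumption (the paper's own approximation algorithm and ranking strategy are more refined):

```python
from collections import deque

# Heuristic for keyword search over a graph: find a root node with
# minimal total shortest-path distance to one match per keyword; the
# union of those paths forms a connected answer tree.

def bfs_dist(graph, sources):
    dist = {s: 0 for s in sources}
    q = deque(sources)
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def keyword_answer_root(graph, keyword_matches):
    # keyword_matches: list of node sets, one set per query keyword
    dists = [bfs_dist(graph, m) for m in keyword_matches]
    best, best_cost = None, float("inf")
    for node in graph:
        if all(node in d for d in dists):
            cost = sum(d[node] for d in dists)  # total path length
            if cost < best_cost:
                best, best_cost = node, cost
    return best, best_cost

g = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b"]}
print(keyword_answer_root(g, [{"c"}, {"d"}]))  # ('b', 2)
```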
  4. Habernal, I.; Konopík, M.; Rohlík, O.: Question answering (2012) 0.09
    0.088055015 = product of:
      0.14675835 = sum of:
        0.048630223 = weight(_text_:retrieval in 101) [ClassicSimilarity], result of:
          0.048630223 = score(doc=101,freq=6.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.34732026 = fieldWeight in 101, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=101)
        0.075019486 = weight(_text_:semantic in 101) [ClassicSimilarity], result of:
          0.075019486 = score(doc=101,freq=4.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.38979942 = fieldWeight in 101, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.046875 = fieldNorm(doc=101)
        0.023108633 = product of:
          0.046217266 = sum of:
            0.046217266 = weight(_text_:web in 101) [ClassicSimilarity], result of:
              0.046217266 = score(doc=101,freq=4.0), product of:
                0.15105948 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.04628742 = queryNorm
                0.3059541 = fieldWeight in 101, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=101)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Question answering is an area of information retrieval with the added challenge of applying sophisticated techniques to identify the complex syntactic and semantic relationships present in text, in order to provide a more sophisticated and satisfactory response to the user's information needs. For this reason, the authors see question answering as the next step beyond standard information retrieval. This chapter covers state-of-the-art question answering, focusing on an overview of the systems, techniques, and approaches likely to be employed in the next generation of search engines. Special attention is paid to question answering that uses the World Wide Web as its data source and to question answering that exploits the possibilities of the Semantic Web. Considerations about current issues and prospects for promising future research are also provided.
    Source
    Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis et al.
  5. Jindal, V.; Bawa, S.; Batra, S.: ¬A review of ranking approaches for semantic search on Web (2014) 0.09
    0.088055015 = product of:
      0.14675835 = sum of:
        0.048630223 = weight(_text_:retrieval in 2799) [ClassicSimilarity], result of:
          0.048630223 = score(doc=2799,freq=6.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.34732026 = fieldWeight in 2799, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2799)
        0.075019486 = weight(_text_:semantic in 2799) [ClassicSimilarity], result of:
          0.075019486 = score(doc=2799,freq=4.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.38979942 = fieldWeight in 2799, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.046875 = fieldNorm(doc=2799)
        0.023108633 = product of:
          0.046217266 = sum of:
            0.046217266 = weight(_text_:web in 2799) [ClassicSimilarity], result of:
              0.046217266 = score(doc=2799,freq=4.0), product of:
                0.15105948 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.04628742 = queryNorm
                0.3059541 = fieldWeight in 2799, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2799)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    With ever more information available to end users, search engines have become the most powerful tools for obtaining useful information scattered across the Web. However, it is common for even the most renowned search engines to return result sets containing pages of little use to the user. Research on semantic search aims to improve traditional information search and retrieval methods, in which the basic relevance criteria rely primarily on the presence of query keywords within the returned pages. This work explores different relevance-ranking approaches based on semantics that are considered appropriate for the retrieval of relevant information. Various pilot projects and their outcomes are investigated with respect to the methodologies adopted and their most distinctive ranking characteristics. An overview of selected approaches and a comparison by means of classification criteria are presented; with the help of this comparison, some common concepts and outstanding features are identified.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  6. Bhansali, D.; Desai, H.; Deulkar, K.: ¬A study of different ranking approaches for semantic search (2015) 0.08
    0.08123621 = product of:
      0.13539368 = sum of:
        0.023397226 = weight(_text_:retrieval in 2696) [ClassicSimilarity], result of:
          0.023397226 = score(doc=2696,freq=2.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.16710453 = fieldWeight in 2696, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2696)
        0.0884113 = weight(_text_:semantic in 2696) [ClassicSimilarity], result of:
          0.0884113 = score(doc=2696,freq=8.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.45938298 = fieldWeight in 2696, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2696)
        0.02358515 = product of:
          0.0471703 = sum of:
            0.0471703 = weight(_text_:web in 2696) [ClassicSimilarity], result of:
              0.0471703 = score(doc=2696,freq=6.0), product of:
                0.15105948 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.04628742 = queryNorm
                0.3122631 = fieldWeight in 2696, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2696)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Search engines have become an integral part of our day-to-day life, and our reliance on them increases with every passing day. With the amount of data available on the Internet increasing exponentially, it becomes important to develop new methods and tools that help return results relevant to the queries and reduce the time spent on searching. The results should be diverse, but at the same time focused on the queries asked. Relation-based PageRank [4] algorithms are considered to be the next frontier in the improvement of Semantic Web search: the probability of finding relevance in the search results, as posited by the user while entering the query, is used to measure relevance. However, their application is limited by the complexity of determining the relation between terms and of assigning an explicit meaning to each term. TrustRank is one of the most widely used ranking algorithms for semantic web search; a few others, such as the HITS and PageRank algorithms, are also used for Semantic Web searching. In this paper, we provide a comparison of several ranking approaches.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
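For reference, the PageRank baseline that Relation-based PageRank and TrustRank adapt can be sketched as a plain power iteration; this is the generic form, not any of the semantic variants compared in the entry above:

```python
# Minimal PageRank power-iteration sketch over a link graph.

def pagerank(links, damping=0.85, iters=50):
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u, outs in links.items():
            if not outs:                      # dangling node: spread evenly
                for v in nodes:
                    new[v] += damping * rank[u] / n
            else:
                for v in outs:                # share rank along out-links
                    new[v] += damping * rank[u] / len(outs)
        rank = new
    return rank

web = {"p1": ["p2", "p3"], "p2": ["p3"], "p3": ["p1"]}
print(pagerank(web))  # p3 accumulates the most rank
```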
  7. Kulyukin, V.A.; Settle, A.: Ranked retrieval with semantic networks and vector spaces (2001) 0.07
    0.07493864 = product of:
      0.18734659 = sum of:
        0.064840294 = weight(_text_:retrieval in 6934) [ClassicSimilarity], result of:
          0.064840294 = score(doc=6934,freq=6.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.46309367 = fieldWeight in 6934, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=6934)
        0.1225063 = weight(_text_:semantic in 6934) [ClassicSimilarity], result of:
          0.1225063 = score(doc=6934,freq=6.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.63653976 = fieldWeight in 6934, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0625 = fieldNorm(doc=6934)
      0.4 = coord(2/5)
    
    Abstract
    The equivalence of semantic networks with spreading activation and vector spaces with dot product is investigated under ranked retrieval. Semantic networks are viewed as networks of concepts organized in terms of abstraction and packaging relations. It is shown that the two models can be effectively constructed from each other. A formal method is suggested to analyze the models in terms of their relative performance in the same universe of objects
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
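The equivalence investigated in the entry above can be seen in miniature: one pulse of spreading activation from query-term nodes through a bipartite term-document network yields exactly the scores of a vector-space dot product. A toy rendering, not the paper's formal construction:

```python
# Toy equivalence of spreading activation and dot-product ranking.

docs = {"d1": {"ranking": 2.0, "semantic": 1.0},
        "d2": {"semantic": 3.0},
        "d3": {"vector": 1.0, "ranking": 1.0}}
query = {"ranking": 1.0, "semantic": 0.5}

# Vector space: score = dot product of query and document vectors.
dot = {d: sum(w * query.get(t, 0.0) for t, w in terms.items())
       for d, terms in docs.items()}

# Spreading activation: activate term nodes with query weights, then
# propagate one pulse along weighted term->document links.
activation = {d: 0.0 for d in docs}
for d, terms in docs.items():
    for t, w in terms.items():
        activation[d] += query.get(t, 0.0) * w

print(dot == activation)  # True: identical scores, identical ranking
```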
  8. Ning, X.; Jin, H.; Wu, H.: RSS: a framework enabling ranked search on the semantic web (2008) 0.07
    0.07405119 = product of:
      0.18512796 = sum of:
        0.14661357 = weight(_text_:semantic in 2069) [ClassicSimilarity], result of:
          0.14661357 = score(doc=2069,freq=22.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.7618005 = fieldWeight in 2069, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2069)
        0.03851439 = product of:
          0.07702878 = sum of:
            0.07702878 = weight(_text_:web in 2069) [ClassicSimilarity], result of:
              0.07702878 = score(doc=2069,freq=16.0), product of:
                0.15105948 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.04628742 = queryNorm
                0.5099235 = fieldWeight in 2069, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2069)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    The semantic web contains not only resources but also the heterogeneous relationships among them, which sharply distinguishes it from the current web. As the semantic web grows, specialized search techniques become increasingly significant. In this paper, we present RSS, a framework for enabling ranked semantic search on the semantic web. In this framework, the heterogeneity of relationships is fully exploited to determine the global importance of resources. In addition, the search results can be greatly expanded with the entities most semantically related to the query, so that users are provided with properly ordered semantic search results obtained by combining global ranking values with the relevance between the resources and the query. The proposed semantic search model, which supports inference, is very different from traditional keyword-based search methods. Moreover, RSS is distinguished from many current methods of accessing semantic web data in that it applies novel ranking strategies to avoid returning search results in disorder. The experimental results show that the framework is feasible and can produce a better ordering of semantic search results than directly applying the standard PageRank algorithm to the semantic web.
    Theme
    Semantic Web
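The combination step described above, mixing a query-independent global importance score with query relevance, can be sketched as follows; the linear mixture and the weight lam are illustrative assumptions, not the RSS formula:

```python
# Sketch of combining a global (query-independent) importance score
# with query relevance when ordering results.

def combined_rank(results, global_rank, relevance, lam=0.5):
    def score(r):
        return lam * global_rank[r] + (1.0 - lam) * relevance[r]
    return sorted(results, key=score, reverse=True)

resources = ["r1", "r2", "r3"]
global_rank = {"r1": 0.6, "r2": 0.3, "r3": 0.1}  # e.g. from link analysis
relevance = {"r1": 0.2, "r2": 0.9, "r3": 0.8}    # query match strength
print(combined_rank(resources, global_rank, relevance))  # ['r2', 'r3', 'r1']
```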
  9. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.07
    0.07123182 = product of:
      0.11871969 = sum of:
        0.040525187 = weight(_text_:retrieval in 1428) [ClassicSimilarity], result of:
          0.040525187 = score(doc=1428,freq=6.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.28943354 = fieldWeight in 1428, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.062516235 = weight(_text_:semantic in 1428) [ClassicSimilarity], result of:
          0.062516235 = score(doc=1428,freq=4.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.32483283 = fieldWeight in 1428, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.015678266 = product of:
          0.031356532 = sum of:
            0.031356532 = weight(_text_:22 in 1428) [ClassicSimilarity], result of:
              0.031356532 = score(doc=1428,freq=2.0), product of:
                0.16209066 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04628742 = queryNorm
                0.19345059 = fieldWeight in 1428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1428)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Humans can make hasty, but generally robust, judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space, information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high-dimensional semantic space, which is automatically constructed from a text corpus. Two approaches are presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and with another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
    Date
    22. 3.2003 19:35:46
    Footnote
    Contribution to a special issue: Mathematical, logical, and formal methods in information retrieval
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
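The representational machinery the entry above relies on, words as co-occurrence vectors in a semantic space with an additive concept combination, can be sketched as follows; the paper's actual priming heuristics and information-flow computations are considerably more elaborate:

```python
import math

# Toy semantic space: words as co-occurrence vectors built from a tiny
# corpus, an additive concept combination, and cosine comparison.

corpus = [["information", "flow", "inference"],
          ["information", "retrieval", "query"],
          ["query", "model", "inference"]]

vocab = sorted({w for doc in corpus for w in doc})
vec = {w: {c: 0 for c in vocab} for w in vocab}
for doc in corpus:                  # count within-document co-occurrences
    for w in doc:
        for c in doc:
            if c != w:
                vec[w][c] += 1

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in vocab)
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Combine two concepts by summing their vectors, then compare the
# combination against a third concept.
combo = {t: vec["information"][t] + vec["flow"][t] for t in vocab}
print(round(cosine(combo, vec["inference"]), 3))  # ~0.53
```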
  10. Qi, Q.; Hessen, D.J.; Heijden, P.G.M. van der: Improving information retrieval through correspondence analysis instead of latent semantic analysis (2023) 0.06
    0.06489877 = product of:
      0.16224691 = sum of:
        0.056153342 = weight(_text_:retrieval in 1045) [ClassicSimilarity], result of:
          0.056153342 = score(doc=1045,freq=8.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.40105087 = fieldWeight in 1045, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1045)
        0.10609356 = weight(_text_:semantic in 1045) [ClassicSimilarity], result of:
          0.10609356 = score(doc=1045,freq=8.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.5512596 = fieldWeight in 1045, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.046875 = fieldNorm(doc=1045)
      0.4 = coord(2/5)
    
    Abstract
    The initial dimensions extracted by latent semantic analysis (LSA) of a document-term matrix have been shown to mainly display marginal effects, which are irrelevant for information retrieval. To improve the performance of LSA, usually the elements of the raw document-term matrix are weighted and the weighting exponent of the singular values can be adjusted. An alternative information retrieval technique that ignores the marginal effects is correspondence analysis (CA). In this paper, the information retrieval performance of LSA and CA is empirically compared. Moreover, it is explored whether the two weightings also improve the performance of CA. The results for four empirical datasets show that CA always performs better than LSA. Weighting the elements of the raw data matrix can improve CA; however, it is data dependent and the improvement is small. Adjusting the singular value weighting exponent often improves the performance of CA; however, the extent of the improvement depends on the dataset and the number of dimensions.
    Object
    Latent Semantic Analysis
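The correspondence analysis the paper compares against LSA can be sketched as an SVD of the standardized residuals of the document-term contingency table, which is exactly the step that removes the marginal effects mentioned in the abstract; the weighting choices below are the textbook ones, not necessarily the exact variants tested:

```python
import numpy as np

# Sketch of correspondence analysis (CA) on a document-term table:
# SVD of the standardized residuals, which removes the marginal
# (row/column mass) effects that LSA's first dimensions pick up.

X = np.array([[2.0, 0.0, 1.0],    # toy document-term counts
              [0.0, 3.0, 1.0],
              [1.0, 1.0, 2.0]])

P = X / X.sum()                    # correspondence matrix
r = P.sum(axis=1)                  # row (document) masses
c = P.sum(axis=0)                  # column (term) masses
S = np.diag(r ** -0.5) @ (P - np.outer(r, c)) @ np.diag(c ** -0.5)

U, sing, Vt = np.linalg.svd(S, full_matrices=False)
k = 2                              # number of retained dimensions
row_coords = np.diag(r ** -0.5) @ U[:, :k] * sing[:k]  # document coordinates

print(np.round(row_coords, 3))
```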
  11. Smeaton, A.F.; Rijsbergen, C.J. van: ¬The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.06
    0.06294786 = product of:
      0.15736966 = sum of:
        0.11347052 = weight(_text_:retrieval in 2134) [ClassicSimilarity], result of:
          0.11347052 = score(doc=2134,freq=6.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.8104139 = fieldWeight in 2134, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.109375 = fieldNorm(doc=2134)
        0.043899145 = product of:
          0.08779829 = sum of:
            0.08779829 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
              0.08779829 = score(doc=2134,freq=2.0), product of:
                0.16209066 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04628742 = queryNorm
                0.5416616 = fieldWeight in 2134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Date
    30. 3.2001 13:32:22
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  12. Chang, C.-H.; Hsu, C.-C.: Integrating query expansion and conceptual relevance feedback for personalized Web information retrieval (1998) 0.06
    0.06250469 = product of:
      0.15626171 = sum of:
        0.04632414 = weight(_text_:retrieval in 1319) [ClassicSimilarity], result of:
          0.04632414 = score(doc=1319,freq=4.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.33085006 = fieldWeight in 1319, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1319)
        0.10993756 = sum of:
          0.06603842 = weight(_text_:web in 1319) [ClassicSimilarity], result of:
            0.06603842 = score(doc=1319,freq=6.0), product of:
              0.15105948 = queryWeight, product of:
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.04628742 = queryNorm
              0.43716836 = fieldWeight in 1319, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1319)
          0.043899145 = weight(_text_:22 in 1319) [ClassicSimilarity], result of:
            0.043899145 = score(doc=1319,freq=2.0), product of:
              0.16209066 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.04628742 = queryNorm
              0.2708308 = fieldWeight in 1319, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1319)
      0.4 = coord(2/5)
    
    Abstract
    Keyword-based querying has been an immediate and efficient way to specify and retrieve related information. However, conventional document ranking based on an automatic assessment of document relevance to the query may not be the best approach when little information is given. Proposes integrating two existing techniques, query expansion and relevance feedback, to achieve a concept-based information search for the Web.
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
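The classical device for combining query expansion with relevance feedback, as the entry above proposes for the Web, is Rocchio reweighting. A minimal sketch with conventional default weights (alpha, beta, and gamma are not values from the paper):

```python
# Minimal sketch of Rocchio relevance feedback: move the query vector
# toward relevant documents and away from nonrelevant ones, expanding
# it with new terms along the way.

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    terms = set(query) | {t for d in relevant + nonrelevant for t in d}
    new_query = {}
    for t in terms:
        pos = sum(d.get(t, 0.0) for d in relevant) / max(len(relevant), 1)
        neg = sum(d.get(t, 0.0) for d in nonrelevant) / max(len(nonrelevant), 1)
        w = alpha * query.get(t, 0.0) + beta * pos - gamma * neg
        if w > 0:
            new_query[t] = w      # expanded, reweighted query vector
    return new_query

q = {"web": 1.0, "search": 1.0}
rel = [{"web": 2.0, "personalized": 1.0}]   # judged relevant
nonrel = [{"spam": 3.0}]                    # judged nonrelevant
print(rocchio(q, rel, nonrel))
# web=2.5, search=1.0, personalized=0.75 (spam is dropped)
```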
  13. Computational information retrieval (2001) 0.06
    0.05972135 = product of:
      0.14930338 = sum of:
        0.07428389 = weight(_text_:retrieval in 4167) [ClassicSimilarity], result of:
          0.07428389 = score(doc=4167,freq=14.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.5305404 = fieldWeight in 4167, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=4167)
        0.075019486 = weight(_text_:semantic in 4167) [ClassicSimilarity], result of:
          0.075019486 = score(doc=4167,freq=4.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.38979942 = fieldWeight in 4167, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.046875 = fieldNorm(doc=4167)
      0.4 = coord(2/5)
    
    Abstract
    This volume contains selected papers that focus on the use of linear algebra, computational statistics, and computer science in the development of algorithms and software systems for text retrieval. Experts in information modeling and retrieval share their perspectives on the design of scalable but precise text retrieval systems, revealing many of the challenges and obstacles that mathematical and statistical models must overcome to be viable for automated text processing. This very useful proceedings volume is an excellent companion for courses in information retrieval, applied linear algebra, and applied statistics. Computational Information Retrieval provides background material on vector space models for text retrieval that applied mathematicians, statisticians, and computer scientists may not be familiar with. For graduate students in these areas, several research questions in information modeling are exposed. In addition, several case studies concerning the efficacy of the popular Latent Semantic Analysis (or Indexing) approach are provided.
    Object
    Latent Semantic Indexing
  14. Urbain, J.; Goharian, N.; Frieder, O.: Probabilistic passage models for semantic search of genomics literature (2008) 0.06
    0.058703244 = product of:
      0.14675811 = sum of:
        0.070191674 = weight(_text_:retrieval in 2380) [ClassicSimilarity], result of:
          0.070191674 = score(doc=2380,freq=18.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.50131357 = fieldWeight in 2380, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2380)
        0.076566435 = weight(_text_:semantic in 2380) [ClassicSimilarity], result of:
          0.076566435 = score(doc=2380,freq=6.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.39783734 = fieldWeight in 2380, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2380)
      0.4 = coord(2/5)
    
    Abstract
    We explore unsupervised learning techniques for extracting semantic information about biomedical concepts and topics, and introduce a passage retrieval model for using these semantics in context to improve genomics literature search. Our contributions include a new passage retrieval model based on an undirected graphical model (Markov Random Fields), and new methods for modeling passage-concepts, document-topics, and passage-terms as potential functions within the model. Each potential function includes distributional evidence to disambiguate topics, concepts, and terms in context. The joint distribution across potential functions in the graph represents the probability of a passage being relevant to a biologist's information need. Relevance ranking within each potential function simplifies normalization across potential functions and eliminates the need for tuning of passage retrieval model parameters. Our dimensional indexing model facilitates efficient aggregation of topic, concept, and term distributions. The proposed passage-retrieval model improves search results in the presence of varying levels of semantic evidence, outperforming models of query terms, concepts, or document topics alone. Our results exceed the state-of-the-art for automatic document retrieval by 14.46% (0.3554 vs. 0.3105) and passage retrieval by 15.57% (0.1128 vs. 0.0976) as assessed by the TREC 2007 Genomics Track, and automatic document retrieval by 18.56% (0.3424 vs. 0.2888) as assessed by the TREC 2005 Genomics Track. Automatic document retrieval results for TREC 2007 and TREC 2005 are statistically significant at the 95% confidence level (p = .0359 and .0253, respectively). Passage retrieval is significant at the 90% confidence level (p = 0.0893).
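The scoring shape described above, a joint distribution factored into potential functions over a Markov Random Field, is commonly rank-equivalent to a weighted sum of log-potentials. A schematic sketch, with three toy potentials and weights standing in for the paper's passage-term, passage-concept, and document-topic functions:

```python
import math

# Schematic sketch of MRF-style passage scoring: the log of the joint
# distribution factors into a weighted sum of log-potentials.

def mrf_score(passage, query, weights=(0.6, 0.3, 0.1)):
    term_pot = sum(passage["terms"].get(t, 1e-6) for t in query["terms"])
    concept_pot = sum(passage["concepts"].get(c, 1e-6)
                      for c in query["concepts"])
    topic_pot = passage["topics"].get(query["topic"], 1e-6)  # smoothing floor
    pots = (term_pot, concept_pot, topic_pot)
    # rank-equivalent log-linear combination of the potentials
    return sum(w * math.log(p) for w, p in zip(weights, pots))

passage = {"terms": {"gene": 0.4, "mutation": 0.2},
           "concepts": {"BRCA1": 0.5},
           "topics": {"cancer genetics": 0.7}}
query = {"terms": ["gene", "mutation"], "concepts": ["BRCA1"],
         "topic": "cancer genetics"}
print(round(mrf_score(passage, query), 3))  # ~ -0.55
```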
  15. Fan, W.; Fox, E.A.; Pathak, P.; Wu, H.: ¬The effects of fitness functions on genetic programming-based ranking discovery for Web search (2004) 0.06
    0.057078134 = product of:
      0.14269534 = sum of:
        0.03970641 = weight(_text_:retrieval in 2239) [ClassicSimilarity], result of:
          0.03970641 = score(doc=2239,freq=4.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.2835858 = fieldWeight in 2239, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2239)
        0.10298892 = sum of:
          0.06536108 = weight(_text_:web in 2239) [ClassicSimilarity], result of:
            0.06536108 = score(doc=2239,freq=8.0), product of:
              0.15105948 = queryWeight, product of:
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.04628742 = queryNorm
              0.43268442 = fieldWeight in 2239, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.046875 = fieldNorm(doc=2239)
          0.03762784 = weight(_text_:22 in 2239) [ClassicSimilarity], result of:
            0.03762784 = score(doc=2239,freq=2.0), product of:
              0.16209066 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.04628742 = queryNorm
              0.23214069 = fieldWeight in 2239, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2239)
      0.4 = coord(2/5)
    
    Abstract
    Genetic-based evolutionary learning algorithms, such as genetic algorithms (GAs) and genetic programming (GP), have been applied to information retrieval (IR) since the 1980s. Recently, GP has been applied to a new IR task, the discovery of ranking functions for Web search, and has achieved very promising results. However, in our prior research, only one fitness function was used for GP-based learning. It is unclear how other fitness functions may affect ranking function discovery for Web search, especially since it is well known that choosing a proper fitness function is very important for the effectiveness and efficiency of evolutionary algorithms. In this article, we report our experience in contrasting different fitness function designs on GP-based learning using a very large Web corpus. Our results indicate that the design of fitness functions is instrumental in performance improvement. We also give recommendations on the design of fitness functions for genetic-based information retrieval experiments.
    Date
    31. 5.2004 19:22:06
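To illustrate what a fitness function does in this setting, here is a toy evolutionary loop; for brevity it evolves linear term weights with a simple genetic algorithm rather than full genetic programming over function trees, and uses precision at rank 1 as the fitness, both simplifications rather than the paper's setup:

```python
import random

# Toy fitness-driven ranking-function discovery: a simple GA evolves
# two linear feature weights; fitness is precision at rank 1 over the
# training queries.

random.seed(0)

def fitness(weights, training):
    hits = 0
    for features, relevant in training:        # one query per example
        ranked = sorted(features, reverse=True, key=lambda d: sum(
            w * f for w, f in zip(weights, features[d])))
        hits += ranked[0] in relevant          # did the top document hit?
    return hits / len(training)

# Each query: {doc: (feature1, feature2)} plus the set of relevant docs.
training = [({"d1": (2.0, 0.1), "d2": (0.5, 0.9)}, {"d2"}),
            ({"d3": (0.2, 0.8), "d4": (1.5, 0.2)}, {"d3"})]

pop = [[random.random(), random.random()] for _ in range(20)]
for _ in range(30):                            # 30 generations
    pop.sort(key=lambda w: fitness(w, training), reverse=True)
    parents = pop[:10]                         # selection by fitness
    pop = parents + [[p[0], q[1]] for p, q in
                     zip(parents, reversed(parents))]  # crossover
    for child in pop[10:]:                     # mutation
        child[random.randrange(2)] += random.uniform(-0.2, 0.2)

print(fitness(pop[0], training))  # best fitness found (likely 1.0 here)
```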
  16. Deerwester, S.; Dumais, S.; Landauer, T.; Furnas, G.; Beck, L.: Improving information retrieval with latent semantic indexing (1988) 0.06
    0.05620398 = product of:
      0.14050995 = sum of:
        0.048630223 = weight(_text_:retrieval in 2396) [ClassicSimilarity], result of:
          0.048630223 = score(doc=2396,freq=6.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.34732026 = fieldWeight in 2396, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2396)
        0.091879725 = weight(_text_:semantic in 2396) [ClassicSimilarity], result of:
          0.091879725 = score(doc=2396,freq=6.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.47740483 = fieldWeight in 2396, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.046875 = fieldNorm(doc=2396)
      0.4 = coord(2/5)
    
    Abstract
    Describes a latent semantic indexing (LSI) approach to improving information retrieval. Most document retrieval systems depend on matching keywords in queries against those in documents; the LSI approach tries to overcome the incompleteness and imprecision of such keyword matching by exploiting latent relations among terms and documents. Tested performance of the LSI method ranged from considerably better than to roughly comparable to performance based on weighted keyword matching, apparently depending on the quality of the queries. Best LSI performance was found using a global entropy weighting for terms and about 100 dimensions for representing terms, documents, and queries.
    Object
    Latent Semantic Indexing
  17. Cross-language information retrieval (1998) 0.06
    0.055131674 = product of:
      0.09188612 = sum of:
        0.04679445 = weight(_text_:retrieval in 6299) [ClassicSimilarity], result of:
          0.04679445 = score(doc=6299,freq=32.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.33420905 = fieldWeight in 6299, product of:
              5.656854 = tf(freq=32.0), with freq of:
                32.0 = termFreq=32.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.01953125 = fieldNorm(doc=6299)
        0.038283218 = weight(_text_:semantic in 6299) [ClassicSimilarity], result of:
          0.038283218 = score(doc=6299,freq=6.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.19891867 = fieldWeight in 6299, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.01953125 = fieldNorm(doc=6299)
        0.0068084467 = product of:
          0.013616893 = sum of:
            0.013616893 = weight(_text_:web in 6299) [ClassicSimilarity], result of:
              0.013616893 = score(doc=6299,freq=2.0), product of:
                0.15105948 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.04628742 = queryNorm
                0.09014259 = fieldWeight in 6299, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.01953125 = fieldNorm(doc=6299)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Content
    Contains the contributions: GREFENSTETTE, G.: The Problem of Cross-Language Information Retrieval; DAVIS, M.W.: On the Effective Use of Large Parallel Corpora in Cross-Language Text Retrieval; BALLESTEROS, L. and W.B. CROFT: Statistical Methods for Cross-Language Information Retrieval; Distributed Cross-Lingual Information Retrieval; Automatic Cross-Language Information Retrieval Using Latent Semantic Indexing; EVANS, D.A. et al.: Mapping Vocabularies Using Latent Semantics; PICCHI, E. and C. PETERS: Cross-Language Information Retrieval: A System for Comparable Corpus Querying; YAMABANA, K. et al.: A Language Conversion Front-End for Cross-Language Information Retrieval; GACHOT, D.A. et al.: The Systran NLP Browser: An Application of Machine Translation Technology in Cross-Language Information Retrieval; HULL, D.: A Weighted Boolean Model for Cross-Language Text Retrieval; SHERIDAN, P. et al.: Building a Large Multilingual Test Collection from Comparable News Documents; OARD, D.W. and B.J. DORR: Evaluating Cross-Language Text Filtering Effectiveness
    Footnote
    Review in: Machine translation review: 1999, no.10, pp.26-27 (D. Lewis): "Cross Language Information Retrieval (CLIR) addresses the growing need to access large volumes of data across language boundaries. The typical requirement is for the user to input a free-form query, usually a brief description of a topic, into a search or retrieval engine which returns a list, in ranked order, of documents or web pages that are relevant to the topic. The search engine matches the terms in the query to indexed terms, usually keywords previously derived from the target documents. Unlike monolingual information retrieval, CLIR requires query terms in one language to be matched to indexed terms in another. Matching can be done by bilingual dictionary lookup, full machine translation, or by applying statistical methods. A query's success is measured in terms of recall (how many potentially relevant target documents are found) and precision (what proportion of documents found are relevant). Issues in CLIR are how to translate query terms into index terms, how to eliminate alternative translations (e.g. to decide that French 'traitement' in a query means 'treatment' and not 'salary'), and how to rank or weight translation alternatives that are retained (e.g. how to order the French terms 'aventure', 'business', 'affaire', and 'liaison' as relevant translations of English 'affair'). Grefenstette provides a lucid and useful overview of the field and the problems. The volume brings together a number of experiments and projects in CLIR. Mark Davies (New Mexico State University) describes Recuerdo, a Spanish retrieval engine which reduces translation ambiguities by scanning indexes for parallel texts; it also uses either a bilingual dictionary or direct equivalents from a parallel corpus in order to compare results for queries on parallel texts. Lisa Ballesteros and Bruce Croft (University of Massachusetts) use a 'local feedback' technique which automatically enhances a query by adding extra terms to it both before and after translation; such terms can be derived from documents known to be relevant to the query.
    Christian Fluhr et al. (DIST/SMTI, France) outline the EMIR (European Multilingual Information Retrieval) and ESPRIT projects. They found that using SYSTRAN to machine-translate queries and to access material from various multilingual databases produced less relevant results than a method referred to as 'multilingual reformulation' (the mechanics of which are only hinted at). An interesting technique is Latent Semantic Indexing (LSI), described by Michael Littman et al. (Brown University) and, most clearly, by David Evans et al. (Carnegie Mellon University). LSI involves creating matrices of documents and the terms they contain and 'fitting' related documents into a reduced matrix space. This effectively allows queries to be mapped onto a common semantic representation of the documents. Eugenio Picchi and Carol Peters (Pisa) report on a procedure to create links between translation equivalents in an Italian-English parallel corpus. The links are used to construct parallel linguistic contexts in real time for any term or combination of terms that is being searched for in either language. Their interest is primarily lexicographic but they plan to apply the same procedure to comparable corpora, i.e. to texts which are not translations of each other but which share the same domain. Kiyoshi Yamabana et al. (NEC, Japan) address the issue of how to disambiguate between alternative translations of query terms. Their DMAX (double maximise) method looks at co-occurrence frequencies between both source language words and target language words in order to arrive at the most probable translation. The statistical data for the decision are derived, not from the translation texts but independently from monolingual corpora in each language. An interactive user interface allows the user to influence the selection of terms during the matching process. Denis Gachot et al. (SYSTRAN) describe the SYSTRAN NLP browser, a prototype tool which collects parsing information derived from a text or corpus previously translated with SYSTRAN. The user enters queries into the browser in either a structured or free form and receives grammatical and lexical information about the source text and/or its translation.
    Series
    The Kluwer International series on information retrieval
  18. Calegari, S.; Sanchez, E.: Object-fuzzy concept network : an enrichment of ontologies in semantic information retrieval (2008) 0.05
    0.0540823 = product of:
      0.13520575 = sum of:
        0.04679445 = weight(_text_:retrieval in 2393) [ClassicSimilarity], result of:
          0.04679445 = score(doc=2393,freq=8.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.33420905 = fieldWeight in 2393, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2393)
        0.0884113 = weight(_text_:semantic in 2393) [ClassicSimilarity], result of:
          0.0884113 = score(doc=2393,freq=8.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.45938298 = fieldWeight in 2393, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2393)
      0.4 = coord(2/5)
    
    Abstract
    This article shows how a fuzzy ontology-based approach can improve semantic document retrieval. After formally defining a fuzzy ontology and a fuzzy knowledge base, a special type of new fuzzy relationship called (semantic) correlation, which links the concepts or entities in a fuzzy ontology, is discussed. These correlations, first assigned by experts, are updated after querying or when a document has been inserted into a database. Moreover, in order to define a dynamic knowledge of a domain that adapts itself to the context, it is shown how to handle a trade-off between the correct definition of an object, taken from the ontology structure, and the actual meaning assigned by individuals. The notion of a fuzzy concept network is extended to incorporate database objects, so that entities and documents can be represented similarly in the network. An information retrieval (IR) algorithm using an object-fuzzy concept network (O-FCN) is introduced and described. This algorithm allows us to derive a unique path among the entities involved in the query, so as to obtain maximal semantic associations in the knowledge domain. Finally, the study has been validated by querying a database using fuzzy recall, fuzzy precision, and coefficient variant measures in the crisp and fuzzy cases.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  19. Efron, M.: Query expansion and dimensionality reduction : Notions of optimality in Rocchio relevance feedback and latent semantic indexing (2008) 0.05
    0.052634455 = product of:
      0.13158613 = sum of:
        0.03970641 = weight(_text_:retrieval in 2020) [ClassicSimilarity], result of:
          0.03970641 = score(doc=2020,freq=4.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.2835858 = fieldWeight in 2020, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2020)
        0.091879725 = weight(_text_:semantic in 2020) [ClassicSimilarity], result of:
          0.091879725 = score(doc=2020,freq=6.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.47740483 = fieldWeight in 2020, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.046875 = fieldNorm(doc=2020)
      0.4 = coord(2/5)
    
    Abstract
    Rocchio relevance feedback and latent semantic indexing (LSI) are well-known extensions of the vector space model for information retrieval (IR). This paper analyzes the statistical relationship between these extensions. The analysis focuses on each method's basis in least-squares optimization. Noting that LSI and Rocchio relevance feedback both alter the vector space model in a way that is in some sense least-squares optimal, we ask: what is the relationship between LSI's and Rocchio's notions of optimality? What does this relationship imply for IR? Using an analytical approach, we argue that Rocchio relevance feedback is optimal if we understand retrieval as a simplified classification problem. On the other hand, LSI's motivation comes to the fore if we understand it as a biased regression technique, where projection onto a low-dimensional orthogonal subspace of the documents reduces model variance.
    Object
    Latent semantic indexing
  20. Deerwester, S.C.; Dumais, S.T.; Landauer, T.K.; Furnas, G.W.; Harshman, R.A.: Indexing by latent semantic analysis (1990) 0.05
    0.052634455 = product of:
      0.13158613 = sum of:
        0.03970641 = weight(_text_:retrieval in 2399) [ClassicSimilarity], result of:
          0.03970641 = score(doc=2399,freq=4.0), product of:
            0.14001551 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.04628742 = queryNorm
            0.2835858 = fieldWeight in 2399, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2399)
        0.091879725 = weight(_text_:semantic in 2399) [ClassicSimilarity], result of:
          0.091879725 = score(doc=2399,freq=6.0), product of:
            0.19245663 = queryWeight, product of:
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.04628742 = queryNorm
            0.47740483 = fieldWeight in 2399, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.1578603 = idf(docFreq=1879, maxDocs=44218)
              0.046875 = fieldNorm(doc=2399)
      0.4 = coord(2/5)
    
    Abstract
    A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term by document matrix is decomposed into a set of ca 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca 100 item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.
    Object
    Latent Semantic Indexing
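The method described above, a truncated SVD of the term-document matrix with queries folded in as pseudo-document vectors and cosine ranking, can be sketched as follows; real systems keep on the order of 100 factors, while this toy matrix supports only two:

```python
import numpy as np

# Minimal latent semantic indexing sketch: truncated SVD of a
# term-document matrix, query folded into the latent space, cosine
# comparison against the documents.

A = np.array([[1.0, 1.0, 0.0],     # rows: terms, columns: documents
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vk = U[:, :k], s[:k], Vt[:k, :].T      # rank-k factors
doc_vecs = Vk * sk                             # documents in latent space

q = np.array([1.0, 1.0, 0.0, 0.0])             # query over the 4 terms
q_vec = q @ Uk                # query coordinates (a common fold-in choice)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print([round(cosine(q_vec, d), 3) for d in doc_vecs])  # query-doc similarity
```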

Types

  • a 266
  • m 12
  • el 7
  • s 5
  • r 4
  • x 3
  • p 2
  • d 1