Search (95 results, page 1 of 5)

  • theme_ss:"Retrievalalgorithmen"
  1. Lalmas, M.; Ruthven, I.: Representing and retrieving structured documents using the Dempster-Shafer theory of evidence : modelling and evaluation (1998) 0.06
    0.06306788 = product of:
      0.12613577 = sum of:
        0.099141 = weight(_text_:representation in 1076) [ClassicSimilarity], result of:
          0.099141 = score(doc=1076,freq=4.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.50323373 = fieldWeight in 1076, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1076)
        0.02699477 = product of:
          0.08098431 = sum of:
            0.08098431 = weight(_text_:theory in 1076) [ClassicSimilarity], result of:
              0.08098431 = score(doc=1076,freq=4.0), product of:
                0.1780563 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.042818543 = queryNorm
                0.45482418 = fieldWeight in 1076, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1076)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Abstract
    Reports on a theoretical model of structured document indexing and retrieval based on the Dempster-Shafer Theory of Evidence. Includes a description of the model of structured document retrieval, the representation of structured documents, the representation of individual components, how components are combined, details of the combination process, and how relevance is captured within the model. Also presents a detailed account of an implementation of the model, and an evaluation scheme designed to test the effectiveness of the model.
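    The indented breakdowns attached to each entry are Lucene "explain" trees for ClassicSimilarity (TF-IDF) scoring. As an illustration only, the following Python sketch reproduces the arithmetic for entry 1; the constants are copied from the tree above, and the variable names are illustrative rather than part of the catalogue output.

      import math

      # Constants taken from entry 1's explain tree (term "representation").
      freq, doc_freq, max_docs = 4.0, 1206, 44218
      query_norm, field_norm = 0.042818543, 0.0546875

      tf = math.sqrt(freq)                              # 2.0 = tf(freq=4.0)
      idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # ~4.600994 = idf(docFreq=1206, maxDocs=44218)

      query_weight = idf * query_norm                   # ~0.19700786
      field_weight = tf * idf * field_norm              # ~0.50323373
      term_score = query_weight * field_weight          # ~0.099141 = weight(_text_:representation)

      # The final score sums the (already coord-scaled) term weights and applies
      # the top-level coordination factor coord(2/4) = 0.5.
      theory_weight = 0.02699477                        # value read off the tree above
      doc_score = (term_score + theory_weight) * 0.5    # ~0.06306788
      print(round(term_score, 6), round(doc_score, 8))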
  2. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.04
    0.040241938 = product of:
      0.080483876 = sum of:
        0.070815004 = weight(_text_:representation in 1428) [ClassicSimilarity], result of:
          0.070815004 = score(doc=1428,freq=4.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.35945266 = fieldWeight in 1428, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1428)
        0.00966887 = product of:
          0.02900661 = sum of:
            0.02900661 = weight(_text_:22 in 1428) [ClassicSimilarity], result of:
              0.02900661 = score(doc=1428,freq=2.0), product of:
                0.14994325 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042818543 = queryNorm
                0.19345059 = fieldWeight in 1428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1428)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Abstract
    Humans can make hasty, but generally robust, judgements about what a text fragment is, or is not, about. Such judgements are termed information inference. This article furnishes an account of information inference from a psychologistic stance. By drawing on theories from nonclassical logic and applied cognition, an information inference mechanism is proposed that makes inferences via computations of information flow through an approximation of a conceptual space. Within a conceptual space, information is represented geometrically. In this article, geometric representations of words are realized as vectors in a high-dimensional semantic space, which is automatically constructed from a text corpus. Two approaches are presented for priming vector representations according to context. The first approach uses a concept combination heuristic to adjust the vector representation of a concept in the light of the representation of another concept. The second approach computes a prototypical concept on the basis of exemplar trace texts and moves it in the dimensional space according to the context. Information inference is evaluated by measuring the effectiveness of query models derived by information flow computations. Results show that information flow contributes significantly to query model effectiveness, particularly with respect to precision. Moreover, retrieval effectiveness compares favorably with two probabilistic query models, and another based on semantic association. More generally, this article can be seen as a contribution towards realizing operational systems that mimic text-based human reasoning.
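    As a toy illustration of the corpus-derived semantic space sketched in this abstract (not the authors' actual construction or their information-flow computation), words can be represented as co-occurrence vectors and a crude "concept combination" taken as vector addition:

      import numpy as np

      # Build a small word-by-word co-occurrence matrix with a sliding window.
      corpus = [
          "retrieval model ranks documents by query similarity".split(),
          "vector space model represents documents and query terms".split(),
          "semantic space built from corpus co occurrence statistics".split(),
      ]
      vocab = sorted({w for sent in corpus for w in sent})
      index = {w: i for i, w in enumerate(vocab)}
      M = np.zeros((len(vocab), len(vocab)))
      window = 3
      for sent in corpus:
          for i, w in enumerate(sent):
              for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                  if i != j:
                      M[index[w], index[sent[j]]] += 1.0

      def cosine(u, v):
          return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

      # "Prime" the representation of one concept with another by naive addition.
      combined = M[index["model"]] + M[index["space"]]
      print(cosine(combined, M[index["documents"]]))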
    Date
    22. 3.2003 19:35:46
  3. Hoenkamp, E.; Bruza, P.D.; Song, D.; Huang, Q.: ¬An effective approach to verbose queries using a limited dependencies language model (2009) 0.04
    0.036038794 = product of:
      0.07207759 = sum of:
        0.056652002 = weight(_text_:representation in 2122) [ClassicSimilarity], result of:
          0.056652002 = score(doc=2122,freq=4.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.28756213 = fieldWeight in 2122, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.03125 = fieldNorm(doc=2122)
        0.015425583 = product of:
          0.04627675 = sum of:
            0.04627675 = weight(_text_:theory in 2122) [ClassicSimilarity], result of:
              0.04627675 = score(doc=2122,freq=4.0), product of:
                0.1780563 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.042818543 = queryNorm
                0.25989953 = fieldWeight in 2122, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2122)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Abstract
    Intuitively, any 'bag of words' approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies into more useful statistics. This is done in three steps. The term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation than just the document's initial distribution. A secondary contribution is to investigate the practical application of this representation in case the queries become increasingly verbose. In the experiments (based on Lemur's search engine substrate), the default query model was replaced by the stable distribution of the query. Just modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par with or better than more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach seems to become.
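    The core step described above can be sketched as follows; the co-occurrence counts and smoothing constant are illustrative assumptions, not the paper's data, and the smoothing simply keeps the toy chain ergodic.

      import numpy as np

      # Term-term co-occurrence counts for a 3-term "document" (illustrative).
      counts = np.array([
          [0.0, 3.0, 1.0],
          [3.0, 0.0, 2.0],
          [1.0, 2.0, 0.0],
      ])

      eps = 1e-3                                # smoothing keeps the chain irreducible and aperiodic
      P = counts + eps
      P = P / P.sum(axis=1, keepdims=True)      # row-stochastic transition matrix

      pi = np.full(len(P), 1.0 / len(P))        # start from the uniform distribution
      for _ in range(1000):                     # power iteration; converges for an ergodic chain
          pi = pi @ P

      print(pi, pi.sum())                       # stationary distribution used as the query/document model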
    Series
    Lecture notes in computer science : advances in information retrieval theory; 5766
    Source
    Second International Conference on the Theory of Information Retrieval, ICTIR 2009 Cambridge, UK, September 10-12, 2009 Proceedings. Ed.: L. Azzopardi
  4. Furner, J.: ¬A unifying model of document relatedness for hybrid search engines (2003) 0.04
    0.03584558 = product of:
      0.07169116 = sum of:
        0.060088523 = weight(_text_:representation in 2717) [ClassicSimilarity], result of:
          0.060088523 = score(doc=2717,freq=2.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.3050057 = fieldWeight in 2717, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.046875 = fieldNorm(doc=2717)
        0.011602643 = product of:
          0.034807928 = sum of:
            0.034807928 = weight(_text_:22 in 2717) [ClassicSimilarity], result of:
              0.034807928 = score(doc=2717,freq=2.0), product of:
                0.14994325 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042818543 = queryNorm
                0.23214069 = fieldWeight in 2717, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2717)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Date
    11. 9.2004 17:32:22
    Source
    Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas
  5. Klas, C.-P.; Fuhr, N.; Schaefer, A.: Evaluating strategic support for information access in the DAFFODIL system (2004) 0.04
    0.03584558 = product of:
      0.07169116 = sum of:
        0.060088523 = weight(_text_:representation in 2419) [ClassicSimilarity], result of:
          0.060088523 = score(doc=2419,freq=2.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.3050057 = fieldWeight in 2419, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.046875 = fieldNorm(doc=2419)
        0.011602643 = product of:
          0.034807928 = sum of:
            0.034807928 = weight(_text_:22 in 2419) [ClassicSimilarity], result of:
              0.034807928 = score(doc=2419,freq=2.0), product of:
                0.14994325 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042818543 = queryNorm
                0.23214069 = fieldWeight in 2419, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2419)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Abstract
    The digital library system Daffodil is targeted at strategic support of users during the information search process. For searching, exploring, and managing digital library objects it provides user-customisable information seeking patterns over a federation of heterogeneous digital libraries. In this paper, evaluation results with respect to retrieval effectiveness, efficiency, and user satisfaction are presented. The analysis focuses on strategic support for the scientific work-flow. Daffodil supports the whole work-flow, from data source selection through information seeking to the representation, organisation, and reuse of information. By embedding high-level search functionality into the scientific work-flow, the user experiences better strategic system support due to a more systematic work process. These ideas have been implemented in Daffodil and followed by a qualitative evaluation. The evaluation was conducted with 28 participants, ranging from information seeking novices to experts. The results are promising, as they support the chosen model.
    Date
    16.11.2008 16:22:48
  6. Carpineto, C.; Romano, G.: Order-theoretical ranking (2000) 0.03
    0.031854097 = product of:
      0.06370819 = sum of:
        0.050073773 = weight(_text_:representation in 4766) [ClassicSimilarity], result of:
          0.050073773 = score(doc=4766,freq=2.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.25417143 = fieldWeight in 4766, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4766)
        0.013634419 = product of:
          0.040903255 = sum of:
            0.040903255 = weight(_text_:theory in 4766) [ClassicSimilarity], result of:
              0.040903255 = score(doc=4766,freq=2.0), product of:
                0.1780563 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.042818543 = queryNorm
                0.2297209 = fieldWeight in 4766, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4766)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Abstract
    Current best-match ranking (BMR) systems perform well but cannot handle word mismatch between a query and a document. The best known alternative ranking method, hierarchical clustering-based ranking (HCR), seems to be more robust than BMR with respect to this problem, but it is hampered by theoretical and practical limitations. We present an approach to document ranking that explicitly addresses the word mismatch problem by exploiting interdocument similarity information in a novel way. Document ranking is seen as a query-document transformation driven by a conceptual representation of the whole document collection, into which the query is merged. Our approach is based on the theory of concept (or Galois) lattices, which, we argue, provides a powerful, well-founded, and computationally tractable framework to model the space in which documents and query are represented and to compute such a transformation. We compared information retrieval using concept lattice-based ranking (CLR) to BMR and HCR. The results showed that HCR was outperformed by CLR as well as BMR, and suggested that, of the two best methods, BMR achieved better performance than CLR on the whole document set, whereas CLR compared more favorably when only the first retrieved documents were used for evaluation. We also evaluated the three methods' specific ability to rank documents that did not match the query, in which case the superiority of CLR over BMR and HCR was apparent.
  7. Li, M.; Li, H.; Zhou, Z.-H.: Semi-supervised document retrieval (2009) 0.03
    0.031854097 = product of:
      0.06370819 = sum of:
        0.050073773 = weight(_text_:representation in 4218) [ClassicSimilarity], result of:
          0.050073773 = score(doc=4218,freq=2.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.25417143 = fieldWeight in 4218, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4218)
        0.013634419 = product of:
          0.040903255 = sum of:
            0.040903255 = weight(_text_:theory in 4218) [ClassicSimilarity], result of:
              0.040903255 = score(doc=4218,freq=2.0), product of:
                0.1780563 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.042818543 = queryNorm
                0.2297209 = fieldWeight in 4218, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4218)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Abstract
    This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, referred to as SSRank, aims to combine the advantages of both traditional Information Retrieval (IR) methods and the recently proposed supervised learning methods for IR. The advantages include the use of a limited amount of labeled data and rich model representation. To do so, the method adopts a semi-supervised learning framework in ranking model construction. Specifically, given a small number of labeled documents with respect to some queries, the method effectively labels the unlabeled documents for the queries. It then uses all the labeled data to train a machine learning model (in our case, a neural network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently and almost always significantly outperforms the baseline methods (unsupervised and supervised learning methods), given the same amount of labeled data. This is because SSRank can effectively leverage the use of unlabeled data in learning.
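    A schematic sketch of the self-labelling loop described above (with stand-in scorers; the article's BM25 labeller, neural ranker, and theoretical stopping criterion are not reproduced here):

      import numpy as np

      rng = np.random.default_rng(0)
      labeled = [(rng.random(5), float(rng.integers(0, 2))) for _ in range(10)]   # (features, relevance)
      unlabeled = [rng.random(5) for _ in range(50)]

      def ir_score(x):
          # Stand-in for the traditional IR model (BM25 in the paper) used for labelling.
          return float(2.0 * x[0] + x[1])

      # Confidently scored documents receive pseudo-labels (thresholds are invented).
      for x in unlabeled:
          s = ir_score(x)
          if s > 1.8:
              labeled.append((x, 1.0))
          elif s < 0.4:
              labeled.append((x, 0.0))

      # Train any supervised ranker on the enlarged set; a linear least-squares fit
      # stands in for the paper's neural network.
      X = np.array([x for x, _ in labeled])
      y = np.array([y for _, y in labeled])
      w, *_ = np.linalg.lstsq(X, y, rcond=None)
      print(len(labeled), w)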
  8. Liddy, E.D.: ¬An alternative representation for documents and queries (1993) 0.03
    0.030044261 = product of:
      0.120177045 = sum of:
        0.120177045 = weight(_text_:representation in 7813) [ClassicSimilarity], result of:
          0.120177045 = score(doc=7813,freq=8.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.6100114 = fieldWeight in 7813, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.046875 = fieldNorm(doc=7813)
      0.25 = coord(1/4)
    
    Abstract
    Describes an alternative method of representing documents and queries in information retrieval systems to the two most common methods: free-text, natural language representation and controlled language representation. The alternative method combines the advantages of both traditional approaches and overcomes the difficulties associated with each. The scheme was developed for use with Longman's Dictionary of Contemporary English and uses a computerized version of the dictionary for the automatic generation of summary-level semantic representations of each document and query. The system tags each word in a document with the appropriate Subject Field Code (SFC) from the dictionary, and the SFCs are summed and normalized to produce a weighted, fixed-length vector of SFCs. The search system matches the query SFC vector to the document SFC vectors in the database. The documents are then ranked on the basis of their vectors' similarity to the query.
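    A toy sketch of the Subject Field Code (SFC) representation described above; the tiny code inventory and word-to-code lexicon are invented for illustration, not taken from LDOCE.

      import numpy as np

      SFC = ["computing", "linguistics", "medicine"]
      lexicon = {"retrieval": "computing", "index": "computing",
                 "grammar": "linguistics", "semantics": "linguistics",
                 "patient": "medicine"}

      def sfc_vector(text):
          # Tag each word with its SFC, count, and normalize into a fixed-length vector.
          v = np.zeros(len(SFC))
          for word in text.lower().split():
              code = lexicon.get(word)
              if code:
                  v[SFC.index(code)] += 1.0
          total = v.sum()
          return v / total if total else v

      doc = sfc_vector("retrieval index semantics")
      qry = sfc_vector("semantics grammar")
      score = float(doc @ qry / (np.linalg.norm(doc) * np.linalg.norm(qry) + 1e-12))
      print(doc, qry, round(score, 3))          # documents are ranked by this similarity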
  9. Dannenberg, R.B.; Birmingham, W.P.; Pardo, B.; Hu, N.; Meek, C.; Tzanetakis, G.: ¬A comparative evaluation of search techniques for query-by-humming using the MUSART testbed (2007) 0.03
    0.029915206 = product of:
      0.059830412 = sum of:
        0.050073773 = weight(_text_:representation in 269) [ClassicSimilarity], result of:
          0.050073773 = score(doc=269,freq=2.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.25417143 = fieldWeight in 269, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0390625 = fieldNorm(doc=269)
        0.009756638 = product of:
          0.029269911 = sum of:
            0.029269911 = weight(_text_:29 in 269) [ClassicSimilarity], result of:
              0.029269911 = score(doc=269,freq=2.0), product of:
                0.15062225 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042818543 = queryNorm
                0.19432661 = fieldWeight in 269, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=269)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Abstract
    Query-by-humming systems offer content-based searching for melodies and require no special musical training or knowledge. Many such systems have been built, but there has not been much useful evaluation and comparison in the literature due to the lack of shared databases and queries. The MUSART project testbed allows various search algorithms to be compared using a shared framework that automatically runs experiments and summarizes results. Using this testbed, the authors compared algorithms based on string alignment, melodic contour matching, a hidden Markov model, n-grams, and CubyHum. Retrieval performance is very sensitive to distance functions and the representation of pitch and rhythm, which raises questions about some previously published conclusions. Some algorithms are particularly sensitive to the quality of queries. Our queries, which are taken from human subjects in a realistic setting, are quite difficult, especially for n-gram models. Finally, simulations on query-by-humming performance as a function of database size indicate that retrieval performance falls only slowly as the database size increases.
    Date
    29. 4.2007 19:45:32
  10. Tsai, C.-F.; Hu, Y.-H.; Chen, Z.-Y.: Factors affecting rocchio-based pseudorelevance feedback in image retrieval (2015) 0.03
    0.027992086 = product of:
      0.111968346 = sum of:
        0.111968346 = weight(_text_:representation in 1607) [ClassicSimilarity], result of:
          0.111968346 = score(doc=1607,freq=10.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.56834453 = fieldWeight in 1607, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1607)
      0.25 = coord(1/4)
    
    Abstract
    Pseudorelevance feedback (PRF) was proposed to overcome the limitation of relevance feedback (RF), which is based on the user-in-the-loop process. In PRF, the top-k retrieved images are treated as pseudorelevant feedback. Although the PRF set contains noise, PRF has proven effective for automatically improving the overall retrieval result. To implement PRF, the Rocchio algorithm has been considered a reasonable and well-established baseline. However, the performance of Rocchio-based PRF is subject to various representation choices (or factors). In this article, we examine the factors that affect the performance of Rocchio-based PRF, including image-feature representation, the number of top-ranked images, the weighting parameters of Rocchio, and the similarity measure. We offer practical insights on how to optimize the performance of Rocchio-based PRF by choosing appropriate representation choices. Our extensive experiments on the NUS-WIDE-LITE and Caltech 101 + Corel 5000 data sets show that the optimal feature representation is color moment + wavelet texture in terms of retrieval efficiency and effectiveness. Other findings are that using the top-20 ranked images as the pseudopositive and pseudonegative feedback sets with equal weight (i.e., 0.5) and the correlation and cosine distance functions produces the optimal retrieval result.
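    For reference, a minimal sketch of the Rocchio feedback step that the study parameterizes; the feature vectors are placeholders, and the 0.5 feedback weights follow the setting reported above.

      import numpy as np

      def rocchio(query, pos, neg, alpha=1.0, beta=0.5, gamma=0.5):
          # q' = alpha*q + beta*mean(pseudopositive) - gamma*mean(pseudonegative)
          q_new = alpha * query
          if len(pos):
              q_new = q_new + beta * np.mean(pos, axis=0)
          if len(neg):
              q_new = q_new - gamma * np.mean(neg, axis=0)
          return q_new

      query = np.array([0.2, 0.0, 0.7])                      # e.g. color-moment/texture features
      pos = np.array([[0.3, 0.1, 0.8], [0.25, 0.0, 0.9]])    # top-ranked images taken as pseudopositive
      neg = np.array([[0.9, 0.8, 0.1]])                      # pseudonegative set
      print(rocchio(query, pos, neg))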
  11. Zhang, W.; Yoshida, T.; Tang, X.: ¬A comparative study of TF*IDF, LSI and multi-words for text classification (2011) 0.03
    0.026019095 = product of:
      0.10407638 = sum of:
        0.10407638 = weight(_text_:representation in 1165) [ClassicSimilarity], result of:
          0.10407638 = score(doc=1165,freq=6.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.5282854 = fieldWeight in 1165, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.046875 = fieldNorm(doc=1165)
      0.25 = coord(1/4)
    
    Abstract
    One of the main themes in text mining is text representation, which is fundamental and indispensable for text-based intelligent information processing. Generally, text representation includes two tasks: indexing and weighting. This paper comparatively studies TF*IDF, LSI, and multi-word for text representation. We used a Chinese and an English document collection to evaluate the three methods in information retrieval and text categorization, respectively. Experimental results demonstrate that in text categorization, LSI has better performance than the other methods in both document collections. Also, LSI produced the best performance in retrieving English documents. This outcome shows that LSI has both favorable semantic and statistical quality, contrary to the claim that LSI cannot produce discriminative power for indexing.
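    A brief sketch of the two representations compared above, using scikit-learn; the toy documents and the choice of two latent dimensions are illustrative assumptions.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.decomposition import TruncatedSVD

      docs = [
          "latent semantic indexing for text categorization",
          "tf idf weighting for document retrieval",
          "multi word terms as indexing units for text",
      ]

      tfidf = TfidfVectorizer()
      X = tfidf.fit_transform(docs)            # documents x terms, TF*IDF weighted

      lsi = TruncatedSVD(n_components=2, random_state=0)
      Z = lsi.fit_transform(X)                 # documents projected into 2 latent dimensions (LSI)

      print(X.shape, Z.shape)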
  12. Carpineto, C.; Romano, G.: Information retrieval through hybrid navigation of lattice representations (1996) 0.02
    0.02478525 = product of:
      0.099141 = sum of:
        0.099141 = weight(_text_:representation in 7434) [ClassicSimilarity], result of:
          0.099141 = score(doc=7434,freq=4.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.50323373 = fieldWeight in 7434, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7434)
      0.25 = coord(1/4)
    
    Abstract
    Presents a comprehensive approach to automatic organization and hybrid navigation of text databases. An organizing stage builds a particular lattice representation of the data, through text indexing followed by lattice clustering of the indexed texts. The lattice representation supports the navigation stage of the system, a visual retrieval interface that combines three main retrieval strategies: browsing, querying, and bounding. Such a hybrid paradigm permits high flexibility in trading off information exploration and retrieval, and has good retrieval performance. Compares information retrieval using lattice-based hybrid navigation with conventional Boolean querying. Experiments conducted on two medium-sized bibliographic databases showed that the performance of lattice retrieval was comparable to or better than Boolean retrieval.
  13. Ayadi, H.; Torjmen-Khemakhem, M.; Daoud, M.; Xiangji Huang, J.; Ben Jemaa, M.: MF-Re-Rank : a modality feature-based re-ranking model for medical image retrieval (2018) 0.02
    0.023932163 = product of:
      0.047864325 = sum of:
        0.040059015 = weight(_text_:representation in 4459) [ClassicSimilarity], result of:
          0.040059015 = score(doc=4459,freq=2.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.20333713 = fieldWeight in 4459, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.03125 = fieldNorm(doc=4459)
        0.0078053097 = product of:
          0.023415929 = sum of:
            0.023415929 = weight(_text_:29 in 4459) [ClassicSimilarity], result of:
              0.023415929 = score(doc=4459,freq=2.0), product of:
                0.15062225 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042818543 = queryNorm
                0.15546128 = fieldWeight in 4459, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4459)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Abstract
    One of the main challenges in medical image retrieval is the increasing volume of image data, which renders it difficult for domain experts to find relevant information in large data sets. Effective and efficient medical image retrieval systems are required to better manage medical image information. Text-based image retrieval (TBIR) has been very successful in retrieving images with textual descriptions. Several TBIR approaches rely on models based on bag-of-words approaches, in which the image retrieval problem turns into one of standard text-based information retrieval, where the meanings and values of specific medical entities in the text and metadata are ignored in the image representation and retrieval process. However, we believe that TBIR should extract specific medical entities and terms and then exploit these elements to achieve better image retrieval results. Therefore, we propose a novel reranking method based on medical-image-dependent features. These features are manually selected by a medical expert from imaging modalities and medical terminology. First, we represent queries and images using only medical-image-dependent features such as image modality and image scale. Second, we exploit the defined features in a new reranking method for medical image retrieval. Our motivation is the large influence of image modality in medical image retrieval and its impact on image-relevance scores. To evaluate our approach, we performed a series of experiments on the medical ImageCLEF data sets from 2009 to 2013. The BM25 model, a language model, and an image-relevance feedback model are used as baselines to evaluate our approach. The experimental results show that, compared to the BM25 model, the proposed model significantly enhances image retrieval performance. We also compared our approach with other state-of-the-art approaches and show that it performs comparably to the top three runs in the official ImageCLEF competition.
    Date
    29. 9.2018 11:43:31
  14. Langville, A.N.; Meyer, C.D.: Google's PageRank and beyond : the science of search engine rankings (2006) 0.02
    0.020806724 = product of:
      0.04161345 = sum of:
        0.030044261 = weight(_text_:representation in 6) [ClassicSimilarity], result of:
          0.030044261 = score(doc=6,freq=2.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.15250285 = fieldWeight in 6, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0234375 = fieldNorm(doc=6)
        0.011569187 = product of:
          0.03470756 = sum of:
            0.03470756 = weight(_text_:theory in 6) [ClassicSimilarity], result of:
              0.03470756 = score(doc=6,freq=4.0), product of:
                0.1780563 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.042818543 = queryNorm
                0.19492465 = fieldWeight in 6, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=6)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Content
    Contents:
    Chapter 1. Introduction to Web Search Engines: 1.1 A Short History of Information Retrieval - 1.2 An Overview of Traditional Information Retrieval - 1.3 Web Information Retrieval
    Chapter 2. Crawling, Indexing, and Query Processing: 2.1 Crawling - 2.2 The Content Index - 2.3 Query Processing
    Chapter 3. Ranking Webpages by Popularity: 3.1 The Scene in 1998 - 3.2 Two Theses - 3.3 Query-Independence
    Chapter 4. The Mathematics of Google's PageRank: 4.1 The Original Summation Formula for PageRank - 4.2 Matrix Representation of the Summation Equations - 4.3 Problems with the Iterative Process - 4.4 A Little Markov Chain Theory - 4.5 Early Adjustments to the Basic Model - 4.6 Computation of the PageRank Vector - 4.7 Theorem and Proof for Spectrum of the Google Matrix
    Chapter 5. Parameters in the PageRank Model: 5.1 The alpha Factor - 5.2 The Hyperlink Matrix H - 5.3 The Teleportation Matrix E
    Chapter 6. The Sensitivity of PageRank: 6.1 Sensitivity with respect to alpha - 6.2 Sensitivity with respect to H - 6.3 Sensitivity with respect to vT - 6.4 Other Analyses of Sensitivity - 6.5 Sensitivity Theorems and Proofs
    Chapter 7. The PageRank Problem as a Linear System: 7.1 Properties of (I - alphaS) - 7.2 Properties of (I - alphaH) - 7.3 Proof of the PageRank Sparse Linear System
    Chapter 8. Issues in Large-Scale Implementation of PageRank: 8.1 Storage Issues - 8.2 Convergence Criterion - 8.3 Accuracy - 8.4 Dangling Nodes - 8.5 Back Button Modeling
    Chapter 9. Accelerating the Computation of PageRank: 9.1 An Adaptive Power Method - 9.2 Extrapolation - 9.3 Aggregation - 9.4 Other Numerical Methods
    Chapter 10. Updating the PageRank Vector: 10.1 The Two Updating Problems and their History - 10.2 Restarting the Power Method - 10.3 Approximate Updating Using Approximate Aggregation - 10.4 Exact Aggregation - 10.5 Exact vs. Approximate Aggregation - 10.6 Updating with Iterative Aggregation - 10.7 Determining the Partition - 10.8 Conclusions
    Chapter 11. The HITS Method for Ranking Webpages: 11.1 The HITS Algorithm - 11.2 HITS Implementation - 11.3 HITS Convergence - 11.4 HITS Example - 11.5 Strengths and Weaknesses of HITS - 11.6 HITS's Relationship to Bibliometrics - 11.7 Query-Independent HITS - 11.8 Accelerating HITS - 11.9 HITS Sensitivity
    Chapter 12. Other Link Methods for Ranking Webpages: 12.1 SALSA - 12.2 Hybrid Ranking Methods - 12.3 Rankings based on Traffic Flow
    Chapter 13. The Future of Web Information Retrieval: 13.1 Spam - 13.2 Personalization - 13.3 Clustering - 13.4 Intelligent Agents - 13.5 Trends and Time-Sensitive Search - 13.6 Privacy and Censorship - 13.7 Library Classification Schemes - 13.8 Data Fusion
    Chapter 14. Resources for Web Information Retrieval: 14.1 Resources for Getting Started - 14.2 Resources for Serious Study
    Chapter 15. The Mathematics Guide: 15.1 Linear Algebra - 15.2 Perron-Frobenius Theory - 15.3 Markov Chains - 15.4 Perron Complementation - 15.5 Stochastic Complementation - 15.6 Censoring - 15.7 Aggregation - 15.8 Disaggregation
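    A compact sketch of the power-method PageRank computation the book walks through (Chapters 4-9); the 4-page link matrix and the damping-factor value are made-up examples, not taken from the book.

      import numpy as np

      links = np.array([            # row i lists the pages that page i links to
          [0, 1, 1, 0],
          [0, 0, 1, 0],
          [1, 0, 0, 1],
          [0, 0, 1, 0],
      ], dtype=float)

      n = len(links)
      out_deg = links.sum(axis=1)
      # Row-normalize; dangling pages (no out-links) are spread uniformly.
      S = np.where(out_deg[:, None] > 0, links / np.maximum(out_deg[:, None], 1.0), 1.0 / n)

      alpha = 0.85                              # damping factor
      v = np.full(n, 1.0 / n)                   # teleportation (personalization) vector
      pi = np.full(n, 1.0 / n)

      for _ in range(100):                      # power iteration on the Google matrix
          new_pi = alpha * (pi @ S) + (1 - alpha) * v
          if np.abs(new_pi - pi).sum() < 1e-12:
              break
          pi = new_pi

      print(pi, pi.sum())                       # PageRank vector, sums to 1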
  15. Hofferer, M.: Heuristic search in information retrieval (1994) 0.02
    0.020029508 = product of:
      0.08011803 = sum of:
        0.08011803 = weight(_text_:representation in 1070) [ClassicSimilarity], result of:
          0.08011803 = score(doc=1070,freq=2.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.40667427 = fieldWeight in 1070, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0625 = fieldNorm(doc=1070)
      0.25 = coord(1/4)
    
    Abstract
    Describes an adaptive information retrieval system, the Information Retrieval Algorithm System (IRAS), that uses heuristic searching to sample a document space and retrieve relevant documents according to users' requests. It also includes a learning module, based on a knowledge representation system and an approximate probabilistic characterization of relevant documents, that reproduces a user classification of relevant documents and provides a rule-controlled ranking.
  16. Hoenkamp, E.: Unitary operators on the document space (2003) 0.02
    0.017703751 = product of:
      0.070815004 = sum of:
        0.070815004 = weight(_text_:representation in 3457) [ClassicSimilarity], result of:
          0.070815004 = score(doc=3457,freq=4.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.35945266 = fieldWeight in 3457, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3457)
      0.25 = coord(1/4)
    
    Abstract
    When people search for documents, they eventually want content, not words. Hence, search engines should relate documents more by their underlying concepts than by the words they contain. One promising technique to do so is Latent Semantic Indexing (LSI). LSI dramatically reduces the dimension of the document space by mapping it into a space spanned by conceptual indices. Empirically, the number of concepts needed to represent the documents is far smaller than the great variety of words in the textual representation. Although this almost obviates the problem of lexical matching, the mapping incurs a high computational cost compared to document parsing, indexing, query matching, and updating. This article accomplishes several things. First, it shows how the technique underlying LSI is just one example of a unitary operator, for which there are computationally more attractive alternatives. Second, it proposes the Haar transform as such an alternative, as it is memory efficient and can be computed in linear to sublinear time. Third, it generalizes LSI by a multiresolution representation of the document space. The approach not only preserves the advantages of LSI at drastically reduced computational cost, it also opens a spectrum of possibilities for new research.
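    A small sketch of the idea: a unitary transform (here one orthonormal Haar step) preserves inner products between term vectors, and dropping the fine-scale coefficients gives a cheap multiresolution approximation. The 8-dimensional term-frequency vectors are invented.

      import numpy as np

      def haar_step(x):
          # One level of the orthonormal Haar transform (input length must be even).
          x = np.asarray(x, dtype=float)
          pairs = x.reshape(-1, 2)
          averages = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
          details = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
          return np.concatenate([averages, details])

      doc = np.array([3.0, 1.0, 0.0, 2.0, 5.0, 5.0, 1.0, 0.0])
      qry = np.array([2.0, 0.0, 0.0, 1.0, 4.0, 4.0, 0.0, 0.0])

      # Unitarity: inner products (hence cosine similarities) are unchanged.
      print(np.dot(doc, qry), np.dot(haar_step(doc), haar_step(qry)))

      # Multiresolution approximation: keep only the coarse (average) coefficients.
      print(haar_step(doc)[:4], haar_step(qry)[:4])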
  17. Liddy, E.D.; Paik, W.; McKenna, M.; Yu, E.S.: ¬A natural language text retrieval system with relevance feedback (1995) 0.02
    0.017525818 = product of:
      0.07010327 = sum of:
        0.07010327 = weight(_text_:representation in 3131) [ClassicSimilarity], result of:
          0.07010327 = score(doc=3131,freq=2.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.35583997 = fieldWeight in 3131, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3131)
      0.25 = coord(1/4)
    
    Abstract
    Outlines a fully integrated retrieval engine that processes documents and queries at the multiple, complex linguistic levels that humans use to construe meaning. Currently undergoing beta site trials, the DR-LINK natural language text retrieval system allows searchers to state queries as fully formed, natural sentences. The meaning and matching of both queries and documents is accomplished at the conceptual level of human expression, not by the simple co-occurrence of keywords. Furthermore, the natural browsing behaviour of information searchers is accommodated by allowing documents identified as potentially relevant by the explicit semantics of the system to be used as relevance feedback queries, which provide an appropriate implicit semantic representation of the information seeker's need.
  18. Herrera-Viedma, E.; Cordón, O.; Herrera, J.C.; Luqe, M.: ¬An IRS based on multi-granular linguistic information (2003) 0.02
    0.015022131 = product of:
      0.060088523 = sum of:
        0.060088523 = weight(_text_:representation in 2740) [ClassicSimilarity], result of:
          0.060088523 = score(doc=2740,freq=2.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.3050057 = fieldWeight in 2740, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.046875 = fieldNorm(doc=2740)
      0.25 = coord(1/4)
    
    Source
    Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas
  19. Li, H.; Wu, H.; Li, D.; Lin, S.; Su, Z.; Luo, X.: PSI: A probabilistic semantic interpretable framework for fine-grained image ranking (2018) 0.02
    0.015022131 = product of:
      0.060088523 = sum of:
        0.060088523 = weight(_text_:representation in 4577) [ClassicSimilarity], result of:
          0.060088523 = score(doc=4577,freq=2.0), product of:
            0.19700786 = queryWeight, product of:
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.042818543 = queryNorm
            0.3050057 = fieldWeight in 4577, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.600994 = idf(docFreq=1206, maxDocs=44218)
              0.046875 = fieldNorm(doc=4577)
      0.25 = coord(1/4)
    
    Abstract
    Image ranking is one of the key problems in the information science research area. However, most current methods focus on increasing performance, leaving intact the semantic gap problem, namely that the learned ranking models are hard to understand. Therefore, in this article, we aim at learning an interpretable ranking model to tackle the semantic gap in fine-grained image ranking. We propose to combine attribute-based representation and online passive-aggressive (PA) learning-based ranking models to achieve this goal. In addition, considering the highly localized instances in fine-grained image ranking, we introduce a supervised constrained clustering method to gather class-balanced training instances for local PA-based models, and incorporate the learned local models into a unified probabilistic framework. Extensive experiments on the benchmark demonstrate that the proposed framework outperforms state-of-the-art methods in terms of accuracy and speed.
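    A rough sketch of the pairwise passive-aggressive update such ranking models build on (the basic PA-I rule, not the article's full probabilistic framework; the attribute vectors are placeholders):

      import numpy as np

      def pa_rank_update(w, x_pos, x_neg, C=1.0):
          # For a pair where x_pos should rank above x_neg, move w just enough
          # to satisfy a unit margin (PA-I, step size capped by C).
          diff = x_pos - x_neg
          loss = max(0.0, 1.0 - float(w @ diff))
          if loss > 0.0:
              tau = min(C, loss / float(diff @ diff))
              w = w + tau * diff
          return w

      w = np.zeros(4)
      pairs = [(np.array([0.9, 0.1, 0.3, 0.0]), np.array([0.2, 0.8, 0.1, 0.4])),
               (np.array([0.7, 0.0, 0.6, 0.1]), np.array([0.3, 0.5, 0.2, 0.2]))]
      for x_pos, x_neg in pairs:
          w = pa_rank_update(w, x_pos, x_neg)
      print(w)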
  20. Dominich, S.: Mathematical foundations of information retrieval (2001) 0.01
    0.014475424 = product of:
      0.057901695 = sum of:
        0.057901695 = product of:
          0.08685254 = sum of:
            0.057845935 = weight(_text_:theory in 1753) [ClassicSimilarity], result of:
              0.057845935 = score(doc=1753,freq=4.0), product of:
                0.1780563 = queryWeight, product of:
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.042818543 = queryNorm
                0.3248744 = fieldWeight in 1753, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.1583924 = idf(docFreq=1878, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1753)
            0.02900661 = weight(_text_:22 in 1753) [ClassicSimilarity], result of:
              0.02900661 = score(doc=1753,freq=2.0), product of:
                0.14994325 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042818543 = queryNorm
                0.19345059 = fieldWeight in 1753, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1753)
          0.6666667 = coord(2/3)
      0.25 = coord(1/4)
    
    Abstract
    This book offers a comprehensive and consistent mathematical approach to information retrieval (IR), without which no implementation is possible, and sheds entirely new light on the structure of IR models. It contains descriptions of all IR models in a unified formal style and language, along with examples for each, thus offering a comprehensive overview of them. The book also creates mathematical foundations and a consistent mathematical theory (including all mathematical results achieved so far) of IR as a stand-alone mathematical discipline, which can thus be read and taught independently. In addition, the book contains all the mathematical knowledge on which IR relies, so that the reader need not search different sources. The book will be of interest to computer or information scientists, librarians, mathematicians, undergraduate students, and researchers whose work involves information retrieval.
    Date
    22. 3.2008 12:26:32
    Series
    Mathematical modelling: theory and applications; 12

Languages

  • e 81
  • d 12
  • pt 1

Types

  • a 87
  • m 5
  • el 2
  • r 1
  • s 1
  • x 1