Search (61 results, page 2 of 4)

  • × theme_ss:"Retrievalalgorithmen"
  • × year_i:[2010 TO 2020}
  1. Oberhauser, O.: Relevance Ranking in den Online-Katalogen der "nächsten Generation" (2010) 0.01
    0.007585435 = product of:
      0.01517087 = sum of:
        0.01517087 = product of:
          0.022756305 = sum of:
            0.004032909 = weight(_text_:a in 4308) [ClassicSimilarity], result of:
              0.004032909 = score(doc=4308,freq=2.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.07643694 = fieldWeight in 4308, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4308)
            0.018723397 = weight(_text_:h in 4308) [ClassicSimilarity], result of:
              0.018723397 = score(doc=4308,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.16469726 = fieldWeight in 4308, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4308)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Source
    Mitteilungen der Vereinigung Österreichischer Bibliothekarinnen und Bibliothekare. 63(2010) H.1/2, S.25-37
    Type
    a
  2. Bhansali, D.; Desai, H.; Deulkar, K.: ¬A study of different ranking approaches for semantic search (2015) 0.01
    0.0071412786 = product of:
      0.014282557 = sum of:
        0.014282557 = product of:
          0.021423835 = sum of:
            0.0058210026 = weight(_text_:a in 2696) [ClassicSimilarity], result of:
              0.0058210026 = score(doc=2696,freq=6.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.11032722 = fieldWeight in 2696, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2696)
            0.015602832 = weight(_text_:h in 2696) [ClassicSimilarity], result of:
              0.015602832 = score(doc=2696,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.13724773 = fieldWeight in 2696, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2696)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Search Engines have become an integral part of our day to day life. Our reliance on search engines increases with every passing day. With the amount of data available on Internet increasing exponentially, it becomes important to develop new methods and tools that help to return results relevant to the queries and reduce the time spent on searching. The results should be diverse but at the same time should return results focused on the queries asked. Relation Based Page Rank [4] algorithms are considered to be the next frontier in improvement of Semantic Web Search. The probability of finding relevance in the search results as posited by the user while entering the query is used to measure the relevance. However, its application is limited by the complexity of determining relation between the terms and assigning explicit meaning to each term. Trust Rank is one of the most widely used ranking algorithms for semantic web search. Few other ranking algorithms like HITS algorithm, PageRank algorithm are also used for Semantic Web Searching. In this paper, we will provide a comparison of few ranking approaches.
    Type
    a
  3. Tsai, C.-F.; Hu, Y.-H.; Chen, Z.-Y.: Factors affecting rocchio-based pseudorelevance feedback in image retrieval (2015) 0.01
    0.0067852205 = product of:
      0.013570441 = sum of:
        0.013570441 = product of:
          0.02035566 = sum of:
            0.0047528287 = weight(_text_:a in 1607) [ClassicSimilarity], result of:
              0.0047528287 = score(doc=1607,freq=4.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.090081796 = fieldWeight in 1607, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1607)
            0.015602832 = weight(_text_:h in 1607) [ClassicSimilarity], result of:
              0.015602832 = score(doc=1607,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.13724773 = fieldWeight in 1607, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1607)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Pseudorelevance feedback (PRF) was proposed to solve the limitation of relevance feedback (RF), which is based on the user-in-the-loop process. In PRF, the top-k retrieved images are regarded as PRF. Although the PRF set contains noise, PRF has proven effective for automatically improving the overall retrieval result. To implement PRF, the Rocchio algorithm has been considered as a reasonable and well-established baseline. However, the performance of Rocchio-based PRF is subject to various representation choices (or factors). In this article, we examine these factors that affect the performance of Rocchio-based PRF, including image-feature representation, the number of top-ranked images, the weighting parameters of Rocchio, and similarity measure. We offer practical insights on how to optimize the performance of Rocchio-based PRF by choosing appropriate representation choices. Our extensive experiments on NUS-WIDE-LITE and Caltech 101 + Corel 5000 data sets show that the optimal feature representation is color moment + wavelet texture in terms of retrieval efficiency and effectiveness. Other representation choices are that using top-20 ranked images as pseudopositive and pseudonegative feedback sets with the equal weight (i.e., 0.5) by the correlation and cosine distance functions can produce the optimal retrieval result.
    Type
    a
  4. Xu, B.; Lin, H.; Lin, Y.: Assessment of learning to rank methods for query expansion (2016) 0.01
    0.0067852205 = product of:
      0.013570441 = sum of:
        0.013570441 = product of:
          0.02035566 = sum of:
            0.0047528287 = weight(_text_:a in 2929) [ClassicSimilarity], result of:
              0.0047528287 = score(doc=2929,freq=4.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.090081796 = fieldWeight in 2929, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2929)
            0.015602832 = weight(_text_:h in 2929) [ClassicSimilarity], result of:
              0.015602832 = score(doc=2929,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.13724773 = fieldWeight in 2929, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2929)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    Pseudo relevance feedback, as an effective query expansion method, can significantly improve information retrieval performance. However, the method may negatively impact the retrieval performance when some irrelevant terms are used in the expanded query. Therefore, it is necessary to refine the expansion terms. Learning to rank methods have proven effective in information retrieval to solve ranking problems by ranking the most relevant documents at the top of the returned list, but few attempts have been made to employ learning to rank methods for term refinement in pseudo relevance feedback. This article proposes a novel framework to explore the feasibility of using learning to rank to optimize pseudo relevance feedback by means of reranking the candidate expansion terms. We investigate some learning approaches to choose the candidate terms and introduce some state-of-the-art learning to rank methods to refine the expansion terms. In addition, we propose two term labeling strategies and examine the usefulness of various term features to optimize the framework. Experimental results with three TREC collections show that our framework can effectively improve retrieval performance.
    Type
    a
  5. Ayadi, H.; Torjmen-Khemakhem, M.; Daoud, M.; Xiangji Huang, J.; Ben Jemaa, M.: MF-Re-Rank : a modality feature-based re-ranking model for medical image retrieval (2018) 0.01
    0.0065318826 = product of:
      0.013063765 = sum of:
        0.013063765 = product of:
          0.019595647 = sum of:
            0.0071133827 = weight(_text_:a in 4459) [ClassicSimilarity], result of:
              0.0071133827 = score(doc=4459,freq=14.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.13482209 = fieldWeight in 4459, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4459)
            0.012482265 = weight(_text_:h in 4459) [ClassicSimilarity], result of:
              0.012482265 = score(doc=4459,freq=2.0), product of:
                0.113683715 = queryWeight, product of:
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.045758117 = queryNorm
                0.10979818 = fieldWeight in 4459, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.4844491 = idf(docFreq=10020, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4459)
          0.6666667 = coord(2/3)
      0.5 = coord(1/2)
    
    Abstract
    One of the main challenges in medical image retrieval is the increasing volume of image data, which render it difficult for domain experts to find relevant information from large data sets. Effective and efficient medical image retrieval systems are required to better manage medical image information. Text-based image retrieval (TBIR) was very successful in retrieving images with textual descriptions. Several TBIR approaches rely on models based on bag-of-words approaches, in which the image retrieval problem turns into one of standard text-based information retrieval; where the meanings and values of specific medical entities in the text and metadata are ignored in the image representation and retrieval process. However, we believe that TBIR should extract specific medical entities and terms and then exploit these elements to achieve better image retrieval results. Therefore, we propose a novel reranking method based on medical-image-dependent features. These features are manually selected by a medical expert from imaging modalities and medical terminology. First, we represent queries and images using only medical-image-dependent features such as image modality and image scale. Second, we exploit the defined features in a new reranking method for medical image retrieval. Our motivation is the large influence of image modality in medical image retrieval and its impact on image-relevance scores. To evaluate our approach, we performed a series of experiments on the medical ImageCLEF data sets from 2009 to 2013. The BM25 model, a language model, and an image-relevance feedback model are used as baselines to evaluate our approach. The experimental results show that compared to the BM25 model, the proposed model significantly enhances image retrieval performance. We also compared our approach with other state-of-the-art approaches and show that our approach performs comparably to those of the top three runs in the official ImageCLEF competition.
    Type
    a
  6. Bauckhage, C.: Marginalizing over the PageRank damping factor (2014) 0.00
    0.0022405048 = product of:
      0.0044810097 = sum of:
        0.0044810097 = product of:
          0.013443029 = sum of:
            0.013443029 = weight(_text_:a in 928) [ClassicSimilarity], result of:
              0.013443029 = score(doc=928,freq=8.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.25478977 = fieldWeight in 928, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.078125 = fieldNorm(doc=928)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    In this note, we show how to marginalize over the damping parameter of the PageRank equation so as to obtain a parameter-free version known as TotalRank. Our discussion is meant as a reference and intended to provide a guided tour towards an interesting result that has applications in information retrieval and classification.
    Type
    a
  7. Silva, R.M.; Gonçalves, M.A.; Veloso, A.: ¬A Two-stage active learning method for learning to rank (2014) 0.00
    0.002019564 = product of:
      0.004039128 = sum of:
        0.004039128 = product of:
          0.012117383 = sum of:
            0.012117383 = weight(_text_:a in 1184) [ClassicSimilarity], result of:
              0.012117383 = score(doc=1184,freq=26.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.22966442 = fieldWeight in 1184, product of:
                  5.0990195 = tf(freq=26.0), with freq of:
                    26.0 = termFreq=26.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1184)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Learning to rank (L2R) algorithms use a labeled training set to generate a ranking model that can later be used to rank new query results. These training sets are costly and laborious to produce, requiring human annotators to assess the relevance or order of the documents in relation to a query. Active learning algorithms are able to reduce the labeling effort by selectively sampling an unlabeled set and choosing data instances that maximize a learning function's effectiveness. In this article, we propose a novel two-stage active learning method for L2R that combines and exploits interesting properties of its constituent parts, thus being effective and practical. In the first stage, an association rule active sampling algorithm is used to select a very small but effective initial training set. In the second stage, a query-by-committee strategy trained with the first-stage set is used to iteratively select more examples until a preset labeling budget is met or a target effectiveness is achieved. We test our method with various LETOR benchmarking data sets and compare it with several baselines to show that it achieves good results using only a small portion of the original training sets.
    Type
    a
  8. Efron, M.: Linear time series models for term weighting in information retrieval (2010) 0.00
    0.0020164545 = product of:
      0.004032909 = sum of:
        0.004032909 = product of:
          0.012098727 = sum of:
            0.012098727 = weight(_text_:a in 3688) [ClassicSimilarity], result of:
              0.012098727 = score(doc=3688,freq=18.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.22931081 = fieldWeight in 3688, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3688)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Common measures of term importance in information retrieval (IR) rely on counts of term frequency; rare terms receive higher weight in document ranking than common terms receive. However, realistic scenarios yield additional information about terms in a collection. Of interest in this article is the temporal behavior of terms as a collection changes over time. We propose capturing each term's collection frequency at discrete time intervals over the lifespan of a corpus and analyzing the resulting time series. We hypothesize the collection frequency of a weakly discriminative term x at time t is predictable by a linear model of the term's prior observations. On the other hand, a linear time series model for a strong discriminators' collection frequency will yield a poor fit to the data. Operationalizing this hypothesis, we induce three time-based measures of term importance and test these against state-of-the-art term weighting models.
    Type
    a
  9. He, J.; Meij, E.; Rijke, M. de: Result diversification based on query-specific cluster ranking (2011) 0.00
    0.0019403342 = product of:
      0.0038806684 = sum of:
        0.0038806684 = product of:
          0.011642005 = sum of:
            0.011642005 = weight(_text_:a in 4355) [ClassicSimilarity], result of:
              0.011642005 = score(doc=4355,freq=24.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.22065444 = fieldWeight in 4355, product of:
                  4.8989797 = tf(freq=24.0), with freq of:
                    24.0 = termFreq=24.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4355)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Result diversification is a retrieval strategy for dealing with ambiguous or multi-faceted queries by providing documents that cover as many facets of the query as possible. We propose a result diversification framework based on query-specific clustering and cluster ranking, in which diversification is restricted to documents belonging to clusters that potentially contain a high percentage of relevant documents. Empirical results show that the proposed framework improves the performance of several existing diversification methods. The framework also gives rise to a simple yet effective cluster-based approach to result diversification that selects documents from different clusters to be included in a ranked list in a round robin fashion. We describe a set of experiments aimed at thoroughly analyzing the behavior of the two main components of the proposed diversification framework, ranking and selecting clusters for diversification. Both components have a crucial impact on the overall performance of our framework, but ranking clusters plays a more important role than selecting clusters. We also examine properties that clusters should have in order for our diversification framework to be effective. Most relevant documents should be contained in a small number of high-quality clusters, while there should be no dominantly large clusters. Also, documents from these high-quality clusters should have a diverse content. These properties are strongly correlated with the overall performance of the proposed diversification framework.
    Type
    a
  10. Efron, M.; Winget, M.: Query polyrepresentation for ranking retrieval systems without relevance judgments (2010) 0.00
    0.0019011315 = product of:
      0.003802263 = sum of:
        0.003802263 = product of:
          0.011406789 = sum of:
            0.011406789 = weight(_text_:a in 3469) [ClassicSimilarity], result of:
              0.011406789 = score(doc=3469,freq=16.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.2161963 = fieldWeight in 3469, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3469)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Ranking information retrieval (IR) systems with respect to their effectiveness is a crucial operation during IR evaluation, as well as during data fusion. This article offers a novel method of approaching the system-ranking problem, based on the widely studied idea of polyrepresentation. The principle of polyrepresentation suggests that a single information need can be represented by many query articulations-what we call query aspects. By skimming the top k (where k is small) documents retrieved by a single system for multiple query aspects, we collect a set of documents that are likely to be relevant to a given test topic. Labeling these skimmed documents as putatively relevant lets us build pseudorelevance judgments without undue human intervention. We report experiments where using these pseudorelevance judgments delivers a rank ordering of IR systems that correlates highly with rankings based on human relevance judgments.
    Type
    a
  11. Nunes, S.; Ribeiro, C.; David, G.: Term weighting based on document revision history (2011) 0.00
    0.0018577286 = product of:
      0.0037154572 = sum of:
        0.0037154572 = product of:
          0.011146371 = sum of:
            0.011146371 = weight(_text_:a in 4946) [ClassicSimilarity], result of:
              0.011146371 = score(doc=4946,freq=22.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.21126054 = fieldWeight in 4946, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4946)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    In real-world information retrieval systems, the underlying document collection is rarely stable or definitive. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a greater weight. We propose 4 term weighting functions that use each document's history to estimate a current term score. To evaluate this thesis, we conduct 3 independent experiments using a collection of documents sampled from Wikipedia. In the first experiment, we use data from Wikipedia to judge each set of terms. In a second experiment, we use an external collection of tags from a popular social bookmarking service as a gold standard. In the third experiment, we crowdsource user judgments to collect feedback on term preference. Across all experiments results consistently support our thesis. We show that temporally aware measures, specifically the proposed revision term frequency and revision term frequency span, outperform a term-weighting measure based on raw term frequency alone.
    Type
    a
  12. Cecchini, R.L.; Lorenzetti, C.M.; Maguitman, A.G.; Brignole, N.B.: Multiobjective evolutionary algorithms for context-based search (2010) 0.00
    0.0017783458 = product of:
      0.0035566916 = sum of:
        0.0035566916 = product of:
          0.010670074 = sum of:
            0.010670074 = weight(_text_:a in 3482) [ClassicSimilarity], result of:
              0.010670074 = score(doc=3482,freq=14.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.20223314 = fieldWeight in 3482, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3482)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Formulating high-quality queries is a key aspect of context-based search. However, determining the effectiveness of a query is challenging because multiple objectives, such as high precision and high recall, are usually involved. In this work, we study techniques that can be applied to evolve contextualized queries when the criteria for determining query quality are based on multiple objectives. We report on the results of three different strategies for evolving queries: (a) single-objective, (b) multiobjective with Pareto-based ranking, and (c) multiobjective with aggregative ranking. After a comprehensive evaluation with a large set of topics, we discuss the limitations of the single-objective approach and observe that both the Pareto-based and aggregative strategies are highly effective for evolving topical queries. In particular, our experiments lead us to conclude that the multiobjective techniques are superior to a baseline as well as to well-known and ad hoc query reformulation techniques.
    Type
    a
  13. Costa Carvalho, A. da; Rossi, C.; Moura, E.S. de; Silva, A.S. da; Fernandes, D.: LePrEF: Learn to precompute evidence fusion for efficient query evaluation (2012) 0.00
    0.0017712747 = product of:
      0.0035425494 = sum of:
        0.0035425494 = product of:
          0.010627648 = sum of:
            0.010627648 = weight(_text_:a in 278) [ClassicSimilarity], result of:
              0.010627648 = score(doc=278,freq=20.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.20142901 = fieldWeight in 278, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=278)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    State-of-the-art search engine ranking methods combine several distinct sources of relevance evidence to produce a high-quality ranking of results for each query. The fusion of information is currently done at query-processing time, which has a direct effect on the response time of search systems. Previous research also shows that an alternative to improve search efficiency in textual databases is to precompute term impacts at indexing time. In this article, we propose a novel alternative to precompute term impacts, providing a generic framework for combining any distinct set of sources of evidence by using a machine-learning technique. This method retains the advantages of producing high-quality results, but avoids the costs of combining evidence at query-processing time. Our method, called Learn to Precompute Evidence Fusion (LePrEF), uses genetic programming to compute a unified precomputed impact value for each term found in each document prior to query processing, at indexing time. Compared with previous research on precomputing term impacts, our method offers the advantage of providing a generic framework to precompute impact using any set of relevance evidence at any text collection, whereas previous research articles do not. The precomputed impact values are indexed and used later for computing document ranking at query-processing time. By doing so, our method effectively reduces the query processing to simple additions of such impacts. We show that this approach, while leading to results comparable to state-of-the-art ranking methods, also can lead to a significant decrease in computational costs during query processing.
    Type
    a
  14. Dang, E.K.F.; Luk, R.W.P.; Allan, J.; Ho, K.S.; Chung, K.F.L.; Lee, D.L.: ¬A new context-dependent term weight computed by boost and discount using relevance information (2010) 0.00
    0.0016803787 = product of:
      0.0033607574 = sum of:
        0.0033607574 = product of:
          0.010082272 = sum of:
            0.010082272 = weight(_text_:a in 4120) [ClassicSimilarity], result of:
              0.010082272 = score(doc=4120,freq=18.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.19109234 = fieldWeight in 4120, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4120)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    We studied the effectiveness of a new class of context-dependent term weights for information retrieval. Unlike the traditional term frequency-inverse document frequency (TF-IDF), the new weighting of a term t in a document d depends not only on the occurrence statistics of t alone but also on the terms found within a text window (or "document-context") centered on t. We introduce a Boost and Discount (B&D) procedure which utilizes partial relevance information to compute the context-dependent term weights of query terms according to a logistic regression model. We investigate the effectiveness of the new term weights compared with the context-independent BM25 weights in the setting of relevance feedback. We performed experiments with title queries of the TREC-6, -7, -8, and 2005 collections, comparing the residual Mean Average Precision (MAP) measures obtained using B&D term weights and those obtained by a baseline using BM25 weights. Given either 10 or 20 relevance judgments of the top retrieved documents, using the new term weights yields improvement over the baseline for all collections tested. The MAP obtained with the new weights has relative improvement over the baseline by 3.3 to 15.2%, with statistical significance at the 95% confidence level across all four collections.
    Type
    a
  15. Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.00
    0.0016803787 = product of:
      0.0033607574 = sum of:
        0.0033607574 = product of:
          0.010082272 = sum of:
            0.010082272 = weight(_text_:a in 1338) [ClassicSimilarity], result of:
              0.010082272 = score(doc=1338,freq=18.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.19109234 = fieldWeight in 1338, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1338)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations that infer two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
    Type
    a
  16. Van der Veer Martens, B.; Fleet, C. van: Opening the black box of "relevance work" : a domain analysis (2012) 0.00
    0.0016464281 = product of:
      0.0032928563 = sum of:
        0.0032928563 = product of:
          0.009878568 = sum of:
            0.009878568 = weight(_text_:a in 247) [ClassicSimilarity], result of:
              0.009878568 = score(doc=247,freq=12.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.18723148 = fieldWeight in 247, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=247)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    In response to Hjørland's recent call for a reconceptualization of the foundations of relevance, we suggest that the sociocognitive aspects of intermediation by information agencies, such as archives and libraries, are a necessary and unexplored part of the infrastructure of the subject knowledge domains central to his recommended "view of relevance informed by a social paradigm" (2010, p. 217). From a comparative analysis of documents from 39 graduate-level introductory courses in archives, reference, and strategic/competitive intelligence taught in 13 American Library Association-accredited library and information science (LIS) programs, we identify four defining sociocognitive dimensions of "relevance work" in information agencies within Hjørland's proposed framework for relevance: tasks, time, systems, and assessors. This study is intended to supply sociocognitive content from within the relevance work domain to support further domain analytic research, and to emphasize the importance of intermediary relevance work for all subject knowledge domains.
    Type
    a
  17. Karlsson, A.; Hammarfelt, B.; Steinhauer, H.J.; Falkman, G.; Olson, N.; Nelhans, G.; Nolin, J.: Modeling uncertainty in bibliometrics and information retrieval : an information fusion approach (2015) 0.00
    0.0015842763 = product of:
      0.0031685526 = sum of:
        0.0031685526 = product of:
          0.0095056575 = sum of:
            0.0095056575 = weight(_text_:a in 1696) [ClassicSimilarity], result of:
              0.0095056575 = score(doc=1696,freq=4.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.18016359 = fieldWeight in 1696, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1696)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Type
    a
  18. Fu, X.: Towards a model of implicit feedback for Web search (2010) 0.00
    0.0015029765 = product of:
      0.003005953 = sum of:
        0.003005953 = product of:
          0.009017859 = sum of:
            0.009017859 = weight(_text_:a in 3310) [ClassicSimilarity], result of:
              0.009017859 = score(doc=3310,freq=10.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.1709182 = fieldWeight in 3310, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3310)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    This research investigated several important issues in using implicit feedback techniques to assist searchers with difficulties in formulating effective search strategies. It focused on examining the relationship between types of behavioral evidence that can be captured from Web searches and searchers' interests. A carefully crafted observation study was conducted to capture, examine, and elucidate the analytical processes and work practices of human analysts when they simulated the role of an implicit feedback system by trying to infer searchers' interests from behavioral traces. Findings provided rare insight into the complexities and nuances in using behavioral evidence for implicit feedback and led to the proposal of an implicit feedback model for Web search that bridged previous studies on behavioral evidence and implicit feedback measures. A new level of analysis termed an analytical lens emerged from the data and provides a road map for future research on this topic.
    Type
    a
  19. Moura, E.S. de; Fernandes, D.; Ribeiro-Neto, B.; Silva, A.S. da; Gonçalves, M.A.: Using structural information to improve search in Web collections (2010) 0.00
    0.0015029765 = product of:
      0.003005953 = sum of:
        0.003005953 = product of:
          0.009017859 = sum of:
            0.009017859 = weight(_text_:a in 4119) [ClassicSimilarity], result of:
              0.009017859 = score(doc=4119,freq=10.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.1709182 = fieldWeight in 4119, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4119)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    In this work, we investigate the problem of using the block structure of Web pages to improve ranking results. Starting with basic intuitions provided by the concepts of term frequency (TF) and inverse document frequency (IDF), we propose nine block-weight functions to distinguish the impact of term occurrences inside page blocks, instead of inside whole pages. These are then used to compute a modified BM25 ranking function. Using four distinct Web collections, we ran extensive experiments to compare our block-weight ranking formulas with two other baselines: (a) a BM25 ranking applied to full pages, and (b) a BM25 ranking that takes into account best blocks. Our methods suggest that our block-weighting ranking method is superior to all baselines across all collections we used and that average gain in precision figures from 5 to 20% are generated.
    Type
    a
  20. Lee, J.; Min, J.-K.; Oh, A.; Chung, C.-W.: Effective ranking and search techniques for Web resources considering semantic relationships (2014) 0.00
    0.0014819547 = product of:
      0.0029639094 = sum of:
        0.0029639094 = product of:
          0.008891728 = sum of:
            0.008891728 = weight(_text_:a in 2670) [ClassicSimilarity], result of:
              0.008891728 = score(doc=2670,freq=14.0), product of:
                0.052761257 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.045758117 = queryNorm
                0.1685276 = fieldWeight in 2670, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2670)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    On the Semantic Web, the types of resources and the semantic relationships between resources are defined in an ontology. By using that information, the accuracy of information retrieval can be improved. In this paper, we present effective ranking and search techniques considering the semantic relationships in an ontology. Our technique retrieves top-k resources which are the most relevant to query keywords through the semantic relationships. To do this, we propose a weighting measure for the semantic relationship. Based on this measure, we propose a novel ranking method which considers the number of meaningful semantic relationships between a resource and keywords as well as the coverage and discriminating power of keywords. In order to improve the efficiency of the search, we prune the unnecessary search space using the length and weight thresholds of the semantic relationship path. In addition, we exploit Threshold Algorithm based on an extended inverted index to answer top-k results efficiently. The experimental results using real data sets demonstrate that our retrieval method using the semantic information generates accurate results efficiently compared to the traditional methods.
    Content
    Vgl.: doi: 10.1016/j.ipm.2013.08.007. A short preliminary version of this paper was published in the proceeding of WWW 2009 as a two page poster paper.
    Type
    a

Languages

  • e 51
  • d 10

Types

  • a 59
  • el 2
  • r 1
  • More… Less…