Search (60 results, page 3 of 3)

  • theme_ss:"Retrievalalgorithmen"
  • year_i:[2010 TO 2020}
  1. Liu, X.; Turtle, H.: Real-time user interest modeling for real-time ranking (2013) 0.00
    4.6303135E-4 = product of:
      0.00694547 = sum of:
        0.00694547 = weight(_text_:in in 1035) [ClassicSimilarity], result of:
          0.00694547 = score(doc=1035,freq=6.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.1561842 = fieldWeight in 1035, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1035)
      0.06666667 = coord(1/15)
    
    Abstract
    User interest as a very dynamic information need is often ignored in most existing information retrieval systems. In this research, we present the results of experiments designed to evaluate the performance of a real-time interest model (RIM) that attempts to identify the dynamic and changing query level interests regarding social media outputs. Unlike most existing ranking methods, our ranking approach targets calculation of the probability that user interest in the content of the document is subject to very dynamic user interest change. We describe 2 formulations of the model (real-time interest vector space and real-time interest language model) stemming from classical relevance ranking methods and develop a novel methodology for evaluating the performance of RIM using Amazon Mechanical Turk to collect (interest-based) relevance judgments on a daily basis. Our results show that the model usually, although not always, performs better than baseline results obtained from commercial web search engines. We identify factors that affect RIM performance and outline plans for future research.
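
The score breakdowns shown with each entry follow Lucene's ClassicSimilarity explain format: a term's contribution is its fieldWeight (tf x idf x fieldNorm) multiplied by its queryWeight (idf x queryNorm), and the per-term sum is scaled by the coordination factor. A minimal sketch reproducing the numbers of the first entry above (the helper name and the code are mine; the input values are taken directly from the explain output):

```python
import math

def classic_similarity_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Per-term contribution as in Lucene's ClassicSimilarity explain output:
    (tf * idf * fieldNorm) * (idf * queryNorm)."""
    tf = math.sqrt(freq)                               # 2.4494898 for freq=6.0
    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # 1.3602545 for docFreq=30841, maxDocs=44218
    query_weight = idf * query_norm                    # 0.044469737
    field_weight = tf * idf * field_norm               # 0.1561842
    return field_weight * query_weight                 # 0.00694547

# Values from the explain output of result 1 (doc 1035):
term_score = classic_similarity_term_score(
    freq=6.0, doc_freq=30841, max_docs=44218,
    query_norm=0.032692216, field_norm=0.046875)
print(term_score * (1 / 15))   # coord(1/15) applied: ~4.63e-4, the listed score
```
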
  2. Costa Carvalho, A. da; Rossi, C.; Moura, E.S. de; Silva, A.S. da; Fernandes, D.: LePrEF: Learn to precompute evidence fusion for efficient query evaluation (2012) 0.00
    4.4555214E-4 = product of:
      0.0066832816 = sum of:
        0.0066832816 = weight(_text_:in in 278) [ClassicSimilarity], result of:
          0.0066832816 = score(doc=278,freq=8.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.15028831 = fieldWeight in 278, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=278)
      0.06666667 = coord(1/15)
    
    Abstract
    State-of-the-art search engine ranking methods combine several distinct sources of relevance evidence to produce a high-quality ranking of results for each query. The fusion of information is currently done at query-processing time, which has a direct effect on the response time of search systems. Previous research also shows that an alternative to improve search efficiency in textual databases is to precompute term impacts at indexing time. In this article, we propose a novel alternative to precompute term impacts, providing a generic framework for combining any distinct set of sources of evidence by using a machine-learning technique. This method retains the advantages of producing high-quality results, but avoids the costs of combining evidence at query-processing time. Our method, called Learn to Precompute Evidence Fusion (LePrEF), uses genetic programming to compute a unified precomputed impact value for each term found in each document prior to query processing, at indexing time. Compared with previous research on precomputing term impacts, our method offers the advantage of providing a generic framework to precompute impact using any set of relevance evidence at any text collection, whereas previous research articles do not. The precomputed impact values are indexed and used later for computing document ranking at query-processing time. By doing so, our method effectively reduces the query processing to simple additions of such impacts. We show that this approach, while leading to results comparable to state-of-the-art ranking methods, also can lead to a significant decrease in computational costs during query processing.
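
The central idea of LePrEF, fusing all relevance evidence into a single precomputed impact per term-document pair so that query processing reduces to additions, can be illustrated with a toy sketch. The evidence fields and the hand-written fusion function below are hypothetical stand-ins for the function LePrEF learns with genetic programming:

```python
from collections import defaultdict

# Toy evidence per (term, doc): term frequency, anchor-text hits, a PageRank-like prior.
evidence = {
    ("ranking", "d1"): {"tf": 3, "anchor": 1, "prior": 0.8},
    ("ranking", "d2"): {"tf": 1, "anchor": 0, "prior": 0.4},
    ("fusion",  "d1"): {"tf": 2, "anchor": 0, "prior": 0.8},
}

def fuse(ev):
    # Hypothetical fusion function; LePrEF would learn this with genetic programming.
    return 1.5 * ev["tf"] + 2.0 * ev["anchor"] + 1.0 * ev["prior"]

# Indexing time: store one precomputed, unified impact per posting.
impact_index = defaultdict(dict)
for (term, doc), ev in evidence.items():
    impact_index[term][doc] = fuse(ev)

# Query time: ranking is reduced to summing the precomputed impacts.
def rank(query_terms):
    scores = defaultdict(float)
    for term in query_terms:
        for doc, impact in impact_index.get(term, {}).items():
            scores[doc] += impact
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(rank(["ranking", "fusion"]))   # d1 outranks d2 by its summed precomputed impacts
```
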
  3. Silva, R.M.; Gonçalves, M.A.; Veloso, A.: ¬A Two-stage active learning method for learning to rank (2014) 0.00
    4.4555214E-4 = product of:
      0.0066832816 = sum of:
        0.0066832816 = weight(_text_:in in 1184) [ClassicSimilarity], result of:
          0.0066832816 = score(doc=1184,freq=8.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.15028831 = fieldWeight in 1184, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1184)
      0.06666667 = coord(1/15)
    
    Abstract
    Learning to rank (L2R) algorithms use a labeled training set to generate a ranking model that can later be used to rank new query results. These training sets are costly and laborious to produce, requiring human annotators to assess the relevance or order of the documents in relation to a query. Active learning algorithms are able to reduce the labeling effort by selectively sampling an unlabeled set and choosing data instances that maximize a learning function's effectiveness. In this article, we propose a novel two-stage active learning method for L2R that combines and exploits interesting properties of its constituent parts, thus being effective and practical. In the first stage, an association rule active sampling algorithm is used to select a very small but effective initial training set. In the second stage, a query-by-committee strategy trained with the first-stage set is used to iteratively select more examples until a preset labeling budget is met or a target effectiveness is achieved. We test our method with various LETOR benchmarking data sets and compare it with several baselines to show that it achieves good results using only a small portion of the original training sets.
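
A rough sketch of the second stage described above: a query-by-committee loop that keeps asking for labels on the most contested examples until the labeling budget is met. The committee construction, the disagreement measure, and the random seed set standing in for the first-stage association-rule sampling are all illustrative simplifications:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X_pool = rng.normal(size=(500, 10))                        # unlabeled query-document feature vectors
y_pool = (X_pool @ rng.normal(size=10) > 0).astype(int)    # hidden relevance labels (the oracle)

# Stage 1 (simplified): a small seed set; the paper uses association-rule active sampling here.
labeled = list(rng.choice(len(X_pool), size=10, replace=False))
budget = 50

# Stage 2: query-by-committee until the labeling budget is met.
while len(labeled) < budget:
    committee = []
    for _ in range(5):                           # bootstrap committee members
        idx = rng.choice(labeled, size=len(labeled), replace=True)
        committee.append(DecisionTreeClassifier(max_depth=3).fit(X_pool[idx], y_pool[idx]))
    votes = np.stack([m.predict(X_pool) for m in committee])
    p = votes.mean(axis=0)                       # fraction of committee votes for "relevant"
    disagreement = -(p * np.log(p + 1e-9) + (1 - p) * np.log(1 - p + 1e-9))  # vote entropy
    disagreement[labeled] = -np.inf              # never re-select already labeled examples
    labeled.append(int(disagreement.argmax()))   # query the oracle for the most contested example

print(f"labeled {len(labeled)} of {len(X_pool)} examples")
```
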
  4. Bar-Ilan, J.; Levene, M.: ¬The hw-rank : an h-index variant for ranking web pages (2015) 0.00
    4.4555214E-4 = product of:
      0.0066832816 = sum of:
        0.0066832816 = weight(_text_:in in 1694) [ClassicSimilarity], result of:
          0.0066832816 = score(doc=1694,freq=2.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.15028831 = fieldWeight in 1694, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.078125 = fieldNorm(doc=1694)
      0.06666667 = coord(1/15)
    
    Footnote
    Contribution in a special issue "Combining bibliometrics and information retrieval"
  5. Bhansali, D.; Desai, H.; Deulkar, K.: ¬A study of different ranking approaches for semantic search (2015) 0.00
    4.4555214E-4 = product of:
      0.0066832816 = sum of:
        0.0066832816 = weight(_text_:in in 2696) [ClassicSimilarity], result of:
          0.0066832816 = score(doc=2696,freq=8.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.15028831 = fieldWeight in 2696, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2696)
      0.06666667 = coord(1/15)
    
    Abstract
    Search engines have become an integral part of our day-to-day lives, and our reliance on them increases with every passing day. With the amount of data available on the Internet growing exponentially, it becomes important to develop new methods and tools that help return results relevant to the queries and reduce the time spent searching. The results should be diverse but at the same time remain focused on the queries asked. Relation Based Page Rank [4] algorithms are considered to be the next frontier in the improvement of semantic web search. The probability of finding relevance in the search results, as posited by the user while entering the query, is used to measure relevance. However, their application is limited by the complexity of determining the relations between terms and of assigning an explicit meaning to each term. Trust Rank is one of the most widely used ranking algorithms for semantic web search; a few other ranking algorithms, such as the HITS and PageRank algorithms, are also used. In this paper, we provide a comparison of a few ranking approaches.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  6. Zhu, J.; Han, L.; Gou, Z.; Yuan, X.: ¬A fuzzy clustering-based denoising model for evaluating uncertainty in collaborative filtering recommender systems (2018) 0.00
    4.4555214E-4 = product of:
      0.0066832816 = sum of:
        0.0066832816 = weight(_text_:in in 4460) [ClassicSimilarity], result of:
          0.0066832816 = score(doc=4460,freq=8.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.15028831 = fieldWeight in 4460, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4460)
      0.06666667 = coord(1/15)
    
    Abstract
    Recommender systems are effective in predicting the most suitable products for users, such as movies and books. To facilitate personalized recommendations, the quality of item ratings should be guaranteed. However, a few ratings might not be accurate enough due to the uncertainty of user behavior and are referred to as natural noise. In this article, we present a novel fuzzy clustering-based method for detecting noisy ratings. The entropy of a subset of the original ratings dataset is used to indicate the data-driven uncertainty, and evaluation metrics are adopted to represent the prediction-driven uncertainty. After the repetition of resampling and the execution of a recommendation algorithm, the entropy and evaluation metrics vectors are obtained and are empirically categorized to identify the proportion of the potential noise. Then, the fuzzy C-means-based denoising (FCMD) algorithm is performed to verify the natural noise under the assumption that natural noise is primarily the result of the exceptional behavior of users. Finally, a case study is performed using two real-world datasets. The experimental results show that our proposal outperforms previous proposals and has an advantage in dealing with natural noise.
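
A compact sketch of the fuzzy C-means step that the FCMD idea builds on: soft cluster memberships are computed for rating-derived feature vectors, and the smaller ("exceptional behavior") cluster is flagged as potentially noisy. The feature construction and the final noise rule are illustrative assumptions, not the paper's exact criteria:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Plain fuzzy C-means: returns the soft membership matrix U (n x c) and the cluster centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        um = U ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)   # u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
    return U, centers

# Illustrative rating events: (user mean-offset, item mean-offset, raw rating), mostly ordinary
# behavior plus a handful of exceptional events that the model should flag.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (95, 3)), rng.normal(2.5, 0.3, (5, 3))])
U, _ = fuzzy_c_means(X, c=2)
exceptional = int(U.sum(axis=0).argmin())          # smaller cluster taken as exceptional behavior
suspects = np.where(U[:, exceptional] > 0.8)[0]
print("potentially noisy ratings:", suspects)
```
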
  7. Abdelkareem, M.A.A.: In terms of publication index, what indicator is the best for researchers indexing, Google Scholar, Scopus, Clarivate or others? (2018) 0.00
    4.410741E-4 = product of:
      0.0066161114 = sum of:
        0.0066161114 = weight(_text_:in in 4548) [ClassicSimilarity], result of:
          0.0066161114 = score(doc=4548,freq=4.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.14877784 = fieldWeight in 4548, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4548)
      0.06666667 = coord(1/15)
    
    Abstract
    I believe that Google Scholar is the most popular academic indexing service for researchers and citations. However, some other indexing institutions may be more professional than Google Scholar, though not as popular. Other indexing websites, such as Scopus and Clarivate, provide more statistical figures for scholars, institutions, or even journals. With regard to publication citations, Google Scholar typically shows higher citation counts for a paper than other indexing websites, since it considers most publication platforms and can therefore count citations easily, whereas other databases only count citations coming from journals that are already indexed in their database.
  8. Bilal, D.: Ranking, relevance judgment, and precision of information retrieval on children's queries : evaluation of Google, Yahoo!, Bing, Yahoo! Kids, and Ask Kids (2012) 0.00
    3.9851398E-4 = product of:
      0.0059777093 = sum of:
        0.0059777093 = weight(_text_:in in 393) [ClassicSimilarity], result of:
          0.0059777093 = score(doc=393,freq=10.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.13442196 = fieldWeight in 393, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=393)
      0.06666667 = coord(1/15)
    
    Abstract
    This study employed benchmarking and intellectual relevance judgment in evaluating Google, Yahoo!, Bing, Yahoo! Kids, and Ask Kids on 30 queries that children formulated to find information for specific tasks. Retrieved hits on given queries were benchmarked to Google's and Yahoo! Kids' top-five ranked hits retrieved. Relevancy of hits was judged on a graded scale; precision was calculated using the precision-at-ten metric (P@10). Yahoo! and Bing produced a similar percentage in hit overlap with Google (nearly 30%), but differed in the ranking of hits. Ask Kids retrieved 11% in hit overlap with Google versus 3% by Yahoo! Kids. The engines retrieved 26 hits across query clusters that overlapped with Yahoo! Kids' top-five ranked hits. Precision (P) that the engines produced across the queries was P = 0.48 for relevant hits, and P = 0.28 for partially relevant hits. Precision by Ask Kids was P = 0.44 for relevant hits versus P = 0.21 by Yahoo! Kids. Bing produced the highest total precision (TP) of relevant hits (TP = 0.86) across the queries, and Yahoo! Kids yielded the lowest (TP = 0.47). Average precision (AP) of relevant hits was AP = 0.56 by leading engines versus AP = 0.29 by small engines. In contrast, average precision of partially relevant hits was AP = 0.83 by small engines versus AP = 0.33 by leading engines. Average precision of relevant hits across the engines was highest on two-word queries and lowest on one-word queries. Google performed best on natural language queries; Bing did the same (P = 0.69) on two-word queries. The findings have implications for search engine ranking algorithms, relevance theory, search engine design, research design, and information literacy.
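
The precision-at-ten metric (P@10) used throughout this evaluation is simple to compute; a minimal helper is sketched below (treating a judgment at or above a chosen grade as relevant is an assumption about how the graded scale was binarized):

```python
def precision_at_k(ranked_hits, judgments, k=10, threshold=1):
    """P@k: fraction of the top-k hits whose graded judgment is at or above `threshold`."""
    top = ranked_hits[:k]
    return sum(1 for hit in top if judgments.get(hit, 0) >= threshold) / k

# Example: 10 ranked hits graded 0 (not relevant), 1 (partially relevant), 2 (relevant).
judgments = {"h1": 2, "h2": 0, "h3": 1, "h4": 2, "h5": 0,
             "h6": 2, "h7": 1, "h8": 0, "h9": 2, "h10": 0}
ranked = [f"h{i}" for i in range(1, 11)]
print(precision_at_k(ranked, judgments, threshold=2))   # 0.4 counting only fully relevant hits
print(precision_at_k(ranked, judgments, threshold=1))   # 0.6 counting partially relevant hits too
```
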
  9. Ozdemiray, A.M.; Altingovde, I.S.: Explicit search result diversification using score and rank aggregation methods (2015) 0.00
    3.8585952E-4 = product of:
      0.0057878923 = sum of:
        0.0057878923 = weight(_text_:in in 1856) [ClassicSimilarity], result of:
          0.0057878923 = score(doc=1856,freq=6.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.1301535 = fieldWeight in 1856, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1856)
      0.06666667 = coord(1/15)
    
    Abstract
    Search result diversification is one of the key techniques to cope with the ambiguous and underspecified information needs of web users. In the last few years, strategies that are based on the explicit knowledge of query aspects emerged as highly effective ways of diversifying search results. Our contributions in this article are two-fold. First, we extensively evaluate the performance of a state-of-the-art explicit diversification strategy and pin-point its potential weaknesses. We propose basic yet novel optimizations to remedy these weaknesses and boost the performance of this algorithm. As a second contribution, inspired by the success of the current diversification strategies that exploit the relevance of the candidate documents to individual query aspects, we cast the diversification problem into the problem of ranking aggregation. To this end, we propose to materialize the re-rankings of the candidate documents for each query aspect and then merge these rankings by adapting the score(-based) and rank(-based) aggregation methods. Our extensive experimental evaluations show that certain ranking aggregation methods are superior to existing explicit diversification strategies in terms of diversification effectiveness. Furthermore, these ranking aggregation methods have lower computational complexity than the state-of-the-art diversification strategies.
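
A small sketch of the rank-based aggregation step: each query aspect re-ranks the same candidate documents, and a Borda-style merge turns those per-aspect rankings into one diversified list. Borda count is only one of the aggregation methods the article evaluates, and the aspect rankings here are made up:

```python
from collections import defaultdict

# Per-aspect re-rankings of the same candidate set (best first); hypothetical data.
aspect_rankings = {
    "aspect_1": ["d3", "d1", "d2", "d4"],
    "aspect_2": ["d2", "d4", "d1", "d3"],
    "aspect_3": ["d3", "d2", "d4", "d1"],
}

def borda_merge(rankings):
    """Rank-based aggregation: a document earns (n - position) points per aspect ranking."""
    points = defaultdict(float)
    for ranking in rankings.values():
        n = len(ranking)
        for pos, doc in enumerate(ranking):
            points[doc] += n - pos
    return sorted(points, key=points.get, reverse=True)

print(borda_merge(aspect_rankings))   # ['d3', 'd2', 'd1', 'd4'] with this toy data
```
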
  10. Hoenkamp, E.; Bruza, P.: How everyday language can and will boost effective information retrieval (2015) 0.00
    3.8585952E-4 = product of:
      0.0057878923 = sum of:
        0.0057878923 = weight(_text_:in in 2123) [ClassicSimilarity], result of:
          0.0057878923 = score(doc=2123,freq=6.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.1301535 = fieldWeight in 2123, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2123)
      0.06666667 = coord(1/15)
    
    Abstract
    Typing 2 or 3 keywords into a browser has become an easy and efficient way to find information. Yet, typing even short queries becomes tedious on ever shrinking (virtual) keyboards. Meanwhile, speech processing is maturing rapidly, facilitating everyday language input. Also, wearable technology can inform users proactively by listening in on their conversations or processing their social media interactions. Given these developments, everyday language may soon become the new input of choice. We present an information retrieval (IR) algorithm specifically designed to accept everyday language. It integrates two paradigms of information retrieval, previously studied in isolation; one directed mainly at the surface structure of language, the other primarily at the underlying meaning. The integration was achieved by a Markov machine that encodes meaning by its transition graph, and surface structure by the language it generates. A rigorous evaluation of the approach showed, first, that it can compete with the quality of existing language models, second, that it is more effective the more verbose the input, and third, as a consequence, that it is promising for an imminent transition from keyword input, where the onus is on the user to formulate concise queries, to a modality where users can express more freely, more informal, and more natural their need for information in everyday language.
  11. Liu, X.; Zheng, W.; Fang, H.: ¬An exploration of ranking models and feedback method for related entity finding (2013) 0.00
    3.8585952E-4 = product of:
      0.0057878923 = sum of:
        0.0057878923 = weight(_text_:in in 2714) [ClassicSimilarity], result of:
          0.0057878923 = score(doc=2714,freq=6.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.1301535 = fieldWeight in 2714, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2714)
      0.06666667 = coord(1/15)
    
    Abstract
    Most existing search engines focus on document retrieval. However, information needs are certainly not limited to finding relevant documents. Instead, a user may want to find relevant entities such as persons and organizations. In this paper, we study the problem of related entity finding. Our goal is to rank entities based on their relevance to a structured query, which specifies an input entity, the type of related entities and the relation between the input and related entities. We first discuss a general probabilistic framework, derive six possible retrieval models to rank the related entities, and then compare these models both analytically and empirically. To further improve performance, we study the problem of feedback in the context of related entity finding. Specifically, we propose a mixture model based feedback method that can utilize the pseudo feedback entities to estimate an enriched model for the relation between the input and related entities. Experimental results over two standard TREC collections show that the derived relation generation model combined with a relation feedback method performs better than other models.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  12. Hubert, G.; Pitarch, Y.; Pinel-Sauvagnat, K.; Tournier, R.; Laporte, L.: TournaRank : when retrieval becomes document competition (2018) 0.00
    3.8585952E-4 = product of:
      0.0057878923 = sum of:
        0.0057878923 = weight(_text_:in in 5087) [ClassicSimilarity], result of:
          0.0057878923 = score(doc=5087,freq=6.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.1301535 = fieldWeight in 5087, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5087)
      0.06666667 = coord(1/15)
    
    Abstract
    Numerous feature-based models have been recently proposed by the information retrieval community. The capability of features to express different relevance facets (query- or document-dependent) can explain such a success story. Such models are most of the time supervised, thus requiring a learning phase. To leverage the advantages of feature-based representations of documents, we propose TournaRank, an unsupervised approach inspired by real-life game and sport competition principles. Documents compete against each other in tournaments using features as evidences of relevance. Tournaments are modeled as a sequence of matches, which involve pairs of documents playing in turn their features. Once a tournament is ended, documents are ranked according to their number of won matches during the tournament. This principle is generic since it can be applied to any collection type. It also provides great flexibility since different alternatives can be considered by changing the tournament type, the match rules, the feature set, or the strategies adopted by documents during matches. TournaRank was experimented on several collections to evaluate our model in different contexts and to compare it with related approaches such as Learning To Rank and fusion ones: the TREC Robust2004 collection for homogeneous documents, the TREC Web2014 (ClueWeb12) collection for heterogeneous web documents, and the LETOR3.0 collection for comparison with supervised feature-based models.
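
A schematic sketch of the tournament principle: documents meet in pairwise matches, each match is decided by a rule over the feature values they play, and the final ranking is by number of matches won. The round-robin schedule and the majority-of-features match rule are just one of the many configurations the paper allows:

```python
from itertools import combinations

# Candidate documents with feature vectors (e.g. BM25, PageRank, freshness); hypothetical values.
docs = {
    "d1": [2.1, 0.30, 0.9],
    "d2": [1.7, 0.55, 0.4],
    "d3": [2.5, 0.10, 0.2],
    "d4": [0.9, 0.60, 0.8],
}

def play_match(a, b):
    """Match rule (one possibility): the document winning on more features wins the match."""
    a_wins = sum(1 for fa, fb in zip(docs[a], docs[b]) if fa > fb)
    b_wins = sum(1 for fa, fb in zip(docs[a], docs[b]) if fb > fa)
    return a if a_wins >= b_wins else b

wins = {d: 0 for d in docs}
for a, b in combinations(docs, 2):        # round-robin tournament
    wins[play_match(a, b)] += 1

ranking = sorted(wins, key=wins.get, reverse=True)
print(ranking, wins)                      # documents ordered by matches won
```
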
  13. Jiang, J.-D.; Jiang, J.-Y.; Cheng, P.-J.: Cocluster hypothesis and ranking consistency for relevance ranking in web search (2019) 0.00
    3.8585952E-4 = product of:
      0.0057878923 = sum of:
        0.0057878923 = weight(_text_:in in 5247) [ClassicSimilarity], result of:
          0.0057878923 = score(doc=5247,freq=6.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.1301535 = fieldWeight in 5247, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5247)
      0.06666667 = coord(1/15)
    
    Abstract
    Conventional approaches to relevance ranking typically optimize ranking models by each query separately. The traditional cluster hypothesis also does not consider the dependency between related queries. The goal of this paper is to leverage similar search intents to perform ranking consistency so that the search performance can be improved accordingly. Different from the previous supervised approach, which learns relevance by click-through data, we propose a novel cocluster hypothesis to bridge the gap between relevance ranking and ranking consistency. A nearest-neighbors test is also designed to measure the extent to which the cocluster hypothesis holds. Based on the hypothesis, we further propose a two-stage unsupervised approach, in which two ranking heuristics and a cost function are developed to optimize the combination of consistency and uniqueness (or inconsistency). Extensive experiments have been conducted on a real and large-scale search engine log. The experimental results not only verify the applicability of the proposed cocluster hypothesis but also show that our approach is effective in boosting the retrieval performance of the commercial search engine and reaches a comparable performance to the supervised approach.
  14. Cecchini, R.L.; Lorenzetti, C.M.; Maguitman, A.G.; Brignole, N.B.: Multiobjective evolutionary algorithms for context-based search (2010) 0.00
    3.7806356E-4 = product of:
      0.005670953 = sum of:
        0.005670953 = weight(_text_:in in 3482) [ClassicSimilarity], result of:
          0.005670953 = score(doc=3482,freq=4.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.12752387 = fieldWeight in 3482, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=3482)
      0.06666667 = coord(1/15)
    
    Abstract
    Formulating high-quality queries is a key aspect of context-based search. However, determining the effectiveness of a query is challenging because multiple objectives, such as high precision and high recall, are usually involved. In this work, we study techniques that can be applied to evolve contextualized queries when the criteria for determining query quality are based on multiple objectives. We report on the results of three different strategies for evolving queries: (a) single-objective, (b) multiobjective with Pareto-based ranking, and (c) multiobjective with aggregative ranking. After a comprehensive evaluation with a large set of topics, we discuss the limitations of the single-objective approach and observe that both the Pareto-based and aggregative strategies are highly effective for evolving topical queries. In particular, our experiments lead us to conclude that the multiobjective techniques are superior to a baseline as well as to well-known and ad hoc query reformulation techniques.
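
The Pareto-based strategy keeps the candidate queries that no other candidate dominates on all objectives. A minimal sketch of that dominance test over (precision, recall) pairs, with made-up evaluation numbers:

```python
# Candidate reformulated queries with their evaluated (precision, recall); hypothetical values.
candidates = {
    "q1": (0.80, 0.20),
    "q2": (0.60, 0.50),
    "q3": (0.55, 0.45),   # dominated by q2 on both objectives
    "q4": (0.30, 0.70),
}

def dominates(a, b):
    """a dominates b if it is at least as good on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

pareto_front = [q for q, obj in candidates.items()
                if not any(dominates(other, obj) for o, other in candidates.items() if o != q)]
print(pareto_front)   # ['q1', 'q2', 'q4'] -- q3 is dominated and discarded
```
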
  15. Jindal, V.; Bawa, S.; Batra, S.: ¬A review of ranking approaches for semantic search on Web (2014) 0.00
    3.7806356E-4 = product of:
      0.005670953 = sum of:
        0.005670953 = weight(_text_:in in 2799) [ClassicSimilarity], result of:
          0.005670953 = score(doc=2799,freq=4.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.12752387 = fieldWeight in 2799, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=2799)
      0.06666667 = coord(1/15)
    
    Abstract
    With ever-increasing amounts of information available to end users, search engines have become the most powerful tools for obtaining useful information scattered on the Web. However, it is very common for even the most renowned search engines to return result sets containing pages of little use to the user. Research on semantic search aims to improve traditional information search and retrieval methods, where the basic relevance criteria rely primarily on the presence of query keywords within the returned pages. This work explores different relevance-ranking approaches based on semantics that are considered appropriate for the retrieval of relevant information. In this paper, various pilot projects and their corresponding outcomes are investigated on the basis of the methodologies adopted and their most distinctive characteristics with respect to ranking. An overview of selected approaches and their comparison by means of the classification criteria is presented. With the help of this comparison, some common concepts and outstanding features have been identified.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  16. Dang, E.K.F.; Luk, R.W.P.; Allan, J.; Ho, K.S.; Chung, K.F.L.; Lee, D.L.: ¬A new context-dependent term weight computed by boost and discount using relevance information (2010) 0.00
    3.1505295E-4 = product of:
      0.004725794 = sum of:
        0.004725794 = weight(_text_:in in 4120) [ClassicSimilarity], result of:
          0.004725794 = score(doc=4120,freq=4.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.10626988 = fieldWeight in 4120, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4120)
      0.06666667 = coord(1/15)
    
    Abstract
    We studied the effectiveness of a new class of context-dependent term weights for information retrieval. Unlike the traditional term frequency-inverse document frequency (TF-IDF), the new weighting of a term t in a document d depends not only on the occurrence statistics of t alone but also on the terms found within a text window (or "document-context") centered on t. We introduce a Boost and Discount (B&D) procedure which utilizes partial relevance information to compute the context-dependent term weights of query terms according to a logistic regression model. We investigate the effectiveness of the new term weights compared with the context-independent BM25 weights in the setting of relevance feedback. We performed experiments with title queries of the TREC-6, -7, -8, and 2005 collections, comparing the residual Mean Average Precision (MAP) measures obtained using B&D term weights and those obtained by a baseline using BM25 weights. Given either 10 or 20 relevance judgments of the top retrieved documents, using the new term weights yields improvement over the baseline for all collections tested. The MAP obtained with the new weights has relative improvement over the baseline by 3.3 to 15.2%, with statistical significance at the 95% confidence level across all four collections.
  17. Jiang, X.; Sun, X.; Yang, Z.; Zhuge, H.; Lapshinova-Koltunski, E.; Yao, J.: Exploiting heterogeneous scientific literature networks to combat ranking bias : evidence from the computational linguistics area (2016) 0.00
    3.1505295E-4 = product of:
      0.004725794 = sum of:
        0.004725794 = weight(_text_:in in 3017) [ClassicSimilarity], result of:
          0.004725794 = score(doc=3017,freq=4.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.10626988 = fieldWeight in 3017, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3017)
      0.06666667 = coord(1/15)
    
    Abstract
    It is important to help researchers find valuable papers from a large literature collection. To this end, many graph-based ranking algorithms have been proposed. However, most of these algorithms suffer from the problem of ranking bias. Ranking bias hurts the usefulness of a ranking algorithm because it returns a ranking list with an undesirable time distribution. This paper is a focused study on how to alleviate ranking bias by leveraging the heterogeneous network structure of the literature collection. We propose a new graph-based ranking algorithm, MutualRank, that integrates mutual reinforcement relationships among networks of papers, researchers, and venues to achieve a more synthetic, accurate, and less-biased ranking than previous methods. MutualRank provides a unified model that involves both intra- and inter-network information for ranking papers, researchers, and venues simultaneously. We use the ACL Anthology Network as the benchmark data set and construct the gold standard from computer linguistics course websites of well-known universities and two well-known textbooks. The experimental results show that MutualRank greatly outperforms the state-of-the-art competitors, including PageRank, HITS, CoRank, Future Rank, and P-Rank, in ranking papers in both improving ranking effectiveness and alleviating ranking bias. Rankings of researchers and venues by MutualRank are also quite reasonable.
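
A schematic sketch of the mutual-reinforcement idea behind MutualRank: scores of papers, researchers, and venues boost each other through the heterogeneous network until they stabilize. The toy affiliation matrices, the normalization, and the update rule are simplified assumptions rather than the paper's actual equations:

```python
import numpy as np

rng = np.random.default_rng(0)
n_papers, n_authors, n_venues = 6, 4, 2

# Toy heterogeneous network: paper-author, paper-venue, and paper-paper (citation) links.
PA = rng.integers(0, 2, (n_papers, n_authors)).astype(float)
PV = np.zeros((n_papers, n_venues))
PV[np.arange(n_papers), rng.integers(0, n_venues, n_papers)] = 1.0
PP = rng.integers(0, 2, (n_papers, n_papers)).astype(float)

def normalize(v):
    s = v.sum()
    return v / s if s else v

p = normalize(np.ones(n_papers))
a = normalize(np.ones(n_authors))
v = normalize(np.ones(n_venues))
for _ in range(50):                                # iterate until the coupled scores stabilize
    p = normalize(PP.T @ p + PA @ a + PV @ v)      # papers gain from citing papers, authors, venues
    a = normalize(PA.T @ p)                        # researchers gain from their papers
    v = normalize(PV.T @ p)                        # venues gain from the papers they publish
print("paper ranking:", np.argsort(-p))
```
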
  18. Biskri, I.; Rompré, L.: Using association rules for query reformulation (2012) 0.00
    2.6733126E-4 = product of:
      0.0040099686 = sum of:
        0.0040099686 = weight(_text_:in in 92) [ClassicSimilarity], result of:
          0.0040099686 = score(doc=92,freq=2.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.09017298 = fieldWeight in 92, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=92)
      0.06666667 = coord(1/15)
    
    Abstract
    In this paper the authors present research on the combination of two methods of data mining: text classification and maximal association rules. Text classification has long been a focus of interest for many researchers. However, its results take the form of lists of words (classes) that people often do not know what to do with. The use of maximal association rules brings a number of advantages: (1) the detection of dependencies and correlations between the relevant units of information (words) of different classes, and (2) the extraction of hidden, often relevant, knowledge from a large volume of data. The authors show how this combination can improve the process of information retrieval.
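
A minimal sketch of the reformulation step: association rules t1 -> t2 mined from a class of documents and filtered by support and confidence suggest expansion terms for the user's query. The toy corpus, the thresholds, and the single-antecedent rule form are simplifying assumptions; the paper works with maximal association rules over the classes produced by text classification:

```python
from collections import Counter
from itertools import permutations

# Toy "class" of documents, each reduced to its set of relevant units (words).
documents = [
    {"query", "reformulation", "retrieval"},
    {"query", "expansion", "retrieval"},
    {"query", "expansion", "terms"},
    {"classification", "text", "terms"},
]

def mine_rules(docs, min_support=2, min_confidence=0.6):
    """Single-antecedent association rules t1 -> t2 above support and confidence thresholds."""
    term_count = Counter(t for d in docs for t in d)
    pair_count = Counter((a, b) for d in docs for a, b in permutations(d, 2))
    rules = {}
    for (a, b), support in pair_count.items():
        confidence = support / term_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.setdefault(a, []).append((b, confidence))
    return rules

rules = mine_rules(documents)
query = ["query"]
expansion = [term for t in query for term, conf in rules.get(t, [])]
print("reformulated query:", query + expansion)   # "query" expanded with correlated terms
```
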
  19. Wei, F.; Li, W.; Liu, S.: iRANK: a rank-learn-combine framework for unsupervised ensemble ranking (2010) 0.00
    2.2277607E-4 = product of:
      0.0033416408 = sum of:
        0.0033416408 = weight(_text_:in in 3472) [ClassicSimilarity], result of:
          0.0033416408 = score(doc=3472,freq=2.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.07514416 = fieldWeight in 3472, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3472)
      0.06666667 = coord(1/15)
    
    Abstract
    The authors address the problem of unsupervised ensemble ranking. Traditional approaches either combine multiple ranking criteria into a unified representation to obtain an overall ranking score or to utilize certain rank fusion or aggregation techniques to combine the ranking results. Beyond the aforementioned combine-then-rank and rank-then-combine approaches, the authors propose a novel rank-learn-combine ranking framework, called Interactive Ranking (iRANK), which allows two base rankers to teach each other before combination during the ranking process by providing their own ranking results as feedback to the others to boost the ranking performance. This mutual ranking refinement process continues until the two base rankers cannot learn from each other any more. The overall performance is improved by the enhancement of the base rankers through the mutual learning mechanism. The authors further design two ranking refinement strategies to efficiently and effectively use the feedback based on reasonable assumptions and rational analysis. Although iRANK is applicable to many applications, as a case study, they apply this framework to the sentence ranking problem in query-focused summarization and evaluate its effectiveness on the DUC 2005 and 2006 data sets. The results are encouraging with consistent and promising improvements.
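
A schematic sketch of the rank-learn-combine loop: two base rankers repeatedly hand each other their current top-ranked items as pseudo-feedback, nudge their own scores toward that feedback, and stop when neither ranking changes any more; the refined scores are then combined. The score-adjustment rule, the top-k feedback, and the stopping test are illustrative assumptions:

```python
def refine(scores, feedback_top, boost=0.15):
    """One refinement step: boost items the other ranker currently places in its top group."""
    return {item: s + (boost if item in feedback_top else 0.0) for item, s in scores.items()}

def top_k(scores, k=3):
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

# Two base rankers' initial scores for the same candidate sentences (hypothetical values).
r1 = {"s1": 0.9, "s2": 0.4, "s3": 0.7, "s4": 0.2, "s5": 0.6}
r2 = {"s1": 0.3, "s2": 0.8, "s3": 0.6, "s4": 0.7, "s5": 0.1}

for _ in range(10):                                  # interactive ranking refinement
    t1, t2 = top_k(r1), top_k(r2)
    r1, r2 = refine(r1, t2), refine(r2, t1)          # each ranker learns from the other's feedback
    if top_k(r1) == t1 and top_k(r2) == t2:          # rankings stopped changing: nothing left to learn
        break

combined = {s: r1[s] + r2[s] for s in r1}            # combine the refined scores
print(sorted(combined, key=combined.get, reverse=True))
```
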
  20. Lee, J.-T.; Seo, J.; Jeon, J.; Rim, H.-C.: Sentence-based relevance flow analysis for high accuracy retrieval (2011) 0.00
    2.2277607E-4 = product of:
      0.0033416408 = sum of:
        0.0033416408 = weight(_text_:in in 4746) [ClassicSimilarity], result of:
          0.0033416408 = score(doc=4746,freq=2.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.07514416 = fieldWeight in 4746, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4746)
      0.06666667 = coord(1/15)
    
    Abstract
    Traditional ranking models for information retrieval lack the ability to make a clear distinction between relevant and nonrelevant documents at top ranks if both have similar bag-of-words representations with regard to a user query. We aim to go beyond the bag-of-words approach to document ranking in a new perspective, by representing each document as a sequence of sentences. We begin with an assumption that relevant documents are distinguishable from nonrelevant ones by sequential patterns of relevance degrees of sentences to a query. We introduce the notion of relevance flow, which refers to a stream of sentence-query relevance within a document. We then present a framework to learn a function for ranking documents effectively based on various features extracted from their relevance flows and leverage the output to enhance existing retrieval models. We validate the effectiveness of our approach by performing a number of retrieval experiments on three standard test collections, each comprising a different type of document: news articles, medical references, and blog posts. Experimental results demonstrate that the proposed approach can improve the retrieval performance at the top ranks significantly as compared with the state-of-the-art retrieval models regardless of document type.
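
A small sketch of the relevance-flow idea: score every sentence of a document against the query (here with a crude term-overlap stand-in), treat the resulting sequence as the flow, and extract a few features of that flow for a ranking function to consume. The sentence scorer and the particular features are illustrative assumptions:

```python
import re
import numpy as np

def sentence_relevance(sentence, query_terms):
    """Toy sentence-query relevance: fraction of query terms appearing in the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return sum(t in words for t in query_terms) / len(query_terms)

def relevance_flow_features(sentences, query_terms):
    """Turn the per-sentence relevance sequence (the 'flow') into features for a ranker."""
    flow = np.array([sentence_relevance(s, query_terms) for s in sentences])
    return {
        "mean": float(flow.mean()),
        "max": float(flow.max()),
        "peak_position": flow.argmax() / max(len(flow) - 1, 1),   # where in the document the peak lies
        "trend": float(np.polyfit(np.arange(len(flow)), flow, 1)[0]) if len(flow) > 1 else 0.0,
    }

doc = ["Ranking models order documents for a query.",
       "Bag-of-words representations can look identical.",
       "Relevance flow inspects sentence-level relevance instead."]
print(relevance_flow_features(doc, ["relevance", "ranking", "query"]))
```
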

Languages

  • e 50
  • d 10

Types

  • a 57
  • el 2
  • r 1
  • x 1