Search (99 results, page 5 of 5)

  • × theme_ss:"Retrievalalgorithmen"
  • × type_ss:"a"
  1. Habernal, I.; Konopík, M.; Rohlík, O.: Question answering (2012) 0.00
    0.0029865343 = product of:
      0.014932672 = sum of:
        0.014932672 = product of:
          0.029865343 = sum of:
            0.029865343 = weight(_text_:data in 101) [ClassicSimilarity], result of:
              0.029865343 = score(doc=101,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.2096163 = fieldWeight in 101, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.046875 = fieldNorm(doc=101)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Question Answering is an area of information retrieval with the added challenge of applying sophisticated techniques to identify the complex syntactic and semantic relationships present in text in order to provide a more sophisticated and satisfactory response to the user's information needs. For this reason, the authors see question answering as the next step beyond standard information retrieval. In this chapter state of the art question answering is covered focusing on providing an overview of systems, techniques and approaches that are likely to be employed in the next generations of search engines. Special attention is paid to question answering using the World Wide Web as the data source and to question answering exploiting the possibilities of Semantic Web. Considerations about the current issues and prospects for promising future research are also provided.
  2. Koumenides, C.L.; Shadbolt, N.R.: Ranking methods for entity-oriented semantic web search (2014) 0.00
    0.0029865343 = product of:
      0.014932672 = sum of:
        0.014932672 = product of:
          0.029865343 = sum of:
            0.029865343 = weight(_text_:data in 1280) [ClassicSimilarity], result of:
              0.029865343 = score(doc=1280,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.2096163 = fieldWeight in 1280, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1280)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    This article provides a technical review of semantic search methods used to support text-based search over formal Semantic Web knowledge bases. Our focus is on ranking methods and auxiliary processes explored by existing semantic search systems, outlined within broad areas of classification. We present reflective examples from the literature in some detail, which should appeal to readers interested in a deeper perspective on the various methods and systems implemented in the outlined literature. The presentation covers graph exploration and propagation methods, adaptations of classic probabilistic retrieval models, and query-independent link analysis via flexible extensions to the PageRank algorithm. Future research directions are discussed, including development of more cohesive retrieval models to unlock further potentials and uses, data indexing schemes, integration with user interfaces, and building community consensus for more systematic evaluation and gradual development.
  3. Efthimiadis, E.N.: Interactive query expansion : a user-based evaluation in a relevance feedback environment (2000) 0.00
    0.0028157318 = product of:
      0.014078659 = sum of:
        0.014078659 = product of:
          0.028157318 = sum of:
            0.028157318 = weight(_text_:data in 5701) [ClassicSimilarity], result of:
              0.028157318 = score(doc=5701,freq=4.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.19762816 = fieldWeight in 5701, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5701)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    A user-centered investigation of interactive query expansion within the context of a relevance feedback system is presented in this article. Data were collected from 25 searches using the INSPEC database. The data collection mechanisms included questionnaires, transaction logs, and relevance evaluations. The results discuss issues that relate to query expansion, retrieval effectiveness, the correspondence of the on-line-to-off-line relevance judgments, and the selection of terms for query expansion by users (interactive query expansion). The main conclusions drawn from the results of the study are that: (1) one-third of the terms presented to users in a list of candidate terms for query expansion was identified by the users as potentially useful for query expansion. (2) These terms were mainly judged as either variant expressions (synonyms) or alternative (related) terms to the initial query terms. However, a substantial portion of the selected terms were identified as representing new ideas. (3) The relationships identified between the five best terms selected by the users for query expansion and the initial query terms were that: (a) 34% of the query expansion terms have no relationship or other type of correspondence with a query term; (b) 66% of the remaining query expansion terms have a relationship to the query terms. These relationships were: narrower term (46%), broader term (3%), related term (17%). (4) The results provide evidence for the effectiveness of interactive query expansion. The initial search produced on average three highly relevant documents; the query expansion search produced on average nine further highly relevant documents. The conclusions highlight the need for more research on: interactive query expansion, the comparative evaluation of automatic vs. interactive query expansion, the study of weighted Webbased or Web-accessible retrieval systems in operational environments, and for user studies in searching ranked retrieval systems in general
  4. Kantor, P.; Kim, M.H.; Ibraev, U.; Atasoy, K.: Estimating the number of relevant documents in enormous collections (1999) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 6690) [ClassicSimilarity], result of:
              0.024887787 = score(doc=6690,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 6690, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6690)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    In assessing information retrieval systems, it is important to know not only the precision of the retrieved set, but also to compare the number of retrieved relevant items to the total number of relevant items. For large collections, such as the TREC test collections, or the World Wide Web, it is not possible to enumerate the entire set of relevant documents. If the retrieved documents are evaluated, a variant of the statistical "capture-recapture" method can be used to estimate the total number of relevant documents, providing the several retrieval systems used are sufficiently independent. We show that the underlying signal detection model supporting such an analysis can be extended in two ways. First, assuming that there are two distinct performance characteristics (corresponding to the chance of retrieving a relevant, and retrieving a given non-relevant document), we show that if there are three or more independent systems available it is possible to estimate the number of relevant documents without actually having to decide whether each individual document is relevant. We report applications of this 3-system method to the TREC data, leading to the conclusion that the independence assumptions are not satisfied. We then extend the model to a multi-system, multi-problem model, and show that it is possible to include statistical dependencies of all orders in the model, and determine the number of relevant documents for each of the problems in the set. Application to the TREC setting will be presented
  5. Crouch, C.J.; Crouch, D.B.; Chen, Q.; Holtz, S.J.: Improving the retrieval effectiveness of very short queries (2002) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 2572) [ClassicSimilarity], result of:
              0.024887787 = score(doc=2572,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 2572, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2572)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    This paper describes an automatic approach designed to improve the retrieval effectiveness of very short queries such as those used in web searching. The method is based on the observation that stemming, which is designed to maximize recall, often results in depressed precision. Our approach is based on pseudo-feedback and attempts to increase the number of relevant documents in the pseudo-relevant set by reranking those documents based on the presence of unstemmed query terms in the document text. The original experiments underlying this work were carried out using Smart 11.0 and the lnc.ltc weighting scheme on three sets of documents from the TREC collection with corresponding TREC (title only) topics as queries. (The average length of these queries after stoplisting ranges from 2.4 to 4.5 terms.) Results, evaluated in terms of P@20 and non-interpolated average precision, showed clearly that pseudo-feedback (PF) based on this approach was effective in increasing the number of relevant documents in the top ranks. Subsequent experiments, performed on the same data sets using Smart 13.0 and the improved Lnu.ltu weighting scheme, indicate that these results hold up even over the much higher baseline provided by the new weights. Query drift analysis presents a more detailed picture of the improvements produced by this process.
  6. Li, J.; Willett, P.: ArticleRank : a PageRank-based alternative to numbers of citations for analysing citation networks (2009) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 751) [ClassicSimilarity], result of:
              0.024887787 = score(doc=751,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 751, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=751)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Purpose - The purpose of this paper is to suggest an alternative to the widely used Times Cited criterion for analysing citation networks. The approach involves taking account of the natures of the papers that cite a given paper, so as to differentiate between papers that attract the same number of citations. Design/methodology/approach - ArticleRank is an algorithm that has been derived from Google's PageRank algorithm to measure the influence of journal articles. ArticleRank is applied to two datasets - a citation network based on an early paper on webometrics, and a self-citation network based on the 19 most cited papers in the Journal of Documentation - using citation data taken from the Web of Knowledge database. Findings - ArticleRank values provide a different ranking of a set of papers from that provided by the corresponding Times Cited values, and overcomes the inability of the latter to differentiate between papers with the same numbers of citations. The difference in rankings between Times Cited and ArticleRank is greatest for the most heavily cited articles in a dataset. Originality/value - This is a novel application of the PageRank algorithm.
  7. Ning, X.; Jin, H.; Wu, H.: RSS: a framework enabling ranked search on the semantic web (2008) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 2069) [ClassicSimilarity], result of:
              0.024887787 = score(doc=2069,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 2069, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2069)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    The semantic web not only contains resources but also includes the heterogeneous relationships among them, which is sharply distinguished from the current web. As the growth of the semantic web, specialized search techniques are of significance. In this paper, we present RSS-a framework for enabling ranked semantic search on the semantic web. In this framework, the heterogeneity of relationships is fully exploited to determine the global importance of resources. In addition, the search results can be greatly expanded with entities most semantically related to the query, thus able to provide users with properly ordered semantic search results by combining global ranking values and the relevance between the resources and the query. The proposed semantic search model which supports inference is very different from traditional keyword-based search methods. Moreover, RSS also distinguishes from many current methods of accessing the semantic web data in that it applies novel ranking strategies to prevent returning search results in disorder. The experimental results show that the framework is feasible and can produce better ordering of semantic search results than directly applying the standard PageRank algorithm on the semantic web.
  8. Wei, F.; Li, W.; Liu, S.: iRANK: a rank-learn-combine framework for unsupervised ensemble ranking (2010) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 3472) [ClassicSimilarity], result of:
              0.024887787 = score(doc=3472,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 3472, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3472)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    The authors address the problem of unsupervised ensemble ranking. Traditional approaches either combine multiple ranking criteria into a unified representation to obtain an overall ranking score or to utilize certain rank fusion or aggregation techniques to combine the ranking results. Beyond the aforementioned combine-then-rank and rank-then-combine approaches, the authors propose a novel rank-learn-combine ranking framework, called Interactive Ranking (iRANK), which allows two base rankers to teach each other before combination during the ranking process by providing their own ranking results as feedback to the others to boost the ranking performance. This mutual ranking refinement process continues until the two base rankers cannot learn from each other any more. The overall performance is improved by the enhancement of the base rankers through the mutual learning mechanism. The authors further design two ranking refinement strategies to efficiently and effectively use the feedback based on reasonable assumptions and rational analysis. Although iRANK is applicable to many applications, as a case study, they apply this framework to the sentence ranking problem in query-focused summarization and evaluate its effectiveness on the DUC 2005 and 2006 data sets. The results are encouraging with consistent and promising improvements.
  9. Nunes, S.; Ribeiro, C.; David, G.: Term weighting based on document revision history (2011) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 4946) [ClassicSimilarity], result of:
              0.024887787 = score(doc=4946,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 4946, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4946)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    In real-world information retrieval systems, the underlying document collection is rarely stable or definitive. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a greater weight. We propose 4 term weighting functions that use each document's history to estimate a current term score. To evaluate this thesis, we conduct 3 independent experiments using a collection of documents sampled from Wikipedia. In the first experiment, we use data from Wikipedia to judge each set of terms. In a second experiment, we use an external collection of tags from a popular social bookmarking service as a gold standard. In the third experiment, we crowdsource user judgments to collect feedback on term preference. Across all experiments results consistently support our thesis. We show that temporally aware measures, specifically the proposed revision term frequency and revision term frequency span, outperform a term-weighting measure based on raw term frequency alone.
  10. Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 1338) [ClassicSimilarity], result of:
              0.024887787 = score(doc=1338,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 1338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1338)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations that infer two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
  11. Tsai, C.-F.; Hu, Y.-H.; Chen, Z.-Y.: Factors affecting rocchio-based pseudorelevance feedback in image retrieval (2015) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 1607) [ClassicSimilarity], result of:
              0.024887787 = score(doc=1607,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 1607, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1607)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Pseudorelevance feedback (PRF) was proposed to solve the limitation of relevance feedback (RF), which is based on the user-in-the-loop process. In PRF, the top-k retrieved images are regarded as PRF. Although the PRF set contains noise, PRF has proven effective for automatically improving the overall retrieval result. To implement PRF, the Rocchio algorithm has been considered as a reasonable and well-established baseline. However, the performance of Rocchio-based PRF is subject to various representation choices (or factors). In this article, we examine these factors that affect the performance of Rocchio-based PRF, including image-feature representation, the number of top-ranked images, the weighting parameters of Rocchio, and similarity measure. We offer practical insights on how to optimize the performance of Rocchio-based PRF by choosing appropriate representation choices. Our extensive experiments on NUS-WIDE-LITE and Caltech 101 + Corel 5000 data sets show that the optimal feature representation is color moment + wavelet texture in terms of retrieval efficiency and effectiveness. Other representation choices are that using top-20 ranked images as pseudopositive and pseudonegative feedback sets with the equal weight (i.e., 0.5) by the correlation and cosine distance functions can produce the optimal retrieval result.
  12. Lee, J.; Min, J.-K.; Oh, A.; Chung, C.-W.: Effective ranking and search techniques for Web resources considering semantic relationships (2014) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 2670) [ClassicSimilarity], result of:
              0.024887787 = score(doc=2670,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 2670, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2670)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    On the Semantic Web, the types of resources and the semantic relationships between resources are defined in an ontology. By using that information, the accuracy of information retrieval can be improved. In this paper, we present effective ranking and search techniques considering the semantic relationships in an ontology. Our technique retrieves top-k resources which are the most relevant to query keywords through the semantic relationships. To do this, we propose a weighting measure for the semantic relationship. Based on this measure, we propose a novel ranking method which considers the number of meaningful semantic relationships between a resource and keywords as well as the coverage and discriminating power of keywords. In order to improve the efficiency of the search, we prune the unnecessary search space using the length and weight thresholds of the semantic relationship path. In addition, we exploit Threshold Algorithm based on an extended inverted index to answer top-k results efficiently. The experimental results using real data sets demonstrate that our retrieval method using the semantic information generates accurate results efficiently compared to the traditional methods.
  13. Bhansali, D.; Desai, H.; Deulkar, K.: ¬A study of different ranking approaches for semantic search (2015) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 2696) [ClassicSimilarity], result of:
              0.024887787 = score(doc=2696,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 2696, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2696)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Search Engines have become an integral part of our day to day life. Our reliance on search engines increases with every passing day. With the amount of data available on Internet increasing exponentially, it becomes important to develop new methods and tools that help to return results relevant to the queries and reduce the time spent on searching. The results should be diverse but at the same time should return results focused on the queries asked. Relation Based Page Rank [4] algorithms are considered to be the next frontier in improvement of Semantic Web Search. The probability of finding relevance in the search results as posited by the user while entering the query is used to measure the relevance. However, its application is limited by the complexity of determining relation between the terms and assigning explicit meaning to each term. Trust Rank is one of the most widely used ranking algorithms for semantic web search. Few other ranking algorithms like HITS algorithm, PageRank algorithm are also used for Semantic Web Searching. In this paper, we will provide a comparison of few ranking approaches.
  14. Jiang, X.; Sun, X.; Yang, Z.; Zhuge, H.; Lapshinova-Koltunski, E.; Yao, J.: Exploiting heterogeneous scientific literature networks to combat ranking bias : evidence from the computational linguistics area (2016) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 3017) [ClassicSimilarity], result of:
              0.024887787 = score(doc=3017,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 3017, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3017)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    It is important to help researchers find valuable papers from a large literature collection. To this end, many graph-based ranking algorithms have been proposed. However, most of these algorithms suffer from the problem of ranking bias. Ranking bias hurts the usefulness of a ranking algorithm because it returns a ranking list with an undesirable time distribution. This paper is a focused study on how to alleviate ranking bias by leveraging the heterogeneous network structure of the literature collection. We propose a new graph-based ranking algorithm, MutualRank, that integrates mutual reinforcement relationships among networks of papers, researchers, and venues to achieve a more synthetic, accurate, and less-biased ranking than previous methods. MutualRank provides a unified model that involves both intra- and inter-network information for ranking papers, researchers, and venues simultaneously. We use the ACL Anthology Network as the benchmark data set and construct the gold standard from computer linguistics course websites of well-known universities and two well-known textbooks. The experimental results show that MutualRank greatly outperforms the state-of-the-art competitors, including PageRank, HITS, CoRank, Future Rank, and P-Rank, in ranking papers in both improving ranking effectiveness and alleviating ranking bias. Rankings of researchers and venues by MutualRank are also quite reasonable.
  15. Zhu, J.; Han, L.; Gou, Z.; Yuan, X.: ¬A fuzzy clustering-based denoising model for evaluating uncertainty in collaborative filtering recommender systems (2018) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 4460) [ClassicSimilarity], result of:
              0.024887787 = score(doc=4460,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 4460, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4460)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Recommender systems are effective in predicting the most suitable products for users, such as movies and books. To facilitate personalized recommendations, the quality of item ratings should be guaranteed. However, a few ratings might not be accurate enough due to the uncertainty of user behavior and are referred to as natural noise. In this article, we present a novel fuzzy clustering-based method for detecting noisy ratings. The entropy of a subset of the original ratings dataset is used to indicate the data-driven uncertainty, and evaluation metrics are adopted to represent the prediction-driven uncertainty. After the repetition of resampling and the execution of a recommendation algorithm, the entropy and evaluation metrics vectors are obtained and are empirically categorized to identify the proportion of the potential noise. Then, the fuzzy C-means-based denoising (FCMD) algorithm is performed to verify the natural noise under the assumption that natural noise is primarily the result of the exceptional behavior of users. Finally, a case study is performed using two real-world datasets. The experimental results show that our proposal outperforms previous proposals and has an advantage in dealing with natural noise.
  16. Jiang, J.-D.; Jiang, J.-Y.; Cheng, P.-J.: Cocluster hypothesis and ranking consistency for relevance ranking in web search (2019) 0.00
    0.0024887787 = product of:
      0.012443894 = sum of:
        0.012443894 = product of:
          0.024887787 = sum of:
            0.024887787 = weight(_text_:data in 5247) [ClassicSimilarity], result of:
              0.024887787 = score(doc=5247,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.17468026 = fieldWeight in 5247, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5247)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Conventional approaches to relevance ranking typically optimize ranking models by each query separately. The traditional cluster hypothesis also does not consider the dependency between related queries. The goal of this paper is to leverage similar search intents to perform ranking consistency so that the search performance can be improved accordingly. Different from the previous supervised approach, which learns relevance by click-through data, we propose a novel cocluster hypothesis to bridge the gap between relevance ranking and ranking consistency. A nearest-neighbors test is also designed to measure the extent to which the cocluster hypothesis holds. Based on the hypothesis, we further propose a two-stage unsupervised approach, in which two ranking heuristics and a cost function are developed to optimize the combination of consistency and uniqueness (or inconsistency). Extensive experiments have been conducted on a real and large-scale search engine log. The experimental results not only verify the applicability of the proposed cocluster hypothesis but also show that our approach is effective in boosting the retrieval performance of the commercial search engine and reaches a comparable performance to the supervised approach.
  17. Khoo, C.S.G.; Wan, K.-W.: ¬A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.00
    0.0021366666 = product of:
      0.010683333 = sum of:
        0.010683333 = product of:
          0.021366665 = sum of:
            0.021366665 = weight(_text_:22 in 2509) [ClassicSimilarity], result of:
              0.021366665 = score(doc=2509,freq=2.0), product of:
                0.15778607 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04505818 = queryNorm
                0.1354154 = fieldWeight in 2509, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=2509)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Source
    Electronic library. 22(2004) no.2, S.112-120
  18. Bar-Ilan, J.; Levene, M.; Mat-Hassan, M.: Methods for evaluating dynamic changes in search engine rankings : a case study (2006) 0.00
    0.001991023 = product of:
      0.009955115 = sum of:
        0.009955115 = product of:
          0.01991023 = sum of:
            0.01991023 = weight(_text_:data in 616) [ClassicSimilarity], result of:
              0.01991023 = score(doc=616,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.1397442 = fieldWeight in 616, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.03125 = fieldNorm(doc=616)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Purpose - The objective of this paper is to characterize the changes in the rankings of the top ten results of major search engines over time and to compare the rankings between these engines. Design/methodology/approach - The papers compare rankings of the top-ten results of the search engines Google and AlltheWeb on ten identical queries over a period of three weeks. Only the top-ten results were considered, since users do not normally inspect more than the first results page returned by a search engine. The experiment was repeated twice, in October 2003 and in January 2004, in order to assess changes to the top-ten results of some of the queries during the three months interval. In order to assess the changes in the rankings, three measures were computed for each data collection point and each search engine. Findings - The findings in this paper show that the rankings of AlltheWeb were highly stable over each period, while the rankings of Google underwent constant yet minor changes, with occasional major ones. Changes over time can be explained by the dynamic nature of the web or by fluctuations in the search engines' indexes. The top-ten results of the two search engines had surprisingly low overlap. With such small overlap, the task of comparing the rankings of the two engines becomes extremely challenging. Originality/value - The paper shows that because of the abundance of information on the web, ranking search results is of extreme importance. The paper compares several measures for computing the similarity between rankings of search tools, and shows that none of the measures is fully satisfactory as a standalone measure. It also demonstrates the apparent differences in the ranking algorithms of two widely used search engines.
  19. Henzinger, M.R.: Link analysis in Web information retrieval (2000) 0.00
    0.001991023 = product of:
      0.009955115 = sum of:
        0.009955115 = product of:
          0.01991023 = sum of:
            0.01991023 = weight(_text_:data in 801) [ClassicSimilarity], result of:
              0.01991023 = score(doc=801,freq=2.0), product of:
                0.14247625 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.04505818 = queryNorm
                0.1397442 = fieldWeight in 801, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.03125 = fieldNorm(doc=801)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Source
    IEEE data engineering bulletin. 23(2000) no.3, S.3-8

Years

Languages

  • e 96
  • d 3
  • More… Less…