Search (318 results, page 1 of 16)

  • language_ss:"e"
  • theme_ss:"Retrievalalgorithmen"
  1. Pan, M.; Huang, J.X.; He, T.; Mao, Z.; Ying, Z.; Tu, X.: A simple kernel co-occurrence-based enhancement for pseudo-relevance feedback (2020) 0.02
    Abstract
    Pseudo-relevance feedback is a well-studied query expansion technique in which it is assumed that the top-ranked documents in an initial set of retrieval results are relevant and expansion terms are then extracted from those documents. When selecting expansion terms, most traditional models do not simultaneously consider term frequency and the co-occurrence relationships between candidate terms and query terms. Intuitively, however, a term that has a higher co-occurrence with a query term is more likely to be related to the query topic. In this article, we propose a kernel co-occurrence-based framework to enhance retrieval performance by integrating term co-occurrence information into the Rocchio model and a relevance language model (RM3). Specifically, a kernel co-occurrence-based Rocchio method (KRoc) and a kernel co-occurrence-based RM3 method (KRM3) are proposed. In our framework, co-occurrence information is incorporated into both the factor of the term discrimination power and the factor of the within-document term weight to boost retrieval performance. The results of a series of experiments show that our proposed methods significantly outperform the corresponding strong baselines over all data sets in terms of the mean average precision and over most data sets in terms of P@10. A direct comparison of standard Text Retrieval Conference data sets indicates that our proposed methods are at least comparable to state-of-the-art approaches.
    Type
    a
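The kernel co-occurrence idea described in the abstract above — giving more credit to candidate expansion terms that occur close to query terms inside the pseudo-relevant documents — can be illustrated with a minimal sketch. This is not the authors' KRoc/KRM3 implementation; the Gaussian kernel shape, the bandwidth `sigma`, and the positional-distance measure are illustrative assumptions.

```python
import math
from collections import defaultdict

def kernel_cooccurrence_expansion(query_terms, feedback_docs, sigma=25.0, n_terms=10):
    """Score candidate expansion terms by a Gaussian kernel over their
    positional distance to the nearest query-term occurrence in the
    pseudo-relevant documents, then keep the top-scoring terms."""
    scores = defaultdict(float)
    for doc in feedback_docs:                        # doc: list of tokens
        q_positions = [i for i, t in enumerate(doc) if t in query_terms]
        if not q_positions:
            continue
        for i, t in enumerate(doc):
            if t in query_terms:
                continue
            dist = min(abs(i - p) for p in q_positions)
            scores[t] += math.exp(-dist * dist / (2.0 * sigma * sigma))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n_terms]

# Toy usage: terms close to the query terms receive the highest expansion scores.
docs = [["kernel", "methods", "support", "query", "expansion", "in", "retrieval"],
        ["pseudo", "relevance", "feedback", "helps", "query", "expansion"]]
print(kernel_cooccurrence_expansion({"query", "expansion"}, docs, n_terms=3))
```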
  2. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02
    Source
    Information processing and management. 22(1986) no.6, pp.465-476
    Type
    a
  3. Smeaton, A.F.; Rijsbergen, C.J. van: The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.02
    Date
    30. 3.2001 13:32:22
    Type
    a
  4. Back, J.: An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.02
    Date
    25. 8.2005 17:42:22
    Type
    a
  5. Li, H.; Wu, H.; Li, D.; Lin, S.; Su, Z.; Luo, X.: PSI: A probabilistic semantic interpretable framework for fine-grained image ranking (2018) 0.02
    Abstract
    Image ranking is one of the key problems in information science research. However, most current methods focus on improving performance and leave the semantic gap problem, namely that learned ranking models are hard to interpret, untouched. In this article we therefore aim to learn an interpretable ranking model that tackles the semantic gap in fine-grained image ranking. We propose to combine attribute-based representation with ranking models based on online passive-aggressive (PA) learning to achieve this goal. In addition, given the highly localized instances in fine-grained image ranking, we introduce a supervised constrained clustering method to gather class-balanced training instances for local PA-based models and incorporate the learned local models into a unified probabilistic framework. Extensive experiments on the benchmark demonstrate that the proposed framework outperforms state-of-the-art methods in terms of both accuracy and speed.
    Type
    a
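The online passive-aggressive (PA) ranking component mentioned in the abstract above can be sketched as a pairwise update: whenever a preferred image fails to outscore a less-preferred one by a margin, the weight vector is moved just enough to repair the violation. The attribute dimension, the margin, and the training loop below are assumptions, not details from the paper.

```python
import numpy as np

def pa_rank_update(w, x_pref, x_other, margin=1.0):
    """Basic passive-aggressive update for pairwise ranking:
    enforce w.x_pref - w.x_other >= margin with the smallest change to w."""
    diff = x_pref - x_other
    loss = max(0.0, margin - float(w @ diff))
    if loss > 0.0:
        tau = loss / (float(diff @ diff) + 1e-12)
        w = w + tau * diff
    return w

# Toy usage on attribute-style feature vectors.
rng = np.random.default_rng(0)
w = np.zeros(5)
x_pref, x_other = rng.normal(size=5), rng.normal(size=5)
for _ in range(10):
    w = pa_rank_update(w, x_pref, x_other)
print(float(w @ x_pref) > float(w @ x_other))   # True once the margin is satisfied
```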
  6. Chen, Z.; Meng, X.; Fowler, R.H.; Zhu, B.: Real-time adaptive feature and document learning for Web search (2001) 0.01
    Abstract
    Chen et al. report on the design of FEATURES, a web search engine with adaptive features based on minimal relevance feedback. Rather than developing user profiles from previous search activity at either the server or the client, or updating indexes after a search completes, FEATURES allows index and user-characterization files to be updated during query modification on retrieval from a general-purpose search engine. Indexing terms relevant to a query are defined as the union of all terms assigned to documents retrieved by the initial search run and are used to build a vector space model on this retrieved set. The ten most heavily weighted terms are presented to the user for a relevant/non-relevant choice, which is used to modify the term weights. Documents are chosen if their summed term weights exceed some threshold. A user evaluation of the top ten ranked documents as non-relevant decreases these term weights, and a positive judgement increases them. A new ordering of the retrieved set then generates new display lists of terms and documents. Precision is improved in a test on Alta Vista searches.
    Type
    a
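A minimal sketch of the loop Chen et al. describe: build term weights over the initially retrieved set, present the heaviest terms for a relevant/non-relevant judgment, adjust the weights accordingly, and keep the documents whose summed weights clear a threshold. The increment size and the threshold are illustrative assumptions, not values from FEATURES.

```python
from collections import Counter

def top_terms(docs, k=10):
    """Weight the union of indexing terms by frequency in the retrieved set."""
    weights = Counter()
    for doc in docs:
        weights.update(doc)
    return dict(weights.most_common(k))

def apply_feedback(weights, judgments, step=1.0):
    """Raise the weight of terms judged relevant, lower the others."""
    return {t: w + (step if judgments.get(t) else -step) for t, w in weights.items()}

def rescore(docs, weights, threshold=2.0):
    """Keep documents whose summed term weights exceed the threshold."""
    return [doc for doc in docs if sum(weights.get(t, 0.0) for t in set(doc)) > threshold]

docs = [["adaptive", "web", "search", "feedback"], ["unrelated", "terms", "only"]]
w = apply_feedback(top_terms(docs), {"adaptive": True, "web": True, "unrelated": False})
print(rescore(docs, w))   # only the first document survives the threshold
```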
  7. Jiang, X.; Sun, X.; Yang, Z.; Zhuge, H.; Lapshinova-Koltunski, E.; Yao, J.: Exploiting heterogeneous scientific literature networks to combat ranking bias : evidence from the computational linguistics area (2016) 0.01
    Abstract
    It is important to help researchers find valuable papers in a large literature collection. To this end, many graph-based ranking algorithms have been proposed. However, most of these algorithms suffer from ranking bias, which hurts the usefulness of a ranking algorithm because it returns a ranking list with an undesirable time distribution. This paper is a focused study on how to alleviate ranking bias by leveraging the heterogeneous network structure of the literature collection. We propose a new graph-based ranking algorithm, MutualRank, that integrates mutual reinforcement relationships among networks of papers, researchers, and venues to achieve a more synthetic, accurate, and less biased ranking than previous methods. MutualRank provides a unified model that involves both intra- and inter-network information for ranking papers, researchers, and venues simultaneously. We use the ACL Anthology Network as the benchmark data set and construct the gold standard from computational linguistics course websites of well-known universities and from two well-known textbooks. The experimental results show that MutualRank greatly outperforms state-of-the-art competitors, including PageRank, HITS, CoRank, Future Rank, and P-Rank, when ranking papers, both in improving ranking effectiveness and in alleviating ranking bias. Rankings of researchers and venues by MutualRank are also quite reasonable.
    Type
    a
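The mutual reinforcement idea — good papers raise the standing of their authors and venues, which in turn feed back into the paper scores — can be sketched as an iterative propagation over paper-author and paper-venue incidence matrices. The normalization, the damping factor, and the 50/50 combination below are placeholders; MutualRank's actual formulation in the paper is richer (it also exploits intra-network links such as citations).

```python
import numpy as np

def toy_mutual_rank(P2A, P2V, iters=50, d=0.85):
    """Toy mutual reinforcement over a heterogeneous network:
    paper scores flow to authors and venues and back, PageRank-style."""
    n_p, n_a = P2A.shape
    n_v = P2V.shape[1]
    p, a, v = np.full(n_p, 1 / n_p), np.full(n_a, 1 / n_a), np.full(n_v, 1 / n_v)
    col = lambda M: M / np.maximum(M.sum(axis=0, keepdims=True), 1e-12)
    row = lambda M: M / np.maximum(M.sum(axis=1, keepdims=True), 1e-12)
    for _ in range(iters):
        a = (1 - d) / n_a + d * (col(P2A).T @ p)                     # authors inherit from papers
        v = (1 - d) / n_v + d * (col(P2V).T @ p)                     # venues inherit from papers
        p = (1 - d) / n_p + d * 0.5 * (row(P2A) @ a + row(P2V) @ v)  # and feed back to papers
        p, a, v = p / p.sum(), a / a.sum(), v / v.sum()
    return p, a, v

P2A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])   # 3 papers x 2 authors
P2V = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])   # 3 papers x 2 venues
print(toy_mutual_rank(P2A, P2V)[0])                     # paper scores
```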
  8. Li, M.; Li, H.; Zhou, Z.-H.: Semi-supervised document retrieval (2009) 0.01
    Abstract
    This paper proposes a new machine learning method for constructing ranking models in document retrieval. The method, referred to as SSRank, aims to combine the advantages of traditional Information Retrieval (IR) methods and of the recently proposed supervised learning methods for IR: the use of a limited amount of labeled data and a rich model representation. To do so, the method adopts a semi-supervised learning framework for ranking model construction. Specifically, given a small number of documents labeled with respect to some queries, the method effectively labels the unlabeled documents for those queries. It then uses all the labeled data to train a machine learning model (in our case, a neural network). In the data labeling, the method also makes use of a traditional IR model (in our case, BM25). A stopping criterion based on machine learning theory is given for the data labeling process. Experimental results on three benchmark datasets and one web search dataset indicate that SSRank consistently, and almost always significantly, outperforms the baseline methods (unsupervised and supervised learning methods) given the same amount of labeled data. This is because SSRank can effectively leverage unlabeled data in learning.
    Type
    a
  9. Zhu, J.; Han, L.; Gou, Z.; Yuan, X.: A fuzzy clustering-based denoising model for evaluating uncertainty in collaborative filtering recommender systems (2018) 0.01
    Abstract
    Recommender systems are effective in predicting the most suitable products for users, such as movies and books. To facilitate personalized recommendations, the quality of item ratings should be guaranteed. However, a few ratings might not be accurate enough due to the uncertainty of user behavior and are referred to as natural noise. In this article, we present a novel fuzzy clustering-based method for detecting noisy ratings. The entropy of a subset of the original ratings dataset is used to indicate the data-driven uncertainty, and evaluation metrics are adopted to represent the prediction-driven uncertainty. After the repetition of resampling and the execution of a recommendation algorithm, the entropy and evaluation metrics vectors are obtained and are empirically categorized to identify the proportion of the potential noise. Then, the fuzzy C-means-based denoising (FCMD) algorithm is performed to verify the natural noise under the assumption that natural noise is primarily the result of the exceptional behavior of users. Finally, a case study is performed using two real-world datasets. The experimental results show that our proposal outperforms previous proposals and has an advantage in dealing with natural noise.
    Type
    a
  10. Brenner, E.H.: Beyond Boolean : new approaches in information retrieval; the quest for intuitive online search systems past, present & future (1995) 0.01
    Content
    (1) The Boolean world; (2) The Non-Boolean picture; (3) The commercial search engines: Personal Librarian, CLARIT, ConQuest, DR-LINK, InQuizit, InTEXT, TOPIC, WIN, TARGET, FREESTYLE, InfoSeek; (4) Reproduction of 8 articles from 'Monitor'
    Issue
    A collection of writings.
  11. Weller, K.; Stock, W.G.: Transitive meronymy : automatic concept-based query expansion using weighted transitive part-whole relations (2008) 0.01
    Abstract
    Transitive meronymy: automatic concept-based query expansion using weighted transitive part-whole relations. Our theoretically oriented work isolates transitive part-whole relations. We discuss the use of meronymy for automatic concept-based query expansion in information retrieval. For practical reasons we propose specifying the types of part-whole relation and assigning the individual types different weighting values, which are then used in retrieval. For the design of knowledge organization systems it is significant that, within the concept hierarchy of an abstraction relation, a concept passes all of its parts (as well as all transitive parts of those parts) on to its subordinate concepts.
    Type
    a
  12. Ye, Z.; Huang, J.X.: A learning to rank approach for quality-aware pseudo-relevance feedback (2016) 0.01
    Abstract
    Pseudo-relevance feedback (PRF) has been shown to be effective in ad hoc information retrieval. In traditional PRF methods, the top-ranked documents are all assumed to be relevant and are therefore treated equally in the feedback process. However, the performance gain brought by each document differs, as shown in our preliminary experiments. It is therefore more reasonable to predict the performance gain brought by each candidate feedback document in the PRF process. We define the quality level (QL) and then use this information to adjust the weights of the feedback terms in these documents. Unlike previous work, we do not make any explicit relevance assumption, and we go beyond merely selecting "good" documents for PRF. We propose a quality-based PRF framework in which two quality-based assumptions are introduced. In particular, two different strategies, relevance-based QL (RelPRF) and improvement-based QL (ImpPRF), are presented to estimate the QL of each feedback document. Based on this, we select a set of heterogeneous document-level features and apply a learning approach to evaluate the QL of each feedback document. Extensive experiments on standard TREC (Text REtrieval Conference) test collections show that our proposed model performs robustly and significantly outperforms strong baselines.
    Type
    a
  13. Chen, Z.; Fu, B.: On the complexity of Rocchio's similarity-based relevance feedback algorithm (2007) 0.01
    Abstract
    Rocchio's similarity-based relevance feedback algorithm, one of the most important query reformulation methods in information retrieval, is essentially an adaptive algorithm for learning from examples to find documents represented by a linear classifier. Despite its popularity in various applications, there is little rigorous analysis of its learning complexity in the literature. In this article, the authors prove for the first time that the learning complexity of Rocchio's algorithm is O(d + d^2 (log d + log n)) over the discretized vector space {0, ..., n-1}^d when the inner-product similarity measure is used. The upper bound on the learning complexity for finding documents represented by a monotone linear classifier (q, 0) over {0, ..., n-1}^d can be improved to at most 1 + 2k(n-1)(log d + log(n-1)), where k is the number of nonzero components in q. Several lower bounds on the learning complexity are also obtained for Rocchio's algorithm. For example, the authors prove that Rocchio's algorithm has a lower bound of Omega((d choose 2) log n) on its learning complexity over the Boolean vector space {0,1}^d.
    Type
    a
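For orientation, the query update analyzed above is the familiar Rocchio reformulation; the following is a textbook statement of the rule (not notation taken from this article), with D_r and D_nr the documents judged relevant and non-relevant in the feedback round and alpha, beta, gamma weighting the original query against the two feedback centroids:

```latex
\[
\vec{q}_{t+1} = \alpha\,\vec{q}_{t}
  + \beta\,\frac{1}{|D_r|}\sum_{\vec{d}\in D_r}\vec{d}
  - \gamma\,\frac{1}{|D_{nr}|}\sum_{\vec{d}\in D_{nr}}\vec{d}
\]
```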
  14. Tsai, C.-F.; Hu, Y.-H.; Chen, Z.-Y.: Factors affecting rocchio-based pseudorelevance feedback in image retrieval (2015) 0.01
    Abstract
    Pseudo-relevance feedback (PRF) was proposed to overcome the limitation of relevance feedback (RF), which relies on a user-in-the-loop process. In PRF, the top-k retrieved images are treated as if they were relevant feedback. Although this feedback set contains noise, PRF has proven effective for automatically improving the overall retrieval result. To implement PRF, the Rocchio algorithm has been considered a reasonable and well-established baseline. However, the performance of Rocchio-based PRF depends on various representation choices (or factors). In this article, we examine the factors that affect the performance of Rocchio-based PRF, including the image-feature representation, the number of top-ranked images, the weighting parameters of Rocchio, and the similarity measure. We offer practical insights on how to optimize the performance of Rocchio-based PRF through appropriate representation choices. Our extensive experiments on the NUS-WIDE-LITE and Caltech 101 + Corel 5000 data sets show that the optimal feature representation is color moment + wavelet texture in terms of retrieval efficiency and effectiveness. Among the other choices, using the top-20 ranked images as the pseudo-positive and pseudo-negative feedback sets with equal weight (i.e., 0.5), combined with the correlation and cosine distance functions, produces the optimal retrieval result.
    Type
    a
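The factors examined above (feature representation, top-k, Rocchio weights, similarity measure) map directly onto a Rocchio-style PRF update over image feature vectors. A minimal sketch follows: the top-20 split and the equal 0.5 weights echo the abstract, while the feature dimensionality, the use of the tail of the ranking as the pseudo-negative set, and cosine scoring are assumptions.

```python
import numpy as np

def cosine_rank(query_vec, feats):
    """Rank all images by cosine similarity to the query vector."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(f @ q))

def rocchio_prf(query_vec, feats, ranking, k=20, w_pos=0.5, w_neg=0.5):
    """Rocchio-based PRF: top-k results act as pseudo-positive feedback,
    the tail of the ranking as pseudo-negative feedback."""
    pos = feats[ranking[:k]].mean(axis=0)
    neg = feats[ranking[-k:]].mean(axis=0)
    return query_vec + w_pos * pos - w_neg * neg

feats = np.random.default_rng(1).normal(size=(100, 64))   # stand-in image features
q0 = feats[0]
q1 = rocchio_prf(q0, feats, cosine_rank(q0, feats))
print(cosine_rank(q1, feats)[:5])                          # re-ranked top five
```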
  15. Losada, D.E.; Barreiro, A.: Embedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.01
    Abstract
    We propose a novel approach to incorporating term similarity and inverse document frequency into a logical model of information retrieval. The ability of the logic to handle expressive representations, together with the use of such classical notions, is a promising characteristic for IR systems. The approach proposed here has been implemented efficiently, and experiments against test collections are presented.
    Date
    22. 3.2003 19:27:23
    Type
    a
  16. Faloutsos, C.: Signature files (1992) 0.01
    Abstract
    Presents a survey and discussion of signature-based text retrieval methods. It describes the main idea behind the signature approach and its advantages over other text retrieval methods, provides a classification of the signature methods that have appeared in the literature, describes the main representatives of each class together with their relative advantages and drawbacks, and gives a list of applications as well as commercial or university prototypes that use the signature approach.
    Date
    7. 5.1999 15:22:48
    Type
    a
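A minimal sketch of the superimposed-coding idea behind signature files: each word sets a few bits in a fixed-width document signature, and a query matches when all of its bits are set in the document signature, with occasional false drops that must then be verified against the text. The signature width, the number of bits per word, and the hashing scheme below are illustrative assumptions.

```python
import hashlib

WIDTH, BITS_PER_WORD = 64, 3

def word_bits(word):
    """Map a word to a few bit positions of the signature."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return {digest[i] % WIDTH for i in range(BITS_PER_WORD)}

def signature(words):
    """Superimpose (OR together) the bit patterns of all words."""
    sig = 0
    for word in words:
        for bit in word_bits(word):
            sig |= 1 << bit
    return sig

def maybe_contains(doc_sig, query_words):
    """True if every query bit is set; false drops are possible, false misses are not."""
    query_sig = signature(query_words)
    return doc_sig & query_sig == query_sig

doc_sig = signature(["signature", "files", "for", "text", "retrieval"])
print(maybe_contains(doc_sig, ["text", "retrieval"]))   # True
print(maybe_contains(doc_sig, ["boolean", "logic"]))    # almost certainly False
```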
  17. Bornmann, L.; Mutz, R.: From P100 to P100' : a new citation-rank approach (2014) 0.01
    Abstract
    Properties of a percentile-based rating scale needed in bibliometrics are formulated. Based on these properties, P100 was recently introduced as a new citation-rank approach (Bornmann, Leydesdorff, & Wang, 2013). In this paper, we conceptualize P100 and propose an improvement which we call P100'. Advantages and disadvantages of citation-rank indicators are noted.
    Date
    22. 8.2014 17:05:18
    Type
    a
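A percentile-style citation rank of the kind discussed above can be computed in a few lines. This sketch is a plain percentile within a reference set, not the P100 or P100' definition from the paper, which treats ties and the size of the rank distribution differently.

```python
def percentile_rank(citations, reference_set):
    """Share of papers in the reference set with fewer citations, scaled to 0-100."""
    below = sum(1 for c in reference_set if c < citations)
    return 100.0 * below / len(reference_set)

reference_set = [0, 1, 1, 2, 3, 5, 8, 13, 21, 40]
print(percentile_rank(5, reference_set))   # 50.0: half of the reference papers rank below
```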
  18. MacFarlane, A.; Robertson, S.E.; McCann, J.A.: Parallel computing for passage retrieval (2004) 0.01
    Date
    20. 1.2007 18:30:22
    Type
    a
  19. Chang, C.-H.; Hsu, C.-C.: Integrating query expansion and conceptual relevance feedback for personalized Web information retrieval (1998) 0.01
    Abstract
    Keyword-based querying has been an immediate and efficient way to specify and retrieve the information a user seeks. However, conventional document ranking based on an automatic assessment of document relevance to the query may not be the best approach when little information is given. Proposes integrating two existing techniques, query expansion and relevance feedback, to achieve a concept-based information search for the Web.
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia
    Type
    a
  20. Ravana, S.D.; Rajagopal, P.; Balakrishnan, V.: Ranking retrieval systems using pseudo relevance judgments (2015) 0.01
    Abstract
    Purpose: In a system-based approach, replicating the web would require large test collections, and judging the relevance of every document per topic through human assessors to create relevance judgments is infeasible. Because of the large number of documents that require judgment, human assessors may also introduce errors through disagreement. The paper aims to discuss these issues. Design/methodology/approach: This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human effort. These methods cope with the large number of documents to be judged while avoiding errors from human disagreement during the judgment process. The study uses two key factors to generate the alternative methods: the number of occurrences of each document per topic across all system runs, and the document rankings. Findings: The effectiveness of the proposed method is evaluated using the correlation coefficient between systems ranked by mean average precision under the original Text REtrieval Conference (TREC) relevance judgments and under the pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative that reduces the human effort and the disagreement errors involved in generating TREC-like relevance judgments. Originality/value: The simple methods proposed in this study improve the correlation coefficient when generating alternative relevance judgments without human assessors, while contributing to information retrieval evaluation.
    Date
    20. 1.2015 18:30:22
    18. 9.2018 18:22:56
    Type
    a
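The document-ranking method sketched in the abstract above — pool the documents retrieved by all system runs, weight each by how often and how highly it is retrieved, and treat the strongest candidates as pseudo-relevant — might look roughly like the following. The pool depth of 100 echoes the abstract; the reciprocal-rank weighting and the size of the judged set are assumptions.

```python
from collections import defaultdict

def pseudo_qrels(runs, pool_depth=100, judged=50):
    """Derive pseudo relevance judgments from system runs: documents that many
    runs rank highly within the pool depth are assumed to be relevant."""
    votes = defaultdict(float)
    for ranking in runs:                                   # each run: ranked doc ids
        for rank, doc in enumerate(ranking[:pool_depth], start=1):
            votes[doc] += 1.0 / rank                       # occurrence weighted by rank
    best = sorted(votes, key=votes.get, reverse=True)[:judged]
    return {doc: 1 for doc in best}

runs = [["d1", "d2", "d3"], ["d2", "d1", "d4"], ["d2", "d3", "d5"]]
print(pseudo_qrels(runs, pool_depth=3, judged=2))   # {'d2': 1, 'd1': 1}
```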

Types

  • a 303
  • m 7
  • el 6
  • s 3
  • p 2
  • r 1