Search (39 results, page 1 of 2)

  • × language_ss:"e"
  • × theme_ss:"Retrievalalgorithmen"
  • × year_i:[2010 TO 2020}
  1. Ravana, S.D.; Rajagopal, P.; Balakrishnan, V.: Ranking retrieval systems using pseudo relevance judgments (2015) 0.02
    0.020789422 = product of:
      0.062368266 = sum of:
        0.062368266 = product of:
          0.093552396 = sum of:
            0.04467099 = weight(_text_:retrieval in 2591) [ClassicSimilarity], result of:
              0.04467099 = score(doc=2591,freq=6.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.28943354 = fieldWeight in 2591, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2591)
            0.0488814 = weight(_text_:22 in 2591) [ClassicSimilarity], result of:
              0.0488814 = score(doc=2591,freq=4.0), product of:
                0.17867287 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051022716 = queryNorm
                0.27358043 = fieldWeight in 2591, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2591)
          0.6666667 = coord(2/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose In a system-based approach, replicating the web would require large test collections, and judging the relevancy of all documents per topic in creating relevance judgment through human assessors is infeasible. Due to the large amount of documents that requires judgment, there are possible errors introduced by human assessors because of disagreements. The paper aims to discuss these issues. Design/methodology/approach This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human efforts. These methods overcome problems with large amounts of documents for judgment while avoiding human disagreement errors during the judgment process. This study utilizes two key factors: number of occurrences of each document per topic from all the system runs; and document rankings to generate the alternate methods. Findings The effectiveness of the proposed method is evaluated using the correlation coefficient of ranked systems using mean average precision scores between the original Text REtrieval Conference (TREC) relevance judgments and pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative to reduce human effort and disagreement errors involved in generating TREC-like relevance judgments. Originality/value Simple methods proposed in this study show improvement in the correlation coefficient in generating alternate relevance judgment without human assessors while contributing to information retrieval evaluation.
    Date
    20. 1.2015 18:30:22
    18. 9.2018 18:22:56
  2. Jacucci, G.; Barral, O.; Daee, P.; Wenzel, M.; Serim, B.; Ruotsalo, T.; Pluchino, P.; Freeman, J.; Gamberini, L.; Kaski, S.; Blankertz, B.: Integrating neurophysiologic relevance feedback in intent modeling for information retrieval (2019) 0.02
    0.019621456 = product of:
      0.058864366 = sum of:
        0.058864366 = product of:
          0.08829655 = sum of:
            0.036714934 = weight(_text_:online in 5356) [ClassicSimilarity], result of:
              0.036714934 = score(doc=5356,freq=4.0), product of:
                0.1548489 = queryWeight, product of:
                  3.0349014 = idf(docFreq=5778, maxDocs=44218)
                  0.051022716 = queryNorm
                0.23710167 = fieldWeight in 5356, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0349014 = idf(docFreq=5778, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5356)
            0.051581617 = weight(_text_:retrieval in 5356) [ClassicSimilarity], result of:
              0.051581617 = score(doc=5356,freq=8.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.33420905 = fieldWeight in 5356, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5356)
          0.6666667 = coord(2/3)
      0.33333334 = coord(1/3)
    
    Abstract
    The use of implicit relevance feedback from neurophysiology could deliver effortless information retrieval. However, both computing neurophysiologic responses and retrieving documents are characterized by uncertainty because of noisy signals and incomplete or inconsistent representations of the data. We present the first-of-its-kind, fully integrated information retrieval system that makes use of online implicit relevance feedback generated from brain activity as measured through electroencephalography (EEG), and eye movements. The findings of the evaluation experiment (N = 16) show that we are able to compute online neurophysiology-based relevance feedback with performance significantly better than chance in complex data domains and realistic search tasks. We contribute by demonstrating how to integrate in interactive intent modeling this inherently noisy implicit relevance feedback combined with scarce explicit feedback. Although experimental measures of task performance did not allow us to demonstrate how the classification outcomes translated into search task performance, the experiment proved that our approach is able to generate relevance feedback from brain signals and eye movements in a realistic scenario, thus providing promising implications for future work in neuroadaptive information retrieval (IR).
  3. Soulier, L.; Jabeur, L.B.; Tamine, L.; Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model (2013) 0.01
    0.013412262 = product of:
      0.040236786 = sum of:
        0.040236786 = product of:
          0.06035518 = sum of:
            0.025790809 = weight(_text_:retrieval in 664) [ClassicSimilarity], result of:
              0.025790809 = score(doc=664,freq=2.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.16710453 = fieldWeight in 664, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=664)
            0.03456437 = weight(_text_:22 in 664) [ClassicSimilarity], result of:
              0.03456437 = score(doc=664,freq=2.0), product of:
                0.17867287 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051022716 = queryNorm
                0.19345059 = fieldWeight in 664, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=664)
          0.6666667 = coord(2/3)
      0.33333334 = coord(1/3)
    
    Abstract
    A new challenge, accessing multiple relevant entities, arises from the availability of linked heterogeneous data. In this article, we address more specifically the problem of accessing relevant entities, such as publications and authors within a bibliographic network, given an information need. We propose a novel algorithm, called BibRank, that estimates a joint relevance of documents and authors within a bibliographic network. This model ranks each type of entity using a score propagation algorithm with respect to the query topic and the structure of the underlying bi-type information entity network. Evidence sources, namely content-based and network-based scores, are both used to estimate the topical similarity between connected entities. For this purpose, authorship relationships are analyzed through a language model-based score on the one hand and on the other hand, non topically related entities of the same type are detected through marginal citations. The article reports the results of experiments using the Bibrank algorithm for an information retrieval task. The CiteSeerX bibliographic data set forms the basis for the topical query automatic generation and evaluation. We show that a statistically significant improvement over closely related ranking models is achieved.
    Date
    22. 3.2013 19:34:49
  4. White, H. D.: Co-cited author retrieval and relevance theory : examples from the humanities (2015) 0.01
    0.009726323 = product of:
      0.02917897 = sum of:
        0.02917897 = product of:
          0.08753691 = sum of:
            0.08753691 = weight(_text_:retrieval in 1687) [ClassicSimilarity], result of:
              0.08753691 = score(doc=1687,freq=4.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.5671716 = fieldWeight in 1687, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1687)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Footnote
    Beitrag in einem Special Issue "Combining bibliometrics and information retrieval"
  5. Karlsson, A.; Hammarfelt, B.; Steinhauer, H.J.; Falkman, G.; Olson, N.; Nelhans, G.; Nolin, J.: Modeling uncertainty in bibliometrics and information retrieval : an information fusion approach (2015) 0.01
    0.00810527 = product of:
      0.024315808 = sum of:
        0.024315808 = product of:
          0.07294742 = sum of:
            0.07294742 = weight(_text_:retrieval in 1696) [ClassicSimilarity], result of:
              0.07294742 = score(doc=1696,freq=4.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.47264296 = fieldWeight in 1696, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1696)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Footnote
    Beitrag in einem Special Issue "Combining bibliometrics and information retrieval"
  6. Ayadi, H.; Torjmen-Khemakhem, M.; Daoud, M.; Xiangji Huang, J.; Ben Jemaa, M.: MF-Re-Rank : a modality feature-based re-ranking model for medical image retrieval (2018) 0.01
    0.007603416 = product of:
      0.022810249 = sum of:
        0.022810249 = product of:
          0.068430744 = sum of:
            0.068430744 = weight(_text_:retrieval in 4459) [ClassicSimilarity], result of:
              0.068430744 = score(doc=4459,freq=22.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.44337842 = fieldWeight in 4459, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4459)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    One of the main challenges in medical image retrieval is the increasing volume of image data, which render it difficult for domain experts to find relevant information from large data sets. Effective and efficient medical image retrieval systems are required to better manage medical image information. Text-based image retrieval (TBIR) was very successful in retrieving images with textual descriptions. Several TBIR approaches rely on models based on bag-of-words approaches, in which the image retrieval problem turns into one of standard text-based information retrieval; where the meanings and values of specific medical entities in the text and metadata are ignored in the image representation and retrieval process. However, we believe that TBIR should extract specific medical entities and terms and then exploit these elements to achieve better image retrieval results. Therefore, we propose a novel reranking method based on medical-image-dependent features. These features are manually selected by a medical expert from imaging modalities and medical terminology. First, we represent queries and images using only medical-image-dependent features such as image modality and image scale. Second, we exploit the defined features in a new reranking method for medical image retrieval. Our motivation is the large influence of image modality in medical image retrieval and its impact on image-relevance scores. To evaluate our approach, we performed a series of experiments on the medical ImageCLEF data sets from 2009 to 2013. The BM25 model, a language model, and an image-relevance feedback model are used as baselines to evaluate our approach. The experimental results show that compared to the BM25 model, the proposed model significantly enhances image retrieval performance. We also compared our approach with other state-of-the-art approaches and show that our approach performs comparably to those of the top three runs in the official ImageCLEF competition.
  7. Lee, J.-T.; Seo, J.; Jeon, J.; Rim, H.-C.: Sentence-based relevance flow analysis for high accuracy retrieval (2011) 0.01
    0.00701937 = product of:
      0.021058109 = sum of:
        0.021058109 = product of:
          0.06317432 = sum of:
            0.06317432 = weight(_text_:retrieval in 4746) [ClassicSimilarity], result of:
              0.06317432 = score(doc=4746,freq=12.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.40932083 = fieldWeight in 4746, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4746)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Traditional ranking models for information retrieval lack the ability to make a clear distinction between relevant and nonrelevant documents at top ranks if both have similar bag-of-words representations with regard to a user query. We aim to go beyond the bag-of-words approach to document ranking in a new perspective, by representing each document as a sequence of sentences. We begin with an assumption that relevant documents are distinguishable from nonrelevant ones by sequential patterns of relevance degrees of sentences to a query. We introduce the notion of relevance flow, which refers to a stream of sentence-query relevance within a document. We then present a framework to learn a function for ranking documents effectively based on various features extracted from their relevance flows and leverage the output to enhance existing retrieval models. We validate the effectiveness of our approach by performing a number of retrieval experiments on three standard test collections, each comprising a different type of document: news articles, medical references, and blog posts. Experimental results demonstrate that the proposed approach can improve the retrieval performance at the top ranks significantly as compared with the state-of-the-art retrieval models regardless of document type.
  8. Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.01
    0.00701937 = product of:
      0.021058109 = sum of:
        0.021058109 = product of:
          0.06317432 = sum of:
            0.06317432 = weight(_text_:retrieval in 1338) [ClassicSimilarity], result of:
              0.06317432 = score(doc=1338,freq=12.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.40932083 = fieldWeight in 1338, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1338)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations that infer two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  9. Dang, E.K.F.; Luk, R.W.P.; Allan, J.: Beyond bag-of-words : bigram-enhanced context-dependent term weights (2014) 0.01
    0.0064077782 = product of:
      0.019223334 = sum of:
        0.019223334 = product of:
          0.05767 = sum of:
            0.05767 = weight(_text_:retrieval in 1283) [ClassicSimilarity], result of:
              0.05767 = score(doc=1283,freq=10.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.37365708 = fieldWeight in 1283, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1283)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    While term independence is a widely held assumption in most of the established information retrieval approaches, it is clearly not true and various works in the past have investigated a relaxation of the assumption. One approach is to use n-grams in document representation instead of unigrams. However, the majority of early works on n-grams obtained only modest performance improvement. On the other hand, the use of information based on supporting terms or "contexts" of queries has been found to be promising. In particular, recent studies showed that using new context-dependent term weights improved the performance of relevance feedback (RF) retrieval compared with using traditional bag-of-words BM25 term weights. Calculation of the new term weights requires an estimation of the local probability of relevance of each query term occurrence. In previous studies, the estimation of this probability was based on unigrams that occur in the neighborhood of a query term. We explore an integration of the n-gram and context approaches by computing context-dependent term weights based on a mixture of unigrams and bigrams. Extensive experiments are performed using the title queries of the Text Retrieval Conference (TREC)-6, TREC-7, TREC-8, and TREC-2005 collections, for RF with relevance judgment of either the top 10 or top 20 documents of an initial retrieval. We identify some crucial elements needed in the use of bigrams in our methods, such as proper inverse document frequency (IDF) weighting of the bigrams and noise reduction by pruning bigrams with large document frequency values. We show that enhancing context-dependent term weights with bigrams is effective in further improving retrieval performance.
  10. Xu, B.; Lin, H.; Lin, Y.: Assessment of learning to rank methods for query expansion (2016) 0.01
    0.0064077782 = product of:
      0.019223334 = sum of:
        0.019223334 = product of:
          0.05767 = sum of:
            0.05767 = weight(_text_:retrieval in 2929) [ClassicSimilarity], result of:
              0.05767 = score(doc=2929,freq=10.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.37365708 = fieldWeight in 2929, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2929)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Pseudo relevance feedback, as an effective query expansion method, can significantly improve information retrieval performance. However, the method may negatively impact the retrieval performance when some irrelevant terms are used in the expanded query. Therefore, it is necessary to refine the expansion terms. Learning to rank methods have proven effective in information retrieval to solve ranking problems by ranking the most relevant documents at the top of the returned list, but few attempts have been made to employ learning to rank methods for term refinement in pseudo relevance feedback. This article proposes a novel framework to explore the feasibility of using learning to rank to optimize pseudo relevance feedback by means of reranking the candidate expansion terms. We investigate some learning approaches to choose the candidate terms and introduce some state-of-the-art learning to rank methods to refine the expansion terms. In addition, we propose two term labeling strategies and examine the usefulness of various term features to optimize the framework. Experimental results with three TREC collections show that our framework can effectively improve retrieval performance.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  11. Bornmann, L.; Mutz, R.: From P100 to P100' : a new citation-rank approach (2014) 0.01
    0.006144777 = product of:
      0.01843433 = sum of:
        0.01843433 = product of:
          0.055302992 = sum of:
            0.055302992 = weight(_text_:22 in 1431) [ClassicSimilarity], result of:
              0.055302992 = score(doc=1431,freq=2.0), product of:
                0.17867287 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051022716 = queryNorm
                0.30952093 = fieldWeight in 1431, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1431)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    22. 8.2014 17:05:18
  12. Habernal, I.; Konopík, M.; Rohlík, O.: Question answering (2012) 0.01
    0.005956133 = product of:
      0.017868398 = sum of:
        0.017868398 = product of:
          0.05360519 = sum of:
            0.05360519 = weight(_text_:retrieval in 101) [ClassicSimilarity], result of:
              0.05360519 = score(doc=101,freq=6.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.34732026 = fieldWeight in 101, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=101)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Question Answering is an area of information retrieval with the added challenge of applying sophisticated techniques to identify the complex syntactic and semantic relationships present in text in order to provide a more sophisticated and satisfactory response to the user's information needs. For this reason, the authors see question answering as the next step beyond standard information retrieval. In this chapter state of the art question answering is covered focusing on providing an overview of systems, techniques and approaches that are likely to be employed in the next generations of search engines. Special attention is paid to question answering using the World Wide Web as the data source and to question answering exploiting the possibilities of Semantic Web. Considerations about the current issues and prospects for promising future research are also provided.
    Source
    Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis, u.a
  13. Jindal, V.; Bawa, S.; Batra, S.: ¬A review of ranking approaches for semantic search on Web (2014) 0.01
    0.005956133 = product of:
      0.017868398 = sum of:
        0.017868398 = product of:
          0.05360519 = sum of:
            0.05360519 = weight(_text_:retrieval in 2799) [ClassicSimilarity], result of:
              0.05360519 = score(doc=2799,freq=6.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.34732026 = fieldWeight in 2799, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2799)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    With ever increasing information being available to the end users, search engines have become the most powerful tools for obtaining useful information scattered on the Web. However, it is very common that even most renowned search engines return result sets with not so useful pages to the user. Research on semantic search aims to improve traditional information search and retrieval methods where the basic relevance criteria rely primarily on the presence of query keywords within the returned pages. This work is an attempt to explore different relevancy ranking approaches based on semantics which are considered appropriate for the retrieval of relevant information. In this paper, various pilot projects and their corresponding outcomes have been investigated based on methodologies adopted and their most distinctive characteristics towards ranking. An overview of selected approaches and their comparison by means of the classification criteria has been presented. With the help of this comparison, some common concepts and outstanding features have been identified.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  14. Tsai, C.-F.; Hu, Y.-H.; Chen, Z.-Y.: Factors affecting rocchio-based pseudorelevance feedback in image retrieval (2015) 0.01
    0.005731291 = product of:
      0.017193872 = sum of:
        0.017193872 = product of:
          0.051581617 = sum of:
            0.051581617 = weight(_text_:retrieval in 1607) [ClassicSimilarity], result of:
              0.051581617 = score(doc=1607,freq=8.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.33420905 = fieldWeight in 1607, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1607)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Pseudorelevance feedback (PRF) was proposed to solve the limitation of relevance feedback (RF), which is based on the user-in-the-loop process. In PRF, the top-k retrieved images are regarded as PRF. Although the PRF set contains noise, PRF has proven effective for automatically improving the overall retrieval result. To implement PRF, the Rocchio algorithm has been considered as a reasonable and well-established baseline. However, the performance of Rocchio-based PRF is subject to various representation choices (or factors). In this article, we examine these factors that affect the performance of Rocchio-based PRF, including image-feature representation, the number of top-ranked images, the weighting parameters of Rocchio, and similarity measure. We offer practical insights on how to optimize the performance of Rocchio-based PRF by choosing appropriate representation choices. Our extensive experiments on NUS-WIDE-LITE and Caltech 101 + Corel 5000 data sets show that the optimal feature representation is color moment + wavelet texture in terms of retrieval efficiency and effectiveness. Other representation choices are that using top-20 ranked images as pseudopositive and pseudonegative feedback sets with the equal weight (i.e., 0.5) by the correlation and cosine distance functions can produce the optimal retrieval result.
  15. Bar-Ilan, J.; Levene, M.: ¬The hw-rank : an h-index variant for ranking web pages (2015) 0.01
    0.005731291 = product of:
      0.017193872 = sum of:
        0.017193872 = product of:
          0.051581617 = sum of:
            0.051581617 = weight(_text_:retrieval in 1694) [ClassicSimilarity], result of:
              0.051581617 = score(doc=1694,freq=2.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.33420905 = fieldWeight in 1694, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1694)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Footnote
    Beitrag in einem Special Issue "Combining bibliometrics and information retrieval"
  16. Hoenkamp, E.; Bruza, P.: How everyday language can and will boost effective information retrieval (2015) 0.00
    0.0049634436 = product of:
      0.014890331 = sum of:
        0.014890331 = product of:
          0.04467099 = sum of:
            0.04467099 = weight(_text_:retrieval in 2123) [ClassicSimilarity], result of:
              0.04467099 = score(doc=2123,freq=6.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.28943354 = fieldWeight in 2123, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2123)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Typing 2 or 3 keywords into a browser has become an easy and efficient way to find information. Yet, typing even short queries becomes tedious on ever shrinking (virtual) keyboards. Meanwhile, speech processing is maturing rapidly, facilitating everyday language input. Also, wearable technology can inform users proactively by listening in on their conversations or processing their social media interactions. Given these developments, everyday language may soon become the new input of choice. We present an information retrieval (IR) algorithm specifically designed to accept everyday language. It integrates two paradigms of information retrieval, previously studied in isolation; one directed mainly at the surface structure of language, the other primarily at the underlying meaning. The integration was achieved by a Markov machine that encodes meaning by its transition graph, and surface structure by the language it generates. A rigorous evaluation of the approach showed, first, that it can compete with the quality of existing language models, second, that it is more effective the more verbose the input, and third, as a consequence, that it is promising for an imminent transition from keyword input, where the onus is on the user to formulate concise queries, to a modality where users can express more freely, more informal, and more natural their need for information in everyday language.
  17. Liu, X.; Zheng, W.; Fang, H.: ¬An exploration of ranking models and feedback method for related entity finding (2013) 0.00
    0.0049634436 = product of:
      0.014890331 = sum of:
        0.014890331 = product of:
          0.04467099 = sum of:
            0.04467099 = weight(_text_:retrieval in 2714) [ClassicSimilarity], result of:
              0.04467099 = score(doc=2714,freq=6.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.28943354 = fieldWeight in 2714, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2714)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Most existing search engines focus on document retrieval. However, information needs are certainly not limited to finding relevant documents. Instead, a user may want to find relevant entities such as persons and organizations. In this paper, we study the problem of related entity finding. Our goal is to rank entities based on their relevance to a structured query, which specifies an input entity, the type of related entities and the relation between the input and related entities. We first discuss a general probabilistic framework, derive six possible retrieval models to rank the related entities, and then compare these models both analytically and empirically. To further improve performance, we study the problem of feedback in the context of related entity finding. Specifically, we propose a mixture model based feedback method that can utilize the pseudo feedback entities to estimate an enriched model for the relation between the input and related entities. Experimental results over two standard TREC collections show that the derived relation generation model combined with a relation feedback method performs better than other models.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  18. Efron, M.; Winget, M.: Query polyrepresentation for ranking retrieval systems without relevance judgments (2010) 0.00
    0.0048631616 = product of:
      0.014589485 = sum of:
        0.014589485 = product of:
          0.043768454 = sum of:
            0.043768454 = weight(_text_:retrieval in 3469) [ClassicSimilarity], result of:
              0.043768454 = score(doc=3469,freq=4.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.2835858 = fieldWeight in 3469, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3469)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Ranking information retrieval (IR) systems with respect to their effectiveness is a crucial operation during IR evaluation, as well as during data fusion. This article offers a novel method of approaching the system-ranking problem, based on the widely studied idea of polyrepresentation. The principle of polyrepresentation suggests that a single information need can be represented by many query articulations-what we call query aspects. By skimming the top k (where k is small) documents retrieved by a single system for multiple query aspects, we collect a set of documents that are likely to be relevant to a given test topic. Labeling these skimmed documents as putatively relevant lets us build pseudorelevance judgments without undue human intervention. We report experiments where using these pseudorelevance judgments delivers a rank ordering of IR systems that correlates highly with rankings based on human relevance judgments.
  19. Efron, M.: Linear time series models for term weighting in information retrieval (2010) 0.00
    0.0048631616 = product of:
      0.014589485 = sum of:
        0.014589485 = product of:
          0.043768454 = sum of:
            0.043768454 = weight(_text_:retrieval in 3688) [ClassicSimilarity], result of:
              0.043768454 = score(doc=3688,freq=4.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.2835858 = fieldWeight in 3688, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3688)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Common measures of term importance in information retrieval (IR) rely on counts of term frequency; rare terms receive higher weight in document ranking than common terms receive. However, realistic scenarios yield additional information about terms in a collection. Of interest in this article is the temporal behavior of terms as a collection changes over time. We propose capturing each term's collection frequency at discrete time intervals over the lifespan of a corpus and analyzing the resulting time series. We hypothesize the collection frequency of a weakly discriminative term x at time t is predictable by a linear model of the term's prior observations. On the other hand, a linear time series model for a strong discriminators' collection frequency will yield a poor fit to the data. Operationalizing this hypothesis, we induce three time-based measures of term importance and test these against state-of-the-art term weighting models.
  20. Biskri, I.; Rompré, L.: Using association rules for query reformulation (2012) 0.00
    0.0048631616 = product of:
      0.014589485 = sum of:
        0.014589485 = product of:
          0.043768454 = sum of:
            0.043768454 = weight(_text_:retrieval in 92) [ClassicSimilarity], result of:
              0.043768454 = score(doc=92,freq=4.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.2835858 = fieldWeight in 92, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.046875 = fieldNorm(doc=92)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    In this paper the authors will present research on the combination of two methods of data mining: text classification and maximal association rules. Text classification has been the focus of interest of many researchers for a long time. However, the results take the form of lists of words (classes) that people often do not know what to do with. The use of maximal association rules induced a number of advantages: (1) the detection of dependencies and correlations between the relevant units of information (words) of different classes, (2) the extraction of hidden knowledge, often relevant, from a large volume of data. The authors will show how this combination can improve the process of information retrieval.
    Source
    Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis, u.a