Search (50 results, page 1 of 3)

  • × language_ss:"e"
  • × theme_ss:"Retrievalalgorithmen"
  • × year_i:[2010 TO 2020}
  1. Van der Veer Martens, B.; Fleet, C. van: Opening the black box of "relevance work" : a domain analysis (2012) 0.03
    0.034116738 = product of:
      0.102350205 = sum of:
        0.02783884 = weight(_text_:23 in 247) [ClassicSimilarity], result of:
          0.02783884 = score(doc=247,freq=2.0), product of:
            0.117170855 = queryWeight, product of:
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.032692216 = queryNorm
            0.23759183 = fieldWeight in 247, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.046875 = fieldNorm(doc=247)
        0.02783884 = weight(_text_:23 in 247) [ClassicSimilarity], result of:
          0.02783884 = score(doc=247,freq=2.0), product of:
            0.117170855 = queryWeight, product of:
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.032692216 = queryNorm
            0.23759183 = fieldWeight in 247, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.046875 = fieldNorm(doc=247)
        0.02783884 = weight(_text_:23 in 247) [ClassicSimilarity], result of:
          0.02783884 = score(doc=247,freq=2.0), product of:
            0.117170855 = queryWeight, product of:
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.032692216 = queryNorm
            0.23759183 = fieldWeight in 247, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.046875 = fieldNorm(doc=247)
        0.008019937 = weight(_text_:in in 247) [ClassicSimilarity], result of:
          0.008019937 = score(doc=247,freq=8.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.18034597 = fieldWeight in 247, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=247)
        0.010813748 = weight(_text_:der in 247) [ClassicSimilarity], result of:
          0.010813748 = score(doc=247,freq=2.0), product of:
            0.073026784 = queryWeight, product of:
              2.2337668 = idf(docFreq=12875, maxDocs=44218)
              0.032692216 = queryNorm
            0.14807922 = fieldWeight in 247, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.2337668 = idf(docFreq=12875, maxDocs=44218)
              0.046875 = fieldNorm(doc=247)
      0.33333334 = coord(5/15)
    
    Abstract
    In response to Hjørland's recent call for a reconceptualization of the foundations of relevance, we suggest that the sociocognitive aspects of intermediation by information agencies, such as archives and libraries, are a necessary and unexplored part of the infrastructure of the subject knowledge domains central to his recommended "view of relevance informed by a social paradigm" (2010, p. 217). From a comparative analysis of documents from 39 graduate-level introductory courses in archives, reference, and strategic/competitive intelligence taught in 13 American Library Association-accredited library and information science (LIS) programs, we identify four defining sociocognitive dimensions of "relevance work" in information agencies within Hjørland's proposed framework for relevance: tasks, time, systems, and assessors. This study is intended to supply sociocognitive content from within the relevance work domain to support further domain analytic research, and to emphasize the importance of intermediary relevance work for all subject knowledge domains.
    Date
    11. 6.2012 14:23:00
  2. Dadashkarimi, J.; Shakery, A.; Faili, H.; Zamani, H.: ¬An expectation-maximization algorithm for query translation based on pseudo-relevant documents (2017) 0.02
    0.016733494 = product of:
      0.0627506 = sum of:
        0.018559227 = weight(_text_:23 in 3296) [ClassicSimilarity], result of:
          0.018559227 = score(doc=3296,freq=2.0), product of:
            0.117170855 = queryWeight, product of:
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.032692216 = queryNorm
            0.15839456 = fieldWeight in 3296, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.03125 = fieldNorm(doc=3296)
        0.018559227 = weight(_text_:23 in 3296) [ClassicSimilarity], result of:
          0.018559227 = score(doc=3296,freq=2.0), product of:
            0.117170855 = queryWeight, product of:
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.032692216 = queryNorm
            0.15839456 = fieldWeight in 3296, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.03125 = fieldNorm(doc=3296)
        0.018559227 = weight(_text_:23 in 3296) [ClassicSimilarity], result of:
          0.018559227 = score(doc=3296,freq=2.0), product of:
            0.117170855 = queryWeight, product of:
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.032692216 = queryNorm
            0.15839456 = fieldWeight in 3296, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.03125 = fieldNorm(doc=3296)
        0.007072921 = weight(_text_:in in 3296) [ClassicSimilarity], result of:
          0.007072921 = score(doc=3296,freq=14.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.15905021 = fieldWeight in 3296, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=3296)
      0.26666668 = coord(4/15)
    
    Abstract
    Query translation in cross-language information retrieval (CLIR) can be done by employing dictionaries, aligned corpora, or machine translators. Scarcity of aligned corpora for various domains in many language pairs intensifies the importance of dictionary-based CLIR which motivates us to use only a bilingual dictionary and two independent collections in source and target languages for query translation. We exploit pseudo-relevant documents for a given query in the source language and pseudo-relevant documents for a translation of the query in the target language with a proposed expectation-maximization algorithm for improving query translation. The proposed method (called EM4QT) assumes that each target term either is translated from the source pseudo-relevant documents or has come from a noisy collection. Since EM4QT does not directly consider term coherency, which is defined as fluency of the target translation, we investigate a crucial question: can EM4QT be improved using either coherency-based methods or token-to-token translation ones? To address this question, we combine different translation models via simple linear interpolation and a proposed divergence minimization method. Evaluations over four CLEF collections in Persian, French, Spanish, and German indicate that EM4QT significantly outperforms competitive baselines in all the collections. Our experiments also reveal that since EM4QT indirectly considers term coherency, combining the method with coherency-based models cannot significantly improve the retrieval performance. On the other hand, investigating the query-by-query results supports the view that EM4QT usually gives a relatively high weight to one translation and its combination with the proposed token-to-token translation model, which is obtained by running EM4QT for each query term separately, soothes the effect and reaches better results for many queries. Comparing the method with a competitive word-embedding baseline reveals the superiority of the proposed model.
    Date
    23. 1.2017 14:07:40
  3. Ding, Y.: Topic-based PageRank on author cocitation networks (2011) 0.02
    0.016703304 = product of:
      0.083516516 = sum of:
        0.02783884 = weight(_text_:23 in 4348) [ClassicSimilarity], result of:
          0.02783884 = score(doc=4348,freq=2.0), product of:
            0.117170855 = queryWeight, product of:
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.032692216 = queryNorm
            0.23759183 = fieldWeight in 4348, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.046875 = fieldNorm(doc=4348)
        0.02783884 = weight(_text_:23 in 4348) [ClassicSimilarity], result of:
          0.02783884 = score(doc=4348,freq=2.0), product of:
            0.117170855 = queryWeight, product of:
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.032692216 = queryNorm
            0.23759183 = fieldWeight in 4348, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.046875 = fieldNorm(doc=4348)
        0.02783884 = weight(_text_:23 in 4348) [ClassicSimilarity], result of:
          0.02783884 = score(doc=4348,freq=2.0), product of:
            0.117170855 = queryWeight, product of:
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.032692216 = queryNorm
            0.23759183 = fieldWeight in 4348, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5840597 = idf(docFreq=3336, maxDocs=44218)
              0.046875 = fieldNorm(doc=4348)
      0.2 = coord(3/15)
    
    Date
    17. 3.2011 18:08:23
  4. Bornmann, L.; Mutz, R.: From P100 to P100' : a new citation-rank approach (2014) 0.00
    0.0025830474 = product of:
      0.019372854 = sum of:
        0.00756127 = weight(_text_:in in 1431) [ClassicSimilarity], result of:
          0.00756127 = score(doc=1431,freq=4.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.17003182 = fieldWeight in 1431, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0625 = fieldNorm(doc=1431)
        0.011811584 = product of:
          0.035434753 = sum of:
            0.035434753 = weight(_text_:22 in 1431) [ClassicSimilarity], result of:
              0.035434753 = score(doc=1431,freq=2.0), product of:
                0.114482574 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.032692216 = queryNorm
                0.30952093 = fieldWeight in 1431, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1431)
          0.33333334 = coord(1/3)
      0.13333334 = coord(2/15)
    
    Abstract
    Properties of a percentile-based rating scale needed in bibliometrics are formulated. Based on these properties, P100 was recently introduced as a new citation-rank approach (Bornmann, Leydesdorff, & Wang, 2013). In this paper, we conceptualize P100 and propose an improvement which we call P100'. Advantages and disadvantages of citation-rank indicators are noted.
    Date
    22. 8.2014 17:05:18
  5. Ravana, S.D.; Rajagopal, P.; Balakrishnan, V.: Ranking retrieval systems using pseudo relevance judgments (2015) 0.00
    0.002483384 = product of:
      0.018625379 = sum of:
        0.008185315 = weight(_text_:in in 2591) [ClassicSimilarity], result of:
          0.008185315 = score(doc=2591,freq=12.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.18406484 = fieldWeight in 2591, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2591)
        0.010440065 = product of:
          0.031320192 = sum of:
            0.031320192 = weight(_text_:22 in 2591) [ClassicSimilarity], result of:
              0.031320192 = score(doc=2591,freq=4.0), product of:
                0.114482574 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.032692216 = queryNorm
                0.27358043 = fieldWeight in 2591, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2591)
          0.33333334 = coord(1/3)
      0.13333334 = coord(2/15)
    
    Abstract
    Purpose: In a system-based approach, replicating the web would require large test collections, and judging the relevancy of all documents per topic in creating relevance judgment through human assessors is infeasible. Due to the large number of documents that require judgment, there are possible errors introduced by human assessors because of disagreements. The paper aims to discuss these issues. Design/methodology/approach: This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human efforts. These methods overcome problems with large numbers of documents for judgment while avoiding human disagreement errors during the judgment process. This study utilizes two key factors: number of occurrences of each document per topic from all the system runs; and document rankings to generate the alternate methods. Findings: The effectiveness of the proposed method is evaluated using the correlation coefficient of ranked systems using mean average precision scores between the original Text REtrieval Conference (TREC) relevance judgments and pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative to reduce human effort and disagreement errors involved in generating TREC-like relevance judgments. Originality/value: Simple methods proposed in this study show improvement in the correlation coefficient in generating alternate relevance judgment without human assessors while contributing to information retrieval evaluation.
    Date
    20. 1.2015 18:30:22
    18. 9.2018 18:22:56
  6. Baloh, P.; Desouza, K.C.; Hackney, R.: Contextualizing organizational interventions of knowledge management systems : a design science perspective (2012) 0.00
    0.001875403 = product of:
      0.014065522 = sum of:
        0.0066832816 = weight(_text_:in in 241) [ClassicSimilarity], result of:
          0.0066832816 = score(doc=241,freq=8.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.15028831 = fieldWeight in 241, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=241)
        0.00738224 = product of:
          0.02214672 = sum of:
            0.02214672 = weight(_text_:22 in 241) [ClassicSimilarity], result of:
              0.02214672 = score(doc=241,freq=2.0), product of:
                0.114482574 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.032692216 = queryNorm
                0.19345059 = fieldWeight in 241, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=241)
          0.33333334 = coord(1/3)
      0.13333334 = coord(2/15)
    
    Abstract
    We address how individuals' (workers) knowledge needs influence the design of knowledge management systems (KMS), enabling knowledge creation and utilization. It is evident that KMS technologies and activities are indiscriminately deployed in most organizations with little regard to the actual context of their adoption. Moreover, it is apparent that the extant literature pertaining to knowledge management projects is frequently deficient in identifying the variety of factors indicative for successful KMS. This presents an obvious business practice and research gap that requires a critical analysis of the necessary intervention that will actually improve how workers can leverage and form organization-wide knowledge. This research involved an extensive review of the literature, a grounded theory methodological approach and rigorous data collection and synthesis through an empirical case analysis (Parsons Brinckerhoff and Samsung). The contribution of this study is the formulation of a model for designing KMS based upon the design science paradigm, which aspires to create artifacts that are interdependent of people and organizations. The essential proposition is that KMS design and implementation must be contextualized in relation to knowledge needs and that these will differ for various organizational settings. The findings present valuable insights and further understanding of the way in which KMS design efforts should be focused.
    Date
    11. 6.2012 14:22:34
  7. Soulier, L.; Jabeur, L.B.; Tamine, L.; Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model (2013) 0.00
    0.0016144046 = product of:
      0.0121080335 = sum of:
        0.004725794 = weight(_text_:in in 664) [ClassicSimilarity], result of:
          0.004725794 = score(doc=664,freq=4.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.10626988 = fieldWeight in 664, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=664)
        0.00738224 = product of:
          0.02214672 = sum of:
            0.02214672 = weight(_text_:22 in 664) [ClassicSimilarity], result of:
              0.02214672 = score(doc=664,freq=2.0), product of:
                0.114482574 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.032692216 = queryNorm
                0.19345059 = fieldWeight in 664, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=664)
          0.33333334 = coord(1/3)
      0.13333334 = coord(2/15)
    
    Abstract
    A new challenge, accessing multiple relevant entities, arises from the availability of linked heterogeneous data. In this article, we address more specifically the problem of accessing relevant entities, such as publications and authors within a bibliographic network, given an information need. We propose a novel algorithm, called BibRank, that estimates a joint relevance of documents and authors within a bibliographic network. This model ranks each type of entity using a score propagation algorithm with respect to the query topic and the structure of the underlying bi-type information entity network. Evidence sources, namely content-based and network-based scores, are both used to estimate the topical similarity between connected entities. For this purpose, authorship relationships are analyzed through a language model-based score on the one hand and on the other hand, non-topically related entities of the same type are detected through marginal citations. The article reports the results of experiments using the BibRank algorithm for an information retrieval task. The CiteSeerX bibliographic data set forms the basis for the topical query automatic generation and evaluation. We show that a statistically significant improvement over closely related ranking models is achieved.
    Date
    22. 3.2013 19:34:49
  8. Dang, E.K.F.; Luk, R.W.P.; Allan, J.: Beyond bag-of-words : bigram-enhanced context-dependent term weights (2014) 0.00
    6.6832814E-4 = product of:
      0.010024922 = sum of:
        0.010024922 = weight(_text_:in in 1283) [ClassicSimilarity], result of:
          0.010024922 = score(doc=1283,freq=18.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.22543246 = fieldWeight in 1283, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1283)
      0.06666667 = coord(1/15)
    
    Abstract
    While term independence is a widely held assumption in most of the established information retrieval approaches, it is clearly not true and various works in the past have investigated a relaxation of the assumption. One approach is to use n-grams in document representation instead of unigrams. However, the majority of early works on n-grams obtained only modest performance improvement. On the other hand, the use of information based on supporting terms or "contexts" of queries has been found to be promising. In particular, recent studies showed that using new context-dependent term weights improved the performance of relevance feedback (RF) retrieval compared with using traditional bag-of-words BM25 term weights. Calculation of the new term weights requires an estimation of the local probability of relevance of each query term occurrence. In previous studies, the estimation of this probability was based on unigrams that occur in the neighborhood of a query term. We explore an integration of the n-gram and context approaches by computing context-dependent term weights based on a mixture of unigrams and bigrams. Extensive experiments are performed using the title queries of the Text Retrieval Conference (TREC)-6, TREC-7, TREC-8, and TREC-2005 collections, for RF with relevance judgment of either the top 10 or top 20 documents of an initial retrieval. We identify some crucial elements needed in the use of bigrams in our methods, such as proper inverse document frequency (IDF) weighting of the bigrams and noise reduction by pruning bigrams with large document frequency values. We show that enhancing context-dependent term weights with bigrams is effective in further improving retrieval performance.
  9. Zhang, W.; Yoshida, T.; Tang, X.: ¬A comparative study of TF*IDF, LSI and multi-words for text classification (2011) 0.00
    6.5482524E-4 = product of:
      0.009822378 = sum of:
        0.009822378 = weight(_text_:in in 1165) [ClassicSimilarity], result of:
          0.009822378 = score(doc=1165,freq=12.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.22087781 = fieldWeight in 1165, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1165)
      0.06666667 = coord(1/15)
    
    Abstract
    One of the main themes in text mining is text representation, which is fundamental and indispensable for text-based intelligent information processing. Generally, text representation includes two tasks: indexing and weighting. This paper has comparatively studied TF*IDF, LSI and multi-word for text representation. We used a Chinese and an English document collection to respectively evaluate the three methods in information retrieval and text categorization. Experimental results have demonstrated that in text categorization, LSI has better performance than other methods in both document collections. Also, LSI has produced the best performance in retrieving English documents. This outcome has shown that LSI has both favorable semantic and statistical quality and differs from the claim that LSI cannot produce discriminative power for indexing.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  10. Karlsson, A.; Hammarfelt, B.; Steinhauer, H.J.; Falkman, G.; Olson, N.; Nelhans, G.; Nolin, J.: Modeling uncertainty in bibliometrics and information retrieval : an information fusion approach (2015) 0.00
    6.301059E-4 = product of:
      0.009451588 = sum of:
        0.009451588 = weight(_text_:in in 1696) [ClassicSimilarity], result of:
          0.009451588 = score(doc=1696,freq=4.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.21253976 = fieldWeight in 1696, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.078125 = fieldNorm(doc=1696)
      0.06666667 = coord(1/15)
    
    Footnote
    Beitrag in einem Special Issue "Combining bibliometrics and information retrieval"
  11. Efron, M.: Linear time series models for term weighting in information retrieval (2010) 0.00
    5.9777097E-4 = product of:
      0.008966564 = sum of:
        0.008966564 = weight(_text_:in in 3688) [ClassicSimilarity], result of:
          0.008966564 = score(doc=3688,freq=10.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.20163295 = fieldWeight in 3688, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=3688)
      0.06666667 = coord(1/15)
    
    Abstract
    Common measures of term importance in information retrieval (IR) rely on counts of term frequency; rare terms receive higher weight in document ranking than common terms receive. However, realistic scenarios yield additional information about terms in a collection. Of interest in this article is the temporal behavior of terms as a collection changes over time. We propose capturing each term's collection frequency at discrete time intervals over the lifespan of a corpus and analyzing the resulting time series. We hypothesize the collection frequency of a weakly discriminative term x at time t is predictable by a linear model of the term's prior observations. On the other hand, a linear time series model for a strong discriminator's collection frequency will yield a poor fit to the data. Operationalizing this hypothesis, we induce three time-based measures of term importance and test these against state-of-the-art term weighting models.
  12. Li, H.; Wu, H.; Li, D.; Lin, S.; Su, Z.; Luo, X.: PSI: A probabilistic semantic interpretable framework for fine-grained image ranking (2018) 0.00
    5.9777097E-4 = product of:
      0.008966564 = sum of:
        0.008966564 = weight(_text_:in in 4577) [ClassicSimilarity], result of:
          0.008966564 = score(doc=4577,freq=10.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.20163295 = fieldWeight in 4577, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=4577)
      0.06666667 = coord(1/15)
    
    Abstract
    Image ranking is one of the key problems in the information science research area. However, most current methods focus on increasing performance and leave unaddressed the semantic gap problem, in which the learned ranking models are hard to understand. Therefore, in this article, we aim to learn an interpretable ranking model to tackle the semantic gap in fine-grained image ranking. We propose combining attribute-based representation and online passive-aggressive (PA) learning-based ranking models to achieve this goal. In addition, considering the highly localized instances in fine-grained image ranking, we introduce a supervised constrained clustering method to gather class-balanced training instances for local PA-based models, and incorporate the learned local models into a unified probabilistic framework. Extensive experiments on the benchmark demonstrate that the proposed framework outperforms state-of-the-art methods in terms of accuracy and speed.
  13. Nunes, S.; Ribeiro, C.; David, G.: Term weighting based on document revision history (2011) 0.00
    5.8941013E-4 = product of:
      0.008841151 = sum of:
        0.008841151 = weight(_text_:in in 4946) [ClassicSimilarity], result of:
          0.008841151 = score(doc=4946,freq=14.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.19881277 = fieldWeight in 4946, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4946)
      0.06666667 = coord(1/15)
    
    Abstract
    In real-world information retrieval systems, the underlying document collection is rarely stable or definitive. This work is focused on the study of signals extracted from the content of documents at different points in time for the purpose of weighting individual terms in a document. The basic idea behind our proposals is that terms that have existed for a longer time in a document should have a greater weight. We propose 4 term weighting functions that use each document's history to estimate a current term score. To evaluate this thesis, we conduct 3 independent experiments using a collection of documents sampled from Wikipedia. In the first experiment, we use data from Wikipedia to judge each set of terms. In a second experiment, we use an external collection of tags from a popular social bookmarking service as a gold standard. In the third experiment, we crowdsource user judgments to collect feedback on term preference. Across all experiments results consistently support our thesis. We show that temporally aware measures, specifically the proposed revision term frequency and revision term frequency span, outperform a term-weighting measure based on raw term frequency alone.
  14. Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.00
    5.8941013E-4 = product of:
      0.008841151 = sum of:
        0.008841151 = weight(_text_:in in 1338) [ClassicSimilarity], result of:
          0.008841151 = score(doc=1338,freq=14.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.19881277 = fieldWeight in 1338, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
      0.06666667 = coord(1/15)
    
    Abstract
    A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations that infer two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  15. Ye, Z.; Huang, J.X.: ¬A learning to rank approach for quality-aware pseudo-relevance feedback (2016) 0.00
    5.8941013E-4 = product of:
      0.008841151 = sum of:
        0.008841151 = weight(_text_:in in 2855) [ClassicSimilarity], result of:
          0.008841151 = score(doc=2855,freq=14.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.19881277 = fieldWeight in 2855, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2855)
      0.06666667 = coord(1/15)
    
    Abstract
    Pseudo relevance feedback (PRF) has been shown to be effective in ad hoc information retrieval. In traditional PRF methods, top-ranked documents are all assumed to be relevant and therefore treated equally in the feedback process. However, the performance gain brought by each document is different, as shown in our preliminary experiments. Thus, it is more reasonable to predict the performance gain brought by each candidate feedback document in the process of PRF. We define the quality level (QL) and then use this information to adjust the weights of feedback terms in these documents. Unlike previous work, we do not make any explicit relevance assumption and we go beyond just selecting "good" documents for PRF. We propose a quality-based PRF framework, in which two quality-based assumptions are introduced. Particularly, two different strategies, relevance-based QL (RelPRF) and improvement-based QL (ImpPRF), are presented to estimate the QL of each feedback document. Based on this, we select a set of heterogeneous document-level features and apply a learning approach to evaluate the QL of each feedback document. Extensive experiments on standard TREC (Text REtrieval Conference) test collections show that our proposed model performs robustly and outperforms strong baselines significantly.
  16. Liu, R.-L.; Huang, Y.-C.: Ranker enhancement for proximity-based ranking of biomedical texts (2011) 0.00
    5.456877E-4 = product of:
      0.008185315 = sum of:
        0.008185315 = weight(_text_:in in 4947) [ClassicSimilarity], result of:
          0.008185315 = score(doc=4947,freq=12.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.18406484 = fieldWeight in 4947, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4947)
      0.06666667 = coord(1/15)
    
    Abstract
    Biomedical decision making often requires relevant evidence from the biomedical literature. Retrieval of the evidence calls for a system that receives a natural language query for a biomedical information need and, among the huge amount of texts retrieved for the query, ranks relevant texts higher for further processing. However, state-of-the-art text rankers have weaknesses in dealing with biomedical queries, which often consist of several correlating concepts and prefer those texts that completely talk about the concepts. In this article, we present a technique, Proximity-Based Ranker Enhancer (PRE), to enhance text rankers by term-proximity information. PRE assesses the term frequency (TF) of each term in the text by integrating three types of term proximity to measure the contextual completeness of query terms appearing in nearby areas in the text being ranked. Therefore, PRE may serve as a preprocessor for (or supplement to) those rankers that consider TF in ranking, without the need to change the algorithms and development processes of the rankers. Empirical evaluation shows that PRE significantly improves various kinds of text rankers, and when compared with several state-of-the-art techniques that enhance rankers by term-proximity information, PRE may more stably and significantly enhance the rankers.
  17. Lee, J.; Min, J.-K.; Oh, A.; Chung, C.-W.: Effective ranking and search techniques for Web resources considering semantic relationships (2014) 0.00
    5.456877E-4 = product of:
      0.008185315 = sum of:
        0.008185315 = weight(_text_:in in 2670) [ClassicSimilarity], result of:
          0.008185315 = score(doc=2670,freq=12.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.18406484 = fieldWeight in 2670, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2670)
      0.06666667 = coord(1/15)
    
    Abstract
    On the Semantic Web, the types of resources and the semantic relationships between resources are defined in an ontology. By using that information, the accuracy of information retrieval can be improved. In this paper, we present effective ranking and search techniques considering the semantic relationships in an ontology. Our technique retrieves top-k resources which are the most relevant to query keywords through the semantic relationships. To do this, we propose a weighting measure for the semantic relationship. Based on this measure, we propose a novel ranking method which considers the number of meaningful semantic relationships between a resource and keywords as well as the coverage and discriminating power of keywords. In order to improve the efficiency of the search, we prune the unnecessary search space using the length and weight thresholds of the semantic relationship path. In addition, we exploit the Threshold Algorithm based on an extended inverted index to answer top-k results efficiently. The experimental results using real data sets demonstrate that our retrieval method using the semantic information generates accurate results efficiently compared to the traditional methods.
    Content
    Cf.: doi: 10.1016/j.ipm.2013.08.007. A short preliminary version of this paper was published in the proceedings of WWW 2009 as a two-page poster paper.
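    The abstract of entry 17 mentions answering top-k queries with the Threshold Algorithm. As a rough illustration only, here is a minimal sketch of the generic Threshold Algorithm over score-sorted lists. It is not the paper's extended-inverted-index variant; the function name, the sum aggregation, the treatment of missing entries as 0, the assumption of non-negative scores, and the sample data are all assumptions made for this example.

```python
import heapq


def threshold_algorithm(sorted_lists, k):
    """Generic Threshold Algorithm sketch (hypothetical helper).

    sorted_lists: one list per keyword, each holding (item, score) pairs
    sorted by score in descending order. Scores are assumed non-negative,
    and an item missing from a list contributes 0 to its total.
    Returns up to k (item, total) pairs, best first.
    """
    lookup = [dict(lst) for lst in sorted_lists]  # random access per list
    seen = set()
    top = []  # min-heap of (total, item), holds at most k entries
    max_depth = max((len(lst) for lst in sorted_lists), default=0)

    for depth in range(max_depth):
        last_scores = []
        for lst in sorted_lists:
            if depth >= len(lst):
                last_scores.append(0.0)  # exhausted list: nothing unseen left
                continue
            item, score = lst[depth]     # sorted access
            last_scores.append(score)
            if item not in seen:
                seen.add(item)
                total = sum(d.get(item, 0.0) for d in lookup)
                heapq.heappush(top, (total, item))
                if len(top) > k:
                    heapq.heappop(top)   # drop the current worst
        # No unseen item can score higher than the sum of the last scores read.
        threshold = sum(last_scores)
        if len(top) == k and top[0][0] >= threshold:
            break

    return [(item, total) for total, item in sorted(top, key=lambda p: (-p[0], p[1]))]


# Made-up per-keyword relevance lists for three resources.
keyword_a = [("r1", 0.9), ("r2", 0.8), ("r3", 0.1)]
keyword_b = [("r2", 0.7), ("r3", 0.6), ("r1", 0.2)]
result = threshold_algorithm([keyword_a, keyword_b], k=2)
print(result)
```

    The early-exit test is what makes the method attractive: scanning stops as soon as the k-th best total seen so far is at least the threshold, so the tails of long lists need not be read.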
  18. Karisani, P.; Rahgozar, M.; Oroumchian, F.: Transforming LSA space dimensions into a rubric for an automatic assessment and feedback system (2016) 0.00
    5.456877E-4 = product of:
      0.008185315 = sum of:
        0.008185315 = weight(_text_:in in 2970) [ClassicSimilarity], result of:
          0.008185315 = score(doc=2970,freq=12.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.18406484 = fieldWeight in 2970, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2970)
      0.06666667 = coord(1/15)
    
    Abstract
    Pseudo-relevance feedback is the basis of a category of automatic query modification techniques. Pseudo-relevance feedback methods assume the initial retrieved set of documents to be relevant. Then they use these documents to extract more relevant terms for the query or just re-weight the user's original query. In this paper, we propose a straightforward, yet effective use of a pseudo-relevance feedback method in detecting more informative query terms and re-weighting them. The query-by-query analysis of our results indicates that our method is capable of identifying the most important keywords even in short queries. Our main idea is that some of the top documents may contain a closer context to the user's information need than the others. Therefore, re-examining the similarity of those top documents and weighting this set based on their context could help in identifying and re-weighting informative query terms. Our experimental results in standard English and Persian test collections show that our method improves retrieval performance, in terms of the MAP criterion, by up to 7% over traditional query term re-weighting methods.
  19. Jacucci, G.; Barral, O.; Daee, P.; Wenzel, M.; Serim, B.; Ruotsalo, T.; Pluchino, P.; Freeman, J.; Gamberini, L.; Kaski, S.; Blankertz, B.: Integrating neurophysiologic relevance feedback in intent modeling for information retrieval (2019) 0.00
    5.456877E-4 = product of:
      0.008185315 = sum of:
        0.008185315 = weight(_text_:in in 5356) [ClassicSimilarity], result of:
          0.008185315 = score(doc=5356,freq=12.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.18406484 = fieldWeight in 5356, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5356)
      0.06666667 = coord(1/15)
    
    Abstract
    The use of implicit relevance feedback from neurophysiology could deliver effortless information retrieval. However, both computing neurophysiologic responses and retrieving documents are characterized by uncertainty because of noisy signals and incomplete or inconsistent representations of the data. We present the first-of-its-kind, fully integrated information retrieval system that makes use of online implicit relevance feedback generated from brain activity as measured through electroencephalography (EEG), and eye movements. The findings of the evaluation experiment (N = 16) show that we are able to compute online neurophysiology-based relevance feedback with performance significantly better than chance in complex data domains and realistic search tasks. We contribute by demonstrating how to integrate in interactive intent modeling this inherently noisy implicit relevance feedback combined with scarce explicit feedback. Although experimental measures of task performance did not allow us to demonstrate how the classification outcomes translated into search task performance, the experiment proved that our approach is able to generate relevance feedback from brain signals and eye movements in a realistic scenario, thus providing promising implications for future work in neuroadaptive information retrieval (IR).
    Footnote
    Contribution to a 'Special issue on neuro-information science'.
  20. Yan, E.; Ding, Y.; Sugimoto, C.R.: P-Rank: an indicator measuring prestige in heterogeneous scholarly networks (2011) 0.00
    5.346625E-4 = product of:
      0.008019937 = sum of:
        0.008019937 = weight(_text_:in in 4349) [ClassicSimilarity], result of:
          0.008019937 = score(doc=4349,freq=8.0), product of:
            0.044469737 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.032692216 = queryNorm
            0.18034597 = fieldWeight in 4349, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=4349)
      0.06666667 = coord(1/15)
    
    Abstract
    Ranking scientific productivity and prestige is often limited to homogeneous networks. These networks are unable to account for the multiple factors that constitute the scholarly communication and reward system. This study proposes a new informetric indicator, P-Rank, for measuring prestige in heterogeneous scholarly networks containing articles, authors, and journals. P-Rank differentiates the weight of each citation based on its citing papers, citing journals, and citing authors. Articles from 16 representative library and information science journals are selected as the dataset. Principal Component Analysis is conducted to examine the relationship between P-Rank and other bibliometric indicators. We also compare the correlation and rank variances between citation counts and P-Rank scores. This work provides a new approach to examining prestige in scholarly communication networks in a more comprehensive and nuanced way.

Types

  • a 49
  • el 1