Search (51 results, page 1 of 3)

Bornmann, L.; Mutz, R.: From P100 to P100' : a new citation-rank approach (2014) 0.07

```
0.07207365 = product of:
  0.10811047 = sum of:
    0.019537456 = weight(_text_:of in 1431) [ClassicSimilarity], result of:
      0.019537456 = score(doc=1431,freq=6.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.23940048 = fieldWeight in 1431, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0625 = fieldNorm(doc=1431)
    0.088573016 = sum of:
      0.0320066 = weight(_text_:science in 1431) [ClassicSimilarity], result of:
        0.0320066 = score(doc=1431,freq=2.0), product of:
          0.13747036 = queryWeight, product of:
            2.6341193 = idf(docFreq=8627, maxDocs=44218)
            0.05218836 = queryNorm
          0.23282544 = fieldWeight in 1431, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            2.6341193 = idf(docFreq=8627, maxDocs=44218)
            0.0625 = fieldNorm(doc=1431)
      0.056566417 = weight(_text_:22 in 1431) [ClassicSimilarity], result of:
        0.056566417 = score(doc=1431,freq=2.0), product of:
          0.18275474 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05218836 = queryNorm
          0.30952093 = fieldWeight in 1431, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0625 = fieldNorm(doc=1431)
  0.6666667 = coord(2/3)
```

Abstract: Properties of a percentile-based rating scale needed in bibliometrics are formulated. Based on these properties, P100 was recently introduced as a new citation-rank approach (Bornmann, Leydesdorff, & Wang, 2013). In this paper, we conceptualize P100 and propose an improvement which we call P100'. Advantages and disadvantages of citation-rank indicators are noted.
Date: 22. 8.2014 17:05:18
Source: Journal of the Association for Information Science and Technology. 65(2014) no.9, S.1939-1943
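
The score breakdown for this record can be checked by hand. The following is a minimal sketch, assuming the ClassicSimilarity formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the helper names are illustrative, and queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency in the ClassicSimilarity form.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def tf(freq):
    # Term frequency factor: square root of the raw count.
    return math.sqrt(freq)

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

MAX_DOCS = 44218
QUERY_NORM = 0.05218836  # copied from the listing, not recomputed
FIELD_NORM = 0.0625      # fieldNorm(doc=1431), copied from the listing

of_score = term_score(6.0, 25162, MAX_DOCS, QUERY_NORM, FIELD_NORM)
science_score = term_score(2.0, 8627, MAX_DOCS, QUERY_NORM, FIELD_NORM)
term22_score = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Two of the three top-level clauses matched, hence coord(2/3).
total = (of_score + science_score + term22_score) * (2.0 / 3.0)
print(of_score, science_score, term22_score, total)
```

The printed values should agree with the listed ones to about six decimal places; small differences are expected because the listing reflects single-precision arithmetic.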

Baloh, P.; Desouza, K.C.; Hackney, R.: Contextualizing organizational interventions of knowledge management systems : a design science perspective (2012) 0.06
```
0.062256187 = product of:
  0.09338428 = sum of:
    0.02338211 = weight(_text_:of in 241) [ClassicSimilarity], result of:
      0.02338211 = score(doc=241,freq=22.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.28651062 = fieldWeight in 241, product of:
          4.690416 = tf(freq=22.0), with freq of:
            22.0 = termFreq=22.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=241)
    0.07000217 = sum of:
      0.03464816 = weight(_text_:science in 241) [ClassicSimilarity], result of:
        0.03464816 = score(doc=241,freq=6.0), product of:
          0.13747036 = queryWeight, product of:
            2.6341193 = idf(docFreq=8627, maxDocs=44218)
            0.05218836 = queryNorm
          0.25204095 = fieldWeight in 241, product of:
            2.4494898 = tf(freq=6.0), with freq of:
              6.0 = termFreq=6.0
            2.6341193 = idf(docFreq=8627, maxDocs=44218)
            0.0390625 = fieldNorm(doc=241)
      0.03535401 = weight(_text_:22 in 241) [ClassicSimilarity], result of:
        0.03535401 = score(doc=241,freq=2.0), product of:
          0.18275474 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05218836 = queryNorm
          0.19345059 = fieldWeight in 241, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=241)
  0.6666667 = coord(2/3)
```
Abstract

We address how individuals' (workers) knowledge needs influence the design of knowledge management systems (KMS), enabling knowledge creation and utilization. It is evident that KMS technologies and activities are indiscriminately deployed in most organizations with little regard to the actual context of their adoption. Moreover, it is apparent that the extant literature pertaining to knowledge management projects is frequently deficient in identifying the variety of factors indicative for successful KMS. This presents an obvious business practice and research gap that requires a critical analysis of the necessary intervention that will actually improve how workers can leverage and form organization-wide knowledge. This research involved an extensive review of the literature, a grounded theory methodological approach and rigorous data collection and synthesis through an empirical case analysis (Parsons Brinckerhoff and Samsung). The contribution of this study is the formulation of a model for designing KMS based upon the design science paradigm, which aspires to create artifacts that are interdependent of people and organizations. The essential proposition is that KMS design and implementation must be contextualized in relation to knowledge needs and that these will differ for various organizational settings. The findings present valuable insights and further understanding of the way in which KMS design efforts should be focused.

Date

11. 6.2012 14:22:34

Source

Journal of the American Society for Information Science and Technology. 63(2012) no.5, S.948-966

Soulier, L.; Jabeur, L.B.; Tamine, L.; Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model (2013) 0.05

```
0.05019898 = product of:
  0.075298466 = sum of:
    0.019940332 = weight(_text_:of in 664) [ClassicSimilarity], result of:
      0.019940332 = score(doc=664,freq=16.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.24433708 = fieldWeight in 664, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=664)
    0.055358134 = sum of:
      0.020004123 = weight(_text_:science in 664) [ClassicSimilarity], result of:
        0.020004123 = score(doc=664,freq=2.0), product of:
          0.13747036 = queryWeight, product of:
            2.6341193 = idf(docFreq=8627, maxDocs=44218)
            0.05218836 = queryNorm
          0.1455159 = fieldWeight in 664, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            2.6341193 = idf(docFreq=8627, maxDocs=44218)
            0.0390625 = fieldNorm(doc=664)
      0.03535401 = weight(_text_:22 in 664) [ClassicSimilarity], result of:
        0.03535401 = score(doc=664,freq=2.0), product of:
          0.18275474 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05218836 = queryNorm
          0.19345059 = fieldWeight in 664, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=664)
  0.6666667 = coord(2/3)
```

Abstract: A new challenge, accessing multiple relevant entities, arises from the availability of linked heterogeneous data. In this article, we address more specifically the problem of accessing relevant entities, such as publications and authors within a bibliographic network, given an information need. We propose a novel algorithm, called BibRank, that estimates a joint relevance of documents and authors within a bibliographic network. This model ranks each type of entity using a score propagation algorithm with respect to the query topic and the structure of the underlying bi-type information entity network. Evidence sources, namely content-based and network-based scores, are both used to estimate the topical similarity between connected entities. For this purpose, authorship relationships are analyzed through a language model-based score on the one hand and, on the other hand, non-topically related entities of the same type are detected through marginal citations. The article reports the results of experiments using the BibRank algorithm for an information retrieval task. The CiteSeerX bibliographic data set forms the basis for the topical query automatic generation and evaluation. We show that a statistically significant improvement over closely related ranking models is achieved.
Date: 22. 3.2013 19:34:49
Source: Journal of the American Society for Information Science and Technology. 64(2013) no.3, S.500-515

Ravana, S.D.; Rajagopal, P.; Balakrishnan, V.: Ranking retrieval systems using pseudo relevance judgments (2015) 0.03
```
0.032254115 = product of:
  0.048381172 = sum of:
    0.02338211 = weight(_text_:of in 2591) [ClassicSimilarity], result of:
      0.02338211 = score(doc=2591,freq=22.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.28651062 = fieldWeight in 2591, product of:
          4.690416 = tf(freq=22.0), with freq of:
            22.0 = termFreq=22.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2591)
    0.02499906 = product of:
      0.04999812 = sum of:
        0.04999812 = weight(_text_:22 in 2591) [ClassicSimilarity], result of:
          0.04999812 = score(doc=2591,freq=4.0), product of:
            0.18275474 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05218836 = queryNorm
            0.27358043 = fieldWeight in 2591, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2591)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
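
The nested `coord` factors in this tree can be traced with a few lines of arithmetic. This is a minimal sketch, assuming each boolean group scores as the sum of its matched clauses times matched/total clauses; `boolean_score` is an illustrative helper, and the leaf scores and clause counts are copied from the listing.

```python
def boolean_score(matched_scores, clause_count):
    # Sum of the matched clause scores, scaled by the coordination
    # factor: (number of matched clauses) / (total clauses in the group).
    coord = len(matched_scores) / clause_count
    return sum(matched_scores) * coord

# Leaf scores copied from the explain tree for doc 2591.
of_score = 0.02338211
term22_score = 0.04999812

# Inner group: one of two clauses matched, hence coord(1/2).
inner = boolean_score([term22_score], 2)

# Outer group: two of three clauses matched, hence coord(2/3).
total = boolean_score([of_score, inner], 3)
print(inner, total)
```

Under the same assumption, a group in which every clause matched has a coord of 1, which would explain why some trees in this listing show a plain "sum of" with no inner coord line.
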
Abstract

Purpose: In a system-based approach, replicating the web would require large test collections, and judging the relevancy of all documents per topic in creating relevance judgment through human assessors is infeasible. Due to the large amount of documents that requires judgment, there are possible errors introduced by human assessors because of disagreements. The paper aims to discuss these issues. Design/methodology/approach: This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human efforts. These methods overcome problems with large amounts of documents for judgment while avoiding human disagreement errors during the judgment process. This study utilizes two key factors: number of occurrences of each document per topic from all the system runs; and document rankings to generate the alternate methods. Findings: The effectiveness of the proposed method is evaluated using the correlation coefficient of ranked systems using mean average precision scores between the original Text REtrieval Conference (TREC) relevance judgments and pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative to reduce human effort and disagreement errors involved in generating TREC-like relevance judgments. Originality/value: Simple methods proposed in this study show improvement in the correlation coefficient in generating alternate relevance judgment without human assessors while contributing to information retrieval evaluation.

Date

20. 1.2015 18:30:22
18. 9.2018 18:22:56

Source

Aslib journal of information management. 67(2015) no.6, S.700-714
Van der Veer Martens, B.; Fleet, C. van: Opening the black box of "relevance work" : a domain analysis (2012) 0.03
```
0.030021733 = product of:
  0.045032598 = sum of:
    0.028058534 = weight(_text_:of in 247) [ClassicSimilarity], result of:
      0.028058534 = score(doc=247,freq=22.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.34381276 = fieldWeight in 247, product of:
          4.690416 = tf(freq=22.0), with freq of:
            22.0 = termFreq=22.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=247)
    0.016974064 = product of:
      0.033948127 = sum of:
        0.033948127 = weight(_text_:science in 247) [ClassicSimilarity], result of:
          0.033948127 = score(doc=247,freq=4.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.24694869 = fieldWeight in 247, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.046875 = fieldNorm(doc=247)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

In response to Hjørland's recent call for a reconceptualization of the foundations of relevance, we suggest that the sociocognitive aspects of intermediation by information agencies, such as archives and libraries, are a necessary and unexplored part of the infrastructure of the subject knowledge domains central to his recommended "view of relevance informed by a social paradigm" (2010, p. 217). From a comparative analysis of documents from 39 graduate-level introductory courses in archives, reference, and strategic/competitive intelligence taught in 13 American Library Association-accredited library and information science (LIS) programs, we identify four defining sociocognitive dimensions of "relevance work" in information agencies within Hjørland's proposed framework for relevance: tasks, time, systems, and assessors. This study is intended to supply sociocognitive content from within the relevance work domain to support further domain analytic research, and to emphasize the importance of intermediary relevance work for all subject knowledge domains.

Source

Journal of the American Society for Information Science and Technology. 63(2012) no.5, S.936-947
Dang, E.K.F.; Luk, R.W.P.; Allan, J.: Beyond bag-of-words : bigram-enhanced context-dependent term weights (2014) 0.03
```
0.02920836 = product of:
  0.04381254 = sum of:
    0.033810478 = weight(_text_:of in 1283) [ClassicSimilarity], result of:
      0.033810478 = score(doc=1283,freq=46.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.41429368 = fieldWeight in 1283, product of:
          6.78233 = tf(freq=46.0), with freq of:
            46.0 = termFreq=46.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1283)
    0.010002062 = product of:
      0.020004123 = sum of:
        0.020004123 = weight(_text_:science in 1283) [ClassicSimilarity], result of:
          0.020004123 = score(doc=1283,freq=2.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.1455159 = fieldWeight in 1283, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1283)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

While term independence is a widely held assumption in most of the established information retrieval approaches, it is clearly not true and various works in the past have investigated a relaxation of the assumption. One approach is to use n-grams in document representation instead of unigrams. However, the majority of early works on n-grams obtained only modest performance improvement. On the other hand, the use of information based on supporting terms or "contexts" of queries has been found to be promising. In particular, recent studies showed that using new context-dependent term weights improved the performance of relevance feedback (RF) retrieval compared with using traditional bag-of-words BM25 term weights. Calculation of the new term weights requires an estimation of the local probability of relevance of each query term occurrence. In previous studies, the estimation of this probability was based on unigrams that occur in the neighborhood of a query term. We explore an integration of the n-gram and context approaches by computing context-dependent term weights based on a mixture of unigrams and bigrams. Extensive experiments are performed using the title queries of the Text Retrieval Conference (TREC)-6, TREC-7, TREC-8, and TREC-2005 collections, for RF with relevance judgment of either the top 10 or top 20 documents of an initial retrieval. We identify some crucial elements needed in the use of bigrams in our methods, such as proper inverse document frequency (IDF) weighting of the bigrams and noise reduction by pruning bigrams with large document frequency values. We show that enhancing context-dependent term weights with bigrams is effective in further improving retrieval performance.

Source

Journal of the Association for Information Science and Technology. 65(2014) no.6, S.1134-1148

Efron, M.: Linear time series models for term weighting in information retrieval (2010) 0.03

```
0.025836824 = product of:
  0.038755234 = sum of:
    0.02675276 = weight(_text_:of in 3688) [ClassicSimilarity], result of:
      0.02675276 = score(doc=3688,freq=20.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.32781258 = fieldWeight in 3688, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=3688)
    0.012002475 = product of:
      0.02400495 = sum of:
        0.02400495 = weight(_text_:science in 3688) [ClassicSimilarity], result of:
          0.02400495 = score(doc=3688,freq=2.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.17461908 = fieldWeight in 3688, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.046875 = fieldNorm(doc=3688)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```

Abstract: Common measures of term importance in information retrieval (IR) rely on counts of term frequency; rare terms receive higher weight in document ranking than common terms receive. However, realistic scenarios yield additional information about terms in a collection. Of interest in this article is the temporal behavior of terms as a collection changes over time. We propose capturing each term's collection frequency at discrete time intervals over the lifespan of a corpus and analyzing the resulting time series. We hypothesize the collection frequency of a weakly discriminative term x at time t is predictable by a linear model of the term's prior observations. On the other hand, a linear time series model for a strong discriminator's collection frequency will yield a poor fit to the data. Operationalizing this hypothesis, we induce three time-based measures of term importance and test these against state-of-the-art term weighting models.
Source: Journal of the American Society for Information Science and Technology. 61(2010) no.7, S.1299-1312

Costa Carvalho, A. da; Rossi, C.; Moura, E.S. de; Silva, A.S. da; Fernandes, D.: LePrEF: Learn to precompute evidence fusion for efficient query evaluation (2012) 0.02
```
0.024253761 = product of:
  0.03638064 = sum of:
    0.02637858 = weight(_text_:of in 278) [ClassicSimilarity], result of:
      0.02637858 = score(doc=278,freq=28.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.32322758 = fieldWeight in 278, product of:
          5.2915025 = tf(freq=28.0), with freq of:
            28.0 = termFreq=28.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=278)
    0.010002062 = product of:
      0.020004123 = sum of:
        0.020004123 = weight(_text_:science in 278) [ClassicSimilarity], result of:
          0.020004123 = score(doc=278,freq=2.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.1455159 = fieldWeight in 278, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0390625 = fieldNorm(doc=278)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

State-of-the-art search engine ranking methods combine several distinct sources of relevance evidence to produce a high-quality ranking of results for each query. The fusion of information is currently done at query-processing time, which has a direct effect on the response time of search systems. Previous research also shows that an alternative to improve search efficiency in textual databases is to precompute term impacts at indexing time. In this article, we propose a novel alternative to precompute term impacts, providing a generic framework for combining any distinct set of sources of evidence by using a machine-learning technique. This method retains the advantages of producing high-quality results, but avoids the costs of combining evidence at query-processing time. Our method, called Learn to Precompute Evidence Fusion (LePrEF), uses genetic programming to compute a unified precomputed impact value for each term found in each document prior to query processing, at indexing time. Compared with previous research on precomputing term impacts, our method offers the advantage of providing a generic framework to precompute impact using any set of relevance evidence at any text collection, whereas previous research articles do not. The precomputed impact values are indexed and used later for computing document ranking at query-processing time. By doing so, our method effectively reduces the query processing to simple additions of such impacts. We show that this approach, while leading to results comparable to state-of-the-art ranking methods, also can lead to a significant decrease in computational costs during query processing.
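
The idea that query processing reduces to "simple additions" of precomputed impacts can be pictured with a toy index. This is only a sketch under stated assumptions: the index layout, the impact values, and the `rank` helper are invented for illustration, and the genetic-programming step that LePrEF uses to learn the impacts is not modelled.

```python
from collections import defaultdict

# Hypothetical index built at indexing time:
# term -> {doc_id: precomputed impact}. All values are invented.
IMPACT_INDEX = {
    "ranking": {"d1": 0.8, "d2": 0.3},
    "fusion": {"d1": 0.4, "d3": 0.9},
}

def rank(query_terms, index):
    # At query time, a document's score is just the sum of the
    # precomputed impacts of the query terms it contains.
    scores = defaultdict(float)
    for term in query_terms:
        for doc_id, impact in index.get(term, {}).items():
            scores[doc_id] += impact
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

results = rank(["ranking", "fusion"], IMPACT_INDEX)
print(results)
```

No evidence is combined inside the query loop; whatever fusion produced the impact values has already happened before the query arrives.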

Source

Journal of the American Society for Information Science and Technology. 63(2012) no.7, S.1383-1397
Ozdemiray, A.M.; Altingovde, I.S.: Explicit search result diversification using score and rank aggregation methods (2015) 0.02
```
0.024253761 = product of:
  0.03638064 = sum of:
    0.02637858 = weight(_text_:of in 1856) [ClassicSimilarity], result of:
      0.02637858 = score(doc=1856,freq=28.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.32322758 = fieldWeight in 1856, product of:
          5.2915025 = tf(freq=28.0), with freq of:
            28.0 = termFreq=28.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1856)
    0.010002062 = product of:
      0.020004123 = sum of:
        0.020004123 = weight(_text_:science in 1856) [ClassicSimilarity], result of:
          0.020004123 = score(doc=1856,freq=2.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.1455159 = fieldWeight in 1856, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1856)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Search result diversification is one of the key techniques to cope with the ambiguous and underspecified information needs of web users. In the last few years, strategies that are based on the explicit knowledge of query aspects emerged as highly effective ways of diversifying search results. Our contributions in this article are two-fold. First, we extensively evaluate the performance of a state-of-the-art explicit diversification strategy and pin-point its potential weaknesses. We propose basic yet novel optimizations to remedy these weaknesses and boost the performance of this algorithm. As a second contribution, inspired by the success of the current diversification strategies that exploit the relevance of the candidate documents to individual query aspects, we cast the diversification problem into the problem of ranking aggregation. To this end, we propose to materialize the re-rankings of the candidate documents for each query aspect and then merge these rankings by adapting the score(-based) and rank(-based) aggregation methods. Our extensive experimental evaluations show that certain ranking aggregation methods are superior to existing explicit diversification strategies in terms of diversification effectiveness. Furthermore, these ranking aggregation methods have lower computational complexity than the state-of-the-art diversification strategies.
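
Merging per-aspect rankings can be illustrated with a Borda count, a standard rank-based aggregation method. This is a sketch only: the abstract does not say which aggregation methods the authors adapt, and the aspect rankings and the `borda_merge` helper below are invented for illustration.

```python
def borda_merge(rankings):
    # Each ranking awards n points to its top document, n-1 to the
    # next, and so on; documents are then ordered by total points.
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0) + (n - position)
    # Highest total first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical re-rankings of the same candidates, one per query aspect.
aspect_rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d3"],
    ["d3", "d1", "d2"],
]

merged = borda_merge(aspect_rankings)
print(merged)
```

A document that is ranked reasonably well for several aspects can end up above one that is first for a single aspect only, which is the kind of behaviour a diversified result list aims for.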

Source

Journal of the Association for Information Science and Technology. 66(2015) no.6, S.1212-1228
Koumenides, C.L.; Shadbolt, N.R.: Ranking methods for entity-oriented semantic web search (2014) 0.02
```
0.023927417 = product of:
  0.035891123 = sum of:
    0.01891706 = weight(_text_:of in 1280) [ClassicSimilarity], result of:
      0.01891706 = score(doc=1280,freq=10.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.23179851 = fieldWeight in 1280, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=1280)
    0.016974064 = product of:
      0.033948127 = sum of:
        0.033948127 = weight(_text_:science in 1280) [ClassicSimilarity], result of:
          0.033948127 = score(doc=1280,freq=4.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.24694869 = fieldWeight in 1280, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.046875 = fieldNorm(doc=1280)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

This article provides a technical review of semantic search methods used to support text-based search over formal Semantic Web knowledge bases. Our focus is on ranking methods and auxiliary processes explored by existing semantic search systems, outlined within broad areas of classification. We present reflective examples from the literature in some detail, which should appeal to readers interested in a deeper perspective on the various methods and systems implemented in the outlined literature. The presentation covers graph exploration and propagation methods, adaptations of classic probabilistic retrieval models, and query-independent link analysis via flexible extensions to the PageRank algorithm. Future research directions are discussed, including development of more cohesive retrieval models to unlock further potentials and uses, data indexing schemes, integration with user interfaces, and building community consensus for more systematic evaluation and gradual development.

Series

Advances in information science

Source

Journal of the Association for Information Science and Technology. 65(2014) no.6, S.1091-1106
Lee, J.-T.; Seo, J.; Jeon, J.; Rim, H.-C.: Sentence-based relevance flow analysis for high accuracy retrieval (2011) 0.02
```
0.023614064 = product of:
  0.035421096 = sum of:
    0.025419034 = weight(_text_:of in 4746) [ClassicSimilarity], result of:
      0.025419034 = score(doc=4746,freq=26.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.31146988 = fieldWeight in 4746, product of:
          5.0990195 = tf(freq=26.0), with freq of:
            26.0 = termFreq=26.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4746)
    0.010002062 = product of:
      0.020004123 = sum of:
        0.020004123 = weight(_text_:science in 4746) [ClassicSimilarity], result of:
          0.020004123 = score(doc=4746,freq=2.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.1455159 = fieldWeight in 4746, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4746)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Traditional ranking models for information retrieval lack the ability to make a clear distinction between relevant and nonrelevant documents at top ranks if both have similar bag-of-words representations with regard to a user query. We aim to go beyond the bag-of-words approach to document ranking in a new perspective, by representing each document as a sequence of sentences. We begin with an assumption that relevant documents are distinguishable from nonrelevant ones by sequential patterns of relevance degrees of sentences to a query. We introduce the notion of relevance flow, which refers to a stream of sentence-query relevance within a document. We then present a framework to learn a function for ranking documents effectively based on various features extracted from their relevance flows and leverage the output to enhance existing retrieval models. We validate the effectiveness of our approach by performing a number of retrieval experiments on three standard test collections, each comprising a different type of document: news articles, medical references, and blog posts. Experimental results demonstrate that the proposed approach can improve the retrieval performance at the top ranks significantly as compared with the state-of-the-art retrieval models regardless of document type.

Source

Journal of the American Society for Information Science and Technology. 62(2011) no.9, S.1666-1675
Liu, R.-L.; Huang, Y.-C.: Ranker enhancement for proximity-based ranking of biomedical texts (2011) 0.02
```
0.022949254 = product of:
  0.03442388 = sum of:
    0.02442182 = weight(_text_:of in 4947) [ClassicSimilarity], result of:
      0.02442182 = score(doc=4947,freq=24.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.2992506 = fieldWeight in 4947, product of:
          4.8989797 = tf(freq=24.0), with freq of:
            24.0 = termFreq=24.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4947)
    0.010002062 = product of:
      0.020004123 = sum of:
        0.020004123 = weight(_text_:science in 4947) [ClassicSimilarity], result of:
          0.020004123 = score(doc=4947,freq=2.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.1455159 = fieldWeight in 4947, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4947)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
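The explanation tree above can be checked by hand. The following Python sketch is not part of the catalogue output. It assumes the classic Lucene TF-IDF formulas that reproduce the printed numbers (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), and the helper names `idf`, `tf`, and `term_score` are made up for illustration.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency (assumed formula: 1 + ln(maxDocs / (docFreq + 1)))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """Term-frequency factor (assumed formula: square root of the raw frequency)."""
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """Score of one term = queryWeight * fieldWeight, as in the tree above."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm            # idf * queryNorm
    field_weight = tf(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Values copied from the explanation for doc 4947.
MAX_DOCS = 44218
QUERY_NORM = 0.05218836
FIELD_NORM = 0.0390625

of_score = term_score(24.0, 25162, MAX_DOCS, QUERY_NORM, FIELD_NORM)
science_score = term_score(2.0, 8627, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The nested clause matched 1 of 2 sub-clauses, hence coord(1/2);
# the outer query matched 2 of 3 clauses, hence coord(2/3).
nested = science_score * (1 / 2)
total = (of_score + nested) * (2 / 3)

print(f"of:      {of_score:.9f}")
print(f"science: {science_score:.9f}")
print(f"total:   {total:.9f}")
```

Up to floating-point rounding (the listing was presumably computed in single precision), the three printed values should agree with 0.02442182, 0.020004123, and 0.022949254 in the tree. The same arithmetic applies to the other records in this listing; only `freq` and `fieldNorm` change.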
Abstract

Biomedical decision making often requires relevant evidence from the biomedical literature. Retrieval of the evidence calls for a system that receives a natural language query for a biomedical information need and, among the large number of texts retrieved for the query, ranks relevant texts higher for further processing. However, state-of-the-art text rankers have weaknesses in dealing with biomedical queries, which often consist of several correlating concepts and favor texts that cover those concepts completely. In this article, we present a technique, Proximity-Based Ranker Enhancer (PRE), to enhance text rankers by term-proximity information. PRE assesses the term frequency (TF) of each term in the text by integrating three types of term proximity to measure the contextual completeness of query terms appearing in nearby areas in the text being ranked. Therefore, PRE may serve as a preprocessor for (or supplement to) those rankers that consider TF in ranking, without the need to change the algorithms and development processes of the rankers. Empirical evaluation shows that PRE significantly improves various kinds of text rankers, and when compared with several state-of-the-art techniques that enhance rankers by term-proximity information, PRE may more stably and significantly enhance the rankers.

Source

Journal of the American Society for Information Science and Technology. 62(2011) no.12, S.2479-2495

Fu, X.: Towards a model of implicit feedback for Web search (2010) 0.02

0.02292363 = product of:
  0.034385443 = sum of:
    0.022382967 = weight(_text_:of in 3310) [ClassicSimilarity], result of:
      0.022382967 = score(doc=3310,freq=14.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.2742677 = fieldWeight in 3310, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=3310)
    0.012002475 = product of:
      0.02400495 = sum of:
        0.02400495 = weight(_text_:science in 3310) [ClassicSimilarity], result of:
          0.02400495 = score(doc=3310,freq=2.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.17461908 = fieldWeight in 3310, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.046875 = fieldNorm(doc=3310)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: This research investigated several important issues in using implicit feedback techniques to assist searchers with difficulties in formulating effective search strategies. It focused on examining the relationship between types of behavioral evidence that can be captured from Web searches and searchers' interests. A carefully crafted observation study was conducted to capture, examine, and elucidate the analytical processes and work practices of human analysts when they simulated the role of an implicit feedback system by trying to infer searchers' interests from behavioral traces. Findings provided rare insight into the complexities and nuances in using behavioral evidence for implicit feedback and led to the proposal of an implicit feedback model for Web search that bridged previous studies on behavioral evidence and implicit feedback measures. A new level of analysis termed an analytical lens emerged from the data and provides a road map for future research on this topic.
Source: Journal of the American Society for Information Science and Technology. 61(2010) no.1, S.30-49

Liu, X.; Turtle, H.: Real-time user interest modeling for real-time ranking (2013) 0.02
```
0.02292363 = product of:
  0.034385443 = sum of:
    0.022382967 = weight(_text_:of in 1035) [ClassicSimilarity], result of:
      0.022382967 = score(doc=1035,freq=14.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.2742677 = fieldWeight in 1035, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=1035)
    0.012002475 = product of:
      0.02400495 = sum of:
        0.02400495 = weight(_text_:science in 1035) [ClassicSimilarity], result of:
          0.02400495 = score(doc=1035,freq=2.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.17461908 = fieldWeight in 1035, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.046875 = fieldNorm(doc=1035)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

User interest as a very dynamic information need is often ignored in most existing information retrieval systems. In this research, we present the results of experiments designed to evaluate the performance of a real-time interest model (RIM) that attempts to identify the dynamic and changing query level interests regarding social media outputs. Unlike most existing ranking methods, our ranking approach targets calculation of the probability that user interest in the content of the document is subject to very dynamic user interest change. We describe 2 formulations of the model (real-time interest vector space and real-time interest language model) stemming from classical relevance ranking methods and develop a novel methodology for evaluating the performance of RIM using Amazon Mechanical Turk to collect (interest-based) relevance judgments on a daily basis. Our results show that the model usually, although not always, performs better than baseline results obtained from commercial web search engines. We identify factors that affect RIM performance and outline plans for future research.

Source

Journal of the American Society for Information Science and Technology. 64(2013) no.8, S.1557-1576
Jacucci, G.; Barral, O.; Daee, P.; Wenzel, M.; Serim, B.; Ruotsalo, T.; Pluchino, P.; Freeman, J.; Gamberini, L.; Kaski, S.; Blankertz, B.: Integrating neurophysiologic relevance feedback in intent modeling for information retrieval (2019) 0.02
```
0.022723591 = product of:
  0.034085386 = sum of:
    0.019940332 = weight(_text_:of in 5356) [ClassicSimilarity], result of:
      0.019940332 = score(doc=5356,freq=16.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.24433708 = fieldWeight in 5356, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5356)
    0.014145052 = product of:
      0.028290104 = sum of:
        0.028290104 = weight(_text_:science in 5356) [ClassicSimilarity], result of:
          0.028290104 = score(doc=5356,freq=4.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.20579056 = fieldWeight in 5356, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5356)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

The use of implicit relevance feedback from neurophysiology could deliver effortless information retrieval. However, both computing neurophysiologic responses and retrieving documents are characterized by uncertainty because of noisy signals and incomplete or inconsistent representations of the data. We present the first-of-its-kind, fully integrated information retrieval system that makes use of online implicit relevance feedback generated from brain activity as measured through electroencephalography (EEG), and eye movements. The findings of the evaluation experiment (N = 16) show that we are able to compute online neurophysiology-based relevance feedback with performance significantly better than chance in complex data domains and realistic search tasks. We contribute by demonstrating how to integrate in interactive intent modeling this inherently noisy implicit relevance feedback combined with scarce explicit feedback. Although experimental measures of task performance did not allow us to demonstrate how the classification outcomes translated into search task performance, the experiment proved that our approach is able to generate relevance feedback from brain signals and eye movements in a realistic scenario, thus providing promising implications for future work in neuroadaptive information retrieval (IR).

Footnote

Contribution to a 'Special issue on neuro-information science'.

Source

Journal of the Association for Information Science and Technology. 70(2019) no.9, S.917-930
Li, H.; Wu, H.; Li, D.; Lin, S.; Su, Z.; Luo, X.: PSI: A probabilistic semantic interpretable framework for fine-grained image ranking (2018) 0.02
```
0.022595998 = product of:
  0.033893995 = sum of:
    0.016919931 = weight(_text_:of in 4577) [ClassicSimilarity], result of:
      0.016919931 = score(doc=4577,freq=8.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.20732689 = fieldWeight in 4577, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=4577)
    0.016974064 = product of:
      0.033948127 = sum of:
        0.033948127 = weight(_text_:science in 4577) [ClassicSimilarity], result of:
          0.033948127 = score(doc=4577,freq=4.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.24694869 = fieldWeight in 4577, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.046875 = fieldNorm(doc=4577)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Image ranking is one of the key problems in information science research. However, most current methods focus on increasing performance and leave the semantic gap problem intact, that is, the learned ranking models are hard to understand. Therefore, in this article, we aim at learning an interpretable ranking model to tackle the semantic gap in fine-grained image ranking. We propose to combine attribute-based representation and online passive-aggressive (PA) learning-based ranking models to achieve this goal. In addition, considering the highly localized instances in fine-grained image ranking, we introduce a supervised constrained clustering method to gather class-balanced training instances for local PA-based models, and incorporate the learned local models into a unified probabilistic framework. Extensive experiments on the benchmark demonstrate that the proposed framework outperforms state-of-the-art methods in terms of accuracy and speed.

Source

Journal of the Association for Information Science and Technology. 69(2018) no.12, S.1488-1501
González-Ibáñez, R.; Esparza-Villamán, A.; Vargas-Godoy, J.C.; Shah, C.: A comparison of unimodal and multimodal models for implicit detection of relevance in interactive IR (2019) 0.02
```
0.022256117 = product of:
  0.033384174 = sum of:
    0.02338211 = weight(_text_:of in 5417) [ClassicSimilarity], result of:
      0.02338211 = score(doc=5417,freq=22.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.28651062 = fieldWeight in 5417, product of:
          4.690416 = tf(freq=22.0), with freq of:
            22.0 = termFreq=22.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5417)
    0.010002062 = product of:
      0.020004123 = sum of:
        0.020004123 = weight(_text_:science in 5417) [ClassicSimilarity], result of:
          0.020004123 = score(doc=5417,freq=2.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.1455159 = fieldWeight in 5417, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5417)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Implicit detection of relevance has been approached by many during the last decade. From the use of individual measures to the use of multiple features from different sources (multimodality), studies have shown the feasibility to automatically detect whether a document is relevant. Despite promising results, it is not clear yet to what extent multimodality constitutes an effective approach compared to unimodality. In this article, we hypothesize that it is possible to build unimodal models capable of outperforming multimodal models in the detection of perceived relevance. To test this hypothesis, we conducted three experiments to compare unimodal and multimodal classification models built using a combination of 24 features. Our classification experiments showed that a univariate unimodal model based on the left-click feature supports our hypothesis. On the other hand, our prediction experiment suggests that multimodality slightly improves early classification compared to the best unimodal models. Based on our results, we argue that the feasibility for practical applications of state-of-the-art multimodal approaches may be strongly constrained by technology, cultural, ethical, and legal aspects, in which case unimodality may offer a better alternative today for supporting relevance detection in interactive information retrieval systems.

Source

Journal of the Association for Information Science and Technology. 70(2019) no.11, S.1223-1235

Efron, M.; Winget, M.: Query polyrepresentation for ranking retrieval systems without relevance judgments (2010) 0.02

0.021816716 = product of:
  0.032725073 = sum of:
    0.020722598 = weight(_text_:of in 3469) [ClassicSimilarity], result of:
      0.020722598 = score(doc=3469,freq=12.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.25392252 = fieldWeight in 3469, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=3469)
    0.012002475 = product of:
      0.02400495 = sum of:
        0.02400495 = weight(_text_:science in 3469) [ClassicSimilarity], result of:
          0.02400495 = score(doc=3469,freq=2.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.17461908 = fieldWeight in 3469, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.046875 = fieldNorm(doc=3469)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: Ranking information retrieval (IR) systems with respect to their effectiveness is a crucial operation during IR evaluation, as well as during data fusion. This article offers a novel method of approaching the system-ranking problem, based on the widely studied idea of polyrepresentation. The principle of polyrepresentation suggests that a single information need can be represented by many query articulations, which we call query aspects. By skimming the top k (where k is small) documents retrieved by a single system for multiple query aspects, we collect a set of documents that are likely to be relevant to a given test topic. Labeling these skimmed documents as putatively relevant lets us build pseudorelevance judgments without undue human intervention. We report experiments where using these pseudorelevance judgments delivers a rank ordering of IR systems that correlates highly with rankings based on human relevance judgments.
Source: Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1081-1091
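The skimming procedure summarized in the abstract above can be illustrated with a small Python sketch. It is only a reading of the abstract, not the authors' implementation: the document IDs, query aspects, and system names are invented, and scoring systems by precision against the pseudo-judgments is an assumption made for the example.

```python
from typing import Dict, List, Set


def skim_pseudo_relevant(runs_by_aspect: Dict[str, List[str]], k: int) -> Set[str]:
    """Union of the top-k documents retrieved for each query aspect."""
    pseudo: Set[str] = set()
    for ranked_docs in runs_by_aspect.values():
        pseudo.update(ranked_docs[:k])
    return pseudo


def precision_at(ranked_docs: List[str], judged_relevant: Set[str], n: int) -> float:
    """Fraction of the top-n documents found in the (pseudo-)relevant set."""
    if n <= 0:
        return 0.0
    return sum(1 for doc in ranked_docs[:n] if doc in judged_relevant) / n


# Hypothetical rankings from one reference system, one per query aspect.
aspect_runs = {
    "implicit feedback web search": ["d1", "d2", "d3", "d4"],
    "inferring searcher interest": ["d2", "d5", "d1", "d6"],
    "behavioral evidence search": ["d7", "d2", "d8", "d1"],
}

# Skim the top 2 documents per aspect to form pseudo-relevance judgments.
pseudo_qrels = skim_pseudo_relevant(aspect_runs, k=2)

# Hypothetical systems to be ranked against the pseudo-judgments.
system_runs = {
    "sysA": ["d2", "d1", "d9", "d7"],
    "sysB": ["d9", "d3", "d2", "d8"],
}
scores = {name: precision_at(run, pseudo_qrels, n=4) for name, run in system_runs.items()}
system_ranking = sorted(scores, key=scores.get, reverse=True)

print(sorted(pseudo_qrels))
print(scores)
print(system_ranking)
```

With these toy inputs the pseudo-relevant set is d1, d2, d5, d7, and sysA (precision 0.75) ranks above sysB (0.25). The abstract's claim is that an ordering obtained this way correlates highly with one based on human judgments; the sketch does not test that.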

Cecchini, R.L.; Lorenzetti, C.M.; Maguitman, A.G.; Brignole, N.B.: Multiobjective evolutionary algorithms for context-based search (2010) 0.02
```
0.021816716 = product of:
  0.032725073 = sum of:
    0.020722598 = weight(_text_:of in 3482) [ClassicSimilarity], result of:
      0.020722598 = score(doc=3482,freq=12.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.25392252 = fieldWeight in 3482, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=3482)
    0.012002475 = product of:
      0.02400495 = sum of:
        0.02400495 = weight(_text_:science in 3482) [ClassicSimilarity], result of:
          0.02400495 = score(doc=3482,freq=2.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.17461908 = fieldWeight in 3482, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.046875 = fieldNorm(doc=3482)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
Abstract

Formulating high-quality queries is a key aspect of context-based search. However, determining the effectiveness of a query is challenging because multiple objectives, such as high precision and high recall, are usually involved. In this work, we study techniques that can be applied to evolve contextualized queries when the criteria for determining query quality are based on multiple objectives. We report on the results of three different strategies for evolving queries: (a) single-objective, (b) multiobjective with Pareto-based ranking, and (c) multiobjective with aggregative ranking. After a comprehensive evaluation with a large set of topics, we discuss the limitations of the single-objective approach and observe that both the Pareto-based and aggregative strategies are highly effective for evolving topical queries. In particular, our experiments lead us to conclude that the multiobjective techniques are superior to a baseline as well as to well-known and ad hoc query reformulation techniques.

Source

Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1258-1274

Moura, E.S. de; Fernandes, D.; Ribeiro-Neto, B.; Silva, A.S. da; Gonçalves, M.A.: Using structural information to improve search in Web collections (2010) 0.02

0.021816716 = product of:
  0.032725073 = sum of:
    0.020722598 = weight(_text_:of in 4119) [ClassicSimilarity], result of:
      0.020722598 = score(doc=4119,freq=12.0), product of:
        0.08160993 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.05218836 = queryNorm
        0.25392252 = fieldWeight in 4119, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=4119)
    0.012002475 = product of:
      0.02400495 = sum of:
        0.02400495 = weight(_text_:science in 4119) [ClassicSimilarity], result of:
          0.02400495 = score(doc=4119,freq=2.0), product of:
            0.13747036 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05218836 = queryNorm
            0.17461908 = fieldWeight in 4119, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.046875 = fieldNorm(doc=4119)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: In this work, we investigate the problem of using the block structure of Web pages to improve ranking results. Starting with basic intuitions provided by the concepts of term frequency (TF) and inverse document frequency (IDF), we propose nine block-weight functions to distinguish the impact of term occurrences inside page blocks, instead of inside whole pages. These are then used to compute a modified BM25 ranking function. Using four distinct Web collections, we ran extensive experiments to compare our block-weight ranking formulas with two other baselines: (a) a BM25 ranking applied to full pages, and (b) a BM25 ranking that takes into account best blocks. Our results suggest that our block-weighting ranking method is superior to all baselines across all collections we used, with average gains in precision ranging from 5 to 20%.
Source: Journal of the American Society for Information Science and Technology. 61(2010) no.12, S.2503-2513
