Search (306 results, page 1 of 16)

  • theme_ss:"Retrievalalgorithmen"
  1. Kang, I.-H.; Kim, G.C.: Integration of multiple evidences based on a query type for web search (2004) 0.05
    0.046878368 = product of:
      0.10938285 = sum of:
        0.014529302 = weight(_text_:of in 2568) [ClassicSimilarity], result of:
          0.014529302 = score(doc=2568,freq=12.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.21160212 = fieldWeight in 2568, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2568)
        0.07258732 = weight(_text_:distribution in 2568) [ClassicSimilarity], result of:
          0.07258732 = score(doc=2568,freq=2.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.30219704 = fieldWeight in 2568, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2568)
        0.022266233 = product of:
          0.044532467 = sum of:
            0.044532467 = weight(_text_:service in 2568) [ClassicSimilarity], result of:
              0.044532467 = score(doc=2568,freq=2.0), product of:
                0.18813887 = queryWeight, product of:
                  4.284727 = idf(docFreq=1655, maxDocs=44218)
                  0.043909185 = queryNorm
                0.23669997 = fieldWeight in 2568, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.284727 = idf(docFreq=1655, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2568)
          0.5 = coord(1/2)
      0.42857143 = coord(3/7)
    
    Abstract
    The massive and heterogeneous Web exacerbates IR problems, and short user queries make them worse. The contents of web pages alone are not enough to find answer pages. PageRank compensates for the insufficiencies of content information, and the two sources are combined to get better results. However, a static combination of multiple evidence sources may lower retrieval performance, so different strategies are needed to meet the needs of a user. User queries can be classified into three categories according to the user's intent: the topic relevance task, the homepage finding task, and the service finding task. In this paper, we present a user query classification method. The difference of distribution, mutual information, the usage rate as anchor text, and POS information are used for the classification. After classifying a user query, we apply different algorithms and information for better results. For the topic relevance task we emphasize the content information; for the homepage finding task, on the other hand, we emphasize the link information and the URL information. We obtained the best performance when our proposed classification method was used with the OKAPI scoring algorithm.
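    The explain tree above follows Lucene's ClassicSimilarity: each matching term contributes (tf × idf × fieldNorm) × (idf × queryNorm), and the document score is the coordination factor times the sum of those contributions. The sketch below simply recomputes the 0.046878368 score of this first record from the numbers shown in the tree; it is a check of the arithmetic, not part of the cited paper.

```python
from math import sqrt, isclose

def term_score(freq, idf, query_norm, field_norm):
    """Lucene ClassicSimilarity contribution of one term:
    (tf * idf * fieldNorm) * (idf * queryNorm)."""
    tf = sqrt(freq)
    field_weight = tf * idf * field_norm
    query_weight = idf * query_norm
    return field_weight * query_weight

QUERY_NORM = 0.043909185
FIELD_NORM = 0.0390625            # fieldNorm for doc 2568

terms = [
    # (freq, idf)
    (12.0, 1.5637573),            # "of"
    (2.0,  5.4703507),            # "distribution"
]
raw = sum(term_score(f, idf, QUERY_NORM, FIELD_NORM) for f, idf in terms)
# the "service" clause carries an extra coord(1/2) inside its sub-product
raw += term_score(2.0, 4.284727, QUERY_NORM, FIELD_NORM) * 0.5

score = raw * 3 / 7               # coord(3/7): 3 of 7 query clauses matched
print(round(score, 9))            # ~0.046878368, matching the explain tree
assert isclose(score, 0.046878368, rel_tol=1e-5)
```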
  2. Stock, W.G.: On relevance distributions (2006) 0.04
    0.04085226 = product of:
      0.1429829 = sum of:
        0.026843186 = weight(_text_:of in 5116) [ClassicSimilarity], result of:
          0.026843186 = score(doc=5116,freq=16.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.39093933 = fieldWeight in 5116, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=5116)
        0.11613971 = weight(_text_:distribution in 5116) [ClassicSimilarity], result of:
          0.11613971 = score(doc=5116,freq=2.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.48351526 = fieldWeight in 5116, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.0625 = fieldNorm(doc=5116)
      0.2857143 = coord(2/7)
    
    Abstract
    There are at least three possible ways that documents are distributed by relevance: informetric (power law), inverse logistic, and dichotomous. The type of distribution has implications for the construction of relevance ranking algorithms for search engines, for automated (blind) relevance feedback, for user behavior when using Web search engines, for combining the outputs of search engines for metasearch, for topic detection and tracking, and for the methodology of evaluation of information retrieval systems.
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.8, S.1126-1129
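    As a rough illustration of the three relevance distributions named in Stock's abstract, the sketch below generates a power-law, an S-shaped (logistic-style), and a dichotomous relevance curve over ranks. The parameter values and the exact form of the "inverse logistic" curve are assumptions for illustration, not the formulas from the article.

```python
import math

def power_law(rank, c=1.0, a=1.0):
    """Informetric (power-law) relevance by rank: many hits early, long tail."""
    return c / rank ** a

def s_shaped(rank, n=100, b=0.1):
    """S-shaped decline; one plausible parameterization, not Stock's exact formula."""
    return 1.0 / (1.0 + math.exp(b * (rank - n / 2)))

def dichotomous(rank, cutoff=10):
    """Relevant up to a cutoff rank, irrelevant afterwards."""
    return 1.0 if rank <= cutoff else 0.0

for r in (1, 5, 10, 50, 100):
    print(r, round(power_law(r), 3), round(s_shaped(r), 3), dichotomous(r))
```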
  3. Ding, Y.: Topic-based PageRank on author cocitation networks (2011) 0.04
    0.039263006 = product of:
      0.13742052 = sum of:
        0.01423575 = weight(_text_:of in 4348) [ClassicSimilarity], result of:
          0.01423575 = score(doc=4348,freq=8.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.20732689 = fieldWeight in 4348, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=4348)
        0.12318477 = weight(_text_:distribution in 4348) [ClassicSimilarity], result of:
          0.12318477 = score(doc=4348,freq=4.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.5128454 = fieldWeight in 4348, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.046875 = fieldNorm(doc=4348)
      0.2857143 = coord(2/7)
    
    Abstract
    Ranking authors is vital for identifying a researcher's impact and standing within a scientific field. There are many different ranking methods (e.g., citations, publications, h-index, PageRank, and weighted PageRank), but most of them are topic-independent. This paper proposes topic-dependent ranks based on the combination of a topic model and a weighted PageRank algorithm. The author-conference-topic (ACT) model was used to extract the topic distribution of individual authors. Two ways of combining the ACT model with the PageRank algorithm are proposed: simple combination (I_PR) or using a topic distribution as a weighted vector for PageRank (PR_t). Information retrieval was chosen as the test field, and representative authors for different topics at different time phases were identified. Principal component analysis (PCA) was applied to analyze the ranking difference between I_PR and PR_t.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.3, S.449-466
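    One way to read the PR_t variant described above is as personalized PageRank in which the teleportation vector is the authors' topic distribution. The sketch below implements that reading on a toy author cocitation matrix; the matrix, the damping factor, and the topic weights (standing in for ACT-model output) are illustrative assumptions, not data from the paper.

```python
import numpy as np

def topic_pagerank(adj, topic_dist, damping=0.85, iters=100, tol=1e-9):
    """Personalized PageRank: teleportation follows the topic distribution
    instead of the uniform vector (one reading of PR_t; illustrative only)."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    out[out == 0] = 1.0                     # avoid division by zero for sinks
    transition = adj / out                  # row-stochastic cocitation matrix
    p = np.full(n, 1.0 / n)
    t = topic_dist / topic_dist.sum()
    for _ in range(iters):
        new = damping * transition.T @ p + (1 - damping) * t
        if np.abs(new - p).sum() < tol:
            break
        p = new
    return p

# toy author cocitation network (4 authors) and a topic weight per author
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
theta = np.array([0.6, 0.1, 0.2, 0.1])      # P(topic | author), e.g. from an ACT-style model
print(topic_pagerank(A, theta))
```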
  4. Yang, L.; Ji, D.; Leong, M.: Document reranking by term distribution and maximal marginal relevance for Chinese information retrieval (2007) 0.04
    0.038718086 = product of:
      0.13551329 = sum of:
        0.01232852 = weight(_text_:of in 907) [ClassicSimilarity], result of:
          0.01232852 = score(doc=907,freq=6.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.17955035 = fieldWeight in 907, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=907)
        0.12318477 = weight(_text_:distribution in 907) [ClassicSimilarity], result of:
          0.12318477 = score(doc=907,freq=4.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.5128454 = fieldWeight in 907, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.046875 = fieldNorm(doc=907)
      0.2857143 = coord(2/7)
    
    Abstract
    In this paper, we propose a document reranking method for Chinese information retrieval. The method is based on a term weighting scheme that integrates the local and global distribution of terms as well as document frequency, document positions, and term length. The weighting scheme allows a larger portion of the retrieved documents to be set randomly as relevance feedback, and removes the concern that very few relevant documents appear among the top retrieved documents. It also helps to improve the performance of maximal marginal relevance (MMR) in document reranking. The method was evaluated by MAP (mean average precision), a recall-oriented measure. Significance tests showed that our method achieves significant improvement over standard baselines and consistently outperforms related methods.
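    The reranking step relies on maximal marginal relevance. The sketch below shows generic MMR reranking, trading query relevance against redundancy with already selected documents; the relevance scores, similarity function, and the lambda value are made-up inputs, and the paper's own term-weighting scheme for obtaining them is not reproduced here.

```python
def mmr_rerank(candidates, relevance, similarity, lam=0.7, k=10):
    """Maximal Marginal Relevance: trade off query relevance against
    redundancy with already selected documents (generic MMR, not the
    paper's exact weighting scheme)."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def mmr(d):
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected

# toy example: three near-duplicate documents and one distinct one
rel = {"d1": 0.9, "d2": 0.88, "d3": 0.5, "d4": 0.86}
sim = lambda a, b: 0.95 if {a, b} <= {"d1", "d2", "d4"} else 0.1
print(mmr_rerank(rel.keys(), rel, sim, lam=0.6, k=3))
# -> ['d1', 'd3', 'd2']: the dissimilar d3 is promoted over the near-duplicates
```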
  5. Cheng, C.-S.; Chung, C.-P.; Shann, J.J.-J.: Fast query evaluation through document identifier assignment for inverted file-based information retrieval systems (2006) 0.03
    0.033813547 = product of:
      0.11834741 = sum of:
        0.01569344 = weight(_text_:of in 979) [ClassicSimilarity], result of:
          0.01569344 = score(doc=979,freq=14.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.22855641 = fieldWeight in 979, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=979)
        0.102653965 = weight(_text_:distribution in 979) [ClassicSimilarity], result of:
          0.102653965 = score(doc=979,freq=4.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.42737114 = fieldWeight in 979, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.0390625 = fieldNorm(doc=979)
      0.2857143 = coord(2/7)
    
    Abstract
    Compressing an inverted file can greatly improve query performance of an information retrieval system (IRS) by reducing disk I/Os. We observe that a good document identifier assignment (DIA) can make the document identifiers in the posting lists more clustered, and result in better compression as well as shorter query processing time. In this paper, we tackle the NP-complete problem of finding an optimal DIA to minimize the average query processing time in an IRS when the probability distribution of query terms is given. We indicate that the greedy nearest neighbor (Greedy-NN) algorithm can provide excellent performance for this problem. However, the Greedy-NN algorithm is inappropriate if used in large-scale IRSs, due to its high complexity O(N² × n), where N denotes the number of documents and n denotes the number of distinct terms. In real-world IRSs, the distribution of query terms is skewed. Based on this fact, we propose a fast O(N × n) heuristic, called partition-based document identifier assignment (PBDIA) algorithm, which can efficiently assign consecutive document identifiers to those documents containing frequently used query terms, and improve compression efficiency of the posting lists for those terms. This can result in reduced query processing time. The experimental results show that the PBDIA algorithm can yield a competitive performance versus the Greedy-NN for the DIA problem, and that this optimization problem has significant advantages for both long queries and parallel information retrieval (IR).
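    A much-simplified sketch of the underlying idea, assuming that documents sharing frequently queried terms should receive consecutive identifiers so that their posting-list gaps stay small and compress well. This is not the published PBDIA algorithm; the bucketing heuristic and the toy data are illustrative only.

```python
from collections import defaultdict

def assign_doc_ids(doc_terms, term_query_freq):
    """Assign consecutive identifiers to documents sharing frequently queried
    terms, so their posting-list gaps stay small and compress well
    (a much-simplified sketch of the PBDIA idea, not the published algorithm)."""
    # group documents by their most frequently queried term
    buckets = defaultdict(list)
    for doc, terms in doc_terms.items():
        key = max(terms, key=lambda t: term_query_freq.get(t, 0), default=None)
        buckets[key].append(doc)
    # number documents bucket by bucket, hottest query terms first
    mapping, next_id = {}, 0
    for term in sorted(buckets, key=lambda t: -term_query_freq.get(t, 0)):
        for doc in buckets[term]:
            mapping[doc] = next_id
            next_id += 1
    return mapping

docs = {"a": {"web", "rank"}, "b": {"rank"}, "c": {"file"}, "d": {"rank", "file"}}
qfreq = {"rank": 120, "web": 30, "file": 5}
print(assign_doc_ids(docs, qfreq))   # documents containing "rank" get adjacent ids
```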
  6. Hoenkamp, E.; Bruza, P.D.; Song, D.; Huang, Q.: ¬An effective approach to verbose queries using a limited dependencies language model (2009) 0.03
    0.033024497 = product of:
      0.11558574 = sum of:
        0.015005797 = weight(_text_:of in 2122) [ClassicSimilarity], result of:
          0.015005797 = score(doc=2122,freq=20.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.21854173 = fieldWeight in 2122, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=2122)
        0.10057994 = weight(_text_:distribution in 2122) [ClassicSimilarity], result of:
          0.10057994 = score(doc=2122,freq=6.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.41873652 = fieldWeight in 2122, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.03125 = fieldNorm(doc=2122)
      0.2857143 = coord(2/7)
    
    Abstract
    Intuitively, any 'bag of words' approach in IR should benefit from taking term dependencies into account. Unfortunately, for years the results of exploiting such dependencies have been mixed or inconclusive. To improve the situation, this paper shows how the natural language properties of the target documents can be used to transform and enrich the term dependencies into more useful statistics. This is done in three steps. The term co-occurrence statistics of queries and documents are each represented by a Markov chain. The paper proves that such a chain is ergodic, and therefore its asymptotic behavior is unique, stationary, and independent of the initial state. Next, the stationary distribution is taken to model queries and documents, rather than their initial distributions. Finally, ranking is achieved following the customary language modeling paradigm. The main contribution of this paper is to argue why the asymptotic behavior of the document model is a better representation than just the document's initial distribution. A secondary contribution is to investigate the practical application of this representation in case the queries become increasingly verbose. In the experiments (based on Lemur's search engine substrate) the default query model was replaced by the stable distribution of the query. Just modeling the query this way already resulted in significant improvements over a standard language model baseline. The results were on a par with or better than more sophisticated algorithms that use fine-tuned parameters or extensive training. Moreover, the more verbose the query, the more effective the approach seems to become.
    Source
    Second International Conference on the Theory of Information Retrieval, ICTIR 2009 Cambridge, UK, September 10-12, 2009 Proceedings. Ed.: L. Azzopardi
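    The core step of the approach above is replacing a query's or document's raw term distribution with the stationary distribution of its term co-occurrence Markov chain. A minimal sketch, assuming a small co-occurrence count matrix; the counts and window definition are placeholders, not the paper's data.

```python
import numpy as np

def stationary_distribution(cooc, iters=1000, tol=1e-12):
    """Stationary distribution of a term co-occurrence Markov chain, found by
    power iteration; this distribution then replaces the raw term frequencies
    as the query/document model (illustrative sketch of the idea)."""
    counts = np.asarray(cooc, dtype=float)
    transition = counts / counts.sum(axis=1, keepdims=True)   # row-stochastic
    pi = np.full(len(counts), 1.0 / len(counts))
    for _ in range(iters):
        nxt = pi @ transition
        if np.abs(nxt - pi).sum() < tol:
            break
        pi = nxt
    return pi

# toy co-occurrence counts for three terms within some text window
C = [[4, 2, 1],
     [2, 6, 2],
     [1, 2, 3]]
print(stationary_distribution(C))    # sums to 1; heaviest mass on the middle term
```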
  7. Dominich, S.; Skrop, A.: PageRank and interaction information retrieval (2005) 0.03
    0.030639194 = product of:
      0.107237175 = sum of:
        0.020132389 = weight(_text_:of in 3268) [ClassicSimilarity], result of:
          0.020132389 = score(doc=3268,freq=16.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.2932045 = fieldWeight in 3268, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=3268)
        0.08710478 = weight(_text_:distribution in 3268) [ClassicSimilarity], result of:
          0.08710478 = score(doc=3268,freq=2.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.36263645 = fieldWeight in 3268, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.046875 = fieldNorm(doc=3268)
      0.2857143 = coord(2/7)
    
    Abstract
    The PageRank method is used by the Google Web search engine to compute the importance of Web pages. Two different views have been developed for the interpretation of the PageRank method and values: (a) stochastic (random surfer): the PageRank values can be conceived as the steady-state distribution of a Markov chain, and (b) algebraic: the PageRank values form the eigenvector corresponding to eigenvalue 1 of the Web link matrix. The Interaction Information Retrieval (I²R) method is a nonclassical information retrieval paradigm, which represents a connectionist approach based on dynamic systems. In the present paper, a different interpretation of PageRank is proposed, namely, a dynamic systems viewpoint, by showing that the PageRank method can be formally interpreted as a particular case of the Interaction Information Retrieval method; thus, the PageRank values may be interpreted as neutral equilibrium points of the Web.
    Source
    Journal of the American Society for Information Science and Technology. 56(2005) no.1, S.63-69
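    The two interpretations mentioned in the abstract, the steady state of the random-surfer Markov chain and the eigenvector of the link matrix for eigenvalue 1, can be checked numerically on a toy graph. The sketch below does exactly that; the four-page graph and the damping factor 0.85 are assumptions for illustration.

```python
import numpy as np

# tiny web graph: links[i, j] = 1 if page j links to page i
links = np.array([[0, 0, 1, 0],
                  [1, 0, 0, 0],
                  [1, 1, 0, 1],
                  [0, 1, 0, 0]], dtype=float)
col = links.sum(axis=0, keepdims=True)
S = links / col                                  # column-stochastic link matrix
d, n = 0.85, links.shape[0]
G = d * S + (1 - d) / n                          # "Google matrix" with damping

# (a) stochastic view: steady state of the random-surfer Markov chain
p = np.full(n, 1.0 / n)
for _ in range(200):
    p = G @ p

# (b) algebraic view: eigenvector of G for eigenvalue 1
vals, vecs = np.linalg.eig(G)
v = np.real(vecs[:, np.argmax(np.real(vals))])
v = v / v.sum()

print(np.allclose(p, v, atol=1e-8))              # True: both views coincide
```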
  8. Khoo, C.S.G.; Wan, K.-W.: ¬A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.03
    0.027913082 = product of:
      0.065130524 = sum of:
        0.01608099 = weight(_text_:of in 2509) [ClassicSimilarity], result of:
          0.01608099 = score(doc=2509,freq=30.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.23420064 = fieldWeight in 2509, product of:
              5.477226 = tf(freq=30.0), with freq of:
                30.0 = termFreq=30.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2509)
        0.03863863 = weight(_text_:congress in 2509) [ClassicSimilarity], result of:
          0.03863863 = score(doc=2509,freq=2.0), product of:
            0.20946044 = queryWeight, product of:
              4.7703104 = idf(docFreq=1018, maxDocs=44218)
              0.043909185 = queryNorm
            0.18446743 = fieldWeight in 2509, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7703104 = idf(docFreq=1018, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2509)
        0.010410905 = product of:
          0.02082181 = sum of:
            0.02082181 = weight(_text_:22 in 2509) [ClassicSimilarity], result of:
              0.02082181 = score(doc=2509,freq=2.0), product of:
                0.15376249 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043909185 = queryNorm
                0.1354154 = fieldWeight in 2509, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=2509)
          0.5 = coord(1/2)
      0.42857143 = coord(3/7)
    
    Abstract
    A relevancy-ranking algorithm for a natural language interface to Boolean online public access catalogs (OPACs) was formulated and compared with that currently used in a knowledge-based search interface called the E-Referencer, being developed by the authors. The algorithm makes use of seven well-known ranking criteria: breadth of match, section weighting, proximity of query words, variant word forms (stemming), document frequency, term frequency and document length. The algorithm converts a natural language query into a series of increasingly broader Boolean search statements. In a small experiment with ten subjects in which the algorithm was simulated by hand, the algorithm obtained good results with a mean overall precision of 0.42 and mean average precision of 0.62, representing a 27 percent improvement in precision and 41 percent improvement in average precision compared to the E-Referencer. The usefulness of each step in the algorithm was analyzed and suggestions are made for improving the algorithm.
    Content
    "Most Web search engines accept natural language queries, perform some kind of fuzzy matching and produce ranked output, displaying first the documents that are most likely to be relevant. On the other hand, most library online public access catalogs (OPACs) an the Web are still Boolean retrieval systems that perform exact matching, and require users to express their search requests precisely in a Boolean search language and to refine their search statements to improve the search results. It is well-documented that users have difficulty searching Boolean OPACs effectively (e.g. Borgman, 1996; Ensor, 1992; Wallace, 1993). One approach to making OPACs easier to use is to develop a natural language search interface that acts as a middleware between the user's Web browser and the OPAC system. The search interface can accept a natural language query from the user and reformulate it as a series of Boolean search statements that are then submitted to the OPAC. The records retrieved by the OPAC are ranked by the search interface before forwarding them to the user's Web browser. The user, then, does not need to interact directly with the Boolean OPAC but with the natural language search interface or search intermediary. The search interface interacts with the OPAC system an the user's behalf. The advantage of this approach is that no modification to the OPAC or library system is required. Furthermore, the search interface can access multiple OPACs, acting as a meta search engine, and integrate search results from various OPACs before sending them to the user. The search interface needs to incorporate a method for converting the user's natural language query into a series of Boolean search statements, and for ranking the OPAC records retrieved. The purpose of this study was to develop a relevancyranking algorithm for a search interface to Boolean OPAC systems. This is part of an on-going effort to develop a knowledge-based search interface to OPACs called the E-Referencer (Khoo et al., 1998, 1999; Poo et al., 2000). E-Referencer v. 2 that has been implemented applies a repertoire of initial search strategies and reformulation strategies to retrieve records from OPACs using the Z39.50 protocol, and also assists users in mapping query keywords to the Library of Congress subject headings."
    Source
    Electronic library. 22(2004) no.2, S.112-120
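    A hedged sketch of the general strategy described above: turning a keyword query into a series of increasingly broader Boolean statements and scoring records by breadth of match, one of the seven criteria listed in the abstract. The broadening order and the toy record are assumptions; this is not the E-Referencer's actual algorithm.

```python
from itertools import combinations

def boolean_search_series(keywords):
    """Turn a keyword list into increasingly broader Boolean statements:
    start with AND of everything, drop one term at a time, end with OR."""
    statements = [" AND ".join(keywords)]
    for size in range(len(keywords) - 1, 1, -1):
        for combo in combinations(keywords, size):
            statements.append(" AND ".join(combo))
    statements.append(" OR ".join(keywords))
    return statements

def breadth_of_match(record_terms, keywords):
    """One ranking criterion: how many distinct query words a record matches."""
    return sum(1 for k in keywords if k.lower() in record_terms)

query = ["digital", "libraries", "evaluation"]
for s in boolean_search_series(query):
    print(s)
print(breadth_of_match({"evaluation", "of", "digital", "services"}, query))  # 2
```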
  9. Urbain, J.; Goharian, N.; Frieder, O.: Probabilistic passage models for semantic search of genomics literature (2008) 0.03
    0.025823431 = product of:
      0.090382 = sum of:
        0.017794685 = weight(_text_:of in 2380) [ClassicSimilarity], result of:
          0.017794685 = score(doc=2380,freq=18.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.25915858 = fieldWeight in 2380, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2380)
        0.07258732 = weight(_text_:distribution in 2380) [ClassicSimilarity], result of:
          0.07258732 = score(doc=2380,freq=2.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.30219704 = fieldWeight in 2380, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2380)
      0.2857143 = coord(2/7)
    
    Abstract
    We explore unsupervised learning techniques for extracting semantic information about biomedical concepts and topics, and introduce a passage retrieval model for using these semantics in context to improve genomics literature search. Our contributions include a new passage retrieval model based on an undirected graphical model (Markov Random Fields), and new methods for modeling passage-concepts, document-topics, and passage-terms as potential functions within the model. Each potential function includes distributional evidence to disambiguate topics, concepts, and terms in context. The joint distribution across potential functions in the graph represents the probability of a passage being relevant to a biologist's information need. Relevance ranking within each potential function simplifies normalization across potential functions and eliminates the need for tuning of passage retrieval model parameters. Our dimensional indexing model facilitates efficient aggregation of topic, concept, and term distributions. The proposed passage-retrieval model improves search results in the presence of varying levels of semantic evidence, outperforming models of query terms, concepts, or document topics alone. Our results exceed the state-of-the-art for automatic document retrieval by 14.46% (0.3554 vs. 0.3105) and passage retrieval by 15.57% (0.1128 vs. 0.0976) as assessed by the TREC 2007 Genomics Track, and automatic document retrieval by 18.56% (0.3424 vs. 0.2888) as assessed by the TREC 2005 Genomics Track. Automatic document retrieval results for TREC 2007 and TREC 2005 are statistically significant at the 95% confidence level (p = .0359 and .0253, respectively). Passage retrieval is significant at the 90% confidence level (p = 0.0893).
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.12, S.2008-2023
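    Schematically, the model above scores a passage by combining potential functions over passage terms, passage concepts, and document topics. The sketch below expresses that combination as a weighted sum of log potentials; the three potential functions, their weights, and their return values are hypothetical placeholders, not the estimators used in the paper.

```python
import math

def passage_score(passage, doc, query, potentials, weights):
    """Rank passages by a weighted sum of log potential functions, one per
    evidence type, in the spirit of an MRF-style model (schematic only; the
    paper's actual potentials and estimation are more involved)."""
    return sum(w * math.log(max(potentials[name](passage, doc, query), 1e-12))
               for name, w in weights.items())

# hypothetical potential functions returning pseudo-probabilities in (0, 1]
potentials = {
    "passage_terms":    lambda p, d, q: 0.4,   # term-level match of passage and query
    "passage_concepts": lambda p, d, q: 0.7,   # concept-level (e.g. gene name) match
    "document_topics":  lambda p, d, q: 0.5,   # topical match of the parent document
}
weights = {"passage_terms": 1.0, "passage_concepts": 0.8, "document_topics": 0.5}
print(passage_score("p1", "d1", "q", potentials, weights))
```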
  10. Jiang, X.; Sun, X.; Yang, Z.; Zhuge, H.; Lapshinova-Koltunski, E.; Yao, J.: Exploiting heterogeneous scientific literature networks to combat ranking bias : evidence from the computational linguistics area (2016) 0.03
    0.025823431 = product of:
      0.090382 = sum of:
        0.017794685 = weight(_text_:of in 3017) [ClassicSimilarity], result of:
          0.017794685 = score(doc=3017,freq=18.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.25915858 = fieldWeight in 3017, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3017)
        0.07258732 = weight(_text_:distribution in 3017) [ClassicSimilarity], result of:
          0.07258732 = score(doc=3017,freq=2.0), product of:
            0.24019864 = queryWeight, product of:
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.043909185 = queryNorm
            0.30219704 = fieldWeight in 3017, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4703507 = idf(docFreq=505, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3017)
      0.2857143 = coord(2/7)
    
    Abstract
    It is important to help researchers find valuable papers from a large literature collection. To this end, many graph-based ranking algorithms have been proposed. However, most of these algorithms suffer from the problem of ranking bias. Ranking bias hurts the usefulness of a ranking algorithm because it returns a ranking list with an undesirable time distribution. This paper is a focused study on how to alleviate ranking bias by leveraging the heterogeneous network structure of the literature collection. We propose a new graph-based ranking algorithm, MutualRank, that integrates mutual reinforcement relationships among networks of papers, researchers, and venues to achieve a more synthetic, accurate, and less-biased ranking than previous methods. MutualRank provides a unified model that involves both intra- and inter-network information for ranking papers, researchers, and venues simultaneously. We use the ACL Anthology Network as the benchmark data set and construct the gold standard from computational linguistics course websites of well-known universities and two well-known textbooks. The experimental results show that MutualRank greatly outperforms the state-of-the-art competitors, including PageRank, HITS, CoRank, Future Rank, and P-Rank, in ranking papers, both in improving ranking effectiveness and in alleviating ranking bias. Rankings of researchers and venues by MutualRank are also quite reasonable.
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.7, S.1679-1702
  11. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the update of partitioned inverted files (2007) 0.02
    0.01780776 = product of:
      0.062327154 = sum of:
        0.017794685 = weight(_text_:of in 819) [ClassicSimilarity], result of:
          0.017794685 = score(doc=819,freq=18.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.25915858 = fieldWeight in 819, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=819)
        0.044532467 = product of:
          0.08906493 = sum of:
            0.08906493 = weight(_text_:service in 819) [ClassicSimilarity], result of:
              0.08906493 = score(doc=819,freq=8.0), product of:
                0.18813887 = queryWeight, product of:
                  4.284727 = idf(docFreq=1655, maxDocs=44218)
                  0.043909185 = queryNorm
                0.47339994 = fieldWeight in 819, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.284727 = idf(docFreq=1655, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=819)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Purpose - An issue that tends to be ignored in information retrieval is the updating of inverted files. This is largely because inverted files were devised to provide fast query service, and much work has been done with the emphasis strongly on queries. This paper aims to study the effect of using parallel methods for the update of inverted files in order to reduce costs, by looking at two types of partitioning for inverted files: document identifier and term identifier. Design/methodology/approach - Raw update service and update with query service are studied with these partitioning schemes using an incremental update strategy. The paper uses standard measures used in parallel computing, such as speedup, to examine the computing results and also the costs of reorganising indexes while servicing transactions. Findings - Empirical results show that for both transaction processing and index reorganisation the document identifier method is superior. However, there is evidence that the term identifier partitioning method could be useful in a concurrent transaction processing context. Practical implications - There is an increasing need to service updates, which is now becoming a requirement of inverted files (for dynamic collections such as the web), demonstrating that a shift from past requirements of inverted file maintenance is needed. Originality/value - The paper is of value to database administrators who manage large-scale and dynamic text collections, and who need to use parallel computing to implement their text retrieval services.
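    The two partitioning schemes compared above can be illustrated on a tiny inverted index: document-identifier partitioning assigns whole documents to nodes, so an update touches one node, while term-identifier partitioning assigns whole posting lists to nodes, so a document update may touch many. The hash-based placement below is an assumed stand-in for the paper's actual allocation strategy.

```python
def partition_by_document(postings, n_nodes):
    """Document-identifier partitioning: each node indexes a disjoint
    subset of documents, so an update to one document touches one node."""
    shards = [dict() for _ in range(n_nodes)]
    for term, docs in postings.items():
        for doc in docs:
            shards[doc % n_nodes].setdefault(term, []).append(doc)
    return shards

def partition_by_term(postings, n_nodes):
    """Term-identifier partitioning: each node holds the complete posting
    list for a subset of terms; a document update may touch every node."""
    shards = [dict() for _ in range(n_nodes)]
    for term, docs in postings.items():
        shards[hash(term) % n_nodes][term] = list(docs)
    return shards

index = {"inverted": [1, 4, 7], "file": [2, 4], "update": [1, 2, 7]}
print(partition_by_document(index, 2))
print(partition_by_term(index, 2))
```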
  12. Smeaton, A.F.; Rijsbergen, C.J. van: ¬The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.02
    0.016643427 = product of:
      0.058251992 = sum of:
        0.016608374 = weight(_text_:of in 2134) [ClassicSimilarity], result of:
          0.016608374 = score(doc=2134,freq=2.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.24188137 = fieldWeight in 2134, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.109375 = fieldNorm(doc=2134)
        0.04164362 = product of:
          0.08328724 = sum of:
            0.08328724 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
              0.08328724 = score(doc=2134,freq=2.0), product of:
                0.15376249 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043909185 = queryNorm
                0.5416616 = fieldWeight in 2134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Date
    30. 3.2001 13:32:22
  13. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.02
    0.016643427 = product of:
      0.058251992 = sum of:
        0.016608374 = weight(_text_:of in 3445) [ClassicSimilarity], result of:
          0.016608374 = score(doc=3445,freq=2.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.24188137 = fieldWeight in 3445, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.109375 = fieldNorm(doc=3445)
        0.04164362 = product of:
          0.08328724 = sum of:
            0.08328724 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.08328724 = score(doc=3445,freq=2.0), product of:
                0.15376249 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043909185 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Date
    25. 8.2005 17:42:22
  14. Quint, B.: Check out the new RANK command on DIALOG (1993) 0.02
    0.016113026 = product of:
      0.05639559 = sum of:
        0.011863125 = weight(_text_:of in 6640) [ClassicSimilarity], result of:
          0.011863125 = score(doc=6640,freq=2.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.17277241 = fieldWeight in 6640, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.078125 = fieldNorm(doc=6640)
        0.044532467 = product of:
          0.08906493 = sum of:
            0.08906493 = weight(_text_:service in 6640) [ClassicSimilarity], result of:
              0.08906493 = score(doc=6640,freq=2.0), product of:
                0.18813887 = queryWeight, product of:
                  4.284727 = idf(docFreq=1655, maxDocs=44218)
                  0.043909185 = queryNorm
                0.47339994 = fieldWeight in 6640, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.284727 = idf(docFreq=1655, maxDocs=44218)
                  0.078125 = fieldNorm(doc=6640)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Describes the RANK command on the DIALOG online information service. RANK conducts a statistical analysis of an existing set of search results for fields specified by the searcher. Details how to use RANK and its applications, and points out drawbacks to its use.
  15. Losada, D.E.; Barreiro, A.: Embedding term similarity and inverse document frequency into a logical model of information retrieval (2003) 0.01
    0.012862217 = product of:
      0.045017757 = sum of:
        0.021221403 = weight(_text_:of in 1422) [ClassicSimilarity], result of:
          0.021221403 = score(doc=1422,freq=10.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.3090647 = fieldWeight in 1422, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=1422)
        0.023796353 = product of:
          0.047592707 = sum of:
            0.047592707 = weight(_text_:22 in 1422) [ClassicSimilarity], result of:
              0.047592707 = score(doc=1422,freq=2.0), product of:
                0.15376249 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043909185 = queryNorm
                0.30952093 = fieldWeight in 1422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1422)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    We propose a novel approach to incorporate term similarity and inverse document frequency into a logical model of information retrieval. The ability of the logic to handle expressive representations along with the use of such classical notions are promising characteristics for IR systems. The approach proposed here has been efficiently implemented and experiments against test collections are presented.
    Date
    22. 3.2003 19:27:23
    Source
    Journal of the American Society for Information Science and technology. 54(2003) no.4, S.285-301
  16. Kelledy, F.; Smeaton, A.F.: Signature files and beyond (1996) 0.01
    0.012431752 = product of:
      0.04351113 = sum of:
        0.025663862 = weight(_text_:of in 6973) [ClassicSimilarity], result of:
          0.025663862 = score(doc=6973,freq=26.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.37376386 = fieldWeight in 6973, product of:
              5.0990195 = tf(freq=26.0), with freq of:
                26.0 = termFreq=26.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=6973)
        0.017847266 = product of:
          0.035694532 = sum of:
            0.035694532 = weight(_text_:22 in 6973) [ClassicSimilarity], result of:
              0.035694532 = score(doc=6973,freq=2.0), product of:
                0.15376249 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043909185 = queryNorm
                0.23214069 = fieldWeight in 6973, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6973)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Proposes that signature files be used as a viable alternative to other indexing strategies, such as inverted files, for searching through large volumes of text. Demonstrates through simulation that search times can be further reduced by enhancing the basic signature file concept using deterministic partitioning algorithms, which eliminate the need for an exhaustive search of the entire signature file. Reports research to evaluate the performance of some deterministic partitioning algorithms in a non-simulated environment using 276 MB of raw newspaper text (taken from the Wall Street Journal) and real user queries. Presents a selection of results to illustrate trends and highlight important aspects of the performance of these methods under realistic rather than simulated operating conditions. As a result of the research reported here, certain aspects of this approach to signature files are shown to be wanting and to require improvement. Suggests lines of future research on the partitioning of signature files.
    Source
    Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
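    For readers unfamiliar with the basic technique behind this entry, the sketch below shows signature files with superimposed coding: each term sets a few bits, a document signature is the OR of its terms' bits, and a query signature that is not fully covered rules a document out. The signature width, bits per term, and hash choice are illustrative assumptions; the deterministic partitioning schemes evaluated in the paper are not shown.

```python
import hashlib

SIG_BITS = 64       # signature width in bits
BITS_PER_TERM = 3   # how many bits each term sets (superimposed coding)

def term_bits(term):
    h = hashlib.md5(term.encode()).digest()
    return {h[i] % SIG_BITS for i in range(BITS_PER_TERM)}

def signature(terms):
    """Superimposed coding: OR together the bit patterns of all terms."""
    sig = 0
    for t in terms:
        for b in term_bits(t):
            sig |= 1 << b
    return sig

def may_contain(doc_sig, query_terms):
    """Signature test: all query bits set means 'candidate'. False positives
    (false drops) are possible, false negatives are not; candidates are
    verified against the full text afterwards."""
    q = signature(query_terms)
    return doc_sig & q == q

doc = ["signature", "files", "text", "retrieval"]
s = signature(doc)
print(may_contain(s, ["signature", "retrieval"]))   # True
print(may_contain(s, ["inverted"]))                 # likely False; false drops possible
```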
  17. Furner, J.: ¬A unifying model of document relatedness for hybrid search engines (2003) 0.01
    0.011844168 = product of:
      0.041454587 = sum of:
        0.023607321 = weight(_text_:of in 2717) [ClassicSimilarity], result of:
          0.023607321 = score(doc=2717,freq=22.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.34381276 = fieldWeight in 2717, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2717)
        0.017847266 = product of:
          0.035694532 = sum of:
            0.035694532 = weight(_text_:22 in 2717) [ClassicSimilarity], result of:
              0.035694532 = score(doc=2717,freq=2.0), product of:
                0.15376249 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043909185 = queryNorm
                0.23214069 = fieldWeight in 2717, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2717)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Previous work on search-engine design has indicated that information-seekers may benefit from being given the opportunity to exploit multiple sources of evidence of document relatedness. Few existing systems, however, give users more than minimal control over the selections that may be made among methods of exploitation. By applying the methods of "document network analysis" (DNA), a unifying, graph-theoretic model of content-, collaboration-, and context-based systems (CCC) may be developed in which the nature of the similarities between types of document relatedness and document ranking is clarified. The usefulness of the approach to system design suggested by this model may be tested by constructing and evaluating a prototype system (UCXtra) that allows searchers to maintain control over the multiple ways in which document collections may be ranked and re-ranked.
    Date
    11. 9.2004 17:32:22
    Source
    Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas
  18. Ravana, S.D.; Rajagopal, P.; Balakrishnan, V.: Ranking retrieval systems using pseudo relevance judgments (2015) 0.01
    0.011630278 = product of:
      0.04070597 = sum of:
        0.019672766 = weight(_text_:of in 2591) [ClassicSimilarity], result of:
          0.019672766 = score(doc=2591,freq=22.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.28651062 = fieldWeight in 2591, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2591)
        0.021033203 = product of:
          0.042066406 = sum of:
            0.042066406 = weight(_text_:22 in 2591) [ClassicSimilarity], result of:
              0.042066406 = score(doc=2591,freq=4.0), product of:
                0.15376249 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043909185 = queryNorm
                0.27358043 = fieldWeight in 2591, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2591)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Purpose - In a system-based approach, replicating the web would require large test collections, and judging the relevance of all documents per topic through human assessors to create relevance judgments is infeasible. Due to the large number of documents that require judgment, there are possible errors introduced by human assessors because of disagreements. The paper aims to discuss these issues. Design/methodology/approach - This study explores exponential variation and document ranking methods that generate a reliable set of relevance judgments (pseudo relevance judgments) to reduce human effort. These methods overcome the problem of the large number of documents requiring judgment while avoiding human disagreement errors during the judgment process. This study utilizes two key factors: the number of occurrences of each document per topic across all the system runs, and document rankings, to generate the alternate methods. Findings - The effectiveness of the proposed method is evaluated using the correlation coefficient of ranked systems using mean average precision scores between the original Text REtrieval Conference (TREC) relevance judgments and pseudo relevance judgments. The results suggest that the proposed document ranking method with a pool depth of 100 could be a reliable alternative to reduce human effort and disagreement errors involved in generating TREC-like relevance judgments. Originality/value - Simple methods proposed in this study show improvement in the correlation coefficient in generating alternate relevance judgments without human assessors while contributing to information retrieval evaluation.
    Date
    20. 1.2015 18:30:22
    18. 9.2018 18:22:56
    Source
    Aslib journal of information management. 67(2015) no.6, S.700-714
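    A simplified occurrence-count variant of the idea in the abstract above, assuming a document counts as pseudo-relevant when enough of the pooled runs rank it within the pool depth, and that system rankings are then compared by a rank correlation coefficient (Kendall's tau here; the abstract does not fix the coefficient). The runs, thresholds, and MAP scores are made up for illustration and are not the paper's exact method or data.

```python
from collections import Counter
from scipy.stats import kendalltau

def pseudo_qrels(runs, pool_depth=100, vote_threshold=0.5):
    """Build pseudo relevance judgments from system runs alone: a document is
    taken as relevant if it occurs in the top pool_depth results of at least
    vote_threshold of all runs (simplified occurrence-count variant)."""
    counts = Counter(doc for run in runs for doc in run[:pool_depth])
    need = vote_threshold * len(runs)
    return {doc for doc, c in counts.items() if c >= need}

# runs = ranked document lists from different systems for one topic
runs = [["d1", "d2", "d3"], ["d2", "d1", "d4"], ["d2", "d3", "d1"]]
print(pseudo_qrels(runs, pool_depth=3))          # d1, d2, d3 pass the vote

# system-ranking agreement: correlate MAP-based rankings of systems under
# official qrels vs. pseudo qrels (scores here are invented for illustration)
map_official = [0.31, 0.28, 0.22, 0.19]
map_pseudo   = [0.31, 0.27, 0.24, 0.18]
print(kendalltau(map_official, map_pseudo).correlation)
```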
  19. Faloutsos, C.: Signature files (1992) 0.01
    0.011495537 = product of:
      0.04023438 = sum of:
        0.016438028 = weight(_text_:of in 3499) [ClassicSimilarity], result of:
          0.016438028 = score(doc=3499,freq=6.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.23940048 = fieldWeight in 3499, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=3499)
        0.023796353 = product of:
          0.047592707 = sum of:
            0.047592707 = weight(_text_:22 in 3499) [ClassicSimilarity], result of:
              0.047592707 = score(doc=3499,freq=2.0), product of:
                0.15376249 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043909185 = queryNorm
                0.30952093 = fieldWeight in 3499, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3499)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Presents a survey and discussion on signature-based text retrieval methods. It describes the main idea behind the signature approach and its advantages over other text retrieval methods; provides a classification of the signature methods that have appeared in the literature; describes the main representatives of each class, together with their relative advantages and drawbacks; and gives a list of applications as well as commercial or university prototypes that use the signature approach.
    Date
    7. 5.1999 15:22:48
  20. Bornmann, L.; Mutz, R.: From P100 to P100' : a new citation-rank approach (2014) 0.01
    0.011495537 = product of:
      0.04023438 = sum of:
        0.016438028 = weight(_text_:of in 1431) [ClassicSimilarity], result of:
          0.016438028 = score(doc=1431,freq=6.0), product of:
            0.06866331 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043909185 = queryNorm
            0.23940048 = fieldWeight in 1431, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=1431)
        0.023796353 = product of:
          0.047592707 = sum of:
            0.047592707 = weight(_text_:22 in 1431) [ClassicSimilarity], result of:
              0.047592707 = score(doc=1431,freq=2.0), product of:
                0.15376249 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043909185 = queryNorm
                0.30952093 = fieldWeight in 1431, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1431)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    Properties of a percentile-based rating scale needed in bibliometrics are formulated. Based on these properties, P100 was recently introduced as a new citation-rank approach (Bornmann, Leydesdorff, & Wang, 2013). In this paper, we conceptualize P100 and propose an improvement which we call P100'. Advantages and disadvantages of citation-rank indicators are noted.
    Date
    22. 8.2014 17:05:18
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.9, S.1939-1943

Languages

  • e 293
  • d 10
  • chi 2

Types

  • a 284
  • m 10
  • el 8
  • s 4
  • r 3
  • p 2
  • x 1