Search (68 results, page 1 of 4)

  • theme_ss:"Retrievalalgorithmen"
  1. Chang, M.; Poon, C.K.: Efficient phrase querying with common phrase index (2008) 0.06
    0.06434436 = product of:
      0.19303308 = sum of:
        0.19303308 = weight(_text_:index in 2061) [ClassicSimilarity], result of:
          0.19303308 = score(doc=2061,freq=18.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.8690314 = fieldWeight in 2061, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=2061)
      0.33333334 = coord(1/3)
    
    Abstract
    In this paper, we propose a common phrase index as an efficient index structure to support phrase queries in a very large text database. Our structure is an extension of previous index structures for phrases and achieves better query efficiency with modest extra storage cost. Further improvement in efficiency can be attained by implementing our index according to our observation of the dynamic nature of the common word set. In experimental evaluation, a common phrase index using 255 common words has an improvement of about 11% and 62% in query time for the overall and large queries (queries of long phrases) respectively over an auxiliary nextword index. Moreover, it has only about 19% extra storage cost. Compared with an inverted index, our improvement is about 72% and 87% for the overall and large queries respectively. We also propose to implement a common phrase index with a dynamic update feature. Our experiments show that more improvement in time efficiency can be achieved.
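
The explain tree above is Lucene's ClassicSimilarity (TF-IDF) breakdown. As a minimal sketch, the listed factors recombine by hand; every constant below is copied from the tree, and the formulas are the ClassicSimilarity defaults (tf = sqrt(freq), idf = ln(maxDocs/(docFreq+1)) + 1):

```python
import math

# Factors copied from the explain output for doc 2061
freq = 18.0               # occurrences of "index" in the scored field
idf = 4.369764            # ln(44218 / (1520 + 1)) + 1
query_norm = 0.05083213
field_norm = 0.046875     # index-time length normalization
coord = 1 / 3             # 1 of 3 query clauses matched

tf = math.sqrt(freq)                        # 4.2426405
field_weight = tf * idf * field_norm        # 0.8690314
query_weight = idf * query_norm             # 0.2221244
print(field_weight * query_weight * coord)  # ~0.06434436, up to rounding
```
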
  2. Jacso, P.: Testing the calculation of a realistic h-index in Google Scholar, Scopus, and Web of Science for F. W. Lancaster (2008) 0.05
    0.047288667 = product of:
      0.141866 = sum of:
        0.141866 = weight(_text_:index in 5586) [ClassicSimilarity], result of:
          0.141866 = score(doc=5586,freq=14.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.63867813 = fieldWeight in 5586, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5586)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper focuses on the practical limitations in the content and software of the databases that are used to calculate the h-index for assessing the publishing productivity and impact of researchers. To celebrate F. W. Lancaster's biological age of seventy-five and "scientific age" of forty-five, the paper discusses the related features of Google Scholar, Scopus, and Web of Science (WoS), and demonstrates in the latter how a much more realistic and fair h-index can be computed for F. W. Lancaster than the one produced automatically. The cited reference index of the 1945-2007 edition of WoS contains, by my estimate, over a hundred million "orphan references" (references with no counterpart master records to be attached to) and "stray references" (references that cite papers which do have master records but cannot be identified by the matching algorithm because of errors of omission and commission in the references of the citing works). Browsing and searching this index can bring up hundreds of additional cited references to the works of an accomplished author that are ignored in the automatic process of calculating the h-index. The partially manual process doubled the h-index value for F. W. Lancaster from 13 to 26, a much more realistic value for an information scientist and professor of his stature.
    Object
    h-index
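
The h-index itself is simple to compute once citation counts are in hand; the paper's point is that the counts feeding it are unreliable. A minimal sketch with hypothetical citation counts:

```python
def h_index(citations):
    """Largest h such that at least h papers have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical counts: manual cleanup that recovers "orphan" and "stray"
# references raises individual counts, and thereby the h-index.
automatic = [40, 30, 25, 20, 15, 14, 13, 12, 11, 10, 9, 5]
print(h_index(automatic))  # 10
```
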
  3. Abu-Salem, H.; Al-Omari, M.; Evens, M.W.: Stemming methodologies over individual query words for an Arabic information retrieval system (1999) 0.04
    0.039966214 = product of:
      0.11989864 = sum of:
        0.11989864 = weight(_text_:index in 3672) [ClassicSimilarity], result of:
          0.11989864 = score(doc=3672,freq=10.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.5397815 = fieldWeight in 3672, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3672)
      0.33333334 = coord(1/3)
    
    Abstract
    Stemming is one of the most important factors that affect the performance of information retrieval systems. This article investigates how to improve the performance of an Arabic information retrieval system by choosing the retrieval method for each individual query word depending on the importance of the WORD, the STEM, or the ROOT of the query terms in the database. This method, called Mixed Stemming, computes term importance using a weighting scheme that uses the Term Frequency (TF) and the Inverse Document Frequency (IDF), called TFxIDF. An extended version of the Arabic IRS system is designed, implemented, and evaluated to reduce the number of irrelevant documents retrieved. The results of the experiment suggest that the proposed method outperforms the Word index method using the TFxIDF weighting scheme. It also outperforms the Stem index method using the Binary weighting scheme but not the Stem index method using the TFxIDF weighting scheme, and likewise outperforms the Root index method using the Binary weighting scheme but not the Root index method using the TFxIDF weighting scheme.
  4. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.04
    0.03714924 = product of:
      0.111447714 = sum of:
        0.111447714 = weight(_text_:index in 2648) [ClassicSimilarity], result of:
          0.111447714 = score(doc=2648,freq=6.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.50173557 = fieldWeight in 2648, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=2648)
      0.33333334 = coord(1/3)
    
    Abstract
    An inverted index stores, for each term that appears in a collection of documents, a list of the document numbers containing that term. Such an index is indispensable when Boolean or informal ranked queries are to be answered. Construction of the index is, however, a non-trivial task. Simple methods using in-memory data structures cannot be used for large collections because they require too much random-access storage, and traditional disc-based methods require large amounts of temporary file space. Describes a new indexing algorithm designed to create large compressed inverted indexes in situ. It makes use of simple compression codes for the positive integers and an in-place external multi-way merge sort. The new technique has been used to invert a 2-gigabyte text collection in under 4 hours, using less than 40 megabytes of temporary disc space and less than 20 megabytes of main memory.
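
For orientation, a toy in-memory inverted file looks like the sketch below; the paper's actual contribution, building the index in situ with compressed postings and an in-place external multi-way merge sort, is beyond this sketch:

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Toy in-memory inverted file: term -> sorted list of doc numbers."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    # at scale, postings are stored as d-gaps so small integers compress well
    return {t: sorted(ids) for t, ids in index.items()}

docs = ["the cat sat", "the dog sat", "a cat and a dog"]
print(build_inverted_index(docs)["cat"])  # [0, 2]
```
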
  5. Bar-Ilan, J.; Levene, M.: ¬The hw-rank : an h-index variant for ranking web pages (2015) 0.04
    0.03574687 = product of:
      0.1072406 = sum of:
        0.1072406 = weight(_text_:index in 1694) [ClassicSimilarity], result of:
          0.1072406 = score(doc=1694,freq=2.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.48279524 = fieldWeight in 1694, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.078125 = fieldNorm(doc=1694)
      0.33333334 = coord(1/3)
    
  6. Rajashekar, T.B.; Croft, W.B.: Combining automatic and manual index representations in probabilistic retrieval (1995) 0.04
    0.035387594 = product of:
      0.10616278 = sum of:
        0.10616278 = weight(_text_:index in 2418) [ClassicSimilarity], result of:
          0.10616278 = score(doc=2418,freq=4.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.4779429 = fieldWeight in 2418, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2418)
      0.33333334 = coord(1/3)
    
    Abstract
    Results from research in information retrieval have suggested that significant improvements in retrieval effectiveness can be obtained by combining results from multiple index representations, query formulations, and search strategies. The inference net model of retrieval, which was designed from this point of view, treats information retrieval as an evidential reasoning process in which multiple sources of evidence about document and query content are combined to estimate relevance probabilities. Uses a system based on this model to study the retrieval effectiveness benefits of combining the types of document and query information that are found in typical commercial databases and information services. The results indicate that substantial real benefits are possible.
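
A minimal sketch of the combination idea, with assumed weights and beliefs (the inference net itself propagates beliefs through a network rather than applying a fixed weighted sum):

```python
# Hypothetical per-representation beliefs that a document matches the query:
# (automatic index belief, manual index belief)
beliefs = {"d1": (0.7, 0.9), "d2": (0.8, 0.2), "d3": (0.5, 0.6)}

def combined(p_automatic, p_manual, w=0.6):
    """Weighted-sum combination of the two sources of evidence."""
    return w * p_automatic + (1 - w) * p_manual

ranking = sorted(beliefs, key=lambda d: combined(*beliefs[d]), reverse=True)
print(ranking)  # ['d1', 'd2', 'd3']
```
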
  7. Faloutsos, C.: Signature files (1992) 0.03
    0.033555232 = product of:
      0.100665696 = sum of:
        0.100665696 = sum of:
          0.045569282 = weight(_text_:classification in 3499) [ClassicSimilarity], result of:
            0.045569282 = score(doc=3499,freq=2.0), product of:
              0.16188543 = queryWeight, product of:
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.05083213 = queryNorm
              0.28149095 = fieldWeight in 3499, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.1847067 = idf(docFreq=4974, maxDocs=44218)
                0.0625 = fieldNorm(doc=3499)
          0.055096414 = weight(_text_:22 in 3499) [ClassicSimilarity], result of:
            0.055096414 = score(doc=3499,freq=2.0), product of:
              0.17800546 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05083213 = queryNorm
              0.30952093 = fieldWeight in 3499, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=3499)
      0.33333334 = coord(1/3)
    
    Abstract
    Presents a survey and discussion of signature-based text retrieval methods. It describes the main idea behind the signature approach and its advantages over other text retrieval methods; provides a classification of the signature methods that have appeared in the literature; describes the main representatives of each class, together with their relative advantages and drawbacks; and gives a list of applications as well as commercial or university prototypes that use the signature approach.
    Date
    7. 5.1999 15:22:48
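
A minimal sketch of the superimposed-coding variant the survey covers, with assumed parameters (64-bit signatures, 3 bits per word); probing a block signature can produce false drops but never misses a true occurrence:

```python
import hashlib

SIG_BITS, BITS_PER_WORD = 64, 3   # assumed signature parameters

def word_signature(word):
    """Set BITS_PER_WORD pseudo-random bits for a word."""
    sig = 0
    for i in range(BITS_PER_WORD):
        h = int(hashlib.md5(f"{word}:{i}".encode()).hexdigest(), 16)
        sig |= 1 << (h % SIG_BITS)
    return sig

def block_signature(words):
    """Superimpose (OR) the word signatures of a text block."""
    sig = 0
    for w in words:
        sig |= word_signature(w)
    return sig

def maybe_contains(block_sig, word):
    """True if all the word's bits are set; false drops are possible."""
    q = word_signature(word)
    return block_sig & q == q

block = block_signature("signature based text retrieval".split())
print(maybe_contains(block, "text"))    # True
print(maybe_contains(block, "banana"))  # almost certainly False
```
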
  8. Maron, M.E.; Kuhns, I.L.: On relevance, probabilistic indexing and information retrieval (1960) 0.03
    0.030957699 = product of:
      0.0928731 = sum of:
        0.0928731 = weight(_text_:index in 1928) [ClassicSimilarity], result of:
          0.0928731 = score(doc=1928,freq=6.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.418113 = fieldWeight in 1928, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1928)
      0.33333334 = coord(1/3)
    
    Abstract
    Reports on a novel technique for literature indexing and searching in a mechanized library system. The notion of relevance is taken as the key concept in the theory of information retrieval, and a comparative concept of relevance is explicated in terms of the theory of probability. The resulting technique, called 'probabilistic indexing', allows a computing machine, given a request for information, to make a statistical inference and derive a number (called the 'relevance number') for each document, which is a measure of the probability that the document will satisfy the given request. The result of a search is an ordered list of those documents which satisfy the request, ranked according to their probable relevance. The paper goes on to show that whereas in a conventional library system the cross-referencing ('see' and 'see also') is based solely on the 'semantic closeness' between index terms, statistical measures of closeness between index terms can be defined and computed. Thus, given an arbitrary request consisting of one (or many) index term(s), a machine can elaborate on it to increase the probability of selecting relevant documents that would not otherwise have been selected. Finally, the paper suggests an interpretation of the whole library problem as one where the request is considered as a clue on the basis of which the library system makes a concatenated statistical inference in order to provide as output an ordered list of those documents which most probably satisfy the information needs of the user.
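
A minimal sketch of the ranking idea, with hypothetical indexer-assigned probabilities; the actual model derives the relevance number via Bayes' rule rather than this bare product:

```python
# Hypothetical indexer-assigned probabilities p(term | doc) and priors
docs = {
    "d1": {"prior": 0.5, "p": {"retrieval": 0.9, "probability": 0.7}},
    "d2": {"prior": 0.5, "p": {"retrieval": 0.4, "indexing": 0.8}},
}

def relevance_number(doc, request, floor=0.01):
    """Product of indexing probabilities for the request terms, times prior."""
    score = doc["prior"]
    for term in request:
        score *= doc["p"].get(term, floor)
    return score

request = ["retrieval", "probability"]
ranking = sorted(docs, key=lambda d: relevance_number(docs[d], request),
                 reverse=True)
print(ranking)  # ['d1', 'd2']
```
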
  9. Käki, M.: fKWIC: frequency-based Keyword-in-Context Index for filtering Web search results (2006) 0.03
    0.030332223 = product of:
      0.09099667 = sum of:
        0.09099667 = weight(_text_:index in 6112) [ClassicSimilarity], result of:
          0.09099667 = score(doc=6112,freq=4.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.40966535 = fieldWeight in 6112, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=6112)
      0.33333334 = coord(1/3)
    
    Abstract
    Enormous Web search engine databases combined with short search queries result in large result sets that are often difficult to access. Result ranking works fairly well, but users need help when it fails. For these situations, we propose a filtering interface that is inspired by keyword-in-context (KWIC) indices. The user interface lists the most frequent keyword contexts (fKWIC). When a context is selected, the corresponding results are displayed in the result list, allowing users to concentrate on the specific context. We compared the keyword context index user interface to the rank order result listing in an experiment with 36 participants. The results show that the proposed user interface was 29% faster in finding relevant results, and the precision of the selected results was 19% higher. In addition, participants showed positive attitudes toward the system.
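
A minimal sketch of grouping result snippets by their most frequent keyword contexts (hypothetical snippets; fKWIC's interface work is not reproduced here):

```python
import re
from collections import Counter

def keyword_contexts(snippets, keyword, width=1):
    """Count the word contexts immediately surrounding each keyword hit."""
    contexts = Counter()
    for snippet in snippets:
        tokens = re.findall(r"\w+", snippet.lower())
        for i, tok in enumerate(tokens):
            if tok == keyword:
                ctx = tuple(tokens[max(0, i - width): i + width + 1])
                contexts[ctx] += 1
    return contexts.most_common()

snippets = [
    "a common phrase index for text",
    "the phrase index improves query time",
    "building a phrase index in situ",
]
print(keyword_contexts(snippets, "index"))
```
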
  10. Ding, Y.; Yan, E.; Frazho, A.; Caverlee, J.: PageRank for ranking authors in co-citation networks (2009) 0.03
    0.030332223 = product of:
      0.09099667 = sum of:
        0.09099667 = weight(_text_:index in 3161) [ClassicSimilarity], result of:
          0.09099667 = score(doc=3161,freq=4.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.40966535 = fieldWeight in 3161, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=3161)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper studies how varied damping factors in the PageRank algorithm influence the ranking of authors and proposes weighted PageRank algorithms. We selected the 108 most highly cited authors in the information retrieval (IR) area from the 1970s to 2008 to form the author co-citation network. We calculated the ranks of these 108 authors based on PageRank with the damping factor ranging from 0.05 to 0.95. In order to test the relationship between different measures, we compared PageRank and weighted PageRank results with the citation ranking, h-index, and centrality measures. We found that in our author co-citation network, citation rank is highly correlated with PageRank with different damping factors and also with different weighted PageRank algorithms; citation rank and PageRank are not significantly correlated with centrality measures; and h-index rank does not significantly correlate with centrality measures but does significantly correlate with other measures. The key factor that has an impact on the PageRank of authors in the author co-citation network is being co-cited with important authors.
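
A minimal sketch of the damping-factor sweep on a toy co-citation graph, using networkx's pagerank and a Spearman rank correlation; the edges are hypothetical (the study uses 108 IR authors, 1970s-2008):

```python
import networkx as nx
from scipy.stats import spearmanr

# Hypothetical author co-citation edges
G = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"),
                ("D", "A"), ("D", "B"), ("E", "D")])
authors = sorted(G.nodes)

base = [nx.pagerank(G, alpha=0.85)[a] for a in authors]
for d in (0.05, 0.50, 0.95):
    pr = [nx.pagerank(G, alpha=d)[a] for a in authors]
    rho, _ = spearmanr(base, pr)
    print(f"alpha={d:.2f}  Spearman rho vs alpha=0.85: {rho:.3f}")
```
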
  11. Walz, J.: Analyse der Übertragbarkeit allgemeiner Rankingfaktoren von Web-Suchmaschinen auf Discovery-Systeme (2018) 0.03
    0.030332223 = product of:
      0.09099667 = sum of:
        0.09099667 = weight(_text_:index in 5744) [ClassicSimilarity], result of:
          0.09099667 = score(doc=5744,freq=4.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.40966535 = fieldWeight in 5744, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=5744)
      0.33333334 = coord(1/3)
    
    Content
    Cf.: https://publiscologne.th-koeln.de/frontdoor/index/index/searchtype/authorsearch/author/Julia+Walz/docId/1169/start/0/rows/10.
  12. Robertson, A.M.; Willett, P.: Use of genetic algorithms in information retrieval (1995) 0.03
    0.028597495 = product of:
      0.08579248 = sum of:
        0.08579248 = weight(_text_:index in 2418) [ClassicSimilarity], result of:
          0.08579248 = score(doc=2418,freq=2.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.3862362 = fieldWeight in 2418, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0625 = fieldNorm(doc=2418)
      0.33333334 = coord(1/3)
    
    Abstract
    Reviews the basic techniques involving genetic algorithms and their application to two problems in information retrieval: the generation of equifrequent groups of index terms, and the identification of optimal query and term weights. The algorithm developed for the generation of equifrequent groupings proved to be effective in operation, achieving results comparable with those obtained using a good deterministic algorithm. The algorithm developed for the identification of optimal query and term weights involves a fitness function that is based on full relevance information.
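
A minimal sketch of a genetic algorithm for the first problem, equifrequent term groups, under assumed term frequencies; fitness is the negated spread of the group frequency totals:

```python
import random

# Hypothetical term frequencies; goal: K groups with near-equal totals
freqs = {"a": 50, "b": 40, "c": 30, "d": 25, "e": 20, "f": 15, "g": 10, "h": 5}
terms, K, POP, GENS = list(freqs), 3, 30, 200

def fitness(chrom):  # chrom[i] = group assigned to terms[i]
    totals = [0] * K
    for term, group in zip(terms, chrom):
        totals[group] += freqs[term]
    return -(max(totals) - min(totals))  # smaller spread = fitter

pop = [[random.randrange(K) for _ in terms] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP // 2]            # elitist selection
    children = []
    while len(parents) + len(children) < POP:
        p1, p2 = random.sample(parents, 2)
        cut = random.randrange(1, len(terms))
        child = p1[:cut] + p2[cut:]      # one-point crossover
        if random.random() < 0.1:        # mutation
            child[random.randrange(len(terms))] = random.randrange(K)
        children.append(child)
    pop = parents + children

print(fitness(max(pop, key=fitness)))    # 0, or close to it
```
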
  13. Gonnet, G.H.; Snider, T.; Baeza-Yates, R.A.: New indices for text : PAT trees and PAT arrays (1992) 0.03
    0.028597495 = product of:
      0.08579248 = sum of:
        0.08579248 = weight(_text_:index in 3500) [ClassicSimilarity], result of:
          0.08579248 = score(doc=3500,freq=2.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.3862362 = fieldWeight in 3500, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0625 = fieldNorm(doc=3500)
      0.33333334 = coord(1/3)
    
    Abstract
    We survey new indices for text, with emphasis on PAT arrays (also called suffix arrays). A PAT array is an index based on a new model of text that does not use the concept of word and does not need to know the structure of the text.
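
A minimal sketch of the idea: a suffix (PAT) array is the list of starting positions of all suffixes in lexicographic order, and lookups are binary searches over it:

```python
import bisect

def suffix_array(text):
    """Positions of all suffixes (semi-infinite strings), sorted."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def occurrences(text, sa, pattern):
    """All positions where pattern occurs, via binary search on the array."""
    prefixes = [text[i:i + len(pattern)] for i in sa]  # sorted, since sa is
    lo = bisect.bisect_left(prefixes, pattern)
    hi = bisect.bisect_right(prefixes, pattern)
    return sorted(sa[lo:hi])

text = "pat arrays index text"
sa = suffix_array(text)
print(occurrences(text, sa, "t"))  # [2, 17, 20]
```
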
  14. Langville, A.N.; Meyer, C.D.: Google's PageRank and beyond : the science of search engine rankings (2006) 0.03
    0.027144281 = product of:
      0.04071642 = sum of:
        0.03217218 = weight(_text_:index in 6) [ClassicSimilarity], result of:
          0.03217218 = score(doc=6,freq=2.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.14483857 = fieldWeight in 6, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0234375 = fieldNorm(doc=6)
        0.008544241 = product of:
          0.017088482 = sum of:
            0.017088482 = weight(_text_:classification in 6) [ClassicSimilarity], result of:
              0.017088482 = score(doc=6,freq=2.0), product of:
                0.16188543 = queryWeight, product of:
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.05083213 = queryNorm
                0.10555911 = fieldWeight in 6, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1847067 = idf(docFreq=4974, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=6)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Content
    Inhalt: Chapter 1. Introduction to Web Search Engines: 1.1 A Short History of Information Retrieval - 1.2 An Overview of Traditional Information Retrieval - 1.3 Web Information Retrieval Chapter 2. Crawling, Indexing, and Query Processing: 2.1 Crawling - 2.2 The Content Index - 2.3 Query Processing Chapter 3. Ranking Webpages by Popularity: 3.1 The Scene in 1998 - 3.2 Two Theses - 3.3 Query-Independence Chapter 4. The Mathematics of Google's PageRank: 4.1 The Original Summation Formula for PageRank - 4.2 Matrix Representation of the Summation Equations - 4.3 Problems with the Iterative Process - 4.4 A Little Markov Chain Theory - 4.5 Early Adjustments to the Basic Model - 4.6 Computation of the PageRank Vector - 4.7 Theorem and Proof for Spectrum of the Google Matrix Chapter 5. Parameters in the PageRank Model: 5.1 The a Factor - 5.2 The Hyperlink Matrix H - 5.3 The Teleportation Matrix E Chapter 6. The Sensitivity of PageRank; 6.1 Sensitivity with respect to alpha - 6.2 Sensitivity with respect to H - 6.3 Sensitivity with respect to vT - 6.4 Other Analyses of Sensitivity - 6.5 Sensitivity Theorems and Proofs Chapter 7. The PageRank Problem as a Linear System: 7.1 Properties of (I - alphaS) - 7.2 Properties of (I - alphaH) - 7.3 Proof of the PageRank Sparse Linear System Chapter 8. Issues in Large-Scale Implementation of PageRank: 8.1 Storage Issues - 8.2 Convergence Criterion - 8.3 Accuracy - 8.4 Dangling Nodes - 8.5 Back Button Modeling
    Chapter 9. Accelerating the Computation of PageRank: 9.1 An Adaptive Power Method - 9.2 Extrapolation - 9.3 Aggregation - 9.4 Other Numerical Methods Chapter 10. Updating the PageRank Vector: 10.1 The Two Updating Problems and their History - 10.2 Restarting the Power Method - 10.3 Approximate Updating Using Approximate Aggregation - 10.4 Exact Aggregation - 10.5 Exact vs. Approximate Aggregation - 10.6 Updating with Iterative Aggregation - 10.7 Determining the Partition - 10.8 Conclusions Chapter 11. The HITS Method for Ranking Webpages: 11.1 The HITS Algorithm - 11.2 HITS Implementation - 11.3 HITS Convergence - 11.4 HITS Example - 11.5 Strengths and Weaknesses of HITS - 11.6 HITS's Relationship to Bibliometrics - 11.7 Query-Independent HITS - 11.8 Accelerating HITS - 11.9 HITS Sensitivity Chapter 12. Other Link Methods for Ranking Webpages: 12.1 SALSA - 12.2 Hybrid Ranking Methods - 12.3 Rankings based on Traffic Flow Chapter 13. The Future of Web Information Retrieval: 13.1 Spam - 13.2 Personalization - 13.3 Clustering - 13.4 Intelligent Agents - 13.5 Trends and Time-Sensitive Search - 13.6 Privacy and Censorship - 13.7 Library Classification Schemes - 13.8 Data Fusion Chapter 14. Resources for Web Information Retrieval: 14.1 Resources for Getting Started - 14.2 Resources for Serious Study Chapter 15. The Mathematics Guide: 15.1 Linear Algebra - 15.2 Perron-Frobenius Theory - 15.3 Markov Chains - 15.4 Perron Complementation - 15.5 Stochastic Complementation - 15.6 Censoring - 15.7 Aggregation - 15.8 Disaggregation
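
A minimal sketch of the basic model the early chapters walk through, on a hypothetical 3-page hyperlink matrix: form the Google matrix from H with a dangling-node fix and teleportation, then run the power method until the PageRank vector converges:

```python
import numpy as np

H = np.array([[0.0, 0.5, 0.5],     # hypothetical hyperlink matrix
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
n, alpha = H.shape[0], 0.85

S = H.copy()
S[H.sum(axis=1) == 0] = 1.0 / n                    # dangling-node fix
G = alpha * S + (1 - alpha) * np.ones((n, n)) / n  # Google matrix

pi = np.full(n, 1.0 / n)                           # start uniform
for _ in range(100):
    nxt = pi @ G                                   # one power-method step
    done = np.abs(nxt - pi).sum() < 1e-10          # convergence criterion
    pi = nxt
    if done:
        break
print(pi, pi.sum())                                # PageRank vector, sums to 1
```
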
  15. Heinz, S.; Zobel, J.: Efficient single-pass index construction for text databases (2003) 0.03
    0.025022808 = product of:
      0.07506842 = sum of:
        0.07506842 = weight(_text_:index in 1678) [ClassicSimilarity], result of:
          0.07506842 = score(doc=1678,freq=2.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.33795667 = fieldWeight in 1678, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1678)
      0.33333334 = coord(1/3)
    
  16. Sparck Jones, K.: ¬A statistical interpretation of term specificity and its application in retrieval (2004) 0.03
    0.025022808 = product of:
      0.07506842 = sum of:
        0.07506842 = weight(_text_:index in 4420) [ClassicSimilarity], result of:
          0.07506842 = score(doc=4420,freq=2.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.33795667 = fieldWeight in 4420, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4420)
      0.33333334 = coord(1/3)
    
    Abstract
    The exhaustivity of document descriptions and the specificity of index terms are usually regarded as independent. It is suggested that specificity should be interpreted statistically, as a function of term use rather than of term meaning. The effects on retrieval of variations in term specificity are examined, experiments with three test collections showing, in particular, that frequently-occurring terms are required for good overall performance. It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms. Results for the test collections show that considerable improvements in performance are obtained with this very simple procedure.
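
A minimal sketch of the weighting this paper motivates, using one common idf formulation and assumed collection statistics:

```python
import math

N = 1000  # assumed number of documents in the collection
document_frequency = {"the": 950, "retrieval": 120, "hypertext": 12}

for term, n in document_frequency.items():
    idf = math.log2(N / n)   # one common formulation of collection-frequency weighting
    print(f"{term:10s} idf={idf:.2f}")
# frequent terms weigh little; rare, more specific terms weigh a lot
```
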
  17. Abdelkareem, M.A.A.: In terms of publication index, what indicator is the best for researchers indexing, Google Scholar, Scopus, Clarivate or others? (2018) 0.03
    0.025022808 = product of:
      0.07506842 = sum of:
        0.07506842 = weight(_text_:index in 4548) [ClassicSimilarity], result of:
          0.07506842 = score(doc=4548,freq=2.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.33795667 = fieldWeight in 4548, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4548)
      0.33333334 = coord(1/3)
    
  18. Savoy, J.: Ranking schemes in hybrid Boolean systems : a new approach (1997) 0.02
    0.02144812 = product of:
      0.06434436 = sum of:
        0.06434436 = weight(_text_:index in 393) [ClassicSimilarity], result of:
          0.06434436 = score(doc=393,freq=2.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.28967714 = fieldWeight in 393, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=393)
      0.33333334 = coord(1/3)
    
    Abstract
    In most commercial online systems, the retrieval system is based on the Boolean model and its inverted file organization. Since the investment in these systems is so great and changing them could be economically unfeasible, this article suggests a new ranking scheme especially adapted for hypertext environments in order to produce more effective retrieval results while preserving the value of the investment made to date in the Boolean model. To select the retrieved documents, the suggested ranking strategy uses multiple sources of document content evidence. The proposed scheme integrates both the information provided by the index and query terms, and the inherent relationships between documents such as bibliographic references or hypertext links. We will demonstrate that our scheme represents an integration of both subject and citation indexing, and results in a significant improvement over classical ranking schemes used in hybrid Boolean systems, while preserving their efficiency. Moreover, since the nearest neighbours and the hypertext links constitute additional sources of evidence, our strategy takes them into account in order to further improve retrieval effectiveness and to provide 'good' starting points for browsing in a hypertext or hypermedia environment.
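
A minimal sketch of the multiple-evidence idea with assumed scores and weights: a term-based score is blended with evidence flowing in from linked documents (the article's actual scheme is more elaborate):

```python
# Hypothetical term-based scores and hypertext links
term_score = {"d1": 0.9, "d2": 0.6, "d3": 0.4}
links = {"d2": ["d1", "d3"], "d3": ["d1"]}   # doc -> documents it links to

def hybrid_score(doc, lam=0.7):
    """Blend term evidence with evidence from linked documents."""
    link_evidence = sum(term_score.get(t, 0.0) for t in links.get(doc, []))
    return lam * term_score[doc] + (1 - lam) * link_evidence

print(sorted(term_score, key=hybrid_score, reverse=True))  # ['d2', 'd1', 'd3']
```
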
  19. White, K.J.; Sutcliffe, R.F.E.: Applying incremental tree induction to retrieval : from manuals and medical texts (2006) 0.02
    0.02144812 = product of:
      0.06434436 = sum of:
        0.06434436 = weight(_text_:index in 5044) [ClassicSimilarity], result of:
          0.06434436 = score(doc=5044,freq=2.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.28967714 = fieldWeight in 5044, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=5044)
      0.33333334 = coord(1/3)
    
    Abstract
    The Decision Tree Forest (DTF) is an architecture for information retrieval that uses a separate decision tree for each document in a collection. Experiments were conducted in which DTFs working with the incremental tree induction (ITI) algorithm of Utgoff, Berkman, and Clouse (1997) were trained and evaluated in the medical and word processing domains using the Cystic Fibrosis and SIFT collections. Performance was compared with that of a conventional inverted index system (IIS) using a BM25-derived probabilistic matching function. Initial results using DTF were poor compared to those obtained with IIS. We then simulated scenarios in which large quantities of training data were available, by using only those parts of the document collection that were well covered by the data sets. Consequently, the retrieval effectiveness of DTF improved substantially. In one particular experiment, precision and recall for DTF were 0.65 and 0.67 respectively, values that compared favorably with values of 0.49 and 0.56 for IIS.
  20. MacFarlane, A.; McCann, J.A.; Robertson, S.E.: Parallel methods for the generation of partitioned inverted files (2005) 0.02
    0.02144812 = product of:
      0.06434436 = sum of:
        0.06434436 = weight(_text_:index in 651) [ClassicSimilarity], result of:
          0.06434436 = score(doc=651,freq=2.0), product of:
            0.2221244 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.05083213 = queryNorm
            0.28967714 = fieldWeight in 651, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=651)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - The generation of inverted indexes is one of the most computationally intensive activities for information retrieval systems: indexing large multi-gigabyte text databases can take many hours or even days to complete. We examine the generation of partitioned inverted files in order to speed up the process of indexing. Two types of index partitions are investigated: TermId and DocId. Design/methodology/approach - We use standard measures used in parallel computing such as speedup and efficiency to examine the computing results and also the space costs of our trial indexing experiments. Findings - The results from runs on both partitioning methods are compared and contrasted, concluding that DocId is the more efficient method. Practical implications - The practical implications are that the DocId partitioning method would in most circumstances be used for distributing inverted file data in a parallel computer, particularly if indexing speed is the primary consideration. Originality/value - The paper is of value to database administrators who manage large-scale text collections, and who need to use parallel computing to implement their text retrieval services.
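
A minimal sketch of the two partitioning schemes on hypothetical postings: DocId assigns each document (here, by doc-id parity) to a worker, while TermId assigns each slice of the vocabulary (here, round-robin) to a worker:

```python
# Hypothetical postings lists: term -> document numbers
postings = {"cat": [0, 2, 5], "dog": [1, 2], "sat": [0, 1, 5]}
WORKERS = 2

def docid_partition(postings, workers):
    """Each worker indexes a slice of the documents."""
    parts = [dict() for _ in range(workers)]
    for term, doc_ids in postings.items():
        for d in doc_ids:
            parts[d % workers].setdefault(term, []).append(d)
    return parts

def termid_partition(postings, workers):
    """Each worker owns a slice of the vocabulary."""
    parts = [dict() for _ in range(workers)]
    for i, (term, doc_ids) in enumerate(sorted(postings.items())):
        parts[i % workers][term] = doc_ids
    return parts

print(docid_partition(postings, WORKERS))
print(termid_partition(postings, WORKERS))
```
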

Languages

  • e 62
  • d 6

Types

  • a 59
  • m 4
  • el 3
  • r 2
  • s 1
  • x 1