Search (55 results, page 1 of 3)

  • Filter: theme_ss:"Retrievalalgorithmen"
  1. Chang, M.; Poon, C.K.: Efficient phrase querying with common phrase index (2008) 0.04
    0.039122667 = product of:
      0.19561332 = sum of:
        0.19561332 = weight(_text_:index in 2061) [ClassicSimilarity], result of:
          0.19561332 = score(doc=2061,freq=18.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.8690314 = fieldWeight in 2061, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=2061)
      0.2 = coord(1/5)
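    The per-hit scores on this page are Lucene "explain" trees for the classic TF-IDF similarity. As a sanity check, here is a minimal Python sketch (function name mine, not Lucene's) that recomputes the score of hit 1 from the leaves of the tree above; Lucene works in float32, so the last digits may differ:

```python
import math

def classic_similarity(freq, idf, query_norm, field_norm, coord):
    """Recompute a Lucene ClassicSimilarity score from its explain-tree leaves."""
    tf = math.sqrt(freq)                  # 4.2426405 for freq=18
    query_weight = idf * query_norm       # 0.2250935
    field_weight = tf * idf * field_norm  # 0.8690314
    return coord * query_weight * field_weight

# Leaf values from the tree above; idf itself is 1 + ln(44218 / (1520 + 1)).
score = classic_similarity(freq=18.0, idf=4.369764, query_norm=0.051511593,
                           field_norm=0.046875, coord=1 / 5)
print(score)  # ~0.0391227, matching 0.039122667 up to float32 rounding
```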
    
    Abstract
    In this paper, we propose a common phrase index as an efficient index structure to support phrase queries in a very large text database. Our structure is an extension of previous index structures for phrases and achieves better query efficiency with modest extra storage cost. Further improvement in efficiency can be attained by implementing our index according to our observation of the dynamic nature of the common word set. In experimental evaluation, a common phrase index using 255 common words improves query time by about 11% overall and 62% for large queries (queries of long phrases) over an auxiliary nextword index, at only about 19% extra storage cost. Compared with an inverted index, the improvement is about 72% overall and 87% for large queries. We also propose to implement a common phrase index with a dynamic update feature; our experiments show that this yields further gains in time efficiency.
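    The nextword index this abstract builds on can be sketched in a few lines: for every (word, following word) pair, store the positions at which the pair occurs, so a two-word phrase is answered by a single lookup rather than a postings intersection. A toy illustration (data layout simplified; identifiers are mine):

```python
from collections import defaultdict

def build_nextword_index(docs):
    """Map each (word, next word) pair to {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        words = text.lower().split()
        for i in range(len(words) - 1):
            index[(words[i], words[i + 1])][doc_id].append(i)
    return index

docs = {1: "common phrase index for phrase queries",
        2: "an inverted index stores a postings list"}
idx = build_nextword_index(docs)
print(dict(idx[("phrase", "queries")]))  # {1: [4]}
```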
  2. Jacso, P.: Testing the calculation of a realistic h-index in Google Scholar, Scopus, and Web of Science for F. W. Lancaster (2008) 0.03
    0.02875246 = product of:
      0.14376229 = sum of:
        0.14376229 = weight(_text_:index in 5586) [ClassicSimilarity], result of:
          0.14376229 = score(doc=5586,freq=14.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.63867813 = fieldWeight in 5586, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5586)
      0.2 = coord(1/5)
    
    Abstract
    This paper focuses on the practical limitations in the content and software of the databases that are used to calculate the h-index for assessing the publishing productivity and impact of researchers. To celebrate F. W. Lancaster's biological age of seventy-five and "scientific age" of forty-five, this paper discusses the related features of Google Scholar, Scopus, and Web of Science (WoS), and demonstrates in the latter how a much more realistic and fair h-index can be computed for F. W. Lancaster than the one produced automatically. The cited reference index of the 1945-2007 edition of WoS contains, by my estimate, over a hundred million "orphan references," which have no counterpart master records to be attached to, and "stray references," which cite papers that do have master records but cannot be matched to them because of errors of omission and commission in the references of the citing works. Browsing and searching this index can bring up hundreds of additional cited references to the works of an accomplished author that the automatic h-index calculation ignores. The partially manual process doubled the h-index value for F. W. Lancaster from 13 to 26, a much more realistic value for an information scientist and professor of his stature.
    Object
    h-index
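    The formula itself is simple; the paper's contribution is the manual recovery of orphan and stray references before it is applied. A minimal sketch with toy citation counts (the numbers are illustrative, not Lancaster's):

```python
def h_index(citations):
    """h-index: the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    return max((rank for rank, c in enumerate(ranked, 1) if c >= rank), default=0)

print(h_index([10, 8, 5, 4, 3]))        # 4, from the automatically matched citations
print(h_index([10, 8, 6, 5, 5, 4, 3]))  # 5, after recovering stray citations
```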
  3. Abu-Salem, H.; Al-Omari, M.; Evens, M.W.: Stemming methodologies over individual query words for an Arabic information retrieval system (1999) 0.02
    0.024300262 = product of:
      0.12150131 = sum of:
        0.12150131 = weight(_text_:index in 3672) [ClassicSimilarity], result of:
          0.12150131 = score(doc=3672,freq=10.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.5397815 = fieldWeight in 3672, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3672)
      0.2 = coord(1/5)
    
    Abstract
    Stemming is one of the most important factors that affect the performance of information retrieval systems. This article investigates how to improve the performance of an Arabic information retrieval system by choosing the retrieval method for each individual query word according to the importance of the WORD, the STEM, or the ROOT of that term in the database. This method, called Mixed Stemming, computes term importance using a weighting scheme that uses the Term Frequency (TF) and the Inverse Document Frequency (IDF), called TFxIDF. An extended version of the Arabic IRS system is designed, implemented, and evaluated to reduce the number of irrelevant documents retrieved. The results of the experiment suggest that the proposed method outperforms the Word index method using the TFxIDF weighting scheme. It also outperforms the Stem index method using the Binary weighting scheme but not the Stem index method using the TFxIDF weighting scheme; likewise, it outperforms the Root index method using the Binary weighting scheme but not the Root index method using the TFxIDF weighting scheme.
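    One plausible reading of the Mixed Stemming idea, as a sketch: index each query term by whichever of its word, stem, or root forms carries the highest TFxIDF weight in the collection. The helper and toy statistics below are hypothetical, and real Arabic stemming is omitted:

```python
import math

def tfidf(tf, df, n_docs):
    """TFxIDF weight: collection term frequency times inverse document frequency."""
    return tf * math.log(n_docs / df)

def pick_form(forms, stats, n_docs):
    """Choose among WORD/STEM/ROOT forms by TFxIDF importance in the database;
    `stats` maps a form to (collection tf, document frequency)."""
    return max(forms, key=lambda f: tfidf(*stats[f], n_docs))

stats = {"kataba": (12, 8), "katib": (40, 30), "ktb": (300, 250)}  # hypothetical
print(pick_form(["kataba", "katib", "ktb"], stats, n_docs=1000))   # 'ktb'
```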
  4. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.02
    0.022587484 = product of:
      0.11293741 = sum of:
        0.11293741 = weight(_text_:index in 2648) [ClassicSimilarity], result of:
          0.11293741 = score(doc=2648,freq=6.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.50173557 = fieldWeight in 2648, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=2648)
      0.2 = coord(1/5)
    
    Abstract
    An inverted index stores, for each term that appears in a collection of documents, a list of the document numbers containing that term. Such an index is indispensable when Boolean or informal ranked queries are to be answered. Construction of the index is, however, a non-trivial task. Simple methods using in-memory data structures cannot be used for large collections because they require too much random-access storage, and traditional disc-based methods require large amounts of temporary file space. Describes a new indexing algorithm designed to create large compressed inverted indexes in situ. It makes use of simple compression codes for the positive integers and an in-place external multi-way merge sort. The new technique has been used to invert a 2-gigabyte text collection in under 4 hours, using less than 40 megabytes of temporary disc space and less than 20 megabytes of main memory.
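    The "simple compression codes for the positive integers" are typically applied to d-gaps between sorted document numbers; Elias gamma is one standard such code (the abstract does not name a specific one). A sketch, with bits shown as characters for readability:

```python
def elias_gamma(n):
    """Elias gamma code: (bit-length - 1) zeros, then n in binary."""
    b = bin(n)[2:]
    return "0" * (len(b) - 1) + b

def compress_postings(doc_ids):
    """Encode a sorted postings list as gamma-coded d-gaps
    (a real index packs the bits instead of building a string)."""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return "".join(elias_gamma(g) for g in gaps)

print(compress_postings([3, 7, 8, 20]))  # gaps 3,4,1,12 -> '011'+'00100'+'1'+'0001100'
```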
  5. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02
    0.022333153 = product of:
      0.11166576 = sum of:
        0.11166576 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.11166576 = score(doc=402,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.2 = coord(1/5)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  6. Bar-Ilan, J.; Levene, M.: ¬The hw-rank : an h-index variant for ranking web pages (2015) 0.02
    0.021734815 = product of:
      0.10867407 = sum of:
        0.10867407 = weight(_text_:index in 1694) [ClassicSimilarity], result of:
          0.10867407 = score(doc=1694,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.48279524 = fieldWeight in 1694, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.078125 = fieldNorm(doc=1694)
      0.2 = coord(1/5)
    
  7. Rajashekar, T.B.; Croft, W.B.: Combining automatic and manual index representations in probabilistic retrieval (1995) 0.02
    0.02151637 = product of:
      0.10758185 = sum of:
        0.10758185 = weight(_text_:index in 2418) [ClassicSimilarity], result of:
          0.10758185 = score(doc=2418,freq=4.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.4779429 = fieldWeight in 2418, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2418)
      0.2 = coord(1/5)
    
    Abstract
    Results from research in information retrieval have suggested that significant improvements in retrieval effectiveness can be obtained by combining results from multiple index representations, query formulations, and search strategies. The inference net model of retrieval, which was designed from this point of view, treats information retrieval as an evidential reasoning process in which multiple sources of evidence about document and query content are combined to estimate relevance probabilities. Uses a system based on this model to study the retrieval effectiveness benefits of combining the types of document and query information that are found in typical commercial databases and information services. The results indicate that substantial real benefits are possible.
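    As a sketch of the combination idea only (not the full inference net): relevance beliefs estimated from an automatic free-text index and from manual controlled-vocabulary indexing can be merged with a weighted sum, a common simplification of the model's weighted-sum link matrices. The numbers below are hypothetical:

```python
def combine_evidence(beliefs, weights):
    """Weighted-sum combination of per-representation relevance beliefs."""
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * b for w, b in zip(weights, beliefs))

# Hypothetical beliefs for one document from an automatic (free-text) index
# and a manual (controlled-vocabulary) index:
print(combine_evidence([0.62, 0.45], [0.6, 0.4]))  # ~0.552
```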
  8. Smeaton, A.F.; Rijsbergen, C.J. van: ¬The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.02
    0.019541508 = product of:
      0.09770754 = sum of:
        0.09770754 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
          0.09770754 = score(doc=2134,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.5416616 = fieldWeight in 2134, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=2134)
      0.2 = coord(1/5)
    
    Date
    30. 3.2001 13:32:22
  9. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.02
    0.019541508 = product of:
      0.09770754 = sum of:
        0.09770754 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
          0.09770754 = score(doc=3445,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.5416616 = fieldWeight in 3445, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=3445)
      0.2 = coord(1/5)
    
    Date
    25. 8.2005 17:42:22
  10. Maron, M.E.; Kuhns, I.L.: On relevance, probabilistic indexing and information retrieval (1960) 0.02
    0.018822905 = product of:
      0.09411452 = sum of:
        0.09411452 = weight(_text_:index in 1928) [ClassicSimilarity], result of:
          0.09411452 = score(doc=1928,freq=6.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.418113 = fieldWeight in 1928, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1928)
      0.2 = coord(1/5)
    
    Abstract
    Reports on a novel technique for literature indexing and searching in a mechanized library system. The notion of relevance is taken as the key concept in the theory of information retrieval, and a comparative concept of relevance is explicated in terms of the theory of probability. The resulting technique, called 'probabilistic indexing', allows a computing machine, given a request for information, to make a statistical inference and derive a number (called the 'relevance number') for each document, which is a measure of the probability that the document will satisfy the given request. The result of a search is an ordered list of those documents which satisfy the request, ranked according to their probable relevance. The paper goes on to show that whereas in a conventional library system the cross-referencing ('see' and 'see also') is based solely on the 'semantic closeness' between index terms, statistical measures of closeness between index terms can be defined and computed. Thus, given an arbitrary request consisting of one (or many) index term(s), a machine can elaborate on it to increase the probability of selecting relevant documents that would not otherwise have been selected. Finally, the paper suggests interpreting the whole library problem as one in which the request is treated as a clue on the basis of which the library system makes a concatenated statistical inference in order to provide, as output, an ordered list of those documents which most probably satisfy the information needs of the user.
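    A minimal sketch of the relevance-number computation as the abstract describes it: the a priori probability of a document times its probabilistic index weights for the request terms. All weights and priors below are hypothetical:

```python
def relevance_number(request, doc):
    """Maron/Kuhns-style relevance number: document prior times the product
    of its probabilistic index weights for the request terms."""
    p = doc["prior"]
    for term in request:
        p *= doc["weights"].get(term, 0.0)
    return p

docs = {  # hypothetical probabilistic index weights and priors
    "d1": {"weights": {"indexing": 0.9, "probabilistic": 0.7}, "prior": 0.01},
    "d2": {"weights": {"indexing": 0.4, "probabilistic": 0.9}, "prior": 0.03},
}
request = ["probabilistic", "indexing"]
ranking = sorted(docs, key=lambda d: relevance_number(request, docs[d]), reverse=True)
print(ranking)  # ['d2', 'd1']: an ordered list by probable relevance
```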
  11. Käki, M.: fKWIC: frequency-based Keyword-in-Context Index for filtering Web search results (2006) 0.02
    0.018442601 = product of:
      0.092213005 = sum of:
        0.092213005 = weight(_text_:index in 6112) [ClassicSimilarity], result of:
          0.092213005 = score(doc=6112,freq=4.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.40966535 = fieldWeight in 6112, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=6112)
      0.2 = coord(1/5)
    
    Abstract
    Enormous Web search engine databases combined with short search queries result in large result sets that are often difficult to access. Result ranking works fairly well, but users need help when it fails. For these situations, we propose a filtering interface that is inspired by keyword-in-context (KWIC) indices. The user interface lists the most frequent keyword contexts (fKWIC). When a context is selected, the corresponding results are displayed in the result list, allowing users to concentrate on the specific context. We compared the keyword context index user interface to the rank order result listing in an experiment with 36 participants. The results show that the proposed user interface was 29% faster in finding relevant results, and the precision of the selected results was 19% higher. In addition, participants showed positive attitudes toward the system.
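    The fKWIC interface can be approximated in a few lines: collect the words that follow the query keyword in each result snippet and surface the most frequent contexts as filter candidates. A toy sketch (window size and tokenization are simplifications of mine):

```python
from collections import Counter

def frequent_contexts(snippets, keyword, window=2, top=3):
    """Count the word contexts following `keyword` across result snippets."""
    contexts = Counter()
    for s in snippets:
        words = s.lower().split()
        for i, w in enumerate(words):
            if w == keyword:
                contexts[tuple(words[i + 1:i + 1 + window])] += 1
    return contexts.most_common(top)

snips = ["java virtual machine tuning",
         "the java virtual machine explained",
         "java island travel guide"]
print(frequent_contexts(snips, "java"))
# [(('virtual', 'machine'), 2), (('island', 'travel'), 1)]
```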
  12. Ding, Y.; Yan, E.; Frazho, A.; Caverlee, J.: PageRank for ranking authors in co-citation networks (2009) 0.02
    0.018442601 = product of:
      0.092213005 = sum of:
        0.092213005 = weight(_text_:index in 3161) [ClassicSimilarity], result of:
          0.092213005 = score(doc=3161,freq=4.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.40966535 = fieldWeight in 3161, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=3161)
      0.2 = coord(1/5)
    
    Abstract
    This paper studies how varied damping factors in the PageRank algorithm influence the ranking of authors and proposes weighted PageRank algorithms. We selected the 108 most highly cited authors in the information retrieval (IR) area from the 1970s to 2008 to form the author co-citation network. We calculated the ranks of these 108 authors based on PageRank with the damping factor ranging from 0.05 to 0.95. In order to test the relationship between different measures, we compared PageRank and weighted PageRank results with the citation ranking, h-index, and centrality measures. We found that in our author co-citation network, citation rank is highly correlated with PageRank with different damping factors and also with different weighted PageRank algorithms; citation rank and PageRank are not significantly correlated with centrality measures; and h-index rank does not significantly correlate with centrality measures but does significantly correlate with other measures. The key factor affecting an author's PageRank in the author co-citation network is being co-cited with important authors.
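    For reference, a compact power-iteration PageRank over a weighted adjacency matrix, with the damping factor d exposed as the knob the study varies from 0.05 to 0.95. The toy co-citation matrix is hypothetical:

```python
import numpy as np

def pagerank(adj, d=0.85, tol=1e-10):
    """Power-iteration PageRank on a (possibly weighted) adjacency matrix."""
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    out[out == 0] = 1.0                    # crude guard for dangling nodes
    m = adj / out                          # row-stochastic transition matrix
    r = np.full(n, 1.0 / n)
    while True:
        r_next = (1 - d) / n + d * (m.T @ r)
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next

# Toy 3-author co-citation network (symmetric weights = co-citation counts):
A = np.array([[0, 5, 1], [5, 0, 2], [1, 2, 0]], dtype=float)
print(pagerank(A).round(3))  # ranks sum to 1; heavily co-cited authors score highest
```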
  13. Walz, J.: Analyse der Übertragbarkeit allgemeiner Rankingfaktoren von Web-Suchmaschinen auf Discovery-Systeme (2018) 0.02
    0.018442601 = product of:
      0.092213005 = sum of:
        0.092213005 = weight(_text_:index in 5744) [ClassicSimilarity], result of:
          0.092213005 = score(doc=5744,freq=4.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.40966535 = fieldWeight in 5744, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.046875 = fieldNorm(doc=5744)
      0.2 = coord(1/5)
    
    Content
    Cf.: https://publiscologne.th-koeln.de/frontdoor/index/index/searchtype/authorsearch/author/Julia+Walz/docId/1169/start/0/rows/10.

  14. Robertson, A.M.; Willett, P.: Use of genetic algorithms in information retrieval (1995) 0.02
    0.01738785 = product of:
      0.08693925 = sum of:
        0.08693925 = weight(_text_:index in 2418) [ClassicSimilarity], result of:
          0.08693925 = score(doc=2418,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.3862362 = fieldWeight in 2418, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0625 = fieldNorm(doc=2418)
      0.2 = coord(1/5)
    
    Abstract
    Reviews the basic techniques involving genetic algorithms and their application to two problems in information retrieval: the generation of equifrequent groups of index terms, and the identification of optimal query and term weights. The algorithm developed for the generation of equifrequent groupings proved to be effective in operation, achieving results comparable with those obtained using a good deterministic algorithm. The algorithm developed for the identification of optimal query and term weights involves a fitness function that is based on full relevance information.
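    A bare-bones sketch of the second application, evolving query-term weights. In the paper, the fitness of a weight vector is retrieval effectiveness computed from full relevance information; the toy fitness below merely rewards closeness to a fixed target vector:

```python
import random

def evolve_weights(fitness, n_terms, pop=20, gens=50, mut=0.1):
    """Minimal genetic algorithm: selection, one-point crossover, point mutation."""
    population = [[random.random() for _ in range(n_terms)] for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop // 2]          # keep the fitter half
        children = []
        while len(children) < pop - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_terms)    # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < mut:             # point mutation
                child[random.randrange(n_terms)] = random.random()
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

ideal = [0.9, 0.1, 0.5]  # hypothetical target weights
best = evolve_weights(lambda w: -sum((x - y) ** 2 for x, y in zip(w, ideal)), 3)
print([round(x, 2) for x in best])  # tends toward [0.9, 0.1, 0.5]
```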
  15. Gonnet, G.H.; Snider, T.; Baeza-Yates, R.A.: New indices for text : PAT trees and PAT arrays (1992) 0.02
    0.01738785 = product of:
      0.08693925 = sum of:
        0.08693925 = weight(_text_:index in 3500) [ClassicSimilarity], result of:
          0.08693925 = score(doc=3500,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.3862362 = fieldWeight in 3500, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0625 = fieldNorm(doc=3500)
      0.2 = coord(1/5)
    
    Abstract
    We survey new indices for text, with emphasis on PAT arrays (also called suffix arrays). A PAT array is an index based on a new model of text that does not use the concept of a word and does not need to know the structure of the text.
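    A PAT/suffix array in miniature: sort the starting positions of all suffixes of the raw text, then answer substring queries by binary search over that order, with no notion of 'word'. Naive construction for clarity; needs Python 3.10+ for bisect's key argument:

```python
import bisect

def suffix_array(text):
    """Starting positions of all suffixes, sorted lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def occurrences(text, sa, pattern):
    """Binary search for the block of suffixes starting with `pattern`."""
    key = lambda i: text[i:i + len(pattern)]
    lo = bisect.bisect_left(sa, pattern, key=key)
    hi = bisect.bisect_right(sa, pattern, key=key)
    return sorted(sa[lo:hi])

text = "patricia trees and pat arrays"
sa = suffix_array(text)
print(occurrences(text, sa, "pat"))  # [0, 19]
```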
  16. Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.02
    0.016749864 = product of:
      0.08374932 = sum of:
        0.08374932 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
          0.08374932 = score(doc=58,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.46428138 = fieldWeight in 58, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=58)
      0.2 = coord(1/5)
    
    Date
    14. 6.2015 22:12:44
  17. Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.02
    0.016749864 = product of:
      0.08374932 = sum of:
        0.08374932 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
          0.08374932 = score(doc=2051,freq=2.0), product of:
            0.18038483 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.051511593 = queryNorm
            0.46428138 = fieldWeight in 2051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=2051)
      0.2 = coord(1/5)
    
    Date
    14. 6.2015 22:12:56
  18. Heinz, S.; Zobel, J.: Efficient single-pass index construction for text databases (2003) 0.02
    0.015214371 = product of:
      0.07607185 = sum of:
        0.07607185 = weight(_text_:index in 1678) [ClassicSimilarity], result of:
          0.07607185 = score(doc=1678,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.33795667 = fieldWeight in 1678, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1678)
      0.2 = coord(1/5)
    
  19. Sparck Jones, K.: ¬A statistical interpretation of term specificity and its application in retrieval (2004) 0.02
    0.015214371 = product of:
      0.07607185 = sum of:
        0.07607185 = weight(_text_:index in 4420) [ClassicSimilarity], result of:
          0.07607185 = score(doc=4420,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.33795667 = fieldWeight in 4420, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4420)
      0.2 = coord(1/5)
    
    Abstract
    The exhaustivity of document descriptions and the specificity of index terms are usually regarded as independent. It is suggested that specificity should be interpreted statistically, as a function of term use rather than of term meaning. The effects on retrieval of variations in term specificity are examined, experiments with three test collections showing, in particular, that frequently-occurring terms are required for good overall performance. It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms. Results for the test collections show that considerable improvements in performance are obtained with this very simple procedure.
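    The statistical interpretation of specificity proposed here is exactly the IDF weight that runs through this whole result list; the idf leaves in the score trees above are Lucene's 1 + ln(N/(df+1)) variant of it. A minimal illustration with the collection size from those trees (the document frequencies below are hypothetical):

```python
import math

def idf(df, n_docs):
    """Collection-frequency weight: the rarer (more specific) the term, the higher."""
    return math.log(n_docs / df)

for term, df in [("the", 40000), ("index", 1520), ("equifrequent", 3)]:
    print(term, round(idf(df, 44218), 2))  # ~0.1, ~3.37, ~9.6
```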
  20. Abdelkareem, M.A.A.: In terms of publication index, what indicator is the best for researchers indexing, Google Scholar, Scopus, Clarivate or others? (2018) 0.02
    0.015214371 = product of:
      0.07607185 = sum of:
        0.07607185 = weight(_text_:index in 4548) [ClassicSimilarity], result of:
          0.07607185 = score(doc=4548,freq=2.0), product of:
            0.2250935 = queryWeight, product of:
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.051511593 = queryNorm
            0.33795667 = fieldWeight in 4548, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.369764 = idf(docFreq=1520, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4548)
      0.2 = coord(1/5)
    

Languages

  • e 50
  • d 5

Types

  • a 46
  • m 4
  • el 2
  • r 2
  • s 1
  • x 1