Search (60 results, page 1 of 3)

  • theme_ss:"Retrievalalgorithmen"
  1. Ding, Y.; Yan, E.; Frazho, A.; Caverlee, J.: PageRank for ranking authors in co-citation networks (2009) 0.16
    0.15861982 = product of:
      0.23792972 = sum of:
        0.19311218 = weight(_text_:citation in 3161) [ClassicSimilarity], result of:
          0.19311218 = score(doc=3161,freq=14.0), product of:
            0.23479973 = queryWeight, product of:
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.050071523 = queryNorm
            0.82245487 = fieldWeight in 3161, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.046875 = fieldNorm(doc=3161)
        0.044817537 = product of:
          0.089635074 = sum of:
            0.089635074 = weight(_text_:index in 3161) [ClassicSimilarity], result of:
              0.089635074 = score(doc=3161,freq=4.0), product of:
                0.21880072 = queryWeight, product of:
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.050071523 = queryNorm
                0.40966535 = fieldWeight in 3161, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3161)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    This paper studies how varied damping factors in the PageRank algorithm influence the ranking of authors and proposes weighted PageRank algorithms. We selected the 108 most highly cited authors in the information retrieval (IR) area from the 1970s to 2008 to form the author co-citation network. We calculated the ranks of these 108 authors based on PageRank with the damping factor ranging from 0.05 to 0.95. In order to test the relationship between different measures, we compared PageRank and weighted PageRank results with the citation ranking, h-index, and centrality measures. We found that in our author co-citation network, citation rank is highly correlated with PageRank under different damping factors and also with the different weighted PageRank algorithms; citation rank and PageRank are not significantly correlated with centrality measures; and h-index rank does not significantly correlate with centrality measures but does significantly correlate with the other measures. The key factor affecting the PageRank of authors in the author co-citation network is being co-cited with important authors.
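The damping-factor sweep described above can be illustrated with a minimal power-iteration PageRank. The tiny co-citation graph below is invented for illustration and is not from the paper; in a co-citation network edges are undirected, so each edge appears in both directions.

```python
def pagerank(links, d=0.85, iters=100):
    """links: dict mapping node -> list of outgoing neighbours."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - d) / n for v in nodes}
        for v, outs in links.items():
            if outs:
                share = rank[v] / len(outs)
                for w in outs:
                    new[w] += d * share
            else:  # dangling node: spread its rank uniformly
                for w in nodes:
                    new[w] += d * rank[v] / n
        rank = new
    return rank

# symmetric edges stand in for undirected co-citation links
g = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"], "D": ["C"]}
for d in (0.05, 0.5, 0.95):
    r = pagerank(g, d=d)
    print(d, max(r, key=r.get))
```

Sweeping d from 0.05 to 0.95, as the paper does, only requires calling the function with different damping values.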
  2. Bornmann, L.; Mutz, R.: From P100 to P100' : a new citation-rank approach (2014) 0.13
    0.13046543 = product of:
      0.19569814 = sum of:
        0.16856214 = weight(_text_:citation in 1431) [ClassicSimilarity], result of:
          0.16856214 = score(doc=1431,freq=6.0), product of:
            0.23479973 = queryWeight, product of:
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.050071523 = queryNorm
            0.71789753 = fieldWeight in 1431, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.0625 = fieldNorm(doc=1431)
        0.027136 = product of:
          0.054272 = sum of:
            0.054272 = weight(_text_:22 in 1431) [ClassicSimilarity], result of:
              0.054272 = score(doc=1431,freq=2.0), product of:
                0.17534193 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050071523 = queryNorm
                0.30952093 = fieldWeight in 1431, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1431)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Properties of a percentile-based rating scale needed in bibliometrics are formulated. Based on these properties, P100 was recently introduced as a new citation-rank approach (Bornmann, Leydesdorff, & Wang, 2013). In this paper, we conceptualize P100 and propose an improvement which we call P100'. Advantages and disadvantages of citation-rank indicators are noted.
    Date
    22. 8.2014 17:05:18
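A percentile-based citation rank of the general kind discussed above can be sketched as follows. This is only a generic percentile rank, not the exact P100 or P100' definition from the paper.

```python
def percentile_rank(citations, x):
    """Share of papers (in %) with fewer citations than x."""
    below = sum(1 for c in citations if c < x)
    return 100.0 * below / len(citations)

# invented citation counts for a small reference set
counts = [0, 1, 1, 2, 5, 8, 13, 40]
print(percentile_rank(counts, 13))  # rank of a paper with 13 citations
```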
  3. Kanaeva, Z.: Ranking: Google und CiteSeer (2005) 0.07
    0.07259898 = product of:
      0.10889846 = sum of:
        0.08515447 = weight(_text_:citation in 3276) [ClassicSimilarity], result of:
          0.08515447 = score(doc=3276,freq=2.0), product of:
            0.23479973 = queryWeight, product of:
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.050071523 = queryNorm
            0.3626685 = fieldWeight in 3276, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3276)
        0.023743998 = product of:
          0.047487997 = sum of:
            0.047487997 = weight(_text_:22 in 3276) [ClassicSimilarity], result of:
              0.047487997 = score(doc=3276,freq=2.0), product of:
                0.17534193 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050071523 = queryNorm
                0.2708308 = fieldWeight in 3276, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3276)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Within classical information retrieval, various methods have been developed for ranking and for searching a homogeneous, unstructured collection of documents. The success of the Google search engine has shown that searching an inhomogeneous but interconnected document collection such as the Internet can be very effective when the links between documents are taken into account. Among the concepts implemented by Google is a method for ranking search results (PageRank), which is briefly explained in this article. The article also covers the concepts of a system called CiteSeer, which automatically indexes bibliographic references (Autonomous Citation Indexing, ACI). CiteSeer turns a set of unconnected scientific documents into an interconnected collection and thus enables the use of ranking methods based on those employed by Google.
    Date
    20. 3.2005 16:23:22
  4. Savoy, J.: Ranking schemes in hybrid Boolean systems : a new approach (1997) 0.07
    0.06978689 = product of:
      0.10468033 = sum of:
        0.072989546 = weight(_text_:citation in 393) [ClassicSimilarity], result of:
          0.072989546 = score(doc=393,freq=2.0), product of:
            0.23479973 = queryWeight, product of:
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.050071523 = queryNorm
            0.31085873 = fieldWeight in 393, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.046875 = fieldNorm(doc=393)
        0.031690784 = product of:
          0.06338157 = sum of:
            0.06338157 = weight(_text_:index in 393) [ClassicSimilarity], result of:
              0.06338157 = score(doc=393,freq=2.0), product of:
                0.21880072 = queryWeight, product of:
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.050071523 = queryNorm
                0.28967714 = fieldWeight in 393, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.046875 = fieldNorm(doc=393)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In most commercial online systems, the retrieval system is based on the Boolean model and its inverted file organization. Since the investment in these systems is so great and changing them could be economically unfeasible, this article suggests a new ranking scheme especially adapted to hypertext environments in order to produce more effective retrieval results while preserving the investment made to date in the Boolean model. To select the retrieved documents, the suggested ranking strategy uses multiple sources of document content evidence. The proposed scheme integrates both the information provided by the index and query terms, and the inherent relationships between documents such as bibliographic references or hypertext links. We demonstrate that our scheme represents an integration of both subject and citation indexing, and results in a significant improvement over classical ranking schemes used in hybrid Boolean systems, while preserving their efficiency. Moreover, by knowing the nearest neighbours and the hypertext links, which constitute additional sources of evidence, our strategy takes them into account in order to further improve retrieval effectiveness and to provide 'good' starting points for browsing in a hypertext or hypermedia environment.
  5. Li, J.; Willett, P.: ArticleRank : a PageRank-based alternative to numbers of citations for analysing citation networks (2009) 0.05
    0.045336 = product of:
      0.136008 = sum of:
        0.136008 = weight(_text_:citation in 751) [ClassicSimilarity], result of:
          0.136008 = score(doc=751,freq=10.0), product of:
            0.23479973 = queryWeight, product of:
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.050071523 = queryNorm
            0.57925105 = fieldWeight in 751, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.0390625 = fieldNorm(doc=751)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - The purpose of this paper is to suggest an alternative to the widely used Times Cited criterion for analysing citation networks. The approach takes account of the nature of the papers that cite a given paper, so as to differentiate between papers that attract the same number of citations. Design/methodology/approach - ArticleRank is an algorithm derived from Google's PageRank algorithm to measure the influence of journal articles. ArticleRank is applied to two datasets - a citation network based on an early paper on webometrics, and a self-citation network based on the 19 most cited papers in the Journal of Documentation - using citation data taken from the Web of Knowledge database. Findings - ArticleRank values provide a different ranking of a set of papers from that provided by the corresponding Times Cited values, and overcome the inability of the latter to differentiate between papers with the same numbers of citations. The difference in rankings between Times Cited and ArticleRank is greatest for the most heavily cited articles in a dataset. Originality/value - This is a novel application of the PageRank algorithm.
  6. Yan, E.; Ding, Y.; Sugimoto, C.R.: P-Rank: an indicator measuring prestige in heterogeneous scholarly networks (2011) 0.03
    0.0344076 = product of:
      0.1032228 = sum of:
        0.1032228 = weight(_text_:citation in 4349) [ClassicSimilarity], result of:
          0.1032228 = score(doc=4349,freq=4.0), product of:
            0.23479973 = queryWeight, product of:
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.050071523 = queryNorm
            0.4396206 = fieldWeight in 4349, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.046875 = fieldNorm(doc=4349)
      0.33333334 = coord(1/3)
    
    Abstract
    Measures for ranking scientific productivity and prestige are often limited to homogeneous networks. These networks are unable to account for the multiple factors that constitute the scholarly communication and reward system. This study proposes a new informetric indicator, P-Rank, for measuring prestige in heterogeneous scholarly networks containing articles, authors, and journals. P-Rank differentiates the weight of each citation based on its citing papers, citing journals, and citing authors. Articles from 16 representative library and information science journals are selected as the dataset. Principal Component Analysis is conducted to examine the relationship between P-Rank and other bibliometric indicators. We also compare the correlation and rank variances between citation counts and P-Rank scores. This work provides a new approach to examining prestige in scholarly communication networks in a more comprehensive and nuanced way.
  7. Chang, M.; Poon, C.K.: Efficient phrase querying with common phrase index (2008) 0.03
    0.031690784 = product of:
      0.09507235 = sum of:
        0.09507235 = product of:
          0.1901447 = sum of:
            0.1901447 = weight(_text_:index in 2061) [ClassicSimilarity], result of:
              0.1901447 = score(doc=2061,freq=18.0), product of:
                0.21880072 = queryWeight, product of:
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.050071523 = queryNorm
                0.8690314 = fieldWeight in 2061, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2061)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    In this paper, we propose a common phrase index as an efficient index structure to support phrase queries in a very large text database. Our structure is an extension of previous index structures for phrases and achieves better query efficiency with modest extra storage cost. Further improvement in efficiency can be attained by implementing our index according to our observation of the dynamic nature of common word set. In experimental evaluation, a common phrase index using 255 common words has an improvement of about 11% and 62% in query time for the overall and large queries (queries of long phrases) respectively over an auxiliary nextword index. Moreover, it has only about 19% extra storage cost. Compared with an inverted index, our improvement is about 72% and 87% for the overall and large queries respectively. We also propose to implement a common phrase index with dynamic update feature. Our experiments show that more improvement in time efficiency can be achieved.
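The nextword idea underlying the phrase indexes compared above can be sketched in a few lines: for each pair of adjacent words, record the documents in which the pair occurs, so a two-word phrase query needs no positional merge. This toy structure illustrates the general principle only, not the paper's common phrase index.

```python
from collections import defaultdict

def build_nextword_index(docs):
    idx = defaultdict(set)          # (word, nextword) -> {doc ids}
    for doc_id, text in docs.items():
        words = text.lower().split()
        for a, b in zip(words, words[1:]):
            idx[(a, b)].add(doc_id)
    return idx

docs = {1: "the quick brown fox", 2: "the brown quick fox"}
idx = build_nextword_index(docs)
print(sorted(idx[("quick", "brown")]))   # docs containing that word pair
```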
  8. Rada, R.; Barlow, J.; Potharst, J.; Zanstra, P.; Bijstra, D.: Document ranking using an enriched thesaurus (1991) 0.02
    0.024329849 = product of:
      0.072989546 = sum of:
        0.072989546 = weight(_text_:citation in 6626) [ClassicSimilarity], result of:
          0.072989546 = score(doc=6626,freq=2.0), product of:
            0.23479973 = queryWeight, product of:
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.050071523 = queryNorm
            0.31085873 = fieldWeight in 6626, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.046875 = fieldNorm(doc=6626)
      0.33333334 = coord(1/3)
    
    Abstract
    A thesaurus may be viewed as a graph, and document retrieval algorithms can exploit this graph when both the documents and the query are represented by thesaurus terms. These retrieval algorithms measure the distance between the query and documents by using the path lengths in the graph. Previous work with such strategies has shown that the hierarchical relations in the thesaurus are useful but the non-hierarchical ones are not. This paper shows that when the query explicitly mentions a particular non-hierarchical relation, the retrieval algorithm benefits from the presence of such relations in the thesaurus. Our algorithms were applied to the Excerpta Medica bibliographic citation database, whose citations are indexed with terms from the EMTREE thesaurus. We also created an enriched EMTREE by systematically adding non-hierarchical relations from a medical knowledge base. Our algorithms used at one time EMTREE and, at another time, the enriched EMTREE in the course of ranking documents from Excerpta Medica against queries. When, and only when, the query specifically mentioned a particular non-hierarchical relation type did EMTREE enriched with that relation type lead to a ranking that better corresponded to an expert's ranking.
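Path-length-based ranking over a thesaurus graph can be sketched as follows: the query and each document are sets of thesaurus terms, and a document scores better the shorter the paths from query terms to its terms. The graph and terms below are invented for illustration; the paper's EMTREE relations and distance measure are richer.

```python
from collections import deque

def bfs_dist(graph, src, dst):
    """Shortest path length between two thesaurus terms."""
    seen, q = {src}, deque([(src, 0)])
    while q:
        node, d = q.popleft()
        if node == dst:
            return d
        for nb in graph.get(node, []):
            if nb not in seen:
                seen.add(nb)
                q.append((nb, d + 1))
    return float("inf")

def doc_distance(graph, query_terms, doc_terms):
    # for each query term, distance to the closest document term
    return sum(min(bfs_dist(graph, q, t) for t in doc_terms)
               for q in query_terms)

thesaurus = {"drug": ["aspirin", "therapy"], "aspirin": ["drug"],
             "therapy": ["drug", "surgery"], "surgery": ["therapy"]}
print(doc_distance(thesaurus, {"drug"}, {"aspirin"}))   # 1
```

Documents would then be ranked by ascending distance to the query.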
  9. Nagelschmidt, M.: Verfahren zur Anfragemodifikation im Information Retrieval (2008) 0.02
    0.024329849 = product of:
      0.072989546 = sum of:
        0.072989546 = weight(_text_:citation in 2774) [ClassicSimilarity], result of:
          0.072989546 = score(doc=2774,freq=2.0), product of:
            0.23479973 = queryWeight, product of:
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.050071523 = queryNorm
            0.31085873 = fieldWeight in 2774, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.046875 = fieldNorm(doc=2774)
      0.33333334 = coord(1/3)
    
    Abstract
    Für das Modifizieren von Suchanfragen kennt das Information Retrieval vielfältige Möglichkeiten. Nach einer einleitenden Darstellung der Wechselwirkung zwischen Informationsbedarf und Suchanfrage wird eine konzeptuelle und typologische Annäherung an Verfahren zur Anfragemodifikation gegeben. Im Anschluss an eine kurze Charakterisierung des Fakten- und des Information Retrieval, sowie des Vektorraum- und des probabilistischen Modells, werden intellektuelle, automatische und interaktive Modifikationsverfahren vorgestellt. Neben klassischen intellektuellen Verfahren, wie der Blockstrategie und der "Citation Pearl Growing"-Strategie, umfasst die Darstellung der automatischen und interaktiven Verfahren Modifikationsmöglichkeiten auf den Ebenen der Morphologie, der Syntax und der Semantik von Suchtermen. Darüber hinaus werden das Relevance Feedback, der Nutzen informetrischer Analysen und die Idee eines assoziativen Retrievals auf der Basis von Clustering- und terminologischen Techniken, sowie zitationsanalytischen Verfahren verfolgt. Ein Eindruck für die praktischen Gestaltungsmöglichkeiten der behandelten Verfahren soll abschließend durch fünf Anwendungsbeispiele vermittelt werden.
  10. Jacso, P.: Testing the calculation of a realistic h-index in Google Scholar, Scopus, and Web of Science for F. W. Lancaster (2008) 0.02
    0.023290541 = product of:
      0.06987162 = sum of:
        0.06987162 = product of:
          0.13974324 = sum of:
            0.13974324 = weight(_text_:index in 5586) [ClassicSimilarity], result of:
              0.13974324 = score(doc=5586,freq=14.0), product of:
                0.21880072 = queryWeight, product of:
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.050071523 = queryNorm
                0.63867813 = fieldWeight in 5586, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5586)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper focuses on the practical limitations in the content and software of the databases that are used to calculate the h-index for assessing the publishing productivity and impact of researchers. To celebrate F. W. Lancaster's biological age of seventy-five, and "scientific age" of forty-five, this paper discusses the related features of Google Scholar, Scopus, and Web of Science (WoS), and demonstrates in the latter how a much more realistic and fair h-index can be computed for F. W. Lancaster than the one produced automatically. The 1945-2007 edition of WoS has, in my estimate, over a hundred million "orphan references" that have no counterpart master records to be attached to, and "stray references" that cite papers which do have master records but cannot be identified by the matching algorithm because of errors of omission and commission in the references of the citing works. Browsing and searching this cited reference index can bring up hundreds of additional cited references to the works of an accomplished author that are ignored in the automatic process of calculating the h-index. The partially manual process doubled the h-index value for F. W. Lancaster from 13 to 26, which is a much more realistic value for an information scientist and professor of his stature.
    Object
    h-index
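The h-index itself is simple to compute: the largest h such that at least h papers have at least h citations each. The counts below are made up; the paper's point is that database errors corrupt the input list, not the formula.

```python
def h_index(citations):
    counts = sorted(citations, reverse=True)
    h = 0
    for i, c in enumerate(counts, start=1):
        if c >= i:   # the i-th most cited paper has at least i citations
            h = i
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))   # 4
print(h_index([25, 8, 5, 3, 3]))   # 3
```

Recovering missed "orphan" and "stray" references simply lengthens and raises the citation list fed into this function, which is how the manual correction doubled the value.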
  11. Wiggers, G.; Verberne, S.; Loon, W. van; Zwenne, G.-J.: Bibliometric-enhanced legal information retrieval : combining usage and citations as flavors of impact relevance (2023) 0.02
    0.020274874 = product of:
      0.06082462 = sum of:
        0.06082462 = weight(_text_:citation in 1022) [ClassicSimilarity], result of:
          0.06082462 = score(doc=1022,freq=2.0), product of:
            0.23479973 = queryWeight, product of:
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.050071523 = queryNorm
            0.25904894 = fieldWeight in 1022, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.6892867 = idf(docFreq=1104, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1022)
      0.33333334 = coord(1/3)
    
    Abstract
    Bibliometric-enhanced information retrieval uses bibliometrics (e.g., citations) to improve ranking algorithms. Using a data-driven approach, this article describes the development of a bibliometric-enhanced ranking algorithm for legal information retrieval, and the evaluation thereof. We statistically analyze the correlation between usage of documents and citations over time, using data from a commercial legal search engine. We then propose a bibliometric boost function that combines usage of documents with citation counts. The core of this function is an impact variable based on usage and citations that increases in influence as citations and usage counts become more reliable over time. We evaluate our ranking function by comparing search sessions before and after the introduction of the new ranking in the search engine. Using a cost model applied to 129,571 sessions before and 143,864 sessions after the intervention, we show that our bibliometric-enhanced ranking algorithm reduces the time of a search session of legal professionals by 2 to 3% on average for use cases other than known-item retrieval or updating behavior. Given the high hourly tariff of legal professionals and the limited time they can spend on research, this is expected to lead to increased efficiency, especially for users with extremely long search sessions.
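A boost function in the spirit described above can be sketched as follows: an impact score mixes usage and citation counts, and the mix is trusted more as the document ages. The logarithmic saturation, the linear trust ramp, and the five-year horizon are all assumptions for illustration, not the authors' actual formula.

```python
import math

def impact_boost(usage, citations, age_years, max_trust_age=5.0):
    trust = min(age_years / max_trust_age, 1.0)  # reliability grows with age
    raw = math.log1p(usage) + math.log1p(citations)
    return 1.0 + trust * raw                     # multiplies a text-match score

# hypothetical document: strong usage and citations, six years old
score = 2.5 * impact_boost(usage=120, citations=15, age_years=6)
```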
  12. Abu-Salem, H.; Al-Omari, M.; Evens, M.W.: Stemming methodologies over individual query words for an Arabic information retrieval system (1999) 0.02
    0.019684099 = product of:
      0.059052292 = sum of:
        0.059052292 = product of:
          0.118104585 = sum of:
            0.118104585 = weight(_text_:index in 3672) [ClassicSimilarity], result of:
              0.118104585 = score(doc=3672,freq=10.0), product of:
                0.21880072 = queryWeight, product of:
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.050071523 = queryNorm
                0.5397815 = fieldWeight in 3672, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3672)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Stemming is one of the most important factors that affect the performance of information retrieval systems. This article investigates how to improve the performance of an Arabic information retrieval system by imposing the retrieval method over individual words of a query depending on the importance of the WORD, the STEM, or the ROOT of the query terms in the database. This method, called Mixed Stemming, computes term importance using a weighting scheme that uses the Term Frequency (TF) and the Inverse Document Frequency (IDF), called TFxIDF. An extended version of the Arabic IRS system is designed, implemented, and evaluated to reduce the number of irrelevant documents retrieved. The results of the experiment suggest that the proposed method outperforms the Word index method using the TFxIDF weighting scheme. It also outperforms the Stem index method using the Binary weighting scheme but does not outperform the Stem index method using the TFxIDF weighting scheme, and again it outperforms the Root index method using the Binary weighting scheme but does not outperform the Root index method using the TFxIDF weighting scheme.
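The TFxIDF weighting named above, in its textbook form, is term frequency multiplied by the log of the inverse document frequency. Smoothing variants differ between systems; this is the plain version.

```python
import math

def tf_idf(tf, df, n_docs):
    """tf: occurrences in the document, df: documents containing the term."""
    return tf * math.log(n_docs / df)

# a term occurring 3 times, found in 10 of 1000 documents
w = tf_idf(3, 10, 1000)
```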
  13. Moffat, A.; Bell, T.A.H.: In situ generation of compressed inverted files (1995) 0.02
    0.018296685 = product of:
      0.05489005 = sum of:
        0.05489005 = product of:
          0.1097801 = sum of:
            0.1097801 = weight(_text_:index in 2648) [ClassicSimilarity], result of:
              0.1097801 = score(doc=2648,freq=6.0), product of:
                0.21880072 = queryWeight, product of:
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.050071523 = queryNorm
                0.50173557 = fieldWeight in 2648, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2648)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    An inverted index stores, for each term that appears in a collection of documents, a list of document numbers containing that term. Such an index is indispensable when Boolean or informal ranked queries are to be answered. Construction of the index is, however, a non-trivial task. Simple methods using in-memory data structures cannot be used for large collections because they require too much random access storage, and traditional disc-based methods require large amounts of temporary file space. Describes a new indexing algorithm designed to create large compressed inverted indexes in situ. It makes use of simple compression codes for the positive integers and an in-place external multi-way merge sort. The new technique has been used to invert a 2-gigabyte text collection in under 4 hours, using less than 40 megabytes of temporary disc space and less than 20 megabytes of main memory.
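The underlying structure can be shown with a minimal in-memory builder. The paper's contribution is doing this at scale, in situ, with compressed postings and an external multi-way merge; this sketch shows only the data structure itself.

```python
from collections import defaultdict

def build_inverted_index(docs):
    index = defaultdict(list)        # term -> ascending list of doc numbers
    for doc_id in sorted(docs):
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)
    return index

docs = {1: "compressed inverted files", 2: "inverted index construction"}
idx = build_inverted_index(docs)
print(idx["inverted"])   # [1, 2]
```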
  14. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02
    0.018090667 = product of:
      0.054272 = sum of:
        0.054272 = product of:
          0.108544 = sum of:
            0.108544 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.108544 = score(doc=402,freq=2.0), product of:
                0.17534193 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050071523 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  15. Bar-Ilan, J.; Levene, M.: ¬The hw-rank : an h-index variant for ranking web pages (2015) 0.02
    0.017605992 = product of:
      0.052817974 = sum of:
        0.052817974 = product of:
          0.10563595 = sum of:
            0.10563595 = weight(_text_:index in 1694) [ClassicSimilarity], result of:
              0.10563595 = score(doc=1694,freq=2.0), product of:
                0.21880072 = queryWeight, product of:
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.050071523 = queryNorm
                0.48279524 = fieldWeight in 1694, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1694)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
  16. Rajashekar, T.B.; Croft, W.B.: Combining automatic and manual index representations in probabilistic retrieval (1995) 0.02
    0.017429043 = product of:
      0.052287128 = sum of:
        0.052287128 = product of:
          0.104574256 = sum of:
            0.104574256 = weight(_text_:index in 2418) [ClassicSimilarity], result of:
              0.104574256 = score(doc=2418,freq=4.0), product of:
                0.21880072 = queryWeight, product of:
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.050071523 = queryNorm
                0.4779429 = fieldWeight in 2418, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2418)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Results from research in information retrieval have suggested that significant improvements in retrieval effectiveness can be obtained by combining results from multiple index representations, query formulations, and search strategies. The inference net model of retrieval, which was designed from this point of view, treats information retrieval as an evidential reasoning process where multiple sources of evidence about document and query content are combined to estimate relevance probabilities. Uses a system based on this model to study the retrieval effectiveness benefits of combining the types of document and query information that are found in typical commercial databases and information services. The results indicate that substantial real benefits are possible.
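Combining evidence from several representations can be sketched as a weighted mixture of per-source relevance estimates. A real inference net propagates probabilities through a network rather than taking a flat weighted sum; the weights below are assumptions for illustration.

```python
def combine(probs, weights):
    """Weighted average of relevance estimates from several sources."""
    assert len(probs) == len(weights)
    total = sum(weights)
    return sum(p * w for p, w in zip(probs, weights)) / total

# hypothetical evidence from automatic indexing, manual indexing, citations
p = combine([0.7, 0.5, 0.9], weights=[2.0, 1.0, 1.0])
```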
  17. Smeaton, A.F.; Rijsbergen, C.J. van: ¬The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.02
    0.015829332 = product of:
      0.047487997 = sum of:
        0.047487997 = product of:
          0.09497599 = sum of:
            0.09497599 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
              0.09497599 = score(doc=2134,freq=2.0), product of:
                0.17534193 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050071523 = queryNorm
                0.5416616 = fieldWeight in 2134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    30. 3.2001 13:32:22
  18. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.02
    
    Date
    25. 8.2005 17:42:22
  19. Maron, M.E.; Kuhns, J.L.: On relevance, probabilistic indexing and information retrieval (1960) 0.02
    
    Abstract
    Reports on a novel technique for literature indexing and searching in a mechanized library system. The notion of relevance is taken as the key concept in the theory of information retrieval, and a comparative concept of relevance is explicated in terms of the theory of probability. The resulting technique, called 'probabilistic indexing', allows a computing machine, given a request for information, to make a statistical inference and derive a number (called the 'relevance number') for each document, which is a measure of the probability that the document will satisfy the given request. The result of a search is an ordered list of those documents which satisfy the request, ranked according to their probable relevance. The paper goes on to show that whereas in a conventional library system the cross-referencing ('see' and 'see also') is based solely on the 'semantic closeness' between index terms, statistical measures of closeness between index terms can be defined and computed. Thus, given an arbitrary request consisting of one (or many) index term(s), a machine can elaborate on it to increase the probability of selecting relevant documents that would not otherwise have been selected. Finally, the paper suggests an interpretation of the whole library problem as one in which the request is considered a clue on the basis of which the library system makes a concatenated statistical inference in order to provide as output an ordered list of those documents which most probably satisfy the information needs of the user.
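The relevance-number idea in this abstract can be sketched as multiplying per-document indexing weights for the requested terms; the documents, terms, and weights below are invented for illustration, and a real system would also fold in an a-priori document probability:

```python
# Hedged sketch of probabilistic indexing in the Maron/Kuhns style:
# each document carries index terms with probabilistic weights, and a
# request's relevance number is the product of the weights of the
# requested terms. All weights here are illustrative assumptions.

def relevance_number(request_terms, doc_weights, prior=1.0):
    """Product of indexing weights for the requested terms."""
    rn = prior
    for term in request_terms:
        rn *= doc_weights.get(term, 0.0)  # absent term -> probability 0
    return rn

docs = {
    "doc1": {"retrieval": 0.8, "probability": 0.6},
    "doc2": {"retrieval": 0.5, "indexing": 0.9},
}

request = ["retrieval", "probability"]
ranked = sorted(
    ((d, relevance_number(request, w)) for d, w in docs.items()),
    key=lambda kv: kv[1],
    reverse=True,
)
```

The output is the ordered list the abstract describes: documents ranked by their probable relevance to the request.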
  20. Käki, M.: fKWIC: frequency-based Keyword-in-Context Index for filtering Web search results (2006) 0.01
    
    Abstract
    Enormous Web search engine databases combined with short search queries result in large result sets that are often difficult to access. Result ranking works fairly well, but users need help when it fails. For these situations, we propose a filtering interface that is inspired by keyword-in-context (KWIC) indices. The user interface lists the most frequent keyword contexts (fKWIC). When a context is selected, the corresponding results are displayed in the result list, allowing users to concentrate on the specific context. We compared the keyword context index user interface to the rank order result listing in an experiment with 36 participants. The results show that the proposed user interface was 29% faster in finding relevant results, and the precision of the selected results was 19% higher. In addition, participants showed positive attitudes toward the system.
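The grouping step behind the fKWIC interface can be sketched as collecting keyword contexts from result snippets and ranking them by how many results share each context; the snippets are invented, and for brevity this sketch uses only the words following the keyword, whereas KWIC indices proper show context on both sides:

```python
# Rough sketch of the fKWIC idea: gather keyword contexts from result
# snippets, rank contexts by frequency, and let the user filter the
# result list by one context. Snippets and window size are assumptions.
from collections import defaultdict

def keyword_contexts(snippets, keyword, window=1):
    """Map each context (words after the keyword) to the result ids."""
    contexts = defaultdict(set)
    for doc_id, text in snippets.items():
        words = text.lower().split()
        for i, w in enumerate(words):
            if w == keyword:
                ctx = tuple(words[i + 1: i + 1 + window])
                contexts[ctx].add(doc_id)
    # Most frequent contexts first, as in the fKWIC listing.
    return sorted(contexts.items(), key=lambda kv: len(kv[1]), reverse=True)

snippets = {
    1: "java coffee beans from java island",
    2: "the java programming language",
    3: "learn java programming online",
}
ranked = keyword_contexts(snippets, "java")
```

Selecting the top context ("programming") would narrow the result list to the results sharing that sense of the keyword, which is the filtering behavior the abstract evaluates.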

Languages

  • e 54
  • d 6

Types

  • a 50
  • m 4
  • el 2
  • r 2
  • x 2
  • s 1