Search (41 results, page 1 of 3)

  • author_ss:"Egghe, L."
  1. Egghe, L.; Rousseau, R.: Averaging and globalising quotients of informetric and scientometric data (1996) 0.02
    0.023862753 = product of:
      0.035794128 = sum of:
        0.017283546 = weight(_text_:to in 7659) [ClassicSimilarity], result of:
          0.017283546 = score(doc=7659,freq=6.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.20874833 = fieldWeight in 7659, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.046875 = fieldNorm(doc=7659)
        0.018510582 = product of:
          0.037021164 = sum of:
            0.037021164 = weight(_text_:22 in 7659) [ClassicSimilarity], result of:
              0.037021164 = score(doc=7659,freq=2.0), product of:
                0.15947726 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045541126 = queryNorm
                0.23214069 = fieldWeight in 7659, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7659)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    It is possible, using ISI's Journal Citation Reports (JCR), to calculate average impact factors (AIF) for JCR's subject categories, but it can be more useful to know the global impact factor (GIF) of a subject category and compare the two values. Reports results of a study comparing the relationships between AIFs and GIFs of subjects, based on the particular case of the average impact factor of a subfield versus the impact factor of this subfield as a whole, the difference being studied between an average of quotients, denoted as AQ, and a global average, obtained as a quotient of averages, and denoted as GQ. In the case of impact factors, AQ becomes the average impact factor of a field, and GQ becomes its global impact factor. Discusses a number of applications of this technique in the context of informetrics and scientometrics.
    Source
    Journal of information science. 22(1996) no.3, pp.165-170
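    The AQ/GQ distinction above can be illustrated with a short Python sketch. The journal figures are invented for illustration and are not JCR data: AQ is the unweighted mean of the per-journal quotients, while GQ pools citations and citable items before dividing.

```python
from fractions import Fraction

def average_of_quotients(pairs):
    """AQ: the mean of the per-journal quotients c_i / n_i (an 'average impact factor')."""
    return sum(Fraction(c, n) for c, n in pairs) / len(pairs)

def quotient_of_averages(pairs):
    """GQ: mean(c_i) / mean(n_i), which equals sum(c_i) / sum(n_i) (a 'global impact factor')."""
    return Fraction(sum(c for c, _ in pairs), sum(n for _, n in pairs))

# Hypothetical (citations, citable items) for three journals in one subject category.
journals = [(300, 100), (40, 20), (10, 10)]

aq = average_of_quotients(journals)
gq = quotient_of_averages(journals)
print(f"AQ = {aq} ({float(aq):.3f}), GQ = {gq} ({float(gq):.3f})")
```

    Because GQ implicitly weights each journal by its number of citable items, large journals dominate it. In this made-up example AQ = 2 while GQ = 35/13 ≈ 2.69.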
  2. Egghe, L.: A universal method of information retrieval evaluation : the "missing" link M and the universal IR surface (2004) 0.02
    0.02174836 = product of:
      0.03262254 = sum of:
        0.014111955 = weight(_text_:to in 2558) [ClassicSimilarity], result of:
          0.014111955 = score(doc=2558,freq=4.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.17044228 = fieldWeight in 2558, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.046875 = fieldNorm(doc=2558)
        0.018510582 = product of:
          0.037021164 = sum of:
            0.037021164 = weight(_text_:22 in 2558) [ClassicSimilarity], result of:
              0.037021164 = score(doc=2558,freq=2.0), product of:
                0.15947726 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045541126 = queryNorm
                0.23214069 = fieldWeight in 2558, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2558)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The paper shows that the present evaluation methods in information retrieval (basically recall R and precision P, and in some cases fallout F) lack universal comparability in the sense that their values depend on the generality of the IR problem. A solution is given by using all "parts" of the database, including the non-relevant documents and also the not-retrieved documents. It turns out that the solution is given by introducing the measure M, the fraction of the not-retrieved documents that are relevant (hence the "miss" measure). We prove that, independent of the IR problem or of the IR action, the quadruple (P,R,F,M) belongs to a universal IR surface, which is the same for all IR activities. This universality is then exploited by defining a new measure for evaluation in IR allowing for unbiased comparisons of all IR results. We also show that using only one, two or even three measures from the set {P,R,F,M} necessarily leads to evaluation measures that are non-universal and hence not capable of comparing different IR situations.
    Date
    14. 8.2004 19:17:22
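    The four measures named above can be computed from the usual 2x2 table of relevant/non-relevant versus retrieved/not-retrieved documents. A minimal sketch with hypothetical counts follows; the class and field names are illustrative, and the universal IR surface and the new measure defined in the paper are not reproduced.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class RetrievalOutcome:
    """Counts of a 2x2 retrieval table (names are illustrative)."""
    relevant_retrieved: int
    nonrelevant_retrieved: int
    relevant_not_retrieved: int
    nonrelevant_not_retrieved: int

def prfm(t: RetrievalOutcome):
    """Return (P, R, F, M) as exact fractions."""
    precision = Fraction(t.relevant_retrieved, t.relevant_retrieved + t.nonrelevant_retrieved)
    recall = Fraction(t.relevant_retrieved, t.relevant_retrieved + t.relevant_not_retrieved)
    # Fallout: share of the non-relevant documents that were retrieved.
    fallout = Fraction(t.nonrelevant_retrieved,
                       t.nonrelevant_retrieved + t.nonrelevant_not_retrieved)
    # Miss: share of the not-retrieved documents that are relevant.
    miss = Fraction(t.relevant_not_retrieved,
                    t.relevant_not_retrieved + t.nonrelevant_not_retrieved)
    return precision, recall, fallout, miss

# Hypothetical database of 1000 documents, 40 of them relevant; 50 retrieved.
outcome = RetrievalOutcome(30, 20, 10, 940)
p, r, f, m = prfm(outcome)
print(p, r, f, m)  # 3/5 3/4 1/48 1/95
```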
  3. Egghe, L.; Guns, R.; Rousseau, R.; Leuven, K.U.: Erratum (2012) 0.01
    0.010283656 = product of:
      0.03085097 = sum of:
        0.03085097 = product of:
          0.06170194 = sum of:
            0.06170194 = weight(_text_:22 in 4992) [ClassicSimilarity], result of:
              0.06170194 = score(doc=4992,freq=2.0), product of:
                0.15947726 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045541126 = queryNorm
                0.38690117 = fieldWeight in 4992, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4992)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    14. 2.2012 12:53:22
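    The explain trees in these results follow Lucene's ClassicSimilarity (TF-IDF) scoring: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = √freq, queryWeight = idf · queryNorm, fieldWeight = tf · idf · fieldNorm, and the term weight is then multiplied by each coord factor. As a sketch, the score of this Erratum record (doc 4992) can be recomputed from the values shown. queryNorm is copied from the output because the full query is not displayed, and Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """queryWeight * fieldWeight for one term, assuming a query boost of 1."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight

MAX_DOCS = 44218
QUERY_NORM = 0.045541126  # copied from the explain output, not recomputed

# Doc 4992 (the Erratum): only the term "22" matches.
w22 = term_score(freq=2.0, doc_freq=3622, max_docs=MAX_DOCS,
                 query_norm=QUERY_NORM, field_norm=0.078125)
score = w22 * (1 / 2) * (1 / 3)  # coord(1/2) inside the sub-query, then coord(1/3)
print(f"{w22:.8f} {score:.9f}")
```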
  4. Egghe, L.; Rousseau, R.; Hooydonk, G. van: Methods for accrediting publications to authors or countries : consequences for evaluation studies (2000) 0.01
    0.00880035 = product of:
      0.026401049 = sum of:
        0.026401049 = weight(_text_:to in 4384) [ClassicSimilarity], result of:
          0.026401049 = score(doc=4384,freq=14.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.3188683 = fieldWeight in 4384, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.046875 = fieldNorm(doc=4384)
      0.33333334 = coord(1/3)
    
    Abstract
    One aim of science evaluation studies is to determine quantitatively the contribution of different players (authors, departments, countries) to the whole system. This information is then used to study the evolution of the system, for instance to gauge the results of special national or international programs. Taking articles as our basic data, we want to determine the exact relative contribution of each coauthor or each country. These numbers are brought together to obtain country scores, or department scores, etc. It turns out, as we will show in this article, that different scoring methods can yield totally different rankings. Consequently, a ranking between countries, universities, research groups or authors, based on one particular accrediting method, does not contain an absolute truth about their relative importance.
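    To see how the choice of accrediting method can reorder a ranking, here is a minimal sketch comparing two common schemes: total counting, where each country on a paper gets full credit, and fractional counting, where each of k authors gets 1/k. The article examines several methods, not necessarily these two in exactly this form, and the papers below are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical papers: one country code per coauthor.
papers = [
    ["BE", "US", "US", "US"],
    ["BE", "US", "US", "US"],
    ["BE", "NL", "NL", "NL"],
    ["NL"],
]

def total_counting(papers):
    """Every country that appears on a paper receives one full credit for it."""
    scores = defaultdict(Fraction)
    for authors in papers:
        for country in set(authors):
            scores[country] += 1
    return dict(scores)

def fractional_counting(papers):
    """Each of the k authors of a paper receives 1/k; credits are summed per country."""
    scores = defaultdict(Fraction)
    for authors in papers:
        share = Fraction(1, len(authors))
        for country in authors:
            scores[country] += share
    return dict(scores)

def ranking(scores):
    """Countries by decreasing score; ties broken alphabetically."""
    return sorted(scores, key=lambda c: (-scores[c], c))

print(ranking(total_counting(papers)))       # BE first
print(ranking(fractional_counting(papers)))  # BE last
```

    With total counting BE ranks first (3 papers); with fractional counting it drops to last (3/4 of a paper), because it always appears as a single coauthor among several.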
  5. Egghe, L.; Bornmann, L.: Fallout and miss in journal peer review (2013) 0.01
    0.0077611795 = product of:
      0.023283537 = sum of:
        0.023283537 = weight(_text_:to in 1759) [ClassicSimilarity], result of:
          0.023283537 = score(doc=1759,freq=8.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.28121543 = fieldWeight in 1759, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1759)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - The authors exploit the analogy between journal peer review and information retrieval in order to quantify some imperfections of journal peer review. Design/methodology/approach - The authors define fallout rate and missing rate in order to describe quantitatively the weak papers that were accepted and the strong papers that were missed, respectively. To assess the quality of manuscripts the authors use bibliometric measures. Findings - Fallout rate and missing rate are put in relation with the hitting rate and success rate. Conclusions are drawn on what fraction of weak papers will be accepted in order to have a certain fraction of strong accepted papers. Originality/value - The paper illustrates that these curves are new in peer review research when interpreted in the information retrieval terminology.
  6. Egghe, L.: Mathematical theories of citation (1998) 0.01
    0.0076815756 = product of:
      0.023044726 = sum of:
        0.023044726 = weight(_text_:to in 5125) [ClassicSimilarity], result of:
          0.023044726 = score(doc=5125,freq=6.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.2783311 = fieldWeight in 5125, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.0625 = fieldNorm(doc=5125)
      0.33333334 = coord(1/3)
    
    Abstract
    Focuses on possible mathematical theories of citation and on the intrinsic problems related to it. Sheds light on aspects of mathematical complexity as encountered in, for example, fractal theory and Mandelbrot's law. Also discusses dynamical aspects of citation theory as reflected in evolutions of journal rankings, centres of gravity or of the set of source journals. Makes some comments in this connection on growth and obsolescence.
    Footnote
    Contribution to a thematic issue devoted to 'Theories of citation?'
  7. Egghe, L.: A model for the size-frequency function of coauthor pairs (2008) 0.01
    0.0074376534 = product of:
      0.02231296 = sum of:
        0.02231296 = weight(_text_:to in 2366) [ClassicSimilarity], result of:
          0.02231296 = score(doc=2366,freq=10.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.26949292 = fieldWeight in 2366, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.046875 = fieldNorm(doc=2366)
      0.33333334 = coord(1/3)
    
    Abstract
    Lotka's law was formulated to describe the number of authors with a certain number of publications. Empirical results (Morris & Goldstein, 2007) indicate that Lotka's law is also valid if one counts the number of publications of coauthor pairs. This article gives a simple model proving this to be true, with the same Lotka exponent, if the number of coauthored papers is proportional to the number of papers of the individual coauthors. Under the assumption that this number of coauthored papers is more than proportional to the number of papers of the individual authors (to be explained in the article), we can prove that the size-frequency function of coauthor pairs is Lotkaian with an exponent that is higher than that of the Lotka function of individual authors, a fact that is confirmed in experimental results.
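    Under a continuous (density) reading, the proportional case can be seen through a change of variables: if individual authors follow f(n) = C·n^(-α) and each coauthor pair has m = K·n joint papers for a constant K, then the pair density is g(m) = f(m/K)/K = C·K^(α-1)·m^(-α), with the same exponent α. The sketch below checks this numerically with a log-log least-squares slope. It is an illustration under these assumptions, not necessarily the article's model; C, α and K are hypothetical, and the "more than proportional" case is not covered.

```python
import math

def lotka(n, c, alpha):
    """Lotka size-frequency function f(n) = C / n**alpha."""
    return c / n ** alpha

def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) on log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

ALPHA, C = 2.0, 1000.0   # hypothetical Lotka parameters for individual authors
K = 0.3                  # hypothetical proportionality: pair papers m = K * n

ns = [1, 2, 4, 8, 16, 32]
authors = [lotka(n, C, ALPHA) for n in ns]
ms = [K * n for n in ns]
# Change of variables for densities: g(m) = f(m / K) / K.
pairs = [lotka(m / K, C, ALPHA) / K for m in ms]

print(-loglog_slope(ns, authors), -loglog_slope(ms, pairs))  # both ~2.0
```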
  8. Egghe, L.; Ravichandra Rao, I.K.: Duality revisited : construction of fractional frequency distributions based on two dual Lotka laws (2002) 0.01
    0.006652439 = product of:
      0.019957317 = sum of:
        0.019957317 = weight(_text_:to in 1006) [ClassicSimilarity], result of:
          0.019957317 = score(doc=1006,freq=8.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.24104178 = fieldWeight in 1006, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.046875 = fieldNorm(doc=1006)
      0.33333334 = coord(1/3)
    
    Abstract
    Fractional frequency distributions of, for example, authors with a certain (fractional) number of papers are very irregular and, therefore, not easy to model or to explain. This article makes a first attempt at this by assuming two simple Lotka laws (with exponent 2): one for the number of authors with n papers (total count here) and one for the number of papers with n authors, n ∈ ℕ. Based on an earlier convolution model of Egghe, interpreted and reworked now for discrete scores, we are able to produce theoretical fractional frequency distributions with only one parameter, which are in very close agreement with the practical ones as found in a large dataset produced earlier by Rao. The article also shows that (irregular) fractional frequency distributions are a consequence of Lotka's law, and are not examples of breakdowns of this famous historical law.
  9. Egghe, L.: Relations between the continuous and the discrete Lotka power function (2005) 0.01
    0.006652439 = product of:
      0.019957317 = sum of:
        0.019957317 = weight(_text_:to in 3464) [ClassicSimilarity], result of:
          0.019957317 = score(doc=3464,freq=8.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.24104178 = fieldWeight in 3464, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.046875 = fieldNorm(doc=3464)
      0.33333334 = coord(1/3)
    
    Abstract
    The discrete Lotka power function describes the number of sources (e.g., authors) with n = 1, 2, 3, ... items (e.g., publications). As in econometrics, informetric theory requires functions of a continuous variable j, replacing the discrete variable n. Now j represents item densities instead of numbers of items. The continuous Lotka power function describes the density of sources with item density j. The discrete Lotka function is the one obtained empirically from data; the continuous Lotka function is the one needed when one wants to apply Lotkaian informetrics, i.e., to determine properties that can be derived from the (continuous) model. It is, hence, important to know the relations between the two models. We show that the exponents of the discrete Lotka function (if not too high, i.e., within limits encountered in practice) and of the continuous Lotka function are approximately the same. This is important to know when applying theoretical results (from the continuous model) to practical data.
  10. Egghe, L.: New relations between similarity measures for vectors based on vector norms (2009) 0.01
    0.006652439 = product of:
      0.019957317 = sum of:
        0.019957317 = weight(_text_:to in 2708) [ClassicSimilarity], result of:
          0.019957317 = score(doc=2708,freq=8.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.24104178 = fieldWeight in 2708, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.046875 = fieldNorm(doc=2708)
      0.33333334 = coord(1/3)
    
    Abstract
    The well-known similarity measures Jaccard, Salton's cosine, Dice, and several related overlap measures for vectors are compared. While general relations cannot be proved, we study these measures on the trajectories of the form ‖X‖ = a‖Y‖, where a > 0 is a constant and ‖·‖ denotes the Euclidean norm of a vector. In this case, direct functional relations between these measures are proved. For Jaccard, we prove that it is a convexly increasing function of Salton's cosine measure, but always smaller than or equal to the latter, thereby explaining a curve experimentally found by Leydesdorff. All the other measures have a linear relation with Salton's cosine, reducing even to equality when a = 1. Hence, for equally normed vectors (e.g., for normalized vectors) we essentially have only Jaccard's measure and Salton's cosine measure, since all the other measures are equal to the latter.
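    The abstract does not spell out the vector forms of the measures. The sketch below assumes the usual ones: cosine ⟨x,y⟩/(‖x‖‖y‖), Tanimoto-style Jaccard ⟨x,y⟩/(‖x‖² + ‖y‖² - ⟨x,y⟩) and Dice 2⟨x,y⟩/(‖x‖² + ‖y‖²). Substituting ‖x‖ = a‖y‖ gives Dice = 2a/(1+a²)·cos, which is linear in the cosine, and Jaccard = cos/((1+a²)/a - cos), which is convexly increasing and never exceeds the cosine for nonnegative vectors. For a = 1 these reduce to Dice = cos and Jaccard = cos/(2 - cos). The code checks both identities numerically on two example vector pairs.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def cosine(x, y):
    """Salton's cosine: <x,y> / (||x|| ||y||)."""
    return dot(x, y) / math.sqrt(dot(x, x) * dot(y, y))

def jaccard(x, y):
    """Vector (Tanimoto-style) Jaccard: <x,y> / (||x||^2 + ||y||^2 - <x,y>)."""
    xy = dot(x, y)
    return xy / (dot(x, x) + dot(y, y) - xy)

def dice(x, y):
    """Vector Dice: 2<x,y> / (||x||^2 + ||y||^2)."""
    return 2 * dot(x, y) / (dot(x, x) + dot(y, y))

def jaccard_from_cosine(cos, a):
    """Jaccard as a function of cosine on the trajectory ||x|| = a ||y||."""
    return cos / ((1 + a * a) / a - cos)

def dice_from_cosine(cos, a):
    """Dice as a linear function of cosine on the trajectory ||x|| = a ||y||."""
    return 2 * a / (1 + a * a) * cos

x = [3.0, 4.0, 0.0]  # ||x|| = 5
cases = [([0.0, 3.0, 4.0], 1.0), ([0.0, 6.0, 8.0], 0.5)]  # ||y|| = 5 and 10
for y, a in cases:
    c = cosine(x, y)
    print(a, c, dice(x, y), dice_from_cosine(c, a), jaccard(x, y), jaccard_from_cosine(c, a))
```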
  11. Egghe, L.; Guns, R.: Applications of the generalized law of Benford to informetric data (2012) 0.01
    0.006652439 = product of:
      0.019957317 = sum of:
        0.019957317 = weight(_text_:to in 376) [ClassicSimilarity], result of:
          0.019957317 = score(doc=376,freq=8.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.24104178 = fieldWeight in 376, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.046875 = fieldNorm(doc=376)
      0.33333334 = coord(1/3)
    
    Abstract
    In a previous work (Egghe, 2011), the first author showed that Benford's law (describing the logarithmic distribution of the numbers 1, 2, ..., 9 as first digits of data in decimal form) is related to the classical law of Zipf with exponent 1. The work of Campanario and Coslado (2011), however, shows that Benford's law does not always fit practical data in a statistical sense. In this article, we use a generalization of Benford's law related to the general law of Zipf with exponent β > 0. Using data from Campanario and Coslado, we apply nonlinear least squares to determine the optimal β and show that this generalized law of Benford fits the data better than the classical law of Benford.
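    The abstract does not give the generalized formula. The sketch below assumes one common generalization, obtained by letting the first-digit mass follow a density proportional to x^(-β) on [1, 10). It reduces to classical Benford, log10(1 + 1/d), as β → 1, but it may not match Egghe and Guns's exact formula. The fit uses a plain grid search on the sum of squared errors as a simple stand-in for nonlinear least squares, and the "observed" shares are synthetic, not Campanario and Coslado's data.

```python
import math

def benford(d):
    """Classical Benford probability that the first significant digit is d (1..9)."""
    return math.log10(1 + 1 / d)

def generalized_benford(d, beta):
    """Assumed generalization: first-digit mass of a density proportional to x**(-beta) on [1, 10).

    P(d) = ((d + 1)**(1 - beta) - d**(1 - beta)) / (10**(1 - beta) - 1), with the
    classical law as the limiting case beta = 1.
    """
    if math.isclose(beta, 1.0):
        return benford(d)
    e = 1.0 - beta
    return ((d + 1) ** e - d ** e) / (10 ** e - 1)

def fit_beta(observed, grid=None):
    """Least-squares fit of beta to nine observed first-digit shares by grid search."""
    grid = grid or [i / 100 for i in range(1, 301)]  # beta in (0, 3]

    def sse(beta):
        return sum((observed[d - 1] - generalized_benford(d, beta)) ** 2 for d in range(1, 10))

    return min(grid, key=sse)

# Synthetic shares generated with beta = 1.3 (not real data); the fit recovers it.
shares = [generalized_benford(d, 1.3) for d in range(1, 10)]
print(fit_beta(shares))  # 1.3
```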
  12. Egghe, L.: Note on a possible decomposition of the h-Index (2013) 0.01
    0.006652439 = product of:
      0.019957317 = sum of:
        0.019957317 = weight(_text_:to in 683) [ClassicSimilarity], result of:
          0.019957317 = score(doc=683,freq=2.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.24104178 = fieldWeight in 683, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.09375 = fieldNorm(doc=683)
      0.33333334 = coord(1/3)
    
    Series
    Letter to the editor
  13. Rousseau, R.; Egghe, L.; Guns, R.: Becoming metric-wise : a bibliometric guide for researchers (2018) 0.01
    0.0061980444 = product of:
      0.018594133 = sum of:
        0.018594133 = weight(_text_:to in 5226) [ClassicSimilarity], result of:
          0.018594133 = score(doc=5226,freq=10.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.22457743 = fieldWeight in 5226, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5226)
      0.33333334 = coord(1/3)
    
    Abstract
    Aims to inform researchers about metrics so that they become aware of the evaluative techniques being applied to their scientific output. Understanding these concepts will help them during their funding initiatives, and in hiring and tenure. The book not only describes what indicators do (or are designed to do, which is not always the same thing), but also gives precise mathematical formulae so that indicators can be properly understood and evaluated. Metrics have become a critical issue in science, with widespread international discussion taking place on the subject across scientific journals and organizations. As researchers should know the publication-citation context, the mathematical formulae of indicators being used by evaluating committees and their consequences, and how such indicators might be misused, this book provides an ideal tome on the topic. Provides researchers with a detailed understanding of bibliometric indicators and their applications. Empowers researchers looking to understand the indicators relevant to their work and careers. Presents an informed and rounded picture of bibliometrics, including the strengths and shortcomings of particular indicators. Supplies the mathematics behind bibliometric indicators so they can be properly understood. Written by authors with longstanding expertise who are considered global leaders in the field of bibliometrics
  14. Egghe, L.; Rousseau, R.; Rousseau, S.: TOP-curves (2007) 0.01
    0.005487982 = product of:
      0.016463947 = sum of:
        0.016463947 = weight(_text_:to in 50) [ClassicSimilarity], result of:
          0.016463947 = score(doc=50,freq=4.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.19884932 = fieldWeight in 50, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.0546875 = fieldNorm(doc=50)
      0.33333334 = coord(1/3)
    
    Abstract
    Several characteristics of classical Lorenz curves make them unsuitable for the study of a group of top performers. TOP-curves, defined as a kind of mirror image of TIP-curves used in poverty studies, are shown to possess the properties necessary for adequate empirical ranking of various data arrays, based on the properties of the highest performers (i.e., the core). TOP-curves and essential TOP-curves, also introduced in this article, simultaneously represent the incidence, intensity, and inequality among the top. It is shown that the TOP-dominance partial order, introduced in this article, is stronger than the Lorenz dominance order. In this way, this article contributes to the study of cores, a central issue in applied informetrics.
  15. Egghe, L.: The influence of transformations on the h-index and the g-index (2008) 0.01
    0.005487982 = product of:
      0.016463947 = sum of:
        0.016463947 = weight(_text_:to in 1881) [ClassicSimilarity], result of:
          0.016463947 = score(doc=1881,freq=4.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.19884932 = fieldWeight in 1881, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1881)
      0.33333334 = coord(1/3)
    
    Abstract
    In a previous article, we introduced a general transformation on sources and one on items in an arbitrary information production process (IPP). In this article, we investigate the influence of these transformations on the h-index and on the g-index. General formulae that describe this influence are presented. These are applied to the case where the size-frequency function is Lotkaian (i.e., is a decreasing power function). We further show that the h-index of the transformed IPP belongs to the interval bounded by the two transformations of the h-index of the original IPP, and we also show that this property is not true for the g-index.
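    The abstract takes the usual definitions of the two indices for granted. A minimal sketch of them follows, with hypothetical citation counts. The source and item transformations studied in the article are not reproduced, and this version caps g at the number of papers (some authors pad with uncited papers so that g can exceed it).

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # Once c < rank in a non-increasing list it stays false, so counting hits gives h.
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

def g_index(citations):
    """Largest g such that the g most-cited papers have at least g**2 citations in total
    (here g is capped at the number of papers)."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g

cites = [10, 8, 5, 4, 3, 0]  # hypothetical citation counts
print(h_index(cites), g_index(cites))  # 4 5
```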
  16. Egghe, L.; Liang, L.; Rousseau, R.: Fundamental properties of rhythm sequences (2008) 0.01
    0.005487982 = product of:
      0.016463947 = sum of:
        0.016463947 = weight(_text_:to in 1965) [ClassicSimilarity], result of:
          0.016463947 = score(doc=1965,freq=4.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.19884932 = fieldWeight in 1965, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1965)
      0.33333334 = coord(1/3)
    
    Abstract
    Fundamental mathematical properties of rhythm sequences are studied. In particular, a set of three axioms for valid rhythm indicators is proposed, and it is shown that the R-indicator satisfies only two out of three but that the R′-indicator satisfies all three. This fills a critical, logical gap in the study of these indicator sequences. Matrices leading to a constant R-sequence are called baseline matrices. They are characterized as matrices with constant w-year diachronous impact factors. The relation with classical impact factors is clarified. Using regression analysis, matrices with a rhythm sequence that is on average equal to 1 (smaller than 1, larger than 1) are characterized.
  17. Egghe, L.: The power of power laws and an interpretation of Lotkaian informetric systems as self-similar fractals (2005) 0.00
    0.0048009846 = product of:
      0.014402954 = sum of:
        0.014402954 = weight(_text_:to in 3466) [ClassicSimilarity], result of:
          0.014402954 = score(doc=3466,freq=6.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.17395693 = fieldWeight in 3466, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3466)
      0.33333334 = coord(1/3)
    
    Abstract
    Power laws as defined in 1926 by A. Lotka are increasing in importance because they have been found valid in varied social networks including the Internet. In this article some unique properties of power laws are proven. They are shown to characterize functions with the scale-free property (also called self-similarity property) as well as functions with the product property. Power laws have other desirable properties that are not shared by exponential laws, as we indicate in this paper. Specifically, Naranan (1970) proves the validity of Lotka's law based on the exponential growth of articles in journals and of the number of journals. His argument is reproduced here and a discrete-time argument is also given, yielding the same law as that of Lotka. This argument makes it possible to interpret the information production process as a self-similar fractal and to show the relation between Lotka's exponent and the (self-similar) fractal dimension of the system. Lotkaian informetric systems are self-similar fractals, a fact revealed by Mandelbrot (1977) in relation to nature that also holds for random texts, a very special type of informetric system.
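    The scale-free property mentioned above says that f(c·x) = g(c)·f(x): rescaling the argument only multiplies the function by a factor that does not depend on x. The sketch below checks this numerically for one power law and contrasts it with an exponential law. It also checks the product property f(x·y) = f(x)·f(y) for a power law with constant 1. It illustrates the properties and does not reproduce the paper's characterization proofs, and the parameter values are arbitrary.

```python
import math

def power_law(x, c=1.0, alpha=2.0):
    return c * x ** (-alpha)

def exponential_law(x, c=1.0, lam=0.5):
    return c * math.exp(-lam * x)

def scale_ratio(f, x, factor):
    """f(factor * x) / f(x); for a scale-free f this does not depend on x."""
    return f(factor * x) / f(x)

xs = [1.0, 2.0, 5.0]
print([round(scale_ratio(power_law, x, 3.0), 6) for x in xs])        # constant 3**-2
print([round(scale_ratio(exponential_law, x, 3.0), 6) for x in xs])  # exp(-x): varies with x

# Product property with c = 1: f(x * y) = f(x) * f(y).
print(math.isclose(power_law(6.0), power_law(2.0) * power_law(3.0)))  # True
```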
  18. Egghe, L.: Untangling Herdan's law and Heaps' law : mathematical and informetric arguments (2007) 0.00
    0.0048009846 = product of:
      0.014402954 = sum of:
        0.014402954 = weight(_text_:to in 271) [ClassicSimilarity], result of:
          0.014402954 = score(doc=271,freq=6.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.17395693 = fieldWeight in 271, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.0390625 = fieldNorm(doc=271)
      0.33333334 = coord(1/3)
    
    Abstract
    Herdan's law in linguistics and Heaps' law in information retrieval are different formulations of the same phenomenon. Stated briefly and in linguistic terms, they say that vocabulary size is a concavely increasing power law of text size. This study investigates these laws from a purely mathematical and informetric point of view. A general informetric argument shows that the problem of proving these laws is, in fact, ill-posed. Using the more general terminology of sources and items, the author shows by presenting exact formulas from Lotkaian informetrics that the total number T of sources is not only a function of the total number A of items, but is also a function of several parameters (e.g., the parameters occurring in Lotka's law). Consequently, it is shown that a fixed T (or A) value can lead to different possible A (respectively, T) values. Limiting the T(A)-variability to increasing samples (e.g., in a text as done in linguistics), the author then shows, in a purely mathematical way, that for large sample sizes T ~ A^φ, where φ is a constant with φ < 1 but close to 1; hence, roughly, Heaps' or Herdan's law can be proved without using any linguistic or informetric argument. The author also shows that for smaller samples, φ is not a constant but essentially decreases, as confirmed by practical examples. Finally, an exact informetric argument on random sampling in the items shows that, in most cases, T = T(A) is a concavely increasing function, in accordance with practical examples.
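    Two points of this abstract can be illustrated numerically: T grows concavely with A, roughly like A^φ with φ < 1, and the same A gives different T when a distribution parameter changes. The sketch assumes tokens are independent draws from a Zipf distribution over a finite vocabulary, which is a hypothetical model and not the author's Lotkaian derivation. It computes the expected vocabulary size E[T(A)] = Σ_r (1 - (1 - p_r)^A) exactly, so no random sampling is involved, and estimates φ by a log-log least-squares slope.

```python
import math

def zipf_probs(vocab_size, exponent):
    """Word probabilities p_r proportional to r**(-exponent), r = 1..vocab_size."""
    weights = [r ** (-exponent) for r in range(1, vocab_size + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_vocabulary(probs, text_len):
    """Expected number T of distinct words in A = text_len independent draws."""
    return sum(1.0 - (1.0 - p) ** text_len for p in probs)

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) on log(x), i.e. phi in T ~ A**phi."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    return (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
            / sum((a - mx) ** 2 for a in lx))

probs = zipf_probs(50_000, 1.0)                 # hypothetical vocabulary model
lengths = [1_000 * 2 ** k for k in range(8)]    # A = 1,000 ... 128,000 tokens
sizes = [expected_vocabulary(probs, a) for a in lengths]
phi = loglog_slope(lengths, sizes)
print(f"phi ~ {phi:.3f}")

# Same text length, different distribution parameter: a different T.
steeper = expected_vocabulary(zipf_probs(50_000, 1.2), 10_000)
print(round(expected_vocabulary(probs, 10_000)), round(steeper))
```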
  19. Egghe, L.; Ravichandra Rao, I.K.: The influence of the broadness of a query of a topic on its h-index : models and examples of the h-index of n-grams (2008) 0.00
    0.0048009846 = product of:
      0.014402954 = sum of:
        0.014402954 = weight(_text_:to in 2009) [ClassicSimilarity], result of:
          0.014402954 = score(doc=2009,freq=6.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.17395693 = fieldWeight in 2009, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2009)
      0.33333334 = coord(1/3)
    
    Abstract
    The article studies the influence of the query formulation of a topic on its h-index. In order to generate purely random sets of documents, we used N-grams (N variable) to measure this influence: strings of zeros, truncated at the end. The databases used are WoS and Scopus. The formula h = T^(1/α), proved in Egghe and Rousseau (2006), where T is the number of retrieved documents and α is Lotka's exponent, is confirmed, being a concavely increasing function of T. We also give a formula for the relation between h and N, the length of the N-gram: h = D·10^(-N/α), where D is a constant; this convexly decreasing function is found in our experiments. Nonlinear regression on h = T^(1/α) gives an estimate of α, which can then be used to estimate the h-index of the entire database (Web of Science [WoS] and Scopus): h = S^(1/α), where S is the total number of documents in the database.
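    The formulas in this abstract can be exercised with a short sketch. The article uses nonlinear regression; here a simpler log-linear least-squares fit through the origin, log h = (1/α)·log T, is used instead, on noise-free hypothetical data, so the estimated α is exact by construction. The N-gram formula is consistent with h = T^(1/α) if one assumes the hit count shrinks by a factor of 10 per extra digit, T = c·10^(-N), which gives D = c^(1/α). That assumption is made only for this illustration.

```python
import math

def h_from_size(t, alpha):
    """h = T**(1/alpha), with T the number of retrieved documents."""
    return t ** (1.0 / alpha)

def h_from_ngram_length(n, d, alpha):
    """h = D * 10**(-N/alpha) for an N-gram query."""
    return d * 10 ** (-n / alpha)

def estimate_alpha(sizes, h_values):
    """Least squares through the origin on log h = (1/alpha) * log T."""
    lt = [math.log(t) for t in sizes]
    lh = [math.log(h) for h in h_values]
    inv_alpha = sum(a * b for a, b in zip(lt, lh)) / sum(a * a for a in lt)
    return 1.0 / inv_alpha

# Noise-free hypothetical data generated with alpha = 2.5.
sizes = [100, 1_000, 10_000, 100_000]
hs = [h_from_size(t, 2.5) for t in sizes]
alpha_hat = estimate_alpha(sizes, hs)
S = 1_000_000  # hypothetical database size
print(round(alpha_hat, 6), round(h_from_size(S, alpha_hat), 2))
```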
  20. Egghe, L.: Mathematical study of h-index sequences (2009) 0.00
    0.0048009846 = product of:
      0.014402954 = sum of:
        0.014402954 = weight(_text_:to in 4217) [ClassicSimilarity], result of:
          0.014402954 = score(doc=4217,freq=6.0), product of:
            0.08279609 = queryWeight, product of:
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.045541126 = queryNorm
            0.17395693 = fieldWeight in 4217, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.818051 = idf(docFreq=19512, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4217)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper studies mathematical properties of h-index sequences as developed by Liang [Liang, L. (2006). h-Index sequence and h-index matrix: Constructions and applications. Scientometrics, 69(1), 153-159]. For practical reasons, Liang studies such sequences where time goes backwards, while it is more logical to let time go forward (real career periods). Both types of h-index sequences are studied here and their interrelations are revealed. We show cases where these sequences are convex, linear and concave. We also show that, when one of the sequences is convex, the other one is concave, showing that the reverse-time sequence, in general, cannot be used to derive similar properties of the (difficult to obtain) forward-time sequence. We show that both sequences are the same if and only if the author produces the same number of papers per year. If the author produces an increasing number of papers per year, then Liang's h-sequences are above the "normal" ones. All these results are also valid for g- and R-sequences. The results are confirmed by the h-, g- and R-sequences (forward and reverse time) of the author.