Modeling citation concentration through a mixture of Leimkuhler curves
Abstract: Plotting the cumulative percentage of total citations to articles, ordered from most cited to least cited, against the cumulative percentage of articles yields a Leimkuhler curve. In this study, we observe that standard Leimkuhler functions may not fit various empirical informetric data accurately. We therefore introduce a new approach to Leimkuhler curves by fitting a known probability density function to the initial Leimkuhler curve, taking into account the presence of a heterogeneity factor. As a contribution to the existing literature, we introduce a pair of mixture distributions (called PG and PIG) to bibliometrics. In addition, we present closed-form expressions for the Leimkuhler curves. Some measures of citation concentration are examined empirically for the basic models (based on the Power and Pareto distributions) and the mixed models derived from these. An application to two sources of informetric data examines how the mixture models outperform the standard basic models. The different models were fitted using non-linear least squares estimation.
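To make the construction in the abstract concrete, the following minimal Python sketch builds the empirical Leimkuhler points from a list of citation counts and fits a one-parameter basic model by least squares. It is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the Pareto form K(p) = p^(1 - 1/alpha) is the standard Leimkuhler curve of a classical Pareto distribution with shape alpha > 1, and a crude grid search stands in for a proper non-linear least squares routine. The PG and PIG mixture curves are not reproduced here because the abstract does not state their expressions.

```python
def leimkuhler_points(citations):
    """Empirical Leimkuhler points (p, K): cumulative share of articles
    versus cumulative share of citations, articles ranked from most
    cited to least cited."""
    ranked = sorted(citations, reverse=True)
    n = len(ranked)
    total = sum(ranked)
    if n == 0 or total == 0:
        raise ValueError("need at least one article and one citation")
    points = [(0.0, 0.0)]
    running = 0
    for i, count in enumerate(ranked, start=1):
        running += count
        points.append((i / n, running / total))
    return points


def pareto_leimkuhler(p, alpha):
    """Leimkuhler curve of a classical Pareto distribution (alpha > 1)."""
    return p ** (1.0 - 1.0 / alpha)


def sum_squared_errors(points, alpha):
    """Least squares objective between empirical and model curve."""
    return sum((k - pareto_leimkuhler(p, alpha)) ** 2 for p, k in points)


def fit_alpha(points, lo=1.01, hi=20.0, steps=2000):
    """Grid search over alpha (assumed range); a simple stand-in for a
    non-linear least squares solver."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda a: sum_squared_errors(points, a))


# Hypothetical citation counts for five articles.
example_counts = [10, 5, 3, 1, 1]
example_points = leimkuhler_points(example_counts)
example_alpha = fit_alpha(example_points)
```

Because the articles are ranked from most to least cited, every empirical point lies on or above the diagonal, which is what distinguishes a Leimkuhler curve from the corresponding Lorenz curve. In practice a dedicated solver would replace the grid search, and the same objective could be reused for any candidate curve with a closed-form expression.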