Dispersion Modeling in Zero-inflated Tweedie Models with Applications to Insurance Claim Data Analysis
Abstract: Tweedie generalized linear models are commonly applied in the insurance industry to analyze semicontinuous claim data. To better predict the aggregate claim size, the mean and dispersion of the Tweedie model are often estimated jointly using double generalized linear models. Some actuarial applications exhibit an excessive percentage of zeros, which often degrades the performance of the Tweedie model. The zero-inflated Tweedie model, which draws inspiration from the zero-inflated Poisson model, has recently been considered in the literature. In this article, we consider modeling the dispersion of the Tweedie state in the zero-inflated Tweedie model, in addition to modeling the mean. We also model the probability of the zero state, with estimation based on the generalized expectation-maximization algorithm. To incorporate potential nonlinear and interaction effects of the covariates, we estimate the mean, dispersion, and zero-state probability using decision-tree-based gradient boosting. We conduct extensive numerical studies to demonstrate the improved performance of our method over existing ones.
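To make the zero-inflated structure concrete, the following minimal sketch shows how a zero observation can be attributed to either the zero state or the Tweedie state, which is the quantity an expectation-maximization E-step would compute. It is an illustration under stated assumptions, not the authors' implementation: the symbols `q` (zero-state probability), `mu` (mean), `phi` (dispersion), and `p` (power parameter), as well as all function names, are hypothetical and do not come from the abstract. It relies on the standard fact that a compound Poisson Tweedie variable with 1 < p < 2 has a point mass at zero equal to exp(-mu^(2-p) / (phi (2-p))).

```python
import math


def tweedie_zero_prob(mu, phi, p):
    """P(Y = 0) for a compound Poisson Tweedie variable with 1 < p < 2."""
    if not 1.0 < p < 2.0:
        raise ValueError("power parameter p must lie in (1, 2)")
    # Rate of the underlying Poisson claim count.
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))
    return math.exp(-lam)


def zi_zero_prob(q, mu, phi, p):
    """P(Y = 0) under the mixture: zero state w.p. q, Tweedie state w.p. 1 - q."""
    return q + (1.0 - q) * tweedie_zero_prob(mu, phi, p)


def e_step_weight(y, q, mu, phi, p):
    """Posterior probability that observation y came from the zero state."""
    if y > 0:
        # A positive claim can only come from the Tweedie state.
        return 0.0
    return q / zi_zero_prob(q, mu, phi, p)
```

In a full fitting procedure these weights would be recomputed at each iteration, with `q`, `mu`, and `phi` each supplied per observation by a separate model (for example, a boosted tree ensemble). Note that a larger `phi` raises the Tweedie state's own probability of a zero, which is one reason modeling dispersion interacts with modeling the zero state.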