Modelling Road Accident Blackspots Data with the Discrete Generalized Pareto distribution

Published 16 Dec 2013 in stat.AP | (1312.4383v1)

Abstract: This study shows how road traffic networks events, in particular road accidents on blackspots, can be modelled with simple probabilistic distributions. We considered the number of accidents and the number of deaths on Spanish blackspots in the period 2003-2007, from Spanish General Directorate of Traffic (DGT). We modelled those datasets, respectively, with the discrete generalized Pareto distribution (a discrete parametric model with three parameters) and with the discrete Lomax distribution (a discrete parametric model with two parameters, and particular case of the previous model). For that, we analyzed the basic properties of both parametric models: cumulative distribution, survival, probability mass, quantile and hazard functions, genesis and rth-order moments; applied two estimation methods of their parameters: the $\mu$ and ($\mu+1$) frequency method and the maximum likelihood method; and used two goodness-of-fit tests: Chi-square test and discrete Kolmogorov-Smirnov test based on bootstrap resampling. We found that those probabilistic models can be useful to describe the road accident blackspots datasets analyzed.


Authors (3)

Summary

The paper demonstrates that discrete generalized Pareto and Lomax models effectively capture accident and death frequencies at Spanish road blackspots.
The paper employs both a frequency method and maximum likelihood estimation to derive and refine the model parameters.
The paper validates the models using Chi-Square and discrete Kolmogorov–Smirnov tests with bootstrap, confirming a good fit for most years analyzed.

This paper investigates the use of discrete probability distributions to model road accident data specifically from "blackspots" in Spain between 2003 and 2007 (1312.4383). A blackspot is defined as a 100-meter road section experiencing three or more accidents within a year. The data, obtained from the Spanish General Directorate of Traffic (DGT), included 16,552 accidents and 895 deaths occurring at these locations over the five-year period.

The core proposal is to use two specific discrete distributions:

Discrete Generalized Pareto (DGP) distribution: To model the number of accidents per blackspot. This is a three-parameter distribution ($\alpha$: shape, $\lambda$: scale, $\mu$: location) defined by its cumulative distribution function (CDF):

$F(x) = 1 - [1 + \lambda(x - \mu + 1)]^{-\alpha}$ , for $x = \mu, \mu+1, \dots$

The paper details its properties including the survival function, probability mass function (PMF), quantile function, hazard function, and moments. The location parameter $\mu$ is taken as the minimum possible value, which is 3 for the accident data based on the blackspot definition.
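The functions listed above follow from the CDF by simple algebra. The sketch below is a minimal illustration derived from the stated CDF, not the paper's own code; the function names and the example parameter values are assumptions chosen for illustration.

```python
import math

def dgp_sf(x, alpha, lam, mu):
    """Survival function P(X > x) = [1 + lam*(x - mu + 1)]^(-alpha); 1 below the support."""
    if x < mu:
        return 1.0
    return (1.0 + lam * (x - mu + 1)) ** (-alpha)

def dgp_cdf(x, alpha, lam, mu):
    """CDF F(x) = 1 - S(x), matching the formula quoted in the text."""
    return 1.0 - dgp_sf(x, alpha, lam, mu)

def dgp_pmf(x, alpha, lam, mu):
    """PMF obtained by differencing: p(x) = S(x - 1) - S(x), zero below mu."""
    if x < mu:
        return 0.0
    return dgp_sf(x - 1, alpha, lam, mu) - dgp_sf(x, alpha, lam, mu)

def dgp_quantile(u, alpha, lam, mu):
    """Smallest integer x >= mu with F(x) >= u, for 0 < u < 1."""
    # Invert the CDF in closed form, then round up to the next integer.
    x = mu - 1 + ((1.0 - u) ** (-1.0 / alpha) - 1.0) / lam
    k = max(mu, math.ceil(x))
    # Guard against floating-point rounding at integer boundaries.
    while k > mu and dgp_cdf(k - 1, alpha, lam, mu) >= u:
        k -= 1
    while dgp_cdf(k, alpha, lam, mu) < u:
        k += 1
    return k

if __name__ == "__main__":
    a, l, m = 2.0, 0.5, 3  # illustrative values, not estimates from the paper
    print([round(dgp_pmf(x, a, l, m), 4) for x in range(m, m + 4)])
    print(dgp_quantile(0.5, a, l, m))
```

Writing the PMF as a difference of survival terms keeps the probabilities telescoping, so partial sums reproduce the CDF exactly up to rounding.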

Discrete Lomax (DLo) distribution: To model the number of deaths per blackspot. This is presented as a special case of the DGP where the location parameter $\mu=0$. It is a two-parameter distribution ($\alpha$: shape, $\lambda$: scale) with CDF:

$F(x) = 1 - [1 + \lambda(x + 1)]^{-\alpha}$ , for $x = 0, 1, \dots$

Its properties are derived from the DGP by setting $\mu=0$ . The paper also briefly discusses a one-parameter version where $\alpha=1$ .
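As a short worked step, differencing the Discrete Lomax CDF gives its PMF, and setting $\alpha=1$ collapses the PMF to a single rational term. This is derived here from the stated CDF and may differ in notation from the paper's own expressions:

$p(x) = F(x) - F(x-1) = (1 + \lambda x)^{-\alpha} - [1 + \lambda(x + 1)]^{-\alpha}$ , for $x = 0, 1, \dots$

$p(x) = \dfrac{\lambda}{(1 + \lambda x)[1 + \lambda(x + 1)]}$ , for $\alpha = 1$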

For parameter estimation (specifically $\alpha$ and $\lambda$ , assuming $\mu$ is fixed by the minimum observation), two methods are described:

$\mu$ and ( $\mu+1$ ) frequency method: This uses the observed relative frequencies of the two smallest possible values ( $\mu$ and $\mu+1$ ) and equates them to the theoretical PMF values to solve for $\alpha$ and $\lambda$ . This method provides initial estimates.
Maximum Likelihood Estimation (MLE): The standard MLE approach is used, involving maximizing the log-likelihood function. The paper provides the partial derivatives (normal equations) which require numerical methods for solving. The estimates from the frequency method are used as starting values for the numerical optimization.
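The two estimation steps can be sketched as follows. This is an illustrative implementation under stated assumptions, not the paper's procedure: the frequency equations are solved here by bisection, and the likelihood is maximized with a crude compass (pattern) search standing in for whatever numerical solver the authors applied to the normal equations. The function names are hypothetical.

```python
import math

def pmf(x, alpha, lam, mu):
    """DGP probability mass at integer x >= mu (difference of survival terms)."""
    return (1.0 + lam * (x - mu)) ** (-alpha) - (1.0 + lam * (x - mu + 1)) ** (-alpha)

def frequency_start(f0, f1):
    """Solve p(mu) = f0 and p(mu + 1) = f1 for (alpha, lam)."""
    a = 1.0 - f0   # equals (1 + lam)^(-alpha)
    b = a - f1     # equals (1 + 2*lam)^(-alpha)
    if not (0.0 < b < a < 1.0 and b > a * a):
        raise ValueError("frequencies admit no solution with alpha, lam > 0")
    ratio = math.log(b) / math.log(a)  # = log(1 + 2*lam) / log(1 + lam), lies in (1, 2)
    lo, hi = math.log(1e-10), math.log(1e10)
    for _ in range(200):               # bisection on log(lam); the ratio decreases in lam
        mid = 0.5 * (lo + hi)
        lam = math.exp(mid)
        if math.log1p(2.0 * lam) / math.log1p(lam) > ratio:
            lo = mid
        else:
            hi = mid
    lam = math.exp(0.5 * (lo + hi))
    return -math.log(a) / math.log1p(lam), lam

def log_likelihood(counts, alpha, lam, mu):
    """Log-likelihood for data given as {value: frequency}."""
    total = 0.0
    for x, c in counts.items():
        p = pmf(x, alpha, lam, mu)
        if p <= 0.0:
            return -math.inf
        total += c * math.log(p)
    return total

def mle(counts, mu, start, step=0.5, tol=1e-7, max_iter=5000):
    """Compass search on (log alpha, log lam), started from the frequency estimates."""
    theta = [math.log(start[0]), math.log(start[1])]
    best = log_likelihood(counts, math.exp(theta[0]), math.exp(theta[1]), mu)
    for _ in range(max_iter):
        if step < tol:
            break
        moved = False
        for i in (0, 1):
            for d in (step, -step):
                cand = list(theta)
                cand[i] += d
                if abs(cand[i]) > 20.0:   # keep exp() within a safe range
                    continue
                val = log_likelihood(counts, math.exp(cand[0]), math.exp(cand[1]), mu)
                if val > best:
                    theta, best, moved = cand, val, True
        if not moved:
            step /= 2.0                   # no improving neighbour: refine the mesh
    return math.exp(theta[0]), math.exp(theta[1]), best

if __name__ == "__main__":
    # Hypothetical toy counts for illustration only; these are NOT the DGT data.
    toy = {3: 556, 4: 194, 5: 90, 6: 49, 7: 30, 8: 19, 9: 15, 10: 12, 12: 15, 15: 10, 20: 6, 30: 4}
    n = sum(toy.values())
    start = frequency_start(toy[3] / n, toy[4] / n)
    print(start, mle(toy, 3, start))
```

Searching on the log scale keeps both parameters positive without explicit constraints. A gradient-based solver would normally converge faster than this stand-in.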

To assess how well the chosen distributions fit the actual data, two goodness-of-fit (GOF) tests were employed:

Chi-Square Test: The standard test comparing observed frequencies ( $O_i$ ) in bins to expected frequencies ( $E_i$ ) calculated from the fitted MLE model. Bins with expected frequencies below 5 are combined. The test statistic $\chi^2 = \sum (O_i - E_i)^2 / E_i$ is compared to a chi-square distribution with $k-r-1$ degrees of freedom (where $k$ is the number of bins after combination, $r$ is the number of estimated parameters).
Discrete Kolmogorov-Smirnov (KS) Test: This test uses the maximum absolute difference between the empirical distribution function (EDF) of the data and the CDF of the fitted model: $K_n = \sqrt{n} \max_k |F_n(k) - F(k, \hat{\vartheta}_n)|$. Because the parameters are estimated from the same data, the standard KS critical values do not apply. A parametric bootstrap is therefore used to compute the p-value: simulate many datasets from the fitted model, re-estimate the parameters for each synthetic dataset, compute $K_n$ for each, and take the proportion of synthetic $K_n$ values that exceed the $K_n$ of the original data.
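The parametric bootstrap for the discrete KS statistic can be sketched as below. This is a minimal illustration under assumptions, not the authors' code: sampling uses inverse-transform draws from the stated CDF, the re-fitting step uses a crude compass search as a stand-in optimizer (it may stop short of full convergence), and all function names and the default number of replicates are hypothetical.

```python
import math
import random
from collections import Counter

def sf(x, alpha, lam, mu):
    """P(X > x) for the DGP model; equals 1 below the support."""
    return 1.0 if x < mu else (1.0 + lam * (x - mu + 1)) ** (-alpha)

def pmf(x, alpha, lam, mu):
    return sf(x - 1, alpha, lam, mu) - sf(x, alpha, lam, mu)

def sample(n, alpha, lam, mu, rng):
    """Inverse-transform sampling: X = mu + floor(((1 - U)^(-1/alpha) - 1) / lam)."""
    return [mu + math.floor(((1.0 - rng.random()) ** (-1.0 / alpha) - 1.0) / lam)
            for _ in range(n)]

def fit(data, mu, start=(1.0, 1.0), max_iter=1000):
    """Approximate MLE of (alpha, lam) by compass search on the log scale."""
    counts = Counter(data)
    def ll(t):
        a, l = math.exp(t[0]), math.exp(t[1])
        total = 0.0
        for x, c in counts.items():
            p = pmf(x, a, l, mu)
            if p <= 0.0:
                return -math.inf
            total += c * math.log(p)
        return total
    theta, step = [math.log(start[0]), math.log(start[1])], 0.5
    best = ll(theta)
    for _ in range(max_iter):
        if step < 1e-6:
            break
        moved = False
        for i in (0, 1):
            for d in (step, -step):
                cand = list(theta)
                cand[i] += d
                if abs(cand[i]) > 20.0:   # keep exp() within a safe range
                    continue
                val = ll(cand)
                if val > best:
                    theta, best, moved = cand, val, True
        if not moved:
            step /= 2.0
    return math.exp(theta[0]), math.exp(theta[1])

def ks_stat(data, alpha, lam, mu):
    """K_n = sqrt(n) * max_k |F_n(k) - F(k)|, checked on both sides of each EDF jump."""
    n, counts = len(data), Counter(data)
    cum, worst = 0, 0.0
    for x in sorted(counts):
        worst = max(worst, abs(cum / n - (1.0 - sf(x - 1, alpha, lam, mu))))
        cum += counts[x]
        worst = max(worst, abs(cum / n - (1.0 - sf(x, alpha, lam, mu))))
    return math.sqrt(n) * worst

def bootstrap_pvalue(data, mu, n_boot=200, seed=0):
    """Share of synthetic K_n values that exceed the observed K_n."""
    rng = random.Random(seed)
    a_hat, l_hat = fit(data, mu)
    k_obs = ks_stat(data, a_hat, l_hat, mu)
    exceed = 0
    for _ in range(n_boot):
        synth = sample(len(data), a_hat, l_hat, mu, rng)
        a_b, l_b = fit(synth, mu, start=(a_hat, l_hat))  # re-estimate on each replicate
        if ks_stat(synth, a_b, l_b, mu) > k_obs:
            exceed += 1
    return k_obs, exceed / n_boot

if __name__ == "__main__":
    demo = sample(300, 2.0, 0.5, 3, random.Random(1))  # simulated data, not the DGT data
    print(bootstrap_pvalue(demo, 3, n_boot=50))
```

Re-estimating the parameters inside the loop is the step that accounts for the fitted parameters. Comparing each synthetic dataset against the original fit instead would tend to overstate the p-value.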

The results showed that the MLE parameter estimates for both distributions were generally significant for each year analyzed (2003-2007).

The Chi-Square test indicated a good fit for the DLo model (deaths) in all years and for the DGP model (accidents) in four out of five years (the fit was rejected for 2003 accidents at the 0.05 significance level).
The discrete KS test, using bootstrap p-values, indicated that neither the DGP model for accidents nor the DLo model for deaths could be rejected for any of the years studied at the 0.05 significance level.

The paper concludes that the Discrete Generalized Pareto distribution and its special case, the Discrete Lomax distribution, serve as useful and simple probabilistic models for describing the frequency of accidents and deaths, respectively, at road traffic blackspots, based on the Spanish data from 2003-2007.
