- Original Paper
- Open Access

# Autoregressive nonlinear time-series modeling of traffic fatalities in Europe

*European Transport Research Review*
**volume 3**, pages 113–127
(2011)

## Abstract

### Purpose

The objective of this paper is to provide a parsimonious model for linking motorization level with the decreasing fatality rates observed across EU countries during the last three decades.

### Methods

A macroscopic analysis of road-safety in Europe at the country level is proposed through the application of non-linear models correlating fatalities and vehicles for the period between 1970 and 2002. Given the time series nature of road safety data, these models result in auto-correlated residuals, thus violating at least one of the assumptions of non-linear regression. Autoregressive forms of the considered models that overcome these limitations and provide superior predictive capabilities are also considered.

### Results

An autoregressive log-transformed model seems to outperform the base autoregressive non-linear model in this respect. The use of these models allowed for the identification of the best and worst performing countries.

### Conclusions

The proposed models can prove useful for assessing the road safety performance of the examined countries, as well as for obtaining some insight on the current and future trends of less developed countries.

## Introduction

Road traffic injuries represent a major global public health crisis, requiring concerted efforts for effective and sustainable prevention. Worldwide, the number of people killed in road traffic accidents every year is estimated at 1.2 million, while the number of those injured could be as high as 50 million, the combined population of five of the world’s largest cities [37]. Furthermore, while the number of accidents in developed countries is falling, the total number of road traffic deaths and injuries is forecast to rise by some 65% between 2000 and 2020 unless decisive action is taken globally [36], with deaths in low-income and middle-income countries expected to increase by as much as 80% [37] as these countries grow and their traffic increases accordingly.

Macroscopic modeling can provide insight into this problem and help policy-makers in both under-developed and developing countries adjust their policies in reaction to the changing conditions. Older studies focused primarily on developed countries. Within the current research, data from countries from various parts of Europe are analyzed thus highlighting differences between countries that can be used to anticipate traffic safety trends in less developed countries. The interest of such an analysis may become more pronounced when considering that the EU includes different groups of countries with different socioeconomic characteristics presenting different road safety cultures and performances (i.e. western European countries, southern Mediterranean countries, eastern new member states) and requiring potentially different road safety measures, programmes and strategies.

Several researchers [9, 14, 24, 26], using road accident statistics, have presumed that the explanatory variables have a multiplicative effect on accidents (as opposed to e.g. additive). Henning-Hager [17] presented a non-linear regression model to express the relationship between traffic fatalities, traffic volumes and the quality of transportation supply and demand in urban areas. Qin et al. [31] showed that the relationship between crashes and the daily volume (AADT) is non-linear and varies by crash type, and is significantly different from the relationship between crashes and segment length for all crash types. A macroscopic road-safety model commonly used in the late 60s was proposed by Smeed [33] linking the number of fatalities with the number of vehicles and the population. Jacobs [18] repeated this analysis for a number of developed and developing countries using data between 1968 and 1975 while Gharaybeh [13] applied the same formula to assess the development of road safety in Jordan, relative to that of other middle-eastern and developing countries.

It should be noted, however, that many studies have criticised Smeed’s model because it only concentrates on the motorisation level of a country and ignores the impact of other variables (cf. [3, 8]). An implication of this is that effectiveness assessment of road safety measures would have little meaning because road fatalities can simply be predicted from population and vehicle numbers in any country and any year, at least at macroscopic level. Andreassen [3] criticised the model’s accuracy because there would always be a decline in traffic risk for any increase in the number of vehicles, but generally in a non-linear way, and proposed using country-specific parameters to distinguish between countries with a similar degree of motorisation. The main criticism of Andreassen, however, seems to be targeted at the way that the Smeed formula was manipulated algebraically (instead of a new regression being fit to the resulting transformation). Smeed’s formula anticipated the downward trend in the fatality rate, but not the decline in the absolute number of fatalities that occurred in the highly motorized countries in the seventies [8].

A critical review of a number of approaches for modeling road safety trends can be found in [14, 27]. Al-Haji [2] provides a review of these concerns, as well as several alternative approaches for the development of road safety models. Another useful review [10] provides a detailed analysis of the debate surrounding Smeed’s formulas and analysis. One of the conclusions is that “there is general agreement now among researchers, that models describing traffic safety developments should have time-dependent parameters.” In this paper, we contribute to this discussion by exploring the development of models that explicitly treat the temporal correlation of the road safety data. Within this alternative approach, time is not treated as an explanatory variable, but instead its negative impact (temporal serial correlation) is factored out by the use of appropriate statistical procedures in order to focus on road safety related predictors.

The comparison of time series of road safety among different countries has been an interesting research topic. Lassarre [22] applies the local linear trend model to ten European countries and uses the estimated trend and elasticities to make inference about the relationship between traffic flow and number of fatalities. Page [28] presents a statistical model to compare road mortality in OECD (Organisation for Economic Co-operation and Development) countries, combining cross-sectional and panel data. Models with several exogenous variables are developed and countries are ranked based on their road mortality level. Beenstock and Gafni [5] show that there is a relationship between the downward trend in the rate of road accidents in Israel and other countries and suggest that this reflects the international propagation of road safety technology as it is embodied in motor vehicles and road design, rather than parochial road safety policy. Van Beeck et al. [35] examine the association between prosperity and traffic accident mortality in industrialized countries in a long-term perspective (1962–1990) and find that in the long term the relation between prosperity and traffic accident mortality appears to be non-linear. Kopits and Cropper [21] use linear and log-linear forms to model region-specific trends of traffic fatality risk and per capita income growth using panel data from 1963 to 1999 for 88 countries. Abbas [1] compares the road safety of Egypt with that of other Arab nations and G-7 countries, and develops predictive models for road safety. Yannis et al. [38] fit piece-wise linear regression models to identify changes in macroscopic road accident trends. Lessons from the analysis of the past road safety patterns of developed countries provide some insight into the underlying process that relates motorization levels with personal risk and can prove to be beneficial for predicting the road safety evolution of developing countries that may have not yet reached the same breakpoints.

Taking into account the road safety macroscopic modeling background presented above, the objective of this paper is to provide a parsimonious model for linking motorization level with the decreasing fatality rates across EU countries observed during the last three decades. Models used in the late 60’s to describe the – at the time – increasing relationship between motorization and traffic fatalities were adjusted in order to describe the decreasing relationship observed in the last three decades. Time-series methods are applied to remove the temporal trends (and autocorrelation) from the modeling of traffic fatality risk, thus allowing for capturing the impact of macroscopic road safety related model parameters on traffic risk.

For that purpose, a macroscopic analysis of road safety in Europe at the country level (16 EU countries) is proposed through the application of non-linear models correlating fatalities and vehicles for the period between 1970 and 2002. Road safety trends can be attributed to various parameters, some of which can be modeled explicitly, while others may be handled indirectly. Within this analysis, the motorization level has been chosen as the single explanatory variable, as elaborate models that would include some of the other prevailing parameters (e.g. vehicle quality, traffic safety measures and regulations, intensity of police enforcement) are less macroscopic and thus fall outside the scope of this research.

## Methodology

While the linear regression model is simple (to run and interpret), elegant and efficient, many interesting processes may be more adequately modeled by non-linear models in practice. Linear regression models might have been a practical necessity in the past, but theoretical and computational developments have made the use of more elaborate (appropriate, accurate) methods practical. This can also be seen in road safety research: early work used multiple linear regression modeling (assuming normally distributed errors and homoscedasticity), but over the past two decades there has been a departure from this model. Generalized linear models (GLM) allow for some nonlinear relationships to be modeled and relax some restrictions on the distributional assumptions of linear regression [12, 25]. Many scientific and engineering processes can be described well using linear models or other relatively simple types of models, but many others are inherently nonlinear and call for non-linear models. The biggest advantage of nonlinear regression over many other techniques is the broad range of functions that can be fit.

A non-linear regression model can be written as:

$$Y_m = f(\mathbf{x}_m, \boldsymbol{\theta}) + Z_m \tag{1}$$

where f is the expectation function, x_{m} is a vector of associated regressor variables or independent variables for the mth case, Y_{m} is the dependent variable, *θ* is a vector of parameters to be estimated and Z_{m} are random disturbances. This model is of the same general form as the linear model, with the exception that the expected responses are nonlinear functions of the parameters. More formally, for non-linear models, at least one of the derivatives of the expectation function with respect to the parameters depends on at least one of the parameters. The presentation of non-linear models in the following sections relies on Bates and Watts [4]. Non-linear regression has been widely used in road-safety related research.

The Gauss-Markov assumptions from ordinary least squares (OLS) procedures (normal, i.i.d. disturbances, etc.) still apply in non-linear regression. Therefore, whenever time or distance is involved as a factor in a regression analysis, it is important to check the assumption of independent residuals. When the residuals are not independent, the model for the observations must be altered to account for dependence (e.g. moving average or autoregressive models of variable order).

Road safety data are often correlated in space or time, raising the suspicion of correlated data (and hence residuals), which violates one of the underlying assumptions (that of independent disturbances). In order to provide a clear distinction with the previously defined data *m* = 1, …, *M*, potentially correlated data are denoted by *n* = 1, …, *N*. Serial correlation of the disturbances can be detected from an ordered time series plot of the residuals versus time or from a lag plot of the residuals on the (*n*)th case versus the residuals on the (*n*-*1*)th case. If a violation of independent disturbances is detected, then the model needs to be altered to account for this. Common forms for dependence, or autocorrelation, of disturbances are moving average or autoregressive models of variable order [7].
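The lag-plot check described above can also be summarized numerically. The following is a minimal sketch, not the procedure used in this paper: it computes the sample autocorrelation of a residual series and compares the lag-1 value with an approximate 95% white-noise limit of 1.96/√N. The function names, the limit and the residual values are illustrative assumptions.

```python
import math


def autocorrelation(residuals, lag):
    """Sample autocorrelation of a residual series at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denominator = sum((r - mean) ** 2 for r in residuals)
    numerator = sum(
        (residuals[t] - mean) * (residuals[t - lag] - mean)
        for t in range(lag, n)
    )
    return numerator / denominator


def white_noise_limit(n):
    """Approximate 95% limit for white-noise autocorrelations (assumed 1.96 / sqrt(n))."""
    return 1.96 / math.sqrt(n)


# Hypothetical residuals that drift slowly from positive to negative values.
residuals = [3.0, 2.5, 2.0, 1.0, 0.5, -0.5, -1.0, -2.0, -2.5, -3.0]
r1 = autocorrelation(residuals, lag=1)

# A lag-1 autocorrelation outside the limit hints at serially correlated disturbances.
suspect_serial_correlation = abs(r1) > white_noise_limit(len(residuals))
```

A slowly drifting residual series such as this one yields a large positive lag-1 autocorrelation, which is the pattern a lag plot would show as points clustered along a rising diagonal.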

A moving average process of order 1 can be written as:

$$Z_n = \varepsilon_n - \theta_1 \varepsilon_{n-1} \tag{2}$$

while an autoregressive process of order 1 can be expressed as:

$$Z_n = \varphi Z_{n-1} + \varepsilon_n \tag{3}$$

where *θ*_{1} and *φ* are the moving average and autoregressive coefficients respectively, and *ε*_{n}, *n* = 1, 2, …, *N* are white noise terms (i.e. independent normal error terms with zero mean and constant unit variance). While both processes could be used, within this research, the autoregressive process was selected in order to account for correlated residuals.

A macroscopic road-safety model commonly used in the late 60’s is based on Smeed’s original relationship [33]:

$$\frac{F_m}{V_m} = \alpha \left(\frac{V_m}{P_m}\right)^{\beta} + Z_m \tag{4}$$

where *F* is the number of fatalities, *V* is the number of vehicles (in thousands), *P* is the population (in thousands), *m* indicates the country, *α* and *β* are model parameters to be estimated and *Z*_{m} are the disturbances. Using data for road fatalities, vehicles and population from 20 (mostly European) countries, Smeed [33] estimated the values of *α* and *β* as 0.0003 and −0.66 respectively. Jacobs [18] repeated this analysis for a number of developed and developing countries using data between 1968 and 1975 and obtained values of 0.000204 and −0.84 for *α* and *β* respectively. Gharaybeh [13] applied Smeed’s formula to assess the development of road safety in Jordan, relative to that of other middle-eastern and developing countries.
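As a small worked illustration, Smeed’s relationship (fatalities per vehicle equal to *α* times the motorization level V/P raised to the power *β*) can be evaluated with the estimates reported above. This is only a sketch: the function name and the vehicle and population figures are hypothetical and do not come from the data used in this paper.

```python
def smeed_fatalities_per_vehicle(vehicles, population, alpha=0.0003, beta=-0.66):
    """Fatalities per vehicle from a Smeed-type relationship.

    Computes alpha * (V / P) ** beta, using Smeed's reported estimates as
    defaults. `vehicles` and `population` must be in the same units.
    """
    motorization = vehicles / population
    return alpha * motorization ** beta


# Hypothetical countries (figures in thousands), chosen only for illustration.
low_motorization = smeed_fatalities_per_vehicle(vehicles=1_000, population=10_000)
high_motorization = smeed_fatalities_per_vehicle(vehicles=5_000, population=10_000)
```

Because *β* is negative, the country with the higher motorization level has the lower predicted number of fatalities per vehicle.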

In this paper, Eq. 4 forms the base model from which all the others are developed and against which they are benchmarked. Within this research, V/P was chosen as a macroscopic predictor of traffic fatalities, which can be safely calculated by the use of data available and comparable across several EU countries (Vehicles and Population). Traffic, road expenditure, driver behaviour and other road safety related parameters may also affect traffic fatalities’ trends but cannot easily be calculated in a uniform way across the EU.

### An autoregressive non-linear model

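The model to be fitted is

$$Y_n = f(\mathbf{x}_n, \boldsymbol{\theta}) + Z_n \tag{5}$$

where *Z*_{n} = *φ* *Z*_{n-1} + *ε*_{n}. In order to solve this problem by reducing it to a non-linear least squares problem, one can subtract *φ* times the equation for *Y*_{n-1} from *Y*_{n}, thus obtaining:

$$Y_n - \varphi Y_{n-1} = f(\mathbf{x}_n, \boldsymbol{\theta}) - \varphi f(\mathbf{x}_{n-1}, \boldsymbol{\theta}) + \varepsilon_n \tag{6}$$

which is equivalent to

$$Y_n = \varphi Y_{n-1} + f(\mathbf{x}_n, \boldsymbol{\theta}) - \varphi f(\mathbf{x}_{n-1}, \boldsymbol{\theta}) + \varepsilon_n \tag{7}$$

Substituting Eq. 4 into Eq. 7, the autoregressive non-linear model that corrects for temporal correlation is:

$$\frac{F_n}{V_n} = \varphi \frac{F_{n-1}}{V_{n-1}} + \alpha \left(\frac{V_n}{P_n}\right)^{\beta} - \varphi \alpha \left(\frac{V_{n-1}}{P_{n-1}}\right)^{\beta} + \varepsilon_n \tag{8}$$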
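The structure of the autoregressive non-linear model can be sketched in code. The snippet below is a hedged illustration, not the estimation routine used in this paper (which relied on R): it forms the one-step prediction *φ*·Y_{n-1} + f(x_n) − *φ*·f(x_{n-1}), with f taken as the Smeed-type term *α*·(V/P)^*β*, and the sum of squared one-step errors that a non-linear least squares routine would minimize over *α*, *β* and *φ*. All names and numbers are assumptions made for the example.

```python
def smeed_rate(motorization, alpha, beta):
    """Expectation function f: fatalities per vehicle at a given V/P."""
    return alpha * motorization ** beta


def ar1_prediction(rate_prev, motor_now, motor_prev, alpha, beta, phi):
    """One-step prediction phi*Y[n-1] + f(x[n]) - phi*f(x[n-1])."""
    return (
        phi * rate_prev
        + smeed_rate(motor_now, alpha, beta)
        - phi * smeed_rate(motor_prev, alpha, beta)
    )


def ar1_sum_of_squares(rates, motorization, alpha, beta, phi):
    """Sum of squared one-step errors; the first observation has no predecessor."""
    total = 0.0
    for n in range(1, len(rates)):
        predicted = ar1_prediction(
            rates[n - 1], motorization[n], motorization[n - 1], alpha, beta, phi
        )
        total += (rates[n] - predicted) ** 2
    return total


# Hypothetical, noise-free series generated from the base model itself.
motorization = [0.10, 0.15, 0.20, 0.25]
rates = [smeed_rate(m, 0.0003, -0.66) for m in motorization]
sse = ar1_sum_of_squares(rates, motorization, alpha=0.0003, beta=-0.66, phi=0.8)
```

With *φ* = 0 the prediction collapses to the base model, which makes explicit that the autoregressive form adds a single parameter to the original specification.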

### A log-transformed model

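The original non-linear model (Eq. 4) can be converted to a similar (but not equivalent) linear model through a simple log transformation. Taking the log of both sides of Eq. 4 (temporarily ignoring the additive error term), the following linear model is obtained:

$$\log\left(\frac{F_m}{V_m}\right) = \log \alpha + \beta \log\left(\frac{V_m}{P_m}\right) \tag{9}$$

Adding an additive error term, and writing the intercept simply as *α* (which now denotes the logarithm of the multiplicative constant of Eq. 4), the equation becomes:

$$\log\left(\frac{F_m}{V_m}\right) = \alpha + \beta \log\left(\frac{V_m}{P_m}\right) + Z_m \tag{10}$$

This equation is similar, but not equivalent to Eq. 4. The difference is in the error term. If one takes the exponent of Eq. 10, the resulting equation is:

$$\frac{F_m}{V_m} = \alpha' \left(\frac{V_m}{P_m}\right)^{\beta} \exp(Z_m) \tag{11}$$

i.e. there is a multiplicative error term (as opposed to an additive error term in Eq. 4). The log transformations lead to some more transformations of model parameters, e.g. *α*′ = exp(*α*) in Eq. 11.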
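Since the log-transformed model is linear in its parameters, it can be fitted with ordinary least squares on the logged variables. The sketch below assumes a single regressor, log(V/P), and uses the closed-form simple regression estimates; the function name and the noise-free data are illustrative assumptions rather than the paper’s own code or data.

```python
import math


def fit_log_linear(rates, motorization):
    """OLS fit of log(rate) = intercept + beta * log(motorization)."""
    xs = [math.log(m) for m in motorization]
    ys = [math.log(r) for r in rates]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    intercept = y_bar - beta * x_bar
    return intercept, beta


# Hypothetical noise-free data generated with alpha' = 0.0003 and beta = -0.66.
motorization = [0.10, 0.15, 0.20, 0.30, 0.40]
rates = [0.0003 * m ** -0.66 for m in motorization]

intercept, beta = fit_log_linear(rates, motorization)
alpha_prime = math.exp(intercept)  # back-transform the intercept to the original scale
```

The back-transformation in the last line corresponds to *α*′ = exp(*α*). With real, noisy data the two specifications would generally give different estimates, because the error enters multiplicatively rather than additively.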

### An autoregressive log-transformed model

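An autoregressive version of Eq. 10 can be constructed in a similar way to Eq. 8:

$$\log\left(\frac{F_n}{V_n}\right) = \varphi \log\left(\frac{F_{n-1}}{V_{n-1}}\right) + (1-\varphi)\,\alpha + \beta \log\left(\frac{V_n}{P_n}\right) - \varphi \beta \log\left(\frac{V_{n-1}}{P_{n-1}}\right) + \varepsilon_n \tag{12}$$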

Note that the above model (Eq. 12) is not linear in the parameters, due to the second and fourth right-hand terms (in particular (1−*φ*)·*α* and *φ*·*β*). Furthermore, unlike the model in Eq. 8 (which is also not linear in the parameters, but can easily be transformed into a linear model through taking the logarithm, as shown in Eq. 4), this model cannot be easily transformed into a linear model.

In the remainder of this research, the four models represented by Eqs. 4, 8, 10, and 12 are estimated and assessed through a variety of tests, including lack-of-fit tests and portmanteau tests. Furthermore, the predictive ability of the models has been assessed using the root mean square percent error (RMSPE) statistic [30]. In order to be able to validate the predictive ability of the estimated models, the data-set was split into an estimation part and a validation part.

All models in this research have been estimated using the R Software for Statistical Computing v. 2.11.0 [32].

## Data overview

Aggregate fatality, population and vehicle data from European countries between 1970 and 2002 have been used. Data for years 1970–1994 have been used for the model estimation and years 1995–2002 have been used for validation. Choosing different splits for the data set (e.g. setting aside fewer or more data for the validation) might lead to different results. The particular choice is based on the fact that as many data as possible should be allocated for estimation, while still keeping more than a few data-points for validation. The data have been obtained primarily from IRTAD (International Road Traffic and Accident Database). Official representatives of the countries with missing data were contacted directly, and several responses with additional data were incorporated into the database. In the end, out of the 25 countries of the enlarged EU, sufficiently complete data have been available for 16 of them, for which this model has been applied. Fatalities data refer to the 30-day definition of fatality for all countries, i.e. include all persons who died within 30 days of being involved in a traffic accident. The timeframe used in this research was decided during the Safety-Net project in 2006, when this work was initiated [34]. The presented models are general and could be applied to newer data.

The final data that were used in this research are shown in Fig. 1. The variables defined as fatalities/vehicle and vehicles/population exhibit opposite trends (the former is mostly decreasing in this time period, while the latter is in general increasing). In this application, in order to be able to compare among countries, ratios of fatalities per vehicles and vehicles per population have been used instead of absolute numbers. The vehicle ownership is increasing consistently for all countries. The fatality rates show the opposite trend, i.e. they all decrease, especially in the earlier years.

(Linear and nonlinear) least-squares regression assumes that the disturbances follow a normal distribution and estimates the parameters by minimizing the sum of squared residuals. Outliers can have a dominant effect in this process and therefore can be of particular interest in this analysis. On the other hand, one needs to be very cautious about removing data points that are suspected outliers, as this process can also artificially affect the model properties.

## Results and main diagnostics

The model presented in Eq. 4 was estimated for the 16 countries mentioned above and the estimated coefficients and statistics are shown in Table 1. All parameters are very significant.

Figure 2 shows the main diagnostics for the estimated models, as per Eqs. 4, 8, 10 and 12. Indicative results are shown for two of the largest European countries, namely France and Germany. For each country, the residuals per observation are plotted, followed by the autocorrelation function (ACF) and the partial ACF (PACF). Note that PACF plots start at lag 1, while ACF plots start at 0. Subfigure 2A shows the diagnostics for France for the non-linear model, while Subfigure 2E shows the same diagnostics for Germany. These results are representative of the other countries as well and suggest that the assumption of independent disturbances is violated. The residual plots suggest that residual observations depend on the previous residual. In most of the ACF plots, the correlation decays quickly and falls below the limits (computed using Bartlett’s formula and indicated with the dotted lines) after one or two intervals. Please note that lag-0 autocorrelations have a value of 1 by definition. Therefore the fact that these values exceed the limits should not be interpreted as a violation of assumptions.

An analysis of the correlograms indicates that serial correlation exists and, if untreated, the independence assumption of the regression is violated. Both the apparent exponential decay of the autocorrelations and the presence of a significant partial autocorrelation of order 1 suggest that a first order autoregressive process may be able to capture the serial correlation of the residuals. This is confirmed, as the autocorrelation is mostly dealt with in the residuals of the autoregressive models (as per Eq. 8), diagnostics for which are provided in Subfigures 2B and 2D (for France) and 2F and 2H (for Germany).

The estimated coefficients of the log-transformed models are shown in Table 2. The model shown in Eq. 10 is shown on top, followed by the model presented in Eq. 12. Similarly to the non-linear model (Table 1), the estimation results are unreliable for models with estimated values for *φ* very close to 1 (such as Finland, Germany, Ireland and United Kingdom, highlighted in the table). The term “unreliable” here is used to convey inconsistency with expectations about these values, i.e. in terms of sign and magnitude.

One of the observations that can be made from Tables 1 and 2 is that the base non-linear regressions provide lower standard errors (and correspondingly higher t-test statistics) than their counterparts that have been corrected for serial correlation. Since the autoregressive models provide superior fit (as indicated by the summary goodness-of-fit statistics) and also satisfy the assumption of independent residuals (as indicated by the graphical diagnostics), it may be concluded that the “ordinary” non-linear models underestimate the standard errors. An exhaustive discussion of this issue in the context of OLS is provided in Petersen [29]. This is a serious potential issue with models that ignore violations of the independence assumption, as it could lead to the acceptance of non-valid models as true.

The significance of the coefficient *β* associated with the motorization level reinforces the indications about the validity of this model. Even when correcting for autocorrelation, the obtained t-statistics suggest that this coefficient is very significant. Therefore, it is inferred that the negative relationship between the motorization level and the fatality risk is not circumstantial. In the two following sections, further statistical tests will be performed to provide additional insight into the properties of the developed models.

## Model assessment

### “Portmanteau” tests

In the previous section, the autocorrelations for the various lags have been considered individually. A different way to test this type of lack-of-fit of a model is to consider the first e.g. 12 autocorrelations as a whole. It should be noted that this value depends on the data. A lag of 4 or 5 might be sufficient, and using a lower lag might not illustrate the temporal dependency. Larger lags do not add to the inference, but are also rather harmless in this context. Denoting the first K autocorrelations of the residuals as *r*_{k} (*k* = 1, 2, …, *K*), Box and Pierce [6] showed that if the fitted model is appropriate then

$$Q = n \sum_{k=1}^{K} r_k^2 \tag{13}$$

is approximately distributed as *χ*^{2} (*K*–*p*–*q*) where n is the number of residuals used to fit the model, and p and q are the numbers of autoregressive (AR) and moving average (MA) coefficients respectively. On the other hand, if the model is inappropriate, the average values of Q will be inflated. Therefore a so-called “portmanteau” test of the hypothesis of model adequacy can be obtained by comparing the value of Q against a standard *χ*^{2} table. Small p-values would imply evidence of serial correlation. Ljung and Box [23] argued that the chi-squared distribution does not provide an adequate approximation of the distribution of the Q-statistic under the null hypothesis for short time-series, while Davies et al. [11] provided empirical evidence to support this argument. Ljung and Box [23] proposed a modified statistic (Ljung-Box-Pierce statistic):

$$\tilde{Q} = n(n+2) \sum_{k=1}^{K} \frac{r_k^2}{n-k} \tag{14}$$

A more detailed presentation of these tests is available in several texts, including Box et al. [7], on which this section is based. In the following application, Eq. 14 is used.
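The two portmanteau statistics can be sketched as follows: the Box-Pierce statistic is n times the sum of the first K squared residual autocorrelations, and the Ljung-Box modification weights each squared autocorrelation by (n+2)/(n−k). The function name and the residual values are hypothetical. The p-value is not computed here, since it requires a *χ*^{2} distribution function that the Python standard library does not provide; the statistics would instead be compared against a *χ*^{2} table.

```python
def portmanteau_statistics(residuals, max_lag):
    """Return (Box-Pierce Q, Ljung-Box Q) using autocorrelations at lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denominator = sum((r - mean) ** 2 for r in residuals)
    box_pierce_sum = 0.0
    ljung_box_sum = 0.0
    for k in range(1, max_lag + 1):
        numerator = sum(
            (residuals[t] - mean) * (residuals[t - k] - mean)
            for t in range(k, n)
        )
        r_k = numerator / denominator  # sample autocorrelation at lag k
        box_pierce_sum += r_k ** 2
        ljung_box_sum += r_k ** 2 / (n - k)
    return n * box_pierce_sum, n * (n + 2) * ljung_box_sum


# Hypothetical residuals with an obvious drift, i.e. strong serial correlation.
residuals = [3.0, 2.5, 2.0, 1.0, 0.5, -0.5, -1.0, -2.0, -2.5, -3.0]
q_box_pierce, q_ljung_box = portmanteau_statistics(residuals, max_lag=3)
```

For a short series such as this one, the Ljung-Box value is noticeably larger than the Box-Pierce value, which reflects the small-sample correction that motivated the modified statistic.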

Figure 3 visually presents the portmanteau test results for the four groups of models. While the interpretation of the obtained p-values cannot be easily quantified, smaller p-values indicate violation of the assumption of independent residuals. Both the non-linear and the log-transformed models show mostly low p-values (and consequently a violation of the assumption of independent residuals). Several models fall below the 5% threshold (indicated by a horizontal dashed line) for the non-linear model, as do all but three (Cyprus, Luxemburg and the Netherlands) for the log-transformed model. The situation is substantially improved for the autoregressive models, with the p-values being considerably increased. Only a couple of models (Finland and Spain) fall below the 5% threshold for the non-linear AR model, and only one (Spain) for the log-transformed AR model.

### Comparison of predictive results

Summary statistics of prediction for years 1995–2002 using all four models are presented in Fig. 4. This data is different from the data-set that was used for estimation (1970–1994). The root mean square percent error (RMSPE) statistic [30] is used for Fig. 4:

$$\mathrm{RMSPE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(\frac{x_n^{1} - x_n^{0}}{x_n^{0}}\right)^{2}} \tag{15}$$

where x is the variable of interest, N is the number of observations (years) and superscripts 0 and 1 denote observed and fitted measures respectively. RMSPE is one of many measures that can be used to assess the predictive accuracy of the various models. RMSPE has several desirable properties, e.g. it penalizes larger errors and is converted to a percentage, which makes it easier to comprehend, as it is unit and variable independent.
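A minimal sketch of the RMSPE computation follows: the square root of the mean squared relative difference between fitted and observed values. The function name and the numbers are hypothetical and serve only to show the arithmetic.

```python
import math


def rmspe(observed, fitted):
    """Root mean square percent error between observed and fitted values."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    n = len(observed)
    squared_relative_errors = (
        ((f - o) / o) ** 2 for o, f in zip(observed, fitted)
    )
    return math.sqrt(sum(squared_relative_errors) / n)


# Hypothetical values: each prediction is off by 10% of the observed value.
example = rmspe(observed=[100.0, 200.0], fitted=[110.0, 180.0])
```

Here both predictions deviate by 10% of the observed value, so the statistic is approximately 0.1, which is on the same scale as the values discussed for Fig. 4.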

The impact of the autoregressive process in the prediction results is clear, with both autoregressive models almost consistently outperforming the base models. The non-linear AR model performs on average 39% better than the nonlinear model (i.e. the average reduction in the RMSPE of the models for the 16 countries that have been considered is 39%), while the autoregressive log-transformed model performs on average 49% better than the log-transformed model. This is a substantial improvement at the cost of just one extra parameter (the AR coefficient *φ*). Also, the AR log-transformed model performs on average more than 13% better than the AR non-linear model.

In absolute numbers, the non-linear and log-transformed models sometimes provide inaccurate predictions, ranging between 0.1 and 0.4 in terms of RMSPE. The performance of the autoregressive models, on the other hand, is a lot more consistent, with most models providing predictions well below 0.1. Only two models (Cyprus and Luxemburg) have a higher RMSPE (i.e. lower predictive ability). An explanation may be found in the fact that these are by far the smallest of the considered countries (in terms of population) and hence the sample (not in terms of annual observations, but in terms of fatalities per year) is smaller for them.

Figure 5 visually presents the prediction performance of the various models for three of the larger countries (France, Germany and Italy) as a sample.

## Model interpretation

Figure 6 presents a plot of the estimated model parameters per country, on the basis of the non-linear AR model. It is noted that, while the log-transformed AR model seemed to provide a superior overall performance in terms of RMSPE, the non-linear AR model parameters are more intuitive in terms of sign and magnitude. A discussion about this point is provided in the conclusion. Subfigure 6A illustrates the parameter values of the non-linear base model, while Subfigure 6B reflects the parameters of the non-linear AR model. A visual comparison of the two subfigures indicates that the two models do not produce vastly different parameter values (with the exception of Spain and Portugal, which show a substantial decrease of the value of parameter *α*). Therefore, while the autoregressive model resolves some of the issues due to the correlated residuals in the data, the changes in the final model results are not dramatic.

The interpretation of parameter *α* is fairly straightforward, as it is a positive multiplicative parameter, and as such it can be considered as an indicator of the level of traffic risk in the country. Naturally, these parameters are not always directly comparable, as the value of the second parameter *β* also affects the resulting fatality rate. As the base of the exponent term is the car ownership rate, which is usually less than one, a larger negative value of *β* implies a higher overall term. One can deduce that parameter *α* is the *dominant* parameter, and as such a simplified categorization of the countries in terms of their traffic fatalities status could be based on that parameter (i.e. their position along the x-axis). Consequently, better performing countries are those presenting a lower fatality rate combined with an increasing effect of the motorization rate. Several topics can be further investigated. For example, an interesting question is the influence of the general level of motorization on the models and the values of their parameters.

Combining these observations, safer countries should be to the left and top of Fig. 6 and less safe countries should be to the right and bottom. No countries are located in the lower right triangle of the plot, which is a reflection of the fact that, despite their differences, the considered countries are developed and have a decent level of road safety. It is expected that developing countries may be located closer to the lower right corner of the plot. Their objective should be to move towards the top left corner of the plot. This trend might, to a degree, occur due to the increased motorization level resulting in lower speeds, but also in a better overall road safety culture. However, it would be possible for road safety experts and policy makers in these countries to also study the successful policies and measures from the more advanced (from a road safety point of view) countries and try to adapt them and incorporate them into their road safety strategies.

Among the countries considered, the least safe countries in Europe today are Greece, Portugal, and Cyprus and indeed the respective points are located closer to the right and top of the plot. Similarly, the United Kingdom, Finland, the Netherlands and Denmark (some of the safest countries in Europe) are closer to the left and bottom, without necessarily providing the exact ranking between them. These findings provide further validation for the ability of this model to capture existing road safety trends.

## Conclusion

Modeling road safety is a complex task, which needs to consider both the quantifiable impact of specific parameters, and the underlying trends that cannot always be measured or observed. The sensitivity of users to road safety campaigns, the improved quality of the vehicle fleet, the improvement of the driving skills of the general population, and the overall improvement of the condition of the road network are only some of the aspects that cannot be easily modeled directly. Therefore, modeling should consider both measurable parameters and the dimension of time, which embodies all remaining parameters.

In the present research, developing macroscopic models that use both time and vehicle fleet as explanatory variables would also have been a meaningful approach. However, an alternative approach was chosen, for several reasons. First, time has limitations as an explanatory variable: it does not really explain road safety trends, but instead indirectly reflects changes in other parameters. Furthermore, a parameter representing time is linear (and uniform across countries) and thus limited in the amount of information that it can add to the model.

On the other hand, vehicle fleet may affect the number of fatalities, given that an increase in the number of vehicles leads to higher average traffic volumes, which in turn may translate to a reduction in average speeds. Moreover, an increase of the vehicle fleet and total mileage in a country increases the need for a larger and safer road environment, in which drivers’ behaviour also tends to be better [19, 20]. Besides, vehicle fleet is acknowledged as a useful alternative measure of exposure when traffic data are not available. It is therefore reasonable to expect a macroscopic relationship between the number of fatalities (or fatality rates) and vehicles (or vehicle ownership). In this research, this relation has been investigated and modeled in the context of European countries.

Time-series methods have been used to account for and correct temporal correlation of the data. It is recognised, however, that traffic fatality risk also depends on other parameters, such as vehicle quality, traffic safety initiatives and regulations, and intensity of police enforcement. Collecting these data across countries is very difficult for a number of reasons and, even when such data exist, they are often not directly comparable. Another important consideration is that some of these variables may be endogenous and thus might require special treatment in order not to impair the model.

The value of a simple model that could be used for cross-country comparisons can be easily motivated, without claiming that it fully explains the road safety phenomenon. This paper therefore provides a parsimonious model for linking motorization level with fatality rates across EU countries, and possibly some insight into the existing or future trends in other, especially less developed, countries that have not yet reached the motorization level of EU countries. By examining the road safety patterns of countries at this motorization level, policy makers and road safety experts in developing countries could foresee these developments and incorporate them into their strategies and policies.

Using fatality rate and vehicle ownership data from 16 EU countries for a period of 33 years (1970–2002), several models were developed, fitted, validated and compared, including simple non-linear models, their log-transformations and the related autoregressive models. The autoregressive versions of the models were shown to overcome the correlation of the residuals and also to exhibit superior predictive properties. For a couple of countries (Italy and the Netherlands), however, the autoregressive model performed worse than the base non-linear model. Log-transformed versions of the model also suffer from correlated residuals and, with the exception of a few cases (especially Finland, Greece, and Hungary), have better or similar predictive capabilities compared with the non-linear models. The autoregressive log-transformed models also overcome the issues with the correlated residuals and provide superior predictive performance.
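To make the comparison above concrete, the following is a minimal sketch of a log-transformed model with and without an autoregressive correction. Several things in it are assumptions made for illustration and are not taken from the paper: the power-law form `fr = exp(a) * vo**b`, the use of the Cochrane-Orcutt procedure for AR(1) errors, the function names, and the synthetic data, which stand in for one country's 33 annual observations.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X already holds the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def lag1_autocorr(e):
    """Lag-1 sample autocorrelation of a residual series."""
    e = e - e.mean()
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e ** 2))

def design(vo):
    """Design matrix [1, log(vo)] for the log-transformed model."""
    return np.column_stack([np.ones(len(vo)), np.log(vo)])

def fit_log_model(vo, fr):
    """Fit log(fr) = a + b*log(vo) by OLS; return (coefficients, residuals)."""
    X, y = design(vo), np.log(fr)
    beta = ols(X, y)
    return beta, y - X @ beta

def fit_log_model_ar1(vo, fr, n_iter=20):
    """Cochrane-Orcutt: alternate between estimating rho from the residuals
    and refitting the regression on quasi-differenced data."""
    X, y = design(vo), np.log(fr)
    beta, rho = ols(X, y), 0.0
    for _ in range(n_iter):
        e = y - X @ beta
        rho = float(np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2))
        rho = float(np.clip(rho, -0.99, 0.99))  # keep the transform well defined
        beta = ols(X[1:] - rho * X[:-1], y[1:] - rho * y[:-1])
    e = y - X @ beta
    return beta, rho, e[1:] - rho * e[:-1]  # last item: AR(1) innovations

def simulate(n=33, a=8.0, b=-0.8, rho=0.7, sigma=0.05, seed=0):
    """Synthetic series (hypothetical values): power law with AR(1) log-errors."""
    rng = np.random.default_rng(seed)
    vo = np.linspace(100.0, 500.0, n)   # vehicles per 1000 inhabitants (made up)
    u = rng.normal(0.0, sigma, n)
    e = np.zeros(n)
    e[0] = u[0]
    for t in range(1, n):
        e[t] = rho * e[t - 1] + u[t]
    return vo, np.exp(a + b * np.log(vo) + e)

vo, fr = simulate()
beta_ols, resid_ols = fit_log_model(vo, fr)
beta_ar, rho_hat, innov = fit_log_model_ar1(vo, fr)
print("OLS slope:", beta_ols[1], "residual lag-1 autocorrelation:", lag1_autocorr(resid_ols))
print("AR(1) slope:", beta_ar[1], "rho:", rho_hat,
      "innovation lag-1 autocorrelation:", lag1_autocorr(innov))
```

The quantity to compare is the lag-1 autocorrelation before and after the correction: the plain log-transformed fit leaves the serial dependence in its residuals, while the autoregressive version moves it into the estimated `rho`.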

However, the estimated coefficients of the AR log-transformed model for five of the 16 countries are sometimes questionable (in terms of magnitudes and signs), suggesting that this model should be applied with caution, taking into account the particularities of the case examined. The autoregressive non-linear models therefore seem to be a more robust choice for prediction of macroscopic road safety trends, as they provide desirable predictive properties, satisfy the assumptions of the model (e.g. uncorrelated residuals) and provide intuitive model parameters (in terms of magnitude and sign).
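The "uncorrelated residuals" assumption mentioned above is commonly checked with a portmanteau test. As an illustration (the choice of statistic here is an assumption based on the reference list, not a statement of the exact procedure followed), the statistic of Ljung and Box [23] for a residual series of length $n$ is

$$
Q = n\,(n+2)\sum_{k=1}^{h}\frac{\hat{\rho}_k^{\,2}}{n-k},
$$

where $\hat{\rho}_k$ is the lag-$k$ sample autocorrelation of the residuals and $h$ is the number of lags tested. Under the null hypothesis of no residual autocorrelation, $Q$ approximately follows a $\chi^2$ distribution with $h - p$ degrees of freedom, where $p$ is the number of estimated autoregressive and moving-average parameters. A large $Q$ relative to that distribution indicates that serial correlation remains in the residuals.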

The models presented in this research are regression-based models and therefore have modest data requirements. Considering that annual road safety time-series are often short, such models are suitable for this analysis. The length of the time-intervals should be such that they provide adequate data for the model estimation and still allow for a reasonable validation data set. The choice of the boundaries of the time intervals can be important if the time series data exhibit sudden changes that could shift the regression line. If such changes are observed in the data, it is recommended that modelers try alternative definitions of the time intervals, in order to determine the sensitivity/robustness of the models to the inclusion of one or more additional data points.
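The recommendation to try alternative interval boundaries can be sketched as a simple loop over candidate split years. The example below is illustrative only: the log-log specification, the helper names, and the synthetic series with an artificial level shift after 1990 are all assumptions, not the paper's data or exact model.

```python
import numpy as np

def fit_loglog(x, y):
    """Fit log(y) = intercept + slope*log(x); np.polyfit returns the slope first."""
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    return float(intercept), float(slope)

def split_sensitivity(years, x, y, split_years):
    """For each candidate boundary, estimate on years <= boundary and
    validate on the remaining years; report coefficients and validation RMSE."""
    results = {}
    for s in split_years:
        est = years <= s
        val = ~est
        intercept, slope = fit_loglog(x[est], y[est])
        # Naive back-transform to the original scale (no retransformation correction).
        pred = np.exp(intercept + slope * np.log(x[val]))
        rmse = float(np.sqrt(np.mean((y[val] - pred) ** 2)))
        results[int(s)] = {"intercept": intercept, "slope": slope, "rmse": rmse}
    return results

# Hypothetical series: 1970-2002, with a sudden 15% drop in the rate after 1990.
years = np.arange(1970, 2003)
vo = np.linspace(100.0, 500.0, years.size)   # made-up vehicle ownership
fr = 3000.0 * vo ** -0.8                     # made-up fatality rate
fr = np.where(years > 1990, 0.85 * fr, fr)

for boundary, r in split_sensitivity(years, vo, fr, [1988, 1992, 1995, 1998]).items():
    print(boundary, r)
```

If the estimated slope or the validation error changes markedly as the boundary moves across the year of the shift, the model is sensitive to the inclusion of those data points, which is the situation the paragraph above warns about.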

The results of the presented models can be used to evaluate the road safety performance of various countries, identifying poor performers, as well as traffic safety leaders. Indeed, as shown in the previous section, the model accurately identifies the poor performers among the considered countries (Greece, Portugal, Cyprus), as well as those countries that are leading in terms of their road safety performance (United Kingdom, Finland, Netherlands, Denmark). At individual country level, given estimates of a country’s expected performance, the actual road safety performance of that country over the past few years may be assessed. Moreover, by applying the models, the expected road safety situation in a country in a “do-nothing” scenario is described, so that the potential impact of adopted road safety strategies may be assessed at macroscopic level (e.g. target setting). Furthermore, the study of more advanced (in terms of road safety and in general) countries may be applied to predict the future evolution of less developed or less successful (in terms of traffic safety) countries. However, it is stressed that the use of the developed models for prediction should be limited to the currently applied domain, as their applicability in ranges for which data are not available cannot be verified.

Further research directions include the enrichment of the model with additional macroscopic parameters, as well as the investigation of other functional forms and model specifications. Additional parameters (such as the Gross Domestic Product, GDP) may help separate exogenous effects and isolate road safety trends. Other functional forms may also provide valuable insight into the road-safety problem. One relevant question is whether road safety trends are similar for the best and worst performing countries and, subsequently, where the inflection points lie that define the thresholds between the changing trends. An alternative modeling approach would have been the use of state-space models and structural time-series models, such as those proposed by Harvey and Shephard [16] and Harvey [15], which belong to the family of unobserved component models. One of the advantages of this type of model is that it can explicitly model interventions or external road safety measures and campaigns.

## References

1. Abbas KA (2004) Traffic safety assessment and development of predictive models for accidents on rural roads in Egypt. Accid Anal Prev 36(2):149–163
2. Al-Haji G (2007) Road Safety Development Index (RSDI). Theory, philosophy and practice. Linköping Studies in Science and Technology, Dissertation No. 1100, Norrköping, Sweden
3. Andreassen D (1991) Population and registered vehicle data vs. road deaths. Accid Anal Prev 23(5):343–351
4. Bates DM, Watts DG (1988) Nonlinear regression analysis and its applications. Wiley, New York
5. Beenstock M, Gafni D (2000) Globalization in road safety: explaining the downward trend in road accident rates in a single country (Israel). Accid Anal Prev 32:71–84
6. Box GEP, Pierce DA (1970) Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. J Am Stat Assoc 65:1509–1526
7. Box GEP, Jenkins GM, Reinsel GC (1994) Time series analysis. Forecasting and control. Prentice Hall International, Inc., New Jersey
8. Broughton J (1991) Forecasting road accident casualties in Great Britain. Accid Anal Prev 23(5):353–362
9. Cameron MH, Haworth N, Oxley J, Newstead S, Le T (1993) Evaluation of Transport Accident Commission road safety television advertising. Report No. 52, Monash University Accident Research Centre
10. COST329 (2004) Models for traffic and safety development and interventions. Final Report of the Action. European Commission, Luxembourg
11. Davies N, Triggs CM, Newbold P (1977) Significance levels of the Box-Pierce portmanteau statistic in finite samples. Biometrika 64:517–522
12. Dobson AJ (1990) An introduction to generalized linear models, 2nd edn. Chapman and Hall, London
13. Gharaybeh FA (1994) Application of Smeed’s formula to assess development of traffic safety in Jordan. Accid Anal Prev 26(1):113–120
14. Hakim S, Shefer D, Hakkert AS, Hocherman I (1991) A critical review of macro models for road accidents. Accid Anal Prev 23(5):379–400
15. Harvey AC (1994) Forecasting, structural time series models and the Kalman filter. Cambridge University Press, Cambridge
16. Harvey AC, Shephard N (1993) Structural time series models. In: Maddala GS, Rao CR, Vinod HD (eds) Handbook of Statistics, vol 11. Elsevier Science Publishers B.V., Amsterdam, pp 261–302
17. Henning-Hager U (1986) Urban development and road safety. Accid Anal Prev 18(2):135–145
18. Jacobs GD (1986) Road accident fatality rates in developing countries - a reappraisal. In: PTRC Summer Annual Meeting, University of Sussex, 14–17 July 1986, Proc of Seminar H. PTRC Education and Research Services, London, pp 107–119
19. Koornstra MJ (1992) The evolution of road safety and mobility. IATSS Research 16:129–148
20. Koornstra MJ (1997) Trends and forecasts in motor vehicle kilometrage, road safety, and environmental quality. In: Roller D (ed) The motor vehicle and the environment – Entering a new century. Proceedings of the 30th International Symposium on Automotive Technology & Automation, Automotive Automation Limited, Croydon, pp 21–32
21. Kopits E, Cropper M (2005) Traffic fatalities and economic growth. Accid Anal Prev 37:169–178
22. Lassarre S (2001) Analysis of progress in road safety in ten European countries. Accid Anal Prev 33:743–751
23. Ljung GM, Box GEP (1978) On a measure of lack of fit in time series models. Biometrika 65:297–303
24. Lord D (2002) Application of accident prediction models for computation of accident risk on transportation networks. Transport Res Rec: J Transport Res Board 1784:17–26
25. McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, New York
26. Newstead S, Cameron MH, Gantzer S, Vulcan P (1995) Modeling of some major factors influencing road trauma trends in Victoria 1989–93. Report No. 74, Monash University Accident Research Centre
27. Oppe S (1989) Macroscopic models for traffic and traffic safety. Accid Anal Prev 21(3):225–232
28. Page Y (2001) A statistical model to compare road mortality in OECD countries. Accid Anal Prev 33:371–385
29. Petersen MA (2009) Estimating standard errors in finance panel data sets: comparing approaches. Rev Financ Stud 22:435–480
30. Pindyck RS, Rubinfeld DL (1997) Econometric models and economic forecasts, 4th edn. Irwin McGraw-Hill, Boston
31. Qin X, Ivan JN, Ravishanker N (2004) Selecting exposure measures in crash rate prediction for two-lane highway segments. Accid Anal Prev 36(2):183–191
32. R Development Core Team (2011) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org (accessed August 13, 2011)
33. Smeed RJ (1968) Variations in the pattern of accident rates in different countries and their causes. Traffic Eng Contr 10(7):364–371
34. Stipdonk HL (ed.) (2008) Time series applications on road safety developments in Europe. Deliverable D7.10 of the EU FP6 project SafetyNet
35. van Beeck EF, Borsboom GJJ, Mackenbach JP (2000) Economic development and traffic accident mortality in the industrialized world, 1962–1990. Int J Epidemiol 29:503–509
36. WHO (2002) WHO mortality statistics. World Health Organization, Geneva
37. WHO (2004) World report on road traffic injury prevention. World Health Organization, Geneva
38. Yannis G, Antoniou C, Papadimitriou E, Katsochis D (2011) When may road fatalities start to decrease? J Saf Res 42(1):17–25

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 2.0 International License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### Keywords

- Traffic safety
- Non-linear regression
- Time series analysis
- Autoregressive models