An Open Access Journal

An international review of challenges and opportunities in development and use of crash prediction models

Abstract

Purpose

Over the past 10 years, building on road infrastructure data, crash prediction models (CPMs) have become fundamental scientific tools for road safety management. However, there is a gap between state-of-the-art and state-of-the-practice, with the practical application lagging behind scientific progress. This motivated a review of international experience with CPMs from perspectives of application by practitioners and development by researchers. The objective of the paper is to improve practitioner understanding of modelling road safety performance using CPMs for crash frequency estimation, leading to their greater uptake in improving road safety. In short, why and how should road safety practitioners consider CPMs?

Methods

Both scientific and practice-oriented literature was retrieved, using academic sources, as well as reports of road agencies or institutes. The selection was limited to English language.

Results

From the review it is clear that developing CPMs is not a straightforward task: there are many available choices and decisions to be made during the process without definite guidance. This explains the diversity of approaches, techniques, and model types. The paper explains how some fundamental modelling decisions affect practical aspects of modelling safety performance.

Conclusions

There is a need to identify CPM solutions that will be scientifically sound and feasible in practitioners’ context. Together with increased communication between researchers and practitioners, these solutions will help overcome the identified challenges and increase use of CPMs.

1 Introduction

Crash prediction models (CPMs) are mathematical equations which link road safety performance and crash risk factors. First applications of CPMs appeared in the 1980s (for a review, see e.g. [48, 54, 66]). CPMs express the predicted crash frequency of a road (e.g. road segment or intersection) as a function of explanatory variables. These variables (risk factors) describe exposure to crash risk and other characteristics related to cross section, road design and other road and traffic attributes. The typical model form is:

$$ {N}_i=\exp \left({\beta}_0\right)\cdot {\left( EXP{O}_i\right)}^{\beta_1}\cdot \exp \left({\sum}_{j=2}^n\left({\beta}_j\cdot {x}_j\right)\right) $$
(1)

where

Ni … crash frequency on road i in specific time period

β0 … intercept

EXPOi … exposure on road i in specific time period

βj … regression coefficients

xj … explanatory variables
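As a minimal sketch of how Eq. (1) is evaluated for a single road, the following Python function multiplies the three components of the model. The function name, argument names and all coefficient values are hypothetical and chosen for illustration only; they are not taken from any published CPM.

```python
import math


def predict_crash_frequency(beta0, beta_exposure, exposure, betas, xs):
    """Evaluate Eq. (1): exp(b0) * EXPO^b_exposure * exp(sum(b_j * x_j))."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    # Linear combination of the remaining explanatory variables
    linear_part = sum(b * x for b, x in zip(betas, xs))
    return math.exp(beta0) * exposure ** beta_exposure * math.exp(linear_part)


# Hypothetical coefficients and attribute values (assumptions, not real estimates)
example = predict_crash_frequency(
    beta0=-7.0,
    beta_exposure=0.8,
    exposure=10000.0,
    betas=[0.15, -0.10],
    xs=[1.0, 2.0],
)
```

Because every term is multiplicative, a one-unit change in an explanatory variable scales the predicted frequency by exp(βj), which is what makes the coefficients straightforward to interpret.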

To account for the discrete, non-negative character of crash frequencies, generalized linear modelling (GLM) methods are typically used. The first models used Poisson regression as a starting point; however, it was found that Poisson models cannot handle overdispersion (the variance exceeding the mean), which is typical for crash data [66]. This motivated the use of negative binomial (or Poisson-gamma) models, which assume that the Poisson parameter follows a gamma probability distribution. According to an extensive review by Lord and Mannering [47], negative binomial (NB) models are the most used in crash-frequency modelling. Given this fact, the following text focuses on NB models; for more information on other model types, such as zero-inflated, generalized estimating equations (GEE), generalized additive models (GAM), random-effects, random-parameters, hierarchical/multilevel or neural networks, see e.g. [5, 47, 50].
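The difference between the two distributions can be written compactly. The notation below is an assumption made for illustration (μi for the mean crash frequency, k for the overdispersion parameter), using a common NB parameterisation; some software reports the inverse of k instead.

$$ \mathrm{Var}\left({Y}_i\right)={\mu}_i \quad \text{(Poisson)} \qquad \mathrm{Var}\left({Y}_i\right)={\mu}_i+k\cdot {\mu}_i^2 \quad \text{(negative binomial)} $$

With k = 0 the NB variance reduces to the Poisson case; any k > 0 lets the variance exceed the mean, which is the overdispersion described above.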

CPMs analyse and highlight potential safety issues, help to identify potential for safety improvements and estimate their benefits [87]. Over the past decades, building on road infrastructure data, CPMs have become fundamental scientific tools in quantitative road safety management, forming the foundation of the AASHTO Highway Safety Manual (HSM) and the Australian National Risk Assessment Model (ANRAM). The first edition of the HSM (2010) became a recognized source of information and methods for science-based decision making, allowing safety to be quantitatively evaluated alongside other transportation performance measures such as traffic operations, environmental impacts, pavement durability or construction costs. The methods in the HSM, based on CPMs, provide an opportunity to: (1) improve the reliability of common activities, such as screening a network for sites at which to reduce crashes, and (2) expand analysis to include assessments of new or alternative geometric and operational characteristics [1].

CPMs may be used for various key functions, including network safety screening, development of crash modification factors (CMFs), road safety impact assessments and economic analysis. However, there are gaps between state-of-the-art (what is published by researchers) and state-of-the-practice (what is needed/used by practitioners), which limit the application of CPMs.

This paper will assist road safety practitioners in understanding why and how they might use CPMs to improve road safety. It presents a review of how CPMs are developed and applied. In particular, it explores the challenges of optimising scientific validity and practical applicability. These challenges are discussed in the context of opportunities and potential solutions that might assist practitioners in incorporating CPMs into road safety management.

2 Methods

The goal of the review was to critically summarize international experience in the development and application of CPMs for crash frequency estimation, with a focus on practical use by road transport agencies. In this regard, both scientific and practice-oriented literature was retrieved based on the following criteria:

  • Sources:

    • academic: Web of Science and Scopus, including selected references (snowballing)

    • practical: reports of agencies (e.g. Federal Highway Administration, Austroads, NZ Transport Agency)

    • both: ARRB Knowledge Base, TRID database, reports of European institutes, EU project deliverables

  • Keywords: accident prediction model, crash prediction model, safety performance function

  • Language: English

  • Time frame restriction: none

To focus on the typical road settings (the main road network, i.e. motorways/freeways/expressways and national roads), the following specific issues were not considered:

  • Macro/planning-level applications (analysis based on jurisdiction, GDP, or land-use zones in assignment models)

  • Specific CPMs for vulnerable road users, such as pedestrians or bicyclists

  • CPMs for specific road elements (e.g. railway level crossings, bridges, tunnels, etc.)

  • Logistic binary modelling of crash characteristics (e.g. victim gender and age, vehicle age, etc.)

  • Use of CPMs for evaluation of safety effectiveness of safety treatments or programmes (before/after studies)

These CPM applications are important in broader road safety context and may be explored using findings presented in this paper as a starting point.

The retrieved materials were mainly from Europe, Australia, New Zealand and North America. In order to stress the practical focus, the aim was to select the works related to the most frequent applications of CPMs. The final literature selection focused on developing and using CPMs of road segments and intersections from an international perspective of fulfilling important road safety management functions. These include network screening for high-risk road sections, identifying significant crash risk factors, and road safety impact assessment of potential treatment options. The review is structured along the following sections, given by the hierarchical nature of considering, developing and applying CPMs:

  1. Data collection, sample size and time period

  2. Road network segmentation

  3. Selection of explanatory variables

  4. Model function and variable forms

  5. Model validation

  6. Using CPMs in network screening

  7. Using CPMs in developing crash modification factors (CMFs)

  8. Using CPM tools, e.g. for road safety impact assessment

Previous reviews related to CPMs [5, 47, 54, 86] usually considered some of these steps only, mainly 3 and 4. The presented review fills the gap by compiling information on all eight steps, followed by summarised challenges and opportunities, with available solutions.

3 Review

3.1 CPMs and their uses

CPMs may be used to accomplish various road safety management functions, such as:

  1. Exploring and comparing combinations of individual risk factors that make some road locations unsafe

  2. Network safety screening, i.e. safety ranking road locations, or identification of hazardous locations

  3. Impact assessments, i.e. assessing safety of contemplated (re)constructions or safety treatments

  4. Economic analysis of project costs vs. safety benefits

Note that Task 1 is rather research-oriented, while Tasks 2, 3 and 4 represent typical practical tasks undertaken by many road agencies. According to a review of North American practices [62], network screening is the most common application of CPMs. In the European project PRACT, cost-benefit analysis was identified as a common application of CPMs [85, 86].

As noted, CPMs may be developed for road segments of a particular road type (e.g. rural undivided highway), for all intersections, for individual intersection types, or any combination of these. CPMs can be developed for all recorded crashes, casualty crashes, or severe crashes only; the approach depends on the purpose of the model. Very broad CPMs may be useful in high-level network screening or highlighting strategic issues. More specific safety management or research objectives will require more specific models. Given the range of potential applications, CPMs have been acknowledged worldwide as recommended tools, on which rational road safety management should be based. However, at the same time, it has been known that prediction modelling is not a simple task [15, 18, 77] and involves various analytical choices, which are often made without explicit justification. This may explain why there are gaps between state-of-the-art and state-of-the-practice; and this may in turn limit the practical use of CPMs. For example, a survey among European road agencies found that 70% of them rarely or never systematically use CPMs in their decision-making [85].

Regarding the selection of research for inclusion in the review, another distinction needs to be made. The HSM introduces a set of CPMs (referred to as safety performance functions, SPFs) and crash modification factors (CMFs). Crash prediction in the HSM has two main steps: (1) prediction of baseline crash rates using SPFs/CPMs for nominal route and intersection conditions, and (2) multiplying the ‘baseline’ models by crash modification factors (CMFs) to capture changes in geometric design and operational characteristics (deviations from nominal conditions). This approach has gained popularity, being incorporated into the Interactive Highway Safety Design Model (IHSDM), and recently adopted in the European CPM [86], as well as the Australian ANRAM [41] and the New Zealand Crash Estimation Compendium [53].
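The two-step logic can be sketched in a few lines of Python. The baseline function form, the function names and all numeric values below are assumptions made for illustration; they do not reproduce any specific SPF or CMF from the HSM.

```python
import math


def baseline_spf(aadt, length_km, intercept, aadt_exponent):
    """Step 1: baseline prediction for nominal conditions (hypothetical form)."""
    return math.exp(intercept) * length_km * aadt ** aadt_exponent


def apply_cmfs(baseline, cmfs):
    """Step 2: multiply the baseline by CMFs for deviations from nominal conditions."""
    result = baseline
    for cmf in cmfs:
        if cmf <= 0:
            raise ValueError("a CMF must be positive")
        result *= cmf
    return result


# Hypothetical values: a baseline of 10 crashes, one favourable and one unfavourable deviation
predicted = apply_cmfs(10.0, [0.9, 1.2])
```

A CMF below 1 lowers the prediction relative to nominal conditions and a CMF above 1 raises it; a site that matches the nominal conditions exactly has an empty CMF list and keeps its baseline value.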

The CPMs/SPFs in the HSM and IHSDM, developed from data in several US states, are not directly transferable to other jurisdictions (inside or outside the US). Some studies confirmed good transferability, mainly between US states [7, 74, 84], but others were less successful when applied abroad, for example in Canada, Italy or Korea [42, 63, 64, 69, 88]. Therefore, it is recommended that each country and jurisdiction (e.g. state) develop its own specific CPMs. The present review, written by non-US authors, adopts this perspective.

3.2 Data collection

In theory, to obtain sufficiently representative models, one should randomly sample data from the population of similar road types or intersections. In this regard, given the variance of crash frequencies, several authors recommended minimal sample sizes, such as at least 50 sites [77], 200 crashes [39] or 300 crashes [73]. The HSM [1] advises using a sample of 30–50 locations with a total of at least 100 crashes per year. However, others were critical of the one-size-fits-all approach. For example, Lord [46] provided guidance on the necessary sample size based on the sample mean, e.g. 200 segments for an average of 5 crashes per segment, or 1000 segments for an average of 1 crash per segment. (Note that these considerations do not apply to network screening, whose goal is to screen the complete network.)

In addition, unlike the large US and Canadian samples, smaller countries are limited in their samples of network and crash data. For example, Turner et al. [77] mentioned that the size of the New Zealand road network limits the development of models for some segment and site types, e.g. interchanges. This factor also reduces opportunities for disaggregating CPMs by crash type and severity level.

Data on crashes, traffic volumes and other relevant road attributes need to be assigned to all the sample sites. Crash data are known for various biases, such as underreporting, location errors, severity misclassification or inaccurate identification of contributory factors. Traffic volume data may also be prone to errors: the typical measure of traffic volume, annual average daily traffic (AADT), is an average aggregated across vehicle types [18]; in addition, location errors exist, as traffic volumes typically measured at one location are assumed to apply to the entire section, and often to multiple sections. Thus, actual variation in traffic flow is difficult to reflect in the data.

The choice of time period for crash and AADT data requires another decision. A 1- to 5-year period is usually recommended for safety ranking, with a 3-year period being the most frequent [16]. Using longer time periods (beyond 5 years) may cause problems due to changes in conditions over the period, such as substantial increases in traffic volumes or layout changes. Probably due to these issues, there are no specific guidelines for time period choice. An exception was the simulation study of Cheng and Washington [13], which concluded that there is little gain in network screening accuracy when using a period longer than 6 years. Also, using several consistency tests, 4 years were found sufficient for developing a CPM in a study by Ambros et al. [2]. Usually, a compromise is accepted between the need for early analysis of new treatments and the need to accumulate sufficient crashes to permit robust analysis [18].

Differences between rural and urban settings are also worth mentioning. Traditionally most focus has been given to rural roads (as also evident from CPM reviews [66, 85, 86]). In contrast, modelling urban safety is more challenging, due to higher presence of vulnerable road users and complex environments, including facilities for different road users, mixed land use, or higher density of various intersection types. Detailed crash data is likely to be needed if crash type-specific models are to be developed later on. More road attributes also need to be collected for urban roads, then tested for correlation, autocorrelation, and only then considered in models [50].

Ideal data sources are road agency asset inventories. Unfortunately, these may not be complete or up to date, and a modeller thus needs to combine various data sources. Additional surveys can be also conducted, either in the field (pedestrian counts, signal timing, speeds, etc.), drive-through digital video collection, or via online maps. Recent emergence of big data and open government policies (e.g. open data initiatives such as data.vic.gov.au) have aided these efforts substantially. It is feasible to pull together substantial amounts of road data from publicly available and road agencies’ own sources. Cross-checking of data for the same attributes between different sets also adds to reducing errors and better data quality management.

3.3 Road network segmentation

CPMs are typically developed either for road intersections or segments. In the latter case, segmentation has to be conducted, in order to divide the network into homogeneous segments, i.e. with constant values of explanatory variables. However, in case of multiple variables, this practice can naturally lead to short segments. This may complicate accurate assigning of crashes to individual segments. In addition, crash concentration is heterogeneous and random; many short segments may also have zero crash counts during the selected time period.
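The effect of multiple variables on segment length can be illustrated with a short Python sketch. It assumes, for illustration only, that each explanatory variable is described by the chainages (in km) at which its value changes; the function name, variable names and chainages are hypothetical.

```python
def homogeneous_segments(road_length, change_points_by_variable):
    """Split [0, road_length] wherever any explanatory variable changes value."""
    breaks = {0.0, float(road_length)}
    for points in change_points_by_variable.values():
        for point in points:
            # Ignore change points outside the road or at its ends
            if 0.0 < point < road_length:
                breaks.add(float(point))
    ordered = sorted(breaks)
    return list(zip(ordered[:-1], ordered[1:]))


# Hypothetical 10 km road with three attributes changing at different chainages
segments = homogeneous_segments(
    10.0,
    {"lane_width": [2.5, 6.0], "speed_limit": [4.0], "shoulder_type": [2.5, 8.2]},
)
# segments == [(0.0, 2.5), (2.5, 4.0), (4.0, 6.0), (6.0, 8.2), (8.2, 10.0)]
```

Each additional variable can only add break points, so the number of segments grows and their average length shrinks as more variables are required to be constant.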

For segmentation, some authors set fixed lengths of several hundred meters [12, 14, 26], or used patterns based on tangents and curves [10, 44, 79]. Long segments can lead to forced homogenisation of variables by aggregating continuous variables into categories (e.g. pavement width bands), and this can lead to loss of applicability. In short, segmentation should consider the overall purpose of the modelling exercise. Longer segments (1–5 km) are often used for network screening [27, 57, 65]. Shorter segments are used to develop more meaningful CMFs, or to estimate localised benefits of safety treatments. Variable segment length can be included in the model. The HSM assumes crash frequency to be directly proportional to segment length; however, many published models which include segment length as a variable suggest otherwise (e.g. [79]).

In practice, division of road network into segments is likely to be dictated by structure of national road databanks. For example in the Czech Republic, national traffic census (as the main source of AADT data) does not cover all minor roads; thus process of aggregating segments into longer segments including minor intersections was found feasible [3]. As the segments may be subject to further investigations, their length should be feasible for on-site visits or crash analyses.

3.4 Explanatory variables

Selection of explanatory variables should be guided by previously documented crash and injury risk factor evidence available from research literature. However, in practice it is often dictated simply by data availability. Explanatory variables generally include exposure, transport function, cross section, traffic control; less often variables describing alignment, vehicle types or road user behaviour are used [66]. When actual variables are not available, proxy variables may be used, e.g. abutting land use as a proxy for pedestrian movement counts.

The first step in variable selection involves identifying variables which are correlated with each other. For each such pair the researcher should remove one variable which is less useful to the purpose of the model (e.g. if sealed shoulder provision is strongly correlated with line marking presence, then remove the latter). In order to further identify the statistically significant variables, a stepwise regression approach is typically used. It may be applied either in a forward selection or a backward elimination manner; in both cases selected goodness-of-fit (GOF) measures are used to assess the statistical significance. Common GOF measures include information criteria such as AIC or BIC, while others use for example scaled deviance [22, 77] or proportion of explained systematic variance [2, 45].
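The first screening step can be sketched as follows. The Pearson correlation coefficient and the 0.7 threshold are assumptions made for illustration; the source does not prescribe a specific correlation measure or cut-off, and the variable names and values are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(data, threshold=0.7):
    """Return variable pairs whose absolute correlation reaches the threshold."""
    flagged = []
    for a, b in combinations(sorted(data), 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical site attributes for five segments
data = {
    "sealed_shoulder": [0, 1, 1, 0, 1],
    "line_marking": [0, 1, 1, 0, 1],
    "aadt_thousands": [5, 3, 4, 4, 5],
}
flagged = correlated_pairs(data)
```

For each flagged pair, the analyst would then keep the variable that is more useful for the purpose of the model, as described above, before moving on to stepwise selection.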

Based on the number of explanatory variables (model complexity), CPMs may be simple (exposure-only) or multivariate (fully-specified) [62]. Sawalha and Sayed [71] warned against the temptation to build overfitted models, i.e. models containing too many insignificant variables. In fact, a number of studies found that additional predictors are not as beneficial as expected [59, 70, 82]. One should strive for parsimonious models, i.e. the ones containing as few explanatory variables as possible [66]. Such models enable simple interpretation and understanding, as well as easy updating [2].

A practice-driven approach was adopted in developing New Zealand rural road CPMs [79]. When it was found that the statistically significant variables did not include the parameters that were of most interest to practitioners, two distinct model types were developed. Statistical models are the best-performing models according to goodness-of-fit measures at 95% confidence levels. Practitioner models contain additional variables of interest to safety professionals, at confidence levels of 70% or more.

On the other hand, in case of leaving out an influential explanatory variable due to unavailable data, so called “omitted variable bias” occurs. The bias results in biased parameter estimates that can produce erroneous inferences and crash frequency predictions [47, 50, 51].

Another bias may be caused by spatial correlation, given that adjacent road segments may share unobserved effects [47]. This bias can be handled by using random-effects models, where the common unobserved effects are assumed to be distributed over the road segments according to some distribution, and the shared unobserved effects are assumed to be uncorrelated with the explanatory variables [47].

3.5 Model function and variable forms

Before carrying out the modelling task, exploratory data analysis should be conducted, in order to detect potential outliers, check the extreme values, potential mistakes, etc.

As previously mentioned, crash data are typically overdispersed. The degree of overdispersion in a negative binomial model is represented by an overdispersion parameter, which is estimated during modelling along with the regression coefficients. The overdispersion parameter is used to determine the value of a weight factor for use in the empirical Bayes (EB) method. This method combines predicted (modelled) and recorded (observed) crash frequencies, in order to improve the reliability of the safety level estimate for a specific site [32]. Applications of EB methods are described in later sections of the review.
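The following Python sketch shows how the weight factor links the two crash frequencies. It assumes the weight form w = 1 / (1 + k · predicted) used in the HSM, with the overdispersion parameter k parameterised so that larger k means more overdispersion; the function name and the example values are hypothetical, and both frequencies must refer to the same site and time period.

```python
def eb_estimate(predicted, observed, overdispersion):
    """Combine predicted and observed crash frequencies with an EB weight.

    Assumes weight = 1 / (1 + k * predicted), where k is the overdispersion
    parameter of the negative binomial model (an assumed parameterisation).
    """
    if predicted < 0 or observed < 0 or overdispersion < 0:
        raise ValueError("inputs must be non-negative")
    weight = 1.0 / (1.0 + overdispersion * predicted)
    estimate = weight * predicted + (1.0 - weight) * observed
    return estimate, weight


# Hypothetical site: model predicts 2 crashes, 5 were recorded, k = 0.5
estimate, weight = eb_estimate(predicted=2.0, observed=5.0, overdispersion=0.5)
# weight == 0.5 and estimate == 3.5
```

The estimate always lies between the predicted and the observed value. A small overdispersion parameter means the model is trusted more (weight close to 1), while a large one shifts the estimate towards the recorded crash count.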

Crash frequency (i.e. the response variable) ideally should not involve mixed levels of crash severity and crash types, as this may produce uninterpretable results [18]. It is thus recommended to develop disaggregated CPMs [66]. Alternatively, one may use the observed proportion of a given crash type or severity and apply it to the CPM that has been estimated for total crashes [72]. However, this has been found to be a questionable practice, leading to estimation errors [40]. The current recommendation is to estimate separate CPMs by crash type. New Zealand practice is to develop models for key (or common) crash types and, if necessary, to scale their predictions to represent total crash frequency, to allow for less common crash types [77]. Some studies [24, 27] used sub-samples (for example stratification based on AADT under/over specific limits) in order to improve model quality. In any case, developing disaggregated CPMs requires larger sample sizes. In terms of severity, models are developed by injury severity levels (usually with fatal and serious injury crashes combined), as with the ANRAM models [41]. Alternatively, severity factors (proportions) are applied to models developed for all injury crashes or all crashes (including non-injury) [53].

Regarding the functional forms of explanatory variables, there is no universal guidance and various forms are used in the literature. To select the most suitable mathematical forms of explanatory variables, one may use graphical relationships between crash frequency and a road variable (i.e. univariate analysis) [4], or use more complex techniques, such as empirical integral functions and cumulative residuals (CURE) [33]. According to Hauer [35], the model equation may have both multiplicative components (to represent the influence of continuous factors, such as lane width or shoulder type), and additive components (to account for the influence of point hazards, such as driveways or narrow bridges). Despite these recommendations, the typical modelling approach is often simple. The general model form of Eq. (1) is widely adopted.

Exposure is usually modelled in terms of traffic volume, i.e. a single AADT value for road segments, or the product of major and minor AADTs for road intersections. The function is typically a power form, but some authors considered it jointly with an exponential form (the so-called Ricker model [68]). Traffic volumes (flows) should be adapted to the specific segment and intersection types. For example, New Zealand CPMs [53] apply either the product of flows or conflicting flows, based on the type of intersection, urban/rural settings and speed limits. As discussed, a segment length variable is often used where road segments are not of equal length. For intersections, a standard approach length is typically used, e.g. 50–100 m, and not modelled as a variable.
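The exposure forms mentioned above can be written out as follows. This is a sketch under assumed notation (AADT subscripts and coefficient indices are chosen for illustration and are not taken from the cited models); other explanatory variables are omitted for brevity.

$$ {N}_i=\exp \left({\beta}_0\right)\cdot {\left( AAD{T}_i\right)}^{\beta_1} \quad \text{(power form, segments)} $$

$$ {N}_i=\exp \left({\beta}_0\right)\cdot {\left( AAD{T}_i\right)}^{\beta_1}\cdot \exp \left({\beta}_2\cdot AAD{T}_i\right) \quad \text{(power combined with exponential form)} $$

$$ {N}_i=\exp \left({\beta}_0\right)\cdot {\left( AAD{T}_{major,i}\cdot AAD{T}_{minor,i}\right)}^{\beta_1} \quad \text{(product of flows, intersections)} $$

In the combined form, a positive β1 with a negative β2 gives a curve that rises, peaks at AADT = −β1/β2 and then declines, whereas the pure power form is monotonic in AADT.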

Another example is segment length, usually applied as an offset, i.e. with regression coefficient = 1, but often also in a power form [30, 67, 68]. According to Hauer [34], segment length should also be considered when estimating the over-dispersion parameter for the frequency models to be used in the empirical Bayes approach. However, the exact form of the relationship is not definite [9]; in fact, not only length but also other variables may play a role [25].
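The two treatments of segment length differ only in whether its coefficient is fixed or estimated. On the logarithmic scale of the GLM, and using assumed notation (Li for the length of segment i, βL for its coefficient), they can be sketched as:

$$ \ln {N}_i=\ln {L}_i+{\beta}_0+{\beta}_1\cdot \ln \left( AAD{T}_i\right)+{\sum}_{j=2}^n\left({\beta}_j\cdot {x}_j\right) \quad \text{(offset, coefficient fixed at 1)} $$

$$ \ln {N}_i={\beta}_L\cdot \ln {L}_i+{\beta}_0+{\beta}_1\cdot \ln \left( AAD{T}_i\right)+{\sum}_{j=2}^n\left({\beta}_j\cdot {x}_j\right) \quad \text{(power form, coefficient estimated)} $$

With the offset, doubling the segment length doubles the predicted crash frequency; in the power form an estimated βL different from 1 indicates a non-proportional relationship.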

Creation of a model is undertaken by running relevant statistical regression processes on the sample data. The most common tools for this are statistical software packages such as R, SPSS, SAS or Matlab. Microsoft Excel is not considered appropriate for this task as it lacks many of the necessary statistical features.

In practice, the modelling process is highly iterative. Variables are added, and then removed if shown to add little or nothing to explanation of the response variable. Often data for a given variable is re-categorised to improve its significance if it is borderline. Often borderline or non-significant variables are retained if they add to better understanding of crash problem. Optimisation of the model fit vs. number of variables vs. applicability is gradually achieved. This iterative process can be stopped when little further improvement in the model is achieved with each iteration [10, 25].

3.6 Model validation

The goal of validation is to establish whether the developed model is acceptable from both scientific and practical perspectives. It is thus surprising that most modelling guidelines seem to overlook this step [1, 23, 35, 36, 48, 71, 72, 83].

According to Oh et al. [56], one may distinguish between internal validity and external validity.

  • Internal validity means that CPM findings should be consistent with established knowledge on the subject; the CPM should also possess the features of the underlying phenomenon; and finally the CPM should agree with fundamental information and knowledge, such as the physical mechanics and dynamics involved in crashes [56]. Newly developed CPMs may be compared to previous literature in terms of signs and magnitudes of regression coefficients, or for example their marginal effects [81].

  • External validity (goodness-of-fit) may be evaluated by comparing either models from two independent samples, or a model from a complete sample applied on selected sub-samples that have not been used in the model building (e.g. randomly-chosen 20%). Various goodness-of-fit indicators may be applied; often proportion of systematic variation in the original accident dataset explained by the model (also known as Elvik index) is used [22, 45].
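A hold-out check of the kind described for external validity can be sketched as below. The 20% share follows the example in the text; the use of mean absolute deviation as the goodness-of-fit indicator, the fixed random seed and the function names are assumptions made for illustration (the text names other indicators, such as the Elvik index).

```python
import random


def split_holdout(sites, holdout_share=0.2, seed=42):
    """Randomly reserve a share of sites that is not used for model building."""
    rng = random.Random(seed)
    shuffled = list(sites)
    rng.shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_share))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def mean_absolute_deviation(observed, predicted):
    """Average absolute difference between observed and predicted crash counts."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical site identifiers: 8 sites for model building, 2 held out
training_sites, holdout_sites = split_holdout(range(10))

# Hypothetical observed vs. predicted crash counts on held-out sites
mad = mean_absolute_deviation([1, 2, 3], [1, 3, 5])
```

The model would be fitted on the training sites only, and the indicator computed on the held-out sites; a value much worse than on the training data suggests the model does not generalise well.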

3.7 Using CPMs in network screening

Previous reviews [16, 52] indicated that the current state-of-the-practice is generally behind the state-of-the-art. According to the EB methodology, the predicted crash frequency from CPMs should be combined with the observed historical crash frequency to obtain the so-called “expected average crash frequency with empirical Bayes adjustment” (in short, the EB estimate). These EB estimates benefit the practitioner by removing much of the random statistical variation associated with historical crash data, especially at low frequencies [1, 41]. Apart from EB estimates, other safety indicators can be developed for network screening purposes, for example potential for safety improvement (PSI) [61], level of service of safety (LOSS) [43] or scaled difference [8].

In Australia and New Zealand, where low-volume rural roads generate very low numbers of crashes per kilometre per 5 years (or zero), CPMs provide a continuous proxy measure of safety. In Australia, the ANRAM model uses EB estimates of severe casualty crashes to remove the random variation in observed crash data at the 1–3 km segment level: sites are prioritised simply on the EB estimate [41]. Differences of more than two standard errors between the EB estimate and observed crashes are noted as a possible indicator of non-infrastructure-based influences on safety (e.g. localised speeding or drink-driving) [41].

Given the variety of available methods, HSM [1] notes that “using multiple performance measures to evaluate each site may improve the level of confidence in the results.” Hence sites may be ranked for treatment based on several different methods [49, 52, 89]. Those that rank consistently high using several methods are the sites where treatment should be focused.
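Selecting the sites that rank consistently high can be sketched as a set intersection over the top-ranked sites of each measure. The measure names, site identifiers and scores below are hypothetical, and the sketch assumes that a higher score means a higher priority for every measure.

```python
def top_sites(scores, n):
    """Return the n sites with the highest score for one performance measure."""
    ranked = sorted(scores, key=lambda site: scores[site], reverse=True)
    return set(ranked[:n])


def consistently_high(measures, n):
    """Sites that appear among the top n for every performance measure."""
    tops = [top_sites(scores, n) for scores in measures.values()]
    return set.intersection(*tops) if tops else set()


# Hypothetical scores of four sites under two performance measures
measures = {
    "eb_estimate": {"A": 4.1, "B": 3.0, "C": 1.2, "D": 2.8},
    "psi": {"A": 1.5, "B": -0.2, "C": 0.1, "D": 0.9},
}
priority = consistently_high(measures, n=2)
# priority == {"A"}
```

Site B ranks high on one measure only and is therefore not selected, which illustrates how combining measures narrows the list to sites supported by several lines of evidence.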

3.8 Using CPMs in developing crash modification factors

A crash modification factor (CMF) is a multiplicative factor used to compute the expected number of crashes after implementing a given countermeasure or a design change at a location. CMFs may be derived from before-after or cross-sectional studies; however, each method has its own challenges, and available CMFs can often be highly inconsistent between literature sources [28]. Before-after studies are generally the preferred source of CMFs, particularly for the HSM. However, they typically look at features in isolation only, so when the combined effect of features on crash occurrence is not the sum of the effects of each individual feature, they may provide misleading results. Several solutions for developing multiple-treatment CMFs have been proposed, without reaching definite conclusions [17, 29, 58].

Cross-sectional studies (i.e. the ones based on CPMs) have been criticised for being more prone to non-causal safety effects, due to bias-by-selection [11, 19, 36]. Bias-by-selection can occur when a treatment (e.g. a crash barrier) is applied more often to sites that already have a crash problem than to those that do not. Cross-sectional studies do, however, provide a much better crash prediction for the combination of road features. In some cases, CMFs are developed from CPMs where limited before-after studies are available.

Although the practice of deriving crash modification factors (CMFs) from cross-sectional CPMs has been criticised, it is relatively common. Again, there are various approaches: for example, Park et al. [58] tested six different methods of combining CMFs and concluded that one should not rely on only one of them. An interim solution is to apply rules of thumb, such as using the product of no more than three separate independent countermeasures [55] or reducing the product by multiplying by a ratio of 2/3 [76].
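The two rules of thumb can be sketched in Python as follows. The first function follows the text directly. The second rests on an assumption: the text does not say exactly what the 2/3 ratio is applied to, and the sketch applies it to the combined crash reduction (1 minus the product of CMFs), which is only one possible reading. Function names and CMF values are hypothetical.

```python
import math


def combined_cmf_product(cmfs, max_treatments=3):
    """Multiply independent CMFs, refusing more than max_treatments of them."""
    if len(cmfs) > max_treatments:
        raise ValueError("rule of thumb: combine no more than three countermeasures")
    return math.prod(cmfs)


def combined_cmf_two_thirds(cmfs):
    """Scale the combined crash reduction by 2/3 (assumed interpretation)."""
    reduction = 1.0 - math.prod(cmfs)
    return 1.0 - (2.0 / 3.0) * reduction


# Hypothetical CMFs for two countermeasures applied at the same site
plain = combined_cmf_product([0.8, 0.9])        # 0.72
dampened = combined_cmf_two_thirds([0.8, 0.9])  # about 0.813
```

Under this reading the dampened value is closer to 1 than the plain product, i.e. it claims a smaller combined benefit, which reflects the concern that simple multiplication may overstate the effect of several treatments.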

3.9 Using CPM tools

The above-mentioned analytical steps (data preparation, exploratory analysis, modelling, calculations) are typically conducted in statistical software or spreadsheets. Nevertheless, it is beneficial for an end user to be able to visualize the results. These may take the form of tables or map outputs, for example the identified hotspots or the lists of ranked segments. A number of practitioner tools are worthy of mention, especially as they apply to network screening and analysis of safety impacts of potential treatments.

One option is using stand-alone software solutions, such as the following two from the USA:

  • IHSDM Crash Prediction Module [20] estimates the frequency and severity of crashes on a highway using geometric design and traffic characteristics. This helps users evaluate an existing highway, compare the relative safety performance of design alternatives, and assess the safety cost-effectiveness of design decisions.

  • SafetyAnalyst (commercial software) Network Screening Tool [21] identifies sites with potential for safety improvement. In addition, it is able to identify sites with high crash severities and with high proportions of specific crash types.

Note that there are close links between IHSDM, SafetyAnalyst and the Highway Safety Manual. According to Harwood et al. [31], SafetyAnalyst Module 1 (network screening) is to be applied first, followed by Module 2 (diagnosis and countermeasure selection), Module 3 (economic appraisal and priority ranking) and IHSDM to perform safety analyses as part of the design process.

The Finnish evaluation tool TARVA [60] also deserves mentioning. Its purpose is to provide a common method and database for (1) predicting the expected number of crashes, and (2) estimating the safety effects of road safety improvements. Based on simple CPMs and pre-determined CMFs, it currently exists in Finnish and Lithuanian versions, with planned applications in other countries.

Network screening and road safety impact assessment capabilities are built into the commercial software PTV Visum Safety. There are also applications in the form of Excel spreadsheets, for example the British COBALT, the Swedish TS-EVA or the Norwegian CPMs for national and county roads [37, 38]. In the US, spreadsheets were developed for safety analysis of freeway segments and interchanges (ISAT [75] and ISATe [6]).

The Australian National Risk Assessment Model (ANRAM) tool, available to road agencies, is a network screening and prioritisation tool, which uses CPMs for different road stereotypes, together with CMFs and observed crash data, to estimate severe injury crashes across a segmented road network [41]. ANRAM allows users to develop road network and corridor treatment programs and estimate their benefits. This tool has gained wide use among state road agencies in Australia, particularly for rural road networks where actual severe crashes are randomly distributed. ANRAM is available in spreadsheet form, with planned online adaptations.

New Zealand also has a history of various safety prediction tools. Turner et al. [78] stressed the practical need for such tools and, after a review of overseas applications, considered IHSDM worth transferring to New Zealand conditions for assessing new road designs. A later work [80] reviewed New Zealand spreadsheet applications, as well as experience with using and calibrating the ISAT tool from the USA.

Increasingly, online business analytics software has been used to display CPM results in map format, often with dynamic filtering and computational functions. Examples include ArcGIS Online, QGIS, Tableau, and Microsoft Power BI, some of which are open source or free to use. These solutions make it easier for practitioners to access CPM results and understand the value of CPMs.

4 Challenges and opportunities

The review has presented an opportunity to synthesise the key challenges practitioners are likely to face in translating the scientific state-of-the-art into practice. Opportunities and potential solutions are proposed for addressing these challenges and making CPMs more accessible to road safety practitioners – see Table 1.

Table 1 Overview of identified challenges, opportunities and potential solutions

5 Summary and conclusions

Greater uptake of state-of-the-art analytical techniques is necessary for continuing improvement in road safety. This paper aimed to improve practitioner understanding of modelling road safety performance using CPMs, so that this useful analytical technique could become more accessible.

A number of steps have been reviewed: from data collection and road network segmentation to choosing variables and functional forms, validating models and using them in practice, including a description of available tools. The review highlighted that developing CPMs is not a straightforward task: there are many alternative choices and decisions to be made during the process (without definite guidance), which explains the diversity of approaches and techniques. While this may be interesting from a research perspective, the current diverse state-of-the-art limits understanding and application by practitioners, and complicates international comparability and transferability. There is a need to identify opportunities and solutions that are scientifically sound while also meeting the needs of practitioners.

The main consideration for researchers should be the application of their models by the intended practitioners. This applies equally in the context of basic research, such as seeking understanding of a new challenge, and in the context of applied research, such as development of algorithms for inclusion in practitioner software. Either way, the end users of CPMs are the practitioners, i.e. road agency engineers, policy makers, or data analysts.

The review aimed to improve practitioner understanding of CPMs to bolster their use in improving road safety. The question of how and why practitioners should consider using CPMs could be answered as follows:

  • CPMs are valuable tools, which help link crashes with risk factors. This is especially valuable in current conditions of scattered crash occurrence (fewer crash black spots), where traditional crash-based approaches do not work well.

  • Developing and using CPMs has its challenges. However, these may be overcome by improved communication of the CPM benefits and application, so that practitioners have a basic understanding of CPMs and can make basic application decisions (e.g. use or calibrate available models).

  • Applying network-wide CPMs enables effective road safety impact assessment and network screening.

  • Ongoing investment in developing CPM-based practitioner tools, big data management and visualisation platforms offers potential for improved accessibility and uptake of CPMs in road safety management.

References

  1. AASHTO (2010) Highway safety manual, 1st edn. American Association of State Highway and Transportation Officials, Washington

  2. Ambros J, Valentová V, Sedoník J (2016) Developing updatable crash prediction model for network screening: case study of Czech two-lane rural road segments. Transp Res Rec 2583:1–7

  3. Ambros J, Sedoník J, Křivánková Z (2018) How to simplify road network safety screening? Adv Transp Stud 44:151–158

  4. Arndt O, Troutbeck R (2006) Techniques for analysing the effect of road geometry on accident rates using multifactor studies. Paper presented at the 22nd ARRB conference, Canberra.

  5. Basu S, Saha P (2017) Regression models of highway traffic crashes: a review of recent research and future research needs. Procedia Eng 187:59–66

  6. Bonneson JA, Geedipally S, Pratt MP, Lord D (2012) Safety prediction methodology and analysis tool for freeways and interchanges. NCHRP project 17–45 final report. Transportation Research Board, Washington

  7. Bornheimer C, Schrock S, Wang M, Lubliner H (2012) Developing a regional safety performance function for rural two-lane highways. Paper presented at the 91st Transportation Research Board Annual Meeting, Washington

  8. Butsick AJ, Wood JS, Jovanis PP (2017) Using network screening methods to determine locations with specific safety issues: a design consistency case study. Accid Anal Prev 106:223–233

  9. Cafiso S, Di Silvestro G, Persaud B, Begum MA (2010) Revisiting variability of dispersion parameter of safety performance for two-lane rural roads. Transp Res Rec 2148:38–46

  10. Cafiso S, D’Agostino C (2013) Investigating the influence of segmentation in estimating safety performance functions for roadway sections. Paper presented at the 92nd Transportation Research Board Annual Meeting, Washington

  11. Carter D, Srinivasan R, Gross F, Council F (2012) Recommended protocols for developing crash modification factors. NCHRP project 20-07, task 314 report. Transportation Research Board, Washington

  12. Cenek PD, Davies RB, McLarin MW, Griffith-Jones G, Locke NJ (1997) Road environment and traffic crashes. Research report 79. Transfund, Wellington

  13. Cheng W, Washington S (2005) Experimental evaluation of hotspot identification methods. Accid Anal Prev 37:870–881

  14. da Costa JO, Jacques MAP, Pereira PAA, Freitas EF, Soares FEC (2015) Portuguese two-lane highways: modelling crash frequencies for different temporal and spatial aggregation of crash data. Transp 30:1–12

  15. Eenink R, Reurings M, Elvik R, Cardoso J, Wichert S, Stefan C (2008) Accident prediction models and road safety impact assessment: recommendations for using these tools. RIPCORD-ISEREST project deliverable 2

  16. Elvik R (2008) A survey of operational definitions of hazardous road locations in some European countries. Accid Anal Prev 40:1830–1835

  17. Elvik R (2009) An exploratory analysis of models for estimating the combined effects of road safety measures. Accid Anal Prev 41:876–880

  18. Elvik R (2010) Assessment and applicability of road safety management evaluation tools: Current practice and state-of-the-art in Europe. Report 1113/2010. Institute of Transport Economics, Oslo

  19. Elvik R (2011) Assessing causality in multivariate accident models. Accid Anal Prev 43:253–264

  20. FHWA (2003) Interactive highway safety design model (IHSDM) – crash prediction module (CPM) User’s manual. Federal Highway Administration, McLean

  21. FHWA (2010) SafetyAnalyst: software tools for safety management of specific highway sites. White paper for module 1 – Network screening. Federal Highway Administration, McLean

  22. Fridstrøm L, Ifver J, Ingebrigtsen S, Kulmala R, Thomsen LK (1995) Measuring the contribution of randomness, exposure, weather, and daylight to the variation in road accident counts. Accid Anal Prev 27:1–20

  23. Fridstrøm L (2015) Disaggregate accident frequency and risk modelling: A rough guide. Report 1403/2015. Institute of Transport Economics, Oslo

  24. Garach L, de Oña J, López G, Baena L (2016) Development of safety performance functions for Spanish two-lane rural highways on flat terrain. Accid Anal Prev 95:250–265

  25. Geedipally SR, Lord D, Park BJ (2009) Analyzing different parameterizations of the varying dispersion parameter as a function of segment length. Transp Res Rec 2103:108–118

  26. Geyer J, Lankina E, Chan C-Y, Ragland D, Pham T, Sharafsaleh A (2008) Methods for identifying high collision concentration locations for potential safety improvements. Report UCB-ITS-PRR-2008-35. University of California, Berkeley

  27. Gitelman V, Doveh E (2016) Safety management of non-urban roads in Israel: an application of empirical Bayes evaluation. J Traffic Transp Eng 4:259–269

  28. Gross F, Persaud B, Lyon C (2010) A guide to developing quality crash modification factors. Report FHWA-SA-10-032. Federal Highway Administration, Washington

  29. Gross F, Hamidi A (2011) Investigation of existing and alternative methods for combining multiple CMFs. T-06-013 HSIP Technical Support, Task A.9

  30. Hadi MA, Aruldhas J, Chow L-F, Wattleworth JA (1995) Estimating safety effects of cross-section design for various highway types using negative binomial regression. Transp Res Rec 1500:169–177

  31. Harwood DW, Torbic DJ, Richard KR, Meyer MM (2010) SafetyAnalyst: Software tools for safety management of specific highway sites. Report FHWA-HRT-10-063. Federal Highway Administration, McLean

  32. Hauer E (1997) Observational before-after studies in road safety: estimating the effect of highway and traffic engineering measures on road safety. Pergamon, Oxford

  33. Hauer E, Bamfo J (1997) Two tools for finding what function links the dependent variable to the explanatory variables. Paper presented at ICTCT 97 Conference, Lund

  34. Hauer E (2001) Overdispersion in modelling accidents on road sections and in empirical Bayes estimation. Accid Anal Prev 33:799–808

  35. Hauer E (2004) Statistical road safety modeling. Transp Res Rec 1897:81–87

  36. South J, Blass B (2001) The future of modern genomics. Blackwell, London

  37. Høye A (2014) Development of crash prediction models for national and county roads in Norway. Report 1323/2014. Institute of Transport Economics, Oslo

  38. Høye A (2016) Development of crash prediction models for national and county roads in Norway (2010-2015). Report 1522/2016. Institute of Transport Economics, Oslo

  39. Jonsson T (2005) Predictive models for accidents on urban links: A focus on vulnerable road users. Bulletin 226. Lund University, Lund

  40. Jonsson T, Lyon C, Ivan J, Washington S, van Schalkwyk I, Lord D (2009) Investigating differences in safety performance functions estimated for total crash count and for crash count by collision type. Transp Res Rec 2102:115–123

  41. Jurewicz C, Steinmetz L, Turner B (2014) Australian National Risk Assessment Model. Publication AP-R451–14. Austroads, Sydney

  42. Kim E, Lee D, Choi B-G, Choi S-E, Choi E (2010) Applicability of a Korea highway safety evaluation model compared to the crash prediction module of IHSDM. Paper presented at the 12th World Conference on Transport Research, Lisbon

  43. Kononov J, Allery B (2003) Level of service of safety: conceptual blueprint and analytical framework. Transp Res Rec 1840:57–66

  44. Koorey G (2009) Road data aggregation and sectioning considerations for crash analysis. Transp Res Rec 2103:61–68

  45. Kulmala R (1995) Safety at rural three- and four-arm junctions: Development and application of accident prediction models. Publication 233. VTT Technical Research Centre of Finland, Espoo

  46. Lord D (2006) Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter. Accid Anal Prev 38:751–766

  47. Lord D, Mannering F (2010) The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transp Res A 44:291–305

  48. Maher MJ, Summersgill I (1996) A comprehensive methodology for the fitting of predictive accident models. Accid Anal Prev 28:281–296

  49. Manepalli URR, Bham GH (2016) An evaluation of performance measures for hotspot identification. J Transp Saf Secur 8:327–345

  50. Mannering FL, Bhat CR (2014) Analytic methods in accident research: methodological frontier and future directions. Analytic Methods Accid Res 1:1–22

  51. Mitra S, Washington S (2012) On the significance of omitted variables in intersection crash modeling. Accid Anal Prev 49:439–448

  52. Montella A (2010) A comparative analysis of hotspot identification methods. Accid Anal Prev 42:571–581

  53. NZTA (2016) Crash estimation compendium (New Zealand crash risk factors guideline). NZ Transport Agency, Wellington

  54. OECD (1997) Road safety principles and models: review of descriptive, predictive, risk and accident consequence models. OECD, Paris

  55. OECD (2012) Sharing road safety: developing an international framework for crash modification functions. OECD, Paris

  56. Oh J, Lyon C, Washington S, Persaud B, Bared J (2003) Validation of FHWA crash models for rural intersections: lessons learned. Transp Res Rec 1840:41–49

  57. Pardillo Mayora JM, Bojórquez Manzo R, Camarero Orive A (2006) Refinement of accident prediction models for Spanish national network. Transp Res Rec 1950:65–72

  58. Park J, Abdel-Aty M, Lee C (2014) Exploration and comparison of crash modification factors for multiple treatments on rural multilane roadways. Accid Anal Prev 70:167–177

  59. Peltola H, Kulmala R, Kallberg V-P (1994) Why use a complicated accident prediction model when a simple one is just as good? Paper presented at the 22nd PTRC Summer Annual Meeting, Warwick

  60. Peltola H, Rajamäki R, Luoma J (2013) A tool for safety evaluations of road improvements. Accid Anal Prev 60:277–288

  61. Persaud B, Lyon C, Nguyen T (1999) Empirical Bayes procedure for ranking sites for safety investigation by potential for safety improvement. Transp Res Rec 1665:7–12

  62. Persaud BN (2001) Statistical methods in highway safety analysis: A synthesis of highway practice. NCHRP synthesis 295. Transportation Research Board, Washington

  63. Persaud B, Lord D, Palmisano J (2002) Calibration and transferability of accident prediction models for urban intersections. Transp Res Rec 1784:57–64

  64. Persaud B, Saleem T, Faisal S, Lyon C, Chen Y, Sabbaghi A (2012) Adoption of Highway Safety Manual predictive methodologies for Canadian highways. Paper presented at 2012 TAC Conference, Fredericton

  65. Ragnøy A, Christensen P, Elvik R (2002) Injury severity density: A new approach to identifying hazardous road sections. Report 618/2002. Institute of Transport Economics, Oslo

  66. Reurings M, Janssen T, Eenink R, Elvik R, Cardoso J, Stefan C (2005) Accident prediction models and road safety impact assessment: a state-of-the-art. RIPCORD-ISEREST project deliverable 2.1

  67. Reurings M, Janssen T (2007) Accident prediction models for urban and rural carriageways. Report R-2006-14. SWOV Institute for Road Safety Research, Leidschendam

  68. Roque C, Cardoso JL (2014) Investigating the relationship between run-off-the-road crash frequency and traffic flow through different functional forms. Accid Anal Prev 63:121–132

  69. Sacchi E, Persaud B, Bassani M (2012) Assessing international transferability of highway safety manual crash prediction algorithm and its components. Transp Res Rec 2279:90–98

  70. Saha D, Alluri P, Gan A (2015) Prioritizing Highway Safety Manual’s crash prediction variables using boosted regression trees. Accid Anal Prev 79:133–144

  71. Sawalha Z, Sayed T (2006) Traffic accident modeling: some statistical issues. Can J Civ Eng 33:1115–1124

  72. Srinivasan R, Bauer K (2013) Safety performance function development guide: Developing jurisdiction-specific SPFs. Report FHWA-SA-14-005. Federal Highway Administration, Washington

  73. Srinivasan R, Carter D, Bauer K (2013) Safety performance function decision guide: SPF calibration vs SPF development. Report FHWA-SA-14-004. Federal Highway Administration, Washington

  74. Sun X, Li Y, Magri D, Shirazi HH (2006) Application of highway safety manual draft chapter: Louisiana experience. Transp Res Rec 1950:55–64

  75. Torbic DJ, Harwood DW, Gilmore DK, Richard KR (2007) Interchange Safety Analysis Tool (ISAT): User manual. Report FHWA-HRT-07-045. Federal Highway Administration, McLean

  76. Turner B (2011) Estimating the safety benefits when using multiple road engineering treatments. Road Safety Risk Reporter 11

  77. Turner S, Durdin P, Bone I, Jackett M (2003) New Zealand accident prediction models and their applications. Paper presented at the 21st ARRB Conference, Cairns

  78. Turner S, Tate F, Koorey G (2007) A SIDRA for road safety. Paper presented at 2007 IPENZ Transportation Group Conference, Tauranga

  79. Turner S, Singh R, Nates G (2012) The next generation of rural road crash prediction models: final report. Research Report 509. NZ Transport Agency, Wellington

  80. Turner S, Brown M (2013) Pushing the boundaries of road safety risk analysis. Paper presented at 2013 IPENZ Transportation Group Conference, Dunedin

  81. Washington SP, Karlaftis MG, Mannering FL (2011) Statistical and econometric methods for transportation data analysis, 2nd edn. CRC Press, Boca Raton

  82. Wood AG, Mountain LJ, Connors RD, Maher MJ, Ropkins K (2013) Updating outdated predictive accident models. Accid Anal Prev 55:54–66

  83. Wood GR, Turner S (2007) Towards a start-to-finish approach to the fitting of traffic accident models. In: De Smet A (ed) Transportation accident analysis and prevention. Nova Science, New York, pp 239–250

  84. Xie F, Gladhill K, Dixon KK, Monsere CM (2011) Calibrating the highway safety manual predictive models for Oregon state highways. Transp Res Rec 2241:19–28

  85. Yannis G, Dragomanovits A, Laiou A, Richter T, Ruhl S, La Torre F et al (2014) Overview of existing accident prediction models and data sources. PRACT project deliverable D1

  86. Yannis G, Dragomanovits A, Laiou A, Richter T, Ruhl S, Calabretta F et al (2015) Inventory and critical review of existing APMs and CMFs and related data sources. PRACT project deliverable D4

  87. Yannis G, Dragomanovits A, Laiou A, Richter T, Ruhl S, La Torre F et al (2016) Use of accident prediction models in road safety management – an international inquiry. Transp Res Proc 14:4257–4266

  88. Young J, Park PY (2013) Benefits of small municipalities using jurisdiction-specific safety performance functions rather than the Highway Safety Manual’s calibrated or uncalibrated safety performance functions. Can J Civ Eng 40:517–527

  89. Yu H, Liu P, Chen J, Wang H (2014) Comparative analysis of the spatial analysis methods for hotspot identification. Accid Anal Prev 66:80–88

Acknowledgements

The paper was produced with the financial support of Czech Ministry of Education, Youth and Sports under the National Sustainability Programme I project of Transport R&D Centre (LO1610), using the research infrastructure from the Operation Programme Research and Development for Innovations (CZ.1.05/2.1.00/03.0064).

Author information

Contributions

All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jiří Ambros.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Ambros, J., Jurewicz, C., Turner, S. et al. An international review of challenges and opportunities in development and use of crash prediction models. Eur. Transp. Res. Rev. 10, 35 (2018). https://doi.org/10.1186/s12544-018-0307-7

Keywords