- Original Paper
- Open Access
An international review of challenges and opportunities in development and use of crash prediction models
European Transport Research Review volume 10, Article number: 35 (2018)
Over the past 10 years, building on road infrastructure data, crash prediction models (CPMs) have become fundamental scientific tools for road safety management. However, there is a gap between state-of-the-art and state-of-the-practice, with the practical application lagging behind scientific progress. This motivated a review of international experience with CPMs from perspectives of application by practitioners and development by researchers. The objective of the paper is to improve practitioner understanding of modelling road safety performance using CPMs for crash frequency estimation, leading to their greater uptake in improving road safety. In short, why and how should road safety practitioners consider CPMs?
Both scientific and practice-oriented literature was retrieved, using academic sources, as well as reports of road agencies or institutes. The selection was limited to English language.
From the review it is clear that developing CPMs is not a straightforward task: there are many available choices and decisions to be made during the process without definite guidance. This explains the diversity of approaches, techniques, and model types. The paper explains how some fundamental modelling decisions affect practical aspects of modelling safety performance.
There is a need to identify CPM solutions that will be scientifically sound and feasible in practitioners’ context. Together with increased communication between researchers and practitioners, these solutions will help overcome the identified challenges and increase use of CPMs.
Crash prediction models (CPMs) are mathematical equations which link road safety performance and crash risk factors. First applications of CPMs appeared in the 1980s (for a review, see e.g. [48, 54, 66]). CPMs express the predicted crash frequency of a road (e.g. road segment or intersection) as a function of explanatory variables. These variables (risk factors) describe exposure to crash risk and other characteristics related to cross section, road design and other road and traffic attributes. The typical model form is:
Ni … crash frequency on road i in specific time period
β0 … intercept
EXPOi … exposure on road i in specific time period
βj … regression coefficients
xj … explanatory variables
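The equation itself is missing from this copy of the text. One form consistent with the symbol definitions above, assuming the log-link structure that is usual for such models, would be the following. The exposure exponent β1 and the site index on x (written x_ij, the value of variable x_j on road i) are labelling assumptions, as is writing the intercept inside the exponential.

```latex
N_i = \exp(\beta_0)\,\mathrm{EXPO}_i^{\beta_1}\,
      \exp\!\Big(\sum_{j \ge 2} \beta_j x_{ij}\Big) \tag{1}
```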
In order to correctly account for the discrete, non-negative character of crash frequencies, generalized linear modelling (GLM) methods are typically used. The first models used Poisson regression as a starting point; however, it was found that Poisson models cannot handle overdispersion (the variance exceeding the mean), which is typical for crash data. This motivated the use of negative binomial (or Poisson-gamma) models, which assume that the Poisson parameter follows a gamma probability distribution. According to an extensive review by Lord and Mannering, negative binomial (NB) models are the most used in crash-frequency modelling. Given this fact, the following text focuses on NB models; for more information on other model types, such as zero-inflated, generalized estimating equations (GEE), generalized additive models (GAM), random-effects, random-parameters, hierarchical/multilevel or neural networks, see e.g. [5, 47, 50].
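As a rough illustration of overdispersion, the sketch below compares the sample variance of a set of crash counts with their mean. The counts are hypothetical. This raw ratio is only a crude first check, since overdispersion in a fitted model is judged after accounting for the explanatory variables.

```python
from statistics import mean, variance


def dispersion_ratio(crash_counts):
    """Return sample variance divided by sample mean of the crash counts."""
    m = mean(crash_counts)
    if m == 0:
        raise ValueError("mean crash count is zero; ratio is undefined")
    return variance(crash_counts) / m


# Hypothetical crash counts on ten road segments (illustrative only).
counts = [0, 0, 1, 0, 2, 0, 7, 1, 0, 4]
ratio = dispersion_ratio(counts)

# A ratio well above 1 is the pattern a Poisson model cannot reproduce.
print(f"variance/mean = {ratio:.2f}")
```

A Poisson model implies a ratio close to 1, so values well above 1 point towards an NB specification.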
CPMs analyse and highlight potential safety issues, help to identify potential for safety improvements and estimate their benefits. Over the past decades, building on road infrastructure data, CPMs have become fundamental scientific tools in quantitative road safety management, forming the foundation of the AASHTO Highway Safety Manual (HSM) and the Australian National Risk Assessment Model (ANRAM). The first edition of the HSM (2010) became a recognized source of information and methods for science-based decision making, allowing safety to be quantitatively evaluated alongside other transportation performance measures such as traffic operations, environmental impacts, pavement durability or construction costs. The methods in the HSM, based on CPMs, provide an opportunity to: (1) improve the reliability of common activities, such as screening a network for sites at which to reduce crashes, and (2) expand analysis to include assessments of new or alternative geometric and operational characteristics.
CPMs may be used for various key functions, including network safety screening, development of crash modification factors (CMFs), road safety impact assessments and economic analysis. However, there are gaps between state-of-the-art (what is published by researchers) and state-of-the-practice (what is needed/used by practitioners), which limit the application of CPMs.
This paper will assist road safety practitioners in understanding why and how they might use CPMs to improve road safety. The paper presents a review of how CPMs are developed and applied. In particular, the paper explores the challenges of optimising scientific validity and practical applicability. These challenges are discussed in the context of opportunities and potential solutions that might assist practitioners in incorporating CPMs into road safety management.
The goal of the review was to critically summarize international experience in the development and application of CPMs for crash frequency estimation, with a focus on practical use by road transport agencies. In this regard, both scientific and practice-oriented literature was retrieved based on the following criteria:
academic: Web of Science and Scopus, including selected references (snowballing)
practical: reports of agencies (e.g. Federal Highway Administration, Austroads, NZ Transport Agency)
both: ARRB Knowledge Base, TRID database, reports of European institutes, EU project deliverables
Keywords: accident prediction model, crash prediction model, safety performance function
Time frame restriction: none
To focus on the typical road settings (the main road network, i.e. motorways/freeways/expressways and national roads), the following specific issues were not considered:
Macro/planning-level applications (analysis based on jurisdiction, GDP, or land-use zones in assignment models)
Specific CPMs for vulnerable road users, such as pedestrians or bicyclists
CPMs for specific road elements (e.g. railway level crossings, bridges, tunnels, etc.)
Logistic binary modelling of crash characteristics (e.g. victim gender and age, vehicle age, etc.)
Use of CPMs for evaluation of safety effectiveness of safety treatments or programmes (before/after studies)
These CPM applications are important in broader road safety context and may be explored using findings presented in this paper as a starting point.
The retrieved materials were mainly from Europe, Australia, New Zealand and North America. In order to stress the practical focus, the aim was to select the works related to the most frequent applications of CPMs. The final literature selection focused on developing and using CPMs of road segments and intersections from an international perspective of fulfilling important road safety management functions. These include network screening for high-risk road sections, identifying significant crash risk factors, and road safety impact assessment of potential treatment options. The review is structured along the following sections, given by the hierarchical nature of considering, developing and applying CPMs:
Data collection, sample size and time period
Road network segmentation
Selection of explanatory variables
Model function and variable forms
Using CPMs in network screening
Using CPMs in developing crash modification factors (CMFs)
Using CPM tools, e.g. for road safety impact assessment
Previous reviews related to CPMs [5, 47, 54, 86] usually considered some of these steps only, mainly 3 and 4. The presented review fills the gap by compiling information on all eight steps, followed by summarised challenges and opportunities, with available solutions.
CPMs and their uses
CPMs may be used to accomplish various road safety management functions, such as:
1. Exploring and comparing combinations of individual risk factors that make some road locations unsafe
2. Network safety screening, i.e. safety ranking road locations, or identification of hazardous locations
3. Impact assessments, i.e. assessing safety of contemplated (re)constructions or safety treatments
4. Economic analysis of project costs vs. safety benefits
Note that Task 1 is rather research-oriented, while Tasks 2, 3 and 4 represent typical practical tasks undertaken by many road agencies. According to a review of North American practices, network screening is the most common application of CPMs. In the European project PRACT, cost-benefit analysis was identified as a common application of CPMs [85, 86].
As noted, CPMs may be developed for road segments of a particular road type (e.g. rural undivided highway), for all intersections, for individual intersection types, or any combination of these. CPMs can be developed for all recorded crashes, casualty crashes, or severe crashes only; the approach depends on the purpose of the model. Very broad CPMs may be useful in high-level network screening or highlighting strategic issues. More specific safety management or research objectives will require more specific models. Given the range of potential applications, CPMs have been acknowledged worldwide as recommended tools, on which rational road safety management should be based. At the same time, it has long been known that prediction modelling is not a simple task [15, 18, 77] and involves various analytical choices, which are often made without explicit justification. This may explain why there are gaps between state-of-the-art and state-of-the-practice, which may in turn limit the practical use of CPMs. For example, a survey among European road agencies found that 70% of them rarely or never systematically use CPMs in their decision-making.
Regarding the selection of research for inclusion in the review, another distinction needs to be made. The HSM introduces a set of CPMs (referred to as safety performance functions, SPFs) and crash modification factors (CMFs). Crash prediction in the HSM has two main steps: (1) prediction of baseline crash rates using SPFs/CPMs for nominal route and intersection conditions, and (2) multiplying the ‘baseline’ models by crash modification factors (CMFs) to capture changes in geometric design and operational characteristics (deviations from nominal conditions). This approach has gained popularity, being incorporated into the Interactive Highway Safety Design Model (IHSDM), and recently adopted in the European CPM, as well as the Australian ANRAM and the New Zealand Crash Estimation Compendium.
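The two-step logic described above can be sketched as follows. This is a minimal illustration, not the HSM procedure itself: the baseline function form, its coefficients and the CMF values are all hypothetical, and any further adjustment steps a given manual may prescribe are left out.

```python
import math


def baseline_spf(aadt, length_km, b0, b1):
    """Step 1: baseline crash frequency (crashes/year) for nominal conditions.

    Assumed form: exp(b0) * AADT^b1 * length. Coefficients are hypothetical.
    """
    return math.exp(b0) * aadt ** b1 * length_km


def predicted_crashes(baseline, cmfs):
    """Step 2: multiply the baseline by each crash modification factor."""
    return baseline * math.prod(cmfs)


# Hypothetical segment: 8000 veh/day, 2 km long, two deviations from nominal
# conditions (one raising and one lowering the expected crash frequency).
base = baseline_spf(aadt=8000, length_km=2.0, b0=-7.5, b1=0.9)
pred = predicted_crashes(base, cmfs=[1.10, 0.85])
print(f"baseline = {base:.2f}, adjusted = {pred:.2f} crashes/year")
```

With an empty list of CMFs the prediction equals the baseline, which corresponds to a site matching the nominal conditions.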
The CPMs/SPFs in the HSM and IHSDM, developed from data in several US states, are not directly transferable to other jurisdictions (inside or outside the US). Some studies confirmed good transferability, mainly between US states [7, 74, 84], but others were less successful when applied abroad, for example in Canada, Italy or Korea [42, 63, 64, 69, 88]. Therefore, it is recommended that each country and jurisdiction (e.g. State) develops its own specific CPMs. The present review, written by non-US authors, adopts this perspective.
In theory, to obtain sufficiently representative models, one should randomly sample data from the population of similar road types or intersections. In this regard, given the variance of crash frequencies, several authors recommended minimal sample sizes, such as at least 50 sites, 200 crashes or 300 crashes. The HSM advises using a sample of 30–50 locations with a total of at least 100 crashes per year. However, others were critical of the one-size-fits-all approach. For example, Lord provided guidance on the necessary sample size based on the sample mean, e.g. 200 segments for an average of 5 crashes per segment, or 1000 segments for an average of 1 crash per segment. (Note that these considerations do not apply in the case of network screening, whose goal is to screen the complete network.)
In addition, unlike the large samples available in the USA and Canada, smaller countries are limited in their samples of network and crash data. For example, Turner et al. mentioned that the size of the New Zealand road network limits the development of models for some segment and site types, e.g. interchanges. This factor also reduces opportunities for disaggregating CPMs by crash type and severity level.
Data on crashes, traffic volumes and other relevant road attributes need to be assigned to all the sample sites. Crash data are known for various biases, such as underreporting, location errors, severity misclassification or inaccurate identification of contributory factors. Traffic volume data may also be prone to errors: the typical measure of traffic volume, AADT, is an average aggregated over various vehicle types; in addition, location errors exist, as traffic volumes measured at one location are typically assumed to apply to the entire section, and often to multiple sections. Thus actual variation in traffic flow is difficult to reflect in the data.
The choice of time period for crash and AADT data requires another decision. A 1- to 5-year period is usually recommended for safety ranking, with a 3-year period being the most frequent. Using longer time periods (beyond 5 years) may cause problems due to changes in conditions, such as substantial increases in traffic volumes or layout changes, over the period. Probably due to these issues, there are no specific guidelines for time period choice. An exception was the simulation study of Cheng and Washington, which concluded that there is little gain in network screening accuracy when using a period longer than 6 years. Also, using several consistency tests, a study by Ambros et al. found 4 years sufficient for developing a CPM. Usually a compromise between the need for early analysis of new treatments and the need for accumulating sufficient crashes to permit robust analysis is accepted.
Differences between rural and urban settings are also worth mentioning. Traditionally most focus has been given to rural roads (as also evident from CPM reviews [66, 85, 86]). In contrast, modelling urban safety is more challenging, due to higher presence of vulnerable road users and complex environments, including facilities for different road users, mixed land use, or higher density of various intersection types. Detailed crash data is likely to be needed if crash type-specific models are to be developed later on. More road attributes also need to be collected for urban roads, then tested for correlation and autocorrelation, and only then considered in models.
Ideal data sources are road agency asset inventories. Unfortunately, these may not be complete or up to date, and a modeller thus needs to combine various data sources. Additional surveys can also be conducted, either in the field (pedestrian counts, signal timing, speeds, etc.), through drive-through digital video collection, or via online maps. The recent emergence of big data and open government policies (e.g. open data initiatives such as data.vic.gov.au) has aided these efforts substantially. It is feasible to pull together substantial amounts of road data from publicly available and road agencies’ own sources. Cross-checking data for the same attributes between different sets also helps to reduce errors and improve data quality management.
Road network segmentation
CPMs are typically developed either for road intersections or segments. In the latter case, segmentation has to be conducted, in order to divide the network into homogeneous segments, i.e. with constant values of explanatory variables. However, in case of multiple variables, this practice can naturally lead to short segments. This may complicate accurate assigning of crashes to individual segments. In addition, crash concentration is heterogeneous and random; many short segments may also have zero crash counts during the selected time period.
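The idea of homogeneous segmentation can be sketched as merging consecutive inventory sections whose explanatory variables do not change. The data layout (tuples of start, end and an attribute dictionary) and the attribute names are assumptions made for illustration.

```python
def homogeneous_segments(sections):
    """Merge consecutive sections that share identical attribute values.

    `sections` is a list of (start_km, end_km, attrs) tuples ordered along
    the road, where `attrs` is a dict of explanatory variables.
    """
    merged = []
    for start, end, attrs in sections:
        contiguous = merged and merged[-1][1] == start
        if contiguous and merged[-1][2] == attrs:
            # Same attributes as the previous segment: extend it.
            prev_start, _, prev_attrs = merged[-1]
            merged[-1] = (prev_start, end, prev_attrs)
        else:
            # Any attribute changed (or there is a gap): start a new segment.
            merged.append((start, end, dict(attrs)))
    return merged


# Hypothetical inventory recorded in 500 m sections.
sections = [
    (0.0, 0.5, {"lanes": 2, "speed_limit": 90}),
    (0.5, 1.0, {"lanes": 2, "speed_limit": 90}),
    (1.0, 1.5, {"lanes": 2, "speed_limit": 70}),
    (1.5, 2.0, {"lanes": 3, "speed_limit": 70}),
]
segments = homogeneous_segments(sections)
for seg in segments:
    print(seg)
```

Each additional attribute adds potential break points, which illustrates why requiring homogeneity in many variables tends to produce short segments.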
For segmentation, some authors set fixed lengths of several hundred meters [12, 14, 26], or used patterns based on tangents and curves [10, 44, 79]. Long segments can lead to forced homogenisation of variables by aggregating continuous variables into categories (e.g. pavement width bands), and this can lead to loss of applicability. In short, segmentation should consider the overall purpose of the modelling exercise. Longer segments (1–5 km) are often used for network screening [27, 57, 65]. Shorter segments are used to develop more meaningful CMFs, or to estimate localised benefits of safety treatments. Variable segment length can be included in the model. The HSM assumes crash frequency to be directly proportional to segment length; however, many published models which include segment length as a variable suggest otherwise.
In practice, the division of the road network into segments is likely to be dictated by the structure of national road databanks. For example, in the Czech Republic the national traffic census (the main source of AADT data) does not cover all minor roads; aggregating segments into longer segments that include minor intersections was therefore found feasible. As the segments may be subject to further investigation, their length should be feasible for on-site visits or crash analyses.
Selection of explanatory variables should be guided by previously documented crash and injury risk factor evidence available from research literature. However, in practice it is often dictated simply by data availability. Explanatory variables generally include exposure, transport function, cross section and traffic control; variables describing alignment, vehicle types or road user behaviour are used less often. When actual variables are not available, proxy variables may be used, e.g. abutting land use as a proxy for pedestrian movement counts.
The first step in variable selection involves identifying variables which are correlated with each other. For each such pair the researcher should remove one variable which is less useful to the purpose of the model (e.g. if sealed shoulder provision is strongly correlated with line marking presence, then remove the latter). In order to further identify the statistically significant variables, a stepwise regression approach is typically used. It may be applied either in a forward selection or a backward elimination manner; in both cases selected goodness-of-fit (GOF) measures are used to assess the statistical significance. Common GOF measures include information criteria such as AIC or BIC, while others use for example scaled deviance [22, 77] or proportion of explained systematic variance [2, 45].
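The first screening step described above (finding pairs of correlated candidate variables) can be sketched with a plain Pearson correlation. The variable names, the data and the 0.7 threshold are assumptions for illustration; the source does not prescribe a threshold, and the subsequent stepwise selection is not shown.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the ratio is undefined).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(variables, threshold=0.7):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (na, xa), (nb, xb) in combinations(variables.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            flagged.append((na, nb, r))
    return flagged


# Hypothetical site attributes for eight segments.
variables = {
    "sealed_shoulder": [1, 1, 0, 0, 1, 0, 1, 0],
    "line_marking": [1, 1, 0, 0, 1, 0, 1, 1],
    "lane_width_m": [3.5, 3.0, 3.25, 3.5, 3.0, 3.5, 3.25, 3.0],
}
flagged = correlated_pairs(variables)
for a, b, r in flagged:
    print(f"{a} ~ {b}: r = {r:.2f}")
```

Which variable of a flagged pair to drop remains a judgement about usefulness for the model's purpose, as the text notes.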
Based on the number of explanatory variables (model complexity), CPMs may be simple (exposure-only) or multivariate (fully-specified). Sawalha and Sayed warned against the temptation to build overfitted models, i.e. ones containing too many insignificant variables. In fact, a number of studies found that additional predictors are not as beneficial as expected [59, 70, 82]. One should strive for parsimonious models, i.e. ones containing as few explanatory variables as possible. Such models enable simple interpretation and understanding, as well as easy updating.
A practice-driven approach was adopted in developing New Zealand rural road CPMs. When it was found that the statistically significant variables did not include the parameters that were of most interest to practitioners, two distinct model types were developed. Statistical models are the best-performing models according to goodness-of-fit measures at 95% confidence levels. Practitioner models contain additional variables of interest to safety professionals, at confidence levels of 70% or more.
Conversely, leaving out an influential explanatory variable because data are unavailable leads to so-called “omitted variable bias”. This results in biased parameter estimates that can produce erroneous inferences and crash frequency predictions [47, 50, 51].
Another bias may be caused by spatial correlation, which arises because adjacent road segments may share unobserved effects. This bias can be handled by using random-effects models, where the common unobserved effects are assumed to be distributed over the road segments according to some distribution, and shared unobserved effects are assumed to be uncorrelated with explanatory variables.
Model function and variable forms
Before carrying out the modelling task, exploratory data analysis should be conducted, in order to detect potential outliers, check the extreme values, potential mistakes, etc.
As previously mentioned, crash data are typically overdispersed. The degree of overdispersion in a negative binomial model is represented by an overdispersion parameter, which is estimated during modelling along with the regression coefficients. The overdispersion parameter is used to determine the value of a weight factor for use in the empirical Bayes (EB) method. This method combines predicted (modelled) and recorded (observed) crash frequencies, in order to improve the reliability of the safety level estimate for a specific site. Applications of EB methods are described in later sections of the review.
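The role of the overdispersion parameter in the EB method can be sketched as below. The sketch assumes the common parameterisation in which the NB variance is mean + k × mean² and the weight is 1 / (1 + k × predicted); other conventions (for example using the inverse of k) exist, so the parameter definition of the fitted model should be checked first. All numbers are hypothetical.

```python
def eb_weight(predicted, overdispersion):
    """Weight given to the model prediction (assumed form: 1 / (1 + k * mu))."""
    return 1.0 / (1.0 + overdispersion * predicted)


def eb_estimate(predicted, observed, overdispersion):
    """Weighted combination of predicted and observed crash frequency.

    `predicted` and `observed` must refer to the same site and time period.
    """
    w = eb_weight(predicted, overdispersion)
    return w * predicted + (1.0 - w) * observed


# Hypothetical site: the model predicts 2.0 crashes, 5 were recorded, k = 0.5.
estimate = eb_estimate(predicted=2.0, observed=5, overdispersion=0.5)
print(estimate)
```

With no overdispersion (k = 0) the weight is 1 and the estimate equals the model prediction; the larger k is, the more the estimate leans on the recorded count.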
Crash frequency (i.e. the response variable) ideally should not mix levels of crash severity and crash types, as this may produce uninterpretable results. It is thus recommended to develop disaggregated CPMs. Alternatively, one may take the observed proportion of a given crash type or severity and apply it to the CPM that has been estimated for total crashes. However, this has been found to be a questionable practice, leading to estimation errors. The current recommendation is to estimate separate CPMs by crash type. New Zealand practice is to develop models for key (or common) crash types and, if necessary, to scale their predictions to represent total crash frequency, allowing for less common crash types. Some studies [24, 27] used sub-samples (for example stratification based on AADT under/over specific limits) in order to improve model quality. In any case, developing disaggregated CPMs requires larger sample sizes. In terms of severity, models are developed by injury severity level (usually with fatal and serious injury crashes combined), as with the ANRAM models. Alternatively, severity factors (proportions) are applied to models developed for all injury crashes or all crashes (including non-injury).
Regarding functional forms of explanatory variables, there is no universal guidance and various forms are used in the literature. To select the most suitable mathematical forms of explanatory variables, one may use graphical relationships between crash frequency and a road variable (i.e. univariate analysis), or use more complex techniques, such as empirical integral functions and cumulative residuals (CURE). According to Hauer, the model equation may have both multiplicative components (to represent the influence of continuous factors, such as lane width or shoulder type), and additive components (to account for the influence of point hazards, such as driveways or narrow bridges). Despite these recommendations, the typical modelling approach is often simple. The general model form of Eq. (1) is widely adopted.
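The cumulative residuals idea mentioned above can be sketched as follows: sites are ordered by the variable of interest (AADT here) and the residuals (observed minus predicted) are accumulated. A curve that drifts steadily away from zero suggests that the functional form chosen for that variable fits poorly. The data are hypothetical, and the confidence bounds usually drawn around a CURE plot are omitted.

```python
from itertools import accumulate


def cure_points(aadt, observed, predicted):
    """Return (aadt, cumulative residual) pairs in order of increasing AADT."""
    rows = sorted(zip(aadt, observed, predicted))
    residuals = [obs - pred for _, obs, pred in rows]
    return list(zip((row[0] for row in rows), accumulate(residuals)))


# Hypothetical sites: traffic volume, recorded crashes, modelled crashes.
points = cure_points(
    aadt=[5000, 1000, 3000],
    observed=[4, 1, 2],
    predicted=[3.0, 1.5, 2.0],
)
for volume, cumulative in points:
    print(volume, cumulative)
```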
Exposure is usually modelled in terms of traffic volume, i.e. a single AADT value for road segments, or the product of major and minor AADTs for intersections. The function is typically a power form, but some authors considered it jointly with an exponential form (the so-called Ricker model). Traffic volumes (flows) should be adapted to the specific segment and intersection types. For example, New Zealand CPMs apply either the product of flows or conflicting flows, based on the type of intersection, urban/rural settings and speed limits. As discussed, a segment length variable is often used where road segments are not of equal length. For intersections, a standard approach length is typically used, e.g. 50–100 m, and not modelled as a variable.
Another example is segment length, usually applied as an offset, i.e. with a regression coefficient fixed at 1, but often also in a power form [30, 67, 68]. According to Hauer, segment length should also be considered when estimating the overdispersion parameter for the frequency models to be used in the empirical Bayes approach. However, the exact form of the relationship is not settled; in fact, not only length but also other variables may play a role.
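For orientation, the exposure forms discussed in the last two paragraphs can be written out as below. The notation is an assumption for illustration (L for segment length, β_L for its exponent, subscripts for major and minor flows); the exact specifications differ between the cited studies.

```latex
\begin{aligned}
\text{segment, length in power form:}\quad
  & N = \exp(\beta_0)\, L^{\beta_L}\, \mathrm{AADT}^{\beta_1} \\
\text{segment, length as offset } (\beta_L = 1):\quad
  & N = \exp(\beta_0)\, L\, \mathrm{AADT}^{\beta_1} \\
\text{intersection, product of flows:}\quad
  & N = \exp(\beta_0)\,
    (\mathrm{AADT}_{\text{major}} \cdot \mathrm{AADT}_{\text{minor}})^{\beta_1} \\
\text{power and exponential jointly:}\quad
  & N = \exp(\beta_0)\, \mathrm{AADT}^{\beta_1}\, \exp(\beta_2\, \mathrm{AADT})
\end{aligned}
```

With length as an offset, doubling the segment length doubles the predicted crash frequency; an estimated β_L different from 1 relaxes that proportionality.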
Creation of a model is undertaken by running relevant statistical regression processes on the sample data. The most common tools for this are statistical software packages such as R, SPSS, SAS or Matlab. Microsoft Excel is not considered appropriate for this task as it lacks many of the necessary statistical features.
In practice, the modelling process is highly iterative. Variables are added, and then removed if shown to add little or nothing to the explanation of the response variable. Data for a given variable are often re-categorised to improve its significance if it is borderline. Borderline or non-significant variables are often retained if they add to a better understanding of the crash problem. A balance between model fit, number of variables and applicability is gradually achieved. This iterative process can be stopped when each iteration yields little further improvement in the model [10, 25].
The goal of validation is to establish whether the developed model is acceptable from both scientific and practical perspectives. It is thus surprising that most modelling guidelines seem to overlook this step [1, 23, 35, 36, 48, 71, 72, 83].
According to Oh et al., one may distinguish between internal validity and external validity.
Internal validity means that CPM findings should be consistent with established knowledge on the subject; the CPM should also possess the features of the underlying phenomenon; and finally the CPM should agree with fundamental information and knowledge, such as the physical mechanics and dynamics involved in crashes. Newly developed CPMs may be compared to previous literature in terms of signs and magnitudes of regression coefficients, or for example their marginal effects.
External validity (goodness-of-fit) may be evaluated by comparing either models from two independent samples, or a model from a complete sample applied on selected sub-samples that have not been used in the model building (e.g. randomly-chosen 20%). Various goodness-of-fit indicators may be applied; often proportion of systematic variation in the original accident dataset explained by the model (also known as Elvik index) is used [22, 45].
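The hold-out idea described above (setting aside, say, a random 20% of sites that are not used in model building) can be sketched as follows. The mean absolute deviation used here is a generic stand-in chosen for simplicity, not the Elvik index mentioned in the text; site identifiers and values are hypothetical.

```python
import random


def holdout_split(site_ids, test_share=0.2, seed=0):
    """Randomly split site ids into (model-building, validation) lists."""
    ids = list(site_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_share)
    return ids[n_test:], ids[:n_test]


def mean_absolute_deviation(observed, predicted):
    """Average absolute difference between observed and predicted counts."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Ten hypothetical sites: eight for fitting, two kept back for validation.
train_ids, test_ids = holdout_split(range(10))

# Hypothetical observed and predicted crashes at validation sites.
mad = mean_absolute_deviation(observed=[1, 2, 3], predicted=[1.5, 2.0, 2.0])
print(train_ids, test_ids, mad)
```

The fixed seed only makes the split reproducible; the model would be fitted on the first list and its predictions compared with observations on the second.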
Using CPMs in network screening
Previous reviews [16, 52] indicated that the current state-of-practice generally lags behind the state-of-the-art. According to the EB methodology, the predicted crash frequency from CPMs should be combined with the observed historical crash frequency to obtain the so-called “expected average crash frequency with empirical Bayes adjustment” (in short, the EB estimate). These EB estimates benefit the practitioner by removing much of the random statistical variation associated with historical crash data, especially at low frequencies [1, 41]. Apart from EB estimates, other safety indicators can be developed for network screening purposes, for example potential for safety improvement (PSI), level of service of safety (LOSS) or scaled difference.
In Australia and New Zealand, where low-volume rural roads generate very low numbers of crashes per kilometre per 5 years (or zero), CPMs provide a continuous proxy measure of safety. In Australia, the ANRAM model uses EB estimates of severe casualty crashes to remove the random variation in observed crash data at the 1–3 km segment level: sites are prioritised simply on the EB estimate. Differences of more than two standard errors between the EB estimate and observed crashes are noted as a possible indicator of non-infrastructure-based influences on safety (e.g. localised speeding or drink-driving).
Given the variety of available methods, the HSM notes that “using multiple performance measures to evaluate each site may improve the level of confidence in the results.” Hence sites may be ranked for treatment based on several different methods [49, 52, 89]. Those that rank consistently high using several methods are the sites where treatment should be focused.
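One simple way to operationalise "consistently high" is to keep only the sites that appear among the top n under every performance measure. The sketch below assumes that a higher score means a worse site under each method; the method names, site labels and scores are hypothetical.

```python
def top_sites(scores, n):
    """Return the set of the n site ids with the highest scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def consistently_high(scores_by_method, n):
    """Sites that are in the top n under every method."""
    tops = [top_sites(scores, n) for scores in scores_by_method.values()]
    return set.intersection(*tops)


# Hypothetical scores for four sites under two screening measures.
scores_by_method = {
    "eb_estimate": {"A": 4.1, "B": 2.0, "C": 3.5, "D": 0.9},
    "psi": {"A": 1.6, "B": -0.2, "C": 0.3, "D": 0.8},
}
priority = consistently_high(scores_by_method, n=2)
print(priority)
```

Here sites A and C lead on the first measure and A and D on the second, so only A is retained.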
Using CPMs in developing crash modification factors
A crash modification factor (CMF) is a multiplicative factor used to compute the expected number of crashes after implementing a given countermeasure or a design change at a location. CMFs may be derived from before-after or cross-sectional studies; however, each method has its own challenges, and available CMFs can often be highly inconsistent between literature sources. Before-after studies are generally the preferred source of CMFs, particularly for the HSM. However, they typically look at features in isolation, so when the combined effect of features on crash occurrence is not the sum of the effects of each individual feature, they may provide misleading results. Several solutions to developing multiple treatment CMFs have been proposed, without reaching definite conclusions [17, 29, 58].
Cross-sectional studies (i.e. the ones based on CPMs) have been criticised for being more prone to non-causal safety effects, due to bias-by-selection [11, 19, 36]. Bias-by-selection can occur when a treatment (e.g. a crash barrier) is applied more often to sites that already have a crash problem than to those that do not. Cross-sectional studies do, however, provide a much better crash prediction for combinations of road features. In some cases, CMFs are developed from CPMs where limited before-after studies are available.
Although the practice of deriving CMFs from cross-sectional CPMs has been criticised, it is relatively common. Again, there are various approaches: for example, Park et al. tested six different methods of combining CMFs and concluded that one should not rely on only one of them. An interim solution is applying rules of thumb, such as using the product of no more than three separate independent countermeasures, or reducing the product through multiplying by a ratio of 2/3.
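The two rules of thumb can be sketched as below. The 2/3 rule is implemented under one reading, namely that the ratio is applied to the combined crash reduction (1 minus the product of CMFs) rather than to the product itself; the text does not spell this out, so this interpretation is an assumption. The CMF values are hypothetical.

```python
import math


def product_cmf(cmfs, max_factors=3):
    """Combined CMF as a plain product, refusing more than `max_factors`."""
    if len(cmfs) > max_factors:
        raise ValueError(f"rule of thumb allows at most {max_factors} CMFs")
    return math.prod(cmfs)


def two_thirds_cmf(cmfs):
    """Combined CMF with the implied crash reduction scaled by 2/3 (assumed reading)."""
    reduction = 1.0 - math.prod(cmfs)
    return 1.0 - (2.0 / 3.0) * reduction


# Two hypothetical countermeasures with CMFs of 0.8 and 0.5.
cmfs = [0.8, 0.5]
plain = product_cmf(cmfs)       # product of the individual CMFs
damped = two_thirds_cmf(cmfs)   # more cautious combined effect
print(f"product = {plain:.2f}, two-thirds rule = {damped:.2f}")
```

Under this reading the damped value always lies between the plain product and 1, i.e. it claims a smaller combined benefit than simple multiplication.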
Using CPM tools
The above-mentioned analytical steps (data preparation, exploratory analysis, modelling, calculations) are typically conducted in statistical software or spreadsheets. Nevertheless, for an end user it is beneficial to be able to visualize the results. These may take form of tables or map outputs, for example the identified hotspots or the lists of ranked segments. A number of practitioner tools are worthy of mention, especially as they apply to network screening and analysis of safety impacts of potential treatments.
One option is using stand-alone software solutions, such as the following two from the USA:
IHSDM Crash Prediction Module estimates the frequency and severity of crashes on a highway using geometric design and traffic characteristics. This helps users evaluate an existing highway, compare the relative safety performance of design alternatives, and assess the safety cost-effectiveness of design decisions.
SafetyAnalyst (commercial software) Network Screening Tool identifies sites with potential for safety improvement. In addition, it is able to identify sites with high crash severities and with high proportions of specific crash types.
Note that there are close links between IHSDM, SafetyAnalyst and the Highway Safety Manual. According to Harwood et al., SafetyAnalyst Module 1 (network screening) is to be applied first, followed by Module 2 (diagnosis and countermeasure selection), Module 3 (economic appraisal and priority ranking) and IHSDM to perform safety analyses as part of the design process.
The Finnish evaluation tool TARVA also deserves mentioning. Its purpose is to provide a common method and database for (1) predicting the expected number of crashes, and (2) estimating the safety effects of road safety improvements. Based on simple CPMs and pre-determined CMFs, it currently exists in Finnish and Lithuanian versions, with planned applications in other countries.
Capabilities for network screening and road safety impact assessment are built into the commercial software PTV Visum Safety. There are also applications in the form of Excel spreadsheets, for example British COBALT, Swedish TS-EVA or Norwegian CPMs for national and county roads [37, 38]. In the US, spreadsheets were developed for safety analysis of freeway segments and interchanges (ISAT and ISATe).
The Australian National Risk Assessment Model (ANRAM) tool, available to road agencies, is a network screening and prioritisation tool, which uses CPMs for different road stereotypes, together with CMFs and observed crash data, to estimate severe injury crashes across a segmented road network. ANRAM allows users to develop road network and corridor treatment programs and to estimate their benefits. This tool has gained wide use among state road agencies in Australia, particularly for rural road networks, where actual severe crashes are randomly distributed. ANRAM is available in spreadsheet form, with planned online adaptations.
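One generic way to combine a CPM prediction, CMFs and observed crash counts for a segment is empirical Bayes weighting. The sketch below is not ANRAM's actual algorithm. It assumes a negative binomial CPM with overdispersion parameter k (variance = mu + k * mu^2), and the function and parameter names are hypothetical.

```python
def eb_expected_crashes(predicted, observed, dispersion, cmfs=()):
    """Empirical Bayes estimate of expected crashes on one segment.

    predicted  -- CPM prediction for base conditions over the study period
    observed   -- crash count observed over the same period
    dispersion -- negative binomial overdispersion parameter k (assumed known)
    cmfs       -- optional CMFs adjusting the prediction to site conditions
    """
    mu = predicted
    for cmf in cmfs:
        mu *= cmf
    # The weight on the model prediction shrinks as mu or k grows.
    weight = 1.0 / (1.0 + dispersion * mu)
    return weight * mu + (1.0 - weight) * observed


if __name__ == "__main__":
    # Illustrative numbers only: 2.0 predicted, 5 observed, k = 0.5.
    print(eb_expected_crashes(predicted=2.0, observed=5, dispersion=0.5))
```

The estimate always lies between the model prediction and the observed count, which dampens the random fluctuation of crash counts on short or low-volume segments.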
New Zealand also has a history of various safety prediction tools. Turner et al. stressed the practical need for such tools and, after a review of overseas applications, considered IHSDM worth transferring to New Zealand conditions for assessing new road designs. A later work reviewed New Zealand spreadsheet applications, as well as experience with using and calibrating the ISAT tool from the USA.
Increasingly, online business analytics software has been used to display CPM results in map format, often with dynamic filtering and computational functions. Examples include both commercial products and free or open source resources, such as ArcGIS Online, QGIS, Tableau, or Microsoft Power BI. These solutions make it easier for practitioners to access CPM results and understand their value.
Challenges and opportunities
The review has presented an opportunity to synthesise the key challenges practitioners are likely to face in translating the scientific state-of-the-art into practice. Opportunities and potential solutions are proposed for addressing these challenges and making CPMs more accessible to road safety practitioners – see Table 1.
Summary and conclusions
Greater uptake of state-of-the-art analytical techniques is necessary for continuing improvement in road safety. This paper aimed to improve practitioner understanding of modelling road safety performance using CPMs, so that this useful analytical technique could become more accessible.
A number of steps have been reviewed: from data collection and road network segmentation to choosing variables and functional forms, validating models and using them in practice, including a description of available tools. The review highlighted that developing CPMs is not a straightforward task: there are many alternative choices and decisions to be made during the process (without definite guidance), which explains the diversity of approaches and techniques. While this may be interesting from a research perspective, the current diverse state-of-the-art limits understanding and application by practitioners, and complicates international comparability or transferability. There is a need to identify opportunities and solutions that are scientifically sound while also meeting the needs of practitioners.
The main consideration for researchers should be the application of their models by the intended practitioners. This applies equally in the context of basic research, such as seeking understanding of a new challenge, and in the context of applied research, such as development of algorithms for inclusion in practitioner software. Either way, the end users of CPMs are the practitioners, i.e. road agency engineers, policy makers, or data analysts.
The review aimed to improve practitioner understanding of CPMs to bolster their use in improving road safety. The question of how and why practitioners should consider using CPMs could be answered as follows:
CPMs are valuable tools, which help link crashes with risk factors. This is especially valuable in current conditions of scattered crash occurrence (fewer crash black spots), where traditional crash-based approaches do not work well.
Developing and using CPMs has its challenges. However, these may be overcome by improved communication of the CPM benefits and application, so that practitioners have a basic understanding of CPMs and can make basic application decisions (e.g. use or calibrate available models).
Applying network-wide CPMs enables effective road safety impact assessment and network screening.
Ongoing investment in developing CPM-based practitioner tools, big data management and visualisation platforms offers potential for improved accessibility and uptake of CPMs in road safety management.
References
AASHTO (2010) Highway safety manual, 1st edn. American Association of State Highway and Transportation Officials, Washington
Ambros J, Valentová V, Sedoník J (2016) Developing updatable crash prediction model for network screening: case study of Czech two-lane rural road segments. Transp Res Rec 2583:1–7
Ambros J, Sedoník J, Křivánková Z (2018) How to simplify road network safety screening? Adv Transp Stud 44:151–158
Arndt O, Troutbeck R (2006) Techniques for analysing the effect of road geometry on accident rates using multifactor studies. Paper presented at the 22nd ARRB conference, Canberra
Basu S, Saha P (2017) Regression models of highway traffic crashes: a review of recent research and future research needs. Procedia Eng 187:59–66
Bonneson JA, Geedipally S, Pratt MP, Lord D (2012) Safety prediction methodology and analysis tool for freeways and interchanges. NCHRP project 17–45 final report. Transportation Research Board, Washington
Bornheimer C, Schrock S, Wang M, Lubliner H (2012) Developing a regional safety performance function for rural two-lane highways. Paper presented at the 91st Transportation Research Board Annual Meeting, Washington
Butsick AJ, Wood JS, Jovanis PP (2017) Using network screening methods to determine locations with specific safety issues: a design consistency case study. Accid Anal Prev 106:223–233
Cafiso S, Di Silvestro G, Persaud B, Begum MA (2010) Revisiting variability of dispersion parameter of safety performance for two-lane rural roads. Transp Res Rec 2148:38–46
Cafiso S, D’Agostino C (2013) Investigating the influence of segmentation in estimating safety performance functions for roadway sections. Paper presented at the 92nd Transportation Research Board Annual Meeting, Washington
Carter D, Srinivasan R, Gross F, Council F (2012) Recommended protocols for developing crash modification factors. NCHRP project 20-07, task 314 report. Transportation Research Board, Washington
Cenek PD, Davies RB, McLarin MW, Griffith-Jones G, Locke NJ (1997) Road environment and traffic crashes. Research report 79. Transfund, Wellington
Cheng W, Washington S (2005) Experimental evaluation of hotspot identification methods. Accid Anal Prev 37:870–881
da Costa JO, Jacques MAP, Pereira PAA, Freitas EF, Soares FEC (2015) Portuguese two-lane highways: modelling crash frequencies for different temporal and spatial aggregation of crash data. Transp 30:1–12
Eenink R, Reurings M, Elvik R, Cardoso J, Wichert S, Stefan C (2008) Accident prediction models and road safety impact assessment: recommendations for using these tools. RIPCORD-ISEREST project deliverable 2
Elvik R (2008) A survey of operational definitions of hazardous road locations in some European countries. Accid Anal Prev 40:1830–1835
Elvik R (2009) An exploratory analysis of models for estimating the combined effects of road safety measures. Accid Anal Prev 41:876–880
Elvik R (2010) Assessment and applicability of road safety management evaluation tools: Current practice and state-of-the-art in Europe. Report 1113/2010. Institute of Transport Economics, Oslo
Elvik R (2011) Assessing causality in multivariate accident models. Accid Anal Prev 43:253–264
FHWA (2003) Interactive highway safety design model (IHSDM) – crash prediction module (CPM) Userʼs manual. Federal Highway Administration, McLean
FHWA (2010) SafetyAnalyst: software tools for safety management of specific highway sites. White paper for module 1 – Network screening. Federal Highway Administration, McLean
Fridstrøm L, Ifver J, Ingebrigtsen S, Kulmala R, Thomsen LK (1995) Measuring the contribution of randomness, exposure, weather, and daylight to the variation in road accident counts. Accid Anal Prev 27:1–20
Fridstrøm L (2015) Disaggregate accident frequency and risk modelling: A rough guide. Report 1403/2015. Institute of Transport Economics, Oslo
Garach L, de Oña J, López G, Baena L (2016) Development of safety performance functions for Spanish two-lane rural highways on flat terrain. Accid Anal Prev 95:250–265
Geedipally SR, Lord D, Park BJ (2009) Analyzing different parameterizations of the varying dispersion parameter as a function of segment length. Transp Res Rec 2103:108–118
Geyer J, Lankina E, Chan C-Y, Ragland D, Pham T, Sharafsaleh A (2008) Methods for identifying high collision concentration locations for potential safety improvements. Report UCB-ITS-PRR-2008-35. University of California, Berkeley
Gitelman V, Doveh E (2016) Safety management of non-urban roads in Israel: an application of empirical Bayes evaluation. J Traffic Transp Eng 4:259–269
Gross F, Persaud B, Lyon C (2010) A guide to developing quality crash modification factors. Report FHWA-SA-10-032. Federal Highway Administration, Washington
Gross F, Hamidi A (2011) Investigation of existing and alternative methods for combining multiple CMFs. T-06-013 HSIP Technical Support, Task A.9
Hadi MA, Aruldhas J, Chow L-F, Wattleworth JA (1995) Estimating safety effects of cross-section design for various highway types using negative binomial regression. Transp Res Rec 1500:169–177
Harwood DW, Torbic DJ, Richard KR, Meyer MM (2010) SafetyAnalyst: Software tools for safety management of specific highway sites. Report FHWA-HRT-10-063. Federal Highway Administration, McLean
Hauer E (1997) Observational before-after studies in road safety: estimating the effect of highway and traffic engineering measures on road safety. Pergamon, Oxford
Hauer E, Bamfo J (1997) Two tools for finding what function links the dependent variable to the explanatory variables. Paper presented at ICTCT 97 Conference, Lund
Hauer E (2001) Overdispersion in modelling accidents on road sections and in empirical Bayes estimation. Accid Anal Prev 33:799–808
Hauer E (2004) Statistical road safety modeling. Transp Res Rec 1897:81–87
Høye A (2014) Development of crash prediction models for national and county roads in Norway. Report 1323/2014. Institute of Transport Economics, Oslo
Høye A (2016) Development of crash prediction models for national and county roads in Norway (2010-2015). Report 1522/2016. Institute of Transport Economics, Oslo
Jonsson T (2005) Predictive models for accidents on urban links: A focus on vulnerable road users. Bulletin 226. Lund University, Lund
Jonsson T, Lyon C, Ivan J, Washington S, van Schalkwyk I, Lord D (2009) Investigating differences in safety performance functions estimated for total crash count and for crash count by collision type. Transp Res Rec 2102:115–123
Jurewicz C, Steinmetz L, Turner B (2014) Australian National Risk Assessment Model. Publication AP-R451–14. Austroads, Sydney
Kim E, Lee D, Choi B-G, Choi S-E, Choi E (2010) Applicability of a Korea highway safety evaluation model compared to the crash prediction module of IHSDM. Paper presented at the 12th World Conference on Transport Research, Lisbon
Kononov J, Allery B (2003) Level of service of safety: conceptual blueprint and analytical framework. Transp Res Rec 1840:57–66
Koorey G (2009) Road data aggregation and sectioning considerations for crash analysis. Transp Res Rec 2103:61–68
Kulmala R (1995) Safety at rural three- and four-arm junctions: Development and application of accident prediction models. Publication 233. VTT Technical Research Centre of Finland, Espoo
Lord D (2006) Modeling motor vehicle crashes using Poisson-gamma models: examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter. Accid Anal Prev 38:751–766
Lord D, Mannering F (2010) The statistical analysis of crash-frequency data: a review and assessment of methodological alternatives. Transp Res A 44:291–305
Maher MJ, Summersgill I (1996) A comprehensive methodology for the fitting of predictive accident models. Accid Anal Prev 28:281–296
Manepalli URR, Bham GH (2016) An evaluation of performance measures for hotspot identification. J Transp Saf Secur 8:327–345
Mannering FL, Bhat CR (2014) Analytic methods in accident research: methodological frontier and future directions. Analytic Methods Accid Res 1:1–22
Mitra S, Washington S (2012) On the significance of omitted variables in intersection crash modeling. Accid Anal Prev 49:439–448
Montella A (2010) A comparative analysis of hotspot identification methods. Accid Anal Prev 42:571–581
NZTA (2016) Crash estimation compendium (New Zealand crash risk factors guideline). NZ Transport Agency, Wellington
OECD (1997) Road safety principles and models: review of descriptive, predictive, risk and accident consequence models. OECD, Paris
OECD (2012) Sharing road safety: developing an international framework for crash modification functions. OECD, Paris
Oh J, Lyon C, Washington S, Persaud B, Bared J (2003) Validation of FHWA crash models for rural intersections: lessons learned. Transp Res Rec 1840:41–49
Pardillo Mayora JM, Bojórquez Manzo R, Camarero Orive A (2006) Refinement of accident prediction models for Spanish national network. Transp Res Rec 1950:65–72
Park J, Abdel-Aty M, Lee C (2014) Exploration and comparison of crash modification factors for multiple treatments on rural multilane roadways. Accid Anal Prev 70:167–177
Peltola H, Kulmala R, Kallberg V-P (1994) Why use a complicated accident prediction model when a simple one is just as good? Paper presented at the 22nd PTRC Summer Annual Meeting, Warwick
Peltola H, Rajamäki R, Luoma J (2013) A tool for safety evaluations of road improvements. Accid Anal Prev 60:277–288
Persaud B, Lyon C, Nguyen T (1999) Empirical Bayes procedure for ranking sites for safety investigation by potential for safety improvement. Transp Res Rec 1665:7–12
Persaud BN (2001) Statistical methods in highway safety analysis: A synthesis of highway practice. NCHRP synthesis 295. Transportation Research Board, Washington
Persaud B, Lord D, Palmisano J (2002) Calibration and transferability of accident prediction models for urban intersections. Transp Res Rec 1784:57–64
Persaud B, Saleem T, Faisal S, Lyon C, Chen Y, Sabbaghi A (2012) Adoption of Highway Safety Manual predictive methodologies for Canadian highways. Paper presented at 2012 TAC Conference, Fredericton
Ragnøy A, Christensen P, Elvik R (2002) Injury severity density: A new approach to identifying hazardous road sections. Report 618/2002. Institute of Transport Economics, Oslo
Reurings M, Janssen T, Eenink R, Elvik R, Cardoso J, Stefan C (2005) Accident prediction models and road safety impact assessment: a state-of-the-art. RIPCORD-ISEREST project deliverable 2.1
Reurings M, Janssen T (2007) Accident prediction models for urban and rural carriageways. Report R-2006-14. SWOV Institute for Road Safety Research, Leidschendam
Roque C, Cardoso JL (2014) Investigating the relationship between run-off-the-road crash frequency and traffic flow through different functional forms. Accid Anal Prev 63:121–132
Sacchi E, Persaud B, Bassani M (2012) Assessing international transferability of highway safety manual crash prediction algorithm and its components. Transp Res Rec 2279:90–98
Saha D, Alluri P, Gan A (2015) Prioritizing highway safety manual’s crash prediction variables using boosted regression trees. Accid Anal Prev 79:133–144
Sawalha Z, Sayed T (2006) Traffic accident modeling: some statistical issues. Can J Civ Eng 33:1115–1124
Srinivasan R, Bauer K (2013) Safety performance function development guide: Developing jurisdiction-specific SPFs. Report FHWA-SA-14-005. Federal Highway Administration, Washington
Srinivasan R, Carter D, Bauer K (2013) Safety performance function decision guide: SPF calibration vs SPF development. Report FHWA-SA-14-004. Federal Highway Administration, Washington
Sun X, Li Y, Magri D, Shirazi HH (2006) Application of highway safety manual draft chapter: Louisiana experience. Transp Res Rec 1950:55–64
Torbic DJ, Harwood DW, Gilmore DK, Richard KR (2007) Interchange Safety Analysis Tool (ISAT): User manual. Report FHWA-HRT-07-045. Federal Highway Administration, McLean
Turner B (2011) Estimating the safety benefits when using multiple road engineering treatments. Road Safety Risk Reporter 11
Turner S, Durdin P, Bone I, Jackett M (2003) New Zealand accident prediction models and their applications. Paper presented at the 21st ARRB Conference, Cairns
Turner S, Tate F, Koorey G (2007) A SIDRA for road safety. Paper presented at 2007 IPENZ Transportation Group Conference, Tauranga
Turner S, Singh R, Nates G (2012) The next generation of rural road crash prediction models: final report. Research Report 509. NZ Transport Agency, Wellington
Turner S, Brown M (2013) Pushing the boundaries of road safety risk analysis. Paper presented at 2013 IPENZ Transportation Group Conference, Dunedin
Washington SP, Karlaftis MG, Mannering FL (2011) Statistical and econometric methods for transportation data analysis, 2nd edn. CRC Press, Boca Raton
Wood AG, Mountain LJ, Connors RD, Maher MJ, Ropkins K (2013) Updating outdated predictive accident models. Accid Anal Prev 55:54–66
Wood GR, Turner S (2007) Towards a start-to-finish approach to the fitting of traffic accident models. In: De Smet A (ed) Transportation accident analysis and prevention. Nova Science, New York, pp 239–250
Xie F, Gladhill K, Dixon KK, Monsere CM (2011) Calibrating the highway safety manual predictive models for Oregon state highways. Transp Res Rec 2241:19–28
Yannis G, Dragomanovits A, Laiou A, Richter T, Ruhl S, La Torre F et al (2014) Overview of existing accident prediction models and data sources. PRACT project deliverable D1
Yannis G, Dragomanovits A, Laiou A, Richter T, Ruhl S, Calabretta F et al (2015) Inventory and critical review of existing APMs and CMFs and related data sources. PRACT project deliverable D4
Yannis G, Dragomanovits A, Laiou A, Richter T, Ruhl S, La Torre F et al (2016) Use of accident prediction models in road safety management – an international inquiry. Transp Res Proc 14:4257–4266
Young J, Park PY (2013) Benefits of small municipalities using jurisdiction-specific safety performance functions rather than the highway safety manual’s calibrated or uncalibrated safety performance functions. Can J Civ Eng 40:517–527
Yu H, Liu P, Chen J, Wang H (2014) Comparative analysis of the spatial analysis methods for hotspot identification. Accid Anal Prev 66:80–88
The paper was produced with the financial support of Czech Ministry of Education, Youth and Sports under the National Sustainability Programme I project of Transport R&D Centre (LO1610), using the research infrastructure from the Operation Programme Research and Development for Innovations (CZ.1.05/2.1.00/03.0064).
The authors declare that they have no competing interests.
Cite this article
Ambros, J., Jurewicz, C., Turner, S. et al. An international review of challenges and opportunities in development and use of crash prediction models. Eur. Transp. Res. Rev. 10, 35 (2018) doi:10.1186/s12544-018-0307-7
- Road safety
- Crash prediction model
- Risk model