3.1 CPMs and their uses
CPMs may be used to accomplish various road safety management functions, such as:
1. Exploring and comparing combinations of individual risk factors that make some road locations unsafe
2. Network safety screening, i.e. safety ranking of road locations, or identification of hazardous locations
3. Impact assessments, i.e. assessing the safety of contemplated (re)constructions or safety treatments
4. Economic analysis of project costs vs. safety benefits
Note that Task 1 is primarily research-oriented, whereas Tasks 2, 3 and 4 are typical practical tasks undertaken by many road agencies. According to a review of North American practices , network screening is the most common application of CPMs. In the European project PRACT, cost-benefit analysis was identified as a common CPM application [85, 86].
As noted, CPMs may be developed for road segments of a particular road type (e.g. rural undivided highways), for all intersections, for individual intersection types, or for any combination of these. CPMs can be developed for all recorded crashes, casualty crashes, or severe crashes only; the choice depends on the purpose of the model. Very broad CPMs may be useful in high-level network screening or for highlighting strategic issues, whereas more specific safety management or research objectives require more specific models. Given this range of potential applications, CPMs have been acknowledged worldwide as recommended tools on which rational road safety management should be based. At the same time, it is well known that prediction modelling is not a simple task [15, 18, 77]: it involves various analytical choices, which are often made without explicit justification. This may explain the gap between the state-of-the-art and the state-of-the-practice, which may in turn limit the practical use of CPMs. For example, a survey among European road agencies found that 70% of them rarely or never systematically use CPMs in their decision-making .
Regarding the selection of research for inclusion in the review, another distinction needs to be made. The HSM introduces a set of CPMs (referred to as safety performance functions, SPFs) and crash modification factors (CMFs). Crash prediction in the HSM has two main steps: (1) prediction of baseline crash frequency using SPFs/CPMs for nominal segment and intersection conditions, and (2) multiplication of the ‘baseline’ predictions by CMFs to capture changes in geometric design and operational characteristics (deviations from nominal conditions). This approach has gained popularity: it has been incorporated into the Interactive Highway Safety Design Model (IHSDM), and recently adopted in the European CPM , as well as in the Australian ANRAM  and the New Zealand Crash Estimation Compendium .
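The two-step HSM logic can be written in a few lines. The sketch below is a simplified illustration rather than the full HSM procedure: it multiplies a base-condition SPF prediction by the applicable CMFs and by a local calibration factor C (part of the HSM predictive method, though not discussed above), with C estimated as total observed over total predicted crashes for a calibration sample. The function names and numbers are hypothetical.

```python
import math

def hsm_predicted(n_spf, cmfs, calibration=1.0):
    """Predicted crashes = base-condition SPF prediction x product of CMFs x C."""
    # math.prod([]) == 1, so a site at nominal conditions keeps its SPF prediction
    return n_spf * math.prod(cmfs) * calibration

def calibration_factor(observed, predicted):
    """Local calibration factor: total observed / total uncalibrated predicted crashes."""
    return sum(observed) / sum(predicted)

# Hypothetical calibration sample of three sites, then one site with two CMFs
C = calibration_factor(observed=[3, 5, 4], predicted=[2.5, 4.0, 3.5])
print(hsm_predicted(2.0, [0.9, 1.1], calibration=C))
```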
The CPMs/SPFs in the HSM and IHSDM, developed from data in several US states, are not necessarily directly transferable to other jurisdictions (inside or outside the US). Some studies confirmed good transferability, mainly between US states [7, 74, 84], but others were less successful when applied abroad, for example in Canada, Italy or Korea [42, 63, 64, 69, 88]. It is therefore recommended that each country and jurisdiction (e.g. state) develop its own specific CPMs. The present review, written by non-US authors, adopts this perspective.
3.2 Data collection
In theory, to obtain sufficiently representative models, one should randomly sample data from the population of similar road types or intersections. In this regard, given the variance of crash frequencies, several authors have recommended minimum sample sizes, such as at least 50 sites , 200 crashes  or 300 crashes . The HSM  advises using a sample of 30–50 locations with a total of at least 100 crashes per year. However, others have been critical of this one-size-fits-all approach. For example, Lord  provided guidance on the necessary sample size based on the sample mean crash frequency: 200 segments when the average is 5 crashes per segment, or 1000 segments when the average is 1 crash per segment. (Note that these considerations do not apply to network screening, whose goal is to screen the complete network.)
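As a simple illustration, the HSM rule of thumb quoted above can be encoded as a quick check of a candidate sample. The function and its default thresholds are hypothetical: they only encode the 30-site lower bound and the 100 crashes per year stated in the text, and do not replace sample-mean-based guidance such as Lord's.

```python
def meets_hsm_sample_guidance(crashes_per_year_by_site, min_sites=30, min_total=100):
    """Hypothetical check of the HSM rule of thumb: at least 30 sites
    (30-50 recommended) with at least 100 crashes per year in total."""
    n_sites = len(crashes_per_year_by_site)
    total = sum(crashes_per_year_by_site)
    return n_sites >= min_sites and total >= min_total

# 40 sites averaging 3 crashes/year -> 120 crashes/year in total
print(meets_hsm_sample_guidance([3.0] * 40))   # True
# 40 sites averaging 2 crashes/year -> only 80 crashes/year
print(meets_hsm_sample_guidance([2.0] * 40))   # False
```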
In addition, unlike the large USA and Canadian samples, smaller countries are limited in the size of their network and crash data samples. For example, Turner et al.  noted that the size of the New Zealand road network limits the development of models for some segment and site types, e.g. interchanges. This factor also reduces opportunities for disaggregating CPMs into all crash types and severity levels.
Data on crashes, traffic volumes and other relevant road attributes need to be assigned to all sample sites. Crash data are known to suffer from various biases, such as underreporting, location errors, severity misclassification or inaccurate identification of contributory factors. Traffic volume data may also be prone to errors: the typical measure of traffic volume, AADT, is an average aggregated over various vehicle types ; in addition, location errors exist, as traffic volumes measured at one location are typically assumed to apply to the entire section, and often to multiple sections. Thus, actual variation in traffic flow is difficult to reflect in the data.
The choice of time period for crash and AADT data requires another decision. A 1- to 5-year period is usually recommended for safety ranking, with a 3-year period being the most frequent . Using longer time periods (beyond 5 years) may cause problems due to changes in conditions over the period, such as substantial increases in traffic volumes or layout changes. Probably because of these issues, there are no specific guidelines for the choice of time period. An exception is the simulation study of Cheng and Washington , which concluded that there is little gain in network screening accuracy when using a period longer than 6 years. Similarly, using several consistency tests, Ambros et al.  found 4 years sufficient for developing a CPM. Usually, a compromise is accepted between the need for early analysis of new treatments and the need to accumulate sufficient crashes to permit robust analysis .
Differences between rural and urban settings are also worth mentioning. Traditionally, most attention has been given to rural roads (as is also evident from CPM reviews [66, 85, 86]). In contrast, modelling urban safety is more challenging, due to the higher presence of vulnerable road users and complex environments, including facilities for different road users, mixed land use, and a higher density of various intersection types. Detailed crash data are likely to be needed if crash type-specific models are to be developed later on. More road attributes also need to be collected for urban roads, then tested for correlation and autocorrelation, and only then considered in models .
Ideal data sources are road agency asset inventories. Unfortunately, these may not be complete or up to date, and a modeller thus often needs to combine various data sources. Additional surveys can also be conducted in the field (pedestrian counts, signal timing, speeds, etc.), through drive-through digital video collection, or via online maps. The recent emergence of big data and open government policies (e.g. open data initiatives such as data.vic.gov.au) has aided these efforts substantially. It is feasible to pull together substantial amounts of road data from publicly available sources and from road agencies’ own sources. Cross-checking data for the same attributes between different data sets also helps reduce errors and improves data quality management.
3.3 Road network segmentation
CPMs are typically developed either for road intersections or for road segments. In the latter case, segmentation has to be conducted in order to divide the network into homogeneous segments, i.e. segments with constant values of the explanatory variables. However, when there are multiple variables, this practice naturally leads to short segments. This may complicate the accurate assignment of crashes to individual segments. In addition, crash occurrence is heterogeneous and random, so many short segments may have zero crash counts during the selected time period.
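Homogeneous segmentation can be sketched as follows, assuming the road is already stored as consecutive pieces between chainage breakpoints, each carrying a tuple of attribute values. A new segment starts only where at least one attribute changes, and crashes are then assigned by chainage. The data layout and names are illustrative assumptions, not a prescribed databank format.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start_km: float
    end_km: float
    attrs: tuple        # explanatory-variable values, constant along the segment
    crashes: int = 0

def homogeneous_segments(breakpoints, attrs):
    """breakpoints: sorted chainages b0..bn; attrs[i] holds the attribute tuple
    valid on [b_i, b_(i+1)). Adjacent pieces with identical attributes are merged."""
    segments = []
    for i, a in enumerate(attrs):
        start, end = breakpoints[i], breakpoints[i + 1]
        if segments and segments[-1].attrs == a:
            segments[-1].end_km = end                 # same attributes: extend segment
        else:
            segments.append(Segment(start, end, a))   # some variable changed
    return segments

def assign_crashes(segments, crash_chainages):
    """Counts each crash in the segment whose [start, end) interval contains it;
    crashes outside all segments are skipped (a real workflow would log them)."""
    for c in crash_chainages:
        for s in segments:
            if s.start_km <= c < s.end_km:
                s.crashes += 1
                break
    return segments

# Hypothetical road: (lane width in m, area type) recorded per surveyed piece
bps = [0.0, 0.4, 1.0, 1.3, 2.0]
att = [(3.25, "rural"), (3.25, "rural"), (3.5, "rural"), (3.5, "rural")]
segs = assign_crashes(homogeneous_segments(bps, att), [0.1, 0.95, 1.0, 1.7, 2.5])
for s in segs:
    print(s.start_km, s.end_km, s.attrs, s.crashes)
```

With every additional attribute, each of its change points can start a new segment, which is why segmentation on many variables quickly produces short segments.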
For segmentation, some authors set fixed lengths of several hundred metres [12, 14, 26], or used patterns based on tangents and curves [10, 44, 79]. Long segments can lead to forced homogenisation of variables by aggregating continuous variables into categories (e.g. pavement width bands), which can reduce the applicability of the model. In short, segmentation should consider the overall purpose of the modelling exercise. Longer segments (1–5 km) are often used for network screening [27, 57, 65]. Shorter segments are used to develop more meaningful CMFs, or to estimate localised benefits of safety treatments. Variable segment length can be included in the model. The HSM assumes crash frequency to be directly proportional to segment length; however, many published models that include segment length as a variable suggest otherwise (e.g. ).
In practice, the division of the road network into segments is likely to be dictated by the structure of national road databanks. For example, in the Czech Republic the national traffic census (the main source of AADT data) does not cover all minor roads; thus, aggregating segments into longer segments that include minor intersections was found feasible . As the segments may be subject to further investigation, their length should also remain practical for on-site visits or crash analyses.
3.4 Explanatory variables
Selection of explanatory variables should be guided by evidence on crash and injury risk factors documented in the research literature. In practice, however, it is often dictated simply by data availability. Explanatory variables generally include exposure, transport function, cross section and traffic control; variables describing alignment, vehicle types or road user behaviour are used less often . When the desired variables are not available, proxy variables may be used, e.g. abutting land use as a proxy for pedestrian movement counts.
The first step in variable selection involves identifying variables that are correlated with each other. For each such pair, the researcher should remove the variable that is less useful for the purpose of the model (e.g. if sealed shoulder provision is strongly correlated with line marking presence, the latter may be removed). To further identify statistically significant variables, a stepwise regression approach is typically used. It may be applied in a forward selection or a backward elimination manner; in both cases, selected goodness-of-fit (GOF) measures are used to judge whether adding or removing a variable improves the model. Common GOF measures include information criteria such as AIC or BIC, while other studies use, for example, scaled deviance [22, 77] or the proportion of explained systematic variance [2, 45].
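The correlation screening step and the two information criteria can be sketched with the standard library only. The 0.7 threshold for flagging a pair is an assumption chosen for illustration, not a value from the text; AIC and BIC use their standard definitions, with the log-likelihood assumed to come from whatever regression software fitted the model.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient (raises ZeroDivisionError for constant input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlated_pairs(data, threshold=0.7):
    """data: dict of variable name -> values per site; returns pairs with |r| >= threshold."""
    pairs = []
    for a, b in combinations(data, 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical site data: line marking duplicates sealed shoulder provision
sites = {
    "sealed_shoulder": [0, 1, 1, 0, 1],
    "line_marking":    [0, 1, 1, 0, 1],
    "lane_width":      [3.5, 3.0, 3.5, 3.0, 3.25],
}
print(correlated_pairs(sites))   # [('sealed_shoulder', 'line_marking', 1.0)]
```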
Depending on the number of explanatory variables (model complexity), CPMs may be simple (exposure-only) or multivariate (fully specified) . Sawalha and Sayed  warned against the temptation to build overfitted models, i.e. models containing too many insignificant variables. In fact, a number of studies found that additional predictors are not as beneficial as expected [59, 70, 82]. One should strive for parsimonious models, i.e. models containing as few explanatory variables as possible . Such models enable simple interpretation and understanding, as well as easy updating .
A practice-driven approach was adopted in developing New Zealand rural road CPMs . When it was found that the statistically significant variables did not include the parameters of most interest to practitioners, two distinct model types were developed. Statistical models are the best-performing models according to goodness-of-fit measures, with variables significant at the 95% confidence level. Practitioner models contain additional variables of interest to safety professionals, retained at confidence levels of 70% or more.
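The distinction between the two New Zealand model types can be illustrated as a variable filter on p-values, reading the 95% and 70% confidence levels as p < 0.05 and p < 0.30. This is a hypothetical simplification: the variable names and p-values are invented, and in a real workflow each model would be re-estimated after selection, which changes the p-values.

```python
def split_models(p_values, of_interest, stat_alpha=0.05, prac_alpha=0.30):
    """Hypothetical reading of the two-model approach: the statistical model keeps
    variables significant at 95% (p < 0.05); the practitioner model additionally keeps
    variables of practitioner interest significant at 70% or more (p < 0.30)."""
    statistical = [v for v, p in p_values.items() if p < stat_alpha]
    extra = [v for v, p in p_values.items()
             if v in of_interest and stat_alpha <= p < prac_alpha]
    return statistical, statistical + extra

p = {"aadt": 0.001, "curvature": 0.01, "shoulder_width": 0.20,
     "delineation": 0.50, "access_density": 0.25}
stat, prac = split_models(p, of_interest={"shoulder_width", "delineation"})
print(stat)   # ['aadt', 'curvature']
print(prac)   # ['aadt', 'curvature', 'shoulder_width']
```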
On the other hand, when an influential explanatory variable is left out because data are unavailable, so-called “omitted variable bias” occurs. It results in biased parameter estimates, which can produce erroneous inferences and crash frequency predictions [47, 50, 51].
Another bias may be caused by spatial correlation, arising from the fact that adjacent road segments may share unobserved effects . This bias can be handled using random-effects models, in which the shared unobserved effects are assumed to follow some distribution over the road segments and to be uncorrelated with the explanatory variables .
3.5 Model function and variable forms
Before carrying out the modelling task, exploratory data analysis should be conducted in order to detect outliers, check extreme values, identify potential data errors, etc.
As previously mentioned, crash data are typically overdispersed. In a negative binomial model, the degree of overdispersion is represented by the overdispersion parameter, which is estimated along with the regression coefficients. The overdispersion parameter is used to determine the weight factor in the empirical Bayes (EB) method. This method combines predicted (modelled) and recorded (observed) crash frequencies in order to improve the reliability of the safety estimate for a specific site . Applications of EB methods are described in later sections of the review.
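The role of the overdispersion parameter can be shown directly. The sketch assumes the parametrisation Var = μ + kμ² and the EB weight w = 1 / (1 + k · predicted crashes over the study period), as used in the HSM; other texts parametrise overdispersion as the reciprocal of k, in which case the formulas must be adjusted. The site values are hypothetical.

```python
def nb_variance(mu, k):
    """Variance of a negative binomial count with mean mu and overdispersion k."""
    return mu + k * mu ** 2

def eb_weight(predicted_total, k):
    """Weight given to the CPM prediction; predicted_total is summed over the study years."""
    return 1.0 / (1.0 + k * predicted_total)

def eb_estimate(predicted_total, observed_total, k):
    """Empirical Bayes estimate: weighted mix of predicted and observed crashes."""
    w = eb_weight(predicted_total, k)
    return w * predicted_total + (1.0 - w) * observed_total

# Hypothetical site: the CPM predicts 2.0 crashes over 3 years, 4 were recorded, k = 0.5
print(eb_weight(2.0, 0.5))         # 0.5
print(eb_estimate(2.0, 4, 0.5))    # 3.0
```

As k approaches 0 (no variation beyond Poisson), the weight tends to 1 and the EB estimate collapses to the model prediction; a large k shifts the estimate towards the observed count.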
The crash frequency (i.e. the response variable) should ideally not mix levels of crash severity and crash types, as this may produce uninterpretable results . It is thus recommended to develop disaggregated CPMs . Alternatively, one may take the observed proportion of a given crash type or severity and apply it to a CPM estimated for total crashes . However, this has been found to be a questionable practice, leading to estimation errors . The current recommendation is to estimate separate CPMs by crash type. New Zealand practice is to develop models for key (or common) crash types and, if necessary, to scale their predictions up to total crash frequency to allow for less common crash types . Some studies [24, 27] used sub-samples (for example, stratified by AADT below or above specific limits) in order to improve model quality. In any case, developing disaggregated CPMs obviously requires larger sample sizes. In terms of severity, models are developed by injury severity level (usually with fatal and serious injury crashes combined), as with the ANRAM models . Alternatively, severity factors (proportions) are applied to models developed for all injury crashes or all crashes (including non-injury) .
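Both proportion-based shortcuts described above amount to simple arithmetic. In this sketch the crash-type shares and the share of all crashes covered by the key types are hypothetical inputs; the text does not specify how New Zealand practice derives the scaling factor.

```python
def split_by_proportion(total_prediction, shares):
    """Applies observed crash-type (or severity) shares to a total-crash CPM prediction
    (a practice the text notes has been found questionable)."""
    return {t: total_prediction * s for t, s in shares.items()}

def scale_key_types_to_total(key_predictions, key_share):
    """Scales the summed predictions of separately modelled key crash types up to all
    crashes; key_share is the assumed share of all crashes covered by the key types."""
    return sum(key_predictions.values()) / key_share

print(split_by_proportion(10.0, {"head_on": 0.2, "run_off_road": 0.5}))
print(scale_key_types_to_total({"head_on": 3.0, "run_off_road": 1.0}, key_share=0.8))
```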
Regarding the functional forms of explanatory variables, there is no universal guidance, and various forms are used in the literature. To select the most suitable mathematical form of an explanatory variable, one may examine graphical relationships between crash frequency and a road variable (i.e. univariate analysis) , or use more complex techniques, such as empirical integral functions and cumulative residuals (CURE) . According to Hauer , the model equation may have both multiplicative components (to represent the influence of continuous factors, such as lane width or shoulder type) and additive components (to account for the influence of point hazards, such as driveways or narrow bridges). Despite these recommendations, the modelling approach is often simple in practice, and the general model form of Eq. (1) is widely adopted.
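A CURE plot can be computed by sorting sites by one covariate and accumulating the residuals (observed minus predicted). The sketch below uses the commonly applied ±2σ* limits, σ*²(n) = σ²(n)·(1 − σ²(n)/σ²(N)), where σ²(n) is the running sum of squared residuals; plotting is left out, and the input data are hypothetical.

```python
import math

def cure(covariate, observed, predicted):
    """Cumulative residuals ordered by one covariate, with +/- 2 sigma* limits.
    Returns (x, cumulative residual, lower limit, upper limit) per site."""
    rows = sorted(zip(covariate, observed, predicted), key=lambda r: r[0])
    residuals = [o - p for _, o, p in rows]
    total_sq = sum(e * e for e in residuals)
    out, cum, cum_sq = [], 0.0, 0.0
    for (x, _, _), e in zip(rows, residuals):
        cum += e
        cum_sq += e * e
        # sigma*^2(n) = sigma^2(n) * (1 - sigma^2(n) / sigma^2(N)); clamp rounding noise
        var_star = cum_sq * (1.0 - cum_sq / total_sq) if total_sq > 0 else 0.0
        s = math.sqrt(max(var_star, 0.0))
        out.append((x, cum, -2.0 * s, 2.0 * s))
    return out

# Hypothetical: covariate = lane width (m), observed and predicted crash counts
for row in cure([3.0, 3.5, 3.25], [2, 0, 1], [1.0, 1.0, 1.0]):
    print(row)
```

A cumulative residual curve that drifts consistently up or down, or leaves the limits, over a range of the covariate suggests that the functional form chosen for that variable does not fit the data there.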
Exposure is usually modelled in terms of traffic volume, i.e. a single AADT value for road segments, or the product of major and minor AADTs for intersections. The function is typically of power form, but some authors have combined it with an exponential term (the so-called Ricker model ). Traffic volumes (flows) should be adapted to specific segment and intersection types. For example, New Zealand CPMs  use either the product of flows or conflicting flows, depending on the intersection type, urban/rural setting and speed limit. As discussed, a segment length variable is often used where road segments are not of equal length. For intersections, a standard approach length is typically used, e.g. 50–100 m, and is not modelled as a variable.
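The exposure forms mentioned above can be written as small functions. All coefficients are hypothetical placeholders, not estimates from any study, and the Ricker-type expression is one common way of combining a power term with an exponential term; individual papers may parametrise it differently.

```python
import math

def segment_power(aadt, b0, b1):
    """Power form: mu = exp(b0) * AADT^b1."""
    return math.exp(b0) * aadt ** b1

def segment_ricker(aadt, b0, b1, b2):
    """Power form combined with an exponential term (Ricker-type):
    mu = exp(b0) * AADT^b1 * exp(b2 * AADT). With b1 > 0 and b2 < 0 the curve
    peaks at AADT = -b1 / b2 and declines at higher volumes."""
    return math.exp(b0) * aadt ** b1 * math.exp(b2 * aadt)

def intersection_product_of_flows(aadt_major, aadt_minor, b0, b1, b2):
    """Intersection exposure: mu = exp(b0) * AADT_major^b1 * AADT_minor^b2."""
    return math.exp(b0) * aadt_major ** b1 * aadt_minor ** b2

# Hypothetical coefficients, for illustration only
print(segment_power(5000, -7.0, 0.8))
print(segment_ricker(5000, -7.0, 0.8, -5e-5))
print(intersection_product_of_flows(8000, 1500, -9.0, 0.7, 0.4))
```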
Another example is segment length, usually applied as an offset, i.e. with its regression coefficient fixed at 1, but often also in a power form [30, 67, 68]. According to Hauer , segment length should also be considered when estimating the overdispersion parameter of frequency models to be used in the empirical Bayes approach. However, the exact form of this relationship is not settled ; in fact, not only length but also other variables may play a role .
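The difference between the offset and power forms of segment length is easiest to see in a log-linear CPM. The sketch is a hypothetical illustration: `terms` stands for any further βᵢxᵢ terms, and the length-dependent overdispersion function encodes only one assumed form (k inversely proportional to length), since the text notes the exact relationship is not settled.

```python
import math

def segment_cpm(length_km, aadt, b0, b_aadt, b_len=1.0, terms=()):
    """Log-linear segment CPM:
    ln(mu) = b0 + b_len*ln(L) + b_aadt*ln(AADT) + sum(beta_i * x_i).
    b_len = 1 is the offset form (mu proportional to L); estimating b_len gives the power form."""
    eta = b0 + b_len * math.log(length_km) + b_aadt * math.log(aadt)
    eta += sum(beta * x for beta, x in terms)
    return math.exp(eta)

def overdispersion_for_length(k_ref, length_km):
    """One assumed form: overdispersion inversely proportional to length (k = k_ref / L),
    so shorter segments get a larger k."""
    return k_ref / length_km

# Doubling length doubles the prediction in the offset form, but not in a power form
offset_ratio = segment_cpm(2.0, 5000, -7.0, 0.8) / segment_cpm(1.0, 5000, -7.0, 0.8)
power_ratio = (segment_cpm(2.0, 5000, -7.0, 0.8, b_len=0.8)
               / segment_cpm(1.0, 5000, -7.0, 0.8, b_len=0.8))
print(round(offset_ratio, 3), round(power_ratio, 3))
```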
A model is created by running the relevant statistical regression procedures on the sample data. The most common tools for this are statistical software packages such as R, SPSS, SAS or MATLAB. Microsoft Excel is not considered appropriate for this task, as it lacks many of the necessary statistical features.
In practice, the modelling process is highly iterative. Variables are added, and then removed if shown to add little or nothing to the explanation of the response variable. Data for a given variable are often re-categorised to improve its significance if it is borderline. Borderline or non-significant variables are often retained if they contribute to a better understanding of the crash problem. A balance between model fit, number of variables and applicability is gradually achieved. This iterative process can be stopped when each further iteration yields little improvement in the model [10, 25].
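The elimination loop and its stopping rule can be sketched generically. Here `aic_of` is a hypothetical callback that, in practice, would refit the negative binomial model with the given variables and return its AIC; the toy function below merely stands in for it. The default tolerance of 2 AIC units is an assumption, not a value from the text.

```python
def backward_eliminate(variables, aic_of, min_improvement=2.0):
    """Repeatedly drops the variable whose removal lowers AIC most; stops when the
    best available improvement is smaller than min_improvement.
    aic_of(frozenset_of_variables) -> AIC of the model fitted with those variables."""
    current = frozenset(variables)
    current_aic = aic_of(current)
    while current:
        best_aic, best_var = min((aic_of(current - {v}), v) for v in sorted(current))
        if current_aic - best_aic < min_improvement:
            break                                   # little further improvement: stop
        current, current_aic = current - {best_var}, best_aic
    return current, current_aic

# Toy stand-in for model fitting: each variable adds 2 to AIC but reduces it by its "gain"
gains = {"aadt": 50.0, "curve": 10.0, "noise": 0.0, "noise2": 0.5}
toy_aic = lambda vs: 100.0 + 2 * len(vs) - sum(gains[v] for v in vs)

print(backward_eliminate(gains, toy_aic))                       # tolerance of 2 AIC units
print(backward_eliminate(gains, toy_aic, min_improvement=0.0))  # any improvement counts
```

The tolerance expresses the "little further improvement" criterion: with a tolerance of 2 the weak variable `noise2` survives, while with a tolerance of 0 it is removed as well.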
3.6 Model validation
The goal of validation is to establish whether the developed model is acceptable from both scientific and practical perspectives. It is thus surprising that most modelling guidelines seem to overlook this step [1, 23, 35, 36, 48, 71, 72, 83].
According to Oh et al. , one may distinguish between internal validity and external validity.
Internal validity means that CPM findings should be consistent with established knowledge on the subject; the CPM should also possess the features of the underlying phenomenon; and finally, the CPM should agree with fundamental information and knowledge, such as the physical mechanics and dynamics involved in crashes . Newly developed CPMs may be compared with previous literature in terms of the signs and magnitudes of regression coefficients, or, for example, their marginal effects .
External validity (goodness-of-fit) may be evaluated either by comparing models estimated from two independent samples, or by applying a model estimated on part of the sample to a held-out sub-sample that was not used in model building (e.g. a randomly chosen 20%). Various goodness-of-fit indicators may be applied; often the proportion of systematic variation in the original crash dataset explained by the model (also known as the Elvik index) is used [22, 45].
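A random 80/20 split and an overdispersion-based version of the explained-systematic-variation index can be sketched as below. The index formula, 1 − k_model / k_null with k_null from an intercept-only model, is one common formulation and is assumed here; the method-of-moments helper is only a rough stand-in for k_null, and all names are hypothetical.

```python
import random
import statistics

def holdout_split(site_ids, test_share=0.2, seed=42):
    """Random estimation/validation split of sites (e.g. 80/20)."""
    ids = list(site_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_share)
    return ids[n_test:], ids[:n_test]       # (estimation set, validation set)

def moment_k(counts):
    """Method-of-moments overdispersion of raw counts: k = (var - mean) / mean^2,
    i.e. the extra-Poisson (systematic) variation relative to the squared mean."""
    m = statistics.mean(counts)
    return (statistics.variance(counts) - m) / m ** 2

def explained_systematic_variation(k_null, k_model):
    """Share of systematic variation explained, from the overdispersion parameters
    of an intercept-only model (k_null) and the fitted CPM (k_model)."""
    return 1.0 - k_model / k_null

train, test = holdout_split(range(100))
print(len(train), len(test))
print(explained_systematic_variation(k_null=0.8, k_model=0.2))
```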
3.7 Using CPMs in network screening
Previous reviews [16, 52] indicated that the current state-of-practice generally lags behind the state-of-the-art. According to the EB methodology, the predicted crash frequency from a CPM should be combined with the observed historical crash frequency to obtain the so-called “expected average crash frequency with empirical Bayes adjustment” (in short, the EB estimate). EB estimates benefit the practitioner by removing much of the random statistical variation associated with historical crash data, especially at low frequencies [1, 41]. Apart from EB estimates, other safety indicators can be developed for network screening purposes, for example potential for safety improvement (PSI) , level of service of safety (LOSS)  or scaled difference .
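A minimal screening sketch, assuming HSM-style EB weights and taking PSI as the EB estimate minus the CPM prediction (excess expected crashes, one common definition). Site identifiers, counts and k are hypothetical, and predicted and observed totals are assumed to cover the same period.

```python
def screen_sites(sites, k):
    """sites: iterable of (site_id, predicted_total, observed_total) over the same period.
    Returns (site_id, EB estimate, PSI) sorted by PSI, highest first,
    with PSI taken as EB estimate minus CPM prediction."""
    rows = []
    for site_id, pred, obs in sites:
        w = 1.0 / (1.0 + k * pred)            # HSM-style EB weight
        eb = w * pred + (1.0 - w) * obs
        rows.append((site_id, eb, eb - pred))
    return sorted(rows, key=lambda r: r[2], reverse=True)

ranked = screen_sites([("A", 2.0, 4), ("B", 2.0, 0), ("C", 1.0, 5)], k=0.5)
for site_id, eb, psi in ranked:
    print(site_id, round(eb, 2), round(psi, 2))
```

In this toy example site A has the highest EB estimate, but site C has the larger excess over its prediction, so the two indicators rank the sites differently.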
In Australia and New Zealand, where low-volume rural roads experience very few (often zero) crashes per kilometre over a 5-year period, CPMs provide a continuous proxy measure of safety. In Australia, the ANRAM model uses EB estimates of severe casualty crashes to remove the random variation in observed crash data at the level of 1–3 km segments, and sites are prioritised simply on the EB estimate . Differences of more than two standard errors between the EB estimate and observed crashes are noted as a possible indicator of non-infrastructure influences on safety (e.g. localised speeding or drink-driving) .
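The two-standard-error check can be sketched as follows. The standard error is taken as sqrt((1 − w) · EB), the usual approximation of the EB estimate's variance; this, the HSM-style weight and the example values are assumptions, and ANRAM's exact definition may differ.

```python
import math

def flag_non_infrastructure(pred, obs, k, z=2.0):
    """Flags a site whose observed count differs from its EB estimate by more than
    z standard errors. Returns (flag, EB estimate, standard error)."""
    w = 1.0 / (1.0 + k * pred)                # HSM-style EB weight
    eb = w * pred + (1.0 - w) * obs
    se = math.sqrt((1.0 - w) * eb)            # assumed SE of the EB estimate
    return abs(obs - eb) > z * se, eb, se

# Hypothetical sites with the same prediction but different observed counts
print(flag_non_infrastructure(pred=1.0, obs=6, k=1.0))
print(flag_non_infrastructure(pred=1.0, obs=10, k=1.0))
```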
Given the variety of available methods, the HSM  notes that “using multiple performance measures to evaluate each site may improve the level of confidence in the results.” Hence, sites may be ranked for treatment using several different methods [49, 52, 89]; those that rank consistently high across methods are the sites on which treatment should be focused.
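Finding sites that rank consistently high can be done with a set intersection over the top of each ranking. The method names, site lists and the top-n cut-off are hypothetical.

```python
def consistently_high(rankings, top_n):
    """rankings: dict of method name -> site ids ordered from highest priority down.
    Returns the sites that appear in the top_n of every method."""
    tops = [set(order[:top_n]) for order in rankings.values()]
    return set.intersection(*tops) if tops else set()

ranks = {
    "eb_estimate": ["A", "B", "C", "D"],
    "psi":         ["B", "A", "D", "C"],
    "crash_rate":  ["A", "D", "B", "C"],
}
print(consistently_high(ranks, top_n=2))   # {'A'}
```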
3.8 Using CPMs in developing crash modification factors
A crash modification factor (CMF) is a multiplicative factor used to compute the expected number of crashes after implementing a given countermeasure or design change at a location. CMFs may be derived from before-after or cross-sectional studies; however, each method has its own challenges, and available CMFs are often highly inconsistent between literature sources . Before-after studies are generally the preferred source of CMFs, particularly for the HSM. However, they typically examine features in isolation, so they may give misleading results when the combined effect of several features on crash occurrence is not simply the sum of their individual effects. Several solutions for developing CMFs for multiple treatments have been proposed, without reaching definite conclusions [17, 29, 58].
Cross-sectional studies (i.e. those based on CPMs) have been criticised for being more prone to non-causal safety effects due to bias-by-selection [11, 19, 36]. Bias-by-selection can occur when a treatment (e.g. a crash barrier) is applied more often to sites that already have a crash problem than to those that do not. Cross-sectional studies do, however, provide much better crash prediction for combinations of road features. In some cases, CMFs are developed from CPMs where few before-after studies are available.
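For a log-linear CPM, the CMF implied by a change in one variable follows from its coefficient, holding the other variables constant. The coefficient below is hypothetical, and a CMF derived this way inherits the bias-by-selection caveat discussed above.

```python
import math

def cmf_from_coefficient(beta, x_before, x_after):
    """CMF implied by a log-linear CPM term exp(beta * x) when x changes from
    x_before to x_after, other variables held constant."""
    return math.exp(beta * (x_after - x_before))

# Hypothetical lane-width coefficient of -0.1 per metre; widening 3.0 m -> 3.5 m
print(round(cmf_from_coefficient(-0.1, 3.0, 3.5), 3))   # 0.951
```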
Although the practice of deriving CMFs from cross-sectional CPMs has been criticised, it is relatively common. Again, there are various approaches: for example, Park et al.  tested six different methods of combining CMFs and concluded that one should not rely on only one of them. An interim solution is to apply rules of thumb , such as using the product of no more than three separate independent countermeasures  or reducing the product by multiplying it by a ratio of 2/3 .
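The rules of thumb can be compared side by side. The text specifies neither which three countermeasures to keep nor exactly how the 2/3 ratio is applied, so the two readings below are assumptions: keep the three strongest (lowest) CMFs, and retain two thirds of the combined crash reduction. The CMF values are hypothetical.

```python
import math

def combine_product(cmfs):
    """Plain multiplicative combination (assumes independent effects)."""
    return math.prod(cmfs)

def combine_at_most_three(cmfs):
    """Rule of thumb: use no more than three CMFs; keeping the three strongest
    (lowest) ones is an assumption made for this sketch."""
    return math.prod(sorted(cmfs)[:3])

def combine_two_thirds(cmfs):
    """Assumed reading of the 2/3 rule: retain two thirds of the combined reduction."""
    return 1.0 - (2.0 / 3.0) * (1.0 - math.prod(cmfs))

cmfs = [0.8, 0.9, 0.7, 0.95]   # hypothetical treatment CMFs
for f in (combine_product, combine_at_most_three, combine_two_thirds):
    print(f.__name__, round(f(cmfs), 3))
```

Both adjustments yield a weaker combined effect than the plain product, which is their purpose when individual CMFs may overlap.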
3.9 Using CPM tools
The analytical steps described above (data preparation, exploratory analysis, modelling, calculations) are typically conducted in statistical software or spreadsheets. For an end user, however, it is beneficial to be able to visualise the results. These may take the form of tables or map outputs, for example identified hotspots or lists of ranked segments. A number of practitioner tools are worth mentioning, especially as they apply to network screening and to the analysis of safety impacts of potential treatments.
One option is using stand-alone software solutions, such as the following two from the USA:
IHSDM Crash Prediction Module  estimates the frequency and severity of crashes on a highway using geometric design and traffic characteristics. This helps users evaluate an existing highway, compare the relative safety performance of design alternatives, and assess the safety cost-effectiveness of design decisions.
SafetyAnalyst (commercial software) Network Screening Tool  identifies sites with potential for safety improvement. In addition, it is able to identify sites with high crash severities and with high proportions of specific crash types.
Note that there are close links between IHSDM, SafetyAnalyst and the Highway Safety Manual. According to Harwood et al. , SafetyAnalyst Module 1 (network screening) is to be applied first, followed by Module 2 (diagnosis and countermeasure selection) and Module 3 (economic appraisal and priority ranking), with IHSDM then used to perform safety analyses as part of the design process.
The Finnish evaluation tool TARVA  also deserves mention. Its purpose is to provide a common method and database for (1) predicting the expected number of crashes, and (2) estimating the safety effects of road safety improvements. Based on simple CPMs and pre-determined CMFs, it currently exists in Finnish and Lithuanian versions, with applications planned in other countries.
Network screening and road safety impact assessment capabilities are built into the commercial software PTV Visum Safety. There are also applications in the form of Excel spreadsheets, for example the British COBALT, the Swedish TS-EVA or the Norwegian CPMs for national and county roads [37, 38]. In the US, spreadsheets were developed for safety analysis of freeway segments and interchanges (ISAT  and ISATe ).
The Australian National Risk Assessment Model (ANRAM) tool, available to road agencies, is a network screening and prioritisation tool that uses CPMs for different road stereotypes, together with CMFs and observed crash data, to estimate severe injury crashes across a segmented road network . ANRAM allows users to develop treatment programs for road networks and corridors and to estimate their benefits. This tool has gained wide use among state road agencies in Australia, particularly for rural road networks, where actual severe crashes are randomly distributed. ANRAM is available in spreadsheet form, with online adaptations planned.
New Zealand also has a history of various safety prediction tools. Turner et al.  stressed the practical need for such tools and, after reviewing overseas applications, considered IHSDM worth adapting to New Zealand conditions for assessing new road designs. A later work  reviewed New Zealand spreadsheet applications, as well as experience with using and calibrating the US ISAT tool.
Increasingly, GIS and business analytics software has been used to display CPM results in map format, often with dynamic filtering and computational functions. Examples include ArcGIS Online, the open-source QGIS, Tableau and Microsoft Power BI. These solutions make it easy for practitioners to access CPM results and understand their value.