- Original Paper
- Open Access

# Short-term traffic flow prediction using seasonal ARIMA model with limited input data

- S. Vasantha Kumar^{1} and
- Lelitha Vanajakshi^{2}

**7**:21

https://doi.org/10.1007/s12544-015-0170-8

© The Author(s) 2015

**Received:** 17 December 2013 **Accepted:** 30 May 2015 **Published:** 13 June 2015

## Abstract

### Background

Accurate prediction of traffic flow is an integral component of most Intelligent Transportation Systems (ITS) applications. The data-driven approach using Box-Jenkins Autoregressive Integrated Moving Average (ARIMA) models reported in most studies demands a sound database for model building. Hence, the applicability of these models remains in question in places where data availability could be an issue. The present study tries to overcome this issue by proposing a prediction scheme that uses a Seasonal ARIMA (SARIMA) model for short-term prediction of traffic flow with only limited input data.

### Method

A 3-lane arterial roadway in Chennai, India was selected as the study stretch, and limited flow data from only three consecutive days were used for model development using SARIMA. After the differencing needed to make the input time series stationary, the autocorrelation function (ACF) and partial autocorrelation function (PACF) were plotted to identify a suitable order for the SARIMA model. The model parameters were estimated by the maximum likelihood method in R. The developed model was validated by performing a 24 h ahead forecast and comparing the predicted flows with the actual flow values. A comparison of the proposed model with the historic average and naive methods was also attempted. The effect of increasing the sample size of the input data on the prediction results was studied. Short-term prediction of traffic flow during the morning and evening peak periods was also attempted using both historic and real-time data.

### Concluding remarks

The mean absolute percentage error (MAPE) between actual and predicted flow was found to be in the range of 4–10 %, which is acceptable in most ITS applications. The prediction scheme proposed in this study could be considered in situations where the database is a major constraint during model development using ARIMA.

## Keywords

- Time-series analysis
- SARIMA
- Flow prediction
- Intelligent transportation systems
- Limited input data

## 1 Introduction

The exponential growth of personal vehicles (cars and two-wheelers), combined with increases in trips and trip lengths, results in acute traffic congestion in most metropolitan cities around the world. In recent years, the focus of congestion reduction has shifted from infrastructure- and capital-intensive transportation strategies to more balanced and sustainable transportation solutions such as Intelligent Transportation Systems (ITS). Traffic forecasting, the process of predicting traffic conditions in the short-term or near-term future based on current and past traffic observations, is an important component of any ITS application. Short-term traffic flow forecasting, which involves predicting the traffic volume in the next time interval, usually in the range of 5 min to 1 h, is one of the important research problems in the field of ITS and has been addressed by many researchers in the last two decades. Traffic flow, or the number of vehicles crossing a particular point per unit time, is a point process; in other words, it is a type of random process that consists of a set of isolated points collected over time [1]. For modelling such point processes, data-driven approaches based on statistical techniques are usually employed to identify the stochasticity in the observed data [2]. In general, the statistical techniques used for traffic flow prediction can be classified as nonparametric or parametric [3]. The nonparametric techniques include nonparametric regression [4] and neural networks [5–16]. The parametric techniques include linear and nonlinear regression, historical average algorithms [6], smoothing techniques [6, 11, 17], and autoregressive linear processes [3, 7, 11, 17–26].
It is reported that time series techniques such as the autoregressive integrated moving average (ARIMA) are among the most precise methods for traffic flow prediction when compared with the other techniques mentioned above [27]. Time series models try to identify the pattern in past data by decomposing it into long-term trends and seasonal patterns, and then extrapolate that pattern into the future. Since traffic flow exhibits a strong seasonal pattern, with peak and off-peak conditions repeating at more or less the same time every day, seasonal ARIMA (SARIMA) models are considered particularly relevant for modelling traffic flow behavior [3, 23, 24, 26, 27]. In many studies, the SARIMA model is found to perform better than models based on random walk, linear regression, support vector regression (SVR), historical average, and simple ARIMA [23, 24, 26, 28]. Smith et al. [29] reported that the best-performing k-NN forecast models (nonparametric) did not reach the predictive performance of SARIMA (parametric).

Reported studies on the use of SARIMA models for flow prediction mainly suffer from the drawback that they used a huge historical database for model development. For example, Smith et al. [29] used the previous 45 days of 15 min flow observations for next-day traffic flow forecasting. More than 2 months of traffic volume observations were used by Williams and Hoel [23], and around 60,000 flow observations aggregated into 3 min intervals spanning a period of 106 days were used by Stathopoulos and Karlaftis [30]. Ghosh et al. [24] used 20 days of 15 min flow data with a total of 1920 observations. Mai et al. [27] used 15 min aggregated traffic volume observations over a period of 26 days to fit the SARIMA-based traffic flow prediction model. Dong et al. [25] used 2 months of flow observations aggregated into 5 min intervals as input to an ARIMA model for predicting the flow on the test day of interest. Lippi et al. [26] used 4 months of flow data from loop detectors placed around nine districts of California for model development using SARIMA. Tan et al. [11] used a time series of traffic flow collected over several years for model development using ARIMA. The use of such a huge database for model building may restrict application in places where data availability could be an issue. Moreover, storing and maintaining historical databases can be a difficult task. Thus, it would be ideal if a SARIMA model that needs only limited input data could be developed for predicting flow. The present study is an attempt in this direction: only the previous 3 days of flow observations, aggregated into the required time interval, are used in the SARIMA-based prediction scheme to predict the next day's flow values (24 h ahead forecast) with the desired accuracy.
Using the previous 3 days of flow data as input can capture the peak and off-peak traffic conditions, which repeat at more or less the same time every day. Short-term prediction of traffic flow during the morning and evening peak periods was also attempted using both historic data (the previous 3 days of flow data) and real-time data from the day of interest.

The following section gives the details of the selected study stretch and of the data collection and extraction techniques used for developing and corroborating the prediction scheme. Section 3 explains the step-by-step procedure for developing the proposed traffic flow prediction scheme using SARIMA with only the previous 3 days of flow data as input. The corroboration of the prediction scheme using actual field data is explained in Section 4. Short-term prediction of traffic flow using both historic and real-time data is presented in Section 5, followed by concluding remarks in Section 6.

## 2 Data collection and extraction

The study stretch considered for the present study was on Rajiv Gandhi road in Chennai, India. The selected road is one of the busy arterial roads in Chennai and is also known as the IT corridor or Old Mahabalipuram road. More than 30,000 vehicles use this road daily. It is a 6-lane roadway, with 3 lanes in each direction. For the present study, only one direction of traffic was considered. An automated traffic sensor, namely the Collect-R camera [31], permanently fixed at one location on the selected study stretch, was used to obtain the vehicular flow data required for model development and corroboration of the prediction scheme. Flow data from three consecutive days (September 20, 21 and 22, 2012) were collected from the Collect-R camera and used for model development. The flow data corresponding to September 23, 2012 were used for model validation. The raw data from the automated traffic sensor contained class-wise traffic flow for each 1 min interval over the entire 24 h, from midnight to midnight. As the prediction scheme is based on time series analysis, which requires a series of discrete observations collected over time, the input could be either class-wise traffic flow or total vehicular flow aggregated into any desired uniform time interval. For the present study, the total number of vehicles aggregated into 10 min intervals was considered as input. However, the proposed prediction scheme could be extended to any desired time interval, and to class-wise vehicular flow as input. Hence, the data extraction involved summing the class-wise traffic flow in each 1 min interval and then aggregating into 10 min intervals. The observed flow in each 10 min interval was then converted to vehicles per hour. Thus, for each day, 144 flow values were available (24 h × 6 data points/h) as input to the prediction scheme.
The same process was repeated for all 4 days (three consecutive days for model development and the next consecutive day for model validation) to get the total number of vehicles in each 10 min interval.
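The extraction steps described above (sum the vehicle classes per minute, aggregate into 10 min intervals, convert to vehicles per hour) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `aggregate_to_flow`, the class names, and the synthetic counts are all assumptions made for the example.

```python
from typing import Dict, List


def aggregate_to_flow(per_minute: List[Dict[str, int]],
                      interval_min: int = 10) -> List[float]:
    """Turn class-wise 1 min counts into total flow (veh/hr) per interval."""
    if len(per_minute) % interval_min != 0:
        raise ValueError("number of minutes must be a multiple of the interval")
    # Step 1: sum all vehicle classes within each minute.
    totals = [sum(counts.values()) for counts in per_minute]
    # Step 2: aggregate minutes into intervals; step 3: scale to veh/hr.
    per_hour_factor = 60 / interval_min
    return [
        sum(totals[i:i + interval_min]) * per_hour_factor
        for i in range(0, len(totals), interval_min)
    ]


# Synthetic day (hypothetical classes and counts): 1440 one-minute records.
day = [{"car": 5, "two_wheeler": 8, "bus": 1} for _ in range(24 * 60)]
flows = aggregate_to_flow(day)
print(len(flows), flows[0])
```

With 10 min intervals, one day yields 144 values, matching the series length per day used in the study.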

## 3 Development of prediction scheme using SARIMA

The development of the proposed scheme for traffic flow prediction using SARIMA involved four steps: model identification, model estimation, diagnostic checking, and forecasting/validation of the developed model. The first three steps are explained in this section. The last step, model validation, is explained in Section 4.

### 3.1 Model identification

[…] ($x_t - x_{t-12}$) is required for monthly data with seasonality. If the series contains trend as well as seasonality, both non-seasonal and seasonal differencing need to be applied as two successive operations, in either order. If there is neither obvious trend nor seasonality, the series can be modelled by AR, MA or ARMA models. It is not advisable to difference more than twice, as over-differencing can introduce unnecessary dependence into the time series data. The time series plot of the observed 10 min flow in veh/hr for the three consecutive days is shown in Fig. 1. There is a clear seasonal pattern in the observed traffic flow, with a seasonality of 24 h. This shows that the time series data could be modelled using SARIMA. It can also be seen from Fig. 1 that the morning and evening peak hours were clearly repetitive and showed similar variation across the days. Inspection of the plots further suggests that there is no increasing or decreasing long-term trend in the data.

[…] ($x_t - x_{t-144}$) was adopted. For the differenced series, the ACF and PACF were plotted and are shown in Fig. 2. It can be seen from Fig. 2 that the ACF tapers gradually towards zero, which suggests a possible AR process for the non-seasonal part. The order of the AR model can be read from the PACF. There are three significant non-zero partial autocorrelations at early lags in the PACF, which indicates a possible third-order AR model for the non-seasonal part. However, there is a sharp cut-off after lag 2 in the PACF, which suggests a possible AR(2) process. On the other hand, the PACF at lag 1 is comparatively higher than at lags 2 and 3, showing the possibility of AR(1) for the non-seasonal part. The ACF and PACF at the seasonal lag of 144 in Fig. 2 indicate a possible MA(1) process for the seasonal part, as there is a significant spike in the ACF at lag 144. Hence, the candidate models to be tried are ARIMA (3,0,0) × (0,1,1)$_{144}$, ARIMA (2,0,0) × (0,1,1)$_{144}$ and ARIMA (1,0,0) × (0,1,1)$_{144}$. Once the possible models and their corresponding orders were found, the next step of model estimation was performed as explained below.
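The identification step above rests on three computations: a seasonal difference at lag 144, the sample ACF, and the sample PACF. A minimal pure-Python sketch is given below, assuming a synthetic three-day series in place of the field data. The function names, the Durbin-Levinson recursion for the PACF, and the approximate 95 % bound of $1.96/\sqrt{n}$ are choices made for this illustration and are not taken from the paper, which produced its plots in R.

```python
import math
import random
from typing import List


def seasonal_difference(x: List[float], period: int) -> List[float]:
    """Return x[t] - x[t - period] for every t with a value one season back."""
    return [x[t] - x[t - period] for t in range(period, len(x))]


def acf(x: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelations rho[0..max_lag]; rho[0] is 1 by construction."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(rho: List[float], max_lag: int) -> List[float]:
    """Partial autocorrelations at lags 1..max_lag (Durbin-Levinson)."""
    out: List[float] = []
    prev: List[float] = []  # AR coefficients of the order k-1 fit
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = rho[1]
            cur = [phi_kk]
        else:
            num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
            den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
            phi_kk = num / den
            cur = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
            cur.append(phi_kk)
        out.append(phi_kk)
        prev = cur
    return out


# Synthetic stand-in for 3 days of 10 min flows (daily cycle plus noise).
PERIOD = 144
rng = random.Random(0)
series = [
    1500 + 800 * math.sin(2 * math.pi * (t % PERIOD) / PERIOD) + rng.gauss(0, 50)
    for t in range(3 * PERIOD)
]
diffed = seasonal_difference(series, PERIOD)
rho = acf(diffed, 10)
phi = pacf(rho, 10)
bound = 1.96 / math.sqrt(len(diffed))  # approximate 95 % significance bound
significant_lags = [k for k in range(1, 11) if abs(phi[k - 1]) > bound]
print(len(diffed), significant_lags)
```

Lags whose partial autocorrelation exceeds the bound are the candidates for the non-seasonal AR order. Because the synthetic series is a cycle plus white noise, its pattern will not match Fig. 2.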

### 3.2 Model estimation and diagnostic checking

[…] $\phi$'s, $\theta$'s, $\Phi$'s, and $\vartheta$'s. In the present study, one of the most widely used estimation methods, namely the 'maximum likelihood' method, was adopted using R software. The estimation procedures are not covered in this paper; details can be obtained from Brockwell and Davis [32]. The generally accepted principle is that the model with the fewest parameters that can adequately describe the process should be selected [33]. If two different models fit a series equally well, the model with fewer parameters should be preferred, because parameter estimation will be more precise for models with fewer parameters. From the selected feasible models, the most suitable one is chosen based on goodness of fit. The present study uses Akaike's Information Criterion (AIC), given by Eq. (1), to select the best model. The model with the lowest AIC is the best one.

[…] $\sigma_k^2$ is the estimate of variance, $n$ is the number of samples and $k$ is the number of parameters. The results of model estimation are shown in Table 1. The usual procedure is to choose a model that has a low AIC. Since the ARIMA (2,0,0) × (0,1,1)$_{144}$ model showed an AIC of 4218.34, which is lower than that of the other two models, it was finally selected. Corroboration of the chosen model is detailed in the following section.
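The selection rule described above can be illustrated with a short sketch. The general definition $\mathrm{AIC} = 2k - 2\ln L$ is standard. The Gaussian form $n \ln \sigma_k^2 + 2k$ (valid up to an additive constant) is one common way to write it with the symbols defined above, and is an assumption here rather than a reconstruction of the paper's Eq. (1). The function names are hypothetical; the AIC values are those reported in Table 1.

```python
import math
from typing import Dict


def aic_from_loglik(loglik: float, k: int) -> float:
    """General definition: AIC = 2k - 2 ln(L)."""
    return 2.0 * k - 2.0 * loglik


def aic_gaussian(sigma2: float, n: int, k: int) -> float:
    """Gaussian form up to an additive constant (assumed, not the paper's Eq. 1)."""
    return n * math.log(sigma2) + 2.0 * k


# AIC values as reported in Table 1 for the three candidate models.
candidates: Dict[str, float] = {
    "(3,0,0)x(0,1,1)144": 4218.7,
    "(2,0,0)x(0,1,1)144": 4218.3,
    "(1,0,0)x(0,1,1)144": 4243.6,
}
best = min(candidates, key=candidates.get)
print(best)
```

The penalty term $2k$ is what lets the two-parameter AR model edge out the three-parameter one even though their fits are nearly identical.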

Table 1 Parameters of the SARIMA model

| Model | Type | Parameter | Estimate | | AIC |
|---|---|---|---|---|---|
| (3,0,0) × (0,1,1)$_{144}$ | Non-seasonal AR | AR | 0.34 | 5.80 | 4218.7 |
| | | AR | 0.27 | 4.59 | |
| | | AR | 0.07 | 1.27 | |
| | Seasonal MA | MA | −0.41 | −3.38 | |
| (2,0,0) × (0,1,1)$_{144}$ | Non-seasonal AR | AR | 0.36 | 6.49 | 4218.3 |
| | | AR | 0.30 | 5.36 | |
| | Seasonal MA | MA | −0.40 | −3.37 | |
| (1,0,0) × (0,1,1)$_{144}$ | Non-seasonal AR | AR | 0.51 | 10.1 | 4243.6 |
| | Seasonal MA | MA | −0.33 | −3.13 | |
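For readers who want the selected model written out, the following is a sketch of the ARIMA (2,0,0) × (0,1,1)$_{144}$ equation with the values tabulated for that model. It assumes that the first numeric column of the table holds the coefficient estimates and that the moving average term follows R's sign convention (the paper reports estimation in R). The backshift operator $B$ and the white-noise term $w_t$ are notation introduced here, not symbols from the paper.

```latex
\begin{aligned}
(1 - \phi_1 B - \phi_2 B^{2})\,(1 - B^{144})\,x_t &= (1 + \vartheta_1 B^{144})\,w_t,\\
\phi_1 = 0.36,\qquad \phi_2 = 0.30,\qquad \vartheta_1 &= -0.40,\\
x_t &= x_{t-144} + 0.36\,(x_{t-1} - x_{t-145}) + 0.30\,(x_{t-2} - x_{t-146}) + w_t - 0.40\,w_{t-144}.
\end{aligned}
```

Read this way, each forecast starts from the flow observed at the same time on the previous day and corrects it using the two most recent day-over-day differences.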

## 4 Corroboration of the prediction scheme

Details of the number of days considered in the SARIMA model for various scenarios

| Scenario | Previous days considered in the model to predict traffic flow on June 06, 2014 |
|---|---|
| 1 | June 03 (Tue), June 04 (Wed), June 05 (Thu) |
| 2 | June 02 (Mon), June 03 (Tue), June 04 (Wed), June 05 (Thu) |
| 3 | May 30 (Fri), June 02 (Mon), June 03 (Tue), June 04 (Wed), June 05 (Thu) |
| 4 | May 29 (Thu), May 30 (Fri), June 02 (Mon), June 03 (Tue), June 04 (Wed), June 05 (Thu) |
| 5 | May 28 (Wed), May 29 (Thu), May 30 (Fri), June 02 (Mon), June 03 (Tue), June 04 (Wed), June 05 (Thu) |
| 6 | May 27 (Tue), May 28 (Wed), May 29 (Thu), May 30 (Fri), June 02 (Mon), June 03 (Tue), June 04 (Wed), June 05 (Thu) |
| 7 | May 26 (Mon), May 27 (Tue), May 28 (Wed), May 29 (Thu), May 30 (Fri), June 02 (Mon), June 03 (Tue), June 04 (Wed), June 05 (Thu) |
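Each scenario in the table is judged by comparing a forecast with the flows observed on the target day. The sketch below shows the MAPE measure reported in the abstract, together with the two baselines named there. The exact baseline definitions used in the paper are not available in this text, so the versions here are assumptions: the historic average is taken as the per-interval mean over the input days, and the naive method as a copy of the most recent day. The numbers are made up for illustration.

```python
from typing import List, Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; zero actuals are skipped."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    terms = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not terms:
        raise ValueError("no non-zero actual values to score")
    return 100.0 * sum(terms) / len(terms)


def historic_average(days: List[List[float]]) -> List[float]:
    """Assumed baseline: mean flow per interval across the input days."""
    return [sum(values) / len(values) for values in zip(*days)]


def naive_forecast(days: List[List[float]]) -> List[float]:
    """Assumed baseline: repeat the most recent day's flows."""
    return list(days[-1])


# Made-up flows (veh/hr) for two intervals on three input days and a test day.
input_days = [[100.0, 200.0], [110.0, 190.0], [120.0, 210.0]]
observed = [115.0, 200.0]
mape_hist = mape(observed, historic_average(input_days))
mape_naive = mape(observed, naive_forecast(input_days))
print(round(mape_hist, 2), round(mape_naive, 2))
```

A scenario with more input days simply passes a longer `input_days` list; the scoring step is unchanged.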

## 5 Real time short term traffic prediction

## 6 Concluding remarks

Timely and accurate prediction of traffic flow is essential for proactive traffic management and control in Advanced Traffic Management Systems (ATMS) and for real-time route guidance in Advanced Traveler Information Systems (ATIS). Among the techniques available for traffic flow prediction, time series analysis using ARIMA models is one of the most precise methods, and SARIMA in particular is relevant for modelling traffic flow behavior. However, the main drawback of data-driven approaches is that they require a huge historical database for model development. For example, the reported studies on the use of SARIMA for flow prediction used flow data on the order of several months to develop the prediction scheme. Use of such a huge database may restrict application in places where data availability could be an issue. Also, storing and maintaining historical databases sometimes becomes a difficult task. Running the SARIMA model may also require more computational time and resources when the input time series is large. The present study tries to overcome these issues by proposing a prediction scheme that uses a SARIMA model for short-term prediction of traffic flow with only limited input data. In the prediction scheme, only the previous 3 days of flow observations were considered as input for predicting the next day's flow values (24 h ahead forecast). Short-term prediction of traffic flow during the morning and evening peak periods was also attempted using both historic and real-time data. The results were promising, and the prediction scheme proposed in this study could be considered in situations where the database is a major constraint during model development using ARIMA.

## Declarations

### Acknowledgments

The data collection effort in this project was made possible through sponsored projects from Ministry of Urban Development (MoUD), Govt. of India (through their sponsorship of Centre of Excellence in Urban Transport at IIT Madras).

**Open Access** This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

## Authors’ Affiliations

## References

- Zhang GY, Zhao Q, Luo ZW, Wei H (2009) Short-term traffic flow prediction with ACD and particle filter. Proceedings of the 9th International Conference on Chinese Transportation Professionals, Harbin, China
- Chien SI, Xiaobo L, Ozbay K (2003) Predicting travel times for the South Jersey real-time motorist information system. Transp Res Rec J Transp Res Board 1855:32–40
- Ghosh B, Basu B, Mahony MO (2007) Bayesian time-series model for short-term traffic flow forecasting. J Transp Eng 133(3):180–189
- Davis GA, Nihan NL (1991) Nonparametric regression and short-term freeway traffic forecasting. J Transp Eng 117(2):178–188
- Vythoulkas PC (1993) Alternative approaches to short-term traffic forecasting for use in driver information systems. Proc., 12th Int. Symp. on traffic flow theory and transportation, Berkeley, California
- Smith BL, Demetsky MJ (1994) Short-term traffic flow prediction: neural network approach. Transp Res Rec J Transp Res Board 1453:98–104
- Kirby HR, Watson SM, Dougherty MS (1997) Should we use neural network or statistical models for short-term motorway traffic forecasting. Int J Forecast 13(1):43–50
- Yin HB, Wong SC, Xu JM, Wong CK (2002) Urban traffic flow prediction using a fuzzy-neural approach. Transp Res C 10(2):85–98
- Lee DH, Zheng WZ, Shi QX (2004) Short-term freeway traffic flow prediction using a combined neural network model. Proceedings of the 84th Annual Transportation Research Board Meeting, Washington D.C., USA
- Vlahogianni EI, Karlaftis MG, Golias JC (2005) Optimized and meta-optimized neural networks for short-term traffic flow prediction: a genetic approach. Transp Res C 13(2):211–234
- Tan MC, Wong SC, Xu JM, Guan ZR, Zhang P (2009) An aggregation approach to short-term traffic flow prediction. IEEE Trans Intell Transp Syst 10(1):60–69
- Cetinar BG, Sari M, Borat O (2010) A neural network based traffic-flow prediction model. Math Comput Appl 15(2):269–278
- Ogunwolu L, Adedokun O, Orimoloye O, Oke SA (2011) A neuro-fuzzy approach to vehicular traffic flow prediction for a metropolis in a developing country. J Ind Eng Int 7(13):52–66
- Ge Y, Wang G (2011) Study of traffic flow short-time prediction based on wavelet neural network. Lect Notes Electr Eng 98:509–516
- Pamula T (2012) Traffic flow analysis based on the real data using neural networks. Commun Comput Inf Sci 329:364–371
- Kumar K, Parida M, Katiyar VK (2013) Short term traffic flow prediction in heterogeneous condition using artificial neural network. Transportation. doi:10.3846/16484142.2013.818057
- Williams BM, Durvasula PK, Brown DE (1998) Urban traffic flow prediction: application of seasonal autoregressive integrated moving average and exponential smoothing models. Transp Res Rec J Transp Res Board 1644:132–144
- Ahmed MS, Cook AR (1979) Analysis of freeway traffic time-series data by using Box–Jenkins techniques. Transp Res Rec J Transp Res Board 722:1–9
- Levin M, Tsao YD (1980) On forecasting freeway occupancies and volumes. Transp Res Rec J Transp Res Board 773:47–49
- Nihan NL, Holmesland KO (1980) Use of the Box and Jenkins time series technique in traffic forecasting. Transportation 9(2):125–143
- Hamed MM, Al-Masaeid HR, Bani Said ZM (1995) Short term prediction of traffic volume in urban arterials. J Transp Eng 121(3):249–254
- Lee S, Fambro DB (1999) Application of subset autoregressive moving average model for short-term freeway traffic volume forecasting. Transp Res Rec J Transp Res Board 1678:179–188
- Williams BM, Hoel LA (2003) Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: theoretical basis and empirical results. J Transp Eng 129(6):664–672
- Ghosh B, Basu B, Mahony MO (2005) Time-series modelling for forecasting vehicular traffic flow in Dublin. Proceedings of the 85th Transportation Research Board Annual Meeting, Washington, D.C.
- Dong H, Jia L, Sun X, Li C, Qin Y (2009) Road traffic flow prediction with a time-oriented ARIMA model. Fifth International Joint Conference on INC, IMS and IDC, Seoul, Korea, 1649–1652
- Lippi M, Bertini M, Frasconi P (2013) Short-term traffic flow forecasting: an experimental comparison of time-series analysis and supervised learning. IEEE Trans Intell Transp Syst 14(2):871–882
- Mai T, Ghosh B, Wilson S (2012) Multivariate short-term traffic flow forecasting using Bayesian vector autoregressive moving average model. Proceedings of the 91st Transportation Research Board Annual Meeting, Washington, D.C.
- Chung E, Rosalion N (2001) Short term traffic flow prediction. Proc. of the 24th Australian Transportation Research Forum, Hobart, Tasmania
- Smith BL, Williams BM, Oswald RK (2002) Comparison of parametric and nonparametric models for traffic flow forecasting. Transp Res C 10:303–321
- Stathopoulos A, Karlaftis MG (2003) A multivariate state space approach for urban traffic flow modeling and prediction. Transp Res C 11:121–135
- Traficon (2010) User guide traficam collect-r data collection sensor, Wevelgem, Belgium
- Brockwell P, Davis R (2011) Introduction to time series and forecasting. Springer, India
- Caldwell JG (2006) TIMES Box-Jenkins forecasting systems. http://www.foundationwebsite.org/TIMESVol1TechnicalBackground.pdf Accessed November 19, 2013
- Kenneth DL, Ronald KK (1982) Advances in business and management forecasting. Emerald books, UK
- Wu Y, Chen F, Lu C, Smith B (2011) Traffic flow prediction for urban network using spatio-temporal random effects model. Proceedings of the 91st Transportation Research Board Annual Meeting, Washington, D.C.