A novel methodological framework for testing automated vehicle functions

As a growing number of highly automated vehicles with safety-critical functions enter road transportation, it is crucial to intensify the development of testing methods. Accordingly, this paper introduces a new methodological framework for evaluating automated vehicle functions. In the first step, the article proposes a method to select and calibrate the control model that is most likely implemented in the vehicle. Following this, the approach compares the test results with the outputs of the identified theoretical model by evaluating the similarity of the two distributions with statistical hypothesis tests. If the distributions of the real system’s output and the theoretical model’s output are close enough to each other, we can assume that the operation performance of the real system is acceptable in the investigated operation scenario.


Introduction
Nowadays, road transportation is characterized by a continuously increasing share of highly automated vehicles. This evolution is considerably influenced by the development of automotive technologies. In some countries, e.g., in Hungary, it is now permitted to test highly automated vehicles on public roads. Besides this, many SAE (Society of Automotive Engineers) level 1 or 2 vehicles are already on public roads [19]. The safety and reliability of these systems are becoming increasingly critical questions [30]. The division of liability between the driver and the driving assistance systems is not completely obvious. According to our test experience, the thresholds at which the driver needs to perform an emergency intervention are not unambiguous. In many cases, even the factors affecting the proper operation of a driving assistance system are not completely known [7]. Under such circumstances, the application of automated systems involves numerous uncertainties. Accordingly, the current article aims to identify a novel test approach that can be applied to evaluate the conformance of specific automated systems.
Winkle [28] stated that transportation safety could be improved by increasing vehicle automation [1]. The author also noted that automated systems first have to reach the level of a human's driving capability. Only then can automated systems considerably contribute to the safety of future transportation. This makes testing and validation processes even more important [12].
Carsten and Nilsson differentiated the proposed ADAS (Advanced Driving Assistance Systems) testing approaches based on the complexity of the investigated system. The authors assigned the highest complexity to systems that intervene rather than merely inform and warn. They highlighted adaptive cruise control systems as outstandingly safety-critical systems that can directly affect the velocity, acceleration, and deceleration of the vehicle [17]. For such complex systems, the classical approaches focusing on compliance with specific parametric values or with certain specifications may be inappropriate and misleading. Instead, process-oriented approaches can adequately contribute to the development of reliable validation models [4]. Following the remarks of Carsten and Nilsson, the current paper focuses on the development of an advanced validation concept related to adaptive cruise control (ACC) systems.
Although many research papers have previously investigated the development possibilities of ACC systems [2,3,5,6], it is reasonable to briefly discuss the taxonomy of basic ACC systems.
ACC systems can be differentiated based on their control structure and the number of control objectives [20]. Nowadays, ACC systems have two or more control objectives. One control objective is derived from the so-called free-flow case, where there is no vehicle in front of the ego vehicle inside its control range. In this case, the system drives the ego vehicle according to the set speed limit. A second objective is derived from the so-called vehicle-following scenario, where there is a vehicle in front of the ego vehicle inside its control range. Here the system sets the ego vehicle's target velocity to the lower of the set speed limit and the velocity of the front vehicle. Besides these, an outstandingly important control objective is to perform the smoothest and most comfortable transitions between the motion states of the vehicle. In light of these control objectives, the distance of the ego vehicle from the front vehicle is crucial. The characteristics of the distance function are directly influenced by the spacing strategy of the control model. The control system may aim to keep a constant spatial distance from the front vehicle; this strategy is represented by the constant space-headway concept. In the constant time-headway approach, the system aims to maintain a constant time interval between the two vehicles. If the relationship between distance and speed has to be represented by a nonlinear function, the variable time-headway concept is applied [15].
In light of the aforementioned aspects, the newly developed validation concept introduced by the paper needs to be able to characterize and evaluate the differently implemented ACC models. Accordingly, the main aim of the article is to identify a methodological framework that is able to characterize and separate the proper and the unacceptable operation conditions of the analyzed automated driving assistance systems. The newly developed performance evaluation concept can strongly support the identification of cases and operation conditions in which the driving assistance system is not capable of solving the driving task on its own.

Methodology
The methodology identification process aims to develop a validation concept, which corresponds to the mentioned process-oriented requirements. Accordingly, it seems to be reasonable to start from the overall framework of the driver-vehicle-environment system [29]. Based on the introduced model, three important process components can be separated: perception, decision, and action. Besides this, it is also important to emphasize that in cases of an average vehicle purchased from the market, the detailed structure of the analyzed driving assistance systems is probably not known. Therefore, the validation concept has to treat driving assistance systems as black boxes. Owing to the identified process components and the assumed black box characteristics of the investigated systems, it seems to be reasonable to focus on the measured input and output variables of the systems, which thus could cover the triplet of perception, decision, and action.
Based on the measured input and output values, the most well-known and widely applied control models related to the investigated ADAS function can be compared to the realized control model of the vehicle, which makes it possible to select the best-fitting model. This can lead to a detailed comparison of the best-fitting theoretical model and the real, implemented, and measured in-built ADAS function. This approach would result in a more flexible evaluation model than the currently applied testing frameworks, such as the New Car Assessment Program (NCAP) [16] or UNECE [26], which focus on a simplified ranking of the investigated systems by evaluating their compliance with a specific threshold. In contrast, an advanced validation concept could characterize the whole system in more detail, taking into account the control model as well as the input and output variables, which would result in a complex evaluation framework.
The introduced methodological approach is adopted for the test evaluation of a commercially available 2018 middle-class car's ACC system. The analysis has evaluated the mentioned spacing strategies, namely, constant spatial distance, constant time-headway, and variable time-headway based strategies. It is necessary to mention that there are other relevant control objectives related to the ACC system concept beyond the introduced strategies. For example, many models aim to minimize the emission or optimize the fuel consumption of the vehicle. However, it must be emphasized that the article's main goal is to introduce the newly developed evaluation framework of automated vehicle functions. Accordingly, the investigation of the mentioned three basic ACC control strategies would support the evaluation framework's introduction more efficiently.

Spacing strategies - model identification
Strategies like the constant spatial distance [22] and the constant time-headway strategy [13,20] control the ego car's acceleration and deceleration manoeuvres to provide either a constant spatial distance or a constant time headway from the front vehicle. Variable time-headway based strategies aim to identify the expected time headway depending on the actual speed of the ego and the front vehicle [18,27].

Constant spatial distance
Following this approach, the spatial distance between the vehicles does not depend on the speed of the vehicles. This objective makes it necessary to have very accurate real-time information on the location, speed and acceleration/deceleration data of the vehicles participating in the platooning process. Accordingly, these systems need more information to warrant the required safety criteria. On the other hand, it must be mentioned that these systems can achieve the highest traffic performance [22].
In the constant spatial distance model (eq. 1), $\ddot{x}_i$ is the desired acceleration value of the test vehicle, $\ddot{x}_{i-1}$ is the measured acceleration value of the front vehicle, $x_i$ is the measured position value of the test vehicle, $x_{i-1}$ is the measured position value of the front vehicle, $\dot{x}_i$ is the measured velocity value of the test vehicle, $\dot{x}_{i-1}$ is the measured velocity value of the front vehicle, $h_d$ is the desired spatial distance between the ego and the front vehicle, and $K_1$, $K_2$ are coefficients.
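To make the role of these symbols concrete, the following minimal Python sketch evaluates one commonly used linear constant-spacing law. The functional form, the assignment of K_1 to the speed error and K_2 to the spacing error, the sign convention (front vehicle ahead, so x_{i-1} > x_i), and the function name are assumptions made for illustration; they are not taken from the article's eq. (1).

```python
def constant_spacing_accel(x_i, x_prev, v_i, v_prev, a_prev, h_d, k1, k2):
    """Desired acceleration of the test vehicle under an ASSUMED law:

        a_i = a_prev + k1 * (v_prev - v_i) + k2 * ((x_prev - x_i) - h_d)

    x_*, v_*, a_* are positions, velocities and accelerations; the suffix
    `_prev` denotes the front vehicle (index i-1), `_i` the test vehicle.
    """
    spacing_error = (x_prev - x_i) - h_d   # > 0 when the gap is larger than desired
    speed_error = v_prev - v_i             # > 0 when the front vehicle is faster
    return a_prev + k1 * speed_error + k2 * spacing_error
```

With this form, a vehicle that already keeps the desired gap at the same speed simply copies the acceleration of the front vehicle, which is why the law needs the front vehicle's acceleration as an input.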

Constant time-headway
According to the constant time-headway strategy, the time gap between the vehicles does not depend on the speed of the vehicles. The aim of the control process is to minimize both the difference between the actual and the desired time headway and the speed difference between the front and the test vehicle [15].
In the constant time-headway model (eq. 2), $\ddot{x}_i$ is the desired acceleration value of the test vehicle, $x_i$ is the measured position value of the test vehicle, $x_{i-1}$ is the measured position value of the front vehicle, $\dot{x}_i$ is the measured velocity value of the test vehicle, $\dot{x}_{i-1}$ is the measured velocity value of the front vehicle, $h_d$ is the following time distance between the two vehicles, and $K_1$, $K_2$ are coefficients.
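A minimal Python sketch of a constant time-headway law with these symbols is given below. The linear feedback form, the assignment of K_1 to the spacing error and K_2 to the speed error, and the function name are assumptions for illustration and may differ from the article's eq. (2).

```python
def constant_time_headway_accel(x_i, x_prev, v_i, v_prev, h_d, k1, k2):
    """Desired acceleration of the test vehicle under an ASSUMED law:

        a_i = k1 * ((x_prev - x_i) - h_d * v_i) + k2 * (v_prev - v_i)

    The desired gap grows linearly with the test vehicle's speed, so the
    time gap (gap / speed) stays constant at h_d seconds.
    """
    desired_gap = h_d * v_i                      # metres, if v_i is in m/s
    spacing_error = (x_prev - x_i) - desired_gap
    speed_error = v_prev - v_i
    return k1 * spacing_error + k2 * speed_error
```

In contrast to the constant-spacing sketch, this form needs no acceleration signal from the front vehicle, only relative position and speed.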

Variable time-headway
Earlier, many adaptive cruise control systems were developed to assist car-following driving processes [8,10,14,24]. In this investigation, I adapted the Intelligent Driver Model (IDM) developed by Treiber and his colleagues, which is more realistic and can achieve better performance than most of the deterministic car-following systems [11,23,25].
In the variable time-headway model (eq. 3), $\ddot{x}_i$ is the desired acceleration value of the test vehicle, $x_i$ is the measured position value of the test vehicle, $x_{i-1}$ is the measured position value of the front vehicle, $\dot{x}_i$ is the measured velocity value of the test vehicle, $\dot{x}_{i-1}$ is the measured velocity value of the front vehicle, $s'$ is the desired headway, $decc$ is the desired deceleration, $s_0$ is the headway between the stopped cars, $T$ is the desired time gap, $K_1$ is the maximum acceleration, $K_2$ is the maximum speed, and $K_3$ is the acceleration exponent.
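The following Python sketch shows the IDM in its widely published form, written with the parameter names of the list above. It is an illustration rather than the article's exact eq. (3): the clamping of the dynamic part of the desired headway at zero is a common variant, and the function name and the choice of consistent units for all arguments are assumptions.

```python
import math

def idm_accel(gap, v_i, v_prev, decc, s0, T, k1, k2, k3):
    """Intelligent Driver Model acceleration (standard published form).

    gap    : current headway to the front vehicle (must be > 0)
    v_i    : speed of the test vehicle, v_prev: speed of the front vehicle
    decc   : desired (comfortable) deceleration
    s0, T  : standstill headway and desired time gap
    k1, k2 : maximum acceleration and maximum (desired) speed
    k3     : acceleration exponent
    """
    approach_rate = v_i - v_prev
    # Desired headway s': standstill gap plus a speed-dependent dynamic part.
    dynamic_part = v_i * T + v_i * approach_rate / (2.0 * math.sqrt(k1 * decc))
    s_desired = s0 + max(0.0, dynamic_part)
    free_road_term = (v_i / k2) ** k3
    interaction_term = (s_desired / gap) ** 2
    return k1 * (1.0 - free_road_term - interaction_term)
```

Because the desired headway depends on both the own speed and the approach rate, the resulting time headway varies with the driving situation, which is what distinguishes this model from the two constant-headway strategies.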

Performed measurements
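The test of the ACC system was performed in Kistarcsa (Hungary), in a separate vehicle parking facility. The test vehicle had an ACC Plus car-following system, which is responsible for adjusting the vehicle speed when an object is detected. The input signal is generated by a radar sensor located at the midpoint of the front of the vehicle. The car-following system of the test vehicle was set to 30 km/h. The front vehicle was driven at 20 km/h and then started to decelerate. Following the Euro NCAP methodology, four different cases were tested (Fig. 1). The first test case was the complete overlapping, when the two vehicles' center lines (planes of longitudinal symmetry) coincided. The second test case was the 50% overlapping, when the distance between the two vehicles' center lines was approximately half of the width of the test vehicle. The third test case was the 25% overlapping, when the distance between the center lines was approximately 75% of the width of the test vehicle. The fourth test case was the 10% overlapping, when the distance between the center lines was approximately 90% of the width of the test vehicle. All the tests were performed five times.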

Applied analytical methods
In the first step, I calibrated the car-following models introduced above by identifying the constant parameters of the equations (eqs. 1-3). The calibration was formulated as an optimization problem that minimizes the difference (F) between the measured test results ($a_i$) and the estimated output values of the control function ($\ddot{x}_i$). After identifying the three models, the best-fitting function was selected based on the Pearson correlation coefficient $\rho_{a_i,\ddot{x}_i}$, which describes the relationship between the measured and the estimated acceleration functions [21]:

$$\rho_{a_i,\ddot{x}_i} = \frac{\operatorname{cov}(a_i, \ddot{x}_i)}{\sigma_{a_i}\,\sigma_{\ddot{x}_i}}$$

where $\operatorname{cov}(a_i, \ddot{x}_i)$ is the covariance of the two variables, $\sigma_{a_i}$ is the standard deviation of $a_i$, and $\sigma_{\ddot{x}_i}$ is the standard deviation of $\ddot{x}_i$.
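The calibration and selection step can be sketched in Python as follows. The sketch assumes that the difference F is measured as a sum of squared errors (the exact objective is not reproduced here) and that the model is linear in two gains, as in the two-gain headway laws; both function names are hypothetical.

```python
import math

def fit_two_gains(e1, e2, measured):
    """Least-squares gains (k1, k2) for measured ~ k1*e1 + k2*e2.

    e1, e2 are the two error signals of a linear control law and `measured`
    is the recorded acceleration. Solves the 2x2 normal equations directly.
    """
    s11 = sum(u * u for u in e1)
    s12 = sum(u * v for u, v in zip(e1, e2))
    s22 = sum(v * v for v in e2)
    b1 = sum(u * a for u, a in zip(e1, measured))
    b2 = sum(v * a for v, a in zip(e2, measured))
    det = s11 * s22 - s12 * s12   # zero if e1 and e2 are collinear
    k1 = (b1 * s22 - b2 * s12) / det
    k2 = (s11 * b2 - s12 * b1) / det
    return k1, k2

def pearson(a, b):
    """Pearson correlation coefficient: cov(a, b) / (sigma_a * sigma_b)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)
```

A model that is nonlinear in its parameters, such as the IDM, cannot be calibrated with the closed-form solution above and would need an iterative optimizer instead; the correlation-based selection step stays the same.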
After selecting the best-fitting model, I investigated how well the implemented car-following model performs in different overlapping situations. To evaluate the performance of the identified model, I analyzed whether the output values generated by the theoretical model and the recorded test values describing the operation of the in-built ACC are from the same continuous distribution at a specified significance level. For this purpose, I applied the Kolmogorov-Smirnov test [9]. This approach can be used to express the impact of the overlapping on the significance level at which the similarity of the expected and the real acceleration/deceleration values is accepted.
Let us assume that the first sample has size m and that its observed cumulative distribution function is F(x). Similarly, suppose that the second sample has size n and that its observed cumulative distribution function is G(x). The two-sample Kolmogorov-Smirnov statistic is the largest absolute difference between the two functions:

$$D_{m,n} = \sup_x \left| F(x) - G(x) \right|$$

My null hypothesis is that both samples are from the same distribution. If $D_{m,n}$ is larger than the critical value $D_{m,n,\alpha}$, I reject the null hypothesis at significance level $\alpha$.
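The test can be sketched in a few lines of Python. The statistic follows the definition of the two-sample Kolmogorov-Smirnov test; the critical value uses the standard large-sample approximation, which is an assumption here because the text does not state how the critical value was obtained. The function names are hypothetical.

```python
import math
from bisect import bisect_right

def ks_statistic(sample_1, sample_2):
    """Largest absolute difference between the two empirical CDFs."""
    s1 = sorted(sample_1)
    s2 = sorted(sample_2)
    m, n = len(s1), len(s2)
    d = 0.0
    # The supremum is attained at one of the observed values.
    for x in s1 + s2:
        f = bisect_right(s1, x) / m   # share of sample 1 that is <= x
        g = bisect_right(s2, x) / n   # share of sample 2 that is <= x
        d = max(d, abs(f - g))
    return d

def ks_critical_value(m, n, alpha):
    """Large-sample approximation of the critical value at level alpha."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c_alpha * math.sqrt((m + n) / (m * n))
```

The null hypothesis is rejected at level alpha when `ks_statistic(a, b) > ks_critical_value(len(a), len(b), alpha)`. For small samples, exact tables are more appropriate than the approximation.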

Results and discussion
Following the developed methodological framework, in the first step, the car-following model was identified. Applying eq. (5), I performed the calibration of the introduced car-following models.
It is important to emphasize that only those parts of the test were used for identifying the parameter set of the ACC function in which the system worked properly and safety was not compromised. Accordingly, the generated parameter values of the ACC function represent a properly operating system. The goodness-of-fit of the first car-following model (eq. 1) is illustrated in Fig. 2 by the speed values as a function of time, derived from the acceleration data. The correlation between the measured (Test) and calculated (model) speeds is 97.5%.
In the case of the second model (eq. 2), the correlation between the measured (Test) and calculated (model) speeds is 98% (Fig. 3). The goodness-of-fit value of the third model is 99.5%. Based on the achieved results, the third model of variable time-headway was identified as the best-fitting model. The coefficients of the model are decc = 1.6, s_0 = 1, T = 1, K_1 = 0.7, K_2 = 30, and K_3 = 3.2.

Fig. 1 The analyzed scenarios
In the next step, I investigated whether the measured and the calculated acceleration values are from the same distribution. In the 100% and the 50% overlapping scenarios, the distributions of the measured and the calculated values were outstandingly similar. Furthermore, in these cases, the ACC module was not confused. Accordingly, the test driver did not need to intervene in the driving process; therefore, I focus on the 25% and 10% cases in the next sections.
In the 25% and 10% overlapping scenarios (Fig. 4), there were extreme situations in which the ACC system made unsafe decisions and the test driver corrected the process.
Following this, the effect of human intervention had to be separated from the operation process controlled by the car-following system. To this end, the moment of human intervention was identified from the recorded video files. Based on the recorded data, the average length of the human intervention was 2 seconds. Therefore, the velocity and acceleration values of the 2 seconds following each intervention were replaced in the database by the estimated and corrected ACC output. The replacement data was estimated by linear extrapolation of the data from the preceding 2 seconds.
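The replacement step described above can be sketched in Python as follows. The sketch assumes that the extrapolation line is a least-squares fit to the samples of the preceding window; the text only states that linear extrapolation was used, so the fitting method, the function name, and the argument names are hypothetical.

```python
def extrapolate_after_intervention(times, values, t_intervention, window=2.0):
    """Replace `window` seconds of data after an intervention by a linear
    extrapolation of the `window` seconds before it.

    times, values : equally long lists of timestamps (s) and signal values
    t_intervention: moment of the human intervention (s)
    Requires at least two samples with distinct timestamps before the
    intervention. Returns a corrected copy; the input lists are untouched.
    """
    fit = [(t, v) for t, v in zip(times, values)
           if t_intervention - window <= t < t_intervention]
    n = len(fit)
    mean_t = sum(t for t, _ in fit) / n
    mean_v = sum(v for _, v in fit) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in fit)
    slope = sum((t - mean_t) * (v - mean_v) for t, v in fit) / sxx

    corrected = list(values)
    for k, t in enumerate(times):
        if t_intervention <= t <= t_intervention + window:
            corrected[k] = mean_v + slope * (t - mean_t)
    return corrected
```

The same function can be applied separately to the velocity and the acceleration signal, once per identified intervention.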
With this, I generated the dataset describing the estimated output values of the investigated ACC system, assuming no human avoidance intervention. Thus, it is possible to compare the corrected outputs of the real system with the generated outputs of the theoretical model. To evaluate how similar the distributions of the two samples are, I used the Kolmogorov-Smirnov test. In this case, the null hypothesis assumes that the elements of the two samples are from the same distribution. Beyond the evaluation of individual overlapping cases, this approach makes it possible to compare different overlapping scenarios by identifying the threshold significance level at which the hypothesis can still be accepted. I expected that a reduction of overlapping would lead to a serious decrease in the threshold significance level of accepting the null hypothesis.
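The threshold significance level described above plays the role of what is usually called the p-value of the test: the null hypothesis is rejected for every significance level larger than it. A minimal Python sketch, assuming the large-sample (asymptotic) Kolmogorov distribution and a hypothetical function name, is:

```python
import math

def ks_p_value(d, m, n, terms=100):
    """Asymptotic p-value of the two-sample Kolmogorov-Smirnov statistic d.

    Uses Q(lam) = 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * lam**2)
    with lam = d * sqrt(m*n / (m+n)). This is a large-sample approximation.
    """
    lam = d * math.sqrt(m * n / (m + n))
    if lam < 0.1:
        return 1.0   # the series converges slowly here; the value is ~1
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))
```

A smaller returned value means stronger evidence that the corrected real output and the theoretical model output follow different distributions, which allows scenarios to be ranked against each other.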
In the case of the 10% overlapping scenario, the threshold value is 0.1%. This indicates that the null hypothesis can only be accepted at a 0.1% significance level. For larger significance levels, the null hypothesis has to be rejected. This result suggests that the two distributions probably differ from each other and that the performance of the real system is far from the expected performance.
The three axes of the diagram represent the distance between the two vehicles (headway), the actual velocity and acceleration of the test vehicle. The magnitude of the difference between the calculated and measured values is represented by the lines connecting the related yellow and red points. It is clear from the figure that the deviation of the measured acceleration values is significantly higher than the values of the theoretical system. This also supports the assumption that in the case of 10% overlapping, the performance of the system is significantly lower than the performance of the theoretical model. It can also be observed that in the case of moderate tracking distances (~15-20 m), and moderate speeds (5-7 m/s) the absolute values of the real corrected acceleration (yellow points) are frequently higher than the absolute values of the acceleration generated by the theoretical model (red points). This represents well that the real system did not provide the expected proper deceleration levels, and in some cases it did not decelerate but provided a slight acceleration as an output. Accordingly, it was necessary for the driver to perform an intervention in order to respond to the deceleration of the front car.
In the case of the 25% overlapping scenario (Fig. 5), the threshold value is much higher: 12%. This indicates that the null hypothesis can be accepted at a 12% significance level. For larger significance levels, the null hypothesis has to be rejected. This result suggests that the two distributions are probably much closer to each other than in the previous case. Accordingly, the system performs better in the 25% overlapping scenario; however, it is still far from good.
The achieved results are in accordance with my expectations, i.e. the reduction of overlapping leads to a serious decrease in the threshold significance level of accepting the null hypothesis.
The measured and the generated acceleration values deviate less from each other in the case of the 25% overlapping scenario. Accordingly, I can conclude that the performance of the system is better in the case of 25% overlapping than in the case of 10% overlapping. It can also be observed that at lower tracking distances (< 15 m), the deviation of the real corrected acceleration (yellow points) increases. This shows that the performance of the real system is still far from good in the case of 25% overlapping.
Based on the introduced ACC specific testing process, it is possible to identify the structure of a generalized methodology that can be applied to evaluate the performance of other automated vehicle functions as well. In the first step, it is necessary to calibrate the possibly applied control models based on the results of the performed tests (see Fig. 6). After this, it is possible to select the implemented model based on the value of the Pearson correlation coefficient. In the next step, the operation characteristics of the investigated system have to be measured in the case of different operation dependent scenarios. This allows us to compare the results of the tests and the theoretical model outputs by evaluating the similarity of the different distributions with the tools of statistical hypothesis testing. In accordance with Fig. 6, if the distributions of the real system's output and the theoretical model's output are similar, then the operation performance of the system can be accepted in the case of a specific scenario. On the other hand, this can also enable us to analyze the system operation characteristics in the case of different scenarios by comparing the significance level of accepting the similarity of the real system's output and the theoretical model's output.

Conclusion
Following the performed literature review, it can be concluded that road transportation is characterized by a continuously increasing share of highly automated vehicles. The safety and reliability of these systems are becoming increasingly critical questions. The division of liability between the driver and the driving assistance systems is not completely obvious. According to the test experience, the thresholds at which the driver needs to perform an emergency intervention are not unambiguous. Accordingly, the current article identified a novel test approach that can be applied to evaluate the conformance of specific automated systems.
Based on the measured input and output values, the best-known and most widely applied control models related to the investigated car-following function were compared, which made it possible to select the best-fitting model. This led to a detailed comparison of the best-fitting theoretical model and the real, implemented, and measured in-built ADAS function. This approach resulted in a more flexible evaluation model than the currently applied testing frameworks, such as NCAP [16] or UNECE [26], which focus on a simplified ranking of the investigated systems. In contrast, an advanced validation concept can characterize the whole system in more detail, taking into account the control model as well as the input and output variables.
In the first step, the control model has to be selected. Here three different control models were considered during the model identification process: strategies like the constant spatial distance [22] and the constant time-headway strategy [13,20] control the ego car's acceleration and deceleration manoeuvres to provide either a constant spatial distance or a constant time headway from the front vehicle. Variable time-headway based strategies aim to identify the expected time headway depending on the actual speed of the ego and the front vehicle [18,27].
The measurement of an implemented car-following system was organized in a separate parking facility. The test vehicle was equipped with an ACC Plus car-following system, which can influence the velocity of the vehicle if an obstacle is detected on the road surface.
The performed test followed the Euro NCAP methodology and covered four different cases. The first test case was the complete overlapping, when the two vehicles' center lines coincided. The second test case was the 50% overlapping, when the distance between the two vehicles' center lines was approximately half of the width of the test vehicle. The third test case was the 25% overlapping, when the distance between the center lines was approximately 75% of the width of the test vehicle. The fourth test case was the 10% overlapping, when the distance between the center lines was approximately 90% of the width of the test vehicle. All the tests were performed five times.
In light of the performed hypothesis tests, I can conclude that the achieved results are in accordance with my expectations: the reduction of overlapping leads to a serious decrease in the threshold significance level of accepting the null hypothesis. In other words, at 25% overlapping and below, and especially at 10% overlapping, the reliability of the investigated car-following system decreases significantly.
Finally, the foundation of a generic testing methodology has been laid, which can be used in the future for evaluating complex automated vehicle functions.