Multimodal warning design for take-over request in conditionally automated driving

In conditionally automated driving, e.g., in an SAE level-3 vehicle, humans are not responsible for monitoring the driving environment, yet they must respond to the vehicle's take-over request at any time. Thus, the safe and effective delivery of a take-over request from an automated vehicle to a human is critical for the successful commercialization of automated vehicles. In the current study, a set of human-in-the-loop experiments was conducted to compare diverse warning combinations of visual, auditory, and haptic modalities under systematically classified take-over request scenarios in conditionally automated driving. Forty-one volunteers (16 females and 25 males) participated in the study. Vehicle and human data on responses to take-over requests were collected in two take-over scenarios, i.e., a disabled vehicle on the road ahead and a highway exit. The visual-auditory-haptic combination showed the best performance in the human behavioral and physiological data, whereas the visual-auditory combination performed best in the vehicle data. When all performance indices were considered together, the visual-auditory-haptic combination performed best. Meanwhile, the visual-only warning, which is considered a basic modality in manual driving, performed the worst in conditionally automated driving. These findings imply that warning design in automated vehicles must be clearly differentiated from that of conventional manually driven vehicles. Future work should include a follow-up experiment to verify these results and to compare more diverse multimodal combinations.


Introduction
Automated driving can reduce traffic accidents caused by human error, improve the environment by reducing traffic congestion, and free users to engage in non-driving activities while the automated system is active [9]. Consequently, many car manufacturers and federal research institutes are developing and testing automated driving systems. It is widely expected that highly automated cars will be prevalent on public roads by 2030 [16,23].
Many automotive companies and engineers worldwide are now focused on developing sensor systems for automated vehicles. Developing high-performance sensors for automated vehicles is important; however, to satisfy both technological and commercial requirements, human-machine interaction (HMI) must also be considered [13].
Automation levels of automated vehicles are defined based on diverse criteria, and the appropriate HMI depends on the automation level. For example, at the lowest level of automation, i.e., no automation or level 0 in [27], the human driver is in full control of the car. In full automation, i.e., level 5 in [27], the human driver is not involved in any driving task at all. At level 3 as defined by the Society of Automotive Engineers (SAE) [27], humans are not obligated to monitor the traffic environment because the automated system manages it within the designated area. However, when the automated system reaches its limits, human drivers are asked to respond safely and in time. This level is called "conditional driving automation" [27]. Take-over request (TOR) situations in an SAE level 3 automated vehicle can be highly disturbing and confusing to human drivers because they may be "out of the loop" and completely detached from driving in the automated mode; as such, they may fail to recognize whether a TOR is urgent. For instance, they might be texting, reading, or watching a video. Nevertheless, when a human driver receives a TOR, he or she must get back "in the loop" and take control of the vehicle by steering, braking, or throttling. From the driver's perspective, such a transition can be sudden and unforeseen. Because drivers in an automated vehicle are likely to engage in non-driving activities, responding safely to a TOR with appropriate situation awareness can be challenging. Therefore, providing an understandable and unambiguous TOR method is a significant factor in enhancing the safety of automated driving technology.
Notably, not all situations that evoke TORs are the same; two types of TOR exist, i.e., planned and unplanned TORs [22]. Planned TORs are given when an automated vehicle is about to exit its operational design domain (ODD) and both the user and the vehicle are aware that non-automated driving will follow (e.g., the vehicle is in the automated mode on a highway and about to take an exit, and both the user and the vehicle know that the vehicle is not supposed to drive automatically after the exit). In contrast, if unexpected circumstances occur, such as a sudden failure of the automated system or missing lane markings, an unplanned TOR is generated.
Diverse areas of TOR research have been explored; nevertheless, no consensus standard for TORs has been reached. Some studies have shown that using a multimodal display in a TOR situation generally leads to quicker responses and more positive evaluations than using a unimodal display [20]. Our study focuses on multimodal warning methods in TOR situations. We also examine whether different TOR situations, e.g., planned and unplanned ODD exits, require different multimodal warnings.
Therefore, developing TOR methodologies is important from various perspectives. Because an extensive survey of TOR-related studies is beyond the scope of the present study, only a few selected studies are reviewed, as follows. Regarding the urgency of warnings and modalities for manual vehicles, Campbell et al. [8] demonstrated that both visual and auditory collision warnings were well understood by drivers, resulting in a robust design. Gold et al. [11] implemented a visual-auditory warning in a TOR scenario. Based on reaction-time data from 62 participants, presenting a take-over alert 7 s in advance resulted in a faster user reaction than presenting it 5 s in advance. Borojeni et al. [7] compared alert methods using LEDs on the steering wheel in a TOR scenario. Based on data from 21 participants on reaction time and the time of collision with an obstacle, they found that users reacted more promptly when the LEDs were lit dynamically, flickering toward the direction of the cause of the take-over, than when they were lit statically. Hester et al. [12] compared no alert, a sound alert, and a task-irrelevant voice alert in a TOR scenario. From 24 participants' data on reaction time and the time to look away from the driving task, they concluded that the task-irrelevant voice alert helped more participants avoid a collision. Louw and Merat [18] observed user behavior and gaze patterns before and after the resumption of manual driving under a TOR, depending on the density of fog. Under heavy fog, participants' gaze alternated between the road and the dashboard, whereas under light fog, they tended to look at the middle of the road. These results indicated that although users' gaze was dispersed while performing a secondary task, a visual alert was still necessary in TOR events. Telpaz et al. [28] studied the effect of haptic warnings in a TOR scenario involving a lane change. Using 26 participants' data on reaction time and eye-gaze patterns, they demonstrated that vibration of the driver's seat helped drivers concentrate on the surrounding traffic and shortened their reaction times. Additionally, Bazilinskyy et al. [5] reported that using a multimodal display (i.e., auditory and vibrotactile) in a TOR situation generally results in quicker responses and more positive evaluations than using a unimodal display (i.e., vibrotactile only). Bazilinskyy et al. [6] compared visual, auditory, and haptic modalities in a simulated environment and concluded that multimodal warnings are desirable in TOR situations. Petermeijer et al. [23] used combinations of auditory and haptic modalities and concluded that multimodal warnings are relevant. Yoon et al. [29] compared the effects of non-driving related tasks under multimodal TORs, focusing on behavioral response times and self-reported questionnaires.
Preceding studies analyzed take-over transition effects by varying the design variables of a single modality among the visual, auditory, and haptic modalities. In those studies, the time required for users to resume manual operation and several types of vehicle data were used to evaluate sudden take-over transition scenarios. The vast majority of investigations on TORs have focused primarily on take-over scenarios comparing either the warning time (e.g., [13]) or between-modality design variables (e.g., [6,23]). Our study mainly compares between-modality designs, which makes it compatible with previous studies, and serves as a stepping stone for comparing within-modality designs [14]. In the present study, visual, auditory, and haptic modalities are considered to address diverse take-over scenarios while comparing warning effectiveness for young Koreans. In addition to the behavioral response times used by Yoon et al. [29], we use vehicular and physiological metrics. Moreover, we investigate both planned and unplanned TOR scenarios, which is rare in the literature.
Our experimental setting used take-over scenarios comparable to those used by several European transportation research groups (e.g., [11]), the most notable difference being that we conducted our study outside Europe. Rather than examining cultural differences, our Asia-based study can provide a link for extending European research worldwide; if published in the journal, our research could serve as a bridge between the European and Asian communities. Furthermore, Asia is a fast-growing market, and our research can share with the European community useful ideas on conducting automated-driving studies in a global context. For example, do people react differently in take-over situations? Do they prefer different warning modalities? What are the implications of these differences? Although it might be challenging to compare our results directly with those of the European research community because we did not perform a replication study, our results are generally compatible with relevant European studies, as shown in the results section. Further opportunities to contribute to European journals will benefit both the European and Asian communities.
The primary objective of this study is to suggest the best and worst multimodal designs by comparing the effects of diverse warning combinations in TOR scenarios in conditionally automated vehicles. The present study is novel in the following aspects. First, diverse combinations of warning modalities, including the visual, auditory, and haptic senses, are compared in TOR scenarios. Second, the reaction time of human drivers, vehicle dynamics parameters, and human physiological data are all considered. Third, take-over scenarios are categorized systematically, and both unplanned and planned operational design domain (ODD) exit cases are examined. Previous studies concentrated on sudden, unexpected situations such as unplanned ODD exits; the present study also examines warning modalities in planned ODD exit situations, which are more common in actual automated driving. The current study is expected to contribute not only to the development of automated driving systems but also to their evaluation and system building.

Experimental objective and hypothesis
In line with the primary objective stated above, the following hypotheses were set:
H1: Different multimodal warnings lead to different human behavioral, vehicle control, physiological, and subjective responses in an unplanned ODD exit.
H2: Different multimodal warnings lead to different human behavioral, vehicle control, physiological, and subjective responses in a planned ODD exit.
H3: For the same multimodal warning, unplanned and planned ODD exits may differ in human behavioral, vehicle control, physiological, and subjective responses.

Apparatus
A full-scale driving simulator capable of automated and manual driving modes, visual-auditory-haptic warnings, and data collection was employed (Fig. 1). The virtual driving environment and the automated vehicle function were provided by the SCANeR Studio 1.7 program from AV Simulation [3]. An LF Sonata, a Hyundai midsize sedan, was used as the cabin. A three-channel projection system with three 2080 mm × 1600 mm (width × length) screens connected horizontally was used to display the driving scene. A TOR icon (visual) was presented at the head-up display (HUD) position, a beep sound (auditory) through a 5.1-channel speaker system, and vibration (haptic) in the driver's seat using nine electric motors. A four-channel network video recording camera was used to monitor the operational situation and participant status. The participants' physiological signals, i.e., heart rate and skin conductance response, were collected using a BioPac bioinstrument.

Material
In current manually driven vehicles, visual and auditory alerts are primary, whereas other modalities such as haptic alerts are secondary [26]. In automated vehicles, human users would perform different tasks from those required in conventional cars, and their behavioral patterns would therefore differ as well. For instance, during highly automated driving, the driver is likely to be occupied with non-driving activities (i.e., non-driving related tasks [NDRTs]) such as texting on a phone, reading comics, watching videos, and playing mobile games; these activities keep drivers from looking ahead and occupy their visual modality. Therefore, auditory and haptic stimuli could be promising for TORs [4], although an in-depth study of this issue is clearly required.
We designed multimodal TOR warnings combining the visual, auditory, and haptic modalities. To design each single modality, we 1) analyzed the existing alert guidelines for regular manually driven vehicles in the NHTSA human factors design guidance for drivers [21], 2) investigated the warning designs of mass-produced conditionally automated vehicles, and 3) reviewed preceding studies on take-over transitions. Although the role of the driver in a manually driven vehicle is completely different from that in automated driving, benchmarking the guideline for manually driven vehicles provides the alert methodology of existing driver assistance systems (DAS). Only relevant excerpts are provided below; detailed information can be found in Yun et al. [30]. Based on this analysis, warnings were designed for each modality, i.e., visual, auditory, and haptic, as shown in Table 1.

Visual-modality warning design
When a warning is required, using red, orange, or yellow can result in a prompt user reaction, and red is recommended for the most urgent situations [5,21]. User reaction time becomes shorter when a simple phrase and a circular icon are used [19,24]. Based on these studies, we used a circular red icon for TOR situations in our visual modality. An icon height of 19 mm is known to be optimal when the user views the icon from the driver's seat, and a higher flash rate (optimal rate of 3-4 Hz) is used to signify danger or critical situations. For example, a lane-keeping assist system can provide a warning with a red vehicle icon when the vehicle is not centered in the lane [21].
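The 3-4 Hz flash-rate guidance above can be turned into concrete blink timing. The following Python sketch is illustrative only: the helper name `blink_timing` and the 50% duty cycle are assumptions, since the section does not state how long the icon stays lit within each flash cycle.

```python
def blink_timing(rate_hz: float, duty: float = 0.5) -> tuple[float, float]:
    """Return (on_ms, off_ms) for one flash cycle of the TOR icon.

    Hypothetical helper: it enforces the 3-4 Hz flash-rate range cited for
    critical warnings; the 50% duty cycle is an assumed default.
    """
    if not 3.0 <= rate_hz <= 4.0:
        raise ValueError("flash rate outside the recommended 3-4 Hz range")
    if not 0.0 < duty < 1.0:
        raise ValueError("duty cycle must be strictly between 0 and 1")
    period_ms = 1000.0 / rate_hz
    on_ms = period_ms * duty
    return on_ms, period_ms - on_ms


print(blink_timing(4.0))  # (125.0, 125.0)
```

At 4 Hz each cycle lasts 250 ms; at 3 Hz it lasts about 333 ms, so the icon in this sketch is lit for roughly 125-167 ms per flash.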

Auditory-modality warning design
At least two frequencies are recommended for auditory alerts, including one between 500 and 1500 Hz [21]. According to Lin et al. [17], a 1750-Hz tone triggers a quicker human reaction than a 500-Hz tone. Therefore, in the present study, a tone combining 800 Hz and 1750 Hz was used. Additionally, a tone length of 50 ms was adopted from the Audi A8, a mass-produced vehicle that also served as a conditionally automated vehicle, and a tone interval of 30 ms was adopted from the IONIQ, a mass-produced vehicle by Hyundai Motor Company. The sound volume of the auditory signal should be 15-30 dB higher than the surrounding noise, and the absolute volume is recommended to be less than 90 dB [21].
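The auditory parameters above (800 Hz and 1750 Hz, 50 ms tones, 30 ms gaps, 15-30 dB above ambient noise, below 90 dB) can be sketched as a signal generator. This is a minimal sketch under stated assumptions, not the study's implementation: the 44.1 kHz sample rate, the equal-weight simultaneous mixing of the two frequencies (the source says only that the tone "combines" them), and the helper names `beep_train` and `target_level_db` are all hypothetical.

```python
import math

SAMPLE_RATE = 44_100          # samples per second (assumed playback rate)
TONE_FREQS = (800.0, 1750.0)  # Hz, the two frequencies used in the study
TONE_MS = 50                  # tone length (ms)
GAP_MS = 30                   # silent interval between tones (ms)


def beep_train(n_beeps: int, sample_rate: int = SAMPLE_RATE) -> list[float]:
    """Synthesize n_beeps tone bursts, each followed by a silent gap.

    Assumption: both frequencies sound simultaneously with equal weight;
    dividing by the number of components keeps samples within [-1, 1].
    """
    tone_len = int(sample_rate * TONE_MS / 1000)
    gap_len = int(sample_rate * GAP_MS / 1000)
    samples: list[float] = []
    for _ in range(n_beeps):
        for i in range(tone_len):
            t = i / sample_rate
            mix = sum(math.sin(2 * math.pi * f * t) for f in TONE_FREQS)
            samples.append(mix / len(TONE_FREQS))
        samples.extend([0.0] * gap_len)  # silent inter-tone interval
    return samples


def target_level_db(ambient_db: float, margin_db: float = 15.0,
                    ceiling_db: float = 90.0) -> float:
    """Alert level 15-30 dB above ambient noise, clipped at the 90 dB ceiling.

    The guideline's "less than 90 dB" is treated here as an upper bound.
    """
    if not 15.0 <= margin_db <= 30.0:
        raise ValueError("margin must be 15-30 dB above ambient noise")
    return min(ambient_db + margin_db, ceiling_db)


print(len(beep_train(3)))           # 10584 samples: 3 x (2205 + 1323)
print(target_level_db(65.0))        # 80.0
print(target_level_db(70.0, 25.0))  # 90.0 (clipped)
```

Clipping at the ceiling rather than rejecting a loud environment is a design choice of this sketch; a real system might instead fall back to another modality when ambient noise is too high.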

Haptic-modality warning design
Haptic devices are recommended to be placed in the back and bottom parts of the driver's seat, and human sensitivity to vibration is maximal in the range of 200 to 250 Hz [21]. Fitch et al. [10] proposed a pulse-type vibration alert (200 ms on, 50 ms off) as a forward collision warning (FCW) alert, which is also referenced in the regular vehicle alert guideline [21].
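The pulse-type pattern above (200 ms on, 50 ms off) can be expressed as a schedule of motor on/off intervals. The Python sketch below is a hypothetical illustration: the `Pulse` type, the helper names, and the 1 s alert duration in the example are assumptions, and the study's nine seat motors are not modeled individually.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pulse:
    start_ms: int  # time at which the motor is switched on
    end_ms: int    # time at which the motor is switched off


def pulse_schedule(total_ms: int, on_ms: int = 200, off_ms: int = 50) -> list[Pulse]:
    """On/off intervals for a pulse-type seat vibration lasting total_ms."""
    pulses: list[Pulse] = []
    t = 0
    while t < total_ms:
        # Truncate the final pulse if the alert ends mid-pulse.
        pulses.append(Pulse(t, min(t + on_ms, total_ms)))
        t += on_ms + off_ms
    return pulses


def in_sensitive_band(freq_hz: float) -> bool:
    """True if a motor vibration frequency lies in the 200-250 Hz band."""
    return 200.0 <= freq_hz <= 250.0


schedule = pulse_schedule(1000)
print([(p.start_ms, p.end_ms) for p in schedule])
# [(0, 200), (250, 450), (500, 700), (750, 950)]
```

With these defaults each 250 ms cycle is vibrating 80% of the time, which is one way to keep the alert salient while remaining clearly pulsed.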

Multimodal TOR warning design combining single-modal designs
In the present study, we developed multimodal designs by combining the single-modality designs, as shown in Table 1. Seven modality combinations were designed, i.e., visual only, auditory only, haptic only, visual-auditory, auditory-haptic, visual-haptic, and visual-auditory-haptic.
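The seven designs are exactly the non-empty subsets of the three modalities (2^3 - 1 = 7). A short Python sketch, using a hypothetical helper `modality_combinations`, enumerates them with the standard library's `itertools.combinations`:

```python
from itertools import combinations

MODALITIES = ("visual", "auditory", "haptic")


def modality_combinations(modalities: tuple[str, ...] = MODALITIES) -> list[str]:
    """All non-empty modality subsets, from single modalities up to the full set."""
    return [
        "-".join(subset)
        for size in range(1, len(modalities) + 1)
        for subset in combinations(modalities, size)
    ]


print(modality_combinations())
# ['visual', 'auditory', 'haptic', 'visual-auditory', 'visual-haptic',
#  'auditory-haptic', 'visual-auditory-haptic']
```

Adding a fourth modality to the tuple would yield 15 designs, which illustrates why a full factorial comparison quickly becomes impractical in a single session.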

Design of TOR scenarios
Several episodic TOR scenarios could be presented and examined; however, in this study, we derived TOR scenarios more systematically. That is, integrated scenarios that could represent all possible take-over transition cases in driving simulator studies were derived [22]. Consequently, five integrated TOR scenarios were presented: a straight road with no event (i.e., a planned ODD exit), a straight road with a missing lane marking, a curved road with a missing lane marking, a straight road with an external obstacle, and a straight road with a system failure. Of these, we employed the unplanned ODD exit event of a lane change due to an accident ahead (defined as Event 1) and the planned ODD exit event at a highway exit (defined as Event 2).

Event 1 (unplanned ODD exit)
Participants were asked to perform NDRTs while driving in the automated mode. At a certain time during the ride, the automated system issued a TOR upon detecting a vehicle stopped ahead after an accident. The TOR was designed to be unplanned for the system and unexpected for the human in the car. Given the TOR, participants were required to initiate manual driving by steering, braking, or throttling to avoid a crash with the vehicle ahead. While the car was traveling at 100 km/h on a straight section of a four-lane highway in the automated mode, a broken-down car at a standstill was found 167 m ahead. This situation was regarded as an unplanned ODD exit event. The TOR alert was provided 6 s before a possible collision with the stationary car ahead, assuming a driving speed of 100 km/h. We adopted 6 s as suggested for the Automatically Commanded Steering Function (ACSF) [1]. Determining TOR timing is another critical factor (e.g., [11,13]); in this study, we used a fixed TOR timing to eliminate it as a confounding factor. Provided that no nearby vehicle obstructed the lane change, the driver could manually change lanes to the left or right, return to the original lane, and continue manual driving for approximately 30 s to complete the scenario. If a participant did not take over within this time frame, he or she resumed manual operation according to the experimenter's instruction and continued driving manually for approximately 30 s to complete the scenario.
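The 167 m obstacle distance and the 6 s TOR lead time are two views of the same constant-speed arithmetic: 100 km/h is about 27.8 m/s, so 6 s corresponds to about 166.7 m. The Python sketch below checks this, assuming constant speed and a stationary obstacle; the helper names are hypothetical.

```python
def lead_distance_m(speed_kmh: float, time_s: float) -> float:
    """Distance covered at constant speed during the TOR lead time."""
    return speed_kmh / 3.6 * time_s  # km/h -> m/s, then times seconds


def lead_time_s(speed_kmh: float, distance_m: float) -> float:
    """Time to reach a stationary obstacle at constant speed."""
    return distance_m / (speed_kmh / 3.6)


print(round(lead_distance_m(100, 6), 1))  # 166.7
print(round(lead_time_s(100, 167), 2))    # 6.01
```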

Event 2 (planned ODD exit)
Participants were asked to perform NDRTs while driving in the automated mode. After some time, the automated system issued a TOR when the vehicle was about to exit the highway, which was the planned automated driving area. The TOR was designed to be planned for the system, and humans were expected to be aware of the TOR situation in advance. Given the TOR, participants were required to initiate manual driving by steering, braking, or throttling to exit the highway. While traveling at 100 km/h on a straight section of a four-lane highway in the automated mode, the driver had to enter the highway exit lane. Because the destination was set at the beginning of the experiment, the route was predetermined. Therefore, the situation was defined as a planned ODD exit in which the take-over transition was predicted in advance. The TOR was provided 15 s before reaching the exit, assuming a driving speed of 100 km/h. We used 15 s as suggested for the ACSF: under normal operating conditions and when the system has information that its boundaries will be reached (e.g., the exit of a highway), a transition demand shall be provided no later than 15 s before the system boundaries are reached [2]. In this TOR scenario, no car was present to obstruct the lane change; the driver changed lanes twice to the right to enter the exit road and continued manual driving for approximately 30 s to complete the scenario. In this case, all participants took over in accordance with the TOR.
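Assuming the vehicle holds 100 km/h until the system boundary, the 15 s lead time above places the TOR roughly 417 m before the exit, compared with about 167 m for the 6 s lead time in Event 1:

```latex
d = v \, t_{\mathrm{TOR}}
  = \frac{100}{3.6}\ \mathrm{m/s} \times 15\ \mathrm{s}
  = \frac{1500}{3.6}\ \mathrm{m}
  \approx 416.7\ \mathrm{m}
```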

Independent variables and dependent variables
The independent variables (IVs) in the experiment were the multimodal TOR warning design and the TOR event, in a repeated-measures, within-subject design. The first IV had seven levels: 1) visual only, 2) auditory only, 3) haptic only, 4) visual-auditory, 5) visual-haptic, 6) auditory-haptic, and 7) visual-auditory-haptic TOR warning designs. The second IV, the TOR event, comprised Events 1 and 2, as described in the previous section. Owing to experimental logistics such as limited experimental time, all seven combinations were compared in the unplanned ODD exit take-over scenario, whereas only two, i.e., the visual-auditory and visual-auditory-haptic designs, were compared in the planned ODD exit take-over scenario. Six quantitative metrics were defined as dependent variables, covering human behavior, vehicle control, and physiological responses.
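Two of the behavioral and vehicle-control metrics can be sketched from a simulator log. The Python code below is a hypothetical illustration, not the study's processing pipeline: it assumes RT is measured from TOR onset to the first steering, braking, or throttle input (the three take-over actions named in the event descriptions) and that SDLP carries its conventional meaning, the standard deviation of lateral position. The `Sample` log format and the detection thresholds are invented for illustration, and samples are assumed to be sorted by time.

```python
from dataclasses import dataclass
from statistics import stdev
from typing import Optional

# Hypothetical detection thresholds; the study does not report its own.
STEER_THRESHOLD_DEG = 2.0
PEDAL_THRESHOLD = 0.1


@dataclass(frozen=True)
class Sample:
    t: float         # time since scenario start (s)
    steering: float  # steering-wheel angle (deg)
    brake: float     # brake pedal position (0..1)
    throttle: float  # throttle pedal position (0..1)
    lateral: float   # lateral position in the lane (m)


def reaction_time(samples: list[Sample], tor_onset: float) -> Optional[float]:
    """Seconds from TOR onset to the first steering, brake, or throttle input."""
    after = [s for s in samples if s.t >= tor_onset]
    if not after:
        return None
    baseline = after[0].steering  # steering angle at TOR onset
    for s in after:
        if (abs(s.steering - baseline) > STEER_THRESHOLD_DEG
                or s.brake > PEDAL_THRESHOLD
                or s.throttle > PEDAL_THRESHOLD):
            return s.t - tor_onset
    return None  # no take-over input detected in the log


def sdlp(samples: list[Sample], start: float, end: float) -> float:
    """Sample standard deviation of lateral position within [start, end].

    statistics.stdev raises StatisticsError if fewer than two samples fall
    inside the window.
    """
    window = [s.lateral for s in samples if start <= s.t <= end]
    return stdev(window)
```

For example, a log in which the brake pedal first exceeds the threshold 1.5 s after TOR onset yields an RT of 1.5 s; small steering corrections below the threshold are ignored.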

Experimental procedure
The participants were first informed about the study objectives, precautions, experimental methods, and the basic functionality and operation of the vehicle simulator. Subsequently, they completed an IRB-approved consent form and a basic demographic questionnaire. Next, their heart rate and skin conductance response were measured. After three rounds of practice driving, they participated in a series of TOR experiments in an order determined by simple randomization. Each participant received nine TORs. Before each TOR, participants drove in the automated mode for between 4 min 30 s and 7 min 18 s; this duration was not disclosed to the participants beforehand and was designed to follow an exponential distribution with a mean of 5 min 30 s. During automated driving, the participants were instructed to perform four NDRTs preset by the experimenters. We investigated the most likely behaviors of drivers in the automated driving mode [15] and used the top-rated behaviors to simulate realistic automated driving experiences. The NDRTs used in the experiment were mobile-phone keyboard tapping, video watching, cartoon reading, and tablet gaming. After each scenario, participants rated whether the TOR they received was appropriate for the take-over response on a 7-point Likert scale. The experiment lasted approximately 150 min. The orders of the TOR scenarios and NDRTs were randomized.
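The nine TORs per participant follow from the design: seven warning designs in Event 1 plus two in Event 2. The Python sketch below shows one way to build and randomize such a schedule with the standard library's `random.Random`. It is an illustration under assumptions, not the authors' procedure: per-participant seeds and drawing each trial's NDRT independently with `rng.choice` are hypothetical, since the section states only that the orders were randomized.

```python
import random

EVENT1 = ["visual", "auditory", "haptic", "visual-auditory",
          "visual-haptic", "auditory-haptic", "visual-auditory-haptic"]
EVENT2 = ["visual-auditory", "visual-auditory-haptic"]
NDRTS = ["mobile-phone keyboard tapping", "video watching",
         "cartoon reading", "tablet gaming"]


def participant_schedule(seed: int) -> list[tuple[str, str, str]]:
    """Return a randomized list of (event, warning design, NDRT) trials."""
    rng = random.Random(seed)  # per-participant generator for reproducibility
    trials = [("Event 1", d) for d in EVENT1] + [("Event 2", d) for d in EVENT2]
    rng.shuffle(trials)  # simple randomization of the nine TORs
    # Assumption: each trial's NDRT is drawn independently at random.
    return [(event, design, rng.choice(NDRTS)) for event, design in trials]


for trial in participant_schedule(seed=7):
    print(trial)
```

Seeding one generator per participant makes each schedule reproducible for later auditing, while simple randomization (rather than counterbalancing) matches the procedure described above.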

Participants
Forty-one participants (16 females and 25 males aged 22 to 33 years) volunteered for our study. The mean and standard deviation (std) of age were 26.2 and 2.8 years, respectively. All participants had a valid driver's license and at least 6 months of driving experience (mean = 3.7 years, std = 2.3 years). Of them, 44% had used ADAS functions, whereas 56% had not. The ADAS functions surveyed included level-2 automated systems such as collision-avoidance, lane-departure warning, and adaptive cruise-control systems. The experiment was approved by the Institutional Review Board (KMU-201803-HR-174) and complied with IRB regulations.

Results and discussion
Repeated-measures MANOVA was applied for data analysis. The significance level α was set to 0.05, and Bonferroni correction was applied in the post-hoc pairwise comparisons to control alpha inflation [25]. Table 2 summarizes the means and standard deviations of all levels for both Events 1 and 2. The columns of Table 2 are ordered by multimodal warning combination as follows: visual only, haptic only, visual-auditory, auditory only, auditory-haptic, visual-haptic, and visual-auditory-haptic. This ordering is intended to avoid unnecessary confusion with our subsequent conclusions. No learning or sequence effects on the reaction times were found. The experimental results are presented in the following three subsections.
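The Bonferroni correction divides α by the number of comparisons. If the post-hoc tests covered all pairwise comparisons among the seven Event 1 designs (our assumption), there are C(7, 2) = 21 comparisons and the per-comparison threshold is 0.05 / 21 ≈ 0.0024, which agrees with the α = 0.0024 reported for the Event 1 comparisons. A minimal Python sketch, with a hypothetical helper name:

```python
from math import comb


def bonferroni_alpha(alpha: float, n_levels: int) -> float:
    """Per-comparison alpha for all pairwise comparisons among n_levels."""
    n_comparisons = comb(n_levels, 2)
    return alpha / n_comparisons


print(comb(7, 2))                           # 21
print(round(bonferroni_alpha(0.05, 7), 4))  # 0.0024
```

With only two designs in Event 2 there is a single pairwise comparison, so the threshold stays at 0.05.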

Multimodal warning effects in unplanned ODD exit TOR (H1)
As shown in Table 2, all DVs except the AHR differed noticeably under the visual-only warning compared with the other multimodal designs. For instance, the average RT was 19.93 s under the visual-only warning, whereas the other designs averaged between 1.73 and 2.32 s, an order-of-magnitude difference. Similarly, the TTL was larger by approximately 35 s on average, whereas the SDLP was smaller by approximately 1 m on average, and the SRR was approximately 0.7 s⁻¹ on average. The SCR was higher by approximately 10 s on average, and the AHR was lower by approximately 6 bpm.
As shown in Table 3, the post-hoc comparisons showed statistically significant differences between the visual-only warning and all the other levels (p < 0.001, α = 0.0024). The AHR showed no significant difference and is therefore not shown in Table 3.

Multimodal warning effects in planned ODD exit TOR (H2)
In the planned ODD exit TOR scenario in Table 2, the visual-auditory and visual-auditory-haptic warnings were examined. As shown in Table 2, the RT for the visual-auditory warning was 0.27 s longer than that for the visual-auditory-haptic warning. Meanwhile, as shown in Fig. 4 (a), the TTL for the visual-auditory warning was 0.61 s shorter than that for the visual-auditory-haptic warning. The results of repeated-measures MANOVA indicated no significant difference in any of the quantitative indexes except the human behavioral metric, i.e., the RT [F [1,27]

Unplanned and planned ODD exit comparison (H3)
Only two multimodal combinations, the visual-auditory and visual-auditory-haptic warnings, were examined in both the planned and unplanned ODD exits. With the visual-auditory warning, the RT was 0.04 s faster on average in the planned ODD exit than in the unplanned ODD exit. In contrast, with the visual-auditory-haptic warning, the RT was 0.28 s faster on average in the unplanned ODD exit than in the planned ODD exit.
For the visual-auditory warning, the results of repeated-measures MANOVA indicated statistically significant differences in the TTL [F [1,27] = 1349.867, p < 0.001, partial η² = 0.980] and the SDLP [F [1,27] = 335.491, p < 0.001, partial η² = 0.926] (Fig. 4 (b)), whereas the RT, SRR, SCR, and AHR showed no significant differences. The TTL was 7.37 s shorter on average in the unplanned ODD exit than in the planned ODD exit, presumably because the participants perceived the unplanned ODD exit TOR as more dangerous after finding a broken-down car ahead and consequently responded more promptly. The vehicle control metrics such as SDLP also differed, presumably because in the unplanned ODD exit TOR, the participants changed lanes manually and then returned to the original lane, whereas in the planned ODD exit TOR, they had to change lanes twice to the right to get onto the exit road. Thus, the SDLP gaps are considered to result from inherent differences between the two scenarios.
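As a consistency check, partial η² can be recovered from an F statistic and its degrees of freedom as F·df_effect / (F·df_effect + df_error). The Python sketch below (helper name hypothetical) reproduces both effect sizes reported above from F [1,27]:

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


print(round(partial_eta_squared(1349.867, 1, 27), 3))  # 0.98  (TTL)
print(round(partial_eta_squared(335.491, 1, 27), 3))   # 0.926 (SDLP)
```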

Conclusions
The experimental results obtained in this study are consistent with those obtained with different participant samples in other studies, for example, Bazilinskyy et al. [6] and Yoon et al. [29], indicating that multimodal warnings are preferred over unimodal warnings. Our results are generally consistent with those of relevant studies from Europe, and further opportunities to contribute to European journals will benefit both the European and Asian communities. This study led us to conduct another comparative study on the differences among within-modality combinations [14]; for example, not all visual modalities are identical, and thus within-modality effects warrant further investigation. Another notable point of this study is that we considered planned as well as unplanned TOR scenarios, whereas most extant studies on TORs have focused on unplanned TOR scenarios. Based on the experimental results, we concluded the following: 1) A visual-only TOR warning should be avoided. 2) Multimodal TOR warnings appeared to be more effective than unimodal TOR warnings. However, each modality has its own advantages; for instance, the haptic modality can elicit a more immediate response to a TOR, whereas the auditory modality can elicit a more stable response. A three-modality combination such as visual-auditory-haptic can lead simultaneously to a prompt human reaction and a high workload. 3) Humans responded more slowly in the planned ODD exit situation than in the unplanned ODD exit situation, even when the same TOR modalities were provided.
In the future, we plan to compare the effectiveness of warnings in more diverse take-over transition situations and to examine diverse modality designs. For example, spearcons (i.e., speech-based earcons created by speeding up spoken phrases) could be used instead of a beep sound as the auditory modality. A three-degree-of-freedom motion platform was installed in our driving simulator after the experiments described in this study were conducted, and richer motion cues in a simulated environment will be considered in future experiments. Furthermore, we plan to verify the effectiveness of the multimodal alert warnings by using an automated vehicle on a real road.