The processing chain for traffic monitoring was tested on the data obtained as described above. This experimental processing chain consists of several modules and can be roughly divided into three major steps: road extraction, vehicle detection, and vehicle tracking (see also Fig. 4).
3.1 Road extraction
For effective real-time traffic analysis, the road surface needs to be clearly determined. Road extraction starts by forming a buffer zone around the road surfaces, using a road database as described above as the basis for the buffer formation. In the next step, two different methods for further feature analysis can be applied; both modules automatically delineate the roadsides by two linear features. The first module works as follows: within the marked buffer zone, edge detection and feature extraction techniques are applied. The edge detection is based on an edge detector proposed by Paillou for noisy SAR images [15]. Derived from the Deriche filter [6], we found this detector, applied after ISEF filtering [18], efficient for finding edges along the roadsides while suppressing other surplus edges and noise. With this method, mainly the edge between the tarred road area and the vegetation is found. The alternative module searches for the roadside markings by extracting lines on a dynamic threshold image; only the longest lines are kept, since these represent the continuous roadside marking lines. As a side effect, this module also detects the dashed midline markings. These markings often cause confusion in the car detection, since they resemble white cars. However, the resulting false alarms can be removed from the car detection, because the roadside marking module stores the detected dashed midline markings in a separate class.
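A minimal sketch of the second module, assuming OpenCV-style primitives; the threshold parameters, the minimum line length, and the function name are illustrative assumptions, not the values used in our implementation:

```python
import cv2
import numpy as np

def extract_marking_lines(gray, buffer_mask, min_length=50):
    """Extract roadside and midline markings on a dynamic threshold image.

    gray        : 8-bit grayscale aerial image
    buffer_mask : binary mask of the road buffer zone from the road database
    min_length  : minimum segment length in pixels (illustrative value)
    """
    # Dynamic (locally adaptive) threshold: bright markings against tarmac.
    binary = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, blockSize=31, C=-10)
    binary = cv2.bitwise_and(binary, binary, mask=buffer_mask)

    # Probabilistic Hough transform yields line segments.
    segments = cv2.HoughLinesP(binary, 1, np.pi / 180, threshold=40,
                               minLineLength=10, maxLineGap=5)
    roadside, midline = [], []
    if segments is not None:
        for x1, y1, x2, y2 in segments[:, 0]:
            length = np.hypot(x2 - x1, y2 - y1)
            # Keep the longest lines as continuous roadside markings; store
            # the short dashed midline segments in a separate class so the
            # vehicle detection can discard them as false alarms.
            (roadside if length >= min_length else midline).append(
                ((x1, y1), (x2, y2)))
    return roadside, midline
```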
In the next step, the roadside identification module, again aided by the road database, corrects possible errors (gaps and bumps) that may have crept in during the feature extraction phase. Furthermore, it smooths the sometimes ragged road boundaries obtained from feature extraction (see Fig. 3). Gaps caused by occlusion of the road surface by crossing bridges are closed if they are not too large. This has the advantage that the course of the road is not lost, although the road itself is not visible at this place. However, it can lead to false alarms in the car detection: cars crossing the bridge might spuriously be assigned to the occluded road below. We try to sort these out by alignment, since they are elongated perpendicular to the course of the occluded road.
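The gap closing and smoothing could look roughly as follows; a sketch under the assumption that the boundary is available as ordered polyline pieces, with the maximum gap and smoothing window chosen purely for illustration:

```python
import numpy as np

def close_gaps_and_smooth(boundary, max_gap=30.0, window=9):
    """Close small gaps in a detected road boundary and smooth it.

    boundary : list of polyline pieces, each an (n, 2) array of x/y points,
               ordered along the road course (e.g. via the road database)
    max_gap  : maximum distance (in pixels) bridged between two pieces
    window   : moving-average window for smoothing the ragged boundary
    """
    # Join consecutive pieces when the gap between them is small enough,
    # e.g. where a bridge occludes the road surface.
    merged = [boundary[0]]
    for piece in boundary[1:]:
        gap = np.linalg.norm(piece[0] - merged[-1][-1])
        if gap <= max_gap:
            merged[-1] = np.vstack([merged[-1], piece])
        else:
            merged.append(piece)  # gap too large: keep pieces separate

    # Moving-average smoothing removes the ragged jitter from feature
    # extraction while preserving the overall course of the road.
    kernel = np.ones(window) / window
    smoothed = []
    for piece in merged:
        if len(piece) >= window:
            xs = np.convolve(piece[:, 0], kernel, mode='valid')
            ys = np.convolve(piece[:, 1], kernel, mode='valid')
            piece = np.column_stack([xs, ys])
        smoothed.append(piece)
    return smoothed
```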
3.2 Vehicle detection
With the roadside information obtained in the previous processing step, vehicle detection and tracking can be restricted to the well-determined road areas. This increases performance and enhances the accuracy of vehicle detection. Based on this, we developed an algorithm for the detection of vehicles, described in the following.
A Canny edge detector [5] is applied and a histogram of the edge steepness is calculated. Then, a k-means algorithm is used to split the edge steepness statistics into three parts representing three main classes: edges belonging to vehicles, edges belonging to roads, and edges whose steepness lies between the road and vehicle classes and which are therefore not yet classifiable.
Edges in the class with the lowest steepness are ignored, while edges in the highest steepness class are directly assumed to belong to vehicles. For the histogram part with medium steepness, a hysteresis threshold examining the neighbourhood is applied in order to assign edges in this class either to the vehicle or the road class. In the next step, the edges belonging to the roadside markings, which still contaminate the vehicle class, are eliminated from the histogram.
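A minimal sketch of this three-class split and the hysteresis step, assuming the gradient magnitude as the steepness measure and an 8-neighbourhood for the hysteresis; the Canny thresholds and function name are illustrative:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def vehicle_edge_mask(gray, road_mask):
    """Split edge steepness into road / unclear / vehicle classes (k = 3)."""
    edges = cv2.Canny(gray, 50, 150)                 # binary edge map
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    steepness = cv2.magnitude(gx, gy)                # edge steepness measure

    on_road = (edges > 0) & (road_mask > 0)
    km = KMeans(n_clusters=3, n_init=10).fit(
        steepness[on_road].reshape(-1, 1))

    # Rank the three clusters by centre: 0 = road, 1 = unclear, 2 = vehicle.
    rank = np.empty(3, dtype=int)
    rank[np.argsort(km.cluster_centers_.ravel())] = np.arange(3)
    label = np.full(gray.shape, -1, dtype=int)
    label[on_road] = rank[km.labels_]

    # Hysteresis on the medium class: unclear edges touching a vehicle edge
    # join the vehicle class; the remaining ones fall back to the road class.
    vehicle = (label == 2).astype(np.uint8)
    grown = cv2.dilate(vehicle, np.ones((3, 3), np.uint8)) > 0
    return (label == 2) | ((label == 1) & grown)
```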
As the roads are well determined by the road extraction, the roadside lines can be found easily. The algorithm therefore erases all pixels with high edge steepness lying on a roadside position, as these pixels are considered to belong mostly to the roadside markings. The algorithm avoids erasing vehicles standing on the roadside by observing the width of the shape; since vehicles are usually broader than roadside lines, this works well. Midline markings, which were detected by the roadside identification module on the dynamic threshold image, are erased, too. Then, potential vehicle pixels are grouped into regions of mutually connected neighbouring pixels, and from these regions a list of potential vehicles is produced. In order to extract mainly real vehicles from this list, a closing and filling of the regions is performed.

Using the closed shapes, the properties of a vehicle candidate can be described by its direction, area, length, and width. Furthermore, it can be checked whether its alignment follows the road direction, and its position on the road can be considered as well. Based on these observable parameters, we created a geometric vehicle model: vehicles are assumed to have approximately rectangular shapes of a specific length and width, oriented in the road direction. Since they are expected to be rectangular, their pixel area should be approximately equal to the product of the measured length and width, and vehicles must be located on the roads. In the case of several detections at very small distances from each other, the algorithm assumes that the shapes belong to the same vehicle and merges them into one detection by averaging their positions. Finally, based on this vehicle model, a quality factor is computed for each potential vehicle and the best vehicles are chosen (a sketch of such a quality factor is given below).

For traffic monitoring, the camera system runs in a recording mode that we call “burst mode”. In this mode, the camera takes a series of four or five exposures at a frame rate of 3 fps and then pauses for several seconds. During this pause, the plane moves significantly over ground. Then, with an overlap of about 10% to 20% with the first exposure burst, the second exposure sequence is started. Continuing this periodic alternation between exposure sequences and breaks, we are able to perform area-wide traffic monitoring without producing an overwhelming amount of data. Our strategy for traffic monitoring from the exposures obtained in “burst mode” is to perform vehicle detection only in the first image of an image sequence and then to track the detected vehicles over the next images (Fig. 4).
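As an illustration of the geometric vehicle model, a minimal sketch of such a quality factor; the dimension ranges, the 15-degree alignment tolerance, and the region attribute names are assumptions for illustration, not the values of our implementation:

```python
import numpy as np

def vehicle_quality(region, road_direction, length_range=(3.0, 20.0),
                    width_range=(1.5, 3.0)):
    """Score a candidate region against the geometric vehicle model.

    region         : dict with 'length', 'width', 'area' (metres / m^2)
                     and 'direction' (radians) of the closed shape
    road_direction : local road direction in radians (from the road database)
    Returns a quality factor in [0, 1]; higher is more vehicle-like.
    """
    length, width, area = region['length'], region['width'], region['area']

    # Dimensions must lie in a plausible vehicle range.
    if not (length_range[0] <= length <= length_range[1]
            and width_range[0] <= width <= width_range[1]):
        return 0.0

    # Rectangularity: for a rectangle, area ~= length * width.
    rectangularity = min(area / (length * width), 1.0)

    # Alignment: the shape should follow the road direction
    # (angle difference wrapped to [0, pi/2]).
    diff = abs((region['direction'] - road_direction + np.pi / 2) % np.pi
               - np.pi / 2)
    alignment = max(0.0, 1.0 - diff / np.radians(15))

    return rectangularity * alignment
```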
3.3 Vehicle tracking
Vehicle tracking is based on matching by normalized cross correlation (e.g. [13]). Tracking is performed on each consecutive image pair within an exposure burst. Since vehicle detection is done on the first image of the burst, vehicle tracking starts with the pair consisting of the first and second image of an image sequence. For each vehicle detected in the first image, a circular template image of a certain radius (e.g. r = 3 m for cars) is generated at the position of the vehicle detection. The vehicle position is then transferred into the second image, where a rectangular search window is opened, aligned with the driving direction and starting at the vehicle position obtained from the detection in the first image. The driving direction is obtained from the road database.
The length of the search window depends on the maximum expected velocity for the road and the time difference between the two images. The normalized cross correlation between the template image and the second image is calculated while the template image is shifted along the search window. The calculated correlation value, which lies between 0.0 and 1.0, gives a score for a possible hit. We store the maximum score and the corresponding position in the second image, and we require the score to exceed a certain value for keeping it as a hit. We reached maximum correctness with acceptable completeness in tracking by setting this score threshold to 0.9. A vehicle detection that does not reach this threshold at any position in the search window is not tracked any further. The tracking can be restarted with the second and third image (and with further consecutive pairs of the exposure burst in succession) in order to track the vehicles through a whole image sequence.

For vehicles that disappear at image borders or below bridges during an exposure of the sequence (but were detected or tracked in the image before), the tracking algorithm does not report a match. Thus, disappeared vehicles are normally not confused with other vehicles or objects, because of the high matching threshold of 0.9. Vehicles occluded by bridges or other objects may be detected again after reappearance by a new vehicle detection performed on a later exposure sequence. However, they appear as new detections and lose their identity relation; this is irrelevant for our application. Owing to the illumination invariance of normalized cross correlation, vehicles can normally be tracked even if they move from fully illuminated regions into shadow regions. In order to increase correctness, the matching is performed in RGB color space: the scores obtained from cross correlation in each of the three channels are averaged. This helps, since vehicles are varicoloured objects.

For vehicle tracking on motorways, rotations of the template vehicle image are neglected. This is valid since the lane change angles at typical motorway velocities are quite low for physical reasons, and hence the change of course between two exposures (at a frame rate of 3 fps) can be neglected. For city regions, rotation of the template during correlation can be switched on, but this increases the calculation time linearly with the number of rotation steps. We accelerate the normalized cross correlation by an estimation of the normalization, since calculating the full norm at each position in the search window is computationally expensive. Assuming that the illumination situation does not change much between two images, an upper limit of the correlation score is estimated for each correlation position in the search window.
Only if this upper limit exceeds the score threshold is the exact normalized cross correlation calculated at that position. For the estimate of the score, only the first channel of the RGB image is used. These measures decrease the calculation time by a factor of at least four.
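A condensed sketch of the matching step, using a square template for brevity where the text describes a circular one; the exact upper-limit estimate is not spelled out above, so the cheap first-channel pre-check below merely illustrates the test-then-full-NCC pattern, and the margin of 0.1 is an assumption:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross correlation of two equally shaped arrays, in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def track(img1, img2, pos, direction, max_dist, radius=8, threshold=0.9):
    """Track one vehicle from img1 to img2 along the driving direction.

    img1, img2 : RGB images as (H, W, 3) float arrays
    pos        : (row, col) of the detection in img1
    direction  : unit vector (drow, dcol) of the driving direction
    max_dist   : search window length in pixels (max. speed x frame interval)
    """
    r, c = pos
    template = img1[r - radius:r + radius + 1, c - radius:c + radius + 1]
    best_score, best_pos = threshold, None

    for step in range(max_dist + 1):            # slide along the search window
        rr = int(round(r + step * direction[0]))
        cc = int(round(c + step * direction[1]))
        patch = img2[rr - radius:rr + radius + 1, cc - radius:cc + radius + 1]
        if patch.shape != template.shape:
            continue                            # window left the image
        # Cheap first-channel score; only if it is promising is the full
        # three-channel average computed (the text instead estimates an
        # upper limit of the score and skips positions that cannot reach 0.9).
        s0 = ncc(template[..., 0], patch[..., 0])
        if s0 < threshold - 0.1:
            continue
        score = (s0 + ncc(template[..., 1], patch[..., 1])
                    + ncc(template[..., 2], patch[..., 2])) / 3.0
        if score > best_score:
            best_score, best_pos = score, (rr, cc)

    return best_pos, best_score  # best_pos is None if no hit reached 0.9
```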
While vehicle tracking based on normalized cross correlation in RGB color space works well at high resolutions, it is sensitive to false vehicle detections. Although several false detections can be eliminated during tracking as outliers in direction or velocity, other false alarms remain, especially objects from the dashed lane markings that were erroneously detected as vehicles. This is because the shape of a dashed marking reappears periodically within a search window, and all of these markings have almost exactly the same shape and intensity. Hence, future improvements of our traffic monitoring algorithms will focus on the vehicle detection module.