Recognition of built-up and non-built-up areas from road scenes
European Transport Research Review volume 8, Article number: 17 (2016)
In many cases, the road design does not make clear whether a given scene is within or outside the posted built-up area. The purpose of this paper is to evaluate how far road scenes can be considered built-up or non-built-up in nature, and to identify road scenes which are ambiguous and therefore less safe.
Two methods were used to assess how unambiguous or ambiguous road scenes are. In the first approach, a survey of requested speeds at various road scenes was performed with 500 respondents. Here clearly non-built-up and built-up sites, as well as unclear sites, were compared. In the second method, the recognition process of drivers was simulated by image classification software. The classifier was trained with 100 clearly built-up and 100 clearly non-built-up pictures. Four test runs followed, each using 200 pictures from different roads.
The speed choice study showed that in unclear situations (e.g. transitions between built-up and non-built-up areas) the standard deviation of chosen speeds is higher than in unambiguous situations. In the image classification study the trained classifier worked well for road scenes which are definitely built-up or non-built-up in nature. Furthermore, as expected, the classifier gave uncertain classifications for unclear situations.
Each of the two methods produces an output indicator, the standard deviation of speeds and the certainty score, respectively. Both indicators can serve to identify road scenes leading to uncertain and therefore risky situations.
1 Introduction
The safe speeds and also the general speed limits are quite different outside and within built-up areas. However, the general definition of a built-up area is rather vague. According to the Vienna Convention on Road Signs and Signals, “built-up area” means an area with entries and exits specially sign-posted as such, or otherwise defined in domestic legislation. The sign to indicate the beginning of a built-up area shall bear the name of the built-up area or the symbol showing the silhouette of a built-up area or the two combined.
The concept of self-explaining roads implies that drivers choose an appropriate speed according to the road layout, without the help of speed limit signs. Typical road layouts within and outside built-up areas usually differ sharply from each other, so drivers can easily recognize them. However, in transition areas between non-built-up and built-up areas the layout may be less clear, so drivers are not certain which speed they should drive at. This uncertainty is reflected in larger differences between their speeds, which is itself a risk factor. This paper presents two methods to assess the differences between built-up and non-built-up areas: a questionnaire survey and a computer-based image classification procedure.
2 Earlier research
The term “self-explaining road” has been used more frequently in the literature since the 1990s. According to Theeuwes and Godthelp, traffic systems having self-explaining properties are designed in such a way that they are in line with the expectations of the road user. The so-called “Self-Explaining Road” (SER) is a traffic environment which elicits safe behavior simply by its design.
Although different authors use different terms, all agree that internal mental representations (such as schemata, scripts, routines, prototypical representations and mental models) help to increase efficiency in human decisions. According to Theeuwes and Godthelp, abstract representations of the world are stored in memory. These prototypical representations develop through experience. To ensure unity in the way people structure their world, there must be a large consistency in the physical appearance of an object or environment and a large consistency in the behaviour displayed in relation to that object or environment.
In the SPACE project initiated by ERA-NET ROAD, Sjörgen et al. refer to Mazet and Dubois and to Mazet, Dubois and Fleury when considering two terms in driving behaviour: ‘mental categories of roads’ and ‘road readability’. The SPACE project uses a definition of SER that recognizes the role of categorization in the previous papers, but suggests that practitioners generally now understand the meaning of self-explaining roads to include other psychological concepts such as intuitive and understandable design, consistency, readability and psychological traffic calming.
In their paper about behaviourally relevant road categorisation, Weller et al. argue that “unsafe situations are likely to occur if the perceived message conveyed by cues or affordances does not match the normative behavioural expectations of the official road category. In order to avoid such mismatch it is important to know how drivers categorise (rural) roads and which elements are used for this subjective and behaviourally relevant road categorisation.” Therefore they conducted a study in a laboratory setting during which subjects were asked to rate a variety of rural road pictures. The study revealed that drivers distinguish between three different rural road categories which can be distinguished with comparatively few objective criteria.
Discussing the cognitive psychological background of driving, Montel et al. explain that “drivers refer to categories of roads when they analyse the roads and environments they are driving on. They also associate to such categories of roads certain specific expectancies related to the events they may encounter on such roads. … One challenge for the engineers is to take drivers’ categories into account when designing roads in such a way that drivers’ information processing and decision making will be more appropriate to the situations encountered.” Montel’s paper shows results from a survey related to urban streets. The goal of the survey was to identify drivers’ categories of urban streets based on 65 photographs of various urban streets. Drivers were asked to classify streets and then to describe the events they expected to meet in the different classes of streets.
Referring to a research program on road legibility, Fleury describes a set of experiments using photographs, TV screens and drawings of various road scenes. Their aims were to assess the cognitive categorial knowledge of the “common driver”, to find the sets of properties of the environment that appear to be relevant for the categorial organisation, and finally to identify the clues (or patterns of clues) in the environment which act as predictors of different types of problems or patterns of behaviour.
Road scene photographs were also used in further studies of how drivers select their speed depending on the layout and conditions of the environment of the road section (e.g. Garrick, Goldenbeld and van Schagen, Lahausse).
Charlton et al. describe a project undertaken to establish a self-explaining roads (SER) design programme on existing streets in an urban area. The SER design for local roads included increased landscaping and community islands to limit forward visibility, and removal of road markings to create a visually distinct road environment. In comparison, roads categorised as collectors received increased delineation, addition of cycle lanes, and improved amenity for pedestrians. The objective speed data, combined with residents’ speed choice ratings, indicated that the project was successful in creating two discriminably different road categories.
Dealing with road categorisation and design of self-explaining roads in a broader sense, Matena et al. showed specific good and bad practices for the layout of transitions between rural and urban road segments.
3 Questionnaire survey
The goal of this survey was to assess how well road users can distinguish built-up and non-built-up areas, with general speed limits of 50 and 90 km/h respectively, and especially how they perceive transition zones.
Pictures of clearly built-up, clearly non-built-up and transition sites were shown on a computer screen to respondents, who had to state their chosen speed at each location. For each of these three types, five pictures were shown in randomly mixed order. The non-built-up sites were chosen from 2×1 lane national main roads in the north-western part of Hungary, on flat terrain and tangent sections; the built-up sites were taken from the same national main roads where they pass through villages or small towns. Participants were not informed about the actual speed limit. The images showed road scenes with very little or no traffic, so that it could be inferred what the free flow speed would be at those locations. Fig. 1 shows two typical pictures: the first is a clearly built-up site with houses, sidewalk and public lighting poles on both sides of the road, while the second is an unclear site with a built-up character on the right side but a rural look on the left side.
Nearly 500 respondents filled in this on-line questionnaire at home on their own computers. The survey started with about 100 students and was later extended to other persons on available mailing lists. The average age of the respondents was 31 years, the maximum 61 years. The male/female ratio was 72/28 %. This sample is certainly not representative of the total driving population; however, it can be assumed that it is appropriate for finding the differences between built-up, non-built-up and transition sites.
For each picture, the average preferred speed, the v85 speed, the standard deviation and the relative standard deviation of speeds were calculated. The results for the three categories are shown in Table 1.
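The four indicators named above can be sketched in a few lines of Python. This is only an illustration: the answer lists are hypothetical, and the nearest-rank definition of the v85 speed and the use of the sample standard deviation are assumptions, since the text does not state which variants were used.

```python
import math
import statistics


def speed_summary(speeds):
    """Return mean, v85, standard deviation and relative standard deviation."""
    mean = statistics.mean(speeds)
    sd = statistics.stdev(speeds)  # sample standard deviation (n - 1), an assumption
    # v85: speed not exceeded by 85 % of respondents (nearest-rank percentile)
    ordered = sorted(speeds)
    rank = math.ceil(0.85 * len(ordered))
    v85 = ordered[rank - 1]
    return {"mean": mean, "v85": v85, "sd": sd, "rel_sd": sd / mean}


# Hypothetical answers in km/h for one clear and one unclear picture
clear_site = [50, 50, 50, 55, 50, 45, 50, 50, 50, 50]
unclear_site = [50, 70, 60, 90, 50, 80, 60, 70, 50, 90]

clear = speed_summary(clear_site)
unclear = speed_summary(unclear_site)
print(clear)
print(unclear)
```

With answers of this kind, the unclear picture shows both a larger standard deviation and a larger relative standard deviation than the clear one, which is the pattern the survey looks for.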
The average speeds in the three categories reflect the differences well: 47.8, 63.1 and 86.1 km/h for built-up, transition and non-built-up sections respectively. The fact that the mean speed in “transition areas” lies between the mean speed for built-up areas and the mean speed for non-built-up areas is not surprising: drivers take into account the reality of the road environment and the related risks, and not the official dichotomous categories (built-up/non-built-up).
Other results of this survey show that both the standard deviations and the relative standard deviation of speeds at not clearly identified sites are considerably higher than at clearly built-up or clearly non-built-up sites. This reflects the uncertainty of drivers with speed choice at such locations. An interpretation consistent with the self-explaining road notion is that it is less easy for them to categorize these sites as built-up roads (implying a lower legal speed limit) or non-built-up roads.
4 Image classification
In the next phase, image recognition software was used to identify built-up and non-built-up areas. The aim of this part of the research was to verify that such a tool is able to account for the human classification activity, which is important for potential applications. The classification was performed with the VLFeat library, running in the Matlab environment. The algorithm was created by Zisserman and Vedaldi.
The algorithm combines the following building blocks: 
Feature extraction: in this block a dense set of multi-scale Scale Invariant Feature Transform (SIFT) descriptors is efficiently computed from the given input images.
Vocabulary learning: clustering a few hundred thousand visual descriptors into a vocabulary of 10³ (1,000) visual words.
Spatial histograms: these characterize the joint distribution of appearance and location of the visual words in an image.
Training a non-linear Support Vector Machine (SVM): the spatial histograms are used as image descriptors and fed to a linear SVM classifier. Linear SVMs are very fast to train, but are limited to using an inner product to compare descriptors. Much better results can be obtained by pre-transforming the data with an explicit feature map that “emulates” a non-linear χ² kernel as a linear one.
Results: the computation and quantization of the dense SIFT features and the testing of the SVM require under a quarter of a second for each image. Training the SVM requires less than a minute.
In each picture the program examines points sampled on a regular lattice and computes a descriptor at each of them. These descriptors are quantized into so-called “visual words”, which are collected in a dictionary. Histograms of the visual-word frequencies are then built for each picture and for each grid cell within the picture.
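To illustrate why the χ² kernel mentioned in the training step can separate histograms better than a plain inner product, the following sketch compares the two on small hypothetical histograms. It assumes the additive form k(x, y) = Σ 2·x_i·y_i / (x_i + y_i); it evaluates the kernel directly and does not reproduce the explicit feature map used in the actual software.

```python
def linear_kernel(x, y):
    """Plain inner product, the comparison a linear SVM uses."""
    return sum(a * b for a, b in zip(x, y))


def chi2_kernel(x, y):
    """Additive chi-squared kernel for non-negative histograms (assumed form)."""
    total = 0.0
    for a, b in zip(x, y):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += 2.0 * a * b / (a + b)
    return total


# Hypothetical L1-normalised visual-word histograms
h1 = [0.5, 0.3, 0.2, 0.0]
h2 = [0.4, 0.4, 0.1, 0.1]  # similar to h1
h3 = [0.0, 0.1, 0.2, 0.7]  # very different from h1

print(linear_kernel(h1, h2), linear_kernel(h1, h3))
print(chi2_kernel(h1, h2), chi2_kernel(h1, h3))
```

For L1-normalised histograms this kernel gives a self-similarity of 1, so its values can be read as a similarity between 0 and 1.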
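The quantization and histogram steps can be sketched as follows. This is a simplified illustration, not the VLFeat implementation: the two-dimensional descriptors, the two-word vocabulary and the 2 × 2 grid are hypothetical stand-ins for 128-dimensional SIFT descriptors and a vocabulary of about a thousand words.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary entry closest to the descriptor (squared Euclidean)."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(vocabulary):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def spatial_histogram(features, vocabulary, width, height, grid=2):
    """Count visual words per grid cell and return the normalised, concatenated counts.

    features: list of (x, y, descriptor) tuples sampled on a regular lattice.
    """
    n_words = len(vocabulary)
    hist = [0.0] * (grid * grid * n_words)
    for x, y, descriptor in features:
        col = min(int(x * grid / width), grid - 1)
        row = min(int(y * grid / height), grid - 1)
        cell = row * grid + col
        hist[cell * n_words + nearest_word(descriptor, vocabulary)] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist


# Hypothetical toy data: a 100 x 100 image with one feature per quadrant
vocabulary = [[0.0, 0.0], [1.0, 1.0]]
features = [
    (10, 10, [0.1, 0.0]),
    (90, 10, [0.9, 1.0]),
    (10, 90, [0.2, 0.1]),
    (90, 90, [1.0, 0.8]),
]
hist = spatial_histogram(features, vocabulary, width=100, height=100)
print(hist)
```

Because the counts are kept per grid cell, two pictures containing the same visual words in different places (for example houses on the left versus on the right) produce different descriptors.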
4.1 Training of the classifier with training image dataset
For this experiment a large number of road scene photographs were needed. These were obtained from on-board camera video recordings on the same roads as in the previous section (2×1 lane main roads in the north-western part of Hungary, flat terrain, tangent sections, and the built-up sections being in villages or small towns). The images depict road scenes showing the field of view in front of the driver while driving. The photographs in the database have to be assigned to classes, and each class needs a series of training images and a series of test images.
In the teaching phase, pictures of clearly built-up and clearly non-built-up road scenes were given as input. Having a sufficient number of teaching pictures, the program is able to allocate new pictures to either built-up or non-built-up categories. Each picture is given a numerical value, indicating the degree of belonging to one or the other category. Using this evaluation method, unclear sites can be identified; preventive measures can be taken, thereby increasing safety.
The classifier builds a dictionary using histograms from a series of “visual words” extracted from the training image dataset. The program recognizes the “visual words” which best describe the category and also the least typical ones. This helps to build up the model. For the training we use two sets of images: the positive training images, which depict the category we want to recognize, and the negative training images, which do not contain that category. From these training images two models, a positive and a negative one, are prepared. Each picture is given a “certainty” score by the program. The closer the picture is to the image in the model, the higher its certainty score is (in absolute value). The score is negative for images that do not contain the object to be recognized.
In our experiment the built-up road category was taken as positive and the non-built-up road category as negative. One hundred built-up and one hundred non-built-up training images were given to the program. The training images were taken on various urban and rural roads. The training database contains only clearly built-up and clearly non-built-up pictures. In the urban training database there are pictures showing dense built-up scenes and also pictures with fewer houses along the road. The rural training database contains pictures with dense vegetation near the road and also cross sections where vegetation is sparse or completely absent. Road marking conditions also vary in the training database, including locations with visible pavement markings and locations with no markings at all.
In Fig. 2 the 100 + 100 training images are listed on the horizontal axis, with their ratings on the vertical axis. Positive values denote built-up scenes, negative values non-built-up ones.
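One way to read the certainty score is as the signed decision value of a linear classifier: the sign gives the category and the absolute value gives the confidence. The sketch below assumes this interpretation; the weights, bias and histograms are hypothetical and are not taken from the trained model.

```python
def certainty_score(histogram, weights, bias):
    """Signed decision value of a linear classifier: w . x + b."""
    return sum(w * h for w, h in zip(weights, histogram)) + bias


def label(score):
    """Positive scores are read as built-up, negative ones as non-built-up."""
    return "built-up" if score > 0 else "non-built-up"


# Hypothetical two-bin descriptors: [share of "house-like" words, share of "vegetation-like" words]
weights = [1.5, -2.0]
bias = 0.1

village_scene = [0.8, 0.2]
open_road_scene = [0.1, 0.9]

for scene in (village_scene, open_road_scene):
    score = certainty_score(scene, weights, bias)
    print(round(score, 2), label(score))
```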
4.2 Classifying test images
The trained classifier is used for classifying test images. As with the training image dataset, two groups of test images are used, one positive and one negative. Using the model, each test picture is given a certainty score.
In the first experiment, the database of test images contained only clearly distinguishable built-up and non-built-up road images. According to the results, the classifier was able to recognize these two categories: only 12 pictures were not ranked into the correct category, so 94 % of the pictures were classified correctly (Table 2). In Table 2 the 100 + 100 test images are listed on the horizontal axis with their ratings on the vertical axis. Positive values denote scenes recognized as built-up, negative values scenes identified as non-built-up.
In the second experiment the training database was kept unchanged, while the test data were completely changed. The test images were chosen from two specific road sections: the clear urban images from the small town of Herend, and the rural road scenes from main road No. 8. In the authors’ opinion nearly all of the images were clearly definable urban or rural scenes. A detection rate similar to that of the first experiment, which used images from various roads, was therefore expected. The expectation was confirmed: 91 % of the images were correctly classified (Table 3).
In the third experiment the training image dataset was kept constant again. Test images were chosen from another road section, from road No. 1 between the towns Komarom and Tata. The built-up or non-built-up nature of the pictures was defined by the city limit signs, indicating the speed limit of 50 km/h in built-up areas. There were a number of cases, where it was difficult or even impossible to decide from the picture itself, whether it is situated inside or outside a built-up area. This is because one side of the road is like a built-up environment, and the other side of the road suggests a non-built-up environment.
The detection rate dropped significantly in this case, since during the training process the classifier met only images that clearly belong to one class or the other. Only 65 % of the images were classified correctly, and of the built-up images only 42 % were recognized correctly (Table 4). Most of the built-up scenes were classified as non-built-up. It should be noted that the road scenes themselves are not clear, which can also cause difficulties for real drivers on this road section.
For the fourth experiment, the test database was changed again. This time the images were taken on road No. 81, in and around the town of Mor. The results were similar to those of the third experiment, with a rather low recognition rate. The classifier was able to correctly classify only 60 % of the test images, and of the built-up images only 20 % were correctly recognized (Table 5). As in the previous case, drivers on this road section might find it difficult to decide where they are and what speed they should choose.
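The overall and built-up recognition rates quoted for experiments 3 and 4 also imply a rate for the non-built-up images. The short calculation below derives it under the assumption that each test set consisted of 100 built-up and 100 non-built-up images, as in the first experiment; the split is not stated explicitly for these two experiments, so the derived figures are indicative only.

```python
def implied_non_built_up_correct(overall_pct, built_up_pct,
                                 n_built_up=100, n_non_built_up=100):
    """Number of correctly classified non-built-up images implied by the quoted rates."""
    total_correct = overall_pct * (n_built_up + n_non_built_up) / 100
    built_up_correct = built_up_pct * n_built_up / 100
    return total_correct - built_up_correct


# Rates quoted in the text; the 100 + 100 split is an assumption
experiment_3 = implied_non_built_up_correct(overall_pct=65, built_up_pct=42)
experiment_4 = implied_non_built_up_correct(overall_pct=60, built_up_pct=20)
print(experiment_3, experiment_4)
```

Under this assumption the errors are concentrated almost entirely on the officially built-up images, which agrees with the observation that most built-up scenes were classified as non-built-up.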
4.3 Discussion of results
If we look at the certainty scores given by the classifier (Table 6), it is clear that in the first two experiments more than 70 % of the built-up road scenes were ranked correctly (72 and 76 pictures out of 100 having scores over 0.5 or over 1), while the precision in non-built-up scene recognition was 100 % (scores below −1). About 8 to 13 % of the pictures were ranked into the uncertain zone (17 and 26 pictures out of 200 with scores between −0.5 and +0.5). In the third and fourth experiments, which contained a lot of unclear sites, 18 to 27 % of the test images fell into this uncertain zone (54 and 36 scenes out of 200 with scores between −0.5 and +0.5).
In Fig. 3 two pictures are shown from the correctly classified test image dataset. In the first picture houses on both sides, raised curbs, sidewalks and public lighting poles create a clearly built-up image, while in the second picture vegetation on both sides, pavement edge markings and steel barriers create a clearly non-built-up impression.
This result does not imply that the program is inefficient: humans also fail to correctly classify these unclear sites, as suggested by the results of the questionnaire survey. Therefore, the reason for the low performance of the program in these environments is probably to be found in the road layout. The results of experiments 1 and 2 show that the classifier works reasonably well for scenes which are definitely built-up or non-built-up in nature, so the misclassifications in experiments 3 and 4 are likely to stem from the road layout. Fig. 4 shows two examples of misclassified images. The left-side picture in Fig. 4 definitely looks like a non-built-up site, with solid lines marking the pavement edge, with green shoulders, and without curbs and sidewalks, while in reality it is within the city limit signs with a speed limit of 50 km/h.
The right-side picture in Fig. 4 is somewhat less obvious. On the left side there are some buildings and a sidewalk with a curb, while on the right side there is no curb and no sidewalk, and the scene looks more non-built-up. Looking at this picture more carefully, a building can be recognized on the right, but it is hidden by trees and not clearly visible to drivers.
In about 50 % of the cases in experiments 3 and 4 the certainty scores were between −0.5 and +0.5. Looking for the reasons for the uncertain classification, we can identify cases such as unknown objects in the pictures (e.g. bridges, New Jersey concrete barrier elements) and sometimes simply pictures that are too dark.
The teaching process described above used only clearly built-up and clearly non-built-up pictures. Thus, as expected, the program was not able to classify unclear sites. In a later phase of the research the training process could be applied to all kinds of environments. The question is to what extent one can recognize the two “official” categories (officially inside a built-up area, with a 50 km/h legal speed limit, versus officially outside built-up areas, with a legal speed limit of 90 km/h). The program could then be trained on all sites, based on the two official categories, and therefore including the transition sites. The positive training images would be the officially built-up environments (including both clear and unclear sites), and the negative training images would be the officially non-built-up environments (including both clear and unclear sites). The expected result would be that, as with human drivers, the program would even after training recognise the transition sites (as officially built-up or non-built-up sites) less easily than clearly built-up or clearly non-built-up sites. It cannot be excluded that the performance of the program for the unclear sites would improve if unclear road environments were added to the training sample.
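The share of pictures in the uncertain zone follows directly from the certainty scores. The sketch below shows the calculation on a hypothetical list of scores and then reproduces the percentages from the picture counts quoted above; treating the boundaries ±0.5 as part of the uncertain zone is an assumption.

```python
def uncertain_share(scores, threshold=0.5):
    """Fraction of scores lying in the uncertain zone [-threshold, +threshold]."""
    uncertain = [s for s in scores if -threshold <= s <= threshold]
    return len(uncertain) / len(scores)


# Hypothetical certainty scores for five test pictures
example_scores = [-1.2, -0.3, 0.1, 0.8, 1.4]
print(uncertain_share(example_scores))

# Counts of uncertain pictures quoted in the text, each out of 200 test images
counts = {"experiment 1 or 2 (a)": 17, "experiment 1 or 2 (b)": 26,
          "experiment 3": 54, "experiment 4": 36}
percentages = {name: 100 * count / 200 for name, count in counts.items()}
print(percentages)
```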
5 Machine-human comparison
This chapter attempts to compare machine and human classifications. In the image classification experiments a total of 200 + 4*200 = 1000 pictures were used. For each picture the certainty score given by the program was known. As this number is too large for human tests, 50 pictures were chosen so that they cover the whole range of the certainty scores. These pictures were shown to 86 persons, who were asked about their preferred speeds. The average speed for each picture is plotted against the certainty score given by the classifier in Fig. 5.
In Fig. 5 a linear regression line was fitted. Despite the moderate R² of 0.72, it is visible that the relationship is not linear: respondents classified the pictures into two groups and referred to the speed limits of 90 and 50 km/h on non-built-up and built-up sites respectively. However, in the range of low certainty scores (between about −1.5 and +1) there are deviations from these limits. In general, there is reasonable coherence between the human and machine classification.
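A least-squares line and its R² can be computed as in the following sketch. The score and speed values are hypothetical and only mimic the step-like pattern described above; they are not the data behind Fig. 5, so the resulting R² differs from the reported 0.72.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical certainty scores and mean preferred speeds (km/h)
scores = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
speeds = [90, 90, 88, 80, 68, 58, 52, 50, 50]

slope, intercept, r2 = fit_line(scores, speeds)
print(round(slope, 2), round(intercept, 2), round(r2, 2))
```

A negative slope means that pictures scored as more clearly built-up are associated with lower preferred speeds. A step-like relationship of this kind can still give a fairly high R², which is why the plot should be inspected rather than the coefficient alone.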
6 Conclusions
It is widely known that road users choose their speed based on their visual impression of the road scene, rather than on speed limit signs. Unclear road design can cause uncertainty for drivers. If the road design does not make clear whether a given scene is within or outside a built-up area, drivers are not properly informed about the appropriate speed.
This paper shows two approaches to assess the degree of uncertainty of drivers. In the first approach, a survey of requested speeds at various road scenes has shown that in unclear situations the standard deviation of chosen speeds is higher than in unambiguous situations, and the inhomogeneous distribution of driving speeds can increase the risk of accidents.
In the second method, the recognition process of drivers was simulated by image classification software. For road scenes which are definitely built-up or non-built-up in nature, the trained classifier works reasonably well. However, as expected, for unclear situations the classifier gives an uncertain classification.
Each of the two methods presented in this paper produces an output indicator, the standard deviation of speeds and the certainty score, respectively. Both indicators can serve as tools to assess the degree of uncertainty in road users, thus road scenes and road elements leading to uncertain and therefore risky situations can be identified. The output of the proposed methodology can be used to help road safety inspections.
Having identified uncertain transition sites, road engineers should help drivers to select the right speed. There are several possible solutions for this purpose. According to the SER principle, a clear distinction should be created, e.g. by adding a “village gate” consisting of a middle island with an appropriate deviation in the vehicles’ path. If this is not possible, non-SER solutions could also help, e.g. using or repeating speed limit signs or pavement markings.
The experiments in this paper were restricted to 2×1 lane national main roads within and around villages and small towns in off-peak hours. Further research is envisaged to add other cases, such as more urbanised areas and more complex traffic conditions (e.g. higher traffic volumes, bicycles, pedestrians).
The human road scene assessment method described in Chapter 3 fits into a series of similar experiments mentioned in Chapter 2. The authors think that the image classification in Chapter 4 adds a new element using a relatively simple tool. The focus here was to identify road scenes and certain elements in the environment influencing human decisions. More advanced techniques have recently appeared (e.g. Foucher et al.), using sequences of pictures taken every 5 m and more sophisticated analysis algorithms that try to minimize false classifications. However, if there are ambiguous road scenes or sequences of them, false classifications will remain.
References
United Nations Economic Commission for Europe (1995) Convention on road signs and signals, done at Vienna on 8 November 1968. Incorporating the amendments to the Convention which entered into force on 30 November 1995
Theeuwes J, Godthelp H (1995) Self-explaining roads. Saf Sci 19:217–225
Sjörgen et al (2012) SPACE, Speed Adaption Control by Self-Explaining Roads; final report. Project initiated by ERA-NET ROAD. Project Nr. 823153
Mazet C, Dubois D (1988) Mental organisation of road situations: theory of cognitive categorisation and methodological consequences. Proceedings of the Conference on Road Safety Theory and Research Methods. Leidschendam: SWOV
Mazet C, Dubois D, Fleury D (1987) Catégorisation et interprétation de scenes visuelles: le cas de l’environnement urbain et routier. Psychologie Française, Numéro Spécial, 85–96
Weller G et al (2008) Behaviourally relevant road categorisation: a step towards self-explaining rural roads. Accid Anal Prev 40:1581–1588
Montel M, Van Elslande P, Brenac T (2005) Categorisation of streets by drivers and associated expectancies: a cognitive analysis of driving activity for safer urban design. Adv Transp Stud 7:23–38
Fleury D (1991) Recognition of driving situations and road legibility. ICTCT workshop, Vienna. ICTCT document 373
Garrick NW (2011) Speeds and street design results UConn and UCD. Highway design class, university lecture, University of Connecticut (February 2011)
Goldenbeld C, van Schagen I (2007) The credibility of speed limits on 80 km/h rural roads: The effects of road and person(ality) characteristics. Accid Anal Prev 39:1121–1130. doi:10.1016/j.aap.2007.02.012
Lahausse JA, van Nes N, Fildes BN, Keall MD (2010) Attitudes towards current and lowered speed limits in Australia. Accid Anal Prev 42:2108–2116. doi:10.1016/j.aap.2010.06.024
Charlton SG et al (2010) Using endemic road features to create self-explaining roads and reduce vehicle speeds. Accid Anal Prev 42(6):1989–1998
Matena S et al (2006) Road categorisation and design of self explaining roads. RIPCORD-ISEREST project, Sixth Framework programme, pp 1–132
Vedaldi A, Fulkerson B (2010) VLFeat: an open and portable library of computer vision algorithms. Proceedings of the 18th annual ACM international conference on Multimedia, Firenze, Italy, 25–29 October 2010 (winner of the ACM Open Source Software Competition 2010), pp 1469–1472. doi:10.1145/1873951.1874249
Vedaldi A, Zisserman A (2011) Image classification practical. http://www.robots.ox.ac.uk/~vgg/share/practical-image-classification.htm (last visited 20 September 2013)
Foucher P, Moebel E, Charbonnier P (2015) Route segmentation into speed limit categories by using image analysis. In: 10th International Conference on Computer Vision Theory and Applications, Berlin, 11–14 March 2015. Paper 237, 8 p (Institute for Systems and Technologies of Information, Control and Communication)
This article is part of Topical collection on TRA 2014 human factors and safety
Kosztolanyi-Ivan, G., Koren, C. & Borsos, A. Recognition of built-up and non-built-up areas from road scenes. Eur. Transp. Res. Rev. 8, 17 (2016). https://doi.org/10.1007/s12544-016-0205-9