Weighting methods in multi-attribute assessment of transport projects
© European Conference of Transport Research Institutes (ECTRI) 2009
Received: 24 May 2009
Accepted: 9 November 2009
Published: 1 December 2009
The paper considers the application to transport project assessment of multi-attribute value theory (MAVT). The aim is to compare the results of three weighting methods which are used in MAVT: ratio with swings, Saaty scale with swings, and trade-off. An experiment is set up for a decision problem relating to a public transport project. A sample of individuals is asked to provide judgments according to the three methods. The weights obtained by ratio with swings and by Saaty scale with swings show a high correlation, while weights obtained by each of these methods show a low correlation with trade-off. The weight attached to the cost attribute is higher in trade-off, which is implemented as pricing out, than in the other methods; two explanations for this result are proposed: the compatibility principle and loss aversion. The paper also provides insight on how the attributes of transport project alternatives should be described by indicators so that individuals can attach a value to them.
The assessment of projects, meant here as capital investments that create transport infrastructure, supports the activity of decision makers. In the case of public decision makers the assessment is used as a tool to assist the process of planning transport infrastructure. The assessment is concerned with achieving social (as opposed to private) objectives, including improvement of economic efficiency, reduction of damage to the environment, and improvement of safety.
A review of transport project assessment methodologies in use in European Union (EU) countries conducted within the EU-funded project EUNET has shown that, although cost-benefit analysis is predominant, recent developments are in the direction of greater use of multi-criteria analysis. An example of this move towards multi-criteria analysis is the methodology for multi-modal studies adopted in the UK [8, 23]. One reason is the simplicity of multi-criteria analysis in taking into account non-marketable effects and qualitative criteria.
Another issue of concern in project assessment for public decisions is the inclusion of the variety of points of view of the stakeholders involved in the planning process. This calls for extending the traditional cost-benefit analysis approach. A line of research explores how participation techniques can be accommodated within a multi-criteria analysis framework. Various multi-stakeholder approaches have been proposed within this framework to deal with the conflicts which arise because of the uneven distribution of costs and benefits and of the different objectives and weights attached to them [2, 4, 18].
Multi-criteria analysis is today widely used in transport project studies commissioned by public bodies. Guidance for its application exists in some countries in the form of manuals; an example is the manual issued by the UK government. Multi-criteria analysis is also widely used as an assessment tool in EU-funded research projects. A few projects which have dealt with the development of comprehensive frameworks for the modelling and assessment of transport policies, including investment policies, have adopted multi-criteria analysis; among these is PROPOLIS, which has dealt with urban transport.
Typical decision problems addressed with multi-criteria analysis include prioritizing projects within a programme and choosing between alternative design solutions, often location-related, for an individual project. In principle, the projects cover the whole range of geographical levels which mark the boundaries of their impacts and determine the decision making bodies concerned: from international to national and local. Application examples taken from recent literature include the plans for the pan-European road and railway networks sponsored by the United Nations, the location of an intermodal barge terminal, the location of a high-speed railway station in a metropolitan area, the layout of a railway link to a city port, the plans for the development of the road network in an urban area, and the plans for the development of the rail public transport network in a city.
Multi-attribute value theory (MAVT) is a well known and widely used multi-criteria methodology (a review of its mathematical foundation and of different versions implemented in commercial software is in Figueira et al.). MAVT uses a functional approach, i.e. a multi-attribute value function is associated with each alternative. Most commonly the value function takes an additive form. Typically the additive value function is assessed using the weighted summation approach: single-attribute value functions are separately assessed, then attribute scaling constants are assessed to weight and combine the single-attribute functions.
In MAVT the scales of the single-attribute value functions are set up with reference to well-defined anchor points: this makes it possible for both the scores of the alternatives on the criteria and the weights of the criteria to be normalized with respect to the same range over which the alternatives vary on each criterion. The consequence of this property is that the elicitation of weights of the criteria is anchored to clearly defined alternatives. This helps the evaluators in their judgment task because it gives them reference points.
Conversely, the other widely used functional methodology, the Analytic Hierarchy Process (AHP) proposed by Saaty, suffers from the limitation that consistency with the mathematical structure of the model would require anchoring to alternatives which can be ill-defined. Weights would need to be derived from pairwise comparisons of the relative importance of a change from the value 0 to the value 1 for the score on each criterion: however, given the ratio scale used for the scores, it might not always be possible to define the values taken by the attributes for the alternatives with score 0 and score 1. An example is when the attribute is defined by a quantitative indicator but is unfavourable, like construction costs, or when it is defined only qualitatively. In these instances the evaluator does not know what the reference points with score 0 and score 1 are. This issue is discussed in Dyer, who also shows how this limitation of the AHP can be corrected, using ideas from MAVT, by rescaling the scores. Ferrari proposes a similar modification to the AHP methodology and illustrates this with an application to transport projects.
There are different weight elicitation methods which are consistent with the classical additive model used in MAVT. Theoretical rigour requires that all methods be implemented with explicit reference to the attribute ranges defined by the anchor points. With ratio weights (proposed by von Winterfeldt and Edwards) and semantic scales, such as the AHP scale (proposed by Saaty) and the MACBETH scale (proposed by Bana e Costa and Vansnick), the weight eliciting questions are derived from an interpretation of weights as swings and the respondent is asked to compare swings among different attributes. The trade-off method (proposed by Keeney and Raiffa) uses questions where the respondent is asked to adjust the outcome of one attribute to achieve indifference among alternatives.
If different weight elicitation methods can be used within the same modelling framework, the research question arises whether different methods yield significantly (in a statistical sense) different results in terms of weights. This in turn would have an effect on the resulting ranking of the alternatives. This type of investigation is helpful to detect behavioural biases. If we set up an experiment and find that the differences in the results obtained with the different methods are statistically significant, we are in a position to predict how the results are likely to differ across weighting methods when we apply the multi-criteria analysis to new problems.
This type of analysis has already been carried out. The literature offers a range of experimental studies which, however, have considered decision contexts different from transport. Bell et al. have considered climate change policies, Borcherding et al. the siting of nuclear waste repositories, Pöyhönen and Hämäläinen the evaluation of job alternatives, and Schoemaker and Waid the choice of candidates for college admission.
The results found by these authors are in agreement: the resulting weights do depend on the weighting methodology. Correlation was used as a descriptive measure of the degree of convergence between data sets of results. Wherever one method is found to be more correlated with a second one than with a third one, the statistical significance of these differences in correlations between pairs of methods has been assessed by using p-values, as in conventional hypothesis testing practice. In other words, the p-values measure the probability that differences at least as large would occur by chance alone. In particular, it was found that trade-off is an outlier, i.e. it shows low correlations with other methods.
The paper considers MAVT models and presents the results of an experiment aimed at assessing if the weights obtained with different weight elicitation methods are different in a statistically significant sense. Methods compared include ratio with swings, AHP scale with swings, trade-off. The experiment is carried out for a decision between layout alternatives of a metro line. The paper also reports on the insight which the experiment has provided on how the attributes of the alternatives should be described by indicators so that individuals can attach a value to them. Thus the paper contributes both to suggesting how MAVT should be applied to transport projects and to highlighting the differences in weights that can be expected when different weighting methods are used.
2 Attribute weighting
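The additive multi-attribute value function is:

$$v(x) = \sum_{i=1}^{n} w_i \, v_i(x_i)$$

where:
- $v(x)$: multi-attribute value function of the alternative
- $x = (x_1, \dots, x_n)$: attribute vector of the alternative
- $w_i$: weight, or scaling constant, of attribute $x_i$
- $v_i(x_i)$: single-attribute value function (score) on attribute $x_i$
- $n$: number of attributes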
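Ratio weighting and weighting with the AHP scale are elicitation methods consistent with the interpretation of weights as swings: with single-attribute value functions scaled from 0 to 1 between the anchor points, the weight of an attribute measures the value of the swing from its worst to its best outcome. In ratio weighting 100 points are assigned to the attribute with the highest value swing. The other swings are valued by judgment as a percentage of the highest value swing. Weights then follow by normalizing the points so that they sum to one.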
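The normalization step of ratio weighting can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `ratio_weights` and the swing points are hypothetical, chosen only to show how points on a 0 to 100 scale become weights that sum to one.

```python
def ratio_weights(swing_points):
    """Normalize swing points (100 for the largest swing) into weights summing to 1."""
    total = sum(swing_points.values())
    if total <= 0:
        raise ValueError("at least one swing must receive positive points")
    return {name: points / total for name, points in swing_points.items()}


# Hypothetical judgments of one respondent (not data from the experiment).
points = {
    "construction costs": 40,
    "travel time": 100,   # swing ranked first, so it receives 100 points
    "road safety": 60,
    "air quality": 50,
}
weights = ratio_weights(points)
for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```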
In weighting with the AHP scale, for each pair of attributes the dominance of the swing in the first attribute over the swing in the second attribute is assessed by judgment according to the Saaty semantic scale. Weights are then derived from the principal eigenvector of the pairwise comparison matrix, as proposed by Saaty in his classical paper.
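As an illustration of the eigenvector step, the sketch below approximates the principal eigenvector of a pairwise comparison matrix by power iteration, which is one common way of computing it and is assumed here for simplicity. The function name `ahp_weights` and the matrix entries are hypothetical; the matrix is deliberately built to be perfectly consistent, so the resulting weights are proportional to 1 : 4 : 2 : 2.

```python
def ahp_weights(matrix, iterations=100):
    """Approximate the normalized principal eigenvector by power iteration."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [value / total for value in nxt]  # renormalize so weights sum to 1
    return w


# Hypothetical swing comparisons on the Saaty scale; entry [i][j] is the
# dominance of the swing in attribute i over the swing in attribute j.
# Order: construction costs, travel time, road safety, air quality.
matrix = [
    [1.0, 1 / 4, 1 / 2, 1 / 2],
    [4.0, 1.0, 2.0, 2.0],
    [2.0, 1 / 2, 1.0, 1.0],
    [2.0, 1 / 2, 1.0, 1.0],
]
w = ahp_weights(matrix)
print([round(value, 3) for value in w])
```

With inconsistent judgments the same function still applies, since a matrix with all positive entries has a unique positive principal eigenvector.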
In trade-off weighting the outcome of the attribute $x_i$ is adjusted by judgment $n-1$ times in order to assess the $n-1$ ratios $w_i/w_j$. Weights then follow. This method requires that the attribute to be adjusted is described with a single quantitative indicator.
3 The experiment
3.1 The hypothesis
The experiment considers three weight elicitation methods: ratio with swings, AHP scale with swings, and trade-off. To formulate the hypothesis of the experiment we define convergent validity as the correlation shown by a pair of data sets obtained with different elicitation methods. The hypothesis of the experiment is that the convergent validity between weights elicited with different methods depends on the methods.
3.2 The decision problem
The decision problem relates to the layout alternatives for the planned fourth line of the metro network in Rome. Only two layout alternatives are considered, one of minimum length (base alternative) and one of maximum length (complete alternative). Criteria include one cost attribute, construction costs, and three benefit attributes: travel time, road safety and air quality.
The experiment was first carried out using quantitative descriptors for the benefit attributes that are directly obtainable from conventional transport planning models. Descriptors of benefits were yearly reductions, relative to the without-project case, of, respectively, hours spent travelling, accidents resulting in injury or death, and particulate matter (PM) emissions.
The individuals participating in the experiment who were asked the questions to elicit weights reported a difficulty in attaching a value to the attributes described by total hours saved in a year and PM emissions saved in a year.
Thus the experiment was repeated based on a different choice of the descriptors for the travel time and air quality benefits. For the former the description was given in terms of number of yearly users having a time saving for one trip of a given value. For the latter the description was given in terms of consequences on health of the reduction of PM emissions: premature deaths and days of restricted working activity saved.
Table 1 Alternatives and attributes
1. Construction costs
- Base alternative: 1.7 billion €
- Complete alternative: 2.4 billion €
2. Travel time benefits
- Base alternative: 360,000 daily users of the new line have an average saving of 12 min per trip
- Complete alternative: 500,000 daily users of the new line have an average saving of 12 min per trip
3. Road safety benefits
- Base alternative: in 1 year, 67 accidents (fatal and with injuries) are avoided, of which 3% are fatal
- Complete alternative: in 1 year, 101 accidents (fatal and with injuries) are avoided, of which 3% are fatal
4. Benefits relating to the effects on citizens’ health of particulate matter emissions
- Base alternative: in 1 year, 90 premature deaths due to cardio-respiratory diseases are avoided and 4,100 days where activity is restricted due to health conditions are avoided
- Complete alternative: in 1 year, 158 premature deaths due to cardio-respiratory diseases are avoided and 7,200 days where activity is restricted due to health conditions are avoided
3.3 The multi-criteria model
The additive MAVT model is adopted. Single-attribute value functions are set up using local scaling (i.e. on each attribute the scores 0 and 1 are assigned to the worse and the better of the two alternatives). For the trade-off method, which requires the evaluation of the single-attribute function for the adjusted attribute, a linearity assumption is made with respect to the quantitative descriptor of the attribute.
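A minimal sketch of the additive model under local scaling is given below. It assumes, reading the attribute descriptions, that the base alternative is best on construction costs and worst on the three benefit attributes, and the reverse for the complete alternative; the weights are hypothetical. Under these assumptions the value of the base alternative equals the cost weight, so the complete alternative is preferred whenever the cost weight is below 0.5.

```python
ATTRIBUTES = ["construction costs", "travel time", "road safety", "air quality"]

# Local scaling: 1 for the better of the two alternatives, 0 for the worse.
# Assumption: the base layout is cheaper but yields smaller benefits.
SCORES = {
    "base": {"construction costs": 1.0, "travel time": 0.0,
             "road safety": 0.0, "air quality": 0.0},
    "complete": {"construction costs": 0.0, "travel time": 1.0,
                 "road safety": 1.0, "air quality": 1.0},
}


def additive_value(weights, scores):
    """Weighted sum of single-attribute scores."""
    return sum(weights[a] * scores[a] for a in ATTRIBUTES)


# Hypothetical weights of one respondent (they sum to 1).
weights = {"construction costs": 0.2, "travel time": 0.3,
           "road safety": 0.25, "air quality": 0.25}

values = {name: additive_value(weights, s) for name, s in SCORES.items()}
preferred = max(values, key=values.get)
print(values, preferred)
```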
3.4 The questionnaire
Attribute weights are elicited in the experiment with a questionnaire. Fourteen experts in transport planning, including academics and practitioners, participated in the experiment. The questionnaire includes an introduction where the decision problem is explained and information on the alternatives is provided. This includes descriptors for the four attributes used in the multi-criteria model, as in Table 1. The questionnaire then includes three blocks of questions aimed at eliciting weights. Each block uses one elicitation method. The sequence in which blocks are presented to respondents was randomized so that the order of questions would not bias the results.
In the ratio and AHP methods the subjects are confronted with the change from the worst to the best outcome of each attribute (where worst and best are identified based on the two layout alternatives under examination).
“Given an alternative with the worst outcome for each attribute, which attribute would you change first from worst to best? Which second? ... Assign 100 to the first. Assign between 0 and 100 to the second. … ”
“Compare the swing from worst to best of the attribute construction costs with the swing from worst to best of the attribute travel time benefits. Use the Saaty scale (1 to 9: 1 equal importance, 9 extreme importance) to judge the dominance of the first swing on the second (use the reciprocals 1 to 1/9 if the second swing dominates the first)”. The question is repeated for each pair of attributes.
In the trade-off block the attribute that subjects are asked to adjust is construction costs, and the questions are formulated by asking for society's willingness to pay to achieve improvements in each of the benefit attributes. Trade-off is equivalent in this instance to a pricing out method.
“Given an alternative with lowest construction costs and worst benefits, how much would you be willing to increase construction costs to change travel time benefits from worst to best?” The question is repeated for each benefit attribute.
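One way to turn pricing out answers into weights is sketched below. It relies on two assumptions: the linear value function for construction costs stated for the multi-criteria model, and indifference between the starting alternative and the one with higher costs and the best outcome on one benefit. Under these assumptions the ratio of a benefit weight to the cost weight equals the stated cost increase divided by the cost range (2.4 − 1.7 = 0.7 billion €). The function name `pricing_out_weights` and the willingness-to-pay answers are hypothetical.

```python
COST_RANGE = 0.7  # billion €, difference between the two layout alternatives


def pricing_out_weights(wtp, cost_range=COST_RANGE):
    """Derive weights from willingness-to-pay answers (billion €).

    Indifference with a linear cost value function gives
    w_i / w_cost = wtp_i / cost_range for each benefit attribute i.
    """
    ratios = {name: amount / cost_range for name, amount in wtp.items()}
    w_cost = 1.0 / (1.0 + sum(ratios.values()))  # weights must sum to 1
    weights = {"construction costs": w_cost}
    weights.update({name: ratio * w_cost for name, ratio in ratios.items()})
    return weights


# Hypothetical answers of one respondent (not data from the experiment).
wtp = {"travel time": 0.35, "road safety": 0.14, "air quality": 0.21}
weights = pricing_out_weights(wtp)
for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

An answer larger than the cost range extrapolates the linear value function beyond the two alternatives; the sketch does not guard against that case.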
3.5 The statistical analysis
The weights, derived from the three elicitation methods, are calculated for each subject. Within-subject inter-method Pearson correlation coefficients of weights are calculated and then averaged over subjects. This is done first for all weights, and, second, individually for the weight of each attribute.
The following indices are used:
- $s$: index of individual
- $a, b$: indices of weighting methods
- $i$: index of attribute
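The first of the two calculations, the within-subject correlation averaged over subjects, can be sketched as follows. The helper names and the weight vectors are hypothetical; for each subject the Pearson coefficient is computed across the attribute weights obtained with two methods, and the coefficients are then averaged.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if one vector has all equal weights.
    return sxy / sqrt(sxx * syy)


def average_within_subject_correlation(weights_a, weights_b):
    """Average over subjects of the correlation between methods a and b."""
    coefficients = [pearson(wa, wb) for wa, wb in zip(weights_a, weights_b)]
    return sum(coefficients) / len(coefficients)


# Hypothetical weights for two subjects and four attributes
# (order: construction costs, travel time, road safety, air quality).
ratio = [[0.20, 0.35, 0.25, 0.20], [0.15, 0.40, 0.20, 0.25]]
tradeoff = [[0.45, 0.25, 0.15, 0.15], [0.50, 0.20, 0.10, 0.20]]
print(average_within_subject_correlation(ratio, tradeoff))
```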
Statistical significance is assessed using the p-values obtained from the Wilcoxon matched-pairs signed-rank test. The logic is that of hypothesis testing. We have a sample of subjects and for each subject a pair of measurements. The null hypothesis H0 states that the two series of measurements are drawn from the same population, i.e. they differ only by chance. The test uses a particular statistic which is based on the measurements. We then calculate the probability (p-value) that, if H0 is true, the statistic takes values at least as extreme as the value actually obtained from the sample. If this probability is low we can reject H0 and say that the differences in the two series of measurements are statistically significant.
In the Wilcoxon test the difference of the two measurements is calculated for each subject, then the absolute values of the differences are ranked. The statistic of the test is the sum of the ranks corresponding to positive differences. If H0 is true, the sum of the ranks corresponding to positive differences should be of the same order of magnitude, in probability terms, as the sum of the ranks corresponding to negative differences. Tables provide the probability (p-value) that, if H0 is true, the statistic takes values greater than or equal to the one obtained from the sample. Siegel and Castellan provide these tables for small samples (fewer than fifteen subjects).
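For small samples the tabulated probabilities can be reproduced by enumerating all sign assignments, as in the sketch below. It is an illustration under stated assumptions rather than the procedure used in the paper: zero differences are dropped, tied absolute differences receive average ranks, and the p-value is one-sided (probability that the sum of positive ranks is at least the observed one). Function names and data are hypothetical.

```python
from itertools import product


def signed_ranks(x, y):
    """Non-zero paired differences and the ranks of their absolute values."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next absolute difference is tied.
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            ranks[k] = average_rank
        pos = end + 1
    return diffs, ranks


def wilcoxon_exact(x, y):
    """Return T+ and the exact one-sided p-value P(T+ >= observed | H0)."""
    diffs, ranks = signed_ranks(x, y)
    t_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    hits = total = 0
    # Under H0 each difference is positive or negative with probability 1/2.
    for signs in product((0, 1), repeat=len(ranks)):
        total += 1
        if sum(r for s, r in zip(signs, ranks) if s) >= t_plus:
            hits += 1
    return t_plus, hits / total


# Hypothetical paired measurements for four subjects.
t_plus, p_value = wilcoxon_exact([5, 6, 7, 8], [1, 1, 1, 1])
print(t_plus, p_value)
```

With fourteen subjects the enumeration covers 2^14 = 16,384 sign assignments, which is still quick to run.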
Finally, the rankings of the two layout alternatives resulting from the different weight vectors are calculated, according to the additive MAVT model, for each subject.
Table: Average within-subject inter-method correlation of weights (pairs of methods: ratio and AHP scale; ratio and trade-off; AHP scale and trade-off)
Table: Average within-subject inter-method correlation of weights of individual attributes (attributes: construction costs; travel time benefits; road safety benefits; air quality benefits; pairs of methods: ratio and AHP scale; ratio and trade-off; AHP scale and trade-off)
The analysis of weight values shows that the average weights of the cost attribute in the ratio method (0.209) and in the AHP scale method (0.148) are both lower than the average weight of the cost attribute in the trade-off method (0.452). These differences in weight values are significant at p < 0.0001 in both cases. Thus the value that individuals attach to the range of cost variation depends on the elicitation method.
The finding that the cost attribute has a higher weight in the trade-off method can be explained in terms of the compatibility principle (Slovic et al. and Tversky et al. discuss the principle and support it with experimental evidence). This states that the weight of any stimulus element is enhanced by its compatibility with the response mode. Individuals tend to focus on stimulus elements that are compatible with their response. In the case here, the trade-off method is implemented with pricing out questions. Therefore, in trade-off the construction cost attribute matches the response scale, i.e. both the attribute and the response of the individuals are in monetary units. In ratio and AHP scale the response scale is different from money. Thus in trade-off the attribute construction costs tends to be accentuated.
The finding on the weight of the cost attribute can also be seen as a manifestation of loss aversion, an effect well known in behavioural economics: money is given a lower value when the question is about reducing the costs borne (i.e. a gain), as in ratio and AHP scale, and a higher value when the question is about bearing higher costs, i.e. willingness to pay (a loss), as in trade-off.
Table: Money equivalents (billion €) of the benefit attributes (travel time benefits; road safety benefits; air quality benefits)
It is relevant at this point to recall that different measures of the equivalents between money and another attribute can be defined according to whether a loss or a gain is incurred. The theory of reference-dependent preferences, which applies the ideas of prospect theory to riskless choice, defines four valuation measures (found in Bateman et al. and in Munro and Sugden): willingness to pay (money one would pay in return for a given gain in the attribute), willingness to accept (money one would accept in return for a given loss in the attribute), equivalent loss (money loss one would accept in place of a given loss in the attribute), and equivalent gain (money gain one would accept in place of a given gain in the attribute). A central assumption in the theory is loss aversion: losses are valued more heavily than gains. This implies that the four valuation measures need to be different and satisfy certain inequalities.
The result obtained here can be explained in terms of difference between equivalent gain and willingness to pay. In ratio and AHP scale the questions were about gains of both money and benefits (equivalent gain). In trade-off the question was about a loss in money and a gain in benefits (willingness to pay). According to the theory of reference-dependent preferences the equivalent gain needs to be higher than the willingness to pay and this expectation is confirmed by the results here.
Finally we note that the weighting method also affects the resulting ranking of the two alternatives. When moving from ratio to AHP scale nobody in the sample changes the ranking. Conversely, when moving from ratio to trade-off and from AHP scale to trade-off, 21% of the individuals in the sample (in both cases) show a reversal in the ranking.
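The share of ranking reversals between two methods can be computed as in the sketch below. It assumes the local scaling of the multi-criteria model with the base alternative best only on construction costs, so that the value of the base alternative equals the cost weight and that of the complete alternative equals one minus the cost weight. The cost weights listed are hypothetical and the function names are illustrative.

```python
def prefers_complete(cost_weight):
    """True if the complete alternative has the higher additive value.

    Under the assumed local scaling, v(base) = w_cost and
    v(complete) = 1 - w_cost; a tie is counted as not preferring complete.
    """
    return (1.0 - cost_weight) > cost_weight


def reversal_share(cost_weights_a, cost_weights_b):
    """Fraction of subjects whose ranking differs between methods a and b."""
    pairs = list(zip(cost_weights_a, cost_weights_b))
    reversals = sum(prefers_complete(a) != prefers_complete(b) for a, b in pairs)
    return reversals / len(pairs)


# Hypothetical cost weights of five subjects under two methods.
ratio_cost = [0.20, 0.15, 0.30, 0.25, 0.10]
tradeoff_cost = [0.40, 0.55, 0.45, 0.60, 0.30]
share = reversal_share(ratio_cost, tradeoff_cost)
print(share)
```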
The paper has presented the results of a weight elicitation experiment within a decision problem relating to transport projects. The experiment has made evident that the attributes of the transport projects need to be described in terms of impacts to which individuals are able to attach a value. The usual outputs of transport planning models are not sufficient if the aim is to conduct multi-criteria analyses. This translates into the rejection of descriptors such as yearly passenger-hours and pollutant emissions, and adoption of descriptors based on yearly users gaining a time saving and average time saving per trip, and on health consequences of pollutant emissions.
The paper has compared the weights obtained with the following three weight elicitation methods: ratio with swings, AHP scale with swings and trade-off. Results are in agreement with those found in the literature for other decision problems. The correlation of weights between a pair of methods depends on the methods. In particular, trade-off turns out to be an outlier. The main difference is shown by the weight of the construction cost attribute. The weight for this attribute obtained with the trade-off method is higher than with the other two methods. As the trade-off method has been implemented with pricing out questions, this result can be explained in terms of the compatibility principle. An alternative explanation is loss aversion. When moving from the first two methods to trade-off, the ranking of alternatives is also affected.
Results suggest that selection of the weight elicitation method is of relevance in multi-attribute models as this affects both the weights and the resulting ranking of the alternatives. The findings here add to the body of experimental research documenting that weights are “constructed” in the elicitation process rather than “uncovered”.
Further research could be carried out by implementing the trade-off method according to the four valuation measures which are defined in the theory of reference-dependent preferences. The experiment here has considered only a willingness to pay question, but the trade-off method could also be implemented by asking willingness to accept, equivalent gain and equivalent loss questions. If the weights obtained from the four variants of the trade-off method were highly correlated with each other, one might conclude that weights depend principally on the methods (ratio and AHP scale on the one hand, trade-off on the other) and that differences can be explained on the basis of the different response scales used by the methods. If they were not, one might relate the differences in weights to the valuation of losses and gains.
References
1. Bana e Costa CA, Vansnick JC (1994) MACBETH: an interactive path toward the construction of cardinal value functions. Int Trans Oper Res 1:489–500
2. Bana e Costa CA, Nunes da Silva F, Vansnick J-C (2001) Conflict dissolution in the public sector: a case-study. Eur J Oper Res 130(2):388–401
3. Bateman I, Munro A, Rhodes B, Starmer C, Sugden R (1997) A test of the theory of reference-dependent preferences. Q J Econ 112(2):479–505
4. Beinat E (ed) (1998) A methodology for policy analysis and spatial conflicts in transport policies. Final Report of the DTCS (Spatial decision support for negotiation and conflict resolution of environmental and economic effects of transport policies) Project, European Commission DG12
5. Bell ML, Hobbs BF, Elliott EM, Ellis H, Robinson Z (2001) An evaluation of multi-criteria methods in integrated assessment of climate policy. J Multi-Criteria Decis Anal 10(5):229–256
6. Borcherding K, Eppel T, von Winterfeldt D (1991) Comparison of weighting judgements in multiattribute utility measurement. Manage Sci 37(12):1603–1619
7. Bristow AL, Nellthorp J (2000) Transport project appraisal in the European Union. Transp Policy 7(1):51–60
8. DETR (2000) Guidance on the Methodology for Multi-Modal Studies (GOMMMS). Department of the Environment, Transport and the Regions, London
9. Dodgson J, Spackman M, Pearman AD, Philips LD (2000) Multi-criteria Analysis: a Manual. Department of the Environment, Transport and the Regions, London
10. Dyer JS (1990) Remarks on the analytic hierarchy process. Manage Sci 36(3):249–258
11. Ferrari P (2003) A method for choosing from among alternative transportation projects. Eur J Oper Res 150(1):194–203
12. Figueira J, Greco S, Ehrgott M (2004) Multiple Criteria Decision Analysis: State of the Art Surveys. Springer
13. Gerçek H, Karpak B, Kilinçaslan T (2004) A multiple criteria approach for the evaluation of the rail transit network in Istanbul. Transportation 31(2):203–228
14. Haezendonck E (ed) (2007) Transport Policy Evaluation: Extending the Social Cost-benefit Approach. Edward Elgar, Cheltenham
15. Keeney RL, Raiffa H (1976) Decisions with Multiple Objectives: Preferences and Value Tradeoffs. Wiley, New York
16. Lautso K, Wegener M, Spiekermann K, Shepperd I, Steadman P, Martino A, Domingo R, Gayda S (2004) PROPOLIS: Planning and Research of Policy for Land Use and Transport for Increasing Urban Sustainability. Final Report, Fifth Framework Programme, European Commission
17. Macharis C (2004) A methodology to evaluate potential locations for intermodal barge terminals: a policy decision support tool. In: Beuthe M, Himanen V, Reggiani A, Zamparini L (eds) Transport Developments and Innovations in an Evolving World. Springer, Berlin, pp 211–234
18. Macharis C (2004) Multi-criteria analysis as a tool to include stakeholders in project evaluation: the MAMCA method. In: Haezendonck E (ed) Transport Policy Evaluation: Extending the Social Cost-benefit Approach. Edward Elgar, Cheltenham, pp 115–131
19. Mateus R, Ferreira JA, Carreira J (2008) Multicriteria decision analysis (MCDA): Central Porto high-speed railway station. Eur J Oper Res 187(1):1–18
20. Munro A, Sugden R (2003) On the theory of reference-dependent preferences. J Econ Behav Organ 50(4):407–428
21. Pöyhönen M, Hämäläinen RP (2001) On the convergence of multiattribute weighting methods. Eur J Oper Res 129(3):569–585
22. Saaty TL (1977) Scaling method for priorities in hierarchical structures. J Math Psychol 15(3):234–281
23. Sayers TM, Jessop AT, Hills PJ (2003) Multi-criteria evaluation of transport options: flexible, transparent and user-friendly? Transp Policy 10(2):95–105
24. Schoemaker PJ, Waid CC (1982) An experimental comparison of different approaches to determining weights in additive value models. Manage Sci 28(2):182–196
25. Siegel S, Castellan NJ (1988) Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill
26. Slovic P, Griffin D, Tversky A (2002) Compatibility effects in judgment and choice. In: Gilovich T, Griffin D, Kahneman D (eds) Heuristics and Biases. The Psychology of Intuitive Judgment. Cambridge University Press, pp 217–229
27. Tsamboulas DA (2007) A tool for prioritizing multinational transport infrastructure investments. Transp Policy 14(1):11–26
28. Tversky A, Sattath S, Slovic P (1988) Contingent weighting in judgment and choice. Psychol Rev 95(3):371–384
29. Von Winterfeldt D, Edwards W (1986) Decision Analysis and Behavioral Research. Cambridge University Press, Cambridge