The assessment of projects, meant here as capital investments that create transport infrastructure, supports the activity of decision makers. In the case of public decision makers, the assessment is used as a tool to assist the process of planning transport infrastructure. The assessment is concerned with achieving social (as opposed to private) objectives, including improving economic efficiency, reducing damage to the environment and improving safety.
A review of transport project assessment methodologies in use in European Union (EU) countries, conducted within the EU-funded project EUNET, has shown that, although cost-benefit analysis is predominant, recent developments are in the direction of greater use of multi-criteria analysis. An example of this move towards multi-criteria analysis is the methodology for multi-modal studies adopted in the UK [8, 23]. One reason is the ease with which multi-criteria analysis takes into account non-marketable effects and qualitative criteria.
Another issue of concern in project assessment for public decisions is the inclusion of the variety of points of view of the stakeholders involved in the planning process. This calls for extending the traditional cost-benefit analysis approach. A line of research explores how participation techniques can be accommodated within a multi-criteria analysis framework. Various multi-stakeholder approaches have been proposed within this framework to deal with the conflicts which arise because of the uneven distribution of costs and benefits and of the different objectives and weights attached to them [2, 4, 18].
Multi-criteria analysis is today widely used in transport project studies commissioned by public bodies. Guidance for its application exists in some countries in the form of manuals; an example is the manual issued by the UK government. Multi-criteria analysis is also widely used as an assessment tool in EU-funded research projects. A few projects which have dealt with the development of comprehensive frameworks for the modeling and assessment of transport policies, including investment policies, have adopted multi-criteria analysis; among these is PROPOLIS, which has dealt with urban transport.
Typical decision problems addressed with multi-criteria analysis include prioritizing projects within a programme and choosing between alternative design solutions, often location-related, for an individual project. In principle, the projects cover the whole range of geographical levels which mark the boundaries of their impacts and determine the decision making bodies concerned: from international to national and local. Application examples taken from recent literature include the plans for the pan-European road and railway networks sponsored by the United Nations, the location of an intermodal barge terminal, the location of a high-speed railway station in a metropolitan area, the layout of a railway link to a city port, the plans for the development of the road network in an urban area, and the plans for the development of the rail public transport network in a city.
Multi-attribute value theory (MAVT) is a well-known and widely used multi-criteria methodology (a review of its mathematical foundation and of different versions implemented in commercial software is in Figueira et al.). MAVT uses a functional approach, i.e. a multi-attribute value function is associated with each alternative. Most commonly the value function takes an additive form. Typically the additive value function is assessed using the weighted summation approach: single-attribute value functions are assessed separately, then attribute scaling constants are assessed to weight and combine the single-attribute functions.
In MAVT the scales of the single-attribute value functions are set up with reference to well-defined anchor points: this makes it possible for both the scores of the alternatives on the criteria and the weights of the criteria to be normalized with respect to the same range over which the alternatives vary on each criterion. The consequence of this property is that the elicitation of weights of the criteria is anchored to clearly defined alternatives. This helps the evaluators in their judgment task because it gives them reference points.
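The weighted summation approach and the role of the anchor points can be illustrated with a minimal sketch. It assumes linear single-attribute value functions, which is only one possible shape, and all attribute names, anchor levels, weights and alternatives below are hypothetical and not taken from the experiment.

```python
def linear_value(x, worst, best):
    """Single-attribute value: 0 at the worst anchor, 1 at the best anchor.

    Linearity is an assumption of this sketch. The same formula works for
    unfavourable attributes (e.g. cost), where worst > best.
    """
    return (x - worst) / (best - worst)


def additive_value(levels, anchors, weights):
    """Additive multi-attribute value of one alternative.

    levels  : attribute -> level taken by the alternative
    anchors : attribute -> (worst, best) anchor points
    weights : attribute -> scaling constant (normalised here to sum to 1)
    """
    total = sum(weights.values())
    return sum(
        (weights[a] / total) * linear_value(levels[a], *anchors[a])
        for a in weights
    )


# Hypothetical anchors: (worst, best) over the range spanned by the alternatives.
anchors = {"cost_meur": (900.0, 500.0), "time_saving_min": (0.0, 10.0)}
# Hypothetical weights, meaningful only with reference to the ranges above.
weights = {"cost_meur": 0.6, "time_saving_min": 0.4}

alt_a = {"cost_meur": 700.0, "time_saving_min": 8.0}
alt_b = {"cost_meur": 500.0, "time_saving_min": 2.0}

value_a = additive_value(alt_a, anchors, weights)
value_b = additive_value(alt_b, anchors, weights)
```

Because the weights refer to the swing from the worst to the best anchor on each attribute, changing the anchors without re-eliciting the weights would change the meaning of the model.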
Conversely, the other widely used functional methodology, the Analytic Hierarchy Process (AHP) proposed by Saaty, suffers from the limitation that consistency with the mathematical structure of the model would require anchoring to alternatives which can be ill-defined. Weights would need to be derived from pairwise comparisons of the relative importance of a change from score 0 to score 1 on each criterion; however, given the ratio scale used for the scores, it might not always be possible to define the values taken by the attributes for the alternatives with score 0 and score 1. An example is when the attribute is defined by a quantitative indicator but is unfavourable, such as construction costs, or when it is defined only qualitatively. In these instances the evaluator does not know what the reference points with score 0 and score 1 are. This issue is discussed in Dyer, who also shows how this limitation of the AHP can be corrected, using ideas from MAVT, by rescaling the scores. Ferrari proposes a similar modification to the AHP methodology and illustrates it with an application to transport projects.
There are different weight elicitation methods which are consistent with the classical additive model used in MAVT. Theoretical rigour requires that all methods be implemented with explicit reference to the attribute ranges defined by the anchor points. With ratio weights (proposed by von Winterfeldt and Edwards) and semantic scales, such as the AHP scale (proposed by Saaty) and the MACBETH scale (proposed by Bana e Costa and Vansnick), the weight eliciting questions are derived from an interpretation of weights as swings and the respondent is asked to compare swings among different attributes. The trade-off method (proposed by Keeney and Raiffa) uses questions where the respondent is asked to adjust the outcome of one attribute to achieve indifference among alternatives.
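How the answers to these two kinds of question translate into weights can be sketched as follows. This is an illustration under stated assumptions, not the elicitation protocol of the experiment: the function names, the 0-100 points convention for the ratio judgments and all numbers are hypothetical. For the trade-off case the sketch assumes that the respondent compares an alternative with the reference attribute at its best anchor and another attribute at its worst anchor against an alternative with the other attribute at its best anchor, and lowers the reference attribute until indifferent. Under the additive model, w_ref = w_ref * v + w_attr, so w_attr / w_ref = 1 - v, where v is the value of the adjusted reference level.

```python
def normalise(raw):
    """Scale raw weights so that they sum to 1."""
    total = sum(raw.values())
    return {attr: w / total for attr, w in raw.items()}


def swing_ratio_weights(points):
    """Ratio weights with swings.

    points: attribute -> points given to the worst-to-best swing on that
    attribute, relative to the most important swing (hypothetically 100).
    """
    return normalise(points)


def tradeoff_weights(reference, indifference_values):
    """Weights implied by trade-off (indifference) answers.

    indifference_values: attribute -> single-attribute value v (between 0
    and 1) of the adjusted reference-attribute level at which the
    respondent is indifferent, as described in the lead-in.
    """
    raw = {reference: 1.0}
    for attr, v in indifference_values.items():
        raw[attr] = 1.0 - v  # w_attr / w_ref under the additive model
    return normalise(raw)


# Hypothetical answers from one respondent.
ratio_w = swing_ratio_weights({"cost": 100, "time": 50, "noise": 50})
tradeoff_w = tradeoff_weights("cost", {"time": 0.5, "noise": 0.75})
```

In both cases the answers are only interpretable with reference to the attribute ranges defined by the anchor points, which is why the swings have to be stated explicitly in the questions.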
If different weight elicitation methods can be used within the same modelling framework, the research question arises whether different methods yield significantly (in a statistical sense) different results in terms of weights. This in turn would have an effect on the resulting ranking of the alternatives. This type of investigation is helpful to detect behavioural biases. If we set up an experiment and find that the differences in the results obtained with the different methods are statistically significant, we are in a position to predict how the results are likely to differ using different weighting methods if we apply the multi-criteria analysis to new problems.
This type of analysis has already been carried out. The literature offers a range of experimental studies which, however, have considered decision contexts different from transport. Bell et al. have considered climate change policies, Borcherding et al. the siting of nuclear waste repositories, Pöyhönen and Hämäläinen the evaluation of job alternatives and Schoemaker and Waid the choice of candidates for college admission.
The results found by these authors are in agreement: the resulting weights do depend on the weighting methodology. Correlation was used as a descriptive measure of the degree of convergence between data sets of results. Wherever one method is found to be more correlated with a second one than with a third one, the statistical significance of these differences in correlations between pairs of methods has been assessed by using p-values, as in conventional hypothesis testing practice. In other words, the p-value measures the probability of observing differences at least as large if they arose by chance alone. In particular, it was found that trade-off is an outlier, i.e. it shows low correlations with other methods.
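The descriptive use of correlation can be sketched as follows. The weight vectors are invented for illustration and do not reproduce any of the cited studies; the method labels are hypothetical, and the sketch covers only the pairwise Pearson correlations, not the significance test on their differences.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical weights on four attributes, one vector per elicitation method.
weights_by_method = {
    "ratio": [0.40, 0.30, 0.20, 0.10],
    "ahp_scale": [0.45, 0.30, 0.15, 0.10],
    "trade_off": [0.20, 0.25, 0.15, 0.40],
}

# Correlation for every pair of methods.
correlations = {
    (a, b): pearson(weights_by_method[a], weights_by_method[b])
    for a, b in combinations(weights_by_method, 2)
}

for pair, r in correlations.items():
    print(pair, round(r, 3))
```

With these invented numbers the ratio and AHP-scale vectors are strongly correlated while the trade-off vector is not, which is the kind of pattern described as an outlier above.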
The paper considers MAVT models and presents the results of an experiment aimed at assessing whether the weights obtained with different weight elicitation methods are different in a statistically significant sense. The methods compared are ratio with swings, AHP scale with swings, and trade-off. The experiment is carried out for a decision between layout alternatives of a metro line. The paper also reports on the insight which the experiment has provided on how the attributes of the alternatives should be described by indicators so that individuals can attach a value to them. Thus the paper contributes both to suggesting how MAVT should be applied to transport projects and to highlighting the differences in weights that can be expected when different weighting methods are used.