Inference of dynamic origin–destination matrices with trip and transfer status from individual smart card data

The provision of seamless public transport supply requires a complete understanding of the real traffic dynamics, comprising origin-to-destination multimodal mobility patterns along the transport network. However, most current solutions are centred on the volumetric analysis of passengers’ flows, generally neglecting transfer, walking, and waiting needs, as well as the changes in the mobility patterns with the calendar and user profile. These challenges prevent a comprehensive assessment of the routing and scheduling vulnerabilities of (multimodal) public transport networks. The research presented in this paper addresses these challenges by proposing a novel approach that extends dynamic Origin-Destination (OD) matrix inference with aggregated statistics, highlighting vulnerabilities and multimodal mobility patterns from individual trip record data. Given specific spatial and temporal criteria, the proposed methodology infers these extended matrices from smart-card validations gathered from (multimodal) public transport networks. More specifically, three major contributions are made: (i) the enrichment of OD matrices with statistical information besides trip volume (e.g., transfer and trip features); (ii) the detection of vulnerabilities on the network pertaining to walking distances and trip durations in a user-centric way; and (iii) the decomposition of traffic flows in accordance with calendrical rules and user (passenger) profiles. The set of contributions is validated on the bus-and-metro public transport network in the city of Lisbon. The proposed approach for inferring OD matrices yields four unique contributions. First, we allow inference to consider multimodal commuting patterns, detecting individual trips undertaken with different operators. 
Second, we support dynamic OD matrix inference with parameterizable time intervals and calendrical rules, and further support the decomposition of traffic flows according to the user profile. Third, we allow parameterization of the desirable spatial granularity and visualisation preferences. Fourth, our solution efficiently computes several statistics that support OD matrix analysis, helping with the detection of vulnerabilities throughout the transport network. More specifically, statistical indicators related to travellers’ functional mobility needs (commuters for working purposes, etc.), walking distances and trip durations are supported. The inferred dynamic OD matrices are produced by a purpose-built software tool with strict guarantees of usability. Results from the case study, using data gathered from the two main public transport operators (Bus and Metro) in the city of Lisbon, show that 77.3% of alighting stops can be estimated with a high confidence degree from bus smart-card data. The inferred OD matrices (Bus and Metro) reveal vulnerabilities along specific OD pairs, offering the public bus operators in Lisbon new knowledge and a means to better understand dynamics and validate OD assumptions.


Introduction
European cities are pursuing sustainable mobility, discouraging individual car-based travel and reinforcing the service quality of public transport. In this context, it is of utmost importance for policy makers to have a clear and comprehensive knowledge of the public transport network (all modes), being able to assess its adequacy, vulnerabilities and unsatisfied mobility needs along time based on passenger dynamics patterns and relevant context information (e.g., socioeconomic). Classic origin-destination (OD) matrices are one of the most used tools by public transport agencies to this end, enabling the analysis of the distribution of passengers' flows along the network. However, this classic method is hampered by multiple obstacles: (i) the need to account for ongoing changes in demand and isolate calendar-specific traffic flows (dynamic stance); (ii) the need to integrate traffic views from different operators and modes of transport; (iii) the relevance of offering parameterizable spatial resolutions; (iv) the importance of providing filters to focus on specific routes and user profiles; and, finally and foremost, (v) the need to go beyond classic volumetric views and capture important statistics that can reveal vulnerabilities, such as the distribution of the number of required transfers, as well as the time and distance spent in trips or within transfers throughout the network.
The research presented in this paper addresses the above issues by proposing an approach that extends dynamic Origin-Destination (OD) matrices with aggregated statistics, using smart-card validations gathered from (multimodal) public transport networks. More specifically, we discuss three major contributions: (i) the data enrichment in the OD matrices with statistical information besides trip volume (e.g., transfer and trip features); (ii) the detection of vulnerabilities on the network pertaining to walking distances and trip durations in a user-centric way; and (iii) the decomposition of traffic flows in accordance with calendrical rules and user (passenger) profiles.
Unlike previous works, these contributions go beyond the analysis of the demand distribution on the network. In particular, to the best of our knowledge, the analysis of transfer or trip features in Origin-Destination matrices is non-existent in the literature. Integrating a set of statistical indicators (e.g., distribution of travel times, transfer times, travel distances, transfer distances and the average number of transfers) in a single OD matrix (across all OD pairs) promotes a comprehensive and differentiated analysis of the public transport network. Moreover, this approach allows the detection of network vulnerabilities, especially for specific profiles at a given time window and OD level (for instance, searching for OD pairs with high or moderate volume where the mean number of transfers and the travel time are high for elderly passengers).
The contributions are validated on the bus-and-metro public transport network in the city of Lisbon. In particular, this work is conducted in the context of the ILU project [1], an innovative project built on advances in artificial intelligence, big data analytics, and urban computing, applied to the integrative analysis and optimization of urban traffic in the city of Lisbon.
With the support and validation of the primary public bus operator, CARRIS, a robust and usable software application for the visual analysis of the proposed dynamic OD matrices was further developed. The tool allows several filters to build the OD matrix, including temporal restrictions (time periods, calendrical constraints), spatial granularities (transportation analysis zones, parishes, neighbourhood sections, stops), selection of user profiles, trip typologies, amongst other facilities. In the end, it projects the OD matrix onto a heatmatrix, where one of the metrics is highlighted and the remaining metrics are shown in a tooltip that is visible by hovering over the cell. Observing the related literature, to the best of our knowledge, the contributions explained here are unique and encourage a new spatiotemporal perspective of urban traffic.
The paper is organised as follows: Sect. 2 introduces essential concepts pertaining to this multidisciplinary research scope and identifies the related work on the inference of OD matrices; Sect. 3 introduces the case study and describes the proposed methodology for the inference of dynamic and multimodal OD matrices; Sect. 4 presents the main research results and implications; finally, major concluding remarks are drawn in Sect. 5.

Smart card data and automatic fare collection systems definition
decisions through the observation of the extracted information from a data-collecting technology. Automated Fare Collection (AFC) systems record passenger entries and/or exits on the network via smart card validation. When a passenger validates the smart card, at a station or in a public vehicle, a record is stored with the timestamp, location, and optional route specifications. In this work, the act of validating a card is called a transaction. AFC systems can be classified as either entry-only or closed systems [3][4][5]. In a closed system, the passenger has to validate the card both when arriving at and when leaving a station (or when boarding and alighting from a vehicle). A transport network with an entry-only system requires ticket validation only at boarding. Since the alighting information is not recorded in an entry-only system, the agencies know neither the vehicle load at a given moment nor the destination of its passengers, which hinders service planning and management. This is the case of the bus public transport system in the city of Lisbon.

Trip typology definition
A trip stage $s: \theta_{start} \rightarrow \theta_{end}$ is a movement of a passenger $p$ without transfers between stop coordinates $\theta_{start}$ and $\theta_{end}$, through a transport mode (metro, bus, bike, car, among others). The path of a passenger to the final destination is a set $S$ of one or more ($1 \ldots m$) trip stages, $S = \{s_1, s_2, \ldots, s_m\}$. Since some systems are entry-only and, hence, only collect the boarding information, the alighting information must be identified in order to have a complete trip stage record. In the literature, the most studied methodologies are based on rule-based chaining of trip stages (Li et al. 2007, [6][7][8][9][10][11][12][13][14]), where the most used rules are the ones enunciated by Barry et al. [15]: (i) passengers tend to start their next trip near the exit of the previous trip; (ii) the alighting place of the last trip is the same place where they boarded on the first trip of the day. Later, this principle was improved by Trépanier et al. [12], by suggesting that the alighting place of the last trip is a stop close to the first boarding place of the day, and not necessarily the same location. This revised principle is particularly important (or prone to occur) in bus networks because, on a given route that has ascending and descending directions, the stops of each direction correspond to different locations. Let a journey, $j: \theta_{start} \rightarrow \theta_{end}$, be the movement of a passenger from an origin $\theta_{start}$ to the final destination of the passenger's trip $\theta_{end}$, with zero or more transfers through one or more modes of transport. From a set of $m$ trip stages $S = \{s_1, s_2, \ldots, s_m\}$, a set of $n$ journeys $J = \{j_1, j_2, \ldots, j_n\}$ is inferred, where $n \leq m$. In the literature, the methodologies to identify the origin and final destination of the passenger's trip are mostly based on the distinction of transfers from an activity [10,16]. 
In detail, if the time interval between two trip stages is greater than a certain threshold, it indicates that the passenger is doing an activity (work, shopping, home); otherwise, it is a transfer between trip stages. For instance, Alsger et al. [17] used this methodology to generate origin-destination matrices based on journeys and demonstrated that transfer time thresholds of 15 to 90 min had an insignificant impact on the OD matrices (Fig. 1).
In commuting trips, the distinction between transfer and activity is simpler to identify due to their periodicity, frequency and the large time discrepancy between the time spent on an activity and on a transfer, such as school-home and vice versa, or work-home and vice versa.

Origin destination matrix definition
After performing trip stage or journey generation and extraction, we can represent the volumetric distribution of trips, in space, in an origin-destination matrix [18]. Each cell of the origin-destination matrix specifies the volume $v_{i,j}$ between an origin $i$ and a destination $j$ (Table 1). In short, matrices include three modelling features: (i) static or dynamic matrix; (ii) spatial granularity; and (iii) trip typology. Firstly, OD matrices can be classified as either static or dynamic. A static OD matrix considers time-independent flows over the space [19]. For this typology, methodologies have been developed to capture average flows between OD pairs within a geographic area, in a single matrix, such as gravity models, entropy maximization, and information minimization [20]. However, advances in technology, computational processing and storage resources have enabled the inference of dynamic OD matrices. Consequently, dynamic OD matrices have become the focus in the study of transportation planning, since they show the traffic dynamics between zones more accurately.
Secondly, in the transport context, the space dimension can be configured for micro and macroscopic analysis. Some studies perform exhaustive and microscopic analysis, observing the flow of trips between pairs of network stops (subway stations, bus stops, bike stations, among others) [8]. On the other hand, there are matrix studies at the macroscopic level, i.e. between network zones (aggregations of stops), such as Transportation Analysis Zones (TAZ) [19,21], city parishes, clusters [22].
Finally, the content of the matrices can be modelled from trip stages or journeys. Matrices that present the flow in the network through trip stages show the actual passenger volume at all points in the network, whereas journey-based matrices aim at identifying potential producer and attractor points in the network.
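As an illustration of the volumetric OD matrix defined above, the following minimal sketch counts the volume $v_{i,j}$ per OD pair together with the row and column totals. The function name `build_od_matrix` and the zone labels are hypothetical and are not taken from the described software.

```python
from collections import Counter


def build_od_matrix(trips):
    """Count trips per (origin, destination) pair plus row/column totals.

    `trips` is an iterable of (origin, destination) tuples, one per trip
    stage or journey, depending on the chosen trip typology.
    """
    volume = Counter()
    for origin, destination in trips:
        volume[(origin, destination)] += 1

    boardings = Counter()   # row totals: trips leaving each origin
    alightings = Counter()  # column totals: trips arriving at each destination
    for (origin, destination), v in volume.items():
        boardings[origin] += v
        alightings[destination] += v
    return volume, boardings, alightings


# Hypothetical zone labels, for illustration only.
trips = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "A"), ("C", "A")]
volume, boardings, alightings = build_od_matrix(trips)
```

Feeding trip stages into the same function yields the actual passenger volume between stops, while feeding journeys yields producer and attractor volumes.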

Previous works on origin destination matrix modelling
This section summarises the related work on the inference of origin-destination matrices and similar contributions in urban traffic visualisation. In the literature, the inference of origin-destination matrices has a common purpose, which is passenger flow analysis. However, the design of the matrices diverges in different aspects, which are addressed herein in the following order: (i) data source, (ii) temporal and spatial granularities, and (iii) visualisation facilities.
The first approaches for the estimation of origin-destination (OD) matrices were based on statistical inference from interviews and/or surveys. However, with the monitoring of individual movements in the network, it has become possible to model dynamic and more accurate matrices of the state of urban traffic through sensory data sources such as mobile phone records [23], Global Positioning System trajectories [19], and smart card records [24]. In fact, most studies in the scope of public urban transport with AFC systems are dependent on smart card information. For instance, Munizaga et al. [24] used smart card data from the multimodal public transport system of Chile (metro and bus) to enrich the alighting bus information and apply the bus data to infer OD matrices. Similarly, in 2017, Hora et al. [8] contributed an approach that includes smart card data from the metro, bus and tram transport modes. The approach proposed by Hora et al. [8] depicts dynamic OD matrices with the flow distribution between zones of the city of Porto, where each zone aggregates stops of all transport modes.
The second fundamental point for the design and analysis of OD matrices is spatial granularity. Usually, the explored granularities in literature and transport planning practice correspond to aggregations of stops, such as TAZ, clusters, or zones chosen by the author. According to McCord et al. [25], stop-to-stop OD matrices make it difficult to explore important flow patterns. Yet, Sobral et al. [21] state that it is essential to depict OD matrices with several levels of spatio-temporal granularity to encourage use by stakeholders in exploring urban mobility flows. Luo et al. [22] propose the aggregation of stops through the K-means clustering algorithm. Spatial K-means requires the parametrization of the optimal number of clusters, by maximizing the ratio of average intra-cluster flow to average inter-cluster flow while maintaining the spatial compactness of all clusters. The cost is lower when stops are assigned to each zone before inferring the matrices.
Table 1 Illustrative OD matrix showing the traffic flow volume between stations or stops. The last row and column show the total volume on a given entry or exit, respectively
Data visualisation plays an indispensable role in the organisation and perception of data. According to Lee et al. [26], complex data encoded in numbers and text is much less comprehensible to humans than data visualised in graphics. Indeed, finding useful patterns and information, with the naked eye, in a large set of numerical data, such as origin-destination matrices, is easier and more appealing when the data is expressed in appropriate graphical representations. In the existing literature, we find several options to depict the demand flow between origins and destinations, including heat matrices [21], heat maps [27,28], flow maps [29], chord diagrams [30], and sankey diagrams [31]. In the scope of heat matrices, Sobral et al. [21] propose a knowledge-assisted visualisation tool tailored to stakeholders in Porto. 
The dashboard allows the visualisation of OD matrices with journey flow in Porto's public transport system, with spatial options such as stop, neighbourhood and TAZ granularity. On the other hand, the temporal granularities available are restricted to daytime windows, such as morning (AM) peaks, afternoon (PM) peaks, and weekends. Each cell of the matrix is coloured on a hue scale to indicate the volume between the OD pair, and the flow details are shown after hovering the mouse over a cell. Interestingly, if the matrix has a coarse granularity, e.g., neighbourhood, clicking on a cell drills down the spatial perspective by one level, e.g., stops. Similarly, the software application developed within the scope of our research (project ILU) is tailored to the Lisbon bus network, and it shows more statistics beyond the volume of passengers' distribution. Furthermore, the ILU app displays heat matrices that dynamically change temporally and spatially, and allows other filters such as views of user profile types, boarding and alighting restrictions (routes and stops), typology of trips (trip stages and journeys), and more granularities (parishes). Similarly to Sobral et al. [31], hovering over a cell displays all information regarding the available metrics.
In contrast with heat matrices, heatmaps offer the complementary advantage of associating passenger demand with geographical locations on the map. However, displaying an arbitrarily high number of OD pairs in a geographical area can deteriorate usability, and generally inflows and outflows need to be represented in two separate heatmaps. For instance, Yu et al. [28] detail bus travel demand patterns using heat maps to identify entries and exits in the Guangzhou bus network, China. Flow patterns were identified at different temporal granularities, including periods of the day, days of the week, weekdays, weekends and vacation periods. Wood et al. [27] counteract the restriction of representing OD flows in heatmaps by proposing a new visualisation approach. The developed technique preserves the spatial layout of all origin and destination locations by constructing a gridded two-level spatial treemap. The result is a set of vectors projected on each geographic area.

Transfer and trip status indicators
Assessing the status of public transport is covered by a vast number of related studies in different study cases (urban rail, bus, metro, or even multimodal) worldwide, where accessibility, comfort, security, and other factors are considered relevant to improve the network [32][33][34][35]. For this purpose, researchers and operators recurrently perform inquiries, inference (e.g., gravity model) or data-driven analysis (e.g., data mining on smart card data) to extract users' behaviour. Considering indicators such as flow, transfer or trip features (e.g., trip time, transfer location), studies have been able to identify mobility patterns and travel preferences. For instance, de Magalhães et al. [33] report that comfort, direct lines and decreased travel time are key factors in making public transport attractive to private car commuters, who are willing to walk around 500 m to transfer; Espino et al. [36] analyse travel preferences and suggest that, from a policy point of view, decreasing the travel time and the transfer cost encourages the use of the bus transport system. Indeed, the assessment of these statistical indicators has been pivotal in aiding the operators' planning, helping to redesign the network (e.g., 30 min city or even 15 min city projects), and increasing the attractiveness of public transport, especially for private car users [37][38][39].
Despite the significant efforts in assessing the public network, non-trivial mobility patterns or even network vulnerabilities can go unnoticed due to: (i) the lack of a proper data representation for the analysis of transfers and trip features (not only volume) in space and time (e.g., OD matrix), and (ii) the lack of a complete integrative analysis of the volume with other statistical indicators. For instance, Arbex et al. [32] propose an approach to assess the accessibility in São Paulo, Brazil, using transfer and travel time statistics, before and after changes in public bus transport. By using a grid map, the authors indicate the accessibility score of each hexagon area on the map (the accessibility to reach the area). Since this data representation is devoid of bilateral orientation, it does not differentiate the regions of the map with lower (or higher) accessibility to reach the marked area (hexagon area). By contrast, our proposed approach integrates the indicators in an OD matrix, which allows a differentiated analysis by OD regions (or stops).
Chen et al. [34] provide useful methods to capture transfer patterns between metro and bus transport in order to assess the accessibility in Nanjing, China. Employing a multidimensional representation (cube) with three dimensions (time of the day, day of the week and stations), the authors slice the cube into two dimensions (matrix) to visualize the transfers. As a result, they can identify patterns across the possible combinations of two dimensions, such as days of the week against time of the day, days of the week against stations, and time of the day against stations. Despite its relevance, the study does not provide an approach to gathering spatial-temporal traffic dynamics (e.g., a matrix with spatial resolution, at a given time window). Studies that robustly include transfer or trip features in OD matrices are, to our knowledge, non-existent. Usually, related works apply OD matrices to volume/flow analysis, as shown in Sect. 2.4. Besides, the studies identified so far analyse complementary indicators individually. Chia et al. [35] analyze the travel time indicator only and show that most inner-city suburbs in Brisbane, Australia, have high accessibility, although including transfer needs reveals the opposite result. In other words, an integrative exploration of both indicators makes it possible to discover non-trivial knowledge about the network status. Indeed, our work proposes extended OD matrices with aggregated statistics, covering not only transfer time or distance but a complete set of context statistics, such as mean travel time and distance, trip volume and mean number of transfers between origins and destinations. Considering these feature extraction principles, together with the visual facilities developed to extract these matrices with guarantees of usability, our work overcomes the lack of proper data representation and data enrichment present in the literature.
Besides, the developed software allows the decomposition of these indicators in accordance with calendrical rules and user (passenger) profiles. The filtering criteria enhance the analytical power, allowing a comprehensive analysis, for instance, exploring the accessibility for elderly or young passengers at different time windows.

Lisbon city as the study case
The smart card transactions used in the scope of this research were made available by the main Lisbon public transport operators, including the main bus operator, CARRIS, and the subway operator, METRO. The bus operator has a total of 178 routes (82 ascending, 82 descending and 14 circular routes), including 2166 stops, while the subway operator has four lines with a total of 53 stations.
A residual number of transactions lacking a passenger identifier, boarding timestamp or boarding location were removed from the original dataset. Briefly, the bus and subway datasets have approximately 11 million and 31 million trip records, respectively, for October 2019. The distribution of bus transactions throughout October is stable, with a daily average of 400,000 validations on working days, 200,000 on Saturdays and 150,000 on Sundays. Meanwhile, the subway operator has a daily average of 1.2 million transactions on working days, 600,000 on Saturdays and 500,000 on Sundays.
Dataset features with smart card transactions from bus and subway networks are described in Tables 2 and 3, respectively.
Studying the distribution of the transactions from subway and bus operators according to card titles, the majority were made with monthly passes or occasional tickets. Specifically, 37% of the transactions are from monthly passes, 26% from occasional tickets valid for 1 h and 17% from tickets valid for 24/48/72 h. The remaining 17% are distributed among tickets for elders (10%), young people between 18 and 23 years old (5%), children under 12 years old (1%) and others (1%).

Modelling methods
This study aims to infer extended dynamic OD matrices by gathering trip and transfer statistics, in a unique integrated view. This approach is suitable for any traffic analysis and offers efficient interpretability of bus network vulnerabilities.

Alighting estimation of a stage trip
To generate dynamic origin-destination matrices we need complete information on each passenger's trip stages. Therefore, in this subsection, we briefly explain the algorithm for alighting stop and timestamp estimation for transactions collected from the bus entry-only system. The developed algorithm chains the bus and subway transactions to trace the passenger's path in the bus and metro network. Subsequently, to determine the location of unknown exit bus stops, the model follows the principles described by Barry et al. [15], explained in Sect. 2. Figure 2 provides a flowchart illustrating the steps for processing metro and bus transactions of a given passenger [15]. The model can be parameterized to receive transactions from a suitable time window. Here, a 24 h period was chosen, starting at 04:00:00 on one day and ending at 03:59:59 on the next day. Then, the algorithm collects the transactions from the parameterized period, ordered by passenger identification and chronologically. The assessment of candidate alighting stops depends on a suitable distance equation. For our problem, the haversine distance is used, since it expresses the walking distance more accurately and is widely used in recent studies [8,40]. The candidate stop θ that minimises the transfer distance, along a given route, is chosen. However, if the calculated transfer distance exceeds the threshold (parameterizable in the model), θ is not valid and the transaction remains without alighting information. We consider that the maximum transfer distance should not exceed the 1000 m threshold [24]. In the end, the model produces a dataset with trip stages, including the columns shown in Table 4, along with other statistical features.
In related studies, the developed approaches are limited to keeping boarding and alighting information. In our enhanced solution, we additionally calculate and store relevant statistics associated with each trip stage, such as travel time, trust level, travel distance and transfer distance (as shown in Table 4). With this method, the computational effort required by the model is compensated by the low-cost and efficient creation of dynamic matrices based on statistics calculated a priori. In addition, storing these statistics offers freedom for further exploratory analysis beyond the classic OD matrices.
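The candidate selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the tuple layout of the stops and the sample coordinates are hypothetical, and only the haversine distance and the 1000 m threshold come from the text.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000       # mean Earth radius in metres
MAX_TRANSFER_DISTANCE_M = 1000   # maximum transfer distance stated in the text


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(min(1.0, sqrt(a)))


def estimate_alighting(downstream_stops, next_boarding,
                       max_distance_m=MAX_TRANSFER_DISTANCE_M):
    """Pick the stop on the boarded route closest to the next boarding location.

    downstream_stops: list of (stop_id, lat, lon) for the stops that follow
    the boarding stop on the route (hypothetical layout).
    next_boarding: (lat, lon) of the passenger's next boarding.
    Returns (stop_id, transfer_distance_m), or None when no candidate lies
    within the threshold, so the transaction stays without alighting data.
    """
    if not downstream_stops:
        return None
    candidates = ((stop_id, haversine_m(lat, lon, *next_boarding))
                  for stop_id, lat, lon in downstream_stops)
    best = min(candidates, key=lambda c: c[1])
    return best if best[1] <= max_distance_m else None


# Hypothetical coordinates, for illustration only.
stops = [("s1", 38.7100, -9.1400), ("s2", 38.7200, -9.1400), ("s3", 38.7300, -9.1400)]
near = estimate_alighting(stops, (38.7205, -9.1400))
far = estimate_alighting(stops, (38.8000, -9.1400))
```

For the last trip stage of the day, the same function could be called with the first boarding location of the day as `next_boarding`, following the second chaining rule.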

Boarding and alighting stops estimation for a journey
The model proposed in this section aims to generate journeys, whose origin and destination are respectively the beginning of the trip and the final destination (trip purpose) of the passenger. Some common examples that illustrate this typology of journeys are routine or functional trips, such as home-work or home-school commuting.
Journeys are derived from a set of trip stages made during the day. A journey ends when the passenger alights to perform an activity. In the proposed algorithm, the activity is identifiable through the time spent between trip stages. That is, if the time spent between trip stages is greater than the defined threshold, the passenger is considered to be performing an activity; otherwise, it is considered to be a transfer between public transport vehicles. The threshold defined for our case study is 90 min [17]. Tables 4 and 5 show a real example of the estimation of journeys. The algorithm receives, as input, a dataset with the trip stages of a passenger, sorted by boarding date, as shown in Table 4. Finally, the algorithm estimates two journeys, as shown in Table 5. In the afternoon, the passenger makes a return trip, transferring between the same routes, but in the upward direction.
Similarly to the trip stage estimation model, this solution stores other feature statistics associated with each journey, in addition to those indicated in Table 4. In addition to the trip time and distance statistics, the model also calculates the number of transfers made by the passenger during the journey, the time and the total distances spent on the transfers. This new approach allows efficient resource allocation for generating journey-based matrices, dependent on statistical metrics.
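The threshold rule for separating transfers from activities can be sketched as below. The 90 min threshold is the one stated in the text; the record layout, function names and sample trip stages are hypothetical and do not reproduce Tables 4 and 5.

```python
from datetime import datetime, timedelta

ACTIVITY_THRESHOLD = timedelta(minutes=90)  # threshold used in the case study


def summarise(chain):
    """Collapse consecutive trip stages into one journey record."""
    transfer_time = sum(
        (nxt["board_time"] - prev["alight_time"] for prev, nxt in zip(chain, chain[1:])),
        timedelta(),
    )
    return {
        "origin": chain[0]["board_stop"],
        "destination": chain[-1]["alight_stop"],
        "n_transfers": len(chain) - 1,
        "transfer_time": transfer_time,
        "travel_time": chain[-1]["alight_time"] - chain[0]["board_time"],
    }


def group_into_journeys(stages, threshold=ACTIVITY_THRESHOLD):
    """Split one passenger's trip stages (sorted by boarding time) into journeys.

    A gap greater than `threshold` between alighting and the next boarding is
    treated as an activity and closes the current journey; a shorter gap is
    treated as a transfer.
    """
    journeys, current = [], []
    for stage in stages:
        if current and stage["board_time"] - current[-1]["alight_time"] > threshold:
            journeys.append(summarise(current))
            current = []
        current.append(stage)
    if current:
        journeys.append(summarise(current))
    return journeys


def _stage(board_stop, board, alight_stop, alight):
    day = "2019-10-01 "
    return {"board_stop": board_stop, "board_time": datetime.fromisoformat(day + board),
            "alight_stop": alight_stop, "alight_time": datetime.fromisoformat(day + alight)}


# Hypothetical day: an outbound journey with one transfer and the return journey.
stages = [
    _stage("A", "08:00", "B", "08:20"),
    _stage("B", "08:30", "C", "08:50"),
    _stage("C", "17:30", "B", "17:50"),
    _stage("B", "18:00", "A", "18:20"),
]
journeys = group_into_journeys(stages)
```

Storing `n_transfers`, `transfer_time` and `travel_time` with each journey mirrors the idea of computing the statistics once so that later matrix queries only need to aggregate them.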

Inference of dynamic OD matrices
One of the main contributions of this research resides in the inference and visualisation of extended dynamic OD matrices that highlight mobility patterns within areas of the city. The proposed solution aims to overcome traditional presentations centred on the distribution of volume in the network. Considering a given spatial resolution and temporal constraints (time interval and calendrical restrictions), we generate dynamic matrices able to comprehensively describe the real state of the network, through metrics such as volume, time, distances, transfers, and multimodality indicators. This functional multiplicity of matrices allows a detailed and precise identification of vulnerabilities and mobility disparities within the city. The OD matrices are modelled from three components: (i) the data source; (ii) optional filters, including temporal, spatial and user profile restrictions; and (iii) the selection of one of the possible statistics as the primary organisation criterion. The next paragraphs explain each of these modelling fields.
Firstly, the approach allows the formation of dynamic matrices based on one of two trip typologies: trip stages or journeys. Secondly, the matrices can be modelled according to time, space and passenger typology, by parameterizing: (a) a desirable time window restricting dates and times; (b) the target weekdays (weekend, working days, one or more days of the week); (c) the target user profiles; (d) the desirable entry and exit routes and stops (by default the whole transport network is considered). This last filter is only available for matrices filled with journeys, since the alighting route may not be the same as the boarding one. Thirdly, one of the available metrics (e.g., volume, time, distance or number of transfers) must be chosen to guide the organisation of the matrix (the highlighted metric), and the remaining metrics are displayed by hovering over each cell (OD pair).
The visual representation of the matrices is given through interactive heat matrices with usable zooming and selection facilities. The hue of each cell changes according to a scale that varies between the minimum and maximum value of the highlighted metric. As mentioned before, only one of the aforementioned metrics is chosen to colour the matrix, and the rest are coupled and displayed through tooltips.
At the top of the matrix and on the left side are bar charts that summarise information about the total boardings and alightings, respectively. If the highlighted metric in the OD pairs is volume, the bar charts show the sum of the passenger volumes in the rows and columns, indicating the total volume of boardings and alightings, respectively. If the highlighted metric is related to time or distance, the bar charts show the average value of the row or column, weighted by the volume of each cell.
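The marginal summaries described above can be illustrated with a minimal sketch. It assumes that each cell stores a volume and a mean value (for example, mean travel time); the function name `marginals`, the cell layout and the sample values are hypothetical and are not taken from the tool.

```python
from collections import defaultdict

def marginals(cells, axis):
    """Summarise an OD matrix along one axis.

    cells: {(origin, destination): (volume, mean_value)}  (assumed layout)
    axis:  0 groups by origin (boardings), 1 groups by destination (alightings)
    Returns {zone: (total_volume, volume_weighted_mean)}.
    """
    volume = defaultdict(float)
    weighted_sum = defaultdict(float)
    for pair, (cell_volume, mean_value) in cells.items():
        zone = pair[axis]
        volume[zone] += cell_volume
        weighted_sum[zone] += cell_volume * mean_value
    return {
        zone: (volume[zone], weighted_sum[zone] / volume[zone] if volume[zone] else None)
        for zone in volume
    }

# Hypothetical cells: (volume, mean travel time in minutes)
cells = {("A", "A"): (100, 10.0), ("A", "B"): (50, 40.0), ("B", "A"): (10, 30.0)}
boardings = marginals(cells, axis=0)
alightings = marginals(cells, axis=1)
```

In this toy example, the high-volume, short A-A cell pulls the weighted mean for origin A down to 20 min even though the A-B cell averages 40 min, which is the masking effect that volume weighting can produce.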

Tool for OD inference with trip and transfer status
With the support and validation of the major public bus operator in the city of Lisbon, CARRIS, a robust tool was developed for the guided parameterization and usable visualisation of the proposed dynamic OD matrices. The software application allows the specification of several filters, including temporal filters (time windowing, calendar selections), spatial granularities (TAZ, parishes, neighbourhood sections, stops), user profile filters and trip typologies, amongst others. Figure 3 provides a snapshot of the parameterization board. The visualisation of OD matrices satisfies strict usability requirements, incorporating zooming, navigation, and export facilities. Both heat matrix and heatmap visualisations are supported, as well as statistical reports for summarization and background checks.
All the described algorithms and the graphic interface are implemented in Python, including alighting stop inference, journey identification, and OD matrix inference. After pressing "Run Query", an OD matrix is displayed on the graphic interface once the software has executed the following steps: (i) it retrieves the information from each field of the interface; (ii) it completes a query with the extracted information and executes it on the database (PostgreSQL); (iii) a table is returned in which each row contains boarding and alighting information together with trip and transfer features; (iv) for each indicator (e.g., travel time), an OD matrix is inferred from the table (e.g., the mean travel time is calculated for each OD pair); and finally, (v) all matrices are displayed in one heat matrix, where the indicator chosen in the "Highlighted statistic" field colours the heat matrix and the remaining indicators are shown by hovering over a cell (OD pair). To render graphic visualizations (e.g., heat matrix, violin plot, maps), the app uses the Plotly.js library.
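Steps (iii) and (iv) can be sketched as follows. This is an illustration under stated assumptions rather than the tool's implementation: the row layout, the column names (`origin`, `destination`, `travel_time_min`, `n_transfers`) and the function `infer_od_matrices` are hypothetical, and the sample rows are invented.

```python
from collections import defaultdict
from statistics import mean

def infer_od_matrices(rows, indicators):
    """Build one OD matrix per indicator from a table of trips.

    rows:       list of dicts, one per trip (assumed shape of the query result)
    indicators: names of numeric columns to average per OD pair
    Returns {indicator: {(origin, destination): value}}, plus a "volume" matrix.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["origin"], row["destination"])].append(row)

    # Volume is the number of trips observed for each OD pair.
    matrices = {"volume": {pair: len(trips) for pair, trips in grouped.items()}}
    # Every other indicator is the mean of its column within the OD pair.
    for name in indicators:
        matrices[name] = {
            pair: mean(trip[name] for trip in trips) for pair, trips in grouped.items()
        }
    return matrices

# Hypothetical rows standing in for the table returned by the database
rows = [
    {"origin": "Benfica", "destination": "Benfica", "travel_time_min": 12.0, "n_transfers": 0},
    {"origin": "Benfica", "destination": "Benfica", "travel_time_min": 16.0, "n_transfers": 1},
    {"origin": "Benfica", "destination": "Marvila", "travel_time_min": 40.0, "n_transfers": 2},
]
matrices = infer_od_matrices(rows, ["travel_time_min", "n_transfers"])
```

Keeping all indicator matrices keyed by the same OD pairs makes it straightforward to colour the heat matrix by one indicator and attach the others to each cell's tooltip.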

Modelling results
This section validates the relevance of the proposed contributions using Lisbon's public transport network as the study case. We want to assess whether the crosscutting dynamic views over the available statistics can reveal knowledge and guide the public transport service. In particular, we explore several scenarios, including age group distribution, intra-city disparities, and connectivity between regions. Since CARRIS, the target bus operator, has an entry-only system, the inferred alighting stops must be validated to verify whether the outcomes of the models discussed in Sects. 4.1 and 4.2 satisfy the principles mentioned in Sect. 2. For this purpose, we adopt the validation method commonly used in the literature, which is a sensitivity analysis of the percentage of trips that fulfil certain criteria. Most studies use a distance threshold on the transfer as the criterion to validate the alighting stop. Therefore, using this method, the following results (Figs. 4 and 5) assess the robustness of the algorithm for alighting estimation. Next, we show the contributions of this research through the analysis of OD matrices for October 2019.
Figure 4 shows the distribution of transactions according to the outcome of the alighting stop estimation. The blue bar (11.6%) indicates the percentage of transactions with no estimated alighting stop. These transactions are not linked with other transactions, so their alighting information remains unknown. The orange bar shows that 11.1% of the transactions were chained with other transactions but the estimated output was not valid, since the inferred alighting bus stop is more than 1000 m away from the boarding stop of the subsequent transaction. The last bar, in green, indicates the success percentage: 77.3% of the input transactions were assigned a valid alighting stop. These transactions therefore compose the data source for modelling matrices based on trip stages.
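The three categories reported in Fig. 4 can be expressed as a simple classification rule. The sketch below assumes that each transaction carries the distance between its inferred alighting stop and the boarding stop of the next transaction, or `None` when it could not be chained; the function names and the sample distances are hypothetical, and only the 1000 m threshold comes from the text.

```python
from collections import Counter

MAX_TRANSFER_DISTANCE_M = 1000  # validity threshold stated in the text

def classify(distance_to_next_boarding_m):
    """Label one transaction by the outcome of its alighting estimation."""
    if distance_to_next_boarding_m is None:
        return "unlinked"   # no subsequent transaction to chain with
    if distance_to_next_boarding_m > MAX_TRANSFER_DISTANCE_M:
        return "invalid"    # chained, but inferred stop is too far away
    return "valid"

def outcome_shares(distances):
    """Percentage of transactions in each outcome category."""
    counts = Counter(classify(d) for d in distances)
    total = len(distances)
    return {label: 100 * counts[label] / total for label in ("unlinked", "invalid", "valid")}

# Hypothetical distances in metres (None means the transaction is unlinked)
sample = [None, 0, 120, 450, 1500, 80, None, 300, 990, 2000]
shares = outcome_shares(sample)
```

Varying `MAX_TRANSFER_DISTANCE_M` and recomputing the shares is one way to carry out the sensitivity analysis mentioned above.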
Figure 5 shows the percentage of trip stages (transactions with an estimated alighting stop) per interval of walking distance covered after alighting at the estimated stop. The results agree with the principle that passengers tend to walk distances that are as short as possible, both in transfers and after arriving at their destination. The first bar of the chart does not correspond to an interval; it is intended to highlight that, for 14.4% of trip stages, the destination stop is the same as the boarding stop of the subsequent trip stage. Another observation that corroborates this principle is that the percentage of trip stages decreases as the walking distance interval increases. Observing the accumulated percentage, we verify that for 91.30% of the trip stages the walking distance on transfers is less than 500 m; the remaining percentage is residual and spread across the remaining intervals. Figure 6 describes the distribution of journeys according to the number of transfers. In short, 72.5% of the journeys have no transfers, 21.8% have one transfer, and the remaining percentage is residually distributed over two or more transfers. These results corroborate the assumption that passengers prefer to walk as little as possible and, consequently, to reach the final destination without transfers.
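A distribution of the kind shown in Fig. 5 can be computed as sketched below, with a dedicated first bin for a walking distance of exactly 0 m (same stop) followed by distance intervals. The bin edges, function name and sample distances are assumptions chosen for illustration; the paper does not list the exact intervals used.

```python
from bisect import bisect_left
from itertools import accumulate

def walking_distance_histogram(distances_m, edges_m):
    """Percentage of trip stages per walking-distance bin.

    Bins: exactly 0 m, then (0, e1], (e1, e2], ..., and a final bin above the last edge.
    """
    counts = [0] * (len(edges_m) + 2)
    for d in distances_m:
        if d == 0:
            counts[0] += 1                       # same stop as the next boarding
        else:
            counts[1 + bisect_left(edges_m, d)] += 1
    total = len(distances_m)
    return [100 * c / total for c in counts]

# Hypothetical walking distances (metres) and bin edges
distances = [0, 0, 50, 100, 120, 300, 480, 500, 700, 90]
edges = [100, 250, 500]
percentages = walking_distance_histogram(distances, edges)
cumulative = list(accumulate(percentages))
```

The cumulative list is what supports statements such as "the walking distance is less than 500 m for a given share of trip stages".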

Dynamic origin destination matrix analysis
In this section, we present the results from the exploration of dynamic OD matrices, which go beyond the analysis of the volumetric distribution. Our proposal aims to express a complete and detailed understanding of the reality of dynamic traffic and patterns in a city through several variables.
The figures are taken directly from the output of the application used to visualise OD matrices under different conditions. All the matrices are based on journeys, and the parishes of Lisbon are selected as the default spatial granularity. Parishes are the geographical (administrative) divisions of the city of Lisbon; they allow fewer rows and columns to be displayed in the matrices, yielding a simpler and more understandable visualisation. The matrix rows correspond to boarding parishes and the columns to alighting parishes.

Volumetric OD matrix for user profiles
Studies show that policies and restructuring of public transport services targeting age groups such as the young and the elderly can offer freedom and independence in their mobility and implicitly generate a positive impact on their lifestyles [41][42][43]. Therefore, Fig. 8 illustrates the potential of the OD matrices to assess the traffic dynamics of specific age groups, in this case the elderly. Furthermore, the results of Fig. 7 show that the distribution of demand of the elderly age group is slightly different from that of other age groups. According to Fig. 7, elderly individuals (green wave) concentrate their travel from 10 to 11 am and from 4 to 5 pm. These results are in line with evidence from the study by Szeto et al. [44], which indicates that older people choose 10 am to 11 am to avoid overcrowded public transport (as shown in our results by the peaks in the pink and blue waves, which represent the density of entries in the network for the adult and young age groups, respectively).
The highlighted statistic (which determines the hue of the cells) in the matrices of Fig. 8 is the volume flow. Additionally, at the top and on the right side, the bar charts indicate the total volume of boardings and alightings on the network, respectively. The left matrix corresponds to the period between 7 and 9 am and the right matrix to the period between 9 and 11 am on October 2nd (Wednesday). The cell shading across the matrices of Fig. 8 reaffirms that there is higher traffic of elderly passengers between 9 and 11 am. It is also easily observed that the largest volumes occur in the following OD pairs: Benfica-Benfica; Marvila-Marvila; São Domingos de Benfica-São Domingos de Benfica, i.e., internal mobility (within parishes) is higher than mobility between different parishes. This matches the number of elderly residents per parish described in the 2011 decennial census, in which the parishes with the highest number of elderly residents are São Domingos de Benfica, Benfica and Marvila.
In both matrices, the cells (parish-parish OD pairs) with the highest volume are those representing traffic within the same parish. This pattern indicates that older people prefer to travel shorter distances and within the same parish. Furthermore, the results of this figure corroborate evidence from the study by Wong et al. [45], which states that shortening walking and waiting times and improving seat availability can increase the probability of the elderly making a trip. Therefore, public transport can be a means to promote more active lifestyles for the elderly. Inspecting the detailed information of some cells by hovering over them reveals other valuable information. For instance: (i) the average transfer distance is low, ranging between 4 and 50 m; (ii) the trip distance and travel time are relatively short, 1.7 to 2.9 km and 13.4 to 15 min, respectively; (iii) the average number of transfers ranges between 0.4 and 0.5, and the total number of trips with transfers ranges between 71 and 95. The latter metric indicates that around 42% to 47% of trips require a transfer. We conclude that the connectivity between locations inside the parish could be improved to benefit the elderly group by dedicating mobility services, such as neighbourhood routes.
Figure 9 shows the average number of transfers between origin-destination pairs in a 24 h period on October 9. We zoomed in on two cells that show vulnerability in OD pairs. According to Fig. 9, the cell with the highest mean number of transfers is the pair Penha de França-Santa Clara (entry-exit). The detailed information on the cell shows that it takes on average 2 transfers to move between Penha de França and Santa Clara. Moreover, the average transfer time (124 min) and the average trip time (66.7 min) are extremely high. Despite the low number of trips, the results show that the connectivity between these parishes is markedly worse than in the rest of the network.
The second zoomed cell, at the bottom, shows the detailed information about Olivais to Olivais (traffic within the parish). At first sight, the highlighted indicator (average number of transfers) seems low, corresponding to 0.4 (close to 0 transfers). However, if we observe the exact values of the volume of trips and the volume of transfers, 1926 and 764, respectively, we verify that approximately 40% of the trips required one or more transfers within the same parish. These findings reveal that the public transport network can be improved within the parish of Olivais to enable better user-place connectivity. Indeed, Suman et al. [46] argue that decreasing transfers is the key to encouraging bus transport use and further state that improving connectivity saves users travel time.
… the average time from a given origin to any point in the network, respectively. This indicator and the set of visuals identify areas of the network whose accessibility, in terms of average travel time, is higher or lower. For instance, the bar charts in Fig. 10a show that the parish with the longest average arrival and departure times is Santa Clara, with 21.9 min and 20.1 min. This average departure time seems reasonable; however, it is weighted by the volume of each cell, and therefore OD pair cells with higher volume and low average time may be hiding critical cases of OD pairs with lower flow but higher travel times. In fact, the detailed information on the cells in Fig. 10f, g, h and i reinforces the evidence of poor accessibility for entries to and exits from the Santa Clara parish. The zoomed cells (f) and (g) in Fig. 10 show the OD pairs Santa Clara-Avenidas Novas and Santa Clara-Santa Clara, with volumes of 127 and 1710 and average travel times of 32 and 20 min, respectively, which seem moderate travel times.
However, if we add the average transfer time to the travel time, the total time spent on the trips for each OD pair is 53 and 35 min for average trip distances of 7.6 km and 3.1 km, respectively. These statistical indicators show strong evidence that Santa Clara must be a target for new route modelling. The same scenario, with differences in volume and time, also occurs in the zoomed cells (h) and (i), where the alighting parish is Santa Clara.
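The arithmetic behind this observation can be reproduced with a short sketch. The transfer times of 21 and 15 min used below are not reported directly; they are inferred as the difference between the stated totals (53 and 35 min) and the stated travel times (32 and 20 min). The function name and the door-to-door speed indicator are illustrative additions, not metrics produced by the tool.

```python
def door_to_door(travel_min, transfer_min, distance_km):
    """Total time (min) and average door-to-door speed (km/h) for an OD pair."""
    total_min = travel_min + transfer_min
    speed_kmh = distance_km / (total_min / 60)
    return total_min, speed_kmh

# Santa Clara-Avenidas Novas: 32 min travel, 21 min transfer (inferred), 7.6 km
total_an, speed_an = door_to_door(32, 21, 7.6)
# Santa Clara-Santa Clara: 20 min travel, 15 min transfer (inferred), 3.1 km
total_sc, speed_sc = door_to_door(20, 15, 3.1)
```

Under these assumptions, both pairs have an average door-to-door speed below 9 km/h, which gives a sense of why the totals point to an accessibility problem even though the travel times alone look moderate.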

OD matrix for calendrical periods
The highlighted metric in the OD matrices presented in Fig. 11 is the average daily volume in different contexts. The top matrix corresponds to a working week (7 to 11 October) and the matrix below represents a weekend (12 to 13 October). The scale of the matrices ranges from 0 to 6000, and the scale of the bar charts ranges from 0 to 17,000. For both matrices, the cells with the highest daily volumes are shown zoomed at the sides of Fig. 11.
As expected, the average daily volume on weekdays is higher than at the weekend. In both matrices, the OD pairs with higher volume correspond to traffic within parishes with higher resident density. On working days, the OD pair Santa Maria Maior-Santa Maria Maior has the fourth-highest average daily volume. At the weekend, however, Santa Maria Maior becomes the parish with the highest internal traffic and the most daily entries. This situation may be explained by the tourist flows in Lisbon, since this parish corresponds to Lisbon's historical centre.
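The average daily volume per OD pair for a calendrical filter can be sketched as follows. The journey tuples, zone labels and function name are hypothetical. As a simplification, the divisor is the number of distinct days that actually appear in the filtered data, so a day with no journeys at all would not be counted.

```python
from collections import Counter
from datetime import date

def average_daily_volume(journeys, weekend):
    """Average daily volume per OD pair for weekdays or weekends.

    journeys: iterable of (date, origin, destination)  (assumed layout)
    weekend:  True keeps Saturday/Sunday, False keeps Monday-Friday
    """
    selected = [(d, o, dest) for d, o, dest in journeys if (d.weekday() >= 5) == weekend]
    n_days = len({d for d, _, _ in selected})      # distinct days observed
    counts = Counter((o, dest) for _, o, dest in selected)
    return {pair: n / n_days for pair, n in counts.items()}

# Hypothetical journeys in the week of 7-13 October 2019
journeys = [
    (date(2019, 10, 7), "A", "A"), (date(2019, 10, 7), "A", "A"),
    (date(2019, 10, 8), "A", "A"), (date(2019, 10, 8), "A", "B"),
    (date(2019, 10, 12), "A", "A"), (date(2019, 10, 13), "A", "A"),
    (date(2019, 10, 13), "A", "A"), (date(2019, 10, 13), "A", "A"),
]
weekday_avg = average_daily_volume(journeys, weekend=False)
weekend_avg = average_daily_volume(journeys, weekend=True)
```

Normalising by the number of days is what makes a five-day working week comparable with a two-day weekend on the same colour scale.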

Research limitations
In the course of this work, two limitations were identified, and they are highlighted as relevant directions for future work. First, the proposed solution for alighting time inference can be further refined in the presence of traffic congestion. To calculate the alighting time, our approach adds the trip duration to the boarding time. Instead of using GTFS files, the trip duration is gathered from a file made available by CARRIS (the bus operator) that contains more precise information on the routes, including the times and distances between the stops of a given route. Additionally, the algorithm checks whether the alighting time occurs before the next transaction. If this constraint is violated, the alighting time is set to the boarding time of the next transaction. We believe that, in future work, pre-processing all boarding timestamps to map the alighting timestamp of each route is a robust approach to solve the current problem.
Second, since the CARRIS network has more than 2000 stops, interpreting an OD matrix with this dimensionality becomes impractical. The optimal solution would be an algorithm able to automatically detect motifs, relationships, outliers, clusters, and other relevant patterns in the OD matrix. Therefore, in future work, we intend to pursue this line of investigation, including the application of machine learning algorithms (e.g., clustering algorithms).
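The alighting time rule discussed under the first limitation can be summarised in a few lines. This is a minimal sketch of the rule as described in the text; the function name, argument names and sample timestamps are hypothetical, and the trip duration is assumed to have already been looked up from the operator's route file.

```python
from datetime import datetime, timedelta

def infer_alighting_time(boarding, trip_duration, next_boarding=None):
    """Alighting time = boarding time + trip duration, capped by the next boarding.

    If the estimate would fall after the passenger's next transaction,
    the alighting time is set to that next boarding time instead.
    """
    alighting = boarding + trip_duration
    if next_boarding is not None and alighting > next_boarding:
        return next_boarding
    return alighting

boarding = datetime(2019, 10, 2, 8, 0)
# Constraint satisfied: estimate precedes the next transaction
ok = infer_alighting_time(boarding, timedelta(minutes=20), datetime(2019, 10, 2, 8, 30))
# Constraint violated: estimate is replaced by the next boarding time
capped = infer_alighting_time(boarding, timedelta(minutes=40), datetime(2019, 10, 2, 8, 30))
```

The capped case is where congestion matters most, because a scheduled duration cannot reflect the delay actually experienced on the vehicle.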

Conclusions
The reported research offers a new approach for the analysis of passengers' flow behaviour and inference of dynamic OD matrices. We propose alighting stop inference models over the passengers' paths in the absence and presence of multimodal views, offering the possibility to parameterize maximum walking distances and waiting times on route transfers, extending classical assumptions, and further addressing statistical indicators.
Furthermore, the proposed approach for inferring OD matrices yields four unique contributions. First, we allow inference to consider multimodal commuting patterns, detecting individual trips undertaken with different operators. Second, we support dynamic OD matrix inference with parameterizable time intervals and calendrical rules, and further support the decomposition of traffic flows according to the user profile. Third, we allow parameterization of the desirable spatial granularity and visualisation preferences. Fourth, our solution efficiently computes several statistics that support OD matrix analysis, helping with the detection of vulnerabilities throughout the transport network. More specifically, statistical indicators related to travellers' functional mobility needs (commuters for working purposes, etc.), walking distances and trip durations are supported. The inferred dynamic OD matrices are the outcome of a software tool developed under strict usability requirements.
Results from the case study, using data gathered from the two main public transport operators in the city of Lisbon (bus and metro), show that 77.3% of alighting stops can be estimated with a high degree of confidence from bus smart-card data. Since the analysis of patterns showed that nearly 27.5% of the journeys within Lisbon's transportation network require one or more transfers, the inferred OD matrices allowed the identification of vulnerabilities in the network, offering the public bus operators in Lisbon new knowledge and a means to better understand dynamics and validate OD assumptions.
The dynamic OD matrices explored within the scope of this investigation showed relevant patterns, including evidence of the predominance of flows within parishes among the elderly. Factors such as travel time, transfer time and number of transfers show that there are significant intracity disparities, with Santa Clara being one of the parishes with significant vulnerabilities regarding connectivity and accessibility. The research findings are actionable, offering operators and municipalities the opportunity to pursue their efforts towards sustainable mobility.