Strategies for the Use of Data and Algorithm Approaches in Railway Traffic Management

A Railway Traffic Management problem can be defined as forecasting the future progression of trains, identifying conflicts where two or more trains compete for available infrastructure, investigating options for resolution of conflicts, and re-planning train schedules to minimise the impact on system performance. Performance management of complex networks is a problem common to a number of industries and applications. There has been much work over many decades on modelling the generation and optimisation of railway timetables. Much of this focuses on relatively simple railways and services and is therefore quite straightforward. Main line railways have a number of features that introduce significant complexity. Traditionally the problem of re-planning a timetable in near real time to manage and recover from service perturbations and disruption is simplified to help arrive at a solution in an acceptable amount of time, but this can have unintended consequences which amplify rather than reduce the disruption in the network. Resonate are interested in looking at different strategies / models / techniques for dealing with the problem, the likely strengths and risks of these, and how they might be adapted to improve existing solutions. The study group participants undertook a brief survey of recent literature on modelling train delays and found machine learning approaches, network models and a statistical approach to defining the efficiency of a station in dissipating delays which are worthy of further consideration. We then explored a total of nine modelling approaches during the study group. The approaches fell broadly into two groups: those that sought to understand the propagation of delays (Approaches 1 to 6) and those that sought to offer strategies for minimising delays (Approaches 8 and 9). Approach 7 proposes a way of understanding the propagation of delays and using that to evaluate candidate policy decisions.
There are a number of promising approaches here which provide useful lines of enquiry, many suitable for expansion beyond the simple railways modelled, to include variable train speeds, junctions and intersections, temporal differences in usage, such as tidal flows in and out of cities, and resource constraints.


Summary
Resonate are interested in looking at different strategies / models / techniques for dealing with the problem of rescheduling a railway timetable when it is unexpectedly disrupted, the likely strengths and risks of these, and how they might be adapted to improve existing solutions. Nine different modelling approaches were considered, informed by literature on machine learning, network models and a statistical approach to defining the efficiency of a station in dissipating delays. They fell broadly into two groups: those that sought to understand the propagation of delays and those that sought to offer strategies for minimising delays.
(1.1) There has been much work over many decades on modelling the generation and optimisation of railway timetables. Much of this focuses on relatively simple railways and services and is therefore quite straightforward. Main line railways have a number of features that introduce significant complexity:
• Routes with many junctions and intersections
• Variable train speeds
• Variable train stopping patterns
• Mixed passenger and freight traffic
• Peak patterns / off peak patterns
• Tidal flows in and out of cities for the working day
• Constraints from resources (trains and crew)
As the volume of traffic in the system increases, perturbations in the actual progress of trains can introduce unstable behaviour that poses a significant challenge for the compromise between performance and capacity. Traffic management systems are therefore needed to forecast the likely future progress of trains, identify conflicts, and modify the planned schedule of trains to minimise the resulting disruption and accelerate recovery back to the planned service.
Resonate is a technology company specialising in rail and connected transport solutions. We have a powerful platform and an excellent team that is helping us to support numerous elements of the Digital Railway initiative being driven by Network Rail and the DfT. We are also working hard to help deliver intelligent traffic management and smarter cities internationally. We have over 50 years of rail industry experience and used to be the research division of British Rail before being privatised in 1996. Our understanding spans safety critical signalling control, rail operations management, logistics and IT. We have combined our rail background knowledge with agile development methods, data gathering, advanced algorithms and the latest cloud computing, so that we have the tools to deliver 21st century traffic management.
In 2016 we changed our name from DeltaRail to Resonate in recognition of the fact that we are entering a new and demanding age of connected and intelligent transport. We have embarked on a drive to maximise capacity and performance through predictive intelligence, shared data, joined up travel and informed customer journeys.
2 Problem statement
(2.1) A Railway Traffic Management problem can be defined as:
• Forecasting future progression of trains
• Identifying conflicts where two or more trains compete for available infrastructure
• Investigating options for resolution of conflicts
• Re-planning train schedules to minimise the impact on system performance
Traditionally the problem of re-planning a timetable in near real time to manage and recover from service perturbations and disruption is simplified to help arrive at a solution in an acceptable amount of time, but this can have unintended consequences which amplify rather than reduce the disruption. We would be interested to look at different strategies / models / techniques for dealing with the problem, the likely strengths and risks of these, and how they might be adapted to improve existing solutions. Performance management of complex networks is a problem common to a number of industries and applications, and therefore it is likely that approaches already exist that could be adapted for use in rail. Detailed data on planned and actual services over many months is available to support the workshop.

Constraints
A number of constraints can complicate the timetable optimisation and have a significant impact on the robustness of a solution:
• Availability of infrastructure: typically during disruption, the level of infrastructure available is reduced, either by a failure of the equipment, or by a blockage caused by train failures or human intervention.
• Availability of rolling stock (their location, and the fact that there are various different types of train, with differing requirements for maintenance and capacity).
• Availability of staff (location, ability to drive certain trains & routes, shift hours).

Costs & Benefits
The primary benefit of improved railway performance is of course to the passengers, freight operators and the UK economy. In addition to this:
• The Rail Regulator can fine Network Rail for failure to meet performance targets: the last fine was £53m in July 2014.
• There is a delay attribution regime in place between Network Rail and Train Operators where the originators of delay pay compensation to those who suffer delay: typically £100m can change hands across the industry in a year.
• There are additional costs for reimbursing customers for delays or reduced services.
• Customer service is important, often measured indirectly through the number of trains calling at each station compared to planned operation.
There is therefore also a good business case for minimising train delays.
3 The solution
(3.0.2) We undertook a brief survey of recent literature on modelling train delays and found machine learning approaches, network models and a statistical approach to defining the efficiency of a station with respect to delays. We explored a total of nine distinct modelling approaches during the study group. The approaches fell broadly into two groups: those that sought to understand the propagation of delays (Approaches 1 to 6) and those that sought to offer strategies for minimising delays (Approaches 8 and 9). Approach 7 proposes a way of understanding the propagation of delays and using that to evaluate candidate policy decisions.

Approaches in Literature
(3.1.1) Primary delays are caused by unexpected stochastic events in the system (e.g. technical faults, prolonged alighting/boarding times, adverse weather conditions etc.); these can have a knock-on effect to create secondary delays on other services. The propagation of primary delays depends significantly on railway timetabling and infrastructure, and they can have an impact on services both spatially and temporally far from the origin. This makes the prediction of secondary delay challenging, but numerous approaches have been considered in recent literature. An overview of recovery models and algorithms for real-time railway rescheduling can be found in [9]. Rescheduling models for railway traffic management in large-scale networks can be found in [10]. These papers contain a great many references to other relevant literature.
(3.1.2) Modelling train delays with q-exponential functions is used in [11] to provide an efficiency score for each station. q-exponentials are defined as
$$e_{q,b,c}(t) = c\,\bigl(1 + b(q-1)t\bigr)^{1/(1-q)},$$
and the parameters were estimated using nonlinear least squares. Here q measures the deviation from an exponential distribution, so an estimated q larger than unity indicates a long-tailed distribution. 80% of the trains recorded t = 0, indicating a delay of less than 1 minute, so this model represents the conditional probability of delay given that the train is delayed by 1 minute or more. The waiting time distribution is assumed to be given by a Poisson process, $P(t|\beta) = \beta e^{-\beta t}$, with β allowed to fluctuate to describe the temporal variations in the rail network due to weather, holidays, signal failures etc., which are the degrees of freedom in the model. When β is small, long delays are more frequent. From the model, the average contribution of each degree of freedom is estimated from the fitted value of b, giving a statistic $\chi_i^2 = \tfrac{1}{2}(q-1)b$ which is large when a local station is doing well, i.e. the local exponential decay of the delay times is as fast as it can be. Stations with the same q (external degrees of freedom in the network) can usefully be compared. This analysis showed that Cambridge and Edinburgh were the best performing busy stations under this criterion.
Machine learning techniques have enabled the efficient analysis of large data sets for accurate prediction. Several instances of applications of these techniques to train delay can be found in the literature. [12] use support vector regression to identify the relationship between various system characteristics and train delay; [14] use artificial neural networks to predict delay, achieving high accuracy in an application to Iranian railways.
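As a minimal sketch of the quantities in this paragraph, the following evaluates the standard q-exponential form $c(1 + b(q-1)t)^{1/(1-q)}$ and the statistic $\tfrac{1}{2}(q-1)b$. The function names and the parameter values are illustrative assumptions and are not taken from [11]; no fitting is performed here.

```python
import math

def q_exponential(t, q, b, c=1.0):
    """Standard q-exponential c * (1 + b*(q-1)*t) ** (1/(1-q)).

    For q -> 1 this tends to the ordinary exponential c * exp(-b*t).
    """
    if abs(q - 1.0) < 1e-12:
        return c * math.exp(-b * t)
    base = 1.0 + b * (q - 1.0) * t
    if base <= 0.0:
        return 0.0  # outside the support (can only happen for q < 1)
    return c * base ** (1.0 / (1.0 - q))

def station_statistic(q, b):
    """The per-degree-of-freedom statistic 0.5 * (q - 1) * b from the text."""
    return 0.5 * (q - 1.0) * b

# Made-up parameters, purely to compare the tail with a plain exponential.
for t in (0, 5, 20, 60):
    print(t, q_exponential(t, q=1.2, b=0.1), math.exp(-0.1 * t))
```

With q above unity the q-exponential decays more slowly than the exponential with the same b at large t, which is the long-tailed behaviour described above.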
A major flaw in these approaches for our application can be the computational time required for the analysis of very large data sets. [13] propose a fast learning algorithm based on the 'Extreme Learning Machine', which can extract relevant information quickly to make accurate predictions about future network states; they show that the method can improve the current prediction systems implemented in Italian railway networks. There is a substantial amount of literature on railway simulation using Petri nets.
Here is a selection of papers concerned with this topic. Articles dealing with safety aspects: [27], [16]. An article dealing with conflict forecasting and delay minimisation: [17]. Articles dealing with train scheduling and its optimisation: [18], [19], [20], [21]. Several articles have focussed on 'on-the-fly' responses to a primary delay, a small selection of which are provided here. In [25], the inputs to a discrete Petri net model are 'fuzzified' to create a probabilistic prediction of delay propagation; success is shown from testing the model on part of the Belgrade railway node. [22] and [23] similarly model delay propagation probabilistically by considering stochastic Petri nets. See also [24] and [26].
(3.1.1) HLPNs (high-level Petri nets) have been used as models of railway networks for multiple purposes. A large body of the literature focuses on optimisation of timetabling, which occurs several months ahead of dispatch. Several studies, however, have focused on 'on-the-fly' responses to a primary delay; a small selection is provided here. Milinkovic et al. (2013) 'fuzzify' the inputs to a discrete Petri net model to create a probabilistic prediction of delay propagation. They show success from testing the model on part of the Belgrade railway node. Caetano and Teixeira (2014) similarly model delay propagation probabilistically by considering stochastic Petri nets. [24] build a prototype tool for identification of conflicts and estimation of knock-on delay, tested, with success, on the Dutch railway network.
3.2 Approach 1: Toy model
The idea is to create a very simple model in order to simulate a train line at the tactical level. We build an exclusive interacting particle system on a network which represents the line.
(3.2.2) The network is made of three fundamental components:
Straight: this is a piece of track between two signals. For the purpose of the simulation, the time step is the time needed for a train to move from one berth to the next, and it is set at 2 minutes. The time a train stops at a station is also set at 2 minutes. Trains may only proceed to the next node if the next node is not occupied by another train.
Split: this is a berth followed by a signal that involves a decision to send the train to one of n possible following berths. In the network we consider here, n = 2 always. This is realistic because points/switches on the railway can only take two positions; more complex junctions are achieved by having sets of points in series.
Join: this is a berth preceded by a signal that involves the decision to allow one of the trains on the previous n = 2 berths to proceed to the next one and to stop all the others.
(3.2.3) Using these simple components other structures can be built:
Station: this is a combination of a split, n = 2 straights, and a join.
Overtake track: this is similar to a station, but between the split and the join the two tracks have a different number of straights. We are not considering this structure in our simple model since overtaking can also happen in a station.
Figure 2 shows all the fundamental components of the network and the sample train network that we will use for our simulations.
(3.2.4) Important remarks: For coding purposes, in the network above berths are represented by nodes. Signals are at the end of nodes, as they would be in a real berth. We consider two different kinds of trains:
Fast trains: scheduled to go from (1) or (2) to (17) without stopping at the stations;
Slow trains: scheduled to stop at every station.
In our model, stations correspond to the sets of berths (5,6), (10,11), and (14,16). Both types of trains move at the same speed and fast trains have priority at splits and joins; this reflects the fact that, often, the speed of a train is constrained by the maximum speed permissible on the track, rather than by its intrinsic speed limit. Moreover we assume that all berths have the same length. Other kinds of trains could be added to the model later. For example, one could consider trains which only stop at the station (10,11).
(3.2.5) How does the model work? Trains are injected in nodes (1) or (2). For now the frequency is fixed, depending on how much we want to saturate the network. We assume that trains in node (17) always disappear. The directed network is given as an adjacency matrix. It is then necessary to assign transition probabilities to each berth, i.e. to apply a transition matrix to the whole system. The transition matrix is different at every time step and needs to be recalculated in a particular order, starting from the terminal node: once we know what the train in node (17) does, we decide what the train in node (16) can do; after that we can decide what will happen to trains in (14) and (15), and so on.
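The backward update order can be illustrated on a much simpler network than the one in Figure 2. The sketch below is an assumption-laden simplification, not the study group code: it uses a single line of berths with no splits or joins, deterministic moves, and a hypothetical `held` argument to mark trains that must wait (for example while dwelling at a station).

```python
def step(berths, inject=None, held=()):
    """Advance a single line of berths by one time step.

    berths[i] is a train label or None. Berths are processed from the terminal
    berth backwards, so a train may enter a berth vacated in the same step.
    `held` contains indices of berths whose train must stay put this step.
    """
    new = list(berths)
    new[-1] = None  # a train in the terminal berth always leaves the network
    for i in range(len(new) - 2, -1, -1):
        can_move = new[i] is not None and i not in held
        if can_move and new[i + 1] is None:  # exclusion: next berth must be free
            new[i + 1], new[i] = new[i], None
    if inject is not None and new[0] is None:
        new[0] = inject  # inject a new train only into an empty first berth
    return new

line = ["A", "B", None, "C"]
print(step(line))            # trains close up behind the departing train
print(step(line, held={1}))  # a held train also blocks the train behind it
```

Processing berths in the opposite order would leave a gap behind every moving train, which is why the order of the update matters.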
(3.2.6) Join nodes require particular attention. To decide the strategy at these points, all nodes preceding the join must be processed together. In general, the decision is made by the following algorithm:
```python
def join_decision(join_occupied, waiting):
    """Return the train allowed into the join berth, or None if all must stop.

    `waiting` holds the trains in the berths immediately before the join; the
    dict keys 'fast', 'delay' and 'has_stopped' are illustrative names.
    """
    if join_occupied:
        return None  # all the trains immediately before receive the signal to stop
    if len(waiting) > 1:
        fast = [t for t in waiting if t["fast"]]
        if fast:
            # a single fast train goes first; among several, the one with the
            # higher delay (here, the higher time since injection) goes first
            return max(fast, key=lambda t: t["delay"])
        # otherwise the most delayed train that has already stopped goes first
        stopped = [t for t in waiting if t["has_stopped"]]
        return max(stopped, key=lambda t: t["delay"]) if stopped else None
    return waiting[0] if waiting else None  # a lone train can proceed
```
How does this address the problem? When we simulate the system while filling it below a certain capacity level, trains are not expected to interfere with one another. In this case it is trivial to calculate the time it would take a train to complete the entire journey. On the other hand, by running the simulation many times at different capacities we can record the distribution of the time taken for trains to complete the whole journey. Taking the mean time or another suitable statistic, we can produce a timetable. Once a timetable has been established, we could then study how delays propagate in the network and answer questions such as: What happens when we are near full capacity? Which delays considerably affect the performance of the network? What happens if we introduce injection delays and ejection delays? What happens if we make a train stop at one station for an unknown amount of time? We can change the join and split rules to explore different scenarios and study how these affect the network. For example, knowing that a fast train is delayed could trigger the action of making slow trains wait in previous stations. This could be relevant when considering trains having different speeds.
Results to date
Our network is programmed to favour fast and delayed trains. To do so we set the following rules for a train to move forward:
• Fast trains always have priority over slow trains.
• If both trains have the same type and the same amount of delay (which could be zero) the probability of moving forward is 50%.
• If both trains have the same type and one of them is more delayed than the other one, the most delayed train moves first.
• In stations one platform has preference. The second platform only gets used if the first one is taken.
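The four rules above can be written down compactly. This is a minimal sketch under stated assumptions: each train is represented by a dict with the illustrative keys `fast` and `delay`, and the function names are hypothetical rather than those of the study group code.

```python
import random

def choose_train(a, b, rng=random):
    """Return whichever of two competing trains moves forward first."""
    if a["fast"] != b["fast"]:
        return a if a["fast"] else b  # fast trains always have priority
    if a["delay"] != b["delay"]:
        return a if a["delay"] > b["delay"] else b  # same type: most delayed first
    return a if rng.random() < 0.5 else b  # same type, same delay: 50% each

def choose_platform(platform_occupied):
    """Return the index of the first free platform, or None if all are taken.

    Platform 0 is the preferred one; later platforms are used only if the
    earlier ones are occupied.
    """
    for index, occupied in enumerate(platform_occupied):
        if not occupied:
            return index
    return None

fast_late = {"fast": True, "delay": 3}
slow_later = {"fast": False, "delay": 9}
print(choose_train(fast_late, slow_later) is fast_late)
print(choose_platform([True, False]))
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible between runs.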
The current model is as simple as possible. This means that splits and joins each involve three berths, and stations have two platforms. In future development, adjustments could be made to model a network which is a more accurate representation of an existing railway network.
Code
The model simplifies the rail network, representing it as a finite graph in which each node corresponds to a berth, consistent with the information available at the global level. At each time step the program decides, as a function of the current global state of the system, the next move of every train in the network and applies it. The main function is 'railtrack'; it has one main input and several settings. The input 'ad matrix' is the matrix which describes the network connections as a graph. The parameter 'time steps' determines the number of iterations. The other parameters refer to delay probabilities and congestion:
prob_inject = floating-point number between 0 and 1, the probability that a single empty starting berth is filled with a train
lambda_delay = non-negative value, the constant of the Poisson distribution describing the delay of the trains; the constant corresponds to the average delay
prob_stop_anywhere = floating-point number between 0 and 1, the probability that a train spends one further time step in any non-station berth
prob_stop_station = floating-point number between 0 and 1, the probability that a train spends one further time step in the station
The outputs are:
trains_old = set containing all the trains that arrived at the last station.
trains_now = list of all the berths at the end of the computation; each entry is the class of the train in the berth, or 0 if there is no train in that berth.
transitions = matrix with the instructions for moving the trains at each time step. For example, transitions[7] is a vector that describes the action to be taken at time_step=7: the train in position 4 goes to position transitions[7][4]. If there is no train in a given position, the corresponding vector entry is null.
matrix_of_trains = matrix with the trains in the network at each time step (characterised just by category), used for printing a gif output.
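To illustrate how the stochastic settings might enter a simulation, here is a minimal sketch using only the Python standard library. It is not the code from Appendix A: the helper names `sample_poisson`, `maybe_inject` and `extra_dwell` are hypothetical, and the Poisson sampler uses Knuth's multiplication method, which is an assumption about implementation rather than something stated in the report.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw a Poisson(lam) integer with Knuth's method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def maybe_inject(prob_inject, lambda_delay, rng):
    """With probability prob_inject, create a train with a Poisson initial delay."""
    if rng.random() < prob_inject:
        return {"delay": sample_poisson(lambda_delay, rng)}
    return None  # the empty starting berth stays empty this time step

def extra_dwell(prob_stop, rng):
    """True if a train spends one further time step in its current berth."""
    return rng.random() < prob_stop

rng = random.Random(42)  # seeded so that repeated runs give the same sequence
print([maybe_inject(0.75, 5.0, rng) for _ in range(5)])
```

The same `extra_dwell` helper can be called with prob_stop_station for station berths and prob_stop_anywhere for all other berths.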
Moreover there are some utility functions, in particular 'outputcsv' generates CSV files useful for studying the behaviour of the network with different parameters. See Appendix A for full code.

Results
(3.2.7) We ran the simulation with more than 1000 different parameter settings. Here is how our network looks (blue are fast trains, red are slow trains):
(3.2.8) Figure 3 is an example of the kind of analysis that can be done with the model. Blue dots are fast trains, red dots are slow trains. These show the impact of the strategy on the initial delay: a train with more delay will not gain further delay during the journey, since it has acquired priority. A better analysis would describe the propagation of the delay over the network. Figure 4 refers to the following settings: injection probability = 0.75 (the probability that a train enters a start berth); Poisson coefficient lambda = 5 (the distribution of the initial delays of the trains); probability of stopping anywhere = 0.001; probability of stopping at a station = 0.01.

Discussion
(3.2.9) Strengths: The code has the potential to model the whole rail network. It has different settings that can simulate peak times, the tendency for delays in stations, and stops at any berth. One could use this code to study the propagation of delays and to test the implementation of new decision-making schemes. When a 'critical' decision has to be taken, it would be possible to run several forecasts of the network with different decisions and pick the one most likely to minimise the delays.
(3.2.10) Limitations: The current code is just a sketch written in a few days, and is therefore flexible only to a certain extent. Although it is possible to input a rail network by its adjacency matrix, there are some constraints on its structure. For example, the current code does not deal with overtaking (which is only possible in the stations), nor with paths of different lengths. Time, distance and velocity are treated as discrete values. So far every train takes one time step to move to the next berth. It would be possible to describe the network with a more flexible and realistic structure (including trains with different speeds and subdividing "long" berths into shorter ones), but we maintain that the discrete approach is key to making the simulation computationally feasible. For a practical implementation it would be key to rewrite the code in a faster language, such as C or Fortran. This would allow the simulations to be processed as fast as possible and would give the possibility of obtaining sensible data in real time. One possible approach would be to forecast the network behaviour under different strategies (running several simulations in a few seconds) and pick the best one.
(3.2.12) It would be possible to model the parameters in such a way that the simulation is able to address realistic situations, for instance by using mixed strategies for the choices depending on the current status of the whole system, of the following track or any particular conditions that might be interesting to explore.
Figure 5: A sketch showing the signal aspects in the blocks behind a train. The first block behind shows red, the next one yellow, the one after that double-yellow, and finally green.
We assume that line-side signals are placed at roughly regular intervals along each line of track, and are the primary (and often the only) mechanism for controllers to affect the progress of a train along its route. Signals show one of four aspects: green, double-yellow, yellow and red. At a green signal the driver can proceed at the maximum speed for the track; at double-yellow and yellow the train must slow by increasing amounts and be prepared to stop at a future signal. A red signal must not be passed.
(3.3.2) The track is divided into blocks by the signal locations. A built-in safety feature ensures that if there is a train in the next block then the preceding signal shows red. If the next block is free but the one after contains a train, then the signal is yellow, and if there are two free blocks before the next train, then the signal is double-yellow. With three or more free blocks the signal shows green. See Figure 5.
(3.3.3) These signals can be manually overridden, but only to keep a signal at an aspect that is less permissive than the safety settings above. It is generally not permitted to manually change a signal to a more permissive aspect, since this might result in trains not having enough warning to stop at a future red. Thus by default, signals are left on red until a route is allocated to a train wishing to travel along a particular stretch of track.
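The four-aspect rule and the override restriction can be sketched as follows. The integer encoding (0 red, 1 yellow, 2 double-yellow, 3 green), the function names, and the treatment of track beyond the end of the list as clear are all assumptions made for this illustration.

```python
RED, YELLOW, DOUBLE_YELLOW, GREEN = 0, 1, 2, 3

def safety_aspect(free_blocks_ahead):
    """Most permissive aspect allowed when this many blocks ahead are free."""
    return min(max(free_blocks_ahead, 0), GREEN)

def displayed_aspect(free_blocks_ahead, manual_override=None):
    """A manual override can only make the signal less permissive."""
    safe = safety_aspect(free_blocks_ahead)
    return safe if manual_override is None else min(safe, manual_override)

def aspects_for_line(occupied):
    """Aspect of each signal, where signal k guards the entry to block k.

    occupied[k] is True if block k contains a train. Track beyond the end of
    the list is assumed to be clear.
    """
    n = len(occupied)
    aspects = []
    for k in range(n):
        free = 0
        while k + free < n and not occupied[k + free] and free < GREEN:
            free += 1
        if k + free == n:  # ran off the end of the modelled line
            free = GREEN
        aspects.append(safety_aspect(free))
    return aspects

# One train in block 4: the signals behind it read green, green,
# double-yellow, yellow, red, as in the sketch of Figure 5.
print(aspects_for_line([False, False, False, False, True, False, False]))
```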

The model
We used a discrete train model to describe the local behaviour of individual trains as they encountered signals on the track and the feedback between multiple successive trains on the same line. In particular we are interested in the effects of perturbations on a string of trains running under green lights, in the effects of a string of trains coming to a halt at an unexpected red, and in a string of stopped trains starting up again after a red light turns green. We take into account the finite length of each train, the fact that signals can be seen in advance of their position, and the time it takes to accelerate and decelerate a train to its desired speed. The variables of the model are:
$x_n(t)$ - the position of the nth train at time t
$v_n(t)$ - the velocity of the nth train at time t
$s_k(t)$ - the aspect of the kth signal at time t
$\sigma_n(t)$ - the aspect of the last signal that train n has seen as of time t
where the signal aspects are encoded as 0 for red, 1 for yellow, 2 for double-yellow, and 3 for green. The input functions and parameters of the model are as follows:
$h$ - the distance ahead at which a train driver can first view each signal
$d$ - the length of each train
$A(v, \sigma, \delta)$ - the acceleration function
where the acceleration function can depend on the current speed v, the last observed signal aspect σ, and the distance to the next signal δ.
(3.3.9) The four governing equations are then as follows. The rate of change of position of each train is given by its velocity, and the rate of change of velocity is given by the specified acceleration:
$$\frac{dx_n}{dt} = v_n, \qquad \frac{dv_n}{dt} = A(v_n, \sigma_n, \delta_n),$$
where $\delta_n$ is the distance from train n to the next signal.
(3.3.11) Unless overridden by a manual intervention, the signal aspects are determined by the number of blocks between each signal and the closest train ahead of it. We must consider both the front and back of each train.
(3.3.13) Finally, the signal last seen by each train is determined by cases, depending on whether the train is in sight of a signal. If it is, then we adopt that aspect. If not, then we leave the last observed aspect unchanged.
For the acceleration function, we assume there is a desired speed V(σ) for each signal aspect, and then take a simple smoothed constant-acceleration function to drive the speed towards the desired one.
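A minimal numerical sketch of one train under these rules is given below. Everything quantitative in it is an assumption made for illustration: the target speeds in `TARGET_SPEED`, the tanh-smoothed acceleration law, the values of `a_max` and `smooth`, and the explicit Euler time step are not the ones used by the study group, and the dependence of the acceleration on the distance δ is omitted.

```python
import math

# Hypothetical desired speeds V(sigma) in m/s for red, yellow, double-yellow, green.
TARGET_SPEED = {0: 0.0, 1: 10.0, 2: 20.0, 3: 30.0}

def acceleration(v, sigma, a_max=0.5, smooth=1.0):
    """Assumed smoothed constant-acceleration law driving v towards V(sigma)."""
    return a_max * math.tanh((TARGET_SPEED[sigma] - v) / smooth)

def euler_step(x, v, sigma, dt=1.0):
    """One explicit Euler step of dx/dt = v, dv/dt = A(v, sigma)."""
    a = acceleration(v, sigma)
    return x + v * dt, max(v + a * dt, 0.0)  # trains never run backwards

def update_last_seen(sigma_prev, x, signal_pos, signal_aspect, h):
    """Adopt a signal's aspect once it is within sight distance h, else keep the last one."""
    in_sight = 0.0 <= signal_pos - x <= h
    return signal_aspect if in_sight else sigma_prev

# A train starting from rest under a green aspect accelerates towards V(3).
x, v = 0.0, 0.0
for _ in range(200):
    x, v = euler_step(x, v, sigma=3)
print(round(x, 1), round(v, 3))
```

Away from the target speed the tanh term saturates, giving an almost constant acceleration of magnitude `a_max`; close to the target it smooths the approach.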

Initial model and numerical implementation
(3.3.17) For the target velocities, we take

Results and discussion
(3.3.21) To illustrate the model, a simple case is studied, simulating trains starting up from a line blockage that has just been cleared. A line of trains is initially at rest, with one train per block each stopped just behind a red signal. The signals ahead are then turned green allowing the first train to set off. As each train clears its block the train behind can then start to move.
The results of this model (with the parameter values above) are shown in Figure 6. We observe an interesting instability in the start-up process, whereby trains further back initially start moving, but then have to stop again at a red light.
(3.3.23) When trains start up in order, the following train always loses a bit of ground on the train ahead as it only starts to accelerate once the signal ahead turns to yellow. So once up to speed it would be running slightly more than a block length behind. The instability is caused by the driver being able to see the next red signal further ahead than this lost ground. Thus a red signal is observed in the next block immediately after restarting. Whether or not this causes a train to stop, it will likely make it at least slow, reducing the distance to the train behind. Hence the next train in the line will be affected slightly more, and so on.
The instability only grows on the first new red, as the additional slowing for that red allows the train ahead to make up enough ground to be further ahead by the next signal. The only way to avoid this instability is to ensure that the distance lost during acceleration is large enough that each train has no need to brake for the next signal. This could be achieved by having trains accelerate more slowly, and/or by introducing a more realistic braking model that does not react immediately to an observed red signal, but only starts to slow the train when δ is small enough that braking is needed to come to a halt before the signal.
(3.3.25) We also observe that the trains spend a long time running under double-yellows, before being far enough apart to observe only green lights. This is because of the relatively small speed differential in the model between green and double-yellow. Thus when a train in front is continually passing green lights and the next train is just over two blocks behind and so seeing double-yellow, the train ahead is only going slightly faster than the train behind. It therefore takes a long time for the train ahead to make up enough extra ground to extend the gap behind it to three blocks.

Further work (model refinement)
(3.3.26) This is only a preliminary model, but is designed to show how detailed train-level modelling could be accomplished. A number of refinements would need to be made for this model to have practical applications.
(3.3.27) These include:
• Allowing for variable block lengths and signal observation distances;
• Allowing for different trains to have different characteristics;
• Providing more realistic acceleration and deceleration models.
In particular, the function A(v, σ, δ) should be adjusted so that braking only commences when necessary to adjust the speed to the desired value as the signal is passed. So for δ larger than some critical value (dependent on v and σ) no braking would occur. Currently braking commences the moment a new signal aspect is observed.
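One simple way to define such a critical distance, assuming a constant braking deceleration, is the kinematic relation $\delta_{\text{crit}} = (v^2 - V(\sigma)^2)/(2a)$. The sketch below encodes this; the constant-deceleration assumption, the function names and the `margin` parameter are illustrative choices, not part of the study group model.

```python
def braking_distance(v, v_target, decel):
    """Distance needed to slow from v to v_target at constant deceleration decel > 0."""
    if v <= v_target:
        return 0.0  # already at or below the desired speed: no braking needed
    return (v * v - v_target * v_target) / (2.0 * decel)

def should_brake(v, v_target, delta, decel, margin=0.0):
    """Brake only once the distance delta to the signal is within the critical value."""
    return delta <= braking_distance(v, v_target, decel) + margin

# At 30 m/s with 0.5 m/s^2 of braking, a full stop needs 900 m.
print(braking_distance(30.0, 0.0, 0.5))
print(should_brake(30.0, 0.0, delta=1000.0, decel=0.5))
print(should_brake(30.0, 0.0, delta=900.0, decel=0.5))
```

With this rule a driver who sees a restrictive aspect from further away than the critical distance keeps the current speed until braking is actually required.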

Contact
(3.4.3) Conservation of vehicles gives
$$\frac{\partial \rho}{\partial t} + \frac{\partial (\rho u)}{\partial x} = 0,$$
where ρ(x, t) is the number density of vehicles, u(x, t) is the speed, and thus the flux is given by ρu. The independent variables x and t are distance along the track and time, respectively. Classically the speed is assumed to be a simple function of density, with effects such as acceleration and driver response times neglected. In this case the conservation equation above can be formulated as a differential equation for ρ. Most commonly for car traffic, the speed-density relation is taken to be
$$u = 1 - \frac{\rho}{\rho_{\max}},$$
(3.4.5) where $\rho_{\max}$ is the maximum density of vehicles (corresponding to a standstill traffic jam) and the maximum speed is u = 1.
(3.4.6) Using this model we can consider a simple situation in which the initial density of vehicles is zero for x positive and ρ max for x negative, representing a line of vehicles stuck at a red light. The resulting time evolution of the solution to the conservation equation with the velocity function given by the above equation is shown in Figure 7.
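The red-light release described above can be reproduced numerically. The following is a minimal sketch, assuming the linear speed-density relation u = 1 − ρ/ρ_max and a first-order Godunov (demand/supply) finite-volume scheme; the scheme, the grid size and all function names are illustrative choices and are not taken from the study group's own numerical code.

```python
import numpy as np

def greenshields_flux(rho, rho_max=1.0):
    """Flux rho * u with u = 1 - rho / rho_max (maximum speed scaled to 1)."""
    return rho * (1.0 - rho / rho_max)

def godunov_flux(rho_left, rho_right, rho_max=1.0):
    """Godunov interface flux for a concave flux that peaks at rho_max / 2."""
    rho_crit = 0.5 * rho_max
    demand = greenshields_flux(np.minimum(rho_left, rho_crit), rho_max)
    supply = greenshields_flux(np.maximum(rho_right, rho_crit), rho_max)
    return np.minimum(demand, supply)

def simulate_red_light_release(n_cells=200, length=2.0, t_end=0.5, rho_max=1.0):
    """Evolve a queue (rho = rho_max for x < 0, empty track for x > 0)."""
    dx = length / n_cells
    x = -0.5 * length + (np.arange(n_cells) + 0.5) * dx  # cell centres
    rho = np.where(x < 0.0, rho_max, 0.0).astype(float)
    dt = 0.5 * dx  # CFL condition: wave speeds are bounded by 1 in magnitude
    t = 0.0
    while t < t_end - 1e-12:
        step = min(dt, t_end - t)
        flux = godunov_flux(rho[:-1], rho[1:], rho_max)
        # Update interior cells; the two end cells keep their initial values.
        rho[1:-1] -= step / dx * (flux[1:] - flux[:-1])
        t += step
    return x, rho

x, rho = simulate_red_light_release()
```

For this initial condition the exact solution is a rarefaction fan, ρ = ρ_max (1 − x/t)/2 for |x| < t, which gives a simple check on the numerical output.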

Modified Velocity Function
(3.4.7) To adapt the classical theory to describe the flow of train traffic, as opposed to cars, we discussed a number of options. One of these was to take a modified speed-density formula, where the non-linearity of the relation is based on the concept that the driver attempts to choose a speed that will allow them to stop in a given distance should the signalling indicate that this is necessary. It also attempts to impose a maximum speed and a maximum density, and to account approximately for the four different signal types that are possible. The behaviour for this modified speed function is given in Figure 8.
Here we might also account for the train signals that are visible to drivers, which indicate the density of trains further down the track. To model this, we can assume the speed of a train to be a function of the train density at a distance d ahead of the train, i.e.,
$$u(x, t) = u\big(\rho(x + d, t)\big). \qquad (3.4.10)$$
We find that this modification leads to unstable behaviour when this flux function is implemented within our numerical scheme. We believe this warrants further investigation. Additionally, as signals are placed at fixed points along the track, it may in fact be more accurate to model the speed as a function of the density at discrete points, $u(x, t) = u\big(\rho(x_k, t)\big)$, where $x_k > x$ is the location of the closest upcoming signal.

Conclusions
(3.4.11) This investigation seems to show general effects of modifying the train 'flux' that may be useful in predicting behaviour of train traffic and delays. An accurate measurement of the flux would allow us to test and refine predictions, so we would recommend exploring this further. Additionally, the incorporation of looking ahead to signals appears to have some interesting instability issues.

Flux Modelling
(3.4.12) In determining the dynamics of trains it is crucial to understand how the "flux" of trains is altered by signalling and the geometry of the track. We draw on the ideas of traffic modelling and note that, on a single-track system, the flux is simply the density of trains times the velocity of the trains. We now explore what this flux might be and how it might be altered. We start by exploring a deterministic model of the flux, but we note that in practice the flux will vary considerably with driver behaviour, train type, rail conditions and other sources of uncertainty. We start by considering a single block of track and ask what the flux along it is. As the train enters a block there are three possible scenarios:
• the signal it just passed was green;
• the signal it just passed was double yellow;
• the signal it just passed was yellow.
In each of these cases we assume there are two measurable quantities for the block. The first is the 'transit time', $T_t$: the time after entering the block that the train takes to pass the next signal and be in the next block. The second is the 'decision time', $D_t$: the time after entering the block at which the driver has seen the next signal and must react to it, for example by coming to a halt at a stop signal.
We will briefly discuss general properties that these two quantities might have, but would expect the data currently collected to give a very good insight into the real behaviour.
To examine the resulting flux it is informative to consider a simple diagram that might be put on the train path graph (train position, measured in block number, versus time in minutes). To fill in the train route, we note that on this diagram certain blocks and certain parts of blocks must be empty in order for the train to observe the signals that will make it run at a steady rate through the first block.
Here are the three examples, in which we assume that the signal colour the train driver sees as they enter the block is the same as the signal colour when they leave. For a train travelling 'under green signals' we have Figure 9. For a train travelling 'under double yellow signals' we have Figure 10. For a train travelling 'under yellow signals' we have Figure 11. From these we can determine the 'flux' of trains under each of the signal conditions. For example, under green, using the $T_t$ and $D_t$ values for the block under green, the block currently filled by the train is occupied for $T_t$. Furthermore, the two blocks ahead must also be empty for $T_t$, and the final block must be empty for at least $T_t - D_t$ so that the signal is green when it is observed by the driver. The total track used is therefore $4T_t - D_t$ block-minutes, hence the flux is $1/(4T_t - D_t)$ trains per block-minute for a train under green. Similarly we have $1/(3T_t - D_t)$ trains per block-minute for trains under double yellow and $1/(2T_t - D_t)$ trains per block-minute for trains under yellow.
Critically, note that both $T_t$ and $D_t$ will have different values for each of the signal conditions. In particular, they will be smallest under green, because the train travels fastest, and will increase monotonically as the severity of the signal increases and the train travels more slowly. (Note that we expect $T_t - D_t$ also to increase monotonically.)
What is not so obvious is which of the three possible fluxes is the maximum. This is important since the maximum flux would give the fastest recovery from a disruption to the standard timetable. This is similar to the active speed limits on motorways, where speeds are set to give maximum flux along the motorway. An interesting study would be to determine the three possible fluxes for all the blocks on a particular line, and hence to find the blocks where the maximum fluxes are smallest, which might indicate regions that will cause disruption to spread.
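The flux expressions above are easy to evaluate once transit and decision times are known for a block. The sketch below assumes the block counts 4, 3 and 2 from the text; the function names and the example timings are hypothetical and chosen only so that $T_t$, $D_t$ and $T_t - D_t$ all increase with signal severity.

```python
# Number of block-lengths of track reserved per train under each aspect,
# taken from the expressions 1/(4T - D), 1/(3T - D) and 1/(2T - D).
BLOCKS_RESERVED = {"green": 4, "double_yellow": 3, "yellow": 2}

def block_flux(transit_time, decision_time, blocks_reserved):
    """Trains per block-minute: 1 / (blocks_reserved * T_t - D_t)."""
    footprint = blocks_reserved * transit_time - decision_time
    if footprint <= 0:
        raise ValueError("decision time must be shorter than the reserved time")
    return 1.0 / footprint

def fluxes_by_aspect(timings):
    """timings maps aspect -> (T_t, D_t) in minutes for one block."""
    return {
        aspect: block_flux(t, d, BLOCKS_RESERVED[aspect])
        for aspect, (t, d) in timings.items()
    }

def best_aspect(timings):
    """Aspect with the largest flux for this block."""
    fluxes = fluxes_by_aspect(timings)
    return max(fluxes, key=fluxes.get)

# Hypothetical timings in minutes, for illustration only (not measured data).
example_timings = {
    "green": (1.0, 0.5),
    "double_yellow": (1.3, 0.6),
    "yellow": (2.0, 0.8),
}
example_fluxes = fluxes_by_aspect(example_timings)
```

With these illustrative numbers the largest flux is obtained under yellow, which shows that the maximum need not occur under green; real values of $T_t$ and $D_t$ would have to come from the movement data.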
We can now consider using the existing timetable and the current status to create a new timetable by altering the path in the 'block-time' diagram while keeping the relevant additional blocks empty. We can also modify the shape of the path by replacing the usual assumption of 'travelling under green' with one where part of the path is travelled under 'double yellow' or 'yellow'. Such alterations allow the steepness of the path to be reduced while simultaneously reducing the width of the path due to needing fewer blocks free around it.
Figure 9: we assume the signal colour the train driver sees as they enter the block is the same colour as signal when they leave.
Figure 10: we assume the signal colour the train driver sees as they enter the block is the same colour as signal when they leave.
Figure 11: we assume the signal colour the train driver sees as they enter the block is the same colour as signal when they leave.

Conclusion
(3.4.13) We have outlined a method for identifying parts of the track where difficulties may occur by examining the flux along the track. We have also described a method for graphically modifying a timetable that can accommodate an understanding of this changing flux. Critical to such an assessment is determining the dependency of the transit time and the decision time on signal state, and this requires looking in detail at data.
the two, and operational decisions made before the train arrives at that station, which can be used as inputs.
(3.5.3) There are several relevant QoS measures, such as:
• Total cumulative delay at a main station, e.g., Paddington;
• Average journey time between two main stations compared with the scheduled one;
• Average journey time of a typical customer on a common route. In order to compute this measure, we might need additional information; for example, a passenger may need to change trains to reach the destination on those common routes.
(3.5.6) The QoS measure considered is the total cumulative delay at Paddington station. We would like to determine how delays happening in the network in a one-hour period affect the total delay at Paddington station in the next one-hour period. For each one-hour period, incremental delays for all trains within the network can be computed for all stations. If a train has not used a station in the network within that one-hour period, the delay is set to 0. The total cumulative delay at Paddington station can also be computed using the delays of all trains which arrived at Paddington within that one-hour period. Given the prepared dataset, predictive modelling can be applied using appropriate software such as R, IBM SPSS Modeler or SAS Enterprise Miner.
We could try to predict the service quality at important stations (instead of overall QoS) by using relevant incremental delays from the upstream (sub-)network. This approach could address the above issue of primary source of delays. Would it be enough to use the total incremental delay at each station in the network within a one-hour period to predict the delay at the main stations in the next one-hour period, i.e., removing the reference to particular trains in the inputs of the model?
In addition to analysing the relationship between current delays and QoS, it would be interesting to use historical data to analyse the uncertainty in primary sources of delays, e.g., travelling time between stations, delays caused by passengers at stations, driver availability, etc.
These analyses can be useful in data preparation for the optimization models to construct the (robust) train timetable and to (proactively) reschedule the train timetable while taking into account potential delays in the future.
Assuming the delays at the main stations (e.g., Paddington station) can be estimated, an interesting problem would be how to reassign arriving trains to appropriate berths so that future delay is minimized. The problem would provide a global solution once every half an hour, say, while taking into account the anticipated delays of arriving trains.
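The one-hour-ahead prediction described for Method A can be illustrated with a small sketch. It assumes the simplified inputs raised in the text (total incremental delay per station per hour, with no reference to individual trains) and uses ordinary least squares from NumPy as a stand-in for the predictive-modelling packages named above. The record layout, function names and station list are hypothetical.

```python
import numpy as np

def hourly_station_totals(records, stations, n_hours):
    """Sum incremental delays per (hour, station).

    records: iterable of (hour_index, station_name, incremental_delay_minutes).
    Hours in which no train used a station keep a total of 0.
    """
    column = {name: j for j, name in enumerate(stations)}
    totals = np.zeros((n_hours, len(stations)))
    for hour, station, delay in records:
        totals[hour, column[station]] += delay
    return totals

def fit_next_hour_model(station_totals, target_delay):
    """Least-squares fit of target_delay[h + 1] on station_totals[h].

    Returns [intercept, coefficient per station].
    """
    n_rows = len(station_totals) - 1
    X = np.column_stack([np.ones(n_rows), station_totals[:-1]])
    y = np.asarray(target_delay, dtype=float)[1:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_next_hour(coef, current_totals):
    """Predicted total delay at the main station in the following hour."""
    return float(coef[0] + np.dot(coef[1:], current_totals))
```

A linear model is only the simplest possible choice here; the same prepared table could equally be passed to any other regression or classification method.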

Method B: Event based delay propagation in a train network
Introduction
(3.5.9) In addition to the data-driven method above, trains in networks can be characterized by logical relations: for example, train $T_1$ is in front of train $T_2$ on the same track segment, and at intersections it is decided which train will go ahead of another train. This implies that delays will propagate according to these logical relations. If train $T_1$ is behind train $T_2$, which is delayed, then the delay may be passed on to train $T_1$. Similarly, if a train has to wait at an intersection for a delayed train then the delay is passed on. (Note: one could also use a contagion model.)
(3.5.10) The idea is to calculate the delays $d(k) = (d_1(k), \ldots, d_N(k))$ for trains $T_1, \ldots, T_N$ after event k for given initial delays d(0). More precisely, we want to find functions $f_k$, which correspond to individual events, such that
$$d(k) = f_k\big(d(k-1)\big).$$
The symbol ω will very crudely represent realisations. (For example, if a service runs every day from 1 January to 31 December, then ω is simply a label of all the days of the year.) Let $R_\omega(t)$ be a train scheduled to leave Reading station at time t on a certain service (line) on realisation ω. The delay of $R_\omega(t)$ is denoted $\delta_{R_\omega(t)}$.
On the other hand, one measures the total delay at Paddington over a certain period of time, e.g.
where τ > 0, i indexes all services scheduled to arrive at Paddington between t and t + τ, and $\delta^{\omega}_i(s)$ is the delay at time s, with t ≤ s ≤ t + τ, at Paddington station of the service with index i on realisation ω. Other metrics are evidently possible. One then measures the correlation between $\delta_{R_\omega(t)}$ and $\delta^{P}_{\omega}(t)$. It may be that (on average) a delay for R of (say) 3 minutes will minimize the delay at Paddington. This suggests that the schedule of R should be modified accordingly. The above should be done for all services: all trains going through all stations in the network (including Paddington!) over all times.
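As a rough illustration of the analysis just described, the sketch below takes one pair of numbers per realisation ω (the departure delay of the Reading service and the total delay at Paddington) and computes their correlation, together with the departure delay, rounded to whole minutes, that has the lowest average Paddington delay. The function names and the binning to whole minutes are assumptions made for illustration.

```python
import numpy as np

def delay_correlation(reading_delay, paddington_total):
    """Pearson correlation across realisations of the two delay measures."""
    return float(np.corrcoef(reading_delay, paddington_total)[0, 1])

def best_departure_delay(reading_delay, paddington_total):
    """Find the Reading delay (whole minutes) with lowest mean Paddington delay.

    Returns (best_delay_minutes, mean Paddington delay per delay bin).
    """
    reading = np.rint(reading_delay).astype(int)
    paddington = np.asarray(paddington_total, dtype=float)
    means = {
        int(d): float(paddington[reading == d].mean())
        for d in np.unique(reading)
    }
    return min(means, key=means.get), means
```

A bin that contains only a handful of realisations gives an unreliable mean, so in practice a minimum sample size per bin would be needed before acting on the result.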

Problem to address
(3.7.2) Delay information for a train when it reaches certain points, or berths, on the route is available from the train movement data. However, it is difficult to know whether lateness is due to normal operational behaviour or to some unpredictable event. And even if we knew when and where an action had been taken to resolve a failure, the failure might have originated at a point well before the one where it became manifest.
(3.7.4) In this toy model we assume there are three types of points: junctions, stations and the terminal (destination). We could fit a Gamma distribution to each point type, using the lateness data corresponding to that type, because a junction may have less lateness than a station, which may have less lateness than the destination.
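One simple way to fit a Gamma distribution per point type is the method of moments, sketched below. This is an illustrative choice (maximum likelihood would be the usual alternative), it assumes strictly positive lateness values, and the function names and the point-type labels are hypothetical.

```python
from statistics import mean, pvariance

def fit_gamma_moments(lateness):
    """Method-of-moments Gamma fit: returns (shape, scale).

    Uses mean = shape * scale and variance = shape * scale**2.
    """
    m = mean(lateness)
    v = pvariance(lateness)
    if m <= 0 or v <= 0:
        raise ValueError("need positive, non-constant lateness values")
    return m * m / v, v / m

def fit_by_point_type(observations):
    """observations: iterable of (point_type, lateness_minutes) pairs."""
    grouped = {}
    for point_type, lateness in observations:
        grouped.setdefault(point_type, []).append(lateness)
    return {ptype: fit_gamma_moments(vals) for ptype, vals in grouped.items()}
```

For example, `fit_by_point_type([("junction", 1.0), ("junction", 2.0), ("station", 3.0), ("station", 6.0)])` returns one `(shape, scale)` pair for each of the two point types.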

Description of model
(3.7.5) As an example, suppose the train is en route A-B-C-D, where B is a station, C is a junction and D is the destination of the train. We are interested in the lateness incurred on segments AB, BC and CD, to each of which we could fit a distribution from observed historical data. It may be the case that on average the relative lateness at C is less than that at B, which is less than that at D.
(3.7.6) The total delay is the sum of all the relative delays along the route; this constitutes one of the possible contingencies arising from normal operational behaviour. However, unpredictable events can happen which cause further delays, the so-called primary delays. To model this kind of event, we also associate each interval with a status rating that ranges from 0 to 1, with 0 being normal and 1 being failure. The status rating of an interval measures the likelihood of primary delay in that interval.
(3.7.8) The posterior distribution of the status ratings at each point, given the relative lateness at the points, is
$$P(D_1, \ldots, D_n \mid T_1, \ldots, T_n) \propto L(T_1, \ldots, T_n \mid D_1, \ldots, D_n) \prod_{i=1}^{n} P(D_i),$$
where $P(D_i)$ is the prior for the rating at i, which encodes our prior knowledge of the tendency for primary delay during segment (i − 1, i). A Beta distribution can be used, with Beta(1,1) being the uniform distribution, indicating a vague prior.
(3.7.9) The term $L(T_1, \ldots, T_n \mid D_1, \ldots, D_n)$ is the likelihood of observing the relative lateness, given the status ratings. Here independence is assumed, that is, the relative lateness at i depends only on the rating at i. In a simple setting the conditional random variable $T_i \mid D_i$ is modelled as the sum of two exponential random variables, $T_i \mid D_i = E_{1i} + E_{2i}$, corresponding to two lateness contributions: the first comes from normal operational delays and depends on whether i is a junction (J), a station (S) or the terminal; the second is the contribution due to primary delays. In this model we use $E_{1i} \sim \mathrm{Exp}(\mu_i)$ with
(3.7.10) An MCMC simulation has been run to sample from the posterior distribution of the ratings, using artificial data. The code in R, as well as the parameter settings, are listed in Appendix A.4. Figure 13 shows the posterior distribution of the ratings for each section, for the simulation above. Compare this with the observed relative lateness $T_i$: while we may expect a large $T_i$ to lead to a large $D_i$, there could be subtle cases where this is not so obvious, and it would be interesting to investigate them further.
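To make the sampling step concrete, the following is a minimal random-walk Metropolis sketch for the status rating of a single interval under a Beta(1,1) prior. It is not the R code of Appendix A.4. The dependence of the primary-delay term on the rating is an assumption made purely for illustration: $E_1$ is exponential with mean `normal_mean` and $E_2$ is exponential with mean `floor + primary_mean * d`. All parameter values and names are hypothetical.

```python
import math
import random

def log_likelihood(t, d, normal_mean=2.0, primary_mean=20.0, floor=0.1):
    """Log density of T = E1 + E2 at t > 0 for status rating d (assumed model)."""
    a = 1.0 / normal_mean                    # rate of normal operational delay
    b = 1.0 / (floor + primary_mean * d)     # rate of primary delay (assumption)
    if abs(a - b) < 1e-9:
        # Equal rates: the sum is Gamma(shape 2) distributed.
        return 2.0 * math.log(a) + math.log(t) - a * t
    lo, hi = min(a, b), max(a, b)
    # Density of a sum of two exponentials with distinct rates:
    # a*b/(hi-lo) * (exp(-lo*t) - exp(-hi*t)), written in a stable form.
    return (math.log(a * b / (hi - lo)) - lo * t
            + math.log1p(-math.exp(-(hi - lo) * t)))

def sample_status_rating(t, n_samples=5000, step=0.2, seed=0):
    """Metropolis samples of d in (0, 1) given lateness t, uniform prior."""
    rng = random.Random(seed)
    d = 0.5
    current = log_likelihood(t, d)
    samples = []
    for _ in range(n_samples):
        proposal = d + rng.uniform(-step, step)
        if 0.0 < proposal < 1.0:             # prior is zero outside (0, 1)
            candidate = log_likelihood(t, proposal)
            if rng.random() < math.exp(min(0.0, candidate - current)):
                d, current = proposal, candidate
        samples.append(d)
    return samples
```

Under these assumptions a large observed lateness pulls the posterior towards ratings near 1 and a small lateness pulls it towards 0, which is the behaviour the text expects in the straightforward cases.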

Approach 7: Bayesian Networks and IDSS
(3.8.1) In today's ever more interconnected world, decision making in dynamic environments is often extremely difficult despite vast streams of data, huge models and ever-growing disparate domains of expertise. Decision support can be valuable, but needs to incorporate all the relevant inputs in a coherent, transparent way so that decision makers can make defensible policy choices. In dynamic, plural environments decision makers often need a tool that can draw together expert judgements coming from a number of different panels of experts, where each panel is supported by its own, sometimes very complex, models. Methodology and theory to do this have recently been developed [1].
(3.8.2) Complex networks of railways are one such example. How can we leverage these developments to increase efficiency through strategic decision-making? We first produce a probabilistic model of a single line, from Bristol to Paddington, as a Bayesian network, learn the conditional probability distributions from data, and use this to understand how current operational behaviour, in terms of delays, affects service quality. We represent the line (Figure 14) as a Bayesian network (Figure 15). We then learn the probability distributions from data, and use domain knowledge to define sensible discretisations of the delays.

Method
(3.8.4) In Figure 15 and the following figures we discretise delays into:
• No delay (from negative delays, for freight trains which can leave early, up to zero delay);
• Minor delay (less than 10 minutes);
• Moderate delay (10.1-29.9 minutes);
• Severe delay (30 minutes or more; this is when compensation starts to be paid).
(3.8.5) We extract data into the format of Table 3.8.1 to learn the conditional probability distributions in Figure 16.
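The discretisation and the learning of a conditional probability table can be sketched as follows. The state labels are hypothetical, and the treatment of the boundary values is an assumption: delays from 10 minutes up to (but not including) 30 minutes are treated as moderate, since the text leaves exactly 10 minutes unassigned. The table is estimated by simple relative frequencies for one parent-child pair of adjacent locations; the study itself used Netica.

```python
from collections import Counter, defaultdict

STATES = ("none", "minor", "moderate", "severe")

def discretise(delay_minutes):
    """Map a delay in minutes to one of the four delay states."""
    if delay_minutes <= 0:
        return "none"
    if delay_minutes < 10:
        return "minor"
    if delay_minutes < 30:   # assumption: [10, 30) counts as moderate
        return "moderate"
    return "severe"

def learn_cpt(pairs):
    """Estimate P(downstream state | upstream state) by relative frequency.

    pairs: iterable of (upstream_delay, downstream_delay) in minutes,
    one pair per train observed at two adjacent locations.
    """
    counts = defaultdict(Counter)
    for upstream, downstream in pairs:
        counts[discretise(upstream)][discretise(downstream)] += 1
    cpt = {}
    for parent_state, row in counts.items():
        total = sum(row.values())
        cpt[parent_state] = {state: row[state] / total for state in STATES}
    return cpt
```

Parent states that never occur in the data get no row at all, which corresponds to the flat distributions seen when a combination has not been observed.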

Results
(3.8.6) Note that most stations have minor delay as the most likely outcome, with the probability of moderate delays growing as we proceed along the network, as might be expected. We investigate the effects of delays at individual locations. A what-if analysis shows the following:
• A moderate or severe delay at Bristol Temple Meads is likely to have dissipated by Didcot Parkway, with no effect at Paddington (Figure 17).
• A moderate or severe delay at North Somerset Junction, Bath Spa or Bathampton Junction is likely to have dissipated by Didcot Parkway, with no effect at Paddington.
• A moderate or severe delay at Thingley East Junction or Chippenham is likely to increase slightly the probability of moderate delay at Didcot Parkway, with no effect at Paddington.
• A moderate or severe delay at Wootton Bassett Junction is likely to increase the probability of moderate delay at Didcot Parkway, with no effect at Paddington.
• A moderate or severe delay at Swindon is likely to increase significantly the probability of moderate or severe delay at Didcot Parkway, with no effect at Paddington.
• A moderate delay at Uffington is highly likely to lead to moderate delay at Didcot Parkway, with no effect at Paddington. The effect of a severe delay at Uffington is a flat distribution at Didcot Parkway, suggesting this combination has not been seen in the data.
• Moderate or severe delays at Southall, Acton West or Ladbroke Grove are highly likely to lead to moderate or severe delays at Paddington, as might be expected given their proximity and the limited opportunity for corrective interventions.
(3.8.8) Another approach is to assume that a severe or moderate delay has been observed at Paddington and examine the probability distributions at the upstream stations. Figure 18 and Figure 19 show that a moderate or severe delay at Paddington implies an upstream delay beginning at Heathrow Airport Junction and propagating through to Paddington. Further investigations could be made if data on delays at Heathrow Airport, Oxford and Newbury were obtained for those same journeys, and the probability distributions of delays given delays at multiple upstream stations could be investigated.
(3.8.10) This provides a probabilistic model for a single journey. Similar models could be developed for whole regions, and these networked together using the principles in [1]. If a suitable multi-attribute utility can be elicited from decision-makers, which has measurable attributes and passes the clarity test, then the networked probabilistic models for regions, along with other key expert panels (e.g. weather), can be used to evaluate candidate policy options and score each with respect to the utility defined, taking uncertainty into account. Figure 23 gives a sketch of such an integrating decision support system.
Figure 22: Bayesian network, with probability distributions learned from data augmented with heuristic and subjective probabilities where data are sparse or non-existent. This shows the downstream effects on Paddington of a severe delay at both Slough and Heathrow Airport, both feeding into Heathrow Airport Junction. Image produced in Netica [5]
Figure 23: Sketch of an integrating decision support system (IDSS) with 5 attributes in the utility: average passenger delay, passengers delayed over one hour, compensation payments, revenue and Briggs station efficiency [11]. Weather can affect any region and all regions contribute to the utility function. Region A affects delays in Region B and Region B affects Region D, but Region D is conditionally independent of Region A given Region B, i.e. if we know the status of Region B, knowing the status of Region A gives no further useful information to predict the status of Region D. Region C is independent of other regions. Image produced in Netica [5]
(3.8.12) There is now methodology to knit together probability models for different parts of a complex rail system for decision support. Called an integrating decision support system (IDSS), this makes possible coherent inference over a network of probabilistic models [1,2,3]. With this methodology, a decision-making panel can define a utility function and the IDSS can be used to score candidate policies to aid selection. This approach is transparent, giving decision-makers the ability to justify decisions to an auditor. Uncertainty can be propagated using the tower rules E(X) = E(E(X|Y)) and Var(X) = E(Var(X|Y)) + Var(E(X|Y)) (the law of total expectation and the law of total variance, the latter sometimes called the conditional variance identity). This allows uncertainty to be incorporated into the score for candidate policies.
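The two tower rules can be checked on a small discrete example. In the sketch below Y is a discrete conditioning variable (for instance the state of an upstream region) with probabilities `weights`, and X has known conditional mean and variance for each value of Y; the function name and the numbers in the usage note are illustrative only.

```python
def total_mean_and_variance(weights, cond_means, cond_vars):
    """Combine conditional moments of X given Y into marginal moments.

    E(X)   = E(E(X|Y))
    Var(X) = E(Var(X|Y)) + Var(E(X|Y))
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    mean = sum(w * m for w, m in zip(weights, cond_means))
    expected_var = sum(w * v for w, v in zip(weights, cond_vars))
    var_of_means = sum(w * (m - mean) ** 2 for w, m in zip(weights, cond_means))
    return mean, expected_var + var_of_means
```

For instance, with two equally likely upstream states giving conditional mean delays of 0 and 10 minutes, each with variance 4, the marginal mean is 5 and the marginal variance is 4 + 25 = 29: most of the uncertainty comes from not knowing the upstream state.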

Future developments
(3.8.13) The following developments are suggested:
• Gather data on the interventions enacted to mitigate delays and estimate the effect of intervention versus no intervention, perhaps by structured expert judgement approaches [6,7,8,2].
• Represent the rail system as a set of probabilistic models in a way that is suitable to the domain, e.g. by management regions. Add in probabilistic information from any external influencing factors, e.g. a weather forecast indicating ice / heat / snow / likelihood of leaves on the line.
• Develop a multi-attribute utility against which to evaluate candidate interventions [4], e.g. overall delay, number of passengers delayed more than x minutes, costs of compensation to passengers / train operators.
• Develop an IDSS [1] to score candidate policies to provide decision support.

Approach 8: Mixed Integer Programming
(3.9.1) This is also a toy model of a sort.
Method
Aim to reschedule traffic in a limited time window (e.g., 60 minutes into the future) on a macroscopic level (that is, arrivals/departures to/from major stations and junctions; do not pay too much attention to detailed track topology, dynamic properties of rolling stock, etc.). The objective is to make the new schedule as close to the original one as possible.

Decision variables
(3.9.2) The rescheduled time of each event. An event is the arrival of a particular train in a particular location, or the departure of a particular train from a particular location. For certain pairs of events, the precedence relation.
Constraints
• Open track capacity is modelled by headway constraints: a minimum time between departures/arrivals of two trains to/from the same line.
• Overtaking: where overtaking is impossible (open track and certain stations/junctions), the precedence relation must be preserved.
• A train must not depart earlier than its advertised time of departure.
• Minimum travel times between pairs of consecutive stations (depends on class of train, but can be refined to be more train-specific).
• Minimum dwell times in stations where the train stops (also depends on class of train, but can be refined to be more train-specific).
TO DO: How to model conflicts between arriving and departing trains at Paddington?
Objective
Minimise total delay = sum of (rescheduled time of event − original time of event) over all advertised arrivals of trains.
Implementation
Coded using the Mosel modelling language, to be solved by Xpress (chosen because JF has a licence for Xpress and none for Cplex or similar). See Appendix A.3 for code.
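The structure of the model can be illustrated on a very small instance without a MIP solver. The sketch below is not the Mosel model of Appendix A.3: it considers a single line with one event per train, replaces the integer program by brute-force enumeration of the precedence orders (feasible only for a handful of trains), and keeps two of the constraints above, namely the headway constraint and the rule that no train leaves before its advertised time. The function names and the example data are hypothetical.

```python
from itertools import permutations

def earliest_times(order, earliest, headway):
    """Earliest feasible event times for a fixed precedence order."""
    times = {}
    previous = None
    for train in order:
        t = earliest[train]
        if previous is not None:
            t = max(t, times[previous] + headway)  # headway constraint
        times[train] = t
        previous = train
    return times

def reschedule(scheduled, ready, headway):
    """Minimise total delay over all precedence orders (brute force).

    scheduled: advertised time per train; ready: earliest time the train
    can actually run (later than scheduled if it is already delayed).
    Returns (total_delay, order, rescheduled_times).
    """
    earliest = {tr: max(scheduled[tr], ready[tr]) for tr in scheduled}
    best = None
    for order in permutations(scheduled):
        times = earliest_times(order, earliest, headway)
        total = sum(times[tr] - scheduled[tr] for tr in scheduled)
        if best is None or total < best[0]:
            best = (total, order, times)
    return best

# Hypothetical instance: train A is ready 10 minutes late, headway 3 minutes.
scheduled = {"A": 0, "B": 3, "C": 6}
ready = {"A": 10, "B": 3, "C": 6}
total_delay, order, new_times = reschedule(scheduled, ready, headway=3)
```

In this instance, keeping the original order A, B, C gives a total delay of 30 minutes, whereas letting B and C run ahead of the late train A gives 10 minutes, which is the kind of order reversal reported in the results.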

Results
We have implemented the model in a simple situation on the Reading-Paddington line, with data for 7 trains. In simple scenarios it shows that it may sometimes be useful to reverse the order of running trains, and sometimes beneficial to let trains overtake at stations to deal with disruptions (overtaking is not really possible between Reading and Paddington, but we allow it in the model to test these aspects). In one scenario we can see how the delay of a train departing from Reading propagates to a train going back from Paddington to Reading (same rolling stock).
Discussion
Strengths The model deals with disturbances on a more global scale by considering the impact of traffic-management decisions on the whole network. The solution is optimal with respect to preserving the original schedule as much as possible.
Limitations The model does not take into account detailed-level information, such as platforming, exact track topology, dynamic properties of rolling stock, etc. However, more constraints can be added if necessary to model some of these aspects.
Scalability is as yet unknown, but it appears that rescheduling an hour of traffic on the GW network should be feasible. The practicality of this approach has two parts: 1. Can we solve the model in reasonable computational time? 2. Is the resulting schedule feasible in practice? In practice, there will always be a trade-off between computational time and the quality of the new schedule.

Next steps (& Scalability)
• Test whether the model can be scaled up to usable size.
• Build a front end that will allow the user to study the effects of different traffic management decisions, and the propagation of various disruption scenarios.