Traﬃc conﬂict reduction based on distributed stochastic task allocation

The aim of this paper is to provide preliminary results on a traﬃc coordination framework based on stochastic task allocation. General trends and the predicted advent of personal aerial vehicles increase traﬃc rapidly, but current air traﬃc management methods admittedly cannot scale appropriately. A hierarchical system is proposed to overcome the problem, the middle layer of which is elaborated in this paper. This layer aims to enable stochastic control of traﬃc behaviour using a single parameter, which is achieved by applying distributed stochastic task allocation. The task allocation algorithm is used to allocate speeds to vehicles in a scalable way. By regulating the speed distribution of vehicles the conﬂict rates remain manageable. Multi-agent simulation results show that it is possible to control ensemble dynamics and together with that traﬃc safety and throughput via a single parameter. Using transient simulations the dynamic performance of the system is analysed. It is shown that the traﬃc conﬂict reduction problem can be transformed into a control design problem. The performance of a simple controller is also evaluated. It was shown that by applying the controller, quicker transients can be achieved for the mean speed of the system.


Abstract
The aim of this paper is to provide preliminary results on a traffic coordination framework based on stochastic task allocation. General trends and the predicted advent of personal aerial vehicles increase traffic rapidly, but current air traffic management methods admittedly cannot scale appropriately. A hierarchical system is proposed to overcome the problem, the middle layer of which is elaborated in this paper. This layer aims to enable stochastic control of traffic behaviour using a single parameter, which is achieved by applying distributed stochastic task allocation. The task allocation algorithm is used to allocate speeds to vehicles in a scalable way. By regulating the speed distribution of vehicles the conflict rates remain manageable. Multi-agent simulation results show that it is possible to control ensemble dynamics and together with that traffic safety and throughput via a single parameter. Using transient simulations the dynamic performance of the system is analysed. It is shown that the traffic conflict reduction problem can be transformed into a control design problem. The performance of a simple controller is also evaluated. It was shown that by applying the controller, quicker transients can be achieved for the mean speed of the system. as a first approximation (reflecting on the assertive nature of the movement) rather than the discrete behaviours [22,23] based on interval programming [24]. Aircraft and formations are expected to cooperate and share their current and planned behaviour parameters via a system similar to ADS-B. Behaviour parameters capture relevant intent in a compact form enabling a low communication burden. The high level solves a traffic-centred, distributed optimisation problem to adapt the stochastic Markov matrices of aircraft and formations to achieve the highest airspace capacity and utilisation while satisfying safety-related constraints. The aim is to find a trade-off between nominal throughput and the probability of conflicts (since conflict resolution manoeuvres cause deviation from the optimal path, thus, reducing traffic flow).
Due to the general risk-based approach to unmanned traffic management and the inherent establishment and transition period human supervision of the coordination system will be necessary. Human-system interaction within the UTM system has already been the subject of research [25] and is likely to be further investigated as the framework matures. From studies thus far it is already clear that the complete system needs to be considered when analysing human factors, and transparency would be key, as too much, too little or irrelevant information can lead to challenging situations as experienced during flight tests.
In the study conducted by Sadler et al. [26] the response times of pilots to resolution advisories and their compliance rates were measured and used as metrics, together with subjective opinion, to assess pilot performance. It was found that the way the ground control station is configured, the available input methods, the type of advisories and the automated functions all affect performance.
An integration of the three layers described above yields a unified traffic coordination framework capable of handling low-level collision avoidance and high-level optimal flow management in a combined manner. Therefore, a better overall understanding of the system is expected, enabling improved trade-off between system performance metrics. All coordination levels should operate with appropriate transparency to enable the human supervisor to understand the state of the system and make decisions appropriately. This is ensured by the choice of comprehensible decision variables, which are the parameters of the behaviour distributions.

Focus of the paper
In this paper, the middle level is described and elaborated, as it is the key layer for the framework being connected to both other layers, providing the interface. Also, this layer is believed to be the most complicated and contain most of the novel research results, as both formation flight and optimisation are fairly well researched. The aim of the middle layer is to ensure that the prescribed speed distribution (considered as the behaviour) is admitted by the agents, and thus, the probability of conflicts is controlled. Separation, conflict and intrusion are defined based on time and distance, the role of the middle layer is to ensure that enough time is reserved for appropriate reaction and conflict management if aircraft arrive at close proximity. Therefore at this level conflict is defined as the situation when the time elapsed between another aircraft closing from sensing range to conflict range is less than a predefined interval.
The key idea behind that is when the density of agents is high, the mean of the desired local speed distribution should be lower. On the other hand, if the density is lower, the mean of the speed distribution may be higher to enable a smoother flow of traffic, leading to the highest possible safe throughput. Even though this approach does not enable the regulation of the exact number of conflicts, their probability can be managed. This is achieved by manipulating only a single degree-of-freedom, changes in heading and altitude are still available for the low level for resolution should any intrusions occur. Because of the high traffic density, it is expected that the key issue to be tackled would be to handle emerging conflicts due to conflict avoidance manoeuvres. This is called the "domino effect", defined as an indicator of system stability [19]. By changing the overall local behaviour of agents it is expected that this problem is implicitly managed. Also, the aim of a traffic management framework (especially the higher levels) is not to minimise the number of conflict events, but to reduce the macroscopic complexity of the airspace [27]. Macroscopic complexity is a measure of how difficult it is to solve a given conflict scenario due to the number of participants involved. Through regulation of conflict probability, airspace complexity can also be managed.
For the higher level to be able to adjust the desired distribution easily, specific probability distributions have been selected as desired distributions. The parameters of the distributions may be regarded as the decision variables of the framework. Performance of truncated exponential distribution and the truncated normal distribution have been investigated within the scope of this paper. Normal distribution of agent speeds may resemble a generally smooth flow of traffic with deviations, while the exponential distribution may correspond to a situation involving emergency vehicles for which the probability of high speeds must be available. These exceptional, selected vehicles may show non-compliance, and occupy these speeds deterministically, while the rest of the vehicles need to adjust their states accordingly.
The general research question is how to formulate a stochastically bounded traffic management algorithm with favourable scaling and resilience characteristics. In particular, the question is how to limit the conflict rate of the complete system by varying the speed distributions of vehicles. The aim of this paper is to evaluate the applicability of distributed task allocation as the middle level of such a framework by performing a simulation campaign and analysing its results. Task allocation is used to drive the speed distribution of vehicles to the desired distribution and maintain it. The methods used for task allocation are based on homogeneous Markov chains (HMC [28]), inhomogeneous Markov chains (IMC [29]) and inhomogeneous Markov chains using local information only (LICA IMC [30]).
To analyse the operation of the system, agent-based numerical simulations were used, similarly to the well known AgentFly system [31]. Based on the simulation results key system dynamics can be identified and described, essential to a macroscopic model of the traffic flow, similarly to Macroscopic Fundamental Diagram (MFD) in ground transportation.

Structure of the paper
The paper is structured as follows. In Section 2 a survey of relevant sense & avoid techniques and traffic coordination methods is given. Section 3 provides a brief introduction of distributed task allocation algorithms. The framework used to carry out the simulations is described in Section 4. The steadystate numerical simulations to assess the applicability of the method are described in Section 5. Section 6 details transient simulations and analysis. Evaluation of performance with a regulated distribution parameter is given in Section 7. Concluding remarks are given in Section 8, and possible future work is outlined in Section 9.

Definitions and context
The description of the concept of free flight [15] already refers to aircraft conflict management as a "stochastic control problem" due to the several sources of uncertainty. Airspace is broken down into three regions with increasing levels of restriction as aircraft fly from low-density areas to the vicinity of terminals. Together with the description of the concept a principal analysis of functions, communication and data to be shared is provided. A "mixed focus" configuration is outlined defining the roles of aircraft crews, air traffic controllers and airline operators. Several conflict determination and resolution methods are presented categorised according to aiding logic type, control variables and whether they are single or continuous solutions. Three variables are identified as possible ways to avoid conflicts: speed, altitude and heading. Although the concept of free flight is clearly outlined, no safety or performance analysis is presented.
The feasibility of the free flight concept of operations for urban air mobility has already been investigated [32]. The aim of the framework elaborated is to enable safe and efficient flights and a high density of operations, building on the already established benefits of free flight. The problem of conflict resolution between aircraft while guiding them to their destination was formulated as a multi-agent Markov decision process and solved by the Monte Carlo tree search algorithm. The effect of creating airspace sectors and flights with higher priority was also investigated using numerical simulations. The scenario (2D level-flying aircraft in a high-density environment) and the concept (attempting to limit conflict probability by introducing coordination) are similar to those of this paper. An important difference however is that in the study by Yang and Wei [32] conflicts are resolved by heading changes, potentially leading to unsolvable situations. Also, the effect of the method applied can only be seen by a posteriori analysing the simulation data, thus, no assurances can be made before deploying the strategy.
With the transition to free flight, the traditionally centralised air traffic control becomes a distributed one [33]. To analyse the emergent behaviour of an integration of systems with the novel behaviour Petri net modelling and Monte-Carlo simulations are used. The overall goal is to understand how to design the free flight framework for optimal safety and capacity by understanding the process of multiple conflict resolution and providing an unbiased estimator for aircraft collision risk considered to be rare events. The conflict resolution itself is based on the well-known Modified Potential Field method. Even though the density of aircraft simulated is higher than usually encountered in general aviation, it still does not compare to projected PAV (Personal Aerial Vehicle) density.

Sense and avoid methods
The robustness of a collaborative sense and avoid algorithm in the terminal control area is investigated by Allignol et al. [34], using an adaptation of the ORCA (Optimal Reciprocal Collision Avoidance) algorithm [35], changing the heading of UASs (Unmanned Aerial System) to avoid conflicts with aircraft on a fixed trajectory. Conflict and separation are defined based on the distance to other aircraft for a given lookahead time. Probability theory and random variables are used in this context to assess the effect of navigation accuracy on the performance of the algorithm. This analysis shows how important it is to take the effect of noise and uncertainty into account. However, this analysis only considers normal distribution for position and velocity, since the distribution is a result of inaccuracies, not intentional stochastic design.
A probabilistic conflict detection algorithm is described by Hao et al. [36]. The underlying assumption is that waypoints and arrival times are constrained, therefore the reachable set of a given aircraft and the probability distribution of future positions changes continuously with time. Using a truncated spatio-temporal Brownian bridge motion model the conflict probability of multiple aircraft can be calculated in space-time. This method is only applicable to trajectory-based operations, not in a free flight scenario.
A probabilistic collision avoidance algorithm for unmanned aircraft can be formulated based on Markov decision processes (MDPs) and partially observable Markov decision processes (POMDPs) [37]. The key advantage of this method is that it can accommodate a wide range of sensor types. After a state-space reduction, POMDP solvers can be used to produce optimal policies which can be implemented in real-time. Even though the solution is optimal for different sensor configurations, this algorithm does not consider the higher level, ensemble related performance of the system.
For traffic management and coordination, a novel air traffic flow model is presented by Alam et al. [38] based on functional airspace blocks. Distributed 4D trajectory planning is achieved based on input from airlines, airports and Air Navigation Service Providers (ANSPs) and centralised planning by the central flow management unit (CFMU). The overall goal of the algorithm is to minimise the interaction between trajectories by applying route/departure-time allocation techniques. Departure-time techniques adjust planned departure time within bounds, while route modification consists of waypoint repositioning. This enables a MIP (Mixed-Integer Optimisation Problem) formulation. Interactions are identified using a four-dimensional grid structure and iteratively minimised using a hybrid method with local search (LS) in the inner-loop and simulated annealing (SA) in the outer. The presented distributed method, however, is only applicable for pre-planning purposes.
Another distributed, 4D path planning algorithm is introduced by Egorov et al. [27] for an unstructured airspace, emphasizing the importance of encounter aware planning. The necessary characteristics of a traffic management system are described as robust, reliable and scalable. During the off-line, pre-planning phase, trajectories are optimised considering the probability distribution of the types of conflicts according to the number of vehicles affected. As a metric encounter shift is introduced. The efficiency of the method is proven by simulations for a multi-modal traffic region. The method, however, is only applicable off-line for strategic deconfliction, and thus, it cannot adjust for actual operational deviations.

Airspace and system-level analysis
The stochastic simulator presented by Lecchini et al. [39] is able to take flight related uncertainties into account. In this stochastic environment conflict resolution is formulated as a constrained optimisation problem, which is solved by the Markov Chain Monte Carlo method. A drawback of the applied algorithm is that it can only be applied off-line.
Having a prescribed structure of the airspace can provide an a priori separation procedure for aircraft, and thus, limit the probability of conflicts. The relationship between capacity and structure is analysed by Sunil et al. [40,41] for four structures: full mix, layers, zones and tubes, each further constraining the traffic flow movement together with possible resolution methods. For the first three, separation is defined based on distance, for the 4D tubes time-based separation is used. Simulation results show that airspace structure has an influence on conflict probability, the best safety, efficiency and stability metric can be achieved by segmenting traffic into altitude bands according to heading range.
The introduction of structure to our framework, and studying its effect on safety and efficiency is part of our future plans. The current paper, however, focuses on how to apply constraints and structure just when they are necessary. These constraints are applied by changing the speed distribution. Also, it is unlikely that PAVs could occupy several altitude bands, therefore, introduction of structure into the UAM airspace may be difficult.
Since all analysis in this paper is performed in 2D, the methodology may be compared to land-based systems. Self-organisation and emergent formations of a swarm of autonomous, car-like vehicles was investigated by Vanualailai et al. [42,43]. The papers investigate the emergent dynamics of a Lagrangian swarm. It is shown using computer simulations that the resulting formations are cohesive, well-spaced and bounded around the centroid of the formation. The focus of the papers, however, is on the patterns arising and formation keeping. Therefore, the results are not directly comparable.
Macroscopic modelling, MFD (Macroscopic Fundamental Diagram) and aggregate level description are well-established methods in the field of urban ground mobility. MFDs describe the relations between traffic flow, density and velocity [44]. MFDs can be used to apply high-level control to regulate ensemble dynamics. The control goal is to maximise throughput and stabilise traffic flow. A feedback regulatorbased gating method is described by Keyvan-Ekbatania et al. [45]. For control design, a basic dynamics model is used, developed based on the NFD (Network Fundamental Diagram) of the entire network. For a heterogeneous urban network, an optimal perimeter traffic flow controller can be derived analytically, as discussed by Aalipour et al. [46]. These papers show how important modelling aggregate dynamics and traffic flows are for proper control design. However, these models are generally applicable for ground traffic networks.
The papers referenced provide an overview of the research related to general air traffic management, conflict probability calculation and conflict resolution. However, no methods have yet been advised to bound the conflict probability without having hard restrictions on airspace structure. This paper presents a novel coordination method in which constraints on the trajectory, namely, changed in speed are applied only when they are necessary. This is in line with the general concept of free flight. In this paper, the regulation of conflict probability is investigated by modifying ensemble speed distribution using the distributed stochastic task allocation algorithm. This enables a real-time, distributed coordination method. Because of the appropriately defined parameters the current state of the agent can be efficiently disseminated, therefore the communication burden of the algorithm is low.

Distributed Task Allocation
The technique at the core of the method presented is distributed stochastic task allocation, therefore in this section, a short overview of such algorithms is given. All referenced articles deal with formation flight of UAVs, where formation is accomplished in an airspace partitioned into disjoint bins. These bins can also be interpreted as tasks, and partitioning as task allocation as long as tasks are disjoint. Task allocation algorithms in general refer to algorithms which assign sub-tasks to agents to maximise a utility function. Tasks may be interpreted as speeds of the vehicles. Thus, in this paper, the objective for task allocation algorithms is to drive the system of vehicles to the desired speed distribution by assigning speeds to each vehicle. This way the system-level speed distribution of vehicles can be driven to the desired one.
For each UAV, the target bin is determined stochastically. The same framework and algorithms may also be used for task allocation to ensure convergence to an optimal behaviour/speed distribution. The algorithms presented in this section were implemented in the simulator and used to drive the vehicle speeds to the desired distribution so that the emergent dynamics of the system could be observed. The key aim of this paper is to assess and evaluate the suitability of task allocation algorithms for the realisation of the middle layer of the traffic management framework described in Section 1.

Homogeneous Markov Chains (HMC)
The probabilistic guidance law based on Markov chains was introduced an later elaborated by Açikmese and Bayard [28,47]. A predefined desired formation (desired density distribution) of agents was achieved using an open-loop (no feedback from the actual state), stochastic, decentralised algorithm, formulated the following way. Let x (t) be the vector of probabilities corresponding to the tasks/states/speeds at time t such that 1 T x (t) = 1, with a given desired distribution . The probability vector evolution is given by Equation (1).
The matrix M is the Markov matrix containing the state transition probabilities. The next definite state is given as y (t + 1) according to Equations (2) and (3), where y is a vector with a single 1 as the jth element, and all other elements 0. (2) In Equation (2) z is a random number with uniform probability distribution such that z ∈ [0, 1] holds. The Markov matrix (M) is synthesised using the Metropolis-Hastings algorithm, as given by Equation (4), where I denotes the identity matrix.
Since not all state transitions are physically possible (due to acceleration constraints), an adjacency matrix (A a ) defined in Equation (5) indicates permitted transitions (Equation (6), representing the Hadamard product).
The elements of the proposal matrix follow Gaussian distribution (Equation (7)).
In Equation (6) c i is the vector describing the states and the function g (z;η, δ) (Gaussian distribution) is defined as in Equation (8).
The elements of the acceptance matrix are subject to the constraint in Equation (9).
Acceptance matrix elements are determined as in Equation (10), with α ∈ (0, 1] being a design parameter. In Equations (9) and (10) the matrix R is defined according to Equation (10), where θ is the vector of desired (steady-state) distribution.
Using Equations (4)-(11) the Markov matrix can be constructed, which can be used to determine states at the next time step (Equation (1)).
Speed at the next time step is determined according to Equation (12), where V is a prescribed speed vector.
The main advantage of this method is that only the desired state needs to be transmitted to agents, who realise the Markov chain independently, no communication about the state is necessary. As no feedback is incorporated, the Markov matrix is constant in time, which leads to a homogeneous Markov chain (HMC). Asymptotic convergence of swarm distribution is guaranteed to desired distribution if the spectral radius condition is satisfied. The algorithm was extended to handle collaborative sub-swarms (composite swarms) using consensus techniques. A Linear Matrix Inequality (LMI) formulation of the problem has also been elaborated ensuring convergence with a prescribed exponential rate, also minimising a cost function. Scalability was demonstrated by applying the method with the desired distribution given as a greyscale image [47].

Inhomogeneous Markov Chains (IMC)
The swarm guidance method based on Markov chains was later extended by Bandyopadhyay et al. [29] by incorporating feedback from the current distribution state into the transition matrix synthesis. This lead to a closed-loop strategy by generating inhomogeneous Markov chains. Markov matrix changes in time due to the feedback, thus, the Markov chain is inhomogeneous (IMC). The algorithm was proven to converge and minimise transition costs. Apart from the Markov chain construction algorithm (Equation (4)), the same speed/task/bin allocation method can be used as in the case of homogeneous Markov chains, the optimal Markov chain is constructed using the following algorithm. The feedback gain is the Hellinger distance (for the kth agent at time instant t given by Equation (13)) between the current and desired distribution, if it is sufficiently low, the transition matrix is changed to identity to avoid unnecessary transitions.
In Equation (13) μ k t denotes the estimation of agent k at time t of the total agent distribution μ * t defined as in Equation (14), m t denoting the total number of agents at time t, and r k t is the current task of agent k at time t.
As an estimation method, Bayesian Consensus Filtering [48] may be used. Given the feedback (ξ ) and the transition cost matrix (C), the problem can be formulated as a linear program subject to constraints giving the optimal Markov matrix as in Equation (15).
Parameters ε M ∈ (0, 1] and ε C ∈ IR + in Equation (15) are constant design parameters. The technique is also extended with a time-dependent method to facilitate initial convergence and reduce undesirable transitions after sufficient convergence and a deterministic procedure to escape transient bins.

Local information consistency -Inhomogeneous Markov chains
The feedback-based swarm guidance algorithm was further developed Jang et al. [30] to function without having to rely on global information or an estimation of the distribution of the entire swarm. This is called Local Information Consistency Assumption (LICA) as opposed to the general Global Information Consistency Assumption (GICA). By using local information only the algorithm remains scalable with the number of agents involved, an important aspect when utilised for traffic management on a large scale. Since the method is fundamentally applied to probabilistic swarm guidance in a bin partitioned world, in the paper local is defined in terms of neighbouring bins (transition is possible within a single time step) as opposed to a definition based on communication range used in this paper. It is assumed that aircraft cooperate, and share navigation data with all others within their communication range, which are considered to be the local quantities. Local current and desired distribution are defined as given by Equations (16) and (17) respectively. The convergence of local quantities is equivalent to that of global quantities if the topology graph of the bins is at least bidirectional and strongly connected.
The feedback is defined according to Equation (18), saturated to ε ξ , 1 , α denotes the sensitivity parameter. Markov matrix synthesis is similar to Equation (15), and is given by Equation (19). An important feature of this LICA-based method is that using local information only, and enables an asynchronous implementation. Algorithms for Markov chain synthesis under flux upper limits, and, for local information based quorum model are also reported by Jang et al. [30].
The above-described algorithms were implemented with a behaviour/task allocation perspective to achieve the traffic coordination goals introduced. Within the context of the paper, x is the probability vector corresponding to potential speed states, is the desired speed state distribution of vehicles, A a is the matrix representing the acceleration constraints, and V is the vector of potential speeds within limits a and b.

Simulator platform
The primary aim of the simulations is to determine whether distributed task allocation algorithms are applicable for stochastic traffic flow management to enable a high-level optimisation layer to control ensemble dynamics and characteristics using a single parameter.
As it is not common in air traffic management research, to carry out the analysis and study the emergent dynamics, a custom simulation environment was created in Matlab.
This way the fidelity and complexity of the simulation modules could be appropriately decided with full control over all the parameters and the possibility to easily extend the framework. Since there are no external data flows to be considered, all the vehicles are synthetic and the point of the simulation is to explore the system rather than individual vehicles, the simulation time is different than real-time.
The architecture of the simulation framework is based on an agent-based modelling approach. The system is solved in a time-based manner, with fixed time steps. Each vehicle is modelled as an agent which moves and makes decisions according to the implemented logic and the location of surrounding vehicles and the information transmitted by them. The speed for the next simulation step is decided based on the task allocation algorithm run individually by each agent, respecting the acceleration constraints. Then, a motion step is taken by the vehicle. A simulation step is completed once all agents made a decision and a movement step.
In case of centralised task allocation algorithms the Markov matrix is generated centrally and passed on to each agent. For decentralised algorithms, it is generated based on the state information of the surrounding vehicles. Since for the communication model a disk radio is used, the decision of whether the information needs to be taken into account from another vehicle becomes a distance calculation. When task allocation is asynchronous, agents are put in a random order, then synthesise the Markov matrix and update the state sequentially.
During the simulations conflict events are detected and stored centrally, with the ID of all vehicles involved. The time from the sensing range to the conflict range is measured. Vehicle states and speeds are also recorded, together with state changes. Based on these, the conflict rate and throughput metrics, the number of transitions and the distribution distance from the desired can be calculated.
Since the results for iterations and different parameters are independent in steady-state, the simulations can be run in parallel. Therefore, to run the simulations a high-performance computing facility was used. The general run time was of the order of a few hours for the set of distribution parameters.
In this paper, a free flight scenario is analysed with en-route traffic, maintaining level flight, therefore it is sufficient to handle the problem in 2D. Conflicts are defined as stated in Section 1 and elaborated below. As a behaviour (task), speed changes are considered only, conflict resolution via heading changes are preserved for the deterministic low level.

Simulation scenario
The simulations are performed in a 100 nmi 2 region (10 nmi × 10 nmi) as an "infinite world", in order to maintain the total number of agents.
An "infinite world" means that once an agent crosses the border of the simulation area it reappears on the opposite side, with all other parameters (state, speed, heading, etc.) unchanged. Aircraft density was chosen in line with urbanisation trends and market penetration predictions based on the analysis by Sunil et al. [49]. Considering the area simulated, the expected density and that the number of vehicles need to the integers, simulations were performed with the following agent numbers: 165, 211, 258, 305. The resulting densities of vehicles match the scenarios analysed by Sunil et al.
Within the simulation, aircraft models need to be representative of urban air mobility vehicles. Considering the diversity of such vehicles, instead of modelling a specific type of vehicle, a surrogate model is developed based on kinematics and certain dynamic characteristics, which makes the analysis more general.
For aircraft kinematics a simple integrator model is considered as given by Equations (20)- (23), where ψ denotes the heading, t is the discrete time step and (x, y) is the position of the aircraft. The speed of the aircraft is denoted by v, v x and v y are the velocity components in the x and y direction.
The discrete time step was chosen based on rough approximations and estimations of the dynamics of personal aerial vehicles [50,51]. Since the aim of the middle layer is to determine the prescribed behaviour of the bottom level and provide a reference to the guidance algorithm, therefore the time step size was selected considering the settling time of the step response dynamics, which is t = 5 s for both longitudinal velocity transfer functions in the cited references. The modelled dynamics are generally representative of a wide range of vehicles, independently of their architecture and design.

Speed distributions
For the speed distributions, the truncated exponential distribution (Equation (24)) and the truncated normal distribution (Equation (25)) were selected. Truncated exponential distribution is given by Equation (24). Lower limit is a, upper limit is b, rate is λ.
Truncated normal distribution is given by Equation (25). Lower limit is a, upper limit is b, mean is η, standard deviation is σ .
The probability density function of the standard normal distribution is given by Equation (26), the cumulative distribution of the standard normal distribution is in Equation (26) (where erf is the error function).
As a distribution parameter the rate (λ) is considered in case of the exponential distribution and the mean (η) in case of the normal distribution.
Speed limits were chosen taking into consideration the operation in an urban environment. Therefore the minimum speed is set as a = 15 Kn (below this speed, manoeuvres are considered low-speed [50,52]), the maximum speed is set as b = 180 Kn. Thus, the average roughly matches the speed forecast by Perfect et al. [52] for the decision variable limits, and the range is in harmony with the characteristic speeds [2,41].
Decision variable limits are selected so that the mean speed values change between 20 and 90 Kn in order to retain the necessary freedom and flexibility. For the exponential distribution the range was chosen as λ ∈ [0.0025, 0.2]. For the normal distribution the standard deviation is kept constant at σ = 10, and the mean is adjusted as η ∈ [5,90].
The dynamics model has a relationship with the task allocation adjacency matrices since it governs the available subsequent states (speeds), which is based on the level flight acceleration limits of the aircraft. Considering passenger comfort, the acceleration limit is specified as 1Kn/s ≈ 0.5m/s 2 [53], which seems to match the average acceleration capabilities of personal aerial vehicles [50]. Variation of acceleration limits with speed is not considered here. Given the time step of t = 5s the proposal and adjacency matrices are defined to enable transitions to states with a speed deviation of ±5Kn. Based on dynamic limits, the minimum speed was chosen as a = 15Kn, the maximum speed as b = 180Kn and the acceleration limit for the t = 5s timestep as 5Kn. Thus, the available states were created by dividing the range between the minimum and the maximum speed into 5Kn wide segments (15, 20, . . . , 180).

Description of the environment
Since the scenario considered is personal aerial vehicle traffic in an urban environment reduced separation margins are used. For levelled flight only horizontal separation is meaningful, which was defined as R C = 0.135nmi [41]. Since cooperation between aircraft is assumed and separation is achieved by a distributed algorithm, communication range has to be defined. The nominal range of ADS-B is greater than 90nmi (air-to-air) for low interference (FRUIT -False Replies Unsynchronised to Interrogator Transmission) conditions [54]. For a high-traffic-density scenario, such a high range is neither necessary nor probable (because of the high interference due to high communication density). It is assumed that the communication range scales similarly to separation limits (standard ATM separation for ADS-B equipped aircraft [55] of 5nmi reduced to R C = 0.135nmi), thus, the data broadcast range used in this paper is 2.5nmi. For the LICA based algorithm, local states and quantities are the ones within this range. To verify that data dissemination is possible, the density capacity of the ADS-B protocol was estimated. Media access is based on TDMA: each agent transmits at most 6.2 messages of 120µs on average every second [56], therefore ideally 1,344 aircraft can be handled simultaneously. For the simulated agent densities, the number of aircraft within the communication range is significantly less. Hence it is assumed that the communication part of the system introduces no errors.
The intention of the middle layer is to reduce conflict probabilities, the probability of simultaneous conflicts and conflict propagation. Moreover, it should ensure that enough time is reserved to resolve conflicts should they occur. Therefore, a conflict event is considered to be the incident when the traversing time of an intruder aircraft from the sensing range (communication range, 2.5nmi) to the conflict range (R C = 0.135nmi) is less than a predefined threshold. When selecting the conflict threshold time several factors have to be taken into account, for example, the performance of conflict detection and conflict resolution algorithms, passenger comfort levels and their relationship with acceleration and its time derivative (investigated e.g. by Dousse et al. [57]). For the simulations in this paper, threshold times of 30, 40, 50 and 60s are investigated. Similarly to TCAS [58], a 5s transition time is assumed based on settling time of the yaw transfer function [50], with a worst-case relative speed of b = 360Kn. For traffic management GPS and ADS-B technologies, separation limits have long been expected to become time-based, defined according to wake turbulence intensity [59]. The highest 60s separation corresponds to the current limit for light aircraft in a terminal manoeuvring area [60]. Given the allowable collision probability threshold, the separation intervals can be determined, similarly to the method presented by Bell [61].  Figure 1. Illustration of the deviation penalty (P). The ownship is the aircraft from the viewpoint of which the scenario is considered. Any other aircraft is considered to be an intruder.

Performance metrics
The quality metrics considered reflect on the original objectives, which is to achieve the highest traffic throughput while complying with safety requirements. To evaluate the performance from a safety perspective the number of conflicts normalised by the number of agents and the time elapsed is calculated. The definition of the safety metric (S) is given by Equation (28).
In Equation (28) the total number of agents is denoted by n agent , C is the total number of conflicts.
To analyse the effects on throughput the mean speed of the agents is evaluated, taking into account that a conflict resolution manoeuvre results in detours and reduction in speed towards the destination, therefore conflict events are penalised. The throughput (T ) metric is thus defined as the average ratio of distance travelled towards the target and the total elapsed time, given by Equation (29).
In Equation (29) v i is the speed of the ith agent and P is the penalty associated with conflicts, which was defined to be the difference between the direct and deviated path length, assuming both agents seek total separation distance for safety reasons ( Fig. 1 and Equation (30)).
It is assumed that both aircraft start their coordinated conflict avoidance manoeuvres once they are within each other's sensing range (r S ), and seek a distance of r C from the original path of the other aircraft.
Furthermore, the number of transitions is counted to help assess the practical applicability of the method.
Steady-state simulations are performed for 900 time steps, making sure that the sliding window mean variation of the metrics decreases below 1%.
Simulations with the same setup and parameters are performed 128 times to account for the stochastic nature of the framework. The number of repetitions is chosen after a convergence analysis, together with requirements from the computing service.
Initial positions and headings of agents are chosen randomly according to a uniform distribution. For steady-state simulations, initial states (speeds) are chosen according to the distribution parameter.

Configuration
To analyse the emergent dynamics in steady-state, simulations were run with the distribution parameter kept fixed. The simulation was initialised by setting vehicle positions randomly in the simulation area, according to a uniform distribution. Headings were also generated according to a uniform distribution. Initial speeds were assigned according to the desired distribution with the parameter matching the studied case. Simulations were repeated to account for their stochastic nature.

Simulation results for steady state
Based on the simulation results the effect of distribution types and parameters, agent density and performance of algorithms can be compared. Figure 2 shows the relationship between the safety metric and the distribution parameter for an exponential desired distribution, using the HMC algorithm. This relationship proves that by varying the distribution parameter the safety metric can be regulated.
For the exponential desired distribution the safety metric is effectively zero for λ > 0.07, and rapidly increases with decreasing parameter values (increasing mean speed). The effect of agent density can also be seen, with increasing density the number of conflicts and the safety metric increases as well.
The throughput metric also increases with decreasing parameter values (Fig. 3), however, it is less affected by the agent density.
Similar trends can be seen when the desired distribution is normal, the safety metric increases with increasing parameter (increasing mean speed) and agent density (Fig. 4).
The appropriate functioning of the algorithm is verified by Figs 6 and 7, which show the mean speed of agents for all agent densities as a function of the distribution parameter. Mean speed changes with the variation of the parameters as expected.
In Fig. 8 the desired distributions are compared by plotting the safety metric as a function of mean speed, showing that for the same mean speed the safety metric is lower (fewer conflicts) for normal distribution than it is for the exponential distribution.
Normalising the safety metric with agent density shows that the dependence is linear, normalised metrics for a given distribution are almost indistinguishable (Fig. 9).
The choice of the conflict threshold time is based on estimated dynamic and detect-and-avoid performance, however, it is arbitrary to some extent. The relationship between the time threshold and the number of conflicts was investigated by analysing the histograms of traversal times from the sensing to the conflict region. The traversal time is the time it takes the intruder to arrive at the conflict range of the ownship after it had reached the sensing range. A representative example for traversal times for  Traversal times for the exponential distribution are generally lower, and more evenly distributed. For the normal distribution, the traversal times heavily peak at t = 50s, showing the sensitivity of metrics to threshold selection.
Comparison of algorithms in terms of safety and throughput can be seen in Figs 12 and 13, showing the performance of all algorithms is identical.
It is also important to assess the number of state transitions needed to achieve the desired state distribution, a comparison can be seen in Fig. 14. Since there is no feedback mechanism in the HMC algorithms transitions still occur in steady-state, therefore the number of transitions is three orders of magnitude higher than for IMC algorithms. The global information-based IMC algorithm generates the least number of transitions, the algorithms using local information only require a higher number of transitions within a time step (compared to global information IMC), however, this number seems to be independent of whether the transitions occur simultaneously or asynchronously.
The final distance form the desired distribution (total variation distance) achieved by the algorithms can be seen in Figs 15 and 16. The IMC algorithm generally performs better than HMC, achieving a smaller distance, however, both are outperformed by the local information based IMC algorithm, especially for parameters related to lower mean speeds. Conflicts are also classified according to the number of aircraft involved. The number of conflicts involving two aircraft can be seen in Fig. 17 (for the HMC algorithm, results are almost identical for all used algorithms), showing tendencies similar to the safety metric, indicating that with higher mean speed or agent density the number of conflicts increases.
Normalising the results with the cube of the number of agents involved the curves are almost identical, demonstrating a quadratic relationship between agent density and the number of conflicts, independent of the speed distribution (Fig. 18).
The number of conflicts involving three aircraft is shown in Fig. 19, with the trend being similar to the trends of conflicts involving two aircraft only, but values about two orders of magnitude lower.
When normalising the number of triple conflicts with the cube of the agents simulated the results become similar for all agent densities, indicating a cubic relationship. This also explains why the safety metric scales linearly with the number of agents, which is because conflicts predominantly involve  two aircraft only, and the number of conflicts is already divided by the number of agents according to Equation (28). The number of conflicts involving more than three aircraft at the same time was negligible, their probability requires further simulations and analysis, possibly using techniques for rare event simulation. The explanation for the observed scaling laws is probably related to the uniform distribution of aircraft, the probability that two aircraft reach close proximity scales quadratically with the number of agents (density of agents in proximity scales linearly for each aircraft). The probability of the proximity of a third aircraft to two aircraft already in conflict also scales linearly, explaining the cubic relationship.
To further analyse the dynamics of intrusion, five ranges were defined around the own aircraft, with logarithmically spaced radii (2.5, 1.2, 0.58, 0.28, 0.135nmi). The probabilities of transitioning to an inner radius, provided that the intruder has already arrived at the current radius have been calculated, and are given in Table 1. Results show that transition probabilities are independent, and are closely related to the ratios of radii of ranges.
Steady-state simulations show that the safety and throughput metric can be governed through the choice of the distribution parameters, applying the stochastic task allocation algorithms. Specifying   the desired distribution more accurately, and IMC algorithms, in general, require a much lower number of transitions.

Configuration
To assess the dynamic performance of the algorithms transition simulations are performed starting all agents from the first state (lowest speed). The distribution parameter is kept fixed, the time it takes for the system to reach the desired state and its dynamic characteristics are observed.
Step response simulations are also performed to determine the time constants of the system.  The calculation of the safety metric was modified, a moving average is calculated to provide more relevant information about the conflict state of the system. Information on conflicts, conflict rate, throughput, traversal times, distribution distance and the number of transitions were collected. Simulations were repeated to account for their stochastic nature.

Transient performance of homogeneous Markov chains
Results show that for parameters corresponding to lower speeds the rate of convergence is satisfactory, however, for higher speed parameters the rate is acceptable for the normal desired distribution only. For mean speed and throughput, the transient resembles that of a first-order system for both distributions. For normal distribution both speed and throughput are well-converged by the time reaches 1,000s (Figs 20 and 21).
However, for exponential distribution (Figs 22 and 23) results are still changing at 5,000s, which questions the practical applicability of such a transition.
With the moving mean the transition of the safety metric converges with oscillations for normal distribution (Fig. 24), but fails to converge for exponential distribution (Fig. 25).  Distribution distance decreases rapidly compared to the initial distance (Figs 26 and 27), still, it remains quite high for the exponential distribution to hinder convergence.
Since for both distributions the HMC algorithm is used, the number of transitions is the same, as shown in Figs 28 and 29.

Transient performance between distributions
The transition from a normal distribution to an exponential one corresponding to the same mean speed and back was also investigated. This may correspond to the scenario outlined in Section 1, when the distribution has to be changed due to the appearance of an exceptional vehicle, and then back once it has left the area. Using the HMC algorithm, changing the desired distribution for the time span of 500-2,000s, the mean speed changes only slightly after the transitions (Fig. 33), together with a slight drop in throughput (Fig. 34, as expected based on steady state results in Fig. 13). The predicted growth in the safety metric is also noticeable in Fig. 35. The distribution distance evolves as expected with peaks at the distribution changes dropping with convergence (Fig. 36).

Time constants
To further analyse the transient performance of the system at different operating points, for each point the parameter corresponding to a mean speed 10Kn higher and lower was calculated and set as a transient goal. Thus, the step responses of the system for the different operating points could be analysed. Figures  37 and 38 show representative examples of transitions. The fundamental reason for outliers is that fitting for a stochastic response is not exact, still, the general trend is clearly visible. For exponential distribution, because of the poor convergence, only a few fits were possible. However, the general tendency is still evident. For parameters related to smaller speeds, the convergence is faster, even though the step size was set to be the same. For exponential distribution, for higher speeds, this tendency can also be seen in Fig. 38.
The behaviour and performance of the system are now fully analysed in steady-state and transient conditions for a set parameter, thus, analysis for a varying or even controlled parameter becomes possible.

Multi-agent simulations with a regulated distribution parameter 7.1 Configuration
Fixed distribution parameter simulations have confirmed that by changing the rate of the exponential distribution or the mean of the normal distribution the throughput and safety measures can be adjusted, therefore an attempt has been made to regulate these performance metrics. The overall goal of the middle layer is to enable ensemble dynamics to be controlled by the high-level optimisation by adjusting a single parameter only, therefore the simulation has been modified to allow for a feedback-based control. The aim of the controller was to speed up transitions from one parameter to another while respecting limits on the safety metric. Since transient simulations showed that system dynamics are essentially first order, there is no limit on the proportional gain permissible considering the mean speed (and the closely coupled throughput metric) only, the limit is imposed only by the admissible safety metric. Figures 41  and 42 show results of transient simulations for normal desired distribution stepping from η = 45 to η = 55.
For the transitions, a simple proportional controller is used, given by Equation (31) together with the evolution of the corresponding safety metric.
In Equation (31) P is the proportional gain, η des is the desired distribution parameter, f (v) is a function which takes the current mean speed as an input, and calculates the corresponding parameter according to the steady-state relationship and η u is the control output. Speeds are initialised for η = 45. Apart from the controller and the initial speed, the configuration is the same as in the previous cases. Agents' initial position and heading are assigned according to a uniform distribution. Information on conflicts, states, speeds, conflict rate, throughput, traversal time, number of transitions and distribution distance is collected. Simulations were repeated to account for their stochastic nature.

Results
The general trends are clearly visible from Figs 41 and 42, with increasing the proportional gains the transition becomes faster, the speed error reduces, yet, the corresponding safety metric increases as well. However, without a proper dynamic model of the safety metric control is impossible, especially because of the long delay. Moreover, it is difficult to define the limit of admissible safety metric without any knowledge of the performance limits of the underlying layers. Still, it has been demonstrated that the traffic collision reduction problem can be transformed into a control design problem.

Conclusion
This paper proposes a traffic management framework and elaborates the middle layer, which is based on distributed stochastic task allocation. The aim, which is to coordinate the ensemble dynamics of traffic using a stochastic algorithm with a single parameter input is proven by the extensive simulation campaign to be appropriately accomplished. Steady-state and transient analysis of the system was performed to understand key dynamics and relationships between parameters. Four algorithms were implemented for evaluation: HMC (homogeneous Markov-chain), IMC (inhomogeneous Markov-chain), LICA (local information consistency assumption) IMC and asynchronous LICA IMC. Performance was investigated in terms of two metrics defined, safety and throughput. Steady-state simulations uncovered two scaling laws, the safety metric scales nearly linearly with agent density and number of conflicts simultaneously involving a number of aircraft scales exponentially with the number of aircraft involved. Relationship between metrics and parameters of two desired distributions, the exponential and normal distribution was calculated. Simulations showed that the same performance can be achieved by all algorithms studied, however, IMC-based algorithms can achieve this performance with a much fewer number of transitions. The results of the transient simulations indicated that convergence is much faster for a normal desired distribution than for an exponential one, however, a smooth transition between the two for the same mean speed is possible. Another result is that IMC-based algorithms perform much worse for transient than HMC, therefore later the two were combined, HMC for setpoint changes and IMC for keeping a set point. To improve the transient performance a simple controller was used to regulate the desired parameter for normal distribution. It has been shown that the traffic conduction problem can be transformed into a control design problem through the distributed task allocation algorithm.

Future work
Future work will investigate the integration of the proposed middle layer with the low-level ad-hoc formation and collision avoidance methods and the high-level traffic flow optimisation. The practical applicability of the framework for personal aerial vehicles also has to be studied in detail. Also, safety and throughput results for the predicted densities and dynamic characteristics show the system is in a stable state for traffic flows. Therefore, to study system dynamics in all regimes additional analysis is necessary to investigate the behaviour of flow diagrams past and around the maximum flow point in steady-state and transient.