Improvements in learning to control perched landings

Published online by Cambridge University Press: 04 May 2022

L. Fletcher* (Department of Aerospace Engineering, University of Bristol, Bristol, UK)
R. Clarke (Department of Aerospace Engineering, University of Bristol, Bristol, UK)
T. Richardson (Department of Aerospace Engineering, University of Bristol, Bristol, UK)
M. Hansen (Bristol Robotics Lab, Bristol, UK)

*Corresponding author. Email: liam.fletcher@bristol.ac.uk

Abstract

Reinforcement learning (RL) has previously been applied to the problem of controlling a perched landing manoeuvre for a custom sweep-wing aircraft. Previous work showed that training with atmospheric disturbances through domain randomisation improved the real-world performance of the controllers, leading to increased reward. This paper builds on that project, investigating enhancements and modifications to the learning process to further improve performance and reduce final-state error. These changes include augmenting the observation by adding airspeed information to the standard aircraft state vector, applying further domain randomisation to the simulator, optimising the underlying RL algorithm and network structure, and changing to a continuous action space. Simulated investigations identified hyperparameter optimisation as producing the most significant increase in reward. Several test cases were explored to identify the best combination of enhancements. Flight testing was performed, comparing a baseline model against some of the best-performing test cases from simulation. Generally, test cases that performed better than the baseline in simulation also performed better in the real world. However, flight tests also identified limitations of the current numerical model: for some models, the chosen policy performs well in simulation yet stalls prematurely in reality, a problem known as the reality gap.
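
As a concrete illustration of the observation change described above, the airspeed can be appended to the state vector by a thin environment wrapper. The following is a minimal sketch, assuming a Gym-style interface; the environment and its `airspeed` attribute are hypothetical stand-ins for the authors' simulator, not their actual code.

```python
import numpy as np
import gym


class AirspeedObservation(gym.ObservationWrapper):
    """Appends an airspeed reading to the standard aircraft state vector."""

    def __init__(self, env):
        super().__init__(env)
        # Widen the observation space by one dimension for the airspeed term.
        low = np.append(env.observation_space.low, 0.0)
        high = np.append(env.observation_space.high, np.inf)
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def observation(self, obs):
        # Hypothetical accessor: the simulator exposes an airspeed estimate.
        airspeed = self.env.unwrapped.airspeed
        return np.append(obs, airspeed).astype(np.float32)
```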

Information

Type
Research Article
Creative Commons
CC BY 4.0
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of the Royal Aeronautical Society

Figures and Tables

Figure 1. Sweep-wing Bixler 2 during flight testing.

Figure 2. Perched landing diagram.

Table 1. Domain randomisation parameters
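
Conceptually, each parameter in Table 1 is redrawn at the start of every training episode. The sketch below illustrates the pattern with made-up parameter names and ranges; the parameters and bounds actually used are those given in Table 1.

```python
import numpy as np

# Illustrative names and ranges only; see Table 1 for the real parameters.
RANDOMISATION_RANGES = {
    "mass_kg": (1.1, 1.5),
    "wing_area_m2": (0.28, 0.34),
    "cg_offset_m": (-0.02, 0.02),
}


def randomise_simulator(sim, rng):
    """Redraw each physical parameter uniformly at the start of an episode."""
    for name, (lo, hi) in RANDOMISATION_RANGES.items():
        setattr(sim, name, rng.uniform(lo, hi))
```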

Figure 3. Training graph, showing reward against time steps for several values of the n_steps hyperparameter.
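
The sweep behind Fig. 3 amounts to retraining the same agent with different rollout lengths. A sketch of such a sweep, assuming a Stable-Baselines3 PPO setup; the environment id and the n_steps values shown here are illustrative, not the authors' configuration.

```python
import gym
from stable_baselines3 import PPO

for n_steps in (256, 512, 1024, 2048):  # illustrative sweep values
    env = gym.make("PerchedLanding-v0")  # hypothetical environment id
    model = PPO("MlpPolicy", env, n_steps=n_steps, verbose=0)
    model.learn(total_timesteps=1_000_000)
    model.save(f"ppo_n_steps_{n_steps}")
```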

Table 2. Change of hyperparameters

Figure 4. Mean reward obtained during training by models using frame stacking, with n of four (yellow line) and n of eight (blue line).
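
Frame stacking feeds the policy the last n observations rather than only the current one, letting a feed-forward network infer trends such as deceleration from history. A sketch using Stable-Baselines3's vectorised wrappers, with the same hypothetical environment id as above:

```python
import gym
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack

env = DummyVecEnv([lambda: gym.make("PerchedLanding-v0")])  # hypothetical id
stacked_env = VecFrameStack(env, n_stack=4)  # n_stack=8 for the second run in Fig. 4
```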

Table 3. Test Case configurations for the first set of simulation evaluations

Figure 5. Training history for test cases detailed in Table 3.

Figure 6. Mean reward for 100 runs of each test case under the baseline evaluation (steady state wind with Dryden gusts) and under the enhanced evaluation with domain randomisation enabled.
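
The comparison in Fig. 6 is a Monte Carlo estimate: each trained model is rolled out for 100 episodes under a given disturbance setting and the episodic returns are averaged. A minimal sketch, assuming the classic Gym step/reset API and an SB3-style predict method:

```python
import numpy as np


def mean_return(model, env, episodes=100):
    """Average episodic reward over a fixed number of evaluation rollouts."""
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            action, _ = model.predict(obs, deterministic=True)
            obs, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    return np.mean(returns)
```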

Table 4. Test Case configurations for the second set of simulation investigations

Figure 7. Training history for test cases detailed in Table 4.

Table 5. Simulated mean reward for several test cases

Figure 8. Rewards for each test case, normalised against the baseline reward.
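
The normalisation in Fig. 8 simply divides each test case's mean reward by the baseline's, so the baseline plots at 1.0. A one-line sketch with placeholder values (not the paper's results):

```python
# Placeholder mean rewards, not the paper's results.
rewards = {"baseline": 520.0, "test_case_5": 585.0, "test_case_7": 560.0}
normalised = {name: r / rewards["baseline"] for name, r in rewards.items()}
```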

Figure 9. Simulation time history results for a baseline model at different headwinds.

Figure 10. Simulation time history results for a Test Case 5 model at different headwinds.

Table 6. Reward achieved by two different test cases during the first session of experimental flight testing

Table 7. Mean final states for a baseline model and a Test Case 7 model during the first flight testing session

Figure 11. Plot of states and actions for several attempts for the baseline model from the first experimental flight testing session.

Figure 12. Plot of states and actions for several attempts for the Test Case 7 model from the first experimental flight testing session.

Table 8. Reward achieved by three different test cases during the second session of experimental flight testing

Table 9. Mean final states for a baseline model, a Test Case 5 model and a Test Case 9 model during the second flight testing session

Figure 13. Plot of states and actions for several attempts for the baseline model from the second experimental flight testing session.

Figure 14. Plot of states and actions for several attempts for the Test Case 9 model from the second experimental flight testing session.