This paper addresses the challenge of balance control for the underactuated triple pendulum robot (UTPR) using a model-free reinforcement learning (RL) approach. A curriculum-based Soft Actor-Critic strategy with a quadratic form and an integral term in the reward function (CSAC-QI) is proposed. By incorporating the integral of cumulative joint angle errors into the reward function, the CSAC-QI method significantly reduces steady-state errors and enhances control precision. CSAC-QI also improves convergence efficiency through an adaptive curriculum learning (CL) framework that enables a structured transition from simpler to more complex tasks. To enhance robustness, motor friction identification and domain randomization are incorporated during training, equipping the UTPR to cope with real-world uncertainties. Simulation experiments demonstrate the superior performance of the CSAC-QI method in handling larger initial joint deviations, achieving accurate end-effector positioning, and maintaining balance under dynamic randomization, sensor noise, and external disturbances. Notably, the trained policy is deployed directly on the UTPR prototype, where it successfully maintains balance under real-world conditions.
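The quadratic-plus-integral reward structure described above can be sketched as follows. This is a minimal illustration only: the abstract does not give the paper's actual weights, error definitions, or additional reward terms, so the class name, parameters, and the specific quadratic/integral form here are all assumptions.

```python
import numpy as np

class QuadraticIntegralReward:
    """Hypothetical sketch of a quadratic reward with an integral term
    penalizing accumulated joint-angle error (weights are illustrative)."""

    def __init__(self, n_joints=3, q_weight=1.0, i_weight=0.1, dt=0.01):
        self.q_weight = q_weight          # weight on instantaneous quadratic error
        self.i_weight = i_weight          # weight on accumulated (integral) error
        self.dt = dt                      # control timestep in seconds
        self.integral = np.zeros(n_joints)

    def reset(self):
        # Clear the accumulated error at the start of each episode
        self.integral[:] = 0.0

    def __call__(self, angle_error):
        angle_error = np.asarray(angle_error, dtype=float)
        # Discrete-time integral of the joint-angle error
        self.integral += angle_error * self.dt
        quadratic = self.q_weight * np.sum(angle_error ** 2)
        integral = self.i_weight * np.sum(self.integral ** 2)
        # Negative cost: persistent errors keep lowering the reward,
        # which discourages steady-state offsets
        return -(quadratic + integral)
```

The key property is that a constant residual error grows the integral term over time, so the learned policy is pushed to eliminate steady-state offsets rather than merely keeping the instantaneous error small.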