This paper addresses the challenge of balance control for the underactuated triple pendulum robot (UTPR) using a model-free reinforcement learning (RL) approach. A curriculum-based Soft Actor-Critic strategy with a quadratic form and an integral term in the reward function (CSAC-QI) is proposed. By incorporating the integral of cumulative joint angle errors into the reward function, the CSAC-QI method significantly reduces steady-state errors and enhances control precision. CSAC-QI also improves convergence efficiency through an adaptive curriculum learning (CL) framework that enables a structured transition from simpler to more complex tasks. To enhance robustness, motor friction identification and domain randomization are incorporated during training, equipping the UTPR to cope with real-world uncertainties. Simulation experiments demonstrate the superior performance of the CSAC-QI method in handling larger initial joint deviations, achieving accurate end-effector positioning, and maintaining balance under dynamic randomization, sensor noise, and external disturbances. Notably, the trained policy is deployed directly on the UTPR prototype, where it successfully maintains balance under real-world conditions.
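The quadratic-plus-integral reward structure described above can be sketched as follows. This is a minimal illustration only: the abstract does not give the paper's actual weights, error definitions, or additional reward terms, so the class name, parameters, and the specific quadratic/integral form here are all assumptions.

```python
import numpy as np

class QuadraticIntegralReward:
    """Hypothetical sketch of a quadratic reward with an integral term
    penalizing accumulated joint-angle error (weights are illustrative)."""

    def __init__(self, n_joints=3, q_weight=1.0, i_weight=0.1, dt=0.01):
        self.q_weight = q_weight          # weight on instantaneous quadratic error
        self.i_weight = i_weight          # weight on accumulated (integral) error
        self.dt = dt                      # control timestep in seconds
        self.integral = np.zeros(n_joints)

    def reset(self):
        # Clear the accumulated error at the start of each episode
        self.integral[:] = 0.0

    def __call__(self, angle_error):
        angle_error = np.asarray(angle_error, dtype=float)
        # Discrete-time integral of the joint-angle error
        self.integral += angle_error * self.dt
        quadratic = self.q_weight * np.sum(angle_error ** 2)
        integral = self.i_weight * np.sum(self.integral ** 2)
        # Negative cost: persistent errors keep lowering the reward,
        # which discourages steady-state offsets
        return -(quadratic + integral)
```

The key property is that a constant residual error grows the integral term over time, so the learned policy is pushed to eliminate steady-state offsets rather than merely keeping the instantaneous error small.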