Abstract
Portfolio optimization via deep reinforcement learning (DRL) consistently fails during market crashes because it focuses exclusively on return maximization. Large language models (LLMs) can read financial news to anticipate these crashes, but our experiments show that a standard DRL agent ignores these semantic signals in favor of price momentum. We present Risk-First, an architecture that mathematically forces the agent to follow LLM risk alerts. The system has three components: a variance filter that discards LLM hallucinations, a reward-shaping penalty for dangerous exposure, and a deterministic circuit breaker that forces asset liquidation. Tested on the NASDAQ market using the FNSPID dataset and DeepSeek-V3, these constraints reduce tail risk (maximum drawdown) from -58.37% to -56.09% while increasing returns. This shows that the DRL architecture must be structurally constrained for LLM predictions to be useful.
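The three constraints can be pictured with a minimal Python sketch. Everything in it is an assumption for illustration. The function names, the thresholds, the 1 to 5 risk-score scale with 3 as neutral, and the exact form of each rule are hypothetical and are not taken from the Risk-First code. The `max_drawdown` helper shows the usual peak-to-trough definition of the tail-risk metric reported above.

```python
import statistics
from typing import List

# Hypothetical constants: the abstract does not state the real values.
NEUTRAL_RISK = 3.0         # assumed neutral score on an assumed 1-5 risk scale
VARIANCE_THRESHOLD = 0.5   # max allowed spread across repeated LLM risk scores
PENALTY_WEIGHT = 0.01      # weight of the exposure penalty in the shaped reward
BREAKER_RISK_LEVEL = 4.5   # risk score at or above which an asset is liquidated


def filter_risk_score(samples: List[float], fallback: float = NEUTRAL_RISK) -> float:
    """Variance filter (hypothetical): average repeated LLM risk scores for one
    asset, but fall back to a neutral score when the samples disagree too much,
    treating the disagreement as a likely hallucination."""
    if len(samples) < 2 or statistics.pvariance(samples) > VARIANCE_THRESHOLD:
        return fallback
    return statistics.mean(samples)


def shaped_reward(portfolio_return: float, weights: List[float], risks: List[float]) -> float:
    """Reward shaping (hypothetical): subtract a penalty proportional to the
    portfolio weight held in assets whose risk score is above neutral."""
    dangerous_exposure = sum(w * max(r - NEUTRAL_RISK, 0.0) for w, r in zip(weights, risks))
    return portfolio_return - PENALTY_WEIGHT * dangerous_exposure


def circuit_breaker(weights: List[float], risks: List[float]) -> List[float]:
    """Deterministic override (hypothetical): zero the weight of any asset whose
    risk score reaches the breaker level; the freed capital is held as cash."""
    return [0.0 if r >= BREAKER_RISK_LEVEL else w for w, r in zip(weights, risks)]


def max_drawdown(values: List[float]) -> float:
    """Largest peak-to-trough decline of a portfolio value series,
    returned as a negative fraction (e.g. -0.25 for a 25% drawdown)."""
    peak, worst = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        worst = min(worst, v / peak - 1.0)
    return worst


if __name__ == "__main__":
    risks = [filter_risk_score([4.5, 4.6, 4.4]), filter_risk_score([1.0, 5.0, 3.0])]
    weights = circuit_breaker([0.6, 0.4], risks)
    print(risks, weights, shaped_reward(0.02, weights, risks))
    print(max_drawdown([100.0, 120.0, 90.0, 110.0]))
```

In this sketch the circuit breaker runs after the policy outputs its weights, so it cannot be overridden by the learned policy, while the shaped reward only influences what the policy learns during training.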
Supplementary weblinks
Title: GitHub
Description: GitHub project. This is the research contribution developed for the FinRL Contest 2026, Task 1 (AI for Finance, PGE5 2025/2026 at Aivancity). It extends CPPO-DeepSeek with three modules that force the agent to act on LLM risk signals rather than merely observe them.