FinRL-DeepSeek Risk-First Architecture: Structural Constraints for LLM-Driven Trading Agents

21 May 2026, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Portfolio optimization via deep reinforcement learning (DRL) consistently fails during market crashes because it focuses exclusively on return maximization. Large language models (LLMs) can read financial news to anticipate these crashes, but our experiments show that a standard DRL agent ignores semantic signals in favor of price momentum. We present Risk-First, an architecture that mathematically forces the agent to follow those alerts. The system has three components: a variance filter that discards LLM hallucinations, a reward-shaping penalty for dangerous exposure, and a deterministic circuit breaker that forces asset liquidation. Tested on the NASDAQ market using the FNSPID dataset and DeepSeek-V3, these constraints reduce tail risk (maximum drawdown) from -58.37% to -56.09% while increasing returns, showing that the DRL architecture must be structurally constrained to make LLM predictions useful.
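The three components named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 1-to-5 risk-score scale, the neutral fallback value, and every threshold and penalty coefficient are hypothetical choices made only to show how a variance filter, a shaped reward, and a deterministic override could fit together.

```python
from statistics import pvariance

# All names, scales, and thresholds below are illustrative assumptions,
# not values taken from the paper.

def filter_risk_score(samples, max_variance=1.0, neutral=3.0):
    """Variance filter: average repeated LLM risk scores for the same news
    item, and fall back to a neutral score when the samples disagree."""
    if len(samples) < 2:
        return neutral  # a single sample cannot be checked for consistency
    if pvariance(samples) > max_variance:
        return neutral  # inconsistent answers are treated as unreliable
    return sum(samples) / len(samples)

def shaped_reward(portfolio_return, exposure, risk_score,
                  risk_threshold=4.0, penalty=0.1):
    """Reward shaping: subtract a penalty proportional to market exposure
    whenever the filtered risk score is at or above the threshold."""
    if risk_score >= risk_threshold:
        return portfolio_return - penalty * exposure
    return portfolio_return

def apply_circuit_breaker(weights, risk_score, breaker_threshold=4.5):
    """Deterministic circuit breaker: replace the agent's target weights
    with a fully liquidated (all-zero) allocation under extreme risk."""
    if risk_score >= breaker_threshold:
        return [0.0] * len(weights)
    return list(weights)

# Example: three LLM samples agree on high risk, so the agent's proposed
# allocation is overridden regardless of what the policy wanted.
score = filter_risk_score([5.0, 5.0, 4.0])
final_weights = apply_circuit_breaker([0.6, 0.4], score)
```

In this sketch the circuit breaker acts after the policy and cannot be overridden by it, which is one way to read the claim that the agent is structurally forced, rather than merely encouraged, to follow the alerts.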

Keywords

Reinforcement Learning
Large Language Models
Risk Management
CMDP
DeepSeek-V3
Portfolio Optimization
