FinRL-DeepSeek Risk-First Architecture: Structural Constraints for LLM-Driven Trading Agents

Seth-Sady NDINGA-NDINGA

doi:10.33774/coe-2026-xhg2x

Abstract

Portfolio optimization via deep reinforcement learning (DRL) consistently fails during market crashes because it focuses exclusively on return maximization. Large language models (LLMs) can read financial news to anticipate these crashes, but our experiments show that a standard DRL agent ignores semantic signals in favor of price momentum. We present Risk-First, an architecture that mathematically forces the agent to follow those alerts. The system has three components: a variance filter that discards LLM hallucinations, a reward penalty for dangerous exposure (reward shaping), and a deterministic circuit breaker that forces asset liquidation. Tested on the NASDAQ market using the FNSPID dataset and DeepSeek-V3, these constraints reduce tail risk (Max Drawdown) from -58.37% to -56.09% while increasing returns, showing that the DRL architecture must be structurally constrained to make LLM predictions useful.
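As a rough illustration of how the three components could fit together, the following minimal sketch shows one possible reading of the abstract. It is not the implementation from the paper or the repository: the function names, the 1 to 5 risk scale, the neutral score, and every threshold and penalty coefficient are hypothetical values chosen for the example.

```python
from statistics import pvariance

# All names, scales, and thresholds below are assumptions for illustration;
# the paper's actual parameters are not given in the abstract.

def filter_risk_score(samples, max_variance=1.0, neutral=3.0):
    """Variance filter: average repeated LLM risk scores for one news item.

    If the scores disagree too much (population variance above
    max_variance), treat the signal as unreliable and fall back to a
    neutral score instead of trusting a possible hallucination.
    """
    if len(samples) < 2 or pvariance(samples) > max_variance:
        return neutral
    return sum(samples) / len(samples)


def shaped_reward(portfolio_return, exposure, risk_score,
                  risk_threshold=4.0, penalty=0.1):
    """Reward shaping: subtract a penalty proportional to market exposure
    whenever the filtered risk score is at or above risk_threshold."""
    if risk_score >= risk_threshold:
        return portfolio_return - penalty * exposure
    return portfolio_return


def circuit_breaker(target_weights, risk_score, trigger=5.0):
    """Deterministic circuit breaker: override the agent's action and
    liquidate every position when the risk score reaches trigger."""
    if risk_score >= trigger:
        return [0.0 for _ in target_weights]
    return list(target_weights)
```

The point of the sketch is the ordering: the filter runs before the score reaches the agent, the penalty acts on the learning signal, and the circuit breaker acts on the final action regardless of what the policy prefers.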

Keywords

Reinforcement Learning

Portfolio Optimization

Supplementary weblinks

GitHub

GitHub project: This is the research contribution developed for the FinRL Contest 2026, Task 1 (AI for Finance, PGE5 2025/2026 at Aivancity). It extends CPPO-DeepSeek with three modules that force the agent to act on LLM risk signals, not just observe them.
