This paper presents a hybrid approach that integrates traditional control strategies with deep reinforcement learning for robotic assembly. By fusing multimodal information from visual and force feedback, the method uses admittance control to guarantee compliant, safe contact while deep reinforcement learning processes visual input, enabling precise control and real-time correction of assembly actions. This multi-sensor feedback mechanism enhances the stability and accuracy of the assembly process and improves the robot's robustness and adaptability in uncertain environments. In addition, a twin-delayed deep deterministic policy gradient (TD3) algorithm based on residual reinforcement learning is proposed. A task-specific reward function that jointly considers visual goals, force compliance, and contact stability addresses two key challenges in assembly tasks: the difficulty of acquiring state information and the sparsity of rewards. This improves the robot's interaction capability and task-execution efficiency in real-world environments. Experimental results show that the proposed method reduces the reinforcement-learning training time from 400 epochs to 100 epochs, significantly decreases the contact forces during assembly, and shortens the contact time.
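To make the two key ideas concrete, the following is a minimal sketch (not the paper's actual implementation) of residual action composition and a composite reward. All names, gains, and weights (`admittance_base_action`, `w_vis`, `force_limit`, etc.) are illustrative assumptions: the base admittance controller supplies a compliant nominal action, the learned policy contributes a bounded residual on top of it, and the reward combines visual-goal, force-compliance, and contact-stability terms.

```python
import numpy as np

def admittance_base_action(force, target_force=np.zeros(3), gain=0.002):
    """Base compliant action: a simplified admittance law that moves the
    end-effector to reduce the contact-force error (hypothetical gains)."""
    return gain * (target_force - force)

def residual_action(base, residual, scale=0.01, limit=0.02):
    """Final command = base controller output + bounded learned residual.
    Clipping keeps the learned correction within a safe envelope."""
    return np.clip(base + scale * residual, -limit, limit)

def composite_reward(pos_err, force, force_limit=10.0,
                     w_vis=1.0, w_force=0.1, w_stab=0.05):
    """Dense reward combining the three terms mentioned in the abstract."""
    # Visual-goal term: penalize distance of the peg from the goal pose.
    r_vis = -w_vis * np.linalg.norm(pos_err)
    # Force-compliance term: penalize the magnitude of the contact force.
    r_force = -w_force * np.linalg.norm(force)
    # Contact-stability term: extra penalty when force exceeds a safety limit.
    r_stab = -w_stab * float(np.linalg.norm(force) > force_limit)
    return r_vis + r_force + r_stab
```

In a residual-TD3 setup, the critic and actor would be trained only on the residual term, so the policy starts from the admittance controller's reasonable behavior rather than from scratch, which is consistent with the reported reduction in training epochs.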