In the digital information age, artificial intelligence is increasingly being applied to national governance and judicial decision support. However, existing studies lack case studies and empirical analyses of how effectively large language models can assist judicial decision-making. To address this gap, we design a comprehensive evaluation framework covering five core task dimensions: Task-oriented Information Extraction, Legal Article Citation, Event Extraction, Judicial Decision Generation, and Legal Opinion Generation. Using carefully crafted prompts to elicit the models' legal reasoning capabilities, we conduct extensive tests on 13 mainstream large language models (LLMs). The experimental results show that LLMs perform well at processing legal texts and producing preliminary legal opinions, but still fall short in complex legal reasoning and precise decision-making. Building on these findings, we fine-tune the LLMs with a weakly supervised learning strategy for targeted improvement. The results indicate that even a small amount of task-specific training can significantly enhance LLM performance on judicial tasks, underscoring the critical role of data and domain-specific knowledge in applying AI to judicial work. Finally, we briefly discuss the boundaries of AI's involvement in judicial activities, aiming to provide a theoretical foundation and practical guidance for the deep integration of AI technology with legal practice.