
Large language models in judicial assistance: Empirical insights and domain-specific fine-tuning

Published online by Cambridge University Press:  26 February 2026

Surong Zhu*
Affiliation:
Beijing Foreign Studies University, China
Jiahui Zhao
Affiliation:
Tianjin University, China
Yunan Chen
Affiliation:
Beijing Foreign Studies University, China
Xu Sun
Affiliation:
Southwestern University of Finance and Economics, China
Xi Zheng
Affiliation:
Beijing Foreign Studies University, China
*
Corresponding author: Surong Zhu; Email: zhu_su_rong@163.com

Abstract

In the digital information age, artificial intelligence is increasingly applied to national governance and judicial decision-making assistance. However, existing studies lack case studies and empirical analyses of how effectively large language models (LLMs) assist judicial decisions. To address this gap, this study designs a comprehensive evaluation framework encompassing five core task dimensions: Task-oriented Information Extraction, Legal Article Citation, Event Extraction, Judicial Decision Generation, and Legal Opinion Generation. Using carefully crafted prompts to activate the models' legal reasoning capabilities, we conducted extensive testing on 13 mainstream LLMs. The experimental results demonstrate that LLMs perform well in processing legal texts and providing preliminary legal opinions, but still exhibit shortcomings in complex legal reasoning and precise decision-making. On this basis, we applied a weakly supervised learning strategy to fine-tune the LLMs for targeted improvement. The results indicate that introducing a small amount of task-specific learning can significantly enhance LLM performance on judicial tasks, further underscoring the critical role of data and domain-specific knowledge in applying AI technology to judicial work. Additionally, this study briefly discusses the boundaries of AI's involvement in judicial activities, aiming to provide theoretical foundations and practical guidance for the deep integration of AI technology with legal practice.

Information

Type
Article
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press
Figure 1. Distribution of description lengths (in number of characters) across the 215 selected Chinese legal cases.

Figure 2. The process of weakly supervised data generation.

Table 1. Inter-annotator consistency for different annotation tasks using coefficient of variation (CV), within-3% agreement ratio, and within-5% agreement ratio

Table 2. Performance evaluation of 13 large language models on task-oriented information extraction (F-score only, sorted by average)

Table 3. Performance of large language models on legal article citation

Table 4. Performance of large language models in event extraction

Table 5. Performance of large language models in judicial decision capability

Table 6. Performance of large language models in legal opinion generation

Figure 3. Correlation matrix between five legal annotation tasks.

Figure 4. Mean Opinion Scores (MOS) of 13 LLMs across event extraction, judicial decision generation, and legal opinion generation tasks, with 95% confidence intervals.

Table 7. Performance evaluation of LLMs on task-oriented legal information extraction tasks

Figure 5. Convergence curves during fine-tuning of the Baichuan model.

Table 8. Performance evaluation on legal article extraction, event extraction, judicial decision generation, and legal opinion generation tasks