Published online by Cambridge University Press: 10 July 2025
Background: Use of neurosurgical data for research and machine learning model development is often constrained by privacy regulations, small sample sizes, and resource-intensive data preprocessing. We explored the feasibility of using the large language model (LLM) GPT-4o to generate synthetic neurosurgical data.

Methods: A plain-language prompt instructed GPT-4o to generate synthetic data matching the univariate and bivariate statistical properties of 12 perioperative parameters from a real-world open-access neurosurgical dataset (n = 139). The prompt was submitted in 10 independent trials, each generating a dataset matching the reference size (n = 139), followed by an additional trial generating a ten-fold amplified dataset (n = 1390). Fidelity was assessed using t-tests, two-sample proportion tests, Jensen-Shannon divergence, two-sample Kolmogorov-Smirnov tests, and Pearson’s product-moment correlation.

Results: The generated data preserved the distributional characteristics of, and relationships between, the targeted parameters. In every generated dataset, including the amplified one, at least 11 of 12 parameters (91.7%) showed no statistically significant differences in means and proportions from the real data. Five of the synthetic datasets showed no significant differences in any of the 12 parameters.

Conclusions: These findings demonstrate that a zero-shot prompting approach can generate synthetic neurosurgical data and amplify sample sizes with consistently high fidelity to real-world data. This underscores LLMs’ potential for addressing data availability challenges in neurosurgical research.
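The fidelity assessment described in the Methods can be illustrated with a brief sketch. This is not the study's code: the data below are placeholder random draws standing in for one real and one synthetic perioperative parameter, and the bin count for the Jensen-Shannon computation is an arbitrary choice; only the test procedures (two-sample t-test, two-sample Kolmogorov-Smirnov test, Jensen-Shannon divergence on binned histograms) correspond to those named in the abstract.

```python
# Illustrative fidelity checks between a "real" and a "synthetic" sample.
# Placeholder data only -- not the study's dataset.
import numpy as np
from scipy import stats
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
real = rng.normal(loc=65, scale=10, size=139)    # stand-in for a real parameter
synth = rng.normal(loc=65, scale=10, size=139)   # stand-in for its synthetic analogue

# Two-sample t-test: are the means significantly different?
t_stat, t_p = stats.ttest_ind(real, synth)

# Two-sample Kolmogorov-Smirnov test: are the distributions significantly different?
ks_stat, ks_p = stats.ks_2samp(real, synth)

# Jensen-Shannon divergence on shared-bin histograms (20 bins is an arbitrary choice).
# scipy's jensenshannon returns the JS *distance*; square it for the divergence.
bins = np.histogram_bin_edges(np.concatenate([real, synth]), bins=20)
p, _ = np.histogram(real, bins=bins)
q, _ = np.histogram(synth, bins=bins)
jsd = jensenshannon(p, q) ** 2

print(f"t-test p={t_p:.3f}, KS p={ks_p:.3f}, JS divergence={jsd:.4f}")
```

In the study's framing, a parameter "matches" when such tests show no statistically significant difference (p above the chosen threshold) between the synthetic and real samples.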