Abstract
As large language models (LLMs) become central tools in science, improving their reasoning capabilities is critical for meaningful and trustworthy applications. We introduce a Socratic agent for scientific reasoning, implemented through a structured system prompt that guides LLMs via classical principles of inquiry. Unlike typical prompt engineering or retrieval-based methods, our approach leverages definition, analogy, hypothesis elimination, and other Socratic techniques to generate more coherent, critical, and domain-aware responses. We evaluate the agent across diverse scientific domains and benchmark it on the ARC Challenge dataset, achieving state-of-the-art performance (97.15%) without fine-tuning or external tools. Expert evaluation shows improved reasoning depth, clarity, and adaptability over conventional LLM outputs, suggesting that structured prompting rooted in philosophical reasoning can substantially enhance the scientific utility of language models.
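To make the approach concrete, below is a minimal sketch of how such a structured Socratic system prompt might be wired to a chat-style LLM. The prompt text and the `socratic_answer` helper are illustrative stand-ins, not the authors' full SM prompt (which is released in the supplementary materials), and an OpenAI-compatible client is assumed.

```python
# Minimal sketch: a structured Socratic system prompt driving a chat LLM.
# Assumptions: the openai v1 Python SDK and an OPENAI_API_KEY in the
# environment; the prompt below is a simplified illustration, not the
# paper's full SM prompt.
from openai import OpenAI

SOCRATIC_SYSTEM_PROMPT = """You are a Socratic scientific reasoner.
For every question:
1. Definition: define the key terms precisely before reasoning.
2. Analogy: relate the problem to a well-understood phenomenon.
3. Hypothesis elimination: enumerate candidate answers and rule out
   those inconsistent with the definitions and known evidence.
4. Conclude with the surviving hypothesis and note remaining uncertainty."""

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def socratic_answer(question: str, model: str = "gpt-4o") -> str:
    """Answer a science question under the Socratic system prompt."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SOCRATIC_SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content


print(socratic_answer("Why does ice float on liquid water?"))
```

Note that this pattern requires no fine-tuning or external tools: the entire method lives in the system prompt, which is why the agent can be reproduced from the prompt alone.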
Supplementary materials

- Supplementary Information: all prompt logs.
- Full Prompt: the full SM prompt that can be used to create the agent.


