Representation Format Effects on Small Language Model Diagnostic Fidelity in Primary Care Pipelines: A Three-Arm Paired Simulation Protocol with a Flat, FHIR, and openEHR Illustration

14 April 2026, Version 1
This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Background. Primary care pipelines deploy small language models (SLMs) in sequence. Whether input representation modulates pipeline fidelity, and whether FHIR wrapping or archetype-based structure is required, is not established at small-model scale.

Methods. We establish a three-arm paired simulation protocol. Each of 50 Synthea patients was processed through a Triage → Coder SLM pipeline (Gemma 3 1B, Qwen3 1.7B) under three conditions: flat tabular text, compact FHIR R4 Bundle JSON, and openEHR Composition JSON with four Clinical Knowledge Manager archetypes. Each condition was repeated three times at temperature 0.5 (50 patients × 3 conditions × 3 repeats = 450 runs). Compact FHIR was pre-specified to control an input-length confound (full FHIR Bundles run approximately 6.5× openEHR length). Primary outcome: semantic F1 against Synthea condition descriptions. Statistics: Friedman test, pairwise Wilcoxon tests with Bonferroni correction, paired Cohen's d.

Results. Mean semantic F1: Flat 0.193, FHIR 0.198, openEHR 0.228. The Friedman test was significant (χ² = 7.56, p = 0.023). Bonferroni-corrected pairwise contrasts: Flat vs FHIR p = 1.00 (d = +0.03); Flat vs openEHR p = 0.10 (d = +0.29); FHIR vs openEHR p = 0.09 (d = +0.33). Best arm per patient: Flat 10, FHIR 15, openEHR 25. The pattern is directionally consistent with an openEHR lift, but no pairwise contrast survives Bonferroni correction at n = 50.

Conclusions. The protocol establishes a reproducible instrument for measuring representation-format effects in multi-SLM clinical pipelines. The n = 50 illustration gives preliminary evidence consistent with long-standing arguments that ontological richness, not mere structural wrapping, drives semantic preservation through a language model cascade. The findings invite collaborative follow-up on larger, non-synthetic cohorts.
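The statistics named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the bundle's stats/r1_threearm_stats.py script: all function names are hypothetical, the toy data are invented for illustration, and computing paired Cohen's d as the mean paired difference over its sample standard deviation is an assumption, since the abstract does not say which variant was used. The Wilcoxon step is omitted. With three arms the Friedman statistic has 2 degrees of freedom, so its chi-square tail probability reduces to exp(-χ²/2), which gives a quick check of the reported p-value.

```python
import math
from collections import Counter
from statistics import mean, stdev


def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_chi2(rows):
    """Tie-corrected Friedman statistic; each row holds one patient's score per arm."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    ties = 0
    for row in rows:
        for arm, rank in enumerate(average_ranks(row)):
            rank_sums[arm] += rank
        ties += sum(t ** 3 - t for t in Counter(row).values())
    raw = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    correction = 1.0 - ties / (n * k * (k * k - 1))
    return raw / correction


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


def paired_cohens_d(a, b):
    """Assumed variant: mean of paired differences (b - a) over their sample SD."""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / stdev(diffs)


# Check against the reported omnibus result: chi2 = 7.56 with df = 2.
print(round(chi2_sf_df2(7.56), 3))  # 0.023

# Toy per-patient F1 rows (flat, FHIR, openEHR); invented values, not study data.
toy = [[0.10, 0.20, 0.30], [0.15, 0.25, 0.40], [0.05, 0.10, 0.20], [0.20, 0.30, 0.35]]
print(friedman_chi2(toy))
```

The same exp(-χ²/2) shortcut applies only to the three-arm case; with more arms a general chi-square survival function would be needed.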

Keywords

openEHR
FHIR
archetype
small language models
primary care
clinical AI safety
representation format
paired simulation
cascade amplification factor
ground-truth evaluation

Supplementary materials

Title: Reproducibility bundle: simulation source code, three-arm paired dataset, fresh statistics script, and figure-generation script
Description: Complete set of Python source files (three input-format mappers and simulator), the three-arm paired per-patient F1 dataset, a fresh-statistics script with its stored output, and the figure-generation script for the hero figure. All numbers in Section 4 of the manuscript can be reproduced by running python stats/r1_threearm_stats.py from the bundle root.
