Representation Format Effects on Small Language Model
Diagnostic Fidelity in Primary Care Pipelines:
A Three-Arm Paired Simulation Protocol with
a Flat, FHIR, and openEHR Illustration

Florian Odi Stummer

doi:10.33774/coe-2026-jk375

Computer Science

14 April 2026, Version 1

Working Paper

This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Background. Primary care deploys small language models (SLMs) in sequence. Whether input representation modulates pipeline fidelity, and whether FHIR wrapping or archetype-based structure is required, is not established at small-model scale.

Methods. We establish a three-arm paired simulation protocol. Each of 50 Synthea patients was processed through a Triage → Coder SLM pipeline (Gemma 3 1B, Qwen3 1.7B) under three conditions: flat tabular text, compact FHIR R4 Bundle JSON, and openEHR Composition JSON with four Clinical Knowledge Manager archetypes. Three repeats per condition at temperature 0.5 (450 runs). Compact FHIR was pre-specified to control an input-length confound (full FHIR Bundles run approximately 6.5× openEHR length). Primary outcome: semantic F1 against Synthea condition descriptions. Statistics: Friedman test, pairwise Wilcoxon with Bonferroni correction, paired Cohen's d.

Results. Mean semantic F1: Flat 0.193, FHIR 0.198, openEHR 0.228. Friedman significant (χ² = 7.56, p = 0.023). Bonferroni-corrected pairwise: Flat vs FHIR p = 1.00 (d = +0.03); Flat vs openEHR p = 0.10 (d = +0.29); FHIR vs openEHR p = 0.09 (d = +0.33). Best arm per patient: Flat 10, FHIR 15, openEHR 25. Directionally consistent with an openEHR lift; no pairwise contrast survives Bonferroni at n = 50.

Conclusions. The protocol establishes a reproducible instrument for representation-format effects in multi-SLM clinical pipelines. The n = 50 illustration gives preliminary evidence consistent with long-standing arguments that ontological richness, not mere structural wrapping, drives semantic preservation through a language model cascade. Findings invite collaborative follow-up on larger, non-synthetic cohorts.
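The statistics named in the abstract can be illustrated with a minimal standard-library sketch. This is not the author's script (the bundle's own stats/r1_threearm_stats.py is the reference). The function names and toy data are hypothetical. The Friedman statistic is computed without a tie correction, and the paired Cohen's d is assumed to be the mean paired difference divided by its sample standard deviation, which may differ from the variant used in the manuscript. Wilcoxon p-values are not computed here; in practice they would come from a statistics library such as SciPy, and the Bonferroni helper only shows the multiplication by the number of contrasts. With three arms the Friedman statistic is referred to a chi-square distribution with 2 degrees of freedom, whose upper tail is exp(-x/2), so the reported p-value can be checked by hand.

```python
import math
import statistics


def average_ranks(values):
    """Rank values 1..k, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are tied
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi2(rows):
    """Friedman statistic for n paired rows of k scores (no tie correction)."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for arm, rank in enumerate(average_ranks(row)):
            rank_sums[arm] += rank
    scale = 12.0 / (n * k * (k + 1))
    return scale * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_df2(x):
    """Upper-tail p-value of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def bonferroni(p_values):
    """Multiply each raw p-value by the number of contrasts, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def paired_cohens_d(a, b):
    """Mean of paired differences (b - a) over their sample SD (assumed variant)."""
    diffs = [y - x for x, y in zip(a, b)]
    return statistics.mean(diffs) / statistics.stdev(diffs)


# Design arithmetic from the abstract: patients x conditions x repeats.
total_runs = 50 * 3 * 3
print("total runs:", total_runs)

# Hypothetical per-patient F1 rows (flat, fhir, openehr). Not the paper's data.
toy_rows = [
    (0.10, 0.20, 0.30),
    (0.20, 0.10, 0.40),
    (0.10, 0.30, 0.50),
    (0.20, 0.25, 0.30),
]
toy_chi2 = friedman_chi2(toy_rows)
print("toy Friedman chi2:", toy_chi2, "p:", round(chi2_sf_df2(toy_chi2), 4))

flat = [row[0] for row in toy_rows]
openehr = [row[2] for row in toy_rows]
print("toy paired d (openEHR - flat):", round(paired_cohens_d(flat, openehr), 3))

# Check of the reported Friedman p-value from the reported statistic.
print("p for chi2 = 7.56, df = 2:", round(chi2_sf_df2(7.56), 3))
```

The last line recovers the abstract's p = 0.023 from χ² = 7.56, a quick consistency check that needs no access to the per-patient data.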

Keywords

openEHR

FHIR

archetype

small language models

primary care

clinical AI safety

representation format

paired simulation

cascade amplification factor

ground-truth evaluation

Supplementary materials

Title

Reproducibility bundle: simulation source code, three-arm paired dataset, fresh statistics script, and figure-generation script

Description

Complete set of Python source files (three input-format mappers and simulator), the three-arm paired per-patient F1 dataset, a fresh-statistics script with its stored output, and the figure-generation script for the hero figure. All numbers in Section 4 of the manuscript can be reproduced by running python stats/r1_threearm_stats.py from the bundle root.

License

The content is available under CC BY 4.0

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content