Abstract
This study tests whether process-state verbs (PSVs), bracketed all-caps verb tokens (RESOLVE / AWAIT / FATAL) used as operator-level constraints, achieve durable behavioral adherence in DeepSeek V4 Pro under multi-turn context dilution. Such operator-level constraints are the primary mechanism for governing agentic deployments. Study 1 crosses 5 constraint-delivery conditions x 2 omission-class constraints x 6 conversation depths (N=150 per cell, 9,000 sessions); Study 2 (1,800 sessions) is a pre-planned confirmatory replication, on this substrate, of the omission/commission compliance asymmetry reported by Gamage (2026). Two pre-registered findings hold across sensitivity checks. (i) Gamage's omission/commission asymmetry is not supported: at depth 10, commission compliance was 0% and omission compliance 39% (a -39pp gap), the reverse of the +67pp gap Gamage reported on Mistral Large 3, a swing of roughly 106pp. (ii) The Knows-But-Violates partition is supported: 100% of PSV violations were knowing violations and 100% of control violations were ignorance violations. An exploratory 200-session supplement shows that the canonical noise corpus contained verbatim restatements of constraint vocabulary, which functioned as depth-correlated implicit re-injection; under domain-neutral noise the 60.6% pooled canonical ambiguity rate largely dissolves. Decay-curve studies that import conversational noise should audit corpus-constraint vocabulary overlap before treating hedging rates as substrate properties.
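The session counts and the percentage-point swing quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the sign convention for the gap (commission minus omission) is inferred from the abstract's "-39pp" figure rather than taken from the paper's analysis code.

```python
# Study 1 factorial design, using the figures stated in the abstract.
conditions = 5    # constraint-delivery conditions
constraints = 2   # omission-class constraints
depths = 6        # conversation depths
n_per_cell = 150  # sessions per cell

study1_cells = conditions * constraints * depths
study1_sessions = study1_cells * n_per_cell

# Study 2 size is taken directly from the abstract; its cell layout is not assumed here.
study2_sessions = 1_800
total_sessions = study1_sessions + study2_sessions

# Depth-10 compliance rates in percent, as reported in the abstract.
commission_pct = 0
omission_pct = 39

# Assumed sign convention: gap = commission - omission.
gap_this_study = commission_pct - omission_pct
gap_gamage = 67  # gap reported by Gamage (2026) on Mistral Large 3

# Distance between the two gaps, in percentage points.
swing = gap_gamage - gap_this_study

print(f"Study 1: {study1_cells} cells, {study1_sessions} sessions")
print(f"Total sessions across both studies: {total_sessions}")
print(f"Gap here: {gap_this_study}pp, swing vs. Gamage: {swing}pp")
```

The total agrees with the number of raw session files described for the data archive.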
Supplementary materials
Title
OSF Reproducibility Materials — Reading Guide
Description
Navigation guide for the OSF pre-registration (10.17605/OSF.IO/J3MBS) and the OSF data and code archive (10.17605/OSF.IO/PY2DE). Describes the archive structure (code/, data/, analysis/, figures/), the "Download As Zip" instruction needed when the OSF Files panel renders empty without an account, and instructions for reproducing the paper's Section 4 results from the deposited materials.
Supplementary weblinks
Title
OSF Pre-registration (Studies 1 and 2)
Description
Pre-registration of all hypotheses, design decisions, and analysis plan for this study, deposited at OSF Registries on 2026-05-08 prior to data collection. The 14-section pre-registered protocol covers Study 1 (5 constraint-delivery conditions x 2 omission-class constraints x 6 conversation depths x N=150 per cell, 9,000 sessions) and Study 2 (commission constraint replication, 1,800 sessions), with per-hypothesis decision rules, pre-specified null interpretations, and the public commitment to publish irrespective of outcome.
Title
OSF Data Archive (test harness, analysis pipeline, raw sessions)
Description
Full reproducibility package: the test harness (test_behavioral.py), the analysis pipeline (analyze_results.py and the LLM-as-judge pipeline), all 10,800 raw session JSONs from Study 1 and Study 2, judged labels, the 50-item Cohen kappa calibration sample, and the Section 4.5 mismatched-noise counter-experiment outputs. CC-BY 4.0 licensed. The results reported in Section 4 of the paper can be regenerated from these files using the included scripts.
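For readers unfamiliar with the agreement statistic named above, the following is a minimal sketch of Cohen's kappa for two raters over the same items. It is not the archive's implementation: the function name, the two rater lists, and the label strings are hypothetical, and the deposited scripts may compute the statistic differently (for example, through a library).

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(labels_a)

    # Observed agreement: fraction of items on which the raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over categories of the product of marginal rates.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    # Degenerate case: both raters used one identical category throughout.
    # Kappa is undefined there; returning 1.0 is a choice made for this sketch.
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels, not data from the archive.
rater_1 = ["violation", "violation", "compliant", "compliant"]
rater_2 = ["violation", "compliant", "compliant", "compliant"]
kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this toy case the raters agree on 3 of 4 items, chance agreement is 0.5, and kappa is therefore 0.5.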