Abstract
Large language model deployments increasingly allocate computation at inference time rather than applying a single fixed model and decoding policy to every input. The resulting design problem is not only which model is best on average, but which computational action should be taken for a particular query, partial answer, reasoning step, or token under a budget. Research on this problem is fragmented across model routing, confidence-gated cascades, selective prediction, test-time scaling, verifier-guided search, speculative decoding, and token- or layer-level architectural adaptivity. This survey unifies these strands as adaptive computation: budgeted sequential decision-making over computational actions on a quality-cost frontier. I provide a structured review protocol, a taxonomy by allocation granularity and decision signal, a formal mapping of routing and cascades to special cases of sequential decision-making, and evaluation conventions for reporting tokens, dollars, FLOPs, latency, and decision overhead. A normalized audit of 15 representative systems and method families indicates that adaptive policies are most credible when the decision signal is substantially cheaper than the action it avoids and is calibrated near the deployment threshold. The audit also shows why many headline savings are not directly comparable: router calls, verifier calls, draft-model FLOPs, rejected samples, price snapshots, and queueing latency are often treated inconsistently. I close with open problems in step-level deferral with guarantees, calibration under distribution shift, effort prediction for reasoning models, routing over models and inference configurations, and inference-compute economics.
Supplementary materials
Title
Coding sheet — supplementary material
Description
The supplementary coding sheet referenced in Section 2 and in the audit in Section 9.3 and Appendix A. It operationalizes the review protocol and coding schema in Table 2 and records the per-entry coding behind Table 6.