
Parallel dual-numbers reverse AD

Part of: POPL 23

Published online by Cambridge University Press:  15 August 2025

TOM J. SMEDING
Affiliation:
Utrecht University, The Netherlands (e-mail: t.j.smeding@uu.nl)
MATTHIJS I. L. VÁKÁR
Affiliation:
Utrecht University, The Netherlands (e-mail: m.i.l.vakar@uu.nl)

Abstract

Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent value, dual-numbers reverse-mode AD attempts to achieve reverse AD using a similarly simple idea: by pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but this analysis used a custom operational semantics for which it is unclear whether it can be implemented efficiently. We take inspiration from their use of linear factoring to optimise dual-numbers reverse-mode AD to an algorithm that has the correct complexity and enjoys an efficient implementation in a standard functional language with support for mutable arrays, such as Haskell. Aside from the linear factoring ingredient, our optimisation steps consist of well-known ideas from the functional programming community. We demonstrate the use of our technique by providing a practical implementation that differentiates most of Haskell98. Where previous work on dual-numbers reverse AD has required sequentialisation to construct the reverse pass, we demonstrate that we can apply our technique to task-parallel source programs and generate a task-parallel derivative computation.
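The two pairings described in the abstract can be sketched in Haskell. This is an illustrative reconstruction, not the paper's actual transformation: the names `DF`, `DR`, and `grad2` are ours, and these backpropagators return a concrete pair of input cotangents rather than the staged structure the paper develops.

```haskell
-- Forward mode: each scalar carries its tangent.
data DF = DF Double Double            -- (primal, tangent)

mulF :: DF -> DF -> DF
mulF (DF x dx) (DF y dy) = DF (x * y) (dx * y + x * dy)

-- Naive reverse mode: each scalar carries a backpropagator mapping
-- the cotangent of this scalar to cotangent contributions for the
-- two inputs of the whole program.
type Cot = (Double, Double)
data DR = DR Double (Double -> Cot)

(+.) :: Cot -> Cot -> Cot
(a, b) +. (c, d) = (a + c, b + d)

addR, mulR :: DR -> DR -> DR
addR (DR x dx) (DR y dy) = DR (x + y) (\d -> dx d +. dy d)
mulR (DR x dx) (DR y dy) = DR (x * y) (\d -> dx (d * y) +. dy (d * x))

-- Gradient of \(x, y) -> x * y + x, obtained by one call to the
-- top-level backpropagator with cotangent 1.
grad2 :: (Double, Double) -> Cot
grad2 (x, y) =
  let xD = DR x (\d -> (d, 0))
      yD = DR y (\d -> (0, d))
      DR _ bp = (xD `mulR` yD) `addR` xD
  in bp 1
```

For example, `grad2 (3, 4)` yields `(5, 3)`, matching the analytic gradient (y + 1, x).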

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Fig. 1. An example program together with its derivative, both using dual-numbers forward AD and using dual-numbers reverse AD. The original program is of type $(\mathbb R, \mathbb R) \rightarrow \mathbb R$.


Fig. 2. Left: an example showing how naive dual-numbers reverse AD can result in exponential blow-up when applied to a program with sharing. Right: the dependency graph of the backpropagators $dx_i$.
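The blow-up that Figure 2 illustrates can be reproduced concretely. The sketch below is our own illustration (the names `D`, `plus`, and `chain` are not from the paper): each binding x_{i+1} = x_i + x_i shares its right-hand side in the primal program, but the naive derivative duplicates the call to the backpropagator dx_i, so the input's backpropagator fires 2^n times for a program of size n.

```haskell
import Data.IORef

-- A scalar paired with its backpropagator; returning IO () lets us
-- count how often the input's backpropagator is invoked.
data D = D Double (Double -> IO ())

plus :: D -> D -> D
plus (D a da) (D b db) = D (a + b) (\d -> da d >> db d)

main :: IO ()
main = do
  counter <- newIORef (0 :: Int)
  let x0 = D 1 (\_ -> modifyIORef' counter (+ 1))
      -- x_{i+1} = x_i + x_i, iterated n times: a linear-size primal
      -- program whose naive derivative makes 2^n backpropagator calls.
      chain :: Int -> D -> D
      chain 0 x = x
      chain n x = chain (n - 1) (plus x x)
      D _ bp = chain 20 x0
  bp 1
  readIORef counter >>= print   -- prints 1048576 = 2^20
```

The linear factoring idea that the paper takes from Brunel et al. (2020) is what permits collapsing these exponentially many calls into one call per backpropagator.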


Fig. 3. Overview of the optimisations to dual-numbers reverse AD as a code transformation that are described in this paper. († = inspired by Brunel et al. (2020))


Fig. 4. The source language of all variants of this paper’s reverse AD transformation. $\mathbb Z$, the type of integers, is added as an example of a type that AD does not act upon.


Fig. 5. The target language of the unoptimised variant of the reverse AD transformation. Components that are also in the source language (Figure 4) are set in grey.


Fig. 6. The naive code transformation from the source (Figure 4) to the target (Figure 5) language. The cases where $\textbf{D}^{1}_{c}$ just maps homomorphically over the source language are set in grey.


Fig. 7. Wrapper around $\textbf{D}^{1}_{c}$ of Figure 6.


Fig. 8. The monadically transformed code transformation (from Figures 4 to 5 plus ${\textrm{Staged}}$ operations), based on Figure 6. Grey parts are unchanged or simply monadically lifted.


Fig. 9. The Cayley-transformed code transformation, based on Figure 8. Grey parts are unchanged.


Fig. 10. Code transformation plus wrapper using mutable arrays, modified from Figure 9. Grey parts are unchanged.


Fig. 11. The sharing structure before and after defunctionalisation. $\textrm{SCall}$ is elided here; in Figure 11(a), the backpropagator calls are depicted as if they are still normal calls. Boxes ($\Box$) are the same in-memory value as the value their arrow points to; two boxes pointing to the same value indicates that this value is shared: referenced in two places.


Fig. 12. Schematic view of the operational model underlying $(\star)$.


Fig. 13. An example program. Note that the program starts by forking, before performing any primitive operations, hence job $\alpha$ is empty and the partial order on compound IDs happens to have multiple minimal elements.


Fig. 14. Sketch of the implementation of the monad $\operatorname*{\mathcal{M}}$. The diagram shows the meaning of the job descriptions in “Fork”: the first field (labelled “A”) contains the history up to the last fork in this task (excluding subtasks), and the fields labelled B and C describe the subtasks spawned by that fork. The first job in a task has no history, indicated with “Start”.


Fig. 15. Implementation of $\textrm{SResolve}$ for the parallel-ready dual-numbers reverse AD algorithm. The inParallel function is as in Figure 14.


Table 1. Benchmark results of Section 12 + Sections 10.1 and 10.2 versus ad-4.5.6. The “TH” and “ad” columns indicate runtimes on one machine for our implementation and the ad library, respectively. The last column shows the ratio between the previous two columns. We give the size of the largest side of criterion’s 95% bootstrapping confidence interval, rounded to 2 decimal digits. Setup: GHC 9.6.6 on Linux, Intel i9-10900K CPU, with Intel Turbo Boost disabled (i.e. running at a consistent 3.7 GHz).
