Hostname: page-component-848d4c4894-m9kch Total loading time: 0 Render date: 2024-05-29T05:55:55.727Z Has data issue: false hasContentIssue false

# Perturbation confusion in forward automatic differentiation of higher-order functions

Published online by Cambridge University Press:  16 September 2019

## Abstract

Core share and HTML view are not available for this content. However, as you have access to this content, a full PDF is available via the ‘Save PDF’ action button.

Automatic differentiation (AD) is a technique for augmenting computer programs to compute derivatives. The essence of AD in its forward accumulation mode is to attach perturbations to each number, and propagate these through the computation by overloading the arithmetic operators. When derivatives are nested, the distinct derivative calculations, and their associated perturbations, must be distinguished. This is typically accomplished by creating a unique tag for each derivative calculation and tagging the perturbations. We exhibit a subtle bug, present in fielded implementations which support derivatives of higher-order functions, in which perturbations are confused despite the tagging machinery, leading to incorrect results. The essence of the bug is as follows: a unique tag is needed for each derivative calculation, but in existing implementations unique tags are created when taking the derivative of a function at a point. When taking derivatives of higher-order functions, these need not correspond! We exhibit a simple example: a higher-order function f whose derivative at a point x, namely f′(x), is itself a function which calculates a derivative. This situation arises naturally when taking derivatives of curried functions. Two potential solutions are presented, and their deficiencies discussed. One uses eta expansion to delay the creation of fresh tags in order to put them into one-to-one correspondence with derivative calculations. The other wraps outputs of derivative operators with tag substitution machinery. Both solutions seem very difficult to implement without violating the desirable complexity guarantees of forward AD.

Type
Research Article
Information
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

1

2

3

## References

Andrychowicz, M., Denil, M., Colmenarejo, S. G., Hoffman, M. W., Pfau, D., Schaul, T. & de Freitas, N. (2016) Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates.Google Scholar
Baydin, A. G., Pearlmutter, B. A. & Siskind, J. M. (2016) DiffSharp: An AD library for.NET languages. arXiv:1611.03423.Google Scholar
Bendtsen, C. & Stauning, O. (1996) FADBAD, A Flexible C++ Package for Automatic Differentiation. Technical Report IMM-REP-1996-17. Department of Mathematical Modelling, Technical University of Denmark, Lyngby, Denmark.Google Scholar
Bischof, C. H., Carle, A., Corliss, G. F., Griewank, A. & Hovland, P. D. (1992) ADIFOR: Generating derivative codes from Fortran programs. Sci. Program. 1(1), 1129.Google Scholar
Breuleux, O. & van Merriënboer, B. (2017) Automatic differentiation in Myia. In AutoDiff Workshop at Neural Information Processing Systems Conference.Google Scholar
Chen, T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. (2018) Neural ordinary differential equations. arXiv:1806.07366.Google Scholar
Cheney, J. (2012) A dependent nominal type theory. arXiv:1201.5240.CrossRefGoogle Scholar
Church, A. (1941) The Calculi of Lambda Conversion. Princeton, NJ: Princeton University Press.Google Scholar
Clifford, W. K. (1873) Preliminary sketch of bi-quaternions. Proc. London Math. Soc. 4, 381395.Google Scholar
Ehrhard, T. & Regnier, L. (2003) The differential lambda-calculus. Theor. Comput. Sci. 309(1–3), 141.CrossRefGoogle Scholar
Elliott, C. M. (2009) Beautiful differentiation. In International Conference on Functional Programming (ICFP). New York, NY, USA: Association for Computing Machinery (ACM).Google Scholar
Elliott, C. M. (2017) Compiling to categories. International Conference on Functional Programming (ICFP) New York, NY, USA: Association for Computing Machinery (ACM).Google Scholar
Farr, W. M. (2006) “Automatic Differentiation” in OCaml. http://wmfarr.blogspot.com/2006/10/automatic-differentiation-in-ocaml.html.Google Scholar
Griewank, A. & Walther, A. (2008) Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Philadelphia, PA: Society for Industrial and Applied Mathematics.CrossRefGoogle Scholar
Hamilton, W. R. (1837) Theory of conjugate functions, or algebraic couples; with a preliminary and elementary essay on algebra as the science of pure time. Trans. R. Ir. Acad . 17, 293422.Google Scholar
Hascoët, L. & Pascual, V. (2004) TAPENADE 2.1 user’s guide. Rapport technique 300. INRIA, Sophia Antipolis.Google Scholar
Karczmarczuk, J. (2001) Functional differentiation of computer programs. Higher-Order Symbolic Comput . 14, 3557.CrossRefGoogle Scholar
Kelly, R., Pearlmutter, B. A. & Siskind, J. M. (2016) Evolving the incremental calculus into a model of forward AD. Extended abstract presented at the AD 2016 Conference, Oxford, UK, arXiv:1611.03429.Google Scholar
Lavendhomme, R. (1996) Basic Concepts of Synthetic Differential Geometry. Kluwer Academic.CrossRefGoogle Scholar
Leibniz, G. W. (1684) Nova methodus pro maximis et minimis, itemque tangentibus, quae nec fractas nec irrationales quantitates moratur, et singulare pro illis calculi genus (A new method for maxima and minima, and for tangents, that is not hindered by fractional or irrational quantities, and a singular kind of calculus for the above mentioned). Acta Eruditorum.Google Scholar
Maclaurin, D., Duvenaud, D. & Adams, R. P. (2015a) Autograd: Effortless gradients in NumPy. In Paper presented at International Conference on Machine Learning AutoML Workshop.Google Scholar
Maclaurin, D., Duvenaud, D. & Adams, R. P. (2015b) Gradient-based hyperparameter optimization through reversible learning. arXiv:1502.03492.Google Scholar
Manzyuk, O. (2012a) A simply typed λ-calculus of forward automatic differentiation. In Proceedings of the 28th Conference on the Mathematical Foundations of Programming Semantics (MFPS XXVIII), Electronic Notes in Theoretical Computer Science, vol. 286, pp. 257272.CrossRefGoogle Scholar
Manzyuk, O. (2012b) Tangent bundles in differential λ-categories. arXiv:1202.0411.Google Scholar
Newton, I. (1704) De quadratura curvarum. In Opticks: or, A Treatise of the Reflexions, Refractions, Inflexions and Colours of Light, also Two Treatises of the Species and Magnitude of Curvilinear Figures, London: Printed for Sam Smith and Benjamin Walford, printers to the Royal Society, at the Prince’s Arms in St. Paul’s Churchyard. Appendix.CrossRefGoogle Scholar
Pearlmutter, B. A. & Siskind, J. M. (2007) Lazy multivariate higher-order forward-mode AD. In Symposium on Principles of Programming Languages, New York, NY, USA: Association for Computing Machinery (ACM), pp. 155160.Google Scholar
Pearlmutter, B. A. & Siskind, J. M. (2008) Using programming language theory to make AD sound and efficient. In International Conference on Automatic Differentiation, SIAM, pp. 7990.CrossRefGoogle Scholar
Pitts, A. M. (2003) Nominal logic, a first order theory of names and binding. Inf. Comput. 186(2), 165193.CrossRefGoogle Scholar
Plotkin, G. (2018) Some principles of differential programming languages. POPL 2018 Keynote talk, Jan 11, Los Angeles, CA, USA.Google Scholar
Raissi, M. (2018) Deep hidden physics models: Deep learning of nonlinear partial differential equations. J. Mach. Learn. Res. 19(25), 124.Google Scholar
Salman, H., Yadollahpour, P., Fletcher, T. & Batmanghelich, K. (2018) Deep diffeomorphic normalizing flows. arXiv:1810.03256.Google Scholar
Siskind, J. M. & Pearlmutter, B. A. (2005) Perturbation confusion and referential transparency: Correct functional implementation of forward-mode AD. In Implementation and Application of Functional Languages, pp. 19. Trinity College Dublin Computer Science Department Technical Report TCD-CS-2005-60.Google Scholar
Siskind, J. M. & Pearlmutter, B. A. (2007) First-class nonstandard interpretations by opening closures. In Symposium on Principles of Programming Languages, New York, NY, USA: Association for Computing Machinery (ACM), pp. 7176.Google Scholar
Siskind, J. M. & Pearlmutter, B. A. (2008) Nesting forward-mode AD in a functional framework. Higher-Order Symbolic Comput . 21(4), 361376.CrossRefGoogle Scholar
Speelpenning, B. (1980) Compiling Fast Partial Derivatives of Functions Given by Algorithms. PhD thesis, Department of Computer Science, University of Illinois at Urbana-Champaign.CrossRefGoogle Scholar
Sussman, G. J., Abelson, H., Wisdom, J., Katzenelson, J., Mayer, M. E., Hanson, C. P., Halfant, M., Siebert, B., Rozas, G. J., Skordos, P., Koniaris, K., Lin, K. & Zuras, D. (1997a) Scheme Mechanics Installation for GNU/Linux or Mac OS X. http://groups.csail.mit.edu/mac/users/gjs/6946/linux-install.htm. http://groups.csail.mit.edu/mac/users/gjs/6946/scmutils-tarballs/.Google Scholar
Sussman, G. J., Abelson, H., Wisdom, J., Katzenelson, J., Mayer, M. E., Hanson, C. P., Halfant, M., Siebert, B., Rozas, G. J., Skordos, P., Koniaris, K., Lin, K. & Zuras, D. (1997b) SCMUTILS Reference Manual. http://groups.csail.mit.edu/mac/users/gjs/6946/refman.txt.Google Scholar
Sussman, G. J., Wisdom, J. & Mayer, M. E. (2001) Structure and Interpretation of Classical Mechanics. Cambridge, MA: MIT Press.Google Scholar
Sussman, G. J., Wisdom, J. & Farr, W. M. (2013) Functional Differential Geometry. Cambridge, MA: MIT Press.Google Scholar
Taylor, B. (1715) Methodus incrementorum directa et inversa. London: Typis Pearsonianis.Google Scholar
van Merriënboer, B., Moldovan, D. & Wiltschko, A. (2018) Tangent: Automatic differentiation using source-code transformation for dynamically typed array programming. In Advances in Neural Information Processing Systems, Red Hook, New York, USA: Curran Associates, pp. 62596268.Google Scholar
Wengert, R. E. (1964) A simple automatic derivative evaluation program. Commun. ACM 7(8), 463464.CrossRefGoogle Scholar

### Manzyuk et al. supplementary material

Manzyuk et al. supplementary material 1
File 5.3 KB

### Manzyuk et al. supplementary material

Manzyuk et al. supplementary material 2

Video 21.6 MB