Perturbation confusion in forward automatic differentiation of higher-order functions

OLEKSANDR MANZYUK; BARAK A. PEARLMUTTER; ALEXEY ANDREYEVICH RADUL; DAVID R. RUSH; JEFFREY MARK SISKIND

doi:10.1017/S095679681900008X

Perturbation confusion in forward automatic differentiation of higher-order functions

Published online by Cambridge University Press: 16 September 2019

OLEKSANDR MANZYUK ,

BARAK A. PEARLMUTTER

ALEXEY ANDREYEVICH RADUL ,

DAVID R. RUSH and

JEFFREY MARK SISKIND

Show author details

OLEKSANDR MANZYUK: Affiliation:
Department of Computer Science and Hamilton Institute, Maynooth University, Co. Kildare, Ireland (e-mails: manzyuk@gmail.com, barak@pearlmutter.net, axch@alum.mit.edu, kumoyuki@gmail.com)
BARAK A. PEARLMUTTER: Affiliation:
Department of Computer Science and Hamilton Institute, Maynooth University, Co. Kildare, Ireland (e-mails: manzyuk@gmail.com, barak@pearlmutter.net, axch@alum.mit.edu, kumoyuki@gmail.com)
ALEXEY ANDREYEVICH RADUL: Affiliation:
Department of Computer Science and Hamilton Institute, Maynooth University, Co. Kildare, Ireland (e-mails: manzyuk@gmail.com, barak@pearlmutter.net, axch@alum.mit.edu, kumoyuki@gmail.com)
DAVID R. RUSH: Affiliation:
Department of Computer Science and Hamilton Institute, Maynooth University, Co. Kildare, Ireland (e-mails: manzyuk@gmail.com, barak@pearlmutter.net, axch@alum.mit.edu, kumoyuki@gmail.com)
JEFFREY MARK SISKIND: Affiliation:
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907-2035, USA (e-mail: qobi@purdue.edu)

Article contents

Abstract
Footnotes
References

Rights & Permissions

Abstract

Core share and HTML view are not available for this content. However, as you have access to this content, a full PDF is available via the ‘Save PDF’ action button.

Automatic differentiation (AD) is a technique for augmenting computer programs to compute derivatives. The essence of AD in its forward accumulation mode is to attach perturbations to each number, and propagate these through the computation by overloading the arithmetic operators. When derivatives are nested, the distinct derivative calculations, and their associated perturbations, must be distinguished. This is typically accomplished by creating a unique tag for each derivative calculation and tagging the perturbations. We exhibit a subtle bug, present in fielded implementations which support derivatives of higher-order functions, in which perturbations are confused despite the tagging machinery, leading to incorrect results. The essence of the bug is as follows: a unique tag is needed for each derivative calculation, but in existing implementations unique tags are created when taking the derivative of a function at a point. When taking derivatives of higher-order functions, these need not correspond! We exhibit a simple example: a higher-order function f whose derivative at a point x, namely f′(x), is itself a function which calculates a derivative. This situation arises naturally when taking derivatives of curried functions. Two potential solutions are presented, and their deficiencies discussed. One uses eta expansion to delay the creation of fresh tags in order to put them into one-to-one correspondence with derivative calculations. The other wraps outputs of derivative operators with tag substitution machinery. Both solutions seem very difficult to implement without violating the desirable complexity guarantees of forward AD.

Information

Type: Research Article
Information: Journal of Functional Programming , Volume 29 , 2019 , e12

DOI: https://doi.org/10.1017/S095679681900008X [Opens in a new window]
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright: © The Author(s) (2019)

Footnotes

Current affiliation: Facebook.

Current affiliation: Google AI.

Current address: Dunlavin, Ireland.

References

Andrychowicz, M., Denil, M., Colmenarejo, S. G., Hoffman, M. W., Pfau, D., Schaul, T. & de Freitas, N. (2016) Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems. Red Hook, NY, USA: Curran Associates.Google Scholar

Baydin, A. G., Pearlmutter, B. A. & Siskind, J. M. (2016) DiffSharp: An AD library for.NET languages. arXiv:1611.03423.Google Scholar

Bendtsen, C. & Stauning, O. (1996) FADBAD, A Flexible C++ Package for Automatic Differentiation. Technical Report IMM-REP-1996-17. Department of Mathematical Modelling, Technical University of Denmark, Lyngby, Denmark.Google Scholar

Bischof, C. H., Carle, A., Corliss, G. F., Griewank, A. & Hovland, P. D. (1992) ADIFOR: Generating derivative codes from Fortran programs. Sci. Program. 1(1), 11–29.Google Scholar

Breuleux, O. & van Merriënboer, B. (2017) Automatic differentiation in Myia. In AutoDiff Workshop at Neural Information Processing Systems Conference.Google Scholar

Buckwalter, B. (2007) Safe Forward-Mode AD in Haskell? https://mail.haskell.org/pipermail/haskell-cafe/2007-May/025274.html.Google Scholar

Chen, T. Q., Rubanova, Y., Bettencourt, J. & Duvenaud, D. (2018) Neural ordinary differential equations. arXiv:1806.07366.Google Scholar

Cheney, J. (2012) A dependent nominal type theory. arXiv:1201.5240.CrossRef Google Scholar

Church, A. (1941) The Calculi of Lambda Conversion. Princeton, NJ: Princeton University Press.Google Scholar

Clifford, W. K. (1873) Preliminary sketch of bi-quaternions. Proc. London Math. Soc. 4, 381–395.Google Scholar

Ehrhard, T. & Regnier, L. (2003) The differential lambda-calculus. Theor. Comput. Sci. 309(1–3), 1–41.CrossRef Google Scholar

Elliott, C. M. (2009) Beautiful differentiation. In International Conference on Functional Programming (ICFP). New York, NY, USA: Association for Computing Machinery (ACM).Google Scholar

Elliott, C. M. (2017) Compiling to categories. International Conference on Functional Programming (ICFP) New York, NY, USA: Association for Computing Machinery (ACM).Google Scholar

Farr, W. M. (2006) “Automatic Differentiation” in OCaml. http://wmfarr.blogspot.com/2006/10/automatic-differentiation-in-ocaml.html.Google Scholar

Griewank, A. & Walther, A. (2008) Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. Philadelphia, PA: Society for Industrial and Applied Mathematics.CrossRef Google Scholar

Hamilton, W. R. (1837) Theory of conjugate functions, or algebraic couples; with a preliminary and elementary essay on algebra as the science of pure time. Trans. R. Ir. Acad . 17, 293–422.Google Scholar

Hascoët, L. & Pascual, V. (2004) TAPENADE 2.1 user’s guide. Rapport technique 300. INRIA, Sophia Antipolis.Google Scholar

Karczmarczuk, J. (2001) Functional differentiation of computer programs. Higher-Order Symbolic Comput . 14, 35–57.CrossRef Google Scholar

Kelly, R., Pearlmutter, B. A. & Siskind, J. M. (2016) Evolving the incremental calculus into a model of forward AD. Extended abstract presented at the AD 2016 Conference, Oxford, UK, arXiv:1611.03429.Google Scholar

Kmett, E. (2010) ad: Automatic Differentiation. https://hackage.haskell.org/package/ad.Google Scholar

Lavendhomme, R. (1996) Basic Concepts of Synthetic Differential Geometry. Kluwer Academic.CrossRef Google Scholar

Leibniz, G. W. (1684) Nova methodus pro maximis et minimis, itemque tangentibus, quae nec fractas nec irrationales quantitates moratur, et singulare pro illis calculi genus (A new method for maxima and minima, and for tangents, that is not hindered by fractional or irrational quantities, and a singular kind of calculus for the above mentioned). Acta Eruditorum.Google Scholar

Maclaurin, D., Duvenaud, D. & Adams, R. P. (2015a) Autograd: Effortless gradients in NumPy. In Paper presented at International Conference on Machine Learning AutoML Workshop.Google Scholar

Maclaurin, D., Duvenaud, D. & Adams, R. P. (2015b) Gradient-based hyperparameter optimization through reversible learning. arXiv:1502.03492.Google Scholar

Manzyuk, O. (2012a) A simply typed λ-calculus of forward automatic differentiation. In Proceedings of the 28th Conference on the Mathematical Foundations of Programming Semantics (MFPS XXVIII), Electronic Notes in Theoretical Computer Science, vol. 286, pp. 257–272.CrossRef Google Scholar

Manzyuk, O. (2012b) Tangent bundles in differential λ-categories. arXiv:1202.0411.Google Scholar

Newton, I. (1704) De quadratura curvarum. In Opticks: or, A Treatise of the Reflexions, Refractions, Inflexions and Colours of Light, also Two Treatises of the Species and Magnitude of Curvilinear Figures, London: Printed for Sam Smith and Benjamin Walford, printers to the Royal Society, at the Prince’s Arms in St. Paul’s Churchyard. Appendix.CrossRef Google Scholar

Pearlmutter, B. A. & Siskind, J. M. (2007) Lazy multivariate higher-order forward-mode AD. In Symposium on Principles of Programming Languages, New York, NY, USA: Association for Computing Machinery (ACM), pp. 155–160.Google Scholar

Pearlmutter, B. A. & Siskind, J. M. (2008) Using programming language theory to make AD sound and efficient. In International Conference on Automatic Differentiation, SIAM, pp. 79–90.CrossRef Google Scholar

Pitts, A. M. (2003) Nominal logic, a first order theory of names and binding. Inf. Comput. 186(2), 165–193.CrossRef Google Scholar

Plotkin, G. (2018) Some principles of differential programming languages. POPL 2018 Keynote talk, Jan 11, Los Angeles, CA, USA.Google Scholar

Raissi, M. (2018) Deep hidden physics models: Deep learning of nonlinear partial differential equations. J. Mach. Learn. Res. 19(25), 1–24.Google Scholar

Salman, H., Yadollahpour, P., Fletcher, T. & Batmanghelich, K. (2018) Deep diffeomorphic normalizing flows. arXiv:1810.03256.Google Scholar

Shan, C.-c. (2008) Differentiating Regions. http://conway.rutgers.edu/ccshan/wiki/blog/posts/Differentiation/.Google Scholar

Siskind, J. M. & Pearlmutter, B. A. (2005) Perturbation confusion and referential transparency: Correct functional implementation of forward-mode AD. In Implementation and Application of Functional Languages, pp. 1–9. Trinity College Dublin Computer Science Department Technical Report TCD-CS-2005-60.Google Scholar

Siskind, J. M. & Pearlmutter, B. A. (2007) First-class nonstandard interpretations by opening closures. In Symposium on Principles of Programming Languages, New York, NY, USA: Association for Computing Machinery (ACM), pp. 71–76.Google Scholar

Siskind, J. M. & Pearlmutter, B. A. (2008) Nesting forward-mode AD in a functional framework. Higher-Order Symbolic Comput . 21(4), 361–376.CrossRef Google Scholar

Speelpenning, B. (1980) Compiling Fast Partial Derivatives of Functions Given by Algorithms. PhD thesis, Department of Computer Science, University of Illinois at Urbana-Champaign.CrossRef Google Scholar

Sussman, G. J., Abelson, H., Wisdom, J., Katzenelson, J., Mayer, M. E., Hanson, C. P., Halfant, M., Siebert, B., Rozas, G. J., Skordos, P., Koniaris, K., Lin, K. & Zuras, D. (1997a) Scheme Mechanics Installation for GNU/Linux or Mac OS X. http://groups.csail.mit.edu/mac/users/gjs/6946/linux-install.htm. http://groups.csail.mit.edu/mac/users/gjs/6946/scmutils-tarballs/.Google Scholar

Sussman, G. J., Abelson, H., Wisdom, J., Katzenelson, J., Mayer, M. E., Hanson, C. P., Halfant, M., Siebert, B., Rozas, G. J., Skordos, P., Koniaris, K., Lin, K. & Zuras, D. (1997b) SCMUTILS Reference Manual. http://groups.csail.mit.edu/mac/users/gjs/6946/refman.txt.Google Scholar

Sussman, G. J., Wisdom, J. & Mayer, M. E. (2001) Structure and Interpretation of Classical Mechanics. Cambridge, MA: MIT Press.Google Scholar

Sussman, G. J., Wisdom, J. & Farr, W. M. (2013) Functional Differential Geometry. Cambridge, MA: MIT Press.Google Scholar

Taylor, B. (1715) Methodus incrementorum directa et inversa. London: Typis Pearsonianis.Google Scholar

van Merriënboer, B., Moldovan, D. & Wiltschko, A. (2018) Tangent: Automatic differentiation using source-code transformation for dynamically typed array programming. In Advances in Neural Information Processing Systems, Red Hook, New York, USA: Curran Associates, pp. 6259–6268.Google Scholar

Wengert, R. E. (1964) A simple automatic derivative evaluation program. Commun. ACM 7(8), 463–464.CrossRef Google Scholar

Manzyuk et al. supplementary material

Manzyuk et al. supplementary material 1

File 5.3 KB

Manzyuk et al. supplementary material

Manzyuk et al. supplementary material 2

Video 21.6 MB

Submit a response

Discussions

No Discussions have been published for this article.

Article contents

Perturbation confusion in forward automatic differentiation of higher-order functions

Abstract

Information

Footnotes

References

Manzyuk et al. supplementary material

Manzyuk et al. supplementary material

Discussions

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests