Perturbation confusion in forward automatic differentiation of higher-order functions

  • OLEKSANDR MANZYUK (a1), BARAK A. PEARLMUTTER (a1), ALEXEY ANDREYEVICH RADUL (a1), DAVID R. RUSH (a1) and JEFFREY MARK SISKIND (a2)...

Abstract

Automatic differentiation (AD) is a technique for augmenting computer programs to compute derivatives. The essence of AD in its forward accumulation mode is to attach perturbations to each number, and propagate these through the computation by overloading the arithmetic operators. When derivatives are nested, the distinct derivative calculations, and their associated perturbations, must be distinguished. This is typically accomplished by creating a unique tag for each derivative calculation and tagging the perturbations. We exhibit a subtle bug, present in fielded implementations which support derivatives of higher-order functions, in which perturbations are confused despite the tagging machinery, leading to incorrect results. The essence of the bug is as follows: a unique tag is needed for each derivative calculation, but in existing implementations unique tags are created when taking the derivative of a function at a point. When taking derivatives of higher-order functions, these need not correspond! We exhibit a simple example: a higher-order function f whose derivative at a point x, namely f′(x), is itself a function which calculates a derivative. This situation arises naturally when taking derivatives of curried functions. Two potential solutions are presented, and their deficiencies discussed. One uses eta expansion to delay the creation of fresh tags in order to put them into one-to-one correspondence with derivative calculations. The other wraps outputs of derivative operators with tag substitution machinery. Both solutions seem very difficult to implement without violating the desirable complexity guarantees of forward AD.
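The bug described above can be reproduced in a few lines. The following is a minimal sketch, not the code of any fielded implementation: the names `Dual`, `tangent`, `D`, `shift` and `cube` are hypothetical, tags are assumed to be integers ordered by creation time, and only addition and multiplication are overloaded. The operator `D` creates its fresh tag when the derivative of a function is taken at a point, which is exactly the policy the abstract identifies as the source of the confusion.

```python
import itertools

_tags = itertools.count()  # source of fresh perturbation tags (assumed design)


class Dual:
    """primal + tangent * eps_tag; the newest tag is always outermost."""

    def __init__(self, tag, primal, tangent):
        self.tag, self.primal, self.tangent = tag, primal, tangent

    def __add__(self, other):
        return _lift(self, other, lambda p, q, dp, dq: (p + q, dp + dq))

    __radd__ = __add__

    def __mul__(self, other):
        return _lift(self, other, lambda p, q, dp, dq: (p * q, p * dq + dp * q))

    __rmul__ = __mul__


def _split(x, tag):
    """View x as (primal, tangent) with respect to tag."""
    if isinstance(x, Dual) and x.tag == tag:
        return x.primal, x.tangent
    return x, 0


def _lift(a, b, rule):
    """Apply a differentiation rule with respect to the newest tag present."""
    tag = max(x.tag for x in (a, b) if isinstance(x, Dual))
    (p, dp), (q, dq) = _split(a, tag), _split(b, tag)
    return Dual(tag, *rule(p, q, dp, dq))


def tangent(y, tag):
    """Extract the coefficient of eps_tag; for functions, post-compose."""
    if callable(y):
        return lambda z: tangent(y(z), tag)
    if isinstance(y, Dual):
        if y.tag == tag:
            return y.tangent
        return Dual(y.tag, tangent(y.primal, tag), tangent(y.tangent, tag))
    return 0


def D(f):
    """Derivative operator: one fresh tag per application of D(f) to a point."""
    def df(x):
        tag = next(_tags)
        return tangent(f(Dual(tag, x, 1)), tag)
    return df


def cube(x):
    return x * x * x


def shift(u):
    # Curried higher-order function: shift(u)(f)(x) = f(x + u).
    return lambda f: lambda x: f(x + u)


# Ordinary nesting works: each call of a derivative gets its own tag.
nested = D(D(cube))(2.0)            # second derivative 6x at x = 2

# D(shift)(0) is itself a derivative operator, but it owns a single tag.
d_hat = D(shift)(0.0)
first = d_hat(cube)(2.0)            # first derivative 3x^2 at x = 2
confused = d_hat(d_hat(cube))(2.0)  # should be 12, but the two uses share a tag

print(nested, first, confused)
```

Under these assumptions, `D(D(cube))` gives the expected second derivative because two tags are created, whereas `d_hat(d_hat(cube))` reuses the one tag captured when `D(shift)` was applied at 0: the inner extraction removes the perturbation before the outer extraction can see it, so the result is 0 rather than 12.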

Copyright

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Footnotes

1 Current affiliation: Facebook.

2 Current affiliation: Google AI.

3 Current address: Dunlavin, Ireland.

Supplementary materials

Manzyuk et al. supplementary material (5 KB)
