
Floating-point arithmetic

Published online by Cambridge University Press: 11 May 2023

Sylvie Boldo
Affiliation: Université Paris Saclay, CNRS, ENS Paris Saclay, Inria, LMF, 91190 Gif-sur-Yvette, France
E-mail: sylvie.boldo@inria.fr
Claude-Pierre Jeannerod
Affiliation: Inria, ENS de Lyon, LIP, 69364 Lyon, France
E-mail: claude-pierre.jeannerod@inria.fr
Guillaume Melquiond
Affiliation: Université Paris Saclay, CNRS, ENS Paris Saclay, Inria, LMF, 91190 Gif-sur-Yvette, France
E-mail: guillaume.melquiond@inria.fr
Jean-Michel Muller
Affiliation: CNRS, ENS de Lyon, LIP, 69364 Lyon, France
E-mail: jean-michel.muller@cnrs.fr

Abstract


Floating-point numbers have an intuitive meaning when it comes to physics-based numerical computations, and they have thus become the most common way of approximating real numbers in computers. The IEEE-754 Standard has played a large part in making floating-point arithmetic ubiquitous today, by specifying its semantics in a strict yet useful way as early as 1985. In particular, floating-point operations should be performed as if their results were first computed with infinite precision and then rounded to the target format. A consequence is that floating-point arithmetic satisfies the ‘standard model’ that is often used for analysing the accuracy of floating-point algorithms. But that is only scratching the surface, and floating-point arithmetic offers much more.
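
As an illustration of the ‘standard model’ mentioned above, the following sketch checks that a single floating-point operation returns the exact result perturbed by a relative error of at most the unit roundoff u. It assumes that Python floats are IEEE-754 binary64 values rounded to nearest (so u = 2^-53) and that neither underflow nor overflow occurs; the helper name `satisfies_standard_model` is hypothetical and chosen only for this example.

```python
import operator
from fractions import Fraction

# Unit roundoff for binary64 with round-to-nearest (assumed platform format).
U = Fraction(1, 2**53)

def satisfies_standard_model(a: float, b: float, op) -> bool:
    """Check |fl(a op b) - (a op b)| <= u * |a op b| using exact rationals.

    Fraction(x) converts a float to the exact rational it represents, so
    `exact` is the infinitely precise result of the operation.
    """
    exact = op(Fraction(a), Fraction(b))   # infinitely precise result
    computed = Fraction(op(a, b))          # rounded floating-point result
    return abs(computed - exact) <= U * abs(exact)

if __name__ == "__main__":
    for op in (operator.add, operator.mul, operator.truediv):
        print(op.__name__, satisfies_standard_model(0.1, 0.3, op))
```

The comparison is done in exact rational arithmetic so that the check itself introduces no rounding error.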

In this survey we recall the history of floating-point arithmetic as well as its specification mandated by the IEEE-754 Standard. We also recall the properties it entails and what every programmer should know when designing a floating-point algorithm. We provide various basic building blocks that can be implemented with floating-point arithmetic. In particular, one can actually compute the rounding error caused by some floating-point operations, which paves the way to designing more accurate algorithms. More generally, properties of floating-point arithmetic make it possible to extend the accuracy of computations beyond working precision.
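
To make the claim about computing rounding errors concrete, here is a minimal sketch of one well-known building block of this kind, the TwoSum algorithm usually attributed to Knuth and Møller. It is given as an illustration and is not quoted from the abstract. It assumes binary64 arithmetic rounded to nearest and no overflow; under those assumptions the returned pair (s, e) satisfies s + e = a + b exactly, where s is the rounded sum and e is its rounding error.

```python
from fractions import Fraction

def two_sum(a: float, b: float) -> tuple[float, float]:
    """Return (s, e) with s = fl(a + b) and e the rounding error of that sum.

    Assumes round-to-nearest and no overflow; uses six floating-point
    operations and no comparison of the magnitudes of a and b.
    """
    s = a + b
    a_virtual = s - b            # the part of s that came from a
    b_virtual = s - a_virtual    # the part of s that came from b
    a_error = a - a_virtual      # what was lost from a
    b_error = b - b_virtual      # what was lost from b
    return s, a_error + b_error

if __name__ == "__main__":
    s, e = two_sum(0.1, 0.2)
    # Verify exactly, with rationals, that nothing was lost overall.
    print(Fraction(s) + Fraction(e) == Fraction(0.1) + Fraction(0.2))
```

Keeping the error term e alongside s is what allows later computations to compensate for it, which is one way of extending accuracy beyond working precision.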

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press