
Chapter 11: Fixed- and floating-point numbers

pp. 250-268

Authors

, Stanford University, California, , Google Inc., New York, , University of British Columbia, Vancouver

Summary

In Chapter 10 we introduced the basics of computer arithmetic: adding, subtracting, multiplying, and dividing binary integers. In this chapter we continue our exploration of computer arithmetic by looking at number representation in more detail. Often integers do not suffice for our needs. For example, suppose we wish to represent a pressure that varies between 0 (vacuum) and 0.9 atmospheres with an error of at most 0.001 atmospheres. Integers don't help us much when we need to distinguish 0.899 from 0.9. For this task we will introduce the notion of a binary point (similar to a decimal point) and use fixed-point binary numbers.
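As a rough illustration of the pressure example, the sketch below stores a value as an integer count of steps of 2⁻¹⁰, so the binary point sits ten places from the right. The choice of 10 fractional bits and the helper names `to_fixed` and `from_fixed` are assumptions made for this example, not taken from the chapter.

```python
FRAC_BITS = 10  # assumed: ten bits to the right of the binary point


def to_fixed(x, frac_bits=FRAC_BITS):
    """Encode a non-negative real as an integer count of 2**-frac_bits steps."""
    return round(x * (1 << frac_bits))


def from_fixed(n, frac_bits=FRAC_BITS):
    """Decode the integer back to the real value it represents."""
    return n / (1 << frac_bits)


a = to_fixed(0.899)
b = to_fixed(0.9)
print(a, b)                          # two different integer codes
print(from_fixed(a), from_fixed(b))  # both within 0.001 of the inputs
```

With a step of 2⁻¹⁰ and rounding to the nearest step, the error is at most half a step (2⁻¹¹, just under 0.0005), which is inside the 0.001 target and keeps 0.899 and 0.9 distinct.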

In some cases, we need to represent data with a very large dynamic range. For example, suppose we need to represent time intervals ranging from 1 ps (10⁻¹² s) to one century (about 3 × 10⁹ s) with an accuracy of 1%. To span this range with a fixed-point number would require 72 bits. However, if we use a floating-point number, in which we allow the position of the binary point to vary, we can get by with 13 bits: six bits to represent the number and seven bits to encode the position of the binary point.
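The bit counts quoted above can be checked with a short calculation. This is only a sketch of the arithmetic: it assumes the fixed-point format uses a resolution of 1 ps, takes the six-bit width for the number itself directly from the text rather than deriving it, and uses variable names chosen for this example.

```python
import math

t_min = 1e-12  # 1 ps, assumed here to be the fixed-point resolution
t_max = 3e9    # about one century, in seconds

# Fixed point: one code for every 1 ps step up to one century.
fixed_bits = math.ceil(math.log2(t_max / t_min))

# Floating point: the binary point may sit at any of fixed_bits positions,
# so the position field must distinguish that many cases.
position_bits = math.ceil(math.log2(fixed_bits))

number_bits = 6  # taken from the text, not derived here
total_bits = number_bits + position_bits

print(fixed_bits, position_bits, total_bits)
```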

REPRESENTATION ERROR: ACCURACY, PRECISION, AND RESOLUTION

With digital electronics, we represent a number, x, as a string of bits, b. Many different number systems are used in digital systems. A number system can be thought of as two functions R and V. The representation function R maps a number x from some set of numbers (e.g., real numbers, integers, etc.) into a bit string b: b = R(x). The value function V returns the number (from the same set) represented by a particular bit string: y = V(b).

Consider mapping to and from the set of real numbers in some range. Because there are more possible real numbers than there are bit strings of a given length, many real numbers necessarily map to the same bit string. Thus, if we map a real number to a bit string with R and then back with V, we will almost always get a slightly different real number from the one we started with. That is, if we compute y = V(R(x)), then y and x will generally differ.
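To make R and V concrete, here is a minimal sketch for one possible number system: an 8-bit unsigned fixed-point format with 4 fractional bits. The format, the round-to-nearest rule, and the clamping of out-of-range values are all assumptions chosen for illustration; the chapter does not prescribe them.

```python
N_BITS = 8     # assumed total width of the bit string
FRAC_BITS = 4  # assumed number of bits after the binary point


def R(x):
    """Representation function: real number -> bit string."""
    n = round(x * 2**FRAC_BITS)           # nearest multiple of 2**-FRAC_BITS
    n = max(0, min(n, 2**N_BITS - 1))     # clamp to the representable range
    return format(n, f"0{N_BITS}b")


def V(b):
    """Value function: bit string -> the real number it represents."""
    return int(b, 2) / 2**FRAC_BITS


x = 0.3
b = R(x)
y = V(b)
print(b, y, y - x)       # y differs slightly from x
print(R(0.3) == R(0.31)) # two different reals share one bit string
```

Values that are exact multiples of 2⁻⁴, such as 0.5, survive the round trip unchanged; every other real in range comes back as its nearest representable neighbour.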
