
12 - Basics of Floating–Point Quantization

from Part III - Floating–Point Quantization

Published online by Cambridge University Press:  06 July 2010

Bernard Widrow
Affiliation:
Stanford University, California
István Kollár
Affiliation:
Budapest University of Technology and Economics

Summary

Representation of physical quantities in terms of floating–point numbers allows one to cover a very wide dynamic range with a relatively small number of digits. Given this type of representation, roundoff errors are roughly proportional to the amplitude of the represented quantity. In contrast, roundoff errors with uniform quantization are bounded between ±q/2 and are not in any way proportional to the represented quantity.
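The contrast between the two error behaviors can be illustrated with a minimal sketch. The helper names `uniform_quantize` and `float_quantize`, the step size `q = 0.25`, and the 5-bit significand are assumptions chosen for illustration, not definitions from the chapter; the floating–point helper also ignores exponent range limits.

```python
import math

def uniform_quantize(x, q):
    """Round x to the nearest multiple of the step size q."""
    return q * math.floor(x / q + 0.5)

def float_quantize(x, p):
    """Round x to p significant bits (hypothetical helper; no exponent limits)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)              # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** p
    return math.ldexp(math.floor(m * scale + 0.5) / scale, e)

q, p = 0.25, 5                        # illustrative values, not from the source
for x in (0.3, 3.0, 300.0, 30000.0):
    err_uniform = abs(uniform_quantize(x, q) - x)
    err_float = abs(float_quantize(x, p) - x)
    # Uniform error stays within q/2; floating-point error scales with |x|.
    print(f"x={x:>8}: uniform error={err_uniform:.6g}, float error={err_float:.6g}")
```

In this sketch the uniform error never exceeds q/2 regardless of amplitude, while the floating–point error is bounded by |x| times 2^(-p), so it grows with the size of the input.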

Floating–point is in most cases so advantageous over fixed–point number representation that it is rapidly becoming ubiquitous. The movement toward usage of floating–point numbers is accelerating as the speed of floating–point calculation is increasing and the cost of implementation is going down. For this reason, it is essential to have a method of analysis for floating–point quantization and floating–point arithmetic.

THE FLOATING–POINT QUANTIZER

Binary numbers have become accepted as the basis for all digital computation. We therefore describe floating–point representation in terms of binary numbers. Other number bases are completely possible, such as base 10 or base 16, but modern digital hardware is built on the binary base.

The numbers in the table of Fig. 12.1 are chosen to provide a simple example. We begin by counting with nonnegative binary floating–point numbers as illustrated in Fig. 12.1. The counting starts with the number 0, represented here by 00000. Each number is multiplied by 2^E, where E is an exponent. Initially, let E = 0. Continuing the count, the next number is 1, represented by 00001, and so forth.
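The counting procedure can be sketched in a few lines. The 5-bit mantissa width follows the representation 00000 used above; everything past E = 0 is an assumption, since this excerpt stops before the count reaches larger exponents. In particular, the sketch assumes that for E > 0 only mantissas with a leading 1 are used, so that no value is listed twice. The function name `float_count` is hypothetical.

```python
def float_count(mantissa_bits=5, max_exponent=2):
    """Enumerate nonnegative values mantissa * 2**E in increasing order."""
    values = []
    top = 1 << mantissa_bits          # number of distinct mantissa patterns
    for E in range(max_exponent + 1):
        # Assumption: for E > 0, start at the mantissa whose leading bit is 1,
        # which avoids repeating values already counted at a smaller exponent.
        start = 0 if E == 0 else top >> 1
        for m in range(start, top):
            bits = format(m, f"0{mantissa_bits}b")
            values.append((bits, E, m * 2 ** E))
    return values

for bits, E, value in float_count()[:4]:
    print(bits, E, value)
```

Under these assumptions the spacing between consecutive values doubles each time E increases by one, which is the source of the amplitude-proportional roundoff error described in the summary.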

Type: Chapter
Information: Quantization Noise: Roundoff Error in Digital Computation, Signal Processing, Control, and Communications, pp. 257 - 306
Publisher: Cambridge University Press
Print publication year: 2008
