The key to parallel programming is sharing a task between many cooperating threads running in parallel. A chart shows how, since 2003, the Moore's law growth in computing performance has depended on parallel computing. This chapter includes a simple introductory CUDA example which performs numerical integration using 1 000 000 000 threads and gives a speed-up of about 1000 compared to a single CPU thread. Key CUDA concepts, including thread blocks, thread grids and warps, are introduced. The hardware differences between conventional CPU architectures and GPUs are then discussed. Optimisations in GPU memory caching are also explained, since memory access time is often a key performance constraint for many programs. The use of OpenMP to share a single task across all cores of a multicore CPU is also discussed.
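The numerical integration example can be pictured with a small host-side sketch. The C++ below is our illustration, not the book's code. It assumes the integrand is sin(x) on [0, π], which has the exact answer 2, and uses the trapezoidal rule. Each loop iteration stands in for the work a single CUDA thread would do. The OpenMP pragma shares the loop across CPU cores, as in the chapter's multicore discussion. The function name `integrate_sin` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Trapezoidal-rule estimate of the integral of sin(x) over [0, pi] using
// `steps` intervals (integrand and name are illustrative assumptions).
// Each iteration evaluates one sample point, the job one CUDA thread would
// do; OpenMP's reduction clause combines the per-thread partial sums.
// Compile with -fopenmp to run in parallel; without it the pragma is
// ignored and the loop runs serially with the same result.
double integrate_sin(std::int64_t steps)
{
    assert(steps > 0);
    const double pi = 3.14159265358979323846;
    const double h = pi / static_cast<double>(steps);  // interval width
    double sum = 0.0;
#pragma omp parallel for reduction(+ : sum)
    for (std::int64_t i = 1; i < steps; ++i)  // interior points, weight 1
        sum += std::sin(static_cast<double>(i) * h);
    // the two endpoints carry weight 1/2 (both values are ~0 for sin)
    sum += 0.5 * (std::sin(0.0) + std::sin(pi));
    return sum * h;
}
```

For example, `integrate_sin(1000)` returns about 1.9999984. This matches the trapezoidal-rule error of π²/(6n²) for n intervals. In a CUDA version the loop would go away. Each thread would compute its own index from blockIdx.x * blockDim.x + threadIdx.x and evaluate one term, and the partial sums would then need to be reduced.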
Chapter 7 explores the ability of GPUs to perform multiple tasks simultaneously, including overlapping IO with computation and running multiple kernels at the same time. CUDA streams and events are advanced features that allow users to manage multiple asynchronous tasks running on the GPU. Examples are given, and the NVIDIA Visual Profiler (NVVP) is used to visualise the timeline for tasks in multiple CUDA streams. Asynchronous disk IO on the host PC can also be performed, and examples using the C++ <thread> library are given. Finally, the new CUDA graphs feature is introduced. This provides a wrapper for efficiently launching large numbers of kernel calls for complex workloads.
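Overlapping IO with computation on the host can be sketched with plain std::thread and two buffers. In this minimal sketch, `load_chunk` is a hypothetical stand-in for a blocking disk read that synthesises data rather than reading a file. `process_stream` is likewise our own name. The point is only the double-buffering pattern: the next chunk is read while the current one is processed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for a blocking disk read: fills `buf` with chunk
// number `k` of a synthetic data stream (each value is its global index).
void load_chunk(std::vector<double>& buf, std::size_t k)
{
    for (std::size_t i = 0; i < buf.size(); ++i)
        buf[i] = static_cast<double>(k * buf.size() + i);
}

// Processes `nchunks` chunks of `chunk` elements with double buffering:
// while the main thread sums the current buffer, a std::thread fills the
// other buffer with the next chunk.
double process_stream(std::size_t nchunks, std::size_t chunk)
{
    assert(nchunks > 0 && chunk > 0);
    std::vector<double> bufs[2] = {std::vector<double>(chunk),
                                   std::vector<double>(chunk)};
    load_chunk(bufs[0], 0);  // prime the first buffer
    double total = 0.0;
    for (std::size_t k = 0; k < nchunks; ++k) {
        std::thread reader;
        if (k + 1 < nchunks)  // start reading the next chunk early
            reader = std::thread(load_chunk, std::ref(bufs[(k + 1) % 2]), k + 1);
        const std::vector<double>& cur = bufs[k % 2];
        total += std::accumulate(cur.begin(), cur.end(), 0.0);  // the "compute" step
        if (reader.joinable())
            reader.join();  // the next buffer must be complete before use
    }
    return total;
}
```

The reader always writes to the buffer that is not being summed, so no locking is needed beyond the join. The same idea underlies overlapping transfers and kernels in separate CUDA streams.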
Appendix B discusses the role of atomic operations in parallel computing and the atomic functions available in CUDA. An example shows how atomicCAS can be used to implement another atomic operation.
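The compare-and-swap retry loop is easiest to see in a small example. The sketch below uses C++ std::atomic<float>::compare_exchange_weak as a host-side analogue of CUDA's atomicCAS, and builds an atomic maximum for float from it. Three things here are our assumptions rather than necessarily the book's example: the choice of maximum as the derived operation, and the names `atomic_max` and `parallel_max`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Atomic "fetch max" for float built from compare-and-swap, using the same
// retry-loop pattern as with atomicCAS. Returns the value seen before the update.
float atomic_max(std::atomic<float>& target, float value)
{
    float old = target.load();
    // If another thread changes `target` between our load and the CAS, the
    // CAS fails, refreshes `old` with the current value, and we test again.
    while (old < value && !target.compare_exchange_weak(old, value)) {
    }
    return old;
}

// Launches `nthreads` threads that each offer a strided subset of `data`
// to a shared atomic maximum.
float parallel_max(const std::vector<float>& data, unsigned nthreads)
{
    assert(nthreads > 0 && !data.empty());
    std::atomic<float> result{data[0]};
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t)
        pool.emplace_back([&, t] {
            for (std::size_t i = t; i < data.size(); i += nthreads)
                atomic_max(result, data[i]);
        });
    for (auto& th : pool)
        th.join();
    return result.load();
}
```

On the GPU the same loop is usually written with atomicCAS on the integer bit pattern of the float, converting with __float_as_int and __int_as_float.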
This chapter discusses the tensor core hardware available on newer GPUs. This hardware is designed to perform fast mixed-precision matrix multiplications and is intended for applications in AI. However, CUDA exposes it to programmers through the warp matrix functions. These functions support tiled matrix multiplication using 16 × 16 tiles. We provide examples of their use to improve on the earlier matrix multiplication example in Chapter 2. We also show how reduction operations can be performed using tensor cores as a potential non-AI application.
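The tile decomposition that the warp matrix functions work with can be illustrated on the host. This C++ sketch is not tensor-core code. It is a plain CPU reference that assumes square row-major matrices whose size is a multiple of 16. It loops over 16 × 16 tiles of the output and accumulates products of matching input tiles. A double accumulator loosely stands in for the hardware's higher-precision accumulation. `tiled_matmul` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t TILE = 16;  // tile edge, matching 16 x 16 tiles

// C = A * B for n x n row-major matrices, n a multiple of TILE. For each
// 16 x 16 tile of C, products of the matching A and B tiles are summed
// along k into a per-tile accumulator, then stored once.
std::vector<float> tiled_matmul(const std::vector<float>& A,
                                const std::vector<float>& B, std::size_t n)
{
    assert(n % TILE == 0 && A.size() == n * n && B.size() == n * n);
    std::vector<float> C(n * n, 0.0f);
    for (std::size_t ti = 0; ti < n; ti += TILE)          // tile row of C
        for (std::size_t tj = 0; tj < n; tj += TILE) {    // tile column of C
            double acc[TILE][TILE] = {};                  // accumulator tile
            for (std::size_t tk = 0; tk < n; tk += TILE)  // step along k
                for (std::size_t i = 0; i < TILE; ++i)
                    for (std::size_t j = 0; j < TILE; ++j)
                        for (std::size_t k = 0; k < TILE; ++k)
                            acc[i][j] += static_cast<double>(A[(ti + i) * n + tk + k]) *
                                         B[(tk + k) * n + tj + j];
            for (std::size_t i = 0; i < TILE; ++i)        // store the finished tile
                for (std::size_t j = 0; j < TILE; ++j)
                    C[(ti + i) * n + tj + j] = static_cast<float>(acc[i][j]);
        }
    return C;
}
```

If B is a matrix of ones, every element of row i of C equals the sum of row i of A. This shows one way a summation can be phrased as a matrix product.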
Chapter 6 explains the CUDA random number generators provided by the cuRAND library. The XORWOW generator was found to be the fastest generator in the cuRAND library. The classic calculation of pi by generating random points inside a square is used as a test case for the various possibilities on both the host CPU and the GPU. A kernel using separate generators for each thread is able to generate about 10¹² random numbers per second and is about 20 000 times faster than the simplest host CPU version running on a single core. The inverse transform method for generating random numbers from any distribution is explained. A 3D Ising model calculation is presented as a more interesting application of random numbers. The Ising example has a simple interactive GUI based on OpenCV.
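Both the pi test case and the inverse transform method fit in a few lines of host C++. This sketch makes three substitutions. It uses std::mt19937 from <random> in place of the cuRAND generators, since XORWOW is not part of standard C++. It uses a single generator rather than one per thread. It picks the exponential distribution as the target of the inverse transform. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>

// Monte Carlo estimate of pi: draw points uniformly in the unit square and
// count those inside the quarter circle x^2 + y^2 <= 1 (area pi/4).
// One std::mt19937 is used here; a GPU version would give each thread its
// own generator state.
double estimate_pi(std::uint64_t samples, std::uint32_t seed)
{
    assert(samples > 0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> uni(0.0, 1.0);
    std::uint64_t inside = 0;
    for (std::uint64_t i = 0; i < samples; ++i) {
        const double x = uni(gen), y = uni(gen);
        if (x * x + y * y <= 1.0) ++inside;
    }
    return 4.0 * static_cast<double>(inside) / static_cast<double>(samples);
}

// Inverse transform method: if U ~ Uniform[0,1) and F is a CDF, then
// F^{-1}(U) has distribution F. For the exponential distribution with rate
// lambda, F(x) = 1 - exp(-lambda x), so F^{-1}(u) = -ln(1 - u) / lambda.
double sample_exponential(std::mt19937& gen, double lambda)
{
    assert(lambda > 0.0);
    std::uniform_real_distribution<double> uni(0.0, 1.0);  // u in [0, 1)
    return -std::log1p(-uni(gen)) / lambda;  // log1p(-u) computes ln(1 - u)
}
```

Samples from `sample_exponential` have a mean that approaches 1/λ. For any other distribution, only the inverse CDF in the return statement changes.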
The solution of partial differential equations in two and three dimensions using stencil iteration (Jacobi's method) is discussed and illustrated for Laplace's equation. A very simple kernel gives a speed-up of about 100 compared to the host CPU. The very slow convergence of the Jacobi method can be addressed by using solutions on lower-resolution grids to initialise higher-resolution grids. A convergence check using the maximum change per iteration is also illustrated. Digital image processing is another application of stencils, and a number of digital image filters are shown, including the Sobel filter for edge detection and the median filter for noise reduction. The fast GPU-based median filter uses one thread per image pixel and is implemented using an optimal Batcher sorting network.
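A serial C++ sketch of Jacobi iteration with the maximum-change convergence check is given below. It assumes a 2D grid stored row-major with fixed (Dirichlet) boundary values. `jacobi_laplace` and its parameters are our own names. A CUDA kernel would typically assign one thread per interior point and perform one sweep per launch, with the maximum-change test done as a reduction.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Jacobi iteration for Laplace's equation on an nx x ny row-major grid.
// Boundary values in `u` are held fixed; each sweep replaces every interior
// point by the average of its four neighbours, writing into a second
// buffer. Stops when the largest change in a sweep is below `tol` or after
// `max_iter` sweeps, and returns the number of sweeps performed.
int jacobi_laplace(std::vector<double>& u, std::size_t nx, std::size_t ny,
                   double tol, int max_iter)
{
    assert(nx >= 3 && ny >= 3 && u.size() == nx * ny);
    std::vector<double> next = u;  // copy so both buffers hold the boundary
    for (int iter = 1; iter <= max_iter; ++iter) {
        double max_change = 0.0;
        for (std::size_t y = 1; y + 1 < ny; ++y)
            for (std::size_t x = 1; x + 1 < nx; ++x) {
                const std::size_t i = y * nx + x;
                next[i] = 0.25 * (u[i - 1] + u[i + 1] + u[i - nx] + u[i + nx]);
                max_change = std::max(max_change, std::fabs(next[i] - u[i]));
            }
        u.swap(next);  // the new iterate becomes the current one
        if (max_change < tol) return iter;
    }
    return max_iter;
}
```

For example, boundary values u = x are themselves a solution of the discrete Laplace equation. Starting from zeros in the interior, the iteration should therefore converge to u = x everywhere. Initialising from a coarser-grid solution would simply replace those zero starting values.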
CUDA is now the dominant language used for programming GPUs, one of the most exciting hardware developments of recent decades. With CUDA, you can use a desktop PC for work that would have previously required a large cluster of PCs or access to an HPC facility. As a result, CUDA is increasingly important in scientific and technical computing across the whole STEM community, from medical physics and financial modelling to big data applications and beyond. This unique book on CUDA draws on the author's passion for and long experience of developing and using computers to acquire and analyse scientific data. The result is an innovative text featuring a much richer set of examples than found in any other comparable book on GPU computing. Much attention has been paid to the C++ coding style, which is compact, elegant and efficient. A code base of examples and supporting material is available online, which readers can build on for their own projects.
An introduction to the syntax and conventions of Mathematica and the Wolfram Language, with tips to get new users up and running. The Basic Math Assistant palette is discussed in some depth.
Using Mathematica and the Wolfram Language to engage with the algebra encountered in a precalculus or college algebra setting. Includes solving equations and simplifying expressions.
An introduction to the computational geometry commands in the Wolfram Language with an eye toward creating high quality, watertight, 3D printable meshes. Numerous examples illustrate the ideas.
Using Mathematica and the Wolfram Language to investigate mathematical functions, their graphs, creating tables of values, and working with real world data.
Practical information about, and tips for, using Mathematica and the Wolfram Language. Document creation, slideshow presentations, keyboard shortcuts, documentation, and troubleshooting are discussed.
Using Mathematica and the Wolfram Language to engage with the calculus of functions of a single variable. Includes limits, continuity, differentiation, integration, sequences, and series.