Search results for Computational Science

Appendix C - The NVCC Compiler
Richard Ansorge, University of Cambridge
Book:

Programming in Parallel with CUDA

Published online:

04 May 2022

Print publication:

02 June 2022, pp 387-392
- Chapter
Summary

CUDA uses the NVCC compiler to generate GPU code. This appendix discusses some of the important options users can use to tune the performance of their code.

Contents
Richard Ansorge, University of Cambridge
Book:

Programming in Parallel with CUDA

Published online:

04 May 2022

Print publication:

02 June 2022, pp vii-ix
- Chapter

Preface
Richard Ansorge, University of Cambridge
Book:

Programming in Parallel with CUDA

Published online:

04 May 2022

Print publication:

02 June 2022, pp xix-xxii
- Chapter

9 - Scaling Up
Richard Ansorge, University of Cambridge
Book:

Programming in Parallel with CUDA

Published online:

04 May 2022

Print publication:

02 June 2022, pp 293-324
- Chapter
Summary

Chapter 9 discusses how to share a single calculation between multiple GPUs on a workstation. CUDA provides tools both for managing individual devices and for managing memory, so that multiple devices can see a common shared memory pool; CUDA unified virtual addressing (UVA) is one example. Transfers of data between host and GPU memory can also be automated or eliminated using unified memory (UM) or zero-copy memory. To scale beyond a single workstation, the well-known message passing interface (MPI) library is often used, and this is described with a simple example.

1 - Introduction to GPU Kernels and Hardware
Richard Ansorge, University of Cambridge
Book:

Programming in Parallel with CUDA

Published online:

04 May 2022

Print publication:

02 June 2022, pp 1-21
- Chapter
Summary

The key to parallel programming is sharing a task between many cooperating threads running in parallel. A chart is presented showing how, since 2003, the Moore’s law growth in computing performance has depended on parallel computing. This chapter includes a simple introductory CUDA example which performs numerical integration using 1 000 000 000 threads. Using CUDA gives a speed-up of about 1000 compared to a single CPU thread. Key CUDA concepts including thread blocks, thread grids and warps are introduced. The hardware differences between conventional CPU architectures and GPUs are then discussed. Optimisations in memory caching on GPUs are also explained, as memory access time is often a key performance constraint. The use of OpenMP to share a single task across all cores of a multicore CPU is also discussed.
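As a rough illustration of the idea of one thread per integration point, the following CPU-only Python sketch evaluates each point's contribution independently and then sums them. The integrand (sin x on [0, π]), the midpoint rule and the function names are assumptions made for this example; the summary does not say which integral or rule the chapter uses.

```python
import math
from typing import Callable


def point_contribution(f: Callable[[float], float], a: float, h: float, i: int) -> float:
    """Work done by one hypothetical 'thread': one midpoint sample times the step."""
    return f(a + (i + 0.5) * h) * h


def integrate(f: Callable[[float], float], a: float, b: float, n_points: int) -> float:
    """Midpoint-rule integral of f over [a, b] using n_points independent samples."""
    h = (b - a) / n_points
    # Each term depends only on its own index i, so on a GPU every term
    # could be computed by a separate thread before a final reduction.
    return math.fsum(point_contribution(f, a, h, i) for i in range(n_points))


if __name__ == "__main__":
    # The exact value of the integral of sin(x) over [0, pi] is 2.
    print(integrate(math.sin, 0.0, math.pi, 100_000))
```

Because no term depends on any other, the loop body maps naturally onto a grid of GPU threads; only the final sum needs cooperation between them.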

7 - Concurrency Using CUDA Streams and Events
Richard Ansorge, University of Cambridge
Book:

Programming in Parallel with CUDA

Published online:

04 May 2022

Print publication:

02 June 2022, pp 209-238
- Chapter
Summary

Chapter 7 explores the ability of GPUs to perform multiple tasks simultaneously, including overlapping IO with computation and the simultaneous running of multiple kernels. CUDA streams and events are advanced features that allow users to manage multiple asynchronous tasks running on the GPU. Examples are given and the NVIDIA visual profiler (NVVP) is used to visualise the timeline for tasks in multiple CUDA streams. Asynchronous disk IO on the host PC can also be performed, and examples using the C++ <thread> library are given. Finally, the new CUDA graphs feature is introduced. This provides a wrapper for efficiently launching large numbers of kernel calls for complex workloads.

Appendix B - Atomic Operations
Richard Ansorge, University of Cambridge
Book:

Programming in Parallel with CUDA

Published online:

04 May 2022

Print publication:

02 June 2022, pp 382-386
- Chapter
Summary

Appendix B discusses the role of atomic operations in parallel computing and the available functions in CUDA. An example is provided showing the use of atomicCAS to implement another atomic operation.

11 - Tensor Cores
Richard Ansorge, University of Cambridge
Book:

Programming in Parallel with CUDA

Published online:

04 May 2022

Print publication:

02 June 2022, pp 358-372
- Chapter
Summary

This chapter discusses the tensor core hardware available on newer GPUs. This hardware is designed to perform fast mixed precision matrix multiplications and is intended for applications in AI. However, CUDA exposes its use to programmers through the warp matrix function library. These functions support tiled matrix multiplication using 16 × 16 tiles. We provide examples of their use to improve on the earlier matrix multiplication example in Chapter 2. We also show how reduction operations can be performed using tensor cores as a potential non-AI application.

6 - Monte Carlo Applications
Richard Ansorge, University of Cambridge
Book:

Programming in Parallel with CUDA

Published online:

04 May 2022

Print publication:

02 June 2022, pp 178-208
- Chapter
Summary

Chapter 6 explains the CUDA random number generators provided by the cuRAND library. The CUDA XORWOW generator was found to be the fastest generator in the cuRAND library. The classic calculation of pi by generating random numbers inside a square is used as a test case for the various possibilities on both the host CPU and the GPU. A kernel using separate generators for each thread is able to generate about 10^12 random numbers per second and is about 20 000 times faster than the simplest host CPU version running on a single core. The inverse transform method for generating random numbers from any distribution is explained. A 3D Ising model calculation is presented as a more interesting application of random numbers. The Ising example has a simple interactive GUI based on OpenCV.
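The two techniques named in this summary can be sketched on the CPU in a few lines of Python. This is a minimal illustration, not the book's code: it uses Python's standard `random` module rather than cuRAND, and the function names, the seed values and the choice of an exponential distribution for the inverse transform example are all assumptions.

```python
import math
import random


def estimate_pi(n_points: int, seed: int = 12345) -> float:
    """Estimate pi from the fraction of random points in the unit square
    that fall inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_points):
        x = rng.random()
        y = rng.random()
        if x * x + y * y < 1.0:
            inside += 1
    # Area of quarter circle / area of square = pi / 4.
    return 4.0 * inside / n_points


def sample_exponential(rate: float, n_samples: int, seed: int = 2022) -> list:
    """Inverse transform sampling: map uniform u in [0, 1) through the
    inverse CDF of the exponential distribution, x = -ln(1 - u) / rate."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n_samples)]


if __name__ == "__main__":
    print(estimate_pi(1_000_000))
    samples = sample_exponential(rate=2.0, n_samples=100_000)
    print(sum(samples) / len(samples))  # sample mean, expected to be near 1 / rate
```

Every point in `estimate_pi` is independent of the others, which is why the calculation parallelises well when each GPU thread owns its own generator state.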

4 - Parallel Stencils
Richard Ansorge, University of Cambridge
Book:

Programming in Parallel with CUDA

Published online:

04 May 2022

Print publication:

02 June 2022, pp 106-141
- Chapter
Summary

The solution of partial differential equations in two and three dimensions using stencil iteration (Jacobi’s method) is discussed and illustrated for Laplace’s equation. A very simple kernel gives about a factor of 100 speed-up compared to the host CPU. The very slow convergence of the Jacobi method can be addressed by using solutions on lower resolution grids to initialise higher resolution grids. A convergence check using the maximum change per iteration is also illustrated. Digital image processing is another example of stencil use and a number of digital image filters are shown, including the Sobel filter for edge finding and the median filter for noise reduction. The fast GPU-based median filter uses one thread per image pixel and is implemented using an optimal Batcher network.
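A minimal CPU sketch of Jacobi stencil iteration for Laplace's equation in two dimensions, with the maximum-change convergence check mentioned above, might look as follows. The grid layout, the boundary condition (top edge held at a fixed value, other edges at zero), the tolerance and the function name are assumptions chosen for illustration, not details from the chapter.

```python
def jacobi_laplace(nx: int, ny: int, top: float = 1.0,
                   tol: float = 1e-6, max_iters: int = 10_000):
    """Solve Laplace's equation on an ny-by-nx grid by Jacobi iteration.

    Boundary values are fixed: the top row is held at `top`, the other
    edges at 0. Returns the grid and the number of iterations performed.
    """
    u = [[0.0] * nx for _ in range(ny)]
    for x in range(nx):
        u[0][x] = top

    for iteration in range(1, max_iters + 1):
        new = [row[:] for row in u]  # boundaries are copied unchanged
        max_change = 0.0
        for y in range(1, ny - 1):
            for x in range(1, nx - 1):
                # Five-point stencil: each interior point becomes the
                # average of its four neighbours from the previous sweep.
                new[y][x] = 0.25 * (u[y - 1][x] + u[y + 1][x]
                                    + u[y][x - 1] + u[y][x + 1])
                max_change = max(max_change, abs(new[y][x] - u[y][x]))
        u = new
        if max_change < tol:  # convergence check on the largest update
            return u, iteration
    return u, max_iters


if __name__ == "__main__":
    grid, iters = jacobi_laplace(9, 9, tol=1e-8)
    print(iters, grid[4][4])
```

Each interior update reads only the previous sweep's values, so on a GPU one thread per grid point can compute a whole sweep in parallel.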

Examples
Richard Ansorge, University of Cambridge
Book:

Programming in Parallel with CUDA

Published online:

04 May 2022

Print publication:

02 June 2022, pp xv-xviii
- Chapter

Figures
Richard Ansorge, University of Cambridge
Book:

Programming in Parallel with CUDA

Published online:

04 May 2022

Print publication:

02 June 2022, pp x-xii
- Chapter

Index
Richard Ansorge, University of Cambridge
Book:

Programming in Parallel with CUDA

Published online:

04 May 2022

Print publication:

02 June 2022, pp 448-454
- Chapter

Probabilistic Numerics

Computation as Machine Learning
Philipp Hennig, Michael A. Osborne, Hans P. Kersting
Published online:

01 June 2022

Print publication:

30 June 2022
- Book
Probabilistic numerical computation formalises the connection between machine learning and applied mathematics. Numerical algorithms approximate intractable quantities from computable ones. They estimate integrals from evaluations of the integrand, or the path of a dynamical system described by differential equations from evaluations of the vector field. In other words, they infer a latent quantity from data. This book shows that it is thus formally possible to think of computational routines as learning machines, and to use the notion of Bayesian inference to build more flexible, efficient, or customised algorithms for computation. The text caters for Masters' and PhD students, as well as postgraduate researchers in artificial intelligence, computer science, statistics, and applied mathematics. Extensive background material is provided along with a wealth of figures, worked examples, and exercises (with solutions) to develop intuition.

12 - Reduced-Order Models (ROMs)
from Part IV - Advanced Data-Driven Modeling and Control
Steven L. Brunton, University of Washington, J. Nathan Kutz, University of Washington
Book:

Data-Driven Science and Engineering

Published online:

10 June 2022

Print publication:

05 May 2022, pp 449-484
- Chapter

Contents
Steven L. Brunton, University of Washington, J. Nathan Kutz, University of Washington
Book:

Data-Driven Science and Engineering

Published online:

10 June 2022

Print publication:

05 May 2022, pp v-viii
- Chapter

14 - Concluding Remarks
from Part III - Further Properties of Hybrid Iterative Algorithms and Suggestions for Improvement
Alexander H. Barnett, Charles L. Epstein, Leslie Greengard, Jeremy Magland
Book:

Geometry of the Phase Retrieval Problem

Published online:

21 April 2022

Print publication:

05 May 2022, pp 292-296
- Chapter
Summary

This chapter summarizes the results of the book and describes some directions for future research.

Index
Alexander H. Barnett, Charles L. Epstein, Leslie Greengard, Jeremy Magland
Book:

Geometry of the Phase Retrieval Problem

Published online:

21 April 2022

Print publication:

05 May 2022, pp 263-308
- Chapter

Index
Steven L. Brunton, University of Washington, J. Nathan Kutz, University of Washington
Book:

Data-Driven Science and Engineering

Published online:

10 June 2022

Print publication:

05 May 2022, pp 588-590
- Chapter

5 - Clustering and Classification
from Part II - Machine Learning and Data Analysis
Steven L. Brunton, University of Washington, J. Nathan Kutz, University of Washington
Book:

Data-Driven Science and Engineering

Published online:

10 June 2022

Print publication:

05 May 2022, pp 168-207
- Chapter

2540 results in Computational Science