
Chapter 3: Parallel Performance Analysis

pp. 45-74

Authors

Indian Institute of Technology, Delhi

Extract

Programs need to be correct. Programs also need to be fast. To write efficient programs, one must know how to evaluate efficiency. One option is to take recourse to our prior understanding of efficiency in the sequential context and compare observed parallel performance to observed sequential performance. Alternatively, we can define parallel efficiency independently of sequential performance, while still drawing inspiration from the way efficiency is evaluated in the sequential context. Into that scheme, we would need to incorporate the impact of an increasing number of processors deployed to solve the given problem.
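For concreteness, the first route leads to the standard notions of speedup and efficiency: writing t(n, p) for the time to solve a problem of size n on p processors, the speedup is S(n, p) = t(n, 1) / t(n, p), and the parallel efficiency is E(n, p) = S(n, p) / p, so that E(n, p) = 1 corresponds to perfect utilization of all p processors.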

Question: How do you reason about how long an algorithm or program takes?

Efficiency has two kinds of metrics. The first kind is abstract: for example, the asymptotic analysis of the underlying algorithm. The second is concrete: how well does the algorithm's implementation behave in practice, on the available hardware and on data sizes of interest? Both are important.

There is no substitute for measuring the performance of the real implementation on real data. On the other hand, developing and testing iteratively on large parallel systems is prohibitively expensive. Most development occurs on a small scale: using only a few processors, p, on small input of size n. The extrapolation of these tests to a much larger scale is deceptively hard, and we often must resort to simplified models and analysis tools.
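To make small-scale measurement concrete, here is a minimal sketch in C++ (the workload, input size, and thread counts are illustrative assumptions, not taken from the book) that times a parallel array sum for a few values of p using std::thread and std::chrono:

    #include <chrono>
    #include <functional>
    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    // Hypothetical workload: each thread sums one block of the input.
    void partial_sum(const std::vector<double>& a, std::size_t lo,
                     std::size_t hi, double& out) {
        out = std::accumulate(a.begin() + lo, a.begin() + hi, 0.0);
    }

    int main() {
        const std::size_t n = std::size_t(1) << 24;  // small-scale input size
        std::vector<double> a(n, 1.0);

        for (unsigned p : {1u, 2u, 4u, 8u}) {        // a few processor counts
            std::vector<std::thread> threads;
            std::vector<double> partial(p, 0.0);
            auto start = std::chrono::steady_clock::now();
            for (unsigned i = 0; i < p; ++i) {
                // Give thread i the block [lo, hi) of the input.
                std::size_t lo = i * n / p, hi = (i + 1) * n / p;
                threads.emplace_back(partial_sum, std::cref(a), lo, hi,
                                     std::ref(partial[i]));
            }
            for (auto& t : threads) t.join();        // wait for all blocks
            double total = std::accumulate(partial.begin(), partial.end(), 0.0);
            auto stop = std::chrono::steady_clock::now();
            std::chrono::duration<double> t_np = stop - start;
            std::cout << "p=" << p << "  t(n,p)=" << t_np.count()
                      << " s  sum=" << total << "\n";
        }
    }

Compiled with optimization and threading enabled (e.g., g++ -O2 -pthread), a run like this makes the dependence of t(n, p) on p directly observable; as noted above, though, timings taken at small n and p do not extrapolate straightforwardly to large machines.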

Asymptotic analysis on simple models is sometimes criticized because it oversimplifies several complex dynamics (like cache behavior, out-of-order execution on multiple execution engines, instruction dependencies, etc.) and conceals constant multipliers. Nonetheless, with the large input sizes that are common in parallel applications, asymptotic measures do have value. They can be computed relatively easily, in a standardized setting, and without requiring iterations on large supercomputers. And concealing constants is a choice to some degree: useful constants can and should be retained. Accordingly, the abstract part of our analysis will employ big-O notation to describe the number of steps an algorithm takes, as a function of the input size n and the number of processors p.

Asymptotic notation or not, the time t(n, p) to solve a problem in parallel is a function of n and p. For this purpose, we will generally take p to count sequential processors: each completes its program's instructions in sequence.
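As a concrete illustration (a standard textbook example, not drawn from this extract): to sum n numbers on p processors, each processor can first add its own block of roughly n/p numbers sequentially, and the p partial sums can then be combined pairwise in a balanced tree of depth about log2 p. Counting each addition as one step yields

    t(n, p) = O(n/p + log p),

which degenerates to the expected t(n, 1) = O(n) when only one processor is used.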
