Programs need to be correct. Programs also need to be fast. To write efficient programs, one must first know how to evaluate efficiency. We might draw on our prior understanding of efficiency in the sequential context and compare observed parallel performance to observed sequential performance. Alternatively, we can define parallel efficiency independently of sequential performance, while still taking inspiration from the way efficiency is evaluated in the sequential setting. Into that scheme, we must incorporate the impact of an increasing number of processors deployed to solve the given problem.
Question: How do you reason about how long an algorithm or program takes?
Efficiency has two measures. The first is abstract: for example, the asymptotic analysis of the underlying algorithm. The second is concrete: how well the algorithm's implementation behaves in practice on the available hardware and on data sizes of interest. Both are important.
There is no substitute for measuring the performance of the real implementation on real data. On the other hand, developing and testing iteratively on large parallel systems is prohibitively expensive. Most development occurs on a small scale: using only a few processors, p, on small inputs of size n, as in the sketch below. The extrapolation of these tests to a much larger scale is deceptively hard, and we often must resort to simplified models and analysis tools.
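As a minimal sketch of such small-scale measurement, the following C++ fragment times a toy parallel reduction for a few values of p using OpenMP; the kernel, the input size, and the processor counts are placeholders for the real implementation and data of interest, not part of the text's own examples.

    // A minimal timing sketch (assumes an OpenMP-capable compiler,
    // e.g., g++ -fopenmp sketch.cpp). The parallel sum is a stand-in
    // for the real implementation under test.
    #include <chrono>
    #include <cstdio>
    #include <vector>
    #include <omp.h>

    int main() {
        const long long n = 1 << 20;          // small development-scale input
        std::vector<double> a(n, 1.0);

        for (int p = 1; p <= 8; p *= 2) {     // a few processor counts
            omp_set_num_threads(p);
            double sum = 0.0;
            auto start = std::chrono::steady_clock::now();
            #pragma omp parallel for reduction(+ : sum)
            for (long long i = 0; i < n; ++i)
                sum += a[i];
            auto stop = std::chrono::steady_clock::now();
            double t = std::chrono::duration<double>(stop - start).count();
            std::printf("p = %d  t(n, p) = %.6f s\n", p, t);
        }
        return 0;
    }

Runs like this yield concrete values of t(n, p) at small n and p; the hard part, as noted above, is extrapolating them to the scales that actually matter.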
Asymptotic analysis on simple models is sometimes criticized because it oversimplifies complex dynamics (such as cache behavior, out-of-order execution on multiple execution engines, and instruction dependencies) and conceals constant multipliers. Nonetheless, with the large input sizes common in parallel applications, asymptotic measures do have value. They can be computed relatively easily, in a standardized setting, and without requiring iterations on large supercomputers. Moreover, concealing constants is, to a degree, a choice: useful constants can and should be retained. Accordingly, the abstract part of our analysis will employ big-O notation to describe the number of steps an algorithm takes, as a function of the input size n and the number of processors p.
Asymptotic notation or not, the time t(n, p) to solve a problem in parallel is a function of n and p. For this purpose, we will generally take p to count sequential processors, that is, processors that complete their program instructions in sequence.
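For instance, consider summing n numbers on p processors, a hypothetical but representative kernel: each processor first adds its local share of n/p values, and the p partial sums are then combined pairwise in a tree of depth log2 p, for a total of roughly

    t(n, p) = n/p + log2 p

steps. Retaining both terms, rather than collapsing them into a single bound, exposes how the cost shifts from the local n/p work to the log2 p combining phase as p grows.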