This chapter introduces some general principles of parallel algorithm design. We will consider a few case studies to illustrate broad approaches to parallel algorithms. As already discussed in Chapter 5, the underlying goal for these algorithms is to decompose the solution into parcels of relatively independent computation, with occasional interaction. In order to abstract away the details of synchronization, we will assume the parallel RAM (PRAM) or the bulk-synchronous parallel (BSP) model to describe and analyze these algorithms. It is worth recalling that going from, say, a PRAM algorithm to one that is efficient on a particular architecture requires refinement and careful design for that platform. This is particularly true when “constant time” concurrent read and write operations are assumed. Concurrent reads and writes are particularly inefficient on distributed-memory platforms, and are inefficient on shared-memory platforms as well, because they require synchronization of the processors’ views of the shared memory, which can be expensive.
Question: How do parallel algorithms differ from sequential algorithms?
Recall that PRAM models focus mainly on the computational aspects of an algorithm, whereas practical algorithms also require close attention to memory, communication, and synchronization overheads. PRAM algorithms may not always be practical, but they are easier to design than those for more general models. In reality, PRAM algorithms are often only the first step toward more practical algorithms, particularly on distributed-memory systems.
Parallel algorithm design often seeks to maximize parallelism and minimize the time complexity. Even if the number of actually available processors is limited, higher parallelism translates to higher scalability in practice. Nonetheless, the work-time scheduling principle (Section 3.5) indicates that low work complexity is paramount for fast execution in practice. In general, if the best sequential complexity of solving the given problem is, say, T_o(n), we would like the parallel work complexity to be O(T_o(n)). It is a common algorithm design pattern to assume up to T_o(n) processors and then try to minimize the time complexity. With maximal parallelism, the target time complexity using T_o(n) processors is O(1). This is not always achievable, and there is often a trade-off between time and work complexity. We then try to reduce the work complexity to O(T_o(n)) without significantly increasing the time complexity.
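As a concrete illustration of this work-time trade-off (the chapter itself gives no code here; the sketch below, including the function name parallel_sum, is an illustrative assumption rather than the book's own example), consider summing n numbers. The best sequential complexity is T_o(n) = O(n), and a pairwise PRAM-style reduction achieves O(log n) time with O(n) total work: each round performs independent additions that up to n/2 processors could execute in a single O(1) step.

    # Minimal sketch, assuming a PRAM-style pairwise reduction of n values.
    # Each round's additions are mutually independent, so on a PRAM with
    # n/2 processors every round takes O(1) time; there are O(log n) rounds
    # and O(n) additions in total, matching the O(T_o(n)) work target.

    def parallel_sum(values):
        """Simulate a PRAM-style reduction: about log2(n) rounds of pairwise adds."""
        data = list(values)
        while len(data) > 1:
            # All pairs in this round are independent; on a PRAM each pair
            # would be handled by its own processor in the same time step.
            data = [data[i] + data[i + 1] if i + 1 < len(data) else data[i]
                    for i in range(0, len(data), 2)]
        return data[0] if data else 0

    if __name__ == "__main__":
        print(parallel_sum(range(16)))   # 120, after log2(16) = 4 rounds

The sequential loop over rounds stands in for the synchronized time steps of the model; the point of the sketch is that parallelism within a round can be maximized without inflating the total work beyond O(n).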