
Functional and dynamic programming in the design of parallel prefix networks

Published online by Cambridge University Press:  06 December 2010

CSE Department, Chalmers University of Technology, Göteborg, SE-41296, Sweden


A parallel prefix network of width $n$ takes $n$ inputs, $a_1, a_2, \ldots, a_n$, and computes each $y_i = a_1 \circ a_2 \circ \cdots \circ a_i$ for $1 \le i \le n$, for an associative operator $\circ$. This is one of the fundamental problems in computer science, because it gives insight into how parallel computation can be used to solve an apparently sequential problem. As parallel programming becomes the dominant programming paradigm, parallel prefix or scan is proving to be a very important building block of parallel algorithms and applications. There are many different parallel prefix networks, with different properties such as number of operators, depth and allowed fanout from the operators. In this paper, ideas from functional programming are combined with search to enable a deep exploration of parallel prefix network design. Networks that improve on the best known previous results are generated. It is argued that precise modelling in a functional programming language, together with simple visualization of the networks, gives a new, more experimental, approach to parallel prefix network design, improving on the manual techniques typically employed in the literature. The programming idiom that marries search with higher order functions may well have wider application than the network generation described here.
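
To make the definition concrete, here is a minimal Python sketch. It is not taken from the paper, which models networks in a functional programming language. It compares a serial scan with a simple divide-and-conquer network and reports the operator count and depth mentioned in the abstract. The names `serial_prefix` and `divide_and_conquer_prefix` are hypothetical, and the recursive construction is only one well-known way of building a prefix network, not one of the networks the paper generates.

```python
from operator import add


def serial_prefix(op, xs):
    """Serial scan: uses n - 1 operators in a chain of depth n - 1."""
    out = []
    for x in xs:
        out.append(x if not out else op(out[-1], x))
    return out


def divide_and_conquer_prefix(op, xs):
    """Return (outputs, operator_count, depth) for a recursive network.

    Both halves are scanned independently (conceptually in parallel).
    The last output of the left half is then combined with every output
    of the right half, which is where the high fanout comes from.
    """
    n = len(xs)
    if n <= 1:
        return list(xs), 0, 0
    mid = (n + 1) // 2
    left, size_l, depth_l = divide_and_conquer_prefix(op, xs[:mid])
    right, size_r, depth_r = divide_and_conquer_prefix(op, xs[mid:])
    # Keep the left operand on the left: op need not be commutative.
    combined = [op(left[-1], r) for r in right]
    size = size_l + size_r + len(right)
    depth = max(depth_l, depth_r) + 1
    return left + combined, size, depth


if __name__ == "__main__":
    inputs = [1, 2, 3, 4, 5, 6, 7, 8]
    print(serial_prefix(add, inputs))
    print(divide_and_conquer_prefix(add, inputs))
```

For eight inputs, the serial scan needs 7 operators at depth 7, while this recursive network needs 12 operators at depth 3. That trade-off between operator count, depth and fanout is the design space the abstract refers to.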

Copyright © Cambridge University Press 2010


