Towards Textbook Efficiency for Parallel Multigrid

Published online by Cambridge University Press:  03 March 2015

Björn Gmeiner*
Institute for Numerical Mathematics (M2), Technische Universität München, Boltzmannstrasse 3, D-85748 Garching b. München, Germany
Ulrich Rüde
Department of Computer Science 10, FAU Erlangen-Nürnberg, Cauerstraße 6, D-91058 Erlangen, Germany
Holger Stengel
Erlangen Regional Computing Center (RRZE), FAU Erlangen-Nürnberg, Martensstraße 1, D–91058 Erlangen, Germany
Christian Waluga
Institute for Numerical Mathematics (M2), Technische Universität München, Boltzmannstrasse 3, D-85748 Garching b. München, Germany
Barbara Wohlmuth
Institute for Numerical Mathematics (M2), Technische Universität München, Boltzmannstrasse 3, D-85748 Garching b. München, Germany
*Email addresses: (B. Gmeiner), (U. Rüde), (C. Waluga), (B. Wohlmuth)

In this work, we extend Achi Brandt's notion of textbook multigrid efficiency (TME) to massively parallel algorithms. Using a finite-element-based geometric multigrid implementation, we recall the classical view of TME with experiments for scalar linear equations with constant and varying coefficients, as well as for linear systems with saddle-point structure. To extend the idea of TME to the parallel setting, we give a new, architecture-aware characterization of a work unit (WU) that draws on performance modeling techniques. We illustrate the newly introduced parallel TME measure with large-scale computations, solving problems with up to 200 billion unknowns on a TOP-10 supercomputer.
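As a rough illustration of the two viewpoints mentioned in the abstract, the following Python sketch expresses a solver's cost as a multiple of a work unit. The first function follows the classical reading, in which a WU is the operation count of one elementary sweep over the discretized system. The second is only one plausible, assumed reading of an architecture-aware WU: the time a performance model predicts for a single sweep on the given machine. The function names, parameters, and all numbers are hypothetical and are not taken from the paper.

```python
def classical_tme_factor(total_ops: float, ops_per_work_unit: float) -> float:
    """Return the solver cost as a multiple of one work unit (WU).

    Classical view: a WU is the operation count of one elementary sweep
    over the discretized system; TME is usually associated with a small
    factor (commonly quoted as fewer than about 10 WU).
    """
    if ops_per_work_unit <= 0:
        raise ValueError("ops_per_work_unit must be positive")
    return total_ops / ops_per_work_unit


def time_based_tme_factor(
    solve_seconds: float,
    unknowns: float,
    updates_per_second_per_unit: float,
    compute_units: int,
) -> float:
    """Return the measured solve time as a multiple of a time-based WU.

    Assumption (not from the source): the WU is the idealized time for
    one sweep over all unknowns, predicted from a per-unit update rate
    and perfect scaling across `compute_units`.
    """
    if updates_per_second_per_unit <= 0 or compute_units <= 0:
        raise ValueError("update rate and compute_units must be positive")
    work_unit_seconds = unknowns / (updates_per_second_per_unit * compute_units)
    return solve_seconds / work_unit_seconds


if __name__ == "__main__":
    # Made-up numbers, chosen only to exercise the two functions.
    print(classical_tme_factor(total_ops=6.5e9, ops_per_work_unit=1.0e9))
    print(
        time_based_tme_factor(
            solve_seconds=3.0,
            unknowns=1.0e9,
            updates_per_second_per_unit=5.0e7,
            compute_units=100,
        )
    )
```

In this sketch, the time-based factor grows whenever communication, coarse-grid work, or poor node-level performance make the real solver slower than the idealized sweep, which is the kind of effect an operation count alone would not reveal.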

Research Article
Copyright © Global-Science Press 2015 
