
Towards Textbook Efficiency for Parallel Multigrid

  • Björn Gmeiner, Ulrich Rüde, Holger Stengel, Christian Waluga and Barbara Wohlmuth


In this work, we extend Achi Brandt's notion of textbook multigrid efficiency (TME) to massively parallel algorithms. Using a finite element based geometric multigrid implementation, we recall the classical view on TME with experiments for scalar linear equations with constant and varying coefficients as well as linear systems with saddle-point structure. To extend the idea of TME to the parallel setting, we give a new characterization of a work unit (WU) in an architecture-aware fashion by taking into account performance modeling techniques. We illustrate our newly introduced parallel TME measure by large-scale computations, solving problems with up to 200 billion unknowns on a TOP-10 supercomputer.


Corresponding author

*Email addresses: (B. Gmeiner), (U. Rüde), (C. Waluga), (B. Wohlmuth)



