Towards Textbook Efficiency for Parallel Multigrid

Björn Gmeiner; Ulrich Rüde; Holger Stengel; Christian Waluga; Barbara Wohlmuth

doi:10.4208/nmtma.2015.w10si

Published online by Cambridge University Press: 03 March 2015

Björn Gmeiner*: Affiliation:
Institute for Numerical Mathematics (M2), Technische Universität München, Boltzmannstrasse 3, D-85748 Garching b. München, Germany
Ulrich Rüde: Affiliation:
Department of Computer Science 10, FAU Erlangen-Nürnberg, Cauerstraße 6, D-91058 Erlangen, Germany
Holger Stengel: Affiliation:
Erlangen Regional Computing Center (RRZE), FAU Erlangen-Nürnberg, Martensstraße 1, D-91058 Erlangen, Germany
Christian Waluga: Affiliation:
Institute for Numerical Mathematics (M2), Technische Universität München, Boltzmannstrasse 3, D-85748 Garching b. München, Germany
Barbara Wohlmuth: Affiliation:
Institute for Numerical Mathematics (M2), Technische Universität München, Boltzmannstrasse 3, D-85748 Garching b. München, Germany
* Email addresses: gmeiner@ma.tum.de (B. Gmeiner), ulrich.ruede@fau.de (U. Rüde), waluga@ma.tum.de (C. Waluga), wohlmuth@ma.tum.de (B. Wohlmuth)

Abstract

In this work, we extend Achi Brandt's notion of textbook multigrid efficiency (TME) to massively parallel algorithms. Using a finite element based geometric multigrid implementation, we recall the classical view on TME with experiments for scalar linear equations with constant and varying coefficients as well as linear systems with saddle-point structure. To extend the idea of TME to the parallel setting, we give a new characterization of a work unit (WU) in an architecture-aware fashion by taking into account performance modeling techniques. We illustrate our newly introduced parallel TME measure by large-scale computations, solving problems with up to 200 billion unknowns on a TOP-10 supercomputer.
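
For orientation, the classical (serial) notion can be made concrete with a small calculation. In Brandt's sense, TME means solving the discrete problem at a cost of only a small multiple (commonly quoted as fewer than 10) of work units, where one WU is roughly the cost of a single fine-grid operator application such as a residual evaluation. The sketch below is not the parallel measure introduced in this paper. It is a minimal illustration, assuming uniform refinement (each coarser level has 2^dim times fewer unknowns) and assuming that every smoothing sweep and the combined residual/transfer step cost about one WU on their own level. The function name and its parameters are hypothetical.

```python
def v_cycle_work_units(levels, dim=3, pre=2, post=2, transfer=1.0):
    """Estimate the cost of one V(pre, post)-cycle in fine-grid work units.

    Assumptions (illustrative only):
      * level l has (2**dim)**(-l) times the unknowns of the finest level,
      * one smoothing sweep costs 1 WU relative to its own level,
      * residual evaluation plus grid transfer cost `transfer` WU per level.
    """
    if levels < 1:
        raise ValueError("need at least one grid level")
    per_level = pre + post + transfer
    coarsening = 2 ** dim
    # Geometric series over the grid hierarchy, finest level first.
    return sum(per_level * coarsening ** (-level) for level in range(levels))


if __name__ == "__main__":
    for dim in (2, 3):
        cost = v_cycle_work_units(levels=8, dim=dim)
        print(f"{dim}D V(2,2)-cycle on 8 levels: about {cost:.2f} WU")
```

Under these assumptions one 3D V(2,2)-cycle costs at most 5 · 8/7 WU, so the number of cycles that fit into a budget of roughly 10 WU depends directly on the convergence rate achieved per cycle.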

Keywords

65N55, 68W10, multigrid, parallel computing, textbook efficiency, finite element method

Information

Type: Research Article
Information: Numerical Mathematics: Theory, Methods and Applications, Volume 8, Issue 1, February 2015, pp. 22-46

DOI: https://doi.org/10.4208/nmtma.2015.w10si
Copyright: © Global-Science Press 2015
