
Numerical Study of Geometric Multigrid Methods on CPU-GPU Heterogeneous Computers

  • Chunsheng Feng, Shi Shu, Jinchao Xu and Chen-Song Zhang

The geometric multigrid method (GMG) is one of the most efficient techniques for solving the discrete algebraic systems that arise from elliptic partial differential equations. GMG uses a hierarchy of grids or discretizations and reduces error components at many frequencies simultaneously. Graphics processing units (GPUs) have recently emerged in scientific computing as a technology that yields substantial performance and energy-efficiency improvements. A central challenge in implementing GMG on GPUs, though, is that the computational work on coarse levels cannot fully utilize the capacity of a GPU. In this work, we perform numerical studies of GMG on CPU-GPU heterogeneous computers. Furthermore, we compare our implementation with an efficient CPU implementation of GMG and with the most popular fast Poisson solver, the Fast Fourier Transform (FFT), as implemented in the cuFFT library developed by NVIDIA.
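
As an illustration of the grid hierarchy described in the abstract, the following is a minimal sketch of a geometric multigrid V-cycle. It is not the authors' implementation: the model problem (1D Poisson, -u'' = f with zero boundary values), the weighted Jacobi smoother, the CPU-only pure-Python setting, and all function names (`residual`, `smooth`, `restrict`, `prolong`, `v_cycle`, `solve`) are assumptions chosen for brevity.

```python
import math

def residual(u, f, h):
    """r = f - A u, where A = tridiag(-1, 2, -1) / h^2 and boundary values are zero."""
    n = len(u)
    r = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h)
    return r

def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi: each point is updated independently of the others."""
    for _ in range(sweeps):
        r = residual(u, f, h)
        u = [u[i] + omega * (h * h / 2.0) * r[i] for i in range(len(u))]
    return u

def restrict(r):
    """Full weighting from a fine grid (2m+1 points) to a coarse grid (m points)."""
    m = (len(r) - 1) // 2
    return [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2]
            for j in range(m)]

def prolong(e, n_fine):
    """Linear interpolation from the coarse grid back to the fine grid."""
    fine = [0.0] * n_fine
    for j, v in enumerate(e):
        fine[2 * j] += 0.5 * v
        fine[2 * j + 1] += v
        fine[2 * j + 2] += 0.5 * v
    return fine

def v_cycle(u, f, h, sizes):
    """One V-cycle; appends the number of unknowns on each visited level to sizes."""
    sizes.append(len(u))
    if len(u) == 1:
        return [f[0] * h * h / 2.0]          # exact solve on the coarsest grid
    u = smooth(u, f, h, 2)                   # pre-smoothing
    rc = restrict(residual(u, f, h))         # coarse-grid right-hand side
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, sizes)
    u = [a + b for a, b in zip(u, prolong(ec, len(u)))]
    return smooth(u, f, h, 2)                # post-smoothing

def solve(n, cycles):
    """Solve -u'' = pi^2 sin(pi x) on n = 2^k - 1 interior points."""
    h = 1.0 / (n + 1)
    f = [math.pi ** 2 * math.sin(math.pi * (i + 1) * h) for i in range(n)]
    u = [0.0] * n
    history = [max(abs(v) for v in residual(u, f, h))]
    sizes = []
    for _ in range(cycles):
        sizes = []                           # keep the level sizes of the last cycle
        u = v_cycle(u, f, h, sizes)
        history.append(max(abs(v) for v in residual(u, f, h)))
    return u, sizes, history

if __name__ == "__main__":
    u, sizes, history = solve(63, 10)
    print("unknowns per level:", sizes)
    print("residual reduction:", history[-1] / history[0])
```

The list of unknowns per level shrinks by roughly half at each coarsening in 1D (and faster in 2D or 3D), which is the source of the coarse-level utilization problem the abstract mentions: the coarsest grids offer too little independent work to occupy a massively parallel device.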

Advances in Applied Mathematics and Mechanics
  • ISSN: 2070-0733
  • EISSN: 2075-1354
  • URL: /core/journals/advances-in-applied-mathematics-and-mechanics