
GPU-Accelerated LOBPCG Method with Inexact Null-Space Filtering for Solving Generalized Eigenvalue Problems in Computational Electromagnetics Analysis with Higher-Order FEM

Published online by Cambridge University Press:  28 July 2017

A. Dziekonski*, M. Rewienski*, P. Sypek*, A. Lamecki* and M. Mrozowski*
Affiliation:
Department of Microwave and Antenna Engineering, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Gdansk 80-23, Poland; CUDA Research Center for Computational Electromagnetics at Gdansk University of Technology (2012-2016)
*Corresponding author. Email addresses: adziek@eti.pg.gda.pl (A. Dziekonski), mrewiens@eti.pg.gda.pl (M. Rewienski), psypek@eti.pg.gda.pl (P. Sypek), adam.lamecki@ieee.org (A. Lamecki), m.mrozowski@ieee.org (M. Mrozowski)

Abstract

This paper presents a GPU-accelerated implementation of the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) method with inexact nullspace filtering for computing eigenvalues of the generalized eigenvalue problems that arise in electromagnetics analysis with the higher-order finite element method (FEM). The performance of the proposed approach is verified on a Kepler-architecture graphics accelerator (Tesla K40c) and compared with a reference implementation built on Intel MKL functions and run in parallel mode on an Intel Xeon E5-2680 v3 central processing unit (CPU, 12 threads). Relative to this CPU reference, the proposed GPU-based LOBPCG method with inexact nullspace filtering achieves up to a 2.9-fold speedup.
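
For orientation, the kind of iteration named in the abstract can be sketched on a small dense problem. The NumPy code below is a minimal, unpreconditioned CPU sketch of a LOBPCG-style iteration for the generalized problem A x = λ B x. It is not the authors' GPU implementation, and it omits both the preconditioner and the inexact nullspace filtering. The function names `rayleigh_ritz` and `lobpcg_sketch` and the test matrices are assumptions made for illustration, and the previous iterate is used in place of the usual conjugate-direction block, which spans the same trial subspace in exact arithmetic.

```python
import numpy as np


def rayleigh_ritz(A, B, S, k):
    """Return the k smallest Ritz pairs of (A, B) on the subspace spanned by S."""
    Q, _ = np.linalg.qr(S)            # orthonormal basis, tolerant of near-dependent columns
    As = Q.T @ A @ Q                  # projected stiffness-like matrix
    Bs = Q.T @ B @ Q                  # projected mass-like matrix (SPD because B is SPD)
    L = np.linalg.cholesky(Bs)        # Bs = L L^T
    Linv = np.linalg.inv(L)
    C = Linv @ As @ Linv.T            # equivalent standard symmetric eigenproblem
    C = 0.5 * (C + C.T)               # remove rounding asymmetry
    theta, Y = np.linalg.eigh(C)      # eigenvalues in ascending order
    V = Q @ (Linv.T @ Y[:, :k])       # B-orthonormal Ritz vectors
    return theta[:k], V


def lobpcg_sketch(A, B, X0, tol=1e-6, max_iter=500):
    """Illustrative unpreconditioned LOBPCG-style iteration (hypothetical helper)."""
    k = X0.shape[1]
    theta, X = rayleigh_ritz(A, B, X0, k)
    X_prev = None
    for _ in range(max_iter):
        R = A @ X - (B @ X) * theta   # block residual, one column per eigenpair
        if np.linalg.norm(R, axis=0).max() < tol:
            break
        W = R                         # a preconditioner would be applied to R here
        blocks = [X, W] if X_prev is None else [X, W, X_prev]
        X_prev = X
        theta, X = rayleigh_ritz(A, B, np.hstack(blocks), k)
    return theta, X


# Small synthetic test pair (assumed data, not taken from the paper).
n, k = 40, 3
rng = np.random.default_rng(0)
E = rng.standard_normal((n, n))
A = np.diag(np.arange(1.0, n + 1)) + 0.01 * (E + E.T)   # symmetric
B = np.eye(n) + 0.01 * (E @ E.T) / n                    # symmetric positive definite
theta, X = lobpcg_sketch(A, B, rng.standard_normal((n, k)))
print(theta)
```

In a FEM setting A and B would be large sparse matrices and the products `A @ X` and `B @ X` would dominate the cost, which is where a GPU sparse matrix-vector kernel would be substituted.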

Type
Research Article
Copyright
Copyright © Global-Science Press 2017 

