
Mixed precision algorithms in numerical linear algebra

Published online by Cambridge University Press:  09 June 2022

Nicholas J. Higham
Affiliation:
Department of Mathematics, University of Manchester, Manchester, M13 9PL, UK. E-mail: nick.higham@manchester.ac.uk
Theo Mary
Affiliation:
Sorbonne Université, CNRS, LIP6, Paris, F-75005, France. E-mail: theo.mary@lip6.fr

Abstract

Today’s floating-point arithmetic landscape is broader than ever. While scientific computing has traditionally used single precision and double precision floating-point arithmetics, half precision is increasingly available in hardware and quadruple precision is supported in software. Lower precision arithmetic brings increased speed and reduced communication and energy costs, but it produces results of correspondingly low accuracy. Higher precisions are more expensive but can potentially provide great benefits, even if used sparingly. A variety of mixed precision algorithms have been developed that combine the superior performance of lower precisions with the better accuracy of higher precisions. Some of these algorithms aim to provide results of the same quality as algorithms running in a fixed precision but at a much lower cost; others use a little higher precision to improve the accuracy of an algorithm. This survey treats a broad range of mixed precision algorithms in numerical linear algebra, both direct and iterative, for problems including matrix multiplication, matrix factorization, linear systems, least squares, eigenvalue decomposition and singular value decomposition. We identify key algorithmic ideas, such as iterative refinement, adapting the precision to the data, and exploiting mixed precision block fused multiply–add operations. We also describe the possible performance benefits and explain what is known about the numerical stability of the algorithms. This survey should be useful to a wide community of researchers and practitioners who wish to develop or benefit from mixed precision numerical linear algebra algorithms.

Type
Research Article
Creative Commons
CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

References

Abdelfattah, A., Anzt, H., Boman, E. G., Carson, E., Cojean, T., Dongarra, J., Fox, A., Gates, M., Higham, N. J., Li, X. S., Loe, J., Luszczek, P., Pranesh, S., Rajamanickam, S., Ribizel, T., Smith, B. F., Swirydowicz, K., Thomas, S., Tomov, S., Tsai, Y. M. and Yang, U. M. (2021a), A survey of numerical linear algebra methods utilizing mixed-precision arithmetic, Int. J. High Perform. Comput. Appl. 35, 344369.CrossRefGoogle Scholar
Abdelfattah, A., Costa, T., Dongarra, J., Gates, M., Haidar, A., Hammarling, S., Higham, N. J., Kurzak, J., Luszczek, P., Tomov, S. and Zounon, M. (2021b), A set of Batched Basic Linear Algebra Subprograms and LAPACK routines, ACM Trans. Math. Software 47, 21.CrossRefGoogle Scholar
Abdelfattah, A., Tomov, S. and Dongarra, J. (2019a), Fast batched matrix multiplication for small sizes using half-precision arithmetic on GPUs, in 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, pp. 111122.CrossRefGoogle Scholar
Abdelfattah, A., Tomov, S. and Dongarra, J. (2019b), Towards half-precision computation for complex matrices: A case study for mixed-precision solvers on GPUs, in 2019 IEEE/ACM 10th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), IEEE, pp. 1724.CrossRefGoogle Scholar
Abdelfattah, A., Tomov, S. and Dongarra, J. (2020), Investigating the benefit of FP16-enabled mixed-precision solvers for symmetric positive definite matrices using GPUs, in Computational Science – ICCS 2020 (Krzhizhanovskaya, V. V. et al., eds), Vol. 12138 of Lecture Notes in Computer Science, Springer, pp. 237250.CrossRefGoogle Scholar
Abdulah, S., Cao, Q., Pei, Y., Bosilca, G., Dongarra, J., Genton, M. G., Keyes, D. E., Ltaief, H. and Sun, Y. (2022), Accelerating geostatistical modeling and prediction with mixed-precision computations: A high-productivity approach with PaRSEC, IEEE Trans. Parallel Distrib. Syst. 33, 964976.CrossRefGoogle Scholar
Abdulah, S., Ltaief, H., Sun, Y., Genton, M. G. and Keyes, D. E. (2019), Geostatistical modeling and prediction using mixed precision tile Cholesky factorization, in 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC), IEEE, pp. 152162.CrossRefGoogle Scholar
Agullo, E., Cappello, F., Di, S., Giraud, L., Liang, X. and Schenkels, N. (2020), Exploring variable accuracy storage through lossy compression techniques in numerical linear algebra: A first application to flexible GMRES. Research report RR-9342, Inria Bordeaux Sud-Ouest. Available at hal-02572910v2.Google Scholar
Ahmad, K., Sundar, H. and Hall, M. (2019), Data-driven mixed precision sparse matrix vector multiplication for GPUs, ACM Trans. Archit. Code Optim. 16, 51.Google Scholar
Al-Mohy, A. H., Higham, N. J. and Liu, X. (2022), Arbitrary precision algorithms for computing the matrix cosine and its Fréchet derivative, SIAM J. Matrix Anal. Appl. 43, 233256.CrossRefGoogle Scholar
Aliaga, J. I., Anzt, H., Grützmacher, T., Quintana-Ortí, E. S. and Tomás, A. E. (2020), Compressed basis GMRES on high performance GPUs. Available at arXiv:2009.12101.Google Scholar
Alvermann, A., Basermann, A., Bungartz, H.-J., Carbogno, C., Ernst, D., Fehske, H., Futamura, Y., Galgon, M., Hager, G., Huber, S., Huckle, T., Ida, A., Imakura, A., Kawai, M., Köcher, S., Kreutzer, M., Kus, P., Lang, B., Lederer, H., Manin, V., Marek, A., Nakajima, K., Nemec, L., Reuter, K., Rippl, M., Röhrig-Zöllner, M., Sakurai, T., Scheffler, M., Scheurer, C., Shahzad, F., Brambila, D. Simoes, Thies, J. and Wellein, G. (2019), Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects, Japan J. Indust. Appl. Math. 36, 699717.CrossRefGoogle Scholar
Amestoy, P., Boiteau, O., Buttari, A., Gerest, M., Jézéquel, F., L’Excellent, J.Y. and Mary, T. (2021a), Mixed precision low rank approximations and their application to block low rank LU factorization. Available at hal-03251738.Google Scholar
Amestoy, P., Buttari, A., Higham, N. J., L’Excellent, J.-Y., Mary, T. and Vieublé, B. (2021b), Five-precision GMRES-based iterative refinement. MIMS EPrint 2021.5, Manchester Institute for Mathematical Sciences, The University of Manchester, UK.Google Scholar
Amestoy, P., Buttari, A., Higham, N. J., L’Excellent, J.-Y., Mary, T. and Vieublé, B. (2022), Combining sparse approximate factorizations with mixed precision iterative refinement. MIMS EPrint 2022.2, Manchester Institute for Mathematical Sciences, The University of Manchester, UK.Google Scholar
Amestoy, P., Buttari, A., L’Excellent, J.-Y. and Mary, T. (2019), Performance and scalability of the block low-rank multifrontal factorization on multicore architectures, ACM Trans. Math. Software 45, 2.CrossRefGoogle Scholar
Amestoy, P., Duff, I. S., L’Excellent, J.-Y. and Koster, J. (2001), A fully asynchronous multifrontal solver using distributed dynamic scheduling, SIAM J. Matrix Anal. Appl. 23, 1541.CrossRefGoogle Scholar
Anderson, E. (1991), Robust triangular solves for use in condition estimation. Technical report CS-91-142, Department of Computer Science, The University of Tennessee, Knoxville, TN, USA. LAPACK Working Note 36.Google Scholar
ANSI (1966), American National Standard FORTRAN, American National Standards Institute, New York.Google Scholar
Anzt, H., Dongarra, J. and Quintana-Ortí, E. S. (2015), Adaptive precision solvers for sparse linear systems, in Proceedings of the 3rd International Workshop on Energy Efficient Supercomputing (E2SC ’15), ACM Press, article 2.Google Scholar
Anzt, H., Dongarra, J., Flegar, G., Higham, N. J. and Quintana-Ortí, E. S. (2019a), Adaptive precision in block-Jacobi preconditioning for iterative sparse linear system solvers, Concurrency Comput. Pract. Exper. 31, e4460.CrossRefGoogle Scholar
Anzt, H., Flegar, G., Grützmacher, T. and Quintana-Ortí, E. S. (2019b), Toward a modular precision ecosystem for high-performance computing, Int. J. High Perform. Comput. Appl. 33, 10691078.CrossRefGoogle Scholar
Appleyard, J. and Yokim, S. (2017), Programming tensor cores in CUDA 9. Available at https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/.Google Scholar
Arioli, M. and Duff, I. S. (2009), Using FGMRES to obtain backward stability in mixed precision, Electron. Trans. Numer. Anal. 33, 3144.Google Scholar
Arioli, M., Duff, I. S., Gratton, S. and Pralet, S. (2007), A note on GMRES preconditioned by a perturbed $\mathrm{LD}{L}^T$ decomposition with static pivoting, SIAM J. Sci. Comput. 29, 20242044.CrossRefGoogle Scholar
ARM (2018), ARM Architecture Reference Manual. ARMv8, for ARMv8-A Architecture Profile, ARM Limited, Cambridge, UK. Version dated 31 October 2018. Original release dated 30 April 2013.Google Scholar
ARM (2019), Arm A64 Instruction Set Architecture Armv8, for Armv8-A Architecture Profile, ARM Limited, Cambridge, UK.Google Scholar
ARM (2020), Arm Architecture Reference Manual. Armv8, for Armv8-A Architecture Profile, ARM Limited, Cambridge, UK. ARM DDI 0487F.b (ID040120).Google Scholar
Baboulin, M., Buttari, A., Dongarra, J., Kurzak, J., Langou, J., Langou, J., Luszczek, P. and Tomov, S. (2009), Accelerating scientific computations with mixed precision algorithms, Comput. Phys. Comm. 180, 25262533.CrossRefGoogle Scholar
Bailey, D. H. (2021), MPFUN2020: A new thread-safe arbitrary precision package (full documentation). Available at https://www.davidhbailey.com/dhbpapers/mpfun2020.pdf.Google Scholar
Bailey, D. H., Hida, Y., Li, X. S. and Thompson, B. (2002), ARPREC: An arbitrary precision computation package. Technical report LBNL-53651, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.Google Scholar
Bauer, P., Dueben, P. D., Hoefler, T., Quintino, T., Schulthess, T. C. and Wedi, N. P. (2021), The digital revolution of earth-system science, Nature Comput. Sci. 1, 104113.CrossRefGoogle Scholar
Bezanson, J., Edelman, A., Karpinski, S. and Shah, V. B. (2017), Julia: A fresh approach to numerical computing, SIAM Rev. 59, 6598.CrossRefGoogle Scholar
Björck, Å. (1967), Iterative refinement of linear least squares solutions I, BIT 7, 257278.CrossRefGoogle Scholar
Björck, Å. (1996), Numerical Methods for Least Squares Problems, SIAM.CrossRefGoogle Scholar
Blanchard, P., Higham, N. J. and Mary, T. (2020a), A class of fast and accurate summation algorithms,, SIAM J. Sci. Comput. 42, A1541A1557.CrossRefGoogle Scholar
Blanchard, P., Higham, N. J., Lopez, F., Mary, T. and Pranesh, S. (2020b), Mixed precision block fused multiply-add: Error analysis and application to GPU tensor cores, SIAM J. Sci. Comput. 42, C124C141.CrossRefGoogle Scholar
Bouras, A. and Frayssé, V. (2005), Inexact matrix-vector products in Krylov methods for solving linear systems: A relaxation strategy, SIAM J. Matrix Anal. Appl. 26, 660678.CrossRefGoogle Scholar
Bouras, A., Frayssé, V. and Giraud, L. (2000), A relaxation strategy for inner–outer linear solvers in domain decomposition methods. Technical report TR/PA/00/17, CERFACS, Toulouse, France.Google Scholar
Brun, E., Defour, D., De Oliveira Castro, P., Iştoan, M., Mancusi, D., Petit, E. and Vaquet, A. (2021), A study of the effects and benefits of custom-precision mathematical libraries for HPC codes, IEEE Trans. Emerg. Topics Comput. 9, 14671478.CrossRefGoogle Scholar
Buttari, A., Dongarra, J., Kurzak, J., Luszczek, P. and Tomov, S. (2008), Using mixed precision for sparse matrix computations to enhance the performance while achieving 64-bit accuracy, ACM Trans. Math. Software 34, 17.CrossRefGoogle Scholar
Buttari, A., Dongarra, J., Langou, J., Langou, J., Luszczek, P. and Kurzak, J. (2007), Mixed precision iterative refinement techniques for the solution of dense linear systems, Int. J. High Perform. Comput. Appl. 21, 457466.CrossRefGoogle Scholar
Carson, E. and Higham, N. J. (2017), A new analysis of iterative refinement and its application to accurate solution of ill-conditioned sparse linear systems, SIAM J. Sci. Comput. 39, A2834A2856.CrossRefGoogle Scholar
Carson, E. and Higham, N. J. (2018), Accelerating the solution of linear systems by iterative refinement in three precisions, SIAM J. Sci. Comput. 40, A817A847.CrossRefGoogle Scholar
Carson, E., Gergelits, T. and Yamazaki, I. (2022a), Mixed precision $s$ -step Lanczos and conjugate gradient algorithms, Numer. Linear Algebra Appl. 29, e2425.CrossRefGoogle Scholar
Carson, E., Higham, N. J. and Pranesh, S. (2020), Three-precision GMRES-based iterative refinement for least squares problems, SIAM J. Sci. Comput. 42, A4063A4083.CrossRefGoogle Scholar
Carson, E., Lund, K., Rozložník, M. and Thomas, S. (2022b), Block Gram–Schmidt algorithms and their stability properties, Linear Algebra Appl. 638, 150195.CrossRefGoogle Scholar
Charara, A., Gates, M., Kurzak, J., YarKhan, A. and Dongarra, J. (2020), SLATE developers’ guide. SLATE Working Note 11, Innovative Computing Laboratory, The University of Tennessee, Knoxville, TN, US.Google Scholar
Choquette, J., Gandhi, W., Giroux, O., Stam, N. and Krashinsky, R. (2021), NVIDIA A100 tensor core GPU: Performance and innovation, IEEE Micro 41, 2935.CrossRefGoogle Scholar
Clark, M. A., Babich, R., Barros, K., Brower, R. C. and Rebbi, C. (2010), Solving lattice QCD systems of equations using mixed precision solvers on GPUs, Comput. Phys. Comm. 181, 15171528.CrossRefGoogle Scholar
Connolly, M. P. and Higham, N. J. (2022), Probabilistic rounding error analysis of Householder QR factorization. MIMS EPrint 2022.5, Manchester Institute for Mathematical Sciences, The University of Manchester, UK.Google Scholar
Connolly, M. P., Higham, N. J. and Mary, T. (2021), Stochastic rounding and its probabilistic backward error analysis, SIAM J. Sci. Comput. 43, A566A585.CrossRefGoogle Scholar
Courbariaux, M., Bengio, Y. and David, J.-P. (2015), Training deep neural networks with low precision multiplications. Available at arXiv:1412.7024v5.Google Scholar
Croarken, M. G. (1985), The centralization of scientific computation in Britain 1925–1955. PhD thesis, University of Warwick, Coventry, UK.Google Scholar
Croci, M., Fasi, M., Higham, N. J., Mary, T. and Mikaitis, M. (2022), Stochastic rounding: Implementation, error analysis, and applications, Roy. Soc. Open Sci. 9, 125.Google Scholar
Davies, P. I., Higham, N. J. and Tisseur, F. (2001), Analysis of the Cholesky method with iterative refinement for solving the symmetric definite generalized eigenproblem, SIAM J. Matrix Anal. Appl. 23, 472493.CrossRefGoogle Scholar
Davis, T. A. and Hu, Y. (2011), The University of Florida Sparse Matrix Collection, ACM Trans. Math. Software 38, 1.Google Scholar
Dawson, A. and Düben, P. D. (2017), rpe v5: An emulator for reduced floating-point precision in large numerical simulations, Geosci. Model Dev. 10, 22212230.CrossRefGoogle Scholar
Dawson, A., Düben, P. D., MacLeod, D. A. and Palmer, T. N. (2018), Reliable low precision simulations in land surface models, Climate Dynam. 51, 26572666.CrossRefGoogle Scholar
Dean, J. (2020), The deep learning revolution and its implications for computer architecture and chip design, in 2020 IEEE International Solid-State Circuits Conference (ISSCC), IEEE, pp. 814.CrossRefGoogle Scholar
Demmel, J. and Hida, Y. (2004), Accurate and efficient floating point summation, SIAM J. Sci. Comput. 25, 12141248.CrossRefGoogle Scholar
Demmel, J. and Li, X. (1994), Faster numerical algorithms via exception handling, IEEE Trans. Comput. 43, 983992.CrossRefGoogle Scholar
Demmel, J., Hida, Y., Riedy, E. J. and Li, X. S. (2009), Extra-precise iterative refinement for overdetermined least squares problems, ACM Trans. Math. Software 35, 28.CrossRefGoogle Scholar
Dennis, J. E. Jr and Schnabel, R. B. (1983), Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice Hall. Reprinted by SIAM, 1996.Google Scholar
Di, S. and Cappello, F. (2016), Fast error-bounded lossy HPC data compression with SZ, in 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, pp. 730739.CrossRefGoogle Scholar
Diffenderfer, J., Osei-Kuffuor, D. and Menon, H. (2021), QDOT: Quantized dot product kernel for approximate high-performance computing. Available at arXiv:2105.00115.Google Scholar
Dongarra, J. J. (1980), Improving the accuracy of computed matrix eigenvalues. Preprint ANL-80-84, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA.Google Scholar
Dongarra, J. J. (1982), Algorithm 589 SICEDR: A FORTRAN subroutine for improving the accuracy of computed matrix eigenvalues, ACM Trans. Math. Software 8, 371375.CrossRefGoogle Scholar
Dongarra, J. J. (1983), Improving the accuracy of computed singular values, SIAM J. Sci. Statist. Comput. 4, 712719.CrossRefGoogle Scholar
Dongarra, J. J. (2020), Report on the Fujitsu Fugaku system. Technical report ICL-UT-20-06, Innovative Computing Laboratory, The University of Tennessee, Knoxville, TN, USA.Google Scholar
Dongarra, J. J., Bunch, J. R., Moler, C. B. and Stewart, G. W. (1979), LINPACK Users’ Guide, SIAM.CrossRefGoogle Scholar
Dongarra, J. J., Moler, C. B. and Wilkinson, J. H. (1983), Improving the accuracy of computed eigenvalues and eigenvectors, SIAM J. Numer. Anal. 20, 2345.CrossRefGoogle Scholar
Doucet, N., Ltaief, H., Gratadour, D. and Keyes, D. (2019), Mixed-precision tomographic reconstructor computations on hardware accelerators, in 2019 IEEE/ACM 9th Workshop on Irregular Applications: Architectures and Algorithms (IA3), IEEE, pp. 3138.Google Scholar
Düben, P. D., Subramanian, A., Dawson, A. and Palmer, T. N. (2017), A study of reduced numerical precision to make superparameterization more competitive using a hardware emulator in the OpenIFS model, J. Adv. Model. Earth Syst. 9, 566584.CrossRefGoogle Scholar
Duff, I. S. and Pralet, S. (2007), Towards stable mixed pivoting strategies for the sequential and parallel solution of sparse symmetric indefinite systems, SIAM J. Matrix Anal. Appl. 29, 10071024.CrossRefGoogle Scholar
Duff, I. S., Erisman, A. M. and Reid, J. K. (2017), Direct Methods for Sparse Matrices, second edition, Oxford University Press.CrossRefGoogle Scholar
Emans, M. and van der Meer, A. (2012), Mixed-precision AMG as linear equation solver for definite systems, Procedia Comput. Sci. 1, 175183.CrossRefGoogle Scholar
Fasi, M. and Higham, N. J. (2018), Multiprecision algorithms for computing the matrix logarithm, SIAM J. Matrix Anal. Appl. 39, 472491.Google Scholar
Fasi, M. and Higham, N. J. (2019), An arbitrary precision scaling and squaring algorithm for the matrix exponential, SIAM J. Matrix Anal. Appl. 40, 12331256.CrossRefGoogle Scholar
Fasi, M. and Higham, N. J. (2021), Matrices with tunable infinity-norm condition number and no need for pivoting in LU factorization, SIAM J. Matrix Anal. Appl. 42, 417435.CrossRefGoogle Scholar
Fasi, M. and Mikaitis, M. (2020), CPFloat: A C library for emulating low-precision arithmetic. MIMS EPrint 2020.22, Manchester Institute for Mathematical Sciences, The University of Manchester, UK.Google Scholar
Fasi, M., Higham, N. J., Lopez, F., Mary, T. and Mikaitis, M. (2022), Matrix multiplication in multiword arithmetic: Error analysis and application to GPU tensor cores. MIMS EPrint 2022.3, Manchester Institute for Mathematical Sciences, The University of Manchester, UK.Google Scholar
Fasi, M., Higham, N. J., Mikaitis, M. and Pranesh, S. (2021), Numerical behavior of NVIDIA tensor cores, PeerJ Comput. Sci. 7, e330.CrossRefGoogle ScholarPubMed
Flegar, G., Anzt, H., Cojean, T. and Quintana-Ortí, E. S. (2021), Adaptive precision block-Jacobi for high performance preconditioning in the Ginkgo linear algebra software, ACM Trans. Math. Software 47, 128.CrossRefGoogle Scholar
Fousse, L., Hanrot, G., Lefèvre, V., Pélissier, P. and Zimmermann, P. (2007), MPFR: A multiple-precision binary floating-point library with correct rounding, ACM Trans. Math. Software 33, 13.CrossRefGoogle Scholar
Fox, L., Huskey, H. D. and Wilkinson, J. H. (1948), The solution of algebraic linear simultaneous equations by punched card methods. Report, Mathematics Division, Department of Scientific and Industrial Research, National Physical Laboratory, Teddington, UK.CrossRefGoogle Scholar
Fukaya, T., Kannan, R., Nakatsukasa, Y., Yamamoto, Y. and Yanagisawa, Y. (2020), Shifted Cholesky QR for computing the QR factorization of ill-conditioned matrices, SIAM J. Sci. Comput. 42, A477A503.CrossRefGoogle Scholar
Gao, J., Zheng, F., Qi, F., Ding, Y., Li, H., Lu, H., He, W., Wei, H., Jin, L., Liu, X., Gong, D., Wang, F., Zheng, Y., Sun, H., Zhou, Z., Liu, Y. and You, H. (2021), Sunway supercomputer architecture towards exascale computing: Analysis and practice, Sci. China Inform. Sci. 64, 141101.CrossRefGoogle Scholar
Gill, P. E., Saunders, M. A. and Shinnerl, J. R. (1996), On the stability of Cholesky factorization for symmetric quasidefinite systems, SIAM J. Matrix Anal. Appl. 17, 3546.CrossRefGoogle Scholar
Giraud, L., Gratton, S. and Langou, J. (2007), Convergence in backward error of relaxed GMRES, SIAM J. Sci. Comput. 29, 710728.CrossRefGoogle Scholar
Giraud, L., Haidar, A. and Watson, L. T. (2008), Mixed-precision preconditioners in parallel domain decomposition solvers, in Domain Decomposition Methods in Science and Engineering XVII (Langer, U. et al., eds), Vol. 60 of Lecture Notes in Computational Science and Engineering, Springer, pp. 357364.CrossRefGoogle Scholar
Giraud, L., Langou, J., Rozložník, M. and van den Eshof, J. (2005), Rounding error analysis of the classical Gram–Schmidt orthogonalization process, Numer. Math. 101, 87100.CrossRefGoogle Scholar
Göbel, F., Grützmacher, T., Ribizel, T. and Anzt, H. (2021), Mixed precision incomplete and factorized sparse approximate inverse preconditioning on GPUs, in Euro-Par 2021: Parallel Processing, Vol. 12820 of Lecture Notes in Computer Science, Springer, pp. 550564.CrossRefGoogle Scholar
Goddeke, D. and Strzodka, R. (2011), Cyclic reduction tridiagonal solvers on GPUs applied to mixed-precision multigrid, IEEE Trans. Parallel Distrib. Syst. 22, 2232.CrossRefGoogle Scholar
Göddeke, D., Strzodka, R. and Turek, S. (2007), Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations, Int. J. Parallel Emergent Distrib. Syst. 22, 221256.CrossRefGoogle Scholar
Govaerts, W. and Pryce, J. D. (1990), Block elimination with one iterative refinement solves bordered linear systems accurately, BIT 30, 490507.CrossRefGoogle Scholar
Graillat, S., Jézéquel, F., Mary, T. and Molina, R. (2022), Adaptive precision matrix–vector product. Available at hal-03561193.Google Scholar
Gratton, S., Simon, E., Titley-Peloquin, D. and Toint, P. (2019), Exploiting variable precision in GMRES. Available at arXiv:1907.10550.Google Scholar
Greenbaum, A. (1997), Estimating the attainable accuracy of recursively computed residual methods, SIAM J. Matrix Anal. Appl. 18, 535551.CrossRefGoogle Scholar
Groote, J. F., Morel, R., Schmaltz, J. and Watkins, A. (2021), Logic Gates, Circuits, Processors, Compilers and Computers, Springer.CrossRefGoogle Scholar
Grützmacher, T., Anzt, H. and Quintana-Ortí, E. S. (2021), Using Ginkgo’s memory accessor for improving the accuracy of memory-bound low precision BLAS, Software Pract. Exper. Available at doi:10.1002/spe.3041.CrossRefGoogle Scholar
Gulliksson, M. (1994), Iterative refinement for constrained and weighted linear least squares, BIT 34, 239253.CrossRefGoogle Scholar
Gupta, S., Agrawal, A., Gopalakrishnan, K. and Narayanan, P. (2015), Deep learning with limited numerical precision, in Proceedings of the 32nd International Conference on Machine Learning (Bach, F. and Blei, D., eds), Vol. 37 of Proceedings of Machine Learning Research, PMLR, pp. 17371746.Google Scholar
Haidar, A., Abdelfattah, A., Zounon, M., Wu, P., Pranesh, S., Tomov, S. and Dongarra, J. (2018a), The design of fast and energy-efficient linear solvers: On the potential of half-precision arithmetic and iterative refinement techniques, in Computational Science – ICCS 2018 (Shi, Y. et al., eds), Vol. 10860 of Lecture Notes in Computer Science, Springer, pp. 586600.CrossRefGoogle Scholar
Haidar, A., Bayraktar, H., Tomov, S., Dongarra, J. and Higham, N. J. (2020), Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems, Proc . Roy. Soc. London A 476 (2243), 20200110.Google Scholar
Haidar, A., Tomov, S., Dongarra, J. and Higham, N. J. (2018b), Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers, in Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18), IEEE, article 47.Google Scholar
Haidar, A., Wu, P., Tomov, S. and Dongarra, J. (2017), Investigating half precision arithmetic to accelerate dense linear system solvers, in Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA ’17), ACM Press, article 10.CrossRefGoogle Scholar
Harvey, R. and Verseghy, D. L. (2015), The reliability of single precision computations in the simulation of deep soil heat diffusion in a land surface model, Climate Dynam. 16, 38653882.Google Scholar
Henry, G., Tang, P. T. P. and Heinecke, A. (2019), Leveraging the bfloat16 artificial intelligence datatype for higher-precision computations, in 2019 IEEE 26th Symposium on Computer Arithmetic (ARITH), IEEE, pp. 6976.Google Scholar
Higham, D. J., Higham, N. J. and Pranesh, S. (2021), Random matrices generating large growth in LU factorization with pivoting, SIAM J. Matrix Anal. Appl. 42, 185201.CrossRefGoogle Scholar
Higham, N. J. (1986), Computing the polar decomposition: With applications, SIAM J. Sci. Statist. Comput. 7, 11601174.CrossRefGoogle Scholar
Higham, N. J. (1988), Fast solution of Vandermonde-like systems involving orthogonal polynomials, IMA J. Numer. Anal. 8, 473486.CrossRefGoogle Scholar
Higham, N. J. (1991), Iterative refinement enhances the stability of $\mathrm{QR}$ factorization methods for solving linear equations, BIT 31, 447468.CrossRefGoogle Scholar
Higham, N. J. (1997), Iterative refinement for linear systems and LAPACK, IMA J. Numer. Anal. 17, 495509.CrossRefGoogle Scholar
Higham, N. J. (2002), Accuracy and Stability of Numerical Algorithms, second edition, SIAM.CrossRefGoogle Scholar
Higham, N. J. (2008), Functions of Matrices: Theory and Computation, SIAM.CrossRefGoogle Scholar
Higham, N. J. (2021), Numerical stability of algorithms at extreme scale and low precisions. MIMS EPrint 2021.14, Manchester Institute for Mathematical Sciences, The University of Manchester, UK. To appear in Proc. Int. Cong. Math. Google Scholar
Higham, N. J. and Liu, X. (2021), A multiprecision derivative-free Schur–Parlett algorithm for computing matrix functions, SIAM J. Matrix Anal. Appl. 42, 14011422.CrossRefGoogle Scholar
Higham, N. J. and Mary, T. (2019a), A new approach to probabilistic rounding error analysis, SIAM J. Sci. Comput. 41, A2815A2835.CrossRefGoogle Scholar
Higham, N. J. and Mary, T. (2019b), A new preconditioner that exploits low-rank approximations to factorization error, SIAM J. Sci. Comput. 41, A59A82.CrossRefGoogle Scholar
Higham, N. J. and Mary, T. (2020), Sharper probabilistic backward error analysis for basic linear algebra kernels with random data, SIAM J. Sci. Comput. 42, A3427A3446.CrossRefGoogle Scholar
Higham, N. J. and Mary, T. (2021), Solving block low-rank linear systems by LU factorization is numerically stable, IMA J. Numer. Anal. Available at doi:10.1093/imanum/drab020.Google Scholar
Higham, N. J. and Pranesh, S. (2019), Simulating low precision floating-point arithmetic, SIAM J. Sci. Comput. 41, C585C602.CrossRefGoogle Scholar
Higham, N. J. and Pranesh, S. (2021), Exploiting lower precision arithmetic in solving symmetric positive definite linear systems and least squares problems, SIAM J. Sci. Comput. 43, A258A277.CrossRefGoogle Scholar
Higham, N. J., Pranesh, S. and Zounon, M. (2019), Squeezing a matrix into half precision, with an application to solving linear systems, SIAM J. Sci. Comput. 41, A2536A2551.CrossRefGoogle Scholar
Ho, N.-M., De Silva, H. and Wong, W.-F. (2021), GRAM: A framework for dynamically mixing precisions in GPU applications, ACM Trans. Archit. Code Optim. 18, 124.CrossRefGoogle Scholar
Hogg, J. D. and Scott, J. A. (2010), A fast and robust mixed-precision solver for the solution of sparse symmetric linear systems, ACM Trans. Math. Software 37, 17.CrossRefGoogle Scholar
Idomura, Y., Ina, T., Ali, Y. and Imamura, T. (2020), Acceleration of fusion plasma turbulence simulations using the mixed-precision communication-avoiding Krylov method, in International Conference for High Performance Computing, Networking, Storage and Analysis (SC20), IEEE, pp. 113.Google Scholar
IEEE (1985), IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985, Institute of Electrical and Electronics Engineers.Google Scholar
IEEE (2008), IEEE Standard for Floating-Point Arithmetic, IEEE Std 754-2008 (Revision of IEEE 754-1985), Institute of Electrical and Electronics Engineers.Google Scholar
Intel Corporation (2018), BFLOAT16: Hardware Numerics Definition. White paper. Document number 338302-001US.Google Scholar
Ipsen, I. C. F. and Zhou, H. (2020), Probabilistic error analysis for inner products, SIAM J. Matrix Anal. Appl. 41, 1726–1741.
Iwashita, T., Suzuki, K. and Fukaya, T. (2020), An integer arithmetic-based sparse linear solver using a GMRES method and iterative refinement, in 2020 IEEE/ACM 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), IEEE, pp. 1–8.
Jankowski, M. and Woźniakowski, H. (1977), Iterative refinement implies numerical stability, BIT 17, 303–311.
Johansson, F. et al. (2013), Mpmath: A Python library for arbitrary-precision floating-point arithmetic. Available at http://mpmath.org.
Joldes, M., Muller, J.-M. and Popescu, V. (2017), Tight and rigorous error bounds for basic building blocks of double-word arithmetic, ACM Trans. Math. Software 44, 15res.
Jouppi, N. P., Yoon, D. H., Ashcraft, M., Gottscho, M., Jablin, T. B., Kurian, G., Laudon, J., Li, S., Ma, P., Ma, X., Norrie, T., Patil, N., Prasad, S., Young, C., Zhou, Z. and Patterson, D. (2021), Ten lessons from three generations shaped Google’s TPUv4i: Industrial product, in 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), IEEE, pp. 1–14.
Jouppi, N. P., Yoon, D. H., Kurian, G., Li, S., Patil, N., Laudon, J., Young, C. and Patterson, D. (2020), A domain-specific supercomputer for training deep neural networks, Comm. Assoc. Comput. Mach. 63, 67–78.
Kahan, W. (1981), Why do we need a floating-point arithmetic standard? Technical report, University of California, Berkeley, CA, USA.
Kelley, C. T. (1995), Iterative Methods for Linear and Nonlinear Equations, SIAM.
Kelley, C. T. (2022), Newton’s method in mixed precision, SIAM Rev. 64, 191–211.
Kiełbasiński, A. (1981), Iterative refinement for linear systems in variable-precision arithmetic, BIT 21, 97–103.
Knight, P. A., Ruiz, D. and Uçar, B. (2014), A symmetry preserving algorithm for matrix scaling, SIAM J. Matrix Anal. Appl. 35, 931–955.
Kronbichler, M. and Ljungkvist, K. (2019), Multigrid for matrix-free high-order finite element computations on graphics processors, ACM Trans. Parallel Comput. 6, 2.
Kudo, S., Nitadori, K., Ina, T. and Imamura, T. (2020a), Implementation and numerical techniques for one EFlop/s HPL-AI benchmark on Fugaku, in Proceedings of the 11th IEEE/ACM Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), IEEE, pp. 69–76.
Kudo, S., Nitadori, K., Ina, T. and Imamura, T. (2020b), Prompt report on exa-scale HPL-AI benchmark, in 2020 IEEE International Conference on Cluster Computing (CLUSTER), IEEE, pp. 418–419.
Kurzak, J. and Dongarra, J. (2007), Implementation of mixed precision in solving systems of linear equations on the Cell processor, Concurrency Comput. Pract. Exper. 19, 1371–1385.
Langou, J., Langou, J., Luszczek, P., Kurzak, J., Buttari, A. and Dongarra, J. (2006), Exploiting the performance of 32 bit floating point arithmetic in obtaining 64 bit accuracy (revisiting iterative refinement for linear systems), in Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC ’06), IEEE.
Lefèvre, V. and Zimmermann, P. (2017), Optimized binary64 and binary128 arithmetic with GNU MPFR, in 2017 IEEE 24th Symposium on Computer Arithmetic (ARITH), IEEE, pp. 18–26.
Li, X. S. and Demmel, J. W. (1998), Making sparse Gaussian elimination scalable by static pivoting, in Proceedings of the 1998 ACM/IEEE Conference on Supercomputing, IEEE, pp. 1–17.
Li, X. S. and Demmel, J. W. (2003), SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems, ACM Trans. Math. Software 29, 110–140.
Li, X. S., Demmel, J. W., Bailey, D. H., Henry, G., Hida, Y., Iskandar, J., Kahan, W., Kang, S. Y., Kapur, A., Martin, M. C., Thompson, B. J., Tung, T. and Yoo, D. J. (2002), Design, implementation and testing of extended and mixed precision BLAS, ACM Trans. Math. Software 28, 152–205.
Lichtenau, C., Carlough, S. and Mueller, S. M. (2016), Quad precision floating point on the IBM z13, in 2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH), IEEE, pp. 87–94.
Lindquist, N., Luszczek, P. and Dongarra, J. (2020), Improving the performance of the GMRES method using mixed-precision techniques, in Communications in Computer and Information Science (Nichols, J. et al., eds), Springer, pp. 51–66.
Lindquist, N., Luszczek, P. and Dongarra, J. (2022), Accelerating restarted GMRES with mixed precision arithmetic, IEEE Trans. Parallel Distrib. Syst. 33, 1027–1037.
Loe, J. A., Glusa, C. A., Yamazaki, I., Boman, E. G. and Rajamanickam, S. (2021a), Experimental evaluation of multiprecision strategies for GMRES on GPUs, in 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, pp. 469–478.
Loe, J. A., Glusa, C. A., Yamazaki, I., Boman, E. G. and Rajamanickam, S. (2021b), A study of mixed precision strategies for GMRES on GPUs. Available at arXiv:2109.01232.
Lopez, F. and Mary, T. (2020), Mixed precision LU factorization on GPU tensor cores: Reducing data movement and memory footprint. MIMS EPrint 2020.20, Manchester Institute for Mathematical Sciences, The University of Manchester, UK.
Luszczek, P., Yamazaki, I. and Dongarra, J. (2019), Increasing accuracy of iterative refinement in limited floating-point arithmetic on half-precision accelerators, in 2019 IEEE High Performance Extreme Computing Conference (HPEC), IEEE, pp. 1–6.
Markidis, S., Wei Der Chien, S., Laure, E., Peng, I. B. and Vetter, J. S. (2018), NVIDIA tensor core programmability, performance & precision, in 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, pp. 522–531.
Maynard, C. M. and Walters, D. N. (2019), Mixed-precision arithmetic in the ENDGame dynamical core of the unified model, a numerical weather prediction and climate model code, Comput. Phys. Comm. 244, 69–75.
McCormick, S. F., Benzaken, J. and Tamstorf, R. (2021), Algebraic error analysis for mixed-precision multigrid solvers, SIAM J. Sci. Comput. 43, S392–S419.
Meurer, A., Smith, C. P., Paprocki, M., Čertík, O., Kirpichev, S. B., Rocklin, M., Kumar, A., Ivanov, S., Moore, J. K., Singh, S., Rathnayake, T., Vig, S., Granger, B. E., Muller, R. P., Bonazzi, F., Gupta, H., Vats, S., Johansson, F., Pedregosa, F., Curry, M. J., Terrel, A. R., Roučka, Š., Saboo, A., Fernando, I., Kulal, S., Cimrman, R. and Scopatz, A. (2017), SymPy: Symbolic computing in Python, PeerJ Comput. Sci. 3, e103.
Moler, C. B. (1967), Iterative refinement in floating point, J. Assoc. Comput. Mach. 14, 316–321.
Moler, C. B. (2017), Half precision 16-bit floating point arithmetic. Available at https://blogs.mathworks.com/cleve/2017/05/08/half-precision-16-bit-floating-point-arithmetic/.
Moler, C. B. (2019), Variable format half precision floating point arithmetic. Available at https://blogs.mathworks.com/cleve/2019/01/16/variable-format-half-precision-floating-point-arithmetic/.
Mukunoki, D., Ozaki, K., Ogita, T. and Imamura, T. (2020), DGEMM using tensor cores, and its accurate and reproducible versions, in High Performance Computing (Sadayappan, P. et al., eds), Springer, pp. 230–248.
Muller, J.-M., Brunie, N., de Dinechin, F., Jeannerod, C.-P., Joldes, M., Lefèvre, V., Melquiond, G., Revol, N. and Torres, S. (2018), Handbook of Floating-Point Arithmetic, second edition, Birkhäuser.
Nakata, M. (2021), MPLAPACK version 1.0.0 user manual. Available at arXiv:2109.13406.
Norrie, T., Patil, N., Yoon, D. H., Kurian, G., Li, S., Laudon, J., Young, C., Jouppi, N. and Patterson, D. (2021), The design process for Google’s training chips: TPUv2 and TPUv3, IEEE Micro 41, 56–63.
NVIDIA Corporation (2020), NVIDIA A100 Tensor Core GPU Architecture, v1.0.
Ogita, T. and Aishima, K. (2018), Iterative refinement for symmetric eigenvalue decomposition, Japan J. Indust. Appl. Math. 35, 1007–1035.
Ogita, T. and Aishima, K. (2019), Iterative refinement for symmetric eigenvalue decomposition II: Clustered eigenvalues, Japan J. Indust. Appl. Math. 36, 435–459.
Ogita, T. and Aishima, K. (2020), Iterative refinement for singular value decomposition based on matrix multiplication, J. Comput. Appl. Math. 369, 112512.
Oktay, E. and Carson, E. (2022), Multistage mixed precision iterative refinement, Numer. Linear Algebra Appl. Available at doi:10.1002/nla.2434.
Oo, K. L. and Vogel, A. (2020), Accelerating geometric multigrid preconditioning with half-precision arithmetic on GPUs. Available at arXiv:2007.07539.
Ooi, R., Iwashita, T., Fukaya, T., Ida, A. and Yokota, R. (2020), Effect of mixed precision computing on H-matrix vector multiplication in BEM analysis, in Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, ACM Press.
O’uchi, S.-I., Fuketa, H., Ikegami, T., Nogami, W., Matsukawa, T., Kudoh, T. and Takano, R. (2018), Image-classifier deep convolutional neural network training by 9-bit dedicated hardware to realize validation accuracy and energy efficiency superior to the half precision floating point format, in 2018 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, pp. 1–5.
Paige, C. C., Rozložník, M. and Strakoš, Z. (2006), Modified Gram–Schmidt (MGS), least squares, and backward stability of MGS-GMRES, SIAM J. Matrix Anal. Appl. 28, 264–284.
Palmer, T. N. (2014), More reliable forecasts with less precise computations: A fast-track route to cloud-resolved weather and climate simulators?, Phil. Trans. R. Soc. A 372 (2018), 1–14.
Palmer, T. N. (2020), The physics of numerical analysis: A climate modelling case study, Phil. Trans. R. Soc. A 378 (2166), 1–6.
Petschow, M., Quintana-Ortí, E. and Bientinesi, P. (2014), Improved accuracy and parallelism for MRRR-based eigensolvers: A mixed precision approach, SIAM J. Sci. Comput. 36, C240–C263.
Pisha, L. and Ligowski, L. (2021), Accelerating non-power-of-2 size Fourier transforms with GPU tensor cores, in 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, pp. 507–516.
Ralha, R. (2018), Mixed precision bisection, Math. Comput. Sci. 12, 173–181.
Rubio-González, C., Nguyen, C., Nguyen, H. D., Demmel, J., Kahan, W., Sen, K., Bailey, D. H., Iancu, C. and Hough, D. (2013), Precimonious: Tuning assistant for floating-point precision, in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC ’13), ACM Press, article 27.
San Juan, P., Rodríguez-Sánchez, R., Igual, F. D., Alonso-Jordá, P. and Quintana-Ortí, E. S. (2021), Low precision matrix multiplication for efficient deep learning in NVIDIA carmel processors, J. Supercomput. 77, 11257–11269.
Sato, M., Ishikawa, Y., Tomita, H., Kodama, Y., Odajima, T., Tsuji, M., Yashiro, H., Aoki, M., Shida, N., Miyoshi, I., Hirai, K., Furuya, A., Asato, A., Morita, K. and Shimizu, T. (2020), Co-design for A64FX manycore processor and ‘Fugaku’, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’20), IEEE.
Scheinberg, K. (2016), Evolution of randomness in optimization methods for supervised machine learning, SIAG/OPT Views and News 24, 1–8.
Schenk, O., Gärtner, K., Fichtner, W. and Stricker, A. (2001), PARDISO: A high-performance serial and parallel sparse linear solver in semiconductor device simulation, Future Gener. Comput. Syst. 18, 69–78.
Simoncini, V. and Szyld, D. B. (2003), Theory of inexact Krylov subspace methods and applications to scientific computing, SIAM J. Sci. Comput. 25, 454–477.
Skeel, R. D. (1980), Iterative refinement implies numerical stability for Gaussian elimination, Math. Comp. 35, 817–832.
Smoktunowicz, A. and Sokolnicka, J. (1984), Binary cascades iterative refinement in doubled-mantissa arithmetics, BIT 24, 123–127.
Sorna, A., Cheng, X., D’Azevedo, E., Won, K. and Tomov, S. (2018), Optimizing the fast Fourier transform using mixed precision on tensor core hardware, in 2018 IEEE 25th International Conference on High Performance Computing Workshops (HiPCW), IEEE, pp. 3–7.
Stathopoulos, A. and Wu, K. (2002), A block orthogonalization procedure with constant synchronization requirements, SIAM J. Sci. Comput. 23, 2165–2182.
Stewart, G. W. (1973), Introduction to Matrix Computations, Academic Press.
Stor, N. J., Slapničar, I. and Barlow, J. L. (2015), Accurate eigenvalue decomposition of real symmetric arrowhead matrices and applications, Linear Algebra Appl. 464, 62–89.
Sumiyoshi, Y., Fujii, A., Nukada, A. and Tanaka, T. (2014), Mixed-precision AMG method for many core accelerators, in Proceedings of the 21st European MPI Users’ Group Meeting (EuroMPI/ASIA ’14), ACM Press, pp. 127–132.
Sun, J., Peterson, G. D. and Storaasli, O. O. (2008), High-performance mixed-precision linear solver for FPGAs, IEEE Trans. Comput. 57, 1614–1623.
Tagliavini, G., Mach, S., Rossi, D., Marongiu, A. and Benini, L. (2018), A transprecision floating-point platform for ultra-low power computing, in 2018 Design, Automation and Test in Europe Conference and Exhibition (DATE), pp. 1051–1056.
Tamstorf, R., Benzaken, J. and McCormick, S. F. (2021), Discretization-error-accurate mixed-precision multigrid solvers, SIAM J. Sci. Comput. 43, S420–S447.
Tintó Prims, O., Acosta, M. C., Moore, A. M., Castrillo, M., Serradell, K., Cortés, A. and Doblas-Reyes, F. J. (2019), How to use mixed precision in ocean models: Exploring a potential reduction of numerical precision in NEMO 4.0 and ROMS 3.6, Geosci. Model Dev. 12, 3135–3148.
Tisseur, F. (2001), Newton’s method in floating point arithmetic and iterative refinement of generalized eigenvalue problems, SIAM J. Matrix Anal. Appl. 22, 1038–1057.
Trader, T. (2016), IBM advances against x86 with Power9. Available at https://www.hpcwire.com/2016/08/30/ibm-unveils-power9-details/.
Tsai, Y. M., Luszczek, P. and Dongarra, J. (2021), Mixed-precision algorithm for finding selected eigenvalues and eigenvectors of symmetric and Hermitian matrices. Technical report ICL-UT-21-05, Innovative Computing Laboratory, The University of Tennessee, Knoxville, TN, USA.
Tsuchida, E. and Choe, Y.-K. (2012), Iterative diagonalization of symmetric matrices in mixed precision and its application to electronic structure calculations, Comput. Phys. Comm. 183, 980–985.
Turner, K. and Walker, H. F. (1992), Efficient high accuracy solutions with GMRES($m$), SIAM J. Sci. Statist. Comput. 12, 815–825.
van den Eshof, J. and Sleijpen, G. L. G. (2004), Inexact Krylov subspace methods for linear systems, SIAM J. Matrix Anal. Appl. 26, 125–153.
Váňa, F., Düben, P., Lang, S., Palmer, T., Leutbecher, M., Salmond, D. and Carver, G. (2017), Single precision in weather forecasting models: An evaluation with the IFS, Mon. Weather Rev. 145, 495–502.
von Neumann, J. and Goldstine, H. H. (1947), Numerical inverting of matrices of high order, Bull. Amer. Math. Soc. 53, 1021–1099.
Wang, E., Davis, J. J., Zhao, R., Ng, H.-C., Niu, X., Luk, W., Cheung, P. Y. K. and Constantinides, G. A. (2019), Deep neural network approximation for custom hardware, ACM Comput. Surv. 52, 1–39.
Wang, N., Choi, J., Brand, D., Chen, C.-Y. and Gopalakrishnan, K. (2018), Training deep neural networks with 8-bit floating point numbers, in Advances in Neural Information Processing Systems 31 (Bengio, S. et al., eds), Curran Associates, pp. 7686–7695.
Wang, S. and Kanwar, P. (2019), BFloat16: The secret to high performance on cloud TPUs. Available at https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus.
Wilkinson, J. H. (1948), Progress report on the Automatic Computing Engine. Report MA/17/1024, Mathematics Division, Department of Scientific and Industrial Research, National Physical Laboratory, Teddington, UK.
Wilkinson, J. H. (1961), Error analysis of direct methods of matrix inversion, J. Assoc. Comput. Mach. 8, 281–330.
Wilkinson, J. H. (1963), Rounding Errors in Algebraic Processes, Notes on Applied Science No. 32, Her Majesty’s Stationery Office. Also published by Prentice Hall, USA. Reprinted by Dover, 1994.
Wilkinson, J. H. (1977), The use of the single-precision residual in the solution of linear systems. Unpublished manuscript.
Yamazaki, I., Tomov, S. and Dongarra, J. (2015a), Mixed-precision Cholesky QR factorization and its case studies on multicore CPU with multiple GPUs, SIAM J. Sci. Comput. 37, C307–C330.
Yamazaki, I., Tomov, S. and Dongarra, J. (2016), Stability and performance of various singular value QR implementations on multicore CPU with a GPU, ACM Trans. Math. Software 43, 10.
Yamazaki, I., Tomov, S., Dong, T. and Dongarra, J. (2015b), Mixed-precision orthogonalization scheme and adaptive step size for improving the stability and performance of CA-GMRES on GPUs, in High Performance Computing for Computational Science (VECPAR 2014) (Daydé, M. et al., eds), Vol. 8969 of Lecture Notes in Computer Science, Springer, pp. 17–30.
Yamazaki, I., Tomov, S., Kurzak, J., Dongarra, J. and Barlow, J. (2015c), Mixed-precision block Gram–Schmidt orthogonalization, in Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA ’15), ACM Press.
Yang, K., Chen, Y.-F., Roumpos, G., Colby, C. and Anderson, J. (2019), High performance Monte Carlo simulation of Ising model on TPU clusters, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC ’19), ACM Press.
Yang, L. M., Fox, A. and Sanders, G. (2021), Rounding error analysis of mixed precision block Householder QR algorithms, SIAM J. Sci. Comput. 43, A1723–A1753.
Zhang, S., Baharlouei, E. and Wu, P. (2020), High accuracy matrix computations on neural engines: A study of QR factorization and its applications, in Proceedings of the 29th International Symposium on High-Performance Parallel and Distributed Computing, ACM Press.
Zhu, Y.-K. and Hayes, W. B. (2009), Correct rounding and a hybrid approach to exact floating-point summation, SIAM J. Sci. Comput. 31, 2981–3001.
Zlatev, Z. (1982), Use of iterative refinement in the solution of sparse linear systems, SIAM J. Numer. Anal. 19, 381–399.
Zounon, M., Higham, N. J., Lucas, C. and Tisseur, F. (2022), Performance impact of precision reduction in sparse linear systems solvers, PeerJ Comput. Sci. 8, e778.