
DASHMM: Dynamic Adaptive System for Hierarchical Multipole Methods

Published online by Cambridge University Press:  05 October 2016

J. DeBuhr*
Affiliation:
Center for Research in Extreme Scale Technologies, School of Informatics and Computing, Indiana University, Bloomington, IN, 47404, USA
B. Zhang
Affiliation:
Center for Research in Extreme Scale Technologies, School of Informatics and Computing, Indiana University, Bloomington, IN, 47404, USA
A. Tsueda
Affiliation:
College of Arts and Sciences, Loyola University Chicago, Chicago, IL, 60660, USA
V. Tilstra-Smith
Affiliation:
Department of Physics and Mathematics, Central College, Pella, IA, 50219, USA
T. Sterling
Affiliation:
Center for Research in Extreme Scale Technologies, School of Informatics and Computing, Indiana University, Bloomington, IN, 47404, USA
*Corresponding author. Email address: jdebuhr@indiana.edu (J. DeBuhr)

Abstract

We present DASHMM, a general library implementing multipole methods, including both Barnes-Hut and the Fast Multipole Method. DASHMM relies on dynamic adaptive runtime techniques provided by the HPX-5 system to parallelize the resulting multipole moment computation. The result is a library that is easy to use, extensible, scalable, efficient, and portable. We present both the abstractions defined by DASHMM and the specific features of HPX-5 that allow the library to execute scalably and efficiently.

Type
Computational Software
Copyright
Copyright © Global-Science Press 2016 

