Lazy tree splitting

Published online by Cambridge University Press:  15 August 2012

LARS BERGSTROM
Affiliation:
Department of Computer Science, University of Chicago, Chicago, IL 60637, USA (e-mail: larsberg@cs.uchicago.edu)
MATTHEW FLUET
Affiliation:
Department of Computer Science, Rochester Institute of Technology, Rochester, NY 14623-5603, USA (e-mail: mtf@cs.rit.edu)
MIKE RAINEY
Affiliation:
Max Planck Institute for Software Systems, D-67663 Kaiserslautern, Rheinland-Pfalz, Germany (e-mail: mrainey@mpi-sws.org)
JOHN REPPY
Affiliation:
Department of Computer Science, University of Chicago, Chicago, IL 60637, USA (e-mail: jhr@cs.uchicago.edu)
ADAM SHAW
Affiliation:
Department of Computer Science, University of Chicago, Chicago, IL 60637, USA (e-mail: ams@cs.uchicago.edu)

Abstract

Nested data-parallelism (NDP) is a language mechanism that supports programming irregular parallel applications in a declarative style. In this paper, we describe the implementation of NDP in Parallel ML (PML), which is a part of the Manticore system. One of the main challenges of implementing NDP is managing the parallel decomposition of work. If we have too many small chunks of work, the overhead will be too high, but if we do not have enough chunks of work, processors will be idle. Recently, the technique of Lazy Binary Splitting was proposed to address this problem for nested parallel loops over flat arrays. We have adapted this technique to our implementation of NDP, which uses binary trees to represent parallel arrays. This new technique, which we call Lazy Tree Splitting (LTS), has the key advantage of performance robustness, i.e., it does not require tuning to get the best performance for each program. We describe the implementation of the standard NDP operations using LTS and present experimental data that demonstrate the scalability of LTS across a range of benchmarks.
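
The idea of splitting work lazily can be illustrated with a small sequential simulation. This is only a sketch of the general lazy-splitting idea over a flat array, not the authors' LTS implementation, which is written in PML and operates on trees. The function name `lazy_binary_split_sum`, the predicate `other_workers_idle`, and the `pending` deque are hypothetical names introduced here; in a real runtime the predicate would be a scheduler check and other processors would steal from `pending`.

```python
from collections import deque


def lazy_binary_split_sum(xs, other_workers_idle):
    """Sum xs, splitting the remaining range only when others look idle.

    other_workers_idle is a hypothetical zero-argument predicate standing in
    for a scheduler check. Returns (total, number_of_splits).
    """
    pending = deque([(0, len(xs))])  # ranges of work not yet started
    total = 0
    splits = 0
    while pending:
        lo, hi = pending.pop()
        while lo < hi:
            # Split lazily: only if there is something left to share
            # and another worker could use it.
            if hi - lo > 1 and other_workers_idle():
                mid = lo + (hi - lo) // 2
                pending.append((mid, hi))  # upper half becomes stealable work
                hi = mid                   # keep the lower half locally
                splits += 1
            total += xs[lo]
            lo += 1
    return total, splits
```

When the predicate never reports idle workers, the loop runs as a single sequential chunk with no splitting overhead; when it does, work is divided on demand. This is the sense in which a fixed chunk size does not have to be tuned per program.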

Information

Type
Articles
Copyright
Copyright © Cambridge University Press 2012