Lazy tree splitting

  • Lars Bergstrom, Matthew Fluet, Mike Rainey, John Reppy, and Adam Shaw

Abstract

Nested data-parallelism (NDP) is a language mechanism that supports programming irregular parallel applications in a declarative style. In this paper, we describe the implementation of NDP in Parallel ML (PML), which is a part of the Manticore system. One of the main challenges of implementing NDP is managing the parallel decomposition of work. If we have too many small chunks of work, the overhead will be too high, but if we do not have enough chunks of work, processors will be idle. Recently, the technique of Lazy Binary Splitting was proposed to address this problem for nested parallel loops over flat arrays. We have adapted this technique to our implementation of NDP, which uses binary trees to represent parallel arrays. This new technique, which we call Lazy Tree Splitting (LTS), has the key advantage of performance robustness, i.e., it does not require tuning to get the best performance for each program. We describe the implementation of the standard NDP operations using LTS and present experimental data that demonstrate the scalability of LTS across a range of benchmarks.
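The core idea in the abstract can be illustrated with a small sketch: parallel arrays are represented as binary trees ("ropes" of leaf chunks), and a worker traverses its tree sequentially, splitting off half of its remaining work only when other workers are idle. The sketch below is in Python rather than PML, and all names in it (`Leaf`, `Cat`, `lts_map`, `hungry`, `spawn`) are hypothetical illustrations of the technique, not the Manticore implementation.

```python
# Illustrative sketch of lazy tree splitting (LTS), not the PML code:
# a parallel array is a binary tree whose leaves hold small chunks.

class Leaf:
    def __init__(self, items):
        self.items = items          # small contiguous chunk of elements

class Cat:
    def __init__(self, left, right):
        self.left, self.right = left, right

def to_list(rope):
    """Flatten a rope back into an ordinary list."""
    if isinstance(rope, Leaf):
        return list(rope.items)
    return to_list(rope.left) + to_list(rope.right)

def lts_map(f, rope, hungry, spawn):
    """Map f over a rope. While no other worker is hungry, traverse
    sequentially to avoid task-creation overhead; when hungry() reports
    demand, split the remaining work and hand the right subtree to
    another worker via spawn (here modeled as a deferred call)."""
    if isinstance(rope, Leaf):
        return Leaf([f(x) for x in rope.items])
    if hungry():
        # Lazy split: expose the right half as a separate task.
        right_task = spawn(lambda: lts_map(f, rope.right, hungry, spawn))
        left = lts_map(f, rope.left, hungry, spawn)
        return Cat(left, right_task())
    # No demand: keep working sequentially on both subtrees.
    return Cat(lts_map(f, rope.left, hungry, spawn),
               lts_map(f, rope.right, hungry, spawn))
```

In a sequential simulation, `spawn` can simply return the thunk unevaluated; because splitting happens only on demand, a run with no hungry workers creates no tasks at all, which is the source of LTS's performance robustness described above.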


