
A language for hierarchical data parallel design-space exploration on GPUs

  • BO JOEL SVENSSON (a1), RYAN R. NEWTON (a1) and MARY SHEERAN (a2)
Abstract

Graphics Processing Units (GPUs) offer potential for very high performance; they are also rapidly evolving. Obsidian is an embedded language (in Haskell) for implementing high-performance kernels to be run on GPUs. We would like to have our cake and eat it too: we want to raise the level of abstraction beyond CUDA code and still give the programmer control over the details relevant to kernel performance. To that end, Obsidian provides array representations that guarantee elimination of intermediate arrays while also using the type system to model the hierarchy of the GPU. Operations are compiled very differently depending on what level of the GPU they target, and as a result, the user is gently constrained to write code that matches the capabilities of the GPU. Thus, we implement not Nested Data Parallelism, but a more limited form that we call Hierarchical Data Parallelism. We walk through case studies that demonstrate how to use Obsidian for rapid design exploration or auto-tuning, resulting in performance that compares well to the hand-tuned kernels used in Accelerate and NVIDIA Thrust.

Journal of Functional Programming
  • ISSN: 0956-7968
  • EISSN: 1469-7653
  • URL: /core/journals/journal-of-functional-programming