A Survey on Parallel Computing and its Applications in Data-Parallel Problems Using GPU Architectures

  • Cristóbal A. Navarro (a1) (a2), Nancy Hitschfeld-Kahler (a1) and Luis Mateu (a1)

Parallel computing has become an important subject in the field of computer science and has proven to be critical when researching high performance solutions. The evolution of computer architectures (multi-core and many-core) towards a higher number of cores can only confirm that parallelism is the method of choice for speeding up an algorithm. In the last decade, the graphics processing unit, or GPU, has gained an important place in the field of high performance computing (HPC) because of its low cost and massive parallel processing power. Super-computing has become, for the first time, available to anyone at the price of a desktop computer. In this paper, we survey the concept of parallel computing and especially GPU computing. Achieving efficient parallel algorithms for the GPU is not a trivial task; there are several technical restrictions that must be satisfied in order to achieve the expected performance. Some of these limitations are consequences of the underlying architecture of the GPU and the theoretical models behind it. Our goal is to present a set of theoretical and technical concepts that are often required to understand the GPU and its massive parallelism model. In particular, we show how this new technology can help the field of computational physics, especially when the problem is data-parallel. We present four examples of computational physics problems: n-body, collision detection, Potts model and cellular automata simulations. These examples are representative of the kind of problems that are suitable for GPU computing. By understanding the GPU architecture and its massive parallelism programming model, one can overcome many of the technical limitations found along the way, design better GPU-based algorithms for computational physics problems and achieve speedups that can reach up to two orders of magnitude when compared to sequential implementations.
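
As a rough illustration of what "data-parallel" means for one of the four problem classes named above, the following minimal sketch updates a cellular automaton (Conway's Game of Life) by one generation. It is not taken from the paper: the function name `life_step`, the toroidal boundary and the pure-Python, sequential form are assumptions made for illustration. The point is only that every cell's new state depends on the previous grid alone, so each cell could in principle be handled by its own GPU thread.

```python
def life_step(grid):
    """Return the next Game of Life generation on a toroidal grid.

    Hypothetical helper for illustration; `grid` is a list of rows of 0/1.
    """
    rows, cols = len(grid), len(grid[0])

    def next_state(r, c):
        # Count the eight neighbours, wrapping around the edges.
        alive = sum(
            grid[(r + dr) % rows][(c + dc) % cols]
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        )
        # A cell lives with exactly 3 neighbours, or survives with 2.
        return 1 if alive == 3 or (grid[r][c] == 1 and alive == 2) else 0

    # Each (r, c) reads only the old grid and writes only its own cell,
    # so the iterations are independent and could run concurrently.
    return [[next_state(r, c) for c in range(cols)] for r in range(rows)]
```

Here the nested comprehension plays the role that a grid of threads would play on a GPU; no cell needs to wait for, or synchronise with, any other cell within a generation.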

Communications in Computational Physics
  • ISSN: 1815-2406
  • EISSN: 1991-7120
  • URL: /core/journals/communications-in-computational-physics