 S. V. Adve and K. Gharachorloo. Shared memory consistency models: A tutorial. Computer, 29(12):66–76, December 1996.
 B. Alpern, L. Carter, E. Feig, and T. Selker. The uniform memory hierarchy model of computation. Algorithmica, 12:72–109, 1994. doi:10.1007/BF01185206.
 J. Barnes and P. Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324(6096):446–449, December 1986.
 L. A. Barroso. The price of performance. Queue, 3(7):48–53, September 2005.
 J. Bédorf, E. Gaburov, and S. P. Zwart. A sparse octree gravitational n-body code that runs entirely on the GPU processor. J. Comput. Phys., 231(7):2825–2839, April 2012.
 I. Buck, T. Foley, D. Horn, J. Sugerman, K. Fatahalian, M. Houston, and P. Hanrahan. Brook for GPUs: stream computing on graphics hardware. ACM Trans. Graph., 23(3):777–786, August 2004.
 D.-K. Chen, H.-M. Su, and P.-C. Yew. The impact of synchronization and granularity on parallel systems. SIGARCH Comput. Archit. News, 18(3a):239–248, May 1990.
 D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. von Eicken. LogP: towards a realistic model of parallel computation. SIGPLAN Not., 28(7):1–12, July 1993.
 J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Commun. ACM, 51(1):107–113, January 2008.
 E. W. Dijkstra. Solution of a problem in concurrent programming control. Commun. ACM, 8(9):569–, September 1965.
 N. Dunstan. Semaphores for fair scheduling monitor conditions. SIGOPS Oper. Syst. Rev., 25(3):27–31, May 1991.
 V. Faber, O. M. Lubeck, and A. B. White, Jr. Superlinear speedup of an efficient sequential algorithm is not possible. Parallel Comput., 3(3):259–260, July 1986.
 E. E. Ferrero, J. P. De Francesco, N. Wolovick, and S. A. Cannas. q-state Potts model metastability study using optimized GPU-based Monte Carlo algorithms. Computer Physics Communications, 183(8):1578–1587, 2012.
 S. Gobron, A. Çöltekin, H. Bonafos, and D. Thalmann. GPGPU computation and visualization of three-dimensional cellular automata. The Visual Computer, 27(1):67–81, 2011.
 S. Gobron, F. Devillard, and B. Heit. Retina simulation using cellular automata and GPU programming. Mach. Vision Appl., 18(6):331–342, November 2007.
 J. L. Gustafson. Reevaluating Amdahl’s law. Commun. ACM, 31:532–533, 1988.
 C. A. R. Hoare. Monitors: an operating system structuring concept. Commun. ACM, 17(10):549–557, October 1974.
 D. B. Kidner, P. J. Rallings, and J. A. Ware. Parallel processing for terrain analysis in GIS: Visibility as a case study. Geoinformatica, 1(2):183–207, August 1997.
 D. E. Knuth. Computer programming as an art. Commun. ACM, 17(12):667–673, December 1974.
 Y. Komura and Y. Okabe. GPU-based single-cluster algorithm for the simulation of the Ising model. J. Comput. Phys., 231(4):1209–1215, February 2012.
 Y. Komura and Y. Okabe. Multi-GPU-based Swendsen-Wang multi-cluster algorithm for the simulation of two-dimensional q-state Potts model. Computer Physics Communications, 184(1):40–44, 2013.
 S. Krishnamoorthy, M. Baskaran, U. Bondhugula, J. Ramanujam, A. Rountev, and P. Sadayappan. Effective automatic parallelization of stencil computations. SIGPLAN Not., 42(6):235–244, June 2007.
 V. W. Lee, C. Kim, J. Chhugani, M. Deisher, D. Kim, A. D. Nguyen, N. Satish, M. Smelyanskiy, S. Chennupaty, P. Hammarlund, R. Singhal, and P. Dubey. Debunking the 100x GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU. SIGARCH Comput. Archit. News, 38(3):451–460, June 2010.
 D. Loveman. High performance Fortran. IEEE Parallel & Distributed Technology: Systems & Applications, 1(1):25–42, 1993.
 P. Lu, H. Oki, C. Frey, G. Chamitoff, L. Chiao, E. Fincke, C. Foale, S. Magnus, W. McArthur, D. Tani, P. Whitson, J. Williams, W. Meyer, R. Sicker, B. Au, M. Christiansen, A. Schofield, and D. Weitz. Orders-of-magnitude performance increases in GPU-accelerated correlation of images from the International Space Station. Journal of Real-Time Image Processing, 5:179–193, 2010. doi:10.1007/s11554-009-0133-1.
 M. Macedonia. The GPU enters computing’s mainstream. Computer, 36(10):106–108, 2003.
 W. R. Mark, R. S. Glanville, K. Akeley, and M. J. Kilgard. Cg: a system for programming graphics hardware in a C-like language. ACM Trans. Graph., 22(3):896–907, July 2003.
 N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller. Equation of state calculations by fast computing machines. J. Chem. Phys., 21:1087, 1953.
 S. Pabst, A. Koch, and W. Straßer. Fast and scalable CPU/GPU collision detection for rigid and deformable surfaces. Computer Graphics Forum, 29(5):1605–1612, 2010.
 D. Parkinson. Parallel efficiency can be greater than unity. Parallel Comput., 3(3):261–262, 1986.
 H. A. Peelle. To teach Newton’s square root algorithm. SIGAPL APL Quote Quad, 5(4):48–50, December 1974.
 V. P. Plagianakos, N. K. Nousis, and M. N. Vrahatis. Locating and computing in parallel all the simple roots of special functions using PVM. J. Comput. Appl. Math., 133(1-2):545–554, August 2001.
 T. Preis, P. Virnau, W. Paul, and J. J. Schneider. GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model. J. Comput. Phys., 228(12):4468–4477, July 2009.
 P. E. Ross. Why CPU frequency stalled. IEEE Spectr., 45(4):72–72, April 2008.
 J. Sugerman, K. Fatahalian, S. Boulos, K. Akeley, and P. Hanrahan. GRAMPS: A programming model for graphics pipelines. ACM Trans. Graph., 28(1):4:1–4:11, February 2009.
 R. H. Swendsen and J. S. Wang. Nonuniversal critical dynamics in Monte Carlo simulations. Phys. Rev. Lett., 58:86, 1987.
 J. J. Tapia and R. D’Souza. Parallelizing the cellular Potts model on graphics processing units. Computer Physics Communications, 182(4):857–865, 2011.
 L. G. Valiant. A bridging model for parallel computation. Commun. ACM, 33(8):103–111, August 1990.
 U. Wolff. Collective Monte Carlo updating for spin systems. Phys. Rev. Lett., 62:361–364, 1989.
 F. Y. Wu. The Potts model. Reviews of Modern Physics, 54(1):235–268, January 1982.
 R. Yokota and L. A. Barba. Hierarchical n-body simulations with autotuning for heterogeneous systems. Computing in Science and Engineering, 14(3):30–39, 2012.
 K. Zhou, Q. Hou, R. Wang, and B. Guo. Real-time kd-tree construction on graphics hardware. ACM Trans. Graph., 27(5):126:1–126:11, December 2008.