A programming model and foundation for lineage-based distributed computation

  • PHILIPP HALLER, HEATHER MILLER and NORMEN MÜLLER
Abstract

The most successful systems for “big data” processing have all adopted functional APIs. We present a new programming model, which we call function passing, designed to provide a more principled substrate, or middleware, upon which to build data-centric distributed systems like Spark. A key idea is to build up a persistent functional data structure representing transformations on distributed immutable data by passing well-typed serializable functions over the wire and applying them to this distributed data. Thus, the function passing model can be thought of as a distributed persistent functional data structure whose nodes store the transformations performed on distributed data rather than the distributed data itself. One advantage of this model is that failure recovery is simplified by design: data can be recovered by replaying function applications atop immutable data loaded from stable storage. Deferred evaluation is also central to our model; by incorporating deferred evaluation into our design only at the point of initiating network communication, the function passing model remains easy to reason about while staying efficient in time and memory. Moreover, we provide a complete formalization of the programming model in order to study the foundations of lineage-based distributed computation. In particular, we develop a theory of safe, mobile lineages based on a subject reduction theorem for a typed core language. Furthermore, we formalize a progress theorem that guarantees the finite materialization of remote, lineage-based data. Thus, the formal model may serve as a basis for further developments of the theory of data-centric distributed programming, including aspects such as fault tolerance. We provide an open-source implementation of our model in and for the Scala programming language, along with a case study of several example frameworks and end-user programs written atop this model.
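
To make the idea of a lineage concrete, the following is a minimal single-process sketch in Scala, not the API of the implementation described above. The names `Lineage`, `Source`, `Mapped`, `materialize` and `demo` are hypothetical and chosen only for illustration, and the sketch leaves out serialization and network transfer entirely. It shows only that the data structure stores functions rather than data, that nothing is evaluated until a result is requested, and that a lost result can be recomputed by replaying the stored functions over reloaded source data.

```scala
object LineageSketch {
  // Hypothetical type: a lineage node records how to obtain a value,
  // not the value itself.
  sealed trait Lineage[A] {
    // Replays the recorded transformations, starting from the source.
    def materialize(): A
    // Extends the lineage with one more function; nothing is evaluated here.
    def map[B](f: A => B): Lineage[B] = Mapped(this, f)
  }

  // Leaf node: immutable data that can be (re)loaded, standing in for
  // stable storage in this sketch.
  final case class Source[A](load: () => A) extends Lineage[A] {
    def materialize(): A = load()
  }

  // Inner node: a function to apply to the parent's materialized result.
  final case class Mapped[A, B](parent: Lineage[A], f: A => B) extends Lineage[B] {
    def materialize(): B = f(parent.materialize())
  }

  // Returns (loads before materializing, first result, replayed result).
  def demo(): (Int, Int, Int) = {
    var loads = 0
    val source = Source(() => { loads += 1; List(1, 2, 3, 4) })
    // Building the lineage only records the two functions.
    val lineage = source.map(_.filter(_ % 2 == 0)).map(_.sum)
    val loadsBefore = loads
    val first = lineage.materialize()
    // Simulated recovery: the result is rebuilt by replaying the lineage.
    val replayed = lineage.materialize()
    (loadsBefore, first, replayed)
  }
}
```

Calling `LineageSketch.demo()` builds a two-step lineage over a small list and materializes it twice; the second call stands in for recovery after a node has lost its computed data.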

Journal of Functional Programming
  • ISSN: 0956-7968
  • EISSN: 1469-7653