
A Case for Stale Synchronous Distributed Model for Declarative Recursive Computation

ARIYAM DAS and CARLO ZANIOLO

Abstract

A large class of traditional graph and data mining algorithms can be concisely expressed in Datalog and other logic-based languages once aggregates are allowed in recursion. In fact, for most BigData algorithms, the difficult semantic issues raised by the use of non-monotonic aggregates in recursion are solved by Pre-Mappability (${\cal P}$reM), a property that assures that for a program with aggregates in recursion there is an equivalent aggregate-stratified program. In this paper we show that, by bringing together the formal abstract semantics of stratified programs with the efficient operational one of unstratified programs, ${\cal P}$reM can also facilitate and improve their parallel execution. We prove that ${\cal P}$reM-optimized lock-free and decomposable parallel semi-naive evaluations produce the same results as the single executor programs. Therefore, ${\cal P}$reM can be assimilated into the data-parallel computation plans of different distributed systems, irrespective of whether these follow bulk synchronous parallel (BSP) or asynchronous computing models. In addition, we show that non-linear recursive queries can be evaluated using a hybrid stale synchronous parallel (SSP) model on distributed environments. After providing a formal correctness proof for the recursive query evaluation with ${\cal P}$reM under this relaxed synchronization model, we present experimental evidence of its benefits.
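As a rough illustration of the equivalence that ${\cal P}$reM guarantees, the following single-executor Python sketch computes shortest paths in two ways: an aggregate-stratified version that derives every path fact before applying min, and a semi-naive version that applies min inside the recursion. It is not the authors' implementation. The graph `ARCS` and the function names are hypothetical, and the sketch assumes an acyclic graph with non-negative weights so that the stratified version terminates.

```python
# Hypothetical example graph: (source, target, cost). It is a DAG, so the
# stratified version below derives only finitely many path facts.
ARCS = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 1)]


def stratified_shortest(arcs):
    """Derive all path facts first, then apply min in a later stratum."""
    paths = set(arcs)
    delta = set(arcs)
    while delta:
        # Semi-naive step: join only the newly derived facts with the arcs.
        new = {
            (x, y, d1 + d2)
            for (x, z, d1) in delta
            for (z2, y, d2) in arcs
            if z == z2
        }
        delta = new - paths
        paths |= delta
    best = {}
    for x, y, d in paths:
        best[(x, y)] = min(d, best.get((x, y), d))
    return best


def min_in_recursion_shortest(arcs):
    """Semi-naive evaluation with min applied inside the recursion."""
    inf = float("inf")
    best, delta = {}, {}
    for x, y, d in arcs:
        if d < best.get((x, y), inf):
            best[(x, y)] = d
            delta[(x, y)] = d
    while delta:
        new_delta = {}
        for (x, z), d1 in delta.items():
            for z2, y, d2 in arcs:
                # Keep a derived fact only if it improves the current minimum.
                if z == z2 and d1 + d2 < best.get((x, y), inf):
                    best[(x, y)] = d1 + d2
                    new_delta[(x, y)] = d1 + d2
        delta = new_delta
    return best
```

Both functions return the same dictionary of minimal costs on this input, but the second one discards non-minimal path facts as soon as they are derived instead of materializing them all.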


