Push versus pull-based loop fusion in query engines

AMIR SHAIKHHA, MOHAMMAD DASHTI and CHRISTOPH KOCH
Abstract

Database query engines use pull-based or push-based approaches to avoid the materialization of data across query operators. In this paper, we study these two types of query engines in depth and present the limitations and advantages of each. Similarly, the programming languages community has developed loop fusion techniques to remove intermediate collections in the context of collection programming. We draw parallels between database (DB) and programming language (PL) research by demonstrating the connection between pipelined query engines and loop fusion techniques. Based on this connection, we propose a new type of pull-based engine, inspired by a loop fusion technique, which combines the benefits of both approaches. Then, we experimentally evaluate the various engines, in the context of query compilation, for the first time in a fair environment, eliminating the biasing impact of ancillary optimizations that have traditionally been used with only one of the approaches. We show that for realistic analytical workloads, neither form of pipelined query engine has a considerable advantage, contrary to what recent research suggests. Also, by using micro-benchmarks that demonstrate certain edge cases on which one approach or the other performs better, we show that our proposed engine dominates the existing engines by combining the benefits of both.
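To make the distinction in the abstract concrete, the following is a minimal sketch of one small pipeline (scan, select, map, sum) written three ways: pull-based, push-based, and as the single fused loop that both styles aim to reach without materializing intermediate collections. It is an illustration only, not the engines evaluated in the paper; all function names, the predicate, and the example query are assumptions introduced here.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical example query: sum of 2*x over all even x.
def is_even(x: int) -> bool:
    return x % 2 == 0

def double(x: int) -> int:
    return x * 2

# Pull-based (iterator-style): the consumer requests the next row from its
# child, so control flows from the sink back towards the source.
def pull_scan(rows: Iterable[int]) -> Iterator[int]:
    for row in rows:
        yield row

def pull_select(child: Iterator[int], pred: Callable[[int], bool]) -> Iterator[int]:
    for row in child:
        if pred(row):
            yield row

def pull_map(child: Iterator[int], fn: Callable[[int], int]) -> Iterator[int]:
    for row in child:
        yield fn(row)

def pull_sum(child: Iterator[int]) -> int:
    total = 0
    for row in child:
        total += row
    return total

# Push-based (callback-style): the source drives the loop and hands each row
# to its consumer, so control flows from the source towards the sink.
def push_scan(rows: Iterable[int], consume: Callable[[int], None]) -> None:
    for row in rows:
        consume(row)

def push_select(pred: Callable[[int], bool], consume: Callable[[int], None]) -> Callable[[int], None]:
    def on_row(row: int) -> None:
        if pred(row):
            consume(row)
    return on_row

def push_map(fn: Callable[[int], int], consume: Callable[[int], None]) -> Callable[[int], None]:
    def on_row(row: int) -> None:
        consume(fn(row))
    return on_row

class SumSink:
    """Terminal consumer that accumulates the rows pushed into it."""
    def __init__(self) -> None:
        self.total = 0
    def __call__(self, row: int) -> None:
        self.total += row

def run_pull(rows: Iterable[int]) -> int:
    return pull_sum(pull_map(pull_select(pull_scan(rows), is_even), double))

def run_push(rows: Iterable[int]) -> int:
    sink = SumSink()
    push_scan(rows, push_select(is_even, push_map(double, sink)))
    return sink.total

# Hand-fused loop: what a query compiler or loop fusion pass would ideally
# produce from either pipeline above.
def run_fused(rows: Iterable[int]) -> int:
    total = 0
    for row in rows:
        if row % 2 == 0:
            total += row * 2
    return total
```

All three variants process one row at a time and never build an intermediate list; they differ only in which side of the pipeline drives the iteration.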

Journal of Functional Programming
  • ISSN: 0956-7968
  • EISSN: 1469-7653
  • URL: /core/journals/journal-of-functional-programming