An algebra for distributed Big Data analytics

Published online by Cambridge University Press: 11 December 2017

LEONIDAS FEGARAS*
Affiliation:
Department of Computer Science and Engineering, University of Texas at Arlington, Arlington TX 76019, USA (e-mail: fegaras@cse.uta.edu)

Abstract

We present an algebra for data-intensive scalable computing, based on monoid homomorphisms, that consists of a small set of operations capturing most features supported by current domain-specific languages for data-centric distributed computing. This algebra is used as the formal basis of MRQL, a query processing and optimization system for large-scale distributed data analysis. The MRQL semantics is given in terms of monoid comprehensions, which support group-by and order-by syntax and can work on heterogeneous collections without requiring any extension to the monoid algebra. We present the syntax and semantics of monoid comprehensions and provide rules to translate them to the monoid algebra. We give evidence of the effectiveness of our algebra by presenting some important optimization rules, such as converting nested queries to joins.
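The central idea of the abstract is that distributed operations can be expressed as monoid homomorphisms. A small sketch makes this concrete. The Python below is a minimal, hypothetical illustration built on our own assumptions. The names `Monoid`, `hom`, `cmap`, `group_by`, and `query` are ours and are not MRQL's API. The operators are simplified stand-ins for the paper's monoid algebra, not its actual definitions.

The sketch writes a filter-plus-group-by query as a composition of homomorphisms. It then checks the property that makes such operations distributable: folding each partition separately and combining the partial results with the monoid operation gives the same answer as folding the whole collection.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Dict, Generic, Iterable, List, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")
K = TypeVar("K")
V = TypeVar("V")


@dataclass(frozen=True)
class Monoid(Generic[A]):
    """An associative binary operation `op` with identity element `zero`."""
    zero: A
    op: Callable[[A, A], A]


# Illustrative monoids (assumptions for this sketch, not the paper's definitions).
SUM = Monoid(0, lambda x, y: x + y)
LIST = Monoid((), lambda x, y: x + y)  # tuples serve as immutable lists


def hom(m: Monoid, f: Callable[[A], B], xs: Iterable[A]) -> B:
    """Homomorphism from the list monoid to `m`: apply f to each element, fold with m.op."""
    return reduce(m.op, (f(x) for x in xs), m.zero)


def cmap(f: Callable[[A], Iterable[B]], xs: Iterable[A]) -> List[B]:
    """Concat-map (flatMap), expressed as a homomorphism into the list monoid."""
    return list(hom(LIST, lambda x: tuple(f(x)), xs))


def group_by(pairs: Iterable[Tuple[K, V]]) -> List[Tuple[K, List[V]]]:
    """Group values by key, keeping keys in first-seen order."""
    groups: Dict[K, List[V]] = {}
    for k, v in pairs:
        groups.setdefault(k, []).append(v)
    return list(groups.items())


def query(sales: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Hypothetical MRQL-style query (syntax approximate):
       select (k, sum(v)) from (k, v) in sales where v > 0 group by k
    """
    # The where-clause becomes a cmap that emits zero or one element.
    filtered = cmap(lambda kv: [kv] if kv[1] > 0 else [], sales)
    # Each group's bag of values is reduced with the SUM monoid.
    return cmap(lambda g: [(g[0], hom(SUM, lambda v: v, g[1]))], group_by(filtered))


if __name__ == "__main__":
    print(query([("a", 2), ("b", 5), ("a", 3), ("b", -1), ("c", 4)]))

    # Homomorphism property: fold two partitions independently, then combine.
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    whole = hom(SUM, lambda x: x, data)
    parts = SUM.op(hom(SUM, lambda x: x, data[:4]), hom(SUM, lambda x: x, data[4:]))
    print(whole, parts)
```

`SUM.op` is associative with identity `0`. That means partitions can be folded on different workers and merged in any grouping, which is roughly why homomorphisms suit distributed evaluation. The tuple-based list monoid takes quadratic time and is intended only for illustration.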

Information

Type
Research Article
Copyright
Copyright © Cambridge University Press 2017 