We present an algebra for data-intensive scalable computing based on monoid homomorphisms that consists of a small set of operations that capture most features supported by current domain-specific languages for data-centric distributed computing. This algebra is being used as the formal basis of MRQL, which is a query processing and optimization system for large-scale distributed data analysis. The MRQL semantics is given in terms of monoid comprehensions, which support group-by and order-by syntax and can work on heterogeneous collections without requiring any extension to the monoid algebra. We present the syntax and semantics of monoid comprehensions and provide rules to translate them to the monoid algebra. We give evidence of the effectiveness of our algebra by presenting some important optimization rules, such as converting nested queries to joins.