Published online by Cambridge University Press: 02 November 2017
In this paper, we develop an algebraic approach to data integration by combining techniques from functional programming, category theory, and database theory. In our formalism, database schemas and instances are algebraic (multi-sorted equational) theories of a certain form. Schemas denote categories, and instances denote their initial (term) algebras. The instances on a schema S form a category, S–Inst, and a morphism of schemas F : S → T induces three adjoint data migration functors: Σ_F : S–Inst → T–Inst, defined by substitution along F, which has a right adjoint Δ_F : T–Inst → S–Inst, which in turn has a right adjoint Π_F : S–Inst → T–Inst. We present a query language based on for/where/return syntax in which each query denotes a sequence of data migration functors, a pushout-based design pattern for performing data integration using our formalism, and the implementation of our formalism in a tool we call AQL (Algebraic Query Language).
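As an informal illustration of the functor Δ_F, the following Python sketch works in a deliberately simplified setting that is an assumption of this example and not the formalism of the paper: a schema is a finite graph with no equations, an instance assigns a set of rows to each node and a function to each edge, and a schema morphism F sends each S-node to a T-node and each S-edge to a path of T-edges. Under those assumptions, Δ_F amounts to reading a T-instance through F. The function name `delta`, the dictionary encoding, and the employee data are all hypothetical and do not reflect the AQL tool's syntax or API.

```python
def delta(F_nodes, F_edges, S_edges, instance_T):
    """Pull a T-instance back along F to obtain an S-instance.

    F_nodes:    maps each S-node to a T-node.
    F_edges:    maps each S-edge to a path (list of T-edge names) in T.
    S_edges:    maps each S-edge name to its (source, target) S-nodes.
    instance_T: {"rows": {T-node: set}, "fns": {T-edge: dict}}.
    """
    # Each S-node receives a copy of the rows of its image under F.
    rows = {s: set(instance_T["rows"][t]) for s, t in F_nodes.items()}

    fns = {}
    for edge, (src, _tgt) in S_edges.items():
        table = {}
        for x in rows[src]:
            # Follow the T-path assigned to this S-edge, one edge at a time.
            y = x
            for t_edge in F_edges[edge]:
                y = instance_T["fns"][t_edge][y]
            table[x] = y
        fns[edge] = table
    return {"rows": rows, "fns": fns}


# Hypothetical target schema T: Emp --worksIn--> Dept --manager--> Emp.
instance_T = {
    "rows": {"Emp": {"alice", "bob"}, "Dept": {"math"}},
    "fns": {
        "worksIn": {"alice": "math", "bob": "math"},
        "manager": {"math": "alice"},
    },
}

# Hypothetical source schema S: one node E with one edge boss : E -> E.
S_edges = {"boss": ("E", "E")}
F_nodes = {"E": "Emp"}
F_edges = {"boss": ["worksIn", "manager"]}  # boss is sent to a two-step path

result = delta(F_nodes, F_edges, S_edges, instance_T)
print(result["fns"]["boss"])
```

In this toy case the S-instance has the same employees as the T-instance, and `boss` sends each employee to the manager of their department. The functors Σ_F and Π_F are not sketched here, since they require constructing new rows rather than reusing existing ones.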
The authors would like to thank David Spivak and Peter Gates. Patrick Schultz was supported by AFOSR grant FA9550-14-1-0031, ONR grant N000141310260, and NASA grant NNH13ZEA001N. Ryan Wisnesky was supported by NIST SBIR grant 70NANB15H290.