
Algebraic data integration*


In this paper, we develop an algebraic approach to data integration by combining techniques from functional programming, category theory, and database theory. In our formalism, database schemas and instances are algebraic (multi-sorted equational) theories of a certain form. Schemas denote categories, and instances denote their initial (term) algebras. The instances on a schema S form a category, S-Inst, and a morphism of schemas F : S → T induces three adjoint data migration functors: Σ_F : S-Inst → T-Inst, defined by substitution along F, which has a right adjoint Δ_F : T-Inst → S-Inst, which in turn has a right adjoint Π_F : S-Inst → T-Inst. We present a query language based on for/where/return syntax in which each query denotes a sequence of data migration functors, give a pushout-based design pattern for performing data integration using our formalism, and describe the implementation of our formalism in a tool we call AQL (Algebraic Query Language).
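
To make the data migration functors concrete, the following is a minimal Python sketch of Δ_F in the simplest setting. It assumes the set-valued view of instances (each entity is a set of row ids, each foreign key a total function) and ignores attributes, equations, and the type side. In that view Δ_F is precomposition with F: the rows of an S-entity s are the rows of F(s), and an S-foreign key is interpreted by following its image path in T. The encoding, the function name `delta`, and the example schemas are all hypothetical illustrations; this is not AQL syntax and not the paper's algebraic construction.

```python
# Hypothetical toy encoding (not AQL): a schema is a set of entities plus
# foreign keys; an instance assigns a set of row ids to each entity and a
# total function (a dict) to each foreign key.

def delta(F_ent, F_edge, S_edges, J_rows, J_fk):
    """Pull a T-instance J back along a schema mapping F : S -> T.

    F_ent   : dict, S entity -> T entity
    F_edge  : dict, S foreign key -> list of T foreign keys (a path in T)
    S_edges : dict, S foreign key -> (source entity, target entity) in S
    J_rows  : dict, T entity -> set of row ids
    J_fk    : dict, T foreign key -> dict mapping row id -> row id
    """
    # Rows of the S-entity s are exactly the rows of the T-entity F(s).
    rows = {s: set(J_rows[t]) for s, t in F_ent.items()}
    fks = {}
    for e, (src, _dst) in S_edges.items():
        table = {}
        for r in rows[src]:
            x = r
            for t_edge in F_edge[e]:  # follow the image path of e in T
                x = J_fk[t_edge][x]
            table[r] = x
        fks[e] = table
    return rows, fks


# Assumed example: S has Emp --employer--> Company; T has
# Person --memberOf--> Team --partOf--> Org.
F_ent = {"Emp": "Person", "Company": "Org"}
F_edge = {"employer": ["memberOf", "partOf"]}
S_edges = {"employer": ("Emp", "Company")}

J_rows = {"Person": {"p1", "p2"}, "Team": {"t1", "t2"}, "Org": {"o1"}}
J_fk = {
    "memberOf": {"p1": "t1", "p2": "t2"},
    "partOf": {"t1": "o1", "t2": "o1"},
}

rows, fks = delta(F_ent, F_edge, S_edges, J_rows, J_fk)
```

Here `rows` has `Emp` equal to `{"p1", "p2"}` and `Company` equal to `{"o1"}`, and `fks["employer"]` sends both `p1` and `p2` to `o1`. The sketch covers only Δ_F; Σ_F and Π_F, its left and right adjoints, generally require computing colimits and limits and are not attempted here.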


The authors would like to thank David Spivak and Peter Gates. Patrick Schultz was supported by AFOSR grant FA9550-14-1-0031, ONR grant N000141310260, and NASA grant NNH13ZEA001N. Ryan Wisnesky was supported by NIST SBIR grant 70NANB15H290.

Journal of Functional Programming
  • ISSN: 0956-7968
  • EISSN: 1469-7653
  • URL: /core/journals/journal-of-functional-programming