Skip to main content Accessibility help
×
Home
Hostname: page-component-78bd46657c-zhxtg Total loading time: 0.189 Render date: 2021-05-10T00:12:49.518Z Has data issue: true Feature Flags: { "shouldUseShareProductTool": true, "shouldUseHypothesis": true, "isUnsiloEnabled": true, "metricsAbstractViews": false, "figures": false, "newCiteModal": false, "newCitedByModal": true }

Algebraic data integration*

Published online by Cambridge University Press:  02 November 2017

PATRICK SCHULTZ
Affiliation:
Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA (e-mail: schultzp@mit.edu)
RYAN WISNESKY
Affiliation:
Categorical Informatics, Inc., Cambridge, MA, USA (e-mail: ryan@catinf.com)
Corresponding

Abstract

In this paper, we develop an algebraic approach to data integration by combining techniques from functional programming, category theory, and database theory. In our formalism, database schemas and instances are algebraic (multi-sorted equational) theories of a certain form. Schemas denote categories, and instances denote their initial (term) algebras. The instances on a schema S form a category, SInst, and a morphism of schemas F : ST induces three adjoint data migration functors: Σ F : SInstTInst, defined by substitution along F, which has a right adjoint Δ F : TInstSInst, which in turn has a right adjoint Π F : SInstTInst. We present a query language based on for/where/return syntax where each query denotes a sequence of data migration functors; a pushout-based design pattern for performing data integration using our formalism; and describe the implementation of our formalism in a tool we call AQL (Algebraic Query Language).

Type
Research Article
Copyright
Copyright © Cambridge University Press 2017 

Access options

Get access to the full version of this content by using one of the access options below.

Footnotes

*

The authors would like to thank David Spivak and Peter Gates. Patrick Schultz was supported by AFOSR grant FA9550-14-1-0031, ONR grant N000141310260, and NASA grant NNH13ZEA001N. Ryan Wisnesky was supported by NIST SBIR grant 70NANB15H290.

References

Abiteboul, S., Hull, R. & Vianu, V. (1995) Foundations of Databases. Addison-Wesley-Longman.Google Scholar
Adámek, J., Rosický, J. & Vitale, E. M. (2011) Algebraic Theories. Cambridge Tracts in Mathematics, vol. 184. Cambridge University Press.Google Scholar
Alagic, S. & Bernstein, P. (2001) A model theory for generic schema management. In Proceedings of the 8th International Workshop on Database Programming Languages, 228–246.Google Scholar
Angles, R. & Gutierrez, C. (2008) Survey of graph database models. ACM Comput. Surv., 40 (1).CrossRefGoogle Scholar
Baader, F. & Nipkow, T. (1998) Term Rewriting and All That. Cambridge University Press.CrossRefGoogle Scholar
Bachmair, L., Dershowitz, N. & Plaisted, D. A. (1989) Completion without failure. Resolution Equ. Algebr. Struct. – Rewriting Tech. 2.Google Scholar
Barr, M. & Wells, C. (1995) Category Theory for Computing Science. Prentice Hall International.Google Scholar
Bertot, Y. & Castéran, P. (2010) Interactive Theorem Proving and Program Development: Coq'art the Calculus of Inductive Constructions. Springer.Google Scholar
Blum, E. K., Ehrig, H. & Parisi-Presicce, F. (1987) Algebraic specification of modules and their basic interconnections. J. Comput. Syst. Sci. 34 (2–3), 293339.CrossRefGoogle Scholar
Bush, M. R., Leeming, M. & Walters, R. F. C. (2003) Computing left Kan extensions. J. Symbol. Comput. 35 (2), 107126.CrossRefGoogle Scholar
Doan, H., Halevy, A. & Ives, Z. (2012) Principles of Data Integration. Morgan Kaufmann.Google Scholar
Enderton, H. B. (2001) A Mathematical Introduction to Logic. Academic Press.Google Scholar
Fagin, R., Kolaitis, P. G., Miller, R. J. & Popa, L. (2005b) Data exchange: Semantics and query answering. Theoretical Computer Science 336 (1), 89124.CrossRefGoogle Scholar
Fagin, R., Kolaitis, P. G., Popa, L. & Tan, W. (2005a) Composing schema mappings: Second-order dependencies to the rescue. ACM Transactions on Database Systems 30 (4), 9941055.CrossRefGoogle Scholar
Fleming, M., Gunther, R. & Rosebrugh, R. (2003) A database of categories. J. Symbol. Comput. 35 (2), 127135.CrossRefGoogle Scholar
Garcia-Molina, H., Ullman, J. D. & Widom, J. (2008) Database Systems: The Complete Book, 2nd ed. Upper Saddle River, NJ, USA: Prentice Hall.Google Scholar
Ghilardi, S., Lutz, C. & Wolter, F. (2006) Did I damage my ontology? A case for conservative extensions in description logics. In Proceedings of the 10th International Conference on Principles of Knowledge Representation and Reasoning, 187–197.Google Scholar
Goguen, J. A. & Burstall, R. M. (1984) Introducing institutions. In Proceedings of the Carnegie Mellon Workshop on Logic of Programs, 221–256.Google Scholar
Goguen, J. (2004) Information Integration in Institutions [online]. Available at: http://cseweb.ucsd.edu/~goguen/pps/ifi04.pdf Google Scholar
Grust, T. (2004) Monad comprehensions: A versatile representation for queries. In The Functional Approach to Data Management: Modeling, Analyzing and Integrating Heterogeneous Data, Gray, P. M. D., Kerschberg, L., King, P. J. H. & Poulovassilis, A. (eds). Berlin, Heidelberg: Springer, 288311.CrossRefGoogle Scholar
Haas, L. M., Hernández, M. A., Ho, H., Popa, L. & Roth, M. (2005) Clio grows up: From research prototype to industrial tool. In Proceedings of the International Conference on Management of Data, 805–810.Google Scholar
Johnson, M., Rosebrugh, R. & Wood, R. J. (2002) Entity-Relationship-Attribute designs and sketches. Theory Appl. Categories 10, 94112.Google Scholar
Kapur, D. & Narendran, P. (1985) The Knuth–Bendix completion procedure and Thue systems. SIAM J. Comput. 14 (4), 10521072.CrossRefGoogle Scholar
Knuth, D. & Bendix, P. (1970) Simple word problems in universal algebra. In Computational Problems in Abstract Algebra, Leech, J. (ed). Springer.Google Scholar
Lüth, C., Roggenbach, M. & Schröder, L. (2005) CCC – The CASL consistency checker. In Recent Trends in Algebraic Development Techniques, 17th International Workshop, Fiadeiro, J. (ed). Lecture Notes in Computer Science, vol. 3423, Springer, 94105.CrossRefGoogle Scholar
Melnik, S., Rahm, E. & Bernstein, P. A. (2003) Rondo: A programming platform for generic model management. In Proceedings of the International Conference on Management of Data, 193–204.CrossRefGoogle Scholar
Mitchell, J. C. (1996) Foundations of Programming Languages. MIT Press.Google Scholar
Mossakowski, T., Krumnack, U. & Maibaum, T. (2014) What is a derived signature morphism? In Recent Trends in Algebraic Development Techniques, Lecture Notes in Computer Science, vol 9463, Springer, 90109.Google Scholar
Mossakowski, T., Rabe, F. & Mihai, C. (2017) Canonical selection of colimits, Algebraic Model Management: A Survery. arXiv:1705.09363.Google Scholar
Nelson, G. & Oppen, D. C. (1980) Fast decision procedures based on congruence closure. J. ACM 27 (2), 356364.CrossRefGoogle Scholar
Patterson, E. (2017) Knowledge representation in bicategories of relations. arXiv:1706.00526.Google Scholar
Roberson, E. L. & Wyss, C. M. (2004) Optimal tuple merge is NP-complete. Bloomington: Indiana University School of Informatics and Computing.Google Scholar
Schultz, P., Spivak, D. I., Vasilakopoulou, C. & Wisnesky, R. (2017) Algebraic databases. Theory Appl. Categ. 32.Google Scholar
Schultz, P., Spivak, D. I. & Wisnesky, R. (2015) QINL: Query-integrated languages. arXiv: 1511.06459.Google Scholar
Schultz, P., Spivak, D. I. & Wisnesky, R. (2016) Algebraic model management: A survey. In Proceedings of the Workshop on Algebraic Development Techniques.Google Scholar
Shipman, D. W. (1981) The functional data model and the data languages daplex. ACM Trans. database Syst. 6 (1), 140173.CrossRefGoogle Scholar
Spivak, D. I. (2012) Functorial data migration. Inform. Comput. 217, 3151.CrossRefGoogle Scholar
Spivak, D. I. (2014) Database queries and constraints via lifting problems. Math. Struct. Comput. Sci. 24 (6).CrossRefGoogle Scholar
Spivak, D. I. & Wisnesky, R. (2015) Relational foundations for functorial data migration. In Proceedings of the 15th Symposium on Database Programming Languages, 21–28.Google Scholar
Zieliński, B., Maślanka, P. & Sobieski, Ś. (2013) Allegories for database modeling. In Model and Data Engineering: 3rd International Conference, 278–289.CrossRefGoogle Scholar
Submit a response

Discussions

No Discussions have been published for this article.

Send article to Kindle

To send this article to your Kindle, first ensure no-reply@cambridge.org is added to your Approved Personal Document E-mail List under your Personal Document Settings on the Manage Your Content and Devices page of your Amazon account. Then enter the ‘name’ part of your Kindle email address below. Find out more about sending to your Kindle. Find out more about sending to your Kindle.

Note you can select to send to either the @free.kindle.com or @kindle.com variations. ‘@free.kindle.com’ emails are free but can only be sent to your device when it is connected to wi-fi. ‘@kindle.com’ emails can be delivered even when you are not connected to wi-fi, but note that service fees apply.

Find out more about the Kindle Personal Document Service.

Algebraic data integration*
Available formats
×

Send article to Dropbox

To send this article to your Dropbox account, please select one or more formats and confirm that you agree to abide by our usage policies. If this is the first time you use this feature, you will be asked to authorise Cambridge Core to connect with your <service> account. Find out more about sending content to Dropbox.

Algebraic data integration*
Available formats
×

Send article to Google Drive

To send this article to your Google Drive account, please select one or more formats and confirm that you agree to abide by our usage policies. If this is the first time you use this feature, you will be asked to authorise Cambridge Core to connect with your <service> account. Find out more about sending content to Google Drive.

Algebraic data integration*
Available formats
×
×

Reply to: Submit a response


Your details


Conflicting interests

Do you have any conflicting interests? *