Digitization of the Canadian Parliamentary Debates


This paper describes the digitization and enrichment of the Canadian House of Commons English Debates from 1901 to the present. We first lay out the general framework in which the project took place, then present the structure of the database and provide guidelines for prospective users. The paper concludes by introducing an online platform designed as a hub for archiving Canadian political data, with the parliamentary proceedings at the centre of its architecture.

Informatics Institute, University of Amsterdam, Science Park 904, Amsterdam, 1098 XH
Department of Political Science, University of Toronto, 100 St. George Street, Toronto, Ontario, M5S 3G3
Department of Computer Science, University of Toronto, 10 King's College Road, Toronto, Ontario, M5S 3G4
Canadian Journal of Political Science/Revue canadienne de science politique
  • ISSN: 0008-4239
  • EISSN: 1744-9324
  • URL: /core/journals/canadian-journal-of-political-science-revue-canadienne-de-science-politique