
Digitization of the Canadian Parliamentary Debates


This paper describes the digitization and enrichment of the Canadian House of Commons English Debates from 1901 to present. We start by laying out the general framework in which this project took place and then present the structure of the database and provide guidelines to prospective users. The paper concludes with the introduction of, an online platform designed as a hub for archiving Canadian political data, with the parliamentary proceedings at the centre of its architecture.



Informatics Institute, University of Amsterdam, Science Park 904, Amsterdam, 1098 XH
Department of Political Science, University of Toronto, 100 St. George Street, Toronto, Ontario, M5S 3G3
Department of Computer Science, University of Toronto, 10 King's College Road, Toronto, Ontario, M5S 3G4

Canadian Journal of Political Science/Revue canadienne de science politique
  • ISSN: 0008-4239
  • EISSN: 1744-9324
  • URL: /core/journals/canadian-journal-of-political-science-revue-canadienne-de-science-politique