A new database for Italian parliamentary speeches: introducing the ItaParlCorpus dataset

Published online by Cambridge University Press:  14 March 2025

Joshua Cova*
Affiliation:
Max Planck Institute for the Study of Societies, Cologne, Germany

Abstract

A common challenge in studying Italian parliamentary discourse is the lack of accessible, machine-readable, and systematized parliamentary data. To address this, this article introduces the ItaParlCorpus dataset, a new, annotated, machine-readable collection of Italian parliamentary plenary speeches for the Camera dei Deputati, the lower house of Parliament, spanning from 1948 to 2022. This dataset encompasses 470 million words and 2.4 million speeches delivered by 5830 unique speakers representing 77 different political parties. The files are designed for easy processing and analysis using widely used programming languages, and they include metadata such as speaker identification and party affiliation. This opens up opportunities for in-depth analyses of a variety of topics related to parliamentary behavior, elite rhetoric, and the salience of political themes, exploring how these vary across party families and over time.
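
As a rough illustration of the kind of processing the abstract has in mind, the following minimal sketch computes, for each party, the share of speeches that mention a keyword. The column names (`date`, `speaker`, `party`, `text`), the sample rows, and the helper `keyword_share_by_party` are all assumptions made for this example; the actual ItaParlCorpus files may be laid out differently.

```python
import csv
import io
from collections import Counter

# Invented sample rows in an ASSUMED layout; the real ItaParlCorpus
# column names and contents may differ.
SAMPLE_CSV = """date,speaker,party,text
1978-04-04,Speaker A,Party X,Discussione sulla legge in materia di aborto
1978-04-04,Speaker B,Party Y,Intervento sul bilancio dello Stato
1978-04-04,Speaker C,Party X,Intervento sulla riforma sanitaria
1978-04-04,Speaker D,Party Y,La mafia e la criminalità organizzata
"""


def keyword_share_by_party(rows, keyword):
    """Return {party: share of that party's speeches containing keyword}."""
    totals = Counter()  # speeches per party
    hits = Counter()    # speeches per party that mention the keyword
    needle = keyword.lower()
    for row in rows:
        totals[row["party"]] += 1
        if needle in row["text"].lower():  # case-insensitive substring match
            hits[row["party"]] += 1
    return {party: hits[party] / totals[party] for party in totals}


# Parse the in-memory sample; a real file would be opened with open(path, newline="")
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
shares = keyword_share_by_party(rows, "aborto")
print(shares)
```

A plain substring match is the simplest possible choice here; a real analysis would likely need tokenization and a curated keyword list to avoid false matches.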

Information

Type
Research Note: Dataset
Creative Commons
Creative Commons License - CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Società Italiana di Scienza Politica

Figure 1. Original .txt files of parliamentary discussions on April 4, 1978.

Figure 2. Converted .csv files of parliamentary discussions on April 4, 1978.

Figure 3. Parliamentary interventions discussing abortion (left) or the mafia (right) as a share of all parliamentary interventions, by party (1948–1992).

Figure 4. Most common nouns and adjectives used by political party families when discussing the mafia (1948–1979).

Figure 5. Most common nouns and adjectives used by political party families when discussing the mafia (1980–2022).

Figure 6. Cosine similarity for speeches discussing abortion before and after the introduction of the Legge 194 (1978).

Supplementary material: File

Cova supplementary material

File 418.2 KB