
Ideology from topic mixture statistics: inference method and example application to carbon tax public opinion

Published online by Cambridge University Press:  02 April 2024

Maximilian Puelma Touzel*
Affiliation:
Mila, Québec AI Institute, Montréal, QC, Canada Department of Computer Science and Operations Research, Université de Montréal, Montréal, QC, Canada
Erick Lachapelle
Affiliation:
Department of Political Science, Université de Montréal, Montréal, QC, Canada
*
Corresponding author: Maximilian Puelma Touzel; Email: max.puelmatouzel@gmail.com

Abstract

Political opposition to fiscal climate policy, such as a carbon tax, typically appeals to fiscal conservative ideology. Here, we ask to what extent public opposition to the carbon tax in Canada is, in fact, ideological in origin. As an object of study, ideology is a latent belief structure over a set of issue topics—and in particular their relationships—as revealed through stated opinions. Ideology is thus amenable to a generative modeling approach within the text-as-data paradigm. We use the Structural Topic Model, which generates word content from a set of latent topics and mixture weights placed on them. We fit the model to open-ended survey responses of Canadians elaborating on their support of or opposition to a carbon tax, then use it to infer the set of mixture weights used by each response. We demonstrate that this set, more so than the observed word use, enables efficient discrimination of opposition from support, with near-perfect accuracy on held-out data. We then operationalize ideology as the empirical distribution of inferred topic mixture weights. We propose and use an evaluation of ideology-driven beliefs based on four statistics of this distribution capturing the specificity, variability, expressivity, and alignment of the underlying ideology. We find that the ideology behind responses from respondents who opposed the carbon tax is more specific and aligned, much less expressive, and of similar variability as compared with those who support the tax. We discuss the implications of our results for climate policy and the potential for broad application of our approach in social science.
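The pipeline the abstract describes—infer per-response topic mixture weights with a topic model, then classify support versus opposition from those weights—can be sketched as follows. This is a rough approximation only: the paper uses the Structural Topic Model (an R package with meta-data covariates), whereas this sketch substitutes plain LDA, and all data here are toy stand-ins.

```python
# Approximate the paper's pipeline with plain LDA in place of the
# Structural Topic Model (covariates are ignored). Toy data throughout.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

responses = [
    "tax is a cash grab hurts families gas prices",
    "carbon price makes polluters pay good for climate",
    "another tax government waste money gas expensive",
    "climate change needs action pricing carbon works",
] * 25                         # toy open-ended responses
labels = [0, 1, 0, 1] * 25     # 0 = oppose, 1 = support

# Bag-of-words counts, then infer K mixture weights per response
counts = CountVectorizer().fit_transform(responses)
lda = LatentDirichletAllocation(n_components=4, random_state=0)
theta = lda.fit_transform(counts)   # each row sums to 1: a simplex point

# Classify support vs. oppose from the mixture weights alone
X_tr, X_te, y_tr, y_te = train_test_split(theta, labels, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```

The key design point carried over from the paper is that the classifier sees only the low-dimensional mixture weights, not the words themselves.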

Information

Type
Application Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices
Open materials
Copyright
© The Author(s), 2024. Published by Cambridge University Press

Figure 1. The Structural Topic Model for analysis of public opinion survey data. From left to right: Recall the simplex as the space of frequency variables. STM parameters define topic content through the topics’ word frequencies and topic prevalence through parameters associated with the topic mixture prior. We use seven meta-data covariates. We also store the respondent’s categorical response (in support, in opposition, or not sure) to the issue question (here carbon tax). By sequentially sampling topics and words, the model produces a bag-of-words open-ended response elaborating on this opinion.


Table 1. Four low-order statistics of a data cloud in a ($ K-1 $)-dimensional space ($ \boldsymbol{\mu} \in {\mathbb{R}}^{K-1} $) with mean position $ \overline{\boldsymbol{\mu}} $ and covariance matrix $ {\Sigma}_{\boldsymbol{\mu}} $, having eigenvalues $ {\left\{{\lambda}_i\right\}}_{i=1}^{K-1} $
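Statistics of this kind—built only from the cloud's mean and the eigenvalues of its covariance—can be computed directly. The table's exact formulas are not reproduced in this excerpt, so the definitions below (mean norm for specificity, total variance for variability, eigenvalue participation ratio for expressivity, top-eigenvalue variance share for alignment) are plausible instantiations, not necessarily the paper's.

```python
# Four low-order statistics of a point cloud {mu_n} in R^(K-1),
# computed from its mean and covariance eigenvalues. The names mirror
# the paper's four measures; the formulas are illustrative guesses.
import numpy as np

def cloud_statistics(mu):
    """mu: (N, K-1) array of points (e.g., projected mixture weights)."""
    mean = mu.mean(axis=0)
    cov = np.cov(mu, rowvar=False)
    lam = np.linalg.eigvalsh(cov)[::-1]       # eigenvalues, descending
    specificity = np.linalg.norm(mean)        # distance of mean from origin
    variability = lam.sum()                   # total variance (trace of cov)
    expressivity = lam.sum()**2 / (lam**2).sum()  # participation ratio
    alignment = lam[0] / lam.sum()            # variance share of top direction
    return specificity, variability, expressivity, alignment

rng = np.random.default_rng(0)
stats = cloud_statistics(rng.normal(size=(500, 6)))  # K-1 = 6, i.e., K = 7
```

For an isotropic cloud the participation ratio approaches the full dimension $K-1$ and the alignment approaches $1/(K-1)$; a cloud stretched along one axis drives the ratio toward 1 and the alignment toward 1.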


Figure 2. Carbon tax opposition versus support varies strongly with political demographics. Fraction of respondents who stated they oppose versus the fraction who stated they support the carbon tax across (a) age cohort, (b) political ideology, and (c) political partisanship (measured as voting preference; see Section 2 for details). Adding the remaining "not sure" fraction (not shown) brings the total to 1. Opposition is a majority (minority) opinion for dots above (below) the dashed line.


Figure 3. Word-level representations. (a) Tf-idf word cloud (top) and rank plot (bottom) for the word statistics of support and oppose response types. (b) Classification test accuracy of word-level content in predicting response type (support/oppose; gray shows $ \pm $ standard deviation over 100 random train–test splits). Accuracy is plotted for projections of the data onto the $ n $ most predictive words (see Section 2 for details). Insets are word clouds of log-transformed, frequency-weighted effect size for the $ n=100 $ most predictive words for support (left) and oppose (right) classifications.
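The word-level baseline in this figure—tf-idf features, a classifier, and test accuracy as a function of the $n$ most predictive words—can be sketched as below. The ranking here uses absolute logistic-regression coefficients, which may differ from the paper's effect-size definition, and the corpus is a toy stand-in.

```python
# Sketch of a word-level classifier: tf-idf features, then accuracy
# when restricted to the n most predictive words. Toy data throughout.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

docs = ["tax grab expensive gas", "climate action polluters pay",
        "waste money tax burden", "carbon pricing works climate"] * 30
y = np.array([0, 1, 0, 1] * 30)   # 0 = oppose, 1 = support

X = TfidfVectorizer().fit_transform(docs)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)

# Rank words by coefficient magnitude, keep the n best, re-evaluate
order = np.argsort(-np.abs(clf.coef_[0]))
for n in (2, 5, 10):
    keep = order[:n]
    clf_n = LogisticRegression().fit(X_tr[:, keep], y_tr)
    print(n, clf_n.score(X_te[:, keep], y_te))
```

This is the comparison point for the topic-level classifier later in the paper: how much accuracy survives when only a handful of individual words are kept.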


Table 2. Example topic content (for $ K=7 $ and independent topic mixture prior, $ \sigma =1 $)


Figure 4. Topic structure over topic number, $ K $. (a) Topic trees from varying the number of topics, $ K $, in the combined model. A row corresponds to a given value of $ K $ starting at the top with $ K=2 $ down to $ K=12 $. A column refers to one of the topics inferred for that $ K $, indexed by an ordering procedure for visualization. The relative support (blue) and oppose (red) contribution to each response-averaged mixture weight component is shown as a pie chart. Link thickness is inverse Euclidean distance between topics. (b) Topic quality for response type models. Top: Topic exclusivity plotted against topic semantic coherence for topic number $ K=2,\dots, 12 $ (colors in legend on right) for the three response types (marker type; see legend in Bottom panel). Black outlined markers denote the average over topics obtained for a given $ K $. Bottom: overall score computed from projection orthogonal to average tradeoff line over the three response types (see black lines in top panel). Both plots were made for $ \sigma =0 $. Similar results were found for $ \sigma =1 $ (see Supplementary Material).
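The topic-quality axes in panel (b), semantic coherence and exclusivity, are standard topic-model diagnostics. The paper's exact formulas are not given in this excerpt; the sketch below uses the common UMass-style co-occurrence score for coherence and a simple probability-mass share for exclusivity, which may differ in detail from the paper's definitions.

```python
# Illustrative topic-quality scores: UMass-style semantic coherence and
# a probability-mass notion of exclusivity. Formulas are conventional
# approximations, not necessarily the paper's exact definitions.
import numpy as np

def semantic_coherence(doc_word, top_words):
    """Sum of log((D(wi, wj) + 1) / D(wj)) over ordered top-word pairs.
    doc_word: (D, V) boolean matrix of word presence per document."""
    score = 0.0
    for i, wi in enumerate(top_words):
        for wj in top_words[:i]:
            co = np.sum(doc_word[:, wi] & doc_word[:, wj])
            score += np.log((co + 1) / doc_word[:, wj].sum())
    return score

def exclusivity(beta, k, top_words):
    """Mean share of each top word's probability mass owned by topic k.
    beta: (K, V) matrix of topic-word probabilities."""
    return np.mean(beta[k, top_words] / beta[:, top_words].sum(axis=0))
```

Coherent topics have top words that co-occur in documents; exclusive topics have top words whose mass is not shared with other topics. The figure's overall score trades these off along an average tradeoff line.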


Figure 5. Topic-level representations. (a) Projection of inferred mixture weights from the combined model onto the first two PCA components for a range of values of $ K $. A histogram of counts on a fine-grained grid is shown using a white-to-blue and white-to-red color scale for support and oppose responses, respectively. (b) Classification test accuracy using the best $ k $ of the mixture weights ranked by largest frequency-averaged effect size (black to gray denotes different values of $ K $ from $ 2 $ to $ 12 $). (c) The mixture estimates projected onto the two topics associated with the top $ k=2 $ mixture weights for $ K=7 $. The black dashed line is the optimal classifier, which achieves 100% accuracy. The text and mixture position of a top response for each topic are shown as a black dot.
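The analysis in panels (a) and (b)—a two-component PCA view of the mixture-weight cloud, plus classification from the best-$k$ weights ranked by effect size—can be sketched as below. Since the STM posterior is not available here, the mixture weights are simulated from two Dirichlet distributions standing in for the oppose and support response clouds.

```python
# Sketch of the topic-level analysis: PCA projection of mixture weights
# and classification from the top-k weights. Weights are simulated from
# two Dirichlet distributions as stand-ins for the inferred posteriors.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
K = 7
support = rng.dirichlet([5, 1, 1, 1, 1, 1, 1], size=300)  # leans topic 0
oppose = rng.dirichlet([1, 1, 1, 1, 1, 1, 5], size=300)   # leans topic 6
theta = np.vstack([support, oppose])
y = np.array([1] * 300 + [0] * 300)

proj = PCA(n_components=2).fit_transform(theta)   # 2-D view of the cloud

# Rank weights by between-class mean difference; classify from best k=2
effect = np.abs(theta[y == 1].mean(0) - theta[y == 0].mean(0))
top_k = np.argsort(-effect)[:2]
acc = LogisticRegression().fit(theta[:, top_k], y).score(theta[:, top_k], y)
```

With well-separated clouds, a linear classifier on just two mixture-weight components already separates the classes—the qualitative point of panel (c).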


Figure 6. Evaluation of statistical measures of the mixture statistics. Top (a–d): The average difference between the oppose value and the support value of each respective measure from the combined model (equation (4)). Bottom (f–i): The value of each measure for the support (blue) and oppose (orange) models (equation (3)). Lines are mean estimates and error bars are the standard deviation over 100 (200) posterior samples for the top (bottom) row. The respective held-out model likelihoods are shown in (e) and (j).

Supplementary material

Puelma Touzel and Lachapelle supplementary material (File, 443.1 KB)