How to improve the substantive interpretation of regression results when the dependent variable is logged

Published online by Cambridge University Press:  10 August 2023

Oliver Rittmann*
Affiliation:
School of Social Sciences, University of Mannheim, Mannheim, Germany
Marcel Neunhoeffer
Affiliation:
Rafik B. Hariri Institute for Computing and Computational Science & Engineering, Boston University, Boston, USA; Department of Statistics, LMU Munich, München, Germany
Thomas Gschwend
Affiliation:
School of Social Sciences, University of Mannheim, Mannheim, Germany
*Corresponding author. Email: orittman@mail.uni-mannheim.de

Abstract

Regression models with log-transformed dependent variables are widely used by social scientists to investigate nonlinear relationships between variables. Unfortunately, this transformation complicates the substantive interpretation of estimation results and often leads to incomplete and sometimes even misleading interpretations. We focus on one valuable but underused method, the presentation of quantities of interest such as expected values or first differences on the original scale of the dependent variable. The procedure to derive these quantities differs in seemingly minor but critical aspects from the well-known procedure based on standard linear models. To improve empirical practice, we explain the underlying problem and develop guidelines that help researchers to derive meaningful interpretations from regression results of models with log-transformed dependent variables.

Information

Type
Research Note
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press on behalf of the European Political Science Association
Table 1. Transformation formulas for point estimates of common quantities of interest, and their approximated distributions to construct correct confidence intervals by using α/2 and 1 − α/2 percentiles

Figure 1. Algorithm to simulate confidence intervals (King et al., 2000) when the dependent variable is untransformed (left column), for the median when the dependent variable is logged (center column), and for the mean when the dependent variable is logged (right column).
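The figure itself is not reproduced here, so the following is only a minimal sketch of simulation-based confidence intervals in the spirit of King et al. (2000), not the authors' exact algorithm. It assumes normally distributed errors, so that the conditional median of Y is exp(Xβ) and the conditional mean is exp(Xβ + σ²/2), and it treats the estimated residual standard deviation as fixed. The function names, the coefficient vector, the covariance matrix, and the covariate scenario are all hypothetical.

```python
import math
import random


def cholesky(v):
    """Lower-triangular L with L L' = v, for a small positive-definite matrix."""
    n = len(v)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(v[i][i] - s)
            else:
                low[i][j] = (v[i][j] - s) / low[j][j]
    return low


def percentile_interval(draws, alpha):
    """Return the alpha/2 and 1 - alpha/2 percentiles of the simulated draws."""
    s = sorted(draws)
    lo = s[math.floor(alpha / 2 * (len(s) - 1))]
    hi = s[math.ceil((1 - alpha / 2) * (len(s) - 1))]
    return lo, hi


def simulate_quantities(beta_hat, vcov, sigma_hat, x,
                        n_sims=5000, alpha=0.05, seed=1):
    """Simulate intervals for the median and mean of Y given ln(Y) = x'beta + e.

    Assumption: e ~ N(0, sigma_hat^2), with sigma_hat treated as known.
    """
    rng = random.Random(seed)
    low = cholesky(vcov)
    k = len(beta_hat)
    medians, means = [], []
    for _ in range(n_sims):
        # Draw coefficients from N(beta_hat, vcov) via beta_hat + L z.
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        beta = [beta_hat[i] + sum(low[i][j] * z[j] for j in range(k))
                for i in range(k)]
        log_mu = sum(b * xi for b, xi in zip(beta, x))
        # Back-transforming the linear predictor yields the median ...
        medians.append(math.exp(log_mu))
        # ... while the mean needs the extra sigma^2 / 2 term.
        means.append(math.exp(log_mu + sigma_hat ** 2 / 2))
    return {"median": percentile_interval(medians, alpha),
            "mean": percentile_interval(means, alpha)}


# Hypothetical estimates: intercept and one slope, evaluated at x = 2.
result = simulate_quantities(beta_hat=[1.0, 0.5],
                             vcov=[[0.04, 0.0], [0.0, 0.01]],
                             sigma_hat=1.5,
                             x=[1.0, 2.0])
print(result)
```

Under these assumptions the interval for the mean is the interval for the median scaled by exp(σ²/2). Exponentiating the linear predictor alone therefore describes the median of Y, not its expected value.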

Figure 2. Quantities of interest with an interaction effect. The data-generating process (DGP) follows $\ln(Y) = \beta_0 + \beta_1 D + \beta_2 X + \beta_3 (D \times X) + \epsilon$ with $\beta_0 = 5$, $\beta_1 = -5.5$, $\beta_2 = 0.5$, $\beta_3 = 0.4$, and $\epsilon \sim N(0, 1.5^2)$. As $X$ increases, the first difference on the log scale of $Y$ decreases, but it increases on the original scale of $Y$.
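The pattern described in the caption can be checked directly from the stated parameter values. The sketch below evaluates the first difference between D = 1 and D = 0 on both scales. The grid of X values (0 to 10) and the function names are assumptions made for illustration, the comparison is in absolute magnitude, and the mean is computed under the normal-error assumption of the DGP.

```python
import math

# Parameter values as stated in the Figure 2 caption.
B0, B1, B2, B3 = 5.0, -5.5, 0.5, 0.4
SIGMA = 1.5


def log_scale_fd(x):
    """First difference in E[ln(Y)] between D = 1 and D = 0 at a given X."""
    return B1 + B3 * x


def median_y(d, x):
    """Conditional median of Y: the back-transformed linear predictor."""
    return math.exp(B0 + B1 * d + B2 * x + B3 * d * x)


def mean_y(d, x):
    """Conditional mean of Y under normal errors: median times exp(sigma^2 / 2)."""
    return median_y(d, x) * math.exp(SIGMA ** 2 / 2)


def original_scale_fd(x, quantity=median_y):
    """First difference in the chosen quantity between D = 1 and D = 0."""
    return quantity(1, x) - quantity(0, x)


# Hypothetical grid of X values; the range plotted in the figure is not known here.
for x in (0, 2, 4, 6, 8, 10):
    print(f"X={x:>2}  log-scale FD={log_scale_fd(x):7.2f}  "
          f"median FD={original_scale_fd(x):12.1f}  "
          f"mean FD={original_scale_fd(x, mean_y):12.1f}")
```

Over this grid the log-scale difference shrinks in magnitude while the original-scale difference grows, because the difference on the original scale is multiplied by exp(β0 + β2 X), which rises with X.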

Figure 3. Conditional mean and median values of nominee-president ideological divergence and first difference between minimum and maximum values of the independent variable, conditional on two different sets of covariate values.

Supplementary material: File

Rittmann et al. supplementary material

File 243.2 KB