Lord’s Paradox and two network meta-analysis models

Yu-Kang Tu; James S. Hodges

doi:10.1017/rsm.2025.10036

Published online by Cambridge University Press: 18 September 2025

Yu-Kang Tu*: Institute of Health Data Analytics & Statistics, College of Public Health, National Taiwan University, Taipei, Taiwan; Health Data Research Center, National Taiwan University, Taipei, Taiwan
James S. Hodges: Institute of Health Data Analytics & Statistics, College of Public Health, National Taiwan University, Taipei, Taiwan; Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN, USA
*Corresponding author: Yu-Kang Tu; Email: yukangtu@ntu.edu.tw

Abstract

The contrast-based model (CBM) is the most popular network meta-analysis (NMA) method, although alternative approaches, e.g., the baseline model (BM), have been proposed but seldom used. This article aims to illuminate the difference between the CBM and BM and explores when they produce different results. These models differ in key assumptions: The CBM assumes treatment contrasts are exchangeable across trials and models the reference (baseline) treatment’s outcome levels as fixed effects, while the BM further assumes that the baseline treatment’s outcome levels are exchangeable across trials and treats them as random effects. We show algebraically and graphically that the difference between the CBM and BM is analogous to the difference between the two analyses in a statistical conundrum called Lord’s Paradox, in which the t-test and analysis of covariance (ANCOVA) yield conflicting conclusions about the group difference in weight gain. We show that this conflict arises because the t-test compares the observed weight change, whereas ANCOVA compares an adjusted weight change. In NMA, analogously, the CBM compares observed treatment contrasts, while the BM compares adjusted treatment contrasts. We demonstrate how the difference in modeling baseline effects can cause the CBM and BM to give different results. The analogy of Lord’s Paradox provides insights into the different assumptions of the CBM and BM regarding the relationship between baseline effects and treatment contrasts. When these two models produce substantially different results, it may indicate a violation of the transitivity assumption. Therefore, we should be cautious in interpreting the results from either model.

Keywords

network meta-analysis; contrast-based model; baseline model; Lord’s paradox; directed acyclic graph

Information

Type: Research Article
Information: Research Synthesis Methods, First View, pp. 1–12

DOI: https://doi.org/10.1017/rsm.2025.10036
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices: Open data Open materials
Copyright: © The Author(s), 2025. Published by Cambridge University Press on behalf of The Society for Research Synthesis Methodology

Highlights

What is already known?

The CBM for NMA assumes treatment contrasts are exchangeable across the included studies, while the BM further assumes the baseline (or reference) treatment’s outcome level is also exchangeable across trials. A recent study showed that these two models may yield different results when the baseline risks vary across different designs of studies that compared different sets of treatments.

What is new?

We show that a key distinction between these two NMA models is similar to Lord’s Paradox, a statistical conundrum about whether a t-test or ANCOVA should be used to compare two groups according to a change in the outcome. We show that the CBM uses the observed treatment contrasts as the outcome in the analysis, while the BM uses adjusted treatment contrasts as the outcome. Alternatively, the CBM estimates the unconditional differences between one treatment and the baseline treatment, while the BM estimates the conditional difference by adjusting for the baseline effects.

Potential impact for RSM readers

The two NMA models make different assumptions about the relationship between treatment contrasts and the reference treatment effects. These differences in statistical assumptions reflect different perspectives on how the data were generated and how treatments impact patients. Therefore, choosing the appropriate model should depend on evaluating the validity of these assumptions in real-world scenarios.

1 Introduction

Network meta-analysis (NMA) combines direct and indirect evidence to compare the benefits and harms of multiple treatments.¹–³ Several approaches have been proposed to estimate relative effects between treatments; differences in their model specifications reflect different assumptions about how treatments should be compared.⁴,⁵ The contrast-based model (CBM), also called the Lu and Ades model, uses the difference in outcome between a pair of treatments in a given study, i.e., a treatment contrast, as the unit of analysis, assuming that a given treatment contrast is exchangeable across studies.⁶,⁷ The baseline model (BM) further assumes that the reference treatment’s outcome levels are exchangeable across studies.¹,⁵ Although these two models are closely related, there has been debate as to which should be preferred.⁴–⁶,⁸,⁹

White et al.¹,⁵ discussed when the CBM and BM yield different results. We aim to illuminate a key difference between these two models and to show that this difference is analogous to Lord’s Paradox, a conundrum about whether a t-test or analysis of covariance (ANCOVA) should be used to compare two groups according to a change in an outcome.¹⁰,¹¹

Our article is organized as follows. First, we give a brief overview of the CBM and BM and their assumptions. We then describe Lord’s Paradox and how the paradox occurs, after which we discuss how the assumptions of the two NMA models are related to Lord’s Paradox, using directed acyclic graphs (DAGs) to illustrate the relation. Finally, we discuss how Lord’s Paradox can illuminate the differences between the two NMA models.

2 NMA models and their statistical assumptions

Both NMA models were initially described within the Bayesian statistical framework.¹,⁴,¹² For simplicity, we use the frequentist framework.

2.1 Contrast-based model

The CBM assumes a given treatment contrast is exchangeable across studies, as standard pairwise meta-analysis does. The CBM can be written as

$$\begin{align*}{y}_{ik} = {s}_{b(i)}+{\delta}_{ib(i)k}+{\varepsilon}_{ik}\end{align*}$$

(Model 1)

$$\begin{align}{\varepsilon}_{ik}\sim N\left(0,{se}_{ik}^2\right)\end{align}$$

$$\begin{align*}{\delta}_{ib(i)k}\sim N\left({d}_{b(i)k},{\tau}^2\right),{d}_{b(i)k} = {d}_{Ak}-{d}_{Ab(i)},{d}_{AA} = 0,\end{align*}$$

where ${y}_{ik}$ is treatment k’s response in study i with standard error ${se}_{ik}$ ; ${\delta}_{ib(i)k}$ is treatment k’s trial-specific effect relative to trial i’s baseline treatment $b(i)$ ; ${d}_{b(i)k}$ is the average of the contrast between k and $b(i)$ ; ${\tau}^2$ , a variance, describes heterogeneity between trials in this contrast, and ${d}_{Ak}$ is the difference between the average of treatment k and a reference treatment A. The heterogeneity ${\tau}^2$ is usually assumed identical for all treatment contrasts, so their covariances are $\frac{1}{2}{\tau}^2$ . For binary data, $y$ is usually the log odds or log risk, so $\delta$ and $d$ are the log odds ratio or log risk ratio between two treatments. For continuous data, $y$ is the outcome or change in outcome from the start of observation, so $\delta$ and $d$ are the mean difference in the outcome between two treatment groups. For either type of data, ${s}_{b(i)}$ is a fixed-effect parameter specific to study i, often interpreted as a study effect, while ${\delta}_{ib(i)k}$ is a random effect following a normal distribution, implying that it is exchangeable across studies.

2.2 Baseline model

The BM differs from the CBM in that the reference treatment’s level is modeled as a random effect with a separate draw from a normal distribution for each study. If we designate treatment A as the reference treatment, regardless of whether it was included in every study, we can write the BM (in arm-based form) as¹²

$$\begin{align*}{y}_{ik} = {s}_i+{\delta}_{iAk}+{\varepsilon}_{ik}\end{align*}$$

(Model 2)

$$\begin{align}{s}_i\sim N\left({\overline{s}}_A,{\sigma}_A^2\right),{\varepsilon}_{ik}\sim N\left(0,{se}_{ik}^2\right)\end{align}$$

$$\begin{align*}{\delta}_{iAk}\sim N\left({d}_{Ak},{\tau}^2\right),{d}_{AA} = 0,\end{align*}$$

where ${s}_i$ is the (possibly hypothetical) true response for treatment A in study i, and ${\overline{s}}_A$ and ${\sigma}_A^2$ are the mean and variance across studies of this true response; ${\tau}^2$ is assumed identical for all treatment contrasts, so their covariances are $\frac{1}{2}{\tau}^2$. The random effects ${s}_i$ and ${\delta}_{iAk}$ are assumed independent. (Model 2 is equivalent to Model 3 in the article by White et al., therein called the CBM with random study intercepts.)⁵ Different specifications of this model have been published. As specified in Model 2, the analysis results do not depend on the choice of reference treatment; as specified in White et al.,⁵ the results do depend on the choice of reference treatment. The Supplementary Material gives more details. Other parameters are as in the CBM. Thus, the BM and the CBM differ in that ${s}_i$ is a draw from a random effect in the former but a fixed effect in the latter.

2.3 Statistical assumptions of these NMA models

The study effect ${s}_{b(i)}$ in the CBM, a fixed effect, is the response of study i’s control group to the baseline treatment; the baseline treatment response is specific to each study and is not assumed exchangeable across studies. Rather, the difference in responses between two treatment groups—the treatment contrast or relative effect—is exchangeable.

This assumption that treatment contrasts are exchangeable further implies no association between study i’s baseline effect ${s}_{b(i)}$ and study i’s relative effect ${\delta}_{iAk}$ . Suppose all studies include the reference treatment A. The assumption is that if A has a large or small absolute effect in study i, this has no relation to the relative effect between A and the other treatments in study i. Even if the included studies come from different populations with different distributions of treatment effect modifiers, such as age, the relative treatment effects between A and other treatments in different studies remain similar by assumption, although the absolute effect of A may vary from study to study with differences in age.

The BM assumes baseline treatment A’s effect is exchangeable across studies, a random effect with a normal distribution. When a study’s patients are randomly assigned to A and other treatments, that study’s other treatments are thus randomly sampled from the same population, so the BM implicitly assumes all treatment effects are exchangeable across studies.⁵ We revisit this later when we discuss the selection of a baseline treatment.

3 Lord’s Paradox

We may gain insight into the differences between the CBM and BM from the literature about Lord’s Paradox.¹⁰,¹¹ This literature is vast¹³–¹⁶; we do not provide a comprehensive review.

In FM Lord’s original article, a university studied the effect on students’ body weight of the diet provided in the university dining halls. The study also asked whether males and females differed in the effects of the diet. Each student’s weight was measured upon their arrival in September and again the following June. Two statisticians analyzed the data independently.

3.1 A numerical example

For our demonstration, we simulated body weights of 100 female and 100 male students. Table 1 summarizes the simulated data.

Table 1 Summary statistics for the hypothetical body weight data with 100 male and 100 female students

Note: bw1, body weight measured at baseline; bw2, body weight measured at follow-up; cw, change in weight.

The first statistician calculates the mean weights of female students at the beginning and end of the year (56.20 kg and 55.75 kg, respectively) and finds the change (−0.46 kg) very small. The mean weights of male students at the beginning (70.90 kg) and end of the year (70.08 kg) are also similar, and the change (−0.83 kg) is also small. She compares weight change between females and males using a t-test, obtains a p-value of 0.68, and concludes that females and males did not differ in weight change. This t-test can be written as a linear regression model:

(1)

$$\begin{align} B{W}_2-B{W}_1 = {a}_0+{a}_1 Gender+{e}^{(a)},\end{align}$$

where $B{W}_1$ and $B{W}_2$ are body weights measured at baseline and follow-up, respectively, and Gender is a dummy variable with females coded 0 and males 1. The estimate of ${a}_0$ is −0.46 kg, the mean weight change in females, and the estimate of ${a}_1$ is −0.37 kg, the difference between males and females in mean weight change, suggesting no difference between sexes in mean weight change.

The second statistician notices that males are larger than females at baseline on average (70.9 kg vs 56.2 kg), so she uses ANCOVA to adjust for the baseline difference in body weight. The ANCOVA model can also be written as a linear regression:

(2)

$$\begin{align}B{W}_2 = {b}_0+{b}_1 Gender+{b}_2B{W}_1+{e}^{(b)}.\end{align}$$

The estimate of ${b}_0$ is 12.49 kg, the estimated mean body weight at the end of the year for females whose baseline body weight is zero, so ${b}_0$ has no practical meaning. The estimate of ${b}_1$ is 3.02 kg, the estimated difference in follow-up body weight between females and males with the same baseline body weight. The estimate of ${b}_2$ is 0.77, the estimated difference in follow-up body weight between students of a given gender whose baseline body weights differ by 1 kg. Because ${b}_1$ is statistically significant, the second statistician concludes that, on average, males gain 3 kg more than females. The horizontal and vertical axes of Figure 1’s scatterplot are the baseline and follow-up body weights, respectively. Red and blue circles represent females and males. Red and blue lines are fitted parallel regression lines for females and males, respectively, both with slope ${b}_2 = 0.77$ . Their intercepts differ by ${\widehat{b}}_1 = 3.02$ .

Figure 1 Scatterplot of the hypothetical data with 100 male (blue circles) and 100 female (red circles) students. The blue and red solid lines are the fitted regression lines for male and female students, respectively. The black solid line has an intercept of zero and a slope of 1.

3.2 How does Lord’s Paradox happen?

When the difference between males and females in mean body weight change is small, the regression coefficient ${a}_1$ in Equation (1) will be close to 0. However, ${b}_1$ in Equation (2) equals 0 only under very special conditions.

Equation (1) can be rearranged as

(3)

$$\begin{align}B{W}_2 = {a}_0+{a}_1 Gender+1\times B{W}_1+{e}^{(a)}.\end{align}$$

The t-test is thus a linear regression in which $B{W}_2$ is regressed on $Gender$ and $B{W}_1$ with $B{W}_1$’s coefficient constrained to ${b}_2 = 1$, represented by Figure 1’s black line. When ${a}_1$ is close to 0 as in our example, the fitted black lines for males and females are indistinguishable. Comparing Equations (3) and (2), we see that Lord’s Paradox arises when ${b}_2$ is substantially less than 1 in Equation (2), in which case the two statisticians will give different conclusions.

We can rearrange Equation (2) as

(4)

$$\begin{align}B{W}_2-{b}_2{BW}_1 = {b}_0+{b}_1 Gender+{e}^{(b)}.\end{align}$$

Comparing Equations (1) and (4), Lord’s Paradox can be interpreted as follows: The first statistician regresses the observed weight change $B{W}_2-B{W}_1$ on $Gender$, while the second regresses the adjusted weight change $B{W}_2-{b}_2B{W}_1$ on $Gender$. If $B{W}_2\approx B{W}_1$, as in our example, then ${BW}_2-{b}_2B{W}_1\approx \left(1-{b}_2\right)B{W}_1$. Because both males and females have tiny observed weight changes, $B{W}_2\approx B{W}_1$ for both genders, so the two genders show a negligible difference in weight change. Thus, the first statistician finds no difference in weight change between females and males. In contrast, males have greater $B{W}_1$ than females, so males have greater adjusted weight change. In our example, the average adjusted weight changes for males and females are $70.1-0.77\times 70.9 = 15.5$ and $55.8-0.77\times 56.2 = 12.5$, respectively, so the difference is about 3 kg, as the ANCOVA estimates.
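The contrast between the two analyses can be reproduced in a few lines. The sketch below is not the simulation used for Table 1: the data-generating model, the within-group standard deviation of 5 kg, the seed, and the helper names `simulate_lord` and `ols` are all assumptions made here, chosen only so that the group means and the slope resemble the values quoted above.

```python
import numpy as np


def simulate_lord(n=100, mu_f=56.2, mu_m=70.9, slope=0.77, sd=5.0, seed=0):
    """Simulate baseline and follow-up weights for n females and n males.

    Assumed model (not the article's): each group's mean weight is stable
    over time, and follow-up weight regresses toward the group mean with
    the given slope.
    """
    rng = np.random.default_rng(seed)
    gender = np.repeat([0.0, 1.0], n)            # 0 = female, 1 = male
    mu = np.where(gender == 1.0, mu_m, mu_f)     # same group mean at both times
    bw1 = mu + rng.normal(0.0, sd, size=2 * n)
    resid_sd = sd * np.sqrt(1.0 - slope**2)      # keeps within-group variance stable
    bw2 = mu + slope * (bw1 - mu) + rng.normal(0.0, resid_sd, size=2 * n)
    return gender, bw1, bw2


def ols(y, *covariates):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


gender, bw1, bw2 = simulate_lord()
a0, a1 = ols(bw2 - bw1, gender)       # Equation (1): change score on Gender
b0, b1, b2 = ols(bw2, gender, bw1)    # Equation (2): ANCOVA
print(f"change-score analysis: a1 = {a1:.2f}")
print(f"ANCOVA: b1 = {b1:.2f}, b2 = {b2:.2f}")
```

Under these assumptions the population value of $a_1$ is 0 and that of $b_1$ is $(1-0.77)\times 14.7\approx 3.4$ kg, so the estimates from 100 students per group should scatter around those values.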

3.3 An alternative formulation of Lord’s Paradox

Alternatively, we can consider the difference between the first and second statisticians as arising from their differing assumptions about the relationship between change score and baseline body weight. The first statistician assumes no relationship between weight change and baseline body weight, because Equation (1) can be expressed as

(5)

$$\begin{align}B{W}_2-B{W}_1 = {a}_0+{a}_1 Gender+0\times B{W}_1+{e}^{(a)}.\end{align}$$

The coefficient for $B{W}_1$ is 0, so Equation (1) assumes weight change is not related to baseline body weight. Thus, although males have greater body weight than females, on average, that plays no role in comparing their weight changes.

The second statistician assumes there is a relationship between weight change and baseline weight, as seen when Equation (2) is reexpressed as

(6)

$$\begin{align}B{W}_2-B{W}_1 = {b}_0+{b}_1 Gender+\left({b}_2-1\right)B{W}_1+{e}^{(b)}.\end{align}$$

The two statisticians reach the same conclusion if ${b}_2\approx 1$. Because of random errors in weight measurements and natural variation in body weight within each student, ${b}_2$ is likely to be rather less than 1.¹⁴,¹⁷ This is what FM Lord tried to demonstrate in his short article.¹⁰
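Equations (5) and (6) can be linked by a short calculation. Because least-squares residuals from the ANCOVA average to zero within each gender, the fitted coefficients satisfy the identity below. This is a standard property of least squares rather than a result stated in the article, and ${\overline{BW}}_{1,M}$ and ${\overline{BW}}_{1,F}$ (the mean baseline weights of males and females) are notation introduced here. Plugging in the rounded estimates from the numerical example gives

$$\begin{align*}{\widehat{a}}_1 &= {\widehat{b}}_1+\left({\widehat{b}}_2-1\right)\left({\overline{BW}}_{1,M}-{\overline{BW}}_{1,F}\right)\\ &\approx 3.02+\left(0.77-1\right)\left(70.90-56.20\right) = 3.02-3.38 = -0.36,\end{align*}$$

which agrees with the reported −0.37 kg up to rounding. The two estimates of the gender effect therefore coincide only when ${\widehat{b}}_2 = 1$ or when the two groups have the same mean baseline weight.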

4 The relation between NMA models and Lord’s Paradox

We use an NMA with treatments X, Y, and Z to show how Lord’s Paradox illuminates the differences between the two NMA models. Following White et al.,¹,⁵ our NMA includes only trials comparing two treatments, one of which is treatment X. Thus, this NMA includes only two types of trials, also called designs in the NMA literature¹⁸: one comparing X with Y (design XY) and the other comparing X with Z (design XZ). X is each trial’s baseline treatment and also the network’s reference treatment, so it is treatment “A” in Model 1 (the CBM) and Model 2 (the BM) above. We wish to know whether Z’s effect differs from Y’s.

The CBM for this NMA can be expressed (in contrast-based form) as a regression model:

$$\begin{align*}{y}_{i2}-{y}_{iX} = {\delta}_{iX2}+{e}_{i,2X},\quad {e}_{i,2X} = {\varepsilon}_{i2}-{\varepsilon}_{iX}\sim N\left(0,{se}_{iX}^2+{se}_{i2}^2\right),\end{align*}$$

where group 2 is either Y or Z; group 1 is always X so the subscript “1” has been replaced by X. Model 1’s trial-specific effect of treatment k relative to trial i’s baseline treatment $b(i)$ , ${\delta}_{ib(i)k}$ , can be expressed as

$$\begin{align*}{\delta}_{iX2} = {d}_{XY}{1}_{XY(i)}+{d}_{XZ}{1}_{XZ(i)}+{b}_{i2}-{b}_{iX}\end{align*}$$

where ${d}_{XY}$ and ${d}_{XZ}$ are as in Model 1; ${1}_{XY(i)} = 1$ when study i has design XY and 0 when study i has design XZ; ${1}_{XZ(i)} = 1$ when study i has design XZ and 0 when study i has design XY; and ${b}_{ij}\sim N\left(0,{\tau}^2\right)$ captures heterogeneity. The CBM is then

(7)

$$\begin{align}{y}_{i2}-{y}_{iX} = {d}_{XY}{1}_{XY(i)}+{d}_{XZ}{1}_{XZ(i)}+{b}_{i2}-{b}_{iX}+{e}_{i,2X},\end{align}$$

where the subscript 2 for ${y}_{i2}$ represents each study’s second treatment group, Y or Z. We can add $0\times {y}_{iX}$ to Equation (7), so it becomes clear that the CBM assumes no relationship between the treatment contrast and the baseline treatment:

(8)

$$\begin{align}{y}_{i2}-{y}_{iX} = {d}_{XY}{1}_{XY(i)}+{d}_{XZ}{1}_{XZ(i)}+0\times {y}_{iX}+{b}_{i2}-{b}_{iX}+{e}_{i,2X}.\end{align}$$

Equation (8) can be further rearranged by moving $-{y}_{iX}$ to the right side, yielding:

(9)

$$\begin{align}{y}_{i2} = {d}_{XY}{1}_{XY(i)}+{d}_{XZ}{1}_{XZ(i)}+1\times {y}_{iX}+{b}_{i2}-{b}_{iX}+{e}_{i,2X}.\end{align}$$

The CBM can thus be viewed as regressing the test treatment level on the reference treatment level, with the regression coefficient fixed at 1. In all these ways, the CBM’s analysis is like the first statistician’s analysis in Lord’s Paradox.

Now consider the BM for this NMA, Model 2. Suppose we have an estimate ${\widehat{s}}_i^s$ for ${s}_i$. Such an estimate using Model 2, a mixed linear model, will be shrunk (hence the superscript) toward ${\widehat{\overline{s}}}_X$, an estimate of ${\overline{s}}_X$, the mean of ${s}_i$’s distribution, so ${\widehat{s}}_i^s$ can be written

$$\begin{align*}{\widehat{s}}_i^s = {\widehat{\overline{s}}}_X+{q}_i\left({\widehat{s}}_i^u-{\widehat{\overline{s}}}_X\right),\end{align*}$$

where ${q}_i$ is the shrinkage factor and ${\widehat{s}}_i^u$ is the unshrunk estimate of ${s}_i$ from a CBM fit, where the baseline treatment is a fixed effect. Each study and arm has its own sampling-error variance (usually treated as known), so ${q}_i$ is a complex function of the heterogeneity ${\tau}^2$ and reference-group random-effect variance ${\sigma}^2$, but it is easy to show that as ${\sigma}^2\uparrow \infty$, i.e., as the BM tends toward the CBM, ${q}_i\uparrow 1$.
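To make the limiting behaviour of ${q}_i$ concrete, consider the simplest normal-normal case. This is an assumption made here for illustration, not the general expression implied by Model 2: a single reference-arm estimate with known sampling variance $se_i^2$, no heterogeneity term, and a known mean, so that ${q}_i = {\sigma}^2/\left({\sigma}^2+se_i^2\right)$. The function name `shrinkage_factor` is hypothetical.

```python
def shrinkage_factor(sigma2, se2):
    """Weight on the unshrunk estimate in a simple normal-normal model.

    sigma2: between-study variance of the reference-arm level
    se2:    sampling variance of the study's reference-arm estimate
    """
    return sigma2 / (sigma2 + se2)


# The factor rises toward 1 as the between-study variance grows.
for sigma2 in (0.1, 1.0, 10.0, 1000.0):
    print(sigma2, round(shrinkage_factor(sigma2, se2=0.25), 3))
```

In this simplified setting the factor approaches 1 as ${\sigma}^2$ grows and approaches 0 as the sampling variance grows, that is, for very small studies.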

Now ${\widehat{s}}_i^u = {s}_i+{\gamma}_i$ and ${\widehat{\overline{s}}}_X = {\overline{s}}_X+\xi$ , where ${\gamma}_i$ and $\xi$ have mean zero because the respective estimates are unbiased. Therefore,

$$\begin{align*}{\widehat{s}}_i^s = \left(1-{q}_i\right){\overline{s}}_X+{q}_i{s}_i+{\omega}_i,\end{align*}$$

where ${\omega}_i = {q}_i{\gamma}_i+\left(1-{q}_i\right)\xi$ has mean 0. If we replace the hypothetical true response for treatment $X$ for study i, ${s}_i$ , in Model 2 with its estimate, ${\widehat{s}}_i^s$ , then

$$\begin{align*}{y}_{i2} = \left(1-{q}_i\right){\overline{s}}_X+{q}_i{s}_i+{d}_{XY}{1}_{XY(i)}+{d}_{XZ}{1}_{XZ(i)}+{b}_{i2}+{\varepsilon}_{i2}+{\omega}_i,\end{align*}$$

where the sum of the last three terms has a mean of zero.

Now ${y}_{i1} = {s}_i+{b}_{i1}+{\varepsilon}_{i1}$ , so

(10)

$$\begin{align}{y}_{i2}-{y}_{i1} &= \left(1-{q}_i\right)\left({\overline{s}}_X-{s}_i\right)+{d}_{XY}{1}_{XY(i)}+{d}_{XZ}{1}_{XZ(i)}\nonumber\\& \quad +\left({b}_{i2}-{b}_{i1}\right)+\left({\varepsilon}_{i2}-{\varepsilon}_{i1}\right)+{\omega}_i,\end{align}$$

where the second line of Equation (10) has a mean of zero.

Equation (10) can be rewritten in ways that are analogous to Equations (4) and (6). First, recall that ${y}_{i1} = {s}_i+{b}_{i1}+{\varepsilon}_{i1}$. Gather these items from Equation (10)’s right side to give

(11)

$$\begin{align}{y}_{i2}-{y}_{i1}& = \left(1-{q}_i\right){\overline{s}}_X+\left({q}_i-1\right){y}_{i1}+{d}_{XY}{1}_{XY(i)}+{d}_{XZ}{1}_{XZ(i)}\nonumber\\& \quad +\left({b}_{i2}-{q}_i{b}_{i1}\right)+\left({\varepsilon}_{i2}-{q}_i{\varepsilon}_{i1}\right)+{\omega}_i,\end{align}$$

where Equation (11)’s second line has mean zero. Equation (11) and Equation (6) have the same form: The BM implicitly assumes the study-specific treatment contrast and baseline effect are associated, while the CBM assumes they are independent.

Finally, we can move $\left({q}_i-1\right){y}_{i1}$ to the left-hand side of Equation (11), giving

(12)

$$\begin{align}{y}_{i2}-{q}_i{y}_{i1} &= \left(1-{q}_i\right){\overline{s}}_X+{d}_{XY}{1}_{XY(i)}+{d}_{XZ}{1}_{XZ(i)}\nonumber\\& \quad +\left({b}_{i2}-{q}_i{b}_{i1}\right)+\left({\varepsilon}_{i2}-{q}_i{\varepsilon}_{i1}\right)+{\omega}_i,\end{align}$$

where the second line of Equation (12) has mean zero. Equation (12) and Equation (4) have the same form: The BM, in effect, has as its dependent variable an adjusted difference between the non-reference and reference treatments, where the adjustment depends on the shrinkage induced by the random effect used to model the reference treatment.

We make two comments about the BM. First, the shrinkage factor ${q}_i$ has the same role that ${b}_2$ has in Lord’s Paradox in Equations (4) and (6), but ${b}_2$ and ${q}_i$ differ in other ways: ${b}_2$ is the same for all students in Lord’s example, while ${q}_i$ takes different values for different studies $i$, and ${b}_2$ is estimated by fitting the regression in Equation (4), while the ${q}_i$ depend on the data indirectly through the study sample sizes, the estimates ${\widehat{\sigma}}^2$ and ${\widehat{\tau}}^2$, and the ${se}_{ij}^2$. The ${q}_i$ will be close to zero only for very small studies. Second, $\left(1-{q}_i\right){\overline{s}}_X$ is present in Equations (11) and (12) but has no analog in Equations (4) and (6). From Equation (11), biases in the BM’s estimates of treatment effects are a function of these terms.

4.1 NMA models and the two statisticians in Lord’s Paradox

Comparing Equations (5) and (8) and Equations (6) and (12), the CBM is the first statistician, while the BM is the second statistician. Each student has two measurements of body weight, as each trial has two treatments. Each student’s weight change corresponds to the treatment contrast in each trial. Males and females correspond to designs XZ and XY, respectively. The two statisticians’ approaches yield different results when males have larger baseline body weight; the two approaches to NMA may yield different results when the effects of baseline (and reference) treatment X differ between the two trial designs.

5 Graphical comparisons of NMA models

Figure 2 graphically explains the difference between the two models’ results in different scenarios. In terms of the preceding development for the BM, Figure 2 connects most directly to Equation (10). In Figure 2, two filled circles connected with a solid line represent a trial comparing two treatments, Arms 1 and 2. The horizontal axis is the treatment arms, and the vertical axis is the absolute treatment effects. In Figure 2a, the two arms in each of the six trials have the same outcome levels. In Figures 2b and 2c, the two arms have different outcome levels, but the difference between the two arms is identical in all trials. The open circles are the BM’s shrunken estimates for Arm 1, which are “shrunk” toward Arm 1’s average level.

Figure 2 Line plots for comparing contrast-based and baseline models. The two filled circles connected with a solid line represent a trial comparing two treatments, Arms 1 and 2. The open circles are the shrunken estimates of Arm 1 given by the baseline model; these open circles are “shrunk” closer to the average effect of Arm 1. The horizontal axis is the treatment arms, and the vertical axis is the absolute treatment effects. In (a), the two arms in each of the six trials have the same treatment effects. In (b), Arm 2 is better than Arm 1, and in (c), Arm 1 is better than Arm 2. The difference between the two arms is identical in every trial.

Suppose we include two types of trials in the analysis: design XY and design XZ. X is Arm 1, each study’s baseline treatment, and the network’s reference treatment, and we aim to know whether Z differs from Y. In the CBM, the difference between the two arms’ outcomes in each trial is the difference between the values of each trial’s two filled circles. In the BM, however, the difference in treatment levels is the difference between the open circle on the left and the solid circle on the right, connected by the dashed line. Because X’s level is modeled as a random effect, for trials with above-average levels of X, their estimated levels of X are shrunk downward, and for trials with below-average X, their estimated X are shrunk upward.

When designs XY and XZ have the same distribution of X, e.g., each design includes all six trials in Figure 2, Y and Z do not differ using the CBM or BM. But if the top three trials have design XZ and the bottom three trials have design XY, the two models give different results. In Figure 2a, the CBM finds no difference between Z and Y, but the BM finds Z is better than Y (because Z is better than X in design XZ, and X is better than Y in design XY). In Figure 2b, the CBM finds that both Z and Y are better than X, and Z and Y do not differ. The BM, however, finds that both Z and Y are better than X, but Z is also better than Y. In Figure 2c, the CBM finds that X is better than Z and Y, and again Z and Y do not differ. The BM now finds that X is slightly better than Z but much better than Y, so Z is better than Y.
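The arithmetic behind these scenarios can be sketched numerically. The code below does not fit Model 1 or Model 2; it only mimics the filled-circle and open-circle comparisons of Figure 2 under assumptions made here: six trials with hypothetical Arm 1 levels 1 to 6, a single shrinkage factor `q` shared by all trials, higher values meaning better outcomes, and a helper named `z_vs_y` that is not from the article.

```python
import numpy as np


def z_vs_y(x_level, contrast, is_xz, q):
    """Z-versus-Y difference from observed and from shrunken X levels.

    x_level : observed Arm 1 (treatment X) level in each trial
    contrast: observed Arm 2 minus Arm 1 difference in each trial
    is_xz   : True for design XZ trials, False for design XY trials
    q       : common shrinkage factor (assumed equal across trials)
    """
    x_level = np.asarray(x_level, dtype=float)
    contrast = np.asarray(contrast, dtype=float)
    is_xz = np.asarray(is_xz, dtype=bool)
    arm2 = x_level + contrast
    # Observed contrasts: filled circle minus filled circle.
    observed = arm2 - x_level
    # Adjusted contrasts: filled circle minus open (shrunken) circle.
    x_shrunk = x_level.mean() + q * (x_level - x_level.mean())
    adjusted = arm2 - x_shrunk
    cbm_like = observed[is_xz].mean() - observed[~is_xz].mean()
    bm_like = adjusted[is_xz].mean() - adjusted[~is_xz].mean()
    return float(cbm_like), float(bm_like)


x = [1, 2, 3, 4, 5, 6]
design_xz = [False, False, False, True, True, True]  # top three trials are XZ
print(z_vs_y(x, contrast=[0] * 6, is_xz=design_xz, q=0.5))  # as in Figure 2a
print(z_vs_y(x, contrast=[1] * 6, is_xz=design_xz, q=0.5))  # as in Figure 2b
```

With these hypothetical inputs the observed-contrast difference between Z and Y is 0 in both calls, while the adjusted difference is $(1-q)$ times the gap in mean X level between the two designs, here $0.5\times 3 = 1.5$. Setting `q=1.0` removes the shrinkage and the two comparisons coincide.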

6 Directed acyclic graphs for Lord’s Paradox and NMA models

Figure 3a is a DAG for Lord’s Paradox, slightly modified from Pearl’s DAG.¹⁹ Each node represents a variable in the model; an arrow’s direction describes the relation between the two nodes. In Figure 3a, node G, gender, is a cause of baseline body weight (${W}_0$) and final weight (${W}_1$), while ${W}_0$ is a cause of ${W}_1$ and weight gain ($Y$), and ${W}_1$ is a cause of $Y$. The letter or number associated with an arrow denotes the strength of the path represented by the arrow and can be interpreted as a regression coefficient. For example, because on average, weight does not change for either males or females, the effect of $G$ on ${W}_0$ is equal to its effect on ${W}_1$, which can be written as

$$\begin{align*}c+ ab = a,\mathrm{which}\kern0.17em \mathrm{implies}\;c = a\left(1-b\right).\end{align*}$$

Statistician 1 considers the total effect of $G$ on $Y$, the sum of three paths from $G$ to $Y$: $G\to {W}_0\to {W}_1\to Y$, $G\to {W}_0\to Y$, and $G\to {W}_1\to Y$, so the total effect is $ab-a+c = c-a\left(1-b\right) = 0$. Statistician 2, however, considers the conditional effect of $G$ on $Y$ by blocking paths through ${W}_0$; this conditional effect is $c = a\left(1-b\right)$. The two statisticians have the same result if $a = 0$ or $b = 1$, i.e., (respectively) males and females do not differ in baseline body weight, or the regression of final weight on baseline weight has a coefficient of 1.
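The two statisticians' calculations can be checked with a small simulation. This is a sketch under stated assumptions, not the article's example data: the path strengths `a = 10` and `b = 0.5`, the baseline mean of 60, the noise standard deviation of 5, and the sample size are hypothetical. The ANCOVA estimate is computed as the difference in mean gain adjusted by the pooled within-group slope of gain on baseline weight.

```python
import random
from statistics import mean

rng = random.Random(0)
a, b = 10.0, 0.5       # hypothetical strengths of G -> W0 and W0 -> W1
c = a * (1 - b)        # G -> W1, fixed by the constraint c + ab = a

def simulate(group, n):
    """Return (baseline weight, weight gain) pairs for one group (0 or 1)."""
    rows = []
    for _ in range(n):
        w0 = 60 + a * group + rng.gauss(0, 5)
        # Intercept 60 * (1 - b) keeps the mean weight unchanged in group 0.
        w1 = 60 * (1 - b) + c * group + b * w0 + rng.gauss(0, 5)
        rows.append((w0, w1 - w0))
    return rows

def means(rows):
    return mean(r[0] for r in rows), mean(r[1] for r in rows)

def within_sums(rows):
    """Within-group sum of squares of baseline and cross-product with gain."""
    mx, my = means(rows)
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    return sxx, sxy

females, males = simulate(0, 5000), simulate(1, 5000)
(mx_f, my_f), (mx_m, my_m) = means(females), means(males)

# Statistician 1: difference in mean weight gain (the t-test comparison).
unadjusted = my_m - my_f

# Statistician 2: ANCOVA, adjusting for baseline with the pooled within-group slope.
sxx_f, sxy_f = within_sums(females)
sxx_m, sxy_m = within_sums(males)
slope = (sxy_f + sxy_m) / (sxx_f + sxx_m)
adjusted = unadjusted - slope * (mx_m - mx_f)

print("unadjusted difference in gain:", round(unadjusted, 2))
print("baseline-adjusted difference: ", round(adjusted, 2), "(target c =", c, ")")
```

Up to sampling noise, the unadjusted difference should be close to the total effect $ab-a+c = 0$ and the adjusted difference close to the conditional effect $c = a(1-b)$.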

Figure 3 Directed acyclic graphs for (a) Lord’s Paradox: $G$ represents gender; ${W}_0$ and ${W}_1$ denote the baseline body weight and final weight, respectively; $Y$ is the weight gain and (b) NMA models: $D$ represents study design (XY vs XZ); ${T}_B$ and ${T}_I$ denote the effects of baseline treatment $X$ and the intervention treatment (Y or Z); $E$ is the difference in the effects between $Y$ and $Z$.

Figure 3b shows an analogous DAG for NMA. Node $D$ represents design (XY vs XZ), which influences the effects of both the baseline treatment $X$ (${T}_B$) and the intervention treatment Y or Z (${T}_I$). ${T}_B$ is a cause of ${T}_I$ and of the difference in the effects of $Y$ and $Z$ ($E$), and ${T}_I$ is a cause of $E$. Suppose the average relative treatment effect (difference between treatments) is the same in designs XY and XZ, so the effects of Y and Z relative to X do not differ, on average. Then the effects of $D$ on ${T}_B$ and ${T}_I$ are the same, which implies

$$\begin{align*}c+ ab = a,\mathrm{thus}\;c = a\left(1-b\right)\end{align*}$$

The CBM considers the total effect of $D$ on $E$, the sum of three paths from $D$ to $E$: $D\to {T}_B\to {T}_I\to E$, $D\to {T}_B\to E$, and $D\to {T}_I\to E$, so the total effect is $ab-a+c = c-a\left(1-b\right) = 0$. The BM estimates the conditional effect of $D$ on $E$ by blocking paths through ${T}_B$; this conditional effect is $c = a\left(1-b\right)$. The two models give the same result if $a = 0$ or $b = 1$, i.e., (respectively) X has the same effect in designs XY and XZ, or the regression of the intervention treatment (Y or Z) on the baseline treatment (X) has a coefficient of 1.
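The path sum can be written out term by term as a check. In this sketch the coefficients $+1$ on ${T}_I\to E$ and $-1$ on ${T}_B\to E$ are inferred from the stated total $ab-a+c$, and the values $a = 2$ and $b = 0.6$ are hypothetical, chosen only to make the arithmetic concrete.

$$\begin{align*}\text{total effect} &= \underbrace{a\cdot b\cdot 1}_{D\to {T}_B\to {T}_I\to E}+\underbrace{a\cdot \left(-1\right)}_{D\to {T}_B\to E}+\underbrace{c\cdot 1}_{D\to {T}_I\to E} = ab-a+c,\\ c &= a\left(1-b\right) = 2\left(1-0.6\right) = 0.8,\\ \text{total effect} &= 1.2-2+0.8 = 0,\qquad \text{conditional effect} = c = 0.8.\end{align*}$$

Under these assumed values, the CBM-type total effect is zero while the BM-type conditional effect is 0.8; the gap closes only as $a$ approaches 0 or $b$ approaches 1.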

Figure 3a shows that the t-test and ANCOVA give the same result if either the average baseline body weights are identical or the coefficient for regressing final body weight on baseline body weight is 1. In his 1967 article, FM Lord was more concerned about the latter, as this coefficient tended to be less than 1 because of natural fluctuations in body weights and measurement errors, causing imperfect correlation between two body weight measurements. In an NMA, even if the differences in the average treatment effects between X and Y and between X and Z are the same, the correlations (across studies) between treatment effects are unlikely to be 1, so the CBM and BM give different results.

7 Discussion

7.1 Comparing the two NMA models

When the baseline effects have similar distributions in the different trial designs, the CBM and BM yield similar results. This is analogous to using the t-test or ANCOVA to analyze change scores from a randomized trial: both tests yield the same results because the treatment groups have similar baseline values.²⁰ The key question, therefore, is which model is more appropriate when trial designs have different baseline effects.

Variation between trials in the effect modifiers of individual patients can lead to differences in baseline effects. Including study-level effect modifiers in the analysis may reduce unexplained variation between trials, but caution is still necessary. Moreover, some effect modifiers may be unknown or unavailable. Also, trials with similarly distributed effect modifiers may still have different baseline effects. If the BM and CBM show substantially different results, this suggests that baseline effects are heterogeneous across trial designs. Caution is therefore needed in interpreting results from either model, as this heterogeneity may reflect an imbalance in the distribution of some effect modifiers, resulting in violation of the transitivity assumption. We may need to reassess the eligibility of individual trials against the prespecified criteria in the systematic review’s protocol.

Figure 3’s DAGs offer another perspective on the differences between the two models. The BM estimates the adjusted difference between Y and Z in the treatment effects conditional on the effect of X, while the CBM estimates the total, unadjusted difference. Whether the adjusted or unadjusted difference is more clinically meaningful is likely to be context specific. For instance, suppose the effect modifiers are distributed similarly in trials of different designs, while the effects of X are on average slightly higher in design XZ than in design XY. The BM may give a more precise estimate for the difference between Y and Z by adjusting for the heterogeneity in X’s effect. In contrast, if patients’ characteristics differ substantially in designs XZ and XY, neither of these two models can provide a definitive answer about the difference between Y and Z.

7.2 Different formulations of NMA models and selection of the baseline treatment

In our discussion of the two NMA models, the treatment X was included in every trial, so it was the natural candidate for both the baseline and reference treatment. However, it is rare for a treatment to be included in all trials, so our single-level formulation of the BM in Equations (10)–(12), with treatment contrasts as the unit of analysis, is not applicable to most NMAs. Nevertheless, we used this formulation to show the similarity between these NMA models and Lord’s Paradox. A more practical formulation of these NMA models uses treatment arms as the unit of analysis, in which case any treatment can be the reference or baseline treatment (if Model 2’s specification is used; see the Supplementary Material).

In a previous study, Shih and Tu⁶ used structural equation modeling to show that the CBM is a BM with the variance of the random study effects approaching infinity. This implies that results from the BM are not affected by the selection of reference treatment. There are, in fact, two versions of the BM that differ in their specification of random effects.⁵,⁶ The choice of a reference treatment affects the results in one specification but not the other. The Supplementary Material gives a more thorough technical discussion of the BM’s two specifications and of how the choice of the reference treatment affects their results differently.
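A simple normal-normal calculation gives some intuition for the limiting relationship between the CBM and the BM mentioned above. It is only a sketch under assumptions that are not in the original, and it is not the structural equation formulation of Shih and Tu: $x_i$ is trial $i$’s observed baseline level with sampling variance ${\sigma}_i^2$, the true levels follow ${\theta}_i\sim N\left(\mu, {\tau}^2\right)$, and $\mu$ and ${\tau}^2$ are treated as known.

$$\begin{align*}{\hat{\theta}}_i = \mu +{B}_i\left({x}_i-\mu \right),\qquad {B}_i = \frac{\tau^2}{\tau^2+{\sigma}_i^2}\to 1\kern0.5em \mathrm{as}\kern0.5em {\tau}^2\to \infty .\end{align*}$$

In this simplified setting, as the random-effects variance grows the shrinkage weight approaches 1, so each trial’s baseline level is estimated by its own observed value, which is what treating the baseline levels as fixed effects does.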

Finally, although we used a specific NMA to demonstrate the analogy between Lord’s Paradox and the two NMA models, the conclusions and implications of this analogy extend beyond the specific NMA. For instance, we showed that the CBM uses observed treatment contrasts as outcomes, while the BM uses adjusted treatment contrasts. Thus, if the baseline treatment’s effect varies substantially across different designs of trials, these two models may yield different results. We refer readers to our previous article on bias propagation in NMA for simulations and a complex example in which results from these models can differ greatly.²¹

8 Conclusion

This article used Lord’s Paradox to provide a framework for comparing the CBM and BM for NMA. These two models make different assumptions about the relationships between baseline and relative treatment effects. When they yield substantially different results, we need to be cautious in interpreting either model’s results.

Author contributions

Y.-K.T. conceived the idea for this project. Y.-K.T. and J.S.H. contributed to the design of the study and interpretation of the results. Y.-K.T. wrote the first draft of the manuscript, and J.S.H. prepared the Supplementary Material. J.S.H. contributed to critical revisions of the draft. Both authors approved the final version of the article.

Competing interest statement

The authors declare that no competing interests exist.

Data availability statement

The R code for generating the example data used in this study is available in the Supplementary Material.

Funding statement

This study was partly supported by a grant from the National Science and Technology Council in Taiwan (Grant No. NSTC 112-2314-B-002-210-MY3).

Supplementary material

To view supplementary material for this article, please visit http://doi.org/10.1017/rsm.2025.10036.

Footnotes

This article was awarded Open Data and Open Materials badges for transparent practices. See the Data availability statement for details.

References

1. Dias, S, Sutton, AJ, Ades, AE, Welton, NJ. Evidence synthesis for decision making 2: a generalized linear modeling framework for pairwise and network meta-analysis of randomized controlled trials. Med Decis Mak. 2013;33(5):607–617. https://doi.org/10.1177/0272989x12458724.

2. Lu, G, Ades, AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med. 2004;23(20):3105–3124. https://doi.org/10.1002/sim.1875.

3. Salanti, G. Indirect and mixed-treatment comparison, network, or multiple-treatments meta-analysis: many names, many benefits, many concerns for the next generation evidence synthesis tool. Res Synth Methods. 2012;3(2):80–97. https://doi.org/10.1002/jrsm.1037.

4. Hong, H, Chu, HT, Zhang, J, Carlin, BP. A Bayesian missing data framework for generalized multiple outcome mixed treatment comparisons. Res Synth Methods. 2016;7(1):6–22. https://doi.org/10.1002/jrsm.1153.

5. White, IR, Turner, RM, Karahalios, A, Salanti, G. A comparison of arm-based and contrast-based models for network meta-analysis. Stat Med. 2019;38(27):5197–5213. https://doi.org/10.1002/sim.8360.

6. Shih, M-C, Tu, Y-K. Evaluating network meta-analysis and inconsistency using arm-parameterized model in structural equation modeling. Res Synth Methods. 2019;10(2):240–254. https://doi.org/10.1002/jrsm.1344.

7. Tu, Y-K. Use of generalized linear mixed models for network meta-analysis. Med Decis Mak. 2014;34(7):911–918. https://doi.org/10.1177/0272989x14545789.

8. Hong, H, Chu, H, Zhang, J, Carlin, BP. Rejoinder to the discussion of “a Bayesian missing data framework for generalized multiple outcome mixed treatment comparisons,” by S. Dias and A. E. Ades. Res Synth Methods. 2016;7(1):29–33. https://doi.org/10.1002/jrsm.1186.

9. Dias, S, Ades, AE. Absolute or relative effects? Arm-based synthesis of trial data. Res Synth Methods. 2016;7(1):23–28. https://doi.org/10.1002/jrsm.1184.

10. Lord, FM. A paradox in the interpretation of group comparisons. Psychol Bull. 1967;68:304–305. https://doi.org/10.1037/h0025105.

11. Lord, FM. Statistical adjustments when comparing preexisting groups. Psychol Bull. 1969;72:337–338.

12. Dias, S, Welton, NJ, Sutton, AJ, Ades, AE. Evidence synthesis for decision making 5: the baseline natural history model. Med Decis Mak. 2013;33(5):657–670. https://doi.org/10.1177/0272989X13485155.

13. Holland, PW, Rubin, DB. On Lord’s paradox. In: Wainer H, Messick S, eds. Principles of Modern Psychological Measurement. Routledge; 1983.

14. Shiffrin, RM. Lord’s paradox: a commentary on causal inference. Ann Biomet Biostat. 2020;5(1):1034. https://doi.org/10.47739/2374-0116/1034.

15. Tu, YK, Gunnell, DJ, Gilthorpe, MS. Simpson’s paradox, Lord’s paradox, and suppression effects are the same phenomenon - the reversal paradox. Emerg Themes Epidemiol. 2008;5:2.

16. Wainer, H. Adjusting for differential base rates: Lord’s paradox again. Psychol Bull. 1991;109:147–151. https://doi.org/10.1037/0033-2909.109.1.147.

17. Reichardt, CS. Regression facts and artifacts. Eval Program Plann. 2000;23(4):411–414. https://doi.org/10.1016/S0149-7189(00)00030-6.

18. White, IR, Barrett, JK, Jackson, D, Higgins, JP. Consistency and inconsistency in network meta-analysis: model estimation using multivariate meta-regression. Res Synth Methods. 2012;3(2):111–125. https://doi.org/10.1002/jrsm.1045.

19. Pearl, J. Lord’s Paradox revisited – (Oh Lord! Kumbaya!). J Causal Infer. 2016;4(2). https://doi.org/10.1515/jci-2016-0021.

20. Senn, S. Change from baseline and analysis of covariance revisited. Stat Med. 2006;25(24):4334–4344. https://doi.org/10.1002/sim.2682.

21. Li, H, Shih, M-C, Song, C-J, Tu, Y-K. Bias propagation in network meta-analysis models. Res Synth Methods. 2023;14(2):247–265. https://doi.org/10.1002/jrsm.1614.

Table 1 Summary statistics for the hypothetical body weight data with 100 male and 100 female students