One-step parametric network meta-analysis models using the exact likelihood that allow for time-varying treatment effects

Harlan Campbell; Dylan Maciel; Keith Chan; Jeroen P. Jansen; Sven Klijn; Kevin Towle; Bill Malcolm; Shannon Cope

doi:10.1017/rsm.2025.21

One-step parametric network meta-analysis models using the exact likelihood that allow for time-varying treatment effects

Published online by Cambridge University Press: 15 May 2025

Keith Chan ,

Sven Klijn ,

Bill Malcolm and

Harlan Campbell: Affiliation:
Health Economics and Outcomes Research, Precision AQ, Vancouver, BC, Canada
Dylan Maciel: Affiliation:
Health Economics and Outcomes Research, Precision AQ, Vancouver, BC, Canada
Keith Chan: Affiliation:
Health Economics and Outcomes Research, Precision AQ, Vancouver, BC, Canada
Jeroen P. Jansen: Affiliation:
Health Economics and Outcomes Research, Precision AQ, Vancouver, BC, Canada
Sven Klijn: Affiliation:
Bristol Myers Squibb, Princeton, NJ, USA
Kevin Towle: Affiliation:
Health Economics and Outcomes Research, Precision AQ, Vancouver, BC, Canada
Bill Malcolm: Affiliation:
Bristol Myers Squibb, Uxbridge, UK
Shannon Cope*: Affiliation:
Health Economics and Outcomes Research, Precision AQ, Vancouver, BC, Canada
*: Corresponding author: Shannon Cope; Email: shannon.cope@precisionvh.com

Article contents

Abstract
Highlights
Introduction
A parametric IPD-NMA model for TTE outcomes
Simulation study
Illustrative examples with existing empirical data
Discussion
Author contributions
Competing interest statement
Data availability statement
Funding statement
Footnotes
References

Rights & Permissions

Abstract

The importance of network meta-analysis (NMA) methods for time-to-event (TTE) that do not rely on the proportional hazard (PH) assumption is increasingly recognized in oncology, where clinical trials evaluating new interventions versus standard comparators often violate this assumption. However, existing NMA methods that allow for time-varying treatment effects do not directly leverage individual events and censor times that can be reconstructed from Kaplan–Meier curves, which may be more accurate than discrete hazards. They are also challenging to implement given reparameterizations that rely on discrete hazards. Additionally, two-step methods require assumptions regarding within-study normality and variance. We propose a one-step fully Bayesian parametric individual patient data (IPD)-NMA model that fits TTE data with the exact likelihood and allows for time-varying treatment effects. We define fixed or random effects with the following distributions: Weibull, Gompertz, log-normal, log-logistic, gamma, or generalized gamma distributions. We apply the one-step model to a network of randomized controlled trials (RCTs) evaluating multiple interventions for advanced melanoma and compare results with those obtained with the two-step approach. Additionally, a simulation study was performed to compare the proposed one-step method to the two-step method. The one-step method allows for straightforward model selection among the “standard” distributions, now including gamma and generalized gamma, with treatment effects on either the scale alone or with multivariate treatment effects. Generalized gamma offers flexibility to model U-shaped hazards within a network of RCTs, with accessible interpretation of parameters that simplifies to exponential, Weibull, log-normal, or gamma in special cases.

Keywords

comparative effectiveness research evidence synthesis network meta-analysis parametric models time-to-event outcomes

Information

Type: Research Article
Information: Research Synthesis Methods , Volume 16 , Issue 4 , July 2025 , pp. 650 - 671

DOI: https://doi.org/10.1017/rsm.2025.21 [Opens in a new window]
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Open Practices: Open data Open materials
Copyright: © Bristol-Myers Squibb Company, 2025. Published by Cambridge University Press on behalf of The Society for Research Synthesis Methodology

Highlights

What is already known

Network meta-analysis (NMA) methods that do not rely on the proportional hazard assumption (for time-to-event outcomes) are often required in oncology. Individual event and censor times from clinical trials can typically be reconstructed from published Kaplan–Meier curves, but existing NMA methods used to synthesize these trials do not directly leverage this individual-level data.

What is new

We propose using a one-step fully Bayesian parametric NMA model that leverages individual event and censor times with the exact likelihood and allows for time-varying treatment effects. By using the exact likelihood specification, this model offers a flexible fit while avoiding certain assumptions/approximations required with existing approaches. The Bayesian framework ensures that model selection and interpretation is straightforward and efficient computation for fitting the model can be done using Stan software.

Potential impact for RSM readers outside the authors’ field

Flexible NMA methods are particularly important when there are differences in the survival distributions for the different treatments being compared, and relative treatment effects need to be extrapolated beyond the available trial data for a cost-effectiveness analysis. In this context, the one-step parametric NMA model offers a valuable tool that leverages individual-level data.

1 Introduction

Network meta-analysis (NMA) is a widely used statistical technique that expands upon standard pairwise meta-analysis, enabling simultaneous direct and indirect comparisons of multiple interventions within a unified statistical model.Reference Ades¹^– Reference Lu and Ades³ Given that randomized controlled trials (RCTs) in oncology rarely compare the new intervention to all relevant comparators, NMAs are often needed to synthesize time-to-event outcomes (TTE), such as progression-free survival (PFS) and overall survival (OS).Reference Ades, Welton, Dias, Phillippo and Caldwell⁴

The synthesis of TTE outcomes is typically based on hazard ratios (HRs)Reference Fay and Li⁵ derived from the Cox proportional hazards (PH) model.Reference Williamson, Smith, Hutton and Marson⁶^, Reference Batson, Greenall and Hudson⁷ However, for many reasons, the PH assumption may be implausible,Reference Li, Han, Hou, Chen and Chen⁸^– Reference Royston and Parmar¹⁰ for example, due to treatment effects that vary over time, treatment crossover, delayed treatment effects, competing risks, time-varying covariates, and differential dropout.Reference Magirr¹¹^– Reference Rahman, Fell and Ventz¹³ When the PH assumption does not hold, NMAs that rely on trial-specific HRs may yield biased estimates that result in poor predictions,Reference Cope, Chan and Campbell¹⁴ which are important for cost-effectiveness modeling.

Several methods have been proposed as alternatives to NMAs based on reported HRs, which have been reviewed by Cope et al.Reference Cope, Chan and Campbell¹⁴ Crowther et al.Reference Crowther, Riley, Staessen, Wang, Gueyffier and Lambert¹⁵ proposed a model for individual patient data (IPD) meta-analysis using Poisson regression models that allows for non-proportional hazards by prespecifying specific time-points at which the HR between two treatments is allowed to change. Freeman and CarpenterReference Freeman and Carpenter¹⁶ proposed NMA models using cubic splines to model the log-cumulative hazard functions. While flexible, these cubic spline models are restricted to HRs that specifically vary with log-time. Another possibility, proposed by Petit et al.Reference Petit, Blanchard, Pignon and Lueza¹⁷ is to consider the difference in restricted mean survival time at a prespecified time horizon as an alternative to the HR and to fit a standard univariate NMA. While practical, prespecifying the time horizon of interest in accordance with clinical interest may be challenging.

Ouwens et al.Reference Ouwens, Philips and Jansen¹⁸ and JansenReference Jansen¹⁹ proposed using multivariate parametric NMA models that allow for the HR to change over time. A single parametric survival distribution is used to model interventions in each trial with multivariate relative treatment effects, which are pooled and indirectly compared across trials. This means that interventions can impact both the shape and scale of a selected distribution, resulting in a time-varying treatment effect, which improves model fit and predictions. However, these models were proposed prior to the development of an algorithm to reconstruct event and censor times from the published Kaplan–Meier (KM) curves.Reference Guyot, Ades, Ouwens and Welton²⁰ Consequently, they derived discrete hazards from “binned” events over time assuming a binomial likelihood per interval, which may be less accurate than assuming the actual event and censor times follow the likelihood for a particular distribution. Moreover, parameters for these models were transformed to a linear scale, and as a consequence, implementation of these models can be challenging given that the parameterization does not align with standard software (i.e., flexsurv in R).Reference Salanti²¹

To address these limitations, Cope et al.Reference Cope, Chan and Jansen²² proposed a two-step NMA approach, first fitting parametric survival distributions using frequentist maximum likelihood estimation to each arm of each trial (using the flexsurv R packageReference Jackson²³). In the second step, the arm-specific parameter estimates, their standard errors, and the correlation between the parameters were synthesized using a Bayesian multivariate Normal NMA.Reference Achana, Cooper and Bujkiewicz²⁴

Burke et al.Reference Burke, Ensor and Riley²⁵ and Debray et al.Reference Debray, Moons, Abo-Zaid, Koffijberg and Riley²⁶ discuss the pros and cons of two-step approaches for IPD meta-analysis relative to one-step approaches. Specifically, the two-step approach for NMA by Cope et al.Reference Cope, Chan and Jansen²² offers practical benefits in terms of using established statistical software packages and possibly improving computational burden.Reference Cope, Chan and Campbell¹⁴^, Reference Freeman, Cooper, Sutton, Crowther, Carpenter and Hawkins²⁷ However, it was only applied to a subset of relevant standard distributions (Weibull, Gompertz, log-normal, and log-logistic), with parameters transformed to be on a linear scale as in Ouwens et al.Reference Ouwens, Philips and Jansen¹⁸ It also relies on bootstrapping to account for parameter correlationReference Debray, Moons, Abo-Zaid, Koffijberg and Riley²⁶^, Reference Kanters, Karim, Thorlund, Anis and Bansback²⁸ and requires assumptions regarding within-study normality and variance.Reference Burke, Ensor and Riley²⁵^, Reference Jackson and White²⁹ Given that these assumptions can result in certain unexpected biases, Jackson and WhiteReference Jackson and White²⁹ recommend that, when possible, to use one-step methods that make fewer normality assumptions. Therefore, there is a need for one-step NMA models with time-varying treatment effects that can also be applied to distributions such as gamma and generalized gamma that relate the available IPD (whether available or reconstructed) directly to the likelihoods of interest without complex reparameterizations.

In this article, a one-step parametric IPD-NMA framework for TTE outcomes is proposed, which allows for time-varying (i.e., multivariate) treatment effects with the exact likelihood. In Section 2, we extend existing one-step PH models (exponential, Weibull, and Gompertz) or accelerated failure time models (Weibull, log-logistic, gamma, log-normal, and generalized gamma)Reference Freeman, Cooper, Sutton, Crowther, Carpenter and Hawkins²⁷^, Reference Saramago, Chuang and Soares³⁰^, Reference Phillippo³¹ to have time-varying treatment effects. In Section 3, we evaluate the proposed method in a simulation study, and in Section 4, we apply the model to a network of RCTs evaluating multiple interventions for advanced melanoma regarding OS. We conclude in Section 5 with a general discussion, where we explain that proposed models provide the foundation for extensions to the multilevel network meta-regression (ML-NMR) framework to synthesize IPD and aggregate data (AD), while adjusting for potential prognostic factors and effect modifiers.Reference Phillippo³¹

2 A parametric IPD-NMA model for TTE outcomes

We follow the parametrization used by PhillippoReference Phillippo³¹ and adopt the reference treatment parameterization defining ${t}_{ijk}$ and ${y}_{ijk}$ as the event time and censoring indicator, respectively, of individual i, in study j, with treatment k, for i in 1,…,N, for j in 1,…,J, and for k in 1,…,K. Importantly, k = 1 is considered the “reference treatment.” We outline the logic and formulas for the Weibull, which can be extended to other standard distributions.

PhillippoReference Phillippo³¹ defines a PH Weibull likelihood for IPD (without covariates) as

(1)

$$\begin{align}{L}_{i,j,k}\left(t,{y}_{i,j,k}\right)={S}_{j,k}(t){\lambda}_{j,k}{(t)}^{y_{i,j,k}},\end{align}$$

where the hazard function is

(2)

$$\begin{align}{\lambda}_{j,k}(t)={\upsilon}_j{t}^{\left({\upsilon}_j-1\right)}{\alpha}_j\exp \left({\gamma}_k\right)\end{align}$$

and the survivor function is

(3)

$$\begin{align}{S}_{j,k}(t)=\exp \left(-{\alpha}_j\exp \left({\gamma}_k\right){t}^{\upsilon_j}\right),\end{align}$$

where ${\gamma}_1$ = 0, as treatment 1 is the “reference treatment.” As such, each study has a unique baseline hazard function ( ${\lambda}_{j,1}(t)={\upsilon}_j{t}^{\upsilon_j-1}{\alpha}_j$ ) defined by the shape parameter, ${\upsilon}_j$ , and scale parameter, ${\alpha}_j$ , for study j, in 1,…,J. This specification restricts the scale and shape parameters to ${\alpha}_j$ > 0 and ${\upsilon}_j$ > 0, for j in 1,…,J, and for k in 1,…,K. It is noteworthy that when the shape parameter, ${\upsilon}_j,$ is <1, the hazard rate decreases over time, whereas when ${\upsilon}_j$ is >1, the hazard rate increases over time. When ${\upsilon}_j=1$ , the hazard rate is constant over time, and the Weibull reduces to the exponential distribution.

The treatment effect is represented by ${\gamma}_k$ , and PhillippoReference Phillippo³¹ proposes that priors for the model parameters can be specified as

(4)

$$\begin{align}\log \left({\unicode{x3b1}}_{\mathrm{j}}\right)\sim \mathrm{Normal}\big(0,{100}^2\big),\mathrm{for}\;{j}\;\mathrm{in}\;1,\dots, {J};\end{align}$$

(5)

$$\begin{align}{\upsilon}_{\mathrm{j}}\sim \mathrm{Uniform}\left(0,+\infty \right),\mathrm{for}\;{j}\;\mathrm{in}\;1,\dots, {J};\mathrm{and}\end{align}$$

(6)

$$\begin{align}{\unicode{x3b3}}_{\mathrm{k}}\sim \mathrm{Normal}\big(0,{100}^2\big),\mathrm{for}\;{k}\;\mathrm{in}\;2,\dots, {K}.\end{align}$$

We extend the PH Weibull model proposed by PhillippoReference Phillippo³¹ by defining the hazard function as:

(7)

$$\begin{align}{\lambda}_{j,k}(t)={\upsilon}_{j,k}{t}^{\left({\upsilon}_{j,k}-1\right)}{\alpha}_{j,k}\end{align}$$

and the survivor function as

(8)

$$\begin{align}{S}_{j,k}(t)=\exp \!\left(-{\alpha}_{j,k}{t}^{\upsilon_{j,k}}\right),\end{align}$$

where ${\alpha}_{j,k}$ is the scale parameter and ${\upsilon}_{j,k}$ is the shape parameter for the k-th treatment in the j-th study. An NMA model with an arm-based likelihoodReference Phillippo³¹ is then specified as

(9)

$$\begin{align}\left(\genfrac{}{}{0pt}{}{\log \left({\alpha}_{j,k}\right)}{\log \left({\upsilon}_{j,k}\right)}\right)=\left(\genfrac{}{}{0pt}{}{\mu_{1,j}}{\mu_{2,j}}\right)+\left(\genfrac{}{}{0pt}{}{\delta_{1,j,k}}{\delta_{2,j,k}}\right),\end{align}$$

where ${\delta}_{1,j,1}={\delta}_{2,j,1}=0$ , for j in 1,…,J. It is noteworthy that, in this notation, the first subscript used for $\mu$ and $\delta$ differentiates between the shape and scale parameters. Also note that the PH Weibull model from Phillippo³¹ can be recovered by setting: ${\delta}_{1,j,k}={\gamma}_k$ , ${\mu}_{1,j}={\alpha}_j$ , $\exp \left({\mu}_{2,j}\right)={\upsilon}_j$ , and $\exp \left({\delta}_{2,j,k}\right)=1.$

A fixed-effect (FE) NMA model sets ${\delta}_{1,j,k}$ = ${d}_{1,k}$ and ${\delta}_{2,j,k}$ = ${d}_{2,k}$ , for k in 1,…,K, so that treatment effects are not study-specific. Alternatively, a random-effects (RE) NMA model is specified with study-specific treatment effects such that, for j in 1,…,J, we define the following multivariate normal distribution:

(10)

$$\begin{align}\left(\begin{array}{c}{\delta}_{1,j,2}\\ {}{\delta}_{2,j,2}\\ {}\begin{array}{c}\vdots \\ {}{\delta}_{1,j,K}\\ {}{\delta}_{2,j,K}\end{array}\end{array}\right)\sim \mathrm{MVN}\left(\left(\begin{array}{c}{d}_{1,2}\\ {}{d}_{2,2}\\ {}\begin{array}{c}\vdots \\ {}{d}_{1,K}\\ {}{d}_{2,K}\end{array}\end{array}\right),{\Sigma}_{\delta}\right),\end{align}$$

where ${\Sigma}_{\delta }$ , the 2(K−1) by 2(K−1) covariance matrix (following Cope et al.Reference Cope, Chan and Jansen²²), is defined as

(11)

For both the FE and RE models, the ${d}_{1,k}$ and ${d}_{2,k}$ parameters represent the relative effect of treatment k versus treatment 1 within each of the J study populations (see White et al.Reference White, Turner, Karahalios and Salanti³²).

All Bayesian models require defining priors for all model parameters. For j in 1,…,J, and for k in 2,…,K, wide Normal priors with a mean of 0 and a variance of 1,000 can be specified for ${\mu}_{1,j}$ , ${\mu}_{2,j}$ and for ${d}_{1,k}$ and ${d}_{2,k}$ (also following Cope et al.Reference Cope, Chan and Jansen²²):

$$\begin{align*}{\mu}_{1,j}\sim \mathrm{Normal}\left(0,\mathrm{1,000}\right)\!,\\{\mu}_{2,j}\sim \mathrm{Normal}\left(0,\mathrm{1,000}\right)\!,\\{d}_{1,k}\sim \mathrm{Normal}\left(0,\mathrm{1,000}\right)\!,\mathrm{and}\\{d}_{2,k}\sim \mathrm{Normal}\left(0,\mathrm{1,000}\right)\!.\end{align*}$$

In the RE NMA, a prior is also needed for $\Sigma$ (or alternatively for the individual ${\sigma}_1$ , ${\sigma}_2$ , and $\rho$ parameters). Following Cope et al.Reference Cope, Chan and Jansen²² and Jansen Reference Jansen¹⁹ we can define the following Inverse-Wishart prior:

$$\begin{align*}\Sigma \sim \mathrm{IW}\left(P,2\right),\end{align*}$$

where P is a given 2 by 2 scale matrix.

It is noteworthy that, alternatively, a somewhat less complex RE model could be defined whereby RE are specified for only one of the two treatment effect parameters. This simpler model could be useful when there are reasons to believe that one parameter varies between studies while the rest remain relatively constant. For example, a model could define study-specific treatment effects with respect to the scale parameter, while the shape parameter would remain fixed across studies. For instance, one could define, for k in 1, …,K, ${\delta}_{2,j,k}$ = ${d}_{2,k}$ , and for j in 1,…, J:

(12)

We also note that alternative prior specifications are also possible, and various parameterizations of the variance–covariance matrix specified in equations (11) and (12) could be considered (see Wei and HigginsReference Wei and Higgins³³). Our multivariate Weibull NMA model can be easily generalized to other parametric distributions. We outline standard parametric models in Table 1, including the Weibull, Gompertz, log-normal, log-logistic, gamma, and generalized gamma distributions. The generalized gamma distribution is characterized by three parameters and in our NMA model; only two of these three parameters are study- and treatment-specific. The third parameter, Q, is assumed to have the same value for all individuals across all studies and treatments within the network.

Table 1 Survival and hazard functions for standard parametric survival models

Note: For reference, the parameterization used (“standard parameterization”) corresponds to that used in the flexsurv R packageReference Jackson²³ following the equivalences listed in the “Parameters” columns. Recall that our model specification (equation (9)) restricts the scale and shape parameter to be strictly positive (i.e., ${\alpha}_{j,k}$ > 0 and ${\upsilon}_{j,k}$ > 0, for j in 1,…,J, and for k in 1,…,K). Note that $\phi \left(\bullet \right)$ and $\varPhi \left(\bullet \right)$ denote the probability density function and the cumulative density function of the standard normal distribution, respectively; $\varGamma \left(\bullet \right)$ and $\gamma \left(\bullet, \bullet \right)$ denote the gamma function and the incomplete gamma function, respectively; and ${\rho}_{j,k}(t)=\left[\mathit{\log}(t)-\mathit{\log}\left({\alpha}_{j,k}\right)\right]/{\upsilon}_{j,k}$ .

3 Simulation study

In this section, we consider a simple simulation study to investigate the validity of the proposed one-step model and compare it to the two-step model. In this simulation study, both the proposed one-step model and two-step model from Cope et al.Reference Cope, Chan and Jansen²² are used to analyze 5,000 simulated datasets, assuming FE with the Weibull distribution. Jackson and WhiteReference Jackson and White²⁹ suggest that a one-step approach may be preferable due to the “hidden normality” assumptions required with a two-step approach, which can, at least in theory, “have serious implications for the accuracy of the resulting statistical inference.” The simulation study is based on the work in Section 7.3 of Phillippo,Reference Phillippo³¹ where a single simulated dataset is considered consisting of two RCTs: one trial comparing treatments A versus B, and the other trial comparing treatments A versus C. Here, we extend the network to include a third trial comparing treatments C versus D. In the Supplementary Material, additional analyses are presented to demonstrate how the proposed one-step model fits when different distributions are assumed.

3.1 Data-generating mechanism

As illustrated in Figure 1, each artificial dataset includes survival data for three two-arm RCTs: Study j = 1 compares treatment k = 2 to k = 1 (henceforth, “Study AB” comparing intervention B with A); Study j = 2 compares treatment k = 3 to k = 1 (“Study AC”); and Study j = 3 compares treatment k = 4 to k = 3 (“Study CD”). Each study consists of N_study individuals randomly assigned (with a 1:1 randomization ratio) to one of the two treatment arms. We considered two scenarios: (1) with N_study = 36 and (2) with N_study = 100.

Figure 1 Network diagram of artificial randomized controlled trials.

We simulated survival times to obtain time-varying treatment effects. Specifically, survival times were simulated from Weibull distributions corresponding to the FE Weibull model outlined in equations (7)–(9) using the cumulative distribution function inversion method (as implemented in the R package simsurvReference Brilleman, Wolfe, Moreno-Betancur and Crowther³⁴) with the following parameters: ${\mu}_{1,1}={\mu}_{1,2}={\mu}_{1,3}=1.5$ , ${\mu}_{2,1}={\mu}_{2,2}={\mu}_{2,3}=0.5, {d}_{1,2}=1.2$ , ${d}_{2,2}=0.6,{d}_{1,3}=0.5,$ ${d}_{2,3}=-0.3,{d}_{1,4}=-0.5,$ and ${d}_{2,4}=-0.7$ .

Potential censoring times were simulated independently of event times from a Uniform(0,1) distribution, and a random 10% of individuals were selected for potential censoring. An individual was ultimately censored if their survival time was greater than their potential censoring time. All individuals who had not experienced an event by 1 year were censored at 1 year. KM plots illustrate the event and censor times for one of the simulated datasets in Figure 2.

Figure 2 Kaplan–Meier survival curves for simulated event times, for each treatment (colors) in each study (panels). Censored events are marked with a cross (“+”).

3.2 Methods to estimate treatment effects based on the simulated data

The relative treatment effects for the competing interventions based on the artificial RCTs in the network were assessed using a one-step FE IPD-NMA model assuming a Weibull distribution for event times (as outlined in equations (7)–(9)). For comparison, we also fit the data using the two-step approach of Cope et al.,Reference Cope, Chan and Jansen²² also assuming an FE model and a Weibull distribution. Both the one-step and two-step models used the parameterization specified in Table 1. The parameters for these models were estimated using the Markov Chain Monte Carlo (MCMC) method using R (packages: rstan, loo, and flexsurvReference Jackson²³) and Stan,Reference Phillippo³⁵^– ³⁷ where an initial series of 2,000 iterations from the sampler was discarded (i.e., burn-in) and inferences were based on subsequent 2,000 iterations using four chains.

Results for each dataset were summarized in terms of the posterior medians and 95% credible intervals (CrIs). For each of the two N_study scenarios, across all 5,000 datasets, the mean bias, mean credible interval coverage, and mean credible interval width were calculated.

3.3 Results

Results are summarized in Table 2 and suggest that the NMA estimates obtained from both the one-step and two-step models have little to no bias and that both methods provide 95% CrIs with approximately correct coverage (i.e., the 95% CrIs obtained contain the true parameter value ~95% of the time). For almost all parameter estimates, the one-step method appears to be less biased than the two-step method. The average widths of the 95% CrIs are very similar for the one-step and two-step models, suggesting that both approaches are equally efficient. Comparing results from the N_study = 36 scenario with those from the N_study = 100 scenario, we see that with smaller sample sizes (i.e., with N_study = 36), the estimates have larger biases, and the 95% CrIs are perhaps slightly too narrow. Finally, with respect to the computational resources required, the one-step model took about twice as long to fit compared to the two-step model when N_study = 36 and about three times as long to fit when N_study = 100.

Table 2 Results from the simulation study: for each parameter, the average estimate (averaged over the 2,000 simulated datasets); the bias (average estimate—truth); 95% CrI coverage (proportion of simulation for which the 95% CrI contained the true parameter value); and average 95% CrI width (averaged over the 5,000 simulated datasets)

Note: Survival times were simulated from Weibull distributions corresponding to the FE Weibull model with the following parameters: ${\mu}_{1,1}={\mu}_{1,2}={\mu}_{1,3}=1.5$ , ${\mu}_{2,1}={\mu}_{2,2}={\mu}_{2,3}=0.5,{d}_{1,2}=1.2$ , ${d}_{2,2}=0.6,{d}_{1,3}=0.5,$ ${d}_{2,3}=-0.3,{d}_{1,4}=-0.5, \mathrm{and}\; {d}_{2,4}=-0.7$ . Bolded values indicate smaller bias value (comparing two-step vs. one-step).

Figure 3 Network of evidence for melanoma randomized controlled trials. Node size and line thickness correspond to the number of studies, including the treatment and the treatment comparison. Abbreviations: DTIC, dacarbazine; IFN, interferon (IFN).

Figure 4 Kaplan–Meier plots of the reconstructed individual event and censoring times obtained for each randomized controlled trial. Abbreviations: DTIC, dacarbazine; IFN, interferon (IFN).

4 Illustrative examples with existing empirical data

The illustrative example was based on a network of studies concerning the treatment of advanced (Stage IIIc or IV) melanoma. Details of the network have been presented previously in Jansen et al. Reference Jansen, Vieira and Cope³⁸ and Cope et al.Reference Cope, Chan and Jansen²² Notably, these studies are somewhat dated, having been published between 1991 and 2004, and we refer readers to Boutros et al.Reference Boutros, Croce and Ferrari³⁹ for a review of the current treatment landscape.

Ten RCTs identified in a systematic literature review formed a connected network, illustrated in Figure 3, evaluating four treatments: dacarbazine (DTIC) monotherapy, DTIC + Interferon (IFN), DTIC + Non-IFN, and Non-DTIC. For each treatment arm in each RCT, the reported KM curves were digitized (DigitizeIt; http://www.digitizeit.de/), and the reconstructed individual event and censoring times were obtained using the Guyot et al.Reference Guyot, Ades, Ouwens and Welton²⁰ algorithm (which was recently recommended by Saluja et al.Reference Saluja, Cheng, delos Santos and Chan⁴⁰ following an assessment of its reliability, accuracy, and precision) (see Figure 4).

For each of the 10 studies in the melanoma network, Table 3 lists the number of patients per arm, the number of OS events per arm, the median survival time per arm, and the p-value obtained from applying the Grambsch and Therneau test for PH.Reference Grambsch and Therneau⁵¹ Notably, there is evidence of non-PH in the Chiarion 1992 study, which compared DTIC and DTIC + non-IFN (p-value = 0.021), and therefore, according to Cope et al.Reference Cope, Chan and Campbell¹⁴ and recent guidance (e.g., “if the PH assumption is deemed to be implausible for one or more comparisons in the network, then (network) meta-analysis of HRs should not be carried out”⁵²), a method that does not require the PH assumption should be used for analysis.

Table 3 For each of the 10 studies in the melanoma network: the number of patients per arm, the number of overall survival events per arm, the median survival time per arm, and the p-value obtained from applying the Grambsch and Therneau test for proportional hazard (PH) assumption

4.1 Methods

We fit the one-step and two-step multivariate NMA models with both the fixed and the REs using MCMC with R and StanReference Phillippo³⁵^– ³⁷ as described in Section 3. We defined REs on both the shape and scale parameters (following equation (10)) and specified an Inverse-Wishart prior for $\Sigma$ (following equation (11)) with:

$$\begin{align*}P=\left(\begin{array}{cc}1/10& 0\\ {}0& 1/10\end{array}\right).\end{align*}$$

Consistent with Cope et al.,Reference Cope, Chan and Jansen²² we considered four different distributions for the likelihood: the Weibull, the Gompertz, the log-normal, and log-logistic. It is noteworthy that, for the two-step approach, the parameterization originally used by Cope et al.Reference Cope, Chan and Jansen²² was based on models originally proposed by Ouwens et al.Reference Ouwens, Philips and Jansen¹⁸ In contrast, we fit all models with the more familiar parameterization detailed in Table 1.

The approximate leave-one-out (LOO) information criterion (LOOIC) was used to select the most appropriate model. The LOOIC is similar to the DIC in that smaller values signal better model fit, but, unlike the DIC, it is invariant to parametrization and corresponds to a model’s predictive performance (integrating over the posterior distribution of the parameters) (see Vehtari et al.Reference Vehtari, Gelman and Gabry⁵³). Convergence for each parameter was assessed using the Gelman-Rubin statistic.Reference Gelman and Rubin⁵⁴^, Reference Gelman⁵⁵ Checks for divergent transitions were also performed (see Betancourt et al.Reference Betancourt and Girolami⁵⁶).

4.2 Results

All models converged and appeared to fit the data reasonably well (see predicted survival curves in Figures 5 and 6, and MCMC trace plots for the log-logistic models in Figures 7 and 8). However, for all models to successfully converge, the RStan default settings for the “adapt_delta” and “max_treedepth” control variables needed to be changed following current recommendations (see details in code provided in the Supplemental Materials).³⁷ On a laptop computer (Macbook Pro, with M1 chip and 16 GB memory), the two-step models all took less than a minute to fit, whereas the one-step models took between 1 and 188 minutes (with RE models taking about two and a half times longer to fit than FE models; see Table 4).

Figure 5 Fitted survival functions for all distributions from FE one-step model (Weibull, Gompertz, log-normal, and log-logistic) and Kaplan–Meier curves by treatment arm. Abbreviations: DTIC, dacarbazine; IFN, interferon (IFN).

Figure 6 Fitted study-specific survival functions (using study-specific baseline risk [ ${\boldsymbol{\mu}}_{\boldsymbol{1},\boldsymbol{j}}\;\boldsymbol{and}\;{\boldsymbol{\mu}}_{\boldsymbol{1},\boldsymbol{j}}$ ] and relative treatment effects [ ${\boldsymbol{\delta}}_{\boldsymbol{1},\boldsymbol{j},\boldsymbol{k}}\;\boldsymbol{and}\;{\boldsymbol{\delta}}_{\boldsymbol{2},\boldsymbol{j},\boldsymbol{k}}$ ]) for all distributions from the RE one-step model (Weibull, Gompertz, log-normal, and log-logistic) and Kaplan–Meier curves by treatment arm. Abbreviations: DTIC, dacarbazine; IFN, interferon (IFN).

Figure 7 MCMC trace plots for relative treatment effect parameters of the one-step FE log-logistic NMA model.

Figure 8 MCMC trace plots for relative treatment effect parameters of the one-step RE log-logistic NMA model.

The LOOIC suggests that the log-logistic distribution was most appropriate (Table 4), which is consistent with findings by Cope et al.Reference Cope, Chan and Jansen²² based on the sum of treatment-arm specific AIC values per distribution in the first step of their analysis. The parameter estimates and 95% CrIs for the one-step log-logistic random-effects (and fixed-effect) model are very similar to those obtained using the two-step NMA method (Table 5). The small discrepancies between the estimates may be explained by the assumptions required in the two-step NMA regarding within-study estimates in terms of normality and standard errors and/or by the different priors required for the one-step and two-step models. These findings were consistent across the alternative distributions for the likelihood evaluated (Table 6), which are summarized in terms of the predicted survivals in Figures 5 and 6.

Figure 9 plots the estimated survival curves from the FE and RE one-step models fit to a population with the same baseline risk as the Avril 2004 study population. Within the first 6 months, survival appears highest for DTIC + non-IFN. However, the DTIC + non-IFN and non-DTIC survival curves cross shortly after 6 months (illustrating the effect of time-varying treatment effects), and among all four treatments, long-term survival appears highest for non-DTIC.

Table 4 Computational sampling time required for each model and leave-one-out information criterion (LOOIC)

Note: The LOOIC can be used to select the most appropriate model. The distribution with the lowest LOOIC values, the log-logistic in this case (for both random effects [REs] and fixed effect [FE]), is considered the “best” model. Abbreviation: SE, standard error.

Table 5 Parameter estimates (posterior medians and 95% CrIs) obtained with random-effects (REs) and fixed-effect (FE) log-logistic NMA models: (1) two-step multivariate network meta-analysis model (Cope et al.Reference Cope, Chan and Jansen²²) versus (2) the proposed one-step IPD NMA

Table 6 Parameter estimates (posterior medians and 95% CrIs) obtained with random-effects (REs) and fixed-effect (FE) NMA models: (1) two-step multivariate network meta-analysis model (Cope et al.Reference Cope, Chan and Jansen²²) versus (2) the proposed one-step IPD NMA

Figure 9 Posterior estimates of overall survival (with shaded 95% CrIs) comparing DTIC, DTIC + IFN, DTIC + non-IFN, and Non-DTIC in the Avril 2004 population from the one-step NMA FE (top panel) and RE (bottom panel) models. Abbreviations: DTIC, dacarbazine; IFN, interferon (IFN).

Table 7 Estimates obtained from the one-step and two-step log-logistic models regarding overall survival at 1 year and at 2 years comparing DTIC + IFN and non-DTIC fit for a population with baseline risk of the Avril 2004 study population

The treatment effect estimates can be summarized in many ways, and Cope and JansenReference Cope and Jansen⁵⁷ consider how various quantitative summaries compare using the same melanoma network as an illustrative example.Reference Cope and Jansen⁵⁷ Suppose one was specifically interested in OS at 1 year and at 2 years, and was interested in comparing DTIC + IFN and non-DTIC, a treatment comparison for which there is no direct evidence. Table 7 lists the relevant estimates obtained from the one-step and two-step log-logistic models fit to a population with the same baseline risk as the Avril 2004 study population. Briefly, with the one-step FE log-logistic model, the OS at 1 year with DTIC + IFN is estimated to be 20% (95% CrI: [12%, 30%]), slightly lower than with non-DTIC: 30% (95% CrI: [24%, 37%]). The estimated difference in OS at 1 year is −10% (95% CrI: [−20%, 1%]), and at 2 years is −6% (95% CrI: [−11%, −1%]). These findings were consistent for one-step and two-step models. Credible intervals are notably wider with RE models relative to FE models.

5 Discussion

The proposed multivariate NMA models for TTE outcomes provide a one-step framework that allows for time-varying treatment effects, which leverage exact (or reconstructed) event and censor times for the studies in a connected network of evidence. By using the exact likelihood specification, we avoid the assumptions regarding within-study normality and variance required in the two-step method (Cope et al.Reference Cope, Chan and Jansen²²). This allows the entire model to be fit within a Bayesian framework, facilitating straightforward model selection and interpretation. Potential concerns regarding computational burden are mitigated by using Stan software, which provides a more efficient MCMC sampling than previous software (i.e., WinBUGS or JAGS) through the implementation of the Hamiltonian Monte Carlo algorithm (see Monnahan et al.Reference Monnahan, Thorson and Branch⁵⁸).

While we discussed using the LOOIC for model selection, often, valuable insight regarding the plausibility of different models may also be obtained from clinical expertsReference Latimer⁵⁹^– Reference Latimer⁶¹ and observational studies.Reference Ruiz-Morales, Swierkowski and Wells⁶² Moreover, when there is no reason to believe that treatment effects vary over time, simpler models with fewer parameters may be more appropriate. Cope et al.Reference Cope, Chan and Campbell¹⁴ previously proposed a stepwise process exploring standard parametric models with treatment effect on scale alone, followed by models with multivariate treatment effects (scale and shape) for each trial and network (or other more flexible models), which builds upon the process outlined by Latimer et al.Reference Latimer⁵⁹ for survival analysis of a single RCT in context of cost-effectiveness analysis. The proposed one-step models allow this process to be applied to all the “standard” distributions, now including gamma and generalized gamma. Further, the generalized gamma distribution may be particularly useful to improve fit over other standard distributions to a network of RCTs given the flexibility to model U-shaped hazards with this distributionReference Cox, Chu, Schneider and Munoz⁶³ as well as the nature of its nested distributions (i.e., exponential, Weibull, log-normal, and gamma).

The simulation study in Section 3 found that the proposed one-step method and the previously proposed two-step method provide broadly similar results. When study sample sizes are especially small, the one-step method may be preferred, in line with previous recommendations.Reference Jackson and White²⁹ However, due to finite computational resources, the simulation study was limited in that we were unable to consider fitting RE models and did not consider a wide range of sample sizes and prior specifications. More extensive simulation studies may be helpful to better understand the impact of model misspecification, the differences between fitting FE and RE models, and the differences between using one-step and two-step methods.Reference Heinze, Boulesteix, Kammer, Morris and White⁶⁴

The illustrative melanoma example highlighted an application to a network of multiple trials, where we demonstrated the feasibility of accounting for between-study heterogeneity in terms of both the shape and the scale parameters. However, simpler random-effect models specified for only one of the two treatment effect parameters may often be sufficient. We advise consulting clinicians to consider whether between-study heterogeneity is most likely to affect the scale versus the shape of the distributions. Future research should consider the merits of using different priors and parameterizations for the variance–covariance matrix,Reference Wang, Lin, Hodges and Chu⁶⁵^, Reference Alvarez, Niemi and Simpson⁶⁶ potentially based on incorporating external evidence on the between-trial heterogeneity.Reference Turner, Domínguez-Islas, Jackson, Rhodes and White⁶⁷

It may be of interest to extend the proposed methods to nonparametric IPD-NMA models. While potentially flexible, nonparametric models can be considerably more complex and may not be among the first NMA models to consider when following recommended model selection procedures in the broader cost-effectiveness framework.Reference Latimer⁵⁹ Nonetheless, there may be utility in extending the current framework to include additional nonparametric methods.

During the development of this publication, Phillippo et al. proposed the extension the ML-NMR framework for TTE data, considering networks in which IPD-level covariate data is only available for a subset of trials within the network (with aggregate-level covariate data available for the remaining studies).Reference Phillippo, Dias, Ades and Welton⁶⁸ This allows for the adjustment of prognostic factors and effect modifiers when information on these is available. Maciel et al.Reference Maciel, Cope and Towle⁶⁹ demonstrated the feasibility of applying ML-NMR for TTE outcomes with a univariate treatment effect. The main advantage of NMA models with a multivariate treatment effect as proposed is that they do not rely on a PH assumption across studies and treatment comparisons. This flexibility may be important when there are differences in the survival distributions for the treatments compared, and relative treatment effects need to be extrapolated beyond the available trial data for a cost-effectiveness analysis.

Author contributions

Concept and design: H.C., D.M., K.C., J.J., S.K., K.T., B.M., and S.C.; Statistical analysis: D.M. and H.C.; Analysis and interpretation of results: H.C., D.M., K.C., J.J., S.K., K.T., B.M., and S.C.; Drafting of the manuscript: H.C. and S.C.; Critical revision of the paper for important intellectual content: H.C., D.M., K.C., J.J., S.K., K.T., B.M., and S.C.; Obtaining funding: S.K. and S.C.; Administrative, technical, or logistic support: K.T.; Supervision: S.C.

Competing interest statement

S.K. and B.M. are employees of Bristol Myers Squibb. H.C., D.M., K.C., J.J., K.T., and S.C. are employees of Precision AQ, a consulting firm that received funding from Bristol Myers Squibb for this study.

Data availability statement

We provide Stan code for both the two-step and one-step FE and RE models in the Supplementary Material. Also in the Supplementary Material, R code and data are provided to fit the one-step NMA models to the illustrative melanoma example from Section 4.

Funding statement

Funding for this study was provided by Bristol Myers Squibb.

Supplementary material

To view supplementary material for this article, please visit http://doi.org/10.1017/rsm.2025.21.

Footnotes

This article was awarded Open Data and Open Materials badges for transparent practices. See the Data availability statement for details.

References

Ades, AE. A chain of evidence with mixed comparisons: models for multi-parameter synthesis and consistency of evidence. Stat Med. 2003;22(19):2995–3016. https://doi.org/10.1002/sim.1566 CrossRef Google Scholar PubMed

Caldwell, DM, Ades, AE, Higgins, JPT. Simultaneous comparison of multiple treatments: combining direct and indirect evidence. BMJ. 2005;331(7521):897–900. https://doi.org/10.1136/bmj.331.7521.897 CrossRef Google Scholar PubMed

Lu, G, Ades, AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med. 2004;23(20):3105–3124. https://doi.org/10.1002/sim.1875 CrossRef Google Scholar

Ades, AE, Welton, NJ, Dias, S, Phillippo, DM, Caldwell, DM. Twenty years of network meta-analysis: continuing controversies and recent developments. Res Synth Methods. 2024;15(5):702–727.CrossRef Google Scholar PubMed

Fay, MP, Li, F. Causal interpretation of the hazard ratio in randomized clinical trials. Clin Trials. 2024;21(5):623–635.CrossRef Google Scholar PubMed

Williamson, PR, Smith, CT, Hutton, JL, Marson, AG. Aggregate data meta-analysis with time-to-event outcomes. Stat Med. 2002;21(22):3337–3351. https://doi.org/10.1002/sim.1303 CrossRef Google Scholar PubMed

Batson, S, Greenall, G, Hudson, P. Review of the reporting of survival analyses within randomised controlled trials and the implications for meta-analysis. PloS One. 2016;11(5):e0154870. https://doi.org/10.1371/journal.pone.0154870 CrossRef Google Scholar PubMed

Li, H, Han, D, Hou, Y, Chen, H, Chen, Z. Statistical inference methods for two crossing survival curves: a comparison of methods. PLOS ONE. 2015;10(1):e0116774. https://doi.org/10.1371/journal.pone.0116774 CrossRef Google Scholar PubMed

Schadendorf, D, Hodi, FS, Robert, C, et al. Pooled analysis of long-term survival data from Phase II and Phase III trials of ipilimumab in unresectable or metastatic melanoma. J Clin Oncol. 2015;33(17):1889–1894. https://doi.org/10.1200/jco.2014.56.2736 CrossRef Google Scholar PubMed

Royston, P, Parmar, M. Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a time-to-event outcome. BMC Med Res Methodol. 2013;13:152. https://doi.org/10.1186/1471-2288-13-152 CrossRef Google Scholar

Magirr, D. Non-proportional hazards in immuno-oncology: is an old perspective needed? Pharm Stat. 2021;20(3):512–527.CrossRef Google Scholar

Ristl, R, Ballarini, NM, Götte, H, Schüler, A, Posch, M, König, F. Delayed treatment effects, treatment switching and heterogeneous patient populations: how to design and analyze RCTs in oncology. Pharm Stat. 2021;20(1):129–145.CrossRef Google Scholar PubMed

Rahman, R, Fell, G, Ventz, S, et al. Deviation from the proportional hazards assumption in randomized phase 3 clinical trials in oncology: prevalence, associated factors, and implications. Clin Cancer Res. 2019;25(21):6339–6345.CrossRef Google Scholar PubMed

Cope, S, Chan, K, Campbell, H, et al. A comparison of alternative network meta-analysis methods in the presence of nonproportional hazards: a case study in first-line advanced or metastatic renal cell carcinoma. Value Health. 2022;26(4):465–476. https://doi.org/10.1016/j.jval.2022.11.017 CrossRef Google Scholar PubMed

Crowther, MJ, Riley, RD, Staessen, JA, Wang, J, Gueyffier, F, Lambert, PC. Individual patient data meta-analysis of survival data using Poisson regression models. BMC Med Res Methodol. 2012;12:34. https://doi.org/10.1186/1471-2288-12-34 CrossRef Google Scholar PubMed

Freeman, S, Carpenter, J. Bayesian one-step IPD network meta-analysis of time-to-event data using royston-parmar models. Res Synth Methods. 2017;8(4):451–464. https://doi.org/10.1002/jrsm.1253 CrossRef Google Scholar PubMed

Petit, C, Blanchard, P, Pignon, J, Lueza, B. Individual patient data network meta-analysis using either restricted mean survival time difference or hazard ratios: is there a difference? a case study on locoregionally advanced nasopharyngeal carcinomas. Syst Rev. 2019;8(1):96. https://doi.org/10.1186/s13643-019-0984-x CrossRef Google Scholar PubMed

Ouwens, MJ, Philips, Z, Jansen, JP. Network meta-analysis of parametric survival curves. Res Synth Methods. 2010;1(3–4):258–271.CrossRef Google Scholar PubMed

Jansen, JP. Network meta-analysis of survival data with fractional polynomials. BMC Med Res Methodol. 2011;11:61. https://doi.org/10.1186/1471-2288-11-61 CrossRef Google Scholar PubMed

Guyot, P, Ades, AE, Ouwens, MJ, Welton, NJ. Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves. BMC Med Res Methodol. 2012;12:9. https://doi.org/10.1186/1471-2288-12-9 CrossRef Google Scholar PubMed

Salanti, G. Indirect and mixed-treatment comparison, network, or multiple-treatments meta-analysis: many names, many benefits, many concerns for the next generation evidence synthesis tool. Res Synth Methods. 2012;3(2):80–97.CrossRef Google Scholar PubMed

Cope, S, Chan, K, Jansen, J. Multivariate network meta-analysis of survival function parameters. Res Synth Methods. 2020;11(3):443–456. https://doi.org/10.1002/jrsm.1405 CrossRef Google Scholar PubMed

Jackson, CH. flexsurv: a platform for parametric survival modeling in R. J Stat Softw. 2016; 70:i80 CrossRef Google Scholar PubMed

Achana, FA, Cooper, NJ, Bujkiewicz, S, et al. Network meta-analysis of multiple outcome measures accounting for borrowing of information across outcomes. BMC Med Res Methodol. 2014;14:92. https://doi.org/10.1186/1471-2288-14-92 CrossRef Google Scholar PubMed

Burke, DL, Ensor, J, Riley, RD. Meta-analysis using individual participant data: one-stage and two-stage approaches, and why they may differ. Stat Med. 2017;36(5):855–875.CrossRef Google Scholar PubMed

Debray, TPA, Moons, KGM, Abo-Zaid, GMA, Koffijberg, H, Riley, RD. Individual participant data meta-analysis for a binary outcome: one-stage or two-stage? PloS One. 2013;8(4):e60650.CrossRef Google Scholar PubMed

Freeman, SC, Cooper, NJ, Sutton, AJ, Crowther, MJ, Carpenter, JR, Hawkins, N. Challenges of modelling approaches for network meta-analysis of time-to-event outcomes in the presence of non-proportional hazards to aid decision making: application to a melanoma network. Stat Methods Med Res. 2022;31(5):839–861. https://doi.org/10.1177/09622802211070253 CrossRef Google Scholar

Kanters, S, Karim, ME, Thorlund, K, Anis, A, Bansback, N. When does the use of individual patient data in network meta-analysis make a difference? A simulation study. BMC Med Res Methodol. 2021;21(1):21. https://doi.org/10.1186/s12874-020-01198-2 CrossRef Google Scholar PubMed

Jackson, D, White, IR. When should meta-analysis avoid making hidden normality assumptions? Biom J. 2018;60(6):1040–1058.CrossRef Google Scholar PubMed

Saramago, P, Chuang, LH, Soares, MO. Network meta-analysis of (individual patient) time to event data alongside (aggregate) count data. BMC Med Res Methodol. 2014;14:105. https://doi.org/10.1186/1471-2288-14-105 CrossRef Google Scholar PubMed

Phillippo, DM. Calibration of Treatment Effects in Network Meta-Analysis Using Individual Patient Data. University of Bristol; 2019. https://research-information.bris.ac.uk/en/studentTheses/calibration-of-treatment-effects-in-network-meta-analysis-using-i Google Scholar

White, IR, Turner, RM, Karahalios, A, Salanti, G. A comparison of arm-based and contrast-based models for network meta-analysis. Stat Med. 2019;38(27):5197–5213.CrossRef Google Scholar PubMed

Wei, Y, Higgins, JPT. Estimating within-study covariances in multivariate meta-analysis with multiple outcomes. Stat Med. 2013;32(7):1191–1205.CrossRef Google Scholar PubMed

Brilleman, SL, Wolfe, R, Moreno-Betancur, M, Crowther, MJ. Simulating survival data using the simsurv R package. J Stat Softw. 2021;97:1–27.CrossRef Google Scholar

Phillippo, DM. multinma: Bayesian Network Meta-Analysis of Individual and Aggregate Data. R package version 0.8.0. 2025; https://doi.org/10.5281/zenodo.3904454.CrossRef Google Scholar

R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2018.Google Scholar

Stan Development Team. Stan Modeling Language Users Guide and Reference Manual Version 2.27. 2019; https://mc-stan.org Google Scholar

Jansen, JP, Vieira, MC, Cope, S. Network meta-analysis of longitudinal data using fractional polynomials. Stat Med. 2015;34(15):2294–2311. https://doi.org/10.1002/sim.6492 CrossRef Google Scholar PubMed

Boutros, A, Croce, E, Ferrari, M, et al. The treatment of advanced melanoma: current approaches and new challenges. Crit Rev Oncol Hematol. 2024:104276.CrossRef Google Scholar PubMed

Saluja, R, Cheng, S, delos Santos, KA, Chan, KKW. Estimating hazard ratios from published Kaplan-Meier survival curves: a methods validation study. Res Synth Methods. 2019;10(3):465–475.CrossRef Google Scholar PubMed

Avril, MF, Aamdal, S, Grob, JJ, et al. Fotemustine compared with dacarbazine in patients with disseminated malignant melanoma: a phase III study. J Clin Oncol. 2004;22(6):1118–1125.CrossRef Google Scholar PubMed

Bajetta, E, Di Leo, A, Zampino, MG, et al. Multicenter randomized trial of dacarbazine alone or in combination with two different doses and schedules of interferon alfa-2a in the treatment of advanced melanoma. J Clin Oncol. 1994;12(4):806–811.CrossRef Google Scholar PubMed

Chapman, PB, Einhorn, LH, Meyers, ML, et al. Phase III multicenter randomized trial of the Dartmouth regimen versus dacarbazine in patients with metastatic melanoma. J Clin Oncol. 1999;17(9):2745–2751.CrossRef Google Scholar PubMed

Chiarion Sileni, V, Nortilli, R, Aversa, SM, et al. Phase II randomized study of dacarbazine, carmustine, cisplatin and tamoxifen versus dacarbazine alone in advanced melanoma patients. Melanoma Res. 2001;11(2):189–196.CrossRef Google Scholar PubMed

Cocconi, G, Bella, M, Calabresi, F, et al. Treatment of metastatic malignant melanoma with dacarbazine plus tamoxifen. N Engl J Med. 1992;327(8):516–523.CrossRef Google Scholar PubMed

Falkson, CI, Falkson, G, Falkson, HC. Improved results with the addition of interferon alfa-2b to dacarbazine in the treatment of patients with metastatic malignant melanoma. J Clin Oncol. 1991;9(8):1403–1408.CrossRef Google Scholar PubMed

Falkson, CI, Ibrahim, J, Kirkwood, JM, Coates, AS, Atkins, MB, Blum, RH. Phase III trial of dacarbazine versus dacarbazine with interferon alpha-2b versus dacarbazine with tamoxifen versus dacarbazine with interferon alpha-2b and tamoxifen in patients with metastatic malignant melanoma: an eastern cooperative oncology group study. J Clin Oncol. 1998;16(5):1743–1751.CrossRef Google Scholar PubMed

Middleton, MR, Grob, JJ, Aaronson, N, et al. Randomized phase III study of temozolomide versus dacarbazine in the treatment of patients with advanced metastatic malignant melanoma. J Clin Oncol. 2000;18(1):158–166.CrossRef Google Scholar PubMed

Thomson, DB, Adena, M, McLeod, GR, et al. Interferon-alpha 2a does not improve response or survival when combined with dacarbazine in metastatic malignant melanoma: results of a multi-institutional Australian randomized trial. Melanoma Res. 1993;3(2):133–138.Google Scholar PubMed

Young, AM, Marsden, J, Goodman, A, Burton, A, Dunn, JA. Prospective randomized comparison of dacarbazine (DTIC) versus DTIC plus interferon-alpha (IFN-alpha) in metastatic melanoma. Clin Oncol. 2001;13(6):458–465.Google Scholar

Grambsch, PM, Therneau, TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika. 1994;81(3):515–526.CrossRef Google Scholar

Assessment HC-Mscgoht. Practical Guideline for Quantitative Evidence Synthesis: Direct and Indirect Comparisons. 2024. https://health.ec.europa.eu/document/download/1f6b8a70-5ce0-404e-9066-120dc9a8df75_en?filename=hta_practical-guideline_direct-and-indirect-comparisons_en.pdf Google Scholar

Vehtari, A, Gelman, A, Gabry, J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput. 2017;27:1413–1432.CrossRef Google Scholar

Gelman, A, Rubin, DB. Inference from iterative simulation using multiple sequences. Statist Sci. 1992;7:457–511.CrossRef Google Scholar

Gelman, A. Bayesian Data Analysis, 3rd ed. Boca Raton: Texts in Statistical Science; 2013.CrossRef Google Scholar

Betancourt, M, Girolami, M. Hamiltonian Monte Carlo for hierarchical models. Curr Trends Bayesian Methodol Appl. 2015;79(30):2–4.Google Scholar

Cope, S, Jansen, JP. Quantitative summaries of treatment effect estimates obtained with network meta-analysis of survival curves to inform decision-making. BMC Med Res Methodol. 2013;13:147. https://doi.org/10.1186/1471-2288-13-147 CrossRef Google Scholar PubMed

Monnahan, CC, Thorson, JT, Branch, TA. Faster estimation of Bayesian models in ecology using Hamiltonian Monte Carlo. Methods Ecol Evol. 2017;8(3):339–348.CrossRef Google Scholar

Latimer, NR. Survival analysis for economic evaluations alongside clinical trials--extrapolation with patient-level data: inconsistencies, limitations, and a practical guide. Med Decis Making. 2013;33(6):743–754. https://doi.org/10.1177/0272989x12472398 CrossRef Google Scholar PubMed

Rutherford, M, Lambert, P, Sweeting, M, Pennington, R, Crowther, M, Abrams, K. NICE DSU Technical Support Document 21: Flexible Methods for Survival Analysis. 2020. http://nicedsu.org.uk/ Google Scholar

Latimer, N. NICE DSU Technical Support Document 14: Survival analysis for Economic Evaluations Alongside Clinical Trials-Extrapolation with Patient-Level Data. 2011. http://nicedsu.org.uk/wp-content/uploads/2016/03/NICE-DSU-TSD-Survival-analysis.updated-March-2013.v2.pdf Google Scholar

Ruiz-Morales, JM, Swierkowski, M, Wells, JC, et al. First-line sunitinib versus pazopanib in metastatic renal cell carcinoma: results from the international metastatic renal cell carcinoma database consortium. Eur J Cancer. 2016;65:102– 108. https://doi.org/10.1016/j.ejca.2016.06.016 CrossRef Google Scholar PubMed

Cox, C, Chu, H, Schneider, MF, Munoz, A. Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Stat Med. 2007;26(23):4352–4374.CrossRef Google Scholar PubMed

Heinze, G, Boulesteix, AL, Kammer, M, Morris, TP, White, IR. Simulation Panel of the SI. Phases of methodological research in biostatistics—building the evidence base for new methods. Biom J. 2024;66(1): e2200222.CrossRef Google Scholar

Wang, Z, Lin, L, Hodges, JS, Chu, H. The impact of covariance priors on arm-based Bayesian network meta-analyses with binary outcomes. Stat Med. 2020;39(22):2883–2900.CrossRef Google Scholar PubMed

Alvarez, I, Niemi, J, Simpson, M. Bayesian inference for a covariance matrix. arXiv preprint arXiv:14084050 . 2014.CrossRef Google Scholar

Turner, RM, Domínguez-Islas, CP, Jackson, D, Rhodes, KM, White, IR. Incorporating external evidence on between-trial heterogeneity in network meta-analysis. Stat Med. 2019;38(8):1321–1335.CrossRef Google Scholar PubMed

Phillippo, DM, Dias, M, Ades, AE, Welton, NJ. Multilevel network meta-regression for general likelihoods: synthesis of individual and aggregate data with applications to survival analysis. arXiv preprint arXiv:2401.12640. 2024.Google Scholar

Maciel, D, Cope, S, Towle, K, et al. MSR31 applying multi-level network meta-regression (ML-NMR) to a case study in triple-class exposed (TCE) relapsed/refractory multiple myeloma (RRMM). Value Health. 2023;26(6):S283.CrossRef Google Scholar

Table 1 Survival and hazard functions for standard parametric survival models

Figure 1 Network diagram of artificial randomized controlled trials.

Figure 2 Kaplan–Meier survival curves for simulated event times, for each treatment (colors) in each study (panels). Censored events are marked with a cross (“+”).

Figure 4 Kaplan–Meier plots of the reconstructed individual event and censoring times obtained for each randomized controlled trial. Abbreviations: DTIC, dacarbazine; IFN, interferon (IFN).

Figure 6 Fitted study-specific survival functions (using study-specific baseline risk [${\boldsymbol{\mu}}_{\boldsymbol{1},\boldsymbol{j}}\;\boldsymbol{and}\;{\boldsymbol{\mu}}_{\boldsymbol{1},\boldsymbol{j}}$] and relative treatment effects [${\boldsymbol{\delta}}_{\boldsymbol{1},\boldsymbol{j},\boldsymbol{k}}\;\boldsymbol{and}\;{\boldsymbol{\delta}}_{\boldsymbol{2},\boldsymbol{j},\boldsymbol{k}}$]) for all distributions from the RE one-step model (Weibull, Gompertz, log-normal, and log-logistic) and Kaplan–Meier curves by treatment arm. Abbreviations: DTIC, dacarbazine; IFN, interferon (IFN).