
LITMUS: Bayesian lag recovery in reverberation mapping with fast differentiable models

Published online by Cambridge University Press:  23 January 2026

Hugh Gareth McDougall*
Affiliation:
School of Mathematics and Physics, The University of Queensland, Australia
Benjamin Pope
Affiliation:
School of Mathematics and Physics, The University of Queensland, Australia; School of Mathematical and Physical Sciences, Macquarie University, Australia
Tamara M. Davis
Affiliation:
School of Mathematics and Physics, The University of Queensland, Australia
*
Corresponding author: Hugh Gareth McDougall; Email: hughmcdougallemail@gmail.com

Abstract

Reverberation mapping (RM) is a technique in which the mass of a Seyfert I galaxy’s central supermassive black hole, along with the system’s physical scale, is estimated from the timescale on which variations in brightness propagate through the galactic nucleus. This mapping uses a long baseline of timing measurements to extract spatial information beyond the angular resolution of our telescopes, and is the main means of constraining supermassive black hole masses at high redshift. The most recent generation of multi-year RM campaigns targeting large numbers of active galactic nuclei (AGN) (e.g. OzDES) have had to contend with persistent sources of false positives, such as aliasing due to seasonal gaps in time-series data. We introduce LITMUS (Lag Inference Through the Mixed Use of Samplers), a modern lag recovery tool built on the ‘damped random walk’ model of quasar variability and implemented in the automatic differentiation framework jax. LITMUS is purpose-built to handle the multimodal aliasing caused by seasonal observation windows, and provides Bayesian evidence integrals for model comparison and null hypothesis testing, a more quantified alternative to existing post-fit selection methods. LITMUS also offers a flexible and modular framework for using more expressive, higher-dimensional models of AGN variability, and includes jax-enabled implementations of other popular lag recovery methods such as nested sampling and the interpolated cross-correlation function. We test LITMUS on a number of mock light curves modelled after the OzDES sample and find that it recovers their lags with high precision and successfully identifies spurious lag recoveries, achieving a false positive rate drastically lower than that of the state-of-the-art program JAVELIN.
LITMUS’s high performance comes from an algorithm that maps the Bayesian posterior density, simultaneously constraining the lag and providing Bayesian evidences for model comparison and null hypothesis testing, while outperforming nested sampling in computational cost by an order of magnitude.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of the Astronomical Society of Australia

Figure 1. A demonstration of the sort of light curves that GP modelling can reconstruct from observations. For a set of time-series observations (error bars), a particular GP models the entire family of underlying light curves that exhibit the power spectral density of the GP, conditioned on how well they fit the observations. In this example the light curve is fit as a DRW with $\tau=200\,\mathrm{d}$ and $\sigma=1$, both in arbitrary units for this demonstrative example. The shaded regions represent the 1 and $2\sigma$ contours of the distribution of all such walks.
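The conditioning described in this caption can be sketched in a few lines. This is an illustrative sketch in plain NumPy (LITMUS itself is built on jax); `drw_kernel` and `gp_condition` are our own hypothetical names, not LITMUS's API:

```python
import numpy as np

def drw_kernel(t1, t2, tau=200.0, sigma=1.0):
    """DRW (Ornstein-Uhlenbeck) covariance: sigma^2 * exp(-|dt| / tau)."""
    return sigma**2 * np.exp(-np.abs(t1[:, None] - t2[None, :]) / tau)

def gp_condition(t_obs, y_obs, y_err, t_grid, tau=200.0, sigma=1.0):
    """Condition the DRW GP on noisy observations.

    Returns the posterior mean and standard deviation of the family of
    underlying light curves evaluated on t_grid, as in the shaded bands
    of the figure.
    """
    K = drw_kernel(t_obs, t_obs, tau, sigma) + np.diag(y_err**2)
    Ks = drw_kernel(t_grid, t_obs, tau, sigma)
    Kss = drw_kernel(t_grid, t_grid, tau, sigma)
    mean = Ks @ np.linalg.solve(K, y_obs)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

Far from any observation the posterior reverts to the prior (zero mean, standard deviation $\sigma$), while near well-measured epochs the band tightens towards the measurement uncertainty.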


Figure 2. A demonstration of the source of the aliasing problem, specifically in the context of a parametric GP model. The top row shows mock data with cadence, measurement uncertainty and baseline similar to OzDES, with a DRW timescale of $\tau=200\,\mathrm{d}$ and a true lag of $\Delta t=360\,\mathrm{d}$. From left to right, the sub-panels show lags being tested at $\Delta t=0\, \mathrm{d}$, $180\,\mathrm{d}$ and $360\,\mathrm{d}$. The left panel is clearly a bad fit, as near-simultaneous observations are in clear tension, and the right panel is a clear good fit, as we see very little tension. The middle panel, corresponding to the first aliasing peak, is an ambiguously good fit; the lack of overlap means we cannot observe clear tensions between the light curves. The bottom panel shows the (un-normalised) natural logarithm of the posterior distribution, with all non-lag parameters fixed at their true values. At ‘on-season’ lags (un-shaded) we can easily reject bad fits, and so the posterior is extremely low. At off-season lags (blue shading) there are local optima arising from the ambiguity. The mode associated with the true lag (red dot) is clearly defined and dominates over the aliasing modes, with the rest of the posterior being $\lt1 \%$ of the maximum posterior density in this well-behaved, high-SNR example. Even so, the posterior still suffers from the rough geometry and multimodality that introduce numerical challenges in navigating it.


Figure 3. A demonstration of the failure mode of the Affine-Invariant Ensemble Sampler (AIES), the MCMC proposal algorithm used by emcee, in multimodal distributions. Both panels are posterior distributions generated from the same mock data with a true lag at $\Delta t = 854\,\mathrm{d}$ (dashed line); the bottom panel is the result from the AIES, the same MCMC sampler used by JAVELIN, while the top is found from exhaustive sampling of the prior range. The AIES estimate of the posterior produces an aliasing peak at $\Delta t = 540\,\mathrm{d}$ where none truly exists, due to its ensemble of live sampling points becoming pinned at this minor mode.
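For reference, the "stretch move" underlying the AIES fits in a few lines. This is a standalone sketch of the Goodman & Weare (2010) proposal, not emcee's or JAVELIN's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def stretch_move(walkers, log_p, a=2.0):
    """One sweep of the affine-invariant stretch move over an ensemble.

    Each walker proposes a stretch along the line towards a randomly
    chosen partner walker, with scale z drawn from g(z) ~ 1/sqrt(z)
    on [1/a, a], accepted with probability z^(d-1) * p(new) / p(old).
    """
    n, d = walkers.shape
    out = walkers.copy()
    for k in range(n):
        j = rng.choice([i for i in range(n) if i != k])
        z = ((a - 1.0) * rng.random() + 1.0) ** 2 / a
        prop = out[j] + z * (out[k] - out[j])
        log_accept = (d - 1) * np.log(z) + log_p(prop) - log_p(out[k])
        if np.log(rng.random()) < log_accept:
            out[k] = prop
    return out
```

Because every proposal lies on a line through two current walkers, an ensemble whose walkers all fall into one minor mode has no mechanism to propose far outside it, which is one way the pinning in the bottom panel can arise.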


Figure 4. A simplified demonstration of the operating principle behind LITMUS’s Laplace Quadrature for a case of only two free parameters (lag and DRW timescale). First, a 1D locus of conditional optima is traced out along the lag axis (orange line), finding the conditional optima at a discrete grid of lags (white points). At these points, the Laplace approximation is applied to divide the posterior up into a series of Gaussian slices (purple, shaded).
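The procedure in this caption amounts to: optimise the non-lag parameters conditionally at each lag on a grid, Laplace-approximate each slice, then integrate the slice masses over lag. A sketch in NumPy/SciPy with our own function names (not LITMUS's API), assuming `log_post(lag, theta)` returns the log posterior density:

```python
import numpy as np
from scipy.optimize import minimize

def numerical_hessian(f, x, eps=1e-4):
    """Central-difference Hessian of a scalar function f at x."""
    x = np.asarray(x, float)
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.eye(n)[i] * eps, np.eye(n)[j] * eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)
    return H

def laplace_quadrature(log_post, lag_grid, theta0):
    """Evidence estimate from Gaussian slices along the lag axis.

    For each grid lag, find the conditional optimum of the non-lag
    parameters theta (warm-started from the previous slice), apply the
    Laplace approximation in theta, then integrate the slice masses
    over lag with the trapezoid rule in log space.
    """
    theta = np.asarray(theta0, float)
    log_slices = []
    for lag in lag_grid:
        res = minimize(lambda th: -log_post(lag, th), theta)
        theta = res.x
        H = numerical_hessian(lambda th: -log_post(lag, th), theta)
        d = len(theta)
        _, logdet = np.linalg.slogdet(H)
        log_slices.append(log_post(lag, theta)
                          + 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet)
    log_slices = np.asarray(log_slices)
    m = log_slices.max()
    w = np.exp(log_slices - m)
    log_Z = m + np.log(np.sum(0.5 * (w[1:] + w[:-1]) * np.diff(lag_grid)))
    return log_Z, log_slices
```

On a toy Gaussian posterior this recovers the exact evidence; the cost is one cheap, warm-started optimisation per slice rather than a full high-dimensional sampling run.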


Figure 5. A demonstration of the difference in the Laplace and SVI approximations, both attempting to emulate a Cauchy distribution (black solid line). The Laplace approximation (blue dotted line) creates a Gaussian that matches the curvature at the MAP of the true distribution, and in this case under-estimates the distribution everywhere else. The SVI approximation, here also fitting a Gaussian, instead tries to get as close as possible to the true distribution ‘on average’, and so under-estimates in the core region while balancing the impact on the evidence integral with over-estimates in the distribution’s tails.
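The contrast in this caption can be reproduced numerically. Below, a Laplace Gaussian (curvature-matched at the mode; for the standard Cauchy, $\sigma^2 = 1/2$) is compared against a reverse-KL Gaussian fit, used here as a simple stand-in for the SVI objective; the grid-search fit is our own illustrative construction, not LITMUS's optimiser:

```python
import numpy as np

def cauchy_logpdf(x):
    """Standard Cauchy: p(x) = 1 / (pi * (1 + x^2))."""
    return -np.log(np.pi) - np.log1p(x**2)

# Laplace: match the curvature of log p at the mode (x = 0).
# d^2/dx^2 log p(0) = -2, so the Laplace Gaussian has sigma^2 = 1/2.
sigma_laplace = np.sqrt(0.5)

def kl_gauss_vs_cauchy(sigma, n=20001, span=50.0):
    """KL(q || p) for q = N(0, sigma^2), p = Cauchy, by trapezoid rule."""
    x = np.linspace(-span, span, n)
    logq = -0.5 * (x / sigma)**2 - np.log(sigma * np.sqrt(2 * np.pi))
    integrand = np.exp(logq) * (logq - cauchy_logpdf(x))
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x))

# "SVI-like" fit: minimise the KL divergence over the Gaussian width.
sigmas = np.linspace(0.3, 3.0, 271)
sigma_svi = sigmas[np.argmin([kl_gauss_vs_cauchy(s) for s in sigmas])]
```

The KL-optimal width comes out noticeably wider than the Laplace width, so the SVI-style fit sits lower at the peak but puts more mass in the tails, which is the trade-off the figure illustrates.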


Figure 6. Mocks and evidence ratios for the demonstrative mock signals in the body of the text. The top panels show the mock light curves for the continuum (blue) and response (pink) signals. The left and right columns correspond to high and low SNR, while the rows from top to bottom show the mocks for a coupled response at a lag of $\Delta t = 540\,\mathrm{d}$, a decoupled response and a pure white noise response. The bottom panel shows the differences in log-Bayes factors for each of the hypotheses in Table 1, with orange bars showing how strongly we can see structure in the response signals and blue bars showing how well we can confirm the existence of a lag from this structure, with dotted lines indicating different significance levels in favour of accepting or rejecting these hypotheses. Red and green markers indicate the ground truth for each question: green circles mean true, red squares mean false.


Table 1. Bayes factors (log scale) of model evidences when the different mocks in Figure 6 are fit with a model that encodes a lag response or that encodes an uncoupled but still structured response signal as compared to a model in which the response is unstructured noise. The bottom panel of Figure 6 shows the Bayes factors from these evidences that are used to test different hypotheses/compare the relative strength of the different models.


Figure 7. A comparison of the posterior distributions for the lag error, i.e. the difference between true and recovered lag, comparing some of LITMUS’s aliasing-friendly methods, namely Nested Sampling (left panel) and the Laplace Quadrature (middle panel), to the JAVELIN-like AIES (right panel). These plots are for mock sample 1, which contains 440 mocks with true lags distributed uniformly over the range $\Delta t \in [0,1\,000]\,\mathrm{d}$. The aliasing fraction is the fraction of samples/posterior density that sits more than $30\,\mathrm{d}$ from the true value for these mocks. The Laplace Quadrature and Nested Sampling results adhere extremely closely to the true lags save for a single errant false positive, and the similarity between the two validates the Laplace Quadrature’s recovery of the true posterior shape. Conversely, more than half the AIES samples are incorrect, and the posterior median (black dots) is often far from the true value.


Table 2. Summary of the performance of LITMUS’s three fitting methods, the Laplace Quadrature, SVI Quadrature and Nested Sampling approaches, as compared to the AIES fitting method used by JAVELIN and the ICCF method used by PyCCF, as tested on the three sets of mock light curves (uniform in lag, log-normal in lag to emulate the OzDES MgII sample, and mocks with the response light curve decoupled such that no true lag is present). Listed are the fractions of false positive lags before and after quality cuts (a lag here being considered incorrect if it differs by more than 30 d from the ground truth), as well as the total number of retained sources after cuts. In general, the LITMUS fitting methods perform significantly better at identifying a true lag where it exists, with a pre-cut $\mathrm{FPR} \lt 5\%$ in all cases, reducing by a factor of a few when removing sources with a lag recovery evidence ratio $Z_2/Z_1\lt10$. Overall, LITMUS yields significantly more, and significantly more accurate, lags, while also retaining 10–20$\times$ fewer spurious lags from the decoupled sample.


Figure 8. A demonstration of how the Bayes factor acts as a measure of lag reliability. The top panel shows histograms of the Bayes factor evidence ratios for the decoupled mocks with no lag (grey), mocks with a lag that was successfully recovered (navy) and mocks that had an underlying lag but for which the posterior median of the recovery was more than $30\,\mathrm{d}$ from the ground truth. The bottom panel shows how the error in the lag (here, the deviation between ground truth and posterior median) decreases for strong Bayes factors. As the evidence ratio between the lagged and decoupled models lowers, the error in the median recovered lag rapidly increases, and above some reasonable threshold (e.g. $Z_2/Z_1\gt10^2$), the results become significantly more reliable. The solid, dashed and dotted lines represent evidence ratio thresholds of $1{:}1$, $1{:}10$ and $1{:}100$ in favour of a lag. The correct and incorrect lags are for the 490 realistic mocks in mock set 2, while the mocks with no lag are from the decoupled mock set 3.
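A quality cut of the sort used in this figure reduces to thresholding the log evidence ratio. The helper below is illustrative rather than part of LITMUS, with thresholds mirroring the figure's $1{:}1$, $1{:}10$ and $1{:}100$ lines:

```python
import numpy as np

def classify_lag_recoveries(log10_bayes, thresholds=(0.0, 1.0, 2.0)):
    """Grade sources by log10 of the evidence ratio Z2/Z1 (lagged vs. decoupled).

    Returns an integer per source: 0 = lag disfavoured (below 1:1);
    1, 2, 3 = evidence ratio exceeding 1:1, 1:10 and 1:100 respectively.
    """
    log10_bayes = np.asarray(log10_bayes, float)
    return np.searchsorted(np.asarray(thresholds, float),
                           log10_bayes, side="right")
```

Sources graded 3 would survive a $Z_2/Z_1 > 10^2$ cut; lower grades would be flagged as increasingly unreliable.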


Figure B1. An exaggerated demonstration of the grid smoothing algorithm for a simple multimodal function, using $\alpha = 0.8$ up to $j=5$ iterations with 32 points. The top panel shows the true distribution (black) with its estimate from the first evenly spaced grid (red) and the final smoothed grid (blue). The bottom panel shows how the spacing of the grid updates over each iteration, progressing from top to bottom, with the first and last iterations coloured for emphasis, and grey dots representing samples from previous iterations. The initial spacing is so coarse that it misses much of the detail of the left mode and cuts off the right mode entirely. By the final iteration, the estimate of the mode is significantly more accurate.
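Appendix B gives the exact update rule; the sketch below is a hypothetical scheme in the same spirit, written by us for illustration: redistribute grid points by inverse-CDF sampling of a blend between the current density estimate and a uniform floor, with $\alpha$ setting how strongly density attracts points:

```python
import numpy as np

def refine_grid(f, lo, hi, n=32, alpha=0.8, n_iter=5):
    """Iteratively concentrate n grid points where f is large.

    Each iteration estimates per-cell mass by the trapezoid rule, blends
    it with a uniform floor (weight 1 - alpha), and replaces the grid
    with the blended CDF inverted at evenly spaced quantiles.
    """
    x = np.linspace(lo, hi, n)
    for _ in range(n_iter):
        y = f(x)
        cell = np.diff(x)
        mass = 0.5 * (y[1:] + y[:-1]) * cell
        w = alpha * mass / mass.sum() + (1 - alpha) * cell / (hi - lo)
        cdf = np.concatenate([[0.0], np.cumsum(w)])
        cdf /= cdf[-1]
        # invert the blended CDF at evenly spaced quantiles
        x = np.interp(np.linspace(0.0, 1.0, n), cdf, x)
    return x
```

The uniform floor keeps a fraction of the points spread over the whole range, so a mode missed by the initial coarse grid (like the right mode in the figure) can still be picked up on a later iteration.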


Table C1. Tuning parameters for the fitting methods used in Section 5.


Figure C1. Histogram of the run-times for the five fitting methods in LITMUS over all mocks, using the fitting parameters described in Table C1. The ICCF method, which requires no matrix inversion owing to its absence of GP fitting, is consistently the fastest. The Laplace Quadrature can be very fast except in cases where it gets stuck optimising at a new test lag when the local optimum changes quickly over the lag axis. The SVI Quadrature has a similar issue, but runs somewhat slower overall.