Bias-free model fitting of correlated data in interferometry

In optical and infrared long-baseline interferometry, data often display significant correlated errors because of uncertain multiplicative factors such as the instrumental transfer function or the pixel-to-visibility matrix. In the context of model fitting, this situation often leads to a significant bias in the model parameters. In the most severe cases, it can result in a fit lying outside of the range of measurement values. This is known in nuclear physics as Peelle's Pertinent Puzzle. I show how it arises in the context of interferometry and determine that the relative bias is of the order of the square root of the correlated component of the relative uncertainty times the number of measurements. It preferentially impacts large data sets, such as those obtained in medium to high spectral resolution. I then give a conceptually simple and computationally cheap way to avoid the issue: model the data without covariances, estimate the covariance matrix by error propagation using the modelled data instead of the actual data, and perform the model fitting using this covariance matrix. I also show that a more imprecise but still unbiased result can be obtained by ignoring correlations in the model fitting.


INTRODUCTION
Optical and infrared long-baseline interferometry consists in measuring the contrast and phase of interference fringes in the recombined light collected at several telescopes. (Recombination may be performed by software in the case of intensity interferometry or heterodyne detection.) These observables hold information on the celestial object's spatial properties, often obtained through model fitting.

* regis.lachaume@gmail.com

In spite of strong evidence of correlations in the data, due to redundancy (Monnier, 2007, in the case of closure phases), calibration (Perrin, 2003), or atmospheric biases acting on all spectral channels in the same way (Lawson, 2000), only a few authors (Perrin et al., 2004; Absil et al., 2006; Berger et al., 2006; Lachaume et al., 2019; Kammerer et al., 2020) have accounted for these correlations, while most assumed statistically independent errors. In particular, the only interferometric instrument I know of with a data processing software taking into account one source of correlations, calibration, is FLUOR, the Fiber Linked Unit for Optical Recombination (at IOTA, the Infrared and Optical Telescope Array, then at CHARA, the Center for High Angular Resolution Astronomy; Perrin et al., 2004). None of the five first- and second-generation instruments at the VLTI, the Very Large Telescope Interferometer, does (Millour et al., 2008; Hummel & Percheron, 2006; Le Bouquin et al., 2011; ESO GRAVITY pipeline team, 2020; ESO MATISSE pipeline team, 2020). The same lack of support for correlations is present in image reconstruction programmes (e.g. MIRA, see Thiébaut, 2008), model-fitting tools (e.g. LITpro, see Tallon-Bosc et al., 2008), and the still widespread first version of the Optical Interferometric FITS format (OIFITS v. 1, Pauls et al., 2005). Unfortunately, ignoring correlations may lead to significant errors in model parameters, as Lachaume et al. (2019) evidenced with stellar diameters using PIONIER (Precision Integrated-Optics Near-infrared Imaging ExpeRiment, Le Bouquin et al., 2011) data at the VLTI. Likewise, Kammerer et al. (2020) established that accounting for correlations is necessary to achieve a higher contrast ratio in companion detection using GRAVITY (Eisenhauer et al., 2011) at the VLTI.

Several sources of correlated uncertainties occur in a multiplicative context, when several data points are normalised with the transfer function (Perrin, 2003) or the coherent fluxes are derived with the pixel-to-visibility matrix formalism (Tatulli et al., 2007). In both cases, the uncertainty on the multiplicative factor translates into a systematic, correlated one in the final data product. In the context of experimental nuclear physics, Peelle (1987) noted that this scenario could lead to an estimate falling below the individual data points, a paradox known as Peelle's Pertinent Puzzle (PPP). It results from the usual, but actually incorrect, way to propagate covariances, in which the measured values are used in the calculations (D'Agostini, 1994; Neudecker et al., 2012). A few workarounds have been proposed, but they are either computationally expensive (e.g. sampling of the posterior probability distribution for Bayesian analysis, see Neudecker et al., 2012) or require a conceptually difficult implementation (Becker et al., 2012; Nisius, 2014).
The issue, however, is not widely known in other fields, where the problem has seldom arisen. In this paper, I present the paradox within the context of long-baseline interferometry (Sect. 2), derive the order of magnitude of its effect using the modelling of a single value (Sect. 3), analyse in detail its effect in least squares model fitting (Sect. 4), and propose a simple, computationally efficient way to avoid it (Sect. 5).

PEELLE'S PERTINENT PUZZLE
I rewrite and adapt Peelle's original example in the context of long-baseline interferometry (see Neudecker et al., 2012, Sect. 1 & 2.1). One or several calibrator observations yield the inverse of the instrumental fringe contrast τ ± τ ς_τ. I use the relative uncertainty ς_τ on the transfer function as it is often referred to in percentage terms. A visibility amplitude is now estimated from two contrast measurements ν_1 ± σ_ν and ν_2 ± σ_ν. For each measurement, the visibility amplitudes are

    V_i = τ ν_i ± τ σ_ν (± τ ν_i ς_τ),  for i = 1, 2,    (1)

where the second-order error term τ ς_τ σ_ν has been ignored. They are normalised with the same quantity (τ), so they are correlated, hence the systematic uncertainty term between parentheses in Eq. (1). Error propagation yields the covariance matrix

    Σ_ij = τ² σ_ν² δ_ij + ς_τ² V_i V_j.    (2)

Under the hypothesis of Gaussian errors, I obtain the least squares estimate using the weight matrix W = Σ^−1:

    V = (V_1 + V_2) / [2 + ς_τ² (V_1 − V_2)² / (τ σ_ν)²],    (3)

with the uncertainty

    σ_V² ≈ [ (τ σ_ν)² / 2 + ς_τ² V² ],    (4)

where the approximation holds when the two measurements are close.

Figure 1. Original Peelle problem rewritten in the context of interferometry. Top: two raw visibility amplitudes ν_1 and ν_2 (points with statistical error bars of ≈ 0.6%) are calibrated by the transfer function 1/τ (solid line with systematic error zone of 5%). Bottom: the two calibrated visibility measurements V_1 and V_2 (points with statistical error bars of ≈ 0.6%) are strongly correlated. The least squares estimate for the visibility V (dashed line, with statistical uncertainty zone displayed) falls outside of the data range. The systematic error on V, V_1, and V_2 is shown on the right.

The visibility estimate V systematically falls below the average of the two values V_1 and V_2. If the measurements differ significantly, it can even fall below the lowest value. Figure 1 gives such an example with an instrumental visibility of 50% and two measurements on an unresolved target: the transfer function is τ = 2.000 ± 0.100 (ς_τ = 5%), and the two contrast measurements yield calibrated points 2.4 standard deviations apart, so that the least squares estimate falls outside the data range. The uncertainties quoted for V correspond to the first and second terms within the square brackets of Eq. (4).
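The two-point example is easy to reproduce numerically. The following sketch uses hypothetical values (τ = 2 ± 5% and two contrast measurements with ≈ 0.6% statistical errors, chosen a few standard deviations apart rather than copied from Fig. 1) and shows the least squares estimate falling below both calibrated points:

```python
import numpy as np

# Hypothetical illustration values (not the exact numbers behind Fig. 1):
# inverse transfer function tau with 5% relative error, and two fringe
# contrast measurements with ~0.6% relative statistical errors.
tau, rel_tau = 2.0, 0.05           # tau +/- tau * rel_tau
nu = np.array([0.502, 0.490])      # contrast measurements nu_1, nu_2
sig_nu = 0.003                     # absolute statistical error on each nu_i

V = tau * nu                       # calibrated visibilities V_1, V_2
# Covariance propagated the usual (but incorrect) way, with the measured
# values V in the systematic term, as in Eq. (2):
Sigma = (tau * sig_nu)**2 * np.eye(2) + rel_tau**2 * np.outer(V, V)

x = np.ones(2)                     # sensitivity vector of a constant model
W = np.linalg.inv(Sigma)
V_hat = (x @ W @ V) / (x @ W @ x)  # least squares estimate, Eq. (3)

print(V, V_hat)                    # V_hat falls below both V_1 and V_2
```

With these numbers the estimate is ≈ 0.97, below both calibrated points (1.004 and 0.980), reproducing the paradox.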

FIT BY A CONSTANT
I now generalise the results of the last section to an arbitrary number of measurements of a single normalised quantity, such as the visibility amplitude of an unresolved source, which is expected to be constantly one for all interferometric baselines.

Table 1. Notation.
    ς_ν: relative statistical uncertainty of the measurement error
    ς_τ: relative uncertainty of the normalisation error
    ς_stat.: statistical (uncorrelated) component of the relative uncertainty
    ς_sys.: systematic (correlated) component of the relative uncertainty

Let the column vector V = (V_1, ..., V_n)^t contain the n visibility amplitudes. It is derived from an uncalibrated quantity like the fringe contrast, ν = (ν_1, ..., ν_n)^t, and a normalisation factor, like the cotransfer function, τ = (τ_1, ..., τ_n)^t, by

    V = τ ∘ ν,    (5)

where ∘ denotes the Hadamard (element-wise) product of vectors. With V̄, τ̄, and ν̄ the true, but unknown, values of these quantities, the error vector η = V − V̄ can be written as a sum of measurement and normalisation relative errors if one ignores the second-order terms:

    η ≈ V̄ ∘ (ε_ν + ε_τ).    (6)

These errors are given by

    ε_ν,i = (ν_i − ν̄_i)/ν̄_i  and  ε_τ,i = (τ_i − τ̄_i)/τ̄_i.    (7)

I assume ε_ν and ε_τ are independent, of mean 0, and have standard deviations ς_ν and ς_τ, respectively. In addition, I consider correlation of the normalisation errors, with correlation coefficient ρ. In the case of interferometry, it can arise from the uncertainty on the calibrators' geometry. The covariance matrix of the visibility amplitudes is given by

    Σ = <η ⊗ η>,    (8)

where ⊗ denotes the outer product of vectors and <> stands for the expected value, so that

    Σ_ij = V̄_i V̄_j (ς_stat.² δ_ij + ς_sys.²),    (9)

with the systematic and statistical relative uncertainties

    ς_sys.² = ρ ς_τ²,    (10)
    ς_stat.² = ς_ν² + (1 − ρ) ς_τ².    (11)

The non-diagonal elements of the matrix feature the systematic relative uncertainty ς_sys., i.e. the correlated component of the uncertainties. In the case of a fully correlated transfer function (ρ = 1), it is equal to its uncertainty (ς_sys. = ς_τ). The diagonal elements of the matrix additionally include the statistical relative uncertainty ς_stat., i.e. the uncorrelated component of the uncertainties. In the case of a fully correlated transfer function, it is equal to the uncertainty of the uncalibrated visibility (ς_stat. = ς_ν). The value V̄ is yet to be determined, so the covariances are often derived using the measurements V in the propagation:

    Σ_ij = V_i V_j (ς_stat.² δ_ij + ς_sys.²).    (12)

The least squares estimate for V̄ is given by

    μ = (x^t W x)^−1 x^t W V,  with W = Σ^−1,    (13)

where x = (1, ..., 1)^t is the trivial sensitivity vector.

The covariance matrix is the sum of an invertible diagonal matrix and one of rank one (see Eq. (12)), so that the inverse is obtained using the Woodbury matrix identity:

    W_ij = δ_ij / σ_i² − ς_sys.² / [ς_stat.² (ς_stat.² + n ς_sys.²) V_i V_j],    (14)

where we have introduced the statistical (uncorrelated) component of the uncertainty on the calibrated visibilities

    σ_i = ς_stat. V_i.    (15)

Appendix A.1 shows the analytical derivation for the least squares estimate μ using the previous formulae. I write it in a way that highlights the generalisation of Eq. (3) of the previous section:

    μ = ς_stat.² Σ_i (1/V_i) / [ (ς_stat.² + n ς_sys.²) Σ_i (1/V_i²) − ς_sys.² (Σ_i 1/V_i)² ].    (16)

For small enough errors (η_i ≪ V̄), the second-order Taylor development of Eq. (16) in η_i/V̄ shows that μ is biased. If the data are not correlated (ς_sys. = 0), the bias is small (ς_stat.² to 2 ς_stat.²), but it becomes larger for correlated data if the number of points is large (n/2 to n times ς_τ² for fully correlated data), as D'Agostini (1994) already noted. This analytical derivation confirms the numerical simulation by Neudecker et al. (2012, see their Fig. 4). For visualisation purposes, Figure 2 shows a similar simulation of the bias as a function of the normalisation uncertainty ς_τ for various data sizes (n = 2 to 100). I have verified that it reproduces the quadratic behaviour of the bias for small values of ς_stat. and ς_sys.
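The rank-one structure of the covariance matrix, and the Woodbury inverse it allows, are easy to check numerically. This is a sketch with arbitrary mock values; the variable names are mine, not the paper's:

```python
import numpy as np

# Mock values for a quick check of the rank-one Woodbury inversion.
rng = np.random.default_rng(3)
n = 6
V = 1 + 0.05 * rng.normal(size=n)   # mock calibrated visibilities
s_stat, s_sys = 0.02, 0.03          # statistical and systematic parts

d = (s_stat * V)**2                 # diagonal (uncorrelated) variances
u = s_sys * V                       # rank-one (correlated) component
Sigma = np.diag(d) + np.outer(u, u)

# Woodbury: (D + u u^t)^-1 = D^-1 - (D^-1 u)(D^-1 u)^t / (1 + u^t D^-1 u)
Dinv_u = u / d
W = np.diag(1 / d) - np.outer(Dinv_u, Dinv_u) / (1 + u @ Dinv_u)

print(np.allclose(W, np.linalg.inv(Sigma)))  # True
```

The explicit inverse costs O(n²) operations instead of the O(n³) of a general matrix inversion, which matters for the large spectrally dispersed data sets discussed below.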
The bias from PPP arises, intuitively, because the modelled uncertainty is a non-constant function of the measured value. In the present case, data that fall below the average are given a lower uncertainty and, thus, a higher weight in the least squares fit. Conversely, data that fall above the average have a higher uncertainty and a lower weight. This fundamentally biases the estimate towards lower values. The effect is much stronger with correlations because it impacts the ∼ n²/2 independent elements of the covariance matrix instead of being restricted to the n diagonal ones. In the literature, the puzzle is generally discussed as arising from a normalisation, as that is where it was first identified. However, I show in appendix A.3 that a normalisation is not necessary, and determine the bias in the case of correlated photon noise.
For spectro-interferometric observations with 4 telescopes, the number of correlated points can be over 1,000, so even with a low correlation coefficient, the bias can be significant. For instance, a single GRAVITY observation in medium spectral resolution yields n = 6×210 visibility amplitudes. With an observed correlation of ≈ 16% in the instrumental visibility amplitudes (Kammerer et al., 2020) and a typical ς τ = 1-2% normalisation error, the bias on the calibrated visibilities could be 2-8%.
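The order of magnitude of the bias is simple to verify by Monte-Carlo simulation, in the spirit of Fig. 2. The sketch below uses illustrative error levels of my choosing (2% statistical, 2% fully correlated normalisation, n = 50 points), for which the relative bias quoted above is of order n ς_τ², i.e. about 2%:

```python
import numpy as np

# Monte-Carlo estimate of the PPP bias for the fit of a constant
# (illustrative error levels, not the paper's exact figures).
rng = np.random.default_rng(0)
sig_stat, sig_sys = 0.02, 0.02     # uncorrelated / fully correlated errors
n, ntrial = 50, 500
x = np.ones(n)

est = []
for _ in range(ntrial):
    # true value 1, fully correlated normalisation error + statistical noise
    V = (1 + sig_sys * rng.normal()) * (1 + sig_stat * rng.normal(size=n))
    # covariances propagated with the measured values V, as in Eq. (12)
    Sigma = np.diag((sig_stat * V)**2) + sig_sys**2 * np.outer(V, V)
    W = np.linalg.inv(Sigma)
    est.append((x @ W @ V) / (x @ W @ x))

bias = np.mean(est) - 1.0
print(bias)   # negative, of order -n * sig_sys**2
```

The mean estimate undershoots the true value by roughly n ς_sys² ≈ 2%, far more than the per-point statistical error would suggest.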

GENERAL MODEL FITTING
I now consider a set of measurements corresponding to the linear model

    V̄ = X p,

where p are the unknown parameters and X is the known sensitivity matrix. Typically, x_ik = f_k(u_i, v_i) for a linear model and x_ik = ∂f/∂p_k (u_i, v_i) for a non-linear model approximated by a linear one close to a solution. Here, (u, v) is the reduced baseline projected onto the sky, i.e. u = B_u/λ and v = B_v/λ if B is the baseline and λ the wavelength.

The true values V̄ are impacted by errors, so that the data are

    V = V̄ + η,

with the error term η again expressed as the sum of a measurement and a normalisation error:

    η = η_ν + V̄ ∘ ε_τ.

The measurement errors η_ν and normalisation errors ε_τ follow multivariate distributions of mean zero with covariance matrices Σ_ν and Σ_τ, respectively. Given the covariance matrix Σ of this model, the least squares estimate is

    μ = X (X^t Σ^−1 X)^−1 X^t Σ^−1 V.

I investigate four ways to determine the covariance matrix Σ:

1. Ignoring the correlations in the normalisation, i.e. keeping only the diagonal of the naïve estimate below (Σ_1). This is the generalisation of the two-variables approach by Neudecker et al. (2014). The resulting least squares model is μ_1 = X (X^t Σ_1^−1 X)^−1 X^t Σ_1^−1 V.

2. Using the naïve estimate Σ_2 = Σ_ν + (V ⊗ V) ∘ Σ_τ, which is known to lead to Peelle's pertinent puzzle in the trivial case of a constant model.

3. Using the data model of the fit without the normalisation error: Σ_3 = Σ_ν + (μ_1 ⊗ μ_1) ∘ Σ_τ.

4. Recursively fitting the data by updating the data model in the covariance matrix: I derive Σ_{k+1} = Σ_ν + (μ_k ⊗ μ_k) ∘ Σ_τ and μ_{k+1} = X (X^t Σ_{k+1}^−1 X)^−1 X^t Σ_{k+1}^−1 V until convergence.

In order to compare these covariance matrix prescriptions, I will use the typical example of an under-resolved centro-symmetric source observed at a four-telescope facility in medium spectral resolution. It is close to the context in which I serendipitously noticed the effect while modelling stellar diameters (see Lachaume et al., 2019). The Python code to produce the results (figures in this paper) is available on GitHub.

In the under-resolved case all models (Gaussian, uniform disc, or limb-darkened disc) are equivalent (Lachaume, 2003), so I will use instead a linear least squares fit μ = a − b x² to V ≈ 1 − x², where x is a dimensionless variable proportional to the projected baseline length √(u² + v²). This fit corresponds to the second-order Taylor development of any of the aforementioned models.

Figure 3(a) shows the example of such a fit performed for each covariance matrix prescription. Data have been simulated using V = (1 − x²)(1 + ε_τ) + η_ν, where ε_τ is a fully correlated normalisation error (3%) and η_ν are uncorrelated statistical errors (2%). As expected, the use of the data V in the correlation matrix (method 2) leads to grossly underestimated data values, in the very same way as in the classical Peelle case described in Sects. 2 & 3. The other methods, including ignoring correlations, yield reasonable parameter estimates.

Figure 4(a) sums up the behaviour of the same fit performed a large number of times on different simulated data sets, each following V ≈ 1 − x². For each correlation matrix prescription, it displays the dispersion of the reduced chi squared, the model parameters a and b, and the difference between modelled value and true value.
It also reports the uncertainty on model parameters given by the least squares optimisation routine, in comparison to the scatter of the distribution of the values. While the model fitting ignoring correlations (method 1) does not show any bias on the parameter estimates, it displays a higher dispersion of the model parameters, grossly underestimates their uncertainties, and has a biased chi squared. The correlation matrix calculated from the data (method 2) is, as expected, strongly biased. Both methods estimating the correlation matrix from modelled data (methods 3 & 4) are equivalent in terms of the absence of bias, the dispersion of these quantities, and the correct prediction of the uncertainty on model parameters.
Given that fitting the covariance matrix recursively does not yield additional benefits for the modelling, I suggest using method 3. One would expect this to hold for any smooth enough model, as the update in the covariance matrix is expected to be a small effect. Indeed, I have checked that the result holds for a fully resolved Gaussian disc V ≈ exp(−3x²) fit by μ = a exp(−bx²) (see Figs. 3(b) & 4(b)) and for a well-resolved binary, with methods 3 & 4 providing unbiased estimates and similar uncertainties. If, for some other application, the model obtained with method 3 were to differ significantly from the starting guess obtained with method 1, it would certainly make sense to examine whether recursive fitting (method 4) is needed. However, while it converged for the smooth models I tested, I have not proven that it will necessarily do so, in particular for less smooth models that may require it.
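The recommended procedure (method 3) boils down to a few lines of linear algebra. The following sketch applies it to the under-resolved example of this section, V ≈ 1 − x², with a fully correlated 3% normalisation error and 2% statistical errors; the variable names and simulation settings are mine, not taken from the paper's code:

```python
import numpy as np

# Two-step fit (method 3) on simulated data V = (1 - x^2)(1 + eps_tau) + eta.
# Simulation settings are illustrative, not the paper's exact setup.
rng = np.random.default_rng(1)
n = 120
x = np.linspace(0.05, 0.6, n)             # dimensionless baseline variable
X = np.column_stack([np.ones(n), -x**2])  # sensitivity matrix for mu = a - b x^2
sig_nu, sig_tau = 0.02, 0.03              # statistical / normalisation errors
V = (1 - x**2) * (1 + sig_tau * rng.normal()) + sig_nu * rng.normal(size=n)

def gls(X, Sigma, V):
    """Generalised least squares: parameters and modelled data."""
    W = np.linalg.inv(Sigma)
    p = np.linalg.solve(X.T @ W @ X, X.T @ W @ V)
    return p, X @ p

Sigma_nu = np.diag(np.full(n, sig_nu**2))
# Step 1: fit without the normalisation error to obtain a data model mu1.
p1, mu1 = gls(X, Sigma_nu, V)
# Step 2: propagate the normalisation error with the *modelled* data mu1.
Sigma3 = Sigma_nu + sig_tau**2 * np.outer(mu1, mu1)
# Step 3: final correlated least squares fit.
p3, mu3 = gls(X, Sigma3, V)

print(p1, p3)   # both close to (a, b) = (1, 1) up to the 3% normalisation
```

Both fits recover (a, b) ≈ (1, 1) to within the simulated errors; the difference is that the final fit also yields honest parameter uncertainties and chi squared, as discussed above.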

CONCLUSION
The standard covariance propagation, using the measurement values in the calculation, can result in a bias in the model parameters of a least squares fit taking correlations into account. It will occur as soon as the error bars and covariances depend on the measured values, in particular when a normalisation factor, such as the instrumental transfer function of an interferometer, is obtained experimentally. Some bias will even occur without correlations, but the effect is strongest when a large set of correlated data is modelled. This is precisely the case in optical and infrared long-baseline interferometry, where the calibration of spectrally dispersed fringes easily yields 10² to 10³ correlated data points.
While solutions exist that are either numerically expensive or require some care to be implemented (Burr et al., 2011; Becker et al., 2012; Nisius, 2014), I have shown with a simple example that there is an easy and cheap way to solve the issue. First, an uncorrelated fit is performed to estimate the true values corresponding to the data. Second, these estimates are used to determine the covariance matrix by error propagation. Finally, this covariance matrix is used to perform a least squares model fit.
Alternatively, it is possible to obtain an (almost) unbiased estimate of the model parameters by ignoring correlations altogether, at the cost of a larger imprecision, under-estimated uncertainties, and a biased chi squared. It is, at the moment, the approach taken in the vast majority of published studies in optical interferometry, as the data processing pipelines of most instruments do not determine covariances. To my knowledge, Lachaume et al. (2019) is the only work where Peelle's Pertinent Puzzle has been explicitly taken care of in optical interferometry.
I now determine the Taylor development for the least squares estimate. The Taylor developments for m_0 and m_1 yield the bias quoted in Sect. 3.

A.3 PPP with photon detection
I model n photon measurements N_i of expected value N̄, with noise uncertainty y_i = √N_i showing correlation ρ, under the assumption of Gaussian errors (N̄ ≫ 1). (Correlation in the photon noise is a quantum effect detected in particular experimental settings such as coupled lasers, e.g. Mayer et al., 2003. In astronomy, intensity interferometry makes use of these correlations.) The statistical component of the uncertainty is given by σ_i² = (1 − ρ) N_i. The covariance matrix is Σ_ij = σ_i² δ_ij + ρ y_i y_j, and its (Woodbury) inverse is

    W_ij = δ_ij / σ_i² − ρ / [(1 − ρ)(1 − ρ + ρn) y_i y_j],

with the moments m_l, d_lk, etc. defined with respect to the y_i, and their counterparts defined with respect to the N_i. The least squares estimate for the number of photons is given by

    N̂ = (x^t W N) / (x^t W x),  with x = (1, ..., 1)^t.

Using N_i = y_i², a second-order Taylor development in the errors η_i = N_i − N̄ shows, as in Sect. 3, that N̂ is biased even though no normalisation is involved.
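The bias without normalisation can be illustrated with a short Monte-Carlo simulation of correlated photon noise in the Gaussian approximation. All settings (N̄ = 10⁴ photons, ρ = 0.5, n = 100 channels) are illustrative assumptions of mine:

```python
import numpy as np

# PPP with correlated photon noise, no normalisation involved.
# Illustrative settings: Nbar photons, correlation rho, n channels.
rng = np.random.default_rng(2)
Nbar, rho, n, ntrial = 1.0e4, 0.5, 100, 1000
ones = np.ones(n)

est = []
for _ in range(ntrial):
    # Gaussian approximation: var(N_i) = Nbar, cov(N_i, N_j) = rho * Nbar
    N = (Nbar + np.sqrt(rho * Nbar) * rng.normal()
              + np.sqrt((1 - rho) * Nbar) * rng.normal(size=n))
    y = np.sqrt(N)
    # data-based covariance: Sigma_ij = (1 - rho) N_i delta_ij + rho y_i y_j
    Sigma = np.diag((1 - rho) * N) + rho * np.outer(y, y)
    W = np.linalg.inv(Sigma)
    est.append((ones @ W @ N) / (ones @ W @ ones))

bias = np.mean(est) - Nbar
print(bias)   # negative: the estimate systematically undershoots Nbar
```

Even though no experimentally determined normalisation enters the problem, using the measured counts in the covariance matrix biases the estimate low by a few tens of photons for these settings, in line with the n/2 to n times ς_sys² scaling of Sect. 3.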