Hyper-Fit: Fitting Linear Models to Multidimensional Data with Multivariate Gaussian Uncertainties

Published online by Cambridge University Press:  23 September 2015

A. S. G. Robotham*
Affiliation:
ICRAR, M468, University of Western Australia, Crawley, WA 6009, Australia
D. Obreschkow
Affiliation:
ICRAR, M468, University of Western Australia, Crawley, WA 6009, Australia

Abstract

Astronomical data is often uncertain with errors that are heteroscedastic (different for each data point) and covariant between different dimensions. Assuming that a set of D-dimensional data points can be described by a (D − 1)-dimensional plane with intrinsic scatter, we derive the general likelihood function to be maximised to recover the best fitting model. Alongside the mathematical description, we also release the hyper-fit package for the R statistical language (github.com/asgr/hyper.fit) and a user-friendly web interface for online fitting (hyperfit.icrar.org). The hyper-fit package offers access to a large number of fitting routines, includes visualisation tools, and is fully documented in an extensive user manual. Most of the hyper-fit functionality is accessible via the web interface. In this paper, we include applications to toy examples and to real astronomical data from the literature: the mass-size, Tully–Fisher, Fundamental Plane, and mass-spin-morphology relations. In most cases, the hyper-fit solutions are in good agreement with published values, but uncover more information regarding the fitted model.
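As a rough illustration of the kind of likelihood the abstract refers to, the Python sketch below evaluates a log-likelihood for the 2D case only: a line with Gaussian intrinsic scatter orthogonal to it, fitted to points that each carry their own 2×2 error covariance. The function name `line_loglike`, the slope/intercept parameterisation and the toy numbers are assumptions made for this sketch. It is not the hyper-fit implementation, which is an R package.

```python
import math

def line_loglike(m, b, s_orth, points):
    """Log-likelihood of the line y = m*x + b with orthogonal intrinsic scatter s_orth.

    points: iterable of (x, y, var_x, var_y, cov_xy), one error covariance per point.
    """
    norm2 = 1.0 + m * m  # squared length of the unnormalised normal vector (-m, 1)
    total = 0.0
    for x, y, var_x, var_y, cov_xy in points:
        # Squared orthogonal offset of the point from the line.
        delta2 = (y - m * x - b) ** 2 / norm2
        # Error covariance of the point projected onto the unit normal of the line.
        proj_var = (m * m * var_x - 2.0 * m * cov_xy + var_y) / norm2
        # Measurement error and intrinsic scatter add in quadrature.
        var = proj_var + s_orth ** 2
        total += -0.5 * (math.log(2.0 * math.pi * var) + delta2 / var)
    return total

# Hypothetical toy data: (x, y, var_x, var_y, cov_xy), scattered around y = 2x.
toy = [
    (0.0, 0.1, 0.01, 0.04, 0.00),
    (1.0, 1.9, 0.02, 0.03, 0.01),
    (2.0, 4.2, 0.01, 0.05, -0.01),
]
good = line_loglike(2.0, 0.0, 0.1, toy)  # close to the generating line
bad = line_loglike(0.5, 0.0, 0.1, toy)   # clearly wrong slope
```

A fit would maximise this function over `m`, `b` and `s_orth`; here the two evaluations only show that a line close to the toy data scores higher than one far from it.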

Information

Type
Research Article
Copyright
Copyright © Astronomical Society of Australia 2015 

Figure 1. Schematic representation of the linear model (blue) fitted to the data points (red). Both the model and the data are assumed to have Gaussian distributions, representing intrinsic model scatter and statistical measurement uncertainties, respectively. In both cases, 1σ-contours are shown as dashed lines.

Figure 2. The front page view of the hyper-fit web tool available at hyperfit.icrar.org. The web tool allows users to interact with hyper-fit through a simple GUI, with nearly all hyper-fit functionality available to them. The code itself runs remotely on a machine located at ICRAR, and the user’s computer is only used to render the website graphics.

Figure 3. 2D fit with no errors. The figure shows the default plot output of the hyper.plot2d function (accessed via the class-specific plot method) included as part of the R hyper-fit package, where the best generative model for the data is shown as a solid line with the intrinsic scatter indicated by dashed lines. The colouring of the points shows the ‘tension’ with respect to the best-fit linear model and the measurement errors (zero in this case), where redder colours indicate data that is less likely to be explainable.

Figure 4. 2D fit with uncorrelated (between x and y) errors. The errors are represented by 2D ellipses at the location of the xy-data. See Figure 3 for further details on this Figure. The reduction in the intrinsic scatter required to explain the data compared to Figure 3 is noticeable (see where the dashed lines intersect the y-axis).

Figure 5. 2D fit with correlated (between x and y) errors. The errors are represented by 2D ellipses at the location of the xy-data. See Figure 3 for further details on this Figure. The keen reader might notice that this Figure uses data from the Figure 1 schematic.

Figure 6. 2D fit with correlated (between x and y) errors. The errors here are a factor of 1.9 larger than those used in Figure 5. See Figure 3 for further details on this Figure.

Figure 7. 2D fit with correlated (between x and y) errors. The errors here are a factor of 1.9 larger than those used in Figure 5 and the correlation matrix is rotated by 90°. See Figure 3 for further details on this Figure.
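The error transformation described in the Figure 6 and Figure 7 captions can be sketched in a few lines of Python. This assumes that "1.9 times larger" refers to the standard deviations, so that variances and covariances scale by 1.9², and that the rotation is applied to each point's 2×2 error covariance matrix as C' = R C Rᵀ. The function name and the input matrix are made up for illustration and are not taken from the paper's data.

```python
import math

def scale_and_rotate_cov(cov, factor, angle_deg):
    """Return R (factor^2 * C) R^T for a symmetric 2x2 covariance ((a, b), (b, c))."""
    (a, b), (_, c) = cov
    f2 = factor ** 2  # scaling the errors by `factor` scales (co)variances by factor^2
    a, b, c = f2 * a, f2 * b, f2 * c
    t = math.radians(angle_deg)
    co, si = math.cos(t), math.sin(t)
    # R = [[co, -si], [si, co]]; R C R^T expanded by hand.
    new_a = co * co * a - 2.0 * co * si * b + si * si * c
    new_c = si * si * a + 2.0 * co * si * b + co * co * c
    new_b = co * si * (a - c) + (co * co - si * si) * b
    return ((new_a, new_b), (new_b, new_c))

# Hypothetical error covariance for one data point: var_x, cov_xy, var_y.
cov = ((0.04, 0.01), (0.01, 0.09))
rotated = scale_and_rotate_cov(cov, 1.9, 90.0)
```

For a 90° rotation the two variances swap places and the covariance changes sign, so a positively correlated error ellipse becomes a negatively correlated one of the same size.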

Figure 8. 2D toy data with correlated (between x and y) errors taken from Hogg et al. (2010). The top panel shows the fit to the Hogg et al. (2010) data minus row 3, as per exercise 17 and Figure 13 of Hogg et al. (2010); the bottom panel shows the MCMC posterior chains for the intrinsic scatter, as per Figure 14 of Hogg et al. (2010). The vertical dashed lines specify the 95th and 99th percentile range of the posterior chains, as requested for the original exercise. See Figure 3 for further details on the top two panels of this Figure.
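The percentile lines mentioned in this caption are simple summaries of a posterior chain. A minimal Python sketch of how such values could be read off a chain of intrinsic-scatter samples is given below; the chain is a randomly generated stand-in rather than real MCMC output, and the helper name `upper_percentiles` is hypothetical.

```python
import random
import statistics

def upper_percentiles(chain):
    """Return the (95th, 99th) percentiles of a 1D posterior chain."""
    # With n=100, quantiles() returns 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(chain, n=100, method="inclusive")
    return cuts[94], cuts[98]

random.seed(1)
# Toy stand-in for a chain of intrinsic-scatter samples (not real fit output).
toy_chain = [abs(random.gauss(0.0, 1.0)) for _ in range(5000)]
p95, p99 = upper_percentiles(toy_chain)
```

The "inclusive" method interpolates linearly between the sorted samples, which is adequate for a long chain; other quantile conventions would give slightly different values.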

Figure 9. GAMA mass-size relation data taken from Lange et al. (2015). See Figure 4 for further details on this Figure.

Figure 10. Tully–Fisher data taken from Obreschkow & Meyer (2013), with the best-fit hyper-fit generative model for the data shown as a solid line and the intrinsic scatter indicated with dashed lines. See Figure 3 for further details on this Figure.

Figure 11. 6dFGS Fundamental Plane data and hyper-fit fit. The two panels are different orientations of the default plot output of the hyper.plot3d function (accessed via the class-specific plot method) included as part of the R hyper-fit package, where the best-fit hyper-fit generative model for the data is shown as a translucent grey 3D plane. The package function is interactive, allowing the user to rotate the data and the overlaid 3D plane to any desired orientation. See Figure 3 for further details on this Figure.

Figure 12. Mass-spin-morphology (MJB) data and hyper-fit fit. The two panels are different orientations of the default plot output. All error ellipsoids overlap with the best-fit 3D plane found using hyper-fit, implying that the generative model does not require any additional scatter. Indeed the observed data is unusually close to the plane, implying slightly overestimated errors. See Figures 3 and 11 for further details on this Figure.

Figure A1. Convergence tests for simulated generative hyperplane datasets. The solid lines show the raw mean intrinsic scatter for different types of generative model and fitting, and the dashed lines show the intrinsic scatter once the appropriate combination of bias and sample–population corrections have been applied. In all cases, the generative model was simulated with an intrinsic scatter for the population equal to 3.