A sensitivity analysis of a regression model of ocean temperature

Published online by Cambridge University Press:  30 August 2022

Rachel Furner*
Affiliation:
Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, United Kingdom; British Antarctic Survey, Cambridge, United Kingdom
Peter Haynes
Affiliation:
Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, United Kingdom
Dave Munday
Affiliation:
British Antarctic Survey, Cambridge, United Kingdom
Brooks Paige
Affiliation:
UCL Centre for Artificial Intelligence, Computer Science, University College London, London, United Kingdom
Daniel C. Jones
Affiliation:
British Antarctic Survey, Cambridge, United Kingdom
Emily Shuckburgh
Affiliation:
Department of Computer Science and Technology, University of Cambridge
*Corresponding author. E-mail: raf59@cam.ac.uk

Abstract

There has been much recent interest in developing data-driven models for weather and climate predictions. However, there are open questions regarding their generalizability and robustness, highlighting a need to better understand how they make their predictions. In particular, it is important to understand whether data-driven models learn the underlying physics of the system against which they are trained, or simply identify statistical patterns without any clear link to the underlying physics. In this paper, we describe a sensitivity analysis of a regression-based model of ocean temperature, trained against simulations from a 3D ocean model set up in a very simple configuration. We show that the regressor bases its forecasts heavily on, and is dependent on, variables known to be key to the physics, such as currents and density. By contrast, the regressor does not make heavy use of inputs such as location, which have limited direct physical impacts. The model requires nonlinear interactions between inputs in order to show any meaningful skill, in line with the highly nonlinear dynamics of the ocean. Further analysis interprets the ways certain variables are used by the regression model. We see that information about the vertical profile of the water column reduces errors in regions of convective activity, and information about the currents reduces errors in regions dominated by advective processes. Our results demonstrate that even a simple regression model is capable of learning much of the physics of the system being modeled. We expect that a similar sensitivity analysis could be usefully applied to more complex ocean configurations.

Information

Type
Application Paper
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2022. Published by Cambridge University Press

Table 1. Key parameter information for MITgcm simulation.

Figure 1. Plot of simulator temperature (°C) at 25 m below the surface (a) and at 13° E (d), for one particular day. Change in temperature over 1 day at 25 m below the surface (b) and at 13° E (e). Standard deviation in temperature at 25 m below the surface (c) and at 13° E (f). Time series at 57° N, 17° E, and −25 m (g), and at 55° S, 9° E, and −25 m (h). Note that the depth axis is scaled to give each GCM grid cell equal spacing. The simulator shows a realistic temperature distribution with warm surface water near the equator, and cooler water near the poles and in the deep ocean. Temperature changes are largest in the very north of the domain and throughout the southern region. Though changes per day are small, they accumulate over time to give cycles of around 0.2° in some regions of the domain.

Figure 2. Scatter plot of predictions against truth for both training (a) and validation (b) datasets for the control regressor. Over the training set, the regressor does a good job of predicting both the dominant near-zero behavior and the very rare temperature changes of more than ±0.002°. Over the validation dataset, the regressor drops in accuracy, with a tendency to underpredict, particularly for large changes, but still shows some skill.

Table 2. RMS errors, and RMS errors normalized by the control, for a series of withholding experiments. Results from a persistence model (bottom row) are included for comparison. The two withholding experiments that make the largest difference to each error metric are shown in italic. Experiments are ordered by RMS error over the training dataset, with the variables most necessary for predictive skill appearing nearest the bottom. It is critical to include polynomial interactions. Information on the vertical structure and on the currents is also necessary for good predictive skill.
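
The error metrics described in this caption can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and all numbers are hypothetical, and the persistence model is assumed to predict zero temperature change, which is the usual meaning of persistence when the target is a change over one step.

```python
import math

def rms_error(predicted, truth):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(truth):
        raise ValueError("sequences must have the same length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, truth))
    return math.sqrt(squared / len(truth))

def normalized_rms(experiment_rms, control_rms):
    """RMS error of an experiment expressed as a multiple of the control's."""
    return experiment_rms / control_rms

# Hypothetical daily temperature changes (degrees C), for illustration only.
truth = [0.001, -0.002, 0.0, 0.003]
control_pred = [0.001, -0.001, 0.0, 0.002]    # regressor with all inputs
withheld_pred = [0.0, -0.001, 0.001, 0.001]   # regressor with one input group withheld
persistence_pred = [0.0] * len(truth)         # assumed: persistence = no change

control_rms = rms_error(control_pred, truth)
for name, pred in [("control", control_pred),
                   ("withheld", withheld_pred),
                   ("persistence", persistence_pred)]:
    rms = rms_error(pred, truth)
    print(f"{name:12s} rms={rms:.3e} normalized={normalized_rms(rms, control_rms):.2f}")
```

A normalized value above 1 means the experiment performs worse than the control, so the withheld inputs carried useful information.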

Figure 3. Mean absolute error of predictions (°C) at −25 m depth (a) and 13° E (b). The errors are largest in the very north of the domain and in the southern region, in locations where the temperature change itself is largest. Comparing with Figures A1 and A2, we see that errors are largest in areas of increased vertical fluxes and in locations with high meridional diffusion and high zonal advection.

Figure 4. (a) Coefficients of the control regressor, averaged over all input locations for each variable type and for each set of nonlinear combinations of variables. (b) Coefficients for polynomial terms representing temperature–temperature interactions across all pairs of input locations. We see that density is very heavily weighted and therefore provides a large part of the predictive skill of this model. This is in line with our physical understanding that density changes drive convective temperature change. The interactions between the temperature at the point we are predicting and the temperature at surrounding points are also very highly weighted. This is in line with our physical knowledge of advection and diffusion driving temperature change.

Figure 5. Scatter plot of predictions against truth over the training dataset for the regressor trained with no polynomial interaction terms. A purely linear regressor (trained without nonlinear interactions) is unable to capture the behavior of the system. This is expected as we know the underlying system to be highly nonlinear.
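
The contrast between a purely linear regressor and one with polynomial interaction terms can be illustrated with a toy problem. The sketch below is not the authors' setup: it uses plain least squares from NumPy, synthetic random inputs, and a hypothetical target that is the product of two inputs (loosely analogous to an advective term, where a velocity multiplies a temperature gradient). All names and data are assumptions for illustration.

```python
import itertools
import numpy as np

def add_pairwise_products(X):
    """Append all degree-2 products x_i * x_j (i <= j) to the columns of X."""
    n_features = X.shape[1]
    pairs = itertools.combinations_with_replacement(range(n_features), 2)
    products = [X[:, i] * X[:, j] for i, j in pairs]
    return np.column_stack([X] + products)

def fit_and_rms(features, y):
    """Ordinary least squares with an intercept; returns the training RMS error."""
    design = np.column_stack([np.ones(len(y)), features])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = design @ coeffs - y
    return float(np.sqrt(np.mean(residual ** 2)))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))   # hypothetical inputs, e.g. a velocity and a gradient
y = X[:, 0] * X[:, 1]           # toy target that is purely an interaction term

linear_rms = fit_and_rms(X, y)                        # no interaction terms
poly_rms = fit_and_rms(add_pairwise_products(X), y)   # with pairwise products

print(f"linear-only RMS error:       {linear_rms:.3f}")
print(f"with interactions RMS error: {poly_rms:.3e}")
```

In this toy case the linear fit has almost no skill because the target has no linear dependence on either input alone, while the fit with pairwise products recovers it to numerical precision.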

Figure 6. (a) Cross section at 13° E of mean absolute error for the regressor trained using a 2D stencil. (b) Difference between this and the control run (Figure 3b). When information about the vertical structure is withheld, errors in the regressor's predictions increase in a region north of 50° and south of −30°. Comparing this with Figures A1 and A2, we can see how the areas of increased errors correspond to particular processes.

Figure 7. (a) Cross section at 13° E of mean absolute error for the regressor trained with information on the currents withheld. (b) Difference between this and the control run (Figure 3b). When currents are withheld, errors in the regression model increase north of 55°, and in a broader region south of −35°. Comparing this with Figures A1 and A2, we can see how the areas of increased errors correspond to particular processes.

Figure A1. Average absolute zonal (a), meridional (b), and vertical (c) advective fluxes of temperature at 13° E. Horizontal advective fluxes are largest in the southern region of the domain, associated with the ACC-like current. There is a large amount of vertical advection north of 55°, and at −30 to −40°, associated with regions of upwelling and downwelling.

Figure A2. Average absolute zonal (a), meridional (b), and (explicit) vertical (c) diffusive temperature fluxes, and convective (implicit vertical diffusive) temperature fluxes (d) at 13° E. There are large amounts of meridional diffusion associated with the ACC-like jet in the south. Zonal diffusion occurs at mid-depth in the north of the domain, and just north of −40°. Vertical diffusion occurs throughout the south of the domain, and in a small region just south of 50°. Convection occurs throughout the domain, and is particularly noteworthy in the upper waters of the ocean north of 50° and south of −35°.

Figure B1. Coefficients of the control regressor for each input location and for each variable type, shown for linear inputs (top row) and for each set of nonlinear combinations of variables.

Table C1. Table showing RMS errors and skill scores for four models trained to predict temperature change over an increasing forecast period. RMS errors increase with forecast period, but skill scores are largely unaffected.
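
One way to see how RMS errors can grow with forecast period while skill scores stay roughly constant is to measure skill relative to a reference forecast whose error grows at a similar rate. The caption does not state which skill score definition is used, so the sketch below assumes a common mean-squared-error form relative to a reference such as persistence. The function name and all numbers are hypothetical.

```python
def skill_score(model_rms, reference_rms):
    """Assumed definition: 1 - (model MSE / reference MSE).

    A value of 1 is a perfect forecast; 0 means no better than the reference.
    """
    if reference_rms <= 0:
        raise ValueError("reference RMS error must be positive")
    return 1.0 - (model_rms / reference_rms) ** 2

# Hypothetical RMS errors (degrees C), for illustration only.
short_period = skill_score(model_rms=1.0e-4, reference_rms=4.0e-4)
long_period = skill_score(model_rms=5.0e-4, reference_rms=2.0e-3)

print(f"short forecast period skill: {short_period:.4f}")
print(f"long forecast period skill:  {long_period:.4f}")
```

Here the model error is five times larger for the longer period, but because the reference error grows by the same factor, the skill score is unchanged.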