Classification-informed estimation: the role of water-type clustering to improve neural network generalization for salinity and temperature estimation in coastal waters

Published online by Cambridge University Press: 04 June 2025

Solomon White*
Affiliation:
School of Infrastructure and Environment, University of Edinburgh, Edinburgh, UK
Encarni Medina-Lopez
Affiliation:
School of Infrastructure and Environment, University of Edinburgh, Edinburgh, UK
Tiago Silva
Affiliation:
Centre for Environment, Fisheries and Aquaculture Science (CEFAS), Lowestoft, UK
Evangelos Spyrakos
Affiliation:
Biological and Environmental Sciences, University of Stirling, Stirling, UK
Laurent Amoudry
Affiliation:
Marine Physics and Ocean Climate, National Oceanography Centre (NOC), Southampton, UK
Adrien Martin
Affiliation:
Noveltis, Labège, France
*Corresponding author: Solomon White; Email: solomon.white@ed.ac.uk

Abstract

Sea surface salinity and temperature are essential climate variables in monitoring and modeling ocean health. Multispectral ocean color satellites allow the estimation of these properties at a resolution of 10 to 300 m, which is required to correctly represent their spatial variability in coastal waters. This paper investigates the effect of pre-applying an unsupervised classification on the performance of both temperature and salinity inversion. Two methodologies were explored: clustering based solely on spectral radiances, and clustering applied directly to satellite images. The former improved model generalization by identifying similar water clusters across different locations, reducing location dependency. It also showed that cluster type correlates with salinity and temperature distributions, thereby enhancing regression model performance and improving the RMSE of a global ocean color sea surface temperature regression model by 10%. The latter approach, applying clustering directly to satellite images, incorporated spatial information into the models and enabled the identification of front boundaries and gradient information, improving the RMSE of global sea surface temperature models by 20% and of sea surface salinity models by 30%, compared to the initial ocean color model. Beyond improving algorithm performance, optical water classification can be used to monitor and interpret changes to water optics, including algal blooms, sediment disturbance, or other climate change or anthropogenic disturbances. For example, the clusters have been used to show the impact of a category 4 hurricane landfall on the Mississippi estuarine region.
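The cluster-then-regress idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the centroids are assumed to be given (in the paper they come from unsupervised clustering), a one-feature least-squares line stands in for the neural network, and all function names and the `feature` parameter are hypothetical.

```python
import math
from collections import defaultdict


def nearest_centroid(spectrum, centroids):
    """Return the index of the centroid closest (Euclidean) to a spectrum."""
    return min(range(len(centroids)),
               key=lambda k: math.dist(spectrum, centroids[k]))


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b (stand-in for a per-cluster network)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def fit_per_cluster(spectra, targets, centroids, feature=0):
    """Group samples by water-type cluster and fit one model per cluster."""
    groups = defaultdict(lambda: ([], []))
    for spectrum, target in zip(spectra, targets):
        k = nearest_centroid(spectrum, centroids)
        groups[k][0].append(spectrum[feature])
        groups[k][1].append(target)
    return {k: fit_line(xs, ys) for k, (xs, ys) in groups.items()}


def predict(spectrum, centroids, models, feature=0):
    """Route a sample to its cluster's model; fails if that cluster was unseen."""
    slope, intercept = models[nearest_centroid(spectrum, centroids)]
    return slope * spectrum[feature] + intercept


# Toy data: two water types with different band-to-temperature relationships.
centroids = [(0.1, 0.1), (1.0, 0.9)]
spectra = [(0.0, 0.1), (0.1, 0.1), (0.2, 0.1), (0.9, 0.9), (1.0, 0.9), (1.1, 0.9)]
sst = [10.0, 10.5, 11.0, 18.2, 18.0, 17.8]
models = fit_per_cluster(spectra, sst, centroids)
```

Because each model only sees samples from one water type, a single global relationship does not have to explain both regimes, which is the intuition behind the reduced location dependency reported above.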

Information

Type
Application Paper
Creative Commons
Creative Common License - CCCreative Common License - BYCreative Common License - NC
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (http://creativecommons.org/licenses/by-nc/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Figure 1. Silhouette score and cluster visualization for number of clusters = 9. Feature space visualization for the first and second principal components, with PCA variance of 99% (number of components = 5).
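For readers unfamiliar with the silhouette score named in this caption, the following is a minimal standard-library sketch of the mean silhouette coefficient for an already-labelled set of points. It is illustrative only: the function name is hypothetical, the PCA step is not reproduced, and in practice a library routine would normally be used.

```python
import math


def mean_silhouette(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(point, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(points, [0, 0, 1, 1])  # compact, well-separated clusters
bad = mean_silhouette(points, [0, 1, 0, 1])   # clusters that cut across groups
```

Values close to 1 indicate compact, well-separated clusters, and negative values indicate points that sit closer to another cluster than to their own. Comparing the score across candidate cluster counts is a common way to choose the number of clusters.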

Figure 2. Clustered images showing the coast adjacent to the Apalachicola River Wildlife and Environmental Area, with land masked in white. Clusters show agreement between the different processes, with less noise in the dilation clustering. a) Original image, b) Simple clustered image, c) Merged clustered image (with minimum cluster size 25 pixels), d) Dilation smoothed clustered image.
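One plausible reading of the merged image in panel c) is sketched below: connected regions smaller than a minimum size are relabelled with the most common label along their border. This is an assumption about the merging step, not the authors' code. The function name is hypothetical, 4-connectivity is assumed, and land masking is not handled.

```python
from collections import Counter, deque


def merge_small_regions(labels, min_size=25):
    """Relabel 4-connected regions smaller than min_size pixels.

    labels is a 2D list of cluster indices. Each small region takes the most
    common label found along its border. The input grid is not modified.
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            label = labels[r0][c0]
            region, border = [], Counter()
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:  # breadth-first flood fill of one region
                r, c = queue.popleft()
                region.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if not (0 <= nr < rows and 0 <= nc < cols):
                        continue
                    if labels[nr][nc] != label:
                        border[labels[nr][nc]] += 1
                    elif not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(region) < min_size and border:
                new_label = border.most_common(1)[0][0]
                for r, c in region:
                    out[r][c] = new_label
    return out
```

The default of 25 pixels follows the minimum cluster size quoted in the caption. Regions are identified on the original grid in a single pass, so a merged region is not re-examined after relabelling.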

Figure 3. Cluster spread from the Global dataset focused on the Mediterranean (a) and the North-West Atlantic coast and Gulf of Mexico (b).

Figure 4. K-means clustering shows the different cluster classes against different input feature groups, enabling visualization of the feature weight and spectral variability for each classified cluster. a) K-means on pointwise spectral data shows separation of clusters by band; b) K-means on the segmented image shows less clear band importance.

Figure 5. Spectra for the different clusters for the global dataset (a), Gulf of Mexico (d), and the UK (g), and the corresponding kernel density functions for the sea surface temperature (°C) and salinity (PSU) distributions.

Figure 6. Global scatter plots of predicted against actual sea surface temperature for each segmented image model trained only on the selected cluster class.

Table 1. Model performance metrics for overall sea surface temperature and salinity models, with the best-performing model per region highlighted in gray
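As a worked illustration of the kind of metrics such a table summarizes, the sketch below computes RMSE, combines per-cluster RMSE values into a single sample-weighted (pooled) figure, and expresses a change in RMSE as a percentage. How the authors aggregated their cluster results is not stated on this page, so the pooling shown is an assumption, and all function names are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))


def pooled_rmse(per_cluster):
    """Combine (n_samples, rmse) pairs into the RMSE over all samples.

    Squared errors are weighted by sample count, so large clusters count
    for more than small ones (unlike a plain average of the RMSE values).
    """
    total = sum(n for n, _ in per_cluster)
    return math.sqrt(sum(n * r * r for n, r in per_cluster) / total)


def relative_improvement(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to the baseline model."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Toy example: a small cluster with a large error and a large cluster
# with a small error.
clusters = [(1, 3.0), (3, 1.0)]
unweighted = sum(r for _, r in clusters) / len(clusters)  # 2.0
weighted = pooled_rmse(clusters)                          # sqrt(3)
```

The gap between the unweighted and pooled values in the toy example shows why it matters whether cluster averages are weighted by the number of samples in each cluster.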

Figure 7. Cluster visualization for Mississippi outflow in the Gulf of Mexico against a true color Sentinel 2 image. The images are from September 2022 and September 2021, when there was significant flooding due to the landfall of Hurricane Ida, a Category 4 storm.

Figure 8. Cluster size for each season in the years 2018–2023 for the Mississippi outflow region shown in Figure 7.

Figure 9. Cluster class vs sea surface temperature in the Gulf of Mexico test region, adjacent to the Apalachicola River Wildlife and Environmental Area.

Author comment: Classification-informed estimation: the role of water-type clustering to improve neural network generalization for salinity and temperature estimation in coastal waters — R0/PR1

Comments

Special Issue Title: Tackling Climate Change with Machine Learning

Environmental Data Science

Dear Claire Monteleoni,

I am pleased to submit our manuscript entitled “Sea Surface Salinity and Temperature Estimation Using Unsupervised Clustering with Multispectral Ocean Colour Satellites” for consideration in the special issue “Tackling Climate Change with Machine Learning” in Environmental Data Science.

In this manuscript, we explore the crucial role of multispectral ocean colour satellites in enhancing the estimation of sea surface salinity (SSS) and temperature (SST), pivotal variables in monitoring ocean health and climate dynamics. Our study investigates the impact of unsupervised clustering techniques applied to satellite data, comparing methodologies based on spectral radiances and direct image clustering. We demonstrate significant improvements in model performance, reducing RMSE errors in global SST models by 20% and SSS models by 30% through enhanced spatial resolution and refined data interpretation capabilities.

Furthermore, our research highlights the broader implications of optical water classification, including its utility in monitoring environmental changes such as algal blooms, sediment disturbance, and responses to climate change events like hurricane impacts on estuarine regions.

Given the special issue’s focus on the interplay between artificial intelligence and its application to climate change, we believe our findings will contribute substantially to understanding the links between advanced data analytics, remote sensing, and climate modeling approaches in the environmental sciences.

We trust that our manuscript aligns well with the scope and objectives of this special issue. We appreciate your consideration of our work for publication in Environmental Data Science and look forward to your feedback.

Thank you for your time and consideration.

Sincerely,

Solomon White

University of Edinburgh

solomon.white@ed.ac.uk

Review: Classification-informed estimation: the role of water-type clustering to improve neural network generalization for salinity and temperature estimation in coastal waters — R0/PR2

Conflict of interest statement

Reviewer declares none.

Comments

Including water type classification as a pre-processing step to improve neural network generalisation for salinity and temperature estimation in coastal waters.

This paper introduces a neat methodology for kmeans clustering water types before input into a neural network to increase performance. As a way of incorporating some physical knowledge into the neural network, it seems to work well for specific use cases. However, I am missing some detail around the model, and the results may be oversold or not quite contextualized enough. I therefore recommend major revisions.

For reference – I am using the page numbers starting with pg 2 as the title page and line numbers are approximate.

Major:

I am struggling to understand what the actual predictive model is here because I’m not sure it is explicitly stated. E.g., what is the model in Figure 6 that relates the clustered images to an SST prediction? I assume it is some kind of neural network given the title but I think it needs to be further specified. Especially things like training/validation/testing data splits and model selection.

This may be related to my previous comment since I do not understand the model used here, but I am skeptical of the proposed methodology’s advancement. Table 1 is not convincing to me as it appears overfit to the training data. I notice the authors do comment on this in the conclusion as a caveat, but as the use case of these predictions is around coastal regions, I think the lack of ability to generalize is a concern. Is the skill in the global model entirely driven by correct open water prediction?

Minor

Abstract: There are percentage improvements quoted here but these are not explicitly referenced in the text, e.g. on pg 9 these could be incorporated into lines 33-46 (I assume that is where these numbers come from)

Pg3 L 10ish “trained using…” I think a word is missing

Pg3 L 10-16 split this paragraph up

Pg3 L 18ish “they model learning relationships” does not quite make sense to me

Section 2: Include the spatial resolution of both the CMEMS and Sentinel data, e.g. approximately how big is a pixel?

Pg 4-5 Since Kmeans is the only clustering algorithm used here, I do not think descriptions of the other algorithms are necessary

Pg 6 L 35 typo “model no extrapolating”

Fig 3 Please make text in the label larger

Fig 5 I suggest moving the panel headings to above the panels. Also please include SST and SSS units on the x axes (unless they are standardized, in which case please state this). I also think cluster 8 is erroneously included in panel h

Pg 11 L 48 specify that the panel is an SST prediction in the text. Maybe also include the ground truth SST in figure 9 for a visual comparison

Table 1: Scanning down the NMSE, as well as the training/testing differences, it appears to me that the init model is still the most preferable as it is more generalizable and much less over-fit

Pg 13: If the ideas of the two final paragraphs are to be juxtaposed, perhaps mention that there is some tension between the conclusion that data availability will hinder this work’s applicability in the global south, while the algorithm’s simplicity will encourage its use. I think we (all scientists) need to be careful about making broad statements such as these when they are nuanced, multi-faceted problems.

Pg 13 L 45: It’s -> It is

Review: Classification-informed estimation: the role of water-type clustering to improve neural network generalization for salinity and temperature estimation in coastal waters — R0/PR3

Conflict of interest statement

Reviewer declares none.

Comments

The authors conducted experiments on various datasets to validate the proposed algorithm. The results show that the proposed approach is useful to the domain. The manuscript is well-written.

Recommendation: Classification-informed estimation: the role of water-type clustering to improve neural network generalization for salinity and temperature estimation in coastal waters — R0/PR4

Comments

Both reviewers see the paper as an important contribution to the community. R1 provides detailed comments on what needs to be improved prior to acceptance. Thus, I am asking the authors to prepare a major revision. The details missing for the machine learning model and concerns about overfitting must be presented and discussed in more detail.

Decision: Classification-informed estimation: the role of water-type clustering to improve neural network generalization for salinity and temperature estimation in coastal waters — R0/PR5

Comments

No accompanying comment.

Author comment: Classification-informed estimation: the role of water-type clustering to improve neural network generalization for salinity and temperature estimation in coastal waters — R1/PR6

Comments

Dear Editor,

I am pleased to submit our manuscript entitled “Improving Neural Network Model Generalisation by Including Water Type Classification as Pre-Processing to the Satellite Image” for consideration in Environmental Data Science. This paper addresses the critical need for improved generalization in ocean color models used for monitoring sea surface properties, specifically sea surface temperature (SST) and salinity (SSS). By incorporating water type classification as a pre-processing step, our study demonstrates a significant improvement in model accuracy and spatial coherence, especially in diverse and data-scarce coastal environments.

We believe this manuscript aligns well with Environmental Data Science’s focus on innovative approaches to environmental monitoring and modeling. We look forward to the opportunity to contribute to the ongoing discourse on leveraging data science for ocean health assessment.

Thank you for your time and consideration. We look forward to your feedback.

Sincerely,

Solomon White

University of Edinburgh

Review: Classification-informed estimation: the role of water-type clustering to improve neural network generalization for salinity and temperature estimation in coastal waters — R1/PR7

Conflict of interest statement

Reviewer declares none.

Comments

I thank the authors for implementing my suggested changes to the manuscript, and find it improved. I still have places where I am a little confused, or need further detail. I therefore am recommending minor revisions.

Minor comments (note page numbers are taken from the EDS version)

Throughout: Please have consistent use of cluster indexing in figures. Currently, in some figures they are numbered 0-8 and in some they are numbered 1-9

Page 1: I’m not sure what has happened with the template used, but I think co-authors, affiliations and key words are now missing

Page 5 L17-22: Thank you for including this text. I note that testing and validation seem to be used interchangeably here. Where possible, the same datasets should not be used for both validation and testing, but especially not since the authors are using both early-stopping and k-fold cross validation. If the testing data is used as validation data for model selection (i.e. for early stopping, hyper-parameter tuning) then the neural network will most likely not be generalisable. On the other hand the authors do state that they hold out some buoy data. So please be clear, which data is used for training, which for validation, which is unseen testing data, and then what data is being used to generate the figures and metrics in the results section.

Page 9 L24: For readability, I suggest making this a new section and being very clear in the first sentence that these SSTs and SSSs are predictions from the neural net.

Page 9 L36: Since NMSE has been removed from the table, also remove from the text

Table 1: Are the cluster averages weighted by the number of samples in each cluster?

Figure 9b: I suggest adding a colour bar to indicate which cluster numbers are present here.
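The separation of training, validation, and unseen test data requested in the comment on Page 5 above can be sketched as follows. This is an illustration only and does not describe the authors' actual split: the sample schema (a dict with a `buoy` key), the fractions, and the function name are all assumptions.

```python
import random


def split_by_buoy(samples, test_fraction=0.2, val_fraction=0.2, seed=0):
    """Return (train, validation, test) lists with the test buoys held out.

    Whole buoys are reserved for testing so that no test location is seen
    during training or model selection. The validation set (for early
    stopping or hyper-parameter tuning) is drawn from the remaining samples.
    """
    rng = random.Random(seed)
    buoys = sorted({s["buoy"] for s in samples})
    rng.shuffle(buoys)
    n_test = max(1, round(len(buoys) * test_fraction))
    test_buoys = set(buoys[:n_test])

    test = [s for s in samples if s["buoy"] in test_buoys]
    rest = [s for s in samples if s["buoy"] not in test_buoys]
    rng.shuffle(rest)
    n_val = max(1, round(len(rest) * val_fraction))
    return rest[n_val:], rest[:n_val], test


# Toy data: 50 match-ups spread over five hypothetical buoys.
samples = [{"buoy": f"b{i % 5}", "sst": float(i)} for i in range(50)]
train, val, test = split_by_buoy(samples)
```

With this arrangement, early stopping and k-fold cross-validation would operate only on the training and validation portions, and the reported metrics would come from the held-out buoys.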

Review: Classification-informed estimation: the role of water-type clustering to improve neural network generalization for salinity and temperature estimation in coastal waters — R1/PR8

Conflict of interest statement

Reviewer declares none.

Comments

1. Highlight the best metrics in Table 1

2. Describe SST and SSS in captions of Tables and Figures wherever used

3. Give R2 values in Figure 6

5. Describe cluster name in Figure 5

6. It is not clear to me which neural network is used; please describe it.

7. Give an overview of work done in the paper in the form of a diagram for reader’s clarity
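For reference, the R2 value requested in comment 3 above is the coefficient of determination, which can be computed from predicted and actual values as in this minimal sketch (the function name is hypothetical):

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


actual_sst = [10.0, 12.0, 14.0]
perfect = r_squared(actual_sst, [10.0, 12.0, 14.0])    # exact predictions
mean_only = r_squared(actual_sst, [12.0, 12.0, 12.0])  # always predict the mean
```

A model that always predicts the mean of the observations scores 0, and a model that fits worse than that scores below 0, which makes R2 a useful complement to RMSE in a predicted-versus-actual scatter plot.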

Recommendation: Classification-informed estimation: the role of water-type clustering to improve neural network generalization for salinity and temperature estimation in coastal waters — R1/PR9

Comments

The two reviewers recommend a minor revision to prepare the manuscript for publication. Please study their comments in detail and provide a revision as suggested by the reviewers. I see priority on suggestions 1-5 of R2. All of R1’s comments should be considered.

Decision: Classification-informed estimation: the role of water-type clustering to improve neural network generalization for salinity and temperature estimation in coastal waters — R1/PR10

Comments

No accompanying comment.

Author comment: Classification-informed estimation: the role of water-type clustering to improve neural network generalization for salinity and temperature estimation in coastal waters — R2/PR11

Comments

I am writing to submit our manuscript for consideration for publication in Environmental Data Science. This study presents an innovative approach to enhancing the estimation of key ocean health parameters—sea surface temperature (SST) and sea surface salinity (SSS)—using multispectral satellite ocean colour data.

Our paper explores the impact of applying unsupervised classification techniques to satellite-derived data for improved inversion of SST and SSS. We compare two clustering approaches: one based on spectral radiances and another on satellite images directly. The results demonstrate that clustering based on spectral radiances reduces location dependency and improves regression model generalisation, leading to a 10% reduction in SST RMSE. Furthermore, clustering applied to satellite images integrates spatial information and identifies front boundaries and gradients, improving SST RMSE by 20% and SSS RMSE by 30%. This approach not only enhances model performance but also offers a valuable tool for monitoring and interpreting changes in water optics due to algal blooms, sediment disturbance, and other climate or anthropogenic disturbances. A case study showing the impact of a category 4 hurricane on the Mississippi estuarine region highlights the utility of this method for real-world environmental monitoring.

Review: Classification-informed estimation: the role of water-type clustering to improve neural network generalization for salinity and temperature estimation in coastal waters — R2/PR12

Conflict of interest statement

Reviewer declares none.

Comments

The authors have addressed my comments satisfactorily. I recommend accept.

Recommendation: Classification-informed estimation: the role of water-type clustering to improve neural network generalization for salinity and temperature estimation in coastal waters — R2/PR13

Comments

Please cite the datasets mentioned in the Data Availability Statement. See Cambridge University Press advice on data citation here: https://www.cambridge.org/core/services/why-cite-data

Decision: Classification-informed estimation: the role of water-type clustering to improve neural network generalization for salinity and temperature estimation in coastal waters — R2/PR14

Comments

No accompanying comment.