GalProTE: Galactic properties mapping using transformer encoder

Published online by Cambridge University Press: 01 July 2025

Omar Anwar*
Affiliation:
International Centre for Radio Astronomy Research (ICRAR), The University of Western Australia (UWA), Crawley, WA, Australia
Brent Groves
Affiliation:
International Centre for Radio Astronomy Research (ICRAR), The University of Western Australia (UWA), Crawley, WA, Australia
Luca Cortese
Affiliation:
International Centre for Radio Astronomy Research (ICRAR), The University of Western Australia (UWA), Crawley, WA, Australia
Adam Brian Watts
Affiliation:
International Centre for Radio Astronomy Research (ICRAR), The University of Western Australia (UWA), Crawley, WA, Australia
*Corresponding author: Omar Anwar, Email: omar.anwar@uwa.edu.au

Abstract

This work introduces GalProTE, a proof-of-concept machine learning model that uses a Transformer Encoder architecture to efficiently determine the stellar age, metallicity, and dust attenuation of galaxies from optical spectra. Designed to address the challenges posed by the vast datasets produced by modern astronomical surveys, GalProTE offers a significant improvement in processing speed while maintaining accuracy. Using the E-MILES spectral library, we generate a dataset of 111936 diverse templates by expanding the original 636 simple stellar population models with varying extinction levels, combinations of multiple spectra, and noise modifications. This ensures robust training over the spectral range of 4750–7100 Å at a resolution of 2.5 Å. The GalProTE architecture employs four parallel attention-based encoders with varying kernel sizes to capture diverse spectral features. On the synthetic test dataset, the model achieves a mean squared error (MSE) of 0.27% with a standard deviation of 0.10% between the input spectra and the GalProTE-generated spectra. Performance evaluation against real data from two galaxies in the PHANGS-MUSE survey (NGC4254 and NGC5068) demonstrates its ability to extract physical parameters efficiently, with spectral fit residuals showing means of -0.02% and 0.28% and standard deviations of 4.3% and 5.3%, respectively. To contextualize these results, we compare the age, metallicity, and dust attenuation maps generated by GalProTE with those of pPXF, a state-of-the-art spectral fitting tool. While pPXF achieves robust results, it requires approximately 11 s per spectrum. In contrast, GalProTE processes a spectrum in less than 4 ms, a speedup factor exceeding 2750, while also consuming 68 times less power per spectrum.
The comparison with pPXF maps from PHANGS-MUSE underscores GalProTE’s capacity to enhance traditional methods through machine learning, paving the way for faster, more energy-efficient, and more comprehensive analyses of galactic properties. This study demonstrates the potential of GalProTE as an efficient, scalable, and sustainable solution for processing large astronomical surveys.
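
The headline numbers in the abstract can be checked with simple arithmetic. The short Python sketch below recomputes the speedup from the quoted per-spectrum timings and the ratio of generated templates to original models. The cube size `HYPOTHETICAL_N_SPECTRA` is an illustrative assumption, not a figure from the article.

```python
# Figures quoted in the abstract.
PPXF_SECONDS = 11.0        # "approximately 11 s per spectrum"
GALPROTE_SECONDS = 0.004   # upper bound: "less than 4 ms"
N_ORIGINAL_MODELS = 636    # simple stellar population models
N_TEMPLATES = 111936       # templates after expansion


def speedup(slow_seconds: float, fast_seconds: float) -> float:
    """Ratio of per-spectrum processing times."""
    return slow_seconds / fast_seconds


def hours_to_process(n_spectra: int, seconds_per_spectrum: float) -> float:
    """Wall-clock hours to process n_spectra one after another."""
    return n_spectra * seconds_per_spectrum / 3600.0


# Because 4 ms is an upper bound on the GalProTE time, this is a lower
# bound on the speedup (about 2750).
min_speedup = speedup(PPXF_SECONDS, GALPROTE_SECONDS)

# Average number of generated templates per original model.
templates_per_model = N_TEMPLATES / N_ORIGINAL_MODELS

# Assumption: a hypothetical data cube size, chosen only for illustration.
HYPOTHETICAL_N_SPECTRA = 100_000
print(f"speedup >= {min_speedup:.0f}")
print(f"templates per model = {templates_per_model:.0f}")
print(f"pPXF: {hours_to_process(HYPOTHETICAL_N_SPECTRA, PPXF_SECONDS):.1f} h")
print(f"GalProTE: {hours_to_process(HYPOTHETICAL_N_SPECTRA, GALPROTE_SECONDS):.2f} h")
```

The ratio of templates to models is an average only; the breakdown by extinction level and by template combination is given in Table 1.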

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of the Astronomical Society of Australia

Figure 1. Overview of the proposed deep learning model. GalProTE accepts a normalised spectrum as input, which is processed through four independent parallel blocks. Each block employs a self-attention mechanism to emphasize distinct regions of the spectrum and convolutional layers to extract features. The transformer within each block encodes these features, which are then passed to two fully connected layers. One layer predicts the age-metallicity grid for the input spectrum, while the other predicts the dust attenuation. Using these predictions, a reconstructed spectrum is generated for comparison with the input spectrum.
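
The final step described in the caption, building a reconstructed spectrum from the predicted grid and dust attenuation, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the grid weights are flattened into a list aligned with a list of template spectra, and it uses a hypothetical attenuation curve, $A_\lambda = A_v \, (5500\,\text{Å}/\lambda)$, chosen only for simplicity. The function names are illustrative.

```python
def attenuation_factor(wavelength_angstrom: float, a_v: float) -> float:
    """Flux multiplier for an extinction of A_lambda magnitudes.

    Assumption: A_lambda = A_v * (5500 / lambda), a hypothetical curve used
    only for illustration. The conversion from magnitudes to a flux ratio,
    10 ** (-0.4 * A_lambda), is the standard definition.
    """
    a_lambda = a_v * (5500.0 / wavelength_angstrom)
    return 10.0 ** (-0.4 * a_lambda)


def reconstruct(weights, templates, wavelengths, a_v):
    """Weighted sum of template spectra, then dust attenuation.

    weights     : flattened age-metallicity grid, one weight per template
    templates   : list of template spectra, each sampled on `wavelengths`
    wavelengths : wavelengths in Angstrom
    a_v         : predicted dust attenuation in magnitudes
    """
    total = sum(weights)
    spectrum = []
    for i, lam in enumerate(wavelengths):
        flux = sum(w * t[i] for w, t in zip(weights, templates)) / total
        spectrum.append(flux * attenuation_factor(lam, a_v))
    return spectrum


# Toy usage: two flat templates mixed equally, with A_v = 1 mag.
wavelengths = [4750.0, 5500.0, 7100.0]
templates = [[1.0, 1.0, 1.0], [3.0, 3.0, 3.0]]
model = reconstruct([0.5, 0.5], templates, wavelengths, a_v=1.0)
print(model)
```

With this curve the blue end of the toy spectrum is attenuated more strongly than the red end, which is the qualitative behaviour expected of dust.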

Table 1. Breakdown of single and combination templates for each value of $A_v$.

Figure 2. Average of Age-Z grids for the entire dataset, before and after the processing.

Table 2. Masked spectral regions to exclude major emission lines.
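
Masking of this kind can be sketched as follows. The contents of Table 2 are not reproduced here, so the two windows in `HYPOTHETICAL_WINDOWS` are placeholders chosen to bracket Hβ (4861 Å) and the Hα and [N II] complex (6548–6583 Å); they are not the values used in the article. The 2.5 Å wavelength step and the choice to set masked pixels to zero are likewise assumptions made for illustration.

```python
# Assumption: placeholder windows in Angstrom, NOT the values of Table 2.
HYPOTHETICAL_WINDOWS = [
    (4851.0, 4871.0),  # around H-beta
    (6540.0, 6590.0),  # around H-alpha and [N II]
]


def keep_mask(wavelengths, windows):
    """True for pixels to keep, False for pixels inside any masked window."""
    return [not any(lo <= lam <= hi for lo, hi in windows) for lam in wavelengths]


def apply_mask(fluxes, keep, fill=0.0):
    """Replace masked pixels with `fill` (an assumed convention)."""
    return [f if k else fill for f, k in zip(fluxes, keep)]


# Assumption: an illustrative grid covering 4750-7100 Angstrom in 2.5 Angstrom steps.
wavelengths = [4750.0 + 2.5 * i for i in range(941)]
fluxes = [1.0] * len(wavelengths)

keep = keep_mask(wavelengths, HYPOTHETICAL_WINDOWS)
masked_fluxes = apply_mask(fluxes, keep)
n_masked = sum(1 for k in keep if not k)
print(f"{n_masked} of {len(wavelengths)} pixels masked")
```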

Figure 3. Block diagram of the model’s data flow from the input to the Encoder’s output. The self-attention mechanism extracts context from the input spectrum, which is then added back to the input spectrum, followed by convolution, batch normalisation, and ReLU activation. After average pooling, the data pass through the Transformer Encoder layer with three attention heads, producing the block output.
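
The first part of this data flow can be illustrated with a dependency-free Python sketch. It is a simplified stand-in rather than the published model: the attention uses a single head with identity projections on scalar pixel values (a trained model would learn its projections), the convolution kernel and pool size are hypothetical, and batch normalisation and the Transformer Encoder layer are omitted.

```python
import math


def self_attention_context(x):
    """Single-head scaled dot-product self-attention over scalar tokens.

    Assumption: identity query/key/value projections; with one feature per
    token the scaling factor sqrt(d_k) equals 1.
    """
    context = []
    for query in x:
        scores = [query * key for key in x]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        norm = sum(exps)
        context.append(sum(e * v for e, v in zip(exps, x)) / norm)
    return context


def conv1d_same(x, kernel):
    """1D convolution with zero padding; the kernel length must be odd."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def relu(x):
    return [max(0.0, v) for v in x]


def avg_pool(x, size):
    """Non-overlapping average pooling; a trailing remainder is dropped."""
    return [sum(x[i:i + size]) / size for i in range(0, len(x) - size + 1, size)]


def block_front_end(spectrum, kernel, pool_size):
    """Attention context added to the input, then conv, ReLU, and pooling."""
    context = self_attention_context(spectrum)
    enriched = [s + c for s, c in zip(spectrum, context)]
    features = relu(conv1d_same(enriched, kernel))
    return avg_pool(features, pool_size)


# Toy usage with a hypothetical 3-pixel smoothing kernel and pool size 2.
toy_spectrum = [1.0] * 8
toy_output = block_front_end(toy_spectrum, kernel=[1 / 3, 1 / 3, 1 / 3], pool_size=2)
print(toy_output)
```

In the model described by the caption, four such blocks run in parallel with different kernel sizes, and each pooled output is then passed to a Transformer Encoder layer.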

Figure 4. Mean context vectors learned by the four parallel blocks of the model, which use different kernel sizes so that each block emphasises features of a different size. These contexts are added to the input spectra before features are extracted to predict age, metallicity and dust attenuation.

Figure 5. A test spectrum with 0% Gaussian noise, high-noise patches, and masked patches. The first subplot compares the noisy input spectrum with the predicted spectrum. Some of the high-noise patches are marked with brown rectangles, and green ellipses mark some of the masked patches in the input spectrum. The second subplot shows the residuals between the input and predicted spectra. The third subplot displays the grid predicted by the model, with the mean age and metallicity bin marked. The fourth subplot presents the grid of the original noise-free input spectrum. The residuals are primarily concentrated in high-noise and masked patches, with some impact from the mismatch of 0.1 in $A_v$. Overall, the predicted grid and the reconstructed spectrum demonstrate promising accuracy.
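
The mean age and metallicity marked on such a grid can be computed as weighted averages over the grid cells. The sketch below is a hedged illustration: it assumes the mean age is taken in log units (the age errors in Figure 11 are quoted in dex, but the article's exact convention is not stated here), and the function name and toy bin values are hypothetical.

```python
def grid_means(grid, log_ages, metallicities):
    """Weight-averaged log age and metallicity of an age-metallicity grid.

    grid[i][j]    : weight of age bin i and metallicity bin j
    log_ages      : log10(age) of each age bin (assumed averaging space)
    metallicities : [M/H] of each metallicity bin
    """
    total = sum(sum(row) for row in grid)
    mean_log_age = sum(log_ages[i] * sum(row) for i, row in enumerate(grid)) / total
    mean_metallicity = sum(
        metallicities[j] * weight
        for row in grid
        for j, weight in enumerate(row)
    ) / total
    return mean_log_age, mean_metallicity


# Toy grid: two equally weighted populations (hypothetical bin values).
toy_grid = [
    [0.5, 0.0],
    [0.0, 0.5],
]
mean_log_age, mean_metallicity = grid_means(
    toy_grid, log_ages=[9.0, 10.0], metallicities=[-1.0, 0.0]
)
print(mean_log_age, mean_metallicity)
```

A mean computed this way can fall between populations, which is why a grid's brightest cell and its mean bin need not coincide.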

Figure 6. A test spectrum with 2% Gaussian noise, high-noise patches, and masked patches. The predicted mean metallicity aligns closely with the original metallicity on the grid, while the predicted age shows some error. The model tends to struggle at higher ages. However, the most dominant population on the predicted grid (bright yellow) is notably closer to the original mean age.

Figure 7. A test spectrum with 4% Gaussian noise, high-noise patches, and masked patches. This example poses a significant challenge due to the high level of Gaussian noise and the close proximity of the four populations on the original grid. Despite these difficulties, the predicted mean values remain close to the original means, and the spectral fit is of reasonable quality, showcasing the model’s robustness under adverse conditions.

Figure 8. Histogram (top) and confusion matrix (bottom) of the original and predicted mean ages.

Figure 9. Histogram (top) and confusion matrix (bottom) of the original and predicted mean metallicities.

Figure 10. Histogram (top) and confusion matrix (bottom) of the original and predicted dust attenuation.

Figure 11. Errors in age (dex), dust attenuation ($A_v$) and metallicity ([M/H]) predictions plotted against the % MSE of the test set.

Figure 12. Histograms of the residuals between the input and the predicted spectra.

Figure 13. Age maps and running difference for NGC4254.

Figure 14. Age maps and running difference for NGC5068.

Figure 15. Metallicity maps and running difference for NGC4254.

Figure 16. Metallicity maps and running difference for NGC5068.

Figure 17. Dust attenuation maps and running difference for NGC4254.

Figure 18. Dust attenuation maps and running difference for NGC5068.

Figure A1. Alternative approach: Dust attenuation maps of NGC4254.

Figure A2. Alternative approach: Dust attenuation maps of NGC5068.