
A catalogue of complex radio sources in the Rapid ASKAP Continuum Survey created using a self-organising map

Published online by Cambridge University Press:  17 January 2025

Afrida Alam*
Affiliation:
E.A. Milne Centre for Astrophysics, University of Hull, Kingston-upon-Hull, UK
Kevin Pimbblet
Affiliation:
E.A. Milne Centre for Astrophysics, University of Hull, Kingston-upon-Hull, UK Centre of Excellence for Data Science, AI, and Modelling (DAIM), University of Hull, Kingston-upon-Hull, UK
Yjan Gordon
Affiliation:
Department of Physics, University of Wisconsin-Madison, Madison, WI, USA
*
Corresponding author: Afrida Alam; Email: a.alam-2019@hull.ac.uk

Abstract

Next generations of radio surveys are expected to identify tens of millions of new sources, and identifying and classifying their morphologies will require novel, more efficient methods. Self-organising maps (SOMs), a type of unsupervised machine learning, can be used to address this problem. We map 251 259 multi-Gaussian sources from the Rapid ASKAP Continuum Survey (RACS) onto a SOM with discrete neurons. Similarity metrics, such as the Euclidean distance, are used to identify the best-matching unit (BMU) for each input image. We establish a reliability threshold by visually inspecting a subset of input images and their corresponding BMUs. We label the individual neurons based on their observed morphologies, and these labels are included in our value-added catalogue of RACS sources. Sources whose Euclidean distance to their BMU is $\lesssim$5 (accounting for approximately 79% of sources) have an estimated $\gt$90% reliability for their SOM-derived morphological labels. This reliability falls below 70% at Euclidean distances $\gtrsim$7; beyond this threshold it is unlikely that the morphological label will accurately describe a given source. Our catalogue of complex radio sources from RACS, with the SOM-derived morphological labels from this work, will be made publicly available.
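The BMU lookup described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the SOM weight array shape and the function name are assumptions, and the paper's pipeline (e.g. any rotational-invariance handling) may differ.

```python
import numpy as np

def best_matching_unit(image, som_weights):
    """Find the BMU for a flattened input image on a SOM grid.

    som_weights has shape (grid_h, grid_w, n_pixels); image has
    shape (n_pixels,). Returns the (row, col) coordinate of the
    neuron with the smallest Euclidean distance to the image,
    together with that distance.
    """
    # Euclidean distance from the image to every neuron's weight vector
    dists = np.linalg.norm(som_weights - image, axis=-1)
    # Index of the minimum, converted back to grid coordinates
    idx = np.unravel_index(np.argmin(dists), dists.shape)
    return idx, dists[idx]
```

The returned distance is the quantity against which the paper's reliability thresholds ($\lesssim$5 and $\gtrsim$7) are applied.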

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of the Astronomical Society of Australia

Figure 1. The preprocessing stages for a randomly chosen RACS image cutout. Panel (a) shows the original image with its RA and Dec coordinates. Panel (b) shows the distribution of the pixel values, along with the upper bound of the noise estimate and the mask applied (given by the noise estimate multiplied by a minimum signal-to-noise value of 2). Panel (c) shows the original image with the mask overlaid. Once the mask is applied, we log-scale the remaining pixels and normalise them from 0 to 1, giving the final preprocessed image in panel (d).
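The masking and scaling steps in this caption can be sketched as below. This is a hedged illustration under stated assumptions: the caption does not specify the noise estimator, so a robust MAD-based sigma is assumed here, and `preprocess_cutout` is a hypothetical name, not the paper's code.

```python
import numpy as np

def preprocess_cutout(image, min_snr=2.0):
    """Mask, log-scale, and normalise a cutout, per the caption's steps.

    Assumption: the noise estimate is a robust standard deviation
    derived from the median absolute deviation (MAD).
    """
    # Robust noise estimate (assumed MAD-based sigma)
    sigma = 1.4826 * np.median(np.abs(image - np.median(image)))
    # Mask pixels below min_snr times the noise estimate
    masked = np.where(image > min_snr * sigma, image, 0.0)
    # Log-scale the remaining pixels (log1p keeps masked zeros at zero)
    scaled = np.log1p(masked)
    # Normalise to the [0, 1] range
    peak = scaled.max()
    if peak > 0:
        scaled /= peak
    return scaled
```

With `min_snr=2` this reproduces the mask choice motivated in Figure 2: faint noise pixels are zeroed while the source structure is retained.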


Figure 2. The original cutout and the preprocessed versions produced with different values of the minimum signal-to-noise ratio (0, 1, 2, 3, and 4) for the mask. The value was set to 2, as the images show this is sufficient to mask the majority of the noise without losing much information.


Table 1. The hyperparameters used in the four training stages: the width $\sigma$ of the neighbourhood function $G_{ij}$, the learning rate $\alpha$, and the numbers of rotations and iterations.


Figure 3. The trained 10×10 SOM with manually assigned morphological labels: C (Compact), EC (Extended Compact), CD (Connected Double), SD (Split Double), T (Triple), and U/A (Uncertain/Ambiguous). The axis labels indicate the neuron coordinates in the SOM grid, such that the top-left neuron is (0, 0) with morphological label EC. The SOM can also be divided into four quadrants: top left, top right, bottom left, and bottom right (marked in red) for additional analysis.


Figure 4. Density map showing the Best-Matching Unit (BMU) count across the trained SOM.


Figure 5. The distribution of the Euclidean distance between input images and their corresponding BMU neuron.


Table 2. Intervals based on the Euclidean distance between randomly chosen input images and their BMUs.


Figure 6. Upper panel (a): A manual validation of the match between original input images in the full validation sample and their corresponding BMU, where the sample is divided into smaller intervals on the Euclidean distance (Table 2). Lower panels (b–e): Distribution of the ‘Yes’ and ‘No’ matches from the validation scheme above split into the SOM quadrants: top left quadrant (b), top right quadrant (c), bottom left quadrant (d), and bottom right quadrant (e).


Figure 7. Examples of ‘Yes’ matches, showing each input image with its corresponding preprocessed image and best-matching unit (BMU).


Figure 8. Examples of ‘No’ matches, showing each input image with its corresponding preprocessed image and best-matching unit (BMU).


Figure 9. The distribution of the Euclidean distance between input images and their corresponding BMUs for the ‘Yes’ and ‘No’ matches in the validation sample (a). The distance distributions for the sources in the validation sample grouped into SOM regions: top left quadrant (b), top right quadrant (c), bottom left quadrant (d), and bottom right quadrant (e).


Figure 10. The best-matching, 50th-percentile, and 90th-percentile images for the labels Compact (C), Extended Compact (EC), and Connected Double (CD).


Figure 11. The best-matching, 50th-percentile, and 90th-percentile images for the labels Split Double (SD), Triple (T), and Uncertain/Ambiguous (U/A).


Table 3. Summary of the morphological label classifications. From left to right: the morphological label, the number of neurons assigned that label, the total number of sources in the RACS catalogue after label transfer, and the split of the sources into each reliability percentage from the Euclidean-distance-based validation process.


Figure 12. The 12 sources with the largest Euclidean distances between their input images and BMUs. For each source we give its source name, RA, Dec, BMU coordinate in the SOM grid, and its morphological label after label transfer. The distances for these sources range from 34.36 to 43.82, and all have a reliability percentage of around 4.0% based on the validation scheme.


Table 4. Description of the columns in the catalogue created in this paper.


Table 5. The first 30 rows from the final catalogue of complex sources created using the SOM.


Figure A1. For each neuron in the top left quadrant of the SOM (Fig. 3), the distribution of the Euclidean distances between that neuron and all input images for which it was chosen as the BMU.


Figure A2. For each neuron in the top right quadrant of the SOM (Fig. 3), the distribution of the Euclidean distances between that neuron and all input images for which it was chosen as the BMU.


Figure A3. For each neuron in the bottom left quadrant of the SOM (Fig. 3), the distribution of the Euclidean distances between that neuron and all input images for which it was chosen as the BMU.


Figure A4. For each neuron in the bottom right quadrant of the SOM (Fig. 3), the distribution of the Euclidean distances between that neuron and all input images for which it was chosen as the BMU.