WALLABY Pilot Survey: HI source-finding with a machine learning framework

Published online by Cambridge University Press:  10 February 2025

Li Wang*
Affiliation:
ATNF, CSIRO Space and Astronomy, Bentley, WA, Australia
O. Ivy Wong
Affiliation:
ATNF, CSIRO Space and Astronomy, Bentley, WA, Australia; International Centre for Radio Astronomy Research (ICRAR), University of Western Australia, Crawley, WA, Australia; ARC Centre of Excellence for All-Sky Astrophysics in 3 Dimensions (ASTRO 3D), Australia
Tobias Westmeier
Affiliation:
International Centre for Radio Astronomy Research (ICRAR), University of Western Australia, Crawley, WA, Australia; ARC Centre of Excellence for All-Sky Astrophysics in 3 Dimensions (ASTRO 3D), Australia
Chandrashekar Murugeshan
Affiliation:
ATNF, CSIRO Space and Astronomy, Bentley, WA, Australia; ARC Centre of Excellence for All-Sky Astrophysics in 3 Dimensions (ASTRO 3D), Australia
Karen Lee-Waddell
Affiliation:
ATNF, CSIRO Space and Astronomy, Bentley, WA, Australia; International Centre for Radio Astronomy Research (ICRAR), University of Western Australia, Crawley, WA, Australia; International Centre for Radio Astronomy Research (ICRAR), Curtin University, Bentley, WA, Australia
Yuanzhi Cai
Affiliation:
CSIRO Mineral Resources, Kensington, WA, Australia
Xiu Liu
Affiliation:
ATNF, CSIRO Space and Astronomy, Bentley, WA, Australia; Western Australian School of Mines: Minerals, Energy and Chemical Engineering, Curtin University, Perth, WA, Australia
Austin Xiaofan Shen
Affiliation:
ATNF, CSIRO Space and Astronomy, Bentley, WA, Australia
Jonghwan Rhee
Affiliation:
International Centre for Radio Astronomy Research (ICRAR), University of Western Australia, Crawley, WA, Australia
Helga Dénes
Affiliation:
School of Physical Sciences and Nanotechnology, Yachay Tech University, Urcuquí, Ecuador
Nathan Deg
Affiliation:
Department of Physics, Engineering Physics, and Astronomy, Queen’s University, Kingston, ON, Canada
Peter Kamphuis
Affiliation:
Faculty of Physics and Astronomy, Astronomical Institute (AIRUB), Ruhr University Bochum, Bochum, Germany
Barbara Catinella
Affiliation:
International Centre for Radio Astronomy Research (ICRAR), University of Western Australia, Crawley, WA, Australia; ARC Centre of Excellence for All-Sky Astrophysics in 3 Dimensions (ASTRO 3D), Australia
* Corresponding author: Li Wang; Email: Li.Wang1@csiro.au

Abstract

The data volumes generated by the Widefield ASKAP L-band Legacy All-sky Blind surveY (WALLABY) atomic hydrogen (HI) survey using the Australian Square Kilometre Array Pathfinder (ASKAP) necessitate greater and more reliable automation in the task of source finding and cataloguing. To this end, we introduce and explore a novel deep learning framework for detecting low signal-to-noise ratio (SNR) HI sources in an automated fashion. Specifically, our proposed method provides an automated process for separating true HI detections from false positives when used in combination with the candidate catalogues output by the source finding application. Leveraging the spatial and depth capabilities of 3D convolutional neural networks, our method is specifically designed to recognize patterns and features in three-dimensional space, making it uniquely suited for rejecting the false-positive sources that conventional linear methods generate in low-SNR scenarios. As a result, our approach is significantly more accurate in source detection and yields considerably fewer false detections than previous linear statistics-based source finding algorithms. Performance tests using mock galaxies injected into real ASKAP data cubes reveal our method’s capability to achieve near-100% completeness and reliability at a relatively low integrated SNR of $\sim3$–$5$. An at-scale version of this tool will help maximise the science output from the upcoming widefield HI surveys.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Astronomical Society of Australia

Figure 1. Integration of Machine Learning into SoFiA Workflow. On the left, the diagram depicts the comprehensive workflow of SoFiA, within which the right segment illustrates our integrated machine learning approach. The right-hand section details the machine learning pipeline, starting from the HI Input derived from SoFiA’s process, proceeding through Data Preprocessing, detailing the feature map extraction strategy, outlining the Optimization Objective, showcasing the Classifier stage, and culminating in the Output Results. This visualisation demonstrates how our machine learning methodology fits into and enhances the existing SoFiA workflow.

Figure 2. Residual block. Shortcut connections bypass a signal from the top of the block to the tail. Signals are summed at the tail.
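
The shortcut-and-sum behaviour described in this caption can be sketched in a few lines of Python. This is only a toy illustration on flat lists, not the paper's 3D implementation: `toy_transform` is a hypothetical stand-in for the stacked convolution and batch-normalization layers, and applying ReLU after the sum is an assumption based on the standard ResNet design.

```python
from typing import Callable, List

Vector = List[float]


def relu(values: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def residual_block(x: Vector, transform: Callable[[Vector], Vector]) -> Vector:
    """Return relu(transform(x) + x): the shortcut signal is summed at the tail."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("shortcut and transformed signals must have the same shape")
    return relu([f + s for f, s in zip(fx, x)])


def toy_transform(x: Vector) -> Vector:
    """Hypothetical stand-in for the learned layers: scale the input by -0.5."""
    return [-0.5 * v for v in x]


# Input [2, -4, 0] -> transform [-1, 2, 0] -> sum [1, -2, 0] -> ReLU [1, 0, 0]
output = residual_block([2.0, -4.0, 0.0], toy_transform)
```

When the transform outputs zeros, the block reduces to the identity on non-negative inputs, which is the property that makes deep residual networks easier to optimise.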

Table 1. The network architecture of the 3D ResNet model used in this work. Each convolutional layer is followed by batch normalization and ReLU. Downsampling is performed by conv3_1, conv4_1, conv5_1 with a stride of 2.
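
To illustrate what three stride-2 downsampling stages do to a cube, the following sketch applies the usual convolution output-size formula along each axis. The kernel size (3), padding (1), and the input cube shape are assumptions chosen for illustration; the caption only states that conv3_1, conv4_1, and conv5_1 use a stride of 2.

```python
from typing import Tuple


def conv_output_size(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Standard convolution output length: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def downsample_shape(shape: Tuple[int, int, int], n_stages: int = 3) -> Tuple[int, int, int]:
    """Apply n_stages stride-2 convolutions to every axis of a 3D shape."""
    for _ in range(n_stages):
        shape = tuple(conv_output_size(s, stride=2) for s in shape)
    return shape


# Hypothetical input cube: 32 spectral channels by 64 x 64 spatial pixels.
final_shape = downsample_shape((32, 64, 64))  # each axis is halved three times
```

With these assumed settings, every stride-2 stage halves each axis, so the three stages together reduce each dimension by a factor of 8.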

Figure 3. Examples of datasets derived from the input WALLABY image WALLABY J133032-211729. The panels show HI contours overlaid on an optical image (top left), HI contours overlaid on a multiwavelength image (top right), a velocity map showing the galaxy rotation (middle right), a pixel-by-pixel SNR map (bottom right), and spectra without noise (bottom left).

Table 2. Performance metrics of our method on the mock galaxy dataset.

Figure 4. The confusion matrix illustrates the model’s performance in classifying data as either ‘Galaxy’ or ‘Noise.’
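
A two-class confusion matrix of this kind maps directly onto the completeness and reliability quoted in the abstract. The sketch below uses the conventional definitions (completeness = TP / (TP + FN), reliability = TP / (TP + FP)), which are assumed here rather than taken from the paper, and the counts are hypothetical, not the paper's results.

```python
from typing import Dict


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Summary metrics for a 'Galaxy' (positive) vs 'Noise' (negative) classifier."""
    return {
        # Fraction of true galaxies that were recovered (also called recall).
        "completeness": tp / (tp + fn),
        # Fraction of 'Galaxy' predictions that are real (also called precision).
        "reliability": tp / (tp + fp),
        # Fraction of all candidates classified correctly.
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fn=10, fp=5, tn=95)
```

Completeness and reliability trade off against each other: a stricter classifier rejects more noise (higher reliability) at the cost of missing faint galaxies (lower completeness).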

Figure 5. Histogram of detected (blue) and undetected (orange) mock galaxies and the completeness (black) as a function of SNR, demonstrating that the model is able to achieve 100 percent completeness at SNR$\gtrsim$2.
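
A completeness curve like the one in this figure can be computed by binning the mock galaxies by SNR and taking the detected fraction in each bin. The following is a minimal sketch under that assumption; the SNR values, detection flags, and bin edges are hypothetical.

```python
from typing import List, Optional, Sequence


def completeness_by_snr(
    snr: Sequence[float],
    detected: Sequence[bool],
    bin_edges: Sequence[float],
) -> List[Optional[float]]:
    """Detected fraction per half-open SNR bin [lo, hi); None for empty bins."""
    n_bins = len(bin_edges) - 1
    totals = [0] * n_bins
    hits = [0] * n_bins
    for value, found in zip(snr, detected):
        for i in range(n_bins):
            if bin_edges[i] <= value < bin_edges[i + 1]:
                totals[i] += 1
                hits[i] += int(found)
                break  # each source falls in at most one bin
    return [h / t if t else None for h, t in zip(hits, totals)]


# Hypothetical mock-galaxy sample for illustration only.
snr_values = [0.5, 1.2, 1.8, 2.5, 3.1, 4.0]
detected_flags = [False, True, False, True, True, True]
curve = completeness_by_snr(snr_values, detected_flags, bin_edges=[0, 1, 2, 3, 5])
```

Empty bins are reported as `None` rather than 0 so that a lack of sources is not mistaken for a lack of detections.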

Figure 6. The distribution of the SNR in the dataset that consists of 5,889 potential subjects selected from DR2.

Figure 7. Learning curves showing the training (blue curve) and validation (orange curve) accuracies (Y-axis) as a function of the number of training iterations (X-axis).

Table 3. Comparative performance metrics of ResNet architectures on SoFiA output data.

Figure 8. Confusion Matrix showcasing the performance of our model on real astronomical data. The matrix quantifies the model’s ability to distinguish between actual galaxies and noise/artifacts, reflecting the real-world complexities such as lower SNR and the presence of artifacts.

Figure 9. Histogram of detected (blue), undetected (orange) real galaxies, and the completeness (black) as a function of SNR.

Figure 10. Panels (a), (b) and (c) show the distribution of integrated flux, peak flux and RMS flux for the DR2 SoFiA candidate list (small yellow open circles) and the twelve false negative sources missed by our model (large blue filled diamonds).

Figure 11. The relationship between the candidate lists and new sources found.

Figure 12. HI sources identified by our model that are not catalogued in the default 30-arcsec WALLABY DR2 catalogue. The left column shows the HI moment zero column density maps as magenta contours overlaid on g-band images from the Legacy Survey. The higher-density regions are closer to the centre. The synthesised beam is shown in the bottom left corner of each moment zero map. The right column shows the integrated HI spectrum for each source.

Table 4. Properties of the additional HI sources. Col (1): Name of the source; Col (2): Optical ID of the associated galaxy; Col (3): Right Ascension (RA) centre of the HI emission; Col (4): Declination (Dec) centre of the HI emission; Col (5): Central HI velocity (in optical convention); Col (6): Integrated HI flux; Col (7): Width of the HI emission line at full-width-half-maximum; Col (8): Comments about the source including the optical identity of the source.