
RG-CAT: Detection pipeline and catalogue of radio galaxies in the EMU pilot survey

Published online by Cambridge University Press:  01 April 2024

Nikhel Gupta*
Affiliation:
CSIRO Space & Astronomy, Bentley, WA, Australia
Ray P. Norris
Affiliation:
Western Sydney University, Penrith, NSW, Australia; CSIRO Space & Astronomy, Epping, NSW, Australia
Zeeshan Hayder
Affiliation:
CSIRO Data61, Black Mountain, ACT, Australia
Minh Huynh
Affiliation:
CSIRO Space & Astronomy, Bentley, WA, Australia; International Centre for Radio Astronomy Research, The University of Western Australia, Crawley, WA, Australia
Lars Petersson
Affiliation:
CSIRO Data61, Black Mountain, ACT, Australia
X. Rosalind Wang
Affiliation:
Western Sydney University, Penrith, NSW, Australia
Andrew M. Hopkins
Affiliation:
Western Sydney University, Penrith, NSW, Australia; School of Mathematical and Physical Sciences, Macquarie University, Sydney, NSW, Australia
Heinz Andernach
Affiliation:
Thüringer Landessternwarte, Tautenburg, Germany; Departamento de Astronomía, Universidad de Guanajuato, Guanajuato, GTO, Mexico
Yjan Gordon
Affiliation:
Department of Physics, University of Wisconsin-Madison, Madison, WI, USA
Simone Riggi
Affiliation:
INAF-Osservatorio Astrofisico di Catania, Catania, Italy
Miranda Yew
Affiliation:
Western Sydney University, Penrith, NSW, Australia
Evan J. Crawford
Affiliation:
Western Sydney University, Penrith, NSW, Australia
Bärbel Koribalski
Affiliation:
Western Sydney University, Penrith, NSW, Australia; CSIRO Space & Astronomy, Epping, NSW, Australia
Miroslav D. Filipović
Affiliation:
Western Sydney University, Penrith, NSW, Australia
Anna D. Kapińska
Affiliation:
National Radio Astronomy Observatory, Socorro, NM, USA
Stanislav Shabala
Affiliation:
School of Natural Sciences, University of Tasmania, Hobart, Australia
Tessa Vernstrom
Affiliation:
CSIRO Space & Astronomy, Bentley, WA, Australia; International Centre for Radio Astronomy Research, The University of Western Australia, Crawley, WA, Australia
Joshua R. Marvil
Affiliation:
INAF-Osservatorio Astrofisico di Catania, Catania, Italy
*Corresponding author: Nikhel Gupta; Email: Nikhel.Gupta@csiro.au

Abstract

We present source detection and catalogue construction pipelines to build the first catalogue of radio galaxies from the 270 $\rm deg^2$ pilot survey of the Evolutionary Map of the Universe (EMU-PS) conducted with the Australian Square Kilometre Array Pathfinder (ASKAP) telescope. The detection pipeline uses Gal-DINO computer vision networks (Gupta et al. 2024, PASA, 41, e001) to predict the categories of radio morphology and bounding boxes for radio sources, as well as their potential infrared host positions. The Gal-DINO network is trained and evaluated on approximately 5 000 visually inspected radio galaxies and their infrared hosts, encompassing both compact and extended radio morphologies. We find that the Intersection over Union (IoU) for the predicted and ground-truth bounding boxes is larger than 0.5 for 99% of the radio sources, and 98% of predicted host positions are within $3^{\prime \prime}$ of the ground-truth infrared host in the evaluation set. The catalogue construction pipeline uses the predictions of the trained network on the radio and infrared image cutouts based on the catalogue of radio components identified using the Selavy source finder algorithm. Confidence scores of the predictions are then used to prioritise Selavy components with higher scores and incorporate them first into the catalogue. This results in identifications for a total of 211 625 radio sources, with 201 211 classified as compact and unresolved. The remaining 10 414 are categorised as extended radio morphologies, including 582 FR-I, 5 602 FR-II, 1 494 FR-x (uncertain whether FR-I or FR-II), 2 375 R (single-peak resolved) radio galaxies, and 361 with peculiar and other rare morphologies. Each source in the catalogue includes a confidence score. We cross-match the radio sources in the catalogue with the infrared and optical catalogues, finding infrared cross-matches for 73% and photometric redshifts for 36% of the radio galaxies. 
The EMU-PS catalogue and the detection pipelines presented here will be used towards constructing catalogues for the main EMU survey covering the full southern sky.
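
For readers unfamiliar with the IoU criterion quoted in the abstract, a minimal sketch follows. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) in pixel coordinates; the function name and box convention are illustrative and are not taken from the Gal-DINO code.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are (x_min, y_min, x_max, y_max); this convention is an
    assumption made for illustration.
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


# A predicted box counts as a match at the 0.5 threshold used in the abstract.
predicted = (10, 10, 50, 50)
truth = (12, 12, 52, 52)
is_match = iou(predicted, truth) > 0.5
```

Under this definition, identical boxes give an IoU of 1 and disjoint boxes give 0.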

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Astronomical Society of Australia

Figure 1. Examples of the radio (left panels) and corresponding infrared (right panels) images, as described by the column titles. Each of these images has a frame size of $8^{\prime} \times 8^{\prime}$ in the sky ($240 \times 240$ pixels). In the radio images, we display classes and bounding boxes for radio galaxies encapsulating all their components. Here, the ‘FR-X’ type is positioned between the FR-I and FR-II categories, ‘R’ denotes resolved radio sources with one visible peak, ‘C’ represents compact unresolved radio sources, and ‘Pec’ refers to peculiar or other rare radio morphologies (see Section 2.4 for details). On the infrared images, circles indicate the positions of host galaxies.

Figure 2. Distributions for the dataset splits: the distribution of extended radio galaxies in a single cutout (left), their respective categories (middle), and the occupied area per radio galaxy (A; right). The tables below the figures provide detailed counts of radio morphologies in the training, validation, and test sets. See Section 2.5 for more details. Note that each radio morphology has a corresponding infrared host, so the counts here also represent the number of corresponding infrared hosts.

Figure 3. Shown is an example of an 8-channel image used for the training and evaluation of the Gal-DINO network. The first 7 channels contain data from the radio FITS file, representing the extended radio galaxy with clipping between the 50th percentile level and 7 specific maxima, corresponding to the 95th, 99th, 99.2th, 99.5th, 99.7th, 99.9th, and 99.99th percentile levels. The 8th channel, in the bottom right, displays the corresponding pre-processed infrared image. The bounding box and keypoint annotations are not depicted here for brevity; examples of these annotations are shown in Fig. 1 on the radio (with maxima at the 95th percentile level) and infrared images, respectively.
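
The construction of the seven radio channels described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' pre-processing code: the rescaling of each clipped channel to the range [0, 1], the linear-interpolation percentile, and all function names are assumptions. The eighth (infrared) channel is pre-processed separately and is not covered here.

```python
# Percentile levels quoted in the caption.
FLOOR_PERCENTILE = 50
PERCENTILE_MAXIMA = (95, 99, 99.2, 99.5, 99.7, 99.9, 99.99)


def percentile(sorted_values, q):
    """Linearly interpolated q-th percentile of an ascending list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def radio_channels(image):
    """Return 7 channels, each clipped between the 50th percentile and one maximum.

    `image` is a 2D list of pixel values. Rescaling to [0, 1] is an
    assumption; the caption only specifies the clipping levels.
    """
    flat = sorted(v for row in image for v in row)
    floor = percentile(flat, FLOOR_PERCENTILE)
    channels = []
    for q in PERCENTILE_MAXIMA:
        ceiling = percentile(flat, q)
        span = ceiling - floor
        channel = [
            [(min(max(v, floor), ceiling) - floor) / span if span > 0 else 0.0
             for v in row]
            for row in image
        ]
        channels.append(channel)
    return channels
```

Lower maxima (for example the 95th percentile) saturate bright cores and bring out faint diffuse emission, whereas higher maxima preserve contrast in the brightest regions.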

Table 1. Results for bounding box and keypoint detection using the trained Gal-DINO network are presented on a combination of the test and validation datasets with 8-channel images (see Fig. 3). The columns, from left to right, showcase various metric types: average precision for IoU (or OKS) thresholds ranging from 0.50 to 0.95 (AP), a specific IoU (or OKS) threshold of 0.5 (AP$_{50}$), IoU (or OKS) threshold of 0.75 (AP$_{75}$), and average precision for small-sized (AP$_{\mathrm{S}}$), medium-sized (AP$_{\mathrm{M}}$), and large-sized (AP$_{\mathrm{L}}$) radio galaxies. Further details on the training and evaluation can be found in Sections 3.3 and 3.4, respectively.

Figure 4. Presented is the normalised confusion matrix for the Gal-DINO detection model. Each matrix is normalised based on the total number of galaxies within its corresponding class. The diagonal entries denote true positive (TP) instances, representing objects correctly detected with an IoU and OKS threshold surpassing 0.5 compared to the ground-truth instances, and a confidence threshold of 0.25. False positive (FP) instances correspond to model detections lacking corresponding ground-truth instances, while false negative (FN) instances signify objects that the model failed to detect at the same IoU and OKS thresholds, along with a confidence threshold of 0.25.

Table 2. The bounding box and keypoint detection results achieved through the Gal-DINO network on a merged dataset comprising both test and validation sets. The columns correspond to those outlined in Table 1. The PNG results reflect outcomes obtained from 3-channel images, while the All-10% results signify a scenario where 10% of the entire training dataset is intentionally corrupted, aiming to assess the model’s robustness to potential noisy labels. The Ext-10% results specifically involve introducing 10% noise in annotations exclusively for extended radio galaxies within the training dataset (see Section 3 for details).

Figure 5. An overview of the catalogue construction pipeline. The process begins by obtaining predictions from the Gal-DINO model for all radio and infrared cutouts centred on the components in the Selavy catalogue. A dictionary of predictions is then generated for the central sources within these cutouts. The consolidated catalogue is formed by calibrating the confidence scores in the dictionary, sorting them in descending order, and consolidating and removing entries from the Selavy catalogue in order of decreasing score. See Section 4 for further details.

Figure 6. Uncalibrated and calibrated scores for the combined validation and test sets. Panel (a) shows the cumulative score and probability as a function of the fractile. Panel (b) shows the same data with the horizontal axis distorted so that the cumulative score graph forms a straight line; this is visualised through scatter plots of cumulative (score, score) in blue and (score, probability) in orange. For a perfectly calibrated network, the probability line coincides with the straight (score, score) line. Panels (c) and (d) show non-cumulative scores and probabilities against the fractile or score. The upper panel demonstrates that the network significantly overestimates the probability of detection based on the score, with a KS error of 7.3%. The calibration reduces the KS error to 4.2%. See Section 4.4 for more details.
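
A KS-type calibration error of the kind quoted in the Figure 6 caption can be sketched as below. This sketch assumes the error is the maximum absolute difference between the cumulative score and the cumulative fraction of correct detections, with predictions sorted by score and both sums normalised by the number of predictions; the exact definition used in the paper is given in Section 4.4, and the function name here is hypothetical.

```python
def ks_error(scores, correct):
    """Maximum gap between cumulative score and cumulative accuracy.

    `scores` are confidence scores in [0, 1]; `correct` are booleans that
    flag whether each prediction is a true detection. Both cumulative sums
    are divided by the number of predictions (an assumed normalisation).
    """
    pairs = sorted(zip(scores, correct))
    n = len(pairs)
    cum_score = 0.0
    cum_hit = 0.0
    worst = 0.0
    for score, hit in pairs:
        cum_score += score / n
        cum_hit += (1.0 if hit else 0.0) / n
        worst = max(worst, abs(cum_score - cum_hit))
    return worst
```

With this definition, a network whose scores track the true detection rate yields an error near zero, while an over-confident network accumulates score faster than correct detections and yields a larger error.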

Figure 7. Shown is a potential scenario in which an extended radio galaxy is accompanied by two adjacent compact radio galaxies. Blue circles represent all Selavy catalogue components. The extended radio galaxy has a confidence score of 0.5, but the two compact radio galaxies with higher scores (0.8 and 0.6) are consolidated first, removing them from the Selavy catalogue. Subsequently, only the three remaining components of the extended radio galaxy, encompassed by the biggest bounding box with a score of 0.5, are consolidated. As a result, the consolidated catalogue registers three radio galaxies in this particular case.
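
The score-ordered consolidation illustrated in Figure 7 can be sketched as a greedy loop. This is a simplified illustration under stated assumptions: each prediction is represented by a label, a confidence score, and the set of Selavy component identifiers inside its bounding box; the component names (`c1` to `c5`), the dictionary layout, and the function name are hypothetical, and the actual pipeline is described in Section 4.

```python
def consolidate(predictions):
    """Greedily assign components to predictions in order of decreasing score."""
    # All components start out unassigned.
    remaining = set().union(*(p["components"] for p in predictions))
    catalogue = []
    for pred in sorted(predictions, key=lambda p: p["score"], reverse=True):
        members = pred["components"] & remaining
        if not members:
            # Every component was claimed by higher-scoring predictions,
            # so this prediction is redundant.
            continue
        catalogue.append({
            "label": pred["label"],
            "score": pred["score"],
            "components": sorted(members),
        })
        remaining -= members
    return catalogue


# The Figure 7 scenario: one extended source and two neighbouring compact ones.
predictions = [
    {"label": "FR-II", "score": 0.5, "components": {"c1", "c2", "c3", "c4", "c5"}},
    {"label": "C", "score": 0.8, "components": {"c4"}},
    {"label": "C", "score": 0.6, "components": {"c5"}},
]
catalogue = consolidate(predictions)
```

In this example the two compact sources are consolidated first, and the extended source retains only its three remaining components, giving three catalogue entries.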

Table 3. Description of columns in the catalogue (best viewed in a PDF reader).

Figure 8. An example where the larger box prediction, scoring 0.48, is deemed redundant. The existence of smaller boxes within implies that these contain three compact radio galaxies, each with higher scores, rather than a single, bent FR-II radio galaxy within the larger bounding box.

Figure 9. The top panel depicts the distributions of all radio galaxies (solid bars) and subsets categorised by classification types (coloured contours) based on prediction confidence scores. The bottom panel displays the distribution across different types as predicted by the Gal-DINO model above and below scores of 0.5. Approximately 99.1% of the radio galaxies in our consolidated catalogue have a score larger than 0.5. Notably, among the galaxy types, FR-II exhibits the largest fraction of galaxies with scores below 0.5; for further insights, refer to Section 5.1.

Figure 10. Examples of the catalogue galaxies overlaid on radio images, showcasing their host positions, bounding boxes, and classifications derived from both the Gal-DINO and catalogue construction pipelines. Blue rectangles and accompanying text denote the bounding boxes and classification types with confidence scores for the extended radio galaxies. The positions of these extended radio galaxies are marked by blue circles. For brevity, we omit the presentation of bounding boxes for compact radio galaxies, which are solely indicated by blue circles.

Figure 11. Displayed are the numbers of DES, DESI, and SCos counterparts within 2$^{\prime \prime}$ of the CatWISE position (top panel). The photometric redshift distributions are based on the DESI legacy surveys for compact and extended radio galaxy counterparts (bottom panel). Both these plots include galaxies with a Gal-DINO confidence score greater than 0.5.

Figure 12. The distributions of multiwavelength counterparts for compact and extended radio galaxies, as well as FR-I- and FR-II-type radio galaxies, are shown in three panels. The top panel displays contours for radio luminosity and CatWISE magnitudes, the middle panel illustrates the infrared colour-colour plot for AllWISE counterparts, emphasising the dominance of AGN above an integrated radio flux of 5 mJy. The bottom panel exhibits the colour-colour plot for DES radio galaxy counterparts. All plots include galaxies with a Gal-DINO confidence score greater than 0.5, and contours describe 5, 25, 50, 75, and 95 percentile levels. Further details and discussion can be found in Section 5.3.

Table 4. Number of CatWISE, DES, DESI and SCos counterparts for radio galaxies. CatWISE sources are cross-matched with radio galaxies within a 3$^{\prime \prime}$ search radius, while the cross-matching with the DES, DESI, and SCos catalogues uses these CatWISE sources within a 2$^{\prime \prime}$ radius. The upper and lower sections of the table present the number of galaxies and the percentage of cross-matches with confidence scores exceeding 0.33 and 0.5, respectively. The numbers and percentages refer to the sources remaining after applying all of the previous criteria. The columns display these statistics for all, compact, and extended radio galaxies.
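
A nearest-neighbour cross-match within a fixed search radius, as used for Table 4, can be sketched as follows. This is a brute-force illustration that assumes positions are (RA, Dec) pairs in degrees; the function names are hypothetical, and a production pipeline would normally use a spatial index rather than comparing every pair.

```python
import math


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Angular separation in arcseconds (haversine formula, inputs in degrees)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2.0) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(math.sqrt(a))) * 3600.0


def cross_match(sources, catalogue, radius_arcsec=3.0):
    """For each source, return the index of the nearest catalogue entry
    within `radius_arcsec`, or None if there is no such entry."""
    matches = []
    for ra, dec in sources:
        best_index, best_sep = None, radius_arcsec
        for i, (cat_ra, cat_dec) in enumerate(catalogue):
            sep = separation_arcsec(ra, dec, cat_ra, cat_dec)
            if sep <= best_sep:
                best_index, best_sep = i, sep
        matches.append(best_index)
    return matches
```

The default radius of 3 arcseconds corresponds to the radio-to-CatWISE step; passing `radius_arcsec=2.0` corresponds to the subsequent match of CatWISE positions against the DES, DESI, and SCos catalogues.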

Table A1. First few rows of the consolidated catalogue, excluding the remaining columns described in Table 3 for brevity.