EMUSE: Evolutionary Map of the Universe Search Engine

Published online by Cambridge University Press: 01 July 2025

Nikhel Gupta*
Affiliation:
Australia Telescope National Facility, CSIRO, Space & Astronomy, Bentley, WA, Australia
Zeeshan Hayder
Affiliation:
CSIRO Data61, Black Mountain, ACT, Australia
Minh Huynh
Affiliation:
Australia Telescope National Facility, CSIRO, Space & Astronomy, Bentley, WA, Australia; International Centre for Radio Astronomy Research (ICRAR), M468, The University of Western Australia, Crawley, WA, Australia
Ray Norris
Affiliation:
Western Sydney University, Penrith, NSW, Australia; Australia Telescope National Facility, CSIRO Space & Astronomy, Epping, NSW, Australia
Lars Petersson
Affiliation:
CSIRO Data61, Black Mountain, ACT, Australia
Andrew Hopkins
Affiliation:
School of Mathematical and Physical Sciences, 12 Wally’s Walk, Macquarie University, Sydney, NSW, Australia
Simone Riggi
Affiliation:
INAF-Osservatorio Astrofisico di Catania, Catania, Italy
Bärbel Silvia Koribalski
Affiliation:
Western Sydney University, Penrith, NSW, Australia; Australia Telescope National Facility, CSIRO Space & Astronomy, Epping, NSW, Australia
Miroslav D. Filipović
Affiliation:
Western Sydney University, Penrith, NSW, Australia
*Author for correspondence: Nikhel Gupta, Email: Nikhel.Gupta@csiro.au.

Abstract

We present the Evolutionary Map of the Universe Search Engine (EMUSE), a tool for finding specific radio sources within the extensive datasets of the Evolutionary Map of the Universe (EMU) survey, with potential applications to other Big Data challenges in astronomy. Built on a multimodal approach to radio source classification and retrieval, EMUSE integrates visual and textual embeddings from a foundation model to enable efficient and flexible searches within large radio astronomical datasets. We fine-tune the OpenCLIP model on a curated dataset of 2 900 radio galaxies encompassing various morphological classes, including FR-I, FR-II, FR-x, R-type, and other rare and peculiar sources. The model is optimised using adapter-based fine-tuning, which keeps the computational cost low while capturing the unique characteristics of radio sources. The fine-tuned model is then deployed in EMUSE, allowing image- and text-based queries over the EMU survey dataset. Our results demonstrate the model’s effectiveness in retrieving and classifying radio sources, particularly in recognising distinct morphological features. However, challenges remain in identifying rare or previously unseen radio sources, highlighting the need for expanded datasets and continuous refinement. This study showcases the potential of multimodal machine learning in radio astronomy, paving the way for more scalable and accurate search tools in the field. The search engine is accessible at https://askap-emuse.streamlit.app/ and can be used locally by cloning the repository at https://github.com/Nikhel1/EMUSE.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Astronomical Society of Australia

Figure 1. Overview of EMUSE (Evolutionary Map of the Universe Search Engine). Starting with the open-source OpenCLIP model, which is pre-trained on approximately 2.3 billion image-text pairs from the LAION dataset, we further fine-tuned it using an image-text dataset of extended radio sources in the EMU-PS1 survey. The fine-tuned model is then used to generate image embeddings of EMU sources based on PNG images from the EMU and AllWISE surveys at the positions of extended radio sources identified in the RG-CAT catalogue. The fine-tuned model, along with the generated image embeddings and catalogue metadata – which includes sky position, integrated flux, and host galaxy information – is integrated into the EMUSE application framework to retrieve similar sources. EMUSE facilitates the search of the embedding database and outputs a table of EMU survey radio sources that are similar to a given image or text prompt. The search engine is accessible at https://askap-emuse.streamlit.app/ and can be used locally by cloning https://github.com/Nikhel1/EMUSE.
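The retrieval step in this overview can be illustrated with a minimal sketch. It assumes that sources are ranked by cosine similarity between a query embedding and the stored image embeddings; the similarity metric is not stated on this page, so that choice is an assumption. The names `normalise`, `top_k_similar`, and `toy_database` are hypothetical, and the three-dimensional vectors are toy values rather than real EMUSE embeddings.

```python
import math


def normalise(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in vec]


def top_k_similar(query, database, k=3):
    """Return the k (source_id, score) pairs with the highest cosine similarity.

    `database` maps a source identifier to its embedding. In a real system the
    embeddings would come from the fine-tuned image encoder; here they are toys.
    """
    q = normalise(query)
    scored = []
    for source_id, embedding in database.items():
        e = normalise(embedding)
        score = sum(a * b for a, b in zip(q, e))
        scored.append((source_id, score))
    # Highest similarity first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embedding database (hypothetical identifiers and values).
toy_database = {
    "source_A": [0.9, 0.1, 0.0],
    "source_B": [0.1, 0.9, 0.0],
    "source_C": [0.8, 0.2, 0.1],
}

# The query embedding would come from either the image or the text encoder.
toy_query = [1.0, 0.0, 0.0]
results = top_k_similar(toy_query, toy_database, k=2)
for source_id, score in results:
    print(f"{source_id}: {score:.3f}")
```

Because a text prompt and an image are mapped into the same embedding space, the same ranking function would serve both query types in a sketch like this; only the encoder producing `toy_query` changes.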

Figure 2. Model accuracy evaluated on the test set after each epoch. Error bars represent the variance, calculated by fine-tuning and testing the model 10 times with randomly drawn training and test sets.
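As a hedged illustration of how such error bars could be computed, the sketch below takes one accuracy curve per repeated run and returns the mean and variance at each epoch. The function name `summarise_runs` and the toy accuracies are hypothetical. The sketch uses the sample variance (`statistics.variance`, dividing by n - 1); whether the authors use the sample or the population variance is not stated here.

```python
import statistics


def summarise_runs(accuracy_by_run):
    """Return a list of (mean, variance) pairs, one per epoch.

    `accuracy_by_run` is a list of runs; each run is a list of test-set
    accuracies, one per epoch. All runs must have the same number of epochs.
    """
    # zip(*...) regroups the values so that each tuple holds one epoch
    # across all runs.
    per_epoch = zip(*accuracy_by_run)
    return [(statistics.mean(v), statistics.variance(v)) for v in per_epoch]


# Toy data: 3 runs x 2 epochs (the caption describes 10 runs).
toy_runs = [
    [0.70, 0.80],
    [0.72, 0.84],
    [0.74, 0.82],
]
summary = summarise_runs(toy_runs)
for epoch, (mean, var) in enumerate(summary, start=1):
    print(f"epoch {epoch}: mean={mean:.3f}, variance={var:.5f}")
```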

Figure 3. The top panel shows the confusion matrix comparing ground truth labels to predicted labels for each main category. The displayed values are averaged over 10 training iterations. The bottom panel shows the UMAP projection generated from the image embeddings produced by the model’s image encoder, illustrating that different ground truth categories cluster in distinct regions. The plotted points include test sets from all 10 training iterations.
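The averaging described for the top panel can be sketched as follows, assuming each training iteration yields a list of ground-truth labels and a list of predicted labels for its test set. The helper names `confusion_matrix` and `average_matrices` and the two-class toy labels are hypothetical, and the sketch averages raw counts; the paper may instead normalise rows before averaging.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count predictions per (true class, predicted class) pair.

    Rows index the ground-truth class and columns the predicted class.
    """
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, prediction in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


def average_matrices(matrices):
    """Element-wise mean of equally sized matrices, one per training iteration."""
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(m[r][c] for m in matrices) / n for c in range(cols)]
        for r in range(rows)
    ]


classes = ["FR-I", "FR-II"]
truth = ["FR-I", "FR-I", "FR-II", "FR-II"]

# Two toy iterations with different predictions on the same ground truth.
run_1 = confusion_matrix(truth, ["FR-I", "FR-II", "FR-II", "FR-II"], classes)
run_2 = confusion_matrix(truth, ["FR-I", "FR-I", "FR-II", "FR-I"], classes)
averaged = average_matrices([run_1, run_2])
print(averaged)
```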

Figure 4. Example image queries for EMUSE. Both panels are screenshots of the EMU-PS1 image as viewed in CARTA. The left panel shows an FR-II radio galaxy, while the right panel displays ORC J2103-6200 (Norris et al. 2021b).

Table A1. Top-50 EMUSE output for text query, ‘A bent-tailed radio galaxy’.

Figure A1. Top-50 EMUSE output for the text query, ‘A bent-tailed radio galaxy’. Positions in Table A1 are used here for $5^{\prime}\times5^{\prime}$ cutout images with radio-radio-infrared (RGB) channels.

Table A2. Top-50 EMUSE output for text query, ‘Resolved star forming radio galaxy’.

Figure A2. Top-50 EMUSE output for the text query, ‘Resolved star forming radio galaxy’. Positions in Table A2 are used here for $5^{\prime}\times5^{\prime}$ cutout images with radio-radio-infrared channels.

Table A3. Top-50 EMUSE output for image query shown on the left panel of Figure 4.

Figure A3. Top-50 EMUSE output for image query shown on the left panel of Figure 4. Positions in Table A3 are used here for $5^{\prime}\times5^{\prime}$ cutout images with radio-radio-infrared channels.

Table A4. Top-50 EMUSE output for image query shown on the right panel of Figure 4.

Table A5. Examples of the expanded text descriptions for the main radio source classes. These, along with similar variations based on subcategories and special features, are used to fine-tune the OpenCLIP model.

Figure A4. Top-50 EMUSE output for image query shown on the right panel of Figure 4. Positions in Table A4 are used here for $5^{\prime}\times5^{\prime}$ cutout images with radio-radio-infrared channels.