
High-time resolution GPU imager for FRB searches at low radio frequencies

Published online by Cambridge University Press:  18 September 2024

Marcin Sokolowski*
Affiliation:
International Centre for Radio Astronomy Research, Curtin University, Bentley, WA, Australia
Gayatri Aniruddha
Affiliation:
International Centre for Radio Astronomy Research, Curtin University, Bentley, WA, Australia
Cristian Di Pietrantonio
Affiliation:
Curtin University, Pawsey Supercomputing Research Centre, Kensington, WA, Australia
Chris Harris
Affiliation:
Pawsey Supercomputing Research Centre, Kensington, WA, Australia
Danny Price
Affiliation:
International Centre for Radio Astronomy Research, Curtin University, Bentley, WA, Australia
Square Kilometre Array Observatory (SKAO), Kensington, WA, Australia
Samuel McSweeney
Affiliation:
International Centre for Radio Astronomy Research, Curtin University, Bentley, WA, Australia
Randall Bruce Wayth
Affiliation:
International Centre for Radio Astronomy Research, Curtin University, Bentley, WA, Australia
Ramesh Bhat
Affiliation:
International Centre for Radio Astronomy Research, Curtin University, Bentley, WA, Australia
* Corresponding author: Marcin Sokolowski; Email: marcin.sokolowski@curtin.edu.au

Abstract

Fast Radio Bursts (FRBs) are millisecond dispersed radio pulses of predominantly extra-galactic origin. Although originally discovered at GHz frequencies, most FRBs have been detected between 400 and 800 MHz. However, only a handful of FRBs have been detected at radio frequencies $\le$400 MHz. Searching for FRBs at low frequencies is computationally challenging due to the increased dispersive delay that must be accounted for. Nevertheless, the wide field of view (FoV) of low-frequency telescopes – such as the Murchison Widefield Array (MWA) and prototype stations of the low-frequency Square Kilometre Array (SKA-Low) – makes them promising instruments to open a low-frequency window on FRB event rates and constrain FRB emission models. The standard approach, inherited from high frequencies, is to form multiple tied-array beams to tessellate the entire FoV and perform the search on the resulting time series. This approach, however, may not be optimal for low-frequency interferometers, because their large FoVs and high spatial resolutions lead to a very large number of beams. Consequently, there are regions of parameter space, in terms of the number of antennas and resolution elements (pixels), where interferometric imaging is computationally more efficient. Here we present BLINK, a new high-time-resolution imager implemented on modern graphics processing units (GPUs) and intended for radio astronomy data. The main goal for this imager is to become part of a fully GPU-accelerated FRB search pipeline. We describe the imager and present its verification on real and simulated data processed to form all-sky and widefield images from the MWA and prototype SKA-Low stations. We also present and compare benchmarks of the GPU and CPU code executed on laptops, desktop computers, and Australian supercomputers. The code is publicly available at https://github.com/PaCER-BLINK-Project/imager and can be applied to data from any radio telescope.

Information

Type
Research Article
Creative Commons
CC BY-NC-SA 4.0
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of the Astronomical Society of Australia

Table 1. Summary of parameters of the MWA and SKA-Low stations.


Table 2. Different GPU architectures used for testing and benchmarking of the presented imager.


Table 3. Summary of the theoretical costs of the main steps of imaging and beamforming. $N_{ant}$ is the number of antennas (or MWA tiles) and $N_{px}$ is the number of resolution elements (pixels) along one dimension; hence, for square images the total number of pixels is $N_{px}^2$. $N_{kern}$ is the size (total number of pixels) of the convolving kernel. The computational cost of correlation and gridding is independent of the image size: it depends only on the number of visibilities to be gridded, which equals the number of baselines $N_b = \frac{1}{2} N_{ant} (N_{ant} - 1)$ (excluding auto-correlations). The total computational cost is dominated by the FFT and is directly related to the total number of pixels in the final sky images. For the presented image sizes, imaging requires a few (${\sim}$5–8) times fewer operations than beamforming for 128 antennas (MWA) and about an order of magnitude fewer (${\sim}$11–15 times) for 256 antennas (SKA-Low stations). This is because the total cost of both is dominated by the term ${\sim} \alpha N_{px}^2$, where $\alpha=N_{ant}$ for beamforming and $\alpha=\log(N_{px}^2)$ for imaging ($\log$ is the base-2 logarithm). Hence, for a given image size $N_{px}^2$, beamforming dominates when $N_{ant} \ge \log(N_{px}^2)$. For example, for image size $180\times 180$, beamforming dominates when $N_{ant} \gtrsim 15$.
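
To make the dominant-term comparison in Table 3 concrete, the short C++ sketch below (not part of the BLINK code) evaluates the approximate beamforming-to-imaging cost ratio $N_{ant} / \log_2(N_{px}^2)$; for a $180\times 180$ image, $\log_2(N_{px}^2) \approx 15$, reproducing the break-even point quoted above. The full Table 3 costs also include the sub-dominant correlation and gridding terms, so the exact ratios differ slightly.

    // Illustrative only: dominant-term cost ratio of beamforming (~ N_ant * N_px^2 operations)
    // to FFT-based imaging (~ N_px^2 * log2(N_px^2) operations), as discussed in Table 3.
    #include <cmath>
    #include <cstdio>

    double bf_to_img_ratio(int n_ant, int n_px)
    {
        const double n_pixels = static_cast<double>(n_px) * n_px;   // total pixels N_px^2
        return n_ant / std::log2(n_pixels);                         // beamforming / imaging
    }

    int main()
    {
        std::printf("128 antennas (MWA),     180 px: %.1f\n", bf_to_img_ratio(128, 180)); // ~8.5
        std::printf("256 antennas (SKA-Low), 180 px: %.1f\n", bf_to_img_ratio(256, 180)); // ~17.1
        // Break-even: beamforming dominates once N_ant exceeds log2(180*180) ~ 15.
        return 0;
    }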


Figure 1. Ratio of the theoretical compute cost of beamforming (label BF) to imaging (label IMG) as a function of the number of antennas ($N_{ant}$) and the number of pixels in one dimension ($N_{px}$), based on the total cost equations in Table 3. The black curves superimposed on the graph indicate where the ratio of the theoretical cost of beamforming to imaging equals 1 (dashed line), 5 (dash-dotted line) and 10 (solid line). The plot shows that for 128 and 256 antennas (MWA and SKA-Low station, respectively) the number of operations in beamforming is at least around 5 and 10 times larger, respectively, than in imaging, indicating that standard visibility-based imaging should be more efficient than beamforming in these regions of parameter space.


Figure 2. A comparison of all-sky images formed from EDA2 visibilities at 160 MHz recorded on 2021-11-16 20:30 UTC. Left: image produced with MIRIAD. Centre: image produced with CASA. Right: image produced with the BLINK imager presented in this paper. All are $180\times 180$-pixel dirty images in natural weighting. The images are not expected to be identical because MIRIAD and CASA apply a gridding kernel, which has not yet been implemented in the BLINK imager (as discussed in Section 5.1). As a result, the differences between the BLINK and MIRIAD/CASA images are of the order of 10–20%.


Figure 3. Sky images in X polarisation produced from EDA2 visibilities at 160 MHz simulated in MIRIAD using the frequency-scaled ‘Haslam map’ as a sky model. The simulation corresponds to real EDA2 data recorded on 2023-02-05 03:46:18 UTC. Left: image generated with the MIRIAD task invert. Right: image generated with the CPU version of the BLINK imager presented in this paper.


Figure 4. Sky images in X polarisation generated from MWA visibilities at 154 MHz simulated in CASA based on a model image of the Hydra A radio galaxy. Left: image generated with WSCLEAN. Right: image generated with the CPU version of the BLINK imager presented in this paper.


Figure 5. Block diagram of the main steps in the GPU version of the imager for multiple frequency channels.


Figure 6. A diagram of the gridding and imaging of multiple frequency channels ($N_{ch}$) performed on the GPU. The input visibilities are gridded onto 2D UV grids ($N_{px}$, $N_{px}$) represented as a single 1D array m_in_buffer_gpu of complex floats. This is realised by a gridding kernel executed as shown in the listing in Figure 7. This 1D array is used as the input to the gpufftPlanMany and gpufftExecC2C functions to calculate the FFT of all UV grids (one per frequency channel), and the final sky images are obtained as the real part of the complex images resulting from the FFT.
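
As a minimal sketch of the batched FFT step described above (calling the hipfft API directly rather than the imager's gpufft wrappers, and assuming contiguous per-channel grids), a single plan can transform all $N_{ch}$ UV grids in one call:

    // Illustrative only: batched 2D C2C FFT of n_ch gridded UV planes (n_px x n_px each),
    // stored contiguously in one GPU buffer as in Figure 6. Error checking omitted.
    #include <hipfft/hipfft.h>

    void fft_all_channels(hipfftComplex* in_buffer_gpu,   // n_ch * n_px * n_px gridded visibilities
                          hipfftComplex* out_images_gpu,  // n_ch complex images (real part = sky image)
                          int n_px, int n_ch)
    {
        int n[2] = { n_px, n_px };                   // size of each 2D transform
        hipfftHandle plan;
        hipfftPlanMany(&plan, 2, n,
                       nullptr, 1, n_px * n_px,      // input: contiguous grids, one per channel
                       nullptr, 1, n_px * n_px,      // output: same layout
                       HIPFFT_C2C, n_ch);            // batch = number of frequency channels
        hipfftExecC2C(plan, in_buffer_gpu, out_images_gpu, HIPFFT_BACKWARD);
        hipfftDestroy(plan);
    }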


Figure 7. Listing of the GPU gridding code for multiple frequency channels. The GPU gridding kernel gridding_imaging_lfile_vis_blocks is called from the source file pacer_imager_multi_hip.cpp and implemented in the source files gridding_multi_image_cuda.h(cpp).
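
The sketch below is not the BLINK kernel from Figure 7, but a minimal illustration of the same idea: one GPU thread grids one visibility onto the nearest UV cell (no convolving kernel, consistent with Section 5.1), and blockIdx.z selects the frequency channel, i.e. one layer of thread-blocks per image. All names and the (u, v) units are assumptions made for the example.

    // Illustrative only: nearest-cell gridding of one visibility per thread; the z-dimension
    // of the thread-block grid selects the frequency channel ("layers of blocks per image").
    #include <hip/hip_runtime.h>
    #include <hip/hip_complex.h>

    __global__ void grid_visibilities(const hipFloatComplex* vis, // n_ch * n_vis visibilities
                                      const float* u,             // n_ch * n_vis, in wavelengths
                                      const float* v,             // n_ch * n_vis, in wavelengths
                                      hipFloatComplex* uv_grid,   // n_ch * n_px * n_px output grids
                                      int n_vis, int n_px, float cell_size)
    {
        int i  = blockIdx.x * blockDim.x + threadIdx.x;  // visibility index
        int ch = blockIdx.z;                             // frequency channel (one grid per layer)
        if (i >= n_vis) return;

        // Nearest UV cell, with the grid centre at (n_px/2, n_px/2).
        int iu = static_cast<int>(roundf(u[ch * n_vis + i] / cell_size)) + n_px / 2;
        int iv = static_cast<int>(roundf(v[ch * n_vis + i] / cell_size)) + n_px / 2;
        if (iu < 0 || iu >= n_px || iv < 0 || iv >= n_px) return;

        hipFloatComplex value = vis[ch * n_vis + i];
        hipFloatComplex* cell = &uv_grid[(ch * n_px + iv) * n_px + iu];
        atomicAdd(&cell->x, value.x);   // accumulate real part (natural weighting)
        atomicAdd(&cell->y, value.y);   // accumulate imaginary part
    }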


Table 4. A comparison of the execution times of gridding, imaging and both (total) for the CPU and GPU implementations of the imager for a single time-stamp and frequency channel, tested for various image sizes using EDA2 and MWA data. The times were measured using the function std::chrono::high_resolution_clock::now() on several different systems and GPUs used during the development stage of the project (Table 2).


Table 5. Results of benchmarking the multi-channel version of the code using the CPU implementation and two variants of the GPU code. All tests were executed on Setonix on 1 min of EDA2 visibilities at 100 ms time resolution with 32 fine channels. The times in this table are the mean and standard deviation of the processing time of a single integration (32 fine channels), calculated over 600 integrations.


Figure 8. The execution time of gridding as a function of the number of images using GPU streams (square points / blue fitted curve) and layers of thread-blocks per image (triangle points / green fitted curve). The black data points and fitted curve are for the CPU version of gridding, which confirms that it is between one and two orders of magnitude slower than the GPU implementation. The fitted parameters, intercept (in ms) and slope (execution time per image, in ms), are summarised in Table 6.
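
For reference, a minimal sketch of the ‘GPU streams’ variant benchmarked above is shown below (not the BLINK code; the kernel is a stand-in for the per-channel gridding): each frequency channel is launched on its own HIP stream, whereas the ‘layers’ variant issues a single launch with gridDim.z equal to the number of channels, as in the kernel sketch after Figure 7.

    // Illustrative only: launching one per-channel kernel on each of n_ch HIP streams so
    // that the channels can overlap on the GPU (the "GPU streams" variant in Figure 8).
    #include <hip/hip_runtime.h>
    #include <vector>

    __global__ void process_channel(float* data, int n)  // stand-in for per-channel gridding
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    void run_with_streams(float* d_data, int n_per_channel, int n_ch)
    {
        std::vector<hipStream_t> streams(n_ch);
        for (int ch = 0; ch < n_ch; ch++) hipStreamCreate(&streams[ch]);

        dim3 threads(256);
        dim3 blocks((n_per_channel + threads.x - 1) / threads.x);
        for (int ch = 0; ch < n_ch; ch++)
            hipLaunchKernelGGL(process_channel, blocks, threads, 0, streams[ch],
                               d_data + ch * n_per_channel, n_per_channel);

        hipDeviceSynchronize();                                     // wait for all channels
        for (int ch = 0; ch < n_ch; ch++) hipStreamDestroy(streams[ch]);
    }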


Figure 9. The execution time of the 2D FFT using gpufftw (hipfft on Setonix), shown as black data points and fitted curve, and the fftw library (blue data points and fitted curve). The black curve is about 1 000 times lower, which means the 2D FFT performed on the GPU is about three orders of magnitude faster than its CPU counterpart. The fitted parameters, including the fitted slopes representing the execution time per single image in ms, are summarised in Table 6. Based on the fitted slopes, it takes about 39.4 ms per image in the CPU version vs. only 0.053 ms (53 $\mu$s) per image in the GPU version.
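
For completeness, a minimal sketch of the CPU-side measurement is given below (not the BLINK code): a single-precision 2D complex FFT with the FFTW library, timed with std::chrono::high_resolution_clock as in Table 4. The image size and planner flags are example values.

    // Illustrative only: timing one 2D single-precision complex FFT with FFTW.
    #include <fftw3.h>
    #include <chrono>
    #include <cstdio>
    #include <cstring>

    int main()
    {
        const int n_px = 4096;   // example image size (largest case in Table 6)
        fftwf_complex* in  = (fftwf_complex*) fftwf_malloc(sizeof(fftwf_complex) * n_px * n_px);
        fftwf_complex* out = (fftwf_complex*) fftwf_malloc(sizeof(fftwf_complex) * n_px * n_px);
        std::memset(in, 0, sizeof(fftwf_complex) * n_px * n_px);

        // Plan creation is excluded from the timing because it can be reused across images.
        fftwf_plan plan = fftwf_plan_dft_2d(n_px, n_px, in, out, FFTW_BACKWARD, FFTW_ESTIMATE);

        auto t0 = std::chrono::high_resolution_clock::now();
        fftwf_execute(plan);
        auto t1 = std::chrono::high_resolution_clock::now();
        std::printf("2D FFT (%d x %d): %.2f ms\n", n_px, n_px,
                    std::chrono::duration<double, std::milli>(t1 - t0).count());

        fftwf_destroy_plan(plan);
        fftwf_free(in);
        fftwf_free(out);
        return 0;
    }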


Table 6. Parameters of straight lines fitted to the execution times of gridding and the 2D FFT as a function of the number of images (Figs. 8 and 9, respectively). The fitted slope corresponds to the execution time per single image; therefore, the slopes are larger for larger images. GPU gridding is about two orders of magnitude faster than the CPU version for image sizes of 180 and 1 024 pixels, and about three orders of magnitude faster for the largest images (4 096 pixels). A similar speed-up is observed for the 2D FFT when the GPU functions from the hipfft library on Setonix are compared to the FFTW CPU library. The 2D FFT using the fftw library gave very similar results when performed with 1 and 15 CPU threads. The optimal GPU version parallelises the processing of multiple frequency channels with GPU layers (multiple blocks of GPU threads) and is about two times faster than the parallelisation implemented with GPU streams.