PUMA: The Positional Update and Matching Algorithm

Published online by Cambridge University Press:  10 January 2017

J. L. B. Line*
Affiliation:
School of Physics, The University of Melbourne, Parkville, VIC 3010, Australia ARC Centre of Excellence for All-sky Astrophysics (CAASTRO)
R. L. Webster
Affiliation:
School of Physics, The University of Melbourne, Parkville, VIC 3010, Australia ARC Centre of Excellence for All-sky Astrophysics (CAASTRO)
B. Pindor
Affiliation:
School of Physics, The University of Melbourne, Parkville, VIC 3010, Australia ARC Centre of Excellence for All-sky Astrophysics (CAASTRO)
D. A. Mitchell
Affiliation:
ARC Centre of Excellence for All-sky Astrophysics (CAASTRO) CSIRO Astronomy and Space Science (CASS), Marsfield, NSW 2122, Australia
C. M. Trott
Affiliation:
ARC Centre of Excellence for All-sky Astrophysics (CAASTRO) International Centre for Radio Astronomy Research, Curtin University, Bentley, WA 6102, Australia

Abstract

We present new software to cross-match low-frequency radio catalogues: the Positional Update and Matching Algorithm (PUMA). PUMA combines a positional Bayesian probabilistic approach with spectral matching criteria, allowing for confusing sources in the matching process. We go on to create a radio sky model using PUMA based on the Murchison Widefield Array Commissioning Survey (MWACS), and are able to automatically cross-match ~98.5% of sources. Using the characteristics of this sky model, we create simple simulated mock catalogues on which to test PUMA, and find that PUMA can reliably find the correct spectral indices of sources, along with being able to recover ionospheric offsets. Finally, we use this sky model to calibrate and remove foreground sources from simulated interferometric data, generated using OSKAR (the Oxford University visibility generator). We demonstrate that there is a substantial improvement in foreground source removal when using higher frequency and higher resolution source positions, even when correcting positions by an average of 0.3 arcmin given a synthesised beam-width of ~2.3 arcmin.

Information

Type
Research Article
Copyright
Copyright © Astronomical Society of Australia 2017 

Figure 1. The steps and outcomes of the matching process are shown. Yellow boxes represent steps with no criteria applied, cyan represent criteria being applied, and all other colours represent final points. Each cyan box refers to a specific Algorithm, as detailed in Algorithms 1–5. The section labels on the right refer to Sections 3.1–3.3, which detail each step. Each section is performed by a separate script.

Figure 2. All sources in the VLSSr (Lane et al. 2014) catalogue are plotted. To calculate the source density of the catalogue, cross_match.py takes given RA and Dec bounds, and counts the number of sources within that area. In this example, the limits are represented by the cyan lines. It is left to the user to pick an area that will give a representative source density of the entire catalogue. For example, if too small an area, or a particularly under-dense area such as that at RA, δ = − 4h, 40°, is selected, an unrepresentative source density will be calculated.
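
The source density calculation described in this caption can be sketched as follows. This is a minimal illustration, not the code in cross_match.py: the function name `source_density`, its arguments, and the toy coordinates are hypothetical, and the sketch assumes the RA bounds do not wrap through RA = 0.

```python
import math


def source_density(ras_deg, decs_deg, ra_min, ra_max, dec_min, dec_max):
    """Return sources per square degree inside an RA/Dec box (inputs in degrees).

    Hypothetical helper; assumes ra_min < ra_max (no wrap through RA = 0).
    """
    # Count the catalogue sources that fall inside the user-supplied bounds
    count = sum(
        1
        for ra, dec in zip(ras_deg, decs_deg)
        if ra_min <= ra <= ra_max and dec_min <= dec <= dec_max
    )
    # Solid angle of the box on the sphere: dRA * (sin(dec_max) - sin(dec_min))
    area_sr = math.radians(ra_max - ra_min) * (
        math.sin(math.radians(dec_max)) - math.sin(math.radians(dec_min))
    )
    # Convert steradians to square degrees
    area_deg2 = area_sr * (180.0 / math.pi) ** 2
    return count / area_deg2


# Toy catalogue: three of the four sources lie inside the box
ras = [10.0, 12.0, 15.0, 40.0]
decs = [0.5, -0.5, 0.0, 0.0]
density = source_density(ras, decs, 5.0, 20.0, -1.0, 1.0)
```

Using the spherical solid angle rather than a flat RA × Dec product matters for boxes far from the celestial equator, where lines of constant RA converge.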

Table 1. Details of the content of the final matched catalogue output by make_table.py

Figure 3. The overall sky coverage of each catalogue is shown in (a). Apart from MRC, all catalogues only partially cover the MWACS field, which is emphasised by the zoom in on the EoR0 field shown in (b). A contour plot shows the MWA primary beam at 180 MHz, with EoR0 at zenith. The first four grating side lobes are clearly visible outside the dashed circle which represents 2h from field centre.

Table 2. General characteristics of the base and cross-matching catalogues. The quoted beam widths for MWACS and MRC are indicative only, as they vary across the sky and with frequency.

Table 3. The settings used and matching statistics obtained when running PUMA on real data. The number of sources is the number of base catalogue sources in each case, and the number of matches is the number of instances in which a match to at least one catalogue was found.

Figure 4. An example of an accepted isolated match. As there is only one possible combination of sources, and that combination has P1 > Pu, the cross-match combination is accepted without investigating the SED.

Figure 5. An example of an accepted dominant match. There are two NVSS sources well within the resolution of the base MWACS source. Given the positional error on the MWACS source, both cross-match combinations yield high positional probabilities. The SEDs of both cross-match combinations are investigated, and it is found that P1 > Pu and P2 < Pl, and that cross-match combination 1 has far lower residuals to a power law than cross-match combination 2. This results in match 1 being selected as the correct match.

Figure 6. An example of an accepted multiple match. In this example, both cross-match combinations 2 and 12 yield P > Pu and so there is no dominant match. Instead, all flux densities are combined, and the new SED tested with a power law fit. As the fit is deemed to be good, the source is accepted, and the weighted NVSS position (orange star) is used as the corrected position.
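
The combine-then-fit step in this caption can be illustrated with a short sketch. It is an assumption-laden toy, not PUMA's fitting code: the function `fit_power_law` is hypothetical, the fit is an unweighted least-squares line in log-log space (the paper's fit, detailed in its Section 3.3.1, may weight by flux density errors), and the flux densities are synthetic values generated from a spectral index of −0.8.

```python
import math


def fit_power_law(freqs_mhz, fluxes_jy):
    """Unweighted least-squares fit of ln(S) = alpha * ln(nu) + c.

    Hypothetical helper; returns (alpha, c, residuals in log space).
    """
    xs = [math.log(f) for f in freqs_mhz]
    ys = [math.log(s) for s in fluxes_jy]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    alpha = sxy / sxx  # spectral index
    c = y_mean - alpha * x_mean
    residuals = [y - (alpha * x + c) for x, y in zip(xs, ys)]
    return alpha, c, residuals


def model_flux(nu):
    """Synthetic SED: 10 Jy at 180 MHz with spectral index -0.8."""
    return 10.0 * (nu / 180.0) ** -0.8


# Two confused high-frequency components whose sum follows the power law
total_1400 = model_flux(1400.0)
components_1400 = [0.6 * total_1400, 0.4 * total_1400]
combined_1400 = sum(components_1400)  # combine flux densities from one catalogue

freqs = [74.0, 180.0, 1400.0]
fluxes = [model_flux(74.0), model_flux(180.0), combined_1400]
alpha, c, residuals = fit_power_law(freqs, fluxes)
```

Fitting only one of the two components at 1400 MHz would steepen the apparent spectrum, which is why the combined SED is the one tested.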

Figure 7. An example of a reject position match. Both SUMSS sources lie outside of the resolution of the MWACS catalogue plus the positional error of the MWACS source. As P1, 2 < Pu, all cross-match combinations are deemed improbable and are rejected. Cross-matches such as these are best investigated in conjunction with postage stamp images such as those shown in Figure 11.

Figure 8. An example of an eyeball multiple match. Many cross-match combinations lie outside of the resolution plus error of the base MWACS source, with no dominant combination. A sum of the flux densities of the matched sources that passed the positional criteria yields a poor fit to a power law, and so the MWACS source is not accepted and is labelled eyeball. Again, cross-matches such as these are best investigated in conjunction with postage stamp images such as those shown in Figure 11.

Figure 9. (a) A kernel density estimate of the SI distribution of each PUMA classification. The median and median absolute deviation of each distribution are quoted in the legend. (b) A histogram of the offsets of MWACS sources to either NVSS or SUMSS found by PUMA, including all match types except eyeball and reject. We find similar positional offset behaviour from NVSS and SUMSS as is described in Hurley-Walker et al. (2014).

Figure 10. The positional offset found to either NVSS or SUMSS from either MWACS or MRC is shown. The edge of the MWACS field is clearly seen at δ = −15°. The positional agreement with MRC is excellent, most likely due to MRC only containing bright sources. As explained in Hurley-Walker et al. (2014), the positional offsets to MWACS vary with RA. The MWACS survey was taken over two declination strips, the effect of which appears to be visible in the plot, with the decrease in offset density at around δ = −37°. There are hints of an overall north-east offset in the upper declination strip; this is further investigated in Carroll et al. (2016). Coherent patches of positional offsets are consistent with a phase gradient introduced by ionospheric effects. As these would vary over a night, the offsets seen here could well be ionospheric.

Figure 11. An example of the matching process for extended sources is shown. The two upper left panels show all information given to PUMA. The bottom three panels show postage stamp images of the three matched catalogues, with the reported catalogue sources over-plotted, along with the MWACS source. In this case, the source reported in the VLSSr catalogue does not realistically match the VLSSr image. This artificially creates a curved SED, which causes PUMA to label this match an eyeball. Given the doubt cast on the VLSSr source, it is ignored in the cross-match, and the SUMSS and NVSS sources that seem positionally reasonable are combined and matched. This gives a realistic positional match as well as a realistic spectrum.

Figure 12. A comparison of a real NVSS postage stamp image (upper panel) and a simulated NVSS postage stamp (central panel), created as described in Section 5.1.2. The same area of sky is also shown as simulated to mimic an MWACS postage stamp (lower panel).

Figure 13. The positional corrections (left column) and SI distributions (right column) derived by PUMA when matching mock catalogues, split into isolated, dominant, and multiple (top, middle, and bottom rows, respectively). For every distribution, the median and median absolute deviation are quoted in the legend. In the left-hand column, the PUMA positional corrections found when using the PyBDSM mock MWACS positions (no offset) and with positional offsets added (with offset) are shown. The added positional offsets (injected offsets), as well as the PyBDSM reported errors, are also plotted. In the right-hand column, the PUMA SI distributions are again shown for both the PyBDSM MWACS and the perturbed positions, compared to the expected SI distribution as derived in Section 5.1.3 by finding the flux density from noiseless mock MWACS images. An SI distribution is also shown for a nearest neighbour match to the PyBDSM MWACS positions to within 90 arcsec. The PUMA classifications were taken from the match with the original PyBDSM MWACS positions; only matches which were accepted by both PUMA runs (offset and no offset) are plotted for a direct comparison.

Table 4. The matching classifications found by PUMA when matching the mock catalogues (No offset), along with the case where positional errors were introduced into the mock MWACS catalogue (With offset).

Figure 14. Four ‘dirty’ (with the synthesised interferometric beam still convolved with the sky brightness) naturally weighted images are shown for an integration of 64 s of data across the entire 30 MHz bandwidth. The left-hand column shows OSKAR simulations, and the right-hand column shows MWA data. The top row shows calibrated data, and the bottom row shows data with the same 1 000 sources from the PUMA sky model created in Section 4 subtracted. (a) and (b) reveal the excellent agreement between the synthesised beam created by OSKAR and that of the real MWA. (c) and (d) reveal the biggest difference in the sky, namely the diffuse emission clearly visible in (d); diffuse emission is mostly due to synchrotron emission from cosmic rays interacting locally with galactic field lines (Ginzburg & Syrovatskii 1969).

Figure 15. Two 2D power spectra are shown (left and centre), both created using the XX polarisation and the entire simulated hour of data. Each plot shows amplitude as a function of k-modes perpendicular to the line of sight (derived from angular scales on the sky, k⊥) horizontally, and k-modes parallel to the line of sight (derived from frequency response, k∥) vertically. The plot on the left shows the power before source subtraction, and the centre after 1 000 sources have been subtracted. The plot on the right shows the difference plot of the 2D power spectra, with the 1 000 source spectra subtracted from the spectra without source subtraction. Blue in this case shows more power being present before source subtraction. The absolute scale shown here is not the most instructive part of these plots as an interferometer naturally measures variations about a mean; the relative power as a function of k-space however informs us where foreground power is being removed from.

Figure 16. Four difference power spectra (PS) are shown to contrast data processed with the PUMA source list to the MWACS source list: (a) Zenith pointing, PUMA source list − MWACS source list, XX polarisation; (b) Zenith pointing, PUMA source list − MWACS source list, YY polarisation; (c) Off-zenith pointing, PUMA source list − MWACS source list, XX polarisation; (d) Off-zenith pointing, PUMA source list − MWACS source list, YY polarisation. In each case, blue represents more power for data with exact positional source subtraction as opposed to offset positional subtraction, and red less power.

Figure 17. Four difference PS for data processed using the PUMA source list only, each representing half an hour of data: (a) No peeling, zenith − off-zenith, XX polarisation; (b) No peeling, zenith − off-zenith, YY polarisation; (c) 1 000 sources peeled, zenith − off-zenith, XX polarisation; (d) 1 000 sources peeled, zenith − off-zenith, YY polarisation. In each case, blue represents more power in a zenith pointing, and red less power. The top row shows there is overall more power seen for the zenith pointing before source subtraction, and the bottom row shows there is overall less power after subtracting the 1 000 brightest sources. Again, the absolute value of the power is less important than the distribution of power throughout k-space.

Algorithm 1: Positional selection criteria for all cross-match combinations associated with a base source. Any catalogue with more than one matched source is labelled as ‘repeated’. The algorithm accepts a combination if it is either likely, or if the repeated source is within the resolution of the base catalogue. The retained combinations are then investigated through Algorithms 3 and 4. Pu can be modified by the user. At all stages, statistics of the matching process are gathered to propagate through to the final matched catalogue.
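
The retention rule in Algorithm 1 can be sketched as below. This is a minimal sketch under stated assumptions rather than PUMA's code: the function `positional_selection`, the dictionary keys, and the numerical values chosen for Pu and the resolution are all hypothetical and purely illustrative.

```python
def positional_selection(combinations, p_upper, resolution_arcsec):
    """Retain combinations that are likely, or whose repeated sources lie
    within the resolution of the base catalogue.

    Each combination is a dict with hypothetical keys:
      'prob'        : positional probability of the combination
      'repeat_seps' : separations (arcsec) from the base source of the
                      sources drawn from 'repeated' catalogues
    """
    retained = []
    for comb in combinations:
        likely = comb["prob"] > p_upper
        seps = comb["repeat_seps"]
        # Guard against an empty list, for which all() would be True
        within = bool(seps) and all(s <= resolution_arcsec for s in seps)
        if likely or within:
            retained.append(comb)
    return retained


# Illustrative values only; the thresholds are user-configurable in PUMA
combos = [
    {"name": "A", "prob": 0.99, "repeat_seps": [200.0]},  # likely
    {"name": "B", "prob": 0.40, "repeat_seps": [60.0]},   # within resolution
    {"name": "C", "prob": 0.40, "repeat_seps": [300.0]},  # neither
]
retained = positional_selection(combos, p_upper=0.95, resolution_arcsec=140.0)
retained_names = [c["name"] for c in retained]
```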

Algorithm 2: Positional selection criteria for a single source cross-match. If there is only one combination possible, and it has a positional probability over a given threshold, it is accepted without scrutinising the spectral data. This avoids assuming any spectral model. If the match is below Pu, all matched sources are checked to be within the resolution of the base catalogue. As there was only one possible match, a high positional probability was expected, so a spectral test is applied. If the residuals ε, χ²_red of a fit to a power law (as detailed in Section 3.3.1) are below a certain threshold χ²_red,u, the source is accepted. At all stages, statistics of the matching process are gathered to propagate through to the final matched catalogue.
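
The decision flow of Algorithm 2 might be sketched as follows. The function name, argument names, returned labels, and threshold values are hypothetical. As a simplifying assumption, the spectral test here compares only the reduced chi-squared against its upper threshold; how the residual ε enters the test is not reproduced.

```python
def single_match_decision(prob, seps_arcsec, chi2_red,
                          p_upper, resolution_arcsec, chi2_red_upper):
    """Decide on a base source that has only one possible combination.

    Hypothetical helper. Returns a descriptive label, not PUMA's own flags.
    """
    # A likely positional match is accepted without any spectral model
    if prob > p_upper:
        return "accepted on position"
    # Otherwise require every matched source to lie within the resolution
    within = all(sep <= resolution_arcsec for sep in seps_arcsec)
    # ... and the power-law fit to be good (assumption: chi-squared only)
    if within and chi2_red < chi2_red_upper:
        return "accepted on position and spectrum"
    return "not accepted"


# Illustrative values only
d1 = single_match_decision(0.99, [150.0], 50.0, 0.95, 140.0, 10.0)
d2 = single_match_decision(0.60, [80.0], 2.0, 0.95, 140.0, 10.0)
d3 = single_match_decision(0.60, [80.0], 25.0, 0.95, 140.0, 10.0)
d4 = single_match_decision(0.60, [300.0], 2.0, 0.95, 140.0, 10.0)
```

The first call shows the point made in the caption: a sufficiently probable positional match bypasses the spectral test, so a poor power-law fit does not by itself exclude it.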

Algorithm 3: A test for spectral dominance. If one combination has residuals that are at least three times smaller than all other combinations, and is positionally likely whilst all other combinations are unlikely, accept the source. Positional and spectral dominance are required at the same time, to rule out chance alignment of sources with particular flux densities. Otherwise, the combinations are passed on to Algorithm 4. At all stages, statistics of the matching process are gathered to propagate through to the final matched catalogue.
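
A minimal sketch of the dominance test follows, assuming each combination carries a positional probability and a single scalar fit residual. The function `find_dominant`, the dictionary keys, and the numerical thresholds are hypothetical. The factor of three and the requirement that the other combinations be positionally unlikely (below Pl, as in the Figure 5 caption) come from the text.

```python
def find_dominant(combinations, p_upper, p_lower, ratio=3.0):
    """Return the index of the dominant combination, or None.

    Hypothetical helper. A combination is dominant if it is positionally
    likely (prob > p_upper), every other combination is unlikely
    (prob < p_lower), and every other residual is at least `ratio`
    times larger than its own.
    """
    if len(combinations) < 2:
        return None  # a single combination is handled by Algorithm 2
    for i, cand in enumerate(combinations):
        others = [c for j, c in enumerate(combinations) if j != i]
        positional = cand["prob"] > p_upper and all(
            o["prob"] < p_lower for o in others
        )
        spectral = all(o["resid"] >= ratio * cand["resid"] for o in others)
        # Both kinds of dominance are required at the same time
        if positional and spectral:
            return i
    return None


# Illustrative values only
clear_case = [{"prob": 0.98, "resid": 0.1}, {"prob": 0.02, "resid": 0.9}]
ambiguous = [{"prob": 0.98, "resid": 0.1}, {"prob": 0.97, "resid": 0.9}]
dominant_idx = find_dominant(clear_case, p_upper=0.95, p_lower=0.8)
no_dominant = find_dominant(ambiguous, p_upper=0.95, p_lower=0.8)
```

In the second case the spectral ratio is satisfied but both combinations are positionally likely, so nothing is selected and the combinations would move on to Algorithm 4.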

Algorithm 4: A test for source combining. If no one combination passes Algorithm 3, try combining the flux densities from the sources from the same catalogue. If the combined flux densities pass a spectral test, create a new position for the combined source, weighting the RA and Dec of each source by its flux density. If splitting is implemented, pass to Algorithm 5. Otherwise accept the combined source. If the combination of flux densities does not pass, send the combinations to be investigated by eye. At all stages, statistics of the matching process are gathered to propagate through to the final matched catalogue.
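
The flux-density-weighted position mentioned in Algorithm 4 can be written as a short sketch. The function name and the toy coordinates are hypothetical, and the sketch assumes the components do not straddle RA = 0, where a naive average of RA values would fail.

```python
def flux_weighted_position(ras_deg, decs_deg, fluxes_jy):
    """Return the flux-density-weighted mean (RA, Dec) in degrees.

    Hypothetical helper; assumes no wrap through RA = 0 and a small
    angular extent, so a simple weighted mean of coordinates suffices.
    """
    total = sum(fluxes_jy)
    ra = sum(r * s for r, s in zip(ras_deg, fluxes_jy)) / total
    dec = sum(d * s for d, s in zip(decs_deg, fluxes_jy)) / total
    return ra, dec


# Two components from the same catalogue; the brighter one pulls the centroid
ra_c, dec_c = flux_weighted_position(
    [10.00, 10.02], [-27.00, -27.01], [3.0, 1.0]
)
```

With a 3:1 flux density ratio the combined position sits a quarter of the way from the brighter component to the fainter one.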

Algorithm 5: A test for source splitting. If a source can be combined, but the components to be combined are separated by a distance larger than the user specified dsplit, the combination is tested for splitting. If more than one catalogue has repeated sources, the Algorithm requires they have the same number of sources. Each set of repeated sources is then matched by distance to create components. An SED is constructed for each component, and fit to the linear model. If all components pass the spectral test, the cross-match combination is split up into multiple cross-matched sources.
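
The geometric part of Algorithm 5 (the dsplit check and the matching of repeated sources by distance) could be sketched as follows. All function names and coordinates are hypothetical, the 90 arcsec value for dsplit is illustrative, and the greedy nearest-neighbour pairing is an assumed simplification. The source does not specify how the distance matching is implemented.

```python
import math


def ang_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec (haversine formula), inputs in degrees."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600.0


def should_test_split(sources, d_split_arcsec):
    """True if any two components from one catalogue are further apart than d_split."""
    return any(
        ang_sep_arcsec(*a, *b) > d_split_arcsec
        for i, a in enumerate(sources)
        for b in sources[i + 1:]
    )


def pair_by_distance(set_a, set_b):
    """Greedily pair each source in set_a with its nearest unused source in set_b.

    Returns None when the two catalogues have different numbers of repeated
    sources, mirroring the equal-count requirement in Algorithm 5.
    """
    if len(set_a) != len(set_b):
        return None
    unused = list(range(len(set_b)))
    pairs = []
    for i, a in enumerate(set_a):
        j = min(unused, key=lambda k: ang_sep_arcsec(*a, *set_b[k]))
        unused.remove(j)
        pairs.append((i, j))
    return pairs


# Toy (RA, Dec) positions in degrees for two catalogues with repeated sources
cat_a = [(10.00, -27.00), (10.05, -27.00)]
cat_b = [(10.051, -27.001), (10.001, -27.001)]
needs_split_test = should_test_split(cat_a, d_split_arcsec=90.0)
pairs = pair_by_distance(cat_a, cat_b)
```

Each resulting pair would then form one component whose SED is fitted separately, with the split accepted only if every component passes the spectral test.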

Figure B1. An exploration of the effects of parameter space for isolated and dominant classifications. The bottom right panel of (a) shows that χ²_red and ε have no effect on the number of dominant sources; this is because dominance is established using a ratio of residuals, rather than a cut-off. (a) isolated cases. (b) dominant cases.

Figure B2. An exploration of the effects of parameter space for multiple and eyeball classifications. (a) multiple cases. (b) eyeball cases.

Figure B3. An exploration of the effects of parameter space for the reject classification and the median of the SI distribution. (a) reject cases. (b) The median SI.