
Selecting variable sources with median colours using a self-organising map

Published online by Cambridge University Press: 16 December 2024

Thomas Venville*
Affiliation:
Research School of Astronomy and Astrophysics, Australian National University, Canberra 2611, A.C.T., Australia; Centre for Astrophysics and Supercomputing, Swinburne University of Technology, PO Box 218, Hawthorn, Victoria, 3122, Australia
Peter L. Capak
Affiliation:
Cosmic Dawn Center (DAWN), Denmark
Andreas L. Faisst
Affiliation:
Caltech/IPAC, 1200 E. California Blvd. Pasadena, CA 91125, USA
Bomee Lee
Affiliation:
Caltech/IPAC, 1200 E. California Blvd. Pasadena, CA 91125, USA; Korea Astronomy and Space Science Institute, 776 Daedeokdae-ro, Yuseong-gu, Daejeon 34055, Korea
Karun G. Thanjavur
Affiliation:
Department of Physics and Astronomy, University of Victoria, 3800 Finnerty Road, Victoria, BC V8P 5C2, Canada
Chris Flynn
Affiliation:
Centre for Astrophysics and Supercomputing, Swinburne University of Technology, PO Box 218, Hawthorn, Victoria, 3122, Australia; ARC Centre of Excellence for Gravitational Wave Discovery (OzGrav), Mail H29, Swinburne University of Technology, PO Box 218, Hawthorn, VIC 3122, Australia
* Corresponding author: Thomas Venville; Email: thomas.venville@anu.edu.au

Abstract

A key objective for upcoming surveys, and for the re-analysis of archival data, is the identification of variable stellar sources. However, the selection of these sources is often complicated by the unavailability of light curve data. Utilising a self-organising map (SOM), we demonstrate the selection of diverse variable source types from a catalogue of variable and non-variable SDSS Stripe 82 sources whilst employing only the median $u-g$, $g-r$, $r-i$, and $i-z$ photometric colours for each source as input, without using source magnitudes. This includes the separation of main sequence variable stars that are otherwise degenerate with non-variable sources in ($u-g$,$g-r$) and ($r-i$,$i-z$) colour-spaces. We separate variable sources on the main sequence from all other variable and non-variable sources with a purity of $80.0\%$ and completeness of $25.1\%$, figures which can be modified depending on the application. We also explore the varying ability of the same method to simultaneously select other types of variable sources from the heterogeneous sample, including variable quasars and RR-Lyrae stars. The demonstrated ability of this method to select variable main sequence stars in colour-space holds promise for application in future survey reduction pipelines and for the analysis of archival data, where light curves may not be available or may be prohibitively expensive to obtain.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2024. Published by Cambridge University Press on behalf of Astronomical Society of Australia

Figure 1. The ($u-g$,$g-r$) colour-space distribution of the $425\,546$ SDSS Stripe 82 standard star sources used in this study (as detailed in Section 2). Also shown, for comparison with Fig. 2, are the six colour-space regions detailed for variable Stripe 82 sources in Sesar et al. (2007). Region V contains $418\,772$ main sequence standard star sources, or $98.4\%$ of all standard star sources. The number and type of sources in the other regions are described in Section 2.

Figure 2. The ($u-g$,$g-r$) colour-space of the $67\,507$ variable SDSS Stripe 82 sources used in this study (as detailed in Section 2), with the six colour-space regions detailed for Stripe 82 variable sources in Sesar et al. (2007) overlaid. $75\%$ of all variable sources are located in Region V, which is overwhelmingly dominated by main sequence sources. Region II contains $8\,735$ sources ($12.9\%$ of the total) and is dominated by low-redshift quasars. Region IV contains $2\,725$ sources ($4\%$ of the total) and is dominated by RR-Lyrae stars.

Figure 3. The ($r-i$,$i-z$) colour-space of the Stripe 82 variable and standard star sources used in this study (detailed in Section 2). It is clear that the vast majority of the variable sources (in blue contours) are entirely degenerate with the standard star sources in this colour-space, inhibiting straightforward colour-space selection. Degenerate sources in this ($r-i$,$i-z$) colour-space include the variable and standard star sources in the main sequence ‘Region V’; this is further illustrated in Fig. 5.

Figure 4. The $(u-g,g-r)$ colour-space contours of all variable (blue) and standard star (red) sources located in Region V of $(u-g,g-r)$ colour-space (see Figs. 2 and 1, respectively). It is clear that the vast majority of variable sources from this region lie within the red contours, illustrating the degeneracy between these variable and standard star sources in $(u-g,g-r)$ colour-space. As depicted in Fig. 5, these Region V sources are also largely degenerate in $(r-i,i-z)$ colour-space.

Figure 5. The $(r-i,i-z)$ colour-space contours of all variable (blue) and standard star (red) sources located in Region V of $(u-g,g-r)$ colour-space (see Figs. 2 and 1, respectively). Note that the outermost variable source density contour contains few variable sources. Accordingly, the vast majority of the variable Region V sources lie within the red contours and are entirely degenerate in $(r-i,i-z)$ colour-space with the Region V standard star sources.

Figure 6. The two-dimensional SOM representation of the complete four-dimensional SDSS S82 colour-space sample. (a) displays the total number of sources in each cell of the SOM representation. This is equal to the sum of the number of variable sources (b) and the number of standard star sources (c) in each cell. It is evident that the variable and standard star sources generally inhabit different regions of the SOM representation. As discussed in Sections 3 and 4, this indicates that these four median photometric colours are sufficient for separating variable and standard star sources that are otherwise often degenerate in $(u-g,g-r)$ and $(r-i,i-z)$ colour-spaces. This is further emphasised by the variable source purity $\mathcal{P}$ of each cell, displayed in (d).
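The per-cell quantities shown in panels (a) to (d) can be illustrated with a minimal sketch. It assumes a small rectangular SOM trained with online updates in plain NumPy on synthetic four-colour data; the function names (`train_som`, `map_to_cells`, `cell_purity`), the grid size, the learning schedule, and the synthetic colours are all hypothetical choices for illustration, not the configuration used in the study.

```python
import numpy as np

def train_som(data, rows=8, cols=8, n_iter=2000, lr0=0.5, sigma0=3.0, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    # Initialise each cell's weight vector from a randomly chosen source.
    start = rng.integers(0, len(data), rows * cols)
    weights = data[start].reshape(rows, cols, dim).astype(float)
    grid_r, grid_c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 0.5      # shrinking neighbourhood width
        x = data[rng.integers(0, len(data))]
        d2 = ((weights - x) ** 2).sum(axis=2)
        br, bc = np.unravel_index(np.argmin(d2), d2.shape)  # best-matching cell
        h = np.exp(-((grid_r - br) ** 2 + (grid_c - bc) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[..., None] * (x - weights)
    return weights

def map_to_cells(data, weights):
    """Flat index of the best-matching cell for every source."""
    flat = weights.reshape(-1, weights.shape[-1])
    d2 = ((data[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def cell_purity(cells, is_variable, n_cells):
    """Per-cell variable counts, standard star counts, and variable purity."""
    n_var = np.bincount(cells[is_variable], minlength=n_cells)
    n_std = np.bincount(cells[~is_variable], minlength=n_cells)
    total = n_var + n_std
    # Purity is undefined (NaN) for empty cells.
    purity = np.divide(n_var, total, out=np.full(n_cells, np.nan), where=total > 0)
    return n_var, n_std, purity

# Synthetic (u-g, g-r, r-i, i-z) colours: illustrative values only.
rng = np.random.default_rng(1)
standard = rng.normal([1.2, 0.5, 0.2, 0.1], 0.05, size=(500, 4))
variable = rng.normal([0.3, 0.2, 0.1, 0.0], 0.05, size=(100, 4))
colours = np.vstack([standard, variable])
is_variable = np.r_[np.zeros(500, dtype=bool), np.ones(100, dtype=bool)]

weights = train_som(colours)
cells = map_to_cells(colours, weights)
n_var, n_std, purity = cell_purity(cells, is_variable, weights.shape[0] * weights.shape[1])
```

Reshaping `n_var`, `n_std`, and `purity` to the grid shape gives maps analogous to panels (b), (c), and (d), with their sum giving panel (a).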

Figure 7. (a) The number and (b) the purity $\mathcal{P}_{\text{V}}$ of variable Region V sources upon the SOM representation depicted in Fig. 6. The cells containing variable Region V sources contain primarily variable Region V sources, with few variable sources from other regions or standard star sources, crucially including standard star sources otherwise degenerate in $(u-g,g-r)$ and $(r-i,i-z)$ colour-space (see Figs. 4 and 5). (c) The overall purity $\mathcal{P}_{\text{V}}$ (in blue) and completeness $\mathcal{R}_{\text{V}}$ (in black dashes) of variable Region V sources in each group of cells defined by a given cell minimum variable Region V source purity, $\mathcal{P}_{\text{V,min}}$. The $(u-g,g-r,r-i,i-z)$ SOM can be used to separate variable Region V sources with a variety of $\mathcal{P}_{\text{V}}$ and $\mathcal{R}_{\text{V}}$ values depending on the use-case, including $(\mathcal{P}_{\text{V}},\mathcal{R}_{\text{V}})=(80.2\%,25.1\%)$, $(\mathcal{P}_{\text{V}},\mathcal{R}_{\text{V}})=(48.5\%,48.5\%)$ and $(\mathcal{P}_{\text{V}},\mathcal{R}_{\text{V}})=(75.4\%,29.1\%)$. (d) The group of cells where the variable Region V source purity $\mathcal{P}_{\text{V}}$ of each cell exceeds $\mathcal{P}_{\text{V,min}}=60.0\%$. $\mathcal{P}_{\text{V}}=80.2\%$ of all sources in this group of cells are variable Region V sources. This group of cells contains $\mathcal{R}_{\text{V}}=25.1\%$ of all Region V variable sources (some 125177 sources). This group of cells, discussed in detail in Section 3, is the largest group where $\mathcal{P}_{\text{V}}>80.0\%$.
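The relationship between the per-cell threshold $\mathcal{P}_{\text{V,min}}$ and the overall purity and completeness of the resulting group can be sketched as follows. The per-cell counts below are made-up toy values and the helper names are hypothetical; the sketch only assumes that a group is formed from every cell whose own purity exceeds the threshold, as described in panel (d).

```python
import numpy as np

def select_cells(n_target, n_total, p_min):
    """Mask of cells whose per-cell target purity exceeds p_min."""
    purity = np.divide(n_target, n_total,
                       out=np.zeros(len(n_total)), where=n_total > 0)
    return purity > p_min

def group_purity_completeness(n_target, n_total, mask):
    """Overall purity and completeness of the target class in the selected cells."""
    selected_target = n_target[mask].sum()
    selected_total = n_total[mask].sum()
    purity = selected_target / selected_total if selected_total else float("nan")
    completeness = selected_target / n_target.sum()
    return purity, completeness

# Toy per-cell counts (illustrative only): target sources and all sources.
n_target = np.array([90, 40, 10, 0])
n_total = np.array([100, 80, 100, 50])

# A strict threshold keeps only the cleanest cell.
strict = group_purity_completeness(n_target, n_total,
                                   select_cells(n_target, n_total, 0.6))
# A looser threshold admits a second cell, raising completeness but lowering purity.
loose = group_purity_completeness(n_target, n_total,
                                  select_cells(n_target, n_total, 0.4))
```

Sweeping `p_min` over a range of values and recording both outputs traces out curves of the kind shown in panel (c), from which a threshold can be chosen to suit the use-case.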

Figure 8. The fraction of variable (blue) and standard star (red) sources from Region V with a given four-dimensional colour-distance (4DCD) value. The larger fraction of variable sources with high 4DCD values indicates that these variable sources are, on average, further from the median stellar locus (defined using standard star source colours) than standard star sources, emphasising the fact that the variable and standard star sources from Region V often occupy different regions of the four-dimensional colour-space.
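The idea of a distance from a median stellar locus can be sketched in simplified form. This caption does not give the 4DCD formula, so the example below is an assumption: it builds the locus from binned medians of the standard star colours (binned in $g-i$) and takes the unweighted Euclidean distance to the nearest locus point, ignoring photometric errors. The function names and synthetic colours are hypothetical.

```python
import numpy as np

def median_locus(standard_colours, n_bins=20):
    """Median (u-g, g-r, r-i, i-z) of standard stars in quantile bins of g-i."""
    g_i = standard_colours[:, 1] + standard_colours[:, 2]   # (g-r) + (r-i)
    edges = np.quantile(g_i, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, g_i, side="right") - 1, 0, n_bins - 1)
    return np.array([np.median(standard_colours[idx == b], axis=0)
                     for b in range(n_bins) if np.any(idx == b)])

def colour_distance(colours, locus):
    """Unweighted Euclidean distance from each source to the nearest locus point."""
    d = np.linalg.norm(colours[:, None, :] - locus[None, :, :], axis=2)
    return d.min(axis=1)

# Synthetic colours (illustrative only): a one-parameter locus plus scatter.
rng = np.random.default_rng(2)
direction = np.array([1.5, 0.6, 0.3, 0.15])
standard = rng.uniform(0, 1, 2000)[:, None] * direction + rng.normal(0, 0.02, (2000, 4))
# Variable sources are offset from the locus in i-z in this toy example.
variable = (rng.uniform(0, 1, 300)[:, None] * direction
            + rng.normal(0, 0.02, (300, 4)) + np.array([0.0, 0.0, 0.0, 0.3]))

locus = median_locus(standard)
dist_standard = colour_distance(standard, locus)
dist_variable = colour_distance(variable, locus)
```

Histogramming `dist_standard` and `dist_variable` as fractions of each sample gives a comparison of the kind plotted in this figure, under the simplifying assumptions stated above.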

Figure 9. The groups of cells on the SOM containing variable sources from each of the six regions described in Section 2. As detailed in Section 4, the groups of cells depicted for Regions III, IV, V, and VI are the largest groups of cells where the overall purity of variable sources from the given region exceeds $80.0\%$. The group of cells containing variable sources from Region V is also discussed in detail in Section 3. The group of cells for Region I, as justified in Section 4.1, is defined with a Region I variable source purity of $\mathcal{P}_{\text{I}}=27.1\%$ and completeness of $\mathcal{R}_{\text{I}}=71.7\%$. This group consists of only three cells, located at (94,26), (95,27), and (94,27) on the depicted axes. The group of cells displayed in magenta is dominated by variable Region II sources, with a variable Region II source purity of $\mathcal{P}_{\text{II}}=94.5\%$, and contains $\mathcal{R}_{\text{II}}=96.5\%$ of all variable Region II sources. As described in Section 4.2, it is the largest group where all cells predominantly contain variable Region II sources. It is important to note that the groups of cells defined for each region do not overlap. The variable source purity and completeness for each group of cells are detailed in Table 1.

Table 1. The purity and completeness of variable sources from each region of $(u-g,g-r)$ colour-space in each corresponding group of cells depicted in Fig. 9. As explained in Section 4, the groups of cells for Regions III, IV, V, and VI are defined as the largest groups where the overall purity of variable sources from the given region exceeds $80.0\%$. As discussed in Section 4.1, the low variable source purity and completeness for the group of cells defined for variable Region I sources reflects the inability of the SOM to separate these variable sources. The group of cells defined for variable Region II sources is the largest group where all cells predominantly contain variable Region II sources and is discussed in Section 4.2.

Figure 10. The distributions of median colours for all sources within the six groups of cells detailed in Section 4. For the sources in the cell groups dominated by Region II to Region VI variable sources, the variable source median (a) $(u-g,g-r)$ and (b) $(r-i,i-z)$ colour distributions differ markedly from, but encompass, the standard star median (c) $(u-g,g-r)$ and (d) $(r-i,i-z)$ colour distributions. This reflects the ability of the SOM to successfully select these variable sources with a high purity. In contrast, the variable sources from the cell group identified as containing the highest purity of variable Region I sources (described in Section 4.1) occupy a subset of the four-dimensional colour-space spanned by standard star sources from the same group of cells. This is not unexpected given the inability of the SOM to select variable sources from Region I (also described in Section 4.1).