
Survey-scale discovery-based research processes: Evaluating a bespoke visualisation environment for astronomical survey data

Published online by Cambridge University Press:  05 July 2023

C. J. Fluke*
Affiliation:
Centre for Astrophysics & Supercomputing, Swinburne University of Technology, Hawthorn, Australia
D. Vohl
Affiliation:
Anton Pannekoek Institute for Astronomy, University of Amsterdam, Amsterdam, The Netherlands; ASTRON, Netherlands Institute for Radio Astronomy, Dwingeloo, The Netherlands
V. A. Kilborn
Affiliation:
Centre for Astrophysics & Supercomputing, Swinburne University of Technology, Hawthorn, Australia
C. Murugeshan
Affiliation:
CSIRO, Space and Astronomy, Bentley, WA, Australia; ARC Centre of Excellence for All Sky Astrophysics in 3 Dimensions (ASTRO 3D), Australia
*Corresponding author: C. J. Fluke; Email: cfluke@swin.edu.au

Abstract

Next-generation astronomical surveys naturally pose challenges for human-centred visualisation and analysis workflows that currently rely on the use of standard desktop display environments. While a significant fraction of the data preparation and analysis will be taken care of by automated pipelines, crucial steps of knowledge discovery can still only be achieved through various levels of human interpretation. As the number of sources in a survey grows, there is a need to both modify and simplify repetitive visualisation processes that need to be completed for each source. As tasks such as per-source quality control, candidate rejection, and morphological classification all share a single instruction, multiple data (SIMD) work pattern, they are amenable to a parallel solution. Selecting extragalactic neutral hydrogen (Hi) surveys as a representative example, we use system performance benchmarking and the visual data and reasoning methodology from the field of information visualisation to evaluate a bespoke comparative visualisation environment: the encube visual analytics framework deployed on the 83 Megapixel Swinburne Discovery Wall. Through benchmarking using spectral cube data from existing Hi surveys, we are able to perform interactive comparative visualisation via texture-based volume rendering of 180 three-dimensional (3D) data cubes at a time. The time to load a configuration of spectral cubes scales linearly with the number of voxels, with independent samples of 180 cubes (8.4 Gigavoxels or 34 Gigabytes) each loading in under 5 min. We show that parallel comparative inspection is a productive and time-saving technique which can reduce the time taken to complete SIMD-style visual tasks currently performed at the desktop by at least two orders of magnitude, potentially rendering some labour-intensive desktop-based workflows obsolete.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2023. Published by Cambridge University Press on behalf of the Astronomical Society of Australia

Figure 1. The Swinburne Discovery Wall: a multi-purpose 83 Megapixel tiled display wall, comprising a matrix of two rows and five columns of Philips BDM4350UC 4K-UHD monitors and five Lenovo ThinkStation P410 MiniTowers. See Section 2.2 and Table 1 for additional details. A small-multiples visualisation approach is used, with a single-instruction multiple data interaction paradigm. Interaction with the dataset is achieved through the browser-based user interface, visible in the left-hand monitor in the bottom row. Columns are enumerated from 1 to 5 from left to right. The keyboards in front of each column can be used for direct interaction with an individual data cube on the corresponding column. Shown here is a configuration of 80 spectral cubes sampled from the WHISP (van der Hulst, van Albada, & Sancisi 2001; Swaters et al. 2002), THINGS (Walter et al. 2008) and LVHIS (Koribalski et al. 2018) projects (see Section 3.3).

Figure 2. Simultaneous visualisation of 180 spectral cubes from the LVHIS Hi survey. Sources are randomly sampled with replacement, resulting in repetition of objects across the display. This configuration loads in less than 100 s. (Top) A zoomed-in view showing the spatial distribution of Hi using a heat-style colour map where low signal is black and high signal is white. (Bottom) All cubes are rotated to show the kinematic structure along the spectral axis. A blue-red two-ended colour map is used to aid with identifying Hi that is either blue-shifted or red-shifted with respect to the observer, relative to each galaxy’s systemic velocity.

Table 1. Specifications for the ten Philips BDM4350UC 4K-UHD monitors of the Swinburne Discovery Wall. Parameters and corresponding units are: screen linear dimension, $L_{\rm dim}$ (m $\times$ m), screen area, $A_{\rm screen}$ (m$^2$), pixel dimensions, $P_{\rm dim}$ (pix $\times$ pix), and total pixels, $P_{\rm total}$ (Megapixels).
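
The headline figure of 83 Megapixels can be checked from the panel layout. The sketch below is illustrative only: it assumes each monitor runs at the standard 4K-UHD resolution of 3840 x 2160 pixels (the values in Table 1 are not reproduced here), and the function name is hypothetical.

```python
# Assumed 4K-UHD pixel dimensions per monitor (standard UHD; not read from Table 1).
PIXELS_WIDE = 3840
PIXELS_HIGH = 2160
# Wall layout stated in the Figure 1 caption: two rows and five columns.
ROWS, COLUMNS = 2, 5


def total_megapixels(rows=ROWS, columns=COLUMNS,
                     width=PIXELS_WIDE, height=PIXELS_HIGH):
    """Total pixel count of a tiled wall, in Megapixels (1e6 pixels)."""
    return rows * columns * width * height / 1e6


print(f"{total_megapixels():.1f} Megapixels")  # prints "82.9 Megapixels"
```

Under these assumptions the ten panels give 82.944 Megapixels, which rounds to the quoted 83 Megapixels.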

Table 2. Extragalactic Hi surveys used for evaluating encube on the Swinburne Discovery Wall. $N_{\rm s}$ is the number of spectral cubes selected from each of the three surveys (see Section 3.3 for a discussion as to why several spectral cubes were omitted). Data volumes are reported in Megabytes (MB) and voxel counts in Megavoxels (Mvox), with spectral cubes stored in the FITS format. Statistical quantities presented are the min(imum), max(imum), mean, sample standard deviation (SD), and median. The total column summarises the volume or voxel count for the entire survey.

Table 3. Display and survey configurations for which the encube benchmarks were obtained. Set is the label used to identify the five different configurations (A-E), with $N_{\rm cube}$ = 20, 40, 80, 120, or 180. Config is the arrangement of S2PLOT panels (rows $\times$ columns) per column of the Discovery Wall. Survey is one of [W]HISP, [T]HINGS, [L]VHIS, or [C]ombination. $N_{\rm W}$, $N_{\rm T}$, and $N_{\rm L}$ are the number of spectral cubes selected from each of the input surveys. Random sampling with replacement is used for configurations where the total number of cubes displayed exceeds the input survey size. $N_{\rm vox}$ is the total number of voxels (in Gigavoxels) and $V_{\rm Store}$ is the total data volume (in GB). $M_{\rm GPU}$ is the mean memory per GPU in GB, which must be less than 8 GB so as not to exceed the memory bound of the NVIDIA GTX1080 graphics cards. $T_{\rm Load}$ (in seconds) is the time measured for all of the spectral cubes to be loaded, rounded up to the nearest second. Statistical quantities calculated are the mean, sample standard deviation (SD), and median.
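
The sampling rule and the GPU memory bound described in this caption can be sketched as follows. This is not the encube implementation. It assumes one GPU per column of the wall (five in total), 32-bit voxels, an even split of the data across GPUs, and sampling without replacement whenever the survey is large enough; the survey voxel counts are placeholder values, not those of Table 2.

```python
import random

GPU_MEMORY_LIMIT_GB = 8.0  # GTX1080 bound quoted in the caption
N_GPU = 5                  # assumption: one GPU per column of the wall
BYTES_PER_VOXEL = 4        # assumption: 32-bit floating-point voxels


def sample_cubes(survey_voxels, n_cube, rng):
    """Select n_cube cubes, with replacement only if the survey is too small."""
    if n_cube > len(survey_voxels):
        return rng.choices(survey_voxels, k=n_cube)  # with replacement
    return rng.sample(survey_voxels, k=n_cube)       # without replacement


def mean_gpu_memory_gb(voxel_counts, n_gpu=N_GPU):
    """Mean memory per GPU in GB, assuming an even split of the data."""
    total_bytes = sum(voxel_counts) * BYTES_PER_VOXEL
    return total_bytes / n_gpu / 1e9


# Hypothetical survey of 50 cubes: voxel counts are placeholders only.
rng = random.Random(42)
survey = [rng.randint(10_000_000, 60_000_000) for _ in range(50)]
selection = sample_cubes(survey, 180, rng)
m_gpu = mean_gpu_memory_gb(selection)
print(f"M_GPU = {m_gpu:.2f} GB, within bound: {m_gpu < GPU_MEMORY_LIMIT_GB}")
```

With these assumptions, a configuration of 8.4 Gigavoxels would correspond to 6.72 GB per GPU, below the 8 GB bound.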

Table 4. With spectral cube data stored in the FITS format, there is a slight variation in the ratio between the total data volume, $V_{\rm Store}$ measured in GB, and the number of voxels, $N_{\rm vox}$ measured in Gigavoxels across all 54 survey configurations. This is due, in part, to the varying lengths of the FITS headers.
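
The dependence of this ratio on header length follows from the FITS layout, in which both the header and the data are padded to whole 2880-byte blocks and each header card occupies 80 bytes. The sketch below assumes a single primary HDU with 32-bit voxels (BITPIX = -32); the cube dimensions and card count are hypothetical, and the function names are illustrative.

```python
import math

FITS_BLOCK = 2880  # bytes; FITS headers and data are padded to whole blocks
CARD = 80          # bytes per header card (36 cards per block)


def fits_file_bytes(n_voxels, n_header_cards, bytes_per_voxel=4):
    """Size in bytes of a single-HDU FITS file holding n_voxels data values."""
    header_blocks = math.ceil(n_header_cards * CARD / FITS_BLOCK)
    data_blocks = math.ceil(n_voxels * bytes_per_voxel / FITS_BLOCK)
    return (header_blocks + data_blocks) * FITS_BLOCK


def bytes_per_voxel_on_disk(n_voxels, n_header_cards):
    """Ratio of stored bytes to voxel count, slightly above 4 for float32."""
    return fits_file_bytes(n_voxels, n_header_cards) / n_voxels


# Hypothetical cube: 256 x 256 spatial pixels, 128 channels, 100 header cards.
n_vox = 256 * 256 * 128
print(fits_file_bytes(n_vox, 100))          # prints 33563520
print(bytes_per_voxel_on_disk(n_vox, 100))  # slightly more than 4 bytes per voxel
```

Smaller cubes or longer headers push the ratio further above 4 bytes per voxel, which is consistent with the slight variation noted in the caption.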

Figure 3. (Left panel) Based on the 54 independent benchmarks (see the summary in Table 3), the total time taken to load all spectral cubes for a given input configuration grows linearly with the storage volume. Load times are rounded up to the nearest second. Symbols are used to denote the four different input surveys; WHISP (square), THINGS (circle), LVHIS (triangle), or Combination (diamond). (Right panel) From a subset of 21 benchmarks, the minimum recorded frame rate decreases as the mean memory per GPU of the Discovery Wall increases. Plotted values are the mean $\pm$ standard deviation of the minimum observed frame-rate across columns 2–5 of the Discovery Wall (see Table 5). Frame rate benchmarks were only obtained for Set A (circle) and Set E (triangle), with $N_{\rm cube}$ = 20 or 180 respectively. A reasonable frame rate for interactivity is above 10 frames s$^{-1}$, which was achieved except in the Combination configuration containing higher data volume THINGS spectral cubes.
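
A linear trend such as the one in the left panel can be quantified with an ordinary least-squares fit. The sketch below is a generic fit, not the authors' analysis, and the input values are synthetic points on an exact line chosen only to exercise the function; they are not measurements from the benchmarks.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic example only: storage volumes (GB) and times (s) lying on y = 2x + 1.
volumes_gb = [1.0, 2.0, 4.0, 8.0, 16.0]
load_times_s = [2.0 * v + 1.0 for v in volumes_gb]

slope, intercept = fit_line(volumes_gb, load_times_s)
print(f"slope = {slope:.2f} s/GB, intercept = {intercept:.2f} s")
```

Applied to real benchmark values, the slope would give the marginal load time per GB and the intercept any fixed start-up overhead.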

Table 5. Indicative frame rates for each of the five columns of the Swinburne Discovery Wall using a subset of the survey configurations. Quantities and units not defined elsewhere (see the caption to Table 3) are the version number of each mock survey, Ver, and the lowest measured column-based frame rates, $F_i$, in frames/s, recorded after several complete rotations of each spectral cube. Subscripts 1–5 on the frame rate indicate the column of the Discovery Wall, numbered from left to right as seen in Fig. 1.

Figure 4. A quality control activity using encube and the Swinburne Discovery Wall to visualise 80 WHISP spectral cubes. (Top) Visualisation of the mock survey using the data as obtained from the WHISP survey website. We observe that the volume rendering has not worked as expected. In 77 cubes, there is visible excess flux at both ends of the spectral axis. This is seen as the strong blue and red features in each cube, making it difficult to see the WHISP galaxies in most cases. (Bottom) By choosing to reset data values to zero in the first eight and last eight channels of each WHISP spectral cube, the kinematic Hi structures are now visible.
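
The correction described for the bottom panel, resetting the first eight and last eight channels to zero, can be sketched as below. This is an illustration rather than the code used with encube: it assumes the cube is held as a list of 2D channel maps ordered along the spectral axis, and the function name is hypothetical.

```python
def blank_edge_channels(cube, n_edge=8):
    """Return a copy of cube with the first and last n_edge channels zeroed.

    cube is assumed to be a list of 2D channel maps (lists of rows),
    ordered along the spectral axis.
    """
    n_chan = len(cube)
    if 2 * n_edge > n_chan:
        raise ValueError("cube has too few channels to blank both ends")
    cleaned = []
    for k, channel in enumerate(cube):
        if k < n_edge or k >= n_chan - n_edge:
            # Edge channel: replace every value with zero.
            cleaned.append([[0.0] * len(row) for row in channel])
        else:
            # Interior channel: copy values unchanged.
            cleaned.append([list(row) for row in channel])
    return cleaned


# Tiny synthetic cube: 20 channels of 2 x 2 pixels, all set to 1.0.
cube = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(20)]
cleaned = blank_edge_channels(cube)
print(sum(v for ch in cleaned for row in ch for v in row))  # prints 16.0
```

Only the four interior channels keep their values in this toy example, so the summed flux drops from 80.0 to 16.0 while the input cube is left unmodified.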

Figure 5. A demonstration of encube in use for a SIMD candidate rejection or morphological classification activity. Shown here are columns 2–5 of the Swinburne Discovery Wall. Five sources of interest (labelled A–E under the column in which they are located, and described in Section 5.2) have been highlighted for further investigation. The overview provided by visualising many small-multiples allows for rapid identification of these five sources, which show spatial or spectral features that are quite different to the other 75 sources in the survey sample.

Figure 6. Single-file load times for three representative spectral data cubes (minimum, median, and maximum file sizes) from each of the WHISP, THINGS, and LVHIS surveys. Load times were measured for the local disk (filled circles) and across the local network via an NFS mount (open circles). In both cases, there is minimal difference between the two measurements, with a reaction time error of 0.5 s.

Table 6. Single-object (Single) and multi-object (Multi) mean and median load times, $T_{\rm Load}$ in seconds, for the 80-cube [W]HISP, [T]HINGS, [L]VHIS and [C]ombination configuration, using survey data volumes from Table 3. The ratios of the single-to-multi object load times are recorded in the final two columns.

Figure 7. Estimated throughput for a SIMD workflow based on visual inspection of the entire [L]VHIS, [W]HISP, [A]PERTIF, and WALLA[B]Y extragalactic Hi surveys, as per configurations described in Section 6.3.3. For each survey, we consider three scenarios with different follow-up action times: (1) $T_{\rm Action} = 0$; (2) $T_{\rm Action} = 30$ s source$^{-1}$ for 10% of sources; and (3) $T_{\rm Action} = 60$ s source$^{-1}$ for 25% of sources. Symbols are used to differentiate between the inspection times, with $T_{\rm Inspect} = 3$ s source$^{-1}$ for a multi-object workflow (filled circle) and $T_{\rm Inspect} = 10$ s source$^{-1}$ (open triangle) and $T_{\rm Inspect} = 30$ s source$^{-1}$ (plus symbol) for single-object workflows.
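
The scenarios in this caption can be combined into a rough time estimate. The sketch below assumes a simple additive model, total time = N x (T_Inspect + f x T_Action), where f is the fraction of sources needing follow-up; this model, the omission of loading time, and the survey size are assumptions made for illustration and are not taken from Section 6.3.3.

```python
def workflow_hours(n_sources, t_inspect, t_action=0.0, action_fraction=0.0):
    """Estimated workflow duration in hours under the additive model.

    t_inspect and t_action are in seconds per source; action_fraction is
    the fraction of sources that receive a follow-up action.
    """
    total_seconds = n_sources * (t_inspect + action_fraction * t_action)
    return total_seconds / 3600.0


# Follow-up scenarios from the caption: (T_Action in s, fraction of sources).
SCENARIOS = [(0.0, 0.0), (30.0, 0.10), (60.0, 0.25)]
# Inspection times from the caption, in seconds per source.
INSPECT_TIMES = {"multi-object": 3.0, "single-object (10 s)": 10.0,
                 "single-object (30 s)": 30.0}
N_SOURCES = 1000  # hypothetical survey size, not one of the surveys above

for label, t_inspect in INSPECT_TIMES.items():
    for t_action, fraction in SCENARIOS:
        hours = workflow_hours(N_SOURCES, t_inspect, t_action, fraction)
        print(f"{label}: T_Action={t_action:.0f} s for {fraction:.0%} -> {hours:.2f} h")
```

In this model the follow-up term is the same for every workflow, so the advantage of the multi-object workflow comes entirely from the shorter inspection time per source.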

Figure A.1. The key components required for encube to operate on the Swinburne Discovery Wall. The Master node hosts the Data Store, which is accessed by the Process and Render nodes via a network file system mount point. Direct communication between the Process and Render nodes and the Master occurs over the shared network via sockets. Each Process and Render node provides a graphical output to two monitors, which are tiled into a matrix of S2PLOT panels. The User Interface operates on the Master node, controlling the assignment of spectral cubes to each of the Process and Render nodes and modification of the appearance of the spectral cubes.

Figure A.2. The encube user interface (UI) operating in the Firefox Web browser on the Master node. The main elements of the UI are (A) the world in miniature view, replicating the layout of the Discovery Wall; (B) the survey database containing filenames and associated metadata; and (C) the visualisation parameters, controlling visual aspects such as choice of colour map and labelling of spectral cubes. Additional sections of the interface (not shown here) include the camera controller and interactive plots such as a voxel histogram (i.e. to modify the dynamic range) or other custom meta information (e.g. stellar masses of galaxies displayed on the screens as a function of grid position).

Figure A.3. A proposed enhancement to encube would support non-uniform tiling of the display area. In the existing configuration (left-hand panel), the same level of detail is used for every spectral cube. A modification to the tiling (right-hand panel) would allow individual cubes with different sizes to be presented at the same scale or for the volume rendering to occur with a higher level of detail.