GASKAP-HI Pilot Survey Science III: An unbiased view of cold gas in the Small Magellanic Cloud

We present the first unbiased survey of neutral hydrogen (HI) absorption in the Small Magellanic Cloud (SMC). The survey utilises pilot HI observations with the Australian Square Kilometre Array Pathfinder (ASKAP) telescope as part of the Galactic ASKAP HI (GASKAP-HI) project, whose dataset has been processed with the GASKAP-HI absorption pipeline, also described here. This dataset provides absorption spectra towards 229 continuum sources, a 275% increase in the number of continuum sources previously published in the SMC region, as well as an improvement in the quality of absorption spectra over previous surveys of the SMC. Our unbiased view, combined with the closely matched beam size between emission and absorption, reveals a lower cold gas fraction (11%) than the 2019 ATCA survey of the SMC and is more representative of the SMC as a whole. We also find that the optical depth varies greatly between the SMC's bar and wing regions. In the bar we find that the optical depth is generally low (correction factor to the optically thin column density assumption of $\mathcal{R}_{\rm HI} \sim 1.04$) but increases linearly with column density. In the wing, however, there is a wide scatter in optical depth despite a tighter range of column densities.


Introduction
Neutral hydrogen (HI) exists in multiple phases in a galaxy's interstellar medium (ISM). HI is observed in two long-lived phases, the warm neutral medium (WNM, 5000-10,000 K) and the cold neutral medium (CNM; McKee & Ostriker, 1977). In a pressure equilibrium these two phases will coexist, but that equilibrium is dependent on metallicity (Wolfire et al., 1995, 2003; Bialy & Sternberg, 2019). Turbulence and colliding flows can drive the formation of CNM (Hennebelle & Audit, 2007; Kim & Ostriker, 2017). The CNM is a precursor to the formation of the dense cores of molecular hydrogen (H$_2$) from which stars form. The fraction of HI in the CNM state is an important metric for the efficiency of star formation and galaxy evolution, along with the molecular gas fraction (Krumholz et al., 2009; Kennicutt & Evans, 2012).
We can use the spin temperature ($T_S$) of HI to assess the fraction of cold gas. The spin temperature of HI is the excitation temperature of the HI 21-cm spin-flip transition. As the gas is thermalised by collisions in the dense CNM, the spin temperature will be equal to the kinetic temperature of the gas in this environment (Field, 1958). In the WNM, it will be a lower limit for the kinetic temperature (Liszt, 2001).
By combining absorption observations with adjacent emission observations we are able to measure the spin temperature of the gas (Jameson et al., 2019). HI absorption against continuum sources allows us to directly detect the presence of cold gas clouds. These clouds are otherwise difficult to detect as, due to their low spin temperature, they produce only weak 21-cm emission.
The Small Magellanic Cloud (SMC) is the perfect laboratory to study cold gas formation at high resolution in a low metallicity environment. The SMC is a nearby (61 kpc; Graczyk et al. 2014) low mass galaxy, part of the interacting Magellanic System, along with the Large Magellanic Cloud, the Leading Arm and the trailing Magellanic Stream. The SMC has a metallicity of 0.2 solar (Russell & Dopita, 1992) and thus will have a typical temperature range for the CNM of $50 \leq T_S \leq 100$ K (Bialy & Sternberg, 2019). Dickey et al. (2000) presented the first survey of HI absorption across the SMC, examining emission and absorption towards 32 continuum sources. They achieved optical depth noise levels of $0.05 \leq \sigma_\tau \leq 0.203$ at 0.825 km s$^{-1}$ spectral resolution and detected significant absorption in 13 of these spectra. The sources were selected for their strong continuum flux. Jameson et al. (2019) built on this by examining emission and absorption towards 55 continuum sources in 21 fields. These fields were selected for the greatest likelihood of finding absorption, with strong continuum flux ($S_{\rm cont} > 50$ mJy) and towards high column density regions ($N_{\rm HI} > 4 \times 10^{20}$ cm$^{-2}$). In their ≥ 10 hr dwell time per target they reached noise levels of $0.01 \leq \sigma_\tau \leq 1.28$ and detected absorption in 37 of the spectra. Their spectra were imaged at 0.2 km s$^{-1}$ but smoothed to 0.6 km s$^{-1}$ for analysis. Both surveys used the Australia Telescope Compact Array with baselines up to 6 km (FWHM beam ≈ 5 arcsec) for absorption and compared the absorption to emission from an earlier emission survey of the SMC (FWHM beam ≈ 98 arcsec). Table 1 summarises the observational parameters of these surveys and the survey presented in this work. Both Dickey et al. (2000) and Jameson et al. (2019) were targeted surveys, which affects the detection rate.
arXiv:2204.06285v1 [astro-ph.GA] 13 Apr 2022
Using the Australian Square Kilometre Array Pathfinder (ASKAP) telescope, the Galactic ASKAP (GASKAP; Dickey et al. 2013) survey will observe HI and OH in the Galactic Plane and the Magellanic System with unprecedented detail. The observations use high angular resolution (FWHM beam ≈ 16 arcsec) and high spectral resolution (0.24 km s$^{-1}$ per channel). Planned observations include long dwell times on the Magellanic Clouds and the low latitude Galactic Plane and shorter dwell times on the Magellanic Stream and Bridge. The repeated observations of key fields will provide HI absorption spectra with a flux density sensitivity of $\sigma_S = 0.5$ mJy (Dickey et al., 2013). The survey is planned to cover 13,020 deg$^2$ in total. With a rate of ≈ 10 sources per square degree, up to 130,200 HI absorption spectra are expected in the GASKAP-HI survey. With such high volumes of spectra, a repeatable process is essential.
In this work we present the GASKAP HI absorption pipeline and use it with the pilot phase I SMC HI observations (see Pingel et al., 2022) to explore the distribution of cold gas in the SMC in an unbiased way. In Section 2, we describe the GASKAP pilot observations of the SMC. In Section 3, we present our HI absorption pipeline along with the processing parameters used. We describe the observed absorption in the SMC and surrounds in Section 4. In Section 5, we discuss the results and their implications. Finally, in Section 6, we summarise our findings.

Observations
The SMC was the first of three adjacent Magellanic fields targeted during the GASKAP Pilot Phase I observations. Two 12-hour observations of the field (ASKAP scheduling blocks 10941 and 10944) were taken in December 2019 using the standard GASKAP-HI observing configuration. The closepack-36 phased-array feed (PAF) footprint was used with a pitch of 0.9 deg and 3 interleaves for even coverage across the 25 square degree field. The field was centred on J2000 RA = 00h 58m 43.280s, Dec = -72d 31m 49.03s. The zoom-16 mode, with $\Delta\nu = 1.15$ kHz, was used to provide a spectral resolution of $\Delta v \sim 0.24$ km s$^{-1}$. The observed band covered 18.5 MHz centred on 1419.85 MHz with 15558 channels; however, only the 2048 channels covering the Milky Way and SMC velocity ranges were processed. The flagging was even across the field, providing a change in RMS across the field of ≤ 7.5%.
As noted in Dickey et al. (2013), ASKAP's bimodal baseline distribution provides excellent capabilities for measuring both HI emission and absorption. In these observations, there was good coverage of baselines longer than 2000 m, which provide fine spatial resolution. Baselines up to 6000 m were present. The expected sensitivity from the combined observation is 3.3 mJy/beam after accounting for flagging and excluded baselines (see Section 3.2.1).
The data were flagged and calibrated using the standard ASKAPSoft pipeline (Hotan et al., 2021) with a configuration suitable for wide field emission. A continuum image and a continuum source catalogue were also produced by the ASKAPSoft pipeline for each observation. The observations and the initial processing with ASKAPSoft are described in further detail in Pingel et al. (2022).

HI Absorption Pipeline
To measure the absorption against continuum sources we need a spectral line cube for the region surrounding each source. We have two potential strategies to produce these source cubes: a) producing a large, continuum included cube covering the entire field and then extracting sub-cubes for each source, or b) producing sub-cubes for each source position directly from the measurement sets.
Note, we cannot use the GASKAP emission cube as the continuum subtraction will limit our ability to accurately measure deep absorption, and the cube contains large-scale emission which will add noise to the absorption spectrum. We have chosen to take approach (b) as we know in advance the positions we wish to image. This approach has several advantages for our use case. First, it saves us from imaging the unused regions (> 99%) of the cube not sampled by the sparse and small continuum sources. Moreover, we are able to set the phase-centre to the source position for each sub-cube, thus avoiding w-term effects which can impact wide fields (Cornwell & Perley, 1992). Finally, we are able to optimise the cube production to obtain the most accurate absorption spectra possible.
We have developed a pipeline (Dempsey, 2022) to produce HI absorption spectra for a large number of sources from the calibrated ASKAP measurement sets. As GASKAP is a wide field survey, and not targeted at specific sources, we have the opportunity to take an unbiased sample of the cold HI absorption in every field. We obtain absorption spectra towards all continuum sources above a given brightness (e.g. $S_{\rm cont} \geq 15$ mJy). For each source, we produce a small cube around the source, sufficient to include all continuum emission from the source. From this we extract an integrated absorption spectrum. We also measure the emission surrounding the source from the GASKAP emission cube for the field (Pingel et al., 2022). Finally, we process all spectra to find any significant absorption detections and produce catalogues of spectra and absorption features.
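The pipeline takes as input the calibrated ASKAP measurement sets, the ASKAPSoft continuum component catalogue and the GASKAP emission cube for the field. The pipeline is split into three phases: preparation, imaging, and spectra extraction, each described below.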

Preparation Phase
In the preparation phase we identify the target sources and the input data for the pipeline run. We take the continuum component catalogue produced as part of the ASKAPSoft processing of the field (Hotan et al., 2021). We designate all sources with $S_{\rm cont} \geq 15$ mJy as target sources. A typical GASKAP-HI observation of 20 hrs has $\sigma_S = 1.89$ mJy/beam (Pingel et al., 2022), and after the baseline cutoff described in Section 3.2, $\sigma_S = 2.80$ mJy/beam. For sources with $S_{\rm cont} = 15$ mJy we have a theoretical optical depth noise level of $\sigma_{\rm cont} = 0.19$. Examining the 47 sources near this threshold ($15 \leq S_{\rm cont} \leq 17$ mJy) in the SMC pilot data, we see a median optical depth noise level of $\sigma_{\rm cont,median} = 0.23$ with a range of $0.07 \leq \sigma_{\rm cont} \leq 1.34$, showing that we are at the limit of useful spectra in these data. For the 200 hr integrations on the Magellanic Clouds planned in the full survey, we could expect to push this limit down to as low as $S_{\rm cont} \sim 5$ mJy with similar optical depth noise levels.
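The threshold arithmetic above can be checked with a short sketch. It assumes the theoretical optical depth noise is simply the image noise divided by the continuum flux density, and that noise falls as the inverse square root of integration time; both function names are hypothetical and not part of the pipeline.

```python
import math


def optical_depth_noise(sigma_s_mjy, s_cont_mjy):
    """Theoretical 1-sigma noise in e^-tau for an unresolved source (assumed form)."""
    return sigma_s_mjy / s_cont_mjy


def scaled_noise(sigma_s_mjy, t_ref_hr, t_new_hr):
    """Assumed radiometer scaling: noise falls as 1/sqrt(integration time)."""
    return sigma_s_mjy * math.sqrt(t_ref_hr / t_new_hr)


sigma_20hr = 2.80  # mJy/beam after the baseline cutoff (value from the text)
threshold_noise = optical_depth_noise(sigma_20hr, 15.0)
print(round(threshold_noise, 2))  # 0.19

# Faintest source reaching the same optical depth noise in a 200 hr integration
sigma_200hr = scaled_noise(sigma_20hr, 20.0, 200.0)
faintest_mjy = sigma_200hr / threshold_noise
print(round(faintest_mjy, 1))  # 4.7
```

Under these assumptions the 200 hr limit comes out close to the $S_{\rm cont} \sim 5$ mJy quoted in the text.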
For each target source we use all beams within 0.55 deg of the source (or 0.8 deg for edge sources where no beams are closer). From these we produce a list of all target sources and the beams which will be used to produce the cutout for each source.

Imaging Phase
The imaging phase is the most computationally intensive. As discussed earlier, we produce a sub-cube around each source. We process this phase in parallel with a job per sub-cube. In the pipeline, we dynamically schedule the sub-cube extraction jobs to minimise competing use of measurement sets and thus improve overall I/O throughput while providing high parallelism within the resources of the compute infrastructure.
For each cube we use the CASA (McMullin et al., 2007) task tclean to image a 50 arcsec × 50 arcsec region around the source at 1 arcsec per pixel and with 0.24 km s$^{-1}$ LSRK velocity resolution (the native spectral resolution). This images the beam measurement sets selected for each source in the preparation phase. We use a 1.5 kλ baseline length cutoff, natural weighting, primary beam correction and light cleaning (1000 total iterations), as discussed in Section 3.2.1. This gives a typical synthesised beam size of 16 × 14 arcsec$^2$. We use a primary beam model of a 12 m dish with a 0.75 m blockage.
For absorption studies we wish to exclude emission from the spectra while maintaining high signal-to-noise ratios. In this pipeline we have achieved that primarily by setting a minimum baseline length when producing the cutout cube. Figure 1 shows the results of our trials of a range of baseline length cutoffs on a sample of seven spectra selected to represent the diverse range of spectra seen in the full dataset. Based on these trials, we found that the optimum balance of reduced emission and reduced noise came from a 1.5 kλ (315 m) baseline length cutoff. This retains 574 of the 630 ASKAP baselines but provides sensitivity only to features smaller than about 3 arcmin in size. This is well suited for analysis of compact extra-Galactic sources, but means these data products would not be suitable for analysis of larger Galactic structures such as supernova remnants and HII regions. Similarly, we tested a variety of weighting parameters, as shown in Figure 2. Based on these results we found 'natural' weighting produced cutout cubes with the lowest optical depth noise and emission noise. Additionally, we used primary beam correction and light cleaning (1000 total iterations) when imaging the data.
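As a rough illustration of the baseline cutoff, the sketch below converts a cutoff in kilo-wavelengths to metres at the HI rest frequency and estimates the corresponding angular scale as the reciprocal of the baseline length in wavelengths. The function names are hypothetical and the angular-scale estimate is an order-of-magnitude assumption, not the pipeline's own calculation.

```python
import math

C_M_PER_S = 299_792_458.0            # speed of light
HI_REST_FREQ_HZ = 1_420_405_751.77   # 21-cm line rest frequency


def klambda_to_metres(uv_klambda, freq_hz=HI_REST_FREQ_HZ):
    """Convert a baseline length in kilo-wavelengths to metres."""
    return uv_klambda * 1e3 * C_M_PER_S / freq_hz


def angular_scale_arcmin(uv_klambda):
    """Approximate angular scale (arcmin) sampled by a baseline: ~1/(length in wavelengths)."""
    return math.degrees(1.0 / (uv_klambda * 1e3)) * 60.0


cutoff_m = klambda_to_metres(1.5)
scale_arcmin = angular_scale_arcmin(1.5)
retained_fraction = 574 / 630  # baselines kept, from the text

print(f"{cutoff_m:.1f} m, {scale_arcmin:.2f} arcmin, {retained_fraction:.1%} of baselines")
```

With the exact rest wavelength the cutoff is slightly above the 315 m quoted in the text, which corresponds to rounding the wavelength to 0.21 m.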

Spectra Extraction Phase
From the cutout cubes for each source, we extract absorption spectra for every target source. We use the technique developed in Dempsey et al. (2020), but we describe the process here for completeness. The continuum source catalogue defines a source ellipse for each component. We combine all pixels within this source ellipse to produce the spectrum. We define a line-free region of the cube (with a velocity range $-100 \leq v_{\rm LSRK} \leq -60$ km s$^{-1}$ for the SMC) and measure the mean brightness within this range for each pixel. We then weight each pixel's contribution to the spectrum by the square of the mean brightness, as described in Dickey et al. (1992). Lastly, we divide the combined spectrum by its mean brightness within the line-free region to produce the absorption spectrum, $e^{-\tau}$.
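The extraction steps above can be sketched as follows. This is a minimal illustration rather than the pipeline code: it assumes the pixels inside the source ellipse have already been selected, and the function and argument names are hypothetical.

```python
def extract_absorption_spectrum(pixel_spectra, line_free_channels):
    """Combine per-pixel spectra into a normalised absorption spectrum (e^-tau).

    pixel_spectra: list of spectra (flux per channel), one per pixel in the source ellipse.
    line_free_channels: indices of the channels in the line-free velocity range.
    """
    n_chan = len(pixel_spectra[0])
    n_free = len(line_free_channels)

    # Mean continuum brightness of each pixel in the line-free region
    continuum = [sum(spec[i] for i in line_free_channels) / n_free
                 for spec in pixel_spectra]

    # Weight each pixel by the square of its mean continuum brightness
    weights = [c * c for c in continuum]
    weight_sum = sum(weights)
    combined = [sum(w * spec[ch] for w, spec in zip(weights, pixel_spectra)) / weight_sum
                for ch in range(n_chan)]

    # Normalise by the line-free level of the combined spectrum
    level = sum(combined[i] for i in line_free_channels) / n_free
    return [value / level for value in combined]


# Two pixels with different continuum levels and the same 50% absorption in channel 2
spectrum = extract_absorption_spectrum(
    [[10.0, 10.0, 5.0, 10.0], [5.0, 5.0, 2.5, 5.0]], line_free_channels=[0, 1])
```

Squared-brightness weighting favours the pixels with the strongest continuum, where the absorption is measured with the highest signal-to-noise ratio.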
We estimate the noise in the spectrum using a combination of the noise in the off-line region and emission in the primary beam of the dish, as described in Jameson et al. (2019). The standard deviation of the spectrum in the line-free region is taken as the base noise for the spectrum. In order to model the increase in system temperature due to emission received by the antenna at different frequencies, we then measure the emission level in the ASKAP primary beam of 62 arcmin using data from the Parkes Galactic All-Sky Survey (GASS; McClure-Griffiths et al. 2009; Kalberla & Haud 2015). We average the GASS emission across a 7 pixel (33 arcmin) radius annulus centred on the source position, with a 1 pixel exclusion at the centre. The 1σ noise envelope for the GASKAP absorption spectrum is then calculated as:
$$\sigma_{\tau}(v) = \sigma_{\rm cont}\,\frac{T_{\rm sys} + \eta_{\rm ant}\,T_{\rm em}(v)}{T_{\rm sys}} \tag{1}$$
where the system temperature $T_{\rm sys} = 50$ K and antenna efficiency $\eta_{\rm ant} = 0.67$ (Hotan et al., 2021), $\sigma_{\rm cont}$ is the standard deviation of the line-free region of the spectrum, and the mean emission $T_{\rm em}(v)$ at each velocity step is as measured in GASS.
We identify any absorption features in the spectra during this phase. We classify absorption features as those having one channel of 3σ absorption and an adjacent channel of ≥ 2.8σ. We expand the feature to include any adjacent channels of at least 2.8σ significance. These criteria were chosen empirically to detect even shallower absorption whilst still avoiding noise spikes being detected as features. Statistically, these criteria give an average of less than one false positive absorption feature per ≈ 370 spectra.
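A minimal sketch of the noise envelope and the feature criteria follows. The envelope is assumed to take the Jameson et al. (2019) form $\sigma_{\tau}(v) = \sigma_{\rm cont}(T_{\rm sys} + \eta_{\rm ant} T_{\rm em}(v))/T_{\rm sys}$, and the feature finder treats a feature as a contiguous run of channels at ≥ 2.8σ that is at least two channels long and contains a channel at ≥ 3σ. Function names are hypothetical.

```python
T_SYS_K = 50.0   # system temperature (from the text)
ETA_ANT = 0.67   # antenna efficiency (from the text)


def noise_envelope(sigma_cont, t_em):
    """1-sigma noise per channel, raised where emission adds to the system temperature."""
    return [sigma_cont * (T_SYS_K + ETA_ANT * t) / T_SYS_K for t in t_em]


def find_features(exp_neg_tau, sigma, peak_sig=3.0, adjacent_sig=2.8):
    """Return inclusive (start, end) channel ranges of detected absorption features."""
    significance = [(1.0 - e) / s for e, s in zip(exp_neg_tau, sigma)]
    features = []
    ch, n_chan = 0, len(significance)
    while ch < n_chan:
        if significance[ch] < adjacent_sig:
            ch += 1
            continue
        # Walk along the contiguous run of channels at >= adjacent_sig
        start = ch
        while ch < n_chan and significance[ch] >= adjacent_sig:
            ch += 1
        run = significance[start:ch]
        # Need a >= peak_sig channel with at least one qualifying neighbour
        if len(run) >= 2 and max(run) >= peak_sig:
            features.append((start, ch - 1))
    return features


spectrum = [1.0, 0.98, 0.60, 0.71, 1.02, 0.65, 1.0]
features = find_features(spectrum, noise_envelope(0.1, [0.0] * 7))
```

In this example the isolated 3.5σ dip in channel 5 is rejected because it has no neighbouring channel at ≥ 2.8σ, while channels 2 to 3 form a feature.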
With a large number of spectra expected in each field, it is important to consistently identify which spectra are of sufficient quality to be analysed. We use a subset of the Brown et al. (2014) rating system to classify the spectra quality level from A to D. Rating A spectra pass all tests, while each spectrum's rating is reduced by one step for each failed test until rating D spectra fail all tests. The tests described in Brown et al. aim to flag unphysical or noisy spectra.
The three tests we use are:
• Continuum noise - The 1σ noise level ($\sigma_{\rm cont}$) in optical depth ($e^{-\tau}$) must be less than 1/3.
• Max signal to max noise - The ratio of the deepest absorption to the highest emission noise (i.e. $e^{-\tau} > 1$) in the spectrum: $(1 - \min(e^{-\tau}))/(\max(e^{-\tau}) - 1) \geq 3$.
• Optical depth range - The range from the deepest absorption to the highest emission noise in the spectrum: $\max(e^{-\tau}) - \min(e^{-\tau}) < 1.5$.
We do not use the other two Brown et al. tests as they are not applicable to our spectral extraction method.
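The three tests and the resulting rating can be sketched as below. This is an illustrative reading of the rules in the text, not the pipeline implementation; the function name is hypothetical, and treating a spectrum with no channel above the continuum as passing the second test is an assumption.

```python
def rate_spectrum(exp_neg_tau, sigma_cont):
    """Return a quality rating 'A' to 'D', dropping one step per failed test."""
    deepest = min(exp_neg_tau)
    highest = max(exp_neg_tau)

    # Test 1: continuum noise below 1/3
    noise_ok = sigma_cont < 1.0 / 3.0

    # Test 2: deepest absorption at least 3x the highest emission noise
    emission_noise = highest - 1.0
    if emission_noise <= 0.0:
        signal_ok = True  # assumption: nothing above the continuum level passes
    else:
        signal_ok = (1.0 - deepest) / emission_noise >= 3.0

    # Test 3: total range of the spectrum below 1.5
    range_ok = highest - deepest < 1.5

    failed = [noise_ok, signal_ok, range_ok].count(False)
    return "ABCD"[failed]


rating_clean = rate_spectrum([1.05, 0.5, 1.0], sigma_cont=0.1)
rating_noisy = rate_spectrum([2.0, 0.3], sigma_cont=0.5)
```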
To make measurements of spin temperature it is also necessary to have an emission spectrum for each source. The emission spectra are calculated from the full field continuum-subtracted GASKAP emission cube produced for the same observations (Pingel et al., 2022). We calculate the emission spectrum of a source as the mean value of an annulus of radius 56 arcsec (8 pixels, ≈ 2 beam widths) around each source with a central exclusion of 28 arcsec (4 pixels, ≈ 1 beam width). This allows us to measure the mean emission in the close vicinity of the source without measuring the source itself, in all but the most extended cases. The high resolution gives us as close an approximation for the emission in the line of sight of the source as possible. We use the standard deviation of the annulus as the emission noise level.
The pipeline produces a series of outputs describing the spectra towards each source. For each source we produce a spectrum votable file with the absorption and, where available, the emission spectra, along with the noise in both the absorption and emission spectra. We output catalogues of the spectra (see Table 2), as well as the absorption features detected (see Table 3).
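The annulus measurement can be illustrated with a short sketch. It assumes the cube is indexed as channel, y, x, that the source sits at an integer pixel position, and that pixels with $4 < r \leq 8$ belong to the annulus; the function name and the boundary convention are hypothetical.

```python
import math
import statistics


def annulus_emission(cube, centre_x, centre_y, r_inner=4, r_outer=8):
    """Mean and standard deviation per channel over an annulus around a source.

    cube[channel][y][x] holds brightness temperature; radii are in pixels.
    """
    n_y, n_x = len(cube[0]), len(cube[0][0])
    # Pixels in the annulus: outside the central exclusion, inside the outer radius
    pixels = [(x, y) for y in range(n_y) for x in range(n_x)
              if r_inner < math.hypot(x - centre_x, y - centre_y) <= r_outer]
    means, stds = [], []
    for plane in cube:
        values = [plane[y][x] for x, y in pixels]
        means.append(statistics.fmean(values))
        stds.append(statistics.pstdev(values))
    return means, stds


# One-channel test image: uniform 5 K emission with a bright source at the centre
plane = [[5.0] * 17 for _ in range(17)]
plane[8][8] = 1000.0
means, stds = annulus_emission([plane], centre_x=8, centre_y=8)
```

The central exclusion keeps the bright source pixel out of the emission estimate, so the example returns the surrounding 5 K level.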

SMC HI Absorption
For each of the 373 continuum sources which satisfy our criteria of peak flux density ≥ 15 mJy (see Sec. 3.1), we produced an absorption spectrum using the process described in Section 3.3. However, we averaged the data to 1 km s$^{-1}$ spectral resolution in the imaging phase to reduce noise. We selected the sources from the Selavy (Whiting & Humphreys, 2012) continuum source catalogue from the SB 10941 observation. We then excluded any sources with $\sigma_{\rm cont} > 0.3$ or at the edges of the field where beam power < 80%, leaving 229 sources. The source density is 10 sources per square degree, a rate which we expect to be replicated in most GASKAP fields of similar noise. The distribution of all sources by noise is shown in Figure 3. Notably, all of the excluded noisy sources are towards the edges of the field.
An example spectrum for continuum source J005556-722605 is shown in Figure 4. In the top panel, we can see that the spectral bandpass has a slope of ≈ 0 and is consistent with random noise around a constant continuum level. The emission has been successfully excluded from the spectrum, with the spectrum above the continuum level consistent with the continuum noise envelope. In the bottom panel, we show the brightness temperature spectrum from the GASKAP emission cube. Note that the velocity range of the emission data is restricted to the range of the SMC emission cube. Overall, the approach of excluding short baselines from the imaging has been successful in excluding emission while maintaining a high signal-to-noise ratio. The full set of spectra is available in the dataset.
A subset of sources is described in Table 2, with the full set of sources provided in the dataset. For each source we provide
a) ID - the id of the source within the field
b) Source - name based on the detected location of the source
c) RA - right ascension of the source
d) Dec - declination of the source
e) Peak Flux - peak flux density of the source from the Selavy continuum catalogue
f) Rating - quality rating of the spectrum (see Sec. 3.3)
g) $\sigma_{\rm cont}$ - 1σ noise level of the spectrum in absorption units. Does not include emission noise
h) Peak τ - the maximum τ value within the SMC velocity range. Where the spectrum is saturated ($e^{-\tau} \leq 0$) a minimum limit is specified based on the 1σ noise level of the channel in the spectrum
i) $N_{\rm HI,uncorr}$ - column density towards the continuum source, excluding Milky Way velocities, from the GASS survey under the assumption that the HI is optically thin
j) $T_S$ - the density-weighted mean spin temperature of the sight-line (see Section 5.3)
k) $\mathcal{R}_{\rm HI}$ - column density correction ratio for the sight-line (see Section 5.2).
Using the criteria described in Sec. 3.3, we detect absorption features at SMC velocities ($v_{\rm LSRK} \geq 50$ km s$^{-1}$) in 65 (28%) of the spectra. Thirty-six of these spectra have multiple features, giving a total of 122 features. Note that a full decomposition of these features into components is beyond the scope of this work. A subset of absorption features is shown in Table 3, with the full set of features provided in the dataset. For each feature we provide
a) Source - name of the source based on the location of the source
b) Feature - name of the feature based on the source name and the minimum velocity of the feature
c) Min Velocity - the minimum velocity bound of the feature
d) Max Velocity - the maximum velocity bound of the feature
e) Width - the number of 1 km s$^{-1}$ channels the feature spans
f) Peak Absorption - the maximum measured absorption ($1 - e^{-\tau}$) of the feature with the uncertainty in the absorption spectrum at that velocity; values greater than 1 are saturated
g) Peak τ - the maximum τ value of the feature. Where the spectrum is saturated ($e^{-\tau} \leq 0$) a minimum limit is specified based on the noise level in the peak channel
h) Significance - the highest single-channel significance of the feature as measured in absorption ($e^{-\tau}$)
i) Equivalent Width - the integral of τ for the feature. This will be a lower limit for saturated spectra as we use a minimum limit based on noise for each saturated velocity channel in the same way as peak τ above.
The distribution of optical depth of these features is shown in Figure 5. Sixteen of the features are saturated ($e^{-\tau} < 0$) and 6 features have sufficient noise that they could be saturated (above the noise limit in Fig. 5). The τ values of the 100 remaining features range from 0.04 to as high as 3.3, with a median τ = 0.5. The trend towards higher τ values for higher noise spectra is a side-effect of our minimum significance requirement.
Following Jameson et al. (2019), we define an arbitrary column density limit for the body of the SMC ($N_{\rm HI,uncorr} > 2 \times 10^{21}$ cm$^{-2}$) and show this as the contour in Figure 6. As shown in Fig. 8, the detection rate drops smoothly across this limit, so the results are not strongly dependent on the exact value selected. Within the body of the SMC, we see a much higher detection rate, as shown in Table 4, with absorption features in 62% of spectra. Thus 75% of all spectra with absorption are found in the body of the SMC. Notably, of the 23 bright ($S_{\rm cont} \geq 50$ mJy) continuum sources within the body of the SMC only one, J003754-725157, does not have any detectable absorption features in its spectrum. This spectrum has $\tau_{\rm peak} = 0.11 \pm 0.06$. So in these higher density regions we see absorption in almost all low noise spectra. The detection rate also drops with source strength, with only 41% of the 51 SMC spectra against faint ($S_{\rm cont} < 30$ mJy) continuum sources having detectable absorption.
Figure caption (beginning missing): ... (Pingel et al., 2022). Triangles are sources excluded due to either high noise or being on the edges of the cube, squares are sources against which absorption was detected, and circles are other sources. Darker colours indicate lower optical depth noise.
The 16 features outside the body of the SMC have generally low τ values within a smaller range (min = 0.09, median = 0.5, max = 1.3) than within the body, and none are saturated. In contrast to the body of the SMC, only one of the 16 spectra outside the SMC has multiple features. Of the 71 bright continuum sources outside the SMC, only 7 show detectable absorption features. As shown in Figure 3, some detections such as J013218-715348 and J005715-704046 are well away from the SMC. Also, as seen in Figure 6, particularly at the extremes of velocity, we detect some shallow absorption features in regions with little emission. These are discussed in further detail in Section 4.1.

Absorption in Low Density Regions
Six of the detected absorption features are in velocity regions with little emission ($T_B < 5$ K), indicating low column density (see Fig. 6). This is surprising as cold HI is normally associated with a higher density of warm HI (e.g. Kanekar et al., 2011). Three of these features are reliable detections. The first feature, J005732-741243_137, is a very clear detection (significance ≈ 61σ) with deep absorption through low column density (see Fig. 7), and is also reported in Jameson et al. (2019). The feature is also apparent as absorption in the GASKAP emission cube. Its location, seen in Fig. 6h, is in the lower centre of the image below the SMC body. The second feature, J005715-704046_255, is outside the velocity range for which we have emission data and thus is not shown in Fig. 6. It is located well to the North of the SMC body. The feature is only two channels wide, but has adjacent channels which also show less significant absorption, making this likely to be a real absorption feature. A further absorption feature, J011134-711414_114, was detected within a velocity region with $T_B = 9 \pm 4$ K. This source is shown in Fig. 6f and is located in the upper left of the image slightly to the North-East of the SMC body. With two adjacent channels over 3σ significance this is a reliable detection. Of these three features, the latter two are shallow, similar to the low-N(HI) clouds found by Stanimirović et al. (2007) in the Milky Way. They found this type of cloud to be the lower column density population of the CNM but likely to be short-lived due to their small size and lack of shielding. The other three features are less firm detections. In two of these cases (J013134-700042_240, which is not plotted, and J005116-734000_65 in Fig. 6a), the features are marginal detections and consistent with noise. Notably, the spectrum J013134-700042 was flagged as having a potentially underestimated noise level. Finally, the feature J010524-722524_87 (see Fig. 6c) has a large emission noise spike in the channel adjacent to the feature ($v = 89$ km s$^{-1}$). This may indicate that the absorption is also a noise artefact.

Column density thresholds and detection rate
Using the GASKAP emission data we can calculate the HI column density for each sightline, under the assumption that the HI gas is optically thin ($\tau \ll 1$) (Dickey & Lockman, 1990, eq. 3):
$$N_{\rm HI,uncorr} = 1.823 \times 10^{18} \int T_B(v)\,dv \ \ {\rm cm}^{-2} \tag{2}$$
where $T_B(v)$ is the brightness temperature in K and $v$ is the velocity in km s$^{-1}$. We use a Monte-Carlo method to establish the uncertainties in $N_{\rm HI,uncorr}$. For each velocity channel of each spectrum we draw 1000 samples of both the optical depth and the brightness temperature. The samples are randomly drawn from a normal distribution utilising the actual spectrum value as the mean and the 1σ noise envelope as the standard deviation. We then perform all calculations across all samples, and use the median of the results across the samples as the final value, and the 15th and 85th percentiles as the 1σ uncertainty range.
Figure 6. Locations of the 122 SMC absorption features described in Sec. 4, plotted against 10 km s$^{-1}$ slices of emission. Each absorption feature is shown as a circle centred on its position with the colour reflecting the depth of absorption. The scale for the absorption is shown in the top right of each panel, with darker colours indicating deeper absorption and lighter colours shallower absorption. Features are plotted in the velocity slice containing the centre of the absorption feature's velocity range. In each slice, the column density for that velocity range, as measured in GASKAP emission (Pingel et al., 2022), is shown as a linear grayscale, with the scale shown on the right of each row. The green contour shows the $2 \times 10^{21}$ cm$^{-2}$ column density limit for all SMC velocity ranges, as detected in GASKAP emission (Pingel et al., 2022), representing the outline of the SMC.
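The Monte-Carlo procedure can be sketched for the optically thin column density, assuming the standard relation $N_{\rm HI} = 1.823 \times 10^{18} \int T_B\,dv$ cm$^{-2}$ with $T_B$ in K and $v$ in km s$^{-1}$. Only the brightness temperature is resampled here because the uncorrected column density does not depend on τ; the function names and the fixed random seed are illustrative choices.

```python
import random
import statistics

C_HI = 1.823e18  # cm^-2 per (K km/s), optically thin conversion factor


def column_density_thin(t_b, dv_kms):
    """Optically thin HI column density from a brightness temperature spectrum."""
    return C_HI * sum(t_b) * dv_kms


def monte_carlo_column_density(t_b, sigma_t_b, dv_kms, n_samples=1000, seed=0):
    """Median and 15th/85th percentile bounds of the column density."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        # Draw each channel from a normal distribution about the measured value
        sample = [rng.gauss(t, s) for t, s in zip(t_b, sigma_t_b)]
        results.append(column_density_thin(sample, dv_kms))
    cuts = statistics.quantiles(results, n=100)  # cuts[k - 1] is the k-th percentile
    return statistics.median(results), cuts[14], cuts[84]


t_b = [10.0] * 20      # K, illustrative spectrum
sigma = [1.0] * 20     # K, illustrative noise
median, lower, upper = monte_carlo_column_density(t_b, sigma, dv_kms=1.0)
```

The same sampling loop extends to quantities that depend on both τ and $T_B$ by drawing an optical depth sample alongside each brightness temperature sample.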
With an unbiased sample of absorption in the field we have the opportunity to examine where the cold gas is present. Figure 8 shows a comparison of the HI column density (uncorrected for optical depth, see Sec. 5.2) for both spectra with and without absorption detections. We can see that we have some detections in regions with column densities as low as $1.25 \times 10^{20}$ cm$^{-2}$. However, it is not until $2.5 \times 10^{21}$ cm$^{-2}$ that we have a high rate of detection. Most absorption detections are in sight-lines with a column density of $3 \times 10^{21}$ cm$^{-2}$ or greater. From $6 \times 10^{21}$ cm$^{-2}$ almost all sight-lines, even for faint continuum sources, show absorption. Kanekar et al. (2011) presented a relationship between the observables of uncorrected column density and the integral of τ (the equivalent width) in Milky Way gas clouds. The following equation (their Equation 3) models the presence of cold gas within warm HI envelopes, which shield the cold HI from surrounding hot gas and radiation.
$$N_{\rm HI,uncorr} = N_0\,e^{-\tau_c} + N_\infty\left(1 - e^{-\tau_c}\right) \tag{3}$$
Here, $N_0$ is the threshold column density for the formation of cold gas, $N_\infty$ is the column density at which the cold gas saturates ($\tau \to \infty$) and $\tau_c$ is the effective opacity, or the mean observed optical depth over the velocity range ($\Delta V$). They further define a lower and upper limit using Eq. 3 on the expected range in which absorption forms in the Milky Way. The lower limit has $N_0 = 10^{20}$ cm$^{-2}$, $N_\infty = 5.0 \times 10^{21}$ cm$^{-2}$ and $\Delta V = 20$ km s$^{-1}$. The upper limit has $N_0 = 2 \times 10^{20}$ cm$^{-2}$, $N_\infty = 10^{22}$ cm$^{-2}$ and $\Delta V = 10$ km s$^{-1}$. Comparing our detections against the Kanekar et al. (2011) limits (see Figure 9), we see outliers on both sides of the envelope, despite most sources lying within the limits. The two sources to the left of the lower limit hint at cold gas formation below the column density limit seen in the Milky Way. In the current data we do not have the sensitivity to probe this lower column density regime in detail. To the right of the upper limit we find eight sources. Four of these sources have low equivalent width noise levels and thus support a higher saturation limit in the SMC for the condensation of dense HI into molecular H$_2$. This higher limit reflects the findings of both Bialy & Sternberg (2016) and Krumholz et al. (2009) that higher column density is required at low metallicities for the formation of molecular gas, and of Bolatto et al. (2011) that the SMC has a very low fraction of molecular gas as compared to its atomic gas content. With the addition of future GASKAP observations of the SMC we should be able to test these bounds more rigorously.
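To make the two limits concrete, the sketch below evaluates the model curve for a given equivalent width. It assumes the model takes the form $N_{\rm HI} = N_0 e^{-\tau_c} + N_\infty (1 - e^{-\tau_c})$ with $\tau_c$ equal to the equivalent width divided by $\Delta V$, which matches the parameter definitions in the text; the function names are hypothetical.

```python
import math

# Parameter sets quoted in the text for the Milky Way envelope
LOWER_LIMIT = {"n_0": 1.0e20, "n_inf": 5.0e21, "delta_v_kms": 20.0}
UPPER_LIMIT = {"n_0": 2.0e20, "n_inf": 1.0e22, "delta_v_kms": 10.0}


def model_column_density(equivalent_width_kms, n_0, n_inf, delta_v_kms):
    """Column density (cm^-2) predicted for a given equivalent width (assumed model form)."""
    tau_c = equivalent_width_kms / delta_v_kms  # effective opacity
    return n_0 * math.exp(-tau_c) + n_inf * (1.0 - math.exp(-tau_c))


def within_envelope(n_hi, equivalent_width_kms):
    """True if a sight-line lies between the lower and upper limit curves."""
    low = model_column_density(equivalent_width_kms, **LOWER_LIMIT)
    high = model_column_density(equivalent_width_kms, **UPPER_LIMIT)
    return low <= n_hi <= high


inside = within_envelope(3.0e21, equivalent_width_kms=10.0)
outside = within_envelope(8.0e21, equivalent_width_kms=10.0)
```

A sight-line such as the second example, with more column density than the upper curve allows for its equivalent width, would fall to the right of the upper limit in a plot with column density on the horizontal axis.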

Corrected column density for the SMC field
The column density measurements from emission data are made under the assumption that the gas is optically thin ($\tau \ll 1$). However, if the HI is not optically thin we need to account for the absorption to get an accurate estimation of the column density. For the sight-lines we have observed, we can correct these measurements for the actual optical depth of the gas. Using the assumption that the gas is isothermal, that is each velocity channel of gas only has a single temperature, we can calculate the corrected column density using the formula (Dickey & Benson 1982, eq. 5, and Chengalur et al. 2013, eq. 8):
$$N_{\rm HI,corr,iso} = 1.823 \times 10^{18} \int \frac{\tau(v)\,T_B(v)}{1 - e^{-\tau(v)}}\,dv \ \ {\rm cm}^{-2} \tag{4}$$
where $T_B(v)$ and $\tau(v)$ are the brightness temperature and optical depth respectively for a velocity channel. The column density correction ratio is then
$$\mathcal{R}_{\rm HI} = \frac{N_{\rm HI,corr,iso}}{N_{\rm HI,uncorr}} \tag{5}$$
We use a Monte-Carlo method to establish the uncertainties in $N_{\rm HI,corr,iso}$ and $\mathcal{R}_{\rm HI}$, as described in Sec. 5.1. There is a large spread in the correction factors as compared to uncorrected column density, as shown in Fig. 10. Strikingly, the range of correction factors is very different between the wing and the bar of the SMC. The bar shows generally low correction factors with an increase in correction factor with increasing column density. The exception is the sight-line towards J010029-713826, which shows both a deep and a wide absorption component in a region with relatively low column density for the bar region. In contrast, the wing shows a great variety of correction factors unrelated to column density. This likely reflects that the bar has a longer line of sight than the wing, and so multiple features would be blurred. In the shorter line of sight through the wing, we are more likely to pick out individual features such as diffuse tidal structures and regions of intense star formation. The cluster of four sight-lines with the highest column density and highest correction factor in the wing are all close to NGC 460 and NGC 465.
The other two sight-lines with R HI > 1.3 are towards the SE end of the wing, another region of active star formation. The detections outside the wing and the bar, all at lower column densities, show small correction factors as expected. The one exception is the sight-line towards J012924-733153, in which the correction factor is boosted by noise (σ cont = 0.211) as well as the single narrow detection and a potential shallow, wide component. For comparison we also show the correction factors of low-noise (σ cont < 0.1) sight-lines where no absorption was detected. These are mostly at lower column densities and have minimal correction, as expected. Overall, the correction factors range from R HI ≈ 1 at the low column densities, to R HI = 1.49 in the higher density regions of the wing, close to known star formation regions. Dickey et al. (2000) found a correction factor relationship with column density for the SMC of R HI 1+0.667(log 10 N -21.4). With our broader view of the SMC optical depth, we find a slightly shallower linear rise in correction factor with column density for just the SMC bar of R HI 1+0.51(log 10 N -21.43). The wide range of correction factors in the SMC wing means that a single line cannot be fit to that region. The wide range for the SMC wing is similar to the results of Nguyen et al. (2019) who found large regional variation in R HI around five giant molecular clouds.
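As an illustration of the isothermal correction described above, the following minimal sketch computes the uncorrected and corrected column densities and their ratio for a single sight-line. It is not the GASKAP-HI pipeline code: the function names, the simple rectangular sum over channels and the assumption that the bar fit takes the uncorrected column density as input are ours.

```python
import numpy as np

C0 = 1.823e18  # cm^-2 (K km/s)^-1, standard 21 cm column density constant


def column_densities(t_b, tau, dv):
    """Return (N_uncorr, N_corr_iso, R_HI) for one sight-line.

    t_b : brightness temperature per channel [K]
    tau : optical depth per channel
    dv  : channel width [km/s] (assumed constant; hypothetical helper)
    """
    t_b = np.asarray(t_b, dtype=float)
    tau = np.asarray(tau, dtype=float)
    # tau / (1 - exp(-tau)) tends to 1 as tau -> 0; guard the 0/0 case.
    factor = np.ones_like(tau)
    nonzero = np.abs(tau) > 1e-8
    factor[nonzero] = tau[nonzero] / -np.expm1(-tau[nonzero])
    n_uncorr = C0 * np.sum(t_b) * dv           # optically thin assumption
    n_corr = C0 * np.sum(t_b * factor) * dv    # isothermal correction
    return n_uncorr, n_corr, n_corr / n_uncorr


def r_hi_bar_fit(n_uncorr):
    """Linear bar relation quoted in the text (input assumed uncorrected N)."""
    return 1.0 + 0.51 * (np.log10(n_uncorr) - 21.43)
```

For a channel with $\tau = 1$ the per-channel factor is $1/(1 - e^{-1}) \approx 1.58$, while for $\tau \to 0$ it tends to 1, recovering the optically thin result.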
The relationship between column density and correction factor is shallower than that found in Dickey et al. (2000) for two reasons. Firstly, as noted in Sec. 5.4, we detect a higher proportion of shallower absorption features due to the unbiased selection of continuum sources. Secondly, the smaller emission beam size reduces beam smearing of small-scale dense structure, so in these cases we measure a higher uncorrected column density. One of the great advantages GASKAP has over past absorption studies is a very closely matched beam size between the emission (30 × 30 arcsec$^2$) and absorption (15 × 15 arcsec$^2$). This means both that our emission sample is taken from as close as possible to the target continuum source and that the scales of the structures detected in emission and absorption are similar.
Comparing the cumulative distribution of correction factors between the field outside the SMC and the body of the SMC, we find that the body has far higher correction factors (see Fig. 13, top). The higher correction factors reflect the column density threshold apparent at $N_{\rm HI,uncorr} < 2 \times 10^{21}$ cm$^{-2}$ in Fig. 10, below which there are much lower correction factors. The uncertainties in this top panel are generated from the $\mathcal{R}_{\rm HI}$ uncertainties for each source.

Temperature distribution
To measure the density-weighted mean spin temperature we can compare the integrals of brightness temperature and optical depth using the following formula (Dickey et al., 2000, Eq. 4):

$$\langle T_S \rangle = \frac{\int T_B(v)\, dv}{\int \left(1 - e^{-\tau(v)}\right) dv} \simeq \frac{\int n(s)\, ds}{\int \frac{n(s)}{T_S(s)}\, ds},$$

where $n(s)$ is the line-of-sight volume density. We use a Monte-Carlo method to establish the uncertainties in $T_S$, as described in Sec. 5.1. Following Dickey et al. (2000) we measure the integrals across the velocity range where $T_B \geq 3$ K so as to reduce the noise in the integrals. Note that in some cases the integral range will contain regions where the emission level dips below this threshold (e.g. where the sight-line has a bimodal emission distribution). In those cases we still include the emission from this inner velocity region in the integral.
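The integration described above can be sketched as follows. This is a minimal illustration under our own assumptions (a constant channel width, a rectangular sum, and a hypothetical function name), not the pipeline implementation. The range runs from the first to the last channel with $T_B \geq 3$ K, so interior channels that dip below the threshold are still included.

```python
import numpy as np


def mean_spin_temperature(t_b, tau, dv, threshold=3.0):
    """Ratio of the emission integral to the absorption integral [K].

    t_b : brightness temperature per channel [K]
    tau : optical depth per channel
    dv  : channel width [km/s] (assumed constant)
    """
    t_b = np.asarray(t_b, dtype=float)
    tau = np.asarray(tau, dtype=float)
    above = np.flatnonzero(t_b >= threshold)
    if above.size == 0:
        raise ValueError("no channels reach the emission threshold")
    # Outer range only: interior dips below the threshold stay in the sum.
    span = slice(above[0], above[-1] + 1)
    emission = np.sum(t_b[span]) * dv
    absorption = np.sum(-np.expm1(-tau[span])) * dv  # sum of (1 - e^-tau)
    # Noise can drive `absorption` to zero or below, which gives an
    # unphysical (infinite or negative) result.
    return emission / absorption
```

Because the channel width appears in both integrals it cancels in the ratio; it is kept here only to mirror the formula.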
Within the SMC ($N_{\rm HI,GASKAP,uncorr} \geq 2 \times 10^{21}$ cm$^{-2}$) we find an inverse-noise weighted mean spin temperature of $T_S = 245 \pm 2$ K. This is lower than the 350 K found by Dickey et al. (2000). Jameson et al. (2019) found an arithmetic mean spin temperature of $T_S = 117.2 \pm 101.7$ K, reflecting their targeting of cold gas. Outside the SMC we find $T_S = 156^{+9}_{-7}$ K. The spread of mean spin temperatures against corrected column density for all sight-lines with detected absorption is shown in Fig. 11.
The distribution in both temperature and corrected column density also varies by region, as shown in Fig. 11. As was done in McClure-Griffiths et al. (2018), we split the SMC into two regions, the bar running from N to SW and the wing from N to SE (see Fig. 11 bottom). The bar has a large spread in both column density and mean spin temperature. It contains the sight-lines with the highest column densities and the highest temperatures in the SMC. However, there is no significant trend in temperature as column density increases in the bar. The wing has a lower column density than much of the bar, and both column density and mean spin temperature are more tightly clustered than in the bar. Despite the greatly varying environment, the distribution of spin temperatures in the wing is remarkably flat ($163 < T_{S,{\rm wing}} < 546$ K) with no strong trend with column density. In the lower column density regions outside the SMC, there is a much wider spread of spin temperatures and again there is no significant trend of temperature with column density. In two cases (J013134-700042 and J013701-730415) our spin temperatures are unconstrained. If there is sufficient emission pollution of the absorption spectrum to drive the integral of $1 - e^{-\tau}$ negative, then the calculated mean spin temperature will be negative. As noted in Sec. 4.1, the absorption in J013134-700042 is likely a noise artefact and this noise also results in narrow spikes of emission in the absorption spectrum. For J013701-730415, while the absorption is a clear detection, there are also more random emission spikes in the spectrum than expected.

Fraction of Cold Gas
By assuming a temperature for the cool gas ($T_c$) we can estimate the fraction of cool gas in each sight-line as follows (Dickey et al., 2000, Eq. 7):

$$f_c \equiv \frac{N_c}{N_c + N_w} \simeq \frac{T_c}{\langle T_S \rangle},$$

where $N_c$ is the column density of the CNM and $N_w$ is the WNM column density.
Following Jameson et al. (2019), we assume $T_c = 30$ K. This gives a median cold gas fraction across the field of $f_c \approx 11\%$, with the distribution shown in Fig. 12. The distributions within and outside the body of the SMC are not significantly different. However, this fraction is significantly lower than the $f_c \approx 20\%$ found by Jameson et al. (2019). Dickey et al. (2000) reported a median cold gas fraction of 13% for $T_c = 55$ K, which is equivalent to $f_c \approx 7\%$ for $T_c = 30$ K. Our result sits between these two earlier SMC results, as do the mean spin temperature results in Sec. 5.3, as would be expected. The higher mean spin temperatures and lower cold gas fractions than those found in the targeted Jameson et al. (2019) survey are driven by our detection of shallower absorption in the unbiased sight-lines of the present study. We can expect that with the addition of future GASKAP observations of the SMC we will see further shallow absorption detections and thus an even lower median cold gas fraction. The key results of this survey and the earlier SMC surveys are shown in Table 5.
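The conversion of the Dickey et al. (2000) fraction between assumed cool gas temperatures follows from $f_c$ being proportional to $T_c$ at fixed mean spin temperature. A minimal sketch of that arithmetic, with hypothetical function names of our own choosing:

```python
def cold_gas_fraction(mean_t_s, t_cool=30.0):
    """Cold gas fraction f_c ~ T_c / <T_S> for one sight-line."""
    return t_cool / mean_t_s


def rescale_fraction(f_c, t_cool_old, t_cool_new):
    """Convert f_c derived with one assumed T_c to another assumed T_c."""
    return f_c * t_cool_new / t_cool_old


# Dickey et al. (2000): 13% at T_c = 55 K, rescaled to T_c = 30 K.
f_rescaled = rescale_fraction(0.13, 55.0, 30.0)  # about 0.071, i.e. ~7%
```

Note that the median of per-sight-line fractions need not equal the fraction implied by a weighted mean spin temperature, so this sketch is not expected to reproduce the survey median exactly.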

Comparison with the Milky Way
We compare the results for the field and the body to recent absorption studies of the Milky Way (see Fig. 13, bottom). The field shows a similar, but slightly denser, CDF to the MACH survey, which probed a window of very low density Galactic gas at high latitudes. In contrast, the SMC CDF is closest to the Perseus cloud (Lee et al., 2015) at lower $\mathcal{R}_{\rm HI}$ but lacks the higher $\mathcal{R}_{\rm HI}$ values, likely because our sensitivity is not currently sufficient to probe the highest correction factors that they find. Overall this suggests correction factors similar to the envelopes of Giant Molecular Clouds.
We also compare the distribution of mean spin temperatures with the same Galactic H I absorption studies. Fig. 14 shows the mean spin temperatures of all sight-lines with detected, but not saturated, H I absorption for the SMC body and the field surrounding the SMC, along with comparisons against the recent Galactic H I absorption samples. Over 70% of temperatures in the body of the SMC are in the $200 < T_S < 500$ K range, with a slightly lower proportion in the field. Note that our sensitivity in this study limits our ability to detect high spin temperatures ($T_S \geq 1150$ K). The two distributions are still remarkably similar, likely indicating that the cold gas we detect in the field is associated with outflows from the SMC. Both the SMC body and the field show similar temperatures to the mid-high latitude ($|b| > 10°$) HT03 (Heiles & Troland, 2003) and 21-SPONGE (Murray et al., 2018) samples, but lack the higher spin temperatures. These two studies surveyed a variety of environments across the Milky Way, rather than focusing on a single region.

Mean Spectrum of the SMC
The large number of sight-lines sampled through the SMC allows us to build a mean spectrum across the galaxy. This provides a view of where the cold gas is, and is not, across the SMC. In Figure 15 we show the unweighted mean of spectra for the 43 sight-lines with $\sigma_{\rm cont} \leq 0.1$ within the body of the SMC. The spectrum has not been corrected for the rotation of the SMC. Notably, there is still fine velocity structure in the mean SMC spectrum. At the primary emission peak ($155 \leq v_{\rm LSR} \leq 165$ km s$^{-1}$) 14% of the gas is cold, while at the secondary emission peak ($120 \leq v_{\rm LSR} \leq 130$ km s$^{-1}$) only 7% of the gas is cold. The absorption falls away much faster from the peak at 160 km s$^{-1}$ towards higher velocities, predominantly associated with the wing, than towards lower velocities, which are associated with the bar. Overall, this mean spectrum has a total cold gas fraction of $f_c \approx 9\%$ and a column density correction factor of $\mathcal{R}_{\rm HI} \approx 1.06$.

Comparing velocity distributions of absorption and emission
21 cm emission in the SMC is known to exhibit a complex, multi-peaked velocity structure. Here we investigate whether H I absorption is found preferentially in one velocity component or another, or if it follows the same velocity distribution as the H I emission.
To identify the velocities of detected absorption components, we take the first numerical derivative of all spectra (here in the form of $1 - e^{-\tau(v)}$). As the noise in the spectrum is amplified by the derivative operation, we first smooth each spectrum with a Gaussian kernel with a standard deviation of two channels. Detected components are identified by their central channels ($v_c$) if they pass the following criteria: (1) the first derivative crosses the zero line; (2) the feature is a maximum (the derivative decreases between adjacent channels); (3) the feature is detected above 3σ at $v_c$ and above 2.8σ for adjacent channels; and (4) the feature falls within the SMC velocity range ($70 < v_c < 250$ km s$^{-1}$). We repeat the same process for the emission spectra. Figure 16 displays maps of the positions of detected absorption (a) and emission (b) components, coloured by $v_c$. Multiple components per line of sight are displayed as concentric circles.
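The four criteria above can be sketched in a few lines of NumPy. This is an illustrative reading of the procedure, not the authors' code: the function names are hypothetical, the derivative is taken as a simple channel-to-channel difference, and the significance test is applied to the smoothed spectrum against a user-supplied noise level, which is our assumption.

```python
import numpy as np


def gaussian_smooth(y, sigma_chan=2.0):
    """Smooth with a Gaussian kernel of standard deviation `sigma_chan` channels."""
    half = int(np.ceil(4 * sigma_chan))
    x = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (x / sigma_chan) ** 2)
    kernel /= kernel.sum()
    # Pad with edge values so the output has the same length as the input.
    padded = np.pad(np.asarray(y, dtype=float), half, mode="edge")
    return np.convolve(padded, kernel, mode="valid")


def find_components(velocity, spectrum, sigma, v_range=(70.0, 250.0)):
    """Return central velocities of features passing criteria (1) to (4).

    spectrum : 1 - exp(-tau) for absorption, or T_B for emission
    sigma    : noise level (scalar or per channel), assumed to describe
               the smoothed spectrum
    """
    velocity = np.asarray(velocity, dtype=float)
    smooth = gaussian_smooth(spectrum)
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), smooth.shape)
    slope = np.diff(smooth)  # slope[i] = smooth[i + 1] - smooth[i]
    centres = []
    for i in range(1, len(smooth) - 1):
        # (1) and (2): derivative crosses zero going from positive to negative.
        is_maximum = slope[i - 1] > 0 and slope[i] <= 0
        # (3): above 3 sigma at the centre and 2.8 sigma in adjacent channels.
        significant = (smooth[i] > 3.0 * sigma[i]
                       and smooth[i - 1] > 2.8 * sigma[i - 1]
                       and smooth[i + 1] > 2.8 * sigma[i + 1])
        # (4): within the SMC velocity range.
        in_range = v_range[0] < velocity[i] < v_range[1]
        if is_maximum and significant and in_range:
            centres.append(velocity[i])
    return np.array(centres)
```

Working in channel space means the sign of the velocity step does not matter: a local maximum is found whether the velocity axis increases or decreases.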
In Figure 17, we compare the cumulative distribution functions of $v_c$ for absorption and emission. Uncertainties on the distributions are computed by bootstrapping each sample with replacement over $10^4$ trials, and represent the 1st through 99th percentiles. Although the distributions are indistinguishable between $100 < v_c < 150$ km s$^{-1}$, absorption components are not found in these data beyond $v_c \sim 175$ km s$^{-1}$ whereas emission components extend to $v_c > 200$ km s$^{-1}$. Using a two-sided Kolmogorov-Smirnov test across $10^4$ bootstrapped trials we find a p-value $\sim 0.17$. This preliminary result may indicate that the absorption components are more likely to be found in low-velocity gas in the SMC relative to emission components. Future, deeper GASKAP observations of the SMC will provide the opportunity to examine this result further.
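The bootstrap uncertainty envelope on a cumulative distribution can be sketched as below. This is a minimal illustration assuming an empirical CDF evaluated on a fixed velocity grid; the function name, the grid and the fixed random seed are our own choices, and the Kolmogorov-Smirnov step is not reproduced here.

```python
import numpy as np


def bootstrap_cdf(sample, grid, n_trials=10_000, seed=0):
    """Return the 1st and 99th percentile envelopes of the bootstrapped CDF.

    sample : component central velocities v_c [km/s]
    grid   : velocities at which the CDF is evaluated [km/s]
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    cdfs = np.empty((n_trials, grid.size))
    for k in range(n_trials):
        # Resample with replacement, keeping the original sample size.
        resampled = np.sort(rng.choice(sample, size=sample.size, replace=True))
        # Fraction of resampled values at or below each grid velocity.
        cdfs[k] = np.searchsorted(resampled, grid, side="right") / sample.size
    lower, upper = np.percentile(cdfs, [1, 99], axis=0)
    return lower, upper
```

Running this separately on the absorption and emission $v_c$ samples with a shared grid would give two envelopes that can be overplotted for comparison.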

Conclusion
In this paper we have presented the first untargeted survey of H I absorption in the SMC, alongside the pipeline used to process these and future GASKAP H I absorption data. We have produced H I absorption spectra against 337 continuum sources, matched those to emission spectra from the Pingel et al. (2022) SMC emission cube, and analysed the 229 that meet our quality requirements. This represents a 275% increase in sight-lines over previous studies of the SMC region. We have also described the major choices for the pipeline and demonstrated the effectiveness of the pipeline using the SMC data as an example.
We have found 122 absorption features across 65 of the spectra, with no features detected in the other 164 spectra. Within the body of the SMC ($N_{\rm HI,uncorr} > 2 \times 10^{21}$ cm$^{-2}$) we found absorption features against 49 of 79 (62%) spectra, including in all sight-lines with $S_{\rm cont} \geq 100$ mJy. Outside the SMC body we found absorption features against only 16 of 150 (11%) spectra.
From this first work exploring GASKAP-HI's unbiased view of absorption across the SMC and its surrounds, we have made two primary findings:
• The median fraction of H I in the SMC in the CNM is $f_c \approx 11\%$, which is lower than found in Jameson et al. (2019) but higher than the fraction reported in Dickey et al. (2000). This value is driven by the observation of more shallow absorption in less dense regions. This new fraction better represents the cold gas across the whole of the SMC, rather than just the denser regions observed in Jameson et al. (2019). However, it is potentially still an overestimate due to our sensitivity limits on detection of shallow absorption. Future, more sensitive GASKAP-HI observations of the field will refine this figure.
• The range of column density correction factors for optically thick gas varies greatly between the wing and the bar of the SMC. The bar shows a linear increase in correction factor with column density ($\mathcal{R}_{\rm HI} \approx 1 + 0.51[\log_{10} N - 21.43]$) whilst the wing shows a large variety of correction values with no relationship to uncorrected column density. Overall, the SMC is closer in optical thickness to Galactic molecular clouds, such as the Perseus cloud, whilst the field surrounding the SMC is similar to low density fields at high Galactic latitudes.
In addition we have found that:
a) The SMC has an inverse-noise weighted mean spin temperature of $T_S = 245$ K, which is lower than that found by Dickey et al. (2000).
b) There is no significant trend of mean spin temperature with column density.
c) The equivalent widths and uncorrected column densities we see in the SMC are predominantly within the Milky Way cold gas formation thresholds proposed by Kanekar et al. (2011). However, we see a higher saturation limit, at which dense H I condenses into molecular H$_2$, in the SMC due to its lower metallicity, and we see hints of cold gas formation below their threshold.
d) Fine velocity structure is present even in the mean SMC spectrum, which implies that this structure may also be seen in extra-Galactic experiments.
e) There are indications that absorption components may be preferentially found in the lower velocity gas of the SMC as compared to the emission components.
Further exploration of the pilot SMC absorption data is planned in future GASKAP-HI papers. In addition, a much longer observation (up to 200 hrs) of the SMC in the full GASKAP-HI survey will obtain H I absorption spectra up to 3 times as sensitive as the spectra presented here. We will have the opportunity to explore the column density threshold for the formation of cold H I proposed by Kanekar et al. (2011) in more detail. We also expect that we will see further clear detections of shallow absorption features, and thus slightly refine the cold gas fraction for the SMC.

[...] the NSF grant AST-2108370. We thank the anonymous reviewer for their thoughtful comments that have improved this paper.