MWA tied-array processing III: Microsecond time resolution via a polyphase synthesis filter

A new high time resolution observing mode for the Murchison Widefield Array (MWA) is described, enabling full polarimetric observations with up to 30.72 MHz of bandwidth and a time resolution of ~0.8 $\mu$s. This mode makes use of a polyphase synthesis filter to"undo"the polyphase analysis filter stage of the standard MWA's Voltage Capture System (VCS) observing mode. Sources of potential error in the reconstruction of the high time resolution data are identified and quantified, with the S/N loss induced by the back-to-back system not exceeding -0.65 dB for typical noise-dominated samples. The system is further verified by observing three pulsars with known structure on microsecond timescales.


INTRODUCTION
Some of the most exciting advances in time-domain astronomy have only been made possible by pushing the capabilities of latest generation telescopes to be sensitive to signals of shorter and shorter duration. The serendipitous discovery of pulsars in the late 1960's is perhaps the prototypical example (Hewish et al., 1968). In more recent times, the ongoing effort to detect nanohertz gravitational waves by means of pulsar timing arrays requires the continual monitoring of the times of arrival (TOAs) of millisecond pulsars (MSPs) with microsecond accuracy (e.g. Hobbs & Dai, 2017). Pulsars are also known to exhibit temporal structures on microsecond and even nanosecond time scales (e.g. Craft et al., 1968;Hankins et al., 2003), providing major clues for the underlying radio emission mechanism (e.g. Cordes, 1981;Popov et al., 2002). Similarly, fast radio bursts (FRBs) have been shown to exhibit temporal sub-millisecond structures that either point to the intrinsic emission mechanism or to interesting propagation effects occurring in the intergalactic medium (Farah et al., 2018;Hessels et al., 2019). All of these examples serve to illustrate the scientifically important and still largely untapped parameter space that is only accessible to telescopes equipped with a sufficiently high time resolution observing mode.

The
Murchison Widefield Array (MWA; Tingay et al., 2013) is a low-frequency (∼ 80 to 300 MHz) aperture array telescope located at the Murchison Radio Observatory (MRO) in Western Australia. Now in its second phase of development (Phase II; Wayth et al., 2018), it consists of 256 'tiles' (sets of 4 × 4 cross-dipole antennas) distributed over an area approximately 5.3 km in diameter, 128 of which can be used at a single time to form an interferometer. Originally conceived as an imaging telescope (which requires only the time-averaged cross-correlation products of the tiles, or 'visibilities', to be retained on disk), it was subsequently augmented with the functionality to capture the raw complex voltages of each tile, known as the Voltage Capture System (VCS; Tremblay et al., 2015). This system has enabled the MWA to be used as a premier instrument for high-time resolution studies of transient signals, especially pulsars (e.g. Bhat et al., 2016;McSweeney et al., 2017;Meyers et al., 2018;Kirsten et al., 2019).
Although the tile voltages are sampled at a (Nyquist) rate of 655.36 MHz, these data undergo several stages of processing before finally being written to disk. After preliminary filtering and digitisation, the raw voltages are subjected to a two-stage frequency analysis filter, which trades time resolution for increased frequency resolution. In the MWA's case, both stages of the anal-ysis filter were implemented as polyphase filterbanks (PFBs; Harris & Haines, 2011;Prabu et al., 2015). The first stage ('coarse') PFB reduces the effecting sampling rate by a factor of 512, resulting in an array of complexvalued samples with 1.28 MHz resolution in frequency ('coarse' channels) and ∼ 0.8 µs in time. In the second stage ('fine') PFB, each coarse channel is further split into 128×10 kHz 'fine' channels at the cost of decreasing the time resolution to 100 µs.
In the current MWA system design, only the latter time resolution data product (i.e. 100 µs) is made available to the user. While this is sufficient for many pulsar studies (e.g. Oronsaye et al., 2015;McSweeney et al., 2017;Bhat et al., 2018;Meyers et al., 2018), it is nevertheless too coarse for many science applications involving MSPs. In principle, the original higher time resolution can be recovered from the channelised output (either approximately or exactly) by means of a synthesis filter, which acts as an 'inverse' operation to the analysis filter. The conditions under which the original time series can be exactly reproduced depends on the choice of analysis and synthesis filters.
Here, we describe the synthesis filter that is implemented as the (optional) final stage of the tied-array beamforming pipeline, the former stages of which are described in detail in Ord et al. (2019, hereafter Paper I) and Xue et al. (2019, hereafter Paper II). The synthesis filter is applied to the fine-channel output of the beamformer, and recovers the coarse channel time series. That is, it effectively 'undoes' the fine PFB, increasing the available time resolution to ∼ 0.8 µs. A brief review of PFBs in general, and their particular implementation in the case of the MWA, are given in Section 2. The design of the synthesis filter is described in Section 3, including a discussion of its fidelity, i.e., the appearance of any temporal and spectral artefacts introduced by the synthesis filter itself. Finally, the practical use of this functionality is demonstrated in Section 4.1 through three examples: MWA observations of the PSRs J2241−5236, J0437−4715, and B0950+08 (J0953+0755).

POLYPHASE FILTERBANKS (PFB)
PFBs are a type of analysis filter, designed to extract spectral information out of discrete time series data. They can be considered a generalisation of the more familiar discrete Fourier transform (DFT), and are designed to overcome the undesirably uneven frequency response (i.e. spectral leakage) inherent in the application of DFTs to discretely sampled time series of finite length. PFBs are well described in standard texts (Crochiere & Rabiner, 1983;Harris, 2004;Oppenheim & Schafer, 2009); a clear and concise review of PFBs in the context of radio astronomy is given in Harris & Haines (2011). Thus, only a brief review of the salient features is given, in order to prepare the reader for the description of the synthesis filter in the following sections.

General review and mathematical notation
The PFB is a transformation from the time domain, x[n], to the frequency domain X k [m], where the [·] notation denotes a discretised index, k is the channel number, and n, m ∈ Z are the time indices for the pre-and post-channelised data, respectively. Let K be the number of (equally spaced) frequency channels required for some desired spectral resolution. Although applying a DFT to N = K adjacent time samples in x[n] will produce the desired resolution, the result will be an imperfect representation of the true spectrum because of spectral leakage, which is when power that properly 'belongs' to some particular frequency bin appears in (or, is aliased to) other, nearby bins. The impossibility of perfectly eliminating spectral leakage can be seen by realising that operating on a finite-length time series is equivalent to multiplying an arbitrarily long time series with a rectangular window function, whose effect in the frequency domain is to convolve the "true" spectrum of the signal with the Fourier transform of the window function. In the case of a rectangular window, this is the sinc function. A common strategy for mitigating spectral leakage is to choose an alternative window function whose Fourier pair is localised in the frequency domain, and which therefore produces a tolerable level of spectral leakage when convolved with the signal's spectrum. Many possible windowing functions have been identified, which in general trade the 'amount' of leakage with the 'location' of the leaked components. For our purposes, it is sufficient to note that the inevitable presence of a windowing function motivates the definition of the analysis filter, h [n], and the generalised windowed DFT where j = √ −1 denotes the imaginary number, m is the time index of the channelised (output) data, and 0 ≤ k < K denotes the channel number. Eq. (2) describes the action of performing DFTs on short, windowed segments of the input time series of length K. M is the number of samples that the window is translated along x[n] between successive DFT operations; thus, the index of h[mM − n] represents the shift required in order to produce the spectrum at time m. If M < K then the windows overlap and the resulting channelisation is oversampled; if M = K, then it is critically sampled. The choice of h[n] is motivated by the shape of its frequency response (i.e. its Fourier pair), whose characteristics (e.g. width, location of sidelobes) are chosen according to the advantages they carry in particular contexts. Leaving x[n] "unweighted" is equivalent to choosing h[n] = w R [n], in which case Eq. (2) merely describes a DFT performed on each successive window, which in this context is also called a short-time Fourier transform (STFT). On the other hand, choosing h[n] to be the sinc function will result in a frequency response that approximates a rectangular window.
It is well known that scaling a function in the time domain produces the inverse scaling in the Fourier domain. This fact motivates an alternative strategy for mitigating spectral leakage. Choosing a larger window size, N = KP (for integer P > 1), and a corresponding wider analysis filter, will result in a frequency response that is similar in shape, but P times narrower than the frequency response of the original analysis filter. A DFT applied to the larger number of samples will naturally produce a correspondingly larger number of (more closely spaced) frequency channels, but choosing only every P th channel and discarding the rest (known as decimation) ensures that the desired spectral resolution with K channels is retained. In this way, spectral leakage can be contained arbitrarily close to the "correct" channel by choosing a sufficiently high value of P .
The two-step algorithm described above (performing a windowed DFT on N = KP samples and decimating the resulting spectrum) defines the PFB. Formally, it is equivalent to Eq. (2); however, in this context the term critically sampled (i.e. M = K) implies that the Nlength windows will now overlap. The term "polyphase" derives from the fact that each block of K = N/P samples (known as taps) in x[n] is included in multiple applications of the DFT, but appearing at a different relative phase in each case.
One of the great advantages of the critically sampled PFB is the existence of a mathematically equivalent but computationally efficient implementation. It can be shown that Eq. (2) is equivalent to first segmenting the windowed time series into taps, summing their respective samples element-wise, and performing a single DFT on the resulting array (now also of size K), i.e.
A short proof of this equivalence is given in Harris & Haines (2011). The procedure described by Eq.
(3) is called the weighted overlap-add algorithm, and is illustrated in Fig. 1.

MWA implementation of the fine PFB
The first stage (coarse) PFB is described in Prabu et al. (2015), and here we only document a few details pertaining to the second stage (fine) PFB. A PFB is specified by (1) the number of output channels, K, (2) the number of taps, P , and (3) the analysis filter, h [n]. For the MWA, K = 128 (giving fine channels 10 kHz wide), P = 12, and is the Hanning window, and is defined to be symmetric around sample n = 767 = N/2 − 1. The analysis filter is shown in Fig. 2. The MWA's fine PFB is critically sampled, with M = K = 128. The rationale behind the particular design choices for the MWA's fine PFB is beyond the scope of this paper-for the purposes of creating a synthesis filter, it is sufficient to know merely how the analysis filter is defined 1 , and the fact that the PFB is critically sampled. The fine PFB is implemented on field-programmable gate arrays (FPGAs) 2 and was designed in such a way to accommodate the data rate and bit depth constraints set by the surrounding hardware. The input signal values are signed (5+5)-bit complex integers, and the final outputs are (4+4)-bit complex integers. The loss of precision associated with the demotion of 1 bit naturally places limits on the ability for any synthesis filter to perfectly reconstruct the original coarse channel time series, which is analysed for our system below. The full details of the FPGA implementation are given in the appendix.

SYNTHESIS FILTERS
In order to regain the time resolution lost during analysis, the spectral output of the PFB must be transformed In this expression, the index m is allowed to run over all integers in order accommodate an arbitrarily large synthesis filter. Back-to-back analysis-synthesis filters can be designed so that the original time series can be perfectly reconstructed without any loss of signal (i.e.x[n] = x[n] exactly), and the system as a whole can be thought of as an identity operation. The condition for perfect reconstruction can be found by substituting the analysed spectrum obtained from Eq. (2) into the synthesis oper-ation defined in Eq. (5). For a critically sampled system, where, following Crochiere & Rabiner (1983), we have adopted the shorthand notation Perfect reconstruction corresponds to the case when the only non-zero contribution comes from the s = 0 term, giving rise to the necessary and sufficient condition for all values of n, where δ 0 s is the Kronecker delta. Finding the synthesis filter that perfectly inverts a given analysis filter is tantamount to solving Eq. (7) for f [n]. However, exact solutions do not always exist, and in general, numerical methods must be employed to find a synthesis filter that minimises the reconstruction error.

Reconstruction error
In the context of an astrophysical signal, it is desirable to quantify the reconstruction error in terms of the signal-to-noise ratio (S/N). If we assume that individual samples follow, and are dominated by, the same Gaussian noise statistics, then the reduction in S/N of a reconstructed sample can be estimated by considering the fraction of the power of the reconstructed sample that came from the original sample: This expression is invariant under the substitution n → n+λM , λ ∈ Z, which indicates that the (average) reconstruction error is only a function of where the sample in question falls within a tap. Consequently, any synthesis filter (except the exact inverse of the analysis filter, if it exists) will introduce a "ringing" effect into the reconstructed time series that has a period equal to the size of the PFB tap.
The "leakage" of power into other samples implied by Eq. (8) can also manifest itself as spurious imaging of a "true" signal at intervals of one tap. This can be seen by considering the effect of a single-sample impulse, with a power much greater than the ambient noise, on the reconstructed time series. The sample itself, say, x[n], will be reconstructed with high fidelity, as the reconstructed error is only comprised of contributions from the (relatively) low-level noise. The reconstruction error of the sample x[n + K], however, will be dominated by the term in Eq. (6) involving x[n], according to the relative weighting introduced by the analysis and synthesis filters. Thus, the impulse will reappear at intervals of one tap, but where the relative power of each appearance is given by This effect, termed temporal imaging, is discussed further in the context of specific filters.

Optimal and sub-optimal filter designs
Minimising the loss of S/N is equivalent to solving Eq. (7) using least squares regression. Since any given reconstructed sample only receives contributions from samples spaced one tap apart, Eq. (7) can be thought of as M independent conditions, one for each tap position, n = 0, 1, 2, . . . , M − 1. Thus, considering each tap position separately, each condition can be expressed as a minimal matrix equation, where H (n) , F (n) , and D are matrices whose elements are given by where P is the number of taps in the analysis filter, and the size of F (n) is set to the desired number of (nonzero) taps in the synthesis filter, P ′ . The indices are chosen such that, owing to the finite size of the analysis filter, the smallest H (n) that captures every non-trivial term in Eq. (7) is the (P ′ + P − 1) × P ′ matrix  with row number i = (P ′ + P )/2 (in H (n) and D) corresponding to the s = 0 term 3 .
Once the matrices H (n) , F (n) , and D have been defined, solution by least squares regression can proceed in the usual way, yielding the best-fit filter coefficientŝ where H (n) T is the transpose of H (n) . With this notation, the reconstruction error can be more concisely expressed whereD = H (n)F (n) . Fig. 3 shows the solutions found for the MWA's analysis filter defined in Eq. (4), for 12-, 18-, and 24-tap synthesis filter sizes. Despite the complexity of the synthesis filters, they all share the same basic form as the analysis filter. This resemblance suggests that choosing a synthesis filter that is the mirror image of the analysis filter (i.e. f [n] = h[−n]), might also yield a reasonably good reconstruction, without having to calculate the optimal solutions numerically.
The performance of both the (sub-optimal) mirror filter and the (optimal) set of least-squares solutions can be evaluated straightforwardly using Eq. (14). Fig.  4 compares the loss of S/N due to each filter (under the assumption of noise-dominated samples), revealing that, as expected, the filters with the larger number of taps perform better, and the mirror filter performs the least well.  8) and (14), as a function of the position of the reconstructed sample within a tap for the filters displayed in Fig. 3. Bottom: The effect of temporal imaging, quantified in Eq. (9), demonstrating how power "leaks" across adjacent taps during reconstruction. A grey dashed line is drawn at −25 dB to aid visual comparison. Fig. 4 suggests that one can achieve arbitrarily good performance by choosing a sufficiently long filter. However, the better performance of the longer filters is offset by the increased computational resources and/or time required to apply them, which may exceed the system's design specifications. Note that the mirror filter, being the same size as the 12-tap least squares filter, offers no advantage in terms of minimising the reconstruction error. On the other hand, the way that the signal power is distributed across multiple taps is markedly different for the two filter types, as shown in the bottom panel of Fig.  4. The mirror filter performs slightly worse in nearby taps, but drops off more quickly further out. For applications where the samples are signal-dominated, and minimising temporal imaging is more important than measuring the S/N of the signal precisely, the mirror filter may therefore offer some advantage over the least squares solution. However, observations of pulsars only rarely fall into this regime, even for observations of single pulses, where typically many samples must be averaged before the signal becomes more significant than the noise. A more detailed analysis of these effects is therefore outside the scope of the present work.

MWA Implementation
We initially implemented the mirror filter for the purpose of prototyping the synthesis filter, due to the ease of generating the coefficients, and the 12-tap least squares filter was implemented once proof-of-concept had been demonstrated. Both of these synthesis filters are available as optional components of the MWA-VCS beamforming software, VCSTools 4 , described in Paper I. Even though our system places no stringent limits on computational resources, we have not implemented the larger filters due to the identification of other sources of error that dominate over the reconstruction error defined above, rendering the advantages of the longer filters relatively minor in comparison. An analysis of these errors is presented in the following sections.
Viewed as a linear operation, Eq. (5) is ideally suited for GPUs, and we have implemented it in VCSTools for NVIDIA's CUDA architecture. Rather than use existing implementations of the implied inverse DFT, the N × K exponential terms (the so-called "twiddle factors") are pre-calculated and stored in a 2D array of double-precision floating point numbers on the GPUs. These are then accessed as required for the calculation of a given samplex [n]. The combined GPU-accelerated calculations for both beamforming and coarse channel reconstruction run faster than real time 5 .
Once the synthesis filter has been applied to each of the MWA's polarisation streams for each coarse channel, the resulting high-time resolution time series is written out as complex voltages (i.e. without converting to Stokes parameters) in the VDIF file format (Whitney et al., 2009). This format was chosen because it is supported by the coherent de-dispersion functionality of DSPSR 6 (van Straten & Bailes, 2011), a software suite for processing pulsar time series. The VDIF format requires each sample to fit into a signed 8-bit complex integer data type, which is performed as the final step before writing to disk.

SYSTEM PERFORMANCE
The MWA implementation of the fine PFB differs from Eq. (3) in a few important respects, causing a reduction in the reconstructed S/N that would be present even if the synthesis filter perfectly inverted the analysis filter. The most significant difference is that both the coefficients b[n] and the final sum undergo a rounding operation, resulting in the approximate (i.e. both less precise and biased) spectrum where ⌊·⌋ and ⌊·⌋ asym denote symmetric and asymmetric rounding operations respectively, described in Appendix A.
The non-linearity of the back-to-back system induced by Eq. (15) implies that there is no single impulse response test that can adequately characterise the whole system. To wit, the results of an impulse response test depend sensitively on at least three factors: the magnitude of the impulse; its position within a tap; and the arbitrary scaling factor and quantisation applied at the end of the test required by the integer-based output formats. For example, an impulse at tap position n ≡ 0 (mod K) will be perfectly reproduced if its magnitude is such that the amount of rounding that takes place during analysis is minimised. On the other hand, an impulse at n ≡ 64 (mod K) can be chosen such that the response is significantly worse than that implied by Fig.  4.
For this reason, and also because the vast majority of applications of this system falls in the regime of noisedominated samples, we have decided to forego the traditional impulse response test in favour of a back-toback test involving real data collected expressly for this purpose. This is an empirical test of system that made use of a non-standard observing mode to record a small amount of simultaneous coarse and fine channel data (i.e. both before and after the fine PFB analysis filter stage). The fine channels from one polarisation of a single tile were extracted and subjected to both the mirror The real (top) and imaginary (bottom) components are shown separately. The original coarse channel data (black) are compared with the data reconstructed using the 12-tap least squares filter (red), with the residuals plotted in the lower panels. Because the VDIF data uses an arbitrary scaling factor, the red line has been rescaled by eye for a better visual comparison.
filter and the 12-tap least squares filter to reconstruct the 1.28 MHz coarse channel time series, which could then be compared directly with the original data. The real and imaginary parts of the time series resulting from the least squares filter are shown in Fig. 5. With the original and reconstructed time series in hand, we estimated the S/N loss for each tap position by comparing the variance of the original time series (the 'signal'), σ 2 S , with the variance of the residuals (the 'noise'), σ 2 N . In analogy with Eq. (8), Recognising that some of the noise variance is due to the synthesis filter (cf. Fig. 4), we have estimated  16) and (17)).
the contribution due to rounding errors (and any other implementation-specific effects) by calculating the "filter-subtracted" S/N loss: where σ filter is derived from Eq. (8). Fig. 6 shows the results for just under one second's worth of data (1278464 × 0.78 µs samples), where 1536 samples (one tap) were excised due to the synthesis filter being applied to zero-padded data beyond the edge of the second. The remaining samples were sorted into their respective tap positions, and the variances calculated for each set (containing 1278464/128 = 9988 samples). Near the edges of the tap, the rounding errors dominate the reconstruction noise, which we estimate contributes roughly −0.26 dB of signal loss at all tap positions. In more central tap positions, however, the contributions from the rounding errors and the filters are comparable, and significant improvements could made by using longer filters. Nevertheless, unless there is a way to mitigate the rounding errors during the application of the analysis filter (explored below), it is unlikely that we will be able to achieve a total S/N loss better than approximately −0.26 dB at any tap position.

Mitigation of rounding errors
The original (i.e. before applying the fine PFB) hightime resolution samples are (5+5)-bit complex integers, scaled such that the bit occupancy is not so low that information is lost, and not so high that individual samples are clipped. Interestingly, one may question whether or not this known quantisation can be used to recover some of the information lost by the imperfect back-to-back system. If the errors are sufficiently small, then re-quantising the reconstructed samples will correct more errors than it introduces, giving overall better performance.
In this section, we offer a short proof of the condition for which re-quantisation results in a net recovery of S/N for samples following Gaussian statistics. As before, let σ 2 N be the variance of the residuals which are assumed to be drawn from a normal distribution with zero mean. The effect of re-quantisation on the variance is that every sample between k − 1 2 and k + 1 2 , for any integer k, gets "counted" as if it had the value k. The adjusted variance,σ 2 N , can then be expressed in terms of σ 2 N via the second moment, where the last equality takes advantage of the symmetry of the error function. Re-quantisation is beneficial precisely whenσ 2 N < σ 2 N , which can be shown by numerical methods to occur when σ 2 N 0.29. The MWA test data set presented above has a variance of approximately 6.7, which implies that the total S/N loss of the back-to-back system would have to be smaller than −0.05 dB before requantisation would improve the signal reconstruction. As Fig. 6 shows, this is not met for any tap position, so we did not pursue this possibility further. However, systems that are able to achieve better than σ 2 N 0.29 may do better yet by re-quantising the suitably scaled reconstructed samples.

Verification using pulsar observations
Observations of three pulsars are presented here to validate the synthesis filter as a useful tool for undertaking high time resolution studies of pulsars with the MWA: MSPs J2241−5236 and J0437−4715, and the bright, long-period PSR B0950+08. The first two pulsars were processed with the mirror filter, and B0950+08 was processed with the 12-tap least squares filter. As Fig. 6 implies, the use of the mirror filter instead of the more optimal least squares filter would result in a further ∼ 0.1 dB reduction in S/N.  0.2 ms, or 45 • of rotation phase, consistent with the width of the average profile formed from incoherently de-dispersed, fine channelised data recorded with the MWA-VCS system. Upon applying the synthesis filter and coherently de-dispersing the reconstructed time series sampled at the much higher rate of 1.28 MHz, the same data revealed exquisite detail in the average profile, including a pair of pre-cursor components that are only marginally visible (if at all) at higher frequencies (Fig. 7). These have since been confirmed during followup observations (Kaur et al., in prep) using the Band 3 receiver (250 to 500 MHz) of the upgraded Giant Metrewave Radio Telescope (uGMRT). In addition, these new, low-frequency observations have allowed us to measure the DM of this pulsar with unprecedented precision (∼ (2-6) × 10 −6 pc cm −3 ), with important consequences for measuring DM chromaticity and evaluating its effect on pulsar timing experiments. These results are discussed in detail in Kaur et al. (2019).

PSR J0437−4715
PSR J0437−4715 is a nearby MSP (distance ≈ 157 pc; period P ≈ 5.76 ms) and an established target for pulsar timing array (PTA) applications. It boasts a complex, multi-component polarimetric profile (Fig. 8) that spans more than half of its rotation period (cf. Yan et al., 2011). It was used in Paper I as a verification of the beamforming method employed in the VCS processing pipeline, although there is a noticeable difference between the circular polarisation published in that work and that shown here (see the Figure caption for details).
As evident from Fig. 8, the application of the synthe- Figure 8. Profiles of PSR J0437−4715 formed from the same data set. Top: incoherently de-dispersed with 10 kHz (fine) frequency channels formed from the standard PFB analysis filter. Bottom: coherently de-dispersed using 1.28 MHz (coarse) frequency channels reconstructed using the synthesis filter described in this paper. The higher time resolution profile shows features (e.g. notches in the total intensity profile at pulse phases ∼ 0.54 and ∼ 0.7) that are obscured by dispersion smear at the lower time resolution. A comparison with these profiles with that published in Paper I reveals an excess of circular polarisation on both the leading and trailing edges of the profile. The reason for the discrepancy is not clear, but it should be noted that the earlier profiles were published before the polarimetric verification of Paper II was carried out.
sis filter successfully recovers some curious fine structure (e.g. notch-like features at pulse phases ∼ 0.54 and ∼ 0.7) characteristic to this pulsar, first reported in early high time resolution studies made at 430 MHz (Navarro et al., 1997). The pulsar detection (for Stokes I only) was originally presented in Bhat et al. (2018), where the combination of a lower time resolution (100 µs) and a non-negligible dispersive smearing (∼45 µs) obscured the detection of these fine structures. Fig. 8 thus presents the highest-fidelity detection of this important pulsar at frequencies below 200 MHz.

PSR B0950+08
B0950+08 is a bright, long-period (P = 0.253 s) pulsar known to exhibit microstructure (Popov et al., 2002;Kuzmin et al., 2003). A total of eighty seconds (315 pulses) of data (128 × 10 kHz fine channels) were recorded, beamformed, and subjected to the polyphase synthesis filter. Across the MWA bandwidth, we used the RM synthesis technique 7 (Brentjens & de Bruyn, 2005) to measure a rotation measure (RM) of 1.43 ± 0.15 rad m −2 , which is consistent with previous measurements of this pulsar's RM (e.g. Noutsos et al., 2015). The mean profile formed from the one coarse channel is shown in Fig. 9. Several individual pulses were sufficiently bright to see substructure. One such pulse is showcased in Fig. 10, where all 24 coarse channels were integrated to maximise the S/N. To highlight the pulse's substructure, we show it at three different time resolutions (corresponding to the chosen bin width of each plot). Using observations at 102.5 MHz, Kuzmin et al. (2003) report microstructure with a characteristic timescale of ∼ 60µs, which we are able to unambiguously identify in our data. We suggest that the finest observable structures (Fig. 10, bottom plot) might correspond to the ∼ 10 µs structures reported by Popov et al. (2002) at 1.65 GHz. However, Kuzmin et al. (2003) used only a single linear polarisation feed, and Popov et al. (2002), only a left-handed circularly polarised feed. With the benefit of full Stokes polarimetry, we are able to verify that on the smallest time scales, the emission is almost 100% polarised-mostly linear, but with a small amount of circular polarisation during the brightest part of the pulse. This is evidence that the errors introduced by the analysis-synthesis filter do not contribute a significant amount of polarisation leakage into neighbouring time bins-in particular, at 50 µs intervals.

CONCLUSION
The MWA-VCS telescope system outputs voltage data with a frequency resolution of 10 kHz and a time resolution of 100 µs, which is suitable for many pulsar applications, as demonstrated in several pulsar science papers published to date. In this paper, we have presented an algorithm to undo the fine channelisation stage (fine PFB) of the VCS pipeline by using a synthesis polyphase filter, implemented as part of our processing software suite, VCSTools. The result is a reconstructed, coarse channelised (pre-PFB) time series, suitable for high-time resolution studies of MSPs and other rapid transient signals. Although the reconstruction is not perfect, the average error does not exceed −0.65 dB for noise-dominated samples.
We have further verified our system by observing PSRs J2241−5236, J0437−4715, and B0950+08, all of which are known to exhibit structures on ∼ 1-10 µs timescales. In each case, our results have been shown to be consistent with observations at other radio frequencies. The advent of the VCS high time resolution observing mode thus distinguishes the MWA as a premier low-frequency instrument for studying pulsars and other transients at microsecond resolution, with important implications for studies of the pulsar emission mechanism, the characterisation of the ISM in the ongoing search for gravitational waves, and others.

A THE FPGA IMPLEMENTATION OF THE FINE PFB
Each coarse channel is independently processed by dedicated FPGAs to convert (5+5)-bit complex integers sampled at 1.28 MHz into a series of spectra composed of 128×(4+4)-bit complex integers sampled at 10 kHz. The PFB is implemented in two stages: (1) a "frontend" stage which prepares the array b[n] (see Eq. (3)), and (2) the "FFT" stage, which calculates the 128-point spectrum, X k [m]. The first stage involves a multiplication of a window of 1536 consecutive coarse channel samples with the analysis filter shown in Fig. 2, and then a summation of the 12 taps together to produce a single array of 128 complex integers. The combination of the allowed input values [−16, 15] and the filter values guarantees that the magnitudes of the summed (signed) numbers do not exceed 2 21 = 2097152. At this stage of the processing, they are stored as 48-bit signed integers, of which only the bottom 22 bits are therefore significant. Each integer n (either real or imaginary) is then reduced from 48 bits to 8 bits in the following manner. If n is positive, then bits 14 through 21 (counting from the least significant bit) are selected to form the 8-bit integer, and 1 is subsequently added if bit 13 is 1. This is equivalent to rounding the number n/2 14 to the nearest integer, where fractional values of 0.5 are always rounded up. If n is negative, then bits 14 through 21 are selected, but no rounding occurs. This is equivalent to applying the floor operation to n/2 14 . It should be noted that this rounding scheme introduces a bias into the distribution of values. In particular, the distribution of 8-bit values has a different mean than the distribution of the original 5-bit values, and it results in an artificial deficit in the number of values that get rounded to zero. The rounding scheme and its effect on the distribution are illustrated in Fig. 11, including a displaced mean which artificially adds power to the DC bin of the spectrum.
The second stage calculates the spectrum of b[n] using the standard fixed point FFT algorithm that is implemented on (Xilinx Virtex4 XC4VSX35) FPGAs. The output values are scaled and (symmetrically) rounded, producing (4+4)-bit output values. No malised f equency Figure 11. The first-stage (i.e. pre-FFT) rounding scheme implemented in the MWA FPGAs compared with a symmetric scheme. The top panel illustrates the two rounding schemes (each has been given a slightly different y-offset for visual clarity). A population of 5-bit samples was drawn from a rounded Gaussian distribution with mean µ = 0 and standard deviation σ = 2. The vertical dashed lines show the means of the distributions after the population has been subjected to the first stage of the PFB (i.e. converted to 8-bit integers) and rounded according to the two different schemes. The bottom panel compares the distributions of the resulting population of 8-bit samples.