Fast Pixelated Detectors in Scanning Transmission Electron Microscopy. Part I: Data Acquisition, Live Processing, and Storage

Abstract The use of fast pixelated detectors and direct electron detection technology is revolutionizing many aspects of scanning transmission electron microscopy (STEM). The widespread adoption of these new technologies is impeded by the technical challenges associated with them. These include issues related to hardware control, and the acquisition, real-time processing and visualization, and storage of data from such detectors. We discuss these problems and present software solutions for them, with a view to making the benefits of new detectors in the context of STEM more accessible. Throughout, we provide examples of the application of the technologies presented, using data from a Medipix3 direct electron detector. Most of our software is available under an open source licence, permitting transparency of the implemented algorithms, and allowing the community to freely use and further improve upon them.


I. INTRODUCTION
Several technological advances have been critical in the development of the scanning transmission electron microscope (STEM) from its inception (von Ardenne, 1938) to its current status as one of the most important techniques for high resolution imaging of materials. Specifically, improved vacuum systems, field emission sources, aberration correction and the introduction of annular dark field (ADF) detectors (Crewe et al., 1968; Crewe, 1966) were all crucial developments. ADF detectors are typically formed of one or more PN diode segments or scintillator-photomultiplier tube arrangements. These are placed in the far field of the objective lens and sample an angular range of the diffraction pattern of the area of the sample illuminated by the electron beam. Such devices are well-suited for use in STEM due to their fast readout, and imaging with pixel dwell times measured in microseconds is normal. ADF imaging was initially understood as being based on Z-contrast (Crewe, 1970a,b), though understanding of the contrast mechanism evolved over time (Donald and Craven, 1979), in turn influencing the design of such detectors. In particular, later contributions demonstrated that the inner angle of ADF detectors had to be relatively high to prevent coherent diffraction from dominating the signal (Hartel et al., 1996; Pennycook and Jesson, 1991). Other refinements of this arrangement have been introduced over the years, including the use of split detectors for differential phase contrast (Chapman et al., 1978, 1990; Dekkers and de Lang, 1977; McGrouther et al., 2014), multiple annular detectors (Shibata et al., 2017), and the use of bright field or annular bright field imaging (Hammel and Rose, 1995; LeBeau et al., 2009; MacLaren et al., 2013).

* Magnus.Nord@uantwerpen.be
† Gary.Paterson@glasgow.ac.uk
However, all these detector configurations integrate over large angular ranges of the back focal plane, resulting in the loss of most of the information contained in the diffraction pattern. Furthermore, space constraints in the microscope's camera chamber can limit which detectors can be used simultaneously in an individual experiment, so that data acquisition may have to be repeated several times from the same area using different detectors to collect all the signals of interest. This can lead to difficulties in correlating the information contained in images acquired in successive experiments due to drift, and results in a higher overall dose to the sample, which is undesirable for beam-sensitive samples.
Along with the many advantages that fast pixelated detectors bring, many practical challenges arise from their use, such as extracting real-time information from the data stream produced during a scan to enable navigation and identification of relevant sample features, and storing and processing very large datasets, often much larger than the available computer memory. In this paper (Part I), we present solutions for the hardware control, data acquisition, real-time processing and visualisation, and storage of data from fast pixelated detectors. The names of the software packages, modules, classes and functions we present are given in typewriter font.
The majority of the software solutions presented in this work are made available under the free and open source GPLv3 license, allowing transparency of the implemented algorithms, and the ability for anyone to use and to further improve upon them. Although some aspects of the codebase are specific to the use of a Medipix3 detector (Ballabriga et al., 2011), many of the techniques and tools are applicable to a wide range of other detectors.
Most of the libraries reported here are implemented in Python. Python, being a free and open programming language, is rapidly becoming the standard language for many aspects of scientific computing (Gouillart et al., 2016; Oliphant, 2007). In addition to its comparative ease of use, which lowers the barrier for people to contribute and minimises developer time, Python has an extensive standard library and a large ecosystem of external libraries, including ones for optimised numerical (Oliphant, 2006) and scientific (Jones et al., 2001) computing, image processing (van der Walt et al., 2014), data visualisation (Hunter, 2007), and workflow documentation (Kluyver et al., 2016). Furthermore, it is straightforward to link Python to low level C code, allowing development of optimised routines or use of external libraries (Behnel et al., 2011).
Within the electron microscopy community, a number of Python packages have also been developed. One example of this is HyperSpy (de la Peña et al., 2018), which contains functionality for processing data from a wide range of TEM techniques: electron energy loss spectroscopy, energy-dispersive X-ray spectroscopy, electron holography, and more standard imaging. It also serves as a base for several other packages, such as pyXem for analysing SPED data (Johnstone et al., 2019), Atomap for processing atomic resolution STEM data, and pixStem for working with data from fast pixelated STEM detectors (pixStem devs, 2015). Several other packages exist, such as rigidRegistration for rigid image registration of atomic resolution image stacks (Savitzky et al., 2018), and wrappers for STEM simulations, such as PyPrismatic (Ophus, 2017). Other packages for processing data from fast pixelated STEM detectors include py4DSTEM (Savitzky et al., 2019), LiberTEM (Clausen et al., 2019), pycroscopy (Somnath et al., 2019), and fpd (fpd devs, 2015). The post acquisition visualisation and processing of data from fast pixelated detectors using the fpd and pixStem libraries will be reported in Part II of this work.
In Section II, the Medipix3 detector is briefly introduced. Methodologies for acquiring data from it are discussed in Section III. In Section IV, an architecture developed to process a live data stream from a fast pixelated detector is outlined. In Section V, the issues around data storage are discussed and our implementation is presented.

II. MEDIPIX3 DETECTOR
All pixelated data reported in this work is from a 256×256 pixel Medipix3RX (henceforth referred to as Medipix3) detector (Ballabriga et al., 2011) affixed to a Merlin 1R retractable Medipix3 mount from Quantum Detectors (Harwell, Oxfordshire, UK). The Medipix3 detector is a radiation-hard hybrid counting direct electron detector, where active analogue and digital signal processing circuitry in each 55 µm pixel is bump-bonded to a relatively thick sensor layer. Si sensor layers of 500 µm are needed for operation at primary electron energies of 300 keV. In our case, a 300 µm silicon sensor layer was used for all data except that in Fig. 1, where a 500 µm layer was used instead.
In electron microscopy applications, an incident electron produces electron-hole pairs in the sensor layer in sufficient numbers (Scholze et al., 1998) for the signal due to a primary electron to be clearly distinguishable from noise in the detector. This makes the detector capable of noiseless operation by the setting of an appropriate threshold for counting, and the detector is thus able to detect individual electrons. As a consequence, the Medipix3 detector is of potential use in time resolved electron microscopy experiments, where sub-100 ns time resolution has been recently demonstrated (Paterson et al., 2019).
Each pixel can operate independently, with its active circuitry processing only the signal induced in that pixel, in a mode of operation known as single pixel mode (SPM). Alternatively, in so-called charge summing mode (CSM), neighbouring pixels can pool their circuitry and collectively process the signals induced in each pixel (Ballabriga et al., 2013). CSM attempts to account for charge spread between pixels due to electron-matter interactions in the thick sensor layer. At acceleration voltages of up to 80 kV, the Medipix3 has a near-perfect DQE and MTF when imaging electrons (Mir et al., 2017). The use of alternative high-Z sensor layer materials is expected to improve the performance at higher acceleration voltages (McMullan et al., 2007) and is currently being investigated.

FIG. 1 Imaging of SrTiO3 along the [110] direction using different bit depths and probe dwell times (in rows) with the Medipix3 detector in continuous read-write mode. High angle ADF (HAADF) images (left column) were calculated by summing all counts inside a virtual aperture defined over the collection angles 80-192 mrad (assuming a linear mapping of pixel position to diffraction angle, which may not hold exactly in an image-corrected microscope), shown by the red lines in the diffraction images (middle column), using the pixStem library (pixStem devs, 2015). The coloured section of the 1-bit HAADF image in (a) is Fourier filtered with a schematic overlay of the atomic columns imaged: green: Sr, yellow: O, and blue: Ti. The third column shows the summed diffraction patterns, with the insets displaying their radial distributions from 0 to 192 mrad. The dip in intensity in the centre of the direct spot in the 12-bit mode data in (h) and (i) is due to the higher bit depth now allowing the details of the primary beam and low order diffraction discs to be seen.
Another notable feature of the Medipix3 detector is the ability to operate in continuous read-write mode, where one of the two sets of counters in each pixel is used to read out the data while the other takes over counting. This gapless recording maximises dose efficiency, which is important for beam sensitive samples, and also enables faster acquisitions, which is important for minimising artefacts due to microscope instabilities, particularly when imaging with atomic resolution. The Medipix3 detector can be operated in 1, 6, 12, and 24 bit depth modes, allowing the compromise between readout time, file size and dynamic range to be varied. The clock on the Medipix3 was designed to be driven at frequencies up to 200 MHz but, with additional cooling, it can be overclocked to allow faster operation. With the 120 MHz clock rate of the Merlin readout system (Plackett et al., 2013) used here, the readout times are 70.8 µs, 412 µs, 822 µs and 1.64 ms for the 1, 6, 12, and 24 bit modes, respectively. While the 24 bit mode is ideal for very high dynamic range diffraction studies (Mir et al., 2017), the higher readout rates of the lower bit depth modes are more generally useful across a wide range of imaging conditions: at a 1 MHz per-pixel count rate it would take >4 ms to exceed 12 bits, so 24 bit mode is only needed for long counting times or high arrival rates on some pixels.
To demonstrate the use of different bit depths, atomic resolution data from SrTiO3 imaged along the [110] direction was acquired on a Medipix3 detector at bit depths of 1, 6 and 12, giving maximum counts of 1, 63, and 4095, respectively. The data was acquired on a JEOL ARM 300CF using an acceleration voltage of 200 kV and a convergence angle of 22.4 mrad, with the Medipix3 operated in SPM with continuous read-write enabled. High angle ADF (HAADF) images produced from these datasets are shown in the left hand column of Fig. 1, with the bit depth increasing from top, (a), to bottom, (g). The atomic resolution contrast in these images arises mostly from incoherent scattering of the electrons, similar to that in regular HAADF imaging with dedicated annular detectors. The middle and right hand columns show individual and summed diffraction patterns from each scan, respectively. The circular red lines in the diffraction patterns mark the edges of the virtual aperture within which pixel counts were summed to give the intensity used for each pixel of the real-space images, while the insets in the third column show the radial distributions. The non-round 'shadow' easily visible at the outer edges of the 1-bit diffraction pattern [Fig. 1(a)] is due to the high angle cutoff in the microscope caused by the image corrector. Although the 1-bit diffraction patterns [Figs. 1(b) and 1(c)] do not seem to contain much information, the ADF data [Fig. 1(a)] shows high quality atomic resolution imaging is possible, with the SrO, Ti, and O2 columns (Abramov et al., 1995) all resolved, as shown in the inset schematic.
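The virtual aperture imaging used throughout Fig. 1 can be sketched in a few lines of NumPy. The function below is a simplified stand-in for the pixStem routines actually used: it sums all counts inside an annular detector-plane mask for every probe position, here demonstrated on a toy 4-D dataset rather than real data.

```python
import numpy as np

def virtual_adf(data, centre, r_inner, r_outer):
    """Virtual annular dark field image from a 4-D STEM dataset.

    data: array of shape (scan_y, scan_x, det_y, det_x)
    centre: (cy, cx) position of the direct beam on the detector, in pixels
    r_inner, r_outer: annulus radii in detector pixels
    """
    det_y, det_x = data.shape[-2:]
    yy, xx = np.ogrid[:det_y, :det_x]
    r2 = (yy - centre[0]) ** 2 + (xx - centre[1]) ** 2
    mask = (r2 >= r_inner ** 2) & (r2 < r_outer ** 2)
    # Sum all counts inside the annulus for every probe position.
    return data[..., mask].sum(axis=-1)

# Toy example: a 4x4 scan with an 8x8 detector, one count per frame at (2, 2).
data = np.zeros((4, 4, 8, 8), dtype=np.uint16)
data[..., 2, 2] = 1
adf = virtual_adf(data, centre=(4, 4), r_inner=2, r_outer=4)
```

In practice the aperture radii are chosen in detector pixels from the known mrad-per-pixel calibration, as for the 80-192 mrad annulus in Fig. 1.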
The much higher frame rate of 12,500 frames per second achievable with 1-bit data in continuous read-write mode makes this acquisition configuration particularly suitable for navigation during setup or in especially beam sensitive materials. As shown by the radial distributions, we have selected the scattering angles where the detector is not saturated and contrast can be extracted. With shorter exposures or lower beam currents, regions closer to the central spot of the diffraction pattern will produce useable image contrast. In 6-bit mode, more features of the diffraction pattern are visible than in the 1-bit mode and the dark field image [Fig. 1(d)] is better defined. This trend continues to the 12-bit mode [Figs. 1(g)-1(i)], where the direct beam is no longer saturated, as shown in the inset to Fig. 1(i). However, more atomic columns are present in the image as a result of larger spatial drift during the longer acquisition (9 or 10 Sr columns per row in the 12 bit data compared to 8 or 9 columns per row for the 1-bit data). The principal benefit of higher bit depth imaging in this context is that a greater range of scattering angles may be used for virtual aperture imaging post acquisition and the signal-to-noise ratio (SNR) is generally higher, even if there is a cost in acquisition time and consequent drift.
Selection of higher scattering angles by saturating the central spot in 1-bit and 6-bit modes is possible here because, unlike CCDs, the Medipix3 is not damaged by the very intense direct beam, and because the noise-free readout enables each single electron hit to be accurately recorded. With very intense beams (approximately 1 MHz count rate per pixel), the electron arrival rate can exceed the counting rate of the detector; this does no harm to the detector, but electrons are missed and the counts no longer accurately reflect the arrival rates (even a little below this level, counting linearity is lost). In this data, however, the beam current was not high enough to cause such an effect, and the slight dips in intensity in the centre of Figs. 1(h) and 1(i) are due to the real internal structure of the brightest portion of the diffraction pattern being resolved at the highest bit depth.
Regardless of the imaging mode used for the collection of the source data, smaller bit depths also make both file storage and data processing more efficient: a 6-bit dataset is about four times smaller than a 24-bit one of the same scan area, making it much more convenient to store and transfer. This advantage extends to data processing, since loading and processing files that are four times smaller is correspondingly quicker.
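The roughly fourfold saving can be checked with simple arithmetic. The zero padding assumed below (6-bit counts stored in 8-bit words, 12-bit in 16-bit, 24-bit in 32-bit) follows the 12-bit example given in Section V, and the resulting 12-bit figure agrees with the 8.6 GB raw size quoted there.

```python
# Rough raw-data sizes for a 256x256 probe-position scan with a
# 256x256 pixel detector, assuming each counter depth is zero padded
# to the nearest standard word size (6 bits -> 1 byte, 12 -> 2, 24 -> 4).
scan_positions = 256 * 256
frame_pixels = 256 * 256
bytes_per_pixel = {6: 1, 12: 2, 24: 4}

for depth, nbytes in bytes_per_pixel.items():
    size_gb = scan_positions * frame_pixels * nbytes / 1e9
    print(f"{depth:>2}-bit: {size_gb:.2f} GB")
```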

III. MEDIPIX3 DATA ACQUISITION
Data from the Medipix3 detector was acquired through the Merlin readout system (Plackett et al., 2013). This allows setting of the acquisition parameters, either through a graphical user interface (GUI) or over TCP/IP, and reads and processes the raw data through a field-programmable gate array (FPGA), returning the data to the acquisition computer. The FPGA processing can be bypassed to some extent by operating the system in 'raw' mode. This enables larger scan sizes at high frame rates, with the requirement that the data must be reshaped post-acquisition, and with no live visualisation of the acquired images directly in the Merlin software. However, the Merlin TCP/IP data API remains functional, so it is possible to get live imaging through other means (see Section IV).
The Merlin system can be triggered by software over TCP/IP or by hardware (TTL) input. We typically use the latter approach and couple to the TTL signals produced by a Gatan DigiScan system, as shown schematically in Fig. 2(a). This produces extra acquisitions due to triggers sent during the flyback time and the handling of these is discussed in Section V. The main advantage of this approach is that Gatan Digital Micrograph (DM), in addition to allowing access to microscope control, can be used for setting scan parameters in one of several ways discussed below, and additional STEM detector signals may be acquired simultaneously.
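The post-acquisition reshaping of raw-mode data, including the removal of the extra flyback frames, can be sketched as follows. The number of extra frames per scan row is an assumption that depends on the scan generator configuration, and the real conversion code discussed in Section V does considerably more (headers, metadata, chunked output).

```python
import numpy as np

def reshape_scan(frames, scan_yx, flyback_frames=1):
    """Reshape a flat stream of detector frames into a 4-D scan.

    frames: array of shape (n_frames, det_y, det_x), in acquisition order
    scan_yx: (scan_y, scan_x) probe positions per column/row
    flyback_frames: extra triggered frames at the end of each row
    (how many there are depends on the scan generator settings)
    """
    sy, sx = scan_yx
    row_len = sx + flyback_frames
    frames = frames[: sy * row_len].reshape(sy, row_len, *frames.shape[1:])
    return frames[:, :sx]  # drop the flyback frames at the end of each row

# Toy example: a 2x3 scan of 2x2 frames with 1 flyback frame per row.
stream = np.arange(2 * (3 + 1) * 2 * 2).reshape(8, 2, 2)
data = reshape_scan(stream, scan_yx=(2, 3), flyback_frames=1)
```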
The simultaneously acquired DM datasets also serve to document the microscope and scan parameters in the data tags, which can then be used in data conversion (discussed in Section V), abstracting away the differences in how various microscope manufacturers provide microscope configuration information.
When regular STEM detectors can be used for navigation, region of interest (ROI) scans may be defined on an image produced by a prior 'survey' scan, following the spectrum imaging methodology, or regular STEM scans may be used to maximise read rates. With these approaches, a scripted DM plugin may be used for setting low level Merlin parameters (Merlin DM Plugin devs, 2017), as shown in Fig. 2(b). Alternatively, real space pixel sizes and scan ranges may be set and low level DM commands used to configure and enable the scan. A scripted DM GUI, MERLIN PixSTEM, has been developed to coordinate this with configuring the Merlin system to acquire data in the optimised continuous read-write mode, and is shown in Fig. 2(c). Amongst the other features implemented, this plugin also allows different projection system settings to be saved and restored, enabling efficient switching between different detectors.

The Merlin system includes two TCP/IP servers, shown in red in Fig. 2(a): one for setting and reading acquisition parameters and one for image data transfer. The DM scripted GUIs, indicated in cyan in Fig. 2(a), interface with the Merlin communication server through a separate TCP/IP C++ plugin (Merlin DM Plugin devs, 2017), shown in red. The TCP/IP plugin may be installed alone, allowing it to be used for many other communication purposes. For more advanced control of the Merlin system over TCP/IP, a Python implementation of Merlin TCP/IP commands has been developed (Merlin Interface devs, 2016).
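A command exchange of this kind can be sketched as below. The command grammar, parameter names, and port number used here are placeholders only: the actual message format is defined by the Merlin TCP/IP API and is wrapped by the Python implementation cited above.

```python
import socket

COMMAND_PORT = 6341  # placeholder; the real port is set in the Merlin software

def build_command(name, value=None):
    """Format a parameter message (illustrative grammar, not the real API)."""
    return f"SET,{name},{value}" if value is not None else f"GET,{name}"

def send_command(host, name, value=None, port=COMMAND_PORT):
    """Send a single command to the parameter server and return its reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_command(name, value).encode("ascii"))
        return sock.recv(1024).decode("ascii")

# e.g. send_command("192.168.0.1", "COUNTERDEPTH", 12)  # hypothetical usage
```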
An example of an additional STEM signal we collect is the STEM noise correction (NC) signal, which may be used for gun noise correction of the 4-D dataset. The NC signal is produced by a current pickup attached to the condenser aperture and gives a measure of the gun emission. Correction of gun noise is particularly useful in intensity-based low contrast imaging modes, as shown in the bright field (BF) images of a mouse liver microtomed thin section in Fig. 3. The image in Fig. 3(a) is produced by summing the entire diffraction pattern (an example is shown in the inset) at each scan position. Sample contrast primarily arises due to incoherent Rutherford scattering of electrons to angles beyond the detector. The large circular feature in the bottom left corner is part of a mitochondrion organelle, while the darker spotted stripe structures are endoplasmic reticula studded with ribosomes.
The horizontal stripes in the as-measured image in Fig. 3(a) are from short period variations in the cold-FEG emission. The gun signal measured by the NC detector is shown in the inset to Fig. 3(b). Figure 3(b) itself shows the corrected image produced by minimising the contrast introduced by these gun emission current variations using a linear gun-noise model. This reveals much more detail in the BF image than previously seen in Fig. 3(a). Taking the corrected image as a reference, the power SNR of the uncorrected image, calculated using the implementation of the two-image method (Frank, 1980) in the fpd.utils module, is 11 dB, giving a measure of the improvement in the image quality by applying gun noise correction.
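The principle of the linear gun-noise correction can be illustrated with a minimal sketch (not the fpd implementation itself): the image intensity is modelled as varying linearly with the simultaneously recorded NC signal, and the fitted emission-correlated component is divided out.

```python
import numpy as np

def nc_correct_linear(image, nc):
    """Remove gun-emission fluctuations from an intensity image.

    A simple linear model, not necessarily identical to fpd's: each
    pixel value is assumed to scale linearly with the simultaneously
    recorded emission signal `nc`, so the nc-correlated component is
    fitted by least squares and divided out.
    """
    nc = nc / nc.mean()                      # normalised emission signal
    slope, offset = np.polyfit(nc.ravel(), image.ravel(), 1)
    model = slope * nc + offset              # predicted intensity per pixel
    return image * model.mean() / model

# Synthetic test: a flat sample with a 5% emission ripple along the slow scan.
nc = 1 + 0.05 * np.sin(np.linspace(0, 20, 64))[:, None] * np.ones((64, 64))
image = 1000 * nc                            # stripes follow the emission
corrected = nc_correct_linear(image, nc)     # stripes removed
```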
The Medipix3 data acquired following the above methodologies can be saved to disk on the acquisition computer in a flat binary format and may also be sent over the network using TCP/IP. Conversion of the binary data to more appropriate formats is discussed in Section V. The network transfer of data has many potential uses and we discuss these in the context of live data processing in the next section.

FIG. 3 Measured (a) BF image and (b) the same image after gun noise correction using the fpd.tem_tools.nc_correct function. The inset to (a) shows the summed diffraction pattern on a logarithmic scale, while that to (b) shows the recorded gun noise. The acceleration voltage was 200 kV, the objective lens was off, the condenser aperture was 30 µm, the camera length was 600 cm, the convergence semi-angle was 13.1 mrad, and the pixel spacing was 3.7 nm.

IV. LIVE DATA PROCESSING
Live feedback from the data collected by a fast pixelated detector in a STEM acquisition is crucial for both optimising imaging conditions and navigating to regions of interest in a sample. This is especially true for some modes of imaging where traditional STEM detectors may not produce useful contrast, such as when imaging magnetic features, which are typically not visible in STEM without a custom segmented detector and readout system (McGrouther et al., 2014). To facilitate real-time feedback, we developed the Python library fpd_live_imaging (FPD Live Imaging devs, 2015), which implements multiple common analysis routines and wraps processing routines from other libraries (fpd devs, 2015). Although the fpd_live_imaging package was developed for use with the Medipix3 detector and Merlin readout system, its design is modular and can easily be extended to work with any detector. Our implementation makes use of Python's multiprocessing library, with shared parameters passed through 'queue' objects or other shared memory, enabling good performance by taking advantage of the many cores available in modern CPUs.
The internal workings of the package are outlined in Fig. 4. The Medipix3 1R insertion and retraction mechanism (shown coloured in purple, i) is controllable through a serial interface (shown in green) via library function calls. As discussed in the previous section, the Merlin system (drawn in yellow, ii) can be interfaced with via two TCP/IP servers (shown in red), which are utilised by the fpd_live_imaging package (white, iii) to get data from the detector and to control the acquisition of data. The first step in the visualisation is receiving the raw binary data from the Merlin TCP/IP data interface using the receive_data_medipix function (iv). This function runs a TCP/IP socket which receives the raw binary data and passes it along to a parser function, and runs in its own CPU process so that it can handle the very high frame rate of the Medipix3 detector. Due to the nature of the TCP/IP protocol, the raw binary images can be split into different fragments. These fragments are pieced together in the parse function, which yields the image in the form of a NumPy array (Oliphant, 2006). The parse function also handles the bit depth of the data and the number of pixels in the detector, and likewise runs in its own separate CPU process.
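The fragment reassembly performed by the parser can be sketched as follows. The frame geometry and big-endian 16-bit data type are illustrative assumptions; the real parser also handles the Merlin frame headers and the detector's different bit depths.

```python
import numpy as np

def parse_frames(buffer, frame_bytes, dtype=">u2", shape=(256, 256)):
    """Assemble complete frames from accumulated TCP data.

    TCP delivers a byte stream, so a single receive call may contain a
    fragment of a frame or several frames. Bytes left over after the
    last complete frame are returned for the next call.
    """
    frames = []
    while len(buffer) >= frame_bytes:
        raw, buffer = buffer[:frame_bytes], buffer[frame_bytes:]
        frames.append(np.frombuffer(raw, dtype=dtype).reshape(shape))
    return frames, buffer

# Toy stream: two complete 4x4 frames of 16-bit data plus one stray byte.
buf = np.arange(32, dtype=">u2").tobytes() + b"\x00"
frames, leftover = parse_frames(buf, frame_bytes=32, shape=(4, 4))
```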
After the image has been constructed as a NumPy array, a copy is sent to any number of data processing classes. These data processing classes are shown in blue (v) in Fig. 4 and can be separated into two categories based on the imaging mode: scanning and parallel. The scanning data classes include virtual bright field and annular dark field imaging, where the input detector image is reduced to a single output value. In the parallel data classes, the output image is the same size as the input one, and the processing methods include passing through the input image, a thresholded version of the input image, or a Fourier transformed image. All of these run in separate CPU processes. In addition to the aforementioned processing classes are ones for single pixel extraction, centre of mass and phase-correlation for electro- or magnetostatic field imaging, and routines for HOLZ processing.
The processing time varies greatly, depending on the computational complexity of the routine (Nord et al., 2016), and the choice of routine depends upon the nature of the sample. For example, in magnetic imaging, the integrated induction components perpendicular to the electron path can be determined from deflections in the position of the bright field disc (Chapman and Scheinfein, 1999). The centre of mass calculation provides good contrast in many cases but can be affected by the crystallinity of the sample due to intensity diffracted from the bright field disc to angles either outside or inside the detector collection angle (Chapman et al., 1990). Phase- or cross-correlation approaches can greatly improve upon this at the expense of computation time, and can be crucial to detecting magnetic contrast in highly diffracting samples. On the other hand, single pixel extraction, where a single pixel on the edge of the disc is used as a measure of up to around pixel-level disc shifts, requires minimal processing and is orders of magnitude faster, taking approximately 2 µs when the 256×256 scan position 12-bit dataset is in memory (Nord et al., 2016). As each selected pixel gives a measure of a component of the integrated induction in a direction tangential to the disc, the use of only two pixels out of each diffraction image is sufficient to form a qualitative 2-D vector map, which allows the user to at least navigate to an appropriate position, magnification and focus. Multiple processes may be run sequentially or simultaneously, allowing the trade-off between runtime and sensitivity to be seen in real time.
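As a minimal illustration of the scanning-mode processing, the centre of mass of a single detector frame reduces to an intensity-weighted centroid:

```python
import numpy as np

def centre_of_mass(frame):
    """Intensity-weighted centroid (cy, cx) of a single detector frame.

    Shifts of the bright field disc move this centroid, giving a
    measure of the integrated induction perpendicular to the beam.
    """
    total = frame.sum()
    yy, xx = np.indices(frame.shape)
    return (yy * frame).sum() / total, (xx * frame).sum() / total

frame = np.zeros((8, 8))
frame[3, 5] = 2.0                   # a displaced 'disc' of one bright pixel
cy, cx = centre_of_mass(frame)      # centroid lands on that pixel
```

In the live-imaging case this reduction is applied to every incoming frame, producing one (cy, cx) pair per probe position.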
The output data from any kind of processing is sent to a visualisation class, which shows the result of the processing on the computer running the fpd_live_imaging package; this computer may be anywhere on the network. Due to rescaling of the intensity to optimise the contrast, this visualisation is qualitative, while the calculations themselves can be quantitative. The visualisation is separated into parallel and scanning modes, as shown in pink (vi) in Fig. 4, and these also run in separate CPU processes. An example of the visualisation GUI is shown in Fig. 5(b). In this case, the image is from thresholded centre of mass analysis of data from a patterned DC sputtered 8 nm permalloy film capped with 4 nm of copper. The 2 µm discs were patterned with a Ga focused ion beam, and the contrast in the resulting structures shows they support magnetic vortices. A detailed study of the sample will be published elsewhere. The GUI has buttons for setting the brightness and contrast during the acquisition, and the analysis parameters can be tuned during imaging, allowing for live optimisation of the required contrast. Alternatively, the processed data can be sent over TCP/IP to any computer on the network, for example, directly into Digital Micrograph.
All the above processes are orchestrated from the 'Acquisition Control' class (shown in brown, vii in Fig. 4), which handles the initialisation and connection of all of these separate functions. For ease of use, the Acquisition Control class can be accessed through a GUI, as shown in Fig. 5(a). This allows for starting and stopping of the acquisition, modification of the scan parameters, the addition and removal of processing classes, modification of their parameters, and insertion and retraction of the detector itself. The three separate stages described above (reading data from the detector, processing the images, and visualising or sending the result over TCP/IP) are implemented in a modular design, making it simple to add new detector data sources, image processing classes, and visualisation methods.
FIG. 5 fpd_live_imaging's graphical user interface, showing (a) the control window for the visualisation, and (b) thresholded centre of mass contrast of a patterned 8 nm permalloy film capped with 4 nm of copper. The contrast in the 2 µm discs represents the beam deflection along a single axis, and shows that the discs support magnetic vortices. The inset to (b) shows the thresholded detector image.

V. DATA STORAGE
The principal issues when choosing a file format for fast pixelated data are the ability to store the data with the dimensionality of the scan, store metadata along with the detector data, allow access to subsets of the data without reading the entire and often very large dataset into memory, support compression, and be an open format with read and write support across a variety of programming languages. An HDF5-based format (The HDF Group, 1997-2018) was chosen for our use since it meets all of the above requirements. The HDF format has long been widely used in the synchrotron community and is increasingly being used in electron microscopy (de la Peña et al., 2018; EMD authors, 2019; Somnath et al., 2019). It can be both read and written in a number of programming languages, including MATLAB, C++, Python, Java, R, and Gatan Digital Micrograph through a third party plugin (Niermann, T, 2016). The HDF5 format consists of an arbitrary structure of hierarchies of groups containing further groups or datasets, enabling the relationship between data to be indicated by the file structure. For datasets, the data type definitions are stored with the data, making them self-describing and ensuring maximum portability. Additionally, all groups and datasets can have attributes, allowing user and acquisition metadata to be stored along with the detector data in appropriate locations. The datasets may be of any number of dimensions, and so the format is ideal for multidimensional data from fast pixelated detectors when used in STEM or other acquisition modes.
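A minimal h5py sketch of such a file is given below. The group and attribute names are illustrative rather than the exact layout used by our converter, and a small toy dataset stands in for real detector data.

```python
import os
import tempfile

import h5py
import numpy as np

# Toy sparse 4-D dataset: 32x32 probe positions, 64x64 detector pixels.
rng = np.random.default_rng(1)
data = rng.poisson(0.1, (32, 32, 64, 64)).astype(np.uint16)

path = os.path.join(tempfile.mkdtemp(), "scan.hdf5")
# Group/attribute names below are illustrative, not a fixed specification.
with h5py.File(path, "w") as f:
    dset = f.create_group("fpd_expt").create_dataset(
        "data", data=data,
        chunks=(16, 16, 16, 16),            # chunk in scan and detector axes
        compression="gzip", compression_opts=4,
    )
    dset.attrs["units"] = "counts"
    f["fpd_expt"].attrs["convergence_semi_angle_mrad"] = 22.4  # example metadata

with h5py.File(path, "r") as f:
    frame = f["fpd_expt/data"][0, 0]        # only the chunks touched are read
```

The final read demonstrates the key property motivating the format choice: a single frame can be extracted without loading the whole dataset into memory.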
FIG. 6 (a) Example of potential dataset chunking for data from a 1-D scan stored in an HDF5 file. (b) Data indexing sequence for chunked data. (c)-(e) HDF5 chunk performance metrics for the 256×256 probe position STEM dataset from Fig. 3 with a 256×256 Medipix3 detector in 12-bit mode. Level 4 GZIP compression was used. The upper and lower insets are the same data on linear and logarithmic scales, respectively. The (e) read and (c) write times are normalised to the optimum value used. The compression ratios (d) are those of the entire HDF5 file relative to the raw Merlin binary file. The read times are those required to load a 128-sided hypercube into an in-memory NumPy array using h5py. All tests were performed on a single HDD.
HDF5 has in-built support for a variety of compression algorithms and other so-called 'filters', all providing transparent read and write access to the data. To allow access to subsets of the data without having to decompress the entire dataset, the dataset can be divided into smaller pieces and stored in a B-tree, a balanced hierarchical data structure, by enabling 'chunking'. Figure 6(a) shows an example of the potential chunking of a one-dimensional (1-D) scan dataset. The stack of images (shown on the left) occupy a 3-D data 'cube' (middle), with one axis being the scan dimension. On the right of panel (a) we show the same dataset with two chunks along each dimension, with each chunk in a different colour. The dataset access sequence is summarised in Fig. 6(b). When indexing a chunked dataset, the B-tree is navigated and each chunk containing the required data is decompressed and only the selected components are returned. For example, when reading the image slice shown by the blue dashed line in the right of Fig. 6(a), each of the top four chunks must be read.

A. Chunk Size
When choosing a chunk size, a compromise is made between the cost of B-tree navigation, compression level, and data reading speed, with the optimum choice ultimately depending on the intended data access pattern. For STEM data, the diffraction pattern can be sparse and compression can be optimised by chunking in both the scan and image dimensions. In addition, data from the Medipix3 detector is zero-padded to align it with common data types (e.g. 12-bit data is stored as 16-bit), allowing compression to achieve significant reductions in file size. For example, with a chunking of 16 along each axis of a 4-D dataset, the in-built lossless GZIP compression at level 4 typically reduces the data size of a scanning acquisition of 256×256 probe positions in 12-bit mode from 8.6 GB to 2.7 GB.
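The effect of sparsity and zero-padding on compression can be demonstrated on a scaled-down synthetic dataset. The sizes and the resulting ratio below are illustrative only, not reproductions of the figures quoted in the text:

```python
import os
import tempfile

import h5py
import numpy as np

# Toy 4-D dataset with a sparse diffraction pattern: 12-bit counts in a
# central 16x16 region, zero elsewhere, zero-padded into uint16. Sizes are
# scaled down from the 256x256 case quoted in the text.
rng = np.random.default_rng(0)
data = np.zeros((32, 32, 64, 64), dtype=np.uint16)
data[..., 24:40, 24:40] = rng.integers(0, 2**12, size=(32, 32, 16, 16),
                                       dtype=np.uint16)

fname = os.path.join(tempfile.mkdtemp(), 'compressed.hdf5')
with h5py.File(fname, 'w') as f:
    f.create_dataset('data', data=data, chunks=(16, 16, 16, 16),
                     compression='gzip', compression_opts=4)

# Ratio of HDF5 file size to the raw array size, as in Fig. 6(d).
ratio = os.path.getsize(fname) / data.nbytes
```

The large zero-valued regions and the padded upper nibble of each 16-bit value compress very effectively, giving a ratio well below unity.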
Figures 6(c)-6(e) show three HDF5 performance metrics as a function of the hypercube chunk edge length for the liver sample data in Fig. 3: normalised write time, compression ratio and normalised read time. Hypercubes were used in this test since these give the most uniform data access properties across different axes. This can also improve data processing efficiency by allowing a reduction of the volume of data that must be read for some analyses. For example, with a chunk size of 16 along each axis, extracting the direct beam from a dataset in which it resides in four detector chunks would require loading into memory only the chunks containing it, corresponding to only 1.6% of the total file.
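The chunk bookkeeping behind the quoted ~1.6% figure can be made explicit, assuming the direct beam is required at every probe position (as when forming a virtual bright field image):

```python
# 256x256 scan, 256x256 detector, hypercube chunks of edge 16, with the
# direct beam spanning four detector chunks.
scan_len, det_len, chunk = 256, 256, 16

scan_chunks = (scan_len // chunk) ** 2   # chunks spanning the two scan axes
beam_chunks = 4                          # detector chunks holding the beam
total_chunks = (scan_len // chunk) ** 2 * (det_len // chunk) ** 2

fraction = scan_chunks * beam_chunks / total_chunks   # ~0.0156, i.e. ~1.6%
```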
As the chunk size is increased, the compression ratio [Fig. 6(d)] goes through a minimum at a hypercube chunk length of around 16, and then gradually worsens, while the write time [Fig. 6(c)] initially decreases with chunk size and then plateaus at a similar point. Although the read time [Fig. 6(e)] shows a continual improvement with chunk size, reflecting in part the lower B-tree navigation overhead, the same optimum chunk size marks the approximate corner of the curve, with little improvement seen at larger chunk sizes. The read times plotted in the graph are for reading one quarter of the entire dataset, and so potentially large penalties will arise with larger chunk sizes when indexing smaller subsets of the data.
From the performance metrics shown in Fig. 6, a reasonable optimum chunk size is a hypercube of edge length 16, and this is the default value in our implementation. For markedly different datasets or where the data access pattern is known in advance, the optimum chunking may be somewhat different and this can be set by the user at the point of conversion.

B. Merlin Data
The Merlin readout software stores the detector and readout system parameters in a separate header file, and the detector data as a stream of uncompressed binary data, with each image preceded by a variable-length header of acquisition parameters specific to that image. The MerlinBinary class from the fpd_file module of the fpd library (fpd demos devs, 2018; fpd devs, 2015) allows parsing of data files, array access to raw data using memory mapping, and conversion to the HDF5 format. The scan parameters and metadata can be extracted from Digital Micrograph (DM) files acquired simultaneously with the diffraction patterns, or may be supplied separately. In the former case, the DM files are accessed through the HyperSpy library (de la Peña et al., 2018) and are also embedded in the HDF5 file as raw binary blobs for reuse in the proprietary DM software. All DM files are additionally stored in the HDF5 file in the open EMD format (EMD authors, 2019) (discussed in the next section). Examples of Merlin data converted to HDF5 format using the MerlinBinary class are available in the open data deposit for this work (Nord et al., 2019b).

Figure 7(a) shows an example of a 2-D scan in which the acquisition is triggered by the microscope scanning system. Images, indicated in green, are acquired on a regular scan grid. As discussed in Section III, during the 'flyback' time, when the beam is being moved from the end of one row to the start of the next, the DigiScan system continues to send triggers, causing additional images to be acquired. These are shown in red, and may be excluded during data access and conversion to the HDF5 format with appropriate parameter settings.
Most pixelated STEM datasets are 4-D, with the first two axes being the scan axes and the last two the detector axes. However, the Medipix3 detector can be operated in colour mode, in which an additional axis representing multiple thresholds sits between the scan and detector axes. This axis, while not generally used in STEM acquisitions at present, is used for spectroscopic X-ray imaging, is useful for characterising detector performance with X-rays, and is supported by the fpd library.
While HDF5 conversion is most appropriate for data archival and later processing of data acquired under all modes of operation, the MerlinBinary class also provides a memory-mapped array interface to the data on disk for most, but not all, acquisitions. For example, 1-bit data acquired in raw mode is stored as 1-bit by the Merlin system with its image segments out of order, and so cannot currently be memory mapped easily. In most other cases, this mode of access allows the dataset to be visualised or processed without converting the data on disk to the HDF5 format, which is particularly useful for checking datasets immediately after acquisition.
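The principle behind such memory-mapped access can be sketched with NumPy. The file layout below is entirely hypothetical (real Merlin headers differ in content and size), but it illustrates the key requirement: a constant per-frame header length, which lets a structured dtype map the whole file as an array without reading it into memory.

```python
import os
import tempfile

import numpy as np

# Hypothetical raw layout: each frame is a fixed-length header followed by a
# 64x64 big-endian uint16 image.
hdr_len, det_shape, n_frames = 128, (64, 64), 8

fname = os.path.join(tempfile.mkdtemp(), 'frames.bin')
with open(fname, 'wb') as f:
    for i in range(n_frames):
        f.write(b'H' * hdr_len)                               # stand-in header
        f.write(np.full(det_shape, i, dtype='>u2').tobytes())  # image data

# A structured dtype views each frame as a (header, image) record, giving
# lazy array access to the whole file from disk.
frame_dtype = np.dtype([('header', 'S%d' % hdr_len),
                        ('image', '>u2', det_shape)])
frames = np.memmap(fname, dtype=frame_dtype, mode='r', shape=(n_frames,))
images = frames['image']   # (8, 64, 64) view, read from disk on demand
```

Data is only paged in from disk as it is indexed, so even very large acquisitions can be browsed immediately after recording.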

C. HDF5 File Structure
Figure 7(b) shows an overview of one of the HDF5 files read in HDFView (The HDF Group, 2017), a Java GUI program that allows, amongst many other things, quick inspection of HDF file contents. Information from the binary headers, such as the DAC values, the image exposure, the comparator threshold values and the acquisition time, is automatically extracted and included as datasets in the HDF5 file. During conversion to the HDF5 format, sums over both the image and diffraction dimensions are generated and stored in the HDF5 file as separate datasets, resulting in bright field and diffraction sum images. These images may be used for data inspection and navigation without having to process the entire dataset in order to re-render them every time the file is loaded. Also during conversion, any bad pixels in the detector (hot, noisy or dead) may be replaced by interpolated values when the user supplies a mask image, which can be important for optimisation of some forms of data analysis.
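The two sum images are simple axis reductions of the 4-D array, sketched here with NumPy on a toy dataset (the axis ordering follows the scan-first convention used elsewhere in this paper):

```python
import numpy as np

# Toy 4-D dataset: (scan_y, scan_x, det_y, det_x).
rng = np.random.default_rng(1)
data = rng.poisson(2.0, size=(8, 8, 32, 32)).astype(np.uint16)

# Summing over the detector axes gives one value per probe position:
# a bright field like navigation image.
sum_im = data.sum(axis=(-2, -1))    # shape (8, 8)

# Summing over the scan axes gives the summed diffraction pattern.
sum_dif = data.sum(axis=(0, 1))     # shape (32, 32)
```

Storing these small reductions alongside the raw data makes navigation essentially free, since neither needs to be recomputed from the full dataset on each load.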
The detector data, including images created from it, is stored in the EMD format, a simple open subset of the HDF5 format, created by a specific collection of datasets and attributes (EMD authors, 2019). The EMD datasets may be read in software such as EMD viewer (EMDViewer devs, 2015), HyperSpy (de la Peña et al., 2018) and, of course, any HDF5 reader. Many utility functions are also provided in the fpd.fpd_file module, allowing conversion to other formats and extraction of data.
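A simplified sketch of the EMD conventions is given below: a group carrying an 'emd_group_type' marker attribute holds a 'data' dataset plus one 'dimN' dataset per axis with 'name' and 'units' attributes. The group path is illustrative only; consult the EMD specification for the authoritative layout.

```python
import os
import tempfile

import h5py
import numpy as np

# Toy 4-D dataset: (scan_y, scan_x, det_y, det_x).
data = np.zeros((4, 4, 64, 64), dtype=np.uint16)

fname = os.path.join(tempfile.mkdtemp(), 'example_emd.hdf5')
with h5py.File(fname, 'w') as f:
    g = f.create_group('fpd_expt/fpd_data')   # hypothetical group path
    g.attrs['emd_group_type'] = 1             # marks the group as an EMD dataset
    g.create_dataset('data', data=data)
    # One 'dimN' dataset per axis, holding the axis values and labels.
    for i, (name, size) in enumerate(zip(['scan_y', 'scan_x', 'det_y', 'det_x'],
                                         data.shape), start=1):
        dim = g.create_dataset('dim%d' % i, data=np.arange(size))
        dim.attrs['name'] = name
        dim.attrs['units'] = 'pixels'

with h5py.File(fname, 'r') as f:
    g = f['fpd_expt/fpd_data']
    group_type = int(g.attrs['emd_group_type'])
    dim_names = [g['dim%d' % i].attrs['name'] for i in range(1, 5)]
```

Because this structure is plain HDF5, any generic HDF5 reader can access the data even without EMD-aware software.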
We note that work is currently underway by the LiberTEM project (LiberTEM devs, 2018) to include (transmission) electron microscopy data in the NeXus data format (Könnecke et al., 2015). The NeXus format is an open subset of the HDF5 file format, originally developed to improve data exchange within the fields of neutron, X-ray and muon experiments. Having a common data format across all of these fields would be beneficial, since it would make the sharing of data and of processing routines that rely on metadata easier than is currently the case. This format could clearly be used within the suite of tools described in this paper.

D. Merlin Equipped SPED Systems
A recent development in precession electron diffraction (Midgley and Eggeman, 2015) (PED) is the use of fast pixelated detectors. One such example is the work by NanoMEGAS to incorporate a Medipix3 DED into their DigiSTAR precession system in order to enable high fidelity recording of diffraction patterns in scanning PED (SPED) applications, as has been tested in recent work (MacLaren et al., 2020). Additional benefits brought about by the use of DEDs in SPED will be discussed in Part II of this work. We note here, however, that the properties of the 4-D dataset obtained by such a system are in many ways equivalent to those of 4-D non-SPED datasets, and so many of the same issues of data access, storage, and processing apply here too. To enable these datasets to be more easily used, the topspin_app5_to_hdf5 function of the fpd.fpd_io module allows conversion of data originally recorded in the native NanoMEGAS TopSpin app5 format to one almost identical to the HDF5 format outlined above. Alternatively, the Merlin acquisition software can be programmed to output the data directly to a raw file whilst acquisition is being performed and controlled by the TopSpin software. The main differences between the converted files are the inclusion of precession metadata instead of Medipix3 metadata in SPED datasets, and the absence of simultaneously acquired DM datasets.

VI. SUMMARY
The use of fast pixelated detectors for electron imaging is a burgeoning field with the prospect of revolutionising many aspects of transmission electron microscopy (TEM) and, in particular, scanning TEM. We have presented many of the key tools needed to i) acquire data from fast pixelated detectors, ii) analyse the data from them in real time and visualise the results, and iii) store the data in an optimised way.
The software packages presented are hosted in public repositories (fpd demos devs, 2018; fpd devs, 2015; FPD Live Imaging devs, 2015; Merlin DM Plugin devs, 2017; Merlin Interface devs, 2016; pixStem devs, 2015), are under active development, and contain many more features than are covered in this short publication. Many of the data analysis algorithms in these libraries are applicable to data from any detector. Most of these packages are provided under an open source licence, permitting transparency of the implemented algorithms and allowing them to be continually improved upon by the community.
Part II of this paper will cover post-acquisition processing and visualisation of data from fast pixelated detectors, with examples of their application to materials studied using scanning transmission electron microscopy.