Application of benchtop micro-XRF to geological materials

Abstract
Recent developments in X-ray optics have led to a range of commercially available benchtop micro-XRF (μ-XRF) instruments that can produce X-ray spot sizes of 20–30 μm on the sample, allowing major- and trace-element analysis on a range of sample types and sizes with minimal sample preparation. Such instruments offer quantitative analysis using fundamental-parameter-based 'standardless' quantification algorithms. The accuracy and precision of this quantitative analysis on geological materials, and the application of micro-XRF to wider geological problems, are assessed using a single benchtop micro-XRF instrument. Quantitative analysis of internal reference materials and international standards shows that such instruments can provide highly reproducible data but that, for many silicate materials, standardless quantification is not accurate. Accuracy can be improved, however, by using a simple type-calibration against a reference material of similar matrix and composition. Qualitative analysis with micro-XRF can simplify and streamline sample characterization and processing for subsequent geochemical and isotopic analysis.


Introduction
Chemical characterization of rocks and minerals is fundamental to the study of geology and earth sciences. Major-, minor- and trace-element abundances determined by X-ray fluorescence (XRF) are employed routinely to characterize and understand bulk rock geochemistry, whilst electron microprobe analysis (EMPA) provides major-, minor- and some abundant trace-element concentrations for mineral samples at high spatial resolution (micrometre scale). Typical XRF and EMPA techniques often complement each other, but neither routinely provides high spatial resolution trace-element data, for which researchers have to rely on synchrotron radiation X-ray micro-beam XRF (SR-μXRF) or laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS). Over the last decade SR-μXRF has been increasingly employed to provide high spatial resolution, non-destructive analysis of major and trace elements for a wide range of geoscience applications, including mineralogy and petrology (Figueiredo et al., 1999; Cauzid et al., 2006; Schmidt et al., 2012) and palaeontology (Bergmann et al., 2012; Gorzelak et al., 2013). Micro-beam XRF is evidently a highly valuable technique but, because X-ray beams are difficult to focus, its availability has previously been limited to specialist synchrotron facilities where the high flux of X-rays allows production of a small X-ray spot size by use of collimator optics. The development of capillary optics that can focus X-rays to produce a beam on the order of tens of micrometres (Haschke and Haller, 2003; Guilherme et al., 2012) has recently facilitated the development of laboratory-based, benchtop micro-XRF (μ-XRF) instruments. While these instruments do not yet have the sensitivity and lateral resolution of SR-μXRF, they nevertheless have the potential to contribute important information to geological research.

Benchtop μ-XRF
Commercially available benchtop μ-XRF instruments tend to be marketed as non-destructive, highly precise elemental analysis tools that can be applied to a versatile range of sample types and sizes, due to the small X-ray beam size. Commonly advertised applications focus on imaging elemental variations in a sample, as is routinely carried out for major elements using EMPA and scanning electron microscopy energy dispersive spectroscopy (SEM-EDS). Most μ-XRF systems allow analysis of larger samples than is possible with electron beam techniques, but a relative disadvantage is the lower resolution (tens of micrometres vs. nano to micrometre scale) due to the larger incident radiation beam diameter (typically 20-50 μm), and the larger sample interaction volumes associated with X-rays compared to electrons. With XRF the information depth varies with both the atomic number of the fluorescing element and the sample matrix, and is much greater than for electron-beam techniques. This can be an advantage in that, for heavier elements, good quality maps can be produced from rough, unpolished surfaces. This difference in the information depth of X-rays from different elements is illustrated in Fig. 1, which shows a multi-element map, displaying silicon and iron, of a piece of polished silicate glass mounted in epoxy resin. Here, the information depth of Si (Z = 14) is much shallower than that of Fe (Z = 26) because the lower energy X-rays are attenuated by the sample and epoxy-mount matrix. This results in Si only being 'visible' (as a combination of red Si and green Fe = yellow) where the glass is exposed on the mount surface, i.e. Si has an information depth of <10 µm. By contrast, the Fe is 'visible' through the epoxy resin with an information depth of up to 1 mm.
The X-rays emitted by the Fe show increasing attenuation with depth beneath the surface of the mount, resulting in a shaded relief image that gives an indication of 3D structure of the glass sample beneath the surface of the resin.
This difference in information depth can be both an advantage and a hindrance when analysing small and fine-scale features. On the one hand, it is possible to identify sub-surface phases (e.g. a magnetite inclusion in feldspar may be visible as a Fe-hotspot within the feldspar, but is not visible on the sample surface); but this also means that analysis of small features is difficult as X-rays from below the feature may be detected (e.g. a spot analysis of a 100 µm apatite crystal in a basalt may appear to contain Fe because characteristic X-rays derived from Fe-rich material beneath the apatite are able to transmit through the crystal).
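The energy dependence of information depth described above follows from exponential (Beer-Lambert) attenuation of the outgoing fluorescence. The following is a minimal sketch of that relationship only: the function name, the 99% signal criterion, the take-off angle and all numerical inputs are illustrative assumptions, not tabulated attenuation data or values from this study, and attenuation of the incoming beam is ignored.

```python
import math

def information_depth_um(mu_rho_cm2_g, density_g_cm3, takeoff_deg=90.0, fraction=0.99):
    """Depth (in µm) above which `fraction` of the detected signal originates.

    Only attenuation of the outgoing fluorescence is considered:
    cumulative signal(d) = 1 - exp(-mu * d / sin(takeoff)).
    """
    mu_linear = mu_rho_cm2_g * density_g_cm3            # linear attenuation, 1/cm
    path_factor = math.sin(math.radians(takeoff_deg))   # shallow exit angle = longer path
    depth_cm = -math.log(1.0 - fraction) * path_factor / mu_linear
    return depth_cm * 1.0e4                             # cm -> µm

# Placeholder coefficients (NOT real data): a strongly absorbed low-energy line
# and a weakly absorbed higher-energy line in the same hypothetical matrix.
shallow = information_depth_um(mu_rho_cm2_g=1000.0, density_g_cm3=2.5)
deep = information_depth_um(mu_rho_cm2_g=50.0, density_g_cm3=2.5)
print(f"low-energy line: {shallow:.1f} um, higher-energy line: {deep:.1f} um")
```

In this simplified picture the information depth scales inversely with the attenuation coefficient, so a line that is attenuated twenty times less is detectable from twenty times deeper.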
While benchtop μ-XRF is primarily marketed as a tool for qualitative analysis (element mapping, line scans), the commercial software attached to many μ-XRF instruments offers fundamental parameter (FP) based 'standardless' quantification of X-ray spectra, typically with an option for further standard calibration. Elam et al. (2004) tested the accuracy of FP-based standardless quantification on bulk alloys and bulk oxide certified reference samples and suggested that the accuracy for most elements is better than 1%. This contrasts with the findings of Newbury and Ritchie (2013), who noted that, while standardless quantification procedures used in SEM-EDS work were highly precise, their accuracy was low. Given this context, we now provide a summary of the fundamental parameter quantification method, discuss why this is the preferred quantification method for μ-XRF analysis, and highlight potential sources of error.

'Standardless' X-ray spectrum quantification models using fundamental parameters, and application to μ-XRF analysis
During X-ray spectrometry, many variables contribute to the measured X-ray spectrum, such as the elements present in the sample, the density, structure and composition of the sample matrix, absorption and enhancement of X-rays and secondary fluorescence, and the voltage, current, geometry and source of the excitation beam. As a result, converting X-ray spectra into elemental concentrations (i.e. quantifying a spectrum) is a complex, high-effort process. In general, X-ray spectrum quantification procedures can be classified as either standard-based (empirical) or 'standardless' models, although hybrid procedures are common. Standard-based models use empirically determined influence coefficients to describe the relationship between concentrations and measured intensities (Potts and Webb, 1992; Kanngießer, 2003). Influence coefficients are determined for each element of interest by analysis of well-characterized reference materials, or standards, which must be comparable in matrix and composition to the samples being analysed. This is one of the simplest approaches to spectrum quantification, but the need for a large number of standards of similar matrix to the sample is a drawback (Potts and Webb, 1992). The validity range of influence coefficients can be extended beyond that of available standards by using physical models for the influence coefficients. In this case, certain influence coefficients (commonly those for minor and trace elements) are predicted via fundamental parameter (FP) calculations (see below), rather than measured on a suite of standards, meaning that a wider range of elements can be measured using fewer standards (Potts and Webb, 1992). The range of concentrations that can be analysed with these hybrid methods is wider but a large number of standards are still required.
Several such hybrid empirical-FP quantification schemes have been developed, mostly for special applications and by different instrument manufacturers, with the aim of improving accuracy and reducing the number of required references (Potts and Webb, 1992; Pereira and Brandao, 2001; Rousseau, 2009).
'Standardless' quantification using fundamental parameters is based on the Sherman equation (Sherman, 1955), which uses atomic fundamental parameters (such as absorption, scattering and emission parameters/coefficients) for each element to calculate predicted X-ray intensities for given concentrations (see Supplementary Information. Note: Supplementary Information, Figures and Data have been deposited with the Principal Editor of Mineralogical Magazine and are available at https://www.minersoc.org/pages/e_journals/dep_mat_mm.html). Unfortunately, this equation cannot be inverted analytically to allow calculation of concentrations from given X-ray intensities. However, as computing power has improved, it has become possible to accurately estimate concentrations from X-ray spectra via forward calculation of X-ray intensities for samples with assumed concentrations. In this case the measured and calculated X-ray intensities can be compared and the assumed concentrations improved by iterating the calculation with refined concentration assumptions until convergence of predicted and measured intensities is achieved (Potts and Webb, 1992). Using this method, quantified results are independent of the actual measurement conditions because these are incorporated in the calculated excitation spectrum, which is a required fundamental parameter for this method (Ebel, 1999). Fundamental parameter methods give the best results when a full X-ray spectrum is calculated, rather than just the characteristic X-ray lines of interest; this uses physical theory to calculate the spectrum background, so can give improved sensitivity for trace elements whose characteristic X-ray lines might be hidden in a high background, or in tails of higher intensity peaks, and facilitates more accurate peak identification by fitting multiple X-ray lines (Elam et al., 2006).
Calculation of the full spectrum also allows the influence of undetectable elements, such as O and C, to be considered by calculating major-element compositions as assumed stoichiometric compounds such as oxides and carbonates.
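The forward-calculate, compare and refine loop described above can be illustrated with a deliberately simplified toy model. The sketch below is not the Sherman equation in full and is not any manufacturer's algorithm: it assumes monochromatic excitation, primary fluorescence only and geometry factors of one, and every sensitivity and attenuation value is a made-up placeholder. The function names are hypothetical.

```python
def forward_intensities(conc, sens, mu_in, mu_out):
    """Toy forward model: predicted line intensity for each element.

    conc      : mass fractions (sum to 1)
    sens      : per-element sensitivity (placeholder values)
    mu_in[j]  : attenuation of the exciting beam by element j
    mu_out[i][j]: attenuation of element i's line by element j
    """
    n = len(conc)
    predicted = []
    for i in range(n):
        absorb = sum(conc[j] * (mu_in[j] + mu_out[i][j]) for j in range(n))
        predicted.append(sens[i] * conc[i] / absorb)
    return predicted

def quantify(measured, sens, mu_in, mu_out, tol=1e-10, max_iter=500):
    """Iterate assumed concentrations until predicted and measured intensities agree."""
    n = len(measured)
    conc = [1.0 / n] * n                      # start from equal mass fractions
    for _ in range(max_iter):
        calc = forward_intensities(conc, sens, mu_in, mu_out)
        # Scale each concentration by measured/calculated intensity, then
        # normalise so that the mass fractions sum to 1 (i.e. 100%).
        refined = [c * m / k for c, m, k in zip(conc, measured, calc)]
        total = sum(refined)
        refined = [c / total for c in refined]
        if max(abs(a - b) for a, b in zip(refined, conc)) < tol:
            return refined
        conc = refined
    return conc

# Hypothetical three-element sample; all coefficients are illustrative only.
true_conc = [0.6, 0.3, 0.1]
sens = [1.0, 2.0, 3.0]
mu_in = [2.0, 5.0, 9.0]
mu_out = [[10.0, 20.0, 30.0], [4.0, 8.0, 25.0], [1.0, 3.0, 6.0]]
measured = forward_intensities(true_conc, sens, mu_in, mu_out)
print([round(c, 6) for c in quantify(measured, sens, mu_in, mu_out)])
```

Because the toy intensities depend only on the relative proportions of the elements, the normalization step is what fixes the absolute scale of the result.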
For benchtop μ-XRF, quantification via a standardless model is considered to be the best option; the heterogeneity of the samples most likely to be analysed means that large compositional and matrix differences may exist within a small area, meaning that a large set of reference materials would be required if empirical methods were to be used for quantification (Kanngießer, 2003). It can also be difficult to find a suitable range of well characterized reference materials which are homogeneous at the 20-50 µm scale. This results in a high analytical effort for empirical-based models compared to standardless FP-based models, which is difficult to justify if the improvement in accuracy over FP-models is small. FP-based 'standardless' quantification procedures are used by a number of commercial benchtop μ-XRF manufacturers (e.g. Bruker Nano, EDAX). Such algorithms rely on a database of atomic fundamental parameters for each element, the most comprehensive and up to date of which was compiled by Elam et al. (2002). Using these FP algorithms, concentrations are calculated as mass fractions, normalized to 100%, to avoid systematic errors in the geometric factors used when forward calculating X-ray intensities (Elam et al., 2004).
These FP methods are commonly referred to as 'standardless', because day-to-day measurement of standards is not necessary to calculate concentrations in a sample. However, it is still necessary to consider the influence of the X-ray focusing optics on the excitation spectrum (Padilla et al., 2005; Wolff et al., 2011) and this involves measurement of the scattered spectrum on a small number of pure element standards. For commercially produced instruments, this is typically completed in the factory prior to delivery and is not carried out by instrument users.
Sources of error for FP-based quantification include errors in the fundamental parameters themselves, incomplete consideration of all X-ray-sample interactions (including incorrect assumptions regarding the concentrations of unmeasurable elements, such as oxygen, carbon and hydrogen), and incorrect description of the measurement geometry (Rousseau, 2006). These errors can be minimized and the accuracy of FP-based model results further improved by using an additional type-calibration. In this case, a single reference material of similar composition to the sample is analysed and correction factors are determined for the mass fraction of every element of interest. This calibration is typically available as a function within the instrument software, which incorporates the correction factors into the concentration calculations and ensures that concentrations total 100%.
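The arithmetic of a type-calibration can be sketched as follows. This is one plausible reading of the procedure described above (a per-element factor equal to the known value divided by the measured value, followed by renormalization), not the instrument software's actual implementation; the function names and all compositions are hypothetical.

```python
def type_calibration_factors(reference_known, reference_measured):
    """Per-element correction factor = known / measured on the reference material."""
    return {el: reference_known[el] / reference_measured[el] for el in reference_known}

def apply_type_calibration(sample_measured, factors):
    """Apply the factors to a standardless result, then renormalise to 100 wt.%."""
    corrected = {el: value * factors.get(el, 1.0) for el, value in sample_measured.items()}
    total = sum(corrected.values())
    return {el: 100.0 * value / total for el, value in corrected.items()}

# Illustrative numbers only (wt.%), not data from this study.
known = {"SiO2": 70.0, "Al2O3": 20.0, "CaO": 10.0}
standardless = {"SiO2": 66.0, "Al2O3": 22.0, "CaO": 12.0}
factors = type_calibration_factors(known, standardless)

unknown_sample = {"SiO2": 64.0, "Al2O3": 23.0, "CaO": 13.0}
print(apply_type_calibration(unknown_sample, factors))
```

The correction is only as good as the match between reference and sample, which is why the reference material should have a similar matrix and composition.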

Assessing benchtop μ-XRF as an analytical tool for geological materials
In this paper, we assess how useful benchtop μ-XRF systems are likely to be when applied to qualitative and quantitative analysis of geological materials. We use a range of sample types as case studies to assess how qualitative analysis using a benchtop μ-XRF system can contribute to sample characterization and streamlining of workflows when a sample is being prepared for other analytical techniques. We then go on to test the accuracy and precision of quantitative analysis of geological materials, by measuring international and internal silicate reference materials. In this first assessment of the quantitative ability of the benchtop μ-XRF technique, we chose to focus on the simplest and most homogenous sample geometries possible, in order to rule out any analytical variation or inaccuracies that might derive from sample inhomogeneity, surface roughness, sample edge effects or thickness inconsistencies. To this end, large fragments of silicate glasses (polished if a flat surface was not available) and pressed pellets of powdered silicate rocks were used as reference materials. Mineral standards were not analysed due to difficulties in acquiring samples that are confirmed to be homogenous at the ∼3 mm scale necessary to ensure that quantitative analyses are not influenced by sample edge effects or thickness inconsistencies. Likewise, it was not possible to compare glass vs. powder data, or data from rough and polished surfaces, due to a lack of suitably large and homogenous reference materials and a lack of in-house polishing equipment.
Throughout, we discuss some of the ways that benchtop μ-XRF instruments can contribute to geological and mineralogical research, together with weaknesses of the technique that users should be aware of.

Instrumentation
For this study we used a commercially available benchtop μ-XRF instrument, the M4 Tornado, produced by Bruker Nano. This system has a Rh X-ray tube with a Be side window and polycapillary optics giving an X-ray beam with a diameter of 25-30 µm on the sample. The X-ray tube can operate up to 50 kV and 800 µA, although the transmission function of the polycapillary optics is low for higher energies, limiting the range of high-energy lines that can be excited; e.g. Ba K-lines at ∼32-36 keV are not excited to a detectable level. X-rays are detected by a 30 mm² xflash® Silicon Drift Detector with an energy resolution of <135 eV at 250,000 cps (measured on MnKα). The sample chamber (600 mm × 350 mm × 260 mm) facilitates analysis of large samples and allows analysis either at atmospheric pressure or under oil-free and controlled vacuum by use of a pressure-controlled diaphragm pump; in this study, all analyses were carried out at 20 mbar vacuum. Scanning and sample navigation is by a motorized stage which moves the sample beneath the static X-ray beam.
All data acquisition and processing was carried out using the proprietary Bruker software supplied with the instrument.
Quantitative analyses were carried out only after the X-ray tube had been switched on for at least 1.5 h, to reduce errors from beam instability whilst the tube is warming up. Unless otherwise stated, spectrometer energy calibration was carried out twice daily by analysing a pure Cu standard and tuning the spectrum according to the zero and CuKα peaks (see Supplementary Data Table S1, deposited at https://www.minersoc.org/pages/e_journals/dep_mat_mm.html, for details on long-term detector drift).

Qualitative geochemical analysis with μ-XRF
The qualitative abilities of the M4-Tornado and its associated proprietary software were assessed using a series of case studies designed to explore the capability and limitations of the instrument for characterization of geological materials and streamlining of sample preparation workflows. Element maps and line scans were most commonly used for this purpose.
Element mapping produces 2-dimensional compositional maps by collecting an entire X-ray spectrum for each pixel in a grid; single or multiple elements can be displayed during and after map acquisition. For a given element displayed on a map, pixel intensity is proportional to the intensity of the X-ray spectrum in the selected region of interest (ROI). By default, an element's ROI is centred on the elemental peak with the highest intensity (Kα peaks in many cases) but alternative peaks can be selected for display and it is possible to manually select and display 'free regions' of the spectrum on the map. These features allow meaningful element maps to be produced when elements with overlapping characteristic X-ray energies are present in a sample, and when artefact peaks interfere with the ROI of an element (Supplementary Fig. S1, deposited at https://www.minersoc.org/pages/e_journals/dep_mat_mm.html). Post-collection data processing can display and quantify the spectrum for the entire map, or for selected areas of the map.
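The relationship between a per-pixel spectrum and a displayed map value can be sketched in a few lines. This is an illustrative reconstruction of the ROI idea only, not the proprietary software's processing: the data layout (a nested list of per-pixel spectra), the function name and the tiny example cube are all assumptions. The example window brackets the Fe Kα line at about 6.40 keV.

```python
def roi_map(cube, energies_kev, roi_kev):
    """Sum the counts inside an energy window for every pixel.

    cube        : rows of pixels, each pixel a list of counts per channel
    energies_kev: energy of each channel (same length as a pixel spectrum)
    roi_kev     : (low, high) window; an element ROI or a 'free region'
    """
    low, high = roi_kev
    channels = [k for k, e in enumerate(energies_kev) if low <= e <= high]
    return [[sum(spectrum[k] for k in channels) for spectrum in row] for row in cube]

# Hypothetical 2 x 2 pixel map with five energy channels.
energies = [6.2, 6.3, 6.4, 6.5, 6.6]
cube = [
    [[1, 2, 10, 3, 1], [0, 1, 2, 1, 0]],
    [[5, 5, 5, 5, 5], [0, 0, 0, 0, 9]],
]
fe_map = roi_map(cube, energies, roi_kev=(6.25, 6.55))
print(fe_map)  # [[15, 4], [15, 0]]
```

Passing a different window to the same function corresponds to choosing an alternative peak or a free region, which is how overlapping lines and artefact peaks can be avoided.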
Line scans measure the entire X-ray spectrum emitted by a sample whilst scanning along a line between two specified points. X-ray intensity in the ROI for the element of interest is displayed as a proxy for relative element concentration.

Quantitative and semi-quantitative geochemical analysis with μ-XRF
The quantitative abilities of the M4-Tornado and its associated proprietary software and quantification algorithms were assessed by measuring and quantifying X-ray spectra on a range of international and internal reference materials. First, spectrometer drift was assessed by repeat single-spot measurements on glass standards over 2-3 days to ascertain how often spectrometer calibration should be carried out (Supplementary Table S1). Based on these results, spectrometer calibration was carried out twice a day for subsequent analyses. When analysing powder pellets, the powder grain size (<30 μm) is comparable to the X-ray beam spot size (∼25 μm) and so, for analysis of multi-mineralic rock powders, single spot analysis will not give bulk-rock values. The reference materials used in this study were a combination of pressed powder pellets and glass samples, so all analyses were carried out using the multi-point method, which sums the spectra collected at multiple points on the sample. A grid of ∼100 spots, over an area of 1.5-3 mm², was analysed for each sample. X-ray spectra were measured for different times (30-600 s) on sample GSP-2 to determine the optimum analysis time to minimize errors due to counting statistics (see Supplementary Fig. S2 and Table S3). For testing precision and accuracy, each spot was measured for 6 s, giving a total measurement time of ∼600 s, and the resulting spectra were combined to create a sum-spectrum that is representative of the bulk composition of the area analysed. This was repeated 10 times for each sample to assess precision. Beam conditions for quantitative analysis were 50 kV and 200 μA.
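The multi-point bookkeeping described above (summing spot spectra, total counting time, and precision from repeat analyses) can be sketched as follows. The function names and the example numbers are hypothetical, and expressing precision as a relative standard deviation is an assumption about how repeat measurements might be summarized.

```python
import statistics

def sum_spectrum(spectra):
    """Channel-by-channel sum of the spectra collected at each grid point."""
    return [sum(channel) for channel in zip(*spectra)]

def total_measurement_time(n_spots, dwell_s):
    """Total counting time for a multi-point analysis."""
    return n_spots * dwell_s

def relative_std_percent(values):
    """Relative standard deviation (%) of repeat measurements."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# ~100 spots at 6 s each, as in the text, gives ~600 s per analysis.
print(total_measurement_time(100, 6))

# Hypothetical repeat results (wt.%) for one oxide, for illustration only.
repeats = [50.0, 50.5, 49.5]
print(f"RSD = {relative_std_percent(repeats):.2f}%")
```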
'Standardless' quantification of the X-ray spectra was carried out, using the M4 Tornado's software, by iterative numerical solution of the Sherman equation and comparison of the measured and calculated spectra. This proprietary FP-based algorithm automatically corrects for detector artefacts such as pile-up and escape peaks. Elements present in the spectrum, but not in the sample (e.g. Rh from the tube radiation), were matched during the pattern fitting but excluded from the quantification results. The quantification scheme initially employed here calculates abundance in weight percent (wt.%) for the following major and minor oxides and trace elements of geological interest: Na₂O, MgO, Al₂O₃, SiO₂, P₂O₅, K₂O, CaO, TiO₂, MnO, Fe₂O₃, V, Cr, Co, Ni, Cu, Zn, Ga, Rb, Sr, Y, Zr, Nb, Ba, La, Ce and Th. Sulfur was not included due to its low abundance (<3 ppm) in the standards. Cl is difficult to analyse contemporaneously with lighter elements; interference on the Cl characteristic X-ray peak from the Rh tube radiation requires quantitative Cl analyses to be carried out using an energy filter, which reduces the intensity and thus quantitative precision on low energy (light element) characteristic X-rays. Such analyses are possible, but require a tedious two-stage analysis with and without energy filters during spectrum acquisition. For this reason, Cl (6-113 ppm in MPI-DING standards, unknown in USGS standards and internal references) has not been quantified.

Reference materials and sample preparation for quantification assessment
Five international standards and two internal references of varying type and composition have been used: three USGS powder standards (BHVO-2, AGV-2, GSP-2; Wilson, 1998a,b, 2000), two MPI-DING glass standards (T1-G, GOR-132G; Jochum et al., 2006) and two previously characterized aphyric obsidians used as internal standards: K5, Kerlingarfjöll, Iceland (Flude et al., 2010) and OOL-31A, Cochetopa Dome, San Juan, USA (Lipman and McIntosh, 2008). The MPI-DING synthetic glass standards were shown to be homogenous at analytical volumes greater than ∼30 µm³ (Kempenaers et al., 2003) and so can be expected to yield consistent results. Sample K5 was used to test spectrometer drift (analysis of the same spot over time), but subsequent analysis revealed significant SiO₂ and K₂O zoning related to flow banding in this sample, and so it has not been used to assess spectrum quantification. Glass samples were fragments of at least 3 mm × 3 mm × 3 mm and the smallest samples (T1-G, GOR-132G) were embedded in epoxy resin and polished to ensure optimum analysis conditions for this assessment. These dimensions ensure that the samples are approaching infinite thickness with respect to most of the characteristic X-rays of geological interest (E < 16 keV). All powder samples (grain size < 30 µm) were made into 1 cm diameter, 5 mm thick pellets by pressing at 3-5 tons without using a binder. The published composition of the standards (normalized to 100% volatile free and with Fe as Fe₂O₃ and Mn as MnO) is given in Supplementary Data Table S2 and measurement results in Tables S4-S9.
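The recalculation of published compositions to a volatile-free, 100% basis with total Fe as Fe₂O₃ can be sketched as below. This is a generic illustration rather than the exact procedure used for Table S2: the function name, the volatile labels and the example analysis are assumptions, and the conversion factor is simply the molar-mass ratio Fe₂O₃/(2 FeO), about 1.1113.

```python
# Molar masses (g/mol): FeO = 71.844, Fe2O3 = 159.687
FEO_TO_FE2O3 = 159.687 / (2 * 71.844)

def normalise_volatile_free(oxides_wt, volatiles=("LOI", "H2O", "CO2")):
    """Drop volatiles, express total Fe as Fe2O3 and renormalise to 100 wt.%."""
    comp = {k: v for k, v in oxides_wt.items() if k not in volatiles}
    if "FeO" in comp:
        # Convert FeO to its Fe2O3 equivalent and add it to any reported Fe2O3.
        comp["Fe2O3"] = comp.get("Fe2O3", 0.0) + comp.pop("FeO") * FEO_TO_FE2O3
    total = sum(comp.values())
    return {k: 100.0 * v / total for k, v in comp.items()}

# Hypothetical analysis (wt.%), not a published standard composition.
analysis = {"SiO2": 50.0, "FeO": 10.0, "Fe2O3": 2.0, "LOI": 3.0}
print(normalise_volatile_free(analysis))
```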

Qualitative analysis of geological materials with benchtop μ-XRF for sample characterization and preparation: an appraisal
Sample characterization is an essential part of any petrological or geochemical study, providing information on the phases present in the sample, their relationship to each other, and identifying phases for further investigation. Comprehensively characterizing a sample using traditional methods can require many techniques and thus be rather time-consuming. An example of a typical comprehensive workflow used to separate mineral grains for a ⁴⁰Ar/³⁹Ar age determination study is shown in Fig. 2, and similar workflows are used for any technique which requires separation of an individual phase. In principle, this workflow could be simplified and shortened by using benchtop μ-XRF; mineral phases present and their approximate compositions could be characterized on large (tens of cm) rough-cut slabs and unconsolidated sediments, rather than highly polished petrographic sections, and mapping of crushed material can aid in hand-picking of high-purity mineral separates.
Here we use a number of case studies to illustrate how μ-XRF can be applied to the sample workflow and discuss the strengths and weaknesses relative to more traditional sample processing methods.
FIG. 2. Typical workflow for separating mineral phases for isotopic analysis. Grey text indicates stages that can wholly or partially be replaced by μ-XRF characterization.

Sample screening and bulk characterization (Stages 1-4 of Fig. 2)
Traditionally, bulk characterization of a sample is carried out by visual inspection of a hand sample, followed by preparation of petrographic sections for study using a petrographic microscope and, commonly, SEM or EMPA work. This allows the phases present in the sample to be identified, and their textural relationships and internal homogeneity to be well characterized. Potential disadvantages of this process include the time taken to create petrographic and polished sections, the small area of the specimen sampled by the section (typically 2 cm × 4 cm) and even smaller area sampled by subsequent analysis; the field of view of a petrographic microscope rarely exceeds a few millimetres, making it difficult to easily assess wider-scale structure and inhomogeneity in a sample. While it is possible to create photomosaics of petrographic sections, such as the Open University's teaching aid The Virtual Microscope (Whalley et al., 2011), creating these is very time consuming.
Using μ-XRF to produce element maps of roughly cut slabs is potentially a much faster way of determining the mineral phases present and their distribution through a sample. Micro-XRF element distribution maps collected from a roughly cut granite, and from a polished slab of sandstone are shown in Fig. 3. At >10 cm across, both of these specimens are too large to fit in conventional SEM or EMPA instruments.
Deformed granite from Bukit Bunuh, Malaysia is shown in Figs 3a and b. This granite is clearly porphyritic with large (1-3 cm), simply twinned, white feldspar phenocrysts set in a coarse-grained (1-3 mm) matrix of quartz, feldspar and biotite. An 11 cm × 4.5 cm × 0.8 cm slab of the granite was cut from a larger sample using a rock saw and the worst of the saw marks removed by 5 min of hand polishing with sand paper. The sample was mapped by a single scan using beam conditions of 50 kV and 200 µA, a pixel acquisition time of 10 ms and a pixel step size of 70 µm. Different mineral phases, textures and their distribution through the sample can be identified with a multi-element map displaying K, Ca, Si and Fe (Fig. 3b). Distinguishing quartz (SiO₂), alkali feldspar ((K,Na)AlSi₃O₈) and plagioclase feldspar (NaAlSi₃O₈-CaAl₂Si₂O₈), which may be difficult even in thin section if the minerals do not exhibit euhedral mineral forms or display twinning, is particularly easy using this multi-element map combination. Note that, while the saw marks are still prominent in the photograph (Fig. 3a), they have not affected the quality of this element map.
A fine-grained sandstone from the Precambrian Voltaian Formation, Ghana is shown in Figs 3c and d. Cross beds are visible in hand specimen as dark bands and the sample has a low porosity. X-ray mapping (50 kV, 600 μA, 60 μm pixel size) of a polished surface on the sample reveals that the bulk of the sample is formed by quartz, with ∼20% K-rich feldspar. The distribution of accessory minerals in the sandstone is shown in Fig. 3d. Grains of a Zr-rich phase (presumed to be zircon, ZrSiO₄; red) and a Ti-rich phase (presumed to be rutile, TiO₂; blue) form 2-5% of the bulk rock and are concentrated at cross lamination surfaces in the middle facies of this field of view, resulting in accumulations up to 1 mm thick.
Interestingly, these cross laminations picked out by the accessory minerals are not visually obvious in the hand specimen. The rutile overlies the zircon, as would be expected from differential settling rates due to the density contrast between the two minerals. In the lower part of the sample the sediment is darker, reflected by a higher Fe-content in the element map. In this lower facies, rutile is much more common than zircon while the two minerals occur in roughly equal proportions in the upper facies. Such information can help reconstruct geological histories; clearly there has been some kind of change in the fluvial system between the lower and upper facies. Perhaps the upper facies simply reflects an increase in energy in the system, allowing denser minerals to be mobilized and redeposited. Alternatively the two facies may represent deposition from different sedimentary sources. Recent age determinations of detrital zircons from the Voltaian Formation have shown that the sandstones contain multiple age populations of zircon (Kalsbeek et al., 2008). Using X-ray maps to target sampling at higher stratigraphic resolution (centimetre-to-decimetre scale) may therefore help to identify fine-scale fluctuations in sedimentary source location.
For characterization of bulk samples, and tentative identification of mineral phases in rock samples, μ-XRF is a useful technique. Large samples can be analysed with minimal preparation (a flat surface is required for element mapping, but polishing is not necessary for most elements) and the distribution of phases throughout a sample at the centimetre to decimetre scale can be characterized much more easily than with optical or electron microscopy. Using benchtop μ-XRF element mapping can thus potentially replace stages 2-3 in Fig. 2.
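To give a rough sense of the acquisition effort behind a large-area map, the pixel count and minimum counting time can be estimated from the map dimensions, step size and pixel dwell time. The sketch below assumes, purely for illustration, that the full 11 cm × 4.5 cm face of the granite slab was mapped at the stated 70 µm step and 10 ms per pixel; the mapped area is not stated explicitly in the text, and stage movement and readout overheads are ignored. The function name is hypothetical.

```python
import math

def map_pixels_and_time(width_mm, height_mm, step_um, dwell_ms):
    """Return (nx, ny, total pixels, counting time in s) for a rectangular map."""
    nx = math.ceil(width_mm * 1000 / step_um)
    ny = math.ceil(height_mm * 1000 / step_um)
    pixels = nx * ny
    return nx, ny, pixels, pixels * dwell_ms / 1000.0

nx, ny, pixels, seconds = map_pixels_and_time(110, 45, step_um=70, dwell_ms=10)
print(f"{nx} x {ny} = {pixels} pixels, about {seconds / 3600:.1f} h of counting time")
```

Under these assumptions the map contains roughly a million pixels and needs a few hours of counting time.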

Within-phase variability
Micro-XRF mapping is clearly a useful tool for characterization at the hand-sample scale, but many geochemical applications require information on the homogeneity of individual mineral phases. This information may traditionally be acquired by a combination of petrographical study, with SEM imaging and EMPA analysis to characterize internal variation of mineral grains. One advantage of μ-XRF over electron beam techniques is that there is no need for charge neutralization (carbon coating or charge neutralizing gas in the sample chamber), as the excitation beam is of X-rays, rather than electrons, and there is no risk of sample damage due to charging, as can happen with electron beam techniques (Flude et al., 2013). However, the larger spot size of μ-XRF relative to electron beam techniques (∼25 μm vs. <1 μm) and deeper information depth (potentially hundreds of micrometres in silicates vs. <5 μm for electron beams) mean that small-scale features may be difficult or impossible to characterize. To assess this, we studied two samples: a 12 mm alkali feldspar phenocryst from the Dartmoor Granite, UK, and 0.5-1.5 mm plagioclase phenocrysts in an andesitic ash from the Soa Basin, Flores, Indonesia. These samples are polished sections, prepared in the same way as for electron beam analysis.
The Dartmoor feldspar phenocryst has been studied previously using SEM and shows extensive evidence for in situ, fluid-mediated recrystallization and displays a range of phases and microtextures including homogenous orthoclase, pristine crypto- and microperthites, perthitic intergrowths of Ab-rich (Na-rich) and Or-rich (K-rich) feldspar and microcline veining (Flude et al., 2012). Many of these features are isochemical, homotactic, a maximum of 20 or 30 µm across and thus currently unresolvable by benchtop μ-XRF, but element mapping of the entire crystal reveals large scale compositional variation that is less easy to identify via SEM-based techniques due to the smaller field of view typically employed. Figure 4 shows a K and Ba map of the feldspar with associated line scan profiles that illustrate perthite texture, barium zoning and zones of recrystallization. Perthite textures in this sample highlight one of the limitations of the benchtop μ-XRF technique; Na is the lightest element that can currently be detected by these instruments but the low intensity of X-rays emitted by Na atoms makes mapping of Na difficult. The Na-rich patches and veins in the crystal are visible as K-depleted areas but under typical mapping conditions the Na-rich patches are not displayed because the peak-to-background ratio is too small to produce sufficient contrast. However, line scans across the crystal (50 kV, 200 µA, up to 500 spots per line, 200 ms per spot and with 10 repeated scans) are able to show variation in Na content (Fig. 4); potassium and sodium exhibit an inverse relationship that is evident in both crystal-wide zoning (profile A) and across perthite lamellae (profile B). Potassium shows zones of enrichment around the edges and along the centre of the crystal. These areas of K-enrichment correspond to brown, discoloured areas which were considered by Flude et al. (2012) to be the result of fluid-mediated recrystallization and are associated with microcline veins.
The crystal appears to exhibit oscillatory zoning in Ba, but not parallel to the crystal edges. This zoning reflects real variations in the intensity of the Ba X-rays rather than an artefact due to fluctuations in the spectrum background, as may happen for trace elements. Concentric, boundary-parallel Ba zoning is also displayed in the subgrain defined by mapping differences in spectral background in the energy range 7.1-7.5 keV (free energy region 'f', dark pink, Fig. 4; see the section below on crystallographic contrast imaging and Fig. 7). The relative roles of magmatic and metasomatic crystallization have long been debated for feldspar phenocryst formation in granites. Here, the lack of coherence between Ba and K distribution may reflect processes related to initial crystallization and subsequent metasomatism of the phenocrysts.
The Indonesian volcanic phenocrysts exhibit oscillatory zoning under crossed polars and sometimes contain apatite inclusions (Fig. 5). Micro-XRF mapping of these crystals at the highest resolution possible (4 µm step size with ∼25 µm beam diameter) gives an indication of the scale of features that can be resolved using this technique (Fig. 5).
FIG. 4. Multi-element X-ray map and element line scans of a feldspar phenocryst from the Dartmoor Granite. Display of K and Ba maps reveals decoupled crystal-wide zoning in both elements. Perthite texture is visible in the K-maps as relative K-depletion and enrichment, but this is not observed on Na-maps (not shown) due to the low fluorescence yield of Na characteristic X-rays. Line scans, however, do illustrate the variation in Na, which has an inversely proportional relationship with potassium. 'f' = free region, mapping differences in the spectral background.
Calcium zoning visible on the X-ray maps in Fig. 5b,d is on the order of 100 µm and the fine-scale oscillatory zoning visible under crossed polars cannot be resolved. The scale of features that can be resolved by X-ray mapping is dependent on the contrast in X-ray intensities between those features; as the statistical error on the X-ray intensity, I, is ΔI/I = 1/√I, low X-ray intensities result in larger fluctuations in the spectrum (i.e. unresolvable contrast). In such cases, image contrast may be improved by increased measurement times or repeated scanning of the map to increase the net X-ray intensity for each pixel. In the case of the oscillatory zoning, where the compositional differences between the zones are relatively small and gradational, only the largest elemental contrasts and broader-scale zoning are visible.
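The dependence of resolvable contrast on counting statistics can be illustrated with a minimal sketch. It assumes purely Poisson-distributed counts and uses hypothetical pixel intensities (100 and 110 counts); the function names are illustrative and are not part of any instrument software.

```python
import math

def relative_error(counts: float) -> float:
    """Poisson relative error of a pixel intensity: delta_I / I = 1 / sqrt(I)."""
    return 1.0 / math.sqrt(counts)

def contrast_sigma(counts_a: float, counts_b: float) -> float:
    """Difference between two pixel intensities, expressed in units of the
    combined Poisson standard deviation of that difference."""
    return abs(counts_a - counts_b) / math.sqrt(counts_a + counts_b)

# Hypothetical counts for two adjacent zones whose intensities differ by 10%.
single_scan = contrast_sigma(100, 110)
# Ten repeated scans accumulate ten times the counts in every pixel.
ten_scans = contrast_sigma(1000, 1100)

print(f"per-pixel relative error at 100 counts: {relative_error(100):.1%}")
print(f"contrast: {single_scan:.2f} sigma (1 scan), {ten_scans:.2f} sigma (10 scans)")
```

Under these assumptions the significance of a fixed compositional contrast grows with the square root of the accumulated counts, which is why longer dwell times or repeated scans help only gradually for subtle zoning.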
In the case of high X-ray contrasts of small features, measurement of the dimensions of the features from X-ray maps should be carried out with caution, especially when using pixel averaging filters (see Supplementary Fig. S4 for examples of how pixel averaging filters affect the clarity of the element maps). In Fig. 5c,d, a 40 µm wide apatite crystal appears to be twice as large in the X-ray map as in the photomicrograph due to the convolution of the crystal size with the spot size. This effect may be enhanced by image processing that averages or smoothes pixels.
Micro-XRF is a potentially valuable tool for imaging wide-scale variation within mineral phases where elemental variation is strong or for trace elements. But benchtop μ-XRF cannot compete with SEM-EDS for imaging of small-scale or subtle major-element zoning profiles, especially for elements with relatively low characteristic X-ray yields.

Mineral separation
A common problem during mineral purification (e.g. for 40Ar/39Ar geochronology), especially for less experienced researchers, is conclusive identification of the correct mineral phase during hand picking of crystals or mineral grains under a binocular microscope. In particular, K-feldspar is impossible to distinguish conclusively from plagioclase by sight alone, and even quartz grains can be difficult to distinguish from feldspars in some rock types. Visually distinguishing K-bearing amphiboles from K-poor pyroxenes can also be difficult in some situations. Hynek et al. (2011) successfully applied the technique of staining crystals with sodium cobaltinitrite to facilitate hand picking of sanidine phenocrysts for 40Ar/39Ar analysis, but the staining process itself and subsequent removal of the stain add an extra layer of complexity to the sample preparation procedure and, in some countries, sodium cobaltinitrite is a regulated substance, with special training and licences required for its usage.
Micro-XRF element mapping of mineral grains can aid identification of the phases of interest and provide a level of quality control to ensure that mineral separates are high-purity. For many samples, crushing and sieving is adequate preparation for sample screening and mineral purification. Figure 6 shows a μ-XRF map of a sieved sample of volcanic ash that was used to select K-feldspar grains for 40Ar/39Ar geochronology. This is a sample of the Younger Toba Tuff, collected from the Lengong Valley, Malaysia (Storey et al., 2012). Minerals were separated from an unconsolidated ash sample by washing in a prospecting pan. An aliquot of these phases with a grain size of 250-315 µm was scattered onto a numbered 4 mm grid microscope slide (total area 20 mm × 50 mm) and the grains fixed into place using hairspray. The slide was mapped using 10 ms per pixel (total mapping time ∼100 min) with a pixel distance of 50 µm. Silicon, K and Ca were displayed using the same colour scheme as in Fig. 3b, with the addition of Ti in white; the white paint that forms the grid on the microscope slide contains Ti, so displaying this element allows easy location of the position on the slide. With this colour scheme, the quartz grains display as red, the red Si and green K combine to display K-feldspar and biotite in yellow-green shades (the stronger the green colour, the higher the K:Si ratio) and the red Si and blue Ca combine to display Ca-bearing plagioclase as light red to purple, depending on the Ca-content (in this example the plagioclase crystals are Na-rich so there is only a subtle colour difference between quartz and plagioclase, but these minerals can be identified more easily by adding Al to the map). The multi-element map was saved and compared to the microscope slide to allow easy and rapid hand picking of the phase of interest; in this case 150 grains of K-feldspar (∼50 mg) were collected for argon isotopic analysis (Storey et al., 2012).
For samples where the mineral grains are >300 µm, the mapping time can be reduced significantly by increasing the step size and/or reducing the pixel dwell time. This more rapidly generated map produces a lower-quality image with a grainy appearance and lower image contrast, but is adequate to distinguish large, well-spaced, chemically distinctive mineral grains. While we envisage this technique being of particular interest to 40Ar/39Ar geochronologists, it can also be applied to locating minerals required for other specific analytical methods, such as zircon for U-Pb age determination (cf. Voltaian Sandstone case study, above), ore mineral screening, and grain provenance studies.
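As a rough guide to how map area, pixel spacing and dwell time set the minimum acquisition time, the following sketch computes the dwell-only mapping time. The function name and the coarser settings (100 µm spacing, 5 ms dwell) are hypothetical; stage movement and other instrument overheads are ignored, which is assumed here to be why the dwell-only estimate for the 20 mm × 50 mm slide is shorter than the ∼100 min reported above.

```python
def map_dwell_minutes(width_mm: float, height_mm: float,
                      step_um: float, dwell_ms: float) -> float:
    """Dwell-only acquisition time (minutes) for a rectangular map.

    Ignores stage travel, read-out and any other overhead (an assumption).
    """
    pixels_x = round(width_mm * 1000 / step_um)
    pixels_y = round(height_mm * 1000 / step_um)
    return pixels_x * pixels_y * dwell_ms / 1000 / 60

# Settings quoted in the text for the Younger Toba Tuff slide.
toba = map_dwell_minutes(20, 50, step_um=50, dwell_ms=10)
# Hypothetical coarser, faster survey of the same slide.
coarse = map_dwell_minutes(20, 50, step_um=100, dwell_ms=5)

print(f"50 um / 10 ms: {toba:.1f} min dwell time")
print(f"100 um / 5 ms: {coarse:.1f} min dwell time")
```

Doubling the pixel spacing cuts the number of pixels by a factor of four, so spacing is the more effective of the two settings when grains are large and well separated.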

Other applications for μ-XRF qualitative analysis
In addition to improved sample characterization and processing, qualitative analysis with a benchtop μ-XRF has a lot of potential for other geological and mineralogical applications. We highlight two of these here as tools to be developed in the future.

Crystallographic contrast imaging
FIG. 7. (a) Qualitative orientation contrast map of a simply twinned alkali feldspar. f1 and f2 represent selected energy channels in (b). 'Q' shows the location of polycrystalline quartz. (b) Two X-ray spectra representing the two different alkali feldspar simple twin domains in (a). The free regions selected for mapping (f1 and f2) are highlighted and illustrate differences in spectral background due to scattering of X-rays by the crystal lattice.
Special artefact peaks can arise when analysing crystalline material, owing to diffraction of the polychromatic tube spectrum by the crystal lattice, which produces diffraction peaks and a variable background in the spectrum (Fig. 7). Such peaks may interfere with the correct identification of element peaks and can be identified by changing the diffraction angle for a single crystal, e.g. by rotating or tilting the crystal, or, in the case of crystals containing multiple domains with different crystallographic orientations, moving to a different part of the crystal (see f1 (green) and f2 (red) in Fig. 7). Mapping of these diffraction peaks has the potential to allow identification of qualitative differences in crystallographic orientation within and between minerals, providing a form of orientation contrast (OC) imaging (cf. Prior et al., 1996). Detailed interpretation of these orientation contrasts is probably more difficult than in SEM-based OC techniques as, during SEM-OC and electron back-scatter diffraction (EBSD) imaging, the energy of the scattered electrons is well constrained, while in the case of μ-XRF-OC the radiation is polychromatic. This feature can be exploited by employing multiple X-ray detectors in the instrument, located at different orientations to the sample, but even using a single detector can reveal crystalline microtexture information. Such qualitative OC information may be of particular interest to 40Ar/39Ar geochronologists investigating the effect of microtextures on diffusion of Ar within crystals, as XRF is unlikely to disturb the K/Ar or 40Ar/39Ar systems as has been observed for SEM-based techniques (Flude et al., 2013). An example of this OC imaging is illustrated in Fig. 7, which shows a combined element and diffraction peak map of the large alkali feldspar phenocryst on the left of the sample in Fig. 3a. Twinning is visible in hand specimen for this crystal. A higher resolution element map (70 μm step size) of this area was collected and two areas of the map, corresponding to the two twin domains, were selected. The spectra derived from these two areas were examined and compared to identify diffraction peaks and differences in spectral background that may be due to X-ray diffraction by the crystal lattice (Fig. 7b). Appropriate energy ranges were selected (free regions 'f1' and 'f2') and their maps displayed. This composite map clearly shows crystallographic orientation contrasts between the alkali feldspar twin domains (Fig. 7a). These energy ranges also show up as green and red flecks in patches of quartz ('Q' on Fig. 7a), suggesting that the quartz patches are polycrystalline, and that the quartz grains are oriented randomly.
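The reason a diffraction peak moves in energy when a crystal is tilted, or when a differently oriented domain is analysed, can be sketched with Bragg's law, nλ = 2d sin θ, rewritten in terms of photon energy. This is a generic illustration and not a description of the instrument geometry: the lattice spacing and angles below are hypothetical, and the function name is illustrative.

```python
import math

HC_KEV_ANGSTROM = 12.398  # Planck constant times speed of light, in keV*Angstrom

def bragg_energy_kev(d_angstrom: float, theta_deg: float, order: int = 1) -> float:
    """Photon energy (keV) that satisfies Bragg's law for lattice spacing d
    and Bragg angle theta: E = n*h*c / (2*d*sin(theta))."""
    return order * HC_KEV_ANGSTROM / (2 * d_angstrom * math.sin(math.radians(theta_deg)))

# Hypothetical lattice spacing of 3.0 Angstrom seen at two slightly different
# Bragg angles, e.g. in two twin domains or after tilting the sample.
domain_1 = bragg_energy_kev(3.0, 30.0)
domain_2 = bragg_energy_kev(3.0, 35.0)

print(f"domain 1: {domain_1:.2f} keV, domain 2: {domain_2:.2f} keV")
```

Because the tube spectrum is polychromatic, some energy will satisfy the diffraction condition for almost any orientation, so differently oriented domains contribute intensity in different energy windows.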

Palaeontology
During fossilization of organic remains, material may be replaced or destroyed. The resulting fossil may be fragile, delicate and easily damaged, and thus difficult to study in fine detail. Recent application of SR-μXRF to various fossilized materials, including an Archaeopteryx fossil, has identified both invisible fossilized components that are hidden behind a thin layer of sediment and the direct preservation of biological soft-parts, such as feathers (Wogelius et al., 2011;Bergmann et al., 2012). To assess such capabilities on a benchtop μ-XRF system, chemical mapping was carried out on a well preserved fossil of Diplomystus dentatus (Cope, 1877;Grande, 1982) from the Eocene Green River Formation (Smith et al., 2008), Wyoming. As would be expected, element maps of P and Sr show fine detail of the fossilized skeleton, but of particular note are the fish scales revealed by the P map (Fig. 8).
Whilst hydroxylapatite is a common component of fish scales (Lanzing and Wright, 1976;Ikoma et al., 2003;Kalvoda et al., 2009), these scales are practically invisible on the fossil itself. It is not clear whether the scales have simply been preserved in a way that is not visibly obvious, or whether they are preserved beneath a thin layer of limestone. However, given the low atomic number of P and the low energy of the X-rays it emits, we would expect P-derived X-rays to be attenuated by just a few micrometres of overlying material, and so it is more probable that the scales have been preserved but are almost invisible to the naked eye.

An appraisal of 'standardless' quantitative analysis using benchtop μ-XRF
Benchtop μ-XRF systems are generally marketed as tools for qualitative elemental analysis, such as element mapping, but commercial manufacturers also claim that standardless, fully quantitative analysis is possible. Here we assess the precision and accuracy of quantitative elemental analysis of silicate geomaterials using benchtop μ-XRF, by measurement of commonly analysed elements in certified, international standards and in internal reference materials. In turn, we assess the relative contributions of errors due to counting statistics, instrument stability, peak deconvolution and standardless quantification to the statistical error and precision on quantitative analyses, followed by an assessment of the accuracy of the method. While this assessment is specific to the instrument used, the principles controlling accuracy and precision are universal to standardless quantification of X-ray spectra and will provide an overview of the capabilities and limitations of this technique. We note that, for elements that suffer interferences from overlapping peaks, such as Ba, La and Ce in Ti-bearing samples, the assessments of counting statistics and instrument stability are not fully representative as these assessments take place before the peak deconvolution process.
FIG. 8. X-ray map and photograph of a fossil fish, Diplomystus dentatus, from the Eocene Green River Formation, Wyoming, USA, displaying P (red) and Sr (green). Fish scales, which are not visible on the fossil specimen itself, are clearly visible as variations in P intensity on the element map. The black box shows the position of the close-up images.

Precision
The relative percentage error due to counting statistics is shown in Supplementary Fig. S2 and Supplementary Data Table S3 for a number of elements using different analysis times. Here the standard deviation on an X-ray intensity measurement is assumed to be the square root of the measured gross intensity, i.e. the area under the spectral peak in the region of interest (ROI) of the characteristic X-ray, not corrected for background.
The error due to counting statistics is minimized by measuring for at least 300 s, which reduces the relative percentage error to <1% for light elements (Na, Mg) and trace elements and <0.5% for most other major and minor elements. To optimize analysis conditions for this first assessment of quantitative analysis, each standard was analysed for ∼600 s per analysis, as described in the methods section.
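The link between measurement time and counting error can be sketched as follows, assuming Poisson statistics on the gross ROI intensity as described above. The count rate of 100 counts per second is hypothetical and is chosen only for illustration; the function names are not from any instrument software.

```python
import math

def counts_for_relative_error(target_percent: float) -> int:
    """Gross counts needed so that sqrt(N)/N does not exceed the target (in %)."""
    return math.ceil((100.0 / target_percent) ** 2)

def relative_error_percent(count_rate_cps: float, time_s: float) -> float:
    """Relative counting error (%) for a given ROI count rate and measurement time."""
    return 100.0 / math.sqrt(count_rate_cps * time_s)

print(counts_for_relative_error(1.0))   # counts needed for a 1% error
print(counts_for_relative_error(0.5))   # counts needed for a 0.5% error

# Hypothetical ROI count rate of 100 counts per second.
for seconds in (60, 300, 600):
    print(seconds, "s:", round(relative_error_percent(100, seconds), 2), "%")
```

Halving the relative error requires four times the counts, so extending an analysis well beyond a few hundred seconds gives diminishing returns.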
The error contribution from short-term instrument stability was assessed by carrying out ten sequential measurements (600 s each) on the same standard (GSP-2) and calculating the mean and standard deviation of the gross intensity in the ROI for each element. These results are shown in Supplementary Data Table S4 and the coefficients of variation (relative percentage errors) are summarized in Table 1. For most elements the coefficient of variation (n = 10) is between 0.2 and 0.8%. For Zr this is significantly higher at 1.36%. To investigate the possible reasons for this, the ROI gross intensities for individual analyses were plotted in the order they were analysed (Fig. 9). For most elements there is no systematic variation in intensity over time, but for Zr and Y, and to a lesser extent for Ba, Ti, Co, Sr and Nb, the measured intensity increased during the experiment. In the case of Zr this increase was significant enough to raise the standard deviation of the ten measurements and this systematic intensity increase translates to an increase in calculated concentration, of ∼30 ppm, over time (Fig. 9). This observed increase in intensities is probably due to a slight drift in the detector over time; for analyses of metal samples, the spectrometer is usually calibrated with the zero peak and high energy X-rays from Zr (15.7 keV) or Mo (17.5 keV) but the GSP-2 analyses were carried out after calibration with the CuKα peak (8.0 keV) to facilitate more accurate calibration of the lower energy part of the spectrum, which dominates in silicate analyses. The result is that any drift in the spectrometer will have a magnified effect on the spectrum outside of the calibrated range (i.e. >8 keV), which includes the elements Rb, Sr, Y, Zr and Nb.
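A minimal sketch of this stability assessment is given below: the coefficient of variation of repeated gross intensities, plus a least-squares slope against measurement order as a simple indicator of drift. Both intensity series are hypothetical and are not data from Table S4; the slope test is an illustrative choice rather than the procedure used in this study.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

def drift_slope(values):
    """Least-squares slope of intensity against measurement order (counts per run)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = statistics.mean(values)
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator

# Hypothetical gross ROI intensities from ten sequential measurements.
stable = [10000, 10020, 9985, 10010, 9990, 10005, 9995, 10015, 9980, 10000]
drifting = [5000 + 20 * i for i in range(10)]  # steadily increasing intensity

for name, series in (("stable", stable), ("drifting", drifting)):
    print(name, round(coefficient_of_variation(series), 2), "%",
          "slope:", round(drift_slope(series), 2))
```

A series with a systematic trend shows an inflated coefficient of variation even when scatter about the trend is small, which is the pattern described for Zr.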
Notably, the Kα peaks of Y, Zr and Nb are overlapped by the Kβ peaks of Rb, Sr and Y, respectively, and we hypothesize that changes in measured intensity due to spectrometer drift will be more pronounced for elements that experience an overlap in X-ray energy range. Yttrium and Zr exhibit a more pronounced change in measured intensity than Nb and this may be explained by the different concentrations of the overlapping elements; in this sample, Rb and Sr, which overlap Y and Zr, are of an order of magnitude higher concentration than Y, which overlaps Nb (248 and 243 ppm vs. 28 ppm) and so interferences from Rb and Sr are expected to produce a greater increase in measured intensity.
To test if these changes were due to spectrometer drift, each of the ten GSP-2 spectra was recalibrated manually using the zero and CuKα peaks and the Zr results are plotted beneath the raw Zr data in Fig. 9. As expected, the recalibrated data do not show the systematic increase over time. This is also true for Y, Nb and Sr (Table S4). However, the accuracy and precision of the data have decreased, due to calibrating the spectrum with a peak of low intensity (GSP-2 Cu-content = 43 ppm).
Notes to Table 1. Errors are quoted as relative percentage error (error/value × 100). True wt.% is the certified value of the standard used (Wilson, 1998b; Supplementary Table S2). Error contributions are listed from: Counting statistics (square root of the mean gross intensity, n = 10); Instrument error (standard deviation of the gross intensity, n = 10); Deconvolution (intensity) (errors introduced during spectrum deconvolution for quantification: standard deviation of the deconvoluted intensities, n = 10); Deconvolution (wt.%) (standard deviation of the calculated concentrations derived from the deconvoluted intensities, n = 10); Trueness (accuracy of the calculated concentrations: deviation of the mean value, n = 10, from the published value), for 'standardless' quantification ('non-cal') and using a single standard type calibration (calibrated). 'nd' = element not quantifiably detectable or not determined. Note that the relative errors due to counting statistics and instrument stability are not fully representative for elements that suffer from peak overlaps, such as Ba, La and Ce in Ti-bearing samples. See the section on Accuracy for details of how Trueness is calculated and use of calibration. *Concentrations of major and minor elements are calculated as oxides, as listed in the 'element' column, but X-ray intensity data refer to the pure element.
FIG. 9. Variation in ROI gross intensity (i.e. the area of the non-background-corrected characteristic X-ray peak) and concentration, calculated using 'standardless' quantification, over the course of ten repeated measurements (data in Table S4). Grey boxes show the value of the mean ± 1 standard deviation (n = 10). Error bars (±1 standard deviation) are from counting statistics for each element (see Table 1).
Plotting the data sequentially also showed that, for most elements, the first analysis gives consistently lower intensities by ∼1%. Closer inspection of the metadata associated with this spectrum shows that the measurement time was only 594 s, rather than 600 s. This is due to using a slightly different multi-point grid configuration during the first analysis (9 × 11 grid = 99 analyses of 6 s each vs. 10 × 10 grid = 100 analyses of 6 s each) and, as the gross intensity increases linearly with measurement time, this 1% discrepancy can be explained by the 1% reduction in measurement time. This lower X-ray intensity observed for many elements in the first analysis does not translate to a systematic difference in calculated concentration (Fig. 9, Supplementary Table S4), but for some trace elements (e.g. Rb) the first analysis is ∼10 ppm lower than the average concentration.
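The arithmetic behind the shorter first measurement can be checked directly from the two grid configurations quoted above; the function name is illustrative.

```python
def total_time_s(rows: int, cols: int, seconds_per_point: int) -> int:
    """Total measurement time for a multi-point grid analysis."""
    return rows * cols * seconds_per_point

first_analysis = total_time_s(9, 11, 6)    # 99 points of 6 s each
later_analyses = total_time_s(10, 10, 6)   # 100 points of 6 s each

# Gross intensity scales linearly with time, so the expected intensity
# shortfall equals the relative reduction in measurement time.
shortfall_percent = 100 * (later_analyses - first_analysis) / later_analyses

print(first_analysis, later_analyses, shortfall_percent)  # 594 600 1.0
```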
Recalculating the standard deviations of the ROI gross intensities, to exclude the first analysis, gives relative percentage errors of 1.16% for Zr and 0.10-0.75% for other elements. Plotting these values against the relative % error due to counting statistics (Fig. S3) shows an approximate 1:1 correlation, suggesting that, for most elements, the error due to counting statistics dominates over short-term instrument error. Closer inspection of the calculated concentrations in Fig. 9 shows that the third analysis gives concentrations for Fe2O3 and Na2O that are, respectively, lower than and higher than their mean ± 1 standard deviation. The calculated Fe2O3 and Na2O concentrations appear to be inversely proportional throughout the experiment, even though exactly the same area was analysed for analyses 2-10. This illustrates how small fluctuations on one major-element peak can influence the precise calculation of other elemental concentrations.
Next we investigate the loss of precision due to the spectrum deconvolution process and test the validity of the fundamental parameter algorithm used by the Bruker proprietary software. Rousseau (2006) suggested that the fundamental parameter algorithm could be validated by measuring the same multi-element specimen ten times and comparing the coefficients of variation of the calculated concentrations to that of the net intensities; for a valid algorithm, the relative errors will be within the same order of magnitude for both the net intensity and the concentration data. A basic quantification scheme was used to calculate common major (Na, Mg, Al, Si, K, Ca and Fe) and minor (P, Ti, Mn) elements as oxides and trace elements (V, Cr, Co, Ni, Cu, Zn, Ga, Rb, Sr, Y, Zr, Nb, Ba, La, Ce, Th) as pure elements. The deconvolution process involves identifying the elements to be quantified and fitting Gaussian peaks for each element to the spectrum. The net intensity is then calculated as the integral within the full width at half maximum of the peak, minus spectral background, sum and escape peaks and overlapping peaks from other elements.
FIG. 10. Relative intensity errors caused by the statistical error of the gross peak intensity, the standard deviation of ten repeated measurements of the gross peak intensity (instrumental error), and the standard deviation of the net peak intensities after spectrum deconvolution. Elements in bold are those that experience a notable increase in error during spectrum deconvolution due to overlapping peaks. X-ray energy (x axis) refers to the approximate energy region of the higher intensity characteristic X-ray peaks used in the deconvolution; for most elements these are K-lines but L-lines are used for Ba and Ce. Values are reported in Table 1.
The error on the deconvoluted peak intensity (net intensity) is controlled by the statistical error of the peak (which in turn is dependent on the intensity of the peak itself) and on any overlap with other element peaks. The relative error on the net peak intensities and the calculated concentrations will therefore be higher for small peak intensities (due to limited excitation efficiency or low concentrations, e.g. Na, Mg), for peaks with a high spectral background (e.g. Ni, Cu, Rb, Sr), and for peaks that experience strong overlaps (e.g. Ti, Ba and Ce or Cr, Mn, Fe, Co and Ni, see Fig. 10). For many elements, the coefficient of variation increases with deconvolution, indicating that peak deconvolution causes a loss of precision compared to measurement of gross X-ray intensity (Table 1 and Fig. 10).
Comparison of the coefficient of variation for the net intensities ('Deconvol. (intensity)' in Table 1) and calculated concentrations ('Deconvol. (calc. concs.)' in Table 1) shows that these relative errors are very similar and of the same order of magnitude. The fundamental parameter algorithm used by the Bruker proprietary software is thus valid, according to the test described by Rousseau (2006). Figure 11 plots the coefficient of variation (n = 10) of the calculated concentrations against the true concentration. When light elements (Na and Mg) and trace elements that overlap with Ti (Ba, Ce and La) are discounted, a rough trend of increasing error with decreasing concentration is observed, scattered around a trend line following a power law of form y = 0.6255x^(−0.329), where x is the element (or oxide) concentration in wt.% and y is the coefficient of variation (n = 10). Lighter elements experience a steeper trend, indicating a stronger control of concentration on the error; Fig. 11 compares Na2O, MgO, K2O and CaO and illustrates a decrease in trendline slope with increasing Z.
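The two checks described above can be sketched in a few lines. The power-law coefficients are those quoted in the text; the factor-of-ten threshold used for "same order of magnitude" and the example coefficients of variation are assumptions made for illustration, and the function names are hypothetical.

```python
import math

def expected_cv_percent(conc_wt_percent: float) -> float:
    """Coefficient of variation predicted by the fitted trend y = 0.6255 * x**-0.329,
    with x the element (or oxide) concentration in wt.%."""
    return 0.6255 * conc_wt_percent ** -0.329

def same_order_of_magnitude(cv_intensity: float, cv_concentration: float) -> bool:
    """Rousseau-style check: the two relative errors differ by less than a
    factor of ten (the threshold is an assumption of this sketch)."""
    return abs(math.log10(cv_concentration / cv_intensity)) < 1.0

for conc in (10.0, 1.0, 0.1, 0.01):  # 0.01 wt.% = 100 ppm
    print(f"{conc:>6} wt.% -> CV ~ {expected_cv_percent(conc):.2f} %")

# Hypothetical coefficients of variation for net intensity and concentration.
print(same_order_of_magnitude(0.5, 0.6))
print(same_order_of_magnitude(0.5, 8.0))
```

The trend applies only to the subset of elements used for the fit; light elements and elements overlapping with Ti are expected to fall above it.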
In summary, the measurement reproducibility is controlled by both the intensity of the characteristic X-ray peak, which is a function of element concentration and the atomic number, and the ease of deconvoluting the characteristic X-ray peaks in the spectrum. The influence of peak deconvolution on data quality means that elemental detection limits will vary from sample to sample, depending on the bulk chemistry, material, and influence of overlapping, interfering and artefact peaks. Precision can be optimized by measuring for at least 300 s, which reduces the relative error from counting statistics to <0.5% for most major elements and <1% for all elements. Trace elements whose peaks overlap with higher intensity peaks (such as Ce, La and Ba, overlapping with Ti) give the least precise data, as proportionally small variations in the deconvoluted high-intensity peak translate into proportionally large variations in the smaller peaks. Similarly, low-abundance and especially light-elements are strongly affected by subtle variations in the deconvoluted background intensity and so also show reduced precision. Nevertheless, relative standard deviations can be expected to be <1% for most major and minor elements, <5% for low-Z elements and 1-10% for most trace elements.

Accuracy
Application of standardless quantification calculations to XRF-data is a relatively new development, but variations of these procedures have been in use with some EMPA systems for ∼20 years. As discussed by Newbury and Ritchie (2013), many EMPA studies using standardless quantification procedures consider only the errors associated with analytical precision and fail to consider the absolute accuracy of the quantification technique. An earlier study (Newbury et al., 1995) showed that, for standardless quantification procedures, the relative errors ([Measured − True]/True × 100%) were ±∼25% for major and minor elements, while modern commercially available standardless quantification protocols yielded relative errors for major elements of up to 30%, resulting in miscalculation of chemical formulae (Newbury and Ritchie, 2013). Such large errors obviously place limitations on the quantitative abilities of these standardless techniques, and so here we assess the accuracy of modern standardless quantification of μ-XRF spectra. Table 1 shows that, for standard GSP-2, the deviation from the expected value (i.e. the trueness of the measurement) is much larger than the instrumental errors. In general, the largest relative deviations are associated with low-abundance elements (<1 wt.%), but the large relative error on the K measurement (37%) is an exception. For GSP-2, most elements, including trace elements, are within ±50% of the true value. The large relative error (>2000%) on the Co data is likely to be due to difficulty deconvoluting the Co and Fe peaks.
FIG. 11. Plots of the coefficient of variation (relative percentage error) against the true concentration. (a) Black crosshairs represent all data from all standards. Black circles are the same but excluding Na2O, MgO, Ba, La and Ce measurements and are fitted by a power law. (b) Errors are more strongly influenced by concentration for the lighter elements.
Relative percentage errors for some of the other standards, however, are much larger, with 1σ relative errors much greater than 100% for many elements and some trace elements being overestimated by an order of magnitude (Co and Ba). The deviation of the measured values (mean of 10 analyses) from the published values for standards GSP-2 (granite) and BHVO-2 (basalt) is compared in Fig. 12. For trace elements especially, the data from BHVO-2 are less accurate than for GSP-2.
Some elements were not detectable in the reference materials. In some cases this is probably due simply to low element concentrations (e.g. V or Cr in OOL-31A, GSP-2, AGV-2 and T1-G; when not detected the published value is always <60 ppm). Lanthanum and Ce prove to be difficult to detect quantitatively, despite standards containing concentrations as high as 182 ppm (La, GSP-2). This is probably due to difficulties in deconvoluting the La and Ce characteristic Lα and Lβ X-ray peaks from the larger Ba (Lα and Lβ) and Ti (Kα and Kβ) peaks. Phosphorus was not detectable quantifiably in any standard, despite being present in concentrations up to 0.49 wt.% P2O5 (AGV-2, OOL-31A) and a small peak being visible in the ROI for P on many of the spectra. Close inspection of a number of spectra suggests that deconvolution of the ZrLα1 peak (2.044 keV) may interfere with detection of the PKα1 peak (2.010 keV) and that an estimate of the P2O5 concentration can be given by excluding Zr from the quantification procedure, although this still under-estimates the P2O5 content of the standards. In the case of OOL-31A (an internal reference material with a published P2O5 content of 0.49%; measured concentration of 0%) the discrepancy between the published and measured concentration may be an artefact of bulk sample inhomogeneity; the published analysis was carried out using standard XRF techniques on a powdered sample which would incorporate any rare apatite crystals in the rock. Conversely, our analyses were carried out on a comparatively small volume of crystal-free obsidian and so any contribution of P from apatite crystals would not be measured.
FIG. 12. Accuracy (published/measured concentrations) for all quantified geochemical elements in GSP-2 and BHVO-2. Data that plot closer to the black horizontal line are more accurate than those that plot further away. Data where the measured concentration of an element is 0 are not included on this plot.
Many of the observed large deviations from true values are probably the result of the fundamental parameters employed in the quantification procedure not being completely exact, the measurement geometry not being exactly as described by the Sherman equation, and inadequate assumptions about the stoichiometric proportions of oxygen when calculating oxide concentrations. As a result, quantification of silicate materials will benefit from an additional level of calibration. The M4 Tornado software includes a 'Type Calibration' function that introduces a calibration factor for each element into the quantification algorithm. The calibration factor is calculated as the true concentration divided by the measured concentration on an appropriate standard. As we have already noted, the accuracy of the data seems to vary with composition of the material and so type-calibrations should use calibration factors derived from a standard of similar composition and matrix to the unknown. Figure 13 shows how the calibration factors calculated for each element in each reference material vary as a function of abundance for two major and two trace elements, with deviation from unity acting as a proxy for inaccuracy.
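The definition of the calibration factor can be illustrated with a simplified sketch. This is not the implementation in the instrument software, which applies the factors inside the quantification algorithm; here the factors are applied to already-calculated concentrations, which is an assumption of the sketch. All concentrations and function names are hypothetical.

```python
def calibration_factors(certified: dict, measured: dict) -> dict:
    """Per-element factor = certified (true) / measured concentration on a standard.
    Elements that were not detected (measured value 0 or missing) are skipped."""
    return {el: certified[el] / measured[el]
            for el in certified if measured.get(el, 0) > 0}

def apply_type_calibration(measured_unknown: dict, factors: dict) -> dict:
    """Scale standardless results by the factors; elements without a factor are unchanged."""
    return {el: value * factors.get(el, 1.0) for el, value in measured_unknown.items()}

def trueness_percent(measured: float, true: float) -> float:
    """Relative deviation from the true value: (measured - true) / true * 100."""
    return 100.0 * (measured - true) / true

# Hypothetical standard: an oxide in wt.%, a trace element in ppm.
certified = {"K2O": 5.0, "Rb": 200.0}
standardless = {"K2O": 4.0, "Rb": 400.0}

factors = calibration_factors(certified, standardless)
calibrated = apply_type_calibration(standardless, factors)

print(factors)
print(calibrated)
print(trueness_percent(standardless["Rb"], certified["Rb"]))
```

A factor below unity indicates over-estimation by the standardless routine and a factor above unity indicates under-estimation, so the factors are only transferable to unknowns of similar matrix and composition.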
Trace elements tend to be highly over-estimated, with calibration factors ranging from <0.1 to ∼0.8, and increasing with increasing abundance. For Nb (all reference materials <50 ppm), plotting the calibration factor against abundance yields a linear correlation (R² = 0.94) while for Rb (all reference materials <250 ppm) a logarithmic fit (R² = 0.96) describes the distribution; the difference in fit between Nb and Rb is probably due to the difference in abundance, with many trace elements generally showing a steep increase in calibration factor between 0 and 100 ppm. There is no apparent difference in behaviour between glass and powdered matrix for trace elements.
FIG. 13. Variation in calibration factor (expected value/measured value) for Al₂O₃, Fe₂O₃, Rb and Nb. Black circles represent glass and grey circles powdered reference materials. Trend lines represent regression lines (linear and polynomial), fit to either glass or powder data or both, as described in the text. Horizontal black line on the major elements represents unity, i.e. coherence between measured and expected values.

Most major elements show no systematic relationship between elemental abundance and calibration factor. Exceptions are Al and Fe, measured as the oxides Al₂O₃ and Fe₂O₃ (Fig. 13). Fe₂O₃ shows a slight decrease in calibration factor, away from unity, with increasing abundance, suggesting that Fe analyses are more accurate at lower concentrations. As a whole, this trend gives a poor correlation, but when considered in terms of sample matrix (powder and glass), two trends, each described by a second-degree polynomial regression with R² = 1, become apparent. This suggests that sample matrix (glass/powder/crystal) influences the accuracy of Fe analyses, but more work is needed on a wider range of standards to confirm these trends. Al₂O₃, which contains one of the lighter (and thus more difficult to measure) elements, shows the highest deviation from expected values at lower concentrations (<14 wt.%), while calibration factors for concentrations >14 wt.% are close to unity. The Al₂O₃ calibration factor for BHVO-2 is notably low (0.89) compared to the other reference materials. BHVO-2 is a powdered basalt, and a similar Al₂O₃ calibration factor was observed for an additional in-house basaltic powder reference material (data not published). If the BHVO-2 data are discounted, the relationship between Al₂O₃ abundance and calibration factor can be described by a third-order polynomial regression (y = 0.002x³ − 0.001x² − 0.1257x + 2.3957, where y = calibration factor and x = abundance in wt.%) with R² = 0.9844 (n = 5). A probable reason for the anomalous behaviour of Al₂O₃ in basaltic materials was provided by Perrett et al. (2014), who observed similar behaviour when analysing powdered basalt from Iceland using combined Particle Induced X-ray Emission and XRF. They suggested that, for silicate rocks where the constituent minerals might have very different compositions (e.g. in the case of basalts, Fe-rich pyroxene and Fe-poor plagioclase), problems may occur for light elements when analysing powdered materials because the transmission of characteristic X-rays from the sample will be determined by the individual mineral grains present in the powder, rather than the bulk composition, as is assumed by many spectrum deconvolution and fundamental parameter algorithms.
In such scenarios, the software will assume a high degree of attenuation of Al X-rays due to the high Fe-content, but in reality the Al X-rays are emitted from Fe-poor plagioclase grains and so experience less attenuation; this results in an underestimation of the theoretical Al X-ray yield, and subsequent over-estimation of the Al-abundance, even for fine-grained, well-mixed powders. Calibration factors for many elements during standardless XRF quantification of silicate materials will probably vary with elemental concentration, raw sample material, and matrix of the analysed sample. However, as geological materials may have a wide range of geochemical compositions, appropriate standards may not always be readily available. To assess whether applying a single type-calibration can improve analyses for a wide range of materials we recalculated all of our standard and reference material data using a type calibration based on standard AGV-2, which is of intermediate composition. Calibration factors (Supplementary Table S10) for each element were calculated by dividing the expected value by the measured value of AGV-2 (mean of n = 10). Cr was not quantifiably detectable in AGV-2 and so the calibration factor for this element is derived using data from BHVO-2. Lanthanum and Ce were only quantifiably detectable in the most silicic standards (La in OOL-31A and Ce in GSP-2) and, given the difficulties in accurately deconvolving their peaks from Ti and Ba, these have been excluded from the calibrated quantification scheme for simplicity. The new calibrated data were calculated using a two-step quantification process. First the apparent concentration of P₂O₅ in all standards was recalculated by excluding Zr from the quantification, as described above. This value (mean of n = 10) was then used to fix the concentration of P₂O₅ in the second stage, during which the calculated AGV-2 calibration factors were applied to each element.
While this method does not facilitate calibration of the P₂O₅ concentrations, it does at least allow the concentrations to be estimated in most of the standards and manual calibration of the P₂O₅ data can be carried out where appropriate.
These calibrated data are shown in Supplementary Tables S4-S9. Interestingly, in AGV-2, V was detected during the non-calibrated quantification, but not in the calibrated quantification. This may be because, similar to Ce and La, the V Kα line (4.953 keV) has a strong overlap with Ti and Ba peaks. P₂O₅ data still show large errors (up to 100% when not quantifiably detected) but are improved overall compared to the original quantification.
Calibrated and non-calibrated data are compared in detail in Fig. 14 for three of the reference materials, BHVO-2 (basaltic composition), T1-G (intermediate composition) and GSP-2 (silicic composition), together with data for all of the reference materials. Calibration improves accuracy for most trace elements in all of the standards, but major- and minor-element accuracy was only improved in AGV-2 (the standard used to generate most of the calibration factors), and somewhat in GSP-2 and OOL-31A. These reference materials are the most silicic and, as previously noted, the non-calibrated data also seem to be more accurate for more silicic samples. Much of this may be explained by the tendency of more silicic rocks to contain higher trace-element concentrations, which are thus easier to measure, but the observed decrease in accuracy with increasing Fe-content suggests that there are problems associated with quantifying Fe. This most probably relates to incorrect assumptions regarding the oxidation state of iron when calculating FeO or Fe₂O₃ concentrations and may affect the accuracy of other elements due to subsequent assumptions regarding the sample matrix. Application of a correction factor in this situation may magnify these errors, resulting in a decrease of accuracy for many major elements. As a result, we recommend that type-calibration is only carried out for major- and minor-element quantification if standards of comparable composition and matrix are available, and/or if the oxidation state of the iron is known. For trace elements, however, application of a type-calibration seems to improve accuracy (reducing relative errors to <100%), even for samples of significantly different composition to the standard; many trace-element determinations are more accurate than the 2σ relative errors of 50% for trace elements measured with standardless EMPA (Fialin et al., 1999; Pyle, 2005; Imayama and Suzuki, 2013).
When considered in isolation, the relative errors of the appropriately calibrated data still seem relatively high. However, when considered in terms of the absolute values they represent, these errors are much more acceptable (Fig. 14). Relative errors on major- and minor-element oxides may exceed 100%, but this translates to absolute errors of <2 wt.% for major and <0.2 wt.% for minor elements (when Z ≥ 19). When calibrated, relative errors on trace elements may still exceed 100%, but the majority of these data fall within ±50 ppm of the true value.
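The relationship between relative and absolute error used in this comparison is simple arithmetic, and a short Python sketch may make it concrete. The helper names and the example concentrations below are hypothetical and chosen only for illustration; they are not values from the reference materials.

```python
def absolute_error(true_value, relative_error_percent):
    """Absolute error (same units as true_value) for a given relative error in %."""
    return true_value * relative_error_percent / 100.0


def relative_error_percent(measured, true_value):
    """Relative deviation of a measurement from the true value, in %."""
    return 100.0 * abs(measured - true_value) / true_value


# Hypothetical trace element at 30 ppm with a 100% relative error.
trace_abs = absolute_error(30.0, 100.0)   # absolute error in ppm

# Hypothetical minor-element oxide at 0.15 wt.% with a 100% relative error.
minor_abs = absolute_error(0.15, 100.0)   # absolute error in wt.%
```

A 100% relative error therefore corresponds to an absolute error equal to the abundance itself, which is why large relative errors at low abundance can still represent small absolute deviations.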
These errors are significantly higher than those associated with traditional XRF analyses. Appropriate use of standards for type-calibration may improve the accuracy, but more work is needed to identify the factors that influence whether a standard is appropriate or not (composition, matrix). However, this conclusion is based on data from elementally complex materials and it is possible that quantification is more reliable on simpler (purer) materials such as minerals. More work is needed to identify suitable mineral standards to test this and develop optimized analysis protocols. In the meantime, the accuracy will be sufficient to roughly characterize a sample for many applications, including those where EMPA, normal XRF or ICP-MS techniques are either unavailable or inappropriate, but benchtop μ-XRF cannot yet provide a substitute for these techniques.
FIG. 14. Raw and calibrated data expressed in terms of the % deviation from the true value in relation to the true elemental abundance. Log scale on both axes. Also plotted are lines showing the percentage relative error for absolute errors of between ±1 ppm and ±2 wt.% on concentrations of between 0.1 ppm and 100 wt.%. Non-calibrated data (grey) are compared with data calibrated to intermediate-composition AGV-2 (black). Data available in Supplementary Tables S4-S9.
We recommend that appropriate standards are used to develop type calibrations for full quantification using benchtop μ-XRF, but in the absence of appropriate standards, the quantification can be improved by applying a type-calibration to the trace elements only, resulting in typical relative errors of <50% for most major and minor elements and up to 100% for most trace elements (Fig. 14). Where an appropriate standard and type-calibration are available, the typical 1σ relative errors may be reduced to <5% for most major elements and <6% for some minor elements, although quantification of low-abundance light elements (P₂O₅, Na₂O, MgO) remains a problem. We note that application of a type-calibration to trace elements should only be carried out on analyses with good counting statistics (i.e. longer measurement times), otherwise the error propagation associated with the calibration can result in errors that are larger than the initial deviations.
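The caution about error propagation can be illustrated with the generic first-order rule for a product or quotient, in which independent relative uncertainties add in quadrature. The Python sketch below is a textbook approximation under stated assumptions (independent errors, negligible uncertainty on the published value of the standard); it is not the procedure implemented in the instrument software, and the function name and input values are hypothetical.

```python
import math


def calibrated_relative_sd(rel_sd_unknown, rel_sd_standard):
    """First-order relative SD of (measured_unknown * true_std / measured_std).

    Assumes the two measurements are independent and that the published
    value of the standard carries no uncertainty (both are assumptions).
    """
    return math.sqrt(rel_sd_unknown ** 2 + rel_sd_standard ** 2)


# Hypothetical short measurements: 30% relative SD on the unknown and
# 40% on the standard give a combined relative SD of about 50%.
poor_statistics = calibrated_relative_sd(0.30, 0.40)

# Hypothetical longer measurements: 3% and 4% give about 5%.
good_statistics = calibrated_relative_sd(0.03, 0.04)
```

Because the combined uncertainty can never be smaller than the larger of the two inputs, a calibration factor derived from a poorly counted standard degrades every analysis it is applied to.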

Summary, implications and recommendations
Overall, benchtop μ-XRF instruments present a variety of advantages and disadvantages compared to established in-house microanalysis techniques. The larger sample chamber compared to SEM-EDS, EMPA and ICP-MS instruments allows a greater range of sample sizes and shapes to be analysed. This, combined with the lack of preparation needed for many samples, means that such instruments are excellent tools for first order sample characterization and phase identification.
For element mapping, high quality maps can be produced from flat but unpolished surfaces, when mapping strong concentration contrasts (e.g. due to different mineral phases) in elements Z > 13, although lighter elements (Z < 19) benefit from a polished sample surface due to the shallower information depth of their characteristic X-rays. For major elements with Z < 13 (i.e. Na and Mg) elemental distribution maps are only feasible for particularly high concentrations and contrasts. In some cases, line scans can provide more detailed information than element mapping. For many elements, subtle concentration contrasts are difficult to map, especially at small scales. Therefore, for mapping of light elements and of subtle concentration differences, μ-XRF can only provide limited data and cannot yet compete with SEM-EDS or LA-ICP-MS techniques. For some applications (characterization of large samples; mineral separation), however, benchtop μ-XRF is an unrivalled technique due to its ability to analyse large samples with minimal sample preparation.
The concentration contrasts that are resolvable via element mapping/line scans are sample specific and will depend on the elemental concentration, Z, the sample surface and sample matrix (material and composition). The case studies provided above illustrate the type of scenarios that are well-suited to, or approach/exceed the capabilities of benchtop μ-XRF (e.g. perthite texture in alkali feldspars can be imaged regardless of difficulties in analysing sodium, but subtle calcium concentration differences in oscillatory zoned plagioclase crystals cannot be fully resolved).
Semi-quantitative data can be acquired by FP-based 'standardless' quantification of X-ray spectra. In the reference materials analysed, most elements present with Z > 11 (Na) were detected quantitatively, although detecting the minor element phosphorus as P₂O₅ (<1 wt.%) was difficult or impossible in many samples. Quantitative detection was also difficult for particularly low-abundance (<200 ppm) trace elements whose characteristic X-ray peaks overlap with those of higher-concentration elements (La, Ce) and for lighter trace elements at concentrations of <60 ppm (V, Cr). For other trace elements, however, detection was possible at concentrations as low as 10 ppm (Rb in BHVO-2, Zr in GOR-132). Measurement reproducibility is generally proportional to concentration. For major elements other than Na₂O and MgO, the coefficient of variation on ten measurements is <1% and often <0.5%; for Na₂O and MgO it is <8% and for minor elements it is <2%. The coefficient of variation for trace elements generally ranges from ∼0.3 to 51%, depending on concentration and ease of peak deconvolution. These relative errors translate to maximum standard deviations of <0.2 wt.% for major, <0.02 wt.% for minor and <10 ppm for most trace elements, and so benchtop μ-XRF can be a useful semi-quantitative tool when distinguishing materials with concentration differences greater than this.
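For readers wishing to reproduce this kind of reproducibility figure, the coefficient of variation is the standard deviation of repeat measurements expressed as a percentage of their mean. The Python sketch below uses the sample standard deviation (n − 1 denominator), which is an assumption because the text does not state which estimator was used; the ten repeat values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# Ten hypothetical repeat measurements of one major-element oxide (wt.%).
repeats = [15.02, 14.98, 15.05, 14.97, 15.01, 15.00, 14.99, 15.03, 14.96, 15.04]

cv_percent = coefficient_of_variation(repeats)
sd_wt_percent = statistics.stdev(repeats)  # the same spread as an absolute SD
```

Reporting both numbers is useful because a small coefficient of variation on a major element and a large one on a trace element can correspond to comparably small absolute standard deviations.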
Accuracy of the FP based standardless quantification is lower for silicate materials than for the metal alloys commonly used by instrument manufacturers to demonstrate the accuracy of the technique and is generally more accurate for high-Si, low-Fe samples than for low-Si, high-Fe samples. Measured concentrations may deviate from true values by up to 2 wt.% for major elements and tens to hundreds (rarely >1000) of ppm for trace elements. More accurate quantification is possible by analysis of a standard and using a type-calibration to correct the data. Correction factors for a given element do not always follow a linear relationship with abundance and are not necessarily valid across different materials (e.g. glass vs. powder), and so the standard used for the type calibration should be of similar composition and matrix to the material of interest. Type calibration with an appropriate standard allows determination of most trace-element concentrations to within 50 ppm of the true value, although this significantly increases the effort of data acquisition. Furthermore, extra care is required for quantitative analysis of powders derived from multi-mineralic samples, if minerals with very different compositions (high and low Fe-content) are present, due to incorrect assumptions by many FP algorithms regarding the homogeneity of X-ray attenuation in the sample. The accuracy of elements that yield low X-ray intensities and overlap with other peaks (e.g. Ce and La with Ba and Ti; P with Zr) will be affected by the concentration of the overlapping peaks and the ability of the software to deconvolve the relevant peaks; detection limits for different elements will thus vary between samples due to different bulk composition and matrix. 
Considering these problems, benchtop μ-XRF is not an optimum tool for routine quantitative geochemical analysis of bulk rock samples or for high-accuracy trace-element determinations, for which established XRF and ICP-MS techniques will probably give better results. However, for quick and easy distribution analysis and semi-quantitative geochemical analysis, including for trace elements, benchtop μ-XRF has the potential to be a very powerful tool for the geoscience community.