Life detection in Martian returned samples: correlation between analytical techniques and biological signatures

Abstract As soon as samples collected from Mars are brought back to Earth, they will be placed inside a receiving facility to check for the presence of life. Many approaches have been proposed regarding the techniques to be used to investigate the presence of life and any biological risk in the returned samples. Another approach was reported by Kminek, who made suggestions on how to organize the sample analysis sequence within the facility. Finally, another study suggested a long list of techniques capable of measuring biological signatures based on their general characteristics: global, morphological, mineralogical, organic, molecular and biochemical, and isotopic. Despite the effort of the cited studies, there is still a need for a critical approach that makes an actual comparison between the techniques, with the aim of ranking them. In this work, we focused on the construction of a correlation matrix that relates biosignatures to analytical techniques. It is known that a number of techniques can detect a given biological signature and, at the same time, each technique can be applied to multiple biological signatures. This method makes it possible to summarize all this information in an easily consulted form, and also to define quantitatively how strong each correlation is.


Introduction
When dealing with life on Mars, the aim is to search for biosignatures of ancient or present life: something that suggests the presence of a biological process indicative of life.
Space agencies are making an effort to design a Mars sample return mission. This means that, as soon as the samples collected from the surface of Mars are brought back to Earth, they will be placed inside a receiving facility to check for these biosignatures. Hence, properly setting up receiving facilities to analyse Mars samples is of extremely high interest to the international Mars exploration community.
In recent years, many studies have been published to identify life-correlated Martian biosignatures. The most detailed one is probably the international MSR Objectives and Samples Team (iMOST) study (Beaty et al., 2018a, 2018b), chartered by the International Mars Exploration Working Group (IMEWG). The study assessed the expected value of the samples to be collected by the Mars 2020 rover mission and identified the most promising categories of biosignatures. A large number of techniques are now available to investigate biosignatures, and various approaches have been proposed. Wilson (1999) proposed a detailed set of instruments to investigate the presence of life and the correlated biological risk in the returned samples. Kminek et al. (2014) made suggestions on how to organize the sample analysis sequence within a receiving facility. Race (2009) suggested a long list of techniques to detect biosignatures starting from their general characteristics: global, morphological, mineralogical, organic, molecular and biochemical, and isotopic patterns.
Despite the effort of the cited studies, there is still a need for a critical approach that makes an actual comparison between the techniques, with the aim of ranking them.
Starting from this premise, in this work we focused on the construction of a correlation matrix that relates biosignatures to analytical techniques. The major drivers we took into account were to define which techniques are really important and which can be considered optional, rationalizing the activity flow inside a Martian curation facility and supporting the design choices of the curation. It is known that a single biosignature can be detected by a number of techniques and, at the same time, a single technique can be applied to a number of biosignatures. The method we propose aims to summarize, in an easily readable manner, the information on biosignatures and techniques and to define quantitatively how strong their correlations are.

The correlation matrix method
The correlation matrix is a structured approach that translates customer needs into specific plans able to meet those needs. The matrix is a quality management tool originally developed in Japan in the 1960s (Akao, 1994) and has evolved considerably since then (Cohen and Ficalora, 2009).
The philosophy of this technique is to design a product or a process according to the functions expected by the customer. Akao (1994) described the tool as a 'method to transform qualitative user demands into quantitative parameters, to deploy the functions forming quality, and to deploy methods for achieving the design quality into subsystems and component parts, and ultimately to specific elements of the manufacturing process'.
To build the matrix, the first step is to establish the characteristics and quality attributes of the product desired by the customer (the so-called VOC, voice of customer), then to identify the engineering characteristics (the so-called CTQ, critical to quality) that may be relevant to those desires, and finally to build the matrix itself. This can happen at all organizational levels, from design to the creation and distribution of the finished product or service.
As an example, at the design level, the method can be used to translate a customer desire (e.g. the ease of writing for a pen, the VOC) into design characteristics (dimension and shape of the pen section, ink viscosity, dimension of the ball-point, pressure on the ball-point, etc., the CTQs) at each stage of product development.
Currently, the correlation matrix approach is widely applied in a large variety of fields: engineering, manufacturing, research and development (R&D), transactional processes, etc.
Initially, the main steps of the process are:
1. define the VOCs;
2. identify the CTQs that are relevant for the VOCs;
3. build the correlation matrix, identifying the correlations between VOCs and CTQs.
This is a qualitative first-step approach, helpful to summarize data but unable to give information about the strength of the correlations. Translating the matrix into a quantitative form gives a more detailed and critical method. In particular, the additional steps of the process are:
4. quantify the importance of the VOCs, rated on a defined scale (input);
5. quantify the VOC/CTQ correlations, rated on a defined scale (input);
6. define the appropriate algorithms (input) to calculate the importance of the CTQs (output).
A representation of the qualitative versus the quantitative approach of the correlation matrix is shown in Fig. 1.
It is important to highlight that each input element of the correlation matrix (the VOCs, the CTQs, the importance rating and the values attributed to each VOC, the correlation rating and the values attributed to each VOC/CTQ correlation, and the algorithms needed to calculate the CTQ importance) should be the result of a collegial activity in which the owners of the object/process (e.g. customers, designers, producers, experts, etc.) work together to agree on values and functions. This is a core requirement for applying the method fully and for robustly converting the qualitative approach into a quantitative one.
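The six steps above can be sketched on the pen example. This is a minimal illustration, not the authors' implementation: the second VOC ("comfortable grip"), all importance and correlation values, the 0/1/3/9 scale and the weighted-sum algorithm are assumptions chosen only to show the mechanics.

```python
# Toy correlation matrix for the pen example. Every number is hypothetical.
# Steps 1 and 4: VOCs and their importance (assumed 1-5 scale).
voc_importance = {"ease of writing": 5, "comfortable grip": 3}

# Steps 2, 3 and 5: CTQs and the VOC/CTQ correlations (assumed 0/1/3/9 scale).
correlations = {
    "ease of writing": {"ink viscosity": 9, "ball-point size": 3, "pen section shape": 1},
    "comfortable grip": {"ink viscosity": 0, "ball-point size": 0, "pen section shape": 9},
}


def ctq_importance(voc_importance, correlations):
    """Step 6 (assumed algorithm): sum of VOC importance x correlation per CTQ."""
    totals = {}
    for voc, row in correlations.items():
        for ctq, value in row.items():
            totals[ctq] = totals.get(ctq, 0) + voc_importance[voc] * value
    return totals


importance = ctq_importance(voc_importance, correlations)
# CTQs ordered from most to least important.
ranking = sorted(importance, key=importance.get, reverse=True)
print(importance)
print(ranking)
```

With these placeholder inputs, ink viscosity scores highest because it is strongly correlated with the most important VOC. Changing either the importance ratings or the correlations changes the ranking, which is why the inputs need to be agreed collegially.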

Method application
The correlation matrix presented in this work aims to correlate the possible Martian biosignatures (VOCs) with the currently available life detection techniques (CTQs) by means of a quantitative approach.
An earlier version of our correlation matrix was published (Longobardo, 2021) as a result of the European Curation of Astromaterials Returned from Exploration of Space (EURO-CARES), a three-year (2015-2017) multinational project funded under the European Commission's Horizon 2020 research programme to develop a roadmap for a European Extra-terrestrial Sample Curation Facility (Smith et al., 2021). The matrix, shown in Fig. 2, correlated five categories of biosignatures (the same as in this work) with a number of techniques. The matrix was not Mars-centred: it also took into account biosignatures from other potential mission targets, such as the icy moons of the outer Solar System.
In 2017, the iMOST study assessed the most promising Martian biosignatures, as part of the support activities needed for the NASA Mars 2020 mission. This gave us the opportunity to update the old matrix by building a new one, more closely focused on Mars samples (Fig. 3, where the input values are shown in italic while the calculated output values have a gradient-coloured background).

International Journal of Astrobiology
• Chemical biosignatures: abundance patterns of metals used by biota, elevated concentrations of organic matter, abundance patterns of elements involved in redox reactions (e.g. S, N, transition metals, etc.), redox boundaries in rocks (redox 'fronts') and atmospheric gases (CH4, O2, etc.);
• Organic biosignatures: structures of individual molecules, relative abundances of molecules, molecular weight distributions, abundances of species containing C, H, N, O, P, S;
• Isotope biosignatures: isotopic patterns within organic molecules, isotopic patterns between individual organic molecules, isotopic patterns between classes of organic compounds, isotopic patterns between oxidized versus reduced compounds that contain C, N, S, and redox metals;
• Mineralogical biosignatures: biogenic minerals (e.g. carbonates, sulphur minerals, phosphates, phyllosilicates, transition metal oxides, etc.).
Each sub-category was given an importance rating indicating how strongly that biosignature would support the presence of life on Mars. Two ratings were agreed: highly (H) and possibly (P) diagnostic biosignatures.
In the quantitative approach, the highly diagnostic biosignatures have an importance value 33% higher than the possibly diagnostic ones.
Once the biosignatures were defined, a number of techniques able to detect all of them were identified, and then the correlation matrix was built.
The correlation values between biosignatures and techniques were defined on a non-linear scale from 0 to 9 (0, 1, 3, 9), in which the non-zero values grow geometrically. A non-linear scale is commonly applied in correlation matrices because it increases the sensitivity of the matrix to high correlations, emphasizing the numerical result. Value 0 is given if no correlation exists (absence of correlation), 1 if the technique is not specific for the biosignature but still gives some information on it at low resolution (low correlation), 3 if the technique is suitable for the biosignature, although not specific, with medium resolution (medium correlation), and 9 if the technique is very specific for the biosignature, with high resolution (high correlation).
Given these inputs, it was possible to start evaluating the techniques. First, three values were calculated:
• Biosignatures Occurrence, the number of biosignatures that can be detected by a single technique. This is still a qualitative value and it gives information about the versatility of each technique: the higher the number, the more versatile the technique.
• Techniques Mean Value, the mean correlation of each technique with the biosignatures it detects. For each column (technique) the value is calculated by dividing the sum of the correlation values by the biosignature occurrence. This is another way to evaluate the versatility of the technique.
• Techniques Importance, the degree of correlation of a single technique with all the listed biosignatures. For each column (technique) the value is calculated as the sum of the products of the biosignature importance and the correlation value.
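The three column-wise values can be illustrated with a short Python sketch. The biosignature and technique names and all matrix entries are hypothetical placeholders, not the entries of Fig. 3. The importance weights (1.0 for P, 1.33 for H) are one assumed reading of the 33% difference, and "detected" is assumed to mean a non-zero correlation.

```python
# Hypothetical inputs: names and numbers are illustrative only.
# Assumed weights: 1.0 = possibly diagnostic (P), 1.33 = highly diagnostic (H).
importance = {"biosig_A": 1.33, "biosig_B": 1.0, "biosig_C": 1.33}

# Rows are biosignatures, columns are techniques, values on the 0/1/3/9 scale.
techniques = ["tech_X", "tech_Y"]
matrix = {
    "biosig_A": [9, 0],
    "biosig_B": [3, 1],
    "biosig_C": [0, 9],
}


def technique_scores(techniques, matrix, importance):
    """Compute occurrence, mean value and importance for each technique."""
    scores = {}
    for j, name in enumerate(techniques):
        column = {b: row[j] for b, row in matrix.items()}
        # Occurrence: biosignatures with a non-zero correlation (assumption).
        occurrence = sum(1 for v in column.values() if v > 0)
        # Mean value: sum of correlations divided by the occurrence.
        mean_value = sum(column.values()) / occurrence if occurrence else 0.0
        # Importance: sum of (biosignature importance x correlation).
        tech_importance = sum(importance[b] * v for b, v in column.items())
        scores[name] = {
            "occurrence": occurrence,
            "mean_value": mean_value,
            "importance": tech_importance,
        }
    return scores


scores = technique_scores(techniques, matrix, importance)
for name, values in scores.items():
    print(name, values)
```

Because the mean value divides by the occurrence rather than by the total number of biosignatures, a technique that detects few biosignatures but each with high correlation can still obtain a high mean value.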
To refine the technique importance, a destructive/non-destructive coefficient was introduced to increase the value of non-destructive techniques: techniques able to preserve the pristine state or integrity of the sample are considered better than destructive ones. The coefficient is 1 if the technique is destructive, 1.1 if partially destructive, and 1.33 if non-destructive.
Using the destructive/non-destructive coefficient, it was finally possible to calculate the final value of each technique: • Techniques Overall Importance, the final degree of correlation of a single technique with all the listed biosignatures. For each column (technique) the value is calculated as the product of the technique importance and the destructive/non-destructive coefficient.
As already described in general terms in the section on the correlation matrix method, it is necessary to highlight that the input data of the matrix were chosen and quantified through a discussion process that involved a large number of participants. We integrated the results of the collegial work done by the EURO-CARES working group (Brucato and Meneghin, 2016), which led to the earlier version of the matrix, with those of a new collegial panel of experts in order to build this new Mars-centred version of the matrix.
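The calculated values described in this section can be written compactly. The symbols are our own notation, introduced here only for illustration: $w_i$ is the importance of biosignature $i$, $c_{ij} \in \{0, 1, 3, 9\}$ is the correlation between biosignature $i$ and technique $j$, and $k_j$ is the destructive/non-destructive coefficient of technique $j$. Counting a biosignature as detected when $c_{ij} > 0$ is an assumption.

```latex
\begin{align}
N_j &= \bigl|\{\, i : c_{ij} > 0 \,\}\bigr| && \text{(biosignatures occurrence)} \\
M_j &= \frac{1}{N_j} \sum_{i} c_{ij} && \text{(techniques mean value)} \\
I_j &= \sum_{i} w_i \, c_{ij} && \text{(techniques importance)} \\
O_j &= k_j \, I_j, \qquad k_j \in \{1,\ 1.1,\ 1.33\} && \text{(techniques overall importance)}
\end{align}
```

As a purely numerical illustration with made-up values, a non-destructive technique with $I_j = 30$ obtains $O_j = 1.33 \times 30 = 39.9$, while a destructive technique with $I_j = 36$ keeps $O_j = 36$. The coefficient can therefore reverse the order of two techniques whose importance values are close.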

Main results and discussion
Analysing the correlation matrix, the following main observations can be made:
• There are 19 biosignatures (VOCs), grouped into five categories.
• Eleven of these biosignatures are considered highly diagnostic for Martian life.
• There are 18 techniques (CTQs) able to detect the biosignatures.
• All the techniques have at least one high correlation with the biosignatures; only XRD has a single high correlation (with biogenic minerals). The biosignature occurrences of the techniques range between 3 and 12.
• The technique mean values range between 3.6 and 9.0, and their average is 6.5. These data indicate that the set of techniques was properly chosen by the panel of experts.
• The minimum number of techniques able to detect all the biosignatures, with any level of correlation, is 3 (e.g. optical microscopy, FTIR and SIMS).
• The minimum number of techniques able to detect all the biosignatures, with a high correlation, is 5 (e.g. SEM, MALDI-TOF, GC-MS, TEM and XRF).
• The minimum number of techniques able to detect all the highly diagnostic biosignatures, with any level of correlation, is 2 (e.g. SEM and LDI-MS).
• Table 1 ranks the techniques according to their overall importance (which takes into account the destructive/non-destructive coefficient). The ranking according to importance alone is slightly different, because the destructivity of the techniques has a non-negligible impact.
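Finding the minimum number of techniques that together detect every biosignature is a set-cover problem. The sketch below shows one way to solve it by exhaustive search, which remains feasible for 18 techniques (2^18 subsets). It is not necessarily how the authors obtained their figures, and the technique names, biosignature names and correlation values are hypothetical placeholders rather than the entries of Fig. 3.

```python
from itertools import combinations

# Hypothetical data: technique -> {biosignature: correlation on the 0/1/3/9 scale}.
correlations = {
    "tech_X": {"b1": 9, "b2": 3, "b3": 0, "b4": 0},
    "tech_Y": {"b1": 0, "b2": 9, "b3": 1, "b4": 0},
    "tech_Z": {"b1": 1, "b2": 0, "b3": 9, "b4": 9},
}


def minimum_cover(correlations, targets, threshold=1):
    """Return all smallest sets of techniques that detect every target
    biosignature with a correlation of at least `threshold`
    (1 = any level of correlation, 9 = high correlation only)."""
    names = sorted(correlations)
    for size in range(1, len(names) + 1):
        found = [
            combo
            for combo in combinations(names, size)
            if all(
                any(correlations[t].get(b, 0) >= threshold for t in combo)
                for b in targets
            )
        ]
        if found:  # stop at the first (smallest) size that covers everything
            return found
    return []


targets = ["b1", "b2", "b3", "b4"]
any_level = minimum_cover(correlations, targets, threshold=1)
high_only = minimum_cover(correlations, targets, threshold=9)
print(any_level)
print(high_only)
```

Restricting `targets` to the highly diagnostic biosignatures would give the third kind of minimum set listed above. The search returns every minimal combination, which matters because several combinations of the same size can exist and other criteria (e.g. overall importance) can then be used to choose among them.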
The correlation matrix has to be considered a 'living' tool to help scientists and designers make the right choices when setting up a sample receiving facility. The matrix we presented here is a kind of 'snapshot' of the state of the art of what we currently know about Martian samples, expected biosignatures and currently available detection technologies. Nevertheless, the matrix can be easily modified if advances in skills and knowledge lead to changes in the biosignatures and techniques fields.
This can also happen in case of specific needs:
• The nature of the samples itself may vary (e.g. gases, ice, regolith, dust, etc.) and be well known before the return to Earth. This could lead to a more detailed and specific correlation matrix that evaluates only the expected biosignatures and the correlated techniques.
• If the amount of returned sample were very small, it could be decided to give priority to non-destructive techniques, in order to preserve the integrity of the samples as much as possible. This could result in a further increase of the value of the destructive/non-destructive coefficient, or in the elimination of destructive techniques.
• In case of use in BSL-4 laboratories, the techniques must be compatible with the required environmental standards. This could affect the technique values through the introduction of a further BSL-4 compatibility coefficient, whose value should be chosen according to BSL-4 sizing, presence of waste or ancillary system requirements.
These examples show that the correlation matrix presented here is only one possible version: in case of specific needs and/or receiving facility design specifications, the list of techniques and biosignatures could vary, as could their correlation values; the coefficients themselves could be modified, removed, added or substituted. Furthermore, and for the same reason, the way of using the matrix results can also change: is it more important to use the techniques ranked with a higher overall importance (listed in Table 1), or the minimum number of techniques able to detect all the biosignatures (e.g. optical microscopy, FTIR and SIMS)?
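The effect of adding a further coefficient can be sketched as follows. The destructivity values are those given in the text, but the technique importance of 30, the BSL-4 compatibility value of 0.8 and the choice to apply extra coefficients multiplicatively (by analogy with the destructive/non-destructive coefficient) are assumptions made only for illustration.

```python
# Destructive/non-destructive coefficients as stated in the text.
DESTRUCTIVITY = {
    "destructive": 1.0,
    "partially destructive": 1.1,
    "non-destructive": 1.33,
}


def overall_importance(importance, destructivity, extra_coefficients=()):
    """Technique importance scaled by the destructivity coefficient and by any
    additional multiplicative coefficients (assumed form, e.g. a hypothetical
    BSL-4 compatibility factor)."""
    value = importance * DESTRUCTIVITY[destructivity]
    for coefficient in extra_coefficients:
        value *= coefficient
    return value


# Hypothetical technique with importance 30, non-destructive.
baseline = overall_importance(30.0, "non-destructive")
# Same technique penalized by a made-up BSL-4 compatibility coefficient of 0.8.
with_bsl4 = overall_importance(30.0, "non-destructive", extra_coefficients=[0.8])
print(round(baseline, 2), round(with_bsl4, 2))
```

Keeping the coefficients as a separate list of multipliers means that they can be modified, removed or added without touching the correlation values themselves.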

Conclusions
While trying to find a good method to help scientists and designers build a sample receiving facility, the major questions we tried to answer were:
• Among the currently available detection technologies, how is it possible to choose the ones that are really important and the ones that can be considered optional?
• How can this help to rationalize the activity flows inside the curation?
• How is it possible to support the evaluation of the design choices of the curation?
The correlation matrix is a powerful, adaptable tool that rationalizes, in a table readable at a glance, the major items to deal with when setting up a receiving facility. It allows a subjective approach to be converted into an objective one, helping to rationalize a problem from the boundary conditions to the final solution by giving numbers, ranking the information and helping to guide the decision process. Starting from the matrix results, it is possible to facilitate the design choices (e.g. the choice of the set of techniques allows a better evaluation of the facility layout, depending on the size and position of each instrument, its compatibility with other instruments, the need for ancillary systems, etc.).
Furthermore, the correlation matrix provides a wide set of data that can be used according to different requirements (e.g. the sizing and the layout of the facility, the nature of samples, etc.) and it can be easily modified and adapted (e.g. in case of new skills and knowledge about samples, biosignatures and techniques).