Designing daily-life research combining experience sampling method with parallel data

Abstract
Background. Ambulatory monitoring is gaining popularity in mental and somatic health care to capture an individual's wellbeing or treatment course in daily-life. Experience sampling method collects subjective time-series data of patients' experiences, behavior, and context. At the same time, digital devices allow for less intrusive collection of more objective time-series data with higher sampling frequencies and for prolonged sampling periods. We refer to these data as parallel data. Combining these two data types holds the promise to revolutionize health care. However, existing ambulatory monitoring guidelines are too specific to each data type, and lack overall directions on how to effectively combine them. Methods. Literature and expert opinions were integrated to formulate relevant guiding principles. Results. Experience sampling and parallel data must be approached as one holistic time series right from the start, at the study design stage. The fluctuation pattern and volatility of the different variables of interest must be well understood to ensure that these data are compatible. Data have to be collected and operationalized in a manner that the minimal common denominator is able to answer the research question with regard to temporal and disease severity resolution. Furthermore, recommendations are provided for device selection, data management, and analysis. Open science practices are also highlighted throughout. Finally, we provide a practical checklist with the delineated considerations and an open-source example demonstrating how to apply it. Conclusions. The provided considerations aim to structure and support researchers as they undertake the new challenges presented by this exciting multidisciplinary research field.

Background
The experience sampling method (ESM) is a powerful diary-based tool to assess subjective daily-life data (Christensen, Barrett, Bliss-Moreau, Lebo, & Christensen, 2003; Eisele et al., 2020; Palmier-Claus et al., 2011). Typically, users complete an identical (or quasi-identical) questionnaire repeatedly throughout the day over the course of several days, weeks, or months. These questionnaires are often scheduled using semi-randomized cues, and responding is time-limited to avoid biased or back-filled answers. ESM has been extensively used in the health care sector to describe individuals' wellbeing and symptom course as well as to evaluate therapeutic effects in mental and physical health [Corrigan-Curay, Sacks, & Woodcock, 2018; FDA (Food and Drug Administration), 2022; Oyinlola, Campbell, & Kousoulis, 2016].
Technological advances in wearable devices and passive sensing tools are expected to revolutionize health care. These tools are increasingly used in combination with ESM to enhance daily-life monitoring and to explore a much wider and more comprehensive array of questions (Rehg, Murphy, & Kumar, 2017). Throughout this paper we will refer to wearables and passive sensing data as 'parallel data'. Broadly, parallel data are any data that are collected in parallel to, and with the purpose of supplementing, ESM. Examples of parallel data include but are not restricted to physiological (heart rate, blood pressure, movement), environmental (geolocation), or behavioral (smartphone usage) data. These data have been combined with ESM to study addiction (Bertz, Epstein, & Preston, 2018), affective disorders (Cousins et al., 2011, p. 11; Kim et al., 2019; Minaeva et al., 2020), schizophrenia (Kimhy et al., 2017), and movement disorders (Heijmans et al., 2019a), among others.
Experience sampling and parallel data can be complementary to each other, which explains the growing interest in combining them. ESM can capture the within-day variability of variables that are not, or hardly, measurable with parallel data, such as affect, perceptions, and contextual cues and events. Parallel data, on the other hand, are better suited to capture processes that are hard to measure subjectively, such as physiological parameters like heart rate and skin conductance (van Halem, van Roekel, Kroencke, Kuper, & Denissen, 2020), and behaviors that are notoriously difficult to report, such as internet usage (Yuan et al., 2019). In addition, parallel data can be collected passively and non-intrusively, allowing for higher sampling frequencies (Fig. 1) and longer sampling periods with lower burden on participants (Barnett, Torous, Reeder, Baker, & Onnela, 2020). By combining ESM and parallel data one can overcome each method's limitations and explore their full potential. The combined methodology is promising for both ambulatory research and clinical practice focusing on real-life or real-time symptom tracking (Nahum-Shani et al., 2016). Its role in just-in-time adaptive interventions (JITAIs) in particular is highly anticipated and expected to change the health care landscape. JITAIs are personalized interventions that are provided directly in daily-life, at the right time, and adapted to the patient's needs (Nahum-Shani et al., 2016; Sharmin et al., 2015).
However, as is often the case, technological advances outpace their scientific evidence and the common lack of parallel ecologically valid, contextual information complicates parallel data analysis, validation, and reproducibility (Shortliffe, 1993;Stupple, Singerman, & Celi, 2019;Tackett, Brandes, King, & Markon, 2019). Currently available ESM and parallel data monitoring guidelines focus too often on their respective data sources or their specific use case, and lack dedicated general directions to guide researchers in designing studies that combine them (Baumeister & Montag, 2019;Janssens, Bos, Rosmalen, Wichers, & Riese, 2018;Mehl & Conner, 2013;Palmier-Claus et al., 2011;Rehg et al., 2017). Reproducibility is further threatened by the lack of standardization and the large heterogeneity in measures, methods, and approaches used to combine these two data types (Vaessen et al., 2021).
Consequently, as part of the Belgian-Dutch Network for ESM Research in Mental Health, an expert group focused on combining ESM and parallel data came together to formulate clear points to consider in the various stages of designing such a study. Rather than a rigid guideline, these are general considerations aimed at providing researchers with the necessary structure and support to design and conduct meaningful and reproducible research combining ESM and parallel data.
How to design a study that combines ESM with parallel data?

Research question and hypotheses
The ESM and parallel data are typically combined to (1) enrich subjective self-assessments of ESM with an 'objective' proxy or complementary data source, or (2) provide interpretable contextual or 'ground truth' labels next to volatile high-frequency time series. During the definition of the research question and its hypotheses, the limitations of both data types have to be carefully considered. A graphical representation of the anticipated results of the two variables of interest can be useful to grasp the fluctuation patterns within these variables (Fig. 1). During this hypothetical exploration of the combined dataset, the guiding question should be 'What information does ESM data add to parallel data (or vice versa)?'. An understanding of the data collection, operationalization, and analytical techniques is therefore required. Answering this question helps to consider whether the combined use of ESM and parallel data is justified, meaning whether each data type has its own unique contribution while also maintaining synergy.
[Fig. 1, caption fragment: Heart rate (HR) can be unobtrusively recorded by wrist-worn devices over periods of circa 20 min (Graham et al., 2019). For the accelerometer (ACC) signal, the raw tri-axial signal is shown. The summarizing feature is the variation (scipy.stats.variation, SciPy v1.6.2 Reference Guide, n.d.) of the resulting signal vector magnitude (black dotted line, right y-axis).]
Since the variables of interest will result from two different data types and cover different timelines, it is essential to understand the expected temporal relationship between those variables and to specify the assumptions about the direction of the association of interest. For instance, are we interested in parallel data features (e.g. GPS location) that precede, follow, or coincide with the ESM assessment of, for example, momentary anxiety (Fig. 2)? Or are we interested in relationships over time, such as the effect of moment n on moment n + 1, and if so, how long should the time lag be? In addition, the duration of parallel data corresponding with the ESM measure has to be determined. All these considerations should lead to a schematic draft of the temporal relationship between the variables of interest, as seen in Fig. 2. Preregistration of the hypotheses and the study design will help solidify the expected associations and increase the robustness of the results (Nosek & Lakens, 2014).
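The lagged question raised above (moment n versus moment n + 1) can be made concrete with a minimal sketch. The function name `lagged_pairs`, the record layout, and the 3-hour maximum gap are illustrative assumptions, not choices made in this paper.

```python
from datetime import datetime, timedelta

def lagged_pairs(records, max_gap=timedelta(hours=3)):
    """Pair each assessment (moment n) with the next one (moment n + 1).

    records: list of (timestamp, value) tuples for ONE participant.
    Pairs further apart than max_gap (e.g. overnight) are dropped, since
    such a lag is not comparable to a lag within the same day.
    The 3-hour default is a hypothetical value.
    """
    ordered = sorted(records, key=lambda r: r[0])
    pairs = []
    for (t0, v0), (t1, v1) in zip(ordered, ordered[1:]):
        if t1 - t0 <= max_gap:
            pairs.append((v0, v1))
    return pairs

# Hypothetical ESM scores for one participant
beeps = [
    (datetime(2021, 5, 1, 9, 0), 2.0),
    (datetime(2021, 5, 1, 10, 30), 3.0),
    (datetime(2021, 5, 1, 12, 15), 2.5),
    (datetime(2021, 5, 2, 9, 10), 4.0),  # next morning: gap too long
]
pairs = lagged_pairs(beeps)
```

Deciding in advance which gaps still count as "one lag" is part of specifying the temporal relationship, and is a natural item to include in a preregistration.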
It is important to note that these directional patterns can help 'unpack' mutually occurring temporal relationships, but they cannot prove causal relations due to the observational nature of data collected in the flow of daily-life (Holleman, Hooge, Kemner, & Hessels, 2020). Causality depends on several co-occurring factors and therefore cannot be claimed in such study designs (Rohrer, 2018; Rubin, 2007). Researchers who are interested in a causal relationship in an ecologically valid context should consider an experimental design in which a daily-life variable, for example stress, is actively manipulated in order to test its effect on another variable (Smets, De Raedt, & Van Hoof, 2019). Although no absolute claims about causality can be made, ecological data collection allows more control over a certain variable in a more natural environment. This benefit has to be carefully weighed against the threat that it poses to the ecological validity (Holleman et al., 2020). Findings should thus be interpreted with this in mind.

Variable fluctuation and volatility
Once the variables of interest and the nature of their relationship have been identified, we need to understand their variability (i.e. how much do they fluctuate?) and volatility (i.e. how fast do they fluctuate?). These factors will determine data collection technicalities such as sensitivity (what is the smallest detectable symptom difference?), frequency (e.g. continuous high-frequency sampling, multiple times per day or week), timing (e.g. morning, evening, event-triggered), and duration (days, weeks, months). Since the aim is to combine two separate time series, it is essential at this stage to define the temporal resolution of the anticipated 'outcome variables' or 'fluctuation scores' of both data types. If one data type results in outcomes of a higher frequency, a valid and meaningful aggregation method has to be designed to enable matching information from both data types.
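As a rough illustration of the difference between the two concepts, the sketch below summarizes variability with the standard deviation and volatility with the mean squared successive difference (MSSD). Both measures are assumptions chosen for illustration; the paper does not prescribe them, and the two series are made-up values.

```python
import statistics

def variability(values):
    """How MUCH a series fluctuates: sample standard deviation."""
    return statistics.stdev(values)

def mssd(values):
    """How FAST a series fluctuates: mean squared successive difference."""
    diffs = [(b - a) ** 2 for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)

# Two hypothetical series with the same values in a different order
slow = [1, 2, 3, 4, 5, 4, 3, 2, 1]   # gradual drift
fast = [1, 4, 2, 5, 1, 3, 2, 4, 3]   # rapid jumps

same_spread = abs(variability(slow) - variability(fast)) < 1e-12
mssd_slow = mssd(slow)   # 1.0
mssd_fast = mssd(fast)   # 6.0
```

Both series have identical spread, yet the second changes much faster between consecutive points, so it would need denser sampling to be captured.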
ESM variables may have different fluctuations and volatilities, for instance depressive feelings may fluctuate slowly compared to anxiety, which may be more volatile. While it is tempting to select a higher ESM sampling frequency, this may cause increased participant burden, lower compliance, and lower data quality (Eisele et al., 2020;Fuller-Tyszkiewicz et al., 2013;Trull & Ebner-Priemer, 2020). Akin to parallel data variables, sampled data-points should not miss relevant fluctuations, but large amounts of redundant data should also be avoided.
Specific temporal associations between variables can drive the timing of the data collection. An association that includes a time-lag will determine the interval-period between consecutive assessments. On the other hand, when the focus is on a specific event-dependent time window, a data collection strategy based on a specific occurrence will be required; for example, triggering an ESM measurement following changes in physiology (van Halem et al., 2020).
Of note, variable fluctuations can vary depending on the population's health, social, economic, or cultural characteristics (Okun, 2019). It is therefore advised to use explicitly validated variables or to conduct a pilot study testing whether a protocol captures the expected fluctuations in the population of interest. Furthermore, studies including multiple parallel data types should address these questions for each data type separately.
The above-mentioned questions should be carefully addressed prior to data collection and ideally be pre-registered in one of the open science platforms most relevant to the specific field. There are many available platforms and guidelines to help explore different options (Kathawalla, Silverstein, & Syed, 2021). Due to the multiple types of data and the different steps necessary to track and report research, a platform that allows for greater flexibility, such as the Open Science Framework (OSF), is advised. The OSF provides a common place to enact all open science practices, such as pre-registration, data storage and sharing, code sharing, pre-prints, to name a few (Foster & Deardorff, 2017).

Data analysis
Thus far, we have discussed how to define detailed hypotheses and their variables of interest, including their intended operationalization and expected fluctuation patterns. It is now time to determine the pre-processing and statistical analysis that best answer our research question. Although a detailed statistical discussion is beyond the scope of this paper, it is necessary to highlight the importance of choosing the right analysis prior to data collection. The chosen methods of data pre-processing and analysis will likely influence the required study design, but may also limit the ability to answer the intended research question.
Literature exists on data analysis for ESM specifically, which helps researchers to consider and perform, for instance, power calculations (Scherbaum & Pesner, 2019) and multilevel statistics, which account for the data hierarchy and disentangle between- and within-person differences (Bolger & Laurenceau, 2013; Mehl & Conner, 2013; Singer & Willett, 2003). On the other hand, parallel data sources can require various analytic approaches depending on the type and format of the data. Some of these approaches are already showcased specifically in relation to ESM (Baumeister & Montag, 2019; Rehg et al., 2017). However, for parallel data sources that are not yet covered, it is advised to consult existing analyses in the specific field of interest, ideally with similar variables.
Broadly, ESM and parallel data studies are typically limited to either describing the association between variables, or developing and/or validating a model that predicts a variable prospectively based on the other variable(s) (Baumeister & Montag, 2019; Pencina, Goldstein, & D'Agostino, 2020; Yarkoni & Westfall, 2017). To avoid spurious findings in predictive modeling, cross-validation is advised: the collected data are split into a separate 'training data set' and 'test data set', both containing all types of collected data (Kubben, Dumontier, & Dekker, 2018).
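A minimal sketch of such a split is given below. It holds out whole participants so that the repeated measurements of one person never appear in both sets; this grouping is an assumption made here for illustration, as are the function name, the `participant` key, and the 25% test fraction.

```python
import random

def split_by_participant(rows, test_fraction=0.25, seed=0):
    """Split merged ESM + parallel-data rows into training and test sets.

    Whole participants are assigned to one set or the other (an assumption
    of this sketch), so within-person dependence cannot leak across sets.
    """
    ids = sorted({row["participant"] for row in rows})
    rng = random.Random(seed)          # fixed seed for a reproducible split
    rng.shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])
    train = [r for r in rows if r["participant"] not in test_ids]
    test = [r for r in rows if r["participant"] in test_ids]
    return train, test

# Hypothetical merged dataset: 8 participants with 3 assessments each
rows = [{"participant": p, "beep": b} for p in range(8) for b in range(3)]
train, test = split_by_participant(rows)
```

Each row is assumed to already hold both the ESM answers and the matching parallel-data features, so both sets contain all data types.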
Likewise, we recommend pre-registering the statistical analysis plan, including the code, prior to data collection. Once the data are collected, we advocate sharing them in an appropriate database so that other researchers may replicate the work (Turkyilmaz-van der Velden, Dintzner, & Teperek, 2020).
How to collect parallel data that truly capture the variable of interest?

Device selection
Device selection is a vital component of studies measuring parallel data, and an important decision to ensure good compliance and data quality. There are many commercial wearable devices which collect data with minimal intervention, but these all perform differently and have specific limitations that may differ between (patient) populations (Fuller et al., 2020; Lai et al., 2020; Nelson & Allen, 2019). It is important to ensure the device collects accurate data that are reliable and valid. Moreover, the device must capture the required data with the right frequency and be validated in the appropriate population. Although they are not available for every application, systematic reviews or comprehensive guidelines exist to help researchers select wearable devices for specific scientific applications (Kunkels, van Roon, Wichers, & Riese, 2021; Nelson et al., 2020).
In addition to data quality, there are other topics deserving attention such as patient comfort and burden, privacy regulations, data security, storage and ownership, and battery life (Rehg et al., 2017). If real-time evaluation is desired, for example in case of event-dependent ESM assessment, connectivity and data sharing issues should be considered (Cornet & Holden, 2018;Kohrt et al., 2019;Trifan, Oliveira, & Oliveira, 2019).
Furthermore, it is essential that devices reliably log the timestamps in universal comparable time. Timestamps simply note when an assessment took place. Some devices need to be synchronized at the beginning of a recording session, and some are subject to drift, which means the timestamp accuracy decreases over time. Overall, inaccuracies within second ranges are negligible since ESM answers do not represent events at a (micro-)second level.
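The two timestamp issues mentioned above, a shared time reference and clock drift, can be sketched as follows. The fixed UTC offset, the assumption that drift accumulates linearly between synchronization and a later clock check, and all function names are illustrative assumptions; real devices may behave differently.

```python
from datetime import datetime, timedelta, timezone

def to_utc(local_naive, utc_offset_hours):
    """Attach a known, fixed UTC offset to a device timestamp and convert to UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return local_naive.replace(tzinfo=tz).astimezone(timezone.utc)

def correct_linear_drift(stamp, sync_time, check_time, drift_seconds):
    """Remove clock drift, ASSUMING it grew linearly from 0 s at sync_time
    to drift_seconds (device ahead of true time) at check_time."""
    elapsed = (stamp - sync_time).total_seconds()
    total = (check_time - sync_time).total_seconds()
    return stamp - timedelta(seconds=drift_seconds * elapsed / total)

# Hypothetical example: device set to UTC+2, running 20 s fast after 10 days
esm_utc = to_utc(datetime(2021, 5, 1, 14, 0), utc_offset_hours=2)
sync = datetime(2021, 5, 1, tzinfo=timezone.utc)
check = sync + timedelta(days=10)
halfway = sync + timedelta(days=5)
corrected = correct_linear_drift(halfway, sync, check, drift_seconds=20)
```

Storing every data source in one reference such as UTC, and recording the offset applied, keeps the later merge of ESM and parallel data unambiguous.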
An important distinction in the available devices is whether they provide raw data or proxy data. Data provenance should be well-known prior to data collection, that is the various pre-processing steps taken to transform raw data into meaningful information (Rehg et al., 2017). Proxy data, often provided by commercial devices, are already processed or summarized into assessment scores, such as activity rates per day or per hour. When proxy data are preferred, or it is not possible to obtain raw data, it is vital that the algorithm used to compute the proxy data is known, or at least well-understood, and most importantly validated (Feehan et al., 2018;Horton, Stergiou, Fung, & Katz, 2017). Not understanding the essence of the proxy data may heavily affect the validity and interpretation of the obtained findings.
At this stage it is also relevant to consider a data management plan (Wilkinson et al., 2016). Such a plan describes where the data will be stored (short and long term), how they will be preserved, and who will have access to them. A proper data management plan is essential for all studies, but especially in this case, where there are many data sources with different formats and sizes. Part of this plan should be comprehensive information on what the datasets contain and whether the data are raw or have been through any pre-processing steps.

Sampling frequency
Sampling frequency should be defined carefully for the reasons stated above. For variables with a stable volatility, the Nyquist theorem can be used. The Nyquist theorem is commonly used in signal processing and dictates that the sampling frequency be larger than twice the highest frequency present in the variable of interest, that is, the frequency of its fastest fluctuation (Bogdan, 2009). Violating the Nyquist theorem by under-sampling can lead to aliasing: the incorrect extraction of peaks and frequencies from a raw signal. Aliasing is more relevant to high-frequency sampling of parallel data than to data collected with ESM.
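The effect of under-sampling can be illustrated with a pure sine wave. In the sketch below, the 6 Hz oscillation and the 50 Hz and 8 Hz sampling rates are hypothetical values chosen only to show the principle, and the function names are not taken from any library.

```python
import math

def alias_frequency(signal_hz, sampling_hz):
    """Apparent frequency of a pure sine after sampling at sampling_hz."""
    return abs(signal_hz - sampling_hz * round(signal_hz / sampling_hz))

def sample_sine(freq_hz, sampling_hz, n_samples):
    """Sample sin(2*pi*f*t) at t = n / sampling_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / sampling_hz)
            for n in range(n_samples)]

# 50 Hz is more than twice 6 Hz, so the oscillation is preserved
ok = alias_frequency(6, 50)        # 6

# 8 Hz is less than twice 6 Hz, so the oscillation appears as 2 Hz
aliased = alias_frequency(6, 8)    # 2

# The under-sampled 6 Hz samples are indistinguishable from a sine at 6 - 8 = -2 Hz
fast = sample_sine(6, 8, 16)
slow = sample_sine(-2, 8, 16)
max_diff = max(abs(a - b) for a, b in zip(fast, slow))
```

Once the data are sampled too slowly, the two signals cannot be told apart, which is why collecting for longer does not repair the problem.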
By contrast, variables with unstable volatility, such as stress reactivity or geolocation, are more complicated. Methodologies from studies or reviews assessing the same variable can often provide evidence regarding specific sampling frequencies. Some examples are heart rate variability (Shaffer & Ginsberg, 2017), GPS-based out-of-home activity (Kondo et al., 2020; Liao, Song, Robertson, Cox-Martin, & Basen-Engquist, 2020; Zeng, Fraccaro, & Peek, 2019), and accelerometry-based activity monitoring (Kolar et al., 2020; Niazi et al., 2017). It is important to stress that under-sampling issues cannot be resolved by simply collecting more data over longer periods of time. Larger datasets that still do not capture the fluctuation of the variable(s) of interest will not lead to meaningful interpretations.

Feature extraction
For a meaningful interpretation of parallel data, information must be extracted from the raw parallel data in a way that it represents the variable of interest. In signal processing terms, the values containing this information are called features. The period of data used to calculate one feature is called the feature window. In some specific cases, the raw data contain the desired information, and no feature extraction is required (e.g. body temperature at specific moments). However, in general, raw parallel data will need to be pre-processed via feature extraction. The type and timescale of features is dependent on the type of data, and the exact variable of interest. For example, proximities to outdoor natural environments can be extracted from GPS-data per 10 min (Kondo et al., 2020), while physiological features like heart rate or movement need to be calculated over (milli)seconds. Choosing, or finding, the right feature window size is important since various window sizes may lead to different results (Heijmans, Habets, Kuijf, Kubben, & Herff, 2019b). For some parallel data or hypotheses, aggregation of high-frequency features over longer windows might be necessary.
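A minimal sketch of window-based feature extraction is given below. It computes the signal vector magnitude of tri-axial accelerometer samples and summarizes each non-overlapping window by its coefficient of variation (standard deviation divided by mean). The function names, the window length, and the sample values are assumptions for illustration, and the standard library is used here instead of a dedicated signal-processing package.

```python
import math
import statistics

def vector_magnitude(sample):
    """Signal vector magnitude of one tri-axial (x, y, z) sample."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)

def window_features(samples, sampling_hz, window_seconds):
    """Coefficient of variation of the vector magnitude per feature window.

    Windows are non-overlapping; an incomplete final window is discarded.
    Assumes the mean magnitude in each window is non-zero.
    """
    size = int(sampling_hz * window_seconds)
    features = []
    for start in range(0, len(samples) - size + 1, size):
        svm = [vector_magnitude(s) for s in samples[start:start + size]]
        features.append(statistics.pstdev(svm) / statistics.fmean(svm))
    return features

# Hypothetical recording: 4 samples at 2 Hz, summarized in 1-second windows
acc = [(3, 4, 0), (3, 4, 0), (0, 0, 1), (0, 0, 3)]
features = window_features(acc, sampling_hz=2, window_seconds=1)
```

Changing `window_seconds` changes the feature values, which is why the chosen window size belongs in the preregistration and the shared code.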
For both data types, it is important to consider these (pre-) processing steps including how to store and annotate the raw data, the features, and preferably the code. For this it is highly recommended to pre-register the study prior to conducting data collection. Publishing detailed scripts of the performed preprocessing steps and analyses, including possible post hoc or additional analysis, will further improve the study's scientific quality and reproducibility. Several resources are available to help with these steps such as the ESM pre-registration template (Kirtley, Lafit, Achterhof, Hiekkaranta, & Myin-Germeys, 2020) and guidance for scientific data care (Goodman et al., 2014).

Temporal feature aggregation
Assuming the parallel data features and ESM answers differ in sampling frequency, we can regard these two time series of datapoints both as snapshots of an ongoing, continuously fluctuating process (Fig. 2). To describe an association between them, we need to either up-sample or down-sample one of them, or both. Up-sampling, via value imputation or extrapolation, tends to generate uncertainty, especially when it needs to be done repeatedly. Down-sampling, however, can lead to important information loss. It is common practice to down-sample the parallel data to the ESM sampling frequency.
The most straightforward comparison between ESM and parallel data is to regard each completed ESM questionnaire as a single event, and to compare it with the parallel data collected in the corresponding time window (see Fig. 2). In this case, the higher-frequency parallel data are down-sampled via feature extraction. For this we need to define the duration (how many seconds or minutes) and timing (before or after ESM completion) of the window of parallel data corresponding with the ESM variable. The correct duration and timing will depend on the subjective experience that is assessed with the ESM item, the ESM instruction given to the participant, and the formulated hypothesis. The selected parallel data window will then be translated into the variable(s) of interest via the described feature extraction process (Habets et al., 2021).
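The event-based matching described above can be sketched as follows. The function name, the 15-minute window placed before the ESM timestamp, and the use of a simple mean as the aggregate are illustrative assumptions; the appropriate window and aggregate depend on the hypothesis.

```python
from bisect import bisect_left
from datetime import datetime, timedelta
from statistics import fmean

def window_mean(feature_times, feature_values, esm_time,
                window=timedelta(minutes=15)):
    """Mean of parallel-data features in [esm_time - window, esm_time).

    feature_times must be sorted in ascending order.
    Returns None when no parallel data fall inside the window.
    """
    lo = bisect_left(feature_times, esm_time - window)
    hi = bisect_left(feature_times, esm_time)
    if lo == hi:
        return None
    return fmean(feature_values[lo:hi])

# Hypothetical features: one value every 5 minutes from 09:00 to 10:00
start = datetime(2021, 5, 1, 9, 0)
times = [start + timedelta(minutes=5 * i) for i in range(13)]
values = [float(i) for i in range(13)]

matched = window_mean(times, values, datetime(2021, 5, 1, 9, 30))   # 09:15-09:25
unmatched = window_mean(times, values, datetime(2021, 5, 1, 8, 0))  # no data
```

Returning None for empty windows keeps missing parallel data visible instead of silently replacing them with a value.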
The comparison of ESM and parallel data in which both data types are regarded as ongoing processes over time is based on a different theoretical principle and requires different statistical approaches. This could be especially relevant for hypotheses focusing on continuous processes. Instead of down-sampling the parallel data, the ESM data need to be up-sampled, for example via extrapolation (Fig. 2). It is important to carefully consider each statistical method's limitations and potential bias.
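As one simple way to up-sample ESM scores, the sketch below uses linear interpolation between neighboring assessments. This is a swapped-in technique chosen for illustration, not the method of this paper; the function name, the time unit (minutes), and the scores are assumptions. Time points outside the observed range are left undefined.

```python
from bisect import bisect_right

def interpolate(times, values, query):
    """Linearly interpolate an ESM score at time `query`.

    times: sorted assessment times (here in minutes since the first beep).
    Returns None outside the observed range, so nothing is extrapolated.
    """
    if query < times[0] or query > times[-1]:
        return None
    i = bisect_right(times, query)
    if i == len(times):            # query equals the last assessment time
        return values[-1]
    t0, t1 = times[i - 1], times[i]
    weight = (query - t0) / (t1 - t0)
    return values[i - 1] + weight * (values[i] - values[i - 1])

# Hypothetical ESM scores at 0, 90, and 180 minutes
esm_times = [0, 90, 180]
esm_scores = [2.0, 4.0, 1.0]
estimates = [interpolate(esm_times, esm_scores, t) for t in (45, 135, 180, 200)]
```

Every interpolated value is an assumption about what happened between two assessments, so the uncertainty grows with the distance between beeps.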

Missing data
Similar to the statistical analyses, a detailed description of missing data management is beyond the scope of this work. However, ignoring it entirely would be a significant omission. In general, missing data can be handled in various ways (Little & Rubin, 2002b). Broadly, it is important to assess whether the missing data-points are missing at random or may represent completion bias. Especially in the case of ESM, missing data can be caused by (disease-)specific reasons and can contain significant information (Cursio, Mermelstein, & Hedeker, 2019). In addition to asking participants why they missed entries, analyzing both the corresponding parallel data and the non-missing ESM data can be considered. Many sources already exist to help researchers handle missing data in a multilevel structured dataset (van Ginkel, Linting, Rippe, & van der Voort, 2020), as well as in time-series data (De Waal, Pannekoek, & Scholtus, 2011). There are also resources on, for example, data imputation techniques (Beard et al., 2019; van Breda et al., 2016) and the possible bias they may introduce (Little & Rubin, 2002a).
How to bring these considerations into practice?

Practical checklist
To provide researchers with an easy overview of the key elements to consider when conducting a study that combines ESM and parallel data, we provide a detailed checklist to guide in the different stages of the study design (see Table 1). This is intended to be a useful advisory research tool, rather than a rigid guideline.

Open-source example
A comprehensive practical example has been drafted to demonstrate how to apply all the different considerations mentioned in this paper (see Table 2). Here, we present how the checklist can be used in practice. For easy access and replicability, we use a publicly available dataset containing ESM and parallel data (i.e. movement data) collected from 20 patients with Parkinson's disease over the course of 14 consecutive days, without restrictions, in daily-life (Habets et al., 2021). The ESM assessment contains psychological items assessing affect and mood, as well as questions on motor symptoms, physical ability, and contextual questions. The movement data consist of raw acceleration (accelerometer) and rotation (gyroscope) time-series data derived from a wrist-worn movement sensor. From these data, Parkinson motor symptom variables, such as tremor, were calculated. The database and accompanying Python Notebooks (Perez & Granger, 2007) with example code on data pre-processing and data merging have been published (Habets et al., 2021) and are available at: https://zenodo.org/record/4734199#.YJAOZRQza3J. For computational details and background on the tremor scores derived from the movement data, we refer to a previous publication (Heijmans et al., 2019b). A detailed description of this example can be found in the Supplementary Material. For this example, we are interested in the effect that Parkinsonian tremor severity has on negative affect. Interesting within-subject relationships between mood and tremor severity were shown in a longitudinal n = 1 study (van der Velden, Mulders, Drukker, Kuijf, & Leentjens, 2018). This study only used ESM data and did not include any parallel data. With complementary movement data, we want to reproduce and further explore this relationship. Since this data repository is already available, we assess whether it is suitable to answer our question.

Conclusion
The ESM is increasingly combined with parallel data, collected by passive sensing devices or wearable sensors in daily-life. These methods hold a great potential to contribute to ambulatory monitoring and personalized health care. However, combining these data types in research or clinical practice comes with new challenges for which specific guidance is lacking. We presented several important and useful considerations to support researchers in every stage of designing a research project. We stressed the importance of understanding the fluctuations and the temporal resolution within the two separate data timelines and their operationalization. We further described essential considerations on device selection and feature extraction and aggregation, as well as their statistical analysis. Finally, we underlined necessary methods to ensure transparency and reproducibility.
The provided recommendations aim to guide researchers in conducting meaningful research combining state-of-the-art
Is there support in the literature for a specific association or is exploratory research required?
− There is evidence that more tremor episodes lead to more negative affect (NA) in Parkinson's disease.

Variables
What are the specific variables of interest?
− NA is a composite measure computed as the average of the items: mood, depression, anxiety, negatively related to joy, happiness, positive excitement (ESM).
− Tremor: measured with movement features of accelerometer and gyroscope time series (parallel data).
What is their unique contribution?
− Subjective affect in daily-life.
− Passive, unobtrusive measure of tremor.
What is known about each variable's fluctuation and volatility?
− NA fluctuates heavily within and between days.
− Tremor can fluctuate heavily in severity over minutes, but will also fluctuate over hours based on dopaminergic medication states.
Is there literature on the appropriate sampling frequency to capture meaningful changes in the dimension of interest? If not, based on its fluctuation and volatility, what would be the minimum required frequency to capture meaningful variability?
− 7 to 10 assessments a day have been shown to reliably capture NA variability without over-burdening Parkinson patients.
− Tremor is a relatively volatile Parkinson motor symptom, and episodes can last less than a minute. Tremor severity must be assessed every ten to thirty seconds. A sampling frequency of at least 25-50 Hz is required for tremor detection. The current 200 Hz is higher than required, but the data can be down-sampled during the pre-processing stage.

Association of interest
Is there a temporal order between variables? If so, which one? Keeping in mind that causal associations can scarcely be established in such study designs.
− We are interested in NA after tremor, so parallel data is down-sampled, and windows are computed prior to the timestamp of the ESM assessment.
What window size is necessary for parallel data to reflect an ESM assessment?
− Due to the high volatility of NA, we down-sample the parallel data to windows of 30 minutes prior to each ESM assessment.

Device selection
What devices are available to collect the measures of interest?
− Several ESM platforms with user-interfaces suitable for Parkinson patients are available.
− Wrist-worn wearable movement sensors.
Has the device been validated in the population of interest?
− Data collection with the applied methodology was reported to be feasible in this specific population.
How is data managed, saved, and distributed?
− ESM data is directly transferred to a server when connected to WiFi.
− Parallel data is primarily stored on the device and is later downloaded during a research visit.
Is there sufficient storage and battery life?
− There is sufficient storage capacity and participants receive clear instructions about battery life and charging methods.
Is raw data accessible? If not, is data provenance known?
− Yes, all raw data is available.

Data analyses
What analyses are anticipated?
− Nested multi-level analyses on the ESM data have to show significant fluctuations in NA within patients, within days, but also enable comparison between patients based on differences to inter-patient means.
− Parallel data will be analyzed with signal processing techniques in which relevant movement features (e.g. power, variance, variability, spectral decomposition scores, smoothness) will be extracted over short time-windows of 1 second to several minutes.
How much data is necessary to perform these analyses?
− Assuming the occurrence of frequent fluctuations in tremor severity, fourteen days of data would be sufficient to make meaningful conclusions.
How will missingness be handled?
− If missing parallel data are caused by technical difficulties this can be accepted and excluded. If the wearable devices are not worn frequently, the researcher has to explore anamnestically or in the ESM data whether there is a pattern between those moments.
− Similar to parallel data, the moments without ESM answers should be explored in the parallel data for potentially high tremor severity.