A preregistered vignette experiment on determinants of health data sharing behavior: Willingness to donate sensor data, medical records, and biomarkers

ABSTRACT. The COVID-19 pandemic has spotlighted the importance of high-quality data for empirical health research and evidence-based political decision-making. To leverage the full potential of these data, a better understanding of the determinants and conditions under which people are willing to share their health data is critical. Building on the privacy theory of contextual integrity, the privacy calculus, and previous findings regarding different data types and recipients, we argue that established social norms shape the acceptance of novel practices of data collection and use. To investigate the willingness to share health data, we conducted a preregistered vignette experiment. The scenarios experimentally varied the vignette dimensions by data type, recipient, and research purpose. While some findings contradict our hypotheses, the results indicate that all three dimensions affected respondents' data sharing decisions. Additional analyses suggest that institutional and social trust, privacy concerns, technical affinity, altruism, age, and device ownership influence the willingness to share health data.


Introduction
The global COVID-19 pandemic has spotlighted the relevance of health research and evidence-based public policy decision-making around the world. Technological advancements have made it possible to collect, share, and analyze large amounts of health data. However, collecting high-quality data requires appropriate data collection infrastructures and instruments, which were shown to be lacking in several countries during the COVID-19 pandemic (e.g., Klingwort & Schnell, 2020; Schaurer & Weiß, 2020). Moreover, the quality of empirical evidence relies heavily on people's willingness to share their health data for research purposes (Aitken et al., 2016). Willingness to share data is closely connected to questions of data privacy and ethics that need to be asked anew with the rise of novel data sources, such as smartphone sensors that track bodily functions and mobility (Oberski & Kreuter, 2020; Struminskaya et al., 2021).
In this context, data collectors need to take a fine-grained perspective on such sentiments as acceptance of data use may strongly depend on the concrete scenario in which a person is asked to share personal information. This is because the legitimacy of a specific data collection may be questioned by individuals if strong and transparent privacy safeguards are not in place along each step of the data sharing process.
To comply with the public's privacy expectations, policymakers and data collectors need to know the conditions under which the collection of specific kinds of data is considered acceptable by their citizens. Understanding privacy as "contextual integrity" (CI; see Nissenbaum, 2010, 2019) provides a context- and situation-sensitive perspective on data flows that allows us to investigate the circumstances under which people accept the collection and use of their health data. CI is upheld if no violation of context-specific privacy norms occurs. CI posits that the (novel) data flow needs to be specified and then evaluated to determine whether it conforms with established and context-specific privacy norms.
The novelty of data flows that aim to improve public health depends on which practices are already established in contexts within specific countries. For example, Germany is a country in which the digitalization of the health system is not advanced compared with many other EU countries (Bertelsmann Stiftung, 2019). Several technological and medical developments (e.g., electronic patient records) could be more integrated into the maintenance of individual and public health. Sensor data from smartphones promise greater digitalization of medical health research. However, in order to roll out new systems, such as applications that monitor COVID-19, in a manner that is ethical and acceptable to the public, it is crucial to construct data flows that align with contextual norms. Yet, most of these technologies require data flows that citizens are not familiar with, and social norms for these data have only been established to a limited extent. Still, these novel data flows may be embedded in established social contexts or resemble already existing data flows (see, e.g., Vitak and Zimmer, 2020, with respect to the acceptance of COVID-19 contact tracing apps depending on situational parameters). Therefore, to improve individual and public health, we need to learn which health data flows are considered appropriate in which contexts.
Against this backdrop, we investigated the conditions under which individuals deem the sharing of different types of health data to be more acceptable, particularly with respect to the sharing of health data for public or personal benefit. Our study drew on the framework of CI to define 18 unique data sharing scenarios, which were presented to respondents in an online vignette survey experiment (Auspurg & Hinz, 2015). These scenarios varied on three levels: data type, data recipient, and data use purpose. We presented randomly selected vignettes related to cancer research, which has the advantage that our results were not directly affected by current events or changes in the global health situation regarding the COVID-19 pandemic. At the same time, cancer receives a large amount of attention from the scientific community and the public and affects many people's lives. Thus, combating this disease should be relevant to most citizens. Therefore, willingness to share data for cancer research may be higher than for comparatively less severe and/or less known diseases.
Studying willingness to share health data across different scenarios allows us to better understand which data flows are socially considered appropriate for sharing health data for private and public benefit. In particular, given the interplay of public and private entities in handling such new types of health data flows, the findings tell us whether private- and public-benefit uses of health data are accepted only when requested by private and public data recipients, respectively. This empirical investigation sheds light on the nature of social norms in health contexts, that is, which recipients and which data are appropriate to be used in the provision of personal and public health. For data sharing practice, the findings can inform the design of data collection activities of public and private organizations and help adjust practices to the expectations of individuals, thereby increasing the trust and willingness of citizens to participate.

Theoretical background
CI provides a framework to jointly investigate several relevant features of data flows, thereby allowing researchers to empirically ascertain which factor combinations are publicly accepted and align with social norms. From a CI perspective, the following situational parameters need to be specified to sufficiently describe data flows: the data type; the involved actors, such as the data sender and recipient; and the transmission principles, that is, the "rules" under which the data are transferred (Nissenbaum, 2010, 2019). For example, individuals (data senders) might find it acceptable to provide sensor data from their smartphones (Data Type A) to a company (Recipient A) or to give consent to transfer a copy of their medical records (Data Type B) to university researchers (Recipient B) but not to a public authority (Recipient C). The CI perspective, however, does not allow us to make predictions about whether specific parameters, such as specific data types or recipients, will be generally more accepted. Instead, it can be argued that the closer a specific data flow is to contextual privacy norms, the higher is the likelihood that people will accept this data flow.
The CI theory suggests a prescriptive understanding of social norms, that is, what is "right" to do in a certain situation (Nissenbaum, 2010). Yet, from a CI perspective, novel data flows may still be acceptable if they fulfill contextual purposes better than established practices, even if they do not conform to them (Nissenbaum, 2010). In such situations, individuals might still be willing to share data, for example, because the data flow serves a public purpose that is perceived as sufficiently important and appropriate. Similarly, individuals may think that a data flow conforms with established norms but may nonetheless be hesitant to provide their data, for example, because the purpose is not perceived as sufficiently important or the effort to share these data is considered too high.
From the perspective of individual decisions to share data when confronted with novel practices, we argue that individuals may consider potential benefits and risks, as suggested by the notion of the "privacy calculus" (Culnan & Armstrong, 1999). More specifically, the privacy calculus assumes that privacy is an economic good that can be traded for benefits, such as other goods or services (Kehr et al., 2015; Smith et al., 2011). For example, individuals may decide whether to use new technologies depending on their ease of use and their usefulness (Davis et al., 1989). Considering the privacy calculus, we suggest that the privacy-specific risks and benefits are related to the fulfillment of contextual norms and goals. This means that individuals evaluate a novel health data flow depending on its appropriateness to fulfill the contextual purpose of promoting health. In short, we argue that novel data flows that do not conform to established norms may still be acceptable to individuals and that their acceptability is linked to the perceived benefits and costs of the new data flow, which are context dependent.
With respect to the purpose of a data flow, we need to determine which purposes individuals consider to be relevant contextual purposes. According to CI, purposes are core constitutive elements of social contexts (Nissenbaum, 2019). Certain sub-contexts (see Nissenbaum, 2010) of the health context might be understood to serve one purpose more than another. For example, the doctor-patient relationship is likely to constitute a sub-context that has the purpose of improving personal health. In contrast, transferring information about COVID-19 symptoms to a local public health agency likely serves the purpose of safeguarding public health. Yet, in both cases, personal and public benefits may arise. With respect to the acceptability of data sharing, however, it is crucial to determine which uses are perceived to serve the desired improvement of public or personal health and which uses are perceived to violate central tenets of the health context.
In line with CI theory, our study has a strong situational and exploratory component as we cannot stipulate that any data type, recipient, or purpose that is aimed at providing individual and collective benefits is, as such, more or less acceptable to individuals. Instead, we need to consider the situational parameters in interaction with one another. Given the theoretical considerations outlined earlier, our hypotheses are led by three propositions: Health data flows that are closer to established privacy norms are more likely to be accepted by individuals (P1). Individuals are more likely to share their health data when the benefits (personal and collective) of doing so appear higher and the costs (e.g., required effort and consequences of out-of-context use) appear lower (P2). The potential benefits and costs of a novel data flow need to be interpreted with respect to the social context in which the data flow is embedded (P3). In the following, we specify the CI framework parameters to investigate the conditions under which individuals are willing to share their health data.

Previous research
Prior empirical research has investigated the willingness to share health data in several scenarios, showing, for example, that data sharing is viewed as most acceptable when the purpose is in the interest of the public, when the data are shared in a privacy- and security-preserving way, 1 and when the data recipient can be trusted (Waind, 2020). Previous work on the use of health administrative and clinical trial data also found that trust and public benefits are key to data sharing acceptability (Hutchings et al., 2020). In addition, control over the data that are shared was shown to be an important mediating factor that influenced willingness to share health data (e.g., Jones et al., 2019; Juga et al., 2021; Stockdale et al., 2018). It was also emphasized that citizens are concerned about the profit orientation of commercial data recipients and that they favored a public benefit for those data recipients (Aitken et al., 2016).

1
Preserving privacy and security is critical when digital data are shared because these data are exposed to threats during transmission. Thus, it is best practice to encrypt messages and files while they are being transmitted.
Earlier research also found that people are indeed willing to share (health) data, such as biobank data, for health research purposes (Husedzinovic et al., 2015). In contrast, more skepticism can be expected for health-related use of data collected in nonhealth contexts. For example, previous research showed that the use of data collected on Facebook for research purposes is often less accepted than uses that are merely aimed at improving user experience (Gilbert et al., 2021). Similarly, a survey showed that linking health data to personal nonhealth data was less acceptable than linking data from the same context (Aitken et al., 2018).
Previous survey experiments based on CI have shown that respondents' privacy attitudes changed depending on who exactly received which kinds of data under which conditions. For example, Martin and Nissenbaum (2017) showed that commercial uses (e.g., health data sold to pharmaceutical companies for marketing) overall conform less with privacy expectations than uses that fulfill contextual purposes (e.g., health data used for research to improve health conditions). In another study, Martin and Shilton (2016) showed that privacy expectations with respect to data collection from mobile devices for targeted ads and tracking greatly vary depending on the situational parameters. In addition to such situational parameters and contextual norms, individual characteristics may impact citizens' evaluations of data flows. For instance, individuals with high trust in government institutions may be less skeptical of data used by public authorities than individuals with lower institutional trust (Kehr et al., 2015). While individuals may, regardless of their level of trust in the government, support the use of health data for research that aims to improve public health generally (Waind, 2020), they would likely disagree on who should receive such data to achieve this purpose. Other individuals may reject the idea of sharing their personal health data with any recipient because they regard the requested data as too personal and the data sharing request as intrusive (Lacasse et al., 2021). Gerdon et al. (2021) conducted a vignette experiment on the acceptability of data sharing in which they compared the acceptance of data sharing of health data with two other data types (energy consumption and location data). They also experimentally varied the organization that received the data (a public authority or a company). Surprisingly, sharing data with a public institution was overall less accepted than sharing data with a private organization. 
This finding has worrisome implications, especially considering the COVID-19 pandemic but also in general for other public health crises, as public institutions rely on data to monitor and prevent the spread of diseases, for example, through contact tracing apps or the targeted implementation of public health campaigns. However, the study only investigated one specific type of health data, while health research and public health policy rely on several sources of data to tackle issues of public health.
Willingness to share health data: Data type, recipient, and purpose
In this section, we discuss the effects of changes in CI-based data flow parameters on the willingness to share health data. In particular, we are interested in several recent technological and medical opportunities that have the potential to be used more frequently in Germany and in many other countries in the near future: electronic health records, biomarker data, 2 and health-related smartphone sensor data. These data types cover different types of health data collections that may happen in different social contexts with various data recipients. They especially may involve privacy considerations specific to the data type and/or private actors (Gerdon et al., 2021). On the one hand, medical records and biomarker data are usually collected in narrow and well-defined contexts that suggest high standards of data protection, that is, by physicians or other care providers, health insurances, and researchers. Sensors, on the other hand, can amass high volumes of data in infrastructures in which sharing is technically feasible among multitudes of actors, such as app developers, smartphone providers, and other third-party actors. Individuals may associate various contexts and potential uses with sensor data when considering sharing them. The use of sensor data out-of-context appears to be a more salient threat than, for example, the use of biomarkers, which has been discussed with respect to COVID-19 tracing apps (Vitak & Zimmer, 2020).

2
The Biomarker Working Group of the US Food and Drug Administration and the National Institutes of Health defined a biomarker as a "characteristic that is measured as an indicator of normal biological processes, pathogenic processes, or biological responses to an exposure or intervention, including therapeutic interventions" (FDA-NIH Biomarker Working Group, 2021, p. 45). Biomarkers can be measured using, for example, blood, urine, or soft tissue (Hirsch & Watkins, 2020).
Therefore, we expect that people will be more likely to agree to share their biomarker data and medical records than their sensor data if the recipient has a public background (H1.1). For private recipients, we expect that individuals will be less likely to share data that are associated with specific health contexts (medical records and biomarkers) than sensor data (H1.2). Overall, we argue that the high effort required to share biomarkers (e.g., blood) results in a particularly strong data sharing hesitancy for this data type. Therefore, the acceptance to share biomarker data should be, ceteris paribus, the lowest among the three data types studied (H1.3).
With respect to data recipients, a particular concern is the previously found lower acceptance of data sharing with public institutions compared with private entities in Germany (Gerdon et al., 2021). Such reluctance might result from concerns that government institutions could use the data for different purposes than initially intended without asking for permission (Turow & Hennessy, 2007; Weitzman et al., 2012). While such concerns can be present for private recipients (e.g., companies) as well, concerns about potential consequences might be more pronounced for public institutions, especially with respect to government surveillance. However, research shows that there are differences in trust levels across public institutions (Krause et al., 2019), and citizens may approve of public-benefit uses of data with respect to certain public institutions that explicitly follow research purposes, for example, dedicated university research centers (Karampela et al., 2019; Mello et al., 2018).
Given the different possible public and private recipients, we argue that out-of-context use is least likely to be expected from university research centers. At the same time, the recipients are unlikely to be associated with differences in perceived benefits or required data sharing efforts. Therefore, we expect that the willingness to share data will be higher for university research institutions than for public health authorities and private companies (H2.1). Moreover, trust is a central prerequisite for accepting the sharing of health data (Bauer et al., 2019). Individuals may vary in their trust toward different recipients, irrespective of the indicated purposes for which the data will be used. Therefore, higher trust in the respective organization should, ceteris paribus, lead to a higher willingness to share data (H2.2).
Taking the contextual perspective into account, a data recipient can never be fully separated from the purpose for which the recipient plans to use the data. While each of the data types can be analyzed to provide a benefit to the individual data subject (e.g., improvement of diagnoses, recommendations on health-related behavior) and/or recipient, the public also appears to be willing to accept the use of health data for the public interest (Bearth & Siegrist, 2020; Waind, 2020), that is, to improve public health. In both cases, individuals may perceive the data sharing to be useful. Yet, while we assume that individuals will be generally more likely to share their data if they anticipate a personal benefit (H3.1), it may depend mainly on the data sharing context, especially on the data recipient, to determine in which situation(s) these benefits are considered as sufficient, for example, because of the low risk of out-of-context use.
Some sub-contexts of the health context might be more oriented toward promoting individual health (e.g., doctor-patient relationships), while others are more linked to the improvement of public health (e.g., health agency-individual relationships regarding notifiable infectious diseases). It is likely that public recipients are associated with public-specific contextual goals, while private recipients are associated with private-specific contextual goals. However, Gerdon et al. (2021) did not consistently find such a relationship. Yet, individuals are expected to have a higher likelihood of fearing out-of-context use if recipients use data for a purpose that is not in accordance with established norms. Therefore, we expect that a match between a private data recipient and a private purpose and a public data recipient and a public purpose will result in higher acceptance rates than a "mismatch" between data recipient and purpose (H3.2).
Beyond contextual characteristics, individuals may vary in how much they are willing to help others and contribute to the public welfare. That is, some individuals may be more inclined than others to perceive public health benefits as an appropriate purpose compared with individual health benefits. Thus, we hypothesize that individuals who display higher altruism (Kim & Stanton, 2016) will be more willing to share health data for public benefit than people with lower scores on altruism (H4.1). Similarly, we assume that the more individuals perceive public duties (Voigt et al., 2020), such as voting and paying taxes, as important obligations of good citizens, the more willing they will be to share health data for a public benefit (H4.2). In addition, given the general trend of increasing trust in scientists in recent years (Funk et al., 2019), 3 we expect that higher levels of general trust in the scientific community will positively affect the likelihood to share data for a public benefit (H4.3). Sharing for a personal benefit should be less or not affected by trust in science.
Finally, without a concrete hypothesis, we collected data about respondents' cancer exposure, smartphone and smartwatch use, technical affinity, social trust, and political ideology. These supplementary analyses, which are exploratory in nature, are reported at the end of the Results section.
Given the importance of data sharing for health research and policymaking, the results of our study can help inform the scientific debate about data sharing hesitancy. The study can help develop best practice advice for three data types (sensor data, medical history, and biomarkers) but also identify privacy-related social norms. Since, in practice, there is rarely a previously tested scenario that exactly matches the needs of a data recipient, the study can contribute to a better general understanding of how situational parameters may work differently for different data types. Additionally, the breakdown of data types, recipients, and purposes allows us to estimate the relative importance of each component. This will help identify the main drivers of respondents' willingness to share data. For example, for some groups of respondents, their level of trust in the data recipient might be especially important, whereas for other respondent groups, the purpose might be the most relevant variable. Getting a deeper understanding of the mechanisms behind nonacceptance can also help us develop successful and privacy-conforming data sharing practices that increase willingness to share data for research.

Preregistered research design
We conducted a preregistered survey experiment in which we randomly varied parameters of the data flow as defined by the CI framework to learn which kinds of health data German citizens were willing to share under which conditions. 4 The so-called vignette experiment or factorial survey experiment (Auspurg & Hinz, 2015) was implemented in a web survey in Germany with a minimum sample size of about 750 respondents. This sample size was based on an approximated power analysis using an ANOVA design with repeated measures and within-between interaction, using the software G*Power (Faul et al., 2007) (input parameters: effect size = 0.1, 5 α error probability = 0.05, power = 0.95, number of groups = 18, number of measurements = 3, nonsphericity correction = 1). The suggested sample size was 648 respondents. To account for possible exclusion of cases because of insufficient data quality, we increased the minimum sample size by 15 percent, which resulted in 746 respondents. The respondents were recruited from a German commercial online nonprobability access panel and received a small monetary incentive for their participation. To ensure a heterogeneous sample, we screened by gender, age, and educational attainment to represent noncrossed quotas of the German general population.
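The step from the suggested to the minimum sample size can be reproduced with a short calculation. The sketch below assumes that the inflated figure was rounded up to the next whole respondent, which the text does not state explicitly; the function name is illustrative.

```python
def inflate_sample_size(n_required: int, buffer_percent: int) -> int:
    """Inflate a required sample size by a percentage buffer, rounding up.

    Rounding up is an assumption; integer ceiling division avoids
    floating-point error.
    """
    scaled = n_required * (100 + buffer_percent)
    return -((-scaled) // 100)  # ceiling of scaled / 100


suggested_n = 648    # sample size suggested by the power analysis
buffer_percent = 15  # buffer for cases excluded because of insufficient data quality

minimum_n = inflate_sample_size(suggested_n, buffer_percent)
print(minimum_n)  # 746
```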
As displayed in Table 1, the vignette experiment included three dimensions: data type (sensor data, medical records, biomarkers), data recipient (public health agency, university research center, private company), and purpose of the research (public policy, personal recommendation). This resulted in 18 unique vignettes (3 × 3 × 2). We presented each respondent with one vignette on each data type in random order. Thus, each respondent was randomly assigned to one of the six versions (three data recipients combined with two purposes) for each data type. Random assignment and order allowed us to control for potential context effects.
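The factorial structure and the assignment procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the survey software actually used: the function name and the fixed seed are hypothetical, and the six versions are assumed to be drawn with equal probability and independently for each data type.

```python
import itertools
import random

DATA_TYPES = ["sensor data", "medical records", "biomarkers"]
RECIPIENTS = ["public health agency", "university research center", "private company"]
PURPOSES = ["public policy", "personal recommendation"]

# Full factorial universe: 3 x 3 x 2 = 18 unique vignettes.
VIGNETTES = list(itertools.product(DATA_TYPES, RECIPIENTS, PURPOSES))


def assign_vignettes(rng: random.Random) -> list[tuple[str, str, str]]:
    """Draw one of the six versions per data type and shuffle the presentation order."""
    versions = list(itertools.product(RECIPIENTS, PURPOSES))  # six versions
    deck = [(data_type, *rng.choice(versions)) for data_type in DATA_TYPES]
    rng.shuffle(deck)
    return deck


rng = random.Random(2022)  # fixed seed only to make the sketch reproducible
for vignette in assign_vignettes(rng):
    print(vignette)
```

Each simulated respondent sees exactly three vignettes, one per data type, which mirrors the within-respondent part of the design.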
To specify all CI parameters, we needed to define the data subject, data sender, and transmission principle. We kept the transmission principle constant by defining a high level of individual control over the data use; that is, we measured individual willingness to share under conditions that enable individuals to make an active decision to agree to data use or not (i.e., opt in). The data subjects were always the respondents themselves. Finally, the data sender was always fixed within each data type and adjusted to produce a realistic scenario.
The following sections provide descriptions of the vignettes by data type.

2018). This is likely to generalize to other countries as well. We also note that Germany does not deviate significantly from the international scientific trust average, whereas the United States is below the international mean (Huber et al., 2019).

5
The effect size was based on the general recommendation from Brysbaert and Stevens (2018) and Kühberger et al. (2014). We reduced the suggested number from 0.3 to 0.1 to account for the nature of our hypotheses, which feature multiple interaction effects.

Data Type 3: Biomarkers
Blood samples that are collected for biobanks can be used to assess the health conditions of people. With the consent of a person, these data are transferred to a German public health agency [private company; university research center]. This public health agency [private company; university research center] uses these data for a research program to fight cancer. [This public health agency [private company, university research center] uses these data to provide the persons with personal recommendations on their health behavior with respect to protection against cancer.] The public health agency [private company; university research center] guarantees that the data are safe, anonymous, and protected from misuse.

Other measures
The study included several additional measures, 7 which were needed to test some of our hypotheses (trust in science in general, trust in public health agencies, private companies, and university research centers, altruism, attitudes toward public duties) and to conduct the additional exploratory analyses (cancer exposure, smartphone and smartwatch usage, technical affinity, social trust, political ideology, and sociodemographic characteristics). Specifically, respondents' cancer exposure was measured by asking whether the respondent, a relative, or a close friend had ever been diagnosed with cancer. Device ownership was measured by a single multiple-choice question. Technical affinity was measured using five rating scale items about, for example, how good a respondent is at operating digital systems. Public duty was measured using three items featuring a rating scale that asked about what respondents think a good citizen should do (e.g., to obey laws; ESS Round 1: European Social Survey, 2018). A respondent's level of institutional trust with respect to the three data recipients of our vignette design, and with respect to science in general, was assessed using individual items with a rating scale for each institution (based on ESS Round 9: European Social Survey, 2021). Similarly, social trust was measured using a single item with a rating scale asking whether most people can be trusted or not (ESS Round 9: European Social Survey, 2021). Respondents' altruism was measured by asking about their willingness to do something good without expecting anything in return (SOEP-IS Group, 2021). Finally, political ideology was measured using respondents' self-reported left-right orientation (ESS Round 9: European Social Survey, 2021).

6
The sentence "This public health agency [private company; university research center] uses these data for a research program to fight cancer" represents the public benefit, and the sentence "This public health agency [private company, university research center] uses these data to provide the persons with personal recommendations on their health behavior with respect to protection against cancer" represents the personal benefit. Each vignette includes only one of these two research purposes.

7
The questionnaire was implemented in German. Most questions were taken from German scales and translated into English by the

POLITICS AND THE LIFE SCIENCES • FALL 2022 • VOL. 41, NO. 2
The question wordings for all these measures are provided in the appendix. For measures that include multiple items, we conducted an exploratory factor analysis to verify that the items load on a single factor. Items with factor loadings below 0.5 were excluded. 8 Basic sum scores were used to combine the retained items into a single measure for the respective construct.
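The scoring rule described here (drop items with loadings below 0.5, then sum the remaining items) can be illustrated as follows. This is a minimal sketch: the item names, loadings, and responses are hypothetical, and the loadings are taken as given rather than estimated, since the factor analysis itself is not reproduced.

```python
def sum_score(responses: dict[str, int], loadings: dict[str, float],
              min_loading: float = 0.5) -> int:
    """Sum the responses to all items whose factor loading is at least min_loading."""
    retained = [item for item, loading in loadings.items() if loading >= min_loading]
    return sum(responses[item] for item in retained)


# Hypothetical loadings and answers for a three-item scale (illustrative values only).
loadings = {"item_a": 0.72, "item_b": 0.65, "item_c": 0.40}
responses = {"item_a": 4, "item_b": 5, "item_c": 2}

print(sum_score(responses, loadings))  # 9, because item_c is dropped before summing
```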
The placement of the additional measures within the questionnaire is not a trivial decision. If they are placed before the vignette experiment, they could affect the answers to the vignettes. If they are placed after the vignette experiment, the vignette questions could affect the answers to the additional measures that are intended to explain the answers to the vignettes. Since neither placement is optimal, a random half of the sample received the additional measures before the vignette experiment and the other half received them after the experiment. This randomization of the placement of the vignette experiment and the other measures allowed us to control for possible order effects in our analyses. Similarly, we randomized the order of the items within each multiple-item measure to avoid systematic question order effects.
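A minimal sketch of such a two-part randomization is shown below. It assumes a simple split into two equal halves and an independent shuffle of the item order for every respondent; the function name, the seed, and the item labels are hypothetical and do not describe the survey software that was actually used.

```python
import random


def assign_order(respondent_ids, items, seed=0):
    """Randomize block placement (half/half) and item order per respondent."""
    rng = random.Random(seed)
    ids = list(respondent_ids)
    shuffled = ids[:]
    rng.shuffle(shuffled)
    # The first half of the shuffled list gets the additional measures first.
    measures_first = set(shuffled[: len(shuffled) // 2])
    plan = {}
    for rid in ids:
        item_order = list(items)
        rng.shuffle(item_order)  # independent item order for each respondent
        plan[rid] = {
            "measures_first": rid in measures_first,
            "item_order": item_order,
        }
    return plan


plan = assign_order(range(10), ["item_a", "item_b", "item_c"])
```

Storing the assigned placement for each respondent is what later makes it possible to include it as a control variable.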

Data
The data were collected using a sample drawn from a German online access panel administered by Bilendi and respondi, which has been used for scientific research before (e.g., Daikeler et al., 2022; Gerdon et al., 2021; Silber et al., 2019). The field period ran from May 30 to June 2, 2022. The panel provider invited 14,000 panel members by email to our survey. In all, 2,423 individuals started the survey by clicking on the link in the invitation email. Of these, 34 panel members were screened out, and 1,088 could not participate because our quotas had been reached. Another 140 respondents did not complete the questionnaire. This resulted in 1,161 completed interviews before conducting quality checks. 9 The median response time was 5 minutes and 6 seconds, and the average enjoyment rating of the survey was 4.10 on a scale from 1 "not at all" to 5 "very good." To recruit a diverse set of participants, we used quotas based on the German "Mikrozensus" 2019 regarding age, gender, and educational attainment. Descriptive results of the demographics and the other measures of the initial sample (before the data quality checks) can be found in the online supplement (see Table A1 in the Supplementary Materials). The study was approved by the ethical review board of the University of Mannheim (EK Mannheim 22/2022).
New questions without a German version were translated by the authors.
8 The factor analyses for the two multi-item constructs "public duty" and "technical affinity" showed that while all items for technical affinity had a factor loading above 0.5, one item measuring public duty ("To be a good citizen, how important would you say it is for a person to support people who are worse off than themselves?") had a factor loading of 0.40, so this item was not included when building the sum score.
9 Only one respondent selected "diverse" in the gender category. To ensure the privacy of that person, we removed the respondent from the sample before the analyses and publication of the data set. However, we repeated all analyses with this respondent included to make sure that the results presented in the manuscript were not affected by this decision.
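The sample flow reported in this section can be retraced with a few lines of arithmetic. The counts are taken from the text above; the variable names and the share of invited members who started the survey are added here for illustration only.

```python
# Counts as reported in the Data section.
invited = 14_000
started = 2_423
screened_out = 34
quota_full = 1_088
breakoffs = 140

# Completed interviews before any data quality checks.
completed = started - screened_out - quota_full - breakoffs
print(completed)  # 1161

# Share of invited panel members who clicked on the survey link.
start_rate = started / invited
print(f"{start_rate:.1%}")
```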

Data quality checks
We implemented three data quality checks. 10 First, we excluded respondents who provided item nonresponse to one of the vignettes or the covariates. As a robustness check, we initially planned to impute missing values and report analyses of our hypotheses with imputation in the online supplement. Second, using paradata on response time, we excluded speeders, that is, respondents who answered the questions so fast that they could not possibly have read and processed them. For this, we used the method proposed by Roßmann (2010), which identifies all respondents who finish the survey in less than 60 percent of the median completion time as speeders. The analyses without speeders are included in the main text, whereas the analyses with speeders are provided in the Supplementary Materials. 11 Third, we tested whether the experimental assignment worked with respect to demographic characteristics (i.e., gender, age, and education). For this analysis, we used χ² tests. If the experimental assignment depended systematically on a demographic variable, we used that variable as a control variable throughout our analyses.
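The speeder rule described above can be illustrated with a short sketch. The completion times below are invented, and the function and variable names are assumptions; only the 60 percent of the median cutoff is taken from the text.

```python
from statistics import median


def flag_speeders(durations, share=0.6):
    """Return respondents faster than `share` times the median duration.

    durations: respondent id -> completion time in seconds
    """
    cutoff = share * median(durations.values())
    speeders = {rid for rid, seconds in durations.items() if seconds < cutoff}
    return speeders, cutoff


# Hypothetical completion times in seconds.
durations = {"r1": 306, "r2": 150, "r3": 420, "r4": 290, "r5": 500}

speeders, cutoff = flag_speeders(durations)
print(speeders, round(cutoff, 1))
```

In this toy example the median is 306 seconds, so the cutoff is about 184 seconds and only the respondent with 150 seconds is flagged.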

Data analyses
The data analyses included multilevel models to account for the vignette experiment's hierarchical data structure (vignettes nested in respondents). First, we analyzed our hypotheses regarding the data type (H1.1 to H1.3), data recipient (H2.1 and H2.2), and purpose of the research (H3.1 and H3.2). H1.3, H2.1, and H3.1 concern the main effects of the data type, data recipient, and research purpose on the willingness to share data, while H1.1, H1.2, and H3.2 were tested using interaction effects between vignette dimensions: data type and data recipient for H1.1 and H1.2, and data recipient and research purpose for H3.2. To test H2.2, H4.1, H4.2, and H4.3, interactions between vignette characteristics and respondent characteristics were specified, namely, between data recipient and trust in the respective institution (H2.2), research purpose and altruism (H4.1), research purpose and attitudes toward public duties (H4.2), and research purpose and trust in science (H4.3). While the main analyses focused on random-effects models in which the dependent variable was treated as continuous, we implemented two additional sets of models as robustness checks. These included fixed-effects models with continuous outcomes and random-effects models in which the dependent variable was treated as ordinal.
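For orientation, a random-intercept model of the kind described above can be written as follows. The notation is an assumption chosen for illustration and is not taken from the preregistration or the analysis code: i indexes vignettes, j indexes respondents, the vignette dimensions enter as dummy-coded indicators, and z_j stands for a respondent characteristic such as altruism. The normality of the two error terms is the conventional assumption for such models.

```latex
% Main-effects model: vignette i rated by respondent j
y_{ij} = \beta_0
       + \boldsymbol{\beta}_{T}^{\top}\,\mathbf{type}_{ij}
       + \boldsymbol{\beta}_{R}^{\top}\,\mathbf{recipient}_{ij}
       + \beta_{P}\,\mathrm{purpose}_{ij}
       + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2),\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)

% Cross-level interaction with a respondent characteristic z_j
y_{ij} = \beta_0
       + \boldsymbol{\beta}_{T}^{\top}\,\mathbf{type}_{ij}
       + \boldsymbol{\beta}_{R}^{\top}\,\mathbf{recipient}_{ij}
       + \beta_{P}\,\mathrm{purpose}_{ij}
       + \beta_{Z}\,z_j
       + \gamma\,(\mathrm{purpose}_{ij} \times z_j)
       + u_j + \varepsilon_{ij}
```

The respondent-specific intercept u_j absorbs the fact that ratings from the same person are correlated, and γ captures how the effect of the research purpose changes with the respondent characteristic.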

Transparent changes
We deviated from the preregistration in three instances. First, we planned to test the experimental assignment with respect to the region in which respondents live but failed to collect this variable in our study, so we had to deviate from the preregistered analyses in this respect. Second, the step of excluding respondents who did not complete the full questionnaire was moved from the data quality check section to the data section without changing the procedure of excluding incomplete interviews. Third, since only a small number of respondents provided item nonresponse, we decided against replicating the analyses with imputed values.
10 After the preregistration, the step of excluding respondents who did not complete the full questionnaire was moved from the data quality check section to the data section without changing the procedure of excluding incomplete interviews.

11
A benchmark from a previous study that used a respondent pool similar to the one from which we drew our sample showed that about 7 percent of the sample was identified as speeders (Roßmann, 2017). However, if we had experienced an unusually high number of speeders (i.e., more than 15 percent), we would have increased the sample size rather than relying on imputation. Similarly, if we had excluded a large number of respondents because of breakoffs or item nonresponse, we would have increased the sample size of the study to achieve the minimum sample size. In addition, we asked respondents at the beginning of the questionnaire to read and answer the questions carefully to mitigate speeding (Conrad et al., 2017).

Data quality and robustness checks
First, we excluded up to eight respondents who provided item nonresponses, depending on the variables included in the specific analysis (see Tables A1-A15 in the Supplementary Materials). Since such a small number of respondents provided item nonresponse, we decided against replicating the analyses with imputed values for those respondents. Second, we excluded 146 speeders, who were defined as respondents who finished the survey in less than 60 percent of the median completion time. Analyses with speeders can be found in the online supplement (see Table A15 in the Supplementary Materials). This robustness check showed that the decision to exclude speeders did not affect the substantive results reported here. Third, a series of χ² tests confirmed that the experimental assignment of the vignettes worked except for education and data recipient. Thus, we included education as an additional covariate in all models (see Tables A2-A10 in the Supplementary Materials). 14 In addition, we included a question asking respondents whether they had read the vignettes carefully, with seven response categories ranging from 1, "not at all carefully," to 7, "very carefully," which had a mean rating of 6.10. Since only eight respondents selected the values 1 or 2, we decided against a robustness check excluding those respondents.
As robustness checks, we replicated the multilevel models (1) as fixed-effects models with continuous outcomes and (2) as random-effects models in which the dependent variable is treated as ordinal (see Tables A16-A21 in the Supplementary Materials). Neither alternative approach changed the substantive findings compared with the random-effects models with continuous outcomes.
Table 2 shows the descriptive results for each of the 18 vignettes. The level of willingness to provide data for health research ranged from 3.37 for sharing sensor data with a public health agency for a personal benefit to 4.84 for sharing biomarkers with a university research center for a public benefit. Given that the scale ranged from 1, "very unlikely," to 7, "very likely," the sharing levels are around the midpoint of the answer scale, with four vignettes showing values above 4.5 and two vignettes showing values below 3.5.

Preregistered hypotheses
Regarding our hypotheses about the main effects of the vignette dimensions, Model 1a in Figure 1 shows that H1.3, which predicted that biomarkers would elicit the lowest willingness to share, was not supported. On the contrary, respondents reported that they would be significantly more likely to share biomarkers (β̂ = .616, p < .001) and medical records (β̂ = .435, p < .001) compared with sensor data. The main effect hypothesis regarding the recipient (H2.1) suggested that the willingness to share would be highest for university research centers. The data supported this hypothesis, with respondents showing significantly lower willingness to share health data with both other recipients: private companies (β̂ = -.660, p < .001) and public health agencies (β̂ = -.380, p < .001). With respect to the purpose, we expected that respondents would be more willing to share their health data if they anticipated a personal benefit (H3.1). However, the experimental results show that the willingness to share was significantly higher for the vignettes that featured a public benefit than for those that featured a personal benefit (β̂ = -.256, p < .001). When considering interaction effects, none of our hypotheses about the interaction between data type and recipient (H1.1 and H1.2, Model 1b) and the interaction between recipient and purpose (H3.2, Model 1c) was supported (p ≥ .05).
Figure 2 shows the interaction effects with additional measures. H2.2 suggested that higher levels of trust in the respective recipient would result in a higher willingness to share health data. The experimental results support this hypothesis for two recipients, private company (β̂ = .117, p < .001, Model 2a) and university (β̂ = .103, p < .001, Model 2c), but not for public agency (β̂ = .025, p = .268, Model 2b). Hypotheses H4.1, H4.2, and H4.3 suggested interaction effects of public purpose with altruism, perceptions of the importance of public duties, and trust in science in general, respectively. The interaction effects for trust in science (β̂ = .060, p = .013, Model 3a) and altruism (β̂ = .054, p = .011, Model 3b) were in the expected direction and significant, showing higher willingness to share when respondents displayed higher values on these covariates, while public duty showed an effect in the expected direction, which was, however, not statistically significant (β̂ = .019, p = .121, Model 3c).
14 The tests of the experimental assignment included age, gender, and education. As an additional sensitivity check, we recalculated all models without including education to ensure that including it did not affect our substantive conclusions.

Exploratory analyses
We also included several variables for additional exploratory analyses shown in Figure 3 (see Table A1 in the Supplementary Materials for descriptive results of these additional variables). With respect to demographics, young respondents (18-28 years) reported a significantly higher willingness to share their health data than respondents aged 29 to 64 years (p < .05, Model 4a). The effects of educational attainment and gender were statistically nonsignificant (p > .05). Respondents who owned a smartwatch (β̂ = .300, p = .024) and/or a smartphone (β̂ = .505, p = .022) reported a significantly higher willingness to share their data than respondents who did not own either of these devices, as did respondents with higher levels of technical affinity (β̂ = .038, p < .001; Model 4b). Respondents with higher levels of trust in others (i.e., social trust, β̂ = .147, p < .001, Model 4c) and respondents who have been confronted with cancer personally or in their close social environment (β̂ = .271, p = .019, Model 4d) reported a significantly higher willingness to share their health data. In contrast, respondents with higher privacy concerns reported a significantly lower willingness to share their health data (β̂ = -.267, p < .001). Self-reported political ideology did not affect respondents' willingness to share their data (β̂ = -.018, p = .509).

Summary of results
The results of the vignette experiment confirmed that all three dimensions experimentally tested in our vignette study (data type, recipient, and purpose) significantly influenced individual data sharing decisions. However, two of the three main effects of the vignette dimensions were statistically significant in the direction opposite to the one hypothesized. Specifically, of our main effects hypotheses, only hypothesis H2.1 regarding the effect of the different recipients on respondents' data sharing intentions was supported, as university research centers were the most accepted recipients. Moreover, the hypotheses about interaction effects between the vignette dimensions were not supported. From a CI perspective, this finding is somewhat striking, as we would have expected the effects of single parameters to depend on the specification of the other parameters. One explanation is that the specific data sharing scenarios that we investigated come with similar privacy expectations once they are placed within the respective health contexts. In contrast, most of our hypotheses about interactions with additional measures were supported (e.g., public purpose and altruism), and most of our exploratory analyses showed statistically significant effects (e.g., social trust and privacy concerns). The latter results indicate that general attitudes and characteristics of respondents indeed influenced their willingness to share across scenarios.
With respect to the different data types, our study found that respondents reported higher willingness to share biomarkers and medical records compared with sensor data for health research, which echoes previous findings. A possible reason for this finding is that the threat of out-of-context use appeared to be more salient for sensor data than for the other two data types (Vitak & Zimmer, 2020). Another possible reason is the hypothetical nature of the outcome variable of our study: respondents may not have considered the higher data sharing effort for biomarkers compared with sensor data.
Our study did not reproduce the result of Gerdon et al. (2021) that respondents were more willing to share their data with a private than with a public recipient. Possible reasons are that we referred to more specific public institutions than Gerdon et al. (2021) and that public trust levels toward public authorities changed during the pandemic. While the willingness to share was the highest for university research centers, respondents were also more likely to be willing to share their data with a public health agency compared with a private company. This finding reinforces confidence in publicly funded health research. However, for data related to current crises or data directly linked to concerns of government surveillance, the findings might be different. Additional research is needed to explore this further.
Figure 1. Results of the multilevel regression analyses predicting willingness to share health data: Main effects and interaction effects between the experimental dimensions. Model 1a displays results of the main effects of the vignette dimensions. Model 1b displays the main effects and the interaction between data recipient and purpose. Model 1c displays the main effects and the interaction between data type and recipient. The dots show the respective point estimates, and the bars indicate the 95% confidence intervals.
With respect to the purpose of the data collection, the study showed that respondents were more likely to be willing to share their data when the research promised a public benefit than when it promised a personal health recommendation. This finding confirms previous research suggesting that sharing health data in the interest of improving public health aligns with societal norms and is, therefore, highly accepted (Bearth & Siegrist, 2020; Waind, 2020). However, our findings do not support the assumption drawn from the privacy calculus (Culnan & Armstrong, 1999), which would have suggested that individuals are more likely to share their data if they expect personal (health) benefits.

Practical implications
Our study illustrated that willingness to share health data is closely connected to individual variables such as institutional and social trust, privacy concerns, altruism, technical affinity, and age. Building on this information, invitation letters to potential study participants could illustrate the trustworthiness of the respective data recipient and the purpose of the data collection. More generally, and in line with previous research (e.g., Aitken et al., 2016; Rosman et al., 2022; Waind, 2020), the findings underline that health research needs to clearly show that it serves the public interest to achieve public acceptance. In the invitation letter, researchers should also make sure to address study-specific privacy concerns regarding data collection, storage, and processing. Beyond that, the study suggested that a private company or public health agency that plans to run a data sharing campaign may increase the trustworthiness of its project by involving independent university researchers. Finally, the more an institution knows about the data sharing norms, preferences, and privacy concerns of the target population, the more it can tailor the design of the health data collection.
Figure 2. Results of the multilevel regression analyses predicting willingness to share health data: Additional measures and interaction effects. All models (2a-3c) include main effects of the vignette dimensions (not shown). Models 2a-3a display the results for various trust measures. Model 3b displays the results of altruism and Model 3c for public duty. The dots show the respective point estimates, and the bars indicate the 95% confidence intervals.
Researchers who are interested in estimating how many participants they need for their study are advised to be mindful that a data sharing process has several steps. In this study, respondents first had to follow the invitation to take part in the survey. They then had to complete the entire survey and provide answers of sufficient quality (e.g., without speeding through the questionnaire). In actual health data collections, individuals would have to answer the request for sharing additional health data affirmatively and complete that data sharing procedure successfully. Yet, for the generalizability of a study, it is not merely important how many people are willing to share their data; it is as critical whether there are specific subgroups of invited persons who are not willing to share their health data (or take part in the survey). For example, if a study is focused on vaccinations against COVID-19 and the realized health data sample only includes people who had at least three vaccinations, important subgroups of the population would be missing, and the generalizability of the study would be limited in that respect. Thus, researchers should always consider both aspects simultaneously, optimizing participation and minimizing sample bias.
Figure 3. Results of the multilevel regression analyses predicting willingness to share health data: Exploratory analyses. All models (4a-4d) include main effects of the vignette dimensions (not shown). Model 4a displays the results for the demographic variables. Model 4b displays the results of device ownership and technical affinity. Model 4c displays the effects for political ideology and social trust. Model 4d displays the results for cancer exposure and privacy concerns. The dots show the respective point estimates, and the bars indicate the 95% confidence intervals.

Limitations
This research has several limitations. First, we use cancer research as our study topic. While cancer research is less affected by current events than other health research topics, such as the COVID-19 pandemic, it remains an open question to what degree our findings will generalize to other health topics. Research on cancer might be perceived as more important than research on less severe diseases, so we expect lower data sharing rates for those topics. Second, our study was carried out during the COVID-19 pandemic, when sharing health data might be generally viewed more positively than during times when personal and public health are less salient topics. Third, one might wonder whether our findings will generalize to other countries. While this is again a question for future investigations, research has shown that privacy concerns and related behavior may differ across countries (e.g., Li, 2022; Trepte et al., 2017). Moreover, the digitalization of the health system in Germany is not considered very advanced (Bertelsmann Stiftung, 2019). Thus, willingness to share health data may be higher in countries with fewer privacy concerns and/or a higher level of digitalization of the health system. Fourth, our vignette experiment only captures people's intent to share health data. While this approach allows us to experimentally manipulate several factors at once, it limits the external validity of our study. However, previous research has shown that there is a strong association between intended and actual behavior (e.g., Hainmueller et al., 2015; Petzold & Wolbring, 2018; Sheeran, 2002), so we believe that most of our main findings will be directly transferable to "real-world" data sharing situations. An advantage of our hypothetical study is that the results will not be influenced by the specific data sharing method, which can have a large impact on the results. Perhaps most importantly, researchers should expect substantially lower data sharing rates in studies in which actual data are requested, because the costs for respondents are higher since they have to share their data. 15 Another aspect that could reduce the data sharing rates in studies that measure actual sharing behavior is that following the request and providing data appears to be socially desirable. Given the lower costs of the hypothetical situation, more people might tend to answer the request affirmatively. Finally, our study uses a nonprobability sample. While prior research has shown that multivariate relationships obtained from such surveys often generalize to the general population, univariate distributions and bivariate associations should be treated with the appropriate caution (Cornesse et al., 2020). However, our study focuses on uncovering multivariate and causal relationships.

Conclusion
Our vignette study showed that the willingness to share health data is highly dependent on the specific data sharing situation. All three vignette dimensions (data type, recipient, and research purpose) significantly affected respondents' willingness to share their data. Similarly, the additional variables measuring trust, privacy, age, and device ownership affected the reported willingness to share health data. However, we found no meaningful interaction effects between the vignette dimensions. From a CI perspective, this raises questions on the similarity of social norms of data sharing scenarios within specific health contexts. The results suggest that individual data sharing decisions are affected by a multitude of factors, which include the idiosyncrasies of a data sharing situation as well as individual variables. Thus, since data sharing decisions are embedded in complex social contexts, we need to ensure that study design, research infrastructure, and public communication of science, as well as invitations to participate in studies, create a trustworthy environment and aim to foster public benefits.

15
At the same time, the benefits are usually also higher since respondents often receive monetary incentives for their data sharing effort. This can help counterbalance the additional data sharing effort.

Data availability statement
This article earned Open Materials, Open Data, and Preregistration badges for open scientific practices. The materials, data, and preregistration that support the findings of this study and the award of these badges are openly available at https://doi.org/10.23668/psycharchives.7058 (data and codebook), https://osf.io/p6h7j/ (analysis code in R), and https://osf.io/kgwe7 (preregistration report).

Political ideology
Source: ESS Round 9: European Social Survey (2021). Licensed under a CC BY-SA 4.0 International License.
In politics people sometimes talk of "left" and "right". Where would you place yourself on this scale, where 0 means the left and 10 means the right?

Public duties
Source: ESS Round 1: European Social Survey (2018). Licensed under a CC BY-SA 4.0 International License.
To be a good citizen, how important would you say it is for a person to… …support people who are worse off than themselves?

Social trust
Source: ESS Round 9: European Social Survey (2021). Licensed under a CC BY-SA 4.0 International License.
In general, do you think that most people can be trusted, or that you can't be careful enough when dealing with other people?
• 0 - You can never be too careful
• 10 - Most people can be trusted

Institutional trust
To what extent do you trust public health agencies in general?
To what extent do you trust private companies in general?
To what extent do you trust university researchers in general?
To what extent do you trust the scientific community in general?
• 10 - Complete trust

Altruism
Source: SOEP-IS Group (2021). Licensed under a CC BY-SA 4.0 International License.
Now we would like to know how well the following statement describes you as a person.
I am willing to do something for a good purpose without expecting anything in return.
• 0 - Does not describe me at all
• 10 - Describes me perfectly