Polarization of public trust in scientists between 1978 and 2018: Insights from a cross-decade comparison using interpretable machine learning

Abstract. The U.S. public's trust in scientists reached a new high in 2019 despite the collision of science and politics witnessed by the country. This study examines the cross-decade shift in public trust in scientists by analyzing General Social Survey data (1978–2018) using interpretable machine learning algorithms. The results suggest a polarization of public trust as political ideology made an increasingly important contribution to predicting trust over time. Compared with previous decades, many conservatives started to lose trust in scientists completely between 2008 and 2018. Although the marginal importance of political ideology in contributing to trust was greater than that of party identification, it was secondary to that of education and race in 2018. We discuss the practical implications and lessons learned from using machine learning algorithms to examine public opinion trends.

At the center of many social debates are questions related to science: Has the global temperature increased to a dangerous level? Do childhood vaccines cause autism? Are genetically modified crops as safe as conventional ones? Despite established scientific consensus on these issues, Americans are divided in their answers (Pew Research Center, 2015). Many politicians and party leaders take positions that contradict what the best available evidence suggests, rendering their followers suspicious of the motives and intellectual legitimacy of scientists (Druckman et al., 2013).
Despite the collision of science and politics witnessed by the country, the U.S. public's trust in scientists reached a new high in 2019, with 86% having "a great deal" or "fair amount" of confidence in scientists, surpassing trust in the military, elected officials, business leaders, and the news media (Funk et al., 2019; Funk et al., 2020). Many believe that this surge in public trust is encouraging. Especially in the early months of the COVID-19 pandemic, scholars and policymakers expected the U.S. public to comply with public health guidelines, driven by high trust in scientists and medical doctors (Plohl & Musil, 2020). Nonetheless, the public health crisis ended up being profoundly political, with liberals and conservatives differing drastically in their attitudinal and behavioral responses to the pandemic (Gollwitzer et al., 2020).
The divisive response to COVID-19 can be at least partially attributed to a long-term crisis of public trust in scientists and science-driven institutions (Millstone & van Zwanenberg, 2000). Previous studies documented a gradual yet steady decline of confidence in scientists among conservatives from the 1970s to 2010 (Gauchat, 2012). More recently, there was a "sharply increased gap" between Democrats' and Republicans' confidence in scientists between 2016 and 2018 (Krause et al., 2019, p. 822), which appeared to grow worse in 2019 (Funk et al., 2019). As public trust in scientists appears to be polarized along political lines, it is imperative to precisely characterize the polarization trend and identify the group leading the change.
To achieve these goals, we conducted a secondary analysis of General Social Survey (GSS) data using interpretable machine learning (ML) algorithms. We first trained a series of linear and nonlinear ML models to predict individuals' trust in 1978, 1988, 1998, 2008, and 2018. We then interpreted the models using model-agnostic methods for ML interpretation (e.g., computing and analyzing Shapley values). Lastly, we graphed the predicted probabilities to visualize the polarization trend. The practical implications and lessons learned from using ML algorithms for examining public opinion trends are discussed.

Public trust in scientists
Previous researchers defined the concept of "trust" in various ways based on the context in which trust is developed (Brossard & Nisbet, 2007; Nadelson et al., 2014). While interpersonal trust refers to the expression of positive expectations for others, social trust denotes the impersonal trust that is attributed to public institutions. This study defines public trust in scientists as individuals' confidence in the collective performance of scientists as a public institution. This type of social trust is conceptually different from epistemic trust, which refers to "trust in science behind the technology under discussion" (Sjöberg & Herber, 2008, p. 32).
On a related point, it is necessary to distinguish between "trust in scientists" and "trust in science as the most reliable way to gain knowledge" (Cofnas et al., 2018, p. 137). Specifically, "[a] person might not believe that the scientific method is the best way to learn about the world but support scientific authorities on the basis that they advocate policies with which the person happens to agree. Conversely, someone might strongly believe in the scientific method, but doubt that mainstream scientific authorities are living up to its requirements" (Cofnas et al., 2018, p. 137). Therefore, we caution against equating declining trust in scientists among some segments of the population with the growth of antipathy toward scientific research or resistance to certain scientific findings. Rather, declining trust should reflect individuals' skepticism of mainstream scientists whose stances may appear to contradict their values or cultural outlooks.
Public trust in scientists has been split for years in the United States, with individuals on the political right reporting less trust in scientists than those on the political left. As of 2019, despite majorities of Republicans and Democrats thinking that scientists are intelligent, nearly 40% of Republicans believed that research scientists "don't pay attention to moral values of society" (Funk et al., 2019). Many more liberals (73%) agreed that scientists should take an active role in policy debates than conservatives (43%) (Funk et al., 2019). Additionally, Democrats and Republicans tend to trust different types of scientists, depending on the pragmatic value of their research outcomes (McCright et al., 2013). While liberals trust scientists who study the environment and public health, conservatives trust scientists who create more economic benefits (McCright et al., 2013).
These findings shed a bright light on the psychological mechanisms through which political subgroups form their trust in scientists as a public institution. As most Americans do not have direct experience doing scientific research, they tend to rely on the research outcomes of certain scientists to assess their credibility. Kahan et al. (2011) found that when assessing the credentials of a white male scientist, participants were only willing to acknowledge him as an "expert" when the scientist's view was congruent with the dominant view of their political group. Many participants dismissed prominent cues signaling academic credentials (e.g., membership in the National Academy of Sciences) and discredited the expert if his position did not resonate with their cultural outlook (Kahan et al., 2011).

Polarization of public trust
The current division in public trust has resulted from a longitudinal shift in opinion. Through an analysis of GSS data, Gauchat (2012) found that the predicted probability of conservatives having "a great deal" of confidence in the scientific community declined significantly between 1972 and 2010; however, the change among nonconservatives was nonsignificant. This finding has been widely cited to assert that public trust in scientists is increasingly polarized as the right moves asymmetrically to the extreme (e.g., Krause et al., 2019; Lewandowsky & Oberauer, 2016; Motta, 2018; Nadelson et al., 2014). Early in this period, conservatives' distrust in scientists can be primarily attributed to the "political philosophy accompanying the NR (New Right)" after the Ronald Reagan era and the "increased connection between scientific knowledge and regulatory regimes in the United States, the latter of which conservatives generally oppose" (Gauchat, 2012, p. 172).
Later, the polarization of public trust co-occurred with the politicization of science in the context of policymaking. Using a 2013 online survey of 2,000 registered voters, Blank and Shaw (2015) examined how Democrats' and Republicans' deference to scientific expertise varied for a set of policy issues, ranging from climate change and teaching evolution to AIDS prevention and regulation of nuclear power. Results showed that Democrats were deferential to scientists' advice across all issue domains. Republicans, in contrast, rated their deference below the midpoint for issues such as global warming, same-sex adoption, mandatory health insurance, and evolution. Notably, Republicans, at least in 2013, held opinions very close to those of independents: "it is the relative pro-science attitudes of Democrats that stand in contrast to the rest of the population and not the anti-science attitudes of Republicans" (Blank & Shaw, 2015, p. 26).
More recently, the Donald Trump administration's perceived anti-science stance, along with its cuts to research funding, triggered resistance from scientists and science enthusiasts (Mervis, 2020; Newman, 2020). A series of mobilized science events occurred beginning in April 2017, known collectively as the "March for Science." Dozens of scientists and academics organized the events in partnership with interest groups devoted to scientific advancements, including the American Association for the Advancement of Science (Wessel, 2017). In 2017, the group organized more than 600 marches across the globe and continued to operate until 2018 (Science News Staff, 2018). Most participants in the March for Science agreed that scientists' working conditions were getting worse and assigned the most blame to Republicans in Congress and to President Trump (Myers et al., 2018).
The March for Science received significant attention from major news outlets and triggered heated debates among political leaders, media pundits, and celebrities on social media (Science News Staff, 2017). Using a panel survey of Amazon Mechanical Turk workers, Motta (2018) found that the gap between liberals' and conservatives' attitudes toward scientists was exacerbated immediately after the first rally on April 22, 2017. While liberals' attitudes toward scientists became more positive after the event, conservatives' attitudes shifted in the opposite direction. The intense media coverage might have led the public to view scientists as a "liberal constituency" and political partisans to be affectively polarized toward scientists.
Another notable trend is the emergence of a "war on science" frame amid the March for Science events (Hardy et al., 2019). The term "war on science" was coined by Chris Mooney in his 2005 book The Republican War on Science. Since then, people have used the term to condemn right-leaning politicians' attempts to undermine, alter, or interfere with the scientific process for political reasons (Hardy et al., 2019). Being verbally aggressive, the "war on science" frame casts blame on Republicans for the eroded authority of scientists. Hardy et al. (2019) found that after reading a blog post titled "War on Science," conservatives were less likely to rate the scientific community as credible than liberals. However, the attitudinal difference was more significant among those who believed the blog was "aggressive" than among those who thought it was "polite."
Since 2017, political partisans' trust in scientists appears to have become further polarized. In 2016, 41% of Democrats reported having "a great deal" of confidence in scientists, while 36% of Republicans held the same view (Funk et al., 2019). In 2018, the 5-point gap grew to 11 points as more Democrats became positive toward scientists (Krause et al., 2019). In 2019, the gap increased to 16 points as 43% of Democrats indicated they had a great deal of confidence in scientists "to act in the best interest of the public," whereas only 27% of Republicans believed so (Funk et al., 2019).

Characterizing the polarization trend with interpretable machine learning algorithms
Despite the documented polarizing trend, there is ambiguity surrounding the conceptualization of "polarization" (Baldassarri & Bearman, 2007). Polarization was first defined as a bimodal distribution of public opinion in which most members of the public hold extreme yet opposing opinions on certain issues (Baldassarri & Gelman, 2008). Such a bimodal distribution is different from a nonpolarized, normal distribution, in which most people are moderate and fewer people are extreme. However, according to this definition, few opinion differences in the United States would merit the label "polarization" (Fiorina & Abrams, 2008). Hence, researchers have defined polarization in terms of increasing bimodality, that is, the middle losing people to both extremes (Fiorina & Abrams, 2008).
To empirically demonstrate the polarization of public trust in scientists, researchers need to (1) identify when and how people with various political orientations start to differ in their attitudes and (2) examine how the attitudinal gap grows over time. Previous researchers have attempted to achieve these goals by analyzing GSS data with descriptive and inferential statistics. Some tracked confidence in scientists over time and visualized the fluctuation with a line graph. Although such an approach helps researchers and pollsters gain intuitive insights into the opinion trend, it may oversimplify the picture by excluding those in the middle and ignoring the other ordinal options that respondents can choose from (e.g., "hardly any" and "only some" confidence).
In contrast, Gauchat (2012) combined 29 years of GSS data and used mixed-level logit models to examine changes in trust among conservatives, moderates, and liberals across the decades. Since traditional tests of equality of coefficients across groups were not suitable in this case, Gauchat computed and compared the predicted probability of each group having "a great deal" of confidence over the period. Although the predicted probability drastically declined for conservatives, it did not change significantly for liberals and moderates. While such evidence arguably supports the polarization thesis, it might be subject to bias, as the author did not examine the changing probabilities for other response options. In addition, as the predicted probabilities generated from logit regressions only represent the sample average, the author did not assess the prediction accuracy. Additionally, while Gauchat (2012) suggested that Democrats and Republicans do not differ in their trust in scientists, Krause et al. (2019) noticed the gap between Democrats and Republicans. It remained unclear whether the polarization trend is primarily driven by people with distinct political ideologies or partisan identifications.
With these considerations in mind, we attempted to characterize the polarization of public trust by reanalyzing the GSS data. Specifically, we aimed to (1) generate predicted probabilities for all response options with an estimate of prediction accuracy and (2) examine the relative importance of political ideology in contributing to trust compared with other political and demographic factors. We chose interpretable ML as an alternative analytical approach to achieve these tasks.
ML trains machines to learn patterns from data and use the knowledge gained to make predictions (Murphy, 2012). Different from conventional statistics, ML presents excellent flexibility in finding inherent associations in data (Murphy, 2012). In our case, ML can be used to predict how people with various political orientations report trust without assuming a linear relationship between the variables. In addition, researchers can assess model performance using various metrics, including accuracy, precision, recall, or a combination of the last two (Murphy, 2012). Using certain ML interpretation techniques, such as computing and analyzing Shapley values, researchers can understand the marginal contribution of input variables to predicting the output variable (Štrumbelj & Kononenko, 2013).
This study used five common ML algorithms to predict individuals' probability of having "a great deal," "only some," and "hardly any" confidence in scientists in 1978, 1988, 1998, 2008, and 2018. We compared the model performance metrics and Shapley values to examine whether political ideology plays an increasingly important role in predicting trust across the decades. In addition, we compared the marginal contributions of political ideology, party identification, and other demographic factors, including age, gender, education, race, and income, to predicting trust. Last, we graphed the predicted probabilities to delineate how public trust has become polarized along ideological lines over time.

Data and sample
The National Opinion Research Center has conducted the GSS since 1972. The basic GSS design is a repeated cross-sectional survey of a nationally representative sample of noninstitutionalized adults who speak either English or Spanish (Robinson, 2014). The GSS has a response rate of over 70% (Robinson, 2014). The selected years (1978, 1988, 1998, 2008, and 2018) yielded a sample of 7,349 valid responses, which constituted 18.7% of the total valid responses for this question.

Input, output variables, and measures
The output variable, trust in scientists, was measured by a question that has been asked for all but two years (1972 and 1985) since the GSS was first conducted. The question reads: "I am going to name some institutions in this country. As far as the people running these institutions are concerned, would you say you have a great deal of confidence, only some confidence, or hardly any confidence at all in scientific community?" Coded response options include "a great deal," "only some," and "hardly any." For input variables, we included age, gender, education, income, race, party identification, and political ideology. Age was coded as an integer ranging from 18 to 89. Gender and race were one-hot encoded as they were nominal variables. One-hot encoding is a technique frequently used in ML to transform categorical variables into numeric ones when pre-processing data (Kuhn & Johnson, 2019); it converts a categorical variable into a series of dummy variables. Education was measured by the highest number of years of school completed, ranging from 0 to 20. Income was measured by a total of 12 brackets indicating the total pretax household income. Party identification was measured on a 7-point scale (0 = "strong Democrat," 3 = "independent," 6 = "strong Republican"). Similarly, political ideology was measured on a 7-point scale (1 = "extremely liberal," 4 = "moderate," 7 = "extremely conservative").
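To make the one-hot encoding step concrete, the following is a minimal sketch in plain Python. It is an illustration only, not the pre-processing code used in this study; the category labels and the `is_` column prefix are assumptions chosen for readability.

```python
def one_hot(values, categories=None):
    """Turn a list of category labels into one 0/1 dummy column per category."""
    if categories is None:
        # Fix the column order so every row has the same layout.
        categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


# Illustrative labels only (an assumption, not the GSS codebook).
race = ["white", "black", "other", "white"]
encoded = one_hot(race)

for label, row in zip(race, encoded):
    print(label, row)
```

Each respondent ends up with exactly one dummy variable set to 1, so a nominal variable enters the model without implying any ordering among its categories.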

Training interpretable machine learning models
We considered five popular ML algorithms: multiclass logistic regression, linear support-vector machine, random forest, eXtreme Gradient Boosting (XGBoost), and deep neural networks (DNN). These algorithms cover both parametric and nonparametric learning algorithms with varying abilities in accounting for linear and nonlinear relationships. Multiclass logistic regression estimates categorical values (e.g., choosing "a great deal," "only some," or "hardly any") based on a given set of input variables. It predicts the probability of an event by fitting the data to a logistic function (Murphy, 2012). A support-vector machine is a robust classification method that maps training examples to points in space so as to maximize the width of the gap between categories. Random forest and XGBoost models are both decision-tree-based ensemble algorithms. Such models are used to handle nonlinear relationships, as a tree can split on any numerical feature multiple times at different value thresholds. They usually outperform the simple linear models for small to medium-sized data. Lastly, DNN is an artificial neural network with multiple layers that consist of components functioning like the human brain and can model complex relationships.
To monitor and fine-tune the learning process, we went through a robust process to set the model hyperparameters (see the Appendix, section I, for details). In addition, we split the data into an 80% training set and a 20% test set. We only included data with nonmissing values and weighted all data points equally. Lastly, we compared the cross-validation error and the test error to make sure the models were not overfit. To assess the chosen models' performance, we used the F1 score, which is a harmonic mean of precision and recall. We trained a total of 25 models using five algorithms for each of the five data sets. The F1 scores for the testing sets ranged from 0.49 to 0.61 for multiclass logistic regression models, 0.50 to 0.61 for linear support-vector machine models, 0.49 to 0.60 for random forest models, 0.52 to 0.59 for XGBoost models, and 0.51 to 0.63 for DNN models (see the Appendix, section II, for details).
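As an illustration of the F1 score for a three-category outcome, the sketch below computes per-class precision and recall and then averages the per-class F1 scores. The averaging scheme (an unweighted "macro" average) and the toy labels are assumptions made for this example; the text above does not specify which multiclass averaging was used.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for one class: harmonic mean of its precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


# Toy test-set labels, invented for illustration.
y_true = ["great", "some", "some", "hardly", "some", "great"]
y_pred = ["great", "some", "great", "some", "some", "great"]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

In this toy case the rarely chosen category is never predicted correctly, which pulls the averaged score down even though the other two categories are classified reasonably well.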
Additionally, we computed and compared the Shapley values, not only to examine the relationship between political ideology and trust, but also to investigate the marginal contribution of political ideology to predicting trust when compared with other input variables. Originating from cooperative game theory, Shapley values provide guidance on how to fairly share a payout among the players in a collaborative game (Štrumbelj & Kononenko, 2013). The collaborative game idea can be applied to ML, where input variables (i.e., the players) collaborate to make a prediction. Shapley values are used to measure the marginal contributions of predictors and offer insights into the model's interpretability (Lundberg & Lee, 2017; Štrumbelj & Kononenko, 2013). Inspired by several methods, Lundberg and Lee (2017) proposed Shapley values as a unified measure to analyze the outputs of ML models. Predictors with higher Shapley values make larger marginal contributions than those with lower Shapley values.
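The game-theoretic idea can be sketched with an exact Shapley computation on a toy game. In the code below, the "payout" function `toy_value` and its numbers are hypothetical stand-ins invented for illustration; in an ML setting the payout for a coalition of input variables would instead come from the model's predictions, and practical tools approximate the sum rather than enumerating every coalition.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: weighted average of each player's marginal
    contribution over all coalitions of the other players."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):  # k = size of the coalition that i joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi[i] = total
    return phi


def toy_value(coalition):
    """Hypothetical payout: additive effects plus one interaction term."""
    score = 0.0
    if "education" in coalition:
        score += 0.30
    if "race" in coalition:
        score += 0.20
    if "ideology" in coalition:
        score += 0.10
    if {"education", "ideology"} <= coalition:
        score += 0.10  # interaction, shared equally by the two players
    return score


players = ["education", "race", "ideology"]
phi = shapley_values(players, toy_value)
print(phi)
```

The values sum to the payout of the full coalition, which is the "fair sharing" property that makes Shapley values attractive for comparing the contributions of predictors.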

Results
An analysis of the Shapley values across the models suggests that political ideology made an increasingly important contribution to predicting trust between 1978 and 2018 when compared with demographic factors. Taking the logistic regression models, for example (see Figure 1), the Shapley value of political ideology was lower than that of education, race, income, and gender in 1978 and 1988. In 1998, race, education, and income had higher Shapley values than political ideology. Similarly, the Shapley value of political ideology was lower than those of race, education, and age in 2008. However, the Shapley value of political ideology increased significantly in 2018, making it the third most important factor contributing to trust. Similar patterns were found when comparing the Shapley values of all variables in other models (see the Appendix, section III, for details). Although political ideology made a limited contribution to predicting trust from 1978 to 1998, its contribution increased significantly beginning in 2008 and followed only education and race in 2018. Additionally, while linear models (i.e., multiclass logistic regression and linear support-vector machine) yielded higher Shapley values for race, especially in 1998, 2008, and 2018, nonlinear models (i.e., random forest, XGBoost, and DNN) yielded higher Shapley values for education during those years. Interestingly, although the multiclass logistic regression model assigned more importance to party identification than political ideology in 2008, other models consistently generated higher Shapley values for political ideology in 2008 and 2018.
To further examine the relationship between political ideology and trust, we trained a series of one-dimensional models using multiclass logistic regression. We then predicted the probabilities of having "a great deal," "only some," and "hardly any" confidence based on political ideology and graphed the results for individuals of every ideological stripe (see Figure 2). Overall, the predicted probability of reporting "only some" confidence remained the highest across the decades. While the probability of reporting "a great deal" of confidence surpassed that of reporting "hardly any" between 1978 and 2008 for almost all people, a significant portion of conservatives were more likely to report having "hardly any" confidence in 2018 (see Figure 2). These results suggest that the U.S. public's trust in scientists has become increasingly polarized as more and more conservatives lost confidence in scientists completely, while liberals remained high in their trust.
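The way a one-dimensional multiclass logistic regression turns a position on the 7-point ideology scale into three predicted probabilities can be sketched as follows. The intercepts and slopes are hypothetical values chosen only to show the mechanics; they are not estimates from the GSS data or from the models reported here.

```python
from math import exp

LABELS = ("a great deal", "only some", "hardly any")


def predict_proba(ideology, intercepts, slopes):
    """Softmax over one linear score per response category."""
    scores = [b0 + b1 * ideology for b0, b1 in zip(intercepts, slopes)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [exp(s - top) for s in scores]
    total = sum(exps)
    return {label: e / total for label, e in zip(LABELS, exps)}


# Hypothetical coefficients (assumptions, NOT fitted values).
intercepts = (0.5, 0.8, -1.5)
slopes = (-0.15, 0.0, 0.25)

# Predicted probabilities for 1 = extremely liberal ... 7 = extremely conservative.
curve = {x: predict_proba(x, intercepts, slopes) for x in range(1, 8)}
for x, probs in curve.items():
    print(x, {k: round(v, 2) for k, v in probs.items()})
```

Plotting the three probabilities against the ideology scale, one panel per survey year, would give a graph of the kind described above; the probabilities at each scale point always sum to one.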

Discussion
Americans' trust in scientists has been divided along political lines for years, although evidence characterizing the division is mixed. Using interpretable ML algorithms as an alternative analytical method, this study delineated the polarization of public trust as a manifestation of the "increasing bimodality" process. Results suggest that political ideology has played an increasingly important role in shaping public trust during the past decades. As the marginal contribution of political ideology was consistently greater than that of party identification, it was reasonable to infer that the polarization trend was primarily driven by conservatives who had declined in trust since the late 2000s. While liberals were more likely to report having "a great deal" of confidence than having "hardly any" confidence across the decades, the reverse was true for conservatives in 2018. Nonetheless, the attitudinal gap caused by political ideology might not be as significant as that caused by education and race, as those factors made greater marginal contributions when predicting trust.
Before discussing the results in more detail, we note a few methodological limitations. First, although the study used various ML algorithms, the primary purpose was not to obtain the most accurate predictions, which is frequently the objective of state-of-the-art ML algorithms. Second, given the imperfect F1 scores, we caution against generalizing the predicted probabilities to the population or overstating the prediction accuracy. Any interpretation of the visualized trend (Figure 2) should highlight the important shifts in data patterns but not the exact probabilities. Future researchers may include additional input variables or use a much larger sample size for model training to generate more accurate predictions. Lastly, while we chose five years to represent different decades, there could be more nuanced changes occurring between those years; future researchers may wish to examine the yearly shift in data trends.
Despite these limitations, we recommend using interpretable ML algorithms to examine the dynamics of public opinion formation and shifts. Especially for large longitudinal data sets, ML algorithms can help discover specific trends and patterns that might not be easily discernible using conventional statistics. In addition, while ML models used to be known as "mysterious," many interpretation techniques are available now to explain ML models and their predictions. Future scholars may apply such techniques to investigate the determinants of public opinion and understand how those effects change over time. For example, they may use feature importance (Fisher et al., 2019) to examine the relationship between political ideology and trust; they can also use accumulated local effects (Apley & Zhu, 2020), partial dependence plots (Friedman, 2001), and individual conditional expectation curves (Goldstein et al., 2015) to examine the change in predictions based on political ideology over time.
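As one example of these techniques, a partial dependence curve can be sketched in a few lines: for each value of the variable of interest, that value is assigned to every respondent and the model's predictions are averaged. The `toy_model` function and the respondent rows below are hypothetical stand-ins for a fitted model and survey data.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction when `feature` is set to each grid value for all rows."""
    curve = []
    for value in grid:
        preds = [model({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_model(row):
    """Hypothetical predicted probability of 'a great deal' of confidence."""
    return 0.2 + 0.02 * row["education"] - 0.03 * row["ideology"]


# Invented respondents, for illustration only.
rows = [
    {"education": 12, "ideology": 2},
    {"education": 16, "ideology": 4},
    {"education": 20, "ideology": 6},
]
grid = [1, 4, 7]  # extremely liberal, moderate, extremely conservative
pd_curve = partial_dependence(toy_model, rows, "ideology", grid)
print([round(v, 2) for v in pd_curve])
```

Keeping the per-respondent predictions instead of averaging them would give individual conditional expectation curves, which show whether the average hides diverging subgroups.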
Maintaining a uniformly high level of trust in scientists at the societal level is a necessary condition for securing scientists' cultural authority and promoting democratic processes in policymaking (Howell et al., 2020). In recent years, Americans have achieved a historically high level of trust in scientists. People trusted scientists more than other institutions (e.g., industry leaders, news media, and elected officials), even when it came to contested scientific issues, such as vaccine safety, climate change, and genetically modified foods (Funk, 2017). Nonetheless, underlying this promising trend is an increasingly polarized public that differentiates trust along political ideology lines. While some extreme conservatives initiated this trend by losing trust in scientists in the 2000s, more and more conservatives lost trust in scientists completely, resulting in an exacerbated gap that cannot be easily amended. Considering what occurred during the COVID-19 pandemic, a divided and increasingly polarized public could hinder the implementation of evidence-based policies and increase the public's susceptibility to misinformation and disinformation campaigns. Even if people do not take extreme positions on the sources they choose to trust, the end result could be a segmented opinion climate that ultimately jeopardizes social integration and erodes the cultural authority of scientists in the long run.

POLITICS AND THE LIFE SCIENCES • SPRING 2022 • VOL. 41, NO. 1