Moral Rightness Comes in Degrees

Abstract This article questions the traditional view that moral rightness and wrongness are discrete predicates with sharp boundaries. I contend that moral rightness and wrongness come in degrees: Some acts are somewhat right and somewhat wrong. My argument is based on the assumption that meaning tracks use. If an overwhelming majority of competent language users frequently say that some acts are a bit right and a bit wrong, this indicates that rightness and wrongness are gradable concepts. To support the empirical part of the argument I use the tools of experimental philosophy. Results from three surveys (n = 715, 578, and 182) indicate that respondents use right and wrong as gradable terms to approximately the same extent as color terms, meaning that rightness and wrongness come in degrees roughly as much as colors do. In the largest study, only 4 percent persistently used right and wrong as non-gradable terms.


Introduction
There is an undisputed sense in which moral rightness and wrongness come in degrees. As Thomas Hurka explains, 'it is wrong to steal a car and wrong to murder, but murder is more seriously wrong than auto theft, which is more seriously wrong than breaking a promise to have lunch' (Hurka : ). It is also, in a similar sense, more importantly right to eradicate world poverty than to complete a book review on time.
Be that as it may, there is also another, theoretically more interesting sense in which moral rightness and wrongness may come in degrees. According to what I call the gradualist hypothesis, some acts are somewhat right and somewhat wrong. We can think of such acts as being located in a moral gray area. The gradualist hypothesis is compatible with, but does not entail, the idea that rightness and wrongness are vague concepts. An act can be somewhat right and somewhat wrong even if the boundary that separates the gray area from right and wrong is sharp. (Example: an act could be entirely right just in case none of the agent's moral obligations conflict, but somewhat right and somewhat wrong if they do.) By definition, vague concepts allow for Sorites series, that is, for sequences of similar instances in which there is no sharp boundary between correct and incorrect applications of a concept. Miriam Schoenfield suggests that some abortion cases may fit this description: 'Plausibly, we can create a Sorites series, admitting of borderline cases of permissibility, out of a series of abortions in which the fetus' age differ by a day (or a minute, or a second)' (Schoenfield : ).
The argument for the gradualist hypothesis that I offer in this article is based on the familiar idea that meaning tracks use. If an overwhelming majority of competent language users frequently say that some acts are a bit right and a bit wrong, this indicates that rightness and wrongness are gradable concepts. The assumption that meaning tracks use can of course be accepted by people who reject the traditional idea that meaning is use in a strict semantic sense. It is uncontroversial that use is a reliable guide to meaning, but some of us reject Wittgenstein's claim that 'the meaning of a word is its use in the language' (Wittgenstein [] (): e, at §, my emphasis).
If we wish to turn the meaning-tracks-use thesis into an argument for gradualism in ethics, it is not sufficient to show that a few individuals occasionally use right and wrong in a gradable sense. Not every usage of a word is thoughtful and sincere. However, if people persistently and sincerely talk about moral rightness and wrongness as if they allow for degrees, then this would be a strong reason for accepting the gradualist hypothesis. On a strict interpretation of the meaning-tracks-use thesis, an overwhelming majority of competent language users cannot be wrong about their persistent and sincere usage of moral terms.
The meaning-tracks-use thesis is widely endorsed by modern linguists. For instance, computational linguist Katrin Erk explains that so-called vector space models of meaning are based on the observation that 'we can often guess what a word means from the contexts in which it is used. Thus, we can represent meaning as distribution, as observed contexts' (Erk : ). Vector space models have proven to be empirically successful. By observing how words are used in large corpora and counting their occurrences in different sentences, computational linguists have developed software that predicts whether two words have similar meanings (see, for example, Erk : ; Turney and Pantel : ). However, the computational approach requires large quantities of empirical data. Erk points out that 'many phrases do not occur with sufficient frequency in a corpus to be represented through their distributional contexts' (Erk : ). This includes phrases relevant for assessing the gradualist hypothesis, such as 'this act is somewhat right and somewhat wrong' and 'this act is a bit right and a bit wrong'. To overcome this problem, I offer support for the gradualist hypothesis based on data from three surveys (n = 715, 578, and 182) designed to test whether ordinary people use right and wrong as gradable terms. In the largest survey, no more than 4 percent of ordinary language users persistently used 'right' and 'wrong' as non-gradable terms. The statistical analysis also indicates that 'right' and 'wrong' are used as gradable terms to approximately the same extent as color terms, meaning that rightness and wrongness come in degrees roughly as much as colors do. Furthermore, by using multidimensional scaling techniques, it can be shown that rightness and wrongness are used in a gradable sense even if no gradable terms (such as 'right to some degree' or 'somewhat right and wrong') appear in the questions or answer options presented to respondents.
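The distributional idea behind vector space models can be illustrated with a minimal Python sketch. It counts the words that occur near a target word and compares two such count vectors by their cosine. The toy corpus, the window size, and the function names are illustrative assumptions; the sketch is not the software described by Erk or by Turney and Pantel.

```python
from collections import Counter
from math import sqrt

def context_vector(target, sentences, window=2):
    """Count the words occurring within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == target:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
                )
    return counts

def cosine(u, v):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

# Hypothetical toy corpus; real models are trained on very large corpora.
TOY_CORPUS = [
    "lying is wrong in most cases",
    "stealing is wrong in most cases",
    "helping is right in most cases",
    "the sky is blue today",
]

lying = context_vector("lying", TOY_CORPUS)
stealing = context_vector("stealing", TOY_CORPUS)
sky = context_vector("sky", TOY_CORPUS)

# Words used in similar contexts receive a higher similarity score.
print(cosine(lying, stealing), cosine(lying, sky))
```

With so little data the scores are only suggestive, which is the limitation Erk notes for rare phrases.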

The Gradualist Hypothesis
The gradualist hypothesis is of recent origin. John Stuart Mill writes that, 'The creed which accepts as the foundation of morals, Utility, or the Greatest Happiness Principle, holds that acts are right in proportion as they tend to promote happiness, wrong as they tend to produce the reverse of happiness' (Mill : , my emphasis). If we read this literally, it follows that acts that produce half as much happiness as the optimal alternative(s) are half-right. However, Mill's gradualism seems to have been a slip of the tongue. In the rest of his writings, he never discusses this implicit notion of degrees of rightness, and he never uses the idea that rightness comes in degrees for supporting other claims or for rebutting objections to his ethical theory.
Many modern consequentialists believe that an act is right just in case no alternative brings about better consequences, and that every act that is not right is wrong. This criterion of rightness is obviously incompatible with the gradualist hypothesis. No matter what the consequences are, every act will be either right or wrong in the binary sense. This holds true even if some consequences are incomparable or on a par, because an act is considered to be right as long as its consequences are not worse than those of any alternative act. However, some consequentialist theories do allow for gradable notions of right and wrong. According to the multidimensional account of consequentialism that I discuss in The Dimensions of Consequentialism (Peterson ), an act's rightness or wrongness depends on several irreducible aspects, such as the total wellbeing produced by the act and the degree of equality with which wellbeing is distributed in society. According to this theory, moral rightness comes in degrees whenever no act is optimal with respect to all moral aspects.
In the example in table  (see Peterson : ) the first act, A, is optimal with respect to total wellbeing but scores poorly with respect to equality. The opposite is true of C. Act B is not optimal with respect to any of these aspects, but it scores pretty well with respect to both of them. Multidimensional consequentialists believe that all three alternatives are somewhat right and somewhat wrong, but B is right to a higher degree than A and C all things considered. Multidimensional consequentialists may, for instance, argue that B is almost entirely right (right to degree . on a scale from  to ) while A and C are half right and half wrong (right to degree .).
Some deontologists also accept the idea that rightness and wrongness come in degrees. In Moral Uncertainty and Its Consequences, Ted Lockhart claims that 'actions come in varying degrees of moral rightness between "right" and "wrong"' (Lockhart : ). He argues that this view is particularly attractive for those who accept W. D. Ross's theory of prima facie duties: 'Even for some deontological theories, we have no great difficulty entertaining a many-valued concept of moral rightness. A prima facie duties theory, for example, may readily [...]'. Lockhart's proposal is a revisionary extension of Ross's theory, which Ross himself would not accept. Ross is committed to what Rob Lawlor calls a 'threshold account' of rightness, just as mainstream consequentialists are: acts that have the right kind of properties make it across the threshold, and those acts are right (or permissible) in the binary sense (Lawlor : ). All other acts are wrong. For mainstream consequentialists, the right-making property is that of having optimal consequences, and for Ross the threshold a right act has to pass is to be fitting to perform in light of all applicable prima facie duties.
Somewhat surprisingly, Lockhart's own proposal is a hybrid view that collapses into a threshold account when only two alternatives are available: 'x has greater degree of moral rightness than y in situation S for agent A just in case, if x and y were the only alternatives open to A in S, then x would be morally right for A in S and y would be morally wrong for A in S' (Lockhart : ).
Lockhart's thesis is compatible both with Hurka's commonsensical version of gradualism mentioned above, and with Mill's gradualist account of utilitarianism. However, as pointed out by Campbell Brown, Lockhart's reductive analysis is unable to compare degrees of rightness across different situations. Brown notes that, 'a person who uses her mobile phone while watching a movie in a public cinema may be acting wrongly, but surely not as wrongly as one who does the same while driving a heavy lorry on a motorway thereby causing a major accident' (Brown : ). The problem for Lockhart is that the acts described by Brown are performed in different situations and can therefore not be compared according to his criterion. To address this problem, Brown develops a more general reductive analysis of degrees of rightness that tallies well with Mill's gradualist theory mentioned above (see Brown ).

Three Experimental Studies
Should we accept the idea that right and wrong are gradable concepts? Almost everyone agrees that meaning tracks use in the sense endorsed by modern linguists. By studying how people use words and phrases we can, under normal circumstances, make reliable inferences about their meaning. As noted above, this is an epistemic thesis. Whether meaning is use in a strict semantic sense is another issue. The truth of the gradualist hypothesis can thus be assessed by testing the following empirical conjecture: Competent language users persistently and sincerely use gradualist notions of moral rightness and wrongness.
M A R T I N P E T E R S O N
To test this conjecture, I distributed a series of online questionnaires to three sets of respondents: students at Texas A&M University taking classes in engineering ethics in spring  (n = ) and spring  (n = ), and a group of US citizens who voted in the  election (n = ). Students took the surveys for credit (about . percent of the total course grade) while respondents in the third study received between $. and $. in compensation. In all three studies, respondents were invited to answer up to twelve questions. Both the order of the questions and the answer options were randomized. At the request of the Institutional Review Board of Texas A&M University, which approved the study, no demographic information was collected.
Respondents were presented with up to five different types of tasks: abstract, semi-abstract, concrete, comparative, and open-ended tasks (detailed below). By using several different kinds of measures that test a single hypothesis, the validity of the measurement instrument can be assessed. If the validity is high, we should expect the results to be roughly the same for all types of tasks.

Abstract Tasks
Subjects were invited to evaluate abstract statements about ethics, mathematics, colors, scientific evidence, and scientific facts on a seven-point Likert scale ranging from 'strongly disagree' () to 'strongly agree' (). The following example pertains directly to the gradualist hypothesis:
A. Moral rightness and wrongness come in degrees. The boundary that separates morally right acts from wrong ones is not always sharp. Some acts are somewhat right and somewhat wrong.
Study : average degree of agreement . (n = , std. dev. .)
Study : average degree of agreement . (n = , std. dev. .)
Study : average degree of agreement . (n = , std. dev. .)
The following abstract statements were used as reference points for comparative purposes:
A. In mathematics, truth comes in degrees. The boundary that separates true mathematical statements from false ones is not always sharp. Some mathematical statements are somewhat true and somewhat false.
A. Colors come in degrees. The boundary that separates one color from another is not always sharp. Some color hues are somewhat red and somewhat blue.
Study : average degree of agreement . (n = , std. dev. .)
Study : average degree of agreement . (n = , std. dev. .)
Study : average degree of agreement . (n = , std. dev. .)
All items were followed by a simple comprehension check. Responses from subjects who did not answer it correctly have been excluded from the analysis.
The data sets are not normally distributed. It is therefore appropriate to perform a Mann-Whitney U-test. This is a nonparametric test for the null hypothesis that the distributions of two data sets are identical and therefore have the same median value. Table  summarizes the results for Study , which is less likely than the others to yield significant results due to its smaller sample size. The Mann-Whitney U-test indicates that moral rightness and wrongness are judged to come in degrees to a significantly higher extent (p <., one-tailed) than truth in mathematics is judged to come in degrees, but there is no statistically significant difference between A (moral rightness and wrongness come in degrees) and A (colors come in degrees), not even at p <.. This indicates that rightness and wrongness are judged to come in degrees to approximately the same extent as colors are judged to come in degrees. The results of Study  offer additional support to this conclusion: all comparisons in Study  yield significant results (p < ., one-tailed) except that between colors and moral rightness. However, in Study  the comparison between colors and moral rightness also yields a significant difference (p <., one-tailed), indicating that the extent to which colors and moral rightness are reported to come in degrees is not precisely the same.
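For readers unfamiliar with the test, the following Python sketch shows how a one-tailed Mann-Whitney U-test can be computed for two sets of Likert responses. It uses midranks for tied values and a tie-corrected normal approximation without continuity correction. The response data and the function name are hypothetical and serve only as an illustration; this is not the analysis code used for the three studies, and the p-value from the normal approximation is rough for small samples.

```python
from collections import Counter
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Return (U, p) for the one-tailed alternative that x tends to exceed y.

    Assumes both samples are non-empty lists and that not all pooled
    values are identical (otherwise the variance below is zero).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted(x + y)

    # Assign each distinct value the average (midrank) of the ranks it occupies.
    rank = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1, ..., j
        i = j

    rank_sum_x = sum(rank[v] for v in x)
    u = rank_sum_x - n1 * (n1 + 1) / 2

    # Tie-corrected variance of U under the null hypothesis.
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    z = (u - n1 * n2 / 2) / sqrt(variance)
    p = 0.5 * erfc(z / sqrt(2))  # upper-tail probability P(Z >= z)
    return u, p

# Hypothetical agreement scores on a seven-point scale.
moral = [6, 7, 5, 6, 7, 6, 5, 7]
maths = [2, 1, 3, 2, 4, 1, 2, 3]

u_stat, p_value = mann_whitney_u(moral, maths)
print(u_stat, p_value)
```

Because the test relies only on ranks, it does not require the normality assumption that the survey data violate.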
The abstract tasks also included the following general statements about scientific evidence and scientific facts: A. Scientific evidence comes in degrees. The boundary that separates theories corroborated by evidence from those that are not is not always sharp. Some scientific theories are somewhat supported by evidence and somewhat unsupported.
A. In science, facts come in degrees. The boundary that separates correct scientific claims from incorrect ones is not always sharp. Some scientific claims are somewhat correct and somewhat incorrect.

The Mann-Whitney U-test for A versus A indicates that scientific evidence is judged to come in degrees to a somewhat higher extent than scientific facts in all three studies (in Study , U = ., p-value < .; significant at p < .; one-tailed). Moreover, scientific facts are also judged to come in degrees to a significantly higher extent than mathematical truths in all three studies (in Study , U = ., p-value < .; significant at p < .; one-tailed), and moral rightness and wrongness are reported to come in degrees to a significantly higher extent than scientific evidence in all three studies (in Study , U = ., p-value = .; significant at p < .; one-tailed). Finally, colors are reported to come in degrees to a significantly higher extent than scientific evidence in all three studies (in Study , U = , p-value < .; significant at p < .; one-tailed).
These findings can be represented as a hierarchical order with four ordinal levels. Table  summarizes the results for Studies  and . (As noted, in Study  colors come in degrees to a slightly higher extent than moral rightness and should therefore be represented on a separate sublevel.) The difference in agreement in table  between all pairs of levels is statistically significant at p < ., except that between scientific evidence and scientific facts, which is significant at p < ..
An alternative explanation of the results in table  could be that respondents tend merely to report their belief in how much disagreement there is in a certain domain, not that the phenomena themselves come in degrees. To control for this, the following items were included in Study (table ):
These numbers indicate that respondents do believe that there is more disagreement on moral issues than on mathematical issues (U = , p-value < .; significant at p < .; one-tailed). This is not surprising. However, respondents also believe that moral rightness and wrongness come in degrees even when there is no disagreement, and they do so to a much higher extent than for mathematical issues (U = ., p-value < .; significant at p < .; one-tailed). This casts doubt on the alternative explanation, but fits well with the gradualist hypothesis. There is little reason to think that respondents merely reported their belief in how much disagreement there is in a certain domain.

Semi-abstract Tasks
All three studies included two semi-abstract tasks in which subjects were invited to complete moral statements by selecting one of a set of pre-defined alternatives.
The first of these tasks (n = , , and ) was formulated as shown in table :
The gradualist analysis was the most frequently selected answer option in all three studies. The differences in gradualist responses in Study  (. percent), Study  (. percent), and Study  (. percent) can be explained by the fact that respondents were presented with four answer options in Study , five in Study , and six in Study . In Study  about . percent chose the relativist answer option, which was not available in the other studies. The more options respondents have to choose from, the less likely it is that everyone selects the same option. If right and wrong had been binary concepts, many subjects could arguably have been expected to favor the Kantian answer option ('always morally wrong') or the utilitarian answer option ('always morally right').
Study  included an answer option designed to capture Hurka's notion of degrees, according to which some acts are more importantly right or more seriously wrong. This answer was selected by no more than . percent of respondents.
 In Study , a separate group of respondents was presented with the answer option 'sometimes a bit right and a bit wrong' instead of 'sometimes right to some degree, but also wrong to some degree'. About . percent (n = ) selected 'sometimes a bit right and a bit wrong', compared to . percent (n = ) for the group presented with the option 'sometimes right to some degree, but also wrong to some degree'. The difference between . percent and . percent is not significant (Χ  = ., the p-value is < .; not significant at p < .). This is an indication of robustness. The results reported here do not depend on minor alterations of the wording of the gradualist hypothesis.
Unsurprisingly, the vast majority also reported gradualist responses for the second semi-abstract item S (n = , , and ), as shown in table :
A possible explanation of why so many respondents in Study  selected the binary response 'sometimes right and sometimes wrong, but never a bit right and a bit wrong' (. percent) or 'always morally right' (. percent) could be that speeding, but not lying, is viewed as less morally problematic by experienced drivers. US citizens who voted in the  election are on average older than college students and are thus more likely to drive.

Concrete Tasks
Study  included six concrete tasks in which respondents were invited to assess brief descriptions of particular acts. As these tasks were not designed to study respondents' views on moral relativism or Hurka's hypothesis, the number of answer options was limited to four. In Study  the following task was evaluated by all respondents (n = ): . . sometimes right to some degree, but also wrong to some degree.
.% What John did was right to some degree, but also wrong to some degree.
.% What John did cannot be assessed from a moral point of view. .% Although relatively little information is provided in the vignette, over ninety percent reported that John's act was wrong. The next item (C, n = ) serves as a reference point for what seems to be a case of someone doing something right: C. Jared's colleague Bob struggles to understand a new task for work. Jared has no plans for the evening and volunteers to help Bob to get up to speed. Between  pm and  pm Jared helps Bob to figure out how to solve the new task.
What Jared did was morally right. .% What Jared did was morally wrong.
.% What Jared did was right to some degree, but also wrong to some degree.
.% What Jared did cannot be assessed from a moral point of view. .% The gradualist hypothesis states that some acts are somewhat right and somewhat wrong, not that all are. Therefore, the findings for C and C neither refute nor confirm the gradualist hypothesis. However, data for the following tasks, C-C, support it. For these items, the gradualist answer option 'What [Denise/the captain/Anna/Miriam] did was right to some degree, but also wrong to some degree' was the most frequently selected answer in all three studies (see table . C. Denise is in severe pain. She asks her spouse Adam to drive her to the hospital for treatment. Although he knows that her condition is not life-threatening, Adam drives  miles above the speed limit to bring Denise to the hospital as fast as he can.
C. An experienced airline captain flies through a volcanic ash cloud that cause the engines to malfunction. To prevent panic among the passengers, the captain decides to lie to the passengers: 'The airport at our destination is closed due to bad weather. We will land at a nearby airport. There is no danger'.
C. Anna is proud of a new webpage she has designed for a client, and the client is also pleased with it. Anna asks her junior colleague Josh to share his opinion. Josh does not like the webpage, but because Anna and the client seem to like it Josh decides to lie: 'I think your new webpage looks great, congratulations!'  M A R T I N P E T E R S O N C. Miriam is on her way to the airport. Due to severe congestion on the highway, she realizes that there is a risk she will miss her international flight. When the road finally clears she drives  miles above the speed limit to make up for the time lost because of the congestion. Table  summarizes pair-wise chi-square tests for all combinations of C-C. The degrees of freedom for all comparisons is df = , so for p < . the critical chi-square value is ., and for p < . it is .. Note that all pair-wise comparisons are statistically significant. However, the chi-square values for C and C stand out: they are ten to one hundred times higher than the values for all other items. (See the dashed box in table .) From a statistical point of view, the explanation is that C-C are items in which the gradualist answer option is the most popular one; therefore, C-C have more in common with each other than with C and C.
It is also worth pointing out that the chi-square values for C are four to ten times higher than the corresponding values for C, C, and C. What could explain this? Note that in items C, C, and C some widely accepted norm is violated for a good reason: By violating the speed limit, or by lying, the agent brings about good consequences for others. A large majority reported that such norm violations are somewhat right and somewhat wrong. However, in C the agent violates a norm for what appears to be a selfish reason. Fewer subjects considered this to be somewhat right and somewhat wrong, and about twice as many considered it to be wrong. Combining the findings for the abstract, semi-abstract, and concrete tasks I found that only 4 percent in the largest study (Study , n = 715) persistently used right and wrong as non-gradable terms.

Comparative Tasks
Study  included a fourth type of task designed to test the gradualist hypothesis without including gradualist terms such as 'degree' and 'somewhat right' in the vignettes. If gradualist terms appear in questions or answer options, respondents might be more willing to apply such terms than they otherwise would (see Messick and Jackson ; see also the discussion below.) Subjects were asked to make pairwise comparisons between items C-C and a seventh item, C: C. After graduation Zofia decides to do unpaid volunteer work for Engineers without Borders for a couple of months before joining Petersen Consulting in Dallas, TX.
The comparative task was formulated as follows:
Pairwise comparisons of seven items, C-C, require 21 comparisons. Each respondent was invited to make four comparisons, which yielded  comparative data points (n =  to ). To verify that subjects understood the comparative task correctly, three identical comparisons were included in the questionnaire: C-C, C-C, and C-C. The average dissimilarities reported for these items were ., ., and ., which indicates a good understanding of the task. Table  summarizes the results:
Dissimilarities can be interpreted as distances in an n-dimensional geometric space. The more dissimilar two items are, the farther apart their locations are in space. The twenty-one comparisons listed in table  can range over twenty dimensions, but by applying multidimensional scaling techniques, the dimensionality of a multidimensional dataset can be reduced (see Kruskal and Wish ).

The aim of a classical multidimensional scaling is to represent the original data set by a new set of points in a smaller number of dimensions such that the Euclidean distance between each pair of points in the new set approximates the distance in the original multidimensional data set. Ideally, each pairwise distance (similarity) in the original data set (table ) should be exactly the same as the corresponding Euclidean distance in the new representation. However, as we reduce the number of dimensions some minor errors will typically be introduced into the new representation. This is acceptable as long as the errors are small. Figure  shows a classical multidimensional scaling of table . The maximum error is . units, which is a relatively large error (see below). When interpreting figure , it is important to keep in mind that item C is almost unanimously (. percent) considered to be an example of wrongdoing, whereas C is widely considered (. percent) to be an example of an agent doing something right. It is thus not surprising that C and C are located far apart along the x-axis. C can also be taken to be an example of someone doing something right, so it is equally unsurprising that C is located close to C. The locations of C, C, and C can thus be explained without invoking the hypothesis that rightness and wrongness come in degrees.
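The idea of classical multidimensional scaling described above can be sketched in a few lines of Python. The sketch double-centers the squared distances and extracts the leading eigenvectors by power iteration. The four example points, the function names, and the use of power iteration are illustrative assumptions, and the sketch assumes that the leading eigenvalues are positive and distinct; it is not the software used to produce the figures in this article.

```python
from math import sin, sqrt

def euclid(p, q):
    """Euclidean distance between two coordinate sequences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def classical_mds(dist, dims=2, iters=200):
    """Return `dims` coordinates per item for a symmetric distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    grand = sum(row) / n
    # Double centering turns squared distances into inner products.
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + grand) for j in range(n)]
         for i in range(n)]
    coords = [[] for _ in range(n)]
    for k in range(dims):
        # Arbitrary deterministic start vector, normalized to unit length.
        v = [sin(i + 1 + k) for i in range(n)]
        length = sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # nothing left to extract in this dimension
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eig = sum(x * y for x, y in zip(v, bv))  # Rayleigh quotient
        scale = sqrt(max(eig, 0.0))  # negative eigenvalues are clipped
        for i in range(n):
            coords[i].append(scale * v[i])
        # Deflation: remove the component just extracted.
        b = [[b[i][j] - eig * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords

def max_error(dist, coords):
    """Largest absolute gap between original and reproduced distances."""
    n = len(dist)
    return max(abs(dist[i][j] - euclid(coords[i], coords[j]))
               for i in range(n) for j in range(i + 1, n))

# Hypothetical example: four points forming a 4 x 1 rectangle.
points = [(0, 0), (4, 0), (4, 1), (0, 1)]
distances = [[euclid(p, q) for q in points] for p in points]

layout_2d = classical_mds(distances, dims=2)
layout_1d = classical_mds(distances, dims=1)
print(max_error(distances, layout_2d), max_error(distances, layout_1d))
```

In this toy case two dimensions reproduce the distances almost exactly, whereas a single dimension introduces a sizeable maximum error, which mirrors the trade-off between fewer dimensions and fidelity discussed above.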
However, the locations of C-C cannot be easily explained without assuming that rightness and wrongness come in degrees. Although these items are located somewhat to the left in figure , they are not close to C along the x-axis. Moreover, recall that C-C are items in which it is widely believed that the agent's act was somewhat right and somewhat wrong (. percent for C; . percent for C; . percent for C; and . percent for C). This fits well with their locations in figure : items C-C are, literally speaking, located 'between' the entirely wrong act in item C and the entirely right acts in C and C.
If the binary hypothesis had been true, it would have been possible to represent all seven items along a single dimension, and all items would have been clustered in two distinct areas in the figure: right and wrong. However, the findings in table  cannot be represented in such a way and are thus incompatible with the binary hypothesis.
Table  Average degree of dissimilarity, ranging from  (very similar) to  (very dissimilar).
That said, it remains to explain why C and C in figure  are located above C and C on the y-axis. Note that C and C are cases in which the agent violates the speed limit for a good reason, whereas C and C are cases in which the agent lies for what seems to be a good reason. This suggests the following somewhat speculative interpretation: The more similar the agent's reasons for some acts are, the closer are their locations on the y-axis. If so, the underlying similarities between C and C, and C and C, seem to be visible in the figure, which indicates that the data in table  have a reasonable degree of validity.
A drawback of the two-dimensional scaling is that the maximum error in figure  is, as noted, relatively large. It is therefore appropriate to consider a three-dimensional scaling (see figure ). In this representation, Kruskal's stress value is . x  - , which indicates a good fit. Figure  confirms the conclusion of figure : C (the entirely wrong act) is located far apart from C and C (the entirely right acts), but C-C are located between the entirely right and entirely wrong items. In figure , it is thus also reasonable to interpret the x-axis as a visual representation of an act's degree of rightness. The interpretation of the y-axis is the same as in figure , but we can leave it open how the z-axis is to be interpreted as that is of no importance for the present discussion.
Figures  and  are based on the assumption that all similarities in table  can be represented in a metric space. Because it is hard to know if this assumption is true, nonmetric multidimensional scalings are also worth considering. In this type of representation distances are interpreted as ordinal orderings: the aim is to preserve the ordering among the original points in a lower number of dimensions. Figure  shows a two-dimensional nonmetric multidimensional scaling of data in table . Kruskal's stress value is . x  - , which indicates a good fit. This figure confirms the previous conclusions: C-C is located between C and C and C, which    tallies well with the conclusion that the act C is entirely wrong, while the acts C-C somewhat right and somewhat wrong, and C and C entirely right.
In summary, all three multidimensional representations fit well with the gradualist hypothesis, but they are incompatible with the binary one.
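To make the notion of fit used above more concrete, the following minimal Python sketch computes Kruskal's stress (stress-1) for a metric scaling, that is, the square root of the summed squared differences between the distances in the low-dimensional configuration and the original dissimilarities, divided by the summed squared configuration distances. The coordinates and dissimilarities are hypothetical and are not the values reported in the table; the function name `kruskal_stress` is likewise illustrative. For a nonmetric scaling the raw dissimilarities would first be replaced by disparities obtained from a monotone regression, a step this sketch omits.

```python
import math
from itertools import combinations


def kruskal_stress(points, dissimilarities):
    """Stress-1 for a metric scaling.

    points: list of coordinate tuples (the low-dimensional configuration).
    dissimilarities: symmetric matrix of the original dissimilarities.
    """
    numerator = 0.0
    denominator = 0.0
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])  # distance in the configuration
        numerator += (d - dissimilarities[i][j]) ** 2
        denominator += d ** 2
    return math.sqrt(numerator / denominator)


# Hypothetical configuration of three items in two dimensions.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]

# Dissimilarities that the configuration reproduces exactly.
exact = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]

# Dissimilarities in which one pair is further apart than the configuration shows.
perturbed = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]

stress_exact = kruskal_stress(points, exact)
stress_perturbed = kruskal_stress(points, perturbed)
```

A stress value close to zero means that the configuration reproduces the dissimilarities closely; the value grows as the configuration distorts them.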

A Socratic Midwifery Effect?
Study  included an open-ended task in which half of the respondents were invited to write a couple of sentences about a case without relying on any predefined answer options. The other half was invited to select one of the following answer options: 'right', 'wrong', 'right to some degree, but also wrong to some degree', and 'I don't know, I would need more information'. The purpose was to study to what extent gradualist answers occur naturally. The study was conducted in May , during the COVID- pandemic. Below is the case:
Anouska's -year-old father is reluctant to take the COVID vaccine because he is concerned about possible side effects. She explains to him that it is primarily younger women who have been affected by severe side effects, so for him the benefit would definitely exceed the risk. In Anouska's opinion, it would be irrational of her father to not take the vaccine. He eventually gives in and schedules an appointment, but only after Anouska pressures him to do so. She also deceives him by exaggerating the benefits and minimizes some of less severe side effects. Evaluate Anouska's behavior from a moral point view. What is your moral conclusion all things considered?
The group asked to answer the question by writing a couple of sentences without relying on any predefined answer options (n = ) submitted , words. All responses were analyzed and categorized manually. Gradualist conclusions were expressed by . percent of respondents, compared to . percent (n = ) in the group presented with predefined answer options. (In the first group . percent reported that it was right to pressure Anouska's father to take the vaccine, compared to . percent in the second group; . percent concluded that it was wrong to pressure Anouska's father to take the vaccine, compared to . percent in the second group; and . percent presented with open-ended tasks stated views that were ambiguous or could not be reasonably classified as all-things-considered verdicts, compared to  percent in the second group who selected 'I don't know, I would need more information'.) Below are some examples of spontaneously submitted gradualist answers from respondents in the first group:
• 'What she did was morally grey, but the end result is positive'.
• 'From a moral standpoint, it would probably be both right and wrong, but erring toward wrong'.
• 'It's a little right and a little wrong'.
• 'It is a bit right and wrong to do what she did'.

It is not surprising that respondents used gradualist terms spontaneously, but it is surprising that gradualist responses occur almost three times as often (. percent versus . percent) if a gradualist conclusion is included among a set of predefined answer options. This difference is significant at p < ., one-tailed. What explains the difference?
One explanation could be that respondents' willingness to express gradualist views might be stimulated by a bit of philosophical midwifery. Socrates taught us that philosophers can help people articulate ideas they are unable to express clearly themselves, so I call this the Socratic midwifery effect. According to this explanation, respondents have a tacit but somewhat underdeveloped understanding of the gradualist hypothesis, which is clarified by the presence of a clearly stated gradualist answer option and therefore selected to a higher extent.
However, the findings reported in this paper do not permit us to state any definitive conclusion about the possible significance of a Socratic midwifery effect. About half of the respondents in Study  (n = ) were presented with tasks C, C, and C described above. The rest (n = ) were presented with versions of these tasks in which the original answer options ('a bit right and a bit wrong', and so on) had been replaced with more complex 'midwifery' answer options: 'There are reasons for, but also against, doing what [the agent] did. On balance it was neither entirely right nor entirely wrong; it was a bit right but also a bit wrong / it was right / it was wrong / I don't know'. For task C, the percentage of gradualist answers increased from . percent to . percent (which supports the Socratic midwifery effect), but for C, the percentage of gradualist answers decreased from . percent to . percent, and for C it decreased from . percent to . percent. The latter two results speak against the Socratic midwifery effect, and there is no obvious explanation of this anomaly. The significance of the Socratic midwifery effect could be a topic for future research.
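The comparison of gradualist response rates between the open-ended group and the group with predefined answer options can be illustrated with a one-tailed test for a difference between two proportions. The text does not state which test was used, so the pooled two-proportion z-test below is an assumption, and the counts in the example are hypothetical rather than the study's data. The function name `one_tailed_two_proportion_z` is illustrative.

```python
import math


def one_tailed_two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled z-test of H1: proportion in group A exceeds proportion in group B.

    Returns the z statistic and the one-tailed p-value P(Z >= z).
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts: gradualist answers with predefined options (group A)
# versus gradualist answers in the open-ended task (group B).
z_stat, p_one_tailed = one_tailed_two_proportion_z(60, 100, 20, 100)
```

A small one-tailed p-value indicates that a difference of this size would be unlikely if the two groups had the same underlying rate of gradualist answers.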

Discussion
The empirical findings favor the gradualist hypothesis over its binary rival. The abstract, semi-abstract, concrete, and comparative measures indicate that if the way in which people use words and phrases is a reliable guide to meaning, then rightness and wrongness come in degrees. However, no empirical study is immune to criticism.
First, one might worry that the empirical findings may not uniquely support the gradualist hypothesis. Other hypotheses might be equally well supported by data. Consider, for instance, the suggestion that although no act is a bit right and a bit wrong, some right acts are right to a greater degree than others, and some wrong acts are more wrong than others. According to this alternative hypothesis, right and wrong function much like hot and cold: some cold objects are colder than other cold objects, but no cold object is a bit cold and a bit hot. Data for the comparisons reported above do not discriminate between this alternative hypothesis and the gradualist hypothesis. All we can conclude from the observation that some items (C-C) are located 'between' an act judged to be entirely wrong (C) and others judged to be entirely right (C and C) in a multidimensional scaling is that traditional binary analyses offer a poor fit. The hypothesis that right and wrong function like hot and cold is, however, compatible with this result.
In response to this, the experimentalist may point out that the alternative hypothesis is not compatible with some of the other findings. Note, for instance, that the gradualist answer option 'right to some degree, but also wrong to some degree' was the most common response for items C-C, that is, the items located 'between' acts judged to be entirely right and wrong discussed above. This is evidence against the hypothesis that right and wrong function like hot and cold. If that hypothesis had been true, respondents would not have selected 'right to some degree, but also wrong to some degree'.
That said, it is of course possible to formulate other hypotheses that might fit better with data. Suppose, for instance, that respondents believe that acts are complex wholes composed of multivalent parts or aspects. According to this hypothesis, an act can be right with respect to one of its parts or aspects (for example, respect for autonomy) but wrong with respect to some other part or aspect (for example, fairness), but still be right in the binary sense all things considered. If so, respondents who responded 'right to some degree, but also wrong to some degree' may mistakenly have selected a gradualist phrase for expressing the view that an act is wrong with respect to some but not all of its parts or aspects.
A drawback of this hypothesis is that it offers a poor fit with some of the other findings. Consider, for instance, the responses to item S (table ). The vast majority (. percent in Study ) reported that lying in a situation in which doing so would bring about the best consequences is 'sometimes right to some degree, but also wrong to some degree'. If respondents had believed that acts are complex wholes composed of different parts or aspects, they would arguably have selected 'I don't know' or 'not possible to assess' to a greater extent than they did, as it would have been unclear if the statement referred to the entire act or some of its parts or aspects. However, those answers were selected by no more than . percent and . percent. This indicates that a sophisticated philosophical distinction between wholes and parts does not offer a better explanation of data. Another finding that does not fit well with the whole-part hypothesis is the observation that respondents agreed with the statement that 'moral rightness comes in degrees' to roughly the same extent as they agreed with the statement that 'colors come in degrees' (see above). This is also difficult to reconcile with the whole-part hypothesis if we believe that colors come in degrees in an outright sense.
It is also worth keeping in mind that all theories are underdetermined by data, as pointed out by Willard Van Orman Quine () and others. No matter what evidence one gathers for the gradualist hypothesis, it will always be possible to imagine some alternative hypothesis that fits equally well with the experimental findings. We will never be able to prove with certainty that our preferred hypothesis is the uniquely best one. The modest conclusion of this article should therefore be that the findings reported here offer a better fit with the gradualist hypothesis than do any of the alternative hypotheses discussed so far.
This brings me to what is perhaps the most important worry: Is the fact that ordinary people seem to use right and wrong in a sense that permits degrees a good reason for revising traditional, binary moral theories? If traditional binary moral theories are meant to capture the same concept we use in everyday moral discussions, and meaning tracks use, then the answer is yes. As noted, defenders of the meaning-tracks-use argument do not have to insist that meaning is use in a strict semantic sense. The key premise is just that use is a reliable indicator of meaning. The observation that people use right and wrong as gradable terms shows that these terms are gradable, regardless of whether meaning is use.
A possible response from binary theorists, such as classic act-utilitarians and some Kantians, could be that the notions of right and wrong described in those theories are technical concepts, just like nearly all central concepts used in scientific theories. It is beyond the scope of this article to discuss this objection at length, but note that the analogy with technical concepts in science might be problematic because scientists use technical concepts for what seems to be a good reason. For instance, the technical concept of heat used in physics enables scientists to express nuanced claims about physical processes that cannot be expressed by the everyday concept. But the binary concepts of right and wrong are less nuanced than their gradualist counterparts. What would the point be of introducing technical notions of right and wrong that are less nuanced than our ordinary concepts? The binary concepts of right and wrong have no additional explanatory power compared to the gradualist ones. The gradualist folk notions of right and wrong allow us to articulate more precise moral verdicts than the blunt, binary concepts employed by traditional moral theories. By asserting that an act is somewhat right and somewhat wrong we can express nuances that get lost if we adopt a binary theory that forces us to conclude that an act is either right or wrong in the binary sense.
Another reason for dismissing the empirical findings as irrelevant could be that it may make no practical difference if the gradualist hypothesis is true or false. Agents always have to choose what to do, and according to this argument choices are ultimately binary. From a deliberative point of view, it would therefore be irrelevant if the gradualist hypothesis is correct. This objection is, however, too quick. In an excellent discussion of moral vagueness, J. Robert Williams () distinguishes between what he calls 'moral oughts' and 'decision oughts'. (A similar distinction is sometimes made in discussions of moral uncertainty. See, for instance, Lockhart : ch. ). The term 'decision ought' is a technical term that refers to what a rational and morally conscientious agent would have most instrumental reason to do if she is motivated entirely by a desire to act in accordance with what morality demands of her. Everyone agrees that if all alternatives open to an agent are entirely right or wrong, then the agent is rationally permitted to perform any of the right acts, but none of the wrong ones. But what about acts that are somewhat right and somewhat wrong? Are we ever rationally permitted to perform such acts? There is no consensus on this, but here are some possible answers:
(1) It is rationally permissible to perform every act a that is not entirely wrong (see Williams , ; Bales ).
(2) It is rationally permissible to perform act a if and only if no alternative act is right to a greater degree.
(3) It is rationally permissible to perform act a if and only if a is not more wrong than right.
(4) If act a is neither entirely right nor entirely wrong, then there is no fact of the matter as to whether it is rationally permissible to perform a (see Rinard ).
(5) If act a is neither entirely right nor entirely wrong, then it is rationally permissible to perform act a if and only if a is chosen randomly with a probability that reflects a's degree of rightness (see Peterson ; Williams ).
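To illustrate how the five views listed above can come apart, here is a minimal Python sketch. It rests on assumptions that are not part of the article: degrees of rightness are represented as numbers between 0 (entirely wrong) and 1 (entirely right), an act's degree of wrongness is taken to be 1 minus its degree of rightness, and the random choice in the fifth view is read as selecting among the available acts with probabilities proportional to their degrees of rightness. All function names and the example values are hypothetical.

```python
import random


def permissible_not_entirely_wrong(degrees, act):
    """First view: any act that is not entirely wrong is permissible."""
    return degrees[act] > 0.0


def permissible_maximal(degrees, act):
    """Second view: permissible iff no alternative is right to a greater degree."""
    return all(degrees[act] >= other for other in degrees.values())


def permissible_not_more_wrong_than_right(degrees, act):
    """Third view: permissible iff the act is not more wrong than right.

    Assumes wrongness = 1 - rightness, so the threshold is 0.5.
    """
    return degrees[act] >= 0.5


def permissible_indeterminate(degrees, act):
    """Fourth view: no fact of the matter for acts in the gray area (None)."""
    if degrees[act] == 1.0:
        return True
    if degrees[act] == 0.0:
        return False
    return None


def choose_randomly(degrees, rng):
    """Fifth view: pick an act with probability proportional to its rightness."""
    acts = list(degrees)
    weights = [degrees[a] for a in acts]
    return rng.choices(acts, weights=weights, k=1)[0]


# Hypothetical alternatives: one entirely right, one in the gray area,
# one entirely wrong.
degrees = {"a": 1.0, "b": 0.6, "c": 0.0}
rng = random.Random(0)
picked = choose_randomly(degrees, rng)
```

With these hypothetical values, the gray-area act b is permissible on the first and third views, impermissible on the second, and indeterminate on the fourth, which is the kind of practical disagreement at issue here.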
It is beyond my purpose here to take a stand on which theory of decision-making defenders of the gradualist hypothesis ought to adopt. However, by considering the five views outlined above it becomes clear that it does make a practical difference whether we accept the gradualist hypothesis. An agent who, for instance, believes that some act a is more right than wrong, but not entirely right, may reasonably disagree with a binary theorist who insists that a is right in a binary sense about whether it would be rationally permissible to perform a. This shows that it does matter whether the empirical evidence for the gradualist hypothesis can convince utilitarians, Kantians, and others to abandon traditional binary moral theories.