Background to the Debate (Pulakos)
Despite years of research and practice aimed at improving the performance appraisal and performance management process in organizations, dissatisfaction with the process is at an all-time high. More than 90% of managers, employees, and human resource (HR) heads feel that their performance management processes fail to deliver the results they expected, and many view their current processes as ineffective and/or inaccurate (Corporate Leadership Council, 2012). But even beyond this, study after study has shown that one important component of performance management, the performance review, is dreaded. Formal performance reviews are not only perceived to be of little value, they also can be highly demotivating to even the highest performing employees (Aguinis, Joo, & Gottfredson, 2011; Culbertson, Henning, & Payne, 2013; Rock, 2008). Performance appraisal and performance management are not synonymous, but there is often a strong relation between the two. Although performance management systems often incorporate more frequent and informal feedback, it is common in many organizations to retain and to rely heavily on annual reviews of performance. Pulakos and O'Leary (2011) have argued that the key reason traditional performance management approaches have failed to live up to their promise of enabling performance is that the mechanics of formal systems (how ratings are done, the documentation required, how goals are set) are inconsistent with the goal of providing frequent, credible, and useful feedback about performance. The changing nature of work these days (e.g., semiautonomous teams, remote work, freelancing, temporary work) also suggests it is time to rethink our fundamental assumptions about approaches to performance management.
One of the main goals of performance appraisal is and always has been simply to achieve high performance by enabling managers to guide employees to increasingly higher levels of productivity and by motivating employees to do their very best. We have somehow managed to create just the opposite, however: Most employees don't find performance appraisals to be valuable or motivating. Instead, they find performance appraisals and performance management systems frustrating, too bureaucratic, and often not relevant to their jobs. Managers spend a lot of time on formal performance management activities that they likewise believe add little value. The Corporate Leadership Council found that managers and employees spend about 210 and 40 hours, respectively, per year on performance appraisal and performance management activities such as formal goal-setting processes, midyear and year-end reviews, and often extensive rating and calibration processes. For a 10,000-person organization with average salaries, this time, plus the other costs associated with performance management such as HR time, training costs, and software, can total over $35 million annually. This is a very big investment only to disengage employees and potentially undermine performance.
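To make the scale of the time investment concrete, the arithmetic can be sketched as follows. Only the hours per year (210 for managers, 40 for employees) and the 10,000-person headcount come from the text; the split between managers and employees and the loaded hourly cost are hypothetical placeholders, and the result is not an attempt to reproduce the Corporate Leadership Council estimate.

```python
def annual_appraisal_cost(n_managers, n_employees,
                          manager_hours, employee_hours,
                          hourly_cost, other_costs=0.0):
    """Rough yearly cost of time spent on performance management.

    hourly_cost is a single loaded cost per hour applied to everyone,
    which is a simplifying assumption. other_costs stands in for HR time,
    training, and software.
    """
    hours = n_managers * manager_hours + n_employees * employee_hours
    return hours * hourly_cost + other_costs


# Hours per year are taken from the text.
MANAGER_HOURS = 210
EMPLOYEE_HOURS = 40

# Hypothetical inputs: 1 manager per 9 employees, $50 per hour.
cost = annual_appraisal_cost(
    n_managers=1_000,
    n_employees=9_000,
    manager_hours=MANAGER_HOURS,
    employee_hours=EMPLOYEE_HOURS,
    hourly_cost=50.0,
)
print(f"Time cost under these assumptions: ${cost:,.0f}")
```

Under these illustrative inputs, time alone accounts for 570,000 hours per year, before any HR, training, or software costs are added.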
Within the past 2 years, a number of companies have begun contemplating changes to their performance management systems aimed at gaining more value and impact on performance. In fact, we cannot remember a time when so many companies have been uniformly focused on making similar disruptive changes to a major talent system, all at the same time, which is a testament to the ineffectiveness of and dissatisfaction with current performance management processes. However, this has also generated considerable debate about how performance management can and should be changed to gain the most impact on performance. Many organizations have begun changing aspects of their formal performance management systems and introducing training and change management to drive more effective performance management behavior (e.g., real-time feedback, more agile expectations and goals, more collaboration). Where companies are struggling more is in deciding what aspects of their formal systems to change, with the most controversial and challenging decision surrounding whether to eliminate or retain formal (usually annual) performance ratings. Several high-profile companies, such as Eli Lilly, Adobe, and Gap, Inc., have opted to eliminate annual performance ratings entirely, but most companies are taking smaller steps, such as introducing more goal-setting and performance check-ins to drive more regular expectation setting and feedback.
The focus on performance ratings, in particular, is noteworthy, especially considering the main point in recent thought pieces on performance management reform (e.g., Pulakos, Mueller-Hanson, Arad, & Moye, 2015; Pulakos & O'Leary, 2011), which is that over 25 years of research have shown that formal system changes have no discernible impact on driving more effective performance management behavior or performance (DeNisi & Smith, 2014). This may be because formal performance management processes that incorporate or rely on annual performance ratings drive both managers and employees to focus on rating outcomes rather than on how to actually improve and drive effectiveness on the job. Based on the research literature and our collective experience engaging in performance management reform over decades, it is not realistic to think that any intermittent activity-driven evaluation system will necessarily compel more effective performance management behavior or, ultimately, improved job performance. The pros and cons of retaining performance ratings were the subject of a lively, standing-room-only debate at the 2015 Society for Industrial and Organizational Psychology conference in Philadelphia. Given the high interest in this topic, the remainder of this article will recap the points made by the panelists who participated in the debate and advanced a position to keep or to eliminate ratings. The participants in the debate included both academics and practitioners, well-known authors in performance management, highly experienced consultants, and people implementing performance management systems in large organizations in both the private and the public sector. They were allocated to each side of the debate to balance the range of perspectives. They were selected for their knowledge and interest in the topic, not because they inherently support one side of the debate or the other. The side for getting rid of performance ratings included Alan Colquitt of Eli Lilly, Kevin Murphy of Colorado State University, and Rob Ollander-Krane of Gap, Inc. The side for retaining performance ratings included Seymour Adler of Aon Hewitt, Michael Campion of Purdue University, and Amy Grubb of the FBI. The debate was moderated by Elaine Pulakos of CEB, who set the background for the debate and provided closing remarks.
Get Rid of Performance Ratings (Colquitt, Murphy, & Ollander-Krane)
There is a long history of research on performance rating and performance appraisal (for reviews, see Bernardin & Beatty, 1984; DeCotiis & Petit, 1978; DeNisi, 2006; DeNisi, Cafferty, & Meglino, 1984; Ilgen & Feldman, 1983; Landy & Farr, 1983; Milkovich & Wigdor, 1991; Murphy & Cleveland, 1991, 1995; Wherry & Bartlett, 1982), and although different reviews highlight different strengths and weaknesses of the methods that are used in organizations to measure job performance via rating scales or other similar measures, it is fair to say that none of these reviews leads to the conclusion that performance rating is particularly successful either as a tool for accurately measuring employee performance or as a component of a broader program of performance management. Nearly a century of research on performance appraisal (see, for example, Austin & Villanova, 1992) suggests that there is a longstanding history of problems with performance rating and little reason to believe that these problems will be solved in the foreseeable future. The conclusion that performance rating is not working is not solely an academic one; there is evidence of widespread dissatisfaction with performance rating and related techniques in organizational settings, and many large organizations (e.g., Accenture, Deloitte, Microsoft, Gap, Inc., Eli Lilly) have abandoned or substantially curtailed their use of performance appraisal (Culbert & Rout, 2010; Cunningham, 2014).
Most organizations continue to use performance appraisals and ratings, and there is little sign this is changing, at least in the short term (see Lawler, Benson, & McDermott, 2012). Their use affects the distribution of rewards and other outcomes for millions of employees in organizations. The vast majority (89%) of companies link compensation decisions to performance ratings (Mercer, 2013). Performance ratings are the basis for pay-for-performance systems in most organizations; most organizations “pay for performance ratings.” It is fair to say that tens if not hundreds of billions of dollars in compensation and rewards are riding on the backs of performance ratings. These ratings can have long and lasting effects on employees’ lives and careers in organizations, affecting staffing, promotion, and termination decisions as well as affecting access to other development opportunities. This is serious business, and ratings don't measure up.
The task of accurately evaluating someone's performance is difficult if not impossible. It requires a supervisor to observe the performance of another employee over the course of a year and to collect reports of others’ observations of the same employee. The supervisor, usually with little or no formal training, then sifts, sorts, analyzes, weighs, and aggregates this information to make a judgment about the employee's performance. Supervisors must also exclude from consideration other irrelevant information about this individual and any other judgments that may have been made about the individual in the past, and they must suspend any biases or tendencies they possess while making this judgment. Sounds easy enough, right? Judges in professional sports, for whom this is their life's work, have a far simpler task (e.g., count the number of punches each boxer lands), and yet they frequently disagree with each other (see Stallings & Gillmore, 1972). The point is that what we are asking managers to do is virtually impossible to do well, especially given the frailties of human beings as measurement instruments.
Performance Appraisal: A Failed Experiment
In this section we will argue that it is time to treat performance rating in organizations as a failed experiment. Despite decades of work on methods of improving performance ratings and hundreds of research articles, there is little evidence of meaningful improvement in performance ratings. It is always possible that some future intervention might lead to substantial improvements, but after nearly a century of dashed hopes and unmet expectations, we believe it is time to call a halt to this sorry business. Scientists rarely like to give up on a problem they have struggled with for so long, but if the ship is sinking, and it is also on fire, it is probably time to head for the lifeboats.
A comprehensive review of all of the shortcomings of performance ratings and of the failed efforts to improve ratings is beyond the scope of this article, but the broad outline of the problems with performance rating and our lack of success in resolving these problems can be summarized under seven headings: (a) the disappointing interventions, (b) the widespread disagreement when multiple raters evaluate the same performance, (c) the failure to develop adequate criteria for evaluating ratings, (d) the weak relationship between the performance of ratees and the ratings they receive, (e) the conflicting purposes of performance rating in organizations, (f) the inconsistent effects of performance feedback on subsequent performance, and (g) the weak relationship between performance rating research and practice in organizations.
The most common suggestion for improving performance rating in organizations has been to improve rating scales. In particular, substantial efforts have been devoted over several decades to making rating scales clearer and making ratings more reliable and valid by adding valid and detailed behavioral information to rating scales (e.g., behaviorally anchored rating scales, Smith & Kendall, 1963; behavior observation scales, Latham & Wexley, 1977; Murphy, Martin, & Garcia, 1982). The payoff for this effort was judged to be so meager by Landy and Farr (1980), both of whom had themselves published numerous articles on the development of behavior-based rating scales, that they called for a moratorium on further rating scale research. Subsequent reviews (e.g., Murphy & Cleveland, 1995) suggested that behavior-based rating scales might have advantages in terms of their face validity and acceptance by users, but there is little evidence that the substantial body of research on the development and use of these scales made performance ratings in organizations more reliable, valid, or useful.
The second most common intervention has been rater training. This research has followed two main themes. First, numerous studies have relied on the strategy of telling raters about so-called rating errors (e.g., leniency, range restriction, halo) and urging them not to commit these errors. Second, numerous studies have developed variations of frame of reference training, designed to ensure that raters adopt a common frame of reference regarding target performance dimensions and performance levels (see, for example, Bernardin & Buckley, 1981; Bernardin & Pierce, 1980; Day & Sulsky, 1995; Pulakos, 1984). On the whole, both types of training “work” in some sense. If you tell raters not to give ratings that are too high or too low, they are a bit more likely to follow your advice; if you give raters a clearer idea of what the performance dimensions mean and what different performance levels look like, they will show a bit more agreement in their evaluations. However, neither variation on rater training has been successful in markedly improving ratings in organizations (Murphy & Cleveland, 1991, 1995).
Murphy and Cleveland (1995) noted that the interventions that were most commonly used to improve performance ratings were based on the hypothesis that raters lacked the knowledge or the tools to accurately evaluate performance. They suggested that these interventions were probably based on a faulty diagnosis of the shortcomings of performance appraisal in organizations. As we note in a later section, a substantial body of research suggests that there is little real evidence that raters lack the ability to rate accurately and that the likely causes of the failure of performance appraisal are related to raters’ motivation and goals rather than to the ability of raters to perform this task.
Disagreement Among Raters
One way to improve ratings might be to collect evaluations of performance from multiple raters rather than relying on the judgment of any single rater. Unfortunately, there is consistent evidence that raters do not agree in their evaluations of ratees. Disagreement is probably more substantial when raters differ in terms of their relationship with the persons to be rated (e.g., supervisors vs. peers), but even when raters are at the same level in an organization, they often do not agree in their evaluations (Facteau & Craig, 2001; Harris & Schaubroeck, 1988; Heneman, 1974; Murphy, Cleveland, & Mohler, 2001; Viswesvaran, Ones, & Schmidt, 1996). There have been intense disagreements about precisely how disagreements among raters should be interpreted and about the more general implications of these disagreements for the psychometric quality of ratings (e.g., Murphy & DeShon, 2000; Ones, Viswesvaran, & Schmidt, 2008), but there is little controversy about the fact that raters do not show the level of agreement one might expect from, for example, two different forms of the same paper-and-pencil test. Viswesvaran et al. (1996) suggest that performance ratings obtained from two separate sets of raters in organizations are likely to show correlations in the .50s, hardly the level one would expect if ratings were in fact good measures of the performance of ratees.
In theory, it is possible to obtain ratings from multiple raters and pool them to eliminate some types of interrater disagreement. However, in practice this is often difficult. Rating systems that obtain information from multiple raters (e.g., 360 degree feedback programs) often sample raters from different levels of the organization or raters from outside of the organization (e.g., customers), and there are good reasons to believe that differences in roles and perspectives will introduce systematic disagreements among raters that might not be readily resolved by simply pooling ratings (Murphy, Cleveland, & Mohler, 2001).
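The "in theory" case for pooling can be illustrated with the Spearman-Brown formula, which is not cited in this article and is used here only as a sketch. It assumes raters are interchangeable (parallel) judges of the same performance, which is exactly the assumption that raters from different levels or perspectives are likely to violate. The value .50 is an illustrative stand-in for the interrater correlations discussed above.

```python
def pooled_reliability(single_rater_r: float, k: int) -> float:
    """Spearman-Brown estimate of the reliability of the mean of k raters.

    single_rater_r: correlation between two individual raters.
    k: number of raters whose ratings are averaged.
    Assumes parallel (interchangeable) raters; systematic differences
    between rater roles are not modeled.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * single_rater_r / (1 + (k - 1) * single_rater_r)


for k in (1, 2, 4, 8):
    print(k, round(pooled_reliability(0.50, k), 3))
```

Under the parallel-rater assumption, averaging more raters raises reliability with diminishing returns. When raters differ systematically by role, the formula will tend to overstate what pooling achieves.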
Weak Criteria for Evaluating Ratings
Suppose N raters each evaluate the performance of k subordinates. How do you know whether the ratings they provide do a good job measuring the performance of the people being rated? If multiple raters evaluate the same ratees, interrater agreement measures provide one criterion, and as noted above, the data regarding interrater agreement are far from encouraging. The more common approach has been to evaluate performance ratings in terms of the presence or absence of one or more so-called “rater errors.”
Seventy-five years ago, Bingham (1939) identified what would come to be called “halo error” in ratings. Other rater errors, including leniency or severity and range restriction, were soon identified. There is a substantial research literature that analyzes rater error measures, particularly measures of halo (Balzer & Sulsky, 1992; Cooper, 1981a, 1981b; Murphy, 1982; Murphy & Anhalt, 1992; Murphy & Balzer, 1989; Murphy, Jako, & Anhalt, 1993; Nathan & Tippins, 1989; Pulakos, Schmitt, & Ostroff, 1986). On the whole, rater error measures are deeply flawed criteria. Different measures of halo, leniency, and the like do not necessarily agree. More important, all of these measures are based on completely arbitrary assumptions about how performance is distributed. Suppose, for example, that I give ratings of 4 or 5 while you give ratings of 2 or 3. This could mean that I am lenient, that you are severe, or that the people who work for me are performing better than the people who work for you are, and in principle there might be no way to determine which explanation is correct. Suppose that I rate my subordinates on four performance dimensions (e.g., planning, written communication, attention to detail, and receptiveness to customer feedback) and that the average intercorrelation among the ratings I give is .50. Is this too high, or is it too low? How do you know how high these correlations should be? Maybe I am committing halo error, or maybe the people who are good at planning really are also good at written communication. Outside of artificial laboratory environments, there is usually no a priori way of determining whether the correlations among ratings are too high, too low, or just right.
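The ambiguity of an observed intercorrelation of .50 can be shown with a deliberately simple model. The model is our assumption for illustration, not one taken from the literature cited here: each dimension rating is the sum of a true dimension score (unit variance), a general impression shared across dimensions (the halo component), and independent noise. Two very different raters then produce the same observed correlation.

```python
def expected_intercorrelation(true_corr: float,
                              halo_var: float = 0.0,
                              noise_var: float = 0.0) -> float:
    """Expected correlation between ratings on two dimensions.

    Hypothetical model: rating_d = true_d + halo + noise_d, where
    true scores have unit variance and correlate true_corr across
    dimensions, halo has variance halo_var and is shared by all
    dimensions, and noise_d has variance noise_var. All components
    are assumed mutually independent.
    """
    shared = true_corr + halo_var           # covariance between two ratings
    total = 1.0 + halo_var + noise_var      # variance of a single rating
    return shared / total


# Rater A: no halo; the dimensions really are correlated .50.
rater_a = expected_intercorrelation(true_corr=0.5)

# Rater B: the dimensions are unrelated; all covariation is halo.
rater_b = expected_intercorrelation(true_corr=0.0, halo_var=1.0)

print(rater_a, rater_b)
```

Both raters yield an intercorrelation of .50, so the observed value alone cannot distinguish a rater who is accurate from one whose ratings are driven entirely by a general impression.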
Perhaps the most important critique of halo error measures as criteria is the paucity of evidence that ratings that do not exhibit what we judge to be halo, leniency, or the like are better measures of job performance than ratings that are suffused with “rater errors.” Earlier, we noted that one popular form of training instructed raters not to make halo errors, not to be lenient, and so forth. It is not clear that removing halo, leniency, and the like makes performance ratings better measures of performance, and it is even plausible that efforts to suppress these so-called errors make performance ratings worse (Murphy, 1982; Murphy, Jako, & Anhalt, 1993).
Borman (1977) argued that under some circumstances, it might be possible to assess the accuracy of performance ratings. The prototypical rating accuracy study would collect ratings under standardized conditions in which all raters viewed the same performance (e.g., using videotaped vignettes) and in which the pooled judgments of multiple experts could be used as a criterion for evaluating accuracy. There is a vibrant literature dealing with the measurement and interpretation of rating accuracy (e.g., Murphy & Balzer, 1989; Sulsky & Balzer, 1988), and in general, this literature leads to three conclusions: (a) Rater error measures are virtually useless as indices of rating accuracy; (b) there are multiple ways of defining accuracy in rating, and the different accuracy indices are often essentially unrelated to one another; and (c) outside of tightly controlled experimental settings, it is prohibitively difficult to measure accuracy directly in the field.
In sum, there are two classes of measures we can use in field settings to evaluate ratings: rater agreement measures and rater error measures. Rater agreement measures tell a sorry story, but it is one whose implications are not fully understood (Murphy & DeShon, 2000; Viswesvaran et al., 1996), whereas rater error measures have no clear value as criteria.
Contextual Effects on Ratings
Although the most widely cited conclusion of Landy and Farr's (1980) review of performance appraisal research was that rating scales probably do not have a large effect on the quality of rating data, perhaps a more important conclusion was that there are a number of variables other than job performance that have a clear influence on performance ratings (see also Murphy, 2008; Scullen, Mount, & Judge, 2003). For example, a number of authors (e.g., Grey & Kipnis, 1976; Murphy & Cleveland, 1991, 1995) have suggested that contextual factors, ranging from broad societal factors (e.g., the state of the economy and the labor market) to the organizational context in which ratings are collected (e.g., the climate and culture of the organization), are likely to affect performance ratings. Other authors (e.g., Aguinis & Pierce, 2008; Ferris, Munyon, Basik, & Buckley, 2008; Judge & Ferris, 1993) have noted the importance of the social context of rating such as social power, influence, leadership, trust, social exchange, group dynamics, negotiation, communication, and other issues.
One of the nonperformance determinants of performance ratings that has been most extensively studied can best be described as the political use of rating (Cleveland & Murphy, 1992; Longenecker, Sims, & Gioia, 1987; Murphy & Cleveland, 1995; Tziner & Murphy, 1999; Tziner, Prince, & Murphy, 1997). Research on the political aspects of performance appraisal suggests that raters pursue a variety of goals when completing performance appraisals and that these goals substantially influence the ratings they give (Murphy, Cleveland, Skattebo, & Kinney, 2004). Murphy and Cleveland (1995) suggest that these goals include (a) task performance goals: using performance ratings to influence the subsequent performance of ratees; (b) interpersonal goals: using appraisal to maintain or improve interpersonal relations between the supervisor and the subordinates; (c) strategic goals: using appraisal to increase the supervisor's and/or the workgroup's standing in the organization; and (d) internalized goals: goals that are the results of raters’ beliefs about how they should evaluate performance (e.g., some raters want to convey the image of being a hard case, with high performance standards, like the professor who announces at the beginning of a class that he or she rarely gives “A” grades). On the whole, the goals raters pursue tend to push those raters in the direction of the “Lake Wobegon Effect” (i.e., giving high ratings to just about all employees). High ratings will maximize rewards (thereby potentially maximizing motivation), will preserve good relationships between supervisors and subordinates, and will make the supervisor look good (a supervisor whose subordinates are performing poorly will seem ineffective). In most organizations, there are few concrete rewards for giving accurate ratings and few sanctions for giving inflated ratings.
In our experience working with performance appraisal in real-world settings, a number of reasons are often cited by supervisors or managers for systematically manipulating the ratings they give. These are summarized in Table 1.
Ratings are used for many purposes in organizations (e.g., to determine promotions and salary increases, to evaluate developmental needs, to serve as criteria for validating tests and assessments, to provide documentation for legal purposes), and these purposes often come into conflict (Cleveland, Murphy, & Williams, 1989; Murphy & Cleveland, 1995). Fifty years ago, Meyer, Kay, and French (1965) noted the incompatibility of using performance appraisal simultaneously to make decisions about rewards and to provide useful feedback and suggested that the different uses of performance appraisal should be separated by creating different evaluative mechanisms for rewards versus feedback. Murphy and Cleveland (1995) suggested that uses of performance appraisal to highlight differences between people (e.g., salary, promotion, validation) are fundamentally at odds with uses of appraisal to highlight differences within persons (e.g., identifying developmental strengths and weaknesses).
Cleveland et al.’s (1989) survey suggested that most organizations used performance appraisals for multiple conflicting purposes and that this attempt to satisfy incompatible goals with performance appraisal was one source for the continuing evidence of dissatisfaction with performance appraisal in organizations. Some goals might be compatible. For example, use of performance appraisal for salary administration is not necessarily incompatible with using performance appraisal as a tool for validating selection instruments (both uses focus on differences between ratees). However, even when there is no underlying conflict among uses, the particular dynamics of one use of appraisals (e.g., for determining raises) might limit the value of the appraisal process for other uses (e.g., the range restriction that is likely when most ratees receive high ratings might limit the value of appraisals as criteria for validating selection tests).
Feedback Is Not Accepted or Acted On
One of the core assumptions of virtually all performance management programs is that if people receive feedback about their performance, they will be both motivated and empowered to improve (assuming that appropriate organizational supports and incentives are in place to support this improvement). This assumption does not hold up well, at least in the context of performance appraisal. There is evidence that employees dislike giving or receiving performance feedback (Cleveland, Murphy, & Lim, 2007). The performance feedback they receive is often inconsistent and unreliable (Murphy et al., 2001). Even if feedback is reasonably accurate, the effects of feedback are inconsistent, sometimes improving subsequent performance, sometimes making things worse, and sometimes having little discernible effect (Kluger & DeNisi, 1996).
Accurate feedback requires accurate evaluations of performance, and there is little good reason to believe that performance appraisals in organizational settings provide this sort of accuracy. However, even if performance feedback were accurate, there is little reason to believe that it would be consistently accepted and acted on. It is well known that people's self-ratings of performance are consistently higher than ratings from supervisors or subordinates (Harris & Schaubroeck, 1988; Heneman, 1974).
Differences in self-ratings versus others’ ratings are not a shortcoming of performance appraisal per se but rather a reflection of broadly relevant processes in the way we understand our own behavior versus the behavior of others. For example, there is a fundamental attribution error (Ross, 1977) that makes us likely to attribute our own successes to internal factors (e.g., skill, effort) and our own failures to external ones (e.g., luck, lack of opportunity) but to make the opposite attributions when others succeed or fail. In other words, we are likely to take credit when we succeed and to avoid blame when we fail but to see others’ success and failure in starkly different terms, making it likely that we will view our own performance favorably even when we have in fact performed poorly. As a result, feedback that is unbiased and accurate will sometimes seem harsh and unfair.
Performance appraisal and performance management systems are often built on assumptions that may not be warranted. The first assumption is that employees want to be rated. Managers assume employees want to “know where they stand,” and they assume this means employees want some form of rating or relative comparison. This makes sense. People compare themselves with others. This is the basis for social comparison, equity, and gaming theory (see Adams, 1965; Bryson & Read, 2009; Festinger, 1954). However, this doesn't mean that people want to be compared with others by third parties. It is also true that people generally think they are above average (Meyer, 1980), so most feel they will benefit from these ratings and comparisons. Although employees may say they want to be rated, their actions suggest otherwise. Barankay (2011) showed that a majority (75%) of people say they want to know how they are rated (or ranked in this case), but when given the opportunity to choose between a job where their performance was rated against others and one where it was not, most chose the work without ratings.
The second assumption is that performance ratings motivate employees—those getting poor ratings are motivated to improve, and those getting top ratings are motivated to keep them. This assumption comes from labor economics and tournament theory (see Connelly, Tihanyi, Crook, & Gangloff, 2014). Research suggests this assumption does not always hold up. Although some research finds ratings improve motivation and performance (see Blanes i Vidal & Nossol, 2009), other research suggests they can hurt both motivation and performance, especially in collaborative work environments and when ratings are tied to differential rewards (see, e.g., Casas-Arce & Martinez-Jerez, 2009). On balance, ratings seem to hurt more than they help.
Performance feedback represents the ultimate lose–lose scenario. It is extremely difficult to do well, and if it were done well, the recipients would be likely to dismiss their feedback as inaccurate and unfair. It is no wonder that supervisors and subordinates alike approach performance appraisal with a mix of skepticism and unease.
Weak Research–Practice Relationships
Decades of research and hundreds of articles, chapters, and books dealing with performance appraisal seem to have little impact on the way performance appraisal is done or the way information from a performance appraisal is used in organizations (Banks & Murphy, 1985; Ilgen, Barnes-Farrell, & McKellin, 1993). Banks and Murphy (1985) suggest that this is due in part to a lack of fit between the questions that are of greatest interest to researchers and the problems practitioners need to solve. For example, a significant portion of the performance appraisal studies published in the 1980s and 1990s were concerned with cognitive processes in appraisal (e.g., attention allocation, recognition, and recall of performance-related information; DeNisi, 2006). For psychologists, these are very important and interesting questions, but they are not questions that are necessarily relevant to the practical application of performance appraisal in organizations. It is worth noting that the lack of a solid research–practice relationship is not solely the result of academic disengagement from the realities of the workplace. Murphy and Cleveland (1995) presented a list of practical applications and research-based suggestions for improving performance appraisal, and there is little evidence that these were ever taken up and applied by practitioners.
One potential explanation for the failure of performance appraisal research to address the problems of performance appraisal is that these problems may be more intractable than we want to admit. Several researchers (e.g., Banks & Murphy, 1985; Murphy & Cleveland, 1995) have suggested that the shortcomings of performance appraisal are not an indication that raters cannot effectively evaluate job performance but rather an indication that raters and more generally organizations do not want to solve the problem. For example, organizations do not appear to do anything to reward supervisors who do a good job with performance appraisal or to sanction those who do a bad job. Accuracy in performance appraisal is likely to make subordinates unhappy and demotivated, and it has the potential to make subordinates look bad. Indeed, if a manager's job is to help get the most out of his or her subordinates, a case can be made that performance appraisal (even at its best) interferes with that job and that a smart manager will do all he or she can to subvert and ignore the appraisal system (Murphy & Cleveland, 1995).
Institutional theory provides some insight into the difficulty of changing performance appraisal and performance management practices in organizations by means of academic research. Performance management and rating practices get embedded and become hard to change. These practices get institutionalized in organizations, and companies copy and adopt them because they are “the standard.” Companies also imitate and copy them in order to achieve legitimacy (Pfeffer, 2007; Scott, 1995). Social relationships created among organizational members (especially senior leaders who are members of each other's boards) also contribute to this conformity and imitation behavior. General Electric's (and Jack Welch's) adoption of forced ranking systems in the 1990s is a good example.
Additional Negative Consequences of Ratings
Finally, there are several additional potential drawbacks of ratings that were not mentioned above. First, ratings sometimes bring us unpleasant news about diversity differences (see meta-analysis of race differences by McKay & McDaniel, 2006). Second, they are the focus of many lawsuits on topics such as promotion, compensation, and terminations. Third, they are often not helpful when you really need them, such as in terminations and downsizing, when you find they have not sufficiently documented poor performance or have too much skew to be useful for decision making.
Get Rid of Ratings but Tread Carefully
Most companies stay with ratings because they don't know what else to do and because ratings are inputs to so many other processes, especially rewards. How will organizations distribute rewards and make promotion and termination decisions without performance ratings? Rest assured these are all solvable problems. These decisions can all be made without performance ratings and will probably be better without them. However, it is important to realize there are also some significant philosophical questions wrapped up in a decision to abandon performance ratings that are beyond the scope of this article. Most organizations have pay-for-performance philosophies and see themselves as meritocracies. In most companies, performance ratings are the primary way of determining merit. Well over 90% of companies tie pay to performance to at least some extent, and 82% link individual performance to compensation (Mercer, 2013). How will organizations determine merit without ratings? How do organizations design performance-based rewards systems without individual performance ratings? The answer is they don't—at least not individually based reward systems. Companies should not abandon performance ratings without wrestling with these issues. There are effective solutions to these problems as well, but we will save these arguments for another day (or article).
Why Getting Rid of Performance Ratings Is a Bad Idea (Adler, Campion, & Grubb)
Performance evaluation is a complex, difficult, and often unpleasant part of management but is essential in some form if an organization wants to be a meritocracy. In this section, we argue that eliminating performance ratings is probably not the solution for meaningfully addressing the widespread dissatisfaction with performance management. We start by clarifying the point of the debate. Then we argue that performance is always rated in some manner, and being difficult to do is not an excuse. We then summarize the merits of ratings (no pun intended) and the value of differentiated evaluations. We suggest that artificial tradeoffs are driving organizations to abandon ratings. We then describe the alternatives to performance ratings, most of which are even less pleasant. Finally, we provide some suggestions for the improvement rather than the abandonment of performance ratings.
Performance Ratings ≠ Performance Management
Let's start with a clarifying observation on the terms of this debate because some of the issues described above have ranged outside the scope. When addressing the question of using or discarding performance ratings within the larger context of performance management and all that ails it, it may stand out as odd (and possibly unproductive; see Landy & Farr, 1980) to have the conversation focused on just a single element of performance management: the use of performance ratings. There are a myriad of complex and interwoven issues that need untangling and restringing when it comes to the broader and more significant task of effectively managing an employee's performance (e.g., Banks & Murphy, 1985; Curtis, Harvey, & Ravden, 2005). But instead, our field seems to have focused on perhaps the least meaningful element of performance management but one that is most comfortable for I-O psychologists—measurement, also known as the ratings. A prime illustration of this disproportionate focus is the extensive summary of the 75-year history of empirical literature on ratings presented above by the “antiratings” side.
To be sure, HR professionals, leaders, and I-O psychologists all widely agree that how performance management—including the assignment of performance ratings—is practiced within the business environment is badly in need of improvement. However, we also need to recognize that, fundamentally, the goal of an organization is to perform well and that this goal necessarily requires the individuals within the organization to perform well. The evaluation of individual performance then becomes an indispensable component of measuring and achieving organizational success. To us, this logic, however painful, is inescapable.
Let's be clear, then, about the center of the debate. This is not about whether there is value in core aspects of performance management (e.g., goals, regular constructive feedback, development focus, etc.). We all agree with that. The issue is whether we should continue to attempt to collect a psychometrically meaningful quantitative performance index of some sort.
Performance Is Always Evaluated
Whether called ratings, standards, or judgments, leaders and employees make decisions and evaluations about the performance, skills, attributes, and assets of those around them all the time. Judgments that employees “don't have it,” “have it,” or “knock it out of the park” are made every day in organizations. Dozens of conceptualizations of what can and should be measured within an individual performance system have been put forth in the literature and in practice (five-, four-, three-, two-point rating scales; competency based, production based, objective based, task based, behavior based, etc.; e.g., Benson, Buckley, & Hall, 1988; Bommer, Johnson, Rich, Podsakoff, & MacKenzie, 1995; Borman, 1979; Catano, Darr, & Campbell, 2007; Fay & Latham, 1982; Goffin, Jelley, Powell, & Johnston, 2009; Jelley & Goffin, 2001; Johnson, 2001; Latham, Fay, & Saari, 1979; Thompson & Thompson, 1985; Tziner & Kopelman, 2002). Irrespective of the manner in which evaluations happen, judgments are made and decisions about rewards and roles in the future are based on those judgments. For corporate success, these judgments should be grounded in something tangible and documented and, ideally, something that furthers the performance of the organization as a whole. The organization needs to have a common language, common perceptions of what is important, and a starting point of where the population is in different skills, achievements, and attributes in order to manage individual and collective performance forward. If there is no starting point metric, how can an organization gauge its progress?
These judgments, standards, and evaluations are all essentially what we mean by the term “ratings,” whether or not they are labeled as such.
For those organizations that promote a “development culture” and feel they need to move away from using ratings to do so, we remind them that judgments of where people are in their professional growth are still being made. Development conversations start with a picture of where employees are now, where they are going, and the actions needed to close that delta. The “where they are now” is a judgment on some evaluation scale. Although assigning or being assigned a “rating” may be uncomfortable for some (Rock, Davis, & Jones, 2014), the science of behavioral change based on control theory also tells us that some discomfort, some feedback on the gap between the “now” state and the desired future state is required to stimulate behavior change and development (Carver, Sutton, & Scheier, 2000). Indeed, in organizations with a genuine development culture, detailed evaluative ratings feedback is not the threatening beast characterized in the recent popular press. In organizations with a genuine developmental culture, employees—and especially high-potential employees and particularly Millennials—manage their own development and proactively seek out feedback on areas of growth and improvement. Those areas can only be identified through the measurement of performance evaluations against established standards, a process otherwise known as rating.
Reflecting the ubiquity and inescapability of evaluative judgments of performance, we should recognize that many—perhaps most—organizations that have trumpeted their “daring” elimination of ratings have actually done no such thing. Many continue to have managers enter a number into the performance management system to represent a recommended percentage increase in salary or a recommended percent-of-target-bonus award. These entries are intended to quantitatively reflect prior year performance—we would call that activity “rating.” Other organizations claiming to have eliminated ratings continue to use some sort of traditional rating scale but have only eliminated imposing a forced distribution due to employee reactions (Blume, Baldwin, & Rubin, 2009; Chattopadhayay & Ghosh, 2012; Schleicher, Bull, & Green, 2008). Others have simplified their rating process, typically reducing the number of dimensions rated (e.g., fewer individual goals rated, a one-item global rating of the “how” of performance in lieu of rating a long list of individual competencies). Still others have simply changed the wording of the dimensions rated. For example, some organizations have substituted the term “overall impact” for “overall performance.” One more creative example is Deloitte's recent adoption of a three-item scale (a fourth item measures promotion potential) instead of more traditional performance rating formats (Buckingham & Goodall, 2015). A sample item is “Given what I know of this person's performance, I would always want him or her on my team.” Many of these organizations have made laudable improvements in aspects of performance management—but they surely did not “get rid of ratings.”
“Too Hard” Is No Excuse for I-O Psychology
Many in the field seem to have turned the conversation about performance management into a lament that “performance is too hard to measure,” which therefore leads to the suggestion that we abandon the science and practice of defining and measuring what “performing well” means. To this we say, differential psychology is, and always has been, a core—arguably, the core—discipline of our field (Chamorro-Premuzic, 2011). Many (most?) I-O psychologists spend their careers attempting to quantify how people differ in attributes as varied as knowledge, skill, engagement, safety orientation, narcissism, and—yes—job performance. We have a well-developed set of psychometric principles and tools that allow us to evaluate and improve the reliability and validity of our measures.
Admittedly, job performance is a complex construct (Campbell, McCloy, Oppler, & Sager, 1993). Its definition varies across organizational contexts and roles. There are almost as many ways that job performance has been measured as there are jobs (Aguinis, 2013). When assessed, the psychometric characteristics of measures of job performance range widely, with most measures showing weak-to-moderate reliability and construct-oriented validity; this is well documented in the prior section of this article. But a history of measurement challenges is not an excuse for abandoning the differential psychology of performance for the “armchair” assignment of idiosyncratic, vague qualitative descriptors of performance. Giving up on the effort to improve the differential measurement of performance should simply not be an option for either I-O scientists or practitioners. Note also that for many roles across a large segment of the workforce, including for those in government and other organizations, there is actually no real option of abandoning ratings, whatever their flaws; ratings are mandated by law or regulation (e.g., civil service merit-principle requirements) or traditional practice (e.g., think teacher evaluations). It is therefore critical that our field continues to explore supportable ways to rate individual performance.
What Are the Merits of Ratings?
Are there any merits to having a psychometric index of job performance? Although not perfect, there is evidence supporting the reliability and validity of performance ratings. For example, meta-analytic research shows at least modest reliability for performance ratings (Rothstein, 1990; Viswesvaran et al., 1996); if ratings can be pooled across many similarly situated raters, it should be possible to obtain quite reliable assessments. Despite the current debate, there is probably general agreement among I-O psychologists that performance ratings, if done well, share at least some meaningful true score variance with actual job performance. If collected for research purposes, performance ratings have been shown to be a useful criterion in thousands of validation studies of selection procedures in the history of our profession (Schmidt & Hunter, 1998) and are recommended in our testing principles (Society for Industrial and Organizational Psychology, 2003).
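The claim that pooling ratings across raters should yield more reliable assessments can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result that the text above does not itself cite. The sketch below is only an illustration under stated assumptions: the function name `spearman_brown` is hypothetical, the single-rater reliability of 0.50 is an illustrative value rather than a figure taken from the studies cited, and the formula assumes parallel raters (equal reliability, uncorrelated errors), a condition that real supervisors and peers seldom fully meet.

```python
def spearman_brown(single_rater_reliability: float, n_raters: int) -> float:
    """Projected reliability of the mean of n_raters parallel ratings.

    Assumes all raters are equally reliable and their errors are uncorrelated.
    """
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Illustrative assumption: one rater's ratings have reliability 0.50.
for k in (1, 2, 4, 8):
    print(k, round(spearman_brown(0.50, k), 2))
# prints:
# 1 0.5
# 2 0.67
# 4 0.8
# 8 0.89
```

Under these assumptions, most of the gain comes from the first few additional raters. To the extent that raters share biases, the actual gain from pooling would be smaller than the formula projects.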
Applied psychometrics is what I-O psychologists are good at, and it brings great value in other domains. As examples, consider how well it helps us measure the knowledge and skills of candidates in personnel selection or measure the attitudes and beliefs of employees in engagement surveys. Would we really be better off not to at least attempt to generate a psychometric index to add to the performance management system? Would we really be better off with just unreliable qualitative data?
Performance rating that depends on the subjective judgments of one or more raters is not necessarily the preferred method of evaluating job performance; it is the only feasible method for attacking this problem. Objective measures (e.g., absenteeism, production counts) are almost always deficient criteria (Landy & Farr, 1980). Frequent informal feedback, something that is often suggested by proponents of performance management (e.g., Aguinis, 2013), would be difficult to validate in any systematic way. Establishing the job relatedness of informal feedback would be difficult, and establishing the fairness and freedom from bias of an informal evaluation system would be even more daunting. Like it or not, performance rating by supervisors or by some other source that is knowledgeable about the job (e.g., peers) remains our best bet for obtaining useful measures of performance.
The Value of Differentiated Evaluations
Research has consistently shown that differentiating rewards and recognition based on quantified performance differences actually contributes to enhanced individual and organizational performance. Four closely related lines of research have consistently shown the following:
• Strong performers are differentially attracted to organizations that recognize individual contributions (e.g., Menefee & Murphy, 2004). For instance, Trank, Rynes, and Bretz (2002) demonstrated that high achievers have a stronger preference for organizations that apply individual rather than group pay systems, emphasize praise and recognition for individual accomplishment, and employ fast-track promotion practices for top performers. Not surprisingly, strong performers consistently choose organizations that have a reputation for pay-for-performance systems over fixed salary compensation systems (e.g., Cadsby, Song, & Tapon, 2007).
• Strong performers will leave organizations that do not differentially reward strong performance (e.g., Allen & Griffeth, 2001). As an example, Trevor, Gerhart, and Boudreau (1997) showed that low salary growth is related to high turnover among top performers, relative to low performers.
• Weak performers will leave organizations that strongly reward high performance (e.g., Lazear, 1986). Corresponding to the finding regarding low salary growth, Trevor, Gerhart, and Boudreau (1997) showed that high salary growth is related to higher turnover among low performers but to low turnover among top performers.
• Weak performers are more likely to stay with an employer when pay-for-performance differentiation is weak (e.g., Harrison, Virick, & William, 1996). For example, Williams and Livingstone (1994) showed that the relationship between performance and turnover is significantly weaker when rewards are not linked to performance than when they are linked to performance.
Recent CEB Corporate Leadership Council (2014) research, based on a sample of over 10,000, showed that Millennials are particularly motivated by relative, rather than absolute, performance feedback, compared with the broader employee population. This research shows that, in contrast to the myth that characterizes Millennials as more collaborative, they are actually more competitive and have a strong desire for performance evaluation systems that provide clear differentiation from peers. One conclusion, then, is that the importance of performance differentiation may indeed grow as Millennials become the core segment of the working population.
Artificial Tradeoffs Are Driving Organizations To Abandon Ratings
If the benefits of measuring differences in performance and rewarding differentially based on those performance differences are so clear, why have some small number of organizations abandoned the attempt to measure individual differences in performance? We believe this recent trend (if there is one) stems at least in part from a set of falsely dichotomized choices:
Quantification versus humanization
There is a belief that attaching a rating to the quality of an individual's job performance somehow dehumanizes that individual, reducing him or her to a number. That indeed is a risk. How often have we heard employees referred to as “he's a 2” or “she's a 4” instead of the accurate and appropriate “her performance last year was a 4; it exceeded the established goals and expectations”? Quantification can be dehumanizing but does not have to be. Managers need to learn and apply the language of effective performance management; indeed, the impact of feedback on performance improvement is strongest when that feedback is focused on performance and on partnering with the employee on the means of enhancing performance rather than on labeling performance as an inherent quality of the performer. The fact that rating labels are misused is not a reason to avoid ratings. There is a large research literature in our field on how to improve reactions to performance evaluation (e.g., Bobko & Colella, 1994; Cawley, Keeping, & Levy, 1998; Dipboye & Pontbriand, 1981; Dorfman, Stephan, & Loveland, 1986; Giles & Mossholder, 1990; Ilgen, Peterson, Martin, & Boeschen, 1981; Pichler, 2012; Silverman & Wexley, 1984).
Bad performance management versus no performance ratings
In many—perhaps even most—organizations, performance management practices are not well-implemented, as both sides of this debate concede. As argued by the antiratings side above, appraisal forms are often overengineered in an attempt to satisfy multiple organizational objectives (e.g., rewards, developmental planning, succession, legal and regulatory compliance, etc.). Managers are neither selected nor adequately trained to manage performance in many instances (Murphy, 2008). Rating guidelines are often confusing, poorly understood, and inconsistently applied across raters and, within raters, across ratees. Organizations at times require ratings to fit distribution curves, which may help reduce leniency (Scullen, Bergey, & Aiman-Smith, 2005) but may not represent the true distribution of performance (Aguinis & O'Boyle, 2014, but cf. Beck, Beatty, & Sackett, 2014). All these flaws can be and in many organizations are being addressed (see the example of Cargill in Pulakos et al., 2015). There is no reason to add to bad performance management practice by eliminating a core area of managerial accountability.
Ratings versus conversations
The value of active, ongoing, constructive feedback, performance strategy, and coaching conversations between managers and employees has been long recognized (Meyer et al., 1965). No one disputes that. However, there is no reason to believe that eliminating systematic, contextually bounded, and documented performance evaluations will result in an increase in the frequency or quality of those conversations. Where is the evidence that the only reason managers do not conduct more frequent, more engaging, and more impactful conversations is that they are too busy filling out appraisal forms? Opponents of ratings cite a recent Korn Ferry study that indicated, across companies, the competency consistently rated lowest (67 of 67 on the Lominger list!) is growing talent (Rock et al., 2014). It would seem to be naïve to think that, relieved of the burden of ratings and without the “crutch” of a structured feedback tool, managers will somehow overcome this weakness and consistently engage in positive and impactful conversations. Indeed, one important mechanism for assuring that quality ongoing performance conversations occur—and acknowledging Murphy's (2008) argument that it is more a matter of manager motivation than skill—is by setting goals as well as evaluating and rewarding managers for how effectively they manage the performance of their own subordinates.
Feedback versus motivation
Feedback—if provided destructively—can indeed be a demotivator (Rock, 2008). Poorly communicated feedback is an aspect of poor leader performance. Effective feedback is motivating. Research has shown that feedback can stimulate the development of more effective performance strategies. Brockner (1988) has shown that people differ in behavioral plasticity and that, although some do have difficulty internalizing feedback, others demonstrate meaningful behavioral change when provided with feedback. Further, opponents of ratings are fond of citing Dweck's (2006) mindset work in support of eliminating ratings. However, that research actually shows that it is not performance feedback per se that is the issue; it is the interpretation of that feedback that is key. Those with a learning mindset benefit from performance feedback; it is just that they use that feedback constructively to improve their skills and ultimately their performance. Effective performance management requires the constructive communication of specific, actionable feedback, not the avoidance of feedback.
In other words, we are arguing that organizations can have it all—if they apply our science and effective performance management practice. Managers can quantify individual differences in performance and treat their employees as partners in performance management. They can design administratively simple but impactful performance management practices and train managers to apply those practices to drive better individual and collective performance. They can learn to validly assess performance and, based on those assessments, engage employees in frequent, constructive dialogues in ways that are empowering, motivating, and inspiring to improve performance and employee engagement.
What Are the Alternatives to Performance Ratings?
Let us consider the consequences of living without ratings. Consider the potential answers to questions such as the following:
• Will performance measurement be more precise without a rating? This seems unlikely because the only measurement will be narrative data, which are inherently less reliable.
• Will merit-based management be improved? With no measure of merit other than narrative comments, it is hard to imagine how merit-based management, like pay increases and promotions, would be improved.
• Will employees be more motivated to work hard? Antagonists of ratings argue that eliminating ratings will indeed be more motivating, but that is as yet an untested hypothesis. It is certainly true that most employees prefer not to be rated (Cleveland et al., 2007), but achieving higher ratings (and avoiding low ones) is very motivational.
• Will lawsuits be easier to defend? Without any measure of performance, it would likely be more difficult to defend HR decisions based on performance. However, the existence of diversity differences in performance ratings is sometimes used in an attempt to establish a claim of discrimination. In addition, if the organization does not use the performance ratings consistently and fairly to determine HR decisions, this might create more of a liability than a defense. So the answer to this question may be uncertain.
• How would we make compensation decisions? Across-the-board increases serve little value in motivating higher performance. Step increases based on time in grade are similarly limited. Standard of living increases are the least desirable way to allocate pay increases because they imply that increases are somehow required of the organization to keep pace with the cost of living and not related to individual performance. Organizations can give increases only when there is a change of job or responsibilities. This is not uncommon, especially in small organizations, but it is not as useful for improving performance because of the limited availability of higher level jobs. Finally, but by no means the last possibility, management could use subjective judgment based on the narrative-only performance appraisal. This would maintain the appearance of a link to performance but would likely be difficult to justify to employees getting smaller pay increases.
• How would we make promotion decisions? We could rely on selection tools and not past performance. However, past performance in the same company in similar jobs is often considered to be an excellent predictor of performance after promotion (Beehr, Taber, & Walsh, 1980; DeVaro, 2006), and employees have a strong expectation that past performance should be a factor considered in promotion (De Pater, Van Vianen, Bechtoldt, & Klehe, 2009). We could just use seniority, as most union contracts require. This may be viewed as fair by some employees, especially in environments with low trust in management, but it does not ensure that the highest performers or the most skilled are promoted. We could rely on succession management systems, which refer to management planning discussions of promotion potential, but the results of these are usually not shared with employees and thus do not serve the purposes of motivating performance or ensuring fairness perceptions. We could use promotion panels to read the narrative appraisals and recommend promotions. This is common in some settings, especially in the public sector, and has many advantages, such as considering a wide range of inputs, multiple decision makers, and a consistent process, but this is very costly in terms of management time. Finally, to complete this nonexhaustive list, we could qualitatively consider past performance based on the narrative-only appraisals. Again, this would maintain the appearance of a link to performance but, without any metric of performance, would likely be difficult to justify to the employees who are not promoted.
The Path Forward
Perhaps a better question than whether to get rid of ratings is this: How could performance ratings be improved? Many choices of rating scales and expected distributions of ratings might help, depending on the context and culture. A recent trend with considerable merit is the formal use of “calibration,” which refers to meetings among managers for the purpose of justifying and comparing the proposed performance ratings of employees to ensure their accuracy and comparability across business units (Catano et al., 2007; Church, 1995; DeNisi & Pritchard, 2006). Another increasing trend is competency modeling, which involves (in part) the explicit establishment of behavioral descriptions of performance expectations, at each rating level, for important job competencies (M. A. Campion et al., 2011; Catano et al., 2007; Halim, 2011). Yet another trend worth considering is the use of 360 feedback as an input to performance management (Bracken & Rose, 2011; M. C. Campion, Campion, & Campion, 2015; Gilliland & Langdon, 1998; Latham, Almost, Mann, & Moore, 2005; Reilly & McGourty, 1998). Multirater 360 feedback in the form of either formal ratings or informal input may be especially relevant as work becomes more team based, less hierarchical, and more customer focused, as other changes result in additional parties having performance information on an employee, and as work arrangements change (e.g., remote work) such that the manager has less of an opportunity to observe the work of the employee. Such feedback has many potential advantages, such as increased reliability, reduced bias, and reduced leniency (e.g., Aguinis, Gottfredson, & Joo, 2013; Antonioni, 1994; Flint, 1999; London & Wohlers, 1991; Smither, London, & Reilly, 2005). Reducing the cognitive demands on managers and, through simplification of the process and forms, the motivational barriers they face is also likely to be helpful (Efron & Ort, 2010). Finally, performance review panels might be used, which would have advantages and disadvantages similar to those of promotion panels (Catano et al., 2007; Church, 1995; Gilliland & Langdon, 1998; Kozlowski, Chao, & Morrison, 1998; Vance, Winne, & Wright, 1983; Werner & Bolino, 1997).
In our view, it is the consequences of leaders assigning and communicating evaluative judgments expressed as ratings that matter the most, not the ratings themselves. Leaders are the front line of performance management. The stronger the promotion process to select leaders who value development, and the stronger the formal and informal support for leaders to help them better develop their people, the stronger the performance management will be within the organization. The more employees are rewarded—formally and informally—for evaluating performance diligently and (on the employee side) for actively seeking and incorporating feedback, the more performance management will strongly contribute to organizational performance. The potential impact of I-O psychology on performance management practice is not limited to improved measurement. We have the capacity for enhancing the full lifecycle of performance-relevant behaviors and informing organizations how to build their performance culture. The full gamut of interconnected human capital activities can reinforce that culture: leadership development, employee development, selection and promotion, and rewards and compensation. Therefore, let's stop engaging in the self-deceptive thinking that getting rid of ratings is the key to improving performance management.
Moreover, before we toss out ratings as the villain, we should ask ourselves the broader and much more important question: Are we conducting the entire performance management process properly? Our perspective is that the ratings issue is just representative of deeper problems: a performance management process that is poorly designed and implemented. Our profession has accumulated extensive advice on how to design and implement performance management (e.g., Posthuma & Campion, 2008; Posthuma, Campion, & Campion, 2015), often summarized as “best practices,” which if followed correctly may well moot the discussion about ratings.
In the end, the narrow ratings issue may be a dilemma—a choice between undesirable alternatives. However, that is not new to I-O psychologists. In fact, helping make these hard choices is what we do. Consider the fact that turned-down candidates are not happy and would prefer to do away with our tests and selection systems. Nonpromoted employees would prefer to get rid of our promotion procedures. Employees receiving smaller pay increases will disagree with whatever system was used. So should we abandon performance ratings because managers or employees do not like them and because they are difficult to do well? Will we be better off?
If managers and employees engaged in effective day-to-day performance management behavior as needed, in real time, there should be little if any need for formal performance management systems, including formal performance ratings. Some managers, in fact, do this and realize high team and individual performance as a result, in spite of the formal performance management system. However, not all or even most managers regularly engage in effective performance management behavior, especially given that the formal system often gets in the way and drives ineffective performance management behavior and reactions. Although organizations can abandon ratings and even their formal performance management systems entirely, many are not ready to consider such extreme steps. Furthermore, many organizations feel the need to maintain evaluations of record, for legal and other purposes. The merits of these positions have been argued above, but what remains is a question of greater practical importance: How can organizations make the right decisions about performance management reform, including the question of ratings, to best mitigate negative impact on effective performance management behavior and performance?
The concept of performance management is squarely aimed at helping individuals and organizations maximize their productivity through enabling employees to perform to their potential. To achieve this, performance needs to be managed with three critical goals in mind:
• Enable employees to align their efforts to the organization's goals.
• Provide guideposts to monitor behavior and results, and make real-time adjustments to maximize performance.
• Help employees remove barriers to success.
Each aspect of the performance management process should be designed to efficiently and directly impact one or more of these goals. Many organizations seeking to improve their performance management approaches want to start with questions such as “should we have ratings?” or “should we use a forced distribution, and if so what should the cutoff percentage be for the lowest rating?” These are the wrong questions to ask at the start. The better questions to start with are these: What are the critical outcomes we want to achieve, and how can we best ensure employees deliver against key goals and outcomes? Framed from this perspective, there is no right answer to the ratings question. It really is “It depends,” based on the organization's goals, strategies, maturity, trust, openness to change, management philosophy, and other contextual factors. Taking a broader and more strategic approach versus a narrow view focused simply on ratings will help keep the focus on what the performance management system needs to attain holistically. This will require answering key questions that matter most in deciding on the right performance management strategy for each situation. We close with those questions organizational members have found most useful for driving effective performance management reform.
• What business problem(s) are we trying to solve?
○ Drive organizational strategy?
○ Reduce unnecessary process and costs?
○ Increase engagement and performance?
○ Use formal evaluations to justify differential decisions?
• What do we want to evaluate and reward?
○ Individual or team performance or both?
○ Behaviors, outcomes, both, or other?
• How do we view our employees, and how much do we need to compete for talent?
• What proportion of an employee's total compensation do we put at risk?
• How much do we really rely on ratings for decision making?
○ How much do our ratings actually align with our decisions?
○ Do we need a number to make decisions? Why or why not?
• How mature and ready is our organization to remove formal performance management steps and process?
• How effective are we at the daily behaviors that drive performance?
• Are the answers the same for all units, jobs, and work in the organization?