
Construct Validity Evidence for Multisource Performance Ratings: Is Interrater Reliability Enough?

Published online by Cambridge University Press: 04 July 2016

Jisoo Ock*
Affiliation: Seoul, Republic of Korea
*Correspondence concerning this article should be addressed to Jisoo Ock, 355 Achasan-ro Gwangjin Acrotel A-1506, Seoul, Republic of Korea. E-mail: jisoo.ock@gmail.com

Extract

As organizations become decentralized and work becomes team based, organizations are adopting performance management practices that integrate employees' performance information from multiple perspectives (e.g., 360-degree performance ratings). In the focal article, both the argument for and the argument against the use of performance ratings treated rater agreement (or the lack thereof) as the central evidence for whether multisource ratings are a useful approach to performance appraisal. Arguing for the use of multisource ratings, Adler, Campion, and Grubb (Adler et al., 2016) pointed out that multisource ratings are advantageous because they increase the interrater reliability of the ratings. Although Adler and colleagues were not explicit about why this would be true, proponents of multisource ratings often cite the measurement theory assumption that increasing the number of raters will yield more reliable and valid scores to the extent that the ratings are correlated at all (Shrout & Fleiss, 1979). Arguing against the use of multisource performance ratings, Colquitt, Murphy, and Ollander-Krane (Adler et al., 2016) countered that because multisource ratings pool ratings from raters who differ systematically in their roles and perspectives on the target employee's performance, increasing the number of raters cannot be expected to resolve the low interrater agreement that is typically observed in performance ratings (Viswesvaran, Ones, & Schmidt, 1996).
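The measurement-theory logic behind the pro-rating position can be made concrete with the Spearman–Brown prophecy formula, on which the Shrout and Fleiss (1979) intraclass correlations for averaged ratings are built. As a minimal sketch, assuming k interchangeable raters whose ratings intercorrelate at \bar{r} on average, the reliability of their pooled (averaged) rating is

\rho_{kk} = \frac{k\,\bar{r}}{1 + (k - 1)\,\bar{r}}

Taking \bar{r} = .52, Viswesvaran et al.'s (1996) meta-analytic interrater reliability estimate for supervisory ratings, four interchangeable raters would yield \rho_{44} = (4 \times .52)/(1 + 3 \times .52) \approx .81. The gain, however, presumes interchangeability: when raters differ systematically by source, averaging across sources blends distinct perspectives rather than canceling random error, which is precisely the point of contention between the two positions.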

Type: Commentaries
Copyright © Society for Industrial and Organizational Psychology 2016


References

Adler, S., Campion, M., Colquitt, A., Grubb, A., Murphy, K., Ollander-Krane, R., & Pulakos, E. D. (2016). Getting rid of performance ratings: Genius or folly? A debate. Industrial and Organizational Psychology: Perspectives on Science and Practice, 9(2), 219–252.
Borman, W. C. (1974). The rating of individuals in organizations: An alternative approach. Organizational Behavior and Human Performance, 12, 105–124.
Borman, W. C. (1997). 360 ratings: An analysis of assumptions and research agenda for evaluating their validity. Human Resource Management Review, 7, 299–315.
Conway, J. M. (1996). Analysis and design of multitrait–multirater performance appraisal studies. Journal of Management, 22, 139–162.
Conway, J. M., & Huffcutt, A. I. (1997). Psychometric properties of multisource performance ratings: A meta-analysis of subordinate, supervisor, peer, and self-ratings. Human Performance, 10, 331–360.
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98–104.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
Hoffman, B. J., Lance, C. E., Bynum, B., & Gentry, W. A. (2010). Rater source effects are alive and well after all. Personnel Psychology, 63, 119–151.
Hoffman, B. J., & Woehr, D. J. (2009). Disentangling the meaning of multisource performance rating source and dimension factors. Personnel Psychology, 62, 735–765.
Lance, C. E., Hoffman, B. J., Gentry, W. A., & Baranik, L. E. (2008). Rater source factors represent important subcomponents of the criterion construct space, not rater bias. Human Resource Management Review, 18, 223–232.
Meng, X. L., Rosenthal, R., & Rubin, D. B. (1992). Comparing correlated correlation coefficients. Psychological Bulletin, 111, 172–175.
Mount, M. K., Judge, T. A., Scullen, S. E., Sytsma, M. R., & Hezlett, S. A. (1998). Trait, rater, and level effects in 360-degree performance ratings. Personnel Psychology, 51, 557–576.
Murphy, K. R. (2008). Explaining the weak relationship between job performance and ratings of job performance. Industrial and Organizational Psychology: Perspectives on Science and Practice, 1, 148–160.
Murphy, K. R., & DeShon, R. (2000). Interrater correlations do not estimate the reliability of job performance ratings. Personnel Psychology, 53, 873–900.
Podsakoff, P. M., MacKenzie, S. B., Lee, J.-Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88, 879–903.
Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26 research scenarios. Psychological Methods, 1, 199–223.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420–428.
Viswesvaran, C., Ones, D. S., & Schmidt, F. L. (1996). Comparative analysis of the reliability of job performance ratings. Journal of Applied Psychology, 81, 557–574.
Woehr, D. J., Sheehan, M. K., & Bennett, W. (2005). Assessing measurement equivalence across rating sources: A multitrait–multirater approach. Journal of Applied Psychology, 90, 592–600.