
Construct Validity Evidence for Multisource Performance Ratings: Is Interrater Reliability Enough?

Published online by Cambridge University Press: 04 July 2016

Jisoo Ock*
Affiliation: Seoul, Republic of Korea
*Correspondence concerning this article should be addressed to Jisoo Ock, 355 Achasan-ro Gwangjin Acrotel A-1506, Seoul, Republic of Korea. E-mail: jisoo.ock@gmail.com

Extract

As organizations become decentralized and work becomes team based, organizations are adopting performance management practices that integrate employees' performance information from multiple perspectives (e.g., 360-degree performance ratings). In the focal article, both the argument for and the argument against the use of performance ratings treated rater agreement (or the lack thereof) as the central evidence for whether multisource ratings are a useful approach to performance appraisal. Arguing for the use of multisource ratings, Adler, Campion, and Grubb (Adler et al., 2016) pointed out that multisource ratings are advantageous because they increase the interrater reliability of the ratings. Although Adler and colleagues were not explicit about why this would be true, proponents of multisource ratings often cite the measurement theory assumption that increasing the number of raters will yield more reliable and valid scores to the extent that the ratings are correlated at all (Shrout & Fleiss, 1979). Arguing against the use of multisource performance ratings, Colquitt, Murphy, and Ollander-Krane (Adler et al., 2016) countered that because multisource ratings pool ratings from raters who differ systematically in their roles and perspectives on the target employee's performance, increasing the number of raters cannot be expected to resolve the low interrater agreement that is typically observed in performance ratings (Viswesvaran, Ones, & Schmidt, 1996).
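The measurement-theory logic behind the pro-rating position can be made concrete with the Spearman–Brown prophecy formula, on which the Shrout and Fleiss (1979) intraclass correlations for averaged ratings are built. As a minimal sketch, assuming k interchangeable raters whose ratings intercorrelate at \bar{r} on average, the reliability of their pooled (averaged) rating is

\rho_{kk} = \frac{k\,\bar{r}}{1 + (k - 1)\,\bar{r}}

Taking \bar{r} = .52, Viswesvaran et al.'s (1996) meta-analytic interrater reliability estimate for supervisory ratings, four interchangeable raters would yield \rho_{44} = (4 \times .52)/(1 + 3 \times .52) \approx .81. The gain, however, presumes interchangeability: when raters differ systematically by source, averaging across sources blends distinct perspectives rather than canceling random error, which is precisely the point of contention between the two positions.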

Type: Commentaries
Copyright © Society for Industrial and Organizational Psychology 2016


References

Adler, S., Campion, M., Colquitt, A., Grubb, A., Murphy, K., Ollander-Krane, R., & Pulakos, E. D. (2016). Getting rid of performance ratings: Genius or folly? A debate. Industrial and Organizational Psychology: Perspectives on Science and Practice, 9(2), 219–252.
Borman, W. C. (1974). The rating of individuals in organizations: An alternative approach. Organizational Behavior and Human Performance, 12, 105–124.
Borman, W. C. (1997). 360 ratings: An analysis of assumptions and research agenda for evaluating their validity. Human Resource Management Review, 7, 299–315.
Conway, J. M. (1996). Analysis and design of multitrait–multirater performance appraisal studies. Journal of Management, 22, 139–162.
Conway, J. M., & Huffcutt, A. I. (1997). Psychometric properties of multisource performance ratings: A meta-analysis of subordinate, supervisor, peer, and self-ratings. Human Performance, 10, 331–360.
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98–104.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
Hoffman, B. J., Lance, C. E., Bynum, B., & Gentry, W. A. (2010). Rater source effects are alive and well after all. Personnel Psychology, 63, 119–151.
Hoffman, B. J., & Woehr, D. J. (2009). Disentangling the meaning of multisource performance rating source and dimension factors. Personnel Psychology, 62, 735–765.
Lance, C. E., Hoffman, B. J., Gentry, W. A., & Baranik, L. E. (2008). Rater source factors represent important subcomponents of the criterion construct space, not rater bias. Human Resource Management Review, 18, 223–232.
Meng, X. L., Rosenthal, R., & Rubin, D. B. (1992). Comparing correlated correlation coefficients. Psychological Bulletin, 111, 172–175.
Mount, M. K., Judge, T. A., Scullen, S. E., Sytsma, M. R., & Hezlett, S. A. (1998). Trait, rater, and level effects in 360-degree performance ratings. Personnel Psychology, 51, 557–576.
Murphy, K. R. (2008). Explaining the weak relationship between job performance and ratings of job performance. Industrial and Organizational Psychology: Perspectives on Science and Practice, 1, 148–160.
Murphy, K. R., & DeShon, R. (2000). Interrater correlations do not estimate the reliability of job performance ratings. Personnel Psychology, 53, 873–900.
Podsakoff, P. M., MacKenzie, S. B., Lee, J.-Y., & Podsakoff, N. P. (2003). Common method biases in behavioral research: A critical review of the literature and recommended remedies. Journal of Applied Psychology, 88, 879–903.
Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26 research scenarios. Psychological Methods, 1, 199–223.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86, 420–428.
Viswesvaran, C., Ones, D. S., & Schmidt, F. L. (1996). Comparative analysis of the reliability of job performance ratings. Journal of Applied Psychology, 81, 557–574.
Woehr, D. J., Sheehan, M. K., & Bennett, W. (2005). Assessing measurement equivalence across rating sources: A multitrait–multirater approach. Journal of Applied Psychology, 90, 592–600.