IRT Models for Expert-Coded Panel Data

  • Kyle L. Marquardt and Daniel Pemstein
Abstract

Data sets quantifying phenomena of social-scientific interest often use multiple experts to code latent concepts. While it remains standard practice to report the average score across experts, experts likely vary in both their expertise and their interpretation of question scales. As a result, the mean may be an inaccurate statistic. Item-response theory (IRT) models provide an intuitive method for taking these forms of expert disagreement into account when aggregating ordinal ratings produced by experts, but they have rarely been applied to cross-national expert-coded panel data. We investigate the utility of IRT models for aggregating expert-coded data by comparing the performance of various IRT models to the standard practice of reporting average expert codes, using both data from the V-Dem data set and ecologically motivated simulated data. We find that IRT approaches outperform simple averages when experts vary in reliability and exhibit differential item functioning (DIF). IRT models are also generally robust even in the absence of simulated DIF or varying expert reliability. Our findings suggest that producers of cross-national data sets should adopt IRT techniques to aggregate expert-coded data measuring latent concepts.
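
To make the aggregation strategy concrete, the sketch below shows a minimal ordinal IRT model of the kind the abstract describes, written in Stan (the sampler cited in the references). It is an illustrative ordered-logit specification with expert-specific discrimination parameters (reliability) and expert-specific cutpoints (DIF); it is not the authors' exact V-Dem measurement model, and all data and parameter names here are ours.

// Minimal ordinal IRT sketch for expert-coded panel data (illustrative only).
// Each expert j has a discrimination beta[j] (reliability) and an ordered
// set of cutpoints tau[j] (differential item functioning).
data {
  int<lower=1> N;                     // number of ratings
  int<lower=1> J;                     // number of experts
  int<lower=1> C;                     // number of country-years
  int<lower=2> K;                     // number of ordinal categories
  int<lower=1, upper=J> expert[N];    // expert who produced rating n
  int<lower=1, upper=C> cy[N];        // country-year that rating n scores
  int<lower=1, upper=K> y[N];         // observed ordinal rating
}
parameters {
  vector[C] z;                        // latent trait per country-year
  vector<lower=0>[J] beta;            // expert reliability (discrimination)
  ordered[K - 1] tau[J];              // expert-specific cutpoints (DIF)
}
model {
  z ~ normal(0, 1);                   // anchors location and scale of z
  beta ~ lognormal(0, 0.5);           // positive reliabilities near 1
  for (j in 1:J)
    tau[j] ~ normal(0, 2);            // weakly informative cutpoint prior
  for (n in 1:N)
    y[n] ~ ordered_logistic(beta[expert[n]] * z[cy[n]], tau[expert[n]]);
}

In this sketch, a small beta[j] flattens expert j's response curve so that expert contributes less information to the posterior, while shifts in tau[j] absorb idiosyncratic interpretations of the question scale; the posterior of z then plays the role that the raw cross-expert average plays in standard practice.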

Footnotes

Authors’ note: Earlier drafts were presented at the 2016 MPSA Annual Convention, the 2016 IPSA World Convention, and the 2016 V-Dem Latent Variable Modeling Week Conference. We thank Chris Fariss, Juraj Medzihorsky, Pippa Norris, Jon Polk, Shawn Treier, Carolien van Ham, and Laron Williams for their comments on earlier drafts of this paper, as well as V-Dem Project members for their suggestions and assistance. We are also grateful to the editor and two anonymous reviewers for their detailed suggestions. This material is based upon work supported by the National Science Foundation under Grant No. SES-1423944 (PI: Daniel Pemstein); the Riksbankens Jubileumsfond, Grant M13-0559:1 (PI: Staffan I. Lindberg); the Swedish Research Council, 2013.0166 (PIs: Staffan I. Lindberg and Jan Teorell); the Knut and Alice Wallenberg Foundation (PI: Staffan I. Lindberg); the University of Gothenburg, Grant E 2013/43; and internal grants from the Vice-Chancellor’s office, the Dean of the College of Social Sciences, and the Department of Political Science at the University of Gothenburg. We performed simulations and other computational tasks using resources provided by the Notre Dame Center for Research Computing (CRC) through the High Performance Computing section and the Swedish National Infrastructure for Computing (SNIC) at the National Supercomputer Centre in Sweden (SNIC 2016/1-382, 2017/1-407, and 2017/1-68). We specifically acknowledge the assistance of In-Saeng Suh at CRC and Johan Raber at SNIC in facilitating our use of their respective systems. Replication materials are available in Marquardt and Pemstein (2018).

Contributing Editor: R. Michael Alvarez

References
Aldrich, John H., and McKelvey, Richard D. 1977. A method of scaling with applications to the 1968 and 1972 Presidential elections. American Political Science Review 71(1):111–130.
Bakker, R., de Vries, C., Edwards, E., Hooghe, L., Jolly, S., Marks, G., Polk, J., Rovny, J., Steenbergen, M., and Vachudova, M. A. 2012. Measuring party positions in Europe: The Chapel Hill expert survey trend file, 1999–2010. Party Politics 21(1):143–152.
Bakker, Ryan, Jolly, Seth, Polk, Jonathan, and Poole, Keith. 2014. The European common space: Extending the use of anchoring vignettes. The Journal of Politics 76(4):1089–1101.
Boyer, K. K., and Verma, R. 2000. Multiple raters in survey-based operations management research: A review and tutorial. Production and Operations Management 9(2):128–140.
Brady, Henry E. 1985. The perils of survey research: Inter-personally incomparable responses. Political Methodology 11(3/4):269–291.
Buttice, Matthew K., and Stone, Walter J. 2012. Candidates matter: Policy and quality differences in congressional elections. Journal of Politics 74(3):870–887.
Clinton, Joshua D., and Lewis, David E. 2008. Expert opinion, agency characteristics, and agency preferences. Political Analysis 16(1):3–20.
Coppedge, Michael, Gerring, John, Lindberg, Staffan I., Teorell, Jan, Pemstein, Daniel, Tzelgov, Eitan, Wang, Yi-ting, Glynn, Adam, Altman, David, Bernhard, Michael, Fish, M. Steven, Hicken, Allen, McMann, Kelly, Paxton, Pamela, Reif, Megan, Skaaning, Svend-Erik, and Staton, Jeffrey. 2014. V-Dem: A new way to measure democracy. Journal of Democracy 25(3):159–169.
Coppedge, Michael, Gerring, John, Lindberg, Staffan I., Skaaning, Svend-Erik, Teorell, Jan, Altman, David, Bernhard, Michael, Fish, M. Steven, Glynn, Adam, Hicken, Allen, Knutsen, Carl Henrik, McMann, Kelly, Paxton, Pamela, Pemstein, Daniel, Staton, Jeffrey, Zimmerman, Brigitte, Andersson, Frida, Mechkova, Valeriya, and Miri, Farhad. 2016. Varieties of democracy codebook v6. Technical report. Varieties of Democracy Project: Project Documentation Paper Series.
Coppedge, Michael, Gerring, John, Lindberg, Staffan I., Skaaning, Svend-Erik, Teorell, Jan, Altman, David, Bernhard, Michael, Fish, M. Steven, Glynn, Adam, Hicken, Allen, Knutsen, Carl Henrik, Marquardt, Kyle L., McMann, Kelly, Miri, Farhad, Paxton, Pamela, Pemstein, Daniel, Staton, Jeffrey, Tzelgov, Eitan, Wang, Yi-ting, and Zimmerman, Brigitte. 2016. V-Dem Dataset v6.2. Technical report. Varieties of Democracy Project. https://ssrn.com/abstract=2968289.
Coppedge, Michael, Gerring, John, Lindberg, Staffan I., Skaaning, Svend-Erik, Teorell, Jan, Andersson, Frida, Marquardt, Kyle L., Mechkova, Valeriya, Miri, Farhad, Pemstein, Daniel, Pernes, Josefine, Stepanova, Natalia, Tzelgov, Eitan, and Wang, Yi-ting. 2016. Varieties of Democracy Methodology v5. Technical report. Varieties of Democracy Project: Project Documentation Paper Series.
Hare, Christopher, Armstrong, David A., Bakker, Ryan, Carroll, Royce, and Poole, Keith T. 2015. Using Bayesian Aldrich-McKelvey Scaling to study citizens’ ideological preferences and perceptions. American Journal of Political Science 59(3):759–774.
Johnson, Valen E., and Albert, James H. 1999. Ordinal Data Modeling. New York: Springer.
Jones, Bradford S., and Norrander, Barbara. 1996. The reliability of aggregated public opinion measures. American Journal of Political Science 40(1):295–309.
King, Gary, Murray, Christopher J. L., Salomon, Joshua A., and Tandon, Ajay. 2004. Enhancing the validity and cross-cultural comparability of measurement in survey research. The American Political Science Review 98(1):191–207.
King, Gary, and Wand, Jonathan. 2007. Comparing incomparable survey responses: Evaluating and selecting anchoring vignettes. Political Analysis 15(1):46–66.
König, T., Marbach, M., and Osnabrügge, M. 2013. Estimating party positions across countries and time: A dynamic latent variable model for manifesto data. Political Analysis 21(4):468–491.
Kozlowski, Steve W., and Hattrup, Keith. 1992. A disagreement about within-group agreement: Disentangling issues of consistency versus consensus. Journal of Applied Psychology 77(2):161–167.
LeBreton, J. M., and Senter, J. L. 2007. Answers to 20 questions about interrater reliability and interrater agreement. Organizational Research Methods 11(4):815–852.
Lindstädt, René, Proksch, Sven-Oliver, and Slapin, Jonathan B. 2016. When experts disagree: Response aggregation and its consequences in expert surveys.
Maestas, Cherie D., Buttice, Matthew K., and Stone, Walter J. 2014. Extracting wisdom from experts and small crowds: Strategies for improving informant-based measures of political concepts. Political Analysis 22(3):354–373.
Marquardt, Kyle, and Pemstein, Daniel. 2018. Replication data for: IRT models for expert-coded panel data. https://doi.org/10.7910/DVN/KGP01E, Harvard Dataverse, V1.
Norris, Pippa, Frank, Richard W., and Martínez i Coma, Ferran. 2013. Assessing the quality of elections. Journal of Democracy 24(4):124–135.
Pemstein, Daniel, Seim, Brigitte, and Lindberg, Staffan I. 2016. Anchoring vignettes and item response theory in cross-national expert surveys.
Pemstein, Daniel, Tzelgov, Eitan, and Wang, Yi-ting. 2015. Evaluating and improving item response theory models for cross-national expert surveys. Varieties of Democracy Institute Working Paper 1(March):1–53.
Pemstein, Daniel, Marquardt, Kyle L., Tzelgov, Eitan, Wang, Yi-ting, and Miri, Farhad. 2015. The V-Dem measurement model: Latent variable analysis for cross-national and cross-temporal expert-coded data. Varieties of Democracy Institute Working Paper 21.
Ramey, Adam. 2016. Vox populi, vox dei? Crowdsourced ideal point estimation. The Journal of Politics 78(1):281–295.
Stan Development Team. 2015. Stan: A C++ Library for Probability and Sampling, Version 2.9.0. http://mc-stan.org/.
Teorell, Jan, Dahlström, Carl, and Dahlberg, Stefan. 2011. The QoG expert survey dataset. Technical report. University of Gothenburg: The Quality of Government Institute. http://www.qog.pol.gu.se.
Treier, Shawn, and Jackman, Simon. 2008. Democracy as a latent variable. American Journal of Political Science 52(1):201–217.
Van Bruggen, Gerrit H., Lilien, Gary L., and Kacker, Manish. 2002. Informants in organizational marketing research: Why use multiple informants and how to aggregate responses. Journal of Marketing Research 39(4):469–478.
von Davier, Matthias, Shin, Hyo-Jeong, Khorramdel, Lale, and Stankov, Lazar. 2017. The effects of vignette scoring on reliability and validity of self-reports. Applied Psychological Measurement 42(4):291–306.
Political Analysis
  • ISSN: 1047-1987
  • EISSN: 1476-4989

Supplementary materials

Marquardt and Pemstein supplementary material: Online Appendix (696 KB)
