Measures of Agreement with Multiple Raters: Fréchet Variances and Inference

Published online by Cambridge University Press: 27 December 2024

Jonas Moss*
Affiliation: BI Norwegian Business School
*Correspondence should be made to Jonas Moss, Department of Data Science and Analytics, BI Norwegian Business School, Oslo, Norway. Email: jonas.moss.statistics@gmail.com

Abstract

Most measures of agreement are chance-corrected. They differ along three dimensions: their definition of chance agreement, their choice of disagreement function, and how they handle multiple raters. Chance agreement is usually defined in a pairwise manner, following either Cohen’s kappa or Fleiss’s kappa. The disagreement function is usually a nominal, quadratic, or absolute value function. But how to handle multiple raters is contentious, with the main contenders being Fleiss’s kappa, Conger’s kappa, and Hubert’s kappa, the variant of Fleiss’s kappa where agreement is said to occur only if every rater agrees. More generally, multi-rater agreement coefficients can be defined in a g-wise way, where the disagreement weighting function uses g raters instead of two. This paper contains two main contributions. (a) We propose using Fréchet variances to handle the case of multiple raters. The Fréchet variances are intuitive disagreement measures and turn out to generalize the nominal, quadratic, and absolute value functions to the case of more than two raters. (b) We derive the limit theory of g-wise weighted agreement coefficients, with chance agreement of the Cohen-type or Fleiss-type, for the case where every item is rated by the same number of raters. After comparing three confidence interval constructions, we recommend calculating confidence intervals using the arcsine transform or the Fisher transform.
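
To make contribution (a) concrete, the following is a minimal sketch of a Fréchet variance for a single item's ratings. It assumes the standard definition (the smallest average distance from the ratings to a common centre) and may differ from the paper's own normalisation by a constant factor. The function name `frechet_variance` and the `kind` labels are hypothetical and chosen only for illustration.

```python
from collections import Counter
from statistics import median


def frechet_variance(ratings, kind):
    """Fréchet variance of one item's ratings (illustrative sketch).

    Assumed definition: the minimum over centres m of the average of
    d(x_i, m), where d is the squared difference ("quadratic"), the
    absolute difference ("absolute"), or the 0/1 mismatch ("nominal").
    """
    g = len(ratings)
    if g == 0:
        raise ValueError("need at least one rating")
    if kind == "quadratic":
        # The mean minimises the average squared distance.
        centre = sum(ratings) / g
        return sum((x - centre) ** 2 for x in ratings) / g
    if kind == "absolute":
        # A median minimises the average absolute distance.
        centre = median(ratings)
        return sum(abs(x - centre) for x in ratings) / g
    if kind == "nominal":
        # A mode minimises the share of ratings that differ from the centre.
        return 1 - max(Counter(ratings).values()) / g
    raise ValueError(f"unknown kind: {kind}")


if __name__ == "__main__":
    two_raters = [1, 3]
    three_raters = [1, 1, 3]
    for kind in ("nominal", "absolute", "quadratic"):
        print(kind, frechet_variance(two_raters, kind),
              frechet_variance(three_raters, kind))
```

Under this assumed definition, two raters with ratings x1 and x2 give (x1 - x2)^2 / 4 for the quadratic case, |x1 - x2| / 2 for the absolute value case, and 1/2 times the mismatch indicator for the nominal case. Each is proportional to the familiar pairwise disagreement function, which is the sense in which the same function extends to more than two raters.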

Information

Type
Original Research
Creative Commons
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Copyright
Copyright © 2024 The Author(s)

Table 1 Weighted agreement coefficients.

Table 2 Maximal agreement for the data of Fleiss (1971).

Figure 1 Simulated sampling distribution of $\hat{\kappa}_{d}$ for quadratic weights using three transformations, $n=20, R=3$. The simulation setup is described in Example 3. The arcsine transform makes the sampling distribution closest to the normal distribution.

Table 3 Confidence intervals for the data of Fleiss (1971) using the arcsine method.

Table 4 Confidence intervals for Zapf et al. (2016) using the arcsine method.

Table 5 Coverage (first entry) and lengths (second entry) of confidence intervals: Perreault–Leigh model, Cohen’s kappa.

Table 6 Coverage (first entry) and lengths (second entry) of confidence intervals: normal model, Cohen’s kappa.

Table 7 Coverage (first entry) and lengths (second entry) of confidence intervals for g-wise coefficients: Perreault–Leigh model, Cohen’s kappa.

Figure 2 Sample distribution of $\hat{\kappa}_{d}$ for nominal (left) and absolute value (right) weights. Both plots omit a dominating spike at 1. Here $n=20$ and $j=5$, and we use the Perreault–Leigh model (same parameters as in Sect. 5.1) to simulate the data. There were 2573 unique values for the nominal weight and 8790 unique values for the absolute value weight after $N=200{,}000$ simulations.

Table 8 Coverage (first entry) and lengths (second entry) of confidence intervals: Perreault–Leigh model, Fleiss’s kappa.

Table 9 Coverage (first entry) and lengths (second entry) of confidence intervals: normal model, Fleiss’s kappa.

Table 10 Coverage (first entry) and lengths (second entry) of confidence intervals: Perreault–Leigh model, Fleiss’s kappa ($R=5$).

Supplementary material: File

Moss supplementary material
