Measuring Agreement Using Guessing Models and Knowledge Coefficients

Jonas Moss

doi:10.1007/s11336-023-09919-4

Published online by Cambridge University Press: 01 January 2025

Jonas Moss*: Affiliation:
BI Norwegian Business School
*: Correspondence should be made to Jonas Moss, Department of Data Science and Analytics, BI Norwegian Business School, Oslo, Norway. Email: jonas.moss@bi.no

Abstract

Several measures of agreement, such as the Perreault–Leigh coefficient, the $\textsc{AC}_{1}$, and the recent coefficient of van Oest, are based on explicit models of how judges make their ratings. To handle such measures of agreement under a common umbrella, we propose a class of models called guessing models, which contains most models of how judges make their ratings. Every guessing model has an associated measure of agreement we call the knowledge coefficient. Under certain assumptions on the guessing models, the knowledge coefficient will be equal to the multi-rater Cohen’s kappa, Fleiss’ kappa, the Brennan–Prediger coefficient, or other less-established measures of agreement. We provide several sample estimators of the knowledge coefficient, valid under varying assumptions, and their asymptotic distributions. After a sensitivity analysis and a simulation study of confidence intervals, we find that the Brennan–Prediger coefficient typically outperforms the others, with much better coverage under unfavorable circumstances.

Keywords

Agreement; Interrater reliability; AC1; Cohen’s kappa

Information

Type: Theory & Methods
Information: Psychometrika, Volume 88, Issue 3, September 2023, pp. 1002-1025

DOI: https://doi.org/10.1007/s11336-023-09919-4
Creative Commons: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Copyright: Copyright © 2023 The Author(s)

The most popular measures of agreement are chance-corrected. These can usually be written in the form

$$\frac{p_{a}-p_{ca}}{1-p_{ca}}, \tag{0.1}$$

where $p_{a}$ is the percent agreement and $p_{ca}$ is a notion of chance agreement. The best known coefficients in this class are the (weighted) Cohen’s kappa (Cohen, 1960, 1968), Krippendorff’s (1970) alpha, Scott’s (1955) pi, and Fleiss’ (1971) kappa. The difference between these measures lies solely in their definition of the chance agreement, $p_{ca}$. These coefficients make few to no assumptions about the underlying distribution of ratings, and can be regarded as non-parametric.
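As an illustration of the form (0.1), the following minimal sketch computes two chance-corrected coefficients for two judges, Cohen’s kappa and Scott’s pi, which differ only in how the chance agreement is defined. The function names and the toy ratings are hypothetical and chosen for this example only.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Share of items on which the two judges give the same rating."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)


def chance_corrected(p_a, p_ca):
    """Generic chance-corrected coefficient (p_a - p_ca) / (1 - p_ca)."""
    return (p_a - p_ca) / (1 - p_ca)


def cohen_chance(ratings_a, ratings_b):
    """Cohen's chance agreement: sum over categories of the product of
    the two judges' own marginal proportions."""
    n = len(ratings_a)
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    return sum(count_a[c] * count_b[c] for c in count_a) / n**2


def scott_chance(ratings_a, ratings_b):
    """Scott's chance agreement: sum of squared pooled marginal proportions."""
    pooled = Counter(ratings_a) + Counter(ratings_b)
    total = 2 * len(ratings_a)
    return sum((k / total) ** 2 for k in pooled.values())


# Toy data (made up for illustration): 8 items, 3 categories.
judge_1 = ["a", "a", "b", "b", "a", "c", "c", "a"]
judge_2 = ["a", "b", "b", "b", "a", "c", "a", "a"]

p_a = percent_agreement(judge_1, judge_2)
kappa = chance_corrected(p_a, cohen_chance(judge_1, judge_2))
pi = chance_corrected(p_a, scott_chance(judge_1, judge_2))
print(p_a, kappa, pi)
```

Both coefficients share the same `chance_corrected` step; only the chance-agreement function changes.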

It is also possible to model the judgment process directly, and then attempt to derive reasonable chance-corrected measures of agreement from these models (Janes, 1979). Examples of measures of agreement developed in this way include the Perreault–Leigh coefficient (Perreault & Leigh, 1989), the $\textsc{AC}_{1}$ (Gwet, 2008), Maxwell’s RE coefficient (Maxwell, 1977), Aickin’s $\alpha$ (Aickin, 1990), the estimators of Klauer and Batchelder (1996), and the more recent coefficient of van Oest (van Oest, 2019; van Oest & Girard, 2021). These measures of agreement depend on the parameters of the underlying judgment process, and may be considered semi-parametric instead of non-parametric. The models used by the above-mentioned authors may be called guessing models, as they represent ratings as being either known or guessed.

To make it clear what these models are about, consider the “textbook case argument” of Grove et al. (1981) (see Gwet, 2014, Chapter 4, for an extended justification). When two judges classify people into, say, psychiatric categories, some people are bound to be “textbook cases”, i.e., classifiable without much effort. Disagreement between competent judges will mostly occur when subjects are hard to classify and the judges have to guess. But judges may agree on hard subjects as well, simply due to chance. We can then define a coefficient of “agreement due to knowledge” as the proportion of textbook cases.

The guessing model, introduced in the next section, will encompass the textbook case model and many more. As we will show, it is a generalization of several judgment process models discussed in the literature on measures of agreement. Any guessing model is associated with a knowledge coefficient, a measure of agreement defined directly from its parameters. These coefficients generalize the “agreement due to knowledge” from Grove’s textbook case to more general settings. The knowledge coefficient can, under various additional assumptions, be easily estimated from the data; the details are in Theorem 2. In some cases, it equals already established coefficients such as the Brennan–Prediger coefficient (Brennan & Prediger, 1981) or Fleiss’ kappa, but we will establish some less familiar formulas as well. We provide methods for doing inference for our proposed coefficients, based on the delta rule and the theory of U-statistics. Using sensitivity analyses and confidence interval simulations, we find that the Brennan–Prediger coefficient generally outperforms its competitors as an estimator of the knowledge coefficient, with reasonably small bias and variance in a variety of circumstances.

1. Guessing Models

We work in the setting where one rating is definitely true, such as psychiatric diagnoses. Thus we exclude problems such as measuring agreement between movie reviewers, where there is no true rating. We also exclude measures of agreement between continuous measurement instruments, as continuous ratings are rarely exactly right. For instance, an instrument for measuring blood glucose may be decidedly better than another, but will never be precisely on the spot.

We will consider only agreement studies with a rectangular design, i.e., when R judges rate n items into one of $C<\infty$ categories, with every item being rated exactly once by every judge. Moreover, we will understand the set of judges as fixed and the set of items as being random and increasing with n. These assumptions may not be necessary for all of the results in this paper, but will make the presentation easier to follow, and are necessary for the asymptotic results. Denote the probability that the R judges will rate an item as belonging to the categories $x=(x_{1},x_{2},\ldots ,x_{R})$ by p(x).

The joint distribution of the guessing model is

$$p(x\mid s,x^{\star })=\prod _{r=1}^{R}\left[ s_{r}1[x_{r}=x^{\star }]+(1-s_{r})q_{r}(x_{r})\right] , \tag{1.1}$$

where

• $x_{r}$ is the rating given by the rth judge on an item.
• $s=\{s_{1},s_{2},\ldots ,s_{R}\}$ are the skill-difficulty parameters; $s_{r}$ is the probability that the rth judge knows the correct classification of an item. The skill-difficulty parameters can be deterministic or random. For instance, they can be sampled from a Beta distribution.
• $x^{\star }$ is the true classification of the item, an unknown latent variable. It is assumed to be independent of the skill-difficulty parameters s. The distribution of $x^{\star }$ is $t(x^{\star })$, the true rating distribution.
• $q_{r}(x)$ are the guessing distributions, the distributions the ratings are drawn from when the rth judge does not know the true classification of the item.

We may use $t(x^{\star })$ to remove the dependence on $x^{\star }$ from the univariate guessing model, giving

$$p(x_{r}\mid s)=s_{r}t(x_{r})+(1-s_{r})q_{r}(x_{r}). \tag{1.2}$$

The interpretation of $p(x_{r}\mid s)$ is straightforward. When faced with an item, a judge r knows its true classification, drawn from t(x), with probability $s_{r}$. If the judge doesn’t know the true classification, the rating will be drawn at random from his potentially idiosyncratic guessing distribution $q_{r}(x)$. We do not allow for guessing distributions $q_{r}$ that depend on both the true classification $x^{\star }$ and the judge, as it would make the parameters unidentifiable.
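The sampling mechanism just described can be sketched in a few lines. The example below is a minimal simulation, assuming deterministic skill-difficulty parameters and made-up values for t and the guessing distributions; the function names `sample_item` and `marginal` are hypothetical. It draws items from the guessing model (1.1) and compares one judge’s empirical rating frequencies with the marginal in (1.2).

```python
import random


def sample_item(skills, guess_dists, true_dist, rng):
    """Draw the true class and all judges' ratings for one item.

    skills[r]      -- probability s_r that judge r knows the true class
    guess_dists[r] -- guessing distribution q_r (list of category probabilities)
    true_dist      -- true rating distribution t (list of category probabilities)
    """
    categories = list(range(len(true_dist)))
    x_star = rng.choices(categories, weights=true_dist)[0]
    ratings = []
    for s_r, q_r in zip(skills, guess_dists):
        if rng.random() < s_r:
            # The judge knows the true classification.
            ratings.append(x_star)
        else:
            # The judge guesses from q_r, which does not depend on x_star.
            ratings.append(rng.choices(categories, weights=q_r)[0])
    return x_star, ratings


def marginal(s_r, q_r, true_dist):
    """Marginal rating distribution of one judge, s_r * t(x) + (1 - s_r) * q_r(x)."""
    return [s_r * t_x + (1 - s_r) * q_x for t_x, q_x in zip(true_dist, q_r)]


# Illustrative parameter values (assumptions, not taken from the paper).
rng = random.Random(1)
skills = [0.8, 0.5]
guess_dists = [[1 / 3, 1 / 3, 1 / 3], [0.6, 0.3, 0.1]]
true_dist = [0.5, 0.3, 0.2]

n_items = 20000
counts = [0, 0, 0]
for _ in range(n_items):
    _, ratings = sample_item(skills, guess_dists, true_dist, rng)
    counts[ratings[1]] += 1  # tally the second judge's ratings

empirical = [c / n_items for c in counts]
expected = marginal(skills[1], guess_dists[1], true_dist)
print(empirical, expected)
```

With these values the second judge’s marginal from (1.2) is (0.55, 0.30, 0.15), and the simulated frequencies should be close to it.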

We have said nothing about the joint distribution of $(s_{1},s_{2},\dots ,s_{R},x^{\star })$ except that $x^{\star }$ is independent of the skill-difficulty parameters. This assumption is not realistic in all situations. For instance, correctly diagnosing patients with Down syndrome is easier than correctly diagnosing patients with ADHD, implying that $E[s_{r}\mid x^{\star }=\text{Down syndrome}]$ exceeds $E[s_{r}\mid x^{\star }=\text{ADHD}]$, which violates independence of $s_{r}$ and $x^{\star }$. The independence assumption is not needed for the definition of the guessing model to make sense, but will be used in the remainder of the paper as it is required for Theorem 2.

In most settings with latent parameters one would decide on a model for them, such as multivariate normal in the case of linear random effects models. Instead of following this route, we will impose additional assumptions on the skill-difficulty parameters s, the guessing distributions $q_{r}$, and/or the true distribution t to make the problem manageable.

The guessing model (1.1) has, to our knowledge, not been presented in this generality before. Klauer and Batchelder (1996, Theorem 5 and Section 9) define a model of almost as high generality, but do not allow the skill-difficulty parameters to differ between items.

1.1. Knowledge Coefficient

We have introduced the guessing model in order to define the notion of “agreement due to knowledge” in a precise way. To gain an intuition about what we’re getting at, first consider the case of two judges with potentially different, but deterministic, skill-difficulty parameters $s_{1}$ and $s_{2}$. The probability that two judges agree on the classification of an item because they both know its classification is the product of their skill parameters, or $\nu =s_{1}s_{2}$. As “agree on the classification of an item because they both know its classification” is cumbersome to read, we will call it “knowledgeable agreement” or “agree knowledgeably” from now on. Extending this notion to R judges, $\nu =\binom{R}{2}^{-1}\sum _{r_{1}>r_{2}}s_{r_{1}}s_{r_{2}}$ is the probability that a randomly selected pair of judges will agree knowledgeably on a pair of ratings.

Another simple case happens when there are R judges with random skill-difficulty parameters that do not vary across judges when the item is fixed, i.e., $S_{1}=S_{2}=\cdots =S_{R}$, where we use capital letters to emphasize that the $S_{r}$ are random. Now $E(S_{r}^{2})$ is the probability of knowledgeable agreement. Finally, in the general guessing model, we find that the probability of knowledgeable agreement between two judges is

$$\nu =\binom{R}{2}^{-1}\sum _{r_{1}>r_{2}}E[S_{r_{1}}S_{r_{2}}]. \tag{1.3}$$
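The three cases above can be computed with one small helper. This is a minimal sketch, not an estimator from the paper: the hypothetical function `knowledge_coefficient` takes skill-difficulty vectors as given and averages the pairwise products, so with a list of per-item draws it returns the sample analogue of (1.3). All numerical values are made up for illustration.

```python
from itertools import combinations


def knowledge_coefficient(skill_draws):
    """Average of S_{r1} * S_{r2} over all judge pairs and all supplied items.

    skill_draws -- list of skill-difficulty vectors, one per item, each with
                   one entry per judge. A single vector gives the
                   deterministic case.
    """
    n_judges = len(skill_draws[0])
    if n_judges < 2:
        raise ValueError("need at least two judges")
    pairs = list(combinations(range(n_judges), 2))
    total = 0.0
    for skills in skill_draws:
        total += sum(skills[i] * skills[j] for i, j in pairs) / len(pairs)
    return total / len(skill_draws)


# Two judges with deterministic skills: nu = s_1 * s_2.
nu_two = knowledge_coefficient([[0.9, 0.8]])

# Three judges with deterministic skills: average over the three pairs.
nu_three = knowledge_coefficient([[0.9, 0.8, 0.5]])

# Skill shared by both judges and equal to 0 or 1 per item (textbook case or
# not): the average of S^2 is the share of textbook cases among these items.
nu_shared = knowledge_coefficient([[1, 1], [0, 0], [1, 1], [1, 1]])

print(nu_two, nu_three, nu_shared)
```

In practice the skill-difficulty parameters are latent, so this helper only illustrates the definition of the coefficient.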

1.2. Earlier Guessing Models

The guessing model and its associated knowledge coefficient are extensions, formalizations, or slight modifications of models or coefficients used in several earlier papers.

1.2.1. The Two Models of Maxwell (1977)

Maxwell (1977, Section 3) works in the setting of two judges and binary ratings. From his Table II one can derive the joint model for two ratings $x_{1},x_{2}$ by two judges as

$$p(x_{1},x_{2})=\alpha p(x_{1})1[x_{1}=x_{2}]+(1-\alpha )p(x_{1})p(x_{2}), \tag{1.4}$$

where p is the marginal distribution of the data. Maxwell then shows that $\alpha =\operatorname{Cor}(X_{1},X_{2})$.
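The identity between $\alpha$ and the correlation can be checked numerically. The sketch below builds the joint distribution (1.4) for binary ratings scored 0 and 1 and computes the Pearson correlation exactly; the function names and the values of $\alpha$ and p are assumptions made for this example.

```python
def maxwell_joint(alpha, p):
    """Joint pmf of (X1, X2) from (1.4); p is the common marginal pmf."""
    C = len(p)
    return [
        [alpha * p[i] * (i == j) + (1 - alpha) * p[i] * p[j] for j in range(C)]
        for i in range(C)
    ]


def correlation(joint):
    """Pearson correlation of (X1, X2), with categories scored 0, 1, ..."""
    C = len(joint)
    cells = [(i, j, joint[i][j]) for i in range(C) for j in range(C)]
    mean_1 = sum(i * w for i, _, w in cells)
    mean_2 = sum(j * w for _, j, w in cells)
    cov = sum((i - mean_1) * (j - mean_2) * w for i, j, w in cells)
    var_1 = sum((i - mean_1) ** 2 * w for i, _, w in cells)
    var_2 = sum((j - mean_2) ** 2 * w for _, j, w in cells)
    return cov / (var_1 * var_2) ** 0.5


# Illustrative values: alpha = 0.3 and P(X = 1) = 0.3.
joint = maxwell_joint(alpha=0.3, p=[0.7, 0.3])
rho = correlation(joint)
print(rho)
```

For these values the covariance is $\alpha$ times the common variance of the two ratings, so the correlation equals $\alpha$ up to floating-point error.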

Maxwell’s joint distribution is the unconditional variant of a guessing model (1.1) with two judges, i.e., a model of the form

$$p(x_{1},x_{2}\mid s,x^{\star })=\prod _{r=1}^{2}\left[ s_{r}1[x_{r}=x^{\star }]+(1-s_{r})q_{r}(x_{r})\right] ,$$

with associated knowledge coefficient $\nu =\alpha$. The guessing model satisfies

(i) The judges’ guessing distributions are equal to the marginal distribution, i.e., $q_{1}(x)=q_{2}(x)=p(x)$.
(ii) The true distribution is assumed to be equal to the marginal distribution, i.e., $t(x)=p(x)$.
(iii) Both judges share the same skill-difficulty parameter s. It is Bernoulli distributed with success probability $\alpha$, so that $\alpha =P(s=1)=Es^{2}$. Then s will be 1 if the case is easy to judge (i.e., a textbook case) and 0 otherwise, and the probability of an item being a textbook case is $\alpha$.

In Section 4, Maxwell (1977) is still working with binary data and two judges. He describes a guessing model where (iii) above still holds, but (i) and (ii) are replaced with

(i) Both the judges’ guessing distributions are uniform, i.e., $q_{1}(x)=q_{2}(x)=1/2$, and
(ii) The true distribution t(x) is arbitrary.

Then he derives the knowledge coefficient for this model, the Maxwell RE (an abbreviation of random error) coefficient for binary data, a special case of the Brennan–Prediger coefficient,

$$\nu _{BP}=\frac{p_{a}-1/C}{1-1/C}, \tag{1.5}$$

where C, in this case equal to 2, is the number of categories.

1.2.2. Perreault–Leigh Coefficient (1989)

Perreault and Leigh (1989) devise an explicit model for the rating procedure involving two judges and $C<\infty$ categories. Using an index for reliability $s\in [0,1]$, they define the univariate model

$$p(x)=st(x)+(1-s)C^{-1}. \tag{1.6}$$

The model is similar to the second Maxwell model, except that the skill-difficulty parameters are deterministic and constant across judges and the number of categories is arbitrary. The guessing distributions are uniform, $q_{1}=q_{2}=1/C$, and t(x) is arbitrary. From this model they derive that $s=\sqrt{(p_{a}-1/C)/(1-1/C)}=\sqrt{\nu _{BP}}$, the square root of the Brennan–Prediger coefficient. Hence the knowledge coefficient is $\nu =s^{2}$.
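The step from model (1.6) to this expression for s can be spelled out. The following is a sketch of one way to see it, assuming the two judges rate independently given the true classification $x^{\star}$, each knowing it with probability s and otherwise guessing uniformly, as in the guessing model (1.1):

```latex
\begin{aligned}
P(X_{1}=X_{2}\mid x^{\star})
  &= \sum_{x=1}^{C}\left[ s1[x=x^{\star}]+\frac{1-s}{C}\right]^{2} \\
  &= \left( s+\frac{1-s}{C}\right)^{2}+(C-1)\left( \frac{1-s}{C}\right)^{2} \\
  &= s^{2}+\frac{2s(1-s)}{C}+\frac{(1-s)^{2}}{C} \\
  &= s^{2}+\frac{1-s^{2}}{C}.
\end{aligned}
```

The right-hand side does not depend on $x^{\star}$, so $p_{a}=s^{2}+(1-s^{2})/C$ for any true distribution t. Subtracting $1/C$ and dividing by $1-1/C$ gives $(p_{a}-1/C)/(1-1/C)=s^{2}$.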

1.2.3. Aickin’s Coefficient (1990)

Aickin (1990) works in a setting of two judges. He defines the joint model for two ratings $x_{1},x_{2}$ by two judges as

$$p(x_{1},x_{2})=(1-\alpha )q_{1}(x_{1})q_{2}(x_{2})+\alpha 1[x_{1}=x_{2}]\frac{q_{1}(x_{1})q_{2}(x_{1})}{\sum _{x=1}^{C}q_{1}(x)q_{2}(x)}, \tag{1.7}$$

with the goal of doing inference on $\alpha$. He does this using maximum likelihood, estimating the distributions $q_{1}$ and $q_{2}$ alongside $\alpha$.

Aickin’s model is a guessing model and $\alpha =\nu$ is its knowledge coefficient. The assumptions of the guessing model are:

(i) As in the first Maxwell model, both judges share the same skill-difficulty parameter s. It is Bernoulli distributed with success probability $\alpha$, so that $\alpha =P(s=1)=Es^{2}$.
(ii) The judges’ guessing distributions are arbitrary.
(iii) The true distribution is assumed to be equal to $t(x)=q_{1}(x)q_{2}(x)/\sum _{x=1}^{C}q_{1}(x)q_{2}(x)$.

That Aickin’s model is a guessing model satisfying conditions (i)–(iii) is a direct consequence of the following fact. Whenever the number of judges is two and the skill-difficulty parameter \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s\sim \text {Bernoulli}(\alpha )$$\end{document} is the same for both judges, the guessing model has unconditional distribution

(1.8) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} p(x_{1},x_{2})=\alpha 1[x_{1}=x_{2}]t(x_{1})+(1-\alpha )q_{1}(x_{1})q_{2}(x_{2}). \end{aligned}$$\end{document}

The details are in the appendix, p. 22.
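The unconditional distribution (1.8) can be tabulated directly. The following is a minimal sketch, not code from the paper: the function name `aickin_joint` and the numerical inputs are made up for illustration, and the true distribution is built from assumption (iii).

```python
def aickin_joint(alpha, t, q1, q2):
    """Return p(x1, x2) from equation (1.8) as a C x C nested list."""
    C = len(t)
    return [
        [alpha * (t[i] if i == j else 0.0) + (1.0 - alpha) * q1[i] * q2[j]
         for j in range(C)]
        for i in range(C)
    ]

# Illustrative (made-up) guessing distributions with C = 3 categories.
q1 = [0.5, 0.3, 0.2]
q2 = [0.2, 0.3, 0.5]
norm = sum(a * b for a, b in zip(q1, q2))
t = [a * b / norm for a, b in zip(q1, q2)]  # assumption (iii) of Aickin's model
alpha = 0.6

p = aickin_joint(alpha, t, q1, q2)
total = sum(sum(row) for row in p)                 # a proper distribution sums to 1
agreement = sum(p[i][i] for i in range(len(t)))    # P(x1 = x2)
print(total, agreement)
```

The diagonal mass equals alpha plus (1 - alpha) times the probability that two independent guesses coincide, which is how alpha separates knowledge-driven agreement from guessing-driven agreement in this model.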

Assumption (iii) is not justified by Aickin, and does not appear to be necessary. If we define the generalized Aickin model as the guessing model satisfying only (i) and (ii) above, with an arbitrary number of judges \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R\ge 2$$\end{document} , its parameters \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(\nu ,q_{1},q_{2},...q_{R})$$\end{document} are identified when \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C>2$$\end{document} . This can be shown following the arguments laid out in the proof of Theorem 1 of Klauer and Batchelder (1996).

1.2.4. The Klauer–Batchelder Model (1996)

Klauer and Batchelder (1996) perform a detailed structural analysis of the guessing model with identical skill-difficulty parameters. In our notation, their equation 2 is

(1.9) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} p(x\mid s,x^{\star })=s1[x=x^{\star }]+(1-s)q_{r}(x), \end{aligned}$$\end{document}

where the guessing distributions \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_{1},q_{2}$$\end{document} and the true distribution t are arbitrary. They show that, when the number of judges is two (Klauer & Batchelder, 1996, eq. 3),

(1.10) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} p(x_{1},x_{2})=p(x_{1})p(x_{2})+s^{2}t(x_{1})[1[x_{1}=x_{2}]-t(x_{2})], \end{aligned}$$\end{document}

which does not depend directly on the guessing distributions \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_{r}(x)$$\end{document} . Equation (1.10) provides a nice interpretation of the skill-difficulty parameter s: the higher s is, the more weight is placed on the main diagonal of the agreement matrix and the less on the off-diagonal elements. Moreover, in Theorem 5, they extend equation (1.10) to the case of two judges with different deterministic skill-difficulty parameters.
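Equation (1.10) can be checked by hand from (1.9). The sketch below assumes that the two ratings are conditionally independent given the true rating, that the true rating is drawn from t, and that s is deterministic and shared by both judges; the conditional independence assumption is not restated in this section. Here the factors p(x_1) and p(x_2) in (1.10) are read as the judge-specific marginals p_1 and p_2.

```latex
\begin{aligned}
p(x_{1},x_{2})
  &= \sum_{x^{\star}} t(x^{\star})\prod_{r=1}^{2}\bigl[s1[x_{r}=x^{\star}]+(1-s)q_{r}(x_{r})\bigr] \\
  &= s^{2}1[x_{1}=x_{2}]t(x_{1})
     + s(1-s)\bigl[t(x_{1})q_{2}(x_{2})+q_{1}(x_{1})t(x_{2})\bigr]
     + (1-s)^{2}q_{1}(x_{1})q_{2}(x_{2}), \\
p_{r}(x) &= st(x)+(1-s)q_{r}(x), \\
p(x_{1},x_{2})-p_{1}(x_{1})p_{2}(x_{2})
  &= s^{2}t(x_{1})\bigl[1[x_{1}=x_{2}]-t(x_{2})\bigr].
\end{aligned}
```

All terms involving the guessing distributions cancel in the last line, which is why (1.10) depends on them only through the marginals.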

They show the model is identified when \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C>2$$\end{document} , and propose to estimate it by maximum likelihood using the EM algorithm developed by Hu and Batchelder (1994). In addition, they show that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s^{2}$$\end{document} equals Cohen’s kappa when both guessing distributions are equal to the true distribution; they also show that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s^{2}$$\end{document} equals the Brennan–Prediger coefficient when both guessing distributions are uniform. We generalize these results to arbitrary skill-difficulty parameters and an arbitrary number of judges in Theorem 2 below.
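Both special cases can be verified numerically. The sketch below is illustrative only: it assumes the two ratings are conditionally independent given the true rating, uses a shared deterministic s, and the helper names (`kb_joint`, `cohen_kappa`, `brennan_prediger`) and input values are made up.

```python
from itertools import product

def kb_joint(s, t, q1, q2):
    """Joint distribution of two ratings under (1.9) with a shared deterministic s."""
    C = len(t)
    p = [[0.0] * C for _ in range(C)]
    for x_star, x1, x2 in product(range(C), repeat=3):
        p1 = s * (x1 == x_star) + (1 - s) * q1[x1]  # judge 1 given the true rating
        p2 = s * (x2 == x_star) + (1 - s) * q2[x2]  # judge 2 given the true rating
        p[x1][x2] += t[x_star] * p1 * p2
    return p

def cohen_kappa(p):
    """Unweighted Cohen's kappa of a C x C joint distribution."""
    C = len(p)
    row = [sum(p[i]) for i in range(C)]
    col = [sum(p[i][j] for i in range(C)) for j in range(C)]
    p_a = sum(p[i][i] for i in range(C))
    p_c = sum(row[i] * col[i] for i in range(C))
    return (p_a - p_c) / (1 - p_c)

def brennan_prediger(p):
    """Unweighted Brennan-Prediger coefficient of a C x C joint distribution."""
    C = len(p)
    p_a = sum(p[i][i] for i in range(C))
    return (p_a - 1 / C) / (1 - 1 / C)

s = 0.7
t = [0.5, 0.3, 0.2]        # made-up true distribution
uniform = [1 / 3] * 3

kappa = cohen_kappa(kb_joint(s, t, t, t))                 # guessing = true distribution
bp = brennan_prediger(kb_joint(s, t, uniform, uniform))   # uniform guessing
print(kappa, bp, s ** 2)
```

In both cases the coefficient recovers s squared up to floating-point error, in line with the two results attributed to Klauer and Batchelder.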

1.2.5. Van Oest’s Coefficient (2019)

Modifying the setup of Perreault and Leigh (1989), van Oest (2019) develops a guessing model for two judges and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C<\infty $$\end{document} categories. He assumes the guessing distributions are equal to the marginal distribution, i.e., \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_{1}(x)=q_{2}(x)=p(x)$$\end{document} , and that the skill-difficulty coefficients are deterministic and constant across judges. The marginal model becomes

(1.11) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} p(x)=st(x)+(1-s)p(x). \end{aligned}$$\end{document}

Van Oest (2019) proceeds to show that s equals the weighted Scott’s pi under these circumstances.
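A one-line rearrangement of (1.11) makes its implication for the true distribution explicit. This is a sketch assuming s > 0, since for s = 0 the equation holds for every t.

```latex
p(x)=st(x)+(1-s)p(x)
\iff sp(x)=st(x)
\iff p(x)=t(x)\qquad (s>0).
```

In other words, setting the guessing distributions equal to the marginal distribution forces the marginal distribution to coincide with the true distribution.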

2. The Knowledge Coefficient

2.1. Definitions

Let w(x, y) be an agreement weighting function. This is a function of two arguments that satisfies \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$w(x,y)\le 1$$\end{document} and equals 1 when \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x=y$$\end{document} , i.e., \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$w(x,x)=1$$\end{document} . The purpose of this function is to measure the degree of similarity between x and y, where 1 is understood as the maximal degree of similarity. While there are infinitely many weighting functions, only three are in widespread use. The first is the nominal weight, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$w(x_{1},x_{2})=1[x_{1}=x_{2}]$$\end{document} .

With this function, similarity does not come in degrees, but it does with the quadratic weighting function, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$w(x_{1},x_{2})=1-(x_{1}-x_{2})^{2}$$\end{document} . The absolute value weighting function (sometimes called the linear weighting function) measures the similarity between x, y using the absolute value, i.e., \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$w(x_{1},x_{2})=1-|x_{1}-x_{2}|$$\end{document} .

Definition 1

Recall that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_{r}(x)$$\end{document} is the marginal distribution of ratings for judge r, and let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_{1}$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x_{2}$$\end{document} be ratings by two different judges \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r_{1}$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r_{2}$$\end{document} . Define the weighted agreement as

(2.1) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} p_{wa}= & {} \left( {\begin{array}{c}R\\ 2\end{array}}\right) ^{-1}\sum _{r_{1}>r_{2}}\sum _{x_{1},x_{2}}w(x_{1},x_{2})p(x_{r_{1}},x_{r_{2}}), \end{aligned}$$\end{document}

the weighted Cohen-type chance agreement as

(2.2) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} p_{wc}= & {} \left( {\begin{array}{c}R\\ 2\end{array}}\right) ^{-1}\sum _{r_{1}>r_{2}}\sum _{x_{1},x_{2}}w(x_{1},x_{2})p(x_{r_{1}})p(x_{r_{2}}), \end{aligned}$$\end{document}

and the weighted Fleiss-type chance agreement as

(2.3) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} p_{wf}= & {} R^{-2}\sum _{r_{1},r_{2}}\sum _{x_{1},x_{2}}w(x_{1},x_{2})p(x_{r_{1}})p(x_{r_{2}}). \end{aligned}$$\end{document}

The difference between the two notions of weighted chance agreement should be clear enough. The Fleiss-type probability of chance agreement counts the cases when a judge agrees with himself, while the Cohen-type does not.

Letting \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\{x_{1},x_{2},\ldots ,x_{C}\}$$\end{document} be the set of possible ratings and w an agreement weighting function, define the weighting matrix W as the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C\times C$$\end{document} matrix with elements \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$W_{ir}=w(x_{i},x_{r})$$\end{document} . Using W, we can write \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_{wf}=p^{T}Wp,$$\end{document} where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p=R^{-1}\sum _{r}p_{r}$$\end{document} is the marginal distribution of ratings, and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_{wc}=\left( {\begin{array}{c}R\\ 2\end{array}}\right) ^{-1}\sum _{r_{1}>r_{2}}p_{r_{1}}^{T}Wp_{r_{2}}$$\end{document} .
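The two matrix expressions can be compared on a small example. The sketch below uses made-up marginals and nominal weights, and the helper names are hypothetical. It assumes W is symmetric, so summing over pairs with r1 < r2 gives the same value as summing over r1 > r2.

```python
from itertools import combinations

def quad_form(u, W, v):
    """Compute u^T W v for plain Python lists."""
    return sum(u[i] * W[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))

def chance_agreements(marginals, W):
    """Return (p_wc, p_wf) from the judges' marginal distributions p_r."""
    R, C = len(marginals), len(W)
    pairs = list(combinations(range(R), 2))
    # Cohen-type: average over pairs of distinct judges, equation (2.2).
    p_wc = sum(quad_form(marginals[a], W, marginals[b]) for a, b in pairs) / len(pairs)
    # Fleiss-type: p^T W p with the pooled marginal p, equation (2.3).
    pooled = [sum(m[x] for m in marginals) / R for x in range(C)]
    p_wf = quad_form(pooled, W, pooled)
    return p_wc, p_wf

# Made-up marginals for R = 3 judges and C = 3 categories, nominal weights.
W = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
marginals = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4]]
p_wc, p_wf = chance_agreements(marginals, W)

# Fleiss-type = Cohen-type pairs plus the self-agreement terms p_r^T W p_r:
# R^2 * p_wf = 2 * binom(R, 2) * p_wc + sum_r p_r^T W p_r, here with R = 3.
self_terms = sum(quad_form(m, W, m) for m in marginals)
reconstructed = (6 * p_wc + self_terms) / 9
print(p_wc, p_wf, reconstructed)
```

The `self_terms` sum is exactly the contribution of a judge agreeing with himself, which is what distinguishes the Fleiss-type from the Cohen-type chance agreement.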

When w is the nominal weight, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_{wa},p_{wc}$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_{wf}$$\end{document} are proper probabilities, and are often referred to as unweighted probabilities of (chance) agreement. We do not require the weighting function to be non-negative, hence the quantities \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_{wa},p_{wc}$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_{wf}$$\end{document} are not, in general, proper probabilities. Since the number of categories C is finite, however, we may assume that the weighting function is non-negative by normalizing, i.e., redefining the weighting function as \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$1-[1-w(x_{1},x_{2})]/\max _{x_{1},x_{2}}(1-w(x_{1},x_{2}))$$\end{document} .
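The three weighting functions and the normalization step can be sketched as follows. The rating scale 1 to 4 and the helper names are hypothetical, and `normalize` assumes at least one weight is strictly below 1 (otherwise it would divide by zero).

```python
def weight_matrix(values, w):
    """Weighting matrix W with entries w(x_i, x_j) over the possible ratings."""
    return [[w(x, y) for y in values] for x in values]

def normalize(W):
    """Rescale so the smallest weight becomes 0 while the diagonal stays 1."""
    worst = max(1.0 - w for row in W for w in row)  # max of 1 - w(x1, x2)
    return [[1.0 - (1.0 - w) / worst for w in row] for row in W]

ratings = [1, 2, 3, 4]  # hypothetical ordinal scale with C = 4 categories

nominal = weight_matrix(ratings, lambda x, y: 1.0 if x == y else 0.0)
quadratic = weight_matrix(ratings, lambda x, y: 1.0 - (x - y) ** 2)
absolute = weight_matrix(ratings, lambda x, y: 1.0 - abs(x - y))

quadratic_n = normalize(quadratic)
absolute_n = normalize(absolute)
print(quadratic[0][3], quadratic_n[0][3], absolute_n[0][1])
```

On this scale the raw quadratic weight between the extreme categories is negative, and after normalization it becomes 0, so the normalized weighted agreements can be read as proper probabilities.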

2.2. The Knowledge Coefficient Theorem

As we have seen, well-known coefficients such as Scott’s pi and the Brennan–Prediger coefficient can be understood as knowledge coefficients, albeit under restrictive assumptions such as all judges being equally skilled. The following theorem describes less stringent sets of assumptions that keep interpretable expressions for the knowledge coefficient. To get anywhere, we must assume either that all guessing distributions are equal or that the skill-difficulty parameters have zero pairwise covariance. In addition, we must assume something about either the true distribution or the guessing distribution. However, we never have to assume that every judge is equally competent, and we never have to assume anything about the number of categories rated. The content of the theorem, including the required assumptions, is summarized in Table 1.

Table 1

Coefficients covered in this paper.

Note: New coefficients in italics

Theorem 2

(Knowledge Coefficient Theorem) Let w be any agreement weighting function and W its associated agreement weighting matrix. Then the following hold:

(i) Assume all guessing distributions are equal, i.e., \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_{r}(x)=q(x)$$\end{document} . Then the following are equivalent:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} q(x)=t(x),\quad p(x)=t(x),\quad q(x)=p(x) \end{aligned}$$\end{document}
Assuming any of these, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_{wc}=p_{wf}$$\end{document} , and the knowledge coefficient equals both the weighted multi-rater Cohen’s kappa (Conger’s kappa) and the weighted Fleiss’ kappa
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \nu =\nu _{C}=\frac{p_{wa}-p_{wc}}{1-p_{wc}};\quad \nu =\nu _{F}=\frac{p_{wa}-p_{wf}}{1-p_{wf}}. \end{aligned}$$\end{document}
(ii) Assume all guessing distributions \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_{r}$$\end{document} are uniform. Then the knowledge coefficient equals the Brennan–Prediger coefficient,
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \nu =\nu _{BP}=\frac{p_{a}-1/C}{1-1/C}, \end{aligned}$$\end{document}
where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_{a}$$\end{document} denotes the weighted agreement with nominal weights.
(iii) Assume that the skill-difficulty parameters have pairwise covariance equal to zero, i.e., \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E[s_{r_{1}}s_{r_{2}}]=E[s_{r_{1}}]E[s_{r_{2}}]$$\end{document} for all \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r_{1},r_{2}$$\end{document} . Then the knowledge coefficient equals
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \nu =\frac{p_{wa}-p{}_{wc}}{1-t^{T}Wt}. \end{aligned}$$\end{document}
In particular, if \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t(x)=p(x)$$\end{document} , it equals the “Cohen–Fleiss” coefficient
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \nu =\nu _{CF}=\frac{p_{wa}-p_{wc}}{1-p_{wf}}. \end{aligned}$$\end{document}
Moreover, if t is uniform, it equals the “Cohen–Brennan–Prediger” coefficient,
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \nu =\nu _{CBP}=\frac{p_{wa}-p_{wc}}{1-{\varvec{1}}^{T}W{\varvec{1}}/C^{2}}. \end{aligned}$$\end{document}

Proof

The knowledge coefficient theorem is proved in the appendix, page 18. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\square $$\end{document}

Every coefficient in the theorem can be estimated by replacing \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_{wa}$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_{wf}$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_{wc}$$\end{document} with their sample variants. Under any of the equivalent conditions of Theorem 2 part (i), we have that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nu $$\end{document} equals both \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nu _{C}$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nu _{F}$$\end{document} .

The coefficient \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nu _{F}=\frac{p_{wa}-p_{wf}}{1-p_{wf}}$$\end{document} is a weighted Fleiss’ kappa. This coefficient is strongly related to the weighted Krippendorff’s alpha, which we denote by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{\alpha }}$$\end{document} . Indeed, it is easy to see that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{\alpha }}$$\end{document} is a linear transformation of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{\nu }}_{F}$$\end{document} ,

(2.4) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} {\hat{\alpha }}={\hat{\nu }}_{F}+\frac{1}{N}(1-{\hat{\nu }}_{F}), \end{aligned}$$\end{document}

where N is the total number of ratings made (see the appendix of Moss, 2023) and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{\nu }}_{F}$$\end{document} is the sample weighted Fleiss’ kappa. Thus \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\hat{\alpha }}$$\end{document} is a consistent estimator of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nu _{F}$$\end{document} .
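The plug-in estimates and the relation (2.4) can be illustrated on a toy data set. This is a sketch under assumptions: the sample variants are taken to be the obvious empirical averages (the paper's exact definitions may differ in detail), ratings are coded 0 to C-1, the data are complete so that N = nR, the weighting function is symmetric, and all names and data are made up.

```python
from itertools import combinations

def nominal(x, y):
    return 1.0 if x == y else 0.0

def sample_quantities(ratings, C, w):
    """Plug-in estimates (p_wa, p_wc, p_wf) from an n x R table of ratings in 0..C-1."""
    n, R = len(ratings), len(ratings[0])
    pairs = list(combinations(range(R), 2))
    # Empirical marginal distribution of each judge, and the pooled marginal.
    marg = [[sum(row[r] == x for row in ratings) / n for x in range(C)] for r in range(R)]
    pooled = [sum(m[x] for m in marg) / R for x in range(C)]
    p_wa = sum(w(row[a], row[b]) for row in ratings for a, b in pairs) / (n * len(pairs))
    p_wc = sum(marg[a][x] * w(x, y) * marg[b][y]
               for a, b in pairs for x in range(C) for y in range(C)) / len(pairs)
    p_wf = sum(pooled[x] * w(x, y) * pooled[y] for x in range(C) for y in range(C))
    return p_wa, p_wc, p_wf

def chance_corrected(p_a, chance_num, chance_den):
    """(p_a - chance_num) / (1 - chance_den), the common shape of the coefficients."""
    return (p_a - chance_num) / (1.0 - chance_den)

# Tiny made-up data set: n = 6 items, R = 3 judges, C = 3 categories.
ratings = [[0, 0, 0], [1, 1, 2], [2, 2, 2], [0, 1, 0], [1, 1, 1], [2, 0, 2]]
p_wa, p_wc, p_wf = sample_quantities(ratings, 3, nominal)

nu_c = chance_corrected(p_wa, p_wc, p_wc)    # multi-rater Cohen's (Conger's) kappa
nu_f = chance_corrected(p_wa, p_wf, p_wf)    # Fleiss' kappa
nu_cf = chance_corrected(p_wa, p_wc, p_wf)   # Cohen-Fleiss coefficient
N = len(ratings) * len(ratings[0])           # total number of ratings
alpha_hat = nu_f + (1.0 - nu_f) / N          # equation (2.4), applied literally
print(nu_c, nu_f, nu_cf, alpha_hat)
```

The gap between `alpha_hat` and `nu_f` is (1 - nu_f) / N, which shrinks as the number of ratings grows; this is the reason the two estimators share the same limit.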

On the other hand, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nu _{C}$$\end{document} is a weighted multi-rater Cohen’s kappa of the form discussed by Berry and Mielke (1988) and Janson and Olsson (2001); it can also be regarded as a weighted Conger’s kappa (Conger, 1980). I have not seen anything like \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\nu _{CF}$$\end{document} , a curious combination of Cohen’s kappa and Fleiss’ kappa, probably because it is not a classical chance-corrected measure of agreement, as its numerator chance agreement \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_{wc}$$\end{document} is distinct from the denominator chance agreement \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_{wf}$$\end{document} . The required condition for the Cohen–Fleiss coefficient, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p(x)=t(x)$$\end{document} , holds if and only if

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} t(x)=\frac{\sum _{r}(1-E[s_{r}])q_{r}(x)}{\sum _{r}(1-E[s_{r}])}, \end{aligned}$$\end{document}

which holds trivially if \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_{r}(x)=t(x)$$\end{document} for all r. Hence the Cohen–Fleiss coefficient is consistent for the population knowledge coefficient under strictly more situations than weighted Fleiss’ kappa and weighted multi-rater Cohen’s kappa, and may be preferred to them by a researcher who buys the rationale behind the guessing model.

The Cohen–Brennan–Prediger coefficient will be equal to the knowledge coefficient if the true distribution equals the uniform distribution and the skill-difficulty parameters have pairwise covariances equal to zero. This scenario may be uncommon, but can happen if the designer of the agreement study has complete control over the true ratings.

Part (ii) of Theorem 2 concerns the case when a judge who does not know the correct classification of an item guesses uniformly at random, so that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$q_{r}(x)=C^{-1}$$\end{document} for all judges r. This assumption is used by e.g. Brennan and Prediger (1981), Maxwell (1977), Gwet (2008) and Perreault and Leigh (1989).

Example 3

Zapf et al. (2016) did a case study on histopathological assessment of breast cancer. The number of judges was \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R=4$$\end{document} , the number of breast cancer biopsies rated was \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=50,$$\end{document} and the number of categories was \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C=5$$\end{document} . The estimated coefficients are

In this case, the coefficients are quite close, suggesting that the guessing distributions are close to the marginal distribution and that the marginal distribution is close to the uniform distribution. The observed marginal distribution in this data set is (0.255, 0.025, 0.12, 0.21, 0.39). This appears to be quite far away from the uniform distribution, which raises the question of how sensitive the Brennan–Prediger coefficient is to the uniformity assumption.
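One way to quantify how far the observed marginal is from uniform is to compare the two chance-agreement terms under nominal weights. The sketch below uses only the marginal quoted in Example 3 and treats it as the pooled marginal. The agreement levels in the loop are hypothetical and are NOT the values from the study.

```python
marginal = [0.255, 0.025, 0.12, 0.21, 0.39]  # observed marginal quoted in Example 3
C = len(marginal)

fleiss_chance = sum(p * p for p in marginal)  # p^T W p with nominal weights
uniform_chance = 1.0 / C                      # chance term of Brennan-Prediger

def corrected(p_a, p_chance):
    """Chance-corrected agreement (p_a - p_chance) / (1 - p_chance)."""
    return (p_a - p_chance) / (1.0 - p_chance)

# Hypothetical agreement levels, used only to show the size of the gap.
for p_a in (0.5, 0.7, 0.9):
    print(p_a,
          round(corrected(p_a, fleiss_chance), 3),
          round(corrected(p_a, uniform_chance), 3))
```

The Fleiss-type chance agreement implied by this marginal is about 0.276, against 0.2 under uniformity, and the two corrected coefficients move closer together as the raw agreement approaches 1.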

3. Sensitivity and Performance

Recall the assumptions on the coefficients of Theorem 2:

• Brennan–Prediger: All guessing distributions are equal to the uniform distribution.
• Cohen’s kappa, Fleiss’ kappa: All guessing distributions are equal, the true distribution equals the marginal distribution.
• Cohen–Fleiss: The true distribution equals the marginal guessing distribution, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E[s_{r_{1}}s_{r_{2}}]=E[s_{r_{1}}]E[s_{r_{2}}]$$\end{document} for all \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r_{1},r_{2}$$\end{document} .
• Cohen–Brennan–Prediger: The true distribution is uniform, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E[s_{r_{1}}s_{r_{2}}]=E[s_{r_{1}}]E[s_{r_{2}}]$$\end{document} for all \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r_{1},r_{2}$$\end{document} .

These assumptions are quite stringent, and will realistically never hold exactly. In this section, we do two sensitivity–performance studies to check how well the coefficients perform when the assumptions are violated. Theorem 2 contains two classes of coefficients. The first class, containing the Brennan–Prediger coefficient in addition to Cohen’s and Fleiss’ kappa, requires at minimum that all guessing distributions are equal. The second class, containing the Cohen–Fleiss and Cohen–Brennan–Prediger coefficients, requires that all pairs of skill-difficulty parameters have zero covariance. We will do two studies, one where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s_{r_{1}}$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s_{r_{2}}$$\end{document} have zero covariance and one where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s_{r_{1}}$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s_{r_{2}}$$\end{document} are correlated. We restrict our study to the case of nominal weights.

3.1. When $E[s_{r_{1}}s_{r_{2}}]=E[s_{r_{1}}]E[s_{r_{2}}]$

We will use a special case of the guessing model that we call the judge skill model. In this model, the skill-difficulty parameter $s$ is deterministic. This generalizes the models used by, e.g., Perreault and Leigh (1989) and van Oest (2019) to allow for judges with different skill levels. Since $s$ is deterministic, $E[s_{r_{1}}s_{r_{2}}]=E[s_{r_{1}}]E[s_{r_{2}}]$ and the main condition of Theorem 2 part (iii) is satisfied. Under the judge skill model, it is fairly easy to calculate the theoretical values of the five coefficients in Theorem 2. To calculate $p_{wa}$, we use representation (i) of Lemma 7 (p. 18) in the appendix.
Since $p_{wf}=p^{T}Wp$, where $p$ is the marginal distribution, and $p_{wc}=\binom{R}{2}^{-1}\sum_{r_{1}>r_{2}}p_{r_{1}}^{T}Wp_{r_{2}}$, the values of $p_{wf}$ and $p_{wc}$ are easily calculated.
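As an illustration of how such theoretical values can be computed, the following Python sketch evaluates $p_{wa}$, $p_{wc}$, $p_{wf}$ and the five coefficients under a judge skill model with nominal weights. It rests on assumptions that are not stated in this section: judge $r$ reports the true category with probability $s_{r}$ and otherwise guesses from a guessing distribution $q_{r}$, independently of everything else; the pooled marginal $p$ is the average of the judges' marginals; the coefficients take the usual ratio forms, for example $(p_{wa}-p_{wf})/(1-p_{wf})$ for Fleiss’ kappa; and the knowledge coefficient is the average of $s_{r_{1}}s_{r_{2}}$ over pairs of judges. The pairwise agreement probability is derived directly from these assumptions rather than from Lemma 7, and the function name and arguments are ours.

```python
from itertools import combinations


def dot(u, v):
    """Inner product u^T v; with nominal weights W is the identity, so u^T W v = u^T v."""
    return sum(a * b for a, b in zip(u, v))


def judge_skill_coefficients(s, t, q):
    """Theoretical values under an assumed judge skill model with nominal weights.

    s: list of R deterministic skill-difficulty parameters in [0, 1]
    t: true distribution over C categories
    q: list of R guessing distributions, one per judge
    """
    R, C = len(s), len(t)
    pairs = list(combinations(range(R), 2))

    # Marginal distribution of judge r: report the truth w.p. s[r], else guess from q[r].
    p_r = [[s[r] * t[k] + (1 - s[r]) * q[r][k] for k in range(C)] for r in range(R)]
    # Pooled marginal distribution (assumed to be the average over judges).
    p = [sum(p_r[r][k] for r in range(R)) / R for k in range(C)]

    def agree(i, j):
        # P(judges i and j give the same rating) when guesses are independent.
        return (s[i] * s[j]
                + s[i] * (1 - s[j]) * dot(t, q[j])
                + (1 - s[i]) * s[j] * dot(q[i], t)
                + (1 - s[i]) * (1 - s[j]) * dot(q[i], q[j]))

    p_wa = sum(agree(i, j) for i, j in pairs) / len(pairs)
    p_wc = sum(dot(p_r[i], p_r[j]) for i, j in pairs) / len(pairs)
    p_wf = dot(p, p)
    p_u = 1 / C  # 1^T W 1 / C^2 reduces to 1/C for nominal weights

    return {
        "nu": sum(s[i] * s[j] for i, j in pairs) / len(pairs),
        "F": (p_wa - p_wf) / (1 - p_wf),
        "C": (p_wa - p_wc) / (1 - p_wc),
        "BP": (p_wa - p_u) / (1 - p_u),
        "CF": (p_wa - p_wc) / (1 - p_wf),
        "CBP": (p_wa - p_wc) / (1 - p_u),
    }
```

When the true distribution and all guessing distributions are uniform, every coefficient returned by this sketch coincides with `nu`, which gives a quick sanity check on the formulas.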

The parameters R, C and s are sampled as follows:

(i) The number of judges (R) is sampled uniformly from [2, 20].
(ii) The number of categories (C) is sampled uniformly from [2, 10].
(iii) The R skill–difficulty parameters $s_{1},\ldots,s_{R}$ are drawn independently from a beta distribution with parameters 7 and 1.5. This is a slightly dispersed, asymmetric distribution with a mean of 0.82.
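The sampling steps (i) to (iii) can be sketched in a few lines of Python. We assume here that "uniformly from [2, 20]" means uniformly over the integers 2, ..., 20 with both endpoints included, and likewise for the categories; the function name `sample_parameters` and the seed are illustrative only.

```python
import random


def sample_parameters(rng):
    """Draw (R, C, s) following steps (i)-(iii), under the integer-range assumption."""
    R = rng.randint(2, 20)  # number of judges, endpoints included
    C = rng.randint(2, 10)  # number of categories, endpoints included
    # Independent Beta(7, 1.5) skill-difficulty parameters; the mean is 7 / 8.5.
    s = [rng.betavariate(7, 1.5) for _ in range(R)]
    return R, C, s


rng = random.Random(2023)
R, C, s = sample_parameters(rng)
```

The mean of a beta distribution with parameters 7 and 1.5 is $7/(7+1.5)\approx 0.82$, in line with the value quoted in step (iii).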

We study what happens when the true distribution deviates from the uniform (an assumption for $\nu_{CBP}$) and/or the guessing distribution deviates from the uniform distribution (an assumption for $\nu_{BP}$). The numbers are the simulated mean absolute deviations from the true knowledge coefficient, $E|\nu-\nu_{x}|$, where $x$ is one of F, C, BP, CF, or CBP. The numbers that are smallest by orders of magnitude on each row are in bold.

3.1.1. True Distribution Centered on the Uniform Distribution

If the variability of the true distribution is “None”, it equals the uniform distribution. If the variability is “Low”, it is sampled from a symmetric Dirichlet distribution (Johnson, Kotz, & Balakrishnan, 1994, Chapter 49) with concentration parameter $\alpha=10$; if the variability is “High”, $\alpha=0.5$. Likewise, for the guessing distributions, if the variability is “None”, all guessing distributions are equal to the uniform distribution. If the variability is “Low”, the guessing distributions are sampled from a symmetric Dirichlet distribution with concentration parameter $\alpha=10$. Finally, if the variability is “High”, they are sampled from a symmetric Dirichlet distribution with $\alpha=0.5$.
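A minimal sketch of the three variability levels, using only the Python standard library, is given below. A symmetric Dirichlet draw is obtained by normalizing independent gamma variates; we assume that the concentration parameter $\alpha$ applies to each of the $C$ components. The names `symmetric_dirichlet`, `draw_distribution` and `ALPHA` are ours.

```python
import random


def symmetric_dirichlet(C, alpha, rng):
    """Draw from a symmetric Dirichlet on C categories via normalized gamma variates."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(C)]
    total = sum(g)
    return [x / total for x in g]


# Concentration parameters for the two non-degenerate variability levels.
ALPHA = {"Low": 10.0, "High": 0.5}


def draw_distribution(C, variability, rng):
    """Return the uniform distribution for "None", else a symmetric Dirichlet draw."""
    if variability == "None":
        return [1.0 / C] * C
    return symmetric_dirichlet(C, ALPHA[variability], rng)


rng = random.Random(1)
t = draw_distribution(5, "High", rng)                     # a true distribution
q = [draw_distribution(5, "Low", rng) for _ in range(3)]  # guessing distributions for 3 judges
```

With $\alpha=10$ the draws stay close to the uniform distribution, while $\alpha=0.5$ tends to put most of the mass on a few categories.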

Table 2

Sensitivity analysis when $E[s_{r_{1}}s_{r_{2}}]=E[s_{r_{1}}]E[s_{r_{2}}]$. True distribution centered on the uniform distribution.

$^{*}$ Variability of the true distributions. Baseline: True distribution is uniform.

$^{\star}$ Variability of the guessing distributions. Baseline: All guessing distributions are equal to the true distribution.

From Table 2 we see that the Cohen–Brennan–Prediger coefficient performs worst in every setting, usually by a large margin, except when the true distribution is uniform. Moreover, the Brennan–Prediger coefficient performs well in every scenario. The bias $|\nu_{BP}-\nu|$ will likely be overshadowed by sampling variability for most conceivable sample sizes. Finally, there is little difference between Cohen’s kappa, Fleiss’ kappa, and the Cohen–Fleiss coefficient. Their biases are quite small, at least when the true distribution is not far from the uniform distribution.

3.1.2. True Distribution Centered on the Marginal Distribution

To derive Fleiss’ kappa, Cohen’s kappa, and the Cohen–Fleiss coefficient, we assumed that the true distribution equals the marginal distribution. We can use an asymmetric Dirichlet distribution to extend this scenario, just as we used the symmetric Dirichlet distribution to extend the scenario when the true distribution is uniform. This time we do not use the uniform distribution as the base true distribution, but instead a randomly generated distribution $h$, drawn from a symmetric Dirichlet distribution with $\alpha=5$. The rest of the settings are identical to the previous sensitivity study.

Table 3

$^{*}$ Variability of the true distributions. Baseline: True distribution equals the marginal guessing distribution.

$^{\star}$ Variability of the guessing distributions. Baseline: All guessing distributions are equal to the marginal guessing distribution.

The results are in Table 3. We see that the Cohen–Brennan–Prediger coefficient performs worst in every setting, usually by a large margin. Surprisingly, the Brennan–Prediger coefficient performs best in 6/9 cases, and its performance is good in the remaining cases too. Some of the biases in the table are unacceptably large. Only the Brennan–Prediger coefficient has a bias less than 0.1 in every case. Finally, there is little difference between Cohen’s kappa, Fleiss’ kappa, and the Cohen–Fleiss coefficient. The Cohen–Fleiss coefficient does better, especially when the true distribution equals the marginal distribution, but the difference in performance is insignificant under slight deviations from equality.

3.2. When $E[s_{r_{1}}s_{r_{2}}]\ne E[s_{r_{1}}]E[s_{r_{2}}]$

To model dependent skill-difficulty parameters, let $F$ be an $R$-variate distribution with $s\sim F$. Evidently, the only restriction on $F$ is that it is a multivariate distribution function on $[0,1]^{R}$. A natural way to model this situation is to use copulas for the dependence structure and a density on $[0,1]$ for the marginals (Nelsen, 2007). We will use the $R$-variate Gaussian copula with uniform correlation structure, i.e., the correlation matrix parameter

$$\begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}.$$

Denote this Gaussian copula by $C_{\rho}$. Let $H_{(a,b)}$ be the cumulative distribution function of a beta distribution with parameters $a$ and $b$, and define

$$F_{\rho,a,b}(s_{1},\ldots,s_{R})=C_{\rho}\left(H_{(a,b)}(s_{1}),\ldots,H_{(a,b)}(s_{R})\right).$$

Then $F_{\rho,a,b}$ is a reasonable model for dependent skill-difficulty parameters.
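One way to draw from such a model is sketched below, under the assumption that $F_{\rho,a,b}$ is the Gaussian copula $C_{\rho}$ with common pairwise correlation $\rho\ge 0$, combined with beta marginals $H_{(a,b)}$. The equicorrelated normal vector is generated from a shared factor, mapped to uniforms with the standard normal distribution function, and then passed through the beta quantile function. To stay within the Python standard library, the beta quantile is obtained by bisection on a Simpson-rule approximation of the beta distribution function; this is a convenience choice of ours that is only adequate when $a,b>1$, and a numerical library would normally be used instead. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def beta_cdf(x, a, b, steps=1000):
    """Beta(a, b) CDF by Simpson's rule; adequate for a, b > 1 (bounded density)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)

    def pdf(u):
        if u <= 0.0 or u >= 1.0:
            return 0.0
        return math.exp(log_norm + (a - 1) * math.log(u) + (b - 1) * math.log(1 - u))

    h = x / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3


def beta_quantile(u, a, b):
    """Invert the approximate Beta CDF by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def sample_skills(R, rho, a, b, rng):
    """One draw of (s_1, ..., s_R) from a Gaussian copula with Beta(a, b) marginals."""
    phi = NormalDist().cdf
    z0 = rng.gauss(0.0, 1.0)  # shared factor inducing pairwise correlation rho
    skills = []
    for _ in range(R):
        z = math.sqrt(rho) * z0 + math.sqrt(1 - rho) * rng.gauss(0.0, 1.0)
        skills.append(beta_quantile(phi(z), a, b))
    return skills


rng = random.Random(7)
s = sample_skills(5, 0.2, 7, 1.5, rng)
```

The shared-factor construction gives each pair of latent normal variables correlation exactly $\rho$, which corresponds to the uniform correlation structure; it does not cover negative $\rho$.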

We use the small correlation parameter $\rho=0.2$. The rest of the settings are exactly the same as in the previous setting, including the beta parameters $a=7$, $b=1.5$. From Tables 4 and 5 we see that the Cohen–Brennan–Prediger coefficient still outperforms the alternatives when the true distribution equals the uniform distribution, but only by an order of magnitude. Since the alternatives to the Cohen–Brennan–Prediger coefficient also perform well in this situation, with biases likely to be overshadowed by sampling error, the case for it remains weak. Similarly, we see that the Cohen–Fleiss coefficient still outperforms Cohen’s kappa and Fleiss’ kappa, roughly by one order of magnitude, a much smaller margin than before. These comments also hold for $\rho=0.5$ and $\rho=0.9$; the corresponding results can be found in the online appendix (p. 23).

Table 4

Sensitivity analysis when $E[s_{r_{1}}s_{r_{2}}]\ne E[s_{r_{1}}]E[s_{r_{2}}]$. True distribution centered on the uniform distribution.

$^{*}$ Variability of the true distributions. Baseline: True distribution is uniform.

$^{\star}$ Variability of the guessing distributions. Baseline: All guessing distributions are equal to the true distribution.

Table 5

Sensitivity analysis when $E[s_{r_{1}}s_{r_{2}}]\ne E[s_{r_{1}}]E[s_{r_{2}}]$. True distribution centered on the marginal guessing distribution.

$^{*}$ Variability of the true distributions. Baseline: True distribution equals the marginal guessing distribution.

$^{\star}$ Variability of the guessing distributions. Baseline: All guessing distributions are equal.

3.3. Recommendations

We can draw three tentative conclusions from the sensitivity analysis.

(i) Unless the researcher can make sure the true distribution is exactly uniform, the Cohen–Brennan–Prediger coefficient is probably not worth reporting. Its bias is often unacceptably large, frequently larger than 0.1.
(ii) The Brennan–Prediger coefficient performs reasonably well in all situations, with biases less than 0.01 when the true distribution is centered on the uniform distribution. It performs decently in the second scenario as well, with biases at around 0.05 across the board.
(iii) The Cohen–Fleiss coefficient does slightly better than Cohen’s kappa and Fleiss’ kappa, but not enough to be important.

Since the Cohen–Fleiss coefficient does slightly better than Fleiss’ kappa and Cohen’s kappa, it may appear prudent to report it. However, we believe it would be best not to, because both Fleiss’ kappa and Cohen’s kappa are well-known chance-corrected measures of agreement, whereas the Cohen–Fleiss coefficient is neither well-known nor a chance-corrected measure of agreement. We recommend reporting Cohen’s kappa (or Fleiss’ kappa) together with the Brennan–Prediger coefficient. This solution accounts both for the scenario when the marginal distribution is close to the true distribution and the scenario when the uniform distribution is close to the marginal distribution.

4. Inference

The asymptotic distribution of the coefficients in Table 1 can readily be calculated using the theory of U-statistics.

The following lemma is instrumental in constructing the confidence intervals.

Lemma 4

Define the parameter vector $\boldsymbol{p}=(p_{wa},p_{wc},p_{wf})$ and its estimator $\hat{\boldsymbol{p}}=(\hat{p}_{wa},\hat{p}_{wc},\hat{p}_{wf})$, and let $\Sigma$ be the covariance matrix with elements

$$\begin{aligned}
\sigma_{11} &= \sigma_{A}^{2} = \operatorname{Var}\mu_{wa}(\boldsymbol{X}_{1}),\quad \sigma_{12} = \sigma_{AC} = 2\operatorname{Cov}\left(\mu_{wa}(\boldsymbol{X}_{1}),\mu_{wc}(\boldsymbol{X}_{1})\right),\\
\sigma_{22} &= \sigma_{C}^{2} = 4\operatorname{Var}\mu_{wc}(\boldsymbol{X}_{1}),\quad \sigma_{13} = \sigma_{AF} = 2\operatorname{Cov}\left(\mu_{wa}(\boldsymbol{X}_{1}),\mu_{wf}(\boldsymbol{X}_{1})\right),\\
\sigma_{33} &= \sigma_{F}^{2} = 4\operatorname{Var}\mu_{wf}(\boldsymbol{X}_{1}),\quad \sigma_{23} = \sigma_{CF} = 4\operatorname{Cov}\left(\mu_{wc}(\boldsymbol{X}_{1}),\mu_{wf}(\boldsymbol{X}_{1})\right).
\end{aligned}$$

Then

$$\sqrt{n}\left(\hat{\boldsymbol{p}}-\boldsymbol{p}\right)\xrightarrow{d}N(0,\Sigma).$$

Proof

See Lemma 1 of Moss (2023). The definitions of $\mu_{wa}(\boldsymbol{X}_{1})$, $\mu_{wc}(\boldsymbol{X}_{1})$, and $\mu_{wf}(\boldsymbol{X}_{1})$ can be found there. $\square$

An application of the delta method yields the following.

Proposition 5

The coefficients in Table 1 are asymptotically normal, and their asymptotic variances are

(4.1) $$\sigma_{F}^{2}=\sigma_{A}^{2}\frac{1}{(1-p_{wf})^{2}}-2\sigma_{FA}\frac{1-p_{wa}}{(1-p_{wf})^{3}}+\sigma_{F}^{2}\frac{(1-p_{wa})^{2}}{(1-p_{wf})^{4}},$$

(4.2) $$\sigma_{C}^{2}=\sigma_{A}^{2}\frac{1}{(1-p_{wc})^{2}}-2\sigma_{CA}\frac{1-p_{wa}}{(1-p_{wc})^{3}}+\sigma_{C}^{2}\frac{(1-p_{wa})^{2}}{(1-p_{wc})^{4}},$$

(4.3) $$\sigma_{BP}^{2}=\sigma_{A}^{2}\frac{C^{2}}{(1-C)^{2}},$$

(4.4) $$\sigma_{CF}^{2}=(1-p_{wf})^{-2}\left(1,-1,\nu_{CF}\right)\Sigma\left(1,-1,\nu_{CF}\right)^{T},$$

(4.5) $$\sigma_{CBP}^{2}=\frac{\sigma_{A}^{2}-2\sigma_{CA}+\sigma_{C}^{2}}{(1-\boldsymbol{1}^{T}W\boldsymbol{1}/C^{2})^{2}}.$$

Proof

The expressions for $\sigma_{F}^{2}$ and $\sigma_{C}^{2}$ are from Moss (2023, Proposition 2). The simple proof for the Cohen–Fleiss coefficient is in the appendix, p. 6. The asymptotic variance of the Brennan–Prediger coefficient is well-known and the easiest to derive. The variance of the Cohen–Brennan–Prediger coefficient follows immediately from an application of the delta method. $\square$
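The variance formulas can be evaluated mechanically once $\Sigma$ and the three agreement probabilities are available. The sketch below does so for nominal weights, where $\boldsymbol{1}^{T}W\boldsymbol{1}/C^{2}=1/C$. It assumes that the coefficients are the ratios $(p_{wa}-p_{e})/(1-p_{e})$ with the appropriate chance term, and that the Cohen–Fleiss coefficient is $(p_{wa}-p_{wc})/(1-p_{wf})$, so that the third entry of its gradient is $\nu_{CF}$; both are assumptions on our part, as is the function name.

```python
def asymptotic_variances(p_wa, p_wc, p_wf, Sigma, C):
    """Delta-method variances of the five coefficients for nominal weights.

    Sigma is the 3x3 covariance matrix of (p_wa, p_wc, p_wf) from Lemma 4,
    given as nested lists in that order; C is the number of categories.
    """
    s_A, s_C, s_F = Sigma[0][0], Sigma[1][1], Sigma[2][2]
    s_AC, s_AF = Sigma[0][1], Sigma[0][2]

    def quad(v):
        # Quadratic form v^T Sigma v.
        return sum(v[i] * Sigma[i][j] * v[j] for i in range(3) for j in range(3))

    def kappa_var(p_e, s_e, s_Ae):
        # Shared form of (4.1) and (4.2) with chance agreement p_e.
        return (s_A / (1 - p_e) ** 2
                - 2 * s_Ae * (1 - p_wa) / (1 - p_e) ** 3
                + s_e * (1 - p_wa) ** 2 / (1 - p_e) ** 4)

    nu_CF = (p_wa - p_wc) / (1 - p_wf)  # assumed definition of the Cohen-Fleiss coefficient
    return {
        "F": kappa_var(p_wf, s_F, s_AF),
        "C": kappa_var(p_wc, s_C, s_AC),
        "BP": s_A * C ** 2 / (1 - C) ** 2,
        "CF": quad([1.0, -1.0, nu_CF]) / (1 - p_wf) ** 2,
        "CBP": (s_A - 2 * s_AC + s_C) / (1 - 1 / C) ** 2,
    }
```

Each entry is the quadratic form $g^{T}\Sigma g$, where $g$ is the gradient of the corresponding coefficient with respect to $(p_{wa},p_{wc},p_{wf})$, so the output can be checked against numerical differentiation.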

To estimate $\sigma_{x}^{2}$, where $x$ is a placeholder for F, C, BP, CF, or CBP, we use an empirical approach that coincides with that of Gwet (2021) in the special case of Fleiss’ kappa with nominal weights. See the comments following Proposition 1 of Moss (2023) for details.

4.1. Confidence Intervals

Moss (2023) found that the arcsine interval tends to do slightly better than the untransformed interval for agreement coefficients with rectangular design. For that reason, we will only look at the arcsine interval here. Using the delta method, together with the fact that $\frac{d}{dx}\arcsin(x)=1/\sqrt{1-x^{2}}$, we find that

(4.6) $$\sqrt{n}(\arcsin\hat{\nu}_{x}-\arcsin\nu_{x})\xrightarrow{d}N\left(0,(1-\nu_{x}^{2})^{-1}\sigma_{x}^{2}\right),$$

which yields the interval

(4.7) $$CI_{x}=\sin\left(\arcsin\hat{\nu}_{x}\pm t_{1-\alpha/2}(n-1)\frac{\hat{\sigma}_{x}}{\sqrt{n(1-\hat{\nu}_{x}^{2})}}\right),$$

where $t_{1-\alpha/2}(n-1)$ is the $1-\alpha/2$ quantile of the $t$-distribution with $n-1$ degrees of freedom.
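A minimal Python sketch of the arcsine interval follows. It uses the standard error implied by (4.6), namely $\hat{\sigma}_{x}/\sqrt{n(1-\hat{\nu}_{x}^{2})}$, and takes the critical value as an argument because the $t$ quantile is not available in the Python standard library. Clamping the transformed limits to $[-\pi/2,\pi/2]$ is our own safeguard so that the back-transformed limits stay ordered; the function name and the numbers in the usage line are purely illustrative.

```python
import math


def arcsine_interval(nu_hat, sigma2_hat, n, crit):
    """Arcsine-transformed confidence interval for an agreement coefficient.

    nu_hat:     point estimate, strictly between -1 and 1
    sigma2_hat: estimated asymptotic variance of sqrt(n) * (nu_hat - nu)
    n:          number of rated items
    crit:       critical value, e.g. the 1 - alpha/2 quantile of t with n - 1 df
    """
    se = math.sqrt(sigma2_hat / (n * (1 - nu_hat ** 2)))
    centre = math.asin(nu_hat)
    # Clamp to the range where sin is increasing (our safeguard, not part of (4.7)).
    lower = max(centre - crit * se, -math.pi / 2)
    upper = min(centre + crit * se, math.pi / 2)
    return math.sin(lower), math.sin(upper)


# Hypothetical inputs: n = 20 items, and 2.093 is approximately the 0.975
# quantile of the t-distribution with 19 degrees of freedom.
limits = arcsine_interval(0.6, 0.5, 20, 2.093)
```

Because the sine function maps back into $[-1,1]$, the limits never leave the range of the coefficient, which is the main practical attraction of the transformation.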

Example 6

(Example 3 (cont.)) We calculate arcsine confidence intervals along with the point estimates for the five coefficients using the data from Zapf et al. (2016). The results are in Table 6.

Table 6

Confidence limits for Zapf et al. (2016).

4.2. Coverage of the Confidence Intervals

We use the arcsine interval and nominally weighted coefficients. The settings of our simulation study follow the settings of the first sensitivity analysis closely. We use $N=10{,}000$ repetitions and the following simulation parameters:

(i) Number of judges R. We use 2, 5, 20, corresponding to a small, medium, and large selection of judges.
(ii) Sample sizes n. We use $n=20$ and $n=100$, corresponding to small and large agreement studies.
(iii) Model. We simulate from the judge skill model used in the sensitivity study (p. 9).

In some cases the simulation yields data frames with only identical values. Our confidence interval construction does not cover these instances, so we discard such simulations and repeat the simulation until we get a data frame with at least two different values.
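The data-generating step and the discard rule can be sketched as follows. We assume that each item has a true category drawn from the true distribution $t$, and that judge $r$ reports it with probability $s_{r}$ and otherwise guesses from a guessing distribution $q_{r}$; the function names are hypothetical, and the ratings are stored as a list of rows rather than a data frame.

```python
import random


def simulate_ratings(n, s, t, q, rng):
    """Simulate an n x R table of ratings from an assumed judge skill model."""
    categories = range(len(t))
    rows = []
    for _ in range(n):
        truth = rng.choices(categories, weights=t)[0]  # true category of the item
        row = [
            truth if rng.random() < s[r] else rng.choices(categories, weights=q[r])[0]
            for r in range(len(s))
        ]
        rows.append(row)
    return rows


def simulate_non_degenerate(n, s, t, q, rng):
    """Redraw until the table contains at least two distinct ratings."""
    while True:
        rows = simulate_ratings(n, s, t, q, rng)
        if len({x for row in rows for x in row}) >= 2:
            return rows


rng = random.Random(3)
uniform = [0.25] * 4
ratings = simulate_non_degenerate(20, [0.9, 0.8, 0.7], uniform, [uniform] * 3, rng)
```

The redraw loop does not terminate if the inputs can only produce a single category, for example a degenerate true distribution combined with perfectly skilled judges, so such inputs should be excluded beforehand.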

4.2.1. True Distribution Centered on the Uniform Distribution

We use the same setup as in 3.1.1, where we studied deviations from two assumptions: (a) that the true distribution is centered on the uniform distribution, and (b) that the guessing distributions are equal.

Table 7

Coverage and lengths of confidence intervals, deviation from uniform.

Table 7 contains the results of the simulation. All coefficients, except the Brennan–Prediger coefficient, perform poorly when $t$ is far from the uniform. The Cohen–Fleiss coefficient has poor coverage when the true distribution is far away from the marginal distribution; likewise for Fleiss’ kappa and Cohen’s kappa. The Brennan–Prediger coefficient performs surprisingly well, with far better coverage than Cohen’s kappa, Fleiss’ kappa, and the Cohen–Fleiss coefficient. On the other hand, the Cohen–Brennan–Prediger coefficient performs poorly. The coverage when $t$ is far from the uniform or $n=100$ is unacceptably low for all coefficients except the Brennan–Prediger coefficient.

4.2.2. True Distribution Centered on the Marginal Distribution

We use the same setup as in 3.1.2, where we studied deviations from the assumption that the true distribution is centered on the marginal distribution and that the guessing distributions are equal.

Table 8

Coverage and lengths of confidence intervals, deviation from marginal.

The results are in Table 8. The most striking feature is, once again, the poor performance of every coefficient except the Brennan–Prediger coefficient. The Brennan–Prediger coefficient performs well, with a coverage of approximately 0.95 in most scenarios when $n=20$. What is more, its intervals are always the shortest. Cohen’s kappa, Fleiss’ kappa, and the Cohen–Fleiss coefficient perform decently, except when the marginal distribution is far away from the uniform, when their coverage is dismal. The Cohen–Brennan–Prediger coefficient performs worse than the others, with longer confidence intervals and very poor coverage. Compared to Table 7, the coverages in Table 8 are much worse. Even for the best-performing Brennan–Prediger coefficient, the coverage drops as low as 0.58 in one setting.

5. Concluding Remarks

In the guessing model, a judge either knows the correct classification with $100\%$ certainty or makes a guess. A more realistic model would let knowledge come in degrees. A judge could be, say, $70\%$ sure that a patient is X and $20\%$ sure that he is Y, with the remaining $10\%$ spread evenly over the other options. In epistemology, the credence function (Pettigrew, 2019) quantifies a person’s degree of belief in the different propositions. Defining and working with knowledge coefficients in more general “credence models” might be possible, but identifiability issues loom large.

The sensitivity and coverage studies in this paper are limited in scope, as they only cover nominal weights and a small number of parameter tweaks. A larger simulation study could potentially confirm or disconfirm our recommendation of reporting Cohen’s kappa and Fleiss’ kappa together with the Brennan–Prediger coefficient.

We reiterate that the agreement coefficient $\text{AC}_{1}$ (Gwet, 2008) is justified using a guessing model similar to Maxwell’s (1977) first model (discussed on p. 3), but it is not a knowledge coefficient. The relationship between the $\text{AC}_{1}$ and the knowledge coefficient will be explored in a future paper.

We have only discussed a rather peculiar sort of estimation of the knowledge coefficient. It would, perhaps, be more natural to discuss traditional estimation methods, such as maximum likelihood estimation, as explored by Aickin (1990) and Klauer and Batchelder (1996) in their submodels of the guessing model. In particular, composite maximum likelihood estimation (Varin et al., 2011) appears to be a good fit to the problem. Bayesian estimation could also be a reasonable option. If we only care about performance measures such as the mean squared error, the Brennan–Prediger coefficient has small bias under many scenarios, and its variance is virtually guaranteed to be smaller than the variance of a composite maximum likelihood estimator. But if we care about inference, even the superior confidence intervals for the Brennan–Prediger coefficient have unacceptably poor coverage under some circumstances. Since constructing approximate confidence intervals for maximum likelihood and composite maximum likelihood is routine, going this route will likely fix the coverage problem.

Funding

Open access funding provided by Norwegian Business School

Declarations

Conflict of interest

The authors do not have any conflicts of interest to disclose.

Appendix

Proof of the Knowledge Coefficient Theorem on p. 6.

Recall that $W$ is a $C\times C$ matrix of agreement weights, that is, a symmetric matrix whose elements $W_{ir}$ satisfy $1\ge W_{ir}$ with $W_{ir}=1$ if and only if $i=r$. We simplify our notation by considering two judges only.
Define $p_{wa}(r_{1},r_{2})$ and $p_{wc}(r_{1},r_{2})$ as the probability of agreement and of chance agreement, respectively, restricted to the two judges $r_{1}$ and $r_{2}$. Clearly, $p_{wa}=\binom{R}{2}^{-1}\sum_{r_{1}<r_{2}}p_{wa}(r_{1},r_{2})$ and $p_{wc}=\binom{R}{2}^{-1}\sum_{r_{1}<r_{2}}p_{wc}(r_{1},r_{2})$.

In the following lemma, we will view probability mass functions as vectors. That is, we will view e.g. $p$ as the vector in ${\mathbb{R}}^{C}$ whose $i$th element is $p(x_{i})$. This will greatly simplify our notation.

Lemma 7

The following is true.

(i) The weighted agreement between $r_{1}$ and $r_{2}$, $p_{wa}(r_{1},r_{2})$, equals
(5.1) $$\begin{aligned} &E[s_{r_{1}}s_{r_{2}}]+(E[s_{r_{1}}]-E[s_{r_{1}}s_{r_{2}}])t^{T}Wq_{r_{2}}+(E[s_{r_{2}}]-E[s_{r_{1}}s_{r_{2}}])q_{r_{1}}^{T}Wt\\ &\quad +(1-E[s_{r_{1}}]-E[s_{r_{2}}]+E[s_{r_{1}}s_{r_{2}}])q_{r_{1}}^{T}Wq_{r_{2}}. \end{aligned}$$
(ii) The weighted chance agreement between $r_{1}$ and $r_{2}$, or $p_{wc}(r_{1},r_{2})$, equals
(5.2) $$\begin{aligned} &E[s_{r_{1}}]E[s_{r_{2}}]t^{T}Wt+(E[s_{r_{1}}]-E[s_{r_{1}}]E[s_{r_{2}}])t^{T}Wq_{r_{2}}+(E[s_{r_{2}}]-E[s_{r_{1}}]E[s_{r_{2}}])q_{r_{1}}^{T}Wt\\ &\quad +(1-E[s_{r_{1}}]-E[s_{r_{2}}]+E[s_{r_{1}}]E[s_{r_{2}}])q_{r_{1}}^{T}Wq_{r_{2}}. \end{aligned}$$
(iii) Finally, the weighted chance agreement between $r_{1}$ and $r_{2}$ can be written in terms of the marginal distributions and the weighting matrix,
(5.3) $$\begin{aligned} p_{wc}(r_{1},r_{2})=\sum_{x_{1},x_{2}}w(x_{1},x_{2})p_{r_{1}}(x_{1})p_{r_{2}}(x_{2})=p_{r_{1}}^{T}Wp_{r_{2}}. \end{aligned}$$

Proof

(i) We will make use of the following expression for $p_{wa}(r_{1},r_{2})$, conditional on the skill parameters $s_{r_{1}},s_{r_{2}}$:

$$p_{wa}(r_{1},r_{2}\mid s_{r_{1}},s_{r_{2}})=\sum_{x^{\star}}\sum_{x_{1}}\sum_{x_{2}}w(x_{1},x_{2})\,p(x_{1}\mid s_{r_{1}},x^{\star})\,p(x_{2}\mid s_{r_{2}},x^{\star})\,t(x^{\star}).$$

Recall that $x^{\star}$ is the true rating, $x_{1}$ is the rating by judge 1, and $x_{2}$ the rating by judge 2, and the expression for the full guessing model, $p(x\mid s,x^{\star})=s_{r}1[x=x^{\star}]+(1-s_{r})q_{r}(x)$ of formula (1.1). We expand the right hand side of the expression for $p_{wa}(r_{1},r_{2}\mid s_{r_{1}},s_{r_{2}})$ above to obtain

$$\begin{aligned} p_{wa}(r_{1},r_{2}\mid s_{r_{1}},s_{r_{2}})&=\sum_{x^{\star}}\sum_{x_{1}}\sum_{x_{2}}w(x_{1},x_{2})s_{r_{1}}s_{r_{2}}t(x^{\star})1[x_{1}=x_{2}=x^{\star}]&(=A)\\&+\sum_{x^{\star}}\sum_{x_{1}}\sum_{x_{2}}w(x_{1},x_{2})s_{r_{1}}(1-s_{r_{2}})t(x^{\star})1[x_{1}=x^{\star}]q_{r_{2}}(x_{2})&(=B)\\&+\sum_{x^{\star}}\sum_{x_{1}}\sum_{x_{2}}w(x_{1},x_{2})s_{r_{2}}(1-s_{r_{1}})q_{r_{1}}(x_{1})t(x^{\star})1[x_{2}=x^{\star}]&(=C)\\&+\sum_{x^{\star}}\sum_{x_{1}}\sum_{x_{2}}w(x_{1},x_{2})(1-s_{r_{1}})(1-s_{r_{2}})q_{r_{1}}(x_{1})q_{r_{2}}(x_{2})t(x^{\star})&(=D) \end{aligned}$$

Now we sum over $x^{\star},x_{1},x_{2}$ for each of (A)–(D). Starting with (A), recall that $w(x,x)=1$ for all $x$. Thus

$$A=s_{r_{1}}s_{r_{2}}\sum_{x^{\star}}w(x^{\star},x^{\star})t(x^{\star})=s_{r_{1}}s_{r_{2}}.$$

Now consider (B), where we must recall that $W$, the weighting matrix, is the matrix with elements $W_{ir}=w(x_{i},x_{r})$. Then

$$B=s_{r_{1}}(1-s_{r_{2}})\sum_{x^{\star}}\sum_{x_{2}}t(x^{\star})w(x^{\star},x_{2})q_{r_{2}}(x_{2})=s_{r_{1}}(1-s_{r_{2}})t^{T}Wq_{r_{2}}.$$

Likewise, $C=s_{r_{2}}(1-s_{r_{1}})q_{r_{1}}^{T}Wt$ and $D=(1-s_{r_{1}})(1-s_{r_{2}})q_{r_{1}}^{T}Wq_{r_{2}}$. Hence

$$p_{wa}(r_{1},r_{2}\mid s_{r_{1}},s_{r_{2}})=s_{r_{1}}s_{r_{2}}+s_{r_{1}}(1-s_{r_{2}})t^{T}Wq_{r_{2}}+s_{r_{2}}(1-s_{r_{1}})q_{r_{1}}^{T}Wt+(1-s_{r_{1}})(1-s_{r_{2}})q_{r_{1}}^{T}Wq_{r_{2}},$$

and 5.1 follows from taking expectations over $s_{r_{1}},s_{r_{2}}$.

(ii) We proceed in the same way as we did in (i). For chance agreement, the two ratings are based on independent true ratings $x_{1}^{\star}$ and $x_{2}^{\star}$, so that

$$p_{wc}(r_{1},r_{2}\mid s_{r_{1}},s_{r_{2}})=\sum_{x_{1}^{\star}}\sum_{x_{2}^{\star}}\sum_{x_{1}}\sum_{x_{2}}w(x_{1},x_{2})\,p(x_{1}\mid s_{r_{1}},x_{1}^{\star})\,p(x_{2}\mid s_{r_{2}},x_{2}^{\star})\,t(x_{1}^{\star})t(x_{2}^{\star}).$$

Again, we expand this expression to obtain

$$\begin{aligned} p_{wc}(r_{1},r_{2}\mid s_{r_{1}},s_{r_{2}})&=\sum_{x_{1}^{\star}}\sum_{x_{2}^{\star}}\sum_{x_{1}}\sum_{x_{2}}w(x_{1},x_{2})s_{r_{1}}s_{r_{2}}t(x_{1}^{\star})t(x_{2}^{\star})1[x_{1}=x_{1}^{\star}]1[x_{2}=x_{2}^{\star}]&(=A)\\&+\sum_{x_{1}^{\star}}\sum_{x_{2}^{\star}}\sum_{x_{1}}\sum_{x_{2}}w(x_{1},x_{2})s_{r_{1}}(1-s_{r_{2}})t(x_{1}^{\star})1[x_{1}=x_{1}^{\star}]q_{r_{2}}(x_{2})t(x_{2}^{\star})&(=B)\\&+\sum_{x_{1}^{\star}}\sum_{x_{2}^{\star}}\sum_{x_{1}}\sum_{x_{2}}w(x_{1},x_{2})s_{r_{2}}(1-s_{r_{1}})q_{r_{1}}(x_{1})t(x_{2}^{\star})1[x_{2}=x_{2}^{\star}]t(x_{1}^{\star})&(=C)\\&+\sum_{x_{1}^{\star}}\sum_{x_{2}^{\star}}\sum_{x_{1}}\sum_{x_{2}}w(x_{1},x_{2})(1-s_{r_{1}})(1-s_{r_{2}})q_{r_{1}}(x_{1})q_{r_{2}}(x_{2})t(x_{1}^{\star})t(x_{2}^{\star})&(=D) \end{aligned}$$

The main difference from (i) is in (A),

$$A=s_{r_{1}}s_{r_{2}}\sum_{x_{1}^{\star}}\sum_{x_{2}^{\star}}w(x_{1}^{\star},x_{2}^{\star})t(x_{1}^{\star})t(x_{2}^{\star})=s_{r_{1}}s_{r_{2}}t^{T}Wt.$$

Let’s consider (B) too:

$$B=s_{r_{1}}(1-s_{r_{2}})\sum_{x_{1}^{\star}}\sum_{x_{2}}w(x_{1}^{\star},x_{2})t(x_{1}^{\star})q_{r_{2}}(x_{2})\sum_{x_{2}^{\star}}t(x_{2}^{\star})=s_{r_{1}}(1-s_{r_{2}})t^{T}Wq_{r_{2}}.$$

(C) and (D) can be calculated in the same way. After taking expectations with respect to independent $s_{r_{1}},s_{r_{2}}$, we find the expression for $p_{wc}(r_{1},r_{2})$ in the statement of the Lemma.

(iii) The expression for $p_{wc}$ in terms of $p_{r_{1}},p_{r_{2}}$ is trivial. $\square$
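The identities in Lemma 7 can be checked numerically. The following Python sketch is only an illustration under assumed inputs: the distributions `t`, `q1`, `q2`, the linear weight matrix `W`, and the small joint distribution of the skill parameters in `skill` are all hypothetical choices, not values from the paper. It compares a brute-force evaluation of the weighted agreement with formula 5.1, and the marginal form 5.3 of the chance agreement with formula 5.2.

```python
import numpy as np

rng = np.random.default_rng(0)
C = 4  # number of categories (illustrative)

def random_pmf(size):
    """Draw an arbitrary probability mass function of the given length."""
    p = rng.random(size)
    return p / p.sum()

t, q1, q2 = random_pmf(C), random_pmf(C), random_pmf(C)

# Symmetric agreement weights with W[i, i] = 1 and W[i, j] < 1 otherwise.
idx = np.arange(C)
W = 1.0 - np.abs(idx[:, None] - idx[None, :]) / (C - 1)

# Hypothetical joint distribution of (s1, s2): rows are (probability, s1, s2).
skill = np.array([[0.5, 0.9, 0.8],
                  [0.3, 0.2, 0.6],
                  [0.2, 0.0, 0.1]])
prob, s1, s2 = skill.T
E1, E2, E12 = prob @ s1, prob @ s2, prob @ (s1 * s2)

def rating_pmf(s, q):
    """Matrix P[x_star, x] = s * 1[x = x_star] + (1 - s) * q[x]."""
    return s * np.eye(C) + (1 - s) * q[None, :]

# Brute force: average over skills, true ratings, and both ratings.
p_wa_brute = 0.0
for pi, a, b in skill:
    P1, P2 = rating_pmf(a, q1), rating_pmf(b, q2)
    p_wa_brute += pi * np.einsum("k,ki,ij,kj->", t, P1, W, P2)

# Closed form of Lemma 7 (i), formula 5.1.
p_wa_formula = (E12
                + (E1 - E12) * (t @ W @ q2)
                + (E2 - E12) * (q1 @ W @ t)
                + (1 - E1 - E2 + E12) * (q1 @ W @ q2))

# Chance agreement: marginal form (5.3) against the expansion (5.2).
p1 = E1 * t + (1 - E1) * q1
p2 = E2 * t + (1 - E2) * q2
p_wc_marginal = p1 @ W @ p2
p_wc_formula = (E1 * E2 * (t @ W @ t)
                + (E1 - E1 * E2) * (t @ W @ q2)
                + (E2 - E1 * E2) * (q1 @ W @ t)
                + (1 - E1 - E2 + E1 * E2) * (q1 @ W @ q2))

print(np.isclose(p_wa_brute, p_wa_formula), np.isclose(p_wc_marginal, p_wc_formula))
```

Because the skill parameters in this toy example are dependent, $E[s_{r_{1}}s_{r_{2}}]\ne E[s_{r_{1}}]E[s_{r_{2}}]$, so the two formulas differ in more than their first term.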

Proof

(Proof of Theorem 2)

(i). Assume all guessing distributions are equal, i.e., $q_{r}(x)=q(x)$. We wish to show that the following are equivalent:

(5.4) $$\begin{aligned} q(x)=t(x),\quad p(x)=t(x),\quad q(x)=p(x). \end{aligned}$$

To do this, recall the marginal univariate model $p(x\mid s)=s_{r}t(x)+(1-s_{r})q(x)$. Take expectations over $s$ to obtain, with $\alpha =R^{-1}\sum_{r}E(s_{r})$,

(5.5) $$\begin{aligned} p(x)=\alpha t(x)+(1-\alpha )q(x). \end{aligned}$$

It immediately follows that the expressions in 5.4 are equivalent.

Let’s proceed to prove the rest of (i). If all guessing distributions are equal to the true distribution, then the formula 5.1 can be written as

$$p_{wa}(r_{1},r_{2})=E[s_{r_{1}}s_{r_{2}}]+(E[s_{r_{1}}]-E[s_{r_{1}}s_{r_{2}}])t^{T}Wt+(E[s_{r_{2}}]-E[s_{r_{1}}s_{r_{2}}])t^{T}Wt+(1-E[s_{r_{1}}]-E[s_{r_{2}}]+E[s_{r_{1}}s_{r_{2}}])t^{T}Wt.$$

Some of the terms cancel, leaving

$$p_{wa}(r_{1},r_{2})=E[s_{r_{1}}s_{r_{2}}](1-t^{T}Wt)+t^{T}Wt.$$

Moreover, the formula for $p_{wc}(r_{1},r_{2})$ can be simplified:

$$p_{wc}(r_{1},r_{2})=E[s_{r_{1}}]E[s_{r_{2}}]t^{T}Wt+(E[s_{r_{1}}]-E[s_{r_{1}}]E[s_{r_{2}}])t^{T}Wt+(E[s_{r_{2}}]-E[s_{r_{1}}]E[s_{r_{2}}])t^{T}Wt+(1-E[s_{r_{1}}]-E[s_{r_{2}}]+E[s_{r_{1}}]E[s_{r_{2}}])t^{T}Wt.$$

Most of the terms cancel, leaving only

$$p_{wc}(r_{1},r_{2})=t^{T}Wt.$$

To verify these formulas, simply replace all instances of $q_{r}$ with $t$ in Lemma 7, parts (i) and (ii). Since the marginal distribution for every judge is the same under the assumption that $q(x)=t(x)$, it follows that $p_{wc}=p_{wf}$. Since the true distribution equals the marginal distribution,

$$p_{wf}=p^{T}Wp=t^{T}Wt,$$

and by part (iii) of Lemma 7, $p_{wf}=R^{-2}\sum_{r_{1},r_{2}}p_{r_{1}}^{T}Wp_{r_{2}}$. Take the mean over all combinations of judges and reorder to arrive at

$$\nu =\frac{p_{wa}-t^{T}Wt}{1-t^{T}Wt}=\frac{p_{wa}-p_{wf}}{1-p_{wf}}=\frac{p_{wa}-p_{wc}}{1-p_{wc}}.$$

(ii). Assume that all guessing distributions are uniform. Then $q_{r}=C^{-1}\mathbf{1}$, which implies that $q_{r}^{T}p=C^{-1}$ for any probability mass function $p$. (This happens since $p$ sums to 1.) It follows that $q_{r_{2}}^{T}t=q_{r_{1}}^{T}q_{r_{2}}=C^{-1}$ for all $r_{1},r_{2}$. Using expression (i) of the Lemma and $W=I$, we find that

$$p_{a}(r_{1},r_{2})=E[s_{r_{1}}s_{r_{2}}]+(E[s_{r_{1}}]-E[s_{r_{1}}s_{r_{2}}])C^{-1}+(E[s_{r_{2}}]-E[s_{r_{1}}s_{r_{2}}])C^{-1}+(1-E[s_{r_{1}}]-E[s_{r_{2}}]+E[s_{r_{1}}s_{r_{2}}])C^{-1}.$$

Canceling terms, we find that $p_{a}(r_{1},r_{2})=E[s_{r_{1}}s_{r_{2}}](1-C^{-1})+C^{-1}$. It follows that $\nu =\frac{p_{a}-C^{-1}}{1-C^{-1}}$.
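Part (ii) can be illustrated with a small exact computation. The Python sketch below assumes two judges, nominal weights, uniform guessing, and a hypothetical two-point joint distribution for the skill parameters; the function names `joint_pmf` and `brennan_prediger` are illustrative and do not come from the paper. It builds the joint distribution of the two ratings under the guessing model and checks that the Brennan–Prediger coefficient recovers $E[s_{r_{1}}s_{r_{2}}]$ even though the true distribution is far from uniform.

```python
import numpy as np

def joint_pmf(t, q1, q2, skill):
    """Joint pmf P[x1, x2] of two ratings under the guessing model.

    skill is a list of (probability, s1, s2) triples describing an
    assumed discrete joint distribution of the two skill parameters.
    """
    C = len(t)
    P = np.zeros((C, C))
    for pi, a, b in skill:
        P1 = a * np.eye(C) + (1 - a) * q1[None, :]  # P1[x_star, x1]
        P2 = b * np.eye(C) + (1 - b) * q2[None, :]  # P2[x_star, x2]
        P += pi * np.einsum("k,ki,kj->ij", t, P1, P2)
    return P

def brennan_prediger(P):
    """Brennan-Prediger coefficient (p_a - 1/C) / (1 - 1/C) for a joint pmf."""
    C = P.shape[0]
    p_a = np.trace(P)  # probability that the two ratings coincide
    return (p_a - 1 / C) / (1 - 1 / C)

C = 5
t = np.array([0.5, 0.2, 0.15, 0.1, 0.05])    # skewed true distribution
uniform = np.full(C, 1 / C)                  # uniform guessing distribution
skill = [(0.6, 0.9, 0.7), (0.4, 0.3, 0.5)]   # hypothetical dependent skills

nu = sum(pi * a * b for pi, a, b in skill)   # E[s1 * s2]
P = joint_pmf(t, uniform, uniform, skill)
bp = brennan_prediger(P)
print(nu, bp)
```

Replacing `uniform` with a non-uniform guessing distribution in the call to `joint_pmf` breaks the equality in general, which is one way to explore the kind of sensitivity the paper studies.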

(iii). Suppose that $E[s_{r_{1}}s_{r_{2}}]=E[s_{r_{1}}]E[s_{r_{2}}]$ for all $r_{1},r_{2}$. Now subtract $p_{wc}(r_{1},r_{2})$ from $p_{wa}(r_{1},r_{2})$, using Lemma 7, parts (i) and (ii). Most of the terms cancel, leaving us with

$$p_{wa}(r_{1},r_{2})-p_{wc}(r_{1},r_{2})=E[s_{r_{1}}s_{r_{2}}](1-t^{T}Wt).$$

Take the mean over all combinations of judges and reorder to arrive at

$$\nu =\frac{p_{wa}-p_{wc}}{1-t^{T}Wt}.$$

If the true distribution equals the marginal distribution, then $p_{wf}=p^{T}Wp=t^{T}Wt$, as explained in (i), hence $\nu =\frac{p_{wa}-p_{wc}}{1-p_{wf}}$, as claimed. On the other hand, if the true distribution is uniform, $t=C^{-1}\mathbf{1}$, so that $t^{T}Wt=\mathbf{1}^{T}W\mathbf{1}/C^{2}$ and hence $\nu =\frac{p_{wa}-p_{wc}}{1-\mathbf{1}^{T}W\mathbf{1}/C^{2}}$. $\square$

Proof of the Expression for $\sigma_{CF}$ in Proposition 5

First, let us recall the multidimensional delta method. Let $f:{\mathbb{R}}^{k}\rightarrow {\mathbb{R}}$ be continuously differentiable at $\theta$ and suppose that $\sqrt{n}({\hat{\theta}}-\theta ){\mathop{\rightarrow}\limits^{d}}N(0,\Sigma )$. Then

(5.6) $$\begin{aligned} \sqrt{n}[f({\hat{\theta}})-f(\theta )]{\mathop{\rightarrow}\limits^{d}}N(0,\nabla f(\theta )^{T}\Sigma \nabla f(\theta )). \end{aligned}$$

In the case of the Cohen–Fleiss coefficient, $\theta =(p_{wa},p_{wc},p_{wf})$ and $f(\theta )=\frac{p_{wa}-p_{wc}}{1-p_{wf}}$. Then

$$\nabla f(\theta )=\frac{1}{1-p_{wf}}\left( 1,\,-1,\,\frac{p_{wa}-p_{wc}}{1-p_{wf}}\right) ^{T}.$$

Thus the variance is

$$\sigma_{CF}^{2}=\nabla f(\theta )^{T}\Sigma \nabla f(\theta ),$$

where $\Sigma$ is the asymptotic covariance matrix of the estimator of $\theta$.
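The delta-method computation for $f(\theta )=\frac{p_{wa}-p_{wc}}{1-p_{wf}}$ can be sketched numerically. In the Python sketch below, the values of `theta` and the covariance matrix `Sigma` are hypothetical placeholders (in practice $\Sigma$ would be estimated from data), and the function names are illustrative. The analytic gradient is obtained by differentiating $f$ directly and is checked against central finite differences before it is plugged into the quadratic form of 5.6.

```python
import numpy as np

def cohen_fleiss(theta):
    """f(theta) = (p_wa - p_wc) / (1 - p_wf)."""
    p_wa, p_wc, p_wf = theta
    return (p_wa - p_wc) / (1 - p_wf)

def gradient(theta):
    """Analytic gradient of f with respect to (p_wa, p_wc, p_wf)."""
    p_wa, p_wc, p_wf = theta
    d = 1 - p_wf
    return np.array([1 / d, -1 / d, (p_wa - p_wc) / d**2])

def delta_variance(theta, Sigma):
    """Asymptotic variance grad^T Sigma grad from the delta method."""
    g = gradient(theta)
    return g @ Sigma @ g

def numeric_gradient(f, theta, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = eps
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

theta = np.array([0.7, 0.3, 0.35])             # hypothetical (p_wa, p_wc, p_wf)
A = np.array([[0.20, 0.00, 0.00],
              [0.05, 0.10, 0.00],
              [0.04, 0.02, 0.10]])
Sigma = A @ A.T                                # hypothetical covariance matrix

var = delta_variance(theta, Sigma)
n = 100                                        # hypothetical sample size
std_error = np.sqrt(var / n)                   # approximate standard error
print(cohen_fleiss(theta), var, std_error)
```

An approximate confidence interval would then be the point estimate plus or minus a normal quantile times `std_error`, subject to the coverage caveats discussed in the simulation study.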

Proof that Aickin’s Model is a Guessing Model

Lemma 8

Let the number of judges be 2 and $s\sim \text{Bernoulli}(\alpha )$ be the same for both judges. Then the guessing model has unconditional distribution

(5.7) $$\begin{aligned} p(x_{1},x_{2})=\alpha 1[x_{1}=x_{2}]t(x_{1})+(1-\alpha )q_{1}(x_{1})q_{2}(x_{2}). \end{aligned}$$

Proof

Expanding the guessing model 1.1, we obtain

$$p(x_{1},x_{2},x^{\star}\mid s)=\left[ s^{2}1[x_{1}=x_{2}=x^{\star}]+s(1-s)\left( 1[x_{1}=x^{\star}]q_{2}(x_{2})+q_{1}(x_{1})1[x_{2}=x^{\star}]\right) +(1-s)^{2}q_{1}(x_{1})q_{2}(x_{2})\right] t(x^{\star}).$$

Since $s\sim \text{Bernoulli}(\alpha )$, we have that $Es^{2}=\alpha$, $E(s(1-s))=0$ and $E(1-s)(1-s)=1-\alpha$. It follows that

$$p(x_{1},x_{2},x^{\star})=\left[ \alpha 1[x_{1}=x_{2}=x^{\star}]+(1-\alpha )q_{1}(x_{1})q_{2}(x_{2})\right] t(x^{\star}).$$

Summing over $x^{\star}$ yields $p(x_{1},x_{2})=\alpha 1[x_{1}=x_{2}]t(x_{1})+(1-\alpha )q_{1}(x_{1})q_{2}(x_{2})$. $\square$
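Lemma 8 can also be confirmed numerically. The Python sketch below uses hypothetical values for $\alpha$, $t$, $q_{1}$, and $q_{2}$, and the function name `guessing_joint` is illustrative. It builds the joint distribution of the two ratings from the guessing model with a shared Bernoulli skill parameter and compares it with the right-hand side of 5.7.

```python
import numpy as np

def guessing_joint(t, q1, q2, alpha):
    """Joint pmf P[x1, x2] when both judges share s ~ Bernoulli(alpha)."""
    C = len(t)
    P = np.zeros((C, C))
    for prob, s in ((alpha, 1.0), (1 - alpha, 0.0)):
        P1 = s * np.eye(C) + (1 - s) * q1[None, :]  # P1[x_star, x1]
        P2 = s * np.eye(C) + (1 - s) * q2[None, :]  # P2[x_star, x2]
        # Average over the true rating x_star with weights t.
        P += prob * np.einsum("k,ki,kj->ij", t, P1, P2)
    return P

alpha = 0.6                          # hypothetical probability of knowing
t = np.array([0.5, 0.3, 0.2])        # hypothetical true distribution
q1 = np.array([0.2, 0.2, 0.6])       # hypothetical guessing distribution, judge 1
q2 = np.array([0.1, 0.7, 0.2])       # hypothetical guessing distribution, judge 2

from_guessing_model = guessing_joint(t, q1, q2, alpha)
# Right-hand side of (5.7): alpha * 1[x1 = x2] * t(x1) + (1 - alpha) * q1(x1) * q2(x2).
aickin_form = alpha * np.diag(t) + (1 - alpha) * np.outer(q1, q2)

print(np.allclose(from_guessing_model, aickin_form))
```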

Footnotes

Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11336-023-09919-4.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Aickin, M. (1990). Maximum likelihood estimation of agreement in the constant predictive probability model, and its relation to Cohen’s kappa. Biometrics, 46(2), 293–302.

Berry, K. J., & Mielke, P. W. (1988). A generalization of Cohen’s kappa agreement measure to interval measurement and multiple raters. Educational and Psychological Measurement, 48(4), 921–933.

Brennan, R. L., & Prediger, D. J. (1981). Coefficient kappa: Some uses, misuses, and alternatives. Educational and Psychological Measurement, 41(3), 687–699.

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.

Cohen, J. (1968). Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220.

Conger, A. J. (1980). Integration and generalization of kappas for multiple raters. Psychological Bulletin, 88(2), 322–328. https://psycnet.apa.org/fulltext/1980-29309-001.pdf https://doi.org/10.1037/0033-2909.88.2.322

Fleiss, J. L. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.

Grove, W. M., Andreasen, N. C., McDonald-Scott, P., Keller, M. B., & Shapiro, R. W. (1981). Reliability studies of psychiatric diagnosis: Theory and practice. Archives of General Psychiatry, 38(4), 408–413. https://doi.org/10.1001/archpsyc.1981.01780290042004

Gwet, K. L. (2008). Computing inter-rater reliability and its variance in the presence of high agreement. The British Journal of Mathematical and Statistical Psychology, 61, 29–48.

Gwet, K. L. (2014). Handbook of inter-rater reliability. Advanced Analytics LLC.

Gwet, K. L. (2021). Large-sample variance of Fleiss generalized kappa. Educational and Psychological Measurement, 8243202.

Hu, X., & Batchelder, W. H. (1994). The statistical analysis of general processing tree models with the EM algorithm. Psychometrika, 59(1), 21–47.

Janes, C. L. (1979). Agreement measurement and the judgment process. The Journal of Nervous and Mental Disease, 167(6), 343–347.

Janson, H., & Olsson, U. (2001). A measure of agreement for interval or nominal multivariate observations. Educational and Psychological Measurement, 61(2), 277–289.

Johnson, N. L., Kotz, S., & Balakrishnan, N. (1994). Continuous univariate distributions (Vol. 1). Wiley.

Klauer, K. C., & Batchelder, W. H. (1996). Structural analysis of subjective categorical data. Psychometrika, 61(2), 199–239.

Krippendorff, K. (1970). Bivariate agreement coefficients for reliability of data. Sociological Methodology, 2, 139–150.

Maxwell, A. E. (1977). Coefficients of agreement between observers and their interpretation. The British Journal of Psychiatry, 130, 79–83.

Moss, J. (2023). Measures of agreement with multiple raters: Fréchet variances and inference.

Nelsen, R. B. (2007). An introduction to copulas. Springer Science & Business Media.

Perreault, W. D., & Leigh, L. E. (1989). Reliability of nominal data based on qualitative judgments. Journal of Marketing Research, 26(2), 135–148.

Pettigrew, R. (2019). Epistemic utility arguments for probabilism. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy. Metaphysics Research Lab, Stanford University.

Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly, 19(3), 321–325.

van Oest, R. (2019). A new coefficient of interrater agreement: The challenge of highly unequal category proportions. Psychological Methods, 24(4), 439–451.

van Oest, R., & Girard, J. M. (2021). Weighting schemes and incomplete data: A generalized Bayesian framework for chance-corrected interrater agreement. Psychological Methods.

Varin, C., Reid, N., & Firth, D. (2011). An overview of composite likelihood methods. Statistica Sinica, 21(1), 5–42. https://www.jstor.org/stable/24309261

Zapf, A., Castell, S., Morawietz, L., & Karch, A. (2016). Measuring inter-rater reliability for nominal data: Which coefficients and confidence intervals are appropriate? BMC Medical Research Methodology.

Table 1 Coefficients covered in this paper.

Table 2 Sensitivity analysis when $E[s_{r_{1}}s_{r_{2}}]=E[s_{r_{1}}]E[s_{r_{2}}]$. True distribution centered on the uniform distribution.

Table 3 Sensitivity analysis when $E[s_{r_{1}}s_{r_{2}}]=E[s_{r_{1}}]E[s_{r_{2}}]$. True distribution centered on the marginal guessing distribution.

Table 4 Sensitivity analysis when $E[s_{r_{1}}s_{r_{2}}]\ne E[s_{r_{1}}]E[s_{r_{2}}]$. True distribution centered on the uniform distribution.

Table 5 Sensitivity analysis when $E[s_{r_{1}}s_{r_{2}}]\ne E[s_{r_{1}}]E[s_{r_{2}}]$. True distribution centered on the marginal guessing distribution.

Table 6 Confidence limits for Zapf et al. (2016).

Table 7 Coverage and lengths of confidence intervals, deviation from uniform.

Table 8 Coverage and lengths of confidence intervals, deviation from marginal.

