
Statistical tools to improve assessing agreement between several observers

Published online by Cambridge University Press:  24 January 2014

I. Ruddat*
Affiliation:
Department of Biometry, Epidemiology and Information Processing, WHO Collaborating Centre for Research and Training in Veterinary Public Health, University of Veterinary Medicine, Hannover, Germany
B. Scholz
Affiliation:
Friedrich-Loeffler-Institut, Institute of Animal Welfare and Animal Husbandry, Celle, Germany
S. Bergmann
Affiliation:
Department of Veterinary Science, Faculty of Veterinary Medicine, Chair of Animal Welfare, Ethology, Animal Hygiene and Animal Housing, Ludwig-Maximilians-University, Munich, Germany
A.-L. Buehring
Affiliation:
Friedrich-Loeffler-Institut, Institute of Animal Welfare and Animal Husbandry, Celle, Germany
S. Fischer
Affiliation:
Institute for Animal Breeding and Genetics, University of Veterinary Medicine, Hannover, Germany
A. Manton
Affiliation:
Department of Farm Animal Ethology and Poultry Science, University of Hohenheim, Stuttgart, Germany
D. Prengel
Affiliation:
Department of Veterinary Science, Faculty of Veterinary Medicine, Chair of Animal Welfare, Ethology, Animal Hygiene and Animal Housing, Ludwig-Maximilians-University, Munich, Germany
E. Rauch
Affiliation:
Department of Veterinary Science, Faculty of Veterinary Medicine, Chair of Animal Welfare, Ethology, Animal Hygiene and Animal Housing, Ludwig-Maximilians-University, Munich, Germany
S. Steiner
Affiliation:
Department of Veterinary Science, Faculty of Veterinary Medicine, Chair of Animal Welfare, Ethology, Animal Hygiene and Animal Housing, Ludwig-Maximilians-University, Munich, Germany
S. Wiedmann
Affiliation:
Bavarian State Research Center for Agriculture, Kitzingen, Germany
L. Kreienbrock
Affiliation:
Department of Biometry, Epidemiology and Information Processing, WHO Collaborating Centre for Research and Training in Veterinary Public Health, University of Veterinary Medicine, Hannover, Germany
A. Campe
Affiliation:
Department of Biometry, Epidemiology and Information Processing, WHO Collaborating Centre for Research and Training in Veterinary Public Health, University of Veterinary Medicine, Hannover, Germany

Abstract

In the context of assessing the impact of management and environmental factors on animal health, behaviour or performance, it has become increasingly important to conduct (epidemiological) studies in the field. The number of farms investigated per study is therefore considerable, so that numerous observers are needed for data collection. To maintain the quality and validity of study results, calibration meetings, in which observers are trained and the current level of agreement is assessed, have to be conducted to minimise the observer effect. When study animals are rated independently by the same observers on a categorical variable, the exclusion test can be performed to identify disagreeing observers. For each variable and each observer, this statistical test compares the observer-specific agreement with the overall agreement among all observers on the basis of kappa coefficients. It accounts for two major challenges, namely the absence of a gold-standard observer and differing data types comprising ordinal, nominal and binary data. The presented methods are applied to a reliability study assessing the agreement among eight observers rating welfare parameters of laying hens. The degree to which the observers agreed depended on the investigated item (global weighted kappa coefficients: 0.37 to 0.94). The proposed method and graphical description served to assess the direction and degree to which an observer deviates from the others. We suggest further improving studies with numerous observers by conducting calibration meetings and accounting for observer bias.
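The underlying idea of comparing each observer's agreement with the group against the overall agreement can be illustrated in a few lines of Python. This is a minimal sketch only, not the authors' exclusion test: it uses unweighted pairwise Cohen's kappa and simple means, whereas the paper's test involves a formal statistical comparison of kappa coefficients. The observer names and ratings below are hypothetical.

```python
from itertools import combinations

def cohen_kappa(r1, r2):
    """Unweighted Cohen's kappa for two raters' categorical ratings."""
    n = len(r1)
    cats = set(r1) | set(r2)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n                   # observed agreement
    p_e = sum((r1.count(c) / n) * (r2.count(c) / n) for c in cats)  # chance agreement
    if p_e == 1.0:                                                  # degenerate: a single category
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings of six animals on one binary welfare item by three observers.
ratings = {
    "A": [1, 1, 0, 0, 1, 0],
    "B": [1, 1, 0, 0, 1, 0],
    "C": [0, 1, 1, 0, 1, 1],
}

# Kappa for every observer pair, then the overall agreement among all observers.
pairwise = {(i, j): cohen_kappa(ratings[i], ratings[j])
            for i, j in combinations(ratings, 2)}
overall = sum(pairwise.values()) / len(pairwise)

# Observer-specific agreement: mean kappa of one observer against all others.
specific = {o: sum(k for pair, k in pairwise.items() if o in pair) / (len(ratings) - 1)
            for o in ratings}

for o, k in specific.items():
    print(f"observer {o}: mean kappa {k:.2f} (overall {overall:.2f})")
```

Here observer C, whose mean kappa falls clearly below the overall value, would be the candidate flagged for retraining at a calibration meeting.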

Type
Full Paper
Copyright
© The Animal Consortium 2014 

