An Examination of Judge Reliability at a Major U.S. Wine Competition

Published online by Cambridge University Press: 08 June 2012

Robert T. Hodgson
Professor Emeritus, Department of Oceanography, Humboldt State University, Arcata, CA 95521


Wine judge performance at a major wine competition was analyzed from 2005 to 2008 using replicate samples. Each panel of four expert judges received a flight of 30 wines in which triplicate samples poured from the same bottle were embedded. Between 65 and 70 judges were tested each year. About 10 percent of the judges were able to replicate their score within a single medal group. Another 10 percent, on occasion, scored the same wine anywhere from Bronze to Gold. Judges tend to be more consistent in what they don't like than in what they do. An analysis of variance covering every panel over the study period indicates that only about half of the panels presented awards based solely on wine quality. (JEL Classification: Q13, Q19)

Copyright © American Association of Wine Economists 2008

