
Wine Review Descriptors as Quality Predictors: Evidence from Language Processing Techniques

Published online by Cambridge University Press: 07 April 2022

Chenyu Yang
Affiliation:
Department of Statistical Science, Southern Methodist University, Dallas, Texas, 75275; e-mail: chenyuy@smu.edu.
Jackson Barth
Affiliation:
Department of Statistical Science, Southern Methodist University, Dallas, Texas, 75275; e-mail: jbarth@smu.edu.
Duwani Katumullage
Affiliation:
Department of Statistical Science, Southern Methodist University, Dallas, Texas, 75275; e-mail: dkatumullage@smu.edu.
Jing Cao*
Affiliation:
Department of Statistical Science, Southern Methodist University, Dallas, Texas, 75275
* e-mail: jcao@smu.edu (corresponding author).

Abstract

There is an ongoing debate on whether wine reviews provide meaningful information on wine properties and quality. However, few studies have directly compared the utility of wine reviews and numeric measurements in wine data analysis. Using data on close to 300,000 wines reviewed by Wine Spectator, we fit logistic regression models to investigate whether wine reviews are useful in predicting a wine's quality classification. We divide the sample into two quality brackets: wines with a critical rating of 90 or above, and wines with a rating of 89 or below. This binary outcome is our dependent variable. The explanatory variables include different combinations of numerical covariates, such as the price and age of wines, and numerical representations of text reviews. A comparison of the models' explanatory accuracy suggests that wine review descriptors are more accurate in predicting binary wine quality classifications than are various numerical covariates, including the wine's price. We consider three feature extraction methods for text analysis: latent Dirichlet allocation, term frequency-inverse document frequency, and Doc2Vec text embedding. Doc2Vec produces the highest classification accuracy among the three, which we attribute to its ability to use contextual information from text documents. (JEL Classifications: C45, C88, D83)
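To make the setup described in the abstract concrete, the following is a minimal sketch of one of the three pipelines: term frequency-inverse document frequency features fed into a logistic regression on the binary 90-point cutoff. It is an illustration under stated assumptions, not the authors' implementation. The six toy reviews and their ratings are invented for the example, the function names (`label`, `tfidf_matrix`, `fit_logistic`, `predict`) are hypothetical, the TF-IDF weighting shown is one common unsmoothed variant, and the model is fitted with plain gradient descent rather than whatever software the study used.

```python
import math
from collections import Counter

# Hypothetical toy reviews with ratings; NOT data from the study.
REVIEWS = [
    ("ripe and harmonious with a long elegant finish", 93),
    ("elegant structure and a long finish of dark fruit", 91),
    ("harmonious and ripe with elegant dark fruit", 92),
    ("simple and light with a short finish", 85),
    ("light and thin with simple fruit", 84),
    ("short and simple with thin structure", 86),
]


def label(rating, cutoff=90):
    """Binary quality class: 1 for ratings of 90 or above, 0 for 89 or below."""
    return 1 if rating >= cutoff else 0


def tfidf_matrix(docs):
    """Return (vocab, rows); each row holds one document's TF-IDF weights.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(docs)
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {word: math.log(n_docs / doc_freq[word]) for word in vocab}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        rows.append([counts[word] / len(tokens) * idf[word] for word in vocab])
    return vocab, rows


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def predict(w, b, x):
    """Classify as 1 when the fitted probability is at least 0.5."""
    return 1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5 else 0


texts = [text for text, _ in REVIEWS]
labels = [label(rating) for _, rating in REVIEWS]
vocab, X = tfidf_matrix(texts)
w, b = fit_logistic(X, labels)
predictions = [predict(w, b, x) for x in X]
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
print("in-sample accuracy:", accuracy)
```

The accuracy printed here is in-sample and on a toy corpus, so it says nothing about the results reported in the article. Note that a word appearing in every review (here "and") receives an inverse document frequency of zero and therefore contributes nothing to the classification, which is the intended effect of the weighting.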

Information

Type
Articles
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
Copyright © The Author(s), 2022. Published by Cambridge University Press on behalf of American Association of Wine Economists
Table 1 Wine Spectator Tasting Staff

Table 2 Wine Spectator Rating Categories and Distribution

Figure 1 Histogram of Wine Spectator Ratings

Table 3 Descriptive Statistics for the Data

Figure 2 Boxplots of Age and log(Age) by Rating Class

Figure 3 Boxplots of Price and log(Price) by Rating Class

Figure 4 LDA Architecture

Figure 5 Illustration of TF and TF-IDF Representation
Note: harmon* represents harmonious.

Figure 6 Illustration of Word2Vec and Doc2Vec

Table 4 Classification Results