Wine Review Descriptors as Quality Predictors: Evidence from Language Processing Techniques

Abstract
There is an ongoing debate on whether wine reviews provide meaningful information on wine properties and quality. However, few studies have been conducted aiming directly at comparing the utility of wine reviews and numeric measurements in wine data analysis. Based on data from close to 300,000 wines reviewed by Wine Spectator, we use logistic regression models to investigate whether wine reviews are useful in predicting a wine's quality classification. We group our sample into one of two binary quality brackets, wines with a critical rating of 90 or above and the other group with ratings of 89 or below. This binary outcome constitutes our dependent variable. The explanatory variables include different combinations of numerical covariates such as the price and age of wines and numerical representations of text reviews. By comparing the explanatory accuracy of the models, our results suggest that wine review descriptors are more accurate in predicting binary wine quality classifications than are various numerical covariates, including the wine's price. In the study, we include three different feature extraction methods in text analysis: latent Dirichlet allocation, term frequency-inverse document frequency, and Doc2Vec text embedding. We find that Doc2Vec is the best performing feature extraction method that produces the highest classification accuracy due to its capability of using contextual information from text documents. (JEL Classifications: C45, C88, D83)


I. Introduction
Wine is one of the most popular beverages in the world. For customers who have limited knowledge of wines, choosing a wine from the vast number of different wines can be overwhelming. Information on the vintage, grape type, and region of wine can be helpful, but important aspects such as flavor and aroma cannot be accurately described using numeric measurements. To provide more detailed information, professional wine magazines and wine websites publish wine reviews either by well-respected wine critics or by general customers. Wine reviews provide a useful source of information on sensory properties of wines (e.g., flavor, color, aroma), objective characteristics (e.g., color, grape type), as well as an overall evaluation of wines' quality.
For most wine consumers, wine ratings offer a simple summary of wine quality. The 100-point wine rating system has become the benchmark in the wine industry, where most wines are rated in a much narrower range between 80 and 100 points. Wine ratings may play an important role in customers' purchase intent of wines. It is very common, for example, to see wine in a local store advertised as a "92-point" wine. Some researchers have examined the relationship between wine ratings and other numeric variables (e.g., price, age, and vintage). Few studies, however, have investigated the utility of wine text reviews. We analyze whether text reviews can provide more useful information than numeric measurements by conducting a classification study on ratings of wines based on their text reviews and numeric measurements.
Using only numeric variables, Dubois and Nauges (2010) found a positive relationship between expert ratings and wine prices based on a structural model. Hilger, Rafert, and Villas-Boas (2011) showed that demand decreases for low-scoring wines, and demand increases for wines scoring average or higher. Massett, Weisskopf, and Cossutta (2015) found that ratings by reviewers of Wine Spectator have a positive association with wine prices. In a different study, however, Ashenfelter and Jones (2013) presented evidence that expert wine ratings, while not completely worthless, are not significantly associated with price. The authors also showed that models with inputs on the weather during the vintage year could provide additional predictive power. Ashenfelter (2010) discovered that the red Bordeaux wines with a vintage that corresponded to a year with a warm growing season had better quality on average.
Compared to numeric variables, which can be easily incorporated into statistical methods, text data create a unique challenge in data analysis. Text data are of a nonstandard format: they are neither numeric nor of the same length, making it difficult to include them in conventional statistical models. Because of this, the first step in most natural language processing (NLP) is feature extraction, which summarizes relevant and meaningful information from a text document into a numerical representation. Such extracted features from a text can serve as input factors to be included in further statistical analysis. Significant progress has been made in feature extraction techniques, including the bag-of-words (BoW) model, the one-hot encoded vector representation, and the more recent developments in machine learning (e.g., Word2Vec and Doc2Vec). This progress, along with the publicly available product reviews on the internet, has contributed to an increase in text analysis and NLP methods applied to consumer product reviews.
In this study, we apply three types of commonly used feature extraction methods, which are representative of the broad range of complexities in text analysis, and compare their performance in a classification task of wine ratings. The first one is the latent Dirichlet allocation (LDA) model (Blei, Ng, and Jordan, 2003). LDA is an unsupervised technique that first identifies a pre-selected number of different topics (such as color, flavor, aroma in a wine review text), each represented by a group of words that appear together frequently. It then identifies the relative strength of the topics in each text document, measured by a number between 0 and 1. While LDA can be useful in some studies, such as cataloging text documents, it may not be a good candidate to extract information and opinion embedded in a text. The second feature extraction tool implemented is the term frequency-inverse document frequency (TF-IDF), first proposed by Jones (1972). TF-IDF is a BoW model, which essentially measures the frequency of certain words in a text document relative to the frequency of the word across the corpus. The BoW model is based on a simple idea, and it is easy to implement. However, it discards information on the order and structure of words in the document, limiting its ability to detect important aspects of meaning from the content. The third feature extraction method included for this study is Doc2Vec, which employs a machine learning prediction algorithm to extract contextual information from the text (Le and Mikolov, 2014). More details for the three feature extraction tools are provided in the methodology section.
There are a number of text analysis studies focused on wine reviews. In a study of Napa Valley wines, Ramirez (2010) examined the relationship between the length of text reviews and the price of wines. Using a dynamic price model that accounts for factors such as rating and maturity, Ramirez showed that a 10% increase in the text length corresponds to a $2-4 increase in the price per bottle. Hendrickx et al. (2016) used both LDA and Word2Vec to classify wines based on color, grape variety, country of origin, and price. In a similar study, Croijmans et al. (2019) used principal component analysis to show consistency in the language used by Wine Enthusiast reviewers. Buccafusco, Masur, and Whalen (2021) applied computational linguistic analysis to chateau names in the Bordeaux wine region to study the degree of brand congestion within a mature, traditional, and high-value market. Capehart (2021a, 2021b) recently conducted two related studies on whether words previously identified as expensive and cheap ones are indicative of a wine's price. In the more recent development of wine text analysis, McCannon (2020) constructed predictive models for wine prices based on numeric covariates and text features extracted using LDA. McCannon found that text review appears to be a significant predictor of wine price on its own, but the effect often disappears once wine type and rating are included in the model. Chen et al. (2018) performed a binary classification of wine ratings based on wine reviews collected from Wine Spectator. Using a BoW approach for the text analysis, they applied the Naïve Bayes and the Support Vector Machine (SVM) methods to classify the reviews in two classes based on wine ratings (i.e., 90 and above or 89 and below). They found that the SVM model outperforms the Naïve Bayes method in terms of classification accuracy.
There is an ongoing debate on whether wine reviews provide meaningful information on wine properties and quality. Storchmann (2012) provided a review of the work studying the role of expert opinion. Gawel (2007) suggested that wine experts tend to use vague and abstract terms (e.g., complex, attractive, etc.) when describing wines. Quandt (2007) presented examples of the "bullshit" found in wine reviews by comparing legitimate professional wine reviews to random, artificial reviews generated from a wine lexicon. Weil (2007) conducted an experiment to show that wine consumers cannot match critics' descriptions of wines. Klimmek (2013) provided a new metric to distinguish meaningful wine reviews from redundant wine reviews, citing that reviews with a higher level of specificity tend to be more informative. On the other hand, there is evidence showing that reviews written by wine experts are more accurately matched to wines than those by novices (Solomon, 1997). Croijmans and Majid (2016) suggested that experts can accurately describe flavors of wine and coffee. Using a mixture ANOVA model, the authors found that professional wine and coffee tasters are in general more consistent than novices in their description of aroma and flavor. However, to the best of our knowledge, there is no specific study aiming directly at comparing the contribution of two different types of information, that is, numeric variables and text reviews, on the classification of wine ratings.
In this study, we investigate whether wine reviews provide more useful information than numeric measurements in the classification of wine ratings. The study is conducted on a large data set that contains 271,461 wine reviews from Wine Spectator. We address the following questions. (1) Which type of variable is more informative: numeric variables or feature variables extracted from wine reviews? (2) Which of the three feature extraction methods performs best, as measured by classification accuracy? (3) Do the two types of variables (i.e., numeric and text) contain unique information, and does using both in the classification produce a better result? To address these questions, we create a binary outcome variable based on the wine ratings, where one group contains ratings of 90 or above (Class A) and the other group contains ratings of 89 or below (Class B). We then construct a number of logistic regression models in which the explanatory variables include different combinations of numerical covariates, such as the price and age of wines, and numerical representations of text reviews. By comparing the models' classification accuracy under 10-fold cross-validation, where accuracy is the rate of correct classifications in the test set, we demonstrate that wine reviews contain more useful information than the numerical covariates in the classification of wine ratings.
We choose to use logistic regression as the modeling approach to compare the utility of numeric variables and text reviews for the following reasons. First, wine ratings are subjective measurements on wine quality, where there are random differences in ratings that are close together, particularly when they belong to the same types of wines. A binary classification reduces the influence of random error in subjective wine ratings by grouping the ratings into two classes. Second, our goal is to compare the amount of information carried by numeric variables and text reviews; it is not to find the optimal prediction model on wine ratings. The logistic regression model provides a convenient and straightforward platform to conduct the comparison.
In the next section, we will describe the dataset and data preparation steps used for the analysis. The details of the three text feature extraction methods will be covered in Section III. The summary of the comparison of the classification study results will be presented in Section IV. The final section provides a discussion of the findings and limitations of the study as well as directions for future research.

II. Data
The dataset used in the study contains 271,461 wine reviews from 10 reviewers published in Wine Spectator. The journal has the highest circulation of any wine magazine in the United States. Each year, its editors choose more than 15,000 wines to review with detailed tasting notes, numeric ratings, and recommendations. The reviews are provided by professional wine experts. The tastings are conducted under controlled conditions where the reviewers are only aware of the general type of the wine and its vintage. Additional information about how tastings are conducted can be found at https://www.winespectator.com/articles/about-our-tastings.
The dataset contains almost all reviews published on the Wine Spectator website (winespectator.com) from 1983 to June 2020. Wines with a price of $1,000 or higher or an age of 50 years or more were excluded from the analysis to reduce the impact of potentially influential data points. In addition, we only include reviews from reviewers who have made frequent contributions to wine tasting. Table 1 shows the Wine Spectator reviewers included in the dataset.
In this study, we construct logistic regression models to predict wine ratings as a binary outcome. To determine the cutoff point for the binary classification of wine ratings, we include the summary of the distribution of wine ratings provided by Wine Spectator (Table 2). Note that 50 is the lowest possible score.
The center of the distribution falls in the 85-89 range, where the median is 88, and the mean is 87.55, so it follows logically that the cutoff point between the two classes should be close to this score range. We place the cutoff point at 89/90, that is, all the ratings between 90-100 are defined as Class A, and all the ratings between 50-89 are defined as Class B. The same cutoff value was used by Chen et al. (2018). Figure 1 shows the distribution of the ratings. The distribution is slightly left-skewed and centered around 88, where roughly 68% of the reviews are in Class B.
In addition to the text reviews and ratings, the data also include variables such as price, review year, vintage, reviewer, and (in some cases) country and wine type. We have created a new variable age, calculated as review year - vintage (this would represent the length of time in years between the time the wine was bottled and the time it was tasted). Records with missing price or vintage were excluded. Table 3 contains the summary statistics for rating, age, and price of the wines in the dataset.
Both the descriptive statistics and the boxplots in Figures 2 and 3 show that age and price are right-skewed. Heavily skewed data can create influential observations that can lead to biased inference results. For this reason, we have applied the log transformation to both variables. Note that the age of a wine is 0 if the wine's vintage and review year are the same, in which case log(age) is undefined. To avoid this, we have applied the log(age + 1) transformation to age instead. Figures 2 and 3 show the boxplots of both variables before and after the log transformation, separated by rating class. The distribution for each variable is slightly higher for
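The data preparation steps described in this section can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names and the record layout are hypothetical, and the natural logarithm is assumed since the base of the log transformation is not stated.

```python
import math

def keep_record(price, review_year, vintage):
    """Apply the stated exclusions: missing price or vintage,
    price of $1,000 or higher, or age of 50 years or more."""
    if price is None or vintage is None:
        return False
    return price < 1000 and (review_year - vintage) < 50

def prepare_record(rating, price, review_year, vintage):
    """Derive the modelling variables for one wine (hypothetical layout)."""
    age = review_year - vintage            # years between vintage and tasting
    return {
        "class_a": rating >= 90,           # Class A: 90-100, Class B: 50-89
        "log_age": math.log(age + 1),      # shifted so that age = 0 is defined
        "log_price": math.log(price),      # natural log assumed
    }

example = prepare_record(rating=92, price=25.0, review_year=2020, vintage=2020)
```

A wine tasted in its vintage year has age 0, so its transformed age is log(0 + 1) = 0 rather than an undefined value.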

III. Text Feature Extraction Methods
The first step in most NLP tasks is feature extraction, where a text document is converted to some form of numeric representation to be used in further data analysis. In this study, we perform three feature extraction methods: LDA, TF-IDF, and Doc2Vec. The extracted features are then used as input by a logistic regression classifier, both on an individual basis and in conjunction with numeric variables (i.e., age and price), to compare their performance in the classification of wine ratings.

A. Latent Dirichlet Allocation
LDA is a three-level hierarchical Bayesian model that is based on a very simple intuition: within a corpus, there is a fixed number of topics, each topic can be described as a distribution of words, and each document can be described as a distribution of topics (Blei, Ng, and Jordan, 2003). LDA assumes that the generation of a document adopts a generative process. The process starts with generating a random topic from the distribution of a fixed number of topics, where each topic is associated with a probabilistic distribution of words. Next, a word is generated according to the distribution of words of the chosen topic. Finally, documents are then produced by repeating this generative process. Both the topic distribution and the word distribution within a topic follow a Dirichlet distribution, a multivariate generalization of the beta distribution. Figure 4 provides a graphic illustration of the LDA architecture.
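The generative process described above can be written compactly as follows. The symbols are common LDA notation introduced here for illustration and are not taken from this paper: K is the number of topics, φ_k the word distribution of topic k, θ_d the topic distribution of document d, z_{d,n} the topic assigned to the n-th word of document d, w_{d,n} that word, and α and β the Dirichlet parameters.

```latex
\begin{align*}
\varphi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1, \dots, K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1, \dots, D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d) \\
w_{d,n} \mid z_{d,n}, \varphi &\sim \mathrm{Categorical}(\varphi_{z_{d,n}})
\end{align*}
```

Under this notation, the document representation used later in the study corresponds to the estimated θ_d for each review.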
At the top layer, the corpus consists of a collection of reviews, and each review is composed of sequential input of words. In the second layer, each review is decomposed into a distribution of topics. Each topic is then further decomposed into a distribution of words. The training of the LDA model involves finding the distribution of topics, and within each topic, the distribution of words. Mathematically, this process aims to derive a joint posterior probability that involves both the distribution of topics and the topic-specific distributions of words. Typically, this probability is intractable and must be approximated with some known probability distributions. Python's "gensim" package conducts this variational inference by minimizing some notion of distance (e.g., the Kullback-Leibler divergence) between the true posterior and its approximation.
Overall, the LDA algorithm takes in the assumed number of topics, trains the model over the corpus, and converts each document into a discrete distribution of topics. We will use these distributions of topics as our document representations.

B. Term Frequency-Inverse Document Frequency
Term frequency representation transforms a document into a vector of count numbers that measure how frequently words/terms occur in a document. The TF method stems from the simple idea that the frequencies of terms could serve as a quantitative representation of the document. One problem with TF is that certain words tend to appear very often but contain little domain-specific information. Such words include function words that are not context-specific (e.g., "the," "a," and "is"). Some domain-specific words could also fall under this category. In the case of wine review, words such as "wine" or "taste" probably tell us little about the quality of the wine. The TF-IDF method is introduced to mitigate the effect of high-frequency function words, where the term frequency is weighted down by how often it is used. This weight is defined as:

$$\mathrm{IDF}(\text{term}) = \ln\left(\frac{\text{Number of documents}}{\text{Number of documents containing term}}\right)$$

Figure 5 provides an example of the TF and TF-IDF methods, where two reviews are converted to the TF and TF-IDF representation, respectively. Notice that the function words "and" and "is" are the only two words that appeared in both reviews, and both words are being more penalized for their prevalence in the corpus.
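A minimal sketch of the TF-IDF weighting, using the IDF definition given above, may help make the penalty concrete. The two short reviews are hypothetical and are not those of Figure 5, and library implementations often add smoothing or normalization that this sketch omits.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists; the weight of a term is its count in
    the document times IDF(term) = ln(N / number of documents containing it).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))       # count each term once per document
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append(
            {term: count * math.log(n_docs / doc_freq[term])
             for term, count in tf.items()}
        )
    return weights

# Hypothetical two-review corpus.
reviews = [
    "wine is rich and spicy".split(),
    "finish is long and harmonious".split(),
]
weights = tf_idf(reviews)
```

In this two-document corpus, "is" and "and" occur in both reviews, so their IDF is ln(2/2) = 0 and they drop out of the representation, while a word found in only one review receives the weight ln(2).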
The TF-IDF approach has several advantages. First, it is easy to implement. Second, TF-IDF prevents the representation from being inflated by high-frequency function words. And last, the extracted information can be readily used to compute the similarities between two documents. However, TF-IDF also has a substantial disadvantage. It is based on the BoW model, which assumes that a text document is represented as a set of words, ignoring its contextual structure and word order. This inherent shortcoming prevents the BoW-based approach from exploiting the contextual information in the document, which can be very valuable in text analysis.
Both TF and TF-IDF produce very sparse document representations. The wine review corpus includes over 13,000 unique words, and each document typically consists of fewer than 100 words. Therefore, each document vector is made of mostly zero entries that contain no information. Truncated singular value decomposition (Truncated SVD) (Hansen, 1987) is employed to compress these long and sparse vectors into short and dense vectors. Singular value decomposition factorizes a rectangular data matrix as the product of three constituent parts, typically denoted as A = U Ω V^T, where A is the data matrix with the rows as observations and the columns as the features. If A has a dimension of n by m, then U would be an n by n matrix, Ω an n by m rectangular diagonal matrix, and V an m by m matrix. The Truncated SVD retains the k largest singular values in Ω, the first k columns of the U matrix, and the first k columns of the V matrix. As a result of truncation, the reconstructed truncated data matrix A_k is an n by k matrix with rank k.
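The truncation can be stated in matrix form as follows. The subscripted symbols U_k, Ω_k, and V_k are notation introduced here for the retained parts, and the decomposition is written with the usual transpose convention; reading A_k as the product U_k Ω_k is an interpretation of the n by k matrix described above.

```latex
\begin{align*}
A &= U \,\Omega\, V^{\top},
  & U &\in \mathbb{R}^{n \times n},\;
    \Omega \in \mathbb{R}^{n \times m},\;
    V \in \mathbb{R}^{m \times m} \\
A &\approx U_k \,\Omega_k\, V_k^{\top},
  & U_k &\in \mathbb{R}^{n \times k},\;
    \Omega_k \in \mathbb{R}^{k \times k},\;
    V_k \in \mathbb{R}^{m \times k} \\
A_k &= U_k \,\Omega_k = A \, V_k \in \mathbb{R}^{n \times k}
\end{align*}
```

Each row of A_k is then the short, dense representation of one document.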

C. Word2Vec and Doc2Vec Representation
In contrast to frequency-based feature extraction, an alternative is a prediction-based approach, which extracts features to capture the contextual information of texts by applying neural networks to perform a prediction task. Word2Vec is a prediction-based method first proposed by Mikolov et al. (2013). Subsequently, Doc2Vec was introduced by Le and Mikolov (2014) as a generalization of Word2Vec, which performs better than Word2Vec in many specific scenarios.
Word2Vec and Doc2Vec models take a corpus of review texts as input and produce a vector space. Each word and document has a unique representation in this vector space where word similarities are represented by measures of distances like the cosine similarity, as shown in Figure 6. It is suggested that this approach allows word vectors to retain syntax and semantics. Word2Vec can be viewed as an improvement over TF-IDF in the sense that Word2Vec exploits contextual information that TF-IDF ignores.
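For reference, the cosine similarity between two vectors can be computed as in the short sketch below. The three-dimensional word vectors are made-up values for illustration only and do not come from a trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical word vectors (illustrative values only).
vec_cherry = [0.9, 0.1, 0.3]
vec_plum = [0.8, 0.2, 0.4]
vec_tannic = [-0.2, 0.9, 0.1]

sim_fruit = cosine_similarity(vec_cherry, vec_plum)
sim_mixed = cosine_similarity(vec_cherry, vec_tannic)
```

Vectors pointing in similar directions, such as the two fruit terms in this toy example, score close to 1, whereas unrelated directions score near 0.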
Word2Vec utilizes a two-layer neural network structure to train the word vectors. Each document is first converted into a one-hot encoded input vector, then compressed to a hidden layer of a pre-specified number of neurons, and finally passed on to an output layer. Word2Vec comes in two forms, the continuous bag-of-words (CBoW) model and the skip-gram model. The CBoW model randomly masks some words from the input and uses the words adjacent to the masked word (i.e., surrounding words) to predict the masked words. The skip-gram model is constructed to predict the surrounding words given the target words. Word vectors are first assigned with randomly generated weights, and through appropriate tuning and training, the algorithm eventually learns meaningful representations of words. Then, document vectors can be derived by averaging over the word vectors of words in a specific document.
[Figure note: harmon* represents harmonious.]
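The difference between the two Word2Vec forms lies in how training examples are built from a sliding context window, which the following sketch illustrates. It only constructs the (input, label) pairs and does not train a network; the helper names, the window size, and the sample phrase are assumptions made for illustration.

```python
def context_windows(tokens, window=2):
    """Yield (target, surrounding_words) for every position in the tokens."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield target, left + right

def cbow_examples(tokens, window=2):
    """CBoW: the surrounding words are the input, the target is the label."""
    return [(context, target)
            for target, context in context_windows(tokens, window)]

def skipgram_examples(tokens, window=2):
    """Skip-gram: the target is the input, each surrounding word is a label."""
    return [(target, word)
            for target, context in context_windows(tokens, window)
            for word in context]

tokens = "ripe cherry and spice flavors".split()
cbow = cbow_examples(tokens, window=1)
skipgram = skipgram_examples(tokens, window=1)
```

With a window of one word, the CBoW example for "cherry" takes ["ripe", "and"] as input, while skip-gram produces the two pairs ("cherry", "ripe") and ("cherry", "and").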
Word2Vec is capable of encapsulating the semantic information of words, and thus it can be used to compare text similarities. However, it often falls short of assessing document similarities: the average pooling of word vectors is not an ideal approach to generate document embeddings because the word-ordering information is lost. Built on the idea of the Word2Vec model, the Doc2Vec model introduces a document-specific ID vector in addition to the word vectors. Instead of just using word vectors to perform the prediction task, Doc2Vec also includes this document-specific vector alongside the word vectors. This document-specific vector produces document embedding at the end instead of averaging over the word vectors. It has been shown to be a more appropriate choice for many document-level tasks.
Like Word2Vec, Doc2Vec can also be implemented by two models. The first is the distributed memory (DM) model, which is like the CBoW model in Word2Vec. DM is trained to predict the target words from its neighboring words with the addition of the document-specific ID vector. The second one is the distributed bag-of-words (DBoW) model, the Doc2Vec equivalent of the skip-gram model, which aims to predict the surrounding words from a target word, again with the addition of the document-specific ID vector. Le and Mikolov (2014) suggested that DM performs slightly better than DBoW in many instances. As a result, the DM model is implemented for Doc2Vec in this study.
[Figure: Illustration of Word2Vec and Doc2Vec]
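The loss of word order under average pooling can be seen in a small sketch. The two-dimensional word vectors and the two phrases are hypothetical; the point is only that reordering the words leaves the averaged document vector unchanged.

```python
def average_vector(tokens, word_vectors):
    """Average the word vectors of the tokens in one document."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for token in tokens:
        for j, value in enumerate(word_vectors[token]):
            total[j] += value
    return [value / len(tokens) for value in total]

# Hypothetical word vectors (illustrative values only).
word_vectors = {
    "cherry": [1.0, 0.0],
    "before": [0.0, 2.0],
    "oak": [-1.0, 0.5],
}

doc_a = "cherry before oak".split()
doc_b = "oak before cherry".split()
same_embedding = average_vector(doc_a, word_vectors) == average_vector(doc_b, word_vectors)
```

Both phrases map to the same averaged vector even though they describe different tasting sequences, which is the limitation that a document-specific vector is meant to address.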

D. Implementation
All three feature extraction models are implemented with Python's "gensim" package. We first preprocess the data by reducing all letters to lower case, which is known as case-folding. Case-folding allows a word (e.g., Spicy) at the beginning of a sentence to be recognized as the same word (i.e., spicy) that appears in the middle of a sentence. We also remove numbers and punctuation from the reviews. The numbers are removed because, in our dataset, they are only used to describe the number of cases imported, which is irrelevant to the sensory information of the wines. We removed punctuation because we believe it to contain little information regarding the overall sentiment of the reviews. As for LDA, in addition to numbers and punctuation, we also remove the high-frequency function words. Otherwise, LDA might have trouble generating a valid distribution of the topics and the associated word distributions of the topics. By contrast, both TF-IDF and Doc2Vec have a built-in mechanism to deal with these high-frequency words. Because of this, we do not have to manually remove them during data preprocessing. While words with high frequencies are weighed down in TF-IDF, such words are under-sampled in Doc2Vec during the training process so that they appear less frequently as the prediction targets.
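The preprocessing steps above can be sketched with the standard library as follows. This is an illustrative approximation rather than the code used in the study: the regular expressions, the function name, and the abbreviated function-word list are assumptions.

```python
import re

# Hypothetical, abbreviated list of high-frequency function words.
FUNCTION_WORDS = {"the", "a", "is", "and", "of"}

def preprocess(review, drop_function_words=False):
    """Case-fold a review, strip numbers and punctuation, and tokenize."""
    text = review.lower()                    # case-folding
    text = re.sub(r"[0-9]+", " ", text)      # remove numbers
    text = re.sub(r"[^\w\s]", " ", text)     # remove punctuation
    tokens = text.split()
    if drop_function_words:                  # extra step used for LDA only
        tokens = [t for t in tokens if t not in FUNCTION_WORDS]
    return tokens

sample = "Spicy and rich. 500 cases imported."
tokens_general = preprocess(sample)
tokens_for_lda = preprocess(sample, drop_function_words=True)
```

Here "Spicy" is folded to "spicy", the case count is dropped, and the function word "and" is removed only in the LDA variant.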
Furthermore, the truncated SVD is implemented to shrink the dimension of document vectors down to 50 in TF-IDF. Note that LDA requires the number of topics to be pre-determined. We have tried a range of different values for the number of topics in LDA (i.e., from 2 to 50) and generated document representations based on each of these values. We find that the performance on the LDA representation is insensitive to the choice of the number of topics. Therefore, we use five as the number of topics in LDA instead of listing the results from the other values.
Based on the implementation of these three methods following the steps described earlier, the feature extracted under LDA is represented by a 5-dimensional vector for each text, while the features extracted under TF-IDF and Doc2Vec are each represented by a 50-dimensional vector.

IV. Classification Results
We created a binary outcome variable based on the wine ratings where one group contains ratings of 90 or above (Class A) and the other group ratings of 89 or below (Class B). This is our dependent variable. We then constructed logistic regression models where the explanatory variables include different combinations of numerical covariates, such as price and age of wines, and the extracted features of text reviews. The extracted features of the text are abstract and do not have a straightforward interpretation as the numeric variables do. Note that the focus of the study is to compare the utility of the two types of variables by searching for the best logistic model (i.e., classifier) that can yield the most accurate classification of the binary wine rating outcome. We are therefore not interested in the estimates of the regression coefficients or their interpretations. The ultimate measure is classification accuracy. All the logistic models are implemented with 10-fold cross-validation, which produces an out-of-sample estimate of the classification accuracy to obtain an objective evaluation of the models' performance.
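The evaluation procedure can be illustrated with a self-contained sketch of logistic regression scored by k-fold cross-validation. It is not the implementation used in the study, which is not specified: the gradient-descent fit, the learning rate, the number of epochs, the fold assignment, and the synthetic one-feature data are all assumptions made for illustration.

```python
import math
import random

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights (intercept first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi   # predicted prob. minus label
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def predict(w, xi):
    """Predict class 1 when the estimated probability is at least 0.5."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1 if z >= 0 else 0

def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean out-of-sample accuracy over k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]           # k disjoint test folds
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        w = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(w, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

# Synthetic, well-separated one-feature data (1 = Class A, 0 = Class B).
X = [[-1 - i / 20] for i in range(1, 21)] + [[1 + i / 20] for i in range(1, 21)]
y = [0] * 20 + [1] * 20
accuracy = cross_val_accuracy(X, y, k=10)
```

Each observation is used for testing exactly once, so the averaged accuracy reflects performance on data the fitted model has not seen.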

A. Single-Variable Classification
First, we fit a logistic regression model on each of the variables to get respective baseline prediction accuracy levels, where the variable can be a single numeric variable or a feature vector. Table 4 shows the results in the ascending order of classification accuracy (single-variable classifications are 1, 2, 4, 6, and 7). Regarding the two numeric variables (i.e., age and price), price alone yields a much better classification accuracy than the variable age. Recall that the dataset is split into Class A, which contains 32% of all the wines, and Class B, which contains 68%. This unbalanced class proportion indicates that age is adding very little value to the classification since its performance (Model 1) is only slightly better than the 68% baseline. Price (Model 4), on the other hand, provides a 9% boost in accuracy over the 68% baseline.
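The 68% baseline corresponds to a classifier that always predicts the larger class. A minimal sketch, using a hypothetical sample of 100 labels with the class proportions reported above:

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# Hypothetical sample mirroring the reported split: 68% Class B, 32% Class A.
labels = ["B"] * 68 + ["A"] * 32
baseline = majority_baseline_accuracy(labels)
```

Any model is informative only to the extent that its accuracy exceeds this baseline, which is why the result for age alone indicates little added value.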
Among the feature extraction methods, LDA does not perform well (Model 2). With a classification accuracy rate of 69.5%, only 1.5% above the baseline of 68%, LDA performs substantially worse than the model with price alone. This poor performance can be attributed to the fact that LDA is not designed for such classification/prediction tasks. LDA identifies latent topics in text documents. Such topics are useful in cataloging documents, but they provide little information on how text reviews are related to their ratings. For example, the LDA model might tell us that the topic "flavor" is very likely present in a review, but it is not able to tell us the sentiment associated with it (i.e., if the taste is good or bad). The other two text representations, on the other hand, yield substantially better classification accuracy rates, outperforming the one based on price alone. In particular, the Doc2Vec feature (Model 7) introduces a roughly 15% boost from the baseline, an increase of 14% in classification accuracy from that of the LDA model, 6.4% from that of price, and about 2% from that of TF-IDF (Model 6). This result is not surprising, as the contextual information that is best captured by the Doc2Vec algorithm is most relevant to the wine ratings. The comparison of classification performance based on individual variables suggests that the performance of text-based variables depends on the choice of feature extraction methods, and the text-based variables generated from an appropriate feature extraction method do carry more useful information than the numerical variables.

B. Multiple-Variable Classification
Next, we include both types of variables in the model to find out whether the model performance can be further improved. As shown in Table 4, the best performing model among all the fitted models is Model 9, which includes age, price, and the Doc2Vec embedding. It outperforms the single variable model with the Doc2Vec embedding (Model 7) by more than 2%. This suggests that while text-based variables are more useful than the numeric variables from the individual contribution perspective, the numeric variables still add additional information leading to an improvement in the classification results. A similar conclusion can be made between Model 8 and Model 6 for the TF-IDF embedding. Note that the gap in accuracy between Doc2Vec embedding (Model 7) and the TF-IDF embedding (Model 6) shrinks with the inclusion of price and age, that is, Model 7 with Doc2Vec has an edge of 1.9% over Model 6 with TF-IDF while Model 9 (Doc2Vec with age and price) only has 1.3% over Model 8 (TF-IDF with age and price). This suggests that some of the advantages in the predictive power that Doc2Vec holds over TF-IDF can probably be explained by the two numeric variables (price and age).
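Combining the two types of variables amounts to concatenating the numeric covariates with the text feature vector before fitting the classifier. The sketch below assumes the log-transformed age and price and a 50-dimensional text vector; the function name and the ordering of the inputs are hypothetical.

```python
def combine_features(log_age, log_price, text_vector):
    """Concatenate the numeric covariates with a text feature vector."""
    return [log_age, log_price] + list(text_vector)

# Placeholder 50-dimensional text embedding (illustrative values only).
text_vector = [0.0] * 50
row = combine_features(log_age=1.1, log_price=3.2, text_vector=text_vector)
```

Under these assumptions, a model using age, price, and a 50-dimensional embedding has 52 explanatory variables per wine.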

V. Discussion
In this paper, we have compared the utility of text reviews and numeric variables in a classification study of wine ratings. By comparing the classification accuracy based on logistic regression models with the 10-fold cross-validation, we have demonstrated that professional wine reviews can be more useful than some common numeric variables such as age and price of wines in wine data analysis. Furthermore, incorporating both text reviews and numeric variables can yield improved analysis results over the analysis with only one type of variable. It is also worth mentioning that different text analysis methods are designed to address different purposes. It is important to choose a suitable feature extraction method for a particular study. Otherwise, the result can be misleading, and the advantage of text analysis may not be fully exploited. Specifically, we have found that LDA, which is used to find the distribution of topics in the text, may be an improper feature extraction method in this kind of classification study. On the other hand, the BoW-based method (TF-IDF) and the prediction-based method (Doc2Vec), where the extracted features are more related to the content of texts, both demonstrate their utility in the study. Given the sheer size of the dataset, spanning close to 300,000 wines over almost four decades of wine tastings, these results can be generalized to similar data sets or studies with a high level of confidence.
One potential limitation in this study is the small number of numeric variables included in the logistic models. Variables such as wines' physicochemical measurements, grape varieties, and production sites are not available for the data used in this analysis. If we can obtain information on some of those variables, we will conduct a more thorough study of this kind. Another direction we consider is to investigate the utility of wine reviews provided by general customers to find out whether they are significantly different from the professional wine reviews.