
Predicting Thoroughbred Yearling Auction Prices with Machine Learning: Evidence from the Keeneland September Sale

Published online by Cambridge University Press: 26 June 2026

Yanchao Yang*, School of Business and Leadership, DePauw University, USA
John Clarke, EMR Metal Recycling, USA
Tapan Mandal, School of Business and Leadership, DePauw University, USA
Thinh Nguyen, School of Business and Leadership, DePauw University, USA
Trung Pham, School of Business and Leadership, DePauw University, USA
*Corresponding author: Yanchao Yang; Email: yyang056@ucr.edu

Abstract

We apply machine learning methods to predict Thoroughbred yearling auction prices at the Keeneland September Sale (2020–2024). Our sample comprises 5,788 yearling sale prices accompanied by pedigree data. We fit both linear and tree-based models to predict log prices. We tune hyperparameters by cross-validation and select Ridge regression (α = 1.451) as the primary model for interpretation because of its stability and interpretability. The Ridge model explains approximately 54% of out-of-sample variation in log prices (R² ≈ 0.5403). Sire and dam reputation emerge as the dominant predictors. The results provide pricing benchmarks and show how reputation and session structure shape Thoroughbred yearling auction prices.
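The modeling pipeline summarized in the abstract (Ridge regression on log prices, α chosen by cross-validation, performance reported as out-of-sample R²) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses synthetic data with four hypothetical continuous predictors (think of the first two as stand-ins for sire and dam reputation scores), a closed-form Ridge solver written in NumPy, 5-fold cross-validation, and an assumed α grid. The α it selects is not comparable to the paper's α = 1.451, which depends on the authors' actual features and scaling.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form Ridge fit on standardized features; the intercept is not penalized."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0)
    x_std[x_std == 0] = 1.0  # guard against constant columns
    Z = (X - x_mean) / x_std
    y_mean = y.mean()
    p = Z.shape[1]
    # Solve (Z'Z + alpha*I) beta = Z'(y - mean(y)) for the slope coefficients
    beta = np.linalg.solve(Z.T @ Z + alpha * np.eye(p), Z.T @ (y - y_mean))
    return x_mean, x_std, y_mean, beta


def ridge_predict(model, X):
    """Predict log price by applying the training-set standardization to X."""
    x_mean, x_std, y_mean, beta = model
    return y_mean + ((X - x_mean) / x_std) @ beta


def r2_score(y, y_hat):
    """Coefficient of determination on the supplied (e.g. held-out) data."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def cv_select_alpha(X, y, alphas, k=5, seed=0):
    """Return the alpha with the lowest mean k-fold validation MSE."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    mean_mse = []
    for alpha in alphas:
        fold_mse = []
        for i in range(k):
            val = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            model = ridge_fit(X[train], y[train], alpha)
            fold_mse.append(np.mean((y[val] - ridge_predict(model, X[val])) ** 2))
        mean_mse.append(np.mean(fold_mse))
    return alphas[int(np.argmin(mean_mse))]


# Hypothetical synthetic data: 4 predictors, target = log sale price.
rng = np.random.default_rng(42)
n = 1000
X = rng.normal(size=(n, 4))
true_beta = np.array([0.8, 0.5, 0.1, 0.0])
log_price = 10.0 + X @ true_beta + rng.normal(scale=1.0, size=n)

# Hold out 20% for out-of-sample R^2; tune alpha by CV on the remaining 80% only.
idx = rng.permutation(n)
train_idx, test_idx = idx[:800], idx[800:]
alphas = np.logspace(-2, 3, 20)  # assumed search grid
best_alpha = cv_select_alpha(X[train_idx], log_price[train_idx], alphas)
model = ridge_fit(X[train_idx], log_price[train_idx], best_alpha)
test_r2 = r2_score(log_price[test_idx], ridge_predict(model, X[test_idx]))
print(f"alpha = {best_alpha:.3f}, out-of-sample R^2 = {test_r2:.3f}")
```

Standardizing the predictors before penalizing keeps the Ridge penalty from depending on each variable's units, and leaving the intercept unpenalized means shrinkage acts only on the slope coefficients. Keeping the test rows out of the cross-validation loop is what makes the reported R² an out-of-sample figure.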

Information

Type
Research Article
Creative Commons (CC BY)
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2026. Published by Cambridge University Press on behalf of Southern Agricultural Economics Association

Table 1. Summary statistics for auction prices, pedigree performance, dosage profile, and reputation metrics.


Figure 1. Sale price distribution by session.


Table 2. Model performance comparison


Table 3. Ridge model specification


Figure 2. Coefficients from the Ridge model.


Table 5. Session-level model performance (ranked by R²)


Table 6. Session-level Ridge regression coefficients


Table 4. Model diagnostic test results

Supplementary material: File

Yang et al. supplementary material

File 252.3 KB