Abstract
Background: Batch reactor process optimization has traditionally relied on Analysis of Variance (ANOVA) for factor effect quantification. However, Structural Equation Modeling (SEM) and machine learning (ML) offer complementary mechanistic and predictive capabilities that remain underexplored in chemical engineering applications. This study presents a methodological triangulation framework comparing ANOVA, SEM, and ML for optimizing esterification reactions.
Methods: We generated a synthetic kinetic dataset (N = 1,024 observations) from a 4×4×2×2×2×2 full factorial design simulating batch esterification of acetic acid with ethanol. Four operational factors were investigated: temperature (35-95°C), acid concentration (0.5-3.5 M), catalyst concentration (0.01-0.05 M), and reaction time (60-180 min). Three analytical methods were applied: (1) ANOVA with effect size quantification (partial η²), (2) SEM testing causal pathways (Temperature → ln(k) → Conversion → Yield), and (3) ensemble ML (XGBoost) with SHAP value interpretation and partial dependence analysis. Convergence across methods was assessed via Spearman rank correlations and optimal condition agreement.
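As a rough illustration of how a synthetic factorial kinetic dataset of this kind could be produced, the sketch below crosses the four operational factors over their stated ranges and computes a conversion from an Arrhenius-type rate constant. It is a minimal sketch under stated assumptions, not the study's generator: the four evenly spaced levels per factor, the Arrhenius parameters `A_PRE` and `E_A`, the proportionality of the rate constant to catalyst concentration, and the second-order conversion formula are all hypothetical, and the grid does not reproduce the study's 1,024-run design.

```python
import itertools
import math

R_GAS = 8.314      # gas constant, J/(mol*K)
A_PRE = 1.0e7      # hypothetical pre-exponential factor (assumption)
E_A = 50_000.0     # hypothetical activation energy, J/mol (assumption)


def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


# Factor ranges come from the abstract; four levels each is an assumption.
LEVELS = {
    "temp_c": linspace(35.0, 95.0, 4),      # temperature, deg C
    "acid_m": linspace(0.5, 3.5, 4),        # acid concentration, M
    "cat_m": linspace(0.01, 0.05, 4),       # catalyst concentration, M
    "time_min": linspace(60.0, 180.0, 4),   # reaction time, min
}


def ln_k(temp_c, cat_m):
    """Arrhenius ln(k), assuming k scales linearly with catalyst concentration."""
    temp_k = temp_c + 273.15
    return math.log(A_PRE) - E_A / (R_GAS * temp_k) + math.log(cat_m)


def conversion(temp_c, acid_m, cat_m, time_min):
    """Assumed second-order batch form: X = k*C0*t / (1 + k*C0*t)."""
    kct = math.exp(ln_k(temp_c, cat_m)) * acid_m * time_min
    return kct / (1.0 + kct)


# Full factorial: every combination of factor levels (4**4 = 256 runs here).
design = [dict(zip(LEVELS, combo)) for combo in itertools.product(*LEVELS.values())]
for run in design:
    run["conversion"] = conversion(**run)
```

Each entry of `design` is one simulated run; measurement noise or replicates would have to be added separately before fitting ANOVA, SEM, or ML models to such data.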
Results: All three methods agreed on the ordering of factor importance: Temperature (ANOVA η² = 0.359; SEM standardized β = 0.603; ML mean |SHAP| = 10.05) > Acid Concentration (η² = 0.144; SEM: indirect effect through Conversion; mean |SHAP| = 7.68) ≈ Catalyst Concentration (η² = 0.105; SEM: 0.944 on ln(k); mean |SHAP| = 7.61) > Reaction Time (η² = 0.019; excluded from SEM; mean |SHAP| = 3.06). Rank correlations between methods were strong to perfect: ANOVA-ML (Spearman ρ = 1.000, p < 0.001), ANOVA-SEM (ρ = 0.800, p < 0.001), and SEM-ML (ρ = 0.800, p < 0.001). SEM indicated full mediation (100% indirect effect) of temperature through the Arrhenius kinetic pathway, consistent with theoretical expectations. XGBoost achieved better predictive performance (Test R² = 0.949, RMSE = 2.67%) than linear regression (R² = 0.782) while capturing interaction effects without explicit specification. Consensus optimal conditions were identified as temperature 90-95°C, acid concentration 3.0-3.5 M, catalyst concentration 0.05-0.07 M, and reaction time 180 min, yielding a predicted maximum conversion of 100%.
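The rank agreement between two importance measures can be checked with a few lines of code. The sketch below computes Spearman's ρ for the ANOVA η² and mean |SHAP| values reported above, using the standard no-ties formula ρ = 1 − 6Σd² / (n(n² − 1)); the helper names `ranks` and `spearman_rho` are illustrative and are not taken from the study's repository.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation via the no-ties formula."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Order: temperature, acid concentration, catalyst concentration, reaction time.
anova_eta2 = [0.359, 0.144, 0.105, 0.019]   # values reported in the abstract
ml_mean_abs_shap = [10.05, 7.68, 7.61, 3.06]

rho = spearman_rho(anova_eta2, ml_mean_abs_shap)
print(rho)  # 1.0
```

Both measures rank the four factors identically, which gives ρ = 1. With only four factors there are just 4! = 24 possible rankings, so a rank correlation computed on this few items carries limited statistical weight on its own.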
Conclusions: Methodological triangulation across ANOVA, SEM, and ML provides convergent evidence for factor importance rankings and optimal operating conditions. Each method offers distinct strengths: ANOVA delivers interpretable main effects and interaction quantification, SEM elucidates mechanistic causal pathways, and ML enables high-accuracy prediction with automatic detection of nonlinearities and interactions. The observed convergence (ρ = 0.80-1.00) indicates that fundamentally different analytical approaches reach consistent conclusions when applied to well-structured process data, which increases confidence beyond that of single-method analyses. We recommend that multi-method frameworks become standard practice in chemical process optimization, particularly for systems where mechanistic understanding (SEM), experimental efficiency (ANOVA), and predictive accuracy (ML) are all valued. Future work should validate the predictions in pilot-scale experiments, incorporate rigorous thermodynamic equilibrium constraints, and extend the framework to continuous reactor systems and to multi-objective optimization that balances yield, cost, and environmental sustainability.
Keywords: Batch reactor optimization; Esterification kinetics; Analysis of variance; Structural equation modeling; Machine learning; XGBoost; SHAP; Methodological triangulation; Process intensification; Chemical reaction engineering; Factorial design; Predictive modeling
________________________________________
Supplementary materials
Title: Supplementary Materials
Description: Tables & Figures
Title: Supplementary Dataset Files for Esterification Process Modeling
Description: These datasets contain the complete synthetic experimental matrices used for the esterification batch reactor optimization study. The files include input variables (reaction time, temperature, catalyst loading, molar ratio, agitation speed) and corresponding output responses. The datasets were generated according to standard reaction engineering ranges and follow the structure required for structural equation modeling (SEM) and ensemble machine-learning algorithms. All preprocessing steps (scaling, encoding, cleaning) and variable definitions are documented.
Supplementary weblinks
Title: Code and Workflow Repository
Description: This GitHub repository contains the full source code, analysis scripts, and workflow used to perform the structural equation modeling (SEM), machine-learning optimization, and statistical validation for the esterification study. The repository includes:
- Data preprocessing scripts
- SEM model specification and estimation
- Ensemble ML models (RF, XGBoost, GBM, stacking)
- Performance metrics and visualization scripts
- Jupyter notebooks and reproducibility instructions
Version-controlled files enable complete reproducibility of the results presented in the manuscript.