Beyond ANOVA: A Structural Equation Modeling and Ensemble Machine Learning Approach to Batch Reactor Process Optimization

Anfal Rababah

doi:10.26434/chemrxiv-2025-nwqc1

21 November 2025, Version 1

Working Paper

This content is an early or alternative research output and has not been peer-reviewed by Cambridge University Press at the time of posting.

Abstract

Background: Batch reactor process optimization has traditionally relied on Analysis of Variance (ANOVA) to quantify factor effects. Structural Equation Modeling (SEM) and machine learning (ML) offer complementary mechanistic and predictive capabilities that remain underexplored in chemical engineering applications. This study presents a methodological triangulation framework that compares ANOVA, SEM, and ML for optimizing esterification reactions.

Methods: We generated a synthetic kinetic dataset (N = 1,024 observations) from a 4×4×2×2×2×2 full factorial design simulating batch esterification of acetic acid with ethanol. Four operational factors were investigated: temperature (35-95°C), acid concentration (0.5-3.5 M), catalyst concentration (0.01-0.05 M), and reaction time (60-180 min). Three analytical methods were applied: (1) ANOVA with effect size quantification (partial η²), (2) SEM testing causal pathways (Temperature → ln(k) → Conversion → Yield), and (3) ensemble ML (XGBoost) with SHAP value interpretation and partial dependence analysis. Convergence across methods was assessed via Spearman rank correlations and agreement on optimal conditions.

Results: All three methods achieved perfect ordinal agreement on factor importance rankings, Temperature > Acid Concentration ≈ Catalyst Concentration > Reaction Time:

- Temperature: ANOVA η² = 0.359; SEM standardized β = 0.603; ML mean |SHAP| = 10.05
- Acid Concentration: ANOVA η² = 0.144; SEM indirect effect through Conversion; ML mean |SHAP| = 7.68
- Catalyst Concentration: ANOVA η² = 0.105; SEM 0.944 on ln(k); ML mean |SHAP| = 7.61
- Reaction Time: ANOVA η² = 0.019; excluded from SEM; ML mean |SHAP| = 3.06

Spearman rank correlations quantified this convergence: ANOVA-ML (ρ = 1.000, p < 0.001), ANOVA-SEM (ρ = 0.800, p < 0.001), and SEM-ML (ρ = 0.800, p < 0.001). SEM confirmed full mediation (100% indirect effect) of temperature through the Arrhenius kinetic pathway, in line with theoretical expectations. XGBoost achieved better predictive performance (Test R² = 0.949, RMSE = 2.67%) than linear regression (R² = 0.782) while automatically capturing interaction effects. Consensus optimal conditions were identified: temperature 90-95°C, acid concentration 3.0-3.5 M, catalyst concentration 0.05-0.07 M, and reaction time 180 min, with a predicted maximum conversion of 100%.

Conclusions: Methodological triangulation across ANOVA, SEM, and ML provides robust, convergent evidence for factor importance rankings and optimal operating conditions, and each method contributes a distinct strength: ANOVA delivers interpretable main effects and interaction quantification, SEM elucidates mechanistic causal pathways, and ML enables high-accuracy prediction with automatic detection of nonlinearities and interactions. The demonstrated convergence (ρ = 0.80-1.00) indicates that fundamentally different analytical approaches reach consistent conclusions when applied to well-structured process data, which increases confidence beyond single-method analyses. We recommend that multi-method frameworks become standard practice in chemical process optimization, particularly for systems where mechanistic understanding (SEM), experimental efficiency (ANOVA), and predictive accuracy (ML) are all valued. Future work should validate predictions via pilot-scale experiments, incorporate rigorous thermodynamic equilibrium constraints, and extend the framework to continuous reactor systems and to multi-objective optimization scenarios that balance yield, cost, and environmental sustainability.

Keywords: Batch reactor optimization; Esterification kinetics; Analysis of variance; Structural equation modeling; Machine learning; XGBoost; SHAP; Methodological triangulation; Process intensification; Chemical reaction engineering; Factorial design; Predictive modeling
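The convergence check described in the abstract can be illustrated with a minimal sketch of a Spearman rank correlation. It uses only the ANOVA partial η² and ML mean |SHAP| values quoted in the abstract; the SEM column is left out because the abstract reports mixed quantities for it. The helper names `ranks` and `spearman_rho` are hypothetical, the formula assumes no tied values, and no p-value is computed.

```python
def ranks(values):
    """Rank values from largest (rank 1) to smallest; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """Spearman rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Importance scores as quoted in the abstract, in the same factor order.
factors = ["Temperature", "Acid Concentration",
           "Catalyst Concentration", "Reaction Time"]
anova_eta2 = [0.359, 0.144, 0.105, 0.019]
ml_mean_abs_shap = [10.05, 7.68, 7.61, 3.06]

rho_anova_ml = spearman_rho(anova_eta2, ml_mean_abs_shap)

# Illustration only (made-up scores): swapping two adjacent ranks among
# four items changes sum(d^2) from 0 to 2.
rho_one_swap = spearman_rho([4, 3, 2, 1], [4, 2, 3, 1])

print("ANOVA vs ML:", rho_anova_ml)
print("one adjacent swap, n = 4:", rho_one_swap)
```

With four factors the statistic can only take a small set of values, so identical orderings give exactly 1 and a single adjacent swap gives 0.8.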

Keywords

Batch reactor optimization

Esterification kinetics

Analysis of variance

Structural equation modeling

Machine learning

XGBoost

Methodological triangulation

Process intensification

Polynomial Regression

Chemical reaction engineering

Factorial design

Supplementary materials

Supplementary Materials

Tables & Figures

Supplementary Dataset Files for Esterification Process Modeling

These datasets contain the complete synthetic experimental matrices used for the esterification batch reactor optimization study. The files include input variables (reaction time, temperature, catalyst loading, molar ratio, agitation speed) and corresponding output responses. The datasets were generated according to standard reaction engineering ranges and follow the structure required for structural equation modeling (SEM) and ensemble machine-learning algorithms. All preprocessing steps (scaling, encoding, cleaning) and variable definitions are documented.
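As a rough illustration of how such a synthetic experimental matrix could be built, the sketch below enumerates a 4×4×2×2×2×2 full factorial design and attaches a conversion computed from an Arrhenius rate law. It is not the author's generator. The factor ranges for temperature, acid concentration, catalyst concentration, and time come from the abstract, but the level spacing, the `molar_ratio` and `agitation_rpm` levels, the constants `A_PRE` and `E_A`, and the pseudo-first-order rate law are all assumptions made for this example. In this toy model only temperature, catalyst concentration, and time affect conversion. The design has 256 unique runs, and how these relate to the reported 1,024 observations is not stated here, so replication is left out.

```python
import itertools
import math

R_GAS = 8.314  # gas constant, J/(mol*K)

# Ranges for the first four factors follow the abstract; level spacing and
# the last two factors are hypothetical placeholders.
LEVELS = {
    "temperature_C": [35.0, 55.0, 75.0, 95.0],
    "acid_M": [0.5, 1.5, 2.5, 3.5],
    "catalyst_M": [0.01, 0.05],
    "time_min": [60.0, 180.0],
    "molar_ratio": [1.0, 2.0],        # hypothetical levels
    "agitation_rpm": [200.0, 400.0],  # hypothetical levels
}

A_PRE = 4.0e6   # hypothetical pre-exponential factor, L/(mol*min)
E_A = 50_000.0  # hypothetical activation energy, J/mol


def rate_constant(temperature_C, catalyst_M):
    """Assumed Arrhenius form: k = A * C_cat * exp(-Ea / (R*T)), in 1/min."""
    temperature_K = temperature_C + 273.15
    return A_PRE * catalyst_M * math.exp(-E_A / (R_GAS * temperature_K))


def conversion(temperature_C, catalyst_M, time_min):
    """Assumed irreversible pseudo-first-order conversion, X = 1 - exp(-k*t)."""
    return 1.0 - math.exp(-rate_constant(temperature_C, catalyst_M) * time_min)


def build_design(levels):
    """Enumerate every combination of factor levels as a list of dicts."""
    names = list(levels)
    return [dict(zip(names, combo))
            for combo in itertools.product(*levels.values())]


design = build_design(LEVELS)
for run in design:
    run["conversion"] = conversion(
        run["temperature_C"], run["catalyst_M"], run["time_min"])

best = max(design, key=lambda r: r["conversion"])
print(len(design), "runs; highest conversion at", best)
```

Because the assumed rate law ignores equilibrium, conversion rises monotonically with temperature, catalyst concentration, and time, so the best run sits at the upper corner of those three factors.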

Supplementary weblinks

Code and Workflow Repository

This GitHub repository contains the full source code, analysis scripts, and workflow used to perform the structural equation modeling (SEM), machine-learning optimization, and statistical validation for the esterification study. The repository includes:

- Data preprocessing scripts
- SEM model specification and estimation
- Ensemble ML models (RF, XGBoost, GBM, stacking)
- Performance metrics and visualization scripts
- Jupyter notebooks and reproducibility instructions

Version-controlled files enable complete reproducibility of the results presented in the manuscript.

License

The content is available under CC BY 4.0

Author’s competing interest statement

The author(s) have declared they have no conflict of interest with regard to this content

Ethics

The author(s) have declared ethics committee/IRB approval is not relevant to this content
