Physically Consistent and Unseen-Mix-Validated Machine Learning for Concrete Compressive Strength Prediction

Hongzhi Lu

Authors

Hongzhi Lu, School of Industrial Technology, Universiti Sains Malaysia, Gelugor 11800, Malaysia

Keywords:

Concrete compressive strength; Unseen-mix validation; Monotonic machine learning; LightGBM; XGBoost; Conformal prediction; SHAP; Engineering technology

Abstract

Concrete compressive-strength prediction can support early mix screening, but engineering use requires evidence beyond accuracy on a random test split. This study develops a physically consistent, unseen-mix-validated machine-learning workflow using the UCI concrete compressive-strength dataset, which contains 1030 records with eight input variables and one strength target. Advanced ensembles, including XGBoost, LightGBM, CatBoost and monotonic gradient boosting, were evaluated using repeated random cross-validation and a stricter group validation that held out complete mix-proportion families. Additional evidence included nested group hyperparameter tuning, local physical-response diagnostics, split conformal prediction intervals, conditional coverage, residual diagnostics, permutation importance and SHAP analysis. Unrestricted LightGBM was the best model under both schemes, with MAE = 2.858 MPa under random validation and MAE = 3.839 MPa under unseen-mix validation. The tuned monotonic histogram model achieved unseen-mix MAE = 4.056 MPa and R² = 0.885 while eliminating the tested monotonic-response violations. Conditional conformal analysis revealed undercoverage for high-strength records, with coverage = 0.834. Regime-transfer stress tests further showed elevated error for high-strength extrapolation (best MAE = 7.872 MPa) and for low water-to-binder records (best MAE = 7.115 MPa). The results show that concrete-strength models should be judged by unseen-mixture generalization, physical consistency and conditional uncertainty, not by random-split accuracy alone.
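
As an illustration of two ideas named in the abstract, the following minimal Python sketch shows how complete mix-proportion families might be held out and how a split conformal interval half-width can be computed from calibration residuals. It is not the study's implementation. The column names in `MIX_COLUMNS`, the definition of a family as records sharing the same rounded proportions (ignoring age), and the helper names `mix_family_key`, `group_holdout_split` and `split_conformal_halfwidth` are all assumptions made for this example.

```python
import math
import random
from collections import defaultdict

# Hypothetical names for the seven mix-proportion inputs (kg/m^3).
# Age is deliberately excluded so that one mix tested at several ages
# stays inside a single family. This family definition is an assumption.
MIX_COLUMNS = ("cement", "slag", "fly_ash", "water",
               "superplasticizer", "coarse_agg", "fine_agg")


def mix_family_key(record, ndigits=1):
    """Identify a mix family by its rounded mix proportions."""
    return tuple(round(record[c], ndigits) for c in MIX_COLUMNS)


def group_holdout_split(records, test_fraction=0.2, seed=0):
    """Hold out complete mix families so no family spans train and test."""
    families = defaultdict(list)
    for i, rec in enumerate(records):
        families[mix_family_key(rec)].append(i)
    keys = sorted(families)
    random.Random(seed).shuffle(keys)
    n_test = max(1, round(test_fraction * len(keys)))
    test_idx = sorted(i for k in keys[:n_test] for i in families[k])
    train_idx = sorted(i for k in keys[n_test:] for i in families[k])
    return train_idx, test_idx


def split_conformal_halfwidth(calib_residuals, alpha=0.1):
    """Half-width q so that [pred - q, pred + q] targets 1 - alpha coverage.

    Uses absolute residuals from a calibration set that was not used
    to fit the model.
    """
    scores = sorted(abs(r) for r in calib_residuals)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]
```

A single half-width of this kind targets marginal coverage only. Interval coverage can therefore fall short within a subgroup such as high-strength records, which is why conditional coverage is checked separately.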