Table 4. Synthesis of individual studies focusing exclusively on performance prediction.
| Study | General Aim | Outcomes Predicted | Key Performance Metrics | Interpretability / Key Insights | Main Results & Conclusions |
| --- | --- | --- | --- | --- | --- |
| (Cornforth et al., 2015) | Performance prediction | In-game performance of elite Australian football players, predicted from pre-match HRV measures (time-domain, frequency-domain, and nonlinear) plus environmental and field data. | Best correlations with GA-wrapper feature selection plus regression algorithms: Walk r=0.76, Jog r=0.75, Cruise r=0.73, Player Load r=0.72, Match Distance r=0.73. PCA gave a slight improvement over the all-variables approach, but the GA wrapper yielded the highest predictive performance (mean r=0.60 vs. 0.49–0.53). | Highlighted the value of advanced regression algorithms (especially SMOreg and Gaussian Processes) combined with feature selection. HRV-derived features (especially nonlinear measures) and environmental conditions (temperature, field size) were identified as significant contributors to match performance. | The authors conclude that sophisticated regression models can predict match performance from HRV and environmental data with correlations above 0.70. This could support player selection and training-load adjustments tailored to field dimensions and match-day conditions. An early demonstration of the potential of sport informatics in team sports. |
| (Duncan et al., 2024) | Performance prediction | Dribbling skill (UGent dribbling test; skill differential between trials with and without the ball). | Initial accuracy: linear ~57%, ridge ~48%, lasso ~34%, RF ~68%, boosted trees ~66%. When stratified by age band: RF 98.6%, boosted trees 96.1%, lasso 94.1%, linear 91.9%. | Feature importance: FMS score was the most influential predictor, followed by coach overall rating, years of playing experience, and APHV. Birth quartile and chronological age were least important. | ML showed that technical skill can be predicted with high accuracy from multidimensional inputs, especially FMS. Supports the theory that broad motor skill competence underpins technical soccer ability. Coaches should emphasize FMS training before sport-specific drills. Suggests a shift away from over-reliance on physical testing alone. |
| (Sandamal et al., 2024) | Performance prediction | Soccer players' performance in field-based tests: Dribbling Shuttle Test (DSt), Goal Accuracy Test (GAt), and Yo-Yo Intermittent Recovery Test Level 1 (YYIRT1). | XGBoost consistently outperformed RF and KNN across tests (highest R2, lowest error); RF showed moderate accuracy and KNN the lowest. Performance varied between cohorts, with Karakalpakstan athletes showing lower predicted fitness values. | SHAP global explanations: anthropometric (sitting height, meso breadth), hematological, and hormonal markers (E2, IGF-1, cortisol, testosterone) emerged as top predictors. LIME local explanations confirmed hormonal differences: E2, IGF-1, and cortisol strongly influenced fitness in the environmentally exposed group, whereas testosterone was more influential in controls. | The authors conclude that explainable ML (especially XGBoost with SHAP/LIME) offers accurate and interpretable fitness prediction in young soccer players. Results highlight the negative effects of environmental degradation (Aral Sea region) on hormonal balance and physical performance, and demonstrate the value of explainable AI for screening and tailoring training in vulnerable populations. Limitations: relatively small cohorts, region-specific findings, no external validation. |
| (Sanjaykumar et al., 2024) | Performance prediction | On-court volleyball performance predicted from demographic and physical attributes (age, height, weight, fat %, muscle mass, bone mass, BMI). | RF: R2=0.9418, accuracy=94.18%, RMSE=2.67. XGBoost: R2=0.9276, accuracy=92.76%, RMSE=2.98. Linear regression was weaker: R2=0.7531, accuracy=75.31%, RMSE=5.51. | Correlation analysis: height (r=0.879), muscle mass (r=0.653), and bone mass (r=0.622) were strongly and positively related to performance; BMI was not significant (r=0.04). RF captured nonlinearities best, with XGBoost close behind. | The authors conclude that ML, especially Random Forest, provides accurate and objective prediction of volleyball performance from physical attributes. Supports more data-driven talent identification, moving beyond subjective scouting. Future work: integrate skill and psychological factors and extend to more diverse populations. |
ACC = Accuracy; AI = Artificial Intelligence; AUC = Area Under the Curve; APHV = Age at Peak Height Velocity; BMI = Body Mass Index; DSt = Dribbling Shuttle Test; E2 = Estradiol; FMS = Fundamental Movement Skills; GA = Genetic Algorithm; GAt = Goal Accuracy Test; HRV = Heart Rate Variability; IGF-1 = Insulin-like Growth Factor 1; KNN = K-Nearest Neighbors; LASSO = Least Absolute Shrinkage and Selection Operator; LIME = Local Interpretable Model-agnostic Explanations; ML = Machine Learning; PCA = Principal Component Analysis; R2 = Coefficient of Determination; RF = Random Forest; RMSE = Root Mean Squared Error; SHAP = SHapley Additive exPlanations; SMOreg = Sequential Minimal Optimization regression; XGBoost = Extreme Gradient Boosting; YYIRT1 = Yo-Yo Intermittent Recovery Test, Level 1.
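The Key Performance Metrics column mixes correlation coefficients (r), coefficients of determination (R2), RMSE, and percentage "accuracy" figures, which are not directly comparable across studies. As an illustration only, the following minimal pure-Python sketch computes r, R2, and RMSE for one set of observed and predicted scores; the helper names (`pearson_r`, `r_squared`, `rmse`) and the data are hypothetical and are not taken from any reviewed study. r measures linear association between predictions and observations, whereas R2 = 1 - SS_res/SS_tot also penalizes systematic bias, and RMSE is expressed in the outcome's own units. In Sanjaykumar et al. (2024), the reported accuracy values numerically equal R2 × 100, so they appear to be R2 expressed as a percentage rather than a classification accuracy.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted scores."""
    mt, mp = _mean(y_true), _mean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    # Undefined when either series is constant (zero variance).
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mt = _mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mt) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the outcome."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    # Hypothetical observed vs. predicted test scores (illustration only).
    observed = [10, 12, 14, 16, 18]
    predicted = [11, 12, 13, 17, 17]
    r = pearson_r(observed, predicted)
    print(f"r = {r:.3f}, r^2 = {r * r:.3f}, "
          f"R2 = {r_squared(observed, predicted):.3f}, "
          f"RMSE = {rmse(observed, predicted):.3f}")
    # prints: r = 0.950, r^2 = 0.903, R2 = 0.900, RMSE = 0.894
```

On these example data r² and R2 nearly coincide because the predictions are almost unbiased. If every prediction were shifted by a constant offset, r would remain 1.0 while R2 would fall, which is one reason correlation-based results (e.g., Cornforth et al., 2015) and R2-based results (e.g., Sanjaykumar et al., 2024) should not be compared directly.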