Table 3. Synthesis of individual studies focusing exclusively on selection prediction.
| Study | General Aim | Outcomes Predicted | Key Performance Metrics | Interpretability / Key Insights | Main Results & Conclusions |
|---|---|---|---|---|---|
| (Abidin and Erdem, 2025) | Selection Prediction | Stage 1: Admission (Pass/Fail). Stage 2: Branch allocation (Football, Basketball, Volleyball, Athletics, Other). | Stage 1: 98.9% accuracy (SDL). Stage 2: 97.4% accuracy, MCC 96.6% (SCM-DL, 6 features). | Feature selection revealed 6 key features spanning device tests and coach ratings; the novel SCM-DL architecture captured hierarchical relations. | Authors conclude that SCM-DL outperforms classical ML, can generalize to hierarchical datasets, and helps coaches prioritize features. External validity remains untested. |
| (Altmann et al., 2024) | Selection Prediction | Selection vs. deselection to the next age group (U12–U19) in an elite German youth soccer academy across 7 years. | Best model (XGBoost): ROC-AUC 0.69, F1-score 0.84. Models were more sensitive to “selected” than to “deselected” players. | Physical and physiological factors (linear sprint, COD sprint, CMJ, aerobic speed reserve) and soccer-specific skill were most influential. Psychological measures were of medium importance; health, age, and position-related variables were inconsistent. | Authors conclude that physical and skill-related measures are most decisive in selection/deselection, with psychological factors as moderate contributors. They suggest focusing academy monitoring on speed, power, endurance, and soccer-specific skill. Limitations: internal validation only; moderate discriminative ability (AUC < 0.70). |
| (Brown et al., 2024) | Selection Prediction & Profiling | Differences between selected and non-selected youth male cricketers (U14–U17), and between White British (WB) and British South Asian (BSA) selected players, in County Age Group (CAG) programmes. | Not accuracy-based: the model estimated shifts in selection probability. Positive predictors of selection: athleticism, wellbeing/cohesion, birth in Q2–Q3, older brothers. Negative predictors: higher psychological scores, antisocial behaviour, younger brothers/older sisters. Ethnic-group differences were observed in athleticism, wellbeing, distress, and antisocial behaviour. | Multidimensional input: 104 characteristics across 5 domains (physiological, perceptual-cognitive, psychological, participation history, socio-cultural). The analysis identified an interaction between family structure, socio-cultural factors, and selection outcomes. | Authors conclude that both athletic and socio-cultural variables play significant roles in selection. They highlight disparities: despite high BSA participation in grassroots cricket, under-representation persists at selection level. They suggest that systemic bias may influence CAG selection. Findings are exploratory; the sample is small (N = 82). |
| (Craig and Swinton, 2021) | Selection Prediction | Whether anthropometric measures (height, mass, BMI) and physical performance tests (20 m sprint, CMJ, Yo-Yo IR1) predict the awarding of professional contracts in an elite Scottish soccer academy over 10 years. | Despite significant mean differences (successful players were taller, faster, and had higher CMJ), predictive accuracy was near random: error proportion 0.43 (train) and 0.45 (test) vs. 0.50 for random guessing. | Relative age effect (RAE) was very strong: 50% of players awarded contracts were born in Q1. CMJ, stature, and sprint showed small associations but overlapped heavily with non-successful players. No reliable case-level prediction was possible. | Authors conclude that anthropometric and physical performance profiling alone cannot predict professional contract success among already talented academy players. They recommend using such data to guide training, not selection, and suggest holistic models integrating technical, tactical, psychological, and socio-cultural variables plus coach expertise. They stress the need to address RAE bias (e.g., bio-banding, scout education). |
| (Formenti et al., 2022) | Selection Prediction | Classification of female junior volleyball players as regional vs. provincial level based on volleyball-specific skills, physical performance, and cognitive functions. | Decision Tree: precision 93%, recall 73%, F1 = 0.83. Other models (LD, LR, SVM) performed worse (precision 47–63%, recall 57–73%). | The DT identified passing and spiking technique plus cognitive-task response times (Flanker congruent/incongruent; visual search with 10/15 items) as key discriminators. Physical tests (COD, CMJ) contributed less. | Authors conclude that higher-level players outperform lower-level peers across volleyball skills, COD, CMJ, and cognitive functions. The ML results emphasize the role of cognitive functions and technical skills (passing, spiking) in discriminating competitive level. Practical recommendation: train both volleyball-specific techniques and executive/perceptual skills in youth development. |
| (Jauhiainen et al., 2019) | Selection Prediction | Detection of potential elite youth soccer players (academy contracts) from a large dataset of junior players (N = 951, age 14). | Best performance with the “phys large” dataset (N = 951, 16 physical test variables): AUC-ROC = 0.763 (±0.007), AUC-PR = 0.960, sensitivity = 0.80, specificity = 0.61. Smaller sets (“phys+quest”, “quest”) performed worse (AUC-ROC 0.58–0.66). | Demonstrated the utility of anomaly detection for imbalanced talent identification (TID) problems (14 academy vs. 937 non-academy players). Physical tests (jump, sprint, agility) were more predictive than questionnaire/self-assessment data. The nonlinear SVM outperformed a linear baseline. | Authors conclude that a one-class SVM can moderately identify future academy players, but specificity remains limited (many false positives). Results are promising but not sufficient for stand-alone selection. They recommend larger datasets, longitudinal validation, and integration of multidimensional variables. |
| (Jennings et al., 2024) | Selection Prediction | Drafted vs. not-drafted players in the AFL National Draft (2021), using physical, GPS (in-game movement), and technical involvement data. | Neural networks consistently outperformed logistic regression: NN specificity = 79 ± 13%, sensitivity = 61 ± 24%, accuracy = 76 ± 8% vs. LR specificity = 73 ± 15%, sensitivity = 29 ± 14%, accuracy = 66 ± 11%. At the draft-rate threshold (15%) and the convergence threshold (35%), the NN classified more drafted players in 88% of comparisons. | Neural networks handled unfactored, high-dimensional inputs better than LR, capturing nonlinear relationships; LR benefited only when data were factored (dimensionality reduction). Key insight: sensitivity (identifying drafted players) is paramount, and the NN achieved a better balance of sensitivity and specificity. | Authors conclude that NN models are more effective than logistic regression for predicting draft outcome, particularly for identifying drafted players (sensitivity). Practical implications: clubs may apply NN-based models to complement subjective scouting and reduce bias. Limitations: data restricted to one state league; psychosocial variables absent; career success beyond the draft not considered. |
| (Owen et al., 2022) | Selection Prediction | Selection vs. non-selection to regional U16 and U18 rugby squads based on 21 physiological and 47 psychosocial factors; analyses run for all players, forwards, and backs. | Physiological models: 67.6% (all), 70.1% (forwards), 62.5% (backs). Psychosocial models: 62.3% (all), 73.7% (forwards), 60.4% (backs). Specificity was higher than sensitivity in all cases. | Key physiological predictors: greater hand-grip strength, faster 10 m and 40 m sprints, higher power and momentum. Key psychosocial predictors: lower burnout, lower exhaustion, lower scores on reduced sense of accomplishment, lower life stress (forwards), and lower difficulty describing feelings (forwards). For backs, lower introjected regulation and lower burnout were key features. | Authors conclude that physiological factors (strength, speed, power) are more predictive of rugby selection than psychosocial ones, but that psychosocial variables (especially lower burnout and stress) also play a significant role. Position-specific differences exist (e.g., emotional-regulation markers are more relevant for forwards). They recommend holistic, position-tailored selection frameworks that include psychosocial screening alongside physiological testing. |
| (Theagarajan and Bhanu, 2021) | Selection Support | Classification of students’ sport-specific talent category (basketball, volleyball, football, athletics, kabaddi, weightlifting) based on anthropometric and physical fitness attributes. | Random Forest highest: 96.2% accuracy; SVM 95.5%; KNN 95.2%; Decision Tree 92.6%; Naïve Bayes 89.8%. | Feature-importance analysis showed that attributes such as height, weight, speed, and endurance strongly influenced classification. The models could allocate students to the sport pathway in which they were most likely to succeed. | Authors conclude that ML, especially RF and SVM, can reliably classify school-level athletes into suitable sports, providing data-driven support for talent identification and allocation. Limitations: small, single-institution dataset; attributes mostly physical, excluding psychological and technical variables. They recommend broader variables and longitudinal validation. |
AFL = Australian Football League; AUC = Area Under the Curve; AUC-PR = Area Under the Precision–Recall Curve; BMI = Body Mass Index; BSA = British South Asian; CAG = County Age Group; CMJ = Countermovement Jump; COD = Change of Direction; DT = Decision Tree; F1 = F1-score (harmonic mean of precision and recall); GPS = Global Positioning System; IR1 (YoYo IR1) = Yo-Yo Intermittent Recovery Test, Level 1; KNN = K-Nearest Neighbors; LD/LDA = Linear Discriminant Analysis; LOOCV = Leave-One-Out Cross-Validation; LR = Logistic Regression; MCC = Matthews Correlation Coefficient; NN = Neural Network; Q1–Q4 = Birth quartiles (Relative Age Effect); RAE = Relative Age Effect; RF = Random Forest; ROC-AUC = Receiver Operating Characteristic – Area Under the Curve; SCM-DL = Split–Combine–Merge Deep Learning; SDL = Shallow Deep Learning; SVM = Support Vector Machine; TID = Talent Identification; XGBoost = Extreme Gradient Boosting.
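The studies above report different metrics (accuracy, precision, recall/sensitivity, specificity, F1, MCC, ROC-AUC). The following minimal Python sketch shows how these quantities relate to one another. It assumes a binary setting with “selected” as the positive class. The counts and scores are hypothetical illustrations, not data from any reviewed study, and the function names `confusion_metrics` and `roc_auc` are illustrative only. ROC-AUC is computed here through its pairwise-ranking interpretation rather than by integrating a plotted curve.

```python
"""Hypothetical illustration of the metrics abbreviated in Table 3.
All counts and scores are made up; they are not taken from any study."""
import math


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Metrics for a binary 'selected' (positive) vs 'not selected' classifier."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # a.k.a. sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC: correlation between predicted and true labels, in [-1, 1]
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "accuracy": accuracy, "f1": f1, "mcc": mcc}


def roc_auc(pos_scores, neg_scores) -> float:
    """ROC-AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half a win)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical cohort: 10 selected and 90 non-selected players.
m = confusion_metrics(tp=8, fp=18, tn=72, fn=2)
print({k: round(v, 3) for k, v in m.items()})
# {'precision': 0.308, 'recall': 0.8, 'specificity': 0.8, 'accuracy': 0.8, 'f1': 0.444, 'mcc': 0.41}
print(round(roc_auc([0.9, 0.8, 0.4], [0.7, 0.5, 0.3, 0.2]), 3))
# 0.833
```

In this hypothetical cohort, the model reaches 80% accuracy and 80% sensitivity, yet only about 31% of the players it flags are actually selected. This shows why accuracy alone can be misleading when selected players are a small minority, as in the imbalanced dataset of Jauhiainen et al. (2019).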