Table 6. Main results of the individual studies on multiple objectives.
Study | General Aim | Outcomes Predicted | Key Performance Metrics | Interpretability / Key Insights | Main Results & Conclusions
(de Almeida-Neto et al., 2023) | Orientation & Selection Support | Predicted similarity between morphological + neuromuscular profiles of youth in Sport Initiation (SI) vs. young athletes in six sports (soccer, swimming, tennis, volleyball, rowing, BJJ). | Reliability of MLP models reported at 87%. Similarity scores: SI → Soccer 88%, Swimming 79%, BJJ 77%, Tennis 70% (combined analysis). No significant similarity for Rowing. | Demonstrated how MLPs can integrate morphological + neuromuscular + biological maturation factors. Highlighted BM as a major confounder influencing neuromuscular strength and morphology. Suggested that MLPs can reduce selection errors by combining multiple domains. | Authors conclude MLPs are effective tools to guide orientation of SI youth into sports matching their physical/neuromuscular profiles, reducing misallocation risk. Stress need to consider biological maturation in TID. Limitations: cross-sectional, small sample (N=75), no longitudinal follow-up, non-elite athletes.
(Contreras-García et al., 2024) | Development / Specialization Analysis | Classification of shooting zones and detection of outlier patterns to identify early specialization vs. versatility in U14 basketball players compared with professional players. | KNN model classification of shots reached 99.6% accuracy (professionals as reference). Outlier analysis: 97.7% of U14 players vs. 64.7% of professionals showed extreme FGA% patterns. Versatility: U14 2.3% vs. Professionals 35.4%. | Machine learning cluster analysis identified 8 shooting zones; combined with outlier detection, yielded 7 role categories. Revealed U14 lacked versatility and 3-point shooting ability, often over-specializing in 2–4 midrange zones. Professionals characterized by either versatile players or one-zone specialists. | Authors conclude U14 basketball players show premature specialization patterns not aligned with professional demands. Recommend formative training to enhance shooting versatility or to cultivate one-zone specialist roles deliberately. Findings raise concerns that current youth competitions may prioritize short-term success over long-term player development.
(Ge, 2024) | Performance Assessment & Training Support | Quantitative classification of youth basketball players’ physical fitness (excellent, good, pass, fail) using CNN-AE-MG model. | CNN-AE-MG achieved mAP = 89.12%, assessment accuracy = 97.5%. Male subgroup prediction 100% accurate (20/20 correct), female subgroup 95% (19/20 correct). | Combination of CNN + Autoencoder enabled unsupervised feature learning, reducing feature loss. Gaussian Mixture with EM algorithm improved classification reliability. Identified endurance (1000m/800m), lung capacity, grip strength as weak areas in youth players. | Authors conclude the CNN-AE-MG model provides accurate, dynamic assessment of youth basketball players’ physical health, superior to baseline models. Proposed use for exercise prescription personalization, training program adjustment, and talent selection support. Limitations: single-country, limited external validation, general fitness focus rather than sport-specific outcomes.
(Gogos et al., 2020) | Selection Prediction & Career Outcome Forecasting | Career outcomes of AFL draftees (matches played, mean AFL Player Rating, mean AFL Player Ranking). | Draft combine alone explained <3–4% of variance in career outcomes. Adding draft order & playing position improved variance explained slightly (up to 6%). Individual combine tests explained <2% variance. | Boosted trees showed player position (>35% relative importance) and draft order (>25%) far outweighed combine results (<10%). Key forwards showed no clear relation between draft position and in-game performance; midfielders/rucks showed positive relation. Evidence of loss aversion bias: early draftees played more games irrespective of performance. | Authors conclude AFL Draft Combine tests are poor predictors of long-term career outcomes. Draft position and playing position provide small additional explanatory power. Suggests physical test batteries are insufficient for TID and should be complemented by in-game skill, decision-making, and contextual factors. Highlights systemic biases (early draft order → more opportunities).
(Kelly et al., 2022) | Talent Development | (a) Player review ratings (U9–U16, n=98); (b) Selection to professional contract (U18, n=18). Both based on ~53 variables across four domains (technical/tactical, physical, psychological, social). | Study 1: 15/53 features had non-zero coefficients; strongest = % predicted adult height (0.196), lob pass (0.160), dribble completion (0.124), total match-play hours (0.145), older relative age. Study 2: strongest predictors of professional contract = PCDEQ Factor 3 (coping with pressures), PCDEQ Factor 4 (ability to organise quality practice), plus progression ratings, slalom dribble, lower home SES. | Lasso regression identified holistic, non-linear predictors across all FCM domains. Key insight: psychological factors (esp. coping with pressure, organization) emerged as strongest contributors to contract attainment, not just technical/physical. Also highlights relative age bias and importance of match-play opportunities. | Authors conclude that youth development is multifactorial and dynamic. Success not solely determined by technical/tactical ability; psychological resilience and self-organization are critical. Early maturation, relative age, and cumulative match-play also drive coaches’ evaluations. Findings support bio-banding and greater investment in psychological development within academies. Limitations: small samples (esp. Study 2), retrospective data, exploratory nature of ML.
(Kilian et al., 2023) | Profiling / Latent Structure Analysis | Identification of latent factors underlying multidimensional assessments (technical, tactical, physical, anthropometric, psychosocial). | Not predictive classification; evaluated model fit and factor interpretability. nI-WAVE outperformed PCA with clearer separation, fewer cross-loadings. | Four interpretable latent factors: (1) Subjective coach evaluations, (2) Anthropometric/age-related (incl. sprint), (3) Technical skills (dribbling, ball control, juggling), (4) Speed/agility. nI-WAVE showed superior interpretability and factor structure stability. | Authors conclude that deep learning factor models (nI-WAVE) provide better latent structure recovery than PCA, improving interpretability of multidimensional TID data. Highlight importance of large-scale datasets in advancing ML-based profiling. Limitations: requires large data, anchors affect loadings, only U12 German cohort examined.
(López-De-Armentia, 2024) | Scouting Support & Talent Detection | Detection of potential women’s football talents across ~30 leagues using automated data collection (Soccerdonna) + alert system. | No accuracy metrics (the tool is not an ML predictive model). Expert evaluation: Usefulness 4–5/5; Ease of use 4–5/5; all experts agreed alerts were effective and tool improved efficiency. | Tool integrates basic player data (demographics, position, minutes, contract expiry, market value, injuries) with automatic alert generation (e.g., U20 players with 1000 min, >5 goals, or consistent starts). Dashboards allow filtering/searching ~12,000 players. | Authors conclude WTDTool increases efficiency and coverage in scouting women’s football, particularly for clubs with limited resources. Experts confirmed ease of use and usefulness. Limitations: women’s data coverage incomplete (contract and market data available for only ~25% of players); no predictive analytics yet. Future: add anomaly detection and integrate multiple data sources.
(Retzepis et al., 2024) | Maturation Prediction | Classification of athletes with predicted PHV ≤ median vs. > median age, using anthropometric, body composition, and strength measures. | LR achieved 96.67% accuracy, 98% recall, 96.33% precision, 97.09% F1-score, ROC AUC 99%. RF and NN slightly lower (94–96%). | SHAP (explainable AI) revealed key predictors: sitting height, weight, height, body fat, left & right handgrip strength, father’s height. Sitting height and weight most influential (higher values → PHV > median). Higher body fat predicted PHV ≤ median. | Authors conclude explainable ML can accurately predict PHV timing in 11-year-old athletes. Key growth and strength indicators (esp. sitting height, weight, grip strength) discriminate maturity status. Findings help avoid misclassification of early maturers as “talents” and support better talent ID, injury prevention, and training load management. Recommend longitudinal validation to confirm predictive power and extension to other sports and female athletes.
(Venkataraman et al., 2024) | Scouting Support & Cognitive Profiling | Player suitability for selection and development, integrating psychometric (YODA) and coach-based evaluations into a standardized scouting framework (YUVA-SQ). | Not accuracy-based: case demonstration. YODA generated trait/personality plots for individual players, producing actionable insights for coaches. Validated by expert use and player development outcomes. | YODA psychometric tool provided granular insights into players’ cognitive profile (e.g., coachability, team orientation, game knowledge, analytical style). Combined with coach technical ratings and trial performance for continuous monitoring. | Authors conclude YUVA-SQ offers a holistic, standardized scouting framework blending cognitive/behavioral assessment with technical/physical evaluation. Demonstrated utility in restructuring a university football team. Proposed extension to grassroots talent scouting in India, aligning with AIFF “Vision 2047.” Limitations: descriptive case study only, no predictive performance metrics, no large-scale validation.
(Woods et al., 2018a) | Talent Development & Competition Comparison | Classification of competition level (elite youth U20 vs. senior NRL) using 12 team performance indicators (runs, tackles, missed tackles, kicks, etc.). | CI classification tree correctly classified 79% of U20 and 93% of NRL games. | Key discriminators: ‘all runs’, ‘tackles’, ‘tackle breaks’, ‘missed tackles’, ‘kicks’. NRL games = more runs and tackles, fewer missed tackles. U20 = higher tackle breaks, more errors. | Authors conclude that NRL and U20 competitions show distinct gameplay profiles. U20 players entering NRL may lack exposure to required tackling capacity and physicality. Coaches should focus on tackling ability and physical development in U20s. Suggests “bridging” via State League participation to aid transition. Practical implication: training interventions should aim to align youth gameplay with senior competition demands.
(Zhao et al., 2019) | Talent Identification & Sport-Specific Profiling | Classification of U15–U16 male athletes (basketball, fencing, judo, swimming, table tennis, volleyball) into their respective sport based on 25 tests (18 anthropometric, 5 physiological, 2 motor). | DA: 71.3% correct classification (original: 98.9%). Best: fencing 85%, volleyball 72.7%. Worst: basketball 57.1%. MLP: 71.0% correct classification (original: 99.3%). Best: volleyball 83.4%, table tennis 83.3%. Worst: basketball 20%. | Key discriminators: Anthropometry (height, shoulder width, crista width, Achilles tendon length), Motor (back strength, reaction time), Physiological (vital capacity, hemoglobin mass, resting HR). Volleyball = tall stature, strength, high lung capacity. Judo = strength, chest girth, Hb mass. Swimming = lung capacity, tendon length. Fencing = smaller chest/shoulder width. Table tennis = short lower leg length + strong back. | Authors conclude that generic test batteries of anthropometric, physiological, and motor measures can differentiate youth athletes by sport with ~70% accuracy, comparable to European studies. Findings confirm discriminative value of body size, strength, and aerobic capacity in talent ID. Basketball was hardest to classify due to small sample size. Implication: test batteries are useful for broad sport allocation, but need more sport-specific, larger-scale validation.
AE = Autoencoder; AFL = Australian Football League; AIFF = All India Football Federation; APHV = Age at Peak Height Velocity; BJJ = Brazilian Jiu-Jitsu; BM = Biological Maturation; CNN = Convolutional Neural Network; CNN-AE-MG = Convolutional Neural Network – Autoencoder – Mixture Gaussian model; CI = Conditional Inference; DA = Discriminant Analysis; EM = Expectation–Maximization; F1 = F1-score (harmonic mean of precision and recall); FCM = Four Corner Model; FGA% = Field Goal Attempt Percentage; Hb = Hemoglobin; HR = Heart Rate; IGF-1 = Insulin-like Growth Factor 1; KNN = K-Nearest Neighbors; Lasso = Least Absolute Shrinkage and Selection Operator regression; LR = Logistic Regression; mAP = mean Average Precision; MLP = Multilayer Perceptron; nI-WAVE = Nonlinear Importance-Weighted Autoencoding Variational Inference with normalizing flow priors; NN = Neural Network; NRL = National Rugby League; PCA = Principal Component Analysis; PCDEQ = Psychological Characteristics of Developing Excellence Questionnaire; PHV = Peak Height Velocity; RF = Random Forest; ROC AUC = Receiver Operating Characteristic – Area Under the Curve; SES = Socioeconomic Status; SHAP = SHapley Additive exPlanations; SI = Sport Initiation; TID = Talent Identification; U20 = Under-20 age category; U12/U14/U18 = Under-12 / Under-14 / Under-18 age categories; YODA = Youth Online Diagnostic Assessment; YUVA-SQ = Youth Universal Value Assessment – Scouting Questionnaire.