|
Talent identification (TID) in team sports is complex, influenced by biological, technical, psychological, and socio-cultural factors. Machine learning (ML) offers tools to integrate high-dimensional data, yet its applications in youth TID remain underexplored. Objectives: To systematically review ML approaches applied to youth talent identification in team sports, with emphasis on data domains, algorithms, validation strategies, and interpretability. Eligible studies were peer-reviewed quantitative research applying ML to youth athletes (≤21 years) in team sports for TID outcomes. Searches were conducted in PubMed, Scopus, and Web of Science, supplemented by reference and citation screening. Extracted data items included input data domains (anthropometric, physical, technical, perceptual–cognitive, psychological, socio-cultural, and multi-domain), ML approach, validation methods, performance metrics (e.g., accuracy, AUC, F1-score), and interpretability techniques. Risk of bias was assessed using PROBAST. From 228 records, 27 studies met the inclusion criteria. Soccer was the most studied sport (n = 13), with other studies covering rugby, basketball, cricket, volleyball, and Australian football. Sample sizes ranged from 21 to 13,876 athletes, predominantly male. Supervised algorithms (Random Forest, gradient boosting, neural networks, penalized regression) were most common; some studies used unsupervised clustering. Validation practices varied, with few studies employing nested cross-validation or external testing. Reported discrimination ranged from modest to excellent (ROC-AUC ≈ 0.58–0.96, depending on model and context), yet calibration performance (e.g., Brier score, calibration slope) was rarely reported. Across studies, predictive accuracy was moderate to high internally but rarely externally confirmed. Risk of bias was high in 59% of studies, mainly due to inadequate analysis and limited generalizability.
Overall, ML shows potential to complement, not replace, traditional TID approaches, acting as a decision-support and hypothesis-generation tool that can assist practitioners in early screening, individualized progression modeling, and evidence-based talent forecasting. To strengthen translational impact, future research should emphasize transparent reporting, calibration assessment, and external validation to ensure robust, applicable ML models for sport talent systems. |