| Research article - (2026)25, 476 - 486 DOI: https://doi.org/10.52082/jssm.2026.476 |
| Machine Learning–Based Classification of Alertness Levels in Elite Shooting Athletes Using Heart Rate Variability |
Jiaojiao Lu1,2, Jun Qiu2, Yan An2 |
| Key words: Heart Rate Variability, Shooters, Vigilance, Machine Learning, Psychomotor Vigilance Task |
| Sample size calculation |
Sub-optimal alertness was operationally defined from behavioral performance in the Psychomotor Vigilance Task (PVT). A PVT trial was classified as representing a sub-optimal alertness state if the participant's reaction time (RT) exceeded 500 ms (Basner and Dinges, Preliminary data from an internal pilot study (n ≈ 20 athletes) conducted by our research group indicated an expected model discrimination performance (area under the curve, AUC) of 0.75 for identifying this state, with an estimated positive event rate of 35%. The required sample size was calculated using Obuchowski's method for correlated data (α = 0.05, power = 80%), which indicated that a minimum of 54 positive event epochs (i.e., epochs classified as sub-optimal by the RT > 500 ms criterion) was needed. The final dataset comprised 480 valid PVT epochs, each defined as a 10-minute task block meeting the data quality criteria. Of these, 168 epochs (35%) were positive cases (sub-optimal alertness), roughly three times the a priori minimum of 54. Because both the number of positive events and the observed AUC (0.77) exceeded the values assumed in the a priori calculation (54 events, AUC = 0.75), the achieved power for the primary classification metric (AUC) is expected to exceed the predefined 80% threshold. |
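The figures reported above can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the stated numbers. It does not reproduce Obuchowski's method, and the variable names are illustrative.

```python
import math

# Values reported in the sample size paragraph.
required_positive_epochs = 54   # a priori minimum number of positive epochs
expected_event_rate = 0.35      # positive event rate estimated from the pilot data
valid_epochs = 480              # valid 10-minute PVT blocks in the final dataset
positive_epochs = 168           # blocks labelled as sub-optimal alertness

# Total epochs needed so that the expected number of positives reaches the minimum.
required_total_epochs = math.ceil(required_positive_epochs / expected_event_rate)

# Observed event rate and the margin over the a priori requirement.
observed_event_rate = positive_epochs / valid_epochs
positive_margin = positive_epochs / required_positive_epochs

print(required_total_epochs)           # 155
print(round(observed_event_rate, 2))   # 0.35
print(round(positive_margin, 2))       # 3.11
```

Under these numbers, the observed event rate matches the 35% assumed from the pilot data, and the dataset contains about three times the minimum number of positive epochs.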
| Participants |
A total of 83 elite shooting athletes from Shanghai (48 males, 35 females; mean age 18.4 ± 2.5 years; mean body mass 62.3 ± 9.1 kg; mean height 168.7 ± 7.4 cm) were enrolled, all certified as national first-grade athletes or above. In terms of competitive level, 9 athletes held International Master of Sport certification, 25 held National Master of Sport certification, and the remaining 49 were certified national first-grade athletes. Athletes specialized in 10-meter air rifle (n = 47) or 10-meter air pistol (n = 36) events, with a mean training experience of 5.8 ± 2.3 years and a weekly training volume of 28.6 ± 4.2 hours, conducted predominantly in the morning as part of the team's standardized schedule. The study was approved by the Ethics Committee of the Shanghai Research Institute of Sports Science (Shanghai Anti-Doping Agency) [LLSC20250005]. All participants were informed about the study protocol and provided written informed consent. All procedures were performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments. |
| Inclusion and exclusion criteria |
Participants were eligible for inclusion if they met the following criteria: (i) certified elite shooting athletes at or above the national first-grade level; (ii) actively engaged in regular training programs; and (iii) able to complete the full experimental protocol. Participants were excluded if they met any of the following criteria: (i) self-reported chronic sleep disorders; (ii) acute illness at the time of testing; or (iii) inability to complete physiological or behavioral measurements. To control for potential confounding factors, several pre-test restrictions were applied. All participants were non-smokers and were required to abstain from alcohol consumption for at least 24 hours prior to testing, in accordance with standard athlete health management protocols. Participants were further instructed to refrain from caffeine-containing beverages, stimulants, and ergogenic supplements for at least 12 hours before testing to minimize their influence on autonomic nervous system activity. Sleep-related factors were partially controlled by excluding participants with self-reported chronic sleep disorders; however, formal objective assessment of baseline sleep quality (e.g., PSQI or actigraphy) was not conducted prior to testing. Female participants (n = 35) were included in the study; however, menstrual cycle phase was not systematically recorded or controlled, and testing was not restricted to specific cycle phases. All experimental sessions were conducted in the morning prior to routine training to reduce variability associated with diurnal fluctuations in physiological and cognitive performance. |
| Data collection |
All testing sessions were conducted between 08:00 and 09:30 in the morning, prior to the commencement of daily training. This timing was standardized to control for diurnal variation in both HRV and vigilance performance, given that HRV exhibits well-documented circadian rhythmicity in autonomic tone (Boudreau et al., |
| Alertness task |
To quantify vigilance levels under conditions simulating the sustained attention demands of precision shooting, the PVT was employed. The task was programmed and presented using PsychoPy (v2022.2.5) (Peirce et al., At the start of the experiment, participants were instructed to rest their dominant hand on a keyboard spacebar and to maintain fixation on a central point on the screen. Each trial began with a blank-screen inter-trial interval varying randomly between 2 and 10 seconds, followed by the appearance of a visual stimulus (a millisecond timer starting from 0). Participants were required to press the spacebar as quickly as possible upon stimulus onset. Immediately after each response, the reaction time (RT) in milliseconds was displayed for 1 second, followed by the next trial. If no response was detected within a predefined window, the trial terminated automatically after a timeout, and the next trial began. To mirror the temporal structure of 10-meter air rifle and pistol events, in which athletes typically complete their match within approximately 60 minutes amid varying pacing strategies, the PVT was structured into six consecutive 10-minute blocks, totaling 60 minutes of testing. Each 10-minute block contained 80 trials. This block duration was designed to simulate the sustained focus required during a typical shooting series, where athletes repeatedly engage in aiming, breath control, and trigger execution over extended periods without prolonged breaks. Throughout the entire task, HRV was recorded continuously. A valid response was defined as a keyboard press occurring after stimulus onset and within the response window. Performance metrics derived from the PVT included: number of valid responses, mean RT, median RT, mean reciprocal RT (1/RT), the average of the fastest 10% of RTs, and the average of the slowest 10% of RTs. 
These indices provide a multi-faceted assessment of vigilance, with reciprocal RT and fastest RTs reflecting optimal alertness, and slowest RTs capturing lapses in attention. A one-way analysis of variance (ANOVA) was conducted to examine differences in reaction times across the six PVT blocks. The alertness task is presented in
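The performance metrics listed above can be computed per 10-minute block as in the minimal sketch below. The function name, the `timeout_ms` default, and the conversion of reciprocal RT to responses per second are assumptions for illustration; the source does not state the length of the response window or the unit used for 1/RT.

```python
from statistics import mean, median


def pvt_block_metrics(rts_ms, timeout_ms=30000.0):
    """Summarise one PVT block.

    rts_ms: reaction times in ms; None marks a trial without a response.
    timeout_ms: hypothetical response window (not stated in the source).
    """
    # A valid response occurs after stimulus onset and within the response window.
    valid = sorted(rt for rt in rts_ms if rt is not None and 0 < rt <= timeout_ms)
    if not valid:
        raise ValueError("no valid responses in this block")
    k = max(1, round(0.10 * len(valid)))  # size of each 10% tail, at least one trial
    return {
        "n_valid": len(valid),
        "mean_rt": mean(valid),
        "median_rt": median(valid),
        # Reciprocal RT expressed as responses per second (assumed unit).
        "mean_reciprocal_rt": mean(1000.0 / rt for rt in valid),
        "fastest_10pct": mean(valid[:k]),
        "slowest_10pct": mean(valid[-k:]),
    }


# Toy block with one missed trial (None); real blocks contain 80 trials.
example = pvt_block_metrics([250, 300, 280, None, 520, 310, 290, 270, 260, 600, 305])
print(example["n_valid"], example["mean_rt"], example["median_rt"])
```

With 80 valid trials in a block, each 10% tail contains eight responses.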
| Electrocardiographic signal acquisition and processing |
Electrocardiographic (ECG) data were recorded continuously throughout the vigilance task using a Polar V800 heart rate monitor (Polar Electro Oy, Finland) with a sampling frequency of 1000 Hz. The raw ECG signal was first band-pass filtered (0.5-35 Hz) to attenuate baseline wander and high-frequency noise. Subsequently, the derived RR interval time series were processed to correct for artifacts and ectopic beats, which are common in ambulatory recordings. This correction was performed using the built-in "Artifact Correction Algorithm" within Kubios HRV Premium software (version 3.5). The software's automatic correction mode, with its threshold set to "Medium", was applied to identify and interpolate spurious or missing beats, preserving the integrity of the inter-beat interval data for subsequent analysis. Under this "Medium" correction threshold, the proportion of corrected RR intervals was consistently below 2% per participant per epoch. This low correction rate falls well within the accepted quality threshold for HRV analysis (Laborde et al.,
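To illustrate the general idea behind threshold-based RR correction and the reported correction rate, the sketch below flags intervals that deviate from a local median and replaces them by linear interpolation. This is not the Kubios algorithm; the function name, the 250 ms threshold, and the 11-beat window are assumptions chosen only for illustration.

```python
from statistics import median


def correct_rr(rr_ms, threshold_ms=250.0, window=11):
    """Flag RR intervals far from the local median and interpolate them.

    threshold_ms and window are illustrative assumptions, not Kubios settings.
    Returns (corrected_rr, corrected_fraction).
    """
    n = len(rr_ms)
    half = window // 2
    flagged = []
    for i, rr in enumerate(rr_ms):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        local = median(rr_ms[lo:hi])  # robust local reference level
        flagged.append(abs(rr - local) > threshold_ms)

    good = [i for i in range(n) if not flagged[i]]
    if not good:
        raise ValueError("every interval was flagged; recording is unusable")

    corrected = list(rr_ms)
    for i in range(n):
        if not flagged[i]:
            continue
        prev = max((g for g in good if g < i), default=None)
        nxt = min((g for g in good if g > i), default=None)
        if prev is None:        # flagged beat at the start: copy next good value
            corrected[i] = rr_ms[nxt]
        elif nxt is None:       # flagged beat at the end: copy previous good value
            corrected[i] = rr_ms[prev]
        else:                   # linear interpolation between neighbouring good beats
            w = (i - prev) / (nxt - prev)
            corrected[i] = rr_ms[prev] + w * (rr_ms[nxt] - rr_ms[prev])
    return corrected, sum(flagged) / n


# Synthetic series with one missed beat (a doubled interval) in 81 intervals.
rr = [800] * 40 + [1600] + [810] * 40
clean, rate = correct_rr(rr)
print(clean[40], rate < 0.02)
```

The returned fraction can be compared against the 2% per-epoch ceiling reported above as a simple data quality check.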
| Alertness level labeling |
To ensure the robustness of the predictive model, alertness levels were operationally defined based on the distribution of mean reaction times (RT) for each 10-minute block. Initially, a fine-grained five-level classification was established using a standard deviation (SD)-based method applied to the distribution of the entire dataset. The specific cut-off values and data distribution for these five levels are detailed in
The binary framework was prioritized for the final model development to maximize discriminative power and ensure a balanced dataset for machine learning training. |
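A minimal sketch of the two labeling schemes follows. It assumes that the binary label is assigned from the block mean RT using the 500 ms criterion, and it uses hypothetical SD cut points (±0.5 and ±1.5 SD) for the five-level scheme, because the actual cut-off values are not reproduced in this text.

```python
from statistics import mean, stdev


def binary_label(block_mean_rt_ms, cutoff_ms=500.0):
    """Return 1 (sub-optimal alertness) if the block mean RT exceeds the cutoff."""
    return 1 if block_mean_rt_ms > cutoff_ms else 0


def five_level_labels(block_means, z_cuts=(-1.5, -0.5, 0.5, 1.5)):
    """Assign levels 0 (fastest band) to 4 (slowest band) from z-scored block means.

    z_cuts are hypothetical cut points; the source's actual values are not given here.
    """
    mu = mean(block_means)
    sd = stdev(block_means)
    if sd == 0:
        raise ValueError("block means have zero variance")
    labels = []
    for x in block_means:
        z = (x - mu) / sd
        labels.append(sum(z > c for c in z_cuts))  # number of cut points exceeded
    return labels


print([binary_label(x) for x in (420.0, 500.0, 515.0)])  # [0, 0, 1]
print(five_level_labels([300, 400, 450, 500, 700]))      # [1, 2, 2, 2, 4]
```

Collapsing a multi-level scheme into two classes trades granularity for larger, better-balanced classes, which is the rationale given above for the binary framework.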
| Machine learning and performance evaluation |
The dataset was first split into a training set and a test set at a ratio of 7:3. During model development, feature selection was performed using random forest (RF) combined with recursive feature elimination to identify relevant HRV features, including time-domain, frequency-domain, nonlinear, and autonomic nervous system indices. The selected features were then used to build vigilance level prediction models with four machine learning algorithms: support vector machine (SVM), RF, extreme gradient boosting (XGBoost), and adaptive boosting (AdaBoost). Hyperparameter tuning was conducted using exhaustive grid search combined with five-fold cross-validation, with folds defined at the subject level to prevent data leakage. The search grids were as follows: for SVM, regularization parameter C ∈ {0.1, 1, 10, 100} and kernel coefficient gamma ∈ {'scale', 'auto', 0.001, 0.01}; for RF, number of estimators ∈ {50, 100, 200} and maximum tree depth ∈ {None, 5, 10, 20}; for XGBoost, number of estimators ∈ {50, 100, 200}, learning rate ∈ {0.01, 0.1, 0.3}, and maximum depth ∈ {3, 5, 7}; for AdaBoost, number of estimators ∈ {50, 100, 200} and learning rate ∈ {0.01, 0.1, 0.5, 1.0}. AUC was used as the optimization metric throughout. Model performance was evaluated on an independent test set using accuracy, specificity, sensitivity, F1 score, and AUC. Furthermore, feature importance ranking and SHAP (Shapley Additive Explanations) analysis were applied to identify key factors associated with vigilance levels. |
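The search grids and the subject-level fold assignment described above can be sketched with the standard library alone. The dictionary keys follow common scikit-learn and XGBoost parameter naming, which is an assumption, as are the function names and the fixed random seed; the sketch enumerates candidate settings and assigns folds, and does not train any model.

```python
import random
from itertools import product

# Search grids as listed in the text; key names are assumed, not taken from the source.
SEARCH_GRIDS = {
    "SVM": {"C": [0.1, 1, 10, 100], "gamma": ["scale", "auto", 0.001, 0.01]},
    "RF": {"n_estimators": [50, 100, 200], "max_depth": [None, 5, 10, 20]},
    "XGBoost": {
        "n_estimators": [50, 100, 200],
        "learning_rate": [0.01, 0.1, 0.3],
        "max_depth": [3, 5, 7],
    },
    "AdaBoost": {"n_estimators": [50, 100, 200], "learning_rate": [0.01, 0.1, 0.5, 1.0]},
}


def expand_grid(grid):
    """Return every hyperparameter combination of an exhaustive grid search."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def subject_level_folds(subject_ids, n_folds=5, seed=0):
    """Assign a fold index to each epoch so that a subject never spans two folds."""
    subjects = sorted(set(subject_ids))
    if len(subjects) < n_folds:
        raise ValueError("fewer subjects than folds")
    random.Random(seed).shuffle(subjects)
    fold_of = {s: i % n_folds for i, s in enumerate(subjects)}
    return [fold_of[s] for s in subject_ids]


for name, grid in SEARCH_GRIDS.items():
    print(name, len(expand_grid(grid)))  # SVM 16, RF 12, XGBoost 27, AdaBoost 12

# Hypothetical layout: 83 subjects with six epochs each.
ids = [s for s in range(83) for _ in range(6)]
folds = subject_level_folds(ids)
```

Keeping all epochs of one athlete in the same fold prevents the model from being validated on data from a subject it has already seen, which is the leakage the subject-level design guards against.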
| Statistical analysis |
Data preprocessing and statistical analyses were conducted using JASP (Version 0.19.3). Prior to analysis, the normality of continuous variables was assessed using the Shapiro–Wilk test. Descriptive statistics are reported as M ± SD. A one-way ANOVA was performed to compare reaction times across the six PVT blocks. For feature selection, Pearson correlation analysis was used to evaluate the linear associations between all extracted HRV indices (time-domain, frequency-domain, and nonlinear) and behavioral alertness metrics. A correlation was considered statistically significant at p < 0.05. Features showing significant correlations were identified as candidate inputs for machine learning models. |
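For readers who want to see what the two test statistics compute, the sketch below derives the one-way ANOVA F statistic and the Pearson correlation coefficient from their definitions. The analyses in the study were run in JASP; the function names here are illustrative, and p-values are omitted because they require the F and t distributions.

```python
from math import sqrt
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Toy data: three groups of three observations each.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_stat, df_b, df_w)  # 13.0 2 6
```

In the study's setting, each group would hold the reaction times of one PVT block, and `pearson_r` would be applied to one HRV index and one behavioral metric at a time.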
|
|
| PVT test |
The results of the six consecutive PVT blocks are presented in |
| Dynamic Changes in Heart Rate Variability |
HRV parameters exhibited distinct temporal patterns throughout the task, reflecting a shift in autonomic regulation. |
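As an illustration of how block-wise time-domain and Poincaré indices can be derived from a corrected RR series, the sketch below uses the standard relations SD1² = ½·SDSD² and SD2² = 2·SDNN² − ½·SDSD². It is not the Kubios implementation, the function name is hypothetical, and frequency-domain indices such as VLF% are omitted because they require spectral estimation.

```python
from math import sqrt
from statistics import mean, stdev


def hrv_block_features(rr_ms):
    """Compute SDNN, RMSSD, SD1, SD2 and SD2/SD1 for one block of RR intervals (ms)."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]  # successive differences
    sdnn = stdev(rr_ms)
    rmssd = sqrt(mean(d * d for d in diffs))
    sdsd = stdev(diffs)
    sd1 = sqrt(0.5) * sdsd                                   # short-term variability
    sd2 = sqrt(max(0.0, 2 * sdnn ** 2 - 0.5 * sdsd ** 2))    # long-term variability
    return {
        "sdnn": sdnn,
        "rmssd": rmssd,
        "sd1": sd1,
        "sd2": sd2,
        "sd2_sd1": sd2 / sd1 if sd1 > 0 else float("inf"),
    }


# Short synthetic series; a real 10-minute block contains several hundred intervals.
features = hrv_block_features([800, 820, 840, 850, 845, 830, 810, 795])
print({name: round(value, 2) for name, value in features.items()})
```

Applying the function to each of the six blocks yields one feature vector per block, which is the granularity at which temporal changes across the task can be compared.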
| Time-domain and Frequency-domain Analysis |
Time-domain analysis ( |
| Nonlinear feature changes |
| Correlation analysis |
Pearson correlation analysis was conducted to examine the associations between HRV indices and PVT reaction times. As shown in |
| Alertness level classification |
The three-class model results revealed a consistent pattern across all four algorithms: the 'Moderate alertness' class was poorly identified, with Recall values ranging from only 0.02 (SVM, AdaBoost) to 0.15 (XGBoost) and F1-scores between 0.05 and 0.22 ( Based on the binary classification criteria established in the |
| Model development and evaluation |
Using the selected HRV features, four machine learning algorithms were evaluated. |
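The five evaluation metrics named in the methods can be computed from true labels and predicted scores as in the sketch below. The function name and the 0.5 decision threshold are assumptions; AUC is computed from its pairwise definition (the probability that a positive case receives a higher score than a negative case, with ties counted as one half).

```python
def binary_metrics(y_true, scores, threshold=0.5):
    """Return accuracy, sensitivity, specificity, F1 and AUC for binary labels (0/1)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    # AUC from pairwise comparisons of positive and negative scores.
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "auc": wins / (len(pos) * len(neg)),
    }


# Toy example with four positive and four negative epochs.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
print(binary_metrics(labels, scores))
```

Unlike the threshold-dependent metrics, AUC summarizes ranking quality across all thresholds, which is why it can differ from accuracy on the same predictions.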
| Model interpretability |
To interpret the contribution of individual HRV features to the AdaBoost model, a SHAP analysis was performed. As shown in the feature importance ranking ( |
|
|
This study presents a novel quantitative approach for assessing alertness in elite shooting athletes by integrating dynamic HRV monitoring with machine learning algorithms. The present findings broadly support our a priori hypothesis. The binary AdaBoost model achieved an AUC of 0.77, exceeding the hypothesized threshold of 0.70, confirming that HRV features recorded during the 60-minute simulated competition task provide sufficient discriminative information for alertness classification in this elite population. Furthermore, consistent with our hypothesis, the frequency-domain index VLF% emerged as the most critical predictor in the SHAP analysis, underscoring the role of slower autonomic oscillations in encoding alertness states. However, the hypothesis regarding cross-demographic generalizability remains to be tested, given the single-sport, mixed-sex, and relatively young sample. Physiologically, shooting is a psychomotor task requiring intense top-down cognitive control and emotional regulation with minimal metabolic demand (Shao et al., A key theoretical contribution of this study lies in the identification of specific HRV signatures unique to precision sports. Our SHAP analysis revealed that VLF% (very low frequency percentage) and the SD2/SD1 ratio were the most sensitive predictors of vigilance. The prominence of VLF%, which typically reflects long-term regulatory mechanisms influenced by thermoregulation and hormonal activity, suggests that the physiological demand of shooting differs significantly from high-intensity sports (Storniolo et al., While the model's accuracy (0.75) indicates room for refinement, this performance must be interpreted within the context of the specific cohort. 
For instance, a recent study employing sliding-window HRV metrics on sleep-deprived healthy adults reported a binary classification accuracy of 89% using SVM (Xie and Ma, From a practical perspective, this study provides a validated, non-invasive tool for assessing "pre-competition readiness." The use of portable ECG devices improves assessment efficiency by reducing testing time by over 90% compared to behavioral tasks like the PVT (Zhou and Zhang, The present study has several limitations. First, the predictive models were validated only through internal subject-level cross-validation and were not evaluated using an independent external cohort, limiting the generalizability of the findings across populations, sports, and testing conditions. Future studies should prioritize external validation using independent datasets. Second, all participants were elite shooting athletes (n = 83), which, although ensuring high ecological validity, restricts cross-sport generalization. Additionally, no stratified analyses were conducted by sex, age, or training characteristics; such analyses were not feasible given the limited sample size but should be considered in larger cohorts. Third, concurrent subjective alertness measures (e.g., visual analog scale or NASA-TLX) and neurophysiological markers (e.g., EEG) were not included. While this was intended to preserve ecological validity, it precludes assessment of convergent validity and should be addressed in future studies. Fourth, although several strategies were implemented to mitigate overfitting (subject-level cross-validation, recursive feature elimination, and validation-based hyperparameter tuning), the relatively limited sample size constrains the application of more complex models and the establishment of robust normative references (Collins et al., |
|
|
This study identifies the very low frequency percentage (VLF%) and the SD2/SD1 ratio as the most sensitive physiological signatures of vigilance in elite shooting athletes, highlighting the pivotal role of slow-wave autonomic regulation in precision performance. By integrating these features with the AdaBoost algorithm, we developed a binary classification model that effectively distinguishes optimal from sub-optimal alertness states with superior reliability compared to traditional classifiers. This framework provides a validated, non-invasive, and efficient tool for monitoring pre-competition readiness, offering coaches actionable data to support training optimization in high-fidelity environments. |
| ACKNOWLEDGEMENTS |
The study was supported by the Science and Technology Program of the Shanghai Municipal Science and Technology Commission: "Research on New Training Strategies for Improving Athletic Performance in High-Temperature and High-Humidity Environments" (grant number 25Y42800301). The APC was funded by the Shanghai Research Institute of Sports Science (Shanghai Anti-Doping Agency). The anonymized dataset and analysis code are available upon reasonable written request to the corresponding author, subject to a data sharing agreement compliant with institutional ethics requirements (Ethics approval: LLSC20250005). The authors declare that they have no competing interests. |