ABSTRACT
Since its launch, the chatbot ChatGPT has gained significant popularity and may serve as a valuable resource for evidence-based exercise training advice. However, its capability to provide accurate and actionable exercise training information has not been systematically evaluated. This study assessed ChatGPT’s proficiency by comparing its responses to those of human personal trainers. Nine currently active level 4 (European Qualification Framework, EQF) personal trainers (PTs) submitted their most frequently asked exercise training questions along with their own answers, and these questions were then posed to ChatGPT (version 3.5). Responses from both sources were evaluated by 18 PTs and 9 topic experts, who rated them on scientific correctness, actionability, and comprehensibility. Scores for each criterion were averaged into an overall score, and group means were compared using permutation tests. ChatGPT outperformed PTs in six of nine questions overall, with higher ratings in scientific correctness (5/9), comprehensibility (6/9), and actionability (5/9). In contrast, PTs’ responses were not rated higher than ChatGPT’s for any question or metric. Our results suggest that ChatGPT can be used as a tool to answer questions frequently asked of PTs, and that chatbots may be useful for delivering informational support relating to physical exercise.
Key words: Artificial Intelligence, exercise, natural language processing, machine learning, training guidance

Key Points
- ChatGPT outperformed personal trainers in answering six of nine exercise training-related questions.
- Scores for ChatGPT were never lower than those of personal trainers on any metric measured, including scientific correctness, actionability and comprehensibility, and were frequently higher.
- Large language models may be useful tools to support personal trainers and trainees by providing comprehensible and scientifically correct answers to frequently asked exercise training questions.
In recent years, advancements in digital innovation have made cutting-edge technologies a part of daily life, largely driven by the widespread availability and ease of use of the Internet, smartphones, and other electronic devices. One of the most pronounced examples has been the introduction of highly sophisticated chatbots that use natural language processing (large language models, LLMs), a subset of artificial intelligence, to interpret and respond to text input, simulating human conversation. Of the various options available, OpenAI’s ChatGPT has been the most prominent, with around 200 million monthly active users at the time of writing (Malik, 2023; Moore, 2023). ChatGPT’s versatility, enhanced context understanding, and advanced natural language processing capabilities enable users to obtain foundational knowledge in a particular field more efficiently than browsing through search engine results (Ray, 2023).

The domain of exercise has great potential for the implementation of chatbots because of the number of people who seek out information related to this field online (Lupton, 2020). While this has traditionally occurred via individual websites accessed through search engines, chatbots offer distinct advantages. For example, chatbots can aggregate data from multiple sources, providing a more comprehensive and balanced view in one convenient location rather than requiring users to visit several different web pages. Furthermore, the possibility of engaging in conversation allows users to ask follow-up questions and seek clarification, enhancing the learning experience in ways that traditional web platforms often cannot. Within the domain of exercise, chatbots can also have practical advantages over more conventional ways of obtaining information, such as communicating with fitness professionals. For example, chatbots are not restricted to office hours and are therefore constantly available to answer fitness-related questions. They can also generate answers instantly, and many are free to use.

The promising role of chatbots has recently been explored in the domains of public health, medicine, and nutrition and dietetics (Gardiner et al., 2017; Gabarron et al., 2020; Kirk et al., 2023; Morreel et al., 2023). However, research on the use of chatbots in the domain of exercise and training primarily focuses on their role in promoting physical activity and prescribing exercise. For example, Piao et al. (2020) found that a chatbot served as a cost-effective tool to enhance conscious decision-making, encouraging increased physical activity in the workplace. Zhu et al. (2024) found that ChatGPT combines the perspectives of the physician, exercise scientist, and personal trainer to successfully create exercise plans. Conversely, Düking et al. (2024) found that running experts rated training plans created by ChatGPT as suboptimal; however, they concluded that this also depended on the depth of information provided in the prompt input. For both clinical and healthy resistance-trained populations, it has been shown that ChatGPT can provide suitable exercise prescriptions, though findings regarding its safety were mixed (Dergaa et al., 2024; Washif et al., 2024; Zaleski et al., 2024).
While ChatGPT shows promise in supporting the field of exercise prescription, it may lack personalization and require further modification before implementation (Dergaa et al., 2024; Washif et al., 2024; Zaleski et al., 2024), with the provided information being rather accurate but not comprehensive (Zaleski et al., 2024). Additionally, studies that have investigated the role of chatbots in promoting physical activity lack consistent usage of measurements and reporting of outcome evaluations (Oh et al., 2021). Studies investigating ChatGPT in exercise prescription may also have a biased scoring process, as the reviewers evaluating ChatGPT’s answers do not appear to have been blinded (Dergaa et al., 2024; Düking et al., 2024). It has also been noted that there is insufficient evidence for the acceptability and practical feasibility of chatbots in general (Han et al., 2023). Hence, while there is promise, chatbots in exercise science remain understudied, with only a limited number of domains assessed and no direct comparison with human alternatives. Therefore, the objective of the current study was to compare answers from ChatGPT to frequently asked exercise training questions with those of human personal trainers (PTs). We hypothesized that ChatGPT would perform better than human PTs in answering frequently asked exercise training questions. In addressing this aim, we provide scientific support for the use of chatbots as a source of information in the exercise field, which may reduce the workload of PTs and support trainees seeking knowledge.
Experimental Protocol
The study took place from March to July 2024. The chronological flow of the experimental protocol can be found in Figure 1. PTs were asked via Microsoft Forms to provide their most frequently asked questions related to exercise training, as well as their own answers to them. The PTs were requested to answer these questions in the same fashion that they would for a potential client. PTs could provide the question and answer in Dutch, French, or English, depending on their linguistic competence and preference. Exclusion criteria for the questions included failure to refer to exercise training or the implementation of lifestyle and health-related factors; answers that were not articulated in a manner suitable for direct communication with a potential client (e.g., “it is a boring question”, “it depends”); and those that fell below the minimum word count of 80 or exceeded the maximum of 300 words. None of the PTs were made aware of the goal of the study or the involvement of ChatGPT prior to their contribution, to prevent influencing the questions or answers they provided. Following submission of their questions and answers, they were informed about the goal of the study and were given the opportunity to retract their input if they wished to; however, none of the participants chose to do so. After applying the exclusion criteria, the remaining questions formed the final set of nine questions and answers from nine different personal trainers used in the study (Table 1). While this falls slightly short of our desired sample size (N = 17, based on an assumed mean difference of 1 and a standard deviation of 1 between the groups for each question, a desired power of 0.8, and a two-tailed α of 0.05 (95% confidence)), we initiated analysis after recruiting nine participants due to a low response rate and time limitations. While this reduces the power of our study slightly, sample sizes of a similar number have been reported elsewhere in peer-reviewed, published literature (Kirk et al., 2023). These questions were then asked to ChatGPT (version 3.5) in their original form. ChatGPT’s customization settings were left at their defaults, and no additional information other than the question was provided. Each question was asked only once, and the answer was recorded. The answers to each of the questions were graded by other PTs and by scientific experts in the field of the question. Each set of answers was graded by a total of 27 graders. The order of answers to the questions was randomized in the grading document to reduce the chances of pattern recognition and unblinding of the graders. One of the three scoring components was scientific correctness, and therefore nine of the 27 graders (1/3) were scientific domain experts in the field of the question. The other 18 graders were personal trainers who graded all nine questions, irrespective of the topic. As with the PTs providing the questions and answers, graders were initially blinded to the actual goal of the study and thus the involvement of ChatGPT. After submitting their evaluations of the answers, graders were informed about the goal of the study and were given the opportunity to retract their contribution if they disagreed with the procedure; however, none of the graders chose to do so.
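As a cross-check, the a priori sample size described above can be reproduced with a standard power routine. The sketch below is illustrative only: it assumes the stated parameters correspond to a standardized effect size (Cohen's d) of 1 (mean difference of 1, standard deviation of 1), a two-tailed α of 0.05, and power of 0.8; statsmodels is used here for convenience and is not necessarily the tool used by the authors.

```python
# Illustrative a priori power calculation (not the authors' original script).
# Assumes a mean difference of 1 and SD of 1 between groups, i.e. Cohen's d = 1.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=1.0,          # assumed Cohen's d (mean difference / SD)
    alpha=0.05,               # two-tailed significance level
    power=0.8,                # desired statistical power
    alternative="two-sided",
)
print(f"Required sample size per group: {n_per_group:.1f}")  # ~16.7, rounded up to 17
```

Under these assumptions the calculation returns roughly 16.7 per group, consistent with the target of 17 reported above.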
The grading was based on three components deemed relevant for determining the quality of an answer to a hypothetical knowledge-seeker asking a question: “Scientific correctness”, reflecting how accurately an answer reflects the current state of knowledge in the scientific domain to which the question belongs; “Comprehensibility”, capturing how well the answer could be expected to be understood by the layperson receiving it; and “Actionability”, the extent to which the answer contains information that is useful and can be acted upon by the hypothetical layperson asking the question. The rubric sent to the graders, describing how each component should be scored, is presented in Table 2. Each component could be scored between 0 and 10, with 0 representing a complete failure to satisfy a given component in the answer and 10 reflecting a perfect score on that component. The overall score was obtained by averaging the scores of all three components.
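For illustration, aggregating the three component ratings into an overall score amounts to a simple arithmetic mean on the 0-10 scale, as in the minimal sketch below; the ratings shown are made-up values, not grades from the study.

```python
# Minimal illustration of the 0-10 grading scheme: the overall score for one
# answer is the mean of its three component ratings (hypothetical values).
ratings = {
    "scientific_correctness": 8,
    "comprehensibility": 9,
    "actionability": 7,
}
overall_score = sum(ratings.values()) / len(ratings)
print(f"Overall score: {overall_score:.2f}")  # -> 8.00
```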
Participants – contributors
Personal trainers (PTs) providing questions and answers were recruited through contacts within Ghent University and the European Register of Exercise Professionals (EREPS). Contributing PTs were required to hold at least a European Qualification Framework (EQF) Level 4 certification or an equivalent or higher qualification. This criterion ensured inclusion of professionally qualified PTs while excluding self-proclaimed ones, and participants were further required to be actively engaged in personal training. All provided online written informed consent to participate in the study.
Participants – graders
To grade the answers provided by the contributing PTs, nine scientific experts per question topic were recruited from universities or research institutions. They were required to meet the following inclusion criteria: having attained a PhD, being currently affiliated with a research institute (either academic or private), and having expertise in the topic of the question, as demonstrated by topic-related peer-reviewed scientific output. The scientific experts (1/3 of all graders) rated only the answers to the question relating to their domain of expertise, across all three components (comprehensibility, actionability, and scientific correctness). The remaining 18 graders (2/3 of all graders) were personal trainers, whose professional responsibility for relaying information to trainees made them well placed to judge comprehensibility and actionability. Accordingly, the grading PTs rated all answers to all questions across the three components. Grading PTs were recruited through contacts of Ghent University and the Belgian commercial gym franchise ‘Jims’. The same inclusion criteria applied to the grading PTs as to those contributing questions and answers, namely having obtained at least EQF level 4 or an equivalent or higher qualification, and being actively engaged in personal training.
Ethical approval
The study was conducted according to the Declaration of Helsinki, and all procedures were approved by the ethics committee of the Faculty of Psychology and Educational Sciences, Ghent University (reference code: 2024-005). After receiving a description of the study process, all participants provided online written informed consent to participate in the current study and could withdraw at any time (including after the actual goal of the study and the involvement of ChatGPT was made clear). All participants were informed that their participation would be pseudonymized.
Statistical analysis
Differences between the grades of the answers, overall and for each grading component, were determined using permutation tests with the perm.test function from the jmuOutlier package in R. Permutation tests were used to test for group differences with the test statistic set to the mean. We further tested the robustness of our results with a sensitivity analysis using the median as the test statistic to account for the potential effects of any outliers. For both the overall score and the component scores for each question, p-values were approximated from 100,000 simulations to gauge the strength of evidence of a difference between the groups. Additional analysis of the comprehensibility of the answers was performed using the Flesch Reading Ease (FRE) score and the Flesch–Kincaid Grade Level from the textstat library (version 0.7.10) in Python (version 3.11.7), using the flesch_reading_ease and flesch_kincaid_grade functions, respectively. These scores were calculated for each answer in both groups and compared using two-sided permutation tests with the permutation_test function (n_resamples = 9999, permutation_type = ‘independent’) from the scipy.stats library (version 1.11.4). To evaluate the consistency among graders, inter-rater reliability (IRR) was assessed using the intraclass correlation coefficient (ICC) function from the irr package in R. ICC values could be calculated for all grading PTs and for the scientific expert graders within the topic “training for fat loss”, as these were the only instances with multiple raters per item, a prerequisite for ICC computation. Cut-off values for the interpretation of the ICC values were based on Koo and Li (2016). Furthermore, the level of agreement in question grading was calculated across all scientific expert graders to assess their consistency, using descriptive statistics (standard deviation) for each answer set (defined by topic, question, criterion, and source: ChatGPT or PT).
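As a rough illustration of the Python side of this analysis, the sketch below computes readability scores with textstat and compares two sets of made-up overall grades with scipy's permutation_test, using the difference in means as the test statistic. The variable names and example values are placeholders, not the study's data, and the R-based analyses (jmuOutlier, irr) are not reproduced here.

```python
# Illustrative sketch of the comprehensibility and permutation-test analyses
# described above; the scores below are hypothetical, not data from the study.
import numpy as np
from scipy.stats import permutation_test
import textstat

# Hypothetical overall grades for one question (the study had 27 graders per group).
chatgpt_grades = np.array([8.0, 7.5, 9.0, 8.5, 7.0, 8.0])
pt_grades = np.array([6.5, 7.0, 6.0, 7.5, 6.5, 7.0])

def mean_difference(x, y):
    # Test statistic: difference in group means (swap in np.median for the
    # median-based sensitivity analysis).
    return np.mean(x) - np.mean(y)

result = permutation_test(
    (chatgpt_grades, pt_grades),
    mean_difference,
    permutation_type="independent",
    n_resamples=9999,
    alternative="two-sided",
)
print(f"Mean difference: {result.statistic:.2f}, p-value: {result.pvalue:.4f}")

# Readability metrics for a single (hypothetical) answer text.
answer = ("Aim for two to three full-body strength sessions per week and "
          "increase the load gradually as the exercises start to feel easier.")
print("Flesch Reading Ease:", textstat.flesch_reading_ease(answer))
print("Flesch-Kincaid Grade Level:", textstat.flesch_kincaid_grade(answer))
```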
Study sample
After applying the exclusion criteria, nine questions and answers from nine PTs were eligible for inclusion out of the initial 47 inputs received. The age of the nine PTs providing the questions and answers ranged from 19 to 48 years (mean = 38 years). Their experience in personal training varied from 0.5 to 15 years, with a mean of 5.3 years. The nationalities of the PTs were Spanish, Swedish, Malaysian, Belgian, French, Danish, Pakistani, and two were Dutch. All of them had obtained a European Qualification Framework level 4 (EQF4) personal training certificate and were currently active in personal training. Additional personal training-related qualifications held by the PTs are displayed in Table S1. Of the nine included questions, five related to training for fat/weight loss, one to training frequency, one to training motivation, one to training around pain, and one to the time of day for training. The answers from the personal trainers across the nine questions ranged from 80 to 280 words, with an average of 140 words, whereas those from ChatGPT ranged from 181 to 274 words, with an average of 241 words. The scientific experts who graded the topic-specific answers represented a diverse range of countries of employment. Among the nine reviewers assigned to each topic, at least five different countries were represented, with one topic including experts from up to eight countries. Furthermore, for three of the topics, seven out of nine reviewers were active professors in the respective field, while the remaining two topics included no fewer than five and six professors among the reviewers, respectively (Table S2). As for the grading PTs, five different nationalities were represented, with the majority being Belgian. All grading PTs held an EREPS level 4 personal training qualification or an equivalent or higher certification (Table S3).
Grades of answers from PTs and ChatGPT
Figure 2 illustrates the overall scores for the answers to each question, and the key summary statistics for the overall scores can be found in Table S4. There was strong evidence that the mean overall scores for ChatGPT were significantly higher for six of the questions, specifically Question 1 (7.57 for ChatGPT vs 6.69 for PTs; p = 0.0178), Question 2 (7.96 vs 5.49; p < 0.0001), Question 3 (7.98 vs 7.01; p = 0.0004), Question 6 (7.99 vs 7.07; p = 0.0071), Question 7 (8.15 vs 6.36; p = 0.0001), and Question 8 (7.73 vs 6.65; p = 0.0047). The scores of ChatGPT also tended to be higher for Question 9, although the strength of evidence was weaker (p = 0.0535). The mean overall scores were similar for Questions 4 and 5. The answers to each of the questions from the PTs and ChatGPT can be seen in Table S5, and their grades in Table S6. Figure 3, Figure 4 and Figure 5 show boxplots of the grades for the individual components, and tables of their summary statistics can be found in Tables S7-S9. The mean scores for scientific correctness were higher for ChatGPT for five of the questions, specifically Question 1 (8.26 for ChatGPT vs 5.43 for PTs; p < 0.0001), Question 2 (8.04 vs 5.22; p < 0.0001), Question 3 (7.26 vs 6.33; p = 0.0036), Question 7 (7.89 vs 5.93; p = 0.0002), and Question 8 (7.59 vs 6.19; p = 0.026). No differences were observed for the other four questions (Figure 3). There was strong evidence for higher mean comprehensibility scores for ChatGPT in six instances, specifically Question 2 (8.63 for ChatGPT vs 6.15 for PTs; p < 0.0001), Question 3 (8.48 vs 7.37; p = 0.0077), Question 6 (8.37 vs 7.11; p = 0.0011), Question 7 (8.52 vs 6.70; p = 0.0001), Question 8 (8.26 vs 7.26; p = 0.0039), and Question 9 (8.04 vs 7.30; p = 0.0438). No differences were observed for the remaining Questions 1, 4, and 5 (Figure 4). Finally, there was strong evidence for higher mean actionability for ChatGPT on five occasions, specifically Question 2 (7.22 for ChatGPT vs 5.11 for PTs; p < 0.0001), Question 3 (8.20 vs 7.33; p = 0.0213), Question 6 (7.78 vs 6.48; p = 0.0127), Question 7 (8.04 vs 6.44; p = 0.0023), and Question 8 (7.33 vs 6.52; p = 0.0355), with no clear differences for the remaining Questions 1, 4, 5 and 9 (Figure 5). A visual representation of the questions that scored higher for ChatGPT, both overall and for each grading component, can be seen in Figure 6.
Sensitivity analysis
A sensitivity analysis using the median instead of the mean as the test statistic for group differences was also conducted; these results can be seen in Table S10. In general, the p-values of the permutation tests tended to be higher when the median was used as the test statistic. This could suggest that the central tendency was more similar between the groups, with scores closer to the limits of the response scale (i.e., 0 or 10) driving the mean differences. Nevertheless, although the median differences were less pronounced, there was still clear evidence for higher scores for answers from ChatGPT compared to those from PTs (Figure 6). We also analyzed comprehensibility by comparing Flesch Reading Ease (FRE) scores and Flesch–Kincaid Grade Levels between the groups, though we identified no significant differences here (Tables S11 and S12). In addition, inter-rater reliability among the grading PTs, who evaluated all answers to all nine questions, showed good to excellent intraclass correlation coefficient (ICC) values overall, except for the ChatGPT answer set under the criterion of scientific correctness, which demonstrated lower inter-rater reliability (Table S13). The ICCs among the scientific expert graders evaluating the answers to the five training-for-fat-loss questions were generally higher for the PT-provided answers than for those generated by ChatGPT across all three grading criteria (Table S14). The per-question agreement analysis among all scientific expert graders indicated generally higher grading consistency for ChatGPT compared to PTs, with the greatest variability observed for the criterion of scientific correctness (Tables S15 and S16).
ChatGPT outperforms human PTs in answering common exercise training questions
The results of the current study provide compelling evidence supporting our hypothesis that ChatGPT outperforms human PTs when answering common exercise training questions, as indicated by higher mean overall scores for six of the nine questions. In contrast, overall scores for human PTs did not exceed those of ChatGPT for any of the questions. A similar pattern was also seen for each of the individual grading criteria from which the overall scores were composed (i.e., scientific correctness, comprehensibility, and actionability), suggesting that ChatGPT provided better responses across multiple distinct components relevant for providing high-quality answers. While recent studies have compared the generative performance of different chatbots (Havers et al., 2025), our study is, to the best of our knowledge, the first to directly evaluate ChatGPT’s capabilities against human peers using a blinded rating procedure for assessing informational support in a personal training context. In the field of nutrition, Kirk et al. (2023) found that responses from ChatGPT to commonly asked nutrition questions received higher scores than those of Dutch dietitians. Interestingly, both our findings and those of Kirk et al. (2023) show that the scores of the individual grading criteria were similar to the overall scores and that the overall differences were not simply driven by extreme scores in one of the grading components. Furthermore, an overlapping finding in both studies was that on no occasion, for either the overall scores or any individual grading criterion, were scores for ChatGPT lower than those of the human counterparts (Kirk et al., 2023). The questions in both studies belonged to a variety of subject areas within their respective domains, which suggests that ChatGPT’s capabilities are not limited to specific fields but rather are wide-reaching. This indicates a clear advantage over human professionals, whose expertise is usually confined to a limited number of areas. However, we cannot draw definitive conclusions about this hypothesis due to the limited number of questions, and therefore subject areas, in our study.
LLMs for fitness and health knowledge acquisition
Our findings support the use of ChatGPT for knowledge acquisition in the fitness and health domain. We propose that ChatGPT and other LLMs could support health professionals in promoting commitment to physical activity and exercise guidelines by providing accurate and comprehensible responses to inquiries in real time. Parallels can be seen in other branches of health, such as medicine and nutrition. In medicine, LLMs have been proposed for answering patient queries, educating medical students or patients, and managing chronic diseases (Thirunavukarasu et al., 2023; Kurniawan et al., 2024). In nutrition, they could be used to support the nutritional needs of those looking to improve their diet (Bergling et al., 2025). In recent years, the internet has become a key source of information in the fitness and health space (Tan and Goonawardene, 2017). Given the number of users that ChatGPT has attracted, combined with its ability to synthesize information from different sources and thus eliminate the need to look through multiple web pages, it is reasonable to assume that ChatGPT will be used for knowledge acquisition by those looking to deepen their knowledge on exercise-related topics. Hence, validating the adequacy of ChatGPT in this role is an important contribution of our work, and our findings support that ChatGPT could be used to answer exercise-related questions that might otherwise have been searched for online or asked of a fitness professional.
AI-driven chatbots may support fitness professionals and trainees
Studies conducted so far in the field of exercise and health indicate that chatbots can improve the efficiency of resource allocation by offloading human operators from duties that can be automated (Fadhil and Gabrielli, 2017) and may enhance physical activity levels (Oh et al., 2021). This could provide opportunities for both PTs and trainees. For example, unlike PTs, chatbots are available 24/7, can generate answers instantly, and may be free of charge. Additionally, by responding to relatively basic questions, chatbots can free up PTs’ time to focus on more demanding activities, particularly those for which chatbots are not well suited. Initially, chatbots may primarily address simple questions, as in our study, but they could evolve to fulfill additional roles, such as functioning as virtual PTs and offering personalized approaches (Kirk et al., 2021). In these expanded roles, human PTs would still contribute essential elements, including direct interaction and emotional support.
Strengths
We highlight some strengths of our work. First, all participants (including PTs submitting questions and answers, grading PTs, and scientific experts) were blinded to the involvement of ChatGPT, reducing bias. Second, the questions used were directly representative of those commonly asked by trainees seeking support from a PT, enhancing real-world relevance. Third, evaluating answers across three components (scientific correctness, comprehensibility, and actionability) provided a multidimensional assessment of answer quality. The grading structure (one-third scientific experts and two-thirds PTs) reflected these components appropriately, and the resulting grades demonstrated good inter-rater reliability.
Limitations
We note the following limitations. First, although the sample size was smaller than initially planned (9 vs 17), the experimental design generated a total of 486 individual ratings, allowing stable comparative estimates. We therefore interpret the findings as exploratory and hypothesis-generating rather than definitive. Due to the limited sample size, we were unable to examine whether answer quality differed among PTs with varying levels of academic education. This is an important avenue for future research, particularly given the relatively low entry requirements for becoming a licensed PT. The inter-rater reliability (IRR) analysis among the scientific experts could only be conducted for those grading the five questions related to the “training for fat loss” topic, as for all other topics each expert rated only one set of answers. Given the rapid evolution of LLMs, it is reasonable to expect that newer versions of ChatGPT (e.g., GPT-5 and beyond) will demonstrate greater consistency and accuracy, particularly regarding factual and practical aspects of exercise prescription. Future research should therefore consider model versioning as a key methodological factor, systematically recording the model type, release date, and prompting protocol to enable temporal benchmarking of AI performance.
Our study found that ChatGPT (version 3.5) outperformed human PTs in answering commonly asked exercise training questions. The overall quality of responses from ChatGPT was rated higher for six of the nine questions investigated, while the answers of human PTs scored higher for none. These findings also extended to each of the individual metrics used to assess answer quality, showing a general superiority of ChatGPT in providing actionable, comprehensible, and scientifically correct answers to common exercise training questions. Our results provide evidence that AI-driven LLMs such as ChatGPT may be used by knowledge-seeking trainees to answer common training questions, which could enhance knowledge acquisition, encourage commitment to exercise guidelines, and reduce the workload of PTs. Future work should look to validate our findings using a larger sample of questions and answers to better identify the strengths and weaknesses of LLMs such as ChatGPT across a broad range of exercise training topics.
ACKNOWLEDGEMENTS
The author(s) reported there is no funding associated with the work featured in this article. No potential conflict of interest was reported by the authors. Experiments comply with the current laws of the country in which they were performed. The data that support the findings of this study are available on request from the corresponding author.
AUTHOR BIOGRAPHY
Brecht D’hoe
Employment: Department of Movement and Sports Sciences, Ghent University, Ghent, Belgium
Degree: MSc
Research interests: Nutrigenetics, applied resistance training
E-mail: brecht.dhoe@ugent.be

Daniel Kirk
Employment: Department of Twin Research & Genetic Epidemiology, King's College London, St Thomas Hospital, Westminster Bridge Road, London SE1 7EH, UK
Degree: MSc
Research interests: Microbiome, Nutrition, Personalized Nutrition, Machine Learning
E-mail: daniel.1.kirk@kcl.ac.uk

Jan Boone
Employment: Department of Movement and Sports Sciences, Ghent University, Ghent, Belgium
Degree: PhD, Professor
Research interests: Physical conditioning, Sports Training, Sports Physiology
E-mail: jan.boone@ugent.be

Alessandro Colosio
Employment: Inter-University Laboratory of Human Movement Biology, Université Saint-Etienne, Saint-Etienne, France
Degree: PhD, Professor
Research interests: Exercise Physiology, Exercise Prescription, Exercise Therapy
E-mail: alessandro.colosio@univ-st-etienne.fr
REFERENCES
Bergling, K., Antin, P., Hannun, Y. A., Rahmim, A., Afshin, A. (2025) From bytes to bites: application of large language models to enhance nutritional recommendations. Clinical Kidney Journal 18(4), sfaf082.

Dergaa, I., Abdulrashid, H., Abdelrahman, H., Souissi, A. (2024) Using artificial intelligence for exercise prescription in personalised health promotion: A critical evaluation of OpenAI’s GPT-4 model. Biology of Sport 42(2), 221-241.

Düking, P., Steinacker, J. M., Badtke, F., Sperlich, B. (2024) ChatGPT generated training plans for runners are not rated optimal by coaching experts, but increase in quality with additional input information. Journal of Sports Science and Medicine 23(1), 56-72.

Fadhil, A., Gabrielli, S. (2017) Addressing challenges in promoting healthy lifestyles: the AI-chatbot approach. In Proceedings of the 11th EAI International Conference on Pervasive Computing Technologies for Healthcare, 261-265.

Gabarron, E., Dorronzoro-Zubiete, E., Bradway, M., Årsand, E. (2020) What do we know about the use of chatbots for public health? Digital Personalized Health and Medicine, 796-800.

Gardiner, P. M., McCue, K. D., Negash, L. M., Cheng, T. Y., White, L. F., Yinusa-Nyahkoon, L., Jack, H. E., Bickmore, T. W., Lestoquoy, A. S. (2017) Engaging women with an embodied conversational agent to deliver mindfulness and lifestyle recommendations: A feasibility randomized control trial. Patient Education and Counseling 100(9), 1720-1729.

Han, R., Burke, L. E., Sereika, S. M. (2023) Feasibility and acceptability of chatbots for nutrition and physical activity health promotion among adolescents: Systematic scoping review with adolescent consultation. JMIR Human Factors 10(1), e43227.

Havers, T., O’Connor, D., Dergaa, I., Souissi, A. (2025) Reproducibility and quality of hypertrophy-related training plans generated by GPT-4 and Google Gemini as evaluated by coaching experts. Biology of Sport 42(2), 289-329.

Kirk, D., Catal, C., Tekinerdogan, B. (2021) Precision nutrition: A systematic literature review. Computers in Biology and Medicine 133, 104365.

Kirk, D., van Eijnatten, E., Camps, G. (2023) Comparison of answers between ChatGPT and human dieticians to common nutrition questions. Journal of Nutrition and Metabolism 2023(1), 5548684.

Koo, T. K., Li, M. Y. (2016) A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine 15(2), 155-163.

Kurniawan, M. H., Al Mamun, M., Sutanto, F. (2024) A systematic review of artificial intelligence-powered (AI-powered) chatbot intervention for managing chronic illness. Annals of Medicine 56(1), 2302980.

Lupton, D. (2020) ‘Better understanding about what’s going on’: young Australians’ use of digital technologies for health and fitness. Sport, Education and Society 25(1), 1-13.

Malik, A. (2023) OpenAI’s ChatGPT now has 100 million weekly active users. TechCrunch.

Moore, O. (2023) How are consumers using generative AI? Andreessen Horowitz.

Morreel, S., Verhoeven, V., Mathysen, D. (2023) Microsoft Bing outperforms five other generative artificial intelligence chatbots in the Antwerp University multiple choice medical license exam. medRxiv, 2023.08.18.23294263.

Oh, Y. J., Zhang, J., Fang, M. L., Xu, Y. (2021) A systematic review of artificial intelligence chatbots for promoting physical activity, healthy diet, and weight loss. The International Journal of Behavioral Nutrition and Physical Activity 18(1), 160.

Piao, M., Ryu, H., Lee, H., Kim, J., Lee, J. (2020) Use of the healthy lifestyle coaching chatbot app to promote stair-climbing habits among office workers: Exploratory randomized controlled trial. JMIR mHealth and uHealth 8(5), e15085.

Ray, P. P. (2023) ChatGPT: A comprehensive review on background, applications, key challenges, bias, ethics, limitations and future scope. Internet of Things and Cyber-Physical Systems 3, 121-154.

Tan, S. S.-L., Goonawardene, N. (2017) Internet health information seeking and the patient-physician relationship: A systematic review. Journal of Medical Internet Research 19(1), e5729.

Thirunavukarasu, A. J., Elmas, B., Ramachandran, N., Panch, T. (2023) Large language models in medicine. Nature Medicine 29(8), 1930-1940.

Washif, J. A., Ramlan, A. A., Wong, M. Y. (2024) Artificial intelligence in sport: Exploring the potential of using ChatGPT in resistance training prescription. Biology of Sport 41(2), 209-220.

Zaleski, A. L., Thomas, R. J., Allison, K. M., Arena, R. (2024) Comprehensiveness, accuracy, and readability of exercise recommendations provided by an AI-based chatbot: Mixed methods study. JMIR Medical Education 10, e51308.

Zhu, W., Liu, M., Wang, H., Gao, Z. (2024) Who could and should give exercise prescription: Physicians, exercise and health scientists, fitness trainers, or ChatGPT? Journal of Sport and Health Science 13(3), 368-372.