Supplementary Tables

 
Table S10. Sensitivity analysis using the median instead of the mean as the test statistic for group differences between ChatGPT and PT.
Mean Median Mean Median Mean Median Mean Median
Question 1 0.01778 0.00395 1e-05 0.03072 0.81334 1 0.934 1
Question 2 <0.0001 0.00465 <0.0001 0.00422 <0.0001 0.06457 1e-05 7e-05
Question 3 0.00041 0.004 0.00356 0.0312 0.00768 0.06548 0.02126 1
Question 4 0.81171 1 1 1 0.16879 1 0.12076 0.18911
Question 5 0.118 0.68951 0.6447 1 0.56767 1 0.31438 1
Question 6 0.00708 0.18943 0.70128 1 0.00106 0.1916 0.01272 0.18976
Question 7 6e-05 0.00097 0.00021 0.03094 0.00013 0.00427 0.00231 0.01247
Question 8 0.00474 0.00417 0.02595 0.28711 0.00388 0.00416 0.03549 0.40283
Question 9 0.0535 0.18872 0.30964 1 0.04379 0.18761 0.30654 0.54254
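
The table does not state which test produced these values. As a minimal sketch, assuming a two-sided permutation test on the difference between two groups of ratings, the choice of mean or median as the test statistic can be passed in as a parameter. The function name `permutation_p_value`, the number of permutations, and the seed are hypothetical and not taken from the study.

```python
import random
from statistics import mean, median


def permutation_p_value(group_a, group_b, statistic=mean,
                        n_permutations=5000, seed=0):
    """Two-sided permutation p-value for a difference in `statistic`.

    Assumption: group labels are exchangeable under the null hypothesis.
    """
    rng = random.Random(seed)
    observed = abs(statistic(group_a) - statistic(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(statistic(pooled[:n_a]) - statistic(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Illustrative ratings only; these are not data from the study.
chatgpt_scores = list(range(1, 13))
pt_scores = list(range(21, 33))
p_mean = permutation_p_value(chatgpt_scores, pt_scores, statistic=mean)
p_median = permutation_p_value(chatgpt_scores, pt_scores, statistic=median)
```

Running the same comparison with both statistics, as in the paired Mean and Median columns above, shows whether a conclusion depends on the choice of statistic.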
 
Table S11. Flesch Reading Ease (FRE) score and Flesch–Kincaid Grade Level (FKGL) analysis.
ChatGPT_FRE ChatGPT_FKGL PT_FRE PT_FKGL
39.16236565 11.88473129 61.9276667 9.9877037
46.46332501 10.36121783 51.175 11.09
53.68649827 9.699404644 64.2640385 9.14012821
41.38750055 11.4176077 60.0575 9.23
51.20650273 11.1684153 38.1017241 15.1493103
44.70138251 11.06547541 12.7697627 16.3846519
57.30355769 10.15875 47.5343521 13.6845915
40.14042135 12.39841894 48.89 10.415
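
For reference, both readability scores follow from the standard published Flesch formulas, which depend only on the average sentence length and the average number of syllables per word. The sketch below assumes the word, sentence, and syllable counts are already available. How the study counted syllables is not stated, so the function names and inputs are illustrative.

```python
def flesch_reading_ease(words, sentences, syllables):
    """FRE: higher scores indicate text that is easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """FKGL: approximate US school grade needed to understand the text."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for one answer: 100 words, 5 sentences, 150 syllables.
fre = flesch_reading_ease(100, 5, 150)
fkgl = flesch_kincaid_grade(100, 5, 150)
```

Because both formulas use the same two ratios with opposite signs, a lower FRE generally goes along with a higher FKGL, as in the rows above.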
 
Table S12. Summary statistics of the Flesch Reading Ease (FRE) scores and Flesch–Kincaid Grade Levels (FKGL).
ChatGPT_FRE ChatGPT_FKGL PT_FRE PT_FKGL
count 8 8 8 8
mean 46.75644422 11.01925264 48.0900055 11.8851732
std 6.69377978 0.906303254 16.7077928 2.80604254
min 39.16236565 9.699404644 12.7697627 9.14012821
25% 41.07573075 10.31060087 45.1761951 9.79827778
50% 45.58235376 11.11694536 50.0325 10.7525
75% 51.82650162 11.5343886 60.5250417 14.0507712
max 57.30355769 12.39841894 64.2640385 16.3846519
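
The summary rows can be recomputed from the values in Table S11. The sketch below does this for the ChatGPT_FRE column using only the Python standard library. It assumes the sample standard deviation (n − 1 denominator) and linearly interpolated quartiles, which is consistent with the reported figures. The helper name `describe` is illustrative.

```python
from statistics import mean, quantiles, stdev

# ChatGPT_FRE column copied from Table S11.
chatgpt_fre = [
    39.16236565, 46.46332501, 53.68649827, 41.38750055,
    51.20650273, 44.70138251, 57.30355769, 40.14042135,
]


def describe(values):
    """Return count, mean, sample std, min, quartiles, and max."""
    q25, q50, q75 = quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": mean(values),
        "std": stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "25%": q25,
        "50%": q50,
        "75%": q75,
        "max": max(values),
    }


summary = describe(chatgpt_fre)
```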
 
Table S13. Intraclass correlation coefficient (ICC) among grading PTs.
Criterion               Answer set  ICC (2,k)   Lower bound CI  Upper bound CI  No. of questions  No. of graders  Interpretation
Actionability           ChatGPT     0.69108278  0.390892658     0.906857013     9                 18              Good
Actionability           PT          0.61026517  0.24603212      0.881856518     9                 18              Good
Comprehensibility       ChatGPT     0.634818    0.289576003     0.889575215     9                 18              Good
Comprehensibility       PT          0.59796979  0.200445836     0.881201226     9                 18              Fair
Scientific correctness  ChatGPT     0.27672621  -0.220600161    0.754692695     9                 18              Poor
Scientific correctness  PT          0.76384081  0.49759271      0.933323246     9                 18              Excellent
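
As a rough illustration of the ICC(2,k) point estimate (two-way random effects, absolute agreement, average of k raters), the sketch below computes it from the mean squares of a two-way ANOVA without replication. It assumes a complete ratings matrix with one row per question and one column per grader. The software used in the study is not stated, the confidence bounds are not reproduced here, and the example matrix is illustrative rather than study data.

```python
def icc_2k(ratings):
    """ICC(2,k) for a complete matrix: rows = subjects, columns = raters."""
    n = len(ratings)       # number of rated subjects (e.g. questions)
    k = len(ratings[0])    # number of raters (e.g. graders)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-subject mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


# Illustrative matrix: 6 subjects rated by 4 raters.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2k(example)  # about 0.62
```

The estimate becomes negative whenever the between-subject mean square falls below the residual mean square, which is how negative ICC values can arise when graders disagree more within a question than the questions differ from each other.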
 
Table S14. Intraclass correlation coefficient (ICC) among scientific experts grading the five questions relating to the topic “training for fat loss”.
Criterion               Answer set  ICC (2,k)   Lower bound CI  Upper bound CI  No. of questions  No. of graders  Interpretation
Actionability           ChatGPT     -0.116451   -0.330532978    0.56717331      5                 9               Poor
Actionability           PT          0.54667368  0.046674825     0.930112294     5                 9               Moderate
Comprehensibility       ChatGPT     -0.0503919  -0.371801441    0.689715167     5                 9               Poor
Comprehensibility       PT          0.77114428  0.38094296      0.970030924     5                 9               Good
Scientific correctness  ChatGPT     0.2901169   -0.270922221    0.875311571     5                 9               Poor
Scientific correctness  PT          0.84285411  0.540023571     0.980256013     5                 9               Good