Supplementary Materials

Supplementary Table 3. Definitions of machine learning and statistical terms.
Abbreviation Full Name Definition
AUC Area Under the Receiver Operating Characteristic Curve AUC quantifies the overall ability of a binary classifier to distinguish positive from negative classes by computing the area under the ROC curve; a value of 0.5 corresponds to chance-level discrimination and 1 to perfect classification.
Precision Precision Defined as TP / (TP + FP), precision indicates the proportion of positive identifications that were actually correct. High precision indicates a low false positive rate.
Sensitivity Sensitivity (Recall, True Positive Rate) Defined as TP / (TP + FN), it measures the proportion of actual positives correctly identified by the model, reflecting the model’s completeness in detecting positives.
Specificity Specificity (True Negative Rate) Defined as TN / (TN + FP), it assesses the proportion of actual negatives correctly identified. A higher specificity implies fewer false positives.
DT Decision Tree A tree-structured model that splits data based on feature thresholds to predict a target variable. It uses recursive partitioning to maximize information gain or minimize impurity (e.g., Gini or entropy).
RF Random Forest An ensemble of decision trees trained on bootstrapped subsets with feature randomness, improving generalization by averaging predictions to reduce overfitting.
SVM Support Vector Machine A supervised classifier that finds the optimal hyperplane separating classes by maximizing the margin between support vectors, applicable in both linear and non-linear settings via kernel tricks.
XGBoost eXtreme Gradient Boosting An efficient and scalable implementation of gradient boosting that uses second-order derivatives, regularization, and tree pruning for accurate and fast predictive modeling.
ANN Artificial Neural Network A class of models inspired by biological neurons, composed of layers of interconnected nodes (neurons) that learn hierarchical representations through weighted summation and activation functions.
Cost-NN Cost-Sensitive Neural Network A neural network trained with misclassification cost weights to penalize minority class errors more heavily, often used in imbalanced data contexts.
dFusionModel RF-based fusion of XGBoost submodels A meta-classifier that combines outputs from RF and XGBoost submodels using majority voting or weighted averaging to enhance robustness and accuracy.
LASSO LR LASSO Logistic Regression Logistic regression with L1 regularization that shrinks coefficients to zero, performing variable selection and preventing overfitting in high-dimensional settings.
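The confusion-matrix metrics defined above (precision, sensitivity, specificity) and AUC can be made concrete with a short sketch. This is illustrative only, not the study's evaluation code; the counts and scores are hypothetical, and AUC is computed here via its rank-statistic interpretation (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as 0.5).

```python
def precision(tp, fp):
    # Proportion of positive predictions that were correct: TP / (TP + FP)
    return tp / (tp + fp)

def sensitivity(tp, fn):
    # Proportion of actual positives detected: TP / (TP + FN)
    return tp / (tp + fn)

def specificity(tn, fp):
    # Proportion of actual negatives correctly identified: TN / (TN + FP)
    return tn / (tn + fp)

def auc(labels, scores):
    # AUC as the Mann-Whitney statistic: fraction of (positive, negative)
    # score pairs in which the positive is ranked higher (ties count 0.5).
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical confusion-matrix counts: TP=8, FP=2, FN=2, TN=8
print(precision(8, 2), sensitivity(8, 2), specificity(8, 2))
# Hypothetical labels and classifier scores
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In practice these quantities are usually obtained from library routines (e.g., a ROC/AUC function in a statistics or ML package) rather than hand-rolled, but the formulas match the definitions in the rows above.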
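The dFusionModel row describes combining submodel outputs by majority voting or weighted averaging. A minimal sketch of the majority-voting case is below; this is a generic illustration of the voting step under the assumption of hard (class-label) predictions from each submodel, not the authors' implementation, and the submodel predictions shown are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    # predictions: one list of class labels per submodel (e.g., RF and
    # XGBoost submodels), all of equal length. For each sample, the
    # fused prediction is the most frequent label across submodels.
    fused = []
    for votes in zip(*predictions):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused

# Hypothetical hard predictions from three submodels on three samples
rf_pred   = [1, 0, 0]
xgb1_pred = [1, 1, 0]
xgb2_pred = [0, 0, 1]
print(majority_vote([rf_pred, xgb1_pred, xgb2_pred]))
```

With probabilistic outputs, weighted averaging of predicted class probabilities (followed by thresholding) is the natural alternative mentioned in the same row.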