We constructed a two-stage refinement strategy based on transfer learning that takes into account both local and global characteristics of ophthalmic images for T2DM risk assessment (Supplementary Fig. 3). This strategy improved model performance and robustness for T2DM risk assessment (Fig. 3, Supplementary Table 5). On the validation set, the G, L, G2L, and L2G models achieved AUROCs of 0.8869 (95% CI, 0.8751-0.8988), 0.886 (95% CI, 0.8741-0.8983), 0.9169 (95% CI, 0.9072-0.9266), and 0.8867 (95% CI, 0.8749-0.8992), respectively. On the internal test set and external test set 1, G achieved AUROCs of 0.8874 (95% CI, 0.8794-0.8951) and 0.915 (95% CI, 0.8986-0.9299), L achieved 0.8894 (95% CI, 0.8812-0.8967) and 0.9182 (95% CI, 0.9025-0.9334), G2L achieved 0.8912 (95% CI, 0.8837-0.8991) and 0.9163 (95% CI, 0.9002-0.9313), and L2G achieved 0.8893 (95% CI, 0.8818-0.8968) and 0.9167 (95% CI, 0.9009-0.9325), respectively. On the internal test set, G2L achieved sensitivities of 0.935 (95% CI, 0.9207-0.9484), 0.9474 (95% CI, 0.8816-0.9868), and 0.9045 (95% CI, 0.8971-0.9123) on T2DM, prediabetes, and HR-QS samples, respectively, and a specificity of 0.6806 (95% CI, 0.6597-0.6996) on non-HR samples. On external test set 1, it achieved sensitivities of 0.7908 (95% CI, 0.75-0.8291), 0.65 (95% CI, 0.5-0.775), and 0.6928 (95% CI, 0.6536-0.7386) on T2DM, prediabetes, and HR-QS samples, respectively, and a specificity of 0.9182 (95% CI, 0.8931-0.9413) on non-HR samples (Supplementary Table 6). Overall, the two-stage fine-tuned models, especially G2L, exhibited higher MCC and AUROC values than the one-stage models on most test sets, demonstrating the utility of the transfer learning method. Two factors may explain the improvement: 1) T2DM-relevant ocular characteristics are region specific, so the two-stage fine-tuning process enabled the G2L and L2G models to learn both global and local features; 2) compared with models pretrained on ImageNet [32], the refined models provided better source parameters for transfer learning in T2DM risk assessment.
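
The metrics reported above (AUROC, sensitivity, specificity, and MCC, each with a 95% CI) can be illustrated with a minimal sketch. This is not the authors' evaluation code. The text here does not state how the confidence intervals were obtained, so the percentile bootstrap below is an assumption, as are the 0.5 decision threshold and the helper names `auroc`, `confusion_metrics`, and `bootstrap_ci`, all of which are hypothetical.

```python
import math
import random


def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion_metrics(labels, scores, threshold=0.5):
    """Sensitivity, specificity, and MCC at an assumed decision threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        # MCC is conventionally set to 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def bootstrap_ci(labels, scores, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for metric(labels, scores); an assumed CI method."""
    if len(set(labels)) < 2:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that contain a single class
        stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi


# Toy data (not from the study): 1 = high-risk sample, 0 = non-HR sample.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75
print(confusion_metrics(labels, scores))
print(bootstrap_ci(labels, scores, auroc, n_boot=200))
```

In this sketch, sensitivity is computed on positive samples only and specificity on negative samples only, which mirrors how the per-group figures above are reported (sensitivity on T2DM, prediabetes, and HR-QS samples; specificity on non-HR samples).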