INTRODUCTION
Fig. 1. FedSA aims to train a student model with the help of selective knowledge from pre-trained but private models. The target task and training data can differ from those of the private teachers, allowing flexible model customization in downstream tasks.
RELATED WORK
Federated learning
Knowledge amalgamation
Model reusing
METHODS
Problem setup
Federated knowledge amalgamation
Overview
Fig. 2. Overview of the FedSA framework. It comprises three parts: saliency-based knowledge selection, selective aggregation, and target-task adaptation. Knowledge selection picks suitable teacher models via saliency maps, selective aggregation distills knowledge from the pre-trained models into local students, and task adaptation strengthens the performance of the target model.
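The three-stage round described in the caption can be sketched as follows. This is a minimal numpy illustration under assumptions of our own: saliency maps are stand-in arrays, similarity is plain cosine similarity, and the function names (`select_teachers`, `aggregate_logits`) are illustrative, not the paper's API.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two flattened saliency maps."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def select_teachers(student_sal, teacher_sals, top_k=2):
    """Saliency-based knowledge selection (sketch): keep the teachers whose
    saliency maps are most similar to the student's on the local data."""
    scores = np.array([cosine(student_sal, t) for t in teacher_sals])
    chosen = np.argsort(scores)[::-1][:top_k]  # indices, best first
    return chosen, scores

def aggregate_logits(teacher_logits, chosen, scores):
    """Selective aggregation (sketch): similarity-weighted average of the
    selected teachers' logits, used as the student's distillation target."""
    w = np.clip(scores[chosen], 0.0, None)     # ignore dissimilar teachers
    w = w / (w.sum() + 1e-12)
    return np.einsum('k,kc->c', w, teacher_logits[chosen])
```

A student would then be trained toward the aggregated logits (e.g. with a KL-divergence loss) before the task-adaptation step fine-tunes it on local data.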
Saliency-based knowledge selection
Selective aggregation
Task adaptation
Algorithm summary
EXPERIMENTS
Experimental settings
Datasets and models
Segmentation details
Single-task amalgamation
Baselines
Results on single-task amalgamation
Table 1. Performance of FedSA and baseline methods on CIFAR-10 and CIFAR-100. The symbol + indicates that the method was modified for our problem setting.
| Dataset | Method | Homog. 1% | Homog. 5% | Homog. 10% | Homog. 100% | Het. Wrn | Het. Res18 | Het. Vgg | Het. Mob |
|---|---|---|---|---|---|---|---|---|---|
| CIFAR-10 | Scratch Training | 40.00 | 62.14 | 75.43 | 91.56 | 73.97 | 74.91 | 73.63 | 66.23 |
| CIFAR-10 | Transfer | 63.70 | 79.38 | 83.34 | 92.88 | - | - | - | - |
| CIFAR-10 | KACC [10] | 64.52 | 81.43 | 84.52 | 93.43 | 83.96 | 84.60 | 75.77 | 80.44 |
| CIFAR-10 | KACFL [11] | 37.66 | 53.37 | 62.09 | 88.73 | 63.16 | 64.56 | 59.96 | 34.61 |
| CIFAR-10 | SOKA-Net [12] | 72.84 | 79.16 | 80.65 | 91.58 | 80.74 | 83.01 | 81.44 | 81.28 |
| CIFAR-10 | FedAvg+ [20] | 69.96 | 71.58 | 78.04 | 89.24 | 76.05 | 71.69 | 80.97 | 81.87 |
| CIFAR-10 | FedMD+ [26] | 71.25 | 83.20 | 85.52 | 92.37 | 81.28 | 84.23 | 80.76 | 71.17 |
| CIFAR-10 | FedProx+ [27] | 36.51 | 55.88 | 63.05 | 87.07 | 46.11 | 65.11 | 61.17 | 49.47 |
| CIFAR-10 | FedSA | 77.15 | 85.30 | 87.57 | 93.83 | 84.70 | 86.39 | 85.71 | 82.04 |
| CIFAR-100 | Scratch Training | 9.24 | 21.00 | 29.25 | 66.75 | 28.99 | 29.82 | 19.78 | 26.05 |
| CIFAR-100 | Transfer | 18.43 | 42.54 | 48.45 | 69.27 | - | - | - | - |
| CIFAR-100 | KACC [10] | 19.28 | 39.41 | 47.61 | 71.86 | 47.05 | 46.95 | 26.85 | 40.98 |
| CIFAR-100 | KACFL [11] | 6.24 | 12.67 | 28.79 | 46.62 | 18.79 | 21.58 | 13.43 | 20.41 |
| CIFAR-100 | SOKA-Net [12] | 26.69 | 49.79 | 54.63 | 69.75 | 48.62 | 53.72 | 51.92 | 46.09 |
| CIFAR-100 | FedAvg+ [20] | 20.93 | 29.35 | 36.08 | 62.91 | 42.76 | 34.46 | 49.85 | 44.32 |
| CIFAR-100 | FedMD+ [26] | 28.67 | 44.73 | 50.31 | 70.28 | 42.30 | 49.09 | 42.38 | 30.35 |
| CIFAR-100 | FedProx+ [27] | 4.82 | 15.72 | 21.86 | 52.67 | 16.29 | 22.36 | 11.46 | 13.88 |
| CIFAR-100 | FedSA | 32.99 | 52.76 | 57.87 | 73.49 | 50.49 | 58.32 | 53.29 | 47.25 |

Homog. columns correspond to the homogeneous setting under different fractions of training data; Het. columns correspond to the heterogeneous (10%) setting.
Ablation study on aggregation strategy
Table 2. Performance of different amalgamation strategies: accuracy of student models amalgamated from 10 homogeneous and heterogeneous ImageNet32 teachers.
| Dataset | Strategy | Homog. 1% | Homog. 5% | Homog. 10% | Homog. 100% | Het. Wrn | Het. Res | Het. Vgg | Het. Mob |
|---|---|---|---|---|---|---|---|---|---|
| CIFAR-10 | Fixed + Unweighted | 73.46 | 82.22 | 85.28 | 93.11 | 83.89 | 84.00 | 83.78 | 80.49 |
| CIFAR-10 | Top-K + Unweighted | 75.14 | 84.15 | 85.55 | 93.28 | 83.23 | 85.76 | 84.07 | 81.76 |
| CIFAR-10 | Top-K + Positive Weighted | 74.65 | 79.89 | 87.38 | 93.16 | 83.69 | 84.70 | 83.55 | 81.19 |
| CIFAR-10 | Top-K + Negative Weighted | 76.54 | 83.52 | 85.87 | 93.51 | 83.36 | 85.29 | 84.26 | 81.50 |
| CIFAR-10 | Least-K + Positive Weighted | 75.50 | 84.44 | 85.16 | 93.21 | 82.78 | 85.72 | 83.85 | 81.14 |
| CIFAR-10 | Random + Positive Weighted | 75.81 | 84.15 | 86.45 | 93.65 | 84.17 | 86.10 | 85.23 | 81.30 |
| CIFAR-10 | FedSA | 77.15 | 85.30 | 87.57 | 93.83 | 84.70 | 86.39 | 85.71 | 82.04 |
| CIFAR-100 | Fixed + Unweighted | 28.42 | 48.72 | 54.05 | 71.38 | 49.08 | 54.97 | 48.71 | 44.21 |
| CIFAR-100 | Top-K + Unweighted | 32.78 | 49.10 | 55.01 | 71.99 | 50.37 | 55.18 | 50.04 | 44.33 |
| CIFAR-100 | Top-K + Positive Weighted | 32.80 | 51.22 | 54.50 | 71.67 | 50.24 | 54.42 | 49.88 | 45.37 |
| CIFAR-100 | Top-K + Negative Weighted | 32.69 | 50.22 | 56.13 | 72.44 | 49.39 | 55.34 | 51.16 | 45.24 |
| CIFAR-100 | Least-K + Positive Weighted | 32.74 | 50.95 | 55.61 | 71.85 | 49.76 | 55.74 | 49.57 | 45.88 |
| CIFAR-100 | Random + Positive Weighted | 32.46 | 52.39 | 56.77 | 72.97 | 50.03 | 57.76 | 51.98 | 46.96 |
| CIFAR-100 | FedSA | 32.99 | 52.76 | 57.87 | 73.49 | 50.49 | 58.32 | 53.29 | 47.25 |
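The strategy names in the ablation (Top-K, Least-K, Random selection; unweighted, positive, or negative weighting) can be sketched as below. This is a hedged illustration: the exact weighting formulas are assumptions of ours, not the paper's definitions.

```python
import numpy as np

def pick(scores, k, strategy, rng=None):
    """Choose k teachers by transferability score under a named strategy."""
    order = np.argsort(scores)
    if strategy == "top":          # k highest-scoring teachers
        return order[::-1][:k]
    if strategy == "least":        # k lowest-scoring teachers
        return order[:k]
    if strategy == "random":       # k teachers chosen uniformly at random
        if rng is None:
            rng = np.random.default_rng()
        return rng.choice(len(scores), size=k, replace=False)
    raise ValueError(strategy)

def weights(scores, chosen, mode):
    """Normalize the chosen teachers' scores into aggregation weights."""
    s = scores[chosen].astype(float)
    if mode == "unweighted":       # equal contribution
        w = np.ones_like(s)
    elif mode == "positive":       # emphasize more transferable teachers
        w = s - s.min() + 1e-6
    elif mode == "negative":       # emphasize less transferable teachers
        w = s.max() - s + 1e-6
    else:
        raise ValueError(mode)
    return w / w.sum()
```

Under this sketch, "Top-K + Positive Weighted" keeps the k most transferable teachers and weights them proportionally to their scores, while "Negative Weighted" inverts the emphasis.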
Visualization of dynamic knowledge selection
Fig. 3. a, Visualization of dynamic knowledge selection for CIFAR-10 and CIFAR-100; darker colors indicate higher transferability scores. b-c, Performance of FedSA with different numbers of selected nodes.
Table 3. Relation between transfer accuracy and saliency similarity. The similarity is normalized to [0, 1] for comparison. Higher saliency similarity typically indicates better transfer performance, which validates the effectiveness of saliency-based knowledge selection.
| Dataset | Teacher ID | #1 | #2 | #3 | #4 | #5 | #6 | #7 | #8 | #9 | #10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CIFAR-10 | Transfer Acc. (%) | 62.13 | 61.38 | 63.73 | 66.51 | 63.28 | 65.48 | 62.41 | 62.97 | 63.93 | 61.99 |
| CIFAR-10 | Transferability Score | 0.35 | 0.61 | 0.92 | 1.0 | 0.11 | 0.61 | 0.0 | 0.26 | 0.53 | 0.19 |
| CIFAR-100 | Transfer Acc. (%) | 31.76 | 33.17 | 30.19 | 39.04 | 41.04 | 42.17 | 41.53 | 41.82 | 43.89 | 39.27 |
| CIFAR-100 | Transferability Score | 0.43 | 0.23 | 0.0 | 0.25 | 0.42 | 0.74 | 0.48 | 0.42 | 1.0 | 0.77 |
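The caption notes that similarity is normalized to [0, 1] for comparison; scores of 0.0 and 1.0 in each row suggest simple min-max scaling. A minimal sketch of that normalization, assuming `raw` holds one raw saliency-similarity value per teacher:

```python
import numpy as np

def normalize_scores(raw):
    """Min-max normalize per-teacher similarity scores to [0, 1].

    The least similar teacher maps to 0 and the most similar to 1,
    making scores comparable across datasets.
    """
    raw = np.asarray(raw, dtype=float)
    lo, hi = raw.min(), raw.max()
    if hi == lo:                   # all teachers equally similar
        return np.zeros_like(raw)
    return (raw - lo) / (hi - lo)
```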
Multi-task amalgamation
Results on taskonomy
Table 4. Performance of the target model amalgamated from Taskonomy models. Classification and semantic segmentation are used as the target tasks for comparison.
Visualization of semantic segmentation
Fig. 4. Visualization of semantic segmentation results from the proposed FedSA and several baselines. The transfer method requires the pre-trained models to be accessible.

