Parallelization for standalone mode with NCCL (#487)
Showing 15 changed files with 641 additions and 25 deletions.
@@ -0,0 +1,85 @@
# Parallelization for standalone mode

To help developers verify their algorithms quickly, we designed and implemented `StandaloneMultiGPURunner` on top of torch distributed data parallel (DDP). The new runner makes better use of multiple GPUs and accelerates training in FederatedScope's standalone mode.

## When to use
Use `StandaloneMultiGPURunner` when your machine has **multiple GPUs (>=2)** and you need quick verification in **standalone mode**.

## Configuration

Add the `federate.process_num` item to the configuration file to parallelize training.

> Note: `federate.process_num` only takes effect when `use_gpu=True`, `backend='torch'`, `federate.mode='standalone'` and `federate.share_local_model=False`, and its value must not exceed the number of GPUs.

```yaml
use_gpu: True
backend: 'torch'
device: 0
early_stop:
  patience: 5
seed: 12345
federate:
  mode: standalone
  client_num: 100
  total_round_num: 20
  sample_client_rate: 0.2
  share_local_model: False
  process_num: 4 # run 4 processes simultaneously
...
```
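
The conditions in the note above can be sketched as a small check. This is only an illustration: `uses_multi_gpu_runner` is a hypothetical helper, not part of FederatedScope, the configuration is represented as a plain dictionary, and the `process_num > 1` requirement is an assumption inferred from the use case in this document, where `process_num: 1` corresponds to `StandaloneRunner`.

```python
def uses_multi_gpu_runner(cfg: dict, gpu_count: int) -> bool:
    """Hypothetical helper: True if the documented preconditions for
    `federate.process_num` all hold for this configuration."""
    federate = cfg.get("federate", {})
    process_num = federate.get("process_num", 1)
    return (
        cfg.get("use_gpu") is True
        and cfg.get("backend") == "torch"
        and federate.get("mode") == "standalone"
        and federate.get("share_local_model") is False
        # Assumption: more than one process is needed for parallel runs,
        # and the value must not exceed the number of GPUs.
        and 1 < process_num <= gpu_count
    )


# The example configuration from this section, written as a dictionary.
cfg = {
    "use_gpu": True,
    "backend": "torch",
    "federate": {
        "mode": "standalone",
        "share_local_model": False,
        "process_num": 4,
    },
}

print(uses_multi_gpu_runner(cfg, gpu_count=8))  # True: 4 processes fit on 8 GPUs
print(uses_multi_gpu_runner(cfg, gpu_count=2))  # False: 4 exceeds 2 GPUs
```

The GPU count is passed in explicitly so the check can run without a GPU; on a real machine it could come from `torch.cuda.device_count()`.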

## Use cases

The following example compares the efficiency of `StandaloneMultiGPURunner` with that of `StandaloneRunner`. The configuration file and experiment results are listed below.
The results show that, with 8 GPUs, the total running time of `StandaloneMultiGPURunner` is only about 1/3 of that of `StandaloneRunner`.

```yaml
use_gpu: True
device: 0
early_stop:
  patience: 5
seed: 12345
federate:
  mode: standalone
  client_num: 100
  total_round_num: 10
  sample_client_rate: 0.4
  share_local_model: False
  # use StandaloneMultiGPURunner with 8 GPUs
  process_num: 8
  # use StandaloneRunner
  # process_num: 1

data:
  root: data/
  type: femnist
  splits: [0.6,0.2,0.2]
  batch_size: 10
  subsample: 0.05
  num_workers: 0
  transform: [['ToTensor'], ['Normalize', {'mean': [0.1307], 'std': [0.3081]}]]
model:
  type: convnet2
  hidden: 2048
  out_channels: 62
train:
  local_update_steps: 1
  batch_or_epoch: epoch
  optimizer:
    lr: 0.01
    weight_decay: 0.0
grad:
  grad_clip: 5.0
criterion:
  type: CrossEntropyLoss
trainer:
  type: cvtrainer
eval:
  freq: 10
  metrics: ['acc', 'correct']
```
| | StandaloneMultiGPURunner | StandaloneRunner |
| :---: | :---: | :---: |
| Total running time (minutes) | 0.2406 | 0.7292 |
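
As a quick sanity check of the "about 1/3" figure, the arithmetic on the two reported running times can be worked out as below. The variable names are illustrative only, and the per-GPU efficiency is a derived number, not a result reported by the experiment.

```python
# Running times in minutes, copied from the table above.
multi_gpu_minutes = 0.2406   # StandaloneMultiGPURunner, process_num: 8
single_minutes = 0.7292      # StandaloneRunner
num_gpus = 8

time_fraction = multi_gpu_minutes / single_minutes  # share of the baseline time
speedup = single_minutes / multi_gpu_minutes        # how many times faster
efficiency = speedup / num_gpus                     # speedup per GPU (derived)

print(f"time fraction: {time_fraction:.2f}")  # 0.33
print(f"speedup: {speedup:.2f}x")             # 3.03x
print(f"efficiency: {efficiency:.2f}")        # 0.38
```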