Repository to accompany "On Noisy Evaluation in Federated Hyperparameter Tuning" (MLSys'23).
In a Linux terminal:
- Pull this repository by running `git clone https://github.com/imkevinkuo/noisy-eval-in-fl`.
- Install miniconda3 from https://docs.conda.io/en/latest/miniconda.html.
- Set up the conda environment with the name `noisyfl` by running `conda env create -n noisyfl -f environment.yml`.
- Activate the environment with `conda activate noisyfl`.
`torch.cuda` requires CUDA-compatible hardware, the CUDA Toolkit, and potentially manufacturer drivers. We ran our experiments with 8 NVIDIA GeForce GTX 1080 Ti GPUs and CUDA Toolkit 11.6. The CUDA installer can be downloaded from https://developer.nvidia.com/cuda-11-6-0-download-archive.
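A quick sanity check that PyTorch can see the GPUs (a convenience, not part of the original setup steps):

```python
import torch

print(torch.cuda.is_available())  # True if a CUDA device is usable
print(torch.cuda.device_count())  # number of visible GPUs (8 in our setup)
```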
To directly run analysis, we provide training trace logs at https://drive.google.com/file/d/1ayvSx1EhIt9-_ImNBGfbVlABa90UYDvs/view?usp=share_link. Place `runs.tar.gz` in the project directory and extract the contents with `tar -xzvf runs.tar.gz`.
To run the training scripts, we provide preprocessed data at https://drive.google.com/file/d/1iDK5JvEiv3Vz0jNV05cNGKBGbclBwQNu/view?usp=share_link. Place `data.tar.gz` in the project directory and extract the contents with `tar -xzvf data.tar.gz`. Alternatively, `notebooks/generate_data.ipynb` contains code to set up the datasets from scratch.
Training uses either a weighted or a uniform loss across clients. Uniform evaluation bounds the sensitivity of any individual client, so we use uniform weighting for both training and evaluation in our differential privacy experiments. In all other experiments, we weight the losses/evaluations by client size.
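A minimal sketch of the two weighting modes (the function and variable names below are illustrative, not from the codebase):

```python
import numpy as np

def aggregate_evals(client_errors, client_sizes, uniform=True):
    """Combine per-client error rates into a single score.

    uniform=True: every client counts equally, which bounds each client's
    sensitivity (used in the differential privacy experiments).
    uniform=False: clients are weighted by their number of samples.
    """
    errors = np.asarray(client_errors, dtype=float)
    if uniform:
        return errors.mean()
    return np.average(errors, weights=np.asarray(client_sizes, dtype=float))
```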
The main scripts are named `fedtrain_simple.py`, `fedtrain_tpe.py`, and `fedtrain_bohb.py`.
`fedtrain_simple.py` trains a single configuration, which is later post-processed in the notebook analysis of RS and HB. The optimization hyperparameters are passed as arguments to this script.
`fedtrain_tpe.py` and `fedtrain_bohb.py` run the respective HP tuning algorithms and depend on `fedtrain_simple.py` for model training and evaluation. The two main arguments, `--batch` and `--eps`, set the number of subsampled evaluation clients and the value of epsilon used for differentially private evaluation.
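For example, a hypothetical invocation that subsamples 20 evaluation clients with epsilon = 1 would look like `python fedtrain_tpe.py --batch 20 --eps 1.0` (the values here are illustrative; see each script's argument parser for defaults and the full argument list).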
Each `fedtrain_*.py` script has a corresponding `init_*.py` wrapper, which runs multiple trials of the corresponding `fedtrain` script.
Running each of the `init` scripts once, without any modifications, will produce analysis results for CIFAR10. The current scripts synchronously execute all trials on a single GPU. The number of trials and the run-to-GPU assignment can be changed by modifying the `init` files:
```
python init_simple.py
python init_tpe.py
python init_bohb.py
python init_tpe_noisy.py
python init_bohb_noisy.py
```
After model training, several nested logging directories are created. These correspond to configurations of the training objective (losses weighted uniformly or by client data size), the HPO wrapper (none, TPE, or BOHB), and the dataset:

```
runs_unif | runs_weighted
└── runs_simple | runs_tpe | runs_bohb
    └── runs_cifar10 | runs_femnist | runs_stackoverflow | runs_reddit
        └── train | eval
            └── run0, run1, ...
```
`train` and `eval` store the information for individual training runs, i.e. a single HP configuration. `train/{run_name}` stores the TensorBoard `.events` file and the arguments `args.json`, while `eval/{run_name}` stores a list of evaluated client error rates in a pickle file named `P{X}_R{Y}.pkl`. `X` is the probability for the three data heterogeneity settings (0, 0.5, 1) and `Y` is the round number (by default, we evaluate every 15 rounds and train for a total of 405 rounds).
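As a minimal example of reading one of these files (the run directory and filename below are illustrative, following the naming scheme above):

```python
import pickle

# Client error rates for heterogeneity setting X=0.5 at round Y=405.
path = "runs_unif/runs_simple/runs_cifar10/eval/run0/P0.5_R405.pkl"
with open(path, "rb") as f:
    client_error_rates = pickle.load(f)  # list of per-client error rates

print(len(client_error_rates), "clients evaluated")
print("mean error:", sum(client_error_rates) / len(client_error_rates))
```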
Results from the `init` scripts are logged in the following directories:

```
runs_weighted/runs_simple/runs_cifar10
runs_weighted/runs_simple/runs_femnist
runs_unif/runs_simple/runs_cifar10
runs_unif/runs_simple/runs_femnist
runs_unif/runs_bohb/runs_cifar10
runs_unif/runs_bohb/runs_cifar10_s_e=100
runs_unif/runs_tpe/runs_cifar10
runs_unif/runs_tpe/runs_cifar10_s_e=100
```
Run all the cells in the notebook `analysis.ipynb`. The plots will be displayed in the notebook.