Hi @samiul272, after careful debugging I finally found the problem. If I take the repository code as-is and run the default command from README.md:

```
python main_resnet.py --data_name CIFAR10 \
    --model_name resnet18 \
    --control_name 1_100_0.1_non-iid-2_dynamic_a1-b1-c1-d1-e1_bn_1_1 \
    --exp_name roll_test \
    --algo roll \
    --g_epoch 3200 \
    --l_epoch 1 \
    --lr 2e-4 \
    --schedule 1200 \
    --seed 31 \
    --num_experiments 3 \
    --devices 0 1 2
```
then each client locally uses the Adam optimizer instead of the SGD optimizer!
I believe the reason is that the default optimizer in `config.yml` is Adam. Although you change the value of `cfg['optimizer_name']` in the `process_control` function in `utils.py`, that change only takes effect in the main process. The Ray framework runs in parallel and assigns a separate process to each client, so when a client constructs a new optimizer in its `step` function, the parameters it reads from `cfg` are still the ones from `config.yml`, which means the client is actually running the Adam optimizer. To confirm this, we printed the optimizer information on the client side, as shown below.
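To see the mechanism in isolation, here is a minimal, self-contained sketch (a hypothetical file, not code from this repo). It uses multiprocessing's `spawn` start method as a stand-in for a Ray worker, since both re-import the module in the child process rather than inheriting the driver's mutated globals:

```python
# stale_config_demo.py -- hypothetical sketch of the stale-global problem.
import multiprocessing as mp

cfg = {'optimizer_name': 'Adam'}  # stands in for the value loaded from config.yml

def client_step(q):
    # Runs in a fresh process: the module was re-imported there, so the
    # mutation made inside the __main__ guard below never happened.
    q.put(cfg['optimizer_name'])

if __name__ == '__main__':
    cfg['optimizer_name'] = 'SGD'   # mimics process_control() in utils.py
    ctx = mp.get_context('spawn')   # Ray workers behave like spawned processes here
    q = ctx.Queue()
    p = ctx.Process(target=client_step, args=(q,))
    p.start()
    print(q.get())                  # prints 'Adam', not 'SGD'
    p.join()
```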
Then, ① set `optimizer_name` to Adam in the `config.yml` file (which is also the default setting in your code), run the command above, and the result is as follows:
② set `optimizer_name` to SGD in the `config.yml` file, run the command above, and we can see that:
After testing, mode ① reproduces the results of Table 3, while mode ② fails to train. That is why I could not reproduce the results in issue #7.
Hi @Sherrylife, thanks for letting me know. I will review the issue. As far as I remember, the clients receive the config from the main process during initialization; I think I put the config into Ray's shared memory in main using `ray.put(cfg)`. I could be wrong. I will check what is actually being passed around.
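For reference, here is a minimal sketch of the pattern described above (the class and method names are illustrative, not the repo's actual ones). If the *already-processed* `cfg` is put into the object store and handed to each client explicitly, the clients see the driver's mutated copy; the stale-Adam behavior reported above suggests the clients are instead re-reading `cfg` from their own module import:

```python
# explicit_cfg_demo.py -- hypothetical sketch of passing cfg via ray.put().
import ray

ray.init()

cfg = {'optimizer_name': 'Adam'}  # as loaded from config.yml
cfg['optimizer_name'] = 'SGD'     # mimics process_control() in utils.py
cfg_ref = ray.put(cfg)            # snapshot the *mutated* config in the object store

@ray.remote
class Client:
    def __init__(self, cfg):
        # Ray resolves the ObjectRef argument before calling __init__, so this
        # is the driver's copy, not a value re-read from config.yml.
        self.cfg = cfg

    def step(self):
        return self.cfg['optimizer_name']

client = Client.remote(cfg_ref)
print(ray.get(client.step.remote()))  # prints 'SGD'
ray.shutdown()
```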