
The adversarial training script is showing strange trend #8

Closed
ksouvik52 opened this issue Jul 22, 2022 · 8 comments

Comments

@ksouvik52

Hi, the adversarial training script is showing a strange trend: after a certain number of epochs, the top-1 accuracy has fallen to 1.6% from around 21%. Is this normal?

I used the script for adv training as:
python -m torch.distributed.launch --nproc_per_node=4 --master_port=5672 --use_env main_adv_deit.py --model deit_small_patch16_224_adv --batch-size 128 --data-path /datasets/imagenet-ilsvrc2012 --attack-iter 1 --attack-epsilon 4 --attack-step-size 4 --epoch 100 --reprob 0 --no-repeated-aug --sing singln --drop 0 --drop-path 0 --start_epoch 0 --warmup-epochs 10 --cutmix 0 --output_dir save/deit_adv/deit_small_patch16_224

Here is the training log (up to epoch 40):
{"train_lr": 1.0000000000000031e-06, "train_loss": 6.885785259502969, "test_0_loss": 6.7725973782139715, "test_0_acc1": 0.806, "test_0_acc5": 2.804, "test_5_loss": 6.844994894602477, "test_5_acc1": 0.55, "test_5_acc5": 1.958, "epoch": 0, "n_parameters": 22050664}
{"train_lr": 1.0000000000000031e-06, "train_loss": 6.885785259502969, "test_0_loss": 6.7725973782139715, "test_0_acc1": 0.806, "test_0_acc5": 2.804, "test_5_loss": 6.844994894602477, "test_5_acc1": 0.55, "test_5_acc5": 1.958, "epoch": 0, "n_parameters": 22050664}
{"train_lr": 1.0000000000000031e-06, "train_loss": 6.846427675869634, "test_0_loss": 6.689390176393554, "test_0_acc1": 1.192, "test_0_acc5": 4.378, "test_5_loss": 6.844994894602477, "test_5_acc1": 0.55, "test_5_acc5": 1.958, "epoch": 1, "n_parameters": 22050664}
{"train_lr": 0.00020090000000000288, "train_loss": 6.701197089479981, "test_0_loss": 5.865309043488896, "test_0_acc1": 5.43, "test_0_acc5": 14.672, "test_5_loss": 6.844994894602477, "test_5_acc1": 0.55, "test_5_acc5": 1.958, "epoch": 2, "n_parameters": 22050664}
{"train_lr": 0.00040079999999998546, "train_loss": 6.543532955179588, "test_0_loss": 5.340847122768371, "test_0_acc1": 9.812, "test_0_acc5": 23.782, "test_5_loss": 6.844994894602477, "test_5_acc1": 0.55, "test_5_acc5": 1.958, "epoch": 3, "n_parameters": 22050664}
{"train_lr": 0.0006006999999999715, "train_loss": 6.4769038225916455, "test_0_loss": 5.03732673006796, "test_0_acc1": 13.248, "test_0_acc5": 29.602, "test_5_loss": 6.844994894602477, "test_5_acc1": 0.55, "test_5_acc5": 1.958, "epoch": 4, "n_parameters": 22050664}
{"train_lr": 0.0008006000000000287, "train_loss": 6.315360357340196, "test_0_loss": 5.300121459301969, "test_0_acc1": 10.944, "test_0_acc5": 25.546, "test_5_loss": 6.55244832365313, "test_5_acc1": 2.756, "test_5_acc5": 7.6525, "epoch": 5, "n_parameters": 22050664}
{"train_lr": 0.0010004999999999689, "train_loss": 6.190600837687318, "test_0_loss": 4.9149362563476755, "test_0_acc1": 14.35, "test_0_acc5": 31.418, "test_5_loss": 6.55244832365313, "test_5_acc1": 2.756, "test_5_acc5": 7.6525, "epoch": 6, "n_parameters": 22050664}
{"train_lr": 0.0012004000000000647, "train_loss": 6.088374964529566, "test_0_loss": 5.50498779843575, "test_0_acc1": 10.254, "test_0_acc5": 24.242, "test_5_loss": 6.55244832365313, "test_5_acc1": 2.756, "test_5_acc5": 7.6525, "epoch": 7, "n_parameters": 22050664}
{"train_lr": 0.0014002999999999238, "train_loss": 6.08913704293142, "test_0_loss": 4.774700349977363, "test_0_acc1": 14.72, "test_0_acc5": 32.384, "test_5_loss": 6.55244832365313, "test_5_acc1": 2.756, "test_5_acc5": 7.6525, "epoch": 8, "n_parameters": 22050664}
{"train_lr": 0.0016001999999999618, "train_loss": 6.150533516344121, "test_0_loss": 5.227625224198276, "test_0_acc1": 10.67, "test_0_acc5": 25.058, "test_5_loss": 6.55244832365313, "test_5_acc1": 2.756, "test_5_acc5": 7.6525, "epoch": 9, "n_parameters": 22050664}
{"train_lr": 0.0018001000000000126, "train_loss": 6.101692359891536, "test_0_loss": 5.141786843786161, "test_0_acc1": 11.414, "test_0_acc5": 26.346, "test_5_loss": 6.756372647642403, "test_5_acc1": 2.309, "test_5_acc5": 6.5675, "epoch": 10, "n_parameters": 22050664}
{"train_lr": 0.001951301233713633, "train_loss": 6.093319233182332, "test_0_loss": 4.774902591320924, "test_0_acc1": 14.368, "test_0_acc5": 31.786, "test_5_loss": 6.756372647642403, "test_5_acc1": 2.309, "test_5_acc5": 6.5675, "epoch": 11, "n_parameters": 22050664}
{"train_lr": 0.001941176365109525, "train_loss": 6.128870297345421, "test_0_loss": 5.251185640492503, "test_0_acc1": 11.492, "test_0_acc5": 26.726, "test_5_loss": 6.756372647642403, "test_5_acc1": 2.309, "test_5_acc5": 6.5675, "epoch": 12, "n_parameters": 22050664}
{"train_lr": 0.0019301276034588222, "train_loss": 6.053121808859752, "test_0_loss": 4.758080562108309, "test_0_acc1": 16.252, "test_0_acc5": 34.882, "test_5_loss": 6.756372647642403, "test_5_acc1": 2.309, "test_5_acc5": 6.5675, "epoch": 13, "n_parameters": 22050664}
{"train_lr": 0.0019181658525555538, "train_loss": 6.0439764577136055, "test_0_loss": 4.586510399862962, "test_0_acc1": 16.69, "test_0_acc5": 35.526, "test_5_loss": 6.756372647642403, "test_5_acc1": 2.309, "test_5_acc5": 6.5675, "epoch": 14, "n_parameters": 22050664}
{"train_lr": 0.0019053029172036828, "train_loss": 5.91496213320062, "test_0_loss": 4.488940908904268, "test_0_acc1": 17.398, "test_0_acc5": 36.698, "test_5_loss": 7.4814025707452325, "test_5_acc1": 1.2555, "test_5_acc5": 4.2435, "epoch": 15, "n_parameters": 22050664}
{"train_lr": 0.0018915514915675221, "train_loss": 6.002524321551898, "test_0_loss": 4.450921233922186, "test_0_acc1": 17.934, "test_0_acc5": 37.114, "test_5_loss": 7.4814025707452325, "test_5_acc1": 1.2555, "test_5_acc5": 4.2435, "epoch": 16, "n_parameters": 22050664}
{"train_lr": 0.0018769251466436458, "train_loss": 5.878266204508851, "test_0_loss": 4.308091710831062, "test_0_acc1": 20.404, "test_0_acc5": 41.2, "test_5_loss": 7.4814025707452325, "test_5_acc1": 1.2555, "test_5_acc5": 4.2435, "epoch": 17, "n_parameters": 22050664}
{"train_lr": 0.0018614383168689135, "train_loss": 5.789360093222343, "test_0_loss": 4.410817133793065, "test_0_acc1": 18.154, "test_0_acc5": 38.082, "test_5_loss": 7.4814025707452325, "test_5_acc1": 1.2555, "test_5_acc5": 4.2435, "epoch": 18, "n_parameters": 22050664}
{"train_lr": 0.0018451062858745686, "train_loss": 5.750880390286541, "test_0_loss": 4.467262921391278, "test_0_acc1": 19.266, "test_0_acc5": 39.462, "test_5_loss": 7.4814025707452325, "test_5_acc1": 1.2555, "test_5_acc5": 4.2435, "epoch": 19, "n_parameters": 22050664}
{"train_lr": 0.0018279451714032378, "train_loss": 5.764791792602562, "test_0_loss": 4.67896575738586, "test_0_acc1": 17.392, "test_0_acc5": 37.15, "test_5_loss": 7.557907587735987, "test_5_acc1": 0.9905, "test_5_acc5": 3.1575, "epoch": 20, "n_parameters": 22050664}
{"train_lr": 0.0018099719094030393, "train_loss": 5.759131700348416, "test_0_loss": 4.419680974762636, "test_0_acc1": 19.966, "test_0_acc5": 40.798, "test_5_loss": 7.557907587735987, "test_5_acc1": 0.9905, "test_5_acc5": 3.1575, "epoch": 21, "n_parameters": 22050664}
{"train_lr": 0.0017912042373137494, "train_loss": 5.710006896111605, "test_0_loss": 4.2751427415236405, "test_0_acc1": 20.356, "test_0_acc5": 41.114, "test_5_loss": 7.557907587735987, "test_5_acc1": 0.9905, "test_5_acc5": 3.1575, "epoch": 22, "n_parameters": 22050664}
{"train_lr": 0.0017716606765619972, "train_loss": 5.68051082098322, "test_0_loss": 4.154385426833091, "test_0_acc1": 21.638, "test_0_acc5": 43.102, "test_5_loss": 7.557907587735987, "test_5_acc1": 0.9905, "test_5_acc5": 3.1575, "epoch": 23, "n_parameters": 22050664}
{"train_lr": 0.0017513605142823508, "train_loss": 5.693617649811158, "test_0_loss": 4.25816687512535, "test_0_acc1": 20.994, "test_0_acc5": 41.96, "test_5_loss": 7.557907587735987, "test_5_acc1": 0.9905, "test_5_acc5": 3.1575, "epoch": 24, "n_parameters": 22050664}
{"train_lr": 0.0017303237842843694, "train_loss": 5.6821105527839695, "test_0_loss": 4.421267043301026, "test_0_acc1": 19.116, "test_0_acc5": 39.094, "test_5_loss": 8.897946262237587, "test_5_acc1": 0.514, "test_5_acc5": 1.892, "epoch": 25, "n_parameters": 22050664}
{"train_lr": 0.001708571247280513, "train_loss": 5.69677297047955, "test_0_loss": 4.398700253595852, "test_0_acc1": 19.178, "test_0_acc5": 39.536, "test_5_loss": 8.897946262237587, "test_5_acc1": 0.514, "test_5_acc5": 1.892, "epoch": 26, "n_parameters": 22050664}
{"train_lr": 0.0016861243703990647, "train_loss": 5.740358965097666, "test_0_loss": 4.446112109237348, "test_0_acc1": 19.972, "test_0_acc5": 40.84, "test_5_loss": 8.897946262237587, "test_5_acc1": 0.514, "test_5_acc5": 1.892, "epoch": 27, "n_parameters": 22050664}
{"train_lr": 0.0016630053059970855, "train_loss": 5.712303198760838, "test_0_loss": 4.1932648324234245, "test_0_acc1": 21.566, "test_0_acc5": 42.98, "test_5_loss": 8.897946262237587, "test_5_acc1": 0.514, "test_5_acc5": 1.892, "epoch": 28, "n_parameters": 22050664}
{"train_lr": 0.0016392368698000565, "train_loss": 5.74558376472631, "test_0_loss": 4.124165606513972, "test_0_acc1": 21.932, "test_0_acc5": 43.39, "test_5_loss": 8.897946262237587, "test_5_acc1": 0.514, "test_5_acc5": 1.892, "epoch": 29, "n_parameters": 22050664}
{"train_lr": 0.0016148425183847566, "train_loss": 5.3731044158518175, "test_0_loss": 4.680003530995935, "test_0_acc1": 15.374, "test_0_acc5": 33.588, "test_5_loss": 11.426024395688863, "test_5_acc1": 0.1075, "test_5_acc5": 0.45, "epoch": 30, "n_parameters": 22050664}
{"train_lr": 0.0015898463260310706, "train_loss": 4.259690835869474, "test_0_loss": 5.981102620495181, "test_0_acc1": 5.786, "test_0_acc5": 14.55, "test_5_loss": 11.426024395688863, "test_5_acc1": 0.1075, "test_5_acc5": 0.45, "epoch": 31, "n_parameters": 22050664}
{"train_lr": 0.0015642729609628443, "train_loss": 4.075305948785836, "test_0_loss": 5.933592574686403, "test_0_acc1": 4.598, "test_0_acc5": 13.066, "test_5_loss": 11.426024395688863, "test_5_acc1": 0.1075, "test_5_acc5": 0.45, "epoch": 32, "n_parameters": 22050664}
{"train_lr": 0.001538147661004018, "train_loss": 4.167220209940351, "test_0_loss": 6.295307500501207, "test_0_acc1": 3.228, "test_0_acc5": 9.566, "test_5_loss": 11.426024395688863, "test_5_acc1": 0.1075, "test_5_acc5": 0.45, "epoch": 33, "n_parameters": 22050664}
{"train_lr": 0.001511496208671658, "train_loss": 4.134730825523774, "test_0_loss": 5.972504679850104, "test_0_acc1": 3.806, "test_0_acc5": 11.758, "test_5_loss": 11.426024395688863, "test_5_acc1": 0.1075, "test_5_acc5": 0.45, "epoch": 34, "n_parameters": 22050664}
{"train_lr": 0.0014843449057311518, "train_loss": 4.365966309007885, "test_0_loss": 6.5958606600380065, "test_0_acc1": 2.156, "test_0_acc5": 7.3, "test_5_loss": 12.839466273136347, "test_5_acc1": 0.002, "test_5_acc5": 0.0035, "epoch": 35, "n_parameters": 22050664}
{"train_lr": 0.00145672054724078, "train_loss": 4.49492947772729, "test_0_loss": 6.905164708865429, "test_0_acc1": 1.588, "test_0_acc5": 5.264, "test_5_loss": 12.839466273136347, "test_5_acc1": 0.002, "test_5_acc5": 0.0035, "epoch": 36, "n_parameters": 22050664}
{"train_lr": 0.0014286503951072877, "train_loss": 4.562651729769558, "test_0_loss": 6.958603466617245, "test_0_acc1": 1.594, "test_0_acc5": 5.226, "test_5_loss": 12.839466273136347, "test_5_acc1": 0.002, "test_5_acc5": 0.0035, "epoch": 37, "n_parameters": 22050664}
{"train_lr": 0.0014001621511816529, "train_loss": 4.620032903101804, "test_0_loss": 6.883705623624269, "test_0_acc1": 1.946, "test_0_acc5": 5.582, "test_5_loss": 12.839466273136347, "test_5_acc1": 0.002, "test_5_acc5": 0.0035, "epoch": 38, "n_parameters": 22050664}
{"train_lr": 0.0013712839299212382, "train_loss": 4.635755813831715, "test_0_loss": 7.244386745887312, "test_0_acc1": 0.964, "test_0_acc5": 3.976, "test_5_loss": 12.839466273136347, "test_5_acc1": 0.002, "test_5_acc5": 0.0035, "epoch": 39, "n_parameters": 22050664}
{"train_lr": 0.0013420442306441068, "train_loss": 4.83734727265547, "test_0_loss": 7.3013705145603405, "test_0_acc1": 1.686, "test_0_acc5": 4.768, "test_5_loss": 14.802937962195847, "test_5_acc1": 0.0, "test_5_acc5": 0.0, "epoch": 40, "n_parameters": 22050664}

@ksouvik52
Author

Any help in this regard is highly appreciated. Is something going wrong after the 30th epoch?

@ytongbai
Owner

Hi, thanks for your interest in our work. Yes, this log doesn't look right to me. Can you try a larger batch size (4096, for example)? You can use the gradient accumulation we provide in the code to mimic the large batch size.
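
For reference, gradient accumulation in PyTorch looks roughly like the sketch below; the model, data, and loop here are illustrative stand-ins, not the repo's actual training code:

import torch
import torch.nn as nn

# Illustrative stand-ins for the real model and data loader.
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
update_freq = 8  # micro-batches accumulated per optimizer step

optimizer.zero_grad()
for step in range(32):
    images = torch.randn(128, 10)             # per-GPU micro-batch of 128
    targets = torch.randint(0, 2, (128,))
    loss = criterion(model(images), targets)
    (loss / update_freq).backward()           # scale so accumulated grads average out
    if (step + 1) % update_freq == 0:
        optimizer.step()                      # one update per update_freq micro-batches
        optimizer.zero_grad()

With a per-GPU batch of 128 on 4 GPUs and update_freq = 8, each optimizer step then covers 128 x 4 x 8 = 4096 samples.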

@ytongbai
Owner

Hi, we located the problem:
Can you try to change this line:

linear_scaled_lr = args.lr * args.batch_size * utils.get_world_size() * 4 / args.adjust_lr

to:
linear_scaled_lr = args.lr * args.batch_size * utils.get_world_size() * args.update_freq / args.adjust_lr

where update_freq is your number of gradient-accumulation steps.

In your case, update_freq should be set to 8 to maintain the 4096 total batch size.
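
As a rough sketch of what the corrected line computes (base_lr and adjust_lr below are assumed defaults, not verified against the repo's argument parser; the batch/GPU numbers come from the command above):

# Assumed defaults -- check the repo's argparser: lr=5e-4, adjust_lr=512.
base_lr = 5e-4       # args.lr (assumption)
adjust_lr = 512.0    # args.adjust_lr (assumption)
batch_size = 128     # per-GPU --batch-size from the command above
world_size = 4       # --nproc_per_node=4
update_freq = 8      # accumulation steps, replacing the hardcoded 4

linear_scaled_lr = base_lr * batch_size * world_size * update_freq / adjust_lr
print(linear_scaled_lr)  # 0.004 with these assumed values (total batch 128*4*8 = 4096)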

Sorry, we temporarily changed our code for a fixed setting on a particular machine, but this should be passed as an argument. We will fix this. Please let me know if you encounter further problems; I'll be happy to help!

@ksouvik52
Author

ksouvik52 commented Jul 24, 2022

So we are good with a batch size of 64 if this line is changed, right?
As per my understanding, you are saying args.batch_size * utils.get_world_size() * args.update_freq should be 4096, right? If so, I think for a per-GPU batch size of 64 with 4 GPUs, update_freq should be 16, right?
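
(A quick check of that arithmetic against the corrected formula, as a standalone snippet rather than repo code:)

per_gpu_batch, world_size, update_freq = 64, 4, 16
assert per_gpu_batch * world_size * update_freq == 4096  # 64 x 4 x 16 = 4096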

@ytongbai
Owner

Oh, I just noticed that you shrank the total batch size (--nproc_per_node=4) in your original script, right?

We set --nproc_per_node=8 in https://github.com/ytongbai/ViTs-vs-CNNs/blob/99bd87d1ea3a59724887b1b84fe6cda43267ed70/script/advdeit.sh

That means you used half of that total batch size.

Can you first try keeping the original total batch size?

@ksouvik52
Author

ksouvik52 commented Jul 24, 2022

I am now using exactly your settings:

python -m torch.distributed.launch --nproc_per_node=8 --master_port=12349 --use_env main_adv_deit.py --model deit_tiny_patch16_224_adv --batch-size=128 --data-path /datasets/imagenet-ilsvrc2012 --attack-iter 1 --attack-epsilon 4 --attack-step-size 4 --epoch 100 --reprob 0 --no-repeated-aug --sing singln --drop 0 --drop-path 0 --start_epoch 0 --warmup-epochs 10 --cutmix 0 --output_dir save/deit_adv/deit_tiny_patch16_224

But I was just curious: what is the issue with this update_freq? I don't see it being used to divide the total dataset, so how are you maintaining a virtual batch size of 4096 here? I understand 8x128x4 = 4096, but I did not see update_freq used anywhere.

@ytongbai
Owner

Yeah, try this first and check whether the curve looks healthy.

Please ignore update_freq for now. I thought you had already tried a 1024 batch size and it collapsed, so I was thinking the current line is not flexible enough for you to use gradient accumulation for an even larger batch size.

But try this first, and I'll be happy to help if you still have other questions :)

@ksouvik52
Author

Thanks for your quick response. Hope this works!
