alpha should not be optimized in updating weight #5

Open
yangsenius opened this issue Mar 26, 2019 · 8 comments

@yangsenius

self.alpha_normal = nn.Parameter(torch.randn(k, num_ops))

nn.Parameter() registers alpha and beta in model.parameters(), so the optimizer will also update alpha and beta when it updates the operation weights. I think nn.Parameter() should not be used here; it is not consistent with the paper or the original code.
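A minimal sketch of the effect (the TinyNetwork class and layer names are made up for illustration; only the alpha line mirrors the code above):

import torch
import torch.nn as nn

class TinyNetwork(nn.Module):  # illustrative stand-in for the search network
    def __init__(self, k=14, num_ops=8):
        super().__init__()
        self.stem = nn.Conv2d(3, 16, 3, padding=1)
        # nn.Parameter registers alpha alongside the operation weights
        self.alpha_normal = nn.Parameter(torch.randn(k, num_ops))

model = TinyNetwork()
w_optim = torch.optim.SGD(model.parameters(), lr=0.025, momentum=0.9)
# the weight optimizer now also holds alpha_normal and will update it
print(any(p is model.alpha_normal
          for g in w_optim.param_groups for p in g['params']))  # prints True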

@yangsenius yangsenius changed the title alpha and beta should not be optimized in update weight alpha should not be optimized in update weight Mar 26, 2019
@yangsenius yangsenius changed the title alpha should not be optimized in update weight alpha should not be optimized in updating weight Mar 26, 2019
@dragen1860
Copy link
Owner

dragen1860 commented Apr 1, 2019

@yangsenius you reminded me! Thank you.
Have you tried 👍

self.alpha_normal = torch.randn(k, num_ops)
self.alpha_reduce = torch.randn(k, num_ops)

What is the performance when you update the code with the statements above?
Please tell me if you re-run the experiment.

@yangsenius
Author

self.alpha_normal = torch.randn(k, num_ops)
self.alpha_reduce = torch.randn(k, num_ops)

will make self.alpha_normal and self.alpha_reduce plain torch.FloatTensor, which sometimes causes errors with model.cuda(); this is a little troublesome.
Maybe


self.alpha_normal = torch.randn(k, num_ops, dtype=self.your_conv.weight.dtype)
self.alpha_reduce = torch.randn(k, num_ops, dtype=self.your_conv.weight.dtype)

is OK?

Or just:

self.alpha_normal = nn.Parameter(torch.randn(k, num_ops))

def filter(model):
    # yield every parameter except the alphas
    for name, param in model.named_parameters():
        if 'alpha' in name:
            continue
        yield param

optimizer = torch.optim.Adam(filter(model))

What do you think? Is there a better code implementation for this issue?
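For comparison, if I remember the original DARTS code correctly, it avoids nn.Parameter for the alphas entirely: they are plain tensors with requires_grad=True, exposed through an arch_parameters() method, and optimized by a separate Adam optimizer. A rough sketch under that assumption (the class, names, and hyperparameters below are illustrative):

import torch
import torch.nn as nn

class Network(nn.Module):  # minimal stand-in for the search network
    def __init__(self, k=14, num_ops=8, device='cpu'):
        super().__init__()
        self.stem = nn.Conv2d(3, 16, 3, padding=1)
        # plain tensors never appear in model.parameters(), and are NOT moved by
        # model.cuda(), so they are created directly on the target device
        self.alpha_normal = (1e-3 * torch.randn(k, num_ops, device=device)).requires_grad_()
        self.alpha_reduce = (1e-3 * torch.randn(k, num_ops, device=device)).requires_grad_()
        self._arch_parameters = [self.alpha_normal, self.alpha_reduce]

    def arch_parameters(self):
        return self._arch_parameters

model = Network()
# one optimizer per group, stepped alternately as in DARTS
w_optim = torch.optim.SGD(model.parameters(), lr=0.025, momentum=0.9, weight_decay=3e-4)
a_optim = torch.optim.Adam(model.arch_parameters(), lr=3e-4, betas=(0.5, 0.999), weight_decay=1e-3)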

@dragen1860
Owner

dragen1860 commented Apr 2, 2019

Since we usually set the device to 'cuda:0',

self.alpha_reduce = torch.randn(k, num_ops, device=torch.device("cuda"))

would be an OK option.
Let's see if there are any problems.

@yangsenius

@zh583007354

Hi, I also noticed this problem yesterday. I think splitting the parameters into two groups may be a good choice. When training a ConvNet (e.g. MobileNet), we usually give the conv weights a weight decay of 5e-4 but no decay for the BN parameters, so we define something like
optimizer = SGD([{param group 1: conv, with decay}, {param group 2: BN, without decay}])

I think we can separate the alphas and weights in the same way (see the sketch below).

@dragen1860 @yangsenius
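A hedged sketch of that grouping idea (split_params, the tiny model, and all hyperparameters here are illustrative, not this repo's code):

import torch
import torch.nn as nn

def split_params(model):
    # BN parameters and biases are 1-D; conv/linear weights have >= 2 dims
    decay, no_decay = [], []
    for param in model.parameters():
        (no_decay if param.dim() == 1 else decay).append(param)
    return decay, no_decay

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16))
decay, no_decay = split_params(model)
optimizer = torch.optim.SGD(
    [{'params': decay, 'weight_decay': 5e-4},
     {'params': no_decay, 'weight_decay': 0.0}],
    lr=0.1, momentum=0.9)

The same idea can separate the alphas from the operation weights: filter on 'alpha' in the parameter name and give the alphas their own group (or their own optimizer).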

@yangsenius
Author

Yeah, you got it. @zh583007354

self.alpha_normal = nn.Parameter(torch.randn(k, num_ops))

def filter(model):
    # yield every parameter except the alphas
    for name, param in model.named_parameters():
        if 'alpha' in name:
            continue
        yield param

optimizer = torch.optim.Adam([
    {'params': filter(model)},                              # weights
    {'params': [model.alpha_normal, model.alpha_reduce]},   # alphas
])

@skx6

skx6 commented Jun 26, 2019

It takes one hour per epoch to search the architecture. However, the paper states: "a small network of 8 cells is trained using DARTS for 50 epochs. The search takes one day on a single GPU". If I train for 50 epochs, it will take more than two days.

@zh583007354

@yangsenius hi, I have another question.

I want to know whether the clip_grad_norm_() in train_search.py is necessary:

loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), args.grad_clip)
optimizer.step()

If it is necessary to clip the gradient, should it be applied only to the weight params or to all params?

Thank you.

@yangsenius
Author

yangsenius commented Jun 30, 2019

I think it is hard to say whether this clip_grad_norm_() is necessary. We cannot know the gradient range of the parameters in advance; clipping is done to avoid gradient explosions (just in case), even though they may never happen. So this code snippet may turn out to be unnecessary. @zh583007354
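If the clipping is kept, one option (a sketch continuing the snippet quoted above, and assuming the alphas are named 'alpha*' as in this repo) is to clip only the weight parameters so the architecture parameters are left untouched:

# collect everything except the architecture parameters
weight_params = [p for n, p in model.named_parameters() if 'alpha' not in n]

loss.backward()
nn.utils.clip_grad_norm_(weight_params, args.grad_clip)  # clip weights only
optimizer.step()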
