Detach in Lab3-2 & 3-3 #20

Open · pandasfang opened this issue Dec 7, 2017 · 4 comments

pandasfang commented Dec 7, 2017

Dear TA:

In Lab 3-2, why don't we need to detach the Discriminator when we backpropagate through the Generator?

############################
# (2) Update G network: maximize log(D(G(z)))
###########################
netG.zero_grad()
labelv = Variable(label.fill_(real_label))  # fake labels are real for generator cost
output = netD(fake)
errG = criterion(output, labelv)
errG.backward()
D_G_z2 = output.data.mean()
optimizerG.step()

hui-po-wang (Contributor) commented Dec 7, 2017

Hi @pandasfang,

In optimizerG = optim.Adam(netG.parameters(), lr=opt.lr, betas=(opt.beta1, 0.999)), we tell the optimizer that it only needs to update the parameters of the generator. That is, although netD will receive gradients, its parameters won't be updated by optimizerG.step(), so we don't have to detach it.
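
To make this concrete, here is a minimal sketch (fc1 and fc2 below are just hypothetical stand-ins for netG and netD, not the lab code): even though both modules receive gradients from backward(), only the module whose parameters were handed to the optimizer is changed by step().

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

fc1 = nn.Linear(1, 2)                          # plays the role of netG
fc2 = nn.Linear(2, 1)                          # plays the role of netD
opt2 = optim.Adam(fc2.parameters(), lr=1e-1)   # like optimizerD: only knows about fc2

x = Variable(torch.FloatTensor([5]))
cost = (fc2(fc1(x)) - x) ** 2

opt2.zero_grad()
cost.backward()                     # both fc1 and fc2 receive gradients
before = fc1.weight.data.clone()
opt2.step()                         # but only fc2's parameters are updated
print(torch.equal(before, fc1.weight.data))    # True: fc1 is untouched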

Now, you may have another question: why do we call detach in the line output = netD(fake.detach())? The answer is that calling detach there is not strictly necessary for correctness.

Consider the following example, which is a very simple auto-encoder.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

fc1 = nn.Linear(1, 2)
fc2 = nn.Linear(2, 1)
opt1 = optim.Adam(fc1.parameters(), lr=1e-1)
opt2 = optim.Adam(fc2.parameters(), lr=1e-1)

x = Variable(torch.FloatTensor([5]))

# First pass: no detach, so gradients flow back through both fc2 and fc1.
z = fc1(x)
x_p = fc2(z)
cost = (x_p - x) ** 2
# print(z)
# print(x_p)
# print(cost)
opt1.zero_grad()
opt2.zero_grad()

cost.backward()
for n, p in fc1.named_parameters():
    print(n, p.grad.data)

for n, p in fc2.named_parameters():
    print(n, p.grad.data)


# Second pass: z is detached before it is fed to fc2, so fc1 receives no gradient.
opt1.zero_grad()
opt2.zero_grad()

z = fc1(x)
x_p = fc2(z.detach())
cost = (x_p - x) ** 2

cost.backward()
for n, p in fc1.named_parameters():
    print(n, p.grad.data)

for n, p in fc2.named_parameters():
    print(n, p.grad.data)

The output would be:

weight 
 12.0559
 -8.3572
[torch.FloatTensor of size 2x1]

bias 
 2.4112
-1.6714
[torch.FloatTensor of size 2]

weight 
-33.5588 -19.4411
[torch.FloatTensor of size 1x2]

bias 
-9.9940
[torch.FloatTensor of size 1]

================================================

weight 
 0
 0
[torch.FloatTensor of size 2x1]

bias 
 0
 0
[torch.FloatTensor of size 2]

weight 
-33.5588 -19.4411
[torch.FloatTensor of size 1x2]

bias 
-9.9940
[torch.FloatTensor of size 1]

You can see that detaching the result from fc1 has no influence on the gradients of fc2. Once we know those gradients won't be affected, we can simply use optimizerD (which only updates the parameters of the discriminator) to update netD without worrying about the generator, even when we don't detach. However, not detaching the parts you don't need may incur some additional computational cost, because backward() will still compute gradients for the generator that are never used.
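
As a rough illustration of that last point (reusing the fc1/fc2 toy model above, just a sketch): detaching means backward() never has to traverse fc1's part of the graph at all, which is exactly the work you would otherwise compute and then throw away.

z = fc1(x)
print(z.grad_fn is not None)     # True: z carries the history of fc1's forward pass
z_d = z.detach()
print(z_d.grad_fn is None)       # True: the detached copy has no history, so a
                                 # backward() through fc2 never touches fc1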

Thanks

hui-po-wang (Contributor) commented Dec 7, 2017

I think it's a good question, and you guys can verify whether what I said is right (maybe I am wrong, since I am still learning too :) ).

If possible, please keep this thread open; I think it would be helpful for people who want to learn more about detach.

You are also very welcome to discuss it with me.

Thanks

yyrkoon27 commented Dec 8, 2017

Soumith's reply in this thread might also clarify things a little bit...
https://github.com/pytorch/examples/issues/116

hui-po-wang (Contributor) commented

Hi @yyrkoon27,

In this case, it's right. In a VAE-GAN, though, detach may be needed for correctness if you use, for example, opt1 = optim.RMSprop(G.parameters(), lr=1e-1) where G consists of an encoder and a decoder.
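
To sketch why (hypothetical code, not from any lab; enc and dec stand in for the two halves of G): when a single optimizer owns the parameters of both submodules, any gradient that reaches a submodule will move it on step(), so detach is the only way to keep a particular loss from training the part you want frozen.

enc = nn.Linear(1, 2)                  # stand-in for the encoder of G
dec = nn.Linear(2, 1)                  # stand-in for the decoder of G
opt1 = optim.RMSprop(list(enc.parameters()) + list(dec.parameters()), lr=1e-1)

x = Variable(torch.FloatTensor([5]))
h = enc(x)

# Suppose this loss is only meant to train dec. Because opt1 also owns enc's
# parameters, we must detach h; otherwise opt1.step() would update enc as well.
loss = (dec(h.detach()) - x) ** 2
opt1.zero_grad()
loss.backward()
opt1.step()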
