
Higher order derivative products? #1102

Closed

rdangovs opened this issue Mar 27, 2020 · 4 comments

@rdangovs

Suppose I have a loss function L(d, x) that takes data d and parameters x. Can I compute the following?

∇_x L(b, x - η ∇_x L(a, x))

I.e. can I differentiate through the gradient? Please find below a PyTorch example of how to achieve that for a = 1 and b = 2.

import torch

# define the loss as a function of data and param
def loss(data, param):
    return data * (param ** 3)

# inner and outer data
a = 1. 
b = 2.

# init the param and make a copy in `y`
x = torch.tensor([1.], requires_grad=True)
y = x

# finetune `y` for one step
# note that `create_graph=True` which allows graph of the derivative 
# to be computed, and thus allowing higher order derivative products
innerloss = loss(a, y)
grad = torch.autograd.grad(innerloss, y, create_graph=True)[0]
y = y - 1 * grad

# get gradients for the original param `x`
outerloss = loss(b, y)
print(torch.autograd.grad(outerloss, x)[0]) # result is -120

I wonder whether this behavior could be reproduced easily in Flux for a large class of loss functions, say starting from this one. Here is one unsuccessful attempt of mine.

η = 1.

function innerloss(a, x)
    sum(a .* x .^ 3)
end

function outerloss(a, b, x)
    innergs = gradient(params(x)) do 
        innerloss(a, x)
    end    
    adaptedx = x - η * innergs[x]
    innerloss(b, adaptedx)
end

a = [1.]; b = [2.]; x = [1.];

gs = gradient(params(x)) do
    outerloss(a, b, x)
end

print(gs[x], '\n') # [-552.0]

It seems to me that gradient does not differentiate through innergs properly here. I am afraid my understanding of gradient is currently too limited to make this work. Could you help me? Any advice is appreciated. Thanks!

@CarloLucibello (Member)

I modified outerloss from your example; is this the expected result?

η = 1.

function innerloss(a, x)
    sum(a .* x .^ 3)
end

function outerloss(a, b, x)
    g = gradient(x -> innerloss(a, x), x)[1]
    adaptedx = x - η * g
    innerloss(b, adaptedx)
end

a = [1.]; b = [2.]; x = [1.];

gs = gradient(params(x)) do
    outerloss(a, b, x)
end

println(gs[x]) # [-264.0]

@rdangovs (Author)

@CarloLucibello: thanks! I am afraid this is not exactly what we are looking for. Given the example above, the computation one would like to do is the full derivative

∇_x L(b, x - η ∇_x L(a, x))

including the dependence on x through the inner gradient (which gives -120 in the PyTorch example, not -264).

I guess another way to tackle this is to write the chain rule explicitly:

∇_x L(b, x - η ∇_x L(a, x)) = (I - η H_x L(a, x)) ∇L(b, y), evaluated at y = x - η ∇_x L(a, x)

So then my code would have to compute the Hessian explicitly; a rough sketch for the toy loss is below. It seems that this issue is similar to #129.
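
To make the chain rule concrete for the toy loss above, here is a small check with the gradient and Hessian written out by hand for this particular loss (so it does not solve the general problem, it only verifies the numbers):

# toy loss L(d, x) = d * x^3, so the gradient is 3d*x^2 and the Hessian is 6d*x
a, b, x, η = 1.0, 2.0, 1.0, 1.0

g  = 3a * x^2                 # inner gradient ∇_x L(a, x)
xadapted = x - η * g          # adapted parameter after one inner step
go = 3b * xadapted^2          # gradient of the outer loss at the adapted parameter
H  = 6a * x                   # Hessian of the inner loss at x (a scalar here)

println((1 - η * H) * go)     # -120.0, matching the PyTorch result above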

Any thoughts on how I can solve this elegantly? Thanks!

@lssimoes

@rdangovs would perhaps Zygote.hessian suit you?
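
For instance, something along the lines of the chain rule you wrote out (an untested sketch, reusing innerloss and the values from the snippets above, and assuming Zygote.hessian accepts the same closure as gradient here):

using Zygote, LinearAlgebra

innerloss(a, x) = sum(a .* x .^ 3)
a = [1.]; b = [2.]; x = [1.]; η = 1.

g  = Zygote.gradient(x -> innerloss(a, x), x)[1]     # inner gradient
H  = Zygote.hessian(x -> innerloss(a, x), x)         # 1×1 Hessian of the inner loss
go = Zygote.gradient(y -> innerloss(b, y), x .- η .* g)[1]

println((I - η .* H) * go)                           # [-120.0]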

@rdangovs (Author)

@lssimoes Thanks! Will give it a go!
