
meta_update with a single task and meta-loss calculated with current weight? #15

Closed
hwijeen opened this issue Jan 25, 2019 · 4 comments

hwijeen commented Jan 25, 2019

Hi Kate, thanks for the PyTorch implementation of MAML!

I have two questions (in which I suspect bugs) about your implementation.

[Screenshot of Algorithm 2 from the MAML paper]
Line 10 of Algorithm 2 in the original paper indicates that meta_update is performed using each D'_i. To do so with your code, I think the function meta_update needs access to every sampled task, since each task contains its own D'_i in your implementation.

self.meta_update(task, grads)

However, it seems that you perform meta_update with a single task, which uses only the D'_i of that one specific task.

Line 10 also states that the meta-loss is calculated with the adapted parameters.

loss, out = forward_pass(self.net, in_, target)

You seem to have calculated the meta-loss with self.net, which I think holds the original parameters (θ) instead of the adapted parameters (θ'_i).

Am I missing something?
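To make the concern concrete, here is a toy sketch of what line 10 calls for (plain Python with scalar parameters and a quadratic loss standing in for the network; all names are hypothetical and this is not the repo's code). The point is the summation structure: the meta-gradient sums a contribution from every sampled task, each evaluated on that task's D'_i with that task's adapted parameters.

```python
# Toy sketch of MAML's meta-update (Algorithm 2, line 10), NOT the repo's code.
# Parameters are scalars and the "network" is a quadratic loss, purely to make
# the per-task summation visible.

def loss_grad(theta, target):
    # gradient of the toy loss (theta - target)^2
    return 2.0 * (theta - target)

def maml_outer_step(theta, tasks, inner_lr=0.1, meta_lr=0.01):
    """One meta-update; each task is a (D_i, D_prime_i) pair of targets."""
    meta_grad = 0.0
    for d_train, d_val in tasks:
        # inner loop (lines 5-8): adapt on D_i
        theta_adapted = theta - inner_lr * loss_grad(theta, d_train)
        # line 10: evaluate on D'_i with the ADAPTED parameters, summed over
        # ALL sampled tasks (first-order approximation: the gradient through
        # the adaptation step itself is ignored here for brevity)
        meta_grad += loss_grad(theta_adapted, d_val)
    return theta - meta_lr * meta_grad
```

With two symmetric tasks the per-task meta-gradients cancel and theta is unchanged; dropping all but one task's D'_i would instead pull theta toward that single task, which is the behavior the question worries about.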

katerakelly (Owner) commented

Thanks for your interest in my repo!

  1. The gradients are accumulated across tasks in this line: https://github.com/katerakelly/pytorch-maml/blob/master/src/maml.py#L164
  2. The line you linked to is a bit of a hack that uses a hook to replace the grad fields with the grads from the adapted parameters. See here for actual computation of meta-gradients: https://github.com/katerakelly/pytorch-maml/blob/master/src/inner_loop.py#L47
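The hook trick being described can be emulated in a few lines of plain Python (a toy stand-in with hypothetical names, not the repo's code; in PyTorch the analogue is tensor.register_hook, where a hook that returns a value replaces the gradient that lands in .grad):

```python
# Toy emulation of the gradient-replacement hook (hypothetical names; the
# PyTorch analogue is tensor.register_hook, whose non-None return value
# replaces the gradient).

class Param:
    def __init__(self, value):
        self.value = value
        self.grad = None
        self._hooks = []

    def register_hook(self, fn):
        self._hooks.append(fn)

    def backward(self, grad):
        # run hooks in order; a non-None return value replaces the gradient
        for fn in self._hooks:
            out = fn(grad)
            if out is not None:
                grad = out
        self.grad = grad

def set_meta_grad(param, meta_grad):
    # hook that discards whatever backward() produced and substitutes the
    # meta-gradient accumulated across tasks
    param.register_hook(lambda g: meta_grad)
```

After the hook fires, an ordinary optimizer step reads param.grad and therefore applies the accumulated meta-gradient, even though the dummy forward/backward pass that triggered it ran on the original parameters.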


hwijeen commented Jan 28, 2019

Thank you for the quick reply!

You've referenced https://github.com/katerakelly/pytorch-maml/blob/master/src/maml.py#L164, in which you accumulate gradients across tasks (using data D). This is related to lines 6-7 in Algorithm 2 of the original paper.
However, my question was about using the D' of each task to perform the meta-update, which is line 10!
I think your implementation uses one single D', instead of each task's D'_i, when calculating the meta-loss across tasks.

katerakelly (Owner) commented

The line I referenced is accumulating meta-gradients.
In Algorithm 2 in the MAML paper, lines 5-8 are implemented in inner_loop.py, and the gradients used in line 10 are computed there also. The actual update is applied in maml.py after these gradients have been accumulated across tasks.
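A minimal sketch of that division of labor, with hypothetical names and gradients represented as plain dicts: each task's meta-gradient (computed on its own D'_i with the adapted parameters, as in inner_loop.py) is summed across tasks, and a single update is then applied to the accumulated result (as in maml.py).

```python
# Sketch of the accumulate-then-update split (hypothetical names, gradients
# as plain {param_name: value} dicts, NOT the repo's code).

def accumulate_meta_grads(per_task_grads):
    """Sum per-parameter gradient dicts across tasks."""
    total = {}
    for grads in per_task_grads:
        for name, g in grads.items():
            total[name] = total.get(name, 0.0) + g
    return total

def apply_meta_update(params, meta_grads, meta_lr=0.01):
    # one gradient step on the meta-gradient accumulated across ALL tasks
    return {name: v - meta_lr * meta_grads.get(name, 0.0)
            for name, v in params.items()}
```

So even though meta_update is called with one task's data, the gradients it applies were already summed over every sampled task's D'_i.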


katerakelly commented Feb 3, 2019 via email
