When training the high-level policy, is it a bug to use the fixed observation (the first one) while iterating in time? #6

minuk302 · 2024-04-09T22:46:08Z

Hi,

When training the high-level policy in skimo_agent.py, z_next_pred is initialized from the first observation (line 616) and is never updated after that.
Judging from the comment and the paper, it seems there should be a call to hl_agent.model.imagine_step that advances z_next_pred to the next imagined step. However, there is no such call.
Is this a bug, or am I missing something?
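
To make the question concrete, here is a minimal toy sketch of the two behaviors I am comparing. It is not the SkiMo code: `policy`, `imagine_step`, and the scalar dynamics are hypothetical stand-ins, used only to show the difference between a latent state that stays fixed across the loop and one that is advanced at every step.

```python
# Toy stand-ins, NOT the SkiMo implementation: every function here is hypothetical.

def policy(z):
    """Hypothetical high-level policy: maps a latent state to a skill."""
    return 0.5 * z

def imagine_step(z, skill):
    """Hypothetical latent dynamics: predicts the next latent state."""
    return z + skill

def rollout_fixed(z0, horizon):
    """The behavior I think I see: z is never updated inside the loop."""
    z = z0
    visited = []
    for _ in range(horizon):
        skill = policy(z)   # always evaluated on the first state
        visited.append(z)
    return visited

def rollout_imagined(z0, horizon):
    """The behavior I expected: z is advanced by the dynamics each step."""
    z = z0
    visited = []
    for _ in range(horizon):
        skill = policy(z)
        visited.append(z)
        z = imagine_step(z, skill)  # move on to the next imagined state
    return visited

fixed = rollout_fixed(1.0, 3)
imagined = rollout_imagined(1.0, 3)
```

In the first rollout every step sees the same state, so iterating over time adds nothing; in the second, each step is evaluated on a new imagined state.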

Also, the code seems to use the 'encoded ground-truth state' for the task policy when calculating the skill_prior_loss, but the paper (Eq. 7) uses the imagined state to calculate it. I would like to understand the reasoning behind this: why use the imagined step for the actor loss but the ground-truth state for the prior loss?

Thank you!

