How to deal with an impure reset function #312

Theo-Cheynel · 2023-01-26T13:02:11Z


Hi,

I want to train an agent to imitate reference motions from a motion capture dataset. I wrote a custom environment, which works well when the reference motion capture clip is "hardcoded" (that is, when it is the same across all environments).
However, I would like the reference clip to vary across envs. In other words, I want env.reset to pick a random motion clip (at the moment, motion clips are obtained from another class's __getitem__ method).

The thing is, that would make the reset function impure, because its outputs would differ every time, and JAX jitting only supports pure functions. Is there a workaround you can think of?

At the very least, is it possible to tell ppo.train not to jit the reset function, while still jitting the step function?

Thanks for your help

btaba · 2023-01-26T19:47:45Z

Maintainer

Hi @Theo-Cheynel, are you able to do something like this in env.reset?

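import jax
import jax.numpy as jp  # assumption: jp aliases jax.numpy

mocap = jp.ones((10, 3)) * jp.arange(0, 10)[:, None]  # load this once, outside reset
rng = jax.random.PRNGKey(0)

def reset(rng):
  # All randomness comes from the explicit key, so reset stays pure and jittable.
  rng, key = jax.random.split(rng, 2)
  return rng, jax.random.choice(key, mocap, (1,))  # one random row of mocap

rng, val = jax.jit(reset)(rng)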
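
The snippet above can be extended to a batch of environments. What follows is only a minimal sketch, not Brax's actual reset code: it assumes the clips are stacked into a single array of shape (num_clips, num_frames, dim), and `NUM_CLIPS`, `NUM_FRAMES`, `DIM` and `num_envs` are hypothetical values chosen for illustration. Each environment gets its own PRNG key, and `jax.vmap` maps over the keys only, so the closed-over `mocap` array is not given a batch axis.

```python
import jax
import jax.numpy as jp

NUM_CLIPS, NUM_FRAMES, DIM = 10, 4, 3  # hypothetical sizes

# Hypothetical stand-in for the mocap dataset, built once: clip i is filled with the value i.
mocap = jp.ones((NUM_CLIPS, NUM_FRAMES, DIM)) * jp.arange(NUM_CLIPS)[:, None, None]

def reset(rng):
    """Pure reset: the clip choice depends only on the key passed in."""
    rng, key = jax.random.split(rng)
    clip_idx = jax.random.randint(key, (), 0, NUM_CLIPS)
    return rng, clip_idx, mocap[clip_idx]

num_envs = 128
keys = jax.random.split(jax.random.PRNGKey(0), num_envs)

# vmap batches over the keys; mocap is captured by closure and stays unbatched.
rngs, clip_idx, clips = jax.jit(jax.vmap(reset))(keys)
print(clips.shape)  # (128, 4, 3)
```

Only the selected clip is materialized per environment here; the dataset itself is referenced as a single constant inside the jitted function.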

3 replies

btaba Feb 10, 2023
Maintainer

transferred to discussion due to inactivity

Theo-Cheynel Mar 1, 2023
Author

Thanks a lot for answering! Sorry it took so long, I had to fix several problems with my Brax environment (one of them was that I stored data in state.info, updating it at each step, and didn't notice that the AutoResetWrapper wasn't replacing the info, only the qp and obs. In hindsight it made sense, but boy was that tough to debug!).

So now I'm finally facing the issue that I raised in this discussion 😅 So far it's simply overfitting on a single motion sequence, since I haven't dealt with the impure function yet. Here's a visual from the very first functional training! I'm hyped 🚀

[Video attachment: motion.imitation.mp4]

If I do as you proposed, won't that mean that the entire mocap dataset is stored by JAX on the GPU once per environment running in parallel? I'll try it, but I feel like if I have 100 MB of mocap data and 128 environments, it will load 100 MB × 128 = 12.8 GB of memory. Am I right in assuming so? And if so, do you know of a way for JAX to share data across the parallel envs so that it isn't replicated?

Thank you so much ❤️

btaba Mar 31, 2023
Maintainer

Awesome demo! I don't think the full dataset would be replicated per environment, but if memory becomes an issue, you can always use a host callback to reset states (it doesn't all need to happen in one jitted call).
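
The reply does not name a specific callback API, so the following is only one possible reading of the host-callback idea, sketched with `jax.pure_callback`. It assumes the dataset stays in host (NumPy) memory and that clips are fetched by index; `host_mocap`, `fetch_clips`, `reset_batch` and the sizes are hypothetical names and values. The clip indices are still drawn from an explicit PRNG key, so the callback is a deterministic lookup and the jitted function remains pure.

```python
import jax
import jax.numpy as jp
import numpy as np

NUM_CLIPS, NUM_FRAMES, DIM = 10, 4, 3  # hypothetical sizes
NUM_ENVS = 8                           # hypothetical number of parallel envs

# Hypothetical dataset kept in host memory; clip i is filled with the value i.
host_mocap = (np.ones((NUM_CLIPS, NUM_FRAMES, DIM), np.float32)
              * np.arange(NUM_CLIPS, dtype=np.float32)[:, None, None])

def fetch_clips(idx):
    """Runs on the host: look up the requested clips by index."""
    return host_mocap[np.asarray(idx)]

@jax.jit
def reset_batch(key):
    # One clip index per environment, drawn from the explicit key.
    idx = jax.random.randint(key, (NUM_ENVS,), 0, NUM_CLIPS)
    # Shape and dtype of the callback result must be declared up front.
    out_type = jax.ShapeDtypeStruct((NUM_ENVS, NUM_FRAMES, DIM), jp.float32)
    clips = jax.pure_callback(fetch_clips, out_type, idx)
    return idx, clips

idx, clips = reset_batch(jax.random.PRNGKey(0))
print(clips.shape)  # (8, 4, 3)
```

With this layout only the selected clips are transferred to the device on each reset, at the cost of a host round trip, so it seems worth considering only if the in-memory approach actually runs out of memory.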
