Apparent memory leak in PPO training #55
Here's the error:
Actually, I'm not so sure this is a memory error now. I've run it with varying numbers of steps per episode (which effectively varies the overall number of episodes) and it always crashes after 3 RLlib iterations. I'm going to try re-running on the medium mesh as well to see whether the size of the problem has any impact.
Same behavior on the medium-resolution mesh, and almost the same error message appears using the SpinningUp PPO implementation:
So I guess this is not a Ray issue after all. Just for reference, the error appears after 125 epochs of 100 steps each using the medium mesh. Also, it always pops up on line 174, which is here:
I'm going to try a few things:
The control updates definitely are different between the original version (here in ...) and the current one (in ...). If I remember correctly, the former may not have actually been updating the controls, but possibly it didn't crash?
Looks like it was actually much simpler: after some more tracking with the SpinningUp implementation, it seems the issue was just that the solver was eventually diverging. Decreasing the time step seems to have resolved the problem; here's the ...
Sorry, I have a deadline today so I can't comment in depth right now. But this is a known problem, and there are a number of potential approaches to remedy it. It would probably be prudent to implement several of them to protect the user from diverging simulator trajectories. I'll post the references in here once I've managed to get out of the abyss of deadline hell.
Yeah, that would be great! Ideally we should be able to set things up so that if the regular simulation is stable with respect to the CFL condition, then the RL training also won't diverge. I'll leave this open to track.
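As a rough illustration of the kind of safeguard discussed above, here is a minimal sketch of a divergence guard around a gym-style step(); the threshold, penalty, and the 4-tuple step API are assumptions for illustration, not the project's actual interface:

```python
import numpy as np

# Hypothetical values; a real guard would tie these to the physics/CFL limits.
DIVERGENCE_THRESHOLD = 1e6   # cap on the observation norm
DIVERGENCE_PENALTY = -100.0  # reward assigned when an episode is cut short


def guarded_step(env, action):
    """Wrap env.step() and end the episode early if the solver diverges."""
    obs, reward, done, info = env.step(action)  # assumes a gym-style 4-tuple
    diverged = (not np.all(np.isfinite(obs))
                or np.linalg.norm(obs) > DIVERGENCE_THRESHOLD)
    if diverged:
        # Hand the RL library a finite observation and a penalty instead of
        # NaNs/inf, and terminate the episode so training can continue.
        obs = np.zeros_like(obs)
        reward = DIVERGENCE_PENALTY
        done = True
        info["solver_diverged"] = True
    return obs, reward, done, info
```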
Actually, after apparently training successfully once with RLlib, I'm still getting this error (on the main branch):
This doesn't actually seem related to Ray or RL training at all; I can reproduce it just by running the PD control example, which should hopefully make it a bit easier to zero in on.
Alright, this should be fixed now. As best I can tell it was actually a discrepancy between floating-point types? For some reason I was using ... Tested with ...
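The actual dtype fix isn't shown in the thread, but the general pattern is to pin one floating-point type at the boundary between the RL code and the solver; a hypothetical example:

```python
import numpy as np


def apply_control(solver_step, action):
    """Cast the agent's action to float64 before handing it to the solver.

    RL libraries typically emit float32 actions, while a PDE solver usually
    works in float64; mixing the two silently is the kind of discrepancy
    described above. (solver_step is a placeholder, not this repo's API.)
    """
    action = np.asarray(action, dtype=np.float64)
    return solver_step(action)
```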
When running a very simple (serial) PPO training with the ppo_train.py script, the training runs successfully for 3 iterations and then crashes (I'll post the error message later).
I'm not sure whether this is an issue on the Firedrake side or the Ray side. I've run into memory-leak-type behavior with Firedrake before, but there are also a couple of documented instances of this kind of thing with Ray:
Debugging ideas:
- env.reset()
- ray.rllib.algorithms.callbacks.MemoryTrackingCallbacks to track memory usage in Tensorboard (see the sketch below)
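A sketch of how the MemoryTrackingCallbacks idea could be wired up with the Ray 2.x config API; the environment name and iteration count are placeholders (not the Firedrake env), and the callback may require psutil to be installed:

```python
from ray.rllib.algorithms.callbacks import MemoryTrackingCallbacks
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")          # placeholder env, not the Firedrake one
    .callbacks(MemoryTrackingCallbacks)  # records per-worker memory stats
)
algo = config.build()

for i in range(3):
    algo.train()
    # The memory metrics land in the result dict / TensorBoard logs under
    # the trial's logdir (~/ray_results/... by default).
    print(f"finished training iteration {i + 1}")
```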