Training signal from environments in RLlib #49

Closed
jcallaham opened this issue Sep 3, 2022 · 2 comments · Fixed by #57
Assignees
Labels
bug Something isn't working priority High-priority core feature

Comments

@jcallaham
Collaborator

jcallaham commented Sep 3, 2022

I'm training a basic PPO agent with RLlib using the script here, and this is what I'm getting for the "episode reward mean". It doesn't seem to be learning anything, although I haven't actually run the learned model forward yet.

image

@jcallaham jcallaham added bug Something isn't working priority High-priority core feature labels Sep 3, 2022
@jcallaham jcallaham self-assigned this Sep 3, 2022
@jcallaham
Collaborator Author

Seems like the problem might be that the flow config and solver both need to be able to reset with the environment.
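
A minimal sketch of what resetting both pieces with the environment could look like. Everything here is an assumption for illustration: `ToyFlow`, `ToySolver`, `ToyEnv`, and `run_episode` are hypothetical stand-ins, not this project's actual classes, and the dynamics are a toy scalar ODE rather than a real flow. The only point is that `reset()` on the environment restores both the flow state and the solver's internal state, so each episode starts from the same initial condition.

```python
class ToyFlow:
    """Hypothetical stand-in for a flow configuration that owns the state."""

    def __init__(self, initial_state=1.0):
        self.initial_state = initial_state
        self.state = initial_state

    def reset(self):
        # Restore the state to the stored initial condition.
        self.state = self.initial_state


class ToySolver:
    """Hypothetical time stepper that keeps its own internal clock."""

    def __init__(self, flow, dt=0.1):
        self.flow = flow
        self.dt = dt
        self.t = 0.0

    def step(self, control):
        # Explicit Euler step of the toy dynamics dx/dt = -x + control.
        self.flow.state += self.dt * (-self.flow.state + control)
        self.t += self.dt
        return self.flow.state

    def reset(self):
        # Clear solver-side state so it does not leak across episodes.
        self.t = 0.0


class ToyEnv:
    """Gym-style environment (assumed interface: reset() and step())."""

    def __init__(self, max_steps=5):
        self.flow = ToyFlow()
        self.solver = ToySolver(self.flow)
        self.max_steps = max_steps
        self.iter = 0

    def reset(self):
        # Reset BOTH the flow and the solver, not just the step counter.
        self.flow.reset()
        self.solver.reset()
        self.iter = 0
        return self.flow.state

    def step(self, action):
        obs = self.solver.step(action)
        self.iter += 1
        reward = -obs ** 2  # toy objective: drive the state toward zero
        done = self.iter >= self.max_steps
        return obs, reward, done, {}


def run_episode(env, action):
    """Roll out one episode with a constant action and return the total reward."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step(action)
        total += reward
    return total


if __name__ == "__main__":
    env = ToyEnv()
    # With a full reset, repeated episodes under the same action give the same return.
    print(run_episode(env, 0.5), run_episode(env, 0.5))
```

If `reset()` only cleared the step counter, the second episode in this sketch would start from wherever the first one ended, and the returns would no longer be comparable between episodes.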

@jcallaham
Collaborator Author

Now the training signal is changing (though performance doesn't look great).

image
