Training signal from environments in RLlib #49

jcallaham · 2022-09-03T10:52:51Z

Training a basic PPO agent using RLlib and the script here, this is what I'm getting for the "episode reward mean". It doesn't really seem to be learning anything, although I haven't actually run the learned model forward yet.

jcallaham · 2022-09-03T14:41:57Z

Seems like the problem might be that the flow config and solver both need to be able to reset with the environment.
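
To illustrate the idea, here is a minimal sketch in plain Python. It is not the project's actual code: `ToyFlow`, `ToySolver`, `ToyEnv`, and `run_episode` are hypothetical stand-ins, and the dynamics are a made-up scalar ODE. It only shows the pattern where the environment's `reset()` also resets the flow state and the solver, so that every episode starts from the same initial condition.

```python
class ToyFlow:
    """Hypothetical stand-in for a flow configuration that owns the state."""

    def __init__(self, initial_state=1.0):
        self.initial_state = initial_state
        self.state = initial_state

    def reset(self):
        # Restore the initial condition
        self.state = self.initial_state


class ToySolver:
    """Hypothetical stand-in for a time stepper with its own internal clock."""

    def __init__(self, flow, dt=0.1):
        self.flow = flow
        self.dt = dt
        self.t = 0.0

    def reset(self):
        # Rewind the solver clock
        self.t = 0.0

    def step(self, action):
        # Forward Euler step of the toy dynamics dx/dt = x + action
        self.flow.state += self.dt * (self.flow.state + action)
        self.t += self.dt
        return self.flow.state


class ToyEnv:
    """Gym-style environment (assumed interface: reset() and step(action))."""

    def __init__(self, max_steps=10):
        self.flow = ToyFlow()
        self.solver = ToySolver(self.flow)
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        # Reset the flow and the solver together with the episode counter
        self.flow.reset()
        self.solver.reset()
        self.steps = 0
        return self.flow.state

    def step(self, action):
        state = self.solver.step(action)
        self.steps += 1
        reward = -abs(state)  # toy objective: keep the state near zero
        done = self.steps >= self.max_steps
        return state, reward, done, {}


def run_episode(env, action):
    """Roll out one episode with a constant action and return the total reward."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done, _ = env.step(action)
        total += reward
    return total
```

If `reset()` skipped the flow and solver, each episode would continue from wherever the previous one ended, so the episode return would depend on rollout history rather than on the policy alone.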

jcallaham · 2022-09-30T09:03:39Z

Now the signal is changing (though performance doesn't look great).

jcallaham added the bug (Something isn't working) and priority (High-priority core feature) labels Sep 3, 2022

jcallaham self-assigned this Sep 3, 2022

jcallaham mentioned this issue Sep 27, 2022: Debugging PPO training #57 (Merged)

jcallaham closed this as completed in #57 Sep 30, 2022
