feature(nyz): add new middleware distributed demo #321
Conversation
Codecov Report
@@ Coverage Diff @@
## main #321 +/- ##
==========================================
- Coverage 85.39% 84.79% -0.60%
==========================================
Files 532 556 +24
Lines 43943 44718 +775
==========================================
+ Hits 37523 37919 +396
- Misses 6420 6799 +379
Flags with carried forward coverage won't be shown.
* Add more desc (ci skip)
* Add timeout on model exchanger
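A timeout on a model exchanger might look like the sketch below. This is a minimal illustration of the idea (a worker fetches the newest model but gives up after a deadline rather than blocking forever); the class and method names here are hypothetical, not DI-engine's actual API.

```python
import queue
from typing import Optional


class ModelExchanger:
    """Sketch: publish/fetch the latest model state dict with a fetch timeout.

    Hypothetical helper, not DI-engine's real ModelExchanger.
    """

    def __init__(self, timeout: float = 5.0):
        self._timeout = timeout
        # maxsize=1: we only ever care about the newest model.
        self._queue: queue.Queue = queue.Queue(maxsize=1)

    def publish(self, state_dict: dict) -> None:
        # Drop a stale model if one is still waiting, then store the new one.
        try:
            self._queue.get_nowait()
        except queue.Empty:
            pass
        self._queue.put(state_dict)

    def fetch(self) -> Optional[dict]:
        # Return None on timeout so the caller can keep collecting
        # with its current (slightly stale) model instead of stalling.
        try:
            return self._queue.get(timeout=self._timeout)
        except queue.Empty:
            return None
```

The timeout matters in a distributed setup: if the learner crashes or a network hop stalls, collectors degrade to using a stale model rather than hanging the whole pipeline.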
What is the throughput of this? Does this beat SampleFactory? @PaParaZz1 @sailxjx
@zxzzz0 This is not meant to compete with Sample Factory on speed. The bottleneck of RL training may appear in any of collecting, training, or evaluation. For example, collecting too fast can create too large a generation gap and underfit the model, and because of the GIL, deserializing data in the training process also slows down overall training efficiency. There are many such points we need to consider in this project.
No. To clarify, we only care about overall performance, i.e., the time it takes to reach a certain reward in the end. Usually, if you can squeeze every drop of performance out of the CPU/GPU, you learn faster. Environment-side collecting is just one indicator; you also have to watch learner FPS, GPU utilization, and other indicators to understand the throughput of the whole system. Benchmarking should target not the collector side but the overall growth speed of the reward.
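Indicators like collector or learner FPS boil down to frames processed over wall-clock time. A minimal sketch of such a meter (hypothetical helper, not part of any specific framework) could be:

```python
import time


class ThroughputMeter:
    """Sketch: track frames-per-second for a collector or learner loop."""

    def __init__(self):
        self._start = time.perf_counter()
        self._frames = 0

    def add(self, n: int) -> None:
        # Call once per batch/step with the number of env frames processed.
        self._frames += n

    def fps(self) -> float:
        elapsed = time.perf_counter() - self._start
        return self._frames / elapsed if elapsed > 0 else 0.0
```

Running one meter per component (collector, learner) alongside GPU utilization and the reward curve gives the overall throughput picture described above.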
Yeah, that's right. The purpose of the distributed version is to maximize overall performance while not requiring too much effort to write code for multiple tasks. Another consideration is that we need to go design-first: only after the upper-layer interface is unified and stable can we gradually optimize every aspect of performance without disturbing the user. You can see that from version 0.x to version 1.0 we have gradually developed a definite interface style, and the purpose of this branch is to extend that interface style to distributed operation.
Sounds good. In the future, please benchmark different designs/interfaces so that you are confident enough to say you've chosen the design with the best overall performance. If you don't benchmark (as I did before for DI-engine) and you discover, after the design is frozen in version 1.0, that you could have improved performance, you can't change it without a major version update.
Force-pushed from 6dfebeb to 813580f
…ader (#425)
* Add singleton log writer
* Use get_instance on writer
* feature(nyz): polish atari ddp demo and add dist demo
* Refactor dist version
* Wrap class based middleware
* Change if condition in wrapper
* Only run enhancer on learner
* Support new parallel mode on slurm cluster
* Temp data loader
* Stash commit
* Init data serializer
* Update dump part of code
* Test StorageLoader
* Turn data serializer into storage loader, add storage loader in context exchanger
* Add local id and startup interval
* Fix storage loader
* Support treetensor
* Add role on event name in context exchanger, use share_memory function on tensor
* Double size buffer
* Copy tensor to cpu, skip wait for context on collector and evaluator
* Remove data loader middleware
* Upgrade k8s parser
* Add epoch timer
* Dont use lb
* Change tensor to numpy
* Remove files when stop storage loader
* Discard shared object
* Ensure correct load shm memory
* Add model loader
* Rename model_exchanger to ModelExchanger
* Add model loader benchmark
* Shutdown loaders when task finish
* Upgrade supervisor
* Dont cleanup files when shutting down
* Fix async cleanup in model loader
* Check model loader on dqn
* Dont use loader in dqn example
* Fix style check
* Fix dp
* Fix github tests
* Skip github ci
* Fix bug in event loop
* Fix enhancer tests, move router from start to __init__
* Change default ttl
* Add comments

Co-authored-by: niuyazhe <[email protected]>
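Several commits above move tensors through shared memory so that only a small handle, not the data itself, crosses the pickled message channel between processes. A minimal sketch of that pattern with the standard-library `multiprocessing.shared_memory` and NumPy (the function names here are illustrative, not the storage loader's real API):

```python
import numpy as np
from multiprocessing import shared_memory


def put_array(arr: np.ndarray):
    """Copy an array into a shared-memory block; return a small handle.

    The (name, shape, dtype) tuple is cheap to send to another process;
    the SharedMemory object is returned so the caller can keep it alive
    and unlink it when done.
    """
    shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
    view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)
    view[:] = arr[:]  # one copy into shared memory
    return (shm.name, arr.shape, str(arr.dtype)), shm


def get_array(name: str, shape, dtype: str) -> np.ndarray:
    """Reconstruct the array in another process from the handle."""
    shm = shared_memory.SharedMemory(name=name)
    # Copy out so the block can be closed immediately afterwards.
    arr = np.ndarray(shape, dtype=np.dtype(dtype), buffer=shm.buf).copy()
    shm.close()
    return arr
```

This also explains the "Change tensor to numpy" and "Copy tensor to cpu" commits: CPU-side NumPy buffers are what map cleanly onto a shared-memory block, whereas GPU tensors must be copied to the CPU first.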
Description
Related Issue
#102
#176
TODO
Check List