Single-node distributed processing with Hydra #42

Open · 4 tasks
briankosw opened this issue Dec 5, 2020 · 4 comments · May be fixed by #43

briankosw commented Dec 5, 2020

Distributed processing with Hydra in a single-node multi-GPU setting, as mentioned here.

  • Explain PyTorch's distributed processing/training.
  • Simple demonstration of various distributed communication primitives.
  • Incorporate Hydra into PyTorch's distributed processing.
  • Using multirun to run multiple processes.

This will serve as an introductory example for #38; a rough sketch of what such an example could look like is below.
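A minimal, purely illustrative version (not code from the linked PR; the `DDPConfig` fields, module layout, and gloo backend are assumptions) could treat each process as one Hydra job that joins a process group and runs a single `all_reduce` as the communication primitive, with Hydra supplying `rank` and `world_size`:

```python
# Hypothetical sketch: one Hydra job per process/rank, joined into a single
# torch.distributed process group, exchanging data with all_reduce.
import os
from dataclasses import dataclass

import hydra
import torch
import torch.distributed as dist
from hydra.core.config_store import ConfigStore


@dataclass
class DDPConfig:
    rank: int = 0              # index of this process within the group
    world_size: int = 2        # total number of processes
    backend: str = "gloo"      # CPU-friendly backend; use "nccl" for GPUs
    master_addr: str = "127.0.0.1"
    master_port: str = "29500"


cs = ConfigStore.instance()
cs.store(name="config", node=DDPConfig)


@hydra.main(config_name="config")
def main(cfg: DDPConfig) -> None:
    # Rendezvous information for the default env:// init method.
    os.environ["MASTER_ADDR"] = cfg.master_addr
    os.environ["MASTER_PORT"] = cfg.master_port
    dist.init_process_group(cfg.backend, rank=cfg.rank, world_size=cfg.world_size)

    # Simplest communication primitive: every rank contributes its rank id
    # and all_reduce sums the contributions across the group.
    tensor = torch.tensor([float(cfg.rank)])
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
    print(f"rank {cfg.rank}: sum of ranks = {tensor.item()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

With multirun, each job in the sweep would become one rank, e.g. `python my_app.py --multirun rank=0,1`. Note that Hydra's default launcher runs sweep jobs sequentially, so the ranks would never rendezvous; a parallel launcher such as the joblib launcher plugin (`hydra/launcher=joblib`, assuming it is installed) would be needed, or the two processes can simply be started from separate terminals.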

briankosw (Author) commented

@romesco would love your feedback on this!

romesco (Contributor) commented Dec 5, 2020

Sounds great! What do you think about using the MNIST example as a base? Or did you have something even simpler in mind?

I want to make sure we don't overcomplicate things on this one. As an example, I would say we can start without using the configs directly, since they're somewhat orthogonal to demonstrating how Hydra and DDP interact. If you make a draft PR, I'll run everything and provide feedback of course =].

omry (Contributor) commented Dec 5, 2020

I think the idea here is to not actually train but just demonstrate basic primitives.

briankosw (Author) commented

> Sounds great! What do you think about using the MNIST example as a base? Or did you have something even simpler in mind?

If you check this PR out, you'll see a basic distributed processing setup using Hydra and distributed communication primitives between multiple processes. This is basically as simple as it gets and much simpler than MNIST.

> I want to make sure we don't overcomplicate things on this one. As an example, I would say we can start without using the configs directly, since they're somewhat orthogonal to demonstrating how Hydra and DDP interact. If you make a draft PR, I'll run everything and provide feedback of course =].

So this PR/example will be about how Hydra helps set up distributed processes without using configs? Should the configs aspect be implemented in the other PR?

> I think the idea here is to not actually train but just demonstrate basic primitives.

In that case, I will only demonstrate how Hydra can be used to set up distributed processing.
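For instance (purely illustrative, not necessarily how the PR does it; the `worker` function and the gloo backend are assumptions), a single Hydra entry point could spawn one worker per rank on the node and do nothing beyond setup and a barrier:

```python
# Hypothetical sketch: one Hydra run spawns all workers on the node with
# torch.multiprocessing; no training, only process-group setup.
import os

import hydra
import torch.distributed as dist
import torch.multiprocessing as mp
from omegaconf import DictConfig


def worker(rank: int, world_size: int) -> None:
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    dist.barrier()  # all ranks reach this point together
    print(f"rank {rank}/{world_size} initialized")
    dist.destroy_process_group()


@hydra.main()
def main(cfg: DictConfig) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    world_size = cfg.get("world_size", 2)  # override with `+world_size=4`
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)


if __name__ == "__main__":
    main()
```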

briankosw linked a pull request Dec 6, 2020 that will close this issue
romesco assigned romesco and unassigned romesco Dec 17, 2020
romesco linked a pull request Dec 17, 2020 that will close this issue