Single-node distributed processing with Hydra #42

Open · 4 tasks
briankosw opened this issue Dec 5, 2020 · 4 comments · May be fixed by #43

briankosw commented Dec 5, 2020

Distributed processing with Hydra in a single-node multi-GPU setting, as mentioned here.

  • Explain PyTorch's distributed processing/training.
  • Simple demonstration of various distributed communication primitives.
  • Incorporate Hydra into PyTorch's distributed processing.
  • Using multirun to run multiple processes.

This will serve as an introductory example for #38; a rough sketch of what such an example could look like is below.
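A minimal, purely illustrative version (not code from the linked PR; the `DDPConfig` fields, module layout, and gloo backend are assumptions) could treat each process as one Hydra job that joins a process group and runs a single `all_reduce` as the communication primitive, with Hydra supplying `rank` and `world_size`:

```python
# Hypothetical sketch: one Hydra job per process/rank, joined into a single
# torch.distributed process group, exchanging data with all_reduce.
import os
from dataclasses import dataclass

import hydra
import torch
import torch.distributed as dist
from hydra.core.config_store import ConfigStore


@dataclass
class DDPConfig:
    rank: int = 0              # index of this process within the group
    world_size: int = 2        # total number of processes
    backend: str = "gloo"      # CPU-friendly backend; use "nccl" for GPUs
    master_addr: str = "127.0.0.1"
    master_port: str = "29500"


cs = ConfigStore.instance()
cs.store(name="config", node=DDPConfig)


@hydra.main(config_name="config")
def main(cfg: DDPConfig) -> None:
    # Rendezvous information for the default env:// init method.
    os.environ["MASTER_ADDR"] = cfg.master_addr
    os.environ["MASTER_PORT"] = cfg.master_port
    dist.init_process_group(cfg.backend, rank=cfg.rank, world_size=cfg.world_size)

    # Simplest communication primitive: every rank contributes its rank id
    # and all_reduce sums the contributions across the group.
    tensor = torch.tensor([float(cfg.rank)])
    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
    print(f"rank {cfg.rank}: sum of ranks = {tensor.item()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

With multirun, each job in the sweep would become one rank, e.g. `python my_app.py --multirun rank=0,1`. Note that Hydra's default launcher runs sweep jobs sequentially, so the ranks would never rendezvous; a parallel launcher such as the joblib launcher plugin (`hydra/launcher=joblib`, assuming it is installed) would be needed, or the two processes can simply be started from separate terminals.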

briankosw (Author) commented

@romesco would love your feedback on this!

romesco (Contributor) commented Dec 5, 2020

Sounds great! What do you think about using the MNIST example as a base? Or did you have something even simpler in mind?

I want to make sure we don't overcomplicate things on this one. As an example, I would say we can start without using the configs directly, since they're somewhat orthogonal to demonstrating how Hydra and DDP interact. If you make a draft PR, I'll run everything and provide feedback of course =].

omry (Contributor) commented Dec 5, 2020

I think the idea here is to not actually train but just demonstrate basic primitives.

briankosw (Author) commented

> Sounds great! What do you think about using the MNIST example as a base? Or did you have something even simpler in mind?

If you check this PR out, you'll see a basic distributed processing setup using Hydra and distributed communication primitives between multiple processes. This is basically as simple as it gets and much simpler than MNIST.

> I want to make sure we don't overcomplicate things on this one. As an example, I would say we can start without using the configs directly, since they're somewhat orthogonal to demonstrating how Hydra and DDP interact. If you make a draft PR, I'll run everything and provide feedback of course =].

So this PR/example will be about how Hydra helps set up distributed processes without using configs? Should the configs aspect be implemented in the other PR?

> I think the idea here is to not actually train but just demonstrate basic primitives.

In that case, I will only demonstrate how Hydra can be used to set up distributed processing.
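For instance (purely illustrative, not necessarily how the PR does it; the `worker` function and the gloo backend are assumptions), a single Hydra entry point could spawn one worker per rank on the node and do nothing beyond setup and a barrier:

```python
# Hypothetical sketch: one Hydra run spawns all workers on the node with
# torch.multiprocessing; no training, only process-group setup.
import os

import hydra
import torch.distributed as dist
import torch.multiprocessing as mp
from omegaconf import DictConfig


def worker(rank: int, world_size: int) -> None:
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    dist.barrier()  # all ranks reach this point together
    print(f"rank {rank}/{world_size} initialized")
    dist.destroy_process_group()


@hydra.main()
def main(cfg: DictConfig) -> None:
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    world_size = cfg.get("world_size", 2)  # override with `+world_size=4`
    mp.spawn(worker, args=(world_size,), nprocs=world_size, join=True)


if __name__ == "__main__":
    main()
```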

briankosw linked a pull request Dec 6, 2020 that will close this issue
romesco assigned romesco and unassigned romesco Dec 17, 2020
romesco linked a pull request Dec 17, 2020 that will close this issue