Updated the UI for TPU support #201
base: main
Conversation
👷 Deploy request for code-generator pending review. 🔨 Explore the source changes: 460772b
@ydcjeff can you help with that please? How to make the following:
@sayantan1410 first comment about the UI: basically we cannot and should not choose the backend if distributed training is not specified => "Choose a Backend" should be a subpart of "Distributed Training" and should be active only if distributed training is selected.
@vfdev-5 Okay, will change that!
@vfdev-5 I have made the changes as you suggested; now the "Choose a Backend" option is shown only when distributed training is selected.
Thanks for the update @sayantan1410
Now let's move on with the content update once a backend is selected.
@vfdev-5 can you please guide me on how to do that?
@sayantan1410 we are running a coding sprint right now (please check our Discord, #start-contributing channel). If you can join it, it could be a good opportunity to learn more about the projects and be guided.
@vfdev-5 Sorry for the delay; I have made the changes requested above. Can you please guide me on how to update the content once a backend is selected?
@sayantan1410 no worries and thanks for the update! For example, here are the content changes we would like to make for each template. Let me illustrate with the vision classification template. In the README we give some launch commands depending on the config, for example:

```
python -m torch.distributed.launch \
    --nproc_per_node #:::= nproc_per_node :::# \
    --nnodes #:::= it.nnodes :::# \
    --node_rank 0 \
    --master_addr #:::= it.master_addr :::# \
    --master_port #:::= it.master_port :::# \
    --use_env main.py \
    --backend nccl
```

We can ... and I think that's it for the GLOO backend... By the way, we need to replace ...
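To give an idea, here is a rough sketch (assuming main.py follows the usual ignite.distributed idiom; this is not the actual template content): with the xla-tpu backend no torch.distributed.launch command is needed, because idist.Parallel can spawn the TPU processes itself.

```python
# Rough sketch, not the generated template itself: ignite.distributed spawns
# the TPU processes directly, so there is no torch.distributed.launch step.
import ignite.distributed as idist


def training(local_rank, config):
    device = idist.device()  # resolves to an XLA device on TPU
    print(f"rank {idist.get_rank()} runs on {device}")


if __name__ == "__main__":
    config = {}
    # 8 processes, one per TPU core on a v2-8 / v3-8 device
    with idist.Parallel(backend="xla-tpu", nproc_per_node=8) as parallel:
        parallel.run(training, config)
```

In that case the README launch instruction could reduce to a plain `python main.py`.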
@vfdev-5 Okay, then I will make another draft PR with the updates you suggested, and we can take that forward. Also, should I close this PR or keep it as it is?
Sounds good for another draft PR, and let's keep this one open.
@sayantan1410 can you now sync this PR against main?
@vfdev-5 Yeah sure, doing it!
@vfdev-5 Added the XLA-TPU option to the backend dropdown.
Sounds good, so what's next? :)
@vfdev-5 I have a very stupid question: let's say someone selects XLA-TPU as the backend. Do we then have to check whether XLA is installed on the system and whether a TPU is available, and give them the template code accordingly?
We do not check the infrastructure in the code-generator app. For example, when nccl and distributed training with 1000 nodes and 10000 processes are specified, we just say in the README how to launch it, and that's it.
@vfdev-5 Okay, I will try it and keep you updated.
Description - Adding support for TPU
Fix #173
I have updated the UI but couldn't figure out the next step: if XLA-TPU is selected, training should be distributed across 8 processes.
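A tiny sketch of that rule (the helper name and config shape are made up for illustration; this is not the app's actual code):

```python
# Hypothetical helper, not the code-generator's real config logic: when the
# xla-tpu backend is selected, the process count is fixed at 8 (one per TPU
# core) instead of whatever the user requested.
def resolve_nproc_per_node(backend: str, requested: int) -> int:
    if backend == "xla-tpu":
        return 8
    return requested


assert resolve_nproc_per_node("xla-tpu", 1) == 8
assert resolve_nproc_per_node("nccl", 4) == 4
```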
Here's a screenshot of the UI