feat: add convenient selection of data parallelism #335

shpface · 2022-04-19T23:34:39Z

Issue #, if available:

Description of changes:
Adds a parameter that populates the CreateJob hyperparameters required for SageMaker data parallelism.
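A minimal sketch of the idea, not the code in this PR: when the option is set to "data_parallel", extra entries are merged into the job's hyperparameters before the job is created. The helper name `apply_distribution` and both hyperparameter keys are assumptions made for illustration.

```python
from typing import Dict, Optional


def apply_distribution(
    hyperparameters: Dict[str, str],
    distribution: Optional[str],
    instance_type: str,
) -> Dict[str, str]:
    """Return a copy of the hyperparameters with data parallel settings added.

    Hypothetical helper: the key names below are assumed, not taken from the PR.
    """
    result = dict(hyperparameters)  # copy so the caller's dict is not mutated
    if distribution == "data_parallel":
        result.update(
            {
                "sagemaker_distributed_dataparallel_enabled": "true",
                "sagemaker_instance_type": instance_type,
            }
        )
    return result


# User-supplied hyperparameters are preserved and the extra keys are appended.
params = apply_distribution({"epochs": "5"}, "data_parallel", "ml.p3.16xlarge")
```

Leaving `distribution` as `None` returns the hyperparameters unchanged, so existing callers are unaffected.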

Testing done:
Unit tests and integration tests.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

I have read the CONTRIBUTING doc
I used the commit message format described in CONTRIBUTING
I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

I have added tests that prove my fix is effective or that my feature works (if appropriate)
I have checked that my tests are not configured for a specific region or account (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

src/braket/jobs/quantum_job_creation.py

Co-authored-by: Aaron Berdy <[email protected]>

ajberdy · 2022-04-25T17:24:29Z

src/braket/aws/aws_quantum_job.py

@@ -134,6 +135,10 @@ def create(
                to execute the job. Default: InstanceConfig(instanceType='ml.m5.large',
                instanceCount=1, volumeSizeInGB=30).

+            distribution (str): A str that specifies how the job should be distributed. If set to


Should we add a note that it's intended for use with an instance count greater than 1?

I don't think so. Data parallel distribution could also be used with a single multi-GPU instance.

Ah, I didn't realize that. In that case, might a user want to create a local job with data parallel distribution if their local hardware supports it?

I think we can leave local mode out of scope. According to this link, SageMaker local mode does not support distributed training on local GPUs.

christianbmadsen

Let's change the name from distribution to data_parallel

shpface · 2022-04-26T22:09:23Z

Let's change the name from distribution to data_parallel

The option is kept as distribution: str rather than data_parallel: bool to leave the door open to future distribution methods.
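To illustrate the trade-off, here is a sketch under stated assumptions: `SUPPORTED_DISTRIBUTIONS` and `validate_distribution` are hypothetical names, not part of this PR. With a string option, supporting another method later means adding one entry to a set; a bool would need a new parameter for each method.

```python
from typing import Optional

# Only one method exists today; a future method would be one more entry here.
SUPPORTED_DISTRIBUTIONS = frozenset({"data_parallel"})


def validate_distribution(distribution: Optional[str]) -> Optional[str]:
    """Accept None (no distribution) or the name of a supported method."""
    if distribution is not None and distribution not in SUPPORTED_DISTRIBUTIONS:
        raise ValueError(
            f"Unsupported distribution {distribution!r}; "
            f"expected one of {sorted(SUPPORTED_DISTRIBUTIONS)} or None."
        )
    return distribution
```

Rejecting unknown strings early keeps a misspelled value from silently running the job without distribution.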

feat: add convenient selection of data parallelism

1000b71

shpface requested a review from a team as a code owner April 19, 2022 23:34

shpface requested a review from ajberdy April 25, 2022 17:13

ajberdy reviewed Apr 25, 2022


src/braket/jobs/quantum_job_creation.py (outdated, resolved)

fix spelling

fca5c5b

Co-authored-by: Aaron Berdy <[email protected]>

ajberdy reviewed Apr 25, 2022


ajberdy approved these changes Apr 25, 2022


yitchen-tim approved these changes Apr 25, 2022


christianbmadsen reviewed Apr 26, 2022


Rename distribution to 'data_parallel'

50d371c

shpface merged commit 4ed61b3 into local-sim-jobs Apr 26, 2022

shpface deleted the local-sim-jobs-ddp branch April 26, 2022 21:36
