Fix hsdp_device_mesh=None when enabling HSDP and HYBRID_SHARD #402
What does this PR do?
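Fixes #
A variable `hsdp_device_mesh` is assigned `None` before the function of the same name, `hsdp_device_mesh()`, is called when `hsdp` and `HYBRID_SHARD` are both set. Because the local variable shadows the function, the call fails with `TypeError("'NoneType' object is not callable")`. This PR renames the variable from `hsdp_device_mesh` to `hsdp_device_mesh_plan` to avoid the conflict.
Feature/Issue validation/testing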
Please describe the tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.
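The name clash can be reproduced without torch or a GPU. The sketch below is a minimal illustration, not the project's code: the module-level `hsdp_device_mesh` is a stand-in that returns a placeholder tuple instead of building a real device mesh, the string `"HYBRID_SHARD"` stands in for the sharding-strategy enum, the group sizes `2` and `4` are arbitrary, and `setup_buggy` / `setup_fixed` are hypothetical names.

```python
def hsdp_device_mesh(replica_group_size, sharding_group_size):
    # Stand-in for the real helper (assumption): it returns a placeholder
    # tuple rather than constructing an actual device mesh.
    return ("mesh", replica_group_size, sharding_group_size)


def setup_buggy(hsdp, sharding_strategy):
    # Assigning to `hsdp_device_mesh` anywhere in this function makes the
    # name local for the whole body, so the call below sees None instead of
    # the module-level function.
    hsdp_device_mesh = None
    if hsdp and sharding_strategy == "HYBRID_SHARD":
        hsdp_device_mesh = hsdp_device_mesh(2, 4)  # raises TypeError
    return hsdp_device_mesh


def setup_fixed(hsdp, sharding_strategy):
    # With the variable renamed, the module-level function stays reachable.
    hsdp_device_mesh_plan = None
    if hsdp and sharding_strategy == "HYBRID_SHARD":
        hsdp_device_mesh_plan = hsdp_device_mesh(2, 4)
    return hsdp_device_mesh_plan


if __name__ == "__main__":
    try:
        setup_buggy(True, "HYBRID_SHARD")
    except TypeError as err:
        print(f"buggy: {err}")  # prints: buggy: 'NoneType' object is not callable
    print(f"fixed: {setup_fixed(True, 'HYBRID_SHARD')}")  # prints: fixed: ('mesh', 2, 4)
```

Python decides at compile time that a name assigned inside a function is local to that entire function, so the module-level function is unreachable from `setup_buggy` as soon as `hsdp_device_mesh = None` appears in it. Renaming the variable, as this PR does, is the smallest change that removes the shadowing.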
Test 1
`TypeError("'NoneType' object is not callable")` is fixed. However, an error from `torch/distributed/fsdp/_init_utils.py` (line 107) is then triggered, because both `process_group` and `device_mesh` are set (neither is `None`). With a temporary modification, training proceeds with both HSDP and HYBRID_SHARD enabled.
Before submitting
Thanks for contributing 🎉!