Describe the bug
When training with a PyTorchTrial, the PyTorchTrialController sets validation_loader to None for every process with rank > 0: https://github.com/determined-ai/determined/blob/main/harness/determined/pytorch/_pytorch_trial.py#L463-L465
However, when validation is performed with evaluate_full_dataset, the PyTorchTrialController asserts that validation_loader is not None on every process, so the assertion fails on every non-chief worker: https://github.com/determined-ai/determined/blob/main/harness/determined/pytorch/_pytorch_trial.py#L995
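To make the failure mode concrete, here is a minimal, self-contained sketch that models only the two behaviors described above. It assumes a drastically simplified controller: the class BuggyController, its run_validation method, the run_all_ranks helper, and the integer "loader" are all hypothetical stand-ins, not the real determined harness code, and the ranks are simulated in a loop rather than run as separate processes.

```python
from typing import Iterable, List, Optional


class BuggyController:
    """Hypothetical, simplified stand-in for PyTorchTrialController.

    Models only the two behaviors reported in this issue; it is not
    the real determined harness code.
    """

    def __init__(self, rank: int, loader: Iterable[int]) -> None:
        self.rank = rank
        self.is_chief = rank == 0
        # Non-chief ranks drop the validation loader (as in the linked L463-L465).
        self.validation_loader: Optional[Iterable[int]] = loader if self.is_chief else None

    def run_validation(self) -> Optional[int]:
        # Bug: this assertion runs on every rank, before the chief check.
        assert self.validation_loader is not None
        if self.is_chief:
            return sum(self.validation_loader)
        return None


def run_all_ranks(world_size: int) -> List[str]:
    """Simulate each rank calling validation and record the outcome."""
    outcomes = []
    for rank in range(world_size):
        controller = BuggyController(rank, loader=[1, 2, 3])
        try:
            controller.run_validation()
            outcomes.append("ok")
        except AssertionError:
            outcomes.append("AssertionError")
    return outcomes


print(run_all_ranks(2))  # ['ok', 'AssertionError']
```

With a single slot the assertion never trips, which is why the problem only shows up once a trial uses more than one GPU.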
Reproduction Steps
1. Train a PyTorchTrial on more than one GPU, using evaluate_full_dataset for validation.
Expected Behavior
With evaluate_full_dataset, only the rank 0 (chief) process should require validation_loader to be non-None. Could the assertion be moved below the if self.is_chief check, so that it only runs on the chief? https://github.com/determined-ai/determined/blob/main/harness/determined/pytorch/_pytorch_trial.py#L996
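The proposed change can be sketched with the same kind of simplified model: the assertion simply moves inside the chief-only branch. As before, FixedController, run_validation, and the integer "loader" are hypothetical stand-ins rather than the real determined code, and the sketch deliberately ignores any cross-process communication the real controller may perform after evaluation.

```python
from typing import Iterable, Optional


class FixedController:
    """Hypothetical, simplified stand-in for PyTorchTrialController with the
    assertion moved under the chief check, as proposed in this issue."""

    def __init__(self, rank: int, loader: Iterable[int]) -> None:
        self.rank = rank
        self.is_chief = rank == 0
        # Non-chief ranks still drop the validation loader.
        self.validation_loader: Optional[Iterable[int]] = loader if self.is_chief else None

    def run_validation(self) -> Optional[int]:
        if self.is_chief:
            # Only the chief iterates the loader, so only the chief needs it.
            assert self.validation_loader is not None
            return sum(self.validation_loader)
        # Non-chief ranks skip full-dataset evaluation in this model.
        return None


results = [FixedController(rank, loader=[1, 2, 3]).run_validation() for rank in range(4)]
print(results)  # [6, None, None, None]
```

Keeping the assertion at all (rather than deleting it) preserves the sanity check on the one process that actually consumes the loader.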
Screenshot
N/A
Environment
N/A
Additional Context
No response
This indeed seems like a pretty obvious bug on our part. 🤦
We'll land this fix before the next release (0.37.0), which is slated to go out in about two weeks. Thank you for reporting this.