Fixed two issues: SSIM evaluation and DDP subsampling bug #60
Conversation
Hi @z-fabian! Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
Nice find on the stack bug. It looks to be an error I introduced during the refactor. I'd like to merge that one right away and do the seed separately (maybe another PR). For the seed you could update the `unet` and `varnet` examples in `experimental` to use `deterministic=True` and `seed_everything()`.
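A minimal sketch of what trainer-level seeding amounts to, using only the standard library as a simplified stand-in for Lightning's `seed_everything` (the real helper also seeds numpy, torch, and CUDA; this version only illustrates the idea of seeding every randomness source once, at program start):

```python
import os
import random

def seed_everything(seed: int) -> int:
    """Simplified stand-in for pytorch_lightning.seed_everything:
    seed all sources of randomness once so every worker draws the
    same random sequence. (The real helper also seeds numpy/torch.)"""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    return seed

# Seeding at the start of the script makes later random draws
# (e.g. dataset subsampling) reproducible across runs.
seed_everything(42)
run_a = [random.random() for _ in range(3)]
seed_everything(42)
run_b = [random.random() for _ in range(3)]
assert run_a == run_b  # identical draws after re-seeding
```

In the actual examples this would be a call to Lightning's `seed_everything(seed)` at the top of the script, plus `deterministic=True` on the trainer, per the reproducibility docs.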
fastmri/data/mri_data.py (Outdated)
@@ -114,7 +114,7 @@ class SliceDataset(Dataset):
            what fraction of the volumes should be loaded.
        """

-    def __init__(self, root, transform, challenge, sample_rate=1):
+    def __init__(self, root, transform, challenge, sample_rate=1, seed=0):
Random seeds should be done at the project/trainer level. See the following:
https://pytorch-lightning.readthedocs.io/en/latest/trainer.html#reproducibility
So for example I should add `seed_everything(seed_val)` at the start of varnet.py and set `deterministic=True` for the trainer object?
Yeah. I don't know if it will fix your val/test discrepancy, but it should make your experiment behave better.
The SSIM bug may be impacting things, but I don't know if that will fix the discrepancy either. Would be interested in what you see first.
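The SSIM bug discussed here hinges on concatenating slices along a spatial axis versus stacking them along a new slice axis. A toy stdlib sketch (nested lists standing in for tensors) shows how the two operations produce different shapes:

```python
# Toy 2x2 "slices" standing in for HxW image tensors.
slice_a = [[1, 2], [3, 4]]
slice_b = [[5, 6], [7, 8]]

# Concatenating along the row axis fuses the slices into one tall
# (N*H, W) image -- a metric like SSIM would then see a single
# concatenated image rather than a volume of slices.
concatenated = slice_a + slice_b
assert len(concatenated) == 4      # shape (4, 2)

# Stacking along a new leading axis keeps an (N, H, W) volume, so
# the metric can be computed per slice and averaged over slices.
stacked = [slice_a, slice_b]
assert len(stacked) == 2           # shape (2, 2, 2)
assert stacked[0] == slice_a
```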
fastmri/data/mri_data.py (Outdated)
@@ -126,6 +126,7 @@ def __init__(self, root, transform, challenge, sample_rate=1):

     files = list(pathlib.Path(root).iterdir())
     if sample_rate < 1:
+        random.seed(seed)  # get the same files in every process
See init arg comment.
fastmri/mri_module.py (Outdated)
            output = torch.stack([out for _, out in sorted(outputs[fname])], dim=0).numpy()
            target = torch.stack([tgt for _, tgt in sorted(targets[fname])], dim=0).numpy()
Nice find on this one. You can remove the `dim=0` here.
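(In `torch.stack`, `dim=0` is the default, which is why the argument is redundant.) A stdlib sketch of the pattern being fixed, with toy lists standing in for tensors: sorting the `(slice_index, slice)` pairs puts the slices in order, and collecting them adds a new leading dimension, like `torch.stack(..., dim=0)`:

```python
# (slice_index, slice) pairs, possibly collected out of order across
# DDP processes; each "slice" is a toy 1-D list standing in for a tensor.
outputs = [(2, [30, 31]), (0, [10, 11]), (1, [20, 21])]

# sorted() orders the pairs by slice index; gathering the slices into
# a list stacks them along a new leading (slice) dimension.
volume = [out for _, out in sorted(outputs)]
assert volume == [[10, 11], [20, 21], [30, 31]]
```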
fastmri/mri_module.py (Outdated)
        log_metrics = {
            f"metrics/{metric}": values / tot_examples
            for metric, values in metrics.items()
        }
        print(log_metrics)
Remove this - should not be default behavior.
Okay, thanks for the feedback. I will make two separate PRs shortly for the two issues.
(1) There was a bug in `validation_epoch_end` in the `MriModule` where the slices were concatenated along a spatial dimension instead of a new slice dimension. `evaluate.ssim` operated on these single concatenated slices instead of volumes, and therefore there was no averaging along the slice dimension. I also added a print to show validation metrics in the command line output.
(2) The random seed of `SliceDataset` was not explicitly set. Different processes during DDP training had different random seeds. This led to different parts of the training data being selected by different workers if `sample_rate < 1`. I set the random seed of all workers to 0, but we could also pass the random seed from the argparser if needed.
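The DDP subsampling issue can be sketched with the standard library alone (toy file names and a hypothetical `subsample` helper, not the actual `SliceDataset` code): each process shuffles the file list with its own RNG, so unless all processes share a seed they can keep different subsets of the data.

```python
import random

# Toy stand-ins for the dataset files on disk.
files = [f"file_{i}.h5" for i in range(10)]
sample_rate = 0.5
num_files = round(len(files) * sample_rate)

def subsample(seed):
    """Mimic one DDP process subsampling its file list: shuffle with
    the process's seed and keep the first num_files entries."""
    rng = random.Random(seed)
    shuffled = list(files)
    rng.shuffle(shuffled)
    return sorted(shuffled[:num_files])

# With a shared seed, every process selects the same files.
assert subsample(0) == subsample(0)
# With differing per-process seeds, the selected subsets generally
# differ, so workers would train on different portions of the data.
assert any(subsample(s) != subsample(0) for s in range(1, 20))
```

This is why seeding before the subsampling draw (or seeding globally via something like `seed_everything`) is enough to keep all DDP workers on the same subset.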