[wip] fix imagenet example: lr_scheduler, loader workers, batch size when ddp #2432
Conversation
Hello @ruotianluo! Thanks for updating this PR. There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-08-08 18:14:30 UTC
Force-pushed from 40a3f2f to 94f28bd
Not done yet. Fixing evaluation now.
Mind adding a test for the example, similar to #2285, or creating a small synthetic dataset and running a few steps...
Will try.
@Borda Not exactly sure what the test should look like. I added one mimicking the commit you attached. Please advise.
Cool, this is what I had in mind, just resolve the image source...
Force-pushed from f7e80a6 to cbf87d5
LGTM 🚀
It seems that the TPU build is not pushed to GKE... @zcain117
```python
def test_dataloader(self, *args, **kwargs):
    return self.val_dataloader(*args, **kwargs)

def test_step(self, *args, **kwargs):
    return self.validation_step(*args, **kwargs)

def test_epoch_end(self, *args, **kwargs):
    return self.validation_epoch_end(*args, **kwargs)
```
It is not correct to redirect directly to the validation methods here, because the metric names will refer to "val_...". This will affect the logging and the progress bar display.
You can do it like this, but you need to fetch the output and replace the key names with "test_...".
Anyway, in v0.9 this will change with the new structured outputs :)
@awaelchli mind editing it?
The main reason I do this is to use Trainer.test(). For ImageNet, the evaluation is supposed to run on the validation set, so the progress bar is fine.
I just want to make sure this example is solid and does not create a misunderstanding of what test and eval are.
You can always pass the val dataloader to Trainer.test(), but what I mean is that the logged plots will look weird if you run validation during training and then also run test at the end, which will append the logs to the validation results if they have the same names.
Something like this?

```python
outputs = self.validation_epoch_end(*args, **kwargs)
outputs = {k.replace('val', 'test'): v for k, v in outputs.items()}
return outputs
```
I pushed something.
pl_examples/test_examples.py (outdated)

```python
_make_image(os.path.join(tmpdir, split, class_id, str(image_id) + '.JPEG'))

cli_args = cli_args.split(' ') if cli_args else []
cli_args += ['--data-path', tmpdir]
```
I had to change it to `cli_args += ['--data-path', str(tmpdir)]` to pass `python -m pytest pl_examples/test_examples.py`.
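For context, a minimal illustration of why the `str()` conversion matters. pytest's `tmpdir` fixture yields a path object rather than a plain string, so placing it directly into an argv-style list can break code that expects every argument to be a `str` (here, `pathlib.Path` stands in for the fixture):

```python
from pathlib import Path

# Stand-in for pytest's tmpdir fixture, which is a path object, not a str.
tmpdir = Path("/tmp/example_data")

# Convert explicitly before building CLI-style argument lists:
cli_args = ["--data-path", str(tmpdir)]
```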
@Borda I changed it in the recent commit so that I can pass the test locally, feel free to change it back.
Codecov Report

```diff
@@         Coverage Diff          @@
##  imagenet_example  #2432  +/- ##
===================================
+ Coverage       59%    90%   +32%
===================================
  Files           79     79
  Lines         7152   7239    +87
===================================
+ Hits          4196   6533  +2337
+ Misses        2956    706  -2250
```
This pull request is now in conflict... :(
Force-pushed from 22d43ab to af79cf2
Force-pushed from af79cf2 to e9172f2
Force-pushed from e9172f2 to c968633
Finishing the PR in #2889 with your commits added. Thanks for your patience :)
@awaelchli a better way is to merge this PR to your continuous branch... |
Merged commit message:

* fix imagenet example: lr_scheduler, loader workers, batch size when ddp
* Fix evaluation for imagenet example
* add imagenet example test
* cleanup
* gpu
* add imagenet example evluation test
* fix test output
* test is fixed in master, remove unecessary hack
* CHANGE
* Apply suggestions from code review
* image net example
* update imagenet example
* update example
* pep
* imports
* type hint
* docs
* obsolete arg
* [wip] fix imagenet example: lr_scheduler, loader workers, batch size when ddp (#2432)
  * fix imagenet example: lr_scheduler, loader workers, batch size when ddp
  * Fix evaluation for imagenet example
  * add imagenet example test
  * cleanup
  * gpu
  * add imagenet example evluation test
  * fix test output
  * test is fixed in master, remove unecessary hack
  * CHANGE
  * Apply suggestions from code review

  Co-authored-by: Jirka <[email protected]>
  Co-authored-by: Adrian Wälchli <[email protected]>
* update chlog
* add missing chlog
* pep
* pep

Co-authored-by: Ruotian Luo <[email protected]>
Co-authored-by: Jirka <[email protected]>
What does this PR do?
Use the learning rate scheduler from the official pytorch examples/imagenet
Add workers as an argument (instead of using 0)
Fix the batch size when using ddp as the distributed backend
Fixes #2422
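The batch-size fix follows the usual DDP convention: with ddp, each GPU runs its own process and its own DataLoader, so a global batch size and worker count should be divided across processes. A minimal sketch of that adjustment (the helper and argument names are illustrative, not the PR's actual code):

```python
import math


def per_process_loader_args(batch_size, workers, num_gpus, use_ddp):
    """Split a global batch size and worker count across DDP processes.

    Each DDP process sees only its share, so the effective global batch
    size across all GPUs stays equal to the requested one.
    """
    if use_ddp and num_gpus > 1:
        batch_size = int(batch_size / num_gpus)
        workers = int(math.ceil(workers / num_gpus))
    return batch_size, workers
```

For example, a global batch size of 256 with 8 workers on 4 GPUs becomes 64 samples and 2 workers per process; without ddp the values are passed through unchanged.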