Overhaul of train.py and adding Chesapeake CVPR trainer #103
Conversation
…the configuration files are structured
Codecov Report
@@ Coverage Diff @@
## main #103 +/- ##
==========================================
- Coverage 83.65% 79.32% -4.34%
==========================================
Files 31 32 +1
Lines 1927 2089 +162
==========================================
+ Hits 1612 1657 +45
- Misses 315 432 +117
Continue to review full report at Codecov.
# Render the image, ground truth mask, and predicted mask for the first
# image in the batch
img = np.rollaxis(  # convert image to channels last format
    batch["image"][0].cpu().numpy(), 0, 3
)
mask = batch["mask"][0].cpu().numpy()
pred = y_hat_hard[0].cpu().numpy()
fig, axs = plt.subplots(1, 3, figsize=(12, 4))
axs[0].imshow(img[:, :, :3])
axs[0].axis("off")
axs[1].imshow(mask, vmin=0, vmax=6, cmap=CMAP, interpolation="none")
axs[1].axis("off")
axs[2].imshow(pred, vmin=0, vmax=6, cmap=CMAP, interpolation="none")
axs[2].axis("off")
Can we instead move the plotting logic to the ChesapeakeCVPR dataset class? That's how RasterDataset and VectorDataset are designed. Then you can just call dataset.plot(sample) and the code can be reused by anyone using the dataset in their own code.
RasterDataset.plot takes a single Tensor as input, while this code is plotting (image, mask, predictions). Do you imagine multiple versions of plot?
Ah, I think this is the first GeoDataset we have with both image and mask. We should change plot to take in a sample dict instead.
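A sample-dict plot method might look something like the sketch below. This is only an illustration of the suggestion, not the actual TorchGeo API: the function name, the "prediction" key, the n_classes parameter, and the figure layout are all assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
import numpy as np


def plot(sample, n_classes=7):
    """Plot whichever of image/mask/prediction are present in a sample dict.

    Hypothetical sketch: key names and colormap handling are assumptions.
    """
    keys = [k for k in ("image", "mask", "prediction") if k in sample]
    fig, axs = plt.subplots(1, len(keys), figsize=(4 * len(keys), 4))
    if len(keys) == 1:
        axs = [axs]
    for ax, key in zip(axs, keys):
        data = np.asarray(sample[key])
        if key == "image":
            # convert channels-first image to channels-last and show RGB bands
            ax.imshow(np.rollaxis(data, 0, 3)[:, :, :3])
        else:
            ax.imshow(data, vmin=0, vmax=n_classes - 1, interpolation="none")
        ax.set_title(key)
        ax.axis("off")
    return fig
```

A trainer could then call dataset.plot(sample) for plain samples, and add a "prediction" entry to the sample dict before plotting during validation.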
Gave things another review. My biggest complaint with this chunk of code is a lack of documentation and testing. I realize we're kind of busy trying to finish experiments before we can publish things, but we need to make sure that this code is actually well-tested and documented before we start making releases.
@@ -0,0 +1,22 @@
trainer:
My biggest complaint about all this OmegaConf stuff is that there doesn't seem to be any documentation on what options are supported or what possible values they can take. Unlike with argparse, there's no built-in way to get a help message.
This is discussed in much more detail here -- facebookresearch/hydra#633.
For now I think comments in the yaml files are OK -- we can do more here. I think there will be few scenarios in which a user is trying to configure experiments without looking at the trainer code.
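Beyond comments in the yaml files, one lightweight option would be to mirror the supported options in a dataclass so that names, types, and defaults live in one greppable place and a help message can be generated from them. This is only a sketch; the TrainerConfig fields below are hypothetical, not the options train.py actually accepts.

```python
from dataclasses import dataclass, fields


@dataclass
class TrainerConfig:
    """Options accepted under the ``trainer:`` key (hypothetical subset)."""

    gpus: int = 0          # number of GPUs to train on
    max_epochs: int = 100  # maximum number of training epochs
    lr: float = 1e-3       # initial learning rate


def config_help(cls) -> str:
    """Build an argparse-style help string from the dataclass fields."""
    lines = [cls.__doc__ or ""]
    for f in fields(cls):
        # f.type may be a type object or a string depending on annotation mode
        tname = f.type.__name__ if hasattr(f.type, "__name__") else f.type
        lines.append(f"  {f.name} ({tname}, default={f.default})")
    return "\n".join(lines)


print(config_help(TrainerConfig))
```

OmegaConf can also merge loaded yaml against such a structured schema, which would reject unknown keys, but that is a larger change.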
)
class ChesapeakeCVPRSegmentationTask(LightningModule):
These trainer modules desperately need unit tests at some point; our overall code coverage is plummeting (see #109).
Have you tried running a trainer? :)
We ran into a very obvious problem in "tested" code this morning: I wanted to run python train.py with the cyclone dataset. This works in my environment because the dataset is where I expect it, and it works in tests because we have fake data where we expect it. It did not work in practice because other people don't have the dataset downloaded.
At the moment train.py isn't "tested" code. In the long run, I want to have integration tests that use real data and may take several hours to run. These will only be run on release branches, not on main/PRs. Everything is set up to do this except that someone needs to actually write the tests. The unit tests will only be for coverage and catching minor issues.
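The release-branch gating could look something like the following sketch, using a plain environment variable with stdlib unittest. The variable name, class name, and test body are assumptions, not what the repo does today.

```python
import os
import unittest

# Only run the slow integration tests when explicitly enabled, e.g. by the
# CI job for release branches setting RUN_INTEGRATION_TESTS=1.
RUN_INTEGRATION = os.environ.get("RUN_INTEGRATION_TESTS") == "1"


class TestTrainIntegration(unittest.TestCase):
    @unittest.skipUnless(
        RUN_INTEGRATION, "set RUN_INTEGRATION_TESTS=1 on release branches"
    )
    def test_train_on_real_data(self):
        # Placeholder body: a real test would invoke train.py on real data
        # and may take hours, which is why it is skipped by default.
        self.assertTrue(True)
```

On main/PR CI the test is reported as skipped; a release pipeline flips the environment variable to run it.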
If you look at our current API docs, you'll see that many of these methods are undocumented and only mention the names of arguments, not what they mean. We should try adding
Yep, that's fine. None of these suggestions are requirements that need to be done now, just pointing them out so we can think about them. As soon as we submit the paper I plan on going back and adding a lot of unit tests and fixing bugs so we can safely release.
Some key points:
- …train.py.
- train.py saves the final config that is used for the experiment to the experiment output directory.

And… it is working well! Here is output from tensorboard showing val image, val mask, predictions:

Things to add in the nearish future: