TensorBoard images logging by datamodules plot() #1598

roybenhayun · 2023-09-29T11:49:02Z

roybenhayun
Sep 29, 2023

I would be happy to get an explanation of the datamodule plot() method, which is called and returns a figure that is logged to TensorBoard. It's a very useful feature for debugging and reviewing training. It seems that a figure is only logged "occasionally". We would like to know how this works and when it happens, and if possible to be able to control the logging.

For example, I can see that plot() is called once in the validation step:

sample = unbind_samples(batch)[0]
fig = datamodule.plot(sample)
summary_writer = self.logger.experiment
summary_writer.add_figure(f"image/{batch_idx}", fig, global_step=self.global_step)

I'm not sure where or how this happens in training.

Questions:

should the datamodule extend the torchgeo.datamodules base classes or pl.LightningDataModule? (it looks like both options are possible, but it is not clear which is preferred).
when is the datamodule plot() called during training, and with which arguments?
is it possible to control the frequency and on which events the datamodule plot() is called?

thanks!

roybenhayun · 2023-09-29T12:11:43Z

roybenhayun
Sep 29, 2023
Author

Also, for the logging, how can I get the coordinates or location of a sample while it is being processed or plotted?

0 replies

adamjstewart · 2023-09-29T14:46:26Z

adamjstewart
Sep 29, 2023
Maintainer

should the datamodule extend the torchgeo.datamodules base classes or pl.LightningDataModule? (it looks like both options are possible, but it is not clear which is preferred).

I would recommend extending GeoDataModule or NonGeoDataModule. That's what they're there for.

when is the datamodule plot() called during training, and with which arguments?

It actually isn't called during training, it's only called during validation. But when you run trainer.fit, it runs both the train and validation steps.

is it possible to control the frequency and on which events the datamodule plot() is called?

This would be a cool feature to add, want to open a PR?
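For anyone picking this up: the diff further down this thread shows that the regression trainer's current rule is simply `batch_idx < 10`, i.e. the first ten validation batches of each validation run. Below is a minimal sketch of what a configurable rule might look like. `should_plot`, `max_batches`, and `every_n_epochs` are hypothetical names used only for illustration; they are not part of torchgeo.

```python
def should_plot(
    batch_idx: int,
    epoch: int,
    max_batches: int = 10,
    every_n_epochs: int = 1,
) -> bool:
    """Hypothetical predicate: log a figure for this validation batch?"""
    if every_n_epochs < 1:
        raise ValueError("every_n_epochs must be >= 1")
    # Only plot on every n-th epoch (epoch 0, n, 2n, ...).
    if epoch % every_n_epochs != 0:
        return False
    # Only plot the first `max_batches` batches of that validation run.
    return batch_idx < max_batches
```

With the defaults this reproduces the existing behaviour (first ten batches, every epoch), so a check like this could presumably replace the hard-coded `batch_idx < 10` without changing what current users see.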

how can I get the coordinates or location of a sample while it is being processed or plotted?

At the moment, this isn't possible, as the bbox entry of the sample is deleted. But #1407 and similar issues propose ways we could transform bbox so that it can be kept and used during prediction to stitch images back together. This would also allow it to be used for logging, for example to plot the image using cartopy.

0 replies

roybenhayun · 2023-10-01T20:21:08Z

roybenhayun
Oct 1, 2023
Author

This would be a cool feature to add, want to open a PR?

Sure. I'll open a PR and start noting the needs that come up while working on a task.

0 replies

kaybe20 · 2023-10-24T17:29:58Z

kaybe20
Oct 24, 2023

Regarding the datamodule.plot() call during validation: Do you have to implement the plot function in custom DataModules?
I have a custom NonGeoDataModule and during the trainer.fit() call it says: 'NoneType' object has no attribute 'plot'
That error is thrown at the datamodule.plot() call that was mentioned here.

I haven't had this error in torchgeo before the update.

1 reply

adamjstewart Oct 24, 2023
Maintainer

Can you share a minimal reproducible example? The base class already has a plot method that simply calls the plot method of the dataset. This sounds similar to #1551 which was fixed in #1585 but I can't tell without being able to reproduce the issue.
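To illustrate the delegation described above, here is a simplified, dependency-free sketch. `ToyDataset` and `ToyDataModule` are stand-ins invented for this example and are not torchgeo's actual classes; the real base class may differ in detail, and a real `plot()` would return a matplotlib figure rather than a string.

```python
from typing import Any, Optional


class ToyDataset:
    """Stand-in dataset with its own plot method."""

    def plot(self, sample: dict) -> str:
        # A real dataset would return a matplotlib Figure here;
        # a string keeps the sketch free of dependencies.
        return f"figure for keys {sorted(sample)}"


class ToyDataModule:
    """Stand-in datamodule that forwards plot() to its dataset."""

    def __init__(self, dataset: Optional[Any] = None) -> None:
        self.dataset = dataset

    def plot(self, *args: Any, **kwargs: Any) -> Optional[Any]:
        # Forward to the dataset if it exists and defines plot();
        # otherwise return None so the caller can skip logging.
        if self.dataset is not None and hasattr(self.dataset, "plot"):
            return self.dataset.plot(*args, **kwargs)
        return None
```

In this sketch a custom datamodule only needs its dataset to define `plot()`; it does not have to override `plot()` itself.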

kaybe20 · 2023-10-24T20:48:26Z

kaybe20
Oct 24, 2023

Of course, with the following code snippet I was able to reproduce it:

import torch
import os
from torchgeo.models import ViTSmall16_Weights
from torchgeo.datasets import NonGeoDataset
from torchgeo.trainers import RegressionTask
from lightning.pytorch import Trainer
from torch.utils.data import DataLoader

accelerator = "gpu" if torch.cuda.is_available() else "cpu"
default_root_dir = os.path.join("experiments")
num_workers = 2
max_epochs = 10
fast_dev_run = False

weights = ViTSmall16_Weights.LANDSAT_ETM_SR_MOCO

class CustomDataset(NonGeoDataset):
    def __init__(self):
        self.labels = [1]*300

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index: int):
        label = self.labels[index]
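        # torch.Tensor(1) would allocate an uninitialized 1-element tensor, so build the label from its value instead
        return {"image": torch.rand(6, 224, 224), "label": torch.tensor([label], dtype=torch.float32)}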

reg_task = RegressionTask(model="vit_small_patch16_224",
                          weights=weights,
                          in_channels=6,
                          num_outputs=1,
                          loss="mse",
                          lr=0.001,
                          patience=5)

trainer = Trainer(
    accelerator=accelerator,
    default_root_dir=default_root_dir,
    fast_dev_run=fast_dev_run,
    log_every_n_steps=1,
    min_epochs=1,
    max_epochs=max_epochs,
)

trainer.fit(
    model=reg_task, 
    train_dataloaders=DataLoader(CustomDataset(), batch_size=16),
    val_dataloaders=DataLoader(CustomDataset(), batch_size=16))

4 replies

adamjstewart Oct 24, 2023
Maintainer

You aren't using NonGeoDataModule in this example, only NonGeoDataset. But I think it should be possible to support this. The following change fixes the issue for me:

diff --git a/torchgeo/trainers/regression.py b/torchgeo/trainers/regression.py
index 2d142dac..b2847556 100644
--- a/torchgeo/trainers/regression.py
+++ b/torchgeo/trainers/regression.py
@@ -185,6 +185,7 @@ class RegressionTask(BaseTask):
         if (
             batch_idx < 10
             and hasattr(self.trainer, "datamodule")
+            and hasattr(self.trainer.datamodule, "plot")
             and self.logger
             and hasattr(self.logger, "experiment")
             and hasattr(self.logger.experiment, "add_figure")

Need to do this for all trainers. Also need to add tests to make sure the issue doesn't come back.
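Why the extra `hasattr` check helps can be seen in a small standalone sketch. When dataloaders are passed directly to `trainer.fit`, the trainer's `datamodule` attribute exists but is `None`, so the original `hasattr(self.trainer, "datamodule")` check passes and the later `datamodule.plot(...)` call fails with the `'NoneType' object has no attribute 'plot'` error reported above. The function `can_log_figure` and the `SimpleNamespace` stubs below are hypothetical stand-ins for the trainer and logger, used only to exercise the condition.

```python
from types import SimpleNamespace


def can_log_figure(trainer, logger, batch_idx: int) -> bool:
    """Mirror of the guarded condition from the diff, as a plain function."""
    return bool(
        batch_idx < 10
        and hasattr(trainer, "datamodule")
        # hasattr(None, "plot") is False, so a missing datamodule is skipped.
        and hasattr(trainer.datamodule, "plot")
        and logger
        and hasattr(logger, "experiment")
        and hasattr(logger.experiment, "add_figure")
    )


# Stub logger exposing experiment.add_figure, like a TensorBoard logger.
logger = SimpleNamespace(
    experiment=SimpleNamespace(add_figure=lambda *args, **kwargs: None)
)
# Trainer fitted with plain dataloaders: datamodule is present but None.
trainer_without_dm = SimpleNamespace(datamodule=None)
# Trainer fitted with a datamodule that defines plot().
trainer_with_dm = SimpleNamespace(
    datamodule=SimpleNamespace(plot=lambda sample: "fig")
)
```

Here `can_log_figure(trainer_without_dm, logger, 0)` evaluates to `False` instead of raising, while `can_log_figure(trainer_with_dm, logger, 0)` is `True`.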

kaybe20 Oct 24, 2023

Sorry I mixed up DataModule and Dataset, but thank you very much :)

adamjstewart Oct 27, 2023
Maintainer

@kaybe20 want to open a PR that applies the above fix to all trainers?

kaybe20 Oct 28, 2023

I opened a PR now, had to read into it first, because I've never used GitHub to that extent in the past :)
PR is #1703
