Pytorch Lightning Model Question #107

Closed
calvinp0 opened this issue Jun 24, 2024 · 6 comments · Fixed by #98

@calvinp0

Hi!

I have built my model using PyTorch Lightning, so it has the training_step, validation_step, etc. methods. I attempted to follow the tutorial here: https://torch-uncertainty.github.io/auto_tutorials/tutorial_der_cubic.html#gathering-everything-and-training-the-model

But it fails with NotImplementedError: Module [CMPNNModel] is missing the required "forward" function (which I guess may be obvious). So does this mean that to utilise this package I will need to change my model from a PyTorch Lightning one to a plain Torch one? Or have I done something incorrectly?

Thank you!

@o-laurent o-laurent self-assigned this Jun 24, 2024
@o-laurent
Contributor

Hi @calvinp0!

Thank you for your feedback!

You need to define the forward(self, x: Tensor) -> Tensor method of your Lightning module (as shown in the starter example). A LightningModule is an extension of an nn.Module and should therefore implement forward. If you do so, you will be able to train and test your model with the RegressionRoutine.

However, if you use trainer.fit or trainer.test, the Trainer will not run your own loops but those of the RegressionRoutine, so this may not work depending on your model. In the general supervised case, we would advise wrapping a simple torch.nn.Module in the RegressionRoutine.
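
If it helps, here is a minimal sketch of that second option, loosely following the DER tutorial linked above. The toy MyRegressor and the exact RegressionRoutine keyword arguments (output_dim, probabilistic, optim_recipe) are assumptions and may differ between versions:

# Sketch only: wrap a plain nn.Module in the RegressionRoutine, which is itself a
# LightningModule and can be passed to trainer.fit(...). MyRegressor and its layer
# sizes are made up for illustration; check your torch-uncertainty version for the
# exact RegressionRoutine signature.
import torch
from torch import nn
from torch_uncertainty.routines import RegressionRoutine

class MyRegressor(nn.Module):
    def __init__(self, in_features: int = 16) -> None:
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_features, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = MyRegressor()
routine = RegressionRoutine(
    model=model,
    output_dim=1,        # keyword names are assumptions; see the tutorial for your version
    probabilistic=False,
    loss=nn.MSELoss(),
    optim_recipe=torch.optim.Adam(model.parameters(), lr=1e-3),
)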

🚧 Our implementation for regression is still unstable, but we have made progress in the upcoming 0.2.1 version, which we will merge in the next few days. Reach out and raise issues if you have other questions or concerns. 🚧

If you want to keep your LightningModule:

You won't get the metrics computed by the RegressionRoutine, but you can directly use the DERLoss from torch_uncertainty.losses and the NormalInverseGammaLayer from torch_uncertainty.layers.distributions. Either way, you will still need the forward(self, x: Tensor) -> Tensor method.
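
As a rough sketch of that route (the backbone below is a placeholder for your own model, and the reg_weight and dim arguments follow the DER tutorial but may differ in your version):

# Sketch only: reuse DERLoss and NormalInverseGammaLayer inside your own
# LightningModule. The backbone, sizes, and learning rate are placeholders.
import torch
from torch import nn
import lightning.pytorch as pl
from torch_uncertainty.layers.distributions import NormalInverseGammaLayer
from torch_uncertainty.losses import DERLoss

class MyDERModule(pl.LightningModule):
    def __init__(self) -> None:
        super().__init__()
        # 4 backbone outputs feed the Normal-Inverse-Gamma layer for a 1D target
        self.backbone = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
        self.dist_layer = NormalInverseGammaLayer(dim=1)
        self.loss_fn = DERLoss(reg_weight=1e-2)

    def forward(self, x: torch.Tensor):  # still required, as noted above
        return self.dist_layer(self.backbone(x))

    def training_step(self, batch, batch_idx):
        x, y = batch
        dist = self(x)                # predicted NIG distribution
        loss = self.loss_fn(dist, y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)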

In any case, don't hesitate to give us more details here or to contact us on Discord.

(written with @alafage)

@o-laurent o-laurent added the question Further information is requested label Jun 24, 2024
@o-laurent o-laurent reopened this Jun 26, 2024
@calvinp0
Author

calvinp0 commented Jul 8, 2024

Thanks @o-laurent!

In the end, I decided to create a simple torch.nn.Module version of my model. However, I want to clarify: would the package be able to accommodate a custom LightningDataModule, for example:

class DataModule(pl.LightningDataModule):
    def __init__(self, data_dir: str, features_generator: List[str], batch_size: int, num_workers: int, persistent_workers: bool = False):
        super().__init__()
        self.data_dir = data_dir
        self.features_generator = features_generator
        self.batch_size = batch_size
        self.num_workers = num_workers
        self.persistent_workers = persistent_workers

    def prepare_data(self):
        """
        This method is called only once and on only one GPU. It's used to perform any data download or preparation steps.
        """
        print("Preparing data...")

    def setup(self, stage: Optional[str] = None):
        """
        Call in SmilesDataset
        Need to consider if splitting via scaffolding or 5 fold etc.
        
        Multiple GPU
        """
        self.data = SmilesDataset(f'{self.data_dir}/delaney-processed.csv', features_generator=self.features_generator)
        if stage == 'fit' or stage is None:
            self.train_data = self.data.get_split('train')
            self.val_data = self.data.get_split('val')
            self.test_data = self.data.get_split('test')
        if stage == 'test':
            self.test_data = self.data.get_split('test')

    def train_dataloader(self):
        return DataLoader(self.train_data, batch_size=self.batch_size, num_workers=self.num_workers, collate_fn=collate_molgraph_dataset, persistent_workers=self.persistent_workers)

    def val_dataloader(self):
        return DataLoader(self.val_data, batch_size=self.batch_size, num_workers=self.num_workers, collate_fn=collate_molgraph_dataset, persistent_workers=self.persistent_workers)

    def test_dataloader(self):
        return DataLoader(self.test_data, batch_size=self.batch_size, num_workers=self.num_workers, collate_fn=collate_molgraph_dataset, persistent_workers=self.persistent_workers)

I ask because I have attempted to follow the tutorial while using my own model and dataset, but when I run this code:

from lightning.pytorch import Trainer
trainer = Trainer(max_epochs=5) #, enable_progress_bar=False)
trainer.fit(model=routine, datamodule=data_module)

I receive this error:

ValueError                                Traceback (most recent call last)
Cell In[27], line 3
      1 from lightning.pytorch import Trainer
      2 trainer = Trainer(max_epochs=5) #, enable_progress_bar=False)
----> 3 trainer.fit(model=routine, datamodule=data_module)

File ~/miniforge3/envs/deepchem_cuda/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py:543, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    541 self.state.status = TrainerStatus.RUNNING
    542 self.training = True
--> 543 call._call_and_handle_interrupt(
    544     self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
    545 )

File ~/miniforge3/envs/deepchem_cuda/lib/python3.10/site-packages/lightning/pytorch/trainer/call.py:44, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
     42     if trainer.strategy.launcher is not None:
     43         return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 44     return trainer_fn(*args, **kwargs)
     46 except _TunerExitException:
     47     _call_teardown_hook(trainer)

File ~/miniforge3/envs/deepchem_cuda/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py:579, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
    572 assert self.state.fn is not None
    573 ckpt_path = self._checkpoint_connector._select_ckpt_path(
    574     self.state.fn,
    575     ckpt_path,
    576     model_provided=True,
    577     model_connected=self.lightning_module is not None,
    578 )
--> 579 self._run(model, ckpt_path=ckpt_path)
    581 assert self.state.stopped
    582 self.training = False

File ~/miniforge3/envs/deepchem_cuda/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py:946, in Trainer._run(self, model, ckpt_path)
    943 self.__setup_profiler()
    945 log.debug(f"{self.__class__.__name__}: preparing data")
--> 946 self._data_connector.prepare_data()
    948 call._call_setup_hook(self)  # allow user to set up LightningModule in accelerator environment
    949 log.debug(f"{self.__class__.__name__}: configuring model")

File ~/miniforge3/envs/deepchem_cuda/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/data_connector.py:89, in _DataConnector.prepare_data(self)
     87 lightning_module = trainer.lightning_module
     88 # handle datamodule prepare data:
---> 89 if datamodule is not None and is_overridden("prepare_data", datamodule):
     90     prepare_data_per_node = datamodule.prepare_data_per_node
     91     with _InfiniteBarrier():

File ~/miniforge3/envs/deepchem_cuda/lib/python3.10/site-packages/lightning/pytorch/utilities/model_helpers.py:42, in is_overridden(method_name, instance, parent)
     40     if parent is None:
     41         _check_mixed_imports(instance)
---> 42         raise ValueError("Expected a parent")
     44 from lightning_utilities.core.overrides import is_overridden as _is_overridden
     46 return _is_overridden(method_name, instance, parent)

ValueError: Expected a parent

@calvinp0
Author

calvinp0 commented Jul 8, 2024

Actually, I discovered the issue was importing PyTorch Lightning inconsistently (mixing the pytorch_lightning and lightning.pytorch namespaces), as reported here: Lightning-AI/pytorch-lightning#17485
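
For reference, a short illustration of the fix, assuming the mixed-import diagnosis above: the Trainer and the LightningDataModule base class must come from the same namespace.

# Use one Lightning namespace consistently; mixing `pytorch_lightning` and
# `lightning.pytorch` objects in the same run is what triggers the
# "Expected a parent" ValueError above.
import lightning.pytorch as pl          # rather than: import pytorch_lightning as pl
from lightning.pytorch import Trainer   # same namespace as the DataModule below

class DataModule(pl.LightningDataModule):
    # subclass the base class from the same namespace that the Trainer comes from
    ...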

@calvinp0 calvinp0 closed this as completed Jul 8, 2024
@o-laurent
Contributor

Hi @calvinp0,

Thanks for the details! We could add a comment advising users to use lightning.pytorch if you find it relevant. As a side note, you could use our slightly modified version of the Trainer, called TUTrainer, from the utils folder for improved metric printing. We also plan to improve this side of the library in the coming months.
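
For instance (a sketch only; the exact import path for TUTrainer is an assumption based on "the utils folder", so check the package if it does not resolve):

# Drop-in replacement for the Lightning Trainer, reusing the routine and
# data_module from the snippet above.
from torch_uncertainty.utils import TUTrainer

trainer = TUTrainer(max_epochs=5)
trainer.fit(model=routine, datamodule=data_module)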

Please don't hesitate to let us know if we can help you in any way.

@calvinp0
Author

Hi @o-laurent, yes, I think it would be great to add that advice for future users.

I will try to utilise the TUTrainer, thanks! On that note (and please tell me if I should open another thread in Discussions), is there a tutorial or more information on the Monte Carlo Dropout wrapper: https://github.com/ENSTA-U2IS-AI/torch-uncertainty/blob/main/torch_uncertainty/models/wrappers/mc_dropout.py

@o-laurent
Contributor

Hi again @calvinp0,

Thanks, we'll find a place to highlight this when we improve the documentation.

We can create a discussion thread or chat on Discord if you have more specific questions. Otherwise, I've just slightly improved the wrapper, its documentation, and the MC-Dropout tutorial on the dev branch.

NB: Since the modified version of the tutorial has not yet been pushed to main, our website's tutorial page remains outdated.
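
In the meantime, a rough sketch of how the wrapper is typically used; the mc_dropout helper name and its arguments are assumptions based on the linked file, so check the updated tutorial on the dev branch for the exact API. Note that the wrapped model must already contain nn.Dropout layers for the stochastic passes to differ.

# Sketch only: wrap a model that contains nn.Dropout layers so that dropout stays
# active at evaluation time and several stochastic forward passes are performed.
import torch
from torch import nn
from torch_uncertainty.models import mc_dropout  # import path is an assumption

base = nn.Sequential(
    nn.Linear(16, 64),
    nn.ReLU(),
    nn.Dropout(p=0.2),  # kept active by the wrapper during evaluation
    nn.Linear(64, 1),
)
mc_model = mc_dropout(base, num_estimators=16)

mc_model.eval()
x = torch.randn(8, 16)
preds = mc_model(x)  # predictions stacked over the 16 stochastic forward passes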
