r2.1.0 ptl fix #11694

Merged
pablo-garay merged 1 commit into r2.1.0 from pagaray/r2.1.0_ptl_fix
Dec 21, 2024

Conversation

pablo-garay
Collaborator

r2.1.0 ptl fix

@@ -1,7 +1,7 @@
 cloudpickle
 fiddle
 hydra-core>1.3,<=1.3.2
-lightning>2.2.1
+lightning>2.2.1,<=2.4.0
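
To make the effect of the new pin concrete, here is a minimal sketch of which versions the range `>2.2.1,<=2.4.0` admits. It is a simplification that only handles plain numeric release strings; the helper names `parse_release` and `satisfies_pin` are hypothetical, and real installers apply the full version-specifier rules rather than this tuple comparison.

```python
def parse_release(version: str) -> tuple:
    """Turn a plain release string such as '2.4.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pin(version: str) -> bool:
    """Check a version string against the pinned range >2.2.1,<=2.4.0."""
    lower = parse_release("2.2.1")  # exclusive lower bound
    upper = parse_release("2.4.0")  # inclusive upper bound
    return lower < parse_release(version) <= upper


# Illustrative candidate strings only; the loop prints each with its result.
for candidate in ("2.2.1", "2.3.3", "2.4.0", "2.5.0"):
    print(candidate, satisfies_pin(candidate))
```

Under this pin, 2.4.0 is still accepted while 2.5.0 is rejected, which is what keeps the release branch off the 2.5 series.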
Collaborator

Are you addressing the progress printer issue with Lightning 2.5? You can fix that directly by calling on_train_epoch_start from inside a new on_fit_start in the ProgressPrinter class.

Collaborator

E.g., you could add:

    @override
    def on_fit_start(self, trainer: Trainer, pl_module: LightningModule) -> None:
        self.on_train_epoch_start(trainer, pl_module)

Right before https://github.com/NVIDIA/NeMo/blob/main/nemo/lightning/pytorch/callbacks/progress_printer.py#L114-L120

Collaborator

That got 2.5 working for me with job resumption, without the "AttributeError: 'ProgressPrinter' object has no attribute 'total'" error.
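
A minimal sketch of why the suggested hook can avoid that error, using toy stand-ins rather than the NeMo or Lightning classes. It assumes, purely for illustration, that `total` is first assigned in on_train_epoch_start and that a resumed run reaches a hook reading `total` before that assignment has happened; the actual hook ordering in Lightning 2.5 is not shown in this thread. `ToyProgressPrinter`, `PatchedProgressPrinter`, `resume_mid_epoch`, and the dict-based trainer are all hypothetical.

```python
class ToyProgressPrinter:
    """Hypothetical stand-in for a progress callback; not the NeMo class."""

    def on_train_epoch_start(self, trainer, pl_module):
        # Assumption: this is the only place where `total` gets assigned.
        self.total = trainer["max_steps"]

    def on_train_batch_end(self, trainer, pl_module):
        # Reads `total`; raises AttributeError if no earlier hook has set it.
        return f"step {trainer['step']}/{self.total}"


class PatchedProgressPrinter(ToyProgressPrinter):
    def on_fit_start(self, trainer, pl_module):
        # Suggested workaround: initialize the state as soon as fit begins.
        self.on_train_epoch_start(trainer, pl_module)


def resume_mid_epoch(callback, trainer):
    """Simulate a resumed run that does not fire the epoch-start hook again."""
    if hasattr(callback, "on_fit_start"):
        callback.on_fit_start(trainer, None)
    return callback.on_train_batch_end(trainer, None)
```

With a toy trainer such as `{"max_steps": 100, "step": 41}`, the unpatched class raises AttributeError inside `resume_mid_epoch`, while the patched one returns a progress string because `total` was set from on_fit_start.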

Collaborator Author

Virtually all tests are failing. That's why. This needs a workaround before the 2.5 upgrade.

pablo-garay merged commit 49ef560 into r2.1.0 on Dec 21, 2024
11 checks passed
pablo-garay deleted the pagaray/r2.1.0_ptl_fix branch on December 21, 2024 06:24
@pablo-garay
Collaborator Author

This change made the tests pass again, so merging it.
