Stepwise LR scheduler #20211

Open · wants to merge 23 commits into master

Conversation

@01AbhiSingh (Contributor) commented Aug 18, 2024

What does this PR do?

Fixes #17544

Hi @awaelchli, could you please verify the changes I made? If they are correct, I will also take up and fix any failing tests.

Before submitting
  • Was this discussed/agreed via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

📚 Documentation preview 📚: https://pytorch-lightning--20211.org.readthedocs.build/en/20211/

@github-actions bot added the "pl (Generic label for PyTorch Lightning package)" label on Aug 18, 2024

codecov bot commented Aug 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81%. Comparing base (5be58f6) to head (3c48c9e).
Report is 14 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (5be58f6) and HEAD (3c48c9e): HEAD has 553 fewer uploads than BASE.

Flag                 BASE (5be58f6)   HEAD (3c48c9e)
cpu                  147              21
lightning            106              16
pytest               87               2
python3.9            43               6
python3.10           42               6
lightning_fabric     25               0
gpu                  4                2
python3.11           42               6
python3.12           20               3
pytorch2.1           38               12
pytest-full          64               21
pytorch2.3           9                3
pytorch_lightning    20               7
pytorch2.2           9                3
pytorch2.4           8                3
Additional details and impacted files
@@            Coverage Diff            @@
##           master   #20211     +/-   ##
=========================================
- Coverage      89%      81%     -8%     
=========================================
  Files         267      264      -3     
  Lines       23084    23029     -55     
=========================================
- Hits        20585    18618   -1967     
- Misses       2499     4411   +1912     

@01AbhiSingh (Contributor, Author)

Hi @Borda, do I need to make any changes to the PR?

@lantiga (Collaborator) commented Oct 7, 2024

This looks good, thank you for the contribution @01AbhiSingh

Ideally we could add a test to verify the behavior described in #17544. The current test suite doesn't detect this change, which is usually a sign of insufficient coverage. Would you be willing to contribute such a test?

@01AbhiSingh (Contributor, Author)

Yes, sure let me look into it.

@01AbhiSingh (Contributor, Author)

Hi @lantiga, do you want a new test written from scratch, or should I make the necessary changes in a preexisting file? All the tests currently pass, so I can't tell which existing test would need to change; if the changes should go into a preexisting file, it would be very helpful if you could point me to the specific test.

@lantiga (Collaborator) commented Nov 12, 2024

hey @01AbhiSingh sorry for the wait

You can take inspiration from:

def test_lr_scheduler_epoch_step_frequency(mocked_sched, check_val_every_n_epoch, tmp_path):

and add a new test where scheduling goes across epoch boundaries. Maybe @falckt can help too?
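To make "scheduling across epoch boundaries" concrete, here is a small worked example (the numbers are illustrative and happen to match the test posted later in this thread): with interval="step" and frequency=5, a run of 7 batches per epoch for 3 epochs should carry the scheduler's step counter across epochs, so it fires after global steps 5, 10, 15 and 20.

batches_per_epoch, max_epochs, frequency = 7, 3, 5   # illustrative values
total_steps = batches_per_epoch * max_epochs         # 21 global training steps
expected_calls = total_steps // frequency            # 21 // 5 == 4 scheduler steps
# If the counter reset at every epoch boundary instead, the scheduler would
# fire only 7 // 5 == 1 time per epoch, i.e. 3 times over the whole run.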

@lantiga added the "waiting on author (Waiting on user action, correction, or update)" label on Nov 12, 2024
@lantiga (Collaborator) commented Nov 19, 2024

Hey @01AbhiSingh will you have time to add a test?

@01AbhiSingh (Contributor, Author)

> Hey @01AbhiSingh will you have time to add a test?

Yes. Today is my last exam; after that I will add the test ASAP. Sorry for the delay.

@lantiga (Collaborator) commented Nov 25, 2024

@01AbhiSingh just checking, no pressure :-)

@01AbhiSingh (Contributor, Author)

Hi @lantiga, please check whether this is correct. What changes do I need to make to this code?

def test_lr_scheduler_step_across_epoch_boundaries(mocked_sched, tmp_path):
    class StepAcrossEpochsModel(LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(32, 2)

        def forward(self, x):
            return self.layer(x)

        def training_step(self, batch, batch_idx):
            return {"loss": torch.tensor(0.1, requires_grad=True)}

        def configure_optimizers(self):
            optimizer = torch.optim.SGD(self.layer.parameters(), lr=0.1)
            scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
            return {
                "optimizer": optimizer,
                "lr_scheduler": {
                    "scheduler": scheduler,
                    "interval": "step",
                    "frequency": 5,  # Scheduler steps every 5 iterations
                },
            }

    model = StepAcrossEpochsModel()

    # Trainer configuration for cross-epoch testing
    trainer = Trainer(
        default_root_dir=tmp_path,
        limit_train_batches=7,  # More than `frequency` iterations per epoch
        max_epochs=3,  # Test across multiple epochs
    )

    # Fit the model
    trainer.fit(model)

    # Calculate the total number of steps (iterations) and expected scheduler calls
    total_steps = 7 * 3  # Total iterations (7 batches per epoch * 3 epochs)
    expected_steps = total_steps // 5  # Scheduler steps every 5 iterations

    # Assert that the scheduler was called the expected number of times
    assert mocked_sched.call_count == expected_steps

@lantiga (Collaborator) commented Dec 6, 2024

@01AbhiSingh the test looks good! feel free to add it to test_optimizers.py and we should be ok

@01AbhiSingh (Contributor, Author)

Done. Sorry, I was bogged down by a few things. But a lot of new PRs for all those issues are coming your way now.

@lantiga (Collaborator) commented Dec 8, 2024

There's an issue with mocked_sched, which is undefined in the test; other than that, things look good.

@01AbhiSingh (Contributor, Author)

> There's an issue with mocked_sched, which is undefined in the test; other than that, things look good.

Do you mean I need to add a fixture to the test function's parameters?

@pytest.fixture
def mocked_sched(mocker):
    return mocker.patch('torch.optim.lr_scheduler.StepLR.step')
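(As an aside: the mocker argument in this fixture comes from the pytest-mock plugin, and pytest injects mocked_sched into the test purely by matching the parameter name, so no decorator is needed with this fixture-based approach.)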

@lantiga (Collaborator) commented Dec 11, 2024

That's right, you can also decorate the test with

from unittest.mock import patch

...

@patch("torch.optim.lr_scheduler.StepLR.step")
def test_lr_scheduler_step_across_epoch_boundaries(mocked_sched, tmp_path):
    ...

this should work
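(One detail worth noting about the decorator approach: unittest.mock.patch passes the created mock to the test function as the first positional argument, so mocked_sched must come before pytest fixtures such as tmp_path in the signature, exactly as written above; the test can then inspect mocked_sched.call_count for the assertion.)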

@01AbhiSingh (Contributor, Author)

Done, please check.

@lantiga (Collaborator) commented Dec 11, 2024

Hey @01AbhiSingh can you import LightningModule here?

https://github.com/Lightning-AI/pytorch-lightning/pull/20211/files#diff-3c3f104dbdd06271c9e6e6d4fdf61398458148412401dd55a9bac1e9b5f913a8R19

Change:

from lightning.pytorch import Trainer

to

from lightning.pytorch import Trainer, LightningModule

this should fix the failing test
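(Presumably the failing test was raising a NameError because the new test references LightningModule while the file only imported Trainer; widening the existing import as shown above addresses that.)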
