Stepwise LR scheduler #20211

Open · wants to merge 23 commits into master

Conversation

@01AbhiSingh (Contributor) commented Aug 18, 2024

What does this PR do?

Fixes #17544

Hi @awaelchli, could you please verify the changes I made? If they are correct, I will also take up and fix any failing tests.

Before submitting
  • Was this discussed/agreed via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

📚 Documentation preview 📚: https://pytorch-lightning--20211.org.readthedocs.build/en/20211/

@github-actions bot added the "pl (Generic label for PyTorch Lightning package)" label on Aug 18, 2024

codecov bot commented Aug 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81%. Comparing base (5be58f6) to head (3c48c9e).
Report is 14 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (5be58f6) and HEAD (3c48c9e): HEAD has 553 fewer uploads than BASE.

Flag                 BASE (5be58f6)   HEAD (3c48c9e)
cpu                  147              21
lightning            106              16
pytest               87               2
python3.9            43               6
python3.10           42               6
lightning_fabric     25               0
gpu                  4                2
python3.11           42               6
python3.12           20               3
pytorch2.1           38               12
pytest-full          64               21
pytorch2.3           9                3
pytorch_lightning    20               7
pytorch2.2           9                3
pytorch2.4           8                3
Additional details and impacted files
@@            Coverage Diff            @@
##           master   #20211     +/-   ##
=========================================
- Coverage      89%      81%     -8%     
=========================================
  Files         267      264      -3     
  Lines       23084    23029     -55     
=========================================
- Hits        20585    18618   -1967     
- Misses       2499     4411   +1912     

@01AbhiSingh (Contributor, Author)

Hi @Borda, do I need to make any changes to the PR?

@lantiga (Collaborator) commented Oct 7, 2024

This looks good, thank you for the contribution @01AbhiSingh

Ideally we could add a test to verify the behavior described in #17544. The current test suite doesn't detect this change, which is usually a sign of insufficient coverage. Would you be willing to contribute such a test?

@01AbhiSingh (Contributor, Author)

Yes, sure let me look into it.

@01AbhiSingh (Contributor, Author)

Hi @lantiga, do you want a new test written from scratch, or should I make the necessary changes in a preexisting file? All the tests currently pass, so I can't tell which existing test would need to change; if the changes should go into a preexisting file, it would be very helpful if you could point me to the specific test.

@lantiga (Collaborator) commented Nov 12, 2024

hey @01AbhiSingh sorry for the wait

You can take inspiration from:

def test_lr_scheduler_epoch_step_frequency(mocked_sched, check_val_every_n_epoch, tmp_path):

and add a new test where scheduling goes across epoch boundaries. Maybe @falckt can help too?
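To make "scheduling across epoch boundaries" concrete, here is a small worked example (the numbers are illustrative and happen to match the test posted later in this thread): with interval="step" and frequency=5, a run of 7 batches per epoch for 3 epochs should carry the scheduler's step counter across epochs, so it fires after global steps 5, 10, 15 and 20.

batches_per_epoch, max_epochs, frequency = 7, 3, 5   # illustrative values
total_steps = batches_per_epoch * max_epochs         # 21 global training steps
expected_calls = total_steps // frequency            # 21 // 5 == 4 scheduler steps
# If the counter reset at every epoch boundary instead, the scheduler would
# fire only 7 // 5 == 1 time per epoch, i.e. 3 times over the whole run.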

@lantiga added the "waiting on author (Waiting on user action, correction, or update)" label on Nov 12, 2024
@lantiga (Collaborator) commented Nov 19, 2024

Hey @01AbhiSingh will you have time to add a test?

@01AbhiSingh (Contributor, Author)

> Hey @01AbhiSingh will you have time to add a test?

Yes. Today is my last exam; after that I will add the test ASAP. Sorry for the delay.

@lantiga (Collaborator) commented Nov 25, 2024

@01AbhiSingh just checking, no pressure :-)

@01AbhiSingh (Contributor, Author)

Hi @lantiga, please check whether this is correct. What changes do I need to make to this code?

def test_lr_scheduler_step_across_epoch_boundaries(mocked_sched, tmp_path):
    class StepAcrossEpochsModel(LightningModule):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(32, 2)

        def forward(self, x):
            return self.layer(x)

        def training_step(self, batch, batch_idx):
            return {"loss": torch.tensor(0.1, requires_grad=True)}

        def configure_optimizers(self):
            optimizer = torch.optim.SGD(self.layer.parameters(), lr=0.1)
            scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
            return {
                "optimizer": optimizer,
                "lr_scheduler": {
                    "scheduler": scheduler,
                    "interval": "step",
                    "frequency": 5,  # Scheduler steps every 5 iterations
                },
            }

    model = StepAcrossEpochsModel()

    # Trainer configuration for cross-epoch testing
    trainer = Trainer(
        default_root_dir=tmp_path,
        limit_train_batches=7,  # More than `frequency` iterations per epoch
        max_epochs=3,  # Test across multiple epochs
    )

    # Fit the model
    trainer.fit(model)

    # Calculate the total number of steps (iterations) and expected scheduler calls
    total_steps = 7 * 3  # Total iterations (7 batches per epoch * 3 epochs)
    expected_steps = total_steps // 5  # Scheduler steps every 5 iterations

    # Assert that the scheduler was called the expected number of times
    assert mocked_sched.call_count == expected_steps

@lantiga (Collaborator) commented Dec 6, 2024

@01AbhiSingh the test looks good! feel free to add it to test_optimizers.py and we should be ok

@01AbhiSingh (Contributor, Author)

Done. Sorry, I was bogged down by a few things. But a lot of new PRs for all those issues are coming your way now.

@lantiga (Collaborator) commented Dec 8, 2024

There's an issue with mocked_sched, which is undefined in the test; other than that, things look good.

@01AbhiSingh (Contributor, Author)

> There's an issue with mocked_sched, which is undefined in the test; other than that, things look good.

Do you mean I need to add a fixture to the test function's parameters?

@pytest.fixture
def mocked_sched(mocker):
    return mocker.patch('torch.optim.lr_scheduler.StepLR.step')
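(As an aside: the mocker argument in this fixture comes from the pytest-mock plugin, and pytest injects mocked_sched into the test purely by matching the parameter name, so no decorator is needed with this fixture-based approach.)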

@lantiga (Collaborator) commented Dec 11, 2024

That's right, you can also decorate the test with

from unittest.mock import patch

...

@patch("torch.optim.lr_scheduler.StepLR.step")
def test_lr_scheduler_step_across_epoch_boundaries(mocked_sched, tmp_path):
    ...

this should work
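(One detail worth noting about the decorator approach: unittest.mock.patch passes the created mock to the test function as the first positional argument, so mocked_sched must come before pytest fixtures such as tmp_path in the signature, exactly as written above; the test can then inspect mocked_sched.call_count for the assertion.)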

@01AbhiSingh (Contributor, Author)

Done, please check.

@lantiga (Collaborator) commented Dec 11, 2024

Hey @01AbhiSingh can you import LightningModule here?

https://github.com/Lightning-AI/pytorch-lightning/pull/20211/files#diff-3c3f104dbdd06271c9e6e6d4fdf61398458148412401dd55a9bac1e9b5f913a8R19

Change:

from lightning.pytorch import Trainer

to

from lightning.pytorch import Trainer, LightningModule

this should fix the failing test
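(Presumably the failing test was raising a NameError because the new test references LightningModule while the file only imported Trainer; widening the existing import as shown above addresses that.)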
