Lightning Lite Examples #9987
Conversation
The design document isn't visible to people outside of grid.ai, so it's hard to know the context for this.
Hey @ananthsub. Find the document here: https://docs.google.com/document/d/1b10LMNqnv1ellVTAEIlJFV5KvBuxIlFCTnNB3SYIFok/edit#heading=h.jl44rslqge7e. Best,
Co-authored-by: Jirka Borovec <[email protected]>
from collections.abc import Sized
from typing import Union

def __len__(self) -> Union[int, float]:
    # Return the wrapped dataloader's length if it defines one; otherwise treat it as unbounded.
    if isinstance(self._dataloader, Sized):
        return len(self._dataloader)
    return float("inf")
This does not belong in this PR. Why did we add this?
Needs to be addressed in #10297
What does this PR do?
This is V1 of the new Lightning Lite package. It bundles all major changes together, but the work will be split into individual PRs for merging (e.g., #10175, #10176).
Planned to be released as part of 1.5.
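For readers without access to the design document, here is a minimal sketch of the usage pattern this PR is aiming for. The class name, method names, and constructor arguments are assumptions based on the description and TODO list in this PR, not the final API.

import torch
from torch.utils.data import DataLoader, TensorDataset

from pytorch_lightning.lite import LightningLite  # assumed import path


class Lite(LightningLite):
    def run(self, dataset):
        model = torch.nn.Linear(32, 2)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

        # setup() is assumed to move model/optimizer to the right device and
        # wrap them for the selected strategy (see the TODO list below).
        model, optimizer = self.setup(model, optimizer)
        dataloader = self.setup_dataloaders(DataLoader(dataset, batch_size=4))

        model.train()
        for batch, target in dataloader:
            optimizer.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(batch), target)
            # Backward goes through Lite so strategies like DeepSpeed can intercept it.
            self.backward(loss)
            optimizer.step()


dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,)))
# The constructor mirrors the accelerator/strategy/devices pattern discussed below.
Lite(accelerator="cpu", devices=1).run(dataset)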
Demo
TODOs
Precision support
Plugin support
Move data to device automatically
Move model to device automatically
Allow only one model per setup() call
DataLoader setup: Currently, there is no distributed sampler (see the sketch after this list).
Resolve miscellaneous TODOs in the code base
Fix changes that broke Lightning tests
Make self.setup() take model and optimizers positionally.
Unit testing, parity tests
Typing (mypy)
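As a reference for the DataLoader/distributed-sampler item above, here is a rough sketch of the kind of re-instantiation a setup_dataloaders-style helper could perform. The helper name and the set of attributes copied over are assumptions for illustration, not the code in this PR.

from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler


def _reinstantiate_with_distributed_sampler(dataloader: DataLoader, rank: int, world_size: int) -> DataLoader:
    # Hypothetical helper: rebuild the dataloader with a DistributedSampler so each
    # process sees a distinct shard of the dataset; shuffling moves into the sampler.
    sampler = DistributedSampler(dataloader.dataset, num_replicas=world_size, rank=rank, shuffle=True)
    return DataLoader(
        dataloader.dataset,
        batch_size=dataloader.batch_size,
        sampler=sampler,
        num_workers=dataloader.num_workers,
        collate_fn=dataloader.collate_fn,
        pin_memory=dataloader.pin_memory,
        drop_last=dataloader.drop_last,
    )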
Discussions
LightningLite constructor arguments: We are currently changing the Trainer constructor arguments to support a new pattern:
Trainer(accelerator="cpu/tpu/gpu", strategy="ddp/deepspeed/...", devices=X)
Should we start promoting this directly in the LightningLite API?
DeepSpeed API for backward: The user can't call loss.backward(); it needs to be called on the model. Which API do we want to offer? In both cases, the user needs to change their code if they switch from one strategy to the next.
DeepSpeed API for optimization step: Plain DeepSpeed requires a call to model.step() as opposed to the usual optimizer.step(). Since we wrap the user's optimizers anyway, we could still offer optimizer.step() and redirect to model.step(). It would mean fewer code changes for users switching between plugins, but it might be confusing for DeepSpeed users! (A rough sketch of this redirection is at the end of this page.)
Related work:
Part of #1 (it's a lie, this is just here to avoid noisy GitHub bot)
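To make the DeepSpeed discussion points above more concrete, here is a rough sketch of how a wrapped optimizer could hide the engine-specific calls. The wrapper class and the way the engine object is passed in are assumptions for illustration, not code from this PR; DeepSpeed's engine does expose backward(loss) and step(), which is what the redirection would target.

from typing import Any, Optional

import torch


class _LiteOptimizer:
    # Hypothetical wrapper that Lite could return from setup() instead of the raw optimizer.
    def __init__(self, optimizer: torch.optim.Optimizer, engine: Optional[Any] = None) -> None:
        self.optimizer = optimizer
        self._engine = engine  # e.g. a deepspeed.DeepSpeedEngine when that strategy is active

    def step(self, *args: Any, **kwargs: Any) -> None:
        if self._engine is not None:
            # DeepSpeed owns the optimization step (and gradient handling), so redirect to it.
            self._engine.step()
        else:
            self.optimizer.step(*args, **kwargs)

    def __getattr__(self, name: str) -> Any:
        # Fall through to the wrapped optimizer for zero_grad(), state_dict(), etc.
        return getattr(self.optimizer, name)

With such a wrapper, user code keeps calling optimizer.step() under every strategy; the open question is whether hiding model.step() this way helps or confuses DeepSpeed users. The same idea applies to backward: a self.backward(loss) hook can call engine.backward(loss) under DeepSpeed and plain loss.backward() otherwise.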