[RFC] Remove {running,accumulated}_loss #9372
Comments
@carmocca Nothing yet, but I just created PR Lightning-AI/torchmetrics#506 in torchmetrics, which implements simple aggregation metrics (sum, mean, max, min, cat) :]
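As a rough illustration of how those aggregation metrics could stand in for the running loss, here is a minimal sketch assuming the `MeanMetric` proposed in that PR; note it aggregates over all values seen so far rather than over a fixed window:

```python
import torch
from torchmetrics import MeanMetric  # proposed in Lightning-AI/torchmetrics#506

# Cumulative mean of a stream of loss values (no fixed window)
mean_loss = MeanMetric()
for step_loss in (0.9, 0.7, 0.65):
    mean_loss.update(torch.tensor(step_loss))

print(mean_loss.compute())  # tensor(0.7500)
```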
+1 I'm in favor of this! Good find @carmocca!
I agree about the window accumulation for the regular loss; it's not really needed, and the value isn't configurable anyway. What will we do with the loss accumulation for gradient-accumulation phases? Will you remove that too?
Yes, sounds like a great idea!
Almost all users (if not all?) are already logging the loss explicitly to include it in the loggers.
The progress bar and the concept of automatic optimization don't need to be linked like this. It also raises the question: "what about manual optimization? do I need to return the loss there too?" This goes with the theme of avoiding multiple ways of doing the same thing.
It does take care of it already by getting values from
Yes, the point is to show in the progress bar the same thing users will see when they open TensorBoard, whatever that is.
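For context, the proposed alternative looks like this in user code; a minimal sketch assuming the standard `LightningModule` APIs:

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.mse_loss(self.layer(x), y)
        # A single logged value drives both the logger (e.g. TensorBoard)
        # and the progress bar, so the two always agree.
        self.log("loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)
```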
I can take a look at this issue if no one has started yet.
After some offline discussion, we decided to split this into separate PRs:
The main arguments for (2) are:
Now, if (2) is approved, there are things we could do to improve the experience:
I think an info message is not required. If the user chooses not to show it in the progress bar, it seems reasonable to assume they don't want it.
Proposed refactoring or deprecation
Remove the following code: a979944
Motivation
The running loss is a running window of loss values returned by the `training_step`. It has been present since the very beginning of Lightning and has become legacy code.

Problems:

- Users don't know it's a running window of the `loss` value; they think it's the value they `self.log`ged.
- `self.log`ging their actual loss makes users see two "loss" values in the progress bar.
- Hiding it requires overriding the `get_progress_bar_dict` hook, which is inconvenient (see the sketch after this list).
- The window size is not configurable; it is hard-coded in `TrainingBatchLoop.__init__`.
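To make the inconvenience concrete, this is roughly the override users need today to hide the running loss; a minimal sketch assuming the progress bar key is "loss":

```python
import pytorch_lightning as pl

class MyModel(pl.LightningModule):
    def get_progress_bar_dict(self):
        # Start from Lightning's defaults, which include the running "loss"
        items = super().get_progress_bar_dict()
        # Drop the automatic running-window value; self.log'ed entries remain
        items.pop("loss", None)
        return items
```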
Alternative:
self.log("loss", loss, prog_bar=True)
torchmetrics.Metric
specialized for it. (is there a Metric to replace theTensorRunningAccum
already? cc @justusschock @awaelchli @akihironitta @rohitgr7 @SeanNaren @kaushikb11 @SkafteNicki)Pitch
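For reference, torchmetrics does not ship a windowed aggregate today, so a drop-in replacement would be a small custom `Metric`; a minimal sketch (the `RunningMean` name and `window` parameter are hypothetical, and per-process windows are only synced at `compute` time):

```python
import torch
from torchmetrics import Metric

class RunningMean(Metric):
    """Mean over the last `window` values, mimicking TensorRunningAccum."""

    def __init__(self, window: int = 20):
        super().__init__()
        self.window = window
        # list state; "cat" concatenates values across processes on sync
        self.add_state("values", default=[], dist_reduce_fx="cat")

    def update(self, value: torch.Tensor) -> None:
        self.values.append(value.detach())
        # keep only the most recent `window` entries
        self.values = self.values[-self.window:]

    def compute(self) -> torch.Tensor:
        # after distributed sync, the list state arrives as a single tensor
        vals = torch.stack(self.values) if isinstance(self.values, list) else self.values
        return vals.mean()
```

A `LightningModule` could update such a metric in `training_step` and pass it to `self.log(..., prog_bar=True)` like any other metric.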
Pitch

Remove the code; I don't think there's anything to deprecate here.

- `get_progress_bar_dict` stays for the `v_num` and `split_idx`.
- The `TrainingBatchLoop.{accumulated,running}_loss` attributes should be private.
- The `FitLoop.running_loss` property seems to be there only for the `Tuner` and could be considered private: https://grep.app/search?q=fit_loop.running_loss
- Usages of `TensorRunningAccum`: https://grep.app/search?q=TensorRunningAccum

cc @awaelchli @ananthsub