[bugfix] Resolve memory not logged when missing metrics #8174
Since we're already in the trainer, why do we have to log through the lightning module's `log`?
Why did this PR need to add `_gpu_metrics`? It doesn't seem related to the linked issue. This means that GPU metrics are now duplicated in this dictionary and in the logged metrics.
Also, it only gets filled when `self.log_gpu_memory` is set, so it can't be used without the flag anyway.
I think this broke the `log_gpu_memory="min_max"` option in `Trainer.fit`. Looks related to: #9010
Essentially, `memory.get_memory_profile` adds two keys that are not in the conventional format, `min_gpu_mem` and `max_gpu_mem` (typically keys are in the form `f"gpu_id: {gpu_id}/memory.used (MB)"`).
I see #9013 fixed it.
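For anyone reading this thread later, here is a minimal sketch of the key-format mismatch described above. It is not the Lightning implementation: `per_gpu_memory`, `min_max_memory`, and `unconventional_keys` are hypothetical helpers, and the assumption that the two summary values are the minimum and maximum of the per-GPU readings is mine. Only the key names and the per-GPU format string come from the comment.

```python
import re

# Hypothetical helpers for illustration only; not PyTorch Lightning code.
# The per-GPU key format and the two summary key names are taken from the
# review comment; everything else is an assumption.
CONVENTIONAL_KEY = re.compile(r"^gpu_id: \d+/memory\.used \(MB\)$")


def per_gpu_memory(used_mb):
    """Build one entry per GPU using the conventional key format."""
    return {
        f"gpu_id: {gpu_id}/memory.used (MB)": float(mb)
        for gpu_id, mb in enumerate(used_mb)
    }


def min_max_memory(used_mb):
    """Build the two summary entries (assumed: min and max over all GPUs)."""
    return {"min_gpu_mem": float(min(used_mb)), "max_gpu_mem": float(max(used_mb))}


def unconventional_keys(metrics):
    """Return the keys that do not follow the per-GPU naming convention."""
    return sorted(key for key in metrics if not CONVENTIONAL_KEY.match(key))


used = [1024, 2048]
print(unconventional_keys(per_gpu_memory(used)))  # []
print(unconventional_keys(min_max_memory(used)))  # ['max_gpu_mem', 'min_gpu_mem']
```

Any code path that assumes every GPU metric key matches the per-GPU pattern would skip or mishandle the two summary keys, which is one plausible way the `"min_max"` option could break.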