Align logged attributes for errors and run metadata in kill_loss_spike_callback.py #1494

Merged

Conversation

joyce-chen-uni
Contributor

  • Log the loss spike / high loss metrics to run metadata instead of a preformatted message (the message will be reconstructed in mapi)
  • Log to run metadata only when log_only is set, to avoid duplicating the message in the train_updated event and the failed_exception event
  • Add loss_window to the error attributes
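
A minimal sketch of the behavior described above, for illustration only. It is not the code from kill_loss_spike_callback.py: the names `LossSpikeError`, `report_loss_spike`, `run_metadata`, and the `"loss_spike"` key are hypothetical, and a plain dict stands in for whatever run metadata store the callback actually writes to.

```python
from typing import Any, Dict, List


class LossSpikeError(Exception):
    """Hypothetical error type that exposes the metrics as attributes."""

    def __init__(self, running_loss_avg: float, loss_window: List[float]) -> None:
        self.running_loss_avg = running_loss_avg
        # The recent loss values are carried on the error itself.
        self.loss_window = loss_window
        super().__init__(
            f"Loss spike detected (running average {running_loss_avg})."
        )


def report_loss_spike(
    run_metadata: Dict[str, Any],
    running_loss_avg: float,
    loss_window: List[float],
    log_only: bool,
) -> None:
    """Record a loss spike either in run metadata or as a raised error."""
    if log_only:
        # Log-only mode: store raw metrics, not a formatted message, so a
        # downstream consumer can build the message from them.
        run_metadata["loss_spike"] = {
            "running_loss_avg": running_loss_avg,
            "loss_window": list(loss_window),
        }
        return
    # Otherwise the same metrics travel on the error, and run metadata is
    # left untouched so the information is not reported twice.
    raise LossSpikeError(running_loss_avg, list(loss_window))
```

With `log_only=True` the metrics land in the metadata dict and training continues; with `log_only=False` nothing is written to metadata and the same values are available as attributes on the raised error.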

…ata if log_only, add loss_window to error attributes
@joyce-chen-uni joyce-chen-uni requested a review from a team as a code owner August 28, 2024 22:25
@joyce-chen-uni joyce-chen-uni changed the title Make error and run metadata logging consistent in kill_loss_spike_callback.py Align logging of metrics for errors and run metadata in kill_loss_spike_callback.py Aug 28, 2024
@joyce-chen-uni joyce-chen-uni changed the title Align logging of metrics for errors and run metadata in kill_loss_spike_callback.py Align logged attributes for errors and run metadata in kill_loss_spike_callback.py Aug 28, 2024
Contributor

@jjanezhang jjanezhang left a comment

lgtm, thank you :)

@joyce-chen-uni joyce-chen-uni force-pushed the joyce/update-metadata-logging branch from 0467865 to 56891fc Compare August 29, 2024 00:17
@joyce-chen-uni joyce-chen-uni merged commit 8516181 into mosaicml:main Aug 29, 2024
9 checks passed
@joyce-chen-uni joyce-chen-uni deleted the joyce/update-metadata-logging branch August 29, 2024 04:28
3 participants