[memory] Avoid storing trainer in ModelCardCallback and SentenceTransformerModelCardData #3144

Merged
merged 1 commit into from
Jan 6, 2025

tomaarsen
Collaborator

Resolves #3136

Hello!

Pull Request overview

  • Avoid storing trainer in ModelCardCallback and SentenceTransformerModelCardData

Details

Storing the trainer seems to prevent cleanup, because it creates a reference cycle: trainer -> model -> model card data -> trainer. This means that once the trainer and model get overridden (e.g. in https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/data_augmentation/train_sts_seed_optimization.py), the old model, trainer, and model card data are not automatically garbage collected.
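To illustrate the cycle described above, here is a minimal sketch using toy classes. `Model`, `Trainer`, and `ModelCardData` below are hypothetical stand-ins, not the real sentence-transformers classes. It checks whether dropping all variable names is enough for reference counting alone to free the model. CPython's cycle collector can still reclaim such cycles later, but only when it happens to run, so the objects can linger in the meantime.

```python
import gc
import weakref


class ModelCardData:
    def __init__(self):
        self.trainer = None  # back-reference that closes the cycle


class Model:
    def __init__(self):
        self.model_card_data = ModelCardData()


class Trainer:
    def __init__(self, model, store_back_reference):
        self.model = model
        if store_back_reference:
            # trainer -> model -> model card data -> trainer
            model.model_card_data.trainer = self


def freed_without_gc(store_back_reference):
    """Return True if deleting the names frees the model via refcounting alone."""
    gc.disable()  # keep the cycle collector from running during the check
    try:
        model = Model()
        trainer = Trainer(model, store_back_reference)
        probe = weakref.ref(model)  # observes the model without keeping it alive
        del model, trainer
        return probe() is None
    finally:
        gc.enable()
        gc.collect()  # clean up whatever the cycle left behind


cycle_freed = freed_without_gc(store_back_reference=True)
no_cycle_freed = freed_without_gc(store_back_reference=False)
print(cycle_freed, no_cycle_freed)  # False True
```

With the back-reference in place, the old objects survive until a garbage collection pass. Without it, they are released as soon as the last name goes away.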

I've moved a lot of components around, and now neither ModelCardCallback nor SentenceTransformerModelCardData needs to store the Trainer. Although annoying, this does mean that memory should be cleared if the model/trainer gets overridden/deleted.
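As a rough sketch of the general pattern (not the actual changes in this PR), a callback can receive the trainer as an argument and copy out only the plain values it needs, rather than keeping the trainer as an attribute. All class and method names below are hypothetical toy versions.

```python
import gc
import weakref


class CardData:
    """Stores only plain values copied from the trainer, never the trainer itself."""

    def __init__(self):
        self.hyperparameters = {}

    def record(self, trainer):
        # Copy what the model card needs; do not keep `trainer` afterwards.
        self.hyperparameters = dict(trainer.args)


class CardCallback:
    def on_train_begin(self, trainer, model):
        # The trainer is passed in per call rather than stored on `self`.
        model.card_data.record(trainer)


class Model:
    def __init__(self):
        self.card_data = CardData()


class Trainer:
    def __init__(self, model, args):
        self.model = model
        self.args = args
        self.callbacks = [CardCallback()]

    def train(self):
        for callback in self.callbacks:
            callback.on_train_begin(self, self.model)


gc.disable()  # rely on reference counting only for this check
trainer = Trainer(Model(), {"seed": 0})
trainer.train()
card = trainer.model.card_data
probe = weakref.ref(trainer)
del trainer
trainer_freed = probe() is None
gc.enable()
print(trainer_freed, card.hyperparameters)  # True {'seed': 0}
```

Because nothing reachable from the model points back at the trainer, the trainer is released immediately, while the card data keeps the values it copied.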

Before:

Approximate highest recorded VRAM during train_sts_seed_optimization:

16332MiB /  24576MiB

After:

Approximate highest recorded VRAM during train_sts_seed_optimization:

8222MiB /  24576MiB

Note that VRAM usage still grows, albeit a lot more slowly, so this might not have resolved every leak. Having said that, because most people only create one trainer, I suspect it's not that big of an issue.
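For anyone who wants to track the remaining growth between runs, a small helper along these lines could be called after each trainer is discarded. This is only a sketch: the helper name is made up, and it assumes PyTorch is installed. Note that `torch.cuda.max_memory_allocated()` reports memory held by tensors, which is not the same measurement as the per-process totals quoted above, so the numbers will not match exactly.

```python
import gc

try:
    import torch
except ImportError:  # torch is optional for this sketch
    torch = None


def release_and_report_peak_mib():
    """Collect garbage, then return the peak allocated CUDA memory in MiB.

    Returns None when torch or CUDA is unavailable.
    """
    gc.collect()  # break any remaining reference cycles first
    if torch is None or not torch.cuda.is_available():
        return None
    peak_mib = torch.cuda.max_memory_allocated() / (1024 ** 2)
    torch.cuda.empty_cache()  # hand cached, unused blocks back to the driver
    torch.cuda.reset_peak_memory_stats()  # start a fresh peak for the next run
    return peak_mib


peak = release_and_report_peak_mib()
print(peak)
```

Calling it once per loop iteration in a script that repeatedly recreates the model and trainer should show whether the peak keeps climbing.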

  • Tom Aarsen

Commit message: Avoid storing trainer in ModelCardCallback and SentenceTransformerModelCardData

This prevents a proper cleanup
@tomaarsen tomaarsen merged commit a41aada into UKPLab:master Jan 6, 2025
9 checks passed
Successfully merging this pull request may close these issues.

Memory leaked when the model and trainer were reinitialized