
Code clean-ups #171

Merged
merged 19 commits into from
Jan 3, 2025

Conversation

vince62s
Contributor

No description provided.

@vince62s
Contributor Author

The main performance difference comes from using the context manager
with sdpa_kernel([SDPBackend.EFFICIENT_ATTENTION]): around the scaled_dot_product_attention() call.
The counterpart is a longer startup time, due to some recompilation.
Not sure which is best.

For the rest, some code clean-ups.

Before this PR, EuroLLM-9B fine-tuning of the estimator:

[2024-12-27 09:50:51,765 INFO] Step 10/ 4000; acc: 54.8; ppl: 24.84; xent: 3.21; aux: 0.599; lr: 3.33e-06; sents:    2560; bsz:  208/ 208/ 2; 1694/1694 tok/s;    157 sec;
[2024-12-27 09:51:39,617 INFO] Step 20/ 4000; acc: 55.0; ppl: 24.50; xent: 3.20; aux: 0.548; lr: 6.67e-06; sents:    2560; bsz:  210/ 210/ 2; 5625/5625 tok/s;    205 sec;
[2024-12-27 09:52:28,284 INFO] Step 30/ 4000; acc: 55.3; ppl: 23.62; xent: 3.16; aux: 0.451; lr: 1.00e-05; sents:    2560; bsz:  213/ 213/ 2; 5602/5602 tok/s;    253 sec;
[2024-12-27 09:53:17,060 INFO] Step 40/ 4000; acc: 55.2; ppl: 24.26; xent: 3.19; aux: 0.248; lr: 1.33e-05; sents:    2560; bsz:  211/ 211/ 2; 5545/5545 tok/s;    302 sec;
[2024-12-27 09:54:06,210 INFO] Step 50/ 4000; acc: 55.4; ppl: 23.51; xent: 3.16; aux: 0.110; lr: 1.67e-05; sents:    2560; bsz:  215/ 215/ 2; 5592/5592 tok/s;    351 sec;
[2024-12-27 09:54:54,319 INFO] Step 60/ 4000; acc: 54.8; ppl: 25.14; xent: 3.22; aux: 0.061; lr: 2.00e-05; sents:    2560; bsz:  206/ 206/ 2; 5481/5481 tok/s;    399 sec;

This PR:

[2024-12-27 10:20:55,554 INFO] Step 10/ 4000; acc: 54.8; ppl: 24.81; xent: 3.21; aux: 0.599; lr: 3.33e-06; sents:    2560; bsz:  208/ 208/ 2; 830/830 tok/s;    320 sec;
[2024-12-27 10:21:41,045 INFO] Step 20/ 4000; acc: 55.0; ppl: 24.47; xent: 3.20; aux: 0.548; lr: 6.67e-06; sents:    2560; bsz:  210/ 210/ 2; 5917/5917 tok/s;    366 sec;
[2024-12-27 10:22:27,125 INFO] Step 30/ 4000; acc: 55.3; ppl: 23.59; xent: 3.16; aux: 0.451; lr: 1.00e-05; sents:    2560; bsz:  213/ 213/ 2; 5916/5916 tok/s;    412 sec;
[2024-12-27 10:23:13,582 INFO] Step 40/ 4000; acc: 55.2; ppl: 24.22; xent: 3.19; aux: 0.248; lr: 1.33e-05; sents:    2560; bsz:  211/ 211/ 2; 5821/5821 tok/s;    458 sec;
[2024-12-27 10:24:00,311 INFO] Step 50/ 4000; acc: 55.3; ppl: 23.48; xent: 3.16; aux: 0.110; lr: 1.67e-05; sents:    2560; bsz:  215/ 215/ 2; 5881/5881 tok/s;    505 sec;
[2024-12-27 10:24:45,582 INFO] Step 60/ 4000; acc: 54.8; ppl: 25.10; xent: 3.22; aux: 0.061; lr: 2.00e-05; sents:    2560; bsz:  206/ 206/ 2; 5825/5825 tok/s;    550 sec;

Before this PR, Encoder-Decoder training:

[2024-12-27 09:59:18,407 INFO] Step 100/200000; acc: 13.6; ppl: 10780.07; xent: 9.29; aux: 0.000; lr: 6.72e-06; sents:  118220; bsz: 8506/10554/197; 26799/33251 tok/s;    190 sec;
[2024-12-27 10:01:04,435 INFO] Step 200/200000; acc: 17.8; ppl: 3006.98; xent: 8.01; aux: 0.000; lr: 1.34e-05; sents:  107813; bsz: 8515/10553/180; 48188/59718 tok/s;    296 sec;
[2024-12-27 10:02:51,181 INFO] Step 300/200000; acc: 19.5; ppl: 1181.01; xent: 7.07; aux: 0.000; lr: 2.02e-05; sents:  100017; bsz: 8583/10569/167; 48244/59408 tok/s;    403 sec;

This PR:

[2024-12-27 10:11:14,616 INFO] Step 100/200000; acc: 13.6; ppl: 10779.73; xent: 9.29; aux: 0.000; lr: 6.72e-06; sents:  118220; bsz: 8506/10554/197; 12455/15453 tok/s;    410 sec;
[2024-12-27 10:12:59,324 INFO] Step 200/200000; acc: 17.8; ppl: 3007.03; xent: 8.01; aux: 0.000; lr: 1.34e-05; sents:  107813; bsz: 8515/10553/180; 48796/60471 tok/s;    514 sec;
[2024-12-27 10:14:44,983 INFO] Step 300/200000; acc: 19.5; ppl: 1184.54; xent: 7.08; aux: 0.000; lr: 2.02e-05; sents:  100017; bsz: 8583/10569/167; 48740/60020 tok/s;    620 sec;

@vince62s changed the title from "misc optimization" to "Code clean-ups" on Dec 27, 2024
Member

@francoishernandez left a comment


Nice clean-up for the new year!
Quite a few comments, not necessarily all relevant.

Also, do we know where the main diffs in the test outputs come from? Is it just the attention-backend changes, or maybe a few numerical differences resulting from the modified operations here and there?

Review comments were left on:

.github/workflows/push.yml
eole/decoders/ensemble.py
eole/decoders/transformer_decoder.py
eole/decoders/transformer_decoder.py (outdated)
eole/decoders/transformer_lm_decoder.py (outdated)
eole/predict/inference.py
eole/predict/inference.py
eole/predict/translator.py
eole/tests/test_model_lm/config.json
eole/train_single.py
@vince62s vince62s merged commit 8a8987f into eole-nlp:main Jan 3, 2025
2 checks passed