How about torch.compile in TransformerEngine? #1241
Comments
For the moment we manually identify fusion opportunities and incorporate them into our modules, e.g. …
Yeah, but right now I am running in bfloat16. By adding torch.compile to RMSNorm and SwiGLU for Llama2-7B in legacy (non-TE) mode, I get more benefit than with TE: legacy mode with torch.compile brings about a 10% performance improvement over TE. So can I combine the benefits of TE and torch.compile? I think that could be even faster.
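For context, here is a minimal sketch of the kind of "legacy mode" setup being described: plain-PyTorch RMSNorm and SwiGLU modules compiled individually with torch.compile. The module definitions are illustrative (standard Llama-style implementations with Llama2-7B dimensions), not the poster's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    # Llama-style RMSNorm: scale by reciprocal root-mean-square, no bias.
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    # Llama-style gated MLP: silu(x @ W_gate) * (x @ W_up), then down-projection.
    def __init__(self, dim, hidden):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden, bias=False)
        self.w_up = nn.Linear(dim, hidden, bias=False)
        self.w_down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

# Compile each module individually; Inductor fuses the elementwise
# chains (pow/mean/rsqrt/mul and silu/mul) into a few Triton kernels.
norm = torch.compile(RMSNorm(4096).cuda().to(torch.bfloat16))
mlp = torch.compile(SwiGLU(4096, 11008).cuda().to(torch.bfloat16))
```

That kernel fusion of the elementwise chains is the most plausible source of the reported ~10% gain over the unfused eager versions.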
@timmoon10
@timmoon10 Yeah, I have now found that the performance of TE does not exceed that of non-TE + torch.compile on Llama2-7B. Apart from FP8 support, the FlashAttention (FA) calls are the same for TE and non-TE, and the linear layers in both paths go through cuBLASLt. So for the remaining parts, even though TE does some fusion, will TE's benefits exceed the improvement brought by torch.compile? Do you have any suggestions for how we could push performance even further?
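To make comparisons like this concrete, a rough CUDA-event timing harness can measure individual ops in isolation. This is a generic sketch, not tied to TE; the `rms_norm` function here is just an illustrative stand-in for whichever eager, compiled, or TE op is being compared.

```python
import torch

def bench(fn, *args, warmup=10, iters=100):
    # Rough CUDA timing helper: mean forward latency in milliseconds.
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

# Example: compare an eager op against its compiled version.
def rms_norm(x, w, eps=1e-6):
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * w

x = torch.randn(8, 2048, 4096, device="cuda", dtype=torch.bfloat16)
w = torch.ones(4096, device="cuda", dtype=torch.bfloat16)
compiled = torch.compile(rms_norm)
print(f"eager:    {bench(rms_norm, x, w):.3f} ms")
print(f"compiled: {bench(compiled, x, w):.3f} ms")
```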
In PyTorch, we know that torch.compile can bring significant performance benefits, and TransformerEngine also improves performance through strategies such as fused Transformer kernels. Does Transformer Engine support torch.compile as well? Is there any documentation on whether TE mode with torch.compile can achieve better results than non-TE mode?
Do you have any suggestions for using torch.compile with TransformerEngine?
In Llama2, we found that torch.compile yields good gains on RMSNorm and SwiGLU, but with TE it is not possible to directly apply torch.compile to RMSNorm and SwiGLU. Is there a good way to do this?
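One possible workaround, offered as an assumption rather than a documented TE feature: apply torch.compile selectively to the plain-PyTorch submodules (RMSNorm, SwiGLU) while leaving TE modules untouched, since TE modules may not trace cleanly under Dynamo. `RMSNorm` and `SwiGLU` in the usage comment are placeholders for your own classes.

```python
import torch
import torch.nn as nn

def compile_selected(model: nn.Module, targets) -> nn.Module:
    # Replace only submodules of the given types with torch.compile
    # wrappers, leaving everything else (e.g. TE layers) untouched.
    for name, module in list(model.named_modules()):
        if not name or not isinstance(module, targets):
            continue
        if "." in name:
            parent_name, child_name = name.rsplit(".", 1)
            parent = model.get_submodule(parent_name)
        else:
            parent, child_name = model, name
        setattr(parent, child_name, torch.compile(module))
    return model

# Usage (RMSNorm and SwiGLU are placeholders for your own classes):
# model = compile_selected(model, (RMSNorm, SwiGLU))
```

Whether this beats full non-TE + torch.compile will depend on how much of the runtime the TE-fused pieces cover, so measuring per module (as in the timing harness above) is probably the safest way to decide.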