PyTorch Hugging Face Models do not have ACL calls for Docker versions > 23.05 #200

Open
abhishek-rn opened this issue Sep 19, 2023 · 3 comments

Comments

@abhishek-rn

Hi,

Docker Tags:
r23.09-torch-2.0.0-onednn-acl
r23.05-torch-2.0.0-onednn-acl

I am unable to get ACL calls in Docker versions later than 23.05 for PyTorch Hugging Face models.

Attaching the oneDNN verbose logs for the BERT model here:
23.05_Bert_Verbose.txt
23.09_Bert_Verbose.txt
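To compare logs like these without reading them line by line, a small parser can tally which implementation each executed primitive used. This is a minimal sketch that assumes the comma-separated oneDNN verbose layout, in which the engine, primitive kind and implementation name follow the `exec` field. The function names are hypothetical, and the sample lines are invented for illustration; they are not taken from the attached logs.

```python
from collections import Counter


def parse_exec_line(line):
    """Return (primitive_kind, implementation) for a oneDNN verbose 'exec' line, else None.

    Assumes the field after 'exec' is the engine (e.g. 'cpu'), followed by the
    primitive kind (e.g. 'matmul') and the implementation name (e.g. 'gemm:acl').
    Locating 'exec' by name tolerates optional extra fields before it.
    """
    fields = line.strip().split(",")
    if not fields[0].endswith("_verbose") or "exec" not in fields:
        return None
    i = fields.index("exec")
    if len(fields) < i + 4:
        return None
    return fields[i + 2], fields[i + 3]


def count_implementations(log_text):
    """Count how often each (primitive kind, implementation) pair was executed."""
    counts = Counter()
    for line in log_text.splitlines():
        parsed = parse_exec_line(line)
        if parsed is not None:
            counts[parsed] += 1
    return counts


# Invented sample lines (hypothetical shapes and timings), for illustration only.
SAMPLE_LOG = "\n".join([
    "onednn_verbose,info,oneDNN v3.1.0",
    "onednn_verbose,exec,cpu,matmul,gemm:acl,undef,src_f32::blocked:abc:f0 wei_f32::blocked:abc:f0 dst_f32::blocked:abc:f0,,,1x128x768:1x768x768:1x128x768,0.52",
    "onednn_verbose,exec,cpu,matmul,gemm:jit,undef,src_f32::blocked:abc:f0 wei_f32::blocked:abc:f0 dst_f32::blocked:abc:f0,,,1x128x768:1x768x768:1x128x768,1.31",
    "onednn_verbose,exec,cpu,matmul,gemm:acl,undef,src_f32::blocked:abc:f0 wei_f32::blocked:abc:f0 dst_f32::blocked:abc:f0,,,1x128x768:1x768x768:1x128x768,0.49",
])

for (kind, impl), n in sorted(count_implementations(SAMPLE_LOG).items()):
    print(f"{kind:10s} {impl:12s} {n}")
```

Run against the 23.05 and 23.09 logs, the `(matmul, gemm:acl)` and `(matmul, gemm:jit)` counts would make the difference between the two images explicit.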

The code to reproduce this is attached below:
PyT_Bert_Training.txt: use this for the first run to generate the necessary inference checkpoints and files.
PyT_Bert_Inf.txt: use this for subsequent runs to generate the oneDNN logs.
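The attached scripts are not reproduced here. As a hedged sketch of the logging step only: oneDNN prints its verbose trace to stdout when the `ONEDNN_VERBOSE` environment variable is set (`DNNL_VERBOSE` is the older spelling), so one way to capture a log is to launch the inference script as a child process with that variable already set. The helper name `run_with_onednn_verbose` is hypothetical.

```python
import os
import subprocess
import sys


def run_with_onednn_verbose(cmd, log_path, level="1"):
    """Run `cmd` with oneDNN verbose logging enabled and save its stdout to `log_path`.

    The variable is placed in the child's environment before it starts, so it is
    visible by the time oneDNN first reads it.
    """
    env = dict(os.environ)
    env["ONEDNN_VERBOSE"] = level
    # check=True raises CalledProcessError if the script fails, rather than
    # silently saving a partial log.
    result = subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
    with open(log_path, "w") as f:
        f.write(result.stdout)
    return result.stdout
```

For example, `run_with_onednn_verbose([sys.executable, "bert_inference.py"], "bert_verbose.txt")`, where both file names are placeholders for the inference script and the desired log file.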

As a result, the oneDNN verbose output for the later version shows gemm:jit calls for MatMuls instead of gemm:acl calls, which makes inference slower.
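One way to quantify the performance gap is to sum the execution times that oneDNN reports for matmul primitives, grouped by implementation, in each log. This is a minimal sketch that assumes each verbose `exec` line ends with the execution time in milliseconds and that the implementation name follows the primitive kind; the function name is hypothetical.

```python
from collections import defaultdict


def matmul_time_by_impl(log_text):
    """Sum reported matmul execution times (ms) per implementation in a oneDNN verbose log."""
    totals = defaultdict(float)
    for line in log_text.splitlines():
        fields = line.strip().split(",")
        # Only consider oneDNN verbose 'exec' lines.
        if not fields[0].endswith("_verbose") or "exec" not in fields:
            continue
        i = fields.index("exec")
        # Fields after 'exec': engine, primitive kind, implementation, ...
        if len(fields) < i + 5 or fields[i + 2] != "matmul":
            continue
        try:
            totals[fields[i + 3]] += float(fields[-1])
        except ValueError:
            continue  # skip lines whose last field is not a numeric time
    return dict(totals)
```

Comparing the `gemm:acl` and `gemm:jit` totals from the 23.05 and 23.09 logs would show how much matmul time moved onto the JIT path.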

Thanks

@nSircombe
Contributor

Hi @abhishek-rn
Thanks for the report. This transition from 23.05 to 23.06 marks the move from PyTorch 1.x to 2.x, so it looks like we may have lost some functionality at this stage.
Would you be able to confirm whether the same behaviour occurs with the pip-installed PyTorch 1.3 and 2.0 packages on AArch64, and also on x86?
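When comparing pip wheels and Docker images across machines, it helps to record exactly which build each log came from. A minimal sketch using `platform` together with PyTorch's `torch.__version__`, `torch.backends.mkldnn.is_available()` and `torch.__config__.show()`; the function name is hypothetical, and it still reports the machine details if PyTorch is not installed.

```python
import importlib.util
import platform


def describe_environment():
    """Collect machine and PyTorch build details to attach alongside a verbose log."""
    info = {"machine": platform.machine(), "python": platform.python_version()}
    if importlib.util.find_spec("torch") is None:
        info["torch"] = None  # PyTorch is not installed in this interpreter
        return info
    import torch

    info["torch"] = torch.__version__
    # True if this PyTorch build includes oneDNN (exposed under its older name, mkldnn).
    info["onednn_available"] = torch.backends.mkldnn.is_available()
    # Multi-line build configuration summary, typically including the oneDNN version.
    info["build_config"] = torch.__config__.show()
    return info


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```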

@abhishek-rn
Author

Hi @nSircombe
The Docker tag reads r23.05-torch-2.0.0-onednn-acl, so I took that to mean torch 2.0.0.
Nevertheless, I ran the pip-installed PyTorch 2.0.0 and 1.13 packages; please find the logs below:
ARM_PyT_1.13_Bert_Verbose.txt
ARM_PyT_2.0.0_Bert_Verbose.txt

These results show that PyTorch 1.13 makes no ACL calls, whereas PyTorch 2.0.0 does.

x86_PyT_1.13_Bert_Verbose.txt
x86_PyT_2.0.0_Bert_Verbose.txt

Also, as the logs above show, x86 PyTorch makes no oneDNN calls for MatMuls.
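With four logs to compare (two PyTorch versions on each architecture), a one-line verdict per file makes the pattern easier to see. This is a hypothetical helper that reports, for each log, whether any matmul primitive went through oneDNN and whether an ACL implementation was used. It assumes the comma-separated oneDNN verbose layout in which the implementation name follows the primitive kind, and it only looks at the `matmul` primitive kind.

```python
def matmul_impls(log_text):
    """Return the set of implementations used by oneDNN matmul primitives in a verbose log."""
    impls = set()
    for line in log_text.splitlines():
        fields = line.strip().split(",")
        if fields[0].endswith("_verbose") and "exec" in fields:
            i = fields.index("exec")
            # Fields after 'exec': engine, primitive kind, implementation, ...
            if len(fields) > i + 3 and fields[i + 2] == "matmul":
                impls.add(fields[i + 3])
    return impls


def summarize(logs):
    """Map each log name to a short verdict on how its matmuls were dispatched."""
    summary = {}
    for name, text in logs.items():
        impls = matmul_impls(text)
        if not impls:
            summary[name] = "no oneDNN matmul calls"
        elif any("acl" in impl for impl in impls):
            summary[name] = "ACL: " + ", ".join(sorted(impls))
        else:
            summary[name] = "oneDNN without ACL: " + ", ".join(sorted(impls))
    return summary
```

For example, passing `{name: open(name).read() for name in ["ARM_PyT_1.13_Bert_Verbose.txt", "ARM_PyT_2.0.0_Bert_Verbose.txt"]}` would give one verdict per attached file. A log in which PyTorch handled the matmuls itself, as on x86 here, maps to "no oneDNN matmul calls".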

@nSircombe
Contributor

Yes, you're right, the version is 2.0. The tag is correct and matches the version in the Dockerfile. The mistake is in the README for the 23.05 increment, which still lists 1.3.
