ppc64le support #2921
It's hard for us to make progress on this because our team doesn't have any ppc64le hardware that can be used for development and testing.
It seems the new manylinux2014 docker images can help us solve this.
@snnn, we were able to build onnxruntime for ppc64le using the changes here: cms-externals#4, but some of our tests failed to produce identical results, and one of the onnxruntime tests also failed to run. @mrodozov, do you remember which test was failing? Have you tried using proot and qemu to emulate PowerPC? We use them to install ppc64le rpm packages on our x86_64 servers.
@tracysh Could you please take a look at cms-externals#4?
When I run this test I get mismatches, but the other tests are going fine (no mismatch prints, at least).
That's strange, because the Conv2D tests build on the GEMM routine. MlasFgemmTest::ExecuteShort first loops over small GEMMs from 1-15, which stresses some of the partial vector stores. Do the tests after this, which are multiples of 16, work okay?
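For illustration only, here is a minimal, hypothetical sketch of the kind of size sweep described above: dimensions 1..15 exercise the partial-width store paths, while multiples of 16 hit the full-width path. The names and structure here are assumptions, not the actual MLAS test code, and the library call is omitted so the sketch stays self-contained.

```cpp
// Hypothetical sketch (not the actual MLAS test): sweep small GEMM sizes
// against a naive reference. The real test would compare the library output
// (e.g. MlasGemm) element by element against Ref and print any mismatch.
#include <cstdio>
#include <vector>

// Naive reference SGEMM: C = A(MxK) * B(KxN), row-major.
static void ReferenceSgemm(int M, int N, int K,
                           const float* A, const float* B, float* C) {
  for (int m = 0; m < M; ++m) {
    for (int n = 0; n < N; ++n) {
      float sum = 0.0f;
      for (int k = 0; k < K; ++k) sum += A[m * K + k] * B[k * N + n];
      C[m * N + n] = sum;
    }
  }
}

int main() {
  for (int size = 1; size <= 15; ++size) {  // small sizes stress partial vector stores
    const int M = size, N = size, K = size;
    std::vector<float> A(M * K, 1.0f), B(K * N, 0.5f), Ref(M * N);
    ReferenceSgemm(M, N, K, A.data(), B.data(), Ref.data());
    // Here the library result would be computed and compared against Ref,
    // printing a mismatch message for any differing element.
    std::printf("size %d reference computed\n", size);
  }
  return 0;
}
```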
This is the full unittest output:
Update: I was curious about the latest state of the Power ISA (I worked on Xbox 360, a PowerPC 2.02 implementation), so I updated MLAS to directly use VSX intrinsics. I verified the GEMM by cross-compiling with gcc 7.4 and running under qemu. I'll get my changes into a branch you can try on your end in a few days.
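As a rough illustration of what "using VSX intrinsics directly" means, here is a minimal sketch, assuming gcc with -mvsx on ppc64le and the intrinsics from <altivec.h> (vec_xl, vec_madd, vec_xst). This is not the actual MLAS kernel; the function name and data layout are made up for the example.

```cpp
// Minimal VSX sketch: accumulate one scaled row into C, 4 floats per vector.
// Assumes N is a multiple of 4. Build with: gcc -O2 -mvsx (ppc64le).
#include <altivec.h>

void AxpyRowVSX(float a, const float* B, float* C, int N) {
  vector float va = vec_splats(a);        // broadcast the scalar a
  for (int n = 0; n < N; n += 4) {
    vector float vb = vec_xl(0, &B[n]);   // VSX load (no alignment requirement)
    vector float vc = vec_xl(0, &C[n]);
    vc = vec_madd(va, vb, vc);            // fused multiply-add: vc += va * vb
    vec_xst(vc, 0, &C[n]);                // VSX store
  }
}
```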
We will be happy to test it as soon as it is available. Many thanks for looking into this.
@tracysh, is there any update we can test?
I'm going to need a few more days to clean this up. Just curious, which POWER versions are you using this on?
We are using POWER8.
I'm curious if there has been an update on this issue.
Apologies for the delay on this. I've put the changes into the branch tracysh/mlas_powerpc. With this, I was able to build with gcc 7.5 and run under qemu. I ran onnxruntime_mlas_test and was able to run the subset of the GEMM tests; there are more GEMM tests that I usually run to validate big changes, but qemu was too slow to tackle those. I was also able to run through onnxruntime_test_all (run as part of the build), but a MathSinFloat test that uses Eigen was failing; I'm curious what happens on real hardware to know if this is worth investigating further. I was also able to point onnx_test_runner at resnet50 and bertsquad from the ONNX model zoo, and both passed successfully.
I have no idea how performant the SGEMM might be. It may be possible to scale up the GEMM further, but I'll need some help from you to measure on real hardware. Also, I want to make a few changes to onnxruntime_mlas_test to test a few more things out. Let me know how it goes.
Thanks @tracysh, we are testing your changes now and will let you know soon.
Hello again,
when I run:
And the result from:
I pushed some new changes to clean up the GEMM kernel templating. How is the performance of the runtime? I'm curious what you see for resnet50 or other test models from the ONNX model zoo. If you download some models + test data from the zoo (https://github.com/onnx/models), you can use onnx_test_runner to verify that the models run, and you can use "onnxruntime_perf_test -e cpu -t 30 path/to/model_and_data" to get a reference time. Once you have some timing data, can you try updating MlasSgemmKernel in SgemmKernelPower.cpp to see if doing 6 rows improves or degrades performance? GCC seemed to build this and keep everything in registers, but this isn't always faster.
As for the onnxruntime_mlas_test errors, I see the same problem in the ARM64 build; the expected data is based on what is observed with x86/x64.
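To make the 6-row experiment mentioned above concrete, here is a minimal sketch, not the real SgemmKernelPower.cpp, of a microkernel parameterized on the number of C rows processed per tile. The idea is that switching the row count from 4 to 6 trades more accumulators (and thus more reuse of each loaded B strip) against register pressure; the function name, packed-B layout, and template structure here are illustrative assumptions only.

```cpp
// Illustrative RowCount x 4 SGEMM tile using VSX intrinsics (not MLAS code).
// A is row-major with leading dimension K; B is packed 4 columns wide.
#include <altivec.h>
#include <cstddef>

template <int RowCount>
void SgemmTileSketch(const float* A, const float* B, float* C,
                     std::size_t K, std::size_t ldc) {
  vector float acc[RowCount];                      // accumulators, ideally all in registers
  for (int r = 0; r < RowCount; ++r) acc[r] = vec_splats(0.0f);
  for (std::size_t k = 0; k < K; ++k) {
    vector float b = vec_xl(0, &B[k * 4]);         // one 4-wide strip of packed B
    for (int r = 0; r < RowCount; ++r) {
      vector float a = vec_splats(A[r * K + k]);   // broadcast one element of A
      acc[r] = vec_madd(a, b, acc[r]);             // fused multiply-add
    }
  }
  for (int r = 0; r < RowCount; ++r) {
    vector float c = vec_xl(0, &C[r * ldc]);
    vec_xst(vec_add(acc[r], c), 0, &C[r * ldc]);   // accumulate the tile into C
  }
}

// Instantiate both variants to compare on real hardware:
template void SgemmTileSketch<4>(const float*, const float*, float*, std::size_t, std::size_t);
template void SgemmTileSketch<6>(const float*, const float*, float*, std::size_t, std::size_t);
```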
@smuzaffar Are there any additional comments on these changes (see my last comment for some questions)? Are you able to run your models successfully with these changes?
@tracysh, we are working on it: cms-sw/cmsdist#5743. We needed a few fixes on top of v1.2.0 to build it (https://github.com/cms-externals/onnxruntime/commits/cms/v1.2.0_plus_ppc_update_pb31130). @mrodozov is working on it.
I merged all of the pending Power changes into master.
Thanks @tracysh, we have integrated this into our software and things look much better now.
Hi @smuzaffar, just checking in: how does the performance of ONNX Runtime compare to the other runtimes you were using on Power? Do these systems have GPUs too that might benefit from using the CUDA support?
@tracysh, x86_64 is our production architecture, so when we migrated to onnxruntime we ran a performance test on x86_64. You can find the performance results here: cms-sw/cmssw#28112. In short, we saw a 7x gain in the modules where we used onnxruntime. Unfortunately we do not have the same exact comparison for Power (i.e. the exact same cmssw with and without onnxruntime), but comparing cmssw from Dec 2019 (which was without onnxruntime) against the latest nightlies, we see a much better gain (this could be due to both onnxruntime and improvements in our own code).
CMSSW 2019-12-04 + without ONNXRuntime
CMSSW 2020-05-07 + ONNXRuntime
Although our Power machines do have GPUs, we are currently not building with CUDA support. Hopefully we will enable it in the near future and report back the results.
Hi @tracysh, there is a comparison between ONNX Runtime and the other runtime measuring performance on x86_64; the results are available here:
Is your feature request related to a problem? Please describe.
We build our software for x86, aarch64, and ppc64le, and our developers would like to use onnxruntime, but as it does not build for ppc64le we cannot integrate it.
System information
Describe the solution you'd like
We would like to build and use onnxruntime on PPC64 archs.
Describe alternatives you've considered
Nothing yet