
performance drop on macos 14 #12

Open

siyuzou opened this issue Nov 25, 2023 · 7 comments

Comments


siyuzou commented Nov 25, 2023

Thanks for the great work. I've been using this since ort-1.13, on an MBP with an M1 Pro chip.

The problem is that after I updated my system from macOS 13 to 14, all the models using the CoreML EP became slower than before the update (still faster than the CPU EP, though). The performance dropped by ~50-75% on average. I didn't make a Time Machine backup before updating, so downgrading the system back to 13 isn't a good option.

I managed to build a wheel manually instead of pip installing onnxruntime-silicon, but the performance remains the same.

Will you be supporting macOS 14 soon?


Oil3 commented Dec 5, 2023

@siyuzou I'm building 1.16.3 on my M1 Pro Max right now, and the plan is to run some tests to see the differences, if any.
I didn't pay attention when I upgraded from 13 to 14, but I remember a huge increase when moving from the older versions to the ones from this repo.
I have the M1 Pro Max and the M2 Air to try.
Can you give some info? Python version, onnx, pytorch...? Are you still on arm Python?
Could it be something unrelated to the OS version? Like a new pip install whose requirements messed with your usual setup (pulling in a different version that breaks things)?

edit: OK, I built it and it's... much faster.
I actually ran a lot of tests and, go figure, the freshly compiled 1.16.3 is nearly twice as fast!
The results are very messy and unorganized, stuff like this (I'll spare you the rest):

```
 4783/18162 [09:00<20:28, 10.89frames/s, memory_usage=16.08GB, execution_threads=99]
 9105/18162 [17:32<36:31,  4.13frames/s, memory_usage=16.18GB, execution_threads=99]
 9241/18162 [17:47<13:32, 10.98frames/s, memory_usage=16.18GB, execution_threads=99]
 9508/18162 [18:18<09:06, 15.83frames/s, memory_usage=16.18GB, execution_threads=99]

Processing gif took 94.78080701828003 secs (99 threads) vs 97.30715584754 secs (3 threads)

 1124/18162 [01:05<17:11, 16.51frames/s, memory_usage=06.02GB, execution_threads=32]
 1861/18162 [01:45<08:23, 32.39frames/s, memory_usage=06.07GB, execution_threads=32]
 5694/18162 [04:22<07:19, 28.34frames/s, memory_usage=06.08GB, execution_threads=32]
 5816/18162 [04:27<12:08, 16.95frames/s, memory_usage=06.09GB, execution_threads=32]
16505/18162 [12:35<01:37, 17.08frames/s, memory_usage=06.12GB, execution_threads=32]
```

edit 2: Want to try my compiled version? I uploaded it here: https://github.com/Oil3/onnxruntime-silicon-1.16.2/releases/download/onnxruntime-silicon/onnxruntime_silicon-1.16.3-cp311-cp311-macosx_14_0_arm64.whl

siyuzou (Author) commented Dec 8, 2023

@Oil3 Hi!

So on your M1 Pro Max system, the speed increased hugely after upgrading the OS from 13 to 14, right? That's impressive, because I only experienced a performance drop. I'd like to try your pre-built wheel when I have time.

I'm on macOS Sonoma 14.1.1; here's my system info:

Operating System: Darwin 23.1.0
Architecture: arm64
Python Version: Python 3.10.13
Python Architecture: 64bit
Python Executable: /opt/homebrew/anaconda3/envs/xxx/bin/python: Mach-O 64-bit executable arm64
PIP Version: 23.3.1

and package info related to onnx:

$ pip list | grep onnx
onnx                   1.15.0
onnxconverter-common   1.14.0
onnxmltools            1.11.2
onnxruntime-silicon    1.16.0

cansik (Owner) commented Dec 8, 2023

@siyuzou As already mentioned in various other issues, performance depends a lot on your model. If your model uses unsupported layers, the runtime has to move the data from the GPU back to the CPU, which can result in slower inference than running on the CPU alone.
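
A quick way to sanity-check this (a minimal sketch, not an official diagnostic: `model.onnx` is a placeholder path, and the exact log format varies by ORT version) is to register both EPs, turn on verbose logging, and watch which nodes the CoreML EP actually claims:

```python
import onnxruntime as ort

# Sketch only: replace "model.onnx" with your own model file.
so = ort.SessionOptions()
so.log_severity_level = 0  # VERBOSE: logs which nodes each EP takes

sess = ort.InferenceSession(
    "model.onnx",
    sess_options=so,
    providers=["CoreMLExecutionProvider", "CPUExecutionProvider"],
)
print(sess.get_providers())  # the EPs that were actually registered
```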

siyuzou (Author) commented Dec 8, 2023

> @siyuzou As already mentioned in various other issues, performance depends a lot on your model. If your model uses unsupported layers, the runtime has to move the data from the GPU back to the CPU, which can result in slower inference than running on the CPU alone.

@cansik Hi!

I'm pretty sure the models I'm using don't have unsupported layers. The problem is that they performed quite well on macOS 13, but the speed dropped significantly after I upgraded to macOS 14.

E.g., one of the models used to take 5 ms per inference, but now takes 10-15 ms after the system upgrade (both with the CoreML EP). It's much slower with the CPU EP, around 40 ms.
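
For reference, a rough way to reproduce per-inference timings like these (a minimal sketch, assuming a single-input float32 model at the placeholder path `model.onnx`):

```python
import time
import numpy as np
import onnxruntime as ort

def bench(providers, runs=100):
    sess = ort.InferenceSession("model.onnx", providers=providers)
    inp = sess.get_inputs()[0]
    # Use 1 for any dynamic (non-integer) dimensions in the input shape.
    shape = [d if isinstance(d, int) else 1 for d in inp.shape]
    x = np.random.rand(*shape).astype(np.float32)
    sess.run(None, {inp.name: x})  # warm-up; the first CoreML run compiles the model
    t0 = time.perf_counter()
    for _ in range(runs):
        sess.run(None, {inp.name: x})
    return (time.perf_counter() - t0) / runs * 1000  # ms per inference

print("CoreML EP:", bench(["CoreMLExecutionProvider", "CPUExecutionProvider"]), "ms")
print("CPU EP:", bench(["CPUExecutionProvider"]), "ms")
```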

cansik (Owner) commented Dec 8, 2023

I've just added a release for v1.16.3 built on macOS 14. Maybe it fixes the issues you're currently having?

Oil3 commented Dec 9, 2023

> @Oil3 Hi!
>
> So on your M1 Pro Max system, the speed increased hugely after upgrading the OS from 13 to 14, right? That's impressive, because I only experienced a performance drop. I'd like to try your pre-built wheel when I have time.

Hi @siyuzou, actually I meant from onnxruntime-silicon 1.14 to 1.16.3.
What else did you install? Can you run a `pip list` and a `pip check` to see what's up? You don't happen to have the onnxruntime from Microsoft instead of the one from cansik?
It must be something silly. Waiting for you to try cansik's new upload.

And @cansik, thanks/danke/merci for your work.
Is there a list somewhere of what's already implemented and working in onnxruntime-silicon, à la the PyTorch MPS ops coverage matrix? https://qqaatw.dev/pytorch-mps-ops-coverage

cansik (Owner) commented Dec 9, 2023

@Oil3 Yes, there is a list of supported operators; you can find it over here.
