Slow performance on mac #1135
Summary
So, to clarify/summarise: it seems the reason for this recursion is that futures associated with previous frames are not being cleaned up properly on macOS. The reason for this seems to be related to the call to …
Some thoughts:
I'm not certain about this. It could be the case, but I think they might be getting cleaned up, just not as fast as we are calling …
Yeah, it does happen.
To clarify, both calls return a …
To clarify, if you call vkWaitForFences … This is what makes me think we are just calling the function much more quickly than MoltenVK is signalling its fences, and are therefore digging down into a deep recursion stack.
It seems to jump up there very quickly. Yeah, it's all in the cleanup, although I'm not sure my profiler is very accurate because it hits its recursion limit.
I'm a little stuck on how to come up with a solution for this problem, but here are some ideas: …
This is the recursive loop: …
@freesig I just noticed that there is a … I wonder if calling … If this does fix the behaviour on macOS, perhaps we could change …
Yep, so calling … The problem with using … Also, it seems like if we don't call …
Ohh hmm, are you sure you can't call it? I think … Anyway, that's awesome news that …
Yeah, it would be great to have an example that reliably synced with the refresh rate of the display, but I guess we can work out how to deal with this in another issue.
Actually, yeah, you can call …
This is fixed in … for Nannou; however, the vulkano examples still won't work properly on macOS. It would be a matter of adding
    .then_signal_fence_and_flush();
    future.wait(None).unwrap();
to the examples (I think some already have it). I do think it's possible that something is wrong with when MoltenVK signals the … My understanding is that MoltenVK uses a Metal command buffer to submit all the current "presents", and once the GPU executes a present it signals that the swapchain image is free. I'm not sure why this is happening faster than the refresh rate. Either it's building up more "presents" in that command buffer than there are swapchain images, or some sort of mailbox situation is happening where it overwrites the swapchain image with the latest one. I'm happy to keep digging into this, but also happy to close the issue by adding the …
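To illustrate why waiting on the fence after each flush helps, here is a toy, deterministic Rust model. It is not vulkano or MoltenVK code; max_frames_in_flight and gpu_period are hypothetical names. It assumes the CPU submits one frame per tick and the "GPU" retires one frame every gpu_period ticks. Without a wait, unretired frames (and the futures holding them) pile up; blocking after each submission keeps at most one frame in flight.

```rust
// Toy, deterministic model (hypothetical; not vulkano or MoltenVK code).
// The CPU submits one frame per iteration and the "GPU" retires one
// submitted frame every `gpu_period` ticks. Returns the largest number of
// frames (i.e. not-yet-cleaned-up futures) that were ever in flight at once.
fn max_frames_in_flight(cpu_frames: usize, gpu_period: usize, wait_each_frame: bool) -> usize {
    assert!(gpu_period > 0, "gpu_period must be non-zero");
    let mut in_flight: usize = 0;
    let mut max_seen: usize = 0;
    let mut tick: usize = 0;
    for _ in 0..cpu_frames {
        in_flight += 1; // submit a new frame
        max_seen = max_seen.max(in_flight);
        if wait_each_frame {
            // Models `future.wait(None)`: block until the GPU retires everything.
            while in_flight > 0 {
                tick += 1;
                if tick % gpu_period == 0 {
                    in_flight -= 1;
                }
            }
        } else {
            // No wait: only one tick passes before the next submission.
            tick += 1;
            if tick % gpu_period == 0 && in_flight > 0 {
                in_flight -= 1;
            }
        }
    }
    max_seen
}

fn main() {
    // 100 CPU frames; the GPU retires one frame every 4 ticks.
    println!("without wait: {}", max_frames_in_flight(100, 4, false));
    println!("with wait:    {}", max_frames_in_flight(100, 4, true));
}
```

In this model the non-blocking run ends with dozens of frames outstanding, while the blocking run never has more than one. The cost of blocking is that the CPU stalls until the GPU (or the presentation engine) catches up.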
Sorry, I've been struggling to keep up with everything on vulkano recently. :(
Please refer to #1247 for more information.
To reproduce, call cleanup_finished() on the previous frame's GpuFuture each frame. This happens in the triangle example.
Issue
Getting very slow performance with a sequence of images with MoltenVK.
It starts rendering very slowly but progressively gets quicker.
I'm not sure if this is a problem with vulkano or MoltenVK.
The same code runs fine on Linux.
However, if I change this line:
vulkano/vulkano/src/sync/future/fence_signal.rs, line 167 in 9a08414
to …
then it works perfectly, except that MoltenVK spits out a timeout error.
Another option is to change the section to this:
Note: I tried going down to from_nanos(1), but then it runs slowly again. Also, this "fix" doesn't produce the MoltenVK timeout error.
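For comparison, here is a minimal sketch of the "sleep between checks" idea written as a loop instead of recursion. It assumes a hypothetical is_signaled callback standing in for a zero-timeout fence query (vkGetFenceStatus, or vkWaitForFences with a 0 timeout); poll_with_backoff and the backoff values are illustrative assumptions, not the values used in the experiment above.

```rust
use std::thread;
use std::time::{Duration, Instant};

// Hypothetical helper (not vulkano's API): poll a zero-timeout status check in
// a loop, sleeping `backoff` between attempts. Returns Some(number_of_polls)
// once signaled, or None if `timeout` elapses first.
fn poll_with_backoff<F: FnMut() -> bool>(
    mut is_signaled: F,
    backoff: Duration,
    timeout: Duration,
) -> Option<u32> {
    let start = Instant::now();
    let mut polls: u32 = 0;
    loop {
        polls += 1;
        if is_signaled() {
            return Some(polls);
        }
        if start.elapsed() >= timeout {
            return None;
        }
        // Yield the CPU instead of spinning; the stack does not grow.
        thread::sleep(backoff);
    }
}

fn main() {
    // Simulated fence that becomes signaled about 5 ms from now.
    let ready_at = Instant::now() + Duration::from_millis(5);
    let result = poll_with_backoff(
        || Instant::now() >= ready_at,
        Duration::from_micros(500),
        Duration::from_secs(1),
    );
    println!("{:?}", result);
}
```

The sleep trades a little latency for far fewer status queries, which matches the lower CPU usage reported with the sleeping workaround.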
Profiling
I profiled the original vulkano code with the bug and noticed that
_$LT$vulkano..sync..future..fence_signal..FenceSignalFuture$LT$F$GT$$u20$as$u20$vulkano..sync..future..GpuFuture$GT$::cleanup_finished::h4c367c6caff4531e
(I believe this is actually cleanup_finished_impl, but it shows as cleanup_finished because of release optimizations) is calling itself recursively, but the stack goes beyond my profiler's limit of 500 calls, so I don't know how deep it is going. It was also using about 97% CPU.
With the above fix it still recurses, but only to ~40 calls deep. I assume the sleeping prevents it from going deeper. It also uses only 22% CPU.
MoltenVK
Looking at the implementation of both vkWaitForFences and vkGetFenceStatus, you end up locking the fence's mutex: …
But at the same time, the signal needs the same lock: …
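Here is a toy Rust model of the contention described above. It assumes a fence whose status query, wait and signal all share one mutex; ToyFence is hypothetical and only mirrors that locking pattern, not MoltenVK's actual implementation. A blocking wait on a condition variable releases the lock while sleeping, so the signalling side can get in, whereas back-to-back status queries keep re-acquiring the lock the signaller needs.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical toy fence: NOT MoltenVK's implementation, only the locking pattern.
struct ToyFence {
    signaled: Mutex<bool>,
    cond: Condvar,
}

impl ToyFence {
    fn new() -> Self {
        ToyFence { signaled: Mutex::new(false), cond: Condvar::new() }
    }

    // Status query: takes the shared lock just to read the flag.
    fn status(&self) -> bool {
        *self.signaled.lock().unwrap()
    }

    // Blocking wait with a timeout: the condvar releases the lock while
    // sleeping and re-acquires it on wake-up. Returns true if signaled.
    fn wait(&self, timeout: Duration) -> bool {
        let guard = self.signaled.lock().unwrap();
        let (guard, _timeout_result) = self
            .cond
            .wait_timeout_while(guard, timeout, |signaled| !*signaled)
            .unwrap();
        *guard
    }

    // The signalling side needs the same lock before it can flip the flag.
    fn signal(&self) {
        *self.signaled.lock().unwrap() = true;
        self.cond.notify_all();
    }
}

fn main() {
    let fence = Arc::new(ToyFence::new());
    let signaller = Arc::clone(&fence);
    let handle = thread::spawn(move || {
        thread::sleep(Duration::from_millis(10)); // pretend the GPU is busy
        signaller.signal();
    });
    println!("signaled: {}", fence.wait(Duration::from_secs(1)));
    handle.join().unwrap();
}
```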
What I think is happening
At the start of the program there is more setup going on before the fence is signaled, and while the fence isn't signaled, vulkano is constantly calling vkWaitForFences, getting back a busy VK_TIMEOUT, and then recursively trying again. This causes very deep call stacks, and once a signaled fence is found, it has to return all the way back up this giant call stack.
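The recursion pattern described above can be sketched as follows. It assumes a hypothetical fence_after(n) callback that reports "not signaled" n times before succeeding; these names are illustrative and are not vulkano's cleanup_finished implementation. The recursive version adds one nested call per failed poll, so its depth grows with how long the fence takes to signal, while the loop version does the same polling in constant stack space.

```rust
// Hypothetical stand-in for a fence that answers "not ready" (like VK_TIMEOUT
// from a zero-timeout wait) `n` times before reporting that it is signaled.
fn fence_after(n: usize) -> impl FnMut() -> bool {
    let mut calls: usize = 0;
    move || {
        calls += 1;
        calls > n
    }
}

// Recursive pattern: every failed poll adds one nested call. Returns the depth
// reached, which equals the number of failed polls. (In release builds the
// compiler may turn this simple tail call into a loop; the returned value is
// the logical call depth.)
fn cleanup_recursive(is_signaled: &mut dyn FnMut() -> bool, depth: usize) -> usize {
    if is_signaled() {
        depth // found a signaled fence: unwind all the way back up
    } else {
        cleanup_recursive(is_signaled, depth + 1) // try again, one level deeper
    }
}

// Loop pattern: the same polling in constant stack space.
fn cleanup_iterative(is_signaled: &mut dyn FnMut() -> bool) -> usize {
    let mut failed_polls: usize = 0;
    while !is_signaled() {
        failed_polls += 1;
    }
    failed_polls
}

fn main() {
    println!("recursive depth: {}", cleanup_recursive(&mut fence_after(500), 0));
    println!("iterative polls: {}", cleanup_iterative(&mut fence_after(1_000_000)));
}
```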
I'm not sure why this only happens on MoltenVK and not on regular Vulkan. Perhaps MoltenVK is just slower, or the underlying Metal doesn't signal fences as efficiently. Or maybe vkWaitForFences with a 0 timeout actually takes a bit of time to return on regular Vulkan, and this prevents the deep recursion.
I should note that I'm getting this error regardless of the above "fix".