-
-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More performant partial region updates to multimeshes #957
Comments
I'd like to second the option for omitting AABB computations. The developer should be allowed to provide a custom AABB in a way similar to the I am building a game that displays thousands of dynamically moving 3D creatures (allocated in a handful of multimeshes) in a mostly top-down world map. The creatures are usually all over the map, and never beyond, so there's absolutely no need to waste the CPU's time (and the battery for mobile platforms) to keep recalculating something I can easily provide for each multimesh: the entire world's AABB. P.S. Interestingly, changing the color of instances also seems to trigger an AABB recalculation (looking at the source code of Godot for both GLES2 and GLES3), which seems unnecessary. Even if a developer used color to transform instances in the shader, the engine's AABB calculation can't take that into account, and would produce the same result as before the color change anyway, so I don't know what that's all about. In such a case, the AABB would be incorrect anyway, and only the developer would be able to provide the correct AABB. All the more reason for a custom_aabb. To remove the unnecessary AABB calculation when color changes, I think the line that sets |
Regarding custom AABBs, I've realized there are two functions to set a custom AABB for MultiMeshes:
However, setting a custom AABB doesn't prevent the GLES3 and GLES2 renderers from recalculating the AABB anyway every time an instance is moved or marked dirty for any other reason. While the custom AABB takes effect for frustum culling in my tests using GLES3, the AABB is recalculated anyway in every update, which I've confirmed using a debugger, which is an unnecessary performance hit. Details of why this is happening within Godot's source code: |
Describe the project you are working on:
I'm working on a 2d shmup. I use instanced rendering with multimeshes to render bullets, since I found other methods not performant enough. To avoid constant reallocation of the multimesh, I allocate a maximum number of instances and limit rendering by setting the number of instances. Every frame I would go over each bullet, write out the instance data, then set the number of rendered instances to the number of bullets that were rendered.
Describe the problem or limitation you are having in your project:
My initial approach was to write the instance data to a
Vector<float>
in a tight loop and then push it to the multimesh usingmultimesh_set_buffer
. However, this had some performance issues due tomultimesh_set_buffer
only taking vectors with data for the whole buffer:In addition, the AABB is recomputed across all instances, rather than just visible instances (https://github.com/godotengine/godot/blob/00949f0c5fcc6a4f8382a4a97d5591fd9ec380f8/servers/rendering/rasterizer_rd/rasterizer_storage_rd.cpp#L2984). This was actually the biggest performance sink found when I profiled.
In the end I ended up just eating the cost of the API calls and used
multimesh_instance_set_*
to write the instances out. These functions mark specific regions to be sent to the GPU, and also only compute the AABB over visible instances. This method was faster.There are potentially also issues with reallocations when the
Vector<float>
changes size frequently, but I believe that is a more general issue.LocalVector
, which does not reallocate on shrinking, was introduced in godotengine/godot#38386, but it is not used in many existing APIs, includingmultimesh_set_buffer
.Describe the feature / enhancement and how it helps to overcome the problem or limitation:
There are multiple issues and a couple different use cases that might require different fixes:
multimesh_instance_set_transform
, so that only visible instances are included in the computation. Ideally AABB computation could be optionally omitted entirely in case the developer is sure that the multimesh will always be on screen (as in my use case).multimesh_set_buffer
to take an undersized buffer. This is because my usage pattern is to update only the visible prefix of the buffer, with no gaps. I wouldn't need to use themultimesh_set_instance_*
functions, so no data cache would be necessary or helpful.multimesh_set_instance_*
). This would require a different interface to allow specifying the start index of the region. I am not sure what a realistic use case would be for this since I don't need it myself, but it seems useful.Performance may also improve in
CPUParticles2D
which has a similar usage pattern to mine. It would need to be changed to set undersized buffers.Describe how your proposal will work, with code, pseudocode, mockups, and/or diagrams:
I think the first two changes are mostly straightforward local code changes. A general region update function could have some signature like the following:
If this enhancement will not be used often, can it be worked around with a few lines of script?:
No. I have described a workaround for the base issue above (using
multimesh_instance_set_*
), but this enhancement would improve performance further by avoiding doing the API calls in a loop and not having an extra cache to copy to.Is there a reason why this should be core and not an add-on in the asset library?:
It needs to touch core rendering APIs.
The text was updated successfully, but these errors were encountered: