Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More performant partial region updates to multimeshes #957

Open
raymoo opened this issue May 29, 2020 · 2 comments
Open

More performant partial region updates to multimeshes #957

raymoo opened this issue May 29, 2020 · 2 comments

Comments

@raymoo
Copy link

raymoo commented May 29, 2020

Describe the project you are working on:

I'm working on a 2d shmup. I use instanced rendering with multimeshes to render bullets, since I found other methods not performant enough. To avoid constant reallocation of the multimesh, I allocate a maximum number of instances and limit rendering by setting the number of instances. Every frame I would go over each bullet, write out the instance data, then set the number of rendered instances to the number of bullets that were rendered.

Describe the problem or limitation you are having in your project:

My initial approach was to write the instance data to a Vector<float> in a tight loop and then push it to the multimesh using multimesh_set_buffer. However, this had some performance issues due to multimesh_set_buffer only taking vectors with data for the whole buffer:

  • Requires copying of data for the whole buffer, even if I only want to write some of it (there are only a number of rendered instances equal to the number of bullets, and the rest is garbage data that will be changed when I actually have more bullets to render).
  • The whole buffer is pushed to the GPU every time, a similar issue to the above point, but between the GPU and CPU.

In addition, the AABB is recomputed across all instances, rather than just visible instances (https://github.com/godotengine/godot/blob/00949f0c5fcc6a4f8382a4a97d5591fd9ec380f8/servers/rendering/rasterizer_rd/rasterizer_storage_rd.cpp#L2984). This was actually the biggest performance sink found when I profiled.

In the end I ended up just eating the cost of the API calls and used multimesh_instance_set_* to write the instances out. These functions mark specific regions to be sent to the GPU, and also only compute the AABB over visible instances. This method was faster.

There are potentially also issues with reallocations when the Vector<float> changes size frequently, but I believe that is a more general issue. LocalVector, which does not reallocate on shrinking, was introduced in godotengine/godot#38386, but it is not used in many existing APIs, including multimesh_set_buffer.

Describe the feature / enhancement and how it helps to overcome the problem or limitation:

There are multiple issues and a couple different use cases that might require different fixes:

  • The AABB calculation can be made faster (in the case of partial rendering) by marking the AABB dirty as in the instance update functions like multimesh_instance_set_transform, so that only visible instances are included in the computation. Ideally AABB computation could be optionally omitted entirely in case the developer is sure that the multimesh will always be on screen (as in my use case).
  • For the other issues, it would be sufficient in my use case to allow multimesh_set_buffer to take an undersized buffer. This is because my usage pattern is to update only the visible prefix of the buffer, with no gaps. I wouldn't need to use the multimesh_set_instance_* functions, so no data cache would be necessary or helpful.
  • The more general case where regions are updated in the middle of the buffer may benefit from caching when multiple regions are updated in a frame (similar to multimesh_set_instance_*). This would require a different interface to allow specifying the start index of the region. I am not sure what a realistic use case would be for this since I don't need it myself, but it seems useful.

Performance may also improve in CPUParticles2D which has a similar usage pattern to mine. It would need to be changed to set undersized buffers.

Describe how your proposal will work, with code, pseudocode, mockups, and/or diagrams:

I think the first two changes are mostly straightforward local code changes. A general region update function could have some signature like the following:

void multimesh_set_region(RID p_multimesh, size_t start_index, const Vector<float> &p_data);

If this enhancement will not be used often, can it be worked around with a few lines of script?:

No. I have described a workaround for the base issue above (using multimesh_instance_set_*), but this enhancement would improve performance further by avoiding doing the API calls in a loop and not having an extra cache to copy to.

Is there a reason why this should be core and not an add-on in the asset library?:

It needs to touch core rendering APIs.

@ArdaE
Copy link

ArdaE commented Jun 11, 2020

I'd like to second the option for omitting AABB computations. The developer should be allowed to provide a custom AABB in a way similar to the mesh_set_custom_aabb function for meshes, and the engine should use the custom AABB if it's provided instead of recalculating it every time one or more instances move.

I am building a game that displays thousands of dynamically moving 3D creatures (allocated in a handful of multimeshes) in a mostly top-down world map. The creatures are usually all over the map, and never beyond, so there's absolutely no need to waste the CPU's time (and the battery for mobile platforms) to keep recalculating something I can easily provide for each multimesh: the entire world's AABB.

P.S. Interestingly, changing the color of instances also seems to trigger an AABB recalculation (looking at the source code of Godot for both GLES2 and GLES3), which seems unnecessary. Even if a developer used color to transform instances in the shader, the engine's AABB calculation can't take that into account, and would produce the same result as before the color change anyway, so I don't know what that's all about. In such a case, the AABB would be incorrect anyway, and only the developer would be able to provide the correct AABB. All the more reason for a custom_aabb. To remove the unnecessary AABB calculation when color changes, I think the line that sets multimesh->dirty_aabb = true in both RasterizerStorageGLES2::multimesh_instance_set_color and RasterizerStorageGLES3::multimesh_instance_set_color should be removed from the source code of Godot.

@ArdaE
Copy link

ArdaE commented Oct 18, 2020

Regarding custom AABBs, I've realized there are two functions to set a custom AABB for MultiMeshes:

However, setting a custom AABB doesn't prevent the GLES3 and GLES2 renderers from recalculating the AABB anyway every time an instance is moved or marked dirty for any other reason. While the custom AABB takes effect for frustum culling in my tests using GLES3, the AABB is recalculated anyway in every update, which I've confirmed using a debugger, which is an unnecessary performance hit.

Details of why this is happening within Godot's source code:
While VisualServerScene's _update_instance_aabb function appears to avoid an unnecessary AABB calculation on the surface by not calling VSG::storage->multimesh_get_aabb() if/when a custom_aabb is provided, the rasterizers' call to update_dirty_multimeshes in every frame causes AABB for the MultiMesh to be recalculated anyway. It seems like the rasterizers need to be made aware of the presence of a custom AABB by adding a new function to the RasterizerStorage class, to be implemented by both RasterizerStorageGLES3 and RasterizerStorageGLES2 to avoid recalculating the AABB.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants