Optimize AnimationMixer blend process #92838
@@ -45,7 +45,7 @@ class Animation : public Resource {

 	static inline String PARAMETERS_BASE_PATH = "parameters/";

-	enum TrackType {
+	enum TrackType : uint8_t {
I'm not sure how much this improves things. The original data type would have been int, but alignment and other factors generally prevent significant optimization, and the change might cause issues down the line.
This was done in order to reduce the size of the Track from 40 to 32 bytes. You can check using sizeof.
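The effect can be checked in isolation with a small sketch. The enums and the `TrackWide` / `TrackNarrow` structs below are hypothetical stand-ins, not Godot's actual `Animation::Track` layout; they only illustrate how giving an enum a fixed `uint8_t` underlying type lets it pack next to other small members. Exact sizes depend on the compiler and ABI, so the sketch asserts only what the language guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical enums for illustration; not Godot's real TrackType.
enum WideType { WIDE_VALUE, WIDE_POSITION }; // underlying type picked by the compiler, usually int
enum NarrowType : uint8_t { NARROW_VALUE, NARROW_POSITION }; // fixed one-byte underlying type

// Hypothetical track-like structs, identical apart from the enum member.
struct TrackWide {
	WideType type;
	bool flags[5];
	void *path;
};

struct TrackNarrow {
	NarrowType type;
	bool flags[5];
	void *path;
};

static_assert(sizeof(NarrowType) == 1, "a uint8_t-backed enum occupies one byte");

// Bytes saved per track by the narrower enum (can be 0 on some ABIs).
inline std::size_t bytes_saved_per_track() {
	return sizeof(TrackWide) - sizeof(TrackNarrow);
}
```

On a typical 64-bit platform the narrow struct ends up one pointer-sized slot smaller, because the enum and the flags then fit together before the pointer's alignment boundary. Whether the same holds for a real struct depends on its member order, which is why checking with sizeof is the right approach.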
Then that's useful, but it needs to be evaluated for other aspects and for data safety with memory management. I'm unsure whether some data here is used in more direct ways that might break if you change it.
Your test project is not well suited to this kind of test because of the dynamic camera and the large number of meshes. I used the project from a recent discussion (#92724), which uses 301 skeletons, and it showed excellent results!
4.3 beta1 (master): FPS 47-48
I'll attach a project that is great for testing this PR:
Let's see what @TokageItLab says about the code.
I would say that by itself it won't be enough, but together with the PR you mentioned it might close it, since that reduces the cost a lot more (40% when combined, as you stated). Either way, this is a great job, and hopefully it gets merged quickly in 4.4 along with your other great optimization PRs.
There are no compatibility problems or any major innovations here; this is just an optimization of what already exists.
Tested locally (rebased on top of master 292e50e), it works as expected. Great work 🙂
Benchmark
Using an optimized editor build (optimize=speed lto=full) for all tests. Project is kept at its default window size and the camera isn't moved after startup. The editor is not running in the background.
PC specifications
- CPU: Intel Core i9-13900K
- GPU: NVIDIA GeForce RTX 4090
- RAM: 64 GB (2×32 GB DDR5-5800 C30)
- SSD: Solidigm P44 Pro 2 TB
- OS: Linux (Fedora 39)
Using Animation_test.zip:
| master | This PR | #92554 | This PR + #92554 [1] |
|---|---|---|---|
| 90 FPS (11.11 mspf) | 116 FPS (8.62 mspf) | 113 FPS (8.84 mspf) | 123 FPS (8.13 mspf) |

Footnotes
[1] This required trivial code changes to get it to compile, since there's a small merge conflict when rebasing https://github.com/godotengine/godot/pull/92554 on top of this PR. Visual results still looked correct after doing so.
I recall our earlier discussion that we should use GDREGISTER_VIRTUAL_CLASS for some Animation related classes.
Variant AnimationMixer::post_process_key_value(const Ref<Animation> &p_anim, int p_track, Variant p_value, ObjectID p_object_id, int p_object_sub_idx) {
	if (is_GDVIRTUAL_CALL_post_process_key_value) {
		Variant res;
		if (GDVIRTUAL_CALL(_post_process_key_value, p_anim, p_track, p_value, p_object_id, p_object_sub_idx, res)) {
			return res;
		}
		is_GDVIRTUAL_CALL_post_process_key_value = false;
	}
	return _post_process_key_value(p_anim, p_track, p_value, p_object_id, p_object_sub_idx);
}
Maybe this change would not be necessary if you make AnimationMixer be registered as GDREGISTER_VIRTUAL_CLASS instead of GDREGISTER_ABSTRACT_CLASS in register_core_types.cpp? cc @reduz
This is also applicable to the AnimationNode.
It is still necessary. I don't know what is happening in the GDVIRTUAL_CALL method, but I know that it is slow no matter how this class was registered. And so this variable caches the return result of this method if it returned false.
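The caching idea can be sketched outside the engine. Everything below is a hypothetical stand-in: `Processor`, `script_override`, `try_override` and `reset_cache` are illustrative names rather than Godot API, and the explicit reset at the start of a processing pass is an assumption about how a cached "no override" result would be refreshed.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a class with an optional script override.
// An empty `script_override` models "no script implements the virtual".
class Processor {
public:
	std::function<bool(int, int &)> script_override; // returns true if it handled the call
	int lookup_count = 0; // counts slow-path attempts, for demonstration only

	int post_process(int value) {
		if (override_may_exist) {
			int result = 0;
			if (try_override(value, result)) {
				return result;
			}
			// The lookup failed: remember that and skip it on later calls.
			override_may_exist = false;
		}
		return default_post_process(value);
	}

	// Assumed to be called at the start of a processing pass,
	// so that a newly attached override is noticed.
	void reset_cache() { override_may_exist = true; }

private:
	bool override_may_exist = true;

	bool try_override(int value, int &r_result) {
		lookup_count++; // stands in for the expensive dispatch
		return script_override ? script_override(value, r_result) : false;
	}

	int default_post_process(int value) { return value; }
};
```

With no override attached, a hundred calls to `post_process` pay for the slow lookup once. The trade-off is that an override attached in the middle of a pass is ignored until the flag is reset.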
Also, for some reason, the performance decreased when the registration was changed to GDREGISTER_VIRTUAL_CLASS
@@ -1073,15 +1076,18 @@ void AnimationMixer::_blend_calc_total_weight() {
 		real_t weight = ai.playback_info.weight;
 		Vector<real_t> track_weights = ai.playback_info.track_weights;
 		Vector<int> processed_indices;
-		for (int i = 0; i < a->get_track_count(); i++) {
-			if (!a->track_is_enabled(i)) {
+		const Vector<Animation::Track *> tracks = a->get_tracks();
const Vector<Animation::Track *> tracks = a->get_tracks();
I remember we always said that we should use LocalVector whenever possible. There are several other places in the animation code where Vector is used besides here, but as long as the code doesn't use has(), sort(), etc., I think it can be migrated to LocalVector.
Using LocalVector can result in many copies. Also, switching to it would require many changes in the existing code. For example, an int index in a for loop would have to be replaced with uint32_t to avoid compiler warnings.
To remove the performance overhead, I used ptr.
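The pattern of reading the element count and the raw data pointer once, before the loop, can be sketched as follows. This uses `std::vector` as a stand-in for Godot's `Vector`, and the `Track` struct and `count_enabled` function are hypothetical, reduced to the one field the loop needs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Animation::Track; only the field used here.
struct Track {
	bool enabled = true;
};

// Counts enabled tracks, reading the size and the data pointer once before the loop.
inline int count_enabled(const std::vector<Track *> &tracks) {
	Track *const *tracks_ptr = tracks.data(); // raw pointer: no accessor call per element
	const int count = static_cast<int>(tracks.size()); // loop bound read once
	int enabled = 0;
	for (int i = 0; i < count; i++) {
		if (tracks_ptr[i]->enabled) {
			enabled++;
		}
	}
	return enabled;
}
```

Keeping the index as int means existing loops need no signedness changes, which is the concern raised about moving to a container with an unsigned size type.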
@reduz I think it's time to merge this for 4.4; no conflicts related to this PR have been found.
Looking at the code, there should be no change in the essential behavior. I guess there are places that can be optimized further, but I think this PR is fine for now.
Thanks!
This PR optimizes AnimationMixer's _process_animation.
Benchmarking methods:
I benchmarked how long each of these methods takes for one 3D model, using animation_tree.
Some explanations to help read the results.
Units of measurement: usec.
is_process = _blend_pre_process
weight = _blend_calc_total_weight
Master:
You can see here that the _blend_process, _blend_pre_process and _blend_calc_total_weight methods take the most time.
Here are the results with this PR.
You can see that _blend_process has improved by 30%, _blend_calc_total_weight has improved by 15% and _blend_pre_process has improved by 50%.
Real project benchmarks:
Master:
Project FPS: 56 (17.85 mspf)
Project FPS: 56 (17.85 mspf)
Project FPS: 56 (17.85 mspf)
Project FPS: 56 (17.85 mspf)
Project FPS: 55 (18.18 mspf)
Project FPS: 55 (18.18 mspf)
Current PR:
Project FPS: 68 (14.70 mspf)
Project FPS: 69 (14.49 mspf)
Project FPS: 66 (15.15 mspf)
Project FPS: 66 (15.15 mspf)
Project FPS: 68 (14.70 mspf)
Here you can see a solid gain of roughly +22%. Note that #92554 also improves animation performance; combined with this PR, the total gain is about 40%.
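For anyone who wants to re-derive these figures, here is a minimal sketch of the arithmetic. The helper names (`ms_per_frame`, `mean`, `gain_percent`, `master_fps`, `pr_fps`) are made up for this example, and the samples are copied from the runs listed above. Milliseconds per frame is simply 1000 / FPS, and the gain is the ratio of the mean FPS values.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Frame time in milliseconds for a given frame rate.
inline double ms_per_frame(double fps) {
	return 1000.0 / fps;
}

// Arithmetic mean of a non-empty list of samples.
inline double mean(const std::vector<double> &v) {
	return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}

// Relative FPS gain of `after` over `before`, as a percentage.
inline double gain_percent(const std::vector<double> &before, const std::vector<double> &after) {
	return (mean(after) / mean(before) - 1.0) * 100.0;
}

// FPS samples copied from the benchmark runs above.
inline std::vector<double> master_fps() { return { 56, 56, 56, 56, 55, 55 }; }
inline std::vector<double> pr_fps() { return { 68, 69, 66, 66, 68 }; }
```

Averaging the samples this way gives a gain of about 21%, in line with the rounded figure quoted above. The listed mspf values appear to be truncated rather than rounded, for example 1000 / 56 ≈ 17.857.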
How to test:
The Animation_test.zip project is attached in the benchmark section of #92554. After opening it, the FPS is printed to the console.
What was done:
Animation:
AnimationTree:
AnimationMixer:
- int count = a->get_track_count(); in order to store the number of iterations in a register.
- const Vector<Animation::Track *> tracks = a->get_tracks();
- post_process_key_value: I cache whether there is a GDVIRTUAL_CALL. That is, there will be only 1 check per _blend_process call.

Probably closes: #92693