Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize AnimationMixer blend process #92838

Merged

Conversation

Nazarwadim
Copy link
Contributor

This PR is created to optimize the AnimaionMixer _process_animation.

void AnimationMixer::_process_animation(double p_delta, bool p_update_only) {
	_blend_init();
	if (_blend_pre_process(p_delta, track_count, track_map)) {
		_blend_capture(p_delta);
		_blend_calc_total_weight();             
		_blend_process(p_delta, p_update_only);
		_blend_apply();
		_blend_post_process();
		emit_signal(SNAME("mixer_applied"));
	};
	clear_animation_instances();
}

Benchmarking methods:

I made some benchmarks of how long each of these methods takes for one 3D model, using animation_tree.

Some explanations to understand the results.
Units of measurement: usec.
is_process = _blend_pre_process
weight = _blend_calc_total_weight

Master:

You can see here that the _blend_process, _blend_pre_process and _blend_calc_total_weight methods take the most time.

I will show the results that are in this PR.

You can see that blend_process has improved by 30%, blend calc_total_weight has improved by 15% and _blend_pre_process has improved by 50%.

Real project benchmarks:

Master:
Project FPS: 56 (17.85 mspf)
Project FPS: 56 (17.85 mspf)
Project FPS: 56 (17.85 mspf)
Project FPS: 56 (17.85 mspf)
Project FPS: 55 (18.18 mspf)
Project FPS: 55 (18.18 mspf)

Current PR:
Project FPS: 68 (14.70 mspf)
Project FPS: 69 (14.49 mspf)
Project FPS: 66 (15.15 mspf)
Project FPS: 66 (15.15 mspf)
Project FPS: 68 (14.70 mspf)

Here you can see a pretty good + 22%. Note that #92554 also improves animation performance, which with this PR adds 40% to performance.

How to test:

Here #92554 in the benchmark section is Animation_test.zip project. After opening, fps will be output to the console.

What was done:

Animation:

  • For enums, the size was reduced from 4 to 1 byte.
  • Added the get_tracks method.

AnimationTree:

  • I made track_map a pointer in order not to copy maps.

AnimationMixer:

  • The first is to use getptr instead of has + operator[].
  • The second is to use int count = a->get_track_count(); In order to store in a register the number of iterations.
  • Used the method already created in animation.h to take the array. const Vector<Animation::Track *> tracks = a->get_tracks();
  • In the method post_process_key_value I cache whether there is GDVIRTUAL_CALL. That is, there will be only 1 check per _blend_process call.

Probably closes: #92693

@Nazarwadim Nazarwadim requested a review from a team as a code owner June 6, 2024 12:55
@@ -45,7 +45,7 @@ class Animation : public Resource {

static inline String PARAMETERS_BASE_PATH = "parameters/";

enum TrackType {
enum TrackType : uint8_t {
Copy link
Member

@AThousandShips AThousandShips Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how much this improves, the original data type would have been int but the alignment and other factors prevent significant optimization generally, but it might cause issues down the line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done in order to reduce the size of the Track from 40 to 32 bytes. You can check using sizeof.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then that's useful, but needs to be evaluated for other aspects and for data safety with memory management, unsure if some data here is used in more direct ways that might break if you change it

@JekSun97
Copy link
Contributor

JekSun97 commented Jun 6, 2024

Your test project is not very suitable for such a test, due to the dynamic camera, and a large number of meshes, I used the project from a recent discussion (#92724), which uses 301 skeletons, and it really showed excellent results!

4.3 beta1 (master): FPS 47-48
This PR: FPS 56-58

test1
test2

I'll attach a project that is great for testing this PR:
anm.zip

Let's see what @TokageItLab says about the code

@Saul2022
Copy link

Saul2022 commented Jun 6, 2024

I would say that by itself it won’t be enough but with the pr you mentioned it might close it as it reduces it cost a lot more(40% as you stated when combining it )

Either way this is great job and hopefully it gets merged quick in 4.4 along with your other great optimizations pr’s.

@JekSun97
Copy link
Contributor

JekSun97 commented Jun 6, 2024

Either way this is great job and hopefully it gets merged quick in 4.4 along with your other great optimizations pr’s.

There are no compatibility problems or any major innovations here, this is just an optimization of what already exists.
So I think this would work well for 4.3 as well

Copy link
Member

@Calinou Calinou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested locally (rebased on top of master 292e50e), it works as expected. Great work 🙂

Benchmark

Using an optimized editor build (optimize=speed lto=full) for all tests. Project is kept at its default window size and the camera isn't moved after startup. The editor is not running in the background.

PC specifications
  • CPU: Intel Core i9-13900K
  • GPU: NVIDIA GeForce RTX 4090
  • RAM: 64 GB (2×32 GB DDR5-5800 C30)
  • SSD: Solidigm P44 Pro 2 TB
  • OS: Linux (Fedora 39)

Using Animation_test.zip:

master This PR #92554 This PR + #92554 1
90 FPS (11.11 mspf) 116 FPS (8.62 mspf) 113 FPS (8.84 mspf) 123 FPS (8.13 mspf)

Footnotes

  1. This required trivial code changes to get it to compile, since there's a small merge conflict when rebasing https://github.com/godotengine/godot/pull/92554 on top of this PR. Visual results still looked correct after doing so.

Copy link
Member

@TokageItLab TokageItLab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recall our earlier discussion that we should use GDREGISTER_VIRTUAL_CLASS for some Animation related classes.

Comment on lines +960 to +979
Variant AnimationMixer::post_process_key_value(const Ref<Animation> &p_anim, int p_track, Variant p_value, ObjectID p_object_id, int p_object_sub_idx) {
if (is_GDVIRTUAL_CALL_post_process_key_value) {
Variant res;
if (GDVIRTUAL_CALL(_post_process_key_value, p_anim, p_track, p_value, p_object_id, p_object_sub_idx, res)) {
return res;
}
is_GDVIRTUAL_CALL_post_process_key_value = false;
}
return _post_process_key_value(p_anim, p_track, p_value, p_object_id, p_object_sub_idx);
}

Copy link
Member

@TokageItLab TokageItLab Jun 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this change would not be necessary if you make AnimationMixer be registered as GDREGISTER_VIRTUAL_CLASS instead of GDREGISTER_ABSTRACT_CLASS in register_core_types.cpp? cc @reduz

This is also applicable to the AnimationNode.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is still necessary. I don't know what is happening in the GDVIRTUAL_CALL method, but I know that it is slow no matter how this class was registered. And so this variable caches the return result of this method if it returned false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, for some reason, the performance decreased when the registration was changed to GDREGISTER_VIRTUAL_CLASS

@@ -1073,15 +1076,18 @@ void AnimationMixer::_blend_calc_total_weight() {
real_t weight = ai.playback_info.weight;
Vector<real_t> track_weights = ai.playback_info.track_weights;
Vector<int> processed_indices;
for (int i = 0; i < a->get_track_count(); i++) {
if (!a->track_is_enabled(i)) {
const Vector<Animation::Track *> tracks = a->get_tracks();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const Vector<Animation::Track *> tracks = a->get_tracks();
const Vector<Animation::Track *> tracks = a->get_tracks();

I remember we always saying that we should use LocalVector whenever possible. There are several other places in the animation code where Vector is used besides here, but as long as it doesn't use has(), sort(), etc., I think it can be migrated to LocalVector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using LocalVector can result in many copies. Also, if it is changed, it will be necessary to make many changes in the already existing code. For example, with int in for loop, replace it with uint32_t so that there are no warnings from the compiler.
In order to remove the performance overhead, I used ptr.

@akien-mga akien-mga changed the title Optimize animations Optimize AnimationMixer blend process Jun 12, 2024
@TokageItLab TokageItLab requested a review from reduz June 15, 2024 12:03
@WOLFxxxxxx

This comment was marked as off-topic.

@TokageItLab

This comment was marked as off-topic.

@JekSun97
Copy link
Contributor

JekSun97 commented Aug 26, 2024

@reduz I think it's time to merge it with 4.4, no conflicts related to this PR have been found

@fire fire requested a review from a team August 27, 2024 00:03
@Nazarwadim Nazarwadim force-pushed the small_animation_optimization branch from 6ada396 to 4fcae61 Compare August 30, 2024 16:37
@Nazarwadim Nazarwadim force-pushed the small_animation_optimization branch from 4fcae61 to 660e28f Compare August 30, 2024 16:41
@akien-mga akien-mga requested a review from TokageItLab August 30, 2024 21:34
Copy link
Member

@TokageItLab TokageItLab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the code, there should be no change in the essential behavior. I guess there are places where we can be optimized more, but I think this PR is fine for now.

@TokageItLab TokageItLab modified the milestones: 4.x, 4.4 Aug 31, 2024
@akien-mga akien-mga merged commit 25fc316 into godotengine:master Sep 2, 2024
19 checks passed
@akien-mga
Copy link
Member

Thanks!

@Nazarwadim Nazarwadim deleted the small_animation_optimization branch September 3, 2024 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

AnimationPlayer/AnimationTree high process large framedrops
9 participants