Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up monster action planning #35632

Merged

Conversation

FeepingCreature
Copy link
Contributor

@FeepingCreature FeepingCreature commented Nov 20, 2019

SUMMARY: Performance "Speed up monster action planning."

Noticed that sleeping in the lab was a bit slow, so I attached a profiler to see what was going on.

With GNU libstdc++, shared_ptr<> will always use threadsafe operations when linked against libpthread at all. This is not necessary in CDDA, so replace shared_ptr<> with shared_ptr_fast<> and weak_ptr<> with weak_ptr_fast<>, which is redefined to use libstdc++'s internal single-threaded version.

After that, I noticed that calculating distances in monster::rate_target() was eating up a lot of time to compute trig distances. rate_target doesn't need the actual distance until a way's in, so if we inline rl_dist and allow it to defer the sqrt until later on by adding rl_dist_squared and comparing with the square of best instead, we can make up a bunch more speed.

…_ptr_fast

std::weak_ptr/std::shared_ptr forces atomic access despite the fact cdda is singlethreaded.
@mlangsdorf mlangsdorf changed the title Fix/improve performance Speed up monster action planning Nov 21, 2019
@mlangsdorf mlangsdorf added [C++] Changes (can be) made in C++. Previously named `Code` Code: Performance Performance boosting code (CPU, memory, etc.) labels Nov 21, 2019
@ZhilkinSerg
Copy link
Contributor

Do you have a savegame to profile a performance increase (or maybe performance test results as text/screenshots)?

@FeepingCreature
Copy link
Contributor Author

FeepingCreature commented Nov 21, 2019

I should probably do that.

update: impromptu bench timing just sitting around for six hours in my lab cell says this PR probably doesn't do enough to be worthwhile: 87s -> 79s. Gonna try to remove the unwieldy first commit to see how much is the _squared change.

update:

version time
before 87s
rl_dist_fast 83s
rl_dist_fast + shared_ptr_fast 79s

perf shows lock() eating 15% of cpu while waiting, which fits - non-atomic lock isn't free, but it skips the 40%-of-15% lock cmpxchg.

I think it's still good as a PR. Free 5% twice is nothing to sneeze at.

edit: One important thing to add would probably be making the rl_dist_squared calls fast in the non-trig case by skipping the square/sqrt hack in that case.

edit: Done! rl_dist_squared is now rl_dist_fast and returns a wrapper that automatically calls sqrt() for the int conversion (iff trigdist is set).

…the length checks in monster::rate_target() and compute the sqrt when we know it's necessary.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[C++] Changes (can be) made in C++. Previously named `Code` Code: Performance Performance boosting code (CPU, memory, etc.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants