Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance & fix WorkerThreadPool #86587

Merged
merged 3 commits into from
Jan 11, 2024
Merged

Conversation

RandomShaper
Copy link
Member

@RandomShaper RandomShaper commented Dec 28, 2023

A number of changes to the WorkerThreadPool, split into commits with specific goals:

2023-01-05: Heavily edited after squashing two commits together and greatly polishing some changes.

  • WorkerThreadPool: Overhaul scheduling and synchronization

This commits rewrites the sync logic in a way that the use_system_threads_for_low_priority_tasks setting, which was added due to the lack of a cross-platform wait-for-multiple-objects functionality, can be removed (it's as if it was effectively hardcoded to false).

With the new implementation, we have the best of both worlds: threads don't have to poll, plus no bespoke threads are used.

In addition, regarding deadlock prevention, since not every possible case of wait-deadlock could be avoided, this commits removes the current best-effort avoidance mechanisms and keeps only a simple, pessimistic way of detection.

It turns out that the only current user of deadlock prevention, ResourceLoader, works fine with it and so every possible situation in resource loading is now properly handled, with no possibilities of deadlocking. There's a comment in the code with further details.

Lastly, a potential for load tasks never being awaited/disposed is cleared.

  • WorkerThreadPool: Avoid deadlocks when CommandQueueMT is involved

This commit lets CommandQueueMT play nicely with the WorkerThreadPool to avoid non-progressable situations caused by an interdependence between both. While a command queue is being flushed, it allows the WorkerThreadPool to release
the lock of the former while tasks are being awaited so they can make progress in case they need in turn to post to the command queue.

  • WorkerThreadPool: Avoid most runtime allocations

Just a little optimization.

NOTE:
With RID_Owner we could replace each pair of PagedAllocator and HashMap-of-ids-to-pointers. However, that would force us to expose RID as the task/group id, instead of int, which would break the API. Too bad. Let's wait until Godot 5.0.


Includes #86555.
Fixes #78790.


Testing appreciated. I'll keep testing it on different projects.

@fire
Copy link
Member

fire commented Dec 28, 2023

The changes to the xml documentation are straightforward changes, but any idea why the asan and such is failing.

@RandomShaper RandomShaper requested a review from a team as a code owner December 29, 2023 00:10
@RandomShaper RandomShaper force-pushed the wtp_enhance branch 3 times, most recently from f9b9178 to e030ada Compare December 29, 2023 10:11
@RandomShaper
Copy link
Member Author

Marking as Needs work because I'm rethinking one of the changes a bit, as I've found room for extra improvement.

@RandomShaper
Copy link
Member Author

Now I'm happy with this.

@RandomShaper RandomShaper force-pushed the wtp_enhance branch 2 times, most recently from f690123 to a0c2cee Compare January 5, 2024 17:29
Copy link
Member

@reduz reduz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I like the new approach!

core/templates/command_queue_mt.h Outdated Show resolved Hide resolved
core/object/worker_thread_pool.cpp Outdated Show resolved Hide resolved
@clayjohn
Copy link
Member

clayjohn commented Jan 6, 2024

I just cancelled CI, the tests that run the unit tests appeared to be totally stuck. They had been running for 5 hours when I cancelled the workflow

@AThousandShips
Copy link
Member

AThousandShips commented Jan 6, 2024

Sounds like a deadlock somewhere, suspect the threads aren't joined correctly

As a first step I'd suggest using RAII locking where possible to avoid problems there

@RandomShaper
Copy link
Member Author

I've reproduced it locally. Thankfully the current tests stress the thing enough to make the issue apparent. When group tasks are involved in huge amounts, some low-prio tasks are kept in the queue with no threads realizing the need to claim them. Under investigation. 🔍

@RandomShaper RandomShaper force-pushed the wtp_enhance branch 2 times, most recently from 5e04514 to 9bbe65e Compare January 8, 2024 11:10
This commits rewrites the sync logic in a way that the
`use_system_threads_for_low_priority_tasks` setting, which was added due to
the lack of a cross-platform wait-for-multiple-objects functionality, can be
removed (it's as if it was effectively hardcoded to `false`).

With the new implementation, we have the best of both worlds: threads don't
have to poll, plus no bespoke threads are used.

In addition, regarding deadlock prevention, since not every possible case of
wait-deadlock could be avoided, this commits removes the current best-effort
avoidance mechanisms and keeps only a simple, pessimistic way of detection.

It turns out that the only current user of deadlock prevention, ResourceLoader,
works fine with it and so every possible situation in resource loading is now
properly handled, with no possibilities of deadlocking. There's a comment in
the code with further details.

Lastly, a potential for load tasks never being awaited/disposed is cleared.
This commit lets CommandQueueMT play nicely with the WorkerThreadPool to avoid
non-progressable situations caused by an interdependence between both. While a
command queue is being flushed, it allows the WTP to release its lock while tasks
are being awaited so they can make progress in case they need in turn to post
to the command queue.
Just a little optimization.

**NOTE:**
With `RID_Owner` we could replace each pair of `PagedAllocator` and
`HashMap`-of-ids-to-pointers. However, that would force us to expose
`RID` as the task/group id, instead of `int`, which would break the
API. Too bad. Let's wait until Godot 5.0.
@RandomShaper
Copy link
Member Author

OK. I've found and fixed the logic error.

@akien-mga akien-mga merged commit dc79e95 into godotengine:master Jan 11, 2024
15 checks passed
@akien-mga
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error on exit about pages still in use for Thread and WorkerThreadPool::Task
6 participants