Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix multiple issues in WorkerThreadPool #76945

Merged
merged 1 commit into from
May 12, 2023

Conversation

RandomShaper
Copy link
Member

@RandomShaper RandomShaper commented May 11, 2023

  • Fix project settings being ignored.
  • Made usages of native_thread_allocator thread-safe.
  • Remove redundant thread-safety from low_priority_threads_used, exit_threads.
  • Fix deadlock due to unintended extra lock of task_mutex.

NOTE: I have a concern that by cherry-picking this into 4.0 the users that tweaked threading/worker_pool/* settings will now get a behavior change. It's the correct thing to happen, but maybe too surprising for 4.0, or at least warrants a note in the release notes.

UPDATE: For the records, this may also fix some hard to reproduce crash related to a thread object in one of the projects used to test #74405, that worked fine otherwise.

@akien-mga
Copy link
Member

NOTE: I have a concern that by cherry-picking this into 4.0 the users that tweaked threading/worker_pool/* settings will now get a behavior change. It's the correct thing to happen, but maybe too surprising for 4.0, or at least warrants a note in the release notes.

Doesn't seem too problematic. But yes, it will be mentioned in the changelog.

bool low_priority_use_system_threads = GLOBAL_GET("threading/worker_pool/use_system_threads_for_low_priority_tasks");
float low_property_ratio = GLOBAL_GET("threading/worker_pool/low_priority_thread_ratio");

if (editor || project_manager) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've force-pushed to amend this check. (Formerly, it was calling the getters in the OS singleton, but they still return false unconditionally at this point.)

@RandomShaper RandomShaper marked this pull request as draft May 11, 2023 10:48
@RandomShaper RandomShaper marked this pull request as ready for review May 11, 2023 11:06
@RandomShaper
Copy link
Member Author

Sorry for the pushes, mark as draft, etc. I had just found an issue and thought it was a regression from this PR, but it isn't. I will just fix it in another PR, which may take more time.

@reduz
Copy link
Member

reduz commented May 11, 2023

Fantastic work!

@akien-mga
Copy link
Member

akien-mga commented May 11, 2023

This seems to be causing a deadlock when running unit tests godot --test, when compiling with scons tests=yes (or dev_mode=yes).

(gdb) bt
#0  0x00007f8821dfb736 in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f8821dfdda8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x000000000a56c66c in Semaphore::wait (this=0xe8d16a8) at ./core/os/semaphore.h:56
#3  WorkerThreadPool::wait_for_task_completion (this=0xd6225c0, p_task_id=1) at core/object/worker_thread_pool.cpp:274
#4  0x0000000005987e62 in TestWorkerThreadPool::DOCTEST_ANON_FUNC_4977 () at ./tests/core/threads/test_worker_thread_pool.h:66
#5  0x00000000057b1b81 in doctest::Context::run (this=0x7ffd6311b9a0) at ./thirdparty/doctest/doctest.h:6930
#6  0x0000000005c2d145 in test_main (argc=2, argv=0x7ffd6311c018) at tests/test_main.cpp:186
#7  0x000000000576e168 in Main::test_entrypoint (argc=2, argv=0x7ffd6311c018, tests_need_run=@0x7ffd6311ba7f: true) at main/main.cpp:684
#8  0x00000000056f9e02 in main (argc=2, argv=0x7ffd6311c018) at platform/linuxbsd/godot_linuxbsd.cpp:55

I cancelled the CI builds where the 3 jobs running unit tests had been stuck since more than 40 min.

- Fix project settings being ignored.
- Made usages of `native_thread_allocator` thread-safe.
- Remove redundant thread-safety from `low_priority_threads_used`, `exit_threads`.
- Fix deadlock due to unintended extra lock of `task_mutex`.
@RandomShaper RandomShaper requested a review from a team as a code owner May 11, 2023 14:10
@RandomShaper
Copy link
Member Author

Quoting myself from RC:

The problem is that, now the WorkerThreadPool is initialized once project settings are read, it's not initialized as part of the "very core" and, thus, it's not initialized for tests.
The simplest thing would be to add WorkerThreadPool::get_singleton()->init(); to test_main().
I'm not sure that is the right way to do it, though.
In any case, I'm doing that in the PR. Should be fine at least for now.

@myaaaaaaaaa
Copy link
Contributor

myaaaaaaaaa commented May 11, 2023

How late can WorkerThreadPool initialization be pushed to?

Specifically, if it can be initialized after ScriptServer (see below link), this will be able to supersede #74501 and #76586 (both of which are temporary hacks), which is currently blocking ThreadSanitizer in CI ( #73777 ) from being merged.

ScriptServer::init_languages();

@RandomShaper
Copy link
Member Author

Good question! And, conversely, how early can the other call happen? Maybe it can just be found out empirically, by watching when the pool is first used by some component.

@myaaaaaaaaa
Copy link
Contributor

Alternatively, WorkerThreadPool::init() and WorkerThreadPool::finish() can be merged into a single WorkerThreadPool::resize() function, defaulting to 0 worker threads where tasks are run on the main thread.

This would make WorkerThreadPool available for use immediately in single-threaded mode, allowing it to be safely "initialized" into multithreaded mode much later than was possible before.

It would also make it possible to unit test different WorkerThreadPool sizes.

@RandomShaper
Copy link
Member Author

@myaaaaaaaaa, that can work. Could you please incorporate that change to your PR? Let's say that's a related but separate issue than the ones this PR aims to fix and it shouldn't take much longer to be merged. Aside, there should be a way to assert that the thread pool has been properly initialized with threads by some point. For instance, this PR was breaking tests by failing to initialize it because there would be zero threads to run the tasks, but at least we could realize. With the changes we are considering, the test cases would just happily run all the tasks on the main thread, which is not what they are meant to do.

@akien-mga akien-mga merged commit 186aa64 into godotengine:master May 12, 2023
@akien-mga
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants