24.8.8 Backport PR #68694 Thread pool: move thread creation out of lock #573
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Original PR:
ClickHouse#68694
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Optimized thread creation in the ThreadPool to minimize lock contention. Thread creation is now performed outside of the critical section to avoid delays in job scheduling and thread management under high load conditions. This leads to a much more responsive ClickHouse under heavy concurrent load.
Documentation entry for user-facing changes
The thread creation process within the ThreadPool has been modified to address potential delays caused by holding a lock during thread creation. Previously, threads were created within a locked section, which could lead to significant contention and delays, especially when thread creation is slow. To mitigate this, the thread creation has been moved outside of the critical section.
Same test https://gist.github.com/filimonov/7e7adde17421d4a9f83c6fea2be8f802 - before and after the change - results are in comment below.
It looks like a clear win giving about 10% better QPS, much better response times and better threadpool and CPU usage, despite that thread creation become even slower - it's because now several thread can be created simultaneously and kernel side contetion is bigger. But now it does not block the thread pool work anymore.
P.S. I have no idea how to test it in CI/CD, but there is a change that performance tests will show something.