-
-
Notifications
You must be signed in to change notification settings - Fork 4
feature[RR v2024]: max_jobs
jitter to prevent thundering herd problem
#113
feature[RR v2024]: max_jobs
jitter to prevent thundering herd problem
#113
Conversation
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
WalkthroughThese changes bring about a more flexible and context-aware worker allocation and management system across various components. By incorporating contexts, configurations for maximum job dispersion, and execution limits, the system now offers improved control over worker lifecycle and task execution. This update ensures better resource management, adaptability to workload variations, and enhanced operational efficiency in handling concurrent tasks within the system. Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files selected for processing (4)
- pool/allocator.go (4 hunks)
- pool/config.go (2 hunks)
- pool/static_pool/pool.go (2 hunks)
- worker/worker.go (2 hunks)
Additional comments: 7
pool/allocator.go (2)
- 5-6: Imports for
math
andmath/rand
are added to support the new functionality related to job dispersion calculation. Ensure these libraries are used effectively and securely.- 26-37: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [29-45]
The
NewPoolAllocator
function now accepts acfg *Config
parameter, replacing the previoustimeout
parameter. This change aligns with the introduction of theMaxJobsDispersion
feature. Ensure that all calls to this function have been updated to pass the correct configuration object.Verification successful
The verification process has confirmed that the call to
NewPoolAllocator
inpool/static_pool/pool.go
has been updated to pass the correct configuration object, aligning with the updated function signature that includes acfg *Config
parameter. There are no indications of other calls toNewPoolAllocator
that have not been updated accordingly. This suggests that the necessary updates following the introduction of theMaxJobsDispersion
feature have been made correctly.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for NewPoolAllocator function calls to ensure they are updated. rg 'NewPoolAllocator\('Length of output: 275
pool/config.go (1)
- 61-63: The initialization logic for
MaxJobsDispersion
ensures its value is within the range (0.0, 1.0]. This logic is correctly implemented. However, consider adding a comment to explain why values outside this range are reset to 1.0 for clarity.pool/static_pool/pool.go (2)
- 90-90: The
NewPool
function call has been updated to pass the correct set of arguments, reflecting the changes made toNewPoolAllocator
. Verify that the updated arguments are correctly passed throughout the codebase.- 204-204: The condition to check if the maximum number of jobs has been reached now correctly uses the
MaxJobsReached
method. This change ensures that the logic for determining if a worker has reached its execution limit is centralized and can easily account for theMaxJobsDispersion
feature.worker/worker.go (2)
- 487-489: The
SetMaxExecs
method correctly sets the maximum number of executions for a worker. This method is straightforward and correctly implements the intended functionality.- 491-493: The
MaxJobsReached
method correctly checks if the worker has reached its maximum number of executions. This method is essential for integrating theMaxJobsDispersion
feature into the worker's lifecycle management.
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 5
Configuration used: CodeRabbit UI
Files selected for processing (7)
- fsm/fsm.go (1 hunks)
- ipc/pipe/pipe_factory.go (2 hunks)
- ipc/socket/socket_factory.go (2 hunks)
- pool/allocator.go (2 hunks)
- pool/config.go (1 hunks)
- pool/static_pool/pool.go (1 hunks)
- worker/worker.go (3 hunks)
Files skipped from review as they are similar to previous changes (3)
- pool/allocator.go
- pool/config.go
- pool/static_pool/pool.go
Additional comments: 2
worker/worker.go (2)
- 34-36: Introduction of
maxExecs
in theProcess
struct and related methods (WithMaxExecs
,SetMaxExecs
,MaxJobsReached
) aligns with the PR's objectives forMaxJobsDispersion
. Ensure these capabilities are effectively utilized in worker initialization.- 35-35: Past discussions on the necessity of
maxExecs
and precalculated values for optimization are addressed by the current implementation, offering dynamic configuration and potential efficiency improvements.
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #113 +/- ##
==========================================
+ Coverage 72.69% 73.08% +0.39%
==========================================
Files 23 24 +1
Lines 2212 1646 -566
==========================================
- Hits 1608 1203 -405
+ Misses 553 395 -158
+ Partials 51 48 -3 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
@Kaspiman tests are red 😢 |
Tests are red and I'll finish the tests in the coming days, we're not in a hurry, are we? |
@Kaspiman Sure, take your time 👍 |
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
max_jobs
jitter to prevent thundering herd problemmax_jobs
jitter to prevent thundering herd problem
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Some tests are always failed because |
Hey @Kaspiman 👋 |
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @Kaspiman 👍 It'll be merged a little bit later, when I'll be preparing v2024. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files selected for processing (3)
- ipc/socket/socket_factory.go (5 hunks)
- ipc/socket/socket_factory_spawn_test.go (16 hunks)
- ipc/socket/socket_factory_test.go (19 hunks)
Additional comments: 9
ipc/socket/socket_factory.go (3)
- 33-36: The
NewSocketServer
function correctly initializes a newFactory
instance with the provided listener and logger. This change aligns with the removal of thetout
parameter as mentioned in the summary.- 86-90: The
SpawnWorkerWithContext
method has been introduced to replace theSpawnWorker
method, accepting a context and additional worker options. This is a significant improvement for managing worker lifecycles with context, allowing for better timeout and cancellation handling.- 170-170: In the
findRelayWithContext
method, the error handling has been enhanced to return a more specific error when the context deadline is exceeded. This improvement provides clearer feedback to the caller about the nature of the failure.ipc/socket/socket_factory_spawn_test.go (3)
- 24-24: The use of
require.NoError
for asserting the absence of errors when closing the listener is a good practice, ensuring that the test fails immediately if an unexpected error occurs. This change enhances the robustness of the test suite.- 32-35: The transition to using
context.WithTimeout
for managing timeouts in socket operations is a positive change, aligning with modern Go practices for timeout and cancellation handling. This approach provides more flexibility and control over the execution flow.- 96-105: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [87-102]
In the
Test_Tcp_StartError2
test case, the introduction of a context with a timeout for spawning workers is a significant improvement. It ensures that the test case can gracefully handle situations where the worker fails to start within the expected timeframe.ipc/socket/socket_factory_test.go (3)
- 35-35: The update to use
SpawnWorkerWithContext
in theTest_Tcp_Start
test case is consistent with the changes in the main codebase, demonstrating the new approach to spawning workers with context. This ensures that the tests accurately reflect the current functionality.- 101-109: > 📝 NOTE
This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [83-106]
In the
Test_Tcp_StartError
test case, the use ofcontext.WithTimeout
and the subsequent call toSpawnWorkerWithContext
align with the updated worker spawning mechanism. This change ensures that the test case properly handles timeout scenarios.
- 154-154: The
Test_Tcp_Timeout
test case correctly uses a very short timeout to simulate a timeout scenario. This test validates the behavior ofSpawnWorkerWithContext
when the context expires before the worker can be successfully spawned.
Thanks @Kaspiman 👍 |
Reason for This PR
Added jitter for smooth workers` restart.
A similar solution from baldinof/roadrunner-bundle
Description of Changes
AI POWER HELP ME
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the MIT license.
PR Checklist
[Author TODO: Meet these criteria.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not]
git commit -s
).CHANGELOG.md
.Summary by CodeRabbit