This repository has been archived by the owner on Jun 27, 2024. It is now read-only.

feature[RR v2024]: max_jobs jitter to prevent thundering herd problem #113

Merged: 37 commits, Apr 9, 2024

Conversation

@Kaspiman (Contributor) commented Feb 7, 2024

Reason for This PR

Added jitter to max_jobs so that workers restart gradually instead of all at once.

A similar solution from baldinof/roadrunner-bundle

Description of Changes

AI POWER HELP ME

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the MIT license.

PR Checklist

[Author TODO: Meet these criteria.]
[Reviewer TODO: Verify that these criteria are met. Request changes if not]

  • All commits in this PR are signed (git commit -s).
  • The reason for this PR is clearly provided (issue no. or explanation).
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this PR.
  • Any user-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.

Summary by CodeRabbit

  • New Features
    • Introduced configuration options for worker processes, including execution caps and jitter settings.
    • Added functionality to calculate maximum executions based on job dispersion.
  • Enhancements
    • Updated pool allocator to use a configuration object, enhancing flexibility and customization.
    • Modified worker spawning to utilize context for better deadline and cancellation management.
  • Refactor
    • Renamed and updated functions to align with context usage and configuration adjustments.
  • Bug Fixes
    • Adjusted job execution checks to correctly recognize when maximum executions are reached.
  • Tests
    • Updated and removed certain tests to reflect changes in worker spawning and execution management.
  • Documentation
    • Added comments and documentation for new features and significant changes.
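
The idea behind the jitter can be sketched in a few lines of Go. This is only an illustration, not the code from this PR: `maxExecsWithJitter` and its parameters are hypothetical names, and both the choice to add (rather than subtract) the random offset and the treatment of 0 as "no limit" are assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
)

// maxExecsWithJitter is a hypothetical helper (not the SDK's actual API).
// It returns maxJobs plus a random offset in [0, jitter], so that workers
// started together reach their execution cap at different times.
// maxJobs == 0 is treated as "no limit" and returned unchanged (assumption).
// intn must behave like rand.Int63n: return a value in [0, n).
func maxExecsWithJitter(maxJobs, jitter uint64, intn func(n int64) int64) uint64 {
	if maxJobs == 0 || jitter == 0 {
		return maxJobs
	}
	// Sketch only: assumes jitter is small enough to fit in an int64.
	return maxJobs + uint64(intn(int64(jitter)+1))
}

func main() {
	// Each worker gets its own cap, spreading restarts over ~100 jobs.
	for i := 0; i < 4; i++ {
		limit := maxExecsWithJitter(1000, 100, rand.Int63n)
		fmt.Printf("worker %d restarts after %d jobs\n", i, limit)
	}
}
```

Without the offset, every worker in a pool that started at the same moment would hit `max_jobs` at roughly the same time and restart together, which is the thundering herd the PR title refers to.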

coderabbitai bot commented Feb 7, 2024

Walkthrough

These changes bring about a more flexible and context-aware worker allocation and management system across various components. By incorporating contexts, configurations for maximum job dispersion, and execution limits, the system now offers improved control over worker lifecycle and task execution. This update ensures better resource management, adaptability to workload variations, and enhanced operational efficiency in handling concurrent tasks within the system.

Changes

| Files | Summary |
| --- | --- |
| pool/allocator.go, pool/static_pool/pool.go | Updated to use contexts and config for worker allocation, added execution limits, and suggested removing unused functions. |
| pool/config.go | Added fields for job dispersion and jitter in worker execution. |
| worker/worker.go, worker/options.go | Introduced functionality for setting and updating maximum executions, with configuration options for workers. |
| ipc/pipe/..., ipc/socket/... | Modified to use SpawnWorkerWithContext, aligning worker spawning with context handling. |
| ipc/pipe/pipe_factory_bench_test.go | Updated benchmark tests to reflect changes in worker spawning methods. |

🐇✨
To code, to build, under moon's gentle glow,
Changes afoot, in the system's deep flow.
Contexts embraced, with a config's keen sight,
Workers spawn forth, in the soft, silent night.
Through dispersion, through jitter, they find their way,
A rabbit's work done, as dawn greets the day.
🌟🐾


@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 2

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 2d02d88 and c62e288.
Files selected for processing (4)
  • pool/allocator.go (4 hunks)
  • pool/config.go (2 hunks)
  • pool/static_pool/pool.go (2 hunks)
  • worker/worker.go (2 hunks)
Additional comments: 7
pool/allocator.go (2)
  • 5-6: Imports for math and math/rand are added to support the new functionality related to job dispersion calculation. Ensure these libraries are used effectively and securely.
  • 26-37: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [29-45].

The NewPoolAllocator function now accepts a cfg *Config parameter, replacing the previous timeout parameter. This change aligns with the introduction of the MaxJobsDispersion feature. Ensure that all calls to this function have been updated to pass the correct configuration object.

Verification successful

The verification process has confirmed that the call to NewPoolAllocator in pool/static_pool/pool.go has been updated to pass the correct configuration object, aligning with the updated function signature that includes a cfg *Config parameter. There are no indications of other calls to NewPoolAllocator that have not been updated accordingly. This suggests that the necessary updates following the introduction of the MaxJobsDispersion feature have been made correctly.

Scripts Executed

The following scripts were executed for the analysis:


Script:

#!/bin/bash
# Search for NewPoolAllocator function calls to ensure they are updated.
rg 'NewPoolAllocator\('

Length of output: 275

pool/config.go (1)
  • 61-63: The initialization logic for MaxJobsDispersion ensures its value is within the range (0.0, 1.0]. This logic is correctly implemented. However, consider adding a comment to explain why values outside this range are reset to 1.0 for clarity.
pool/static_pool/pool.go (2)
  • 90-90: The NewPool function call has been updated to pass the correct set of arguments, reflecting the changes made to NewPoolAllocator. Verify that the updated arguments are correctly passed throughout the codebase.
  • 204-204: The condition to check if the maximum number of jobs has been reached now correctly uses the MaxJobsReached method. This change ensures that the logic for determining if a worker has reached its execution limit is centralized and can easily account for the MaxJobsDispersion feature.
worker/worker.go (2)
  • 487-489: The SetMaxExecs method correctly sets the maximum number of executions for a worker. This method is straightforward and correctly implements the intended functionality.
  • 491-493: The MaxJobsReached method correctly checks if the worker has reached its maximum number of executions. This method is essential for integrating the MaxJobsDispersion feature into the worker's lifecycle management.
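
As a rough illustration of the two pieces reviewed above, the sketch below clamps a dispersion value to (0.0, 1.0] and tracks a per-worker execution cap. The method names `SetMaxExecs` and `MaxJobsReached` come from the review; the `worker` type, the `exec` helper, `normalizeDispersion`, and the rule that a cap of 0 means "unlimited" are assumptions made for the example, not the SDK's actual code.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// normalizeDispersion keeps values inside (0.0, 1.0]. Anything outside that
// range (including NaN) is reset to 1.0, as the review describes.
func normalizeDispersion(d float64) float64 {
	if d > 0 && d <= 1 {
		return d
	}
	return 1.0
}

// worker is a stand-in for the real worker process type (hypothetical).
type worker struct {
	execs    atomic.Uint64 // number of completed executions
	maxExecs uint64        // 0 means "no limit" (assumption)
}

// SetMaxExecs sets the execution cap for this worker.
func (w *worker) SetMaxExecs(n uint64) { w.maxExecs = n }

// MaxJobsReached reports whether the worker has used up its cap.
func (w *worker) MaxJobsReached() bool {
	return w.maxExecs > 0 && w.execs.Load() >= w.maxExecs
}

// exec simulates handling one job.
func (w *worker) exec() { w.execs.Add(1) }

func main() {
	w := &worker{}
	w.SetMaxExecs(2)
	for !w.MaxJobsReached() {
		w.exec()
	}
	fmt.Println("executions before restart:", w.execs.Load())
	fmt.Println(normalizeDispersion(0.2), normalizeDispersion(7))
}
```

Keeping the check in one method means the pool only asks the worker whether its cap is reached and does not need to know how the cap was computed.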

@rustatian rustatian added the C-enhancement Category: enhancement. Meaning improvements of current module, transport, etc.. label Feb 7, 2024
@rustatian rustatian marked this pull request as draft February 7, 2024 21:04
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
@Kaspiman Kaspiman changed the title from "[WIP] Implementation of dispersion when restarting workers" to "[WIP] Jitter when restarting workers" Feb 12, 2024
@Kaspiman Kaspiman marked this pull request as ready for review February 12, 2024 16:20

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 5

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between c62e288 and e841f82.
Files selected for processing (7)
  • fsm/fsm.go (1 hunks)
  • ipc/pipe/pipe_factory.go (2 hunks)
  • ipc/socket/socket_factory.go (2 hunks)
  • pool/allocator.go (2 hunks)
  • pool/config.go (1 hunks)
  • pool/static_pool/pool.go (1 hunks)
  • worker/worker.go (3 hunks)
Files skipped from review as they are similar to previous changes (3)
  • pool/allocator.go
  • pool/config.go
  • pool/static_pool/pool.go
Additional comments: 2
worker/worker.go (2)
  • 34-36: Introduction of maxExecs in the Process struct and related methods (WithMaxExecs, SetMaxExecs, MaxJobsReached) aligns with the PR's objectives for MaxJobsDispersion. Ensure these capabilities are effectively utilized in worker initialization.
  • 35-35: Past discussions on the necessity of maxExecs and precalculated values for optimization are addressed by the current implementation, offering dynamic configuration and potential efficiency improvements.
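
A functional option such as `WithMaxExecs` is typically wired up along the following lines. The names `Process`, `maxExecs`, and `WithMaxExecs` appear in the review; the `Option` type and the `newProcess` constructor are hypothetical, and the real signatures in worker/options.go may differ.

```go
package main

import "fmt"

// Process is a reduced stand-in for the worker process struct.
type Process struct {
	maxExecs uint64 // per-worker execution cap
}

// Option mutates a Process during construction (hypothetical type name).
type Option func(*Process)

// WithMaxExecs sets the execution cap when the worker is created.
func WithMaxExecs(n uint64) Option {
	return func(p *Process) { p.maxExecs = n }
}

// newProcess applies the options in order (hypothetical constructor).
func newProcess(opts ...Option) *Process {
	p := &Process{}
	for _, opt := range opts {
		opt(p)
	}
	return p
}

func main() {
	plain := newProcess()
	capped := newProcess(WithMaxExecs(1050))
	fmt.Println(plain.maxExecs, capped.maxExecs)
}
```

With this shape, the allocator can compute a jittered cap once per worker and pass it in at spawn time, and callers that do not need a cap stay unchanged.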

Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
@rustatian rustatian marked this pull request as draft February 12, 2024 16:32

codecov bot commented Feb 12, 2024

Codecov Report

Attention: Patch coverage is 90.90909%, with 3 lines in your changes missing coverage. Please review.

Project coverage is 73.08%. Comparing base (91b2b9c) to head (8f4fd4e).
Report is 23 commits behind head on master.

❗ Current head 8f4fd4e differs from pull request most recent head ce532e5. Consider uploading reports for the commit ce532e5 to get more accurate results

| Files | Patch % | Lines |
| --- | --- | --- |
| worker/options.go | 83.33% | 2 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #113      +/-   ##
==========================================
+ Coverage   72.69%   73.08%   +0.39%     
==========================================
  Files          23       24       +1     
  Lines        2212     1646     -566     
==========================================
- Hits         1608     1203     -405     
+ Misses        553      395     -158     
+ Partials       51       48       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rustatian
Member

@Kaspiman tests are red 😢

@Kaspiman
Contributor Author

> @Kaspiman tests are red 😢

Tests are red and
Violets are blue.
PR is sweet, but not as sweet as free time on weekend ❤️

I'll finish the tests in the coming days, we're not in a hurry, are we?

@rustatian
Member

@Kaspiman Sure, take your time 👍

@rustatian rustatian marked this pull request as draft February 24, 2024 15:27
@rustatian rustatian changed the title from "feature: max_jobs jitter to prevent thundering herd problem" to "feature[RR v2024]: max_jobs jitter to prevent thundering herd problem" Mar 8, 2024
Kaspiman added 4 commits March 8, 2024 23:43
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
@Kaspiman
Contributor Author

> @Kaspiman Sure, take your time 👍

Some tests always fail because the socket is busy. I tried t.Skip, but it does not work properly. The code already contains skipped tests for this. Did they ever work?

@rustatian
Member

> > @Kaspiman Sure, take your time 👍
>
> Some tests always fail because the socket is busy. I tried t.Skip, but it does not work properly. The code already contains skipped tests for this. Did they ever work?

Hey @Kaspiman 👋
As far as I can see in the logs, the tests are failing because of an NPE (nil pointer dereference), not because of a busy socket.

Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
Signed-off-by: Vladimir Plakhotnikov <[email protected]>
@Kaspiman Kaspiman requested a review from rustatian March 12, 2024 08:09
@rustatian rustatian marked this pull request as ready for review March 12, 2024 09:23
Member

@rustatian rustatian left a comment

LGTM

@rustatian
Member

Thanks @Kaspiman 👍 It'll be merged a little later, when I'm preparing v2024.


@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 880e4cc and 8f4fd4e.
Files selected for processing (3)
  • ipc/socket/socket_factory.go (5 hunks)
  • ipc/socket/socket_factory_spawn_test.go (16 hunks)
  • ipc/socket/socket_factory_test.go (19 hunks)
Additional comments: 9
ipc/socket/socket_factory.go (3)
  • 33-36: The NewSocketServer function correctly initializes a new Factory instance with the provided listener and logger. This change aligns with the removal of the tout parameter as mentioned in the summary.
  • 86-90: The SpawnWorkerWithContext method has been introduced to replace the SpawnWorker method, accepting a context and additional worker options. This is a significant improvement for managing worker lifecycles with context, allowing for better timeout and cancellation handling.
  • 170-170: In the findRelayWithContext method, the error handling has been enhanced to return a more specific error when the context deadline is exceeded. This improvement provides clearer feedback to the caller about the nature of the failure.
ipc/socket/socket_factory_spawn_test.go (3)
  • 24-24: The use of require.NoError for asserting the absence of errors when closing the listener is a good practice, ensuring that the test fails immediately if an unexpected error occurs. This change enhances the robustness of the test suite.
  • 32-35: The transition to using context.WithTimeout for managing timeouts in socket operations is a positive change, aligning with modern Go practices for timeout and cancellation handling. This approach provides more flexibility and control over the execution flow.
  • 96-105: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [87-102].

In the Test_Tcp_StartError2 test case, the introduction of a context with a timeout for spawning workers is a significant improvement. It ensures that the test case can gracefully handle situations where the worker fails to start within the expected timeframe.

ipc/socket/socket_factory_test.go (3)
  • 35-35: The update to use SpawnWorkerWithContext in the Test_Tcp_Start test case is consistent with the changes in the main codebase, demonstrating the new approach to spawning workers with context. This ensures that the tests accurately reflect the current functionality.
  • 101-109: 📝 NOTE: This review was outside the diff hunks and was mapped to the diff hunk with the greatest overlap. Original lines [83-106].

In the Test_Tcp_StartError test case, the use of context.WithTimeout and the subsequent call to SpawnWorkerWithContext align with the updated worker spawning mechanism. This change ensures that the test case properly handles timeout scenarios.

  • 154-154: The Test_Tcp_Timeout test case correctly uses a very short timeout to simulate a timeout scenario. This test validates the behavior of SpawnWorkerWithContext when the context expires before the worker can be successfully spawned.
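
The context-based spawning described in these comments can be approximated as follows. This is a minimal sketch under stated assumptions: `waitForRelay`, `spawnWithTimeout`, and `errSpawnTimeout` are hypothetical names, and the `ready` channel stands in for the worker connecting back over the socket. It is not the body of `SpawnWorkerWithContext` or `findRelayWithContext`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errSpawnTimeout is a hypothetical sentinel for "worker did not connect in time".
var errSpawnTimeout = errors.New("worker spawn timed out")

// waitForRelay blocks until the worker signals readiness or ctx ends.
// A deadline expiry is wrapped in a more specific error so callers can
// tell a timeout apart from an explicit cancellation.
func waitForRelay(ctx context.Context, ready <-chan struct{}) error {
	select {
	case <-ready:
		return nil
	case <-ctx.Done():
		if errors.Is(ctx.Err(), context.DeadlineExceeded) {
			return fmt.Errorf("%w: %v", errSpawnTimeout, ctx.Err())
		}
		return ctx.Err()
	}
}

// spawnWithTimeout shows the caller side: the timeout now lives in the
// context instead of being a constructor parameter of the factory.
func spawnWithTimeout(timeout time.Duration, ready <-chan struct{}) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return waitForRelay(ctx, ready)
}

func main() {
	// The channel is never signalled, so the short deadline expires first.
	err := spawnWithTimeout(time.Millisecond, make(chan struct{}))
	fmt.Println(errors.Is(err, errSpawnTimeout))
}
```

A test in the style of Test_Tcp_Timeout would pass a very short timeout, as `main` does here, and assert that the returned error is the timeout error.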

@rustatian rustatian merged commit 015d517 into roadrunner-server:master Apr 9, 2024
5 checks passed
@rustatian
Member

Thanks @Kaspiman 👍

Labels
C-enhancement Category: enhancement. Meaning improvements of current module, transport, etc..
Projects
Status: ✅ Done
2 participants